Better Incident Postmortems

While a major incident is ongoing, all of your focus is on restoring service: watch the smoke, figure out where the fire is, and put it out. But after service has been restored (the incident is resolved, the adrenaline has drained, and it’s peacetime), that’s the time to learn from what happened and to use those learnings to get better at resolving, responding to, and preventing future incidents. The core best practice that enables this cycle of improvement is the postmortem process, and PagerDuty is pleased to introduce integrated support for postmortems in our full lifecycle incident management platform! Coupled with several other PagerDuty capabilities, such as system and operational efficiency analytics and the Operations Command Console, we now provide everything you need to learn and proactively improve both the resiliency of your infrastructure and your resolution process.

PagerDuty improves all parts of the postmortem process, from building the timeline all the way through to tracking the status of postmortems. Construct a timeline with relevant PagerDuty and chat activity in minutes instead of hours, then use that detailed breakdown to efficiently investigate root cause, assess response effectiveness, and determine the most important follow-up actions. We’ve taken the friction out of conducting effective postmortems, so that more of your postmortem time can be focused on learning and less on manual work. How easy can your postmortems be? Let’s take a look!

Now you can kick off the postmortem process for an incident in a single click:

[Screenshot]

Investigate

With the postmortem report created, it’s time to roll up our sleeves and start investigating what actually happened. We’ll want to pull in activity from our existing sources of communication and incident response: chat and PagerDuty. Our PagerDuty incident information was automatically associated with our new postmortem, so let’s add in the relevant chat channels:

[Screenshot]

Now we can review the combined activity available from the incident and these chat rooms, and include in the postmortem timeline exactly those bits that are most relevant to understanding how the incident played out. We want to cover several aspects of the incident: the technology systems involved, our response effectiveness, and resolution steps.

Postmortem Timeline

Including an item in the postmortem timeline is also just a single click — no cut and paste, no switching between applications, no error-prone and manual time zone math. The full range of PagerDuty activity can be included: incident state changes, notes, escalations, notifications, when additional responders were requested, when status updates were dispatched to stakeholders, and more. Once the activity is in the timeline, you can also annotate to describe its relevance to the incident, as seen here:

[Screenshot]
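To make the "no manual time zone math" point concrete, here is a minimal sketch of what merging incident and chat activity into one timeline involves when done by hand. It is not PagerDuty's implementation or API: the `TimelineItem` class, the `to_utc` and `build_timeline` helpers, the source labels, and the sample events are all hypothetical, and it assumes each record carries an ISO 8601 timestamp with an explicit UTC offset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineItem:
    """One entry in a hand-built postmortem timeline (hypothetical shape)."""
    at: datetime      # always stored in UTC
    source: str       # "incident" or "chat" (labels are assumptions)
    summary: str
    note: str = ""    # optional annotation explaining relevance


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Refuse to guess a time zone; guessing is where manual timelines go wrong.
        raise ValueError(f"timestamp has no UTC offset: {stamp!r}")
    return parsed.astimezone(timezone.utc)


def build_timeline(incident_events, chat_messages):
    """Merge (timestamp, text) pairs from both sources into one UTC-ordered list."""
    items = [TimelineItem(to_utc(ts), "incident", text) for ts, text in incident_events]
    items += [TimelineItem(to_utc(ts), "chat", text) for ts, text in chat_messages]
    return sorted(items, key=lambda item: item.at)


# Made-up sample data: incident events logged in UTC-7, chat exported in UTC.
incident_events = [
    ("2017-05-02T14:03:00-07:00", "Incident triggered"),
    ("2017-05-02T14:41:00-07:00", "Incident resolved"),
]
chat_messages = [
    ("2017-05-02T21:10:00+00:00", "Rolling back the last deploy"),
]

timeline = build_timeline(incident_events, chat_messages)
for item in timeline:
    print(item.at.isoformat(), item.source, item.summary)
```

Normalizing every timestamp to UTC before sorting is the step that is easy to get wrong by hand: here the chat message lands between the two incident events even though its wall-clock time looks seven hours later.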

Analyze

With the timeline built out, we can continue on to the analysis phase. This consists of summarizing what happened, identifying the underlying root cause, calling out the path to resolution, and so on. This step is key as it enables the team to introspect on what worked well and where we could have done better, then identify the most important improvements to pursue as action items. All of this is easy to capture within the postmortem editor, which also provides instructions for approaching each of these sections:

[Screenshot]

And it’s as simple as that!

Streamline Postmortem Management

Not only is individual postmortem construction easier and more effective, the overall process is also significantly streamlined. All postmortems are available in the catalog.

[Screenshot]

This makes it easy to locate postmortems, identify impactful long-running incidents, and see which postmortems are still in progress and which are already complete. Postmortems can also be exported as PDFs for distribution or archiving, and both the report template and per-section instructions for authors can be customized to fit the needs of your organization. Together, all of these tools provide a complete end-to-end postmortem process that is both easy to use and easy to manage.

This suite of functionality helps you get the most from postmortems:

  • Timeline building is faster, less painful, and enables broader insights.
  • It’s far easier to manage the postmortem process with a simplified toolchain.
  • Your team can accelerate continuous improvement by getting more and better learnings, while spending less time on the process.

We hope that this capability makes it as easy as possible for your team to facilitate a culture of shared learning. And if you’re interested in learning more, download our free post-mortem handbook for best practices on conducting effective postmortems.

PagerDuty Postmortems is included for all customers on our Standard and Enterprise plans. To get started, check out the support article here!

 

The post Better Incident Postmortems appeared first on PagerDuty.


