Better Incident Postmortems

While a major incident is ongoing, all of your focus is on restoring service: watch the smoke, figure out where the fire is, and put it out. But once service has been restored (the incident is resolved, the adrenaline has drained, and it’s peacetime), that’s the time to learn from what happened and to use those learnings to get better at resolving, responding to, and preventing future incidents. The core best practice that enables this cycle of improvement is the postmortem process, and PagerDuty is pleased to introduce integrated support for postmortems in our full lifecycle incident management platform! Coupled with several other PagerDuty capabilities, such as system and operational efficiency analytics and the Operations Command Console, we now provide everything you need to learn and proactively improve both the resiliency of your infrastructure and your resolution process.

PagerDuty improves all parts of the postmortem process, from building the timeline all the way through to tracking the status of postmortems. Construct a timeline with relevant PagerDuty and chat activity in minutes instead of hours, then use that detailed breakdown to efficiently investigate root cause, assess response effectiveness, and determine the most important follow-up actions. We’ve taken the friction out of conducting effective postmortems, so that more of your postmortem time can be focused on learning and less on manual work. How easy can your postmortems be? Let’s take a look!

Now you can kick off the postmortem process for an incident in a single click:

[Screenshot]

Investigate

With the postmortem report created, it’s time to roll up our sleeves and start investigating what actually happened. We’ll want to pull in activity from our existing sources of communication and incident response: chat and PagerDuty. Our PagerDuty incident information was automatically associated with our new postmortem, so let’s add in the relevant chat channels:

[Screenshot]

Now we can review the combined activity available from the incident and these chat rooms, and include in the postmortem timeline exactly those bits that are most relevant to understanding how the incident played out. We want to cover several aspects of the incident: the technology systems involved, our response effectiveness, and resolution steps.

Postmortem Timeline

Including an item in the postmortem timeline is also just a single click: no cut and paste, no switching between applications, no manual, error-prone time zone math. The full range of PagerDuty activity can be included: incident state changes, notes, escalations, notifications, when additional responders were requested, when status updates were dispatched to stakeholders, and more. Once the activity is in the timeline, you can also annotate it to describe its relevance to the incident, as seen here:

[Screenshot]
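To make the "time zone math" concrete, here is a minimal sketch of what merging incident and chat activity into one timeline involves when done by hand. It is an illustration only, not PagerDuty's implementation or API: the `TimelineEntry` type, the `to_utc` and `build_timeline` helpers, the source labels, and the sample events are all hypothetical. The idea is simply to normalize every timestamp to UTC before sorting.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEntry:
    """One item in a hand-built timeline (hypothetical structure)."""
    when: datetime   # always stored as timezone-aware UTC
    source: str      # hypothetical labels: "incident" or "chat"
    text: str
    note: str = ""   # optional annotation explaining relevance


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Refuse to guess the zone; guessing is where manual timelines go wrong.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def build_timeline(incident_log, chat_messages):
    """Merge both sources into one chronologically ordered list."""
    entries = [TimelineEntry(to_utc(ts), "incident", text) for ts, text in incident_log]
    entries += [TimelineEntry(to_utc(ts), "chat", text) for ts, text in chat_messages]
    return sorted(entries, key=lambda entry: entry.when)


# Made-up sample data: the incident log uses a -07:00 offset, chat uses UTC.
incident_log = [
    ("2017-05-01T09:02:00-07:00", "Incident triggered"),
    ("2017-05-01T09:40:00-07:00", "Incident resolved"),
]
chat_messages = [
    ("2017-05-01T16:10:00+00:00", "Rolling back the last deploy"),
]

timeline = build_timeline(incident_log, chat_messages)
for entry in timeline:
    print(entry.when.isoformat(), entry.source, entry.text)
```

Sorting on the raw strings would have placed the chat message last; converting to UTC first puts the rollback between the trigger and the resolution, which is where it belongs.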

Analyze

With the timeline built out, we can continue on to the analysis phase. This consists of summarizing what happened, identifying the underlying root cause, calling out the path to resolution, and so on. This step is key as it enables the team to introspect on what worked well and where we could have done better, then identify the most important improvements to pursue as action items. All of this is easy to capture within the postmortem editor, which also provides instructions for approaching each of these sections:

[Screenshot]

And it’s as simple as that!

Streamline Postmortem Management

Not only is individual postmortem construction easier and more effective, the overall process is also significantly streamlined. All postmortems are available in the catalog.

[Screenshot]

This makes it easy to locate postmortems, identify impactful long-running incidents, and see which postmortems are still in progress and which are already complete. Postmortems can also be exported as PDFs for distribution or archiving, and both the report template and per-section instructions for authors can be customized to fit the needs of your organization. Together, all of these tools provide a complete end-to-end postmortem process that is both easy to use and easy to manage.
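For teams that track postmortems in their own tooling, the two catalog questions above (what is still in progress, and which incidents ran long) come down to simple filters. The sketch below is a hypothetical illustration, not the PagerDuty catalog or its API: the `Postmortem` record, the status values, the one-hour threshold, and the sample entries are all assumptions.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Postmortem:
    """A catalog record (hypothetical structure)."""
    title: str
    status: str                    # hypothetical values: "in_progress" or "complete"
    incident_duration: timedelta   # how long the underlying incident lasted


def in_progress(catalog):
    """Return postmortems that have not been completed yet."""
    return [p for p in catalog if p.status == "in_progress"]


def long_running(catalog, threshold=timedelta(hours=1)):
    """Return postmortems whose incident lasted at least `threshold`, longest first."""
    matches = [p for p in catalog if p.incident_duration >= threshold]
    return sorted(matches, key=lambda p: p.incident_duration, reverse=True)


# Made-up sample catalog.
catalog = [
    Postmortem("Checkout latency", "complete", timedelta(minutes=25)),
    Postmortem("Database failover", "in_progress", timedelta(hours=3)),
    Postmortem("DNS misconfiguration", "complete", timedelta(hours=1, minutes=30)),
]

print([p.title for p in in_progress(catalog)])
print([p.title for p in long_running(catalog)])
```

The threshold is a parameter because what counts as "long-running" varies by team and service.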

This suite of functionality helps you get the most from postmortems:

  • Timeline building is faster, less painful, and enables broader insights.
  • It’s far easier to manage the postmortem process with a simplified toolchain.
  • Your team can accelerate continuous improvement by getting more and better learnings, while spending less time on the process.

We hope that this capability makes it as easy as possible for your team to facilitate a culture of shared learning. And if you’re interested in learning more, download our free post-mortem handbook for best practices on conducting effective postmortems.

PagerDuty Postmortems is included for all customers on our Standard and Enterprise plans. To get started, check out the support article here!

 

The post Better Incident Postmortems appeared first on PagerDuty.

More Stories By PagerDuty Blog

PagerDuty’s operations performance platform helps companies increase reliability. By connecting people, systems and data in a single view, PagerDuty delivers visibility and actionable intelligence across global operations for effective incident resolution management. PagerDuty has over 100 platform partners, and is trusted by Fortune 500 companies and startups alike, including Microsoft, National Instruments, Electronic Arts, Adobe, Rackspace, Etsy, Square and GitHub.
