Announcing the Modern Incident Resolution Lifecycle

Today, we’re excited to announce a suite of new functionality, built around the Modern Incident Resolution Lifecycle, that powers even faster resolution of major business-impacting incidents and accelerates learning from them. This release helps you differentiate major incidents from day-to-day operational issues and adopt best practices that streamline incident resolution and learning in your organization. The lifecycle has three stages:

  • Assess — Enable responders to quickly diagnose local vs. global impact by using groupings of alerts and transparently communicating priority to others.
  • Respond — Coordinate across teams, collaborate your way using tools of your choice, and engage stakeholders to orchestrate business-wide response and drive even faster resolution.
  • Learn — Build postmortem timelines in minutes rather than hours and initiate the conversation on how to learn from past incidents and improve as an organization.

[Figure: Modern Incident Resolution Lifecycle process diagram]

The Need for Better Incident Resolution

Complexity is on the rise. To meet the rising demands of customers, organizations are being forced to scale their operations in ways that introduce additional complexity and chaos. More people are involved in operations and in incident response, across an ever-increasing mix of systems, applications, tools, and layers of abstraction, resulting in more and more risk to the business.

As digital operations scale up within an organization, especially when developers are given operational responsibility for the services they build in production, one of the core challenges becomes ensuring the best possible customer experience during an outage. Organizations looking to improve their incident response must first establish consistent practices, roles, and terminology.

Own the Incident Response Process

Many organizations assign the role of establishing and refining the incident resolution process to one person or team. At PagerDuty, we benefit from working directly with our customers — some of the most mature digital operations teams in the world. Whether you choose to call it “insights engineering” or SRE (site reliability engineering), or simply, “major incident management,” the first crucial step is answering this question: what is an incident to your product or service?

1. What is an incident?

Distinguishing between day-to-day operational maintenance issues and customer-impacting incidents can be difficult, which is exactly why this assessment is best performed by the individual teams in their own area of the product. Giving those teams a framework for triage decisions (P1 through P5, or Sev-1 through Sev-3, or whatever levels you decide to use) is fundamental to establishing common ground during a firefight. The new Incident Priority capability in PagerDuty helps everyone distinguish major incidents from minor operational or untriaged issues.
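To make the idea of a shared triage framework concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Priority` scale, the `Assessment` fields, the rules in `triage`, and the P1/P2 cutoff in `is_major_incident` are hypothetical and do not represent PagerDuty’s API or its recommended thresholds. Your own teams would agree on the questions and levels that fit your product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical five-level scale; a lower number means more urgent."""
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4
    P5 = 5


@dataclass
class Assessment:
    """Answers a responder gives during triage (all fields are illustrative)."""
    customer_impacting: bool
    affected_services: int       # number of services showing symptoms
    workaround_available: bool


def triage(a: Assessment) -> Priority:
    """Map an assessment to a priority using example rules a team might agree on."""
    if not a.customer_impacting:
        # Day-to-day operational issue: customers do not see it.
        return Priority.P5 if a.workaround_available else Priority.P4
    if a.affected_services > 1:
        # Global impact across several services.
        return Priority.P1
    # Local impact on a single service.
    return Priority.P3 if a.workaround_available else Priority.P2


def is_major_incident(p: Priority) -> bool:
    """Treat P1 and P2 as major incidents (an assumed cutoff)."""
    return p <= Priority.P2
```

With rules like these written down, a responder who sees customer impact on several services lands on P1 without debate, and everyone else can tell at a glance whether an issue is a major incident.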

2. How do you respond to an incident?

The next step is establishing how your organization responds to incidents. Defining clear roles for the individuals involved in the response goes a long way toward ensuring an effective process. PagerDuty’s open-sourced incident response best practices are a great resource for what we’ve commonly seen in operationally mature organizations and what we practice ourselves. We follow this process in all circumstances, including during our Failure Fridays.

3. Own the Tools

The third and final step is also likely the biggest challenge: driving consistency of your process at scale. This is why we frequently see incident management process owners build or manage the tools they want the organization to use. In this area, PagerDuty aims to make organizational adoption of your process much easier in two ways: through automation and simplification.

Integrate Your Toolchain

If you use an ITSM or ticketing solution such as ServiceNow or JIRA Software (see all of our integrations), we are greatly expanding our integrations with both products to eliminate duplicate effort by responders and incident managers and to ensure that the output of the assessment phase feeds seamlessly into your tool of choice. We are also introducing additional extensibility that lets you create custom actions directly accessible from the incident in PagerDuty, which simplifies troubleshooting by automating common tasks or remediations.

To streamline your process, we’re also introducing our new incident postmortem builder, which greatly simplifies reviewing and learning from a major incident. Postmortems, also known as incident reports, post-incident reports, or root cause analyses, are critical for fostering a culture of continuous learning and improvement of both services and the incident response process. We’ve also expanded our permissions model so that teams can manage their own artifacts while adhering to your top-level process.

As the leader in digital operations management, PagerDuty helps you scale both your on-call process and your incident resolution process, no matter where you are in your operational maturity. Do you own the incident resolution process or tools for your organization? Tell us what has worked for you and where we can continue to improve in order to better support you!

Check out all of our new capabilities by signing up for a free 14-day trial of PagerDuty.


Note: the Incident Priority functionality and our new JIRA Extension are both in limited availability for Standard & Enterprise customers at this time. Please reach out to [email protected] to enable them on your account.

The post Announcing the Modern Incident Resolution Lifecycle appeared first on PagerDuty.


More Stories By PagerDuty Blog

PagerDuty’s operations performance platform helps companies increase reliability. By connecting people, systems and data in a single view, PagerDuty delivers visibility and actionable intelligence across global operations for effective incident resolution management. PagerDuty has over 100 platform partners, and is trusted by Fortune 500 companies and startups alike, including Microsoft, National Instruments, Electronic Arts, Adobe, Rackspace, Etsy, Square and Github.
