By PagerDuty Blog
March 15, 2017 09:30 AM EDT
Avoiding Noise in Incident Management
Suppression. According to the thesaurus, this word is synonymous with terms like deletion, elimination, and annihilation.
Yet within the context of incident management, suppression means something quite different. It’s not about getting rid of data forever. It serves instead as a way of making sure that admins focus on the right alerts at the right time by mitigating noise.
Here’s a look at how suppression significantly helps streamline incident management.
Why Suppression is Important
Why is suppression useful in incident management? Simply put, modern infrastructure generates a huge volume of alerts, and admins can’t reasonably expect to review each and every one. If they try, they will soon suffer from alert fatigue: overwhelmed and burned out, they begin ignoring potentially important alerts. And if they stop paying attention to alerts, the entire incident management process breaks down.
Alert suppression is a way of avoiding this issue. By suppressing alerts of certain types, admins can ensure that actionable, high-priority alerts receive the greatest attention. They can also reduce the overall number of alerts that appear on their dashboards, which lowers the risk of alert fatigue.
As an example, consider an organization whose workstations reboot once a week overnight after updates are installed. The reboot would generate a series of alerts as workstations go offline and come back up. Adding these to the incidents dashboard that admins see wouldn’t be helpful, because the alerts in this case reflect a routine procedural event that does not require action. In order to avoid adding this unhelpful noise to admins’ dashboards, admins can configure their incident management software to suppress alerts related to a workstation rebooting.
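To make the reboot example concrete, here is a minimal sketch of a type-based suppression check. It is not taken from any particular incident management product: the `Alert` class, the `SUPPRESSED_TYPES` set, and the alert type and host names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical alert type names; a real tool defines its own.
SUPPRESSED_TYPES = {"workstation_reboot"}


@dataclass
class Alert:
    alert_type: str
    host: str


def is_suppressed(alert: Alert) -> bool:
    """Return True when the alert should be kept off the dashboard."""
    return alert.alert_type in SUPPRESSED_TYPES


alerts = [
    Alert("workstation_reboot", "ws-017"),
    Alert("disk_full", "db-02"),
]

# Only alerts that are not suppressed reach the admins' dashboard.
dashboard = [a for a in alerts if not is_suppressed(a)]
print([a.alert_type for a in dashboard])  # prints ['disk_full']
```

A blanket rule like this is the bluntest form of suppression: every alert of the listed type is kept off the dashboard, no matter when or where it occurs.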
Suppression: Not an Either/Or Proposition
An important point to understand about alert suppression is that suppressing alerts is not an either/or proposition. In other words, admins’ options are not limited simply to enabling all alerts of a certain type or permanently suppressing all of them.
They can instead take a more nuanced approach to suppression. Alert suppression could be configured in such a way that alerts of a given type are suppressed unless they occur repeatedly within a certain period of time, for example. Alerts could also be configured so that they are reported if they occur during a certain time of day, but are suppressed during other times. Similarly, admins might want to suppress alerts of a particular type if they occur on a certain kind of device, but not others.
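As a rough illustration of the time-of-day and device-type conditions described above, the sketch below models a suppression rule that applies only to one kind of device during a nightly window. The `SuppressionRule` class, its field names, and the 23:00 to 05:00 window are assumptions made for the example, not the configuration format of any real tool.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class Alert:
    alert_type: str
    device_kind: str       # e.g. "workstation" or "server" (illustrative values)
    occurred_at: datetime


def in_window(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end), allowing windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


@dataclass
class SuppressionRule:
    alert_type: str
    device_kind: Optional[str] = None    # None matches any device kind
    quiet_start: Optional[time] = None   # suppress only inside this daily window
    quiet_end: Optional[time] = None

    def matches(self, alert: Alert) -> bool:
        """Return True when this rule says the alert should be suppressed."""
        if alert.alert_type != self.alert_type:
            return False
        if self.device_kind is not None and alert.device_kind != self.device_kind:
            return False
        if self.quiet_start is not None and self.quiet_end is not None:
            return in_window(alert.occurred_at.time(), self.quiet_start, self.quiet_end)
        return True


# Hypothetical rule: suppress workstation reboots between 23:00 and 05:00.
rule = SuppressionRule("reboot", device_kind="workstation",
                       quiet_start=time(23, 0), quiet_end=time(5, 0))

night_ws = Alert("reboot", "workstation", datetime(2017, 3, 15, 2, 30))
day_ws = Alert("reboot", "workstation", datetime(2017, 3, 15, 14, 0))
night_srv = Alert("reboot", "server", datetime(2017, 3, 15, 2, 30))

print(rule.matches(night_ws), rule.matches(day_ws), rule.matches(night_srv))
# prints: True False False
```

The overnight workstation reboot is suppressed, while the same event at midday, or on a server, is still reported.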
This flexibility is important because it lets admins get the most out of their alerts. Instead of applying broad, blunt suppression policies, they can tweak suppression settings to keep important events visible without adding unnecessary noise to the incident management system.
Nuanced suppression could be helpful in the example above. As I noted, admins generally don’t want to receive alerts when a workstation reboots in the middle of the night following a software update. But if the incident management software detects a workstation that reboots multiple times during the same period, that could signal a problem (like a flawed software update) that admins will want to know about. In this situation, configuring suppression so that only recurring reboots generate incidents in the central dashboard would make incident management more effective.
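One way to picture the recurring-reboot case is a sliding-window counter per host. The sketch below is only an assumption about how such logic might look: the `RepeatDetector` class, the threshold of three reboots, and the one-hour window are illustrative values, and the code assumes events arrive in chronological order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class RepeatDetector:
    """Suppress an event unless it recurs `threshold` times within `window`."""

    def __init__(self, threshold: int, window: timedelta):
        self.threshold = threshold
        self.window = window
        self._seen = defaultdict(deque)  # host -> timestamps of recent events

    def should_raise_incident(self, host: str, when: datetime) -> bool:
        recent = self._seen[host]
        recent.append(when)
        # Drop timestamps that have aged out of the window.
        while recent and when - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold


# Hypothetical policy: three reboots of the same host within one hour.
detector = RepeatDetector(threshold=3, window=timedelta(hours=1))
start = datetime(2017, 3, 15, 2, 0)

results = [
    detector.should_raise_incident("ws-017", start + timedelta(minutes=m))
    for m in (0, 10, 20)
]
print(results)  # prints [False, False, True]
```

The first two reboots stay suppressed as routine, and only the third within the hour surfaces as an incident.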
Suppression Doesn’t Mean Losing Data
It’s also worth emphasizing that suppression in the context of incident management does not mean that suppressed alerts disappear forever. On the contrary, suppressed alerts still happen, and data related to them should be saved. The only difference between a suppressed alert and a non-suppressed one is that the former is not sent to priority dashboards in the incident management system.
This is important to understand because it means that admins retain the ability to look up suppressed alerts to gain insight into an incident if they need to. This also helps them better tune their alerting thresholds. In addition, suppressed alerts still figure into historical incident management data, which can be used to reveal lots of valuable information about infrastructure efficiency and health trends.
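The distinction between suppressing an alert and deleting it can be sketched as a store that records everything but filters what the dashboard shows. The `AlertStore` class and its method names are hypothetical, and a real system would persist alerts rather than hold them in memory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    alert_type: str
    host: str
    suppressed: bool = False


class AlertStore:
    """Keeps every alert; suppression only affects what the dashboard shows."""

    def __init__(self) -> None:
        self._alerts: List[Alert] = []

    def record(self, alert: Alert) -> None:
        # Suppressed alerts are saved alongside all the others.
        self._alerts.append(alert)

    def dashboard(self) -> List[Alert]:
        # Priority view: suppressed alerts are filtered out.
        return [a for a in self._alerts if not a.suppressed]

    def history(self, host: str) -> List[Alert]:
        # Full record for a host, including suppressed alerts.
        return [a for a in self._alerts if a.host == host]


store = AlertStore()
store.record(Alert("reboot", "ws-017", suppressed=True))
store.record(Alert("disk_full", "ws-017"))

print(len(store.dashboard()))        # prints 1
print(len(store.history("ws-017")))  # prints 2
```

The dashboard shows only the actionable alert, while the host's history still contains the suppressed reboot for later investigation or trend analysis.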
With suppression, then, you get to have your alerts and eat them, too—or something like that.
Admins can still draw on suppressed alerts whenever they need them to identify and respond to incidents, but those alerts don’t clutter dashboards with non-actionable information that gets in the way of resolving higher-priority incidents. Moreover, suppression can be tuned so that alerts are kept off the dashboard only under exactly the right circumstances, while always being recorded so that you retain full visibility into your infrastructure.