
OnPage’s 3 Steps to Mastering IT On-Call Scheduling


Almost half of all technology professionals experience on-call as an integral part of their job. Life for an on-call IT engineer often means 2 a.m. wake-up calls for false alarms or for issues the engineer can do little about. These sleep interruptions and tensions inevitably lead to alert fatigue, which is considered the #1 pain point for both traditional IT teams and modern DevOps engineers.

Previous guides have failed to focus on the salient issues that need to be addressed to move the conversation forward. OnPage is therefore putting forth the following to highlight the issues that need to be discussed and to provide solutions that improve life on call.

The goal of this blog is to:

  • note what has impeded us from reaching effective life on-call
  • provide 3 steps to mastering life on-call
  • highlight what will be achieved with effective life on-call

Issues impeding effective life on call

Email

Email remains the number one channel through which people learn about problems. However, it is the worst way to learn about an issue. Email often gets buried under many other messages, so it gives the recipient no sense of immediacy. Furthermore, there is no easy way to separate communications about a particular incident in an email channel.

Alert Noise

As more technologies get added to the IT stack, the number of items being monitored increases vastly. The resulting flood of alerts is often referred to as ‘alert hell’, and it is only going to grow in the future. In fact, large IT organizations can receive up to 150,000 alerts per day from their monitoring systems. It is physically impossible for teams to respond to this number of alerts.

Inefficient Communication

When you cannot reach engineers or colleagues and don’t know who is on-call, your ability to resolve problems decreases drastically. Not having the tools to exchange information quickly is also a significant problem. When on-call engineers do have effective communication tools at their fingertips, they are much more productive in managing their on-call shifts and solving problems quickly.

Improving life for IT on-call

More than limiting the number of alerts to the on-call team, the goal of on-call is to limit disruption to the end customer. To this end, a pageable alert should fire only when action must be taken. Anything that doesn’t require immediate action is a ticket.
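The page-versus-ticket rule above can be sketched in a few lines. This is only an illustration, not OnPage's implementation: the `Alert` class, its `action_required_now` field, and the `route` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """A monitoring alert. Both fields are hypothetical, for illustration only."""
    name: str
    action_required_now: bool  # True only if a human must act immediately


def route(alert: Alert) -> str:
    """Return "page" when action must be taken now, otherwise "ticket"."""
    if alert.action_required_now:
        return "page"
    return "ticket"


# Example: an outage wakes someone up; a slowly filling disk waits for business hours.
outage = Alert(name="checkout-api-down", action_required_now=True)
disk = Alert(name="disk-70-percent-full", action_required_now=False)
decisions = {a.name: route(a) for a in (outage, disk)}
```

In practice, deciding whether action is required now is the hard part. The sketch assumes that decision has already been made upstream by whoever defines the alert.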

Step 1: Create a fair on-call schedule

Use group schedules to make sure everyone gets a turn at bat. Rotations are key in this regard, as they ensure everyone is put on-call at some point during a normal schedule. Moreover, a fair schedule promotes the sense that no one group is being picked on or forced to work more hours than any other.
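One simple way to build such a rotation is round-robin assignment, sketched below. The function name `build_rotation`, the one-week shift length, and the engineer names are assumptions made for the example. A real scheduling tool would also handle time zones, holidays, and shift swaps.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def build_rotation(engineers, start, shifts, shift_days=7):
    """Assign engineers to consecutive shifts in round-robin order.

    Returns a list of (shift_start_date, engineer) tuples.
    """
    if not engineers:
        raise ValueError("need at least one engineer")
    schedule = []
    # cycle() repeats the roster endlessly; islice() takes one entry per shift.
    for i, engineer in enumerate(islice(cycle(engineers), shifts)):
        shift_start = start + timedelta(days=i * shift_days)
        schedule.append((shift_start, engineer))
    return schedule


# Six weekly shifts across three (hypothetical) engineers: two shifts each.
schedule = build_rotation(["ana", "ben", "chen"], date(2024, 1, 1), shifts=6)
```

Because the roster is cycled in a fixed order, each person's shift count differs from anyone else's by at most one, which is the fairness property the step describes.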

Step 2: Make sure alerts are persistent

How many times has someone on your team said they didn’t respond to the alert because they didn’t hear it? Most alerting technologies notify engineers via SMS or email and don’t provide persistent alerting if the engineer is temporarily out of range.

Make sure you are using a tool that avoids these problems by creating persistent alerts that will be heard. Additionally, make sure the alerts will still be heard when the engineer comes back into range.
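One reading of "persistent" is that the alert keeps re-sending until someone acknowledges it. The sketch below illustrates that idea under stated assumptions: `send` and `is_acknowledged` are hypothetical callables standing in for whatever your alerting tool provides, and the retry interval and attempt limit are arbitrary defaults.

```python
import time
from typing import Callable


def alert_until_acknowledged(
    send: Callable[[], None],
    is_acknowledged: Callable[[], bool],
    retry_seconds: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-send an alert until it is acknowledged or attempts run out.

    Returns True if acknowledged, False if the caller should escalate.
    """
    for _ in range(max_attempts):
        send()                 # deliver (or re-deliver) the alert
        sleep(retry_seconds)   # give the engineer time to respond
        if is_acknowledged():
            return True
    return False


# Example with stand-in callables: the third delivery is acknowledged.
sent = []
acks = iter([False, False, True])
acknowledged = alert_until_acknowledged(
    send=lambda: sent.append("alert"),
    is_acknowledged=lambda: next(acks),
    retry_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in the example
)
```

A `False` return is a natural hook for escalation, for instance paging the next engineer in the rotation.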

Step 3: Messaging for efficient communications

Make sure the on-call communications tools you use enable communication between engineers. That is, make sure engineers have a tool that supports both alerting and critical communications. They should be able to message fellow engineers as well as groups.

Ideally, your messaging platform will also integrate with widely used industry tools such as Slack. From Slack, for example, engineers could alert individuals to significant events that need their colleagues’ input.

Conclusion

Life on-call doesn’t need to remind everyone of a Stephen King horror novel. With adequate forethought, life on-call can be manageable and lead to a decrease in alert fatigue.

Want to read 4 more steps to improve on-call scheduling? Download our whitepaper.

The post OnPage’s 3 Steps to Mastering IT On-Call Scheduling appeared first on OnPage.


More Stories By OnPage Blog

OnPage is a disruptive technology and application that leverages today's technology and smartphone capabilities for priority mobile messaging. With a top notch history of ensuring uninterrupted communication for businesses and critical response organizations, OnPage is once again poised to pioneer new mobile communications methodology for business and organizational use.
