Stepping Up to the Plate: A Story About Being On-Call

The Alert

A few weeks ago, I went to my first baseball game. The San Francisco Giants were playing the San Diego Padres at AT&T Park, and my relatives had an extra ticket for me. I met them at the front entrance of the park, and when we entered I took the whole spectacle in: the big LED lights, the endless rows, the infinite hallway of food stalls. After gathering the necessary garlic fries and chicken tenders, we made our way to our seats.

The Giants scored three runs in the first inning, and the whole stadium was electric with excitement. But by the sixth inning the lead had flipped, and the Padres were beating the Giants by three runs. Tensions were high in the ballpark, with the few fans clad in blue getting louder and louder, while those sporting orange grew silent.

The Ghirardelli man was making his rounds, and my cousin flagged him down to grab us some hot chocolate. He makes his way to our row, pours a cup and passes it down to me. My hands grab the cup, and then my phone goes off. I’m startled by the loud ring and vibration, and the cup of hot chocolate slips from my hands. My cousin, sitting beside me, catches it, though my jeans take some stains. The spectators behind us complain, telling me to silence my phone. My phone is on silent, though: I had configured it to make a sound only for alerts from PagerDuty.

“Hold that for me, there’s something I have to do,” I tell her.

“You okay? What’s wrong?” my cousin asks.

“There’s been an incident, I need to go.”

I grab my headphones from my purse, stand up, push my way past the legs of the seated spectators of row three, and run up the stairs.

The Response

I roam around trying to find a private place to take the call, but everywhere I go the speakers blare and the cheers reverberate throughout the stadium. At the end of the food hall, I spot the illuminated sign and book it into the bathroom. The acoustics only amplify the crowd’s jeering, but I am running out of time. I pick the farthest stall from the entrance, put the toilet seat down, plug my headphones in and join the call. I mute my microphone so that the background noise does not bother anybody. I am the third person to join the call, and I enter mid-conversation.

“We’re waiting for the on-call member from the EM team,” a voice says.

“All right, who is the EM on-call?” another asks.

“I’m not sure. We’ll just wait and-” the first voice is interrupted.

“Hello?” A third voice.

“Hi there,” someone replies.

“Hi, this is the EM on-call.”

“Hello, what’s the situation, and what’s your status on resolving it?”

“I already resolved it, but let me get on the portal to make sure everything’s okay.”

“What!” I yell in disbelief. I cover my mouth, then realize (with relief) that they cannot hear me. It has been only two minutes since the initial alert was sent, and the on-call engineer resolved the incident before even joining the call. Over the next few minutes, the three voices rattle off numbers and analyze metrics. While I have no idea what any of it means, I take it from the calm tone of their voices and the lack of swearing that we are out of any sort of trouble.

“Yeah, it’s back to normal now.”

“Awesome. Do you have any reason to think this will happen again?”

“No, I don’t think that this will come up again, but I will keep an eye out.”

“All right then. Well, thanks for handling this.”

“No problem, thank you everybody for being here. Goodbye.”

“Goodbye, have a good weekend.”

The conference call ends, and I look at my phone screen. 8 minutes and 38 seconds. Eight minutes to resolve an incident, or to talk about it anyway. I sit there in the bathroom stall, dumbfounded. I come out of the stall to wash my hands, and notice in the mirror that I have not attended to the dark hot chocolate stains on my jeans.

As I start trying to wipe the splotches away, I realize how grossly underprepared I was for what had happened. I was stressed and flustered, and I was only shadowing. One, I did not have my laptop with me. Two, my phone was at 15% charge. Three, I had had one too many beers; I doubt I could have solved any sort of technical problem, let alone explained what I was solving to someone else. If I had been the on-call engineer, I would have struck out. I would have let my team down.

The Post-Mortem

That evening, the Giants came back in the bottom of the ninth inning, and I realized that being on-call is somewhat like baseball. Specifically, being on-call is like being the batter when your team has two outs, has a runner on third base, and is down by one run in the bottom of the ninth inning. In that moment, the team’s success rides on you and you alone. In front of you, you have teammates on the bases, and their success is entirely dependent on yours. Behind you, you have the rest of the team in the dugout, waiting to see whether you fail or fly.

The batter swings and the ball is in play. That was when it clicked. With PagerDuty, the on-call engineer is no longer the lone batter, and is instead one of the players on the field. With PagerDuty, being on-call ceases to be an individual endeavor: it becomes a team sport. Instead of having to sift through thousands of alerts to find the problem and solve it alone, the on-call engineer has a team for support, a central line on which everyone can communicate, and a platform that filters out all the unnecessary noise. When the ball is in play, the team assesses the situation and passes it to whoever is best positioned to solve the problem, all with the common goal of resolving the issue before it shows up on the customer’s screen.

PagerDuty’s platform goes beyond making sure that the customer’s digital experience is seamless and smooth: it makes the on-call experience less stressful, less uncertain and less overwhelming.

I do not have a technical background in engineering or computer science, nor am I a huge sports fan, so I find it humorously ironic that I was able to make sense of both these things by putting them together.

The post Stepping Up to the Plate: A Story About Being On-Call appeared first on PagerDuty.
