Welcome!

Blog Feed Post

Rebuilding After Disaster: DevOps is the first step

Picture compliments of FS.comhttp://www.stacki.com/wp-content/uploads/2016/05/DC.Flood_.jpg 600w" sizes="(max-width: 300px) 100vw, 300px" />

Not our flooded DC, but similar.

If you’ve ever stood in the ruins of what was once your datacenter and pondered how much work you had to do and how little time you had to do it, then you probably nodded at the title. If you have ever worked to get as much data off of a massive RAID array as possible with blinking red lights telling you that your backups should have been better, you too probably nodded at the title.

It is true that I have experienced both of these situations. A totally flooded datacenter (caused by broken pipes in the ceiling) set us to scrambling so that we could put together something while we restored normal function. The water had come through the ceiling and was a couple feet deep, so the destruction was pretty thorough. In a different role, a RAID array with many disks lost one disk, and before our service contractor came to replace that disk (less than 24 hours), two more went. Eventually, as more people than just us had problems, the entire batch of disks that this RAID devices’ drives came out of was determined to be faulty. Thing is, a ton of our operational intelligence was on those disks in the form of integrations – the system this array was for knitted together a dozen or so systems on several OS/DB combinations, and all the integration code was stored on the array. The system was essentially the cash register of the organization, so downtime was not an option. And I was the manager responsible.

Both of these scenarios came about before DevOps was popular, and in both scenarios we had taken reasonable precautions. But when the fire was burning and the clock was ticking, our reasonable precautions weren’t good enough to get us up and running (even minimally) in a short amount of time. And that “minimally” is massively important. In the flood scenario, the vast majority of our hardware was unusable, and in a highly dynamic environment, some of our code – and even purchased packages – was not in the most recent set of backups. That last bit was true with the RAID array also. We were building something that had never been done before at the scale we were working on, so change was constant, new data inputs were constant, and backups – like most backups – were not continuous.

With DevOps, these types of disasters are still an issue, some of the problems we had will still have to be dealt with, but one of the big issues we ran into – getting new hardware, getting it installed, getting apps on it, and getting it running so customers/users could access something is largely taken care of.

With provisioning – server and app – and continuous integration, the environment you need can be recreated in a short amount of time, assuming you can get hardware to use it on, or are able to use it either hosted or in the cloud for the near term.

Assuming that you are following DevOps practices (I’d say “best practices”, but this is really more fundamental than that), you have configuration and reinstall information in GitHub or Bitbucket or something similar. So getting some of your services back online becomes a case of downloading and installing a tool like Stacki or Cobbler, hooking it to a tool like Puppet or SaltStack, and getting your configuration files down to start deploying servers from RAID to app.

Will it be perfect? Not at all. If your organization has gone all-in and has network configuration information in a tool like Puppet with the Cisco or F5 plugins, for example, it is highly unlikely that short-term network gear while you work things out with an insurance company is going to be configurable by that information. But having DevOps in place will save you a lot of time, because you don’t have to rebuild everything by hand.

And trust me, at that instant, the number one thing you will care about is “How fast can I get things going again?” knowing full well that the answer to that question will be temporary while the real problems are dealt with. Anything that can make that process easier will help, you will already be stressed trying to get someone – be it vendor reps for faulty disk drives or insurance reps for disasters – out to help you do the longer-term recovery, the short term should be as automatic as possible.

Surprisingly, I can’t say “I hope you never have to deal with this”. It is part of life in IT, and I honestly learned a ton from it. The few thousand lines of code and tens of thousands of billable data we lost with the RAID issue was an expensive lesson, but we came out stronger and more resilient. The flooded datacenter gave me a chance to deal with insurance on a scale most people never have to, and (with the help of the other team members of course) to build a shiny new datacenter from the ground up – something we all want to do. But if you have a choice, avoid it. Since you don’t generally have a choice, prepare for it. DevOps is one way of preparing.

 

Read the original blog entry...

More Stories By Don MacVittie

Don MacVittie is founder of Ingrained Technology, A technical advocacy and software development consultancy. He has experience in application development, architecture, infrastructure, technical writing,DevOps, and IT management. MacVittie holds a B.S. in Computer Science from Northern Michigan University, and an M.S. in Computer Science from Nova Southeastern University.

Latest Stories
As many know, the first generation of Cloud Management Platform (CMP) solutions were designed for managing virtual infrastructure (IaaS) and traditional applications. But that's no longer enough to satisfy evolving and complex business requirements. In his session at 21st Cloud Expo, Scott Davis, Embotics CTO, explored how next-generation CMPs ensure organizations can manage cloud-native and microservice-based application architectures, while also facilitating agile DevOps methodology. He expla...
SYS-CON Events announced today that Synametrics Technologies will exhibit at SYS-CON's 22nd International Cloud Expo®, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Synametrics Technologies is a privately held company based in Plainsboro, New Jersey that has been providing solutions for the developer community since 1997. Based on the success of its initial product offerings such as WinSQL, Xeams, SynaMan and Syncrify, Synametrics continues to create and hone inn...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices t...
"Evatronix provides design services to companies that need to integrate the IoT technology in their products but they don't necessarily have the expertise, knowledge and design team to do so," explained Adam Morawiec, VP of Business Development at Evatronix, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to clos...
"I focus on what we are calling CAST Highlight, which is our SaaS application portfolio analysis tool. It is an extremely lightweight tool that can integrate with pretty much any build process right now," explained Andrew Siegmund, Application Migration Specialist for CAST, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Recently, REAN Cloud built a digital concierge for a North Carolina hospital that had observed that most patient call button questions were repetitive. In addition, the paper-based process used to measure patient health metrics was laborious, not in real-time and sometimes error-prone. In their session at 21st Cloud Expo, Sean Finnerty, Executive Director, Practice Lead, Health Care & Life Science at REAN Cloud, and Dr. S.P.T. Krishnan, Principal Architect at REAN Cloud, discussed how they built...
The dynamic nature of the cloud means that change is a constant when it comes to modern cloud-based infrastructure. Delivering modern applications to end users, therefore, is a constantly shifting challenge. Delivery automation helps IT Ops teams ensure that apps are providing an optimal end user experience over hybrid-cloud and multi-cloud environments, no matter what the current state of the infrastructure is. To employ a delivery automation strategy that reflects your business rules, making r...
The past few years have brought a sea change in the way applications are architected, developed, and consumed—increasing both the complexity of testing and the business impact of software failures. How can software testing professionals keep pace with modern application delivery, given the trends that impact both architectures (cloud, microservices, and APIs) and processes (DevOps, agile, and continuous delivery)? This is where continuous testing comes in. D
Modern software design has fundamentally changed how we manage applications, causing many to turn to containers as the new virtual machine for resource management. As container adoption grows beyond stateless applications to stateful workloads, the need for persistent storage is foundational - something customers routinely cite as a top pain point. In his session at @DevOpsSummit at 21st Cloud Expo, Bill Borsari, Head of Systems Engineering at Datera, explored how organizations can reap the bene...
No hype cycles or predictions of a gazillion things here. IoT is here. You get it. You know your business and have great ideas for a business transformation strategy. What comes next? Time to make it happen. In his session at @ThingsExpo, Jay Mason, an Associate Partner of Analytics, IoT & Cybersecurity at M&S Consulting, presented a step-by-step plan to develop your technology implementation strategy. He also discussed the evaluation of communication standards and IoT messaging protocols, data...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve f...
With tough new regulations coming to Europe on data privacy in May 2018, Calligo will explain why in reality the effect is global and transforms how you consider critical data. EU GDPR fundamentally rewrites the rules for cloud, Big Data and IoT. In his session at 21st Cloud Expo, Adam Ryan, Vice President and General Manager EMEA at Calligo, examined the regulations and provided insight on how it affects technology, challenges the established rules and will usher in new levels of diligence arou...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, discussed the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, application p...