Welcome!

Blog Feed Post

Rebuilding After Disaster: DevOps is the first step

Picture compliments of FS.comhttp://www.stacki.com/wp-content/uploads/2016/05/DC.Flood_.jpg 600w" sizes="(max-width: 300px) 100vw, 300px" />

Not our flooded DC, but similar.

If you’ve ever stood in the ruins of what was once your datacenter and pondered how much work you had to do and how little time you had to do it, then you probably nodded at the title. If you have ever worked to get as much data off of a massive RAID array as possible with blinking red lights telling you that your backups should have been better, you too probably nodded at the title.

It is true that I have experienced both of these situations. A totally flooded datacenter (caused by broken pipes in the ceiling) set us to scrambling so that we could put together something while we restored normal function. The water had come through the ceiling and was a couple feet deep, so the destruction was pretty thorough. In a different role, a RAID array with many disks lost one disk, and before our service contractor came to replace that disk (less than 24 hours), two more went. Eventually, as more people than just us had problems, the entire batch of disks that this RAID devices’ drives came out of was determined to be faulty. Thing is, a ton of our operational intelligence was on those disks in the form of integrations – the system this array was for knitted together a dozen or so systems on several OS/DB combinations, and all the integration code was stored on the array. The system was essentially the cash register of the organization, so downtime was not an option. And I was the manager responsible.

Both of these scenarios came about before DevOps was popular, and in both scenarios we had taken reasonable precautions. But when the fire was burning and the clock was ticking, our reasonable precautions weren’t good enough to get us up and running (even minimally) in a short amount of time. And that “minimally” is massively important. In the flood scenario, the vast majority of our hardware was unusable, and in a highly dynamic environment, some of our code – and even purchased packages – was not in the most recent set of backups. That last bit was true with the RAID array also. We were building something that had never been done before at the scale we were working on, so change was constant, new data inputs were constant, and backups – like most backups – were not continuous.

With DevOps, these types of disasters are still an issue, some of the problems we had will still have to be dealt with, but one of the big issues we ran into – getting new hardware, getting it installed, getting apps on it, and getting it running so customers/users could access something is largely taken care of.

With provisioning – server and app – and continuous integration, the environment you need can be recreated in a short amount of time, assuming you can get hardware to use it on, or are able to use it either hosted or in the cloud for the near term.

Assuming that you are following DevOps practices (I’d say “best practices”, but this is really more fundamental than that), you have configuration and reinstall information in GitHub or Bitbucket or something similar. So getting some of your services back online becomes a case of downloading and installing a tool like Stacki or Cobbler, hooking it to a tool like Puppet or SaltStack, and getting your configuration files down to start deploying servers from RAID to app.

Will it be perfect? Not at all. If your organization has gone all-in and has network configuration information in a tool like Puppet with the Cisco or F5 plugins, for example, it is highly unlikely that short-term network gear while you work things out with an insurance company is going to be configurable by that information. But having DevOps in place will save you a lot of time, because you don’t have to rebuild everything by hand.

And trust me, at that instant, the number one thing you will care about is “How fast can I get things going again?” knowing full well that the answer to that question will be temporary while the real problems are dealt with. Anything that can make that process easier will help, you will already be stressed trying to get someone – be it vendor reps for faulty disk drives or insurance reps for disasters – out to help you do the longer-term recovery, the short term should be as automatic as possible.

Surprisingly, I can’t say “I hope you never have to deal with this”. It is part of life in IT, and I honestly learned a ton from it. The few thousand lines of code and tens of thousands of billable data we lost with the RAID issue was an expensive lesson, but we came out stronger and more resilient. The flooded datacenter gave me a chance to deal with insurance on a scale most people never have to, and (with the help of the other team members of course) to build a shiny new datacenter from the ground up – something we all want to do. But if you have a choice, avoid it. Since you don’t generally have a choice, prepare for it. DevOps is one way of preparing.

 

Read the original blog entry...

More Stories By Don MacVittie

Don MacVittie is founder of Ingrained Technology, A technical advocacy and software development consultancy. He has experience in application development, architecture, infrastructure, technical writing,DevOps, and IT management. MacVittie holds a B.S. in Computer Science from Northern Michigan University, and an M.S. in Computer Science from Nova Southeastern University.

Latest Stories
The best way to leverage your CloudEXPO | DXWorldEXPO presence as a sponsor and exhibitor is to plan your news announcements around our events. The press covering CloudEXPO | DXWorldEXPO will have access to these releases and will amplify your news announcements. More than two dozen Cloud companies either set deals at our shows or have announced their mergers and acquisitions at CloudEXPO. Product announcements during our show provide your company with the most reach through our targeted audienc...
As organizations shift towards IT-as-a-service models, the need for managing and protecting data residing across physical, virtual, and now cloud environments grows with it. Commvault can ensure protection, access and E-Discovery of your data – whether in a private cloud, a Service Provider delivered public cloud, or a hybrid cloud environment – across the heterogeneous enterprise. In his general session at 18th Cloud Expo, Randy De Meno, Chief Technologist - Windows Products and Microsoft Part...
JETRO showcased Japan Digital Transformation Pavilion at SYS-CON's 21st International Cloud Expo® at the Santa Clara Convention Center in Santa Clara, CA. The Japan External Trade Organization (JETRO) is a non-profit organization that provides business support services to companies expanding to Japan. With the support of JETRO's dedicated staff, clients can incorporate their business; receive visa, immigration, and HR support; find dedicated office space; identify local government subsidies; get...
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world.
DXWorldEXPO LLC announced today that ICC-USA, a computer systems integrator and server manufacturing company focused on developing products and product appliances, will exhibit at the 22nd International CloudEXPO | DXWorldEXPO. DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City. ICC is a computer systems integrator and server manufacturing company focused on developing products and product appliances to meet a wide range of ...
HyperConvergence came to market with the objective of being simple, flexible and to help drive down operating expenses. It reduced the footprint by bundling the compute/storage/network into one box. This brought a new set of challenges as the HyperConverged vendors are very focused on their own proprietary building blocks. If you want to scale in a certain way, let's say you identified a need for more storage and want to add a device that is not sold by the HyperConverged vendor, forget about it...
With 10 simultaneous tracks, keynotes, general sessions and targeted breakout classes, @CloudEXPO and DXWorldEXPO are two of the most important technology events of the year. Since its launch over eight years ago, @CloudEXPO and DXWorldEXPO have presented a rock star faculty as well as showcased hundreds of sponsors and exhibitors!
DXWorldEXPO LLC announced today that the upcoming DXWorldEXPO | CloudEXPO New York event will feature 10 companies from Poland to participate at the "Poland Digital Transformation Pavilion" on November 12-13, 2018.
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
22nd International Cloud Expo, taking place June 5-7, 2018, at the Javits Center in New York City, NY, and co-located with the 1st DXWorld Expo will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud ...
"We're focused on how to get some of the attributes that you would expect from an Amazon, Azure, Google, and doing that on-prem. We believe today that you can actually get those types of things done with certain architectures available in the market today," explained Steve Conner, VP of Sales at Cloudistics, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
In his keynote at 19th Cloud Expo, Sheng Liang, co-founder and CEO of Rancher Labs, discussed the technological advances and new business opportunities created by the rapid adoption of containers. With the success of Amazon Web Services (AWS) and various open source technologies used to build private clouds, cloud computing has become an essential component of IT strategy. However, users continue to face challenges in implementing clouds, as older technologies evolve and newer ones like Docker c...
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
Vulnerability management is vital for large companies that need to secure containers across thousands of hosts, but many struggle to understand how exposed they are when they discover a new high security vulnerability. In his session at 21st Cloud Expo, John Morello, CTO of Twistlock, addressed this pressing concern by introducing the concept of the “Vulnerability Risk Tree API,” which brings all the data together in a simple REST endpoint, allowing companies to easily grasp the severity of the ...
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...