DNS Outage Was Doomsday for the Internet

What was supposed to be a quiet Friday suddenly turned into a real “Black Friday” for us (as well as for most of the Internet) when Dyn suffered a major DDoS attack. In terms of Internet disruption, the widespread damage made this the worst outage I have ever experienced.

At the core of it all, the managed DNS provider Dyn was targeted by a DDoS attack that impacted thousands of websites, services, SaaS providers, and more.

The chart below shows the DNS resolution time and availability of twitter.com from around the world. There were three clear waves of outages:

  • 7:10 to 9:10 EDT
  • 11:52 to 16:33 EDT
  • 19:13 to 20:38 EDT

[Chart: DNS resolution time and availability of twitter.com from around the world]

The DNS failures occurred because Dyn’s nameservers did not respond to DNS queries within four seconds.
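
To make the four-second criterion concrete, here is a minimal sketch of a single DNS probe using only the Python standard library. It hand-builds an RFC 1035 query, sends it over UDP, and counts the attempt as failed if no reply arrives within the timeout. The four-second default mirrors the threshold above; the function names and result fields are hypothetical, and this is not how Catchpoint’s agents are implemented.

```python
import secrets
import socket
import struct
import time


def build_query(hostname: str, qtype: int = 1) -> tuple[int, bytes]:
    """Build a minimal DNS query packet (RFC 1035) for hostname; qtype 1 = A."""
    query_id = secrets.randbits(16)
    # Header: ID, flags (RD=1 requests recursion), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return query_id, header + question


def probe(server: str, hostname: str, port: int = 53, timeout: float = 4.0) -> dict:
    """Send one UDP query; treat no reply within `timeout` seconds as a failure."""
    query_id, packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, port))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return {"ok": False, "elapsed": time.monotonic() - start, "error": "timeout"}
        except OSError as exc:  # e.g. network unreachable
            return {"ok": False, "elapsed": time.monotonic() - start, "error": str(exc)}
    elapsed = time.monotonic() - start
    # A valid reply has a full 12-byte header and echoes our query ID
    if len(data) < 12 or struct.unpack("!H", data[:2])[0] != query_id:
        return {"ok": False, "elapsed": elapsed, "error": "malformed or mismatched reply"}
    return {"ok": True, "elapsed": elapsed, "rcode": data[3] & 0x0F}
```

A monitoring agent would run a probe like this against each authoritative nameserver from many locations; a server that is overwhelmed, as Dyn’s were, shows up as a run of timeouts.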

We were impacted in three ways:

  • Our domain, Catchpoint.com, was unreachable for a solid 30 minutes until we brought in our secondary managed DNS provider, Verisign. We also brought up, and announced to our customers, a backup domain that had never been on Dyn, so they could log in to our portal and keep an eye on their online services. Both measures had been on standby before the incident.
  • Our nodes could not reliably talk to our globally distributed command and control systems until we switched to IP-only mode, bypassing DNS lookups. This feature had been developed, tested, and deployed to production, but was not yet enabled because our engineering teams had planned one more enhancement. Given the situation, we judged that enabling it without that enhancement carried less risk than the outage we were experiencing.
  • Many of the third-party vendors our company relies on stopped working: our customer support and online help solution, CRM, office door badging system, SSO, two-factor authentication services, one of our CDNs, a file-sharing solution, and the list goes on.
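
The IP-only mode described above can be sketched as address resolution with a pinned fallback. This is a minimal illustration under stated assumptions, not Catchpoint’s implementation: the hostname `control.example.com`, the documentation-range addresses, and the `ip_only` switch are all hypothetical, and the `resolver` parameter exists only so a DNS outage can be simulated.

```python
import socket
from typing import Callable

# Hypothetical pinned addresses for a command-and-control endpoint, kept in
# local configuration so they remain available when DNS is not.
PINNED_ADDRESSES = {
    "control.example.com": ["192.0.2.10", "192.0.2.11"],
}


def resolve_with_fallback(
    hostname: str,
    port: int,
    ip_only: bool = False,
    resolver: Callable[..., list] = socket.getaddrinfo,
    pinned: dict = PINNED_ADDRESSES,
) -> list[str]:
    """Return candidate IP addresses for hostname.

    In IP-only mode DNS is skipped entirely; otherwise DNS is tried first and
    the pinned list is used only if resolution fails.
    """
    if ip_only:
        return list(pinned.get(hostname, []))
    try:
        infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    except OSError:  # socket.gaierror is a subclass of OSError
        return list(pinned.get(hostname, []))
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the IP address. dict.fromkeys deduplicates while keeping order.
    return list(dict.fromkeys(info[4][0] for info in infos))
```

The trade-off is that pinned addresses go stale if the servers move, so they need their own update path outside of DNS.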

This blog post is not about finger-pointing; the folks at Dyn had a horrible day living through their worst nightmare. They did an amazing job of dealing with it, from notifications to extinguishing the fire. This is about how to deal with a worst-case outage, as a company and as an industry.

As with every outage, it’s important to take the time to reflect on what took place and how it can be avoided in the future.

Here are some of my takeaways from Friday, and the must-have solutions:

  • DNS is still one of the weakest links in our Internet infrastructure and digital economy. We have to keep learning and sharing that knowledge with each other; we have written several articles on DNS ourselves.
  • Relying on a single DNS provider is no longer an option for any company, small or large.
  • DNS vendors should publish knowledge base articles explaining how to add a secondary DNS provider, and those articles must be easy to find and follow.
  • DNS vendors need to make the setup of automatic zone transfers easier to find. Having to open a ticket in the middle of a crisis to learn the IP addresses of the zone-transfer (XFR) name servers is simply not a viable option.
  • DNS vendors should not set high TTLs (two days) on the authoritative nameserver (NS) records they return in DNS responses, and it should be easy to lower or change that TTL. A long TTL spares resolvers from going back to the TLD servers, but keeping the nameserver records cached for two days becomes a headache when you are trying to switch to, or migrate from, a backup solution.
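
The single-provider and TTL points lend themselves to a simple self-audit. The sketch below assumes you have already fetched your zone’s NS records (names and TTLs); the one-hour threshold, the `NSRecord` type, and the “last two labels identify the provider” heuristic are hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class NSRecord:
    nameserver: str  # e.g. "ns1.provider-a.example."
    ttl: int  # seconds


def provider_of(nameserver: str) -> str:
    """Naively treat the last two labels as the provider's domain.

    Assumption for illustration only: this is wrong for suffixes such as
    .co.uk and for providers that spread nameservers over several domains.
    """
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def audit_ns(records: list[NSRecord], max_ttl: int = 3600) -> list[str]:
    """Return human-readable warnings for a zone's NS record set."""
    warnings = []
    providers = {provider_of(r.nameserver) for r in records}
    if len(providers) < 2:
        warnings.append(f"single DNS provider: {sorted(providers)}")
    for r in records:
        if r.ttl > max_ttl:
            # A two-day TTL is 172800 s; resolvers may keep these NS records that long
            warnings.append(
                f"{r.nameserver} TTL {r.ttl}s ({r.ttl / 3600:g} h) exceeds {max_ttl}s"
            )
    return warnings
```

For a zone served only by `provider-a.example` with two-day NS TTLs, `audit_ns` returns the single-provider warning plus one TTL warning per nameserver.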

Introducing another DNS vendor does not achieve the full result until you also go into the Dyn configuration and add the other provider’s nameservers to the mix.
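
The reason both places matter is that resolvers learn a zone’s nameservers from the parent (TLD) delegation and also from the NS records returned by the zone’s own authoritative servers, and many resolvers cache the latter for its full TTL. The minimal sketch below assumes you have already fetched both NS sets and flags a secondary provider that was added in one place but not the other; the function and key names are hypothetical.

```python
def normalize(names: set[str]) -> set[str]:
    """Lowercase nameserver names and drop the trailing root dot."""
    return {n.rstrip(".").lower() for n in names}


def delegation_gaps(parent_ns: set[str], child_ns: set[str]) -> dict[str, set[str]]:
    """Compare NS names at the parent (TLD) with those served by the zone itself."""
    parent, child = normalize(parent_ns), normalize(child_ns)
    return {
        # Listed at the registrar/TLD but not in the zone's own NS records
        "missing_from_child": parent - child,
        # Listed in the zone's own NS records but not delegated by the TLD
        "missing_from_parent": child - parent,
    }
```

A non-empty `missing_from_child` set is the situation described above: the new provider is delegated at the TLD, but the NS records served from Dyn still omit it.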

Some takeaways from a monitoring standpoint:

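I had people tell me, “But Mehdi, I am not seeing a problem in my RUM.” When your site isn’t reachable, real user monitoring (RUM) can’t tell you anything: RUM data is typically collected by code running in your pages, so if the pages never load, there is no user activity to report. This is why your monitoring strategy must include both synthetic monitoring and RUM.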

  • DNS monitoring is critical to understanding the “why.”
  • DNS performance impacts web performance.
  • The impact was so widespread that some sites that didn’t use Dyn still suffered outages or a degraded user experience, because they relied on third parties that did use Dyn.
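
A bare-bones synthetic check illustrates the first two points: because it runs on a schedule whether or not real users arrive, it still produces data when the site is unreachable, and timing DNS separately from the TCP connection shows why a check failed. This is a minimal standard-library sketch, not a full monitoring agent; the names and result fields are hypothetical, and `socket.getaddrinfo` relies on the operating system’s resolver and its timeouts rather than the `timeout` argument.

```python
import socket
import time
from typing import Callable


def synthetic_check(
    host: str,
    port: int = 443,
    timeout: float = 4.0,
    resolve: Callable[..., list] = socket.getaddrinfo,
) -> dict:
    """Time DNS resolution and TCP connect separately; report which stage failed."""
    result = {"host": host, "dns_ms": None, "connect_ms": None, "failed_stage": None}

    # Stage 1: DNS (uses the OS resolver, so its own timeouts apply here)
    start = time.monotonic()
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:
        result["failed_stage"] = f"dns: {exc}"
        return result
    result["dns_ms"] = (time.monotonic() - start) * 1000

    # Stage 2: TCP connect to the first address returned
    family, socktype, proto, _, sockaddr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:
        result["failed_stage"] = f"connect: {exc}"
        return result
    result["connect_ms"] = (time.monotonic() - start) * 1000
    return result
```

During an outage like Friday’s, a check of this shape would report `failed_stage` starting with `dns:` with no connect time at all, which points straight at DNS rather than at the web servers.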

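A significant share of the attack traffic came from Internet-connected devices, such as cameras and DVRs, that had been taken over through their default credentials by Mirai-based botnets. We interact daily with many things that carry some sort of certification (cars, cell phones, planes, hair dryers). I urge whoever is responsible for such devices to consider the following: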

  • A ban on any Internet-connected device that does not force a change of its default credentials the first time it is started. There should be no Admin/Admin on anything, including cameras, refrigerators, access points, and routers.
  • A ban on making such devices accessible from anywhere on the Internet. Access should be limited, either to the provider’s interface or to the local network.
  • Consumers should also pressure the industry by not buying products that aren’t safe. Maybe we need an “Internet Safety Rating” from a governmental agency or worldwide organization.
  • Every home and SMB router and access point must be able to detect abnormal traffic or activity and block or throttle it; a device sending thousands of DNS requests a minute is not normal. We should learn from what Microsoft did with Windows XP to limit infected hosts: Windows XP SP2 capped the number of concurrent half-open outbound TCP connections, which slowed the spread of worms.
  • Local ISPs must be able to detect and stop rogue traffic.
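
The router-level detection suggested above amounts to per-device rate limiting. Below is a minimal sliding-window sketch of the detection half; the class name and the default of 600 DNS queries per 60 seconds are hypothetical illustrative values, not a recommendation, and what to do once a device is flagged (block, throttle, alert the owner) is left out.

```python
from collections import defaultdict, deque


class DNSRateMonitor:
    """Flag LAN hosts whose DNS query rate exceeds a threshold in a sliding window.

    The default of 600 queries per 60 s is an arbitrary illustrative threshold.
    """

    def __init__(self, max_queries: int = 600, window_s: float = 60.0):
        self.max_queries = max_queries
        self.window_s = window_s
        # Per-host timestamps of recent queries, oldest first
        self._events: dict[str, deque] = defaultdict(deque)

    def record(self, host: str, timestamp: float) -> bool:
        """Record one DNS query from host; return True if it should be throttled."""
        events = self._events[host]
        events.append(timestamp)
        # Drop queries that have fallen out of the window
        while events and events[0] <= timestamp - self.window_s:
            events.popleft()
        return len(events) > self.max_queries
```

A compromised camera issuing 20 queries per second is flagged on its 601st query, about 30 seconds in, while a laptop issuing one query every few seconds never comes close to the threshold.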

The state of cybersecurity is dire. I hope this incident serves as a huge wake-up call for everyone. What happened on Friday was a Code Blue event; we rely on the Internet for practically everything in society today, and it’s our job to do everything we can to protect it.

Thank you to Dyn for the prompt responses to our support tickets, to Verisign for answering last-minute questions, to our customers for being so patient and understanding, to our entire support organization, and to some special friends at major companies who lent a helping hand with amazing advice on DNS.

Mehdi – Catchpoint CEO and Co-Founder

To learn more about how you can handle a major outage like this in the future, join our upcoming Ask Me Anything: OUTAGE! with VictorOps, Target, and Release Engineering Approaches.

The post DNS Outage Was Doomsday for the Internet appeared first on Catchpoint's Blog.
