Blog Feed Post

Choosing the Right Tools for Effective DevOps

Becoming more operationally mature isn’t something that happens overnight after implementing some new tools. At its core, it’s a concerted effort in shifting culture so that people can break down communication silos to ship better software. And while tools alone are not enough, they do provide a crucial advantage by applying automation that improves speed and accuracy, and can facilitate collaboration.

The DevOps tools that make sense for your environment may vary significantly depending on your team’s size and specific needs. There’s no right answer to how you should build your toolchain. In fact, we’ll be the first to admit that some of the best workflows and tools we learn from and try to emulate are built in-house by innovative teams from Netflix, Etsy, Dropbox, and others.

The goal of this list is simply to share a few of the most popular tools across each stage of the software delivery lifecycle, some of which we also use internally. We hope that you find it a useful reference as you evaluate the right tools for your own team.

https://www.pagerduty.com/wp-content/uploads/2017/07/devops_loop-300x140... 300w, https://www.pagerduty.com/wp-content/uploads/2017/07/devops_loop-1024x47... 1024w, https://www.pagerduty.com/wp-content/uploads/2017/07/devops_loop-250x117... 250w, https://www.pagerduty.com/wp-content/uploads/2017/07/devops_loop-180x84.png 180w" sizes="(max-width: 699px) 100vw, 699px" />


How and what to build

The waterfall approach of planning out all the work and every single dependency for a release runs counter to DevOps. Instead, with an agile approach, you can reduce risk and improve visibility by building and testing software in smaller chunks and iterating on customer feedback. That’s why many teams, including our own, plan and release in sprints of around two to four weeks.

As your team is sharing ideas and planning at the start of each sprint, take into account feedback from the previous sprint’s retrospective so the team and your services are continuously improving. Tools can help you centralize learnings and ideas into a plan. To kick off the brainstorming process, you can identify your target personas, and map customer research data, feature requests, and bug reports to each of them. We like to use physical or digital sticky notes during this process to make it easier to group together common themes and construct user stories for the backlog. By mapping user stories back to specific personas, we can place more of an emphasis on the customer value delivered. After the ideation phase, we organize and prioritize the tasks within a tool that tracks project burndown and daily sprint progress.

Top tools and ones we use include: Active Collab, Pivotal Tracker, VersionOne, Jira, Trello, StoriesOnBoard


After you’ve written some code, there’s a few things that need to happen before getting it into staging. Get it code reviewed and approved, merge it into the master branch in a version control repository, and run any local tests as needed.

Top tools: Github, Bitbucket, Gerrit, GitLab

Test and build

Now it’s time to automate the execution of tasks related to building, testing, and releasing code. Before the build can get deployed, it needs to undergo a number of tests to ensure that it’s safe to push to production: unit tests, integration tests, functional tests, acceptance tests, and more. Tests are a great way to ensure that existing pieces of your codebase continue to function as expected when new changes are introduced. It’s important to have tests that run automatically whenever there’s a new pull request. This minimizes errors that escape because of manual oversight, reduces the cost of performing reliable tests, and exposes bugs earlier.

There are also a number of great open source and paid tools that do useful things once the tests are complete, like automatically picking up changes to the master and pulling down dependencies from a repository to build new packages.

Top tools include: Jenkins, GoCD, Maven, CruiseControl, TravisCI, CircleCI

Containers and Schedulers

With the advent of Docker and containers, teams can now easily provision lightweight, consistent, and disposable staging and development environments without needing to spin up new virtualized operating systems.

Containers standardize how you package your application, improving storage capacity and flexibility, and making it easier to make changes faster. This also enables your application to run anywhere. In other words, things will magically behave in production exactly as they did when you made the changes on your laptop.

Top tools include: Docker, Kubernetes, Mesos, Nomad

Configuration Management

With configuration management, you can track changes to your infrastructure and maintain a single source of system configuration. Look for a tool that makes it easy to version control and make replicas of images — i.e. anything you can take a snapshot of like a system, cloud instance, or container. The goal here is to ensure standardized environments and consistent product performance. Configuration management also helps you better identify issues that resulted from changes, and simplifies autoscaling by automatically reproducing existing servers when more capacity is needed.

Top tools include: Chef, Ansible, Puppet, SaltStack


Release automation

Release automation tools enable you to automatically deploy to production. They should include capabilities such as automated rollbacks, copying artifacts to the host before starting the deployment, and especially if you’re a larger organization, agentless architecture to easily install agents and configure firewalls at scale to your server instances.

Note that if something passes the tests, it typically automatically gets deployed. One best practice is to perform a canary deployment first that deploys to a subset of your infrastructure, and if there are no errors, then do a fleet wide deploy.

A lot of teams also use chat-based deployment workflows, using bots to deploy with simple commands, so everyone can easily see deployment activity and learn together.

Top tools include: Bamboo, Puppet

Monitor the deployment

It can be really helpful to have release dashboards and monitors set up that help you visualize high-level release progress and status of requirements. It’s also key to understand whether services are healthy and if there are any anomalies before, during, and after a deploy. Make sure you are notified in real time on key events that take place on your continuous integration servers so you know if there’s a failed build, or know to hold or roll back on a deploy.

Top tools include: Datadog, Elastic Stack, PagerDuty


Server monitoring

Server monitoring gives you an infrastructure-level view. A lot of teams also use log aggregation to drill down into specific issues. This type of monitoring enables you to aggregate metrics (such as memory, CPU, system load averages, etc.) and understand the health of your servers so that you can take action on issues, ideally before applications — and the customers that use them — are affected.

Top tools include: Datadog, AWS Cloudwatch, Splunk, Nagios, Pingdom, Solarwinds, Sensu

Application performance monitoring

Application performance monitoring provides code-level visibility of the performance and availability of applications and business services, such as your website. This makes it easy to quickly understand performance metrics and meet service SLA’s.

Top tools include: New Relic, Dynatrace, AppDynamics

Respond and Learn

Incident resolution

Monitoring tools provide a lot of rich data, but that data isn’t useful if it isn’t routed to the right people who can take the right actions on an issue in real time. To minimize downtime, people must be notified with the right information when issues are detected, have well-defined processes around triage and prioritization, and be enabled to engage in efficient collaboration and resolution.

When application and performance issues now often cost thousands of dollars a minute, orchestrating the right response is often highly stressful, but it can’t afford to be chaotic. In the middle of a fire, you don’t want to waste half an hour pulling up a contact directory and trying to figure out how to get the right people on a conference bridge.

The good news is, PagerDuty automates the end-to-end incident response process to shave time off of resolving both major customer-impacting incidents or daily operational issues. Here at PagerDuty, everyone from our engineering teams, support teams, security teams, executives, and more, uses our product to orchestrate coordinated response to IT and business disruptions. We have the flexibility to manage on-call resources, suppress what’s not actionable, consolidate related context, mobilize the right people and business stakeholders, and collaborate with our preferred tools. If you can easily architect exactly what you want your wartime response to look like, you’ll have a lot more peace of mind.

Tools we use: PagerDuty, HipChat, Slack, Conferencing tools

Learn and improve

When wartime is over, incidents provide a crucial learning opportunity to understand how to improve processes and systems to be more resilient. In accordance with the CAMS pillars of DevOps (Culture, Automation, Measurement, Sharing), it’s important to understand incident response and system performance metrics, and facilitate open dialogue to share successes and failures towards the goal of continuous improvement.

Look for a solution that enables you to streamline post mortems and post mortem analysis for the purpose of prioritizing action items regarding what needs to be fixed. You’ll want to measure the success of a service relative to business goals and customer experience metrics, with tools that help you understand product usage and customer feedback. All of these will feed into the next sprint so that you can accurately plan and prioritize both system and feature improvements — for even better product, and happier customers.

Tools we use: PagerDuty Postmortems, Looker, Pendo, SurveyMonkey

Again, simply investing in tools will not get you from a monolithic to a microservices architecture, or magically result in teams that can perform self-service deployments many times a day. But by bolstering a culture shift with the right tools and processes, you’ll be well on your way to optimizing and continuously improving software delivery and enabling seamless collaboration and trust between everyone responsible for it.

With that, we wish you success in exploring and finding the right tools for you! And check out these resources if you’re interested in learning more about which tools we use internally to maximize inclusivity, and how to accelerate DevOps best practices.

The post Choosing the Right Tools for Effective DevOps appeared first on PagerDuty.

Read the original blog entry...

More Stories By PagerDuty Blog

PagerDuty’s operations performance platform helps companies increase reliability. By connecting people, systems and data in a single view, PagerDuty delivers visibility and actionable intelligence across global operations for effective incident resolution management. PagerDuty has over 100 platform partners, and is trusted by Fortune 500 companies and startups alike, including Microsoft, National Instruments, Electronic Arts, Adobe, Rackspace, Etsy, Square and Github.

Latest Stories
With tough new regulations coming to Europe on data privacy in May 2018, Calligo will explain why in reality the effect is global and transforms how you consider critical data. EU GDPR fundamentally rewrites the rules for cloud, Big Data and IoT. In his session at 21st Cloud Expo, Adam Ryan, Vice President and General Manager EMEA at Calligo, will examine the regulations and provide insight on how it affects technology, challenges the established rules and will usher in new levels of diligence a...
Existing Big Data solutions are mainly focused on the discovery and analysis of data. The solutions are scalable and highly available but tedious when swapping in and swapping out occurs in disarray and thrashing takes place. The resolution for thrashing through machine learning algorithms and support nomenclature is through simple techniques. Organizations that have been collecting large customer data are increasingly seeing the need to use the data for swapping in and out and thrashing occurs ...
When you focus on a journey from up-close, you look at your own technical and cultural history and how you changed it for the benefit of the customer. This was our starting point: too many integration issues, 13 SWP days and very long cycles. It was evident that in this fast-paced industry we could no longer afford this reality. We needed something that would take us beyond reducing the development lifecycles, CI and Agile methodologies. We made a fundamental difference, even changed our culture...
yperConvergence came to market with the objective of being simple, flexible and to help drive down operating expenses. It reduced the footprint by bundling the compute/storage/network into one box. This brought a new set of challenges as the HyperConverged vendors are very focused on their own proprietary building blocks. If you want to scale in a certain way, let’s say you identified a need for more storage and want to add a device that is not sold by the HyperConverged vendor, forget about it....
As many know, the first generation of Cloud Management Platform (CMP) solutions were designed for managing virtual infrastructure (IaaS) and traditional applications. But that’s no longer enough to satisfy evolving and complex business requirements. In his session at 21st Cloud Expo, Scott Davis, Embotics CTO, will explore how next-generation CMPs ensure organizations can manage cloud-native and microservice-based application architectures, while also facilitating agile DevOps methodology. He wi...
In the enterprise today, connected IoT devices are everywhere – both inside and outside corporate environments. The need to identify, manage, control and secure a quickly growing web of connections and outside devices is making the already challenging task of security even more important, and onerous. In his session at @ThingsExpo, Rich Boyer, CISO and Chief Architect for Security at NTT i3, discussed new ways of thinking and the approaches needed to address the emerging challenges of security i...
Docker containers have brought great opportunities to shorten the deployment process through continuous integration and the delivery of applications and microservices. This applies equally to enterprise data centers as well as the cloud. In his session at 20th Cloud Expo, Jari Kolehmainen, founder and CTO of Kontena, discussed solutions and benefits of a deeply integrated deployment pipeline using technologies such as container management platforms, Docker containers, and the drone.io Cl tool. H...
Cloud adoption is often driven by a desire to increase efficiency, boost agility and save money. All too often, however, the reality involves unpredictable cost spikes and lack of oversight due to resource limitations. In his session at 20th Cloud Expo, Joe Kinsella, CTO and Founder of CloudHealth Technologies, tackled the question: “How do you build a fully optimized cloud?” He will examine: Why TCO is critical to achieving cloud success – and why attendees should be thinking holistically ab...
The question before companies today is not whether to become intelligent, it’s a question of how and how fast. The key is to adopt and deploy an intelligent application strategy while simultaneously preparing to scale that intelligence. In her session at 21st Cloud Expo, Sangeeta Chakraborty, Chief Customer Officer at Ayasdi, will provide a tactical framework to become a truly intelligent enterprise, including how to identify the right applications for AI, how to build a Center of Excellence to...
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...
Blockchain is a shared, secure record of exchange that establishes trust, accountability and transparency across business networks. Supported by the Linux Foundation's open source, open-standards based Hyperledger Project, Blockchain has the potential to improve regulatory compliance, reduce cost as well as advance trade. Are you curious about how Blockchain is built for business? In her session at 21st Cloud Expo, René Bostic, Technical VP of the IBM Cloud Unit in North America, will discuss th...
An increasing number of companies are creating products that combine data with analytical capabilities. Running interactive queries on Big Data requires complex architectures to store and query data effectively, typically involving data streams, an choosing efficient file format/database and multiple independent systems that are tied together through custom-engineered pipelines. In his session at @BigDataExpo at @ThingsExpo, Tomer Levi, a senior software engineer at Intel’s Advanced Analytics ...
SYS-CON Events announced today that Datera will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera offers a radically new approach to data management, where innovative software makes data infrastructure invisible, elastic and able to perform at the highest level. It eliminates hardware lock-in and gives IT organizations the choice to source x86 server nodes, with business model option...
"Cloud computing is certainly changing how people consume storage, how they use it, and what they use it for. It's also making people rethink how they architect their environment," stated Brad Winett, Senior Technologist for DDN Storage, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
With 10 simultaneous tracks, keynotes, general sessions and targeted breakout classes, Cloud Expo and @ThingsExpo are two of the most important technology events of the year. Since its launch over eight years ago, Cloud Expo and @ThingsExpo have presented a rock star faculty as well as showcased hundreds of sponsors and exhibitors! In this blog post, I provide 7 tips on how, as part of our world-class faculty, you can deliver one of the most popular sessions at our events. But before reading the...