Scale Yourself Before You Wreck Yourself

Achieving scale — that is, the ability to meet application demand at any level — is essential if you want your business and user base to grow, or if you hope to be able to handle the vicissitudes of modern software deployment.

Yet scaling is no easy feat. Most legacy applications struggle to support thousands of users. An unexpected traffic spike will simply knock over an application not designed for it, and countless customers and dollars are lost while the ITOps team struggles to spin up VMs or rack-and-stack servers to handle the load.

And even if you run your app in the cloud, scalability is not guaranteed. A poorly designed cloud app will experience bottlenecks that render it unusable.

Given the massive costs of suboptimal digital services on productivity, lost opportunities, and more, scalability is mission-critical to any organization today. And it is possible! It requires implementing the right tools and processes, the right team, and the right communication lines between that team. Below, I explain how to achieve scalability in order to avoid derailing your software and organization.

The Principles of Scalability

When preparing for scalability, flexibility and the ability to match deployed infrastructure to the load are key. Matching infrastructure to load can also drive cost efficiencies in deploying an application. Understanding your traffic patterns, average usage, and the standard deviation will help you properly size your environment, and planning to scale rapidly for an exceedingly rare (but possible) event can save a lot of headaches when the application goes viral. If an application is deployed regionally, idle cycles can often be found in the middle of the night. Weekday load and weekend load can vary significantly. Many businesses are seasonal, and usage of the application can be slightly or dramatically lower from one time of year to the next.
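As a rough illustration of sizing from average usage and standard deviation, the sketch below provisions for the mean load plus a configurable number of standard deviations. The function name, the headroom factor k, and the per-instance capacity are all assumptions made for this example; real sizing should come from your own measurements.

```python
import math
import statistics


def instances_needed(requests_per_sec, per_instance_capacity, k=2.0, minimum=1):
    """Return an instance count sized for mean load plus k standard deviations.

    requests_per_sec: observed load samples (hypothetical data).
    per_instance_capacity: load one instance can serve (an assumption you measure).
    k: headroom in standard deviations; 2.0 is an arbitrary starting point.
    """
    mean = statistics.mean(requests_per_sec)
    stdev = statistics.pstdev(requests_per_sec)
    target_load = mean + k * stdev
    return max(minimum, math.ceil(target_load / per_instance_capacity))


# Hypothetical weekday and overnight samples, in requests per second.
weekday = [100, 120, 140, 120]
overnight = [10, 12, 8, 10]

print(instances_needed(weekday, per_instance_capacity=50))
print(instances_needed(overnight, per_instance_capacity=50))
```

Running the same calculation separately for weekday, weekend, overnight, and seasonal samples shows where idle capacity can be released and where extra headroom is worth paying for.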

Scale also involves ensuring the reproducibility of your artifacts, which in turn forces consistency in production deployments. The service artifacts can then be scaled independently as application needs change and grow. This method requires a strong understanding of DevOps, with a durable continuous integration and continuous deployment pipeline at its core.
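One simple way to check reproducibility is to compare content digests of the artifacts produced by two builds of the same source. The sketch below is a minimal illustration using SHA-256; the function names and the sample byte strings are hypothetical, and a real pipeline would read the built files from disk or a registry.

```python
import hashlib


def artifact_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def same_artifact(build_a: bytes, build_b: bytes) -> bool:
    """True when two builds produced byte-identical artifacts."""
    return artifact_digest(build_a) == artifact_digest(build_b)


# Hypothetical build outputs standing in for real artifact files.
first_build = b"service-binary-contents"
second_build = b"service-binary-contents"

print(same_artifact(first_build, second_build))
```

If the digests differ for the same source revision, something outside version control (a timestamp, an unpinned dependency, a manual step) is leaking into the build.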

Tools and Processes that Enable Scalability

First and foremost, application source code needs to be checked into a version control system. Rather than building a bespoke server stack around this well-structured output, you need to transform the server stack itself into code as well. It can be a painful process at first, but the only way to scale an infrastructure consistently, every time, is to stop relying on an ITOps staff member clicking the “next” button or typing commands into the console on every server deployed to dev, test, and production.

Once your infrastructure and code are both well-defined, you can write integration tests to ensure they function as they should in a fully built environment. To take this approach to the next level of sophistication, you can use containers as infrastructure building blocks. Those blocks then have consistent “downward”-facing hooks to the infrastructure. A cloud container management platform, combined with manifest files that describe how the services fit together and how they should scale, turns these consistent artifacts into a highly resilient and scalable application.
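To make the idea of a manifest that describes how a service should scale more concrete, here is a minimal sketch. The ServiceManifest fields and the desired_replicas function are hypothetical and do not follow the format of any particular container platform; they only show how a declared scaling rule can be turned into a replica count.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceManifest:
    """Hypothetical declarative description of one service."""
    name: str
    image: str
    min_replicas: int
    max_replicas: int
    target_load_per_replica: float  # e.g. requests/sec each replica should handle


def desired_replicas(manifest: ServiceManifest, observed_load: float) -> int:
    """Compute the replica count for the observed load, clamped to the manifest bounds."""
    wanted = math.ceil(observed_load / manifest.target_load_per_replica)
    return max(manifest.min_replicas, min(manifest.max_replicas, wanted))


# Illustrative values only; the image name is a placeholder.
web = ServiceManifest(
    name="web",
    image="example/web:1.0",
    min_replicas=2,
    max_replicas=10,
    target_load_per_replica=100.0,
)

for load in (50, 450, 5000):
    print(load, desired_replicas(web, load))
```

Because the bounds and target live in the manifest rather than in someone's head, the same rule applies in dev, test, and production, and a change to it goes through version control like any other code change.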

Coordinating Your Team

The often missed essential ingredient for scalability is a team that maps well to the technological topology described above. Such a team includes three main groups (note: naming conventions for titles and division of responsibilities can vary across organizations):

  • Developers — feeding application code into the pipeline
  • ITOps Teams — responsible for writing Infrastructure-as-Code to build images to contain and run the application code
  • Site Reliability Engineers (SREs) — work with the ITOps team to monitor and optimize the running application in production, designing and building the environment for scale and reliability

The trick to coordinating your team in such a way as to maximize scale is to have SREs focus on reliability efforts by leveraging the Infrastructure-as-Code that their team members have written, rather than spending time on manual configuration. This makes for a different type of team arrangement than a legacy team structure, in which application code is simply “thrown over the wall” by developers to the ITOps team to deploy and run. The legacy model is a highly manual environment and is prone to error.

To complement this greater infrastructure visibility, engineering teams can implement a greater degree of application trace logging to help discover issues more quickly. A more highly instrumented application also pays off at release time: canary releases can be deployed quickly to a subset of the application’s user base, letting the team test new features and find bugs without affecting the wider user base. Canary releases also let you roll out new features gradually, reducing the likelihood of incident spikes during rollout.
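One common way to pick the subset of users for a canary release is deterministic hash bucketing, so that a given user consistently sees either the canary or the stable version. The sketch below assumes string user IDs; the in_canary function, the salt value, and the 5 percent rollout are hypothetical choices for illustration.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "release-1") -> bool:
    """Return True if this user falls inside the canary percentage.

    The salt (a hypothetical per-release label) reshuffles which users are
    selected from one release to the next.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100


# Route roughly 5% of hypothetical users to the canary.
users = [f"user-{i}" for i in range(10_000)]
canary_users = [u for u in users if in_canary(u, percent=5)]
print(len(canary_users))
```

Raising the percentage in steps widens the rollout without moving anyone who is already on the canary back to the stable version, since every bucket below the old threshold is also below the new one.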

Communication is Key

Last but not least, remember how important communication is. It should go without saying that even the best-structured team will not succeed in enabling scalability unless team members can communicate seamlessly with each other.

Effective communication requires not only tools that can automate communication tasks, but also a commitment to ensuring that everyone on your team “speaks the same language,” meaning that developers, ITOps, and SREs can all talk to one another in a mutually intelligible way because they understand each other’s roles and needs.

It can be intimidating to take the first steps down the path to application scale. People, processes, and technology all need to change to move from a Waterfall method to DevOps, and to evolve legacy infrastructure management practices into modern ITOps and reliability engineering.

Much in the same way that the agile development revolution added value on a quicker timeline, each step in the scaling journey brings value that can be realized immediately.

The post Scale Yourself Before You Wreck Yourself appeared first on PagerDuty.

More Stories By PagerDuty Blog

PagerDuty’s operations performance platform helps companies increase reliability. By connecting people, systems and data in a single view, PagerDuty delivers visibility and actionable intelligence across global operations for effective incident resolution management. PagerDuty has over 100 platform partners, and is trusted by Fortune 500 companies and startups alike, including Microsoft, National Instruments, Electronic Arts, Adobe, Rackspace, Etsy, Square and Github.
