Welcome!

Blog Feed Post

Rails migrations with no downtime

Rails Migrations with no Downtime

This is a guest blog post by Pedro Belo. Pedro worked as a Ruby consultant in Brazil and in the US, until joining Heroku in 2008. When we first read his article from July 2011 we immediately knew it would be perfect to publish on our blog.


Lets understand the challenge of changing a Rails database without introducing any downtime with a simple, apparently harmless migration:

class ApparentlyHarmlessMigration < ActiveRecord::Migration
  def self.up
    remove_column :users, :notes
  end
end

I learned this kind of migration is not really harmless the hard way: during a production deploy. Since the column was not being referred anywhere in the code, I assumed it was a migration that I could just run at will.

But as soon as rake db:migrate was done, errors started to pop up at the logs:

PGError: ERROR: column "notes" does not exist

Turns out that ActiveRecord caches table columns, and uses this cache to build INSERT statements. Even if the code is not touching that column, ActiveRecord will still attempt to set it to NULL when saving models.

I was just starting to realize how delicate database migrations are.

My first reaction was to call for a maintenance window whenever we had a migration to deploy. But that practice quickly became unfeasible as it blocked deploys, left users unhappy and just made it evident that we were doing things wrong.

It was time to understand the problem, and fix it for good.

Hot Compatibility

So here’s the basic principle that allows you to avoid downtime: any migration being deployed should be compatible with the code that is already running.

In order to do so, you’ll usually split your deploy process in two steps:

  1. Make the code compatible with the migration you need to run
  2. Run the migration, and remove any code written specifically for it

Going back to our example: if you want to remove a column, you’ll need to deploy a patch telling ActiveRecord to ignore it first. Only then you can deploy the migration, and clean up that patch.

With that in mind, lets understand what are the different patches that will make your code ready for a migration.

Patterns

Keep in mind some of the patterns I’ll introduce here are for Postgres only.

Read-only models

Most of the issues coming from migrations happen when you’re writing to the database. If you don’t need to save your model make it explicitly read-only:

class Role < ActiveRecord::Base
  def readonly?
    true
  end
end

Tables for read-only models can be modified without any concerns.

Removing columns

Tell ActiveRecord to ignore a column from its cache:

class User
  def self.columns
    super.reject { |c| c.name == "notes" }
  end
end

Once this patch is deployed you can safely remove the specified column.

Renaming columns

There’s no way to rename a column without downtime. You can get the same results, though, by adding a column, migrating the data and then dropping the previous one.

So the first step here is to just add a new column and make sure your code is writing to it. The catch is that at this point you need to read from both columns, adding a fallback to the accessor:

def first_name
  super || attributes["fname"]
end

This will make AR read from first_name if available, and default to fname otherwise. You can then populate the new column, and safely remove the old one (removing this patch as well).

This obviously doesn’t address SQL queries. If you’re currently using the column in any search expression, you’ll need to split the deploy in three steps:

  1. Add a column with the desired name and change your model to write data to both (but don’t change any queries yet)
  2. Populate the new column with data from the previous one, and update queries to refer to the new column name
  3. Delete the old column

NOT NULL constraint

First make sure you’re writing to the column that will be receiving that constraint. For example:

before_save :assign_defaults

def assign_defaults
  self.admin ||= false
end

Then update all existing records that have it set to null, and only then you’re safe to add the constraint.

Creating indexes

Creating indexes on a live system is surprisingly unsafe: ActiveRecord doesn’t create indexes concurrently, so your table will be locked against writes. If you’re writing a lot of data to the table, or if there’s a lot of data to be indexed, you probably want to create it concurrently instead.

The catch is that you can’t create concurrent indexes from a transaction, and all Rails migrations run within one. So you’ll need to resort to a hack and create your index using raw SQL:

class IndexUsersEmails < ActiveRecord::Migration
  def ddl_transaction(&block)
    block.call # do not start a transaction
  end

  def self.up
    execute "CREATE INDEX CONCURRENTLY index_users_on_email ON users(email)"
  end
end

The good news is that Rails will be able to dump that index to a Ruby schema normally (despite the raw SQL).

Cheat sheet

Adding columns – Safe for readonly models – Safe when there are no constraints on the column

Removing columns – Safe for readonly models – Tell AR to ignore the column first

Renaming columns – Not safe – First add a new column, then remove the old one – When the column is used on SQL queries you’ll need to split this in three steps

Creating tables – Safe

Removing tables – Safe

Creating indexes – Safe only for readonly models – Otherwise make sure you create indexes concurrently

Removing indexes – Safe

Future

Running migrations with no downtime takes a lot of planning, and work. But programmers are good exactly at abstracting work and turning repetitive tasks like this into something that can be reused. So it seems like we’ll naturally see a lot of the work described above abstracted on a level below the application.

ActiveRecord, for instance, could be more resilient to migrations. A naive approach to resolve the problem of dropping columns would be to rescue the database exception saying the column doesn’t exist, remove it from the cache and retry. It’s hard to do this in a reliable and clean way, but it seems possible.

Going further, as we move from monolithic applications running on a single server to distributed systems, the need for a database that can elegantly support migrations gets much higher. That’s certainly part of the motivation behind the NoSQL movement – but I’d expect change in relational databases too. Ideally they would adapt to this new ecosystem by providing tools to make our lives easier, like the ability to alias a column.

But enough speculating.

The reality today is that hot compatibility needs to be addressed on the application level, and that’s the best way to avoid maintenance windows or serving errors to your users.

Pedro talking about Zero Downtime Deploys at Rails Conf 2012

Pedro talked about Zero Downtime Deploys at Rails Conf 2012. You can check out his video here:

We want to thank Pedro for letting us republish his original article. If you have any questions or tell us about your experience with Zero Downtime Deploys let us know in the comments! You can also get in touch with Pedro on twitter.

Further Information

We talked about Zero Downtime Deployment in our Workflow Series here. You can also check out our free eBook which has a dedicated chapter on Zero Downtime Deployment.

Read the original blog entry...

More Stories By Manuel Weiss

I am the cofounder of Codeship – a hosted Continuous Integration and Deployment platform for web applications. On the Codeship blog we love to write about Software Testing, Continuos Integration and Deployment. Also check out our weekly screencast series 'Testing Tuesday'!

Latest Stories
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
SYS-CON Events announced today that Conference Guru has been named “Media Sponsor” of the 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organizers to pass great deals to gre...
DevOps is under attack because developers don’t want to mess with infrastructure. They will happily own their code into production, but want to use platforms instead of raw automation. That’s changing the landscape that we understand as DevOps with both architecture concepts (CloudNative) and process redefinition (SRE). Rob Hirschfeld’s recent work in Kubernetes operations has led to the conclusion that containers and related platforms have changed the way we should be thinking about DevOps and...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and shared the must-have mindsets for removing complexity from the develop...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, led attendees through the exciting evolution of the cloud. He looked at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering m...
The next XaaS is CICDaaS. Why? Because CICD saves developers a huge amount of time. CD is an especially great option for projects that require multiple and frequent contributions to be integrated. But… securing CICD best practices is an emerging, essential, yet little understood practice for DevOps teams and their Cloud Service Providers. The only way to get CICD to work in a highly secure environment takes collaboration, patience and persistence. Building CICD in the cloud requires rigorous ar...
Companies are harnessing data in ways we once associated with science fiction. Analysts have access to a plethora of visualization and reporting tools, but considering the vast amount of data businesses collect and limitations of CPUs, end users are forced to design their structures and systems with limitations. Until now. As the cloud toolkit to analyze data has evolved, GPUs have stepped in to massively parallel SQL, visualization and machine learning.
"Evatronix provides design services to companies that need to integrate the IoT technology in their products but they don't necessarily have the expertise, knowledge and design team to do so," explained Adam Morawiec, VP of Business Development at Evatronix, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices to ...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.
"ZeroStack is a startup in Silicon Valley. We're solving a very interesting problem around bringing public cloud convenience with private cloud control for enterprises and mid-size companies," explained Kamesh Pemmaraju, VP of Product Management at ZeroStack, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Large industrial manufacturing organizations are adopting the agile principles of cloud software companies. The industrial manufacturing development process has not scaled over time. Now that design CAD teams are geographically distributed, centralizing their work is key. With large multi-gigabyte projects, outdated tools have stifled industrial team agility, time-to-market milestones, and impacted P&L stakeholders.
"Akvelon is a software development company and we also provide consultancy services to folks who are looking to scale or accelerate their engineering roadmaps," explained Jeremiah Mothersell, Marketing Manager at Akvelon, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Enterprises are adopting Kubernetes to accelerate the development and the delivery of cloud-native applications. However, sharing a Kubernetes cluster between members of the same team can be challenging. And, sharing clusters across multiple teams is even harder. Kubernetes offers several constructs to help implement segmentation and isolation. However, these primitives can be complex to understand and apply. As a result, it’s becoming common for enterprises to end up with several clusters. Thi...
"Infoblox does DNS, DHCP and IP address management for not only enterprise networks but cloud networks as well. Customers are looking for a single platform that can extend not only in their private enterprise environment but private cloud, public cloud, tracking all the IP space and everything that is going on in that environment," explained Steve Salo, Principal Systems Engineer at Infoblox, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventio...