1. Know your basic tools

  • bin/rails db:migrate:status, your go-to "where am I" command. What has ran, what hasn't.

  • bin/rails db:migrate, run all migrations that haven't been ran yet.

  • bin/rails db:rollback, find the one latest migration that has been ran and reverse it.

  • bin/rails db:migrate:up VERSION=12345678, find migration by timestamp and run it.

  • bin/rails db:migrate:down VERSION=12345678, find migration by timestamp and reverse it.

  • bin/rails db:migrate:redo, rollback the latest migration, then bring it back up again.

2. Change data via raw SQL in isolation from the rest of your app

When backfilling or transforming data in a migration, rather than referencing any Active Record objects, opt to write them in raw SQL instead. If for no other reason, it's good for the brain to remember how to do it without the convenience of an ORM.

But the real reasons are that raw SQL executes much faster, leading to reduced downtime or error rates on deploy, and that references to app code that is changed/moved will cause surprises and confusion. Because migrations run in many different environments at many different times, the odds are that if someone changes a constant name, for example, that could cause an incoming migration that references it to break.

If you absolutely must use Active Record, define whatever model you need within the migration. That way, it remains separate from your app code but you still have the ability to work with data in Ruby. In my experience this very rarely happens.

class DoThisAndThat < ActiveRecord::Migration[123]
  MyObject = Class.new(ActiveRecord::Base) # Barebones model bound to table `my_objects`

  def up
    MyObject.where(foo: :bar)
  end

  # ...
end

As a bonus, call the SQL command-specific method you are writing, and you'll get extra informational output.

  • execute for select
  • insert, update, and delete

3. Fully test a rollback and understand how it affects the data

Consider whether your migration will roll back cleanly. Does it transform the data back to fit the original column configuration? Is there any case where it would destroy data?

With highly complex data changes, the reality is that often we don't take the extra time to perfectly transform the data back to how it was, especially considering you more than likely won't even run it. Kinda seems like wasted effort. Since your teammates will run your migration on all of their machines on very different branches in Git, they are, in a way, your customer. At least put the old columns back in their original configuration to provide flexibility when switching between branches, even if the end result is destroyed data, which is fine in the development environment.

And so, to truly test a migration before calling it done:

  1. Implement the migration, successfully migrate it against a fresh production dump, and commit it and schema.rb.
  2. Roll it back and forward, checking there are no further changes to schema.rb. If there is a diff, you need to re-commit.

At that point, you can be reasonably sure both directions behave as you expect, and that generated files are up-to-date.

4. Deprecate rather than drop columns for risky changes

To dovetail with a rollback strategy-- big complex migrations are scary, humans are imperfect, and business requirements change often. When you have unpredictability in a data change, consider adding _deprecated to the end of now-unused column names rather than removing them.

This not only gives you a sure-fire rollback plan but also affords you flexibility in any follow-up migrations. It can also help you keep test data, saving you from having to re-import production data when testing locally. One caveat is that you have to remember to circle back and clean up the deprecated columns once you're sure everything is working properly.

5. Know if your database supports transactional DDL commands

DDL is the subset of SQL that encompasses altering the columns within your tables. Postgres supports transactional schema changes, MariaDB (MySQL) does not. And if you're not using either of those, take a read of this article that has a full list of database software.

This is pretty important to know when you start building migrations. If your DB software does not support this, you have to take extra consideration when executing the migration and something blows up. You can't just tweak your code and rerun the migration because any previous operation will run again, raising a different error about columns already added or removed.

6. Leverage comments or explicit raises when dev cycling

If you're on MariaDB/MySQL or a DB that doesn't support transactional DDL commands, use comments to maintain your current position when debugging through a migration. That is, if operation #1 succeeds but operation #2 fails, comment out operation #1 and rerun the migration. Then, that allows you dev cycle against operation #2.

class MyCoolMigration < ActiveRecord::Migration[5.1]
  def change
    # change_table :foobars do |t|
    #   t.references :baz, polymorphic: true, index: true
    #   t.integer :cool_number, null: true
    #   t.boolean :awesome, null: false, default: false
    # end

    # change_column_default :foobars, :this_thing, '123'

    reversible do |dir|
      dir.up do
        update "update foobars set blablabla complex SQL" # Troubleshooting this bit..
      end

      # ...
    end
  end
end

If you keep hitting errors, rinse and repeat. You can leave a trail of comments until you are through your up method, then uncomment it all. You can use the same technique to develop your down method.

Conversely, if your DB software supports running schema changes in a transaction, your life is easier. Try out a fail call at the end of either direction when building it. That will raise an error before your migration has a chance to complete, meaning you can keep running db:migrate or db:rollback without having to reset the data at the end of each dev cycle. That better allows you to focus when working out a specific issue with either direction.

7. Take advantage of automatically reversible migration methods

Active Record gives us quite a head start with its knowledge of how to reverse certain operations, but undoubtedly you'll hit an IrreversibleMigration error at some point. Before falling back to pre-Rails 4 up/down methods, break out a reversible block instead:

def change
  create_table :things do |t|
    # ...
  end

  reversible do |dir|
    dir.up do
      # ...
    end

    dir.down do
      # ...
    end
  end

  change_column_null :things, :foo, false
end

This lets us take advantage of auto-reversible migration methods while implementing a rollback strategy side-by-side. This reads beautifully for data transformation, for instance, as reversible's direction blocks are a great place for raw SQL.

Be aware that passing an extra argument or two makes many single-column migration methods reversible. For example:

  • remove_column :table, :column, :datatype instead of remove_column :table, :column

  • rename_column :table, :column, from: :old, to: :new instead of rename_column :table, :column, :newname

8. Your schema.rb or structure.sql file is your reality check

Take special note of your "schema blueprint" as I like to call it. If you are unfamiliar, running Rails migrations produces the final result of their changes, becoming the definition of your current database structure in its entirety. Every table, column, and attributes of the column is there. It has to be exhaustive, because you should be able to execute the blueprint in any environment to create an exact replica of what's in production, minus the data of course.

So once you put the polish on a migration and are ready to commit, inspect the blueprint's git diff closely to double-check the changes you made. This helps a lot because it not only shows your changes but also the context they were made in, prompting you to consider further ones. For example, you might notice another similar column when locking down null values that should also be null: false.

Getting in this habit will also prevent having to add follow-up commits when you forget to check in the blueprint or an incorrect one.

Note that it's also regenerated on running or rolling back any migration in any environment. If you perform db commands on your test environment, for instance, that will change the blueprint to reflect the state of your test database. Most of the time the two environments should be in sync anyway, but if you see any unexpected output, remember to check both.

Lastly, if your app uses structure.sql, note that sometimes you'll find tiny one-off changes in your diff. This happens due to slight differences in database versions between environments. For example, I notice some versions of Postgres on some OSes surround parts of more complex indices with parenthesis, while others don't. While somewhat annoying, sometimes you have to throw away these minuscule changes via a file-specific git checkout.

9. Be cognisant of migration file changes over multiple commits

You can add simpler migrations to your codebase in 1-2 commits, but otherwise you'll find yourself making the changes over the span of many more. There's nothing wrong with this, in fact they often reflect the progression of different ideas and may each have future value.

Just remember that more commits means more busywork when combining your DB changes with incoming ones via a rebase. Each commit that changes the blueprint would be conflicted, sticking you with conflict resolution over and over again.

Use this to inform your strategy and consider merging rather than rebasing when introducing more complex schema changes. Otherwise, rebasing works with simpler ones.

More blog posts