On the need to separate data migrations from schema migrations

Freelancing with Rails, I keep running into the same issue when I join an existing project: outdated migrations that fail dozens of times during rake db:migrate.

To fix those, I need to do very hacky things, like commenting out the failing lines, and also the lines whose schema changes have already been applied, which is very error-prone. Anyway, that’s not the way it should be.

Should I use rake db:schema:load instead? That’s not what I have read to be good practice.

After some thinking, I came up with the simple idea of separating schema migrations from data migrations.

Why?

Because it’s always the data migrations that fail.

E.g.:

class AddStatusToUser < ActiveRecord::Migration
  def up
    add_column :users, :status, :string
    User.find_each do |user|
      user.status = 'active'
      user.save!
    end
  end

  def down
    remove_column :users, :status
  end
end

If you later remove or rename the User class, the migration is broken.

After some searching, I found this very nice article that summarizes all the pain points that come from not separating data migrations and schema migrations: "Change data in migrations like a boss" (WideFix). There, Andrey Koleshko suggests the following syntax:

class CreateUsers < ActiveRecord::Migration
  def change
    # Database schema changes as usual
  end

  def data
    User.create!(name: 'Andrey')
  end
end

Instead, I would suggest having two types of migrations: normal migrations, which remain unchanged, and data migrations, which would work the following way:

class CreateDefaultUser < ActiveRecord::DataMigration # Inherits from a different class: DataMigration
  def up
    User.create!(name: 'Andrey')
  end

  def down
    User.find_by_name('Andrey').destroy
  end
end

I reckon it makes sense to put them in different files, as they are not the same type of database modification.

They would be created by calling:
 rails generate migration --data CreateDefaultUser

They would behave slightly differently:

- Ignored when running something like:
 rake db:migrate --no-data # or
 rake db:setup --migrations

- or at least non-blocking (rescuing exception as a warning)
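
The "non-blocking" variant could behave roughly like the sketch below. It is plain Ruby with no Rails dependency, and nothing in it is an existing Rails API: run_data_migrations and the lambda-based migrations are hypothetical names used only to illustrate rescuing a failure and reporting it as a warning.

```ruby
# Hypothetical sketch: run each data migration in turn and turn failures into
# warnings instead of aborting the whole run. Not an existing Rails API.
def run_data_migrations(migrations, out: $stderr)
  failed = []
  migrations.each do |name, migration|
    begin
      migration.call
    rescue StandardError => e
      # Report the failure and carry on with the next data migration.
      out.puts "WARNING: data migration #{name} failed: #{e.class}: #{e.message}"
      failed << name
    end
  end
  failed
end

created = []
migrations = {
  "CreateDefaultUser" => -> { created << "Andrey" },
  # Simulates a migration that references a model which no longer exists.
  "BackfillLegacyFlag" => -> { raise NameError, "uninitialized constant LegacyUser" },
  "CreateSecondUser" => -> { created << "Augustin" }
}

failed = run_data_migrations(migrations)
```

The broken migration is reported and collected in failed, while the migrations before and after it still run.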

The cool thing about this is that there is no backward compatibility issue here.

You could say that this is fine: there is already a gem that does this, and those who need it can use it. But that doesn't hold, as most developers would find out about the gem far too late, when all their broken migrations are already there. Plus, if we call it a "best practice", it should not be optional.

If this makes sense, I would definitely look into how to write such a change.

Thanks for your consideration.

Augustin-

I already solved this problem with a gem I wrote. Instructions here: https://github.com/jasonfb/nondestructive_migrations

As for whether or not it should be in Rails core, I cannot speak to that.

Generally speaking, on large Rails apps most developers never run the schema migrations from the beginning of time. This is true of most of the larger apps I’ve worked on. Unless you tirelessly backport fixes to old migrations, old migrations will always start failing as Rails gets upgraded (simply from deprecated syntax). You basically have two options: (1) always work to keep your migrations current over the lifetime of the app (you really can’t do that as a freelancer; it needs to come from the team lead), or (2) forget about running the schema migrations from the dawn of time and just import your production data.

Most larger apps eventually switch over to strategy #2 described above.

In terms of your schema migration/data migration split: yes, the benefits you describe are inherent. Also, you could strategically choose to keep the schema migrations runnable from the beginning of time but not the data migrations.

As well, I use data migrations to speed up deployment. In particular, running background data migrations while the app is live helps me reduce hours of downtime to minutes. (My last enormous data migration took 11 hours.)

-Jason

Jason, how do you avoid the same data migration problems in your gem? If models you reference in a migration change, the migration would break, wouldn’t it?

Augustin, I find it best to not reference any app models but rather define new classes for the models I need inside the migration.
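
A minimal sketch of that approach, reusing the AddStatusToUser example from earlier in the thread. This is a fragment meant for db/migrate, not a standalone script. The nested class name MigrationUser is an assumption chosen for illustration, and on Rails 5 and later the superclass needs a version suffix such as ActiveRecord::Migration[5.0].

```ruby
class AddStatusToUser < ActiveRecord::Migration
  # Migration-local model (hypothetical name): it depends only on the users
  # table, not on app/models/user.rb, so renaming or deleting the app's User
  # class does not break this migration.
  class MigrationUser < ActiveRecord::Base
    self.table_name = 'users'
  end

  def up
    add_column :users, :status, :string
    # Make sure the local model sees the column that was just added.
    MigrationUser.reset_column_information
    # Single UPDATE statement; skips validations and callbacks.
    MigrationUser.update_all(status: 'active')
  end

  def down
    remove_column :users, :status
  end
end
```

Because the local model carries none of the app model's validations or callbacks, later changes to those cannot affect the migration either.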

I don’t run the data migrations from the beginning of time. I have no need to; I just dump Production data. Since the migrations have already been run on Production, they don’t run again in the future (by the time a referenced model has changed).

I do run the schema migrations from the beginning of time, and I rarely if ever reference model names in them.

Also, if you can write your migration in SQL, it is typically many times faster than Ruby, so when I can, I prefer writing the migration in raw SQL.

So yes, the problems you describe don’t go away; I just use different strategies to deal with them.
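
For the status backfill discussed earlier in the thread, a raw SQL version might look like the fragment below. The migration name BackfillUserStatus and the WHERE clause are assumptions for illustration; execute is the standard migration helper that sends a statement to the database.

```ruby
class BackfillUserStatus < ActiveRecord::Migration
  def up
    # One UPDATE executed by the database; no Ruby objects are instantiated
    # and no application model is referenced.
    execute "UPDATE users SET status = 'active' WHERE status IS NULL"
  end

  def down
    # Nothing to undo: the backfilled values are left in place.
  end
end
```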

You should be using rake db:setup, which loads the schema and then the seeds. Seeds, not migrations, are the intended place for initializing the database with data.

It is actually good to prune old migrations from time to time. For example, I like to use Squasher (jalkoby/squasher on GitHub) to squash old migrations into a new "first" migration and then just delete all the old ones.
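
As a small illustration, the default user from the examples earlier in the thread could live in db/seeds.rb instead of a migration. This is a sketch that assumes the app has a User model with a name attribute.

```ruby
# db/seeds.rb
# Idempotent: running rake db:seed twice does not create a second 'Andrey'.
User.find_or_create_by!(name: 'Andrey')
```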

The answer is in the default comment at the top of schema.rb:

# Note that this schema.rb definition is the authoritative source for your
# database schema. If you need to create the application database on another
# system, you should be using db:schema:load, not running all the migrations
# from scratch. The latter is a flawed and unsustainable approach (the more migrations
# you’ll amass, the slower it’ll run and the greater likelihood for issues).

Migrations are meant to be a temporary way of shipping schema changes between developers and production, not a bootstrap method.

Or if you, like us, need to use some specific data types not supported by schema.rb, and you don't need to support multiple database vendors, then you should use the structure.sql format instead. DHH's explanation is correct; I'm just adding to it, since schema.rb may not be suitable for everyone and some people might think that the only alternative is to re-run all migrations. It's not.
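
For reference, switching to the SQL format is a one-line setting. The fragment below is a sketch; MyApp is a placeholder for the application's module name.

```ruby
# config/application.rb
module MyApp # placeholder application name
  class Application < Rails::Application
    # Dump the schema as SQL (db/structure.sql) instead of db/schema.rb.
    config.active_record.schema_format = :sql
  end
end
```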

Rodrigo.