Correct Use or Naming of Migrations

Hi Folks,

There is much discussion where I work at present regarding migrations, and the ‘correct’ usage or not of them.

TL;DR

Should migrations be used just to modify schema, and other ‘deploy’ tasks done other ways? If yes, how to ensure no repeat runs of such tasks (seeds, jobs etc)

Background

Once you’re live there are often additional things which need to be done on a deploy. They only need running on that specfic deploy of code, not each subsequent deploy. This might include:

  • Sending emails to subset of customers

  • Seeding new tables

  • Starting off a delayed job

Currently the simplest mechanism we’re aware of is to add in a migration which will either seed or insert jobs or whatever. Rails then knows that this migration ran, and won’t run it on subsequent releases.

The question is as to whether this is an abuse of migrations, given that it’s db:migrate. The expectation might be that they should be about schema transformation only…

Should migrations be considered anything to migrate the system state on a deploy, which includes seeds and other tasks, in which case, should db:migrate be renamed or aliased to something which suggests use for more than just ‘db’, like deploy:migrate (kind of like how before_filters were renamed to avoid confusion of people stating they should only be used for filters

If on the other hand, they’re being abused. What is the way to support single run on deploy tasks? The migration framework is really the right ‘fit’ here. Does an abstraction of it need building for deploy:tasks which does the same sort of thing.

The general issue we encounter is that if on deploy today we need to run rake xyz and rake abc, but then next deploy we run rake cde it’s prone to human error, rather that automation.

Additionally, rails doesn’t (to my knowledge) provide a way to run db:seed after you have a live system which doesn’t try and push prior seeds in too. What is the recommended way to add seeded values to new tables in a live system?

Even if you just modify schemas, the only reliable way I know of recreating your db is to run all migrations. The guides tell you otherwise:

http://guides.rubyonrails.org/migrations.html#schema-dumping-and-you

"There is no need (and it is error prone) to deploy a new instance

of an app by replaying the entire migration history. It is much simpler and faster to just load into the database a description of the current schema."

This doesn't work for many cases if your schema format is :ruby (the

default). For instance, my application makes use of different PostgreSQL schemas to organize some tables.

Also, I strongly believe that you're wrong if you're not using

foreign keys in your database. That’s why I believe the :ruby format should either support foreign keys representation or :sql should be the default one.

But looking at the guide it seems it's not Rails opinion on the

subject:

"
The Active Record way claims that intelligence belongs in your

models, not in the database. As such, features such as triggers or foreign key constraints, which push some of that intelligence back into the database, are not heavily used."

Unless you use :sql for schema_format of course.

@mrloz

I think doing anything outside of modifying the database (seeding is pushing it, but I think in some cases it is acceptable) is definitely a hack. The immediately obvious problem with that solution is that some of the tasks you described is not idempotent, such as sending emails. This might case problem if you experienced problems with your new deploy and had to roll back the changes temporarily and revert the rollback at a later time. You can raise an exception on rollback, but since you’ll have to determine how to manually recover from the broken state (and how to restore it later), I don’t quite see what you are gaining here.

Deployment is one of the few things that Rails doesn’t manage for you since it is difficult to come up with a general framework that is a good enough starting point for most people. Migrations is designed specifically to manage changes to your database schema, and is not technically tied your deployment flow (while you might just automatically run db:migrate on every deploy, it is not exactly straight forward to keep your schema in sync with your code on rollback).

So, because Rails doesn’t dictate how you manage your deployment workflow, you’d just have to engineer your own solution that works for your particular deployment flow.

@rodrigo

Even if you just modify schemas, the only reliable way I know of recreating your db is to run all migrations.

This doesn’t work for many cases if your schema format is :ruby (the default).

Unless you use :sql for schema_format of course.

I think you kind of had it backwards:

  1. The purpose of schema.rb/sql is exactly what the guides described - to recreate the database schema on a new instance of the app without having to re-run all the migrations.

  2. You are correct that schema_format = :ruby does not support vendor-specific features like HStores or other “advanced” SQL features like foreign key constraints.

From 1 & 2, the obvious conclusion should be that “If you are using any vendor-specific or advanced SQL features, you should turn on schema_format = :sql” instead of “If you are using any vendor-specific or advanced SQL features, you should re-run all your migrations on a fresh deploy.”

As for the foreign-key constraints debate, IIRC it was quite controversial in the early days of Rails, but it’s a largely settled debate. There are still different opinions on this of course, but at the end of the day Rails would have to pick a default convention, and “intelligence belongs in your models” have won. And hence schema_format = :ruby is the default. (With new AR features like hstore support on the horizon, that decision might worth a revisit, but that’d be a separate discussion.)

You can still choose to use vendor-specific or advanced SQL features of course, hence the ability to toggle that setting to :sql.

Godfrey

I changed my mind about this since I sent my last response:

I think doing anything outside of modifying the database (seeding is pushing it, but I think in some cases it is acceptable) is definitely a hack.

Reflecting on the history of Rails, particularly what led to the addition of seed.rb and the rake db:seed - I realized even seeding the database in migrations is a bad idea in all cases - unless the migrations DSL provides a way to do that and have that reflected in schema.rb/sql, which I don’t believe is possible.

I think, the rule of thumb is that if you are doing anything in your migrations that would case your schema.rb/sql + seed.rb to be out of sync with your actual database state (i.e. running db:schema:load and db:seed is not sufficient to boot a fresh instance of your app), then you are doing it wrong.

The best way to do seeding is to make sure your seed.rb is “one-way idempotent” (as in - running db:seed multiple times on incremental rollbacks would not create duplicate data, etc. Use find_or_create_by and friends to help) and run that after every deploy together with db:migrate. Of course, when it comes to rolling back a deploy, you’d still have to figure out what to do with those data on a case-by-case basis, but that’s expected for any non-idempotent actions you perform on deploy. Alternatively, if you engineered a custom framework/workflow for these after deploy actions (deploy-specific cap scripts and whatnot), then you could do the seeding there, but you’d still want to keep your seed.rb up to date.

Of course I meant to type "deploys". My apologies for the spam.

Foreign key is a basic concept in RDBMS for several years and implemented in all major RDBMS vendors, such as Oracle, PostgreSQL, Firebird, MS SQL Server and MySQL (it is also supported by default nowadays since InnoDB is now the default engine) among many others used in production systems. I can’t really recall any RDBMS vendor that doesn’t support foreign keys (no one should be really using MyISAM as MySQL engine).

It is part of the ANSI SQL specs:

http://en.wikipedia.org/wiki/Foreign_key

"Foreign keys are defined in the ANSI SQL Standard, through a

FOREIGN KEY constraint. The syntax to add such a constraint to an existing table is defined in SQL:2003 as shown below…"

While I agree that HStores, PG schemas and other similar features

can be considered advanced ones, foreign keys by no means could be confused as an advanced SQL feature. It’s simply essential for any serious application relying on RDBMS, just like transactions support, unique constraints and so forth.

Keep the discussion on topic, will you? If you want to bash about how Rails sucks for not using foreign keys then open a new thread for that.

I believe supporting a "seeds" table in addition to the migrations table would be a great idea. The same way we currently have migrations under db/migrations/ it would be great if we could have our seeds under db/seeds/. Then "rake db:migrate" would also run "db:seed" or whatever other task name you might prefer. When running rake db:reset the seed migration would be run after restoring the DB infra-structure.

That sounds like a good tool to add to AR migrations.

I agree.

Whilst there might be a number of different things people do, rails providing some guidance, or reusable hooks to support the following would be helpful:

1. Seeding stuff after you have a live product 2. Running one-time tasks on a deploy without changing your automated release processes.

Both lend themselves to having a open framework which migrations is 'built' on, which lets you use a gem or whatever to provide another 'tracked' table such as a seeds table or a 'deployment_tasks' table which uses the same basic techniques as the migration system, that is:

1. it will have a rails generator to build a skeleton class to perform the 'task' which is named with a timestamp etc like migrations 2. a rake task to run the task which will execute any 'unrun' jobs in the folder

This would allow Rails' migrations to be built on top of this (and thus support rollback or whatever), but allow plugins or to build a system for more app specific tasks which don't however have to reinvent the robust framework that migrations currently has.

One small thing I’d like to point out is that running migrations serially is not the same as recreating the database from a schema dump. For example, the Rails migration timestamp directive has flip-flopped in its behavior (created_at was originally nullable, then not nullable, then nullable again) so the effect of running a migration is a combination of the migration’s own code plus the underlying gems and application code in effect at the time it runs.

I like the idea of a sister directory to migrations to synchronize deploys with the one-time tasks that accompany them.

At Gazelle, we essentially copy the migration rake tasks and call them iterations. So, as you’d imagine:

rake iteration:migrate # like rake db:migrate

We use it for data only changes.

Here’s the task I alluded to if anyone is interested: https://gist.github.com/blatyo/5047690

This looks like pretty much what I think should be available, only difference being the ability to specify what table to record the running to keep db:migrate free from non db concerns