Rake tasks for development and testing

I've created 3 rake tasks:

rake db:create:test
rake db:build
rake db:rebuild

The first creates your test database. The second creates your
development database, migrates it, creates the test database, and then
clones the development database to the test database.

Lastly, the third drops your development database, recreates it,
migrates it, and clones it to your test database.

On a few projects, I've grown tired of typing:

rake db:drop && rake db:create && rake db:migrate && rake db:test:clone
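Here is a minimal Rake sketch of how db:rebuild could chain the existing tasks as prerequisites instead of typing them by hand. The db:drop, db:create, db:migrate and db:test:clone bodies are stub stand-ins, added so the sketch runs without Rails or a database. In a Rails app those tasks already exist, so only the db:rebuild line would be needed. The RAN array is purely illustrative.

```ruby
require 'rake'

# Records the order in which tasks run (illustrative only).
RAN = []

namespace :db do
  # Stubs standing in for the real Rails tasks (assumption: no database here).
  %w[drop create migrate].each do |name|
    task(name) { RAN << "db:#{name}" }
  end

  namespace :test do
    task(:clone) { RAN << "db:test:clone" }
  end

  desc "Drop, recreate and migrate the dev database, then clone it to test"
  task :rebuild => ["db:drop", "db:create", "db:migrate", "db:test:clone"]
end

Rake::Task["db:rebuild"].invoke
puts RAN.join(" -> ")
# prints "db:drop -> db:create -> db:migrate -> db:test:clone"
```

Rake invokes prerequisites in the order listed, and it runs any given task at most once per invocation. So a task like db:build cannot list db:create a second time to create the test database. That needs a separately named task, such as db:create:test.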

link: http://dev.rubyonrails.org/ticket/10316

What do you guys think, +1's?

I recently had a long discussion with David Heinemeier Hansson. I was
a bit annoyed at the fact that rake db:reset was changed to use the
schema file.

Here is my explanation of how I use rake db:reset:

I've been using rake db:reset mainly in the testing environment. Since
I do TDD, I usually don't create a migration all at once.
I write a test, generate the migration needed for the test to pass,
add the code, migrate, test passes. I refactor, write another test,
modify the migration, db:reset, write more code, test passes. I also
use rake db:reset to make sure migrations won't break.

David made a good point saying that:

You're just iterating over one migration, which
db:rollback + db:migrate would deal with. I can sympathize with that
one. I just don't see the need to run ALL migrations again. Especially
not on production systems that might have hundreds of migrations.

I definitely get the point of verifying the current migration you're
working on. Perhaps something like db:regrate => [ db:rollback,
db:migrate ] would solve that case?
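Here is a sketch of the db:regrate idea as a Rake task. Stub rollback and migrate tasks stand in for the real ones (an assumption, so it runs anywhere). db:regrate is only the name floated above. Later Rails releases ship a comparable db:migrate:redo task.

```ruby
require 'rake'

# Records which stub tasks ran, in order (illustrative only).
LOG = []

namespace :db do
  # Stand-ins for the real db:rollback and db:migrate tasks.
  task(:rollback) { LOG << :rollback }
  task(:migrate)  { LOG << :migrate }

  desc "Roll back the latest migration and re-apply it (hypothetical name)"
  task :regrate => [:rollback, :migrate]
end

Rake::Task["db:regrate"].invoke
p LOG # => [:rollback, :migrate]
```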

I was also annoyed by the change to db:reset, which is why I then wrote
rake db:rebuild. I mainly use it when I'm in the early stages of
development and my migrations are going through some changes. I could
roll back and then migrate back up, but I just prefer the short syntax
of db:rebuild.

rake db:test:clone builds your test database from your schema (just as
rake db:reset now does).

I prefer to have a test database ready from the get-go (which is why
it's part of rake db:build) and whenever I remigrate.

The reason I prefer migrations over the schema is that I usually have
some data in my migrations, e.g. an admin user, that I'd like to have
set up. I'm sure I'm not the only one who has some data in the
migrations.

The rake db:create:test task came about because I wanted rake
db:build to set up my test database as well (with the current schema).
You can't write task :build => ["rake db:create RAILS_ENV=test"]; it
will fail, because Rake prerequisites are task names, not shell
commands. And rake db:create:all creates every database defined in
config/database.yml, which I didn't want either.
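The core of a db:create:test task is picking the single test entry out of database.yml, rather than looping over every entry the way db:create:all does. Below is a minimal plain-Ruby sketch. It assumes an inline database.yml string and a hypothetical create_database stub that only returns the SQL it would run. A real task would hand the config to the database adapter instead.

```ruby
require 'yaml'

# Hypothetical database.yml contents; a Rails app would read
# config/database.yml instead.
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: app_development
  test:
    adapter: mysql
    database: app_test
  production:
    adapter: mysql
    database: app_production
YAML

# Return the configuration for one environment only.
def config_for(env, yaml = DATABASE_YML)
  configs = YAML.safe_load(yaml)
  configs.fetch(env) { raise ArgumentError, "no #{env} entry in database.yml" }
end

# Stand-in for the adapter-specific create step (assumption).
def create_database(config)
  "CREATE DATABASE #{config['database']}"
end

puts create_database(config_for("test"))
# prints "CREATE DATABASE app_test"
```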

Those are the reasons these tasks came about, and why I've been using
them for a while now. Am I missing something, or doing something here
that shouldn't be done?

Matt Aimonetti wrote:

I'm not sure I exactly follow Robert's use cases, but I sympathize
with the frustration with schema.rb versus migrations. Check out this
thread for a similar discussion:

http://groups.google.com/group/rubyonrails-core/t/d871469cb2a6589a?hl=en

The highly specific use cases for Robert's patch will make it tough to
be accepted, but I do believe there is hope for some migration support
in test. There are a lot of people blogging (and raising tickets like
8389) about wanting to use migrations to build test. If that happens,
at least some of the infrastructure for Robert's use cases will be in
place.

Matt, your conversation with DHH sounds like it touched on migrations
versus schema dumper. I'd love your thoughts (or David's!) on the
subject of migrations for building the test DB.

-Chris

> Matt, your conversation with DHH sounds like it touched on migrations
> versus schema dumper. I'd love your thoughts (or David's!) on the
> subject of migrations for building the test DB.

Migrations were never meant to be data seeders. They were meant to
change the schema and massage existing data to fit the new schema.
In that context, it doesn't make sense to use migrations for the test
database because the test database does not have permanent data of
importance.

But it seems that this misuse of migrations highlights something that
might be lacking: a data seeding system. People are cajoling
migrations to fit that role too even though it wasn't designed as
such. So we should think about addressing that concern as a separate
function.

For me personally, fixtures fulfill the seeder role for the test
database. I'm interested in knowing when that doesn't work for others.
BTW, I agree that fixtures should not be used to seed the production
database. That's another concern that Rails doesn't really address at
the moment.

> In that context, it doesn't make sense to use migrations for the test
> database because the test database does not have permanent data of
> importance.

I think it does make sense to run migrations in the continuous
integration loop (but not in the local build). Reason: you want to
test them, but you don't want to slow down the local build. A fairly
common practice is to use 001_initial_schema migration as the only
migration on the project for as long as there is no valuable
production data to preserve.

> But it seems that this misuse of migrations highlights something that might be lacking: a data seeding system.

Yup. Another common practice is db/dataload.rb, a script of
ActiveRecord operations to put some data into the database, with the
corresponding db:dataload Rake task. Using AR and domain objects to create
this data is much easier than doing the same thing with YAML-based
fixtures.
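Here is a minimal sketch of the kind of Ruby a db/dataload.rb script might contain. FakeModel, User and Role are hypothetical in-memory stand-ins for ActiveRecord models, so the sketch runs on its own. With real models you would use a Rails 2.x dynamic finder such as User.find_or_create_by_name, or User.find_or_create_by(name: ...) on later versions. Find-or-create is a design choice added here so the script can be re-run without duplicating rows.

```ruby
# In-memory stand-in for an ActiveRecord model (assumption: no database).
class FakeModel
  # Each subclass keeps its own list of stored rows.
  def self.records
    @records ||= []
  end

  # Return the stored row matching all attrs, or store and return a new one.
  def self.find_or_create_by(attrs)
    records.find { |r| attrs.all? { |k, v| r[k] == v } } ||
      (records << attrs.dup).last
  end
end

class User < FakeModel; end
class Role < FakeModel; end

# Body of a hypothetical db/dataload.rb: plain Ruby, so relationships
# between records are expressed directly instead of in YAML.
def load_seed_data
  admin_role = Role.find_or_create_by(name: "admin")
  User.find_or_create_by(name: "admin", role: admin_role[:name])
  User.find_or_create_by(name: "guest", role: nil)
end

load_seed_data
load_seed_data # running it twice adds nothing new
puts User.records.size # prints 2
```

A db:dataload task could then just `load "db/dataload.rb"` after the :environment task.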

> Yup. Another common practice is db/dataload.rb, a script of
> ActiveRecord operations to put some data into the database, with the
> corresponding db:dataload Rake task. Using AR and domain to create
> this data is much easier than doing the same thing with YAML-based
> fixtures.

Mephisto uses fixtures in a special db/bootstrap dir, and inserts them
with a db:bootstrap task. db:bootstrap includes schema:load and the
custom data. Though I agree that doing this in ruby would be
simpler...

>> In that context, it doesn't make sense to use migrations for the test
>> database because the test database does not have permanent data of
>> importance.
>
> I think, it does make sense to run migrations in the continuous
> integration loop (but not in the local build). Reason: you want to
> test them, but you don't want to slow down the local build. A fairly
> common practice is to use 001_initial_schema migration as the only
> migration on the project for as long as there is no valuable
> production data to preserve.

I don't think I understand this. Why do you want or need to
continuously test the migrations? In my opinion, migrations are
transient artifacts that only serve the purpose of moving everyone on
a schema version A to schema version B. Once everyone has been moved,
the migrations are useless and could essentially be deleted.

I'm not sure whether the "continuous" Alexey was referring to meant the
CI process (as in: it is always running) or re-running every migration
each time the test suite is run.

The former certainly makes sense; you'd want to test that a migration
can successfully run based solely on the contents of the SVN
repository and other expected artefacts; verifying that it won't fail
because a developer has failed to commit or add a particular file
before you try and run that migration on the production system.

That is, you'd want to test that any new migrations don't cause a
system failure - not that every migration can be run, each time the CI
system runs against a new build.
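To exercise only the new migrations in CI, the build needs to know which migrations are newer than the version already in production. Below is a minimal sketch. It assumes the numbered NNN_name.rb migration files of the time and a single integer schema version, as kept in the schema_info table. The file list and version are made up.

```ruby
# Matches numbered migration files such as 002_create_users.rb.
MIGRATION_NAME = /\A(\d+)_\w+\.rb\z/

# Return migration filenames newer than deployed_version, oldest first.
def pending_migrations(filenames, deployed_version)
  filenames
    .select { |f| f =~ MIGRATION_NAME }          # ignore non-migration files
    .map { |f| [f[MIGRATION_NAME, 1].to_i, f] }  # pair each file with its version
    .select { |version, _| version > deployed_version }
    .sort_by(&:first)
    .map(&:last)
end

files = %w[003_add_roles.rb 001_initial_schema.rb 002_create_users.rb README]
p pending_migrations(files, 1)
# => ["002_create_users.rb", "003_add_roles.rb"]
```

A CI job could restore a copy of the production database and then migrate through just these files.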

I really like Alexey's idea of a Ruby file for loading the seed data,
run with a rake task. Should that be something in Rails, or rather a
"best practice"? Or should we have something akin to create_table,
like create_data_for :users do ... within something like
ActiveRecord::SeedData (with associated rake tasks)?

Granted, my case may be one of few. With that said, what do you
think about my proposed rake tasks, minus the db:test:clone? That way,
we could build the test db with a rake task and rebuild from
migrations as well. Or am I still off the mark?

DHH wrote:

> > I think, it does make sense to run migrations in the continuous
> > integration loop (but not in the local build). Reason: you want to
> > test them
>
> I don't think I understand this. Why do you want or need to
> continuously test the migrations?

> I'm not sure the "continuous" that Alexey was referring to was the CI
> process (as in: it is always running), or a repeated run of every
> migration each time the test suite is run.
>
> The former certainly makes sense; you'd want to test that a migration
> can successfully run based solely on the contents of the SVN
> repository and other expected artefacts; verifying that it won't fail
> because a developer has failed to commit or add a particular file
> before you try and run that migration on the production system.
>
> That is, you'd want to test that any new migrations don't cause a
> system failure - not that every migration can be run, each time the CI
> system runs against a new build.

+1

I would like to see a way to test migrations, especially those
involving substantive changes to the data, in the framework, ideally
with the ability to run them against a large enough dataset, preferably
a copy of the production database. But I don't see testing migrations
as the same thing as running test cases against the test database.

Assaf

> Yup. Another common practice is db/dataload.rb, a script of
> ActiveRecord operations to put some data into the database, with the
> corresponding db:dataload Rake task. Using AR and domain to create
> this data is much easier than doing the same thing with YAML-based
> fixtures.

I've set up apps to detect when they have an empty database, and to
run an action which uses regular AR stuff like User.new to seed the
database. That way, the application "sets itself up" the first time
it's run - no additional rake task needed (but with the overhead of
checking to see if we've got a "clean slate").

In an ideal world, I think Rails applications that have the right info
in config/database.yml would be able to create their own database
(something like rake db:create), load their own schema (rake
db:schema:load), and seed their own data (rake db:bootstrap)
automatically when run for the "first time".

The trick would be knowing when an application was being run for the
first time, but that might be as simple as telling people to not run
rake db:schema:load (or rake db:create) and simply starting their
application after filling out config/database.yml (if database doesn't
exist or has no tables, run some "init" action if it exists).
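The first-run check described above comes down to two questions: does the database exist, and does it have any tables? Below is a minimal sketch of that decision in plain Ruby, using the task names mentioned above. It assumes the caller supplies both answers; a real app would get them from its database adapter.

```ruby
# Decide which setup tasks a first run would need. Both inputs are
# assumptions: a real app would ask its database adapter for them.
def bootstrap_steps(database_exists:, table_names:)
  # No database at all: create it, load the schema, then seed it.
  return %w[db:create db:schema:load db:bootstrap] unless database_exists
  # Database exists but is empty: skip creation.
  return %w[db:schema:load db:bootstrap] if table_names.empty?
  # Tables already present: nothing to do, start the app normally.
  []
end

p bootstrap_steps(database_exists: false, table_names: [])
# => ["db:create", "db:schema:load", "db:bootstrap"]
p bootstrap_steps(database_exists: true, table_names: [])
# => ["db:schema:load", "db:bootstrap"]
p bootstrap_steps(database_exists: true, table_names: %w[users])
# => []
```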

I'm not sure if this sort of thing is possible (or a good idea), but
it might be worth thinking about.

- Trevor

"...migrations are transient artifacts that only serve the purpose of
moving everyone on a schema version A to schema version B."

David,
Koz expressed almost exactly this same sentiment yesterday in another
thread (http://groups.google.com/group/rubyonrails-core/browse_frm/thread/d871469cb2a6589a?hl=en).
You guys are consistent in the message. But there is an argument being
expressed in these threads, plugins and trac tickets for using
migrations for more than just one-time changes.

I use migrations for building the databases FROM SCRATCH for both
development and production. And I would like to do the same in test
because it works so well for development and production.

*Development: (Before going live and before production even exists)
Occasionally I will end up with a development DB that is full of cruft
and I want to reset. So I drop the development DB and rebuild.
*Production: After months of development, I'm ready to put an app into
production, so I contract with a hosting site and build it from
scratch.

What both these scenarios have in common is that the ruby schema
dumper is inadequate (no DB-specific stuff supported) and the sql
schema dumper is also inadequate (no non-DDL available, such as seed
data loading). Migrations work beautifully to address these problems
in a very Rails-like way (no plugin required!) and using syntax I've
already invested in. I can add an Admin user, a Guest user and their
authorizations and be able to use the app after rake db:migrate.

On a related note, there seems to be a migration-versus-fixtures
debate for seed data coming over the horizon. There is no reason you
can't do a hybrid by loading fixtures within a migration. In fact,
such an approach is described in Agile Web Development with Rails
(page 271, section 16.4). It works well and capitalizes on two well-
tested and understood Rails tools.

It is unfortunate that such a great tool (migrations) can't also be
used to build the test DB. As it is now, I occasionally find my
migrations fail due to subtle DB-side changes or model changes. The
only way to keep them fresh is to manually rebuild a database from
time to time. But it sure would be nice if they could be used in the
day-to-day of building the test DB.

-Chris

> *Development: (Before going live and before production even exists)
> Occasionally I will end up with a development DB that is full of cruft
> and I want to reset. So I drop the development DB and rebuild.
> *Production: After months of development, I'm ready to put an app into
> production, so I contract with a hosting site and build it from
> scratch.

Both of these scenarios are intended to be solved with db:schema:load.
That task isn't working for you because you're putting seed data into
migrations. In turn, you feel pain from db:schema:load because it
doesn't include your seed data. I think the problem here is seed data
in migrations, not migrations vs schema.

> What both these scenarios have in common is that the ruby schema
> dumper is inadequate (no DB-specific stuff supported) and the sql
> schema dumper is also inadequate (no non-DDL available, such as seed
> data loading).

In my mind, this is a perfect case for SQL schema dumper. You have a
db-specific schema that uses tricks not accessible by the Ruby dumper.
If you split out the concern of seed data, I think a lot of your
problems go away.

> Migrations work beautifully to address these problems
> in a very Rails-like way (no plugin required!) and using syntax I've
> already invested in. I can add an Admin user, a Guest user and their
> authorizations and be able to use the app after rake db:migrate.

Again, I think this is a mistake and it was certainly not what
migrations were designed for. They lead to all the pains and problems
you're describing with migrations.

I fully realize that people are misusing migrations in this way
because they were missing a seed system and just grabbed something
that had the same vague outline. But I think the problem then is to
consider how to best do seeding. Not to twist migrations into a seed
system.

> It is unfortunate that such a great tool (migrations) can't also be
> used to build the test DB. As it is now, I occasionally find my
> migrations fail due to subtle DB-side changes or model changes. The
> only way to keep them fresh is to manually rebuild a database from
> time to time. But it sure would be nice if they could be used in the
> day-to-day of building the test DB.

Again, this is a symptom of wanting to run migrations all the time and
thus needing to make sure they'll work for all eternity. I think
that's a waste of time and hard too. You might very well have old
migrations that depend on classes and methods that are no longer
around. I've seen some of the hoops that people jump through to keep
legacy behavior intact for migrations and it sure ain't pretty.

So in summary, what we need is a seed system as either a best
practice, plugin, or core (doubtful, it doesn't feel like a Most
People, Most of The Time concern) and stop trying to turn migrations
(or even fixtures) into a seed system.

> In an ideal world, I think Rails applications that have the right info
> in config/database.yml would be able to create their own database
> (something like rake db:create), load their own schema (rake
> db:schema:load), and seed their own data (rake db:bootstrap)
> automatically when run for the "first time".

I think this is being way too clever. Different applications will have
different things they need to have happen before they can run. That
might be gem dependencies, that might be ensuring a certain version of
Ruby, it might be setting up seed data, it might be so many things
that it's not worth standardizing. Just create script/setup and put in
the README that people should run that when first installing the
application. Problem solved, IMO.
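Here is a minimal sketch of such a script/setup in Ruby. The step list is a hypothetical example, since each application would list its own requirements. Commands run in order, and the script stops at the first failure. The runner argument exists only so the loop can be exercised without shelling out.

```ruby
#!/usr/bin/env ruby
# Hypothetical script/setup: the step list below is only an example.
SETUP_STEPS = [
  "rake db:create",
  "rake db:schema:load",
  "rake db:bootstrap"
].freeze

# Run each step in order and abort on the first failing one.
# `runner` defaults to shelling out via Kernel#system.
def run_setup(steps, runner: ->(cmd) { system(cmd) })
  steps.each do |cmd|
    puts "== #{cmd}"
    runner.call(cmd) or abort("setup failed at: #{cmd}")
  end
end

# Run the steps only when executed as script/setup, not when loaded.
run_setup(SETUP_STEPS) if File.basename($PROGRAM_NAME) == "setup"
```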

Would something like the Scenarios plugin solve your problem?

http://faithfulcode.rubyforge.org/docs/scenarios/

[snip]

> Again, I think this is a mistake and it was certainly not what
> migrations were designed for. They lead to all the pains and problems
> you're describing with migrations.
>
> I fully realize that people are misusing migrations in this way
> because they were missing a seed system and just grabbed something
> that had the same vague outline. But I think the problem then is to
> consider how to best do seeding. Not to twist migrations into a seed
> system.

I'm one of the people misusing migrations in this way, and a seed system could fulfill part of the problem, but not all of it.

I often use migrations for creating data that needs to be present in every environment. For example, a new account in an accounting table. I want to add that to an existing production application without reloading all of the data. By including it in a migration, I'm sure that it will exist in the database of every developer, our integration environment, and finally production. It keeps me from having to track down data bugs.

Of course, the fact that I also have to encode that data into my fixtures isn't very DRY. I'd love to have some way of specifying the data only once, I just don't have any brilliant ideas about how to do it.
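One way to specify such data only once is to keep it in a single Ruby structure. From that you can derive both the fixture file and the rows a migration or seed task still needs to insert. Below is a minimal sketch. ACCOUNTS and both helpers are hypothetical, and `existing` stands in for rows already in the database.

```ruby
require 'yaml'

# Hypothetical single source of truth for reference data that must exist
# in every environment (e.g. chart-of-accounts entries).
ACCOUNTS = [
  { "name" => "Cash",       "number" => 1000 },
  { "name" => "Receivable", "number" => 1200 }
].freeze

# Render the rows as a YAML fixture file, labelling each row by its name.
def fixture_yaml(rows)
  rows.map { |row| [row["name"].downcase, row] }.to_h.to_yaml
end

# Return the rows not yet present in `existing` (matched on account
# number), i.e. what a migration or seed task would still need to insert.
def missing_rows(rows, existing)
  known = existing.map { |r| r["number"] }
  rows.reject { |row| known.include?(row["number"]) }
end

puts fixture_yaml(ACCOUNTS)
p missing_rows(ACCOUNTS, [{ "name" => "Cash", "number" => 1000 }])
```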

[snip]

Mike Mangino
http://www.elevatedrails.com

If we kept the integrity of migrations and moved to create something
like SeedData or an implementation of Scenarios, then we can still
keep the state that you want. We'd just have the normal migration
tasks and then tasks for seeding the data and a task for migrating and
then seeding.
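The split described above (normal migration tasks plus separate seeding tasks) can be expressed as Rake prerequisites. Below is a sketch with stub tasks (an assumption) and a hypothetical db:migrate_and_seed name. Later Rails versions did add a db:seed task that loads db/seeds.rb.

```ruby
require 'rake'

# Records which stub tasks ran, in order (illustrative only).
CALLS = []

namespace :db do
  # Stand-ins: db:migrate exists in Rails; db:seed here is a stub.
  task(:migrate) { CALLS << "db:migrate" }
  task(:seed)    { CALLS << "db:seed" }

  desc "Migrate the schema, then load seed data (hypothetical task name)"
  task :migrate_and_seed => [:migrate, :seed]
end

Rake::Task["db:migrate_and_seed"].invoke
p CALLS # => ["db:migrate", "db:seed"]
```

Because seeding is its own task, it can also run alone, e.g. after db:schema:load.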

IMO, I'd love to see a seed system that mimics migrations a bit and
keeps the standard AR syntax that we are used to: Person.create(...).

Perhaps something like:

class SeedPeople < ActiveRecord::SeedData
  def self.up
    create_data :people do |p|
      p.create(:name => "Robert", :password => "secret")
      p.create(:name => "John", :password => "supersecret")
    end
  end
end
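To try the shape of the proposed DSL without Rails, here is a minimal self-contained sketch. There is no ActiveRecord::SeedData class. The SeedData base class below is hypothetical and only collects rows in memory, where a real version would pass them to the matching model's create.

```ruby
# Hypothetical stand-in for the proposed ActiveRecord::SeedData (no such
# class exists in Rails). Rows are only collected in memory here.
class SeedData
  # Rows per table name, shared by every seed class.
  STORE = Hash.new { |hash, table| hash[table] = [] }

  def self.store
    STORE
  end

  # Yield a writer whose #create records a row for `table`.
  def self.create_data(table)
    yield TableWriter.new(store[table])
  end

  class TableWriter
    def initialize(rows)
      @rows = rows
    end

    # A real version would call the model's create here instead.
    def create(attributes)
      @rows << attributes
      attributes
    end
  end
end

# The seed class from the proposal, unchanged apart from the superclass.
class SeedPeople < SeedData
  def self.up
    create_data :people do |p|
      p.create(:name => "Robert", :password => "secret")
      p.create(:name => "John", :password => "supersecret")
    end
  end
end

SeedPeople.up
p SeedData.store[:people].map { |row| row[:name] } # => ["Robert", "John"]
```

A real implementation might map :people to its model with ActiveSupport's classify and constantize ("people".classify gives "Person") and call create on it.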

I find that the above syntax feels comfortable to me - what do other
people think of the above? John Long has a lot of this already done.

Are there issues with having a simple rake task for setting up just the
test db or just the production db (without loading any data)?
e.g. rake db:create:test, rake db:create:production. Perhaps I'm just
not seeing why these wouldn't be nice conveniences?

> IMO, I’d love to see a seed system that mimics migrations a bit and
> keeps the standard AR syntax that we are used to: Person.create(…).

If we have a version-based data-seeding system then we’ve really just created a parallel set of migrations. Same benefits, same problems. Once the models are out of date (say you move a field from one table to another) then your older seed files will be broken.

Yet, if we don’t use a version-based system then it’s difficult to know what actions to perform to update a given environment. Adding missing data is easy enough but how do you track when data was removed?

I can see now why we haven’t had any kind of data-seeding mechanism and why many of us (myself included) cannibalized migrations for that purpose.

::Jack Danger

I know that these tasks may not be for core, but for those who did find
some of them useful, I have a plugin that includes them and some other
tasks, including Tobias Lutke's backup task:

http://svn.robertrevans.com/plugins/data_tasks/