Rake tasks for development and testing

I've created 3 rake tasks:

rake db:create:test
rake db:build
rake db:rebuild

The first creates your test database. The second creates your development database, migrates it, creates the test database, and then clones the development database to the test database.

Lastly, the third drops your database (dev), creates it, migrates it, and clones it for your test database.

For a few projects, I've gotten tired of doing: rake db:drop && rake db:create && rake db:migrate && rake db:test:clone
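For anyone who wants to see the shape of such tasks, here is a minimal sketch. It is not the patch attached to the ticket. It assumes the stock Rails tasks db:drop, db:create, db:migrate and db:test:clone are already loaded, and shelling out for the test database is just one possible way to do it:

```ruby
require "rake"

# Sketch only: db:drop, db:create, db:migrate and db:test:clone are the
# tasks Rails ships. Nothing here is copied from the ticket's patch.

desc "Create only the test database"
task "db:create:test" do
  # A shell command cannot be listed as a prerequisite, so shell out instead.
  sh "rake db:create RAILS_ENV=test"
end

desc "Create and migrate development, then create and clone the test database"
task "db:build" => %w[db:create db:migrate db:create:test db:test:clone]

desc "Drop development, then recreate it, migrate it and clone it to test"
task "db:rebuild" => %w[db:drop db:create db:migrate db:test:clone]
```

Rake runs prerequisites in the order listed, so rake db:rebuild stands in for the long && chain above.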

link: http://dev.rubyonrails.org/ticket/10316

What do you guys think, +1's?

I recently had a long discussion with David Heinemeier Hansson. I was a bit annoyed at the fact that rake db:reset was changed to use the schema file.

Here is my explanation of how I use rake db:reset

I've been using rake db:reset mainly in the testing environment. Since I do TDD, I usually don't create a migration all at once. I write a test, generate the migration needed for the test to pass, add the code, migrate, test passes. I refactor, write another test, modify the migration, db:reset, write more code, test passes. I also use rake db:reset to make sure migrations won't break.

David made a good point saying that:

You're just iterating over one migration, which db:rollback + db:migrate would deal with. I can sympathize with that one. I just don't see the need to run ALL migrations again. Especially not on production systems that might have hundreds of migrations.

I definitely get the point of verifying the current migration you're working on. Perhaps something like db:regrate => [ db:rollback, db:migrate ] would solve that case?

I was also annoyed by the change to db:reset, which is why I then wrote rake db:rebuild. I mainly use it when I'm in the early stages of development and my migrations are going through some changes. I could roll back and then migrate back up, but I just prefer the short syntax of db:rebuild.

rake db:test:clone clones your test database from your schema (same as rake db:reset - from schema)

I prefer to have a test database up from the get go (which is why I have it in rake db:build) and when I remigrate.

The reason I prefer migrations over the schema is that I usually have some data in my migrations, e.g. an admin user, that I'd like to have set up. I'm sure I'm not the only one who has some data in the migrations.

The rake db:create:test task came about because I wanted rake db:build to also set up my test database (with the current schema). You can't do rake :build => ["rake db:create RAILS_ENV=test"]; it will fail. And rake db:create:all will create all databases defined in your yaml file, which I didn't want either.

Those were my reasons why these came about and why I've been using them for a while now. Am I missing something or doing something here that shouldn't be done?

Matt Aimonetti wrote:

I'm not sure I exactly follow Robert's use cases, but I sympathize with the frustration with schema.rb versus migrations. Check out this thread for a similar discussion:

http://groups.google.com/group/rubyonrails-core/t/d871469cb2a6589a?hl=en

The highly-specific use cases for Robert's patch will make it tough to be accepted, but I do believe there is hope for some migration support in test. There are a lot of people blogging (and raising tickets like 8389) about wanting to use migrations to build test. If that happens, at least some of the infrastructure for Robert's use cases will be in place.

Matt, your conversation with DHH sounds like it touched on migrations versus schema dumper. I'd love your thoughts (or David's!) on the subject of migrations for building the test DB.

-Chris

> Matt, your conversation with DHH sounds like it touched on migrations versus schema dumper. I'd love your thoughts (or David's!) on the subject of migrations for building the test DB.

Migrations were never meant to be data seeders. They were meant to change the schema and massage existing data to fit the new schema. In that context, it doesn't make sense to use migrations for the test database because the test database does not have permanent data of importance.

But it seems that this misuse of migrations highlights something that might be lacking: a data seeding system. People are cajoling migrations to fit that role too even though it wasn't designed as such. So we should think about addressing that concern as a separate function.

For me personally, fixtures fulfill the seeder role for the test database. I'm interested in knowing when that doesn't work for others. BTW, I agree that fixtures should not be used to seed the production database. That's another concern that Rails doesn't really address at the moment.

> In that context, it doesn't make sense to use migrations for the test database because the test database does not have permanent data of importance.

I think, it does make sense to run migrations in the continuous integration loop (but not in the local build). Reason: you want to test them, but you don't want to slow down the local build. A fairly common practice is to use 001_initial_schema migration as the only migration on the project for as long as there is no valuable production data to preserve.

> But it seems that this misuse of migrations highlights something that might be lacking: a data seeding system.

Yup. Another common practice is db/dataload.rb, a script of ActiveRecord operations to put some data into the database, with the corresponding db:dataload Rake task. Using AR and domain to create this data is much easier than doing the same thing with YAML-based fixtures.
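A minimal sketch of what such a db:dataload task could look like. This is an illustration, not code from any of the projects mentioned. It assumes the Rails-provided :environment task is available to boot the app, and the DATALOAD environment variable is a hypothetical override added here so the script path can be changed:

```ruby
require "rake"

# Sketch only. :environment is the task Rails defines to boot the app so
# that the models are loaded; DATALOAD is a hypothetical override.
desc "Seed the database by running the ActiveRecord code in db/dataload.rb"
task "db:dataload" => :environment do
  path = File.expand_path(ENV.fetch("DATALOAD", "db/dataload.rb"))
  abort "No seed script found at #{path}" unless File.exist?(path)
  load path
end
```

The script itself would hold ordinary model calls (creating the admin user, for example), which is what makes it easier to write than the equivalent YAML.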

> Yup. Another common practice is db/dataload.rb, a script of ActiveRecord operations to put some data into the database, with the corresponding db:dataload Rake task. Using AR and domain to create this data is much easier than doing the same thing with YAML-based fixtures.

Mephisto uses fixtures in a special db/bootstrap dir, and inserts them with a db:bootstrap task. db:bootstrap includes schema:load and the custom data. Though I agree that doing this in ruby would be simpler...

> > In that context, it doesn't make sense to use migrations for the test database because the test database does not have permanent data of importance.
>
> I think, it does make sense to run migrations in the continuous integration loop (but not in the local build). Reason: you want to test them, but you don't want to slow down the local build. A fairly common practice is to use 001_initial_schema migration as the only migration on the project for as long as there is no valuable production data to preserve.

I don't think I understand this. Why do you want or need to continuously test the migrations? In my opinion, migrations are transient artifacts that only serve the purpose of moving everyone on a schema version A to schema version B. Once everyone has been moved, the migrations are useless and could essentially be deleted.

I'm not sure the "continuous" that Alexey was referring to was the CI process (as in: it is always running), or a repeated run of every migration each time the test suite is run.

The former certainly makes sense; you'd want to test that a migration can successfully run based solely on the contents of the SVN repository and other expected artefacts; verifying that it won't fail because a developer has failed to commit or add a particular file before you try and run that migration on the production system.

That is, you'd want to test that any new migrations don't cause a system failure - not that every migration can be run, each time the CI system runs against a new build.

I really like Alexey's idea of a ruby file for loading the seed data and running it with a rake task. Should that be something in rails or rather "best practices"? Or should we have something akin to create_table that's like create_data_for :users do... within something like ActiveRecord::SeedData? (with associated rake tasks)

Granted that my case may be one of few. With that said, what do you think about my proposed rake tasks, minus the db:test:clone? That way, we can build the test db with a rake task, and can rebuild from migrations as well? Or am I still off the mark?

DHH wrote:

> > I think, it does make sense to run migrations in the continuous integration loop (but not in the local build). Reason: you want to test them
>
> I don't think I understand this. Why do you want or need to continuously test the migrations?

> I'm not sure the "continuous" that Alexey was referring to was the CI process (as in: it is always running), or a repeated run of every migration each time the test suite is run.
>
> The former certainly makes sense; you'd want to test that a migration can successfully run based solely on the contents of the SVN repository and other expected artefacts; verifying that it won't fail because a developer has failed to commit or add a particular file before you try and run that migration on the production system.
>
> That is, you'd want to test that any new migrations don't cause a system failure - not that every migration can be run, each time the CI system runs against a new build.

+1

I would like to see a way to test migrations, especially those involving substantive changes to the data, in the framework. And being able to run them against a large enough dataset, preferably a copy of the production database. But I don't see testing migrations being the same thing as running test cases against the test database.

Assaf

> Yup. Another common practice is db/dataload.rb, a script of ActiveRecord operations to put some data into the database, with the corresponding db:dataload Rake task. Using AR and domain to create this data is much easier than doing the same thing with YAML-based fixtures.

I've set up apps to detect when they have an empty database, and to run an action which uses regular AR stuff like User.new to seed the database. That way, the application "sets itself up" the first time it's run - no additional rake task needed (but with the overhead of checking to see if we've got a "clean slate").

In an ideal world, I think Rails applications that have the right info in config/database.yml would be able to create their own database (something like rake db:create), load their own schema (rake db:schema:load), and seed their own data (rake db:bootstrap) automatically when run for the "first time".

The trick would be knowing when an application was being run for the first time, but that might be as simple as telling people to not run rake db:schema:load (or rake db:create) and simply starting their application after filling out config/database.yml (if the database doesn't exist or has no tables, run some "init" action if it exists).

I'm not sure if this sort of thing is possible (or a good idea), but it might be worth thinking about.

- Trevor

"...migrations are transient artifacts that only serve the purpose of moving everyone on a schema version A to schema version B."

David, Koz expressed almost exactly this same sentiment yesterday in another thread (http://groups.google.com/group/rubyonrails-core/browse_frm/thread/d871469cb2a6589a?hl=en). You guys are consistent in the message. But there is an argument being expressed in these threads, plugins and trac tickets for using migrations for more than just one-time changes.

I use migrations for building the databases FROM SCRATCH for both development and production. And I would like to do the same in test because it works so well for development and production.

* Development: (Before going live and before production even exists) Occasionally I will end up with a development DB that is full of cruft and I want to reset. So I drop the development DB and rebuild.
* Production: After months of development, I'm ready to put an app into production, so I contract with a hosting site and build it from scratch.

What both these scenarios have in common is that the ruby schema dumper is inadequate (no DB-specific stuff supported) and the sql schema dumper is also inadequate (no non-DDL available, such as seed data loading). Migrations work beautifully to address these problems in a very Rails-like way (no plugin required!) and using syntax I've already invested in. I can add an Admin user, a Guest user and their authorizations and be able to use the app after rake db:migrate.

On a related note, there seems to be a migration-versus-fixtures debate for seed data coming over the horizon. There is no reason you can't do a hybrid by loading fixtures within a migration. In fact, such an approach is described in Agile Web Development with Rails (page 271, section 16.4). It works well and capitalizes on two well-tested and understood Rails tools.

It is unfortunate that such a great tool (migrations) can't also be used to build the test DB. As it is now, I occasionally find my migrations fail due to subtle DB-side changes or model changes. The only way to keep them fresh is to manually rebuild a database from time to time. But it sure would be nice if they could be used in the day-to-day of building the test DB.

-Chris

> * Development: (Before going live and before production even exists) Occasionally I will end up with a development DB that is full of cruft and I want to reset. So I drop the development DB and rebuild.
> * Production: After months of development, I'm ready to put an app into production, so I contract with a hosting site and build it from scratch.

Both of these scenarios are intended to be solved with db:schema:load. That task isn't working for you because you're putting seed data into migrations. In turn, you feel pain from db:schema:load because it doesn't include your seed data. I think the problem here is seed data in migrations, not migrations vs schema.

> What both these scenarios have in common is that the ruby schema dumper is inadequate (no DB-specific stuff supported) and the sql schema dumper is also inadequate (no non-DDL available, such as seed data loading).

In my mind, this is a perfect case for SQL schema dumper. You have a db-specific schema that uses tricks not accessible by the Ruby dumper. If you split out the concern of seed data, I think a lot of your problems go away.

> Migrations work beautifully to address these problems in a very Rails-like way (no plugin required!) and using syntax I've already invested in. I can add an Admin user, a Guest user and their authorizations and be able to use the app after rake db:migrate.

Again, I think this is a mistake and it was certainly not what migrations were designed for. They lead to all the pains and problems you're describing with migrations.

I fully realize that people are misusing migrations in this way because they were missing a seed system and just grabbed something that had the same vague outline. But I think the problem then is to consider how to best do seeding. Not to twist migrations into a seed system.

> It is unfortunate that such a great tool (migrations) can't also be used to build the test DB. As it is now, I occasionally find my migrations fail due to subtle DB-side changes or model changes. The only way to keep them fresh is to manually rebuild a database from time to time. But it sure would be nice if they could be used in the day-to-day of building the test DB.

Again, this is a symptom of wanting to run migrations all the time and thus needing to make sure they'll work for all eternity. I think that's a waste of time and hard too. You might very well have old migrations that depend on classes and methods that are no longer around. I've seen some of the hoops that people jump through to keep legacy behavior intact for migrations and it sure ain't pretty.

So in summary, what we need is a seed system as either a best practice, plugin, or core (doubtful, it doesn't feel like a Most People, Most of The Time concern) and stop trying to turn migrations (or even fixtures) into a seed system.

> In an ideal world, I think Rails applications that have the right info in config/database.yml would be able to create their own database (something like rake db:create), load their own schema (rake db:schema:load), and seed their own data (rake db:bootstrap) automatically when run for the "first time".

I think this is being way too clever. Different applications will have different things they need to have happen before they can run. That might be gem dependencies, that might be ensuring a certain version of Ruby, it might be setting up seed data, it might be so many things that it's not worth standardizing. Just create script/setup and put in the README that people should run that when first installing the application. Problem solved, IMO.
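As a rough illustration of the script/setup idea, here is a minimal sketch. The list of steps is hypothetical and would differ per application, and db:bootstrap in particular is not a stock Rails task. The runner parameter is an assumption added here so the logic can be exercised without running real commands:

```ruby
# script/setup (sketch): run each setup step in order, stop at the first failure.

# Hypothetical steps; every application would list its own.
SETUP_STEPS = [
  "rake db:create",
  "rake db:schema:load",
  "rake db:bootstrap" # assumed app-specific seed task, not part of Rails
].freeze

# Returns true if every step succeeded, false as soon as one fails.
# The default runner shells out with Kernel#system.
def run_setup(steps, runner: ->(cmd) { system(cmd) })
  steps.each do |cmd|
    puts "==> #{cmd}"
    return false unless runner.call(cmd)
  end
  true
end
```

A real script would end with something like exit(run_setup(SETUP_STEPS) ? 0 : 1), and the README would tell people to run it once after installing.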

Would something like the Scenarios plugin solve your problem?

http://faithfulcode.rubyforge.org/docs/scenarios/

[snip]

> Again, I think this is a mistake and it was certainly not what migrations were designed for. They lead to all the pains and problems you're describing with migrations.
>
> I fully realize that people are misusing migrations in this way because they were missing a seed system and just grabbed something that had the same vague outline. But I think the problem then is to consider how to best do seeding. Not to twist migrations into a seed system.

I'm one of the people misusing migrations in this way, and a seed system could solve part of the problem, but not all of it.

I often use migrations for creating data that needs to be present in every environment. For example, a new account in an accounting table. I want to add that to an existing production application without reloading all of the data. By including it in a migration, I'm sure that it will exist in the database of every developer, our integration environment, and finally production. It keeps me from having to track down data bugs.

Of course, the fact that I also have to encode that data into my fixtures isn't very DRY. I'd love to have some way of specifying the data only once, I just don't have any brilliant ideas about how to do it.

[snip]

Mike Mangino http://www.elevatedrails.com

If we kept the integrity of migrations and moved to create something like SeedData or an implementation of Scenarios, then we can still keep the state that you want. We'd just have the normal migration tasks and then tasks for seeding the data and a task for migrating and then seeding.

IMO, I'd love to see a seed system that mimics migrations a bit and keeps the standard AR syntax that we are used to: Person.create(...).

Perhaps something like

class SeedPeople < ActiveRecord::SeedData
  def self.up
    create_data :people do |p|
      p.create(:name => "Robert", :password => "secret")
      p.create(:name => "John", :password => "supersecret")
    end
  end
end

I find that the above syntax feels comfortable to me - what do other people think of the above? John Long has a lot of this already done.

Are there issues for having a simple rake task for setting up just the test db or even just the production db (without loading any data)? e.g. rake db:create:test, rake db:create:production - Perhaps I'm just not seeing the reason why these aren't nice conveniences?

> IMO, I’d love to see a seed system that mimics migrations a bit and keeps the standard AR syntax that we are used to: Person.create(…).

If we have a version-based data-seeding system then we’ve really just created a parallel set of migrations. Same benefits, same problems. Once the models are out of date (say you move a field from one table to another) then your older seed files will be broken.

Yet, if we don’t use a version-based system then it’s difficult to know what actions to perform to update a given environment. Adding missing data is easy enough but how do you track when data was removed?

I can see now why we haven’t had any kind of data-seeding mechanism and why many of us (myself included) cannibalized migrations for that purpose.

::Jack Danger

I know that these tasks may not be for core, but for those that did find some of them useful, I have a plugin that includes them and some other tasks, including Tobias Lutke's backup task:

http://svn.robertrevans.com/plugins/data_tasks/