Hi,
As Rails developers, we often encounter situations where we need to modify data in production databases. This could be to correct erroneously created data or to populate data after a schema change.
Defining these operations as “data migrations,” I’d like to discuss the following points:
- What approaches exist for data migration on Rails?
- Is there any officially recommended approach?
What approaches exist for data migration on Rails?
Based on my experience and discussions with colleagues, I’ve identified several approaches:
(1) Performing data migrations alongside db:migrate
This involves writing data migration code in the `up`, `down`, or `change` methods of `ActiveRecord::Migration` subclasses in the `db/migrate` directory. I’ve heard that GitLab adopts this approach. This approach was also long documented in the Rails Guides.
(2) Preparing and executing separate scripts from db migration files
Often implemented as rake tasks, executed via `bin/rails my_task`. This clearly separates data migrations from schema migrations. I’m most familiar with this approach.
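To make approach (2) concrete, here is a minimal sketch of what such a rake task might look like. The task name, the `UserRecord` struct, and the `USERS` array are all hypothetical stand-ins so that the snippet runs on its own; in a real app the task would depend on `:environment` and work on Active Record models instead.

```ruby
require "rake"

# Hypothetical stand-in for an Active Record model so the sketch runs on its own.
UserRecord = Struct.new(:id, :name, :display_name)

USERS = [
  UserRecord.new(1, "alice", nil),
  UserRecord.new(2, "bob", "Bob"),
  UserRecord.new(3, "carol", nil)
]

namespace :data do
  desc "Backfill display_name for users that do not have one"
  # In a Rails app this would be `task backfill_display_names: :environment`
  # so that the models are loaded before the block runs.
  task :backfill_display_names do
    # Only touch records that still need the change, so a rerun is a no-op.
    pending = USERS.select { |user| user.display_name.nil? }
    pending.each { |user| user.display_name = user.name.capitalize }
    puts "Backfilled #{pending.size} users"
  end
end
```

Saved under `lib/tasks/` in a Rails app, a task like this would be run as `bin/rails data:backfill_display_names`. Selecting only the records that still need the change is one way to keep the task safe to rerun.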
(3) Executing SQL directly without using Rails
Connecting to the database as a user with write permissions and executing SQL. While it’s the most primitive approach, it’s less common in Rails projects, possibly because it bypasses model validations and callbacks. I’ve seen it used in non-Rails projects, such as those written in Go or Java.
(4) Using specialized gems
- GitHub - ilyakatz/data-migrate: Migrate and update data alongside your database structure.
  - Implement scripts for data migration. Run via the `data:migrate` rake task (not `db:migrate`).
- GitHub - ka8725/migration_data: Migrate data along with schema migrations in Rails and keep them up to date.
  - Runs scripts that hook into `db:migrate`.
- GitHub - pboling/seed_migration: Seed Migration
  - Almost the same as migration_data. Run via the `seed:migrate` rake task.
- GitHub - Shopify/maintenance_tasks: A Rails engine for queueing and managing data migrations.
The maintenance_tasks gem stands out by providing a rich GUI and features like job pausing and resuming.
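As far as I understand, gems with a dedicated migrate task, such as data-migrate, keep their own record of which data migrations have already run, much as `db:migrate` does for schema migrations. The sketch below shows that idea in plain Ruby. `DataMigrationRunner` and the migration names are hypothetical and do not reflect the API of any gem listed above; a real gem would persist the applied versions in a database table rather than in memory.

```ruby
require "set"

# Hypothetical runner; not the API of any of the gems listed above.
class DataMigrationRunner
  Migration = Struct.new(:version, :name, :body)

  attr_reader :applied_versions

  def initialize(applied_versions = [])
    # A real gem would load and store these versions in a database table.
    @applied_versions = Set.new(applied_versions)
    @migrations = []
  end

  def register(version, name, &body)
    @migrations << Migration.new(version, name, body)
  end

  # Runs pending migrations in version order and returns the names it ran.
  def migrate
    pending = @migrations
      .reject { |migration| @applied_versions.include?(migration.version) }
      .sort_by(&:version)
    pending.each do |migration|
      migration.body.call
      @applied_versions << migration.version
    end
    pending.map(&:name)
  end
end

log = []
runner = DataMigrationRunner.new([20240101000000]) # first version already applied
runner.register(20240101000000, "BackfillSlugs") { log << :slugs }
runner.register(20240301000000, "FixCurrencyCodes") { log << :currency }
runner.register(20240201000000, "NormalizeEmails") { log << :emails }

runner.migrate # runs NormalizeEmails, then FixCurrencyCodes
runner.migrate # nothing is pending, so nothing runs
```

Tracking versions this way is what lets a data migration run exactly once per environment, which ad hoc rake tasks and scripts have to handle themselves.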
Are there any other common or significant approaches that I’ve missed?
Is there any officially recommended approach?
For a long time, the Active Record Migration Guide demonstrated performing data migrations simultaneously with schema migrations. This was present until the guide refresh in June 2024.
```ruby
# quoted from https://guides.rubyonrails.org/v7.2/active_record_migrations.html#migrations-and-seed-data
class AddInitialProducts < ActiveRecord::Migration[7.2]
  def up
    5.times do |i|
      Product.create(name: "Product ##{i}", description: "A product.")
    end
  end

  def down
    Product.delete_all
  end
end
```
However, the current Active Record Migration Guide (as of September 2024) advises separating schema migrations from data migrations.
> In Rails, it is generally not advised to perform data migrations using migration files.
Additionally, in this discussion, it was mentioned that Basecamp places data migration scripts in `script/migrate/*`. This led to the addition of a `script` generator in Add script folder and generator by jeromedalbert · Pull Request #52335 · rails/rails · GitHub.
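For illustration, a one-off script in that style might look like the sketch below. The file name `script/migrate/normalize_currency_codes.rb`, the `Order` struct, and the `DRY_RUN` environment variable are my own assumptions, not something taken from Basecamp or the Rails generator. When run through the Rails runner, the app's models would already be loaded; here a `Struct` stands in so the snippet runs by itself.

```ruby
# Hypothetical contents of script/migrate/normalize_currency_codes.rb.
# Under the Rails runner, real models would be available; a Struct stands in here.
Order = Struct.new(:id, :currency)

ORDERS = [
  Order.new(1, "usd"),
  Order.new(2, "EUR"),
  Order.new(3, "jpy")
]

# Upcases currency codes. Returns the ids that were (or would be) changed.
def normalize_currency_codes(orders, dry_run:)
  targets = orders.reject { |order| order.currency == order.currency.upcase }
  targets.each { |order| order.currency = order.currency.upcase } unless dry_run
  targets.map(&:id)
end

# DRY_RUN=1 reports what would change without writing anything.
dry_run = ENV["DRY_RUN"] == "1"
changed_ids = normalize_currency_codes(ORDERS, dry_run: dry_run)
puts "#{dry_run ? 'Would update' : 'Updated'} orders: #{changed_ids.inspect}"
```

A dry-run switch like this is optional, but it gives a cheap way to check the scope of a change in production before committing to it.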
Given these points, it seems the current official recommendation might be:
- Separate schema migrations from data migrations.
- Implement Ruby scripts in `script/` and execute them via `ruby script/migrate/foo.rb` or `rails r script/migrate/foo.rb`.
- Or, except in certain cases, use GitHub - Shopify/maintenance_tasks: A Rails engine for queueing and managing data migrations.
The “in certain cases” refers to scenarios outlined in the maintenance_tasks README:
> If your task shouldn’t run as an Active Job, it probably isn’t a good match for this gem. If your task doesn’t need to run in the background, consider a runner script instead. If your task doesn’t need to be interruptible, consider a normal Active Job.
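To show what "interruptible" means in practice, here is a rough plain-Ruby sketch of a task that remembers a cursor so it can pause and later resume where it left off. `ResumableTask` is a hypothetical class for illustration only and is not the maintenance_tasks API; a real implementation would persist the cursor somewhere durable instead of keeping it in memory.

```ruby
# Hypothetical illustration of an interruptible task; not the maintenance_tasks API.
class ResumableTask
  attr_reader :cursor, :processed

  def initialize(items)
    @items = items
    @cursor = 0 # index of the next item to process
    @processed = []
  end

  # Processes items starting at the saved cursor until all are done or
  # until the `stop_when` callable returns true. Returns :paused or :finished.
  def run(stop_when: -> { false })
    while @cursor < @items.size
      return :paused if stop_when.call
      @processed << @items[@cursor].upcase
      @cursor += 1
    end
    :finished
  end
end

job = ResumableTask.new(%w[a b c d e])
budget = 2
job.run(stop_when: -> { (budget -= 1) < 0 }) # pauses after two items
job.run                                      # resumes from the cursor and finishes
```

If a task is small enough that restarting from the beginning is acceptable, this bookkeeping is unnecessary, which matches the README's advice to use a runner script or a normal Active Job in that case.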
Thoughts
I understand that the best approach may vary depending on system size, team size, and operational constraints. My goal is to make better decisions by taking both these factors and the recent changes described above into account.
Past discussions
I’ve reviewed past discussions on data migration in Rails Discussions, but they seem to be from the 2010s or earlier. I hope it’s worth bringing up this topic again in 2024.