Patterns for "data-only" Rails migrations

How do you manage data migrations that don't fit within those constraints?

One case I had recently:

I had a field with two possible values per user that, for various reasons, we had to transform, in a table that had around 3-4 rows per group of users.

That transformation was not trivial: it not only had to go through some complicated business logic, it also had to emit notification events along the way for things like traceability.

In my experience, running these within schema migrations quickly becomes problematic as your data grows, and depending on your deploy strategy it can even become dangerous.

The solutions I've seen so far for these cases (at different companies) were just ad-hoc scripts that somebody ran on the production server, like `bundle exec ruby some_script.rb`, or, in the best case, a rake task.

Don't get me wrong, I'm not saying this approach is wrong. But I wonder what other tooling we could have around processes like this to make them easier. I'm also not sure it fits into Rails.

Some of the things I have in mind, which I think better data migration tooling would provide:

  1. Specific instrumentation: having tools like Datadog or New Relic simply hook into Rails' instrumentation support is amazing; it saves time and headaches and makes issues easier to solve. I miss having this for data migrations without having to build it myself.
  2. User input capabilities: in more than one case I've wanted to do things like "if we reach this case, stop and ask a domain expert before continuing."
  3. Notifications: summaries of the actions that were performed, the time they took, and any human interventions.
  4. Versions: something like what you get with `rails db:migrate:status`, but for data migrations (I think there are already some gems that do this).
  5. Good output by default: knowing how long the process will take, or how much of what I need to migrate has already been migrated.
  6. Testability: I'd love a super simple way to write specific fixtures that I could use to write tests for these cases (or to test them manually).
  7. Disposability: many of these migrations only need to run once and can be removed after some time, and removing them should also make it easy to remove their tests and fixtures.
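To make the wish list above a bit more concrete, here is a minimal, stdlib-only Ruby sketch of what such tooling could look like. Everything in it is hypothetical: `DataMigration`, `DataMigrationRunner`, `RenameContactPreference`, the `ask:` callback and the `"mail"`/`"phone"` values are made up for illustration and are not an existing gem's API. It touches on progress output, a run summary, a status ledger in the spirit of `rails db:migrate:status`, a "stop and ask a human" hook, and an event log where a Rails app might call `ActiveSupport::Notifications.instrument` instead.

```ruby
require "set"
require "stringio" # handy for capturing progress output in tests

# Hypothetical base class for a one-off data migration (not an existing gem's API).
# Subclasses implement #migrate and may override #needs_review?.
class DataMigration
  attr_reader :version, :events

  # ask: called for records that need a human decision.
  #      Returning true skips the record and continues; false halts the run.
  # out: where progress lines are written.
  def initialize(version, records, ask: ->(_record) { false }, out: $stdout)
    @version = version
    @records = records
    @ask = ask
    @out = out
    @events = []
  end

  def run
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    total = @records.size
    migrated = 0
    skipped = 0
    halted = false

    @records.each_with_index do |record, index|
      if needs_review?(record)
        unless @ask.call(record)
          halted = true # stop here so a domain expert can take a look
          break
        end
        skipped += 1
      else
        migrate(record)
        migrated += 1
      end
      @out.puts format("[%s] %d/%d (%d%%)", version, index + 1, total, (index + 1) * 100 / total)
    end

    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    summary = { version: version, migrated: migrated, skipped: skipped,
                halted: halted, seconds: elapsed.round(3) }
    emit(:finished, summary)
    summary
  end

  def needs_review?(_record)
    false
  end

  def migrate(_record)
    raise NotImplementedError, "#{self.class} must implement #migrate"
  end

  private

  # Records events in memory; a Rails app could call
  # ActiveSupport::Notifications.instrument(name, payload) here instead.
  def emit(name, payload)
    @events << [name, payload]
  end
end

# Hypothetical runner with a status ledger in the spirit of db:migrate:status.
# The ledger is an in-memory Set here; a real tool would persist it in a table.
class DataMigrationRunner
  def initialize(ledger = Set.new)
    @ledger = ledger
  end

  def run(migration)
    return { version: migration.version, already_ran: true } if @ledger.include?(migration.version)

    summary = migration.run
    @ledger << migration.version unless summary[:halted] # halted runs stay "down"
    summary
  end

  def status(migrations)
    migrations.to_h { |m| [m.version, @ledger.include?(m.version) ? "up" : "down"] }
  end
end

# Made-up example: map a two-valued field to new values, emitting an event per record.
class RenameContactPreference < DataMigration
  MAPPING = { "mail" => "email", "phone" => "sms" }.freeze

  def needs_review?(record)
    !MAPPING.key?(record[:preference])
  end

  def migrate(record)
    from = record[:preference]
    record[:preference] = MAPPING.fetch(from)
    emit(:record_migrated, { id: record[:id], from: from, to: record[:preference] })
  end
end
```

For instance, running `RenameContactPreference` over three in-memory records where the last one has an unexpected value, with `ask: ->(_record) { true }`, migrates two records, skips the third, prints progress lines ending in `3/3 (100%)`, and marks the version as "up"; with the default `ask` it halts at the unexpected record and leaves the version "down". Working on plain hashes keeps the example testable without a database, which also hints at the testability point; a real implementation would persist the ledger and load records in batches (for example with `find_each`).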

Again, I'm sure that in many, many cases nothing will beat a well-written SQL query. But what about the more complicated cases, the ones where you need to scale up your solution? How do others handle data changes beyond schema migrations? Does this not exist in the Rails world because it's not such a common itch? Does it exist and I'm just not aware of it? Would anybody else like to join if I decided to build it?

I'm also not sure that something like this should be supported by Rails, but since the topic came up here, I couldn't resist asking just in case.

PS: Thanks, everybody, for the May of WTFs!