Consider sorting of hashes for schema_cache.yml

Per rails/rails issue #42717 ("Consider sorting of hashes for schema_cache.yml"), I was told to get some feedback here.

Should the db/schema_cache.yml be sorted?

Steps to reproduce

  1. Two developers have migrations run in different orders
  2. Each developer runs rake db:schema:cache:dump

Expected behavior

Contents of the schema_cache.yml produced on 2 separate machines/databases for the same exact schema (although column orders might be different) will be exactly the same.

Actual behavior

Contents of the schema_cache.yml produced on 2 separate machines/databases for the same exact schema (although column orders might be different) will vary by sort order only.
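To illustrate the difference, here is a minimal sketch using plain hashes rather than the real SchemaCache. The table and column data and the deep_sort helper are made up for illustration. It relies on two things: Ruby hashes compare equal regardless of insertion order, and YAML.dump writes keys in insertion order.

```ruby
require "yaml"

# Hypothetical stand-ins for the per-table column data held by the schema cache.
# Same content on both machines, but inserted in a different order.
machine_a = {
  "users" => { "id" => "integer", "email" => "string" },
  "posts" => { "id" => "integer", "title" => "string" }
}
machine_b = {
  "posts" => { "title" => "string", "id" => "integer" },
  "users" => { "email" => "string", "id" => "integer" }
}

# Hypothetical helper: sort the inner hashes by key, then the outer hash by key.
def deep_sort(hash)
  hash.transform_values { |cols| cols.sort.to_h }.sort.to_h
end

# Hash equality ignores insertion order, so the content is "the same".
same_content = machine_a == machine_b

# YAML.dump follows insertion order, so the raw dumps differ.
raw_dump_matches = YAML.dump(machine_a) == YAML.dump(machine_b)

# After sorting, both machines produce identical YAML.
sorted_dump_matches = YAML.dump(deep_sort(machine_a)) == YAML.dump(deep_sort(machine_b))
```

The real cache holds column and index objects rather than strings, so the actual sort needs a key such as name.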

Monkey Patch to Sort

class ActiveRecord::ConnectionAdapters::SchemaCache
  def sort_by_hash_of_array_of_named_objects(hash)
    hash.transform_values { |v| v.sort_by(&:name) }.sort.to_h
  end

  def sort_all!
    # rubocop:disable Rails/Output
    puts 'MONKEY PATCH (schema_cache_extension.rb): Sorting @columns, @columns_hash, @primary_keys, @data_sources, @indexes of ActiveRecord::ConnectionAdapters::SchemaCache'
    # rubocop:enable Rails/Output
    @columns = sort_by_hash_of_array_of_named_objects(@columns)

    # This is a Hash of Hashes
    @columns_hash = @columns_hash.transform_values { |v| v.sort.to_h }.sort.to_h

    @primary_keys = @primary_keys.sort.to_h
    @data_sources = @data_sources.sort.to_h

    @indexes = sort_by_hash_of_array_of_named_objects(@indexes)
  end
end

module ActiveRecord::Tasks::DatabaseTasks
  def dump_schema_cache(conn, filename)
    # rubocop:disable Rails/Output
    puts 'MONKEY PATCH (schema_cache_extension.rb): ActiveRecord::Tasks::DatabaseTasks.dump_schema_cache adjusted to sort'
    # rubocop:enable Rails/Output
    conn.schema_cache.clear!
    conn.data_sources.each { |table| conn.schema_cache.add(table) }
    conn.schema_cache.sort_all!

    # This is the line that skipped the compression:
    # open(filename, 'wb') { |f| f.write(YAML.dump(conn.schema_cache)) }

    # Correct line:
    conn.schema_cache.dump_to(filename)
  end
end

On a related note, jakeonrails/fix-db-schema-conflicts on GitHub does something similar for db/schema.rb files.

Are you adding that file in the repository? That file is not supposed to be committed.

If you generate the cache in development and deploy the code before running the migration, the framework will think you have a schema that doesn’t match the actual production schema.

The idea of this file is for it to be generated based on the production database right before the rails server is started.

Would the order of the hashes be relevant if the file was not committed?

That’s why I’m not using it right now…

But in case there’s a good reason to have the generation be consistent, then we could use it.

For example, suppose we generate the schema_cache.yml on a Heroku staging app in a Heroku pipeline. Then on production, in the release phase, we could verify that the file is the same as the one we put in the slug. How did we get the schema_cache.yml in the slug? We generate it on CI and cache it to S3. Then, in the build phase on Heroku, we grab the file based on a cache key built from an MD5 of the schema.rb and some other factors.
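A rough sketch of that idea, not our actual build scripts: the helper names and the choice of extra factors below are hypothetical. It only shows a cache key derived from the schema.rb contents, and a release-phase comparison that becomes meaningful once the dump is deterministic.

```ruby
require "digest/md5"

# Hypothetical helper: derive a cache key from the schema.rb contents plus
# whatever other factors should invalidate the cached schema_cache.yml.
def schema_cache_key(schema_contents, *extra_factors)
  Digest::MD5.hexdigest([schema_contents, *extra_factors].join("\n"))
end

# Hypothetical release-phase check: does the schema_cache.yml shipped in the
# slug match one freshly dumped from the target database? A byte-for-byte
# comparison like this only works if the dump is sorted consistently.
def schema_cache_matches?(shipped_yaml, fresh_yaml)
  Digest::MD5.hexdigest(shipped_yaml) == Digest::MD5.hexdigest(fresh_yaml)
end

# Example usage with made-up inputs.
key = schema_cache_key("ActiveRecord::Schema.define(version: 1) {}", "ruby-3.0", "rails-6.1")
```

If the release-phase check fails, the safe reaction is presumably to regenerate the cache against the production database rather than boot with the shipped one.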

I’m not against making it consistent. I actually think it would be better. But, sharing the same cache between different environments is asking for trouble, even between staging and production.

Would you mind opening a PR to make the content consistent?
