Horizontal sharding schema management

Does anybody use the new horizontal sharding support in ActiveRecord 6.1 combined with schema management and migrations? I found that it creates a schema file for every shard. Is this intentional? Is there a way to prevent this?

We have a model that is functionally partitioned off from our primary database and horizontally sharded. Right now we do this with a database configuration per shard and a model per shard that calls establish_connection with the appropriate configuration. We manage the schema for these shards manually.

This is a simplified example:

# config/database.yml
development: ...
chunks_shard_one: ...
chunks_shard_two: ...
module Chunks
  class Base < ActiveRecord::Base
    # There is no `chunks` table in the primary database, so this model would
    # fail if used, but we want all child classes to have the same table name.
    self.table_name = "chunks"
  end

  class ShardOne < Base
    establish_connection :chunks_shard_one
  end

  class ShardTwo < Base
    establish_connection :chunks_shard_two
  end

  SHARDS = {shard_one: ShardOne, shard_two: ShardTwo}

  def self.for(supermodel)
    SHARDS.fetch(supermodel.shard_id).where(supermodel_id: supermodel.id).all
  end
end

I’d like to be able to do this, and have each shard based on the same schema and migrated from the same migrations:

# config/database.yml
development:
  primary: ...
  chunks_shard_one:
    ...
    migrations_paths: db/chunks_migrate
  chunks_shard_two:
    ...
    migrations_paths: db/chunks_migrate
class ChunkRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    shard_one: { writing: :chunks_shard_one, reading: :chunks_shard_one },
    shard_two: { writing: :chunks_shard_two, reading: :chunks_shard_two },
  }
end

class Chunk < ChunkRecord
end

module Chunks
  def self.for(supermodel)
    ChunkRecord.connected_to(role: :writing, shard: supermodel.shard_id) do
      # Load inside the block: a lazy relation returned from it would run on the default shard.
      Chunk.where(supermodel_id: supermodel.id).to_a
    end
  end
end
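connected_to only switches the shard while its block runs, and Active Record relations are lazy, so a relation returned from the block is queried later, after the shard has been switched back. The plain-Ruby sketch below models that behaviour without Rails. `FakeShardContext` and `LazyQuery` are hypothetical stand-ins, not ActiveRecord APIs; they only illustrate why the result should be loaded (e.g. with `to_a`) inside the block.

```ruby
# Hypothetical stand-in for block-scoped shard switching (NOT the ActiveRecord API).
module FakeShardContext
  DEFAULT = :default

  def self.current
    Thread.current[:fake_shard] || DEFAULT
  end

  # Switch to +shard+ while the block runs, restoring the previous value
  # even if the block raises.
  def self.connected_to(shard:)
    previous = Thread.current[:fake_shard]
    Thread.current[:fake_shard] = shard
    yield
  ensure
    Thread.current[:fake_shard] = previous
  end
end

# A lazy "query": it only looks up the current shard when it is loaded.
LazyQuery = Struct.new(:sql) do
  def load
    "#{sql} on #{FakeShardContext.current}"
  end
end

lazy  = FakeShardContext.connected_to(shard: :shard_one) { LazyQuery.new("SELECT chunks") }
eager = FakeShardContext.connected_to(shard: :shard_one) { LazyQuery.new("SELECT chunks").load }

puts lazy.load # prints: SELECT chunks on default   (loaded after the block exited)
puts eager     # prints: SELECT chunks on shard_one (loaded inside the block)
```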

But when I now run:

bin/rails db:prepare

I get two schema files:

db/chunks_shard_one_schema.rb
db/chunks_shard_two_schema.rb

Where I only want:

db/chunks_schema.rb

Poking around in ActiveRecord::Tasks::DatabaseTasks and friends, it looks like the filename is based on the configuration name, such as chunks_shard_one. That makes sense until you start thinking about horizontal sharding.
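A plain-Ruby sketch of the naming rule as we understand it from DatabaseTasks.dump_filename in 6.1: the primary configuration gets `schema.rb` (or `structure.sql`), and every other configuration is prefixed with its own name, which is why each shard ends up with its own file. `default_dump_filename` is a hypothetical helper; it ignores `ENV["SCHEMA"]` and simplifies how the primary configuration is detected.

```ruby
# Hypothetical helper approximating the default dump filename rule.
def default_dump_filename(config_name, primary: false, format: :ruby, db_dir: "db")
  file_type = format == :sql ? "structure.sql" : "schema.rb"
  # Non-primary configs are prefixed with their configuration name.
  filename = primary ? file_type : "#{config_name}_#{file_type}"
  File.join(db_dir, filename)
end

%w[primary chunks_shard_one chunks_shard_two].each do |name|
  puts default_dump_filename(name, primary: name == "primary")
end
# prints:
# db/schema.rb
# db/chunks_shard_one_schema.rb
# db/chunks_shard_two_schema.rb
```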

I can’t see a way to override the filename per configuration, but something like this might head toward a good solution:

# config/database.yml
development:
  primary: ...
  chunks_shard: &chunks_shard
    schema_path: db/chunks_schema.rb
    migrations_paths: db/chunks_migrate
  chunks_shard_one:
    <<: *chunks_shard
    ...
  chunks_shard_two:
    <<: *chunks_shard
    ...
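To check that the anchor/merge-key layout actually gives both shards the same `schema_path`, the sketch below loads a filled-in version of that YAML with Ruby's standard library and resolves a schema file per configuration. The database names are made up, and `schema_path` is the proposed key, not a built-in Rails option; `schema_file_for` is a hypothetical helper.

```ruby
require "yaml"

# Hypothetical filled-in database.yml (database names are invented).
DATABASE_YML = <<~YML
  development:
    primary:
      database: app_development
    chunks_shard: &chunks_shard
      schema_path: db/chunks_schema.rb
      migrations_paths: db/chunks_migrate
    chunks_shard_one:
      <<: *chunks_shard
      database: chunks_one_development
    chunks_shard_two:
      <<: *chunks_shard
      database: chunks_two_development
YML

# safe_load rejects YAML aliases unless aliases: true is passed.
configs = YAML.safe_load(DATABASE_YML, aliases: true).fetch("development")

# Prefer an explicit schema_path; otherwise fall back to the per-name default.
def schema_file_for(name, config)
  return "db/schema.rb" if name == "primary"

  config["schema_path"] || File.join("db", "#{name}_schema.rb")
end

configs.each { |name, config| puts "#{name}: #{schema_file_for(name, config)}" }
```

Note that in this layout `chunks_shard` is itself an entry under `development`, so Rails would presumably treat it as a database configuration of its own as well.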

I’ve been staring at this too long and feel like I might be missing something. :sweat_smile:

Or am I holding it wrong? Is this a gap? Is there a fix planned? Or would a contribution be welcome?


I’ve created a workaround for now that adds schema_path as a database configuration option, alongside schema_cache_path and migrations_paths:

# config/initializers/active_record_schema_path.rb

# Teach ActiveRecord to accept a schema/structure file path in a database
# configuration as `schema_path`, alongside `schema_cache_path` and
# `migrations_paths`, allowing horizontal shards to share the same schema,
# schema cache, and migrations.
ActiveSupport.on_load(:active_record) do
  # Check for an instance method; respond_to? on the class only finds class methods.
  if ActiveRecord::DatabaseConfigurations::DatabaseConfig.method_defined?(:schema_path)
    raise "Has this patch been upstreamed? Is it time to remove this file?"
  end

  module ActiveRecordSchemaPathDatabaseConfig
    extend ActiveSupport::Concern

    def schema_path
      raise NotImplementedError
    end
  end

  ActiveRecord::DatabaseConfigurations::DatabaseConfig.prepend ActiveRecordSchemaPathDatabaseConfig

  module ActiveRecordSchemaPathHashConfig
    extend ActiveSupport::Concern

    def schema_path
      configuration_hash[:schema_path]
    end
  end

  ActiveRecord::DatabaseConfigurations::HashConfig.prepend ActiveRecordSchemaPathHashConfig

  module ActiveRecordSchemaPathDatabaseTasks
    extend ActiveSupport::Concern

    def dump_filename(db_config_name, format = ActiveRecord::Base.schema_format)
      if config = ActiveRecord::Base.configurations.configs_for(name: db_config_name)
        if config.schema_path.present?
          return config.schema_path
        end
      end

      super
    end
  end

  ActiveRecord::Tasks::DatabaseTasks.singleton_class.prepend ActiveRecordSchemaPathDatabaseTasks
end
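The core of the workaround is Ruby's prepend-and-super pattern: a module prepended to DatabaseTasks' singleton class intercepts dump_filename, returns `schema_path` when one is configured, and otherwise calls `super` to get the stock behaviour. The sketch below reproduces just that pattern without Rails; `ToyConfig`, `ToyTasks`, and `SchemaPathOverride` are hypothetical stand-ins for HashConfig, DatabaseTasks, and the patch module.

```ruby
# Hypothetical stand-ins: ToyConfig plays HashConfig, ToyTasks plays DatabaseTasks.
ToyConfig = Struct.new(:name, :configuration_hash)

module ToyTasks
  CONFIGS = [
    ToyConfig.new("chunks_shard_one", { schema_path: "db/chunks_schema.rb" }),
    ToyConfig.new("reports", {})
  ].freeze

  # "Stock" behaviour: one file per configuration name.
  def self.dump_filename(db_config_name)
    File.join("db", "#{db_config_name}_schema.rb")
  end
end

# Same shape as the initializer's override: prefer schema_path, else super.
module SchemaPathOverride
  def dump_filename(db_config_name)
    config = ToyTasks::CONFIGS.find { |c| c.name == db_config_name }
    path = config && config.configuration_hash[:schema_path]
    return path if path && !path.empty?

    super
  end
end

# Prepending to the singleton class puts the override ahead of the original.
ToyTasks.singleton_class.prepend(SchemaPathOverride)

puts ToyTasks.dump_filename("chunks_shard_one") # prints: db/chunks_schema.rb
puts ToyTasks.dump_filename("reports")          # prints: db/reports_schema.rb
```

Prepending rather than reopening the module keeps the original method reachable through `super`, so configurations without `schema_path` keep the default naming.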

Also looking for a solution to this. Since all shards share the same migrations, why don’t they share the same schema file?


We ended up shipping the patch I posted above. I might open a PR upstream.

Thanks for the awesome information.