[Feature Proposal] define a shards method for ActiveRecord abstract classes

ritikesh · December 8, 2020, 8:06pm

With native horizontal sharding in place, an API for running snippets across all shards of a given ActiveRecord::Base abstract class (ApplicationRecord in most cases) would be a definite nice-to-have.

def run_something_on_all_shards
  ApplicationRecord.shards.each do |shard|
    ApplicationRecord.connected_to(shard: shard, role: :writing) do
      # execute some script for data collection / updates.
    end
  end
end

The alternative to this would be to provide a hook to execute a block of code on all connections, but I felt like that’s application logic.

def run_something_on_all_shards
  ApplicationRecord.on_all_shards(role: :writing).each do |shard|
    # execute some script for data collection / updates.
  end
end

This is a regular use case when running one-off scripts across all your tenants. Keeping/reading shard information across multiple PODs for every run isn’t practical.

Happy to help with a PR based on suggestions.

rafaelfranca · December 8, 2020, 9:46pm

I’m positive on this feature, although in our codebase we disallow this operation, except in a few contexts like data migrations, given it is easier to reason about a unit of work (job or request) if its entirety is always inside the same shard, and in our case, for the same tenant.

@eileencodes do you have an opinion about this?

ritikesh · December 8, 2020, 9:56pm

Likewise. It isn’t used within the codebase. But it does come up quite frequently during migrations and when reading/extracting data out of multiple tenants spread across shards - like on some specific plans, etc.

eileencodes · December 8, 2020, 10:23pm

I’m not against the feature from a “nice to have” standpoint, but I think it’d be difficult to do technically.

At the moment Rails has no concept of what shards there are until the model is actually connected. We can’t read connects_to :shards from models until the models are loaded and therefore connected to the databases. To accomplish this we’d need a method where shard names are set, and then would need to verify the connects_to against that set.
We don’t currently force applications to use the same shards across clusters. It’s possible for ApplicationRecord to have shards one and two and AnimalsRecord to have one, two, and three - or really anything. The only requirement is a default shard.
There isn’t a way to define a method that would be able to find all classes with connections and run a migration on them. It would have to be per-class and defined in connection_handling.rb like connected_to is. That way we can call on ApplicationRecord or on AnimalsRecord from the app. Rails can’t do this for you because in non-eager loaded environments connection classes other than ApplicationRecord are not automatically loaded/connected. They only connect once they’re called.
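
One way to picture the "set the shard names first, then verify connects_to against that set" idea is the plain-Ruby sketch below. Everything in it is hypothetical: `declare_shards`, `declared_shards`, and `verify_shards!` are illustrative names that do not exist in Rails, and the check only compares hash keys rather than opening any connection. It also shows that each cluster class keeps its own set, so the sets are free to differ.

```ruby
# Hypothetical sketch in plain Ruby; none of these methods exist in Rails.
module DeclaredShards
  class UnknownShardError < StandardError; end

  # Declare the shard names up front, before any connection is made.
  def declare_shards(*names)
    @declared_shards = names.map(&:to_sym).freeze
  end

  # Each class that extends this module keeps its own list.
  def declared_shards
    @declared_shards || [:default].freeze
  end

  # Stand-in for the verification step: compares the shard keys that would be
  # passed to connects_to against the declared set.
  def verify_shards!(shards)
    unknown = shards.keys.map(&:to_sym) - declared_shards
    raise UnknownShardError, "undeclared shards: #{unknown.inspect}" unless unknown.empty?
    true
  end
end

class AppClusterSketch
  extend DeclaredShards
  declare_shards :default, :shard_one
end

class AnimalsClusterSketch
  extend DeclaredShards
  declare_shards :default, :shard_one, :shard_two
end
```

Because the list lives on each class, a caller would still have to ask AppClusterSketch and AnimalsClusterSketch separately, which matches the per-class constraint described above.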

ritikesh · December 9, 2020, 10:50am

Would we be able to leverage the options passed to connects_to to store the list of shard names in an accessor as a set?

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # sets up connections and also the shard_names accessor to [:default, :shard_one]
  connects_to shards: {
    default: { writing: :primary, reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }
end

If the user were to pass only the database option to another abstract class, say AnimalsRecord, I believe we internally still map this to the default shard, so shard_names can be set to the same:

class AnimalsRecord < ApplicationRecord
  self.abstract_class = true

  # sets up connections and also the shard_names accessor to [:default]
  connects_to database: { writing: :animals }
end
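
As a rough illustration of how such an accessor could be derived from the options, here is a plain-Ruby stand-in. It is only a sketch under assumptions: it does not touch ActiveRecord, it is not the implementation from any Rails patch, and the `*Sketch` class names and the inheritance fallback in `shard_names` are made up for the example.

```ruby
# Plain-Ruby stand-in for the idea; it does not touch ActiveRecord.
module ShardNamesSketch
  # Mimics the keyword shape of connects_to but only records shard names.
  def connects_to(database: nil, shards: {})
    raise ArgumentError, "pass either database: or shards:, not both" if database && shards.any?
    # A bare database: option is treated as a single :default shard (assumption).
    @shard_names = shards.any? ? shards.keys : [:default]
  end

  # Falls back to the parent class, so classes that never call connects_to
  # report the list of the abstract class they inherit from.
  def shard_names
    return @shard_names if defined?(@shard_names) && @shard_names
    superclass.respond_to?(:shard_names) ? superclass.shard_names : []
  end
end

class BaseSketch
  extend ShardNamesSketch
end

class ApplicationRecordSketch < BaseSketch
  connects_to shards: {
    default: { writing: :primary, reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }
end

class AnimalsRecordSketch < ApplicationRecordSketch
  connects_to database: { writing: :animals }
end

class PostSketch < ApplicationRecordSketch; end
```

In this sketch ApplicationRecordSketch reports both shards, AnimalsRecordSketch reports only the default shard, and PostSketch inherits the list from its abstract parent.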

I can then do the below or even write my own wrapper around it:

# for ApplicationRecord
ApplicationRecord.shards.each do |shard|
  ApplicationRecord.connected_to(shard: shard, role: :writing) do
    # execute some script for data collection / updates.
  end
end
# for AnimalsRecord
AnimalsRecord.shards.each do |shard|
  AnimalsRecord.connected_to(shard: shard, role: :writing) do
    # execute some script for data collection / updates.
  end
end
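
A wrapper of the kind mentioned above might look like the following sketch. It assumes only that the host class responds to `shard_names` (the hypothetical accessor discussed in this thread) and to `connected_to(shard:, role:)`. The `on_each_shard` name and the `FakeRecord` stand-in are invented so the example runs without a database.

```ruby
# Hypothetical wrapper; assumes the host class responds to shard_names and connected_to.
module EachShardSketch
  # Runs the block once per shard inside connected_to and returns the
  # block results keyed by shard name.
  def on_each_shard(role: :writing)
    raise ArgumentError, "a block is required" unless block_given?
    shard_names.each_with_object({}) do |shard, results|
      connected_to(shard: shard, role: role) { results[shard] = yield(shard) }
    end
  end
end

# Stand-in for an abstract record class; it records the switches instead of
# touching a real connection pool.
class FakeRecord
  extend EachShardSketch

  def self.calls
    @calls ||= []
  end

  def self.shard_names
    [:default, :shard_one]
  end

  def self.connected_to(shard:, role:)
    calls << [shard, role]
    yield
  end
end

results = FakeRecord.on_each_shard { |shard| "ran on #{shard}" }
```

Keeping the wrapper in application code rather than in the framework fits the earlier point that running a block on every connection is closer to application logic.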

Let me know your thoughts and thank you both for responding.

ritikesh · December 11, 2020, 7:28pm

Hi @eileencodes / @rafaelfranca,

I was able to achieve this using a shard_names accessor in ActiveRecord::ConnectionHandling. Here is the patch - https://github.com/rails/rails/compare/master...ritikesh:shard_names?expand=1. With this implementation, I am able to run this:

# for ApplicationRecord
ApplicationRecord.shard_names.each do |shard|
  ApplicationRecord.connected_to(shard: shard, role: :writing) do
    # execute some script for data collection / updates.
  end
end
# for AnimalsRecord
AnimalsRecord.shard_names.each do |shard|
  AnimalsRecord.connected_to(shard: shard, role: :writing) do
    # execute some script for data collection / updates.
  end
end

I will add tests and docs if the approach seems fine to you.
