Rails 7 Sharding + Multiple Databases "no connection pool" error

Hey folks, I’d appreciate a sanity check – I feel like what we need is achievable, but I’m missing something along the way.

Our database has been horizontally sharded for close to a decade, and to accomplish that we’ve been monkeypatching ActiveRecord in a bunch of ways. With Rails shards, I’d like to finally remove that monkeypatch and get back to convention – it will let us implement better caching, connection pooling, and so on.

We have a database architecture that resembles this fictional list of databases and tables:

  • National Database
    • Table: Presidents (name: String)
  • Geographical Database Shards
    • Table: Students (name: String)
    • US East Shard
    • US West Shard

I’ve mirrored that setup in a dummy Rails app like this:

# database.yml
default: &default
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  national:
    <<: *default
    database: national
    migrations_paths: db/national_migrations
  students_us_east:
    <<: *default
    database: students_us_east
    migrations_paths: db/geographical_migrations
  students_us_west:
    <<: *default
    database: students_us_west
    migrations_paths: db/geographical_migrations

# models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class
end

# models/national_record.rb
class NationalRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: { writing: :national }
end

# models/geographical_record.rb
class GeographicalRecord < ApplicationRecord
  self.abstract_class = true

  connects_to shards: {
    us_east: { writing: :students_us_east },
    us_west: { writing: :students_us_west }
  }
end

# models/student.rb
class Student < GeographicalRecord
end

# models/president.rb
class President < NationalRecord
end
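
As a quick sanity check on the configuration above, here is a minimal sketch that confirms every name passed to connects_to has a matching entry in database.yml. It uses only the Ruby standard library and does not touch Rails. The helper name missing_database_configs and the constants are hypothetical, made up for this illustration, and the YAML is inlined rather than read from config/database.yml.

```ruby
require "yaml"
require "erb"

# Hypothetical helper (not part of Rails): returns the names referenced in
# connects_to that have no matching entry in database.yml for `env`.
def missing_database_configs(yaml_text, env, referenced_names)
  rendered = ERB.new(yaml_text).result              # evaluate the <%= %> tags
  config = YAML.safe_load(rendered, aliases: true)  # allow the &default anchor
  defined = config.fetch(env, {}).keys
  referenced_names.map(&:to_s) - defined
end

# Copy of the database.yml from this post, inlined for the example.
DATABASE_YML = <<~YAML
  default: &default
    adapter: postgresql
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    timeout: 5000

  development:
    national:
      <<: *default
      database: national
      migrations_paths: db/national_migrations
    students_us_east:
      <<: *default
      database: students_us_east
      migrations_paths: db/geographical_migrations
    students_us_west:
      <<: *default
      database: students_us_west
      migrations_paths: db/geographical_migrations
YAML

# The database names used in the connects_to calls of the models above.
REFERENCED = %i[national students_us_east students_us_west].freeze

p missing_database_configs(DATABASE_YML, "development", REFERENCED)
```

An empty array means every referenced name is defined for that environment, which suggests looking for the problem outside the name mapping itself.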

Now, I would think I could create and query records from the three databases, like this:

> rails db:setup db:migrate
Running via Spring preloader in process 82253
Created database 'national'
Created database 'students_us_east'
Created database 'students_us_west'
> rails c
Running via Spring preloader in process 95494
Loading development environment (Rails 7.2.1.1)
vanilla-rails-app(dev)* ActiveRecord::Base.connected_to(shard: :us_east, role: :writer) do
vanilla-rails-app(dev)*   Student.create(name: "bobby tables")
vanilla-rails-app(dev)*   puts Student.first.name
vanilla-rails-app(dev)> end

But alas, I’ve done something wrong. The error message I get is:

/.asdf/installs/ruby/3.3.0/lib/ruby/gems/3.3.0/gems/irb-1.14.1/lib/irb.rb:1260:in `full_message': No connection pool for 'GeographicalRecord' found for the 'us_east' shard. (ActiveRecord::ConnectionNotEstablished)
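
One detail that may be worth ruling out: the models declare the role as writing, while the console call above passes role: :writer. Rails' default role names are :writing and :reading. The sketch below is a toy model, not ActiveRecord's implementation. It assumes only that a pool is found by the combination of owning class, role, and shard, and shows how a mismatched role would miss even though the shard exists. The POOLS hash and lookup_pool helper are hypothetical. This would not account for the connected_to_many attempt further down, which uses role: :writing and reports the same error.

```ruby
# Toy model only: a Hash keyed by [owner, role, shard]. This is NOT how
# ActiveRecord stores pools internally; it only illustrates the lookup key.
POOLS = {
  ["GeographicalRecord", :writing, :us_east] => "students_us_east",
  ["GeographicalRecord", :writing, :us_west] => "students_us_west"
}.freeze

# Hypothetical helper: returns the database name, or nil when nothing matches.
def lookup_pool(owner, role:, shard:)
  POOLS[[owner, role, shard]]
end

# Role matches what connects_to declared, so the lookup succeeds.
p lookup_pool("GeographicalRecord", role: :writing, shard: :us_east)

# Same shard, but :writer was never registered, so the lookup returns nil.
p lookup_pool("GeographicalRecord", role: :writer, shard: :us_east)
```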

Here’s what I’ve tried so far:

  • GeographicalRecord.connected_to(shard: :us_east, role: :writer) do (as above)

  • ActiveRecord::Base.connected_to(…) (calling connected_to on ActiveRecord::Base instead of my own parent class)

  • I’ve read and inserted debugger statements in a ton of the connection lookup and handling code for active_record, and I can see a lot of what’s going on but I’m still missing something.

  • Recreated the setup in a dedicated, brand-new application, building everything with generators:

    • rails new vanilla_rails_app --database=postgresql
    • Edit database.yml to above
    • rails generate scaffold presidents name --database national
    • rails generate scaffold students name --database=students_us_east --parent=GeographicalRecord
      • The generator kind of fell over here, not knowing what to do with students_us_east and GeographicalRecord. I edited files to match above.

Hey Vox! Welcome.

Did you try it without using primary_abstract_class? As in, NationalRecord and GeographicalRecord inherit from ActiveRecord::Base?

@ridiculous thanks!

I did originally try to inherit everything from ApplicationRecord, yes.

I also tried manually declaring primary_abstract_class on each of the base classes – ActiveRecord complains about that.

It occurred to me this afternoon that the problem may not be in my configuration, but in how I activate the shard. I tried the somewhat underdocumented connected_to_many method, which isn’t in the guides, to be very explicit about the databases I wanted to use:

vanilla-rails-app(dev)* ActiveRecord::Base.connected_to_many(NationalRecord, role: :writing) do
vanilla-rails-app(dev)*   ActiveRecord::Base.connected_to_many(GeographicalRecord, role: :writing, shard: :us_east) do
vanilla-rails-app(dev)*     Student.first
vanilla-rails-app(dev)*   end
vanilla-rails-app(dev)> end
/.asdf/installs/ruby/3.3.0/lib/ruby/gems/3.3.0/gems/irb-1.14.1/lib/irb.rb:1260:in `full_message': No connection pool for 'GeographicalRecord' found for the 'us_east' shard. (ActiveRecord::ConnectionNotEstablished)

Interestingly, this does work (querying without manually activating a shard – the default shard as declared above seems to be us_east):

vanilla-rails-app(dev)> Student.current_shard
=> :us_east
vanilla-rails-app(dev)> Student.create(name: "bobby tables")
  TRANSACTION (0.2ms)  BEGIN
  Student Create (1.1ms)  INSERT INTO "students" ("name", "created_at", "updated_at") VALUES ($1, $2, $3) RETURNING "id"  [["name", "bobby tables"], ["created_at", "2024-10-17 21:29:38.609129"], ["updated_at", "2024-10-17 21:29:38.609129"]]
  TRANSACTION (0.7ms)  COMMIT
=> 
#<Student:0x000000011fdcdfc8
 id: 2,
 name: "bobby tables",
 created_at: "2024-10-17 21:29:38.609129000 +0000",
 updated_at: "2024-10-17 21:29:38.609129000 +0000">
vanilla-rails-app(dev)> Student.first
  Student Load (0.4ms)  SELECT "students".* FROM "students" ORDER BY "students"."id" ASC LIMIT $1  [["LIMIT", 1]]
=> 
#<Student:0x000000011fdcd208
 id: 2,
 name: "bobby tables",
 created_at: "2024-10-17 21:29:38.609129000 +0000",
 updated_at: "2024-10-17 21:29:38.609129000 +0000">