Is there a better method for switching database in rails without exhausting the connection pool?

I currently use this little addition to ActiveRecord to allow me to switch databases on the fly (used in a middleware on an incoming connection and in other places where I need to work with multiple customers databases (like migrations)):

module ActiveRecord
  module ConnectionAdapters
    class Mysql2Adapter < AbstractMysqlAdapter
      # Allows us to connect to a new database in a clean way
      def change_database!(db)
        if @config[:database] != db
          @config.merge!(:database => db)
          clear_query_cache
          reconnect!
        end
      end

      def restore_default_database!
        database = YAML::load_file('config/database.yml')[Rails.env]['database']
        change_database!(database)
      end
    end
  end
end

I was excited to try the Rails 6 approach to multiple databases but was saddened to see that they have only considered a finite number of databases and those are listed in the databases.yml file.

In my situation I can create a new database while the application is running (when creating a new client site) and then connect to it by visiting the domain name associated with the site.

My code above reuses the same connection. I’ve tried using establish_connection but this eventually ends up returning No connection pool with 'primary' found. once the connection pool appears to exhaust itself. I’d imagine I’d need a pool with at least the number of databases I have in it for it not to run out, and even then, it still could if there was more than one connection to a site per time period.

My gut is to stick with what I have, but I was wondering if there are any other caches I should be clearing when changing databases mid-stream? (for Rails 6.0 and 6.1)

I should mention that each database is exactly the same structure; it’s just a way of partitioning data between customers.

I’ve posted this question on StackOverflow but I think this might be a better place to look for a more qualified response.

1 Like

I think you would want to tag Eileen for this. She’s the one who had led the development for multi database support for the past few years

Thanks @pinzonjulian :slight_smile:

@eileencodes, if you wouldn’t mind chipping in on this one I’d really appreciate it :slight_smile: I’ve been using this approximate method (I think it was originally called database hijacking!) since Rails 1.6. It’s always made me nervous but it’s worked in production since 2007 so I guess it can’t be all that bad given we have over 180 seperate mysql databases. In the early days our switching method was a bit rubbish and we used to run out of connections to mysql, but the current method seems to recycle the connections nicely and basically emulates a dropped connection right at the start of the request.

1 Like

In my situation I can create a new database while the application is running (when creating a new client site) and then connect to it by visiting the domain name associated with the site.

I’ve seen a few requests for this kind of behavior but I really don’t think this is a good idea to support. We actually did have some vague support for it with the database kwarg in connected_to but I got so many bug reports from teams using it incorrectly that I decided to deprecate and remove it. It wasn’t worth the extra overhead of support requests. Additionally the risk of someone using it wrong and taking down their database was too high. Fundamentally I don’t think Rails should support adding connections to the database on the fly after boot.

I think this is especially dangerous for anyone who doesn’t understand how the connections work - if you were to establish a new connection with the same class name it would clobber the existing connection pool. I’m think that’s what’s happening when you get the no connection pool error - the pool got clobbered, not exhausted.

I don’t really understand though why you need to create connections while the application is running though. It might help if you make a demo app that shows how this works. I don’t know when I’d get to implementing something like this but I’ll keep it in mind as things evolve.

There’s still a few features we need to add for tenant sharding for it to be a bit more robust (autoswitcher, config improvements, rake tasks improvements). It’s not been something I’ve been able to prioritize since we don’t use this method at GitHub.

1 Like

Hi @eileencodes, thank you so much for getting back to me. Please excuse my delay in replying as I’ve been busily extracting my functionality into a demo application that you can access here: GitHub - brendon/switcharoo: An example of live database creation and switching

I’ve tried to detail everything in the README and hopefully the commit comments help also.

Having the ability to create new clients (websites in my case) while the application is running is crucial for us. While we could theoretically stop the application and create the new client, then start it up again (presumably with another Rails application?), it would certainly be clunky. We have around 180 clients (and thus a database for each of them). This case doesn’t fit within the existing Rails multi-tenant model that demands that we specify all of these databases up front.

Having a database per client is great for keeping customer data seperate. It also makes backups/restores easier and makes cross-pollution of client data impossible.

The switching mechanism is per my first post. We don’t muck around with connections other than to change the database that rails is configured to connect to, then force a reconnect.

I hope this helps :slight_smile: I’m very happy to answer any more questions that you might have. We’ve tweaked this concept over many years and it’s working well for us now.

I should quickly follow up with: that while I’ve not noticed any negative side-effects from connection switching in this way (apart from needing to clear the query cache), I’m not 100% sure if there are other caches that need to be cleared also.

The prepared statement cache is already cleared on a reconnect so I don’t bother doing that.

Hey there, just wondering if you’ve thought about / tried implementing a zero downtime deployment strategy for your application? That way when you wanted to add an additional database and deploy the changes (ie additional database config), the old version of the app is still running & user facing until the new one is complete?

Hi @Anthony_Hernandez, I kind of do this already with Capistrano, though of course not for the case of adding a new database.

Adding a new customer (including creating the database and loading in the schema) takes all of a couple of seconds or less, and is done infrequently enough, and only by our staff, that there’s no problem with downtime in that regard. The running application doesn’t even care that the new database has been created until it’s actually connected to via someone requesting it via one of its domains.

Check out my sample app to see what I mean.

The databases that are being switched out are identical in structure, just not in content, so apart from having to clear some caches, Rails should be none the wiser.

Another way to do this did cross my mind using ActiveSupport::CurrentAttributes and having something like:

self.table_name =  "#{Current.database}.table_name"

in each class, but this seems less elegant.

Another way to think of what I’m doing is Schema Switching in the postgresql world, where there is an admin schema and then a schema of identical tables for each client. On an incoming connection we’d set the schema to be [admin, customers_schema] and carry on. Schema switching is well supported in the postgresql driver last time I looked at this. Alas, I’m using MySQL, but fortunately a ‘database’ in MySQL is more like a schema anyway, since we can join across ‘databases’ (which I do).

To avoid this error you can change the size of your connection pool manually by customizing your connection settings!

@Joseph3511, I think the error pointed to a deeper issue with pursuing this approach. In the end, I’ve stuck with what I was already doing and it works great. Definitely interested to see what others think of my approach in the demo app.

@eileencodes I’d be happy to help figure out how to get that stuff built if you want to brain dumb those ideas somewhere (maybe basecamp) :pray:

@eileencodes, just a friendly bump to see what you thought of the concepts I presented in the demo application you asked for: GitHub - brendon/switcharoo: An example of live database creation and switching

I totally appreciate you’re super busy, so absolutely no pressure to even look at it if you don’t have time. :slight_smile:

Could you dynamically add connection configs, and utilize them as horizontal shards? You’d need to be careful about how many connection pools and connections you end up with. You might even need some sort of connection pool reaper.

1 Like

Thanks @sj26, that is one possibility. If you check out my sample application, I’m actually modifying the exiting config in-place (just changing the database string), then reconnecting.

I guess that’s just a form of dynamic horizontal sharding. When I heard of Rails multi-database support I was hopeful that it would allow for dynamic sharding, but having it all fixed in a yaml file is a no-go when one has 180 seperate databases (one per customer) as I do.

My solution works well. My intention with this thread was to see if others faced a similar issue to mine, and secondarily to see if there was any appetite for making dynamic sharding (for want of a better name) a part of core Rails.

Yeah, interesting. I did look at your example. But you’re creating a new database connection on every request, reusing the same activerecord connection adapter within the same pool, and so you’re not gaining any connection pooling benefits, and you’re leaking connections connected to a different database back into a shared pool. This approach will work at a small scale, and with all databases on the same host, but I think will become difficult to maintain and scale over time.

Utilising a primary database which is the “admin” database, and then dynamically configuring additional shards for tenanting would allow using connected_to(shard: ...) around each request which will do transparent connection pool management and tenant isolation. Isolating connection pools per tenant also avoids noisy neighbour problems.

Thanks @sj26, those are great insights. Exactly the feedback I was looking for.

As you allude to, having lots of connection pools might cause issues, especially if each instance (say in Passenger) doesn’t share its pools.

I’ll investigate the path you suggest the next time I’m working on this area of the application. It’s currently working well at our low scale. My strategy for if we ever got too big for one server was just to operate two or more seperate servers with seperate ‘admins’ and just spread the customers across these.

1 Like

If you’re committed to using a single database host and you’re using MySQL then you could also have middleware which does database switching on the same connection per request with USE. That’s kind-of what you’re doing, but it wouldn’t need to reconnect entirely, nor reach so deeply into ActiveRecord.

That’s not a bad idea. It rings some bells. I think we used to do that in the early days (Rails 1 and 2), but ran into some issues around the @configuration object becoming out of sync with what it thought it was connected to. Certainly something to look into though.