I currently use this little addition to ActiveRecord to allow me to switch databases on the fly (used in a middleware on an incoming connection and in other places where I need to work with multiple customers databases (like migrations)):
module ActiveRecord
module ConnectionAdapters
class Mysql2Adapter < AbstractMysqlAdapter
# Allows us to connect to a new database in a clean way
def change_database!(db)
if @config[:database] != db
@config.merge!(:database => db)
clear_query_cache
reconnect!
end
end
def restore_default_database!
database = YAML::load_file('config/database.yml')[Rails.env]['database']
change_database!(database)
end
end
end
end
I was excited to try the Rails 6 approach to multiple databases but was saddened to see that they have only considered a finite number of databases and those are listed in the databases.yml file.
In my situation I can create a new database while the application is running (when creating a new client site) and then connect to it by visiting the domain name associated with the site.
My code above reuses the same connection. Iāve tried using establish_connection but this eventually ends up returning No connection pool with 'primary' found. once the connection pool appears to exhaust itself. Iād imagine Iād need a pool with at least the number of databases I have in it for it not to run out, and even then, it still could if there was more than one connection to a site per time period.
My gut is to stick with what I have, but I was wondering if there are any other caches I should be clearing when changing databases mid-stream? (for Rails 6.0 and 6.1)
I should mention that each database is exactly the same structure; itās just a way of partitioning data between customers.
Iāve posted this question on StackOverflow but I think this might be a better place to look for a more qualified response.
@eileencodes, if you wouldnāt mind chipping in on this one Iād really appreciate it Iāve been using this approximate method (I think it was originally called database hijacking!) since Rails 1.6. Itās always made me nervous but itās worked in production since 2007 so I guess it canāt be all that bad given we have over 180 seperate mysql databases. In the early days our switching method was a bit rubbish and we used to run out of connections to mysql, but the current method seems to recycle the connections nicely and basically emulates a dropped connection right at the start of the request.
In my situation I can create a new database while the application is running (when creating a new client site) and then connect to it by visiting the domain name associated with the site.
Iāve seen a few requests for this kind of behavior but I really donāt think this is a good idea to support. We actually did have some vague support for it with the database kwarg in connected_to but I got so many bug reports from teams using it incorrectly that I decided to deprecate and remove it. It wasnāt worth the extra overhead of support requests. Additionally the risk of someone using it wrong and taking down their database was too high. Fundamentally I donāt think Rails should support adding connections to the database on the fly after boot.
I think this is especially dangerous for anyone who doesnāt understand how the connections work - if you were to establish a new connection with the same class name it would clobber the existing connection pool. Iām think thatās whatās happening when you get the no connection pool error - the pool got clobbered, not exhausted.
I donāt really understand though why you need to create connections while the application is running though. It might help if you make a demo app that shows how this works. I donāt know when Iād get to implementing something like this but Iāll keep it in mind as things evolve.
Thereās still a few features we need to add for tenant sharding for it to be a bit more robust (autoswitcher, config improvements, rake tasks improvements). Itās not been something Iāve been able to prioritize since we donāt use this method at GitHub.
Iāve tried to detail everything in the README and hopefully the commit comments help also.
Having the ability to create new clients (websites in my case) while the application is running is crucial for us. While we could theoretically stop the application and create the new client, then start it up again (presumably with another Rails application?), it would certainly be clunky. We have around 180 clients (and thus a database for each of them). This case doesnāt fit within the existing Rails multi-tenant model that demands that we specify all of these databases up front.
Having a database per client is great for keeping customer data seperate. It also makes backups/restores easier and makes cross-pollution of client data impossible.
The switching mechanism is per my first post. We donāt muck around with connections other than to change the database that rails is configured to connect to, then force a reconnect.
I hope this helps Iām very happy to answer any more questions that you might have. Weāve tweaked this concept over many years and itās working well for us now.
I should quickly follow up with: that while Iāve not noticed any negative side-effects from connection switching in this way (apart from needing to clear the query cache), Iām not 100% sure if there are other caches that need to be cleared also.
The prepared statement cache is already cleared on a reconnect so I donāt bother doing that.
Hey there, just wondering if youāve thought about / tried implementing a zero downtime deployment strategy for your application? That way when you wanted to add an additional database and deploy the changes (ie additional database config), the old version of the app is still running & user facing until the new one is complete?
Hi @Anthony_Hernandez, I kind of do this already with Capistrano, though of course not for the case of adding a new database.
Adding a new customer (including creating the database and loading in the schema) takes all of a couple of seconds or less, and is done infrequently enough, and only by our staff, that thereās no problem with downtime in that regard. The running application doesnāt even care that the new database has been created until itās actually connected to via someone requesting it via one of its domains.
Check out my sample app to see what I mean.
The databases that are being switched out are identical in structure, just not in content, so apart from having to clear some caches, Rails should be none the wiser.
Another way to do this did cross my mind using ActiveSupport::CurrentAttributes and having something like:
Another way to think of what Iām doing is Schema Switching in the postgresql world, where there is an admin schema and then a schema of identical tables for each client. On an incoming connection weād set the schema to be [admin, customers_schema] and carry on. Schema switching is well supported in the postgresql driver last time I looked at this. Alas, Iām using MySQL, but fortunately a ādatabaseā in MySQL is more like a schema anyway, since we can join across ādatabasesā (which I do).
@Joseph3511, I think the error pointed to a deeper issue with pursuing this approach. In the end, Iāve stuck with what I was already doing and it works great. Definitely interested to see what others think of my approach in the demo app.
Could you dynamically add connection configs, and utilize them as horizontal shards? Youād need to be careful about how many connection pools and connections you end up with. You might even need some sort of connection pool reaper.
Thanks @sj26, that is one possibility. If you check out my sample application, Iām actually modifying the exiting config in-place (just changing the database string), then reconnecting.
I guess thatās just a form of dynamic horizontal sharding. When I heard of Rails multi-database support I was hopeful that it would allow for dynamic sharding, but having it all fixed in a yaml file is a no-go when one has 180 seperate databases (one per customer) as I do.
My solution works well. My intention with this thread was to see if others faced a similar issue to mine, and secondarily to see if there was any appetite for making dynamic sharding (for want of a better name) a part of core Rails.
Yeah, interesting. I did look at your example. But youāre creating a new database connection on every request, reusing the same activerecord connection adapter within the same pool, and so youāre not gaining any connection pooling benefits, and youāre leaking connections connected to a different database back into a shared pool. This approach will work at a small scale, and with all databases on the same host, but I think will become difficult to maintain and scale over time.
Utilising a primary database which is the āadminā database, and then dynamically configuring additional shards for tenanting would allow using connected_to(shard: ...) around each request which will do transparent connection pool management and tenant isolation. Isolating connection pools per tenant also avoids noisy neighbour problems.
Thanks @sj26, those are great insights. Exactly the feedback I was looking for.
As you allude to, having lots of connection pools might cause issues, especially if each instance (say in Passenger) doesnāt share its pools.
Iāll investigate the path you suggest the next time Iām working on this area of the application. Itās currently working well at our low scale. My strategy for if we ever got too big for one server was just to operate two or more seperate servers with seperate āadminsā and just spread the customers across these.
If youāre committed to using a single database host and youāre using MySQL then you could also have middleware which does database switching on the same connection per request with USE. Thatās kind-of what youāre doing, but it wouldnāt need to reconnect entirely, nor reach so deeply into ActiveRecord.
Thatās not a bad idea. It rings some bells. I think we used to do that in the early days (Rails 1 and 2), but ran into some issues around the @configuration object becoming out of sync with what it thought it was connected to. Certainly something to look into though.
Iād still love to do this, but I feel that even with dynamic database lists (as I think might now be possible with a proc-backed configuration?) I suspect rails will create a new pool for each and every database which would create other issues further up the chain with MySQLās connection limits.
Is the each database on a different server or the same server? If the same server, Iāve seen an app where it just has a before_action that tells the database to switch databases. I.E.
So desired_database would return the database it should use depending on some logic (subdomain, user, etc). This is off the top of my head so there may be some additional detail but I think that was all it was.