After upgrading our app to Rails 7, our error monitoring started repeatedly reporting `ActiveRecord::ConnectionTimeoutError - could not obtain a connection from the pool within 5.000 seconds (waited 5.002 seconds); all pooled connections were in use`. Why did this start happening after the upgrade to Rails 7, and what is the proper way to fix it? Is the correct fix to increase `RAILS_MAX_THREADS` (currently not set)? I would just like to understand why this would now be needed.
All the errors are thrown when returning Active Storage images, with routes such as /rails/active_storage/representations/proxy/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaEpJaWxrTW1FM05XRTRNQzAxWW1JekxUUmhPVFl0T1Roak5TMHhNR1JoTnpNd1pEWXpOemdHT2daRlZBPT0iLCJleHAiOm51bGwsInB1ciI6ImJsb2JfaWQifX0=--250c68a26053e646856f2....
This happens when users visit pages with collections of Active Storage pictures (typically 30 per page) stored in S3 and proxied through CloudFlare using this configuration. When I visit these pages myself the images show up fine, and the network tab shows they are served from CloudFlare's cache (`cf-cache-status: HIT`). However, I guess users in other regions might miss CloudFlare's cache, so the origin server gets hit with many requests at once?
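For context, I believe the relevant Active Storage setting is the standard proxy mode introduced in Rails 6.1, roughly like this (a sketch; our initializer may differ slightly):

```ruby
# config/environments/production.rb
# Serve Active Storage blobs through the Rails app (proxy mode) so that a
# CDN like CloudFlare can cache them, instead of redirecting each request
# to a short-lived S3 URL.
config.active_storage.resolve_model_to_route = :rails_storage_proxy
```

If I understand correctly, every CloudFlare cache miss then becomes a full Rails request that needs a database connection to look up the blob record.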
The app uses Puma, configured in `config/puma.rb` with 14 max threads and 2 workers, and a Heroku PostgreSQL standard-0 database, which allows up to 120 connections. Heroku's datastore dashboard shows a maximum of 49 database connections in use over the last 24 hours. Inspecting the pool in the Rails console, `ActiveRecord::Base.connection_pool.size` returns 5 and `ActiveRecord::Base.connection_pool.connections.length` returns 1.
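For completeness, our `config/puma.rb` follows the standard Heroku template, roughly like this (simplified sketch; the 14 is hardcoded rather than read from `RAILS_MAX_THREADS`):

```ruby
# config/puma.rb (simplified sketch of our current setup)
workers 2        # two Puma processes per web dyno
threads 14, 14   # 14 threads per process, while the AR pool holds only 5 connections

preload_app!
```

So, if I understand correctly, each Puma process can serve up to 14 requests concurrently while its Active Record pool only holds 5 connections – which would match the timeouts?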
I am not so familiar with devops and database management, so I would be grateful for any help!
The best advice I have seen about connection pools and application servers is that you should have at least as many database connections available as you have application server threads per process. Set these conservatively at first, then raise them in lock-step until you see performance degrade. After that, add another whole server and a load balancer to divide the traffic between your servers.
The default Rails pool is 5 database connections, unless you have explicitly changed that somewhere. If you have more application server threads than pooled connections (and don't forget any background workers – they need connections as well), then you will get this sort of starvation, even if your database server has the resources to handle more.
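To make that concrete with your numbers (a sketch – the pool is per process, so what matters is threads per process versus pool size, not the 120-connection plan limit):

```ruby
# Per-process arithmetic for the setup described above:
threads_per_process = 14   # previous Puma setting
pool_size           = 5    # Rails default

# Up to 9 threads in each process can be left waiting for a connection,
# and after 5 seconds of waiting Active Record raises
# ActiveRecord::ConnectionTimeoutError – exactly the error reported.
waiting_threads = threads_per_process - pool_size   # => 9
```

You can also inspect the pool live with `ActiveRecord::Base.connection_pool.stat`, which (since Rails 5.1, if I remember correctly) returns a hash including `size`, `busy` and `waiting` counts.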
Thanks very much Walter. In Puma I have just reverted the thread count from 14 back to 5, as per the Heroku documentation (*). So we have 1 standard-2X web dyno + 1 standard-1X worker dyno; Puma is configured with 2 workers (a.k.a. processes in the Heroku docs) and 5 threads; and, as you said, the Rails database adapter has a pool of 5 connections (the default).
Maybe reducing the thread count to 5 fixed the issue we were observing? I will monitor. Given our dyno config and the 2 application server workers/processes, how should the various numbers be combined to calculate `RAILS_MAX_THREADS`, so that we can raise it for better performance without starving the connections? Or does it not affect DB connections at all, and should I be looking at some other parameter? I am confused.
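Here is my back-of-the-envelope attempt, in case someone can correct it. I am assuming the default generated `config/database.yml`, where the pool is tied to the thread count via `pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>`, and guessing that the worker dyno also runs 5 threads:

```ruby
# How I think the numbers combine (assumptions flagged above):
web_dynos      = 1
puma_workers   = 2    # Puma processes per web dyno
puma_threads   = 5    # per process; this is what RAILS_MAX_THREADS sets
worker_dynos   = 1
worker_threads = 5    # assumed; depends on the background job library's concurrency

# The Active Record pool is per process, so it only needs to cover the
# threads of one process:
pool_size = puma_threads   # => 5, which matches the default

# Total connections the whole app can hold open, to compare against the
# standard-0 plan limit of 120:
total = web_dynos * puma_workers * puma_threads +
        worker_dynos * worker_threads   # => 15
```

Does that look right, and is raising `RAILS_MAX_THREADS` then just a matter of keeping the pool size at least equal to the threads per process?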
(*) See the Heroku docs. The thread count had previously been increased to 14 for an undocumented reason (but since then, the dyno had been upgraded to standard-2X and the Puma worker count increased to 2).