About pool management in ActiveRecord

Yesterday someone commented on my article about Sequel, where I compare
it with AR in some aspects, including pool management:

To answer his comments I decided to first take a glance at the
current (4.1.1) implementation of how the connection pool works in
Rails with AR.

After our discussion in the comments, when I was about to sleep, I
kept thinking about this subject and decided it might be worth
sharing some ideas with you, in case you’d be interested in them…

Basically, ActiveRecord currently delegates connection pool
management to the user. Most users don’t realize it because they
don’t usually spawn new threads from the main request thread, and
there’s an AR middleware, automatically integrated into Rails, that
checks the connection back in to the pool at the end of the request.
Since the connection id is stored in a thread local, the Rack
middleware can only check in the connection used by the main request
thread. Here’s an example to illustrate:
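
A minimal sketch of the mechanism in plain Ruby, with no Rails involved. `TinyPool`, `NoConnectionAvailable`, `POOL` and `my_action` are hypothetical names made up for illustration; this is not AR's implementation, and it raises immediately where a real pool might first wait for a timeout. It only shows that a connection reserved per thread is never checked in when the cleanup runs in a different thread:

```ruby
# Hypothetical stand-in for a connection pool; NOT ActiveRecord's code.
class TinyPool
  class NoConnectionAvailable < StandardError; end

  def initialize(size)
    @size = size
    @reserved = {} # owner thread => connection name
    @created = 0
    @mutex = Mutex.new
  end

  # Returns the connection reserved for the current thread,
  # reserving a new one on first use.
  def connection
    @mutex.synchronize do
      @reserved[Thread.current] ||= reserve_new
    end
  end

  # Checks in the connection of the *current* thread only,
  # which is all an end-of-request cleanup can do.
  def release_connection
    @mutex.synchronize { @reserved.delete(Thread.current) }
  end

  def in_use
    @mutex.synchronize { @reserved.size }
  end

  private

  def reserve_new
    if @reserved.size >= @size
      raise NoConnectionAvailable, "all #{@size} connections are in use"
    end
    "conn-#{@created += 1}"
  end
end

POOL = TinyPool.new(5) # assumed pool size of 5

# Simulates an action that touches the database from a new thread.
def my_action
  worker = Thread.new { POOL.connection } # reserved by the worker thread
  worker.join                             # re-raises if the worker failed
  worker.value
ensure
  POOL.release_connection # runs in the request thread: nothing to release
end

outcomes = 6.times.map do
  begin
    my_action
  rescue TinyPool::NoConnectionAvailable => e
    "failed: #{e.message}"
  end
end
puts outcomes # the first five calls get a connection, the sixth fails
```

Each worker thread is already dead by the time the next call runs, yet its connection stays reserved, so the pool is exhausted after five calls.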

Assuming the default pool size (5), running this action 6 times will
currently fail:
ab -n 6 -c 1

This is nothing new, and Aaron Patterson already touched on this
subject long ago, in 2011:

On a side note, yesterday I learned about an interesting project to
set a common API for job libraries that is intended to be merged into
Rails at some point:
The default adapter (inline) implements an “enqueue_at” method that
will spawn a new thread:
So, calling enqueue_at for a job using the default adapter will
have the same problem as the implementation above.

Then I was thinking that most of the AR API could be implemented in a
smarter way, so that this wouldn’t be a problem. That means calling
“with_connection” behind the scenes whenever a method needs a
connection. Even “execute” could be implemented this way. Instead of
checking out a connection, calling AR::Base.connection could simply
return a proxy. If you really want to check out and reserve that
connection you could call “connection.lock”, for instance, and then
you would know that you must ensure “unlock” is called when you’re
done. Otherwise, calling “execute” would perform the query under a
“with_connection” block, checking the connection back in to the pool
after running the SQL statement.
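
The proxy idea can be sketched in plain Ruby as follows. Everything here (`FakeConnection`, `SketchPool`, `ConnectionProxy`, and the `lock`/`unlock` pair) is hypothetical and only illustrates the suggestion; none of it is existing AR API, and the lock is kept per proxy instance for brevity where a real design would probably need per-thread state:

```ruby
# Hypothetical connection that just echoes the statement it was given.
FakeConnection = Struct.new(:id) do
  def execute(sql)
    "conn-#{id} ran: #{sql}"
  end
end

# Hypothetical pool backed by a thread-safe queue of idle connections.
class SketchPool
  def initialize(size)
    @idle = Queue.new
    size.times { |i| @idle << FakeConnection.new(i + 1) }
  end

  def checkout
    @idle.pop(true) # non-blocking: raises ThreadError when the pool is empty
  end

  def checkin(conn)
    @idle << conn
  end

  # Borrows a connection for the duration of the block only.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end

  def idle_count
    @idle.size
  end
end

# What a proxy returned in place of a real connection could look like.
class ConnectionProxy
  def initialize(pool)
    @pool = pool
    @locked = nil
  end

  # Runs one statement, holding a connection only while it runs,
  # unless the caller reserved one with #lock.
  def execute(sql)
    return @locked.execute(sql) if @locked
    @pool.with_connection { |conn| conn.execute(sql) }
  end

  # Reserves a connection until #unlock is called.
  def lock
    @locked ||= @pool.checkout
    self
  end

  def unlock
    return unless @locked
    @pool.checkin(@locked)
    @locked = nil
  end
end

pool = SketchPool.new(1)
proxy = ConnectionProxy.new(pool)

puts proxy.execute("SELECT 1")
puts pool.idle_count # 1: the connection went straight back to the pool

proxy.lock           # e.g. before creating a temporary table
puts pool.idle_count # 0: reserved until unlock
proxy.unlock
puts pool.idle_count # 1
```

With this shape, a pool of one connection is enough for any number of threads that only run single statements, since nobody holds a connection between statements.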

I’m just suggesting the idea in case someone is interested in coming
up with a PoC for it, provided the core team agrees with the
suggested approach (it introduces some backward incompatibilities).
I don’t plan to work on this, especially because I don’t use AR
myself, but better automatic handling of connections in the pool
might be of interest to most AR users…

Specifically regarding a Thread outside of the main request thread holding onto a connection, you might be interested in https://github.com/rails/rails/pull/14360 about more intelligently reaping connections held onto by dead threads.

Unfortunately, the comments on that PR say it will not be backported to 4.1 and will only be included in 4.2 :frowning:

Hi Schuck, that was indeed a smart trick: associating each connection with its owner thread and checking whether that thread is still alive when there's apparently no available connection. In that case the reaper runs and can free the connections whose owner threads are dead. I liked it :slight_smile: Simple and effective for most use cases.
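
Just to make sure I understood the idea, here is a rough sketch of it in plain Ruby. `ReapingPool` and its methods are names I made up for illustration, and this is not the code from the PR; it only assumes that each connection remembers its owner thread and that dead owners are reaped when the pool looks full:

```ruby
# Hypothetical pool that remembers which thread owns each connection.
class ReapingPool
  class NoConnectionAvailable < StandardError; end

  def initialize(size)
    @size = size
    @owners = {} # connection id => owner thread
    @mutex = Mutex.new
  end

  # Reserves a connection id for the current thread. When the pool
  # looks full, connections owned by dead threads are reclaimed first.
  def checkout
    @mutex.synchronize do
      reap if @owners.size >= @size
      raise NoConnectionAvailable, "pool exhausted" if @owners.size >= @size
      id = (1..@size).find { |n| !@owners.key?(n) }
      @owners[id] = Thread.current
      id
    end
  end

  def in_use
    @mutex.synchronize { @owners.size }
  end

  private

  # Frees every connection whose owner thread has finished.
  def reap
    @owners.delete_if { |_id, owner| !owner.alive? }
  end
end

pool = ReapingPool.new(2)
2.times { Thread.new { pool.checkout }.join } # both owners are dead after join
puts pool.in_use # 2: nothing has been reaped yet
pool.checkout    # the pool looks full, so the dead owners are reaped first
puts pool.in_use # 1: only the current thread holds a connection
```

A connection held by a thread that is still alive is never reclaimed this way.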

In theory even the middleware would no longer be strictly required, although it could still help performance a bit by saving the reaper from running every time...

Congrats to Mathew :slight_smile:

But there's still another use case where a smarter implementation would help.

Suppose you spawn a new thread that queries the database. Once the query is done, the thread starts some processing that could take quite a long time and that runs no further statements against the database until it is finished. Something like this:

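    def my_action
      Thread.start do
        posts = Post.all.to_a              # the connection is checked out here
        do_long_time_processing_with posts # no database access during this step
        Post.update_all processed: true    # update_all: Post.update also expects an id
      end
    end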

In this case, if someone needs a connection while the thread is processing the data, the connection is still marked as in use even though it's idle most of the time. You could certainly work around it by issuing a close on the connection after Post.all.to_a, but it would be better if this were handled transparently.

Especially because it's not always obvious that this is the case. Better management of the connections could reduce the need for a bigger pool size.

That's why I think it would be good if connections were returned to the pool not only at the end of the request (or thread) cycle but right after each statement is run, unless the user explicitly wants to lock the connection, for example when creating temporary tables that are valid for the duration of the connection, or something like this...

But it's certainly good to know about this change for the upcoming Rails 4.2.

Thanks for sharing.