About pool management in ActiveRecord

Yesterday someone commented on my article about Sequel, in which I compare it with AR in some aspects, including pool management.

To answer his comments, I decided to first take a glance at the current (4.1.1) implementation of how the connection pool works in Rails with AR. After our discussion in the comments, as I was about to go to sleep, I kept thinking about this subject and decided it might be worth sharing some ideas with you, in case you're interested in them…

Basically, ActiveRecord currently delegates connection pool management to the user. Most users don't realize it because they don't usually spawn new threads from the main request thread, and Rails automatically includes an AR middleware that checks the connection back in to the pool at the end of each request. Since the connection id is stored in a thread local, that Rack middleware can only check in the connection used by the main request thread.

Here's an example to illustrate. Assuming the default pool size (5), running this action 6 times will currently fail: ab -n 6 -c 1

This is nothing new: Aaron Patterson already touched on this subject back in 2011.

As a side note, yesterday I learned about an interesting project that sets a common API for job libraries and is intended to be merged into Rails at some point. Its default adapter (inline) implements an “enqueue_at” method that spawns a new thread, so calling enqueue_at for a job with the default adapter will share the same problems as the implementation above.

Then I was thinking that most of the AR API could be implemented in a smarter way, so that this wouldn't be a problem. That means calling “with_connection” behind the scenes whenever a connection is needed. Even “execute” could be implemented this way. Instead of checking out a connection, AR::Base.connection could simply return a proxy. If you really wanted to check out and reserve that connection, you could call “connection.lock”, for instance, and you would then know that you must ensure “unlock” is called when you're done.
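A minimal sketch of the proxy idea, written against a toy in-memory pool rather than ActiveRecord. `ToyPool`, `FakeConnection`, `ConnectionProxy` and its `lock`/`unlock` methods are hypothetical names for illustration only; `checkout`, `checkin` and `with_connection` merely mirror the names of methods AR's real pool has. By default each `execute` borrows a connection for that single statement, while `lock` pins one until `unlock` is called.

```ruby
# Toy pool used only to illustrate the proposal; this is NOT ActiveRecord's
# ConnectionPool, although checkout/checkin/with_connection mirror its names.
class ToyPool
  # Stand-in for a database connection: "running" SQL just echoes it.
  FakeConnection = Struct.new(:id) do
    def execute(sql)
      "conn #{id}: #{sql}"
    end
  end

  def initialize(size)
    @available = Array.new(size) { |i| FakeConnection.new(i) }
    @mutex = Mutex.new
  end

  def checkout
    @mutex.synchronize do
      @available.pop or raise "could not obtain a connection from the pool"
    end
  end

  def checkin(conn)
    @mutex.synchronize { @available.push(conn) }
  end

  # Borrow a connection only for the duration of the block.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end

  def available_count
    @mutex.synchronize { @available.size }
  end
end

# Hypothetical object that AR::Base.connection could return instead of a
# reserved connection (one proxy per thread is assumed here).
class ConnectionProxy
  def initialize(pool)
    @pool = pool
    @locked = nil
  end

  # Explicitly reserve a connection, e.g. while using temporary tables.
  def lock
    @locked ||= @pool.checkout
    self
  end

  # Give the reserved connection back to the pool.
  def unlock
    @pool.checkin(@locked) if @locked
    @locked = nil
  end

  # Use the reserved connection if there is one; otherwise check one out
  # just for this statement and return it to the pool immediately.
  def execute(sql)
    return @locked.execute(sql) if @locked

    @pool.with_connection { |conn| conn.execute(sql) }
  end
end

pool  = ToyPool.new(5)
proxy = ConnectionProxy.new(pool)
proxy.execute("SELECT 1")
p pool.available_count # prints 5: the connection went straight back
proxy.lock
proxy.execute("CREATE TEMPORARY TABLE t (id int)")
p pool.available_count # prints 4 while the connection is locked
proxy.unlock
p pool.available_count # prints 5
```

Keeping the pinned connection inside the proxy means code that never calls `lock` cannot leave a connection checked out. A real implementation would presumably also have to pin the connection for the whole duration of a transaction.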
Without such an explicit lock, calling “execute” would run the query inside a “with_connection” block, checking the connection back in to the pool right after the SQL statement runs.

I'm just suggesting the idea in case someone is interested in coming up with a proof of concept, provided the core team agrees with the approach (it does introduce some backward incompatibilities). I don't plan to work on this, especially because I don't use AR myself, but better automatic handling of pooled connections might be of interest to most AR users…

Cheers,
Rodrigo.

Specifically regarding a Thread outside of the main request thread holding onto a connection, you might be interested in https://github.com/rails/rails/pull/14360 about more intelligently reaping connections held onto by dead threads.

Unfortunately, the comments on that PR say it will not be backported to 4.1 and will only be included in 4.2 :frowning:

Hi Schuck, that was indeed a smart trick: associate each connection with its owner thread and, when there's apparently no available connection, check whether those owner threads are still alive. In that case the reaper runs and can free up the connections whose threads are dead. I liked it :slight_smile: Simple and effective for most use cases.
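A toy version of that trick, assuming a simple in-memory pool (`ReapingPool` and its methods are hypothetical, and this is not the code from the pull request): every checked-out connection records its owner thread, and only when no connection is free does `connection` reclaim the ones whose owner is no longer alive.

```ruby
# Toy pool that remembers the owner thread of every checked-out connection.
# When nothing is free it first reclaims connections held by dead threads.
# Illustrative only; not the implementation from rails/rails#14360.
class ReapingPool
  Connection = Struct.new(:id, :owner)

  def initialize(size)
    @connections = Array.new(size) { |i| Connection.new(i, nil) }
    @mutex = Mutex.new
  end

  # Return the calling thread's connection, checking one out if needed.
  def connection
    @mutex.synchronize do
      owned = @connections.find { |c| c.owner.equal?(Thread.current) }
      return owned if owned

      conn = free_connection
      unless conn
        reap # nothing free: reclaim connections held by dead threads
        conn = free_connection
      end
      raise "could not obtain a connection from the pool" unless conn

      conn.owner = Thread.current
      conn
    end
  end

  # Hand the calling thread's connection back explicitly.
  def release_connection
    @mutex.synchronize do
      @connections.each { |c| c.owner = nil if c.owner.equal?(Thread.current) }
    end
  end

  def in_use_count
    @mutex.synchronize { @connections.count(&:owner) }
  end

  private

  def free_connection
    @connections.find { |c| c.owner.nil? }
  end

  # Must be called while holding @mutex.
  def reap
    @connections.each { |c| c.owner = nil if c.owner && !c.owner.alive? }
  end
end

pool = ReapingPool.new(5)
# Five short-lived threads each take a connection and never release it.
5.times.map { Thread.new { pool.connection } }.each(&:join)
p pool.in_use_count # prints 5: all held by threads that have already finished
# A sixth thread still gets one, because the dead owners are reaped first.
Thread.new { pool.connection }.join
p pool.in_use_count # prints 1
```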

In theory, even the middleware would no longer be strictly required, although keeping it could still help performance a bit by avoiding running the reaper every time...
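To make that split of responsibilities concrete, here is a Rack-style sketch built on a hypothetical `ThreadOwnedPool` and `ReleaseConnection` middleware (not the middleware Rails actually ships): at the end of each request the middleware hands back the request thread's own connection, while a connection checked out by a thread spawned during the request stays in use until something like the reaper reclaims it.

```ruby
# Hypothetical pool: at most one connection per thread, tracked in a hash
# keyed by the owning thread (a simplification, not ActiveRecord's code).
class ThreadOwnedPool
  def initialize(size)
    @size = size
    @owned = {}
    @mutex = Mutex.new
  end

  def connection
    @mutex.synchronize do
      @owned[Thread.current] ||= begin
        raise "could not obtain a connection from the pool" if @owned.size >= @size
        Object.new # stand-in for a real database connection
      end
    end
  end

  def release_connection
    @mutex.synchronize { @owned.delete(Thread.current) }
  end

  def in_use_count
    @mutex.synchronize { @owned.size }
  end
end

# Rack-style middleware: once the app has produced a response (or raised),
# give back the connection owned by the request thread, and only that one.
class ReleaseConnection
  def initialize(app, pool)
    @app = app
    @pool = pool
  end

  def call(env)
    @app.call(env)
  ensure
    @pool.release_connection
  end
end

pool = ThreadOwnedPool.new(5)
app = lambda do |_env|
  pool.connection                     # the request thread's own connection
  Thread.new { pool.connection }.join # a spawned thread checks one out too
  [200, {}, ["ok"]]
end
stack = ReleaseConnection.new(app, pool)

stack.call({})
p pool.in_use_count # prints 1: only the spawned thread's connection is still held
```

Nothing in this sketch ever releases the spawned thread's connection, so repeated requests keep draining the pool until checkouts fail, which is the situation the owner-thread check is meant to recover from.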

Congrats to Mathew :slight_smile:

But there's still another use case where a smarter implementation would help.

Suppose you spawn a new thread that queries the database. Once the query is done, the thread starts some processing that could take quite a long time but won't need to run any further statements against the database until that processing is finished. Something like this:

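```ruby
def my_action
  Thread.start do
    posts = Post.all.to_a               # checks out a connection for this thread
    do_long_time_processing_with posts  # connection stays checked out, though idle
    Post.update_all processed: true     # the next statement that needs the database
  end
end
```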

In this case, if someone else needs a connection while the thread is processing the data, this thread's connection is still marked as in use even though it sits idle most of the time. You could certainly work around it by calling close on the connection after Post.all.to_a, but it would be better if this were handled transparently.
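The manual workaround amounts to scoping the checkout around the query instead of holding it for the thread's whole lifetime. The sketch below models that with a hypothetical `ScopedPool` whose only API is a block-scoped `with_connection`; in a Rails app the analogous call would presumably be `ActiveRecord::Base.connection_pool.with_connection { ... }` around `Post.all.to_a`. Two queues stand in for the long processing step so the pool can be inspected while the worker is busy.

```ruby
# Toy pool whose only checkout API is a block-scoped with_connection
# (hypothetical; not ActiveRecord's ConnectionPool).
class ScopedPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << "connection #{i}" }
  end

  # Check a connection out for the duration of the block only.
  def with_connection
    conn = @free.pop(true) # non-blocking: raises ThreadError when empty
    yield conn
  ensure
    @free << conn if conn
  end

  def available_count
    @free.size
  end
end

pool    = ScopedPool.new(1)
loaded  = Queue.new
proceed = Queue.new

worker = Thread.new do
  # Stand-in for Post.all.to_a: the connection is returned right after loading.
  posts = pool.with_connection { |_conn| %w[post-1 post-2] }
  loaded << true
  proceed.pop # stand-in for the long processing step (no connection held)
  # Stand-in for the final update: borrow a connection again only briefly.
  pool.with_connection { |_conn| posts.size }
end

loaded.pop
p pool.available_count # prints 1: nothing is held during processing
proceed << true
p worker.value # prints 2
```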

This matters all the more because it's not always obvious that it is happening. Better connection management could reduce the need for a bigger pool.

That's why I think it would be a good thing if connections were returned to the pool not only at the end of the request (or thread) cycle but right after each statement runs, unless the user explicitly locks the connection, for instance because they are creating temporary tables that live for the duration of the connection, or something like that...

But it's certainly good to know about this change for the upcoming Rails 4.2.

Thanks for sharing.

Rodrigo.