[RFC] Connection pools for AR

I've started a refactoring working towards creating a proper connection pool class/object for ActiveRecord. Progress can be found in my connection_pool branch on github [1]. Some notes about the work:

- One connection pool object created and cached per AR::Base.establish_connection call.

- Connection pool manages the individual connections. Currently it's still a hash of connections per thread but it should be easy to move to a proper fixed pool with connection acquire/release functionality.

- At some point, in order to leverage fixed-size connection pools, we'll need to find a way for application code to notify the pool that it's done with the connection. A controller after_filter is one possibility, I'm interested to hear other thoughts as well.

- Connection API has not changed significantly, but I hope to at least make the connection pool API more sane and deprecate and/or remove a lot of the cruft like active_connection/clear_active_connections!/clear_reloadable_connections!/verify_active_connections! etc. Anyone who uses these or knows the original intent of these, please let me hear more about it. Their purpose seems less clear in light of the refactoring so far.

- Synchronization monitors have been introduced around both connection pool and connection access through a new active_support Module extension I wrote that lets you easily apply synchronization at the method level.

- allow_concurrency now simply flips between a real and null monitor for connection pool access.

- The synchronization overhead is small enough that I'm wondering what people think about making allow_concurrency default to true (favoring safety over a small boost in the speed of connection acquisition). I ran the AR tests for mysql, postgresql and sqlite3 with master and my branch:

master

- One connection pool object created and cached per AR::Base.establish_connection call.

Excellent!

- Connection pool manages the individual connections. Currently it's still a hash of connections per thread but it should be easy to move to a proper fixed pool with connection acquire/release functionality.

- At some point, in order to leverage fixed-size connection pools, we'll need to find a way for application code to notify the pool that it's done with the connection. A controller after_filter is one possibility, I'm interested to hear other thoughts as well.

This will be the biggest change, we can do implicit checkout of a connection when trying to access the database for the first time in a thread, but implicit checkin seems to be asking for trouble. We can handle it for ActionController easily enough by checking the connections back in after the request has been dispatched.

So my first thought is:

    Foo.find(:first) # retrieves connection
    Thread.new do
      Foo.find(:first) # retrieves another
      Bar.find(:first) # uses existing
      ActiveRecord::Base.release_connection
      Foo.find(:first) # retrieves another
    end

With retrieving blocking until the connection is available.
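A minimal sketch of that behaviour, assuming a fixed-size pool guarded by a monitor. The class name SketchPool and its internals are hypothetical and are not taken from the connection_pool branch; only the release_connection name comes from the example above.

```ruby
require "monitor"

# Hypothetical fixed-size pool; not ActiveRecord's actual implementation.
class SketchPool
  def initialize(size, &factory)
    @lock        = Monitor.new
    @available   = @lock.new_cond
    @idle        = Array.new(size) { factory.call }
    @checked_out = {} # thread => connection held by that thread
  end

  # Implicit checkout: the first access in a thread takes a connection,
  # blocking until one is free; later accesses in that thread reuse it.
  def connection
    @lock.synchronize do
      @checked_out[Thread.current] ||= begin
        @available.wait_while { @idle.empty? }
        @idle.pop
      end
    end
  end

  # Explicit check-in: hand the current thread's connection back
  # and wake one waiting thread, if any.
  def release_connection
    @lock.synchronize do
      conn = @checked_out.delete(Thread.current)
      if conn
        @idle.push(conn)
        @available.signal
      end
    end
  end
end
```

With a pool of size 1, a second thread calling connection waits until the first thread calls release_connection, which matches the "retrieves another" steps in the example.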

- Connection API has not changed significantly, but I hope to at least make the connection pool API more sane and deprecate and/or remove a lot of the cruft like active_connection/clear_active_connections!/clear_reloadable_connections!/verify_active_connections! etc. Anyone who uses these or knows the original intent of these, please let me hear more about it. Their purpose seems less clear in light of the refactoring so far.

There are two cases that I'm aware of that lead to those 'interesting' methods.

* Reloading in development mode (the classes get undefined, so the connection is gone)
* Re-establishing timed out connections.

By moving the connections to the pool rather than the classes that get reloaded, we avoid the first problem. Obviously checking out a connection should imply that it's been verified.

So with a proper 'pool' we can avoid all those methods, and improve the code for the simple one-thread case too.

This might seem like YAGNI to a lot of you, but I think it has the dual benefit of cleaning up the connection_specification code and supporting the endgame from my standpoint, which is making the connection pool implementation pluggable so that I can use the Java application server's connection pool when running with JRuby.

The code cleanup is pretty desperately needed, it's some hokey nasty stuff ;). The dual benefit of getting jruby connection handling a little easier is a definite win.

Comments appreciated!

Rather than swapping in and out a different monitor, why not have two different connection handler classes, one of which returns and manages a single connection, and another which manages a pool of a fixed size? The single connection version can expose the same API (checkout / check in / release) and perhaps raise exceptions if accessed from threads other than the one which created the connection.
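One way the two-handler idea could look, as a sketch only. Both class names and the checkout/checkin signatures are hypothetical; nothing here exists in ActiveRecord.

```ruby
# Hypothetical handler for the single-connection case. It exposes the
# same checkout/checkin API but refuses access from other threads.
class SingleConnectionHandler
  def initialize(&factory)
    @owner      = Thread.current
    @connection = factory.call
  end

  def checkout
    unless Thread.current.equal?(@owner)
      raise ThreadError, "connection belongs to another thread"
    end
    @connection
  end

  def checkin(_conn)
    # Nothing to return: the single connection is never handed out elsewhere.
  end
end

# Hypothetical handler for the fixed-size case, backed by a thread-safe Queue.
class FixedPoolHandler
  def initialize(size, &factory)
    @idle = Queue.new
    size.times { @idle << factory.call }
  end

  # Blocks while every connection is checked out.
  def checkout
    @idle.pop
  end

  def checkin(conn)
    @idle << conn
  end
end
```

Because both classes answer to the same two methods, calling code would not need to know which one is configured, and no monitor has to be swapped at runtime.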

new-connection-per-thread isn't a particularly great idea and I'd be open to dropping it entirely.

Thanks for your work on this, it'll be good to tidy this up.

There are still a bunch of outstanding issues to be decided, especially relating to objects retrieved in one thread and being worked on in another. This messes up :lock=>true etc. But let's tidy up the code first, then decide what level of concurrent use we want to support.

Fucking Awesome!

Now what the hell am I going to work on for GSoC? O yeah, AP is still full of cruft.

This will be the biggest change, we can do implicit checkout of a connection when trying to access the database for the first time in a thread, but implicit checkin seems to be asking for trouble. We can handle it for ActionController easily enough by checking the connections back in after the request has been dispatched.

Agreed. Implicit check-in seems bad. Pratik suggested in #rails-contrib that we might use a Dispatcher hook instead of a controller filter, which makes sense. Get it out of the way of the application code so it doesn't get touched inadvertently.
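A rough sketch of the hook idea, assuming the dispatcher can be wrapped by an object that always checks the connection back in once the request has been handled. CheckinDispatcher and its call interface are hypothetical and do not describe the actual Dispatcher API; the pool only needs to respond to release_connection.

```ruby
# Hypothetical wrapper around whatever dispatches a request.
class CheckinDispatcher
  def initialize(app, pool)
    @app  = app  # anything responding to call(request)
    @pool = pool # anything responding to release_connection
  end

  def call(request)
    @app.call(request)
  ensure
    # Runs whether the request succeeded or raised, so application
    # code never has to remember to check the connection back in.
    @pool.release_connection
  end
end
```

Keeping the check-in in an ensure clause outside the controller means a filter chain that halts early or raises cannot skip it.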

So my first thought is:

    Foo.find(:first) # retrieves connection
    Thread.new do
      Foo.find(:first) # retrieves another
      Bar.find(:first) # uses existing
      ActiveRecord::Base.release_connection
      Foo.find(:first) # retrieves another
    end

With retrieving blocking until the connection is available.

I realized we might also be able to make ActiveRecord::Base.transaction do its own checkout/checkin of the connection. There will probably be more opportunities to optimize this later too. For example, we could make ActiveRecord::Base.connection accept a block, then start changing places in the framework to use the block form instead of the version that returns the connection and relies on implicit checkout. I haven't looked too closely to see what the potential is there, though.
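The block form might look something like the sketch below. The module name BlockCheckout and the with_connection method are hypothetical, and the host object is assumed to provide checkout and checkin(conn). The begin/commit/rollback calls are named after the adapter's transaction methods, but treat the exact protocol as an assumption.

```ruby
# Hypothetical mixin for a pool-like object that provides
# checkout and checkin(conn).
module BlockCheckout
  # Check a connection out for the duration of the block only.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end

  # A transaction that manages its own checkout/checkin.
  def transaction
    with_connection do |conn|
      conn.begin_db_transaction
      begin
        result = yield conn
        conn.commit_db_transaction
        result
      rescue Exception
        conn.rollback_db_transaction
        raise
      end
    end
  end
end
```

With this shape the connection is held no longer than the block runs, so no separate notification to the pool is needed for code that uses it.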

/Nick

LOL, thanks! Heh, sorry to steal your thunder. I'm still very much in favor of your proposal, and the rest of the guys on the JRuby team are too, as it's a big deal for JRuby w/ non-GIL native threading. There's still a lot to do, and this work in AR is actually pretty small in comparison.

Cheers, /Nick

You're doing a better job than I would have. You know what you want from the connection pool to do the JRuby voodoo you were talking about. That makes you more qualified :wink:

You might want to look at this very old patch, which includes another (old) adapter pool implementation and some tests:

http://dev.rubyonrails.org/attachment/ticket/2162/connection_pool_test.rb

It used the approach of acting as an actual adapter, using method_missing to handle checkin/checkout before delegating to the underlying connection. The disadvantage of this is clearly that the overhead of checkin/checkout occurs for every transaction. The advantage though is that the rest of the application need not be aware of connection management at all.
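For illustration, the proxy approach described above could be sketched as follows. This is not the code from the linked patch; PooledAdapterProxy is a hypothetical name, and the pool is assumed to respond to checkout and checkin(conn).

```ruby
# Hypothetical proxy that stands in for an adapter and borrows a
# pooled connection for the duration of each forwarded call.
class PooledAdapterProxy
  def initialize(pool)
    @pool = pool
  end

  def method_missing(name, *args, &block)
    conn = @pool.checkout
    begin
      conn.public_send(name, *args, &block)
    ensure
      @pool.checkin(conn)
    end
  end

  def respond_to_missing?(name, include_private = false)
    conn = @pool.checkout
    begin
      conn.respond_to?(name, include_private)
    ensure
      @pool.checkin(conn)
    end
  end
end
```

The trade-off is visible in the code: callers just treat the proxy as a connection, but every forwarded call pays for a checkout and a checkin.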

Tom