Hi, just a small question on the way connections are managed in ActiveRecord.
From the best I can gather reading the code (and based on behaviour), ActiveRecord will check a connection out of the pool the first time a thread asks for one, then that thread will continue to hang on to the connection until it is closed. What is the reasoning behind this approach as opposed to checking out a connection for each DB operation and letting go of it once the operation is completed?
My team has had a couple of issues with this (leading to us needing to use a connection pool equal to the number of threads being used by our server, which is not really a big problem, but feels a bit counter-intuitive). This post is on much the same subject. I'm really just curious about the way this works.
> Hi, just a small question on the way connections are managed in
> ActiveRecord.
> From the best I can gather reading the code (and based on behaviour),
> ActiveRecord will check a connection out of the pool the first time a
> thread asks for one, then that thread will continue to hang on to the
> connection until it is closed. What is the reasoning behind this approach
> as opposed to checking out a connection for each DB operation and letting
> go of it once the operation is completed?
Constantly checking out a connection then checking it back in seems like
it would cause a performance bottleneck. The assumption is that lock
contention will become the bottleneck if we have to constantly check in
/ check out connections. The other side of that is it will cause more
complexity throughout AR as we'll have to change every place that
touches the connection.
Doing it once per thread, and giving the user control over checking in
connections (see `release_connection` on the connection pool) seemed
like a good compromise for performance and code complexity.
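For reference, a rough sketch of what I mean, using the pool API that already exists (`release_connection`, and its block-scoped counterpart `with_connection`):

pool = ActiveRecord::Base.connection_pool

# scoped checkout: grab a connection for just this block, check it back in on exit
pool.with_connection do |conn|
  conn.select_value("SELECT 1")
end

# or give back whatever connection the current thread is holding on to
pool.release_connection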
> My team has had a couple of issues with this (leading to us needing to use
> a connection pool equal to the number of threads being used by our server,
> which is not really a big problem, but feels a bit counter-intuitive).
Is it counter-intuitive? If you're doing database intensive work, then
you'd probably want the same number of connections as you have request
threads in order to maximize throughput, otherwise you'll still be
blocking other requests at some point (see Amdahl's law).
> Is it counter-intuitive? If you're doing database intensive work, then
> you'd probably want the same number of connections as you have request
> threads in order to maximize throughput, otherwise you'll still be
> blocking other requests at some point (see Amdahl's law).
Yes, in case you're doing heavy database work you certainly want to have enough connections available to avoid contention. In our case, we had a couple of short database transactions wrapped around other computational work, including a call to an external service (we know). The thing we found counter-intuitive was that when the external call got slow (as they always do), we started being unable to check out database connections. Not a particularly common or desirable use-case, I know, and I now get that we can work around this by manually checking connections in and out, so that part is good.
I'm certainly not suggesting that optimising for the above case is a good idea; like I said, I was just interested. I don't really come from a web or Ruby background; I'm used to resource sharing systems either forcing you to check resources out and in explicitly, or doing it automatically each time you request the resource. I guess threads in web apps are pretty short lived, so it wouldn't normally be an issue.
> > Constantly checking out a connection then checking it back in seems like
> > it would cause a performance bottleneck. The assumption is that lock
> > contention will become the bottleneck if we have to constantly check in
> > / check out connections.
> >
> I dunno, it feels to me like under normal circumstances you'd not see any
> *more* lock contention. I would *guess* that under the current scheme, the
> first few web transactions would be fast (no locking at all), but after
> that you'd end up waiting on average about the same amount of time (since
> each thread has to wait for a connection before it can get started
I don't really follow. If you have 5 request threads, and 5 connections
available, no request thread will contend on a lock for a connection
(unless you are spawning additional threads yourself, which means you'd have
1 + new_threads.count connections in use, which I said is off the beaten path).
> and then
> needs to do *all* its operations before it checks it back in). I'm curious,
> I'll see if I can mock something up to test this using the existing
> checkout/checkin functionality.
Sorry, I meant in the case where you have, say, a pool of five connections and ten request threads (which is entirely possible given the default Rails and, say, Puma configuration). Where you have a thread per connection, I'd expect the existing model to perform very slightly better because you only have to go through the process of acquiring the connection once per thread (rather than going through the locking/synchronising process multiple times). I'd expect them to wait about the same time in this situation when all (or at least a large majority) of the work being done by the app is in the database; the more time you spend processing in the app, the more you could benefit from releasing the connection between database work.
Working on that test case right now, will get back to you
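(For the curious, the shape of the test I'm mocking up is roughly the following; sqlite3 and an in-memory database are just to keep the sketch self-contained, and the thread/query counts are arbitrary, so don't read anything into the absolute numbers.)

require "active_record"
require "benchmark"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:", pool: 5)
pool = ActiveRecord::Base.connection_pool

# one checkout per thread, held for the whole run (roughly the current behaviour)
held = Benchmark.realtime do
  Array.new(5) {
    Thread.new { pool.with_connection { |c| 1_000.times { c.select_value("SELECT 1") } } }
  }.each(&:join)
end

# one checkout/checkin per query (the alternative being discussed)
per_query = Benchmark.realtime do
  Array.new(5) {
    Thread.new { 1_000.times { pool.with_connection { |c| c.select_value("SELECT 1") } } }
  }.each(&:join)
end

puts "held: #{held.round(2)}s  per-query: #{per_query.round(2)}s"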
Ah, I see what you're saying now. TBQH, it seems fine to me if we
internally do check connections in. I just don't know how much of an
effort it will require. I suspect there are maybe 1 or 2 places that
we would have to change to get the most benefit. The main thing I worry
about is checking out a connection where we don't actually need to do
any database work (like getting the schema cache).
If you're interested in finding the places we need to fix, I'd be happy
to merge the patch!
From what I can tell, ActiveRecord only provides one way to manually open a transaction, which is a transaction do ... end block; in this case, if the connection is checked out and in around the transaction, it will still work OK; you'll have the same connection and the same transaction for everything within that block.
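That is, something like this (Order and Payment are just placeholder models for the sake of the example):

ActiveRecord::Base.transaction do
  Order.create!(total: 100)
  Payment.create!(amount: 100)  # same connection, same transaction as the line above
end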
Aaron, I have no idea if I'll be able to work that out, but I'll give it a shot.
The only time I've seen one connection per thread being an issue was in one app that ran many processes and started to reach the connection limit of its db server during traffic peaks. Even in that scenario, contention is also a risk if you reduce the pool.
Other than the mental image that you're using less resources (at the price of contention), I am not sure there is going to be any significant practical win. It could be, I am not saying it wouldn't (I have not done your study), but at first sight the cost/benefit is not clear to me in practical terms.
Regarding transactions, #execute is public AR interface, and opening a transaction manually through it (say, issuing a raw BEGIN) is valid AR code. I am not sure if drivers know the connection is in a transaction and have an API to check, but #transaction_open? returns false as of this writing. Why would anybody do that? I don't know, maybe because they are writing a heavy SQL oriented script and doing that manually feels natural… it doesn't matter, it can be done.
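For instance (the accounts table is made up, but the pattern is real):

conn = ActiveRecord::Base.connection
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
conn.transaction_open?  # => false, AR does not know the driver is inside a transaction
conn.execute("COMMIT")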
Another practical point is that when a connection is checked out, the connection pool tests whether it is alive by issuing, for example, SELECT 1. That would mean that if a request performs 200 queries today, they would become 400 queries. I don't know if the alive check could be rethought to run less frequently, but it probably should, to avoid that 2x.
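In other words, each per-operation checkout would pay roughly this (a sketch of the overhead, not the pool's actual internals):

def query_with_fresh_checkout(pool, sql)
  conn = pool.checkout     # the pool verifies the connection here, e.g. with a SELECT 1 round trip
  conn.select_value(sql)   # the query the application actually wanted
ensure
  pool.checkin(conn) if conn
end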
If the concern is long-running threads holding DB connections they don't need, wouldn't a simpler solution be to explicitly return connections to the pool (via release_connection) before doing a long-running non-DB operation? Checking out and back in on most operations seems like optimizing for that uncommon case at a runtime cost for the common one.
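Concretely, something along these lines (SlowExternalService and Widget are stand-ins for the external call and model mentioned earlier in the thread):

ActiveRecord::Base.connection_pool.release_connection  # give the connection back first
result = SlowExternalService.call                      # long-running, no DB needed
Widget.where(token: result.token).first                # AR lazily checks a connection out again here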
Some people will build really small rack apps that use active record, and want to max out their processes/threads. It's not that common but it happens.
More commonly, people don't know about this behavior and get ActiveRecord::ConnectionTimeoutError because they're using default Puma, which is set to 16 threads, and default AR, which is set to 5 connections. I wrote these docs to help people understand the connection requirements of AR: https://devcenter.heroku.com/articles/concurrency-and-database-connections
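One way to make that mismatch visible is a boot-time check along these lines (the initializer path and the RAILS_MAX_THREADS env var are conventions I'm assuming here, not anything AR or Puma requires):

# config/initializers/connection_pool_check.rb (hypothetical)
Rails.application.config.after_initialize do
  pool_size   = ActiveRecord::Base.connection_pool.size        # AR's default is 5
  max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", "16"))  # Puma's default max is 16
  if pool_size < max_threads
    Rails.logger.warn "connection pool (#{pool_size}) is smaller than the thread count " \
                      "(#{max_threads}); expect ActiveRecord::ConnectionTimeoutError under load"
  end
end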
The biggest benefit is going to come to people using a DB like Postgres, where each connection uses an entire backend (a forked worker, essentially), and, given the high cost of backends, it's recommended to keep the number of connections low.
Checking connections in/out of the pool as necessary would, I think, significantly benefit many medium or larger applications using Postgres, as it would make it easier to scale the number of Rails processes without impacting the number of connections to PG required.
Good point. That's an additional overhead, especially for systems with DBs
that are farther away.
One other point, possibly MySQL-specific: there are connection-specific SQL
variables. Under the current system, code can rely on them being set after
they are set in the same thread:
ActiveRecord::Base.connection.execute("SET SQL_MODE = ''")
# now SQL_MODE for the connection is set to empty string
SomeModel.create(...)
With per-DB-interaction checkin/checkout, it doesn't appear to be possible
to reliably manipulate these variables, as the connection used in the first
statement may not match the one used in the second. More amusing /
worrisome, somebody ELSE gets a connection with an altered SQL_MODE...
> With per-DB-interaction checkin/checkout, it doesn't appear to be possible to reliably manipulate these variables, as the connection used in the first statement may not match the one used in the second. More amusing / worrisome, somebody ELSE gets a connection with an altered SQL_MODE…
I would personally prefer to have to open and close a connection around variables like this, if only because it would serve as a reminder to set, say, SQL_MODE back to what it was; it only feels slightly harder in the current system to hand somebody else a connection with an altered SQL_MODE.
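For what it's worth, the scoped version I'd prefer looks roughly like this (reusing the SomeModel example from above; the restore step at the end is exactly the part the current system makes easy to forget):

ActiveRecord::Base.connection_pool.with_connection do |conn|
  previous = conn.select_value("SELECT @@SESSION.sql_mode")
  conn.execute("SET SQL_MODE = ''")
  SomeModel.create(name: "example")  # same thread, so (today) the same connection
  conn.execute("SET SQL_MODE = #{conn.quote(previous)}")
end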
> Other than the mental image that you're using less resources (at the price of contention), I am not sure there is going to be any significant
> practical win. It could be, I am not saying it wouldn't (I have not done your study), but at first sight the cost/benefit is not clear to me
> in practical terms.
It's not so much the mental image that you're using less resources as the actual ability to use less resources. The practical benefit is that for applications which don't use the database to do all their grunt work (which seems to be most of them; rendering seems to take at least as long as a DB round trip for most things), you don't have to saturate your threads with database connections, which means you can scale your app further.
I guess my theory was that there would be negligibly more contention, and my research seems to
have backed that up, especially given the below. Happy to be proven wrong!
> Regarding transactions, #execute is public AR interface, and opening a transaction manually through it (say, issuing a raw BEGIN) is valid AR code. I am not sure if drivers know the connection is in a transaction and have an API to check, but #transaction_open? returns false as of this writing. Why would anybody do that? I don't know, maybe because they are writing a heavy SQL oriented script and doing that manually feels natural… it doesn't matter, it can be done.
This is certainly an issue; I'm not sure how to get around it without either connection-per-thread or #transaction_open?. It does feel like a bad thing to do, though; is keeping support for this really that important?
OK; it looks like other systems either do an alive-check on each checkout (I don't think this necessarily needs to run a query on the database; for example, JDBC provides a Connection#isValid method; I can't find any source for what this actually does though. Maybe it does just run a query), or they do it on a schedule in a background thread, which probably isn't an option here because of the GIL.
My guess is that because my tests were run with the database so close to the app (on the same machine), this wasn't really an issue; I guess latency would be the real performance-killer in this case. I'll rerun the tests with an external database just to be sure.
The other issue I noticed (which I just realised I hadn't mentioned) was that I had to disable the query cache middleware in order to make checking connections in and out possible; I'm not entirely sure what that middleware is for (it appears to just semi-temporarily enable the query cache on all requests?), but it poses a problem to checkin-checkout (even in the manual sense).
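For reference, disabling it amounts to something like this in config/application.rb (Rails 4-era syntax):

# removes the middleware that enables the query cache for the duration of each request
config.middleware.delete "ActiveRecord::QueryCache"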