An issue people frequently run into when scaling apps that use Postgres is the high memory use per-connection on the Postgres server (Postgres uses a process per-connection). This is commonly solved by running PgBouncer in transaction-pooling mode in front of Postgres to share those connections between many clients (Heroku recommend this approach and even have a buildpack for it).
Unfortunately, ActiveRecord makes use of session-level settings, which are set during connection initialisation. In most cases, it’s not a huge problem. You can use PgBouncer’s connect_query to set the same settings as ActiveRecord (making the ActiveRecord SET commands effectively no-op).
However, in a notable exception, ActiveRecord temporarily disables reporting of query errors. This leaves random server connections in a bad state, where they won’t report any errors, including constraint violations (see #22101).
At a minimum, it feels like #22101 should be merged, and some documentation written for Rails users who are running PgBouncer (we have some thoughts on what it should be[1] and we’re happy to write this).
It may also be nice to provide a flag which stops ActiveRecord from using connection-level settings. This would reduce instances of weird behaviour where people have a connect_query that is out of sync with what ActiveRecord does (which itself may change between releases).
To be clear, this goes beyond #22101. ActiveRecord sets things like time zone at the connection level, so users may get weird behaviour with date/time, depending which backend connection PgBouncer happens to use for a given query.
From a quick search, it looks like other people are running into this situation in the wild:
I’ve been doing a little more digging. I instrumented PG::Connection.async_exec to see everything that gets executed during connection initialisation. It turns out relatively few things are set on the connection[1]. Those things are:
client_min_messages
standard_conforming_strings
time zone
PgBouncer has a safe way of setting the last two when using transaction pooling - specifically:
Startup parameters are: client_encoding, datestyle, timezone, standard_conforming_strings and application_name. PgBouncer can detect their changes so it can guarantee they remain consistent for client. Available from PgBouncer 1.1.
If the ActiveRecord maintainers are cool with the approach, I’d like to:
Set standard_conforming_strings and time zone as part of the connection string, rather than using SET statements on an already open connection
Write documentation for using ActiveRecord with PgBouncer (an explanation of the caveats, which features you can and can’t use)
The last debatable point is client_min_messages. In Rails 4.0, WARNING became the default setting to avoid noise in logs. I suspect the solution here is documentation - have people set it to match the Postgres default in database.yml (or conversely use PgBouncer’s connect_query to match what ActiveRecord sets).
I realise there’s a lot of subtle stuff going on here. If anything’s unclear, I’m happy to go into more detail. If not, I’d love to hear from a maintainer on whether that proposed solution is or .
[1]: More things may be set depending on the contents of database.yml. For instance, if a schema_search_path or extra variables have been provided, those will be set on the connection.
Write documentation for using ActiveRecord with PgBouncer (an explanation of the caveats, which features you can and can’t use)
Sounds like there definitely needs to be better documentation. Most of these issues come about with any connection pooler that supports connection pooling, so it doesn’t need to be pgBouncer-specific.
The last debatable point is client_min_messages. In Rails 4.0, WARNING became the default setting to avoid noise in logs. I suspect the solution here is documentation - have people set it to match the Postgres default in database.yml (or conversely use PgBouncer’s connect_query to match what ActiveRecord sets).
I don’t know if this is possible, but the best way to handle this would be a mode where Rails simply disconnects from the database at the end of handling an inbound connection. One of the major points of a connection pooler is to make getting a new connection fast, so that you can connect and disconnect at will. You might not want that if you’re not using an external pooler, but having rails doing pooling and then using an external pooler as well is just begging for trouble.
In 99% of cases letting the connection pooler worry about managing database connections will be the best option, because it makes each Rails thread stateless. The only time you wouldn’t want that is if you need to run some very time consuming stuff before returning the page, and don’t want to hold the database connection while you do it. The best way to handle that is to allow Rails code to close the database connection explicitly.
With current functionality, you can get all kinds of screwy behavior, beyond what’s referenced here. For example, starting a transaction and not explicitly closing it (with either a commit or a rollback). This is the danger of any kind of nested connection pooling: if you’re not extremely careful to keep both pools in sync you’re going to get all kinds of weird things.