We are seeing database scalability issues when working with AWS Aurora Postgresql read replica autoscaling. Specifically, after a new read replica becomes available, connections are not rebalanced across the read replica cluster, causing wasted resources. This largely happens because AWS RDS primary read-only endpoint seems to be implemented using DNS load balancing across the read replicas. Once the connection is established, the opportunity to load balance no longer exists.
A potential solution to this would be allowing the
ActiveRecord::ConnectionAdapters::ConnectionPool to have an optional
lifetime_timeout, which would force a connection to be reconnected. It could likely be implemented as simply as disconnecting and removing a connection from the pool when it is checked back in and the connection has been alive for longer than the configured timeout.
I would presume the default behavior for a connection pool would be to not need to recycle connections unless explicitly configured to. The likely values someone would want to set a
lifetime_timeout to would be in the 10 minutes to 1-hour range. This would allow the system to still benefit from connection pooling/reuse while still allowing for periodic cycling behavior.
There are additional benefits to connection maximum lifetimes with PostgreSQL, such as allowing PostgreSQL to more efficiently manage used memory and caches.
Does this seem like a beneficial addition to an
ActiveRecord::ConnectionAdapters::ConnectionPool or are there alternatives we have missed and should consider? Happy to attempt a PR if the idea seems worthy.