Proposal: cross-pool connection cap for multi-tenant (DB-per-tenant) apps

TL;DR

In multi-tenant Rails apps using database-per-tenant, the only knobs we have to bound total connections per app instance are per-pool (max_connections, idle_timeout, max_age, etc.). This means worst-case total connections scale with tenant_count × max_connections, with no hard ceiling. I’d like to propose an opt-in cross-pool reaping step: when reaping runs, if a global cap is configured and the total number of open connections (in use plus idle) across all pools exceeds it, reap idle connections across pools until the total is back at or under the cap.

Background discussion (which I’m migrating here per CONTRIBUTING): “Reap Connections Based on Total Connections Across Pools”, rails/rails Discussion #55529 on GitHub.

Problem

We run a multi-tenant app with a separate database per tenant — each tenant gets its own ConnectionPool. As tenant count grows, two things scale badly:

  1. DB-server side: total connections to a shared Postgres instance can spiral. (PgBouncer/RDS Proxy mitigates this, but is out of scope here.)
  2. App-instance side: total connections held by one app process can grow to roughly N_tenants × max_connections_per_pool. Rails offers no way to enforce an upper bound across pools.

For (2), today’s mitigation is to tune idle_timeout and reaping_frequency aggressively, but those are best-effort: they reduce average usage without guaranteeing a ceiling. A traffic burst across many tenants can still push total connections well past what the app’s resources (or the database server’s own max_connections limit) can support.

What 8.1 already gives us

The new options in 8.1 (max_age, min_connections, pool_jitter, refined keepalive) are genuinely helpful — max_age in particular forces recycling of long-lived connections, and pool_jitter reduces synchronized reconnect storms across many pools. But all of these are per-pool and none provide a global ceiling. If I have 500 tenants and max_connections: 5, I can still legitimately hold 2,500 connections from one app instance during a burst.
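
To make the arithmetic concrete, here is a tiny sketch of the worst case. The helper name worst_case_connections is hypothetical (it is not a Rails API), and it assumes every tenant pool fills to its per-pool limit at the same time:

```ruby
# Hypothetical helper, not part of Rails: the most connections one process
# can hold if every tenant pool fills to its per-pool limit at once.
def worst_case_connections(tenant_count:, max_connections_per_pool:)
  tenant_count * max_connections_per_pool
end

puts worst_case_connections(tenant_count: 500, max_connections_per_pool: 5) # 2500
```

The bound grows linearly with tenant count, so no per-pool setting can keep it flat as tenants are added.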

Proposed solution (sketch)

Add an opt-in configuration (working name: max_total_connections, settable on ActiveRecord::Base.connection_handler or as a top-level setting) that:

  1. During the existing reaping cycle, after each pool’s local reap completes, check sum(connections_in_use_or_idle) across all pools managed by the handler.
  2. If the sum exceeds the configured cap, iterate pools (LRU by last activity? round-robin? — open question) and reap idle connections until the sum is back under the cap.
  3. Connections in active use are never forcibly closed — the cap is a soft cap that’s enforced by aggressive idle reaping, not by interrupting checked-out work. If demand legitimately exceeds the cap and no idle connections exist to reap, callers experience the existing checkout_timeout semantics.

This is purely additive: the option defaults to Float::INFINITY (current behavior).
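
The steps above could look roughly like the sketch below. It is a standalone illustration, not a patch: SketchPool is a stand-in struct rather than ActiveRecord's ConnectionPool, enforce_total_cap is a hypothetical name, and pools are visited in plain array order because the real ordering is still an open question.

```ruby
# Stand-in for a connection pool; NOT ActiveRecord's ConnectionPool.
# in_use: checked-out connections, idle: open but unused connections.
SketchPool = Struct.new(:name, :in_use, :idle, keyword_init: true) do
  def total
    in_use + idle
  end

  # Close up to `count` idle connections and return how many were closed.
  def reap_idle(count)
    closed = [count, idle].min
    self.idle -= closed
    closed
  end
end

# Hypothetical cross-pool step, meant to run after the per-pool reaps.
# Only idle connections are closed; in-use connections are never touched.
def enforce_total_cap(pools, max_total_connections: Float::INFINITY)
  excess = pools.sum(&:total) - max_total_connections
  return 0 if excess <= 0 # at or under the cap (always true for the default)

  reaped = 0
  pools.each do |pool|
    break if reaped >= excess
    reaped += pool.reap_idle(excess - reaped)
  end
  reaped # may be less than excess when there are not enough idle connections
end

pools = [
  SketchPool.new(name: "tenant_a", in_use: 2, idle: 3),
  SketchPool.new(name: "tenant_b", in_use: 1, idle: 4),
  SketchPool.new(name: "tenant_c", in_use: 5, idle: 0)
]

reaped = enforce_total_cap(pools, max_total_connections: 10)
puts "reaped #{reaped}, total now #{pools.sum(&:total)}"
```

With the default of Float::INFINITY the method returns immediately, which matches the "current behavior" default. When every connection is checked out, nothing is reaped and the total stays over the cap, which is the soft-cap behavior described in step 3.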

Open questions

  1. Scope. Should the cap apply per-ConnectionHandler, per-process, or be configurable? Per-handler seems right (one cap per “role” in the multi-DB sense) but I’d love input.
  2. Reaping order. When over-budget, which pool gives up connections first? Options: LRU by last activity, largest pool first, weighted by min_connections, or simple round-robin. LRU feels most intuitive but adds bookkeeping.
  3. Interaction with min_connections. A pool with min_connections: 2 should presumably not be reaped below its floor even when over budget — otherwise the floor is meaningless. Is that the right call, or should the global cap override?
  4. Where to wire it in. Most naturally inside the Reaper, called once per reaping cycle after each pool’s reap finishes. But there may be a cleaner place I’m not seeing.
  5. Naming. max_total_connections? global_max_connections? connection_handler_max_connections?
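
To make questions 2 and 3 easier to discuss, here is a sketch that compares two of the candidate orderings while respecting the min_connections floor. Everything in it is hypothetical: FloorPool is a stand-in struct, plan_reap and REAP_ORDERS are made-up names, last_activity is an arbitrary timestamp, and I am assuming the floor applies to a pool's total connections (in use plus idle).

```ruby
# Stand-in pool for illustration; NOT ActiveRecord's ConnectionPool.
FloorPool = Struct.new(:name, :in_use, :idle, :min_connections, :last_activity,
                       keyword_init: true) do
  def total
    in_use + idle
  end

  # Idle connections that can be closed without dropping below the floor.
  # Assumption: the floor counts in-use and idle connections together.
  def reapable
    [[total - min_connections, idle].min, 0].max
  end
end

# Two of the candidate orderings from question 2, as sort strategies.
REAP_ORDERS = {
  lru: ->(pools) { pools.sort_by(&:last_activity) },           # least recently active first
  largest_first: ->(pools) { pools.sort_by { |p| -p.total } }  # biggest pool first
}.freeze

# Returns { pool name => connections to reap } without mutating the pools.
def plan_reap(pools, excess:, order: :lru)
  plan = {}
  REAP_ORDERS.fetch(order).call(pools).each do |pool|
    break if excess <= 0
    take = [pool.reapable, excess].min
    next if take.zero?
    plan[pool.name] = take
    excess -= take
  end
  plan
end

pools = [
  FloorPool.new(name: "tenant_a", in_use: 0, idle: 3, min_connections: 2, last_activity: 100),
  FloorPool.new(name: "tenant_b", in_use: 1, idle: 4, min_connections: 0, last_activity: 300),
  FloorPool.new(name: "tenant_c", in_use: 2, idle: 2, min_connections: 0, last_activity: 200)
]

p plan_reap(pools, excess: 4, order: :lru)
p plan_reap(pools, excess: 4, order: :largest_first)
```

In this example LRU spreads the reaping across all three pools (tenant_a gives up only one connection because of its floor), while largest-first takes everything from tenant_b. The bookkeeping cost of LRU is the last_activity field, which each pool would need to maintain.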

Happy to put together a patch if there’s appetite for this. Wanted to validate the approach first.