[Proposal] Adding postgres pipeline support in ActiveRecord

Problem Statement We have a scenario where there is a network delay in the database communication because of inter region / datacentre communication. While load_async looks promising to improve app performance, it does add some more load on no of database connections and no of Ruby threads. There is a feature in Postgres 14 of pipelines PostgreSQL: Documentation: 14: 34.5. Pipeline Mode which will help in sending queries to database even before waiting for the response on same connection and can bring performance benefits. If the support for Pipelining is added in ActiveRecord then it will benefit in reducing the latency where network delays are biggest concern.

ruby-pg change Change in Ruby-pg gem has been done as part of the PR Add support for Pipeline mode of PostgreSQL-14 · Issue #401 · ged/ruby-pg · GitHub

Proposed interface Similar to ActiveRecord::Base.connected_to(role: :reading) , there can be an interface like

ActiveRecord::Base.connected_with_pipeline(role: :reading) do
  # all code in this block will be connected and executed in pipeline connection
end

For an entity with children being loaded as separate DB calls instead of joins (e.g. the case of preload), the interface could be something like

User.connected_with_pipeline(role: :reading).joins(:addresses).where("addresses.country = ?", "Poland").preload(:addresses)

Challenges / Views

Wanted to understand the impact areas which we need to be aware of before making/start doing this change. Also wanted the community views around this new behaviour to be added and any relevant suggestions on the same

Few of the challenges in implementation that I could think of are :

  • In pipeline mode, the results are sent back in the order in which the send_query was invoked, therefore some state would be needed to maintain the order in case of block calls, so as to initialise correct entities (Refer PostgreSQL: Documentation: 14: 34.5. Pipeline Mode).
  • For dependent queries, having a mechanism of waiting if a subsequent query uses a field of a previous query (Similar behaviour in load_async).
  • Error response handling for different queries, as if one of the query fails in the pipeline other queries are not processed
  • Compatibility with other modes of loading an entity (load_async, eager_load, includes, preload)

Do suggest any relevant things to watch out for, or in general. about the change which we are thinking to add.

Thanks

You could probably piggy back on ActiveRecord::FutureResult to implement this, since the interface would likely be the same.

That being said, my humble opinion after reading this:

  • This is a fairly niche need (talking to a DB that has bad latency is generally done as a last resort measure, nobody willingly does this).
  • Like load_async, It’s only applicable if you have independant queries, which further limits the applicability.
  • It’s quite a lot of extra complexity and code for Active Record (even though a good part could be shared with load_async) for something that only works with one DB (even though other clients could add support for it too).

Where it could be useful in a more general case is for preloads. We’ve done some testing with using load_async automatically when preloading associations, and the results were mixed, but maybe pipelining would be more interesting there. But again the overwhelming majority of apps talks to databases with a fast LAN connection, so the rountrip is expected to be way under 1ms, so you’d need to pipeline quite a few queries before you see noticeable gains.

All this to say that you should feel free to explore this, but it won’t be easy and the likeliness of it being merged isn’t particularly high.

2 Likes

Also, another approach that may make sense, and could be way easier, would be to implement a PostgresqlAdapter that always pipeline and return ActiveRecord::FutureResult.

With such adapter and some minimal changes in Active Record, you’d get efficient pipelining when using preload.

1 Like