There is something I always wondered. Why are we not starting SQL queries ahead of time, in an async way, in the controller, so that once we reach the view we might already have some results available? In my world view, the SQL query is mostly blocking IO that could be run concurrently and started ahead of time. Of course, this would mess up the idea of one database connection per request and many more concepts, like lazy queries.
# Controller
def index
  @independent_query_slow = Async { Product.all }
  @independent_query_fast = Async { User.all }
end
# View
<%= @independent_query_slow.each do |as_usual| %>
  # no lazy query happens here, but the query result might have been loaded already
<% end %>
<%= @independent_query_fast.each do |as_usual| %>
  # no lazy query happens here, but the query result was probably loaded already
<% end %>
<%= @independent_query_slow.where(id: 1234).each do |as_usual| %>
  # lazy as usual, but has to wait until the query result is available?
<% end %>
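The "kick off early, block on first access" idea can be sketched in plain Ruby with threads. This is only an illustration under stated assumptions: `AsyncResult` and `fake_query` are hypothetical names, and `fake_query` (a `sleep`) stands in for a blocking SQL round-trip; a real version would call something like `relation.load` on the background thread.

```ruby
# A tiny stand-in for "start the query in the controller, block on
# first access in the view". The thread starts the (fake) query right
# away; #each joins it lazily.
class AsyncResult
  def initialize(&block)
    @thread = Thread.new(&block) # work starts immediately
  end

  def each(&block)
    value.each(&block) # blocks here only if the work isn't done yet
  end

  def value
    @value ||= @thread.value # Thread#value joins and returns the result
  end
end

# Simulated blocking SQL round-trip.
def fake_query(rows, delay)
  sleep(delay)
  rows
end

slow = AsyncResult.new { fake_query([:product_a, :product_b], 0.2) }
fast = AsyncResult.new { fake_query([:user_a], 0.05) }

# Both "queries" run concurrently: the total wait is roughly 0.2s, not 0.25s.
fast.each { |row| puts row }
slow.each { |row| puts row }
```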
In a similar fashion, couldn't the view be rendered in an async way too? Again, this would break many assumptions, but it could be dealt with via a clear syntax around the blocks.
# Controller
def index
  @independent_query_slow = Product.all
  @independent_query_fast = User.all
end
# View
<%= @independent_query_slow.some_scope.each do |as_usual| %>
  # wait till the query result is available, then start rendering
<% end.async %>
<%= @independent_query_fast.some_scope.each do |as_usual| %>
  # wait till the query result is available, then start rendering
  # This output will likely be rendered ahead of the slower block above.
<% end.async %>
<% request.wait # ????? %>
# Or wait at the end of the view till all async renders have finished
The case of the async view rendering could be isolated using something like partials:
# app/views/users/index.html
<% User.all.each do |user| %>
  <%= render "user", async: true, user: user %>
<% end %>
# app/views/users/_user.html
<%= image_tag user.avatar # Nicely async loaded from S3 %>
<% user.products.each do |product| %>
  # The database connection pool be like 🤯, or
  # hopefully it has a limit of max X concurrent queries
  # per request and then starts blocking again.
<% end %>
Sorry that I can't describe my idea better, but I see all this blocking IO and wonder if we couldn't speed up the response times of each request by having an opt-in for async/concurrency on certain occasions.
I've been using that type of pattern successfully in Elixir apps (it is very well equipped for that kind of thing), and I've seen it mentioned by a few in the Ruby world (although I have never implemented it in a Rails app at this point).
One tricky point is the lifecycle of those async queries:
What will happen to async queries if your main request is interrupted (e.g. by an error of some sort)? The impact can differ depending on whether these are read-only queries or queries that also change the database. Slightly long-running queries can stack up and create trouble.
What will happen to them if your main request finishes but for some reason does not properly "wait" for the async queries?
While some tooling can help (e.g. futures in concurrent-ruby), it can get tricky fast.
It is especially tricky in Ruby because AFAIK you still cannot safely kill Ruby threads (article), so you will have to bake in some kind of short-term interruptibility (via short safe timeouts, or via another home-baked mechanism, e.g. partial checkpoints in multi-step queries).
Another important point is the impact on your connection pool (you could see contention there and hit timeouts when other queries try to grab many connections from the pool at once).
So: it is an interesting idea, and it would be easy to implement in a small-scale system, but a robust implementation of this in Ruby will be a fair bit of work.
I've been toying with that async queries idea for a while now. As others said, it's easy to do for simple cases: basically just a ThreadPool from concurrent-ruby and a call to Relation#load from there.
But once you try to bring this to a real-world application, some problems arise:
Many apps track various request/job state in Thread.current via various constructs. When you schedule the query on a background thread, you break these expectations. Marginalia relies on this, for instance.
Even Rails itself uses thread variables for various things, e.g. ActiveSupport::Instrumentation.
If your request fails, you want to cancel all your async queries; that's not so easy.
If the query fails, you need to forward the exception.
I'm currently working on a prototype of that, but I'm not very confident it can lead to something easy to use and reliable.
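For the exception-forwarding point specifically, plain `Thread#value` already gives you one building block: it joins the thread and re-raises whatever exception killed it in the calling thread. A minimal illustration:

```ruby
# Thread#value joins the background thread and, if it died with an
# exception, re-raises that exception in the caller -- one simple way to
# surface a failed async query back in the request.
t = Thread.new do
  Thread.current.report_on_exception = false # silence the default stderr dump
  raise "query failed"
end

begin
  t.value
rescue RuntimeError => e
  puts e.message # prints "query failed"
end
```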
Could be interesting to follow the new async support in Django 3: Asynchronous support | Django documentation
It seems like it would be quite a huge effort for something that already works pretty well with external gems (you could reach a similar result with Sidekiq or any other background queue, and probably have fewer issues and pitfalls), but it's still pretty impressive that Django is able to do that.
Python has built-in async/await constructs, which makes this much easier and non-controversial. To do something similar in Rails we'd have to pick an async library, and it really wouldn't be as good.
Also, for now Django mostly supports async controllers and views, but the most important part IMHO is being able to do async DB queries.
To do something similar in Rails we'd have to pick an async library,
But why is that out of the realm of possibility? To me the harder part to solve is what @Thibaut_Barrere is talking about, and that's actually more on the Rails side than on the async library. How do you know when all the async parts of a request are done? Can the normal Rails/Rack architecture support this? Sounds hard.
That gem is really another topic; it's more of a frontend thing, and people have been doing this for ages (e.g. SSI).
But why is that out of the realm of possibility?
Because it's the old problem of mixing async with sync, the same reason EventMachine & co have always been hard to use efficiently: e.g. you need to avoid doing blocking IOs, etc. By being integrated into Python, asyncio solves most of that mixing problem; that's not the case with Ruby, so it would lead to tons of hard problems for users.
How do you know when all the async parts of a request are done
That's what I think I solved with my PR: the async query returns a future, and when you try to iterate on it, hopefully it's fully queried; if it's not, we wait on it. If the request is canceled or finishes without using the queries, it has a reference to them and can cancel them.
And since my PR uses a thread pool for querying, you don't need to be fully async-aware; the rest of your application can continue to do blocking IOs normally.
@byroot Your PR is super interesting, I'm just going over it and the benchmarks you supplied.
Question: do you think the performance boost is going to be different (e.g. smaller) between a threaded web server like Puma (or Sidekiq) and something like Unicorn? Because AFAIK the Ruby VM should switch threads and not block between
Product.all
User.all
And if it doesn't block, overall throughput should be about the same as with your async PR, no?
do you think the performance boost is going to be different (e.g. smaller) between a threaded web server like Puma (or Sidekiq) and something like Unicorn?
No. Assuming you have enough connections and DB capacity for all your Puma threads, it should hold up. It's totally orthogonal to throughput.
Because AFAIK the Ruby VM should switch threads and not block between …
Yes, but it would execute these queries one after the other rather than in parallel; that is what my patch is about. So it would in some cases improve response time, and whenever you improve response time you improve throughput; the inverse isn't true.
This is what I want to wrap my head around. I was even considering creating a simple benchmark, but I'm not entirely sure when we would expect improvement…
I'm not entirely sure when we would expect improvement…
Whenever you have a controller action that performs more than one query, and these queries are not dependent on each other (typically your Product/User example).
Also whenever you are preloading more than one relation, e.g. Post.all.preload(:comments, :tags); comments and tags could be fetched in parallel.
Beautiful, it looks like it made it into the Rails 7 alpha:
Asynchronous Query Loading
When you have a controller action that needs to load two unrelated queries, you can now do it concurrently through Relation#load_async. If you have three complex queries that each take 100ms, you'd have to spend 300ms executing them one by one before. Now you can run them in parallel, spending only a total of 100ms on the set.
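For reference, the resulting API is just a method on the relation, so the earlier Product/User example becomes something along these lines (an illustrative fragment, not runnable outside a Rails app with a configured database):

```ruby
# Rails 7+: both queries are scheduled on the async executor immediately;
# the first access to each result blocks only if it hasn't finished yet.
def index
  @products = Product.all.load_async
  @users    = User.all.load_async
end
```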