Async and concurrent SQL Queries and async view rendering

There is something I always wondered. Why are we not starting SQL Queries ahead of time in an async way in the controller so that once we reach the controller, we might already have some results available? In my world view, the SQL Query is mostly blocking IO which might could be run concurrently and might be started ahead of time. Of course, this would mess up the idea of one database connection per request and many more concepts like the lazy queries.

# Controller
def index
  @independent_query_slow = Async do Product.all end
  @independent_query_fast = Async do User.all end 
end

# View
<%= @independent_query_slow.each do |as_usual| %>
  # no lazy query happens here, but the query result was might loaded already
<% end %>
<%= @independent_query_fast.each do |as_usual| %>
  # no lazy query happens here, but the query result was probably loaded already
<% end %>
<%= @independent_query_slow.where(id: 1234).each do |as_usual| %>
  # lazy as usual but has to wait till the query result is available?
<% end %>

In a similar fashion, couldn’t the view be rendered in an async way too? Again, this would break many assumptions but could be dealt with a clear syntax around the blocks.

# Controller
def index
  @independent_query_slow = Product.all
  @independent_query_fast = User.all
end

# View
<%= @independent_query_slow.some_scope.each do |as_usual| %>
  # wait till the query result is available and then start rendering 
<% end.async %>
<%= @independent_query_fast.some_scope.each do |as_usual| %>
   # wait till the query result is available and then start rendering
   # The output will be likely be rendered ahead of the above slower block.
<% end.async %>
<% request.wait # ????? %>
# Or wait at the end of the view till all async renders have finished

The case of the async view rendering could be isolated using something like partials:

# app/views/users/index.html
<% User.all.each do |user| %>
  <%= render "user", async: true, user: user %>
<% end %>

# app/views/users/_user.html
<%= image_tag user.avatar # Nicely async loaded from S3 %>
<% user.products.each do |product |%>
     # The database connection pool be like 🤯 or 
     # hopefully has a limit of max X concurrent Queries
     # per Request and then starts blocking again.
<% end %>

Sorry that I can’t describe my idea better, but I see all this blocking IO and wonder if we couldn’t speed up the response times of each request by having an opt-in for async/concurrency on certain occasions.

3 Likes

It’s a great idea and should be possible to implement using the db gem and falcon web server.

https://rubygems.org/gems/db

SQL server interfaces already support sending the query and enumerating the results separately.

2 Likes

I’ve been using that type of patterns on Elixir apps successfully (it is very well equipped for that type of stuff), and I’ve seen it mentioned by a few in the Ruby world (although I never implemented that for Rails app at this point).

One tricky point is the lifecycle of those async queries:

  • What will happen to async queries if your main request is interrupted (e.g. an error of some sort) - the impact can be different if these are read-only queries, or queries bringing changes to the database too. Slightly long-running queries can stack up & create troubles.
  • What will happen to them if your main request finishes, but does not for some reason properly ā€œwaitā€ for the async queries.

While some tooling can help (e.g. futures in concurrent-ruby), it can get tricky fast :slight_smile:.

It is especially tricky in Ruby because AFAIK you still cannot safely kill Ruby threads (article), so this means that you will have to bake some kind of short-term interruptibility (via short safe timeouts, or via another home-baked mechanism e.g. partial checkpoints in multi-step queries).

Another important point is the impact on your connection pool (you could see contention there & meet time-outs due to other queries trying to grab many connections from the pool at once).

So - it is an interesting idea, it will be easy to implement on a small-scale system, but having a robust implementation for this in Ruby will be a fair bit of work.

I’m curious to know if anyone already implemented general-purpose tooling to help that, though, happy to discover anything helpful in that area!

3 Likes

I’ve been toying with that async queries idea for a while now. As other said it’s easy to do for simple cases, basically just a ThreadPool from concurrent-ruby and call Relation#load from there.

But once you try to bring this to a real world application some problems arise:

  • Many apps track various request/job state in Thread.current via various construct. When you schedule the query on a background thread you break these expectations. Marginalia rely on this for instance.
  • Even Rails itself use thread variables for various stuff, e.g. ActiveSupport::Instrumentation.
  • If your request fail you want to cancel all your async queries, it’s not that easy.
  • If the query fail you need to forward the exception.

I’m currently working on a prototype of that, but I’m not very confident it can lead to something easy to use and reliable.

3 Likes

Could be interesting to follow the new async support in Django 3 Asynchronous support | Django documentation | Django It seems like it would be quite a huge effort for something that already works pretty well with external gems (you could reach a similar result with Sidekiq / any other background queue and probably have less issues and pitfalls), but it’s still pretty impressive Django is able to do that.

Python has built-in async/await constructs which makes this much easier and non-controversial. To do something similar in Rails we’d have to pick an async library, it really wouldn’t be as good.

Also for now Django mostly support async controller and views, but the most important part IMHO is to be able to do async DB queries.

I believe I found a way to avoid all the pitfalls and submitted a PR a few days ago: https://github.com/rails/rails/pull/40037

2 Likes

Thanks for sharing , looks interesting! I would also mention async views GitHub - renderedtext/render_async: render_async lets you include pages asynchronously with AJAX , thats also a practical way to load stuff in a non blocking manner.

To do something similar in Rails we’d have to pick an async library,

But why is that out of the realm of possibility? To me the harder part to solve is what @Thibaut_Barrere is talking about , and that’s more on the Rails side actually than on the async library. How do u know when all the async parts of a request are done, can normal Rails / rack architecture support this? Sounds hard.

I would also mention async views

That gem is really another topic, it’s more of a frontend thing, people have been doing this since ages (e.g. SSI).

But why is that out of the realm of possibility?

Because it’s the old problem of mixing async with sync, the same reason why event-machine &co always has been hard to use efficiently, e.g you need to avoid doing blocking IOs, etc. By being integrated into python, asyncio solves most of that mixing problem, it’s not the case with Ruby, so it would lead to tons of hard problems for users.

How do u know when all the async parts of a request are done

That’s what I think I solved with my PR, the async query returns a future, and when you try to iterate on it, hopefully it’s fully queried, if it’s not we wait on it. If the request is canceled or finish without using the queries, it has a reference on them and can cancel them.

And since my PR uses a thread pool for querying, you don’t need to be fully async aware, the rest of your application can continue to do blocking IOs normally.

1 Like

@byroot Your PR is super interesting, I’m just going over it and the benchmarks you supplied. Question: do you think the performance boost is going to be different (e.g smaller) between a threaded webserver like puma / (or sidekiq) and something like Unicorn? Because afaik Ruby VM should switch threads and not block between

Product.all
User.all

And if it doesn’t block overall throughput should be about the same as your async PR no?

1 Like

do you think the performance boost is going to be different (e.g smaller) between a threaded webserver like puma / (or sidekiq) and something like Unicorn?

No. Assuming you have enough connections and db capacity for all your puma threads, then it should hold up. It’s totally orthogonal to throughput.

Because afaik Ruby VM should switch threads and not block between …

Yes, but it would execute these queries one after the other rather than in parallel, that is what my patch is about. So it would in some case improve response time, and whenever you improve response time you improve throughput, the inverse isn’t true.

2 Likes

This is what I want to wrap my head around, even considering creating a simple benchmark but I’m not entirely sure when would we expect improvement…

I’m not entirely sure when would we expect improvement…

Whenever you have a controller action that perform more than one query, and that these queries are not dependent on each others (typically your Product / User example).

Also whenever you are preloading more than one relation, e.g Post.all.preload(:comments, :tags), comments and tags could be performed in parallel.

3 Likes

Rails 7 adds the ability to schedule the query on the background thread pool

There seems to be some progress in this direction

1 Like

beautiful, it feels like it made it into rails 7 alpha

Asynchronous Query Loading

When you have a controller action that needs to load two unrelated queries, you can now do it concurrently through Relation#load_async. If you have three complex queries that each take 100ms, you’d have to spend 300ms executing them one by one before. Now you can run them in parallel, spending only a total of 100ms on the set.