There is something I always wondered. Why are we not starting SQL queries ahead of time, in an async way, in the controller, so that once we reach the view we might already have some results available? In my world view, the SQL query is mostly blocking IO that could be run concurrently and started ahead of time. Of course, this would mess up the idea of one database connection per request and many more concepts, like lazy queries.
# Controller
def index
  @independent_query_slow = Async { Product.all }
  @independent_query_fast = Async { User.all }
end
# View
<%= @independent_query_slow.each do |as_usual| %>
  # no lazy query happens here, but the query result might have been loaded already
<% end %>
<%= @independent_query_fast.each do |as_usual| %>
  # no lazy query happens here, but the query result was probably loaded already
<% end %>
<%= @independent_query_slow.where(id: 1234).each do |as_usual| %>
  # lazy as usual, but has to wait until the query result is available?
<% end %>
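The "kick off early, block on first access" idea can be sketched in plain Ruby with threads. This is only an illustration under stated assumptions: `AsyncResult` and `fake_query` are hypothetical names, and `fake_query` (a `sleep`) stands in for a blocking SQL round-trip; a real version would call something like `relation.load` on the background thread.

```ruby
# A tiny stand-in for "start the query in the controller, block on
# first access in the view". The thread starts the (fake) query right
# away; #each joins it lazily.
class AsyncResult
  def initialize(&block)
    @thread = Thread.new(&block) # work starts immediately
  end

  def each(&block)
    value.each(&block) # blocks here only if the work isn't done yet
  end

  def value
    @value ||= @thread.value # Thread#value joins and returns the result
  end
end

# Simulated blocking SQL round-trip.
def fake_query(rows, delay)
  sleep(delay)
  rows
end

slow = AsyncResult.new { fake_query([:product_a, :product_b], 0.2) }
fast = AsyncResult.new { fake_query([:user_a], 0.05) }

# Both "queries" run concurrently: the total wait is roughly 0.2s, not 0.25s.
fast.each { |row| puts row }
slow.each { |row| puts row }
```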
In a similar fashion, couldn't the view be rendered in an async way too? Again, this would break many assumptions, but it could be dealt with via a clear syntax around the blocks.
# Controller
def index
  @independent_query_slow = Product.all
  @independent_query_fast = User.all
end
# View
<%= @independent_query_slow.some_scope.each do |as_usual| %>
  # wait till the query result is available, then start rendering
<% end.async %>
<%= @independent_query_fast.some_scope.each do |as_usual| %>
  # wait till the query result is available, then start rendering
  # This output will likely be rendered ahead of the slower block above.
<% end.async %>
<% request.wait # ????? %>
# Or wait at the end of the view till all async renders have finished
The case of the async view rendering could be isolated using something like partials:
# app/views/users/index.html
<% User.all.each do |user| %>
  <%= render "user", async: true, user: user %>
<% end %>
# app/views/users/_user.html
<%= image_tag user.avatar # Nicely async loaded from S3 %>
<% user.products.each do |product| %>
  # The database connection pool be like 🤯, or
  # hopefully it has a limit of max X concurrent queries
  # per request and then starts blocking again.
<% end %>
Sorry that I can't describe my idea better, but I see all this blocking IO and wonder if we couldn't speed up the response times of each request by having an opt-in for async/concurrency on certain occasions.
I've been using that type of pattern successfully in Elixir apps (it is very well equipped for that kind of thing), and I've seen it mentioned by a few in the Ruby world (although I have never implemented it in a Rails app at this point).
One tricky point is the lifecycle of those async queries:
What will happen to async queries if your main request is interrupted (e.g. by an error of some sort)? The impact can differ depending on whether these are read-only queries or queries that also change the database. Slightly long-running queries can stack up and create trouble.
What will happen to them if your main request finishes but for some reason does not properly "wait" for the async queries?
While some tooling can help (e.g. futures in concurrent-ruby), it can get tricky fast.
It is especially tricky in Ruby because AFAIK you still cannot safely kill Ruby threads (article), so you will have to bake in some kind of short-term interruptibility (via short safe timeouts, or via another home-baked mechanism, e.g. partial checkpoints in multi-step queries).
Another important point is the impact on your connection pool (you could see contention there and hit timeouts when other queries try to grab many connections from the pool at once).
So: it is an interesting idea, and it would be easy to implement in a small-scale system, but a robust implementation of this in Ruby will be a fair bit of work.
I've been toying with that async queries idea for a while now. As others said, it's easy to do for simple cases: basically just a ThreadPool from concurrent-ruby and a call to Relation#load from there.
But once you try to bring this to a real-world application, some problems arise:
Many apps track various request/job state in Thread.current via various constructs. When you schedule the query on a background thread, you break these expectations. Marginalia relies on this, for instance.
Even Rails itself uses thread variables for various things, e.g. ActiveSupport::Instrumentation.
If your request fails, you want to cancel all your async queries; that's not so easy.
If the query fails, you need to forward the exception.
I'm currently working on a prototype of that, but I'm not very confident it can lead to something easy to use and reliable.
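For the exception-forwarding point specifically, plain `Thread#value` already gives you one building block: it joins the thread and re-raises whatever exception killed it in the calling thread. A minimal illustration:

```ruby
# Thread#value joins the background thread and, if it died with an
# exception, re-raises that exception in the caller -- one simple way to
# surface a failed async query back in the request.
t = Thread.new do
  Thread.current.report_on_exception = false # silence the default stderr dump
  raise "query failed"
end

begin
  t.value
rescue RuntimeError => e
  puts e.message # prints "query failed"
end
```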
Could be interesting to follow the new async support in Django 3: Asynchronous support | Django documentation
It seems like it would be quite a huge effort for something that already works pretty well with external gems (you could reach a similar result with Sidekiq or any other background queue, and probably have fewer issues and pitfalls), but it's still pretty impressive that Django is able to do that.
Python has built-in async/await constructs, which makes this much easier and non-controversial. To do something similar in Rails we'd have to pick an async library, and it really wouldn't be as good.
Also, for now Django mostly supports async controllers and views, but the most important part IMHO is being able to do async DB queries.
To do something similar in Rails we'd have to pick an async library,
But why is that out of the realm of possibility? To me the harder part to solve is what @Thibaut_Barrere is talking about, and that's actually more on the Rails side than on the async library. How do you know when all the async parts of a request are done? Can the normal Rails/Rack architecture support this? Sounds hard.
That gem is really another topic; it's more of a frontend thing, and people have been doing this for ages (e.g. SSI).
But why is that out of the realm of possibility?
Because it's the old problem of mixing async with sync, the same reason EventMachine & co have always been hard to use efficiently: e.g. you need to avoid doing blocking IOs, etc. By being integrated into Python, asyncio solves most of that mixing problem; that's not the case with Ruby, so it would lead to tons of hard problems for users.
How do you know when all the async parts of a request are done
That's what I think I solved with my PR: the async query returns a future, and when you try to iterate on it, hopefully it's fully queried; if it's not, we wait on it. If the request is canceled or finishes without using the queries, it has a reference to them and can cancel them.
And since my PR uses a thread pool for querying, you don't need to be fully async-aware; the rest of your application can continue to do blocking IOs normally.
@byroot Your PR is super interesting, I'm just going over it and the benchmarks you supplied.
Question: do you think the performance boost is going to be different (e.g. smaller) between a threaded web server like Puma (or Sidekiq) and something like Unicorn? Because AFAIK the Ruby VM should switch threads and not block between
Product.all
User.all
And if it doesn't block, overall throughput should be about the same as with your async PR, no?
do you think the performance boost is going to be different (e.g. smaller) between a threaded web server like Puma (or Sidekiq) and something like Unicorn?
No. Assuming you have enough connections and DB capacity for all your Puma threads, it should hold up. It's totally orthogonal to throughput.
Because AFAIK the Ruby VM should switch threads and not block between …
Yes, but it would execute these queries one after the other rather than in parallel; that is what my patch is about. So it would in some cases improve response time, and whenever you improve response time you improve throughput; the inverse isn't true.
This is what I want to wrap my head around. I was even considering creating a simple benchmark, but I'm not entirely sure when we would expect improvement…
I'm not entirely sure when we would expect improvement…
Whenever you have a controller action that performs more than one query, and these queries are not dependent on each other (typically your Product/User example).
Also whenever you are preloading more than one relation, e.g. Post.all.preload(:comments, :tags); comments and tags could be fetched in parallel.
Beautiful, it looks like it made it into the Rails 7 alpha:
Asynchronous Query Loading
When you have a controller action that needs to load two unrelated queries, you can now do it concurrently through Relation#load_async. If you have three complex queries that each take 100ms, you'd have to spend 300ms executing them one by one before. Now you can run them in parallel, spending only a total of 100ms on the set.
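For reference, the resulting API is just a method on the relation, so the earlier Product/User example becomes something along these lines (an illustrative fragment, not runnable outside a Rails app with a configured database):

```ruby
# Rails 7+: both queries are scheduled on the async executor immediately;
# the first access to each result blocks only if it hasn't finished yet.
def index
  @products = Product.all.load_async
  @users    = User.all.load_async
end
```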