I recently spent some time in our oldest rails application trying to optimize various endpoints that fire a lot of queries, by using the relatively new #load_async
method.
I had some success but there are (and I’m sure I’m not the only one in this situation) a lot of queries that are simply not structured as pure ActiveRecord queries, or are otherwise not amenable to the APIs available.
We have not upgraded to 7.1 yet so I did not attempt to use any of the other async_*
methods but I was poking around our code trying to figure out if it would even be helpful. I concluded that these methods (from here):
async_count
async_sum
async_minimum
async_maximum
async_average
async_pluck
async_pick
async_find_by_sql
async_count_by_sql
…did not fit most of our code, and so I would like to make the case for additional APIs I believe would be the most useful.
I hope this provides a useful perspective on the async query API.
async_find_by
Sometimes the code looks like this:
something1 = Something.find_by(...)
something2 = Something.find_by(...)
something4 = Something.where(...).join(...).etc.load_async
something3 = Something.find_by(...)
# ...
Where we have find_by queries interspersed with larger ones. The rails app could fire all the find_by
queries while the larger one was processing if only async_find_by
existed:
something4 = Something.where(...).join(...).etc.load_async
something1_promise = Something.async_find_by(...)
something2_promise = Something.async_find_by(...)
something3_promise = Something.async_find_by(...)
# at some point later, wait for the queries
something1 = something1_promise.value
something2 = something2_promise.value
something3 = something3_promise.value
The workaround here is to refactor to a where().limit(1)
:
something4 = Something.where(...).join(...).etc.load_async
something1_query = Something.where(...).limit(1).load_async
something2_query = Something.where(...).limit(1).load_async
something3_query = Something.where(...).limit(1).load_async
something1 = something1_query.first
something2 = something2_query.first
something3 = something3_query.first
async_execute
We use ApplicationRecord.connection.execute
to run raw sql, perhaps more often than is advisable. This makes #load_async
rather lackluster when you have the following situation:
q1 = Something.where(...).load_async
stuff1 = ApplicationRecord.connection.execute(<<~SQL)
SELECT lots_of_sql FROM eons_ago;
SQL
q2 = Something.where(...).load_async
stuff2 = ApplicationRecord.connection.execute(<<~SQL)
SELECT lots_of_sql FROM eons_ago;
SQL
q3 = Something.where(...).load_async
It would be nice to have #async_execute
for this situation:
q1 = Something.where(...).load_async
stuff1_promise = ApplicationRecord.connection.async_execute(<<~SQL)
SELECT lots_of_sql FROM eons_ago;
SQL
q2 = Something.where(...).load_async
stuff2_promise = ApplicationRecord.connection.async_execute(<<~SQL)
SELECT lots_of_sql FROM eons_ago;
SQL
q3 = Something.where(...).load_async
# at some point later, wait for the queries
stuff1 = stuff1_promise.value
stuff2 = stuff2_promise.value
Not in the least because those raw queries are likely the longest-running compared to the (usually smaller) activerecord queries.
Rails.async { ... }
We also have situations that look like this:
something1 = Something.where(...).join(...).etc.load_async
something2 = Something.get_the_thing2
something3 = Something.get_the_thing3
something4 = Something.get_the_thing4
But to make matters worse, the helper methods don’t simply return ActiveRecord queries (relations), they actually process the data a bit in ruby:
def get_the_thing2
Something.where(...).join(...)
.map do |record| # this makes it pointless to throw #load_async in here
# do stuff...
end
end
And of course sometimes they are raw queries:
def get_the_thing3
result = ApplicationRecord.connection.execute(<<~SQL)
SELECT async FROM rails;
SQL
# async_execute wouldn't help because we couldn't resist the tempation of massaging the data a bit inside this method
result.map do |record|
# do stuff...
end
end
If only I could throw arbitrary ruby code onto the background thread:
something1 = Something.where(...).join(...).etc.load_async
something2_promise = Rails.async do
Something.get_the_thing2
end
something3_promise = Rails.async do
Something.get_the_thing3
end
something4_promise = Rails.async do
Something.get_the_thing4
end
# at some point later, wait for the queries
something2 = something2_promise.value
something3 = something3_promise.value
something4 = something4_promise.value
This idea raises many questions, but would undoubtedly be very powerful.
What if I don’t need the database?
# perhaps an argument to prevent rails from making a database connection available?
promise = Rails.async(db: false) do
sleep 3
ApplicationRecord.connection # => nil/Error?
2 + 2
end
# wait for 4
promise.value # => 4
What about updating the database?
promise = Rails.async do
something1 = Something.find_by(...)
# should updates be prevented?
something1.update!(fields: true)
# updates can't be prevented if I use raw queries
ApplicationRecord.raw_query(<<~SQL)
UPDATE the_database WHERE most_inconvenient;
SQL
end
Can outside data be passed in?
some_integer = 42
user = User.find(1234)
promise = Rails.async do
# would this work? what if the model is a query?
something1 = Something.where(int: some_integer, user: user)
end
Conclusion
Many of these problems could undoubtedly be alleviated by refactoring our codebase, but I still believe the ActiveRecord::Promise
could be taken greater advantage of to provide a few more useful APIs that were “missed” the first time around.
Thoughts? Concerns?