Feature request/proposal: Add a few more async query apis

I recently spent some time in our oldest rails application trying to optimize various endpoints that fire a lot of queries, by using the relatively new #load_async method.

I had some success but there are (and I’m sure I’m not the only one in this situation) a lot of queries that are simply not structured as pure ActiveRecord queries, or are otherwise not amenable to the APIs available.

We have not upgraded to 7.1 yet so I did not attempt to use any of the other async_* methods but I was poking around our code trying to figure out if it would even be helpful. I concluded that these methods (from here):

  • async_count
  • async_sum
  • async_minimum
  • async_maximum
  • async_average
  • async_pluck
  • async_pick
  • async_find_by_sql
  • async_count_by_sql

…did not fit most of our code, and so I would like to make the case for additional APIs I believe would be the most useful.

I hope this provides a useful perspective on the async query API.

async_find_by

Sometimes the code looks like this:

something1 = Something.find_by(...)
something2 = Something.find_by(...)
something4 = Something.where(...).join(...).etc.load_async
something3 = Something.find_by(...)

# ...

Where we have find_by queries interspersed with larger ones. The rails app could fire all the find_by queries while the larger one was processing if only async_find_by existed:

something4 = Something.where(...).join(...).etc.load_async
something1_promise = Something.async_find_by(...)
something2_promise = Something.async_find_by(...)
something3_promise = Something.async_find_by(...)

# at some point later, wait for the queries
something1 = something1_promise.value
something2 = something2_promise.value
something3 = something3_promise.value

The workaround here is to refactor to a where().limit(1):

something4 = Something.where(...).join(...).etc.load_async

something1_query = Something.where(...).limit(1).load_async
something2_query = Something.where(...).limit(1).load_async
something3_query = Something.where(...).limit(1).load_async

something1 = something1_query.first
something2 = something2_query.first
something3 = something3_query.first

async_execute

We use ApplicationRecord.connection.execute to run raw sql, perhaps more often than is advisable. This makes #load_async rather lackluster when you have the following situation:

q1 = Something.where(...).load_async
stuff1 = ApplicationRecord.connection.execute(<<~SQL)
  SELECT lots_of_sql FROM eons_ago;
SQL
q2 = Something.where(...).load_async
stuff2 = ApplicationRecord.connection.execute(<<~SQL)
  SELECT lots_of_sql FROM eons_ago;
SQL
q3 = Something.where(...).load_async

It would be nice to have #async_execute for this situation:

q1 = Something.where(...).load_async
stuff1_promise = ApplicationRecord.connection.async_execute(<<~SQL)
  SELECT lots_of_sql FROM eons_ago;
SQL
q2 = Something.where(...).load_async
stuff2_promise = ApplicationRecord.connection.async_execute(<<~SQL)
  SELECT lots_of_sql FROM eons_ago;
SQL
q3 = Something.where(...).load_async

# at some point later, wait for the queries
stuff1 = stuff1_promise.value
stuff2 = stuff2_promise.value

Not in the least because those raw queries are likely the longest-running compared to the (usually smaller) activerecord queries.

Rails.async { ... }

We also have situations that look like this:

something1 = Something.where(...).join(...).etc.load_async
something2 = Something.get_the_thing2
something3 = Something.get_the_thing3
something4 = Something.get_the_thing4

But to make matters worse, the helper methods don’t simply return ActiveRecord queries (relations), they actually process the data a bit in ruby:

def get_the_thing2
  Something.where(...).join(...)
    .map do |record| # this makes it pointless to throw #load_async in here
      # do stuff...
    end
end

And of course sometimes they are raw queries:

def get_the_thing3
  result = ApplicationRecord.connection.execute(<<~SQL)
    SELECT async FROM rails;
  SQL

  # async_execute wouldn't help because we couldn't resist the tempation of massaging the data a bit inside this method
  result.map do |record|
    # do stuff...
  end
end

If only I could throw arbitrary ruby code onto the background thread:

something1 = Something.where(...).join(...).etc.load_async
something2_promise = Rails.async do
  Something.get_the_thing2
end
something3_promise = Rails.async do
  Something.get_the_thing3
end
something4_promise = Rails.async do
  Something.get_the_thing4
end

# at some point later, wait for the queries
something2 = something2_promise.value
something3 = something3_promise.value
something4 = something4_promise.value

This idea raises many questions, but would undoubtedly be very powerful.

What if I don’t need the database?

# perhaps an argument to prevent rails from making a database connection available?
promise = Rails.async(db: false) do
  sleep 3
  ApplicationRecord.connection # => nil/Error?
  2 + 2
end

# wait for 4
promise.value # => 4

What about updating the database?

promise = Rails.async do
  something1 = Something.find_by(...)

  # should updates be prevented?
  something1.update!(fields: true)

  # updates can't be prevented if I use raw queries
  ApplicationRecord.raw_query(<<~SQL)
    UPDATE the_database WHERE most_inconvenient;
  SQL
end

Can outside data be passed in?

some_integer = 42
user = User.find(1234)
promise = Rails.async do
  # would this work? what if the model is a query?
  something1 = Something.where(int: some_integer, user: user)
end

Conclusion

Many of these problems could undoubtedly be alleviated by refactoring our codebase, but I still believe the ActiveRecord::Promise could be taken greater advantage of to provide a few more useful APIs that were “missed” the first time around.

Thoughts? Concerns?