Hi there. I keep running into this issue times and times again :
I have a model, with a has_many
relation. The child model of the relation has a bunch of scopes.
The parent model has a bunch of methods that all rely on querying some of the has_many
records, but filtered using the different scopes.
I want to be able to bulk-fetch an array of parent records, with the child relation preloaded, and use the different methods without making N+1 DB calls. Here is a concrete exemple :
class Order
has_many :order_items
def total_quantity = self.order_items.to_a.size
def refunded_quantity = self.order_items.refunded.to_a.size
def fulfilled_quantity = self.order_items.fulfilled.to_a.size
# Imagine more complex business logic using theses different quantities or similar functions
def pending_quantity = total_quantity - refunded_quantity - fulfilled_quantity
def fully_processed? = self.pending_quantity.zero?
end
class OrderItem
scope :refunded, -> { where(refunded: true) }
scope :fulfilled, -> {where(fulfilled: true) }
end
And I want to do something similar to this properly :
orders = Orders.includes(:order_items).limit(5)
orders.find_each do |order|
# let's say that instead of displaying the info in the console, the `fully_processed?`
# method is used to display a coloured label on an Order#index page:
# green for processed, and orange for pending. Or any other business logic you can think of
puts "Order##{order.id} is #{order.fully_processed? ? 'fully processed!' : 'still pending some processing'}"
end
The above code will generate N+1 queries because while order.order_line_items
is preloaded, order.order_line_items.refunded
is not. And yes, I could add a has_many :refunded_items
association to the Order class, but I would need to do this for the :fulfilled_items
and any other scope I ever want to use, and preloading all theses different scopes will result in multiple separate queries to the order_items
table, when it should just be a single query.
An alternative, is to change the methods on the Order class to be like this :
def refunded_quantity = self.order_items.select { |item| item.refunded }.to_a.size
and only then will preloading order_items
prevent the N+1 queries. This is currently the approach I am using absolutely everywhere, and I am currently no able to use the convenient ActiveRecord where
syntax, as well as not being able to rely on AR scopes and having to duplicate the where
logic of the scopes into Array#select
logic pieces, and since I am working with arrays, it makes it hard to go back to an AR relation if needed down the line.
To fix the issue in an elegant way, I would like to propose the following feature :
If a scope is added to a already-loaded AR association record, return a new AR association with the new condition (as it is currently the case) but with the new association’s records preloaded as a subset of the previous one based on the new conditions (if possible: eg if the new condition contains joins or other complex SQL behaviour, then we can leave the new association unloaded).
Basically, I want AR to do this for me :
class ActiveRecord::Relation
# ...
def where(clauses)
new_relation = self.spawn.where(clauses) # Current behaviour if I'm not mistaken
if self.loaded? && clauses.do_not_contain_joins_or_other_complex_things?
#new behavior (very simplified, i'm not well versed in AR's internals)
filtered_records = self.records.select do |record| # Maybe even self.records.lazy.select ?
clauses.all? do |key, value| # exemple: where(id: 1) -> key = :id, value = 1.
next record.send(key).in?(value) if value.is_a?(Array) # where(id: [1,2,3])
# ... handle other weird where() values
record.send(key) == value # where(id: 1)
end
end
new_relation.records = filtered_records
end
new_relation
end
end
(this specific implementation is probably very naive, but carries the point across. I’m not advocating for this specific implementation, but for any implementation that gives me the result of no N+1 queries in the previous examples)
The reasoning is as follows : If I already have the full relation loaded in memory, then there is no downside to just use what is already loaded and do the filtering on that when spawning a child relation. We avoid round-trips to the DB for operations that are most of the time very trivial (==
operations, or in([...])
operations) and we can do on the already loaded records instead of asking the DB when we finally resolve the new association.
It is a simple solution to greatly simply and DRY-ify code that deals with N+1 queries, while staying simple and reverting to old behaviour in case the new where()
clauses are non-trivial to compute for AR.
What do you think ?