Allow ActiveRecord associations with loaded records to return filtered association when chained with scopes if possible

Hi there. I keep running into this issue times and times again :

I have a model, with a has_many relation. The child model of the relation has a bunch of scopes. The parent model has a bunch of methods that all rely on querying some of the has_many records, but filtered using the different scopes. I want to be able to bulk-fetch an array of parent records, with the child relation preloaded, and use the different methods without making N+1 DB calls. Here is a concrete exemple :

class Order
  has_many :order_items

  def total_quantity = self.order_items.to_a.size
  def refunded_quantity = self.order_items.refunded.to_a.size
  def fulfilled_quantity = self.order_items.fulfilled.to_a.size

  # Imagine more complex business logic using theses different quantities or similar functions
  def pending_quantity = total_quantity - refunded_quantity - fulfilled_quantity
  def fully_processed? = self.pending_quantity.zero?
end

class OrderItem
  scope :refunded, -> { where(refunded: true) }
  scope :fulfilled, -> {where(fulfilled: true) }
end

And I want to do something similar to this properly :

orders = Orders.includes(:order_items).limit(5)
orders.find_each do |order|
  # let's say that instead of displaying the info in the console, the `fully_processed?` 
  # method is used to display a coloured label on an Order#index page: 
  # green for processed, and orange for pending. Or any other business logic you can think of
  puts "Order##{order.id} is #{order.fully_processed? ? 'fully processed!' : 'still pending some processing'}"
end

The above code will generate N+1 queries because while order.order_line_items is preloaded, order.order_line_items.refunded is not. And yes, I could add a has_many :refunded_items association to the Order class, but I would need to do this for the :fulfilled_items and any other scope I ever want to use, and preloading all theses different scopes will result in multiple separate queries to the order_items table, when it should just be a single query.

An alternative, is to change the methods on the Order class to be like this :

def refunded_quantity = self.order_items.select { |item| item.refunded }.to_a.size

and only then will preloading order_items prevent the N+1 queries. This is currently the approach I am using absolutely everywhere, and I am currently no able to use the convenient ActiveRecord where syntax, as well as not being able to rely on AR scopes and having to duplicate the where logic of the scopes into Array#select logic pieces, and since I am working with arrays, it makes it hard to go back to an AR relation if needed down the line.


To fix the issue in an elegant way, I would like to propose the following feature :

If a scope is added to a already-loaded AR association record, return a new AR association with the new condition (as it is currently the case) but with the new association’s records preloaded as a subset of the previous one based on the new conditions (if possible: eg if the new condition contains joins or other complex SQL behaviour, then we can leave the new association unloaded).

Basically, I want AR to do this for me :

class ActiveRecord::Relation
  # ...
  def where(clauses)
    new_relation = self.spawn.where(clauses) # Current behaviour if I'm not mistaken
    if self.loaded? && clauses.do_not_contain_joins_or_other_complex_things?
      #new behavior (very simplified, i'm not well versed in AR's internals)
      filtered_records = self.records.select do |record| # Maybe even self.records.lazy.select ?
        clauses.all? do |key, value| # exemple: where(id: 1) -> key = :id, value = 1.
          next record.send(key).in?(value) if value.is_a?(Array) # where(id: [1,2,3])
          # ... handle other weird where() values

          record.send(key) == value # where(id: 1)
        end
      end
      new_relation.records = filtered_records
    end
    new_relation
  end
end

(this specific implementation is probably very naive, but carries the point across. I’m not advocating for this specific implementation, but for any implementation that gives me the result of no N+1 queries in the previous examples)

The reasoning is as follows : If I already have the full relation loaded in memory, then there is no downside to just use what is already loaded and do the filtering on that when spawning a child relation. We avoid round-trips to the DB for operations that are most of the time very trivial (== operations, or in([...]) operations) and we can do on the already loaded records instead of asking the DB when we finally resolve the new association.

It is a simple solution to greatly simply and DRY-ify code that deals with N+1 queries, while staying simple and reverting to old behaviour in case the new where() clauses are non-trivial to compute for AR.

What do you think ?

Hey, this is very similar to what I proposed here [Feature Proposal] Implement QueryMethods on CollectionProxy to improve performance

1 Like

You are right that is basically the same proposal. Upvoted your post