Need help porting limited eager loading optimization to Rails 3

I’m fairly certain this 2.x feature never made it to Rails 3. You can view the original ticket and patches here:

The gist of it is, when you run something like this:
Article.includes(:comments, :author).where(' = 1').limit(10)

You end up with two queries. The first is a “SELECT DISTINCT…”, and the second actually loads the comments and author associations. In Rails 2.x, AR was smart enough to join only against the tables that actually limited the result set (e.g. anything referenced in the where or order clauses). Rails 3 blindly joins all the tables, which kills performance when you have several eager-loaded associations.

I started working on a patch to apply_join_dependency but ran into a problem with table aliasing. The diff is here:

The approach is basically to scan the order and where clauses for table names, then scan the included associations for those table names, adding them (and any intermediate joins) to a list, and join only those associations. The problem is that when the arel object is built for clean_relation, it has fewer joins than the original. AR builds a JoinDependency object and grafts all my joins onto it with JoinDependency#graft. That object never actually sees any of the original joins, so the alias tracker hands out the default table name instead of the aliased table name. I don’t know how to deal with this without hacking up more of the source than I want to. Does anyone have any ideas about how to deal with this? Or maybe a completely different approach?
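
To make the scanning step concrete, here is a rough standalone sketch in plain Ruby. It is not the patch itself and does not touch ActiveRecord: the ASSOCIATIONS map, the votes association, and both helper names are made up for illustration, and it only recognises unquoted "table.column" references.

```ruby
# Hypothetical sketch: the association map and helper names are illustrative,
# not ActiveRecord internals.

# Each included association: the table it joins, and the association it hangs
# off (nil means it joins directly to the base model).
ASSOCIATIONS = {
  comments: { table: "comments", parent: nil },
  author:   { table: "users",    parent: nil },
  votes:    { table: "votes",    parent: :comments }
}.freeze

# Collect every identifier that appears as "table.column" in the SQL fragments.
def referenced_tables(*fragments)
  fragments.compact
           .flat_map { |sql| sql.scan(/\b([a-z_][a-z0-9_]*)\.[a-z_*]/i).flatten }
           .map(&:downcase)
           .uniq
end

# Keep only associations whose table is referenced, plus any parent
# associations needed to reach them (the intermediate joins).
def limiting_associations(associations, tables)
  needed = []
  associations.each do |name, info|
    next unless tables.include?(info[:table])

    chain = []
    current = name
    while current
      chain.unshift(current)
      current = associations.fetch(current)[:parent]
    end
    needed.concat(chain)
  end
  needed.uniq
end

tables = referenced_tables("votes.score > 5", "articles.created_at DESC")
joins  = limiting_associations(ASSOCIATIONS, tables)
```

Here only votes is referenced, so joins ends up containing :comments (the intermediate join) and :votes, and author is left out of the limiting query. Note that a regex like this also matches table names inside string literals, e.g. "title = 'see users.name'" would pull in users.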

I'm not really sure, but maybe hacking Arel directly, instead of AR,
would be an easier way to solve this.


I feel your pain, but in my opinion, moving further down the road of
scanning queries for strings to determine a table's inclusion takes
things in the wrong direction. The trend in core (and one that I hope
continues) has been toward using ARel objects representing the various
parts of a query. For instance, your order-scanning code is subject to
the same bug that this commit fixes:

I don't think there's a quick (and also correct) way to fix this
behavior, without modifying more than this one method. Jon or Aaron
might tell me I'm wrong, though. :)

I totally agree: scanning SQL strings is pretty crappy. Extracting the necessary tables from the ARel object should be more robust; this was just a quick hack to get things going.

I think there’s a related performance issue here too, where the second query uses the LEFT OUTER JOIN method of preloading associations instead of issuing separate queries. We have the IDs already from the prequery, there’s no need to use the join strategy anymore, right?
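
Roughly what I have in mind, as a plain-Ruby sketch rather than anything AR actually does: once the prequery has returned the article IDs, each has_many association could be loaded with its own IN query. The preload_queries helper and the table/foreign-key hash are hypothetical, just to show the shape of the queries.

```ruby
# Hypothetical helper: builds one query per has_many association from the
# parent ids returned by the limited prequery. Not an ActiveRecord API.
def preload_queries(ids, associations)
  return [] if ids.empty?

  # Integer() raises on anything that is not a clean integer, so nothing
  # unexpected gets interpolated into the SQL.
  id_list = ids.map { |id| Integer(id) }.join(", ")

  associations.map do |table, foreign_key|
    "SELECT #{table}.* FROM #{table} WHERE #{table}.#{foreign_key} IN (#{id_list})"
  end
end

queries = preload_queries([3, 1, 2], { "comments" => "article_id" })
```

A belongs_to association like author would work the other way around, using the author_id values collected from the loaded articles instead of the article IDs.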