Looking for comments on a rails patch, lazy association instantiation

Prompted by some discussion on rails-talk last week, I worked up a patch to add an option to ActiveRecord::Base#find which allows associations to be :included for purposes of the where clause, but defers instantiating the included association objects.

The use case is where there are a large number of association rows returned by the query, the vast majority will not be used since they are only being used to select a small set of root objects.

Here's the ticket with the patch: http://dev.rubyonrails.org/ticket/9923

Sorry I intended this to go to the rails-talk list but gmail outsmarted me.

I don't mean to be obtuse, but I don't understand why a new feature is needed. The :joins option works fine for me for finding records based on joins to other tables. Inner joins are a better fit for selection conditions when you don't want the eagerly loaded records. What you are doing with your approach is an outer join (since that's how eager loading works). That includes all the result data for the eagerly loaded records, which you'll be dropping on the floor and ignoring. That's making the database work a lot more for no reason.

If you wanted something friendlier than a SQL joins clause string, like the hash/array syntax used by the :include option, that could be a good alternate syntax for :joins. Much more Rails-like too. I'd be interested in seeing something that lets you do:

Post.find(:all, :joins => {:comments => :author}, :conditions => ["authors.name = ?", name]

That should generate SQL

SELECT posts.* FROM posts INNER JOIN comments ON post.id = comments.post_id INNER JOIN authors ON comment_id = authors.id WHERE authors.name = 'Bob Dobbs';

It's a radically different approach than yours in how you'd do it in ActiveRecord code, so it might not be something you want to take on. If not, I might give it a shot in my copious spare time.

Disclaimer: It's late on a Friday. Take what I say with a few grains of salt!

> Prompted by some discussion on rails-talk last week, I worked up a > patch to add an option to ActiveRecord::Base#find which allows > associations to be :included for purposes of the where clause, but > defers instantiating the included association objects. > > > The use case is where there are a large number of association rows > returned by the query, the vast majority will not be used since they > are only being used to select a small set of root objects. > > > Here's the ticket with the patch: > http://dev.rubyonrails.org/ticket/9923

I don't mean to be obtuse, but I don't understand why a new feature is needed. The :joins option works fine for me for finding records based on joins to other tables. Inner joins are a better fit for selection conditions when you don't want the eagerly loaded records. What you are doing with your approach is an outer join (since that's how eager loading works). That includes all the result data for the eagerly loaded records, which you'll be dropping on the floor and ignoring. That's making the database work a lot more for no reason.

If you wanted something friendlier than a SQL joins clause string, like the hash/array syntax used by the :include option, that could be a good alternate syntax for :joins. Much more Rails-like too. I'd be interested in seeing something that lets you do:

Post.find(:all, :joins => {:comments => :author}, :conditions => ["authors.name = ?", name]

That should generate SQL

SELECT posts.* FROM posts INNER JOIN comments ON post.id = comments.post_id INNER JOIN authors ON comment_id = authors.id WHERE authors.name = 'Bob Dobbs';

It's a radically different approach than yours in how you'd do it in ActiveRecord code, so it might not be something you want to take on. If not, I might give it a shot in my copious spare time.

Well your SQL foo is no doubt stronger than mine, but in my use case I tried :join and it didn't work.

Here's my motivating case:

   I've got one model called Item which has_many Schedules. Schedules have a start-time. There are LOTS or schedules.

  I'm trying to get all of the Items which have at least one Schedule which hasn't yet started, the purpose of this is to populate a selection list.

  Once one of those is selected, the list of schedules owned by that Item is presented for selection.

So, it was suggested to me here that

   Items.find(:all, :include => :schedules :conditions => ["schedules.begin_date >= ?", horizon_time]

was the best way to do this.

It works, but since there are many, many schedules in this database, instantiating all their AR objects is costly in time and space.

Now I tried substituting :joins and doing an inner join. The result was that I got duplicate AR objects for my Items.

The deal is that when you use :include, AR does the query, and then iterates over the result and instantiates only one object for each unique row in the first table. It then adds the objects for the included tables to the associations.

The join option does the query and just builds one AR object from each row of the result which isn't what I need.

Disclaimer: It's late on a Friday. Take what I say with a few grains of salt!

It's even later here, (you are on the west coast aren't you. <G>)

Okay, I just reread your post, and this seems to be the nugget.

You're proposing an extension of the value of the :joins option which if used would trigger the same kind of object identity preservation of the instances of the AR class receiving the find message.

That makes sense I think.