So eager loading via :include is far from perfect: as soon as there are more than one has_many then you're getting a big cartesian join and end up processing 10000 rows of information in order to get out 200 rows of extra stuff (assuming two 100 object has_manys)
I've knocked together a patch (http://dev.rubyonrails.org/ticket/9640) which takes a different approach:
Author.find :all, :conditions => blah, :preload => [:posts, :comments]
will first load authors, then one query for posts and one for comments
You can nest things like you can with eager loading i.e.
Author.find :all, :conditions => blah, :preload => [:comments, {:posts => :categories}]
The number of queries will be n+1 (except if you have a has_many :through, which count double since we first load the :through association)
The one thing you definitely can't do is have conditions on the preloaded tables, so
I've put together some unit tests (and by 'put together' I mean 'nicked the tests for eager loading and replaced :include with :preload') that at least cover basic use. I think most of what is achievable via :include should also also be doable in this way, but it's definitely the case that right now I've only handled the simple case for a lot of stuff. I'm sure there are various combinations of options that aren't handled or covered by the tests although I don't see why most of it can't be fixed up.
I'm not sure where all this fits in but thought it was interesting enough that others might want to have a look. In the specific case I had in mind (large has_many loads) it's way faster than :include, and seems to be as fast or faster in other cases too.
I guess this will be a very good addition. I'm +1 on the idea. But we
need to come up with a bit clearer distiction about when should one
use :include and when :preload.
Fred : It would be interesting if you can compare results of :preload
with :include after applying Gabe's patch - http://darwinweb.net/ Just
a thought.
Yes, that was part of my inspiration in a slightly circular way
I guess this will be a very good addition. I'm +1 on the idea. But we
need to come up with a bit clearer distiction about when should one
use :include and when :preload.
Sure - this is still somewhat experimental. I'm going to do a little
benchmarking sometime today or tomorrow to work out when it's a
winner and when it isn't
Fred : It would be interesting if you can compare results of :preload
with :include after applying Gabe's patch - http://darwinweb.net/ Just
a thought.
I think that what Gabe is saying is slow :include = slow prequery +
lots of rows, and then fixes the slow prequery part. With :preload
there isn't really a prequery (unless you count the initial find).
Either way you end up with the same constraint: conditions only on
the base table, except that :preload goes on to fix the next part of
the problem too, which is the bit that was biting me (we were already
sidestepping the prequery issues by doing foo_ids = Foo.find
(:all, :conditions => [...], :limit => 10; Foo.find(foo_ids, :include
=> [...])
What preload can do that I don't think you can easily do
with :include is make conditions more intuitive, so as gabe says
Article.find(:all, :include => :tags, :conditions => "tags.name =
'foo'") is somewhat counter intuitive since it zaps articles with no
tags matching the conditions. You could (if you came up with a better
syntax) imagine preload doing
Article.find(:all, :preload => {:name => 'tags', :conditions => "name
= 'foo'}) which might be a little more intuitive and dead simple to add.
= 'foo'}) which might be a little more intuitive and dead simple to
add.
I was dreaming about this last night.
But you have to do it cleverly.
eg. the above statement should do which of the following? (i'm gonna
write these out in pseudo-AR/SQL)
a) Article.find(:all), Tag.find(:all, "id IN *what_we_just_got* AND
name='foo'")
b) Article.find(:all, "EXISTS (SELECT * FROM tags WHERE article_id=
articles.id)"). Tag.find(:all, "id IN *what_we_just_got*)
c) Article.find(:all, "EXISTS (SELECT * FROM tags WHERE article_id=
articles.id)"). Tag.find(:all, "id IN *what_we_just_got* AND
name="foo")
but I fear that offering too many options will make the situation
worse.
= 'foo'}) which might be a little more intuitive and dead simple to
add.
I was dreaming about this last night.
But you have to do it cleverly.
eg. the above statement should do which of the following? (i'm gonna
write these out in pseudo-AR/SQL)
Well I was thinking of a) - certainly in the light of the pseudo
syntax i wrote above it's what I would expect.
c) is what you currently get if you do Article.find :all, :include
=> :tags, :conditions => "tags.name = 'foo"
which can be counterintuitive - at least to me, moving the condition
inside :preload reads to me like 'load articles and then preload
matching tags
I've no idea what the common/useful case is though.
class Post
has_many :comments, :preload => :author
end
and related work. At this point :preload basically covers everything that :include does, except that conditions can't be specified on the :preloaded tables.
Conditions on the associations do however work, ie
class Post
has_many :unapproved_comments, :class_name => 'Comment', :conditions => 'approved = 0'
end
Post.find(:all, :preload => :unapproved_comments)
works fine, but Post.find(:all, :preload =>:comments, :conditions => 'approved = 0') doesn't work.
The ticket also has a few benchmarks, drawn from some of the stuff I've worked on. :preload seems to be faster than :include (not always by much) in all cases, but the big win is multiple has_manys (because of the cartesian product being loaded etc...). That's just on data that I have access to, it would be great to see how it works in other circumstances.
Anyway, to conclude the ramble it would be nice to see what difference this makes to other apps/tasks (regardless of what happens to the patch itself).