Preloading Strategy Errors

I just got bit by a pretty subtle bug with eager loading. Basically
what happened is mixing a :group and :include clause. The group
clause of course collapsed the associations so I only got one of each
association (this was not immediately obvious due to the data I'm
working with). What makes it so subtle is that if the Preload
strategy is used then there is no problem.

In my case the condition it was even less obvious because the
condition that was triggering the Join strategy was not applicable
anyway. I realize that parsing the conditions and cross-referencing
to other options in order to determine the true applicability of the
Join is a rabbit hole no one wants to go down.

But I was thinking about the other side. If there is a :group clause
in a query, my gut instinct is that a large majority of the time it is
going to clobber associations. Given the fact that it is pretty easy
to add explicit joins to satisfy conditions, but pretty ugly to force
the preload strategy, would there be any support for either:

A) setting up :group clause to force the Preload strategy

B) raising an exception or warning if :group is used with :include of
a non-singular association

C) giving access to the Preload strategy explicitly

I'm finding a proliferation of Klass.send(:preload_associations,
foo, :bar) throughout my codebase and it doesn't smell good. The more
I think about, the more C seems like it might be the only reasonable
option. Thoughts?

C) giving access to the Preload strategy explicitly

I'm finding a proliferation of Klass.send(:preload_associations,
foo, :bar) throughout my codebase and it doesn't smell good. The more
I think about, the more C seems like it might be the only reasonable
option. Thoughts?

I've also found myself doing that :preload_associations trick from
time to time and I'm similarly thinking that that shows two things:

1) Maybe we should make a nicer way for people to do this (fred cheung
was playing with an array proxy to enable this kind of thing)
2) We might want to make :strategy=>:preload an option to find.

I've no idea if :strategy is a good name for the option, but I think
it's probably a good idea to investigate whether the option is
feasible and if it would be useful.

Care to take a crack at it?

Yeah I might give it a shot. I'm a little hammered at work right now,
but the code in question has been pretty stable for a while and I
think it'll be reasonably easy to implement :strategy option. I'm
also happy to collaborate on Fred's work if he's interested... I have
some use for some detached type of preloading.

One thought I had about :strategy => :preload is that compared to the
current hacky approach it's a bit less flexible, because it doesn't
let you mix and match. Given that drawback, I wonder if its even
worth complicating the API. Maybe officially
making :preload_associations public and documenting it would be worth
it?

I was thinking the same thing. I find preload_associations to be a useful tool to work with directly. +1 to making it (or something similar) public and documented.

Ideally I’d like to fix how :include works. I see :include as a pure optimization step. Lack or presence of it should only make things slower/faster and in no way change the way things work. My ideal fix would be :

  • Make :include always use preload strategy unless the required data set is explicitly loaded

  • Introduce :left_joins and :right_joins keys to the finder. Title says it all

  • Make user explicitly specify the required joins if they want to put conditions on the associations

  • For all the associations specified in :include, check if the required dataset has already been loaded by :*joins. If not, preload.

Thoughts ?

- Make :include *always* use preload strategy unless the required data set
is explicitly loaded
- Introduce :left_joins and :right_joins keys to the finder. Title says it
all
- Make user explicitly specify the required joins if they want to put
conditions on the associations
- For all the associations specified in :include, check if the required
dataset has already been loaded by :*joins. If not, preload.
Thoughts ?

Sounds good to me, but I also like the idea of being able to do something like

@orders = Order.find_by_sql(CRAZY_SHIT)
@orders.preload(:line_items=>{:sku=>:title})

I like this approach. :include should definitely preload, and people should have to use :joins and :left_joins when they need them all in the same query.

What would :right_joins do though? You need a left record to load associations of, otherwise how would you get to the right objects?

Regarding making people explicitly specify the required joins to put conditions on the associations, if I understand you correctly, I think this would tend to mean that you end up with numerous copies of your basic foreign key-based join conditions throughout the codebase, which is not very DRY.

IMHO when people want to load associations with extra conditions, what they’re really saying is that they want a subset - for which they should ideally be adding a named_scope on the associated model and somehow telling Rails to load that scope only.

For example, instead of:

Lunch.find(:all, :include => :pies, :conditions => {:pies => {:steak => true}})

as it is currently, or

Lunch.find(:all, :joins => [“LEFT JOIN pies ON pie.lunch_id = lunches.id AND steak = ?”, true])

which is my interpretation of what you’re saying below (please correct me if not), I would like some way to say

Lunch.find(:all, :include => :“pies.steak”)

where Pie defines

named_scope :steak, :conditions => {:steak => true}.

However, it’s hard to think of a good syntax that shows what scope we want to be loaded, and in any case I don’t think we have loaded-ness for scopes yet as we do for associations, so there’d be a number of things that would need to be changed to make that kind of scheme work - it doesn’t sound like an easy approach.

So currently we’re stuck with defining extra associations (Lunch.has_many :steak_pies as well as the original has_many :pies). I’d love a way to get around that, other suggestions welcome!

Will

I like this approach. :include should definitely preload, and people should have to use :joins and :left_joins when they need them all in the same query.

What would :right_joins do though? You need a left record to load associations of, otherwise how would you get to the right objects?

Right join is just one of the gazzilion JOINs a typical RDBMS supports. But I guess we could live without it.

Regarding making people explicitly specify the required joins to put conditions on the associations, if I understand you correctly, I think this would tend to mean that you end up with numerous copies of your basic foreign key-based join conditions throughout the codebase, which is not very DRY.

Not really. You can already specify associations in :joins arguments. So your example would be :

Lunch.find(:all, :left_joins => :pies, :conditions => {:pies => {:steak => true}})

Or if you also want to instantiate pies :

Lunch.find(:all, :left_joins => :pies, :conditions => {:pies => {:steak => true}}, :include => :pies)

In this specific case, you could just use :joins too - which uses INNER JOIN.

As for your scope example, I think that’s a different problem altogether.

Thanks!

Ah, when you said “Make user explicitly specify the required joins if they want to put conditions on the associations” I thought you were suggesting getting rid of that - I interpreted that as saying that you would have to write out the join conditions.

No worries then, I like what you’re suggesting.

But just to clarify, if you write the above statement - a left join, plus conditions on the joined table, will those conditions work as they currently do?

I ask because there’s a big difference between adding terms into the left join conditions and adding terms into the overall statement conditions - the former would exclude the right rows (pies) if the conditions aren’t true, whereas the latter excludes the left rows (lunches) if the conditions aren’t true also.

We only have a way to do the latter at the moment; would it be worth adding a way to do the former?

Lunch.find(:all, :left_joins => :pies, :conditions => {:pies => {:steak => true}})

Would load only lunches that have steak pies currently - meaning that statement will give just the same results as

Lunch.find(:all, :joins => :pies, :conditions => {:pies => {:steak => true}})

Which somewhat defeats the purpose of using a left join - it would be nice to have a way to load all lunches plus any steak pies that they have, since that’s presumably why you’re using a left join in the first place.

(I rate this only a “nice to have” with left joins because admittedly, you can do this by writing the conditions out as a string instead, and use “OR pies.id IS NULL”. But I think that beginners wouldn’t expect to have to do this every time.)

- Make :include *always* use preload strategy unless the required data set
is explicitly loaded
    
With that you mean that it's not going to interpret the select, order and condition strings anymore to determine if they're referencing some table? I'd be all for that! The interpretation of those string currently uses a too simple heuristic and is therefor just not working in practice.

- Introduce :left_joins and :right_joins keys to the finder. Title says it
all
    
+1. Though I prefer the terms :left_outer_joins and :right_outer_joins. For completeness sake also introduce :full_outer_joins.

- Make user explicitly specify the required joins if they want to put
conditions on the associations
- For all the associations specified in :include, check if the required
dataset has already been loaded by :*joins. If not, preload.
    
I guess this means that the :joins method will no longer accept a string? (or if they will, then a string value will be ignored to determine if preloading must occur or not)

Sounds good to me, but I also like the idea of being able to do something like

@orders = Order.find_by_sql(CRAZY_SHIT)
@orders.preload(:line_items=>{:sku=>:title})
  
+1 on a public preload method as well.

Cheers,
Lawrence

  • Make :include always use preload strategy unless the required data set

is explicitly loaded

With that you mean that it’s not going to interpret the select, order

and condition strings anymore to determine if they’re referencing some

table? I’d be all for that! The interpretation of those string

currently uses a too simple heuristic and is therefor just not working

in practice.

Exactly.

  • Introduce :left_joins and :right_joins keys to the finder. Title says it

all

+1. Though I prefer the terms :left_outer_joins and :right_outer_joins.

For completeness sake also introduce :full_outer_joins.

  • Make user explicitly specify the required joins if they want to put

conditions on the associations

  • For all the associations specified in :include, check if the required

dataset has already been loaded by :*joins. If not, preload.

I guess this means that the :joins method will no longer accept a

string? (or if they will, then a string value will be ignored to

determine if preloading must occur or not)

Well, we’ll definitely need a way to supply string joins, even if they’re gonna be ignored to determine preloading. So the API will need some more brainstorming. Possibly rename :joins to :inner_joins and make the new :joins only accept strings.

- Make :include *always* use preload strategy unless the required
data set
is explicitly loaded
- Introduce :left_joins and :right_joins keys to the finder. Title
says it
all
- Make user explicitly specify the required joins if they want to put
conditions on the associations
- For all the associations specified in :include, check if the
required
dataset has already been loaded by :*joins. If not, preload.
Thoughts ?

Sounds good to me, but I also like the idea of being able to do
something like

@orders = Order.find_by_sql(CRAZY_SHIT)
@orders.preload(:line_items=>{:sku=>:title})

I'd love that too (and the result set proxy I fiddled with did get you
that. Don't know if it still works with current versions of rails).
Also handy when the bit fetching your orders has no idea that you're
going to be displaying the orders along with various associations
(although often scopes and what not can mitigate that).

There is one particular case where preload is broken for :include,
when you have a hmt with conditions on the join model (I've been
meaning to do this for a little while but not gotten around to it.
Last time I thought about it it needed a slight refactor of HMT so
that preload doesn't have to duplicate the code that generates the
join statements you need)

Fred