Eager loading has_one: subselects

In looking at (and trying to use) a has_one association, I noticed that the eager loading code for has one associations doesn't respect the "limit one" per record, and could potentially pull back millions of records (though there is code to make sure only one object per association is actually turned converted to an ActiveRecord object).

Since I always use Postgresql with my Rails projects nowadays, I naturally assume that a subselect is the way to go here, but I don't know where sqllite is with subselects. I know MySQL now uses them, but I don't know if Rails is keeping to some kind of lowest common database denominator. If not, I think it might be worth it to try to add some logic to ActiveRecord::AssociationPreload#find_associated_records, but I don't want to get into a whole testing, branching thing if this is too database-specific.

The code I'm envisioning would be something like this:

<code>       def min_or_max_from_order(reflection)         if reflection.nil? || reflection.options[:order].blank?           return 'MAX'         end         case (reflection.options[:order].split.compact.last.upcase rescue 'ASC')         when 'DESC'           'MAX'         else           'MIN'         end       end

      def order_column_from_reflection(reflection)         if reflection.nil? || reflection.options[:order].blank?           return "id"         end         reflection.options[:order].split.compact.first.split('.').last       end

      def find_associated_records(ids, reflection, preload_options)         options = reflection.options         table_name = reflection.klass.quoted_table_name

        if interface = reflection.options[:as]           conditions = "#{reflection.klass.quoted_table_name}.# {connection.quote_column_name "#{interface}_id"} #{in_or_equals_for_ids (ids)} and #{reflection.klass.quoted_table_name}.# {connection.quote_column_name "#{interface}_type"} = '# {self.base_class.sti_name}'"         else           foreign_key = reflection.primary_key_name           conditions = "#{reflection.klass.quoted_table_name}.# {foreign_key} #{in_or_equals_for_ids(ids)}"         end

  if reflection.macro == :has_one           if reflection.options[:order]             # we'll try subselects here             select = " #{reflection.klass.quoted_table_name}.# {foreign_key} as __original_key, #{table_name}.* "             conditions <<               " and ( #{reflection.klass.quoted_table_name}.# {order_column_from_reflection(reflection)} =                       (select #{min_or_max_from_order(reflection)} (\"subselect_table1\".#{order_column_from_reflection(reflection)})                          from #{reflection.klass.quoted_table_name} as \"subselect_table1\"                         where \"subselect_table1\".#{foreign_key} = # {reflection.klass.quoted_table_name}.#{foreign_key} "             conditions << append_conditions if reflection.options [:conditions]             conditions << " ) ) "           end         end

        conditions << append_conditions(reflection, preload_options)

        reflection.klass.with_exclusive_scope do           reflection.klass.find(:all,                               :select => (preload_options[:select] || options[:select] || "#{table_name}.*"),                               :include => preload_options[:include] || options[:include],                               :conditions => [conditions, ids],                               :joins => options[:joins],                               :group => preload_options[:group] || options[:group],                               :order => preload_options[:order] || options[:order])         end       end </code>

Clearly this wouldn't solve all cases, and some restrictions would have to be laid out in the docs, especially about valid "order by" phrasing, and resolving table name issues in adding the reflection conditions to the subselect conditions...

But anyone have any thoughts about this?

[good stuff snipped]

Clearly this wouldn't solve all cases, and some restrictions would have to be laid out in the docs, especially about valid "order by" phrasing, and resolving table name issues in adding the reflection conditions to the subselect conditions...

But anyone have any thoughts about this?

What I would really like to see in the eager loading stuff is a bit more customisation ability - as you point out your solution would have various edge cases, and eager loading other association types have other caveats (eg interpolated conditions, customer finder sql etc...). A lot of the time the best solution to a particular solution is database specific (or even app specific). It would be super awesome if it was possible to tell active record "and here is how you eager load this association".

What activerecord would need to do is to provide the ids of the parent objects and probably (so that you could reuse custom include code) details about the assocation. The end user code would just need to return the records (probably as a hash of primary keys to arrays of objects, leaving AR to mark the associations as loaded and wire up the stuff

It seems to me like this wouldn't be too hard - find_associated_records would just need to check the association options for the presence of method name / proc and if so just call that (plus some refactoring for habtm). (Might be a good time to rejig eager load of hmt and hot so that they don't take two steps).

It would get a little nasty in the cases of associations that are sometimes eager loaded via this mechanism and sometimes via the old join mechanism - i suppose you'd just have to make it clear that the eager load method you provide is only used in your former case.

It would also be nice if you could specify a class providing whatever required methods, so as well as doing

has_one :bars, :eager_load => lambda { ... } or has_one :bars, :eager_load => :load_bars

you could do

has_one :bars, :eager_load => SubselectEagerLoader

so all the subselect logic would be packaged up in SubselectEagerLoader (and then if people have cool database specific optimisations / strategies they can distribute them as gems, plugins etc)

It would mean that someone like John would have to add :eager_load => SubselectEagerLoader to each of his has_ones that required it. it might be nice to be able to set some sort of default (Class, level, AR:Base level, Connection Adapter ?), but I feel like I'm getting a little ahead of myself here.

John - sorry to have hijacked your thread here - I do feel this would be a reasonable way of allowing people to do things the optimal way for them without having loads of database specific paths through active record

Fred