Possible to preserve :order clause on eager-loaded associations?

I'm using eager loading pretty heavily and running up into a lot of
edge cases that I'd like to work on. The latest is that :order
clauses on associations appear to be dropped when the association is
eager loaded. This makes sense since you generally would want to
specify the order on your query, and it's prone to name collisions.
However the downside is that I'm doing a LOT of eager loading and it's
pretty ugly to be extending every single order clause through the
application with association orders (considering the associations will
always be ordered the same). Throw in the fact that my :include
option is sometimes specified dynamically (for API use and such), and
constructing the proper order clause turns into a very significant
undertaking.

I'm wondering if any ActiveRecord experts in here can comment on the
feasibility of factoring :order options into eager-loaded associations
intelligently. Ideally it would always use the associations order
clause unless the passed-in order clause contained a reference to the
eager-loaded table. Given the table renaming and the possibility of
explicit joins, this could be complicated, but I think it could be
made to work reasonably well. Any comments before I start work on a
patch?

As it turns out, extending the order clause doesn't even work for
eager loading with a limit clause since the initial query to find the
IDs doesn't include the joined table, and therefore the clause blows
up. ActiveRecord will need to parse the order string to prevent this
condition... As it is, I'll have to either not eager load the
association (n+1 ugh), or else sort the association after it is loaded
(not bad for small associations, but not very DRY).

Does AR do any analysis of ORDER clauses currently?

As it turns out, extending the order clause doesn't even work for
eager loading with a limit clause since the initial query to find the
IDs doesn't include the joined table, and therefore the clause blows
up. ActiveRecord will need to parse the order string to prevent this
condition... As it is, I'll have to either not eager load the
association (n+1 ugh), or else sort the association after it is loaded
(not bad for small associations, but not very DRY).

Does AR do any analysis of ORDER clauses currently?

Limits, orders etc with Eager includes are bit of a nonsense feature
as is, and the limitations of their implementation highlights that.
Rather than take some ad-hoc patches to fix a few specific broken
cases, I think the best bet is to either:

* Rethink how they can / should work to get something a little more robust; or
* Figure out what situations won't work correctly with our current
implementation and raise useful errors instead of 'kinda working'.

Does anyone have any strong feelings on the matter? I'm sure some of
the other ORMs like hibernate and toplink have an implementation that
could provide clues to a sane approach?

Well just to be clear, I'm not talking about an ad-hoc patch here.
I'm presenting a specific scenario as an example, but it touches on a
whole class of issues with field and table name collisions, and the
inclusion and exclusion of various clauses. I expect there's a lot
of collective wisdom here about various edge cases which is what I'm
hoping to tap into.

I'm doing some pretty heavy advanced search form stuff with
ActiveRecord; really pushing the envelope with limited eager loading,
and I gotta say it's held up fairly well (it's better now than with
Rails 1.2.3). My impressions are that:

* The SQL error usually is just as informative as whatever AR might
easily be made to spit out.
* Determining the error condition would require just as much work as
actually fixing it in a lot of cases.
* I'd rather see documentation of the incompatible options, as by the
time I hit the error, it's usually not hard to figure out what's going
on. Documentation would alert more people to the limitations before
they actually run into them.

I think looking at other ORMs for ideas would be great, however I'm a
little concerned that a truly robust solution starts to approach the
complexity of a full SQL parser. I don't want people to be
discouraged by the complexity and conclude that ActiveRecord shouldn't
support that on grounds of complexity alone.

I like the fact that ActiveRecord doesn't try to do everything, but I
feel pretty strongly that we need more robust eager loading because
it's such an essential feature. Often times it's a performance
necessity, and it's one area where falling back to find_by_sql just
doesn't work because you can't easily construct the association
targets from a raw sql query.

Often times it's a performance
necessity, and it's one area where falling back to find_by_sql just
doesn't work because you can't easily construct the association
targets from a raw sql query.

Building a SQL parser immediately sets off all sorts of alarm bells.
However perhaps the proper solution is to simply remedy this problem
If you can do fancy find_by_sql and get the snippet of your object
graph back, then it makes it less important to start guessing intent
of our users.

The apps I'm working on don't typically need to use complicated
queries to return a chunk of the object graph. Most of the
hand-written sql I use gets fed into aggregate queries of some kind.

If you're currently hitting the limits of the current system, perhaps
that's the motivation needed to investigate improving find_by_sql?

I've thought about that in the abstract and it makes a lot of sense.
Another way to add power and flexibility would be public hooks to the
eager loaded table name aliases (maybe this can be done already, it's
just not documented).

However I think the base case that I'm talking about should definitely
be fixed. It's not some crazy edge case. It's just maintaining the
order of the association. The solution is quite simple: DON'T include
the order clause on the ID-fetching query for limited eager loading,
but DO include it (by appending) on the actual query.

My entire app is built around search, and I'm leveraging pretty much
all the features of ActiveRecord, so using find_by_sql just isn't
practical. If I were to do so, I would have to write code duplicating
75% of ActiveRecord functionality just to build my queries. Kind of
like writing an SQL parser is a bad thing for ActiveRecord to do,
writing a query generator that can process :include, :conditions,
:limit, :offset, etc would be a bad thing for my app to do.

I've thought about that in the abstract and it makes a lot of sense.
Another way to add power and flexibility would be public hooks to the
eager loaded table name aliases (maybe this can be done already, it's
just not documented).

However I think the base case that I'm talking about should definitely
be fixed. It's not some crazy edge case. It's just maintaining the
order of the association. The solution is quite simple: DON'T include
the order clause on the ID-fetching query for limited eager loading,
but DO include it (by appending) on the actual query.

Won't that mean your 'limit / offset' queries aren't guaranteed to
page through the system in a consistent and meaningful way?

My entire app is built around search, and I'm leveraging pretty much
all the features of ActiveRecord, so using find_by_sql just isn't
practical. If I were to do so, I would have to write code duplicating
75% of ActiveRecord functionality just to build my queries. Kind of
like writing an SQL parser is a bad thing for ActiveRecord to do,
writing a query generator that can process :include, :conditions,
:limit, :offset, etc would be a bad thing for my app to do.

Sure, but if you wrote one and contributed it back, we'd all be better off :).

Either way, yeah, you're definitely right that if we can fix the
strangely undefined behaviour, we should. But I don't think that
what you're proposing will lead to consistent behaviour?

>
> I've thought about that in the abstract and it makes a lot of sense.
> Another way to add power and flexibility would be public hooks to the
> eager loaded table name aliases (maybe this can be done already, it's
> just not documented).
>
> However I think the base case that I'm talking about should definitely
> be fixed. It's not some crazy edge case. It's just maintaining the
> order of the association. The solution is quite simple: DON'T include
> the order clause on the ID-fetching query for limited eager loading,
> but DO include it (by appending) on the actual query.

Won't that mean your 'limit / offset' queries aren't guaranteed to
page through the system in a consistent and meaningful way?

You are more familiar with the code than me, so I could be overlooking
something here. However if you append association :order to the end
of the ORDER clause, it shouldn't affect the main :order specified in
the base query. Likewise, leaving the assocation :order it out of the
ID-fetching query ensures we get the right number of, and correct base
objects. If I'm totally missing your point please explain.

> My entire app is built around search, and I'm leveraging pretty much
> all the features of ActiveRecord, so using find_by_sql just isn't
> practical. If I were to do so, I would have to write code duplicating
> 75% of ActiveRecord functionality just to build my queries. Kind of
> like writing an SQL parser is a bad thing for ActiveRecord to do,
> writing a query generator that can process :include, :conditions,
> :limit, :offset, etc would be a bad thing for my app to do.

Sure, but if you wrote one and contributed it back, we'd all be better off :).

Either way, yeah, you're definitely right that if we can fix the
strangely undefined behaviour, we should. But I don't think that
what you're proposing will lead to consistent behaviour?

Well I certainly want to contribute something useful and robust back
to ActiveRecord. At this point I'm just brainstorming attack vectors
to the problem though. I'm looking for any specific warnings,
pitfalls, or edge cases that I should be thinking about. I'm not sure
what you mean by inconsistent behaviour, but my only goal here is
making things work that seem like they should work. The expected
behaviour should be obvious, so I guess I'm not seeing where the
inconsistency would arise (unless I screwed something up).

You are more familiar with the code than me, so I could be overlooking
something here. However if you append association :order to the end
of the ORDER clause, it shouldn't affect the main :order specified in
the base query. Likewise, leaving the assocation :order it out of the
ID-fetching query ensures we get the right number of, and correct base
objects. If I'm totally missing your point please explain.

Leaving the order out of the id-fetching query means you're paging
through the list, *then* sorting. So if you have a limit of three,
and ordering by name, you could easily see:

Page 1
* Marcus
* Michael
* Stephen

Page 2
* Aaron
* James
* Oliver

Page 3
* Adam
* Scott
* Thomas

That's quite a counter intuitive result if someone's clicked to order
by name and paging through the system.

When faced with similar limitations, I ended up writing a plugin that
allowed me to do my own custom SQL when pulling in associations. Some
things are just too tough to try to do through automated SQL
generation.

Check out my blog post at http://kellogg-assoc.com/articles/2006/11/05/eager-finder-sql
or just go to the RubyForge page at http://rubyforge.org/projects/eagerfindersql

Best of luck,

Gregg

Right right. No, what I meant was leaving the :order clause of the
association off (not the whole order clause). Sort of like it leaves
the association's JOINs off in order to get the correct LIMIT and
OFFSET, but adds them to the final query.

I think the best bet would be for you to grab me on #rails-contrib,
it'll be heaps faster to resolve in realtime