Nested has_many :through patch

Hi to all,

This is my first post here, I would first like to salute you all and
praise the great work you guys are doing on RoR.

There is a patch (http://dev.rubyonrails.org/ticket/6461) sitting for
quite a while on the Trac for nested has_many :through associations. I
think it is a must and I have seen it requested quite a few times
(http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/
4517fb8bf464921a/6f4c544fb948b9d1 and
http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/1ee28a00febcbc42/b34f05aae9e01688
).

The patch seems to be all sorted out for more than 2 months (all tests
working). It was even made into a plugin by the author. Is there a
reason it was not applied yet? I didn't find anything about it in this
list or elsewhere.

I think it greatly simplifies apps that have complex (=large number)
models. To make a case for it, I will show a simple 'posts and
comments' example where it would be very useful (I am just writing the
essential).

class CommentRating < AR:Base
  belongs_to :comment
  belongs_to :user
end

class Comment < AR::Base
  has_many :comment_ratings
  belongs_to :user
end

class Post < AR::Base
  has_many :comments
  belongs_to :user
end

class User < AR::Base
  has_many :posts
  has_many :comments, :through=>:posts
  has_many :comment_ratings, :through=>:comments
end

If I wanted to allow a user to review only the ratings given to the
comments from his posts, I could use the contextualized finder:
current_user.comment_ratings.find(params[:id])

I know this example seems a little "forced", but as your modelspace
gets larger, without this kind of nesting, you find yourself having to
write more and more custom finders, with joins, scopes, etc, for
things that could be handled in a much simpler way.

Cheers,

Bernardo (from Brazil)

I think it greatly simplifies apps

+1

I just started using the plugin last night, so I can’t attest to it’s stability yet, but the functionality seems like a good fit for core, IMHO.

±0

Nested include could be quite costy at database level. Something like
this would need some more benchmarking.

While I agree that any additional trips to the database or additional joins to a query are more costly than not doing them, and I would love to see some benchmarking here and get any performance gains possible, I’m not sure I agree with you implication that this is a reason for exclusion.

As I understand it, the purpose of the plugin is not to increase performance, but rather to make it easier and cleaner to do things we are already doing.

From Bernardo’s example:

class CommentRating < AR:Base

belongs_to :comment
belongs_to :user
end

class Comment < AR::Base
has_many :comment_ratings
belongs_to :user
end

class Post < AR::Base

has_many :comments
belongs_to :user
end

class User < AR::Base
has_many :posts
has_many :comments, :through=>:posts
has_many :comment_ratings, :through=>:comments

end

With out the plugin one would have to do something like:

class User <AR::Base
has_many :posts
has_many :comments, :through=>:posts

def comments_rating #or use find_by_sql

ratings = []
comments.each{ |c| ratings << c.comment_ratings}
ratings.flatten

end
end

And that is just to get a finder. One would have reimplement every hm method that they need to use.

So I think the performance would be similar to the methods already being used, but much much cleaner.

My use case is Business -> BusinessOwner -> User -> Contacts (where BusinessOwner is a special kind of BusinessMember) (I do have a good business rule reason why the contact is owned by the business owner and not the business).

Here is the code:
Business.find(5).contacts

Here is the query produced by the plugin:
SELECT contacts.* FROM contacts INNER JOIN users ON (contacts.owner_id = users.id) INNER JOIN business_members ON (
users.id = business_members.user_id) WHERE (business_members.business_id = 5 AND type = ‘BusinessOwner’)

Which I think is a much better query than the comment_ratings I wrote above would use.

Thanks,
steven a bristol

-1

I agree with Pratik

+1

Rails has usually been about clean queries over performant ones. :include came way after associations, for example, but :include solved a real problem visible in real applications.

No benchmarking makes sense outside the context of a real application, and this patch would benefit real applcations which can then determine
whether there is a painful spot that needs work.

In other words this would help most people most of the time who have run into this rather common situation and have solved it in ways that would be equally performant without
making specific optimizations like splitting into multiple queries.

Josh

+1 based on a cursory glance

The only reason to reject this would be that the algorithm is flawed.
The fact that it produces a complex query should not be an issue. By
and large you WANT these kinds of things to be taken care of by the
database where indexes and other optimizations are more efficiently
brought to bear. If you need this kind of logic, there's no getting
around the irreducible complexity. Let's have ActiveRecord do its
best for the general case, and let application writers take care of
the hacky optimizations for individual cases.

When a new feature is implemented in rails, people will tend to use
it, believing it's the rails way to do things, without digging much
about the internals. Which is the correct way in most of the cases.
Itroducing a feature that could possibly be a bottleneck, needs a bit
of more research.

In other words this would help most people most of the time who have
run into this rather common situation and have solved it in ways that
would be equally performant without
making specific optimizations like splitting into multiple queries.

That makes it a plugin material imho.

Also in the given example above, this *might* already work :

class CommentRating < AR:Base
belongs_to :comment
belongs_to :user
end

class Comment < AR::Base
has_many :comment_ratings
belongs_to :user
end

class Post < AR::Base
has_many :comments, :include => :comment_ratings
belongs_to :user
end

class User < AR::Base
has_many :posts
has_many :comments, :through=>:posts
end

Needs to be verified though.

-Pratik

this rather common situation

LOL

I (obviously) agree with Steven, Joshua and Gabe, not only is this
common (in any large app), but I don't see any performance issue with
it (on the contrary). I also think this kind of nesting is the logical
thing to support, as I've seen many users think (myself included) that
it was already there in Rails, try it and be surprised with the errors
it caused. If the Rails way was wanting the user to do custom
"optimized" queries all the time, perhaps it shouldn't even have
associations or finders.

In any reasonably large DB with more than 50 entities it is impossible
to model things in a way that you will never need more than 2 levels
of "INNER JOINING" to query things. So, I agree that the examples
given here may be a case of "bad modeling" (perhaps not bad, but that
could be handled in some other way) if considered separately. My real
app that needs this is way more complex than the example given, I just
made this example up to try to explain it faster. By supporting HMT
nesting Rails will not stimulate (sane) users to make their DB
unnecessarily more complex, but will help (a lot) developer of larger
apps, who can't avoid complexity.

I coded my example ('posts and comments') and run some benchmarks on
it. I implemented some finders in the user class, in 5 different ways.
The last one is just using the nested HMT plugin from the patch (I
needlessly put it in a method as well, just to say I wasn't being
unfair in the benchmark).

def ratings_ruby_way(*arguments_to_find)
    ratings = comments.find(:all).inject([]) do |arr, c|
      arr << c.comment_ratings.find(*arguments_to_find)
    end
    return ratings.flatten
  end

  def ratings_include_ruby_way(*arguments_to_find)
    ratings =
comments.find(:all, :include=>:comment_ratings).inject([])do |arr, c|
      arr << c.comment_ratings.find(*arguments_to_find)
    end
    return ratings.flatten
  end

  def ratings_find_all_include_ruby_way
    ratings =
comments.find(:all, :include=>:comment_ratings).inject([])do |arr, c|
      arr << c.comment_ratings
    end
    return ratings.flatten
  end

  def ratings_find_all_by_sql_way
    CommentRating.find_by_sql(
      [%q{
        Select comment_ratings.* FROM comment_ratings
          INNER JOIN comments ON
            comments.id = comment_ratings.comment_id
          INNER JOIN posts ON
            posts.id = comments.post_id
          INNER JOIN users ON
            users.id = posts.user_id
          WHERE
              users.id = ?},
      self.id] )
  end

  def ratings_nested_way(*arguments_to_find)
    ratings = comment_ratings.find(*arguments_to_find)
    return ratings
  end

I've filled the tables with some test data (about 700 rows each, 5000
comment_ratings). In the benchmark each method is run 500 times,
passing ':all' when appropriate, on a set of 500 random users (the
same set for each method, on the same run).

Results for PostgreSQL 8.2 (using pg dlls based adapter):

load 'lib/benchmark_hmt.rb'

                              user system total real
ruby_way 1.172000 0.422000 1.594000 ( 3.328000)
ruby_include 2.562000 0.484000 3.046000 ( 5.625000)
find_all_ruby_include 2.172000 0.203000 2.375000 ( 4.078000)
find_all_sql 0.734000 0.203000 0.937000 ( 2.156000)
nested 1.016000 0.266000 1.282000 ( 2.360000)

Results for MySQL 5.0.24a (with default Rails adapter):

load 'lib/benchmark_hmt.rb'

                              user system total real
ruby_way 2.766000 0.235000 3.001000 ( 3.407000)
ruby_include 4.703000 0.422000 5.125000 ( 7.656000)
find_all_ruby_include 3.453000 0.109000 3.562000 ( 3.891000)
find_all_sql 1.485000 0.250000 1.735000 ( 1.937000)
nested 1.812000 0.234000 2.046000 ( 2.250000)

It was run on a Pentium M 1.7GHZ, 1GB RAM, WinXP. The DBMSs were
local. I used Rails Edge (from yesterday).

Even though MySQL was slightly faster, when I initially tried it on a
larger set of test data (~ 20000 rows) it was slower than Postgres.
The methods with :include got unusable, and only one query was taking
more than 160 seconds (they virtually froze my notebook). MySQL didn't
like the mix of LEFT OUTER JOIN and INNER JOIN Rails generated. I
don't have the benchmark numbers for those tests and it takes too long
to enter the test data, specially in MySQL, so I decided to use this
smaller set.

The great surprise here is that using :include was slower than just
making a lot of smaller queries (method 1 versus method 3). Using the
nested has_many :through is almost as fast as doing the find_by_sql
directly, and way faster than anything else. And the great plus of the
nested has_many :through is that you get all those fine association
methods, effortlessly.

If you want check the benchmark app, I made it available at:

http://www.bernardopadua.com/temp/benchmark_nested_has_many_through.zip

Cheers,

Bernardo

This is the most important point to make. If you all remember, nested includes was merged into core because of its
elegant implementation and the obvious logical fit of the behavior.

http://dev.rubyonrails.org/ticket/3913

Josh

Thanks Bernardo for doing all of that work!

Maybe we could all make another round of posts to try and discover the maturity of the plugin. Koz & the rest of core might want to wait until the plugin has been pounded in a few apps for a while before deciding one way or another.

I for one have been using it only for a few days. The code hasn’t moved to production yet, but all of my tests are passing so far. Right now my feeling is that if all of my tests are still passing when I am done with my feature (I should be done by tomorrow night) then I will be comfortable with pushing it to production and reporting back here after some real world testing.

How long have you all been using it and what are your experiences with it in a real world app?

Thanks,
Steven A Bristol

In all the cases, the patch ( for plugin of course ) needs more tests.
E.g. with all different htm options ( uniq => true, :conditions, etc.
)

It also needs to be tested against all allowed association collection
methods ( that's from mapping defined in association.rb )

There should be no exception for nested hmt as well. ( e.g.
collection.foo will not work for nested htm )

Thanks
-Pratik

re: http://dev.rubyonrails.org/ticket/6461

Hi,
I'm the author of the patch, or at least the latest one to pick up the
mantle... many thanks to Bernado for pointing me to this thread, and
for doing the benchmarking.

As I understand it, the purpose of the plugin is not to increase
performance, but rather to make it easier and cleaner to do things we are
already doing.

Yep, that's pretty much it - I'd put it a bit more strongly, and say
that it makes it *possible* to do things in ActiveRecord that could
previously only be achieved with workarounds (namely, find_by_sql -
which doesn't support eager loading - or massaging the data set in
Ruby, which is a method I guess I haven't explored as fully as others
here, but just seems intuitively wrong to me when that's just the sort
of thing a database is good at).

It seems that a few people are still under the impression that this
kind of association only comes up in obscure scenarios, but the two
real-world use cases from my latest two projects have been "Find all
my friends' blog posts" (where Friendship has attributes that
necessitate it being a full model, rather than a habtm relationship)
and "Find all songs written by a member of this band" - all pretty
routine stuff.

As far as maturity goes, one of the above projects is in production
<http://footprint.wwf.org.uk/> and using the plugin lightly, and the
other one - still in the works - uses it extensively (in conjunction
with some fairly hefty cascading eager loading), and the 2007-04-28
revision of the patch has Just Worked for me so far. The only issue I
encountered was a table aliasing bug which turned out to be an
existing Rails bug <http://dev.rubyonrails.org/ticket/8267>, just one
that's easier to stumble upon with less twisty relations when you're
using this patch...

One thing it doesn't do yet is allow you to use a nested hmt
association within an :include, which would be a nice addition, but
probably be a mostly separate piece of development (and, sadly, will
be just as big a patch as the present one, I suspect). One of the unit
tests is also failing as of r7119, which I haven't had chance to look
into yet, but I'm hoping it's just triggered by the change in the
fixture.

Cheers for the feedback, everyone - keep me updated about any
deficiencies you find, and I'll give them my best shot.

- Matt