Nested has_many :through patch

Hi to all,

This is my first post here, I would first like to salute you all and praise the great work you guys are doing on RoR.

There is a patch (http://dev.rubyonrails.org/ticket/6461) sitting for quite a while on the Trac for nested has_many :through associations. I think it is a must and I have seen it requested quite a few times (http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/ 4517fb8bf464921a/6f4c544fb948b9d1 and http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/1ee28a00febcbc42/b34f05aae9e01688 ).

The patch seems to be all sorted out for more than 2 months (all tests working). It was even made into a plugin by the author. Is there a reason it was not applied yet? I didn't find anything about it in this list or elsewhere.

I think it greatly simplifies apps that have complex (=large number) models. To make a case for it, I will show a simple 'posts and comments' example where it would be very useful (I am just writing the essential).

class CommentRating < AR:Base   belongs_to :comment   belongs_to :user end

class Comment < AR::Base   has_many :comment_ratings   belongs_to :user end

class Post < AR::Base   has_many :comments   belongs_to :user end

class User < AR::Base   has_many :posts   has_many :comments, :through=>:posts   has_many :comment_ratings, :through=>:comments end

If I wanted to allow a user to review only the ratings given to the comments from his posts, I could use the contextualized finder: current_user.comment_ratings.find(params[:id])

I know this example seems a little "forced", but as your modelspace gets larger, without this kind of nesting, you find yourself having to write more and more custom finders, with joins, scopes, etc, for things that could be handled in a much simpler way.

Cheers,

Bernardo (from Brazil)

I think it greatly simplifies apps

+1

I just started using the plugin last night, so I can’t attest to it’s stability yet, but the functionality seems like a good fit for core, IMHO.

±0

Nested include could be quite costy at database level. Something like this would need some more benchmarking.

While I agree that any additional trips to the database or additional joins to a query are more costly than not doing them, and I would love to see some benchmarking here and get any performance gains possible, I’m not sure I agree with you implication that this is a reason for exclusion.

As I understand it, the purpose of the plugin is not to increase performance, but rather to make it easier and cleaner to do things we are already doing.

From Bernardo’s example:

class CommentRating < AR:Base

belongs_to :comment belongs_to :user end

class Comment < AR::Base has_many :comment_ratings belongs_to :user end

class Post < AR::Base

has_many :comments belongs_to :user end

class User < AR::Base has_many :posts has_many :comments, :through=>:posts has_many :comment_ratings, :through=>:comments

end

With out the plugin one would have to do something like:

class User <AR::Base has_many :posts has_many :comments, :through=>:posts

def comments_rating #or use find_by_sql

ratings = []
comments.each{ |c| ratings << c.comment_ratings}
ratings.flatten

end end

And that is just to get a finder. One would have reimplement every hm method that they need to use.

So I think the performance would be similar to the methods already being used, but much much cleaner.

My use case is Business → BusinessOwner → User → Contacts (where BusinessOwner is a special kind of BusinessMember) (I do have a good business rule reason why the contact is owned by the business owner and not the business).

Here is the code: Business.find(5).contacts

Here is the query produced by the plugin: SELECT contacts.* FROM contacts INNER JOIN users ON (contacts.owner_id = users.id) INNER JOIN business_members ON ( users.id = business_members.user_id) WHERE (business_members.business_id = 5 AND type = ‘BusinessOwner’)

Which I think is a much better query than the comment_ratings I wrote above would use.

Thanks, steven a bristol

-1

I agree with Pratik

+1

Rails has usually been about clean queries over performant ones. :include came way after associations, for example, but :include solved a real problem visible in real applications.

No benchmarking makes sense outside the context of a real application, and this patch would benefit real applcations which can then determine whether there is a painful spot that needs work.

In other words this would help most people most of the time who have run into this rather common situation and have solved it in ways that would be equally performant without making specific optimizations like splitting into multiple queries.

Josh

+1 based on a cursory glance

The only reason to reject this would be that the algorithm is flawed. The fact that it produces a complex query should not be an issue. By and large you WANT these kinds of things to be taken care of by the database where indexes and other optimizations are more efficiently brought to bear. If you need this kind of logic, there's no getting around the irreducible complexity. Let's have ActiveRecord do its best for the general case, and let application writers take care of the hacky optimizations for individual cases.

When a new feature is implemented in rails, people will tend to use it, believing it's the rails way to do things, without digging much about the internals. Which is the correct way in most of the cases. Itroducing a feature that could possibly be a bottleneck, needs a bit of more research.

In other words this would help most people most of the time who have run into this rather common situation and have solved it in ways that would be equally performant without making specific optimizations like splitting into multiple queries.

That makes it a plugin material imho.

Also in the given example above, this *might* already work :

class CommentRating < AR:Base belongs_to :comment belongs_to :user end

class Comment < AR::Base has_many :comment_ratings belongs_to :user end

class Post < AR::Base has_many :comments, :include => :comment_ratings belongs_to :user end

class User < AR::Base has_many :posts has_many :comments, :through=>:posts end

Needs to be verified though.

-Pratik

this rather common situation

LOL

I (obviously) agree with Steven, Joshua and Gabe, not only is this common (in any large app), but I don't see any performance issue with it (on the contrary). I also think this kind of nesting is the logical thing to support, as I've seen many users think (myself included) that it was already there in Rails, try it and be surprised with the errors it caused. If the Rails way was wanting the user to do custom "optimized" queries all the time, perhaps it shouldn't even have associations or finders.

In any reasonably large DB with more than 50 entities it is impossible to model things in a way that you will never need more than 2 levels of "INNER JOINING" to query things. So, I agree that the examples given here may be a case of "bad modeling" (perhaps not bad, but that could be handled in some other way) if considered separately. My real app that needs this is way more complex than the example given, I just made this example up to try to explain it faster. By supporting HMT nesting Rails will not stimulate (sane) users to make their DB unnecessarily more complex, but will help (a lot) developer of larger apps, who can't avoid complexity.

I coded my example ('posts and comments') and run some benchmarks on it. I implemented some finders in the user class, in 5 different ways. The last one is just using the nested HMT plugin from the patch (I needlessly put it in a method as well, just to say I wasn't being unfair in the benchmark).

def ratings_ruby_way(*arguments_to_find)     ratings = comments.find(:all).inject() do |arr, c|       arr << c.comment_ratings.find(*arguments_to_find)     end     return ratings.flatten   end

  def ratings_include_ruby_way(*arguments_to_find)     ratings = comments.find(:all, :include=>:comment_ratings).inject()do |arr, c|       arr << c.comment_ratings.find(*arguments_to_find)     end     return ratings.flatten   end

  def ratings_find_all_include_ruby_way     ratings = comments.find(:all, :include=>:comment_ratings).inject()do |arr, c|       arr << c.comment_ratings     end     return ratings.flatten   end

  def ratings_find_all_by_sql_way     CommentRating.find_by_sql(       [%q{         Select comment_ratings.* FROM comment_ratings           INNER JOIN comments ON             comments.id = comment_ratings.comment_id           INNER JOIN posts ON             posts.id = comments.post_id           INNER JOIN users ON             users.id = posts.user_id           WHERE               users.id = ?},       self.id] )   end

  def ratings_nested_way(*arguments_to_find)     ratings = comment_ratings.find(*arguments_to_find)     return ratings   end

I've filled the tables with some test data (about 700 rows each, 5000 comment_ratings). In the benchmark each method is run 500 times, passing ':all' when appropriate, on a set of 500 random users (the same set for each method, on the same run).

Results for PostgreSQL 8.2 (using pg dlls based adapter):

load 'lib/benchmark_hmt.rb'

                              user system total real ruby_way 1.172000 0.422000 1.594000 ( 3.328000) ruby_include 2.562000 0.484000 3.046000 ( 5.625000) find_all_ruby_include 2.172000 0.203000 2.375000 ( 4.078000) find_all_sql 0.734000 0.203000 0.937000 ( 2.156000) nested 1.016000 0.266000 1.282000 ( 2.360000)

Results for MySQL 5.0.24a (with default Rails adapter):

load 'lib/benchmark_hmt.rb'

                              user system total real ruby_way 2.766000 0.235000 3.001000 ( 3.407000) ruby_include 4.703000 0.422000 5.125000 ( 7.656000) find_all_ruby_include 3.453000 0.109000 3.562000 ( 3.891000) find_all_sql 1.485000 0.250000 1.735000 ( 1.937000) nested 1.812000 0.234000 2.046000 ( 2.250000)

It was run on a Pentium M 1.7GHZ, 1GB RAM, WinXP. The DBMSs were local. I used Rails Edge (from yesterday).

Even though MySQL was slightly faster, when I initially tried it on a larger set of test data (~ 20000 rows) it was slower than Postgres. The methods with :include got unusable, and only one query was taking more than 160 seconds (they virtually froze my notebook). MySQL didn't like the mix of LEFT OUTER JOIN and INNER JOIN Rails generated. I don't have the benchmark numbers for those tests and it takes too long to enter the test data, specially in MySQL, so I decided to use this smaller set.

The great surprise here is that using :include was slower than just making a lot of smaller queries (method 1 versus method 3). Using the nested has_many :through is almost as fast as doing the find_by_sql directly, and way faster than anything else. And the great plus of the nested has_many :through is that you get all those fine association methods, effortlessly.

If you want check the benchmark app, I made it available at:

http://www.bernardopadua.com/temp/benchmark_nested_has_many_through.zip

Cheers,

Bernardo

This is the most important point to make. If you all remember, nested includes was merged into core because of its elegant implementation and the obvious logical fit of the behavior.

http://dev.rubyonrails.org/ticket/3913

Josh

Thanks Bernardo for doing all of that work!

Maybe we could all make another round of posts to try and discover the maturity of the plugin. Koz & the rest of core might want to wait until the plugin has been pounded in a few apps for a while before deciding one way or another.

I for one have been using it only for a few days. The code hasn’t moved to production yet, but all of my tests are passing so far. Right now my feeling is that if all of my tests are still passing when I am done with my feature (I should be done by tomorrow night) then I will be comfortable with pushing it to production and reporting back here after some real world testing.

How long have you all been using it and what are your experiences with it in a real world app?

Thanks, Steven A Bristol

In all the cases, the patch ( for plugin of course ) needs more tests. E.g. with all different htm options ( uniq => true, :conditions, etc. )

It also needs to be tested against all allowed association collection methods ( that's from mapping defined in association.rb )

There should be no exception for nested hmt as well. ( e.g. collection.foo will not work for nested htm )

Thanks -Pratik

re: http://dev.rubyonrails.org/ticket/6461

Hi, I'm the author of the patch, or at least the latest one to pick up the mantle... many thanks to Bernado for pointing me to this thread, and for doing the benchmarking.

As I understand it, the purpose of the plugin is not to increase performance, but rather to make it easier and cleaner to do things we are already doing.

Yep, that's pretty much it - I'd put it a bit more strongly, and say that it makes it *possible* to do things in ActiveRecord that could previously only be achieved with workarounds (namely, find_by_sql - which doesn't support eager loading - or massaging the data set in Ruby, which is a method I guess I haven't explored as fully as others here, but just seems intuitively wrong to me when that's just the sort of thing a database is good at).

It seems that a few people are still under the impression that this kind of association only comes up in obscure scenarios, but the two real-world use cases from my latest two projects have been "Find all my friends' blog posts" (where Friendship has attributes that necessitate it being a full model, rather than a habtm relationship) and "Find all songs written by a member of this band" - all pretty routine stuff.

As far as maturity goes, one of the above projects is in production <WWF Footprint Calculator; and using the plugin lightly, and the other one - still in the works - uses it extensively (in conjunction with some fairly hefty cascading eager loading), and the 2007-04-28 revision of the patch has Just Worked for me so far. The only issue I encountered was a table aliasing bug which turned out to be an existing Rails bug <http://dev.rubyonrails.org/ticket/8267&gt;, just one that's easier to stumble upon with less twisty relations when you're using this patch...

One thing it doesn't do yet is allow you to use a nested hmt association within an :include, which would be a nice addition, but probably be a mostly separate piece of development (and, sadly, will be just as big a patch as the present one, I suspect). One of the unit tests is also failing as of r7119, which I haven't had chance to look into yet, but I'm hoping it's just triggered by the change in the fixture.

Cheers for the feedback, everyone - keep me updated about any deficiencies you find, and I'll give them my best shot.

- Matt