object.collections.include? a

My question is about object.collections (AssociationProxy).

We all know that it pretends to be an Array, but in reality it's actually an AssociationProxy.

But, Why does it have its own db intensive include? method?

Example:

@product = Product.find 1

(product habtm categories)

We have 1000 categories in the system.

In the product edit admin, we display a list of checkboxes including all categories, and the ones that are associated to the product are checked.

<% Category.all do |category| %>   <%= check_box_tag "product[category_ids]", category.id, @product.categories.include?(category) %> <% end %>

but it triggers 1000 sql queries like this (very time consuming):

SELECT `categories`.id FROM `categories` INNER JOIN `categories_products` ON `categories`.id = `categories_products`.category_id WHERE (`categories`.`id` = 1) AND (`categories_products`.product_id = 1 ) LIMIT 1

Why doesn't it use the include? method of the Array? (way faster and doesn't need db queries at all, since the association is already loaded.) Why does it need its own include? method?

If I change my code to this it skips those unnecessary db queries, and return the result lightning fast:

@product.categories.to_a.include?(category)

what do you guys think?

Why doesn't it use the include? method of the Array? (way faster and doesn't need db queries at all, since the association is already loaded.) Why does it need its own include? method?

Presumably the association isn't actually loaded - the code in association_collection.rb checks for that and uses Array#include? if the association is loaded (at least that is the intent).

The key thing is that active record doesn't know ahead of time if you are going to do this once (in which case loading 1000 objects just to do this test would be wasteful) or if you are going to loop over a long list of products in which case not loading the array is wasteful). You can however (as you have found) force AR to go down the more efficient route (but yes, it would be nicer if it was better at guessing what you wanted to do).

Fred

project.milestones # fetches milestones from the database project.milestones.size # uses the milestone cache project.milestones.empty? # uses the milestone cache project.milestones(true).size # fetches milestones from the database project.milestones # uses the milestone cache

Actually that's not quite true: project.milestones does nothing (except possibly create an instance of AssociationCollection. In the console it looks like this loads the array but this is only because the console calls #inspect to display it (which causes it to be loaded). project.milestones.size will do a count(*) as will .empty? unless something else had caused the association to be loaded.

So technically AR should know that the associated objects are loaded already (and cached) so the include? method should use the faster Array#include? method instead of AssociationCollection#inlcude? or to be more accurate #exists? method, right?

project.milestones.include?(milestone) #should be fetched from cache too, no db query should be needed)

It seems that it's programmed in the framework, but for some reason in my example it's always getting the result from the database instead of the already loaded association cache.

Why is that?

I suspect that the association cache isn't actually loaded but without seeing what you're doing this is mere speculation.

Fred

Optimization really depends on the developer. The ORM framework has limitations. You could move the fetch outside the loop and use specific fields to reduce memory usage as well. For example (illustrative) <% avail_category_ids = @product.categories.all(:select => "id").map{| category> category.id}) Category.all(:select => "id").map{|category| category.id} do | category> %>   <%= check_box_tag "product[category_ids]", category.id, avail_category_ids.include?(category.id) %> <% end %>