object.collections.include? a

My question is about object.collections (AssociationProxy).

We all know that it pretends to be an Array, but in reality it's
actually an AssociationProxy.

But, Why does it have its own db intensive include? method?

Example:

@product = Product.find 1

(product habtm categories)

We have 1000 categories in the system.

In the product edit admin, we display a list of checkboxes including all
categories, and the ones that are associated to the product are checked.

<% Category.all do |category| %>
  <%= check_box_tag "product[category_ids][]", category.id,
@product.categories.include?(category) %>
<% end %>

but it triggers 1000 sql queries like this (very time consuming):

SELECT `categories`.id FROM `categories` INNER JOIN
`categories_products` ON `categories`.id =
`categories_products`.category_id WHERE (`categories`.`id` = 1) AND
(`categories_products`.product_id = 1 ) LIMIT 1

Why doesn't it use the include? method of the Array? (way faster and
doesn't need db queries at all, since the association is already
loaded.) Why does it need its own include? method?

If I change my code to this it skips those unnecessary db queries, and
return the result lightning fast:

@product.categories.to_a.include?(category)

what do you guys think?

Why doesn't it use the include? method of the Array? (way faster and
doesn't need db queries at all, since the association is already
loaded.) Why does it need its own include? method?

Presumably the association isn't actually loaded - the code in
association_collection.rb checks for that and uses Array#include? if
the association is loaded (at least that is the intent).

The key thing is that active record doesn't know ahead of time if you
are going to do this once (in which case loading 1000 objects just to
do this test would be wasteful) or if you are going to loop over a
long list of products in which case not loading the array is
wasteful). You can however (as you have found) force AR to go down the
more efficient route (but yes, it would be nicer if it was better at
guessing what you wanted to do).

Fred

project.milestones # fetches milestones from the database
project.milestones.size # uses the milestone cache
project.milestones.empty? # uses the milestone cache
project.milestones(true).size # fetches milestones from the database
project.milestones # uses the milestone cache

Actually that's not quite true: project.milestones does nothing
(except possibly create an instance of AssociationCollection. In the
console it looks like this loads the array but this is only because
the console calls #inspect to display it (which causes it to be
loaded). project.milestones.size will do a count(*) as will .empty?
unless something else had caused the association to be loaded.

So technically AR should know that the associated objects are loaded
already (and cached) so the include? method should use the faster
Array#include? method instead of AssociationCollection#inlcude? or to be
more accurate #exists? method, right?

project.milestones.include?(milestone) #should be fetched from cache
too, no db query should be needed)

It seems that it's programmed in the framework, but for some reason in
my example it's always getting the result from the database instead of
the already loaded association cache.

Why is that?

I suspect that the association cache isn't actually loaded but without
seeing what you're doing this is mere speculation.

Fred

Optimization really depends on the developer. The ORM framework has
limitations. You could move the fetch outside the loop and use
specific fields to reduce memory usage as well. For example
(illustrative)
<% avail_category_ids = @product.categories.all(:select => "id").map{|
category> category.id})
Category.all(:select => "id").map{|category| category.id} do |
category> %>
  <%= check_box_tag "product[category_ids][]", category.id,
avail_category_ids.include?(category.id) %>
<% end %>