ANN: association_collection_tools plugin


Any time you use an ORM you need to know that you are often sacrificing
performance for convenience and developer efficiency. In general, this
is a good thing. I agree with the theory espoused by DHH that
productivity is *often* more valuable than machine performance, at
least in the early stages of development. Once you
get to a certain scale, however, there are cases where you'll need to
write your own code that bypasses the ORM in the name of performance.
This plugin provides some association operations that issue direct SQL
calls to make things go faster.

a. fast_copy
A method called fast_copy is added to has_and_belongs_to_many
collections that makes the process of cloning HABTM associations *MUCH*
more efficient. Simply replace person1.items = person2.items with
person1.items.fast_copy(person2) and your database, network and RAM
will thank you. See below for more details.

b. ids
A method called ids is added to has_many and has_and_belongs_to_many
association collections. It returns the list of object ids in the
collection without unnecessarily instantiating the objects.

== Installation

1. This plugin requires that the memcache-client gem is installed.
   # gem install memcache-client

2. Install the plugin OR the gem
   $ script/plugin install
   - OR -
   # gem install association_collection_tools

== HABTM Fast Copy
Copies a HABTM association collection from one object to another
without instantiating a bunch of ActiveRecord objects. This is
faster than the standard assignment operation because it:

1. Eliminates the massive number of SQL calls used in a standard HABTM
   copy, reducing the number of statements from O(n) to O(1), where
   n is the number of objects in the association collection.
2. Transfers only object IDs between the database and Ruby instead
   of all object attributes, resulting in less work for the database,
   less data transferred and less memory used in Ruby.
3. Doesn't instantiate ActiveRecord objects in memory.

A normal HABTM copy (e.g., person1.items = person2.items) results
in the following SQL calls. In this trace the source record has id 1
and the target record has id 2.

SELECT * FROM items INNER JOIN items_people ON items.id = items_people.item_id WHERE (items_people.person_id = 1)
SELECT * FROM items INNER JOIN items_people ON items.id = items_people.item_id WHERE (items_people.person_id = 2)
DELETE FROM items_people WHERE person_id = 2 AND item_id IN (4)
INSERT INTO items_people (`item_id`, `person_id`) VALUES (1, 2)
INSERT INTO items_people (`item_id`, `person_id`) VALUES (2, 2)
INSERT INTO items_people (`item_id`, `person_id`) VALUES (3, 2)

Notice that:
- items AR objects are instantiated unnecessarily (especially since
  the target's existing items are about to be replaced)
- one INSERT statement is issued for each object (item) copied into
  the join table (items_people)
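To make the O(n) versus O(1) point concrete, here is a rough model of how many statements each approach issues, read off the standard trace above and the fast_copy trace that follows in this section. The function and constant names are hypothetical, and the model ignores details such as what fast_copy does when the source collection is empty.

```ruby
# Rough statement-count model derived from the SQL traces in this README;
# it is an illustration, not code taken from the plugin.

# Standard assignment: two SELECTs to load the target and source
# collections, one DELETE if any rows must go, and one INSERT per added row.
def standard_copy_statements(added, removed)
  2 + (removed > 0 ? 1 : 0) + added
end

# fast_copy: one DELETE, one SELECT of ids and one multi-row REPLACE,
# regardless of how many rows are copied.
FAST_COPY_STATEMENTS = 3

[3, 30, 300].each do |added|
  puts "added=#{added}: standard=#{standard_copy_statements(added, 1)} " \
       "fast_copy=#{FAST_COPY_STATEMENTS}"
end
```

With 3 rows added and 1 removed, as in the trace above, the model gives 6 statements for the standard copy against 3 for fast_copy; the gap grows linearly with the collection size.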

whereas person1.items.fast_copy(person2) results in the following SQL
calls, greatly reducing the impact on the database and on Ruby memory
utilization.

DELETE FROM items_people WHERE person_id = 2
SELECT item_id FROM items_people WHERE person_id = 1
REPLACE INTO items_people (person_id,item_id) VALUES (2,3),(2,2),(2,1)
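The three statements above are simple enough to generate directly. Here is a minimal sketch of how that SQL could be assembled; fast_copy_prelude_sql and fast_copy_replace_sql are hypothetical names, not the plugin's API. Ids are coerced with Integer() so they can be interpolated safely, and table/column names are assumed to come from trusted association metadata rather than user input. The final statement uses the MySQL-style REPLACE INTO shown in the trace.

```ruby
# Hypothetical helpers (not the plugin's code) that build the fast_copy SQL.

# Steps 1 and 2: clear the target's join rows and read the source's ids.
def fast_copy_prelude_sql(join_table, owner_key, item_key, source_id, target_id)
  [
    "DELETE FROM #{join_table} WHERE #{owner_key} = #{Integer(target_id)}",
    "SELECT #{item_key} FROM #{join_table} WHERE #{owner_key} = #{Integer(source_id)}"
  ]
end

# Step 3: one multi-row REPLACE built from the ids the SELECT returned.
# Returns nil when there is nothing to copy.
def fast_copy_replace_sql(join_table, owner_key, item_key, target_id, item_ids)
  return nil if item_ids.empty?

  values = item_ids.map { |id| "(#{Integer(target_id)},#{Integer(id)})" }.join(",")
  "REPLACE INTO #{join_table} (#{owner_key},#{item_key}) VALUES #{values}"
end

puts fast_copy_prelude_sql("items_people", "person_id", "item_id", 1, 2)
puts fast_copy_replace_sql("items_people", "person_id", "item_id", 2, [3, 2, 1])
```

Run as-is, this prints the same three statements as the trace above. Only the id list travels between the database and Ruby, which is where the savings come from.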

Here are some benchmarks:

when n = 10 and 26 objects in e2.groups:

Benchmark.bm(10) do |x|
  x.report("assignment") { for i in 1..n; e1.groups.clear; e1.groups = e2.groups; end }
  x.report("fast_copy") { for i in 1..n; e1.groups.clear; e1.groups.fast_copy(e2); end }
end

                 user     system      total        real
assignment   1.140000   0.040000   1.180000 (  1.832122)
fast_copy    0.020000   0.010000   0.030000 (  0.125368)

when n = 100 and 26 objects in e2.groups:

                 user     system      total        real
assignment  11.140000   0.360000  11.500000 ( 18.171410)
fast_copy    0.140000   0.010000   0.150000 (  2.368200)
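The speedup is easier to read as a ratio. This small script divides the standard-assignment times by the fast_copy times from the two tables above; the RESULTS hash simply restates those published figures, and speedup is an illustrative helper.

```ruby
# Speedup ratios computed from the benchmark figures above
# (standard assignment time / fast_copy time).
RESULTS = {
  10  => { assign: { total: 1.18, real: 1.832122 }, fast_copy: { total: 0.03, real: 0.125368 } },
  100 => { assign: { total: 11.5, real: 18.17141 }, fast_copy: { total: 0.15, real: 2.3682 } }
}

def speedup(results, n, metric)
  results[n][:assign][metric] / results[n][:fast_copy][metric]
end

RESULTS.each_key do |n|
  printf("n=%-4d CPU total: %5.1fx  wall clock: %5.1fx\n",
         n, speedup(RESULTS, n, :total), speedup(RESULTS, n, :real))
end
```

At n = 10 that works out to roughly 39 times less CPU time and about 14.6 times less wall-clock time; at n = 100, roughly 77 times and 7.7 times.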

This method also supports HABTM join tables with additional attributes.
Simply pass in an attribute hash as the second argument and it will
add the attributes to the records it creates in the join table.

e.g., person1.items.fast_copy(person2, {:created_at =>})
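Conceptually, the attribute hash adds columns to every row of the multi-row REPLACE. A minimal sketch, assuming the extra columns hold plain values; replace_with_attributes_sql and sql_literal are hypothetical helpers, the timestamp below is a made-up example value, and the quoting is deliberately naive (a real implementation should use the database adapter's quoting).

```ruby
# Naive SQL literal: integers as-is, everything else single-quoted with
# embedded quotes doubled. Illustration only; use adapter quoting in practice.
def sql_literal(value)
  value.is_a?(Integer) ? value.to_s : "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical sketch: a multi-row REPLACE that also sets extra join-table
# columns, in the spirit of fast_copy(source, attributes).
def replace_with_attributes_sql(join_table, owner_key, item_key, target_id, item_ids, attributes = {})
  columns = [owner_key, item_key] + attributes.keys.map(&:to_s)
  extra = attributes.values.map { |v| sql_literal(v) }
  rows = item_ids.map { |id| "(" + ([Integer(target_id), Integer(id)] + extra).join(",") + ")" }
  "REPLACE INTO #{join_table} (#{columns.join(',')}) VALUES #{rows.join(',')}"
end

puts replace_with_attributes_sql("items_people", "person_id", "item_id", 2, [3, 1],
                                 { created_at: "2007-01-01 00:00:00" })
```

Every copied row receives the same attribute values, which suits columns like a creation timestamp.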

REALITY CHECK: The HABTM docs refer to collection_singular_ids=ids,
which implies identical functionality, but I can't find mention of
this method in anything other than the documentation. Maybe this
actually already exists and I'm just blind, but from the looks of it,
this appears to be a documentation bug.

== HABTM and has_many ids
Returns the list of IDs in this association collection without
instantiating a bunch of ActiveRecord objects. What good is the id of
an object without the object itself? If you think about it for a
moment, you're bound to come up with many uses, especially if you write
a lot of SQL by hand. For instance, the fast_copy command documented
above uses this method to return an id list without instantiating AR
objects. The potential savings are enormous when you're dealing with
hundreds or thousands of objects at a time.
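The idea behind ids is to ask the database for a single column instead of whole rows. A minimal sketch of the queries involved, assuming Rails' default conventions (an id primary key, a foreign key on the associated table for has_many, and a two-column join table for HABTM). The helper names are hypothetical and this is not the plugin's actual implementation, although the HABTM query matches the SELECT in the fast_copy trace above.

```ruby
# has_many: the ids live in the associated table's primary key column
# (assumed to be named "id").
def has_many_ids_sql(table, foreign_key, owner_id)
  "SELECT id FROM #{table} WHERE #{foreign_key} = #{Integer(owner_id)}"
end

# HABTM: the ids can be read straight from the join table, no JOIN needed.
def habtm_ids_sql(join_table, owner_key, item_key, owner_id)
  "SELECT #{item_key} FROM #{join_table} WHERE #{owner_key} = #{Integer(owner_id)}"
end

# Each result row holds a single value, so building the id list is a plain
# map rather than an ActiveRecord instantiation. `rows` stands in for raw
# query results (an array of one-element rows).
def rows_to_ids(rows)
  rows.map { |row| Integer(row.first) }
end

puts has_many_ids_sql("items", "person_id", 1)
puts habtm_ids_sql("items_people", "person_id", "item_id", 1)
p rows_to_ids([["3"], ["2"], ["1"]])
```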

== Bugs, Code and Contributing

There's a RubyForge project set up at:

Anonymous SVN access:

$ svn checkout svn://