Association Field Find Caching Technique

Hi everyone,

I find myself searching my association collections for specific field values a lot. I could call foo.bars.find_all_by_fieldname(fieldvalue) each time, but why hit the database again, especially when the full collection is already in memory? I wrote this module to extend associations so that they look in the already-loaded collection for field/value matches.

module MyAssociationExtentions
  def field_find(field, value, opts = {})
    @field_cache = nil if opts['reload']
    ((@field_cache ||= {})[field] ||= {})[value] ||= self.is_a?(Enumerable) ? (self.select { |record| record.send(field) == value }) : (self if self.send(field) == value)
  end
end

Here it is again, written out over multiple lines so it is easier to read on the forum:

module MyAssociationExtentions
  def field_find(field, value, opts = {})
    # Throw away all memoized lookups when the caller asks for a reload.
    @field_cache = nil if opts['reload']
    @field_cache = {} unless @field_cache
    @field_cache[field] = {} unless @field_cache[field]
    # Collections are filtered in memory; a single record (e.g. belongs_to)
    # is returned only if its field matches.
    @field_cache[field][value] ||= self.is_a?(Enumerable) ? (self.select { |record| record.send(field) == value }) : (self if self.send(field) == value)
  end
end

Questions:
1) I use a multi-dimensional hash to store each potential field/value lookup. Is this too memory-intensive?
2) Does this even theoretically improve performance vs. the database, or is it a waste of time?
3) Is there a better way to write that line (all those annoying checks to see whether the hash is already there)?
4) Could I push this into memcache to lower the memory usage by distributing it across Mongrel instances?
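On question 3, one common Ruby idiom is a Hash created with a default block, which builds the inner per-field hash the first time a field is accessed, so the explicit "unless" checks disappear. This is a minimal sketch, not the original module: it assumes the module is mixed into an Array-like collection of records that respond to the field name (in Rails of this era that would be done with the :extend option of has_many), it leaves out the single-record branch for brevity, and the names FieldFindSketch and Bar are hypothetical stand-ins.

```ruby
module FieldFindSketch
  def field_find(field, value, opts = {})
    # Drop every memoized result when the caller asks for a reload.
    @field_cache = nil if opts['reload']
    # The default block creates the per-field hash on first access,
    # replacing the "unless @field_cache[field]" check.
    @field_cache ||= Hash.new { |cache, f| cache[f] = {} }
    # select always returns an Array (truthy, even when empty), so
    # ||= caches misses as well as hits.
    @field_cache[field][value] ||= select { |record| record.send(field) == value }
  end
end

# Hypothetical record type standing in for an ActiveRecord model.
Bar = Struct.new(:id, :some_field)

bars = [Bar.new(1, 1), Bar.new(2, 2), Bar.new(3, 1)]
bars.extend(FieldFindSketch)

matches = bars.field_find(:some_field, 1)
p matches.map(&:id)  # prints [1, 3]
```

One trade-off worth knowing for question 4: Marshal.dump raises a TypeError for a hash that has a default proc, so a cache built this way could not be handed to a Marshal-based store such as memcache without first copying it into a plain Hash.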

Thanks

Steve,

Have you tried benchmarking your solution? That should tell you whether it gives any performance gains. Generally speaking, anything stored in physical memory can be accessed much, much faster than anything that requires I/O.

Also, in your solution, when do you invalidate @field_cache and re-read the field values from the database?

Regards, -daya

Hi Daya,

I don't know how I overlooked require 'benchmark', but I have now done some testing with it, and finding by field is much faster.

The first time it runs, it performs about the same as a find_by because the collection hasn't been loaded yet; once the collection is in memory, it is lightning fast. I have added a reload flag that clears @field_cache and redoes the lookup, in case the data is dynamic.
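A variation on the same idea that also makes invalidation explicit: group the collection by a field once, so every later lookup on that field is a single hash fetch no matter how many distinct values are queried, and expose a method that drops the index when the records change. This is a sketch under the same assumptions as before (an Array standing in for the loaded association); FieldIndexSketch, clear_field_index and Bar are hypothetical names.

```ruby
module FieldIndexSketch
  # Returns every record whose field equals value. The first lookup on a
  # field groups the whole collection by that field (one pass); later
  # lookups on the same field are a single hash fetch.
  def field_find(field, value)
    @field_index ||= {}
    index = (@field_index[field] ||= group_by { |record| record.send(field) })
    # Values with no matching records are not keys in the index.
    index.fetch(value, [])
  end

  # Call whenever the underlying records change so the next lookup
  # rebuilds the index from current data.
  def clear_field_index
    @field_index = nil
  end
end

# Hypothetical record type standing in for an ActiveRecord model.
Bar = Struct.new(:id, :some_field)

bars = [Bar.new(1, 1), Bar.new(2, 2), Bar.new(3, 1)]
bars.extend(FieldIndexSketch)

p bars.field_find(:some_field, 1).map(&:id)  # prints [1, 3]
p bars.field_find(:some_field, 9)            # prints []

bars << Bar.new(4, 1)
p bars.field_find(:some_field, 1).map(&:id)  # still prints [1, 3]: the index is stale
bars.clear_field_index
p bars.field_find(:some_field, 1).map(&:id)  # prints [1, 3, 4]
```

The index arrays hold references to the existing record objects rather than copies, so the extra memory per indexed field is roughly one hash plus one array slot per record. It does not answer the "when" part of the invalidation question by itself: the caller still has to call clear_field_index after anything that changes the rows.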

Once the collection is loaded, finders should not hit the DB anymore; those queries are too expensive. Let me know if you see any holes in this. -Steve

Here are some benchmarks:

Setup:

  foo = Foo.find(:first)

Hitting the DB on each lookup:

  Benchmark.bm { |x| x.report { 5000.times { foo.bars.find_all_by_some_field(1) } } }

        user     system      total        real
   14.260000   3.030000  17.290000 ( 18.280076)

Without Field Caching

  Benchmark.bm { |x| x.report { 5000.times { foo.bars.field_finder("some_field", 1, true) } } }

        user     system      total        real
    0.210000   0.070000   0.280000 (  0.269146)

With Field Caching

  Benchmark.bm { |x| x.report { 5000.times { foo.bars.field_finder("some_field", 1, false) } } }

        user     system      total        real
    0.110000   0.040000   0.150000 (  0.155943)
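To separate the cost of the in-memory scan from the benefit of caching, a standalone comparison along these lines can be run without Rails or a database. It is only a sketch: the Bar struct, the 1,000-record collection and the 5,000 iterations are arbitrary assumptions, and it measures in-memory work only, not database latency.

```ruby
require 'benchmark'

# Hypothetical record type and collection: 1,000 records over 10 values.
Bar = Struct.new(:id, :some_field)
bars = (1..1000).map { |i| Bar.new(i, i % 10) }

# Built once up front, playing the role of the field cache.
index = bars.group_by { |record| record.some_field }

Benchmark.bm(8) do |x|
  # Scans the whole array on every lookup (like the uncached path).
  x.report("scan:")   { 5000.times { bars.select { |r| r.some_field == 1 } } }
  # One hash fetch per lookup (like the cached path).
  x.report("cached:") { 5000.times { index.fetch(1, []) } }
end
```

The timings will vary by machine and Ruby version, so no figures are given here; the point is that the scan grows with the collection size while the cached fetch does not.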