ActiveRecord.find.each do

I have noticed that when I use the construct...

    ActiveRecordChild.find(:all).each do |record|
      record.do_something
    end

...it still returns the array of all records. With larger datasets this can be murder on memory. I was trying to do this in one of my migrations, and my computer choked after the migration had consumed 1.8GB of memory. I ended up having to rewrite the migration as:

    max = ActiveRecordChild.maximum(:id)
    1.upto(max).each do |i|
      if record = ActiveRecordChild.find_by_id(i)
        record.do_something
      end
    end

What I would like to propose is having find accept a block like this.

    ActiveRecordChild.find(:all) do |record|
      record.do_something
    end

where, instead of even composing an array of ActiveRecord objects, it would just pass each record to the block as it is fetched, and then forget about it as soon as the block exits.

A possible additional feature might be to collect the results of the block, like a 'collect' call, possibly even omitting a result when the block returns something that evaluates as false (false or nil). Overall, however, for memory considerations I would rather have the block form of find return nothing; if I really want to collect the results, I can always push something onto an array.

That would cause a query for each record and performance would very likely suffer. When I've had to do a similar thing over a table with many (100,000+) records, I've done something like:

    total = Model.count(to_refresh)
    limit = [ 100, total ].min
    0.step(total-1, limit) do |offset|
      Model.find(:all, :limit => limit, :offset => offset).each do |model|
        # do stuff
      end
    end

If you have a condition that limits what comes back, you might have to tweak the offset if the "stuff" you do causes records to fall out of the condition.
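To make the offset caveat concrete, here is a small sketch that uses a plain Ruby array as a stand-in for the table, so it runs without a database. The names build_rows, fetch_page and the :stale flag are made up for the illustration and are not part of ActiveRecord. It compares advancing the offset with always re-reading from offset 0 when the "stuff" removes records from the condition.

```ruby
# Stand-in for a table: ten rows that all match the condition (stale == true).
def build_rows
  (1..10).map { |id| { :id => id, :stale => true } }
end

# Stand-in for a find(:all) with :conditions, :limit and :offset.
def fetch_page(rows, limit, offset)
  rows.select { |r| r[:stale] }[offset, limit] || []
end

# Advances the offset even though processing a row makes it fall out
# of the condition, so later pages start too far into the result set.
def process_with_moving_offset(rows, limit)
  processed = []
  total = rows.count { |r| r[:stale] }
  0.step(total - 1, limit) do |offset|
    fetch_page(rows, limit, offset).each do |row|
      row[:stale] = false
      processed << row[:id]
    end
  end
  processed
end

# Always reads the first page; processed rows drop out of the condition,
# so the "first page" is a fresh set of rows every time.
def process_from_front(rows, limit)
  processed = []
  loop do
    page = fetch_page(rows, limit, 0)
    break if page.empty?
    page.each do |row|
      row[:stale] = false
      processed << row[:id]
    end
  end
  processed
end

skipped  = process_with_moving_offset(build_rows, 3)  # misses rows 4, 5, 6 and 10
complete = process_from_front(build_rows, 3)          # visits every row
```

Note that reading from offset 0 only terminates if every processed record really does leave the condition; otherwise the same page comes back forever.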

Of course, if the "do_something" is simple enough you can use .update_all (but that's a small subset of all the things that you *could* do).

-Rob

Rob Biedenharn http://agileconsultingllc.com Rob@AgileConsultingLLC.com

If you're using will_paginate (why wouldn't you be using it anyway) you can just call:

    Image.paginated_each(:per_page => 20, :conditions => { :cached => false }, :order => 'created_at asc') do |image|
      # do something with your image here
    end

The paginated_each method will automatically paginate your objects so you won't have to load them all.

> I have noticed that when I use the construct...
>
>     ActiveRecordChild.find(:all).each do |record|
>       record.do_something
>     end
>
> That it still returns the array of all records.

find(:all) returns a normal array, and you are just calling each on it.
What each returns is immaterial: as far as memory concerns go, it's
already too late.

> With larger datasets this can be murder on memory. I was trying to do this in one of my migrations, and my computer choked after the migration had consumed 1.8GB of memory. I ended up having to rewrite the migration as:
>
>     max = ActiveRecordChild.maximum(:id)
>     1.upto(max).each do |i|
>       if record = ActiveRecordChild.find_by_id(i)
>         record.do_something
>       end
>     end

Yuck. Would have been faster to fetch them in chunks.
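As a rough sketch of what fetching in chunks could look like: the helper below walks the table in id order and remembers the last id seen, so gaps in the ids cost nothing. The helper name each_in_chunks and the in-memory table are made up so the example runs without a database. With ActiveRecord, the fetch lambda would presumably be something along the lines of find(:all, :conditions => ["id > ?", last_id], :order => "id", :limit => limit).

```ruby
# Hypothetical helper: yields every record, fetching `limit` at a time.
# `fetch` is any callable taking (last_id, limit) and returning the next
# records with id > last_id, ordered by id.
def each_in_chunks(fetch, limit = 100)
  last_id = 0
  loop do
    chunk = fetch.call(last_id, limit)
    break if chunk.empty?
    chunk.each { |record| yield record }
    last_id = chunk.last[:id]  # the next chunk starts after this id
  end
end

# In-memory stand-in for a table whose ids have gaps.
table   = [1, 2, 5, 8, 9, 13, 21].map { |id| { :id => id } }
queries = []  # records the last_id of each simulated query

fetch = lambda do |last_id, limit|
  queries << last_id
  table.select { |r| r[:id] > last_id }.first(limit)
end

seen = []
each_in_chunks(fetch, 3) { |record| seen << record[:id] }
```

Here the seven records are covered in four simulated queries (the last one comes back empty and ends the loop), where the find_by_id loop would have issued one query for every id from 1 to 21.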

> What I would like to propose is having find accept a block like this.
>
>     ActiveRecordChild.find(:all) do |record|
>       record.do_something
>     end
>
> where, instead of even composing an array of ActiveRecord objects, it would just pass each record to the block as it is fetched, and then forget about it as soon as the block exits.
>
> A possible additional feature might be to collect the results of the block, like a 'collect' call, possibly even omitting a result when the block returns something that evaluates as false (false or nil). Overall, however, for memory considerations I would rather have the block form of find return nothing; if I really want to collect the results, I can always push something onto an array.

Do the database adapters allow you to page through results before
they've received them all?