ActiveRecord.find.each do

I have noticed that when I use the construct...

ActiveRecordChild.find(:all).each do |record|
  record.do_something
end

...it still returns the array of all records. With larger datasets
this can be murder on memory.
I was trying to do this in one of my migrations, and my computer
choked after the migration had consumed 1.8GB of memory.
I ended up having to rewrite the migration as:

max = ActiveRecordChild.maximum(:id)
# one query per id; gaps left by deleted rows come back as nil and are skipped
1.upto(max) do |i|
  if record = ActiveRecordChild.find_by_id(i)
    record.do_something
  end
end

What I would like to propose is having find accept a block like this.

ActiveRecordChild.find(:all) do |record|
  record.do_something
end

where, instead of composing an array of ActiveRecord objects, it
would pass each record to the block as it is fetched, and then
forget about it as soon as the block exits.
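
A rough sketch of what such a block form could look like if it fetched records in batches behind the scenes. FakeRecord, FakeModel, batch_after and each_record are made-up names so the example runs without a database; none of this is an existing ActiveRecord API.

```ruby
# Stand-in for an ActiveRecord model so the sketch runs without a database.
# FakeRecord and FakeModel are hypothetical; only their shape matters.
FakeRecord = Struct.new(:id, :name)

class FakeModel
  ROWS = (1..10).map { |i| FakeRecord.new(i, "row #{i}") }

  # Mimics a query like: WHERE id > after_id ORDER BY id LIMIT limit
  def self.batch_after(after_id, limit)
    ROWS.select { |r| r.id > after_id }.first(limit)
  end

  # Hypothetical block form of find: yields one record at a time and
  # never holds more than one batch. Returns nil on purpose.
  def self.each_record(batch_size = 3)
    last_id = 0
    loop do
      batch = batch_after(last_id, batch_size)
      break if batch.empty?
      batch.each { |record| yield record }
      last_id = batch.last.id
    end
    nil
  end
end

seen = []
result = FakeModel.each_record(3) { |record| seen << record.id }
```

Only one batch is referenced at any time, so earlier records can be garbage collected once the block has finished with them.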

A possible additional feature might be to collect the results of the
block like a 'collect' call, possibly even omitting a result when the
block returns something that evaluates as false (false or nil).
Overall, however, I think for memory considerations I would rather have
the form of find with a block return nothing. If I really want to
collect the results, I can always push something onto an array.

Looping over find_by_id causes a query for each record, and performance will very likely suffer. When I've had to do a similar thing over a table with many (100,000+) records, I've done something like:

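# to_refresh is presumably the condition selecting the rows to process (not shown here)
total = Model.count(to_refresh)
limit = 100   # a fixed batch size; a step of 0 would raise if total were 0
0.step(total - 1, limit) do |offset|
  # a fixed :order keeps the pages stable from one query to the next
  Model.find(:all, :order => 'id', :limit => limit, :offset => offset).each do |model|
    # do stuff
  end
end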

If you have a condition that limits what comes back, you might have to tweak the offset if the "stuff" you do causes records to fall out of the condition.
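
To illustrate the offset problem, here is a small sketch that runs without a database. Row, TABLE and the two page helpers are hypothetical stand-ins for a table and its queries. It compares offset paging with paging by id, which is one possible way around the problem; in Rails of this era the id-based query would be something along the lines of find(:all, :conditions => ['id > ?', last_id], :order => 'id', :limit => n).

```ruby
# Hypothetical in-memory stand-in for a table; not from the thread.
Row = Struct.new(:id, :stale)
TABLE = (1..10).map { |i| Row.new(i, true) }

# Mimics: WHERE stale ORDER BY id LIMIT limit OFFSET offset
def stale_page_by_offset(limit, offset)
  TABLE.select(&:stale).sort_by(&:id)[offset, limit] || []
end

# Mimics: WHERE stale AND id > last_id ORDER BY id LIMIT limit
def stale_page_after(last_id, limit)
  TABLE.select { |r| r.stale && r.id > last_id }.sort_by(&:id).first(limit)
end

# Offset paging: each processed row leaves the condition, so later pages shift.
total = TABLE.count(&:stale)
offset_processed = []
0.step(total - 1, 4) do |offset|
  stale_page_by_offset(4, offset).each do |row|
    row.stale = false            # the "stuff" makes the row fall out of the condition
    offset_processed << row.id
  end
end

# Reset, then page by id; the cursor does not depend on how many rows still match.
TABLE.each { |r| r.stale = true }
keyset_processed = []
last_id = 0
loop do
  page = stale_page_after(last_id, 4)
  break if page.empty?
  page.each do |row|
    row.stale = false
    keyset_processed << row.id
  end
  last_id = page.last.id
end
```

With ten rows and a batch size of four, the offset version processes ids 1-4 and 9-10 and never sees 5-8, while the id-based version reaches all ten.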

Of course, if the "do_something" is simple enough you can use .update_all (but that's a small subset of all the things that you *could* do).

-Rob

Rob Biedenharn http://agileconsultingllc.com
Rob@AgileConsultingLLC.com

If you're using will_paginate (why wouldn't you be using it anyway?),
you can just call:

Image.paginated_each(:per_page => 20,
                     :conditions => { :cached => false },
                     :order => 'created_at asc') do |image|
  # do something with your image here
end

The paginated_each method will automatically paginate your objects so
you won't have to load them all at once.

> I have noticed that when I use the construct...
>
> ActiveRecordChild.find(:all).each do |record|
>   record.do_something
> end
>
> ...it still returns the array of all records.

find(:all) returns a plain array, and each is called on that array. What
each returns is immaterial; as far as memory is concerned it's already
too late by then.

> With larger datasets this can be murder on memory.
> I was trying to do this in one of my migrations, and my computer
> choked after the migration had consumed 1.8GB of memory.
> I ended up having to rewrite the migration as:
>
> max = ActiveRecordChild.maximum(:id)
> 1.upto(max) do |i|
>   if record = ActiveRecordChild.find_by_id(i)
>     record.do_something
>   end
> end

Yuck. It would have been faster to fetch them in chunks.

> What I would like to propose is having find accept a block like this.
>
> ActiveRecordChild.find(:all) do |record|
>   record.do_something
> end
>
> where, instead of composing an array of ActiveRecord objects, it
> would pass each record to the block as it is fetched, and then
> forget about it as soon as the block exits.
>
> A possible additional feature might be to collect the results of the
> block like a 'collect' call, possibly even omitting a result when the
> block returns something that evaluates as false (false or nil).
> Overall, however, I think for memory considerations I would rather have
> the form of find with a block return nothing. If I really want to
> collect the results, I can always push something onto an array.

Do the database adapters allow you to page through results before
they've received them all?