AR, find(:all), loops and memory usage

I'm a beginning programmer using ActiveRecord outside of Rails to do
conditional processing of database records. So far, I've been
successful. However, my script loads all matching records into memory
first. There are hundreds of thousands of matching records, so the
script quickly consumes over 500MB of RAM before any processing is
done. Is there a way to avoid this preloading of row objects in memory?
Below is an example of the kind of thing I'm trying to do (the actual
row-level processing is more complicated than this, but that isn't
relevant to my question):

#example code
require 'rubygems'
require 'mysql'
require 'active_record'

ActiveRecord::Base.establish_connection(
:adapter => "mysql",
:username => "root",
:password => "password",
:database => "my_schema"
)

class MyTable < ActiveRecord::Base
end

# find(:all) builds an object for every matching row before the loop starts
for m in MyTable.find(:all, :conditions => "some_column='criterion to match'")
  m.other_column = "new value"
  m.save
end
#thanks for any tips

Well, for some reason it took a few days for my first message to appear
in Google Groups. In the meantime, I figured out an imperfect but
usable solution. I changed the loop so that it loads only the id
values into memory and then issues a SELECT query for each id, so that
it creates an object with all column values for only one row at a
time. Now my loop looks like this:

# Load only the id column, then fetch each full row one at a time
for m in MyTable.find(:all, :select => "id", :conditions => "some_column='criterion to match'")
  p = MyTable.find(m.id)
  p.other_column = "new value"
  p.save
end

In my particular case, this kept me from creating 500MB+ of objects in
memory; holding only the id values used about 70MB.
Granted, the script has to issue a SELECT query for every id, but in my
case this is acceptable because the loop is on a timer anyway (it only
queries the database every couple of seconds).

If anyone else has a more elegant solution to this problem, please
chime in.

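You'd be better off writing custom SQL for this particular case. A single UPDATE lets the database do all the work without instantiating any Ruby objects:

MyTable.connection.execute("UPDATE my_table SET other_column = 'new value' WHERE some_column = 'criterion to match'")

No subselect is needed, since the WHERE clause can filter the rows directly (MySQL would in fact reject an UPDATE whose subselect reads from the table being updated).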

David Rose

David...thanks for the reply. However, I'm using the google-geocode
gem, so I have a timed loop (the Google Maps API allows 50,000 geocode
requests per day, i.e. one request every 1.728 seconds) that grabs an
address (:conditions => "latitude = '0'") and then geocodes the parcel.
I just didn't want to clutter up my example or distract the discussion
with the particulars. For what it's worth, here is the actual loop:

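# gg is assumed to be a GoogleGeocode object created earlier in the script (not shown)
for m in Property.find(:all, :select => "id", :conditions => "latitude = '0'")
  p = Property.find(m.id)
  address = "#{p.address}, #{p.city.name}, #{p.state} #{p.zip}"
  begin
    location = gg.locate address
  rescue GoogleGeocode::AddressError
    # Google couldn't geocode this address; skip it and move on
  else
    p.latitude = location.latitude
    p.longitude = location.longitude
    p.save
  end
  # Stay under 50,000 requests/day: 86,400 s / 50,000 = 1.728 s per request
  sleep 1.728
end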
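In the loop above, `sleep 1.728` runs after the geocoding request, so each iteration takes 1.728 seconds plus however long the request and the save took. That keeps the script under the daily limit, but it runs a little slower than necessary. A minimal sketch of an alternative, assuming Ruby 2.1 or later for `Process.clock_gettime`: schedule each call to start at least one interval after the previous one started, so time spent doing work counts toward the wait. The names `MIN_INTERVAL` and `throttled_each` are hypothetical.

```ruby
SECONDS_PER_DAY = 86_400
DAILY_LIMIT     = 50_000
# Minimum spacing between request starts: 86,400 / 50,000 = 1.728 seconds
MIN_INTERVAL    = SECONDS_PER_DAY.to_f / DAILY_LIMIT

# Yields each item, starting each call at least `interval` seconds after
# the previous call started. Uses a monotonic clock so changes to the
# system time can't shorten or lengthen the wait.
def throttled_each(items, interval)
  next_start = nil
  items.each do |item|
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    sleep(next_start - now) if next_start && next_start > now
    next_start = Process.clock_gettime(Process::CLOCK_MONOTONIC) + interval
    yield item
  end
end

# Quick demonstration with a short interval instead of 1.728 seconds
starts = []
throttled_each(%w[a b c], 0.05) do |_item|
  starts << Process.clock_gettime(Process::CLOCK_MONOTONIC)
end
```

In the geocoding script, the body of the `for` loop would move into the block, e.g. `throttled_each(ids, MIN_INTERVAL) { |id| ... }`, and the trailing `sleep` would go away.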

Using the 'google-geocode' gem and Active Record to interact with the
database was the cleanest and easiest way for me to accomplish my
specific task. And it works. Again, I'm a beginner programmer so that
has a lot to do with my choices here.
So if there is a smarter way to do this, I'm all ears and eager to
learn.

John-Scott