AR, find(:all), loops and memory usage

I'm a beginner programmer using ActiveRecord outside of Rails to do conditional processing of database records. So far, I've been successful. However, my script loads all matching records into memory first. There are hundreds of thousands of matching records, so the script quickly consumes over 500MB of RAM before any processing is done. Is there a way to avoid preloading all of the row objects into memory? Below is an example of the type of thing I'm trying to do (the actual row-level processing is more complicated than the example, but that isn't relevant to my question):

# example code
require 'rubygems'
require 'mysql'
require 'active_record'

ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :username => "root",
  :password => "password",
  :database => "my_schema"
)

class MyTable < ActiveRecord::Base
end

for m in MyTable.find(:all, :conditions => "some_column='criterion to match'")
  m.other_column = "new value"
  m.save
end
# thanks for any tips

Well, for some reason it took a few days for my first message to appear in Google Groups. In the meantime, I figured out an imperfect but usable solution. I changed the loop so that it only loads the id values into memory and then issues a select query for each id, so that it only creates an object with all column values for one row at a time. Now my loop looks like this:

for m in MyTable.find(:all, :select => "id", :conditions => "some_column='criterion to match'")
  p = MyTable.find(m.id)
  p.other_column = "new value"
  p.save
end

In my particular case, this saved me from creating 500MB+ of objects in memory; loading only the id values uses about 70MB instead. Granted, the script has to issue a SELECT query for every id, but in my case this is acceptable as the loop is on a timer anyway (only querying the database every couple of seconds).
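A possible middle ground between loading everything and issuing one SELECT per id is to fetch the rows in fixed-size batches, using the last id seen as the starting point for the next query. The sketch below only illustrates the control flow: ROWS is an in-memory stand-in for the table, and fetch_batch and each_in_batches are hypothetical helpers, not ActiveRecord methods. With a real table, fetch_batch would be a query along the lines shown in the first comment.

```ruby
# Sketch only. ROWS stands in for the database table, and fetch_batch stands
# in for a query such as:
#   MyTable.find(:all,
#                :conditions => ["some_column = ? AND id > ?", value, last_id],
#                :order => "id", :limit => batch_size)
ROWS = (1..10).map do |i|
  { :id => i, :some_column => (i.even? ? "match" : "other"), :other_column => nil }
end

# Hypothetical helper: up to batch_size matching rows with id > last_id, ordered by id.
def fetch_batch(rows, last_id, batch_size)
  rows.select { |r| r[:some_column] == "match" && r[:id] > last_id }
      .sort_by { |r| r[:id] }
      .first(batch_size)
end

# Yields every matching row while holding at most batch_size rows at a time.
def each_in_batches(rows, batch_size)
  last_id = 0
  loop do
    batch = fetch_batch(rows, last_id, batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last[:id]  # the next query starts after the last row seen
  end
end

each_in_batches(ROWS, 2) do |row|
  row[:other_column] = "new value"
end
```

Because each query asks for ids greater than the last one processed, a row is never visited twice, whether or not the update changes the column being matched on. Later ActiveRecord releases include find_each and find_in_batches, which do this kind of batching for you if your version has them.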

If anyone else has a more elegant solution to this problem, please chime in.

You’d be better off writing custom SQL for this particular case. If your database supports subselects, you could do:

MyTable.connection.execute("UPDATE my_table SET other_column = 'new_value' WHERE id IN (SELECT id FROM my_table WHERE some_column = 'criterion to match')")

David Rose

David...thanks for the reply. However, I'm using the google-geocode gem, so I have a timed loop (50,000 Google Maps API daily geocode limit = one request every 1.728 seconds) that grabs an address (:conditions => "latitude = '0'") and then geocodes the parcel. I just didn't want to clutter up my example or distract the discussion with the particulars. For what it's worth, here is the actual loop:
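For reference, the 1.728-second figure is just the number of seconds in a day divided by the daily quota. A quick check, where the constant and method names are made up for illustration:

```ruby
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
DAILY_LIMIT     = 50_000        # the daily geocode quota quoted above

# Minimum spacing between requests that stays within the daily quota.
def min_interval(daily_limit, seconds_per_day = SECONDS_PER_DAY)
  seconds_per_day.to_f / daily_limit
end

puts min_interval(DAILY_LIMIT)  # prints 1.728
```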

for m in Property.find(:all, :select => "id", :conditions => "latitude = '0'")
  p = Property.find(m.id)
  address = p.address + ", " + p.city.name + ", " + p.state + " " + p.zip
  begin
    location = gg.locate address
  rescue GoogleGeocode::AddressError
  else
    p.latitude = location.latitude
    p.longitude = location.longitude
    p.save
  end
  Time.new
  sleep 1.728
end
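A note on the timing in the loop above: the bare Time.new creates a Time object and discards it, and sleep 1.728 is added on top of however long the lookup and save took, so the loop runs somewhat slower than the limit requires. That is harmless, but if the goal is to stay close to one request per interval, one option is to sleep only for the time remaining. This is a sketch under that assumption; paced_each and its injectable clock and sleeper arguments are hypothetical names, not part of any gem.

```ruby
# Hypothetical helper: runs the block once per item, then sleeps only for
# whatever is left of `interval` after the block's own run time.
# `clock` and `sleeper` can be swapped out so the pacing can be checked
# without actually waiting.
def paced_each(items, interval,
               clock = lambda { Time.now.to_f },
               sleeper = lambda { |seconds| sleep(seconds) })
  items.each do |item|
    started = clock.call
    yield item
    remaining = interval - (clock.call - started)
    sleeper.call(remaining) if remaining > 0  # skip the sleep if the work overran
  end
end

# Usage sketch: the short sleep is a placeholder for the geocode call and save.
paced_each([1, 2, 3], 0.05) do |id|
  sleep 0.01
end
```

In the real loop the interval would be 1.728 and the block would hold the lookup, the rescue, and the save.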

Using the 'google-geocode' gem and Active Record to interact with the database was the cleanest and easiest way for me to accomplish my specific task. And it works. Again, I'm a beginner programmer so that has a lot to do with my choices here. So if there is a smarter way to do this, I'm all ears and eager to learn.

John-Scott