The details of the query are not important; the significant thing (I
think) is that the result set is relatively large (~100,000 rows
consisting of 3 numbers each).
100,000 rows of 3 numbers (actually objects), where each object has at
least 20 bytes of minimum Ruby overhead, would mean 300,000 * 20 =
6,000,000 bytes. Possibly even more, since objects may have instance
variables. Plus the memory allocator has its own overhead too. Also,
each row is an object in itself.
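The back-of-the-envelope math above can be sketched directly (the 20-byte
per-object overhead is the rough minimum mentioned, not an exact figure):

```ruby
# Rough memory estimate for the result set described above.
rows            = 100_000
numbers_per_row = 3
object_overhead = 20   # assumed minimum bytes of Ruby overhead per object

total_bytes = rows * numbers_per_row * object_overhead
puts total_bytes   # => 6000000
```

And that still ignores the row objects themselves and any instance variables.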
Think you got it pretty well pinpointed.
While you could try forcing the GC in an after_filter or, even better,
in a before_filter, the best option would be to create fewer objects at
the Ruby level.
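As a plain-Ruby sketch of the filter idea: trigger a full GC pass right
before the allocation-heavy step, so garbage from earlier requests is
reclaimed first. In Rails the `GC.start` call would simply go in a
before_filter method; the wrapper and the stand-in workload below are
made up for illustration.

```ruby
# Run a full GC pass, then do the memory-hungry work.
# In a Rails controller this would be a before_filter calling GC.start.
def with_gc_before
  GC.start   # full mark-and-sweep pass before the big allocation
  yield
end

rows = with_gc_before do
  # Stand-in for the large result set: 100,000 small arrays.
  Array.new(100_000) { |i| [i, i * 2, i * 3] }
end

puts rows.length   # => 100000
```

Note this only frees already-dead objects; it does nothing about the
objects the query itself is about to allocate, which is why creating
fewer objects remains the better fix.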
We've also done work on large datasets. At first I was manipulating
them with ActiveRecord, but the processing time and memory footprint
were just too much for poor Ruby. Since it's only three values coming
back per row, try making a direct database query and manipulating the
results yourself. Though I haven't done a lot of deep work in the
source code, I do know ActiveRecord does a ton of work to make database
objects easy for you. If you can avoid that work you'll get a nice
performance increase.
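One way to do that in Rails is `ActiveRecord::Base.connection.select_rows`,
which returns each row as a plain Array instead of a full model object.
The sketch below uses a made-up stub connection (and a hypothetical
`samples` table) so the shape of the loop is visible without a live
database:

```ruby
# In a real app this would be:
#   rows = ActiveRecord::Base.connection.select_rows("SELECT x, y, z FROM samples")
# A stub stands in for the connection here, for illustration only.
FakeConnection = Class.new do
  def select_rows(_sql)
    Array.new(5) { |i| [i, i * 2, i * 3] }   # pretend result set
  end
end

rows = FakeConnection.new.select_rows("SELECT x, y, z FROM samples")

# Each row is a bare 3-element Array -- one object per row instead of a
# model object plus its attribute bookkeeping.
sums = rows.map { |x, y, z| x + y + z }
puts sums.inspect   # => [0, 6, 12, 18, 24]
```

You lose validations, associations, and type casting, so this only makes
sense for read-heavy number crunching like the case described here.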
We switched to using Java for the processing we're doing, though I'll
most likely move to Python here shortly so it's easier to modify the
scripts. As a rough benchmark, we were processing 100,000 entries out
of a text file into a single database table. Using Ruby and
ActiveRecord the process took over 48 hours, while in Java using iBatis
it took about three hours. There's a lot of additional processing
going on with each record, such as checking for updates versus creates
and maintaining PKIDs, so ActiveRecord gets a good workout.