Hi Robert,
> tvw wrote in post #1040336:
>> today I went into a problem, when I had to iterate over a big result
>> set. ActiveRecord produces a huge array of model instances, which
>> consumed 1GB of memory on my machine. A comparable perl DBI script
>> only used some KB for iterating over the result set.
> It is well known that ActiveRecord objects are pretty heavy weight. My
> guess is that your Perl script is instead returning you something very
> light weight. Something that is little more than key value pairs for
> each database row.
ActiveRecord objects are not nearly that heavyweight, but it is true
that the Perl script returns just a hash for each row. If I merely got
one ActiveRecord object per row, I would not have looked for a better
solution. But ActiveRecord builds an entire array containing all
records of the query as ActiveRecord objects before it yields the
first object to the loop. If the Perl script built an array of hashes,
which is the lightweight counterpart, it would consume a very
noticeable amount of memory too, though not as much as ActiveRecord.
But in the Perl script there is never more than one hash alive at a
time: the row most recently retrieved from the database.
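To make the difference concrete, here is a minimal sketch in plain Ruby, without ActiveRecord or a real database. The `fake_cursor` helper and both `process_*` methods are made up for illustration; the log only shows in which order rows are fetched and processed.

```ruby
# Hypothetical stand-in for a database cursor. It records each fetch in `log`
# so the order of fetching and processing can be inspected afterwards.
def fake_cursor(row_count, log)
  Enumerator.new do |rows|
    row_count.times do |i|
      log << "fetch #{i + 1}"
      rows << { "id" => i + 1 }
    end
  end
end

# Eager: all rows are fetched into one array before the first one is processed.
def process_eagerly(cursor, log)
  cursor.to_a.each { |row| log << "process #{row['id']}" }
end

# Streaming: each row is processed as soon as it is fetched; no array is built.
def process_streaming(cursor, log)
  cursor.each { |row| log << "process #{row['id']}" }
end

eager_log = []
process_eagerly(fake_cursor(3, eager_log), eager_log)

streaming_log = []
process_streaming(fake_cursor(3, streaming_log), streaming_log)
```

In the eager variant all fetches appear in the log before the first "process" entry, so every row exists in memory at once. In the streaming variant fetches and processing alternate, so a row can be garbage collected as soon as the block is done with it.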
> I'm also assuming that whatever you're doing with this large result set
> probably doesn't require "smart" heavy weight model objects.
Yes, that is true, but dealing with the objects is more comfortable,
since they are smart, and with the solution I found for the problem it
costs me only a few bytes and a little more time.
> In the past I used a framework that had a similar issue. It, however,
> had a built-in mechanism for dealing with the issue. It had something
> called "raw row fetching." Rather than returning true model objects,
> with all the intelligence built into them, you could opt to fetch raw
> rows that were represented by an array of dictionaries (hashes). A raw
> row could then be transformed into a full fledged model object on
> demand.
> I don't know if ActiveRecord provides anything similar to this
> out-of-the-box. But, I'm sure someone must have developed something like
> this for Rails.
The ActiveRecord database adapters provide such a mechanism, and that
is what my solution uses: it retrieves the rows from the database as
hashes and builds an ActiveRecord object from each hash, which it then
yields. This costs just a few bytes of overhead over raw hashes and a
little more time.
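The general shape of that idea can be sketched in plain Ruby as follows. This is not the actual code and it does not use the real ActiveRecord or adapter API: `Person` is a made-up stand-in for a model class, and `each_record` is a hypothetical helper that accepts any source yielding row hashes.

```ruby
# Hypothetical stand-in for a model class; a real ActiveRecord model is richer.
class Person
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  def name
    @attributes["name"]
  end
end

# Takes any source that yields row hashes one at a time, wraps each hash in a
# model object and yields it. No collection of objects is ever built here.
def each_record(row_source, model_class)
  row_source.each do |row|
    yield model_class.new(row)
  end
end

# Usage: the row source here is just an enumerator over two literal hashes.
rows = [{ "id" => 1, "name" => "Ann" }, { "id" => 2, "name" => "Bob" }].each
names = []
each_record(rows, Person) { |person| names << person.name }
```

The loop body gets a full object with methods such as `name`, but only one object per row is created at a time, on top of the hash that the row source already produced.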
The funny thing was that this solution worked as expected against my
sqlite3 demo database, while against my legacy SQL Server database it
still consumed a lot of memory. The reason was that the underlying
TinyTDS database driver, which the sqlserver adapter for ActiveRecord
uses, caches each record it retrieves by default, and this caching
must be turned off.
So when you go through the ActiveRecord adapter, you not only have the
memory consumption that results from AR building an array of objects;
on top of this, you have a cache of all database rows in the driver
too, since the adapter uses the driver's default option, as far as I
can see.
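The effect of such a driver-side row cache can be illustrated with a small simulation. `FakeResult` below is a made-up class, not the TinyTDS API; as far as I know, TinyTDS itself documents a `:cache_rows` option on `result.each` for this purpose, and the keyword here is only modelled on that.

```ruby
# Hypothetical stand-in for a driver result set; this is NOT the TinyTDS API.
class FakeResult
  attr_reader :cached_rows

  def initialize(row_count)
    @row_count = row_count
    @cached_rows = []
  end

  # Yields each row; with cache_rows: true every row is also retained.
  def each(cache_rows: true)
    @row_count.times do |i|
      row = { "id" => i + 1 }
      @cached_rows << row if cache_rows
      yield row
    end
  end
end

# Default behaviour: the result object keeps a reference to every row.
caching = FakeResult.new(1000)
caching.each { |row| row }

# Caching turned off: rows are yielded and then no longer referenced.
streaming = FakeResult.new(1000)
streaming.each(cache_rows: false) { |row| row }
```

With the default, the result object still holds all 1000 rows after the loop, no matter how carefully the calling code streams them. With the cache turned off, nothing is retained by the result object.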
Regards
Thomas