ActiveRecord: ar-each_model

Hi,

today I ran into a problem when I had to iterate over a big result set. ActiveRecord produces a huge array of model instances, which consumed 1 GB of memory on my machine. A comparable Perl DBI script used only a few KB to iterate over the same result set.

Then it was suggested that I use batches (#find_each) for the problem. But batching basically does the same thing: it splits the query into several queries that each return 1000 model instances in an array, which still consumes more memory than necessary, though not as much. The problem: it was much slower on my legacy database (it took 25 minutes, while the version without batches took 90 seconds and the Perl script took only 40 seconds).
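For readers who have not used it, #find_each pages through the table with repeated keyset queries so that only one batch of model instances is alive at a time. The following is a pure-Ruby sketch of that loop, not ActiveRecord's actual implementation; the array of hashes stands in for the table:

```ruby
# Pure-Ruby sketch (NOT ActiveRecord's actual code) of how #find_each pages
# through a table: repeated keyset queries "WHERE id > last_id LIMIT n".
rows = (1..4500).map { |id| { "id" => id } }  # stand-in for the table

def each_in_batches(table, batch_size: 1000)
  last_id = 0
  loop do
    # simulates: SELECT * FROM t WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = table.select { |r| r["id"] > last_id }.first(batch_size)
    break if batch.empty?
    batch.each { |row| yield row }      # only this batch is held in memory
    last_id = batch.last["id"]
  end
end

count = 0
each_in_batches(rows) { count += 1 }
# all 4500 rows arrive, fetched in five trips of at most 1000 rows
```

This also shows where the slowdown comes from: every batch is a fresh query against the database, so the per-query cost is paid once per batch instead of once.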

So I searched for a method that yields one model instance at a time from the result set of the query, but I could not find anything.

In the end, I tried to find a solution myself and came up with a simple one, which I want to put up for discussion here, since I wonder why something similar does not already exist in ActiveRecord.

Regards Thomas

tvw wrote in post #1040336:

> today I ran into a problem when I had to iterate over a big result set. ActiveRecord produces a huge array of model instances, which consumed 1 GB of memory on my machine. A comparable Perl DBI script used only a few KB to iterate over the same result set.

It is well known that ActiveRecord objects are pretty heavyweight. My guess is that your Perl script is instead returning you something very lightweight, little more than key-value pairs for each database row.

I'm also assuming that whatever you're doing with this large result set probably doesn't require "smart" heavyweight model objects.

In the past I used a framework that had a similar issue. It, however, had a built-in mechanism for dealing with it, called "raw row fetching." Rather than returning true model objects, with all the intelligence built into them, you could opt to fetch raw rows represented as an array of dictionaries (hashes). A raw row could then be transformed into a full-fledged model object on demand.

I don't know if ActiveRecord provides anything similar to this out-of-the-box. But, I'm sure someone must have developed something like this for Rails.

Hi,

> today I ran into a problem when I had to iterate over a big result set. ActiveRecord produces a huge array of model instances, which consumed 1 GB of memory on my machine. A comparable Perl DBI script used only a few KB to iterate over the same result set.

How many records was this? 1gb is pretty crazy.

> Then it was suggested that I use batches (#find_each) for the problem. But batching basically does the same thing: it splits the query into several queries that each return 1000 model instances in an array, which still consumes more memory than necessary, though not as much. The problem: it was much slower on my legacy database (25 minutes, while the version without batches took 90 seconds and the Perl script took only 40 seconds).

This sounds like your legacy DB is somehow not indexing the ID column - after all, instantiating all those objects only takes 90 seconds.

--Matt Jones

Hi Robert,

tvw wrote in post #1040336:

> today I ran into a problem when I had to iterate over a big result set. ActiveRecord produces a huge array of model instances, which consumed 1 GB of memory on my machine. A comparable Perl DBI script used only a few KB to iterate over the same result set.

> It is well known that ActiveRecord objects are pretty heavyweight. My guess is that your Perl script is instead returning you something very lightweight, little more than key-value pairs for each database row.

ActiveRecord objects are not that heavyweight, but it is true that the Perl script returns just a hash for each row. If I got one ActiveRecord object at a time for each row, I would not have tried to find a better solution. But ActiveRecord builds an entire collection containing all records of the query as ActiveRecord objects before it returns the first object to the loop. If the Perl script built an array of hashes, the lightweight counterpart, it would also consume a massive and very noticeable amount of memory, though not as much as ActiveRecord. But in the Perl script there never exists more than one hash at a time: the last row retrieved from the database.

> I'm also assuming that whatever you're doing with this large result set probably doesn't require "smart" heavyweight model objects.

Yes, that is true, but dealing with the model objects is more comfortable, since they are smart, and with the solution I found it costs me only a few bytes and a little more time.

> In the past I used a framework that had a similar issue. It, however, had a built-in mechanism for dealing with it, called "raw row fetching." Rather than returning true model objects, with all the intelligence built into them, you could opt to fetch raw rows represented as an array of dictionaries (hashes). A raw row could then be transformed into a full-fledged model object on demand.

> I don't know if ActiveRecord provides anything similar to this out-of-the-box. But, I'm sure someone must have developed something like this for Rails.

The ActiveRecord database adapters provide such a mechanism, and that is what my solution uses: it retrieves the rows from the database as hashes and builds an ActiveRecord object from each, which it then yields. This costs only a few bytes of overhead over raw hashes and a little more time.

The funny thing was that with this solution everything worked as expected against my SQLite3 demo database, while against my legacy SQL Server database it still consumed a lot of memory. The reason was that the underlying TinyTDS database driver, which the sqlserver adapter for ActiveRecord uses, caches each record it retrieves by default; this caching must be turned off.

So when you use the sqlserver adapter, you not only have the memory consumption that results from AR building a collection of objects, but on top of this you have a cache of all database rows in the driver too, since the adapter uses the default option, as far as I can see.
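If I read the tiny_tds API correctly, that row cache can be disabled per result set with the :cache_rows option when iterating. A sketch, with placeholder connection parameters (a real SQL Server would be required to run it):

```ruby
require "tiny_tds"

# placeholder credentials; adjust for a real SQL Server
client = TinyTds::Client.new(username: "user", password: "secret",
                             host: "legacy-db", database: "logs")
result = client.execute("SELECT * FROM events")

# cache_rows: false stops the driver from retaining every fetched row
result.each(cache_rows: false) do |row|
  # process one row hash at a time
end
```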

Regards Thomas

> Hi,

> today I ran into a problem when I had to iterate over a big result set. ActiveRecord produces a huge array of model instances, which consumed 1 GB of memory on my machine. A comparable Perl DBI script used only a few KB to iterate over the same result set.

> How many records was this? 1gb is pretty crazy.

400,000 records.

> Then it was suggested that I use batches (#find_each) for the problem. But batching basically does the same thing: it splits the query into several queries that each return 1000 model instances in an array, which still consumes more memory than necessary, though not as much. The problem: it was much slower on my legacy database (25 minutes, while the version without batches took 90 seconds and the Perl script took only 40 seconds).

> This sounds like your legacy DB is somehow not indexing the ID column - after all, instantiating all those objects only takes 90 seconds.

Matt, the ID column is indexed, but probably not all of the fields the query uses, since the table is used for logging events and must be fast for writing rather than reading. The 400,000 records I retrieve are scattered among millions of records; they lie close to each other, but not next to each other. And the database is under heavy production load while I read the data.

So some share of the 40 seconds the Perl script runs is the cost of the query itself. If you now issue 400 batched queries rather than 1 query to retrieve all records, and each query costs you only 1 second, you have already spent 400 seconds, which is about 7 minutes, without retrieving and processing a single row.
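The overhead estimate above works out directly; the 1-second per-query cost is the assumption from the paragraph, not a measurement:

```ruby
records    = 400_000
batch_size = 1_000
queries    = records / batch_size   # 400 separate round trips to the database
per_query  = 1.0                    # assumed fixed cost per query, in seconds
overhead   = queries * per_query    # 400 seconds of pure query latency,
                                    # roughly 7 minutes before any row is processed
```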

Regards Thomas