RE: [Rails] how to insert a batch of records into db more efficiently

I am currently using the standard AR method `create` to insert records into an Oracle DB. It seems that `create` generates an INSERT SQL statement for each record and executes them separately, which feels a bit slow. (For a record with about 50 columns the insert speed is 25.3 records per second; for a record with 4 columns it is 115.25 records per second.)

Since all the data is already stored in an array/hash (mirroring the structure in the DB) before inserting, I wonder if there is a more efficient way to write it all into the DB (e.g. a table-level command or a one-off command to insert or update a table).

You could either use the AR connection directly to write raw SQL, or you could try the ActiveRecord Extensions module. I'm not sure it supports true batch insertion, but it does obviate the need to construct a model object for each new row, which is where most of the cost of mass imports seems to be. My mass imports went from hours to minutes once I switched to it.
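For the raw-SQL route, here is a minimal sketch. The table and column names are made up for illustration, and the quoting is deliberately naive (real code should quote each value through the connection, e.g. `connection.quote`):

```ruby
# Hedged sketch: build one multi-row INSERT and send it through the AR
# connection. Table/columns are hypothetical; quoting here is naive --
# production code must escape values properly.
def multi_row_insert_sql(table, columns, rows)
  values = rows.map { |row|
    "(" + row.map { |v| "'#{v}'" }.join(", ") + ")"
  }.join(", ")
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values}"
end

sql = multi_row_insert_sql("books", %w[title author],
                           [["Pickaxe", "Dave Thomas"],
                            ["Agile Web Development", "Dave Thomas"]])
# ActiveRecord::Base.connection.execute(sql)
```

One caveat for the original poster: classic Oracle does not accept this MySQL-style multi-row VALUES form, so on Oracle you would build an `INSERT ALL ... SELECT 1 FROM dual` statement instead.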

- donald

Yes, ar-extensions does do a true batch insert. It even supports MySQL's ON DUPLICATE KEY UPDATE syntax, and it also checks the size of the query and breaks it into multiple queries as necessary (i.e. if you are inserting more than MAX_PACKET_SIZE). It does a lot of other cool stuff, too. The hash-based condition extensions via suffix are my favorite:

http://www.continuousthinking.com/tags/arext
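As a rough sketch of what that import looks like (the `Book` model, the column list, and the `:on_duplicate_key_update` option name are my assumptions from memory of the ar-extensions API, so double-check against its docs):

```ruby
# Hedged sketch of ar-extensions' batch import. The model and columns
# are hypothetical; :on_duplicate_key_update is the option I recall for
# the MySQL upsert behavior mentioned above.
columns = [:title, :author]
values  = [["Pickaxe",               "Dave Thomas"],
           ["Agile Web Development", "Dave Thomas"]]
# One multi-row INSERT instead of one INSERT per record:
# Book.import(columns, values,
#             :on_duplicate_key_update => [:author])
```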

I answered this in a private email, but I'm posting here in case someone searches the archives and hits this problem. The issue was that import was being called on the controller and not on the model. It must be called on the model,

  ie: Book.import

See http://www.continuousthinking.com/tags/arext for more information

Zach

Can you send me the import code you had written that produced this error? Email me privately, thanks,

Zach

My friend Jonah wrote Crewait, which is easier and faster than ARExtensions for bulk insertion.

http://www.jonah.org/articles/crewait_go_.html

I have been facing this problem for a while now. ar-extensions is really good, but it doesn't preserve the links between models (one-to-one, many-to-many, etc.).

I am trying to insert ~1M records daily. ActiveRecord hasn't a hope of doing this in a reasonable time.

I am hoping that I am using the plugin in the wrong way. Anyone else have similar problems?

Cheers, Patrick

Have you tried doing batches via transactions? By default, ActiveRecord makes every save operation its own transaction, which adds overhead to the DB in many setups. See ActiveRecord::Transactions::ClassMethods: put ten or a hundred records at a time in a transaction wrapper and see what you gain. The improvement depends on the DB engine in use, as well as on how much the DB is actually the bottleneck versus the processing you're doing application-side per record. I suspect your problem is a combination of both DB overhead and unoptimized application logic. You might want to profile your application to see where the bottleneck really is; at the scale of 1M operations per day it's worth doing some of that.
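A minimal sketch of that batching (the model-style records and the use of `save!` are assumptions, not Patrick's actual code):

```ruby
# Hedged sketch: commit once per batch of 100 instead of once per row.
# An unhandled exception rolls back only the current batch.
BATCH_SIZE = 100

def save_in_batches(records)
  records.each_slice(BATCH_SIZE) do |batch|
    ActiveRecord::Base.transaction do
      batch.each(&:save!)  # save! raises on failure, aborting the batch
    end
  end
end
```

Tune BATCH_SIZE to your setup; too large a batch can hold locks longer and grow rollback segments, so somewhere in the tens-to-hundreds is a common starting point.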

That’s my 2 cents; the disclaimer is that I am by no means a Rails expert, just trying to help out where I can.

jsw