I am currently using the standard AR method create to insert
records into an Oracle db. It seems that create generates an
INSERT SQL statement for each record and executes them
separately, which felt a bit slow: for a record with about 50
columns the insert speed was 25.3 records per second, and for a
record with 4 columns it was 115.25 records per second.
Since all the data is already stored in an array/hash (mirroring
the structure in the db) before inserting, I wonder if there is a
more efficient way to write it all into the db, e.g. a table-level
command or a one-off command to insert or update a whole table.
You could either use the AR connection directly to write raw SQL, or you
could try using the ActiveRecord Extensions module. I don't know that it
supports true batch insertion, but it does obviate the need to construct
a model object for each new row, which is where most of the cost of mass
imports seems to be. My mass imports went from hours to minutes once I
switched to it.
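For the raw-SQL route, a rough sketch (the products table and its
columns are made up for illustration; note that the multi-row VALUES
syntax below is MySQL's, and on Oracle you would use INSERT ALL instead):

    conn = ActiveRecord::Base.connection
    rows = [{ :name => 'foo', :price => 1.5 }, { :name => 'bar', :price => 2.5 }]
    # Build one multi-row INSERT instead of one statement per record.
    values = rows.map { |r| "(#{conn.quote(r[:name])}, #{conn.quote(r[:price])})" }
    conn.execute("INSERT INTO products (name, price) VALUES #{values.join(', ')}")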
Yes, ar::extensions does do a true batch insert. It even supports
MySQL's ON DUPLICATE KEY UPDATE syntax, and it also verifies the size
of the query and breaks it down into individual queries as necessary
(i.e. if you are inserting more than MAX_PACKET_SIZE). It also
does a lot of other cool stuff. The hash-based condition extensions
via suffix are my favorite:
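(Reconstructed from memory of the ar-extensions finder API; Book and
the exact suffix names are illustrative, so check the plugin's docs.)

    # Suffixes on the condition keys map to SQL operators,
    # e.g. _gt => '>' and _like => 'LIKE'.
    Book.find(:all, :conditions => { :copies_sold_gt => 1000,
                                     :title_like     => '%rails%' })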
I answered this in a private email, but in case someone searches the
archives and has this problem: the issue was that import was being
called on the controller and not on the model. It must be called on
the model.
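For reference, a minimal sketch of the correct call, assuming a Book
model with ar-extensions loaded (the columns and values are made up):

    columns = [:title, :author]
    values  = [['Agile Web Development', 'Thomas'], ['Programming Ruby', 'Thomas']]
    Book.import(columns, values)    # correct: import is a class method on the model
    # import(columns, values)       # wrong: a bare call inside a controller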
I have been facing this problem for a while now. ar-extensions is
really good, but it doesn't preserve the links between models
(one-to-one, many-to-many, etc.).
I am trying to insert ~1M records daily, and ActiveRecord hasn't a
hope of doing that in a reasonable time.
I am hoping that I am simply using the plugin in the wrong way. Has
anyone else had similar problems?
Have you tried doing batches via transactions? By default, ActiveRecord makes every save operation its own transaction, which adds overhead to the DB in many setups. See ActiveRecord::Transactions::ClassMethods, then put ten or a hundred records at a time in a transaction wrapper and see what you gain.

The improvement you see from this depends on the DB engine in use, as well as on how much the DB is actually the bottleneck versus the processing you are doing application-side per record. I suspect your problem is a combination of both DB overhead and unoptimized application logic. You might want to profile your application to see where the bottleneck really is; at the scale of 1M operations per day it's worth doing.
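A minimal sketch of that suggestion, assuming a Record model and an
arbitrary batch size of 100 (tune both to your setup):

    require 'enumerator'  # each_slice on older Rubies
    rows = [{ :name => 'a' }, { :name => 'b' }]  # placeholder attribute hashes
    rows.each_slice(100) do |slice|
      # One transaction per slice instead of one per row.
      Record.transaction do
        slice.each { |attrs| Record.create!(attrs) }
      end
    end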
That’s my 2 cents; the disclaimer is that I am by no means a Rails expert. Just trying to help out where I can.