5 minutes to save 25,000 model objects?

Hi guys. On my laptop, I'm finding that it's taking about 5 minutes to save 25,000 model instances.

The model has 5 validations, and I'm saving all of the model instances in a single transaction: http://pastie.org/509281

25,000 model instances is a lot, but 5 minutes seems too long. Any suggestions for how to speed this up?

Thanks, Nick

Hi Ashwin. I'm finding that the uniqueness constraint doesn't add much additional processing time.

You're right that the validations are quite standard. However, you can't rely on input data; it must be validated server-side.

Thanks, Nick

Hi Marnen. I'm surprised I haven't come across ar-extensions yet. It looks like the exact solution I need to this problem. Thanks for recommending it! -Nick

> Try commenting out the uniqueness constraint and check the result. > To be fair, the validations listed are fairly standard and can be > done before the keyword object is created ( client-side using > something like ajax + jquery ). > > Hope this helps. > > -Ashwin

Hi Ashwin. I'm finding that the uniqueness constraint doesn't add much additional processing time.

It's a bogus constraint, however, because...

You're right that the validations are quite standard. However, you can't rely on input data; it must be validated server-side.

it checks uniqueness in the wrong place. validates_uniqueness_of does not preclude an interleaving of operations of operations from two processes that looks like this

          A B   create(:keyword => 'foo')   validate uniqueness                                create(:keyword => 'foo')                                validate uniqueness                                save   save

To be safe, you must have a constraint in the database. If you have that, the additional app-level validation just wastes performance.

  add_index :keywords, :keyword, :unique => true

Michael

Are you sure about that?: http://pastie.org/509569

I had a similar situation where I had to import groups of around 2,500 objects at a time based upon the files found within a directory, back before ar-extensions was around, and by outputing a text file and then telling mysql to use that as an infile, the time dropped from a total of 45 seconds on average to about 2 seconds, for the whole action. I expected some improvment, but I had no idea how much faster it would really be until I did it.

Carl

Quoting Nick Hoffman <nick@deadorange.com>:

> > > > Try commenting out the uniqueness constraint and check the result. > > > To be fair, the validations listed are fairly standard and can be > > > done before the keyword object is created ( client-side using > > > something like ajax + jquery ). > > > > Hope this helps. > > > > -Ashwin > > > Hi Ashwin. I'm finding that the uniqueness constraint doesn't add > > much additional processing time. > > It's a bogus constraint, however, because... > > > You're right that the validations are quite standard. However, you > > can't rely on input data; it must be validated server-side. > > it checks uniqueness in the wrong place. validates_uniqueness_of does > not preclude an interleaving of operations of operations from two > processes that looks like this > > A B > create(:keyword => 'foo') > validate uniqueness > create(:keyword => 'foo') > validate uniqueness > save > save > > To be safe, you must have a constraint in the database. If you have > that, the additional app-level validation just wastes performance. > > add_index :keywords, :keyword, :unique => true > > Michael > > -- > Michael Schuerig > mailto:mich...@schuerig.dehttp://www.schuerig.de/michael/

Are you sure about that?: http://pastie.org/509569

Your example is two saves in the same thread/process. The counter-example REQUIRES two concurrent processes/threads. And yes his counter-example is correct. Unless you can guarantee that your application will NEVER run multi-process/multi-threaded, e.g., Webrick. If your application will ever be run by multiple people at once, it will almost certainly require this. Such is the price of success.

Also pushing the uniqueness validation into the database will cut your run time. The uniqueness validation is implemented by doing a lookup on the unique field before doing the insert. Letting the DB do it cuts the number of database calls in half. Doing the inserts in batches will also increase the speed significantly. Yeah, the code isn't as pretty but if it is already clear you need the speed, do it. It is doubtful that it will be the ugliest bit of code in your app.

HTH,   Jeffrey

Try to use bulk insert for save all of your instances. http://www.igvita.com/2007/07/11/efficient-updates-data-import-in-rails/

Try to use bulk insert for save all of your instances. Efficient Updates & Data Import in Rails - igvita.com

Hi guys. On my laptop, I'm finding that it's taking about 5 minutes to save 25,000 model instances.

The model has 5 validations, and I'm saving all of the model instances in a single transaction:http://pastie.org/509281

25,000 model instances is a lot, but 5 minutes seems too long. Any suggestions for how to speed this up?

You might also look at your database to see if it can handle a transaction that large. Depending on how it does it may need to store quite a lot in memory...