5 minutes to save 25,000 model objects?

Hi guys. On my laptop, I'm finding that it's taking about 5 minutes to
save 25,000 model instances.

The model has 5 validations, and I'm saving all of the model instances
in a single transaction:
http://pastie.org/509281

25,000 model instances is a lot, but 5 minutes seems too long. Any
suggestions for how to speed this up?

Thanks,
Nick

Hi Ashwin. I'm finding that the uniqueness constraint doesn't add much
additional processing time.

You're right that the validations are quite standard. However, you
can't rely on input data; it must be validated server-side.

Thanks,
Nick

Hi Marnen. I'm surprised I haven't come across ar-extensions yet. It
looks like the exact solution I need to this problem. Thanks for
recommending it!
-Nick

> Try commenting out the uniqueness constraint and check the result.
> To be fair, the validations listed are fairly standard and can be
> done before the keyword object is created ( client-side using
> something like ajax + jquery ).
>
> Hope this helps.
>
> -Ashwin

Hi Ashwin. I'm finding that the uniqueness constraint doesn't add
much additional processing time.

It's a bogus constraint, however, because...

You're right that the validations are quite standard. However, you
can't rely on input data; it must be validated server-side.

it checks uniqueness in the wrong place. validates_uniqueness_of does
not preclude an interleaving of operations of operations from two
processes that looks like this

          A B
  create(:keyword => 'foo')
  validate uniqueness
                               create(:keyword => 'foo')
                               validate uniqueness
                               save
  save

To be safe, you must have a constraint in the database. If you have
that, the additional app-level validation just wastes performance.

  add_index :keywords, :keyword, :unique => true

Michael

Are you sure about that?:
http://pastie.org/509569

I had a similar situation where I had to import groups of around 2,500
objects at a time based upon the files found within a directory, back
before ar-extensions was around, and by outputing a text file and then
telling mysql to use that as an infile, the time dropped from a total
of 45 seconds on average to about 2 seconds, for the whole action. I
expected some improvment, but I had no idea how much faster it would
really be until I did it.

Carl

Quoting Nick Hoffman <nick@deadorange.com>:

>
> > > Try commenting out the uniqueness constraint and check the result.
> > > To be fair, the validations listed are fairly standard and can be
> > > done before the keyword object is created ( client-side using
> > > something like ajax + jquery ).
>
> > > Hope this helps.
>
> > > -Ashwin
>
> > Hi Ashwin. I'm finding that the uniqueness constraint doesn't add
> > much additional processing time.
>
> It's a bogus constraint, however, because…
>
> > You're right that the validations are quite standard. However, you
> > can't rely on input data; it must be validated server-side.
>
> it checks uniqueness in the wrong place. validates_uniqueness_of does
> not preclude an interleaving of operations of operations from two
> processes that looks like this
>
> A B
> create(:keyword => 'foo')
> validate uniqueness
> create(:keyword => 'foo')
> validate uniqueness
> save
> save
>
> To be safe, you must have a constraint in the database. If you have
> that, the additional app-level validation just wastes performance.
>
> add_index :keywords, :keyword, :unique => true
>
> Michael
>
> –
> Michael Schuerig
> mailto:mich...@schuerig.dehttp://www.schuerig.de/michael/

Are you sure about that?:
http://pastie.org/509569

Your example is two saves in the same thread/process. The counter-example
REQUIRES two concurrent processes/threads. And yes his counter-example is
correct. Unless you can guarantee that your application will NEVER run
multi-process/multi-threaded, e.g., Webrick. If your application will ever be
run by multiple people at once, it will almost certainly require this. Such
is the price of success.

Also pushing the uniqueness validation into the database will cut your run
time. The uniqueness validation is implemented by doing a lookup on the
unique field before doing the insert. Letting the DB do it cuts the number of
database calls in half. Doing the inserts in batches will also increase the
speed significantly. Yeah, the code isn't as pretty but if it is already
clear you need the speed, do it. It is doubtful that it will be the ugliest
bit of code in your app.

HTH,
  Jeffrey

Try to use bulk insert for save all of your instances.
http://www.igvita.com/2007/07/11/efficient-updates-data-import-in-rails/

Try to use bulk insert for save all of your instances.
http://www.igvita.com/2007/07/11/efficient-updates-data-import-in-rails/

Hi guys. On my laptop, I'm finding that it's taking about 5 minutes to
save 25,000 model instances.

The model has 5 validations, and I'm saving all of the model instances
in a single transaction:http://pastie.org/509281

25,000 model instances is a lot, but 5 minutes seems too long. Any
suggestions for how to speed this up?

You might also look at your database to see if it can handle a transaction that large. Depending on how it does it may need to store quite a lot in memory...