validation, concurrency, and transactions

Hi,

  Something has been bothering me about model validation: how do I know that
the database has not changed in between when I validate my model's data, and
when it actually gets saved to the database? As a simple example, say I have
a User model like:

class User
  validates_uniqueness_of :username
end

When this gets saved to the database, the validation code checks that there is
not already a row containing the same value for username as in our object.
The object is then saved to the database and everyone is happy. But what if
there is another user trying to register with the same username at the same
time? I am assuming that with load balancing situations is it possible that
two rails processes could try to save a User object at the same time. So
what happens if the User object in each process checks that its username is
unique, and then each User object does an insert into the database. Is the
uniqueness constraint not violated?

One option would be to have a database-level uniqueness constraint, but I've
heard that this is not the "rails way", and in any case, this is just a
simple example. In general the validation code could be arbitrarily complex.

The other option that I can see is to use a transaction around the save
operation, and define an after_save callback that re-validates the data in
the database. If (for the username example above) there is more than one
identical username, the after_save callback could throw an exception which
(as I understand it) would back out the transaction. Although if this is
done, perhaps it is not worth validating the model before saving it in the
first place, since we would catch any errors in after_save.

Is this the way that this problem is usually handled? Or am I seeing a
problem where one does not really exist?

Thanks,

Michael McGreevy.

Something has been bothering me about model validation: how do I know that
the database has not changed in between when I validate my model's data, and
when it actually gets saved to the database? As a simple example, say I have
a User model like:

class User
  validates_uniqueness_of :username
end

When this gets saved to the database, the validation code checks that there is
not already a row containing the same value for username as in our object.
The object is then saved to the database and everyone is happy. But what if
there is another user trying to register with the same username at the same
time? I am assuming that with load balancing situations is it possible that
two rails processes could try to save a User object at the same time. So
what happens if the User object in each process checks that its username is
unique, and then each User object does an insert into the database. Is the
uniqueness constraint not violated?

One option would be to have a database-level uniqueness constraint, but I've
heard that this is not the "rails way", and in any case, this is just a
simple example. In general the validation code could be arbitrarily complex.

The other option that I can see is to use a transaction around the save
operation, and define an after_save callback that re-validates the data in
the database. If (for the username example above) there is more than one
identical username, the after_save callback could throw an exception which
(as I understand it) would back out the transaction. Although if this is
done, perhaps it is not worth validating the model before saving it in the
first place, since we would catch any errors in after_save.

Is this the way that this problem is usually handled? Or am I seeing a
problem where one does not really exist?

I think I'd put a unique constraint in the database and be done with it. Treat the problem at the source so to speak...

Rails shouldn't care if it's there... it will fail on save and you can deal with it at that point...

It's kind of like javascript form validation... it's nice, but you also need to do it on the server if you care about it at all...

2 cents.

-philip

This is a problem with an validation which fetches data before the save. Without a lock or serializing transactions, the validation is not a guarantee. Use a database constraint as a fallback.

jeremy

> Something has been bothering me about model validation: how do I know
> that the database has not changed in between when I validate my model's
> data, and when it actually gets saved to the database?

...

> One option would be to have a database-level uniqueness constraint, but
> I've heard that this is not the "rails way", and in any case, this is
> just a simple example. In general the validation code could be
> arbitrarily complex.

I think I'd put a unique constraint in the database and be done with it.
Treat the problem at the source so to speak...

Rails shouldn't care if it's there... it will fail on save and you can
deal with it at that point...

That's true, but as I mentioned, I have other cases where I need to do more
complex validations, which would require the use of stored procedures (I
think -- I've never written one) if I wanted to validate at the database
level, and I think this would be highly undesirable.

I suppose the problem is that there are two types of validation, which I'll
call "local" and "global". Local validation doesn't care about any data
which is already stored in the database (e.g. validates_length_of,
validates_presence_of), and can safely be handled in isolation by validating
before saving. Global validation is concerned with how the data in a model
fits in with the existing data in the database (e.g. a custom validation that
ensures that person is not an ancestor of himself) and cannot be handled in
isolation. All the examples of Rails code that I have seen, however, treat
"global" validation problems as "local" ones, at least to the extent that
they don't check the integrity of the database after the update. Admittedly
I haven't seen that much Rails code. So if anyone has a "best practice"
counter example, I'd love to see it.

Michael McGreevy.

If more databases had coherent, parseable error messages, we could catch the database exception and populate record.errors. Otherwise we could establish a naming convention for database constraints so we can deduce which validation failed. I tried these with PostgreSQL but was dissatisfied with its error reporting and naming conventions are a pain without migrations support to do it for you.

jeremy