ar validations cannot work unless transaction isolation level is serialzable

given that this happens to be the case

   http://drawohara.tumblr.com/post/18926188

it seems that *any* validation should alter the behavior of Base#create to force the isolation level - otherwise validations end up being nothing be a stack of race conditions. sorry if i'm late to the party but this came as a real surprise to me. at the very least it seems the docs should have a big fat warning about this or the code itself should warn when using validations and the client code is not under a strictly serializable transaction.

kind regards

a @ http://codeforpeople.com/

given that this happens to be the case

   http://drawohara.tumblr.com/post/18926188

it seems that *any* validation should alter the behavior of
Base#create to force the isolation level - otherwise validations end
up being nothing be a stack of race conditions. sorry if i'm late to
the party but this came as a real surprise to me. at the very least
it seems the docs should have a big fat warning about this or the
code itself should warn when using validations and the client code is
not under a strictly serializable transaction.

I'm not sure that this applies to any validations other than
validates_uniqueness_of?

But yes, you need a unique index or serializable transactions if you
want this guaranteed. If there's a database-agnostic way to implement
the transaction stuff that sounds like a good idea, otherwise we could
at least warn in the documentation.

I'm not sure that this applies to any validations other than
validates_uniqueness_of?

   validates_associated
   validates_each
   validates_length_of # esp if a text value is being appended to

but you are right in pointing out that as the main one.

But yes, you need a unique index or serializable transactions if you
want this guaranteed. If there's a database-agnostic way to implement
the transaction stuff that sounds like a good idea, otherwise we could
at least warn in the documentation.

i'd vote for sticking

   def transaction options = {}
   end

as a start, into the base connection adapter, individual adapters can then make appropriate use of the :isolation key which #create could be refactored to use iff validation is present on the record's model class being saved. overall the ar code doesn't seem to have put a ton of concern into transactions and passing options through would at least let the connection adapters have the option (no pun intended) :wink:

thoughts?

btw - we had this happen three times today. the culprit was submit button that is not disabled after the first click and people were double clicking it out of habbit - thus two requests come in quite close together. js can solve it, but i'm not comfortable maintaining db integrity via js :wink:

a @ http://codeforpeople.com/

   validates_associated
   validates_each

These two just aggregate other validations

   validates_length_of # esp if a text value is being appended to

save always provides the full value of every column, there's no way
you can be 'appending' and using active record's save.

i'd vote for sticking

   def transaction options = {}
   end

thoughts?

This definitely sounds like a long term option to consider, assuming
the terminology and api can be unified across all the databases etc.

btw - we had this happen three times today. the culprit was submit
button that is not disabled after the first click and people were
double clicking it out of habbit - thus two requests come in quite
close together. js can solve it, but i'm not comfortable maintaining
db integrity via js :wink:

Stick a unique index on the field and you'll be fine.

This definitely sounds like a long term option to consider, assuming
the terminology and api can be unified across all the databases etc.

i've already pulled out a ton of transaction code that supports nested transactions too - i'll see if i can work this in

Stick a unique index on the field and you'll be fine.

that's fine - but then validates_uniqness_of is not needed - right now it's just there to provide users access to an easy a race condition alongside find_or_create_by. although the answer is simple, it simply raises questions, for me at least, about whether those methods should even exist. my experience is that, even though they should be, most web developers are not versed in acid. pun intended.

cheers.

a @ http://codeforpeople.com/

> Stick a unique index on the field and you'll be fine.

that's fine - but then validates_uniqness_of is not needed - right
now it's just there to provide users access to an easy a race
condition alongside find_or_create_by. although the answer is
simple, it simply raises questions, for me at least, about whether
those methods should even exist. my experience is that, even though
they should be, most web developers are not versed in acid. pun
intended.

No, you still need the validation otherwise instead of a "name has
already been taken" validation error you get an internal server error.

validates_uniqueness_of gives a nice error message, and does an ok job
at guaranteeing uniqueness. Validates uniqueness of + a unique index
does both.

yeah i understand that - i just don't get warm and fuzzies using 'ok' and 'guarantee' in this context since you still end up at internal server errors some of the time. it's not required to throw and internal error - the db conn error can be rescued (UniqueError etc) and dealt with. the api for dealing with it can be 'validates_uniqness_of'. i've done this in some old pg specific code and i realize this is just a bunch of noise without some code - i'll see if i can dig it up...

cheers.

a @ http://codeforpeople.com/

interesing comment from boulder/denver ruby group:

"I think the problem stems largely from the ActiveRecord connection adapter
API. It provides no mechanism for acquiring or releasing table level locks.

If you could get a table level lock, then checking the uniqueness of a value
before performing an insert/update could be an atomic, as well as performing
a find_or_create_by type operation"

a @ http://codeforpeople.com/

Table Level locking is kinda nasty, and a great way to kill your
throughput. If you're already creating so many 'foos' concurrently
that you're hitting this race condition, chances are you don't want to
continually lock and unlock the table. A unique index solves all of
the problems but leaves it up to the database to handle whatever
locking is required. I'm a little confused why we're looking for
another solution.

Hey Ara, a one-two punch of validates_uniqueness_of plus a unique
index on those fields gives you nice error messages in the common case
and a rare database exception otherwise. This should definitely be in
the docs.

Now, if we could parse the db constraint violation error message to
deduce the type of constraint that failed and the affected fields,
then we could turn the dumb exception (StatementInvalid) into a smart
one (UniquenessConstraintViolation), rescue it in
Base#save_with_validations, and add the message to record.errors.
Unfortunately I tried it with postgres a while back and the db errors
left too much up to the imagination to get much further than a neat
demo.

Best,
jeremy

I discovered the same problem a year and a half ago. I asked around on
mailing lists and IRC and the answers ranged from "what's the problem"
to "your design is stupid" - nobody really had a good answer. Table
locks are expensive, and nesting table locks is hard to impossible
because there's only one unlock phase. And last time I tried
serializable transaction level on MySQL to solve this problem, MySQL
somehow detected a deadlock.

Combining validates_uniqueness_of and unique constraint is the best
solution right now. You can parse database error messages to detect
unique constraint failures but unfortunately that's database dependent.

yup. that certainly seems to be the consensus, this comment followed (which is more inline with what i was suggesting):

"
This approach would work but I have a hard time imagining wanting to
take the performance hit of a table lock on any operation that happens
often enough to make this race condition a real issue.

A workable approach for the case of find_or_create is to enforce the
uniqueness with a database constraint and do this in ruby code:

   1. do a find
   2a. return the model object if you found it
   2b. if it does not exist try to insert it
   3a. if the insert succeeds return the new object
   3b. if the insert fails re-run the find and return the result of
       that.

We use this pattern in a couple of the high volume models where I work
and it works quite well. With PostgreSQL, can achieve this by creating
  a "savepoint" before attempting the inserts and then rolling back to
the savepoint if the insert fails.

For normal inserts and updates you could rely on database constraints
for the data integrity, but use validations for pretty error messages.
AR could then detect an insert/update failure and, in that case, it
could rerun the validations to figure out the appropriate message(s).
The validations could even be mechanically created from the database
constraints in many cases (see http://drysql.rubyforge.org/ for an
example of this).
"

the bit about re-running the validations is quite interesting.

cheers.

a @ http://codeforpeople.com/

Hey Ara, a one-two punch of validates_uniqueness_of plus a unique
index on those fields gives you nice error messages in the common case
and a rare database exception otherwise. This should definitely be in
the docs.

that's fair enough and a great start imho. i've never contributed docs but know marcel is the gatekeeper - what's the protocol?

Now, if we could parse the db constraint violation error message to
deduce the type of constraint that failed and the affected fields,
then we could turn the dumb exception (StatementInvalid) into a smart
one (UniquenessConstraintViolation), rescue it in
Base#save_with_validations, and add the message to record.errors.
Unfortunately I tried it with postgres a while back and the db errors
left too much up to the imagination to get much further than a neat
demo.

i played with this a bit and you are quite right. this is quite interesting

   http://drawohara.tumblr.com/post/19178992

but of little practical value. serialized transactions work too of course - sans speed.

cheers.

a @ http://codeforpeople.com/

From the MySQL manual:

14.2.10.6. Next-Key Locking: Avoiding the Phantom Problem
In row-level locking, InnoDB uses an algorithm called next-key
locking. InnoDB performs the row-level locking in such a way that when
it searches or scans an index of a table, it sets shared or exclusive
locks on the index records it encounters. Thus, the row-level locks
are actually index record locks.

The locks InnoDB sets on index records also affect the "gap" before
that index record. If a user has a shared or exclusive lock on record
R in an index, another user cannot insert a new index record
immediately before R in the index order. This locking of gaps is done
to prevent the so-called "phantom problem." Suppose that you want to
read and lock all children from the child table having an identifier
value greater than 100, with the intention of updating some column in
the selected rows later:

SELECT * FROM child WHERE id > 100 FOR UPDATE;

Suppose that there is an index on the id column. The query scans that
index starting from the first record where id is bigger than 100. If
the locks set on the index records would not lock out inserts made in
the gaps, a new row might meanwhile be inserted to the table. If you
execute the same SELECT within the same transaction, you would see a
new row in the result set returned by the query. This is contrary to
the isolation principle of transactions: A transaction should be able
to run so that the data it has read does not change during the
transaction. If we regard a set of rows as a data item, the new
"phantom" child would violate this isolation principle.

When InnoDB scans an index, it can also lock the gap after the last
record in the index. Just that happens in the previous example: The
locks set by InnoDB prevent any insert to the table where id would be
bigger than 100.

You can use next-key locking to implement a uniqueness check in your
application: If you read your data in share mode and do not see a
duplicate for a row you are going to insert, then you can safely
insert your row and know that the next-key lock set on the successor
of your row during the read prevents anyone meanwhile inserting a
duplicate for your row. Thus, the next-key locking allows you to
"lock" the non-existence of something in your table.
###END QUOTE###

As far as I can judge locking FOR UPDATE can effectively save us from
the burden of parse-db-error-messages-and-retry approach, provided
that unique index is set on the corresponding column of course.
I dream I could do like this:
validates_uniqueness_of :something, :lock => true
and this would use the statements like the following:
SELECT .... WHERE something = ? FOR UPDATE
INSERT ....
This is not equal to table-level locking as in our case (uniqueness
checking) only access to a certain index gap is serialized and neither
other gaps nor existing rows are locked.

That's cool. I thought FOR UPDATE wouldn't prevent inserts. Is this
also true for the other databases we support?

jeremy

> As far as I can judge locking FOR UPDATE can effectively save us from
> the burden of parse-db-error-messages-and-retry approach, provided
> that unique index is set on the corresponding column of course.
> I dream I could do like this:
> validates_uniqueness_of :something, :lock => true
> and this would use the statements like the following:
> SELECT .... WHERE something = ? FOR UPDATE
> INSERT ....
> This is not equal to table-level locking as in our case (uniqueness
> checking) only access to a certain index gap is serialized and neither
> other gaps nor existing rows are locked.

That's cool. I thought FOR UPDATE wouldn't prevent inserts.

It doesn't. It locks all the rows it selects, but since there's no
row yet, there's nothing to lock.

Assaf

After some digging I can tell that... not really...

MySQL(InnoDB): true
H2: true. side-effect of total absence of row-level locking, so locks
the whole table as any DML statement would.

SQLServer: "key-range" locking supported and effect achievable but
with different syntax (locking hints in the FROM clause)

FireBird, OpenBase, Mimer: couldn't find relevant info so far

PostgreSQL: looks like false?
Oracle: false
FrontBase: false
Sybase: who cares, it's discontinued anyway

false because non-transactional: SQLite, Derby, hsqldb

Ok, it seems that's rather non-standard feature of MySQL. Is it a
stopper? MySQL is so mainstream that it may be worth adding anyway(?)

Sybase who cares?...actually it is possible that Sybase iAnywhere
(developer of SQLAnywhere: Former ASA) itself will support a driver
for ruby on rails.