ar validations cannot work unless the transaction isolation level is serializable

given that this happens to be the case

   http://drawohara.tumblr.com/post/18926188

it seems that *any* validation should alter the behavior of Base#create to force the isolation level - otherwise validations end up being nothing but a stack of race conditions. sorry if i'm late to the party but this came as a real surprise to me. at the very least it seems the docs should have a big fat warning about this, or the code itself should warn when validations are used and the client code is not running under a strictly serializable transaction.
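
to make the race concrete, here's a rough sketch (the User model and :login column are made up) of two overlapping requests where both uniqueness checks pass before either row is written:

   # hypothetical User model with validates_uniqueness_of :login and no unique index
   a = User.new(:login => 'ara')   # request A
   b = User.new(:login => 'ara')   # request B, arriving almost simultaneously
   a.valid?                        # SELECT ... WHERE login = 'ara' finds nothing
   b.valid?                        # SELECT ... WHERE login = 'ara' finds nothing
   a.save                          # INSERT succeeds
   b.save                          # INSERT also succeeds - two 'ara' rows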

kind regards

a @ http://codeforpeople.com/

> given that this happens to be the case
>
>    http://drawohara.tumblr.com/post/18926188
>
> it seems that *any* validation should alter the behavior of Base#create to force the isolation level - otherwise validations end up being nothing but a stack of race conditions. sorry if i'm late to the party but this came as a real surprise to me. at the very least it seems the docs should have a big fat warning about this, or the code itself should warn when validations are used and the client code is not running under a strictly serializable transaction.

I'm not sure that this applies to any validations other than validates_uniqueness_of?

But yes, you need a unique index or serializable transactions if you want this guaranteed. If there's a database-agnostic way to implement the transaction stuff that sounds like a good idea, otherwise we could at least warn in the documentation.

> I'm not sure that this applies to any validations other than validates_uniqueness_of?

   validates_associated
   validates_each
   validates_length_of # esp if a text value is being appended to

but you are right in pointing to that as the main one.

> But yes, you need a unique index or serializable transactions if you want this guaranteed. If there's a database-agnostic way to implement the transaction stuff that sounds like a good idea, otherwise we could at least warn in the documentation.

i'd vote for sticking

   def transaction options = {}
   end

as a start, into the base connection adapter. individual adapters can then make appropriate use of the :isolation key, which #create could be refactored to pass iff validations are present on the model class of the record being saved. overall the ar code doesn't seem to have put a ton of thought into transactions, and passing options through would at least give the connection adapters the option (no pun intended) :wink:
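
here's a rough sketch of what i mean - the adapter internals and method names are guessed, not actual rails code:

   # sketch: a base adapter #transaction that honors a hypothetical :isolation key
   class AbstractAdapter
     def transaction(options = {})
       execute "BEGIN"
       if level = options[:isolation]
         execute "SET TRANSACTION ISOLATION LEVEL #{level.to_s.upcase.tr('_', ' ')}"
       end
       result = yield
       execute "COMMIT"
       result
     rescue Exception
       execute "ROLLBACK"
       raise
     end
   end

   # Base#create could then opt in when the model defines validations, e.g.
   #   connection.transaction(:isolation => :serializable) { ... }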

thoughts?

btw - we had this happen three times today. the culprit was a submit button that is not disabled after the first click - people were double clicking it out of habit, so two requests come in quite close together. js can solve it, but i'm not comfortable maintaining db integrity via js :wink:

a @ http://codeforpeople.com/

>    validates_associated
>    validates_each

These two just aggregate other validations

>    validates_length_of # esp if a text value is being appended to

#save always provides the full value of every column; there's no way you can be 'appending' while using active record's save.

> i'd vote for sticking
>
>    def transaction options = {}
>    end
>
> thoughts?

This definitely sounds like a long term option to consider, assuming the terminology and api can be unified across all the databases etc.

> btw - we had this happen three times today. the culprit was a submit button that is not disabled after the first click - people were double clicking it out of habit, so two requests come in quite close together. js can solve it, but i'm not comfortable maintaining db integrity via js :wink:

Stick a unique index on the field and you'll be fine.
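
for what it's worth, that's a one-liner in a migration (the table and column names below are made up):

   # hypothetical migration: enforce uniqueness of users.login at the db level
   class AddUniqueIndexToUsersLogin < ActiveRecord::Migration
     def self.up
       add_index :users, :login, :unique => true
     end

     def self.down
       remove_index :users, :login
     end
   end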

> This definitely sounds like a long term option to consider, assuming the terminology and api can be unified across all the databases etc.

i've already pulled out a ton of transaction code that supports nested transactions too - i'll see if i can work this in

> Stick a unique index on the field and you'll be fine.

that's fine - but then validates_uniqueness_of is not needed - right now it just gives users easy access to a race condition, alongside find_or_create_by. although the answer is simple, it raises questions, for me at least, about whether those methods should even exist. my experience is that most web developers are not versed in acid, even though they should be. pun intended.

cheers.

a @ http://codeforpeople.com/

> Stick a unique index on the field and you'll be fine.

> that's fine - but then validates_uniqueness_of is not needed - right now it just gives users easy access to a race condition, alongside find_or_create_by. although the answer is simple, it raises questions, for me at least, about whether those methods should even exist. my experience is that most web developers are not versed in acid, even though they should be. pun intended.

No, you still need the validation - otherwise, instead of a "name has already been taken" validation error, you get an internal server error.

validates_uniqueness_of gives a nice error message and does an ok job of guaranteeing uniqueness. validates_uniqueness_of + a unique index gives you both.

yeah i understand that - i just don't get warm and fuzzies using 'ok' and 'guarantee' in this context, since you still end up with internal server errors some of the time. it's not required to throw an internal error - the db connection error can be rescued (UniqueError etc) and dealt with. the api for dealing with it can be validates_uniqueness_of. i've done this in some old pg-specific code, and i realize this is just a bunch of noise without some code - i'll see if i can dig it up...

cheers.

a @ http://codeforpeople.com/

interesting comment from the boulder/denver ruby group:

"I think the problem stems largely from the ActiveRecord connection adapter API. It provides no mechanism for acquiring or releasing table level locks.

If you could get a table level lock, then checking the uniqueness of a value before performing an insert/update could be atomic, as could a find_or_create_by type operation"

a @ http://codeforpeople.com/

Table Level locking is kinda nasty, and a great way to kill your throughput. If you're already creating so many 'foos' concurrently that you're hitting this race condition, chances are you don't want to continually lock and unlock the table. A unique index solves all of the problems but leaves it up to the database to handle whatever locking is required. I'm a little confused why we're looking for another solution.

Hey Ara, a one-two punch of validates_uniqueness_of plus a unique index on those fields gives you nice error messages in the common case and a rare database exception otherwise. This should definitely be in the docs.

Now, if we could parse the db constraint violation error message to deduce the type of constraint that failed and the affected fields, then we could turn the dumb exception (StatementInvalid) into a smart one (UniquenessConstraintViolation), rescue it in Base#save_with_validations, and add the message to record.errors. Unfortunately I tried it with postgres a while back and the db errors left too much up to the imagination to get much further than a neat demo.
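
Roughly, the neat demo boiled down to something like this sketch - the regexp, the method name, and the hook into save are all assumptions, and the error message format differs between postgres versions and adapters:

   # sketch: rescue a duplicate-key StatementInvalid and map it onto record.errors
   # (the regexp and the index-name convention are assumptions; a dedicated
   #  UniquenessConstraintViolation class could wrap this)
   def save_rescuing_unique_violations(record)
     record.save
   rescue ActiveRecord::StatementInvalid => e
     raise unless e.message =~ /duplicate key.*unique constraint "(.+?)"/i
     # guess the column from a conventional index name like index_users_on_login
     column = $1[/on_(\w+)\z/, 1] || 'base'
     record.errors.add(column, 'has already been taken')
     false
   end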

Best, jeremy

I discovered the same problem a year and a half ago. I asked around on mailing lists and IRC and the answers ranged from "what's the problem" to "your design is stupid" - nobody really had a good answer. Table locks are expensive, and nesting table locks is hard to impossible because there's only one unlock phase. And last time I tried serializable transaction level on MySQL to solve this problem, MySQL somehow detected a deadlock.

Combining validates_uniqueness_of and a unique constraint is the best solution right now. You can parse database error messages to detect unique constraint failures, but unfortunately that's database dependent.

yup. that certainly seems to be the consensus. this comment followed (which is more in line with what i was suggesting):

" This approach would work but I have a hard time imagining wanting to take the performance hit of a table lock on any operation that happens often enough to make this race condition a real issue.

A workable approach for the case of find_or_create is to enforce the uniqueness with a database constraint and do this in ruby code:

   1.  do a find
   2a. return the model object if you found it
   2b. if it does not exist try to insert it
   3a. if the insert succeeds return the new object
   3b. if the insert fails re-run the find and return the result of that.

We use this pattern in a couple of the high volume models where I work and it works quite well. With PostgreSQL, you can achieve this by creating a "savepoint" before attempting the insert and then rolling back to the savepoint if the insert fails.

For normal inserts and updates you could rely on database constraints for the data integrity, but use validations for pretty error messages. AR could then detect an insert/update failure and, in that case, it could rerun the validations to figure out the appropriate message(s). The validations could even be mechanically created from the database constraints in many cases (see http://drysql.rubyforge.org/ for an example of this). "

the bit about re-running the validations is quite interesting.
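
a rough sketch of that pattern, for the archives (the model and column names are made up; it assumes a unique index on the column so the losing insert fails cleanly):

   # sketch of the find / insert / re-find pattern quoted above
   def find_or_create_user(login)
     user = User.find_by_login(login)        # 1. do a find
     return user if user                     # 2a. found it
     begin
       User.create!(:login => login)         # 2b / 3a. try the insert
     rescue ActiveRecord::StatementInvalid, ActiveRecord::RecordInvalid
       User.find_by_login(login)             # 3b. lost the race - re-run the find
     end
   end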

cheers.

a @ http://codeforpeople.com/

> Hey Ara, a one-two punch of validates_uniqueness_of plus a unique index on those fields gives you nice error messages in the common case and a rare database exception otherwise. This should definitely be in the docs.

that's fair enough and a great start imho. i've never contributed docs but i know marcel is the gatekeeper - what's the protocol?

> Now, if we could parse the db constraint violation error message to deduce the type of constraint that failed and the affected fields, then we could turn the dumb exception (StatementInvalid) into a smart one (UniquenessConstraintViolation), rescue it in Base#save_with_validations, and add the message to record.errors. Unfortunately I tried it with postgres a while back and the db errors left too much up to the imagination to get much further than a neat demo.

i played with this a bit and you are quite right. this is quite interesting

   http://drawohara.tumblr.com/post/19178992

but of little practical value. serializable transactions work too of course - sans speed.

cheers.

a @ http://codeforpeople.com/

From the MySQL manual:

14.2.10.6. Next-Key Locking: Avoiding the Phantom Problem

In row-level locking, InnoDB uses an algorithm called next-key locking. InnoDB performs the row-level locking in such a way that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records it encounters. Thus, the row-level locks are actually index record locks.

The locks InnoDB sets on index records also affect the "gap" before that index record. If a user has a shared or exclusive lock on record R in an index, another user cannot insert a new index record immediately before R in the index order. This locking of gaps is done to prevent the so-called "phantom problem." Suppose that you want to read and lock all children from the child table having an identifier value greater than 100, with the intention of updating some column in the selected rows later:

SELECT * FROM child WHERE id > 100 FOR UPDATE;

Suppose that there is an index on the id column. The query scans that index starting from the first record where id is bigger than 100. If the locks set on the index records would not lock out inserts made in the gaps, a new row might meanwhile be inserted to the table. If you execute the same SELECT within the same transaction, you would see a new row in the result set returned by the query. This is contrary to the isolation principle of transactions: A transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, the new "phantom" child would violate this isolation principle.

When InnoDB scans an index, it can also lock the gap after the last record in the index. Just that happens in the previous example: The locks set by InnoDB prevent any insert to the table where id would be bigger than 100.

You can use next-key locking to implement a uniqueness check in your application: If you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read prevents anyone meanwhile inserting a duplicate for your row. Thus, the next-key locking allows you to "lock" the non-existence of something in your table. ###END QUOTE###

As far as I can judge, locking FOR UPDATE can effectively save us from the burden of the parse-db-error-messages-and-retry approach, provided that a unique index is set on the corresponding column of course. I dream I could do this:

   validates_uniqueness_of :something, :lock => true

and this would use statements like the following:

   SELECT .... WHERE something = ? FOR UPDATE
   INSERT ....

This is not equal to table-level locking, as in our case (uniqueness checking) only access to a certain index gap is serialized, and neither other gaps nor existing rows are locked.
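
As a very rough sketch (the :lock option is imaginary, as are the method names), the validation's existence check could switch to a FOR UPDATE select when the option is given, leaving the following INSERT protected by the next-key lock:

   # sketch of an imaginary :lock => true variant of the uniqueness check;
   # relies on InnoDB next-key locking plus a unique index on the column
   def check_uniqueness_with_gap_lock(record, attribute)
     value = record.send(attribute)
     table = record.class.table_name
     sql   = record.class.send(:sanitize_sql,
               ["SELECT 1 FROM #{table} WHERE #{attribute} = ? FOR UPDATE", value])
     if record.class.connection.select_value(sql)
       record.errors.add(attribute, 'has already been taken')
     end
     # the INSERT that follows in the same transaction now waits on the
     # next-key lock covering the gap where the new value would go
   end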

That's cool. I thought FOR UPDATE wouldn't prevent inserts. Is this also true for the other databases we support?

jeremy

> As far as I can judge, locking FOR UPDATE can effectively save us from the burden of the parse-db-error-messages-and-retry approach, provided that a unique index is set on the corresponding column of course.
> I dream I could do this:
>    validates_uniqueness_of :something, :lock => true
> and this would use statements like the following:
>    SELECT .... WHERE something = ? FOR UPDATE
>    INSERT ....
> This is not equal to table-level locking, as in our case (uniqueness checking) only access to a certain index gap is serialized, and neither other gaps nor existing rows are locked.

> That's cool. I thought FOR UPDATE wouldn't prevent inserts.

It doesn't. It locks all the rows it selects, but since there's no row yet, there's nothing to lock.

Assaf

After some digging I can tell that... not really...

MySQL (InnoDB): true
H2: true - a side-effect of the total absence of row-level locking, so it locks the whole table as any DML statement would.

SQLServer: "key-range" locking is supported and the effect is achievable, but with different syntax (locking hints in the FROM clause).

FireBird, OpenBase, Mimer: couldn't find relevant info so far.

PostgreSQL: looks like false?
Oracle: false
FrontBase: false
Sybase: who cares, it's discontinued anyway.

false because non-transactional: SQLite, Derby, hsqldb.

Ok, it seems that's a rather non-standard feature of MySQL. Is it a stopper? MySQL is so mainstream that it may be worth adding anyway(?)

> Sybase: who cares, it's discontinued anyway.

actually, it is possible that Sybase iAnywhere (the developer of SQL Anywhere, formerly ASA) will itself support a driver for ruby on rails.