validates_uniqueness_of doesn't really guarantee uniqueness

There seems to be a recurring discussion happening between me and people who are new to Rails: that ActiveRecord::Validations::validates_uniqueness_of doesn’t really guarantee that a record will be unique.

There are workarounds, even some gems… but still, it felt awkward before, and it feels awkward now.

9 Likes

This is an insidious landmine. Using uniqueness validations without an underlying unique index should produce a warning.

4 Likes

FWIW, the latest version of rubocop-rails warned me about this on a project. I’m not sure when this check landed in RuboCop. Perhaps Rails could implement a similar check?

There was actually a proposed PR for this, just saying.

https://github.com/rails/rails/pull/34725

2 Likes

Thank you for bringing that up.

I was open to that PR before and I still am. The company I work for even has a gem for that: https://github.com/Shopify/activerecord-rescue_from_duplicate.

If someone could finish it, it would be great.

1 Like

I wonder, though, if the problem is deeper.

When a newbie adds validates_uniqueness_of :column, it is not a guarantee unless they also add add_index :table, :column, unique: true.
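
To make that concrete, here is a minimal sketch of the pair a newcomer actually needs (the model, table, and column names are just examples):

```ruby
# Illustrative only - model, table, and column names are made up.
class User < ApplicationRecord
  # Friendly, per-attribute errors, but racy: two concurrent requests
  # can both pass the SELECT-based check and both attempt the INSERT.
  validates_uniqueness_of :email
end

class AddUniqueIndexToUsersEmail < ActiveRecord::Migration[6.0]
  def change
    # The actual guarantee: a concurrent duplicate insert raises
    # ActiveRecord::RecordNotUnique instead of slipping through.
    add_index :users, :email, unique: true
  end
end
```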

Maybe, in development, the validator should also confirm on boot that the backing index really is there, by looking at the information schema, and raise something if it is not.

Better error handling sure is great, but how do we teach people that they also need some sort of index to ensure uniqueness?

Maybe a simpler way to resolve this is to deprecate validates_uniqueness_of altogether and have it direct users towards creating a unique index on the table instead?

6 Likes

The validation is useful to get specific errors. I think you were right on the money with this:

1 Like

https://github.com/toptal/database_validations/ comes with a uniqueness index checker, which is turned on by default.

It yields a pretty solid experience when used together with https://github.com/djezzzl/database_consistency by the same author.

2 Likes

I also pinged the original author of the PR; maybe he is still willing to finish it.

https://github.com/toptal/database_validations/issues/18

I’m happy to help out in any way possible - I’ve wasted a ridiculous number of hours dealing with issues related to this. And I’m confident it will save a lot of headaches not only for me, but for a lot of other people.

Maybe a simpler way to resolve this is to deprecate validates_uniqueness_of altogether and have it direct users towards creating a unique index on the table instead?

Often you want both. In the Elixir world, Ecto’s original way of ensuring uniqueness was to try the insert, notice if it fails due to the unique constraint, and turn that db error into a user-friendly one. This is still the “main” way we do it.

However, imagine you’re trying to guarantee uniqueness of a username. You go to insert a record when somebody signs up, and the name is taken. When was that name taken? 99.999% of the time, it was taken yesterday or a year ago or whatever, not 10 milliseconds ago in a concurrent request. (This is why the validation works OK most of the time, especially for low-traffic sites; you might not notice the possible race condition until you start getting many sign-ups per second.)

You need the index for that rare race condition, but the “do a SELECT to check for it before we do the INSERT” validation approach will catch the vast majority of cases, and it will catch the uniqueness issue at the same time as other validation issues (e.g. a blank first name). By contrast, if you violate multiple database constraints with an INSERT, the database (at least this is true for PostgreSQL) will only tell you about the first one it notices, and trying the INSERT over and over to find all the issues isn’t a great workflow.

Ultimately we added an unsafe_validate_unique function to Ecto to cover this. The name tells you not to rely on it, but it can improve user experience if you have that in front of your uniqueness constraint.

Using uniqueness validations without an underlying unique index should produce a warning.

I think that’s the right approach. And turning the RecordNotUnique exception into a user-friendly error should be the default, IMO.
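
Roughly, this is the kind of thing apps wire up by hand today - a sketch only, with illustrative controller and attribute names, since Rails doesn’t do this for you:

```ruby
class UsersController < ApplicationController
  def create
    @user = User.new(user_params)
    if @user.save
      redirect_to @user
    else
      render :new
    end
  rescue ActiveRecord::RecordNotUnique
    # The unique index caught a race that the SELECT-based validation missed;
    # surface it the same way an ordinary validation failure would be surfaced.
    @user.errors.add(:email, :taken)
    render :new
  end

  private

  def user_params
    params.require(:user).permit(:email, :name)
  end
end
```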

6 Likes

BTW, a paper a few years ago examined 67 open source Rails projects and concluded that this was a common problem.

We found that, in contrast with traditional transaction processing, these applications overwhelmingly prefer to leverage application-level feral support for data integrity, typically in the form of declarative (sometimes user-defined) validation and association logic. Despite the popularity of these invariants, we find limited use of in-database support to correctly implement them, leading to a range of quantifiable inconsistencies for Rails’ built-in uniqueness and association validations.

See Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity.

And as it implies, the issue is broader than unique indexes.

As I’ve written elsewhere:

While application code can generally produce friendlier user errors, only the database knows what data it has right now - even if the application code ran a SELECT a few milliseconds ago, there may have been an UPDATE since… So any attempt to validate data in relation to other data can only be reliably done by the database itself at the moment you insert or update data.

You could get the database to guarantee consistency by use of indexes, or by locking a whole table before you insert a record (:frowning:), but one way or another, the database has to help you.

So stuff like:

  • Don’t insert a username that duplicates one that exists right now (unique constraint)
  • Don’t allow a comment for a post that doesn’t exist right now (foreign keys)
  • Don’t insert a reservation which overlaps the dates of a reservation that exists right now (exclusion constraints)
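
To make those concrete, a migration sketch (table and column names are made up, and the exclusion constraint assumes PostgreSQL with the btree_gist extension):

```ruby
class AddDatabaseLevelGuarantees < ActiveRecord::Migration[6.0]
  def up
    # Unique constraint: no two users with the same username, enforced at write time.
    add_index :users, :username, unique: true

    # Foreign key: a comment can only reference a post that actually exists.
    add_foreign_key :comments, :posts

    # Exclusion constraint (PostgreSQL only): no two reservations for the same
    # room with overlapping date ranges.
    enable_extension "btree_gist"
    execute <<~SQL
      ALTER TABLE reservations
        ADD CONSTRAINT no_overlapping_reservations
        EXCLUDE USING gist (room_id WITH =, daterange(starts_on, ends_on) WITH &&)
    SQL
  end
end
```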

See my post Protect Your Data with PostgreSQL Constraints for more.

1 Like

For what it’s worth, rubocop-rails recently added this static check; it’d be nice if Rails had something like that built-in.

https://github.com/rubocop-hq/rubocop-rails/pull/197

Maybe by employing index_exists?
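
Something along those lines, perhaps. A rough sketch of a development-only boot check (not what rubocop-rails does - its check is static; the initializer and warning text here are made up):

```ruby
# config/initializers/check_uniqueness_indexes.rb (hypothetical)
Rails.application.config.after_initialize do
  next unless Rails.env.development?

  Rails.application.eager_load!

  ActiveRecord::Base.descendants.each do |model|
    next if model.abstract_class? || !model.table_exists?

    model.validators.grep(ActiveRecord::Validations::UniquenessValidator).each do |validator|
      validator.attributes.each do |attribute|
        # Simplified: ignores conditional validations and index column order.
        columns = Array(validator.options[:scope]) + [attribute]
        unless model.connection.index_exists?(model.table_name, columns, unique: true)
          Rails.logger.warn(
            "#{model.name} validates uniqueness of #{columns.join(', ')} " \
            "but has no unique index backing it"
          )
        end
      end
    end
  end
end
```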

2 Likes

I like the idea of making non-index-backed uniqueness validations warn the user, but I’m a little leery of a full-on error – “sharp knives” and all that.

Maybe an error or warning you could shut up by adding an index: false parameter to the validation? Of course then you’d be getting people adding that indiscriminately to shut Rails up, but there’s only so far you can go in saving people from themselves.

4 Likes

Hey everyone!

I’m ready to help with the PR, but I’d like to know what the core team expects from it.

Do we want to have a second validation called safe_uniqueness / db_uniqueness or something which includes:

  • a check at boot time that the unique index exists
  • a rescue of the RecordNotUnique error if the corresponding validator exists

Or do we want to adjust the existing one with some option, e.g. constraint: true?
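
Purely to make the two alternatives concrete (neither API exists in Rails today):

```ruby
# Hypothetical, for discussion only.

# Option A: a separate, stricter validator.
validates :email, db_uniqueness: true

# Option B: an option on the existing validator.
validates :email, uniqueness: { constraint: true }
```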

5 Likes

@rafaelfranca :point_up:

Any updates, @rafaelfranca, for @DjezzzL? :slight_smile: