So there's a recurring discussion I keep having with people who are new to Rails: ActiveRecord::Validations::validates_uniqueness_of doesn't actually guarantee that a record will be unique.
There are workarounds, even some gems… but still, it felt awkward before, it feels awkward now.
FWIW the latest version of rubocop-rails warned me about this on a project (the Rails/UniqueValidationWithoutIndex cop). I'm not sure when this check landed in RuboCop. Perhaps Rails could implement a similar check?
When a newbie adds validates_uniqueness_of :column, they have no real guarantee unless they also do add_index :table, :column, unique: true (see the sketch below).
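To make the pairing concrete, here's a minimal sketch; the User model and users table are hypothetical:

```ruby
# app/models/user.rb
class User < ApplicationRecord
  # Application-level check: runs a SELECT before the INSERT/UPDATE,
  # so two concurrent requests can both pass it and still collide.
  validates_uniqueness_of :username
end

# db/migrate/XXXXXXXXXXXXXX_add_unique_index_to_users.rb
class AddUniqueIndexToUsers < ActiveRecord::Migration[7.1]
  def change
    # Database-level guarantee: the only check that holds under concurrency.
    add_index :users, :username, unique: true
  end
end
```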
Maybe in development the validator should also confirm on boot, by looking at the information schema, that the backing index really is there, and raise something if it is not.
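A rough sketch of what that could look like as a development-only initializer (the file name is made up, and it ignores :scope options and composite indexes):

```ruby
# config/initializers/verify_uniqueness_indexes.rb
# Sketch only: verify every uniqueness validator is backed by a
# single-column unique index. Ignores :scope and multi-column cases.
if Rails.env.development?
  Rails.application.config.after_initialize do
    Rails.application.eager_load!

    ActiveRecord::Base.descendants.each do |model|
      next if model.abstract_class? || !model.table_exists?

      model.validators.grep(ActiveRecord::Validations::UniquenessValidator).each do |validator|
        validator.attributes.each do |attribute|
          backed = model.connection.indexes(model.table_name).any? do |index|
            index.unique && index.columns == [attribute.to_s]
          end

          unless backed
            raise "#{model.name} validates uniqueness of :#{attribute}, " \
                  "but #{model.table_name} has no unique index on it"
          end
        end
      end
    end
  end
end
```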
Better error handling sure is great, but how do we teach people that they also need some sort of index to ensure uniqueness?
Maybe a simpler way to resolve this is to deprecate validates_uniqueness_of altogether and have it direct the users at creating a unique index on the table instead?
I'm happy to help out in any way possible - I've wasted a ridiculous amount of human hours dealing with issues related to this, and I'm confident fixing it will save a lot of headaches, not only for me but for a lot of other people.
> Maybe a simpler way to resolve this is to deprecate validates_uniqueness_of altogether and have it direct the users at creating a unique index on the table instead?
Often you want both. In the Elixir world, Ecto’s original way of ensuring uniqueness was to try the insert, notice if it fails due to the unique constraint, and turn that db error into a user-friendly one. This is still the “main” way we do it.
However, imagine you’re trying to guarantee uniqueness of a username. You go to insert a record when somebody signs up, and the name is taken. When was that name taken? 99.999% of the time, it was taken yesterday or a year ago or whatever, not 10 milliseconds ago in a concurrent request. (This is why the validation works OK most of the time, especially for low-traffic sites; you might not notice the possible race condition until you start getting many sign-ups per second.)
You need the index for that rare race condition, but the "do a SELECT to check for it before we do the INSERT" validation approach will catch the vast majority of cases, and it will surface the uniqueness issue at the same time as other validation issues (e.g. a blank first name). By contrast, if you violate multiple database constraints with an INSERT, the database (at least this is true for PostgreSQL) will only tell you about the first one it notices, and trying the INSERT over and over to find all the issues isn't a great workflow.
Ultimately we added an unsafe_validate_unique function to Ecto to cover this. The name tells you not to rely on it, but it can improve user experience if you have that in front of your uniqueness constraint.
Using a uniqueness validation without an underlying unique index should produce a warning.
I think that’s the right approach. And turning the RecordNotUnique exception into a user-friendly error should be the default, IMO.
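For illustration, a minimal sketch of that conversion, assuming a unique index already exists on users.username (save_with_unique_check is a made-up name, not a Rails API):

```ruby
class User < ApplicationRecord
  # Friendly errors for the common case: catches duplicates via a
  # SELECT before the INSERT, alongside the other validations.
  validates_uniqueness_of :username

  # Hypothetical helper: when the race condition slips past the
  # validation, turn the database's error into a normal record error.
  def save_with_unique_check
    save
  rescue ActiveRecord::RecordNotUnique
    errors.add(:username, :taken) # same i18n key the validator uses
    false
  end
end
```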
BTW, a paper a few years ago (Bailis et al., "Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity", SIGMOD 2015) examined 67 open source Rails projects and concluded that this was a common problem.
> We found that, in contrast with traditional transaction processing, these applications overwhelmingly prefer to leverage application-level feral support for data integrity, typically in the form of declarative (sometimes user-defined) validation and association logic. Despite the popularity of these invariants, we find limited use of in-database support to correctly implement them, leading to a range of quantifiable inconsistencies for Rails' built-in uniqueness and association validations.
And as it implies, the issue is broader than unique indexes.
As I’ve written elsewhere:
While application code can generally produce friendlier user errors, only the database knows what data it has right now - even if the application code ran a SELECT a few milliseconds ago, there may have been an UPDATE since… So any attempts to validate data in relation to other data can only be reliably done by the database itself at the moment you insert or update data.
You could get the database to guarantee consistency by use of indexes, or by locking a whole table before you insert a record, but one way or another, the database has to help you.
So stuff like:

- Don't insert a username that duplicates one that exists right now (unique constraint)
- Don't allow a comment for a post that doesn't exist right now (foreign keys)
- Don't insert a reservation which overlaps the dates of a reservation that exists right now (exclusion constraints)
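All three map to constraints you can declare in a migration. A sketch, with hypothetical table and column names (the exclusion constraint is PostgreSQL-only, written as raw SQL to stay version-agnostic):

```ruby
class AddIntegrityConstraints < ActiveRecord::Migration[7.1]
  def up
    # Unique constraint: no duplicate usernames, enforced at insert time.
    add_index :users, :username, unique: true

    # Foreign key: a comment must point at a post that exists right now.
    add_foreign_key :comments, :posts

    # Needed so the exclusion constraint below can compare room_id with =.
    enable_extension "btree_gist"

    # Exclusion constraint: no two reservations for the same room
    # with overlapping date ranges.
    execute <<~SQL
      ALTER TABLE reservations
        ADD CONSTRAINT no_overlapping_reservations
        EXCLUDE USING gist (
          room_id WITH =,
          daterange(starts_on, ends_on) WITH &&
        )
    SQL
  end

  def down
    execute "ALTER TABLE reservations DROP CONSTRAINT no_overlapping_reservations"
    remove_foreign_key :comments, :posts
    remove_index :users, :username
  end
end
```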
I like the idea of making non-index-backed uniqueness validations warn the user, but I’m a little leery of a full-on error – “sharp knives” and all that.
Maybe an error or warning you could silence by adding an index: false parameter to the validation? Of course then you'd get people adding that indiscriminately just to shut Rails up, but there's only so far you can go in saving people from themselves.
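To be clear, that option is hypothetical; the idea is something like:

```ruby
# Hypothetical: index: false does not exist in Rails today. It would
# acknowledge "yes, I know there's no unique index backing this".
validates_uniqueness_of :username, index: false
```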