Opinion about new finder: :unique

Hi guys,

Pratik pointed me here. I have submitted feature request #2974.

It's basically a new finder: unique. To be used at times where people
use :first.
Often, you query for something, and expect only a single record.

To do this, people use :first. However, :first does not check for any
following entities. This means that if the developer expects 1
records, and gets more than 1 the application will 'just work'.

I don't think this is correct behaviour. Also, :first and :last mean:
first / last of the list. If you're looking for a single row, unique
makes more sense

I’d have to say I think :one would be a better choice, or :single. I agree with the request though. :unique brings to mind SELECT DISTINCT and is misleading in my opinion.

Paul Bowsher

Yeah I agree with Paul that :unique is a bit misleading. Also, I'm not
sure if I've had a case using .first() where it'd matter if there were
more than one matching row. Could you mention your actual use cases ?

Thanks.

This just feels a little backwards. Enforcement of how many records are in the db is not the responsibility of the finder. Uniqueness should be enforced in the database and in the parts of the application that insert and update data.

I concede it would be useful to have a more-flexible form of ActiveRecord#exists? so I could check whether too-many results would be returned by a set of conditions, but without having to invoke a large COUNT.

Perhaps something like ActiveRecord#exists_more_than_n_records?(n)
This way I could ask ((find :all, :limit => n+1).size > n)

Re use cases, we a comparable thing in one of my projects (:only would
also be my vote for the option name):

module Enumerable
  # Like #first and #last, but raises an error if there's more than one item in
  # the collection. Useful to make code that only makes sense if there's only
  # one of something "fail fast" without needing to repetitively assert this.
  def only
    raise "should only have one element out of #{inspect}" if size > 1
    first
  end
end

It's proved very useful for applying + verifying assumptions cleanly
throughout our codebase - often we find that the process business
logic is such that at that point in time, through the parent
association (or whatever) there is theoretically only one of
something, even though it needs to be (say) a has_many association in
the general case.

But, sometimes using first or last hid bugs or historical data
problems, so I added #only and we started using it, and now those
problems often get caught. It's particularly useful in test set-up
code.

If we had an association finder option, that'd be useful in a fairly
similar way, so I imagine Joris has seen similar problems?

Cheers,
Will

I’m still not sure what are we tring to solve here. Here’s what we have now:

User.first(:order => ‘id’) #=> gets me the first user

User.first(:conditions => { :username => ‘mislav’ }) #=> gets me a specific user

I designed the domain model, so I know that “username” is a unique key in the database. If I was querying against a column that’s non-unique, I would have used User.all(…) and then do some inspecting of the resultset and apply some business logic to it.

How does that “feel” wrong? If an implicit guard like Will Bryant’s code makes you feel safer, there’s nothing preventing you adding it. But I think core material should be something that everybody is encouraged to use—for a solid reason.

If you know something should return a single row, then you might as
well check check if it's really the case.

Also, first has a different meaning than one/unique.
If I say that I want the first of a query, there could be more results
in the resultset.

The 'first user' example is a good example of a case where you'd want
to use first. You could also use last.
But the second query, if username should be unique, User.one or
User.unique better states what you're trying to do. You could also
write User.last(.....), but that would look awkward.

If I would mistakenly write the query this way:
User.first(:conditions => :firstname => 'mislav'), I wouldn't be
notified if the query returned 3 mislav users.

If you know something should return a single row, then you might as

well check check if it’s really the case.

Also, first has a different meaning than one/unique.

If I say that I want the first of a query, there could be more results

in the resultset.

The ‘first user’ example is a good example of a case where you’d want

to use first. You could also use last.

But the second query, if username should be unique, User.one or

User.unique better states what you’re trying to do. You could also

write User.last(…), but that would look awkward.

If I would mistakenly write the query this way:

User.first(:conditions => :firstname => ‘mislav’), I wouldn’t be

notified if the query returned 3 mislav users.

I have to agree with Mislav and Duncan here, if firstname should be unique, a finder is not the place to enforce that. Those 2 other “mislav” users should not be in the database in the first place! There may be a compelling use-case for a find(:only/unique/make_sure_there_is_only_one_result), but late detection of invalid data isn’t it.

To further explain my reasoning:

To me, first() is a method I almost exclusively use in the console. I think of it like “get me the first record that matches some conditions or scope I specify”. If there was a unique or single method, would the implementation be much different?

For production code, I use dynamic finders against columns that are constrained with a unique key—e.g.:

find(1)

find_by_username(‘mislav’)

The implementation for these is (I believe) identical to first. I only use first in combination with ordering to say “get me the user with most comments” or “get me the most recent comment”, but this is only for presentational purposes and you can agree that such a use case doesn’t come often.

But, it’s obvious that it’s completely wrong to use first or find_by on some column(s) that are not under a unique constraint. This should always be used:

find_all_by_username(‘mislav’)

I guess that all I’m trying to say can be summed up as:

  1. if the column is unique-constrained, you should use find_by_column (or first with conditions or scope);
  2. if the column is not unique, you must use find_all_by_column (or alternatives);
  3. there’s no need for an extra method.

if the column is unique-constrained, you should use find_by_column (or
`first` with conditions or scope);
if the column is not unique, you must use find_all_by_column (or
alternatives);
there's no need for an extra method.

In a world where developers never make assumptions, I'd agree with you
entirely. However we do, and we frequently make them incorrectly.

If you believe the column (or other condition) is unique, and want to
fail-fast if you're incorrect, then you *do* need this. Take the
following:

Status.find_by_code("cancelled")

If I build this right, and put a unique index and a NOT NULL on the
code column, you're right there's no difference. However if I leave
off that unique index because I forgot it or mis-merged a migration
then my code will be 'randomly' returning one of the two cancelled
statuses and slowly messing up my data leaving a huge clean up job.

This to me is akin to foo.bars.create! where there are *currently* no
validations but you want exceptions rather than silent failures if you
add one later on.

That's exactly my point.
If we know something will return 1 row, why would we need to enforce
the constraint with a unique constraint?

"Assumption is the mother of all f*** ups" :slight_smile:

If there's a possibility for the framework to:
fail fast
AND
provide a method that more precisely states what you're trying to do,
then why not implement that?

have a nice weekend

If you believe this, you should have database setup and tests checking for this, not testing-related code in production code slowing everything down. The test should catch whatever you fail to do at the database level.

Uniquenes should be maintained in the database, because
a) we have a cheap and quick mechanism which fails quickly (unique index)
b) it is insanely expensive to guarantee database record uniqueness in application code and most people who try are doing it wrong

What you are saying is that adding :unique is just replacement for not having tests. Encouragement for not writing tests probably doesn't belong in the core :slight_smile:

There may exist some validations which are record-local or so complex that application code is the right approach, but database integrity should really be done in database if at all possible...

izidor

It's a very slippery slope if the core starts trying to predict and
protect developers from non-Rails assumptions.

Where an assumption is

Jason King

It's a very slippery slope if the core starts trying to predict and
protect developers from non-Rails assumptions.

You could make a similar case regarding escaping HTML output, but I
think that's a useful feature for rails to have. I think the same
about a :unique or similarly named feature, I would find it useful and
would use it on User.find_by_name and similar calls. Yes, I have
unique constraints in my database and use validates_uniqueness_of, but
I find it useful in these cases to practice defensive programming.

- donald

I agree. It is kind of a finder mixed with an assertion. Doesn't fit
in core in my view. It's easy to write it yourself nonetheless.

if the column is unique-constrained, you should use find_by_column (or
`first` with conditions or scope);
if the column is not unique, you must use find_all_by_column (or
alternatives);
there's no need for an extra method.

In a world where developers never make assumptions, I'd agree with you
entirely. However we do, and we frequently make them incorrectly.

If you believe the column (or other condition) is unique, and want to
fail-fast if you're incorrect, then you *do* need this. Take the
following:

Status.find_by_code("cancelled")

If I build this right, and put a unique index and a NOT NULL on the
code column, you're right there's no difference. However if I leave
off that unique index because I forgot it or mis-merged a migration
then my code will be 'randomly' returning one of the two cancelled
statuses and slowly messing up my data leaving a huge clean up job.

If you forgot to adjust your schema to your domain model, is it the
framework's responsibility to fix that? I'd say it's *your*
responsibility. If something weird (like a merge gone wrong) breaks
your schema, again, it's not the framework's responsibility (IMO) to
fix this.

In any case, proper tests will ensure this works fine.

I honestly haven't been bit by "forgetting" to add a :unique index. I
agree that YMMV and all that, but this looks more like something
belonging in a plugin than in core.

Just my ¢2

-foca

Irrespective of the argument regarding where this uniqueness
constraint belongs, this feature seems like a solution for a
relatively uncommon use case. As Xavier pointed out, and as Will
Bryant demonstrated earlier in this thread, it's possible to implement
in a few lines of code. Rails, like any framework, should focus on
providing core value, not on providing sugar for every possible use
case.

What's wrong with:

Model.find(:all, :limit => 2).size

If size > 1 you know there's more than 1 record in the database