Possible bug with chained where methods on 3.0.4

HI

I notice on Rails 3.0.4 that the following two statements generate
different results.

State.where(:abbreviation => 'TX').where(:abbreviation => 'NE')
SQL: SELECT `states`.* FROM `states` WHERE (`states`.`abbreviation` =
'TX') OR (`states`.`abbreviation` = 'NE')

State.where(:abbreviation => 'TX').where("abbreviation = 'NE'")
SQL: SELECT `states`.* FROM `states` WHERE (`states`.`abbreviation` =
'TX') AND (`states`.`abbreviation` = 'NE')

Using Rails 3.0.3 they both generated AND clauses which seems to me to
be correct.

Is this a bug or by design please?

Colin

The OR functionality is new, but intentional. See
Relation::QueryMethods#collapse_wheres. Since the idea of having a
column being equal to two different values doesn't make sense,
equalities with the same left-hand side get combined with OR, which is
more likely the desired result if two scopes are merged and both
include equality conditions on the same column.

Since your example above used a string in the second where call, it
didn't result in an Arel::Nodes::Equality, so it wasn't ORed.

I doubt that. There are a lot of cases (especially when using
search/filter functionality of application) where you want it to
generate an empty set because there are no matching records. Everybody
everywhere in current code assumes that the condition will be added
with 'AND' operator and now we are breaking this assumption ?

If I want to have 'OR' then I would build the query this way:

State.where(:abbreviation => ['TX', 'NE'])
or using longer but perfectly fine and valid arel version...

An example:

def where_something(scope, value)
scope.where(:column => value)
end

This code is ambiguous now. I do not know what sql it is going to
generate. Will it be
"AND column = value"
or
"(... OR column = value)" ?

Please revert it to the original behavior and provide a different way
of achieving this like:

State.possible(:abbreviation => 'TX').possible(:abbreviation => 'NE') =>
sql: "states.abbrevation = 'TX' OR states.abbrevation = 'NE"

or whatever different method name that reveals the intentions clearly

Robert Pankowecki

Even on just that point alone, this is a really bad API idea - it is
completely inconsistent, and this will cause subtle and confusing bugs
when people change between two otherwise-identical forms of where()
argument.

And to do it at all breaks the relational algebra idea badly. If I
pass a scope to something, and it does several different queries on it
using its own where scopes, each of those scopes should further
restrict the original scope, not broaden it.

With this commit, now neither callers nor callees have any idea what
where(:hash => arguments) will result in - whether the returned
results will match the conditions hash - unless they have constructed
the entire scope from scratch.

I also struggle to understand how it was decided that a minor point
patch release - for a security issue no less - should include an API
behavior change that will cause preexisting code to start getting
completely different query results.

OK, thanks for that. If that is the way it is supposed to be then
that is the way it is supposed to be. It seems odd to me that
chaining a second where *adds* to the dataset of the first where. It
would seem particularly unexpected in the case
items = Model.where( :x => y)
...
...
myitems = items.where(:x => z)
resulting in myitems having more records than items, and in particular
records that do not satisfy :x => z. Note that at this point the code
would not necessarily 'know' the details of the first scope.

Colin

Sound arguments. I didn't implement collapse_wheres, just in case a
witch hunt starts. I was just the first person awake this morning to
pose the answer. :slight_smile:

The change probably shouldn't have made it into a point release, but I
do find a certain convenience and logic to it, and I long for a day
when we aren't hacking about trying to make a String and a Hash act
like they're the same thing. Hashes aren't strings, and (IMHO, and I
know there are people on the core team who disagree) it would be
better if we gradually eliminated hand coded SQL string conditions and
relied more heavily on hashes and other data structures, converting
them to SQL when needed via ARel. They're already a last resort in my
own code and I expect them to be relatively brittle when I use them.

I should also probably note that I went off on a sort of anti-String
tangent there, unrelated to the topic at hand. So please read:

"...a certain convenience and logic to it, and I long for..."

as:

"...a certain convenience and logic to it. On another note, I long
for..."

Thanks. :slight_smile:

Sound arguments. I didn't implement collapse_wheres, just in case a
witch hunt starts. I was just the first person awake this morning to
pose the answer. :slight_smile:

Understood. :slight_smile:

With this commit, now neither callers nor callees have any idea what
where(:hash => arguments) will result in - whether the returned
results will match the conditions hash - unless they have constructed
the entire scope from scratch.

I agree with this reasoning, too. Under the 3.0.4 logic, we won't be
able to chain independent scopes together on the fly with any
confidence. E.g., a loop that iterates over various input conditions
and adds scopes one by one based on the input.

+1 in favor of reverting to the original behavior: chaining 'where'
clauses should consistently use 'AND'.

Brian

+1 for the the original behavior where chained clauses are ANDed
(POLS).

The intentional change fixes this bug:

  https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/4598-default_scope-treats-hashes-and-relations-inconsistently-when-overwriting

If there is a different way to fix that ticket, I'm happy to apply
patches.

The different way is to make both cases return an empty array. David
(who created the ticket) expects something different but I am not sure
that this is what most people expect.

Could we provide a different method for such behavior as expected by David ?

Robert Pankowecki

The intentional change fixes this bug:

https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/4598-default_scope-treats-hashes-and-relations-inconsistently-when-overwriting

If there is a different way to fix that ticket, I’m happy to apply
patches.

The different way is to make both cases return an empty array. David
(who created the ticket) expects something different but I am not sure
that this is what most people expect.

Could we provide a different method for such behavior as expected by David ?

Agreed, the OR behavior seems inconsistent and unintuitive. In fact, the original solution suggested was to AND them together to make case 1 consistent with case 2 (rather than ORing them to make case 2 consistent with case 1).

https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/4598-default_scope-treats-hashes-and-relations-inconsistently-when-overwriting#ticket-4598-2

That seems more sane and less radical than having a single special case where chained methods get ORd. Especially when Arel itself already has its very own #or method to accomplish this within the relation chaining.

It seems odd to me that chaining a second where adds to the dataset of the first where.

Exactly.

-Steve
@jangosteve

Perhaps:

#where_in == OR

#where == AND

???

Just want to get this onto the table, “what most people expect” is not the only measure of a good API choice.

To specifically answer Aaron’s question, for #4598, I think the fix for that would be for subsequent default_scopes to just overwrite previous ones entirely. Not to merge with them.

For the broader question which seems to be what this thread is about, ie. what to do with: User.where(:foo => 1).where(:foo => 2) how about adding alias and as an alias for where, and adding or as an alias for “where that does ORs”. So the syntax could be:

User.where(:foo => 1).and(:foo => 2)

And:

User.where(:foo => 1).or(:foo => 2)

I’m sure I don’t need to state what SQL each of those turn into (the real test for a good API :slight_smile:

Btw, that could also resolve the "how to combine default_scopes issue, if that’s something that is still desired.

default_scope where( :foo => 1 )

And then:

default_scope where( :foo => 2 ) # effectively negates previous default scope

default_scope and( :foo => 2 ) # same as above

default_scope or( :foo => 2 ) # merges with original

All nice and explicit.

AND:

User.or( :foo => 2 )

…would also combine with any existing default_scope, whereas:

User.where( :foo => 2 )

Would AND it and therefore negate the default scope.

Pretty sure the reason default scopes are cumulative is to support STI classes refining their parent classes default scope, but I could be wrong. Outside of that context, I can’t think of a use case for calling it multiple times, off the top of my head. Happy to be corrected, though.

I suspect the 'and' behavior is already pretty well understood as the result of chaining 'where's. What about an explicit one for the 'or' case:

User.where(:foo => 1).or_where(:foo => 2)

The only issue I could see with this syntax is that it's somewhat unclear what should happen here:

User.where(:foo => 1).where(:bar => 2).or_where(:bar => 3)

as it could be interpreted as either

Version A: (foo = 1 AND bar = 2) OR (bar = 2)

or

Version B: (foo = 1) AND (bar = 1 OR bar = 2)

The issue is especially tricky once you're constructing relations in more than one place. For instance, this should do something reasonable:

r = User.where(:foo => 1)
def foo(some_relation)
  some_relation.where(:bar => 1).or_where(:bar => 2) # probably expects version B SQL
end
foo(r) # probably expects version B SQL

but so should this:
r2 = User.where(:foo => 1).where(:bar => 1)
def other_foo(some_relation)
  some_relation.or_where(:bar => 2)
end
other_foo(r2) # probably expects version A SQL

the difficulty is that the sequence of method calls on the relation are identical, they're just in different scopes.

On the other hand, isn't this sort of issue exactly why Arel has operators to combine scopes? The chained notation is alright for some things, but it's never going to be sufficient to create arbitrary SQL without a lot of fussing.

--Matt Jones

Perhaps or() and where/and (I mainly add and because it’s symmetrical with or and adds a lot of readability IMHO) could take other where() snippets? So your examples become:

r = User.where( :foo => 1 )

r = r.and( r.where( :bar => 1 ).or( :bar => 2 ))

…and…

r = User.where( :foo => 1 ).where( :bar => 1 )

r = r.or( :bar => 2 )

…respectively.

I get the feeling that those in the know are just hanging back waiting for us to tie ourselves in syntactic knots.

I don’t want to derail the original thread here, which is dealing with the much more common and simple case of providing a consistent API for when to add an AND and when to combine same-name parameters.

If we postpone any thoughts of creating complex nested WHERE clauses, and in effect (probably using the IN function) limit our parenthesizing to same-name columns, then I think there’s still a lot to be gained by making the OR explicit.

Then for non-same-name columns, we just leave it up the programmer and the call order (ie. no other parenthesis).

Having where behave differently depending on whether the column was already used elsewhere in the relation is pure madness. If you do:

relation.where(:column => 1).all

All records returned by it should have 1 as the value of column, regardless of the contents of relation. Using where should filter the relation, it should always return a subset of the receiver.

FWIW, in Sequel, #where always operates as a filter (uses AND), and you can call #or if you want to use OR.

Jeremy