default_scope, named scopes and lambda arguments

This is a rather broad topic covering a few related issues. Please
bear with me while I elaborate.

Rails 3.0 supports lambda named scopes and there has been a request to
add support for lambda default_scopes in 3.1. With a help of a few
folks and under an eye of Aaron Patterson some work has been made
which is documented in this rather long ticket on lighthouse:

Unfortunately we've stumbled upon problems which are impossible to
overcome with simple patches. With the rails 3 new where(), order(),
limit(), etc. functions ActiveRecord::Relation objects are created and
merged dynamically and always in the context of the current scope.
Consider those examples:

    # example 1
    class Post < ActiveRecord::Base
      default_scope lambda { where( :locale => Locales.current ) }
      scope :valid, where( :valid => true )

The `where` function will be called before the call to `scope` and it
will return a new ActiveRecord::Relation object that will be saved as
the named scope. Unfortunately that relation will be created within
the currently active scope, which for calls at the AR class level is
the default scope. Read: the default scope will be evaluated during
the call to `scope` and it's resulting conditions will be merged
with :valid scope conditions.

Then whenever a user will call `Post.valid` two things will happen:
- first, default scope will be evaluated again and will produce a
Relation object with new, proper conditions
- second, this Relation will be merged with Relation saved in :valid
scope, which contains conditions from the call to `default_scope` at
the time of :valid scope declaration.

As a result of this merge the current conditions will be overwritten
by that
outdated data.

This also means that later you can't run :valid at the `unscoped`
level. Like `Post.unscoped.valid` - the resulting relation will
contain conditions taken from the `default_scope`.

Note that this would not happen if the programmer decided to declare
the scope like this:

    # example 2
    class Post < ActiveRecord::Base
      default_scope lambda { where( :locale => Locales.current ) }
      scope :valid, unscoped.where( :valid => true ) # notice

In this case the :valid scope does not contain conditions from the
default scope. But this is not transparent to the coder. It's not The
Rails Way if you have to remember to use `unscoped` if you've used
lambda before.

I had some ideas for dirty hacks that would work around this problem.
One of which ended up as a pull request on github:

In that patch I modified ActiveRecord::Relation to contain a mirror
relation without data from the default scope, I called that mirror
`without_default`. Each time a relation is merged with another so are
their `without_default` counterparts. The relation returned from
default scope has it's `without_default` cleared, so it's where the
"branch point" comes from. Then when I save a relation as new named
scope, I use it's `without_default` version.

It's terrible, messy. I know. It gets the job done for this one issue,
but it's a bad design.

What I have suggested to Aaron and others is changing the
`default_scope` and `scope` syntax. Have it always take blocks and
always evaluate them at the `unscoped` level. Basically do what I did
in example 2, but automatically.

    # example 3
    class Post < ActiveRecord::Base
      default_scope do
        lambda { where( :locale => Locales.current ) }
      scope :valid { where( :valid => true ) }

This way `scope` and `default_scope` can run those blocks at the
`unscoped` level and they could also run this at the time of the named
scope usage.

This has the added benefit of helping with another related issue.
Consider this bug I just found in Spree, a major e-commerce platform
for RoR:

    # example 4
    class Product < ActiveRecord::Base
      scope :not_deleted, where("products.deleted_at is NULL")
      scope :available, lambda { |*on|
        where("products.available_on <= ?", on.first || )
      scope :active, not_deleted.available

I'd say this is typical. Not only is this is how most coders think
named scopes work, but it's also how they *should* work. Of course in
the current version of Rails `not_deleted.available` is evaluated
before being saved as an :active named scope and as a result the time
in the available_on condition is frozen and never changes in the
subsequent calls to ``.

If we changed the `scope` syntax this would look like this:

    # example 5
    class Product < ActiveRecord::Base
      scope :not_deleted { where("products.deleted_at is NULL") }
      scope :available do
        lambda { |*on|
          where("products.available_on <= ?", on.first || ) }
      scope :active { not_deleted.available }

And the block passed to `scope :active` could be saved and run with
each call to ``.

Anyway - it's is just a suggestion. Aaron has asked me to start a
discussion here, because we really need to make a decision about
default_scopes and lambdas. The code currently residing at master has
buggy support and even occasionally throws exceptions due to proc

Please voice your opinions.


To solve the default scope problem it's enough to execute the blocks
at the `unscoped` level, but if we also want to solve the issue shown
in example 4 we need to delay block execution until default_scope or
named scopes are used. Then of course returning lambdas from the block
is unnecessary and a proper code would look a bit cleaner than what I
have shown above:

    # example 3
    class Post < ActiveRecord::Base
      default_scope { where( :locale => Locales.current ) }
      scope :valid { where( :valid => true ) }


    # example 5
    class Product < ActiveRecord::Base
      scope :not_deleted { where("products.deleted_at is NULL") }
      scope :available do |*on|
        where("products.available_on <= ?", on.first || )
      scope :active { not_deleted.available }


Hey Adam,

Thanks for looking into this, it definitely seems like there are some
fundamental design issues that are surfacing here.

I agree with the thrust of your proposal, but it would be good to
investigate whether it is possible to implement this lazy evaluation
without changing the syntax to have blocks all over the place.

For example, perhaps it would be possible to store the lambda-scopes on
the actual ActiveRecord::Relation object and only evaluate the lambdas
at the very last minute (i.e. when to_a is called).

I haven't thought about this particularly long and hard so there may be
other issues with this approach...


Thanks for your comment. I think your proposal could solve the issue from example 4 - joining named scopes that are lambdas, but it wouldn't help with default scope being applied to the relation passed as argument to subsequent `default_scope` or `scope` calls.

But here are some considerations anyway. You couldn't just store lambdas on the Relation, because the order of operations matter.

    class Post < ActiveRecord::Base
      scope :localized, lambda { where( :locale => Locales.current ) }
    # those two are different
    Post.localized.where(:locale => :en)
    Post.where(:locale => :en).localized

You could store a queue of previously applied Relation and lambda objects though. Of course two consecutive Relation objects could be merged the traditional way.

Although it would look transparent to the user and rails apps wouldn't require any special migration to 3.1, the AR code would become significantly more complex.

By making every named and default scope a lambda you just need to change two AR methods that prepare the Relation just before putting it on the current scope stack. On the other hand by introducing a relation-lambdas queue you have to make some methods aware of this queue. You mentioned `to_a`, but this could be postponed until `build_arel` which is called by `to_a` and similar.

Then there are info methods like `where_clauses` which are used by AR, but also by some gems I presume. Those would need to flatten the whole queue to return proper results.

Furthermore there are methods like `except` and `reverse_order` which would need to be reimplemented as actual elements on the queue. Currently `except` just drops some commands from the Relation object and `reverse_order` just changes the order clauses ASC and DESC properties. If we were to put those methods on top of lambas queue, you would have to remember the requested operation and apply it only after the lambda would be evaluated.

I'm worried about AR maintainability here. And the performance hit would probably be bigger than one additional proc evaluation every time named scope is used.

All this and it wouldn't solve the default scope issue unless you tried to mark the elements on the queue as coming from default scope and dropping them from queues coming from named scopes.



Thanks for your thoughtful response. I have re-read your previous email,
and this email, and thought about it some more.

I have changed my mind. I now agree that the best thing to do would be
to have 'scope' take blocks.

Essentially what we're getting at is that scopes should be lazily
evaluated. There may be some hackery we can apply to have laziness
without 'scope' taking a block, but ultimately it will always be
hackery. The only way to make the laziness completely transparent to the
users of the API would be to make it obvious, i.e. by requiring them to
use a block. Having a block makes it instantly clear that the evaluation
does not happen immediately.

My initial reticence was essentially because I think scope :foo
{ where(:bla) } is ugly (it's also syntactically invalid without
parentheses actually). But in reality, I think if this were the syntax
then I would just declare all my scopes as:

  scope :foo do

This actually appeals to me additionally because it makes it look more
like a method. Scope are essentially just hyped-up class methods which
get cached when they are accessed through associations.

So, in short, I approve of this proposal.

One question though. Why do we need:

  default_scope do
    lambda { where(:bla} }

Why not just:

  default_scope do


Also, have you thought about how to make the transition from the current
syntax? My proposal would be to deprecate the non-block syntax in 3.1
and explain to people why it might not give them what they want, but to
not actually remove it until 3.2. I think we should allow time to
upgrade as this is a feature that basically everybody uses.


I haven't thought about a possible transition plan yet. I'd like to hear more opinions about this idea. Syntax change is not a minor thing after all.

You're also right, you need parentheses if you want to pass "{" block along with some arguments. Which is too bad, because it looks even worse with them.

As for your question about returning lambda from within a block. I've corrected myself in the first reply to this thread:

I would really like to see default_scope go a step further by adding
optional 'names' to them. Since multiple default_scopes can be added
to model, it would be nice to selectively disable them by name.
Imagine the case for implementing a soft-delete gem:

default_scope :non_deleted, { where('deleted_at is not null') }
scope :with_deleted, { without_default(:non_deleted) }

message.all # returns all non_deleted records
message.with_deleted.all # returns all records, including those marked
as deleted

I think this goes overboard. Default scopes are not really a good programming technique. They should be applied sparsely and carefully.

We have named scopes that should be used in the case that you've described: