default_scope, named scopes and lambda arguments

This is a rather broad topic covering a few related issues. Please bear with me while I elaborate.

Rails 3.0 supports lambda named scopes and there has been a request to add support for lambda default_scopes in 3.1. With the help of a few folks and under the eye of Aaron Patterson some work has been done, which is documented in this rather long ticket on Lighthouse: https://rails.lighthouseapp.com/projects/8994/tickets/1812-default_scope-cant-take-procs

Unfortunately we've stumbled upon problems which are impossible to overcome with simple patches. With the new Rails 3 where(), order(), limit(), etc. methods, ActiveRecord::Relation objects are created and merged dynamically, and always in the context of the current scope. Consider these examples:

    # example 1
    class Post < ActiveRecord::Base
      default_scope lambda { where( :locale => Locales.current ) }
      scope :valid, where( :valid => true )
    end

The `where` function will be called before the call to `scope` and it will return a new ActiveRecord::Relation object that will be saved as the named scope. Unfortunately that relation will be created within the currently active scope, which for calls at the AR class level is the default scope. Read: the default scope will be evaluated during the call to `scope` and its resulting conditions will be merged with the :valid scope conditions.

Then whenever a user calls `Post.valid` two things will happen:

- first, the default scope will be evaluated again and will produce a Relation object with new, proper conditions
- second, this Relation will be merged with the Relation saved in the :valid scope, which contains conditions from the call to `default_scope` at the time of the :valid scope declaration

As a result of this merge the current conditions will be overwritten by that outdated data.

This also means that you can't later run :valid at the `unscoped` level: `Post.unscoped.valid` will still produce a relation containing conditions taken from the `default_scope`.
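To make the symptom concrete, here's a rough illustration (assuming a hypothetical `Locales.current` that returns :en when the class is loaded and :de later, during a request):

    # hypothetical illustration of the stale conditions, given example 1 above;
    # the Post class was loaded while Locales.current == :en

    # later, during a request where Locales.current == :de
    Post.valid.to_sql
    # the fresh default scope ('de') gets merged with the relation stored in
    # :valid, which still carries locale = 'en' from declaration time, and the
    # stored, outdated condition wins

    Post.unscoped.valid.to_sql
    # unscoped doesn't help either: the 'en' condition is baked into :valid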

Note that this would not happen if the programmer decided to declare the scope like this:

    # example 2
    class Post < ActiveRecord::Base
      default_scope lambda { where( :locale => Locales.current ) }
      scope :valid, unscoped.where( :valid => true ) # notice 'unscoped'
    end

In this case the :valid scope does not contain conditions from the default scope. But this is not transparent to the coder. It's not The Rails Way if you have to remember to use `unscoped` whenever you've declared a lambda default_scope earlier.

I had some ideas for dirty hacks that would work around this problem. One of which ended up as a pull request on github: https://github.com/rails/rails/pull/169

In that patch I modified ActiveRecord::Relation to contain a mirror relation without data from the default scope; I called that mirror `without_default`. Each time a relation is merged with another, so are their `without_default` counterparts. The relation returned from the default scope has its `without_default` cleared, so that's where the "branch point" comes from. Then when I save a relation as a new named scope, I use its `without_default` version.
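For reference, the general shape of that hack looked roughly like this (a simplified sketch with hypothetical method names, not the actual diff from the pull request):

    # simplified sketch of the idea, not the actual patch
    module ActiveRecord
      class Relation
        # mirror relation that never contains default_scope conditions
        attr_accessor :without_default

        alias_method :merge_without_mirror, :merge

        def merge(other)
          merged = merge_without_mirror(other)
          # whenever two relations are merged, merge their mirrors as well
          if without_default && other.respond_to?(:without_default) && other.without_default
            merged.without_default = without_default.merge_without_mirror(other.without_default)
          end
          merged
        end
      end
    end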

It's terrible, messy. I know. It gets the job done for this one issue, but it's a bad design.

What I have suggested to Aaron and others is changing the `default_scope` and `scope` syntax. Have it always take blocks and always evaluate them at the `unscoped` level. Basically do what I did in example 2, but automatically.

    # example 3
    class Post < ActiveRecord::Base
      default_scope do
        lambda { where( :locale => Locales.current ) }
      end
      scope :valid { where( :valid => true ) }
    end

This way `scope` and `default_scope` can run those blocks at the `unscoped` level, and they could also defer running them until the named scope is actually used.

This has the added benefit of helping with another related issue. Consider this bug I just found in Spree, a major e-commerce platform for RoR:

    # example 4
    class Product < ActiveRecord::Base
      scope :not_deleted, where("products.deleted_at is NULL")
      scope :available, lambda { |*on|
        where("products.available_on <= ?", on.first || Time.zone.now )
      }
      scope :active, not_deleted.available
    end

I'd say this is typical. Not only is this how most coders think named scopes work, it's also how they *should* work. Of course in the current version of Rails `not_deleted.available` is evaluated before being saved as the :active named scope, and as a result the time in the available_on condition is frozen and never changes in subsequent calls to `Product.active`.
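To make the frozen-argument point concrete (timestamps are hypothetical, for illustration only):

    # assuming the Product class from example 4 was loaded at 09:00
    Product.active.to_sql
    # => "... products.available_on <= '2011-05-10 09:00:00' ..."

    # an hour later, in the same process:
    Product.active.to_sql
    # => same 09:00 timestamp; not_deleted.available was evaluated once, when
    #    :active was declared, and that frozen relation is merged on every call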

If we changed the `scope` syntax this would look like this:

    # example 5
    class Product < ActiveRecord::Base
      scope :not_deleted { where("products.deleted_at is NULL") }
      scope :available do
        lambda { |*on|
          where("products.available_on <= ?", on.first || Time.zone.now ) }
      end
      scope :active { not_deleted.available }
    end

And the block passed to `scope :active` could be saved and run with each call to `Product.active`.
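For illustration, here's a minimal sketch of what the block form could boil down to (hypothetical code, not an actual Rails implementation): save the block and evaluate it at the `unscoped` level on every call.

    # hypothetical sketch, not real Rails code
    def self.scope(name, &block)
      singleton_class.send(:define_method, name) do |*args|
        # evaluate the block lazily, on each call, and at the unscoped level,
        # so neither the default scope nor stale arguments get baked in
        unscoped.scoping { instance_exec(*args, &block) }
      end
    end

With something along those lines, `Product.active` would re-run `not_deleted.available` on each call, so the available_on timestamp would always be current.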

Anyway - it's just a suggestion. Aaron has asked me to start a discussion here, because we really need to make a decision about default_scopes and lambdas. The code currently residing on master has buggy support and even occasionally throws exceptions due to proc merges.

Please voice your opinions.

Cheers, Adam

To solve the default scope problem it's enough to execute the blocks at the `unscoped` level, but if we also want to solve the issue shown in example 4 we need to delay block execution until the default_scope or named scopes are used. Then of course returning lambdas from the block is unnecessary, and the proper code would look a bit cleaner than what I have shown above:

    # example 3
    class Post < ActiveRecord::Base
      default_scope { where( :locale => Locales.current ) }
      scope :valid { where( :valid => true ) }
    end

And:

    # example 5
    class Product < ActiveRecord::Base
      scope :not_deleted { where("products.deleted_at is NULL") }
      scope :available do |*on|
        where("products.available_on <= ?", on.first || Time.zone.now )
      end
      scope :active { not_deleted.available }
    end

Adam

Hey Adam,

Thanks for looking into this, it definitely seems like there are some fundamental design issues that are surfacing here.

I agree with the thrust of your proposal, but it would be good to investigate whether it is possible to implement this lazy evaluation without changing the syntax to have blocks all over the place.

For example, perhaps it would be possible to store the lambda-scopes on the actual ActiveRecord::Relation object and only evaluate the lambdas at the very last minute (i.e. when to_a is called).

I haven't thought about this particularly long and hard so there may be other issues with this approach...

Cheers, Jon

Thanks for your comment. I think your proposal could solve the issue from example 4 - joining named scopes that are lambdas - but it wouldn't help with the default scope being applied to the relation passed as an argument to subsequent `default_scope` or `scope` calls.

But here are some considerations anyway. You couldn't just store lambdas on the Relation, because the order of operations matters.

    class Post < ActiveRecord::Base
      scope :localized, lambda { where( :locale => Locales.current ) }
    end

    # those two are different
    Post.localized.where(:locale => :en)
    Post.where(:locale => :en).localized

You could store a queue of previously applied Relation and lambda objects though. Of course two consecutive Relation objects could be merged the traditional way.
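A rough sketch of what such a queue might look like (names and structure are hypothetical, just to show the moving parts):

    # hypothetical sketch of a relations-and-lambdas queue, not real AR code
    class ScopeQueue
      def initialize
        @steps = []   # ordered mix of Relation objects and lambdas
      end

      def push(step)
        if !callable?(step) && !@steps.empty? && !callable?(@steps.last)
          # two consecutive Relation objects can be merged the traditional way
          @steps[-1] = @steps.last.merge(step)
        else
          @steps << step
        end
        self
      end

      # flattening has to wait until the query is built (to_a / build_arel),
      # because only then may the lambdas be evaluated
      def to_relation(klass)
        @steps.inject(klass.unscoped) do |rel, step|
          rel.merge(callable?(step) ? step.call : step)
        end
      end

      private

      def callable?(step)
        step.respond_to?(:call)
      end
    end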

Although it would look transparent to the user and Rails apps wouldn't require any special migration to 3.1, the AR code would become significantly more complex.

By making every named and default scope a lambda you just need to change two AR methods that prepare the Relation just before putting it on the current scope stack. On the other hand, by introducing a relations-and-lambdas queue you have to make several methods aware of this queue. You mentioned `to_a`, but evaluation could be postponed until `build_arel`, which is called by `to_a` and similar methods.

Then there are info methods like `where_clauses` which are used by AR, but also by some gems I presume. Those would need to flatten the whole queue to return proper results.

Furthermore, there are methods like `except` and `reverse_order` which would need to be reimplemented as actual elements on the queue. Currently `except` just drops some commands from the Relation object and `reverse_order` just flips the ASC/DESC properties of the order clauses. If we were to put those methods on top of a lambdas queue, we would have to remember the requested operation and apply it only after the lambda has been evaluated.
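For instance (with the same hypothetical queue representation), `except(:order)` would have to become a recorded step of its own:

    # today except(:order) can drop the order values immediately,
    # because they already sit on the Relation
    Post.order("created_at DESC").except(:order)

    # with lambdas on the queue there may be nothing to drop yet; the operation
    # has to be remembered and replayed after the lambda is finally evaluated
    queue = [lambda { Post.order("created_at DESC") }, [:except, :order]]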

I'm worried about AR maintainability here. And the performance hit would probably be bigger than one additional proc evaluation every time a named scope is used.

All this, and it wouldn't solve the default scope issue unless you marked the elements on the queue as coming from the default scope and dropped them from queues coming from named scopes.

Adam

Adam,

Thanks for your thoughtful response. I have re-read your previous email, and this email, and thought about it some more.

I have changed my mind. I now agree that the best thing to do would be to have 'scope' take blocks.

Essentially what we're getting at is that scopes should be lazily evaluated. There may be some hackery we can apply to have laziness without 'scope' taking a block, but ultimately it will always be hackery. The only way to make the laziness completely transparent to the users of the API would be to make it obvious, i.e. by requiring them to use a block. Having a block makes it instantly clear that the evaluation does not happen immediately.

My initial reticence was essentially because I think `scope :foo { where(:bla) }` is ugly (it's also syntactically invalid without parentheses actually). But in reality, I think if this were the syntax then I would just declare all my scopes as:

  scope :foo do
    where(:bla)
  end

This actually appeals to me additionally because it makes it look more like a method. Scopes are essentially just hyped-up class methods which get cached when they are accessed through associations.
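To illustrate the analogy (under the proposed block syntax, so partly hypothetical):

    # a scope under the proposed block syntax...
    scope :recent do
      where("created_at > ?", 1.week.ago)
    end

    # ...reads almost exactly like the plain class-method equivalent
    def self.recent
      where("created_at > ?", 1.week.ago)
    end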

So, in short, I approve of this proposal.

One question though. Why do we need:

  default_scope do
    lambda { where(:bla) }
  end

Why not just:

  default_scope do
    where(:bla)
  end

?

Also, have you thought about how to make the transition from the current syntax? My proposal would be to deprecate the non-block syntax in 3.1 and explain to people why it might not give them what they want, but to not actually remove it until 3.2. I think we should allow time to upgrade as this is a feature that basically everybody uses.

Jon

I haven't thought about a possible transition plan yet. I'd like to hear more opinions about this idea. Syntax change is not a minor thing after all.

You're also right, you need parentheses if you want to pass a `{ }` block along with some arguments. Which is too bad, because it looks even worse with them.

As for your question about returning a lambda from within a block: I've corrected myself in the first reply to this thread.

I would really like to see default_scope go a step further by adding optional 'names' to them. Since multiple default_scopes can be added to a model, it would be nice to selectively disable them by name. Imagine the case of implementing a soft-delete gem:

    default_scope :non_deleted, { where('deleted_at is null') }
    scope :with_deleted, { without_default(:non_deleted) }

    Message.all               # returns all non_deleted records
    Message.with_deleted.all  # returns all records, including those marked as deleted

I think this goes overboard. Default scopes are not really a good programming technique. They should be applied sparingly and carefully.

We have named scopes that should be used in the case that you've described:

    Message.unscoped.other_scope.all

Adam