AR::Base should not be nuking its children just because it lost interest.

I diagnosed the following issue as part of [1], but it has since come up again as part of [2].

== The issue: ==

Whenever an AR::Base instance is persisted between requests with cache_classes disabled, Dispatcher#reset_application! will go through all of AR::Base's subclasses and remove all of their instance methods and instance variables (those on the instances of Class, not on instances of AR::Base; it's all very confusing!).
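For anyone who hasn't dug into it, here is an illustrative sketch (not the actual Rails source) of the stripping behaviour described above. The methods and ivars are removed from the Class object itself, so any surviving instance is left crippled:

```ruby
class Widget
  def price
    42
  end
end

w = Widget.new
w.price  # => 42

# Roughly what reset_subclasses does to each subclass:
Widget.instance_methods(false).each { |m| Widget.send(:undef_method, m) }
Widget.instance_variables.each { |v| Widget.send(:remove_instance_variable, v) }

# The surviving instance now blows up on a method it used to have:
begin
  w.price
rescue NoMethodError => e
  puts "stripped: #{e.class}"
end
```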

This was intended to fix a memory leak [3], and had a test associated with it. That test is now long gone, and my efforts at re-adding it failed: I never got it to pass, with or without the class stripping.

== The effect: ==

The example app by David Rice [4] will explode with error [5] after 2 page loads.

This app saves an AR model to a constant at startup... then expects it to work after a reload!

After a reload the model gets very confused, as its class has all its class_inheritable_attributes stripped.

It expects an array to be dangling off the end of #skip_time_zone_conversion_for_attributes, but instead it finds nil.

This sucks.

== What should happen (IMHO): ==

Removing the references to classes in @@subclasses should be enough to allow garbage collection to do the good stuff when the relevant constants (that refer to the classes) are removed.

This way, any objects that are still referencing those classes (you know, like when they are an instance of them), can still operate without unnecessary loss of organs.
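A minimal sketch of that proposal (names here are illustrative, not the real Rails internals): reset_subclasses just forgets its references and leaves the classes intact, so GC can collect them once the referring constants go away, while live instances keep working:

```ruby
class Base
  @@subclasses = []

  def self.inherited(child)
    @@subclasses << child
    super
  end

  def self.reset_subclasses
    @@subclasses.clear  # drop our references; no undef_method in sight
  end
end

class Monkey < Base
  def speak
    "ook"
  end
end

m = Monkey.new
Base.reset_subclasses

m.speak  # still "ook": the class itself was never touched
```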

== The patch: ==

The attached patch, rebased especially for the occasion, stops AR from behaving in this odd way, and seems to pass the existing test pack.

== The questions: ==

Does anyone have a good reason why these lines should stay?

Can anyone create a failing test case? (Preferably one that passes before the patch is applied :wink:)

Does anyone remember why the code is there?

Thanks in advance for any thoughts / input / other.
    - Tom

p.s.
I had to revert my local rails back to 55707da1a1481, as 8adb79b9b598 onwards clashed with my MySQL.
Tests passed at that point though.

prevent_class_method_and_variable_nukeing_on_AR_reset_subclasses.diff (1.07 KB)

On Thu, 06/11/2008 at 20:40 +0000, Tom Lea wrote:

== What should happen (IMHO): ==

Removing the references to classes in @@subclasses should be enough to
allow garbage collection to do the good stuff when the relevant
constants (that refer to the classes) are removed.

This way, any objects that are still referencing those classes (you
know, like when they are an instance of them), can still operate
without unnecessary loss of organs.

If I understand correctly, this means that old versions of code will
remain. I think a better approach is to provide better explanations
of what's going on, so that people don't try to preserve instances or
classes between requests. We could do so by defining #method_missing on
zombie classes. What do you think?

If I understand correctly, this means that old versions of code will
remain. I think a better approach is to provide better explanations
of what's going on, so that people don't try to preserve instances or
classes between requests.

I agree that reload having no effect on cached instances could be a little confusing.

On the other hand, perfectly good production code being banned from development mode seems like overkill, and is also confusing.

We could do so by defining #method_missing on
zombie classes. What do you think?

The issues are not always to do with missing methods; class inheritable attributes are just as bad, so existing methods that reference class inheritable attributes would still explode for no apparent reason.

Any ideas on how to do this nicely?
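To show why #method_missing alone doesn't cover it, here's a simplified stand-in for class_inheritable_attribute (the real implementation differs, but it likewise keeps its data in an instance variable on the Class object, which is exactly what reset_subclasses strips):

```ruby
class Record
  @inheritable_attributes = { skip_time_zone_conversion_for_attributes: [] }

  def self.read_inheritable_attribute(key)
    @inheritable_attributes && @inheritable_attributes[key]
  end

  # An existing, still-defined method that reads class-level state:
  def skip_list
    self.class.read_inheritable_attribute(:skip_time_zone_conversion_for_attributes)
  end
end

r = Record.new
r.skip_list  # => []

# reset_subclasses strips the class's instance variables:
Record.send(:remove_instance_variable, :@inheritable_attributes)

r.skip_list  # => nil; callers expecting an array now explode
```

The method is still defined, so method_missing never fires; the call simply returns nil where an array used to be.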

... currently, very easy to do, if you store ActiveRecord instances in
ActiveSupport's memory cache store. With the current Rails reloading
behavior in development, you need to add additional logic to your app
to not cache AR instances in development, or avoid this practice
entirely, neither of which are attractive options, imo.

As I understand it, if we do allow AR instances to persist between
requests, there'd only be an issue when you modify the class of an
instance that was cached -- in this case, you'd need to restart the
server. But I think that makes intuitive sense -- cached instances
would then be out-of-sync with the current class definition.

On Sat, 08/11/2008 at 13:24 -0800, Geoff B wrote:

As I understand it, if we do allow AR instances to persist between
requests, there'd only be an issue when you modify the class of an
instance that was cached -- in this case, you'd need to restart the
server. But I think that makes intuitive sense -- cached instances
would then be out-of-sync with the current class definition.

If you always restart, then there's no difference between keeping old versions
and turning them into zombies.

Anyway, if we allow multiple class versions we should warn people. Many
things will stop working, for example equality testing of active
record instances. Otherwise people will spend many hours trying to track
down extremely weird and hard-to-find bugs, many of them, probably,
unsuccessfully.

The ideal solution would be to upgrade old classes and instances to the new
definition, in a manner similar to CLOS. But I don't see a clean way to
do that in ruby. The only candidate approach I see is via
ObjectSpace#each_object (rewriting references to old instances and classes
to point to upgraded ones), but it cannot inspect and rewrite
closures, so it will not work either. And of course ObjectSpace#each_object
should be considered an implementation-specific feature, because it cannot
be efficiently implemented on all ruby implementations.

Good point about object equality -- #is_a? comparisons wouldn't work
as expected, either.

Note that this is an existing problem; it's not specific to Tom Lea's
patch. But it does demonstrate that objects kept around between
requests will never work exactly as you'd expect them to.

I suppose the only fix for this would be to remove all instance methods
from AR instances, including methods defined in superclasses, in
AR::Base.reset_subclasses, and to implement a method_missing with an
informative error, as you suggest.
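For what it's worth, that could look roughly like this (ZombieClassError and zombify are made-up names for this sketch, not anything in Rails): strip the class, then route every call through method_missing so stale instances fail loudly with a clear message.

```ruby
class ZombieClassError < StandardError; end

def zombify(klass)
  # Undefine the class's own methods; calls now fall through to
  # method_missing rather than normal dispatch.
  klass.instance_methods(false).each { |m| klass.send(:undef_method, m) }
  klass.send(:define_method, :method_missing) do |name, *args|
    raise ZombieClassError,
          "#{self.class} was reloaded; stale instance cannot respond to ##{name}"
  end
end

class Session
  def user
    :bob
  end
end

s = Session.new
zombify(Session)

begin
  s.user  # undef'd, so dispatch falls through to method_missing
rescue ZombieClassError => e
  puts e.message
end
```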

Good point about object equality -- #is_a? comparisons wouldn't work
as expected, either.

Note that this is an existing problem; it's not specific to Tom Lea's
patch. But it does demonstrate that objects kept around between
requests will never work exactly as you'd expect them to.

The key here is that they do behave as expected in production. There is nothing inherently wrong about keeping instances between requests. It works, and it works well, just not with the odd behaviour unique to AR.

The same reload behaviour described in previous posts affects any constant defined via a require_dependency. It is only the deletion of methods and class attributes on AR instances that I am questioning, and that is all this patch addresses.

I suppose the only fix for this would be to remove all instance methods
from AR instances, including methods defined in superclasses, in
AR::Base.reset_subclasses, and to implement a method_missing with an
informative error, as you suggest.

If this is indeed the aim, should the same logic apply to instances of other classes?

I fear this is a bridge too far.

The key here is that they do behave as expected in production. There
is nothing inherently wrong about keeping instances between requests.

Agreed -- this works fine in production. The issues we're discussing
only occur when config.cache_classes = true (i.e., development.)

If this is indeed the aim, should the same logic apply to instances of
other classes?

I fear this is a bridge too far.

You do make a good point about this being an issue with non-AR
classes, if they're reloaded -- if we're discussing neutering zombie
AR classes, perhaps we should really be discussing neutering any class
that's reloaded via Dependencies.

As it stands now, for any class (AR or otherwise) that is reloaded,
any instances that were created prior to reloading won't respond to
#== and #is_a? as expected.

Seems like the choice here is between:

1. accepting the possibility of weird, buggy, hard-to-diagnose
behavior in development (in which case, we could then consider your
patch, which solves some, but not all, of the issues we've
identified), and

2. prohibiting the user from using instances of reloaded classes that
persist between requests in development, thereby requiring the user to
work around this in their application code

Not a great set of choices, I know. Am I missing something here?
Thoughts?

Correction: I meant to say, when config.cache_classes = false (i.e.,
development.)

Anyway, here's a specific example of the issue, using David Rice's
example app from #1290 --

We set MONKEY_ONE = Monkey.first in an initializer (if we use
Rails.cache with :memory_store back-end instead of a constant, the
result is the same.)

In the view:

<%=
case MONKEY_ONE
when Monkey
  "a monkey"
else
  "not a monkey"
end
%>

With cache_classes = true (i.e., production), we get "a monkey" every
time we hit the page, as expected.

With cache_classes = false (i.e., development), we get "a monkey" the
first time we hit the page, and "not a monkey" on subsequent requests.
This occurs whether or not we apply Tom Lea's patch.

So, the question is, if it's impossible for us to tweak development
reloading so that we can get "a monkey" every time we hit the page, is
it okay to get one result the first time you hit the page, and a
different result on subsequent requests?

Or would it be better to get "a monkey" on the first request, and a
"ZombieClassError" when you reloaded the page?
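The root cause of the "not a monkey" result can be shown without Rails at all. After a reload, the constant names a brand-new Class object, and Module#=== (which case dispatch uses) compares against the object, not the name. (OldMonkey here is just a stand-in for the pre-reload class.)

```ruby
OldMonkey = Class.new
monkey_one = OldMonkey.new  # "cached instance" from the first request

# Simulate Dependencies reloading the class: remove the old constant's
# role and bind a fresh Class object to the name.
Monkey = Class.new

result = case monkey_one
         when Monkey then "a monkey"
         else "not a monkey"
         end
result  # => "not a monkey": monkey_one.class is the old Class object
```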

On Sun, 09/11/2008 at 15:21 -0800, Geoff B wrote:

With cache_classes = false (i.e., development), we get "a monkey" the
first time we hit the page, and "not a monkey" on subsequent requests.
This occurs whether or not we apply Tom Lea's patch.

We can ease the pain of running such code in development by not
reloading all classes after each request, but instead reloading only changed
classes, or reloading all classes only if something has changed. There was a
plugin that tried to reload only changed classes. It didn't work for
us. So I created (and haven't published yet) a simpler solution which
reloads everything only if anything has changed. We could put this code
into Rails.
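The reload-if-anything-changed approach could be sketched like this (ChangeDetector is a made-up name for illustration, not the actual plugin or unpublished code):

```ruby
class ChangeDetector
  def initialize(*dirs)
    @dirs = dirs
    @mtimes = snapshot
  end

  # True if any .rb file under the watched dirs was added, removed,
  # or touched since the last check.
  def changed?
    current = snapshot
    changed = current != @mtimes
    @mtimes = current
    changed
  end

  private

  def snapshot
    @dirs.flat_map { |dir| Dir.glob(File.join(dir, "**", "*.rb")) }
         .to_h { |file| [file, File.mtime(file)] }
  end
end
```

The dispatcher could then skip the whole reset when changed? is false, so cached instances survive any request that didn't edit code.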

Also, we can ease development pain for such cases by restarting the whole
process instead of reloading classes. CGI should work fine here, for
example. Doing it another way seems non-trivial, because the new process would
need to inherit the connected socket from the old process, but nothing else.

The key here is that they do behave as expected in production. There
is nothing inherently wrong about keeping instances between requests.
It works, and it works well, just not with the odd behaviour unique to
AR.

This will not affect production users unless you have cache_classes off.
If you do, your performance will be awful, as all these constants get
removed and redefined on every request.

Is there a reason you have cache_classes set like that?

The same reload behaviour described in previous posts affects any
constant defined via a require_dependency. It is only the deletion of
methods and class attributes on AR instances that I am questioning,
and that is all this patch addresses.

I commented on the ticket but will do so here for completeness.

The behaviour in question was added to fix a memory leak caused (I
believe) by ruby not GCing procs passed to define_method when the
constant was garbage collected. As such we can only apply this patch if
all 'popular' versions of ruby no longer hold on to those procs. If
they still leak, then this is really just a bitter side effect of our
having to work around a ruby bug.

If this is indeed the aim, should the same logic apply to instances of
other classes?

I fear this is a bridge too far.

This was only done to work around a particular memory leak due to a
particular ruby bug. If it's still a problem, we can look at an
alternative fix in 2.3; however, the memory leak in dev mode was pretty
severe and very, very annoying.

We can ease the pain of running such code in development by not
reloading all classes after each request, but instead reloading only changed
classes, or reloading all classes only if something has changed. There was a
plugin that tried to reload only changed classes. It didn't work for
us. So I created (and haven't published yet) a simpler solution which
reloads everything only if anything has changed. We could put this code
into Rails.

This sounds like a good idea, but inter-dependent modules and the like
could get a little confusing / tricky. It sounds like a good addition to
2.3. We already do this for routes.rb, but that's a much simpler case to
handle.

Also, we can ease development pain for such cases by restarting the whole
process instead of reloading classes. CGI should work fine here, for
example. Doing it another way seems non-trivial, because the new process would
need to inherit the connected socket from the old process, but nothing else.

We had CGI-based development back before 1.0; it was much slower and
much more frustrating. It would take a lot more than this one case to
justify switching.

The memory store could also be replaced / complemented with a
'marshaling memory store' which would completely work around this problem.
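Such a store might look like this (a hypothetical sketch, not an existing ActiveSupport store): dumping on write and loading on read means every read re-instantiates the object against whatever the class constant currently points to, so stale class references never survive a reload.

```ruby
class MarshalingMemoryStore
  def initialize
    @data = {}
  end

  def write(key, value)
    # Store a byte string, not a live object graph.
    @data[key] = Marshal.dump(value)
  end

  def read(key)
    blob = @data[key]
    # Marshal.load resolves classes by name, so it always builds the
    # instance from the *current* class definition.
    blob && Marshal.load(blob)
  end
end
```

The trade-off is the usual marshaling one: a dump/load on every access, and only marshalable objects can be cached.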