I18n, validations and symbols

A few facts:
a) Symbols are never garbage collected.
b) Validations, human_attribute_name and Model.model_name.human
generate a lot of symbols as defaults for translations.

Those two things combined lead to a lot of memory consumption (the
bigger the application, the more memory is never returned). I
think we could exchange the symbols for a special class which
would mean: "I'm another default that should be translated instead
of just being passed as a string".

To keep compatibility with older Rails versions, the old symbol behavior
should remain unchanged; however, Rails internals could stop using
symbols as defaults.

The String class could be extended with a method that would easily
return our new class, like:

class String
  def translation_default
    TranslationDefault.new(self)
  end
  alias_method :t, :translation_default
end

New API:
I18n.t "activerecord.errors.models.user.attributes.name.blank", :default => [
  "activerecord.errors.models.user.blank".t,
  "activerecord.errors.messages.blank".t,
  "errors.attributes.name.blank".t,
  "errors.messages.blank".t,
  "Can't be blank"
]

Old API:
I18n.t "activerecord.errors.models.user.attributes.name.blank", :default => [
  :"activerecord.errors.models.user.blank",
  :"activerecord.errors.messages.blank",
  :"errors.attributes.name.blank",
  :"errors.messages.blank",
  "Can't be blank"
]

Both would work.
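
To make the proposal concrete, here is a minimal sketch of how both styles could resolve side by side. The internals of TranslationDefault, the resolve_default helper and the flat translations hash are all assumptions made up for illustration; this is not how I18n actually implements default lookup.

```ruby
# Hypothetical wrapper: marks a string as a translation key
# rather than as literal fallback text.
class TranslationDefault
  attr_reader :key

  def initialize(key)
    @key = key.to_s
  end
end

class String
  def translation_default
    TranslationDefault.new(self)
  end
end

# Hypothetical resolver standing in for I18n's default handling.
# `translations` is a flat Hash of key => text, used only for this sketch.
def resolve_default(translations, defaults)
  defaults.each do |default|
    case default
    when Symbol, TranslationDefault
      # Both the old symbol style and the new wrapper mean "look this key up".
      key = default.is_a?(Symbol) ? default.to_s : default.key
      return translations[key] if translations.key?(key)
    when String
      # A plain string is the final literal fallback.
      return default
    end
  end
  nil
end

translations = { "errors.messages.blank" => "must not be blank" }

new_style = resolve_default(translations, [
  "activerecord.errors.messages.blank".translation_default,
  "errors.messages.blank".translation_default,
  "Can't be blank"
])

old_style = resolve_default(translations, [
  :"activerecord.errors.messages.blank",
  :"errors.messages.blank",
  "Can't be blank"
])

fallback = resolve_default({}, [
  "errors.messages.blank".translation_default,
  "Can't be blank"
])
```

In this sketch the symbol branch is kept untouched, so existing callers would not notice the change, while internal callers could switch to the wrapper.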

I would like to know what you think about the problem and solution.

Robert Pankowecki

I'm not a Ruby-internals expert by any means, but I don't think this is really saving any memory over the symbol version - in fact, it seems likely to lead to *more* memory usage and GC churn. The problem is that the strings instantiated in a particular call may get GCed, but there's still the object that's part of the loaded code.

Here's a gist with some code:

http://gist.github.com/660316

On my system (ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-darwin9.8.0]) I get the following output:

131 131 131 132 131 132 132

Not sure about that last value - it should be 131, as the string created by dummy_function should get collected. In any case, making dummy_function return nil instead yields:

130 130 130 130 130 130 130

as it should, since there isn't any string object creation going on during the count intervals.

Further, there's the issue that *every* time a String literal is used, a new object is created:

'foo'.object_id == 'foo'.object_id # => false
:foo.object_id == :foo.object_id # => true

so running validations on an object with validates_presence_of on N attributes will end up with N+1 copies of the literal 'errors.messages.blank', N of which will be GCed at some point (plus the additional N TranslationDefault objects). The symbol version, on the other hand, will have the single :'errors.messages.blank' symbol.

The waters are somewhat muddied by things like:

      defaults << :"errors.messages.#{type}"

(from http://github.com/rails/rails/blob/master/activemodel/lib/active_model/errors.rb ) which looks likely to generate a string and *then* make it into a symbol, thus negating the previous discussion. The only other factors I could see pointing towards symbols are that *user* code is less likely to contain symbols created by interpolation (so symbols would save memory) and that forcing the API to work with both is somewhat confusing (and error-prone...).
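
A small sketch of the interpolation case, assuming a local variable named type standing in for the one in errors.rb. It only demonstrates object identity: every interpolated symbol with the same text is the same object as the literal symbol, while every interpolated string is a fresh object. Whether the interpreter allocates an intermediate String on the way to the symbol is implementation-dependent and is not shown here.

```ruby
# `type` is a stand-in for the variable used in errors.rb.
type = "blank"

# Two symbols built by interpolation at different points in the code.
sym_a = :"errors.messages.#{type}"
sym_b = :"errors.messages.#{type}"

# Two strings built by the same interpolation.
str_a = "errors.messages.#{type}"
str_b = "errors.messages.#{type}"

same_symbol          = sym_a.equal?(sym_b)                     # one shared Symbol
same_as_literal      = sym_a.equal?(:"errors.messages.blank")  # same object as the literal
same_string          = str_a.equal?(str_b)                     # two distinct String objects
equal_string_content = (str_a == str_b)                        # equal text, different objects

puts same_symbol, same_as_literal, same_string, equal_string_content
```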

Note that there *are* times when using symbols is a bad idea; HashWithIndifferentAccess was specifically created (and keyed with strings rather than symbols) to block attacks that try to overflow memory with tons of unique parameter names.

BTW, I've also heard some chatter about 1.9 doing GC on Symbols as well, but I can't seem to find a reference.

As I said in the opening, not even sorta an expert on this - I'd put a better-than-even chance that wycats or somebody with vastly superior ruby-core-fu will explain exactly why I've got no idea what I'm talking about. :)

--Matt Jones

Link: http://www.ruby-forum.com/topic/450136

There are two possibilities:

1) Symbols are garbage collected and we can forget about our nice
discussion and go back to work

2) They are not collected, so we have a choice:

a) create more objects that will be garbage collected
b) create fewer objects, but they will stay in the application forever.

I think that version "a" is better.
I would like to hear other opinions.

Robert Pankowecki

Hi,

Not sure it’s gonna help but running this script with REE:

 1  require 'rubygems'
 2  require 'memprof'
 3
 4  Memprof.start
 5  :'something.interpolated'
 6  a = 'interpolated'
 7  :"something.#{a}"
 8  :"something.#{'inter'+'polated'}"
 9  Memprof.stats
10  Memprof.stop

will return:

Line# File:LineNumber:Object
  1 /tmp/test.rb:6:String
  1 /tmp/test.rb:7:String
  4 /tmp/test.rb:8:String

So an interpolated symbol produces a String.

Any new ideas on this topic? It seems that we always produce a String
and then turn it into a symbol.
Do you want me to create some benchmarks so we can discuss the switch
based on some facts?

Robert Pankowecki