I added https://github.com/rails/rails/issues/7629 on this subject.
Adding issues does not help, and only creates noise on the tracker.
Specifics about WHAT is causing over-allocation or HOW to fix it may be valid. But an open issue for "tons o' objects" helps nobody and is not productive. It is far too general.
The bug report specifies what is causing the over-allocation and how to fix it. It’s pretty specific.
It's pretty specific about the symptom, but it's not at all descriptive of what the actual cause is and how to fix it.
In this case, the cause is that ActiveRecord is using unfrozen strings as keys. When you use an unfrozen string as a hash key, ruby dups it, freezes the dup, and uses the frozen dup as the hash key. The simple fix to reduce the number of allocated strings from columns * (rows + 1) to just columns is to freeze the columns before using them as hash keys.
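A minimal sketch of the behavior described above, in plain Ruby with no ActiveRecord involved. The names (`unfrozen_key`, `frozen_key`, `rows`) are made up for illustration, and exact allocation counts may vary between Ruby versions:

```ruby
# Unfrozen String key: Hash#[]= stores a frozen copy, not the object passed in.
unfrozen_key = "name".dup            # .dup guarantees a mutable string
row = {}
row[unfrozen_key] = "Jeremy"
stored_key = row.keys.first
key_was_copied = !stored_key.equal?(unfrozen_key)
stored_key_frozen = stored_key.frozen?

# Frozen String key: the very same object is stored, so every row can share it.
frozen_key = "name".dup.freeze
rows = Array.new(3) do |i|
  r = {}
  r[frozen_key] = "value #{i}"
  r
end
key_shared = rows.all? { |r| r.keys.first.equal?(frozen_key) }

puts "unfrozen key copied: #{key_was_copied}, stored copy frozen: #{stored_key_frozen}"
puts "frozen key shared by all rows: #{key_shared}"
```

Freezing each column name once, before building the per-row hashes, is the idea behind the fix: the rows then reference one key object per column instead of a fresh copy per row.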
Pull request filed: https://github.com/rails/rails/pull/7631
Jeremy
Excellent! I verified that your fix did eliminate the redundancy on these field name strings in the case I was studying (from 15 extra strings per instance down to 2, where the 2 were attribute values, not field names).
Attribute values would be a good case for the StringPool, I guess, though I still think that is something that should be introduced in Ruby, not Rails. It would be a major change: because a string’s bang methods alter the object itself, a lot of existing user code assumes that a string and the object produced by one of its bang methods have the same object_id. I know you wanted to focus on AR, but if you had a StringPool only for AR attribute values, those values would be object-equivalent and have the same bang-method weirdness while other strings wouldn’t act that way, which would be much more evil than doing it in Ruby.
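To make the concern concrete, here is a rough sketch. The `pool` below is a hypothetical stand-in for a StringPool (a plain Hash that hands back one shared String per distinct value), not anything that exists in Rails or Ruby:

```ruby
# Hypothetical pool: one shared, mutable String object per distinct value.
pool = Hash.new { |h, s| h[s] = s.dup }

a = pool["active".dup]   # e.g. attribute value on one record
b = pool["active".dup]   # same value on another record
same_object = a.equal?(b)

a.upcase!                # bang method mutates the shared object in place
puts "same object: #{same_object}, b is now #{b.inspect}"

# Without pooling, each value is its own object and is unaffected.
c = "active".dup
d = "active".dup
c.upcase!
puts "unpooled d is still #{d.inspect}"
```

Pooled values change together under a bang method while ordinary strings do not, which is the inconsistency described above.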
Take a look at these:
Bartosz Dziewoński wrote in post #1077524:
Never create Ruby strings longer than 23 characters - Pat Shaughnessy
Seeing double: how Ruby shares string values - Pat Shaughnessy
That probably doesn’t help, because the Ruby optimization happens for strings that are 23 chars or more, and I guess that most attribute names are shorter (and many attribute values may be too).