String Allocation and Memory Bloat in ActiveRecord Objects

Hello,
I’ve been trying to track-down a pretty significant memory bloat issue with ActiveRecord objects. Running
some generic experiments e.g. an empty model with 6 enum_fields, created_at, updated_at - and fetching 72000 of these with ActiveRecord from a Postgres Database allocates approximately 120mb of memory or (1747 bytes per object) - measured using Instruments.On the other hand fetching the same 72000 objects into a list of hashes directly with Ruby-PG is only about 10mb of memory.
One particular thing I found which was surprising is that enum is not any more efficient than string. This appears to be because
the Postgresql Adapter for ActiveRecord uses TypeMapAllString on the RubyPG level and than performs all of the TypeMapping in Ruby. This seems to be grossly inefficient since Ruby-PG has support for both basic and custom type-mapping on the C-level.Here is some notable output from the ```memory_profiler` gem - this is for only 729 objects.

1617 “0”
1458 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:169
159 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/connection_adapters/postgresql/oid/type_map_initializer.rb:161459 “2”
1459 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:1691459 “1”
1459 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:169729 “_initialize_callbacks”
729 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activesupport-4.2.3/lib/active_support/callbacks.rb:81729 “_find_callbacks”
729 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activesupport-4.2.3/lib/active_support/callbacks.rb:81729 “initialize”
729 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activesupport-4.2.3/lib/active_support/callbacks.rb:81729 “find”
729 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activesupport-4.2.3/lib/active_support/callbacks.rb:81505 “\n”
262 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/relation.rb:17
87 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/relation/delegation.rb:15
42 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/relation/delegation.rb:16
34 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/connection_adapters/abstract/database_statements.rb:228
31 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activesupport-4.2.3/lib/active_support/dependencies.rb:274
20 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/relation.rb:654
20 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/connection_adapters/postgresql/database_statements.rb:169
7 /Users/user_name/.rvm/gems/ruby-2.2.2/gems/activerecord-4.2.3/lib/active_record/relation/delegation.rb:17
2 /Users/user_name/.rvm/rubies/ruby-2.2.2/lib/ruby/2.2.0/mutex_m.rb:41

``

It seems to me that potentially two improvements could be made here:

a) Freezing the callback strings. While “_initialize_callbacks” and “_find_callbacks” are dynamically generated strings I think the variations are quite few in number, using hash lookup for this could be a large cost saver when fetching multiple objects.

b) Switching from TypeMapAllString in the ActiveRecord Postgres Adapter to native C type-mapping in Ruby-PG.

I
have experimented with a patch for Freezing the callback strings (attached). It may not be very nice but perhaps still preferable to dynamically generating the callback strings per object.

P.S. I have only about ~3 months of experience with Ruby, so I am quite apt to have some serious misunderstandings, please take it easy on me :slight_smile:

callback-strings.patch (1.23 KB)

I appreciate people pointing out slow spots. I’m not the most quallified to speak about AR but I’ll try to help with the proposed patch. You can get a bit more speed by taking out the hash lookup:


require 'benchmark/ips'

var = "world".freeze

hash = { var => "hello #{var}".freeze}

Benchmark.ips do |x|

x.report("hash    ") { hash[var] }

x.report("compare ") { var == "world".freeze ? "hello world".freeze : "hello #{var}" }

x.report("static  ") { "hello world".freeze }

x.report("none    ") { "hello #{var}" }

end

Gives you:


Calculating -------------------------------------

hash        90.986k i/100ms

compare     98.282k i/100ms

static     103.704k i/100ms

none        90.593k i/100ms

Hello Richard,
Thanks for the response.
Unfortunately I’m not sure how it’s possible to do this without the Hash look-up. For example one might imagine adding
something like this to ActiveSupport::Callbacks (which would prevent more than one string from being permanently allocated):

@@_current_kind = nil    

def get_frozen_callback(kind)
  if @@_current_kind == kind
    @@_current_callback
  else
    @@_current_kind = kind
    @@_current_callback = "_#{kind}_callbacks".freeze
  end
end

``

The problem is that this is what a typical run actually looks like:

[1] pry(main)> Symbolie.all.to_a
in #run_callbacks – self: #ActiveRecord::ConnectionAdapters::PostgreSQLAdapter:0x007faa970f41e8, kind: checkout
Symbolie Load (2.6ms) SELECT “symbolies”.* FROM “symbolies”
in #run_callbacks – self: #Symbolie:0x007faa95775cd0, kind: find
in #run_callbacks – self: #Symbolie:0x007faa95775cd0, kind: initialize
in #run_callbacks – self: #Symbolie:0x007faa95775960, kind: find
in #run_callbacks – self: #Symbolie:0x007faa95775960, kind: initialize
in #run_callbacks – self: #Symbolie:0x007faa95775690, kind: find
in #run_callbacks – self: #Symbolie:0x007faa95775690, kind: initialize
in #run_callbacks – self: #Symbolie:0x007faa957753c0, kind: find
in #run_callbacks – self: #Symbolie:0x007faa957753c0, kind: initialize
in #run_callbacks – self: #Symbolie:0x007faa957750f0, kind: find
in #run_callbacks – self: #Symbolie:0x007faa957750f0, kind: initialize
in #run_callbacks – self: #Symbolie:0x007faa95774e20, kind: find
in #run_callbacks – self: #Symbolie:0x007faa95774e20, kind: initialize
in #run_callbacks – self: #Symbolie:0x007faa95774b50, kind: find

etc…

``

The
objects are initialized sequentially so without a hash the strings will still have to be allocated and deallocated on a per-object basis - since
as far as I understand (I may be mistaken here?) it is not useful to freeze a dynamically-generated string in Ruby without storing the reference e.g.

[2] pry(main)> “hello world”.freeze.object_id
=> 70185328616580
[3] pry(main)> “hello world”.freeze.object_id
=> 70185328616580
[4] pry(main)> var = “world”
=> “world”
[5] pry(main)> “hello #{var}”.freeze.object_id
=> 70185328477560
[6] pry(main)> “hello #{var}”.freeze.object_id
=> 70185328139100

``

I
think for all normal applications the memory use from the hash is entirely marginal (about the same as a previous_changes on a single AR object) and since all objects will re-use these calls it doesn’t make
a lot of sense to clear. Actually
I think even if the plan was to clear, it may be quite difficult to do
so - since callbacks are an ActiveSupport concern they should be generally usable without AR (in theory?) – meaning that any class which
extends the concern and calls the define_callbacks and run_callbacks methods should be capable of using them correctly. Perhaps an alternative would be to make the string freezing behaviour specific to the ActiveRecord extension of ActiveSupport callbacks, and guaranteeing that the callbacks hash is cleared after all callbacks are run (it would
need to have transactional not object based context) and may be little
messy and I’m really not sure clearing 10-30 strings from memory (potentially only until the next callback cycle re-allocates them) would justify that.

I will make a pull request on github as per your suggestion.

  • Alex

You’re correct, it wouldn’t work without the hash. I didn’t grok the patch until it was in front of me on github. Thank