Rails 4 preload performance

I’m trying to track down why Rails 4.1.0 beta is more than twice as slow as Rails 3.2.13 when performing a standard MyModel.includes(:some_association) query. I’ve already patched an issue that made it 31 times slower (see https://github.com/rails/rails/pull/12090), but the remaining slowdown is still prohibitive (at least for our app). I’m testing by running the following code (see https://gist.github.com/njakobsen/6393783 for a simple test suite).

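RubyProf.start
MyModel.includes(:some_association).to_a  # query under test; to_a forces the load
result = RubyProf.stop
printer = RubyProf::GraphHtmlPrinter.new(result)
# The block form closes the report file once printing finishes.
File.open("…/rails-include-performance-#{Rails.version}.html", 'w') { |file| printer.print(file) }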
One interesting thing I noticed using ruby-prof was that while Rails 3.2.13 made 10145 calls to Class#new, Rails 4.1.0 beta made 67154 while running the same test, more than six times as many. I’m not familiar with all the fundamental differences between Rails 3 and Rails 4, but it looks like we’re now allocating far more objects, and that is having a serious impact on performance. Is there anyone familiar with the association preloader, or someone working on performance tuning for Rails 4, so I can help make it as fast as Rails 3 was?
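
For anyone who wants to cross-check the object counts without a full profiler run, here is a minimal sketch of an allocation counter. The helper name allocations_during is hypothetical (it is not part of Rails or ruby-prof), and it assumes Ruby 2.2 or later, where GC.stat accepts the :total_allocated_objects key. It counts every object allocated while the block runs, not only Class#new calls, so the numbers will not match the ruby-prof figures exactly.

```ruby
# Hypothetical helper (not part of Rails or ruby-prof): returns the number
# of objects allocated while the given block runs.
# Assumes Ruby >= 2.2, where GC.stat accepts :total_allocated_objects.
def allocations_during
  before = GC.stat(:total_allocated_objects) # cumulative counter, unaffected by GC runs
  yield
  GC.stat(:total_allocated_objects) - before
end

# Stand-in workload so the sketch runs outside a Rails app.
# In the app this would be something like:
#   allocations_during { MyModel.includes(:some_association).to_a }
count = allocations_during { 1_000.times { Object.new } }
puts "objects allocated: #{count}"
```

Running the same block under both Rails versions and comparing the two counts should show whether the extra allocations account for the slowdown.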