Improving ActiveRecord::LogSubscriber#log_query_source

I was benchmarking a real-world complex rails app and found that ActiveRecord::LogSubscriber#log_query_source contributes as much as 80-120ms. I was wondering if we could improve this.

From what I found, simply changing caller with yields significant improvements, and should be equivalent.

 def log_query_source_old
  source = extract_query_source_location(caller)                                                                       
  logger.debug("  ↳ #{source}") if source
def log_query_source_new
  source = extract_query_source_location(                                                                       
  logger.debug("  ↳ #{source}") if source

 Benchmark.ips do |b|'old using caller') { log_query_source_old }'new using caller_locations') { log_query_source_new }!

On a complex real-world rails app (where the log_query_source often contributes 80-120ms in a request)

new using caller_locations:    12436.4 i/s
          old using caller:    7101.4 i/s - 1.75x  (± 0.00) slower

On a brand new rails app

new using caller_locations:    14611.3 i/s
          old using caller:    7947.0 i/s - 1.84x  (± 0.00) slower

In a real world app, making this change took the time it took for this method down from 80ms to ~45ms in a single request (totalling all the time it’s invoked in a single request for all SQL queries)

Now, it wouldn’t matter if it was even 5x faster on a new rails app, the difference would ultimately be insignificant. Still, as the size increases, and as the #no of SQL queries increases this adds significant overhead to every request.

An explanation for why the latter might be faster(can ignore):

That might seem weird at first.

Why would caller_locations which returns array of ruby objects be faster than simply using caller which returns array of strings?

Dwelving into the implementation, I think it’s because, the actual string computation is fairly expensive and when using caller it has to compute the actual backtrace string for every single item in the stack, even though in a lot of the cases, what we care about is going to be near the top. caller_location on the other hand does not compute the string, but it’s to_s map to the same C method that caller uses.

You can see this stuff here:

  1. Both caller and caller_location use this. For caller, to_str is true so you get an array of strings, and for the latter, it’s false so you get an array of Thread::Backtrace::Location objects.
  2. This is what effectively every element in caller uses.
  3. This is the equivalent for #2 for caller_location. Note: how we are not computing the location of every item.
  4. And, I think this is where we make the to_s for a Thread::Backtrace::Location object use the same implementation that caller uses. (Disclaimer: I’ve never worked on C code, so if you see any problems, feel free to let me know)

So using caller_location avoids computing the string for every single item in advance (a brand new rails app had around ~140 items in the array, the complex one I originally started with had around ~210 items on average)

This was on MRI. I haven’t seen what difference it makes on other ruby runtimes. I was reading the source for JRuby, and this stuff seems to have been implemented like how it’s done in MRI, so I’d assume we should see a similar improvement.

Not sure if things like these are usually accepted into Rails, so thought to share it here if anyone’s interested.

Can look into preparing a PR if there’s positive outlook.