I was benchmarking a real-world complex rails app and found that ActiveRecord::LogSubscriber#log_query_source contributes as much as 80-120ms. I was wondering if we could improve this.
From what I found, simply changing caller
with caller_locations.lazy.map(&:to_s)
yields significant improvements, and should be equivalent.
def log_query_source_old
source = extract_query_source_location(caller)
logger.debug(" ↳ #{source}") if source
end
def log_query_source_new
source = extract_query_source_location(caller_locations.lazy.map(&:to_s))
logger.debug(" ↳ #{source}") if source
end
Benchmark.ips do |b|
b.report('old using caller') { log_query_source_old }
b.report('new using caller_locations') { log_query_source_new }
b.compare!
end
On a complex real-world rails app (where the log_query_source often contributes 80-120ms in a request)
Comparison:
new using caller_locations: 12436.4 i/s
old using caller: 7101.4 i/s - 1.75x (± 0.00) slower
On a brand new rails app
Comparison:
new using caller_locations: 14611.3 i/s
old using caller: 7947.0 i/s - 1.84x (± 0.00) slower
In a real world app, making this change took the time it took for this method down from 80ms to ~45ms in a single request (totalling all the time it’s invoked in a single request for all SQL queries)
Now, it wouldn’t matter if it was even 5x faster on a new rails app, the difference would ultimately be insignificant. Still, as the size increases, and as the #no of SQL queries increases this adds significant overhead to every request.
An explanation for why the latter might be faster(can ignore):
That might seem weird at first.
Why would caller_locations
which returns array of ruby objects be faster than simply using caller
which returns array of strings?
Dwelving into the implementation, I think it’s because, the actual string computation is fairly expensive and when using caller
it has to compute the actual backtrace string for every single item in the stack, even though in a lot of the cases, what we care about is going to be near the top.
caller_location
on the other hand does not compute the string, but it’s to_s
map to the same C method that caller
uses.
You can see this stuff here:
- Both caller and caller_location use this. For
caller
,to_str
is true so you get an array of strings, and for the latter, it’s false so you get an array of Thread::Backtrace::Location objects. -
This is what effectively every element in
caller
uses. -
This is the equivalent for #2 for
caller_location
. Note: how we are not computing the location of every item. - And, I think this is where we make the
to_s
for a Thread::Backtrace::Location object use the same implementation thatcaller
uses. (Disclaimer: I’ve never worked on C code, so if you see any problems, feel free to let me know)
So using caller_location avoids computing the string for every single item in advance (a brand new rails app had around ~140 items in the array, the complex one I originally started with had around ~210 items on average)
This was on MRI. I haven’t seen what difference it makes on other ruby runtimes. I was reading the source for JRuby, and this stuff seems to have been implemented like how it’s done in MRI, so I’d assume we should see a similar improvement.
Not sure if things like these are usually accepted into Rails, so thought to share it here if anyone’s interested.
Can look into preparing a PR if there’s positive outlook.