find_in_batches + query_cache = bloat

Woody_Peterson · December 30, 2010, 11:00pm

We've been using find_in_batches to reduce memory usage, and recently noticed that one of our more intensive background processes had a huge memory footprint (600 MB+) and was getting killed by our memory monitor. We were unable to reproduce this in development, and after investigation, the culprit turned out to be query_cache. Wrapping the task in ActiveRecord::Base.uncached kept the task stable (~200 MB), and it's not hard to imagine why. Looping over thousands of items while eager loading many more likely grows the cache to a huge size, which seems counter to the use case for find_in_batches.

So first of all, this is an FYI. Beyond that, all the ways in which we use find_in_batches would be adversely affected by the query_cache; sure, it *might* make a query faster, but it *definitely* will grow in memory, as you are expected to use it across thousands of records. Given that find_in_batches' use case is to reduce memory when searching across thousands of records, should it not be the default behavior to disable the query cache for find_in_batches operations?
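To make the growth described above concrete, here is a toy model in plain Ruby. It is not ActiveRecord: `ToyConnection`, `select_all`, `cached_rows`, and the SQL strings are all hypothetical names made up for illustration, and the real query cache has more going on (for instance, it is cleared on writes). The sketch only assumes that the cache is keyed by SQL string and that each batch issues a different query, so nothing is ever reused but everything is retained.

```ruby
# Toy model only: NOT ActiveRecord. Every name here is hypothetical.
class ToyConnection
  attr_reader :query_cache

  def initialize(rows)
    @rows = rows            # pretend table: a sorted array of integer ids
    @query_cache = {}       # SQL string => result rows
    @cache_enabled = true
  end

  # Run the block with caching switched off, then restore the old setting.
  def uncached
    previous, @cache_enabled = @cache_enabled, false
    yield
  ensure
    @cache_enabled = previous
  end

  # Return the cached result for this SQL string, or fetch and remember it.
  def select_all(sql, &fetch)
    return fetch.call unless @cache_enabled
    @query_cache[sql] ||= fetch.call
  end

  # Each batch uses a different "id > N" clause, so every batch
  # produces a distinct SQL string and therefore a new cache entry.
  def find_in_batches(batch_size)
    last_id = 0
    loop do
      sql = "SELECT * FROM items WHERE id > #{last_id} ORDER BY id LIMIT #{batch_size}"
      batch = select_all(sql) { @rows.select { |id| id > last_id }.first(batch_size) }
      break if batch.empty?
      yield batch
      last_id = batch.last
    end
  end

  # Total number of rows still referenced by the cache.
  def cached_rows
    @query_cache.values.sum(&:size)
  end
end

cached = ToyConnection.new((1..10_000).to_a)
cached.find_in_batches(1_000) { |batch| batch.size }
puts cached.cached_rows   # all 10,000 rows are still held by the cache

plain = ToyConnection.new((1..10_000).to_a)
plain.uncached { plain.find_in_batches(1_000) { |batch| batch.size } }
puts plain.cached_rows    # nothing was retained
```

In the cached run no lookup ever hits, because no two batches share a SQL string, yet every batch stays reachable until the cache is dropped. That is the opposite of what batching is meant to achieve.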

Frederick_Cheung · December 31, 2010, 11:07am

> We've been using find_in_batches to reduce memory usage, and recently noticed that one of our more intensive background processes had a huge memory footprint (600 MB+) and was getting killed by our memory monitor. We were unable to reproduce this in development, and after investigation, the culprit turned out to be query_cache. Wrapping the task in ActiveRecord::Base.uncached kept the task stable (~200 MB), and it's not hard to imagine why. Looping over thousands of items while eager loading many more likely grows the cache to a huge size, which seems counter to the use case for find_in_batches.

Were you manually turning on the query cache in your background processes? (I was trying to think why I hadn't been bitten by this before, and remembered that the query cache is turned on via an around filter by default, so it doesn't affect scripts run by hand, daemon processes, etc.)

> So first of all, this is an FYI. Beyond that, all the ways in which we use find_in_batches would be adversely affected by the query_cache; sure, it *might* make a query faster, but it *definitely* will grow in memory, as you are expected to use it across thousands of records. Given that find_in_batches' use case is to reduce memory when searching across thousands of records, should it not be the default behavior to disable the query cache for find_in_batches operations?

Seems sensible. I'm not sure how tight the scope of the disabling should be, i.e. should query caching be forced off for the contents of the block as well? The block might be doing lots of stuff that is inherently pointless to cache, but equally it might not.

Fred
