find_in_batches + query_cache = bloat

We've been using find_in_batches to reduce memory usage, and recently noticed one of our more intensive background processes had a huge memory footprint (600mb+) and was getting killed by our memory monitor. We were unable to reproduce this in development, and after investigation, the culprit is query_cache. Wrapping the task in ActiveRecord::Base#uncached kept the task stable (~200mb), and it's not hard to imagine why. Looping over thousands of items while eager loading many more likely grows the cache to huge amounts, which seems counter to the use case for find_in_batches.

So first of all, this is an FYI. Beyond that, all the ways in which we use find_in_batches would be aversely affected by the query_cache; sure, it *might* make a query faster, but it *definitely* will grow in memory, as you are expected to use it across thousands of records. Given that find_in_batches' use case is to reduce memory when searching across thousands of records, should it not be default behavior to disable query cache for find_in_batches operations?

We've been using find_in_batches to reduce memory usage, and recently noticed one of our more intensive background processes had a huge memory footprint (600mb+) and was getting killed by our memory monitor. We were unable to reproduce this in development, and after investigation, the culprit is query_cache. Wrapping the task in ActiveRecord::Base#uncached kept the task stable (~200mb), and it's not hard to imagine why. Looping over thousands of items while eager loading many more likely grows the cache to huge amounts, which seems counter to the use case for find_in_batches.

Were you manually turning on query cache in your background processes? (I was trying to think why I hadn't been bitten by this before and remembered that the query cache is turned on via an around filter by default, so doesn't affect scripts run by hand, daemon processes etc)

So first of all, this is an FYI. Beyond that, all the ways in which we use find_in_batches would be aversely affected by the query_cache; sure, it *might* make a query faster, but it *definitely* will grow in memory, as you are expected to use it across thousands of records. Given that find_in_batches' use case is to reduce memory when searching across thousands of records, should it not be default behavior to disable query cache for find_in_batches operations?

Seems sensible. I'm not sure how tight the scope of your disabling should be, ie should query caching be forced off for the contents of the block? The block might also be doing lots of stuff that is inherently pointless to cache, but equally it might not.

Fred