find_in_batches has several major limitations:

- order and limit are not supported
- joins can break it (i.e. if there end up being multiple rows with the same primary key, the next batch might miss some that were truncated by the limit)
- it forces a table scan because it’s ordering by primary key, making it inefficient

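
To make the join problem concrete, here is a minimal plain-Ruby simulation of primary-key batching (fetch rows with id greater than the last seen id, ordered by id, limited to the batch size). It does not touch ActiveRecord; the ROWS data and the each_batch_by_primary_key helper are made up for illustration.

```ruby
# Rows as a join might return them: the same primary key (id 2) appears
# several times because it matched several joined records.
ROWS = [
  { id: 1, comment: "a" },
  { id: 2, comment: "b" },
  { id: 2, comment: "c" },
  { id: 2, comment: "d" },
  { id: 3, comment: "e" }
].freeze

# Hypothetical helper mimicking "WHERE id > last_id ORDER BY id LIMIT n".
def each_batch_by_primary_key(rows, batch_size)
  last_id = nil
  loop do
    batch = rows
      .select { |r| last_id.nil? || r[:id] > last_id }
      .sort_by { |r| [r[:id], r[:comment]] } # deterministic order within an id
      .first(batch_size)
    break if batch.empty?
    yield batch
    break if batch.size < batch_size
    # The next batch starts strictly after this id, so any rows with the
    # same id that the limit cut off are never fetched.
    last_id = batch.last[:id]
  end
end

seen = []
each_batch_by_primary_key(ROWS, 2) { |batch| seen.concat(batch) }
missed = ROWS - seen
puts "missed: #{missed.inspect}"
```

With a batch size of 2, the first batch ends in the middle of the id 2 group, and the rows with comments "c" and "d" are skipped.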
All of these can be worked around by using either cursors or temporary tables. Would a patch to automatically use such features (when the need for them is detected, as in the places where it currently warns about order and limit) be accepted? What would be the suggested way to structure such a patch, given that it would use DB-specific features? Add some stubs to SchemaStatements or AbstractAdapter, and call them from find_in_batches? I’m guessing that detecting the adapter inline is frowned upon.
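
As a rough sketch of the structure I have in mind: a capability stub on the base adapter, overridden by adapters that can do better, with find_in_batches asking the adapter instead of checking its class. Every name below (AbstractAdapterSketch, CursorAdapterSketch, supports_batch_cursors?, each_batch_with_cursor, find_in_batches_sketch) is hypothetical and not part of ActiveRecord, and in-memory arrays stand in for real queries.

```ruby
# Hypothetical stand-in for a base adapter class; not the real ActiveRecord API.
class AbstractAdapterSketch
  # Default stub: no cursor support, so callers keep the existing strategy.
  def supports_batch_cursors?
    false
  end

  def each_batch_with_cursor(_rows, _batch_size)
    raise NotImplementedError, "#{self.class} does not support cursors"
  end
end

# Hypothetical adapter for a database that supports cursors.
class CursorAdapterSketch < AbstractAdapterSketch
  def supports_batch_cursors?
    true
  end

  # A real adapter would open a cursor and fetch from it; slicing an
  # in-memory array stands in for that here and preserves the given order.
  def each_batch_with_cursor(rows, batch_size, &block)
    rows.each_slice(batch_size, &block)
  end
end

# Asks the adapter what it can do instead of detecting the adapter inline.
# Returns the name of the strategy used, for illustration only.
def find_in_batches_sketch(adapter, rows, batch_size, &block)
  if adapter.supports_batch_cursors?
    adapter.each_batch_with_cursor(rows, batch_size, &block)
    :cursor
  else
    # Stand-in for the current behaviour: forced primary-key order.
    rows.sort.each_slice(batch_size, &block)
    :primary_key
  end
end

cursor_batches = []
cursor_strategy = find_in_batches_sketch(CursorAdapterSketch.new, [3, 1, 2], 2) { |b| cursor_batches << b }

fallback_batches = []
fallback_strategy = find_in_batches_sketch(AbstractAdapterSketch.new, [3, 1, 2], 2) { |b| fallback_batches << b }
```

The point of the capability method is that find_in_batches stays adapter-agnostic, and adapters without cursor support need no changes at all.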
For reference, we’re using such an implementation right now in our project (Rails 2.3; we’re in the process of upgrading): canvas-lms/active_record.rb at release/2013-11-16.13 · instructure/canvas-lms · GitHub.
Cody Cutrer