Feature proposal: Use find_each/find_in_batches with pluck

I would like to propose a new Rails feature: the ability to use pluck with find_each/find_in_batches, to speed up the loop when there is no need to access Active Record instances.

Do you think this could be incorporated into Rails? For the moment I’m going to implement a solution for our use case in a Rails 3.2 app.

Thanks in advance

Can you share an example of this proposal? Vipul A.M.

+1

I find that one of the most frequent uses of find_each/find_in_batches is looping through a large collection to queue up a list of ids for a background job to process, e.g. queuing up a big list of user ids to send an email to. It would be nice to avoid the overhead of AR objects and just do something like:

User.some_scopes.pluck_each(:id) { |id| … }

User.some_scopes.pluck_in_batches { |batch| … }

Or maybe pluck could be an alternative to select?

User.some_scopes.pluck(:id).find_each { |id| … }


Yes, that is our use case too, and as you said, we wanted to avoid the overhead of AR objects.

+1

By the way, with the new in_batches API you can do this, à la:

User.some_scopes.in_batches.each do |users|
  users.pluck(:id)
end


Yes, “in_batches” will do the trick.

The problem with the proposed in_batches solution is that each batch produces two database queries, instead of one.

Indeed, relation.pluck(:id) triggers a second SQL query, which should be unnecessary for getting the IDs because the batch already has that information.

Is there any way to avoid the second query to the database?

The best solution that I have found so far (only 1 query per batch, selecting only ids) is to use select(:id).find_in_batches; however, it unnecessarily instantiates the ActiveRecord objects… We would need something like “pluck_in_batches” to improve on that.
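For what it's worth, here is a rough sketch of what such a helper might look like. It is not part of Rails; the name pluck_ids_in_batches and its batch_size option are made up for illustration. It assumes an integer primary key called id, no joins that would make "id" ambiguous, and a relation that responds to reorder, limit, where and pluck the way ActiveRecord relations do. Each batch costs a single pluck query and no model objects are built.

```ruby
# Hypothetical helper (not a Rails API): yields arrays of ids, batch by batch.
# Assumes an integer primary key named "id" and no ambiguous "id" columns.
def pluck_ids_in_batches(relation, batch_size: 1000)
  # Order by primary key so "id > last_id" can page through the table.
  scope = relation.reorder(:id).limit(batch_size)
  last_id = nil

  loop do
    batch = last_id ? scope.where("id > ?", last_id) : scope
    ids = batch.pluck(:id) # one SQL query, no ActiveRecord instances
    break if ids.empty?

    yield ids

    # A short batch means there are no more rows after it.
    break if ids.size < batch_size
    last_id = ids.last
  end
end
```

With that in place, something like pluck_ids_in_batches(User.some_scopes, batch_size: 500) { |ids| … } should behave like find_in_batches, except that the block receives plain ids. Like find_in_batches, it replaces any ordering already on the scope.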
