Yield record ids to "in_batches" block

jordan-brough · July 18, 2022, 6:46pm

I’d like to use in_batches to find record ids and then enqueue async jobs for them using only the ids.

Currently I can do the following, but it generates 2 queries per batch instead of 1:

>> User.in_batches { |relation| relation.pluck(:id).each { ... } }
(0.9ms)  SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1000]]
(0.8ms)  SELECT "users"."id" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)  [["id", 1], ["id", 2], ["id", 3], ["id", 4], ["id", 5], ["id", 6], ["id", 7], ["id", 8], ["id", 9], ["id", 10], ["id", 11], ["id", 12], ["id", 13], ["id", 14]]

I’d like instead to be able to do the following:

>> User.in_batches { |relation, ids| ids.each { ... } }
(0.9ms)  SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1000]]

in_batches already loads the record ids, so I think we could accomplish this with a change here:

--- a/activerecord/lib/active_record/relation/batches.rb
+++ b/activerecord/lib/active_record/relation/batches.rb
@@ -257,7 +257,7 @@ def in_batches(of: 1000, start: nil, finish: nil, load: false, error_on_ignore:
         primary_key_offset = ids.last
         raise ArgumentError.new("Primary key not included in the custom select clause") unless primary_key_offset
 
-        yield yielded_relation
+        yield yielded_relation, ids
 
         break if ids.length < batch_limit

Does that sound OK? I’m happy to prepare a pull request if so.

Thanks.

nikita · July 18, 2022, 8:20pm

I think that makes sense. I wonder if we could even consider adding a capability like pluck_in_batches.

On an unrelated note, unless you are very cautious about memory, you could do

User.select(:id).find_each { |user| MyJob.perform_later(id: user.id) }

jordan-brough · July 19, 2022, 9:58pm

unless you are very cautious about memory, you could do …

Yup, and that’s pretty much what I’m doing now. But if I’m fetching a million records in batches and only need the ids, it’d be nice to skip instantiating all the ActiveRecord objects, especially since the existing code already has the ids in hand.
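
Until something like this lands, a hand-rolled loop can get one query per batch with no model instantiation. The following is only a minimal sketch, not part of Rails: each_id_batch is a hypothetical helper name, and it assumes an integer primary key column named id and a relation without joins (so the bare "id" in the WHERE clause is unambiguous).

```ruby
# Hypothetical helper (not a Rails API): yields arrays of ids, one query per batch.
# Assumes an integer primary key called `id` and a relation without joins.
def each_id_batch(relation, of: 1000)
  last_id = nil

  loop do
    # Keyset pagination: order by id and continue after the last id seen.
    scope = relation.reorder(:id).limit(of)
    scope = scope.where("id > ?", last_id) if last_id

    ids = scope.pluck(:id) # only the ids are loaded, no model objects
    break if ids.empty?

    yield ids

    # A short batch means there is nothing left to fetch.
    break if ids.length < of
    last_id = ids.last
  end
end
```

Usage would look something like each_id_batch(User.all) { |ids| ids.each { |id| MyJob.perform_later(id: id) } }. Unlike in_batches, this does not hand back a relation, so it only fits cases where the ids are all you need.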
