Yield record ids to "in_batches" block

I’d like to use in_batches to find record ids and then enqueue async jobs for them using only the ids.

Currently I can do the following, but it issues two queries per batch instead of one:

>> User.in_batches { |relation| relation.pluck(:id).each { ... } }
(0.9ms)  SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1000]]
(0.8ms)  SELECT "users"."id" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14)  [["id", 1], ["id", 2], ["id", 3], ["id", 4], ["id", 5], ["id", 6], ["id", 7], ["id", 8], ["id", 9], ["id", 10], ["id", 11], ["id", 12], ["id", 13], ["id", 14]]

I’d like instead to be able to do the following:

>> User.in_batches { |relation, ids| ids.each { ... } }
(0.9ms)  SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1000]]

in_batches already loads the record ids, so I think we could accomplish this with a one-line change to batches.rb:

--- a/activerecord/lib/active_record/relation/batches.rb
+++ b/activerecord/lib/active_record/relation/batches.rb
@@ -257,7 +257,7 @@ def in_batches(of: 1000, start: nil, finish: nil, load: false, error_on_ignore:
         primary_key_offset = ids.last
         raise ArgumentError.new("Primary key not included in the custom select clause") unless primary_key_offset
 
-        yield yielded_relation
+        yield yielded_relation, ids
 
         break if ids.length < batch_limit
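To show what the proposed change saves, here is a plain-Ruby toy model of the loop the diff touches. It is not Rails's implementation: `FakeTable`, `FakeRelation` and `toy_in_batches` are hypothetical names, an in-memory sorted array stands in for the users table, and every method that would hit the database bumps a query counter.

```ruby
# Toy model of in_batches with the proposed `yield relation, ids` change.
# Every method that would hit the database increments a query counter.
class FakeTable
  attr_reader :queries

  def initialize(ids)
    @ids = ids.sort
    @queries = 0
  end

  # Models: SELECT id FROM users WHERE id > offset ORDER BY id LIMIT limit
  def select_ids_after(offset, limit)
    @queries += 1
    @ids.select { |id| offset.nil? || id > offset }.first(limit)
  end

  # Models: SELECT id FROM users WHERE id IN (...)
  def select_ids_in(ids)
    @queries += 1
    @ids & ids
  end
end

# One batch's relation: it only remembers its ids, like WHERE id IN (...).
FakeRelation = Struct.new(:table, :ids) do
  # Models relation.pluck(:id), which runs a second query per batch.
  def pluck_id
    table.select_ids_in(ids)
  end
end

def toy_in_batches(table, of: 1000)
  offset = nil # plays the role of primary_key_offset in batches.rb
  loop do
    ids = table.select_ids_after(offset, of)
    break if ids.empty?

    offset = ids.last
    yield FakeRelation.new(table, ids), ids # proposed: also yield the ids
    break if ids.length < of
  end
end

# Current approach: pluck ids again from each yielded relation.
current = FakeTable.new((1..2500).to_a)
toy_in_batches(current, of: 1000) { |relation, _ids| relation.pluck_id }

# Proposed approach: reuse the ids the batching loop already loaded.
proposed = FakeTable.new((1..2500).to_a)
enqueued = []
toy_in_batches(proposed, of: 1000) { |_relation, ids| enqueued.concat(ids) }

puts current.queries  # => 6
puts proposed.queries # => 3
```

In this model, plucking from each yielded relation costs two queries per batch (6 for 2,500 ids at `of: 1000`), while using the yielded ids costs one (3). For a million rows at the default batch size, that is roughly 2,000 queries versus 1,000.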

Does that sound OK? I’m happy to prepare a pull request if so.

Thanks.


I think that makes sense. I wonder if we could even consider adding a capability like pluck_in_batches.
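A pluck_in_batches-style helper could be sketched as keyset pagination over the primary key, issuing one id-only query per batch. `pluck_ids_in_batches` and its `fetch_ids` callable are hypothetical names, not existing Rails API; the callable is injected so the sketch runs without a database.

```ruby
# Hypothetical helper: yields arrays of primary keys, one query per batch.
# `fetch_ids` is any callable taking (after_id, limit) and returning up to
# `limit` ids greater than after_id, ascending (after_id nil = from the start).
def pluck_ids_in_batches(fetch_ids, of: 1000)
  after_id = nil
  loop do
    ids = fetch_ids.call(after_id, of)
    break if ids.empty?

    yield ids
    break if ids.length < of # a short batch means we reached the end

    after_id = ids.last # keyset: the next batch starts after this id
  end
end

# In-memory stand-in for the users table, counting simulated queries.
user_ids = (1..2500).to_a
queries = 0
fetch = lambda do |after_id, limit|
  queries += 1
  user_ids.select { |id| after_id.nil? || id > after_id }.first(limit)
end

batch_sizes = []
pluck_ids_in_batches(fetch, of: 1000) { |ids| batch_sizes << ids.size }
p batch_sizes # => [1000, 1000, 500]
p queries     # => 3
```

Against a real database, `fetch_ids` could be an untested variant such as `->(after, n) { scope = User.order(:id).limit(n); scope = scope.where("id > ?", after) if after; scope.pluck(:id) }`, which issues an id-only query much like the first one in the log above.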

On an unrelated note, unless you are very cautious about memory, you could simply do:

User.select(:id).find_each { |user| MyJob.perform_later(id: user.id) }


Yup, that’s pretty much what I’m doing now. But if I’m fetching a million records in batches and only need the ids, it’d be nice to skip instantiating all those ActiveRecord objects, especially since the existing code already has all the ids in hand.