I’d like to use in_batches to find record ids and then enqueue async jobs for them using only the ids.
Currently I can do the following, but it runs two queries per batch instead of one:
>> User.in_batches { |relation| relation.pluck(:id).each { ... } }
(0.9ms) SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
(0.8ms) SELECT "users"."id" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14) [["id", 1], ["id", 2], ["id", 3], ["id", 4], ["id", 5], ["id", 6], ["id", 7], ["id", 8], ["id", 9], ["id", 10], ["id", 11], ["id", 12], ["id", 13], ["id", 14]]
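(As a stopgap, the single-query-per-batch behavior can be approximated by paginating on the primary key directly with pluck. Here is a minimal plain-Ruby sketch of that loop; fetch_ids is a stand-in for something like User.where("id > ?", last_id).order(:id).limit(n).pluck(:id), ALL_IDS stands in for the users table, and the batch size of 5 is only for illustration:)

```ruby
# Stand-in for the users table; in real code this would be the database.
ALL_IDS = (1..14).to_a

# Stand-in for a single keyset-pagination query:
# User.where("id > ?", after_id).order(:id).limit(batch_size).pluck(:id)
def fetch_ids(after_id, batch_size)
  ALL_IDS.select { |id| id > after_id }.first(batch_size)
end

batches = []
last_id = 0
loop do
  ids = fetch_ids(last_id, 5)
  break if ids.empty?
  batches << ids          # here one would enqueue a job per id
  last_id = ids.last      # advance the keyset cursor
end
```

(This issues one query per batch, but it duplicates logic that in_batches already has, which is why yielding the ids directly seems preferable.)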
Instead, I’d like to be able to do the following:
>> User.in_batches { |relation, ids| ids.each { ... } }
(0.9ms) SELECT "users"."id" FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
in_batches already loads the record ids internally, so I think we could accomplish this with a change here:
--- a/activerecord/lib/active_record/relation/batches.rb
+++ b/activerecord/lib/active_record/relation/batches.rb
@@ -257,7 +257,7 @@ def in_batches(of: 1000, start: nil, finish: nil, load: false, error_on_ignore:
primary_key_offset = ids.last
raise ArgumentError.new("Primary key not included in the custom select clause") unless primary_key_offset
- yield yielded_relation
+ yield yielded_relation, ids
break if ids.length < batch_limit
Does that sound OK? I’m happy to prepare a pull request if so.
Thanks.