I’ve run into a problem a few times in our Rails codebase where a developer passes an array of ActiveRecord objects as a job argument. The array serialises into an array of Global IDs, which are then queried one by one when deserialising inside the job, resulting in an N+1. This gist replicates the scenario: ActiveJob serialise/deserialise with an array N+1 replication · GitHub
This occurs because of how serialisation and deserialisation currently work for arrays in ActiveJob. When serialising a value that is an Array, ActiveJob simply maps over the array and calls serialize_argument on each element, which converts each record to a Global ID hash.
Upon deserialisation, ActiveJob does basically the same thing, except that it calls deserialize_argument for each element.
Each Global ID is therefore deserialised with its own GlobalID::Locator.locate call, which issues one query per record and produces the N+1.
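To make the round trip concrete, here is a minimal plain-Ruby sketch of the behaviour described above. It does not load Rails: Record and FakeLocator are hypothetical stand-ins for an ActiveRecord model and GlobalID::Locator, and the two helper methods only imitate the per-element mapping rather than reproduce the real ActiveJob source. The counter shows one lookup per array element.

```ruby
# Plain-Ruby stand-in for the array round trip; no Rails required.
# Record and FakeLocator are illustrative assumptions, not real ActiveJob code.
GLOBAL_ID_KEY = "_aj_globalid"

Record = Struct.new(:id) do
  def to_gid_hash
    { GLOBAL_ID_KEY => "gid://app/Record/#{id}" }
  end
end

module FakeLocator
  STORE = (1..5).to_h { |id| [id, Record.new(id)] }

  class << self
    attr_accessor :query_count

    # One lookup per call, like a single find by primary key.
    def locate(gid)
      self.query_count += 1
      STORE.fetch(Integer(gid.split("/").last))
    end
  end
  self.query_count = 0
end

# Arrays are handled by mapping over the elements one at a time.
def serialize_argument(arg)
  case arg
  when Array  then arg.map { |a| serialize_argument(a) }
  when Record then arg.to_gid_hash
  else arg
  end
end

def deserialize_argument(arg)
  case arg
  when Array then arg.map { |a| deserialize_argument(a) }
  when Hash  then arg.key?(GLOBAL_ID_KEY) ? FakeLocator.locate(arg[GLOBAL_ID_KEY]) : arg
  else arg
  end
end

payload = serialize_argument(FakeLocator::STORE.values)
records = deserialize_argument(payload)
puts "#{records.size} records loaded with #{FakeLocator.query_count} lookups"
```

With five records in the array, the locator is called five times, which is the N+1 in miniature.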
This isn’t really a bug, just an unfortunate side-effect of how serialisation/deserialisation currently works for arrays. An obvious workaround for this would be to simply pass an Array of IDs and query for them manually in the job, but it would be great if ActiveJob could be “smarter” here.
Two potential solutions I’ve thought of are:
- We leave serialisation as-is, and update deserialize_argument so that, when it handles an Array, it checks whether the elements are Global ID hashes. If they are, we extract the IDs and use GlobalID::Locator.locate_many.
The main problem I see with this is that we have to worry about arrays of mixed objects. We can’t simply check the first value in the array to decide whether to treat it as an array of Global IDs, because at least one other element might not be a Global ID. If we check the whole array instead, we risk unnecessary performance overhead.
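A rough sketch of this first option, again in plain Ruby with hypothetical stand-ins (Record, FakeLocator, gid_hash?) rather than the real ActiveJob internals. It takes the "check the whole array" route and falls back to the per-element path for mixed arrays. The fake locate_many counts as a single query; how many queries the real GlobalID::Locator.locate_many issues for arrays spanning several models is not modelled here.

```ruby
# Sketch of option 1: batch only when every element is a Global ID hash.
# All names except the "_aj_globalid" key are illustrative assumptions.
GLOBAL_ID_KEY = "_aj_globalid"
Record = Struct.new(:id)

module FakeLocator
  STORE = (1..5).to_h { |id| [id, Record.new(id)] }

  class << self
    attr_accessor :query_count

    def locate(gid)
      self.query_count += 1
      STORE.fetch(id_from(gid))
    end

    # Batched lookup: one query for the whole list, results in input order.
    def locate_many(gids)
      self.query_count += 1
      gids.map { |gid| STORE.fetch(id_from(gid)) }
    end

    private

    def id_from(gid)
      Integer(gid.split("/").last)
    end
  end
  self.query_count = 0
end

def gid_hash?(arg)
  arg.is_a?(Hash) && arg.size == 1 && arg.key?(GLOBAL_ID_KEY)
end

def deserialize_argument(arg)
  case arg
  when Array
    # Full scan of the array: this is the overhead mentioned above.
    if !arg.empty? && arg.all? { |a| gid_hash?(a) }
      FakeLocator.locate_many(arg.map { |a| a[GLOBAL_ID_KEY] })
    else
      arg.map { |a| deserialize_argument(a) }
    end
  when Hash
    gid_hash?(arg) ? FakeLocator.locate(arg[GLOBAL_ID_KEY]) : arg
  else
    arg
  end
end

gid = ->(id) { { GLOBAL_ID_KEY => "gid://app/Record/#{id}" } }

uniform = deserialize_argument([gid.(1), gid.(2), gid.(3)])
uniform_queries = FakeLocator.query_count

FakeLocator.query_count = 0
mixed = deserialize_argument([gid.(1), "note", gid.(2)])
mixed_queries = FakeLocator.query_count
```

The uniform array is loaded in one batched lookup, while the mixed array still costs one lookup per record, so the N+1 survives whenever a single non-record element is present.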
- We add a custom serialiser for ActiveRecord::Relation objects into ActiveJob
The way this would work is we would add a new key GLOBAL_ID_MANY_KEY similar to GLOBAL_ID_KEY. On serialisation we would execute the query, build a Global ID hash for each record, and then put the resulting array in a hash with that key. When deserialising we would check for a hash with that key, extract the Global IDs and then query for all of the records at once with GlobalID::Locator.locate_many.
The main problem I see with this is the behaviour may be somewhat surprising, especially considering the argument would essentially be converted into an array of records when accessed inside of the job.
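The second option could look roughly like the sketch below. Everything here is an assumption for illustration: GLOBAL_ID_MANY_KEY and its string value do not exist in ActiveJob today, FakeRelation stands in for ActiveRecord::Relation, and FakeLocator stands in for GlobalID::Locator.

```ruby
# Sketch of option 2: a dedicated wrapper key for relations.
# GLOBAL_ID_MANY_KEY is hypothetical; FakeRelation/FakeLocator are stand-ins.
GLOBAL_ID_KEY = "_aj_globalid"
GLOBAL_ID_MANY_KEY = "_aj_globalid_many"

Record = Struct.new(:id)
FakeRelation = Struct.new(:records) # in Rails, loading records runs the query

module FakeLocator
  STORE = (1..5).to_h { |id| [id, Record.new(id)] }

  class << self
    attr_accessor :query_count

    # Batched lookup: one query for the whole list.
    def locate_many(gids)
      self.query_count += 1
      gids.map { |gid| STORE.fetch(Integer(gid.split("/").last)) }
    end
  end
  self.query_count = 0
end

def serialize_argument(arg)
  case arg
  when FakeRelation
    # Execute the relation, then wrap the per-record GID hashes under one key.
    gid_hashes = arg.records.map { |r| { GLOBAL_ID_KEY => "gid://app/Record/#{r.id}" } }
    { GLOBAL_ID_MANY_KEY => gid_hashes }
  else
    arg
  end
end

def deserialize_argument(arg)
  if arg.is_a?(Hash) && arg.key?(GLOBAL_ID_MANY_KEY)
    gids = arg[GLOBAL_ID_MANY_KEY].map { |h| h.fetch(GLOBAL_ID_KEY) }
    FakeLocator.locate_many(gids)
  else
    arg
  end
end

relation = FakeRelation.new(FakeLocator::STORE.values_at(2, 4))
payload  = serialize_argument(relation)
result   = deserialize_argument(payload)
```

Note that result is a plain Array of records rather than a relation, which is the surprising behaviour mentioned above: anything the job does with the argument has to work on an already-loaded array.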
Has anyone else run into this problem, or does anyone have other creative solutions? Keen to hear any thoughts on the topic.