Currently, checking for array inclusion in Rails generates an IN list:
User.where(id: [1,2,3]).to_a
SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2, 3)
This is a problem because pg_stat_statements does not normalize the variable number of values that can be passed through IN lists: if the array you’re matching reaches a size of 100, as many as 100 duplicate queries can be tracked by pg_stat_statements. The result is twofold:
- developers aren’t able to monitor pg_stat_statements to get an accurate view of their query performance because queries are split up into hundreds of duplicates
- production applications generate so many duplicate queries that the historical data is pruned from pg_stat_statements, which makes historical analysis more difficult and is unnecessary work for the database
There have long been discussions on the Postgres mailing list to change this behavior: the current thread proposes grouping queries into buckets by the number of IN list values: 0-9, 10-99, 100-999, etc. But due to the complex implementation and lack of clarity on what should be the default behavior, it’s unlikely to be committed anytime soon.
Rails can resolve this issue in the meantime by constructing queries using any()
instead:
sql = <<~SQL
SELECT * FROM users WHERE id = any($1::bigint[])
SQL
ids = [1, 2, 3]
binds = [
ActiveModel::Attribute.from_database('ids', ids, ActiveRecord::Type.lookup(:text, array: true)),
]
User.connection.select_rows(sql, '', binds)
SELECT * FROM users WHERE id = any($1::bigint[])
[["ids", "{1,2,3}"]]
I would like to open a PR to make this the default behavior for Postgres. Thoughts on possible roadblocks to getting such a PR merged?