Counter caches are used to avoid costly SQL queries like `parent.children.any?`, `parent.children.none?`, and `parent.children.count`, or to run queries based on the column itself.
It is very easy to add a counter cache column to a new table: just add a column.
It is also easy to add one to a small existing table: add the column, lock the referenced (child) table so that no new records can be added, and backfill the column, all in a single transaction. On a small table this does not take long and is safe.
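For illustration, the small-table case could look roughly like the sketch below. The `posts`/`comments` schema, the constant names, and the migration class are all made up for this example, and the SQL assumes PostgreSQL (where Rails runs a migration inside one transaction by default), so treat it as a starting point rather than a recipe.

```ruby
# Hypothetical schema: posts has_many comments; comments.post_id references posts.id.
# The SQL below assumes PostgreSQL.

# Blocks concurrent INSERT/UPDATE/DELETE on comments until the transaction ends,
# while still allowing reads.
LOCK_COMMENTS_SQL = "LOCK TABLE comments IN SHARE MODE"

# Sets every post's counter to its current number of comments.
BACKFILL_COMMENTS_COUNT_SQL = <<~SQL
  UPDATE posts
  SET comments_count = (
    SELECT COUNT(*) FROM comments WHERE comments.post_id = posts.id
  )
SQL

# Defined only when Active Record is loaded, so this file also loads in plain Ruby.
if defined?(ActiveRecord::Migration)
  # Adjust the version in brackets to the Rails version in use.
  class AddCommentsCountToPosts < ActiveRecord::Migration[7.1]
    # The lock is held until the surrounding migration transaction commits,
    # so no comments can be added between the backfill and the commit.
    def up
      add_column :posts, :comments_count, :integer, default: 0, null: false
      execute LOCK_COMMENTS_SQL
      execute BACKFILL_COMMENTS_COUNT_SQL
    end

    def down
      remove_column :posts, :comments_count
    end
  end
end
```

The lock is what makes this unsuitable for big tables: writes to `comments` wait for the whole backfill to finish.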
Adding a counter cache to a big table is problematic. It should be done in multiple steps.
One way:
- add a counter column to the parent table
- add triggers to the referenced table to update the parent table when records are created/deleted
- add `counter_cache: true` to the `belongs_to` association on the child model
- delete the triggers
Or we can change step 2 to use a child model callback instead.
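To make step 2 concrete, here is one possible shape for the temporary triggers. Everything named here (the `posts`/`comments` tables, the function and trigger names, the migration class) is hypothetical, and the SQL assumes PostgreSQL 11 or newer.

```ruby
# Hypothetical schema: comments.post_id references posts.id,
# and posts.comments_count is the new counter column.

# Trigger function: adjusts the parent's counter on each child insert/delete.
CREATE_FUNCTION_SQL = <<~'SQL'
  CREATE FUNCTION sync_posts_comments_count() RETURNS trigger AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      UPDATE posts SET comments_count = comments_count + 1 WHERE id = NEW.post_id;
      RETURN NEW;
    ELSE
      UPDATE posts SET comments_count = comments_count - 1 WHERE id = OLD.post_id;
      RETURN OLD;
    END IF;
  END;
  $$ LANGUAGE plpgsql
SQL

# Row-level trigger that calls the function above.
CREATE_TRIGGER_SQL = <<~SQL
  CREATE TRIGGER comments_counter_cache
  AFTER INSERT OR DELETE ON comments
  FOR EACH ROW EXECUTE FUNCTION sync_posts_comments_count()
SQL

TRIGGER_STATEMENTS = [CREATE_FUNCTION_SQL, CREATE_TRIGGER_SQL].freeze

# Run these in a later migration, once counter_cache: true is deployed.
DROP_STATEMENTS = [
  "DROP TRIGGER comments_counter_cache ON comments",
  "DROP FUNCTION sync_posts_comments_count()"
].freeze

# Defined only when Active Record is loaded, so this file also loads in plain Ruby.
if defined?(ActiveRecord::Migration)
  # Adjust the version in brackets to the Rails version in use.
  class AddCommentsCounterTriggers < ActiveRecord::Migration[7.1]
    def up
      TRIGGER_STATEMENTS.each { |sql| execute sql }
    end

    def down
      DROP_STATEMENTS.each { |sql| execute sql }
    end
  end
end
```

Even in this reduced form it is a fair amount of database-specific code that exists only to be deleted later.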
But in either case, this makes introducing counter caches more complex than it needs to be and many people (myself included some years ago) just don’t think about (aren’t aware of) this problem and do it the straightforward (incorrect) way.
So I propose a new option, something like `counter_cache: { filled: false }`, which skips using the column to optimize `COUNT`-based queries (like the examples at the beginning) and maybe raises an error when trying to access it via `record.somethings_count`.
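To pin down the semantics I have in mind, here is a plain-Ruby model of them. This is not Rails code and not an existing API; the class and method names are invented for the sketch. Writes keep the column in sync, while reads ignore it (or raise) until it is marked as filled.

```ruby
# Stand-in for the counter column on a parent record.
class CounterCacheColumn
  class NotFilledError < StandardError; end

  def initialize(filled:)
    @filled = filled
    @value = 0 # a freshly added column starts at its default
  end

  # Writes always update the column, even before the backfill.
  def increment(by = 1)
    @value += by
  end

  # Direct reads are refused until the column is marked as filled.
  def value
    raise NotFilledError, "counter cache is not backfilled yet" unless @filled

    @value
  end

  def filled?
    @filled
  end
end

class Parent
  def initialize(existing_children:, counter_filled:)
    @children = existing_children.dup # rows that predate the column
    @counter = CounterCacheColumn.new(filled: counter_filled)
  end

  def add_child(child)
    @children << child
    @counter.increment
  end

  # Uses the column only when it is trustworthy; otherwise counts the
  # children directly (the stand-in for a COUNT query).
  def children_count
    @counter.filled? ? @counter.value : @children.size
  end

  # Direct column access, like record.somethings_count in the proposal.
  def cached_children_count
    @counter.value
  end
end

parent = Parent.new(existing_children: [:a, :b], counter_filled: false)
parent.add_child(:c)
parent.children_count # => 3, counted directly because the column only holds 1
```

Calling `parent.cached_children_count` at this point raises `CounterCacheColumn::NotFilledError`, which mirrors the "maybe raises" part of the proposal.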
With this change, the new process for adding a counter cache will be like the following:
- add a column
- add `counter_cache: { filled: false }`
- backfill
- remove `filled: false`, so now the app uses this column
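For the backfill step on a big table, one option is to recompute the counters in id batches so that no single statement touches the whole table. The helpers below are a rough sketch with made-up names, again assuming a hypothetical `posts`/`comments` schema with integer ids.

```ruby
# Splits the id span min_id..max_id into consecutive ranges of at most
# batch_size ids each.
def batch_ranges(min_id, max_id, batch_size)
  return [] if max_id < min_id

  (min_id..max_id).step(batch_size).map do |first|
    first..[first + batch_size - 1, max_id].min
  end
end

# SQL that recomputes the counter for one batch of posts.
def backfill_batch_sql(id_range)
  <<~SQL
    UPDATE posts
    SET comments_count = (
      SELECT COUNT(*) FROM comments WHERE comments.post_id = posts.id
    )
    WHERE posts.id BETWEEN #{Integer(id_range.begin)} AND #{Integer(id_range.end)}
  SQL
end

batch_ranges(1, 10, 4) # => [1..4, 5..8, 9..10]
```

In a Rails app each generated statement could be run with `ActiveRecord::Base.connection.execute`, ideally outside of a long-running transaction.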
Nice and simple: no triggers, no callbacks, and less chance of a race condition.
Wdyt? I would like to introduce this feature into Rails.
@byroot Interested in your opinion on this. Does Shopify use Rails' counter caches, and how do you solve the problem described above?