Is Active Record's query cache actually useful?

I recently had to fix stale Active Record query cache in rodauth-rails – RodauthApp.rodauth.create_account does not clear query cache · Issue #204 · janko/rodauth-rails · GitHub.

Why does Active Record have a query cache, and why is it enabled by default? Does it actually make a real performance difference? I thought developers are smart enough not to repeat expensive queries multiple times within a request, and that it shouldn’t be Active Record’s job to help with this.

Sequel doesn’t have query caching functionality, and I don’t remember it ever being requested. Because I cannot find any resources around the decision to add it to AR, I can’t help but think it doesn’t provide value, that it’s just an unnecessary maintenance burden for Active Record.

2 Likes

I wondered the same things a few years back so I instrumented it in our app expecting to find very few hits and that I’d remove it. But the hit rate was surprisingly decent (~15% if my memory serves me right), enough that we decided to keep it.

That said it’s a single datapoint and I’d welcome more feedback.

As for the issue you encountered, it is know and I’ve meant to fix it for while, I’m planning to make low-level APIs like #execute clear cache by default to avoid these problems.

Another issue I hear from time to time is very long jobs causing the cache to grow a lot, for that I think we should make it some sort of LRU.

Sequel doesn’t have query caching functionality, and I don’t remember it ever being requested.

I don’t think it’s a sound argument.

4 Likes

I think there is a low-contention step that would help massively here. We need assertions in minitest for the number of DB queries that are actually sent to the DB.

Engineers need an easy way to test that a piece of functionality (controller actions in particular) are generating the expected number of DB hits and that mechanisms like includes and query caching and such are all functioning as they expect.

If we had that and used it for a year, I think everyone would know quite well how often (or not) things like query caching are helping.

I thought developers are smart enough not to repeat expensive queries multiple times within a request

It’s not about intelligence but trade-offs. It’s very easy in ActiveRecord to run the same query multiple times because of the abstractions. There are some fairly low-hanging fruit such as includes and the inverse_of option on relationships that can help reduce this but sometimes there is time that has to be spent to track down the source of redundant queries and coming up with a way to avoid them. Depending on the type of work being done there may or may not be the priority to spend that time. How much traffic is the app getting, how much use is that specific function getting, funding, etc.

For environments where the resources are there to dig in and ensure the optimal number queries are being executed they can still do this. But for everywhere else (and lets be honest, most places don’t have the resources to dig in) the query cache provides a fallback that helps ensure the performance costs of the redundant queries isn’t too large.

Of course there is nothing stopping a shop from turning it off and doing the engineering work to not have the extra queries. But as a reasonable default I like it and find it useful.

2 Likes

Right. Such test helper actually exist inside Rails own test suite, and I do have a similar assert_queries helper in the vast majority of my apps. On some bigger apps, the helper even allow to test how many queries did hit the query cache.

So yeah I’d be quite in favor of this.

I appreciate the comments, for some reason I forgot to come back to this until today.

I wondered the same things a few years back so I instrumented it in our app expecting to find very few hits and that I’d remove it. But the hit rate was surprisingly decent (~15% if my memory serves me right), enough that we decided to keep it.

@byroot Those are interesting findings. I’m kind of expected the hit rate might be that high in some apps, I just thought that the performance impact of retrieving by ID. But I remembered now that the query cache is used for any SELECT statement, not just finding by ID.

Of course there is nothing stopping a shop from turning it off and doing the engineering work to not have the extra queries. But as a reasonable default I like it and find it useful.

@e_a Yeah, I suppose it makes sense that it’s the default if it improves performance. I admit I only encountered issues when doing something very rare and specific (allowing Sequel to reuse Active Record’s DB connection).