getting rid of AS::Cache::Entry in the memcached store

As you probably know, the builtin memcached store wraps values in instances of ActiveSupport::Cache::Entry. I am studying whether we could get rid of it, and your input would be valuable in helping us make a decision.

In addition to the cached value, these entries store the creation timestamp, TTL, and compression flag.

Rails 4 uses Dalli for the builtin memcached cache store, and Dalli supports compression itself, using a flag to mark compressed values. With Dalli, we do not need to handle compression by hand in AS.

The creation timestamp and TTL are there because, although memcached handles expiration itself, Active Support does a first expiration pass on the values: on write (in 3-2-stable, and presumably on master too) Active Support adds 5 minutes to the TTL set by the user. Then, when an entry is read, code common to all stores checks whether the entry has expired and, if so, manually deletes it.

If you read the entry in that 5 minute window, expiration is handled by AS, otherwise, it is handled by memcached (5 minutes later than the user specified).
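That double expiration scheme can be sketched in a few lines. This is an illustrative stand-in, not the Active Support code; the names (GRACE, memcached_ttl, expired?) are made up for the example:

```ruby
# GRACE mirrors the 5-minute padding described above.
GRACE = 5 * 60

# TTL actually sent to memcached on write: the user's TTL plus the grace window,
# so memcached evicts the key later than the user asked.
def memcached_ttl(user_ttl)
  user_ttl + GRACE
end

# Active Support's own first-pass check on read, driven by the stored
# creation timestamp and the user's original TTL.
def expired?(created_at, user_ttl, now = Time.now.to_f)
  now > created_at + user_ttl
end
```

Inside the grace window the entry is still in memcached but `expired?` already returns true, which is exactly the window described above.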

That manual management of expiration makes sense if the store does not support expiration (like the file store). So what is the point in the memcached store? We need it to support :race_condition_ttl.

:race_condition_ttl is an optional TTL you may pass to a fetch call; it works like this:

  1. If we read the entry and according to the user it has expired, then reset the TTL to race_condition_ttl to have everyone else fetch the “stale” entry while this process computes the new fresh value.

  2. When the value is computed, store it with the normal TTL (+ 5 minutes), and return it.
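The two steps above could be sketched like this, using a plain Hash as a stand-in store. TinyStore and its internals are hypothetical names for illustration, not the Active Support implementation:

```ruby
class TinyStore
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @data = {}
  end

  def write(key, value, ttl)
    @data[key] = Entry.new(value, Time.now + ttl)
  end

  def fetch(key, ttl:, race_condition_ttl: nil)
    entry = @data[key]
    # Fresh hit: serve it as-is.
    return entry.value if entry && Time.now < entry.expires_at

    if entry && race_condition_ttl
      # Step 1: the entry is stale; bump its TTL so concurrent readers keep
      # getting the stale value while this process recomputes.
      entry.expires_at = Time.now + race_condition_ttl
    end

    # Step 2: compute the fresh value and store it with the normal TTL.
    fresh = yield
    write(key, fresh, ttl)
    fresh
  end
end
```

In the real memcached store, step 1 is only possible because the creation timestamp travels inside the entry; memcached alone could not tell you the entry is stale before it evicts it.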

This feature was introduced as a way to avoid the dog pile effect on heavily loaded websites: too many processes trying to compute the fresh value at once when said value is expensive to compute. This can indeed be a problem.

Memcached does not give you the TTL of a key; therefore, we need this timestamp to support this feature.

The question is, how many people need this? I wonder if the trade-off justifies taxing all users. There are suboptimal techniques, involving a second key that acts as an advisory lock or something like that, which may help sites up to a certain size. And the complete solution could be a cache store implemented by someone else in a separate plugin, if you absolutely need :race_condition_ttl.
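The advisory-lock technique just mentioned could look roughly like this, again with a Hash standing in for the store. With a real memcached client the lock write would use the atomic `add` command; a Hash is not atomic, so this only illustrates the shape of the idea, and all names here are invented:

```ruby
STORE = {}

def fetch_with_lock(key)
  return STORE[key] if STORE.key?(key)

  lock_key = "#{key}:lock"
  if STORE.key?(lock_key)
    # Another process is recomputing; serve the last known stale value.
    return STORE["#{key}:stale"]
  end

  STORE[lock_key] = true   # with memcached this would be an atomic `add`
  begin
    fresh = yield
    STORE[key] = fresh
    STORE["#{key}:stale"] = fresh   # keep a stale copy for future misses
    fresh
  ensure
    STORE.delete(lock_key)
  end
end
```

It avoids wrapping every value, but it costs an extra key per entry and has its own races, which is why it is "suboptimal" compared to :race_condition_ttl.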

So, the question is, do you guys use this option? Thoughts?

The Entry model does provide one other feature which I think also warrants keeping it around and which is even potentially more valuable than the race_condition_ttl. It allows the cache to store nil values.

If you have a statement like this:

record = Rails.cache.fetch("my_cache_key") { Model.expensive_query_that_returns_nil }

With the current implementation of ActiveSupport::Cache the query result can be cached (because the nil result is wrapped in an Entry object). This was not possible with the previous (Rails 2.x) version, which did not have the Entry class. It is also not possible in the current Dalli cache store implementation (which does away with the Entry model). With Dalli, the above code will execute the expensive query every time.
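The difference can be demonstrated with two tiny fetch helpers over plain Hashes. These are illustrative stand-ins, not the real store code:

```ruby
# A minimal Entry wrapper, just enough to mark "this key was written".
Entry = Struct.new(:value)

raw     = {}  # raw store: a nil read is indistinguishable from a miss
wrapped = {}  # wrapping store: a stored Entry means "hit", even Entry.new(nil)

def fetch_raw(store, key)
  value = store[key]
  return value unless value.nil?  # cached nil looks like a miss, so we recompute
  store[key] = yield
end

def fetch_wrapped(store, key)
  entry = store[key]
  return entry.value if entry     # a cached Entry(nil) is still a hit
  (store[key] = Entry.new(yield)).value
end

calls = 0
2.times { fetch_raw(raw, "k") { calls += 1; nil } }
# raw store recomputes both times: calls == 2

calls2 = 0
2.times { fetch_wrapped(wrapped, "k") { calls2 += 1; nil } }
# wrapped store computes once: calls2 == 1
```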

That being said, there is certainly room for improving the efficiency of the Entry model, and since it is something called frequently and in performance-critical sections of code, I think doing so is certainly warranted.

I’ve coded up an optimized version of it here: https://github.com/bdurand/rails/commit/e78c30100f54ae4366fdb9352987bae9b8c1c1e7.

I’ve run some tests on performance of this improved code over the previous implementation as well as just using raw strings.

value = [4952 byte string]

Test commands:

  1. Marshaling overhead in bytes: Marshal.dump(ActiveSupport::Cache::Entry.new(value)).bytesize - value.bytesize
  2. Milliseconds to create and serialize entry: t = Time.now; 100000.times{Marshal.dump(ActiveSupport::Cache::Entry.new(value))}; ((Time.now - t) / 100).round(3)
  3. Milliseconds to deserialize entry and retrieve its value: t = Time.now; 100000.times{e = Marshal.load(serialized_value); e.value unless e.expired?}; ((Time.now - t) / 100).round(3)

Raw String (i.e. run the test code without wrapping value in the Entry class):

  1. 12 bytes
  2. 0.006 ms
  3. 0.005 ms

Old Implementation:

  1. 121 bytes
  2. 0.020 ms
  3. 0.016 ms

New Implementation:

  1. 58 bytes
  2. 0.009 ms
  3. 0.008 ms

The optimized implementation still adds a slight “tax” on all calls to the cache, but it’s a 2x improvement over the previous implementation. The tax with the improved code is 46 bytes and a 0.003 ms increase in time to read and write entries, so even a request reading 300 entries from the cache pays a tax of less than 1 ms.

String is pretty well optimized in Ruby, so I also ran the same series of tests using a more complex custom Ruby object with 11 internal instance variables (similar to an ActiveRecord model). The tax remained the same, but was far less significant compared to the overall resources needed to serialize and deserialize the more complex object (0.026 ms without Entry, 0.030 ms with Entry).
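The measurements above can be reproduced with a self-contained script. The Entry below is a minimal stand-in (value, creation timestamp, TTL); the real ActiveSupport::Cache::Entry carries more state, so absolute numbers will differ from the figures quoted above, and the iteration count is reduced for brevity:

```ruby
require "benchmark"

# Minimal stand-in for ActiveSupport::Cache::Entry.
Entry = Struct.new(:value, :created_at, :expires_in) do
  def expired?
    expires_in && Time.now.to_f > created_at + expires_in
  end
end

value      = "x" * 4952
entry      = Entry.new(value, Time.now.to_f, 300)
serialized = Marshal.dump(entry)

# 1. Marshaling overhead in bytes over the raw string.
overhead = serialized.bytesize - value.bytesize
puts "overhead: #{overhead} bytes"

# 2. Time to create and serialize entries.
puts Benchmark.measure { 10_000.times { Marshal.dump(entry) } }

# 3. Time to deserialize entries and retrieve their values.
puts Benchmark.measure { 10_000.times { e = Marshal.load(serialized); e.value unless e.expired? } }
```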

Hi Brian,

The Entry model does provide one other feature which I think also warrants keeping it around and which is even potentially more valuable than the race_condition_ttl. It allows the cache to store nil values.

You are right. That in itself justifies Entry I believe. We could add a comment in the source code of the memcached store that documents these two pros for the next guy wondering about why we use Entry :).

That being said, there is certainly room for improving the efficiency of the Entry model and since it is something called frequently and in performance critical section of code, I think it is certainly warranted.

I’ve code up an optimized version of it here: https://github.com/bdurand/rails/commit/e78c30100f54ae4366fdb9352987bae9b8c1c1e7.

Awesome! Would you be so kind as to prepare a pull request with these enhancements? It would need a CHANGELOG entry that documents the serialized object changes. Also worth mentioning in the upgrading guide I believe.

Do you think we could hack something in order to support already cached values on deserialization to ease upgrading? Otherwise all caches would need to be wiped.

I disagree very strongly with this line of reasoning. If you’re using memcached as a cache store, you should be relying on its semantics, which is that nil and ‘no value’ are identical.

Even with the optimised version you’re talking about storing 58 bytes vs 12 bytes. For a site with 1 GB of stuff to store, should they really need to allocate 5 GB of cache storage just so people who want to store nil (without using their own wrapper) can?

The particular site where I encountered this issue upgraded to a version of Rails which used AS::Entry and had their working set expand from 4 GB to more than 10 GB. Frankly, that’s not even close to an acceptable trade-off.

Basing the store on that premise seems sound to me.

I have checked the memcache binary protocol and memcache does tell you if a key is not known. Wouldn’t dalli be the one not following the semantics here?

If I understand everything correctly, we could theoretically use a different client that provides that interface, which would allow us to cache nil values. I am not claiming such a client exists, only digging into where the real problem regarding nil lies.

I agree with Michael here. While I appreciate the attempt to eliminate the dog-piling effect (I worked on a project where that was a problem), it seems counterproductive to cache the cache, and end up taking a lot more memory. If we want the cache in app memory we would use the :memory_store.

Brian Morearty

Is rails providing a generic lowest-common-denominator caching interface to many caching stores, memcached being only one of them? If so, then it would make sense to provide a consistent semantic to all of them, rather than make each one behave quite differently based on what semantic each one internally uses. This way you can easily switch between them without modifying all your code.

Of course if one is using memcached directly then it makes sense to use memcached semantics, and be more optimal about it too.

Dave

Yeah, definitely memcache tells you in both protocols whether the key was found or not (never set, expired, whatever). So I’d say storing nil goes well with memcache semantics.

The basic premise that we should follow memcache semantics is sound regardless of the actual semantics :).

So, in my view, we have impedance mismatches in two places: AFAIK Dalli returns nil in either case, and the interface of AS stores documents the same behavior for #read.

If the client gave you a way to distinguish whether the key is unknown or the cached value nil, you could implement a smarter #fetch with block in the memcache store, though #read would not be able to tell you unless we modify the contract.
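That smarter #fetch could look like this. The `get_with_hit` method is invented for illustration, standing in for a client that reports hit/miss as the memcached binary protocol allows; no real client is claimed to have this exact API:

```ruby
# Hypothetical store built on a client that distinguishes "missing key"
# from "cached nil", making the Entry wrapper unnecessary for nil support.
class NilAwareStore
  def initialize(client)
    @client = client # must respond to get_with_hit(key) -> [value, hit]
  end

  def fetch(key, ttl = 0)
    value, hit = @client.get_with_hit(key)
    return value if hit # a cached nil is still a hit

    fresh = yield
    @client.set(key, fresh, ttl)
    fresh
  end
end

# Tiny in-memory client implementing the invented interface.
class FakeClient
  def initialize
    @data = {}
  end

  def get_with_hit(key)
    [@data[key], @data.key?(key)]
  end

  def set(key, value, _ttl)
    @data[key] = value
  end
end
```

As noted, #read could not benefit the same way without changing its documented contract, since it has no block to fall back to.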

Given this situation my thoughts are:

  1. I’d love to get rid of Entry due to the impact.

  2. But, we should be able to cache nil values in #fetch.

Unless AS changes the memcache client with another one that supports reporting unknown keys, I see no way to get rid of Entry. And if we keep Entry, the enhancements done by Brian are an important improvement over the current implementation.

In the example you give, the cache would not expand from 1GB to 5GB unless you’re storing very small values with very small keys. If your average key + value size is around 1K, then the cache would only expand by ~4%, not 500%. If you need to store lots of small values, you should probably go straight to storing raw values with Dalli, since any use of ActiveSupport::Cache will add the overhead of an extra layer. If you’re storing non-Strings in memcache, the overhead of the internal instance variable names is probably far larger than what Entry adds, since the names are stored as strings in the serialized object.

I think it makes the most sense to look at ActiveSupport::Cache as a generic, consistent interface to a variety of caching back ends and have the default implementations support all features. There’s nothing to prevent someone from implementing an optimized memcache backend if they have special needs for performance.

In the example you give, the cache would not expand from 1GB to 5GB unless you’re storing very small values with very small keys. If your average key + value size is around 1K, then the cache would only expand by ~4%, not 500%. If you need to store lots of small values, you should probably go straight to storing raw values with Dalli, since any use of ActiveSupport::Cache will add the overhead of an extra layer. If you’re storing non-Strings in memcache, the overhead of the internal instance variable names is probably far larger than what Entry adds, since the names are stored as strings in the serialized object.

I think it makes the most sense to look at ActiveSupport::Cache as a generic, consistent interface to a variety of caching back ends and have the default implementations support all features. There’s nothing to prevent someone from implementing an optimized memcache backend if they have special needs for performance.

This is the root of the disagreement, I guess. ActiveSupport::Cache is a generic interface to a variety of backends, each of which has its own limitations and restrictions. Just as Active Record does its best to provide a consistent interface, at the edges there are limitations or differences in how the different databases behave or in the query languages they support. If someone were to propose changing the way Active Record stores its data in PostgreSQL in order to implement support for ON DUPLICATE KEY UPDATE, we’d reject that out of hand.

Running the same code against two different cache stores isn’t particularly valuable in the face of the different performance characteristics of the different backends. No one is switching between memcached, redis and memory store on a regular basis.

As for the particular nil vs missing thing, the memcached gem differentiates between the two cases, but as it depends on native code it’s not something that we want to have as a dependency. Perhaps the simplest solution is for you to approach Mike Perham, who looks after Dalli, and try to fix this particular issue there? There are also various forks of libmemcached_store from 37signals in use which will most likely do what you want.

Let me propose something.

Since we are already using Entry, Brian’s approach is already written, and it is a drop-in replacement bringing a great improvement for many use cases over the current store (caching counters, flags, small strings, etc.), I’d suggest that we work on a pull request with that patch. This option is very concrete and doable in the short term. That way we make sure Rails 4 ships with at least these improvements.

Then, if someone can go further and make Dalli report missing keys somehow, or wants to write a plugin which does not use Entry at the cost of not supporting nil and :race_condition_ttl, perfect. And if the former happens, then let’s see what we do.

Any objection?

So ship this alternative with core? Yeah seems good to me.

I was also thinking about the possibility of using an array instead of Entry. Arrays dump to even fewer bytes, though they have the con that they are harder to evolve if we add any other metadata to the entry.

In my experience small values like counters and flags are common, so I think squeezing this as much as possible is worth pursuing.

I opened a pull request on my changes: https://github.com/rails/rails/pull/7800.

I updated the code a bit and fixed the tests so it is slightly different than the branch referenced above. It should now be backward compatible with older cache entries as well.

As for the Array idea, that would certainly reduce overhead a little bit, but I’m not sure it’s worth it in terms of maintainability. For what it’s worth, 27 of the 58 byte serialization overhead comes just from the class name “ActiveSupport::Cache::Entry”.

I opened a pull request on my changes: https://github.com/rails/rails/pull/7800.

I updated the code a bit and fixed the tests so it is slightly different than the branch referenced above. It should now be backward compatible with older cache entries as well.

Awesome, it is merged, thanks!

As for the Array idea, that would certainly reduce overhead a little bit, but I’m not sure it’s worth it in terms of maintainability. For what it’s worth, 27 of the 58 byte serialization overhead comes just from the class name “ActiveSupport::Cache::Entry”.

Yeah, the idea would be not to use a class like that (well, maybe on the Ruby side), but in any case to serialize/deserialize an array. I have not tried to implement it, so I may be missing some blocker:

1.9.2p320 :007 > Marshal.dump([1, Time.now.to_i, false])
 => "\x04\b[\bi\x06l+\a\xC0~hPF"
1.9.2p320 :008 > _.length
 => 14

It feels rigid though. Going with a hash is more flexible:

1.9.2p320 :009 > Marshal.dump('v' => 1, 'x' => Time.now.to_i, 'c' => false)
 => "\x04\b{\bI\"\x06v\x06:\x06ETi\x06I\"\x06x\x06;\x00Tl+\a\xCB~hPI\"\x06c\x06;\x00TF"
1.9.2p320 :010 > _.length
 => 39

but then the benefit is maybe less clear.

In any case, we now have a much more efficient Entry class for Rails 4, and it understands existing entry objects. That’s great!