Memcached or ehCache? Performance Benefits of In-Memory Caching

[Graph comparing memcached and ehcache]

Memcached is a distributed, in-memory cache that was popularized by LiveJournal. Memcached performs well, but it runs as a separate daemon (diagram), so every access involves a network round trip, and an interesting question is whether local, cooperating, in-memory caches might do better.

Greg implemented this approach in ehCache (diagram, Wotif.COM) and recently ran some comparisons that suggest ehCache is much faster.

Greg gave a full report on this topic in TS-6039 at JavaOne, but the slides are not yet online. Fortunately, he just published a short summary on his blog (see the comparison graph). There was also a micro-session at CommunityDay, and I'll let you know when we push the slides to the Virtual GlassFish Day page.

Comments:

It's apples and oranges. A put to memcached updates the only copy of the datum, consistently. Asynchronous replication of puts with ehcache is prone to inconsistencies. If asynchronous is ok, then we should compare against the cost of sending an async message to memcached as well: that would be zero.
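For concreteness, the stale-read window Matthias describes can be sketched like this (hypothetical code; the queue stands in for an async replication thread, not ehcache's actual internals):

```java
import java.util.*;

// Sketch: two local caches with asynchronous replication modeled as a
// message queue. Between the put on node A and the drain of the queue,
// node B serves a stale value.
public class AsyncReplicationSketch {
    static Map<String, String> nodeA = new HashMap<>();
    static Map<String, String> nodeB = new HashMap<>();
    static Deque<String[]> replicationQueue = new ArrayDeque<>();

    static void putOnA(String key, String value) {
        nodeA.put(key, value);                           // local write is immediate
        replicationQueue.add(new String[]{key, value});  // replication is deferred
    }

    static void drainQueue() {                           // replication thread catching up
        while (!replicationQueue.isEmpty()) {
            String[] msg = replicationQueue.poll();
            nodeB.put(msg[0], msg[1]);
        }
    }

    public static void main(String[] args) {
        nodeA.put("user:42", "v1");
        nodeB.put("user:42", "v1");

        putOnA("user:42", "v2");
        System.out.println(nodeB.get("user:42")); // prints "v1" -- stale window
        drainQueue();
        System.out.println(nodeB.get("user:42")); // prints "v2" -- converged
    }
}
```

A put to memcached, by contrast, updates the single authoritative copy, so this window does not exist there.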

A consistent solution with local caches requires heavier tools like virtually synchronous group communication, causal ordering, and the like.

Posted by Matthias on May 19, 2007 at 08:26 PM PDT #

Thanks for the comment, Matthias. Did you try placing a comment in Greg's blog? He is the one best equipped to address your comments; mostly I highlight what seems sound and interesting. But I'll send an email to Greg to see if he wants to follow up here... That said...

The <em>get</em> numbers would be comparable, right? For <em>put/remove</em>, wouldn't a disk-based sync solution do the trick? But I don't know whether the numbers shown in the chart for "ehcache w/disk" involve sync or async. Again, Greg would know.

Posted by eduardo pelegri-llopart on May 21, 2007 at 03:25 AM PDT #

Hi,

Greg's blog didn't offer commenting, so I turned to you :-) You are right, get performance with local caches is of course much higher - indeed that is the value proposition of replicated caches. But the cost of maintaining consistency is grossly misrepresented here. You might get away with async if you can guarantee that one client always hits the same server.

I would have assumed that a force to disk is actually more expensive than a network roundtrip to memcached. Async, too?

Posted by Matthias on May 21, 2007 at 08:09 AM PDT #

Matthias,

The disk stuff is fast because it is async. However, there is no possibility of inconsistencies, because the spool is also checked.
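Greg's point about the spool can be sketched as follows (hypothetical code, not ehcache's internals): a get consults the unflushed spool before the disk store, so the asynchronous flush never exposes a stale value to a reader on the same node.

```java
import java.util.*;

// Sketch: writes are queued for asynchronous flushing to disk, but a
// get checks the spool of pending writes first, so a reader never
// misses a write that has not yet been flushed.
public class SpooledDiskStore {
    private final Map<String, String> spool = new LinkedHashMap<>(); // pending writes
    private final Map<String, String> disk  = new HashMap<>();       // stand-in for disk

    synchronized void put(String key, String value) {
        spool.put(key, value);            // returns immediately; flush happens later
    }

    synchronized String get(String key) {
        String pending = spool.get(key);  // check unflushed writes first
        return (pending != null) ? pending : disk.get(key);
    }

    synchronized void flush() {           // background flusher thread's job
        disk.putAll(spool);
        spool.clear();
    }
}
```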

In terms of inconsistencies, doesn't memcached act sort of like a database without transactions? It has no commit concept. So let's say you were writing updates to memcached as part of a transaction. You put to the cache, but then the transaction gets rolled back. Maybe you catch that and roll back the put to memcached, but someone else might have used the data in the meantime.
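The race Greg describes might be sketched like this (hypothetical code; the shared map stands in for memcached, which has no notion of transactional rollback):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a put reaches the shared cache before the surrounding
// database transaction rolls back, so another reader can observe
// data that was never committed.
public class DirtyCachePutSketch {
    static ConcurrentHashMap<String, String> sharedCache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        sharedCache.put("balance:42", "100");     // committed state

        // Transaction begins: write-through to the cache...
        sharedCache.put("balance:42", "50");
        // ...another client reads mid-transaction...
        String observed = sharedCache.get("balance:42");
        // ...then the database transaction rolls back; we undo the put.
        sharedCache.put("balance:42", "100");

        System.out.println(observed);             // prints "50" -- uncommitted data escaped
    }
}
```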

My point is that caches are not meant to have full database-like semantics. It is usually acceptable for the data to be dirty or stale to some extent. That extent depends on the app.

In ehcache you can configure distributed synchronous invalidation for cases where you really care that no one has a stale copy. There are lots of different configuration options to satisfy the different usages. I used the default async one in the test as a simple comparison, and to stimulate discussions like this.
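As a rough sketch, synchronous invalidation in ehcache's XML configuration looks something like the following (the RMICacheReplicatorFactory property names come from ehcache's RMI replication support; treat the cache name and the exact attribute values as illustrative, not a recommended setup):

```xml
<!-- Illustrative only: updates are replicated synchronously as
     invalidations (replicateUpdatesViaCopy=false removes the remote
     copy instead of copying the new value). Check the ehcache docs
     for the attributes your version supports. -->
<cache name="exampleCache"
       maxElementsInMemory="10000"
       eternal="false"
       timeToLiveSeconds="300">
  <cacheEventListenerFactory
      class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
      properties="replicateAsynchronously=false,
                  replicatePuts=true,
                  replicateUpdates=true,
                  replicateUpdatesViaCopy=false,
                  replicateRemovals=true"/>
</cache>
```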

My overwhelming concern with the memcached approach is two-fold:
  1. Does its style of caching really offer any advantage over MySQL with in-memory tables?
  2. Almost all caching problems I come up against have many reads to one write, so read performance is very important. Staging the data in-process, where it can be read much faster than across a network, is better.

Greg

Posted by Greg Luck on May 21, 2007 at 08:38 AM PDT #

Greg,

thanks for your answer. You're right - caching typically involves some tradeoffs against consistency. I'm just afraid that if you're not careful, the effects might no longer be understandable in production.

It might be worthwhile to investigate optimistic concurrency extensions to cache APIs/memcached: something like tryPut(key->timestamp preconditions, key->value updates) returning key->timestamp conflicts. But with that we're crossing a blurry line between a cache and an in-memory DB.
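The tryPut idea above might be sketched like this (a hypothetical API; neither memcached nor ehcache offered this at the time, and versions stand in for the timestamps in the proposal):

```java
import java.util.*;

// Sketch: every entry carries a version, and a batch put applies only
// if all of the caller's expected versions still match. On a mismatch,
// the conflicting keys are returned and nothing is applied.
public class VersionedCache {
    static final class Entry {
        final String value; final long version;
        Entry(String value, long version) { this.value = value; this.version = version; }
    }

    private final Map<String, Entry> store = new HashMap<>();

    synchronized long versionOf(String key) {
        Entry e = store.get(key);
        return (e == null) ? 0L : e.version;   // absent keys have version 0
    }

    // Empty result = success; otherwise the set of keys whose versions moved.
    synchronized Set<String> tryPut(Map<String, Long> preconditions,
                                    Map<String, String> updates) {
        Set<String> conflicts = new HashSet<>();
        for (Map.Entry<String, Long> pre : preconditions.entrySet())
            if (versionOf(pre.getKey()) != pre.getValue())
                conflicts.add(pre.getKey());
        if (conflicts.isEmpty())
            for (Map.Entry<String, String> u : updates.entrySet())
                store.put(u.getKey(), new Entry(u.getValue(), versionOf(u.getKey()) + 1));
        return conflicts;
    }
}
```

A caller that loses the race gets its conflicting keys back and can re-read and retry, which is the usual optimistic-concurrency loop.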

Anyhow - I was just reacting to how the comparison was made. I like local caches, too. Cheers, Matthias

Posted by Matthias Ernst on May 21, 2007 at 07:59 PM PDT #
