Thursday Feb 07, 2013

NoSQL with MySQL's Memcached API

One of our training courses has a section covering MySQL's Memcached API, and how it works. In the discussion, there's a line that goes like this:

 "A key is similar to a primary key in a table, and a value is similar to a second column in the same table"

For someone well versed in database tables but not so much in key-value stores, that sentence might take a bit of grasping. So, let's break it down.

An Example Key/Value Store 

Imagine the table kvstore has a column key and a column value. Also imagine that we've set up the Memcached plugin in MySQL and configured it to use that table and those columns as its store. I won't get into that bit for now, but trust me, it's not that hard.

You might be familiar with statements like this:

REPLACE INTO kvstore (key, value) VALUES ('X', 'Y');
SELECT value FROM kvstore WHERE key='X';

Now imagine you want to be able to get at your values with a simple syntax like this:

set X Y ...
get X

...or if you're using PHP (and have a MySQL memcached connection called $m), you might want something like this:

$Y = $m->get($X);
$m->set($X, $Y, ...); 

Note the simplicity of the later statements, and the absence of the words "kvstore", "key" or "value"; Once you've configured the Memcached plugin, you don't have to worry about those. Remember, it's a key/value store, so we just worry about the keys and the values and let the configuration take care of everything else.

The memcached API merely provides you with a different way of getting at your InnoDB data. Although I've greatly simplified the memcached protocol examples above, you can get the idea. Simply put, you can read and write values without having to construct entire SQL statements. Not only is this faster for you, the programmer, but it's also faster for MySQL.

Now, SQL is a much more expressive language, and lets you do things like aggregation and range-based processing, but at a cost: that the SQL parser, statement normalizer, and optimizer have to do their jobs regardless of whether it's a simple SELECT or a complex aggregated join with subqueries, whereas the memcached API just does one thing per statement, so you bypass all the extra CPU work that comes before the storage engine kicks in.

So, why bother?

An Example Use Case 

MySQL's NoSQL feature is comparable with other key-value caches. These are often used on the web when you want to serve pieces of the page quickly, but the page as a whole differs between users.

Imagine you've got 100,000 users of a web page, and each one wants to see a different page when they're logged in. You want to see your login name and your current karma/kudos/rep at the top right, and I want to see my relevant information on mine. The navigation div at the top and on the common links on the left will be the same for both of us, but my list of "tags" (or subsectons or whatever other personal navigation shortcuts) will belong to me and yours to you. Then the body content is going to be the same (per article) but different (per page), so my front page will show articles 41, 42, 45, and yours will show articles 42, 43, 44, depending on our options. And the same goes for each of the other 100,000 users who visit the site and want their own personal content.

Now the page generation logic, rather than reading the entire navigation code and articles and navigation shortcuts and so on for each page, becomes a matter of asking the cache for the HTML values contained in the keys "user12345-header" and "common-nav" and "common-list" and "user12345-navshortcuts" and "article-41" and "article-42" and ... etc. So the page processing becomes a case of asking the cache for content rather than generating the content each time. This provides massive performance improvements over generating each page's content per page view, although you need to change your logic to catch cache-misses and generate the HTML code to put into the cache accordingly, and to handle session/cache timeouts.

MySQL's Advantage 

In MySQL this is even better than many other key-value stores, because some cache-misses are automatically taken care of; with InnoDB as the backing store, you've two levels of "cache-hit", the first being when the page is already in the buffer pool, so you're getting the key's value from RAM (which is about as fast as existing key-value caches), but if the page isn't in the buffer pool, InnoDB automatically reads it from the table. This isn't anything new; it's how InnoDB already works with table data. With other key-value stores, if it's not already in RAM, you just get a cache-miss and your code has to generate the data. With InnoDB, it's got this effective second-level cache that stores stuff on disk when it's not in RAM.

Plus you get the benefit of other parts of the application being able to use the power of the SQL API (remember those SELECTs and subqueries and joins and aggregation and all the other fun stuff that key-value stores don't have?) to run reports on all your cache data. That's why they call it "Not only SQL". And you get this without having to do any ELT transfers or writing clever persistence logic for your cache. You get that bit for free.


Jeremy Smyth writes MySQL training courses, and likes exploring interesting questions that come up from novices and experts alike.



« July 2016