NoSQL with MySQL's Memcached API

One of our training courses has a section covering MySQL's Memcached API, and how it works. In the discussion, there's a line that goes like this:

 "A key is similar to a primary key in a table, and a value is similar to a second column in the same table"

For someone well versed in database tables but not so much in key-value stores, that sentence might take a bit of grasping. So, let's break it down.

An Example Key/Value Store 

Imagine the table kvstore has a column called key and a column called value (KEY is a reserved word in MySQL, so it appears quoted with backticks in the statements below). Also imagine that we've set up the Memcached plugin in MySQL and configured it to use that table and those columns as its store. I won't get into that bit for now, but trust me, it's not that hard.

You might be familiar with statements like this:

REPLACE INTO kvstore (`key`, `value`) VALUES ('X', 'Y');
SELECT `value` FROM kvstore WHERE `key`='X';

Now imagine you want to be able to get at your values with a simple syntax like this:

set X Y ...
get X

...or if you're using PHP (and have a MySQL memcached connection called $m), you might want something like this:

$Y = $m->get($X);
$m->set($X, $Y, ...); 

Note the simplicity of the latter statements, and the absence of the words "kvstore", "key", or "value": once you've configured the Memcached plugin, you don't have to worry about those. Remember, it's a key/value store, so we just worry about the keys and the values and let the configuration take care of everything else.
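If you're wondering where that $m connection comes from, here's a minimal sketch using PHP's Memcached extension. The host address, and the assumption that the InnoDB memcached plugin is listening on its default port (11211), are illustrative rather than anything these examples depend on:

<?php
// A minimal sketch: point PHP's Memcached client at MySQL's memcached
// plugin rather than at a separate memcached daemon. The host and the
// default port (11211) are assumptions for illustration.
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);

$m->set('X', 'Y');     // roughly: REPLACE INTO kvstore (`key`, `value`) VALUES ('X', 'Y');
$Y = $m->get('X');     // roughly: SELECT `value` FROM kvstore WHERE `key`='X';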

The memcached API merely provides you with a different way of getting at your InnoDB data. Although I've greatly simplified the memcached protocol examples above, you get the idea. Simply put, you can read and write values without having to construct entire SQL statements. Not only is this faster for you, the programmer, but it's also faster for MySQL.

Now, SQL is a much more expressive language, and lets you do things like aggregation and range-based processing, but at a cost: the SQL parser, statement normalizer, and optimizer have to do their jobs whether you send a simple SELECT or a complex aggregated join with subqueries. The memcached API does just one thing per request, so you bypass all the extra CPU work that happens before the storage engine kicks in.

So, why bother?

An Example Use Case 

MySQL's NoSQL feature is comparable to other key-value caches. These are often used on the web when you want to serve pieces of a page quickly, even though the page as a whole differs between users.

Imagine you've got 100,000 users of a web page, and each one wants to see a different page when they're logged in. You want to see your login name and your current karma/kudos/rep at the top right, and I want to see my relevant information on mine. The navigation div at the top and the common links on the left will be the same for both of us, but my list of "tags" (or subsections or whatever other personal navigation shortcuts) will belong to me and yours to you. Then the body content is going to be the same (per article) but different (per page), so my front page will show articles 41, 42, 45, and yours will show articles 42, 43, 44, depending on our options. And the same goes for each of the other 100,000 users who visit the site and want their own personal content.

Now the page generation logic, rather than reading the entire navigation code and articles and navigation shortcuts and so on for each page, becomes a matter of asking the cache for the HTML values stored under keys such as "user12345-header", "common-nav", "common-list", "user12345-navshortcuts", "article-41", "article-42", and so on. Page processing becomes a case of asking the cache for content rather than generating that content each time. This provides massive performance improvements over generating each page's content per page view, although you need to change your logic to catch cache misses and generate the HTML to put into the cache accordingly, and to handle session/cache timeouts, as sketched below.
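Here's a minimal get-or-generate sketch of that logic in PHP. The key names, the generate_*() helpers, and the 20-minute expiry are hypothetical, and whether the expiry is actually enforced depends on how you've configured the plugin's expire column; only the Memcached calls themselves are real API:

<?php
// Get-or-generate sketch: ask the cache first, build and store on a miss.
// Key names, the generate_*() helpers, and the expiry value are hypothetical.
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);    // MySQL's memcached plugin, default port

function fragment(Memcached $m, string $key, callable $generate): string
{
    $html = $m->get($key);            // ask the cache first
    if ($html === false) {            // cache miss (getResultCode() can confirm RES_NOTFOUND)
        $html = $generate();          // build the HTML ourselves...
        $m->set($key, $html, 1200);   // ...and store it with a 20-minute expiry
    }
    return $html;
}

// Assemble the page from cached fragments.
$page  = fragment($m, 'user12345-header',       fn () => generate_header(12345));
$page .= fragment($m, 'common-nav',             fn () => generate_nav());
$page .= fragment($m, 'user12345-navshortcuts', fn () => generate_shortcuts(12345));
$page .= fragment($m, 'article-41',             fn () => generate_article(41));
$page .= fragment($m, 'article-42',             fn () => generate_article(42));
echo $page;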

MySQL's Advantage 

In MySQL this is even better than many other key-value stores, because some cache-misses are automatically taken care of; with InnoDB as the backing store, you've two levels of "cache-hit". The first is when the page is already in the buffer pool, so you get the key's value from RAM (which is about as fast as existing key-value caches); if the page isn't in the buffer pool, InnoDB automatically reads it from the table. This isn't anything new; it's how InnoDB already works with table data. With other key-value stores, if the data isn't already in RAM, you just get a cache-miss and your code has to generate it. With InnoDB, you effectively get a second-level cache that keeps values on disk when they're not in RAM.

Plus you get the benefit of other parts of the application being able to use the power of the SQL API (remember those SELECTs and subqueries and joins and aggregation and all the other fun stuff that key-value stores don't have?) to run reports on all your cache data. That's why they call it "Not only SQL". And you get this without having to do any ETL transfers or write clever persistence logic for your cache. You get that bit for free.
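For example, because the cache lives in an ordinary InnoDB table, any other part of the application can report on it with plain SQL. A sketch, assuming the kvstore table from earlier and the "user12345-header" style of key naming used above (the connection details are placeholders):

<?php
// Reporting over the same cache data with ordinary SQL.
// Connection details, schema, and the key-naming convention are assumptions.
$db = new mysqli('127.0.0.1', 'appuser', 'secret', 'test');

// How many cached fragments does each user currently have?
$sql = "SELECT SUBSTRING_INDEX(`key`, '-', 1) AS owner,
               COUNT(*)                       AS fragments
        FROM   kvstore
        WHERE  `key` LIKE 'user%'
        GROUP  BY owner
        ORDER  BY fragments DESC";

$result = $db->query($sql);
while ($row = $result->fetch_assoc()) {
    printf("%s has %d cached fragments\n", $row['owner'], $row['fragments']);
}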

Comments:

Re: "In MySQL this is even better than many other key-value stores, because some cache-misses are automatically taken care of; with InnoDB as the backing store, you've two levels of "cache-hit"

"some cache-misses are automatically taken care of" seems to imply that some are not. Can you elaborate?

Posted by guestgihrig on February 11, 2013 at 09:16 PM GMT #

@guestgihrig If the application requests "user12345-header" from the database, one of three things can happen: 1 - It has been recently retrieved, so it's in InnoDB's buffer cache. 2 - It hasn't recently been retrieved (or was, but on a low-RAM machine it might have been pushed out of the buffer cache), but exists in the database as a table row. 3 - It's not in the table at all.

The first is preferable, being a cache hit both at the application level and at InnoDB's level, because the data is in RAM. The second is an InnoDB cache miss, but the application still gets its data, albeit from hard disk instead of RAM. The third is a cache miss, because the application does not get its data from the database, so it has to generate the data and store it in the key-value store.

To carry the analogy further: if I log in as "user12345" for the first time this week, and the application goes looking for my header div in the cache, it won't find it. That's option 3 above, and the application has to generate the header div. Once it's done that, it stores it in the database, which puts it both in RAM (because it's recently accessed) and on the hard disk, which is what InnoDB does anyway. When I request another page seconds later, that results in option 1 above; the header div is in the buffer cache, so serving it doesn't require a read from disk. After 20 minutes, that row is still in the database (assuming the application doesn't kill 20-minute-old sessions) but may have fallen out of InnoDB's buffer cache due to other data taking its place; when the application requests it, it's still in the table, so it's retrieved successfully, as per option 2 above.

Posted by guest on February 11, 2013 at 09:56 PM GMT #

So if the data exists in the innodb table, I can expect MySQL memcached api to handle cache loading and retrieval automatically for 100% of requests.

I am not in the habit of thinking of data simply not existing in the DB as a 'cache miss', but it makes sense in light of the new memcached api.

Thanks for clarifying this.

Posted by guest-gihrig on February 11, 2013 at 10:33 PM GMT #
