NoSQL to MySQL with Memcached
By Mat Keep on Apr 12, 2011
The ever-increasing performance demands of web-based services have generated significant interest in providing NoSQL access methods to MySQL. These let users keep all of the advantages of their existing relational database infrastructure while getting blazing fast performance for simple queries, through an API that complements regular SQL access to their data.
The HandlerSocket development at DeNA is a great example of community innovation, with a solution implemented as a custom plug-in and protocol for the MySQL server daemon.
We are hearing the community say they want NotOnly SQL - they want their trusted SQL RDBMS - plus, they want NoSQL techniques to access that data. So, we are previewing our NotOnlySQL solution for MySQL - delivered via memcached - with implementations to access both the InnoDB and MySQL Cluster (NDB) storage engines.
The purpose of this blog is to provide more detail on the Memcached API for MySQL, specifically covering:
- Design rationale
- Getting started
Design Rationale
Using the memcached API, web services can directly access the InnoDB and MySQL Cluster storage engines without transformations to SQL, ensuring low latency and high throughput for read/write queries. Operations such as SQL parsing are eliminated and more of the server's hardware resources (CPU, memory and I/O) are dedicated to servicing the query within the storage engine itself.
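To make the "no transformation to SQL" point concrete, here is a minimal Python sketch of what a client puts on the wire using the standard memcached text protocol. The helper names (build_set, build_get, parse_get_response, roundtrip) are illustrative only and not part of any MySQL or memcached library, and the host and port in the usage note are assumptions (11211 is memcached's conventional default; a preview build may be configured differently).

```python
import socket
from typing import Optional

CRLF = b"\r\n"


def _check_key(key: str) -> None:
    # The memcached text protocol limits keys to 250 bytes and forbids
    # whitespace and control characters.
    if not key or len(key.encode("utf-8")) > 250:
        raise ValueError("key must be 1-250 bytes long")
    if any(ch.isspace() or ord(ch) < 32 for ch in key):
        raise ValueError("key must not contain whitespace or control characters")


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a 'set' request: set <key> <flags> <exptime> <bytes>, then the data."""
    _check_key(key)
    header = f"set {key} {flags} {exptime} {len(value)}".encode("utf-8")
    return header + CRLF + value + CRLF


def build_get(key: str) -> bytes:
    """Frame a single-key 'get' request."""
    _check_key(key)
    return f"get {key}".encode("utf-8") + CRLF


def parse_get_response(raw: bytes) -> Optional[bytes]:
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if raw == b"END" + CRLF:
        return None
    header, _, rest = raw.partition(CRLF)
    parts = header.split()
    # Expected header: VALUE <key> <flags> <bytes>
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply: {header!r}")
    length = int(parts[3])
    return rest[:length]


def roundtrip(host: str, port: int, request: bytes, terminator: bytes) -> bytes:
    """Send one request and read until the reply ends with `terminator`.

    Simplification: a value whose bytes happen to end a network chunk with
    the terminator could stop the read early; a real client parses the
    declared length instead.
    """
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(request)
        buf = b""
        while not buf.endswith(terminator):
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
        return buf
```

Against a running server, roundtrip("127.0.0.1", 11211, build_set("user:1", b"alice"), b"\r\n") would store a value, and parse_get_response(roundtrip("127.0.0.1", 11211, build_get("user:1"), b"END\r\n")) would fetch it. In practice an existing memcached client library does this framing for you, which is exactly why existing clients can be re-used unchanged.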
Over and above performance, there are a number of additional potential benefits in this approach for both developers and DBAs:
- Preserves investments in memcached infrastructure by re-using existing memcached clients and eliminates the need for application changes.
- Access to the full range of memcached client libraries and platforms, providing maximum deployment flexibility and consistently high performance across all supported environments.
- Extends memcached functionality by integrating persistent, crash-safe, transactional database back-ends offering ACID compliance, rich query support and extensive management and monitoring tools.
- Reduces service disruption caused by cache re-population after an outage (note that buffer pool reloading enhancements planned for a future milestone release will further improve recovery performance by warming the cache).
- Simplifies web infrastructure by compressing the caching and database layers into a single data tier, managed by MySQL.
- Reduces development and administration effort by eliminating the cache invalidation and data consistency checking required to ensure synchronization between the database and cache when updates are committed.
- Eliminates duplication of data between the cache and database, enabling simpler re-use of data across multiple applications, and reducing memory footprint.
- Flexibility to concurrently access the same data set with SQL, allowing complex queries to be run while simultaneously supporting Key-Value operations from memcached.
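The cache invalidation benefit in the list above is easiest to see with a toy model. The Python sketch below uses plain dictionaries as stand-ins for the cache and the database (the class names TwoTierStore and SingleTierStore are hypothetical, and nothing here talks to MySQL or memcached). It only illustrates why a separate cache can serve stale data when an invalidation is missed, and why a single data tier has no such step to forget.

```python
class TwoTierStore:
    """Separate cache and database: the application must keep them in sync."""

    def __init__(self):
        self.database = {}  # stand-in for the persistent store
        self.cache = {}     # stand-in for a separate caching layer

    def read(self, key):
        # Cache-aside read: serve from the cache, else load and populate it.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value, invalidate=True):
        self.database[key] = value
        # The step the application has to remember on every update path.
        if invalidate:
            self.cache.pop(key, None)


class SingleTierStore:
    """One data tier: reads and writes hit the same store, nothing to invalidate."""

    def __init__(self):
        self.database = {}

    def read(self, key):
        return self.database.get(key)

    def write(self, key, value):
        self.database[key] = value


two_tier = TwoTierStore()
two_tier.write("user:1", "alice")
two_tier.read("user:1")                               # populates the cache
two_tier.write("user:1", "bob", invalidate=False)     # a missed invalidation
stale = two_tier.read("user:1")                       # still "alice"

single = SingleTierStore()
single.write("user:1", "alice")
single.write("user:1", "bob")
fresh = single.read("user:1")                         # "bob"
```

The two-tier read returns the old value until something evicts or invalidates the cached entry, while the single-tier read always reflects the last committed write.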
Of course, the memcached implementations for InnoDB and MySQL Cluster are still in their early phases of development (though MySQL Cluster is more mature at this stage), and so neither is suitable for production deployment. Nonetheless, developers can at least get a taste of what is possible as these features evolve.
The initial memcached API implementations for InnoDB and MySQL Cluster take slightly different approaches, which are discussed below.
Note that both implementations are dependent on memcached 1.6.
Memcached and InnoDB
As illustrated in Figure 1, memcached protocol access for InnoDB is implemented via a memcached daemon plug-in to the mysqld process, with the memcached protocol mapped to the native InnoDB API.
Figure 1: Memcached API Implementation for InnoDB
With the memcached daemon running in the same process space, users get very low latency access to their data while also leveraging the scalability enhancements delivered with InnoDB 1.2 (which has been introduced as part of the MySQL 5.6.2 Development Milestone Release), and a simple deployment and management model. Multiple web / application servers can remotely access the memcached / InnoDB server to get direct access to a shared data set.
Note that in the current InnoDB implementation, updates made by memcached applications are not written to the binlog. The engineering team plans to add binlogging in a future milestone release, which would deliver more of the benefits identified in the "Design Rationale" section above.
With simultaneous SQL access, users can maintain all the advanced functionality offered by InnoDB including support for foreign keys, XA transactions and complex JOIN operations.
Looking forward, the MySQL engineering team plans to develop the same implementation model used by MySQL Cluster with the memcached server running in a separate process space (discussed below). Users can then choose whichever implementation model makes the most sense for their specific use-case.
You can download the code now from http://labs.mysql.com and select the appropriate build:
- Binary: mysql-5.6.2-labs-innodb-memcached-linux-x86_64.tar.gz
- Source: mysql-5.6.2-labs-innodb-memcached.tar.gz
Memcached and MySQL Cluster (NDB)
Like memcached, MySQL Cluster provides a distributed hash table with in-memory performance for caching, which can now be accessed via the simple memcached API.
Figure 2: Memcached API Implementation for MySQL Cluster (NDB)
Unlike the initial InnoDB implementation discussed above, a MySQL Cluster plug-in is installed within the memcached server as a "memcached driver for NDB", which can access the NDB API to directly query the data nodes, as illustrated in the diagram above.
With the memcached server running in a separate process space, a single MySQL Cluster instance can serve multiple memcached applications, and scale-out on demand with transparent auto-sharding, in-memory data and the ability to add nodes on-line to a running cluster, without interruption to service.
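As a rough intuition for what "auto-sharding" means for key-value data, the Python sketch below hashes each key to pick one of several nodes. This is a generic hash-partitioning illustration under stated assumptions: the function names and the MD5-plus-modulo scheme are chosen for the example, and it is not a description of how MySQL Cluster actually partitions or rebalances data.

```python
import hashlib
from typing import Dict, Iterable, List


def node_for_key(key: str, node_count: int) -> int:
    """Map a key to a node index by hashing it (illustrative scheme only)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys: Iterable[str], node_count: int) -> Dict[int, List[str]]:
    """Group keys by the node the hash assigns them to."""
    placement: Dict[int, List[str]] = {n: [] for n in range(node_count)}
    for key in keys:
        placement[node_for_key(key, node_count)].append(key)
    return placement


sample_keys = [f"user:{i}" for i in range(1000)]
placement = distribute(sample_keys, 4)
```

The application never chooses a node: the same key always hashes to the same place, so clients just issue get and set calls. Note that this naive modulo scheme would remap most keys if node_count changed; a real cluster handles redistribution internally when nodes are added.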
Users can also take advantage of the 99.999% uptime and high write performance of MySQL Cluster to support update-intensive services with extreme availability requirements.
As all updates from memcached applications pass through the NDB API, the binlog injector thread captures and writes events to the binary log for onward replication to slave systems.
As well as having memcached access to MySQL Cluster, users have the additional flexibility of maintaining their own dedicated memcached caching layer for data with the following properties:
- Read-intensive (rarely updated)
- Response-time sensitive
- Does not require persistence
- Simple key-value access patterns
The Memcached API adds another direct NoSQL access method to MySQL Cluster, which already includes C++ (NDB API), Java, JPA, LDAP and HTTP/REST interfaces, all of which can be used concurrently with the SQL interface to serve a broad range of web, telecoms and embedded use-cases handling the simplest to the most complex queries.
You can also read the blog from the engineer who developed this capability and take a look at his presentation from the O'Reilly MySQL Conference.
You can download the code now from http://labs.mysql.com and select the source build: mysql-cluster-7.2-labs-memcached
Explosions in data volumes and internet penetration rates are driving a seemingly insatiable demand for ever-higher levels of database performance. With memcached API support implemented directly for InnoDB and MySQL Cluster, developers and DBAs can preserve the rich functionality of relational databases and SQL, while also having the option to integrate the simple, fast access methods provided by one of the most widely adopted NoSQL protocols.
Let us know what you think of these enhancements directly in comments for each blog. We look forward to working with the community to perfect these new features.