MySQL Cluster 7.2 Development Milestone Release - NoSQL with Memcached and 20x Higher JOIN Performance
By Mat Keep on Apr 12, 2011
The announcements at the Oracle Collaborate and O'Reilly MySQL conferences mark an exciting milestone in the development of the MySQL Cluster database.
MySQL Cluster is already proven as a write-scalable, real-time transactional database, combining 99.999% availability with the low TCO of open source. With a distributed, multi-master architecture and no single point of failure, MySQL Cluster scales horizontally on commodity hardware to serve read and write intensive workloads.
With these enhancements announced in the Development Milestone Release, MySQL Cluster can be extended to serve a broader range of workloads.
Summary of Key Enhancements
The MySQL Cluster 7.2 Development Milestone Release and latest labs.mysql.com builds deliver enhancements based on input from the community and customers, including support for the memcached NoSQL API, faster JOIN performance and simplified administration:
- NoSQL with Memcached support enables users to extend memcached by deploying a scalable, persistent, highly available data-store supporting high volumes of reads and writes with real-time performance, all accessed via the trusted, proven and popular Memcached API
- Adaptive Query Localization delivers over 20x higher performance when executing complex queries.
- Shared User Privilege Tables radically simplifies the provisioning and administration of MySQL Cluster by consolidating previous distributed user privilege tables into the data nodes - accessible from all MySQL Servers
In addition to memcached access, the engineering team also previewed JSON as an additional NoSQL interface to MySQL Cluster, allowing applications to directly query and modify the database, and return results directly to a browser, eliminating transformations to SQL. Expect to hear more about this shortly.
In the meantime, what do the enhancements delivered by the Development Milestone Release and labs.mysql.com do, and why should you care?
Hopefully the following will answer those questions, as well as providing you links to the blogs written directly by the product team who will discuss the implementation detail - and wax lyrical about the fruits of their labor.
NoSQL with Memcached API
Like memcached, MySQL Cluster provides a distributed hash table with in-memory performance for caching, accessed via the simple memcached API. MySQL Cluster extends memcached functionality by adding support for write-intensive workloads, a full relational model with ACID compliance (including persistence), rich query support, transparent partitioning for scale-out and 99.999% availability, with extensive management and monitoring.
All writes are committed directly to MySQL Cluster, eliminating cache invalidation and the overhead of data consistency checking to ensure complete synchronization between the database and cache. Duplication of data between the cache and database can be eliminated, enabling simpler re-use of data across multiple applications, and reducing memory footprint.
Using the memcached API, users can simplify their architecture by compressing the caching and database layers into a single data tier, managed by MySQL Cluster, enabling them to:
- Preserve their existing investments in memcached by re-using existing memcached clients and without requiring application changes.
- Deliver higher write performance for update intensive applications.
- Simplify scale-out (both at the Memcached and MySQL Cluster layers).
- Improve uptime and availability.
The Memcached API adds another NoSQL access method to MySQL Cluster, which already includes C++ (NDB API), Java, JPA, LDAP and HTTP/REST APIs, all of which can be used concurrently with SQL to serve a broad range of web, telecoms and embedded use-cases handling the simplest to the most complex queries.
The Memcached API enables web services to directly access the MySQL Cluster database without transformations to SQL, ensuring low latency and high throughput for read/write operations.
Figure 1: Implementation of memcached Access to MySQL Cluster
Implementation is simple - MySQL-Cluster specific code is loaded into the Memcached server as a "memcached driver for NDB" which can access the NDB API directly to read and write from the clustered data nodes.
You can download the code now from http://labs.mysql.com and select the source build: mysql-cluster-7.2-labs-memcached
Adaptive Query Localization
Delivering over 20x higher performance when executing complex JOIN operations, Adaptive Query Localization expands the range of use-cases that can benefit from MySQL Cluster:
- Docudesk, developers of an industry-leading DocQ document
management SaaS web service, achieved 21x higher performance when
testing reporting query response time.1 Casey Brown, Manager of Development & DBA Services at Docudesk said:
"Docudesk relies on MySQL Cluster to support our DocQ SaaS offering which demands high update rates, low latency and continuous availability from the database. Testing of Adaptive Query Localization has yielded over 20x higher performance on complex queries within our application, enabling Docudesk to expand our use of MySQL Cluster into a broader range of highly dynamic web services."
- 25x more Transactions Per Second (TPS) was achieved when running a transactional ecommerce benchmark based on the TPC-W suite, simulating multiple users searching, browsing and buying books from an on-line retailer.2
By default, JOIN operations are executed in the MySQL Server, which provides high performance when the data is local. In MySQL Cluster, the data is distributed across multiple, redundant data nodes. As a consequence, the nested-loop-join in the MySQL Server needs to access the data nodes repeatedly at each step. As the depth of the JOIN or the size of the intermediate result sets grow, the number of messages to the data nodes increases quickly, which can dramatically slow down query execution.
The Adaptive Query Localization functionality ships queries from the MySQL Server to the data nodes where the query executes on local copies of the data in parallel, and then returns the merged result set back to the MySQL Server, significantly enhancing performance by reducing network trips.
Adaptive Query Localization enables MySQL Cluster to better serve those use-cases that have the need to run complex queries along with high throughput OLTP operations in one application and database solution or where multiple applications need access to the same data but have different access/query requirements.
Note: Adaptive Query Localization is supported with the following conditions:
- JOINed columns must have the same data type
- Queries referencing BLOBs are not supported
- Explicit locking is not supported (existing implicit row-based locking is enforced)
- Only supports fully or partially qualified primary keys or plain indexes as access method for child tables
The parameterised queries that underpin this feature can also be used directly by applications using the NDB API directly - reducing the number of database accesses they need to make and so improving performance even further.
Simplified Provisioning and Administration
Radically simplifying the provisioning and administration of MySQL Cluster, user privilege tables are now consolidated into the data nodes and centrally accessible by all MySQL servers accessing the cluster.
Previously the privilege tables were local to each MySQL server, meaning users and their associated privileges had to be managed separately on each server. By consolidating privilege data, users need only be defined once and managed centrally, saving Systems Administrators significant effort and reducing cost of operations.
The MySQL Cluster 7.2 Development Milestone Release and new labs.mysql.com builds enable new classes of use-cases to benefit from web-scale performance with carrier-grade availability.
You can download the MySQL Cluster 7.2 Development Milestone Release for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).
You can download the memcached API for MySQL Cluster now from http://labs.mysql.com and select the source build: mysql-cluster-7.2-labs-memcached
Let us know what you think of these enhancements directly in comments for each blog. We look forward to working with the community to perfect these new features.
1 Casey Brown, DBA and Development Manager: "Here is a sample reporting query. SET ndb_join_pushdown=0; SELECT DISTINCT f.ownerUser FROM invitations i INNER JOIN sharedfiles s ON i.sharedfile = s.id INNER JOIN files f ON s.file = f.id WHERE (i.group = 'ed881bac6397ede33c0a285c9f50bb83') AND f.ownerUser <> '0ac2006cffa2d22d5f34ee20075bfad1' 739 rows in set (2.11 sec) SET ndb_join_pushdown=1; <run query again> 739 rows in set (0.10 sec) That is a massive improvement"
2 Results reported in the Enhancing JOIN Performance Webinar, November 2010: http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html