Thursday Dec 13, 2012

NoSQL Memcached API for MySQL: Latest Updates

With data volumes exploding, it is vital to be able to ingest and query data at high speed. For this reason, MySQL has implemented NoSQL interfaces directly to the InnoDB and MySQL Cluster (NDB) storage engines, which bypass the SQL layer completely. Without SQL parsing and optimization, Key-Value data can be written directly to MySQL tables up to 9x faster, while maintaining ACID guarantees.

In addition, users can continue to run complex queries with SQL across the same data set, providing real-time analytics to the business or anonymizing sensitive data before loading to big data platforms such as Hadoop, while still maintaining all of the advantages of their existing relational database infrastructure.

This and more is discussed in the latest Guide to MySQL and NoSQL where you can learn more about using the APIs to scale new generations of web, cloud, mobile and social applications on the world's most widely deployed open source database

The native Memcached API is part of the MySQL 5.6 Release Candidate, and is already available in the GA release of MySQL Cluster. By using the ubiquitous Memcached API for writing and reading data, developers can preserve their investments in Memcached infrastructure by re-using existing Memcached clients, while also eliminating the need for application changes.

Speed, when combined with flexibility, is essential in the world of growing data volumes and variability. Complementing NoSQL access, support for on-line DDL (Data Definition Language) operations in MySQL 5.6 and MySQL Cluster enables DevOps teams to dynamically update their database schema to accommodate rapidly changing requirements, such as the need to capture additional data generated by their applications. These changes can be made without database downtime.

Using the Memcached interface, developers do not need to define a schema at all when using MySQL Cluster.

Lets look a little more closely at the Memcached implementations for both InnoDB and MySQL Cluster.

Memcached Implementation for InnoDB

The Memcached API for InnoDB is previewed as part of the MySQL 5.6 Release Candidate.

As illustrated in the following figure, Memcached for InnoDB is implemented via a Memcached daemon plug-in to the mysqld process, with the Memcached protocol mapped to the native InnoDB API.

Figure 1: Memcached API Implementation for InnoDB

With the Memcached daemon running in the same process space, users get very low latency access to their data while also leveraging the scalability enhancements delivered with InnoDB and a simple deployment and management model. Multiple web / application servers can remotely access the Memcached / InnoDB server to get direct access to a shared data set.

With simultaneous SQL access, users can maintain all the advanced functionality offered by InnoDB including support for Foreign Keys, XA transactions and complex JOIN operations.

Benchmarks demonstrate that the NoSQL Memcached API for InnoDB delivers up to 9x higher performance than the SQL interface when inserting new key/value pairs, with a single low-end commodity server supporting nearly 70,000 Transactions per Second.

Figure 2: Over 9x Faster INSERT Operations

The delivered performance demonstrates MySQL with the native Memcached NoSQL interface is well suited for high-speed inserts with the added assurance of transactional guarantees.

You can check out the latest Memcached / InnoDB developments and benchmarks here

You can learn how to configure the Memcached API for InnoDB here

Memcached Implementation for MySQL Cluster

Memcached API support for MySQL Cluster was introduced with General Availability (GA) of the 7.2 release, and joins an extensive range of NoSQL interfaces that are already available for MySQL Cluster

Like Memcached, MySQL Cluster provides a distributed hash table with in-memory performance. MySQL Cluster extends Memcached functionality by adding support for write-intensive workloads, a full relational model with ACID compliance (including persistence), rich query support, auto-sharding and 99.999% availability, with extensive management and monitoring capabilities.

All writes are committed directly to MySQL Cluster, eliminating cache invalidation and the overhead of data consistency checking to ensure complete synchronization between the database and cache.

Figure 3: Memcached API Implementation with MySQL Cluster

Implementation is simple:

1. The application sends reads and writes to the Memcached process (using the standard Memcached API).

2. This invokes the Memcached Driver for NDB (which is part of the same process)

3. The NDB API is called, providing for very quick access to the data held in MySQL Cluster’s data nodes.

The solution has been designed to be very flexible, allowing the application architect to find a configuration that best fits their needs. It is possible to co-locate the Memcached API in either the data nodes or application nodes, or alternatively within a dedicated Memcached layer.

The benefit of this flexible approach to deployment is that users can configure behavior on a per-key-prefix basis (through tables in MySQL Cluster) and the application doesn’t have to care – it just uses the Memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized.

Using Memcached for Schema-less Data

By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.

Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.


Download the Guide to MySQL and NoSQL to learn more about NoSQL APIs and how you can use them to scale new generations of web, cloud, mobile and social applications on the world's most widely deployed open source database

See how to build a social app with MySQL Cluster and the Memcached API from our on-demand webinar or take a look at the docs

Don't hesitate to use the comments section below for any questions you may have 

Monday Jul 02, 2012

NoSQL Java API for MySQL Cluster: Questions & Answers

The MySQL Cluster engineering team recently ran a live webinar, available now on-demand demonstrating the ClusterJ and ClusterJPA NoSQL APIs for MySQL Cluster, and how these can be used in building real-time, high scale Java-based services that require continuous availability.

Attendees asked a number of great questions during the webinar, and I thought it would be useful to share those here, so others are also able to learn more about the Java NoSQL APIs.

First, a little bit about why we developed these APIs and why they are interesting to Java developers.

ClusterJ and Cluster JPA

ClusterJ is a Java interface to MySQL Cluster that provides either a static or dynamic domain object model, similar to the data model used by JDO, JPA, and Hibernate. A simple API gives users extremely high performance for common operations: insert, delete, update, and query.

ClusterJPA works with ClusterJ to extend functionality, including

- Persistent classes

- Relationships

- Joins in queries

- Lazy loading

- Table and index creation from object model

By eliminating data transformations via SQL, users get lower data access latency and higher throughput. In addition, Java developers have a more natural programming method to directly manage their data, with a complete, feature-rich solution for Object/Relational Mapping. As a result, the development of Java applications is simplified with faster development cycles resulting in accelerated time to market for new services.

MySQL Cluster offers multiple NoSQL APIs alongside Java:

  • - Memcached for a persistent, high performance, write-scalable Key/Value store,
  • - HTTP/REST via an Apache module
  • - C++ via the NDB API for the lowest absolute latency.

Developers can use SQL as well as NoSQL APIs for access to the same data set via multiple query patterns – from simple Primary Key lookups or inserts to complex cross-shard JOINs using Adaptive Query Localization

Marrying NoSQL and SQL access to an ACID-compliant database offers developers a number of benefits. MySQL Cluster’s distributed, shared-nothing architecture with auto-sharding and real time performance makes it a great fit for workloads requiring high volume OLTP. Users also get the added flexibility of being able to run real-time analytics across the same OLTP data set for real-time business insight.

OK – hopefully you now have a better idea of why ClusterJ and JPA are available. Now, for the Q&A.

Q & A

Q. Why would I use Connector/J vs. ClusterJ?

A. Partly it's a question of whether you prefer to work with SQL (Connector/J) or objects (ClusterJ). Performance of ClusterJ will be better as there is no need to pass through the MySQL Server. A ClusterJ operation can only act on a single table (e.g. no joins) - ClusterJPA extends that capability

Q. Can I mix different APIs (ie ClusterJ, Connector/J) in our application for different query types?

A. Yes. You can mix and match all of the API types, SQL, JDBC, ODBC, ClusterJ, Memcached, REST, C++. They all access the exact same data in the data nodes. Update through one API and new data is instantly visible to all of the others.

Q. How many TCP connections would a SessionFactory instance create for a cluster of 8 data nodes?

A. SessionFactory has a connection to the mgmd (management node) but otherwise is just a vehicle to create Sessions. Without using connection pooling, a SessionFactory will have one connection open with each data node. Using optional connection pooling allows multiple connections from the SessionFactory to increase throughput.

Q. Can you give details of how Cluster J optimizes sharding to enhance performance of distributed query processing?

A. Each data node in a cluster runs a Transaction Coordinator (TC), which begins and ends the transaction, but also serves as a resource to operate on the result rows. While an API node (such as a ClusterJ process) can send queries to any TC/data node, there are performance gains if the TC is where most of the result data is stored. ClusterJ computes the shard (partition) key to choose the data node where the row resides as the TC.

Q. What happens if we perform two primary key lookups within the same transaction? Are they sent to the data node in one transaction?

A. ClusterJ will send identical PK lookups to the same data node.

Q. How is distributed query processing handled by MySQL Cluster ?

A. If the data is split between data nodes then all of the information will be transparently combined and passed back to the application. The session will connect to a data node - typically by hashing the primary key - which then interacts with its neighboring nodes to collect the data needed to fulfil the query.

Q. Can I use Foreign Keys with MySQL Cluster

A. Support for Foreign Keys is included in the MySQL Cluster 7.3 Early Access release


The NoSQL Java APIs are packaged with MySQL Cluster, available for download here so feel free to take them for a spin today!

Key Resources

MySQL Cluster on-line demo 

MySQL ClusterJ and JPA On-demand webinar 

MySQL ClusterJ and JPA documentation

MySQL ClusterJ and JPA whitepaper and tutorial

Friday Mar 30, 2012

Guide to MySQL & NoSQL, Webinar Q&A

Yesterday we ran a webinar discussing the demands of next generation web services and how blending the best of relational and NoSQL technologies enables developers and architects to deliver the agility, performance and availability needed to be successful.

Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into areas like auto-sharding and cross-shard JOINs, replication, performance, client libraries, etc. So I thought it would be useful to post those below, for the benefit of those unable to attend the webinar.

Before getting to the Q&A, there are a couple of other resources that maybe useful to those looking at NoSQL capabilities within MySQL:

- On-Demand webinar

- Slides used during the webinar

- Guide to MySQL and NoSQL whitepaper 

- MySQL Cluster demo, including NoSQL interfaces, auto-sharing, high availability, etc. 

So here is the Q&A from the event 

Q. Where does MySQL Cluster fit in to the CAP theorem?

A. MySQL Cluster is flexible. A single Cluster will prefer consistency over availability in the presence of network partitions. A pair of Clusters can be configured to prefer availability over consistency. A full explanation can be found on the MySQL Cluster & CAP Theorem blog post. 

Q. Can you configure the number of replicas? (the slide used a replication factor of 1)

Yes. A cluster is configured by an .ini file. The option NoOfReplicas sets the number of originals and replicas: 1 = no data redundancy, 2 = one copy etc. Usually there's no benefit in setting it >2.

Q. Interestingly most (if not all) of the NoSQL databases recommend having 3 copies of data (the replication factor).   

Yes, with configurable quorum based Reads and writes. MySQL Cluster does not need a quorum of replicas online to provide service. Systems that require a quorum need > 2 replicas to be able to tolerate a single failure. Additionally, many NoSQL systems take liberal inspiration from the original GFS paper which described a 3 replica configuration. MySQL Cluster avoids the need for a quorum by using a lightweight arbitrator. You can configure more than 2 replicas, but this is a tradeoff between incrementally improved availability, and linearly increased cost.

Q. Can you have cross node group JOINS? Wouldn't that run into the risk of flooding the network?

MySQL Cluster 7.2 supports cross nodegroup joins. A full cross-join can require a large amount of data transfer, which may bottleneck on network bandwidth. However, for more selective joins, typically seen with OLTP and light analytic applications, cross node-group joins give a great performance boost and network bandwidth saving over having the MySQL Server perform the join.

Q. Are the details of the benchmark available anywhere? According to my calculations it results in approx. 350k ops/sec per processor which is the largest number I've seen lately

The details are linked from Mikael Ronstrom's blog

The benchmark uses a benchmarking tool we call flexAsynch which runs parallel asynchronous transactions. It involved 100 byte reads, of 25 columns each. Regarding the per-processor ops/s, MySQL Cluster is particularly efficient in terms of throughput/node. It uses lock-free minimal copy message passing internally, and maximizes ID cache reuse. Note also that these are in-memory tables, there is no need to read anything from disk.

Q. Is access control (like table) planned to be supported for NoSQL access mode?

Currently we have not seen much need for full SQL-like access control (which has always been overkill for web apps and telco apps). So we have no plans, though especially with memcached it is certainly possible to turn-on connection-level access control. But specifically table level controls are not planned.

Q. How is the performance of memcached APi with MySQL against memcached+MySQL or any other Object Cache like Ecache with MySQL DB?

With the memcache API we generally see a memcached response in less than 1 ms. and a small cluster with one memcached server can handle tens of thousands of operations per second.

Q. Can .NET can access MemcachedAPI?

Yes, just use a .Net memcache client such as the enyim or BeIT memcache libraries.

Q. Is the row level locking applicable when you update a column through memcached API?

An update that comes through memcached uses a row lock and then releases it immediately. Memcached operations like "INCREMENT" are actually pushed down to the data nodes. In most cases the locks are not even held long enough for a network round trip.

Q. Has anyone published an example using something like PHP? I am assuming that you just use the PHP memcached extension to hook into the memcached API. Is that correct?

Not that I'm aware of but absolutely you can use it with php or any of the other drivers

Q. For beginner we need more examples.

Take a look here for a fully worked example

Q. Can I access MySQL using Cobol (Open Cobol) or C and if so where can I find the coding libraries etc?

A. There is a cobol implementation that works well with MySQL, but I do not think it is Open Cobol. Also there is a MySQL C client library that is a standard part of every mysql distribution

Q. Is there a place to go to find help when testing and/implementing the NoSQL access?

If using Cluster then you can use the alias or post on the MySQL Cluster forum

Q. Are there any white papers on this? 

Yes - there is more detail in the MySQL Guide to NoSQL whitepaper

If you have further questions, please don’t hesitate to use the comments below!

Monday Aug 01, 2011

MySQL 5.6 Replication – New Early Access Features

At OSCON 2011 last week, Oracle delivered more early access (labs) features for MySQL 5.6 replication. These features are focused on better integration, performance and data integrity, and are summarized in this blog with links to resources enabling users to download, configure and evaluate them[Read More]

Get the latest updates on products, technology, news, events, webcasts, customers and more.




« June 2016