By Mat Keep on Jul 02, 2012
The MySQL Cluster engineering team recently ran a live webinar, now available on-demand, demonstrating the ClusterJ and ClusterJPA NoSQL APIs for MySQL Cluster and how they can be used to build real-time, high-scale Java-based services that require continuous availability.
Attendees asked a number of great questions during the webinar, and I thought it would be useful to share those here, so others are also able to learn more about the Java NoSQL APIs.
First, a little bit about why we developed these APIs and why they are interesting to Java developers.
ClusterJ and ClusterJPA
ClusterJ is a Java interface to MySQL Cluster that provides either a static or dynamic domain object model, similar to the data model used by JDO, JPA, and Hibernate. A simple API gives users extremely high performance for common operations: insert, delete, update, and query.
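To make the idea of an interface-based domain object concrete, here is a minimal plain-Java sketch. It does not use ClusterJ: an in-memory map stands in for a table, and `java.lang.reflect.Proxy` supplies the implementation of a domain interface at runtime. The names `Employee` and `InMemoryEmployeeStore`, and the method names on the store, are hypothetical and chosen only for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain interface: one getter/setter pair per column.
interface Employee {
    int getId();
    void setId(int id);
    String getName();
    void setName(String name);
}

// In-memory stand-in for one table, keyed by primary key. Not the ClusterJ API.
final class InMemoryEmployeeStore {
    private final Map<Integer, Map<String, Object>> rows = new HashMap<>();

    // Returns an empty domain object whose properties live in a private map.
    Employee newInstance() {
        return wrap(new HashMap<>());
    }

    // Insert or update: copies the object's current properties into the table.
    void persist(Employee e) {
        Map<String, Object> row = new HashMap<>();
        row.put("Id", e.getId());
        row.put("Name", e.getName());
        rows.put(e.getId(), row);
    }

    // Primary-key lookup; returns null when no row matches.
    Employee find(int id) {
        Map<String, Object> row = rows.get(id);
        return row == null ? null : wrap(new HashMap<>(row));
    }

    // Deletes by primary key and reports whether a row was removed.
    boolean delete(int id) {
        return rows.remove(id) != null;
    }

    // Builds a proxy that serves getX/setX calls from the given property map.
    private static Employee wrap(Map<String, Object> values) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (name.startsWith("set") && args != null && args.length == 1) {
                values.put(name.substring(3), args[0]);
                return null;
            }
            if (name.startsWith("get") && method.getParameterCount() == 0) {
                Object value = values.get(name.substring(3));
                // An unset primitive int property reads as 0 instead of null.
                return (value == null && method.getReturnType() == int.class) ? 0 : value;
            }
            // toString, hashCode and equals fall through to the backing map.
            return method.invoke(values, args);
        };
        return (Employee) Proxy.newProxyInstance(
                Employee.class.getClassLoader(), new Class<?>[] {Employee.class}, handler);
    }
}
```

The application code only ever sees the `Employee` interface. The four common operations (insert, update, query by key, delete) map to `persist`, `persist` again, `find`, and `delete` on the store, with no SQL string in between.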
ClusterJPA works with ClusterJ to extend functionality, including:
- Persistent classes
- Joins in queries
- Lazy loading
- Table and index creation from object model
By eliminating data transformations via SQL, users get lower data access latency and higher throughput. In addition, Java developers have a more natural programming method to directly manage their data, with a complete, feature-rich solution for Object/Relational Mapping. As a result, the development of Java applications is simplified with faster development cycles resulting in accelerated time to market for new services.
MySQL Cluster offers multiple NoSQL APIs alongside Java:
- Memcached for a persistent, high performance, write-scalable Key/Value store
- HTTP/REST via an Apache module
- C++ via the NDB API for the lowest absolute latency
Developers can use SQL as well as NoSQL APIs to access the same data set via multiple query patterns, from simple Primary Key lookups or inserts to complex cross-shard JOINs using Adaptive Query Localization.
Marrying NoSQL and SQL access to an ACID-compliant database offers developers a number of benefits. MySQL Cluster’s distributed, shared-nothing architecture with auto-sharding and real time performance makes it a great fit for workloads requiring high volume OLTP. Users also get the added flexibility of being able to run real-time analytics across the same OLTP data set for real-time business insight.
OK, hopefully you now have a better idea of why ClusterJ and ClusterJPA are available. Now for the Q&A.
Q & A
Q. Why would I use Connector/J vs. ClusterJ?
A. Partly it's a question of whether you prefer to work with SQL (Connector/J) or objects (ClusterJ). Performance of ClusterJ will be better, as there is no need to pass through the MySQL Server. A ClusterJ operation can only act on a single table (e.g. no joins); ClusterJPA extends that capability.
Q. Can I mix different APIs (e.g. ClusterJ, Connector/J) in my application for different query types?
A. Yes. You can mix and match all of the API types: SQL, JDBC, ODBC, ClusterJ, Memcached, REST, C++. They all access the exact same data in the data nodes. Update through one API and the new data is instantly visible to all of the others.
Q. How many TCP connections would a SessionFactory instance create for a cluster of 8 data nodes?
A. SessionFactory has a connection to the mgmd (management node) but otherwise is just a vehicle to create Sessions. Without using connection pooling, a SessionFactory will have one connection open with each data node. Using optional connection pooling allows multiple connections from the SessionFactory to increase throughput.
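As a rough illustration of the arithmetic, the sketch below builds the properties a SessionFactory is usually configured from and estimates the number of data node connections. It assumes that each pooled connection opens one connection to every data node, which is an extrapolation from the answer above and not a documented guarantee. The class name, the host `mgmhost:1186`, and the database `test` are hypothetical. In a real application the properties would typically be passed to `ClusterJHelper.getSessionFactory(...)`, which is left out here so the example runs without the ClusterJ jar.

```java
import java.util.Properties;

final class ClusterJConnectionSketch {

    // Builds the property set a ClusterJ SessionFactory is typically configured from.
    static Properties clusterProperties(String connectString, String database, int poolSize) {
        if (poolSize < 1) {
            throw new IllegalArgumentException("poolSize must be >= 1");
        }
        Properties props = new Properties();
        props.setProperty("com.mysql.clusterj.connectstring", connectString); // mgmd address
        props.setProperty("com.mysql.clusterj.database", database);
        props.setProperty("com.mysql.clusterj.connection.pool.size", Integer.toString(poolSize));
        return props;
    }

    // Assumption: every pooled connection opens one connection to each data node.
    static int estimatedDataNodeConnections(int dataNodes, int poolSize) {
        if (dataNodes < 1 || poolSize < 1) {
            throw new IllegalArgumentException("dataNodes and poolSize must be >= 1");
        }
        return dataNodes * poolSize;
    }

    public static void main(String[] args) {
        Properties props = clusterProperties("mgmhost:1186", "test", 1);
        System.out.println(props.getProperty("com.mysql.clusterj.connection.pool.size"));
        System.out.println(estimatedDataNodeConnections(8, 1)); // no pooling: 8
        System.out.println(estimatedDataNodeConnections(8, 4)); // pool of 4: 32
    }
}
```

For the 8 data node cluster in the question, that gives 8 data node connections without pooling, plus the connection to the management node.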
Q. Can you give details of how ClusterJ optimizes sharding to enhance performance of distributed query processing?
A. Each data node in a cluster runs a Transaction Coordinator (TC), which begins and ends the transaction, but also serves as a resource to operate on the result rows. While an API node (such as a ClusterJ process) can send queries to any TC/data node, there are performance gains if the TC is where most of the result data is stored. ClusterJ computes the shard (partition) key to choose the data node where the row resides as the TC.
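The routing idea can be sketched in a few lines of plain Java. This is not ClusterJ's or MySQL Cluster's actual algorithm: `Objects.hashCode` stands in for the real partitioning hash, partitions are assumed to be spread round-robin across data nodes, and the class and method names are hypothetical.

```java
import java.util.Objects;

final class ShardRouterSketch {
    private final int partitions;
    private final int dataNodes;

    ShardRouterSketch(int partitions, int dataNodes) {
        if (partitions < 1 || dataNodes < 1) {
            throw new IllegalArgumentException("partitions and dataNodes must be >= 1");
        }
        this.partitions = partitions;
        this.dataNodes = dataNodes;
    }

    // Maps a partition key to a partition; floorMod keeps the result non-negative.
    int partitionFor(Object partitionKey) {
        return Math.floorMod(Objects.hashCode(partitionKey), partitions);
    }

    // Assumption: partition p lives on node p % dataNodes (round-robin placement).
    // The chosen node would act as Transaction Coordinator for this key.
    int coordinatorNodeFor(Object partitionKey) {
        return partitionFor(partitionKey) % dataNodes;
    }

    public static void main(String[] args) {
        ShardRouterSketch router = new ShardRouterSketch(8, 4);
        // The same key always routes to the same node, so the TC sits next to the row.
        System.out.println(router.coordinatorNodeFor("customer-42"));
        System.out.println(router.coordinatorNodeFor("customer-42"));
    }
}
```

Because the mapping is deterministic, the API node can pick the coordinator before sending anything, so a transaction that touches rows sharing one partition key avoids an extra hop between data nodes.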
Q. What happens if we perform two primary key lookups within the same transaction? Are they sent to the data node in one transaction?
A. ClusterJ will send identical PK lookups to the same data node.
Q. How is distributed query processing handled by MySQL Cluster?
A. If the data is split between data nodes then all of the information will be transparently combined and passed back to the application. The session will connect to a data node - typically by hashing the primary key - which then interacts with its neighboring nodes to collect the data needed to fulfil the query.
Q. Can I use Foreign Keys with MySQL Cluster?
A. Support for Foreign Keys is included in the MySQL Cluster 7.3 Early Access release.
The NoSQL Java APIs are packaged with MySQL Cluster, available for download here, so feel free to take them for a spin today!