Friday Oct 07, 2011

MySQL Cluster 7.2 (DMR2): NoSQL, Key/Value, Memcached

70x Higher Performance, Cross Data Center Scalability and New NoSQL Interface

Its been an exciting week for all involved with MySQL Cluster, with the announcement of the second Development Milestone Release (7.2.1) at Oracle Open World. Highlights include:

- Enabling next generation web services: 70x higher complex query performance, native memcached API and integration with the latest MySQL 5.5 server

- Enhancing cross data scalability: new multi-site clustering and enhanced active/active replication

- Simplified provisioning: consolidated user privileges.

You can download the DMR for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).

You can also read up on the detail of each of these features in the new article posted at the MySQL Developer Zone. In this blog, I’ll summarize the main parts of the announcement.

70x Higher Performance with Adaptive Query Localization (AQL)

Previewed as part of the first MySQL Cluster DMR, AQL is enabled by a new Index Statistics function that allows the SQL optimizer to build a better execution plan for each query.

As a result, JOIN operations are pushed down to the data nodes where the query executes in parallel on local copies of the data. A merged result set is then sent back to the MySQL Server, significantly enhancing performance by reducing network trips.

Take a look at how this is used by a web-based content management to increase performance by 70x

Adaptive Query Localization enables MySQL Cluster to better serve those use-cases that have the need to run real-time analytics across live data sets, along with high throughput OLTP operations. Examples include recommendations engines and clickstream analysis in web applications, pre-pay billing promotions in mobile telecoms networks or fraud detection in payment systems.

New NoSQL Interface and Schema-less Storage with the memcached API

The memcached interface released as an Early Access project with the first MySQL Cluster DMR is now integrated directly into the MySQL Cluster 7.2.1 trunk, enabling simpler evaluation.

The popularity of Key/Value stores has increased dramatically. With MySQL Cluster and the new memcached API, you have all the benefits of an ACID RDBMS, combined with the performance capabilities of Key/Value store.

By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.

Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.

You can read more about the design goals and implementation of the memcached API for MySQL Cluster here.

Integration with MySQL 5.5

MySQL Cluster 7.2.1 is integrated with MySQL Server 5.5, providing binary compatibility to existing MySQL Server deployments. Users can now fully exploit the latest capabilities of both the InnoDB and MySQL Cluster storage engines within a single application.

Users simply install the new MySQL Cluster binary including the MySQL 5.5 release, restart the server and immediate have access to both InnoDB and MySQL Cluster!

Enhancing Cross Data Center Scalability: Simplified Active / Active Replication

MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to reduce the affects of geographic latency by pushing data closer to the user, as well as providing a capability for disaster recovery.

Geographic replication has always been designed around an Active / Active technology, so if applications are attempting to update the same row on different clusters at the same time, the conflict can be detected and resolved. With the release of MySQL Cluster 7.2.1, implementing Active / Active replication has become a whole lot simpler. Developers no longer need to implement and manage timestamp columns within their applications. Also rollbacks can be made to whole transactions rather than just individual operations.

You can learn more here.

Enhancing Cross Data Center Scalability: Multi-Site Clustering

MySQL Cluster 7.2.1 DMR provides a new option for cross data center scalability – multi-site clustering. For the first time splitting data nodes across data centers is a supported deployment option.

Improvements to MySQL Cluster’s heartbeating mechanism with a new “ConnectivityCheckPeriod” parameter enables greater resilience to temporary latency spikes on a WAN, thereby maintaining operation of the cluster.

With this deployment model, users can synchronously replicate updates between data centers without needing conflict detection and resolution, and automatically failover between those sites in the event of a node failure.

Users need to characterize their network bandwidth and latencies, and observe best practices in configuring both their network environment and Cluster. More guidance is available here.

User Privilege Consolidation

User privilege tables are now consolidated into the data nodes and centrally accessible by all MySQL servers accessing the cluster.

Previously the privilege tables were local to each MySQL server, meaning users and their associated privileges had to be managed separately on each server. By consolidating privilege data, users need only be defined once and managed centrally, saving Systems Administrators significant effort and reducing cost of operations.

Summary

The MySQL Cluster 7.2.1 DMR enables new classes of use-cases to benefit from web-scale performance with carrier-grade availability.  We also have a great webinar coming up on Wednesday October 19th  where the engineering and product management team will discuss the enhancements in more detail, and how you can use them today. You can sign up here.

You can download the DMR for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).

You can learn more about the MySQL Cluster architecture from our Guide to scaling web databases

Let us know what you think of these enhancements directly in comments of this or the associated blogs. We look forward to working with the community to perfect these new features.

Tuesday Jun 14, 2011

Scaling Web Databases: Auto-Sharding with MySQL Cluster

The realities of today’s successful web services are creating new demands that many legacy databases were just not designed to handle:

- The need to scale writes, as well as reads, both within and across geographically dispersed data centers;

- The need to scale operational agility to keep pace with database load and application requirements. This means being able to add capacity and performance to the database, and to evolve the schema – all without downtime;

- The need to scale queries by having flexibility in the APIs used to access the database;

- The need to scale the database while maintaining continuous availability for both failures as well as scheduled maintenance events.

Each of the requirements above warrant their own dedicated blog, which I’ll find time to write over the next few weeks.

But to get started, I wanted to discuss how the MySQL Cluster database addresses the first point – scaling writes to the database with automatic sharding and geographic replication.

Auto-Sharding

MySQL Cluster is implemented as a distributed, multi-master database with no single point of failure. Tables are automatically sharded across a pool of low cost commodity nodes, enabling the database to scale horizontally to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs (memcached, REST/HTTP, C++, Java, JPA and LDAP). Up to 255 nodes are supported, of which 48 are data nodes. You can read more about the different types of nodes here.

By automatically sharding tables in the database, MySQL Cluster eliminates the need to shard at the application layer, greatly simplifying application development and maintenance.

Sharding is based on the hashing of the primary key, though users can override this by telling MySQL Cluster which fields from the primary key should be used in the hashing algorithm. Hashing on the primary key generally leads to a more even distribution of data and queries across the cluster than alternative approached such as range partitioning.

Figure 1 demonstrates how MySQL Cluster shards tables across data nodes of the cluster.

Figure 1: Auto-Sharding in MySQL Cluster

You will see from the figure above that MySQL Cluster automatically creates “node groups” from the number of replicas and data nodes specified by the user. Updates are synchronously replicated between members of the node group to protect against data loss and enable sub-second failover in the event of a node failure.

Figure 2 shows how MySQL Cluster creates primary and secondary fragments of each shard.


Figure 2: Eliminating Data Loss with Cross-Shard Fragments

MySQL Cluster is an active/active architecture with multi-master replication, so updates made by any application or SQL node accessing the cluster are instantly available to all of the other nodes accessing the cluster.

Unlike other distributed databases, users do not lose the ability to perform JOIN operations or sacrifice ACID-guarantees. In the Development Release of MySQL Cluster (7.2), Adaptive Query Localization pushes JOIN operations down to the data nodes where they are executed locally and in parallel. We've seen 20-40x higher throughput from the community members that have tested it.

Geographic Replication

Of course, web services are global and so developers will want to ensure their databases can scale-out across regions. MySQL Cluster offers Geographic Replication which distributes clusters to remote data centers, serving to reduce the affects of geographic latency as well as provide a facility for disaster recovery.

Figure 3: Geographic Replication with MySQL Cluster

Geographic Replication is asynchronous and based on standard MySQL replication – with one important difference – it is active/active so supports the detection and resolution of conflicts when the same row is updated across different clusters. This does currently require the addition of a timestamp column in the application, but that is expected to be eliminated in future releases.

Where the Rubber Meets the Road

Auto-sharding and geographic replication are all great technologies, but what do they mean in terms of delivered performance ?

The MySQL Cluster development team recently ran a series of benchmarks that characterized performance across 8 x dual socket 2.93GHz, 6 core commodity Intel servers, each equipped with 24GB of RAM. As seen in the figure below, MySQL Cluster delivered just under 2.5 million updates per second with 2 x data nodes configured per server.

Figure 4: MySQL Cluster performance scaling-out on commodity nodes.

Across 16 Intel servers, MySQL Cluster achieved just under 7 million read operations per second. We ran out of time in the test cluster before being able to complete the test of write performance, but will return to those efforts soon.

Wrap-Up

So what does all of this mean ? There is an ever-growing array of options for developers to choose from when scaling out new generations of web applications. Don’t assume that relational databases can’t scale, or offer the kind of operational agility demanded by today’s highly dynamic services. MySQL Cluster is already proven as one such option….and you don’t have to throw away ACID guarantees or the ability to run complex queries to get scalability or schema agility.

You can learn about how MySQL Cluster implements auto-sharding, along with other key features for web services such as online schema updates and NoSQL interfaces from a new on-demand webinar.

And of course MySQL Cluster is open source, so you are free to download, develop and deploy with it. The latest GA release is here.

The MySQL Cluster 7.2 Development Milestone Release including Adaptive Query Localization is here (select the Development Release tab):

Finally, if you wanted to try out MySQL Cluster with the memcached API, you can get it from the latest build on the MySQL labs site.

As ever, let us know how these technologies work for you, either in the comments below or via the MySQL Cluster forum.

About

Get the latest updates on products, technology, news, events, webcasts, customers and more.

Twitter


Facebook

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
5
6
9
10
11
12
13
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today