Friday Mar 30, 2012

Guide to MySQL & NoSQL, Webinar Q&A

Yesterday we ran a webinar discussing the demands of next generation web services and how blending the best of relational and NoSQL technologies enables developers and architects to deliver the agility, performance and availability needed to be successful.

Attendees posted a number of great questions to the MySQL developers, serving to provide additional insights into areas like auto-sharding and cross-shard JOINs, replication, performance, client libraries, etc. So I thought it would be useful to post those below, for the benefit of those unable to attend the webinar.

Before getting to the Q&A, there are a couple of other resources that may be useful to those looking at NoSQL capabilities within MySQL:

- On-Demand webinar

- Slides used during the webinar

- Guide to MySQL and NoSQL whitepaper 

- MySQL Cluster demo, including NoSQL interfaces, auto-sharding, high availability, etc.

So here is the Q&A from the event:

Q. Where does MySQL Cluster fit in to the CAP theorem?

A. MySQL Cluster is flexible. A single Cluster will prefer consistency over availability in the presence of network partitions. A pair of Clusters can be configured to prefer availability over consistency. A full explanation can be found on the MySQL Cluster & CAP Theorem blog post. 

Q. Can you configure the number of replicas? (the slide used a replication factor of 1)

Yes. A cluster is configured by an .ini file. The option NoOfReplicas sets the total number of copies of the data, originals included: 1 = no data redundancy, 2 = one additional copy, and so on. Usually there's no benefit in setting it >2.
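As a rough illustration of what NoOfReplicas means in practice, the sketch below works out how many node groups a set of data nodes forms, assuming the usual rule that the number of data nodes must divide evenly by NoOfReplicas. The function name is purely illustrative.

```python
def node_groups(data_nodes: int, no_of_replicas: int) -> int:
    """Return how many node groups the data nodes form.

    Each node group stores one shard of the data, held no_of_replicas times.
    """
    if no_of_replicas < 1:
        raise ValueError("NoOfReplicas must be at least 1")
    if data_nodes % no_of_replicas != 0:
        raise ValueError("data nodes must be a multiple of NoOfReplicas")
    return data_nodes // no_of_replicas


# 4 data nodes with NoOfReplicas=2: 2 node groups, every row stored twice.
print(node_groups(4, 2))
# 4 data nodes with NoOfReplicas=1: 4 node groups, no redundancy.
print(node_groups(4, 1))
```

Adding data nodes at a fixed NoOfReplicas adds node groups (more shards), not more copies of each row.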

Q. Interestingly most (if not all) of the NoSQL databases recommend having 3 copies of data (the replication factor).   

Yes, with configurable quorum-based reads and writes. MySQL Cluster does not need a quorum of replicas online to provide service. Systems that require a quorum need more than 2 replicas to be able to tolerate a single failure. Additionally, many NoSQL systems take liberal inspiration from the original GFS paper, which described a 3-replica configuration. MySQL Cluster avoids the need for a quorum by using a lightweight arbitrator. You can configure more than 2 replicas, but this is a tradeoff between incrementally improved availability and linearly increased cost.
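The arithmetic behind the "more than 2 replicas" point can be sketched as follows, assuming a simple strict-majority quorum (real systems often let you tune read and write quorums separately). The function names are illustrative.

```python
def quorum_size(replicas: int) -> int:
    """Smallest strict majority of the replicas."""
    return replicas // 2 + 1


def failures_tolerated(replicas: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replicas - quorum_size(replicas)


for n in (2, 3, 5):
    print(f"{n} replicas: quorum {quorum_size(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

With 2 replicas a majority quorum tolerates no failures at all, which is why quorum-based systems typically start at 3.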

Q. Can you have cross node group JOINS? Wouldn't that run into the risk of flooding the network?

MySQL Cluster 7.2 supports cross nodegroup joins. A full cross-join can require a large amount of data transfer, which may bottleneck on network bandwidth. However, for more selective joins, typically seen with OLTP and light analytic applications, cross node-group joins give a great performance boost and network bandwidth saving over having the MySQL Server perform the join.

Q. Are the details of the benchmark available anywhere? According to my calculations it results in approx. 350k ops/sec per processor which is the largest number I've seen lately

The details are linked from Mikael Ronstrom's blog

The benchmark uses a benchmarking tool we call flexAsynch which runs parallel asynchronous transactions. It involved 100 byte reads, of 25 columns each. Regarding the per-processor ops/s, MySQL Cluster is particularly efficient in terms of throughput/node. It uses lock-free minimal copy message passing internally, and maximizes ID cache reuse. Note also that these are in-memory tables, there is no need to read anything from disk.

Q. Is access control (like table) planned to be supported for NoSQL access mode?

Currently we have not seen much need for full SQL-like access control (which has always been overkill for web apps and telco apps). So we have no plans, though especially with memcached it is certainly possible to turn on connection-level access control. But specifically table level controls are not planned.

Q. How is the performance of memcached API with MySQL against memcached+MySQL or any other Object Cache like Ehcache with MySQL DB?

With the memcached API we generally see a memcached response in less than 1 ms, and a small cluster with one memcached server can handle tens of thousands of operations per second.

Q. Can .NET access the Memcached API?

Yes, just use a .NET memcached client such as the enyim or BeIT memcache libraries.

Q. Is the row level locking applicable when you update a column through memcached API?

An update that comes through memcached uses a row lock and then releases it immediately. Memcached operations like "INCREMENT" are actually pushed down to the data nodes. In most cases the locks are not even held long enough for a network round trip.

Q. Has anyone published an example using something like PHP? I am assuming that you just use the PHP memcached extension to hook into the memcached API. Is that correct?

Not that I'm aware of, but absolutely you can use it with PHP or any of the other drivers.
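Since any standard memcached client should work, here is a minimal Python sketch of what such a client sends on the wire, using the standard memcached text protocol and only the standard library. The host, port (11211 is the memcached default) and the key `town` are assumptions for illustration; a real application would normally use an existing client library instead.

```python
import socket
from typing import Optional


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get(response: bytes) -> Optional[bytes]:
    """Extract the value from a single-key 'get' reply, or None on a miss."""
    header, _, rest = response.partition(b"\r\n")
    if not header.startswith(b"VALUE "):
        return None  # a miss is answered with just "END"
    length = int(header.split()[3])  # VALUE <key> <flags> <bytes>
    return rest[:length]


def roundtrip(host: str, port: int, request: bytes) -> bytes:
    """Send one request and return the reply (single recv: a simplification)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(request)
        return conn.recv(65536)


# Against a running memcached server (hypothetical address):
#   roundtrip("localhost", 11211, build_set("town", b"London"))
#   parse_get(roundtrip("localhost", 11211, build_get("town")))
print(build_set("town", b"London"))
```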

Q. For beginners, we need more examples.

Take a look here for a fully worked example

Q. Can I access MySQL using Cobol (Open Cobol) or C and if so where can I find the coding libraries etc?

A. There is a COBOL implementation that works well with MySQL, but I do not think it is Open Cobol. Also there is a MySQL C client library that is a standard part of every MySQL distribution.

Q. Is there a place to go to find help when testing and/or implementing the NoSQL access?

If using Cluster then you can use the cluster@lists.mysql.com alias or post on the MySQL Cluster forum

Q. Are there any white papers on this? 

Yes - there is more detail in the MySQL Guide to NoSQL whitepaper

If you have further questions, please don’t hesitate to use the comments below!

Friday Feb 24, 2012

Where Would I Use MySQL Cluster?

MySQL Cluster has long been used in telecommunications network services for Subscriber Data Management (HLR/HSS), Service Delivery Platforms and Value-Added Services, and has also been deployed in certain parts of general web infrastructure.

Following the announcements of MySQL Cluster 7.2 General Availability, including new benchmarks demonstrating MySQL Cluster delivering 1 Billion Queries per Minute, I thought it might be worthwhile to highlight examples of use cases for MySQL Cluster.

Web-Based Payment & Financial Services Platforms

MySQL Cluster can be deployed across a range of applications including payment gateways, trading systems and customer service infrastructure.

Payment Gateways

- These are used by merchants to process customer payments

- The gateways need to integrate with multiple credit and debit card systems

- Multiple payment channels have to be supported, i.e. ePayment, mPayment, In-store, etc.

- MySQL Cluster can be used to record full transaction data, including customer & product information

- This data is persisted for set time periods to enable auditing and fraud detection

Web-Based Trading Systems

- MySQL Cluster can be deployed to support the trading engine, persisting the details of each trade

- MySQL Cluster also provides the storage layer for the store-and-forward messaging system used by traders and customers to track transactions

Customer Service Systems

- MySQL Cluster can be used as a command and control system, providing telephony, web portal and call desk integration

- Inbound calls are routed to customer services representatives and customer account details are retrieved in real-time

- Additional support for Interactive Voice Response systems enables customer self-service

Core database requirements of these platforms include:

· ACID compliance to support transactional integrity

· Rapid scale-out to support growth in merchants, traders, customers and payment channels

· Very high insert and update rates

· Low, predictable latency to support real-time trading and customer experience

· 99.999% availability, both to guard against outages and to support the on-line maintenance operations needed to seamlessly evolve services (e.g. adding nodes, upgrading schema, etc.)

· Low TCO to maximize trading margins

Session Management and eCommerce

Providing the back-end to on-line retail sites is an area where MySQL Cluster has a strong track record, providing the following services:

- Enabling a seamless experience as users log-in, search and browse products, and then place orders.

- Managing user accounts, storing each new user session, updating customer profiles and maintaining shopping carts

- Recording and tracking user behavior to integrate with merchandizing systems, enabling real-time cross-sell and upsell promotions

Database requirements for eCommerce include

· ACID compliance to support transactional integrity

· Elastic, on-demand scale-out using commodity hardware to support growing user and order volumes, and holiday season peaks

· Low, predictable latency to support a real time user experience

· High availability to avoid downtime resulting in lost sales and compromised customer satisfaction

· On-line schema changes to support the additions of new product categories or customer profiling attributes

Take a look at the MySQL Web Reference Architectures for best practices in scaling highly available, on-line retail sites

On-Line Gaming

With a huge growth in gamers, and gaming platforms, MySQL Cluster can be used to support the core gaming persistence layer:

- MySQL Cluster manages user accounts, gaming entitlements and session state (life-force, weapons, scores, etc.), along with leaderboards, all in real time

- Manages the eCommerce and billing platform (for in-game purchases)

- Command and control system across gaming platforms, integrating multiple services with avatars and devices

Again, the core requirements of the database include:

· Linear, on-demand scalability of both read and write operations to support the ramp in demand when new games gain traction

· High availability

· Low latency for a real-time gaming experience

Event Data and Content Management

Digital Advertising and Customer Relationship Management

MySQL Cluster can be used to capture customer campaign responses in real time

- Campaign responses are consolidated across multiple channels, including web, social media, SMS, and in-store responses.

- Data is replicated in batches to the MySQL InnoDB storage engine for analysis and reporting

Event Data Capture

MySQL Cluster is used to capture real-time data feeds & metadata from environmental sensors, devices and satellites. Data is then replicated to analysis platforms for transformation and processing

Database requirements include:

· The ability to support high volume insert and update rates, with zero data loss

· Scaled-out on commodity hardware

· Flexible replication topologies to other database engines and across data centers

How to Get Started

The above examples illustrate how MySQL Cluster can be used across a range of web-based services deployed on-premise or in the cloud.

If you have workloads that have similar demands, it’s worth taking a look at MySQL Cluster 7.2. The new MySQL Cluster Evaluation Guide provides best practices in quickly provisioning proof-of-concepts and benchmarking MySQL Cluster with your application.

We’d love to hear more about the types of workloads that you think would benefit from MySQL Cluster, so please use the comments section below and provide feedback.

Wednesday Feb 15, 2012

MySQL Cluster 7.2 GA Released, Delivers 1 BILLION Queries per Minute

70x Higher JOIN Performance, NoSQL Key-Value API & Cross Data Center Sharding with Synchronous Replication 

Oracle is delighted to announce the immediate availability of the production-ready, GA release of MySQL Cluster 7.2, available for download under the GPL, and as part of the commercial MySQL Cluster Carrier Grade Edition, including management tools, product certifications and 24x7 global support.

1 Billion Queries per Minute

MySQL Cluster delivered 1 billion queries per minute (17.6 million queries per second), scaled-out across 8x commodity Intel x86 server nodes, accessed by the NoSQL C++ NDB API.

It did this while maintaining 99.999% availability and complete data consistency across the cluster, demonstrating MySQL Cluster is a great choice for the most demanding web and telecoms services, whether deployed on-premise or in the cloud
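As a quick sanity check on the headline figure, the following sketch converts the quoted per-second rate into per-minute and per-node numbers. The per-node figure assumes the load was spread evenly across the 8 server nodes, which the post does not state explicitly.

```python
QUERIES_PER_SECOND = 17_600_000  # rate quoted in the announcement
SERVER_NODES = 8                 # server nodes quoted in the announcement

per_minute = QUERIES_PER_SECOND * 60
per_node_per_second = QUERIES_PER_SECOND // SERVER_NODES  # assumes even spread

print(f"{per_minute:,} queries per minute")
print(f"{per_node_per_second:,} queries per second per node")
```

This gives 1,056,000,000 queries per minute, or 2,200,000 queries per second per node under the even-spread assumption.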

New Feature Overview

The MySQL Cluster 7.2 GA release builds upon the Development Milestones published over the past 9 months, which provided the community with an opportunity to test and provide feedback on the latest features.

MySQL Cluster 7.2 offers a range of new capabilities designed to enable the delivery of next generation web services, enhance cross data center scalability and improve ease-of-use:

- Enabling next generation web services:

o 70x higher complex query performance

o Native Memcached API

o 4x higher data node scalability

o Integration with the latest MySQL 5.5 server

o Support for Virtual Machine (VM) environments

- Enhancing cross data center scalability:

o New multi-site clustering with auto-sharding and synchronous replication between datacenters

o Improved active/active replication between data centers with eventual consistency

- Improved Ease-of-Use:

o Consolidated user privileges

o MySQL Cluster Manager 1.1.4

Read the MySQL Cluster 7.2 Developer Zone article to get the detail on all of the new features.

You can download the MySQL Cluster 7.2 New Features whitepaper for implementation details and how to get started or join a forthcoming MySQL Cluster 7.2 webinar for your Time Zone to learn more:

Summary

MySQL Cluster 7.2 is the best release to date, enabling projects and applications to benefit from web-scalability with carrier-grade availability and developer agility.

You can review the MySQL Cluster 7.2 documentation, and also put questions to the development team and community via the MySQL Cluster forum.

We look forward to helping you in your new projects, and working with you to continue evolving MySQL Cluster to serve an even broader set of requirements in the future.

Monday Dec 19, 2011

Using MySQL Cluster to Protect & Scale the HDFS Namenode

The MySQL Cluster product team is always interested to see new and innovative uses of the database. Last week, a team of students at the KTH Royal Institute of Technology in Sweden blogged about their use of MySQL Cluster in creating a scalable and highly available HDFS Namenode. The blog has received some pretty wide coverage, but was first picked up by Alex Popescu at the myNoSQL site

There are many established use cases of MySQL Cluster in the web, cloud/SaaS, telecoms and even flight control systems – you can see those we are allowed to talk about publicly here

The KTH team has been working on a project to move all of the metadata from the HDFS / Hadoop namenode to MySQL Cluster. Why did they want to do this, you may ask? Well…:

- The namenode is a single point of failure. If it goes down, so too does the file system

- As a single server, the namenode becomes a bottleneck within heavily loaded HDFS / Hadoop deployments. As server resources are consumed and write volumes increase, the system can grind to a halt. (And with data volumes growing around 40% per year, this will only become more common!)

So KTH decided to move metadata storage to MySQL Cluster. Why, you may ask? Well….

- MySQL Cluster already offered them a replicated, shared-nothing database, distributed across commodity hardware.

- MySQL Cluster is widely deployed with proven stability

- The metadata can be distributed across nodes to scale out capacity, while retaining complete consistency to the clients and eliminating any Single Point of Failure

- Linear scaling of operations per second across the cluster, as new namenodes are added.

Access to the cluster is via the MySQL Cluster Connector for Java, providing a NoSQL, Java based ORM with very low latency. You can learn more about this ClusterJ API here

Of course, the work at KTH is on-going with future optimizations planned – which we will follow with interest.

So how can you determine if MySQL Cluster is the right choice for your new project? We have just updated our MySQL Cluster Evaluation Guide

This update is based around the latest MySQL Cluster 7.2 Development Release which includes a series of enhancements to further broaden the use case of MySQL Cluster, including:

- 70x higher JOIN performance with Adaptive Query Localization pushing JOIN operations down to MySQL Cluster’s data nodes

- Native Key-Value Memcached interface to the cluster allowing schema and schemaless storage

- New cross-data center scalability enhancements

MySQL Cluster is not a fit for every use-case, but by downloading the Evaluation Guide, you’ll get a clear picture of where MySQL Cluster can be useful to you, and best practices in planning and executing your evaluation.

Let us know of other interesting use-cases in the comments below

Wednesday Nov 02, 2011

MySQL Cluster, and NoSQL

Those are the topics we cover in the latest episode of our “Meet The MySQL Experts” podcast.

Mat Keep and Bernd Ocklin talk about new database requirements, and walk us through what's new in the second Development Milestone Release of MySQL Cluster 7.2, including impressive performance improvements, new NoSQL access via memcached, cross data center scalability, and more...

Enjoy the podcast!

Friday Oct 07, 2011

MySQL Cluster 7.2 (DMR2): NoSQL, Key/Value, Memcached

70x Higher Performance, Cross Data Center Scalability and New NoSQL Interface

It's been an exciting week for all involved with MySQL Cluster, with the announcement of the second Development Milestone Release (7.2.1) at Oracle Open World. Highlights include:

- Enabling next generation web services: 70x higher complex query performance, native memcached API and integration with the latest MySQL 5.5 server

- Enhancing cross data center scalability: new multi-site clustering and enhanced active/active replication

- Simplified provisioning: consolidated user privileges.

You can download the DMR for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).

You can also read up on the detail of each of these features in the new article posted at the MySQL Developer Zone. In this blog, I’ll summarize the main parts of the announcement.

70x Higher Performance with Adaptive Query Localization (AQL)

Previewed as part of the first MySQL Cluster DMR, AQL is enabled by a new Index Statistics function that allows the SQL optimizer to build a better execution plan for each query.

As a result, JOIN operations are pushed down to the data nodes where the query executes in parallel on local copies of the data. A merged result set is then sent back to the MySQL Server, significantly enhancing performance by reducing network trips.

Take a look at how this is used by a web-based content management system to increase performance by 70x.

Adaptive Query Localization enables MySQL Cluster to better serve those use-cases that have the need to run real-time analytics across live data sets, along with high throughput OLTP operations. Examples include recommendations engines and clickstream analysis in web applications, pre-pay billing promotions in mobile telecoms networks or fraud detection in payment systems.

New NoSQL Interface and Schema-less Storage with the memcached API

The memcached interface released as an Early Access project with the first MySQL Cluster DMR is now integrated directly into the MySQL Cluster 7.2.1 trunk, enabling simpler evaluation.

The popularity of Key/Value stores has increased dramatically. With MySQL Cluster and the new memcached API, you have all the benefits of an ACID RDBMS, combined with the performance capabilities of a Key/Value store.

By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.

Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.
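The key-prefix idea can be pictured with a small sketch. This is only a model of the routing concept described above, not the actual configuration format or code used by the memcached API for MySQL Cluster: the table and column names, the prefixes, and the choice to strip the prefix before lookup are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Target(NamedTuple):
    table: str
    column: str


# Hypothetical prefix-to-column mappings chosen by the developer.
PREFIX_MAP = {
    "user:": Target("users", "profile"),
    "cart:": Target("shopping_carts", "contents"),
}
# Hypothetical catch-all table giving schema-less storage.
DEFAULT_TARGET = Target("kv_default", "value")


def resolve(key: str) -> Tuple[Target, str]:
    """Return the (table, column) target and the row key for a memcached key."""
    # Try longer prefixes first so a more specific prefix wins.
    for prefix in sorted(PREFIX_MAP, key=len, reverse=True):
        if key.startswith(prefix):
            return PREFIX_MAP[prefix], key[len(prefix):]
    return DEFAULT_TARGET, key


print(resolve("user:42"))   # routed to the users.profile column
print(resolve("anything"))  # falls through to the default table
```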

You can read more about the design goals and implementation of the memcached API for MySQL Cluster here.

Integration with MySQL 5.5

MySQL Cluster 7.2.1 is integrated with MySQL Server 5.5, providing binary compatibility to existing MySQL Server deployments. Users can now fully exploit the latest capabilities of both the InnoDB and MySQL Cluster storage engines within a single application.

Users simply install the new MySQL Cluster binary including the MySQL 5.5 release, restart the server and immediately have access to both InnoDB and MySQL Cluster!

Enhancing Cross Data Center Scalability: Simplified Active / Active Replication

MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to reduce the effects of geographic latency by pushing data closer to the user, as well as providing a capability for disaster recovery.

Geographic replication has always been designed around an Active / Active technology, so if applications are attempting to update the same row on different clusters at the same time, the conflict can be detected and resolved. With the release of MySQL Cluster 7.2.1, implementing Active / Active replication has become a whole lot simpler. Developers no longer need to implement and manage timestamp columns within their applications. Also rollbacks can be made to whole transactions rather than just individual operations.
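To make concrete what managing timestamp columns involved, here is a minimal sketch of "greatest timestamp wins" conflict resolution. It illustrates the general technique only and is not MySQL Cluster's implementation; the `Row` type and `apply_replicated_update` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Row:
    value: str
    updated_at: int  # timestamp column the application had to maintain


def apply_replicated_update(local: Row, incoming: Row) -> Row:
    """Keep whichever version of the row carries the greater timestamp."""
    return incoming if incoming.updated_at > local.updated_at else local


local = Row("written on cluster A", updated_at=100)
stale = Row("written on cluster B", updated_at=90)
fresh = Row("written on cluster B", updated_at=110)

print(apply_replicated_update(local, stale).value)  # local version is kept
print(apply_replicated_update(local, fresh).value)  # incoming version wins
```

Note that a scheme like this decides row by row, which is why the ability to roll back whole transactions is a meaningful improvement.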

You can learn more here.

Enhancing Cross Data Center Scalability: Multi-Site Clustering

MySQL Cluster 7.2.1 DMR provides a new option for cross data center scalability – multi-site clustering. For the first time splitting data nodes across data centers is a supported deployment option.

Improvements to MySQL Cluster’s heartbeating mechanism, together with a new “ConnectivityCheckPeriod” parameter, enable greater resilience to temporary latency spikes on a WAN, thereby maintaining operation of the cluster.

With this deployment model, users can synchronously replicate updates between data centers without needing conflict detection and resolution, and automatically failover between those sites in the event of a node failure.

Users need to characterize their network bandwidth and latencies, and observe best practices in configuring both their network environment and Cluster. More guidance is available here.

User Privilege Consolidation

User privilege tables are now consolidated into the data nodes and centrally accessible by all MySQL servers accessing the cluster.

Previously the privilege tables were local to each MySQL server, meaning users and their associated privileges had to be managed separately on each server. By consolidating privilege data, users need only be defined once and managed centrally, saving Systems Administrators significant effort and reducing cost of operations.

Summary

The MySQL Cluster 7.2.1 DMR enables new classes of use-cases to benefit from web-scale performance with carrier-grade availability.  We also have a great webinar coming up on Wednesday October 19th  where the engineering and product management team will discuss the enhancements in more detail, and how you can use them today. You can sign up here.

You can download the DMR for evaluation now from: http://dev.mysql.com/downloads/cluster/ (select Development Milestone Release tab).

You can learn more about the MySQL Cluster architecture from our Guide to scaling web databases

Let us know what you think of these enhancements directly in comments of this or the associated blogs. We look forward to working with the community to perfect these new features.

Monday Oct 03, 2011

Synchronously Replicating Databases Across Data Centers – Are you Insane?

 

Well actually….no. The second Development Milestone Release of MySQL Cluster 7.2 introduces support for what we call “Multi-Site Clustering”. In this post, I’ll provide an overview of this new capability, and the factors you need to weigh when considering it as a deployment option to scale geographically dispersed database services.

You can read more about MySQL Cluster 7.2.1 in the article posted on the MySQL Developer Zone

MySQL Cluster has long offered Geographic Replication, distributing clusters to remote data centers to reduce the effects of geographic latency by pushing data closer to the user, as well as providing a capability for disaster recovery.

Multi-Site Clustering provides a new option for cross data center scalability. For the first time splitting data nodes across data centers is a supported deployment option. With this deployment model, users can synchronously replicate updates between data centers without needing to modify their application or schema for conflict handling, and automatically failover between those sites in the event of a node failure.

MySQL Cluster offers high availability by maintaining a configurable number of data replicas.  All replicas are synchronously maintained by a built-in 2 phase commit protocol.  Data node and communication failures are detected and handled automatically.  On recovery, data nodes automatically rejoin the cluster, synchronize with running nodes, and resume service.

All replicas of a given row are stored in a set of data nodes known as a nodegroup.  To provide service, a cluster must have at least one data node from each nodegroup available at all times.  When the cluster detects that the last node in a nodegroup has failed, the remaining cluster nodes will be gracefully shutdown, to ensure the consistency of the stored databases on recovery.
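The availability rule above (at least one live data node in every nodegroup) can be expressed in a few lines. The node IDs and the two-nodegroup layout are made up for the example.

```python
from typing import Dict, List, Set


def cluster_can_serve(nodegroups: Dict[int, List[int]], failed: Set[int]) -> bool:
    """True if every nodegroup still has at least one node that has not failed."""
    return all(
        any(node not in failed for node in nodes)
        for nodes in nodegroups.values()
    )


# Hypothetical layout: 4 data nodes, 2 replicas, so 2 nodegroups.
groups = {0: [1, 2], 1: [3, 4]}

print(cluster_can_serve(groups, {1, 3}))  # one node lost per group: still serving
print(cluster_can_serve(groups, {1, 2}))  # all of nodegroup 0 lost: shutdown
```

Losing half the data nodes is survivable as long as the failures are spread across nodegroups; losing both nodes of one group is not.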

Improvements to the heartbeating mechanism used by MySQL Cluster enable greater resilience to temporary latency spikes on a WAN, thereby maintaining operation of the cluster. A new “ConnectivityCheck” mechanism is introduced, which must be explicitly configured. This extra mechanism adds messaging overheads and failure handling latency, and so is not switched on by default.

When configuring Multi-Site clustering, the following factors must be considered:

Bandwidth
Low bandwidth between data nodes can slow data node recovery.  In normal operation, the available bandwidth can limit the maximum system throughput.  If link saturation causes latency on individual links to increase, then node failures, and potentially cluster failure could occur.

Latency and performance
Synchronously committing transactions over a wide area increases the latency of operation execution and commit, therefore individual operations are slowed. To maintain the same overall throughput, higher client concurrency is required.  With the same client concurrency level, throughput will decrease relative to a lower latency configuration.
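The relationship between latency, concurrency and throughput can be approximated with Little's law. The sketch below assumes each client keeps exactly one transaction in flight and that transaction time is dominated by commit latency; the 2 ms and 20 ms figures are illustrative, not measurements.

```python
import math


def throughput_tps(clients: int, latency_ms: float) -> float:
    """Transactions per second with one in-flight transaction per client."""
    return clients * 1000 / latency_ms


def clients_needed(target_tps: float, latency_ms: float) -> int:
    """Concurrent clients required to sustain target_tps at this latency."""
    return math.ceil(target_tps * latency_ms / 1000)


# Same 20 clients, local (2 ms) versus wide-area (20 ms) commit latency.
print(throughput_tps(20, 2))
print(throughput_tps(20, 20))

# Clients needed to hold 10,000 TPS at each latency.
print(clients_needed(10_000, 2))
print(clients_needed(10_000, 20))
```

Under these assumptions a tenfold increase in latency needs ten times the client concurrency to hold throughput constant.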

Latency and stability
Synchronous operation implies that clients wait to hear of the success or failure of each operation before continuing. Loss of communication to a node, and high latency communication to a node are indistinguishable in some cases.  To ensure availability, the Cluster monitors inter-node communication.  If a node experiences high communication latency, then it may be killed by another node, to prevent its high latency causing service loss.

Where inter-node latencies fluctuate, and are in the same range as the node-latency-monitoring trigger levels, node failures can result.  Node failures are expensive to recover from, and endanger Cluster availability. 

To avoid node failures, either the latency should be reduced, or the trigger levels should be raised.  Raising trigger levels can result in a longer time-to-detection of communication problems.

WAN latencies
Latency on an IP WAN may be a function of physical distance, routing hops, protocol layering, link failover times and rerouting times. The maximum expected latency on a link should be characterized as input to the cluster configuration.

Survivability of node failures
MySQL Cluster uses a fail fast mechanism to minimize time-to-recovery. Nodes that are suspected of being unreachable or dead are quickly excluded from the Cluster.  This mechanism is simple and fast, but sometimes takes steps that result in unnecessary cluster failure.  For this reason, latency trigger levels should be configured with a safe margin above the maximum latency variation on inter-data node links.

Users can configure various MySQL Cluster parameters including heartbeats, ConnectivityCheckPeriod, GCP timeouts and transaction deadlock timeouts. You can read more about these parameters in the documentation.

Recommendations for Multi-Site Clustering
- Ensure minimal, stable latency;
- Provision the network with sufficient bandwidth for the expected peak load - test with node recovery and system recovery;
- Configure the heartbeat period to ensure a safe margin above latency fluctuations;

- Configure the ConnectivityCheckPeriod to avoid unnecessary node failures;

- Configure other timeouts accordingly including the GCP timeout, transaction deadlock timeout, and transaction inactivity timeout.

Example
The following is a recommendation of latency and bandwidth requirements for applications with high throughput and fast failure detection requirements:
- latency between remote data nodes must not exceed 20 milliseconds;
- bandwidth of the network link must be more than 1 Gigabit per Second.

For applications that do not require this type of stringent operating environment, latency and bandwidth can be relaxed, subject to the testing recommended above.
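The example thresholds can be turned into a simple pre-deployment check. This is a sketch only: the function name and the way measurements are supplied are assumptions, and the thresholds are the ones quoted above for the stringent case.

```python
from typing import List

MAX_LATENCY_MS = 20.0      # latency between remote data nodes must not exceed this
MIN_BANDWIDTH_GBPS = 1.0   # link bandwidth must be more than this


def link_problems(max_latency_ms: float, bandwidth_gbps: float) -> List[str]:
    """Return the reasons a measured link misses the example thresholds."""
    problems = []
    if max_latency_ms > MAX_LATENCY_MS:
        problems.append(
            f"latency {max_latency_ms} ms exceeds {MAX_LATENCY_MS} ms")
    if bandwidth_gbps <= MIN_BANDWIDTH_GBPS:  # strictly more, as worded above
        problems.append(
            f"bandwidth {bandwidth_gbps} Gbit/s is not above "
            f"{MIN_BANDWIDTH_GBPS} Gbit/s")
    return problems


print(link_problems(5.0, 10.0))   # meets both thresholds
print(link_problems(35.0, 0.1))   # misses both thresholds
```

Feed it the maximum latency observed under load, not the average, since the failure detection mechanisms react to spikes.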

As the recommendations demonstrate, there are a number of factors that need to be considered before deploying multi-site clustering. For geo-redundancy, Oracle recommends Geographic Replication, but multi-site clustering does present an alternative deployment, subject to the considerations and constraints discussed above.

You can learn more about scaling web databases with MySQL Cluster from our new Guide.  We look forward to hearing your experiences with the new MySQL Cluster 7.2.1 DMR!

Friday Aug 05, 2011

Scaling Web Databases, Part 3: SQL & NoSQL Data Access

Supporting successful services on the web means scaling your back-end databases across multiple dimensions. This blog focuses on scaling access methods to your data using SQL and/or NoSQL interfaces.

In Part 1 of the blog series, I discussed scaling database performance using auto-sharding and active/active geographic replication in MySQL Cluster to enable applications to scale both within and across data centers.

In Part 2, I discussed the need to scale operational agility to keep pace with demand, which includes being able to add capacity and performance to the database, and to evolve the schema – all without downtime.

So in this blog I want to explore another dimension to scalability: how multiple interfaces can be used to scale access to the database, enabling users to simultaneously serve multiple applications, each with distinct access requirements.

Data Access Interfaces to MySQL Cluster

MySQL Cluster automatically shards tables across pools of commodity data nodes, rather than store those tables in a single MySQL Server. It is therefore able to present multiple interfaces to the database, giving developers a choice between:

- SQL for complex reporting-type queries;

- Simple Key/Value interfaces bypassing the SQL layer for blazing fast reads & writes;

- Real-time interfaces for micro-second latency, again bypassing the SQL layer

With this choice of interfaces, developers are free to work in their own preferred environments, enhancing productivity and agility and enabling them to innovate faster.

SQL or NoSQL - Selecting the Right Interface

The following chart shows all of the access methods available to the database. The native API for MySQL Cluster is the C++ based NDB API. All other interfaces access the data through the NDB API.

At the extreme right hand side of the chart, an application has embedded the NDB API library enabling it to make native C++ calls to the database, and therefore delivering the lowest possible latency.

On the extreme left hand side of the chart, MySQL presents a standard SQL interface to the data nodes, and provides connectivity to all of the standard MySQL connectors including:

- Common web development languages and frameworks, e.g. PHP, Perl, Python, Ruby, Ruby on Rails, Spring and Django;

- JDBC (for additional connectivity into ORMs including EclipseLink, Hibernate, etc)

- .NET

- ODBC

Whichever API is chosen for an application, it is important to emphasize that all of these SQL and NoSQL access methods can be used simultaneously, across the same data set, to provide the ultimate in developer flexibility. Therefore, MySQL Cluster may be supporting any combination of the following services, in real-time:

- Relational queries using the SQL API;

- Key/Value-based web services using the REST/JSON and memcached APIs;

- Enterprise applications with the ClusterJ and JPA APIs;

- Real-time web services (e.g. presence and location based) using the NDB API.

The following figure aims to summarize the capabilities and use-cases for each API.

Schema-less Data Store with the memcached API

As part of the MySQL Cluster 7.2 Development Milestone Release, Oracle announced the preview of native memcached Key/Value API support for MySQL Cluster, enabling direct access to the database from the memcached API without passing through the SQL layer. You can read more about the implementation and how to get going with it in this excellent post from Andrew Morgan.

The following image shows the implementation of the memcached API for MySQL Cluster 


Implementation is simple - the application sends read and write requests to the memcached process (using the standard memcached API). This in turn invokes the Memcached Driver for NDB (which is part of the same process), which in turn calls the NDB API for very quick access to the data held in MySQL Cluster’s data nodes.
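To make the request path a little more concrete, here is a minimal sketch of what a write and a read look like on the wire in the standard memcached text protocol. The helper names (`build_set`, `build_get`, `parse_get_response`) and the example key are illustrative assumptions, not part of MySQL Cluster, and a real application would normally use an existing memcached client library rather than hand-built frames.

```python
from typing import Optional


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Frame a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes) -> Optional[bytes]:
    """Return the value from a single-key 'get' reply, or None on a miss."""
    if raw == b"END\r\n":
        return None  # key not found
    header, sep, rest = raw.partition(b"\r\n")
    parts = header.split(b" ")
    if not sep or len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % raw)
    length = int(parts[3])  # number of data bytes that follow the header
    value, trailer = rest[:length], rest[length:]
    if trailer != b"\r\nEND\r\n":
        raise ValueError("truncated or malformed reply")
    return value
```

The framed bytes would be sent over a TCP connection to the memcached process; whether that process then serves the request from its own cache or from the data nodes is invisible to the client.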

The solution has been designed to be very flexible, allowing the application architect to find a configuration that best fits their needs. It is possible to co-locate the memcached API in either the data nodes or application nodes, or alternatively within a dedicated memcached layer.

The benefit of this approach is that users can configure behavior on a per-key-prefix basis (through tables in MySQL Cluster) and the application doesn’t have to care – it just uses the memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized.

By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.

Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.
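The key-prefix idea can be sketched in a few lines. This is only a model of the routing decision, assuming a longest-prefix match. The table and column names below are hypothetical, and the real mappings live in configuration tables inside MySQL Cluster, whose layout is not shown here.

```python
from typing import NamedTuple, Tuple


class Mapping(NamedTuple):
    table: str
    key_column: str
    value_column: str


# Hypothetical prefix configuration. The empty prefix is the default:
# every Key/Value pair lands in one generic table (schema-less storage).
PREFIX_MAP = {
    "": Mapping("demo_kv", "mkey", "mval"),
    "user:": Mapping("users", "user_id", "display_name"),
    "sess:": Mapping("sessions", "session_id", "payload"),
}


def resolve(key: str) -> Tuple[Mapping, str]:
    """Pick the mapping with the longest matching prefix and strip that prefix."""
    prefix = max((p for p in PREFIX_MAP if key.startswith(p)), key=len)
    return PREFIX_MAP[prefix], key[len(prefix):]
```

With this configuration, `resolve("user:42")` targets the `display_name` column of the hypothetical `users` table, so the same row remains reachable through SQL, while an unprefixed key falls through to the generic table.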

Summary

MySQL Cluster provides developers and architects with a huge amount of flexibility in accessing their persistent data stores - a reflection that one size no longer fits all in the world of web services and databases.

You can learn more about this, and the other dimensions to scaling web databases in our new Guide. 

As ever, let me know your thoughts in the comments below. 


Thursday Jul 21, 2011

Scaling Web Databases, Part 2: Adding Nodes, Evolving Schema with Zero Downtime

In my previous post, I discussed scaling web database performance in MySQL Cluster using auto-sharding and active/active geographic replication - enabling users to scale both within and across data centers.  

I also mentioned that while scaling write-performance of any web service is critical, it is only one of several dimensions to scalability, which include:

- The need to scale operational agility to keep pace with demand. This means being able to add capacity and performance to the database, and to evolve the schema – all without downtime;

- The need to scale queries by having flexibility in the APIs used to access the database – including SQL and NoSQL interfaces;

- The need to scale the database while maintaining continuous availability.

All of these subjects are discussed in more detail in our new Scaling Web Databases guide.

In this posting, we look at scaling operational agility. 

As a web service gains in popularity it is important to be able to evolve the underlying infrastructure seamlessly, without incurring downtime and without having to add lots of additional DBA or developer resource.

Users may need to increase the capacity and performance of the database, enhance their application (and therefore their database schema) to deliver new capabilities, and upgrade their underlying platforms.

MySQL Cluster can perform all of these operations and more on-line – without interrupting service to the application or clients.  

On-Line, On-Demand Scaling

MySQL Cluster allows users to scale both database performance and capacity by adding Application and Data Nodes on-line, enabling users to start with small clusters and then scale them on-demand, without downtime, as a service grows. Scaling could be the result of more users, new application functionality or more applications needing to share the database.

In the following example, the cluster on the left is configured with two application and data nodes and a single management server.  As the service grows, the users are able to scale the database and add management redundancy – all of which can be performed as an online operation.  An added advantage of scaling the Application Nodes is that they provide elasticity in scaling, so can be scaled back down if demand to the database decreases.

When new data nodes and node groups are added, the existing nodes in the cluster initiate a rolling restart to reconfigure for the new resource.  This rolling restart ensures that the cluster remains operational during the addition of new nodes.  Tables are then repartitioned with ALTER ONLINE TABLE tbl_name REORGANIZE PARTITION, and the space held by rows that have moved to the new nodes is reclaimed with the OPTIMIZE TABLE command.  All of these operations are transactional, ensuring that a node failure during the add-node process will not corrupt the database.
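Why a rolling restart keeps the cluster operational can be shown with a small planning sketch. It assumes nodes are restarted strictly one at a time and that every node group has at least two members, so each group always keeps a live replica. The function name and the interleaved ordering are illustrative assumptions, not the algorithm MySQL Cluster itself uses.

```python
from typing import List


def plan_rolling_restart(node_groups: List[List[int]]) -> List[int]:
    """Return an order in which to restart data nodes one at a time.

    Because only one node is down at any moment, a group with two or
    more members always has another replica serving its data.
    """
    if not node_groups:
        return []
    for group in node_groups:
        if len(group) < 2:
            # A single-node group has no replica to take over.
            raise ValueError("node group %r cannot be restarted without an outage" % group)
    order = []
    width = max(len(group) for group in node_groups)
    # Restart the first member of every group, then the second, and so on.
    for i in range(width):
        for group in node_groups:
            if i < len(group):
                order.append(group[i])
    return order
```

For two node groups `[[1, 2], [3, 4]]` the plan is `[1, 3, 2, 4]`; a cluster configured without redundancy is rejected because restarting its only copy would interrupt service.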

The operations can be performed manually from the command line or automated with MySQL Cluster Manager, part of the commercial MySQL Cluster Carrier Grade Edition.

On-Line Cluster Maintenance

With MySQL Cluster's shared-nothing architecture, it is possible to avoid database outages by using rolling restarts not only to add nodes but also to upgrade nodes within the cluster.  Using this approach, users can:

- Upgrade or patch the underlying hardware and operating system;

- Upgrade or patch MySQL Cluster, with full online upgrades between releases.

MySQL Cluster supports on-line, non-blocking backups, ensuring service interruptions are again avoided during this critical database maintenance task.  Users are able to exercise fine-grained control when restoring a MySQL Cluster from backup using ndb_restore. Users can restore only specified tables or databases, or exclude specific tables or databases from being restored, using ndb_restore options --include-tables, --include-databases, --exclude-tables, and --exclude-databases.
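As a rough illustration of the selective-restore options, the sketch below assembles just the filter portion of an ndb_restore command line from the four options named above. The helper name and the example table names are hypothetical, the comma-separated `db.table` value format is an assumption of this sketch, and the remaining arguments a real restore needs (node ID, backup ID, backup location and so on) are deliberately left out.

```python
from typing import List, Sequence


def restore_filter_args(
    include_tables: Sequence[str] = (),
    include_databases: Sequence[str] = (),
    exclude_tables: Sequence[str] = (),
    exclude_databases: Sequence[str] = (),
) -> List[str]:
    """Build only the include/exclude options for an ndb_restore call."""
    options = [
        ("--include-tables", include_tables),
        ("--include-databases", include_databases),
        ("--exclude-tables", exclude_tables),
        ("--exclude-databases", exclude_databases),
    ]
    # Emit an option only when the caller supplied values for it.
    return [f"{flag}={','.join(values)}" for flag, values in options if values]
```

For example, `restore_filter_args(include_tables=["shop.orders", "shop.items"])` yields a single `--include-tables=shop.orders,shop.items` argument to append to the rest of the command.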

On-Line Schema Evolution

As services evolve, developers often want to add new functionality, which in many instances may demand updating the database schema.  

This operation can be very disruptive for many databases, with ALTER TABLE commands taking the database offline for the duration of the operation.  When users have large tables with many millions of rows, downtime can stretch into hours or even days.

MySQL Cluster supports on-line schema changes, enabling users to add new columns and tables and add and remove indexes – all while continuing to serve read and write requests, and without affecting response times.

Unlike other on-line schema update solutions, MySQL Cluster does not need to create temporary tables, therefore avoiding the user having to provision double the usual memory or disk space in order to complete the operation.

Summary

So in addition to scaling write performance, MySQL Cluster can also scale operational agility.  I'll post more on scaling of data access methods and availability levels over the next few weeks.

You can read more about all of these capabilities in the new Scaling Web Databases guide.  

And of course, you can try MySQL Cluster out for yourself - it's available under the GPL:

The GA release is 7.1 which can be downloaded here, but I'd recommend taking a look at the latest Development Milestone Release for MySQL Cluster 7.2 which has some great new capabilities (localized JOIN operations, simpler provisioning, etc) which can be downloaded from here (select the Development Releases tab).

As ever, let me know if there are other dimensions of scalability that I should be discussing.

Monday Jul 18, 2011

Simpler and Safer Clustering: MySQL Cluster Manager Update

Clustered computing brings with it many benefits: high performance, high availability, scalable infrastructure, etc. But it also brings with it more complexity.

Why?

Well, by its very nature, there are more “moving parts” to monitor and manage, from physical, virtual and logical hosts to clustering software to redundant networking components; the list goes on. And a cluster that isn’t effectively provisioned and managed will cause more downtime than the standalone systems it is designed to improve upon.

When it comes to the database industry, analysts already estimate that 50% of a typical database’s Total Cost of Ownership is attributable to staffing and downtime costs. These costs will only increase if a database cluster is not effectively monitored and managed.

Monitoring and management has been a major focus in the development of the MySQL Cluster database, and as part of this focus, the latest release of MySQL Cluster Manager (MCM) hit General Availability last week. You can read all about it in Andrew Morgan's blog.

MySQL Cluster Manager 1.1.1 makes it much simpler to get up and running and to manage the cluster, and it allows multiple clusters to be managed from a single process.

MySQL Cluster Manager is part of the commercial Carrier-Grade Edition but anyone is free to download and use MySQL Cluster Manager without obligation for 30 days. This is a great way for those new to MySQL Cluster to rapidly configure and provision their first cluster.

All you need do is:

1. Go to Oracle eDelivery

2. Enter some basic details and click through the agreement

3. Select “MySQL Product Pack”, then your platform, then Go

Not only does MCM make the management of MySQL Cluster simpler, it also makes it safer. One of the largest causes of downtime is administrator error, and here MySQL Cluster Manager can significantly reduce risk.

Consider the task of upgrading from one release of MySQL Cluster to another. This can be performed as an on-line operation, using rolling restarts to apply upgrades while still serving read and write requests. It's just one of the many operations users can perform on-line (e.g. adding data nodes, upgrading schema, backups, etc.), all of which enable MySQL Cluster to achieve 99.999% uptime.

Using a manual upgrade method on a cluster configured with 4 x data nodes, 2 x MySQL Server application nodes and 2 x management nodes, the administrator would be typing 46 x manual commands in an operation that would take around 2 ½ hours to complete. The steps are shown below:

1 x preliminary check of cluster state

8 x ssh commands per server

8 x per-process stop commands

4 x scp of configuration files (2 x mgmd & 2 x mysqld)

8 x per-process start commands

8 x checks for started and re-joined processes

8 x process completion verifications

1 x verify completion of the whole cluster.

Excludes manual editing of each configuration file.
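The count of 46 manual commands can be checked with a quick tally of the steps listed above. Reading each "8 x" as one action per process (4 data nodes + 2 MySQL Servers + 2 management nodes) is an assumption of this sketch.

```python
# Process counts taken from the example cluster described above.
data_nodes, mysqld_nodes, mgmd_nodes = 4, 2, 2
processes = data_nodes + mysqld_nodes + mgmd_nodes  # 8 processes in total

manual_steps = {
    "preliminary check of cluster state": 1,
    "ssh commands": processes,
    "per-process stop commands": processes,
    "scp of configuration files": mgmd_nodes + mysqld_nodes,
    "per-process start commands": processes,
    "checks for started and re-joined processes": processes,
    "process completion verifications": processes,
    "verify completion of the whole cluster": 1,
}

total_manual = sum(manual_steps.values())
print(total_manual)  # 46
```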

Now compare this to using MySQL Cluster Manager:

upgrade cluster --package=7.1 mycluster;

Just one command, and you can walk away and leave it.

Note – both of the processes above exclude the preparation steps of copying the new software package to each host and defining where it's located. The total operation times are based on a DBA restarting 4 x MySQL Cluster Data Nodes, each with 6GB of data, and performing 10,000 operations per second.

You can learn more about MySQL Cluster Manager from our new whitepaper and on-line demo.

We also have an on-demand webinar which covers MySQL Cluster Manager as well as other complementary methods for managing a MySQL Cluster environment:

* NDBINFO: released with MySQL Cluster 7.1, NDBINFO presents real-time status and usage statistics, providing developers and DBAs with a simple means of pro-actively monitoring and optimizing database performance and availability.

* MySQL Cluster Advisors & Graphs: part of the MySQL Enterprise Monitor and available in the commercial MySQL Cluster Carrier Grade Edition, the Enterprise Advisor includes automated best practice rules that alert on key performance and availability metrics from MySQL Cluster data nodes.

While managing clusters will never be easy, it keeps getting a whole lot simpler!

Tuesday Jun 14, 2011

Scaling Web Databases: Auto-Sharding with MySQL Cluster

The realities of today’s successful web services are creating new demands that many legacy databases were just not designed to handle:

- The need to scale writes, as well as reads, both within and across geographically dispersed data centers;

- The need to scale operational agility to keep pace with database load and application requirements. This means being able to add capacity and performance to the database, and to evolve the schema – all without downtime;

- The need to scale queries by having flexibility in the APIs used to access the database;

- The need to scale the database while maintaining continuous availability for both failures as well as scheduled maintenance events.

Each of the requirements above warrants its own dedicated blog, which I’ll find time to write over the next few weeks.

But to get started, I wanted to discuss how the MySQL Cluster database addresses the first point – scaling writes to the database with automatic sharding and geographic replication.

Auto-Sharding

MySQL Cluster is implemented as a distributed, multi-master database with no single point of failure. Tables are automatically sharded across a pool of low cost commodity nodes, enabling the database to scale horizontally to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs (memcached, REST/HTTP, C++, Java, JPA and LDAP). Up to 255 nodes are supported, of which up to 48 can be data nodes. You can read more about the different types of nodes here.

By automatically sharding tables in the database, MySQL Cluster eliminates the need to shard at the application layer, greatly simplifying application development and maintenance.

Sharding is based on the hashing of the primary key, though users can override this by telling MySQL Cluster which fields from the primary key should be used in the hashing algorithm. Hashing on the primary key generally leads to a more even distribution of data and queries across the cluster than alternative approaches such as range partitioning.
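The effect of hashing on the primary key, and of restricting the hash to a subset of its fields, can be sketched as follows. This is a simplified model: the function name, the use of MD5 from Python's standard library, the field separator and the modulo step are all assumptions made for illustration, not a description of MySQL Cluster's internal distribution code.

```python
import hashlib
from collections import Counter
from typing import Optional, Sequence, Tuple


def partition_for(primary_key: Tuple, num_partitions: int,
                  hash_fields: Optional[Sequence[int]] = None) -> int:
    """Map a (possibly composite) primary key to a partition number.

    hash_fields optionally lists the positions of the key fields to hash;
    by default the whole primary key is used.
    """
    fields = primary_key if hash_fields is None else [primary_key[i] for i in hash_fields]
    encoded = "\x1f".join(str(f) for f in fields).encode("utf-8")
    digest = hashlib.md5(encoded).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Sequential keys are scattered over all partitions instead of
# piling up in one range, as they would with range partitioning.
spread = Counter(partition_for((user_id, 1), 4) for user_id in range(1000))
```

Hashing only the first field of a `(user_id, message_id)` key, via `hash_fields=[0]`, places every row for one user in the same partition, which is the kind of override the paragraph above describes.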

Figure 1 demonstrates how MySQL Cluster shards tables across data nodes of the cluster.

Figure 1: Auto-Sharding in MySQL Cluster

You will see from the figure above that MySQL Cluster automatically creates “node groups” from the number of replicas and data nodes specified by the user. Updates are synchronously replicated between members of the node group to protect against data loss and enable sub-second failover in the event of a node failure.
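The relationship between replicas, data nodes and node groups is simple arithmetic: the number of node groups is the number of data nodes divided by the number of replicas. The sketch below models that; grouping nodes by consecutive ID is an assumption of the example, as is the helper name.

```python
from typing import List

MAX_DATA_NODES = 48  # data node limit quoted earlier in this post


def build_node_groups(data_node_ids: List[int], no_of_replicas: int) -> List[List[int]]:
    """Split data nodes into node groups of no_of_replicas members each."""
    if no_of_replicas < 1:
        raise ValueError("at least one replica is required")
    if len(data_node_ids) > MAX_DATA_NODES:
        raise ValueError("too many data nodes")
    if len(data_node_ids) % no_of_replicas != 0:
        raise ValueError("data nodes must divide evenly into node groups")
    ids = sorted(data_node_ids)
    # Consecutive IDs form a group (an assumption of this sketch).
    return [ids[i:i + no_of_replicas] for i in range(0, len(ids), no_of_replicas)]
```

Four data nodes with two replicas give two node groups, `[[1, 2], [3, 4]]`; each group holds a distinct subset of the shards, and the members of a group hold copies of the same data.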

Figure 2 shows how MySQL Cluster creates primary and secondary fragments of each shard.


Figure 2: Eliminating Data Loss with Cross-Shard Fragments

MySQL Cluster is an active/active architecture with multi-master replication, so updates made by any application or SQL node accessing the cluster are instantly available to all of the other nodes accessing the cluster.

Unlike other distributed databases, users do not lose the ability to perform JOIN operations or sacrifice ACID-guarantees. In the Development Release of MySQL Cluster (7.2), Adaptive Query Localization pushes JOIN operations down to the data nodes where they are executed locally and in parallel. We've seen 20-40x higher throughput from the community members that have tested it.

Geographic Replication

Of course, web services are global and so developers will want to ensure their databases can scale-out across regions. MySQL Cluster offers Geographic Replication which distributes clusters to remote data centers, serving to reduce the effects of geographic latency as well as provide a facility for disaster recovery.

Figure 3: Geographic Replication with MySQL Cluster

Geographic Replication is asynchronous and based on standard MySQL replication – with one important difference – it is active/active so supports the detection and resolution of conflicts when the same row is updated across different clusters. This does currently require the addition of a timestamp column in the application, but that is expected to be eliminated in future releases.
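One way to picture the role of the timestamp column is a "latest timestamp wins" rule, sketched below. The rule itself, the column name `ts` and the function name are assumptions chosen for illustration; the post only states that a timestamp column is required, not which resolution policy is applied.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def resolve_conflict(local: Optional[Row], incoming: Row, ts_column: str = "ts") -> Row:
    """Decide which version of a row survives when a replicated update arrives.

    The incoming row is applied only if the row does not exist locally or
    if it carries a strictly newer timestamp than the local version.
    """
    if local is None:
        return incoming
    if incoming[ts_column] > local[ts_column]:
        return incoming
    return local  # the local update is at least as recent, so keep it
```

If both clusters update the same row, each side evaluates the same rule against the other's replicated change, so under this policy the two copies converge on the version with the greater timestamp.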

Where the Rubber Meets the Road

Auto-sharding and geographic replication are both great technologies, but what do they mean in terms of delivered performance?

The MySQL Cluster development team recently ran a series of benchmarks that characterized performance across 8 x dual socket 2.93GHz, 6 core commodity Intel servers, each equipped with 24GB of RAM. As seen in the figure below, MySQL Cluster delivered just under 2.5 million updates per second with 2 x data nodes configured per server.

Figure 4: MySQL Cluster performance scaling-out on commodity nodes.

Across 16 Intel servers, MySQL Cluster achieved just under 7 million read operations per second. We ran out of time in the test cluster before being able to complete the test of write performance, but will return to those efforts soon.

Wrap-Up

So what does all of this mean? There is an ever-growing array of options for developers to choose from when scaling out new generations of web applications. Don’t assume that relational databases can’t scale or offer the kind of operational agility demanded by today’s highly dynamic services. MySQL Cluster is already proven as one such option, and you don’t have to throw away ACID guarantees or the ability to run complex queries to get scalability or schema agility.

You can learn about how MySQL Cluster implements auto-sharding, along with other key features for web services such as online schema updates and NoSQL interfaces from a new on-demand webinar.

And of course MySQL Cluster is open source, so you are free to download, develop and deploy with it. The latest GA release is here.

The MySQL Cluster 7.2 Development Milestone Release including Adaptive Query Localization is here (select the Development Release tab).

Finally, if you wanted to try out MySQL Cluster with the memcached API, you can get it from the latest build on the MySQL labs site.

As ever, let us know how these technologies work for you, either in the comments below or via the MySQL Cluster forum.

Wednesday May 18, 2011

Unlocking New Value from Web Session Management

Join us for a live webinar and download a new whitepaper where we discuss how to realize new value from data collected during web session management.

Session management has long been a key component of any web infrastructure – enhancing the user browsing experience through improved reliability, reduced latency and tighter security.

Increasingly organizations are looking to unlock more value from session management to further improve user loyalty (i.e. making the web service more “sticky”) and improve monetization of web services.  There are two distinct developments that offer the promise of unlocking more value from session data:

1. Provide highly personalized browsing experiences by recognizing repeat visitors and making real-time recommendations based on previous browsing behavior;

2. Enhance insight into user behavior through analysis of how they interact with the web service, enabling organizations to optimize web experiences.

There are many approaches to session management, and technology selection has become critical in ensuring the full value of data collected from user sessions can be realized.

For rapidly growing web properties, higher volumes of session data need to be managed and persisted in real-time while also demanding very high levels of availability, coupled with the flexibility of relational data management.

In such cases, it makes sense to evaluate the MySQL Cluster database.

To explore these topics further, we are hosting a live webinar on Tuesday May 31st at 0900 Pacific Time / 1700 UK, covering:

* The demands of session management
* How MySQL Cluster is well placed to meet the demands from session management
* Configuring session management with PHP and MySQL Cluster
* Configuring session management with memcached and MySQL Cluster
* Real Time analysis of session data with MySQL Cluster
* Case studies

You can register for the webinar here

You can download the associated whitepaper here

Let us know your recommendations for unlocking more value from web session data in the comments below
