Scaling Web Databases, Part 3: SQL & NoSQL Data Access

Supporting successful services on the web means scaling your back-end databases across multiple dimensions. This blog focuses on scaling access methods to your data using SQL and/or NoSQL interfaces.

In Part 1 of the blog series , I discussed scaling database performance using auto-sharding and active/active geographic replication in MySQL Cluster to enable applications to scale both within and across data centers.  

In Part 2, I discussed the need to scale operational agility to keep pace with demand, which includes being able to add capacity and performance to the database, and to evolve the schema – all without downtime.

So in this blog I want to explore another dimension to scalability -  how multiple interfaces can be used to scale access to the database, enabling users to simultaneously serve multiple applications, each with distinct access requirements.

Data Access Interfaces to MySQL Cluster

MySQL Cluster automatically shards tables across pools of commodity data nodes, rather than store those tables in a single MySQL Server. It is therefore able to present multiple interfaces to the database, giving developers a choice between:

- S    -  SQL for complex reporting-type queries;

- S    -  Simple Key/Value interfaces bypassing the SQL layer for blazing fast reads & writes;

- S    -  Real-time interfaces for micro-second latency, again bypassing the SQL layer

With this choice of interfaces, developers are free to work in their own preferred environments, enhancing productivity and agility and enabling them to innovate faster.

SQL or NoSQL - Selecting the Right Interface

The following chart shows all of the access methods available to the database. The native API for MySQL Cluster is the C++ based NDB API. All other interfaces access the data through the NDB API.

At the extreme right hand side of the chart, an application has embedded the NDB API library enabling it to make native C++ calls to the database, and therefore delivering the lowest possible latency.

On the extreme left hand side of the chart, MySQL presents a standard SQL interface to the data nodes, and provides connectivity to all of the standard MySQL connectors including:

- Common web development languages and frameworks, i.e. PHP, Perl, Python, Ruby, Ruby on Rails, Spring, Django, etc;

- JDBC (for additional connectivity into ORMs including EclipseLink, Hibernate, etc)

- .NET

- ODBC

Whichever API is chosen for an application, it is important to emphasize that all of these SQL and NoSQL access methods can be used simultaneously, across the same data set, to provide the ultimate in developer flexibility. Therefore, MySQL Cluster maybe supporting any combination of the following services, in real-time:

- Relational queries using the SQL API;

- Key/Value-based web services using the REST/JSON and memcached APIs;

- Enterprise applications with the ClusterJ and JPA APIs;

- Real-time web services (i.e. presence and location based) using the NDB API.

The following figure aims to summarize the capabilities and use-cases for each API.

Schema-less Data Store with the memcached API

As part of the MySQL Cluster 7.2 Development Milestone Release , Oracle announced the preview of native memcached Key/Value API support for MySQL Cluster enabling direct access to the database from the memcached API without passing through the SQL layer. You can read more about the implementation and how to get going with it in this excellent post from Andrew Morgan.

The following image shows the implementation of the memcached API for MySQL Cluster 


Implementation is simple - the application sends read and write requests to the memcached process (using the standard memcached API). This in turn invokes the Memcached Driver for NDB (which is part of the same process), which in turn calls the NDB API for very quick access to the data held in MySQL Cluster’s data nodes.

The solution has been designed to be very flexible, allowing the application architect to find a configuration that best fits their needs. It is possible to co-locate the memcached API in either the data nodes or application nodes, or alternatively within a dedicated memcached layer.

The benefit of this approach is that users can configure behavior on a per-key-prefix basis (through tables in MySQL Cluster) and the application doesn’t have to care – it just uses the memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized.

By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.

Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.

Summary

MySQL Cluster provides developers and architects with a huge amount of flexibility in accessing their persistent data stores - a reflection that one size no longer fits all in the world of web services and databases.

You can learn more about this, and the other dimensions to scaling web databases in our new Guide. 

As ever, let me know your thoughts in the comments below. 


Comments:

It's very cool!

Posted by guest on August 05, 2011 at 06:08 AM GMT #

Post a Comment:
Comments are closed for this entry.
About

Get the latest updates on products, technology, news, events, webcasts, customers and more.

Twitter


Facebook

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
5
6
9
10
11
12
13
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today