Supporting successful services on the web means scaling your back-end databases across multiple dimensions. This blog focuses on scaling access methods to your data using SQL and/or NoSQL interfaces.
In Part 1 of
the blog series ,
I discussed scaling database performance using auto-sharding and
active/active geographic replication in MySQL Cluster to enable applications to
scale both within and across data centers.
I discussed the need to scale operational agility to keep pace with demand, which includes being able to add capacity and performance to the database, and
to evolve the schema – all without downtime.
in this blog I want to explore another dimension to scalability - how multiple interfaces can be
used to scale access to the database, enabling users to simultaneously serve
multiple applications, each with distinct access requirements.
Access Interfaces to MySQL Cluster
MySQL Cluster automatically shards tables across pools of commodity data nodes, rather than store those tables in a single MySQL Server. It is therefore able to present multiple interfaces to the database, giving developers a
- S - SQL for complex reporting-type queries;
- S - Simple Key/Value interfaces bypassing the SQL layer
for blazing fast reads & writes;
- S - Real-time interfaces for micro-second latency, again bypassing the SQL layer
With this choice of interfaces, developers are
free to work in their own preferred environments, enhancing productivity and
agility and enabling them to innovate faster.
SQL or NoSQL - Selecting the Right Interface
The following chart shows all of
the access methods available to the database. The native API for MySQL Cluster is the C++ based NDB API. All other interfaces access the data through
the NDB API.
At the extreme right hand side
of the chart, an application has embedded the NDB API library enabling it to
make native C++ calls to the database, and therefore delivering the lowest
On the extreme left hand side of
the chart, MySQL presents a standard SQL interface to the data nodes, and
provides connectivity to all of the standard MySQL connectors including:
- Common web development languages and frameworks, i.e.
PHP, Perl, Python, Ruby, Ruby on Rails, Spring, Django, etc;
- JDBC (for additional connectivity into ORMs including
EclipseLink, Hibernate, etc)
Whichever API is chosen for an
application, it is important to emphasize that all of these SQL and NoSQL
access methods can be used simultaneously, across the same data set, to provide
the ultimate in developer flexibility. Therefore, MySQL Cluster maybe
supporting any combination of the following services, in real-time:
- Relational queries using the SQL API;
- Key/Value-based web services using the REST/JSON and
- Enterprise applications with the ClusterJ and JPA APIs;
- Real-time web services (i.e. presence and location based)
using the NDB API.
The following figure aims to
summarize the capabilities and use-cases for each API.
Schema-less Data Store with the memcached API
As part of the MySQL
Cluster 7.2 Development Milestone Release , Oracle announced the preview of
native memcached Key/Value API support for MySQL Cluster enabling direct
access to the database from the memcached API without passing through the SQL
layer. You can read more about the implementation
and how to get going with it in this excellent post from Andrew Morgan.
The following image shows the implementation of the memcached API for MySQL Cluster
is simple - the application sends read and write requests to the memcached process
(using the standard memcached API). This in turn invokes the Memcached Driver
for NDB (which is part of the same process), which in turn calls the NDB API
for very quick access to the data held in MySQL Cluster’s data nodes.
has been designed to be very flexible, allowing the application architect to
find a configuration that best fits their needs. It is possible to co-locate
the memcached API in either the data nodes or application nodes, or
alternatively within a dedicated memcached layer.
The benefit of this
approach is that users can configure behavior on a per-key-prefix basis
(through tables in MySQL Cluster) and the application doesn’t have to care – it
just uses the memcached API and relies on the software to store data in the
right place(s) and to keep everything synchronized.
By default, every Key
/ Value is written to the same table with each Key / Value pair stored in a
single row – thus allowing schema-less data storage. Alternatively, the
developer can define a key-prefix so that each value is linked to a pre-defined
column in a specific table.
Of course if the
application needs to access the same data through SQL then developers can map
key prefixes to existing table columns, enabling Memcached access to
schema-structured data already stored in MySQL Cluster.
MySQL Cluster provides developers and architects with a huge amount of flexibility in accessing their persistent data stores - a reflection that one size no longer fits all in the world of web services and databases.
You can learn more about this, and the other dimensions to scaling web databases in our new Guide.
As ever, let me know your thoughts in the comments below.