Scaling Web Databases, Part 3: SQL & NoSQL Data Access
By Mat Keep on Aug 05, 2011
Supporting successful services on the web means scaling your back-end databases across multiple dimensions. This blog focuses on scaling access methods to your data using SQL and/or NoSQL interfaces.
In Part 1 of the blog series , I discussed scaling database performance using auto-sharding and active/active geographic replication in MySQL Cluster to enable applications to scale both within and across data centers.
In Part 2, I discussed the need to scale operational agility to keep pace with demand, which includes being able to add capacity and performance to the database, and to evolve the schema – all without downtime.
So in this blog I want to explore another dimension to scalability - how multiple interfaces can be used to scale access to the database, enabling users to simultaneously serve multiple applications, each with distinct access requirements.
Data Access Interfaces to MySQL Cluster
MySQL Cluster automatically shards tables across pools of commodity data nodes, rather than store those tables in a single MySQL Server. It is therefore able to present multiple interfaces to the database, giving developers a choice between:
- S - SQL for complex reporting-type queries;
- S - Simple Key/Value interfaces bypassing the SQL layer for blazing fast reads & writes;
- S - Real-time interfaces for micro-second latency, again bypassing the SQL layer
With this choice of interfaces, developers are free to work in their own preferred environments, enhancing productivity and agility and enabling them to innovate faster.
SQL or NoSQL - Selecting the Right Interface
The following chart shows all of the access methods available to the database. The native API for MySQL Cluster is the C++ based NDB API. All other interfaces access the data through the NDB API.
At the extreme right hand side of the chart, an application has embedded the NDB API library enabling it to make native C++ calls to the database, and therefore delivering the lowest possible latency.
On the extreme left hand side of the chart, MySQL presents a standard SQL interface to the data nodes, and provides connectivity to all of the standard MySQL connectors including:
- Common web development languages and frameworks, i.e. PHP, Perl, Python, Ruby, Ruby on Rails, Spring, Django, etc;
- JDBC (for additional connectivity into ORMs including EclipseLink, Hibernate, etc)
Whichever API is chosen for an application, it is important to emphasize that all of these SQL and NoSQL access methods can be used simultaneously, across the same data set, to provide the ultimate in developer flexibility. Therefore, MySQL Cluster maybe supporting any combination of the following services, in real-time:
- Relational queries using the SQL API;
- Key/Value-based web services using the REST/JSON and memcached APIs;
- Enterprise applications with the ClusterJ and JPA APIs;
- Real-time web services (i.e. presence and location based) using the NDB API.
The following figure aims to summarize the capabilities and use-cases for each API.
Schema-less Data Store with the memcached API
As part of the MySQL Cluster 7.2 Development Milestone Release , Oracle announced the preview of native memcached Key/Value API support for MySQL Cluster enabling direct access to the database from the memcached API without passing through the SQL layer. You can read more about the implementation and how to get going with it in this excellent post from Andrew Morgan.
The following image shows the implementation of the memcached API for MySQL Cluster
Implementation is simple - the application sends read and write requests to the memcached process (using the standard memcached API). This in turn invokes the Memcached Driver for NDB (which is part of the same process), which in turn calls the NDB API for very quick access to the data held in MySQL Cluster’s data nodes.
The solution has been designed to be very flexible, allowing the application architect to find a configuration that best fits their needs. It is possible to co-locate the memcached API in either the data nodes or application nodes, or alternatively within a dedicated memcached layer.
The benefit of this approach is that users can configure behavior on a per-key-prefix basis (through tables in MySQL Cluster) and the application doesn’t have to care – it just uses the memcached API and relies on the software to store data in the right place(s) and to keep everything synchronized.
By default, every Key / Value is written to the same table with each Key / Value pair stored in a single row – thus allowing schema-less data storage. Alternatively, the developer can define a key-prefix so that each value is linked to a pre-defined column in a specific table.
Of course if the application needs to access the same data through SQL then developers can map key prefixes to existing table columns, enabling Memcached access to schema-structured data already stored in MySQL Cluster.
MySQL Cluster provides developers and architects with a huge amount of flexibility in accessing their persistent data stores - a reflection that one size no longer fits all in the world of web services and databases.
You can learn more about this, and the other dimensions to scaling web databases in our new Guide.
As ever, let me know your thoughts in the comments below.