Scaling Web Databases, Part 2: Adding Nodes, Evolving Schema with Zero Downtime
By Mat Keep on Jul 21, 2011
In my previous post, I discussed scaling web database performance in MySQL Cluster using auto-sharding and active/active geographic replication - enabling users to scale both within and across data centers.
I also mentioned that while scaling write-performance of any web service is critical, it is only 1 of multiple dimensions to scalability, which include:
- The need to scale operational agility to keep pace with demand. This means being able to add capacity and performance to the database, and to evolve the schema – all without downtime;
- The need to scale queries by having flexibility in the APIs used to access the database – including SQL and NoSQL interfaces;
- The need to scale the database while maintaining continuous availability.
All of these subjects are discussed in more detail in our new Scaling Web Databases guide.
In this posting, we look at scaling operational agility.
As a web service gains in popularity it is important to be able to evolve the underlying infrastructure seamlessly, without incurring downtime and without having to add lots of additional DBA or developer resource.
Users may need to increase the capacity and performance of the database; enhance their application (and therefore their database schema) to deliver new capabilities and upgrade their underlying platforms.
MySQL Cluster can perform all of these operations and more on-line – without interrupting service to the application or clients.
On-Line, On-Demand Scaling
MySQL Cluster allows users to scale both database performance and capacity by adding Application and Data Nodes on-line, enabling users to start with small clusters and then scale them on-demand, without downtime, as a service grows. Scaling could be the result of more users, new application functionality or more applications needing to share the database.
In the following example, the cluster on the left is configured with two application and data nodes and a single management server. As the service grows, the users are able to scale the database and add management redundancy – all of which can be performed as an online operation. An added advantage of scaling the Application Nodes is that they provide elasticity in scaling, so can be scaled back down if demand to the database decreases.
When new data nodes and node groups are added, the existing nodes in the cluster initiate a rolling restart to reconfigure for the new resource. This rolling restart ensures that the cluster remains operational during the addition of new nodes. Tables are then repartitioned and redundant rows are deleted with the OPTIMIZE TABLE command. All of these operations are transactional, ensuring that a node failure during the add-node process will not corrupt the database.
The operations can be performed manually from the command line or automated with MySQL Cluster Manager , part of the commercial MySQL Cluster Carrier Grade Edition.
On-Line Cluster Maintenance
With its shared-nothing architecture, it is possible to avoid database outages by using rolling restarts to not only add but also upgrade nodes within the cluster. Using this approach, users can:
- Upgrade or patch the underlying hardware and operating system;
- Upgrade or patch MySQL Cluster, with full online upgrades between releases.
MySQL Cluster supports on-line, non-blocking backups, ensuring service interruptions are again avoided during this critical database maintenance task. Users are able to exercise fine-grained control when restoring a MySQL Cluster from backup using ndb_restore. Users can restore only specified tables or databases, or exclude specific tables or databases from being restored, using ndb_restore options --include-tables, --include-databases, --exclude-tables, and --exclude-databases.
On-Line Schema Evolution
As services evolve, developers often want to add new functionality, which in many instances may demand updating the database schema.
This operation can be very disruptive for many databases, with ALTER TABLE commands taking the database offline for the duration of the operation. When users have large tables with many millions of rows, downtime can stretch into hours or even days.
MySQL Cluster supports on-line schema changes, enabling users to add new columns and tables and add and remove indexes – all while continuing to serve read and write requests, and without affecting response times.
Unlike other on-line schema update solutions, MySQL Cluster does not need to create temporary tables, therefore avoiding the user having to provision double the usual memory or disk space in order to complete the operation.
So in addition to scaling write performance, MySQL Cluster can also scale operational agility. I'll post more on scaling of data access methods and availability levels over the next few weeks.
You can read more about all of these capabilities in the new Scaling Web Databases guide.
And of course, you can try MySQL Cluster out for yourself - its available under the GPL:
The GA release is 7.1 which can be downloaded here, but I'd recommend taking a look at the latest Development Milestone Release for MySQL Cluster 7.2 which has some great new capabilities (localized JOIN operations, simpler provisioning, etc) which can be downloaded from here (select the Development Releases tab).
As ever, let me know if there are other dimensions of scalability that I should be discussing