Tuesday Apr 10, 2012

Benchmarking MySQL Replication with Multi-Threaded Slaves

The objective of this benchmark is to measure the performance improvement achieved when enabling the Multi-Threaded Slave enhancement delivered as a part MySQL 5.6.

As the results demonstrate, Multi-Threaded Slaves delivers 5x higher replication performance based on a configuration with 10 databases/schemas. For real-world deployments, higher replication performance directly translates to:

· Improved consistency of reads from slaves (i.e. reduced risk of reading "stale" data)

· Reduced risk of data loss should the master fail before replicating all events in its binary log (binlog)

The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments.

Multi-Threaded Slaves are just one of many enhancements to replication previewed as part of the MySQL 5.6 Development Release, which include:

· Global Transaction Identifiers coupled with MySQL utilities for automatic failover / switchover and slave promotion

· Crash Safe Slaves and Binlog

· Optimized Row Based Replication

· Replication Event Checksums

· Time Delayed Replication

These and many more are discussed in the “MySQL 5.6 Replication: Enabling the Next Generation of Web & Cloud Services” Developer Zone article 

Back to the benchmark - details are as follows.

The test environment consisted of two Linux servers:

· one running the replication master

· one running the replication slave.

Only the slave was involved in the actual measurements, and was based on the following configuration:

- Hardware: Oracle Sun Fire X4170 M2 Server

- CPU: 2 sockets, 6 cores with hyper-threading, 2930 MHz.

- OS: 64-bit Oracle Enterprise Linux 6.1
- Memory: 48 GB

Test Procedure
Initial Setup:

Two MySQL servers were started on two different hosts, configured as replication master and slave.

10 sysbench schemas were created, each with a single table:

CREATE TABLE `sbtest` (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `k` int(10) unsigned NOT NULL DEFAULT '0',
   `c` char(120) NOT NULL DEFAULT '',
   `pad` char(60) NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),
   KEY `k` (`k`)

10,000 rows were inserted in each of the 10 tables, for a total of 100,000 rows. When the inserts had replicated to the slave, the slave threads were stopped. The slave data directory was copied to a backup location and the slave threads position in the master binlog noted.

10 sysbench clients, each configured with 10 threads, were spawned at the same time to generate a random schema load against each of the 10 schemas on the master. Each sysbench client executed 10,000 "update key" statements:

UPDATE sbtest set k=k+1 WHERE id = <random row>

In total, this generated 100,000 update statements to later replicate during the test itself.

Test Methodology:
The number of slave workers to test with was configured using:

SET GLOBAL slave_parallel_workers=<workers>

Then the slave IO thread was started and the test waited for all the update queries to be copied over to the relay log on the slave.

The benchmark clock was started and then the slave SQL thread was started. The test waited for the slave SQL thread to finish executing the 100k update queries, doing "select master_pos_wait()". When master_pos_wait() returned, the benchmark clock was stopped and the duration calculated.

The calculated duration from the benchmark clock should be close to the time it took for the SQL thread to execute the 100,000 update queries. The 100k queries divided by this duration gave the benchmark metric, reported as Queries Per Second (QPS).

Test Reset:

The test-reset cycle was implemented as follows:

· the slave was stopped

· the slave data directory replaced with the previous backup

· the slave restarted with the slave threads replication pointer repositioned to the point before the update queries in the binlog.

The test could then be repeated with identical set of queries but a different number of slave worker threads, enabling a fair comparison.

The Test-Reset cycle was repeated 3 times for 0-24 number of workers and the QPS metric calculated and averaged for each worker count.

MySQL Configuration
The relevant configuration settings used for MySQL are as follows:


As described in the test procedure, the
slave_parallel_workers setting was modified as part of the test logic. The consequence of changing this setting is:

0 worker threads:
   - current (i.e. single threaded) sequential mode
   - 1 x IO thread and 1 x SQL thread
   - SQL thread both reads and executes the events

1 worker thread:
   - sequential mode
   - 1 x IO thread, 1 x Coordinator SQL thread and 1 x Worker thread
   - coordinator reads the event and hands it to the worker who executes

2+ worker threads:
   - parallel execution
   - 1 x IO thread, 1 x Coordinator SQL thread and 2+ Worker threads
   - coordinator reads events and hands them to the workers who execute them

Figure 1 below shows that Multi-Threaded Slaves deliver ~5x higher replication performance when configured with 10 worker threads, with the load evenly distributed across our 10 x schemas. This result is compared to the current replication implementation which is based on a single SQL thread only (i.e. zero worker threads).

Figure 1: 5x Higher Performance with Multi-Threaded Slaves

The following figure shows more detailed results, with QPS sampled and reported as the worker threads are incremented.

The raw numbers behind this graph are reported in the Appendix section of this post.

Figure 2: Detailed Results

As the results above show, the configuration does not scale noticably from 5 to 9 worker threads. When configured with 10 worker threads however, scalability increases significantly. The conclusion therefore is that it is desirable to configure the same number of worker threads as schemas.

Other conclusions from the results:

· Running with 1 worker compared to zero workers just introduces overhead without the benefit of parallel execution.

· As expected, having more workers than schemas adds no visible benefit.

Aside from what is shown in the results above, testing also demonstrated that the following settings had a very positive effect on slave performance:


For 5+ workers, it was up to 2.3 times as fast to run with TABLE compared to FILE.


As the results demonstrate, Multi-Threaded Slaves deliver significant performance increases to MySQL replication when handling multiple schemas.

This, and the other replication enhancements introduced in MySQL 5.6 are fully available for you to download and evaluate now from the MySQL Developer site (select Development Release tab).

You can learn more about MySQL 5.6 from the documentation 

Please don’t hesitate to comment on this or other replication blogs with feedback and questions.

Appendix – Detailed Results

Monday Aug 01, 2011

MySQL 5.6 Replication – New Early Access Features

At OSCON 2011 last week, Oracle delivered more early access (labs) features for MySQL 5.6 replication. These features are focused on better integration, performance and data integrity, and are summarized in this blog with links to resources enabling users to download, configure and evaluate them[Read More]

Get the latest updates on products, technology, news, events, webcasts, customers and more.




« February 2017