MySQL and HeatWave

Benchmarking MySQL Replication with Multi-Threaded Slaves

Guest Author

objective of this benchmark is to measure the performance improvement achieved
when enabling the Multi-Threaded Slave enhancement delivered as a part MySQL 5.6.

As the
results demonstrate, Multi-Threaded Slaves delivers 5x higher replication performance
based on a configuration with 10 databases/schemas. For real-world deployments, higher replication
performance directly translates to:

· Improved
consistency of reads from slaves (i.e. reduced risk of reading
"stale" data)

· Reduced
risk of data loss should the master fail before replicating all events in its
binary log (binlog)

The multi-threaded slave
splits processing between worker threads based on schema, allowing updates to
be applied in parallel, rather than sequentially. This delivers benefits to
those workloads that isolate application data using databases - e.g.
multi-tenant systems deployed in cloud environments.

Multi-Threaded Slaves
are just one of many enhancements to replication previewed as part of the MySQL
5.6 Development Release, which include:

· Global Transaction Identifiers coupled with
MySQL utilities for automatic failover / switchover and slave promotion

· Crash Safe Slaves and Binlog

· Optimized Row Based Replication

· Replication Event Checksums

· Time Delayed Replication

These and many more
are discussed in the “MySQL 5.6
Replication: Enabling the Next Generation of Web & Cloud Services”
Developer Zone article 

Back to the benchmark - details
are as follows.


The test environment consisted of two Linux servers:

· one
running the replication master

· one
running the replication slave.

Only the
slave was involved in the actual measurements, and was based on the following

- Hardware: Oracle Sun Fire X4170 M2 Server

- CPU: 2
sockets, 6 cores with hyper-threading, 2930 MHz.

- OS: 64-bit
Oracle Enterprise Linux 6.1

- Memory: 48 GB

Test Procedure
Initial Setup:

Two MySQL servers were started on two different hosts, configured as
replication master and slave.

sysbench schemas were created, each with a single table:

CREATE TABLE `sbtest` (

   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,

   `k` int(10) unsigned NOT NULL DEFAULT '0',

   `c` char(120) NOT NULL DEFAULT '',

   `pad` char(60) NOT NULL DEFAULT '',

   PRIMARY KEY (`id`),

   KEY `k` (`k`)


rows were inserted in each of the 10 tables, for a total of 100,000 rows. When
the inserts had replicated to the slave, the slave threads were stopped. The
slave data directory was copied to a backup location and the slave threads
position in the master binlog noted.

10 sysbench clients, each configured with 10 threads,
were spawned at the same time to generate a random schema load against each of
the 10 schemas on the master. Each sysbench client executed 10,000 "update
key" statements:

UPDATE sbtest set k=k+1 WHERE id = <random row>

In total,
this generated 100,000 update statements to later replicate during the test

Test Methodology:
The number of slave workers to test with was configured using:

SET GLOBAL slave_parallel_workers=<workers>

Then the slave IO thread was started and the test waited for all the update
queries to be copied over to the relay log on the slave.

The benchmark clock was started and then the slave SQL thread was started. The
test waited for the slave SQL thread to finish executing the 100k update
queries, doing "select master_pos_wait()". When master_pos_wait() returned, the benchmark
clock was stopped and the duration calculated.

The calculated duration from the benchmark clock should be close to the time it
took for the SQL thread to execute the 100,000 update queries. The 100k queries divided by this duration gave
the benchmark metric, reported as Queries Per Second (QPS).

Test Reset:

test-reset cycle was implemented as follows:

· the
slave was stopped

· the
slave data directory replaced with the previous backup

· the
slave restarted with the slave threads replication pointer repositioned to the
point before the update queries in the binlog.

The test
could then be repeated with identical set of queries but a different number of
slave worker threads, enabling a fair comparison.

The Test-Reset cycle was repeated 3 times for 0-24 number of workers and the
QPS metric calculated and averaged for each worker count.

MySQL Configuration

The relevant configuration settings used for MySQL are as follows:




As described in the test procedure, the
slave_parallel_workers setting was modified as part of
the test logic. The consequence of changing this setting is:

0 worker threads:

   - current (i.e. single threaded) sequential mode

   - 1 x IO thread and 1 x SQL thread

   - SQL thread both reads and executes the events

1 worker thread:

   - sequential mode

   - 1 x IO thread, 1 x Coordinator SQL thread and 1 x Worker thread

   - coordinator reads the event and hands it to the worker who

2+ worker threads:

   - parallel execution

   - 1 x IO thread, 1 x Coordinator SQL thread and 2+ Worker threads

   - coordinator reads events and hands them to the workers who execute


Figure 1 below shows that Multi-Threaded Slaves deliver ~5x higher replication
performance when configured with 10 worker threads, with the load evenly
distributed across our 10 x schemas. This result is compared to the current replication implementation which
is based on a single SQL thread only (i.e. zero worker threads).

Figure 1: 5x Higher Performance with
Multi-Threaded Slaves

following figure shows more detailed results, with QPS sampled and reported as
the worker threads are incremented.

The raw
numbers behind this graph are reported in the Appendix section of this post.

Figure 2: Detailed Results

As the
results above show, the configuration does not scale noticably from 5 to 9
worker threads. When configured with 10
worker threads however, scalability increases significantly. The conclusion therefore is that it is
desirable to configure the same number of worker threads as schemas.

conclusions from the results:

· Running
with 1 worker compared to zero workers just introduces overhead without the
benefit of parallel execution.

· As
expected, having more workers than schemas adds no visible benefit.

from what is shown in the results above, testing also demonstrated that the
following settings had a very positive
effect on slave performance:



For 5+
workers, it was up to 2.3 times as fast to run with TABLE compared to FILE.


As the
results demonstrate, Multi-Threaded Slaves deliver significant performance
increases to MySQL replication when handling multiple schemas.

This, and
the other replication enhancements introduced in MySQL 5.6 are fully available
for you to download and evaluate now from the MySQL Developer site
 (select Development Release tab).

You can
learn more about MySQL 5.6 from the documentation 

don’t hesitate to comment on this or other replication blogs with feedback and

Appendix – Detailed Results

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.