Thursday Feb 07, 2013

MySQL 5.6 Replication Performance

With data volumes and user populations growing, its no wonder that database performance is a hot topic in developer and DBA circles.  

Its also no surprise that continued performance improvements were one of the top design goals of the new MySQL 5.6 release which was declared GA on February 5th (note: GA means “Generally Available”, not “Gypsy Approved” @mysqlborat)

And the performance gains haven’t disappointed:

- Dimitri Kravtchuk’s Sysbench tests showed MySQL delivering up to 4x higher performance than the previous 5.5 release.

- Mikael Ronstrom’s testing showed up to 4x better scalability as thread counts rose to 48 and 60 threads (not so uncommon in commodity systems today)

Of course, YMMV (Your Mileage May Vary) in real-world workloads, but you can be reasonably certain you will see performance gains by upgrading to MySQL 5.6.  It is up to you whether you invest these gains to serve more applications and users from your MySQL databases, or you consolidate to fewer instances to reduce operational costs.

How about if you are using MySQL replication in order to scale out your database across commodity nodes, and as a foundation for High Availability (HA)? Performance here has also been improved through a combination of:

- Binary Log Group Commit

- Multi-Threaded Slaves

- Optimized Row-Based Replication

As well as giving you the benefits above; higher replication performance directly translates to:

- Reduced risk of losing data in the event of a failure on the master

- Improved read consistency from slaves

- Resource-efficient binlogs traversing the replication cluster

We will look at each of these in more detail.

Binlog Group Commit

Rather than applying writes one at a time, Binary Log Group Commit batches writes to the Binlog, significantly reducing overhead to the master. This is demonstrated by the benchmark results below.


Using the SysBench RW tests, enabling the binlog and using the default sync_binlog=0 reduces the throughput of the master by around 10%. With sync_binlog=1 (where the MySQL server synchronizes its binary log to disk after every write to the binlog, thereby giving maximum data safety), throughput is reduced by a further 5%.

To understand the difference this makes, the tests were repeated comparing MySQL 5.6 to MySQL 5.5 


Even with sync_binlog=1, MySQL 5.6 was still up to 4.5x faster than 5.5 with sync_binlog=0

Gone are the days when configuring replication resulted in a 50% or more hit to performance of your master.

The result of Binary Log Group Commit is that MySQL replication is much better able to keep pace with the demands of write-intensive workloads, imposing much less overhead on the master, even when the binlog is flushed to disk after each commit!

Details of the configurations used for the benchmark are in the Appendix at the end of this post.

You can learn more about the implementation of Binlog Group Commit from Mats Kindahl’s blog.

Multi-Threaded Slave

Looking beyond the replication master, it is also necessary to bring performance enhancements to the replication slaves.

Using Multi-Threaded Slaves, processing is split between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments.

Benchmarks demonstrate that Multi-Threaded Slaves increase performance by 5x.  


This performance enhancement translates to improved read consistency for clients accessing the replication cluster. Slaves are better able to keep up with the master, and so users are much less likely to need to throttle the sustained throughput of writes, just so that the slaves don't indefinitely fall further and further behind (previously some users had to reduce the capacity of their systems in order to reduce slave lag).

You can get all of the details on this benchmark and the configurations used in this earlier blog posting

Optimized Row-Based Replication

The final piece in improving replication performance is to introduce efficiencies to the binary log itself.

By only replicating those elements of the row image that have changed following INSERT, UPDATE and DELETE operations, replication throughput for both the master and slave(s) can be increased while binary log disk space, network resource and server memory footprint are all reduced.

This is especially useful when replicating across datacenters or cloud availability zones.

Another performance enhancements is the was Row-Based Replication events are handled on the slave against tables without Primary Keys. Updates would be made via multiple table scans. In MySQL 5.6, no matter size of the event, only one table scan is performed, significantly reducing the apply time.

You can learn more about Optimized Row Based Replication from the MySQL documentation.  

Crash-Safe Slaves and Binlog

Aka “Transactional Replication” this is more of an availability than a performance feature, but there is a nice performance angle to it.

The key concept behind crash safe replication is that replication positions are stored in tables rather than files and can therefore be updated transactionally, together with the data. This enables the slave or master to automatically roll back replication to the last committed event before a crash, and resume replication without administrator intervention. Not only does this reduce operational overhead, it also eliminates the risk of data loss or corruption.

From a performance perspective, it also means there is one less disk sync, since we leverage the storage engine's fsync and don't need an additional fsync for the file, giving users a performance gain.

Wrapping Up

So with a faster MySQL Server, InnoDB storage engine and replication, developers and DBAs can get ahead of performance demands. Bear in mind there is more to MySQL 5.6 replication than performance – for example Global Transaction Identifiers (GTIDs), replication event checksums, time-delayed replication and more.

To learn more about all of the new replication features in MySQL 5.6, download the Replication Introduction guide

To get up and running with MySQL replication, download the new Replication Tutorial Guide 


Appendix – Binary Log Group Commit Benchmarks

System Under Test:

5 x 12 thread Intel Xeon E7540 CPUs @2.00 GHz

512 GB memory

Oracle Linux 6.1


Configuration file:

1) --innodb_purge_threads=1

2) --innodb_file_format=barracuda

3) --innodb-buffer-pool-size=8192M

4) --innodb-support-xa=FALSE

5) --innodb-thread-concurrency=0

6) --innodb-flush-log-at-trx-commit=2

7) --innodb-log-file-size=8000M

8) --innodb-log-buffer-size=256M

9) --innodb-io-capacity=2000

10) --innodb-io-capacity-max=4000

11) --innodb-flush-neighbors=0

12) --skip-innodb-adaptive-hash-index

13) --innodb-read-io-threads=8

14) --innodb-write-io-threads=8

15) --innodb_change_buffering=all

16) --innodb-spin-wait-delay=48

17) --innodb-stats-on-metadata=off

18) --innodb-buffer-pool-instances=12

19) --innodb-monitor-enable='%'

20) --max-tmp-tables=100

21) --performance-schema

22) --performance-schema-instrument='%=on'

23) --query-cache-size=0

24) --query-cache-type=0

25) --max-connections=4000

26) --max-prepared-stmt-count=1048576

27) --sort-buffer-size=32768

28) --table-open-cache=4000

29) --table-definition-cache=4000

30) --table-open-cache-instances=16

31) --tmp-table-size=100M

32) --max-heap-table-size=1000M

33) --key-buffer-size=50M

34) --join-buffer-size=1000000



Tuesday Apr 10, 2012

Benchmarking MySQL Replication with Multi-Threaded Slaves

The objective of this benchmark is to measure the performance improvement achieved when enabling the Multi-Threaded Slave enhancement delivered as a part MySQL 5.6.

As the results demonstrate, Multi-Threaded Slaves delivers 5x higher replication performance based on a configuration with 10 databases/schemas. For real-world deployments, higher replication performance directly translates to:

· Improved consistency of reads from slaves (i.e. reduced risk of reading "stale" data)

· Reduced risk of data loss should the master fail before replicating all events in its binary log (binlog)

The multi-threaded slave splits processing between worker threads based on schema, allowing updates to be applied in parallel, rather than sequentially. This delivers benefits to those workloads that isolate application data using databases - e.g. multi-tenant systems deployed in cloud environments.

Multi-Threaded Slaves are just one of many enhancements to replication previewed as part of the MySQL 5.6 Development Release, which include:

· Global Transaction Identifiers coupled with MySQL utilities for automatic failover / switchover and slave promotion

· Crash Safe Slaves and Binlog

· Optimized Row Based Replication

· Replication Event Checksums

· Time Delayed Replication

These and many more are discussed in the “MySQL 5.6 Replication: Enabling the Next Generation of Web & Cloud Services” Developer Zone article 

Back to the benchmark - details are as follows.


Environment
The test environment consisted of two Linux servers:

· one running the replication master

· one running the replication slave.

Only the slave was involved in the actual measurements, and was based on the following configuration:

- Hardware: Oracle Sun Fire X4170 M2 Server

- CPU: 2 sockets, 6 cores with hyper-threading, 2930 MHz.

- OS: 64-bit Oracle Enterprise Linux 6.1
- Memory: 48 GB

Test Procedure
Initial Setup:

Two MySQL servers were started on two different hosts, configured as replication master and slave.

10 sysbench schemas were created, each with a single table:

CREATE TABLE `sbtest` (
   `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `k` int(10) unsigned NOT NULL DEFAULT '0',
   `c` char(120) NOT NULL DEFAULT '',
   `pad` char(60) NOT NULL DEFAULT '',
   PRIMARY KEY (`id`),
   KEY `k` (`k`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

10,000 rows were inserted in each of the 10 tables, for a total of 100,000 rows. When the inserts had replicated to the slave, the slave threads were stopped. The slave data directory was copied to a backup location and the slave threads position in the master binlog noted.

10 sysbench clients, each configured with 10 threads, were spawned at the same time to generate a random schema load against each of the 10 schemas on the master. Each sysbench client executed 10,000 "update key" statements:

UPDATE sbtest set k=k+1 WHERE id = <random row>

In total, this generated 100,000 update statements to later replicate during the test itself.

Test Methodology:
The number of slave workers to test with was configured using:

SET GLOBAL slave_parallel_workers=<workers>

Then the slave IO thread was started and the test waited for all the update queries to be copied over to the relay log on the slave.

The benchmark clock was started and then the slave SQL thread was started. The test waited for the slave SQL thread to finish executing the 100k update queries, doing "select master_pos_wait()". When master_pos_wait() returned, the benchmark clock was stopped and the duration calculated.

The calculated duration from the benchmark clock should be close to the time it took for the SQL thread to execute the 100,000 update queries. The 100k queries divided by this duration gave the benchmark metric, reported as Queries Per Second (QPS).

Test Reset:

The test-reset cycle was implemented as follows:

· the slave was stopped

· the slave data directory replaced with the previous backup

· the slave restarted with the slave threads replication pointer repositioned to the point before the update queries in the binlog.

The test could then be repeated with identical set of queries but a different number of slave worker threads, enabling a fair comparison.

The Test-Reset cycle was repeated 3 times for 0-24 number of workers and the QPS metric calculated and averaged for each worker count.

MySQL Configuration
The relevant configuration settings used for MySQL are as follows:

binlog-format=STATEMENT
relay-log-info-repository=TABLE
master-info-repository=TABLE

As described in the test procedure, the
slave_parallel_workers setting was modified as part of the test logic. The consequence of changing this setting is:

0 worker threads:
   - current (i.e. single threaded) sequential mode
   - 1 x IO thread and 1 x SQL thread
   - SQL thread both reads and executes the events

1 worker thread:
   - sequential mode
   - 1 x IO thread, 1 x Coordinator SQL thread and 1 x Worker thread
   - coordinator reads the event and hands it to the worker who executes

2+ worker threads:
   - parallel execution
   - 1 x IO thread, 1 x Coordinator SQL thread and 2+ Worker threads
   - coordinator reads events and hands them to the workers who execute them

Results
Figure 1 below shows that Multi-Threaded Slaves deliver ~5x higher replication performance when configured with 10 worker threads, with the load evenly distributed across our 10 x schemas. This result is compared to the current replication implementation which is based on a single SQL thread only (i.e. zero worker threads).

Figure 1: 5x Higher Performance with Multi-Threaded Slaves

The following figure shows more detailed results, with QPS sampled and reported as the worker threads are incremented.

The raw numbers behind this graph are reported in the Appendix section of this post.



Figure 2: Detailed Results

As the results above show, the configuration does not scale noticably from 5 to 9 worker threads. When configured with 10 worker threads however, scalability increases significantly. The conclusion therefore is that it is desirable to configure the same number of worker threads as schemas.

Other conclusions from the results:

· Running with 1 worker compared to zero workers just introduces overhead without the benefit of parallel execution.

· As expected, having more workers than schemas adds no visible benefit.

Aside from what is shown in the results above, testing also demonstrated that the following settings had a very positive effect on slave performance:


relay-log-info-repository=TABLE
master-info-repository=TABLE

For 5+ workers, it was up to 2.3 times as fast to run with TABLE compared to FILE.

Conclusion

As the results demonstrate, Multi-Threaded Slaves deliver significant performance increases to MySQL replication when handling multiple schemas.

This, and the other replication enhancements introduced in MySQL 5.6 are fully available for you to download and evaluate now from the MySQL Developer site (select Development Release tab).

You can learn more about MySQL 5.6 from the documentation 

Please don’t hesitate to comment on this or other replication blogs with feedback and questions.

Appendix – Detailed Results

Monday Aug 01, 2011

MySQL 5.6 Replication – New Early Access Features

At OSCON 2011 last week, Oracle delivered more early access (labs) features for MySQL 5.6 replication. These features are focused on better integration, performance and data integrity, and are summarized in this blog with links to resources enabling users to download, configure and evaluate them[Read More]
About

Get the latest updates on products, technology, news, events, webcasts, customers and more.

Twitter


Facebook

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
2
5
6
9
10
11
12
13
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today