Thursday May 22, 2014

MySQL Enterprise Backup Improved Compression Algorithm for 3.10

Background:

Prior to version 3.10, MySQL Enterprise Backup (MEB) used zlib for in-memory compression of datafiles. The compression worked by splitting the InnoDB datafiles into fixed-size blocks and compressing each block independently. A search of the web showed that many other compression algorithms are available, which prompted us to benchmark them. If the benchmark showed improved performance, we could make backup and/or restore faster by adding the new compression algorithm to MEB.
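The block-wise scheme described above can be illustrated with a minimal sketch. This is not MEB's code: the 16 KB block size, the function names and the use of Python's zlib module are assumptions chosen only for illustration.

```python
import zlib

# Assumed block size, for illustration only; the post does not state
# the block size MEB actually uses.
BLOCK_SIZE = 16 * 1024


def compress_blocks(data, block_size=BLOCK_SIZE, level=1):
    """Split data into fixed-size blocks and compress each one independently."""
    return [
        zlib.compress(data[i:i + block_size], level)
        for i in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks):
    """Decompress every block on its own and join the results in order."""
    return b"".join(zlib.decompress(block) for block in blocks)


if __name__ == "__main__":
    data = b"abc" * 50000  # 150,000 bytes of sample input
    blocks = compress_blocks(data)
    print(len(blocks), "blocks,", sum(len(b) for b in blocks), "compressed bytes")
```

Because no block depends on its neighbours, blocks can be compressed or decompressed in any order, which is what makes it possible to hand them to several threads.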

Implementation:

The work proceeded as follows.

1. Select a "long list" of algorithms based on literature and what Google and other databases are using.
2. Create a prototype of MEB supporting the algorithms in the long list.
3. Run comparison tests of algorithms with the MEB prototype.
4. Select a "short list" of algorithms that will be added to MEB 3.10.

Criteria for Selecting the Algorithm:

The following criteria were used in comparing compression algorithms.

1. Compression speed
2. Decompression speed
3. Compression ratio
4. CPU usage
5. Licensing model

These criteria differ in importance. Compression speed and compression ratio are probably more important to most users than decompression speed.

Performance Test:

We have now completed the performance tests of the new compression algorithms for MEB. The table below lists the compression algorithms that were evaluated in the test.

Machine and OS Configurations:

OS: Oracle Linux 6 (x86_64)
Memory: 29 GB RAM
CPU: 8 vCPUs (2 quad-core processors, no HT)
Read speed of the source dir (data directory): 600 MB/s
Write speed of the destination dir (backup directory): 300 MB/s

The 441 GB test database was generated using the TPC-H datagen tool. The backup was taken while the mysqld process was not running.

| Compression algorithm | Time [min] | Compr. size [GB] | Compr. / Orig. size | Avg. CPU usage | Avg. CPU idle | Reads [MB/s] | Writes [MB/s] | Source disk busy |
|---|---|---|---|---|---|---|---|---|
| Uncompressed / normal backup to directory | 31 | N/A | 100% | 20% | 65% | 250 | 250 | 100% |
| Zlib (level=1) | 34 | 165 | 37% | 82% | 15% | 220 | 90 | 70% |
| Zlib (level=9) | 720 | 120 | 27% | - | - | - | - | - |
| LZF | 27 | 222 | 50% | 45% | 50% | 270 | 140 | 100% |
| LZO | 27 | 224 | 51% | 40% | 55% | 270 | 140 | 100% |
| Snappy | 31 | 221 | 50% | 55% | 40% | 260 | 130 | 80% |
| QuickLZ | 26 | 203 | 46% | 35% | 55% | 280 | 120 | 100% |
| LZ4 | 26 | 215 | 49% | 35% | 55% | 280 | 130 | 100% |
| LZMA (level=1) | 90 | 110 | 25% | 78% | 20% | 80 | 22 | 25% |
| LZMA (level=9) | 360 | 88 | 20% | - | - | - | - | - |
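The "Compr. / Orig. size" column follows directly from the compressed sizes and the 441 GB database size. The short sketch below recomputes it; the variable and function names are illustrative, and the numbers are copied from the table.

```python
ORIGINAL_GB = 441  # size of the TPC-H test database

# Compressed backup sizes in GB, copied from the results table.
compressed_gb = {
    "Zlib (level=1)": 165,
    "Zlib (level=9)": 120,
    "LZF": 222,
    "LZO": 224,
    "Snappy": 221,
    "QuickLZ": 203,
    "LZ4": 215,
    "LZMA (level=1)": 110,
    "LZMA (level=9)": 88,
}


def ratio_percent(size_gb, original_gb=ORIGINAL_GB):
    """Compressed size as a whole-number percentage of the original size."""
    return round(100 * size_gb / original_gb)


if __name__ == "__main__":
    for name, size in compressed_gb.items():
        print(f"{name}: {ratio_percent(size)}%")
```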

A Few Important Notes:

• Some columns are blank because those tests ran for so long that it was not feasible to collect monitoring statistics.

• "Source Disk busy" is the number of I/O per second in percent of what the device can execute. It is not related to the device throughput (MB/s).

• MEB has an internal work queue to process data that is managed by separate read, process, and write threads. Read threads will place data in the process queue where processing threads then process it, and finally after the processing is complete, the data will be placed in the write queue where they will be written out to storage. Due to this design, if writes are slower than reads (which they often are), then the reads will effectively be throttled by the write speeds (write speeds typically being the limiting factor).
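The throttling effect described in the last note can be sketched with bounded queues. This is only a minimal illustration under stated assumptions, not MEB's implementation: it uses one thread per phase, Python's queue and threading modules, zlib as the processing step, and a hypothetical queue size of 4.

```python
import queue
import threading
import zlib

SENTINEL = None  # marks the end of the stream


def reader(blocks, process_q):
    """Read phase: hand blocks to the process queue."""
    for block in blocks:
        process_q.put(block)  # blocks here when the queue is full
    process_q.put(SENTINEL)


def processor(process_q, write_q):
    """Process phase: compress each block and pass it on."""
    while True:
        block = process_q.get()
        if block is SENTINEL:
            write_q.put(SENTINEL)
            break
        write_q.put(zlib.compress(block, 1))  # blocks when the write queue is full


def writer(write_q, out):
    """Write phase: a list stands in for the backup destination."""
    while True:
        block = write_q.get()
        if block is SENTINEL:
            break
        out.append(block)


def run_pipeline(blocks, queue_size=4):
    """Run the three phases concurrently, connected by bounded queues."""
    process_q = queue.Queue(maxsize=queue_size)
    write_q = queue.Queue(maxsize=queue_size)
    out = []
    threads = [
        threading.Thread(target=reader, args=(blocks, process_q)),
        threading.Thread(target=processor, args=(process_q, write_q)),
        threading.Thread(target=writer, args=(write_q, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Because both queues are bounded, a slow writer fills the write queue, which stalls the processor, which in turn fills the process queue and stalls the reader. That is the sense in which reads end up throttled by write speed.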

Analysis of the Compression Tests:

LZ4 and QuickLZ were the fastest algorithms, while Zlib (level=9) was by far the slowest. For compression ratio, LZMA (level=9) produced the smallest backup, at 20% of the original size, whereas QuickLZ reached only 46% and LZ4 49%. This illustrates the trade-off between backup speed and the reduction in data size. Nevertheless, we can say that algorithm A is better than algorithm B if A is faster than B and produces a backup that is not larger than that of B, or if A produces a smaller backup than B and is not slower than B. Using these criteria, QuickLZ is a better compression algorithm than LZ4, Snappy, LZO, or LZF. Similarly, LZMA (level=1) is superior to Zlib (level=9).

The summary table shows two limiting factors for the backup speed. The I/O speed of the disk on which the database resides (the source disk) is the limiting factor for uncompressed backups and for compressed backups made with LZF, LZO, QuickLZ and LZ4. For Zlib (level=1), Snappy and LZMA (level=1) the limiting factor is the CPU.

After removing the worst performing algorithms, we have four remaining that we can organize into a line where you get higher speeds as you move to the left, and better compression as you move to the right.

BEST SPEED --- QuickLZ --- LZ4 --- Zlib (level=1) --- LZMA (level=1) --- LZMA (level=9) --- BEST COMPRESSION
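The "A is better than B" rule from the analysis can be applied mechanically to the backup times and compressed sizes in the results table. The sketch below is one way to do that; the function names are illustrative and the numbers are copied from the table.

```python
# (backup time in minutes, compressed size in GB), copied from the results table.
results = {
    "Zlib (level=1)": (34, 165),
    "Zlib (level=9)": (720, 120),
    "LZF": (27, 222),
    "LZO": (27, 224),
    "Snappy": (31, 221),
    "QuickLZ": (26, 203),
    "LZ4": (26, 215),
    "LZMA (level=1)": (90, 110),
    "LZMA (level=9)": (360, 88),
}


def is_better(a, b):
    """True if a is faster and not larger, or smaller and not slower, than b."""
    time_a, size_a = a
    time_b, size_b = b
    return (time_a < time_b and size_a <= size_b) or (
        size_a < size_b and time_a <= time_b
    )


def not_dominated(table):
    """Algorithms that no other algorithm beats, ordered from fastest to slowest."""
    survivors = [
        name
        for name, stats in table.items()
        if not any(
            is_better(other, stats)
            for other_name, other in table.items()
            if other_name != name
        )
    ]
    return sorted(survivors, key=lambda name: table[name][0])


if __name__ == "__main__":
    print(not_dominated(results))
```

With these numbers the survivors are QuickLZ, Zlib (level=1), LZMA (level=1) and LZMA (level=9). LZ4 is beaten only by QuickLZ, which produced a smaller backup in the same time.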

Restore Speed:

The restore speed was almost the same for all the algorithms. Restoring the uncompressed backup and the Zlib-compressed backup took 28 minutes; for all the other algorithms the restore time was 29 minutes.

Conclusion:

For licensing reasons QuickLZ cannot be used with MEB. Therefore it was replaced with LZ4. Thus, the new compression algorithms are LZ4 (for fast compression) and LZMA (for high compression ratio).


Friday May 03, 2013

MEB: The journey so far 2010-2013

MySQL Enterprise Backup (MEB) was born 3 years ago as a newly branded avatar of InnoDB Hot Backup. I wanted to share what has gone on so far, how we at Oracle think about backup, the milestones that we have achieved and the road ahead. The idea for this blog came to me after looking at Mikael's latest blog. While Mikael talks about MySQL, I want to talk about MEB.

When we started with InnoDB Hot Backup, the first challenge was to have it adhere to the development, quality and release processes for MySQL. This meant creating a quality plan, getting it into the development trees of MySQL and ensuring that each piece of new code went through architecture and code review. Though the initial implementer and architect of Hot Backup continues to work with the MEB team, there were a host of new engineers to be trained. We also needed to ensure that the new (at that time) Barracuda InnoDB file format and incremental backup were supported. MEB 3.5.1 was the release that delivered these things, along with adherence to the development and quality model of MySQL.

The next challenge we faced was that of ensuring that MEB was on equal footing for both Linux and Windows. InnoDB Hot Backup consisted of 2 programs - ibbackup and innobackup; innobackup is a Perl module. The main issue with using the program on Windows was the requirement to install Perl. With multiple Perl implementations and changing Perl versions, we did not want to check MEB compatibility for every implementation and new version of Perl when it was released. Even though the problem is similar on Linux, Linux users are used to hacking around, changing paths and managing multiple versions of software like Perl. Windows users, however, expect things to just work. So we set about removing the Perl code altogether. This meant that the innobackup functionality had to be re-coded as a C program. Merging these 2 programs meant a major re-think on how the combined command line interface needed to look. The solution we came up with was to let the ibbackup and innobackup command line syntax remain as is, while the combined program had a similar but more logical "mysqlbackup" command line syntax. We were very happy with the new syntax because it freed us from history and MEB syntax became very much in line with the syntax of other MySQL clients. With the release of 3.6 we had a single C program, a more logical syntax, and a product which was easier to install and worked exactly the same on all platforms.

We were getting to 2012 and database sizes were commonly approaching 1 TB. Such large databases meant the backup should ideally be streamed to tape. Interfacing with tape drives is a complicated and specialized activity. We had neither the bandwidth nor the expertise to handle tapes in MEB. The best solution was to adhere to a good common standard interface that was adopted by software which dealt with tapes. The interface we decided to support was Oracle's System Backup to Tape (SBT). MEB was modified to be able to stream the backup output to this interface. A common requirement for these interfaces is that they ideally want to deal with the backup as a single file. A single file can be streamed and restored by any software that speaks SBT. There is a whole ecosystem around SBT because it is the preferred way to back up the Oracle database. Changing MEB to think streaming instead of random access directory output was the challenge we overcame with the release of version 3.7 of MEB. With version 3.7, MEB could interface with Oracle Secure Backup, Symantec NetBackup, Tivoli Storage Manager and any other backup software that understood SBT.

After we had resolved what we saw as the "basic" requirements for backup, our customers were demanding more performance and usability. We took up the challenge of performance for the 3.8 version of MEB. MEB was a monolithic single-threaded program. We decided to internally break up MEB into 3 separate modules: the read phase, the process phase and the write phase. Each of these 3 phases could be multi-threaded. The number of threads dedicated to each phase was also made user configurable. All operations of backup, including "Applylog" and "copyback", were made multi-threaded. Read more details about this design approach and the performance gains in my blog - Truly Parallel backup. Meanwhile the new release of the MySQL Server 5.6 was also out. It was an interesting challenge to ensure that MEB understood the new MySQL 5.6 features and was able to take advantage of them. As of this writing, MEB 3.8.1 is the only online backup solution that is compatible with the new features of MySQL 5.6.

Backup is like buying insurance. When all else fails you need to be sure that there is a working backup that is available to bring back your database. Backup is not something that can fail when it is needed. It is required that we are surefooted when dealing with such a critical activity. We take your trust in our solution very seriously. Thanks for being a part of the MEB journey (and for reading this blog) so far. The MySQL landscape is ever changing and we know that you desire more usability, performance and flexibility from MEB. We will try and ensure that we meet these expectations with the best possible quality. With every new MEB release you will see a more usable, flexible and performant MEB.

About

MySQL MEB Team Blog
