MySQL Enterprise Backup Improved Compression Algorithm for 3.10
By SANDEEP SETHIA-Oracle on May 22, 2014
Prior to version 3.10, MySQL Enterprise Backup (MEB) used zlib compression for in-memory compression of datafiles. The compression worked by splitting the innodb datafiles into fixed size blocks and compressing each block independently.After searching on the web we found there are many compression algorithms available which can be used for compression. This triggered the idea of testing the performance of available compression algorithms. If the benchmark shows improved performance we can make backup and/or restore faster by adding the new compression algorithm to MEB.
The idea to implement the algorithms procceded as follows .
1. Select a "long list" of algorithms based on literature and what Google and other databases are using.
2. Create a prototype of MEB supporting the algorithms in the long list.
3. Run comparison tests of algorithms with the MEB prototype.
4. Select a "short list" of algorithms that will be added to MEB 3.10.
Criteria for Selecting the Algorithm:
The following criteria were used in comparing compression algorithms.
1. Compression speed
2. Decompression speed
3. Compression ratio
5. Licensing model
Performance Test:We have now completed the performance tests of the new compression algorithms for MEB. See the below table for the list of compression algorithms were evaluated in the test.
Machine and OS Configurations:
OS : Oracle Linux 6 (x86_64)
Memory: 29 GB RAM
Cpu : 8 vCPUs (2 quad-core processors, no HT)
Read speed of the source dir(data directory) : 600 MB/s
Write speed of the destination dir(backup directory) : 300 MB/s
A backup of a 441 GB database was generated using TPC-H datagen tool taken when the mysqld process was not running .
|Compression Algorithms||Time [min]||Compr. size [GB]||Compr. / Orig. size||Avg. CPU usage||Avg. CPU Idle||Reads [MB/s]||Writes [MB/s]||Source Disk busy|
|uncompressed/Normal Backup to Directory||31||N/A||100%||20%||65%||250||250||100%|
Few Important Notes:
• Some columns are blanks because the test ran for longer duration of time so it was not feasible to collect monitoring stats.
• “Source Disk busy" is the number of I/O per second in percent of what the device can execute. It is not related to the device throughput (MB/s).
• MEB has an internal work queue to process data that is managed by separate read, process, and write threads. Read threads will place data in the process queue where processing threads then process it, and finally after the processing is complete, the data will be placed in the write queue where they will be written out to storage. Due to this design, if writes are slower than reads (which they often are), then the reads will effectively be throttled by the write speeds (write speeds typically being the limiting factor).
Analysis of the Compression Test's:
LZ4 and QuickLZ were the fastest algorithms, while ZLib (level=9) was by far the slowest. For compression ratios, LZMA (level=9) was only able to reach 20%, whereas QuickLZ reached 46%, and LZ4 49%. This illustrates the fact that there is a trade-off between backup speed and the reduction in data size. Nevertheless, we could say that algorithm A is better than algorithm B, if A is faster than B and produces a backup which is not larger than that of B, or if A produces a smaller backup than B and A is not slower than B. Using this criteria we can say that QuickLZ is a better compression algorithm than LZ4, Snappy, LZO, or LZF. Similarly, LZMA (level=1) is superior to Zlib (level=9). The summary table shows two limiting factors for the backup speed. The IO speed of the of disk on which the database resides (the source disk) is thelimiting factor for uncompressed backup and compressed backups made with LZF, LZO,QuickLZ and LZ4. For Zlib (level=1), Snappy and LZMA (level=1) the limiting factor is the CPU. After removing the worst performing algorithms, we have four remaining that we can organize into a line where you get higher speeds as you move to the left, and better compression as you move to the right.
BEST SPEED --- QuickLZ --LZ4------ Zlib (lev.=1) ---- LZMA (lev.=1)---- LZMA(lev=9) --- BEST COMPRESSION
The restore speed was almost the same for all the algorithms. The restore of uncompressed backup and ZLib compressed backup took 28 minutes, and for all the other algorithms the restore time was 29 minutes.
For licensing reasons QuickLZ cannot be used with MEB. Therefore it was replaced with LZ4. Thus, the new compression algorithms are LZ4 (for fast compression) and LZMA (for high compression ratio).