Improved Redo Log Copying During Backup
By Rahul Sisondia-Oracle on Mar 04, 2017
MySQL Enterprise Backup Team is pleased to announce redo log performance
improvements
during the backup operation starting with release 4.1.
Introduction
As explained in Redo Logging in InnoDB, redo logs are written to the redo log files in a circular fashion. Unless the MySQL server is idle, it continues to generate redo logs. MySQL Enterprise Backup (MEB) copies the redo logs from the redo log files to the ibbackup_logfile during a backup operation. MEB copies the redo logs in order to ensure that it does not miss any transaction while the physical backup is being done. MEB continues to copy the redo logs until it reads all the files from the data directory. If the redo log records are written faster than they can be processed by MEB, the backup operation fails with the following error.
"
Log scan was only able to reach <LSN1>,
but a checkpoint was at <LSN2>.
This means that the database server has overwritten
a part of the circular InnoDB log files before ibbackup
was able to read it.
To solve this problem either:
1) re-run the backup when the database has less load, or
2) re-configure the database with bigger InnoDB log files.
"
We have improved MEB so that this problem is unlikely to happen in most cases.
Background
As
explained in Parallel backup, MEB
uses different types of threads during an operation. The Following diagram
depicts the typical active threads that are working during a backup operation
prior to the 4.1 release.

Main thread
This thread forks other child threads and monitors their
execution.
Reader threads
Read the files in segments to be copied and pass them to the
processor threads.
Processor threads
Do the required processing (for instance - compress, encrypt
and validate) on the segments and pass the processed segments to the writer
threads.
Writer threads
Write the segments to the target location.
Progress thread
Is forked only if the --show-progress option is
specified. It prints the overall progress made by MEB.
Redo log thread
Continues to copy the redo logs until notified by a reader thread to stop once reader threads have finished reading the files. While multiple reader threads keep reading the files from the data directory, there is only one redo log thread that performs the following three tasks:
- Reads redo log segments of 1MB size each from redo log files in server. If there is no redo log then go to sleep for 10 milliseconds and retry to read more redo log after wake up
- Validates the redo log segments for unlikely problems (for instance – corruption and partial read) in the redo log segment.
- Writes the redo log segments to ibbackup_logfile in the backup.
It is
evident that the redo log thread does a lot of work, which could be split into
multiple threads much like the reader, processor, and writer threads.
The improvement
In the
4.1 release, we have split the one redo log thread into three different threads
as depicted in the diagram below.

Reader redo log thread
Reads redo log segments of 1 MB size each and
passes them to the Processor redo log thread. MEB creates a 16 MB buffer queue to keep these
redo log segments in order to pass them among the three redo log threads. If
there is no redo log to read, then the thread sleep with a duration that varies
between 10 milliseconds to 1 second based on the redo log generation pattern. This
ensures that the reader redo log thread does not wake up too early and waste
CPU cycles and to avoid a sleep duration that may cause the same problem which we
have fixed through this improvement.
Processor redo log thread
Validates the redo log segments, and passes
them to the writer redo log thread.
Writer redo log thread:
Writes the segments to the ibbackup_logfile
in the backup.
Test Results
We
performed a few tests in two different environments as follows. In both environments, redo log files in
server were configured to be on a RAMDISK and backup was performed on a hard
disk. This was done to simulate the use
cases where backups are usually performed in the slower disks compared to the
server is located. This configuration
ensured that server could write the redo log files as fast as it could and we
could identify the latency among the Server writing the redo logs, MEB reading,
processing and writing the same.
Environment-01:
- Server Repository Options:
- innodb_log_files_in_group = 2
- innodb_log_file_size = 512 MB
- innodb_log_buffer_size = 16 MB
- Initial database size = 687 GB
- After 30 seconds, sysbench clients connect to the server and insert one million rows/per tables
- Number of tables were the same as the number of clients.
- Redo write rate = ~60 MB/s
|
MEB version |
Number of clients |
Backup status |
|
4.1 |
100 |
Completed successfully |
|
4.1 |
500 |
Completed successfully |
|
4.1 |
1000 |
Completed successfully |
|
4.0.3 |
2 |
Failed with redo log overwrite issue |
We
kept the redo log files size small so that the redo log file switch would
happen frequently. That may result in a case where MEB is not able to read as
fast as the server was writing, but in the test, the case did not occur. That
is, the threads completed every time using the three iterations mentioned
above. In each iteration, there was approximately
150GB of data in the ibbackup_logfile.
Environment-02:
- Server Repository Options:
- innodb_log_files_in_group= 6
- innodb_log_file_size = 4 GB
- innodb_log_buffer_size = 16 MB
- Initial database size = 687 GB
- After 30 seconds, sysbench clients connect to the server and insert three million rows/per tables\
- Number of tables were same as number of clients.
- Redo write rate = ~90 MB/s
|
MEB version |
Number of clients |
Backup status |
|
4.1 |
2000 |
Completed successfully |
|
4.0.3 |
2 |
Failed with redo log overwrite issue |
Here,
we kept the redo log files size larger to widen the gap between the server
writing the redo log and MEB reading it from the redo log files. We increased the number of clients this time
and noticed a higher redo write rate but backup completed without any problems.
In this test, there was approximately 850GB of data in the ibbackup_logfile.
Summary
We
could replicate the redo log overwrite issues easily with the older version of
MEB, but we are yet to see the problem in the 4.1 release. Although it is still possible that the redo
log overwrite error may happen, the probability of that happening has been
reduced drastically. This probability could
be further reduced if you perform an Optimistic Full Backups or an
Optimistic Incremental Backups. Try out the latest version and provide us
feedback on how it works for your environments.
This improvement will be available in next release of 3.x series.
Posted by rahul on March 06, 2017 at 07:14 PM PST #