MySQL Enterprise Backup Team is pleased to announce redo log performance
improvements
during the backup operation starting with release 4.1.
As
explained in Redo Logging in InnoDB, redo
logs are written to the redo log files in a circular fashion. Unless the MySQL
server is idle, it continues to generate redo logs. MySQL Enterprise Backup (MEB)
copies the redo logs from the redo log files to the ibbackup_logfile during a backup
operation. MEB copies the redo logs in order to ensure that it does not miss
any transaction while the physical backup is being done. MEB continues to copy
the redo logs until it reads all the files from the data directory. If the redo log records are written faster
than they can be processed by MEB, the backup operation fails with the
following error.
"
Log scan was only able to reach <LSN1>,
but a checkpoint was at <LSN2>.
This means that the database server has overwritten
a part of the circular InnoDB log files before ibbackup
was able to read it.
To solve this problem either:
1) re-run the backup when the database has less load, or
2) re-configure the database with bigger InnoDB log files.
"
We
have improved MEB so that this problem is unlikely to happen in most cases.
As
explained in Parallel backup, MEB
uses different types of threads during an operation. The Following diagram
depicts the typical active threads that are working during a backup operation
prior to the 4.1 release.
This thread forks other child threads and monitors their
execution.
Read the files in segments to be copied and pass them to the
processor threads.
Do the required processing (for instance - compress, encrypt
and validate) on the segments and pass the processed segments to the writer
threads.
Write the segments to the target location.
Is forked only if the --show-progress option is
specified. It prints the overall progress made by MEB.
Continues to copy the redo logs until notified by a
reader thread to stop once reader threads have finished reading the files. While multiple reader
threads keep reading the files from the data directory, there is only one redo
log thread that performs the following three tasks:
It is
evident that the redo log thread does a lot of work, which could be split into
multiple threads much like the reader, processor, and writer threads.
In the
4.1 release, we have split the one redo log thread into three different threads
as depicted in the diagram below.
Reads redo log segments of 1 MB size each and
passes them to the Processor redo log thread. MEB creates a 16 MB buffer queue to keep these
redo log segments in order to pass them among the three redo log threads. If
there is no redo log to read, then the thread sleep with a duration that varies
between 10 milliseconds to 1 second based on the redo log generation pattern. This
ensures that the reader redo log thread does not wake up too early and waste
CPU cycles and to avoid a sleep duration that may cause the same problem which we
have fixed through this improvement.
Validates the redo log segments, and passes
them to the writer redo log thread.
Writes the segments to the ibbackup_logfile
in the backup.
We
performed a few tests in two different environments as follows. In both environments, redo log files in
server were configured to be on a RAMDISK and backup was performed on a hard
disk. This was done to simulate the use
cases where backups are usually performed in the slower disks compared to the
server is located. This configuration
ensured that server could write the redo log files as fast as it could and we
could identify the latency among the Server writing the redo logs, MEB reading,
processing and writing the same.
Environment-01:
MEB version | Number of clients | Backup status |
4.1 | 100 | Completed successfully |
4.1 | 500 | Completed successfully |
4.1 | 1000 | Completed successfully |
4.0.3 | 2 | Failed with redo log overwrite issue |
We
kept the redo log files size small so that the redo log file switch would
happen frequently. That may result in a case where MEB is not able to read as
fast as the server was writing, but in the test, the case did not occur. That
is, the threads completed every time using the three iterations mentioned
above. In each iteration, there was approximately
150GB of data in the ibbackup_logfile.
Environment-02:
MEB version | Number of clients | Backup status |
4.1 | 2000 | Completed successfully |
4.0.3 | 2 | Failed with redo log overwrite issue |
Here,
we kept the redo log files size larger to widen the gap between the server
writing the redo log and MEB reading it from the redo log files. We increased the number of clients this time
and noticed a higher redo write rate but backup completed without any problems.
In this test, there was approximately 850GB of data in the ibbackup_logfile.
We
could replicate the redo log overwrite issues easily with the older version of
MEB, but we are yet to see the problem in the 4.1 release. Although it is still possible that the redo
log overwrite error may happen, the probability of that happening has been
reduced drastically. This probability could
be further reduced if you perform an Optimistic Full Backups or an
Optimistic Incremental Backups. Try out the latest version and provide us
feedback on how it works for your environments.
This improvement will be available in next release of 3.x series.