Sunday, March 5, 2017

Improved Redo Log Copying During Backup

MySQL Enterprise Backup Team is pleased to announce redo log performance
improvements

during the backup operation starting with release 4.1. 

Introduction 

As
explained in
Redo Logging in InnoDB, redo
logs are written to the redo log files in a circular fashion. Unless the MySQL
server is idle, it continues to generate redo logs. MySQL Enterprise Backup (MEB)
copies the redo logs from the redo log files to the ibbackup_logfile during a backup
operation. MEB copies the redo logs in order to ensure that it does not miss
any transaction while the physical backup is being done. MEB continues to copy
the redo logs until it reads all the files from the data directory.
If the redo log records are written faster
than they can be processed by MEB, the backup operation fails with the
following error.

"

    Log scan was only able to reach <LSN1>,

    but a checkpoint was at <LSN2>.

    This means that the database server has overwritten

    a part of the circular InnoDB log files before ibbackup

    was able to read it.

    To solve this problem either:

    1) re-run the backup when the database has less load, or

    2) re-configure the database with bigger InnoDB log files.

"

We
have improved MEB so that this problem is unlikely to happen in most cases.

Background

As
explained in Parallel backup, MEB
uses different types of threads during an operation. The Following diagram
depicts the typical active threads that are working during a backup operation
prior to the 4.1 release.


Main thread

This thread forks other child threads and monitors their
execution.

Reader threads

Read the files in segments to be copied and pass them to the
processor threads.

Processor threads

Do the required processing (for instance - compress, encrypt
and validate) on the segments and pass the processed segments to the writer
threads.

Writer threads

Write the segments to the target location.

Progress thread

Is forked only if the --show-progress option is
specified. It prints the overall progress made by MEB.

Redo log thread

Continues to copy the redo logs until notified by a
reader thread to stop once reader threads have finished reading the files.  While multiple reader
threads keep reading the files from the data directory, there is only one redo
log thread that performs the following three tasks:

  1. Reads
    redo log segments of 1MB size each from redo log files in server.  If there is no redo log then go to sleep for
    10 milliseconds and retry to read more redo log after wake up
  2. Validates
    the redo log segments for unlikely problems (for instance – corruption and partial
    read) in the redo log segment.  
  3. Writes
    the redo log segments to ibbackup_logfile in the backup.

It is
evident that the redo log thread does a lot of work, which could be split into
multiple threads much like the reader, processor, and writer threads. 

The improvement

In the
4.1 release, we have split the one redo log thread into three different threads
as depicted in the diagram below.


Reader redo log thread

Reads redo log segments of 1 MB size each and
passes them to the Processor redo log thread.  MEB creates a 16 MB buffer queue to keep these
redo log segments in order to pass them among the three redo log threads. If
there is no redo log to read, then the thread sleep with a duration that varies
between 10 milliseconds to 1 second based on the redo log generation pattern. This
ensures that the reader redo log thread does not wake up too early and waste
CPU cycles and to avoid a sleep duration that may cause the same problem which we
have fixed through this improvement.

Processor redo log thread

Validates the redo log segments, and passes
them to the writer redo log thread.

Writer redo log thread:

Writes the segments to the ibbackup_logfile
in the backup.

Test Results

We
performed a few tests in two different environments as follows.  In both environments, redo log files in
server were configured to be on a RAMDISK and backup was performed on a hard
disk.  This was done to simulate the use
cases where backups are usually performed in the slower disks compared to the
server is located.  This configuration
ensured that server could write the redo log files as fast as it could and we
could identify the latency among the Server writing the redo logs, MEB reading,
processing and writing the same.

Environment-01:

  • Server
    Repository Options:
    • innodb_log_files_in_group
      = 2
    • innodb_log_file_size
      = 512 MB
    • innodb_log_buffer_size
      = 16 MB
  • Initial
    database size = 687 GB
  • After
    30 seconds, sysbench clients connect to the server and insert one million rows/per
    tables
  • Number
    of tables were the same as the number of clients.
  • Redo
    write rate = ~60 MB/s

MEB  version

Number of clients

Backup status

4.1

100

Completed successfully

4.1

500

Completed successfully

4.1

1000

Completed successfully

4.0.3

2

Failed with redo log overwrite issue

We
kept the redo log files size small so that the redo log file switch would
happen frequently. That may result in a case where MEB is not able to read as
fast as the server was writing, but in the test, the case did not occur. That
is, the threads completed every time using the three iterations mentioned
above.  In each iteration, there was approximately
150GB of data in the ibbackup_logfile.

Environment-02:

  •  Server
    Repository Options:
    • innodb_log_files_in_group=
      6
    • innodb_log_file_size
      = 4 GB
    • innodb_log_buffer_size
      = 16 MB
  • Initial
    database size = 687 GB
  • After
    30 seconds, sysbench clients connect to the server and insert three million rows/per
    tables\
  • Number
    of tables were same as number of clients.
  • Redo write
    rate = ~90 MB/s

MEB  version

Number of clients

Backup status

4.1

2000

Completed successfully

4.0.3

2

Failed with redo log overwrite issue

Here,
we kept the redo log files size larger to widen the gap between the server
writing the redo log and MEB reading it from the redo log files.  We increased the number of clients this time
and noticed a higher redo write rate but backup completed without any problems.
In this test, there was approximately 850GB of data in the ibbackup_logfile.

Summary

We
could replicate the redo log overwrite issues easily with the older version of
MEB, but we are yet to see the problem in the 4.1 release.  Although it is still possible that the redo
log overwrite error may happen, the probability of that happening has been
reduced drastically.  This probability could
be further reduced if you perform an Optimistic Full Backups or an
Optimistic Incremental Backups.  Try out the latest version and provide us
feedback on how it works for your environments.

Join the discussion

Comments ( 1 )
  • rahul Tuesday, March 7, 2017

    This improvement will be available in next release of 3.x series.


Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.Captcha
 

Visit the Oracle Blog

 

Contact Us

Oracle

Integrated Cloud Applications & Platform Services