MySQL and MySQL Community information

  • February 8, 2013

Truly Parallel backup (MySQL Enterprise Backup 3.8 and later)

Sanjay Manwani
MySQL India Director

How do you implement a parallel algorithm for software whose output
needs to be streamed to tape?

How do you make the level of parallelism tunable for varying input and
output devices and varying levels of load?

These were some of the questions that we needed to answer when we were
trying to implement multi-threading capability for MySQL Enterprise
Backup (MEB).

The trivial way of achieving parallelism is to have multiple threads
each pick up a different file (in a file-per-table scenario). But this
did not seem adequate because:

a) The sizes of these files (corresponding to the tables) could be
different and then one large file would limit the level of parallelism
since it would be processed by a single thread.

b) If you have to stream the backup, how do you reconcile these multiple
files being streamed by separate threads? Large backups are streamed
directly to tape, so it is better to have a single output file and not
multiple files.

c) If you buffer each file, wait for it to be completely processed and
only then push it to tape, it is not true streaming because you are
using intermediate disk space to save the incomplete portions of all the
files.

The answer that we found was to implement the parallel algorithm using a
horizontal strategy instead of a vertical strategy.



In the vertical strategy, each thread acts on a separate file. This
limits streaming since the file sizes can vary.

In the horizontal strategy, each file is broken into sections (denoted
by multiple colors). A separate thread is assigned to operate on a
single section.

Parallel operations are then possible for reading, processing and
writing of these file subsections because no two threads will be
operating on the same section of the file.

This setup is especially useful when using compression since there can
be multiple threads performing compression while the read and write
continues in parallel.

There may be additional overhead in ensuring that the buffers are
written out in the correct order, but since most of the buffers are the
same size and undergo similar operations, the overhead is minimal.
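
To make the horizontal strategy concrete, here is a minimal Python
sketch. It is not MEB's implementation: the function names, the tiny
16-byte section size, the use of zlib and the length-prefixed output
format are all assumptions chosen for illustration. It shows sections
being compressed by several threads while a single writer emits them in
their original order into one output stream.

```python
import io
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Dict, List

SECTION_SIZE = 16  # bytes; tiny on purpose (the post's buffers are 16MB)


def split_into_sections(data: bytes, size: int = SECTION_SIZE) -> List[bytes]:
    """Break one file's contents into fixed-size sections."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup_stream(files: Dict[str, bytes], out: BinaryIO, workers: int = 4) -> int:
    """Compress sections in parallel and write them to `out` in order.

    Returns the number of sections written. The length-prefix framing is
    a hypothetical format, not the MEB backup image format.
    """
    sections: List[bytes] = []
    for name in sorted(files):
        sections.extend(split_into_sections(files[name]))

    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so the single
        # writer sees sections in sequence even if workers finish out of order.
        for block in pool.map(zlib.compress, sections):
            out.write(len(block).to_bytes(4, "big"))
            out.write(block)
            written += 1
    return written


def restore_stream(raw: bytes) -> bytes:
    """Read the length-prefixed blocks back and decompress them in order."""
    pos, parts = 0, []
    while pos < len(raw):
        n = int.from_bytes(raw[pos:pos + 4], "big")
        pos += 4
        parts.append(zlib.decompress(raw[pos:pos + n]))
        pos += n
    return b"".join(parts)
```

Because the output is a single ordered stream, one large file is spread
across all the worker threads and nothing needs to be staged on disk
before it is written out.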

You get truly serialized output that is streamed to tape as it gets
processed. If you are streaming to a remote host or to tape, there is
almost no additional space required on your main server. We call this
new mechanism parallel backup because parallelism is what makes the
backup faster. Indeed, a parallel backup may run up to 10 times faster
than a normal backup in certain scenarios.

The graph below shows the time taken for backup with MEB 3.7.1 vs. MEB
3.8 using varying numbers of threads.



Note: This is a 16 GB, 2 x 2000 MHz machine with 2 RAID disks (1027 GB,
733.9 GB) running Oracle Linux.

As you can see above, MEB 3.8 provides options to configure the number
of threads used for reading, processing and writing. Let's denote by RT,
PT and WT the number of read, process and write threads respectively.
The default values for MEB 3.8 are RT=3, PT=3, WT=3, which change in MEB
3.8.1 to RT=1, PT=6, WT=1.

The 3.8.1 default (RT=1, PT=6, WT=1) is close to the fastest backup we
get in the graph above. The reason for not choosing RT=1, PT=12, WT=1
(which is the fastest) is that CPU utilization gets very high in the
1,12,1 configuration.

Remember, the read and write throughput depends on your input and output
devices. It is possible that multiple threads do not give you better
read or write performance than a single thread.

There are also options available to configure the number of buffers used
by these threads.

Each buffer is 16MB in size. You should have at least
RT + PT + WT + MAX(RT, PT, WT) buffers so that you get optimal
parallelism.

For example, if RT=1, PT=6, WT=1 then you should configure
1+6+1+6 = 14 buffers (the default in MEB 3.8.1).
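
The rule of thumb above is simple arithmetic, and a small helper makes
it easy to check a configuration before running a backup. The function
name below is hypothetical and not part of MEB; it only encodes the
formula given in this post.

```python
def recommended_buffers(rt: int, pt: int, wt: int) -> int:
    """Minimum buffer count from the post: RT + PT + WT + MAX(RT, PT, WT)."""
    return rt + pt + wt + max(rt, pt, wt)


# MEB 3.8.1 defaults quoted in this post: RT=1, PT=6, WT=1
print(recommended_buffers(1, 6, 1))  # 1 + 6 + 1 + 6
# MEB 3.8 defaults quoted in this post: RT=3, PT=3, WT=3
print(recommended_buffers(3, 3, 3))  # 3 + 3 + 3 + 3
```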

If, for example, you configure multiple threads but only 1 buffer, then
your backup is not taking advantage of parallelism at all. The read
thread reads into the single buffer, and the buffer is then processed,
written and freed. Meanwhile the read thread waits for the buffer to be
free before it can read again, so it behaves like a serial process.
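
The effect of the buffer count can be seen in a toy pipeline. The sketch
below is an illustration only, not MEB code: the function name, the
8-byte buffers and the stand-in "processing" step are assumptions. It
runs one read thread, several process threads and one write thread over
a shared pool of buffers, and reports the largest number of buffers that
were ever in use at the same time. For simplicity it does not preserve
write order.

```python
import queue
import threading


def peak_buffers_in_use(num_buffers: int, num_blocks: int = 20,
                        process_threads: int = 4) -> int:
    """Return the peak number of buffers checked out of the pool at once."""
    free: queue.Queue = queue.Queue()
    for _ in range(num_buffers):
        free.put(bytearray(8))
    to_process: queue.Queue = queue.Queue()
    to_write: queue.Queue = queue.Queue()
    lock = threading.Lock()
    in_use = 0
    peak = 0

    def reader() -> None:
        nonlocal in_use, peak
        for i in range(num_blocks):
            buf = free.get()  # blocks until some buffer has been freed
            with lock:
                in_use += 1
                peak = max(peak, in_use)
            buf[0] = i % 256  # stand-in for reading a block from disk
            to_process.put(buf)
        for _ in range(process_threads):
            to_process.put(None)  # one stop marker per process thread

    def processor() -> None:
        while True:
            buf = to_process.get()
            if buf is None:
                to_write.put(None)  # tell the writer this thread is done
                return
            buf[1] = buf[0] ^ 0xFF  # stand-in for compression
            to_write.put(buf)

    def writer() -> None:
        nonlocal in_use
        finished = 0
        while finished < process_threads:
            buf = to_write.get()
            if buf is None:
                finished += 1
                continue
            with lock:
                in_use -= 1
            free.put(buf)  # the buffer is written; hand it back to the pool

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    threads += [threading.Thread(target=processor) for _ in range(process_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With `num_buffers=1` the peak is always 1: however many threads are
configured, only one block is ever in flight, which is the serial
behaviour described above. With more buffers the peak can rise, but it
never exceeds the pool size.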

One more thing to note is that the number of buffers is limited by the
memory limit configured for backup (default 300MB). Please ensure that
you configure enough memory to cover the buffers you have configured. If
the configured memory limit is less than what the configured number of
buffers requires, MEB will automatically decrease the number of buffers
to fit into the memory limit. Based on the default values, if you are
configuring more than 18 buffers you will need to increase the memory
limit.
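
The figure of 18 follows from dividing the default limit by the buffer
size. The sketch below works through that arithmetic. It assumes the
whole memory limit is available for buffers, which is what the numbers
in this post imply; the function names are hypothetical and the actual
accounting inside MEB may differ.

```python
BUFFER_MB = 16          # buffer size stated in this post
DEFAULT_LIMIT_MB = 300  # default memory limit stated in this post


def buffers_that_fit(limit_mb: int = DEFAULT_LIMIT_MB) -> int:
    """How many whole 16MB buffers fit in the memory limit."""
    return limit_mb // BUFFER_MB


def effective_buffers(requested: int, limit_mb: int = DEFAULT_LIMIT_MB) -> int:
    """Buffer count after reducing the request to fit the memory limit."""
    return min(requested, buffers_that_fit(limit_mb))


def memory_needed_mb(buffers: int) -> int:
    """Memory limit needed to keep the requested number of buffers."""
    return buffers * BUFFER_MB


print(buffers_that_fit())     # 300 // 16
print(effective_buffers(24))  # capped by the default limit
print(memory_needed_mb(24))   # limit needed to keep all 24 buffers
```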

Please look at the previous 3.8 blog for detailed configuration examples:

https://blogs.oracle.com/mysqlenterprisebackup/entry/parallel_backup_in_mysql_enterprise

or at our documentation of this feature at

http://dev.mysql.com/doc/mysql-enterprise-backup/3.8/en/backup-capacity-options.html

Cheers,

and remember the wise DBA advice:

If you don't verify your backups periodically, it is like not having backups at all.


Comments ( 5 )
  • guest Thursday, February 14, 2013

    Very informative and to the point, useful! Thanks Sanjay.


  • Sreekar Tuesday, April 2, 2013

    Are multiple write threads possible with the SBT interface?


  • Sanjay Tuesday, September 17, 2013

    Sreekar,

    Since the file is being written as a single stream, even if you designate multiple threads, in effect it will take the same amount of time, since the threads have to wait to synchronize the output in the order it needs to be written.

    Most SBT-enabled backup frameworks like OSB, Symantec and TSM expect their data to be written from a single thread.

    -Sanjay


  • Sreekar Monday, September 23, 2013

    Hi Sanjay

    Thanks for the response. (It's a bit late, but I know you would be busy.) I understood the problem, but how are multiple write threads possible when we back up without SBT (I mean, to disk)? If MEB can write with multiple threads to a folder on disk, I want to explore a possibility/hack to make it work with tape as well, at least with the help of some buffering on the machine which has the tape library installed.


  • Sanjay Thursday, October 3, 2013

    Sreekar,

    Yes, in general you can specify multiple output threads.

    And you can definitely play with that.

    See what works and what does not.

    Performance numbers are sometimes worse when you have multiple write threads (even on non-RAID disks).

    Regards,

    Sanjay

