Friday Feb 08, 2013

Truly Parallel backup (MySQL Enterprise Backup 3.8 and later)

How do you implement a parallel algorithm for software whose output must be streamed to tape?
How do you make the level of parallelism tunable for varying input and output devices and varying levels of load?
These were some of the questions we needed to answer when implementing multi-threading for MySQL Enterprise Backup (MEB).
The trivial way of achieving parallelism is to have multiple threads pick up different files (in a file-per-table scenario). But this did not seem adequate because:
a) The files (one per table) can differ in size, so one large file limits the level of parallelism because it is processed by a single thread.
b) If you have to stream the backup, how do you reconcile multiple files being streamed by separate threads? Large backups are streamed directly to tape, so it is better to output a single file rather than multiple files.
c) If you buffer each file, wait for it to be completely processed, and only then push it to tape, it is not true streaming because you are using intermediate disk space to hold the incomplete portions of all the files.
The answer that we found was to implement the parallel algorithm using a horizontal strategy instead of a vertical strategy.


In the vertical strategy, each thread acts on a separate file. This limits streaming since the file sizes can vary.
In the horizontal strategy, each file is broken into sections (denoted by multiple colors). A separate thread is assigned to operate on a single section.
Parallel operations are then possible for reading, processing, and writing these file subsections because no two threads operate on the same section of the file.
This setup is especially useful when using compression since there can be multiple threads performing compression while the read and write continues in parallel.
There may be some additional overhead in ensuring that the buffers are in the correct order when they are written out, but since most of the buffers are the same size and undergo similar operations, the overhead is minimal.
The output is truly serialized and is streamed to tape as it gets processed. If you are streaming to a remote host or to tape, almost no additional space is required on your main server. We call this new mechanism parallel backup because the parallelism makes the backup faster. Indeed, a parallel backup may run up to 10 times faster than a normal backup in certain scenarios.
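The horizontal strategy can be sketched in a few lines of Python. This is only an illustration of the idea and not MEB's implementation: the function names, the 8-byte length prefix used to frame each section, and the use of zlib for the processing step are all assumptions made for the example. It uses one reader and one writer (the calling thread) and a pool of processing threads, and it never keeps more than max_buffers sections in flight.

```python
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def read_sections(src, section_size):
    """Yield consecutive fixed-size sections from a binary stream."""
    while True:
        section = src.read(section_size)
        if not section:
            return
        yield section


def write_section(dst, payload):
    """Write one processed section with a length prefix (hypothetical framing)."""
    dst.write(len(payload).to_bytes(8, "big"))
    dst.write(payload)


def parallel_backup(src, dst, process_threads=6, max_buffers=14,
                    section_size=16 * 1024 * 1024):
    """Compress sections in parallel but write them out in their original order."""
    pending = deque()  # futures, kept in the order the sections were read
    with ThreadPoolExecutor(max_workers=process_threads) as pool:
        for section in read_sections(src, section_size):
            pending.append(pool.submit(zlib.compress, section))
            # Once all buffers are in use, wait for the oldest section and
            # write it, which frees a buffer before reading any further.
            if len(pending) >= max_buffers:
                write_section(dst, pending.popleft().result())
        # Drain the remaining sections, still in order.
        while pending:
            write_section(dst, pending.popleft().result())


def restore(src):
    """Read the framed stream back and return the original bytes."""
    out = bytearray()
    while True:
        header = src.read(8)
        if not header:
            break
        size = int.from_bytes(header, "big")
        out += zlib.decompress(src.read(size))
    return bytes(out)
```

Because the writer always waits for the oldest outstanding section, the output is a single serialized stream even though later sections may finish compressing first.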
The graph below shows the backup time for MEB 3.7.1 versus MEB 3.8 with varying numbers of threads.



Note: This is a 16 GB, 2 x 2000 MHz machine with 2 RAID disks (1027 GB, 733.9 GB) running Oracle Linux.

As you can see above, MEB 3.8 provides options to configure the number of threads used for reading, processing, and writing. Let's denote by RT, PT, and WT the number of read, process, and write threads respectively. The default values for MEB 3.8 are RT=3, PT=3, WT=3; these change in MEB 3.8.1 to RT=1, PT=6, WT=1.

The 3.8.1 default is close to the fastest backup in the graph above. We did not choose RT=1, PT=12, WT=1 (which is the fastest) because the CPU is very highly utilized in the 1,12,1 configuration.

Remember, read and write throughput depends on your input and output devices. It is possible that multiple threads give you no better read or write performance than a single thread.

There are also options available to have a configurable number of buffers used by these threads.

Each buffer is 16 MB in size. You should have at least RT + PT + WT + MAX(RT, PT, WT) buffers to get optimal parallelism.

For example, if RT=1, PT=6, WT=1, then you should configure 1+6+1+6 = 14 buffers (the default in MEB 3.8.1).
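
The buffer rule above is easy to turn into a small helper. The function name is hypothetical and exists only for this illustration; the formula is the one given in this post.

```python
def recommended_buffers(rt, pt, wt):
    """Minimum buffer count for full parallelism: RT + PT + WT + MAX(RT, PT, WT)."""
    return rt + pt + wt + max(rt, pt, wt)


# MEB 3.8.1 defaults from this post: RT=1, PT=6, WT=1
print(recommended_buffers(1, 6, 1))  # 14
```

Applying the same formula to the MEB 3.8 thread defaults (RT=3, PT=3, WT=3) gives 12 buffers.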

If, for example, you configure multiple threads but only 1 buffer, then your backup takes no advantage of parallelism at all. The read thread reads into the single buffer, which is then processed, written, and freed. Meanwhile the read thread waits for the buffer to become free before it can read into it again, so the whole thing behaves like a serial process.

One more thing to note is that the number of buffers is limited by the memory limit configured for backup (default 300 MB). Please ensure that you configure enough memory to distribute among the buffers you have configured. If the configured memory limit is less than what is required for the configured number of buffers, MEB automatically decreases the number of buffers to fit within the memory limit. Based on the default values, if you configure more than 18 buffers you will need to increase the memory limit.
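
The arithmetic behind the 18-buffer figure can be checked with a short sketch. It assumes the whole memory limit is available to the 16 MB buffers, which matches the numbers in this post (18 x 16 MB = 288 MB fits in 300 MB, 19 x 16 MB = 304 MB does not). The function names are hypothetical and are not MEB options.

```python
BUFFER_SIZE_MB = 16  # buffer size stated in this post


def effective_buffers(configured, limit_mb=300):
    """Buffers actually usable once the count is reduced to fit the memory limit."""
    return min(configured, limit_mb // BUFFER_SIZE_MB)


def required_limit_mb(buffers):
    """Smallest memory limit (in MB) that fits the requested number of buffers."""
    return buffers * BUFFER_SIZE_MB


print(effective_buffers(14))  # 14, the 3.8.1 default fits in 300 MB
print(effective_buffers(24))  # 18, reduced to fit the default limit
print(required_limit_mb(24))  # 384
```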

Please look at the previous 3.8 blog for detailed configuration examples:

https://blogs.oracle.com/mysqlenterprisebackup/entry/parallel_backup_in_mysql_enterprise

or at our documentation of this feature at

http://dev.mysql.com/doc/mysql-enterprise-backup/3.8/en/backup-capacity-options.html

Cheers 

and remember the wise DBA advice:

If you don't verify your backups periodically, it is like not having backups at all


Thursday Feb 07, 2013

MySQL Enterprise Backup 3.8.1 release for 5.6 Server

The main goal of the MySQL Enterprise Backup 3.8.1 release was to support MySQL 5.6 server. Beyond that primary goal, the MEB team also added some valuable new options and features to ensure you get the most from the new features in 5.6. At a glance, here are some of the highlights:

MEB copy of InnoDB undo log tablespaces

MySQL 5.6 introduces a new feature that stores undo logs in separate files, called undo tablespaces, for improved performance. These undo tablespaces are logically part of the system tablespace. All the MEB commands ("backup", "apply-log", and "copy-back") now handle the undo tablespaces in the same way as they process the system tablespace. MEB now supports the innodb_undo_directory, innodb_undo_logs, and innodb_undo_tablespaces option variables. When a backup is executed, the undo datafiles (up to the number specified by innodb_undo_tablespaces) are stored in the same directory as the datafiles of the system tablespace. During copy-back, the files can be stored in a location specified by the user with the --innodb-undo-directory option.

MEB support for Global Transaction IDs

The GTID feature is newly introduced in MySQL 5.6 server. GTIDs help to track the data being replicated, particularly with automatic slave promotion when a master fails.
When the server is started with GTIDs enabled and a backup is performed on the master server, mysqlbackup produces a new file called gtid_executed.sql in the meta backup directory. This file is used after restoring the backup data on a slave server and contains the GTID_PURGED option. It provides information from the server at the end of the backup, thereby ensuring that replication starts from the point in time when the backup was taken.

UNC Path name support

MEB now supports UNC path names, which specify the location of a network resource such as a shared file, directory, or printer. This feature helps when starting backups from the Windows Task Scheduler, where shared drives cannot be mapped to a drive letter. Support for UNC path names also allows MEB to take backups when the user is not logged in.
For example: ./mysqlbackup --defaults-file=/home/my/my.cnf  --backup-dir="\\mysql\\testmeb\" backup

Here testmeb is a shared network directory on Windows.

When the share name is corrupt or invalid, MEB detects this when it tries to access the files pointed to by the path and prints an error message.

MEB support for different page size settings for InnoDB

InnoDB page size is a server parameter that applies to all the InnoDB tablespaces in a MySQL instance. By default, this size was 16K in versions before MySQL 5.6. From MySQL 5.6, it is user configurable to values such as 4K, 8K, or 16K. Starting with MEB 3.8.1, backup works successfully when the server is started with a different innodb_page_size value. The innodb-page-size option can also be specified on the mysqlbackup command line, but MEB ignores it as long as a connection to the server is available. If the innodb_page_size option is not specified on the command line, or if a connection to the server is not available, the value of innodb-page-size is read from the header of the InnoDB data files.

InnoDB Checksum Algorithm Support

MEB 3.8.1 adds support for the new --innodb-checksum-algorithm option in MySQL 5.6. This option can also be specified on the command line. A default value is used if it is not specified on the command line and a connection to the server is not available. Without support for this new option, the server could not be started after a sequence of backup, apply-log, and restore operations. Note the following:
a. A server backed up with the strict_crc32, strict_innodb, or strict_none checksum algorithm should be restored with the same algorithm.
b. A server backed up with mixed algorithms should not be restored to a server with strict_* algorithms.
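
The two rules above can be written down as a small check. This sketch is not part of MEB. It assumes that "mixed algorithms" refers to the non-strict settings (crc32, innodb, none), and it treats any non-strict to non-strict combination as acceptable, which this post does not state explicitly. The function name is hypothetical.

```python
STRICT = {"strict_crc32", "strict_innodb", "strict_none"}
NON_STRICT = {"crc32", "innodb", "none"}  # assumed meaning of "mixed algorithms"


def restore_is_safe(backup_algorithm, target_algorithm):
    """Apply rules (a) and (b) to a backup/target checksum algorithm pair."""
    for name in (backup_algorithm, target_algorithm):
        if name not in STRICT | NON_STRICT:
            raise ValueError(f"unknown checksum algorithm: {name}")
    if backup_algorithm in STRICT:
        # Rule (a): a strict backup must be restored with the same algorithm.
        return target_algorithm == backup_algorithm
    # Rule (b): a non-strict backup must not go to a strict_* server.
    return target_algorithm not in STRICT


print(restore_is_safe("strict_crc32", "strict_crc32"))  # True
print(restore_is_safe("innodb", "strict_innodb"))       # False
```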

Backup of system tablespace with fractional megabyte

It sometimes happens that the InnoDB engine extends the datafiles of the system tablespace by a few megabytes. But if the disk is full, the system tablespace may actually be extended by a fractional number of megabytes. In such cases, MEB performs a consistency check on the sizes of the InnoDB datafiles, and if the size does not match the size of the file on disk, a warning is reported. That is, MEB does not back up the fractional datafile in the system tablespace.

Backup and restore of file-per-table tablespaces at different locations

In MySQL 5.6, it is possible to create a new InnoDB table with a per-table tablespace outside the data directory, so that the .ibd file is created there instead of in the default location in the database subdirectory. For each such .ibd file, a .isl file containing the absolute path name is created in the database subdirectory, acting like a symbolic link to the actual tablespace file. All MEB operations can now read the .isl files to locate the .ibd files during backup. During backup, both the .isl and .ibd files are copied to the backup directory, but the .isl file is renamed to a .bl file. During copy-back, the .ibd files are copied to the location specified in the .bl file. If the backup is to be restored to a different target location, you need to manually edit the .bl file before the restore and specify the absolute path name where the .ibd files should go.

The features above are new additions to the backup code, but this release also includes various bug fixes. Please take a look at the MEB 3.8.1 reference manuals for more details.

The MEB team has put a great deal of effort into ensuring that the latest release, MEB 3.8.1, is compliant with MySQL 5.6 server. Please try this new MEB 3.8.1 version with MySQL 5.6 server and, as always, send us your feedback and comments here. MEB 3.8.1 is now available on the My Oracle Support site and will very soon be available on Oracle's Cloud delivery site.

Once again, I would like to thank the entire MEB team for delivering this release on time and with many valuable new additions.

