Thursday Apr 03, 2014

Data Encryption with MySQL Enterprise Backup 3.10

Introduction

MySQL Enterprise Backup (MEB) 3.10 introduces support for encrypted backups by allowing backup images, or single-file backups, to be encrypted. However, backups stored as multiple files in a backup directory cannot be encrypted.

Any MEB command that produces a backup image can optionally be asked to encrypt it. The encrypted backup image can be stored in a file or on tape in the same way as an unencrypted backup image. Similarly, any MEB command that reads data from a backup image also accepts an encrypted backup image. This means that encrypted backup images can be used in all the same situations as unencrypted ones.

MEB encrypts data with the Advanced Encryption Standard (AES) algorithm in CBC mode with 256-bit keys. AES is a symmetric block cipher, which means that the same key is used for both encryption and decryption. The AES cipher has been adopted by the U.S. government and it is now used worldwide.

A new format for the encrypted backup image is introduced. This is a proprietary format developed by Oracle and it allows efficient encryption and decryption in parallel.

Encryption keys

Encryption keys are strings of 256 bits (or 32 bytes) that are represented by strings of 64 hexadecimal digits. The simplest way to create an encryption key for MEB is to type 64 randomly chosen hexadecimal digits and save them in a file. Another method is to use some shell tool to generate a string of random bytes and encode it as hexadecimal digits. For example, one could use the OpenSSL shell command to generate a key as follows:

$ openssl rand -hex 32
8f3ca9b850ec6366f4a54feba99f2dc42fa79577158911fe8cd641ffff1e63d6

This command uses random data generated on the host for creating the key. Whichever method is used for the creation of the key, the essential point is that the resulting key consists of random bits.
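Where OpenSSL is not at hand, the same kind of key can be produced with a few lines of Python. The following is a minimal sketch, not part of MEB: the helper name write_key_file is made up for illustration, and the post does not say whether MEB tolerates a trailing newline in the key file, so this sketch writes none.

```python
import os
import secrets
import tempfile


def write_key_file(path):
    """Create a file holding 64 random hex digits (a 256-bit key).

    Hypothetical helper for illustration; not an MEB tool.
    """
    # 32 random bytes from the operating system's CSPRNG, hex-encoded.
    key = secrets.token_hex(32)
    # O_EXCL refuses to overwrite an existing key file; mode 0o600
    # restricts access to the owner (Rule 2: keep the key secret).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key)  # no trailing newline (an assumption, see above)
    return key


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "meb.key")
    write_key_file(demo_path)
    print("key written to", demo_path)
```

The secrets module is used rather than random because only the former is intended for generating cryptographic material.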

The security of MEB encryption is based on two rules that apply not only to MEB but to all encryption schemes using symmetric block ciphers:

Rule 1: The encryption keys must be random.

Rule 2: The encryption keys must remain secret at all times.

When these rules are followed, it is very difficult for unauthorized persons to get access to the secure data.

Encryption keys can be specified either on the command line with the

--key=KEY

option, where KEY is a string of 64 hexadecimal digits, or in a file with the

--key-file=FILENAME

option, where FILENAME is the name of the file that contains a string of 64 hexadecimal digits.

It is important to note that specifying the key on the command line with the --key option is generally not secure: the command line is usually visible to other users on the system, and it may even be saved in system log files that unauthorized persons can read. Therefore, the --key-file option should be preferred over the --key option in all production environments, and use of the --key option should be limited to testing and software development environments.
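Since a key file is only as safe as its permissions, it can be worth checking the file before handing it to --key-file. Here is a small sketch of such a check on a Unix-like system. The function check_key_file is hypothetical, and accepting both upper- and lower-case hexadecimal digits and ignoring surrounding whitespace are assumptions of this sketch, not documented MEB behavior.

```python
import os
import re
import stat

# Exactly 64 hexadecimal digits, i.e. 256 bits.
HEX_KEY = re.compile(r"[0-9a-fA-F]{64}")


def check_key_file(path):
    """Return a list of problems with a key file (empty list: looks fine).

    Hypothetical pre-flight check; not part of MEB.
    """
    problems = []
    mode = os.stat(path).st_mode
    # Any permission bit for group or others means the key is not private.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("key file is accessible to group or other users")
    with open(path) as f:
        text = f.read().strip()
    if not HEX_KEY.fullmatch(text):
        problems.append("key is not exactly 64 hexadecimal digits")
    return problems
```

An empty result does not say anything about how random the key is; it only catches the mechanical mistakes.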

Using encryption

Encryption is simple to use. Any MEB command that produces a backup image can be asked to encrypt it by specifying the --encrypt option together with either the --key or the --key-file option. The following example shows how to make a compressed backup and store it as an encrypted backup image.


$ mysqlbackup --encrypt --key-file=/backups/key --compress --backup-dir=/full-backup  --backup-image=/backups/image.enc  backup-to-image

MySQL Enterprise Backup version 3.10.0 Linux-3.2.0-58-generic-i686 [2014/03/04]

Copyright (c) 2003, 2014, Oracle and/or its affiliates. All Rights Reserved.

 mysqlbackup: INFO: Starting with following command line ...

 /home/pekka/bzr/meb-3.10/src/build/mysqlbackup --encrypt

        --key-file=/backups/key --compress --backup-dir=/full-backup

        --backup-image=/backups/image.enc backup-to-image

 mysqlbackup: INFO:

IMPORTANT: Please check that mysqlbackup run completes successfully.

           At the end of a successful 'backup-to-image' run mysqlbackup

           prints "mysqlbackup completed OK!".

140306 21:40:33 mysqlbackup: INFO: MEB logfile created at /full-backup/meta/MEB_2014-03-06.21-40-33_compress_img_backup.log

 mysqlbackup: WARNING: innodb_checksum_algorithm could not be obtained from config or server variable and so mysqlbackup uses the default checksum algorithm 'innodb'.

--------------------------------------------------------------------

                       Server Repository Options:

--------------------------------------------------------------------

...

...

...

Backup Image Path = /backups/image.enc

 mysqlbackup: INFO: Unique generated backup id for this is 13941348344547471

 mysqlbackup: INFO: Uses LZ4 r109 for data compression.

 mysqlbackup: INFO: Creating 18 buffers each of size 16794070.

140306 21:40:36 mysqlbackup: INFO: Compress Image Backup operation starts with following threads

        1 read-threads    6 process-threads    1 write-threads

140306 21:40:36 mysqlbackup: INFO: System tablespace file format is Barracuda.

140306 21:40:36 mysqlbackup: INFO: Starting to copy all innodb files...

 mysqlbackup: INFO: Copying meta file /full-backup/backup-my.cnf.

 mysqlbackup: INFO: Copying meta file /full-backup/meta/backup_create.xml.

140306 21:40:36 mysqlbackup: INFO: Copying /sqldata/simple-5.6/ibdata1 (Barracuda file format).

140306 21:40:36 mysqlbackup: INFO: Found checkpoint at lsn 188642964.

...

...

...

140306 21:40:51 mysqlbackup: INFO: Compress Image Backup operation completed successfully.

 mysqlbackup: INFO: Image Path = /backups/image.enc

-------------------------------------------------------------

   Parameters Summary         

-------------------------------------------------------------

   Start LSN                  : 188642816

   End LSN                    : 188642964

-------------------------------------------------------------

mysqlbackup completed OK! with 2 warnings



The resulting encrypted backup image (file "image.enc") can be used with all commands that accept a backup image in the same way as an unencrypted backup image. For example, one could restore the server from the encrypted backup as follows:


$ mysqlbackup --decrypt --key-file=/backups/key --uncompress --backup-image=/backups/image.enc --backup-dir=/full-backup copy-back-and-apply-log

MySQL Enterprise Backup version 3.10.0 Linux-3.2.0-58-generic-i686 [2014/03/04]

Copyright (c) 2003, 2014, Oracle and/or its affiliates. All Rights Reserved.

 mysqlbackup: INFO: Starting with following command line ...

 /home/pekka/bzr/meb-3.10/src/build/mysqlbackup --decrypt

        --key-file=/backups/key --uncompress --backup-image=/backups/image.enc

        --backup-dir=/full-backup copy-back-and-apply-log

 mysqlbackup: INFO:

IMPORTANT: Please check that mysqlbackup run completes successfully.

           At the end of a successful 'copy-back-and-apply-log' run mysqlbackup

           prints "mysqlbackup completed OK!".

 mysqlbackup: INFO: Backup Image MEB version string: 3.10.0 [2014/03/04]

 mysqlbackup: INFO: The input backup image contains compressed backup.

140310 12:51:54 mysqlbackup: INFO: MEB logfile created at /full-backup/meta/MEB_2014-03-10.12-51-54_copy_back_cmprs_img_to_datadir.log

...

...

140310 12:52:14 mysqlbackup: INFO: We were able to parse ibbackup_logfile up to

          lsn 188642964.

140310 12:52:14 mysqlbackup: INFO: The first data file is '/home/pekka/sqldata/copyback-simple-5.6/ibdata1'

          and the new created log files are at '/home/pekka/sqldata/copyback-simple-5.6'

140310 12:52:14 mysqlbackup: INFO: Apply-log operation completed successfully.

140310 12:52:14 mysqlbackup: INFO: Full Backup has been restored successfully.

mysqlbackup completed OK!



In these examples we have used the --key-file option for specifying the encryption key because it is more secure than giving the key on the command-line with the --key option.

Tips

This section describes two tips that may be useful when working with encrypted backups.

The "Wrong key" error

Encryption and decryption use the same key. If decryption is attempted with a key different from the encryption key, a wrong key error occurs. When this happens, MEB prints an error message like the one shown below.


MySQL Enterprise Backup version 3.10.0 Linux-3.2.0-58-generic-i686 [2014/03/04]

Copyright (c) 2003, 2014, Oracle and/or its affiliates. All Rights Reserved.

 mysqlbackup: INFO: Starting with following command line ...

        mysqlbackup --backup-image=/backups/image.enc --decrypt

        --key-file=/key-file2 list-image

 mysqlbackup: INFO:

IMPORTANT: Please check that mysqlbackup run completes successfully.

           At the end of a successful 'list-image' run mysqlbackup

           prints "mysqlbackup completed OK!".

 mysqlbackup: INFO: Creating 14 buffers each of size 16777216.

 mysqlbackup: ERROR: Failed to decrypt encrypted data in file /backups/image.enc : the file may be corrupted or a wrong encryption key was specified.



For the user, this can be problematic because two possible reasons for the failure are offered in the error message: either the backup is corrupted or a wrong key was supplied. This is not a bug or feature of MySQL Enterprise Backup but, instead, it is a theoretical limitation imposed by the encryption scheme. It is not possible even in theory to distinguish with absolute certainty between these two explanations when decryption fails.

However, these two explanations are not always equally likely. If decryption fails at the very start without decrypting any data, then it is more likely that a wrong key was supplied. On the other hand, if the decryption fails later after some data was successfully decrypted, then it is very likely that the correct key was given but the encrypted backup is broken. Using these two rules it is possible to determine with high probability the cases where decryption fails because of a wrong key.

Recognizing encrypted backups

On Unix-like operating systems, "magic numbers" may be used for identifying the type of a file. Magic numbers are patterns that allow the type of a file to be recognized by examining its first bytes. Both unencrypted and encrypted backup images have magic numbers that can be used by shell tools to detect the file type. For example, after adding these lines to the /etc/magic file

0   string  MBackuP\n   MySQL Enterprise Backup backup image
0   string  MebEncR\n   MySQL Enterprise Backup encrypted backup


the file command detects the backup images as follows:

$ file /backups/image1 /backups/image2
/backups/image1: MySQL Enterprise Backup backup image
/backups/image2: MySQL Enterprise Backup encrypted backup
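
The same check is easy to script. The sketch below reads the first eight bytes of a file and compares them with the two magic strings from the /etc/magic entries above; the function name identify_backup is made up for illustration.

```python
# Magic strings taken from the /etc/magic lines above; each is 8 bytes long.
MAGIC = {
    b"MBackuP\n": "MySQL Enterprise Backup backup image",
    b"MebEncR\n": "MySQL Enterprise Backup encrypted backup",
}


def identify_backup(path):
    """Return a description based on the first 8 bytes of the file, or None."""
    with open(path, "rb") as f:
        head = f.read(8)
    return MAGIC.get(head)
```

A script that restores backups could use this to decide whether --decrypt and a key file are needed before calling mysqlbackup.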


Wednesday Apr 02, 2014

Offline checksum validation for directory and Image backup using MySQL Enterprise Backup

Data integrity:
-------------------
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life cycle. Every organization, small or large, wants to make sure its data is consistent and error free. Data may be moved to other media or to a different storage system for performance, speed, scalability or other business reasons, so we want to make sure that it is not corrupted during migration. Data integrity is a policy that an enterprise can enforce to be confident about its own data.

The overall intent of any data integrity technique is to ensure that data is recorded exactly as intended and that, upon later retrieval, it is the same as when it was originally recorded.


Objective:
---------------
Users should be able to verify the integrity of the InnoDB data files in a backup they have already taken. During backup, MEB performs an integrity check to ensure the consistency of the data it copies from the server data_dir. This feature gives users the flexibility to run the same integrity check at any time after the backup, that is, to check the data integrity of a backup directory or image offline.


Advantage:
----------------
Checksum mismatches cause InnoDB to deliberately shut down a running server. It is preferable to use this operation rather than wait for a production server to encounter the damaged data pages.

This feature is useful when a user has taken a backup and suspects that the data might be corrupt. It allows users to verify the correctness of their backed-up data before restoring it.

MEB's parallel architecture supports running the integrity check in parallel: multiple threads operate on different chunks of the .ibd data at the same time. Validation can therefore be much faster than with the offline innochecksum utility, which is single-threaded.




Command-line options:

The existing "validate" command is used to validate the contents of a backup directory. The "--backup-dir=back_dir" option must be specified together with validate.

e.g. ./mysqlbackup --backup-dir=back_dir validate


To validate a compressed backup directory, the following command line should be used:

e.g. ./mysqlbackup --backup-dir=comp_back_dir validate

To validate a backup image:

e.g. ./mysqlbackup --backup-image=back_image validate

The error message expected from the validate operation on a corrupt data file is:

 "mysqlbackup: ERROR: <filename> is corrupt and has : N corrupt pages"

To validate each page of an InnoDB data file, we need the name of the checksum algorithm that the server was using when the backup was taken.

The file backup_dir/backup-my.cnf contains a parameter named "innodb_checksum_algorithm" along with other parameters. We read this parameter from the "backup-my.cnf" file and use it to initialize the server checksum algorithm when validating a backup directory.

There are several algorithms: none (which stores a magic value on each page), crc32 and innodb, each of which is also available in a strict mode. A strict algorithm validates the checksum with the given algorithm only; if the checksum of a page was calculated with some other algorithm, validation fails. If the given algorithm is not in strict mode, validate tries all algorithms in turn.
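
The difference between strict and non-strict validation can be illustrated with a toy model. This is only a sketch of the decision logic described above, not MEB or InnoDB code: it does not model the InnoDB page layout or the magic value of the "none" mode, and toy_sum is a made-up stand-in for a second algorithm, not the real "innodb" checksum.

```python
import zlib


def toy_sum(data):
    """Stand-in for a second checksum algorithm (NOT InnoDB's real one)."""
    return sum(data) & 0xFFFFFFFF


# Hypothetical registry of known algorithms for this illustration.
ALGORITHMS = {"crc32": zlib.crc32, "toy_sum": toy_sum}


def page_is_valid(payload, stored_checksum, algorithm, strict):
    """Check a page payload against its stored checksum.

    strict=True : only the configured algorithm may match.
    strict=False: a match with any known algorithm is accepted.
    """
    candidates = [algorithm] if strict else list(ALGORITHMS)
    return any(ALGORITHMS[name](payload) == stored_checksum
               for name in candidates)
```

In this model, a page whose checksum was written with toy_sum fails validation under strict crc32 but passes when crc32 is configured without strict mode.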

The validate operation involves no write sub-operation, and hence no write threads are required.

PAGE_CORRUPT_THRESHOLD is an internal constant that specifies the upper limit of corrupt pages per .ibd file. It avoids scanning through all the pages of a badly damaged .ibd file: when "validate" reaches this threshold, it skips the rest of the current .ibd file and moves on to the next one.
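
The early exit can be pictured as a simple counting loop. The sketch below is an illustration only: the value 3 is hypothetical (the real constant is not given here), and whether MEB stops exactly when the count reaches the threshold or one page later is an assumption.

```python
# Hypothetical value for illustration; the real constant is not stated above.
PAGE_CORRUPT_THRESHOLD = 3


def count_corrupt_pages(page_results, threshold=PAGE_CORRUPT_THRESHOLD):
    """Scan per-page validity flags for one file, with an early exit.

    page_results: iterable of booleans, True meaning the page checksum is OK.
    Returns (corrupt_count, pages_scanned).
    """
    corrupt = 0
    scanned = 0
    for ok in page_results:
        scanned += 1
        if not ok:
            corrupt += 1
            if corrupt >= threshold:
                break  # give up on this file and move on to the next one
    return corrupt, scanned
```

The consequence is that the N in the "N corrupt pages" error message would be a lower bound for a badly damaged file, since the remaining pages are never read.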


Limitations:
------------------
Due to the limitations of checksum algorithms in principle, a 100% safe detection of each and every corruption is not guaranteed. But if MEB does not find a corruption, the server won't either, since MEB uses the same algorithm. That said, the algorithms used by the server have been shown in theory to be solid at detecting corruption.


The MEB "validate" feature validates files of the InnoDB storage engine, such as .ibd, .par (partitioned InnoDB table file) and so on. MEB cannot validate non-InnoDB files, because the server does not support checksums for these files.


Reason for above problem:
--------------------------------------
For non-InnoDB files such as .frm, .MYD and .MYI, no checksum is added by the server. InnoDB, by contrast, adds checksums before it writes data to disk. So the data is protected for its whole lifetime: write to disk by server, stay on disk, read from disk by backup, write to disk by backup, stay on disk, read from disk by validate and even the copy-back cycle.

Friday Mar 28, 2014

Faster and Space efficient Restore with MySQL Enterprise Backup

The new operation 'copy-back-and-apply-log', introduced in MySQL Enterprise Backup 3.9.0, makes restoring a backup contained in an image faster and reduces the amount of space required, because it eliminates the intermediate step of extracting the image contents.


Monday Mar 10, 2014

MySQL Enterprise Backup 3.10.0-What's New

The MySQL Enterprise Backup team is proud to announce the release of MySQL Enterprise Backup (MEB) 3.10.0.

Security of backups and usability of backup tools are important factors in taking successful backups. DBAs prefer tools that help them back up and restore without much overhead and that also help keep their backups protected. So in this release of MEB we have enhanced usability and added the ability to take secured backups. In addition, we have enhanced compressed backup.

Product New Features

MEB 3.10.0 is a major release with the following valuable features.

Feature                               Benefit
Improved Compressed Backup            Multiple Compression Options
Encrypted Backup                      Security
Exclude tables from backup            Usability
Validate backup dir or image          Data Integrity
Streaming restore of partial backup   Usability

  • Improved Compressed Backup
    -  Multiple compression algorithms are available for users to select when taking compressed backups
    -  LZ4 (http://code.google.com/p/lz4), an extremely fast compression algorithm, is now the default for compressed backups and yields better results when I/O is efficient
    -  LZMA has been added to provide the highest compression ratio
    -  ZLIB, MEB’s older compression algorithm, is retained as a selectable option

  • Encrypted Backup
    -  Backups need to be secured, as nowadays they are stored in public cloud or in remote servers that are less trusted
    -  Encrypted backup helps in preventing backup being an easy vector to gain access to data in cyber attacks
    -  In this release we have introduced options to encrypt a single file backup

  • Exclude Tables
    -  DBAs are often faced with the challenge of doing frequent backups and restores of certain data efficiently
    -  The Exclude Tables feature helps by leaving out unimportant or larger tables during the backup
    -  This provides excellent ease of use for administrators, as they don’t have to specify every single table they want to back up

  • Validate backup dir and image
    -  Backups should be validated for corrupted data pages before they are restored
    -  This feature validates the checksum of each page of the InnoDB data files in the backup directory or image file
    -  Checks the data integrity of the backup

  • Streaming restore of partial backup
    -  In this release we support single-step restore of a partial backup taken with the --use-tts option
    -  MEB can take selective backups of InnoDB tables and now restore them to another running server seamlessly

  • Other Enhancements
    -  Cleaner interface for Partial Backups: The easier-to-use options --include-tables and --exclude-tables are added in this release, and the less intuitive partial backup options --include and --databases are now legacy options.
    -  MEB Documentation Guide: Improved usability of the documentation by providing a master list of all command-line options with references to the option descriptions.
    -  When using the --skip-unused-pages option for backup operations, MySQL Enterprise Backup now displays the number of pages of data skipped and the total amount of space saved by using the option.
    -  Many other important bugs are also addressed in this release. Please check the change log.

Summary
For a detailed overview of the above features, please refer to the MEB 3.10.0 documentation.
Supported Platforms: MEB 3.10.0 is supported on Windows, Oracle Enterprise Linux, RHEL 4 and 5, SLES 10 and 11, and Solaris 10/11 on SPARC and x86.
MEB 3.10.0 is now available in My Oracle Support and will soon be available in Oracle Cloud Delivery.

Overall, the much-awaited MEB 3.10.0 release comes with an enriched feature set.

We also plan to publish more detailed blog posts on the above features.

Monday Nov 25, 2013

MEB integration with Workbench

This blog talks about MySQL Enterprise Backup integration with Workbench and how the Workbench UI can be used to configure and operate MEB.

Sunday Sep 29, 2013

Backing up full server instance using MySQL Enterprise Backup

Introduction:

MySQL Enterprise Backup (MEB) takes fast, consistent backups of MySQL server data and helps restore a server to the source server's data as of the time of the backup. But most of the time it is just as important to restore the source server's state (server configuration such as global variables and plugins) as its data. As backups become more frequent, server variables are modified, and plugins are added or removed, it is very difficult to keep track of the changing server state for every backup. MEB 3.9.0 helps by providing a complete backup, so that the restored server can run in exactly the same state as the source server at the time of the backup.

MEB 3.9.0 performs a full server instance backup which, on top of the log files and data files, also includes all the global variables and details of all plugins (both internal and external). With this feature, backup-content.xml, a meta file in the "meta" folder of the backup directory, now additionally contains plugin details such as name, status and type in a <plugins> section. In addition, two new files are created in the backup directory:

  • server-my.cnf - contains all the global variables that have non-default values for that server environment (MySQL server version, operating system, hardware architecture etc.).
  • server-all.cnf - contains all the global variables, both those with non-default values and those with default values.

Advantages of Full Server Instance Backup:

  • Create replica - Users can clone the source server's state by using either server-my.cnf or server-all.cnf from the backup as the defaults file when starting the target server. Because the default values of most global variables depend on the server environment, server-all.cnf should be used as the defaults file to create a server with the same state when the target environment differs from the source. If the target environment is the same as the source, either file can be used.
  • Keep a history of global variables - Users no longer need to record the state of the server themselves before every full or incremental backup in order to keep track of changes to global variables. With this feature, the non-default values of the running server's global variables can be read from server-my.cnf.
  • Full plugin information - With all the plugin information backed up, details such as type, status and library can be used to install missing plugins on the restored server with the same configuration as on the source.

Using Full Server Instance Backup:

Backup:

From MEB 3.9 onwards, this feature is enabled by default for all kinds of backup (normal, incremental, image, compressed etc.). That is, there is no need to turn on any feature or use any option: all backups are full server instance backups.

Note: Binary logs and *info files used for replication, and InnoDB buffer pool details, which are also part of the server instance, are not included in backups. Plugin details are copied, but the actual plugin binaries are not.

Restore:

After the copy-back operation, server-all.cnf and server-my.cnf are present in the restored data directory. If any external plugins existed on the source server, the copy-back operation issues a warning about the missing plugins that need to be installed.

Starting Server:

The files server-my.cnf and server-all.cnf can be used as the defaults file to start the server on the restored data directory. When the source and target environments are the same, starting the server with server-my.cnf is easier than with server-all.cnf, because server-my.cnf has fewer global variables to verify or modify.

Note: Be careful when starting another server instance on the same host using server-my.cnf or server-all.cnf without changes. There is a risk of modifying the source server's settings or data, because some file paths such as innodb_log_group_home_dir, tmpdir and general-log refer to the source server.

Incremental Backup:

The files server-my.cnf and server-all.cnf reflect the state of the server at the time of an incremental backup, and it is desirable to have the same state after applying the incremental backup. So after the apply-incremental-backup operation, the full backup's server-my.cnf and server-all.cnf are overwritten by the corresponding files from the incremental backup.

Thursday Sep 26, 2013

MEB & OSB slides for the talk at Oracle OpenWorld

Here are the slides for the talk given at Oracle OpenWorld about MEB & OSB.

MySQL Enterprise Backup: Introduction and Working with Oracle Secure Backup

Slides for MEB talk at MySQL Connect

Here are the slides for my talk at MySQL Connect:

Backing up the MySQL Database.


Thursday Sep 19, 2013

Backing up selective InnoDB tables using MEB

MySQL 5.6 introduced the TTS (transportable tablespaces) feature, which enables moving a table from one server to another. This feature, coupled with MEB 3.9, enables backing up a set of tables matching the regular expression specified with the --include option.

The backup of selected tables using the transportable tablespaces feature of InnoDB is referred to as a tts/selective backup in the remainder of this section.

The difference between a regular partial backup and one taken with tts is that regular partial backups are standalone and cannot be plugged into another server, whereas tts backups enable the tables to be plugged into another server instance.


The Selective Backup Operation

To specify a set of tables to be backed up, use the --use-tts option along with the --include=[regex] option for the backup operation. The --use-tts option supports two values, with-minimum-locking and with-full-locking.

with-minimum-locking - This is the default. The tables being backed up are hot-copied in parallel along with the redo log. After the data file copy, the tables are locked in read-only mode, the delta of the log is copied into the backup, and the locks are released. The advantage of this option is that the tables are available for modification during most of the backup process and are read-only for only a short duration.

with-full-locking - With this option, the tables are locked in read-only mode for the entire duration of the backup. As there cannot be any modifications while the backup is happening, the tables are consistent and the redo log is not backed up. This saves space and makes the apply-log step faster, as it involves only some bookkeeping operations.

Eg:
mysqlbackup --port=3306 --protocol=tcp --user=root --backup-dir=backupdir --include=Sales.Sales_*  --use-tts backup-and-apply-log


The Restore(copy-back) Operation

Unlike other types of backup, restoring from a tts backup requires the target server to be running. The connection options of the server to which the tables are to be restored must be provided for the copy-back of a tts backup.

Eg:
mysqlbackup --defaults-file=/backup-my.cnf --port=3406 --protocol=tcp --user=root --backup-dir=backupdir --datadir=<target_server_datadir> copy-back


Advantages of using MEB for tts Backup

Have a backup strategy for backing up a subset of tables (e.g. only the most important or most used tables).
Take advantage of compressed and image backup options supported by MEB.
This feature can be used effectively to copy a set of InnoDB tables from one server instance to another.

This feature handles only tables having their own tablespaces(innodb_file_per_table on) and does not support partitioned tables.

How to restore directly on a remote machine from the backup stream

MySQL Enterprise Backup supports single-step restore as of release 3.9.0, which lets you restore a backup image on a remote machine in a single step. With the straightforward approach, however, you first have to create the backup image on local disk, copy it to the remote machine, and then restore it there by running the copy-back-and-apply-log command.

This approach has two overheads:

    Serial execution: You have to wait for each step to finish before beginning the next (e.g. you have to wait for the backup-to-image operation to finish before beginning the copy).
    Disk consumption: You might not have enough space on the source disk to store that backup-image in the first place.

By restoring directly on a remote machine, piping the backup stream over SSH, you can overcome both of these problems.
That means:
You don't have to store the backup contents anywhere,
You pipe the backup stream directly to the remote machine,
Optionally, you perform compression and decompression on the fly, and
You perform the restore operation simultaneously.

How to do it:

    Use SSH and pipes to transfer data between backup and restore operations, and
    Perform the backup to stream and restore in remote machine simultaneously.

Steps:

    a) Perform an image backup and stream the data to stdout: --backup-image=- backup-to-image
    b) Pipe the stdout to the remote server using ssh and restore the data using copy-back-and-apply-log.
Sample command:

mysqlbackup --user=root --port=3306 --backup-dir=backup --socket=/tmp/mysql.sock  --backup-image=- backup-to-image | ssh <user name>@<remote host name> 'mysqlbackup --backup-dir=backup_img --datadir=/data/datadir --innodb_log_group_home_dir=. --innodb_log_files_in_group=8 --innodb_log_file_size=5242880 --innodb_data_file_path="ibdata1:12M:autoextend" --backup-image=- copy-back-and-apply-log'


On a slower network, you can perform compressed backups to reduce the network traffic. Compressed backups require more CPU cycles but provide faster data transfer.
Sample command with compression:

mysqlbackup --user=root --port=3306 --backup-dir=backup --socket=/tmp/mysql.sock  --backup-image=-  --compress backup-to-image | ssh <user name>@<remote host name> 'mysqlbackup --backup-dir=backup_img --datadir=/data/datadir --innodb_log_group_home_dir=. --innodb_log_files_in_group=8 --innodb_log_file_size=5242880 --innodb_data_file_path="ibdata1:12M:autoextend" --uncompress --backup-image=- copy-back-and-apply-log'


On successful completion of the above command, your remote server is restored and ready to use. This is also useful for creating a data snapshot for replication without any additional storage space.
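
One caveat when scripting such a pipeline: a plain shell pipe reports, by default, only the exit status of its last command, so a failed backup on the sending side can go unnoticed. The following is a rough sketch of a wrapper that connects the two commands without an intermediate file and returns both exit codes. The function name stream is made up for illustration; the two command lists would be the backup-to-image command and the ssh command from the samples above.

```python
import subprocess


def stream(producer_cmd, consumer_cmd):
    """Run `producer | consumer` with no intermediate file.

    Both arguments are argument lists (as for subprocess.Popen).
    Returns (producer_exit_code, consumer_exit_code).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer is notified (SIGPIPE)
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

A caller would treat the restore as successful only if both exit codes are zero, in addition to checking for the "mysqlbackup completed OK!" message on each side.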

Wednesday Sep 18, 2013

Skip Unused Pages with MySQL Enterprise Backup 3.9.0

Disclaimer
The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Introduction

There are database usage patterns where tables grow big at times and many rows are deleted from them later. InnoDB never shrinks a table space. In these cases we can end up with big data files that contain a lot of unused pages. It is a waste of disk and I/O resources to back them up.

Users have repeatedly requested that MySQL Enterprise Backup not back up unused InnoDB data pages. Some want smaller backups, some want less I/O, some want shrunken table spaces.

MySQL Enterprise Backup 3.9.0 can help with smaller backups. The effect on I/O is not that remarkable. InnoDB data files must be expanded to their original size when they are restored. Backup cannot accomplish a shrinkage of InnoDB table spaces.

In the following I will try to explain how things work and why not all wishes can be satisfied. I will also try to show the complexity of the feature, which should clarify why it comes so late. Because my explanations might be a bit technical, I'll summarize the important facts in advance.

Administrative Summary

You can use the command line option --skip-unused-pages with any backup operation, but it will be ignored and give a warning for the following cases:
  • backup-and-apply-log. The skipped pages must be re-inserted at the beginning of the apply-log operation. So it would be a waste of resources to do this when backup and apply-log are combined in one operation.
  • incremental-with-redo-log-only. This operation copies log pages only. No data pages are copied. So there is no data page to be skipped.
  • incremental backup. This is simply not implemented in MEB 3.9.0.
Depending on the amount of unused pages in the table spaces, the resulting backup can be much smaller. The saving of I/O resources is moderate. For the same reason don't expect a reduced backup time.

At the beginning of the apply-log operation, the skipped pages must be re-inserted. This means that every backed-up InnoDB data file must be copied over with the skipped pages stuffed in. As a result, the total apply-log operation will take at least as long as the backup operation took.

Be prepared for the backup directory to grow up to the original database size. Once a file has been expanded, its compact backup file is removed, but right before that point the file exists in both the compact and the expanded form. You need to have the space for one extra copy of your biggest table.

You can combine --compress and --skip-unused-pages. However, in MEB 3.9.0, the uncompression and expansion operations are executed in separate steps. Since both operations copy over all data files, this will take about twice as long as the backup operation took.

In MEB 3.9.0, the single-step restore operation copy-back-and-apply-log cannot be used on backups that have been taken with --skip-unused-pages.

Technical details

Prerequisite

The InnoDB table spaces contain bitmap pages, which mark pages as free or not free (in use, that is). The bitmap pages occur every page_size pages in the tablespace. For uncompressed tablespaces, that would be on pages 0, 16384, 32768, 49152, 65536, and so on. For a compressed tablespace with a 2K page size, the bitmap would be on pages 0, 2048, 4096, and so on. The bitmap page always covers the following page_size pages.
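The placement rule above can be illustrated with a small Python sketch. The function names are hypothetical and chosen only for this illustration. The sketch follows the text in using the page size in bytes as the number of pages between bitmap pages.

```python
def bitmap_page_numbers(page_size, tablespace_pages):
    """Page numbers of the bitmap pages: one every `page_size` pages."""
    return list(range(0, tablespace_pages, page_size))


def covering_bitmap_page(page_no, page_size):
    """Page number of the bitmap page whose map-set contains `page_no`."""
    return page_no - page_no % page_size


# Uncompressed tablespace (16K pages) with 70000 pages.
print(bitmap_page_numbers(16384, 70000))   # [0, 16384, 32768, 49152, 65536]
# Compressed tablespace with a 2K page size and 5000 pages.
print(bitmap_page_numbers(2048, 5000))     # [0, 2048, 4096]
# Page 20000 of the uncompressed tablespace is covered by bitmap page 16384.
print(covering_bitmap_page(20000, 16384))  # 16384
```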

Nomenclature

In the following, the set of pages, which are covered by a bitmap page, is called a "map-set".

In the following, the terms "free page" and "unused page" are used interchangeably.

In the following, the term "zero page" means an InnoDB page, which has all its bytes set to zero.

The "free limit" is an InnoDB internal term. It is a page number. It is less than or equal to the table space size in pages. If the free limit is less than the table space size in pages, the page at the free limit and all pages above it are free.

An LSN is a Log Sequence Number. It is the offset in bytes from the logical start of the redo log. The LSN marks the start of a log entry.

The general problem

The task sounds simple. Read the bitmaps. Identify unused pages. Omit them from backup. Replace them with empty pages on restore. All over.

It would be almost as easy as this if we took backups from cleanly shut down databases. That is, if all data files were in a consistent state and were not modified during the backup operation.

But if a backup is taken from a hot database, the data files are modified while the backup operation is reading them. Since we cannot read all pages at once, we likely read pages that were modified at different times. When we read a bitmap page and later the pages that are covered by this bitmap, they may not be consistent with each other. The bitmap could declare some page as free, but because we read other pages in between, the page may no longer be free when we read it. And vice versa.

Another problem is the InnoDB data cache. Pages are written from the cache to disk at different times. The main constraint is that a page is never written before all log entries that describe its modifications are on disk. Another constraint is that, at the time an InnoDB checkpoint is noted in the redo log, all page modifications up to that log entry are on disk. Since we start copying the redo log from the latest checkpoint, each page on disk can have any state between that checkpoint and the current state.

Sure, we have the redo log. We keep copying it in parallel to the data file copy. It should be able to replay all changes that are done on the data pages during the backup operation. But the replay algorithm requires each modified page to be in a certain state. That is, it expects that there are certain data in the page. Each redo log entry describes a transformation of the page contents from one state to another. If the page doesn't have the expected contents, the algorithm fails.

This means that we need to take care which page contents we restore for pages that were marked free when the corresponding bitmap page was copied. The following sections show which problems needed to be resolved.

Empty pages

Above we said that we will replace skipped unused pages by empty pages. However, the term "empty page" does not mean a zero page. A zero page would not work, for the following reasons.

The redo algorithm is an idempotent algorithm. The idempotency is based on the page LSN. Every change to a data page is logged in a log entry of the redo log. That log entry's LSN is written into the page. When the redo log is applied to a database or backup, a log entry is applied to a page only if the page's LSN is lower than the log entry's LSN.

The redo algorithm does not use purely physical logging. Most log entries do not set a certain number of bytes at a certain offset in a page, but transform a page from one state to another. In other words, the algorithm relies on the correct page contents.

We copied the redo log from the latest checkpoint onward. Usually this contains log entries from before the backup start. So it could happen that there are log entries that modified a page before it became unused and before the backup started.

If the unused pages had been recreated as zero pages, their LSN would be zero. Every log entry's LSN is greater than zero. So the algorithm would try to apply every log entry for that page. But the page contents would not be what the log entry expects. The redo log algorithm would fail.

If the unused pages had been recreated as zero pages, but with their LSN set to infinity, no log entry would be applied to unused pages. The apply-log algorithm would finish without errors or warnings. But if there are log entries that re-initialize and modify the page after the backup operation read the bitmap page, they would not be applied due to the high LSN. The result would be an inconsistent table space.

The correct solution is to use empty pages with the LSN of the bitmap page that marks them as unused. We know that the page was unused at that LSN. If a page gets freed or re-used, the bitmap is changed, and that creates a redo log entry for the bitmap page. The bitmap page gets that LSN. When we copy the bitmap page, we get the bitmap and the corresponding LSN into the backup. Our algorithm ensures that each page is consistent in itself. So we know for certain which pages were free at that LSN. The next higher LSN that affects an unused page must consequently belong to a log entry that re-initializes the page. Any log entry with an LSN lower than the bitmap LSN is irrelevant to the pages that are marked free by that bitmap.
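The three choices for the LSN of a recreated page can be compared with a small Python sketch. It is a simplified model, not MEB or InnoDB code: the LogEntry class, the LSN values, and the action strings are made up for illustration. Only the rule "apply an entry if the page's LSN is lower than the entry's LSN" is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    lsn: int
    action: str


def entries_applied(page_lsn, log):
    """Actions the redo algorithm would apply to a page with this LSN."""
    return [entry.action for entry in log if page_lsn < entry.lsn]


# Hypothetical redo log for one page. The bitmap page that marks the page
# as free was copied with LSN 200.
log = [
    LogEntry(100, "modify (before the page was freed)"),
    LogEntry(300, "re-initialize"),
    LogEntry(310, "modify"),
]
bitmap_lsn = 200

# LSN zero: the stale entry at LSN 100 is tried on a page without the
# expected contents, so the redo algorithm would fail.
print(entries_applied(0, log))
# Infinite LSN: nothing is applied, and the re-initialization is lost.
print(entries_applied(float("inf"), log))
# Bitmap LSN: only the re-initialization and later changes are applied.
print(entries_applied(bitmap_lsn, log))
```

With the bitmap LSN, the first entry that reaches the page is the one that re-initializes it, which is exactly what the replay needs.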

Besides the LSN, empty pages will also have the page number that corresponds to their location in the table space, the table space id, and the checksums set. The rest of an empty page consists of zero bytes.

Non-free zero pages

The page number of a zero page is zero. Usually one would expect that pages with a page number of zero have never been used and are marked free (except for the page at offset zero, the table space header). But sometimes a page like this is not marked free.

One possible situation in which a page could have a zero page number and not be marked free occurs if the page was used for the first time shortly before the backup started, and was not flushed to disk before the backup read the page. When the page became used, the bitmap was updated and flushed, and the page was updated in memory but not flushed yet. It is possible, even probable, that there are redo log entries in the log file that manipulate the page. The corresponding log entries could have LSNs below the bitmap page's LSN if more changes were made to the bitmap page later. If we skipped the page, and thus effectively declared it free, the expansion algorithm would insert a page with the bitmap page's LSN, which could be too high.

To be safe, we include non-free zero pages in a backup. They are rare and thus don't make a big difference.

Skip unused pages when reading from the original data files

If we use the bitmaps to avoid reading unused pages, we turn sequential reading into random reading. Depending on the distribution of unused pages among used pages, this could even reduce the read performance. Only if unused pages occur in big contiguous chunks could skipping them give a speed increase.

Since a bitmap page occurs every page_size pages, only page_size - 1 pages can be skipped in one go at best. Hence, the optimal performance enhancement cannot be reached. All bitmap pages are used pages and must be copied to the backup, even if they mark all "their" pages as free. After all, we could detect such a case only after having read the bitmap page. Anyway, it will rarely happen that multiple bitmap pages in a table space mark all their pages free. In theory this could happen in a new table space that was created way too big for the data. But then InnoDB maintains the free limit. Bitmap pages at and above the free limit are not initialized and don't need to be backed up.

At the beginning of each map-set, only the bitmap page needs to be read. Then unused pages can be identified in contiguous chunks. If a chunk is big enough, then that chunk can be skipped from reading.

It does not seem desirable to read in different chunk sizes. So the read algorithm is designed so that the read size is the data buffer size. At the beginning of a map-set, a data buffer is always read. Skipping of read pages is done in multiples of the data buffer size. A buffer can only be skipped if all its pages are unused. Since MEB currently uses a fixed buffer size of 16 MB, a buffer contains at least 1024 pages, depending on the table's page size. The probability that 1024 contiguous pages are free isn't that high. That's why we wouldn't reduce the I/O load much. And so we haven't implemented this part yet.
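The arithmetic behind "at least 1024 pages" can be checked with a few lines of Python. The helper names are hypothetical, and buffer_skippable only restates the rule from the text, since this read-side skipping is not implemented in MEB 3.9.0.

```python
BUFFER_SIZE = 16 * 1024 * 1024  # fixed MEB data buffer size of 16 MB


def pages_per_buffer(page_size_bytes):
    """Number of pages that fit into one data buffer."""
    return BUFFER_SIZE // page_size_bytes


def buffer_skippable(free_flags):
    """A buffer may be skipped only if every page in it is unused."""
    return all(free_flags)


print(pages_per_buffer(16 * 1024))  # 1024 pages for uncompressed 16K pages
print(pages_per_buffer(2 * 1024))   # 8192 pages for compressed 2K pages

# A single used page among 1024 pages prevents skipping the whole buffer.
flags = [True] * 1024
flags[500] = False
print(buffer_skippable(flags))      # False
```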

Skip unused pages when writing to the backup files

On the write side it is advantageous to suppress even single unused pages. It doesn't break sequential writes.

The algorithm is similar to incremental backup. From the read side we get a buffer, which could contain unused pages. For the write side we produce a buffer with only the used pages from the read buffer. For each page, we decide independently whether to copy it to the output buffer. Every bitmap page needs to be included.
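The per-page decision described above might look like the following Python sketch. It is a toy model under stated assumptions: pages are (page number, data) tuples, the set of free page numbers is assumed to have been decoded from the bitmap pages already, and a map-set of 4 pages is used only to keep the example small. The function name is hypothetical.

```python
def filter_used_pages(read_buffer, free_page_numbers, map_set_size):
    """Build the write buffer from the read buffer, dropping unused pages.

    Bitmap pages (the first page of each map-set) are always kept.
    """
    write_buffer = []
    for page_no, data in read_buffer:
        is_bitmap_page = page_no % map_set_size == 0
        if is_bitmap_page or page_no not in free_page_numbers:
            write_buffer.append((page_no, data))
    return write_buffer


# Eight pages in two map-sets of four pages each; pages 1, 2 and 6 are free.
read_buffer = [(n, b"page-%d" % n) for n in range(8)]
kept = filter_used_pages(read_buffer, {1, 2, 6}, map_set_size=4)
print([page_no for page_no, _ in kept])  # [0, 3, 4, 5, 7]
```

Because every page is decided on its own, the output stays a sequential stream, which is why single unused pages can be suppressed on the write side without hurting write performance.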

Compressed files must not be empty. If the backup files are compressed before they are written, we must ensure that they have some contents. This is a requirement of our compression algorithm. Since we include all bitmap pages, this is not a problem for file-per-table table spaces, nor for the first file of the system table space. But it can happen if a follow-up data file of the system table space has all its pages free and does not have a bitmap page below the free limit. To work around this problem, we always include the first page of a file.

A map-set can cover pages from multiple buffers. We need to keep them in memory until all covered buffers are written.

Restore

On restore, MEB has to recreate unused pages at the right places. The reason for this is explained below in the section "Backup cannot shrink table spaces". Since the contents of the inserted pages do not matter, except for the page number, space id, LSN, and checksums, empty pages are written.

In the following, the algorithm to recreate the skipped pages is called "expansion".

Please note that the expansion must take place before an apply-log operation. The apply-log algorithm works on a data file where all pages are at their correct places. Apply-log can modify initially unused pages. Hence those must also be present at the right places in the data file, and have the right LSN.

In MEB 3.9.0, a sequential algorithm is used, which detects skipped pages by a mismatch of a page number and the current write position in the expanded file. Empty pages are inserted until the page number matches the current write position.

If the last page from the backup file is below the free limit from the table space header, empty pages are appended up to the free limit. If the resulting page is below the table space size from the table space header, zero pages are appended up to the table space size.
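The sequential expansion can be sketched in Python as follows. This is a simplified model and not the MEB implementation: the Page class, the small map-set size, and the example values are hypothetical, and the sketch assumes that pages below the free limit become empty pages while pages from the free limit up to the table space size become zero pages.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_no: int  # position in the data file (in this model)
    lsn: int
    kind: str     # "data", "empty" or "zero"


def expand(backup_pages, map_set_size, free_limit, tablespace_size):
    """Re-insert skipped pages so that every page is at its own position."""
    out = []
    bitmap_lsn = 0
    for page in backup_pages:
        # A page number ahead of the write position reveals skipped pages.
        while len(out) < page.page_no:
            out.append(Page(len(out), bitmap_lsn, "empty"))
        if page.page_no % map_set_size == 0:
            bitmap_lsn = page.lsn  # bitmap page of the current map-set
        out.append(page)
    while len(out) < free_limit:       # empty pages up to the free limit
        out.append(Page(len(out), bitmap_lsn, "empty"))
    while len(out) < tablespace_size:  # zero pages up to the file size
        out.append(Page(len(out), 0, "zero"))
    return out


# Compact backup file: pages 2 and 5 were skipped, pages 6 and 7 lie at or
# above the free limit.
backup = [Page(0, 500, "data"), Page(1, 480, "data"),
          Page(3, 450, "data"), Page(4, 520, "data")]
expanded = expand(backup, map_set_size=4, free_limit=6, tablespace_size=8)
for p in expanded:
    print(p.page_no, p.kind, p.lsn)
```

In this model each inserted empty page takes the LSN of the bitmap page that covers it, as described in the section "Empty pages".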

Backup cannot shrink table spaces

What users really want is a feature to remove unused pages from a table space and make the data file(s) smaller.

MEB cannot help with it. Every InnoDB page has a page number, which corresponds to its position in a data file. The InnoDB tablespaces contain tree structures, where a page can reference one or more other pages. These references are done by page numbers. Hence it is vital that every page retains its position in a data file. If there are unused pages among used pages, the used pages cannot be simply shifted down in a file to take the place of an unused page.

If one wants to change a position of a page, one must assign it a new page number (the one that corresponds to its new place) and modify the page number in all places that reference the page. This means that a bunch of random-access page-read and page-write operations can be necessary for each shifted page.

If a table has a single-file table space (innodb-file-per-table=ON), then an OPTIMIZE TABLE statement would create a new tablespace with freshly constructed trees and thus take the minimum amount of space. Unfortunately that operation can put too big a burden on the database. With MEB, one could think of a workaround: do a backup of the table, restore it to a temporary place, run a server on it, run OPTIMIZE TABLE, and transport the resulting tablespace to the original server.

Wednesday Aug 28, 2013

MySQL Enterprise Backup 3.9.0 – An Insight

 

 

The MySQL Enterprise Backup team is excited to announce the new release of MySQL Enterprise Backup (MEB) 3.9.0.

Overview

MEB 3.9.0 focuses on ease of use and addresses some of the challenges currently faced by database administrators. With growing data, the need for hardware for database backups has significantly increased. Keeping that in mind, MEB 3.9.0 extends the product's capability to save disk space during backup and restore. MEB 3.9.0 also enhances some of the existing features, helping MEB users to perform their tasks in a more user-friendly way without much overhead.

Product Enhancements

MEB 3.9.0 is a major release with the following valuable features.

  • Single Step Restore
    - Direct restore from a backup directory or backup image to the target server.
    - The restore process is simpler and faster, and saves significant disk space.
    - This addresses the customer requirement Bug #68390, "Enterprise Backup requires 2x disk space of data dir to restore".
    - This is the core feature of this release: it improves both disk space usage and performance, and thus saves resources.

  • Selective Backup and Restore
    - MEB 3.9.0 has an improved way to do a selective backup and restore of a single table or a subset of tables.
    - This is done using TTS (Transportable Table Spaces), without any need for manual entry of MySQL statements.
    - This feature helps in adding the backed-up tables to the same or a different online server.
    - DBAs would benefit a lot from the ease of use of this command.

  • Full Instance Backup
    - This feature provides a complete backup, so that the restored server can be run with exactly the same configuration as the backed-up server.
    - This makes restoration easy for DBAs, as most of the server details are backed up and available for the restored server to start.

  • Skip Unused Pages
    - When this feature is enabled, MEB skips unused InnoDB pages while taking backups. This reduces the backup size when there are a lot of unused pages in the database.
    - This lowers the storage space requirement for the backups taken.

  • Console Output Logging
    - MEB errors are now redirected to a log file for every operation by default, so that the file can be consulted later to analyze any issues.
    - This relieves DBAs of the overhead of saving the error stream to a file manually or performing the operation again for debugging.


Summary

For a detailed overview of the above features, please refer to the 3.9.0 documentation.

MEB 3.9.0 is supported on Windows, Oracle Enterprise Linux, RHEL 4 and 5, SLES 10 and 11, and Solaris 10 SPARC and x86. Solaris 11 SPARC and x86 are also supported in this release.

In a nutshell, MEB 3.9.0 has extended the backup and restore capabilities making life easier for database administrators.

Stay tuned to read more blogs about the new features in the coming days.

 

Friday Jun 28, 2013

MySQL Enterprise Backup 3.8.2 - Overview

 

MySQL Enterprise Backup (MEB) is the ideal solution for backing up MySQL databases. MEB 3.8.2 was released in June 2013.

The main goal of the MySQL Enterprise Backup 3.8.2 release is to improve usability. With this release, users can see the progress of a backup both in terms of size and as a percentage of the total. This release also offers options to manage the behavior of MEB in case the space on the secondary storage is completely exhausted during backup.

The progress indicator is a (short) string that indicates how far the execution of a time-consuming MEB command has progressed. It consists of one or more "meters" that measure the progress of the command. Two options are introduced to control the progress reporting function of the mysqlbackup command: (1) --show-progress and (2) --progress-interval.

The user can control the progress indicator by using the “--show-progress” option in any of the MEB operations. This option instructs MEB to periodically output short reports on the progress of time-consuming commands. The argument of this option specifies where the output is sent. For example, it could be stderr, stdout, a file, a FIFO, or a table.

With the “--show-progress” option, both the total size of the backup to be copied and the size that has already been copied are shown. Along with this, the state of the operation is reported, for example whether data or meta-data is being copied or tables are being locked. This gives the DBA clearer information on the progress of the running backup.

The interval between progress reports, in seconds, is controlled by the “--progress-interval” option.

For more information on this, please refer to progress-report-options.

MEB can also be accessed through a GUI in the next version of MySQL Workbench. This can be used as the front-end interface for MEB users to perform backup operations at the click of a button. This feature was highly requested by DBAs and will be very useful. Refer to http://insidemysql.com/mysql-workbench-6-0-a-sneak-preview/ for information on the upcoming Workbench release.

Along with the progress report feature, some important issues are also addressed in MEB 3.8.2:

  • In MEB 3.8.2 a new command line option “--on-disk-full” is introduced to abort or warn the user when a backup process encounters a full disk condition. If the option is not given, the backup aborts by default.

  • A few issues related to “incremental-backup” are also addressed in this release. Please refer to the 3.8.2 documentation for more details. MEB users who take incremental backups are advised to move to 3.8.2.

Overall, the added usability and the important defects fixed in this release make MySQL Enterprise Backup 3.8.2 a promising release.

 

Wednesday Jun 26, 2013

MySQL Enterprise Backup 3.8.2 has been released!

MySQL Enterprise Backup v3.8.2, a maintenance release of the online MySQL backup tool, is now available for download from the My Oracle Support (MOS) website as our latest GA release. It will also be available via the Oracle Software Delivery Cloud in approximately 1-2 weeks. A brief summary of the changes in MySQL Enterprise Backup version 3.8.2 is given below.


  A. Functionality Added or Changed: 

  • MySQL Enterprise Backup has a new --on-disk-full command line option. Previously, mysqlbackup could hang when the disk became full, rather than detecting the low space condition. mysqlbackup now monitors disk space when running backup commands, and users can specify the action to take at a disk-full condition with the --on-disk-full option. For more details, refer to this page.
  • MySQL Enterprise Backup has a new progress report feature, which periodically outputs short progress indicators on its operations to user-selected destinations (for example, stdout, stderr, a file, or other choices). For more details on progress report options, refer here.

  B. Bugs Fixed:

  • When --innodb-file-per-table=ON, if a table was renamed and backup-to-image was in progress, apply-log would fail when being run on the backup. (Bug #16903973)
  • MySQL Server failed to start after a backup was restored if there had been online DDL transactions on partitioned tables during the time of backup. (Bug #16924499)
  • apply-incremental-backup might fail with an assertion error if the InnoDB tables being backed up were created in Barracuda format and with their KEY_BLOCK_SIZE values different from the innodb_page_size. This fix ensures that different KEY_BLOCK_SIZE values are handled properly during incremental backup and apply-incremental-backup operations.
  • If a table was renamed following a full backup, a subsequent incremental backup could copy the .frm file with the new name, but not the associated .ibd file with the new name. After a restore, the InnoDB data dictionary could be in an inconsistent state. This issue primarily occurred if the table was not changed between the full backup and the subsequent incremental backup. (Bug #16262690)
  • After a full backup, if a table was renamed and modified, apply-incremental-backup would crash when run on the backup directory. (Bug #16262609)
  • The value of the binary log position in backup_variables.txt could be different from the output displayed during the backup-and-apply-log operation. (This issue did not occur if the backup and apply-log steps were done separately.) (Bug #16195529)
  • When using the --only-innodb-with-frm option, MySQL Enterprise Backup tried to create temporary files at unintended locations in the file system, which might cause a failure when, for example, the user had no write privilege for those locations. This fix makes sure the paths for the temporary files are correct. (Bug #14787324)
  • A backup process might hang when it ran into an LSN mismatch between a data file and the redo log. This fix makes sure the process does not hang and displays an error message showing the name of the problematic data file. (Bug #14791645)

Please post your questions and comments about Backup in the forums.

Thanks,

MEB Team

