Offline checksum validation for directory and Image backup using MySQL Enterprise Backup

Data integrity:
-------------------
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle. Every organization whether it's small or large want to make sure their data is consistent and error free. Data might move to other media/ different storage system for performance, speed, scalability or any other business reasons. So we want to make sure data is not corrupt while migration/movement. Data integrity is a policy which enterprise can enforce to be confident about their own data.

The overall intent of any data integrity technique is to ensure data is recorded exactly as  intended and upon retrieval later, ensure data is the same as it was when originally recorded.


Objective:
---------------
User should be able to verify data integrity of Innodb data files of a taken backup. Because during backup MEB performs integrity check to ensure consistency of data which MEB copies  from server data_dir. This feature gives flexibility to the user to run integrity check on his/her data at any time after backup. Thus this feature allows to check data integrity of backup directory/Image off-line.


Advantage:
----------------
Checksum mismatch/s will cause InnoDB to deliberately shut down a running server. It is preferable to use this command/operation rather than waiting for a server in production usage to encounter the damaged data pages.

This feature will be useful when user has taken a backup and is skeptical that the data might be corrupt before restore. It allows user to verify correctness of their backed data before restore.

 MEB's parallel architecture supports integrity check in parallel. So multiple threads in parallel operating on different chunks of the IBD data at the same time. Performance of data integrity is truly great compared to innochecksum offline utility which is single threaded.




Command-line options:

Existing "validate" command will be used to validate backup directory content. In option field, 

"--backup-dir=back_dir" we have to specify with validate.

e.g. ./mysqlbackup --backup-dir=back_dir validate


To validate compressed backup dir following command line should be used

e.g. ./mysqlbackup --backup-dir=comp_back_dir validate

To validate image

e.g. ./mysqlbackup --backup-image=back_image validate

The error message expected from Validate operation over a corrupt data file is:

 "mysqlbackup: ERROR: <filename> is corrupt and has : N corrupt pages"

In order to validate each pages of an Innodb data file. We need an algorithm name which was being used by server while we took backup.

In backup_dir/backup-my.cnf has parameter named "innodb_checksum_algorithm" along with other parameters. We use this parameter from "backup-my.cnf" file and initialize server checksum algorithm for validate of backup directory.

We have several algorithms like none, which stores magic value on each page, crc32, innodb as well as strict mode. Strict algorithm mode will try to validate checksum on given algorithm only. If checksum of a page is calculated with some other algorithm then it'll fail to validate. But, if algorithm given is not in strict mode it will try to validate page by trying all algorithm.

Validate operation involves no write sub-operation and hence no write threads required.

PAGE_CORRUPT_THRESHOLD is a constant, which specifies threshold/upper limit of corrupt pages per .ibd file. To avoid scanning through all the pages in ibd file we have an internal "PAGE_CORRUPT_THRESHOLD" for each .ibd file. When "validate" reaches this threshold it skips current .ibd file and moves to the next .ibd file.


Limitations:
------------------
Due to the limitations of checksum algorithms in principle, a 100% safe detection of each and every corruption is not guaranteed. But if MEB does not find a corruption, the server won't either since MEB uses the same algorithm. However, the algorithms used by server was theoretically proven solid in terms of detecting corruption.


MEB "validate" feature validates files of Innodb storage engine like .ibd, .par (Partitioned Innodb table file) etc. MEB can't validate Non-Innodb files as server don't have support of checksum for these files.


Reason for above problem:
--------------------------------------
For Non-Innodb files like .frm, .MYD, .MYI etc. no checksum is added by the server. InnoDB adds checksums before it writes data to the disk. So the data is protected for its whole life time: write to disk by server, stay on disk, read from disk by backup, write to disk by backup, stay on disk, read from disk by validate and even the copy-back cycle.
Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

MySQL MEB Team Blog

Search

Categories
Archives
« May 2015
SunMonTueWedThuFriSat
     
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
      
Today