Wednesday Jun 25, 2008

Why check the digest of files you copy and what to do when they don't match

I'm always copying data from home to work, and less often from work to home. Mostly these are disk images. I always check the md5 sum, just out of paranoia. It turns out you can't be paranoid enough! The thing to remember when the checksums don't match is not to copy the file again but to use rsync, which will bring over just the blocks that are corrupt.

: enoexec.eu FSS 43 $; scp thegerhards.com:/tank/tmp/diskimage.fat.bz2 .
diskimage.fat.bz2    100% |*****************************|  1825 MB 11:10:31
: enoexec.eu FSS 44 $; digest -a md5 diskimage.fat.bz2    
674f69eec065da2b4d3da4bf45c7ae5f
: enoexec.eu FSS 45 $; ssh thegerhards.com digest -a md5 /tank/tmp/diskimage.fat.bz2
191f26762d5b48e0010a575b54746e80
: enoexec.eu FSS 46 $; ls -l diskimage.fat.bz2
-rw-r-----   1 cg13442  staff    1913779931 Jun 25 08:56 diskimage.fat.bz2
: enoexec.eu FSS 47 $; rsync thegerhards.com:/tank/tmp/diskimage.fat.bz2 diskimage.fat.bz2            
: enoexec.eu FSS 48 $; digest -a md5 diskimage.fat.bz2                        
191f26762d5b48e0010a575b54746e80
: enoexec.eu FSS 49 $; 
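The check-and-repair steps in the session above can be wrapped in a small function. This is a minimal sketch, not the author's tooling: `repair_if_corrupt`, `md5_on` and the `MD5CMD` variable are hypothetical names, and it assumes rsync is installed on both ends and that the MD5 tool prints the hex digest first (GNU `md5sum`, or `digest -a md5` on Solaris). It passes `--ignore-times` because rsync's default quick check skips a file whose size and modification time already match; the plain rsync in the session worked because scp without `-p` gave the copy a new modification time.

```shell
#!/bin/sh
# Sketch: compare a copy's MD5 digest with the original's and, only if they
# differ, repair the copy with rsync's delta transfer.
# Assumptions: rsync on both ends; MD5CMD names an MD5 tool whose output
# starts with the hex digest. All function names here are hypothetical.
MD5CMD=${MD5CMD:-md5sum}

# md5_on HOST FILE: print FILE's digest, computed on HOST over ssh,
# or locally when HOST is empty. $MD5CMD is deliberately unquoted so that
# "digest -a md5" splits into words (paths with spaces are not handled).
md5_on() {
    if [ -z "$1" ]; then
        $MD5CMD "$2" | awk '{ print $1 }'
    else
        ssh "$1" $MD5CMD "$2" | awk '{ print $1 }'
    fi
}

# repair_if_corrupt HOST ORIGINAL COPY (HOST may be empty for a local original)
repair_if_corrupt() {
    host=$1 orig=$2 copy=$3
    want=$(md5_on "$host" "$orig")
    [ -n "$want" ] || return 1          # could not read the original
    have=$(md5_on "" "$copy")
    if [ "$want" = "$have" ]; then
        echo "digests match: $have"
        return 0
    fi
    echo "digest mismatch: copy $have, original $want; running rsync"
    if [ -n "$host" ]; then src=$host:$orig; else src=$orig; fi
    # --ignore-times: don't skip the file just because size and mtime match.
    # --no-whole-file: use the delta algorithm even for a local-to-local copy.
    rsync --ignore-times --no-whole-file "$src" "$copy" || return 1
    # Succeed only if the repaired copy now has the original's digest.
    [ "$(md5_on "" "$copy")" = "$want" ]
}
```

On the Solaris machines in the session, that would mean setting `MD5CMD="digest -a md5"` and then running `repair_if_corrupt thegerhards.com /tank/tmp/diskimage.fat.bz2 diskimage.fat.bz2`.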

Since my home directory is now on ZFS, and I take a snapshot every time my card is inserted into the Sun Ray, I can now take a look at what went wrong. Using my zfs_versions script, I can get a list of the different versions of the file from all the snapshots:


: enoexec.eu FSS 56 $; digest -a md5 $( zfs_versions diskimage.fat.bz2 | nawk '{ print $NF }')
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-05:51:57/diskimage.fat.bz2) = 0a193e0e80dbf83beabca12de09702a0
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-05:54:44/diskimage.fat.bz2) = 7aa78dba6a7556fe10115aa5fc345bad
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-07:05:34/diskimage.fat.bz2) = c6a77429920f258dfca1dbbd5018a69c
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2) = 674f69eec065da2b4d3da4bf45c7ae5f
(/home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2) = 191f26762d5b48e0010a575b54746e80
: enoexec.eu FSS 57 $;
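The zfs_versions script itself isn't shown here, so the following is only a hypothetical stand-in that produces the same kind of list. It relies on each ZFS file system exposing its snapshots under `.zfs/snapshot` at its mount point (reachable even when `snapdir=hidden` keeps it out of `ls`), and it assumes snapshot names sort chronologically, as `user_snap_YYYY-MM-DD-hh:mm:ss` names do, because the shell glob returns them in name order.

```shell
#!/bin/sh
# Hypothetical stand-in for the author's zfs_versions script: print the path
# of each distinct version of FILE found in the snapshots of the ZFS file
# system that contains it, oldest first (assuming sortable snapshot names).
zfs_versions() {
    case $1 in
        /*) path=$1 ;;
        *)  path=$(pwd)/$1 ;;
    esac
    # The file system root is the nearest ancestor with a .zfs/snapshot dir.
    root=$(dirname "$path")
    while [ ! -d "$root/.zfs/snapshot" ]; do
        if [ "$root" = / ]; then
            echo "zfs_versions: $path is not on a ZFS file system" >&2
            return 1
        fi
        root=$(dirname "$root")
    done
    # Path of the file relative to the file system root.
    rel=${path#"${root%/}"/}
    prev=
    for snap in "$root"/.zfs/snapshot/*; do
        f=$snap/$rel
        [ -f "$f" ] || continue
        # Report a snapshot's copy only if it differs from the last one reported.
        if [ -z "$prev" ] || ! cmp -s "$prev" "$f"; then
            echo "$f"
            prev=$f
        fi
    done
}
```

Because this version prints one path per line, `digest -a md5 $(zfs_versions diskimage.fat.bz2)` should give a list like the one above without the nawk step. Comparing each copy with `cmp -s` reads whole files, which is slow for multi-gigabyte images but keeps the sketch simple.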


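So the last two files in the list are the corrupt copy (674f69eec065da2b4d3da4bf45c7ae5f, the digest I got after the scp) and the good copy (191f26762d5b48e0010a575b54746e80, the digest after the rsync):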

: enoexec.eu FSS 57 $; cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2 /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2 | head -10
84262913   0 360
84262914   0  14
84262915   0 237
84262916   0  25
84262917   0 342
84262918   0 304
84262919   0  41
84262920   0  12
84262921   0 372
84262922   0  20
: enoexec.eu FSS 58 $;

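Each line is a byte offset followed by that byte's value in the corrupt copy and in the good copy, in octal. The corrupt copy has zeros where the good one has data, and there appear to be whole blocks of them. Counting the differing bytes that are zero in the corrupt copy, and printing any that are not: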

: enoexec.eu FSS 58 $; cmp -l /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:06:39/diskimage.fat.bz2 /home/cg13442/.zfs/snapshot/user_snap_2008-06-25-09:38:22/diskimage.fat.bz2 | nawk '$2 != 0 { print $0 } $2 == 0 { count++ } END { printf("%x\n", count ) }'
23d8c
: enoexec.eu FSS 58 $;

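Nothing else was printed, so every differing byte is zero in the corrupt copy, and at least 0x23d8c (146,828) bytes were zero that should not have been. It is "at least" because a byte that happens to be zero in the good copy as well does not differ, so cmp never reports it. I need to see if I can reproduce this.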
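To check whether the zeros really form contiguous blocks rather than scattered bytes, the `cmp -l` output can be folded into runs of consecutive offsets. This is a hypothetical helper (`cmp_runs` is not from the original post); it prints each run as `first-last length`, using the 1-based offsets that `cmp -l` reports.

```shell
#!/bin/sh
# Hypothetical helper: summarise "cmp -l A B" as runs of consecutive
# differing byte offsets, printed as "first-last length".
# cmp -l prints one line per differing byte: its 1-based decimal offset,
# then that byte's value in A and in B, in octal.
cmp_runs() {
    cmp -l "$1" "$2" | awk '
        # A new run starts at the first line or after a gap in the offsets.
        NR == 1 || $1 != last + 1 {
            if (NR > 1) printf("%.0f-%.0f %.0f\n", start, last, last - start + 1)
            start = $1
        }
        { last = $1 }
        # Flush the final run; %.0f avoids 32-bit integer limits in some awks.
        END { if (NR > 0) printf("%.0f-%.0f %.0f\n", start, last, last - start + 1) }'
}
```

Run against the two snapshot copies, a single long run would point to one contiguous stretch being zeroed, while several runs would point to separate damaged blocks.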

Anyway, the moral is: always check the md5 digest, and if it is wrong, use rsync to correct it.
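rsync can repair a damaged copy cheaply because of its delta transfer: the receiving side sends checksums of the blocks it already has, and the sender transmits only the data for regions that don't match, using a rolling checksum so matches are found at any offset. The sketch below is a deliberately simplified illustration, not rsync's algorithm: it compares MD5 digests of fixed-size blocks at the same offsets, which covers the case where damage overwrites bytes in place, as zeroed blocks do. `block_md5s` and `diff_blocks` are hypothetical names, and GNU `md5sum` is assumed (`digest -a md5` could stand in on Solaris).

```shell
#!/bin/sh
# Simplified illustration, not rsync's actual algorithm: hash two files in
# fixed-size blocks and print the indices of the blocks whose MD5s differ.

# block_md5s FILE BLOCKSIZE: print "index digest" for each block of FILE.
block_md5s() {
    size=$(wc -c < "$1")
    n=$(( (size + $2 - 1) / $2 ))       # number of blocks, rounding up
    i=0
    while [ "$i" -lt "$n" ]; do
        sum=$(dd if="$1" bs="$2" skip="$i" count=1 2>/dev/null | md5sum | awk '{ print $1 }')
        echo "$i $sum"
        i=$((i + 1))
    done
}

# diff_blocks A B BLOCKSIZE: print the index of every block that differs.
diff_blocks() {
    ta=$(mktemp) tb=$(mktemp)
    block_md5s "$1" "$3" > "$ta"
    block_md5s "$2" "$3" > "$tb"
    # Joined lines are "index digestA index digestB"; a block missing from
    # the shorter file leaves a field empty, so it also counts as different.
    paste -d ' ' "$ta" "$tb" | awk '$2 != $4 { print $1 }'
    rm -f "$ta" "$tb"
}
```

Only the listed blocks would need to cross the network, which is why re-copying the whole image is unnecessary. rsync's rolling checksum also handles inserted or deleted bytes, which this fixed-offset version does not.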

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
