Backing up a zvol
By user12614620 on Dec 08, 2009
My response is: "That's the wrong question." In fact, someone replied to Michael2024 already saying that rsync was not the right tool, but no one suggested the best tool for backing up zvols: snapshots.
"But Mark," you say (because we're on first-name terms, and that is in fact my first name). "The snapshot is right there on the device that I'm trying to back up! How can that possibly help me?"
I'm glad you asked.
If you try to "back up" a zvol using a tool like dd, you have to copy the whole volume, even the blocks that contain no data. But zvols are ZFS constructs, which means they follow the copy-on-write paradigm, which in turn means that ZFS knows exactly which blocks hold data and which do not.
That means a snapshot contains only the data that is actually on the disk. That's right: a snapshot of a 100TB volume holding 10MB of data will contain only those 10MB. And therefore, any "zfs send" stream will contain only real data, not a bunch of unwritten garbage.
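You can see the same principle at work without ZFS at all. This is not from the original post; it's a sketch using an ordinary sparse file as a stand-in for a zvol (the path and sizes are made up for illustration):

```shell
# Create a 100MB sparse file containing only 10KB of real data:
dd if=/dev/zero of=/tmp/vol.img bs=1M count=0 seek=100 2>/dev/null
dd if=/dev/urandom of=/tmp/vol.img bs=1k count=10 conv=notrunc 2>/dev/null

# A block-by-block copy (what dd would do to a zvol) must read the
# full apparent size, empty blocks included:
wc -c < /tmp/vol.img

# ...even though the filesystem has actually allocated far less:
du -k /tmp/vol.img
```

A dumb block copier sees the full 100MB; only a tool that knows the allocation map, like ZFS does for its own datasets, can skip the unwritten blocks.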
To demonstrate, let's create a 100MB volume and snapshot it:
-bash-4.0# zfs create -V 100m tank/vol
-bash-4.0# zfs snapshot tank/vol@snap

How big is the send stream? Easy enough to check:
-bash-4.0# zfs send tank/vol@snap | wc -c
4256

Just a smidge over 4K. Let's write some data:
-bash-4.0# dd if=/dev/random of=/dev/zvol/rdsk/tank/vol bs=1k count=10
10+0 records in
10+0 records out
-bash-4.0# zfs snapshot tank/vol@snap2
-bash-4.0# zfs send tank/vol@snap2 | wc -c
21264

OK, we wrote 10K of data, and the send stream is about 21K. With such a small amount of data, the overhead is about half the stream. But what if we write to the same blocks again?
-bash-4.0# dd if=/dev/random of=/dev/zvol/rdsk/tank/vol bs=1k count=10
10+0 records in
10+0 records out
-bash-4.0# zfs snapshot tank/vol@snap3
-bash-4.0# zfs send tank/vol@snap3 | wc -c
21264

The exact same amount! So ZFS knows exactly how much data is on the zvol. Let's write 1MB instead:
-bash-4.0# dd if=/dev/random of=/dev/zvol/rdsk/tank/vol bs=1k count=1024
1024+0 records in
1024+0 records out
-bash-4.0# zfs snapshot tank/vol@snap4
-bash-4.0# zfs send tank/vol@snap4 | wc -c
1092768
-bash-4.0#

And now the overhead is quite a bit smaller relative to the data: about 44K on top of 1MB, or around 4%.
The question then is: which is more efficient? Doing a full block-by-block copy using something "higher up in the stack" (quoting from Michael2024 there), or creating another pool and doing a "zfs send | zfs recv"? On top of that, add the under-appreciated feature of incremental send streams, and you have a full backup solution that does not require any external tools.
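The incremental workflow mentioned above can be sketched roughly like this. The pool and snapshot names are made up for illustration, and this assumes a second pool named backup already exists:

```shell
# Initial full backup: replicate the first snapshot to the backup pool
zfs snapshot tank/vol@monday
zfs send tank/vol@monday | zfs recv backup/vol

# Later: send only the blocks that changed since @monday
zfs snapshot tank/vol@tuesday
zfs send -i tank/vol@monday tank/vol@tuesday | zfs recv backup/vol
```

Piping the send stream through ssh to a pool on another machine works the same way, which is what makes this a complete backup solution with no external tools.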
I would respond on the Spiceworks website, but alas it is members-only and requires you to download a Windows client just to register. Lame!