News, tips, partners, and perspectives for the Oracle Linux operating system and upstream Linux kernel work

Btrfs send/receive helps to move and back up your data

In this update, we share Btrfs functionality that helps make moving data between Btrfs volumes faster and more efficient. It's not a new feature, but it is an underutilized one that showcases the unique capabilities of Btrfs as the native Linux copy-on-write filesystem.

Btrfs send was introduced in Linux v3.6, and the amazing part is that it supports incremental updates. Here I'll go through the command as a user and try to understand it as a btrfs developer. A user can transfer one whole subvolume tree to another btrfs filesystem by using 'send'. Keep in mind that the subvolume tree must be _readonly_, so the steps can be as simple as a few commands: take a read-only snapshot, then pipe 'btrfs send' into 'btrfs receive'.

By 'whole subvolume tree' I mean that both data and metadata are transferred to the receive side. To do this, 'send' uses pipe(2), which creates two file descriptors, one for the reader and one for the writer. The kernel writes send's instructions to the writer fd, and the userspace progs retrieve those instructions from the reader fd and write them to stdout by default. In the example below, a second pipe (the shell's '|') redirects stdout to the receive side.

$ man btrfs-send
usage: btrfs send [-ve] [-p <parent>] [-c <clone-src>] [-f <outfile>] <subvol> [<subvol>...]
Send the subvolume(s) to stdout.

    Sends the subvolume(s) specified by <subvol> to stdout.
    <subvol> should be read-only here.
    By default, this will send the whole subvolume. To do an incremental
    send, use '-p <parent>'. If you want to allow btrfs to clone from
    any additional local snapshots, use '-c <clone-src>' (multiple times
    where applicable). You must not specify clone sources unless you
    guarantee that these snapshots are exactly in the same state on both
    sides, the sender and the receiver. It is allowed to omit the
    '-p <parent>' option when '-c <clone-src>' options are given, in
    which case 'btrfs send' will determine a suitable parent among the
    clone sources itself.
    -e               If sending multiple subvols at once, use the new
                     format and omit the end-cmd between the subvols.
    -p <parent>      Send an incremental stream from <parent> to
                     <subvol>.
    -c <clone-src>   Use this snapshot as a clone source for an 
                     incremental send (multiple allowed)
    -f <outfile>     Output is normally written to stdout. To write to
                     a file, use this option. An alternative would be to
                     use pipes.
    --no-data        send in NO_FILE_DATA mode, Note: the output stream
                     does not contain any file data and thus cannot be used
                     to transfer changes. This mode is faster and useful to
                     show the differences in metadata.
    -v|--verbose     enable verbose output to stderr, each occurrence of
                     this option increases verbosity
    -q|--quiet       suppress all messages, except errors

$ btrfs subvolume snapshot -r /mnt/send/subvol /mnt/send/snapshot
$ btrfs send /mnt/send/snapshot | btrfs receive /mnt/recv/
# then we get an identical 'snapshot' under /mnt/recv
$ ls /mnt/recv

Then on the receive side, 'btrfs receive' is used to create a new subvolume (/mnt/recv/snapshot) and apply the instructions in the send stream to make it look like the one on the send side (/mnt/send/snapshot).

This feature is often helpful for regular filesystem backups because it combines built-in, cheap snapshots with incremental updates. Paired with out-of-band deduplication, btrfs provides all the features needed to build a powerful backup appliance.
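
To make the backup idea concrete, here is a minimal sketch of one backup round. It is not taken from any existing tool: the function name 'backup_once', the '.last' bookkeeping file, and the argument layout are all assumptions for illustration. It takes a read-only snapshot, sends it in full the first time, and sends it incrementally with '-p' on later rounds.

```shell
# Sketch only: backup_once and the .last file are hypothetical helpers.
# Usage: backup_once <subvol> <snapshot-dir> <receive-dir> <label>
# Run as root against real Btrfs mounts.
backup_once() {
    subvol=$1 snapdir=$2 recvdir=$3 label=$4
    new="$snapdir/$label"
    last_file="$snapdir/.last"   # remembers the snapshot sent in the last round

    # 'btrfs send' only accepts read-only snapshots, hence '-r'.
    btrfs subvolume snapshot -r "$subvol" "$new" || return 1

    if [ -f "$last_file" ]; then
        # A previous round exists: send only the difference from it.
        parent=$(cat "$last_file")
        btrfs send -p "$parent" "$new" | btrfs receive "$recvdir" || return 1
    else
        # First round: send the whole subvolume.
        btrfs send "$new" | btrfs receive "$recvdir" || return 1
    fi

    # Note: without 'pipefail' the status checked above is that of
    # 'btrfs receive' only.
    printf '%s\n' "$new" > "$last_file"
}
```

A call such as 'backup_once /mnt/send/subvol /mnt/send /mnt/recv 2024-03-06' would fit the paths used in this post. The sketch assumes the previous snapshot still exists, unmodified, on both sides.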

Last but not least, please note that nothing comes for free. Although creating a snapshot is easy, fast, and nearly free, deleting snapshots can slow down the whole filesystem. It takes a good amount of effort to traverse several btrees and remove references to everything, and this can be quite CPU intensive. The problem is also known as the "snowball effect of wandering trees". It's highly recommended to keep only the snapshots you really need.
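
One way to keep the snapshot count under control is to prune on a schedule. The sketch below is an assumption-laden illustration, not a recommendation from the btrfs developers: the helper names are made up, it assumes one snapshot per entry directly under a directory, and it assumes snapshot names sort chronologically (for example 2024-03-06).

```shell
# Sketch only: both helper names are hypothetical.

# Read snapshot names (one per line) on stdin and print the ones to delete,
# keeping the newest $1. Relies on names sorting chronologically.
snapshots_to_prune() {
    keep=$1
    sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Delete surplus snapshots under directory $1, keeping the newest $2.
# Parsing 'ls' output is fine for simple names but breaks on newlines.
prune_snapshots() {
    dir=$1 keep=$2
    ls "$dir" | snapshots_to_prune "$keep" | while IFS= read -r name; do
        btrfs subvolume delete "$dir/$name"
    done
}
```

Remember that the parent of the next incremental send must survive the pruning on both sides, so keep at least the most recently sent snapshot.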

About the options...

  • -f <outfile>

Although stdout is used by default, its file descriptor often refers to a tty (terminal), in which case we get this error:

$ btrfs send /mnt/btrfs/snap2
ERROR: not dumping send stream into a terminal, redirect it into a file

# Fix this error with one of the following commands:
$ btrfs send /mnt/btrfs/snap2 > output
$ btrfs send -f output /mnt/btrfs/snap2
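
In a script, the same situation can be caught before calling 'btrfs send'. The wrapper below is a sketch; the name 'safe_send' is hypothetical, and the only mechanism it relies on is the standard shell test '[ -t 1 ]', which is true when stdout is a terminal.

```shell
# Sketch only: safe_send is a hypothetical wrapper.
# Usage: safe_send <snapshot> [<outfile>]
safe_send() {
    snap=$1
    out=${2:-}
    if [ -n "$out" ]; then
        # An output file was given: let 'btrfs send' write to it directly.
        btrfs send -f "$out" "$snap"
    elif [ -t 1 ]; then
        # stdout is a terminal: refuse, just as 'btrfs send' itself would.
        echo "stdout is a terminal; give an output file or pipe into 'btrfs receive'" >&2
        return 1
    else
        # stdout is a pipe or a file: stream as usual.
        btrfs send "$snap"
    fi
}
```

For example, 'safe_send /mnt/btrfs/snap2 output' writes the stream to a file, while 'safe_send /mnt/btrfs/snap2 | btrfs receive /mnt/recv' streams it through a pipe.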

  • -p <parent>

This option can potentially speed up a 'send-receive' process because it tells the receiver to create a snapshot of <parent> before applying the changes passed in the send stream. It assumes that a previous send-receive has already happened, so that <parent> exists on both the sender side and the receiver side.

Incremental updates can be applied with minimal effort by making a snapshot of <parent> on the receiver side. It mostly works as expected, except for one problem I observed: the receiver doesn't check whether <parent> is read-only or read-write. You can see this with the following steps:

a) toggle off the RO bit of <parent> with 'btrfs property set -t subvol <parent> ro false'
b) add or remove files/directories under <parent>

Then the snapshot on the sender side will not be identical to the snapshot on the receive side. Here is an example:

$ btrfs sub create /mnt/send/sub
$ touch /mnt/send/sub/foo
$ btrfs sub snap -r /mnt/send/sub /mnt/send/parent
# send parent out
$ btrfs send /mnt/send/parent | btrfs receive /mnt/recv/
# change parent and file under it
$ btrfs property set -t subvol /mnt/recv/parent ro false
$ truncate -s 4096 /mnt/recv/parent/foo
$ btrfs sub snap -r /mnt/send/sub /mnt/send/update
$ btrfs send -p /mnt/send/parent /mnt/send/update | btrfs receive /mnt/recv
$ ls -l /mnt/send/update
total 0
-rw-r--r-- 1 root root 0 Mar 6 11:13 foo
$ ls -l /mnt/recv/update
total 0
-rw-r--r-- 1 root root 4096 Mar 6 11:14 foo

However, if 'foo' in the sender's new snapshot (here /mnt/send/update-new) has a non-zero size, the receiver side shows the correct size:

$ truncate -s 8192 /mnt/send/sub/foo
$ btrfs sub snap -r /mnt/send/sub /mnt/send/update-new
$ btrfs send -p /mnt/send/parent /mnt/send/update-new | btrfs receive /mnt/recv
$ ls -l /mnt/send/update-new
total 0
-rw-r--r-- 1 root root 8192 Mar 6 11:21 foo
$ ls -l /mnt/recv/update-new
total 0
-rw-r--r-- 1 root root 8192 Mar 6 11:21 foo

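In this test, 'btrfs receive' doesn't apply the file size if the size is zero. A likely explanation is that 'foo' is unchanged between 'parent' and 'update' on the sender side, so the incremental stream carries no instruction for it and the receiver-side modification survives.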

Fixes for these issues are under development. The correct way to make changes to a read-only snapshot is to create another snapshot of it that has write access.
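
As a sketch of that advice, the two helpers below check the RO bit and create a writable snapshot instead of flipping the bit on the original. The function names are hypothetical; the commands they wrap are 'btrfs property get', which prints 'ro=true' or 'ro=false' for the 'ro' property, and 'btrfs subvolume snapshot' without '-r'.

```shell
# Sketch only: is_readonly and writable_copy are hypothetical helpers.

# Succeeds if the subvolume at $1 still has its RO bit set.
is_readonly() {
    [ "$(btrfs property get -t subvol "$1" ro)" = "ro=true" ]
}

# Create a writable snapshot of a read-only one.
# Omitting '-r' is what makes the new snapshot writable.
writable_copy() {
    btrfs subvolume snapshot "$1" "$2"
}
```

Before an incremental send, something like 'is_readonly /mnt/recv/parent || echo "parent was modified"' can flag the situation shown above, and 'writable_copy /mnt/recv/parent /mnt/recv/parent-rw' gives a place to make changes without touching the parent ('parent-rw' is just an example name).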

  • -c <clone-src>

To understand the option, we need to explain clone first.

Clone simply refers to an operation that allows two files (or two different parts of the same file) to share the same piece of data on disk; copy-on-write happens if any part of the shared data gets changed.
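
Clone can be tried from the command line with GNU cp. The sketch below assumes GNU coreutils; '--reflink=auto' asks for a clone and falls back to an ordinary copy on filesystems without reflink support, so the script runs anywhere, but the two files only share extents on a filesystem such as Btrfs. The file names are arbitrary.

```shell
# Sketch only: demonstrates clone semantics in a throwaway directory.
workdir=$(mktemp -d)
printf 'original data\n' > "$workdir/a"

# Ask for a clone of 'a'; on Btrfs 'b' initially shares a's data on disk.
cp --reflink=auto "$workdir/a" "$workdir/b"

# Change the copy. Copy-on-write gives 'b' its own data for the changed
# part, and 'a' stays untouched.
printf 'changed\n' >> "$workdir/b"

a_content=$(cat "$workdir/a")
b_content=$(cat "$workdir/b")
echo "a: $a_content"
echo "b: $b_content"

rm -rf "$workdir"
```

Using '--reflink=always' instead makes cp fail rather than fall back, which is a quick way to check whether the filesystem really supports clone.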

With the '-c' option, the send-receive process can avoid transferring data in the send stream because the required data is already available on the receiver side; all it needs to do is reflink from <clone-src>.

Similar to '-p <parent>', it also assumes that <clone-src> exists on both the sender side and the receiver side. The difference is that '-c <clone-src>' only avoids transferring data, while '-p <parent>' avoids both data and metadata.

For the best result, multiple <clone-src> options can be given, and 'btrfs send' will try to figure out the best-fit parent to use. If it fails to do so, an error is printed: 'parent determination failed for xxx'.
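
Passing several clone sources means repeating '-c' once per snapshot. The helper below is a small sketch of that; 'send_with_clones' is a hypothetical name, and it deliberately leaves out '-p' so that 'btrfs send' picks a parent among the clone sources, as the usage text describes.

```shell
# Sketch only: send_with_clones is a hypothetical helper.
# Usage: send_with_clones <snapshot> [<clone-src>...]
send_with_clones() {
    target=$1
    shift
    # Turn each remaining argument into a "-c <clone-src>" pair.
    # POSIX sh has no arrays, so the positional parameters are rewritten.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" -c "$1"
        shift
        n=$((n - 1))
    done
    btrfs send "$@" "$target"
}
```

With the paths from this post, 'send_with_clones /mnt/send/update /mnt/send/parent | btrfs receive /mnt/recv' would send 'update' using 'parent' as a clone source. Every clone source must exist in exactly the same state on the receiver side.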
