Saving and Restoring ZFS Snapshots to and from Amazon S3

by Sean O'Dell

We can use ZFS snapshots to save and restore filesystems from one Solaris EC2 instance to another. This is useful, for example, for saving user home directories, web server documents, and MySQL databases, terminating an EC2 instance, and then restoring those filesystems on a new EC2 instance created at a later date.

Amazon Simple Storage Service (S3) provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.

Getting Started

In our example, we use the second EC2 ephemeral storage device, c0d1p0 (c4d1p0 for OpenSolaris 2008.05, c3d0s0 for OpenSolaris 2008.11), to create a ZFS pool. We create filesystems within this pool for data that we want to restore for each new EC2 instance, and then save and restore the data using S3.

We use the S3 Sync command line tools listed in the references section to interface with S3. We assume that the reader has an AWS account and a basic understanding of EC2, S3, and Solaris ZFS.

In this section we describe the steps necessary to get started using ZFS within the EC2 environment. For our example, we use the SXCE Build 79 32-bit AMI, ami-eb7a9f82. This example has been tested and works with any of the Sun-provided Solaris AMIs, including OpenSolaris 2008.05.

Launch an instance of the Solaris AMI and log in to the new instance.

bash# ec2-run-instances -k <my-key-pair> -t m1.small ami-eb7a9f82

bash# ec2-describe-instances | grep INSTANCE | cut -f2,3,4,6
i-196fb970  ami-eb7a9f82  ec2-75-101-225-153.compute-1.amazonaws.com  running

bash# ssh -i <my-key-pair-file> root@ec2-75-101-225-153.compute-1.amazonaws.com
Last login: Mon Apr 28 02:24:21 2008
Sun Microsystems Inc.   SunOS 5.11      snv_79  January 2008

# bash -o vi

Download and install the S3 Sync tools in your preferred location; in our example, we use /usr/local/aws.

bash# mkdir -p /usr/local/aws
bash# cd /usr/local/aws

bash# /usr/sfw/bin/wget \
  http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
bash# gzcat s3sync.tar.gz | tar xf -
bash# cd /usr/local/aws/s3sync
bash# ln -s s3cmd.rb s3cmd
bash# export PATH=$PATH:/usr/local/aws/s3sync

Please note that there is an additional step required if you are using the OpenSolaris 2008.05 AMI: it does not ship with Ruby, which the S3 Sync tools require, so we need to install Ruby first.

bash# which ruby
no ruby in /usr/sbin /usr/bin /usr/local/aws/s3sync

bash# pkg install SUNWruby18
bash# which ruby
/usr/bin/ruby

Set the correct time, set your AWS environment variables, and create an S3 bucket to store your saved ZFS snapshots.

bash# ntpdate 0.north-america.pool.ntp.org
bash# export AWS_ACCESS_KEY_ID=<my-aws-access-key>
bash# export AWS_SECRET_ACCESS_KEY=<my-aws-secret-access-key>
bash# s3cmd createbucket <my-bucket-name-for-zfs-snapshots>

We are now ready to create our ZFS pool and a few filesystems. While logged in to the instance created and set up in the steps above, do the following.

Note: the extra disk for the OpenSolaris 2008.05 image is c4d1p0. From this point forward, we use the SXCE Build 79 32-bit image in our examples. If you are using OpenSolaris 2008.05, substitute c4d1p0 for c0d1p0.

bash# zpool create s3pool c0d1p0
bash# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
s3pool   149G   111K   149G     0%  ONLINE  -

bash# for i in home mysqldb share www
do
    zfs create s3pool/$i
done

bash# zfs list -r s3pool
NAME             USED  AVAIL  REFER  MOUNTPOINT
s3pool           210K   147G    23K  /s3pool
s3pool/home       18K   147G    18K  /s3pool/home
s3pool/mysqldb    18K   147G    18K  /s3pool/mysqldb
s3pool/share      18K   147G    18K  /s3pool/share
s3pool/www        18K   147G    18K  /s3pool/www

bash# ls /s3pool
home     mysqldb  share    www

At this point we have ZFS filesystems created and mounted, ready to be populated with files.
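
For example, you might tune a property or two and seed the filesystems with some data before taking snapshots. The property choice and the demo path below are purely illustrative and not part of the original walkthrough:

bash# zfs set compression=on s3pool/www        # illustrative property change
bash# cp -r /etc/skel /s3pool/home/demo        # hypothetical sample data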

Saving ZFS Snapshots to S3

On our source EC2 instance we have the following ZFS filesystems in pool s3pool.

bash# zfs list -r s3pool
NAME             USED  AVAIL  REFER  MOUNTPOINT
s3pool          2.11G  13.5G    23K  /s3pool
s3pool/home      117M  13.5G   117M  /s3pool/home
s3pool/mysqldb  20.8M  13.5G  20.8M  /s3pool/mysqldb
s3pool/share    7.90M  13.5G  7.90M  /s3pool/share
s3pool/www       609K  13.5G   609K  /s3pool/www

Create a snapshot for each of the filesystems.

bash# export snapshotdate=`date '+%Y%m%d-%H%M%S'`
bash# export poolname=s3pool
bash# for i in home mysqldb share www
do
    zfs snapshot -r ${poolname}/${i}@s3-${i}-$snapshotdate
done

bash# zfs list -t snapshot -r s3pool
NAME                                        USED  AVAIL  REFER  MOUNTPOINT
s3pool/home@s3-home-20080629-223417            0      -   117M  -
s3pool/mysqldb@s3-mysqldb-20080629-223417      0      -  20.8M  -
s3pool/share@s3-share-20080629-223417          0      -  7.90M  -
s3pool/www@s3-www-20080629-223417              0      -   609K  -

Save each ZFS snapshot to a compressed stream file.

bash# mkdir /var/tmp/snapshots
bash# for i in home mysqldb share www
do
    zfs send -R ${poolname}/${i}@s3-${i}-$snapshotdate | bzip2 \
      > /var/tmp/snapshots/s3-${i}.bz2
done

bash# cd /var/tmp/snapshots
bash# ls
s3-home.bz2     s3-mysqldb.bz2  s3-share.bz2    s3-www.bz2
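
Optionally, you can record an MD5 checksum for each stream file using the Solaris digest(1) utility, so that the downloads can be verified on the destination instance. This step is our own addition and can be skipped; if you use it, the .md5 files will also be picked up by the upload loop and will appear in the bucket listing below.

bash# for i in *.bz2
do
    digest -a md5 $i > $i.md5
done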

Upload the ZFS snapshot streams to S3.

bash# cd /var/tmp/snapshots
bash# export s3bucketname=<my-bucket-name-for-zfs-snapshots>

bash# for i in `ls`
do
    s3cmd put ${s3bucketname}:$i $i
done

bash# s3cmd list ${s3bucketname}
--------------------
s3-home.bz2
s3-mysqldb.bz2
s3-share.bz2
s3-www.bz2

bash# rm /var/tmp/snapshots/*.bz2

Restoring ZFS Snapshots from S3

On our destination EC2 instance we can restore the ZFS Snapshots previously saved.

Create the new ZFS pool.

bash# zpool create s3pool c0d1p0

bash# zpool list
NAME     SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
s3pool   149G    94K   149G     0%  ONLINE  -

Download the ZFS Snapshots from S3.

bash# mkdir /var/tmp/snapshots
bash# cd /var/tmp/snapshots
bash# export s3bucketname=<my-bucket-name-for-zfs-snapshots>

bash# for i in `s3cmd list ${s3bucketname}`
do
    if ! [[ $i == "--------------------" ]]
    then
        s3cmd get ${s3bucketname}:$i $i
    fi
done

bash# ls
s3-home.bz2     s3-mysqldb.bz2  s3-share.bz2    s3-www.bz2
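
If you created the optional .md5 checksum files before uploading, they will have been downloaded alongside the streams and will appear in the listing above. Each stream can then be verified before it is restored:

bash# for i in *.bz2
do
    if [ "`digest -a md5 $i`" = "`cat $i.md5`" ]; then
        echo "$i OK"
    else
        echo "$i does not match its saved checksum"
    fi
done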

Restore the ZFS Snapshots.

bash# for i in *.bz2
do
    bzcat $i | zfs recv -F -d s3pool
done

bash# zfs list -r s3pool
NAME                                        USED  AVAIL  REFER  MOUNTPOINT
s3pool                                      146M   147G    23K  /s3pool
s3pool/home                                 117M   147G   117M  /s3pool/home
s3pool/home@s3-home-20080629-223417            0      -   117M  -
s3pool/mysqldb                             20.8M   147G  20.8M  /s3pool/mysqldb
s3pool/mysqldb@s3-mysqldb-20080629-223417      0      -  20.8M  -
s3pool/share                               7.90M   147G  7.90M  /s3pool/share
s3pool/share@s3-share-20080629-223417          0      -  7.90M  -
s3pool/www                                  609K   147G   609K  /s3pool/www
s3pool/www@s3-www-20080629-223417              0      -   609K  -
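
As an optional final check (our addition, not part of the original procedure), a scrub makes ZFS read back and verify the checksums of every block in the restored pool:

bash# zpool scrub s3pool
bash# zpool status s3pool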

References

Solaris ZFS Administration Guide
S3 Sync - free and open source interfaces to the Amazon S3 system

Comments:

Hi Sean,

A quick question about your blog entry.

The s3pool mount points that you list appear to be capable of storing 147GB of data; however, I was under the impression that the maximum size of an S3 object is 5GB. How does your solution work around this limitation? That is, a fully populated s3pool/home would result in an s3-home.bz2 file larger than 5GB, which exceeds the S3 object size limit... does your solution somehow split the s3-home.bz2 over multiple S3 objects within a given bucket?

Thanks,
Carl C.

Posted by Carl Cavanagh on July 06, 2008 at 11:58 AM PDT #

Hi Carl,

Thank you for your interest and question. I did not consider the 5GB object limit in S3, but I think we can solve this by enhancing the upload step described in the blog entry under "Upload the ZFS snapshot streams to S3".

Included below is a sample script that can be used. At this point we have created the ZFS snapshots and saved them to a directory, for example /var/tmp/snapshots. We then cd to this directory and run the script shown below. I am not sure how the formatting of this comment will turn out, so please feel free to email me at sean.odell@sun.com if you would like me to send you the example script file.

One thing to note is that you will need to modify the "restore from S3" process to retrieve the stream piece parts and then use cat to reassemble them into a single bz2 file; a rough sketch of that step follows the script below. Also, I did not have a file larger than 5GB, so I used 500MB to test.

#===============================================
#!/usr/bin/bash

# split stream files that exceed the S3 object size limit,
# then upload the resulting files to an S3 bucket

# 500 MB (used for testing)
s3objmaxsize=$((500*1024*1024))

# 5 GB (the actual S3 object size limit)
#s3objmaxsize=$((5*1024*1024*1024))

# set the S3 bucket name to store the objects
s3bucketname=skodell.ec2.zfs.snapshots

for filename in `ls *.bz2`; do
    filesize=`ls -l ${filename} | awk '{print $5}'`

    if [ ${filesize} -gt ${s3objmaxsize} ]; then
        echo "Split file: ${filename}"
        split -b ${s3objmaxsize} ${filename} ${filename}.

        for splitfilename in `ls ${filename}.*`
        do
            echo "Upload file: ${splitfilename}"
            s3cmd put ${s3bucketname}:${splitfilename} ${splitfilename}
        done
    else
        echo "Split not needed: ${filename}"
        echo "Upload file: ${filename}"
        s3cmd put ${s3bucketname}:${filename} ${filename}
    fi
done
#===============================================
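
For the restore side, a rough sketch of the matching download-and-reassemble step is included below. It is untested, and it assumes the default two-letter suffixes (.aa, .ab, ...) that split generates:

#===============================================
#!/usr/bin/bash

# sketch: download all objects from the S3 bucket and reassemble any
# split stream files before running the normal zfs recv loop

s3bucketname=skodell.ec2.zfs.snapshots

cd /var/tmp/snapshots

# download every object in the bucket, skipping the separator line
# that s3cmd list prints
for objectname in `s3cmd list ${s3bucketname} | grep -v '^-'`
do
    s3cmd get ${s3bucketname}:${objectname} ${objectname}
done

# reassemble split files, e.g. s3-home.bz2.aa + s3-home.bz2.ab -> s3-home.bz2
for firstpart in `ls *.bz2.aa 2>/dev/null`
do
    base=`echo ${firstpart} | sed 's/\.aa$//'`
    echo "Reassemble file: ${base}"
    cat ${base}.[a-z][a-z] > ${base}
    rm ${base}.[a-z][a-z]
done
#===============================================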

Posted by Sean O'Dell on July 07, 2008 at 04:23 AM PDT #

Hi Sean,

Yep, I think that would do the trick. In fact, I think that would work for loopback device files as well.

Thanks,
Carl C.

Posted by Carl Cavanagh on July 08, 2008 at 11:28 AM PDT #

How would one tie this into the boot process of a modified AMI in order to restore ZFS snapshots on boot?

Posted by Jon Sharp on July 29, 2008 at 12:32 PM PDT #

Hi Jon,

Take a look at the blog entry:

Using Parameterized Launches to Customize Your AMIs
http://blogs.sun.com/ec2/entry/using_parameterized_launches_to_customize

We describe a process for adding a script to a Solaris AMI that is executed the first time each instance launches. In the example, we set up an NFS server which, on first launch, downloads the ZFS snapshots that were saved to S3 and then restores them.
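
As a rough illustration only, such a first-boot script could combine the restore steps from this entry; the bucket name, keys, and disk device below are the same placeholders used in the entry:

#!/usr/bin/bash
# hypothetical first-boot restore hook, combining the steps from the entry
export AWS_ACCESS_KEY_ID=<my-aws-access-key>
export AWS_SECRET_ACCESS_KEY=<my-aws-secret-access-key>
export PATH=$PATH:/usr/local/aws/s3sync
s3bucketname=<my-bucket-name-for-zfs-snapshots>

# create the pool and fetch the saved streams
zpool create s3pool c0d1p0
mkdir -p /var/tmp/snapshots
cd /var/tmp/snapshots

for i in `s3cmd list ${s3bucketname} | grep -v '^-'`
do
    s3cmd get ${s3bucketname}:$i $i
done

# restore each stream into the pool
for i in *.bz2
do
    bzcat $i | zfs recv -F -d s3pool
done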

Please let me know if this does not answer your question.

Regards,
Sean
sean.odell@sun.com

Posted by Sean O'Dell on July 29, 2008 at 12:54 PM PDT #
