ZFS Snapshot and Amazon S3 (Part 2 of 2)

Alright, I made you read my blather about costs and such, now it's time for some code! Well, you may be disappointed to know there is really very little code involved in creating a ZFS snapshot, sending it to Amazon S3, and restoring the file system back to its original state after retrieving the snapshot BACK from Amazon S3.

This version of the implementation is going to take the "simple is elegant" route. I use the built-in capabilities of ZFS to take snapshots, save them, and restore them, and I couple this with some Java code to move data to and from Amazon S3 (I'm much quicker with Java than with the many other languages you can use with Amazon S3: Perl, PHP, etc.).

Basic Setup
The following was set up on my system:
* One storage pool: zpool create media c0t1d0
* A file system within the media storage pool: zfs create media/mypictures
* Mountpoint changed to /export: zfs set mountpoint=/export/media/mypictures media/mypictures
* Shared over NFS: zfs set sharenfs=on media/mypictures
* Compression turned on: zfs set compression=on media/mypictures

I copied some pictures into the /export/media/mypictures directory (enough to create an acceptable snapshot...193 Meg).

Snapshot and Store
The steps for creating the snapshot and sending it to Amazon S3 are:
* Create a snapshot of the file system
* Compress the snapshot and redirect it to a file
* uuencode the snapshot (encodes the binary data as printable text)
* Send it to Amazon S3 with appropriate metadata

To create a snapshot, use the "zfs snapshot" command. I will snapshot the media/mypictures file system (mounted at /export/media/mypictures) and name the snapshot "20070607" with the command:

zfs snapshot media/mypictures@20070607

The snapshot should initially take up no additional space in my file system. As files change, the snapshot space grows, since the original blocks must be retained along with the new blocks. Still, saving the snapshot will require the full amount of space, since I am creating a "file" containing the snapshot of the data (which happens to be all of the original data). It is relatively easy to turn the snapshot itself into a stream of data: simply "send" the snapshot to a file. En route to the file, pass it through gzip for any extra compression (remember, we'll be paying for this space):

zfs send media/mypictures@20070607 | gzip > ~/backups/20070607.gz

Mileage varies with compression on snapshots.

Next, I uuencode the file to prepare it to be sent over the Internet. The uuencode process expands the file size by about 35%, so it's highly likely that any gains we made through compression are taken back out by uuencoding. Here is the uuencoding process from the command line:

uuencode 20070607.gz 20070607.gz > 20070607.gz.uu

That's it on the Solaris-side, now I can send the snapshot to Amazon S3.

I will assume that a "bucket" is already created and that we are merely sending the final, uuencoded snapshot to the Amazon S3 bucket. To be honest, I tried using Curl, Perl, and a variety of other things, and I couldn't quickly get the right libraries to create the signatures; I just hate scrounging around the Internet for the right this or that, changing compilation flags, recompiling and... So, I went with the Java REST approach.

I use the Amazon S3 Library for REST in Java. It has classes for all of your favorite Amazon S3 operations and was quite easy to use. You can see the whole program with the keys removed, or you can just take a look at the essential pieces of code for uploading here:


AWSAuthConnection conn =
    new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey);
S3Object object = null;

try {
    // length is the size of the uuencoded snapshot file
    byte[] bytes = new byte[(int) length];
    /* READ SNAPSHOT INTO BYTE ARRAY HERE */
    object = new S3Object(bytes, null);
} catch (IOException ioe) {
    ioe.printStackTrace();
}

Map headers = new TreeMap();
headers.put("Content-Type", Arrays.asList(new String[] { "text/plain" }));
System.out.println(
    conn.put(bucketName, keyName, object, headers).connection.getResponseMessage()
);



Run the code and you are GOOD TO GO!
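
In case you are wondering what could go in the READ SNAPSHOT INTO BYTE ARRAY placeholder, here is a minimal sketch using plain java.io. The class and method names are mine, not part of the Amazon library; it simply slurps the whole uuencoded snapshot file into memory:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class SnapshotReader {
    // Read the whole (gzipped, uuencoded) snapshot file into memory so it
    // can be wrapped in an S3Object.  Fine for a couple of hundred meg;
    // see the heap limit caveat in the issues list below.
    public static byte[] readSnapshot(String path) throws IOException {
        File file = new File(path);
        byte[] bytes = new byte[(int) file.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        try {
            in.readFully(bytes);
        } finally {
            in.close();
        }
        return bytes;
    }
}

With a helper like that, the placeholder collapses to byte[] bytes = SnapshotReader.readSnapshot("20070607.gz.uu"); and the length variable goes away.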

Retrieve and Restore
The process of retrieving and restoring the snapshot when you lose your data, or want to return to a previous point in your history, is relatively simple as well: simply reverse the process above. Here is the essential Java code (using the Amazon S3 Java REST library again) for retrieving the blob from Amazon S3:


AWSAuthConnection conn =
    new AWSAuthConnection(awsAccessKeyId, awsSecretAccessKey);

System.out.println("----- getting object -----");
byte[] bytes = conn.get(bucketName, args[0], null).object.data;
/* WRITE THE DATA TO A NEW SNAPSHOT FILE */
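
And here is one way to handle the WRITE THE DATA TO A NEW SNAPSHOT FILE step, again just a sketch with my own class and method names:

import java.io.FileOutputStream;
import java.io.IOException;

public class SnapshotWriter {
    // Dump the bytes retrieved from Amazon S3 into a local file so that it
    // can be uudecoded and gunzipped below.
    public static void writeSnapshot(byte[] bytes, String path) throws IOException {
        FileOutputStream out = new FileOutputStream(path);
        try {
            out.write(bytes);
        } finally {
            out.close();
        }
    }
}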

Once you have your uuencoded, gzipped snapshot back from Amazon S3, decode it and decompress the snapshot like this (the reverse of what you did before):


# uudecode 20070607.gz.uu
# gunzip 20070607.gz

Now you have to decide what to do with your snapshot. I renamed the existing mypictures file system and restored my old snapshot in its place, giving me a complete time travel back to my snapshot. Here are the commands:


# zfs rename media/mypictures media/mypictures.old
# zfs receive media/mypictures < 20070607

That's it! Going to /export/media/mypictures will bring me to the pictures I snapshotted on June 7, 2007!

Issues with the Implementation
Yes, there are plenty of issues with the implementation above...I will simply call it a "reference implementation", right?
* Amazon S3 size limitations - In the "Terms of Use", Amazon S3 specifies the following: "You may not, however, store "objects" (as described in the user documentation) that contain more than 5Gb of data, or own more than 100 "buckets" (as described in the user documentation) at any one time." As a result, one would want to slice the snapshot up appropriately so as to conform to the Amazon S3 limitations, or possibly work with Amazon S3. The limitation IS completely reasonable though, due to lengths and limitations in the HTTPS protocol itself; anything bigger and you will have to "chunk" it up for storage...this will take many more than 3 lines of code, but not too much (see the sketch after this list).
* Encryption - Data should be encrypted appropriately before being sent to a third-party storage site.
* Cron Job - Timely snapshots should be automated with a cron job.
* Non-Java - It would be nice to do the whole process from scripts, but I got hung up on signing the header and streaming it over Curl, so I hopped to my native tongue (Java).
* Heap Limits in Java - It is insane to read a 5GB file into Java, but I was being simple and quick; I think Curl would be a better solution if I could get the header signing right.
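
For the size and heap issues above, here is a rough sketch of how the upload could be chunked, assuming the same AWSAuthConnection and S3Object classes from the library used earlier; the 100 MB chunk size, the .partN key naming, and the class name are my own inventions:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// AWSAuthConnection and S3Object come from the same Amazon S3 REST library
// used in the snippets above (package import omitted here).
public class ChunkedUploader {
    // Upload the snapshot as a series of objects no larger than CHUNK_SIZE,
    // keyed baseKey.part0, baseKey.part1, ... so that each object stays well
    // under the Amazon S3 5 GB limit and never fills the Java heap.
    static final int CHUNK_SIZE = 100 * 1024 * 1024; // 100 MB per object

    public static void upload(AWSAuthConnection conn, String bucket,
                              String baseKey, File file) throws IOException {
        Map headers = new TreeMap();
        headers.put("Content-Type",
            Arrays.asList(new String[] { "application/octet-stream" }));

        DataInputStream in = new DataInputStream(new FileInputStream(file));
        try {
            long remaining = file.length();
            int part = 0;
            while (remaining > 0) {
                int size = (int) Math.min(CHUNK_SIZE, remaining);
                byte[] chunk = new byte[size];
                in.readFully(chunk);                  // read the next slice
                S3Object object = new S3Object(chunk, null);
                conn.put(bucket, baseKey + ".part" + part, object, headers)
                    .connection.getResponseMessage(); // issue the PUT
                remaining -= size;
                part++;
            }
        } finally {
            in.close();
        }
    }
}

Restoring would then mean fetching baseKey.part0, baseKey.part1, and so on in order and appending them to one local file before uudecoding and gunzipping.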

Total Cost
I incurred a total cost of $0.07 to my Amazon Web Services account for the raw storage and bandwidth for this blog post, to be charged July 1, 2007. As I want to save a little money for my company, I will swallow the cost and call it a day (or two).

Comments:

Nice article!

For uploading filesystem snapshots, you definitely want a user-agent that does streaming. I recommend the s3curl sample program. It is a wrapper around curl that creates the appropriate HTTP Authorization header from your curl command line arguments.

One question: What was the purpose of the uuencode step? Amazon S3 has no problem storing binary data.

Posted by John Cormie on June 08, 2007 at 12:59 PM MDT #

Thank you! And you are, of course, correct. The uuencoding is a holdover from the Remote Monitoring work I do; I guess I do it gratuitously. I went through the article and did a strikethrough on the uuencode sections.

Just an FYI, here is a list of file sizes:

  • Raw Snapshot: 143,110,056
  • .gz Snapshot: 139,541,587
  • .gz.uu Snapshot: 192,257,330

Thanks for the catch!

Posted by Paul Monday on June 10, 2007 at 03:47 PM MDT #
