ZFS Snapshot and Amazon S3 (Part 1 of 2)
By pmonday on Jun 07, 2007
There are a few things on my mind these days: ZFS, Amazon S3, Project Blackbox, Why the "C" on my keyboard is sticking so much, OpenSolaris, Network Attached Storage, Storage System Blueprints, Communities, and on and on. I finally decided to sit and think about the economics of utility storage vs. rolling your own as well as whether I could create a very simple BluePrint for backing up and restoring ZFS to a storage utility, like Amazon S3. Here's a hint...ZFS to Amazon S3 is insanely simple.
Needless to say, I incurred my own credit card bills to bring this to you, since I figured the cost of backing up some of my pictures was going to be less than the effort of filling out an expense report. SO, since I burned the money, you are forced to sit through a 2 part blog post that first looks at the VERY HIGH LEVEL economics of storage utilities and then follows with the "recipe" for implementing a simple ZFS snapshot to Amazon S3 backup and restore process.
Before I got too involved with the implementation, I needed to see what the whole process would cost me; cost is king in a start-up. This is nowhere near a scientific breakdown, but it is based on how I've seen some friends do this type of thing, so it is decent. I would really suggest you do your own up-front planning, roll your own costs into the mix, and compare against colocation services with your own server, or even some of the ISPs around that give you liberal storage quotas.
Here are the simple requirements:
- Off-site backup location
- 40 GB/month of backup/archive, leveling off at 400 GB after 10 months (I keep uploading 40 GB/month but delete the oldest snapshots to stay at 400 GB)
- 2 times a year download of 40GB for restore
Notice I have not discussed access rates or access latency. My assumption is that "Internet Time" is sufficient for both. The choice of 40GB a month is "relatively" arbitrary, so please, please do your own pricing (using Sun Servers of course if you are doing a comparison).
- Amazon S3 - Pricing is located on the site and is based on bandwidth plus storage.
- The total price for each month's new 40GB is $10 ($6 for storage plus $4 for upload bandwidth)
- The raw storage cost increases each month until I hit 400GB ($6, $12, ... , $60 and remaining at $60/month afterwards).
- I planned two downloads per year of 40GB which will cost me $7.20 each, $14.40 per year
- The TOTAL cost for my 2 years of 400GB of storage should be about $1294.80.
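The arithmetic behind that total can be reproduced with a short sketch. The per-GB rates below are not quoted anywhere in this post; they are inferred from the figures above ($6 per 40GB stored, $4 per 40GB uploaded, $7.20 per 40GB downloaded), so check them against Amazon's current price list before relying on them. The function name and parameters are made up for illustration, and everything is kept in whole cents to avoid rounding surprises.

```python
# Rates in cents per GB, INFERRED from the figures in this post (assumption):
# $6 / 40GB stored, $4 / 40GB uploaded, $7.20 / 40GB downloaded.
STORAGE_CENTS_PER_GB_MONTH = 15
UPLOAD_CENTS_PER_GB = 10
DOWNLOAD_CENTS_PER_GB = 18


def baseline_cost_cents(months=24, upload_gb=40, cap_gb=400,
                        restores_per_year=2, restore_gb=40):
    """Total cost in cents when stored data grows by upload_gb each month
    until it reaches cap_gb, then stays level."""
    total = 0
    stored_gb = 0
    for _ in range(months):
        # Upload this month's data; old snapshots are dropped once at the cap.
        stored_gb = min(stored_gb + upload_gb, cap_gb)
        total += upload_gb * UPLOAD_CENTS_PER_GB
        total += stored_gb * STORAGE_CENTS_PER_GB_MONTH
    # Restores are billed as plain downloads.
    restores = restores_per_year * months // 12
    total += restores * restore_gb * DOWNLOAD_CENTS_PER_GB
    return total


if __name__ == "__main__":
    print(f"${baseline_cost_cents() / 100:.2f}")  # prints $1294.80
```

Changing `upload_gb`, `cap_gb`, or `months` is a quick way to try your own trajectory before committing to anything.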
At this point, I will give you a very important disclaimer. The cost seems relatively reasonable for a worry-free 400GB, but using some other trajectories, you can exceed the costs of your own server and hosting. The cross-over point for my overly simple world seems to be around a year when you are using terabyte economics. Colocation of a Sun Fire X4500 can be extremely competitive or better if you are looking at backups hitting the 24TB range. For the SmugMug architecture and storage requirements (well over the TB range), Don MacAskill does see great cost savings with Amazon S3.
Consider a simple modification: the storage is still capped at 400GB, but all 400GB is replaced each month, increasing the bandwidth consumption. This brings the entire cost for the 24-month span up to $2428.80. Multiply by another 10x for 4TB per month and we start talking real money (OK...to me, real money started at anything over $30). But raw storage and bandwidth are not the only concerns. What about the system administration that goes into rolling your own server and maintaining it? That costs money and time!
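Here is a minimal sketch of that modified scenario. As before, the per-GB rates are my inference from the dollar figures in this post rather than anything quoted from Amazon, and the function name is hypothetical. It assumes the full 400GB is stored from the first month and that the two 40GB restores per year stay the same.

```python
# Rates in cents per GB, INFERRED from the figures in this post (assumption).
STORAGE_CENTS_PER_GB_MONTH = 15
UPLOAD_CENTS_PER_GB = 10
DOWNLOAD_CENTS_PER_GB = 18


def replacement_cost_cents(months=24, stored_gb=400, monthly_upload_gb=400,
                           restores_per_year=2, restore_gb=40):
    """Total cost in cents when a fixed amount is stored and a fixed
    amount is re-uploaded every month."""
    storage = months * stored_gb * STORAGE_CENTS_PER_GB_MONTH
    upload = months * monthly_upload_gb * UPLOAD_CENTS_PER_GB
    restores = (restores_per_year * months // 12) * restore_gb * DOWNLOAD_CENTS_PER_GB
    return storage + upload + restores


if __name__ == "__main__":
    print(f"${replacement_cost_cents() / 100:.2f}")  # prints $2428.80
```

Passing `stored_gb=4000` and `monthly_upload_gb=4000` gives the 4TB-per-month case under the same inferred rates.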
You really need to ponder these types of decisions long and hard. I am a HUGE advocate of storage utilities for the short-handed, but I am also a huge advocate of rolling your own to get your own economies of scale (did I mention that Sun provides some great solutions?). Every company is unique and has its own requirements.
There are many, many factors to consider for your own business. There are CLEAR advantages to storage utilities like Amazon S3. Here are a few:
- Pay for what you use
- No system administration on your part
- Single throat to choke when something is down
- You reap the benefits of their efficiencies of scale
There are disadvantages too:
- You are somewhat locked into your storage utility
- Misjudging your requirements can result in high cost to you
- You cannot create better efficiencies of scale as you reach high amounts of data capacity (lack of hierarchical pricing in Amazon S3 based on aging is a bummer)
Still, Amazon S3 is clearly useful to startups and even for your own personal backups. SmugMug uses it liberally, as do a variety of other startups. We shouldn't rule out the other storage utilities around either; there are quite a few to choose from these days.
I have about 5GB of pictures that I need to keep backed up. At the storage price used above ($6 a month for 40GB), that backup would cost me under a dollar a month, or roughly $9 for a year...that is pretty reasonable when you consider they are not replaceable and my house may be hit by a power surge at a moment's notice.
Part 2 will be posted soon. I used the ZFS snapshot capability and a simple Java program to move data back and forth to the Storage Utility! Combining ZFS with Amazon S3 is a piece of cake.