Yes, our analogy is silly. Yes, our analogy is relevant. It is easy to decide that you want another job in another city so you hire a mover, pack up all your stuff, and move to another apartment. This is where our analogy breaks down. Cloud vendors charge you for every piece of furniture that you take out of the building. They charge you to use the stairs or elevator. They charge you every time a moving van exits the building full of furniture and boxes of clothes. It is free to bring stuff in because it locks you into the apartment. Just don't try to take anything out. Remember that storage closet or garage that you got with your apartment, you can open the door and put stuff in for free but if you carry anything out (even if you just relocate it to your apartment) you get charged per item that you carry across the threshold.
If you look at storage from any cloud vendor they offer a metered storage service. The same is true for compute services. You can lease a virtual processor and memory and grind on data all that you want. The catch is when you want to transfer your files or report results of you analysis to your desktop computer, you get charged on a per gigabyte transferred across the internet. Cost calculators help you calculate these costs but they are a little hard to estimate and use to calculate outbound data charges. Amazon, for example, has a calculators that you can use. The AWS pricing calculator allows you to look at the cost of all cloud services.
Let's walk through the cost of Amazon Glacier. The price list says that you should pay $0.007/GB/month or $7/TB/month to keep things in cold storage. We will use 120 TB as our basis for analysis. We put this as the amount to store and see the cost of storing the data is $860/month.
If we plan on reading back 10% of this data during the month the price goes up to $2217. The bulk of these charges are the outbound charges. The cost goes to $921 if we read the data to an EC2 instance and not all the way back to our data center or desktop computer. To use our apartment analogy, you are paying $860 to get a storage garage. You pay $61 every time you take something out and move it to your apartment. You can put all you want into the storage area (as long as your don't exceed the space of the storage unit) but taking something out will cost you. If you put your recently retrieved item in your car or a truck and drive it out the gate you get a surcharge of $1300. It is important to remember that pulling more stuff out of your storage will cost you more. Putting this in terms of computer archive, you can store all of your emails, contracts, customer transactions, patient records in long term storage. If your on-site storage fails for some reason or if you get a legal request to review five years of data, you can pull the data back from cold storage. It will cost you to pull the data back but it is still cheaper than keeping the seven years of longer data on spinning disk in your data center (estimate $3K/TB plus 10% per year for spinning disk in your data center).
We can do the same calculation for cloud storage using S3. We can store 120 TB for roughly $3950/month. If we want to read back 10%, or 12 TB, of that data, it will cost us $5150 or $1200 additional.
We can reduce the cost by using lower speed storage in the cloud. We put the S3 data into the infrequent category to save money. This drops the cost to just over $3K which does save us about $2100/month. We agree to pay a lower cost to get higher latency and longer retrieval times. It is better than using tape in the cloud and we can save some money with this option.
We can opt for reduced redundancy storage (aka non-mirrored and non-replicated data) but we risk data loss since we will only have a single copy in the cloud. This drops the cost to $4300 with the data retrieval but we have to weigh the cost vs data loss risks.
Let's not pick on Amazon. How does this compare to Azure? Unfortunately, we can't start with Microsoft tape in the cloud, they don't offer the service. We must start with block storage in the Azure cloud. Microsoft has an Azure pricing calculator that you can use to perform the same calculations. The calculator and pricing is a little difficult to use when you first get into it. You basically need to put together the calculation a piece at a time. You need to factor in the cost of the storage and the cost of transferring the data from the cloud to your data center. This is done in two different pieces. An example of what we are looking for can be seen below.
We need to piece together the calculator. First we add the storage component then the bandwidth component. There is a transaction component but this amount is trivial and we are going to ignore it for simplicity.
If we look at the options for Azure storage, we can basically select blob storage in different zones. In the grand scheme of things, the cost is not siginificantly higher one way or another. The basic cost is about the same.
The third class of storage that are going to look at is the Oracle Storage Cloud Service. We can look at Oracle Storage Cloud Service as well as Oracle Archive Cloud Service. The Archive service compares directly to the Amazon Glacier service except that it is $1/TB/month and suffers the same transfer charges for outbound data. The Oracle Storage Cloud Service is similar to the Amazon S3 and Azure Blob Storage Service but it is offered either as a metered service (as is S3 and Blobs) and un-metered services. Unfortunately, Oracle does not provide a cost calculator for general use. The Value Added Distributors are given a copy of the calculator but it is not generally available. The key difference with the Oracle storage services is that there are two significant flavor differences; metered and non-metered. The metered services are charged just like the Microsoft and Amazon services. You pay for what you use on a per GB basis and pay for outbound data transfer. An example of the pricing calculator is shown below. Note that we do need to have a good guestimate on how much data we will transfer outbound across the internet. These charges are not consumed if you are reading the data to a compute engine in the cloud unlike S3 which still consumes cost just for reading the data off the disk.
The most significant differential in storage offerings is the non-metered storage. Oracle offers storage in blocks that you reserve and allocate for 12 months. This is different from the metered storage in that metered can start with 10 TB and grow to 120 TB over the year. With the non-metered storage, you start with 120 TB and end with 120 TB. You can extend your contract and grow storage but you basically extend a new contract for more storage. You can not shrink your storage and pay for less. The benefit of this is that you don't have to pay for outbound data transfer. You can read and write as much as you want and not get charged for transferring the data across the internet. A pricing calculator for this is simple. How much do you need and how long do you need it?
If we piece all of this together and look a price comparison between the three service providers, the answer of which is cheapest comes down to it depends. Oracle non-metered storage has a significant advantage if you are planning on reading back your data at high or unpredictable rates. Amazon S3 infrequent is the cheapest if you don't plan on reading back your data and want it as an insurance policy only. I honestly would go with Glacier or Oracle Archive if this is the case since it is an order of magnitude cheaper. The chart below compares 120 TB of storage and the variable charge for reading back this data on a monthly basis. If you have 120 TB of storage and plan on reading back 120 TB on a monthly basis, the Oracle non-metered storage is significantly cheaper. If you are only planning on reading back 12 to 24 TB per month the cost is about the same for all of the services.
In summary, one option is not clearly better than the other (except for high read rates) and this blog is intended to help you decide on what fits your needs best. Pricing calculators can help with the cost based on transfer rates. It is important to remember that storage transfer is a significant part of the calculation. It is also important to look at your usage model. We assumed that you started with 120 TB and ended with 120 TB for our analysis. If you start with 12 TB and grow to 120 TB, the pricing calculation will be a little different. Neither the Amazon nor Azure calculators will help you run this simulation and you will have to calculate everything on a month by month basis. It is also interesting to take 120 TB of on-premise storage and assume that each TB can be purchased at $3K/TB. If we assume 10% annual hardware maintenance and a three year amortization, the charge for on-premise storage is $1030/month which might be more or less than cloud based storage. Your results might vary.