X

Pat Shuff's Blog

Storage in Amazon S3

To be honest, I am going out on a limb here. I know just enough about Amazon S3 to be dangerous. Most of the reference material that I used was from the amazon web site or Safari Books. The books that I relied upon the most are
The key use cases according to the S3 Essentials book are file hosting, storing data on mobile based applications, static web hosting, video hosting, and data backup. We will look at a couple of these configurations and how to deploy and use them.

With Oracle Storage Cloud Services we had the concept of a container. This container had characteristics like spinning disk, archive, ownership, and other features related to ownership and security. Amazon S3 has a similar concept but they call this container a bucket. A bucket can contain nested folders and has properties associated with it. If we look at the AWS Console, we see six types of storage and content delivery

  • S3 - block storage in the cloud
  • Cloud Front - content delivery network
  • Elastic File System - fully managed file system for EC2
  • Glacier - tape storage in the cloud
  • Snowball - large scale data transport to get data into and out of S3
  • Storage Gateway - an appliance to reduce latency for S3 storage

We will be focusing on S3 and Glacier. Snowball is a mechanism to transport large amounts of data to and from the Amazon data center. The Storage Gateway is an appliance in a data center to reduce latency and provide a quicker access to data stored in S3. We will need to dive a little deeper into S3 and the Storage Gateway but not the Cloud Front, and the Elastic File System in this blog.

We first start with the AWS console and click on the S3 console. We can create a new bucket by clicking on Create Bucket.



When we create a new S3 bucket, we can name it and define which data center we want the storage to be allocated into. We have to be careful when we create a bucket name. The namespace is shared with all users. If we want to create a common name, it will probably be used by someone else and we will see an error in creating the name.

If we look at the properties associated with this storage we can see that we have a variety of options to configure.

  • Permissions
  • Static Web Hosting
  • Logging
  • Events
  • Versioning
  • Lifecycle
  • Cross-Region Replication
  • Tags
  • Requester Pays

Let's go through each of these individually. With Permissions, you have the ability to control who can see, modify, delete, and download the contents of the bucket. Bucket policies can get relatively complex and have a variety of conditions and restrictions applied to it. You can find out more at Detailing Advanced Policies. This feature allows you to restrict who can read content by ip address, access keys, or usernames.

Static web hosting allows you to create a web site based on the files in a container. If you have an index.html, it becomes to the basis for accessing all of the other files in this directory. This is both good and bad because you get the basic functionality of a web server but you don't get the configuration and access logs. It has some uses but is limited in how it works. It does make static web page presentation easy because you no longer need an EC2 instance, operating system, or application to host the web site.

Logging allows you to view how, who, and from where files were accessed. You can generate logs to look at access patterns and access locations.

Versioning allows you to keep past copies of files. If a file is edited and changed, previous versions and deltas are tracked. This is a good feature but does cause storage consumption to grow because you never delete older versions of a file but keep the deltas for a fixed amount of history.

Lifecycle allows you to automatically archive files from spinning disk to tape after a fixed amount of time and history of access. If no one has accessed a file in months, it can be written to Glacier for long term lower cost archive.

Cross-Region Replication allows you to replicate blocks between data centers automatically. This allows for high availability in the event that one data center fails or storage at one location is having significant problems.

Tags and Request Payer allows for charge-back features to allow people who consume resources to pay for the download and storage. The person creating the bucket is not charged for usage but has the mechanism to transfer the charges to the person reading the data.

Reading and writing to our newly created bucket requires a user interface or usage of the Amazon Rest api to transfer files. Amazon does provide a user interface to upload and edit the properties of the files and directories. We recommend using another interface like CloudBerry or other graphical tool or the command line utilities because this interface is a bit limiting.





This blog entry is significantly different from the one yesterday. Yesterday we started with pricing then got technical. Today we dove straight into the technical and ignored pricing. Let's dive into pricing. The cost of S3 storage is $30/TB/month plus outbound charges. I suggest using the S3 price list and the S3 price calculator to figure pricing. Attached are screen shots of pricing for 120 TB of storage using the calculator and screen shots of the price list.





One thing that we talked about with the Oracle Storage Cloud and have not talked about here is an on premise virtual machine to reduce latency. Amazon offers this with the AWS Storage Gateway. The key differences between the two products are 1) AWS Gateway uses iSCSI on the client side to provide storage access to the data center and 2) it cost $125/month/gateway. It solves the same latency problem but does it slightly differently. Unfortunately, we are not going to install and configure this virtual instance and play with it because it requires 8 virtual CPUs which is greater than my laptop will offer.


In summary, this is an initial review of S3 storage with Amazon AWS. We did not dive deep into Glacier or the Storage Gateway. We did not review elastic block services (EBS) because these are typically attached to EC2 instances. It is important to note that the focus of S3 is different than Oracle Storage Cloud Services but very similar. Files and directories can be stored in containers and access can be controlled. S3 extends services to provide things like video streaming, static web site hosting, and migrating data to and from tape in the cloud. You can use S3 for backup archives and generic block storage and access it via REST api or AWS api calls. Products like CloudBerry Explorer and S3 Explorer exist to help translate the human interface to S3 storage calls. The cost for S3 is roughly $30/TB/month with additional charges for outbound data on a per GB basis. Archive storage is roughly $7/TB/month with additional charges for data retrieval and outbound data on a per GB basis. The intent of this blog is not to say that one service is better than the other but provide resources to help you make your own decisions and decide what works best for your situation and corporation.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.