This week we are going to focus on storage. Storage is a slippery slope and difficult conversation to have. Are we talking about a file synchronization like dropbox.com, google.com/docs, or box.com? Are we talking about raw block storage or long term archive storage? There are many services available from many vendors. We are going to focus on block storage in the cloud that can be used for files if desired or for backups of databases and virtual machines. Some of the cloud vendors have specific focused storage like Azure tables that offer a noSQL type storage or Amazon S3 allowing you to run a website without a web server. Today we will look at the Oracle IaaS Storage set of products. This is different than the Oracle PaaS Documents option which is more of a Google Docs like solution. The IaaS Storage is a block of storage that you pay for either on a metered usage or non-metered usage basis.
Notice from the cloud.oracle.com web page, we click on Infrastructure and follow the Storage route. We see that we get the raw block storage or the archive storage as options. We also have the option of an on-site cache front end that reduces latency and offers an NFS front end to the users providing more of a document management strategy rather than a raw block option.
Before we dive a little deeper into the options and differences between the storage appliance, spinning disk, and spinning tape in the cloud, we need to have a discussion about pricing and usage models. If you click on the Pricing tab at the top of the screen you see the screens below.
Metered pricing consists of three parts. 1) how much storage are you going to start with, 2) how much storage are you going to grow to, and 3) how much are you going to read back? Metering is difficult to guestimate and unfortunately it has a significant cost associated with being wrong. Many long term customers of AWS S3 understand this and have gotten sticker shock when the first bill comes in. The basic cost for outbound transfer is measured on a per GB basis. The more that you read across the internet, the more you pay. You can circumvent this by reading into a compute server in the Oracle cloud and not have to pay the outbound transfer. If, for example, you are backing up video surveillance data and uploading 24 hours of video at the end of they day, you can read the 24 hour bundle into a compute server and extract the 10-15 minutes that you are interested in and pay for the outbound charges on compute for the smaller video file.
Non-Metered pricing consists of one part. How much storage are you going to use over the year. Oracle does not charge for the amount of data transferred in-bound or out-bound with this storage. You can read and write as much as you want and there is no charge for data transfer across the internet. In the previous example you could read the 24 hours of video from the cloud storage, throw away 90% of it from a server in your data center, and not incur any charges for the volume of transfer.
Given that pricing is difficult to calculate, we created our own spreadsheet to estimate pricing as well as part numbers that should be ordered when consuming Oracle cloud resources. The images below show the cost of 120 TB of archive storage, metered block storage, and non-metered block storage.
Note that the data transfer price is non-trivial. Reading the data back from the cloud can get significantly more expensive than the cost of the storage itself. A good rule of thumb is the cost of spinning disk in the cloud should not exceed $30/TB/month or $400/TB/year. If you look at the cost of a NetApp or EMC storage system, you are looking at $3K-$4K/TB purchase price with 10% annual maintenance per year ($300-$400). If you are currently running out of storage and your NFS filer is filling up, you can purchase cloud resources for a few months and see if it works. It won't cost you anything more than paying support and you can grow your cloud storage as needed rather than buying 3 years ahead as you would with a filer in your data center. The key issue with cloud storage is latency and access times. Access to a filer in your data center is typically 10ms where access time to cloud storage is typically 80+ms. All cloud storage vendors have on site appliance solutions that act as cache front ends to address this latency problem. Oracle has one that talks NFS. Amazon has one that talks iSCSI. Microsoft has one that talk SMB. There truly is no single vendor with a generic solution that addresses all problems.
Enough with the business side of storage. Unfortunately, storage is a commodity so the key conversation is economics, reliability, and security. We have already addressed economics. When it comes to reliability the three cloud vendors address data replication and availability in different ways. Oracle triple mirrors the data and provides public-private key encryption of all data uploaded to the cloud. Data can be mirrored to another data center in the same geography but can not be mirrored across an ocean. This selection is done post configuration and is tied to your account as a storage configuration.
Now to the ugly part of block storage. Traditionally, block storage has been addressed through an operating system as a logical unit or aggregation of blocks on a disk drive. Terms like tracks and sectors bleed into the conversation. With cloud storage, it is not part of the discussion. Storage in the cloud is storage. It is accessed through an interface called a REST api. The data can be created, read, updated, and deleted using html calls. All of this is documented in the Oracle Documents - Cloud Storage web site.
The first step is to authenticate to the cloud site with an instance name, username, and password. What is passed back is an authentication token. Fortunately, there are a ton of tools to help read and write HTML code and are specifically tuned to help create headers and JSON structured data packets for the REST api interfaces. The screen below shows the Postman interface available through Chrome. A similar one exists for Firefox called RESTClient API. Unfortunately, there is no extension for Internet Explorer.
The first step is to get an auth header by typing in the username and password into the Basic Authentication screen.
Once we are authorized, we connect to the service by going to https://storage.us2.oraclecloud.com/v1/Storage-(identity domain) where identity domain is the cloud provider account that we have been assigned. In our example we are connecting to metcsgse00029 as our identity domain and logging in as the user cloud.admin. We can see what "containers" are available by sending a GET call or create a new container by sending a PUT call with the new container name at the end of our html string. I use the word container because the top level of storage consists of different areas. These areas are not directories. They are not file systems. The are containers that hold special properties. We can create a container that is standard storage which represents spinning disk in the cloud or we can create a container that is archive storage which represents a tape unit in the cloud. This is done by sending the X-Storage-Class header. If there is no header, the default is block storage and spinning disk. If the X-Storage-Class is assigned to Archive it is tape in the cloud. Some examples of creating a container are shown below. We can do this via Postman inside Chrome or a command line
From the command line this would look like
c url -is -X GET -H "X-Storage-User:Storage-$ODOMAIN:$OUID"
This should return an html header with HTTP 200 OK and an embedded header of X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706. Note that the value after the X-Auth-Token is what we will use to pass into all other requests. This token will change with each request and is good for 30 minutes from first execution. Once we have the authentication finished we either change the request type from a GET to a PUT and append the container name to the end. The screen above shows how to do this with Postman. The results should look like the screen below. We can do this from the command line as show below as well.
c url -is -X PUT -H "X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706"
In this example we create a new container from the command line called new_area. We can verify this by reviewing the cloud storage by changing the PUT to a GET.
c url -is -X GET -H "X-Auth-Token: AUTH_tk578061b9ae7f864ae9cde3cfdd75d706"
Both of these methods allow us to see the storage that we created. I personally do not like this interface. It is not intended to be human consumable. Uploading and downloading a file is difficult at best. A user interface that makes dragging and dropping files is desirable. This is where dropbox and google docs shine. They allow you to drag and drop as well as synchronize directories to cloud storage. The Oracle Storage Cloud is not intended to be this solution. It is designed so that you can drop a new library into your rman backup and backup straight from your database to the cloud. You can point your ComVault or Legato backup software to a cloud instance and replicate your data to the cloud. If you want a human readable interface you need to purchase something like the Cloudberry Explorer from Cloudberry. This give you a Windows Explorer like interface and allows your to drag and drop files, create containers and directories, and schedule archives or backups as desired.
Note that the way that you create a block storage container vs an archive container is a simple menu selection. Retrieving the archive storage is a little more complex because the tape unit must stage the file from the tape to disk and notify you that the restoration has been completed. This is a little more complex and we will defer this discussion to a later blog.
Copying files is little more than dragging and dropping a file between sections of a window in Cloudberry.
For completeness, I have included the command line screen shots so that you can see the request/response of a command line interaction.
It is important to remember our objective. We can use the cloud block storage as a repository for things like database and a holding point for our backups. When we configure a database in the cloud, we backup and restore from this storage. This is configured in the database provisioning screen. The Storage-metcsgse00029/backup is the location of RMAN backup and restores. The backup container is created through the REST api or Cloudberry interface. We can also attach to the cloud storage through the cloud storage appliance software which runs inside a virtual machine and listens for NFS requests and translates them into REST api calls. A small disk is attached to the virtual machine and it acts as a cache front end to the cloud storage. As files are written via NFS they are copied to the cloud storage. As the cache fills up, files contents are dropped from local storage and the metadata pointing to where the files are located are updated relocating the storage to the cloud rather than the cache disk. If a file is retrieved via NFS, the file is read from cache or retrieved from the cloud and inserted into the cache as it is written to the client that requested it.
In summary, we covered the economics behind why you would select cloud storage over on site storage. We talked about how to access the storage from a browser based interface, web based interface, or command line. We talked about improving latency and security. Overall, cloud based storage is something that everyone is familiar with. Products like Facebook, Picaso, or Instagram do nothing more than store photos in cloud storage for you to retrieve when you want. You pay for these services by advertisements injected into the web page. Corporations are turning more and more towards cloud storage as a cheaper way to consume long term storage at a much lower price. The Oracle Storage Cloud service is first of three that we will evaluate this week.