
Loading Data to the Object Store for Autonomous Data Warehouse Cloud

By: Hermann Baer | Senior Director Product Management

So you got your first service instance of your autonomous data warehouse set up, you experienced the performance of the environment using the sample data, went through all the tutorials and videos, and are getting ready to rock-n-roll. But the one thing you’re not sure about is this Object Store. Yes, you used it successfully as described in the tutorial, but what’s next? And what else is there to know about the Object Store?

First and foremost, if you are interested in understanding a bit more about what this Object Store is, you should read the following blog post from Rachna, the Product Manager for (among other things) the Object Store. It introduces the Object Store, shows how to set it up and manage files with the UI, and gives a couple of simple command line examples (don’t get confused by the term ‘BMC’; that’s the old name of Oracle’s Cloud Infrastructure, and the same holds for the command line utility, which is now called oci). That post gets you familiar with the basic concepts of the Object Store and a cloud account (tenant).

The documentation and blog posts are great, but now you actually want to use it to load data into ADWC. This means loading more (and larger) files, more need for automation, and more flexibility. This post focuses on exactly that: becoming productive with command line utilities without being a developer, and leveraging the power of the Oracle Object Store to upload many files in one go and even upload larger files in parallel without any major effort.

The blog post will cover both:

  • The Oracle oci command line interface for managing files
  • The Swift REST interface for managing files

 

Using the oci command line interface

The Oracle oci command line interface (CLI) is a tool that enables you to work with Oracle Cloud Infrastructure objects and services. It’s a thin layer on top of the OCI (REST) APIs and one of Oracle’s open source projects (the source code is on GitHub).

Let’s quickly step through what you have to do for using this CLI. If you do not want to install anything, that is fine, too. In that case feel free to jump to the REST section in this post right away, but you’re going to miss out on some cool stuff that the CLI provides you out of the box.

Getting going with the utility is really simple, as simple as one-two-three:

  1. Install oci cli following the installation instructions on GitHub.
    I just did this on an Oracle Linux 7.4 VM instance that I created in the Oracle Cloud and had the utility up and running in no time.
     
  2. Configure your oci cli installation.
    You need a user created in the Oracle Cloud account that you want to use, and that user must have the appropriate privileges to work with the object store. A keypair is used for signing API requests, with the public key uploaded to Oracle. Only the user calling the API should possess the private key. All this is described in the configuration section of the CLI. 

    That is probably the part of the setup that takes the most time. Make sure you have UI console access when doing this, since you have to upload the public key for your user; a minimal sketch of the key generation and configuration is shown right after this list.
     
  3. Use oci cli.
    After successful setup you can use the command line interface to manage your buckets for storing all your files in the Cloud, among other things.
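
As promised in step 2, here is a minimal sketch of the key generation and configuration, assuming a Linux or Mac environment with openssl available; the file names are just a convention, and the interactive oci setup config walkthrough is one of several ways to create the configuration file:

# generate an API signing keypair (the private key never leaves your machine)
mkdir -p ~/.oci
openssl genrsa -out ~/.oci/oci_api_key.pem 2048
chmod go-rwx ~/.oci/oci_api_key.pem
openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem

# let the CLI walk you through creating ~/.oci/config
# (it asks for your user OCID, tenancy OCID, region, and the key file location)
oci setup config

Upload the content of the public key file for your user in the UI console and you are ready to go.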

 

First steps with oci cli

The focus of the command line interface is on ease-of-use, making its usage as self-explanatory as possible with a comprehensive built-in help system. Whenever you want to look something up without leaving the utility, use the --help, -h, or -? option for a command, irrespective of how many parameters you have already entered. So you can start with oci -h and let the utility guide you.
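
For example, to drill down from the top level to the exact options for uploading a file, you would work your way through the help system like this (a quick sketch; the help output itself is not shown here):

$ oci -h                   # lists all available service categories
$ oci os -h                # help for the object storage (os) category
$ oci os object put -h     # all options for uploading an object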

For the purpose of file management the important category is the object store category, with the main tasks of:

  • Creating, managing, and deleting buckets
    This task is probably done by an administrator for you, but we will cover it briefly nevertheless
     
  • Uploading, managing, and downloading objects (files)
    That’s your main job in the context of the Autonomous Data Warehouse Cloud

That’s what we are going to do now.

 

Creating a bucket

Buckets are containers that store objects (files). Like other resources, buckets belong to a compartment, a collection of resources in the Cloud that can be used as an entity for privilege management. To create a bucket you have to know the compartment id. That is the only time we have to deal with these cloud-specific unique identifiers; all other object (file) operations use names.
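
If you do not have those details handy, the CLI can look them up for you. Here is a quick sketch; the tenancy OCID is a placeholder that you get from your Cloud administrator or the UI console:

# your object storage namespace (the tenant name used in all os commands)
$ oci os ns get

# list the compartments in your tenancy to find the compartment OCID
$ oci iam compartment list --compartment-id ocid1.tenancy.oc1..<your_tenancy_ocid>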

So let’s create a bucket. The following creates a bucket named myFiles in my account ADWCACCT in a compartment given to me by the Cloud administrator.

$ oci os bucket create --compartment-id ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq --namespace-name adwcacct --name myFiles

{
  "data": {
    "compartment-id": "ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq",
    "created-by": "ocid1.user.oc1..aaaaaaaaomoqtk3z7y43543cdvexq3y733pb5qsuefcbmj2n5c6ftoi7zygq",
    "etag": "c6119bd6-98b6-4520-a05b-26d5472ea444",
    "metadata": {},
    "name": "myFiles",
    "namespace": "adwcaact",
    "public-access-type": "NoPublicAccess",
    "storage-tier": "Standard",
    "time-created": "2018-02-26T22:16:30.362000+00:00"
  },
  "etag": "c6119bd6-98b6-4520-a05b-26d5472ea733"
}

The operation returns with the metadata of the bucket after successful creation. We’re ready to upload and manage files in the object store.

 

Upload your first file with oci cli

You can upload a single file very easily with the oci command line interface. And, as promised, you do not even have to remember any ocid for this.

$ oci os object put --namespace adwcacct --bucket-name myFiles --file /stage/supplier.tbl

Uploading object  [####################################]  100%
{
  "etag": "662649262F5BC72CE053C210C10A4D1D",
  "last-modified": "Mon, 26 Feb 2018 22:50:46 GMT",
  "opc-content-md5": "8irNoabnPldUt72FAl1nvw=="
}

After successful upload you can check the md5 sum of the file; that’s basically the fingerprint verifying that the data on the other side (in the cloud) is not corrupt and is the same as your local copy (on the machine where the data is coming from). The only “gotcha” is that OCI uses base64 encoding, so you cannot just do a simple md5. The following command solves this for me on my Mac:

$ openssl dgst -md5 -binary supplier.tbl |openssl enc -base64
8irNoabnPldUt72FAl1nvw==
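
If you want to automate that check, a tiny script along the following lines does the comparison for you (a sketch assuming Linux or Mac with openssl in the path; the script name check_md5.sh is made up):

#!/bin/bash
# usage: check_md5.sh <file> <opc-content-md5 value returned by the upload>
file=$1
remote_md5=$2

# compute the local md5 in the same base64 encoding that the object store returns
local_md5=$(openssl dgst -md5 -binary "$file" | openssl enc -base64)

if [ "$local_md5" = "$remote_md5" ]; then
  echo "OK: $file matches ($local_md5)"
else
  echo "MISMATCH for $file: local $local_md5, remote $remote_md5"
fi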

Now that’s a good start. I can use this command in any shell script, like the following one, which loads all files in a folder sequentially to the object store:

for i in *.tbl
do
  oci os object put --namespace adwcacct --bucket-name myFiles --file "$i"
done

You can adapt it to load multiple files in parallel, load only files that match a specific name pattern, and so on. You get the idea: whatever you can do with a shell, you can do here.
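
As a simple sketch of that idea, the same loop can be turned into a parallel upload with nothing more than xargs (four parallel transfers is an arbitrary choice here; adjust -P to taste):

# upload all *.tbl files, four at a time
printf '%s\n' *.tbl | xargs -P 4 -I {} \
  oci os object put --namespace adwcacct --bucket-name myFiles --file {}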

Alternatively, if it's just about loading all the files in a directory, you can achieve the same with the oci cli by using its bulk upload capability. The following shows this briefly:

oci os object bulk-upload -ns adwcacct -bn myFiles --src-dir /MyStagedFiles

{
  "skipped-objects": [],
  "upload-failures": {},
  "uploaded-objects": {
    "chan_v3.dat": {
      "etag": "674EFB90B1A3CECAE053C210D10AC9D9",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "/t4LbeOiCz61+Onzi/h+8w=="
    },
    "coun_v3.dat": {
      "etag": "674FB97D50C34E48E053C230C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "sftu7G5+bgXW8NEYjFNCnQ=="
    },
    "cust1v3.dat": {
      "etag": "674FB97D52274E48E053C210C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:44:06 GMT",
      "opc-content-md5": "Zv76q9e+NTJiyXU52FLYMA=="
    },
    "sale1v3.dat": {
      "etag": "674FBF063F8C50ABE053C250C10AE3D3",
      "last-modified": "Tue, 13 Mar 2018 17:44:52 GMT",
      "opc-content-md5": "CNUtk7DJ5sETqV73Ag4Aeg=="
    }
  }
}
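
To double-check what actually landed in the bucket you can list its objects afterwards; a quick sketch (the --prefix filter is optional):

# list everything in the bucket
$ oci os object list -ns adwcacct -bn myFiles

# or only the objects whose names start with a given prefix
$ oci os object list -ns adwcacct -bn myFiles --prefix chan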

Uploading a single large file in parallel 

Ok, now we can load one or many files to the object store. But what do you do if you have a single large file that you want to upload? The oci command line offers built-in multi-part loading, so you do not need to split the file beforehand: it can (A) transparently split the file into parts of a given size and (B) control the parallelism of the upload.

$ oci os object put -ns adwcacct -bn myFiles --file lo_aa.tbl --part-size 100 --parallel-upload-count 4

While the load is ongoing you can list all in-progress uploads, but unfortunately without any progress bar; the progress bar is reserved for the initiating session:

$ oci os multipart list -ns adwcacct -bn myFiles
{
  "data":
   [    
    {
      "bucket": "myFiles",
      "namespace": "adwcacct",
      "object": "lo_aa.tbl",
      "time-created": "2018-02-27T01:19:47.439000+00:00",
      "upload-id": "4f04f65d-324b-4b13-7e60-84596d0ef47f"
    }
  ]
}
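
If an upload gets interrupted and its parts linger around, you can clean it up using the upload-id from that listing. Here is a sketch of the cleanup; parameter names can vary slightly between CLI versions, so check oci os multipart abort -h first:

# abort the stale multipart upload and discard its already-uploaded parts
$ oci os multipart abort -ns adwcacct -bn myFiles \
    --object-name lo_aa.tbl --upload-id 4f04f65d-324b-4b13-7e60-84596d0ef47f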

 

While a serial upload of a single file gave me somewhere around 35 MB/sec on average, the parallel load sped things up quite a bit, so it’s definitely cool functionality (note that your mileage will vary and depends mostly on your Internet/proxy connectivity and bandwidth).

If you’re interested in more details about how this works under the covers, here is a blog post from Rachna that explains the inner workings of this functionality.

 

Using the Swift REST interface

Now, after having covered the oci utility, let’s briefly look into what we can do out of the box, without the need to install anything. Yes, without installing anything you can leverage the REST endpoints of the object storage service. All you need to know is your user name, your SWIFT password, and your environment details, e.g. which region you are uploading to, the account (tenant), and the target bucket.

This is where the real fun starts, and this is where it can become geeky, so we will focus only on the two most important aspects of dealing with files and the object store: uploading and downloading files.

 

Understanding how to use OpenStack Swift REST

File management with REST is just as simple as with the oci cli. Similar to the setup of the oci cli, you have to know the basic information about your Cloud account, namely:

  • a user in the cloud account that has the appropriate privileges to work with a bucket in your tenancy. This user also has to be configured with a SWIFT password (see here how that is done).
  • a bucket in one of the object stores in a region (we are not going to discuss how to use REST to do this). The bucket and its region define the REST endpoint; for example, if you are using the object store in Ashburn, VA, the endpoint is https://swiftobjectstorage.us-ashburn-1.oraclecloud.com

The URI for accessing your bucket is built as follows:

<object store rest endpoint>/v1/<tenant name>/<bucket name>

In my case for the simple example it would be

https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles

If you have all this information you are set to upload and download files.
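
A quick way to verify that your user, SWIFT password, and URI are correct is a GET against the bucket itself, which in the Swift API returns the list of object names in that bucket (a sketch; the credentials shown are of course placeholders):

curl -X GET \
 -u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!' \
 https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles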

 

Uploading an object (file) with REST

Uploading a file is putting a file into the Cloud, so the REST command is a PUT. You also have to specify the file you want to upload and how the file should be named in the object store. With this information you can write a simple little shell script like the following that will take both the bucket and file name as input:

# usage: upload_oss.sh <file> <bucket>
file=$1
bucket=$2
 
curl -v -X PUT  \
-u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!'  \
 --upload-file ${file} \
 https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/${bucket}/${file}

So if you want to upload multiple files in a directory, similar to what we showed for the oci cli, you just save this little script as, say, upload_oss.sh and call it once per file:

for i in *.tbl
do
  ./upload_oss.sh "$i" myFiles
done
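
And just like with the oci cli, nothing stops you from parallelizing these REST uploads with plain shell tooling; for example (a sketch, again with four parallel transfers as an arbitrary choice):

printf '%s\n' *.tbl | xargs -P 4 -I {} ./upload_oss.sh {} myFiles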

 

Downloading an object (file) with REST 

While we expect you to upload data to the object store way more often than you download it, let’s quickly cover that, too. So you want to get a file from the object store? Well, the REST command GET will do this for you. It is just as intuitive as uploading, and you might be able to guess the complete syntax already. Yes, it is:

curl -v -X GET  \
-u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!'  \
https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles/myFileName \
--output myLocalFileName
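
If you download more than the occasional file, you can wrap this into a small script that mirrors upload_oss.sh (a sketch; the name download_oss.sh is made up):

# usage: download_oss.sh <object> <bucket>
object=$1
bucket=$2

curl -X GET \
 -u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!' \
 https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/${bucket}/${object} \
 --output ${object}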

That’s about all you need to get started uploading all your files to the Oracle Object Store so that you then can consume them from within the Autonomous Data Warehouse Cloud. 

Happy uploading!

Join the discussion

Comments (2)
  • Raj Gupta Thursday, July 19, 2018
    Thanks for sharing the details. I have a few questions.
    - If I have data on an on-prem VM and use the OCI CLI to upload it to object storage, how much security is available, since it will go over the public internet? And what about the protocol, http or https?

    - My requirement is to upload a huge amount of data from on-prem to object storage and then download the same inside a cloud VM. Please suggest best practices.


    Thanks
  • Hermann Baer Tuesday, July 24, 2018
    Hi Raj -

    oci cli uses REST under the covers and always uses https as the protocol. So even though your communication goes over the Internet, the data is safe. Just as safe as when you are accessing your bank account online ;-)

    Note also that if you want to bring the security to the next level, you can also integrate it with your VCN to make a bucket private.

    See https://cloud.oracle.com/storage/object-storage/faq for more details and answers to the most common questions

    Now to the second part: if the data volume is large then you should consider Oracle's data transfer options. For really large data volumes snail mail is still the fastest. See https://cloud.oracle.com/storage/data-transfer/features for more details.