
Loading Data to the Object Store for Autonomous Data Warehouse Cloud

By: Hermann Baer | Senior Director Product Management

So you got your first service instance of your autonomous data warehouse set up, you experienced the performance of the environment using the sample data, went through all the tutorials and videos, and are getting ready to rock-n-roll. But the one thing you’re not sure about is this Object Store. Yes, you used it successfully as described in the tutorial, but what’s next? And what else is there to know about the Object Store?

First and foremost, if you are interested in understanding a bit more about what this Object Store is, you should read the following blog post from Rachna, the Product Manager for (among other things) the Object Store. It introduces the Object Store, shows how to set it up and manage files with the UI, and gives a couple of simple command line examples (don’t get confused by the term ‘BMC’; that’s the old name of Oracle’s Cloud Infrastructure, and the same holds for the command line utility, which is now called oci). That post gets you familiar with the basic concepts of the Object Store and a cloud account (tenant).

The documentation and blog posts are great, but now you actually want to use it to load data into ADWC. This means loading more (and larger) files, more need for automation, and more flexibility. This post focuses on exactly that: becoming productive with command line utilities without being a developer, and leveraging the power of the Oracle Object Store to upload many files in one go and even upload larger files in parallel without any major effort.

The blog post will cover both:

  • The Oracle oci command line interface for managing files
  • The Swift REST interface for managing files

 

Using the oci command line interface

The Oracle oci command line interface (CLI) is a tool that enables you to work with Oracle Cloud Infrastructure objects and services. It’s a thin layer on top of the OCI (REST) APIs and one of Oracle’s open source projects (the source code is on GitHub).

Let’s quickly step through what you have to do for using this CLI. If you do not want to install anything, that is fine, too. In that case feel free to jump to the REST section in this post right away, but you’re going to miss out on some cool stuff that the CLI provides you out of the box.

Getting going with the utility is really simple, as simple as one-two-three:

  1. Install oci cli following the installation instructions on GitHub.
    I just did this on an Oracle Linux 7.4 VM instance that I created in the Oracle Cloud and had the utility up and running in no time.
     
  2. Configure your oci cli installation.
    You need a user created in the Oracle Cloud account that you want to use, and that user must have the appropriate privileges to work with the object store. A keypair is used for signing API requests, with the public key uploaded to Oracle. Only the user calling the API should possess the private key. All this is described in the configuration section of the CLI. 

    That is probably the part of the setup that takes the most time. Make sure you have UI console access when doing this, since you have to upload the public key for your user; a minimal sketch of the key generation and configuration is shown right after this list.
     
  3. Use oci cli.
    After successful setup you can use the command line interface to manage your buckets for storing all your files in the Cloud, among other things.
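
As promised in step 2, here is a minimal sketch of the key generation and configuration, assuming a Linux or Mac environment with openssl available; the file names are just a convention, and the interactive oci setup config walkthrough is one of several ways to create the configuration file:

# generate an API signing keypair (the private key never leaves your machine)
mkdir -p ~/.oci
openssl genrsa -out ~/.oci/oci_api_key.pem 2048
chmod go-rwx ~/.oci/oci_api_key.pem
openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem

# let the CLI walk you through creating ~/.oci/config
# (it asks for your user OCID, tenancy OCID, region, and the key file location)
oci setup config

Upload the content of the public key file for your user in the UI console and you are ready to go.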

 

First steps with oci cli

The focus of the command line interface is on ease-of-use, making its usage as self-explanatory as possible with a comprehensive built-in help system. Whenever you want to look something up without leaving the utility, use the --help, -h, or -? option for a command, irrespective of how many parameters you have already entered. So you can start with oci -h and let the utility guide you.
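
For example, to drill down from the top level to the exact options for uploading a file, you would work your way through the help system like this (a quick sketch; the help output itself is not shown here):

$ oci -h                   # lists all available service categories
$ oci os -h                # help for the object storage (os) category
$ oci os object put -h     # all options for uploading an object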

For the purpose of file management the important category is the object store category, with the main tasks of:

  • Creating, managing, and deleting buckets
    This task is probably done by an administrator for you, but we will cover it briefly nevertheless
     
  • Uploading, managing, and downloading objects (files)
    That’s your main job in the context of the Autonomous Data Warehouse Cloud

That’s what we are going to do now.

 

Creating a bucket

Buckets are containers that store objects (files). Like other resources, buckets belong to a compartment, a collection of resources in the Cloud that can be used as an entity for privilege management. To create a bucket you have to know the compartment id. That is the only time we have to deal with these cloud-specific unique identifiers; all other object (file) operations use names.
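
If you do not have those details handy, the CLI can look them up for you. Here is a quick sketch; the tenancy OCID is a placeholder that you get from your Cloud administrator or the UI console:

# your object storage namespace (the tenant name used in all os commands)
$ oci os ns get

# list the compartments in your tenancy to find the compartment OCID
$ oci iam compartment list --compartment-id ocid1.tenancy.oc1..<your_tenancy_ocid>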

So let’s create a bucket. The following creates a bucket named myFiles in my account ADWCACCT in a compartment given to me by the Cloud administrator.

$ oci os bucket create --compartment-id ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq --namespace-name adwcacct --name myFiles

{
  "data": {
    "compartment-id": "ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq",
    "created-by": "ocid1.user.oc1..aaaaaaaaomoqtk3z7y43543cdvexq3y733pb5qsuefcbmj2n5c6ftoi7zygq",
    "etag": "c6119bd6-98b6-4520-a05b-26d5472ea444",
    "metadata": {},
    "name": "myFiles",
    "namespace": "adwcaact",
    "public-access-type": "NoPublicAccess",
    "storage-tier": "Standard",
    "time-created": "2018-02-26T22:16:30.362000+00:00"
  },
  "etag": "c6119bd6-98b6-4520-a05b-26d5472ea733"
}

The operation returns with the metadata of the bucket after successful creation. We’re ready to upload and manage files in the object store.

 

Upload your first file with oci cli

You can upload a single file very easily with the oci command line interface. And, as promised, you do not even have to remember any ocid for this.

$ oci os object put --namespace adwcacct --bucket-name myFiles --file /stage/supplier.tbl

Uploading object  [####################################]  100%
{
  "etag": "662649262F5BC72CE053C210C10A4D1D",
  "last-modified": "Mon, 26 Feb 2018 22:50:46 GMT",
  "opc-content-md5": "8irNoabnPldUt72FAl1nvw=="
}

After successful upload you can check the md5 sum of the file; that’s basically the fingerprint verifying that the data on the other side (in the cloud) is not corrupt and is the same as your local copy (on the machine where the data is coming from). The only “gotcha” is that OCI uses base64 encoding, so you cannot just do a simple md5. The following command solves this for me on my Mac:

$ openssl dgst -md5 -binary supplier.tbl |openssl enc -base64
8irNoabnPldUt72FAl1nvw==
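
If you want to automate that check, a tiny script along the following lines does the comparison for you (a sketch assuming Linux or Mac with openssl in the path; the script name check_md5.sh is made up):

#!/bin/bash
# usage: check_md5.sh <file> <opc-content-md5 value returned by the upload>
file=$1
remote_md5=$2

# compute the local md5 in the same base64 encoding that the object store returns
local_md5=$(openssl dgst -md5 -binary "$file" | openssl enc -base64)

if [ "$local_md5" = "$remote_md5" ]; then
  echo "OK: $file matches ($local_md5)"
else
  echo "MISMATCH for $file: local $local_md5, remote $remote_md5"
fi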

Now that’s a good start. I can use this command in any shell script, like the following one, which loads all files in a folder sequentially to the object store:

for i in *.tbl
do
  oci os object put --namespace adwcacct --bucket-name myFiles --file "$i"
done

You can adapt it to load multiple files in parallel, load only files that match a specific name pattern, and so on. You get the idea: whatever you can do with a shell, you can do here.
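
As a simple sketch of that idea, the same loop can be turned into a parallel upload with nothing more than xargs (four parallel transfers is an arbitrary choice here; adjust -P to taste):

# upload all *.tbl files, four at a time
printf '%s\n' *.tbl | xargs -P 4 -I {} \
  oci os object put --namespace adwcacct --bucket-name myFiles --file {}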

Alternatively, if it's just about loading all the files in a directory, you can achieve the same with the oci cli by using its bulk upload capability. The following shows this briefly:

oci os object bulk-upload -ns adwcacct -bn myFiles --src-dir /MyStagedFiles

{
  "skipped-objects": [],
  "upload-failures": {},
  "uploaded-objects": {
    "chan_v3.dat": {
      "etag": "674EFB90B1A3CECAE053C210D10AC9D9",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "/t4LbeOiCz61+Onzi/h+8w=="
    },
    "coun_v3.dat": {
      "etag": "674FB97D50C34E48E053C230C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "sftu7G5+bgXW8NEYjFNCnQ=="
    },
    "cust1v3.dat": {
      "etag": "674FB97D52274E48E053C210C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:44:06 GMT",
      "opc-content-md5": "Zv76q9e+NTJiyXU52FLYMA=="
    },
    "sale1v3.dat": {
      "etag": "674FBF063F8C50ABE053C250C10AE3D3",
      "last-modified": "Tue, 13 Mar 2018 17:44:52 GMT",
      "opc-content-md5": "CNUtk7DJ5sETqV73Ag4Aeg=="
    }
  }
}
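
To double-check what actually landed in the bucket you can list its objects afterwards; a quick sketch (the --prefix filter is optional):

# list everything in the bucket
$ oci os object list -ns adwcacct -bn myFiles

# or only the objects whose names start with a given prefix
$ oci os object list -ns adwcacct -bn myFiles --prefix chan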

Uploading a single large file in parallel 

Ok, now we can load one or many files to the object store. But what do you do if you have a single large file that you want to upload? The oci command line offers built-in multi-part loading, so you do not need to split the file beforehand: it can (A) transparently split the file into parts of a given size and (B) control the parallelism of the upload.

$ oci os object put -ns adwcacct -bn myFiles --file lo_aa.tbl --part-size 100 --parallel-upload-count 4

While the load is ongoing you can list all in-progress uploads, but unfortunately without any progress bar; the progress bar is reserved for the initiating session:

$ oci os multipart list -ns adwcacct -bn myFiles
{
  "data":
   [    
    {
      "bucket": "myFiles",
      "namespace": "adwcacct",
      "object": "lo_aa.tbl",
      "time-created": "2018-02-27T01:19:47.439000+00:00",
      "upload-id": "4f04f65d-324b-4b13-7e60-84596d0ef47f"
    }
  ]
}
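
If an upload gets interrupted and its parts linger around, you can clean it up using the upload-id from that listing. Here is a sketch of the cleanup; parameter names can vary slightly between CLI versions, so check oci os multipart abort -h first:

# abort the stale multipart upload and discard its already-uploaded parts
$ oci os multipart abort -ns adwcacct -bn myFiles \
    --object-name lo_aa.tbl --upload-id 4f04f65d-324b-4b13-7e60-84596d0ef47f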

 

While a serial upload of a single file gave me somewhere around 35 MB/sec on average, the parallel load sped things up quite a bit, so it’s definitely cool functionality (note that your mileage will vary and depends mostly on your Internet/proxy connectivity and bandwidth).

If you’re interested in more details about how this works under the covers, here is a blog post from Rachna that explains the inner workings of this functionality.

 

Using the Swift REST interface

Now, after having covered the oci utility, let’s briefly look into what we can do out of the box, without the need to install anything. Yes, without installing anything you can leverage the REST endpoints of the object storage service. All you need to know is your user name, your SWIFT password, and your environment details, e.g. which region you are uploading to, the account (tenant), and the target bucket.

This is where the real fun starts, and this is where it can become geeky, so we will focus only on the two most important aspects of dealing with files and the object store: uploading and downloading files.

 

Understanding how to use OpenStack Swift REST

File management with REST is just as simple as with the oci cli. Similar to the setup of the oci cli, you have to know the basic information about your Cloud account, namely:

  • a user in the cloud account that has the appropriate privileges to work with a bucket in your tenancy. This user also has to be configured with a SWIFT password (see here how that is done).
  • a bucket in one of the object stores in a region (we are not going to discuss how to use REST to do this). The bucket and its region define the REST endpoint; for example, if you are using the object store in Ashburn, VA, the endpoint is https://swiftobjectstorage.us-ashburn-1.oraclecloud.com

The URI for accessing your bucket is built as follows:

<object store rest endpoint>/v1/<tenant name>/<bucket name>

In my case for the simple example it would be

https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles

If you have all this information you are set to upload and download files.
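
A quick way to verify that your user, SWIFT password, and URI are correct is a GET against the bucket itself, which in the Swift API returns the list of object names in that bucket (a sketch; the credentials shown are of course placeholders):

curl -X GET \
 -u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!' \
 https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles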

 

Uploading an object (file) with REST

Uploading a file is putting a file into the Cloud, so the REST command is a PUT. You also have to specify the file you want to upload and how the file should be named in the object store. With this information you can write a simple little shell script like the following that will take both the bucket and file name as input:

# usage: upload_oss.sh <file> <bucket>
file=$1
bucket=$2
 
curl -v -X PUT  \
-u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!'  \
 --upload-file ${file} \
 https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/${bucket}/${file}

So if you want to upload multiple files in a directory, similar to what we showed for the oci cli, you just save this little script as, say, upload_oss.sh and call it once per file:

for i in *.tbl
do
  ./upload_oss.sh "$i" myFiles
done
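
And just like with the oci cli, nothing stops you from parallelizing these REST uploads with plain shell tooling; for example (a sketch, again with four parallel transfers as an arbitrary choice):

printf '%s\n' *.tbl | xargs -P 4 -I {} ./upload_oss.sh {} myFiles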

 

Downloading an object (file) with REST 

While we expect you to upload data to the object store way more often than you download it, let’s quickly cover that, too. So you want to get a file from the object store? Well, the REST command GET will do this for you. It is just as intuitive as uploading, and you might be able to guess the complete syntax already. Yes, it is:

curl -v -X GET  \
-u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!'  \
https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles/myFileName \
--output myLocalFileName
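
If you download more than the occasional file, you can wrap this into a small script that mirrors upload_oss.sh (a sketch; the name download_oss.sh is made up):

# usage: download_oss.sh <object> <bucket>
object=$1
bucket=$2

curl -X GET \
 -u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!' \
 https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/${bucket}/${object} \
 --output ${object}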

That’s about all you need to get started uploading all your files to the Oracle Object Store so that you then can consume them from within the Autonomous Data Warehouse Cloud. 

Happy uploading!

Join the discussion

Comments (2)
  • Raj Gupta Thursday, July 19, 2018
    Thanks for sharing the details. I have a few questions.
    - If I have data on an on-prem VM and use the OCI CLI to upload it to object storage, how much security is available, since it will go over the public internet? And what about the protocol, http or https?

    - My requirement is to upload a huge amount of data from on-prem to object storage and then download the same inside a cloud VM. Please suggest best practices.


    Thanks
  • Hermann Baer Tuesday, July 24, 2018
    Hi Raj -

    oci cli uses REST under the covers and always uses https as the protocol. So even though your communication goes over the Internet, the data is safe. Just as safe as when you are accessing your bank account online ;-)

    Note also that if you want to bring the security to the next level, you can also integrate it with your VCN to make a bucket private.

    See https://cloud.oracle.com/storage/object-storage/faq for more details and answers to the most common questions

    Now to the second part: if the data volume is large then you should consider Oracle's data transfer options. For really large data volumes snail mail is still the fastest. See https://cloud.oracle.com/storage/data-transfer/features for more details.