So you have your first service instance of your Autonomous Data Warehouse set up, you experienced the performance of the environment using the sample data, went through all the tutorials and videos, and are getting ready to rock-n-roll. But the one thing you’re not sure about is this Object Store. Yes, you used it successfully as described in the tutorial, but what’s next? And what else is there to know about the Object Store?
First and foremost, if you are interested in understanding a bit more about what the Object Store is, you should read the following blog post from Rachna, the Product Manager for the Object Store (among other things). It introduces the Object Store, shows how to set it up and manage files with the UI, and gives a couple of simple command line examples (don’t get confused by the term ‘BMC’; that’s the old name of Oracle’s Cloud Infrastructure, and it applied to the command line utility as well, which is now called oci). Reading that post will get you familiar with the basic concepts of the Object Store and a cloud account (tenant).
The documentation and blog posts are great, but now you actually want to use the Object Store to load data into ADWC. This means loading more (and larger) files, more need for automation, and more flexibility. This post focuses on exactly that: becoming productive with command line utilities without being a developer, and leveraging the power of the Oracle Object Store to upload many files in one go, and even to upload larger files in parallel without any major effort.
The blog post will cover both approaches: the oci command line interface (CLI) and the plain REST interface of the object storage service.
The Oracle oci command line interface (CLI) is a tool that enables you to work with Oracle Cloud Infrastructure objects and services. It’s a thin layer on top of the OCI APIs (typically REST) and one of Oracle’s open source projects (the source code is on GitHub).
Let’s quickly step through what you have to do to use this CLI. If you do not want to install anything, that is fine, too; in that case feel free to jump right to the REST section of this post, but you’re going to miss out on some cool stuff that the CLI provides out of the box.
Getting going with the utility is really simple, as simple as one-two-three.
The focus of the command line interface is on ease-of-use and on making its usage as self-explanatory as possible, with a comprehensive built-in help system in the utility. Whenever you want to look something up without leaving the terminal, use the --help, -h, or -? syntax for a command, irrespective of how many parameters you have already entered. So you can start with oci -h and let the utility guide you.
For the purpose of file management the important category is the object store category, with the main tasks of:
That’s what we are going to do now.
Buckets are containers that store objects (files). Like other resources, buckets belong to a compartment, a collection of resources in the Cloud that can be used as an entity for privilege management. To create a bucket you have to know the compartment id. That is the only time we have to deal with these cloud-specific unique identifiers; all other object (file) operations use names.
So let’s create a bucket. The following creates a bucket named myFiles in my account ADWCACCT in a compartment given to me by the Cloud administrator.
$ oci os bucket create --compartment-id ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq --namespace-name adwcacct --name myFiles
{
"data": {
"compartment-id": "ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq",
"created-by": "ocid1.user.oc1..aaaaaaaaomoqtk3z7y43543cdvexq3y733pb5qsuefcbmj2n5c6ftoi7zygq",
"etag": "c6119bd6-98b6-4520-a05b-26d5472ea444",
"metadata": {},
"name": "myFiles",
"namespace": "adwcacct",
"public-access-type": "NoPublicAccess",
"storage-tier": "Standard",
"time-created": "2018-02-26T22:16:30.362000+00:00"
},
"etag": "c6119bd6-98b6-4520-a05b-26d5472ea733"
}
The operation returns with the metadata of the bucket after successful creation. We’re ready to upload and manage files in the object store.
You can upload a single file very easily with the oci command line interface. And, as promised before, you do not even have to remember any OCID in this case.
$ oci os object put --namespace adwcacct --bucket-name myFiles --file /stage/supplier.tbl
Uploading object [####################################] 100%
{
"etag": "662649262F5BC72CE053C210C10A4D1D",
"last-modified": "Mon, 26 Feb 2018 22:50:46 GMT",
"opc-content-md5": "8irNoabnPldUt72FAl1nvw=="
}
After a successful upload you can check the md5 sum of the file; that’s basically the fingerprint proving that the data on the other side (in the cloud) is not corrupt and is the same as the local copy (on the machine the data came from). The only “gotcha” is that OCI uses base64 encoding, so you cannot just do a simple md5. The following command solves this for me on my Mac:
$ openssl dgst -md5 -binary supplier.tbl |openssl enc -base64
8irNoabnPldUt72FAl1nvw==
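Putting the two pieces together, here is a minimal verification sketch. The file name and the expected checksum are the example values from above, and openssl is assumed to be available:

```shell
# Compute the base64-encoded md5 of the local file and compare it
# with the opc-content-md5 value returned by the upload response.
local_md5=$(openssl dgst -md5 -binary supplier.tbl | openssl enc -base64)
remote_md5="8irNoabnPldUt72FAl1nvw=="   # taken from the upload output above
if [ "$local_md5" = "$remote_md5" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH" >&2
fi
```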
Now that’s a good start. I can use this command in any shell script, like the following, which loads all files in a folder sequentially into the object store:
for i in *.tbl
do
  oci os object put --namespace adwcacct --bucket-name myFiles --file "$i"
done
You can adapt it to load multiple files in parallel, to load only files that match a specific name pattern, and so on; you get the idea. Whatever you can do with a shell, you can do here.
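As one illustration, here is a sketch of a parallel variant that fans the same put command out with xargs. The degree of parallelism (-P 4) is an arbitrary example, and the namespace and bucket names are the ones used throughout this post:

```shell
# Upload all *.tbl files, four at a time, by letting xargs run
# the same oci command the sequential loop uses.
ls *.tbl | xargs -n 1 -P 4 -I {} \
  oci os object put --namespace adwcacct --bucket-name myFiles --file {}
```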
Alternatively, if it's just about loading all the files in a directory, you can achieve the same with the oci cli by using its bulk upload capabilities. The following shows this briefly:
oci os object bulk-upload -ns adwcacct -bn myFiles --src-dir /MyStagedFiles
{
"skipped-objects": [],
"upload-failures": {},
"uploaded-objects": {
"chan_v3.dat": {
"etag": "674EFB90B1A3CECAE053C210D10AC9D9",
"last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
"opc-content-md5": "/t4LbeOiCz61+Onzi/h+8w=="
},
"coun_v3.dat": {
"etag": "674FB97D50C34E48E053C230C10A1DF8",
"last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
"opc-content-md5": "sftu7G5+bgXW8NEYjFNCnQ=="
},
"cust1v3.dat": {
"etag": "674FB97D52274E48E053C210C10A1DF8",
"last-modified": "Tue, 13 Mar 2018 17:44:06 GMT",
"opc-content-md5": "Zv76q9e+NTJiyXU52FLYMA=="
},
"sale1v3.dat": {
"etag": "674FBF063F8C50ABE053C250C10AE3D3",
"last-modified": "Tue, 13 Mar 2018 17:44:52 GMT",
"opc-content-md5": "CNUtk7DJ5sETqV73Ag4Aeg=="
}
}
}
Ok, now we can load one or many files into the object store. But what do you do if you have a single large file that you want to upload? The oci command line offers built-in multi-part loading, so you do not need to split the file beforehand. It provides built-in capabilities to (A) transparently split the file into parts of a given size and (B) control the parallelism of the upload.
$ oci os object put -ns adwcacct -bn myFiles --file lo_aa.tbl --part-size 100 --parallel-upload-count 4
While the load is ongoing you can list all in-progress uploads, unfortunately without any progress bar; the progress bar is reserved for the initiating session:
$ oci os multipart list -ns adwcacct -bn myFiles
{
"data":
[
{
"bucket": "myFiles",
"namespace": "adwcacct",
"object": "lo_aa.tbl",
"time-created": "2018-02-27T01:19:47.439000+00:00",
"upload-id": "4f04f65d-324b-4b13-7e60-84596d0ef47f"
}
]
}
While a serial process for a single file gave me somewhere around 35 MB/sec upload on average, the parallel load sped things up quite a bit, so it’s definitely cool functionality (note that your mileage will vary and is probably mostly dependent on your Internet/proxy connectivity and bandwidth).
If you’re interested in how that works, here is a link from Rachna that explains the inner details of this functionality.
Now, after having covered the oci utility, let’s briefly look into what we can do out of the box, without installing anything. Yes, without installing anything you can leverage the REST endpoints of the object storage service. All you need to know is your username/SWIFT password and your environment details, e.g. which region you’re uploading to, the account (tenant), and the target bucket.
This is where the real fun starts, and this is where it can become geeky, so we will focus only on the two most important aspects of dealing with files and the object store: uploading and downloading files.
File management with REST is just as simple as it is with the oci cli. Similar to the setup of the oci cli, you have to know the basic information about your Cloud account, namely:
The URI for accessing your bucket is built as follows:
<object store rest endpoint>/v1/<tenant name>/<bucket name>
In my case for the simple example it would be
https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles
If you have all this information you are set to upload and download files.
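To avoid typos in the long endpoint URL, you can assemble it once in shell variables. This is just a convenience sketch; the region, tenant, and bucket below are the example values from this post:

```shell
# Build the Swift REST endpoint for the example bucket once,
# then reuse $OSS_URL in every curl call.
REGION=us-ashburn-1
TENANT=adwcacct
BUCKET=myFiles
OSS_URL="https://swiftobjectstorage.${REGION}.oraclecloud.com/v1/${TENANT}/${BUCKET}"
echo "$OSS_URL"
```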
Uploading a file means putting a file into the Cloud, so the REST verb is PUT. You also have to specify the file you want to upload and how it should be named in the object store. With this information you can write a simple little shell script, like the following, which takes both the file and bucket name as input:
# usage: upload_oss.sh <file> <bucket>
file=$1
bucket=$2
curl -v -X PUT \
  -u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!' \
  --upload-file ${file} \
  https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/${bucket}/${file}
So if you want to upload multiple files in a directory, similar to what we showed for the oci cli command, you just save this little script, say as upload_oss.sh, and call it just like you called the oci cli:
for i in *.tbl
do
  upload_oss.sh "$i" myFiles
done
While we expect you to upload data to the object store way more often than you download it, let’s quickly cover downloading, too. So you want to get a file from the object store? Well, the REST verb GET will do this for you. It is just as intuitive as uploading, and you might be able to guess the complete syntax already. Yes, it is:
curl -v -X GET \
  -u 'jane.doe@acme.com:)#sdswrRYsi-In1-MhM.!' \
  https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles/myFileName \
  --output myLocalFileName
That’s about all you need to get started uploading all your files to the Oracle Object Store so that you then can consume them from within the Autonomous Data Warehouse Cloud.
Happy uploading!
- If I have data on an on-prem VM and use the OCI CLI to upload it to object storage, how much security is available, given that it will use the public internet? And what about the protocol, http or https?
- My requirement is to upload a huge amount of data from on-prem to object storage and then download the same inside a cloud VM. Please suggest best practice.
Thanks
The oci cli uses REST under the covers and always uses https as the protocol. So even though your communication goes over the Internet, the data is safe, just as safe as when you are accessing your bank account online ;-)
Note also that if you want to take security to the next level, you can integrate the object store with your VCN to make a bucket private.
See https://cloud.oracle.com/storage/object-storage/faq for more details and answers to the most common questions
Now to the second part: if the data volume is large, you should consider Oracle's data transfer options. For really large data volumes, snail mail is still the fastest. See https://cloud.oracle.com/storage/data-transfer/features for more details.