Introduction

The Data Studio data loading tools in the Oracle Autonomous Database let you quickly load data using a graphical interface, no coding required. If you have not yet discovered the data loading tools, you can read about them in my post Data Loading Made Easy for Oracle Autonomous Database.

The REST APIs used by the data loading tools are documented in the REST Data Services API, allowing developers to write their own data loading programs. This post will walk through an example of loading data from the Oracle Cloud Store into a local table in the Oracle Autonomous Database using the REST APIs. This post will use data from the MovieStream demonstration data set. 

Recipe for a Data Load Job

This post will describe the most common steps to load data from cloud storage to a local table in the Oracle Autonomous Database. These steps include:

  1. Create a Cloud Store Location if one does not already exist.
  2. Survey the data to understand the data in the source files.
  3. Run a data load job.
  4. Monitor the data load job.

Calling REST APIs using CURL

This post calls REST API calling using CURL in a terminal window.  The APIs return data as unformatted JSON.  jq can be used to format the JSON by adding  | jq  to the end of a CURL command if jq is installed on the computer.  Search the internet to learn how to install jq.  Alternatively, the JSON can be formatted using an online tool.

Sample Data

The sample data includes a small number of CSV files on the Oracle Cloud Infrastructure object store. Past the following URI into a browser to view a listing of the files.

https://objectstorage.us-ashburn-1.oraclecloud.com/n/c4u04/b/moviestream_landing/o/

Preparing the Oracle Database User

The Oracle Database user must be granted the DW_ROLE role and be REST enabled to use REST APIs and any applications that use the APIs.

The Oracle Database ADMIN user can enable the user by running the following commands in SQL Worksheet.

GRANT dwrole TO username;
BEGIN
    ords.enable (p_schema => username);
END;
/

The Database Actions Database Users tool can also grant these privileges. If the Database Users tool is used, allow the user Web Access and choose both the Granted and Default options for the DWROLE role.

Constructing Endpoints

The base URL is easly found by logging into the Database Actions toolset.  For example:

https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_sdw/

The base URL includes everything up to, but not including _sdw/. For example:

https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/

Server and Username

The examples in this post use yourserver.adb.youregion.oraclecloudapps.com as the server and your_username as the database username. Substitute values appropriate for your Autonomous Database instance.

Check for Existing Cloud Store Locations

The examples in this post will use curl run from a terminal window.

Use the cloud-storage-links GET endpoint to get existing Cloud Storage Locations.

curl -s -X GET -H 'accept: application/json' -u your_username:Welcome#1234 https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/cloud-storage-links/

The “items” array is empty. There are no existing Cloud Storage Locations.

"items":[ ]

Create the Cloud Storage Location

A Cloud Storage location is an object that includes everything needs to access data in a cloud store bucket.  Cloud Store Locations are created before data loading.  The Cloud Storage Location includes:

  • The location URI of the cloud store bucket.
  • A credential, if one is required.
  • Descriptive information.

A credential is unnecessary in this example because the location URI is PUBLIC (that is, does not require authentication).

Use the cloud-storage-links POST endpoint to create a new Cloud Storage Location.

curl -s -X POST -u your_username:Welcome#1234 -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/cloud-storage-links/ \
  -d '{
   "cloud_storage_links":[
      {
         "storage_link_name":"MOVIESTREAM_SALES_DATA",
         "storage_link_description":"MovieStream Sales Data",
         "uri":"https://objectstorage.us-ashburn-1.oraclecloud.com/n/c4u04/b/moviestream_landing/o/"
      }
   ]
}'

This endpoint returns a message indicating that the Cloud Store Location was created:

{"status":201,"cloud_storage_links":[{"storage_link_name":"MOVIESTREAM_SALES_DATA"}]}

Once again, use the cloud-storage-links GET endpoint to get a listing of Cloud Storage Locations.

curl -s -X GET -H 'accept: application/json' -u your_username:Welcome#1234 https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/cloud-storage-links/

The “items” array includes the MOVIESTREAM_SALES_DATA Cloud Store Location. (Cloud Store Location in the Data Studio Data Load from Cloud Store tool is synonymous with storage_link_name in the metadata.)

"items":[
      {
         "owner":"DATA_STUDIO",
         "storage_link_name":"MOVIESTREAM_SALES_DATA",
         "uri":"https://objectstorage.us-ashburn-1.oraclecloud.com/n/c4u04/b/moviestream_landing/o/",
         "metadata_path":"\"STORAGE_LINK\".\"MOVIESTREAM_SALES_DATA\"",
         "credential_name":null,
         "metadata":"{\"storageType\":\"ObjectStore\",\"description\":\"Moviestream Gold Bucket\"}",
         "created":"2023-04-13T19:23:31.249Z",
         "updated":"2023-04-13T19:23:31.249Z",
         "links":[
            {
               "rel":"self",
               "href":"https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/cloud-storage-links/DATA_STUDIO,MOVIESTREAM_SALES_DATA"
            }
         ]
      }
   ]

Survey the Data Files

In the next step, survey the data. The database will examine the data and propose a data load specification. The Data Studio Data Load application and APIs are very good at reviewing the data files and recommending column names and data types. In most cases, you can accept the recommendations as is.

A sample of the data follows.

"DAY_ID","GENRE_ID","MOVIE_ID","CUST_ID","APP","DEVICE","OS","PAYMENT_METHOD","LIST_PRICE","DISCOUNT_TYPE","DISCOUNT_PERCENT","ACTUAL_PRICE"
"2019-01-01",20,2915,1282424,"firefox","mac","macos","none",0.0,"none",0.0,0.0
"2019-01-01",5,2262,1153996,"edge","pc","windows","none",0.0,"none",0.0,0.0
"2019-01-01",22,205,1171554,"edge","pc","windows","none",0.0,"none",0.0,0.0
"2019-01-01",8,3382,1076720,"safari","ipad","ios","none",0.0,"none",0.0,0.0
"2019-01-01",9,499,1293457,"firefox","pc","windows","none",0.0,"none",0.0,0.0

To view or download a data file, put this URL in a browser:

https://objectstorage.us-ashburn-1.oraclecloud.com/n/c4u04/b/moviestream_landing/o/custsales/custsales-2019-01.csv

The survey POST endpoint examines data in a file.  The survey endpoint accepts a folder or a specific file. For example:

"objects":[ { "object_name": "custsales/" }],

or

"objects":[ { "object_name": "custsales/custsales-2019-01.csv" }],

When a folder is used, all files in the folder should have the same structure and be loaded into the same table. One file is chosen to be examined.

This example uses a folder.  The response to the REST API call is stored in a shell variable for reuse with the data-load endpoint.

data_survey=$( curl -s -X POST \
    -u your_username:Welcome#1234 \
    https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/surveys/ \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
    "data_loads": [
      {
        "storage_link_name": "MOVIESTREAM_SALES_DATA",
        "objects":[ { "object_name": "custsales/" }],
        "table_name":"MOVIE_SALES_DATA_TABLE"
      }
    ]
  }'
)

To view the data returned by the survey.

echo "${data_survey}"

The response follows.

{
   "status":201,
   "num_data_loads":1,
   "data_loads":[
      {
         "table_name":"MOVIE_SALES_DATA_TABLE",
         "storage_link_name":"MOVIESTREAM_SALES_DATA",
         "objects":[
            {
               "object_name":"custsales/"
            }
         ],
         "format":{
            "characterset":"AL32UTF8",
            "trimspaces":"lrtrim",
            "ignoreblanklines":false,
            "ignoremissingcolumns":true,
            "blankasnull":true,
            "escape":false,
            "skipheaders":1,
            "quote":"\"",
            "rejectlimit":10000000,
            "recorddelimiter":"",
            "delimiter":","
         },
         "columns":[
            {
               "column_name":"DAY_ID",
               "field_name":"DAY_ID",
               "data_type":"DATE",
               "data_format":"YYYY-MM-DD",
               "column_id":1
            },
            {
               "column_name":"GENRE_ID",
               "field_name":"GENRE_ID",
               "data_type":"NUMBER",
               "column_id":2
            },
            {
               "column_name":"MOVIE_ID",
               "field_name":"MOVIE_ID",
               "data_type":"NUMBER",
               "column_id":3
            },
            {
               "column_name":"CUST_ID",
               "field_name":"CUST_ID",
               "data_type":"NUMBER",
               "column_id":4
            },
            {
               "column_name":"APP",
               "field_name":"APP",
               "data_type":"VARCHAR2",
               "data_length":4000,
               "column_id":5
            },
            {
               "column_name":"DEVICE",
               "field_name":"DEVICE",
               "data_type":"VARCHAR2",
               "data_length":4000,
               "column_id":6
            },
            {
               "column_name":"OS",
               "field_name":"OS",
               "data_type":"VARCHAR2",
               "data_length":4000,
               "column_id":7
            },
            {
               "column_name":"PAYMENT_METHOD",
               "field_name":"PAYMENT_METHOD",
               "data_type":"VARCHAR2",
               "data_length":4000,
               "column_id":8
            },
            {
               "column_name":"LIST_PRICE",
               "field_name":"LIST_PRICE",
               "data_type":"NUMBER",
               "column_id":9
            },
            {
               "column_name":"DISCOUNT_TYPE",
               "field_name":"DISCOUNT_TYPE",
               "data_type":"VARCHAR2",
               "data_length":4000,
               "column_id":10
            },
            {
               "column_name":"DISCOUNT_PERCENT",
               "field_name":"DISCOUNT_PERCENT",
               "data_type":"NUMBER",
               "column_id":11
            },
            {
               "column_name":"ACTUAL_PRICE",
               "field_name":"ACTUAL_PRICE",
               "data_type":"NUMBER",
               "column_id":12
            }
         ]
      }
   ]
}

Run the Data Load Job

Use the data-loads POST endpoint to run the data-load job. Note the survey is passed using the data_survey shell variable. A job starts on the Database to load the data.

curl -s -X POST \
    -u your_username:Welcome#1234 \
    https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/data-loads/ \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "${data_survey}"

The data load job is started.

{"data_loads":[{"job_id":18666}],"status":201,"num_data_loads":1}

Get Job Progress

The progress GET endpoint returns the progress of the data load job. Note that the username and job id are included in this GET.

curl -s -X GET \
    -u your_username:Welcome#1234 \
    https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/data-loads/data_studio,18666/progress 

In this example, the job is being prepared to load data.

    "items":[
       {
          "owner":"DATA_STUDIO",
          "data_load_job_id":"18685",
          "table_name":"MOVIE_SALES_DATA_TABLE",
          "rows_loaded":null,
          "rows_rejected":null,
          "rows_total":null,
          "estimated_total_rows":null,
          "percentage_completed":null,
          "status":"preparing",
          "time_start":"2023-04-14T13:19:48.802771000+00:00",
          "time_completion":null,
          "exception_details":null,
          "last_event":"COPYDATA_START",
          "time_of_last_event":"2023-04-14T13:19:49.109890000+00:00",
          "links":[
             {
                "rel":"self",
                "href":"https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/data-loads/data_studio,18685/DATA_STUDIO,18685"
             }
          ]

In this example, the job has been completed as indicated by “status”:”complete”;

    "items":[
       {
          "owner":"DATA_STUDIO",
          "data_load_job_id":"18685",
          "table_name":"MOVIE_SALES_DATA_TABLE",
          "rows_loaded":"25037621",
          "rows_rejected":"0",
          "rows_total":"25037621",
          "estimated_total_rows":null,
          "percentage_completed":null,
          "status":"complete",
          "time_start":"2023-04-14T13:19:48.802771000+00:00",
          "time_completion":"2023-04-14T13:21:13.323876000+00:00",
          "exception_details":null,
          "last_event":"DIRECTIVE_END",
          "time_of_last_event":null,
          "links":[
             {
                "rel":"self",
                "href":"https://yourserver.adb.youregion.oraclecloudapps.com/ords/data_studio/_/db-api/latest/data-tools/data-loads/data_studio,18685/DATA_STUDIO,18685"
             }
          ]

Verifying the Data

The table can be verified by querying it in SQL Worksheet.

SELECT * FROM MOVIE_SALES_FACT_DATA;

Conclusion

Using the REST APIs for data loading the same APIs used by the Data Studio Data Load from Cloud Store tool can be use to load data quickly and reliably in custom applications.  This post provided a recipe for loading data immediately.  APIs for Live Feed are also available.  Live Feed supports the ability to run a data load on an event such as a new file being added to the Object Store.