Unlock the Power of Your Data: Seamlessly Create and Refresh Datasets from Files Stored in OCI Object Storage

May 31, 2024 | 5 minute read
Luis Rivas
Director of Product Management - Oracle Analytics

In this post, I provide an overview and examples of creating datasets from files stored in Oracle Cloud Infrastructure (OCI) Object Storage. I also explore new capabilities that come with centralized file storage, including the ability to schedule periodic reloads of the data stored in Object Storage. I review how to create an OCI Resource connection, how to use the new UI to search and navigate compartments, buckets, and objects, and how to create datasets from files using the new connection. Finally, I describe how you can manually reload a dataset, or schedule periodic reloads, after the files in Object Storage are updated.

What Is OCI Object Storage?

OCI Object Storage enables you to securely store any type of data in its native format. With built-in redundancy, OCI Object Storage is well suited to modern applications that require scale and flexibility, and it can be used to consolidate multiple data sources for analytics, backup, or archive purposes. For more detailed information about OCI Object Storage, visit this page: OCI Object Storage.

Creating an OCI Resource Connection

To access files stored in OCI Object Storage, you first create an OCI Resource connection that uses an API key. This is the same type of connection that Oracle Analytics requires to connect to OCI Functions and to OCI models such as Vision and Language. For full instructions on how to create an OCI Resource connection, see Create a Connection to Your OCI Tenancy.
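Because the connection authenticates with an API key, it helps to have the key-related values ready before you open the connection dialog. As a minimal sketch, assuming you keep these values in a dictionary shaped like an OCI SDK API-key profile (user, tenancy, fingerprint, key_file, region), the hypothetical helper check_api_key_config below reports missing or malformed entries. It is not part of Oracle Analytics, and the exact fields in the connection dialog may differ; follow the linked instructions for those. The prefix checks rely only on the documented OCID pattern ocid1.<resource-type>.

```python
# Keys used by an API-key style OCI SDK profile (assumed layout for this sketch).
REQUIRED_FIELDS = ("user", "tenancy", "fingerprint", "key_file", "region")

# OCIDs follow the documented pattern ocid1.<resource-type>.<realm>...
OCID_PREFIXES = {"user": "ocid1.user.", "tenancy": "ocid1.tenancy."}


def check_api_key_config(config: dict) -> list[str]:
    """Return a list of problems found in an API-key config (hypothetical helper)."""
    problems = []
    # Every required key must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing {field}")
    # User and tenancy values should look like OCIDs of the right resource type.
    for field, prefix in OCID_PREFIXES.items():
        value = config.get(field)
        if value and not value.startswith(prefix):
            problems.append(f"{field} should start with {prefix!r}")
    return problems


if __name__ == "__main__":
    example = {
        "user": "ocid1.user.oc1..exampleuniqueid",
        "tenancy": "ocid1.tenancy.oc1..exampleuniqueid",
        "fingerprint": "",
        "key_file": "~/.oci/oci_api_key.pem",
        "region": "us-ashburn-1",
    }
    print(check_api_key_config(example))  # ['missing fingerprint']
```

An empty list means every expected value is present and the OCIDs have the expected shape; it does not prove the key is valid in your tenancy.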

OCI Connection

Creating Datasets from Files in OCI Object Storage

After the connection is created, you can start creating datasets from files in OCI Object Storage buckets. On the home page, click Create Dataset. Notice that the OCI Resource connection is listed as one of the data sources in the Create Dataset dialog.

Creating a dataset from OCI Object Storage

Region Selection

After you select the OCI connection, a dialog opens in which you can change the region and search for the compartments, buckets, and objects (folders, subfolders, and files) you need. Review the default region and, if necessary, change it using the drop-down list.
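Object Storage buckets are regional, so the region you choose determines which regional Object Storage endpoint is queried; a bucket may not appear until you select the region it was created in. The sketch below shows how the region fits into a native Object Storage API URL. It assumes the commercial realm (oc1), whose endpoints have the form https://objectstorage.<region>.oraclecloud.com; other realms use different domains. The helper names and the namespace and bucket values are hypothetical, and Oracle Analytics builds these requests for you.

```python
from urllib.parse import quote


def object_storage_endpoint(region: str) -> str:
    """Native Object Storage endpoint for a region (assumes the commercial oc1 realm)."""
    return f"https://objectstorage.{region}.oraclecloud.com"


def object_url(region: str, namespace: str, bucket: str, object_name: str) -> str:
    """Build a URL of the form <endpoint>/n/{namespace}/b/{bucket}/o/{object}."""
    # Percent-encode each segment; safe="" turns '/' in folder-style object
    # names into %2F so the whole name stays inside a single path segment.
    return (
        f"{object_storage_endpoint(region)}"
        f"/n/{quote(namespace, safe='')}"
        f"/b/{quote(bucket, safe='')}"
        f"/o/{quote(object_name, safe='')}"
    )


print(object_url("us-ashburn-1", "mytenancyns", "sales-bucket", "2024/q1/sales.csv"))
# https://objectstorage.us-ashburn-1.oraclecloud.com/n/mytenancyns/b/sales-bucket/o/2024%2Fq1%2Fsales.csv
```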

Selecting a Region

Navigating and Searching Compartments

With the region set, either navigate the compartments manually or enter a full or partial search string to search all compartments. The search is a case-insensitive wildcard search, and the results are filtered to show only the compartments that match.
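The same kind of search is used for compartments, buckets, and objects. Oracle Analytics does not spell out its exact matching rules here, so the following is only a minimal sketch, assuming "wildcard" means a partial string can match anywhere in the name and that * and ? inside the query behave like shell wildcards. The function name search_names and the sample compartment names are hypothetical.

```python
from fnmatch import fnmatchcase


def search_names(names: list[str], query: str) -> list[str]:
    """Case-insensitive 'contains' search; '*' and '?' in the query act as wildcards."""
    query = query.strip().lower()
    if not query:
        # An empty search shows everything.
        return list(names)
    # Surround the query with '*' so a partial string matches anywhere in the name.
    pattern = f"*{query}*"
    # Lower-case both sides so the match ignores case.
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


compartments = ["Finance", "finance-dev", "Marketing", "Sales_Finance"]
print(search_names(compartments, "FIN"))      # ['Finance', 'finance-dev', 'Sales_Finance']
print(search_names(compartments, "fin*dev"))  # ['finance-dev']
```

One side effect of using fnmatch: square brackets in a query are treated as a character class rather than literal text.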

Navigating and searching compartments

Navigating and Searching Buckets

Click the compartment that contains your buckets, then navigate or run the same kind of wildcard search on the buckets. When you find the bucket that contains your files, click it; all the objects in that bucket are displayed in the right-hand panel.

Navigating and searching buckets

Navigating and Searching Objects and Selecting a File

Again, either navigate the objects in the bucket manually (objects can be folders, subfolders, and files) or run a case-insensitive wildcard search. When you locate the file to import into the dataset, click OK. Oracle Analytics Cloud (OAC) imports the file and displays a preview of its contents for review. After reviewing the preview, click OK to open the file in the Dataset Editor. The Dataset Editor extracts a representative sample, runs a deep semantic profile on it, and displays the results as Data Quality Insights for the contents of the file.
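Object Storage itself has a flat namespace: the "folders" and "subfolders" you navigate are name prefixes that end in a slash, conceptually similar to listing objects with a prefix and a "/" delimiter in the Object Storage API. The minimal sketch below, using a hypothetical list_level function and made-up object names, shows how a flat list of names can be split into the folders and files directly under a given prefix.

```python
def list_level(object_names: list[str], prefix: str = "") -> tuple[list[str], list[str]]:
    """Return (folders, files) directly under `prefix` from flat object names."""
    folders: set[str] = set()
    files: list[str] = []
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if "/" in rest:
            # Everything up to the next '/' is a (sub)folder at this level.
            folders.add(prefix + rest.split("/", 1)[0] + "/")
        elif rest:
            # No further '/', so this object is a file at this level.
            files.append(name)
        # rest == "" means the name equals the prefix (a folder marker); skip it.
    return sorted(folders), sorted(files)


objects = [
    "readme.txt",
    "sales/2024/q1.csv",
    "sales/2024/q2.csv",
    "sales/regions.csv",
    "hr/headcount.xlsx",
]
print(list_level(objects))            # (['hr/', 'sales/'], ['readme.txt'])
print(list_level(objects, "sales/"))  # (['sales/2024/'], ['sales/regions.csv'])
```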

Navigating and searching objects and selecting a file

Adding Another File from the Same OCI Connection

After you create the table from the file in the OCI Object Storage bucket, the connection (My OCI Connection), the resource (OCI Object Storage), and the imported file are listed in the left-hand panel. To add another file from the same connection, click the icon to the right of the resource. The navigation dialog opens again, and you can drill into the bucket to select the second file. Add as many files as you need and join them to create the dataset. You can also join files from OCI Object Storage with database tables and other files.
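In the Dataset Editor you define joins visually, with no code involved. To make the effect of a typical inner join between two files concrete, here is a minimal sketch using Python's csv module; the file contents, the region_id key, and the inner_join function are all hypothetical. Rows without a match on the key are dropped, which is the usual inner-join behavior.

```python
import csv
import io


def inner_join(left_csv: str, right_csv: str, key: str) -> list[dict[str, str]]:
    """Inner-join two CSV texts on a shared key column (illustrative sketch)."""
    # Index the right-hand rows by key so each left row can find its matches.
    right_rows: dict[str, list[dict[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(right_csv)):
        right_rows.setdefault(row[key], []).append(row)
    joined = []
    for left in csv.DictReader(io.StringIO(left_csv)):
        for right in right_rows.get(left[key], []):
            # Merge columns; right-hand values win for same-named columns (the key is equal anyway).
            joined.append({**left, **right})
    return joined


orders = "order_id,region_id,amount\n1,10,250\n2,20,100\n3,30,75\n"
regions = "region_id,region_name\n10,East\n20,West\n"
for row in inner_join(orders, regions, "region_id"):
    print(row)
# {'order_id': '1', 'region_id': '10', 'amount': '250', 'region_name': 'East'}
# {'order_id': '2', 'region_id': '20', 'amount': '100', 'region_name': 'West'}
```

Order 3 is dropped because region 30 has no match in the second file.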

Adding another file from the same OCI Resource Connection

Extra Credit – Scheduling a Dataset Reload

A key advantage of creating datasets from files in OCI Object Storage buckets is that you can build a recurring workflow: an upstream process periodically places updated files with the same names in the same bucket, and scheduled dataset reloads automatically refresh the cached data from those updated files. You can set up either a one-time or a recurring schedule, and you can view a schedule's details to see its last run time and next scheduled run. This keeps visualizations built on file-based datasets up to date with the latest data. For full instructions on how to create a one-time or recurring schedule, see Work with Dataset Reload Schedules.
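Oracle Analytics calculates and displays the last and next run times for you. To show how they relate for a recurring schedule, here is a minimal sketch that assumes a simple fixed interval (for example, daily) and ignores time zones and daylight-saving changes. The function names and dates are hypothetical. In practice, the upstream process should finish writing the replacement file, using the same object name in the same bucket, before the next run starts.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run(start: datetime, interval: timedelta, now: datetime) -> datetime:
    """First scheduled run strictly after `now` for a fixed-interval schedule."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        return start
    # Whole intervals that have elapsed since the first run (timedelta // timedelta -> int).
    elapsed = (now - start) // interval
    return start + (elapsed + 1) * interval


def last_run(start: datetime, interval: timedelta, now: datetime) -> Optional[datetime]:
    """Most recent scheduled run at or before `now`, or None if none has occurred yet."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        return None
    return start + ((now - start) // interval) * interval


start = datetime(2024, 6, 1, 2, 0)  # first reload at 02:00
daily = timedelta(days=1)
now = datetime(2024, 6, 3, 9, 30)
print(last_run(start, daily, now))  # 2024-06-03 02:00:00
print(next_run(start, daily, now))  # 2024-06-04 02:00:00
```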

Creating a reload schedule

Call to Action

We hope you've enjoyed this overview of creating datasets from files in OCI Object Storage buckets! Start creating datasets from your own files stored in buckets; we hope you find the feature both powerful and user-friendly. Keep exploring self-service data modeling, and stay tuned for upcoming blog posts where we'll share more tips and tricks on both new and existing features of our product.
