Oracle Analytics Cloud (OAC) has an exciting feature for machine learning: the OAC Data Science integration. This feature allows OAC Data Flows to directly consume Oracle Cloud Infrastructure (OCI) Data Science models, so you can invoke and apply machine learning models on your own datasets with the click of a button.

Data scientists analyze, prepare, explore, and visualize data to build accurate machine learning models. They often use programming languages such as Python for this work and to deploy models into applications. OAC lets you consume these models directly, in a friendly environment that helps you extract value from powerful ML models right away.

In this blog, you will learn how to register an OCI Data Science model in OAC, with the following sections:

  • Overview of OCI Data Science
  • Prerequisites
    • OCI Connection
    • OCI Policies
    • Data Science ML Model in OCI
  • Register a Data Science Model
    1. Select an OCI Connection
    2. Select a Compartment & Project
    3. Select a Model
    4. Resource Parameters
    5. Inspect the Model
    6. Modify Resource Parameters

OCI Data Science

OCI Data Science enables data science teams to build and evaluate machine learning (ML) models. It provides ML capabilities for examining data and extracting model results, and it simplifies data access across different formats and sources. Its self-service, serverless platform gives data scientists a collaborative, project-driven workspace for data science workloads. OCI Data Science includes Python-centric tools, libraries, and packages developed by the open source community, as well as the Oracle Accelerated Data Science Library, which supports the end-to-end lifecycle of ML models. It integrates with the rest of the OCI stack, including Functions, Autonomous Data Warehouse, and Object Storage. Data Science jobs enable you to define and run repeatable machine learning tasks on fully managed infrastructure. Together, these capabilities let data scientists concentrate on methodology and domain expertise to deliver models to production.

Please refer to Overview of OCI Data Science for ways to access OCI Data Science, available regions, limits on resources and other key concepts.

Prerequisites

The following are the prerequisites to register and apply a Data Science machine learning model in OAC.

OCI Connection

Your OCI connection stores relevant credentials and connection information that will be used by OAC to access various OCI services such as Functions, Vision Services, Data Science, etc. To create a new OCI connection in OAC, click Create > Connection in OAC and select OCI Resource.

The Create Connection dialog opens. Select the region of your OCI instance.

Copy the Tenancy OCID and User OCID from OCI Settings and supply them in the Create Connection dialog. Please use the OCI tenancy where you have saved the Data Science model in the Model Catalog.
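
Because the two OCIDs are easy to swap when pasting, a quick sanity check before saving the connection can help. The following is a minimal sketch, not part of OAC: it assumes OCIDs follow the general `ocid1.<resource-type>.<realm>...` shape, and the helper names `ocid_type` and `check_connection_ocids` are hypothetical. It only inspects the prefix and cannot confirm that an OCID actually exists in your tenancy.

```python
import re
from typing import List, Optional

# Assumption: OCIDs follow the general shape
#   ocid1.<resource-type>.<realm>.[region][.future-use].<unique-id>
# Only the prefix is checked; existence of the resource is not verified.
_OCID = re.compile(r"^ocid1\.(?P<type>[a-z0-9]+)\.(?P<realm>[a-z0-9-]+)\..+$")


def ocid_type(value: str) -> Optional[str]:
    """Return the resource type embedded in an OCID, or None if malformed."""
    match = _OCID.match(value.strip())
    return match.group("type") if match else None


def check_connection_ocids(tenancy_ocid: str, user_ocid: str) -> List[str]:
    """Hypothetical helper: list problems with the two pasted OCIDs."""
    problems = []
    if ocid_type(tenancy_ocid) != "tenancy":
        problems.append("Tenancy OCID should start with 'ocid1.tenancy.'")
    if ocid_type(user_ocid) != "user":
        problems.append("User OCID should start with 'ocid1.user.'")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration, not real identifiers.
    print(check_connection_ocids("ocid1.tenancy.oc1..exampleuniqueid",
                                 "ocid1.user.oc1..exampleuniqueid"))
```

An empty list means both values at least have the expected prefix; a swapped pair produces two messages.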

Then generate the API key in the dialog, copy it, paste it into Add API Keys under User Settings in OCI, and click Add.

Now you can save the OCI connection in OAC. The saved connection is listed under the Connections tab on the Data page.

OCI Policies

The OCI Data Science integration requires both compartment access and a set of policies. The OCI user needs read, write, and delete permissions on the compartment used in the integration process, as detailed in this section.

The following policies must be set in OCI for the user group that the connecting user belongs to. These are the minimum access levels needed to operate the OCI Data Science integration successfully; broader privileges also work.

Policies

allow group <group_name> to read data-science-projects in compartment <compartment_name>
allow group <group_name> to read data-science-models in compartment <compartment_name>
allow group <group_name> to manage data-science-jobs in compartment <compartment_name>
allow group <group_name> to inspect instance-family in compartment <compartment_name>
allow group <group_name> to manage data-science-job-runs in compartment <compartment_name>
allow group <group_name> to read objectstorage-namespaces in compartment <compartment_name>
allow group <group_name> to read buckets in compartment <compartment_name>
allow group <group_name> to manage objects in compartment <compartment_name> where target.bucket.name='<staging_bucket_name>'
allow group <group_name> to inspect virtual-network-family in compartment <compartment_name>
allow service datascience to use virtual-network-family in compartment <compartment_name>
allow group <group_name> to manage log-groups in compartment <compartment_name>
allow dynamic-group <dynamic-group_name> to read data-science-models in compartment <compartment_name>
allow dynamic-group <dynamic-group_name> to manage objects in compartment <compartment_name> where all { target.bucket.name='<staging_bucket_name>', any {request.permission='OBJECT_CREATE', request.permission='OBJECT_READ'}}
allow dynamic-group <dynamic-group_name> to use log-content in compartment <compartment_name>

<dynamic-group_name> is the name of the dynamic group that is defined by the following rule:

             all { resource.type='datasciencejobrun', resource.compartment.id='<compartment_id>' }

<compartment_id> is the OCID of the compartment that contains the Data Science model.
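
Filling in the placeholders by hand across fourteen statements is error-prone, so one option is to generate them. The sketch below is illustrative only: the function names `render_policies` and `dynamic_group_rule` and all example values are hypothetical, and the statement text is copied from the list above (written with `in compartment` throughout). It produces text to paste into an OCI policy and does not call any OCI API.

```python
from string import Template
from typing import List

# Statement templates copied from the policy list above. $-placeholders stand
# in for the <angle-bracket> placeholders, so literal braces need no escaping.
_POLICY_TEMPLATES = """\
allow group $group to read data-science-projects in compartment $compartment
allow group $group to read data-science-models in compartment $compartment
allow group $group to manage data-science-jobs in compartment $compartment
allow group $group to inspect instance-family in compartment $compartment
allow group $group to manage data-science-job-runs in compartment $compartment
allow group $group to read objectstorage-namespaces in compartment $compartment
allow group $group to read buckets in compartment $compartment
allow group $group to manage objects in compartment $compartment where target.bucket.name='$bucket'
allow group $group to inspect virtual-network-family in compartment $compartment
allow service datascience to use virtual-network-family in compartment $compartment
allow group $group to manage log-groups in compartment $compartment
allow dynamic-group $dynamic_group to read data-science-models in compartment $compartment
allow dynamic-group $dynamic_group to manage objects in compartment $compartment where all { target.bucket.name='$bucket', any {request.permission='OBJECT_CREATE', request.permission='OBJECT_READ'}}
allow dynamic-group $dynamic_group to use log-content in compartment $compartment
""".splitlines()


def render_policies(group: str, compartment: str, bucket: str,
                    dynamic_group: str) -> List[str]:
    """Return the policy statements with every placeholder filled in."""
    values = dict(group=group, compartment=compartment,
                  bucket=bucket, dynamic_group=dynamic_group)
    return [Template(line).substitute(values) for line in _POLICY_TEMPLATES]


def dynamic_group_rule(compartment_id: str) -> str:
    """Return the matching rule for the dynamic group of job runs."""
    return ("all { resource.type='datasciencejobrun', "
            f"resource.compartment.id='{compartment_id}' }}")


if __name__ == "__main__":
    # All names below are made-up examples.
    for statement in render_policies("Analysts", "ml-compartment",
                                     "oac-staging", "oac-ds-jobruns"):
        print(statement)
    print(dynamic_group_rule("ocid1.compartment.oc1..exampleuniqueid"))
```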

Data Science ML Model in OCI

Please refer to the blog entry Create a Data Science Model for OAC to create a binary classification model in OCI which can be consumed in OAC.

Register a Data Science Model

When a Data Science model is registered in OAC, a new Data Science Job Run is created in the background in OCI.

This walkthrough uses the binary classification model you created with the Employee Attrition dataset.

Now you will see how to register a Data Science Model in Oracle Analytics.

From the OAC action menu, click Register Model/Function > Machine Learning Models.

1. Select an OCI Connection

Select the OCI Connection from the list of connections.

2. Select a Compartment & Project

Select a compartment, then select a project in that compartment.

3. Select a Model

Select a model and provide a name for it, then click Next.

4. Resource Parameters

Enter the resource parameters, such as the staging bucket, compute shape, VCN, subnet, and log group, then click Register.

The following resource parameters are required to configure an OCI Data Science job.

Staging Bucket Compartment – Name of the compartment that contains the staging bucket.
Staging Bucket – Name of the bucket used for data transfer. It securely stores temporary input and output data while the model is invoked from OAC data flows.
Compute Shape – Virtual machine configuration, required for Data Science job creation.
OCPUs – Number of OCPUs to configure. Required only if the compute shape is a Flex shape.
Memory (GB) – Amount of memory for the compute shape. Required only if the compute shape is a Flex shape.
Storage (GB) – Size of the block storage, required for Data Science job creation.
Use Default Networking – Checkbox to select the default option for networking configuration. With this option, you can skip creating a network and setting up subnets and gateways. If you use the default network configuration, you can’t access or modify the provided default network for other purposes.
VCN Compartment – Compartment that contains the VCN. Required only when Use Default Networking is not checked.
VCN – The VCN created in OCI. Required only when Use Default Networking is not checked.
Subnet Compartment – Compartment that contains the subnet. Required only when Use Default Networking is not checked.
Subnet – Subnet created in OCI. Required only when Use Default Networking is not checked.
Enable Logging – Option to enable logging in OCI Data Science.
Log Group Compartment – Name of the compartment used to write logs in OCI. Required only when logging is enabled.
Log Group – Log group created to write logs in OCI. Required only when logging is enabled.
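
The conditional requirements in this list (Flex shape, custom networking, logging) can be summarized as a small checklist. The sketch below is only an illustration of those rules, not how OAC validates the dialog: the `ResourceParameters` class, its field names, and `missing_fields` are hypothetical, and detecting a Flex shape by a `.Flex` name suffix is an assumption based on common OCI shape names such as `VM.Standard.E4.Flex`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceParameters:
    """Hypothetical mirror of the fields in the registration dialog."""
    staging_bucket_compartment: str
    staging_bucket: str
    compute_shape: str
    storage_gb: int
    ocpus: Optional[float] = None
    memory_gb: Optional[float] = None
    use_default_networking: bool = True
    vcn_compartment: Optional[str] = None
    vcn: Optional[str] = None
    subnet_compartment: Optional[str] = None
    subnet: Optional[str] = None
    enable_logging: bool = False
    log_group_compartment: Optional[str] = None
    log_group: Optional[str] = None


def is_flex_shape(shape: str) -> bool:
    # Assumption: Flex shape names end with ".Flex".
    return shape.endswith(".Flex")


def missing_fields(params: ResourceParameters) -> List[str]:
    """Return the conditionally required fields that are still empty."""
    required: List[str] = []
    if is_flex_shape(params.compute_shape):
        required += ["ocpus", "memory_gb"]
    if not params.use_default_networking:
        required += ["vcn_compartment", "vcn", "subnet_compartment", "subnet"]
    if params.enable_logging:
        required += ["log_group_compartment", "log_group"]
    return [name for name in required if getattr(params, name) in (None, "")]


if __name__ == "__main__":
    # Example values are made up for illustration.
    params = ResourceParameters("ml-compartment", "oac-staging",
                                "VM.Standard.E4.Flex", storage_gb=50)
    print(missing_fields(params))
```

With a Flex shape and nothing else filled in, the helper reports the OCPU and memory fields as still missing.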

These resource parameters can be modified in the Inspect dialog. 

The registered models can be found in the Models tab on the Machine Learning page.

5. Inspect the Model

The General tab of the Inspect dialog shows information such as the model's name, description, and connection.

The Details tab provides information about the Model, Input, Output Columns and Parameters.

6. Modify Resource Parameters

The Resources tab displays the resource parameters of the model, and it also allows the user to edit resource parameters. However, you can’t change default networking in this dialog.

This dialog also provides the Log Name (if logging is enabled) and the Job Name, which you can use to analyze or debug the job in OCI.

Now the Data Science job run is created with the specified resource parameters and configurations. Registering Data Science models in OAC is a one-time activity. Successfully registered Data Science models can be invoked multiple times from OAC using data flows.

Summary

In this blog, you have learned how to register a Data Science model in OAC so that it can be consumed using data flows. As a next step, learn how to apply the model using OAC data flows. Please refer to the blog Invoke a Data Science Model from OAC for detailed steps.