Feature Highlight: Using Resource Principals in the Data Science service

August 19, 2020 | 4 minute read

By Elena Sunshine, Sr Principal Product Manager

From time to time, data scientists will want to access Oracle Cloud Infrastructure resources outside of their Data Science workload (such as a notebook session) in order to accomplish a step of their model development lifecycle. For example, while using the Data Science service, you might want to:

  • Access the Data Science model catalog to save or load models.
  • List Data Science projects.
  • Access data from an Object Storage bucket, perform some operation on the data, and then write the modified data back to the Object Storage bucket.
  • Create and run a Data Flow application to run a serverless Spark job, perhaps to perform large scale ETL.
  • Access your secrets stored in the Vault, perhaps to authenticate to a database.

When you are working within a Data Science service workload, you are operating as the Linux user datascience. This user does not have an Oracle Cloud Infrastructure Identity and Access Management (IAM) identity, so it has no access to the Oracle Cloud Infrastructure API (OCI API), which all of the use cases above require.

Up until today, users were required to add configuration and key files to their ~/.oci directory in order to authenticate as their own IAM user. Now, Oracle Cloud Infrastructure Data Science enables you to authenticate using a resource principal to access other Oracle Cloud Infrastructure resources. Compared to the configuration and key files approach, resource principals provide a more secure and easier way to authenticate to resources.

A resource principal is a feature of IAM that enables resources to be authorized principal actors that can perform actions on service resources. Each resource has its own identity, and it authenticates using the certificates that are added to it. These certificates are automatically created, assigned to resources, and rotated, avoiding the need for you to upload and manage your own credentials.

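You can authenticate to the OCI API with resource principals using the following interfaces:

  • With the Oracle Cloud Infrastructure Python SDK, create a resource principals signer and pass it to the service client:

    import oci
    from oci.data_science import DataScienceClient

    # Obtain a signer backed by the notebook session's resource principal.
    rps = oci.auth.signers.get_resource_principals_signer()

    # The config can be empty because the signer carries the credentials.
    dsc = DataScienceClient(config={}, signer=rps)

  • With the Oracle Cloud Infrastructure CLI, use the --auth=resource_principal flag with each command.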
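As a minimal sketch of one of the use cases listed at the start of this post, the same signer can be passed to an Object Storage client to read an object, modify it, and write it back. The helper names object_storage_client and transform_object are hypothetical, as are any bucket and object names you pass in; the SDK import is deferred into the first helper only so that the second can be tried without the SDK installed.

```python
def object_storage_client():
    """Build an Object Storage client that signs requests as the resource principal."""
    import oci  # deferred import: only needed when a real client is built

    rps = oci.auth.signers.get_resource_principals_signer()
    return oci.object_storage.ObjectStorageClient(config={}, signer=rps)


def transform_object(client, bucket, name, transform):
    """Read an object, apply `transform` to its bytes, and write the result back.

    `client` is expected to behave like ObjectStorageClient; `bucket` and
    `name` are placeholders for your own bucket and object names.
    """
    namespace = client.get_namespace().data
    original = client.get_object(namespace, bucket, name).data.content
    updated = transform(original)
    client.put_object(namespace, bucket, name, updated)
    return updated
```

In a notebook session this might be called as transform_object(object_storage_client(), "<bucket-name>", "<object-name>", bytes.upper). The call only succeeds once the policies described later in this post are in place.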

Now that we have covered the authentication mechanism and how to use it, let’s discuss how resource principals become authorized to access Oracle Cloud Infrastructure resources.

 

Prior to making a call using your resource principal, your tenancy administrator must write policies to grant permissions to your resource principal. Oracle Cloud Infrastructure IAM enables administrators to write policies for resource principals which are part of dynamic groups. Dynamic groups are created by administrators to contain resources (such as Data Science notebook sessions) that match rules that they define. Therefore, administrators need to complete two steps:

1. Create a dynamic group that contains the resource principals of your notebook sessions

To create a dynamic group, navigate to the Dynamic Groups page in the Identity service in the Oracle Cloud Infrastructure console. Click Create Dynamic Group, give it a name and a description, and add a matching rule to contain your notebook session resource principals.

  • If you want to create a dynamic group for all notebook session resource principals in your tenancy, use ALL {resource.type = 'datasciencenotebooksession'}
  • If you want to create a dynamic group for all notebook session resource principals in a specific compartment, use ALL {resource.type = 'datasciencenotebooksession', resource.compartment.id = '<compartment-ocid>'}
  • You can also create dynamic groups for specific notebook session IDs or for notebook sessions associated with specific tags.
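If you manage several dynamic groups, the two matching rules above can be generated rather than typed by hand. The following is a small sketch; the function name notebook_session_rule is hypothetical, and it covers only the tenancy-wide and compartment-scoped rules shown in this post.

```python
def notebook_session_rule(compartment_ocid=None):
    """Return a dynamic group matching rule for notebook session resource principals.

    With no argument the rule matches every notebook session in the tenancy;
    with a compartment OCID it is restricted to that compartment.
    """
    conditions = ["resource.type = 'datasciencenotebooksession'"]
    if compartment_ocid is not None:
        conditions.append(f"resource.compartment.id = '{compartment_ocid}'")
    return "ALL {" + ", ".join(conditions) + "}"
```

The returned string can be pasted into the matching rule field when you create the dynamic group in the console.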

2. Write policy statements for that dynamic group to enable access to Oracle Cloud Infrastructure resources

To write policies for your dynamic group, navigate to the Policies page in the Identity service in the Oracle Cloud Infrastructure console. Click Create Policy, give it a name and a description, and write the following policy statements:

  • To access the Data Science model catalog to save or load models as well as list Data Science projects: allow dynamic-group <dynamic-group-name> to manage data-science-family in compartment <compartment-name>.
  • To access data from an Object Storage bucket, perform some operation on the data, and then write the modified data back to the Object Storage bucket: allow dynamic-group <dynamic-group-name> to manage object-family in compartment <compartment-name>.
  • To create and run a Data Flow application to run a serverless Spark job, perhaps to perform large scale ETL: allow dynamic-group <dynamic-group-name> to manage dataflow-family in compartment <compartment-name>.
  • To access your secrets stored in the Vault, perhaps to authenticate to a database: allow dynamic-group <dynamic-group-name> to manage secret-family in compartment <compartment-name>.
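The policy statements above all share one shape, so they can be assembled from their parts. This is a sketch only: the function name policy_statement and the example group and compartment names are hypothetical. The verb parameter is included because IAM policies also accept the narrower verbs inspect, read, and use, which an administrator may prefer over manage.

```python
# The four verbs of the IAM policy language, from least to most permissive.
POLICY_VERBS = ("inspect", "read", "use", "manage")


def policy_statement(dynamic_group, resource_family, compartment, verb="manage"):
    """Format a policy statement granting a dynamic group access to a resource family."""
    if verb not in POLICY_VERBS:
        raise ValueError(f"verb must be one of {POLICY_VERBS}, got {verb!r}")
    return (
        f"allow dynamic-group {dynamic_group} to {verb} "
        f"{resource_family} in compartment {compartment}"
    )


# Hypothetical names, for illustration only.
statements = [
    policy_statement("notebook-sessions", family, "data-science-compartment")
    for family in ("data-science-family", "object-family", "dataflow-family")
]
```

Each string in statements can be entered as one line in the policy builder in the console.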

The bottom line: When you authenticate using resource principals, you no longer need to create and manage your own configuration file or key pairs in your notebook session. The Data Science service makes resource principals readily available to you and secures them for you.

One final thought: if you don't explicitly use resource principals when invoking the SDKs or CLI, they use the configuration and key files approach by default. We have kept this approach as the default to avoid breaking your current code. However, we plan to make resource principals the default mechanism in the future, and we will announce that change in this blog feed.
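Because the default may change, code that must run both inside and outside a notebook session can make the choice explicit. The sketch below tries a resource principal first and falls back to the configuration file. The function names client_kwargs and data_science_client are hypothetical, and catching a broad Exception is an assumption made here because this post does not state which error the SDK raises when no resource principal is available.

```python
def client_kwargs(signer_factory, config_loader):
    """Return keyword arguments for an SDK client, preferring a resource principal.

    `signer_factory` should return a signer or raise if none is available;
    `config_loader` should return a config dict read from ~/.oci.
    """
    try:
        return {"config": {}, "signer": signer_factory()}
    except Exception:  # assumption: any failure means no resource principal here
        return {"config": config_loader()}


def data_science_client():
    """Build a Data Science client using whichever authentication is available."""
    import oci  # deferred import: only needed when a real client is built
    from oci.data_science import DataScienceClient

    kwargs = client_kwargs(
        oci.auth.signers.get_resource_principals_signer,
        oci.config.from_file,
    )
    return DataScienceClient(**kwargs)
```

Passing the two factories in as arguments keeps the selection logic independent of the SDK, so it can be checked without credentials.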

To learn more about Oracle's data science solutions, visit the Oracle Data Science page, and follow us on Twitter @OracleDataSci
