
Announcement: Resource Principals and other Improvements to Oracle Cloud Infrastructure Data Science Now Available

By Elena Sunshine, Sr. Principal Product Manager, and Jean-Rene Gauthier, Sr. Principal Product Data Scientist

On August 11, 2020, the Oracle Cloud Infrastructure Data Science service released an upgrade to the notebook session environment. Oracle Cloud Infrastructure Data Science is a serverless, fully managed platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure. In this post, we will review the major changes to the notebook session environment in this new release. 

As with previous upgrades, these changes do not apply to notebook sessions that are already running. To pick up the upgrade, create a new notebook session or deactivate and reactivate your existing one.

Subscribe to the Oracle AI & Data Science Newsletter to get the latest AI, ML, and data science content sent straight to your inbox! 

 

Support for Resource Principals in Notebook Sessions 

From time to time, data scientists will want to access Oracle Cloud Infrastructure resources outside of their notebook session in order to accomplish a step of their model development lifecycle. For example, while using the Data Science service, you might want to:

  • Access the Data Science model catalog to save or load models.
  • List Data Science projects.
  • Access data from an Object Storage bucket, perform some operation on the data, and then write the modified data back to the Object Storage bucket.
  • Create and run a Data Flow application to run a serverless Spark job, perhaps to perform large scale ETL.
  • Access your secrets stored in the Vault, perhaps to authenticate to a database.

Until now, users were required to add configuration and key files to their ~/.oci directory in order to authenticate as their own Oracle Cloud Infrastructure IAM user. Now, Oracle Cloud Infrastructure Data Science lets you authenticate with your notebook session's resource principal to access other Oracle Cloud Infrastructure resources. Compared to the configuration-and-key-files approach, resource principals provide a more secure and easier-to-use way to authenticate to resources.
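For example, here is a minimal sketch using the OCI Python SDK from inside a notebook session (the Object Storage call is just illustrative; any OCI client accepts the signer):

import oci

# Build a signer from the notebook session's resource principal;
# no ~/.oci config or API key files are needed inside the notebook.
signer = oci.auth.signers.get_resource_principals_signer()

# Pass the signer to any OCI client, e.g. Object Storage.
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
print(object_storage.get_namespace().data)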

To learn more about how to use resource principals in your notebook sessions, see the documentation.

 

New Accumulated Local Effects (ALEs) Diagnostic in MLX

In this release, we included accumulated local effects (ALEs) as a new model explanation diagnostic in MLX, our machine learning explainability library. ALEs are global explainers and, just like partial dependence plots (PDPs), they describe how feature values influence the predictions of machine learning models. In a nutshell, the difference between PDPs and ALEs lies in the way the marginal expectation is computed. In the case of PDPs, the expectation value is taken over the marginal distribution of the feature values. In contrast, ALEs take the expectation value over the conditional distribution of the features. This ensures that unlikely combinations of feature values are weighted down compared to more likely scenarios, which makes ALEs an unbiased measure of feature impact on model predictions.
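In the standard notation (following Apley and Zhu, who introduced ALEs), with $x_s$ the feature of interest and $X_c$ the remaining features:

$$\hat{f}_{\mathrm{PDP}}(x_s) = \mathbb{E}_{X_c}\left[f(x_s, X_c)\right]$$

$$\hat{f}_{\mathrm{ALE}}(x_s) = \int_{z_0}^{x_s} \mathbb{E}\left[\frac{\partial f(X_s, X_c)}{\partial X_s} \,\middle|\, X_s = z\right] dz - \mathrm{const.}$$

The conditioning on $X_s = z$ inside the ALE integral keeps the average on realistic data points, whereas the PDP averages over combinations of $x_s$ and $X_c$ that may never occur together.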

ALEs also differ from PDPs in how feature influence is measured. While PDPs directly average the model predictions across all data points, ALEs instead compute the change in predictions (a local gradient) over a small interval of the feature of interest and then accumulate these local effects.
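To make the computation concrete, here is a minimal numpy sketch of a first-order ALE for one numeric feature (the function and parameter names are our own illustration; MLX's implementation is more sophisticated):

import numpy as np

def ale_1d(model_predict, X, feature_idx, n_bins=10):
    # Quantile-based bin edges so each interval holds comparable data.
    x = X[:, feature_idx]
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bin_ids = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

    local_effects = np.zeros(n_bins)
    for b in range(n_bins):
        rows = X[bin_ids == b]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, feature_idx] = edges[b]      # interval lower edge
        hi[:, feature_idx] = edges[b + 1]  # interval upper edge
        # Average prediction change across the interval, using only the
        # points that fall in this bin (the conditional distribution).
        local_effects[b] = np.mean(model_predict(hi) - model_predict(lo))

    ale = np.cumsum(local_effects)      # accumulate the local effects
    return edges[1:], ale - ale.mean()  # center so the ALE averages to zero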

You can access ALEs through the mlx module of ADS: 

 

from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer

# Build an explainer from your test dataset, model, and training data.
explainer_classification = ADSExplainer(<your-test-dataset>, <your-model>, training_data=<your-training-dataset>)

# Use the MLX global explainer as the explanation provider.
global_explainer_classification = explainer_classification.global_explanation(provider=MLXGlobalExplainer())

# Compute and display the ALE plot for a feature of interest.
ale = global_explainer_classification.compute_accumulated_local_effects("<your-feature>")
ale.show_in_notebook(labels=True)

 

We have included two new notebook examples, mlx_ale.ipynb and mlx_pdp_vs_ale.ipynb, to walk you through what ALEs are and the pros and cons of applying them.

[Figure: accumulated local effects (ALE) plot produced by MLX]

 

New "What-if" Scenario Diagnostic in MLX 

We also added a new “What-if” diagnostic to MLX in this release. The purpose of What-if is to understand how changing feature values in either one example or an entire dataset impacts model predictions. 

You can access the new what-if explainer from the mlx module of ADS: 

 

from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_whatif_explainer import MLXWhatIfExplainer

# Build an explainer from your dataset, model, and training data.
explainer = ADSExplainer(<your-data>, <your-model>, training_data=<your-train-dataset>)

# Use the MLX what-if explainer as the explanation provider.
whatif_explainer = explainer.whatif_explanation(provider=MLXWhatIfExplainer())


The explore_sample() feature allows you to interactively change the feature values of a single example and observe the impact on the model predictions.
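A minimal usage sketch (any sample-selection arguments are per the MLX documentation; none are shown here):

# Launch the interactive widget to edit one example's feature values.
whatif_explainer.explore_sample()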

[Figure: the explore_sample() what-if scenario diagnostic]

You can also take advantage of the Predictions Explorer tool, which allows you to explore model predictions across either the marginal distribution (one feature) or the joint distribution (two features) of feature values:

whatif_explainer.explore_predictions(x='<your-feature>')
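For the joint, two-feature view, you would pass a second feature (the y parameter name below is an assumption; see the MLX documentation for the exact signature):

# Hypothetical two-feature call exploring the joint distribution.
whatif_explainer.explore_predictions(x='<your-feature-1>', y='<your-feature-2>')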

[Figures: Predictions Explorer over the marginal (one-feature) and joint (two-feature) distributions]

We have included a new notebook example (mlx_whatif.ipynb) that walks through the new What-if scenario feature.


ADS Upgrades, Bug Fixes, and Minor Changes 

In addition to resource principals and the new model explanation diagnostics, we upgraded the OCI Python SDK that ships with the notebook VM image to version 2.18.1 and git to version 2.27.0.

We also made improvements to the content of the model artifact that ADS generates via either the prepare_generic_model() interface or the ADSModel.from_estimator() approach:

  • We now generate the required artifact files for deployment to Oracle Functions by default.
  • There is no fn-model/ directory in your artifact anymore; everything is in the top-level directory of your artifact.
  • The generated model artifact now contains these five files at a minimum:
    • func.py: Python script containing the Oracle Functions handler() function definition.
    • func.yaml: Runtime environment definition for Oracle Functions.
    • requirements.txt: Best-guess list of the requirements needed to run your Oracle Function.
    • runtime.yaml: A description of the training environment of your model, capturing a comprehensive list of attributes of the notebook session in which the model was trained.
    • score.py: The inference script containing both load_model() and predict() (see the sketch below).
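For illustration, here is a minimal, hypothetical score.py of the shape ADS generates (the joblib serialization and the model.pkl file name are assumptions; the generated file depends on your model):

import joblib  # assumption: the model was serialized with joblib

model = None

def load_model(model_file_name='model.pkl'):
    # Load the serialized model from the artifact directory and cache it.
    global model
    if model is None:
        model = joblib.load(model_file_name)
    return model

def predict(data, model=None):
    # Return model predictions for the input payload.
    model = model if model is not None else load_model()
    return {'prediction': model.predict(data).tolist()}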

Several bugs found in ADS were fixed in this release, including:

  • the correlation map calculation in the ADSDataset show_in_notebook() and show_corr() methods;
  • the Data Flow client module ads.dataflow.dataflow;
  • progress bar indicator not completing in many ADS tasks;
  • and many more. 

We invite you to read the ADS release notes for a comprehensive list of all the bugs that were fixed.


Keep in Touch! 

•    Visit our service documentation
•    Our tutorials
•    (Oracle Internal) Our slack channel #oci_datascience_users 
•    Our YouTube Playlist

To learn more, visit the Oracle Data Science page, and follow us on Twitter @OracleDataSci.
 
