By Elena Sunshine, Sr. Principal Product Manager, and Jean-Rene Gauthier, Sr. Principal Product Data Scientist
On August 11, 2020, the Oracle Cloud Infrastructure Data Science service released an upgrade to the notebook session environment. Oracle Cloud Infrastructure Data Science is a serverless, fully managed platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure. In this post, we will review the major changes to the notebook session environment in this new release.
As with previous upgrades to the notebook sessions, these changes do not apply to currently running notebook sessions. To pick up the upgrade, create a new notebook session or deactivate and reactivate an existing one.
From time to time, data scientists need to access Oracle Cloud Infrastructure resources outside of their notebook session to complete a step of the model development lifecycle. For example, while using the Data Science service, you might want to read data from Object Storage or save a trained model to the model catalog.
Until now, users were required to add configuration and key files to their ~/.oci directory to authenticate as their own Oracle Cloud Infrastructure IAM user. Now, Oracle Cloud Infrastructure Data Science lets you authenticate with your notebook session's resource principal to access other Oracle Cloud Infrastructure resources. Compared to the configuration-and-key-files approach, resource principals provide a more secure and easier-to-use way to authenticate to resources.
To learn more about how to use resource principals in your notebook sessions, see the documentation.
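As a minimal sketch of what this looks like in practice (the service documentation is authoritative; the Object Storage call below is just an illustration):

import ads
import oci

# Tell ADS to sign requests with the notebook session's resource principal.
ads.set_auth(auth="resource_principal")

# The same identity can also be used directly with the OCI Python SDK.
signer = oci.auth.signers.get_resource_principals_signer()
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
print(object_storage.get_namespace().data)

Once ads.set_auth() is called, subsequent ADS operations in the notebook session authenticate as the resource principal instead of an IAM user.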
In this release, we included accumulated local effects (ALEs) as a new model explanation diagnostic in MLX, our machine learning explainability library. ALEs are global explainers: just like partial dependence plots (PDPs), they describe how feature values influence the predictions of machine learning models. In a nutshell, the difference between PDPs and ALEs lies in how the marginal expectation is computed. For PDPs, the expectation value is taken over the marginal distribution of the feature values. In contrast, ALEs take the expectation value over the conditional distribution of the features. This ensures that unlikely combinations of feature values are weighted down compared to more likely scenarios. Consequently, ALEs are an unbiased measure of feature impact on model predictions.
ALEs also differ from PDPs in how feature influence is measured. While PDPs directly average the model predictions across all data points, ALEs compute local changes in the predictions over small intervals of the feature of interest and accumulate them.
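For reference, here is a sketch of the two definitions in standard notation (following Apley and Zhu's ALE formulation; the notation is ours, not taken from the MLX source). For a model $f$, a feature of interest $X_S$, and the remaining features $X_C$:

$$\mathrm{PDP}_S(x) = \mathbb{E}_{X_C}\big[f(x, X_C)\big]$$

$$\mathrm{ALE}_S(x) = \int_{z_0}^{x} \mathbb{E}\left[\frac{\partial f(X_S, X_C)}{\partial X_S} \,\middle|\, X_S = z\right] dz - c$$

where $c$ is a constant that centers the curve around zero. The conditional expectation inside the ALE integral, taken at $X_S = z$, is precisely what down-weights unlikely feature combinations.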
You can access ALEs through the mlx module of ADS:
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer

# Build an explainer from your test data and trained model.
explainer_classification = ADSExplainer(<your-test-dataset>, <your-model>, training_data=<your-training-dataset>)

# Use MLX as the global explanation provider.
global_explainer_classification = explainer_classification.global_explanation(provider=MLXGlobalExplainer())

# Compute and visualize the ALE for a single feature.
ale = global_explainer_classification.compute_accumulated_local_effects("<your-feature>")
ale.show_in_notebook(labels=True)
We have included two new notebook examples, mlx_ale.ipynb and mlx_pdp_vs_ale.ipynb, that walk you through what ALEs are and when they are (and are not) a good choice.
We also added a new “What-if” diagnostic to MLX in this release. The purpose of What-if is to understand how changing feature values in either one example or an entire dataset impacts model predictions.
You can access the new what-if explainer from the mlx module of ADS:
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_whatif_explainer import MLXWhatIfExplainer

# Build an explainer from your data and trained model.
explainer = ADSExplainer(<your-data>, <your-model>, training_data=<your-train-dataset>)

# Use MLX as the what-if explanation provider.
whatif_explainer = explainer.whatif_explanation(provider=MLXWhatIfExplainer())
The explore_sample() feature allows you to interactively change the feature values of a single example and observe the impact on the model predictions.
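A minimal invocation might look like the following (we assume here that calling explore_sample() with no arguments launches the interactive widget on a default example; see the mlx_whatif.ipynb notebook for the exact options):

# Assumed no-argument call; consult the example notebook for parameters.
whatif_explainer.explore_sample()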
You can also take advantage of the Predictions Explorer tool, which allows you to explore model predictions across either the marginal distribution (one feature) or the joint distribution (two features) of feature values:
whatif_explainer.explore_predictions(x='<your-feature>')
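For the two-feature joint distribution, we assume a second keyword argument along these lines (the parameter name y is our assumption, not confirmed by this post):

# Hypothetical two-feature call; 'y' is assumed to name the second feature.
whatif_explainer.explore_predictions(x='<feature-1>', y='<feature-2>')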
We have included a new notebook example (mlx_whatif.ipynb) to go over the new What-if scenario feature.
In addition to resource principals and new model explanation diagnostics, we also upgraded the OCI Python SDK included in the notebook VM image to version 2.18.1, and git to version 2.27.0.
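You can confirm the versions from a notebook cell; a quick check (assuming a standard notebook session environment):

import oci
print(oci.__version__)  # expected: 2.18.1 in the new image

!git --version  # expected: git version 2.27.0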
We also made improvements to the content of the model artifact that ADS generates via either the prepare_generic_model() interface or the ADSModel.from_estimator() approach; see the ADS release notes for the full list of changes.
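For context, these are the two artifact-generation paths in ADS; here is a minimal sketch of both (the directory path and the model are illustrative, not from the original post):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from ads.common.model import ADSModel
from ads.common.model_export_util import prepare_generic_model

# Train a small illustrative model.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X, y)

# Path 1: wrap a trained estimator so ADS tooling can work with it.
ads_model = ADSModel.from_estimator(clf)

# Path 2: generate a generic artifact skeleton in a local directory,
# then edit score.py before saving it to the model catalog.
artifact = prepare_generic_model("/tmp/my_model_artifact", force_overwrite=True)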
Several bugs found in ADS were also fixed in this release. We invite you to read the ADS release notes for a comprehensive list of all the bugs that were fixed.
• Visit our service documentation.
• Check out our tutorials.
• (Oracle internal) Join our Slack channel #oci_datascience_users.
• Watch our YouTube playlist.
To learn more, visit the Oracle Data Science page, and follow us on Twitter @OracleDataSci.