On January 13, 2021, the Oracle Cloud Infrastructure (OCI) Data Science service added a new feature, Conda Environments, to the notebook session resource. The feature includes a JupyterLab extension called the Environment Explorer, available through the JupyterLab Launcher tab, and a CLI tool called odsc conda, available through the JupyterLab terminal window. Together, these tools let you manage the lifecycle of Conda Environments in notebook sessions.
This is a major change to the notebook session resource that data scientists have been using since the OCI Data Science service was released in February 2020. Data scientists and machine learning (ML) engineers can now pick and choose which environments to install in their notebook sessions from a list of pre-built ones, or create, install, and publish their own environments.
In this post, I give an overview of the new Conda Environments feature set.
You can think of a Conda Environment as somewhere between a Docker image and a Python virtual environment. Conda is like a virtual environment that lets you run Python processes in different environments with different versions of the same library. It’s more powerful than virtualenv, because it also manages different versions of Python that aren’t installed system-wide, lets you upgrade libraries, and supports the installation of packages for R, Python, Node.js, Java, and so on.
The process of building Conda Environments is simpler and faster than building Docker images. For many ML and AI use cases, Conda Environments offer the right level of isolation and flexibility.
Conda Environments give you the following capabilities:
You can install Python libraries from different Conda channels such as conda-forge, from a PyPI service, or directly from a third-party version control provider such as github.com.
Conda Environments are also portable through the conda-pack tool. You can archive them in an Object Storage bucket, for example, or ship them to other machines. Note that conda-pack expects the target machine to run the same operating system and platform as the machine where the environment was built.
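To make the archiving step concrete, here is a minimal sketch of how you might call conda-pack from Python. It assumes the `conda-pack` executable is on your path, and the environment name `my_env` and the archive name are hypothetical. In a notebook session, the odsc conda tooling handles packaging for you, so this only illustrates the underlying idea.

```python
import shutil
import subprocess
from typing import List


def build_pack_command(env_name: str, output_path: str) -> List[str]:
    """Return the conda-pack command that archives a named environment."""
    # -n selects the environment by name, -o sets the archive path.
    return ["conda-pack", "-n", env_name, "-o", output_path]


def pack_environment(env_name: str, output_path: str) -> bool:
    """Run conda-pack if it is installed; return True when it was run."""
    if shutil.which("conda-pack") is None:
        return False  # conda-pack is not available in this session
    subprocess.run(build_pack_command(env_name, output_path), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(build_pack_command("my_env", "my_env.tar.gz")))
```

The resulting archive is a regular tar file, so you can upload it to a bucket or copy it to another machine like any other file.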
You can access different Conda Environments as different notebook kernels in JupyterLab. So, data scientists and machine learning engineers can simultaneously execute different notebooks in different kernels with potentially conflicting sets of dependencies.
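For readers curious about how an environment shows up as a kernel: Jupyter discovers kernels through small `kernel.json` spec files that tell it which interpreter to launch. The sketch below builds such a spec for a single environment. The environment path and display name are hypothetical, and the service creates the real spec for you, so treat this as an illustration of the mechanism rather than something you need to run.

```python
import json
from pathlib import PurePosixPath


def make_kernel_spec(env_prefix: str, display_name: str) -> dict:
    """Build a Jupyter kernel spec that starts ipykernel from one environment."""
    python_path = str(PurePosixPath(env_prefix) / "bin" / "python")
    return {
        # argv is the command Jupyter runs to start the kernel;
        # Jupyter substitutes {connection_file} at launch time.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


if __name__ == "__main__":
    # Hypothetical environment location, for illustration only.
    spec = make_kernel_spec("/home/user/conda/my_env", "Python [conda env: my_env]")
    print(json.dumps(spec, indent=2))
```

Because each spec points at a different interpreter, two notebooks can run side by side against environments whose dependencies would conflict if installed together.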
Within notebook sessions, you can use the Environment Explorer extension, available through the JupyterLab Launcher tab, to list, install, publish, delete, and clone Conda Environments. The Explorer tabs let you filter the list to the Data Science, Installed, or Published Conda Environments.
Alternatively, you can call the odsc conda CLI to list, create, install, delete, clone, and publish Conda Environments directly from the JupyterLab terminal window.
From the odsc conda CLI or the Explorer extension, you can install one or more of the Data Science Conda Environments. Those environments are built and curated by the OCI Data Science service team. Although all of these environments include Accelerated Data Science (ADS), only Classic and General Machine Learning environments include AutoML and MLX.
More Data Science Conda Environments covering NLP, computer vision, time series, and geospatial modeling will be added over time.
Each Data Science Conda Environment is versioned. New versions of existing Data Science Conda Environments can include upgrades to libraries, ADS, and notebook examples, as well as entirely new libraries.
Maybe the Data Science Conda Environments don’t have exactly what you are looking for. That’s not a problem. You can always create your own Conda Environment using the odsc conda create command. List the libraries you want to install in a Conda-compatible environment.yaml file, and we take care of the rest, including installing the dependencies needed to turn your Conda Environment into a notebook kernel! Conda supports the installation of libraries from both Conda channels and pip.
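As a rough illustration, the sketch below writes a small environment.yaml that mixes a Conda channel with a pip dependency. The environment name, the Python version, and the package choices are all hypothetical. You would then pass the file to odsc conda create; check the service documentation for the exact options that command accepts.

```python
import tempfile
from pathlib import Path

# Hypothetical environment definition, for illustration only.
ENVIRONMENT_YAML = """\
name: my_env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pandas
  - pip
  - pip:
      - requests
"""


def write_environment_file(directory: str) -> Path:
    """Write the environment definition to <directory>/environment.yaml."""
    path = Path(directory) / "environment.yaml"
    path.write_text(ENVIRONMENT_YAML)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_environment_file(tmp).read_text())
```

Entries directly under `dependencies` are resolved from the listed Conda channels, while entries under the nested `pip` key are installed with pip.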
One of the greatest benefits of our environment feature is the ability to take a Conda Environment that you’ve installed or created in your notebook session and publish it to your Object Storage bucket, using the odsc conda publish command. This capability allows you to share Conda Environments with colleagues who have access to the same bucket, or to install a Conda Environment that you’ve previously published in a different notebook session.
Once an environment is published, it becomes available under the Published Conda Environments tab of the Explorer extension. The odsc conda list -o command lists the same environments from a terminal window! You can reinstall any of the environments that you’ve previously published.
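If you want to script that listing step, a small wrapper like the following is one option. It uses only the odsc conda list -o command mentioned above and assumes nothing else about the CLI. The helper name `run_if_available` is my own, and it returns None when the tool is not on the path, for example outside a notebook session.

```python
import shutil
import subprocess
from typing import List, Optional

# The command named in this post; no other flags are assumed.
LIST_PUBLISHED = ["odsc", "conda", "list", "-o"]


def run_if_available(command: List[str]) -> Optional[str]:
    """Run a CLI command and return its stdout, or None if the tool is missing."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    output = run_if_available(LIST_PUBLISHED)
    print(output if output is not None else "odsc is not available here")
```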
All Conda Environments that you create or install in your notebook session are stored on the block volume drive, so they persist across a notebook session deactivation and activation cycle. You never have to reinstall the same libraries after activating a notebook session again!
Notebook examples are now specific to each Data Science Conda Environment that you install in your notebook session. Each Conda Environment has its own set of example notebooks, and we also include a tailored Getting Started notebook for each Data Science Conda Environment. Whenever you delete a Conda Environment, we also delete the associated notebook examples.
You can now install one or more of the pre-built Data Science Conda Environments that are made available through the Explorer notebook extension and the odsc conda CLI tool. Each Conda Environment comes with its own set of notebook examples to help you get started quickly with each environment.
We offer various Conda Environments, and this list grows over time to include Conda Environments tailored for particular use cases like NLP, computer vision, and time series.
The following Data Science Conda Environments are available in this initial release:
Attend Oracle Developer Live for technical sessions, hands-on labs, and live Q&A about how you can optimize data for the machine learning lifecycle. Sign up for any of the following dates:
I'll be leading sessions on Accelerated Data Science, the end-to-end machine learning lifecycle, and using GPUs for data science with NVIDIA RAPIDS.
- Visit our website
- Visit our service documentation
- (Oracle Internal) Visit our slack channel #oci_datascience_users
- Visit our YouTube Playlist
- Visit our LiveLabs Hands-on Lab