How to create a new conda environment in OCI Data Science

August 31, 2021 | 7 minute read
JR Gauthier
Sr Principal Product Data Scientist

Earlier this year, the Oracle Cloud Infrastructure Data Science service team released a new feature in notebook sessions that lets you install one or more pre-built Data Science Conda Environments in your notebook session and use the same conda environment as a runtime environment for model deployment. There are now over 20 pre-built conda environments to choose from, including ones dedicated to Oracle PyPGX, PySpark, NVIDIA RAPIDS, and more.

But what if you need to create your own custom conda environment? There are many reasons why you might want to do that. Perhaps the pre-built Data Science Conda Environments do not include the right libraries or library versions that you need, you want to install your own proprietary library, or you need a different Python runtime version. 

The following is a simple recipe to create a conda environment in a notebook session using the odsc command line interface (CLI) and to share the environment with colleagues using the publish feature of odsc. You can also watch the accompanying screencast, which walks you through the steps needed to create your own conda environment in notebook sessions.

Step 1: Open or launch a notebook session 

In OCI Data Science, launch a notebook session. Make sure that your notebook session can access the public internet, since you will be downloading Python packages from public channels.
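If you want to check connectivity from a notebook cell before going further, here is a minimal sketch using only the Python standard library. It attempts a TCP connection on port 443 to a few public hosts; the host list (conda.anaconda.org for conda-forge, pypi.org and files.pythonhosted.org for pip) is an assumption about where your channels live, so adjust it to match your environment.yaml.

```python
import socket

# Assumed hosts behind the conda-forge channel and PyPI; adjust for your channels.
DEFAULT_HOSTS = ("conda.anaconda.org", "pypi.org", "files.pythonhosted.org")


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_hosts(hosts=DEFAULT_HOSTS, port: int = 443, timeout: float = 3.0) -> dict:
    """Map each host to whether it was reachable."""
    return {host: can_reach(host, port, timeout) for host in hosts}
```

In a notebook cell, `print(check_hosts())` should report `True` for every host when the session has public internet access.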

Step 2: Write a conda-compatible environment.yaml file

The next step is to write a conda-compatible environment file (environment.yaml). This file lists the channels and the dependencies that you want to install in your conda environment. Here is a simple environment.yaml that I wrote. It instructs conda to pull packages from the conda-forge channel:

channels: 
  - conda-forge
dependencies:
  - numpy
  - pandas

You can also install packages from PyPI.

Adding pip packages to the list of dependencies

To install packages directly from PyPI, add a pip entry to the dependencies in environment.yaml:

channels:
  - conda-forge
dependencies:
  - numpy
  - pandas
  - pip
  - pip:
    - scikit-learn==0.24.2

This instructs conda to use pip to install scikit-learn version 0.24.2 from pypi.org.
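If you prefer to create the file from a notebook cell rather than a text editor, the following is a minimal sketch that renders and writes an environment.yaml with the standard library only (no YAML library). The function names are hypothetical. It also adds pip to the conda dependencies whenever a pip section is present, because conda warns when pip-installed dependencies are listed without pip itself.

```python
from pathlib import Path


def render_environment_yaml(channels, conda_deps, pip_deps=()):
    """Render a conda environment file as YAML text (hand-written, no PyYAML)."""
    conda_deps = list(conda_deps)
    pip_deps = list(pip_deps)
    # conda recommends listing pip explicitly whenever a pip section is used.
    if pip_deps and "pip" not in conda_deps:
        conda_deps.append("pip")
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        # Nested list of requirement specifiers handed to pip.
        lines.append("  - pip:")
        lines += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(lines) + "\n"


def write_environment_yaml(path, channels, conda_deps, pip_deps=()):
    """Write the rendered file to path and return it as a Path."""
    target = Path(path)
    target.write_text(render_environment_yaml(channels, conda_deps, pip_deps))
    return target
```

For the example in this post, `write_environment_yaml("environment.yaml", ["conda-forge"], ["numpy", "pandas"], ["scikit-learn==0.24.2"])` writes the file into the current working directory, which is where you would then run odsc conda create.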

Step 3: Create the conda environment with the odsc conda create command

Once we have an environment.yaml file, the next step is to create the conda environment. For this step, you could, in principle, use a simple conda env create command, but we highly recommend that you use the odsc conda create command instead. Why? Because that way, odsc will install the additional Python dependencies (ipykernel, jupyterlab, nb_conda_kernels) that are required for your conda environment to become available as a JupyterLab notebook kernel, and it will automatically create the conda environment manifest file on your behalf. You can find the list of all the additional dependencies that odsc installs in the file base-env.yaml. The odsc conda create command generates that file in the same folder as your environment.yaml file. 

Open a terminal window in your notebook session and run: 

odsc conda create -f environment.yaml -n my-conda-env

This command creates a new conda environment called my-conda-env, which becomes available as a kernel in your notebook session. By default, version v1.0 is assigned to the conda environment and appended to its name to form the conda slug name. You can set a different version with the optional -v parameter of the create command.
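The default version and the slug shown in Step 5 (my-conda-env with v1.0 becoming my-conda-envv1_0) suggest a simple naming pattern. The helper below is a hypothetical sketch of that pattern, inferred from this single example rather than taken from odsc, and may be handy when scripting later steps such as odsc conda publish -s. Confirm the actual slug in the Environment Explorer before relying on it.

```python
def guess_slug(name: str, version: str = "1.0") -> str:
    """Hypothetical helper: guess a conda slug name from name and version.

    Inferred from the example my-conda-env + v1.0 -> my-conda-envv1_0;
    odsc may apply further normalization that this sketch does not model.
    """
    # Accept either "1.0" or "v1.0" and normalize to the "v1_0" suffix form.
    if version.startswith("v"):
        version = version[1:]
    return f"{name}v{version.replace('.', '_')}"
```

For example, `guess_slug("my-conda-env")` returns `"my-conda-envv1_0"`, matching the slug used in Step 5.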

The command takes a couple of minutes to complete. To explore the options available when creating a conda environment, run:

odsc conda create --help

Once the odsc conda create command has completed, it's time to check your work.

Step 4: Validate the new conda environment 

Go back to the launcher tab of your notebook session. You should see the new kernel available under "Notebook" and "Console".

Click on the new notebook kernel button to generate a new notebook file (ipynb). Confirm that the notebook is executed in the kernel you just created. The name of the kernel shows up in the top right corner of the notebook tab.
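Besides reading the kernel name, you can ask the running kernel which environment it belongs to. In a conda environment, `sys.prefix` points at the environment directory, so a minimal check is to compare the last path component with the slug name. This sketch assumes that odsc names the environment directory after the slug (for example my-conda-envv1_0); print `sys.prefix` first to confirm the layout in your session.

```python
import sys
from pathlib import Path


def running_in_env(slug: str, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter's prefix directory is named after slug."""
    return Path(prefix).name == slug
```

In a cell of the new notebook, `print(sys.prefix)` followed by `running_in_env("my-conda-envv1_0")` should show the environment path and `True`.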

In your notebook, import numpy and pandas and confirm that these libraries are available in your environment. Do the same for scikit-learn if you installed it from PyPI:

import numpy 
import pandas 
import sklearn 

print(numpy.__version__)
print(pandas.__version__)
print(sklearn.__version__)
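To compare installed versions against the pins in environment.yaml rather than reading printed output, you can look up package metadata with `importlib.metadata` (Python 3.8+). This is a minimal sketch: the `pins` dictionary is copied by hand, not parsed from the file, and it uses distribution names (scikit-learn), which can differ from import names (sklearn).

```python
from importlib import metadata


def check_pins(pins, lookup=metadata.version):
    """Compare installed distribution versions against expected pins.

    pins maps a distribution name to an exact version string, or to None
    to only require that the package is installed. Returns a dict of
    problems keyed by name; an empty dict means everything matched.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if expected is not None and installed != expected:
            problems[name] = f"expected {expected}, found {installed}"
    return problems
```

In the new kernel, `check_pins({"numpy": None, "pandas": None, "scikit-learn": "0.24.2"})` should return an empty dict.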

Tip: If you ever need to switch to a different kernel, simply click on the kernel name in the top right corner and you will be able to choose a different one from the "Select Kernel" window. 

 

Step 5: Publish the new environment 

This step is optional. Once you've successfully created the conda environment, you can use it to run notebooks and Python scripts. If you want to share the conda environment with your colleagues, use it across notebook sessions, or assign it as a runtime environment for Model Deployment, you have to publish the environment.

Publishing a conda environment consists of creating a pack and uploading it to an Object Storage bucket that you specify. This lets conda environments be shared among colleagues or persisted across notebook sessions. We recommend that you publish conda environments to ensure that a model training environment can be reproduced or re-used for model deployment (that is, assign it as the inference environment of your model). 

You can use the odsc CLI to publish an environment. First, you need to specify the target object storage bucket where the published environment will be stored. This can be done through the odsc conda init command: 

odsc conda init -b <bucket-name> -n <namespace> -a {api_key | resource_principal}

You only have to run this command once after starting your notebook session. Replace the values in the command above with the name of your Object Storage bucket and the namespace of your OCI tenancy, and select the method to authenticate with Object Storage: api_key if you are using a user principal (you will also need to provide the path to your OCI config file with the -k option), or resource_principal if your notebook session is authorized to access the bucket you specified.
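If you script your notebook-session setup, the init call can be assembled in Python. The sketch below only builds the argument list from the options described above (-b, -n, -a, and -k for api_key) and runs nothing unless you call `run_init`. The function names and the `~/.oci/config` default path are assumptions; check `odsc conda init --help` for the authoritative option list.

```python
import os
import subprocess

# The two authentication modes named in the text.
VALID_AUTH = ("api_key", "resource_principal")


def build_init_command(bucket, namespace, auth="resource_principal",
                       config_path="~/.oci/config"):
    """Hypothetical helper: build the argv list for `odsc conda init`."""
    if auth not in VALID_AUTH:
        raise ValueError(f"auth must be one of {VALID_AUTH}, got {auth!r}")
    cmd = ["odsc", "conda", "init", "-b", bucket, "-n", namespace, "-a", auth]
    if auth == "api_key":
        # User-principal auth also needs the OCI config file path via -k.
        # subprocess does not expand "~" (no shell), so expand it here.
        cmd += ["-k", os.path.expanduser(config_path)]
    return cmd


def run_init(bucket, namespace, **kwargs):
    """Run odsc conda init; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_init_command(bucket, namespace, **kwargs),
                          check=True)
```

Building the list separately from running it keeps the flag logic easy to inspect before anything touches your bucket configuration.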

Now you are ready to publish the environment with the odsc conda publish command. Specify the slug name of the conda environment you just created. The slug name is the name of the conda environment and its version. It corresponds to the notebook kernel name minus the "conda-env:" part. In my case, it is my-conda-envv1_0. Yours should be the same if you ran the same odsc commands. You can also find the slug name by inspecting the conda environment cards in the Environment Explorer extension (see below).

odsc conda publish -s my-conda-envv1_0

The publish command uses the Object Storage multipart upload feature to push the pack to Object Storage faster.

Go to your object storage bucket in the OCI console and confirm that the new conda pack is stored in the bucket. 
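If you'd rather confirm the upload from the notebook itself, the OCI Python SDK can list the objects in a bucket. The sketch below assumes the `oci` package is installed and that you build an `ObjectStorageClient` with the same authentication you chose for odsc conda init. Because the object naming layout odsc uses is not described here, it simply keeps object names that contain the slug; that filter is an assumption.

```python
def find_pack_objects(client, namespace, bucket, slug):
    """Return names of objects in bucket whose name contains slug.

    client is an oci.object_storage.ObjectStorageClient, or any object with a
    compatible list_objects method. Results are paged with next_start_with,
    since a single response may not include every object in the bucket.
    """
    names = []
    start = None
    while True:
        kwargs = {"start": start} if start else {}
        response = client.list_objects(namespace, bucket, **kwargs)
        names += [obj.name for obj in response.data.objects if slug in obj.name]
        start = response.data.next_start_with
        if not start:
            return names


# Example (requires the oci package and valid credentials):
# import oci
# client = oci.object_storage.ObjectStorageClient(oci.config.from_file())
# # Or, with resource principals:
# # signer = oci.auth.signers.get_resource_principals_signer()
# # client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
# print(find_pack_objects(client, "<namespace>", "<bucket>", "my-conda-envv1_0"))
```

Passing the client in, rather than creating it inside the function, keeps the authentication choice (API key or resource principal) in one place.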

Congratulations! You have created a conda environment that you can use to run notebooks, and you have published it to your Object Storage bucket.

Listing the conda environments through the environment explorer extension 

In the Environment Explorer extension of the notebook session, you can also list and inspect all the conda environments installed in your notebook session, as well as the ones published to a shared Object Storage bucket.

Click on the "Installed Conda Environments" tab. You should see a card with your conda environment. 

Now click on the "Published Conda Environments" tab. You should see a card with your published conda environment. If your team uses the same bucket to publish their conda environments, you will also see the conda environments that your colleagues have created, and you can install them in your notebook session. This also makes a shared bucket a good way to archive or share environments across notebook sessions.

 



I've also written posts under my full first name. You can find those posts here: https://blogs.oracle.com/ai-and-datascience/authors/Blog-Author/COREA7667DA212B34765B4DB91B94737F00E/jean-rene-gauthier

