Accelerate machine learning in Oracle Cloud Infrastructure on NVIDIA GPUs with RAPIDS

January 19, 2021 | 7 minute read
JR Gauthier
Sr Principal Product Data Scientist

Earlier this month, Oracle Cloud Infrastructure (OCI) Data Science released Conda Environments for notebook sessions. One of the environments available for NVIDIA GPU virtual machines (VMs) is the RAPIDS (version 0.16) environment. In this post, I give an overview of NVIDIA RAPIDS and why it's awesome!

First, RAPIDS is a suite of open source machine learning libraries that lets machine learning engineers execute end-to-end machine learning and analytics pipelines entirely on GPUs. I like to think of RAPIDS as the "pandas+sklearn for GPUs": a flexible collection of libraries that covers a wide range of algorithms typically used in industry. As you might expect, training models on GPUs brings substantial speedups over CPUs, and I show a few examples in this post.

The RAPIDS API is also similar to what you experience in the PyData stack of libraries, including pandas, scikit-learn, and Dask. If you're familiar with those libraries, the RAPIDS learning curve is minimal.

To get started with OCI Data Science and our notebook session environment, visit our "Getting Started with Data Science" tutorial page. This page guides you through the process of configuring your OCI tenancy for Data Science and launching a notebook session. You can also watch the video "Oracle Cloud Infrastructure Data Science Tenancy Setup" on our YouTube playlist. 

Create a new notebook session, or activate a deactivated one, on a GPU shape. Both VM.GPU2.1 and VM.GPU3.X shapes work. Install the NVIDIA RAPIDS 0.16 conda environment from the Environment Explorer tool. RAPIDS works out of the box on these GPUs.

If you want to monitor GPU usage while you run RAPIDS commands, I recommend the handy gpustat command from a JupyterLab terminal window. The following command refreshes the statistics every 3 seconds:

gpustat -i 3 
In my terminal window, gpustat then shows the temperature, utilization, and memory usage of each GPU.

Overview of the NVIDIA RAPIDS Conda Environment for GPUs 

One of the Data Science Conda Environments that we offer in notebook sessions is the NVIDIA RAPIDS version 0.16. It's available on VM.GPU2.1 (NVIDIA P100) and VM.GPU3.X (NVIDIA V100 Tensor Core GPU) shapes.  Watch our new video on "Data Science Conda Environments" available in our YouTube playlist for more information about the new conda environments feature. 

In the remainder of this post, I walk you through each of the RAPIDS libraries to showcase their functionality and performance improvements.


RAPIDS dataframe ETL library (cuDF)

RAPIDS cuDF (short for CUDA data frame) is probably the first library you will need when using RAPIDS. Like pandas, cuDF handles data frames. It's built on Apache Arrow's columnar data format. You can perform most of the standard pandas or Dask data frame operations with cuDF, and cuDF supports the standard NumPy data types (dtypes).

Let's download a CSV file from our public Object Storage bucket. The data is meant for training classifiers on an imbalanced dataset: it has 294K rows and 36 columns, and the target column is called "anomalous".

import pandas as pd 
import cudf as cdf 
from urllib.request import urlretrieve

urlretrieve("", "./fraud.csv")

In the next code cell, I read the local fraud.csv file with pandas using read_csv(). I then do the same operation in a separate cell with cuDF's read_csv() and compare the wall times of both executions.


pdf = pd.read_csv("fraud.csv")

With cuDF:


df = cdf.read_csv("fraud.csv")

Same dataset, same location: cuDF reads the file about twice as fast as pandas. This is a small dataset; on a much larger one (11M rows with 29 columns), I got a speedup of about 20 times. I ran my tests on a VM.GPU3.2 shape. cuDF can also ingest data in various formats, including Avro, CSV, Feather, HDF, ORC, Parquet, and JSON.

Data ingestion is faster with cuDF, but the speedups are even larger for operations on the data. Next, let's try a simple sort, first with pandas and then with cuDF.
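Because cuDF mirrors the pandas API, the sort itself is the same call in both libraries. A minimal sketch, assuming a hypothetical numeric column named `amount` next to the `anomalous` target (substitute real columns from fraud.csv); `sort_by` is a hypothetical helper, and the GPU branch runs only when `cudf` can be imported:

```python
import pandas as pd

try:
    import cudf  # importable only in the RAPIDS environment on a GPU shape
except ImportError:
    cudf = None


def sort_by(frame, columns):
    """Sort a pandas or cuDF DataFrame; both expose sort_values(by=...)."""
    return frame.sort_values(by=columns)


if __name__ == "__main__":
    # Hypothetical toy data; "amount" is a placeholder column name.
    pdf = pd.DataFrame({"amount": [30.0, 10.0, 20.0], "anomalous": [1, 0, 0]})
    print(sort_by(pdf, ["anomalous", "amount"]))  # CPU sort with pandas
    if cudf is not None:
        gdf = cudf.from_pandas(pdf)  # copy the frame into GPU memory
        print(sort_by(gdf, ["anomalous", "amount"]))  # identical call, runs on the GPU
```

In the notebook, `df` already lives in GPU memory, so only the `sort_values` call itself should be timed; `cudf.from_pandas` appears here only to build a small GPU frame for the demo.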


I get speedups of about 5-6 times. Now, try a custom transformation of your choice. cuDF supports various transformations on data frames including grouping, joining, filtering, and custom transformations applied to rows and columns. To learn more about the supported transformations, consult the cuDF documentation or refer to the cuDF GitHub repo. If a transformation isn't supported with cuDF, you can convert to a pandas data frame, apply the transformations, convert back to a cuDF data frame, and then resume your workload. 
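The pandas fallback described above is a round trip through host memory: `DataFrame.to_pandas()` copies a cuDF data frame to the CPU, and `cudf.from_pandas()` copies the result back to the GPU. A minimal sketch, where `apply_on_cpu` is a hypothetical helper and `share_of_class_total` is a placeholder for whatever pandas-only transformation you need (it also assumes a hypothetical `amount` column):

```python
from typing import Callable

import pandas as pd

try:
    import cudf  # importable only in the RAPIDS environment on a GPU shape
except ImportError:
    cudf = None


def apply_on_cpu(frame, transform: Callable[[pd.DataFrame], pd.DataFrame]):
    """Apply a pandas transformation to a pandas or cuDF DataFrame.

    cuDF input is copied to host memory, transformed with pandas, and copied
    back to the GPU; pandas input is transformed directly.
    """
    if cudf is not None and isinstance(frame, cudf.DataFrame):
        host_frame = frame.to_pandas()                  # GPU -> host
        return cudf.from_pandas(transform(host_frame))  # host -> GPU
    return transform(frame)


def share_of_class_total(d: pd.DataFrame) -> pd.DataFrame:
    """Placeholder transformation: each amount as a share of its class total."""
    totals = d.groupby("anomalous")["amount"].transform("sum")
    return d.assign(share=d["amount"] / totals)


if __name__ == "__main__":
    pdf = pd.DataFrame({"amount": [1.0, 3.0, 2.0], "anomalous": [0, 0, 1]})
    frame = cudf.from_pandas(pdf) if cudf is not None else pdf
    print(apply_on_cpu(frame, share_of_class_total))
```

Each round trip copies the whole data frame between host and GPU memory, so it pays to group pandas-only steps together rather than bouncing back and forth.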

Another interesting feature of cuDF is that it can be used with Dask via the dask_cudf library for scaling workloads to multiple GPUs on the same compute node or to multiple nodes with multiple GPUs. If your data doesn't fit in the memory of a single GPU, use dask_cudf. 
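A minimal multi-GPU sketch with dask_cudf, assuming the `dask_cuda` package (which provides `LocalCUDACluster`) is installed alongside dask_cudf and that the `anomalous` target is encoded as 0/1. `rapids_dask_available` and `summarize_on_gpus` are hypothetical helpers:

```python
import importlib.util


def rapids_dask_available() -> bool:
    """True when both dask_cudf and dask_cuda can be imported."""
    return all(importlib.util.find_spec(name) is not None
               for name in ("dask_cudf", "dask_cuda"))


def summarize_on_gpus(path: str) -> dict:
    """Read a CSV into a multi-GPU dask_cudf frame and compute simple stats.

    Assumes the target column "anomalous" is encoded as 0/1.
    """
    # Import everything first so a missing package fails fast with
    # ImportError, before any GPU workers are started.
    import dask_cudf
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # LocalCUDACluster starts one Dask worker per visible GPU on this node.
    with LocalCUDACluster() as cluster, Client(cluster):
        ddf = dask_cudf.read_csv(path)  # partitions are spread across GPU workers
        n_rows = len(ddf)
        n_anomalous = int(ddf["anomalous"].sum().compute())
    return {"rows": n_rows, "anomalous": n_anomalous}


if __name__ == "__main__" and rapids_dask_available():
    print(summarize_on_gpus("fraud.csv"))
```

On a VM.GPU3.2 shape this would start two workers, one per GPU; the same code runs unchanged on a single-GPU shape.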

RAPIDS machine learning library (cuML)

RAPIDS cuML is the machine learning library of RAPIDS. It closely follows the scikit-learn API and provides implementations for the following algorithms: 

  • DBSCAN and K-means for clustering analysis.
  • PCA, tSVD, UMAP, and t-SNE for dimensionality reduction.
  • Linear and logistic regression with lasso, ridge, and elastic net regularization for both regression and classification use cases.
  • Random forest for both regression and classification use cases.
  • K-nearest neighbors for both classification and regression use cases.
  • Support vector machines: SVC and epsilon-SVR.
  • Holt-Winters exponential smoothing and ARIMA modeling for time series.
  • and more. 
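Because cuML mirrors the scikit-learn estimator interface, moving between CPU and GPU can come down to changing an import. A minimal sketch using K-means, assuming only that `cuml.cluster.KMeans` and `sklearn.cluster.KMeans` both accept `n_clusters` and `random_state` and expose `fit`/`predict`; `cluster_labels` is a hypothetical helper and the blobs are synthetic:

```python
import numpy as np

try:
    from cuml.cluster import KMeans  # GPU implementation (RAPIDS environment only)
except ImportError:
    from sklearn.cluster import KMeans  # CPU implementation with the same interface


def cluster_labels(X, n_clusters: int, seed: int = 0) -> np.ndarray:
    """Fit K-means and return one cluster label per row.

    The body is identical whichever KMeans was imported above.
    """
    X32 = np.asarray(X, dtype=np.float32)
    model = KMeans(n_clusters=n_clusters, random_state=seed)
    model.fit(X32)
    return np.asarray(model.predict(X32))


if __name__ == "__main__":
    # Two well-separated synthetic blobs (hypothetical data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
    print(cluster_labels(X, n_clusters=2))
```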

cuML is well suited to "bread-and-butter" machine learning and to training models on tabular datasets. As with cuDF, you can use Dask to train most of the listed algorithms at scale on more than one GPU. You can benefit from this distributed approach by launching notebook sessions on VM.GPU3.2 or VM.GPU3.4 shapes.

Let's look at an example. The following code snippet compares training a random forest with scikit-learn and with cuML. It's similar to what the NVIDIA team posted in their introductory notebook.

Let's first import the modules we need. Although cuML's random forest implementation differs from sklearn's, I used a similar set of hyperparameters for both to allow a meaningful comparison.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split as sk_train_test_split

import cuml
from cuml.preprocessing.model_selection import train_test_split as cu_train_test_split
from cuml.ensemble import RandomForestClassifier as cuRF

# random forest depth and size. For both cuML and sklearn algos
n_estimators = 25
max_depth = 10

Next, we split each dataset column-wise, separating the features from the target variable. Then we do a train/test split with the sklearn and cuML functions, respectively. Both splits produce train and test sets of the same size.

# splitting our datasets between features and target variable: 
X, y = pdf[pdf.columns.drop('anomalous').values], pdf['anomalous']
Xcu, ycu = df[df.columns.drop('anomalous').values], df['anomalous']

# Convert the features to float32: cuML's random forest predict
# supports only float32 inputs.
X = X.astype(np.float32)
Xcu = Xcu.astype(np.float32)

# Train/test split for each dataset. By default, sklearn train_test_split function 
# generates a train dataset with size corresponding to 75% of the original dataset.  
X_train, X_test, y_train, y_test = sk_train_test_split(X, y)
Xcu_train, Xcu_test, ycu_train, ycu_test = cu_train_test_split(Xcu, ycu, train_size=0.75)

We're now ready to instantiate estimator objects for both libraries using similar hyperparameters:

# sklearn random forest estimator: 
sk_est = RandomForestClassifier(n_jobs=-1, max_depth=max_depth, n_estimators=n_estimators)

# cuML random forest estimator: 
cu_est = cuRF(max_depth=max_depth, n_estimators=n_estimators, random_state=0)

We can fit each estimator separately: 


sk_rfc = sk_est.fit(X_train, y_train)

And for cuML: 


cu_rfc = cu_est.fit(Xcu_train, ycu_train)

On a separate machine with an NVIDIA P100 GPU card, I noticed a speedup of roughly 50 times with cuML: fitting the random forest estimator took about five seconds with sklearn and about 100 ms with cuML. That speed also lets me explore the hyperparameter space of my random forest much more efficiently with cuML than with sklearn.

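Now we can compare the performance of each estimator on the test dataset. Since I didn't apply any up- or downsampling to correct for the class imbalance, take the results with a grain of salt: when one class dominates, even a model that always predicts the majority class scores a high accuracy.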

sk_predictions_test = sk_rfc.predict(X_test)
cu_predictions_test = cu_rfc.predict(Xcu_test)

cu_score = cuml.metrics.accuracy_score(ycu_test, cu_predictions_test)
sk_score = accuracy_score(y_test, sk_predictions_test)

print(" cuml accuracy: ", cu_score)
print(" sklearn accuracy : ", sk_score)

 cuml accuracy:  0.9993894100189209
 sklearn accuracy :  0.9995793872622181

Both algorithms perform similarly, but cuML's random forest trains about 50 times faster. I invite you to generate datasets of different sizes and look at the impact of dataset size on cuML's performance.
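That experiment is easy to script. A minimal sketch, assuming scikit-learn's `make_classification` as a stand-in for real data (`weights=[0.99]` roughly mimics an imbalanced target, an assumption of this sketch) and the same `n_estimators` and `max_depth` as above; `fit_seconds` and `scaling_study` are hypothetical helpers, and the cuML timing runs only where cuML can be imported:

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

try:
    from cuml.ensemble import RandomForestClassifier as cuRF  # RAPIDS environment only
except ImportError:
    cuRF = None


def fit_seconds(estimator, X, y) -> float:
    """Wall time of a single fit() call, in seconds."""
    start = time.perf_counter()
    estimator.fit(X, y)
    return time.perf_counter() - start


def scaling_study(sizes, n_features=35, n_estimators=25, max_depth=10, seed=0):
    """Time sklearn (and, if installed, cuML) random forests on synthetic data of growing size."""
    results = []
    for n in sizes:
        # weights=[0.99] puts roughly 99% of samples in class 0 (imbalanced target).
        X, y = make_classification(n_samples=n, n_features=n_features,
                                   weights=[0.99], random_state=seed)
        X = X.astype(np.float32)  # cuML random forest expects float32 features
        y = y.astype(np.int32)    # and integer class labels
        sk = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                    n_jobs=-1, random_state=seed)
        row = {"n_samples": n, "sklearn_s": fit_seconds(sk, X, y)}
        if cuRF is not None:
            cu = cuRF(n_estimators=n_estimators, max_depth=max_depth)
            row["cuml_s"] = fit_seconds(cu, X, y)
        results.append(row)
    return results


if __name__ == "__main__":
    for row in scaling_study([10_000, 100_000]):
        print(row)
```

At small sizes the GPU timings are dominated by fixed overheads, so the interesting part is how the sklearn/cuML ratio evolves as `n_samples` grows.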

Plenty of resources can help you get started with cuML, including this great post on speeding up random forests by up to 45x using cuML.

Other RAPIDS libraries of note


cuGraph is a collection of graph algorithms and utility functions developed to run on GPUs. Probably the closest open source equivalent running on CPUs is NetworkX. You can do a lot of analysis with cuGraph, including finding the shortest path between two nodes, running the PageRank algorithm, and measuring the similarity between the neighborhoods of two nodes, all on a single GPU. cuGraph also offers distributed graph analytics (using Dask!) for VMs with more than one GPU, most notably for PageRank.
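To make the PageRank mention concrete, here is what the algorithm computes, written as a small CPU sketch in NumPy using power iteration with a damping factor of 0.85 (the default in both NetworkX and cuGraph). This is not cuGraph's implementation; on the GPU, `cugraph.pagerank` returns these per-vertex scores for a `cugraph.Graph`. The `pagerank` function below is a hypothetical illustration:

```python
import numpy as np


def pagerank(edges, n_nodes, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration on a directed edge list [(src, dst), ...].

    Nodes without out-edges (dangling nodes) spread their rank uniformly.
    Returns an array of scores that sums to 1.
    """
    out_degree = np.zeros(n_nodes)
    for src, _ in edges:
        out_degree[src] += 1

    rank = np.full(n_nodes, 1.0 / n_nodes)
    for _ in range(max_iter):
        # Each node passes its rank evenly along its out-edges.
        incoming = np.zeros(n_nodes)
        for src, dst in edges:
            incoming[dst] += rank[src] / out_degree[src]
        dangling_mass = rank[out_degree == 0].sum()
        new_rank = (1 - damping) / n_nodes + damping * (incoming + dangling_mass / n_nodes)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Nodes 1 and 2 both link to node 0, so node 0 ends up with the highest score.
    print(pagerank([(1, 0), (2, 0)], n_nodes=3))
```

The per-edge loop is exactly the data-parallel work that a GPU library like cuGraph spreads across thousands of threads.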

To get started with graph analytics on GPUs, see the cuGraph Blogs and Presentations.


cuSignal and cuSpatial 

cuSignal and cuSpatial are two other RAPIDS libraries installed in the RAPIDS 0.16 conda environment. cuSignal focuses on signal processing. It's a great tool if you're filtering, sampling, applying convolution or deconvolution to signals, or doing spectral analysis. cuSpatial, on the other hand, is great for spatial and trajectory data. It can compute a variety of distances, such as Haversine and Hausdorff distances, as well as trajectory speeds, among other things.
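For a sense of what a Haversine distance computation involves, here is the formula as a vectorized NumPy sketch on the CPU. The 6371 km mean Earth radius is an assumption of this example, and `haversine_km` is a hypothetical helper, not cuSpatial's API; check the cuSpatial documentation for the exact function signature in your version.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees.

    Accepts scalars or equal-length arrays and works element-wise.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


if __name__ == "__main__":
    # Equator to North Pole along the prime meridian: a quarter of a great circle.
    print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

Because the function works element-wise on arrays, one call can compute millions of point-pair distances; that data-parallel shape is what makes this kind of workload a good fit for GPUs.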


In conclusion

I hope that I gave you a good overview of the RAPIDS 0.16 conda environment available in Oracle Cloud Infrastructure Data Science notebook sessions. Go ahead and try it!

I also invite you to participate in our upcoming Oracle Developer Live: AI and ML for Your Enterprise event on January 26, 28, and February 2, 2021. I will be giving a workshop entitled "Hands-On Lab: GPU Accelerated Data Science with Nvidia RAPIDS" on January 26 during which you will be able to use RAPIDS to train ML models on GPUs in OCI Data Science. Register today!  

RAPIDS is an open source project and is open to contributions. Join the developer community at

Keep in touch with OCI Data Science! 

-    Visit our website

-    Visit our service documentation

-    (Oracle Internal) Visit our Slack channel #oci_datascience_users

-    Visit our YouTube Playlist

-    Visit our LiveLabs Hands-on Lab 

