Announcement: Notebook Sessions Running on GPU VM Shapes Now Available in Oracle Cloud Infrastructure Data Science 

September 24, 2020 | 5 minute read
Praveen Patil
Principal Product Manager - Data Science

As of September 23, 2020, Jupyter notebook sessions running on GPU virtual machines (VMs) are generally available in Oracle Cloud Infrastructure Data Science. Data scientists can now use NVIDIA’s Pascal (P100) and Volta (V100) generations of GPUs to build and train their machine learning models.

 

What’s the Difference Between GPUs and CPUs?

GPUs hold distinct advantages over CPUs for processing large amounts of data, training deep learning models, and running inference on those models. While CPU cores are designed to handle general-purpose computations and workloads, GPU cores are optimized for data-parallel numeric computation. A GPU core is simpler and takes up less die area than a CPU core, so many more GPU cores can be packed onto a single chip. As a result, suitable workloads can run as much as five to 10 times faster on a GPU than on a CPU.

GPUs are therefore designed for fast, large-scale matrix calculations and are best suited to the parallel execution that large-scale machine learning (ML) and deep learning (DL) problems require. ML applications that perform large numbers of computations on large amounts of structured or unstructured data (for example, images, text, or video) can run five to 10 times faster on a GPU than the same computation parallelized across a large number of CPUs.
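
As a rough way to observe this difference in your own notebook session, the sketch below times a square matrix multiplication on the CPU and, when one is visible, on the GPU. It assumes PyTorch is installed. The function name `time_matmul` and the default matrix size are illustrative choices, not part of the service, and the measured speedup will vary with the shape, the matrix size, and the data type.

```python
import time


def time_matmul(size=2048, repeats=3):
    """Return the best wall-clock seconds for a size x size matmul per device.

    Returns None when PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None

    devices = ["cpu"]
    if torch.cuda.is_available():
        devices.append("cuda")

    timings = {}
    for device in devices:
        a = torch.rand(size, size, device=device)
        b = torch.rand(size, size, device=device)
        torch.matmul(a, b)  # warm-up run (CUDA initialization, caches)
        best = float("inf")
        for _ in range(repeats):
            if device == "cuda":
                torch.cuda.synchronize()  # wait for any pending GPU work
            start = time.perf_counter()
            torch.matmul(a, b)
            if device == "cuda":
                torch.cuda.synchronize()  # GPU kernels run asynchronously
            best = min(best, time.perf_counter() - start)
        timings[device] = best
    return timings
```

Calling `time_matmul()` returns a dictionary keyed by device name (`"cpu"` and, on a GPU shape, `"cuda"`). The explicit synchronization matters: without it, the timer would stop before the GPU had finished the multiplication.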

 

How to Access GPUs

It’s already easy to select and use the desired compute and storage configuration for project environments in Oracle Cloud Infrastructure Data Science. This same simplicity now extends to GPU Virtual Machines.

When users create a new notebook session or reactivate an existing one, they select the compute shape to be used. The compute shape represents the type and number of NVIDIA GPU cards in an instance. For example, VM.GPU2.1 will have one NVIDIA P100 card and VM.GPU3.2 will have two NVIDIA V100 cards. When users select GPU VM shapes, they can use GPU cards to build and train deep learning models or use the associated CPUs for machine learning, according to their needs. 
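
The naming convention described above can be sketched as a small parser. This is an illustration only: the helper name `describe_gpu_shape` is hypothetical, and the family-to-GPU mapping covers just the two families mentioned in this post.

```python
import re

# GPU generation per shape family, as stated in this post.
GPU_BY_FAMILY = {"GPU2": "NVIDIA P100", "GPU3": "NVIDIA V100"}

_SHAPE_PATTERN = re.compile(r"^VM\.(GPU\d+)\.(\d+)$")


def describe_gpu_shape(shape):
    """Return (gpu_model, gpu_count) for a shape name such as 'VM.GPU3.2'."""
    match = _SHAPE_PATTERN.match(shape)
    if match is None:
        raise ValueError(f"not a GPU VM shape name: {shape!r}")
    family, count = match.group(1), int(match.group(2))
    if family not in GPU_BY_FAMILY:
        raise ValueError(f"unknown GPU family: {family}")
    return GPU_BY_FAMILY[family], count


print(describe_gpu_shape("VM.GPU2.1"))  # ('NVIDIA P100', 1)
print(describe_gpu_shape("VM.GPU3.2"))  # ('NVIDIA V100', 2)
```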

 

GPU Availability:  

VM shapes with GPUs are available in four Oracle Cloud Infrastructure regions: 

  • US East (Ashburn) 
  • UK South (London)
  • Germany Central (Frankfurt)
  • Japan East (Tokyo)

The table below includes Oracle Cloud Infrastructure regions that host GPU VM shapes with the available shapes and associated generations of NVIDIA GPU. 

Region                      | Availability
US East (Ashburn)           | VM.GPU2.1 (NVIDIA P100 GPUs) and VM.GPU3.X (NVIDIA V100 Tensor Core GPUs) shapes
Germany Central (Frankfurt) | VM.GPU2.1 (NVIDIA P100 GPUs) shapes
UK South (London)           | VM.GPU3.X (NVIDIA V100 Tensor Core GPUs) shapes
Japan East (Tokyo)          | VM.GPU3.X (NVIDIA V100 Tensor Core GPUs) shapes
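
If you need to check availability in a script, the table above can be transcribed into a lookup. The sketch below is illustrative: the names `GPU_FAMILIES_BY_REGION` and `regions_offering` are hypothetical, and because the table lists VM.GPU3.X without enumerating the values of X, the function assumes that any GPU count within a listed family is covered.

```python
# Region -> GPU shape families offered there, transcribed from the table above.
GPU_FAMILIES_BY_REGION = {
    "US East (Ashburn)": {"VM.GPU2", "VM.GPU3"},
    "Germany Central (Frankfurt)": {"VM.GPU2"},
    "UK South (London)": {"VM.GPU3"},
    "Japan East (Tokyo)": {"VM.GPU3"},
}


def regions_offering(shape):
    """Return the sorted regions whose table entry covers the given shape."""
    family = shape.rsplit(".", 1)[0]  # 'VM.GPU3.2' -> 'VM.GPU3'
    return sorted(
        region
        for region, families in GPU_FAMILIES_BY_REGION.items()
        if family in families
    )


print(regions_offering("VM.GPU2.1"))
print(regions_offering("VM.GPU3.2"))
```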

 

By default, limits on GPU counts are set to zero for all customers. To run notebook sessions on GPU shapes, customers need to request a service limit increase within the console. Find out how to request a service limit increase.

 

ML Libraries on NVIDIA GPU:

Notebook sessions running on GPU shapes come pre-installed with major open source ML libraries for building and training models. Below are some of the popular open source ML libraries that are available within the GPU notebook session environment.

 

TensorFlow 2.2.0

TensorFlow is an end-to-end open source ML framework that is primarily used to design, build, and train deep learning models. This version of TensorFlow offers improved performance, much tighter integration with Keras, distributed training on GPUs, a standardized SavedModel file format, support for multiple runtimes including multi-GPU configurations, and more.
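
A quick way to confirm that TensorFlow can see the GPUs in a notebook session is to list the physical devices. This is a minimal sketch; the helper name `list_tensorflow_gpus` is hypothetical, and the import is guarded so the snippet also runs where TensorFlow is not installed.

```python
def list_tensorflow_gpus():
    """Return the names of GPUs visible to TensorFlow ([] if none or not installed)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(list_tensorflow_gpus())
```

An empty list on a GPU shape usually points to a driver or library mismatch and not to a problem in your model code.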


 

PyTorch 1.2

PyTorch is a Python-first, open source deep learning framework that helps accelerate the path from research training to production deployment. PyTorch is mainly used to build applications in computer vision and natural language processing. With PyTorch 1.2, the framework takes a major step forward for production usage with the addition of an improved and more polished TorchScript environment. These improvements make it even easier to ship production models, expand support for exporting ONNX-formatted models, and enhance module-level support for transformers.
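
In PyTorch, a common pattern is to pick the device once and then move models and tensors to it. The sketch below shows that pattern under the assumption that PyTorch may or may not be installed; the helper name `pick_device` is hypothetical.

```python
def pick_device():
    """Return 'cuda' when PyTorch can use a GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

With the device name in hand, `model.to(device)` and `tensor.to(device)` place a model and its inputs on the same hardware, so the same notebook runs on both CPU and GPU shapes.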


 

MXNet 1.5.1

Apache MXNet is a flexible and efficient open source ML framework for building deep learning models. Apache MXNet offers scalable distributed training (including multi-GPU training), deep integration with Python, and a rich ecosystem of tools and libraries for use cases in computer vision, natural language processing, time series, and more.


 

XGBoost-GPU 1.1.1

XGBoost is an open source ML library that provides a high-performance implementation of gradient-boosted decision trees. 
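
In the XGBoost 1.x line, GPU training is selected through the `tree_method` parameter. The sketch below builds a parameter dictionary that switches between the GPU and CPU histogram algorithms. The helper name `xgboost_params` and the other parameter values are illustrative assumptions; later XGBoost releases select the GPU differently.

```python
def xgboost_params(use_gpu):
    """Build a parameter dict for xgboost.train(), switching only the tree method."""
    return {
        "objective": "binary:logistic",
        "max_depth": 6,
        "tree_method": "gpu_hist" if use_gpu else "hist",
    }


params = xgboost_params(use_gpu=True)
print(params["tree_method"])  # gpu_hist

# With xgboost installed and a GPU present, training would look like this
# (X and y are your own feature matrix and labels):
#   import xgboost as xgb
#   booster = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=100)
```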

 

Tips for Using GPUs for Data Science

From a JupyterLab terminal window in Oracle Cloud Infrastructure Data Science, you can obtain the number of GPUs and their specifications by entering this command:

nvidia-smi
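
The same information can be collected from a notebook cell by calling `nvidia-smi` with its CSV query options. This is a sketch under stated assumptions: the helper names are hypothetical, and the parser expects one comma-separated row per GPU in the order index, name, total memory.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"]


def parse_gpu_rows(text):
    """Parse 'index, name, memory' CSV rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, memory = [field.strip() for field in line.split(",")]
        gpus.append({"index": int(index), "name": name, "memory_total": memory})
    return gpus


def query_gpus():
    """Return GPU details from nvidia-smi, or [] when the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
        )
    except subprocess.CalledProcessError:
        return []  # tool present but no usable driver or GPU
    return parse_gpu_rows(result.stdout)


print(query_gpus())
```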

The pre-installed gpustat utility lets you monitor how the GPUs are being used while you build and train machine learning models. Its statistics include utilization, memory, and power.

From a JupyterLab terminal window, enter:

gpustat

To also show the user, process ID, and command of each GPU process, include power draw, and refresh the display every three seconds, enter:

gpustat -u -p -c -i 3 --show-power
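
To log these statistics from Python during training, gpustat can also emit JSON through its `--json` option. The sketch below reduces that output to a few fields per GPU. The helper names are hypothetical, and the key names (`utilization.gpu`, `memory.used`, `memory.total`, `power.draw`) are assumed to match the gpustat version installed in your session, so check them against the actual output first.

```python
import json
import shutil
import subprocess


def summarize_gpustat(payload):
    """Reduce gpustat's JSON document to a few fields per GPU."""
    summary = []
    for gpu in payload.get("gpus", []):
        summary.append({
            "index": gpu.get("index"),
            "utilization_pct": gpu.get("utilization.gpu"),
            "memory_used_mb": gpu.get("memory.used"),
            "memory_total_mb": gpu.get("memory.total"),
            "power_draw_w": gpu.get("power.draw"),
        })
    return summary


def read_gpustat():
    """Run 'gpustat --json' and summarize it, or return [] when unavailable."""
    if shutil.which("gpustat") is None:
        return []
    try:
        result = subprocess.run(
            ["gpustat", "--json"],
            stdout=subprocess.PIPE, universal_newlines=True, check=True,
        )
        return summarize_gpustat(json.loads(result.stdout))
    except (subprocess.CalledProcessError, ValueError):
        return []  # no usable GPU, or output that is not valid JSON


print(read_gpustat())
```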

The command options are described in the gpustat documentation.

 

GPUs on Oracle Cloud Infrastructure Data Science vs. Oracle Cloud Infrastructure Virtual Machines:

The GPU offering within Oracle Cloud Infrastructure Data Science is a service-managed platform in which customers can build and train ML models in a collaborative environment using the open source Python ecosystem. The platform comes pre-installed with all the associated NVIDIA drivers and the most widely used ML libraries.

GPUs under Oracle Cloud Infrastructure Virtual Machines are unmanaged, pre-configured environments that enable customers to build models and deliver business value. These VMs are available from the Oracle Cloud Marketplace, and customers have complete control of the application stack and infrastructure. Discover more details about the Oracle Cloud Infrastructure Virtual Machines offering.

Additional details:

To learn more, visit the Oracle Data Science page and follow us on Twitter @OracleDataSci.

Praveen Patil

Principal Product Manager - Data Science

Currently working as a product manager with the Data & AI group within Oracle Cloud Infrastructure.

Prior to moving to a product role, I was a practitioner in the data science space. Over the years, my experience has been in applying advanced analytics and data science methodologies to various domains: financial services, telecom, entertainment and gaming, and cloud business.

