Beginning on September 23, 2020, Jupyter notebook sessions running on GPU virtual machines (VMs) will be generally available in Oracle Cloud Infrastructure Data Science. Data scientists can now leverage NVIDIA’s Pascal (P100) and Volta (V100) generations of GPUs to build and train their machine learning models.
GPUs hold unique advantages over CPUs for processing large amounts of data, training deep learning models, and running inference with those models. While CPU cores are designed to handle general-purpose computations and workloads, GPU cores are specialized for numeric computations applied to many data elements at once. A GPU core is simpler and has a smaller die area than a CPU core, which allows many more GPU cores to be packed onto a single chip.
GPUs are therefore designed for fast, large-scale matrix calculations and are best suited to the parallel execution that large-scale machine learning (ML) and deep learning (DL) problems require. Consequently, ML applications that perform large numbers of computations on large amounts of structured or unstructured data (such as images, text, and video) can run five to 10 times faster on a GPU than the same computation parallelized across a large number of CPUs.
It’s already easy to select and use the desired compute and storage configuration for project environments in Oracle Cloud Infrastructure Data Science. This same simplicity now extends to GPU VMs.
When users create a new notebook session or reactivate an existing one, they select the compute shape to be used. For GPU shapes, the compute shape represents the type and number of NVIDIA GPU cards in an instance. For example, VM.GPU2.1 has one NVIDIA P100 card and VM.GPU3.2 has two NVIDIA V100 cards. When users select a GPU VM shape, they can use the GPU cards to build and train deep learning models, or use the shape's CPUs for other machine learning workloads, as their needs dictate.
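The naming pattern in these examples can be made explicit in a few lines of Python. This is a sketch only: it assumes, based solely on the two examples above, that the digit after `GPU` identifies the GPU generation (2 for P100, 3 for V100) and that the final number is the count of GPU cards. `parse_gpu_shape` and `GpuShape` are hypothetical helpers, not part of any Oracle SDK.

```python
import re
from typing import NamedTuple

# Assumed mapping, inferred from the examples in the text:
# VM.GPU2.x shapes carry P100 cards and VM.GPU3.x shapes carry V100 cards.
GPU_GENERATIONS = {2: "NVIDIA P100", 3: "NVIDIA V100"}

# Matches names such as "VM.GPU3.2": generation digit, then card count.
_SHAPE_RE = re.compile(r"^VM\.GPU(\d+)\.(\d+)$")


class GpuShape(NamedTuple):
    name: str
    gpu_model: str
    gpu_count: int


def parse_gpu_shape(name: str) -> GpuShape:
    """Split a VM.GPU<generation>.<count> shape name into its parts (hypothetical helper)."""
    match = _SHAPE_RE.match(name)
    if match is None:
        raise ValueError(f"not a VM GPU shape name: {name!r}")
    generation, count = (int(group) for group in match.groups())
    if generation not in GPU_GENERATIONS:
        raise ValueError(f"unknown GPU generation in {name!r}")
    return GpuShape(name, GPU_GENERATIONS[generation], count)


print(parse_gpu_shape("VM.GPU2.1"))  # GpuShape(name='VM.GPU2.1', gpu_model='NVIDIA P100', gpu_count=1)
print(parse_gpu_shape("VM.GPU3.2"))  # GpuShape(name='VM.GPU3.2', gpu_model='NVIDIA V100', gpu_count=2)
```

Rejecting unknown names with a `ValueError`, rather than guessing, keeps the sketch honest about the limits of the inferred naming rule.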
VM shapes with GPUs are available in four Oracle Cloud Infrastructure regions. The table below lists each region with the GPU VM shapes it hosts and the associated generation of NVIDIA GPU.
| Region | Available GPU VM shapes |
|---|---|
| US East (Ashburn) | VM.GPU2.1 (NVIDIA P100 GPUs) and VM.GPU3.X (NVIDIA V100 Tensor Core GPUs) |
| Germany Central (Frankfurt) | VM.GPU2.1 (NVIDIA P100 GPUs) |
| UK South (London) | VM.GPU3.X (NVIDIA V100 Tensor Core GPUs) |
| Japan East (Tokyo) | VM.GPU3.X (NVIDIA V100 Tensor Core GPUs) |
By default, the service limits on GPU counts are set to zero for all customers. To run notebook sessions on GPU shapes, customers need to request a service limit increase in the Oracle Cloud Infrastructure Console. Find out how to request a service limit increase.
Notebook sessions running on GPU shapes come pre-installed with major open source ML libraries for building and training models. Below are some of the popular open source ML libraries that are available within the GPU notebook session environment.
TensorFlow is an end-to-end open source ML framework that is primarily used to design, build, and train deep learning models. The version of TensorFlow included in the environment offers better performance across the board, much tighter integration with Keras, distributed training on GPUs, the standardized SavedModel file format, support for multiple runtimes (including multi-GPU setups), and more.
PyTorch is a Python-first open source deep learning framework that helps accelerate the path from research to production deployment. PyTorch is mainly used to build applications in computer vision and natural language processing. With PyTorch 1.2, the framework takes a major step forward for production use with an improved and more polished TorchScript environment, which makes it even easier to ship production models. The release also expands support for exporting models in ONNX format and enhances module-level support for transformers.
Apache MXNet is a flexible and efficient open source ML framework for building deep learning models. Apache MXNet supports scalable distributed training (including multi-GPU training), integrates deeply with Python, and provides a rich ecosystem of tools and libraries for use cases in computer vision, natural language processing, time series, and more.
XGBoost is an open source ML library that provides a high-performance implementation of gradient-boosted decision trees.
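Because each of these frameworks detects GPUs on its own, a quick sanity check after starting a GPU notebook session is to ask each one how many GPUs it can see. The sketch below assumes the PyTorch, TensorFlow 2.1 or later, and MXNet 1.x APIs `torch.cuda.device_count()`, `tf.config.list_physical_devices("GPU")`, and `mx.context.num_gpus()`; the helper names are hypothetical, and a framework that is not installed is reported as `None` instead of raising.

```python
import importlib
from typing import Callable, Dict, Optional


def _torch_gpu_count() -> int:
    torch = importlib.import_module("torch")
    return torch.cuda.device_count()


def _tensorflow_gpu_count() -> int:
    tf = importlib.import_module("tensorflow")
    # Requires TensorFlow 2.1+; TF 2.0 exposed this under tf.config.experimental.
    return len(tf.config.list_physical_devices("GPU"))


def _mxnet_gpu_count() -> int:
    mx = importlib.import_module("mxnet")
    # Assumes the MXNet 1.x API (mxnet.context.num_gpus).
    return mx.context.num_gpus()


# Hypothetical probe table: framework name -> function returning its GPU count.
PROBES: Dict[str, Callable[[], int]] = {
    "torch": _torch_gpu_count,
    "tensorflow": _tensorflow_gpu_count,
    "mxnet": _mxnet_gpu_count,
}


def gpus_per_framework() -> Dict[str, Optional[int]]:
    """Return the number of GPUs each framework sees, or None if it is not installed."""
    report: Dict[str, Optional[int]] = {}
    for name, probe in PROBES.items():
        try:
            report[name] = probe()
        except ImportError:
            report[name] = None  # framework not installed in this environment
    return report


for framework, count in gpus_per_framework().items():
    status = "not installed" if count is None else f"{count} GPU(s) visible"
    print(f"{framework:>10}: {status}")
```

On a VM.GPU3.2 shape, each installed framework should report two GPUs; a count of zero usually means the framework build or driver setup does not match the hardware. XGBoost has no comparable device-count call: GPU training is instead requested through its training parameters (`tree_method="gpu_hist"` in XGBoost 1.x, or `device="cuda"` in XGBoost 2.0 and later).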
From a JupyterLab terminal window in Oracle Cloud Infrastructure Data Science, you can obtain the number of GPUs and their specifications by entering this command:
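To collect the GPU count and specifications from a notebook cell instead of a terminal, one option is to run NVIDIA's `nvidia-smi` utility, which is installed with the NVIDIA driver, in its CSV query mode. This sketch assumes `nvidia-smi` is on the `PATH`; `GpuInfo`, `parse_gpu_csv`, and `list_gpus` are hypothetical names used only for illustration.

```python
import csv
import io
import shutil
import subprocess
from typing import List, NamedTuple


class GpuInfo(NamedTuple):
    index: int
    name: str
    memory_total_mib: int
    driver_version: str


# Fields requested from nvidia-smi, in the order GpuInfo expects them.
QUERY_FIELDS = "index,name,memory.total,driver_version"


def parse_gpu_csv(text: str) -> List[GpuInfo]:
    """Parse `--format=csv,noheader,nounits` output: one GPU per line."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, name, memory_total, driver = row
        gpus.append(GpuInfo(int(index), name, int(memory_total), driver))
    return gpus


def list_gpus() -> List[GpuInfo]:
    """Run nvidia-smi and return one GpuInfo per visible GPU."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_gpu_csv(output)


# Only query when the NVIDIA tools are present (for example, on a GPU shape).
if shutil.which("nvidia-smi"):
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) found")
    for gpu in gpus:
        print(f"[{gpu.index}] {gpu.name}, {gpu.memory_total_mib} MiB, driver {gpu.driver_version}")
```

Keeping the parsing in a separate function makes it easy to check against captured output without access to a GPU.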
The pre-installed GPU statistics tool, gpustat, lets you monitor how the GPUs are being used while you build and train machine learning models. Its statistics include utilization, memory, and power.
From a JupyterLab terminal window, enter:
To get utilization, memory, and power statistics, enter:
gpustat -u -p -c -i 3 --show-power
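In this command, -u, -p, and -c add the user name, process ID, and command name of each process running on a GPU, -i 3 refreshes the output every three seconds, and --show-power adds each GPU's power draw. Details of these and other options are in the gpustat documentation.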
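gpustat can also be imported as a Python package, which is handy for logging GPU usage from a training notebook. This sketch assumes gpustat's `new_query()` function and the `index`, `name`, `utilization`, `memory_used`, `memory_total`, and `power_draw` attributes of each GPU record, as exposed by recent gpustat releases; `format_gpu` and `watch_gpus` are hypothetical helpers that loosely mimic a `gpustat -i 3 --show-power` refresh loop.

```python
import time
from typing import Any, Iterator, List


def format_gpu(gpu: Any) -> str:
    """Render one gpustat GPU record as a compact status line."""
    # gpustat reports utilization in percent, memory in MB, and power in watts;
    # utilization and power can be None when the driver does not expose them.
    util = "n/a" if gpu.utilization is None else f"{gpu.utilization}%"
    power = "n/a" if gpu.power_draw is None else f"{gpu.power_draw} W"
    return (
        f"[{gpu.index}] {gpu.name} | util {util} | "
        f"mem {gpu.memory_used}/{gpu.memory_total} MB | power {power}"
    )


def watch_gpus(interval: float = 3.0, samples: int = 5) -> Iterator[List[str]]:
    """Yield one list of formatted lines per snapshot, `interval` seconds apart."""
    import gpustat  # imported lazily so format_gpu can be used without a GPU

    for i in range(samples):
        stats = gpustat.new_query()
        yield [format_gpu(gpu) for gpu in stats]
        if i < samples - 1:
            time.sleep(interval)
```

For example, `for lines in watch_gpus(interval=3.0, samples=10): print("\n".join(lines))` prints ten snapshots three seconds apart, which can be run while a training job executes in another process.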
The GPU offering within Oracle Cloud Infrastructure Data Science is a service-managed platform in which customers can build and train ML models in a collaborative environment using the open source Python ecosystem. The platform comes pre-installed with all the associated NVIDIA drivers and the most widely used ML libraries.
By contrast, GPU virtual machines in Oracle Cloud Infrastructure are unmanaged, pre-configured environments that enable customers to build models and deliver business value. These VMs are available from the Oracle Cloud Marketplace, and customers have complete control of the application stack and infrastructure. Discover more details about the Oracle Cloud Infrastructure Virtual Machines offering.