
NVIDIA A100 Bare Metal Performance in Oracle Cloud Infrastructure

Justin Blau
Senior Technical Product Manager

Oracle Cloud Infrastructure is making available a system featuring NVIDIA’s Ampere architecture, the next generation of data center computing hardware. The A100 shape has been tuned to give the best possible performance and adds to a growing list of performance systems that provide on-premises performance in the cloud.

A New and Improved Offering

Continuing the line of bare metal offerings, Oracle Cloud Infrastructure provides access to infrastructure without virtualization. Every aspect of the system has been upgraded to achieve A100’s 312 TeraFLOPS (TFLOPS) of performance.

BM.GPUx.8 Specification Comparison

                   | BM.GPU3.8                          | BM.GPU4.8
GPUs               | 8 V100 Tensor Core GPUs with 16 GB | 8 A100 Tensor Core GPUs with 40 GB
CPUs               | 2 x 26-core Intel at 2.0 GHz       | 2 x 32-core AMD at 2.9 GHz
Memory             | 768 GB DDR4                        | 2048 GB DDR4
Networking         | 2 x 25 Gbps                        | 8 x 200 Gbps
SSDs               | Up to 1 PB of block storage        | 4 x 6.4-TB NVMe SSD; up to 1 PB of block storage
Price per GPU-hour | $2.95                              | $3.05

The BM.GPU4.8 gives a significant uplift in price performance over the BM.GPU3.8. For ten cents more per GPU-hour, in addition to the new Ampere architecture and third-generation NVLink, the new shape gains 192 GB of GPU memory, 1280 GB of CPU memory, 25.6 TB of NVMe storage, 1550 Gbps of total networking bandwidth, and the ability to enable remote direct memory access (RDMA) for multisystem communication. RDMA allows for low-latency connections between nodes and access to GPU memory without involving the CPU.
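The differences between the two shapes can be recomputed directly from the specification table. The following is a minimal sketch; the dictionary keys and the `spec_deltas` helper are illustrative names chosen for this example, not fields of any Oracle Cloud Infrastructure API.

```python
# Per-shape totals taken from the specification comparison table.
# Key names are hypothetical labels for this example only.
BM_GPU3_8 = {
    "gpu_memory_gb": 8 * 16,       # 8 V100 GPUs with 16 GB each
    "cpu_memory_gb": 768,
    "nvme_tb": 0.0,                # no local NVMe listed for this shape
    "network_gbps": 2 * 25,        # 2 x 25 Gbps
    "price_per_gpu_hour": 2.95,
}
BM_GPU4_8 = {
    "gpu_memory_gb": 8 * 40,       # 8 A100 GPUs with 40 GB each
    "cpu_memory_gb": 2048,
    "nvme_tb": 4 * 6.4,            # 4 x 6.4-TB NVMe SSD
    "network_gbps": 8 * 200,       # 8 x 200 Gbps
    "price_per_gpu_hour": 3.05,
}


def spec_deltas(old, new):
    """Return new-minus-old for every specification, rounded to 2 decimals."""
    return {key: round(new[key] - old[key], 2) for key in old}


deltas = spec_deltas(BM_GPU3_8, BM_GPU4_8)
for name, value in deltas.items():
    print(f"{name}: +{value}")
```

Running the sketch prints one line per specification, which makes it easy to check each gain against the table.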

Specifications are one thing; empirical performance is another. To benchmark and compare the V100 to the A100, our tests followed NVIDIA’s Deep Learning Examples library. The workloads tested were BERT-Large for language modeling, Jasper for speech recognition, MaskRCNN for image segmentation, and GNMT for translation. All tests ran in the NVIDIA-prepared containers for PyTorch and TensorFlow, and the tests were configured for 32-bit tensors.

Artificial Intelligence Workload Performance

Task       | BM.GPU3.8                | BM.GPU4.8                 | Speed-up Factor
BERT-Large | 341 sequences per second | 1773 sequences per second | 5.2
Jasper     | 85 sequences per second  | 278 sequences per second  | 3.3
MaskRCNN   | 70 images per second     | 136 images per second     | 2
GNMT       | 96792 tokens per second  | 170382 tokens per second  | 1.75
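The speed-up factor is the ratio of A100 throughput to V100 throughput. As a minimal sketch, the ratios can be recomputed from the throughput columns; the `THROUGHPUT` mapping and the `speedup` helper are names made up for this example, and the results may differ slightly from the table because the table rounds its factors.

```python
# Throughput figures copied from the workload table, in units per second.
# Each tuple is (BM.GPU3.8 throughput, BM.GPU4.8 throughput).
THROUGHPUT = {
    "BERT-Large": (341, 1773),     # sequences per second
    "Jasper": (85, 278),           # sequences per second
    "MaskRCNN": (70, 136),         # images per second
    "GNMT": (96792, 170382),       # tokens per second
}


def speedup(v100, a100):
    """Ratio of A100 throughput to V100 throughput."""
    return a100 / v100


speedups = {task: speedup(*pair) for task, pair in THROUGHPUT.items()}
for task, factor in speedups.items():
    print(f"{task}: {factor:.2f}x")
```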

Upgrading workloads to run on the A100 can drive significant compute cost savings and dramatically reduce model training turnaround time. Reaching an accurate model with the BERT-Large workload takes around 3.7 days using eight V100 GPUs, compared to 14.8 hours using eight A100 GPUs. Reducing the time to solution in artificial intelligence accelerates the development of innovative scientific publications and industry products.
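To put a rough number on the cost savings, the BERT-Large training times can be combined with the per-GPU-hour prices from the specification table. This is a back-of-the-envelope sketch that assumes a single eight-GPU node billed at the listed price for the whole run, with no storage, networking, or other charges; the `training_cost` helper is a hypothetical name for this example.

```python
GPUS_PER_NODE = 8
HOURS_PER_DAY = 24


def training_cost(hours, price_per_gpu_hour, gpus=GPUS_PER_NODE):
    """Compute cost in dollars for one node running for the given hours."""
    return hours * gpus * price_per_gpu_hour


# Time to an accurate BERT-Large model, as quoted in the text.
v100_hours = 3.7 * HOURS_PER_DAY   # around 3.7 days on eight V100 GPUs
a100_hours = 14.8                  # 14.8 hours on eight A100 GPUs

v100_cost = training_cost(v100_hours, 2.95)   # BM.GPU3.8 price per GPU-hour
a100_cost = training_cost(a100_hours, 3.05)   # BM.GPU4.8 price per GPU-hour

time_ratio = v100_hours / a100_hours
cost_ratio = v100_cost / a100_cost

print(f"BM.GPU3.8: {v100_hours:.1f} h, ${v100_cost:,.2f}")
print(f"BM.GPU4.8: {a100_hours:.1f} h, ${a100_cost:,.2f}")
print(f"Time reduced {time_ratio:.1f}x, cost reduced {cost_ratio:.1f}x")
```

Under these assumptions, the run finishes about six times sooner and costs a little under one sixth as much, because the higher hourly price of the A100 shape is small next to the reduction in training time.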

“Our growing collaboration with Oracle is fueling incredible innovations across a wide range of industries and uses. By integrating NVIDIA’s new A100 Tensor Core GPUs into its cloud service offerings, Oracle is giving innovators everywhere access to breakthrough computing performance to accelerate their most critical work in AI, machine learning, data analytics and high-performance computing.” -Ian Buck, Vice President, NVIDIA Tesla Data Center Business

Try for Yourself

This new shape is generally available on September 30, 2020, with virtual machine instances to follow in the coming months. To learn more about how to apply this performance to your development life cycles, visit our GPUs on the Oracle Cloud Infrastructure page.
