OCI’s exceptional performance for AI validated in MLPerf Inference v3.1 results

September 12, 2023 | 7 minute read
Seshadri Dehalisan
Distinguished Cloud Architect
Akshai Parthasarathy
Product Marketing Director, Oracle
Ruzhu Chen
Master Principal Cloud Architect, Healthcare & Life Sciences

The authors want to thank Dr. Sanjay Basu, Senior Director of OCI Engineering, and Rob Dolin, Senior Manager of OCI Engineering, for their assistance in publishing these results.

Oracle Cloud Infrastructure (OCI) has achieved strong results across multiple benchmarks in the MLCommons Inference Datacenter v3.1 suite, the industry standard for measuring AI infrastructure performance. OCI was tested across several shapes powered by NVIDIA GPUs, including the NVIDIA H100 Tensor Core GPU, the NVIDIA A100 Tensor Core GPU, and the NVIDIA A10 Tensor Core GPU. Key highlights include the following:

  • OCI’s BM.GPU.H100.8 shape with eight NVIDIA H100 GPUs delivered top results, outperforming or matching competitors on RESNET, RetinaNet, BERT, DLRMv2, and GPT-J benchmarks.
  • BM.GPU.A100-v2.8 with eight NVIDIA A100 GPUs also showed strong performance across the board.
  • BM.GPU.A10.4 with four NVIDIA A10 GPUs demonstrated cost-effective performance on select benchmarks like RetinaNet and RNNT.

OCI’s focus on high performance

From its inception, OCI has focused on providing high-performance infrastructure for any workload. We were one of the first cloud providers to natively support bare metal instances and a high-performance RDMA network for internode communication. We support all phases of the AI workflow, including training and inference. Many organizations are driving innovation on OCI with AI and NVIDIA GPUs, including MosaicML, Twist Bioscience, and Emory University.

This announcement marks our inaugural publication of OCI's MLPerf results through the MLCommons Inference Datacenter v3.1 benchmarks. MLCommons is a collaborative engineering organization focused on developing the AI ecosystem through benchmarks, public datasets, and research. OCI's portfolio of GPU shapes includes NVIDIA H100, NVIDIA A100, and NVIDIA A10 GPUs, among others. We have obtained inference benchmark results for all three of these industry-leading NVIDIA GPU shapes.

OCI Compute shapes benchmarked for MLCommons Inference v3.1

Bare metal shape powered by NVIDIA H100 GPUs: BM.GPU.H100.8

The BM.GPU.H100.8 shape includes eight NVIDIA H100 GPUs, with 80 GB of GPU memory per GPU. The CPUs are Intel Xeon Platinum 8480+ processors with 112 cores and 2 TB of system memory. The shape also includes 16 local NVMe drives with a capacity of 3.84 TB each.

The following table shows the benchmark results of BM.GPU.H100.8. The results are comparable or superior to those from alternative cloud providers. For the complete results, see the MLCommons Inference Datacenter page.

 

| Benchmark | Server (Queries/sec) | Offline (Samples/sec) |
| --- | --- | --- |
| RESNET | 584,197.00 | 703,548.00 |
| RetinaNet | 12,884.60 | 14,047.20 |
| 3D U-Net 99 | - | 51.45 |
| 3D U-Net 99.9 | - | 51.48 |
| BERT 99 | 56,022.10 | 70,689.90 |
| BERT 99.9 | 49,617.50 | 62,285.50 |
| DLRM v2 99 | 300,033.00 | 339,265.00 |
| DLRM v2 99.9 | 300,033.00 | 339,050.00 |
| GPT-J 99 | 79.90 | 106.69 |
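As a rough, illustrative calculation (not part of the official submission), the offline throughput figures above can be normalized by the shape's eight GPUs to estimate per-GPU throughput. The numbers below come directly from the published table; the normalization itself is our own back-of-the-envelope sketch.

```python
# Per-GPU offline throughput for BM.GPU.H100.8 (eight NVIDIA H100 GPUs),
# derived from the published MLPerf Inference v3.1 offline results above.
offline_samples_per_sec = {
    "RESNET": 703_548.0,
    "RetinaNet": 14_047.2,
    "BERT 99": 70_689.9,
    "GPT-J 99": 106.69,
}
NUM_GPUS = 8  # GPUs in the BM.GPU.H100.8 shape

for benchmark, total in offline_samples_per_sec.items():
    per_gpu = total / NUM_GPUS
    print(f"{benchmark}: {per_gpu:,.2f} samples/sec per GPU")
```

For example, RESNET's 703,548 samples/sec works out to roughly 87,944 samples/sec per H100 GPU.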

 

Bare metal shape powered by NVIDIA A100 GPUs: BM.GPU.A100-v2.8

This shape includes eight NVIDIA A100 GPUs, with 80 GB of GPU memory per GPU. The CPUs are AMD EPYC 7J13 64-core processors with 128 cores and 2 TB of system memory. The benchmark results were on par with, if not superior to, those of other hyperscalers.

 

| Benchmark | Server (Queries/sec) | Offline (Samples/sec) |
| --- | --- | --- |
| RESNET | 290,028.00 | 325,567.00 |
| RetinaNet | 5,603.34 | 6,512.98 |
| 3D U-Net 99 | - | 30.32 |
| 3D U-Net 99.9 | - | 30.33 |
| RNNT | 104,012.00 | 107,408.00 |
| BERT 99 | 25,406.20 | 28,028.60 |
| BERT 99.9 | 12,824.10 | 14,534.40 |
| DLRM v2 99 | 80,018.10 | 138,331.00 |
| DLRM v2 99.9 | 80,018.10 | 138,179.00 |
| GPT-J 99 | 16.92 | 27.13 |
| GPT-J 99.9 | 17.04 | 25.29 |

 

Bare metal shape powered by NVIDIA A10 GPUs: BM.GPU.A10.4

The BM.GPU.A10.4 shape includes four NVIDIA A10 GPUs, with 24 GB of GPU memory per GPU and 1 TB of system memory. The shape also includes two 3.5-TB local NVMe drives.

The benchmarks identified OCI Compute instances based on NVIDIA A10 GPUs as a suitable option for running inference on specific models at optimal price-performance. The GPT-J and DLRMv2 benchmarks were not run on BM.GPU.A10.4 in this iteration.

The complete results are available on the MLCommons Inference Datacenter page.

 

| Benchmark | Server (Queries/sec) | Offline (Samples/sec) |
| --- | --- | --- |
| RetinaNet | 855.00 | 953.53 |
| 3D U-Net 99 | - | 5.15 |
| RNNT | 9,202.52 | 16,989.30 |
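To see where the A10 shape's price-performance argument comes from, the offline RetinaNet figures from the three tables can be normalized by each shape's GPU count. This is an illustrative sketch using only the published throughput numbers; hourly pricing varies and is deliberately left out.

```python
# Offline RetinaNet throughput (samples/sec) and GPU count per shape,
# taken from the MLPerf Inference v3.1 tables above.
shapes = {
    "BM.GPU.H100.8": (14_047.20, 8),
    "BM.GPU.A100-v2.8": (6_512.98, 8),
    "BM.GPU.A10.4": (953.53, 4),
}

for shape, (total, gpus) in shapes.items():
    print(f"{shape}: {total / gpus:,.2f} samples/sec per GPU")
```

Per-GPU absolute throughput naturally favors the larger GPUs; the A10's appeal lies in its lower cost per instance, which these raw numbers alone do not capture.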

 

Takeaway

Oracle Cloud Infrastructure provides a comprehensive portfolio of GPU options optimized for AI workloads, including training and inference. The MLPerf Inference results showcase OCI's competitive strength in AI infrastructure and its ability to handle demanding workloads like large language models. For more information on our products, see our GPU and AI infrastructure pages.

Seshadri Dehalisan

Distinguished Cloud Architect

Sesh is a Distinguished Cloud Architect. He is passionate about leveraging technology to enable business outcomes, especially in the areas of MLOps and cloud-native solutions. He has an MBA from the University of Minnesota and a bachelor's degree in engineering.

Akshai Parthasarathy

Product Marketing Director, Oracle

Akshai is a Director of Product Marketing for Oracle Cloud Infrastructure (OCI) focused on driving adoption of OCI’s services and solutions. He has over 15 years of experience and is a graduate of UC Berkeley and Georgia Tech.

Ruzhu Chen

Master Principal Cloud Architect, Healthcare & Life Sciences

Ruzhu is a Master Principal Cloud Architect on OCI's AI/ML cloud engineering team with strong hands-on expertise in large AI/ML platforms and application optimization. He has 20+ years of experience in life science application development, enablement, and user support, previously as an IBM Lead Scientist and SME on the Life Science global team. He holds a PhD in microbiology (molecular biology focus) and a master's in computer science.
