OCI delivers stellar generative AI performance in MLPerf Inference v4.0 benchmarks

April 17, 2024 | 13 minute read
Seshadri Dehalisan
Distinguished Cloud Architect
Akshai Parthasarathy
Product Marketing Director, Oracle
Ruzhu Chen
Master Principal Cloud Architect, Healthcare & Life Sciences

MLPerf™ Inference is an industry benchmark suite developed by MLCommons for measuring the performance of systems running AI/ML models under various deployment scenarios. OCI achieved stellar results in all benchmark cases, spanning vision (image classification, object detection, and medical imaging), natural language processing (NLP), recommendation, speech recognition, large language model (LLM), and text-to-image inference, on OCI's new BM.GPU.H100.8 shape, which is powered by eight NVIDIA H100 Tensor Core GPUs and uses NVIDIA TensorRT-LLM. A short sketch after the highlights below illustrates the Server and Offline scenarios referenced throughout this post. The highlights:

  • OCI's BM.GPU.H100.8 bare metal shape outperformed or matched competitors on the ResNet50, RetinaNet, BERT, DLRMv2, 3D-UNet, RNN-T, GPT-J, Llama2-70B, and Stable Diffusion XL benchmarks.1
  • Generation over generation, OCI's BM.GPU.H100.8 delivers up to 12.6x the performance of BM.GPU.A100.8, powered by eight NVIDIA A100 Tensor Core GPUs (GPT-J benchmark), and up to 14.7x the performance of BM.GPU.A10.4, powered by four NVIDIA A10 Tensor Core GPUs (RNN-T benchmark).1,2
  • For NVIDIA H100 GPU-based instances, OCI performed up to 22% better than the closest cloud competitor in the DLRMv2 benchmark.1
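For readers new to MLPerf Inference: the Server scenario sends queries with a Poisson arrival pattern under a latency bound, while the Offline scenario delivers all samples at once and measures pure throughput. The minimal Python sketch below shows how a system under test (SUT) is wired into the MLCommons LoadGen harness. It is a sketch only, assuming the `mlcommons-loadgen` package (module `mlperf_loadgen`); the do-nothing model callback and the constants are placeholders, not OCI's submission harness.

```python
# Minimal MLPerf LoadGen wiring (a sketch, not OCI's submission harness).
import mlperf_loadgen as lg

TOTAL_SAMPLES = 1024   # size of the full dataset (assumed for illustration)
PERF_SAMPLES = 256     # samples LoadGen keeps resident during the run

def issue_queries(query_samples):
    # LoadGen calls this with one or more samples; real harnesses run
    # inference here. This stub returns empty responses immediately.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # drain any internal batching queues before timing ends

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Server   # or lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly
settings.server_target_qps = 100             # target arrival rate (assumed)

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(TOTAL_SAMPLES, PERF_SAMPLES,
                      lambda indices: None,  # load samples into memory
                      lambda indices: None)  # unload samples
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

Swapping `settings.scenario` to `lg.TestScenario.Offline` (and setting `settings.offline_expected_qps`) is all it takes to switch between the two reporting modes shown in the tables below.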

OCI BM.GPU.H100.8 Shape Benchmark Performance 

The table below shows performance numbers for OCI's BM.GPU.H100.8 shape. For an exhaustive list of submitters' results, visit the MLPerf benchmark results.1

| Reference App | Benchmark | Server (queries/s) | Offline (samples/s) |
|---|---|---|---|
| Vision (Image Classification) | ResNet50 99 | 584,147 | 699,409 |
| Vision (Object Detection) | RetinaNet 99 | 12,876 | 13,997 |
| Vision (Medical Imaging) | 3D-UNet 99 | - | 52 |
| Vision (Medical Imaging) | 3D-UNet 99.9 | - | 52 |
| Speech to Text | RNN-T 99 | 143,986 | 139,846 |
| Recommendation | DLRMv2 99 | 500,098 | 557,592 |
| Recommendation | DLRMv2 99.9 | 315,013 | 347,177 |
| NLP | BERT 99 | 55,983 | 69,821 |
| NLP | BERT 99.9 | 49,587 | 61,818 |
| LLM | GPT-J 99 | 230 | 237 |
| LLM | GPT-J 99.9 | 230 | 236 |
| LLM | Llama2-70B 99 | 70 | 21,299 |
| LLM | Llama2-70B 99.9 | 70 | 21,032 |
| Text to Image Gen | Stable Diffusion XL 99 | 13 | 13 |

Source: MLPerf® v4.0 Inference Closed. Retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ on 14 April 2024, entry 4.0-0073.

 

Performance across instance types for AI inference

The published MLPerf v4.0 and MLPerf v3.1 results for BM.GPU.H100.8 (8 x NVIDIA H100 GPUs), BM.GPU.A100.8 (8 x NVIDIA A100 GPUs), and BM.GPU.A10.4 (4 x NVIDIA A10 GPUs) are compared below.1,2

 

| Benchmark | BM.GPU.H100.8, v4.0 vs. v3.1* (Server) | BM.GPU.H100.8, v4.0 vs. v3.1* (Offline) | vs. BM.GPU.A100.8* (Server) | vs. BM.GPU.A100.8* (Offline) | vs. BM.GPU.A10.4* (Server) | vs. BM.GPU.A10.4* (Offline) |
|---|---|---|---|---|---|---|
| ResNet50 | 0% | -1% | 101% | 115% | N/A | N/A |
| RetinaNet | 0% | 0% | 98% | 150% (1.5x) | 1406% (14.1x) | 1368% (13.7x) |
| 3D U-Net 99 | N/A | 0% | N/A | 70% | N/A | 900% |
| 3D U-Net 99.9 | N/A | 0% | N/A | 70% | N/A | N/A |
| RNN-T | N/A | N/A | 38% | 30% | 1465% (14.7x) | 723% (7.2x) |
| BERT 99 | 0% | -1% | 100% | 175% (1.8x) | N/A | N/A |
| BERT 99.9 | 0% | -1% | 287% (2.9x) | 325% (3.3x) | N/A | N/A |
| DLRM v2 99 | 67% | 64% | 525% (5.3x) | 303% (3.0x) | N/A | N/A |
| DLRM v2 99.9 | 5% | 2% | N/A | N/A | N/A | N/A |
| GPT-J 99 | 187% | 122% | 1258% (12.6x) | 774% (7.7x) | N/A | N/A |
| GPT-J 99.9 | N/A | N/A | 1248% (12.5x) | 832% (8.3x) | N/A | N/A |
| Llama2-70B 99 | N/A | N/A | N/A | N/A | N/A | N/A |
| Llama2-70B 99.9 | N/A | N/A | N/A | N/A | N/A | N/A |
| Stable Diffusion XL | N/A | N/A | N/A | N/A | N/A | N/A |

* Comparisons are between results obtained in MLPerf v4.0 and MLPerf v3.1 for the three comparisons shown. For the columns titled "vs. BM.GPU.A100.8" and "vs. BM.GPU.A10.4," MLPerf v3.1 benchmark results were used for the BM.GPU.A100.8 and BM.GPU.A10.4 instance families; where available, the equivalent multiplier appears in parentheses.1,2

From the table above, we see that:

  • The MLPerf v4.0 performance of BM.GPU.H100.8 is in line with or better than its MLPerf v3.1 performance across all tests. Notably, there is up to a 67% improvement on DLRM v2 99% and a 187% improvement on GPT-J 99% (see the sketch after this list for how these figures convert to multipliers). OCI customers get improved inference performance for neural network-based personalization and recommendation models and for LLMs.
  • Comparing BM.GPU.H100.8 with BM.GPU.A100.8, the current-generation H100 GPU-based instance significantly outperforms the previous-generation A100 GPU-based instance; both instances feature eight NVIDIA GPUs. Performance on GPT-J (an LLM benchmark) was roughly an order of magnitude better on BM.GPU.H100.8 than on BM.GPU.A100.8 (12.6x for GPT-J 99% Server and 7.7x for GPT-J 99% Offline).
  • Comparing BM.GPU.H100.8 with BM.GPU.A10.4, there is, as expected, roughly an order-of-magnitude improvement across the board. The BM.GPU.A10.4 bare metal instance is based on NVIDIA A10 Tensor Core GPUs, which draw less power and cost less, and it has four GPUs compared to the eight NVIDIA H100 GPUs in BM.GPU.H100.8. Customers can choose either option depending on the price-performance they need.
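The percent and multiplier notations used above convert mechanically: the throughput ratio new / old is the "x" figure, and (new / old - 1) * 100 is the percent improvement. Here is a minimal Python sketch using the published GPT-J 99 Server figure of 230 queries/s; the implied v3.1 baseline is back-calculated from the 187% improvement purely for illustration and is not a published number.

```python
def improvement_pct(new_tput: float, old_tput: float) -> float:
    """Percent improvement: (new / old - 1) * 100."""
    return (new_tput / old_tput - 1.0) * 100.0

def multiplier(new_tput: float, old_tput: float) -> float:
    """Speedup expressed as an 'x' factor: new / old."""
    return new_tput / old_tput

# GPT-J 99 Server: MLPerf v4.0 reports 230 queries/s on BM.GPU.H100.8.
# A 187% gen-over-gen improvement implies a v3.1 result of roughly
# 230 / 2.87 ≈ 80 queries/s (derived here for illustration only).
v40, v31 = 230.0, 230.0 / 2.87
print(f"improvement: {improvement_pct(v40, v31):.0f}%")  # -> 187%
print(f"multiplier:  {multiplier(v40, v31):.2f}x")       # -> 2.87x
```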

High performance of generative AI and other accelerated workloads

With the growing importance of generative AI, two new, highly anticipated generative AI benchmarks, Llama2-70B and Stable Diffusion XL, were added to version 4.0 of the benchmark suite. Both run exceptionally well on systems with NVIDIA H100 GPUs. As shown below, on the GPT-J benchmark BM.GPU.H100.8 delivers about 13x the performance of BM.GPU.A100.8.1,2

[Figure: bar chart of select benchmarks comparing BM.GPU.H100.8 against the A10 and A100 instances.]
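As noted in the introduction, the LLM results in this post were produced with NVIDIA TensorRT-LLM. For a flavor of how a Llama 2 style model can be served across all eight GPUs of a BM.GPU.H100.8 shape, here is a minimal sketch using TensorRT-LLM's high-level Python API. Treat it as an assumption-laden illustration: the exact class and argument names vary across TensorRT-LLM releases, the model ID and parallelism settings are placeholders, and this is not the tuned MLPerf submission harness.

```python
# Sketch: batch (offline-style) generation with TensorRT-LLM's high-level
# LLM API. Model ID, parallelism degree, and sampling values are assumed
# for illustration; MLPerf submissions use a dedicated, tuned harness.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # assumed checkpoint
    tensor_parallel_size=8,                  # shard across the 8 H100 GPUs
)

prompts = [
    "Explain the difference between the Server and Offline scenarios.",
    "Why does batching improve LLM inference throughput?",
]
params = SamplingParams(max_tokens=128, temperature=0.0)

# Submitting all prompts at once approximates the Offline scenario:
# aggregate throughput matters, per-query latency does not.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```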

When benchmarking RNN-T, BM.GPU.H100.8 shows 15x the performance of BM.GPU.A10.4, and BM.GPU.A100.8 shows 11x the performance of BM.GPU.A10.4. Additional comparisons are shown below.1,2

[Figure: bar chart of select benchmarks with a highlight on RNN-T Server performance.]

Takeaway

OCI provides a comprehensive portfolio of GPU options optimized for AI workloads, including training and inference. These GPUs are available globally in our 48 public cloud regions, in addition to sovereign, government, and dedicated regions. Our AI portfolio also includes state-of-the-art generative AI innovations, pre-built AI services, vector databases, and more.

The MLPerf 4.0 inference results showcase OCI’s competitive strength in AI infrastructure and ability to handle a wide array of workloads, including LLMs and recommendation systems. For further information on our products, see our GPU and AI infrastructure pages.

Acknowledgement

The authors want to thank Dr. Sanjay Basu, Senior Director of OCI Engineering, and Ramesh Subramaniam, Principal Program Manager of OCI Engineering, for their assistance in publishing these results.

Footnotes:

[1] MLPerf® v4.0 Inference Closed. Retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ on 29 March 2024, entry 4.0-0073. Result verified by MLCommons Association. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

[2] MLPerf® v3.1 Inference Closed. Retrieved from https://mlcommons.org/benchmarks/inference-datacenter/ on 29 March 2024, entries 3.1-0119, 3.1-0120, 3.1-0121. Result verified by MLCommons Association. The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.

Seshadri Dehalisan

Distinguished Cloud Architect

Sesh is a Distinguished Cloud Architect. He is passionate about leveraging technology to enable business outcomes, especially in the areas of MLOps and cloud-native solutions. He has an MBA from the University of Minnesota and a bachelor's degree in engineering.

Akshai Parthasarathy

Product Marketing Director, Oracle

Akshai is a Director of Product Marketing for Oracle Cloud Infrastructure (OCI) focused on driving adoption of OCI’s services and solutions. He has over 15 years of experience and is a graduate of UC Berkeley and Georgia Tech.

Ruzhu Chen

Master Principal Cloud Architect, Healthcare & Life Sciences

Ruzhu is a Master Principal Cloud Architect on OCI's AI/ML cloud engineering team with strong hands-on expertise in large-scale AI/ML platforms and application optimization. He has more than 20 years of experience in life sciences application development, enablement, and user support, previously serving as a Lead Scientist and SME on IBM's global life sciences team. He holds a PhD in microbiology (molecular biology focus) and a master's in computer science.

