Oracle participated in the MLPerf Training v4.0 benchmark suite and achieved excellent performance. Oracle Cloud Infrastructure (OCI) scaled nearly linearly as more NVIDIA GPU-accelerated OCI Compute was added, providing evidence that OCI reduces time-to-value for AI training workloads. This blog post outlines the benchmark results.
While the world of generative AI has captured everyone's attention, enterprises are exploring ways to harness its power to realize business outcomes or pursue a Blue Ocean Strategy. Because NVIDIA GPUs command a significant premium, most organizations are looking to the power and scale of the cloud. One business unit might be interested in text summarization, while another is interested in image generation, so there's no one-size-fits-all solution.
Enterprises must identify a cloud platform that can not only scale with their business needs but also provide appropriate guardrails to protect their competitive advantage. The challenge remains to identify the platform that delivers the best performance and the shortest time to market.
The MLPerf Training benchmark suite provides full-system tests that stress machine learning (ML) models, software, and hardware across a broad range of applications. The open source, peer-reviewed benchmark suite provides a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry.
Oracle participated in the time-bound MLPerf Training v4.0 benchmark and submitted results for Stable Diffusion, Single Shot Detection (SSD), DLRMv2, Llama 2 70B, and 3D U-Net. The critical success factor for achieving excellent numbers in the MLPerf benchmark is the platform's ability to scale across the GPU, compute, network, and storage dimensions as more GPUs are added to the cluster.
Our engineers ran the benchmarks on OCI's BM.GPU.H100 bare metal Compute shape, which pairs eight NVIDIA H100 GPUs with two Intel Sapphire Rapids CPUs.
To prove the scale, the benchmarks ran at three sizes: eight NVIDIA H100 GPUs (one node), 64 NVIDIA H100 GPUs (eight nodes), and 128 NVIDIA H100 GPUs (16 nodes). The AI cluster used RDMA over Converged Ethernet (RoCE), an open standard enabling remote direct memory access and network offloads over an Ethernet network. For more details on RoCEv2, see OCI accelerates HPC, AI, and database using RoCE and NVIDIA ConnectX.
The network topology is rail-optimized, which helps maximize all-reduce performance while minimizing interference between network flows. For details and advantages of a rail-optimized network, see Doubling all2all Performance with NVIDIA Collective Communication Library 2.12.
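All-reduce is the collective that dominates data-parallel training traffic: every GPU contributes its gradient shard, and every GPU must end up with the global sum. As a minimal illustration of the pattern (a pure-Python simulation of the classic ring algorithm, not OCI's or NCCL's actual implementation), consider:

```python
def ring_allreduce(node_data):
    """Simulate a ring all-reduce (sum) across n nodes.

    node_data: list of n equal-length vectors, one per node. For
    simplicity the vector length equals n, so each node owns one chunk.
    Returns the per-node vectors after the all-reduce; every node ends
    up holding the element-wise sum of all inputs.
    """
    n = len(node_data)
    data = [list(v) for v in node_data]

    # Phase 1: reduce-scatter. In step s, node i sends chunk (i - s) % n
    # to its right neighbor, which adds it to its own copy. Sends are
    # snapshotted first because all nodes transmit simultaneously.
    for s in range(n - 1):
        sends = [(i, (i - s) % n, data[i][(i - s) % n]) for i in range(n)]
        for i, c, val in sends:
            data[(i + 1) % n][c] += val

    # Phase 2: all-gather. Node i now holds the fully reduced chunk
    # (i + 1) % n; circulate the finished chunks around the ring.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n, data[i][(i + 1 - s) % n]) for i in range(n)]
        for i, c, val in sends:
            data[(i + 1) % n][c] = val

    return data

# Two "nodes", each holding a 2-element gradient shard.
print(ring_allreduce([[1, 2], [3, 4]]))  # every node ends with [4, 6]
```

Each node sends and receives only 2(n-1)/n of the data volume regardless of ring size, which is why keeping every hop of the ring on a fast, uncongested rail matters so much at 64 and 128 GPUs.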
Oracle participated in the following MLPerf Training v4.0 workloads to ensure that we cover a wide range of use cases and are aligned with industry trends:
| Workload type | Model | Relevant industries |
| --- | --- | --- |
| Large language model (LLM) fine-tuning | Llama 2 70B with LoRA | All |
| Text-to-image | Stable Diffusion v2 | All |
| Recommender | DLRM-dcnv2 | Online retailers |
| Image classification | ResNet-50 | All |
| Lightweight object detection | RetinaNet | Healthcare and Life Sciences |
| Biomedical image segmentation | 3D U-Net | Healthcare and Life Sciences |
The following table shows the results, reported as latency (time to train) in minutes:

| Model | 8 NVIDIA H100 | 64 NVIDIA H100 | 128 NVIDIA H100 |
| --- | --- | --- | --- |
| ResNet | 13.329 | 2.494 | |
| SSD | 37.705 | 6.581 | |
| Stable Diffusion | | 6.843 | 4.032 |
| dlrm_dcnv2 | 4.171 | | |
| llama2_70b_lora | 29.7 | | |
| unet3d | | 1.949 | |
3D U-Net ran on 64 NVIDIA H100 GPUs plus one BM.GPU.H100 master node (eight H100 GPUs).
We observed linear scalability as the training scale increased from 8 to 64 to 128 NVIDIA H100 GPUs, as evidenced in the following graphs:
1. Results of ResNet’s benchmark test on 8 and 64 NVIDIA H100 GPUs
2. Results of SSD’s benchmark test on 8 and 64 NVIDIA H100 GPUs
3. Results of Stable Diffusion’s benchmark test on 64 and 128 NVIDIA H100 GPUs
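The scaling behavior can also be quantified directly from the times in the table above. The short Python sketch below (the helper function and its name are ours, not part of the MLPerf tooling) computes the speedup and parallel efficiency for the runs that were measured at two scales:

```python
def scaling_efficiency(t_base, t_scaled, gpus_base, gpus_scaled):
    """Speedup and parallel efficiency when scaling a training run.

    Efficiency is the measured speedup divided by the ideal speedup
    (the ratio of GPU counts); 1.0 means perfectly linear scaling.
    """
    speedup = t_base / t_scaled
    ideal = gpus_scaled / gpus_base
    return speedup, speedup / ideal

# Times to train (minutes) from the results table above.
runs = [
    ("ResNet", 13.329, 8, 2.494, 64),
    ("SSD", 37.705, 8, 6.581, 64),
    ("Stable Diffusion", 6.843, 64, 4.032, 128),
]
for model, t_base, g_base, t_scaled, g_scaled in runs:
    s, e = scaling_efficiency(t_base, t_scaled, g_base, g_scaled)
    print(f"{model}: {s:.2f}x speedup at {g_scaled // g_base}x the GPUs ({e:.0%} efficiency)")
```

Efficiency below 100% at larger scales is expected, since communication overhead grows with GPU count; the point of the RoCE fabric and rail-optimized topology is to keep that overhead small.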
You can read the full results of the MLPerf Training v4.0 benchmark suite on MLCommons' website. The performance numbers on the BM.GPU.H100 bare metal shape outperformed or matched competitors on the workloads in which Oracle participated.
The results from Oracle's participation in the MLPerf Training v4.0 benchmark suite demonstrate the exceptional performance and scalability of OCI's GPU infrastructure. The ability to scale linearly across varied workloads, including LLM fine-tuning, text-to-image generation, and biomedical image segmentation, highlights OCI's potential to meet the diverse needs of enterprises exploring generative AI.
Oracle’s BM.GPU.H100 bare metal Compute shape, with its robust specifications and advanced networking capabilities, proves to be a reliable and efficient solution for demanding AI training tasks. This performance, coupled with Oracle’s scalable infrastructure, makes OCI an attractive option for organizations looking to accelerate their AI initiatives and reduce time to market.
Learn more about our performance and scalability on the Oracle Cloud Infrastructure AI Infrastructure product page and speak to an AI expert today.
Sanjay focuses on advanced services such as generative AI, machine learning, GPU engineering, blockchain, microservices, industrial IoT, and 5G core, along with cloud security and compliance. He holds dual master's degrees in Computer Science and Systems Design. His PhD was in Organizational Behaviour and Applied Neuroscience, and he is currently pursuing a second PhD in AI, with research focused on Retentive Networks.
Sesh is a Distinguished Cloud Architect. His passion is leveraging technology to enable business outcomes, especially in the areas of MLOps and cloud native solutions. He has an MBA from the University of Minnesota and a bachelor's degree in engineering.
Ruzhu is a master principal cloud architect on OCI's AI/ML cloud engineering team with strong hands-on expertise in large AI/ML platforms and application optimization. He has more than 20 years of experience in life science application development, enablement, and user support, previously as an IBM Lead Scientist and SME on the Life Science global team. He holds a PhD in microbiology (molecular biology focus) and a master's in computer science.