Graphics rendering with NVIDIA A10 GPU shapes on OCI

June 5, 2023 | 7 minute read
Praveen Coca
Master Principal Cloud Architect
Yan Sun
Cloud Solution Technologist

Join us in test-driving the new Oracle Cloud Infrastructure (OCI) GPU shapes for 3D graphics rendering, animation, and ray tracing. We walk you through simple rendering and animation tasks using the Blender Classroom demo scene, summarize the performance results, and compare GPU-based rendering with CPU-based rendering.

NVIDIA A10 shapes on OCI

OCI provides several bare metal and virtual machine (VM) instances powered by NVIDIA A10 Tensor Core GPUs suitable for a variety of accelerated workloads, including artificial intelligence (AI), machine learning (ML) inferencing, computational fluid dynamics (CFD), and virtual desktops when paired with NVIDIA RTX Virtual Workstation software.

Of particular interest for this post are the OCI Compute shapes based on NVIDIA A10 Tensor Core GPUs, which are well suited to 4K video, gaming, and related AI applications such as Stable Diffusion and NVIDIA Omniverse. On OCI network infrastructure, these shapes also support SMPTE ST 2110 uncompressed video transport for real-time video production and playout applications. The A10-based GPU shapes offer strong price-performance, with a list price of $2 per GPU per hour for pay-as-you-go (PAYG) instances.

The A10 GPUs are available in a bare metal shape, BM.GPU.A10.4, with four A10 GPUs and 96 GB of total GPU memory, on a system powered by Intel Xeon Platinum 8358 processors with a total of 64 OCPUs (equivalent to 128 vCPUs), 1 TB of CPU memory, two NVMe drives totaling 7.68 TB of local storage, and 2 × 50 Gbps of network bandwidth.

The A10 GPUs are also available as VM shapes with the following configurations:

  • VM.GPU.A10.1 with one GPU and 24 GB of GPU memory, supported by 15 OCPUs, 240 GB of CPU memory, and 24 Gbps of network bandwidth

  • VM.GPU.A10.2 with two GPUs and 48 GB of GPU memory, supported by 30 OCPUs, 480 GB of CPU memory, and 48 Gbps of network bandwidth

  • The VM shapes come with block storage only, with no local NVMe drives.
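To compare the shapes side by side, the following minimal Python sketch encodes the specifications listed above and derives the vCPU count (two per OCPU), the total GPU memory (24 GB per A10), and the hourly GPU list price from the $2-per-GPU-hour figure quoted earlier. The A10Shape class and SHAPES list are illustrative names, and the price covers the GPU list price only, assuming no discounts.

```python
from dataclasses import dataclass

# PAYG list price quoted in this post (USD per A10 GPU per hour).
A10_PRICE_PER_GPU_HOUR = 2.00
# Each A10 GPU has 24 GB of GPU memory.
A10_GPU_MEMORY_GB = 24


@dataclass(frozen=True)
class A10Shape:
    name: str
    gpus: int
    ocpus: int
    cpu_memory_gb: int

    @property
    def vcpus(self) -> int:
        # One OCPU = one physical core = two hardware threads (vCPUs).
        return self.ocpus * 2

    @property
    def gpu_memory_gb(self) -> int:
        return self.gpus * A10_GPU_MEMORY_GB

    @property
    def gpu_list_price_per_hour(self) -> float:
        # GPU list price only; assumes no discounts or other charges.
        return self.gpus * A10_PRICE_PER_GPU_HOUR


SHAPES = [
    A10Shape("VM.GPU.A10.1", gpus=1, ocpus=15, cpu_memory_gb=240),
    A10Shape("VM.GPU.A10.2", gpus=2, ocpus=30, cpu_memory_gb=480),
    A10Shape("BM.GPU.A10.4", gpus=4, ocpus=64, cpu_memory_gb=1024),  # 1 TB
]

for shape in SHAPES:
    print(f"{shape.name}: {shape.gpus} GPU(s), "
          f"{shape.gpu_memory_gb} GB GPU memory, {shape.vcpus} vCPUs, "
          f"${shape.gpu_list_price_per_hour:.2f}/hour at GPU list price")
```

Running it prints one line per shape; for example, BM.GPU.A10.4 works out to 96 GB of GPU memory, 128 vCPUs, and $8.00 per hour at the GPU list price.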

For background, see Jeff Davies's excellent blog post about running Blender on Oracle Cloud Infrastructure. We test-drove the new A10 GPU VM shapes on OCI and gathered metrics for sample rendering tasks using the following methodology:

  • Provisioned the target Compute shapes, VM.GPU.A10.1 and VM.GPU.A10.2, with Ubuntu 20.04 as the operating system.

  • Installed NVIDIA drivers.

  • Installed Blender 3.5.0.

  • Downloaded the Classroom demo file (classroom.blend) from the Blender website.

  • Ran Blender commands to render a single frame and multiple frames of the Classroom scene, using the Cycles render engine (with NVIDIA OptiX ray tracing for the GPU runs).
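Before running any renders, it helps to confirm that the NVIDIA driver installation worked and that each VM sees the expected number of GPUs. The following sketch wraps the nvidia-smi query interface (--query-gpu with --format=csv,noheader,nounits, which reports memory in MiB) and parses its output. The function names are hypothetical, and the sketch assumes nvidia-smi is on the PATH after the driver install.

```python
from __future__ import annotations

import shutil
import subprocess

# One CSV line per GPU: "<name>, <total memory in MiB>".
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory_MiB' lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma in case a GPU name ever contains one.
        name, mem = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append((name, int(mem)))
    return gpus


def list_gpus() -> list[tuple[str, int]]:
    """Return the GPUs reported by nvidia-smi, or [] if it isn't installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible:", gpus)
```

On a VM.GPU.A10.2 instance you'd expect two entries; an empty list suggests the driver isn't installed or isn't on the PATH.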

As the metrics later in this post show, the A10 GPU shapes complete these rendering tasks far faster than the CPU-based shapes. The Blender command line used the following Python script to select OptiX as the Cycles compute device type for ray tracing and to render on the GPUs:

rendersettings.py

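import bpy

# The Cycles add-on preferences hold the compute device settings.
prop = bpy.context.preferences.addons['cycles'].preferences

# Refresh the list of devices that Cycles can detect.
prop.get_devices()

# Use the NVIDIA OptiX backend for Cycles GPU rendering.
prop.compute_device_type = 'OPTIX'

# Enable only the OptiX (GPU) devices and disable all others, such as the CPU.
for device in prop.devices:
    if device.type == 'OPTIX':
        device.use = True
    else:
        device.use = False

# Render the active scene on the GPU ...
bpy.context.scene.cycles.device = 'GPU'

# ... and every other scene in the .blend file as well.
for scene in bpy.data.scenes:
    scene.cycles.device = 'GPU'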

For single-frame rendering with headless (background) Blender, we ran the following command for CPU-based rendering:

blender -b classroom.blend -o //classroom -f 1 -F PNG -noaudio > blender-CPU-F1.log

For GPU-based rendering on the A10 VMs, we ran the following command:

blender -b classroom.blend -o //classroom -f 1 -F PNG -noaudio -E CYCLES -- --cycles-device OPTIX --cycles-print-stats -P ~/rendersettings.py --debug-cycles > blender-GPU-F1.log
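Blender's command-line documentation states that arguments are processed in the order they are given, and that arguments after -- are not parsed by Blender but left for Python (sys.argv) and for Cycles-specific options such as --cycles-device. If we read that correctly, the render action (-f, or -a for animations) should come after the settings it depends on, such as -E and -F, and -P ~/rendersettings.py needs to appear before -- for Blender to run the script. The following minimal Python sketch assembles a command in that order; build_blender_cmd and its parameters are hypothetical names for illustration, not part of Blender.

```python
from __future__ import annotations

import shlex


def build_blender_cmd(blend_file: str, output: str = "//classroom",
                      frame: int | None = None,
                      frame_range: tuple[int, int] | None = None,
                      script: str | None = None,
                      cycles_device: str | None = None) -> list[str]:
    """Assemble a headless Blender command with options in processing order."""
    if (frame is None) == (frame_range is None):
        raise ValueError("pass exactly one of frame or frame_range")

    # Load the .blend file first; -E and -P act on the loaded scene.
    cmd = ["blender", "-b", blend_file, "-noaudio", "-E", "CYCLES"]
    if script:
        # Blender only runs -P scripts that appear before the '--' separator.
        cmd += ["-P", script]
    cmd += ["-o", output, "-F", "PNG"]
    if frame is not None:
        # The render action comes after every setting it should use.
        cmd += ["-f", str(frame)]
    else:
        start, end = frame_range
        cmd += ["-s", str(start), "-e", str(end), "-a"]
    if cycles_device:
        # Cycles-specific options follow '--'.
        cmd += ["--", "--cycles-device", cycles_device]
    return cmd


if __name__ == "__main__":
    gpu_cmd = build_blender_cmd("classroom.blend", frame=1,
                                script="rendersettings.py",
                                cycles_device="OPTIX")
    print(shlex.join(gpu_cmd))
```

Because the function returns a list of arguments, you can pass it straight to subprocess.run without shell quoting; shlex.join is used only to print a copy-pasteable version.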

We also rendered multiple frames (frames 1 through 6) with headless Blender. For CPU-based rendering of multiple frames, we ran the following command:

blender -b classroom.blend -o //classroom -s 1 -e 6 -F PNG -noaudio -a > blender-CPU-F6.log

Then we ran GPU-based rendering for multiple frames:

blender -b classroom.blend -o //classroom -s 1 -e 6 -F PNG -noaudio -a -E CYCLES -- --cycles-device OPTIX -P ~/rendersettings.py > blender-GPU-F6.log
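The commands above redirect Blender's output to log files. To capture wall-clock rendering times like the ones reported in the following table, one option is a small Python wrapper such as this sketch. It assumes you pass the full Blender command line as arguments; time_command, format_duration, and the render.log file name are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import time


def time_command(cmd: list[str], log_path: str) -> float:
    """Run cmd with stdout and stderr sent to log_path; return elapsed seconds."""
    with open(log_path, "w") as log:
        start = time.perf_counter()
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)
        return time.perf_counter() - start


def format_duration(seconds: float) -> str:
    """Format seconds like the results table, e.g. '1 minute 55 seconds'."""
    if seconds < 60:
        return f"{seconds:.1f} seconds"
    minutes, secs = divmod(round(seconds), 60)
    minute_unit = "minute" if minutes == 1 else "minutes"
    second_unit = "second" if secs == 1 else "seconds"
    return f"{minutes} {minute_unit} {secs} {second_unit}"


if __name__ == "__main__":
    # Hypothetical usage: python3 time_render.py blender -b classroom.blend ...
    if len(sys.argv) > 1:
        elapsed = time_command(sys.argv[1:], "render.log")
        print(format_duration(elapsed))
```

Wall-clock timing includes Blender startup and scene loading, so it can run slightly longer than the render time Blender itself reports in its log.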

Running these commands produced the following results. The CPU-based runs used VM.Standard.E4.Flex instances with 16 and 32 OCPUs. In this table, one OCPU is one physical processor core with simultaneous multithreading (hyper-threading) enabled, which corresponds to two hardware threads, or vCPUs.

| Instance type | GPU/CPU | Frames | Rendering time |
|---|---|---|---|
| VM.Standard.E4.Flex | 16 OCPUs | 1 | 3 minutes 48 seconds |
| VM.Standard.E4.Flex | 32 OCPUs | 1 | 1 minute 55 seconds |
| VM.GPU.A10.1 | One A10 GPU | 1 | 19.3 seconds |
| VM.GPU.A10.2 | Two A10 GPUs | 1 | 11.4 seconds |
| VM.Standard.E4.Flex | 16 OCPUs | 6 | 22 minutes 42 seconds |
| VM.Standard.E4.Flex | 32 OCPUs | 6 | 11 minutes 30 seconds |
| VM.GPU.A10.1 | One A10 GPU | 6 | 1 minute 56 seconds |
| VM.GPU.A10.2 | Two A10 GPUs | 6 | 1 minute 7 seconds |
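Because the table mixes single-frame and six-frame runs, converting each time to seconds and dividing by the frame count puts them on a common per-frame basis. The following sketch does that with a small parser for the table's "X minutes Y seconds" format; to_seconds and RESULTS are illustrative names.

```python
import re

# Rendering times from the results table, keyed by (configuration, frames).
RESULTS = {
    ("VM.Standard.E4.Flex, 16 OCPUs", 1): "3 minutes 48 seconds",
    ("VM.Standard.E4.Flex, 32 OCPUs", 1): "1 minute 55 seconds",
    ("VM.GPU.A10.1, one A10 GPU", 1): "19.3 seconds",
    ("VM.GPU.A10.2, two A10 GPUs", 1): "11.4 seconds",
    ("VM.Standard.E4.Flex, 16 OCPUs", 6): "22 minutes 42 seconds",
    ("VM.Standard.E4.Flex, 32 OCPUs", 6): "11 minutes 30 seconds",
    ("VM.GPU.A10.1, one A10 GPU", 6): "1 minute 56 seconds",
    ("VM.GPU.A10.2, two A10 GPUs", 6): "1 minute 7 seconds",
}


def to_seconds(text: str) -> float:
    """Convert strings like '1 minute 7 seconds' or '19.3 seconds' to seconds."""
    total = 0.0
    minutes = re.search(r"(\d+)\s+minutes?", text)
    seconds = re.search(r"(\d+(?:\.\d+)?)\s+seconds?", text)
    if minutes:
        total += 60 * int(minutes.group(1))
    if seconds:
        total += float(seconds.group(1))
    return total


for (config, frames), text in RESULTS.items():
    per_frame = to_seconds(text) / frames
    print(f"{config}, {frames} frame(s): {per_frame:.1f} s per frame")
```

With these numbers, the six-frame runs average roughly the same per-frame time as the single-frame runs (for example, about 19.3 seconds per frame on VM.GPU.A10.1 in both cases), so rendering time scales about linearly with the number of frames on every configuration tested.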

Reduction in rendering time with GPU-based rendering, calculated as (CPU time - GPU time) / CPU time:

| GPU shape | Frames | Reduction vs. VM.Standard.E4.Flex (16 OCPUs) | Reduction vs. VM.Standard.E4.Flex (32 OCPUs) |
|---|---|---|---|
| VM.GPU.A10.1 | 1 | 91.5% | 83.2% |
| VM.GPU.A10.2 | 1 | 95.0% | 90.1% |
| VM.GPU.A10.1 | 6 | 91.5% | 83.2% |
| VM.GPU.A10.2 | 6 | 95.1% | 90.3% |
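Expressing the results as speedup ratios (CPU time divided by GPU time) rather than percentage reductions can make the differences easier to compare. The following sketch recomputes both metrics from the results table, with each rendering time converted to seconds; the dictionary and function names are illustrative.

```python
# Rendering times in seconds, converted from the results table
# (for example, 3 minutes 48 seconds = 228 seconds).
CPU_SECONDS = {
    1: {"E4.Flex, 16 OCPUs": 228.0, "E4.Flex, 32 OCPUs": 115.0},
    6: {"E4.Flex, 16 OCPUs": 1362.0, "E4.Flex, 32 OCPUs": 690.0},
}
GPU_SECONDS = {
    1: {"VM.GPU.A10.1": 19.3, "VM.GPU.A10.2": 11.4},
    6: {"VM.GPU.A10.1": 116.0, "VM.GPU.A10.2": 67.0},
}


def time_reduction_pct(cpu_s: float, gpu_s: float) -> float:
    """Percentage reduction in rendering time relative to the CPU run."""
    return 100.0 * (cpu_s - gpu_s) / cpu_s


def speedup(cpu_s: float, gpu_s: float) -> float:
    """How many times faster the GPU run finished than the CPU run."""
    return cpu_s / gpu_s


for frames in (1, 6):
    for gpu, gpu_s in GPU_SECONDS[frames].items():
        for cpu, cpu_s in CPU_SECONDS[frames].items():
            print(f"{gpu} vs {cpu}, {frames} frame(s): "
                  f"{time_reduction_pct(cpu_s, gpu_s):.1f}% less time, "
                  f"{speedup(cpu_s, gpu_s):.1f}x faster")
```

For example, the 91.5% reduction for VM.GPU.A10.1 against 16 OCPUs on a single frame corresponds to a speedup of about 11.8x, and the 95% reduction for VM.GPU.A10.2 corresponds to about 20x.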

Conclusion

We hope these benchmarks help you gauge the performance and price benefits that the NVIDIA A10-based GPU shapes on OCI offer for your graphics rendering, animation, video, and ML workloads and solutions. Try Oracle Cloud Infrastructure for free and explore its capabilities yourself.

For more information, see the following resources:
