OCI with NVIDIA A100 Tensor Core GPUs for HPC and AI sets risk calculation records in financial services

December 11, 2023 | 5 minute read
Amarendra Joshi
Director, North America Cloud Engineering
Xinghong He
Master Principal Cloud Architect
Florent Duguet
Principal Developer Technology Engineer, NVIDIA
Prabhu Ramamoorthy
CFA, FRM, CAIA, Global Partner Success Manager, NVIDIA Financial Services Team

Generative AI is taking the world by storm, from large language models (LLMs) like generative pretrained transformer (GPT) models to diffusion models. Powered by NVIDIA technology, Oracle Cloud Infrastructure (OCI) is uniquely positioned to accelerate generative AI workloads, as well as data processing, analytics, high-performance computing (HPC), quantitative financial applications, and more. It’s a one-stop solution for diverse workload needs as we increasingly see a convergence of HPC, quantitative finance, and AI requirements from end users.

In this blog post, we begin by focusing on the quantitative applications setting new records on OCI with NVIDIA GPUs. In financial risk management applications, for example, OCI powered by NVIDIA GPUs offers exceptional speed with high efficiency and cost savings. In a recent STAC-A2™ audit, a stack with eight NVIDIA A100 Tensor Core GPUs in an Oracle Cloud BM.GPU4.8 instance (SUT ID NVDA231026) set several records. The system was independently audited by the Strategic Technology Analysis Center (STAC®). STAC and all STAC names are trademarks or registered trademarks of the Strategic Technology Analysis Center.

STAC-A2

STAC-A2 is the technology benchmark standard based on financial market risk analysis. Designed by quants and technologists from some of the world’s largest banks, STAC-A2 reports the performance, scaling, quality, and resource efficiency of any technology stack that can handle the workload. The benchmark is a Monte Carlo estimation of Heston-based Greeks for a path-dependent, multi-asset option with early exercise. The workload can serve as a proxy for price discovery and for market risk calculations such as sensitivity Greeks, profit and loss, and value at risk (VaR), as well as for counterparty credit risk (CCR) workloads such as credit valuation adjustment (CVA) and the margins that financial institutions calculate for trading and risk management.
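
To make the workload concrete, here is a minimal, illustrative CUDA sketch of the kind of computation STAC-A2 represents: Euler-Maruyama simulation of Heston dynamics, one Monte Carlo path per GPU thread. The kernel, parameter names, and simplified variance handling are assumptions for illustration only, not the audited STAC Pack implementation.

// Illustrative only: one Monte Carlo path of Heston dynamics per thread,
// using the cuRAND device API for correlated normal draws. Not the STAC Pack.
#include <curand_kernel.h>

__global__ void heston_paths(double* terminal_prices, int num_paths,
                             int num_steps, double dt,
                             double s0, double v0, double rate,
                             double kappa, double theta, double xi,
                             double rho, unsigned long long seed) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= num_paths) return;

    curandStatePhilox4_32_10_t state;
    curand_init(seed, tid, 0, &state);  // one independent stream per path

    double s = s0, v = v0;
    for (int step = 0; step < num_steps; ++step) {
        // Correlated Brownian increments for the asset and its variance.
        double z1 = curand_normal_double(&state);
        double z2 = rho * z1 + sqrt(1.0 - rho * rho) * curand_normal_double(&state);

        double v_pos = fmax(v, 0.0);  // full truncation keeps variance usable
        s *= exp((rate - 0.5 * v_pos) * dt + sqrt(v_pos * dt) * z1);
        v += kappa * (theta - v_pos) * dt + xi * sqrt(v_pos * dt) * z2;
    }
    terminal_prices[tid] = s;
}

In a full Greeks engine, many such simulations are repeated under bumped inputs to estimate sensitivities, which is why throughput per GPU translates directly into options priced per second and per dollar.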

The STAC-A2 tests were performed using an NVIDIA-authored STAC Pack on OCI hardware. In all, the STAC-A2 specifications delivered over 200 test results which are summarized in the STAC Report. OCI wants to draw attention to the following points:

  • Compared to all publicly reported solutions to date, this solution set several records, including the best price performance among all cloud solutions (48,404 options per USD) over one hour, three days, and one year of continuous use (STAC-A2.β2.HPORTFOLIO.PRICE_PERF.[BURST | PERIODIC | CONTINUOUS])
  • Compared to the most recently audited single-server, cloud-based solution (INTC221006), the OCI solution demonstrated the following capabilities:
    • 3.2-times, 3.2-times, and 2.0-times price-performance advantages (in options per USD) for one hour, three days, and one year of continuous use (STAC-A2.β2.HPORTFOLIO.PRICE_PERF.[BURST | PERIODIC | CONTINUOUS])
    • 14.6-times the throughput (options per second) (STAC-A2.β2.HPORTFOLIO.SPEED)
    • 6.8-times and 7.8-times the speed in cold and warm runs of the baseline Greeks benchmarks (STAC-A2.β2.GREEKS.TIME.COLD|WARM)
    • 9.0-times and 4.3-times the speed in cold and warm runs of the large Greeks benchmark (STAC-A2.β2.GREEKS.10-100K.1260.TIME.COLD|WARM)

How OCI stacks up

The financial industry’s key concerns have been pricing and risk calculation, which rely heavily on the latest technologies for instantaneous calculations and real-time trading decisions. Pricing and risk calculation, algorithmic trading model development, and backtesting need a robust, scalable environment with the fastest interconnects and networking.

The OCI Compute GPU instance BM.GPU4.8 and other instances based on NVIDIA Ampere and Hopper architectures provide that solution, so that workloads such as market risk VaR calculations or CCR calculations like CVA can run standalone or scale out per end user. In areas like CVA, separately from STAC benchmarking, scaled GPU setups have been shown to reduce the number of nodes needed for simulation- and compute-intensive calculations from 100 to 4.

The OCI-based solution enables scaling up with NVIDIA GPUs using fewer nodes, delivering the highest performance at the lowest operating cost while making it easy to adopt cutting-edge hardware in the cloud. The solution extends to other workloads, such as AI and quantitative modeling, with techniques ranging from traditional quantitative models to machine learning (such as XGBoost) and deep learning (such as long short-term memory (LSTM) networks and recurrent neural networks (RNNs)). These models must be backtested across various ticker symbols and products, so they need a flexible cloud infrastructure, such as OCI Compute with NVIDIA GPU instances.

NVIDIA provides all the key software component layers and offers developers multiple options, including the NVIDIA CUDA software development kit (SDK) for CUDA and C++, as well as other languages and directive-based approaches, such as OpenMP, OpenACC, C++17 standard parallelism, and Fortran parallel constructs, through the NVIDIA HPC Software Development Kit (SDK).
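
As an example of the standard-language route, the sketch below prices a simple European call with C++17 standard parallelism; with the NVIDIA HPC SDK, nvc++ built with the -stdpar=gpu flag can offload such parallel algorithms to the GPU. The payoff, parameters, and host-side random number generation are hypothetical placeholders for illustration, not part of the STAC Pack.

// Illustrative only: Monte Carlo call pricing via C++17 parallel algorithms.
// With the NVIDIA HPC SDK, compile with: nvc++ -stdpar=gpu example.cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <execution>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t num_paths = 1'000'000;
    const double strike = 100.0, spot = 100.0, vol = 0.2, rate = 0.03, T = 1.0;

    // Pre-generate standard normal draws on the host for simplicity.
    std::vector<double> z(num_paths);
    std::mt19937_64 rng(42);
    std::normal_distribution<double> normal(0.0, 1.0);
    std::generate(z.begin(), z.end(), [&] { return normal(rng); });

    // Parallel transform-reduce: terminal price under geometric Brownian
    // motion, then the discounted average call payoff. std::execution::par_unseq
    // lets the compiler map this loop onto the GPU when built with -stdpar=gpu.
    double sum = std::transform_reduce(
        std::execution::par_unseq, z.begin(), z.end(), 0.0, std::plus<>{},
        [=](double zi) {
            double sT = spot * std::exp((rate - 0.5 * vol * vol) * T +
                                        vol * std::sqrt(T) * zi);
            return std::max(sT - strike, 0.0);
        });

    double price = std::exp(-rate * T) * sum / static_cast<double>(num_paths);
    std::printf("Estimated call price: %f\n", price);
    return 0;
}

The appeal of this route is that the same kernel-free source can be compiled for CPUs or GPUs, while the CUDA SDK remains available when hand-tuned kernels are needed.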

The implementation used for STAC-A2 was developed on CUDA 12.0 and uses the highly optimized libraries delivered with CUDA: cuBLAS, the GPU-enabled implementation of the linear algebra package BLAS, and cuRAND, a parallel and efficient GPU implementation of random number generators. The STAC Pack used CUDA Toolkit 12.2, which includes NVCC 12.2.91, associated CUDA libraries, and GCC 11.2.1.
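
As a hedged illustration of how cuRAND typically feeds such a simulation (not the STAC Pack code itself), the following sketch uses the cuRAND host API to fill device memory with double-precision standard normal draws that pricing and Greeks kernels could then consume; the generator choice, sizes, and omitted error handling are assumptions.

// Illustrative only: generate double-precision standard normals on the GPU
// with the cuRAND host API, as input to downstream Monte Carlo kernels.
#include <cuda_runtime.h>
#include <curand.h>

int main() {
    const size_t n = 1 << 24;          // number of normal draws (even, as required)
    double* d_normals = nullptr;
    cudaMalloc(&d_normals, n * sizeof(double));

    curandGenerator_t gen;
    // Philox is a counter-based generator well suited to parallel simulation.
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_PHILOX4_32_10);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);

    // Fill device memory with n standard normals (mean 0, stddev 1).
    curandGenerateNormalDouble(gen, d_normals, n, 0.0, 1.0);
    cudaDeviceSynchronize();

    // ... launch pricing and Greeks kernels that consume d_normals ...

    curandDestroyGenerator(gen);
    cudaFree(d_normals);
    return 0;
}

A counter-based generator such as Philox is a common choice for this kind of Monte Carlo work because its streams can be partitioned deterministically across threads and repeated runs.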

The different components of the implementation were designed in a modular and maintainable framework using object-oriented programming. All floating-point operations were conducted in IEEE-754 double precision (64 bits). The STAC-A2 implementation was developed using tools that NVIDIA provides to help debug and profile CUDA code. These tools include NVIDIA Nsight Systems for timeline profiling, NVIDIA Nsight Compute for kernel profiling, and NVIDIA Compute Sanitizer and CUDA-GDB for debugging.

Summary

The convergence of HPC and AI is happening as financial firms, including global market banks, insurers, hedge funds, market makers, high-frequency traders, and asset managers, work on big-picture solutions. These combine various modeling techniques, including HPC quantitative finance, machine learning (ML), reinforcement learning (RL) and AI neural nets, and natural language processing (NLP) and generative AI with LLMs.

Organizations can apply LLMs, together with techniques such as retrieval-augmented generation (RAG), to unstructured sources of information like financial news, often called “alternative data,” to gain an information edge beyond traditional sources of tabular market data. Organizations are converging NLP with generative AI, creating new signals, and feeding those inputs into quantitative calculations. Enterprise customers can benefit from customizing such LLMs to understand the financial domain better and meet their individual needs with greater accuracy by combining AI and quantitative financial models in their workflows. In addition, the signals generated by such models, along with trading, pricing, and risk calculations, are produced in real time and repeated many times to backtest the models for ongoing monitoring based on market conditions.

Powered by NVIDIA technology, Oracle Cloud Infrastructure is uniquely positioned to accelerate workloads ranging from HPC quantitative financial applications and data processing to analytics and generative AI, providing maximum value and return on investment (ROI) and reducing total cost of ownership (TCO) for customers looking to integrate diverse workloads into their financial applications.

Amarendra Joshi

Director, North America Cloud Engineering

Amarendra Joshi is a Director in North America Cloud Engineering. His team helps customers leverage Oracle Cloud Infrastructure for their cloud computing needs.

Xinghong He

Master Principal Cloud Architect

HPC/AI Enablement and Performance, GPU and Parallel Computing Solutions.

Florent Duguet

Principal Developer Technology Engineer, NVIDIA

Florent is a Principal Developer Technology Engineer (Devtech) at NVIDIA. He earned a PhD in applied mathematics at INRIA in 2005. After graduation, he consulted for several financial institutions, mainly investment banks, first in electronic markets and then in quantitative research teams. An early adopter of CUDA, he enabled several institutions with GPU computing in finance, insurance, and oil and gas. Today, Florent is working on optimizing CUDA implementations for quantitative finance and energy simulations.

Prabhu Ramamoorthy

CFA, FRM, CAIA, Global Partner Success Manager, NVIDIA Financial Services Team

Prabhu Ramamoorthy is the financial ecosystem partner manager at NVIDIA, where he focuses on quant and AI acceleration for financial services. Previously, he was head of technology at the margin software firm Dash Regtech, catering to leading investment banks. He also served as a director at KPMG/EY, where he helped 100+ financial institutions over the last 10 years. Ramamoorthy holds an MBA from the University of Wisconsin-Madison and an undergraduate degree in Engineering from BITS-Pilani, one of the top engineering institutes in India. He is a CFA charterholder, financial risk manager, and chartered alternative investment analyst specializing in financial transformation use cases.

