Democratizing AI infrastructure with OCI

December 13, 2022 | 3 minute read
Leo Leung
Vice President, Product Marketing, OCI
Dan Reger
Senior Manager, Product Marketing, OCI

Whether you’re an artificial intelligence (AI) startup looking to train models with billions of parameters for workloads like natural language processing or an enterprise analyzing huge data sets to improve customer experience through services like recommendation systems, you need the right cloud to do this work economically and at scale. You also need a cloud vendor that supports you wherever you are with your AI expertise, from training models in-house with cluster networking and GPU infrastructure to using prebuilt models for text analysis, chatbots, anomaly detection, and more.

Oracle Cloud Infrastructure (OCI) can support all organizations with AI infrastructure: Machine learning (ML) startups that are training large models, technology companies creating voice AI, biotech corporations looking to use AI for drug discovery, retailers looking to process data from video cameras, and much more.

OCI AI infrastructure

OCI provides the following sets of services to meet your diverse AI workload needs:

With Accelerated Infrastructure, you can build your own compute clusters in the cloud, with high performance compute, storage, and networking. AI training needs, measured by model complexity, are growing exponentially each year, and many companies want to build a cluster to process vast amounts of data for AI model training and inferencing. OCI cluster networking and bare metal Compute instances, powered by NVIDIA A100 Tensor Core GPUs, help you train the largest models faster, while OCI Block Volumes can help you create high-bandwidth cluster file systems.

In 2023, we’re increasing our cluster networks to support thousands of nodes. This expansion includes “superclusters” with thousands of GPUs, where each node has eight 2 x 100 Gbps NVIDIA ConnectX SmartNICs connected in cluster network blocks, resulting in 1,600 Gbps of bandwidth between nodes.
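The 1,600 Gbps figure can be checked with simple arithmetic. The sketch below assumes the NIC description means eight adapters per node, each with two 100 Gbps ports; the constant and function names are illustrative only.

```python
# Assumed reading of the text: 8 NICs per node, each with 2 ports at 100 Gbps.
NICS_PER_NODE = 8
PORTS_PER_NIC = 2
GBPS_PER_PORT = 100


def node_bandwidth_gbps(nics: int, ports_per_nic: int, gbps_per_port: int) -> int:
    """Aggregate per-node network bandwidth in Gbps."""
    return nics * ports_per_nic * gbps_per_port


total = node_bandwidth_gbps(NICS_PER_NODE, PORTS_PER_NIC, GBPS_PER_PORT)
print(f"{total:,} Gbps per node")  # 1,600 Gbps per node
```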

OCI also provides options, such as flexible virtual machines (VMs) and NVIDIA A10 Tensor Core GPUs, for general-purpose workloads that don’t require GPU clusters, such as scientific workstations, virtual desktops, and computer-aided design (CAD). You can also turn off capacity when you don’t need it, paying only for what you use.

This video dives into our cluster network design:

Data scientists can use machine learning services to collaboratively build end-to-end AI workflows all the way from data preparation and cleansing to model training, deployment, inferencing, and management.

With AI services, you can use and customize pretrained models with your own data to improve accuracy. AI services include OCI Language for text analysis, OCI Speech for speech-to-text conversion, OCI Vision for image analysis, and more.

OCI delivers performance like on-premises dedicated custom compute clusters, while providing the elasticity and consumption-based costs of the cloud, including the following examples:

  • Cluster networking provides high throughput and microsecond latency.

  • Bare metal servers offer high-frequency processors, GPUs, and fast and dense local storage.

  • OCI Block Volumes provide industry-leading 300K IOPS per volume.

According to MosaicML’s cloud performance comparison, OCI excels at both training time and cost compared to Amazon Web Services (AWS) and Google Cloud Platform (GCP). For example, the comparison shows both a 7.6-times speedup over their baseline at 90% less cost and OCI (in purple) consistently delivering higher accuracy in image classification (ImageNet) at a lower cost than other clouds.

You can deploy OCI AI Infrastructure in any of our public cloud regions or as a dedicated region. In a multicloud scenario, applications deployed on third-party clouds, on-premises infrastructure, and colocation facilities can use the REST APIs of Cloud AI services. You can import data from other platforms into OCI Object Storage for analysis using Oracle Data Warehouse, Object Storage Data Lake, OCI Data Science, and Oracle Analytics Cloud.
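As a rough illustration of the multicloud point, an application running anywhere can reach a REST-based AI service with an ordinary HTTPS request. The sketch below uses only the Python standard library and builds a request without sending it. The endpoint URL, the payload shape, and the function name are hypothetical placeholders, not actual OCI values. Authentication is omitted entirely, and real calls would need the endpoint, payload schema, and authentication described in Oracle's API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder endpoint, not a real OCI URL.
PLACEHOLDER_ENDPOINT = "https://ai-service.example.com/v1/analyze-text"


def build_text_analysis_request(
    text: str, endpoint: str = PLACEHOLDER_ENDPOINT
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text-analysis call."""
    # Assumed payload shape for illustration only.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_text_analysis_request(
    "The delivery was fast and the support team was helpful."
)
print(request.get_method(), request.full_url)
```

Because the caller only needs outbound HTTPS, the same code could run on another cloud, on-premises, or in a colocation facility. With a real endpoint and credentials in place, the request could be sent with `urllib.request.urlopen(request)`.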

Learn more, on your own or with Oracle

In the coming months, you’ll see more from us on how we’re helping customers do more with AI.

Today, you can learn how OCI combines specialized network hardware with optimized network flow to limit latency and prevent packet loss in this long-form technical video, or see how SoundHound uses Oracle Cloud Infrastructure for AI.

You can also read more about our GPU solutions for AI innovators or contact our Sales team about large compute and GPU clusters.

Leo Leung

Vice President, Product Marketing, OCI

I'm an experienced product manager and product marketer at both large and startup vendors. I've been a cloud application, platform, and infrastructure end user and website developer, enterprise storage system product manager and marketer, cloud storage and cloud application product manager and operations manager, and storage software product marketer. I've managed business partnerships with many infrastructure ISVs, as well as large systems vendors like Cisco, Dell, EMC, HPE, and IBM.

