If any business can commission any number of cloud-based servers to access infinite compute capacity at relatively low cost, then why haven’t more companies moved their on-premises high-performance computing (HPC) workloads to the cloud?
“The challenges are more around inertia and the thought process rather than anything technical,” said Karan Batta, vice president, Oracle Cloud Infrastructure (OCI), in a panel discussion during the SC20 virtual event on November 17. “If you talk to people who run an on-premises HPC cluster today, they live and breathe that thing. They like to touch and feel the environment. You have a lot of engineers who actually like to optimize their solution right down to the kernel level.”
For these engineers, making their cloud experience identical to their on-premises experience is essential. But can such teams still get a high-performance cloud experience even if they want to run their HPC software on a 10-year-old operating system?
“Yes, we can emulate it for them,” Batta said. “We standardize everything on bare metal, which allows us to run whatever we want on top, while getting 15–20% better performance compared to a virtual machine.”
Batta offered the example of a flat network with remote direct memory access (RDMA) that limits the number of network hops between compute and storage to a maximum of two. The design avoids network and CPU oversubscription and includes locally attached NVMe (Nonvolatile Memory Express) storage. On such networks, Oracle Cloud Infrastructure HPC platform solutions offer the lowest latency, most predictable performance, and fastest HPC cloud services on the market today, while providing all the standard cloud benefits: scalability, flexibility, and pay-per-use.
Flexible, pay-for-use capacity is how GridMarkets, a participant in the Oracle for Startups program, keeps the cost of running its high-performance rendering platform low.
“We don’t own or maintain any hardware, and we’re paying 70% less to deploy an instance on Oracle Cloud Infrastructure than we did running workloads on Amazon Web Services or Google Cloud,” said CEO and Cofounder Mark Ross during a recent episode of Oracle’s HPC in Healthcare, hosted by Batta.
GridMarkets runs its rendering platform on high-performance servers located in OCI data centers worldwide. Whether GridMarkets’ platform is simulating a drug molecule of 20 atoms to learn how its electrons behave or assessing multiple molecules consisting of 2 million atoms, these tasks can take weeks using a cluster of conventional on-premises high-performance computers. Throw in a simulation of a drug molecule’s reaction to different proteins, and it could take several months.
“With Oracle Cloud Infrastructure, we can respond immediately with however many machines and specific configurations our customers need to run their jobs,” Ross told Batta. “What this means for our customers is that there’s no waiting, no queuing.”
When users submit jobs to GridMarkets, they can pick different options. Low-cost options limit the number of machines they can choose from, while users with more complex or urgent needs can select from an unlimited number of virtual machines, CPUs, and GPUs.
Avoiding the queue can be a matter of life or death for users of the digital electrocardiogram (ECG) platform HEARTio. Part of the Oracle for Startups program, HEARTio runs its diagnostic application on OCI, helping people with chest pain instantly detect whether it’s related to specific heart problems.
By pairing artificial intelligence with the patient’s first test, HEARTio can quickly determine the origin of the chest pain, said Cofounder and CEO Utkars Jain. “Getting these kinds of results on-premises could take hours, days, even weeks. But when doctors are making life or death decisions and have only a couple of moments to think about what they’re doing, we’re able to give them the information they need to make sure they’re making the right decisions for their patients.”
For example, the company’s ECG software can detect coronary artery disease with just a 10-second scan. Rather than take months to compute millions of data points for a single training run on the company’s local computer, HEARTio uses eight Nvidia V100 GPUs running on OCI to complete eight different training runs in a week. “I don’t think we could have done this without OCI and Nvidia,” Jain said.
By running their HPC workloads on Oracle Cloud Infrastructure, companies such as GridMarkets and HEARTio get the specialization and control of on-premises systems with the flexibility, performance, and cost advantages of the cloud, Oracle’s Batta said. “That’s really a differentiated aspect of the Oracle Cloud, and I think we’re proving that.”