
On-Premises HPC Performance with Oracle Cloud Infrastructure

Taylor Newill
Principal Product Manager

In conjunction with Supercomputing 2018, we are announcing the availability of one of the fastest high performance cloud computing offerings. Oracle Cloud Infrastructure now offers the BM.HPC2.36 shape, which provides the same HPC performance you see on-premises. This new shape strengthens the end-to-end HPC experience on Oracle Cloud Infrastructure; read more about how Oracle is addressing HPC here.

Migrating HPC workloads to the cloud involves surmounting several challenges, not the least of which is ensuring that you have the same levels of performance, security, and control as your on-premises infrastructure. With this new bare metal compute instance, that is now possible.

Built on the Intel Xeon Gold 6154 processor, this new bare metal compute instance offers an all-core turbo clock speed of 3.7 GHz, and because it is bare metal, there is no virtualization performance penalty.

In addition to a 6.7 TB local NVMe drive and 384 GB of dual-rank memory, Oracle Cloud Infrastructure's new HPC shape provides the world's first public cloud bare metal RDMA network, enabled by a Mellanox 100 Gbps network card, alongside a 25 Gbps network card for standard traffic. Without virtualization, there is no hypervisor jitter and there are no bulky, unnecessary cloud monitoring agents. Run any MPI or other HPC workload in the cloud with performance similar to your on-premises infrastructure.

We're going to share a lot of data in this blog, and we encourage you to take a free HPC test drive to validate it for yourself. You can deploy a 1,000-core cluster for a few hours at no cost in either our Ashburn, VA datacenter or our London datacenter.

Raw Performance

First, let's look at raw performance. The BM.HPC2.36 shape has two 18-core Intel Xeon Gold 6154 processors. Intel integrates world-class compute with powerful fabric, memory, storage, and acceleration, so you can move your research and innovation forward faster to solve some of the world’s most complex challenges. Working with leading HPC hardware providers like Intel and Mellanox ensures that Oracle Cloud Infrastructure customers get on-premises levels of performance with cloud flexibility.

HPC applications perform the same on Oracle Cloud Infrastructure as they do on-premises, for both large and small models.

A common benchmark for compute-intensive workloads comes from the Standard Performance Evaluation Corporation (SPEC). SPEC has designed test suites that provide a comparative measure of compute-intensive performance across the widest practical range of hardware, using workloads developed from real user applications. Publicly available results for some on-premises clusters, compared to BM.HPC2.36, are shown below. Cloud vendors are typically hesitant to share their numbers because virtualized environments do not perform nearly as well as on-premises environments; we are happy to share ours.
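SPEC suites generally score a system by dividing a reference machine's run time by the measured run time for each benchmark and then combining those ratios with a geometric mean. The sketch below shows that style of calculation so the comparison is easier to read. The post does not say which SPEC suite was used, and the run times here are made-up illustrative values, not published results.

```python
import math


def spec_style_score(reference_times, measured_times):
    """Combine per-benchmark results SPEC-style: geometric mean of speed ratios.

    Each ratio is reference_time / measured_time, so a faster system gets a
    larger ratio. The geometric mean keeps a single outlier benchmark from
    dominating the overall score.
    """
    if not reference_times or len(reference_times) != len(measured_times):
        raise ValueError("need matching, non-empty lists of run times")
    ratios = [ref / run for ref, run in zip(reference_times, measured_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical run times in seconds; these are NOT published SPEC results.
reference_machine = [1000.0, 2000.0, 4000.0]
system_under_test = [100.0, 400.0, 500.0]
print(f"score: {spec_style_score(reference_machine, system_under_test):.2f}")
```

Because the ratios are multiplied rather than added, a 2x win on one benchmark and a 2x loss on another cancel out to a neutral score.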


Scaling

Oracle Cloud Infrastructure scales HPC workloads efficiently. Some cloud vendors have typically expected you to pay for poor single-node performance and to overlook their lack of scaling. We invite you to bring your workload to Oracle Cloud Infrastructure and let us show you that you can run your MPI, compiler, and application workloads on bare metal.

HPC applications do not handle virtualization well; on-premises HPC vendors have shown the significant negative impact that virtualization has on HPC workloads. The performance hit you take when running on some cloud vendors grows as you scale out an HPC cluster. Cloud monitoring agents run frequently and are not synchronized across a cloud cluster, so different nodes are interrupted at different times. With bare metal, you have complete control over the servers in your cluster, and this makes a huge performance difference. Running RDMA in a virtualized environment also undercuts the value of RDMA. To get the best performance from RDMA, it must run on bare metal, as illustrated in the following graph:
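Why do unsynchronized interruptions hurt more as a cluster grows? Many MPI codes advance in bulk-synchronous steps, and a step ends only when the slowest node finishes. The toy model below assumes each node is independently delayed with some probability per step. The probability, delay, and step time are hypothetical parameters chosen for illustration, not measurements of any cloud.

```python
def expected_step_time(base_s, noise_delay_s, p_noise, nodes):
    """Expected time of one bulk-synchronous step in a simple noise model.

    Each node is independently delayed by noise_delay_s with probability
    p_noise. The step ends only when the slowest node finishes, so a single
    delayed node delays the whole step.
    """
    if not 0.0 <= p_noise <= 1.0:
        raise ValueError("p_noise must be a probability")
    # Probability that at least one of the nodes is hit during this step.
    p_any_delayed = 1.0 - (1.0 - p_noise) ** nodes
    return base_s + noise_delay_s * p_any_delayed


# Hypothetical: 10 ms of useful work per step, a 5 ms interruption,
# and a 1% chance per node per step that an agent interrupts it.
for n in (1, 16, 64, 256):
    t = expected_step_time(base_s=0.010, noise_delay_s=0.005, p_noise=0.01, nodes=n)
    print(f"{n:4d} nodes: {t * 1000:.2f} ms per step")
```

An interruption that is rare on one node becomes almost certain somewhere in a large cluster, which is why removing the source of noise matters more than making each event slightly shorter.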

When running simulation applications across an HPC cluster, the ability to scale efficiently at high node counts is important. It improves the predictability of the simulation and increases the return on investment for expensive application licenses. In a CFD simulation, BM.HPC2.36 consistently scales at over 100% parallel efficiency as the model is spread from 450,000 cells per core down to fewer than 6,000 cells per core, the same performance that you see with on-premises clusters.
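Parallel efficiency in a strong-scaling study like this one compares the measured speedup with the ideal speedup for the added cores. The sketch below shows the calculation. The 16.2-million-cell model and both run times are hypothetical values chosen so that 36 cores (one node) gives 450,000 cells per core and 2,700 cores gives 6,000 cells per core.

```python
def strong_scaling_efficiency(base_cores, base_time_s, cores, time_s):
    """Parallel efficiency of a run relative to a baseline run of the same model.

    1.0 means perfectly linear scaling. Values above 1.0 are superlinear; one
    common cause is that each core's share of the mesh starts to fit in cache.
    """
    speedup = base_time_s / time_s
    ideal_speedup = cores / base_cores
    return speedup / ideal_speedup


def cells_per_core(total_cells, cores):
    """Average mesh load per core for a given decomposition."""
    return total_cells / cores


TOTAL_CELLS = 16_200_000  # hypothetical CFD model size
print(cells_per_core(TOTAL_CELLS, 36), cells_per_core(TOTAL_CELLS, 2700))
# Hypothetical wall times: 7,500 s on 36 cores and 95 s on 2,700 cores.
eff = strong_scaling_efficiency(36, 7500.0, 2700, 95.0)
print(f"efficiency at 2,700 cores: {eff:.0%}")
```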

Price

With true HPC performance, all of the cost and flexibility benefits of the cloud can now be applied to HPC workloads. Our customers are seeing a significant advantage in terms of simulation time, cost per job, and capacity. Additionally, in the cloud the “one user, one cluster” model means no queue times.

Many HPC customers are able to attach a cost to each job. It is very easy to optimize per-job cost in the cloud. In fact, if the job uses RDMA and therefore scales efficiently, the cost of the job stays the same regardless of how quickly it completes: doubling the node count halves the run time, so the number of node-hours you pay for does not change. When a customer can specify the number of jobs they burst per month or per year, the value of high performance cloud computing becomes clear. As the table below shows, even with conservative numbers for an on-premises HPC cluster, running in the cloud can help customers save money in both the short and long term.
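The claim that an efficiently scaling job costs the same no matter how fast it finishes follows from simple arithmetic. The sketch below uses the 36 cores per node and the 7.5 cents per core-hour quoted in this post; the 100 single-node hours and the efficiency values are hypothetical inputs.

```python
CORES_PER_NODE = 36          # BM.HPC2.36: two 18-core Xeon Gold 6154 processors
PRICE_PER_CORE_HOUR = 0.075  # USD; the 7.5 cents per core-hour quoted in this post


def job_cost_usd(nodes, single_node_hours, efficiency=1.0):
    """Estimated cost of one job spread across `nodes` nodes.

    single_node_hours: how long the job would take on one node (hypothetical).
    efficiency: parallel efficiency at this node count (1.0 = linear scaling).
    """
    if nodes < 1 or efficiency <= 0.0:
        raise ValueError("nodes must be >= 1 and efficiency must be positive")
    wall_hours = single_node_hours / (nodes * efficiency)
    node_hours = nodes * wall_hours
    return node_hours * CORES_PER_NODE * PRICE_PER_CORE_HOUR


for n in (1, 4, 16):
    wall = 100.0 / n  # wall time in hours at perfect (1.0) efficiency
    print(f"{n:2d} nodes: {wall:6.2f} h wall time, ${job_cost_usd(n, 100.0):.2f}")
```

With linear scaling, every row costs the same while the wall time shrinks; lower efficiency at high node counts is what would make a faster run more expensive.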

Oracle Cloud Infrastructure enables ad hoc, on-demand HPC clusters. This means that each user can spin up a cluster as needed. There is no need to support hundreds of users and a massive file server for your HPC cluster. You can size your HPC cluster specifically for the workload and stop paying for it when you are done with your job.
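Sizing a per-job cluster comes down to rounding up to whole nodes. The sketch below assumes 36 cores per BM.HPC2.36 node and a target mesh load per core; the `target_cells_per_core` parameter and the 16.2-million-cell model are hypothetical, with 6,000 cells per core taken from the low end of the scaling range above.

```python
import math

CORES_PER_NODE = 36  # BM.HPC2.36: two 18-core processors


def nodes_for_cores(cores, cores_per_node=CORES_PER_NODE):
    """Smallest number of whole nodes that provides at least `cores` cores."""
    return math.ceil(cores / cores_per_node)


def nodes_for_model(total_cells, target_cells_per_core, cores_per_node=CORES_PER_NODE):
    """Nodes needed to spread a mesh at or below a target cells-per-core load."""
    cores_needed = math.ceil(total_cells / target_cells_per_core)
    return nodes_for_cores(cores_needed, cores_per_node)


print(nodes_for_cores(1000))               # → 28 (a ~1,000-core cluster)
print(nodes_for_model(16_200_000, 6_000))  # → 75 (hypothetical 16.2M-cell model)
```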

In addition to performance, scalability, and price-performance, Oracle Cloud Infrastructure provides an end-to-end HPC experience with GPUs, bare metal instances with Intel and AMD processors, high performance block storage, and a fully POSIX-compliant File Storage Service.

Conclusion

You can now run any HPC workload on Oracle Cloud with the same predictable performance as your on-premises HPC infrastructure. With fast Intel processors and RDMA networking, jobs scale efficiently. At 7.5 cents per core-hour, Oracle Cloud Infrastructure's HPC offering provides some of the most FLOPs per penny in the cloud. Navigate to https://cloud.oracle.com/iaas/hpc to test drive an HPC cluster for yourself or sign up for our free HPC benchmarking service, and see for yourself how the benchmarks show that Oracle Cloud is faster. For more information, sign up for our upcoming webinar on the advantages of HPC.
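“FLOPs per penny” can be estimated as a theoretical ceiling from clock speed, floating-point operations per cycle, and price. The sketch below is a back-of-the-envelope calculation only. It assumes 32 double-precision FLOPs per cycle per core (two AVX-512 FMA units, 8 doubles each, 2 operations per FMA) and uses the 3.7 GHz all-core turbo quoted above. Vector-heavy AVX-512 code typically runs at a lower clock, and real applications reach only a fraction of peak.

```python
def peak_flops_per_cent(clock_ghz, flops_per_cycle, cents_per_core_hour):
    """Theoretical peak floating-point operations bought per US cent.

    Upper bound: assumes every cycle of the paid core-hour retires
    `flops_per_cycle` operations, which real applications do not achieve.
    """
    flops_per_core_second = clock_ghz * 1e9 * flops_per_cycle
    flops_per_core_hour = flops_per_core_second * 3600
    return flops_per_core_hour / cents_per_core_hour


# Assumed inputs: 3.7 GHz (all-core turbo from this post, an optimistic clock
# for AVX-512 code), 32 FLOPs/cycle/core, and 7.5 cents per core-hour.
print(f"{peak_flops_per_cent(3.7, 32, 7.5):.2e} FLOPs per cent (theoretical peak)")
```

Plugging in a measured sustained rate, such as a benchmark's achieved FLOP/s per core, instead of the peak gives a more realistic figure to compare across offerings.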

Comments (1)
  • Mark Kelly Thursday, November 15, 2018
    This is a tipping point and offers many benefits. Back in the day when I was engineering, we only imagined these possibilities.

    I suggest you emphasize the costing model per se more, i.e. HW is depreciated over 3 (or 5) years and needs to be replaced!