Oracle Cloud Infrastructure (OCI) aims to solve complex problems for customers, and it leads cloud high-performance computing (HPC) on both performance and price. Over the last few months, we have set new cloud standards for internode latency, cloud HPC benchmarks, and application performance. OCI's bare metal infrastructure lets you get on-premises performance in the cloud. To improve run times further, OCI introduced cluster networking in 2018. A cluster network is a pool of HPC instances connected by a high-bandwidth, ultra-low-latency network. For background, see Large Clusters, Lowest Latency: Cluster Networking on Oracle Cloud Infrastructure.
Resizing a Cluster Network
Our customers asked for a way to scale HPC instances in and out as they further optimize workloads and reuse HPC machines. With the launch of "Cluster Network Resize", customers can now manage their costs efficiently and heal existing clusters without having to tear them down and recreate them. Customers no longer have to keep oversized clusters running "just in case". They can shrink and expand a cluster dynamically while keeping all instances, existing and newly added, on an extremely low-latency, high-bandwidth RDMA (remote direct memory access) network.
Customers can now use the full functionality of instance pools and change the number of instances in a cluster network by resizing the underlying instance pool. When you increase the size, instances are provisioned within the cluster's RDMA network until the pool reaches the required number of instances. When you decrease the size, instances are terminated (deleted) in the order in which they were created, as described in the step-by-step guide to resizing a cluster network. Resizing lets you maintain the capacity your workloads need to run while keeping costs in line with actual usage.
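The resize behavior described above can be illustrated with a small Python sketch. It is a toy in-memory model only, not OCI code: the class name `ClusterNetworkModel`, the `resize` method, and the `hpc-node-N` instance names are hypothetical, and "terminated in the order that they were created" is read here as oldest first.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ClusterNetworkModel:
    """Toy model of a cluster network's instance pool (hypothetical, not an OCI API)."""

    # Instance names, kept in creation order (oldest first).
    instances: list = field(default_factory=list)
    # Counter used to give each new instance a unique, increasing number.
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def resize(self, target_size: int) -> list:
        """Resize the pool to target_size and return the names terminated."""
        if target_size < 0:
            raise ValueError("target_size must be non-negative")

        # Scale out: provision instances until the pool reaches the target.
        while len(self.instances) < target_size:
            self.instances.append(f"hpc-node-{next(self._ids)}")

        # Scale in: terminate in creation order, so the oldest go first.
        excess = len(self.instances) - target_size
        terminated = self.instances[:excess]
        self.instances = self.instances[excess:]
        return terminated


pool = ClusterNetworkModel()
pool.resize(4)              # scale out to four instances
removed = pool.resize(2)    # scale in by two
print(removed)              # ['hpc-node-1', 'hpc-node-2']
print(pool.instances)       # ['hpc-node-3', 'hpc-node-4']
```

In a real deployment the same intent would be expressed by changing the size of the instance pool through the Console, CLI, or SDK; the sketch only shows which instances are expected to remain after a resize.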
Customers can now troubleshoot an individual host without impacting the whole cluster by detaching that specific instance from the pool. To remove a specific instance from the cluster network, follow the step-by-step guide to detaching an instance from the cluster network. The detached instance is retained until the customer decides to terminate it.
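As a rough illustration of the detach workflow, the following Python sketch models a pool in which a single instance can be detached and kept for troubleshooting until it is explicitly terminated. It is a toy model under stated assumptions: `PoolWithDetach`, `detach`, `terminate`, and the instance names are hypothetical and do not correspond to any OCI SDK or CLI call.

```python
class PoolWithDetach:
    """Toy model of detaching instances from a pool (hypothetical, not an OCI API)."""

    def __init__(self, names):
        self.attached = list(names)  # instances still in the cluster network
        self.detached = []           # instances removed from the pool but retained

    def detach(self, name):
        """Remove one instance from the pool without terminating it."""
        if name not in self.attached:
            raise ValueError(f"{name} is not attached to the pool")
        self.attached.remove(name)
        self.detached.append(name)

    def terminate(self, name):
        """Terminate a previously detached instance once it is no longer needed."""
        if name not in self.detached:
            raise ValueError(f"{name} is not a detached instance")
        self.detached.remove(name)


pool = PoolWithDetach(["hpc-node-1", "hpc-node-2", "hpc-node-3"])
pool.detach("hpc-node-2")        # isolate the suspect host
print(pool.attached)             # ['hpc-node-1', 'hpc-node-3']
print(pool.detached)             # ['hpc-node-2']
pool.terminate("hpc-node-2")     # done troubleshooting
print(pool.detached)             # []
```

The rest of the pool is untouched throughout, which is the point of detaching rather than resizing: the remaining instances keep running while the detached host is inspected.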
Together, dynamic resizing of a cluster network and the ability to detach an instance from the pool help customers run workloads efficiently and troubleshoot problems faster and more precisely, while keeping their instances on a high-bandwidth, ultra-low-latency network.
Availability
Today, cluster network resize is available in the regions that have our HPC instances. For the list, see Supported Regions and Availability Domains. Cluster networking will continue to expand to more of our regions as cluster networking-enabled instances roll out.
