Performance-sensitive workloads, such as weather prediction, have specific computing requirements. It’s hard to pick the right configuration and even harder to fine-tune it for the best price performance. As a result, companies routinely overspend on computing power. Typical mistakes include picking the wrong infrastructure size, being unaware of idle times, overestimating workload scaling, missing a performance plateau, picking the wrong architecture, and ignoring performance bottlenecks.
With so many factors to consider and so many ways to architect high-performance computing (HPC) environments, customers need help arriving at the right configuration. HPC HUB is building a machine learning-based recommendation system that suggests an optimal configuration for each unique workload.
HPC HUB’s latest product, the RocketCompute platform, is a service that allows DevOps teams to dramatically cut costs on public cloud HPC deployments by applying machine learning analysis to hardware performance metrics. It monitors workloads, detects bottlenecks and anomalies, finds similarities between workloads, and suggests configuration optimizations based on historical data collected from previous runs.
Oracle Cloud Infrastructure provides remote direct memory access (RDMA)-enabled cluster networking and bare metal HPC instances. Oracle Cloud Infrastructure now combines its proven HPC instance with a low-latency network that can span thousands of cores. The Compute shape has 36 cores from two 3.0 GHz Intel Xeon Gold 6154 processors, 384 GB of RAM, and 6.4 TB of local NVMe storage.
One example of using historical data to enable optimization is Windy.app, an HPC HUB customer in weather simulation.
Windy.app is a market leader in global weather forecasting for outdoor activities and has 10 million active users worldwide. It features a hyperlocal weather forecast adapted for outdoor activities, which helps users find the right place, time, and conditions for any outdoor activity.
To improve accuracy in some popular regions, Windy.app runs its own simulations using an open source weather simulation platform, Weather Research and Forecasting (WRF), a numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs.
Windy.app aims to find the most cost-efficient configuration that still completes a daily forecast within a set time limit.
Figure 1: Process for optimizing price and performance.
To optimize the application for price performance, Windy.app’s typical workload was wrapped into a test scenario and run through the optimization algorithm. After the initial calibration run, the workload was classified based on its performance metrics. After generating more data from further runs, RocketCompute optimized the cost of running the weather simulation while keeping the run time below a set limit.
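The goal described above, the lowest cost subject to a run-time limit, can be illustrated with a minimal sketch. This is not RocketCompute's algorithm, which the article does not detail; it is a plain filter-then-minimize over a handful of measured runs. The `Run` class, the `cheapest_within_limit` function, and every number below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Run:
    """One measured launch of the workload (all values are hypothetical)."""
    nodes: int
    runtime_s: float            # measured wall-clock time, in seconds
    price_per_node_hour: float  # illustrative price, not a real rate

    @property
    def cost(self) -> float:
        # Cost of the run: nodes x hours x hourly price per node.
        return self.nodes * self.price_per_node_hour * self.runtime_s / 3600.0


def cheapest_within_limit(runs: Iterable[Run], limit_s: float) -> Optional[Run]:
    """Return the lowest-cost run that finishes within limit_s, or None."""
    feasible = [r for r in runs if r.runtime_s <= limit_s]
    return min(feasible, key=lambda r: r.cost, default=None)


# Made-up measurements in which scaling flattens out as nodes are added.
runs = [
    Run(nodes=1, runtime_s=7200.0, price_per_node_hour=2.0),
    Run(nodes=2, runtime_s=3800.0, price_per_node_hour=2.0),
    Run(nodes=4, runtime_s=2400.0, price_per_node_hour=2.0),
    Run(nodes=8, runtime_s=2000.0, price_per_node_hour=2.0),
]

best = cheapest_within_limit(runs, limit_s=3600.0)
if best is not None:
    print(f"{best.nodes} nodes, {best.runtime_s:.0f} s, cost {best.cost:.2f}")
```

In this made-up data set, the one-node and two-node runs are cheaper but miss the one-hour limit, and the eight-node run meets the limit but costs more for little extra speed, which is the kind of performance plateau mentioned earlier.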
Oracle Cloud Infrastructure now offers per-second billing, which gives customers accurate cost estimates for their workloads. After a successful search for an optimal configuration, HPC HUB’s team also set up a regular deployment process that integrates with Windy.app’s workflow.
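To see why billing granularity matters for cost estimates, the following sketch compares the cost of a short run when billed per second with the cost when each started hour is billed in full. The function names, the node count, the run time, and the price are all hypothetical, and the sketch ignores any minimum-charge rules that a real provider may apply.

```python
import math


def cost_per_second_billing(nodes: int, runtime_s: float,
                            price_per_node_hour: float) -> float:
    """Cost when usage is rounded up to the next whole second."""
    billed_seconds = math.ceil(runtime_s)
    return nodes * billed_seconds * price_per_node_hour / 3600.0


def cost_hourly_billing(nodes: int, runtime_s: float,
                        price_per_node_hour: float) -> float:
    """Cost when usage is rounded up to the next whole hour."""
    billed_hours = math.ceil(runtime_s / 3600.0)
    return nodes * billed_hours * price_per_node_hour


# Hypothetical job: 4 nodes for 40 minutes at an illustrative price.
nodes, runtime_s, price = 4, 2400.0, 2.0
per_second = cost_per_second_billing(nodes, runtime_s, price)
hourly = cost_hourly_billing(nodes, runtime_s, price)
print(f"per-second: {per_second:.2f}, hourly: {hourly:.2f}")
```

With per-second billing, the estimated cost tracks the measured run time directly, so a faster configuration shows up as a proportionally lower bill instead of being hidden by rounding.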
Figure 2: Architectural diagram showing Windy.app integration with RocketCompute and Oracle Cloud Infrastructure.
HPC HUB found excellent scalability and price performance for WRF simulations on Oracle Cloud Infrastructure. It took six iterations with 43 total launches to arrive at the optimal configuration, which was two times cheaper and 1.5 times faster than on other clouds, resulting in significant savings for Windy.app.
Independent software vendors that want to host their applications in Oracle Cloud Infrastructure can find more information in Design the Infrastructure for Hosting SaaS Applications, along with best practices.