SAS Grid provides a shared, centrally managed analytic computing environment that offers workload management, high availability, and accelerated processing. A SAS Grid environment in Oracle Cloud Infrastructure can expand its computing infrastructure incrementally as the number of users and the amount of data increase.
Oracle and SAS have collaborated to provide a high-performance SAS Grid Manager solution on Oracle Cloud Infrastructure. You can now build a SAS Grid Manager cluster on Oracle Cloud Infrastructure to enable many SAS solutions to automatically use a centrally managed grid computing infrastructure. This solution can be deployed by using automated Terraform deployment templates, scales easily, and provides performance that’s consistent with on-premises SAS Grid Manager.
SAS Grid applications need a robust I/O subsystem. This post discusses how to set up SAS Grid in Oracle Cloud Infrastructure to meet the I/O throughput demands of SAS Grid.
Three different performance tests were run to validate SAS Grid on Oracle Cloud Infrastructure.
This test targeted the shared file system that holds the permanent files shared by all the SAS Grid compute nodes. It used the iotest.sh script provided by SAS.
Results: Oracle Cloud Infrastructure met the I/O throughput requirement of 100 MB per physical core per second for SAS DATA (read: 115 MB/s, write: 98 MB/s).
This test used the iotest.sh script provided by SAS.
Results: Oracle Cloud Infrastructure met the I/O throughput requirement of at least 150 MB per physical core per second for the SAS WORK and SAS UTILLOC temporary file systems (read: 320 MB/s, write: 194 MB/s).
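To get a feel for what these per-core targets mean for a whole node, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the 16-core example are assumptions made here, not part of the SAS or Oracle tooling, and the only figures taken from this post are the 100 MB/s and 150 MB/s per-core targets.

```python
# Per-core I/O throughput targets quoted in this post (MB/s per physical core).
SAS_DATA_MBPS_PER_CORE = 100   # shared, permanent SAS DATA file system
SAS_WORK_MBPS_PER_CORE = 150   # temporary SAS WORK / SAS UTILLOC file systems


def required_throughput_mbps(physical_cores: int) -> dict:
    """Return the aggregate throughput targets (MB/s) for a node.

    Hypothetical helper: it simply multiplies the per-core targets
    by the number of physical cores on the node.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be a positive integer")
    return {
        "sas_data": physical_cores * SAS_DATA_MBPS_PER_CORE,
        "sas_work_utilloc": physical_cores * SAS_WORK_MBPS_PER_CORE,
    }


if __name__ == "__main__":
    # 16 physical cores is an assumed example, not a tested configuration.
    targets = required_throughput_mbps(16)
    for file_system, mbps in targets.items():
        print(f"{file_system}: {mbps} MB/s")
```

Under these assumptions, a node with 16 physical cores would need roughly 1,600 MB/s for SAS DATA and 2,400 MB/s for SAS WORK and SAS UTILLOC.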
Oracle Cloud Infrastructure offers DenseIO compute shapes, both bare metal and virtual machine, with local NVMe SSDs that can be used for SAS WORK and SAS UTILLOC.
The SAS mixed analytics workload is a scaled workload of computation and I/O-oriented tests used to measure concurrent, mixed job performance. The actual workload used was composed of 19 individual SAS tests: 10 computational, 2 memory, and 7 I/O-intensive tests. Each test was composed of multiple steps, some relying on existing data stores, and others (primarily computation tests) relying on generated data. The tests were chosen as a matrix of long-running and shorter-running tests (ranging in duration from approximately 5 minutes to 1 hour and 20 minutes). In some cases, the same test (running against replicated data streams) was run concurrently and/or back-to-back in a serial fashion to achieve an average of 20 simultaneous streams of heavy I/O, computation (fed by significant I/O in many cases), and memory stress. In all, to achieve the 20-concurrent test matrix, 77 tests were launched.
An aggregate of approximately 300 GB of data from I/O tests and over 120 GB of data from computation tests was submitted to a single instance of the SAS mixed analytics workload with 20 simultaneous tests on each node. Much more data is generated from test-step activity and threaded kernel procedures such as PROC SORT (for example, PROC SORT makes three copies of the incoming file to be sorted). As stated, some of the same tests were run concurrently using different data, and some were run back-to-back, to reach an average of 20 tests running concurrently. This raised the total concurrent I/O throughput of the workload significantly.
Results: Testing showed that Oracle Cloud Infrastructure gave the workload all the I/O throughput that it needed. This is not easily done in a public cloud environment.
“My team has tested SAS Grid on many public clouds,” said Margaret Crevar, Sr Manager, SAS Performance Lab, SAS. “We are happy to say that Oracle Cloud’s infrastructure provides the I/O throughput to the IBM Spectrum Scale shared file system that is needed for SAS Grid.”
Oracle Cloud Infrastructure offers low, predictable pricing so that you can realize immediate cost savings in the cloud. Oracle offers the lowest compute pricing from a pay-as-you-go (PAYG) perspective. For details about our Compute offering and pricing, see the Compute page.
For SAS Grid nodes, we recommend using VM.DenseIO2.x series Compute shapes. For SAS Metadata and SAS Mid-tier nodes, we recommend using VM.Standard2.x series (VM.Standard2.2 or higher) or VM.Standard.E2.x series (VM.Standard.E2.4 or higher) Compute shapes.
For storage, you can build a shared file system like IBM Spectrum Scale on Oracle Cloud for as low as $0.05 per GB per month by using Oracle Cloud Infrastructure Block Volumes and bare metal Compute nodes for the Spectrum Scale file servers. For more information, see the Block Volumes overview page, and the Oracle Block Volume performance metrics and local NVMe storage metrics blog posts.
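As a back-of-the-envelope illustration of that storage rate, the following Python sketch estimates a monthly bill from a capacity figure. The helper name and the 10,000 GB capacity are hypothetical, the rate is the "as low as" figure quoted above, and actual pricing may differ, so treat the result as a rough lower bound rather than a quote.

```python
from decimal import Decimal

# "As low as" storage rate quoted in this post (US$ per GB per month).
RATE_USD_PER_GB_MONTH = Decimal("0.05")


def monthly_storage_cost_usd(capacity_gb: int) -> Decimal:
    """Estimate the monthly storage cost for a given capacity.

    Hypothetical helper: multiplies capacity by the quoted rate only.
    Decimal avoids binary floating-point rounding in currency math.
    """
    if capacity_gb < 0:
        raise ValueError("capacity_gb must not be negative")
    return RATE_USD_PER_GB_MONTH * capacity_gb


if __name__ == "__main__":
    # 10,000 GB is an assumed example capacity.
    print(f"US${monthly_storage_cost_usd(10_000)} per month")
```

With these assumptions, 10,000 GB of shared file system capacity comes to about US$500 per month.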
We’ve created a Terraform deployment template that automatically provisions Oracle Cloud Infrastructure and deploys SAS Grid software. We’ll open source the template soon. In the meantime, you can contact me (firstname.lastname@example.org) to get access.
The following diagram shows the SAS Grid architecture on Oracle Cloud Infrastructure.
We’ll follow up with a more detailed reference architecture document or technical blog post that includes recommended infrastructure (compute, storage) and detailed results from joint performance testing by SAS and Oracle.
Every use case is different. The only way to know if Oracle Cloud Infrastructure is right for you is to try it. You can select either the Oracle Cloud Free Tier or a 30-day free trial that includes US$300 in credit to get you started with a range of services, including compute, storage, and network. Also, visit us at booth 609 at the SuperComputing 2019 (SC19) conference on November 17–22 in Denver, Colorado.