
High Performance X7 Compute Service Review and Analysis

Hi everyone, I'm Lee Gates, and I work on the Oracle Cloud Infrastructure team, researching performance and optimizing efficiency for applications. At Oracle OpenWorld, we announced a major enhancement to our Compute Service: the new X7 compute platform. In this blog, we'll provide a detailed report on the performance of our newest compute shapes, BM.Standard2.52 and BM.DenseIO2.52. My colleague Karan Batta announced the service in his post "High Performance X7 Platform Generally Available." We've been able to improve on our first generation across the board!

Let's put our enhanced Compute Service through its paces. First, we'll cover the specifications of these two new bare metal instances.

New Bare Metal Compute Instances

Shape | Instance | Cores | RAM (GB) | Networking | NVMe Storage (TB)
Standard | BM.Standard2.52 | 52 | 768 | 2x25 Gigabit Ethernet | N/A
Dense I/O | BM.DenseIO2.52 | 52 | 768 | 2x25 Gigabit Ethernet | 51.2

 

Oracle Cloud Infrastructure BM.DenseIO2.52 bare metal instances deliver nearly 5.5 million read IOPS from local NVMe storage devices, improving on OCI's best-in-class performance! The compute architecture is the Intel Xeon Platinum 8167M CPU (2.0 GHz nominal, up to 2.4 GHz with Turbo Boost), enabling high-performance, compute-intensive workloads. A Broadcom 2x25 GbE network adapter provides network access to your block volumes, to other instances in your network, and to the internet. The NVMe flash storage is delivered by Intel P4500 NVMe SSDs.

At Oracle OpenWorld, we introduced our new X7 compute platform by comparing it to AWS i3. Here we'll compare capacity and performance with AWS i3, then walk through our test plan and review the results. First, here's how I like to think about the top line of the direct comparison of capacities and counts.

Oracle Cloud Infrastructure Capacity Comparison - Updated February 16, 2018

Metric | OCI BM.DenseIO2.52 | AWS i3 (i3.16xlarge) | OCI advantage per compute instance
vCPUs | 104 | 64 | 63% more
Write IOPS | 1,408,412 | 1,400,000 | 1% more
Read IOPS | 5,497,776 | 3,300,000 | 67% more
Memory (GB) | 768 | 488 | 57% more
Local NVMe SSD storage (TB) | 51.2 | 15.2 | 237% more
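The percentage advantages in this comparison follow directly from the raw counts. The short Python sketch below recomputes them as (OCI / AWS - 1) x 100; the dictionary name and the one-decimal formatting are illustrative choices, not part of any published tooling.

```python
# Capacity figures from the OCI BM.DenseIO2.52 vs AWS i3 comparison.
# Each entry is (OCI value, AWS i3 value).
CAPACITY = {
    "vCPUs": (104, 64),
    "write IOPS": (1_408_412, 1_400_000),
    "read IOPS": (5_497_776, 3_300_000),
    "memory (GB)": (768, 488),
    "local NVMe SSD (TB)": (51.2, 15.2),
}


def percent_more(oci: float, aws: float) -> float:
    """Return how many percent larger the OCI value is than the AWS value."""
    return (oci / aws - 1.0) * 100.0


if __name__ == "__main__":
    for metric, (oci, aws) in CAPACITY.items():
        print(f"{metric}: {percent_more(oci, aws):.1f}% more")
```

Run as a script, it prints 62.5, 0.6, 66.6, 57.4, and 236.8 percent, which round to the 63%, 1%, 67%, 57%, and 237% quoted for the comparison.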

 

Bare Metal Test Plan

This table summarizes the tests I use to measure and benchmark each component, along with the results observed.

Component | Measurement | Observation
NVMe devices | 51.2 TB: latency and throughput | < 1 millisecond latency for all R/W mixes
Network | 25 GbE bandwidth: host to host | < 100 microseconds latency
Memory | Memory bandwidth | Up to 186 GB/second
Compute | SPEC CPU2017 | SPECrate2017_int estimate up to 197
Block storage | Single-volume and 32-volume performance | < 1 millisecond latency @ 25 GbE for all R/W mixes

 

Compute

We'll start with the CPU and then work through the components.

Standard Performance Evaluation Corporation (SPEC) CPU®2017 v1 is SPEC's latest update to its CPU benchmark series: an industry-standard, CPU-intensive benchmark suite that stresses a system's processor, memory hierarchy, and compiler. It consists of 10 integer benchmarks and 14 floating point benchmarks. Each suite can be run to report a speed metric or a throughput (rate) metric, using either the same base optimizations for every benchmark or per-benchmark peak optimizations.

The benchmarks are divided into four suites:

  • SPECspeed 2017 Integer – 10 integer benchmarks
  • SPECspeed 2017 Floating Point – 10 floating point benchmarks
  • SPECrate 2017 Integer – 10 integer benchmarks
  • SPECrate 2017 Floating Point – 13 floating point benchmarks

Each suite reports two metrics, base and peak, which reflect the amount of optimization allowed. The commonly used overall metrics for the suites are:

  • SPECspeed2017_int_base, SPECspeed2017_int_peak: integer speed
  • SPECspeed2017_fp_base, SPECspeed2017_fp_peak: floating point speed
  • SPECrate2017_int_base, SPECrate2017_int_peak: integer rate
  • SPECrate2017_fp_base, SPECrate2017_fp_peak: floating point rate
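For readers new to SPEC CPU, it may help to see how an overall metric is formed. Each benchmark's result is a ratio of SPEC's reference time to the measured time (for SPECrate, multiplied by the number of copies run), and a suite's overall metric is the geometric mean of those ratios. The sketch below illustrates only that arithmetic with made-up times; it is not SPEC's tooling, and the function names and numbers are hypothetical.

```python
import statistics
from typing import Sequence


def spec_ratio(reference_seconds: float, measured_seconds: float, copies: int = 1) -> float:
    """Per-benchmark ratio: copies * reference time / measured time.

    SPECspeed runs a single copy (copies=1); SPECrate multiplies by the
    number of concurrent copies that were run.
    """
    return copies * reference_seconds / measured_seconds


def spec_overall(ratios: Sequence[float]) -> float:
    """Overall suite metric: the geometric mean of the per-benchmark ratios."""
    return statistics.geometric_mean(ratios)


if __name__ == "__main__":
    # Hypothetical (reference time, measured time) pairs for a 104-copy rate run.
    runs = [(1600.0, 900.0), (1200.0, 700.0), (2000.0, 1100.0)]
    ratios = [spec_ratio(ref, t, copies=104) for ref, t in runs]
    print(f"overall: {spec_overall(ratios):.1f}")
```

Using a geometric mean keeps one unusually fast or slow benchmark from dominating the overall score.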

When I ran the suites with default settings on a BM.DenseIO2.52 bare metal instance, the estimated results were:

Oracle Cloud Infrastructure Measured Estimates, Bare Metal Machine Shapes
Shape: BM.DenseIO2.52 (stock), 2x Xeon Platinum 8167M (2.0/2.4* GHz, 2s/52c/104t, 768 GB DDR4/2400 MHz, 25GbE)
O/S: OL 7.3
Compiler: Intel 17.0.4.196

Metric | Base | Peak
SPECspeed2017_int | 5.28 | 5.58
SPECspeed2017_fp | 101 | 101
SPECrate2017_int | 184 | 197
SPECrate2017_fp | 188 | 191

"Stock" means bare metal instance BIOS settings as of August 2017.
* The first CPU speed is the nominal rating; the second is the peak Intel® Turbo Boost speed.
The test log files are attached.

 

Memory

Memory bandwidth and latency are important for data-intensive workloads. We ran the STREAM benchmark using the stream-scaling test harness automated by Cloud Harmony. STREAM measures sustainable memory bandwidth and the corresponding computation rate for four simple vector kernels. While it can be run serially, it is typically run in parallel (using OpenMP, pthreads, or MPI). The benchmark benefits from compiler optimization up to a point; for parallel runs, performance is ultimately constrained by thread or process synchronization (e.g. the efficiency of barrier() calls in the underlying system libraries). Additionally, some parallel library implementations use (and bind to) only physical cores, so take care when interpreting results if vCPUs (e.g. Intel Hyper-Threading) are enabled.
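To make the bandwidth figures concrete, the sketch below shows the four STREAM kernels and the byte accounting used to turn one timed pass into GB/s. It assumes 8-byte double-precision elements, as in the reference STREAM code, and 1 GB = 10^9 bytes, matching the results table. The pure-Python loops and function names are for illustration only: they show what each kernel computes, but timing them would measure interpreter speed, not memory bandwidth, so real numbers should come from the compiled STREAM benchmark.

```python
# Bytes STREAM counts per array element for each kernel, assuming
# 8-byte (double precision) elements. STREAM counts only the loads and
# stores the kernel requests; it does not count write-allocate traffic.
BYTES_PER_ELEMENT = {
    "copy": 16,   # c[i] = a[i]            -> 1 load + 1 store
    "scale": 16,  # b[i] = q * c[i]        -> 1 load + 1 store
    "add": 24,    # c[i] = a[i] + b[i]     -> 2 loads + 1 store
    "triad": 24,  # a[i] = b[i] + q * c[i] -> 2 loads + 1 store
}


def stream_copy(a):
    return [x for x in a]


def stream_scale(c, q):
    return [q * x for x in c]


def stream_add(a, b):
    return [x + y for x, y in zip(a, b)]


def stream_triad(b, c, q):
    return [x + q * y for x, y in zip(b, c)]


def bandwidth_gbs(kernel: str, n_elements: int, seconds: float) -> float:
    """Convert one timed kernel pass into GB/s (1 GB = 10**9 bytes)."""
    return BYTES_PER_ELEMENT[kernel] * n_elements / seconds / 1e9
```

For example, a hypothetical Triad pass over 10^8 elements that takes 0.0129 s works out to about 186 GB/s with bandwidth_gbs("triad", 10**8, 0.0129).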

Oracle Cloud Infrastructure Results, Bare Metal Machine Shapes
Shape: BM.DenseIO2.52 (stock), 2x Xeon Platinum 8167M (2.0/2.4* GHz, 2s/52c/104t, 768 GB DDR4/2400 MHz, 25GbE)
O/S: OL 7.3
Memory bandwidth in GB/s (1 GB = 10^9 bytes)

Compiler | Threads | Copy | Scale | Add | Triad
gcc 4.8.5 | 52¹ | 123.43 | 122.58 | 145.05 | 145.15
gcc 4.8.5 | 104 | 127.92 | 127.63 | 143.62 | 143.81
Intel 17.0.4 | 52¹ | 170.33 | 175.31 | 185.85 | 186.37
Intel 17.0.4 | 104 | 160.93 | 162.68 | 177.15 | 177.33

Average of 5 runs.
"Stock" means bare metal instance BIOS settings as of August 2017.
¹ System with Hyper-Threading enabled, but the benchmark run on and bound to physical cores only.

The test log files are attached.

Network

Oracle Cloud Infrastructure employs a state-of-the-art networking architecture to ensure consistent, predictable performance. I'm using the Cloud Harmony networking tests to measure latency, bandwidth, and DNS response time. We summarize the results here; the test log files are attached.

Latency Measurements within a Single Availability Domain (AD)

  • Ping count is 100.
  • Ping interval is 0.001 seconds.
  • Measurements are in microseconds.
  • DenseIO1.36 measurements were taken in Phoenix.
  • DenseIO2.52 measurements were taken in Ashburn.
  • Latency tests are not shape dependent.

Shape | Average Latency (µs) | Min | Max | Std Dev
BM.DenseIO1.36 | 60 | 56 | 156 | 10
BM.DenseIO2.52 | 42 | 38 | 95 | 5.5
VM.DenseIO2.24 | 42 | 41 | 75 | 3.4
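The latency table reports the average, minimum, maximum, and standard deviation of per-ping round-trip times. As a minimal sketch of how such a summary can be reproduced by hand, the code below assumes Linux iputils ping output (for example from `ping -c 100 -i 0.001 <peer>`, where an interval this short may require root), extracts the `time=... ms` fields, converts them to microseconds, and summarizes them. The function names are illustrative, and whether the Cloud Harmony harness uses the population or sample standard deviation isn't stated; population is assumed here.

```python
import re
import statistics

# Per-reply round-trip time as printed by Linux iputils ping, e.g.
# "64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.042 ms".
# The header and "rtt min/avg/max/mdev" summary lines contain no "time=" field.
RTT_FIELD = re.compile(r"time=([0-9.]+) ms")


def rtts_microseconds(ping_output: str) -> list:
    """Extract every per-reply RTT from ping output and convert ms to microseconds."""
    return [float(ms) * 1000.0 for ms in RTT_FIELD.findall(ping_output)]


def summarize(samples) -> dict:
    """Average, min, max, and population standard deviation of the samples."""
    return {
        "avg": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
        "std_dev": statistics.pstdev(samples),
    }
```

Passing the captured ping output to rtts_microseconds() and the result to summarize() yields one row in the same units as the table.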

Bandwidth Measurements within a Single AD

  • Download file size is 100 MB.
  • The test file is downloaded 100 times.
  • Measurements are in megabits per second.
  • DenseIO1.36 measurements were taken in Phoenix DC, zone 3.
  • DenseIO2.52 measurements were taken in Ashburn DC, zone 1.

Shape | Average Bandwidth (Mbps) | Min | Max | Std Dev
BM.Standard1.36 | 9,409 | 7,926 | 10,097 | 285
BM.DenseIO2.52 | 23,372 | 21,167 | 24,296 | 338
VM.Standard2.1 | 952 | 716 | 1,017 | 37.8
VM.Standard2.2 | 1,904 | 1,355 | 1,916 | 93.9
VM.Standard2.4 | 3,806 | 2,649 | 3,826 | 142
VM.Standard2.8 | 7,601 | 7,565 | 7,664 | 17
VM.Standard2.16 | 15,155 | 14,980 | 16,000 | 285
VM.Standard2.24 | 23,734 | 19,187 | 25,774 | 2,071
VM.DenseIO2.8 | 7,602 | 7,559 | 7,737 | 32
VM.DenseIO2.16 | 15,141 | 12,311 | 16,739 | 493
VM.DenseIO2.24 | 23,765 | 19,622 | 25,284 | 1,835
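Bandwidth in this test is the size of the downloaded file divided by the transfer time, expressed in megabits per second. A minimal sketch of that conversion follows, assuming 100 MB means 10^8 bytes and 1 megabit is 10^6 bits (the harness's exact unit convention isn't stated); the helper names are illustrative.

```python
def megabits_per_second(size_bytes: int, seconds: float) -> float:
    """Throughput of one download in Mbps (1 Mb = 10**6 bits)."""
    return size_bytes * 8 / seconds / 1e6


def seconds_for(size_bytes: int, mbps: float) -> float:
    """Time one download takes at a given throughput in Mbps."""
    return size_bytes * 8 / (mbps * 1e6)


FILE_BYTES = 100 * 10**6  # assumed: 100 MB test file = 10**8 bytes

if __name__ == "__main__":
    # At the BM.DenseIO2.52 average of 23,372 Mbps, each 100 MB download
    # takes roughly 34 ms.
    print(f"{seconds_for(FILE_BYTES, 23_372) * 1000:.1f} ms")
```

Per-download values from megabits_per_second() can then be summarized with the same average, min, max, and standard deviation statistics reported in the table.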

DNS Query Response Time

  • The DNS query test measures the elapsed time for DNS lookups of a test hostname (default is google.com).
  • The test is run 10 times.
  • Measurements are in milliseconds.
  • DenseIO1.36 measurements were taken in Phoenix.
  • DenseIO2.52 measurements were taken in Ashburn.
  • DNS query test response times are not shape dependent.

Shape | Average Query Time (ms) | Min | Max | Std Dev
BM.DenseIO1.36 | 64 | 47 | 114 | 26
BM.DenseIO2.52 | 21 | 18 | 33 | 5.9
VM.DenseIO2.24 | 21 | 16 | 32 | 6.1
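A DNS response-time test of this kind amounts to timing repeated lookups of one hostname. The sketch below does that with Python's socket.getaddrinfo and a monotonic clock; the function name and the injectable resolver parameter are illustrative assumptions, not the Cloud Harmony implementation. If a caching resolver such as nscd or systemd-resolved is running on the instance, repeat lookups may be answered from that cache rather than by the upstream DNS server.

```python
import socket
import time


def time_dns_lookups(hostname="google.com", runs=10, resolve=socket.getaddrinfo):
    """Resolve `hostname` `runs` times and return each lookup's elapsed time in ms.

    `resolve` is called as resolve(hostname, None); it defaults to
    socket.getaddrinfo, which raises socket.gaierror if resolution fails.
    """
    elapsed_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        resolve(hostname, None)
        elapsed_ms.append((time.perf_counter() - start) * 1000.0)
    return elapsed_ms
```

Calling time_dns_lookups() on an instance returns ten timings that can be summarized the same way as the table (average, min, max, standard deviation).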

 

NVMe Storage

The NVMe devices perform very well. Intel has delivered improved density and read performance, and the results are clear: 51.2 TB of NVMe flash delivering up to 5.5 million IOPS at < 1 ms latency bests our first generation, and we're working to improve even further.

Before running any tests, back up your data and operating system environment to prevent data loss. WARNING: Do not run FIO tests directly against a device that is already in use, such as /dev/sdX. If the device is formatted and holds data, running FIO with a write workload (readwrite, randrw, write, trimwrite) will overwrite that data and corrupt it. Run FIO only on unformatted raw devices that are not in use.
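One inexpensive safeguard before pointing FIO at a device is to confirm that neither the device nor any of its partitions appears in /proc/mounts. The sketch below is a minimal check under that assumption: it only looks at mounted filesystems, so it will not detect a device used as swap, as an LVM physical volume, or as a RAID member, and it is no substitute for taking a backup. The function names are illustrative.

```python
import re


def mounted_sources(mounts_text: str) -> set:
    """Return the source (first field) of each line in /proc/mounts-style text."""
    return {line.split()[0] for line in mounts_text.splitlines() if line.strip()}


def device_is_mounted(device: str, mounts_text: str) -> bool:
    """True if `device` or one of its partitions is a mount source.

    Linux names partitions of a disk whose name ends in a digit with a "p"
    separator (/dev/nvme0n1 -> /dev/nvme0n1p1); other disks get the number
    appended directly (/dev/sdb -> /dev/sdb1).
    """
    if device[-1].isdigit():
        pattern = re.compile(re.escape(device) + r"(p\d+)?")
    else:
        pattern = re.compile(re.escape(device) + r"\d*")
    return any(pattern.fullmatch(src) for src in mounted_sources(mounts_text))
```

For example, device_is_mounted("/dev/nvme0n1", open("/proc/mounts").read()) should return False before you run a write workload against /dev/nvme0n1.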

Test Details

NVMe Capacity: 6.4 TB x 8 (51.2 TB)
Direct I/O
Host Shape: BM.DenseIO2.52
Region: Ashburn

Observed Performance: 52-core DenseIO2.52 bare metal machine, up to 5.5 million IOPS

Reproduction Steps

  1. Provision a 52-core BM.DenseIO2.52 instance.
  2. Run the Gartner Cloud Harmony block storage benchmark:
    1. ~/block-storage/run.sh \
      --target /dev/nvme0n1,/dev/nvme1n1,/dev/nvme2n1,/dev/nvme3n1,/dev/nvme4n1,/dev/nvme5n1,/dev/nvme6n1,/dev/nvme7n1 \
      --skip_blocksize 512b
The test log file is attached for the measurements. Updated February 16, 2018.
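The Cloud Harmony harness drives FIO for you. If you want to spot-check a single device directly, the sketch below builds a read-only random-read FIO command line and also generates the comma-separated --target list for the eight NVMe devices, which avoids hand-typing paths. The block size, queue depth, job count, and runtime are illustrative assumptions, not the harness's settings; the FIO options used (--readonly, --direct, --ioengine=libaio, --rw=randread, --time_based, --group_reporting) are standard FIO flags.

```python
import shlex

# Comma-separated --target value for the eight local NVMe devices
# (/dev/nvme0n1 .. /dev/nvme7n1) on a BM.DenseIO2.52.
NVME_TARGETS = ",".join(f"/dev/nvme{i}n1" for i in range(8))


def fio_randread_cmd(device, block_size="4k", iodepth=32, numjobs=4, runtime_s=60):
    """Build an FIO command that only reads from `device` (parameter values are illustrative)."""
    name = device.rsplit("/", 1)[-1]
    return [
        "fio",
        f"--name=randread-{name}",
        f"--filename={device}",
        "--readonly",         # safety guard: FIO refuses to issue writes or trims
        "--direct=1",         # O_DIRECT: bypass the page cache
        "--ioengine=libaio",  # Linux native asynchronous I/O
        "--rw=randread",
        f"--bs={block_size}",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",       # keep running for the full runtime
        "--group_reporting",  # aggregate statistics across jobs
    ]


if __name__ == "__main__":
    print(NVME_TARGETS)
    print(shlex.join(fio_randread_cmd("/dev/nvme0n1")))
```

To execute the command, pass the list to subprocess.run(cmd, check=True) as root on an instance with FIO installed.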

 

Block Storage

For this test, note that our newest instances are available in our newest regions, Ashburn and Frankfurt. Here's how 32 block volumes perform when attached concurrently to the same second-generation DenseIO instance, with an independent FIO test running against each volume. We see an impressive average of 400,000 IOPS to the host.

[Figure: BM.DenseIO2.52 block storage summary for all block sizes and R/W mixes]

Test Details

Block Volume Capacity: 1TB x 32
Direct I/O
Host Shape: Standard2.52
Region: Ashburn

Observed Performance: 52 Core DenseIO Bare Metal Machine > 400K IOPS

Reproduction Steps

  1. Provision a 52-core BM.Standard2.52 or BM.DenseIO2.52 instance.
  2. Run the Gartner Cloud Harmony block storage benchmark:
    1. ~/block-storage/run.sh \
      --target /dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh,/dev/sdi,/dev/sdj,/dev/sdk,/dev/sdl,/dev/sdm,/dev/sdn,/dev/sdo,/dev/sdp,/dev/sdq,/dev/sdr,/dev/sds,/dev/sdt,/dev/sdu,/dev/sdv,/dev/sdw,/dev/sdx,/dev/sdy,/dev/sdz,/dev/sdaa,/dev/sdab,/dev/sdac,/dev/sdad,/dev/sdae,/dev/sdaf,/dev/sdag

The test log files are attached for the measurements. Updated February 16, 2018.
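Typing 32 device paths by hand is error-prone. Linux names SCSI disks with a bijective base-26 letter sequence (sda through sdz, then sdaa, sdab, and so on), so the --target list can be generated instead. This sketch assumes, as in the command for this test, that the boot volume is /dev/sda and the 32 attached volumes appear as /dev/sdb through /dev/sdag; actual names depend on attachment order, so confirm them with lsblk first. The helper name is illustrative.

```python
import string


def sd_name(index: int) -> str:
    """Linux sd disk name for a zero-based index: 0 -> sda, 25 -> sdz, 26 -> sdaa."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = string.ascii_lowercase[rem] + letters
    return "sd" + letters


# Indexes 1..32 -> /dev/sdb .. /dev/sdag (index 0, /dev/sda, is assumed to be the boot volume).
BLOCK_TARGETS = ",".join(f"/dev/{sd_name(i)}" for i in range(1, 33))

if __name__ == "__main__":
    print(BLOCK_TARGETS)
```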

 

Delivering on the Oracle Cloud Infrastructure Promise of High Performance and Value

Our compute team is delighted with the value, the performance, and the overall feedback during testing. Improvements across multiple dimensions of performance confirm that our second-generation compute service can meet the most demanding data-intensive enterprise application requirements. We hope this straightforward analysis helps open a discussion in your organization about using Oracle Cloud Infrastructure for use cases ranging from high-performance computing in research and development to databases and everyday internet-facing applications, all with a very low cost of entry and transparent pricing. Explore the bottom-line advantages of converting your IT capital expenses to operating expenses with OCI services, and accelerate your innovation with quick-to-deploy, low-cost test environments.

Please share your most challenging high-availability and performance-sensitive workloads. If you want more information on our performance methodology, have questions about specific workloads, or need help achieving similar results, please reach out to me at lee.gates@oracle.com.
