Tuesday Sep 10, 2013

Oracle ZFS Storage ZS3-4 Produces Best 2-Node Performance on SPECsfs2008 NFSv3

The Oracle ZFS Storage ZS3-4 storage system delivered world record two-node performance on the SPECsfs2008 NFSv3 benchmark, beating results published on NetApp's dual-controller and four-node high-end FAS6240 storage systems.

  • The Oracle ZFS Storage ZS3-4 storage system delivered a world record two-node result of 450,702 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 0.70 msec on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system delivered 2.4x higher throughput than the dual-controller NetApp FAS6240 and 4.5x higher throughput than the dual-controller NetApp FAS3270 on the SPECsfs2008_nfs.v3 benchmark at less than half the list price of either result.

  • The Oracle ZFS Storage ZS3-4 storage system had 73 percent higher throughput than the four-node NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system had 54 percent better Overall Response Time than the four-node NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

Performance Landscape

Two-node results for SPECsfs2008_nfs.v3 are presented (in decreasing SPECsfs2008_nfs.v3 Ops/sec order) along with other selected results.

Sponsor | System | Nodes | Disks | Throughput (Ops/sec) | Overall Response Time (msec)
Oracle | ZS3-4 | 2 | 464 | 450,702 | 0.70
IBM | SONAS 1.2 | 2 | 1975 | 403,326 | 3.23
NetApp | FAS6240 | 4 | 288 | 260,388 | 1.53
NetApp | FAS6240 | 2 | 288 | 190,675 | 1.17
EMC | VG8 |  | 312 | 135,521 | 1.92
Oracle | 7320 | 2 | 136 | 134,140 | 1.51
EMC | NS-G8 |  | 100 | 110,621 | 2.32
NetApp | FAS3270 | 2 | 360 | 101,183 | 1.66

Throughput SPECsfs2008_nfs.v3 Ops/sec — the Performance Metric
Overall Response Time — the corresponding Response Time Metric
Nodes — Nodes and Controllers are being used interchangeably

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.
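
As a rough cross-check, the throughput multiples and response-time improvements quoted for this result can be recomputed from the table above. The Python sketch below is illustrative only: the names RESULTS, throughput_ratio, and ort_improvement_pct are made up for this example and are not part of any SPEC tooling, and the figures are simply copied from the table.

```python
# Throughput (SPECsfs2008_nfs.v3 Ops/sec) and ORT (msec) copied from the table above.
RESULTS = {
    "Oracle ZS3-4 (2-node)": (450_702, 0.70),
    "NetApp FAS6240 (4-node)": (260_388, 1.53),
    "NetApp FAS6240 (2-node)": (190_675, 1.17),
    "NetApp FAS3270 (2-node)": (101_183, 1.66),
}


def throughput_ratio(system, baseline):
    """Throughput of `system` expressed as a multiple of `baseline`."""
    return RESULTS[system][0] / RESULTS[baseline][0]


def ort_improvement_pct(system, baseline):
    """Percent by which the ORT of `system` is lower than that of `baseline`."""
    return (RESULTS[baseline][1] - RESULTS[system][1]) / RESULTS[baseline][1] * 100


if __name__ == "__main__":
    oracle = "Oracle ZS3-4 (2-node)"
    for name in RESULTS:
        if name == oracle:
            continue
        print(f"{name}: {throughput_ratio(oracle, name):.2f}x throughput, "
              f"{ort_improvement_pct(oracle, name):.0f}% lower ORT")
```

Dividing by the baseline system's value, as done here, is one reasonable convention for "higher throughput" and "better response time"; other conventions give different percentages.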

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-4 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-4 controllers, each with
8 x 2.4 GHz Intel Xeon E7-4870 processors
2 TB memory
2 x 10GbE NICs
20 x Sun Disk shelves
18 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 8 x 73 GB SAS-2 flash-enabled write-cache

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of September 10, 2013, for more information see www.spec.org. Oracle ZFS Storage ZS3-4 Appliance 450,702 SPECsfs2008_nfs.v3 Ops/sec, 0.70 msec ORT, NetApp Data ONTAP 8.1 Cluster-Mode (4-node FAS6240) 260,388 SPECsfs2008_nfs.v3 Ops/Sec, 1.53 msec ORT, NetApp FAS6240 190,675 SPECsfs2008_nfs.v3 Ops/Sec, 1.17 msec ORT. NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT.

Nodes refer to the items in the SPECsfs2008 disclosed Configuration Bill of Materials that have the Processing Elements that perform the NFS Processing Function. This is the first item listed in each disclosed Configuration Bill of Materials, except for EMC, where it is both the first and third items listed, and HP, where it is the second item, listed as Blade Servers. The number of nodes is taken from the QTY disclosed in the Configuration Bill of Materials as described above. Configuration Bill of Materials list price for Oracle result of US$ 423,644. Configuration Bill of Materials list price for NetApp FAS3270 result of US$ 1,215,290. Configuration Bill of Materials list price for NetApp FAS6240 result of US$ 1,028,118. Oracle pricing from https://shop.oracle.com/pls/ostore/f?p=dstore:home:0, traverse to "Storage and Tape" and then to "NAS Storage". NetApp's pricing from http://www.netapp.com/us/media/na-list-usd-netapp-custom-state-new-discounts.html.

Oracle ZFS Storage ZS3-2 Beats Comparable NetApp on SPECsfs2008 NFSv3

Oracle ZFS Storage ZS3-2 storage system delivered outstanding performance on the SPECsfs2008 NFSv3 benchmark, beating results published on NetApp's fastest midrange platform, the NetApp FAS3270, the NetApp FAS6240 and the EMC Gateway NS-G8 Server Failover Cluster.

  • The Oracle ZFS Storage ZS3-2 storage system delivered 210,535 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 1.12 msec on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system delivered 10% higher throughput than the NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system has 108% higher throughput than the NetApp FAS3270 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system has 5% better Overall Response Time than the NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system has 33% better Overall Response Time than the NetApp FAS3270 on the SPECsfs2008 NFSv3 benchmark.

Performance Landscape

Results for SPECsfs2008 NFSv3 (in decreasing SPECsfs2008_nfs.v3 Ops/sec order) for competitive systems.

Sponsor | System | Throughput (Ops/sec) | Overall Response Time (msec)
Oracle | ZS3-2 | 210,535 | 1.12
NetApp | FAS6240 | 190,675 | 1.17
EMC | VG8 | 135,521 | 1.92
EMC | NS-G8 | 110,621 | 2.32
NetApp | FAS3270 | 101,183 | 1.66
NetApp | FAS3250 | 100,922 | 1.76

Throughput SPECsfs2008_nfs.v3 Ops/sec = the Performance Metric
Overall Response Time = the corresponding Response Time Metric

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-2 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-2 controllers, each with
4 x 2.1 GHz Intel Xeon E5-2658 processors
512 GB memory
8 x Sun Disk shelves
3 x shelves with 24 x 900 GB 10K RPM SAS-2 drives
3 x shelves with 20 x 900 GB 10K RPM SAS-2 drives
2 x shelves with 20 x 900 GB 10K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

 

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of September 10, 2013, for more information see www.spec.org. Oracle ZFS Storage ZS3-2 Appliance 210,535 SPECsfs2008_nfs.v3 Ops/sec, 1.12 msec ORT, NetApp FAS6240 190,675 SPECsfs2008_nfs.v3 Ops/Sec, 1.17 msec ORT, EMC Celerra VG8 Server Failover Cluster, 2 Data Movers (1 stdby) / Symmetrix VMAX 135,521 SPECsfs2008_nfs.v3 Ops/Sec, 1.92 msec ORT, EMC Celerra Gateway NS-G8 Server Failover Cluster, 3 Datamovers (1 stdby) / Symmetrix V-Max 110,621 SPECsfs2008_nfs.v3 Ops/Sec, 2.32 msec ORT. NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT. NetApp FAS3250 100,922 SPECsfs2008_nfs.v3 Ops/Sec, 1.76 msec ORT.

Thursday Apr 19, 2012

Sun ZFS Storage 7420 Appliance Delivers 2-Node World Record SPECsfs2008 NFS Benchmark

Oracle's Sun ZFS Storage 7420 appliance delivered world record two-node performance on the SPECsfs2008 NFS benchmark, beating results published on NetApp's dual-controller and 4-node high-end FAS6240 storage systems.

  • The Sun ZFS Storage 7420 appliance delivered a world record two-node result of 267,928 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 1.31 msec on the SPECsfs2008 NFS benchmark.

  • The Sun ZFS Storage 7420 appliance delivered 1.4x higher throughput than the dual-controller NetApp FAS6240 and 2.6x higher throughput than the dual-controller NetApp FAS3270 on the SPECsfs2008_nfs.v3 benchmark at less than half the list price of either result.

  • The Sun ZFS Storage 7420 appliance required 10 percent less rack space than the dual-controller NetApp FAS6240.

  • The Sun ZFS Storage 7420 appliance had 3 percent higher throughput than the 4-node NetApp FAS6240 on the SPECsfs2008_nfs.v3 benchmark.

  • The Sun ZFS Storage 7420 appliance required 25 percent less rack space than the 4-node NetApp FAS6240.

  • The Sun ZFS Storage 7420 appliance has 14 percent better Overall Response Time than the 4-node NetApp FAS6240 on the SPECsfs2008_nfs.v3 benchmark.

Performance Landscape

SPECsfs2008_nfs.v3 Performance Chart (in decreasing SPECsfs2008_nfs.v3 Ops/sec order)

Sponsor | System | Throughput (Ops/sec) | Overall Response Time (msec) | Nodes | Memory (GB) Including Flash | Disks | Rack Units (Controllers + Disks)
Oracle | 7420 | 267,928 | 1.31 | 2 | 6,728 | 280 | 54
NetApp | FAS6240 | 260,388 | 1.53 | 4 | 2,256 | 288 | 72
NetApp | FAS6240 | 190,675 | 1.17 | 2 | 1,128 | 288 | 60
EMC | VG8 | 135,521 | 1.92 |  | 280 | 312 |
Oracle | 7320 | 134,140 | 1.51 | 2 | 4,968 | 136 | 26
EMC | NS-G8 | 110,621 | 2.32 |  | 264 | 100 |
NetApp | FAS3270 | 101,183 | 1.66 | 2 | 40 | 360 | 66

Throughput SPECsfs2008_nfs.v3 Ops/sec — the Performance Metric
Overall Response Time — the corresponding Response Time Metric
Nodes — Nodes and Controllers are being used interchangeably

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.
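
The rack-space comparisons can be reproduced from the Rack Units column of the chart, and the same column gives a simple density figure (Ops/sec per rack unit). The sketch below is a minimal illustration; the names SYSTEMS, ops_per_rack_unit, and rack_space_saving_pct are hypothetical, and Ops/sec per rack unit is not a metric defined by SPEC.

```python
# (throughput in Ops/sec, rack units for controllers + disks), copied from the chart above.
SYSTEMS = {
    "Oracle 7420 (2-node)": (267_928, 54),
    "NetApp FAS6240 (4-node)": (260_388, 72),
    "NetApp FAS6240 (2-node)": (190_675, 60),
    "Oracle 7320 (2-node)": (134_140, 26),
    "NetApp FAS3270 (2-node)": (101_183, 66),
}


def ops_per_rack_unit(name):
    """Throughput delivered per rack unit occupied (illustrative density figure)."""
    ops, rack_units = SYSTEMS[name]
    return ops / rack_units


def rack_space_saving_pct(system, baseline):
    """Percent less rack space `system` occupies compared with `baseline`."""
    return (1 - SYSTEMS[system][1] / SYSTEMS[baseline][1]) * 100


if __name__ == "__main__":
    for name in SYSTEMS:
        print(f"{name}: {ops_per_rack_unit(name):,.0f} Ops/sec per rack unit")
    for baseline in ("NetApp FAS6240 (2-node)", "NetApp FAS6240 (4-node)"):
        saving = rack_space_saving_pct("Oracle 7420 (2-node)", baseline)
        print(f"7420 vs {baseline}: {saving:.0f}% less rack space")
```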

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7420 appliance in clustered configuration
2 x Sun ZFS Storage 7420 controllers, each with
4 x 2.4 GHz Intel Xeon E7-4870 processors
1 TB memory
4 x 512 GB SSD flash-enabled read-cache
2 x 10GbE NICs
12 x Sun Disk shelves
10 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

4 x Sun Fire X4270 M2 servers, each with
2 x 3.3 GHz Intel Xeon E5680 processors
144 GB memory
1 x 10 GbE NIC
Oracle Solaris 10 9/10

Switches:

1 x 24-port 10Gb Ethernet Switch

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of April 18, 2012, for more information see www.spec.org. Sun ZFS Storage 7420 Appliance 267,928 SPECsfs2008_nfs.v3 Ops/sec, 1.31 msec ORT, NetApp Data ONTAP 8.1 Cluster-Mode (4-node FAS6240) 260,388 SPECsfs2008_nfs.v3 Ops/Sec, 1.53 msec ORT, NetApp FAS6240 190,675 SPECsfs2008_nfs.v3 Ops/Sec, 1.17 msec ORT. NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT.

Nodes refer to the items in the SPECsfs2008 disclosed Configuration Bill of Materials that have the Processing Elements that perform the NFS Processing Function. This is the first item listed in each disclosed Configuration Bill of Materials, except for EMC, where it is both the first and third items listed, and HP, where it is the second item, listed as Blade Servers. The number of nodes is taken from the QTY disclosed in the Configuration Bill of Materials as described above. Configuration Bill of Materials list price for Oracle result of US$ 423,644. Configuration Bill of Materials list price for NetApp FAS3270 result of US$ 1,215,290. Configuration Bill of Materials list price for NetApp FAS6240 result of US$ 1,028,118. Oracle pricing from https://shop.oracle.com/pls/ostore/f?p=dstore:home:0, traverse to "Storage and Tape" and then to "NAS Storage". NetApp's pricing from http://www.netapp.com/us/media/na-list-usd-netapp-custom-state-new-discounts.html.

Monday Feb 27, 2012

Sun ZFS Storage 7320 Appliance 33% Faster Than NetApp FAS3270 on SPECsfs2008

Oracle's Sun ZFS Storage 7320 appliance delivered outstanding performance on the SPECsfs2008 NFS benchmark, beating results published on NetApp's fastest midrange platform, the NetApp FAS3270, and the EMC Gateway NS-G8 Server Failover Cluster.

  • The Sun ZFS Storage 7320 appliance delivered 134,140 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 1.51 msec on the SPECsfs2008 NFS benchmark.

  • The Sun ZFS Storage 7320 appliance has 33% higher throughput than the NetApp FAS3270 on the SPECsfs2008 NFS benchmark.

  • The Sun ZFS Storage 7320 appliance required less than half the rack space of the NetApp FAS3270.

  • The Sun ZFS Storage 7320 appliance has 9% better Overall Response Time than the NetApp FAS3270 on the SPECsfs2008 NFS benchmark.

Performance Landscape

SPECsfs2008_nfs.v3 Performance Chart (in decreasing SPECsfs2008_nfs.v3 Ops/sec order)

Sponsor | System | Throughput (Ops/sec) | Overall Response Time (msec) | Memory (GB) | Disks | Exported Capacity (TB) | Rack Units (Controllers + Disks)
EMC | VG8 | 135,521 | 1.92 | 280 | 312 | 19.2 |
Oracle | 7320 | 134,140 | 1.51 | 288 | 136 | 37.0 | 26
EMC | NS-G8 | 110,621 | 2.32 | 264 | 100 | 17.6 |
NetApp | FAS3270 | 101,183 | 1.66 | 40 | 360 | 110.1 | 66

Throughput SPECsfs2008_nfs.v3 Ops/sec = the Performance Metric
Overall Response Time = the corresponding Response Time Metric

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7320 appliance in clustered configuration
2 x Sun ZFS Storage 7320 controllers, each with
2 x 2.4 GHz Intel Xeon E5620 processors
144 GB memory
4 x 512 GB SSD flash-enabled read-cache
6 x Sun Disk shelves
4 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

3 x Sun Fire X4270 M2 servers, each with
2 x 2.4 GHz Intel Xeon E5620 processors
12 GB memory
1 x 10 GbE connection to the Sun ZFS Storage 7320 appliance
Oracle Solaris 10 8/11

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of February 22, 2012, for more information see www.spec.org. Sun ZFS Storage 7320 Appliance 134,140 SPECsfs2008_nfs.v3 Ops/sec, 1.51 msec ORT, NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT, EMC Celerra Gateway NS-G8 Server Failover Cluster, 3 Datamovers (1 stdby) / Symmetrix V-Max 110,621 SPECsfs2008_nfs.v3 Ops/Sec, 2.32 msec ORT.

Monday Oct 03, 2011

Sun ZFS Storage 7420 Appliance Doubles NetApp FAS3270A on SPC-1 Benchmark

Oracle's Sun ZFS Storage 7420 appliance delivered outstanding performance and price/performance on the SPC Benchmark 1, beating results published on the NetApp FAS3270A.

  • The Sun ZFS Storage 7420 appliance delivered 137,066.20 SPC-1 IOPS at $2.99 $/SPC-1 IOPS on the SPC-1 benchmark.

  • The Sun ZFS Storage 7420 appliance outperformed the NetApp FAS3270A by 2x on the SPC-1 benchmark.

  • The Sun ZFS Storage 7420 appliance outperformed the NetApp FAS3270A by 2.5x on price/performance on the SPC-1 benchmark.

Performance Landscape

SPC-1 Performance Chart (in decreasing performance order)

System | SPC-1 IOPS | $/SPC-1 IOPS | ASU Capacity (GB) | TSC Price | Data Protection Level | Date | Results Identifier
Huawei Symantec S6800T | 150,061.17 | $3.08 | 43,937.515 | $461,471.75 | Mirroring | 08/31/11 | A00107
Sun ZFS Storage 7420 | 137,066.20 | $2.99 | 23,703.035 | $409,933 | Mirroring | 10/03/11 | A00108
Huawei Symantec S5600T | 102,471.66 | $2.73 | 35,945.185 | $279,914.53 | Mirroring | 08/25/11 | A00106
Pillar Axiom 600 | 70,102.27 | $7.32 | 32,000.000 | $513,112 | Mirroring | 04/19/11 | A00104
NetApp FAS3270A | 68,034.63 | $7.48 | 21,659.386 | $509,200.79 | RAID-DP | 11/09/10 | AE00004
Sun Storage 6780 | 62,261.80 | $6.89 | 13,742.218 | $429,294 | Mirroring | 06/01/10 | A00094
NetApp FAS3170 | 60,515.34 | $10.01 | 19,628.500 | $605,492 | RAID-DP | 06/10/08 | A00066
IBM V7000 | 56,510.85 | $7.24 | 14,422.309 | $409,410.86 | Mirroring | 10/22/10 | A00097
IBM V7000 | 53,014.29 | $7.52 | 24,433.592 | $389,425.11 | Mirroring | 03/14/11 | A00103

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = a unique identifier for the result

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.
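
The price/performance column can be reproduced from the other two: $/SPC-1 IOPS is the TSC price divided by SPC-1 IOPS. The short Python sketch below recomputes it for the two systems compared in this post, using values copied from the chart above; the names RESULTS and price_per_iops are made up for this illustration.

```python
# (SPC-1 IOPS, TSC price in US dollars), copied from the chart above.
RESULTS = {
    "Sun ZFS Storage 7420": (137_066.20, 409_933.00),
    "NetApp FAS3270A": (68_034.63, 509_200.79),
}


def price_per_iops(name):
    """Price/performance: TSC price divided by SPC-1 IOPS."""
    iops, tsc_price = RESULTS[name]
    return tsc_price / iops


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name}: ${price_per_iops(name):.2f} per SPC-1 IOPS")

    oracle, netapp = "Sun ZFS Storage 7420", "NetApp FAS3270A"
    iops_ratio = RESULTS[oracle][0] / RESULTS[netapp][0]
    price_perf_ratio = price_per_iops(netapp) / price_per_iops(oracle)
    print(f"Performance advantage: {iops_ratio:.1f}x")
    print(f"Price/performance advantage: {price_perf_ratio:.1f}x")
```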

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7420 appliance in clustered configuration
2 x Sun ZFS Storage 7420 controllers, each with
4 x 2.0 GHz Intel Xeon X7550 processors
512 GB memory, 64 x 8 GB 1066 MHz DDR3 DIMMs
4 x 512 GB SSD flash-enabled read-cache
12 x Sun Disk shelves
10 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

1 x SPARC T3-2 server
2 x 1.65 GHz SPARC T3 processors
128 GB memory
6 x 8 Gb FC connections to the Sun ZFS Storage 7420 appliance
Oracle Solaris 10 9/10

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominantly random I/O operations as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price/performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS are registered trademarks of Storage Performance Council (SPC). Results as of October 2, 2011, for more information see www.storageperformance.org. Sun ZFS Storage 7420 Appliance http://www.storageperformance.org/results/benchmark_results_spc1#a00108; NetApp FAS3270A http://www.storageperformance.org/results/benchmark_results_spc1#ae00004.

Tuesday Sep 21, 2010

ProMAX Performance and Throughput on Sun Fire X2270 and Sun Storage 7410

Halliburton/Landmark's ProMAX 3D Prestack Kirchhoff Time Migration's single job scalability and multiple job throughput using various scheduling methods are evaluated on a cluster of Oracle's Sun Fire X2270 servers attached via QDR InfiniBand to Oracle's Sun Storage 7410 system.

Two resource scheduling methods, compact and distributed, are compared while increasing the system load with additional concurrent ProMAX jobs.

  • A single ProMAX job has near linear scaling of 5.5x on 6 nodes of a Sun Fire X2270 cluster.

  • A single ProMAX job has near linear scaling of 7.5x on a Sun Fire X2270 server when running from 1 to 8 threads.

  • ProMAX can take advantage of Oracle's Sun Storage 7410 system features compared to dedicated local disks. There was no significant difference in run time observed when running up to 8 concurrent 16 thread jobs.

  • The 8-thread ProMAX job throughput using the distributed scheduling method is equivalent to or slightly faster than the compact scheme for 1 to 4 concurrent jobs.

  • The 16-thread ProMAX job throughput using the distributed scheduling method is up to 8% faster when compared to the compact scheme on an 8-node Sun Fire X2270 cluster.

The multiple job throughput characterizations revealed in this benchmark study are key to pre-configuring Oracle Grid Engine resource scheduling for ProMAX on a Sun Fire X2270 cluster and provide valuable insight for server consolidation.

Performance Landscape

Single Job Scaling

Single job performance on a single node is near linear up to the number of cores in the node, i.e. 2 Intel Xeon X5570s with 4 cores each. With hyperthreading (2 active threads per core) enabled, more ProMAX threads are used, which increases the load on the CPU's memory architecture and reduces the speedups.

ProMAX single job performance on the 6-node cluster shows near linear speedup node to node.

Single Job 6-Node Scalability
Hyperthreading Enabled - 16 Threads/Node Maximum

Number of Nodes | Threads Per Node | Speedup to 1 Thread | Speedup to 1 Node
6 | 16 | 54.2 | 5.5
4 | 16 | 36.2 | 3.6
3 | 16 | 26.1 | 2.6
2 | 16 | 17.6 | 1.8
1 | 16 | 10.0 | 1.0
1 | 14 | 9.2 |
1 | 12 | 8.6 |
1 | 10 | 7.2* |
1 | 8 | 7.5 |
1 | 6 | 5.9 |
1 | 4 | 3.9 |
1 | 3 | 3.0 |
1 | 2 | 2.0 |
1 | 1 | 1.0 |

* 2 threads contend with two master node daemons
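
One common way to read a speedup table like the one above is parallel efficiency: the speedup divided by the number of workers (nodes or threads). The Python sketch below applies that to the measured speedups. It is an illustration only; efficiency is not a figure reported by the study, and the names NODE_SPEEDUP, THREAD_SPEEDUP, and efficiency are invented for this example.

```python
# Speedups copied from the Single Job 6-Node Scalability table above.
# nodes -> speedup relative to 1 node (16 threads per node)
NODE_SPEEDUP = {1: 1.0, 2: 1.8, 3: 2.6, 4: 3.6, 6: 5.5}
# threads on a single node -> speedup relative to 1 thread
THREAD_SPEEDUP = {1: 1.0, 2: 2.0, 3: 3.0, 4: 3.9, 6: 5.9,
                  8: 7.5, 10: 7.2, 12: 8.6, 14: 9.2, 16: 10.0}


def efficiency(speedup, workers):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / workers


if __name__ == "__main__":
    for nodes, speedup in NODE_SPEEDUP.items():
        print(f"{nodes} node(s): {efficiency(speedup, nodes):.0%} efficient")
    for threads, speedup in THREAD_SPEEDUP.items():
        print(f"{threads:2d} thread(s): {efficiency(speedup, threads):.0%} efficient")
```

Read this way, efficiency stays above 90 percent up to the 8 physical cores of a node and drops once the extra hyperthreads are used, which is consistent with the hyperthreading remark above.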

Multiple Job Throughput Scaling, Compact Scheduling

With the Sun Storage 7410 system, performance of 8 concurrent jobs on the cluster using compact scheduling is equivalent to a single job.

Multiple Job Throughput Scalability
Hyperthreading Enabled - 16 Threads/Node Maximum

Number of Jobs | Number of Nodes per Job | Threads Per Node per Job | Performance Relative to 1 Job | Total Nodes | Percent Cluster Used
1 | 1 | 16 | 1.00 | 1 | 13
2 | 1 | 16 | 1.00 | 2 | 25
4 | 1 | 16 | 1.00 | 4 | 50
8 | 1 | 16 | 1.00 | 8 | 100

Multiple 8-Thread Job Throughput Scaling, Compact vs. Distributed Scheduling

These results compare several levels of distributed resource scheduling against compact-method baselines of 1, 2, and 4 concurrent jobs.

Multiple 8-Thread Job Scheduling
HyperThreading Enabled - Use 8 Threads/Node Maximum

Number of Jobs | Number of Nodes per Job | Threads Per Node per Job | Performance Relative to 1 Job | Total Nodes | Total Threads per Node Used | Percent of PVM Master 8 Threads Used
1 | 1 | 8 | 1.00 | 1 | 8 | 100
1 | 4 | 2 | 1.01 | 4 | 2 | 25
1 | 8 | 1 | 1.01 | 8 | 1 | 13

2 | 1 | 8 | 1.00 | 2 | 8 | 100
2 | 4 | 2 | 1.01 | 4 | 4 | 50
2 | 8 | 1 | 1.01 | 8 | 2 | 25

4 | 1 | 8 | 1.00 | 4 | 8 | 100
4 | 4 | 2 | 1.00 | 4 | 8 | 100
4 | 8 | 1 | 1.01 | 8 | 4 | 100

Multiple 16-Thread Job Throughput Scaling, Compact vs. Distributed Scheduling

The results are reported relative to the performance of 1, 2, 4, and 8 concurrent 2-node, 8-thread jobs.

Multiple 16-Thread Job Scheduling
HyperThreading Enabled - 16 Threads/Node Available

Number of Jobs | Number of Nodes per Job | Threads Per Node per Job | Performance Relative to 1 Job | Total Nodes | Total Threads per Node Used | Percent of PVM Master 16 Threads Used
1 | 1 | 16 | 0.66 | 1 | 16 | 100*
1 | 2 | 8 | 1.00 | 2 | 8 | 50
1 | 4 | 4 | 1.03 | 4 | 4 | 25
1 | 8 | 2 | 1.06 | 8 | 2 | 13

2 | 1 | 16 | 0.70 | 2 | 16 | 100*
2 | 2 | 8 | 1.00 | 4 | 8 | 50
2 | 4 | 4 | 1.07 | 8 | 4 | 25
2 | 8 | 2 | 1.08 | 8 | 4 | 25

4 | 1 | 16 | 0.74 | 4 | 16 | 100*
4 | 4 | 4 | 0.74 | 4 | 16 | 100*
4 | 2 | 8 | 1.00 | 8 | 8 | 50
4 | 4 | 4 | 1.05 | 8 | 8 | 50
4 | 8 | 2 | 1.04 | 8 | 8 | 50

8 | 1 | 16 | 1.00 | 8 | 16 | 100*
8 | 4 | 4 | 1.00 | 8 | 16 | 100*
8 | 8 | 2 | 1.00 | 8 | 16 | 100*

* master PVM host; running 20 to 21 total threads (over-subscribed)
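
The "Total Threads per Node Used" and percent-used columns of the 16-thread table follow from the job mix, under the assumption that job threads are spread evenly over the nodes in use. The Python sketch below models that arithmetic; the class name JobMix and its fields are hypothetical and do not correspond to any ProMAX or Oracle Grid Engine setting.

```python
from dataclasses import dataclass

HW_THREADS_PER_NODE = 16  # 2 sockets x 4 cores x 2 hyperthreads on each node


@dataclass(frozen=True)
class JobMix:
    jobs: int                      # concurrent ProMAX jobs
    nodes_per_job: int             # nodes each job is spread over
    threads_per_node_per_job: int  # threads one job runs on each of its nodes
    total_nodes: int               # nodes in use across all jobs

    @property
    def threads_per_job(self) -> int:
        return self.nodes_per_job * self.threads_per_node_per_job

    @property
    def threads_per_node_used(self) -> float:
        # Assumption: job threads are spread evenly over the nodes in use.
        return self.jobs * self.threads_per_job / self.total_nodes

    @property
    def percent_of_node_used(self) -> float:
        return 100 * self.threads_per_node_used / HW_THREADS_PER_NODE


if __name__ == "__main__":
    compact = JobMix(jobs=4, nodes_per_job=1, threads_per_node_per_job=16, total_nodes=4)
    distributed = JobMix(jobs=4, nodes_per_job=8, threads_per_node_per_job=2, total_nodes=8)
    for label, mix in (("compact", compact), ("distributed", distributed)):
        print(f"{label}: {mix.threads_per_node_used:.0f} threads per node, "
              f"{mix.percent_of_node_used:.0f}% of each node's hardware threads")
```

The model does not count the license server, PVM, and other daemon threads, which is why the rows marked as over-subscribed on the master PVM host run more threads than this sketch reports.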

Results and Configuration Summary

Hardware Configuration:

8 x Sun Fire X2270 servers, each with
2 x 2.93 GHz Intel Xeon X5570 processors
48 GB memory at 1333 MHz
1 x 500 GB SATA
Sun Storage 7410 system
4 x 2.3 GHz AMD Opteron 8356 processors
128 GB memory
2 Internal 233GB SAS drives = 466 GB
2 Internal 93 GB read optimized SSD = 186 GB
1 External Sun Storage J4400 array with 22 1TB SATA drives and 2 18GB write optimized SSD
11 TB mirrored data and mirrored write optimized SSD

Software Configuration:

SUSE Linux Enterprise Server 10 SP 2
Parallel Virtual Machine 3.3.11
Oracle Grid Engine
Intel 11.1 Compilers
OpenWorks Database requires Oracle 10g Enterprise Edition
Libraries: pthreads 2.4, Java 1.6.0_01, BLAS, Stanford Exploration Project Libraries

Benchmark Description

The ProMAX family of seismic data processing tools is the most widely used Oil and Gas Industry seismic processing application. ProMAX is used for multiple applications, from field processing and quality control, to interpretive project-oriented reprocessing at oil companies and production processing at service companies. ProMAX is integrated with Halliburton's OpenWorks Geoscience Oracle Database to index prestack seismic data and populate the database with processed seismic.

This benchmark evaluates single job scalability and multiple job throughput of the ProMAX 3D Prestack Kirchhoff Time Migration while processing the Halliburton benchmark data set containing 70,808 traces with 8 msec sample interval and trace length of 4992 msec. Alternative thread scheduling methods are compared for optimizing single and multiple job throughput. The compact scheme schedules the threads of a single job in as few nodes as possible, whereas the distributed scheme schedules the threads across as many nodes as possible. The effects of load on the Sun Storage 7410 system are measured. This information provides valuable insight into determining the Oracle Grid Engine resource management policies.

Hyperthreading is enabled for all of the tests. It should be noted that every node is running a PVM daemon and ProMAX license server daemon. On the master PVM daemon node, there are three additional ProMAX daemons running.

The first test measures single job scalability across a 6-node cluster with an additional node serving as the master PVM host. Speedups relative to a single node and to a single thread are reported.

The second test measures multiple job scalability running 1 to 8 concurrent 16-thread jobs using the Sun Storage 7410 system. The performance is reported relative to a single job.

The third test compares 8-thread multiple job throughput using different job scheduling methods on a cluster. The compact method puts all 8 threads for a job on the same node. The distributed method spreads the 8 threads of a job across multiple nodes. The results compare several levels of distributed resource scheduling against compact-method baselines of 1, 2, and 4 concurrent jobs.

The fourth test is similar to the third test except that it runs 16-thread ProMAX jobs. The results are reported relative to the performance of 1, 2, 4, and 8 concurrent 2-node, 8-thread jobs.

The ProMAX processing parameters used for this benchmark:

Minimum output inline = 65
Maximum output inline = 85
Inline output sampling interval = 1
Minimum output xline = 1
Maximum output xline = 200 (fold)
Xline output sampling interval = 1
Antialias inline spacing = 15
Antialias xline spacing = 15
Stretch Mute Aperature Limit with Maximum Stretch = 15
Image Gather Type = Full Offset Image Traces
No Block Moveout
Number of Alias Bands = 10
3D Amplitude Phase Correction
No compression
Maximum Number of Cache Blocks = 500000

Key Points and Best Practices

  • The application was rebuilt with the Intel 11.1 Fortran and C++ compilers with these flags:

    -xSSE4.2 -O3 -ipo -no-prec-div -static -m64 -ftz -fast-transcendentals -fp-speculation=fast

  • There are additional execution threads associated with a ProMAX node. There are two threads that run on each node: the license server and PVM daemon. There are at least three additional daemon threads that run on the PVM master server: the ProMAX interface GUI, the ProMAX job execution - SuperExec, and the PVM console and control. It is best to allocate one node as the master PVM server to handle the additional 5+ threads. Otherwise, hyperthreading can be enabled and the master PVM host can support up to 8 ProMAX job threads.

  • When hyperthreading is enabled on one of the non-master PVM hosts, there is a 7% penalty going from 8 to 10 threads. However, 12 threads are 11 percent faster than 8. This can be attributed to the two additional support threads when hyperthreading initiates.

  • Single job performance on a single node is near linear up to the number of cores in the node, i.e. 2 Intel Xeon X5570s with 4 cores each. With hyperthreading (2 active threads per core) enabled, more ProMAX threads are used, which increases the load on the CPU's memory architecture and reduces the speedups.

    Users need to be aware of these performance differences and how they affect their production environment.

Disclosure Statement

The following are trademarks or registered trademarks of Halliburton/Landmark Graphics: ProMAX. Results as of 9/20/2010.

Monday Sep 20, 2010

Schlumberger's ECLIPSE 300 Performance Throughput On Sun Fire X2270 Cluster with Sun Storage 7410

Oracle's Sun Storage 7410 system, attached via QDR InfiniBand to a cluster of eight of Oracle's Sun Fire X2270 servers, was used to evaluate multiple job throughput of Schlumberger's Linux-64 ECLIPSE 300 compositional reservoir simulator processing their standard 2 Million Cell benchmark model with 8 rank parallelism (MM8 job).

  • The Sun Storage 7410 system showed little difference in performance (2%) compared to running the MM8 job with dedicated local disk.

  • When running 8 concurrent jobs on 8 different nodes all to the Sun Storage 7410 system, the performance saw little degradation (5%) compared to a single MM8 job running on dedicated local disk.

Experiments were run changing how the cluster was utilized in scheduling jobs. Rather than running with the default compact mode, tests were run distributing the single job among the various nodes. Performance improvements were measured when changing from the default compact scheduling scheme (1 job to 1 node) to a distributed scheduling scheme (1 job to multiple nodes).

  • When running at 75% of the cluster capacity, distributed scheduling outperformed the compact scheduling by up to 34%. Even when running at 100% of the cluster capacity, the distributed scheduling is still slightly faster than compact scheduling.

  • When combining workloads, using the distributed scheduling allowed two MM8 jobs to finish 19% faster than the reference time and a concurrent PSTM workload to finish 2% faster.

The Oracle Solaris Studio Performance Analyzer and Sun Storage 7410 system analytics were used to identify a 3D Prestack Kirchhoff Time Migration (PSTM) as a potential candidate for consolidating with ECLIPSE. Both scheduling schemes are compared while running various job mixes of these two applications using the Sun Storage 7410 system for I/O.

These experiments showed a potential opportunity for consolidating applications using Oracle Grid Engine resource scheduling and Oracle Virtual Machine templates.

Performance Landscape

Results are presented below on a variety of experiments run using the 2009.2 ECLIPSE 300 2 Million Cell Performance Benchmark (MM8). The compute nodes are a cluster of Sun Fire X2270 servers connected with QDR InfiniBand. First, some definitions used in the tables below:

Local HDD: Each job runs on a single node to its dedicated direct attached storage.
NFSoIB: One node hosts its local disk for NFS mounting to other nodes over InfiniBand.
IB 7410: Sun Storage 7410 system over QDR InfiniBand.
Compact Scheduling: All 8 MM8 MPI processes run on a single node.
Distributed Scheduling: Allocate the 8 MM8 MPI processes across all available nodes.

First Test

The first test compares the performance of a single MM8 test on a single node using local storage to running a number of jobs across the cluster and showing the effect of different storage solutions.

Compact Scheduling
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster Load   Number of MM8 Jobs   Local HDD Relative Throughput   NFSoIB Relative Throughput   IB 7410 Relative Throughput
13%            1                    1.00                            1.00\*                       0.98
25%            2                    0.98                            0.97                         0.98
50%            4                    0.98                            0.96                         0.97
75%            6                    0.98                            0.95                         0.95
100%           8                    0.98                            0.95                         0.95

\* Performance measured on node hosting its local disk to other nodes in the cluster.
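
The percentage figures quoted for the Sun Storage 7410 system can be recovered from the relative throughput values in this table. The following is a minimal sketch, assuming relative throughput is normalized so that the single-job local-disk run equals 1.00. The helper name `degradation_percent` is hypothetical and not part of the benchmark.

```python
# Relative throughput for the IB 7410 column of the first test,
# keyed by the number of concurrent MM8 jobs on the 8-node cluster.
ib_7410 = {1: 0.98, 2: 0.98, 4: 0.97, 6: 0.95, 8: 0.95}


def degradation_percent(relative_throughput: float) -> float:
    """Percent slowdown versus the single-job local-disk reference (1.00).

    Assumes the reference run is normalized to exactly 1.00.
    """
    return round((1.0 - relative_throughput) * 100.0, 1)


for jobs in sorted(ib_7410):
    slowdown = degradation_percent(ib_7410[jobs])
    print(f"{jobs} job(s): {slowdown}% below the local-disk reference")
```

With one job the result is 2%, and with eight concurrent jobs it is 5%, matching the summary at the top of the article.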

Second Test

This next test uses the Sun Storage 7410 system and compares the performance of running the MM8 job on 1 node using compact scheduling to running multiple jobs with compact scheduling and to running multiple jobs with distributed scheduling. The tests are run on an 8-node cluster, so each distributed job has only 1 MPI process per node.

Comparing Compact and Distributed Scheduling
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster Load   Number of MM8 Jobs   Compact Scheduling Relative Throughput   Distributed Scheduling\* Relative Throughput
13%            1                    1.00                                     1.34
25%            2                    1.00                                     1.32
50%            4                    0.99                                     1.25
75%            6                    0.97                                     1.10
100%           8                    0.97                                     0.98

\* Each distributed job has 1 MPI process per node.
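
To read the advantage of distributed over compact scheduling at each load level, the two columns can be divided row by row. This is a minimal sketch using the values from the table above. The function name `distributed_gain_percent` is hypothetical.

```python
# (cluster load %, MM8 jobs, compact, distributed) from the second test.
rows = [
    (13, 1, 1.00, 1.34),
    (25, 2, 1.00, 1.32),
    (50, 4, 0.99, 1.25),
    (75, 6, 0.97, 1.10),
    (100, 8, 0.97, 0.98),
]


def distributed_gain_percent(compact: float, distributed: float) -> float:
    """Percent by which distributed throughput exceeds compact throughput."""
    return round((distributed / compact - 1.0) * 100.0, 1)


for load, jobs, compact, distributed in rows:
    gain = distributed_gain_percent(compact, distributed)
    print(f"{load:>3}% load, {jobs} job(s): distributed is {gain}% faster")
```

The gain is largest when the cluster is lightly loaded and shrinks to about 1% at full load, since at that point every node is busy under either scheme.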

Third Test

This next test uses the Sun Storage 7410 system and compares the performance of running the MM8 job on 1 node using compact scheduling to running multiple jobs with compact scheduling and to running multiple jobs with distributed scheduling. This test uses only 4 nodes, so each distributed job has two MPI processes per node.

Comparing Compact and Distributed Scheduling on 4 Nodes
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster Load   Number of MM8 Jobs   Compact Scheduling Relative Throughput   Distributed Scheduling\* Relative Throughput
25%            1                    1.00                                     1.39
50%            2                    1.00                                     1.28
100%           4                    1.00                                     1.00

\* Each distributed job has two MPI processes per node.

Fourth Test

The last test involves running two different applications on the 4 node cluster. It compares the performance of running the cluster fully loaded and changing how the applications are run, either compact or distributed. The comparisons are made against the individual application running the compact strategy (as few nodes as possible). It shows that appropriately mixing jobs can give better job performance than running just one kind of application on a single cluster.

Multiple Job, Multiple Application Throughput Results
Comparing Scheduling Strategies
2009.2 ECLIPSE 300 MM8 2 Million Cell and 3D Kirchhoff Time Migration (PSTM)

Number of  Number of  Compact          Distributed      Distributed      Compact          Cluster
PSTM Jobs  MM8 Jobs   Scheduling       Scheduling       Scheduling       Scheduling       Load
                      (1 node x 8      (4 nodes x 2     (4 nodes x 4     (2 nodes x 8
                      processes        processes        processes        processes
                      per job)         per job)         per job)         per job)
                      ECLIPSE          ECLIPSE          PSTM             PSTM
0          1          1.00             1.40             -                -                25%
0          2          1.00             1.27             -                -                50%
0          4          0.99             0.98             -                -                100%
1          2          -                1.19             1.02             -                100%
2          0          -                -                1.07             0.96             100%
1          0          -                -                1.08             1.00             50%

- indicates that no result was reported for that column.

Results and Configuration Summary

Hardware Configuration:

8 x Sun Fire X2270 servers, each with
  2 x 2.93 GHz Intel Xeon X5570 processors
  24 GB memory (6 x 4 GB memory at 1333 MHz)
  1 x 500 GB SATA
Sun Storage 7410 system, 24 TB total, QDR InfiniBand
  4 x 2.3 GHz AMD Opteron 8356 processors
  128 GB memory
  2 x internal 233 GB SAS drives (466 GB total)
  2 x internal 93 GB read optimized SSDs (186 GB total)
  1 x Sun Storage J4400 with 22 x 1 TB SATA drives and 2 x 18 GB write optimized SSDs
  20 TB RAID-Z2 (double parity) data and 2-way striped write optimized SSD or
  11 TB mirrored data and mirrored write optimized SSD
QDR InfiniBand Switch

Software Configuration:

SUSE Linux Enterprise Server 10 SP 2
Scali MPI Connect 5.6.6
GNU C 4.1.2 compiler
2009.2 ECLIPSE 300
ECLIPSE license daemon flexlm v11.3.0.0
3D Kirchhoff Time Migration

Benchmark Description

The benchmark is a home-grown study in resource usage options when running the Schlumberger ECLIPSE 300 Compositional reservoir simulator with 8 rank parallelism (MM8) to process Schlumberger's standard 2 Million Cell benchmark model. Schlumberger pre-built executables were used to process a 260x327x73 (2 Million Cell) sub-grid with 6,206,460 total grid cells and model 7 different compositional components within a reservoir. No source code modifications or executable rebuilds were conducted.
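
As a quick check on the grid size quoted above, the three sub-grid dimensions multiply out to the stated total cell count. The variable names in this short sketch are illustrative only.

```python
# Sub-grid dimensions as given in the benchmark description.
nx, ny, nz = 260, 327, 73

# Total number of grid cells in the full nx x ny x nz box.
total_cells = nx * ny * nz
print(f"{total_cells:,} total grid cells")
```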

The ECLIPSE 300 MM8 job uses 8 MPI processes. It can run within a single node (compact) or across multiple nodes of a cluster (distributed). By using the MM8 job, it is possible to compare the performance between running each job on a separate node using local disk and using a shared network attached storage solution. The benchmark tests study the effect of increasing the number of MM8 jobs in a throughput model.

The first test compares the performance of running 1, 2, 4, 6 and 8 jobs on a cluster of 8 nodes using local disk, NFSoIB disk, and the Sun Storage 7410 system connected via InfiniBand. Results are compared against the time it takes to run 1 job with local disk. This test shows what performance impact there is when loading down a cluster.
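
The Cluster Load percentages in the result tables appear consistent with dividing the number of running MPI processes by the number of cores in the cluster. The sketch below works under that assumption and also assumes 8 cores per node (two quad-core Xeon X5570 processors). The function name `cluster_load_percent` is hypothetical.

```python
import math

RANKS_PER_MM8_JOB = 8  # each MM8 job runs 8 MPI processes
CORES_PER_NODE = 8     # assumption: 2 x quad-core Xeon X5570 per node


def cluster_load_percent(total_ranks: int, nodes: int,
                         cores_per_node: int = CORES_PER_NODE) -> int:
    """Share of cluster cores occupied by MPI processes, rounded half up."""
    load = 100.0 * total_ranks / (nodes * cores_per_node)
    # floor(x + 0.5) rounds 12.5 up to 13; Python's round() would give 12.
    return math.floor(load + 0.5)


for jobs in (1, 2, 4, 6, 8):
    load = cluster_load_percent(jobs * RANKS_PER_MM8_JOB, nodes=8)
    print(f"{jobs} MM8 job(s) on 8 nodes: {load}% cluster load")
```

Under these assumptions, 1, 2, 4, 6 and 8 jobs on 8 nodes give 13%, 25%, 50%, 75% and 100%, the same values as the Cluster Load column of the first test.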

The second test compares different methods of scheduling jobs on a cluster. The compact method involves putting all 8 MPI processes for a job on the same node. The distributed method involves using 1 MPI process per node. The results compare the performance against 1 job on one node.

The third test is similar to the second test, but uses only 4 nodes in the cluster, so when running distributed, there are 2 MPI processes per node.
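
The difference between the two scheduling methods can be illustrated with a small placement function. This is only a sketch: the function `place_ranks`, the node names, and the choice to give each node a contiguous block of ranks are assumptions made for illustration, not the scheduler or MPI launcher settings used in the benchmark.

```python
from typing import Dict, List


def place_ranks(num_ranks: int, nodes: List[str], mode: str) -> Dict[str, List[int]]:
    """Map MPI ranks to nodes for a hypothetical placement policy."""
    if not nodes:
        raise ValueError("at least one node is required")
    if mode == "compact":
        # All ranks of the job share one node.
        return {nodes[0]: list(range(num_ranks))}
    if mode == "distributed":
        # Spread ranks evenly; each node gets a contiguous block (assumption).
        if num_ranks % len(nodes) != 0:
            raise ValueError("ranks must divide evenly across nodes")
        per_node = num_ranks // len(nodes)
        return {
            node: list(range(i * per_node, (i + 1) * per_node))
            for i, node in enumerate(nodes)
        }
    raise ValueError(f"unknown mode: {mode}")


eight_nodes = [f"node{i}" for i in range(8)]
four_nodes = eight_nodes[:4]

print(place_ranks(8, eight_nodes, "compact"))      # 8 ranks on one node
print(place_ranks(8, eight_nodes, "distributed"))  # 1 rank per node
print(place_ranks(8, four_nodes, "distributed"))   # 2 ranks per node
```

On 8 nodes the distributed placement gives each node one MM8 process, and on 4 nodes it gives each node two, as described for the second and third tests.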

The fourth test compares the compact and distributed scheduling methods on 4 nodes while running two MM8 jobs and one 16-way parallel 3D Prestack Kirchhoff Time Migration (PSTM) job.

Key Points and Best Practices

  • ECLIPSE is very sensitive to memory bandwidth and needs to be run on 1333 MHz or greater memory speeds. In order to maintain 1333 MHz memory, the maximum memory configuration for the processors used in this benchmark is 24 GB. BIOS upgrades now allow 1333 MHz memory for up to 48 GB of memory. Additional nodes can be used to handle data sets that require more memory than is available per node. Allocating at least 20% of memory per node for I/O caching helps application performance.

  • If allocating an 8-way parallel job (MM8) to a single node, it is best to use an ECLIPSE license for that particular node to avoid any additional network overhead of sharing a global license with all the nodes in a cluster.

  • Understanding the ECLIPSE MM8 I/O access patterns is essential to optimizing a shared storage solution. The analytics available on the Oracle Unified Storage 7410 provide valuable I/O characterization information even without source code access. A single MM8 job run shows an initial read and write load related to reading the input grid, parsing Petrel ASCII input parameter files, and creating an initial solution grid and runtime specifications. This is followed by a very long running simulation that writes data and restart files and generates reports to the 7410. Due to the nature of the small block I/O, the mirrored configuration for the 7410 outperformed the RAID-Z2 configuration.

    A single MM8 job reads, processes, and writes approximately 240 MB of grid and property data in the first 36 seconds of execution. The actual read and write of the grid data, which is intermixed with this first stage of processing, is done at a rate of 240 MB/sec to the 7410 for each of the two operations.

    Then, it calculates and reports the well connections at an average of 260 KB/second of writes at 32 operations/second (roughly 32 x 8 KB writes/second). However, the actual size of each I/O operation varies between 2 and 100 KB, and there are peaks every 20 seconds. The write cache is on average operating at 8 accesses/second at approximately 61 KB/second (roughly 8 x 8 KB writes/second). As the number of concurrent jobs increases, the interconnect traffic and random I/O operations per second to the 7410 increase.

  • MM8 multiple job startup time is reduced on shared file systems if each job uses separate input files.
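
The small-block write rates quoted in the I/O characterization above follow from multiplying operations per second by a nominal block size. The sketch below reproduces that approximation. The 8 KB block size is the nominal value used in the text, and the function name `approx_kb_per_sec` is hypothetical.

```python
BLOCK_KB = 8  # nominal write size used in the text's approximation


def approx_kb_per_sec(ops_per_sec: int, block_kb: int = BLOCK_KB) -> int:
    """Approximate write rate from operation rate and a nominal block size."""
    return ops_per_sec * block_kb


well_report = approx_kb_per_sec(32)  # text reports an average of about 260 KB/s
write_cache = approx_kb_per_sec(8)   # text reports approximately 61 KB/s

print(f"well connection reporting: ~{well_report} KB/s")
print(f"write cache: ~{write_cache} KB/s")
```

The nominal products (256 KB/s and 64 KB/s) differ slightly from the measured averages because, as noted above, individual I/O sizes range from 2 to 100 KB.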

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/20/2010.

