Tuesday Jul 21, 2009

1.6 GHz SPEC CPU2006 - Rate Benchmarks

UltraSPARC T2 and T2 Plus Systems

Improved Performance Over 1.4 GHz

Reported 07/21/09

Significance of Results

Results are presented for the SPEC CPU2006 rate benchmarks run on systems based on the new 1.6 GHz Sun UltraSPARC T2 and UltraSPARC T2 Plus processors. The new processors were tested in the Sun CMT family of systems, including the Sun SPARC Enterprise T5120, T5220, T5240, and T5440 servers and the Sun Blade T6320 server module.

SPECint_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 57% and 37% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5240 server, equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 68% and 48% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor-based Sun CMT servers produced 59% to 68% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 integer throughput metrics.

  • On the four-chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 25% and 20% over the 1.4 GHz version of this server, as measured by the SPEC CPU2006 integer throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into the 2-chip Sun SPARC Enterprise T5240 server, delivered improvements of 20% and 17% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server, as measured by the SPEC CPU2006 integer throughput metrics.

  • On the single-chip Sun Blade T6320 server module and the Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered performance improvements of 13% to 17% over the 1.4 GHz versions of these servers, as measured by the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a SPECint_rate_base2006 score 3X the best 4-chip Itanium based system.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processors, delivered a SPECint_rate_base2006 score of 338, a World Record score for 4-chip systems running a single operating system instance (i.e. SMP, not clustered).

SPECfp_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 35% and 22% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5240 server, equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 40% and 27% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor based Sun CMT servers produced 18% to 24% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 floating-point throughput metrics.

  • On the four-chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 20% and 17% compared with the 1.4 GHz processors in the same system, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into a Sun SPARC Enterprise T5240 server, delivered an improvement of 12% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server as measured by the SPEC CPU2006 floating-point throughput metrics.

  • On the single-chip Sun Blade T6320 server module and the Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered a performance improvement of 10% to 11% over the 1.4 GHz versions of these servers, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a peak score 3X and a base score 2.9X that of the best 4-chip Itanium based system on the SPEC CPU2006 floating-point throughput metrics.

Performance Landscape

SPEC CPU2006 Performance Charts - bigger is better, selected results, please see www.spec.org for complete results. All results as of 7/17/09.

In the tables below
"Base" = SPECint_rate_base2006 or SPECfp_rate_base2006
"Peak" = SPECint_rate2006 or SPECfp_rate2006

SPECint_rate2006 results - 1 chip systems

System  Cores/Chips  Processor Type  MHz  Base Copies  Base  Peak  Comments
Supermicro X8DAI 4/1 Xeon W3570 3200 8 127 136 Best Nehalem result
HP ProLiant BL465c G6 6/1 Opteron 2435 2600 6 82.1 104 Best Istanbul result
Sun SPARC T5220 8/1 UltraSPARC T2 1582 63 89.1 97.0 New
Sun SPARC T5120 8/1 UltraSPARC T2 1582 63 89.1 97.0 New
Sun Blade T6320 8/1 UltraSPARC T2 1582 63 89.2 96.7 New
Sun Blade T6320 8/1 UltraSPARC T2 1417 63 76.4 85.5
Sun SPARC T5120 8/1 UltraSPARC T2 1417 63 76.2 83.9
IBM System p 570 2/1 POWER6 4700 4 53.2 60.9 Best POWER6 result

SPECint_rate2006 - 2 chip systems

System  Cores/Chips  Processor Type  MHz  Base Copies  Base  Peak  Comments
Fujitsu CELSIUS R670 8/2 Xeon W5580 3200 16 249 267 Best Nehalem result
Sun Blade X6270 8/2 Xeon X5570 2933 16 223 260
A+ Server 1021M-UR+B 12/2 Opteron 2439 SE 2800 12 168 215 Best Istanbul result
Sun SPARC T5240 16/2 UltraSPARC T2 Plus 1582 127 171 183 New
Sun SPARC T5240 16/2 UltraSPARC T2 Plus 1415 127 142 157
IBM Power 520 4/2 POWER6+ 4700 8 101 124 Best POWER6+ peak
IBM Power 520 4/2 POWER6+ 4700 8 102 122 Best POWER6+ base
HP Integrity rx2660 4/2 Itanium 9140M 1666 4 58.1 62.8 Best Itanium peak
HP Integrity BL860c 4/2 Itanium 9140M 1666 4 61.0 na Best Itanium base

SPECint_rate2006 - 4 chip systems

System  Cores/Chips  Processor Type  MHz  Base Copies  Base  Peak  Comments
SGI Altix ICE 8200EX 16/4 Xeon X5570 2933 32 466 499 Best Nehalem result (clustered, not SMP)
Tyan Thunder n4250QE 24/4 Opteron 8439 SE 2800 24 326 417 Best Istanbul result
Sun SPARC T5440 32/4 UltraSPARC T2 Plus 1596 255 338 360 New. World record 4-chip SMP SPECint_rate_base2006
Sun SPARC T5440 32/4 UltraSPARC T2 Plus 1414 255 270 301
IBM Power 550 8/4 POWER6+ 5000 16 215 263 Best POWER6+ result
HP Integrity BL870c 8/4 Itanium 9150N 1600 8 114 na Best Itanium result

SPECfp_rate2006 - 1 chip systems

System  Cores/Chips  Processor Type  MHz  Base Copies  Base  Peak  Comments
Supermicro X8DAI 4/1 Xeon W3570 3200 8 102 106 Best Nehalem result
HP ProLiant BL465c G6 6/1 Opteron 2435 2600 6 65.2 72.2 Best Istanbul result
Sun SPARC T5220 8/1 UltraSPARC T2 1582 63 64.1 68.5 New
Sun SPARC T5120 8/1 UltraSPARC T2 1582 63 64.1 68.5 New
Sun Blade T6320 8/1 UltraSPARC T2 1582 63 64.1 68.5 New
Sun Blade T6320 8/1 UltraSPARC T2 1417 63 58.1 62.3
SPARC T5120 8/1 UltraSPARC T2 1417 63 57.9 62.3
SPARC T5220 8/1 UltraSPARC T2 1417 63 57.9 62.3
IBM System p 570 2/1 POWER6 4700 4 51.5 58.0 Best POWER6 result

SPECfp_rate2006 - 2 chip systems

System  Cores/Chips  Processor Type  MHz  Base Copies  Base  Peak  Comments
ASUS TS700-E6 8/2 Xeon W5580 3200 16 201 207 Best Nehalem result
A+ Server 1021M-UR+B 12/2 Opteron 2439 SE 2800 12 133 147 Best Istanbul result
Sun SPARC T5240 16/2 UltraSPARC T2 Plus 1582 127 124 133 New
Sun SPARC T5240 16/2 UltraSPARC T2 Plus 1415 127 111 119
IBM Power 520 4/2 POWER6+ 4700 8 88.7 105 Best POWER6+ result
HP Integrity rx2660 4/2 Itanium 9140M 1666 4 54.5 55.8 Best Itanium result

SPECfp_rate2006 - 4 chip systems

System  Cores/Chips  Processor Type  MHz  Base Copies  Base  Peak  Comments
SGI Altix ICE 8200EX 16/4 Xeon X5570 2933 32 361 372 Best Nehalem result
Tyan Thunder n4250QE 24/4 Opteron 8439 SE 2800 24 259 285 Best Istanbul result
Sun SPARC T5440 32/4 UltraSPARC T2 Plus 1596 255 254 270 New
Sun SPARC T5440 32/4 UltraSPARC T2 Plus 1414 255 212 230
IBM Power 550 8/4 POWER6+ 5000 16 188 222 Best POWER6+ result
HP Integrity rx7640 8/4 Itanium 2 9040 1600 8 87.4 90.8 Best Itanium result

Results and Configuration Summary

Test Configurations:


Sun Blade T6320
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5120/T5220
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5240
2 x 1.6 GHz UltraSPARC T2 Plus
128 GB (32 x 4GB)
Solaris 10 5/09
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5440
4 x 1.6 GHz UltraSPARC T2 Plus
256 GB (64 x 4GB)
Solaris 10 5/09
Sun Studio 12 Update 1, gccfss V4.2.1

Results Summary:



T6320 T5120 T5220 T5240 T5440
SPECint_rate_base2006 89.2 89.1 89.1 171 338
SPECint_rate2006 96.7 97.0 97.0 183 360
SPECfp_rate_base2006 64.1 64.1 64.1 124 254
SPECfp_rate2006 68.5 68.5 68.5 133 270

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark, with over 7000 results published in the three years since it was introduced. It measures:

  • "Speed" - single-copy performance of chip, memory, and compiler
  • "Rate" - multi-copy throughput

The rate metrics are used for the throughput-oriented systems described on this page. These metrics include:

  • SPECint_rate2006: throughput for 12 integer benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • SPECfp_rate2006: throughput for 17 floating point benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

There are "base" variants of both the above metrics that require more conservative compilation, such as using the same flags for all benchmarks.

See www.spec.org for additional information.

Key Points and Best Practices

Results on this page for the Sun SPARC Enterprise T5120 server were measured on a Sun SPARC Enterprise T5220. The Sun SPARC Enterprise T5120 and T5220 are electronically equivalent: a T5120 can hold up to 4 disks, and a T5220 up to 8. This system was tested with 4 disks; therefore, results on this page apply to both the T5120 and the T5220.

Know when you need throughput vs. speed. The Sun CMT systems described on this page provide massive throughput: up to 255 jobs are run on the 4-chip system, 127 on the 2-chip system, and 63 on the 1-chip systems. Some competitive chips, e.g. Nehalem and Istanbul, do have a speed advantage, but none of the competitive results attempt to run the large number of jobs tested on Sun's CMT systems.
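The copy counts above follow directly from the chip's thread count. A small sketch of the arithmetic (the assumption that one hardware thread is left unloaded is mine, inferred from the 64n-1 pattern in the submissions):

```shell
# Each UltraSPARC T2 / T2 Plus chip has 8 cores x 8 threads = 64 hardware
# threads; the submissions run one benchmark copy per thread, apparently
# leaving one thread free (63, 127, 255 copies on 1-, 2-, 4-chip systems).
threads_per_chip=$((8 * 8))
for chips in 1 2 4; do
  echo "$chips chip(s): $((chips * threads_per_chip - 1)) copies"
done
```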

Use the latest compiler. The Sun Studio group is always working to improve the compiler. Sun Studio 12, and Sun Studio 12 Update 1, which are used in these submissions, provide updated code generation for a wide variety of SPARC and x86 implementations.

I/O still counts. Even in a CPU-intensive workload, some I/O remains. This point is explored in some detail at http://blogs.sun.com/jhenning/entry/losing_my_fear_of_zfs.

Disclosure Statement

SPEC, SPECint, and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Competitive results from www.spec.org as of 16 July 2009. Sun's new results quoted on this page have been submitted to SPEC.
Sun Blade T6320 89.2 SPECint_rate_base2006, 96.7 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 89.1 SPECint_rate_base2006, 97.0 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5240 171 SPECint_rate_base2006, 183 SPECint_rate2006, 124 SPECfp_rate_base2006, 133 SPECfp_rate2006;
Sun SPARC Enterprise T5440 338 SPECint_rate_base2006, 360 SPECint_rate2006, 254 SPECfp_rate_base2006, 270 SPECfp_rate2006;
Sun Blade T6320 76.4 SPECint_rate_base2006, 85.5 SPECint_rate2006, 58.1 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 76.2 SPECint_rate_base2006, 83.9 SPECint_rate2006, 57.9 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5240 142 SPECint_rate_base2006, 157 SPECint_rate2006, 111 SPECfp_rate_base2006, 119 SPECfp_rate2006;
Sun SPARC Enterprise T5440 270 SPECint_rate_base2006, 301 SPECint_rate2006, 212 SPECfp_rate_base2006, 230 SPECfp_rate2006;
IBM p 570 53.2 SPECint_rate_base2006, 60.9 SPECint_rate2006, 51.5 SPECfp_rate_base2006, 58.0 SPECfp_rate2006;
IBM Power 520 102 SPECint_rate_base2006, 124 SPECint_rate2006, 88.7 SPECfp_rate_base2006, 105 SPECfp_rate2006;
IBM Power 550 215 SPECint_rate_base2006, 263 SPECint_rate2006, 188 SPECfp_rate_base2006, 222 SPECfp_rate2006;
HP Integrity BL870c 114 SPECint_rate_base2006;
HP Integrity rx7640 87.4 SPECfp_rate_base2006, 90.8 SPECfp_rate2006.

New SPECjAppServer2004 Performance on the Sun SPARC Enterprise T5440

One Sun SPARC Enterprise T5440 server with four UltraSPARC T2 Plus processors at 1.6GHz delivered a single-system World Record result of 7661.16 SPECjAppServer2004 JOPS@Standard using Oracle WebLogic Server, a component of Oracle Fusion Middleware, together with Oracle Database 11g.

  • This benchmark used the Oracle WebLogic 10.3.1 Application Server and Oracle Database 11g Enterprise Edition. This benchmark result proves that the Sun SPARC Enterprise T5440 server using the UltraSPARC T2 Plus processor performs as an outstanding J2EE application server as well as an Oracle 11g OLTP database server.
  • The Sun SPARC Enterprise T5440 server (four 1.6 GHz UltraSPARC T2 Plus chips) running as the application server delivered 6.4X better performance than the best published single application server result from the IBM p 570 system based on the 4.7 GHz POWER6 processor.
  • The Sun SPARC Enterprise T5440 server (four 1.6 GHz UltraSPARC T2 Plus chips) demonstrated 73% better performance than the HP DL580 G5 result of 4410.07 SPECjAppServer2004 JOPS@Standard, which used four 2.66 GHz Intel 6-core Xeon processors.
  • The Sun SPARC Enterprise T5440 server (four 1.6 GHz UltraSPARC T2 Plus chips) demonstrated 2.3X better performance than the HP DL580 G5 result of 3339.94 SPECjAppServer2004 JOPS@Standard, which used four 2.93 GHz Intel 4-core Xeon processors.
  • One Sun SPARC Enterprise T5440 server (four 1.6 GHz UltraSPARC T2 Plus chips) demonstrated 1.9X better performance than the Dell PowerEdge R610 result of 3975.13 SPECjAppServer2004 JOPS@Standard, which used two 2.93 GHz Intel 4-core Xeon processors.
  • One Sun SPARC Enterprise T5440 server (four 1.6 GHz UltraSPARC T2 Plus chips) demonstrated 5% better performance than the Dell PowerEdge R610 result of 7311.50 SPECjAppServer2004 JOPS@Standard, which used two Dell R610 systems each with two 2.93 GHz Intel 4-core Xeon processors.
  • These results were obtained using Sun Java SE 6 Update 14 Performance Release on the Sun SPARC Enterprise T5440 server and running the Solaris 10 5/09 Operating Environment.
  • The Sun SPARC Enterprise T5440 server used Solaris Containers technology to consolidate 7 Oracle WebLogic application server instances to achieve this result.
  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle WebLogic Server’s on-going, record-setting Java application server performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

Performance Landscape

SPECjAppServer2004 Performance Chart as of 7/20/2009. Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org. SPECjAppServer2004 JOPS@Standard (bigger is better)

Vendor | JOPS@Standard | J2EE Server | DB Server
Sun | 7661.16 | 1x Sun SPARC Enterprise T5440 (32 cores, 4 chips, 1.6 GHz US-T2 Plus), Oracle WebLogic 10.3.1 | 1x Sun SPARC Enterprise T5440 (32 cores, 4 chips, 1.4 GHz US-T2 Plus), Oracle 11g DB 11.1.0.7
Dell | 7311.50 | 2x PowerEdge R610 (16 cores, 4 chips, 2.93 GHz Xeon X5570), Oracle WebLogic 10.3 | 1x PowerEdge R900 (24 cores, 4 chips, 2.66 GHz Xeon X7460), Oracle 11g DB 11.1.0.7
Sun | 6334.86 | 1x Sun SPARC Enterprise T5440 (32 cores, 4 chips, 1.4 GHz US-T2 Plus), Oracle WebLogic 10.3 | 1x Sun SPARC Enterprise T5440 (32 cores, 4 chips, 1.4 GHz US-T2 Plus), Oracle 11g DB 11.1.0.7
Dell | 4794.33 | 2x PowerEdge 2950 (16 cores, 4 chips, 3.3 GHz Xeon X5470), Oracle WebLogic 10.3 | 1x PowerEdge R900 (24 cores, 4 chips, 2.66 GHz Xeon X7460), Oracle 11g DB 11.1.0.6
HP | 4410.07 | 1x ProLiant DL580 G5 (24 cores, 4 chips, 2.66 GHz Xeon X7460), Oracle WebLogic 10.3 | 1x ProLiant DL580 G5 (24 cores, 4 chips, 2.66 GHz Xeon X7460), Oracle 11g DB 11.1.0.6
Dell | 3975.13 | 1x PowerEdge R610 (8 cores, 2 chips, 2.93 GHz Xeon X5570), Oracle WebLogic 10.3 | 1x PowerEdge R900 (24 cores, 4 chips, 2.66 GHz Xeon X7460), Oracle 11g DB 11.1.0.7
IBM | 1197.51 | 1x IBM System p 570 (4 cores, 2 chips, 4.7 GHz POWER6), WebSphere Application Server V6.1 | 1x IBM p5 550 (4 cores, 2 chips, 2.1 GHz POWER5+), IBM DB2 Universal Database 9.1

Results and Configuration Summary

Application Server:
    Sun SPARC Enterprise T5440
      4 x 1.6 GHz 8-core UltraSPARC T2 Plus
      256 GB memory
      2 x 10GbE XAUI NIC
      2 x 32GB SATA SSD
      Solaris 10 5/09
      Solaris Containers
      Oracle WebLogic 10.3.1 Application Server - Standard Edition
      Oracle Fusion Middleware
      JDK 1.6.0_14 Performance Release

Database Server:

    Sun SPARC Enterprise T5440
      4 x 1.4 GHz 8-core UltraSPARC T2 Plus
      256 GB memory
      6 x Sun StorageTek 2540 FC Array
      4 x Sun StorageTek 2501 FC Expansion Array
      Solaris 10 5/09
      Oracle Database Enterprise Edition Release 11.1.0.7

Benchmark Description

SPECjAppServer2004 (Java Application Server) is a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition (J2EE) technology-based application servers. SPECjAppServer2004 is an end-to-end application which exercises all major J2EE technologies implemented by compliant application servers as follows:
  • The web container, including servlets and JSPs
  • The EJB container
  • EJB2.0 Container Managed Persistence
  • JMS and Message Driven Beans
  • Transaction management
  • Database connectivity
Moreover, SPECjAppServer2004 also heavily exercises all parts of the underlying infrastructure that make up the application environment, including hardware, JVM software, database software, JDBC drivers, and the system network.

The primary metric of the SPECjAppServer2004 benchmark is jAppServer Operations Per Second (JOPS), which is calculated by adding the metrics of the Dealership Management Application in the Dealer Domain and the Manufacturing Application in the Manufacturing Domain. There is NO price/performance metric in this benchmark.

Key Points and Best Practices

  • 7 Oracle WebLogic server instances on the Sun SPARC Enterprise T5440 server were hosted in separate Solaris Containers to demonstrate consolidation of multiple application servers.
  • Each application server container was bound to a separate processor set, each containing 4 cores. This was done to improve performance by reducing memory access latency, using the physical memory closest to the processors. The default set was used for network and disk interrupt handling.
  • The Oracle WebLogic application servers were executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • The Oracle database processes were run in 4 processor sets created with the psrset utility and executed in the FX scheduling class. This was done to improve performance by reducing memory access latency and reducing context switches.
  • The Oracle Log Writer process ran in a separate processor set containing 1 core and ran in the RT scheduling class. This was done to ensure that the Log Writer had the most efficient use of CPU resources.
  • Enhancements to the JVM had a major impact on performance.
  • The Sun SPARC Enterprise T5440 used 2x 10GbE NICs for network traffic from the driver systems.
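The processor-set and scheduling-class tunings described above map onto standard Solaris commands. A hedged sketch, not the submission's exact script: the set IDs, CPU IDs, and process IDs below are hypothetical, and the real bindings are in the benchmark report.

```shell
# Create a processor set from the 8 hardware threads of one core
# (CPU IDs 0-7 are hypothetical); psrset prints the new set ID, e.g. 1.
psrset -c 0 1 2 3 4 5 6 7

# Bind an application server JVM (hypothetical PID 1234) to set 1.
psrset -b 1 1234

# Move the JVM into the fixed-priority (FX) scheduling class to reduce
# the frequency of context switches.
priocntl -s -c FX -i pid 1234

# The Oracle log writer instead runs in the real-time (RT) class.
priocntl -s -c RT -i pid 5678
```

These commands are Solaris-specific; on other platforms the equivalent mechanisms are cpusets and scheduling policies.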

Disclosure Statement

SPECjAppServer2004, Sun SPARC Enterprise T5440 (4 chips, 32 cores) 7661.16 SPECjAppServer2004 JOPS@Standard; HP DL580 G5 (4 chips, 24 cores) 4410.07 SPECjAppServer2004 JOPS@Standard; HP DL580 G5 (4 chips, 16 cores) 3339.94 SPECjAppServer2004 JOPS@Standard; Two Dell PowerEdge 2950 (4 chips, 16 cores) 4794.33 SPECjAppServer2004 JOPS@Standard; Dell PowerEdge R610 (2 chips, 8 cores) 3975.13 SPECjAppServer2004 JOPS@Standard; Two Dell PowerEdge R610 (4 chips, 16 cores) 7311.50 SPECjAppServer2004 JOPS@Standard; IBM System p 570 (2 chips, 4 cores) 1197.51 SPECjAppServer2004 JOPS@Standard. SPEC and SPECjAppServer are registered trademarks of the Standard Performance Evaluation Corporation. Results from http://www.spec.org as of 7/20/09.

Friday Jul 03, 2009

SPECmail2009 on Sun Fire X4275+Sun Storage 7110: Mail Server System Solution

Significance of Results

Sun has a new SPECmail2009 result on a Sun Fire X4275 server and Sun Storage 7110 Unified Storage Systems running Sun Java System Messaging Server 6.2. OpenStorage and ZFS were a key part of this new World Record SPECmail2009 result.

  • The Sun Fire X4275 server, equipped with two 2.93 GHz quad-core Intel Xeon X5570 processors and running Sun Java System Messaging Server 6.2 on Solaris 10, achieved a new World Record of 8000 SPECmail_Ent2009 IMAP4 users at 38,348 sessions/hour.

  • The Sun result was obtained with about half the disk spindles of Apple's direct-attached storage solution, using Sun Storage 7110 Unified Storage Systems. The Sun submission is the first result using a NAS solution, specifically two Sun Storage 7110 Unified Storage Systems.
  • This benchmark result clearly demonstrates that the Sun Fire X4275 server, together with Sun Java System Messaging Server 6.2 and Solaris 10 on Sun Storage 7110 Unified Storage Systems, can support a large, enterprise-level IMAP mail server environment as a reliable, low-cost solution, delivering top performance while maximizing data integrity with ZFS.

SPECmail2009 Performance Landscape (ordered by performance)

System  Processor Type  GHz  Ch, Co, Th  SPECmail_Ent2009 Users  SPECmail2009 Sessions/hour
Sun Fire X4275 Intel X5570 2.93 2, 8, 16 8000 38,348
Apple Xserv3,1 Intel X5570 2.93 2, 8, 16 6000 28,887
Sun SPARC Enterprise T5220 UltraSPARC T2 1.4 1, 8, 64 3600 17,316

Notes:
    Number of SPECmail_Ent2009 users (bigger is better)
    SPECmail2009 Sessions/hour (bigger is better)
    Ch, Co, Th: Chips, Cores, Threads

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org

Results and Configuration Summary

Hardware Configuration:
    Sun Fire X4275
      2 x 2.93 GHz QC Intel Xeon X5570 processors
      72 GB
      12 x 300GB, 10000 RPM SAS disk

    2 x Sun Storage 7110 Unified Storage System, each with
      16 x 146GB SAS 10K RPM

Software Configuration:

    O/S: Solaris 10
    ZFS
    Mail Server: Sun Java System Messaging Server 6.2

Benchmark Description

The SPECmail2009 benchmark measures the ability of corporate e-mail systems to meet the needs of today's demanding e-mail users over fast corporate local area networks (LANs). The SPECmail2009 benchmark simulates corporate mail server workloads that range from 250 to 10,000 or more users, using industry-standard SMTP and IMAP4 protocols. This e-mail server benchmark creates client workloads based on a 40,000-user corporation, and uses folder and message MIME structures that include both traditional office documents and a variety of rich media content. The benchmark also adds support for encrypted network connections using industry-standard SSL v3.0 and TLS 1.0 technology. SPECmail2009 replaces all versions of SPECmail2008, first released in August 2008. The results from the two benchmarks are not comparable.

Software on one or more client machines generates a benchmark load for a System Under Test (SUT) and measures the SUT response times. A SUT can be a mail server running on a single system or a cluster of systems.

A SPECmail2009 'run' simulates a 100% load level associated with the specific number of users, as defined in the configuration file. The mail server must maintain a specific Quality of Service (QoS) at the 100% load level to produce a valid benchmark result. If the mail server does maintain the specified QoS at the 100% load level, the performance of the mail server is reported as SPECmail_Ent2009 SMTP and IMAP Users at SPECmail2009 Sessions per hour. The SPECmail_Ent2009 users at SPECmail2009 Sessions per Hour metric reflects the unique workload combination for a SPEC IMAP4 user.

Key Points and Best Practices

  • Each XTA7110 was configured as one RAID1 volume (14 x 143GB) with 2 NFS shared LUNs, accessed by the SUT via NFSv4. The mailstore volumes were mounted with the NFS mount options nointr, hard, xattr. There was a total of 4 Gb/sec of network connectivity between the SUT and the 7110 Unified Storage Systems, using 2 NorthStar dual 1 Gb/sec NICs.
  • The clients used these Java options: java -d64 -Xms4096m -Xmx4096m -XX:+AggressiveHeap
  • See the SPEC Report for all OS, network and messaging server tunings.
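As a hedged sketch, the mount described above would look roughly like this on Solaris (the server name, share path, mount point, and load-generator jar are hypothetical; the exact tunings are in the SPEC report):

```shell
# Mount one mailstore share over NFSv4 with the options from the report
# (nointr, hard, xattr); names and paths are hypothetical.
mount -F nfs -o vers=4,nointr,hard,xattr nas7110-1:/export/mail1 /mailstore1

# Client JVMs used a fixed 4 GB heap with aggressive heap tuning
# (the jar name is a placeholder for the benchmark driver).
java -d64 -Xms4096m -Xmx4096m -XX:+AggressiveHeap -jar loadgen.jar
```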

Disclosure Statement

SPEC, SPECmail reg tm of Standard Performance Evaluation Corporation. Results as of 07/06/2009 on www.spec.org. SPECmail2009: Sun Fire X4275 (8 cores, 2 chips) SPECmail_Ent2009 8000 users at 38,348 SPECmail2009 Sessions/hour. Apple Xserv3,1 (8 cores, 2 chips) SPECmail_Ent2009 6000 users at 28,887 SPECmail2009 Sessions/hour.

Wednesday Jun 17, 2009

Performance of Sun 7410 and 7310 Unified Storage Array Line

Roch (rhymes with Spock) Bourbonnais posted more data showing the performance of Sun's OpenStorage products. Some of his basic conclusions are:

  • The Sun Storage 7410 Unified Storage Array delivers 1 GB/sec throughput performance.
  • The Sun Storage 7310 Unified Storage Array delivers over 500 MB/sec on streaming writes for backups and imaging applications.
  • The Sun Storage 7410 Unified Storage Array delivers over 22,000 8K synchronous writes per second, combining great database performance with the ease of deployment of Network Attached Storage while delivering the economic benefits of inexpensive SATA disks.
  • The Sun Storage 7410 Unified Storage Array delivers over 36,000 random 8K reads per second from a 400GB working set, for great mail application responsiveness. This corresponds to an enterprise of 100,000 people with every employee accessing new data every 3.6 seconds, consolidated on a single server.

You can read more about it at: http://blogs.sun.com/roch/entry/compared_performance_of_sun_7000

Monday Jun 15, 2009

Sun Fire X4600 M2 Server Two-tier SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark

Significance of Results

  • World Record performance result with 8 processors on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark as of June 10, 2009.
  • The Sun Fire X4600 M2 server with 8 AMD Opteron 8384 SE processors (32 cores, 32 threads) achieved 6,050 SAP SD benchmark users running the SAP ERP 6.0 enhancement pack 4 (Unicode) benchmark, using the MaxDB 7.8 database and the Solaris 10 OS.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • ZFS was used in this benchmark for the database and log files.
  • The Sun Fire X4600 M2 server beats both the HP ProLiant DL785 G5 and the NEC Express5800 running Windows by 10% and 35% respectively even though all three systems use the same number of processors.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings required by Unicode encoding. Refer to the SAP Note for more details (https://service.sap.com/sap/support/notes/1139642 - user and password for SAP Service Marketplace required).

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, each of which is just 1 byte. The new version of the benchmark requires Unicode, and the application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
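The byte-count difference described above is easy to check with standard tools; a small sketch using iconv (tool behavior may vary slightly across platforms):

```shell
# An ASCII letter is 1 byte in ASCII/UTF-8 but 2 bytes in UTF-16.
printf 'A' | iconv -f UTF-8 -t UTF-16BE | wc -c    # prints 2

# A character outside the Basic Multilingual Plane (U+1D11E, musical
# G clef, given here as its UTF-8 byte sequence) takes 4 bytes in UTF-16.
printf '\360\235\204\236' | iconv -f UTF-8 -t UTF-16BE | wc -c    # prints 4
```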

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS, Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun Fire X4600 M2, 8x AMD Opteron 8384 SE @2.7GHz, 256 GB | Solaris 10, MaxDB 7.8 | 6,050 | 2009 6.0 EP4 (Unicode) | 33,230 | 4,154 | 10-Jun-09
HP ProLiant DL785 G5, 8x AMD Opteron 8393 SE @3.1GHz, 128 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 5,518 | 2009 6.0 EP4 (Unicode) | 30,180 | 3,772 | 24-Apr-09
NEC Express 5800, 8x Intel Xeon X7460 @2.66GHz, 256 GB | Windows Server 2008 Datacenter Edition, SQL Server 2008 | 4,485 | 2009 6.0 EP4 (Unicode) | 25,280 | 3,160 | 09-Feb-09
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10, Oracle 10g | 3,700 | 2009 6.0 EP4 (Unicode) | 20,300 | 10,150 | 30-Mar-09

SAP ERP 6.0 (non-unicode) Results
(Old version of the benchmark retired at the end of 2008)

System | OS, Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun Fire X4600 M2, 8x AMD Opteron 8384 @2.7GHz, 128 GB | Solaris 10, MaxDB 7.6 | 7,825 | 2005 6.0 | 39,270 | 4,909 | 09-Dec-08
IBM System x3650, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2003 EE, DB2 9.5 | 5,100 | 2005 6.0 | 25,530 | 12,765 | 19-Dec-08
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2003 EE, SQL Server 2005 | 4,995 | 2005 6.0 | 25,000 | 12,500 | 15-Dec-08

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4600 M2
      8 x 2.7 GHz AMD Opteron 8384 SE processors (8 processors / 32 cores / 32 threads)
      256 GB memory
      3 x STK2540, 3 x STK2501 each with 12 x 146GB/15KRPM disks

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    MaxDB 7.8

Certified Results

    Performance: 6,050 benchmark users
    SAP Certification: 2009022

Key Points and Best Practices

  • This is the best 8 Processor SAP ERP 6.0 EP4 (Unicode) result as of June 10, 2009.
  • Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark on Sun Fire X4600 M2 (8 processors, 32 cores, 32 threads, 8x2.7 GHz AMD Opteron 8384 SE) was able to support 6,050 SAP SD Users on top of the Solaris 10 OS.
  • Since random writes are an important part of this benchmark, ZFS was used to help coalesce them into sequential writes.
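A minimal sketch of the kind of ZFS setup this implies (the pool name, device names, and mount points are hypothetical; the submission's actual storage layout is in the certification report):

```shell
# Create a pool across the array LUNs (names are placeholders).
zpool create sapdata c2t0d0 c2t1d0 c3t0d0 c3t1d0

# Separate filesystems for database files and logs; ZFS's copy-on-write
# design turns the benchmark's random writes into sequential pool writes.
zfs create -o mountpoint=/sapdb/data sapdata/data
zfs create -o mountpoint=/sapdb/log sapdata/log
```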

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 06/10/09: Sun Fire X4600 M2 (8 processors, 32 cores, 32 threads) 6,050 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384 SE, 256 GB memory, MaxDB 7.8, Solaris 10, Cert# 2009022. HP ProLiant DL785 G5 (8 processors, 32 cores, 32 threads) 5,518 SAP SD Users, 8x 3.1 GHz AMD Opteron 8393 SE, 128 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009009. NEC Express 5800 (8 processors, 48 cores, 48 threads) 4,485 SAP SD Users, 8x 2.66 GHz Intel Xeon X7460, 256 GB memory, SQL Server 2008, Windows Server 2008 Datacenter Edition, Cert# 2009001. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. Sun Fire X4600 M2 (8 processors, 32 cores, 32 threads) 7,825 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384, 128 GB memory, MaxDB 7.6, Solaris 10, Cert# 2008070. IBM System x3650 M2 (2 processors, 8 cores, 16 threads) 5,100 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Thursday Jun 11, 2009

SAS Grid Computing 9.2 utilizing the Sun Storage 7410 Unified Storage System

Sun has demonstrated the first large-scale grid validation for SAS Grid Computing 9.2. This workload showed the strength of Solaris 10 containers for ease of deployment, as well as the value of the Sun Storage 7410 Unified Storage System and Fishworks Analytics in analyzing, tuning, and delivering performance in complex, data-intensive multi-node SAS grid environments.

In order to model the real world, the Grid Endurance Test uses large data sizes and complex data processing, so the results reflect realistic customer scenarios. These benchmark results represent significant engineering effort, collaboration, and coordination between SAS and Sun, and illustrate the commitment of the two companies to provide the best solutions for the most demanding data integration requirements.

  • A combination of 7 Sun Fire X2200 M2 servers utilizing Solaris 10 and a Sun Storage 7410 Unified Storage System showed continued performance improvement as the node count increased from 2 to 7 nodes for the Grid Endurance Test.
  • SAS environments are often complex. Ease of deployment, configuration, use, and ability to observe application IO characteristics (hotspots, trouble areas) are critical for production environments. The power of Fishworks Analytics combined with the reliability of ZFS is a perfect fit for these types of applications.
  • Sun Storage 7410 Unified Storage System (exporting via NFS) satisfied performance needs, with throughput peaking at over 900MB/s (near 10GbE line speed) in this multi-node environment.
  • Solaris 10 Containers were used to create agile and flexible deployment environments. Container deployments were trivially migrated (within minutes) as HW resources became available (Grid expanded).
  • This result is the only large-scale grid validation for SAS Grid Computing 9.2, and the first qualification of OpenStorage for SAS.
  • The tests showed delivered throughput of over 100MB/s through a client's 1Gb connection.

Configuration

The test grid consisted of 8 Sun Fire X2200 M2 servers: 1 configured as the grid manager and 7 as the actual grid nodes. Each node had a 1GbE connection through a Brocade FastIron 1GbE/10GbE switch. The 7410 had a 10GbE connection to the switch and served as the back-end storage, providing the common shared file system that SAS Grid Computing requires. A storage appliance like the 7410 is easy to set up and maintain while satisfying the bandwidth required by the grid. Our particular 7410 consisted of 46 700GB 7200RPM SATA drives, 36GB of write-optimized SSDs, and 300GB of read-optimized SSDs.
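Since SAS Grid Computing only requires that every node see the same file system, each node's setup reduces to a single NFS mount of the 7410 share. A sketch of what that looks like on a Solaris client; the server name, export path, and mount options are assumptions, not the tested values:

```shell
# Hypothetical server name and export path; every grid node mounts the
# same 7410 share so all nodes share one common file system.
mkdir -p /sasgrid
mount -F nfs -o vers=3,rsize=32768,wsize=32768 s7410:/export/sasgrid /sasgrid
```

The 32KB rsize/wsize shown here is an assumption chosen to match the typical SAS IO block size; the defaults may serve equally well.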


About the Test

The workload is a batch mixture. CPU-bound workloads are numerically intensive tests, some using tables varying in row count from 9,000 to almost 200,000. The tables have up to 297 variables, and are processed with both stepwise linear regression and stepwise logistic regression. Other computational tests use GLM (General Linear Model). The IO-intensive jobs vary as well. One test reads raw data from multiple files, then generates 2 SAS data sets, one containing over 5 million records, the second over 12 million. Another IO-intensive job creates a 50-million-record SAS data set, then does lookups against it and finally sorts it into a dimension table. Finally, other jobs are both compute- and IO-intensive.

The SAS IO pattern for all these jobs is almost always sequential, for read, write, and mixed access, as can be viewed via Fishworks Analytics further below. The typical IO block size is 32KB.

Governing the batch is the SAS Grid Manager scheduler, Platform LSF. It decides when to add a job to a node based on the number of open job slots (user defined) and a point-in-time sample of how busy the node actually is. From run to run, jobs end up scheduled differently, making runs less predictable. Inevitably, multiple IO-intensive jobs get scheduled on the same node, saturating its 1Gb connection and creating a bottleneck while other nodes do little to no IO. This is often unavoidable given the variety of behavior a SAS program can exhibit during its lifecycle. For example, a program that starts out CPU-intensive may correctly be scheduled on a node already processing an IO-intensive job, yet turn IO-intensive itself as it proceeds; the placement was the right decision at that point in time.
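As a toy illustration of that point-in-time placement decision (this is not LSF's actual algorithm, and the node names, slot counts, and load figures are made up), picking the least-loaded node that still has a free slot looks like:

```shell
# Columns: node name, free job slots, current load (hardcoded stand-ins
# for real probes).  Pick the least-loaded node that has a free slot.
printf '%s\n' "node1 2 0.91" "node2 1 0.22" "node3 0 0.05" |
awk 'BEGIN { best = 1e9 }
     $2 > 0 && $3 < best { best = $3; pick = $1 }
     END { print "dispatch to", pick }'
# → dispatch to node2 (node3 is idler, but has no free slot)
```

The decision is correct for the snapshot it was made from; nothing in it can anticipate the job changing from CPU-bound to IO-bound later, which is exactly the unpredictability described above.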


Results of scaling up node count

Below is a chart of results scaling from 2 to 7 nodes. The metric is total run time from when the first job is scheduled until the last job is completed.

Scaling of 400 Analytics Batch Workload

Number of Nodes    Time to Completion
2                  6 hr 19 min
3                  4 hr 35 min
4                  3 hr 30 min
5                  3 hr 12 min
6                  2 hr 54 min
7                  2 hr 44 min

One may note that time to completion does not scale linearly as the node count increases. To a large extent this is due to the nature of the workload, as explained above regarding saturated 1Gb connections. If this were a highly tuned benchmark with jobs placed with high precision, we certainly could have improved run times; we did not do this, in order to keep the batch as realistic as possible. On the positive side, we continued to see improved run times up to the 7th node.
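For reference, the speedup implied by the table above, relative to the 2-node baseline (6 hr 19 min = 379 minutes), can be computed directly:

```shell
# Each entry is nodes:minutes, taken from the scaling table above.
awk 'BEGIN {
  base = 379
  n = split("2:379 3:275 4:210 5:192 6:174 7:164", runs, " ")
  for (i = 1; i <= n; i++) {
    split(runs[i], r, ":")
    printf "%s nodes: %.2fx speedup\n", r[1], base / r[2]
  }
}'
# 7 nodes come out at 2.31x the 2-node throughput
```

The flattening from 5 nodes onward is the 1Gb-saturation effect described above, not a limit of the 7410 itself.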

The Fishworks Analytics displays below show several performance statistics with varying numbers of nodes, with more nodes on the left and fewer on the right. The first two graphs show file operations per second, and the third shows network bytes per second. The 7410 provides over 900 MB/sec in the seven-node test. More information about the interpretation of the Fishworks data for these tests will be provided in a later white paper.

An impressive detail in the Fishworks Analytics shot above: throughput of 763MB/s was achieved during the sample period, and that wasn't the top end of what the 7410 could provide. For the tests summarized in the table above, the 7-node run peaked at over 900MB/s through a single 10GbE connection. Clearly the 7410 can sustain a fairly high level of IO.

It is also important to note that while we tried to emulate a real-world scenario, with varying types of jobs and well over 1TB of data manipulated during the batch, this is still a benchmark. The workload tries to encompass a large variety of job behavior; your scenario may differ considerably from what was run here. Along with the scheduling issues, we were seeing signs of pushing this 7410 configuration near its limits (with the SAS IO pattern and data set sizes), which also affected the ability to achieve linear scaling. But many grid environments run workloads that are not very IO-intensive and tend to be CPU-bound with minimal IO requirements. In that scenario one could expect excellent node scaling well beyond what was demonstrated by this batch. To demonstrate this, the batch was rerun without the IO-intensive jobs. The remaining jobs do require some IO, but tend to be restricted to 25MB/s or less per process, and only for initially reading a data set or writing results.

  • 3 nodes ran in 120 minutes
  • 7 nodes ran in 58 minutes
Very nice scaling - near linear, especially given the lag time that can occur when scheduling batch jobs. The point of this exercise: know your workload. In this case, the 7410 solution on the back end was more than up to the demands these 350+ jobs put on it, and there was still room to grow, scaling out more nodes to further reduce overall run time.


Tuning(?)

The question mark is appropriate. Beyond configuring a mirrored (RAID1) share on the 7410, only one parameter made a significant difference. During the IO-intensive periods, single 1Gb client throughput was observed at 120MB/s simplex and 180MB/s duplex, producing well over 100,000 interrupts a second. Jumbo frames were enabled on the 7410 and the clients, reducing interrupts by almost 75% and reducing IO-intensive job run time by an average of 12%. Many other NFS, Solaris, and TCP/IP tunings were tried, with no meaningful improvement in microbenchmarks or the actual batch - a relatively simple setup, for a grid.
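On the client side, enabling jumbo frames is a one-line change; a sketch follows, with the interface name (nxge0) as an assumption - use whatever NIC the node actually has. The MTU must be raised end to end: on the 7410's network port (via its administration UI), on the switch, and on each client.

```shell
# Interface name is an example.  A 9000-byte MTU means fewer, larger
# frames per MB transferred, hence far fewer interrupts on the client.
ifconfig nxge0 mtu 9000        # runtime change on a Solaris client
ifconfig nxge0 | grep mtu      # verify the new MTU took effect
```

If any hop in the path is left at the default 1500-byte MTU, large frames are dropped or fragmented, so verify each segment before measuring.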

Not a direct tuning, but an application change worth mentioning came from the visibility that Analytics provides. Early on, during the load phase of the benchmark, the IO rate was less than spectacular: what should have taken about 4.5 hours was going to take almost a day. Drilling down through Analytics showed us that hundreds of thousands of file opens/closes were occurring that the development team had been unaware of. That was quickly fixed, and the data loader ran at the expected rates.


Okay - Really no other tuning?  How about 10GbE!

Alright, so there was one other thing we tried, outside the test results achieved above. The X2200 we were using is an 8-core box; even when maxing out the 1Gb testing with multiple IO-bound jobs, there were still CPU resources left over. Considering that a higher core count with more memory is becoming the standard for a "client", it makes sense to utilize all those resources. For the case where a node is scheduled with multiple IO jobs, we wanted to see if 10GbE could push up client throughput. Through our testing, two things helped improve performance.

The first was to turn off interrupt blanking. With blanking disabled, packets are processed as they arrive rather than when an interrupt is issued. Doing this resulted in a ~15% increase in duplex throughput. Caveat: there is a reason interrupt blanking exists, and it isn't to slow down your network throughput. Tune this only if you have a decent amount of idle CPU, as disabling interrupt blanking will consume it. The other change that significantly increased throughput through the 10GbE NIC was to use multiple NFS client processes, which we achieved through zones. By adding a second zone, throughput through the single 10GbE interface increased ~30%. The final duplex numbers (also peak throughput) were:

  • 288MB/s no tuning
  • 337MB/s interrupt blanking disabled
  • 430MB/s 2 NFS client processes + interrupt blanking disabled
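The second NFS client instance came from a Solaris local zone. A sketch of the kind of zone configuration involved follows; the zone name, path, interface, and address are hypothetical, since the published results do not include the actual zone setup.

```shell
# Create a minimal second zone on the node; its NFS mounts then act as
# a second client instance driving the same 10GbE link in parallel.
zonecfg -z sasz2 <<EOF
create
set zonepath=/zones/sasz2
add net
set physical=nxge0
set address=192.168.10.22
end
commit
EOF
zoneadm -z sasz2 install
zoneadm -z sasz2 boot
```

Once booted, the zone mounts the same 7410 share as the global zone, and SAS jobs placed in it generate NFS traffic independently of the global zone's client.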


Conclusion - what does this show?

  • SAS Grid Computing, which requires a shared file system between all nodes, can fit very nicely on the 7410 storage appliance. The workload continued to scale as nodes were added.
  • The 7410 can provide very solid throughput, peaking at over 900MB/s (near 10GbE line speed) with the configuration tested.
  • The 7410 is easy to set up and gives an incredible depth of knowledge about the IO your application does, which can lead to optimization.
  • Know your workload: in many cases the 7410 storage appliance can be a great fit at a relatively inexpensive price while providing the benefits described (and others not described) above.
  • 10GbE client networking can help if your 1GbE IO pipeline is a bottleneck and there is a reasonable amount of free CPU overhead.


Additional Reading

Sun.com on SAS Grid Computing and the Sun Storage 7410 Unified Storage System

Description of SAS Grid Computing


Wednesday Jun 03, 2009

Wide Variety of Topics to be discussed on BestPerf

A sample of the various Sun and partner technologies to be discussed:
OpenSolaris, Solaris, Linux, Windows, VMware, gcc, Java, GlassFish, MySQL, Sun Studio, ZFS, DTrace, perflib, Oracle, DB2, Sybase, OpenStorage, CMT, SPARC64, X64, X86, Intel, AMD

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
