Friday Oct 02, 2009

Sun X4270 VMware VMmark benchmark achieves excellent result

The Sun Fire X4270 server delivered an excellent result on VMware's VMmark virtualization benchmark for 8-core platforms running VMware ESX 4.0.

The Sun Fire X4270 server, powered by two 2.93GHz Quad-Core Intel Xeon X5570 processors, achieved a score of 24.18 @ 17 tiles, supporting 102 fully fledged virtual machine instances (17 tiles of 6 VMs each).

With this competitive result of 24.18 @ 17 tiles on the VMmark virtualization benchmark, the Sun Fire X4270 server is within 1% of the top score of 24.35 @ 17 tiles for 8-core platforms with equivalent 1066 MHz memory.

The Sun and VMware partnership offers one of the best virtualization platforms in the industry with the performance and scalability features available on the Sun Fire X4270 server.

Under the heavy load conditions of the VMmark benchmark, the Sun Fire X4270 server delivers near linear scalability.

Consolidating multiple applications onto the Sun Fire X4270 server helps IT organizations cut cost and complexity, increase agility, and reduce data center power and cooling.

Customers can consolidate several Solaris 10 OS, Linux, and Microsoft Windows applications onto a single server using VMware Virtual Infrastructure technology.

The Sun Fire X4270 server achieved the competitive result of 24.18 @ 17 tiles with a simple I/O configuration consisting of one single-port 10GbE network card, one 4Gb dual-port FC HBA, and one SAS/SATA combo HBA supporting 8 internal SATA SSDs.

Competitive Landscape Performance

VMmark 8 Core Results (sorted by score, bigger is better)


System  CPU (GHz*)  GB  ESX ver  Spindles  RAID  Tiles  Score  Pub Date
HP BL490c G6 2 x Xeon X5570 (2.93) 96 4.0 #164009 133 0 17 24.54 09/22/09
Lenovo R525 G2 2 x Xeon X5570 (2.93) 96 4.0 #164009 55 0 17 24.35 06/30/09
Dell PowerEdge R710 2 x Xeon X5570 (2.93) 96 4.0 #164009 98 0 17 24.27 09/08/09
HP BL490 G6 2 x Xeon X5570 (2.93) 96 4.0 #158725 132 0 17 24.24 05/19/09
Fujitsu RX200 S5 2 x Xeon X5570 (2.93) 96 4.0 #164009 291 0 17 24.20 08/11/09
Sun Fire X4270 2 x Xeon X5570 (2.93) 96 4.0 #164009 235 0 17 24.18 09/28/09
HP DL380 G6 2 x Xeon X5570 (2.93) 96 4.0 #148783 120 0 17 24.15 05/19/09
Cisco B200-M1 2 x Xeon X5570 (2.93) 96 4.0 #151628 20 0 17 24.14 04/21/09
IBM BladeCenter HS22 2 x Xeon X5570 (2.93) 96 4.0 #161959 289 0 17 24.05 06/30/09
Dell PowerEdge R710 2 x Xeon X5570 (2.93) 96 4.0 #150817 170 0 17 24.00 04/21/09

Notes:
* Intel Turbo Boost up to 3.33GHz

Configuration Summary

Hardware Configuration:

    Sun Fire X4270 Server
      2 x 2.93GHz Quad-Core Intel Xeon X5570 processors
      96GB memory (12x 8GB DIMMs)
      1x 32GB SATA SSD for OS
      7x 32GB SATA SSD for database VMs
      1x QLE2462 4Gb dual-port Fibre Channel Host Adapter
      1x Intel Pro/10GbE-SR

    Storage
      9x STK2540 + 9x STK2501, RAID level 0, each with
        12x 146GB SAS 15k rpm drives
      1x STK2540, RAID level 0, with
        12x 146GB SAS 15k rpm drives

    Clients
      Sun Blade 6000 Chassis with 10x Sun Blade X6240
      Sun Blade 6000 Chassis with 7x Sun Blade X6240
      Each X6240 equipped with 2 x 2.5 GHz Quad-Core AMD Opteron 2380,
      32GB memory, 1x 73GB 15K rpm SAS disk

Software Configuration:

    VMware OS and Benchmark Software
      VMware ESX 4.0 build #164009
      VMmark 1.1.1

    VMmark Virtual Machines (per tile)
      Mail server
        Windows 2003 32-bit Enterprise Edition
        2 virtual CPUs (vCPUs)
        24 GB disk
        1 GB of memory
        Exchange 2003
      Java server
        Windows 2003 64-bit Enterprise Edition
        2 vCPUs
        8 GB disk
        1 GB of memory
        SPECjbb2005
      Standby server
        Windows 2003 32-bit Enterprise Edition
        1 vCPU
        4 GB disk
        256 MB of memory
        No application
      Web server
        SLES 10 64-bit
        2 vCPUs
        8 GB disk
        512 MB of memory
        SPECweb2005
      Database server
        SLES 10 64-bit
        2 vCPUs
        10 GB disk
        2 GB of memory
        MySQL
      File server
        SLES 10 32-bit
        1 vCPU
        8 GB disk
        256 MB of memory
        dbbench

    Clients
      Windows 2003 32-bit Enterprise Edition
      LoadSim2003, Microsoft Outlook 2003
      SPECjbb Monitor
      Idle VM test
      SPECweb2005 client
      MySQL, Sysbench
      dbbench based tbench_srv.exe
      BEA JRockit 5.0 JVM JDK
      VMmark Harness
      STAF framework and STAX execution engine

Benchmark Description

VMmark is a benchmark developed, distributed and owned by VMware. The purpose of this benchmark is to measure performance and scalability of a pre-established mix of workloads (a Tile), which allows comparisons among similar platforms.

A Tile consists of six fixed workloads, each running in its own Virtual Machine (6 VMs per tile): Mail, Java, Web, Database, and File serving, plus a standby server (spare Virtual Machine).

VMmark benchmark provides two key performance metrics:

  1. The Number of TILES supported by a system, which is an indication of how many systems/applications can be consolidated on one platform (the higher the number of tiles supported, the higher the number of consolidated systems).

  2. The Score, which is an overall measure of the amount of work accomplished by all the Tiles in the system, at a specified level of service for all workloads during a benchmark run. The Score, or amount of work, is a composite of Actions/minute (Mail server), New Orders/minute (Java server), Accesses/minute (Web server), Commits/minute (Database server), and MB/sec (File server).

Thus, among systems with the same number of tiles, the system with the higher score is the system that is capable of producing the greater amount of work. For detailed description of VMmark, tiles and score definition, please refer to http://www.vmware.com/products/vmmark/features.html.
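The relationship between workload throughputs, tiles, and the final score can be sketched in a few lines. This is a simplified illustration only, assuming hypothetical reference throughputs; the official VMmark harness normalizes measured throughputs against its own reference values, applies QoS checks, and works from medians over the measurement interval.

```python
from math import prod

def tile_score(metrics, reference):
    """Geometric mean of each workload's throughput normalized to a
    reference value; the standby VM contributes no metric."""
    ratios = [metrics[w] / reference[w] for w in reference]
    return prod(ratios) ** (1.0 / len(ratios))

def overall_score(tiles, reference):
    """Overall score: the per-tile geometric means summed over all tiles."""
    return sum(tile_score(t, reference) for t in tiles)

# Hypothetical reference throughputs (illustration only)
reference = {"mail_actions_per_min": 300.0,
             "java_new_orders_per_min": 20000.0,
             "web_accesses_per_min": 2000.0,
             "db_commits_per_min": 1500.0,
             "file_mb_per_sec": 40.0}

# Two tiles, each running at exactly the reference rates, score 1.0 apiece
tiles = [dict(reference) for _ in range(2)]
print(round(overall_score(tiles, reference), 2))   # 2.0
```

Under this sketch, each additional tile that sustains reference-level throughput raises the score by about 1.0, which is why both the tile count and the score matter when comparing systems.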


Disclosure Statement

VMware(R) VMmark(tm) is a product of VMware, Inc. VMmark utilizes SPECjbb(r)2005 and SPECweb(r)2005, which are available from the Standard Performance Evaluation Corporation (SPEC). Results from http://www.vmware.com/products/vmmark/ as of September 29, 2009.

Tuesday Sep 22, 2009

Sun X4270 Virtualized for Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

Two-Processor Performance using 8 Virtual CPU Solaris 10 Container Configuration:
  • Sun achieved 36% better performance using Solaris 10 and Solaris 10 Containers than a similar configuration on SUSE Linux using VMware ESX Server 4.0 on the same benchmark, both using 8 virtual CPUs.
  • Solaris Containers are the best virtualization technology for SAP projects and have been supported for more than 4 years. Other virtualization technologies suffer various overheads that decrease performance.
  • The Sun Fire X4270 server with 48G memory and a Solaris 10 container configured with 8 virtual CPUs achieved 2800 SAP SD Benchmark users and beat the Fujitsu PRIMERGY RX300 S5 server with 96G memory and the SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0 by 36%. Both results used the same CPUs and were running the SAP ERP application release 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark.
  • The Sun and Fujitsu results were run at 50% and 48% utilization, respectively. With these servers only half utilized, there is headroom for additional performance.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark; 10-30% of this is due to the extra overhead of processing the larger character strings of the Unicode encoding. See this SAP Note for more details. Note: username and password for SAP Service Marketplace required.
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details. Note: username and password for SAP Service Marketplace required.
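The per-character byte costs described above are easy to confirm with Python's built-in codecs (the sample strings are arbitrary):

```python
# Byte cost of sample strings under a 1-byte encoding (the pre-Unicode
# benchmark's ASCII data) versus UTF-16 (the new Unicode benchmark).
for text in ["ORDER", "Auftrag"]:                  # plain ASCII samples
    ascii_bytes = len(text.encode("ascii"))        # 1 byte per character
    utf16_bytes = len(text.encode("utf-16-be"))    # 2 bytes per character
    print(text, ascii_bytes, "->", utf16_bytes)

# Characters outside the Basic Multilingual Plane need 4 bytes in UTF-16:
print(len("\U0001D11E".encode("utf-16-be")))       # 4
```

Every ASCII character thus doubles in size, matching the extra computation, bandwidth, and storage the post describes.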

SAP-SD 2-Tier Performance Landscape (in decreasing performance order).


SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results (new version of the benchmark as of January 2009)

System | OS / Database | Virtualized? | Users | SAPS | SAPS/Proc | Date
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 / Oracle 10g | no | 3,800 | 21,000 | 10,500 | 21-Aug-09
IBM System 550, 4x POWER6 @5GHz, 64 GB | AIX 6.1 / DB2 9.5 | no | 3,752 | 20,520 | 5,130 | 16-Jun-09
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | SUSE Linux Ent Svr 10 / MaxDB 7.8 | no | 3,171 | 17,380 | 8,690 | 17-Apr-09
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 container (8 virtual CPUs) / Oracle 10g | YES (50% util) | 2,800 | 15,320 | 7,660 | 10-Sep-09
Fujitsu PRIMERGY RX300 S5, 2x Intel Xeon X5570 @2.93GHz, 96 GB | SUSE Linux Ent Svr 10 on VMware ESX Server 4.0 / MaxDB 7.8 | YES (48% util) | 2,056 | 11,230 | 5,615 | 04-Aug-09

All results ran SAP ERP/ECC Release 6.0 Enhancement Pack 4 (Unicode), 2009.
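The 36% figure cited in the summary follows directly from the table; a quick check of both the user counts and the SAPS numbers:

```python
# Figures from the table above (Sun Fire X4270 container result vs.
# Fujitsu PRIMERGY RX300 S5 on VMware ESX Server 4.0)
sun_users, fujitsu_users = 2800, 2056
sun_saps, fujitsu_saps = 15320, 11230

print(f"user advantage: {(sun_users / fujitsu_users - 1) * 100:.0f}%")   # 36%
print(f"SAPS advantage: {(sun_saps / fujitsu_saps - 1) * 100:.0f}%")     # 36%
```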

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun StorageTek CSM200 with 32 x 73GB 15KRPM 4Gb FC-AL and 32 x 146GB 15KRPM 4Gb FC-AL drives

Software Configuration:

    Solaris 10 container configured with 8 virtual CPUs
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Sun has submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

    Number of benchmark users: 2,800
    Average dialog response time: 0.971 s
    Fully processed order line items: 306,330
    Dialog steps/hour: 919,000
    SAPS: 15,320
    SAP Certification: 2009034
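The certified SAPS figure is consistent with SAP's definition of the unit, 100 SAPS = 2,000 fully processed order line items per hour (equivalently, 6,000 dialog steps per hour); a quick cross-check against the published counts:

```python
order_lines_per_hour = 306_330   # fully processed order line items
dialog_steps_per_hour = 919_000

# 100 SAPS = 2,000 order line items/hour = 6,000 dialog steps/hour
saps_from_lines = order_lines_per_hour / 2000 * 100
saps_from_steps = dialog_steps_per_hour / 6000 * 100

# Both come out near 15,317; the certified figure, 15,320, is this
# value rounded to the nearest 10.
print(saps_from_lines, saps_from_steps)
```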

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI OEM) so that the needed raw devices are presented directly from the storage array; do not use any software layer in between.

  • Solaris 10 Container best practices how-to guide

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 09/10/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10, Cert# 2009006. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10 container configured with 8 virtual CPUs, Cert# 2009034. Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.

SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Friday Aug 28, 2009

Sun X4270 World Record SAP-SD 2-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode)

Sun Fire X4270 Server World Record Two Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record 2-processor performance result on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark on the Sun Fire X4270 server.

  • The Sun Fire X4270 server with two Intel Xeon X5570 processors (8 cores, 16 threads) achieved 3,800 SAP SD Benchmark users running SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using Oracle 10g database and Solaris 10 operating system.

  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.

  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the IBM System 550 server using 4 POWER6 processors, 64 GB memory and the AIX 6.1 operating system.
  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the HP ProLiant BL460c G6 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Windows Server 2008 operating system.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark; 10-30% of this is due to the extra overhead of processing the larger character strings of the Unicode encoding. Refer to the SAP Note for more details. Note: username and password for SAP Service Marketplace required.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to SAP Note for more details. Note: username and password for SAP Service Marketplace required.

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAPS | SAPS/Proc | Date
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 / Oracle 10g | 3,800 | 21,000 | 10,500 | 21-Aug-09
IBM System 550, 4x POWER6 @5GHz, 64 GB | AIX 6.1 / DB2 9.5 | 3,752 | 20,520 | 5,130 | 16-Jun-09
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 / Oracle 10g | 3,700 | 20,300 | 10,150 | 30-Mar-09
HP ProLiant BL460c G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,415 | 18,670 | 9,335 | 04-Aug-09
Fujitsu PRIMERGY TX/RX 300 S5, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,328 | 18,170 | 9,085 | 13-May-09
HP ProLiant BL460c G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,310 | 18,070 | 9,035 | 27-Mar-09
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,300 | 18,030 | 9,015 | 27-Mar-09
Fujitsu PRIMERGY BX920 S1, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,260 | 17,800 | 8,900 | 18-Jun-09
NEC Express5800, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,250 | 17,750 | 8,875 | 28-Jul-09
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | SuSE Linux Enterprise Server 10 / MaxDB 7.8 | 3,171 | 17,380 | 8,690 | 17-Apr-09

All results ran SAP ERP/ECC Release 6.0 Enhancement Pack 4 (Unicode), 2009.

Complete benchmark results may be found at the SAP benchmark website: http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun Storage 6780 with 48 x 73GB 15KRPM 4Gb FC-AL and 16 x 146GB 15KRPM 4Gb FC-AL Drives

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Certified Results:

          Performance: 3800 benchmark users
          SAP Certification: 2009033

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI OEM) so that the needed raw devices are presented directly from the storage array; do not use any software layer in between.


Benchmark Tags

World-Record, Performance, SAP-SD, Solaris, Oracle, Intel, X64, x86, HP, IBM, Application, Database

Disclosure Statement

    Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 08/21/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,415 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009031. Fujitsu PRIMERGY TX/RX 300 S5 (2 processors, 8 cores, 16 threads) 3,328 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009014. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,310 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009003. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,300 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009004. Fujitsu PRIMERGY BX920 S1 (2 processors, 8 cores, 16 threads) 3,260 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009024. NEC Express5800 (2 processors, 8 cores, 16 threads) 3,250 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009027. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006. 
IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

    SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Monday Jul 06, 2009

Sun Blade 6048 Chassis with Sun Blade X6275: RADIOSS Benchmark Results

Significance of Results

The Sun Blade X6275 cluster, equipped with 2.93 GHz Intel QC X5570 processors and QDR InfiniBand interconnect, delivered the best performance at 32, 64 and 128 cores for the RADIOSS Neon_1M and Taurus_Frontal benchmarks.

  • Using half the nodes (16), the Sun Blade X6275 cluster was 3% faster than the 32-node SGI cluster running the Neon_1M test case.
  • In the 128-core configuration, the Sun Blade X6275 cluster was 49% faster than the SGI cluster running the Neon_1M test case.
  • In the 128-core configuration, the Sun Blade X6275 cluster was 16% faster than the top SGI cluster running the Taurus_Frontal test case.
  • At both the 32- and 64-core levels, the Sun Blade X6275 cluster was about 60% faster running the Neon_1M test case.
  • At both the 32- and 64-core levels, the Sun Blade X6275 cluster was about 4% faster running the Taurus_Frontal test case.

Performance Landscape


RADIOSS Public Benchmark Test Suite
  Results are Total Elapsed Run Times (sec.)

System  Cores  TAURUS_FRONTAL (1.8M)  NEON_1M (1.06M)  NEON_300K (277K)

SGI Altix ICE 8200 IP95 2.93GHz, 32 nodes, DDR 256 3559 1672 310

Sun Blade X6275 2.93GHz, 16 nodes, QDR 128 4397 1627 361
SGI Altix ICE 8200 IP95 2.93GHz, 16 nodes, DDR 128 5033 2422 360

Sun Blade X6275 2.93GHz, 8 nodes, QDR 64 5934 2526 587
SGI Altix ICE 8200 IP95 2.93GHz, 8 nodes, DDR 64 6181 4088 584

Sun Blade X6275 2.93GHz, 4 nodes, QDR 32 9764 4720 1035
SGI Altix ICE 8200 IP95 2.93GHz, 4 nodes, DDR 32 10120 7574 1017
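The percentages quoted in the bullets above follow directly from the elapsed times in this table; since smaller times are better, "X% faster" compares the slower run to the faster one:

```python
# Elapsed times (seconds) from the RADIOSS table above
def pct_faster(slower_secs, faster_secs):
    """How much faster the second run is, as a whole percentage."""
    return round((slower_secs / faster_secs - 1) * 100)

# Neon_1M at 128 cores: SGI 16-node DDR vs. Sun 16-node QDR
print(pct_faster(2422, 1627))   # 49
# Neon_1M: SGI 32-node (256 cores) vs. Sun 16-node (128 cores)
print(pct_faster(1672, 1627))   # 3
```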

Results and Configuration Summary

Hardware Configuration:
    8 x Sun Blade X6275, each with
      2 x 2.93 GHz Intel QC X5570 processors, turbo enabled (per half blade)
      24 GB memory (6 x 4GB 1333 MHz DDR3 DIMMs)
      QDR InfiniBand interconnect

Software Configuration:

    OS: 64-bit SUSE Linux Enterprise Server SLES 10 SP 2
    Application: RADIOSS V9.0 SP 1
    Benchmark: RADIOSS Public Benchmark Test Suite

Benchmark Description

Altair has provided a suite of benchmarks to demonstrate the performance of RADIOSS. The initial set of benchmarks provides four automotive crash models. Future updates will add marine and aerospace applications, as well as automotive NVH applications. The benchmarks use real data, requiring double-precision computations and the parith feature (parallel arithmetic algorithm) to obtain exactly the same results regardless of the number of processors used.

Please go here for a more complete description of the tests.

Key Points and Best Practices

The Intel QC X5570 processors include a turbo boost feature, coupled with a speed-step option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this provides CPU overclocking that increases the processor frequency from 2.93GHz to 3.2GHz. This feature was enabled when generating the results reported here.

Node-to-node MPI ping-pong tests show a bandwidth of 3000 MB/sec on the Sun Blade X6275 cluster using QDR. The same tests performed on a Sun Fire X2270 cluster equipped with DDR interconnect produced a bandwidth of 1500 MB/sec. On another recent Intel-based Sun Fire X2250 cluster (3.4 GHz DC E5272 processors), also equipped with DDR interconnects, the bandwidth was 1250 MB/sec. This same Sun Fire X2250 cluster equipped with SDR IB interconnect produced an MPI ping-pong bandwidth of 975 MB/sec.

See Also

Current RADIOSS Benchmark Results:
http://www.altairhyperworks.com/Benchmark.aspx

Disclosure Statement

All information on the Altair Engineering website is Copyright 2009 Altair Engineering, Inc. All Rights Reserved. Results from http://www.altairhyperworks.com/Benchmark.aspx

Friday Jul 03, 2009

SPECmail2009 on Sun Fire X4275+Sun Storage 7110: Mail Server System Solution

Significance of Results

Sun has a new SPECmail2009 result on a Sun Fire X4275 server and Sun Storage 7110 Unified Storage System running Sun Java System Messaging Server 6.2. OpenStorage and ZFS were a key part of this new World Record SPECmail2009 result.

  • The Sun Fire X4275 server, equipped with two 2.93GHz Intel Xeon QC X5570 processors, running the Sun Java System Messaging Server 6.2 on Solaris 10, achieved a new World Record of 8000 SPECmail_Ent2009 IMAP4 users at 38,348 Sessions/hour.

  • The Sun result was obtained using about half the disk spindles of Apple's direct-attached storage solution, thanks to the Sun Storage 7110 Unified Storage System. The Sun submission is the first result to use a NAS solution, specifically two Sun Storage 7110 Unified Storage Systems.
  • This benchmark result clearly demonstrates that the Sun Fire X4275 server, together with Sun Java System Messaging Server 6.2 and Solaris 10 on Sun Storage 7110 Unified Storage Systems, can support a large, enterprise-level IMAP mail server environment as a reliable, low-cost solution, delivering top performance and maximizing data integrity with ZFS.

SPECmail2009 Performance Landscape (ordered by performance)

System                       Processor       GHz   Ch, Co, Th   SPECmail_Ent2009 Users   SPECmail2009 Sessions/hour
Sun Fire X4275               Intel X5570     2.93  2, 8, 16     8000                     38,348
Apple Xserv3,1               Intel X5570     2.93  2, 8, 16     6000                     28,887
Sun SPARC Enterprise T5220   UltraSPARC T2   1.4   1, 8, 64     3600                     17,316

Notes:
    Number of SPECmail_Ent2009 users (bigger is better)
    SPECmail2009 Sessions/hour (bigger is better)
    Ch, Co, Th: Chips, Cores, Threads

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org

Results and Configuration Summary

Hardware Configuration:
    Sun Fire X4275
      2 x 2.93 GHz QC Intel Xeon X5570 processors
      72 GB
      12 x 300GB, 10000 RPM SAS disk

    2 x Sun Storage 7110 Unified Storage System, each with
      16 x 146GB SAS 10K RPM

Software Configuration:

    O/S: Solaris 10
    ZFS
    Mail Server: Sun Java System Messaging Server 6.2

Benchmark Description

The SPECmail2009 benchmark measures the ability of corporate e-mail systems to meet today's demanding e-mail users over fast corporate local area networks (LAN). The SPECmail2009 benchmark simulates corporate mail server workloads that range from 250 to 10,000 or more users, using industry standard SMTP and IMAP4 protocols. This e-mail server benchmark creates client workloads based on a 40,000 user corporation, and uses folder and message MIME structures that include both traditional office documents and a variety of rich media content. The benchmark also adds support for encrypted network connections using industry standard SSL v3.0 and TLS 1.0 technology. SPECmail2009 replaces all versions of SPECmail2008, first released in August 2008. The results from the two benchmarks are not comparable.

Software on one or more client machines generates a benchmark load for a System Under Test (SUT) and measures the SUT response times. A SUT can be a mail server running on a single system or a cluster of systems.

A SPECmail2009 'run' simulates a 100% load level associated with the specific number of users, as defined in the configuration file. The mail server must maintain a specific Quality of Service (QoS) at the 100% load level to produce a valid benchmark result. If the mail server does maintain the specified QoS at the 100% load level, the performance of the mail server is reported as SPECmail_Ent2009 SMTP and IMAP Users at SPECmail2009 Sessions per hour. The SPECmail_Ent2009 users at SPECmail2009 Sessions per Hour metric reflects the unique workload combination for a SPEC IMAP4 user.

Key Points and Best Practices

  • Each XTA7110 was configured as one RAID1 volume (14x 143GB) with 2 NFS shared LUNs, accessed by the SUT via NFSv4. The mailstore volumes were mounted with the NFS mount options nointr, hard, xattr. There was a total of 4 Gb/sec of network connectivity between the SUT and the 7110 Unified Storage systems, using 2 NorthStar dual 1 Gb/sec NICs.
  • The clients used these Java options: java -d64 -Xms4096m -Xmx4096m -XX:+AggressiveHeap
  • See the SPEC Report for all OS, network and messaging server tunings.


Disclosure Statement

SPEC, SPECmail reg tm of Standard Performance Evaluation Corporation. Results as of 07/06/2009 on www.spec.org. SPECmail2009: Sun Fire X4275 (8 cores, 2 chips) SPECmail_Ent2009 8000 users at 38,348 SPECmail2009 Sessions/hour. Apple Xserv3,1 (8 cores, 2 chips) SPECmail_Ent2009 6000 users at 28,887 SPECmail2009 Sessions/hour.

Tuesday Jun 30, 2009

Sun Blade 6048 and Sun Blade X6275 NAMD Molecular Dynamics Benchmark beats IBM BlueGene/L

Significance of Results

A Sun Blade 6048 chassis with 12 Sun Blade X6275 server modules ran benchmarks using the NAMD molecular dynamics applications software. NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. NAMD is driven by major trends in computing and structural biology and received a 2002 Gordon Bell Award.
  • The cluster of 12 Sun Blade X6275 server modules was 6.2x faster than a 256-processor configuration of the IBM BlueGene/L.
  • The cluster of 12 Sun Blade X6275 server modules exhibited excellent scalability for NAMD molecular dynamics simulation, up to 10.4x speedup for 12 blades relative to 1 blade.
  • For the largest molecule considered, the cluster of 12 Sun Blade X6275 server modules achieved a throughput of 0.094 seconds per simulation step.
Molecular dynamics simulation is important to biological and materials science research. Molecular dynamics is used to determine the low energy conformations or shapes of a molecule. These conformations are presumed to be the biologically active conformations.

Performance Landscape

The NAMD Performance Benchmarks web page plots the performance of NAMD when the ApoA1 benchmark is executed on a variety of clusters. The performance is expressed in terms of the time in seconds required to execute one step of the molecular dynamics simulation, multiplied by the number of "processors" on which NAMD executes in parallel. The following table compares the performance of NAMD version 2.6 when executed on the Sun Blade X6275 cluster to the performance of NAMD as reported for several of the clusters on the web page. In this table, the performance is expressed in terms of the time in seconds required to execute one step of the molecular dynamics simulation, but not multiplied by the number of "processors". A smaller number implies better performance.
Cluster Name and Interconnect    128 Cores    192 Cores    256 Cores
                                 (secs/step)  (secs/step)  (secs/step)
Sun Blade X6275 InfiniBand       0.013        0.010        -
Cambridge Xeon/3.0 InfiniPath    0.016        -            0.0088
NCSA Xeon/2.33 InfiniBand        0.019        -            0.010
AMD Opteron/2.2 InfiniPath       0.025        -            0.015
IBM HPCx PWR4/1.7 Federation     0.039        -            0.021
SDSC IBM BlueGene/L MPI          0.108        -            0.062
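
The relationship between the web page's metric and the per-step times in the table is just a multiplication by the processor count. A minimal sketch (the helper name is ours, not NAMD's; the figures come from the table above):

```python
# The NAMD Performance Benchmarks page reports (seconds per step) x (processors);
# dividing by the processor count recovers the per-step time used in this table.

def per_step_time(page_metric, processors):
    """Convert the web page's processor-seconds metric to seconds per step."""
    return page_metric / processors

# Example: the Sun Blade X6275 cluster runs one ApoA1 step in 0.010 s on 192
# cores, which the web page's convention reports as 0.010 * 192 = 1.92.
page_metric = 0.010 * 192
print(per_step_time(page_metric, 192))  # ~0.010 seconds per step
```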

The following tables report results for NAMD molecular dynamics using a cluster of Sun Blade X6275 server modules. The performance of the cluster is expressed in terms of the time in seconds that is required to execute one step of the molecular dynamics simulation. A smaller number implies better performance.

Blades  Cores   STMV molecule (1)           f1 ATPase molecule (2)      ApoA1 molecule (3)
                secs/step  spdup  effi'cy   secs/step  spdup  effi'cy   secs/step  spdup  effi'cy
12      192     0.0941     10.6   88%       0.0270     9.1    76%       0.0102     8.1    68%
8       128     0.1322     7.5    94%       0.0317     7.7    97%       0.0131     6.3    79%
4       64      0.2656     3.7    94%       0.0610     4.0    101%      0.0204     4.1    102%
1       16      0.9952     1.0    100%      0.2454     1.0    100%      0.0829     1.0    100%

spdup - speedup versus 1 blade result
effi'cy - speedup efficiency versus 1 blade result

(1) Synthetic Tobacco Mosaic Virus (STMV) molecule, 1,066,628 atoms, 12 Angstrom cutoff, Langevin dynamics, 500 time steps
(2) f1 ATPase molecule, 327,506 atoms, 11 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
(3) ApoA1 molecule, 92,224 atoms, 12 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
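
The speedup and efficiency columns follow directly from the throughput values; a minimal sketch of the calculation, using the STMV figures from the table above:

```python
# Throughput (seconds per step) for the STMV molecule, from the table above.
stmv = {1: 0.9952, 4: 0.2656, 8: 0.1322, 12: 0.0941}  # blades -> secs/step

base = stmv[1]
for blades, secs in sorted(stmv.items()):
    speedup = base / secs              # relative to the 1-blade result
    efficiency = speedup / blades      # ideal speedup equals the blade count
    print(f"{blades:2d} blades: speedup {speedup:4.1f}, efficiency {efficiency:.0%}")
# 12 blades give a 10.6x speedup at 88% efficiency, matching the table.
```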

Results and Configuration Summary

Hardware Configuration

  • Sun Blade[tm] 6048 Modular System with one shelf configured with
    • 12 x Sun Blade X6275, each with
      • 2 x (2 x 2.93 GHz Intel QC Xeon X5570 processors)
      • 2 x (24 GB memory)
      • Hyper-Threading (HT) off, Turbo Mode on

Software Configuration

  • SUSE Linux Enterprise Server 10 SP2 kernel version 2.6.16.60-0.31_lustre.1.8.0.1-smp
  • Scali MPI 5.6.6
  • gcc 4.1.2 (1/15/2007), gfortran 4.1.2 (1/15/2007)

Key Points and Best Practices

  • Models with large numbers of atoms scale better than models with small numbers of atoms.

About the Sun Blade X6275

The Intel QC X5570 processors include a turbo boost feature coupled with a speed-step option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can overclock the CPU, increasing the processor frequency from 2.93GHz to 3.2GHz. This feature was enabled when generating the results reported here.

Benchmark Description

Molecular dynamics simulation is widely used in biological and materials science research. NAMD is a public-domain molecular dynamics software application for which a variety of molecular input directories are available. Three of these directories define:
  • the Synthetic Tobacco Mosaic Virus (STMV) that comprises 1,066,628 atoms
  • the f1 ATPase enzyme that comprises 327,506 atoms
  • the ApoA1 enzyme that comprises 92,224 atoms
Each input directory also specifies the type of molecular dynamics simulation to be performed, for example, Langevin dynamics with a 12 Angstrom cutoff for 500 time steps, or particle mesh Ewald dynamics with an 11 Angstrom cutoff for 500 time steps.

See Also

Disclosure Statement

NAMD, see http://www.ks.uiuc.edu/Research/namd/performance.html for more information, results as of 6/26/2009.

Tuesday Jun 23, 2009

Sun Blade X6275 results capture Top Places in CPU2006 SPEED Metrics

Significance of Multiple World Records

The Sun Blade X6275 server module, equipped with two Intel QC Xeon X5570 2.93 GHz processors and running the OpenSolaris 2009.06 operating system, delivered the best SPECfp2006 and SPECint2006 results to date.
  • The Sun Blade X6275 server module using the Sun Studio 12 update 1 compiler and the OpenSolaris 2009.06 operating system delivered a World Record SPECfp2006 result of 50.8.

  • This SPECfp2006 result beats the best result by the competition, using the same processor type, by 20%.
  • The Sun Blade X6275 server module using the Sun Studio 12 update 1 compiler and the OpenSolaris 2009.06 operating system delivered a World Record SPECint2006 result of 37.4.

  • This SPECint2006 result just tops the best result by the competition even though that result used the 9% faster clock W-series chip of the Nehalem family.
Sun(TM) Studio 12 Update 1 contains new features and enhancements to boost performance and simplify the creation of high-performance parallel applications for the latest multicore x86 and SPARC-based systems running on leading Linux platforms, the Solaris(TM) Operating System (OS) or OpenSolaris(TM). The Sun Studio 12 Update 1 software has set almost a dozen industry benchmark records to date, and in conjunction with the freely available community-based OpenSolaris 2009.06 OS, was instrumental in landing four new ground-breaking SPEC CPU2006 results.

Sun Studio 12 Update 1 includes improvements in the compiler's ability to automatically parallelise code (after all, the easiest way to develop parallel applications is to let the compiler do it for you); improved support for parallelisation specifications such as OpenMP, including support for the latest OpenMP 3.0 specification; and improvements in the tools' ability to provide the developer meaningful feedback about parallel code, for example the Performance Analyzer's ability to profile MPI code.

Performance Landscape

SPEC CPU2006 Performance Charts - bigger is better, selected results, please see www.spec.org for complete results.

SPECint2006

System                     Processor          GHz   Chips  Cores  Peak  Base  Notes (1)
Sun Blade X6275            Xeon X5570         2.93  2      8      37.4  31.0  New Record
ASUS TS700-E6 (Z8PE-D12X)  Xeon W5580         3.2   2      8      37.3  33.2  Previous Best
Fujitsu R670               Xeon W5580         3.2   2      8      37.2  33.2
Sun Blade X6270            Xeon X5570         2.93  2      8      36.9  32.0
Fujitsu Celsius R570       Xeon X5570         2.93  2      8      36.3  32.2
YOYOtech MLK1610           Intel Core i7-965  3.73  1      4      36.0  32.5
HP ProLiant DL585 G5       Opteron 8393       3.1   1      1      23.4  19.7  Best Opteron
IBM System p570            POWER6             4.70  1      1      21.7  17.8  Best POWER6

(1) Results as of 22 June 2009 from www.spec.org and this report.

SPECfp2006

System                GHz   Chips  Cores  Peak  Base  Notes (2)
System                     Processor          GHz   Chips  Cores  Peak  Base  Notes (2)
Sun Blade X6275            Xeon X5570         2.93  2      8      50.8  44.2  New Record
Sun Blade X6270            Xeon X5570         2.93  2      8      50.4  45.0  Previous Best
Sun Fire X4170             Xeon X5570         2.93  2      8      48.9  43.9
Fujitsu R670               Xeon W5580         3.2   2      8      42.2  39.5
HP ProLiant DL585 G5       Opteron 8393       3.1   2      8      25.9  23.6  Best Opteron
IBM Power 595              POWER6             5.00  1      1      24.9  20.1  Best POWER6

(2) Results as of 22 June 2009 from www.spec.org and this report.
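
The margins quoted earlier can be reproduced from the table values; a quick arithmetic check, not part of the published results:

```python
# SPECfp2006 peak: Sun Blade X6275 (X5570) vs best competing result
# on the same processor type (Fujitsu R670 in the table above).
x6275_fp = 50.8
fujitsu_r670_fp = 42.2
print(f"SPECfp2006 margin: {x6275_fp / fujitsu_r670_fp - 1:.0%}")  # ~20%

# Clock-speed advantage of the W-series chip in the SPECint2006 comparison.
w5580, x5570 = 3.2, 2.93
print(f"W5580 clock advantage: {w5580 / x5570 - 1:.0%}")           # ~9%
```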

Results and Configuration Summary

Hardware Configuration:
    Sun Blade X6275
      2 x 2.93 GHz QC Intel Xeon X5570 processors, turbo enabled
      24 GB, (6 x 4GB DDR3-1333 DIMM)
      1 x 146 GB, 10000 RPM SAS disk

Software Configuration:

    O/S: OpenSolaris 2009.06
    Compiler: Sun Studio 12 Update 1
    Other SW: MicroQuill SmartHeap Library 9.01 x64
    Benchmark: SPEC CPU2006 V1.1

Key Points and Best Practices

These results show that choosing the right compiler for the job can maximize one's investment in hardware. The Sun Studio compiler teamed with the OpenSolaris operating system allows one to tackle hard problems to get quick solution turnaround.
  • Autoparallelism was used to deliver the fastest time to solution. These results show that autoparallel compilation is a very viable option that should be considered when one needs the quickest turnaround of results. Note that not all codes benefit from this optimization, just as not all codes can take advantage of other compiler optimization techniques.
  • OpenSolaris 2009.06 was able to fully take advantage of the turbo mode of the Nehalem family of processors.

Disclosure Statement

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 6/22/2009. Sun Blade X6275 (Intel X5570, 2 chips, 8 cores) 50.8 SPECfp2006, 37.4 SPECint2006; ASUS TS700-E6 (Intel W5580, 2 chips, 8 cores) 37.3 SPECint2006; Fujitsu R670 (Intel X5570, 2 chips, 8 cores) 42.2 SPECfp2006.

Tuesday Jun 16, 2009

Sun Fire X2270 MSC/Nastran Vendor_2008 Benchmarks

Significance of Results

The I/O intensive MSC/Nastran Vendor_2008 benchmark test suite was used to compare the performance on a Sun Fire X2270 server when using SSDs internally instead of HDDs.

The effect on performance from increasing memory to augment I/O caching was also examined. The Sun Fire X2270 server was equipped with Intel QC Xeon X5570 (Nehalem) processors. The positive effect of adding memory to increase I/O caching is offset to some degree by the reduction in memory frequency that occurs when additional DIMMs populate the bays of each memory channel on each CPU socket of these Nehalem processors.

  • SSDs can significantly improve NASTRAN performance, especially on runs with larger core counts.
  • Additional memory in the server can also increase performance; however, in some systems additional memory can decrease the memory frequency, which may offset the benefits of the increased capacity.
  • If SSDs are not used, striped disks will often improve the performance of I/O-bound MCAE applications.
  • To obtain the highest performance it is recommended that SSDs be used and that servers be configured with the largest memory possible without decreasing the memory frequency. One should always look at the workload characteristics and compare against this benchmark to correctly set expectations.

SSD vs. HDD Performance

The performance of two striped 30GB SSDs was compared to two striped 7200 rpm 500GB SATA drives on a Sun Fire X2270 server.

  • At the 8-core level (maximum cores for a single node) SSDs were 2.2x faster for the larger xxocmd2 and the smaller xlotdf1 cases.
  • For 1-core results SSDs are up to 3% faster.
  • On the smaller mdomdf1 test case there was no increase in performance in the 1-, 2-, and 4-core configurations.

Performance Enhancement with I/O Memory Caching

Performance for Nastran can often be increased by adding memory to provide additional in-core space to cache I/O and thereby reduce the I/O demands.

The main memory was doubled from 24GB to 48GB. At the 24GB level one 4GB DIMM was placed in the first bay of each of the 3 CPU memory channels on each of the two CPU sockets on the Sun Fire X2270 platform. This configuration allows a memory frequency of 1333MHz.

At the 48GB level a second 4GB DIMM was placed in the second bay of each of the 3 CPU memory channels on each socket. This reduces the memory frequency to 1066MHz.
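
The capacity/frequency trade-off described above can be sketched in a few lines. The frequency mapping below reflects the DIMM-population behavior described for this particular Nehalem platform; it is illustrative, not a general rule for all systems, and the helper name is ours:

```python
# Nehalem-EP memory layout on the Sun Fire X2270: 2 sockets x 3 channels,
# with memory frequency dropping as more DIMM bays per channel are populated.
SOCKETS, CHANNELS = 2, 3
FREQ_BY_DIMMS_PER_CHANNEL = {1: 1333, 2: 1066}  # MHz, as described above

def memory_config(dimms_per_channel, dimm_gb=4):
    """Return (total capacity in GB, memory frequency in MHz)."""
    capacity = SOCKETS * CHANNELS * dimms_per_channel * dimm_gb
    return capacity, FREQ_BY_DIMMS_PER_CHANNEL[dimms_per_channel]

print(memory_config(1))  # (24, 1333): the 24GB configuration
print(memory_config(2))  # (48, 1066): the 48GB configuration
```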

Adding Memory With HDDs (SATA)

  • The additional server memory increased the performance when running with the slower SATA drives at the higher core levels (e.g. 4- & 8-cores on a single node)
  • The larger xxocmd2 case was 32% faster and the smaller xlotdf1 case was 42% faster at the maximum 8-core level on a single system.
  • The special I/O intensive getrag case was 8% faster at the 1-core level.

Adding Memory With SSDs

  • At the maximum 8-core level (for a single node) the larger xxocmd2 case was 47% faster in overall run time.
  • The effects were much smaller at lower core counts, and in the tests at the 1-core level most test cases ran from 5% to 14% slower, with the slower CPU memory frequency dominating over the added in-core space available for I/O caching vs. direct transfer to SSD.
  • Only the special I/O intensive getrag case was an exception running 6% faster at the 1-core level.

Increasing Performance with Two Striped (SATA) Drives

The performance of multiple striped drives was also compared to a single drive. The study compared two striped internal 7200 rpm 500GB SATA drives to a single internal SATA drive.

  • On a single node with 8 cores, the largest test case xxocmd2 was 40% faster, the smaller test case xlotdf1 was 33% faster, and even the smallest test case mdomdf1 was 12% faster.

  • On 1-core the added boost in performance with striped disks was from 4% to 13% on the various test cases.

  • On 1-core the special I/O-intensive test case getrag was 29% faster.

Performance Landscape

Times in table are elapsed time (sec).


MSC/Nastran Vendor_2008 Benchmark Test Suite
(all runs on a Sun Fire X2270, 2 x X5570 QC 2.93 GHz)

                       ----------- 2 x 7200 RPM SATA HDDs -----------    ------------- 2 x SSDs -------------
Test        Cores      48GB      24GB      24GB      Ratio     Ratio     48GB      24GB      Ratio    Ratio(24GB)
                       2xSATA    2xSATA    1xSATA    (2xSATA)  2xSATA/   1067MHz   1333MHz   48GB/    2xSATA/
                       1067MHz   1333MHz   1333MHz   48GB/24GB 1xSATA                        24GB     2xSSD

vlosst1       1        133       127       134       1.05      0.95      133       126       1.05     1.01

xxocmd2       1        946       895       978       1.06      0.87      947       884       1.07     1.01
              2        622       614       703       1.01      0.87      600       583       1.03     1.05
              4        466       631       991       0.74      0.64      426       404       1.05     1.56
              8        1049      1554      2590      0.68      0.60      381       711       0.53     2.18

xlotdf1       1        2226      2000      2081      1.11      0.96      2214      1939      1.14     1.03
              2        1307      1240      1308      1.05      0.95      1315      1189      1.10     1.04
              4        858       833       1030      1.03      0.81      744       751       0.99     1.11
              8        912       1562      2336      0.58      0.67      674       712       0.95     2.19

xloimf1       1        1216      1151      1236      1.06      0.93      1228      1290      0.95     0.89

mdomdf1       1        987       913       983       1.08      0.93      987       911       1.08     1.00
              2        524       485       520       1.08      0.93      524       484       1.08     1.00
              4        270       237       269       1.14      0.88      270       250       1.08     0.95

Sol400_1      1        2555      2479      2674      1.03      0.93      2549      2402      1.06     1.03
(xl1fn40_1)

Sol400_S      1        2450      2302      2481      1.06      0.93      2449      2262      1.08     1.02
(xl1fn40_S)

getrag        1        778       843       1178      0.92      0.71      771       817       0.94     1.03
(xx0xst0)

For the 48GB/24GB and 2xSATA/1xSATA ratios, values below 1 favor the first configuration; for the 2xSATA/2xSSD ratio, values above 1 favor the SSDs.
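
The ratio columns are simply quotients of the elapsed times; a minimal sketch of the arithmetic, using the xlotdf1 row at 8 cores:

```python
# Elapsed times in seconds for the xlotdf1 case at 8 cores, from the table above.
hdd_48gb = 912          # 2 x SATA, 48 GB @ 1067 MHz
hdd_24gb_2sata = 1562   # 2 x SATA, 24 GB @ 1333 MHz
hdd_24gb_1sata = 2336   # 1 x SATA, 24 GB @ 1333 MHz
ssd_24gb = 712          # 2 x SSD,  24 GB @ 1333 MHz

# Ratio (2xSATA) 48GB/24GB: values below 1 mean the 48 GB configuration is faster.
print(round(hdd_48gb / hdd_24gb_2sata, 2))        # 0.58

# Ratio 2xSATA/1xSATA at 24 GB: values below 1 mean two striped drives are faster.
print(round(hdd_24gb_2sata / hdd_24gb_1sata, 2))  # 0.67

# Ratio (24GB) 2xSATA/2xSSD: values above 1 mean the SSDs are faster.
print(round(hdd_24gb_2sata / ssd_24gb, 2))        # 2.19
```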

Results and Configuration Summary

Hardware Configuration:
    Sun Fire X2270
      One 2-socket rack-mounted server
      2 x 2.93 GHz QC Intel Xeon X5570 processors
      2 x internal striped SSDs
      2 x internal striped 7200 rpm 500GB SATA drives

Software Configuration:

    O/S: Linux 64-bit SUSE SLES 10 SP 2
    Application: MSC/NASTRAN MD 2008
    Benchmark: MSC/NASTRAN Vendor_2008 Benchmark Test Suite
    HP MPI: 02.03.00.00 [7585] Linux x86-64
    Voltaire OFED-5.1.3.1_5 GridStack for SLES 10

Benchmark Description

The benchmark tests are representative of typical MSC/Nastran applications including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

The MD (Multi Discipline) Nastran 2008 application performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior and/or deformations are concerned. The new release includes the MARC module for general purpose nonlinear analyses and the Dytran module that employs an explicit solver to analyze crash and high velocity impact conditions.

  • As of Summer 2008, there is an official Solaris X64 version of the MD Nastran 2008 system that is certified and maintained.
  • The memory requirements for the test cases in the new MSC/Nastran Vendor 2008 benchmark test suite range from a few hundred megabytes to no more than 5 GB.

Please go here for a more complete description of the tests.

Key Points and Best Practices

For more on Best Practices of SSD on HPC applications also see the Sun Blueprint:
http://wikis.sun.com/display/BluePrints/Solid+State+Drives+in+HPC+-+Reducing+the+IO+Bottleneck

Additional information on the MSC/Nastran Vendor 2008 benchmark test suite.

  • Based on the maximum physical memory on a platform the user can stipulate the maximum portion of this memory that can be allocated to the Nastran job. This is done on the command line with the mem= option. On Linux based systems where the platform has a large amount of memory and where the model does not have large scratch I/O requirements the memory can be allocated to a tmpfs scratch space file system. On Solaris X64 systems advantage can be taken of ZFS for higher I/O performance.

  • The MSC/Nastran Vendor 2008 test cases don't scale very well; a few do not scale at all, and the rest scale up to 8 cores at best.

  • The test cases for the MSC/Nastran module all have a substantial I/O component where 15% to 25% of the total run times are associated with I/O activity (primarily scratch files). The required scratch file size ranges from less than 1 GB on up to about 140 GB. Performance will be enhanced by using the fastest available drives and striping together more than one of them or using a high performance disk storage system, further enhanced as indicated here by implementing the Lustre based I/O system. High performance interconnects such as Infiniband for inter node cluster message passing as well as I/O transfer from the storage system can also enhance performance substantially.

See Also

Disclosure Statement

MSC.Software is a registered trademark of MSC. All information on the MSC.Software website is copyrighted. MSC/Nastran Vendor 2008 results from http://www.mscsoftware.com and this report as of June 9, 2009.

Tuesday Jun 09, 2009

Free Compiler Wins Nehalem Race by 2x

Not the Free Compiler That You Thought, No, This Other One.

Nehalem performance measured with several software configurations

Contributed by: John Henning and Karsten Guthridge

Introduction

The GNU C Compiler, GCC, is popular, widely available, and an exemplary collaborative effort.

But how does it do for performance -- for example, on Intel's latest hot "Nehalem" processor family? How does it compare to the freely available Sun Studio compiler?

Using the SPEC CPU benchmarks, we take a look at this question. These benchmarks depend primarily on performance of the chip, the memory hierarchy, and the compiler. By holding the first two of these constant, it is possible to focus in on compiler contributions to performance.

Current Record Holder

The current SPEC CPU2006 floating point speed record holder is the Sun Blade X6270 server module. Using 2x Intel Xeon X5570 processor chips and 24 GB of DDR3-1333 memory, it delivers a result of 45.0 SPECfp_base2006 and 50.4 SPECfp2006. [1]

We used this same blade system to compare GCC vs. Studio. On separate, but same-model disks, this software was installed:

  • SuSE Linux Enterprise Server 11.0 (x86_64) and GCC V4.4.0 built with gmp-4.3.1 and mpfr-2.4.1
  • OpenSolaris 2008.11 and Sun Studio 12 Update 1

Tie One Hand Behind Studio's Back

In order to make the comparison more fair to GCC, we took several steps.

  1. We simplified the tuning for the OpenSolaris/Sun Studio configuration. This was done in order to counter the criticism that one sometimes hears that SPEC benchmarks have overly aggressive tuning. Benchmarks were optimized with a reasonably short tuning string:

    For all:  -fast -xipo=2 -m64 -xvector=simd -xautopar
    For C++, add:  -library=stlport4
  2. Recall that SPEC CPU2006 allows two kinds of tuning: "base", and "peak". The base metrics require that all benchmarks of a given language use the same tuning. The peak metrics allow individual benchmarks to have differing tuning, and more aggressive optimizations, such as compiler feedback. The simplified Studio configuration used only the less aggressive base tuning.

Both of the above changes limited the performance of Sun Studio.  Several measures were used to increase the performance of GCC:

  1. We tested the latest released version of GCC, 4.4.0, which was announced on 21 April 2009. In our testing, GCC 4.4.0 provides about 10% better overall floating point performance than V4.3.2. Note that GCC 4.4.0 is more recent than the compiler that is included with recent Linux distributions such as SuSE 11, which includes 4.3.2; or Ubuntu 8.10, which updates to 4.3.2 when one does "apt-get install gcc". It was installed with the math libraries mpfr 2.4.1 and gmp 4.3.1, which are labeled as the latest releases as of 1 June 2009.

  2. A tuning effort was undertaken with GCC, including testing of -O2, -O3, -fprefetch-loop-arrays, -funroll-all-loops, -ffast-math, -fno-strict-aliasing, -ftree-loop-distribution, -fwhole-program -combine, and -fipa-struct-reorg

  3. Eventually, we settled on this tuning string for GCC base:

    For all:  -O3 -m64 -mtune=core2 -msse4.2 -march=core2
    -fprefetch-loop-arrays -funroll-all-loops
    -Wl,-z common-page-size=2M
    For C++, add:  -ffast-math

    The reason that only the C++ benchmarks used the fast math library was that 435.gromacs, which uses C and Fortran, fails validation with this flag. (Note: we verified that the benchmarks successfully obtained 2MB pages.)

Studio wins by 2x, even with one hand tied behind its back

At this point, a fair base-to-base comparison can be made, and Sun Studio/OpenSolaris finishes the race while GCC/Linux is still looking for its glasses: 44.8 vs. 21.1 (see Table 1). Notice that Sun Studio provides more than 2x the performance of GCC.

Table 1: Initial Comparisons, SPECfp2006,
Sun Studio/Solaris vs. GCC/Linux
Configuration                                    Base   Peak
Industry FP Record (Sun Studio 12 Update 1,
  OpenSolaris 2008.11)                           45.0   50.4
Studio/OpenSolaris: simplified as above
  (less tuned)                                   44.8   -
GCC V4.4 / SuSE Linux 11                         21.1   -
Notes: All results reported are from rule-compliant, "reportable" runs of the SPEC CPU2006 floating point suite, CFP2006. "Base" indicates the metric SPECfp_base2006. "Peak" indicates SPECfp2006. Peak uses the same benchmarks and workloads as base, but allows more aggressive tuning. A base result may optionally be quoted as peak, but the converse is not allowed. For details, see SPEC's Readme1st.

Fair? Did you say "Fair"?

Wait, wait, the reader may protest - this is all very unfair to GCC, because the Studio result used all 8 cores on this 2-chip system, whereas GCC used only one core! You're using trickery!

To this plaintive whine, we respond that:

  • Compiler auto-parallelization technology is not a trick. Rather, it is an essential technology in order to get the best performance from today's multi-core systems. Nearly all contemporary CPU chips provide support for multiple cores. Compilers should do everything possible to make it easy to take advantage of these resources.

  • We tried to use more than one core for GCC, via the -ftree-parallelize-loops=n flag. GCC's autoparallelization appears to be in a much earlier development stage than Studio's, since we did not observe any improvement for any of the values of "n" that we tested. From the GCC wiki, it appears that a new autoparallelization effort is under development, which may improve its results at a later time.

  • But, all right, if you insist, we will make things even harder for Studio, and see how it does.

Tie Another Hand Behind Studio's Back

The earlier section mentioned various ways in which the performance comparison had been made easier for GCC. Continuing the paragraph numbering from above, we took these additional measures:

  1. Removed the autoparallelization from Studio, substituting instead a request for 2MB pagesizes (which the GCC tuning already had).

  2. Added "peak" tuning to GCC: for benchmarks that benefit, add -ffast-math, and compiler profile-driven feedback

At this point, Studio base beats GCC base by 38%, and Studio base beats GCC peak by more than 25% (see table 2).

Table 2: Additional Comparisons, SPECfp2006,
Sun Studio/Solaris vs. GCC/Linux
Configuration                                    Base   Peak
Sun Studio/OpenSolaris: base only, no autopar    29.1   -
GCC V4.4 / SuSE Linux 11                         21.1   23.1
The notes from Table 1 apply here as well.
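
The margins quoted above follow from Table 2; a quick arithmetic check:

```python
# SPECfp_base2006 / SPECfp2006 scores from Table 2.
studio_base = 29.1           # Sun Studio/OpenSolaris, no autoparallelization
gcc_base, gcc_peak = 21.1, 23.1

print(f"Studio base vs GCC base: {studio_base / gcc_base - 1:.0%}")  # 38%
print(f"Studio base vs GCC peak: {studio_base / gcc_peak - 1:.0%}")  # 26%
```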

Bottom line

The freely available Sun Studio 12 Update 1 compiler on OpenSolaris provides more than double the performance of GCC V4.4 on SuSE Linux, as measured by SPECfp_base2006.

If compilation is restricted to avoid using autoparallelization, Sun Studio still wins by 38% (base to base), or by more than 25% (Studio base vs. GCC peak).

YMMV

Your mileage may vary. It is certain that both GCC and Studio could be improved with additional tuning efforts. Both provide dozens of compiler flags, which can keep the tester delightfully engaged for an unbounded number of days. We feel that the tuning presented here is reasonable, and that additional tuning effort, if applied to both compilers, would not radically alter the conclusions.

Additional Information

The results disclosed in this article are from "reportable" runs of the SPECfp2006 benchmarks, which have been submitted to SPEC.

[1] SPEC and SPECfp are registered trademarks of the Standard Performance Evaluation Corporation. Competitive comparisons are based on data published at www.spec.org as of 1 June 2009. The X6270 result can be found at http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090413-07019.html.

Friday Jun 05, 2009

Interpreting Sun's SPECpower_ssj2008 Publications

Sun recently entered the SPECpower fray with the publication of three results on the SPECpower_ssj2008 benchmark.  Strangely, the three publications documented results on the same hardware platform (Sun Netra X4250) running identical software stacks, but the results were markedly different.  What exactly were we trying to get at?

Benchmark Configurations

Sun produces robust industrial-grade servers with a range of redundancy features we believe benefit our customers.   These features increase reliability, at the cost of additional power consumption. For example, redundant power supplies and redundant fans allow servers to tolerate faults, and hot-swap capabilities further minimize downtime.

The benchmark run and reporting rules require the incorporation within the tested configuration of all components implied by the model name.  Within these limitations, the first publication was intended to be the best result (that is, the lowest power consumption per unit of performance) achievable on the Sun Netra X4250 platform, by minimizing the configured hardware to the greatest extent possible.

Common Components

All tested configurations had the following components in common:

  • System:  Sun Netra X4250
  • Processor: 2 x Intel L5408 QC @ 2.13GHz
  • 2 x 658 watt redundant AC power supplies
  • redundant fans
  • standard I/O expansion mezzanine
  • standard Telco dry contact alarm

And the same software stack:

  • OS: Windows Server 2003 R2 Enterprise X64 Edition SP2
  • Drivers: platform-specific drivers from Sun Netra X4250 Tools and Drivers DVD Version 2.1N
  • JVM: Java HotSpot 32-Bit Server VM on Windows, version 1.6.0_14

Tiny Configuration

In addition to the common hardware components, the tiny configuration was limited to:

  • 8 GB of Memory (4 x 2048 MB as PC2-5300F 2Rx8)
  • 1 x Sun 146 GB 10K RPM SAS internal drive

This is called the tiny configuration because it seems unlikely that most customers would configure an 8-core server with only one disk and only 1 GB available per core. Nevertheless, from a benchmark point of view, this configuration gave the best result.

Typical Configuration

The other two results were both produced on a configuration we considered much more typical of configurations that are actually ordered by customers.  In addition to the common hardware, this typical configuration included:

  • 32 GB of Memory (8 x 4096 MB as PC2-5300F)
  • 4 x Sun 146 GB 10K RPM SAS internal drives
  • 1 x Sun x8 PCIe Quad Gigabit Ethernet option card (X4447A-Z)

Nothing special was done with the additional components.  The added memory increased the performance component of the benchmark. The other components were installed and configured but allowed to sit idle, so consumed less power than they would have under load.

One Other Thing: Tuning for Performance

So one thing we're getting at is the difference in power consumption between a small configuration optimized for a power-performance benchmark and a typical configuration optimized for customer workloads.  Hardware (power consumption) is only half of the benchmark--the other half being the performance achieved by the System Under Test (SUT).

Tuning Choices 

In all three publications the identical tunings were applied at the software level: identical java command-line arguments and JVM-to-processor affinity.  We also applied, in the case of the better results, the common (but usually non-default) BIOS-level optimization of disabling hardware prefetcher and adjacent cache line prefetch.  These optimizations are commonly applied to produce optimized SPECpower_ssj2008 results but it is unlikely that many production applications would benefit from these settings.  To demonstrate the effect of this tuning, the final result was generated with standard BIOS settings.

And just so we couldn't be accused of sandbagging the results, the number of JVMs was increased in the typical configurations to take advantage of the additional memory populated over and above the tiny configuration.  Additional performance was achieved, but sadly it doesn't compensate for the higher power consumption of all that memory.

So in summary we tuned:

  • Tiny Configuration: non-default BIOS settings
  • Typical Configuration 1: non-default BIOS settings; additional JVMs to utilize added memory
  • Typical Configuration 2: default BIOS settings; additional JVMs to utilize added memory

At the OS level, all tunings  were identical.

Results 

The results are summarized in this table (full disclosures are available from SPEC):

System                          Processor  GHz   Overall Metric  Peak Performance  Peak Power  Idle Power
                                                 ssj_ops/watt    ssj_ops           watts       watts
Sun Netra X4250
  (8GB, non-default BIOS)       L5408      2.13  600             244832            226         174
Sun Netra X4250
  (32GB, non-default BIOS)      L5408      2.13  478             251555            294         226
Sun Netra X4250
  (32GB, default BIOS)          L5408      2.13  437             229828            296         225

Conclusions

  • The measurement and reporting methods of the benchmark encourage small memory configurations.  Comparing the first and second result, adding additional memory yielded minimal performance improvement (from 244832 to 251555) but a large increase in power consumption, 68 watts at peak.

  • In our opinion, unrealistically small configurations yield the best results on this benchmark.  On the more typical system, the benchmark overall metric decreased from 600 overall ssj_ops per watt to 478 overall ssj_ops per watt, despite our best effort to utilize the additional configured memory.

  • On typical configurations, reverting to default BIOS settings resulted in a significant decrease in performance (from 251555 to 229828) with no corresponding decrease in power consumption (essentially identical for both results).

Configurations typical of customer systems (with adequate memory, internal disks, and option cards) consume more power than configurations which are commonly benchmarked, while providing no corresponding improvement in SPECpower_ssj2008 benchmark performance. The result is a lower overall power-performance metric on typical configurations and a lack of published benchmark results on robust systems with the capacities and redundancies that enterprise customers desire.
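
One way to see why idle and low-load power weigh so heavily: the benchmark's overall metric is computed from ssj_ops and power summed across a series of graduated load levels plus active idle, not from the peak point alone. A rough illustration using only the published figures above (this does not reproduce the metric's actual computation):

```python
# Published figures for the tiny (8GB, non-default BIOS) configuration.
peak_ops, peak_watts, idle_watts, overall = 244832, 226, 174, 600

# Efficiency at the 100% load point alone.
peak_only = peak_ops / peak_watts
print(round(peak_only))  # ~1083 ssj_ops/watt at peak
print(overall)           # 600 overall ssj_ops/watt as published

# The overall metric is far below the peak-only ratio because power consumed
# at lower load levels and at active idle counts while delivering fewer ssj_ops.
```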

Fair Use Disclosure

SPEC, SPECpower, and SPECpower_ssj are trademarks of the Standard Performance Evaluation Corporation.  All results from the SPEC website (www.spec.org) as of June 5, 2009.  For a complete set of accepted results refer to that site.

Wednesday Jun 03, 2009

Wide Variety of Topics to be discussed on BestPerf

A sample of the various Sun and partner technologies to be discussed:
OpenSolaris, Solaris, Linux, Windows, VMware, GCC, Java, GlassFish, MySQL, Sun Studio, ZFS, DTrace, perflib, Oracle, DB2, Sybase, OpenStorage, CMT, SPARC64, X64, X86, Intel, AMD

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
