Wednesday Sep 28, 2011

SPARC T4 Servers Set World Record on PeopleSoft HRMS 9.1

Oracle's SPARC T4-4 servers running Oracle's PeopleSoft HRMS Self-Service 9.1 benchmark and Oracle Database 11g Release 2 achieved World Record performance on Oracle Solaris 10.

  • Using two SPARC T4-4 servers to run the application and database tiers and one SPARC T4-2 server to run the webserver tier, Oracle demonstrated world record performance of 15,000 concurrent users running the PeopleSoft HRMS Self-Service 9.1 benchmark.

  • The combination of the SPARC T4 servers running the PeopleSoft HRMS 9.1 benchmark supports 3.8x more online users with faster response time compared to the best published result from IBM on the previous PeopleSoft HRMS 8.9 benchmark.

  • The average CPU utilization on the SPARC T4-4 server in the application tier handling 15,000 users was less than 50%, leaving significant room for application growth.

  • The SPARC T4-4 server on the application tier used Oracle Solaris Containers which provide a flexible, scalable and manageable virtualization environment.
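The headline claim can be cross-checked with simple arithmetic (a minimal sketch using only the user counts quoted above; the 3.8x figure is this ratio rounded to one decimal):

```python
# User-capacity ratio behind the "3.8x more online users" claim.
sparc_t4_users = 15_000   # PeopleSoft HRMS Self-Service 9.1 on SPARC T4
ibm_users = 4_000         # best published IBM result on HRMS 8.9

ratio = sparc_t4_users / ibm_users
print(f"{ratio:.2f}x more online users")  # prints "3.75x more online users"
```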

Performance Landscape

PeopleSoft HRMS Self-Service 9.1 Benchmark

Systems                   Processors               Users    Avg Response    Avg Response
                                                            Search (sec)    Save (sec)
SPARC T4-2 (web)          2 x SPARC T4, 2.85 GHz   15,000       1.01            0.63
SPARC T4-4 (app)          4 x SPARC T4, 3.0 GHz
SPARC T4-4 (db)           4 x SPARC T4, 3.0 GHz

PeopleSoft HRMS Self-Service 8.9 Benchmark

IBM Power 570 (web/app)   12 x POWER5, 1.9 GHz      4,000       1.74            1.25
IBM Power 570 (db)        4 x POWER5, 1.9 GHz

IBM p690 (web)            4 x POWER4, 1.9 GHz       4,000       1.35            1.01
IBM p690 (app)            12 x POWER4, 1.9 GHz
IBM p690 (db)             6 x 4392 MIPS/Gen1

The main differences between version 9.1 and version 8.9 of the benchmark are:

  • the database expanded from 100K employees and 20K managers to 500K employees and 100K managers,
  • the manager data was expanded,
  • a new transaction, "Employee Add Profile," was added; it is executed by less than 2% of users and has a heavier footprint than the other transactions,
  • version 9.1 uses a different benchmark metric (average search/save response time for a given number of concurrent users) instead of single-user search/save time,
  • newer versions of the PeopleSoft application and PeopleTools software are used.

Configuration Summary

Application Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
512 GB main memory
5 x 300 GB SAS internal disks,
2 x 100 GB internal SSDs
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
PeopleSoft HCM 9.1
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Web Server:

1 x SPARC T4-2 server
2 x SPARC T4 processors 2.85 GHz
256 GB main memory
1 x 300 GB SAS internal disk
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
Oracle WebLogic Server 11g (10.3.3)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Database Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
256 GB main memory
3 x 300 GB SAS internal disks
1 x Sun Storage F5100 Flash Array (80 flash modules)
Oracle Solaris 10 8/11
Oracle Database 11g Release 2

Benchmark Description

The purpose of the PeopleSoft HRMS Self-Service 9.1 benchmark is to measure comparative online performance of selected processes in PeopleSoft Enterprise HCM 9.1 with Oracle Database 11g. The benchmark kit is an Oracle standard benchmark kit run by all platform vendors to measure performance. It is an OLTP benchmark with no dependency on remote COBOL calls; there is no batch workload, and the database SQL statements are moderately complex. The results are certified by Oracle and a white paper is published.

PeopleSoft defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consists of 14 scenarios which emulate users performing typical HCM transactions such as viewing paychecks, promoting and hiring employees, updating employee profiles and other typical HCM application transactions.

All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. The benchmark metric is the average response time for search and save for 15,000 users.

Key Points and Best Practices

  • The application tier was configured with two PeopleSoft application server instances on the SPARC T4-4 server, hosted in two separate Oracle Solaris Containers, to demonstrate consolidation of multiple applications, ease of administration, and load balancing.

  • Each PeopleSoft application server instance running in an Oracle Solaris Container was configured to run 5 application server domains with 30 application server instances, effectively handling the 15,000-user workload with zero application server queuing and minimal use of resources.

  • The web tier was configured with 20 WebLogic instances, each with a 4 GB JVM heap, to load balance transactions across 10 PeopleSoft domains. This enables equitable distribution of transactions and scaling to a high number of users.

  • Internal SSDs were configured in the application tier to host the PeopleSoft application servers' object CACHE file systems, and in the web tier for WebLogic server logging, providing near-zero service times and faster server response.

See Also

Disclosure Statement

Oracle's PeopleSoft HRMS 9.1 benchmark, www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Tuesday Sep 27, 2011

SPARC T4-4 Server Sets World Record on PeopleSoft Payroll (N.A.) 9.1, Outperforms IBM Mainframe, HP Itanium

Oracle's SPARC T4-4 server achieved world record performance on the Unicode version of Oracle's PeopleSoft Enterprise Payroll (N.A.) 9.1 extra-large volume model benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • The SPARC T4-4 server was able to process 1,460,544 payments/hour using PeopleSoft Payroll (N.A.) 9.1.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 2.8x faster than IBM z10 EC 2097 Payroll 9.0 (UNICODE version) result of 87.4 minutes. The IBM mainframe is rated at 6,512 MIPS.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 3.1x faster than HP rx7640 Itanium2 non-UNICODE result of 96.17 minutes, on Payroll 9.0.

  • The average CPU utilization on the SPARC T4-4 server was only 30%, leaving significant room for business growth.

  • The SPARC T4-4 server processed payroll for 500,000 employees, 750,000 payments, in 30.84 minutes compared to the earlier world record result of 46.76 minutes on Oracle's SPARC Enterprise M5000 server.

  • The SPARC Enterprise M5000 server configured with eight 2.66 GHz SPARC64 VII+ processors has a result of 46.76 minutes on Payroll 9.1. That is 7% better than the result of 50.11 minutes on the SPARC Enterprise M5000 server configured with eight 2.53 GHz SPARC64 VII processors on Payroll 9.0. The difference in clock speed between the two processors is about 5%, close to the difference between the two results, showing that Payroll 9.1 loads the system about as heavily as Payroll 9.0.
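The speedup figures in the bullets above can be re-derived directly from the published elapsed times (arithmetic only; all times in minutes):

```python
# Elapsed times (minutes) from the published results.
t4_91    = 30.84   # SPARC T4-4, Payroll 9.1 (Unicode)
z10_90   = 87.4    # IBM z10 EC 2097, Payroll 9.0 (Unicode)
hp_90    = 96.17   # HP rx7640, Payroll 9.0 (non-Unicode)
m5000_91 = 46.76   # SPARC Enterprise M5000, Payroll 9.1

print(f"vs IBM z10:   {z10_90 / t4_91:.1f}x faster")    # 2.8x
print(f"vs HP rx7640: {hp_90 / t4_91:.1f}x faster")     # 3.1x
print(f"vs M5000:     {m5000_91 / t4_91:.2f}x faster")  # 1.52x
```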

Performance Landscape

PeopleSoft Payroll (N.A.) 9.1 – 500K Employees (7 Million SQL PayCalc, Unicode)

System                                    OS/Database          Payroll Processing   Run 1       Num of
                                                               Result (minutes)     (minutes)   Streams
SPARC T4-4, 4 x 3.0 GHz SPARC T4          Solaris/Oracle 11g   30.84                43.76       96
SPARC M5000, 8 x 2.66 GHz SPARC64 VII+    Solaris/Oracle 11g   46.76                66.28       32

PeopleSoft Payroll (N.A.) 9.0 – 500K Employees (3 Million SQL PayCalc, Non-Unicode)

System                                 OS/Database          Payroll Processing   Run 1    Run 2    Run 3     Num of
                                                            Result (minutes)                                 Streams
Sun M5000, 8 x 2.53 GHz SPARC64 VII    Solaris/Oracle 11g   50.11                73.88    534.20   1267.06   32
IBM z10 EC 2097, 9 x 4.4 GHz Gen1      z/OS / DB2           58.96                80.5     250.68   462.6     8
IBM z10 EC 2097, 9 x 4.4 GHz Gen1      z/OS / DB2           87.4 **              107.6    -        -         8
HP rx7640, 8 x 1.6 GHz Itanium2        HP-UX/Oracle 11g     96.17                133.63   712.72   1665.01   32

** This result was run with Unicode. The IBM z10 EC 2097 Unicode result of 87.4 minutes is 48% slower than the IBM z10 EC 2097 non-Unicode result of 58.96 minutes, both on Payroll 9.0 and each configured with nine 4.4 GHz Gen1 processors.
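The 48% figure follows directly from the two elapsed times (a simple check):

```python
# Unicode overhead on the IBM z10 EC 2097, Payroll 9.0.
unicode_min = 87.4
non_unicode_min = 58.96

overhead = (unicode_min - non_unicode_min) / non_unicode_min
print(f"Unicode run is {overhead:.0%} slower")  # prints "Unicode run is 48% slower"
```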

Payroll 9.1 Compared to Payroll 9.0

Please note that Payroll 9.1 is Unicode based, while Payroll 9.0 had both non-Unicode and Unicode versions of the workload. There are 7 million executions of an SQL statement for the PayCalc batch process in Payroll 9.1 versus 3 million executions of the same SQL statement in Payroll 9.0. This is reflected in the PayCalc elapsed times (27.33 min for 9.1 versus 23.78 min for 9.0). The elapsed times of all other batch processes are lower (better) on 9.1.

Configuration Summary

Hardware Configuration:

SPARC T4-4 server
4 x 3.0 GHz SPARC T4 processors
256 GB memory
Sun Storage F5100 Flash Array
80 x 24 GB FMODs

Software Configuration:

Oracle Solaris 10 8/11
PeopleSoft HRMS and Campus Solutions 9.10.303
PeopleSoft Enterprise (PeopleTools) 8.51.035
Oracle Database 11g Release 2 11.2.0.1 (64-bit)
Micro Focus COBOL Server Express 5.1 (64-bit)

Benchmark Description

The PeopleSoft 9.1 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates payroll data worksheets consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file that is used to transfer payroll funds directly into an employee's bank account.

Key Points and Best Practices

  • The SPARC T4-4 server with the Sun Storage F5100 Flash Array device had an average read throughput of up to 103 MB/sec and an average write throughput of up to 124 MB/sec while consuming 30% CPU on average.

  • The Sun Storage F5100 Flash Array device is a solid-state device that provides a read latency of only 0.5 msec. That is about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

See Also

  • Oracle PeopleSoft Benchmark White Papers
    oracle.com
  • PeopleSoft Enterprise Human Capital Management (Payroll)
    oracle.com

  • PeopleSoft Enterprise Payroll 9.1 Using Oracle for Solaris (Unicode) on an Oracle's SPARC T4-4 – White Paper
    oracle.com

  • SPARC T4-4 Server
    oracle.com
  • Oracle Solaris
    oracle.com
  • Oracle Database 11g Release 2 Enterprise Edition
    oracle.com
  • Sun Storage F5100 Flash Array
    oracle.com

Disclosure Statement

Oracle's PeopleSoft Payroll 9.1 benchmark, SPARC T4-4 30.84 min,
http://www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Friday Aug 12, 2011

Sun Blade X6270 M2 with Oracle WebLogic Sets World Record on 2-Processor SPECjEnterprise2010 Benchmark

Oracle produced a world record two-chip, single-application-server SPECjEnterprise2010 result of 5,427.42 SPECjEnterprise2010 EjOPS, using one of Oracle's Sun Blade X6270 M2 server modules for the application tier and another Sun Blade X6270 M2 server module for the database tier.

  • The Sun Blade X6270 M2 server module, equipped with two Intel Xeon X5690 processors running at 3.46 GHz, demonstrated 47% better performance than the 2-chip IBM System HS22 server result of 3,694.35 SPECjEnterprise2010 EjOPS, which used the same model of Intel Xeon X5690 processor.

  • The Sun Blade X6270 M2 server module running the application tier demonstrated 33% better performance compared to the 2-chip IBM Power 730 Express server result of 4,062.38 SPECjEnterprise2010 EjOPS.

  • The Sun Blade X6270 M2 server modules used Oracle WebLogic Server 11g Release 1 (10.3.5) application, Java SE 6 Update 26, and Oracle Database 11g Release 2 to produce this result.
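The relative-performance claims above can be checked against the published EjOPS values (arithmetic only):

```python
# Published SPECjEnterprise2010 EjOPS results.
x6270_m2 = 5427.42   # Sun Blade X6270 M2 + Oracle WebLogic
hs22     = 3694.35   # IBM HS22 (same Xeon X5690 processors)
power730 = 4062.38   # IBM Power 730 Express

print(f"vs IBM HS22:      {x6270_m2 / hs22:.2f}x")      # 1.47x, i.e. 47% better
print(f"vs IBM Power 730: {x6270_m2 / power730:.2f}x")  # 1.34x, i.e. ~33% better
```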

Performance Landscape

Complete benchmark results are at the SPEC website, SPECjEnterprise2010 Results.

SPECjEnterprise2010 Performance Chart
as of 8/11/2011

Submitter   EjOPS*      Application Server                 Database Server
Oracle      5,427.42    1 x Sun Blade X6270 M2             1 x Sun Blade X6270 M2
                        2 x 3.46 GHz Intel Xeon X5690      2 x 3.46 GHz Intel Xeon X5690
                        Oracle WebLogic 11g (10.3.5)       Oracle 11g DB 11.2.0.2
IBM         4,062.38    1 x IBM Power 730 Express          1 x IBM BladeCenter PS701
                        2 x 3.5 GHz POWER7                 1 x 3.0 GHz POWER7
                        WebSphere Application Server V7    IBM DB2 9.7 Workgroup Server Edition FP3a
IBM         3,694.35    1 x IBM HS22                       1 x IBM x3850 X5
                        2 x 3.46 GHz Intel Xeon X5690      2 x 2.4 GHz Intel Xeon E7-4870
                        WebSphere Application Server V8    IBM DB2 9.7 FP3a

* SPECjEnterprise2010 EjOPS, bigger is better.

Configuration Summary

Application Server:
    1 x Sun Blade X6270 M2
      2 x 3.46 GHz Intel Xeon X5690
      48 GB memory
      4 x 10 GbE NIC
      Oracle Linux 5 Update 6
      Oracle WebLogic Server 11g Release 1 (10.3.5)
      Java HotSpot(TM) 64-Bit Server VM on Linux, version 1.6.0_26 (Java SE 6 Update 26)

Database Server:

    1 x Sun Blade X6270 M2
      2 x 3.46 GHz Intel Xeon X5690
      144 GB memory
      2 x 10 GbE NIC
      2 x Sun Storage 6180
      Oracle Linux 5 Update 6
      Oracle Database 11g Enterprise Edition Release 11.2.0.2

Benchmark Description

SPECjEnterprise2010 is the third generation of the SPEC organization's J2EE end-to-end industry standard benchmark application. The SPECjEnterprise2010 benchmark has been designed and developed to cover the Java EE 5.0 specification's significantly expanded and simplified programming model, highlighting the major features used by developers in the industry today. This provides a real world workload driving the Application Server's implementation of the Java EE specification to its maximum potential and allowing maximum stressing of the underlying hardware and software systems.

The workload consists of an end-to-end web-based order processing domain, an RMI and Web Services driven manufacturing domain, and a supply chain model utilizing document-based Web Services. The application is a collection of Java classes, Java Servlets, JavaServer Pages, Enterprise JavaBeans, Java Persistence entities (POJOs) and Message Driven Beans.

The SPECjEnterprise2010 benchmark heavily exercises all parts of the underlying infrastructure that make up the application environment, including hardware, JVM software, database software, JDBC drivers, and the system network.

The primary metric of the SPECjEnterprise2010 benchmark is jEnterprise Operations Per Second ("SPECjEnterprise2010 EjOPS"). It is calculated by adding the metrics of the Dealership Management Application in the Dealer Domain and the Manufacturing Application in the Manufacturing Domain. There is no price/performance metric in this benchmark.

Key Points and Best Practices

  • Two Oracle WebLogic server instances were started using numactl, binding one instance per chip.
  • Two Oracle database listener processes were started and each was bound to a separate chip.
  • Additional tuning information is in the report at http://spec.org.

See Also

Disclosure Statement

SPEC and the benchmark name SPECjEnterprise are registered trademarks of the Standard Performance Evaluation Corporation. Sun Blade X6270 M2, 5,427.42 SPECjEnterprise2010 EjOPS; IBM Power 730 Express, 4,062.38 SPECjEnterprise2010 EjOPS; IBM System HS22, 3,694.35 SPECjEnterprise2010 EjOPS. Results from www.spec.org as of 8/11/2011.

Friday Jun 03, 2011

SPARC Enterprise M8000 with Oracle 11g Beats IBM POWER7 on TPC-H @1000GB Benchmark

Oracle's SPARC Enterprise M8000 server configured with SPARC64 VII+ processors, Oracle's Sun Storage F5100 Flash Array storage, Oracle Solaris, and Oracle Database 11g Release 2 achieved a TPC-H performance result of 209,533 QphH@1000GB with price/performance of $9.53/QphH@1000GB.

Oracle's SPARC server surpasses the performance of the IBM POWER7 server on the 1 TB TPC-H decision support benchmark.

Oracle focuses on the performance of the complete hardware and software stack. Implementation details such as the number of cores or the number of threads obscure the important metric of delivered system performance. The SPARC Enterprise M8000 server delivers higher performance than the IBM Power 780 even though the SPARC64 VII+ processor core is 1.6x slower than the POWER7 processor core.

  • The SPARC Enterprise M8000 server is 27% faster than the IBM Power 780. IBM's reputed single-thread performance leadership does not provide benefit for throughput.

  • Oracle beats IBM Power with better performance. This shows that Oracle's focus on integrated system design provides more customer value than IBM's focus on per core performance.

  • The SPARC Enterprise M8000 server is up to 3.8 times faster than the IBM Power 780 for Refresh Function. Again, IBM's reputed single-thread performance leadership does not provide benefit for this important function.

  • The SPARC Enterprise M8000 server is 49% faster than the HP Superdome 2 (1.73 GHz Itanium 9350).

  • The SPARC Enterprise M8000 server is 22% better price performance than the HP Superdome 2 (1.73 GHz Itanium 9350).

  • The SPARC Enterprise M8000 server is 2 times faster than the HP Superdome 2 (1.73 GHz Itanium 9350) for Refresh Function.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.14.0 specification which is the highest level.

  • One should focus on the performance of the complete hardware and software stack, since server implementation details such as the number of cores or the number of threads obscure the important metric of delivered system performance.

  • This TPC-H result demonstrates that the SPARC Enterprise M8000 server can handle the increasingly large databases required of DSS systems. The server delivered more than 16 GB/sec of IO throughput through Oracle Database 11g Release 2 software while maintaining a high CPU load.
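The comparative percentages above follow from the published composite metrics and price/performance figures:

```python
# Published TPC-H @1000GB figures.
m8000_qphh  = 209_533.6
m8000_price = 9.53       # $/QphH
p780_qphh   = 164_747.2
sd2_qphh    = 140_181.1
sd2_price   = 12.15      # $/QphH

print(f"vs IBM Power 780:  {m8000_qphh / p780_qphh - 1:.0%} faster")   # 27% faster
print(f"vs HP Superdome 2: {m8000_qphh / sd2_qphh - 1:.0%} faster")    # 49% faster
print(f"price/performance: {1 - m8000_price / sd2_price:.0%} better")  # 22% better
```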

Performance Landscape

The table below lists published results from comparable enterprise class systems from Oracle, HP and IBM. Each system was configured with 512 GB of memory.

TPC-H @1000GB

System                          Composite    $/perf      Power       Throughput   Database     Available
  CPU type                      (QphH)       ($/QphH)    (QppH)      (QthH)
  Proc/Core/Thread

SPARC Enterprise M8000          209,533.6    $9.53       177,845.9   246,867.2    Oracle 11g   09/22/11
  3 GHz SPARC64 VII+
  16 / 64 / 128

IBM Power 780                   164,747.2    $6.85       170,206.4   159,463.1    Sybase       03/31/11
  4.14 GHz POWER7
  8 / 32 / 128

HP SuperDome 2                  140,181.1    $12.15      139,181.0   141,188.3    Oracle 11g   10/20/10
  1.73 GHz Intel Itanium 9350
  16 / 64 / 64

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server:

SPARC Enterprise M8000 server
16 x SPARC64 VII+ 3.0 GHz processors (total of 64 cores, 128 threads)
512 GB memory
12 x internal SAS (12 x 300 GB) disk drives

External Storage:

4 x Sun Storage F5100 Flash Array devices, each with
80 x 24 GB Flash Modules

Software:

Oracle Solaris 10 8/11
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 1000 GB (Scale Factor 1000)
TPC-H Composite: 209,533.6 QphH@1000GB
Price/performance: $9.53/QphH@1000GB
Available: 09/22/2011
Total 3 year Cost: $1,995,715
TPC-H Power: 177,845.9
TPC-H Throughput: 246,867.2
Database Load Time: 1:27:12

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of QphH to total HW/SW cost plus 3 years maintenance.
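The composite is the geometric mean of the power and throughput components; checking this against the audited figures for the SPARC Enterprise M8000 result:

```python
import math

# QphH@SF = sqrt(Power@SF * Throughput@SF), the geometric mean of the
# two component metrics.
power      = 177_845.9   # TPC-H Power for this result
throughput = 246_867.2   # TPC-H Throughput for this result

qphh = math.sqrt(power * throughput)
print(f"QphH@1000GB = {qphh:,.1f}")  # ~209,533.6, matching the audited composite
```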

Key Points and Best Practices

  • Four Sun Storage F5100 Flash Array devices were used for the benchmark. Each F5100 device contains 80 Flash Modules (FMODs). Twenty (20) FMODs from each F5100 device were connected to a single SAS 6 Gb HBA. A single F5100 device showed 4.16 GB/sec for sequential read and demonstrated linear scaling of 16.62 GB/sec with 4 x F5100 devices.
  • The IO rate from the Oracle database was over 16 GB/sec.
  • Oracle Solaris 10 8/11 required very little system tuning.
  • The SPARC Enterprise M8000 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle parallel processes.
  • The Oracle database files were mirrored under Solaris Volume Manager (SVM). Two F5100 arrays were mirrored to another two F5100 arrays. IO performance was good and balanced across all the FMODs. Because of the SVM mirror, one of the durability tests, the disk/controller failure test, was transparent to the Oracle database.
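The linear-scaling claim for the flash arrays can be quantified from the two quoted read rates:

```python
# Sequential-read scaling for the Sun Storage F5100 arrays.
one_array   = 4.16    # GB/sec, single F5100
four_arrays = 16.62   # GB/sec, four F5100s

efficiency = four_arrays / (4 * one_array)
print(f"scaling efficiency: {efficiency:.1%}")  # prints "scaling efficiency: 99.9%"
```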

See Also

Disclosure Statement

SPARC Enterprise M8000 209,533.6 QphH@1000GB, $9.53/QphH@1000GB, avail 09/22/11; IBM Power 780 164,747.2 QphH@1000GB, $6.85/QphH@1000GB, avail 03/31/11; HP Integrity Superdome 2 140,181.1 QphH@1000GB, $12.15/QphH@1000GB, avail 10/20/10. TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.

Friday Mar 25, 2011

SPARC Enterprise M9000 with Oracle Database 11g Delivers World Record Single Server TPC-H @3000GB Result

Oracle's SPARC Enterprise M9000 server delivers single-system TPC-H @3000GB world record performance. The SPARC Enterprise M9000 server along with Oracle's Sun Storage 6180 arrays and running Oracle Database 11g Release 2 on the Oracle Solaris operating system proves the power of Oracle's integrated solution.

  • The SPARC Enterprise M9000 server configured with SPARC64 VII+ processors, Sun Storage 6180 arrays and running Oracle Solaris 10 combined with Oracle Database 11g Release 2 achieved World Record TPC-H performance of 386,478.3 QphH@3000GB for non-clustered systems.

  • The SPARC Enterprise M9000 server running the Oracle Database 11g Release 2 software is 2.5 times faster than the IBM p595 (POWER6) server which ran with Sybase IQ v.15.1 database software.

  • The SPARC Enterprise M9000 server is 3.4 times faster than the IBM p595 server for data loading.

  • The SPARC Enterprise M9000 server is 3.5 times faster than the IBM p595 server for Refresh Function.

  • The SPARC Enterprise M9000 server configured with Sun Storage 6180 arrays shows linear scaling up to the maximum delivered IO performance of 48.3 GB/sec as measured by vdbench.

  • The SPARC Enterprise M9000 server running the Oracle Database 11g Release 2 software is 2.4 times faster than the HP ProLiant DL980 server which used Microsoft SQL Server 2008 R2 Enterprise Edition software.

  • The SPARC Enterprise M9000 server is 2.9 times faster than the HP ProLiant DL980 server for data loading.

  • The SPARC Enterprise M9000 server is 4 times faster than the HP ProLiant DL980 server for Refresh Function.

  • A 1.94x improvement was delivered by the SPARC Enterprise M9000 server result using 64 SPARC64 VII+ processors compared to the previous Sun SPARC Enterprise M9000 server result, which used 32 SPARC64 VII processors.

  • Oracle's TPC-H result shows that the SPARC Enterprise M9000 server can handle the increasingly large databases required of DSS systems. The IO rate as measured by the Oracle database is over 40 GB/sec.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.14.0 specification which is the highest level.
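The headline ratios in the bullets above can be re-derived from the published composites:

```python
# Published TPC-H @3000GB composite metrics.
m9000_new = 386_478.3   # SPARC Enterprise M9000, SPARC64 VII+
m9000_old = 198_907.5   # previous M9000 result, SPARC64 VII
power595  = 156_537.3   # IBM Power 595, Sybase
dl980     = 162_601.7   # HP ProLiant DL980 G7, SQL Server

print(f"vs IBM Power 595:  {m9000_new / power595:.1f}x")   # 2.5x
print(f"vs HP DL980:       {m9000_new / dl980:.1f}x")      # 2.4x
print(f"vs previous M9000: {m9000_new / m9000_old:.2f}x")  # 1.94x
```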

Performance Landscape

TPC-H @3000GB, Non-Clustered Systems

System                         Composite    $/perf      Power       Throughput   Database     Available
  CPU type                     (QphH)       ($/QphH)    (QppH)      (QthH)
  Memory

SPARC Enterprise M9000         386,478.3    $18.19      316,835.8   471,428.6    Oracle 11g   09/22/11
  3 GHz SPARC64 VII+
  1024 GB

Sun SPARC Enterprise M9000     198,907.5    $15.27      182,350.7   216,967.7    Oracle 11g   12/09/10
  2.88 GHz SPARC64 VII
  512 GB

HP ProLiant DL980 G7           162,601.7    $2.68       185,297.7   142,601.7    SQL Server   10/13/10
  2.27 GHz Intel Xeon X7560
  512 GB

IBM Power 595                  156,537.3    $20.60      142,790.7   171,607.4    Sybase       11/24/09
  5.0 GHz POWER6
  512 GB

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server:

SPARC Enterprise M9000
64 x SPARC64 VII+ 3.0 GHz processors
1024 GB memory
4 x internal SAS (4 x 146 GB)

External Storage:

32 x Sun Storage 6180 arrays (each with 16 x 600 GB)

Software:

Oracle Solaris 10 9/10
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 386,478.3 QphH@3000GB
Price/performance: $18.19/QphH@3000GB
Available: 09/22/2011
Total 3 year Cost: $7,030,009
TPC-H Power: 316,835.8
TPC-H Throughput: 471,428.6
Database Load Time: 2:59:01

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of QphH to total HW/SW cost plus 3 years maintenance.

Key Points and Best Practices

  • The Sun Storage 6180 array showed linear scalability, reaching 48.3 GB/sec sequential read with thirty-two Sun Storage 6180 arrays. Scaling could continue if more arrays were available.
  • Oracle Solaris 10 9/10 required very little system tuning.
  • The optimal Sun Storage 6180 array configuration for the benchmark was to set up one disk per volume instead of multiple disks per volume and let Oracle Automatic Storage Management (ASM) handle mirroring. Presenting as many volumes as possible to the Oracle database gave the highest scan rate.

  • The storage was managed by ASM with a 4 MB stripe size. The default stripe size is 1 MB, but 4 MB works better for large databases.

  • All the Oracle database files, except the TEMP tablespace, were mirrored under ASM. 16 Sun Storage 6180 arrays (256 disks) were mirrored to another 16 Sun Storage 6180 arrays using ASM. IO performance was good and balanced across all the disks. With the ASM mirror, the benchmark passed the ACID (Atomicity, Consistency, Isolation and Durability) test.

  • Oracle database tables were 256-way partitioned. The parallel degree for each table was set to 256 to match the number of available cores. This setting worked the best for performance.

  • Oracle Database 11g Release 2 feature Automatic Parallel Degree Policy was set to AUTO for the benchmark. This enabled automatic degree of parallelism, statement queuing and in-memory parallel execution.
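The 48.3 GB/sec aggregate figure implies the following per-array bandwidth (simple division, using the numbers quoted above):

```python
# Per-array sequential-read bandwidth behind the aggregate scan rate.
total_gb_per_sec = 48.3
arrays = 32

per_array = total_gb_per_sec / arrays
print(f"{per_array:.2f} GB/sec per Sun Storage 6180 array")  # ~1.51 GB/sec
```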

See Also

Disclosure Statement

SPARC Enterprise M9000 386,478.3 QphH@3000GB, $18.19/QphH@3000GB, avail 09/22/11; IBM Power 595 156,537.3 QphH@3000GB, $20.60/QphH@3000GB, avail 11/24/09; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB, avail 10/13/10. TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.

Tuesday Mar 22, 2011

Netra SPARC T3-1 22% Faster Than IBM Running Oracle Communications ASAP

Oracle's Netra SPARC T3-1 server delivered better performance than the IBM Power 570 server running the Oracle Communications ASAP application. Oracle Communications ASAP is used by the world's leading communication providers to enable voice, data, video and content services across wireless, wireline and satellite networks.

  • A Netra SPARC T3-1 server is 22% faster than the IBM Power 570 server delivering higher order volume throughput. This was achieved by consolidating Oracle Database 11g Release 2 and Oracle Communications ASAP 7.0.2 software onto a single Netra SPARC T3-1 server.

  • Oracle's Netra servers are NEBS level 3 certified, unlike the competition. NEBS is a set of safety, physical, and environmental design guidelines for telecommunications equipment in the United States.

  • A single Netra SPARC T3-1 server takes one-eighth the rack space of an IBM Power 570 system.

  • The single-processor Netra SPARC T3-1 server beat an eight-processor IBM Power 570 server.

  • The ASAP result on the Netra SPARC T3-1 server is the highest single-system throughput ever measured for this benchmark.

Performance Landscape

Results of Oracle Communications ASAP run with Oracle Database 11g.

System             Processor               Memory   OS           Orders/hour   Version
Netra SPARC T3-1   1 x 1.65 GHz SPARC T3   128 GB   Solaris 10   570,000       7.0.2
IBM Power 570      8 x 5 GHz POWER6        128 GB   AIX 6.1.2    463,500       7.0

In both cases, server utilization ranged between 60 and 75%.

Configuration Summary

Hardware Configuration:

Netra SPARC T3-1
1 x 1.65 GHz T3 processor
128 GB memory
Sun Storage 7410 Unified Storage System with one Sun Storage J4400 array

Software Configuration:

Oracle Solaris 10 9/10
Oracle Database 11g Release 2 (11.2.0.1.0)
Java Platform, Standard Edition 6 Update 18
Oracle Communications ASAP 7.0.2
Oracle WebLogic Server 10.3.3.0

Benchmark Description

Oracle Communications Service Activation orchestrates the activation of complex services in a flow-through manner across multiple technology domains for both wireline and wireless service providers. This Activation product has two engines: ASAP (Automatic Service Activation Program) and IPSA (IP Service Activator). ASAP covers multiple technologies and vendors, while IPSA focuses on IP-based services.

ASAP converts order activation requests (also referred to as CSDLs) into specific atomic actions for network elements (ASDLs). ASAP performance is measured in throughput and can be expressed either as number of input business orders processed (orders/hour or CSDLs/hour) or as number of actions on network elements (ASDLs/sec). The ratio of CSDL to ASDL depends on the specific telco operator. This workload uses a 1:7 ratio (commonly used by wireless providers), which means that every order translates into actions for 7 network elements. For this benchmark, ASAP was configured to use one NEP (Network Element Processor) per network element.
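Given the 1:7 CSDL:ASDL ratio described above, the two ways of expressing ASAP throughput are directly convertible. A quick sketch, using the order rates from the Performance Landscape table above:

```python
# Convert ASAP order throughput (CSDLs/hour) into network-element
# actions per second (ASDLs/sec) using the 1:7 CSDL:ASDL ratio
# described above for this wireless-provider workload.

ASDL_PER_CSDL = 7  # every order translates into actions on 7 network elements

def asdls_per_sec(orders_per_hour: float, ratio: int = ASDL_PER_CSDL) -> float:
    """Actions on network elements per second for a given order rate."""
    return orders_per_hour * ratio / 3600.0

print(round(asdls_per_sec(570_000)))  # Netra SPARC T3-1 result -> 1108 ASDLs/sec
print(round(asdls_per_sec(463_500)))  # IBM Power 570 result    ->  901 ASDLs/sec
```

The actual CSDL:ASDL ratio varies by operator, so these conversions only hold for the 1:7 workload used in this benchmark.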

Key Points and Best Practices

The application and database tiers were hosted on the same Netra SPARC T3-1 server.

ASAP has three main components: WebLogic, SARM, and NEP. WebLogic receives and translates orders arriving as JMS messages. SARM and NEP, both native applications, perform the core activation functions.

A single ASAP instance delivered slightly under 300k orders/hour, with 27% system utilization. To take better advantage of the SPARC T3 processor's threads, two more instances of ASAP were deployed, reaching 570k orders/hour. The observed ratio between ASAP and Oracle database processor load was 1 to 1.

The Sun Storage 7410 data volumes were mounted via NFS and accessed through the onboard GbE NIC.

A second test was conducted with a more complex configuration of 24 NEPs instead of 7, simulating the requirements of one of the largest ASAP customers. In this scenario, a single ASAP instance delivered 200k orders/hour.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 3/22/2011.

Tuesday Oct 26, 2010

3D VTI Reverse Time Migration Scalability On Sun Fire X2270-M2 Cluster with Sun Storage 7210

This Oil & Gas benchmark shows the Sun Storage 7210 system delivers almost 2 GB/sec bandwidth and realizes near-linear scaling performance on a cluster of 16 Sun Fire X2270 M2 servers.

Oracle's Sun Storage 7210 system, attached via QDR InfiniBand to a cluster of sixteen of Oracle's Sun Fire X2270 M2 servers, was used to demonstrate the performance of a Reverse Time Migration application, an important application in the Oil & Gas industry. The total application throughput and computational kernel scaling are presented for two production-sized grids, each imaging 800 samples.

  • Both the Reverse Time Migration I/O and combined computation show near-linear scaling from 8 to 16 nodes on the Sun Storage 7210 system connected via QDR InfiniBand to a Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 2.0x improvement
      2486 x 1151 x 1231: 1.7x improvement
  • The computational kernel of the Reverse Time Migration has linear to super-linear scaling from 8 to 16 nodes in Oracle's Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231 : 2.2x improvement
      2486 x 1151 x 1231 : 2.0x improvement
  • Intel Hyper-Threading provides additional performance benefits to both the Reverse Time Migration I/O and computation when going from 12 to 24 OpenMP threads on the Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 8% - computational kernel; 2% - total application throughput
      2486 x 1151 x 1231: 12% - computational kernel; 6% - total application throughput
  • The Sun Storage 7210 system delivers the Velocity, Epsilon, and Delta data to the Reverse Time Migration at a steady rate even when timing includes memory initialization and data object creation:

      1243 x 1151 x 1231: 1.4 to 1.6 GBytes/sec
      2486 x 1151 x 1231: 1.2 to 1.3 GBytes/sec

    One can see that when doubling the size of the problem, the additional complexity of overlapping I/O and multiple node file contention only produces a small reduction in read performance.

Performance Landscape

Application Scaling

Performance and scaling results of the total application, including I/O, for the reverse time migration demonstration application are presented. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Application Scaling Across Multiple Nodes

                  Grid Size 1243 x 1151 x 1231         Grid Size 2486 x 1151 x 1231
Number   Total     Kernel    Total    Kernel      Total     Kernel    Total    Kernel
Nodes    (sec)     (sec)     Speedup  Speedup     (sec)     (sec)     Speedup  Speedup
16       504       259       2.0      2.2*        1024      551       1.7      2.0
14       565       279       1.8      2.0         1191      677       1.5      1.6
12       662       343       1.6      1.6         1426      817       1.2      1.4
10       784       394       1.3      1.4         1501      856       1.2      1.3
8        1024      560       1.0      1.0         1745      1108      1.0      1.0

* Super-linear scaling due to the compute kernel fitting better into available cache
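The speedup columns in the table are simply the 8-node baseline times divided by the N-node times. A quick check against the raw times for the 1243 x 1151 x 1231 grid:

```python
# Recompute the speedup columns of the scaling table:
# speedup(N) = time(8 nodes) / time(N nodes).

total_8, kernel_8 = 1024, 560    # 8-node baseline times (sec), 1243x1151x1231 grid
total_16, kernel_16 = 504, 259   # 16-node times (sec), same grid

print(f"total speedup:  {total_8 / total_16:.1f}")    # 2.0 (near-linear)
print(f"kernel speedup: {kernel_8 / kernel_16:.1f}")  # 2.2 (super-linear, cache effect)
```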

Application Scaling – Hyper-Threading Study

The effects of hyperthreading are presented when running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server.

Hyper-Threading Comparison – 12 versus 24 OpenMP Threads

                            Grid Size 1243 x 1151 x 1231          Grid Size 2486 x 1151 x 1231
Number   Threads   Total    Kernel   Total HT  Kernel HT     Total    Kernel   Total HT  Kernel HT
Nodes    per Node  (sec)    (sec)    Speedup   Speedup       (sec)    (sec)    Speedup   Speedup
16       24        504      259      1.02      1.08          1024     551      1.06      1.12
16       12        515      279      1.00      1.00          1088     616      1.00      1.00

Read Performance

Read performance is presented for the velocity, epsilon and delta files running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Velocity, Epsilon, and Delta File Read and Memory Initialization Performance

                            Grid Size 1243 x 1151 x 1231               Grid Size 2486 x 1151 x 1231
Number   Overlap     Time    Time Rel.  Total GB   Read Rate     Time    Time Rel.  Total GB   Read Rate
Nodes    MB Read     (sec)   8-node     Read       (GB/s)        (sec)   8-node     Read       (GB/s)
16       2040        16.7    1.1        23.2       1.4           36.8    1.1        44.3       1.2
8        951         14.8    1.0        22.1       1.6           33.0    1.0        43.2       1.3

Configuration Summary

Hardware Configuration:

16 x Sun Fire X2270 M2 servers, each with
2 x 2.93 GHz Intel Xeon X5670 processors
48 GB memory (12 x 4 GB at 1333 MHz)

Sun Storage 7210 system connected via QDR InfiniBand
2 x 18 GB SATA SSD (logzilla)
40 x 1 TB 7200 RPM SATA disks

Software Configuration:

SUSE Linux Enterprise Server (SLES) 10 SP2
Oracle Message Passing Toolkit 8.2.1 (for MPI)
Sun Studio 12 Update 1 C++, Fortran, OpenMP

Benchmark Description

This Reverse Time Migration (RTM) demonstration application measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this version, each node reads in only the trace, velocity, and conditioning data to be processed by that node plus a four element inline 3-D array pad (spatial order of eight) shared with its neighbors to the left and right during the initialization phase. It represents a full RTM application including the data input, computation, communication, and final output image to be used by the next work flow step involving 3D volumetric seismic interpretation.

Key Points and Best Practices

This demonstration application represents a full Reverse Time Migration solution. Many references to the RTM application tend to focus on the compute kernel and ignore the complexity that the input, communication, and output bring to the task.

I/O Characterization without Optimal Checkpointing

Velocity, Epsilon, and Delta Files - Grid Reading

The additional amount of overlapping reads to share velocity, epsilon, and delta edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (z_dimension) x (4 bytes) x (3 files)

For this particular benchmark study, the additional 3-D pad overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 1231 x 4 x 3 = 2.04 GB extra
    8 nodes: 7 x 8 x 1151 x 1231 x 4 x 3 = 0.95 GB extra

For the first of the two test cases, the total size of the three files used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 1231 x 4 bytes = 7.05 GB per file x 3 files = 21.13 GB

With the additional 3-D pad, the total amount of data read is:

    16 nodes: 2.04 GB + 21.13 GB = 23.2 GB
    8 nodes: 0.95 GB + 21.13 GB = 22.1 GB

For the second of the two test cases, the total size of the three files used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 1231 x 4 bytes = 14.09 GB per file x 3 files = 42.27 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: 2.04 GB + 42.27 GB = 44.3 GB
    8 nodes: 0.95 GB + 42.27 GB = 43.2 GB

Note that the amount of overlapping data read increases, not only by the number of nodes, but as the y dimension and/or the z dimension increases.
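The pad-overlap formula and the totals above can be reproduced with a few lines of arithmetic (the text uses decimal gigabytes):

```python
# Reproduce the grid-file read sizes above: three files (velocity,
# epsilon, delta) of 4-byte samples, plus the 3-D halo pad that
# neighboring nodes re-read along the inline (x) split.

GB = 1e9  # decimal gigabytes, as used in the text above

def pad_bytes(nodes, order, ny, nz, files=3):
    # (nodes - 1) shared boundaries, each re-reading `order` x-planes per file
    return (nodes - 1) * order * ny * nz * 4 * files

def grid_bytes(nx, ny, nz, files=3):
    return nx * ny * nz * 4 * files

nx, ny, nz = 1243, 1151, 1231
print(f"pad, 16 nodes: {pad_bytes(16, 8, ny, nz) / GB:.2f} GB")   # 2.04
print(f"pad,  8 nodes: {pad_bytes(8, 8, ny, nz) / GB:.2f} GB")    # 0.95
print(f"grid files:    {grid_bytes(nx, ny, nz) / GB:.2f} GB")     # 21.13
total = (pad_bytes(16, 8, ny, nz) + grid_bytes(nx, ny, nz)) / GB
print(f"total, 16 nodes: {total:.1f} GB")                         # 23.2
```

Swapping in nx = 2486 reproduces the 42.27 GB and 44.3 GB figures for the larger grid.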

Trace Reading

The additional amount of overlapping reads to share trace edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (4 bytes) x (number_of_time_slices)

For this particular benchmark study, the additional overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 4 x 800 = 442MB extra
    8 nodes: 7 x 8 x 1151 x 4 x 800 = 206MB extra

For the first case the size of the trace data file used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 4 bytes x 800 = 4.578 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 4.578 GB = 5.0 GB
    8 nodes: .206 GB + 4.578 GB = 4.8 GB

For the second case the size of the trace data file used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 4 bytes x 800 = 9.156 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 9.156 GB = 9.6 GB
    8 nodes: .206 GB + 9.156 GB = 9.4 GB

As the number of nodes is increased, the overlap causes more disk lock contention.
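The trace-data arithmetic follows the same halo pattern, with one file of 4-byte samples per time slice and no z extent in the pad:

```python
# Reproduce the trace-read sizes above: halo pad along x only,
# multiplied by the number of time slices (800 in this study).

def trace_pad_mb(nodes, order, ny, slices):
    return (nodes - 1) * order * ny * 4 * slices / 1e6

def trace_file_gb(nx, ny, slices):
    return nx * ny * 4 * slices / 1e9

print(f"pad, 16 nodes: {trace_pad_mb(16, 8, 1151, 800):.0f} MB")   # 442
print(f"pad,  8 nodes: {trace_pad_mb(8, 8, 1151, 800):.0f} MB")    # 206
print(f"trace file:    {trace_file_gb(1243, 1151, 800):.3f} GB")   # 4.578
```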

Writing Final Output Image

1243x1151x1231 - 7.1 GB per file:

    16 nodes: 78 x 1151 x 1231 x 4 = 442MB/node (7.1 GB total)
    8 nodes: 156 x 1151 x 1231 x 4 = 884MB/node (7.1 GB total)

2486x1151x1231 - 14.1 GB per file:

    16 nodes: 156 x 1151 x 1231 x 4 = 930 MB/node (14.1 GB total)
    8 nodes: 311 x 1151 x 1231 x 4 = 1808 MB/node (14.1 GB total)

Resource Allocation

It is best to allocate one node as the Oracle Grid Engine resource scheduler and MPI master host. This is especially true when running with 24 OpenMP threads in hyperthreading mode to avoid oversubscribing a node that is cooperating in delivering the solution.

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/20/2010.

Monday Oct 11, 2010

Sun SPARC Enterprise M9000 Server Delivers World Record Non-Clustered TPC-H @3000GB Performance

Oracle's Sun SPARC Enterprise M9000 server delivered a single-system TPC-H 3000GB world record performance. The Sun SPARC Enterprise M9000 server, running Oracle Database 11g Release 2 on the Oracle Solaris operating system proves the power of Oracle's integrated solution.

  • Oracle beats IBM Power with better performance and price/performance (3 Year TCO). This shows that Oracle's focus on integrated system design provides more customer value than IBM's focus on "per core performance"!

  • The Sun SPARC Enterprise M9000 server is 27% faster than the IBM Power 595.

  • The Sun SPARC Enterprise M9000 server is 22% faster than the HP ProLiant DL980 G7.

  • The Sun SPARC Enterprise M9000 server has 26% better price/performance than the IBM Power 595.

  • The Sun SPARC Enterprise M9000 server is 2.7 times faster than the IBM Power 595 for data loading.

  • The Sun SPARC Enterprise M9000 server is 2.3 times faster than the HP ProLiant DL980 for data loading.

  • The Sun SPARC Enterprise M9000 server is 2.6 times faster than the IBM p595 for Refresh Function.

  • The Sun SPARC Enterprise M9000 server is 3 times faster than the HP ProLiant DL980 for Refresh Function.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.12.0 specification, which is the highest level. IBM is the only other vendor to protect its storage at this level.

  • One should focus on the performance of the complete hardware and software stack since server implementation details such as the number of cores or the number of threads will obscure the important metrics of delivered system performance and system price/performance.

  • The Sun SPARC Enterprise M9000 server configured with SPARC64 VII processors, Sun Storage 6180 arrays, and running the Oracle Solaris 10 operating system combined with Oracle Database 11g Release 2 achieved World Record TPC-H performance of 198,907.5 QphH@3000GB for non-clustered systems.

  • The Sun SPARC Enterprise M9000 server is over three times faster than the HP Itanium2 Superdome.

  • The Sun Storage 6180 array configuration (a total of 16 6180 arrays) in this benchmark delivered IO performance of over 21 GB/sec Sequential Read performance as measured by the vdbench tool.

  • This TPC-H result demonstrates that the Sun SPARC Enterprise M9000 server can handle the increasingly large databases required of DSS systems. The server delivered more than 18 GB/sec of real IO throughput as measured by the Oracle Database 11g Release 2 software.

  • Both Oracle and IBM had the same level of hardware discounting as allowed by TPC rules to provide an effective comparison of price/performance.

  • IBM has not shown any delivered I/O performance results for the high-end IBM POWER7 systems. In addition, they have not delivered any commercial benchmarks (TPC-C, TPC-H, etc.) which have heavy I/O demands.

Performance Landscape

TPC-H @3000GB, Non-Clustered Systems

System                       CPU Type                    Memory    Composite   $/perf     Power       Throughput   Database     Available
                                                                  (QphH)      ($/QphH)   (QppH)      (QthH)
Sun SPARC Enterprise M9000   2.88 GHz SPARC64 VII        512 GB    198,907.5   $15.27     182,350.7   216,967.7    Oracle       12/09/10
HP ProLiant DL980 G7         2.27 GHz Intel Xeon X7560   512 GB    162,601.7   $2.68      185,297.7   142,601.7    SQL Server   10/13/10
IBM Power 595                5.0 GHz POWER6              512 GB    156,537.3   $20.60     142,790.7   171,607.4    Sybase       11/24/09
Unisys ES7000 7600R          2.6 GHz Intel Xeon          1024 GB   102,778.2   $21.05     120,254.8   87,841.4     SQL Server   05/06/10
HP Integrity Superdome       1.6 GHz Intel Itanium       256 GB    60,359.3    $32.60     80,838.3    45,068.3     SQL Server   05/21/07

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server:

Sun SPARC Enterprise M9000
32 x SPARC64 VII 2.88 GHz processors
512 GB memory
4 x internal SAS (4 x 300 GB)

External Storage:

16 x Sun Storage 6180 arrays (16 arrays x 16 disks x 300 GB)

Software:

Operating System: Oracle Solaris 10 10/09
Database: Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 198,907.5 QphH@3000GB
Price/performance: $15.27/QphH@3000GB
Available: 12/09/2010
Total 3 year Cost: $3,037,900
TPC-H Power: 182,350.7
TPC-H Throughput: 216,967.7
Database Load Time: 3:40:11

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. The benchmark's queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of QphH to total HW/SW cost plus 3 years maintenance.
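The composite metric is defined in the TPC-H specification as the geometric mean of the Power and Throughput metrics, so the world-record figure above can be recomputed from the audited numbers:

```python
import math

# QphH@SF = sqrt(Power@SF * Throughput@SF): the composite is the
# geometric mean of the single-user (Power) and multi-user
# (Throughput) runs. Values are the audited M9000 results above.

power, throughput = 182_350.7, 216_967.7
qphh = math.sqrt(power * throughput)
print(f"{qphh:,.1f} QphH@3000GB")  # 198,907.5
```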

Key Points and Best Practices

  • The Sun Storage 6180 array showed good scalability and these sixteen 6180 arrays showed over 21 GB/sec Sequential Read performance as measured by the vdbench tool.
  • Oracle Solaris 10 10/09 required little system tuning.
  • The optimal 6180 configuration for the benchmark was to set up 1 disk per volume instead of multiple disks per volume and let Oracle Solaris Volume Manager (SVM) mirror. Presenting as many volumes as possible to Oracle database gave the highest scan rate.

  • The storage was managed by SVM with a 1 MB stripe size to match the Oracle database IO size. The default 16K stripe size is too small for this DSS benchmark.

  • All the Oracle files, except the TEMP tablespace, were mirrored under SVM. Eight 6180 arrays (128 disks) were mirrored to another eight 6180 arrays using a 128-way stripe. IO performance was good and balanced across all the disks in round-robin order. Read performance was the same with or without the mirror. With the SVM mirror, the benchmark passed the ACID (Atomicity, Consistency, Isolation and Durability) tests.

  • Oracle tables were 128-way partitioned and parallel degree for each table was set to 128 because the system had 128 cores. This setting worked the best for performance.

  • CPU usage during the Power run was relatively low. Because parallel degree was set to 128 for the tables and indexes, most queries utilized 128 vcpus while the system provided 256 vcpus.

See Also

Disclosure Statement

Sun SPARC Enterprise M9000 198,907.5 QphH@3000GB, $15.27/QphH@3000GB, avail 12/09/10, IBM Power 595 156,537.3 QphH@3000GB, $20.60/QphH@3000GB, avail 11/24/09, HP Integrity Superdome 60,359.3 QphH@3000GB, $32.60/QphH@3000GB avail 06/18/07, TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.

Monday Sep 20, 2010

Schlumberger's ECLIPSE 300 Performance Throughput On Sun Fire X2270 Cluster with Sun Storage 7410

Oracle's Sun Storage 7410 system, attached via QDR InfiniBand to a cluster of eight of Oracle's Sun Fire X2270 servers, was used to evaluate multiple job throughput of Schlumberger's Linux-64 ECLIPSE 300 compositional reservoir simulator processing their standard 2 Million Cell benchmark model with 8 rank parallelism (MM8 job).

  • The Sun Storage 7410 system showed little difference in performance (2%) compared to running the MM8 job with dedicated local disk.

  • When running 8 concurrent jobs on 8 different nodes, all using the Sun Storage 7410 system, performance saw little degradation (5%) compared to a single MM8 job running on dedicated local disk.

Experiments were run changing how the cluster was utilized in scheduling jobs. Rather than running with the default compact mode, tests were run distributing the single job among the various nodes. Performance improvements were measured when changing from the default compact scheduling scheme (1 job to 1 node) to a distributed scheduling scheme (1 job to multiple nodes).

  • When running at 75% of the cluster capacity, distributed scheduling outperformed the compact scheduling by up to 34%. Even when running at 100% of the cluster capacity, the distributed scheduling is still slightly faster than compact scheduling.

  • When combining workloads, distributed scheduling allowed two MM8 jobs to finish 19% faster than the reference time and a concurrent PSTM workload to finish 2% faster.

The Oracle Solaris Studio Performance Analyzer and Sun Storage 7410 system analytics were used to identify a 3D Prestack Kirchhoff Time Migration (PSTM) as a potential candidate for consolidating with ECLIPSE. Both scheduling schemes are compared while running various job mixes of these two applications using the Sun Storage 7410 system for I/O.

These experiments showed a potential opportunity for consolidating applications using Oracle Grid Engine resource scheduling and Oracle Virtual Machine templates.

Performance Landscape

Results are presented below on a variety of experiments run using the 2009.2 ECLIPSE 300 2 Million Cell Performance Benchmark (MM8). The compute nodes are a cluster of Sun Fire X2270 servers connected with QDR InfiniBand. First, some definitions used in the tables below:

Local HDD: Each job runs on a single node to its dedicated direct attached storage.
NFSoIB: One node hosts its local disk for NFS mounting to other nodes over InfiniBand.
IB 7410: Sun Storage 7410 system over QDR InfiniBand.
Compact Scheduling: All 8 MM8 MPI processes run on a single node.
Distributed Scheduling: Allocate the 8 MM8 MPI processes across all available nodes.

First Test

The first test compares the performance of a single MM8 test on a single node using local storage to running a number of jobs across the cluster and showing the effect of different storage solutions.

Compact Scheduling
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster   Number of   Local HDD             NFSoIB                IB 7410
Load      MM8 Jobs    Relative Throughput   Relative Throughput   Relative Throughput
13%       1           1.00                  1.00*                 0.98
25%       2           0.98                  0.97                  0.98
50%       4           0.98                  0.96                  0.97
75%       6           0.98                  0.95                  0.95
100%      8           0.98                  0.95                  0.95

* Performance measured on the node hosting the local disk that is NFS-shared to the other nodes in the cluster.

Second Test

This next test uses the Sun Storage 7410 system and compares the performance of running the MM8 job on 1 node using the compact scheduling to running multiple jobs with compact scheduling and to running multiple jobs with the distributed schedule. The tests are run on an 8 node cluster, so each distributed job has only 1 MPI process per node.

Comparing Compact and Distributed Scheduling
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster   Number of   Compact Scheduling    Distributed Scheduling*
Load      MM8 Jobs    Relative Throughput   Relative Throughput
13%       1           1.00                  1.34
25%       2           1.00                  1.32
50%       4           0.99                  1.25
75%       6           0.97                  1.10
100%      8           0.97                  0.98

* Each distributed job has 1 MPI process per node.

Third Test

This next test uses the Sun Storage 7410 system and compares the performance of running the MM8 job on 1 node using the compact scheduling to running multiple jobs with compact scheduling and to running multiple jobs with the distributed schedule. This test only uses 4 nodes, so each distributed job has two MPI processes per node.

Comparing Compact and Distributed Scheduling on 4 Nodes
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster   Number of   Compact Scheduling    Distributed Scheduling*
Load      MM8 Jobs    Relative Throughput   Relative Throughput
25%       1           1.00                  1.39
50%       2           1.00                  1.28
100%      4           1.00                  1.00

* Each distributed job has two MPI processes per node.

Fourth Test

The last test involves running two different applications on the 4 node cluster. It compares the performance of running the cluster fully loaded and changing how the applications are run, either compact or distributed. The comparisons are made against the individual application running the compact strategy (as few nodes as possible). It shows that appropriately mixing jobs can give better job performance than running just one kind of application on a single cluster.

Multiple Job, Multiple Application Throughput Results
Comparing Scheduling Strategies
2009.2 ECLIPSE 300 MM8 2 Million Cell and 3D Kirchhoff Time Migration (PSTM)

                      ECLIPSE              ECLIPSE                  PSTM                     PSTM
Number of  Number of  Compact Scheduling   Distributed Scheduling   Distributed Scheduling   Compact Scheduling   Cluster
PSTM Jobs  MM8 Jobs   (1 node x 8          (4 nodes x 2             (4 nodes x 4             (2 nodes x 8         Load
                      processes per job)   processes per job)       processes per job)       processes per job)
0          1          1.00                 1.40                     -                        -                    25%
0          2          1.00                 1.27                     -                        -                    50%
0          4          0.99                 0.98                     -                        -                    100%
1          2          -                    1.19                     1.02                     -                    100%
2          0          -                    -                        1.07                     0.96                 100%
1          0          -                    -                        1.08                     1.00                 50%

(- = no jobs of that type in the mix)

Results and Configuration Summary

Hardware Configuration:

8 x Sun Fire X2270 servers, each with
2 x 2.93 GHz Intel Xeon X5570 processors
24 GB memory (6 x 4 GB memory at 1333 MHz)
1 x 500 GB SATA
Sun Storage 7410 system, 24 TB total, QDR InfiniBand
4 x 2.3 GHz AMD Opteron 8356 processors
128 GB memory
2 x internal 233 GB SAS drives (466 GB total)
2 x internal 93 GB read-optimized SSDs (186 GB total)
1 x Sun Storage J4400 array with 22 x 1 TB SATA drives and 2 x 18 GB write-optimized SSDs
20 TB RAID-Z2 (double parity) data and 2-way striped write optimized SSD or
11 TB mirrored data and mirrored write optimized SSD
QDR InfiniBand Switch

Software Configuration:

SUSE Linux Enterprise Server 10 SP 2
Scali MPI Connect 5.6.6
GNU C 4.1.2 compiler
2009.2 ECLIPSE 300
ECLIPSE license daemon flexlm v11.3.0.0
3D Kirchhoff Time Migration

Benchmark Description

The benchmark is a home-grown study in resource usage options when running the Schlumberger ECLIPSE 300 Compositional reservoir simulator with 8 rank parallelism (MM8) to process Schlumberger's standard 2 Million Cell benchmark model. Schlumberger pre-built executables were used to process a 260x327x73 (2 Million Cell) sub-grid with 6,206,460 total grid cells and model 7 different compositional components within a reservoir. No source code modifications or executable rebuilds were conducted.

The ECLIPSE 300 MM8 job uses 8 MPI processes. It can run within a single node (compact) or across multiple nodes of a cluster (distributed). By using the MM8 job, it is possible to compare the performance between running each job on a separate node using local disk to using a shared network attached storage solution. The benchmark tests study the effect of increasing the number of MM8 jobs in a throughput model.

The first test compares the performance of running 1, 2, 4, 6 and 8 jobs on a cluster of 8 nodes using local disk, NFSoIB disk, and the Sun Storage 7410 system connected via InfiniBand. Results are compared against the time it takes to run 1 job with local disk. This test shows what performance impact there is when loading down a cluster.

The second test compares different methods of scheduling jobs on a cluster. The compact method places all 8 MPI processes for a job on the same node. The distributed method uses 1 MPI process per node. The results are compared against the performance of 1 job on one node.

The third test is similar to the second test, but uses only 4 nodes in the cluster, so when running distributed, there are 2 MPI processes per node.

The fourth test compares the compact and distributed scheduling methods on 4 nodes while running 2 MM8 jobs and one 16-way parallel 3D Prestack Kirchhoff Time Migration (PSTM).

Key Points and Best Practices

  • ECLIPSE is very sensitive to memory bandwidth and needs to be run on 1333 MHz or greater memory speeds. In order to maintain 1333 MHz memory, the maximum memory configuration for the processors used in this benchmark is 24 GB. BIOS upgrades now allow 1333 MHz memory for up to 48 GB of memory. Additional nodes can be used to handle data sets that require more memory than available per node. Allocating at least 20% of memory per node for I/O caching helps application performance.

  • If allocating an 8-way parallel job (MM8) to a single node, it is best to use an ECLIPSE license for that particular node to avoid any additional network overhead of sharing a global license with all the nodes in a cluster.

  • Understanding the ECLIPSE MM8 I/O access patterns is essential to optimizing a shared storage solution. The analytics available on the Oracle Unified Storage 7410 provide valuable I/O characterization information even without source code access. A single MM8 job run shows an initial read and write load related to reading the input grid, parsing Petrel ascii input parameter files and creating an initial solution grid and runtime specifications. This is followed by a very long running simulation that writes data, restart files, and generates reports to the 7410. Due to the nature of the small block I/O, the mirrored configuration for the 7410 outperformed the RAID-Z2 configuration.

    A single MM8 job reads, processes, and writes approximately 240 MB of grid and property data in the first 36 seconds of execution. The actual read and write of the grid data, that is intermixed with this first stage of processing, is done at a rate of 240 MB/sec to the 7410 for each of the two operations.

    Then, it calculates and reports the well connections at an average 260 KB writes/second with 32 operations/second = 32 x 8 KB writes/second. However, the actual size of each I/O operation varies between 2 to 100 KB and there are peaks every 20 seconds. The write cache is on average operating at 8 accesses/second at approximately 61 KB/second (8 x 8 KB writes/sec). As the number of concurrent jobs increases, the interconnect traffic and random I/O operations per second to the 7410 increases.

  • MM8 multiple job startup time is reduced on shared file systems, if each job uses separate input files.
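The small-block write characterization above can be sanity-checked with quick arithmetic; the rates quoted (roughly 260 KB/s at 32 operations/second) imply an average write of about 8 KB:

```python
# Average write size implied by the I/O rates quoted above:
# ~260 KB/s at 32 ops/s works out to ~8 KB per operation,
# matching the "32 x 8 KB writes/second" characterization.

rate_kb_per_sec = 260
ops_per_sec = 32
print(f"{rate_kb_per_sec / ops_per_sec:.1f} KB per write")  # 8.1
```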

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/20/2010.

Monday Aug 23, 2010

Repriced: SPC-1 Sun Storage 6180 Array (8Gb) 1.9x Better Than IBM DS5020 in Price-Performance

Results are presented on Oracle's Sun Storage 6180 array with 8Gb connectivity for the SPC-1 benchmark.
  • The Sun Storage 6180 array is more than 1.9 times better in price-performance compared to the IBM DS5020 system as measured by the SPC-1 benchmark.

  • The Sun Storage 6180 array delivers 50% more SPC-1 IOPS than the previous generation Sun Storage 6140 array and IBM DS4700 on the SPC-1 benchmark.

  • The Sun Storage 6180 array is more than 3.1 times better in price-performance compared to the NetApp FAS3040 system as measured by the SPC-1 benchmark.

  • The Sun Storage 6180 array betters the Hitachi 2100 system by 34% in price-performance on the SPC-1 benchmark.

  • The Sun Storage 6180 array has 16% better IOPS/disk drive performance than the Hitachi 2100 on the SPC-1 benchmark.

Performance Landscape

Select results for the SPC-1 benchmark comparing competitive systems (ordered by performance), data as of August 6th, 2010 from the Storage Performance Council website.

Sponsor   System          SPC-1 IOPS   $/SPC-1 IOPS   ASU Capacity (GB)   TSC Price   Data Protection Level   Results Identifier
Hitachi   HDS 2100        31,498.58    $5.85          3,967.500           $187,321    Mirroring               A00076
NetApp    FAS3040         30,992.39    $13.58         12,586.586          $420,800    RAID6                   A00062
Oracle    SS6180 (8Gb)    26,090.03    $4.37          5,145.060           $114,042    Mirroring               A00084
IBM       DS5020 (8Gb)    26,090.03    $8.46          5,145.060           $220,778    Mirroring               A00081
Fujitsu   DX80            19,492.86    $3.45          5,355.400           $67,296     Mirroring               A00082
Oracle    STK6140 (4Gb)   17,395.53    $4.93          1,963.269           $85,823     Mirroring               A00048
IBM       DS4700 (4Gb)    17,195.84    $11.67         1,963.270           $200,666    Mirroring               A00046

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.
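
The $/SPC-1 IOPS column is simply the total system price divided by the performance metric. A quick sketch of the arithmetic, using the Oracle SS6180 (8Gb) row above (Python used purely for illustration):

```python
# Price-performance = TSC Price / SPC-1 IOPS
# Figures taken from the Oracle SS6180 (8Gb) row above.
tsc_price = 114_042          # total system cost, USD
spc1_iops = 26_090.03        # SPC-1 IOPS performance metric

price_perf = tsc_price / spc1_iops
print(round(price_perf, 2))  # 4.37, matching the $/SPC-1 IOPS column
```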

Results and Configuration Summary

Storage Configuration:

80 x 146.8GB 15K RPM drives
4 x Qlogic QLE 2560 HBA

Server Configuration:

IBM system x3850 M2

Software Configuration:

MS Windows 2003 Server SP2
SPC-1 benchmark kit

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominantly random I/O operations, as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price-performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS reg tm of Storage Performance Council (SPC). More info www.storageperformance.org, results as of 8/6/2010. Sun Storage 6180 array 26,090.03 SPC-1 IOPS, ASU Capacity 5,145.060GB, $/SPC-1 IOPS $4.37, Data Protection Mirroring, Cost $114,042, Ident. A00084.

Repriced: SPC-2 (RAID 5 & 6 Results) Sun Storage 6180 Array (8Gb) Outperforms IBM DS5020 by up to 64% in Price-Performance

Results are presented on Oracle's Sun Storage 6180 array with 8 Gb connectivity for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6180 array outperforms the IBM DS5020 system by 62% in price-performance for the SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6180 array outperforms the IBM DS5020 system by 64% in price-performance for the SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6180 array is over 50% faster than the previous generation systems, the Sun Storage 6140 array and IBM DS4700, on the SPC-2 benchmark using RAID 5 data protection.

Performance Landscape

Select results from Oracle and IBM competitive systems for the SPC-2 benchmark (in performance order), data as of August 7th, 2010 from the Storage Performance Council website.

Sponsor System SPC-2 MBPS $/SPC-2 MBPS ASU Capacity (GB) TSC Price Data Protection Level Results Identifier
Oracle SS6180 1,286.74 $56.88 3,504.693 $73,190 RAID 6 B00044
IBM DS5020 1,286.74 $93.26 3,504.693 $120,002 RAID 6 B00042
Oracle SS6180 1,244.89 $50.40 3,504.693 $62,747 RAID 5 B00043
IBM DS5020 1,244.89 $81.73 3,504.693 $101,742 RAID 5 B00041
IBM DS4700 823.62 $106.73 1,748.874 $87,903 RAID 5 B00028
Oracle ST6140 790.67 $67.82 1,675.037 $53,622 RAID 5 B00017
Oracle ST2540 735.62 $37.32 2,177.548 $27,451 RAID 5 B00021
Oracle ST2530 672.05 $26.15 1,451.699 $17,572 RAID 5 B00026

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.
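
The 62% and 64% price-performance advantages cited above fall out of the $/SPC-2 MBPS column. A small illustrative check (Python used purely for illustration):

```python
# Relative price-performance advantage: how much more a competitor pays
# per SPC-2 MBPS than Oracle does (figures from the table above).
def pp_advantage_pct(competitor_cost_per_mbps, oracle_cost_per_mbps):
    return (competitor_cost_per_mbps / oracle_cost_per_mbps - 1) * 100

print(round(pp_advantage_pct(81.73, 50.40)))  # 62 (RAID 5)
print(round(pp_advantage_pct(93.26, 56.88)))  # 64 (RAID 6)
```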

Results and Configuration Summary

Storage Configuration:

Sun Storage 6180 array with 4GB cache
30 x 146.8GB 15K RPM drives (for RAID 5)
36 x 146.8GB 15K RPM drives (for RAID 6)
4 x PCIe 8 Gb single port HBA

Server Configuration:

IBM system x3850 M2

Software Configuration:

Microsoft Windows 2003 Server SP2
SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

See Also

Disclosure Statement

SPC-2, SPC-2 MBPS, $/SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). More info www.storageperformance.org, results as of 8/9/2010. Sun Storage 6180 Array 1,286.74 SPC-2 MBPS, $/SPC-2 MBPS $56.88, ASU Capacity 3,504.693 GB, Protect RAID 6, Cost $73,190, Ident. B00044. Sun Storage 6180 Array 1,244.89 SPC-2 MBPS, $/SPC-2 MBPS $50.40, ASU Capacity 3,504.693 GB, Protect RAID 5, Cost $62,747, Ident. B00043.

Wednesday Jun 09, 2010

PeopleSoft Payroll 500K Employees on Sun SPARC Enterprise M5000 World Record

Oracle's Sun SPARC Enterprise M5000 server combined with Oracle's Sun Storage F5100 Flash Array system has produced World Record Performance on PeopleSoft Payroll 9.0 (North American) 500K employees benchmark.
  • The Sun SPARC Enterprise M5000 server and the Sun Storage F5100 Flash Array system processed payroll for 500K employees using 32 payroll threads 18% faster than the IBM z10 EC 2097-709 mainframe as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. This IBM mainframe is rated at 6,512 MIPS.

  • The IBM z10 mainframe with nine 4.4 GHz Gen1 processors has a list price over $6M.

  • The Sun SPARC Enterprise M5000 server together with the Sun Storage F5100 Flash Array system processed payroll for 500K employees using 32 payroll threads 92% faster than an HP rx7640 as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark.

  • The Sun Storage F5100 Flash Array system is a high performance, high density solid state flash array providing a read latency of only 0.5 msec, about 10 times faster than the typical 5 msec disk latency measured on this benchmark.

  • The Sun SPARC Enterprise M5000 server used the Oracle Solaris 10 operating system and ran with the Oracle 11gR1 database for this benchmark.

Performance Landscape

500K Employees

System Processor OS/Database Payroll Processing Result Run 1 Run 2 Run 3 Num of Streams (all times in minutes)
Sun M5000 8x 2.53GHz SPARC64 VII Solaris/Oracle 11g 50.11 73.88 534.20 1267.06 32
IBM z10 9x 4.4GHz Gen1, 6,512 MIPS Z/OS /DB2 58.96 80.5 250.68 462.6 8
HP rx7640 8x 1.6GHz Itanium2 HP-UX/Oracle 11g 96.17 133.63 712.72 1665.01 32

Times under all Run columns above represent Payroll processing plus Post-processing elapsed times, where:

  • Run 1 = 32 parallel job streams & Single Check option = "No"
  • Run 2 = 32 sequential jobs for Pay Calculation process & 32 parallel job streams for the rest. Single Check option = "Yes"
  • Run 3 = One job stream & Single Check option = "Yes"

Times under the Result column represent Payroll processing only.
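
The "18% faster" and "92% faster" claims above follow from the Payroll Processing Result column in the table. A small illustrative check of the arithmetic:

```python
# Percent-faster comparison based on elapsed payroll processing time
# (Result column, in minutes, from the table above).
def pct_faster(slower_minutes, faster_minutes):
    return (slower_minutes / faster_minutes - 1) * 100

print(round(pct_faster(58.96, 50.11)))  # 18  (vs IBM z10)
print(round(pct_faster(96.17, 50.11)))  # 92  (vs HP rx7640)
```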

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M5000 (8 x 2.53 GHz/64 GB)
    1 x Sun Storage F5100 Flash Array (40 x 24 GB FMODs)
    1 x StorageTek 2510 (4 x 136 GB SAS 15K RPM)
    4 x Dual-Port SAS Fibre Channel Host Bus Adapters (HBA)

Software Configuration:

    Oracle Solaris 10 10/09
    Oracle PeopleSoft HCM and Campus Solutions 9.00.00.311 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.25 64-bit
    Oracle 11g R1 11.1.0.7 64-bit
    Micro Focus COBOL Server Express 4.0 SP4 64-bit

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes for a database representing a large organization. The five processes are:

  • Paysheet Creation: generates a payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collect at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of thirty-two job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun SPARC Enterprise M5000 (8 2.53GHz SPARC64 VII) 50.11 min, IBM z10 (9 gen1) 58.96 min, HP rx7640 (8 1.6GHz Itanium2) 96.17 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 6/3/2010.

Tuesday May 11, 2010

Per-core Performance Myth Busting

IBM continually touts "performance per core" in sales and marketing messages.  IBM implies that higher performance per core will somehow deliver better customer experience.

Oracle's "Optimized System Performance"  vs. IBM's "Per-core Performance Focus"

Customers care about system performance and the ROI of their solution.  Does better "per-core performance" predict better system performance or price/performance?  The simple answer is no.

Modern server & CPU designers can make various trade-offs on complexity, performance and number of threads & cores.  The best way to address these trade-offs is to look at the integrated system design.

Below are two examples where better "system design" is far more important than a focus on "per-core" performance:

  • Oracle's Sun SPARC Enterprise M9000 server delivered a single-system TPC-H 3000GB world record.
    • beats IBM's Power 595 performance by 20%
    • beats IBM's Power 595 price/performance (3 year TCO: hardware, software, maintenance, etc.)
    • Oracle database load time (4 hr 45 min) was over 2 times faster than IBM's (10 hr 2 min)!
    For TPC-H, IBM used half the number of cores, but could not deliver better customer value.
  • Oracle's 12-node Sun SPARC Enterprise T5440 server cluster delivered a TPC-C world record.
    • beats IBM's Power 595 (5GHz) with IBM DB2 9.5 database performance by 26%
    • beats IBM's Power 595 price/performance (3 year TCO: hardware, software, maintenance, etc.) by 16%
    • in addition, the Oracle solution delivers better response time: Oracle's New Order response time was 7.3x faster than IBM's
    For TPC-C, IBM used one-sixth the number of cores, but could not deliver better customer value.

In conclusion, better ROI is achieved with Oracle's integrated system design.
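
The TPC-C bullet figures above come straight from the numbers in the required disclosure statements at the end of this post. A quick arithmetic check (Python used purely for illustration):

```python
# TPC-C figures from the required disclosure statements in this post
oracle_tpmc, ibm_tpmc = 7_646_486.7, 6_085_166
oracle_price, ibm_price = 2.36, 2.81            # $/tpmC
oracle_rt, ibm_rt = 0.168, 1.22                 # New Order avg response time, sec

print(round((oracle_tpmc / ibm_tpmc - 1) * 100))             # 26 (% higher tpmC)
print(round((ibm_price - oracle_price) / ibm_price * 100))   # 16 (% better $/tpmC)
print(round(ibm_rt / oracle_rt, 1))                          # 7.3 (x faster response)
```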


Required Disclosure statements:

Sun SPARC Enterprise M9000 (32 procs, 128 cores, 128 threads) 188,229.9 QphH@3000GB, $20.19/QphH@3000GB, database load time 4:43:25, avail 04/10/10.  IBM Power 595 (32 procs, 64 cores, 128 threads) 156,537.3 QphH@3000GB, $20.60/QphH@3000GB, database load time 10:02:25, avail 11/24/09. TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org
http://blogs.sun.com/BestPerf/tags/tpc-h

12-node Sun SPARC Enterprise T5440 Cluster (12 nodes, 48 procs, 384 cores, 3072 threads) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC, response time new order average 0.168, Available 3/19/10. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, response time new order average 1.22, available 12/10/08. TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Performance Processing Council (TPC), source: www.tpc.org, results as of 11/5/09.
http://blogs.sun.com/BestPerf/tags/tpc-c

Monday Mar 29, 2010

Sun Blade X6275/QDR IB/ Reverse Time Migration

Significance of Results

Oracle's Sun Blade X6275 cluster with a Lustre file system was used to demonstrate the performance potential of the system when running reverse time migration applications complete with I/O processing.

  • Reduced the Total Application run time for the Reverse Time Migration when processing 800 input traces for two production sized surveys from a QDR InfiniBand Lustre file system on 24 X6275 nodes, by implementing algorithm I/O optimizations and taking advantage of MPI I/O features in HPC ClusterTools:

    • 1243x1151x1231 - Wall clock time reduced from 11.5 to 6.3 minutes (1.8x improvement)
    • 2486x1151x1231 - Wall clock time reduced from 21.5 to 13.5 minutes (1.6x improvement)
  • Reduced the I/O Intensive Trace-Input time for the Reverse Time Migration when reading 800 input traces for two production sized surveys from a QDR InfiniBand Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node data requirement and avoiding unneeded synchronization:

    • 2486x1151x1231 : Time reduced from 121.5 to 3.2 seconds (38.0x improvement)
    • 1243x1151x1231 : Time reduced from 71.5 to 2.2 seconds (32.5x improvement)
  • Reduced the I/O Intensive Grid Initialization time for the Reverse Time Migration Grid when reading the Velocity, Epsilon, and Delta slices for two production sized surveys from a QDR InfiniBand Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node grid data requirement:

    • 2486x1151x1231 : Time reduced from 15.6 to 4.9 seconds (3.2x improvement)
    • 1243x1151x1231 : Time reduced from 8.9 to 1.2 seconds (7.4x improvement)

Performance Landscape

In the tables below, the hyperthreading feature is enabled and the systems are fully utilized.

This first table presents the total application performance in minutes. The overall performance improved significantly because of the better I/O performance and other benefits.


Total Application Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
Nodes, then for each grid (1243 x 1151 x 1231 and 2486 x 1151 x 1231, 800 samples each): Original Time (mins), MPI I/O Time (mins), Improvement
24 11.5 6.3 1.8x 21.5 13.5 1.6x
20 12.0 8.0 1.5x 21.9 15.4 1.4x
16 13.8 9.7 1.4x 26.2 18.0 1.5x
12 21.7 13.2 1.6x 29.5 23.1 1.3x

This next table presents the initialization I/O time. The results are presented in seconds and show the advantage of the improved MPI I/O strategy.


Initialization Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
Nodes, then for each grid (1243 x 1151 x 1231 and 2486 x 1151 x 1231, 800 samples each): Original Time (sec), MPI I/O Time (sec), Improvement
24 8.9 1.2 7.4x 15.6 4.9 3.2x
20 9.3 1.5 6.2x 16.9 3.9 4.3x
16 9.7 2.5 3.9x 17.4 11.3 1.5x
12 9.8 3.3 3.0x 22.5 14.9 1.5x

This last table presents the trace I/O time. The results are presented in seconds and show the significant advantage of the improved MPI I/O strategy.


Trace I/O Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
Nodes, then for each grid (1243 x 1151 x 1231 and 2486 x 1151 x 1231, 800 samples each): Original Time (sec), MPI I/O Time (sec), Improvement
24 71.5 2.2 32.5x 121.5 3.2 38.0x
20 67.7 2.4 28.2x 118.3 3.9 30.3x
16 64.2 2.7 23.7x 110.7 4.6 24.1x
12 69.9 4.2 16.6x 296.3 14.6 20.3x

Results and Configuration Summary

Hardware Configuration:

Oracle's Sun Blade 6048 Modular System with
12 x Oracle's Sun Blade X6275 Server Modules, each with
4 x 2.93 GHz Intel Xeon QC X5570 processors
12 x 4 GB memory at 1333 MHz
2 x 24 GB Internal Flash
QDR InfiniBand Lustre 1.8.0.1 File System

Software Configuration:

OS: 64-bit SUSE Linux Enterprise Server SLES 10 SP 2
MPI: Oracle Message Passing Toolkit 8.2.1 for I/O optimization to Lustre file system
MPI: Scali MPI Connect 5.6.6-59413 for original Lustre file system runs
Compiler: Oracle Solaris Studio 12 C++, Fortran, OpenMP

Benchmark Description

The primary objective of this Reverse Time Migration Benchmark is to present MPI I/O tuning techniques, exploit the power of Sun's HPC ClusterTools MPI I/O implementation, and demonstrate the world-class performance of Sun's Lustre File System to Exploration Geophysicists throughout the world. A Sun Blade 6048 Modular System with 12 Sun Blade X6275 server modules was clustered together with a QDR InfiniBand Lustre File System to show performance improvements in Reverse Time Migration throughput by using the Sun HPC ClusterTools MPI-IO features to implement specific algorithm I/O optimizations.

This Reverse Time Migration Benchmark measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this new I/O optimized version, each node reads in only the data to be processed by that node plus a 4 element inline pad shared with its neighbors to the left and right. This latest version essentially loads the boundary condition data during the initialization phase. The previous version handled boundary conditions by having each node read in all the trace, velocity, and conditioning data; alternatively, the master node would read in all the data and distribute it in its entirety to every node in the cluster. With the previous version, each node had full memory copies of all input data sets even when it only processed a subset of that data. The new version only holds the inline dimensions and pads to be processed by a particular node in memory.
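
The halo-padded read pattern described above can be sketched as a simple inline-axis decomposition. The function name, the uniform-chunk assumption, and the example grid width below are ours, for illustration only:

```python
def node_read_extent(nx, num_procs, rank, pad=4):
    """Inline (x) extent read by one node: its own chunk of nx plus a
    `pad`-element halo shared with each neighbor, clipped at the survey
    boundary. Assumes nx divides evenly among the nodes."""
    chunk = nx // num_procs
    start = max(rank * chunk - pad, 0)
    end = min((rank + 1) * chunk + pad, nx)
    return start, end - start

# An interior node reads chunk + 2*pad inline positions; with the
# benchmark's 4-element pads that is the (nx/num_procs + 8) figure,
# and the floats actually read are that count times ny.
start, count = node_read_extent(2400, 24, 12)   # hypothetical 2400-wide survey
print(start, count)  # 1196 108
```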

Key Points and Best Practices

  • The original implementation of the trace I/O involved the master node reading in nx \* ny floats and communicating this trace data to all the other nodes in a synchronous manner. Each node only used a subset of the trace data for each of the 800 time steps. The optimized I/O version has each node asynchronously read in only the (nx/num_procs + 8) \* ny floats that it will be processing. The additional 8 inline values in the optimized I/O version are the 4 element pads of a node's left and right neighbors, needed to handle initial boundary conditions. The MPI_Barrier synchronization required by the original implementation, together with the additional I/O for each node to load all the data values, significantly impacts performance. In the I/O optimized version, each node reads only the data values it needs and does not require the MPI_Barrier synchronization of the original version of the Reverse Time Migration Benchmark. These I/O optimizations yield a significant improvement in the Trace I/O.

  • For the best MPI performance, allocate the X6275 nodes in blade-by-blade order and run with HyperThreading enabled. The "Binary Conditioning" part of the Reverse Time Migration benefits particularly from hyperthreading.

  • To get the best I/O performance, use a maximum of 70% of each node's available memory for the Reverse Time Migration application. Execution time may vary, and inconsistent I/O results can occur, if the nodes have different memory size configurations.

See Also

Thursday Jan 21, 2010

SPARC Enterprise M4000 PeopleSoft NA Payroll 240K Employees Performance (16 Streams)

The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology, the Sun Storage F5100 flash array, has produced World Record Performance on PeopleSoft Payroll 9.0 (North American) 240K employees benchmark.

  • The Sun SPARC Enterprise M4000 server with four 2.53 GHz SPARC64 VII processors and the Sun Storage F5100 flash array using 16 job streams (payroll threads) is 55% faster than the HP rx6600 (4 x 1.6GHz Itanium2 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and the Sun Storage F5100 flash array is 2.1x faster than the 2,027 MIPS IBM Z990 (6 Z990 Gen1 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10, while the IBM result was run with 8 payroll threads and used IBM DB2 for z/OS 8.1 for the database.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array processed payroll for 240K employees using PeopleSoft Payroll 9.0 (North American) and Oracle 11gR1 running on Solaris 10 with different execution strategies, resulting in a maximum CPU utilization of 45% compared to HP's reported CPU utilization of 89%.

  • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology processed 16 Sequential Jobs and single run control with a total time of 534 minutes, an improvement of 19% compared to HP's time of 633 minutes.

  • Sun's FlashFire technology dramatically improves I/O performance for the PeopleSoft Payroll 9.0 (North American) benchmark, delivering a significant performance boost over a well-optimized configuration of 60+ FC disks.

  • The Sun Storage F5100 Flash Array is a high performance, high density solid state flash array providing a read latency of only 0.5 msec, about 10 times faster than the typical 5 msec disk latency measured on this benchmark.

  • Sun estimates that the MIPS rating for a Sun SPARC Enterprise M4000 server is over 3000 MIPS.

Performance Landscape

240K Employees

System Processor OS/Database Payroll Processing Result Run 1 Run 2 Run 3 Num of Streams Ver (all times in minutes)
Sun M4000 4x 2.53GHz SPARC64 VII Solaris/Oracle 11gR1 43.78 51.26 286.11 534.35 16 9.0
HP rx6600 4x 1.6GHz Itanium2 HP-UX/Oracle 11g 68.07 81.17 350.16 633.25 16 9.0
IBM Z990 6x Gen1 2027 MIPS Z/OS /DB2 91.70 107.34 328.66 544.80 8 9.0

Note: IBM benchmark documents show that 6 Gen1 processors equal 2,027 MIPS. 13 Gen1 processors were in this configuration, but only 6 were available for testing.

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M4000 (4 x 2.53 GHz/32GB)
    1 x Sun Storage F5100 Flash Array (40 x 24GB FMODs)
    1 x Sun Storage J4200 (12 x 450GB SAS 15K RPM)

Software Configuration:

    Solaris 10 5/09
    Oracle PeopleSoft HCM 9.0 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.08 64-bit
    Micro Focus Server Express 4.0 SP4 64-bit
    Oracle RDBMS 11.1.0.7 64-bit
    HP's Mercury Interactive QuickTest Professional 9.0

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes for a database representing a large organization. The five processes are:

  • Paysheet Creation: generates a payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collect at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of sixteen job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun M4000 (4 2.53GHz SPARC64) 43.78 min, IBM Z990 (6 gen1) 91.70 min, HP rx6600 (4 1.6GHz Itanium2) 68.07 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 1/21/2010.

Tuesday Nov 24, 2009

Sun M9000 Fastest SAP 2-tier SD Benchmark on current SAP EP4 for SAP ERP 6.0 (Unicode)

The Sun SPARC Enterprise M9000 server (64 processors, 256 cores, 512 threads) set a World Record on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.
  • The Sun SPARC Enterprise M9000 server with 2.88 GHz SPARC64 VII processors achieved 32,000 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 server result is 8.6x faster than the only IBM 5GHz POWER6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any IBM p595 results on the current SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, even though this benchmark has been current for almost a year and IBM p595 systems have 8x the cores of the IBM System 550.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • This new result is 1.84x greater than the previous record result delivered on the Sun SPARC Enterprise M9000 server, which used 32 processors.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Package 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-Unicode) Standard Sales and Distribution (SD) Benchmark; 10-30% of this is due to the extra overhead of processing larger character strings under Unicode encoding. See SAP Note 1139642 for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, each occupying just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
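
The UTF-16 storage cost described above is easy to demonstrate; here Python's codecs stand in purely as an illustration (the sample strings are ours, not from the benchmark):

```python
# UTF-16 widths: 2 bytes for ASCII and most other characters,
# 4 bytes (a surrogate pair) for characters outside the BMP.
ascii_text = "ORDER12345"
bmp_char = "\u00df"        # 'ß': non-ASCII, but inside the Basic Multilingual Plane
supp_char = "\U0001d11e"   # musical G clef: outside the BMP

print(len(ascii_text.encode("utf-16-be")))  # 20 bytes, vs 10 in ASCII
print(len(bmp_char.encode("utf-16-be")))    # 2
print(len(supp_char.encode("utf-16-be")))   # 4
```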

Performance Landscape

SAP enhancement package 4 for SAP ERP 6.0 (Unicode) results, in decreasing performance order. (ERP 6.0 EP4 has been the current version of the benchmark since January 2009.)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date
Sun SPARC Enterprise M9000, 64 x SPARC64 VII @ 2.88 GHz, 1152 GB | Solaris 10 / Oracle10g | 32,000 | 2009 6.0 EP4 (Unicode) | 175,600 | 18-Nov-09
Sun SPARC Enterprise M9000, 32 x SPARC64 VII @ 2.88 GHz, 1024 GB | Solaris 10 / Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09
IBM System 550, 4 x POWER6 @ 5 GHz, 64 GB | AIX 6.1 / DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users:
    32,000
    Average dialog response time:
    0.93 seconds
    Throughput:

    Fully processed order line items/hour:
    3,512,000

    Dialog steps/hour:
    10,536,000

    SAPS:
    175,600
    SAP Certification:
    2009046
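
The certified throughput figures above are internally consistent with the standard SAPS definition (100 SAPS = 2,000 fully processed order line items per hour, i.e. 6,000 dialog steps per hour). A quick check:

```python
# SAPS definition: 100 SAPS = 2,000 fully processed order line items/hour
order_items_per_hour = 3_512_000

saps = order_items_per_hour / 2_000 * 100
print(int(saps))                 # 175600, the certified SAPS above
print(order_items_per_hour * 3)  # 10536000, the certified dialog steps/hour
```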

Hardware Configuration:

    Sun SPARC Enterprise M9000
      64 x 2.88GHz SPARC64 VII, 1152 GB memory

Software Configuration:

    Solaris 10
    SAP enhancement package 4 for SAP ERP 6.0 (unicode)
    Oracle10g

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmarks as of 11/18/09: Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 32,000 SAP SD Users, 64 x 2.88 GHz SPARC VII, 1152 GB memory, Oracle10g, Solaris10, Cert# 2009046. Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Friday Nov 20, 2009

Sun Blade 6048 and Sun Blade X6275 NAMD Molecular Dynamics Benchmark beats IBM BlueGene/L

Significance of Results

A Sun Blade 6048 chassis with 48 Sun Blade X6275 server modules ran benchmarks using the NAMD molecular dynamics applications software. NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD is driven by major trends in computing and structural biology and received a 2002 Gordon Bell Award.

  • The cluster of 32 Sun Blade X6275 server modules was 9.2x faster than the 512 processor configuration of the IBM BlueGene/L.

  • The cluster of 48 Sun Blade X6275 server modules exhibited excellent scalability for NAMD molecular dynamics simulation, up to 37.8x speedup for 48 blades relative to 1 blade.

  • For the largest molecule considered, the cluster of 48 Sun Blade X6275 server modules achieved a throughput of 0.028 seconds per simulation step.

Molecular dynamics simulation is important to biological and materials science research. Molecular dynamics is used to determine the low energy conformations or shapes of a molecule. These conformations are presumed to be the biologically active conformations.

Performance Landscape

The NAMD Performance Benchmarks web page plots the performance of NAMD when the ApoA1 benchmark is executed on a variety of clusters. The performance is expressed in terms of the time in seconds required to execute one step of the molecular dynamics simulation, multiplied by the number of "processors" on which NAMD executes in parallel. The following table compares the performance of the Sun Blade X6275 cluster to several of the clusters for which performance is reported on the web page. In this table, the performance is expressed in terms of the time in seconds required to execute one step of the molecular dynamics simulation. A smaller number implies better performance.

Cluster Name and Interconnect        Seconds per Step
                                     128 Cores   256 Cores   512 Cores
Sun Blade X6275 InfiniBand 0.014 0.0073 0.0048
Cambridge Xeon/3.0 InfiniPath 0.016 0.0088 0.0056
NCSA Xeon/2.33 InfiniBand 0.019 0.010 0.008
AMD Opteron/2.2 InfiniPath 0.025 0.015 0.008
IBM HPCx PWR4/1.7 Federation 0.039 0.021 0.013
SDSC IBM BlueGene/L MPI 0.108 0.061 0.044

The following tables report results for NAMD molecular dynamics using a cluster of Sun Blade X6275 server modules. The performance of the cluster is expressed in terms of the time in seconds that is required to execute one step of the molecular dynamics simulation. A smaller number implies better performance.

Blades  Cores  STMV molecule (1)            f1 ATPase molecule (2)       ApoA1 molecule (3)
               secs/step  spdup  effi'cy    secs/step  spdup  effi'cy    secs/step  spdup  effi'cy
48 768 0.0277 37.8 79% 0.0075 35.2 73% 0.0039 22.2 46%
36 576 0.0324 32.3 90% 0.0096 27.4 76% 0.0045 19.3 54%
32 512 0.0368 28.4 89% 0.0104 25.3 79% 0.0048 18.1 57%
24 384 0.0481 21.8 91% 0.0136 19.3 80% 0.0066 13.2 55%
16 256 0.0715 14.6 91% 0.0204 12.9 81% 0.0073 11.9 74%
12 192 0.0875 12.0 100% 0.0271 9.7 81% 0.0096 9.1 76%
8 128 0.1292 8.1 101% 0.0337 7.8 98% 0.0139 6.3 79%
4 64 0.2726 3.8 95% 0.0666 4.0 100% 0.0224 3.9 98%
1 16 1.0466 1.0 100% 0.2631 1.0 100% 0.0872 1.0 100%

spdup - speedup versus 1 blade result
effi'cy - speedup efficiency versus 1 blade result

(1) Satellite Tobacco Mosaic Virus (STMV) molecule, 1,066,628 atoms, 12 Angstrom cutoff, Langevin dynamics, 500 time steps
(2) f1 ATPase molecule, 327,506 atoms, 11 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
(3) ApoA1 molecule, 92,224 atoms, 12 Angstrom cutoff, particle mesh Ewald dynamics, 500 time steps
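The speedup and efficiency columns can be reproduced directly from the throughput column: speedup is the 1-blade time per step divided by the N-blade time per step, and efficiency is speedup divided by the blade count. A quick sketch using the STMV column above (illustrative only):

```python
# STMV throughput (seconds per simulation step) from the table above.
stmv_secs_per_step = {1: 1.0466, 8: 0.1292, 16: 0.0715, 32: 0.0368, 48: 0.0277}

base = stmv_secs_per_step[1]  # 1-blade reference time
for blades, secs in sorted(stmv_secs_per_step.items()):
    speedup = base / secs                 # e.g. 48 blades: 1.0466 / 0.0277 = 37.8
    efficiency = speedup / blades         # 37.8 / 48 = 79%
    print(f"{blades:2d} blades: speedup {speedup:5.1f}, efficiency {efficiency:4.0%}")
```

Note that the 8-blade efficiency comes out slightly above 100%, a superlinear effect the table also shows.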

Results and Configuration Summary

Hardware Configuration

    48 x Sun Blade X6275, each with
      2 x (2 x 2.93 GHz Intel QC Xeon X5570 (Nehalem) processors)
      2 x (24 GB memory)
      Hyper-Threading (HT) off, Turbo Mode on

Software Configuration

    SUSE Linux Enterprise Server 10 SP2 kernel version 2.6.16.60-0.31_lustre.1.8.0.1-smp
    OpenMPI 1.3.2
    gcc 4.1.2 (1/15/2007), gfortran 4.1.2 (1/15/2007)

Benchmark Description

Molecular dynamics simulation is widely used in biological and materials science research. NAMD is a public-domain molecular dynamics software application for which a variety of molecular input directories are available. Three of these directories define:
  • the Satellite Tobacco Mosaic Virus (STMV) that comprises 1,066,628 atoms
  • the f1 ATPase enzyme that comprises 327,506 atoms
  • the ApoA1 protein that comprises 92,224 atoms
Each input directory also specifies the type of molecular dynamics simulation to be performed, for example, Langevin dynamics with a 12 Angstrom cutoff for 500 time steps, or particle mesh Ewald dynamics with an 11 Angstrom cutoff for 500 time steps.

Key Points and Best Practices

Models with large numbers of atoms scale better than models with small numbers of atoms.

The Intel QC X5570 processors include a turbo boost feature coupled with a speed-step option in the CPU section of the Advanced BIOS settings. Under specific circumstances, this can increase the processor frequency from 2.93 GHz to 3.33 GHz. This feature was enabled when generating the results reported here.

See Also

Disclosure Statement

NAMD, see http://www.ks.uiuc.edu/Research/namd/performance.html for more information, results as of 11/17/2009.

Tuesday Oct 13, 2009

SAP 2-tier SD Benchmark on Sun SPARC Enterprise M9000/32 SPARC64 VII

Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark Sun SPARC Enterprise M9000/32 SPARC64 VII

World Record on 32-processor using SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • The Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) set a World Record for 32-processor systems on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, as of Oct. 12, 2009.

  • The 32-way Sun SPARC Enterprise M9000 with 2.88 GHz SPARC64 VII processors achieved 17,430 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 result is 4.6x faster than the only IBM 5GHz Power6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any p595 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings required by Unicode encoding. See this SAP Note for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each character was just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
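The byte-count difference is easy to see with any UTF-16 encoder. The snippet below (an illustration, not part of the benchmark kit) shows an ASCII string doubling in size and a non-BMP character taking four bytes:

```python
text = "Sales order 4711"                    # plain ASCII business data
print(len(text.encode("ascii")))             # 16 bytes: 1 byte per character
print(len(text.encode("utf-16-le")))         # 32 bytes: UTF-16 uses 2 bytes per BMP character
print(len("受注".encode("utf-16-le")))        # 4 bytes: CJK characters are also 2 bytes each
print(len("𝄞".encode("utf-16-le")))          # 4 bytes: one surrogate pair for a non-BMP character
```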

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System                                 OS / Database       Users    SAP ERP/ECC Release      SAPS     Date
Sun SPARC Enterprise M9000             Solaris 10 /        17,430   2009 6.0 EP4 (Unicode)   95,480   12-Oct-09
  32 x SPARC64 VII @2.88GHz, 1024 GB   Oracle 10g
IBM System 550                         AIX 6.1 /           3,752    2009 6.0 EP4 (Unicode)   20,520   16-Jun-09
  4 x POWER6 @5GHz, 64 GB              DB2 9.5

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users:        17,430
    Average dialog response time:            0.95 seconds
    Throughput:
      Fully processed order line items/hour: 1,909,670
      Dialog steps/hour:                     5,729,000
    SAPS:                                    95,480
    SAP Certification:                       2009038
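The certified throughput numbers are internally consistent: SAP defines 100 SAPS as 2,000 fully processed order line items per hour, so the SAPS figure can be cross-checked from the line-item rate (a rough sketch; the published figure is rounded):

```python
line_items_per_hour = 1_909_670    # certified: fully processed order line items/hour
dialog_steps_per_hour = 5_729_000  # certified: dialog steps/hour

saps = line_items_per_hour / 20    # 100 SAPS = 2,000 line items/hour
print(saps)                        # 95483.5, matching the published 95,480 after rounding

# In this result the ratio works out to ~3 dialog steps per order line item.
print(dialog_steps_per_hour / line_items_per_hour)
```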

Hardware Configuration:

    Sun SPARC Enterprise M9000
      32 x 2.88GHz SPARC64 VII, 1024 GB memory
      6 x 6140 storage arrays

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle10g

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 10/12/09: Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Monday Oct 12, 2009

SPC-2 Sun Storage 6180 Array RAID 5 & RAID 6 Over 70% Better Price Performance than IBM

Significance of Results

Results on the Sun Storage 6180 Array with 8Gb connectivity are presented for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6180 Array outperforms the IBM DS5020 by 77% in price performance for SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6180 Array outperforms the IBM DS5020 by 91% in price performance for SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6180 Array is 50% faster than the previous-generation products, the Sun Storage 6140 Array and the IBM DS4700, on the SPC-2 benchmark using RAID 5 data protection.

Performance Landscape

SPC-2 Performance Chart (in increasing price-performance order)

Sponsor System SPC-2 MBPS $/SPC-2 MBPS ASU Capacity (GB) TSC Price Data Protection Level Date Results Identifier
Sun SS6180 1,286.74 $45.47 3,504.693 $58,512 RAID 6 10/08/09 B00044
IBM DS5020 1,286.74 $87.04 3,504.693 $112,002 RAID 6 10/08/09 B00042
Sun SS6180 1,244.89 $42.53 3,504.693 $52,951 RAID 5 10/08/09 B00043
IBM DS5020 1,244.89 $75.30 3,504.693 $93,742 RAID 5 10/08/09 B00041
Sun J4400 887.44 $25.63 23,965.918 $22,742 unprotected 08/15/08 B00034
IBM DS4700 823.62 $106.73 1,748.874 $87,903 RAID 5 04/01/08 B00028
Sun ST6140 790.67 $67.82 1,675.037 $53,622 RAID 5 02/13/07 B00017
Sun ST2540 735.62 $37.32 2,177.548 $27,451 RAID 5 04/10/07 B00021
IBM DS3400 731.25 $34.36 1,165.933 $25,123 RAID 5 02/27/08 B00027
Sun ST2530 672.05 $26.15 1,451.699 $17,572 RAID 5 08/16/07 B00026
Sun J4200 548.80 $22.92 11,995.295 $12,580 Unprotected 07/10/08 B00033

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.
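The price-performance column is simply the TSC price divided by the SPC-2 MBPS figure, which makes the table easy to sanity-check (values taken from the rows above; illustrative only):

```python
# (SPC-2 MBPS, TSC price in $) taken from the table above.
results = {
    "Sun SS6180 RAID 6": (1286.74, 58_512),
    "Sun SS6180 RAID 5": (1244.89, 52_951),
    "IBM DS5020 RAID 6": (1286.74, 112_002),
    "IBM DS5020 RAID 5": (1244.89, 93_742),
}
for name, (mbps, price) in results.items():
    print(f"{name}: ${price / mbps:.2f} per SPC-2 MBPS")
```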

Results and Configuration Summary

Storage Configuration:

    30 146.8GB 15K RPM drives (for RAID 5)
    36 146.8GB 15K RPM drives (for RAID 6)
    4 Qlogic HBA

Server Configuration:

    IBM system x3850 M2

Software Configuration:

    MS Win 2003 Server SP2
    SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

See Also

Disclosure Statement

SPC-2, SPC-2 MBPS, $/SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). More info www.storageperformance.org. Sun Storage 6180 Array 1,286.74 SPC-2 MBPS, $/SPC-2 MBPS $45.47, ASU Capacity 3,504.693 GB, Protect RAID 6, Cost $58,512.00, Ident. B00044. Sun Storage 6180 Array 1,244.89 SPC-2 MBPS, $/SPC-2 MBPS $42.53, ASU Capacity 3,504.693 GB, Protect RAID 5, Cost $52,951.00, Ident. B00043.

SPC-1 Sun Storage 6180 Array Over 70% Better Price Performance than IBM

Significance of Results

Results on the Sun Storage 6180 Array with 8Gb connectivity are presented for the SPC-1 benchmark.
  • The Sun Storage 6180 Array outperforms the IBM DS5020 by 72% in price performance on the SPC-1 benchmark.

  • The Sun Storage 6180 Array is 50% faster than the previous-generation products, the Sun Storage 6140 Array and the IBM DS4700, on the SPC-1 benchmark.

  • The Sun Storage 6180 Array betters the HDS 2100 by 27% in price performance on the SPC-1 benchmark.

  • The Sun Storage 6180 Array has 16% better IOPS/Drive performance than the HDS 2100 on the SPC-1 benchmark.

Performance Landscape

SPC-1 Performance Chart (in increasing price-performance order)

Sponsor  System             SPC-1 IOPS   $/SPC-1 IOPS   ASU Capacity (GB)   TSC Price   Data Protection Level   Date      Results Identifier
HDS AMS 2300 42,502.61 $6.96 7,955.000 $295,740 Mirroring 3/24/09 A00077
HDS AMS 2100 31,498.58 $5.85 3,967.500 $187,321 Mirroring 3/24/09 A00076
Sun SS6180 (8Gb) 26,090.03 $4.70 5,145.060 $122,623 Mirroring 10/09/09 A00084
IBM DS5020 (8Gb) 26,090.03 $8.08 5,145.060 $210,782 Mirroring 8/25/09 A00081
Fujitsu DX80 19,492.86 $3.45 5,355.400 $67,296 Mirroring 9/14/09 A00082
Sun STK6140 (4Gb) 17,395.53 $4.93 1,963.269 $85,823 Mirroring 10/16/06 A00048
IBM DS4700 (4Gb) 17,195.84 $11.67 1,963.270 $200,666 Mirroring 8/21/06 A00046

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

    80 x 146.8GB 15K RPM drives
    8 Qlogic HBA

Server Configuration:

    IBM system x3850 M2

Software Configuration:

    MS Windows 2003 Server SP2
    SPC-1 benchmark kit

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry-standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominately random I/O operations as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price/performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS reg tm of Storage Performance Council (SPC). More info www.storageperformance.org. Sun Storage 6180 Array 26,090.03 SPC-1 IOPS, ASU Capacity 5,145.060GB, $/SPC-1 IOPS $4.70, Data Protection Mirroring, Cost $122,623, Ident. A00084.


Tuesday Sep 22, 2009

Sun X4270 Virtualized for Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

Two-Processor Performance using 8 Virtual CPU Solaris 10 Container Configuration:
  • Sun achieved 36% better performance using Solaris and Solaris 10 Containers than a similar configuration on SUSE Linux using VMware ESX Server 4.0 on the same benchmark, both using 8 virtual CPUs.
  • Solaris Containers are the best virtualization technology for SAP projects and have been supported for more than 4 years. Other virtualization technologies suffer various overheads that decrease performance.
  • The Sun Fire X4270 server with 48 GB memory and a Solaris 10 container configured with 8 virtual CPUs achieved 2,800 SAP SD Benchmark users and beat the Fujitsu PRIMERGY RX300 S5 server with 96 GB memory and SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0 by 36%. Both results used the same CPUs and were running the SAP ERP application release 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark.
  • The Sun and Fujitsu results ran at 50% and 48% CPU utilization, respectively. With the servers only about half utilized, there is headroom for additional performance.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead of processing the larger character strings required by Unicode encoding. See this SAP Note for more details. Note: username and password for SAP Service Marketplace required.
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each character was just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details. Note: username and password for SAP Service Marketplace required.

SAP-SD 2-Tier Performance Landscape (in decreasing performance order).


SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results (New version of the benchmark as of January 2009)

System                                     OS / Database                        Virtualized?     Users   SAP ERP/ECC Release      SAPS     SAPS/Proc   Date
Sun Fire X4270                             Solaris 10 / Oracle 10g              no               3,800   2009 6.0 EP4 (Unicode)   21,000   10,500      21-Aug-09
  2 x Intel Xeon X5570 @2.93GHz, 48 GB
IBM System 550                             AIX 6.1 / DB2 9.5                    no               3,752   2009 6.0 EP4 (Unicode)   20,520   5,130       16-Jun-09
  4 x POWER6 @5GHz, 64 GB
HP ProLiant DL380 G6                       SUSE Linux Ent Svr 10 /              no               3,171   2009 6.0 EP4 (Unicode)   17,380   8,690       17-Apr-09
  2 x Intel Xeon X5570 @2.93GHz, 48 GB       MaxDB 7.8
Sun Fire X4270                             Solaris 10 container                 YES (50% util)   2,800   2009 6.0 EP4 (Unicode)   15,320   7,660       10-Sep-09
  2 x Intel Xeon X5570 @2.93GHz, 48 GB       (8 virtual CPUs) / Oracle 10g
Fujitsu PRIMERGY RX300 S5                  SUSE Linux Ent Svr 10 on             YES (48% util)   2,056   2009 6.0 EP4 (Unicode)   11,230   5,615       04-Aug-09
  2 x Intel Xeon X5570 @2.93GHz, 96 GB       VMware ESX Server 4.0 / MaxDB 7.8

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
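The SAPS/Proc column above is simply SAPS divided by the processor-socket count, which makes per-socket comparisons across the entries straightforward (a quick sketch using the published numbers):

```python
# (SAPS, processor sockets) from the table above.
entries = {
    "Sun Fire X4270 (bare metal)": (21_000, 2),
    "IBM System 550": (20_520, 4),
    "HP ProLiant DL380 G6": (17_380, 2),
    "Sun Fire X4270 (Solaris 10 container)": (15_320, 2),
    "Fujitsu PRIMERGY RX300 S5 (VMware)": (11_230, 2),
}
for name, (saps, sockets) in entries.items():
    print(f"{name}: {saps // sockets} SAPS per processor")
```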

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270, with
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun StorageTek CSM200 with 32 x 73 GB 15K RPM 4Gb FC-AL and 32 x 146 GB 15K RPM 4Gb FC-AL drives

Software Configuration:

    Solaris 10 container configured with 8 virtual CPUs
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Sun has submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

    Number of benchmark users:               2,800
    Average dialog response time:            0.971 seconds
    Fully processed order line items/hour:   306,330
    Dialog steps/hour:                       919,000
    SAPS:                                    15,320
    SAP Certification:                       2009034

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI OEM) to present the needed raw devices directly from the array; do not use any software volume management layer in between.

  • Solaris 10 Container best practices how-to guide

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 09/10/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10, Cert# 2009006. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10 container configured with 8 virtual CPUs, Cert# 2009034. Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Tuesday Sep 01, 2009

String Searching - Sun T5240 & T5440 Outperform IBM Cell Broadband Engine

Significance of Results

Sun SPARC Enterprise T5220, T5240 and T5440 servers ran benchmarks using the Aho-Corasick string searching algorithm. String searching or pattern matching are important to a variety of commercial, government and HPC applications. One of the core functions needed for text identification algorithms in data repositories is real-time string searching. For this benchmark, the IBM, HP and Sun systems used the Aho-Corasick algorithm for string searching.

Sun SPARC Enterprise T5440

  • A 1.6 GHz Sun SPARC Enterprise T5440 server could search a book as tall as Mt. Everest (29,208 feet, 861 GB book) in 61 seconds, which corresponds to a string search rate of 14.2 GB/s.

  • A 1.6 GHz Sun SPARC Enterprise T5440 server can search at a rate of 14.2 GB/s, which corresponds to searching a book containing one terabyte of data (34,745 feet high) in only 70 seconds.

  • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching at a rate of 14.2 GB/s, which is 29.9 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

  • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching 3.7 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5440 server has a 1.7 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

  • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 12% improvement over the 1.4 GHz Sun SPARC Enterprise T5440.

  • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 2x speedup over the 1.6 GHz Sun SPARC Enterprise T5240 server which demonstrated a 2.3x speedup over the 1.4 GHz Sun SPARC Enterprise T5220 server.

Sun SPARC Enterprise T5240

  • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching at a rate of 7.22 GB/s which is 15.4 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

  • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching 1.9 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5240 server has a 2.4 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

  • The 1.6 GHz Sun SPARC Enterprise T5240 server demonstrated a 14% speedup over the 1.4 GHz Sun SPARC Enterprise T5240 server.

Sun SPARC Enterprise T5220

  • The 1-chip 1.4 GHz Sun SPARC Enterprise T5220 server performed string searching at a rate of 3.16 GB/s which is 6.7 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

Performance Landscape

System                                          Throughput (GB/sec)   Chips   Cores
Sun SPARC Enterprise T5440 (1.6 GHz) 14.2 4 32
Sun SPARC Enterprise T5440 (1.4 GHz) 12.7 4 32
Sun SPARC Enterprise T5240 (1.6 GHz) 7.2 2 16
Sun SPARC Enterprise T5240 (1.4 GHz) 6.4 2 16
HP DL-580 (2.9 GHz) 3.9 4 16
Sun SPARC Enterprise T5220 (1.4 GHz) 3.2 1 8
IBM Cell Broadband Engine DD3 Blade (3.2 GHz) 0.475 2 16
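The "book as tall as Mt. Everest" figures follow directly from the measured search rate: time = book size / sustained rate. A quick check of the claims above (decimal terabyte assumed for the 1 TB book):

```python
rate_gb_per_s = 14.2      # T5440 (1.6 GHz) search rate from the table above

everest_book_gb = 861     # the 29,208-foot book cited above
print(everest_book_gb / rate_gb_per_s)   # ~60.6 s, consistent with the quoted 61 seconds

terabyte_book_gb = 1000   # 1 TB book, taken as 1,000 GB
print(terabyte_book_gb / rate_gb_per_s)  # ~70.4 s, consistent with the quoted 70 seconds
```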

Results and Configuration Summary

Hardware Configuration:
    Sun SPARC Enterprise T5440 (1.6 GHz)
      4 x 1.6 GHz UltraSPARC T2 Plus processors
      256 GB
    Sun SPARC Enterprise T5440 (1.4 GHz)
      4 x 1.4 GHz UltraSPARC T2 Plus processors
      128 GB
    Sun SPARC Enterprise T5240 (1.6 GHz)
      2 x 1.6 GHz UltraSPARC T2 Plus processors
      64 GB
    Sun SPARC Enterprise T5240 (1.4 GHz)
      2 x 1.4 GHz UltraSPARC T2 Plus processors
      64 GB
    Sun SPARC Enterprise T5220 (1.4 GHz)
      1 x 1.4 GHz UltraSPARC T2 processor
      32 GB

Software Configuration:

    Sun SPARC Enterprise T5440 (1.6 GHz)
      OpenSolaris 2009.06
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5440 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5240 (1.6 GHz)
      OpenSolaris 2009.06
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5240 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5220 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)

Benchmark Description

One of the core functions needed for text identification algorithms in data repositories is real-time string searching. This string searching benchmark demonstrates the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code creation and speed of code execution. In IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008, IBM describes a variant of the Aho-Corasick string searching algorithm that uses deterministic finite automata. The algorithm first constructs a graph that represents a dictionary, then walks that graph using successive input characters from a text file. Each "state" in the graph includes a state transition table (STT) that is accessed using the next input character from the text file to determine the address of the next state in the graph. IBM defines an automaton as a two-step loop that: (1) obtains the address of the next state from the STT, and (2) fetches the next state in the graph.

IBM reports the performance of its Cell Broadband Engine (CBE) to execute this algorithm to search a 4.4 MB version of the King James Bible using a dictionary of the 20,000 most used words in the English language (average word length of 7.59 characters). Each of the 8 synergistic processing elements (SPEs) of each of the two CBEs executes 16 automata, for a total of 256 automata. All automata and hence all SPEs access a single, shared dictionary.

IBM describes elaborate optimizations of the Aho-Corasick algorithm, including state shuffling, state replication, alphabet shuffling and state caching. These optimizations were required to: (1) overcome "memory congestion", i.e., contention amongst the SPEs for access to the shared dictionary, and (2) compensate for the limited local storage that is associated with each SPE. These optimizations were necessary to achieve the performance reported for the CBE DD3 Blade.

IBM does not provide references that indicate where to obtain the dictionary and Bible. IBM reports the algorithmic performance in Gbits/s but does not indicate whether an 8-bit byte is extended to 10 bits as required for network transmission.

In order to closely approximate the dictionary and Bible that were used by IBM, Sun used a dictionary of 25,143 English words (the Open Solaris file cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/spell/list) for which the average word length is 7.2 characters, and a 4.6 MB version of the King James Bible (www.patriot.net/users/bmcgin/kjv12.zip). For reporting of results in Gbits/s, the length of a byte is assumed to be 8 bits.
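The two-phase structure described above (build a dictionary graph, then walk it one input character at a time) can be sketched in a few dozen lines. This is a plain, unoptimized illustration of an Aho-Corasick automaton, not the tuned IBM or Sun benchmark code; the identifier names are our own:

```python
from collections import deque

def build_automaton(words):
    """Build an Aho-Corasick automaton: a trie with failure links."""
    goto = [{}]      # goto[state][char] -> next state (the STT for each state)
    fail = [0]       # failure link per state
    out = [set()]    # dictionary words recognized at each state

    # Phase 1: build the trie of dictionary words.
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    # Phase 2: breadth-first pass to compute failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(text, goto, fail, out):
    """Walk the automaton over the text; the inner loop mirrors the two-step
    automaton described above: look up the transition, then move to the next state."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits
```

In the benchmark each automaton (one per OpenMP thread) runs this same walk over its share of the input text, with all automata reading one shared dictionary graph.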

Key Points and Best Practices

  • Power was measured during execution of the Aho-Corasick algorithm using a WattsUp power meter, and the average rate of power consumption is presented.

  • The Aho-Corasick algorithm as deployed on the IBM Cell Broadband Engine DD3 Blade required substantial optimization and tuning to achieve the reported performance, whereas on the Sun SPARC Enterprise T5220, T5240 or T5440 servers only a basic implementation of the algorithm and a simple compilation were needed.

  • In order to demonstrate the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code generation and speed of code execution, Sun implemented the Aho-Corasick algorithm using ANSI C. No optimizations of the algorithm were required to achieve the performance reported for the T5220, T5240 and T5440. The source code was compiled using the -m32 -xO3 and -xopenmp options. The dictionary is represented using a graph that comprises 82 MB. Each core of the T5220, T5240 or T5440 executes 8 automata using one OpenMP thread per automaton. Thus, the T5220 executes 64 total automata, the T5240 executes 128 total automata and the T5440 executes 256 total automata. All automata and hence all cores access a single, shared dictionary. Access to this dictionary is accelerated by the large, shared L2 caches of the Sun SPARC Enterprise T5220, T5240 and T5440.

See Also

Friday Aug 28, 2009

Sun X4270 World Record SAP-SD 2-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode)

Sun Fire X4270 Server World Record Two Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record 2-processor performance result on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark on the Sun Fire X4270 server.

  • The Sun Fire X4270 server with two Intel Xeon X5570 processors (8 cores, 16 threads) achieved 3,800 SAP SD benchmark users running the SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using the Oracle 10g database and the Solaris 10 operating system.

  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.

  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the IBM System 550 server using 4 POWER6 processors, 64 GB memory and the AIX 6.1 operating system.
  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the HP ProLiant BL460c G6 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Windows Server 2008 operating system.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark; 10-30% of this is due to the extra overhead of processing the larger character strings required by Unicode encoding. Refer to the SAP Note for more details. Note: username and password for SAP Service Marketplace required.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each character was just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the SAP Note for more details. Note: username and password for SAP Service Marketplace required.

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 / Oracle 10g | 3,800 | 2009 6.0 EP4 (Unicode) | 21,000 | 10,500 | 21-Aug-09
IBM System 550, 4x POWER6 @5GHz, 64 GB | AIX 6.1 / DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Solaris 10 / Oracle 10g | 3,700 | 2009 6.0 EP4 (Unicode) | 20,300 | 10,150 | 30-Mar-09
HP ProLiant BL460c G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,415 | 2009 6.0 EP4 (Unicode) | 18,670 | 9,335 | 04-Aug-09
Fujitsu PRIMERGY TX/RX 300 S5, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,328 | 2009 6.0 EP4 (Unicode) | 18,170 | 9,085 | 13-May-09
HP ProLiant BL460c G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,310 | 2009 6.0 EP4 (Unicode) | 18,070 | 9,035 | 27-Mar-09
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,300 | 2009 6.0 EP4 (Unicode) | 18,030 | 9,015 | 27-Mar-09
Fujitsu PRIMERGY BX920 S1, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,260 | 2009 6.0 EP4 (Unicode) | 17,800 | 8,900 | 18-Jun-09
NEC Express5800, 2x Intel Xeon X5570 @2.93GHz, 48 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,250 | 2009 6.0 EP4 (Unicode) | 17,750 | 8,875 | 28-Jul-09
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB | SuSE Linux Enterprise Server 10 / MaxDB 7.8 | 3,171 | 2009 6.0 EP4 (Unicode) | 17,380 | 8,690 | 17-Apr-09

Complete benchmark results may be found at the SAP benchmark website: http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270 server
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun Storage 6780 with 48 x 73GB 15KRPM 4Gb FC-AL and 16 x 146GB 15KRPM 4Gb FC-AL Drives

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Certified Results:

          Performance: 3,800 benchmark users
          SAP Certification: 2009033

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI OEM) to deliver the needed raw devices directly from the array; do not use any software layer in between.

See Also

Benchmark Tags

World-Record, Performance, SAP-SD, Solaris, Oracle, Intel, X64, x86, HP, IBM, Application, Database

Disclosure Statement

    Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 08/21/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,415 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009031. Fujitsu PRIMERGY TX/RX 300 S5 (2 processors, 8 cores, 16 threads) 3,328 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009014. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,310 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009003. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,300 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009004. Fujitsu PRIMERGY BX920 S1 (2 processors, 8 cores, 16 threads) 3,260 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009024. NEC Express5800 (2 processors, 8 cores, 16 threads) 3,250 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009027. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006. 
IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

    SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Thursday Aug 27, 2009

Sun SPARC Enterprise T5240 with 1.6GHz UltraSPARC T2 Plus Beats 4-Chip IBM Power 570 POWER6 System on SPECjbb2005

Significance of Results

A Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2 Plus processors at 1.6GHz delivered a result of 422782 SPECjbb2005 bops, 26424 SPECjbb2005 bops/JVM. The Sun SPARC Enterprise T5240 consumed an average of 875 Watts of power during the execution of the benchmark.

  • The Sun SPARC Enterprise T5240 server running two 1.6 GHz UltraSPARC T2 Plus processors delivered 5% better performance than an IBM Power 570 with four 4.7 GHz POWER6 processors as measured by the SPECjbb2005 benchmark.

  • The Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2 Plus processors at 1.6GHz demonstrated 10% better performance than the Sun SPARC Enterprise T5240 server equipped with two UltraSPARC T2 Plus processors at 1.4GHz.
  • One Sun SPARC Enterprise T5240 (two 1.6GHz UltraSPARC T2 Plus chips, 2RU) has 2.3 times the power/performance of the IBM Power 570 (8RU), which used four 4.7GHz POWER6 chips.
  • The Sun SPARC Enterprise T5240 used OpenSolaris 2009.06 and the Sun JDK 1.6.0_14 Performance Release to obtain this result.

Performance Landscape

SPECjbb2005 Performance Chart (ordered by performance), select results presented.

bops : SPECjbb2005 Business Operations per Second (bigger is better)

System Chips Cores Threads GHz Type bops bops/JVM
Sun SPARC Enterprise T5240 2 16 128 1.6 UltraSPARC T2 Plus 422782 26424
IBM Power 570 4 8 16 4.7 POWER6 402923 100731
Sun SPARC Enterprise T5240 2 16 128 1.4 UltraSPARC T2 Plus 384934 24058

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5240
      2 x 1.6 GHz UltraSPARC T2 Plus processors
      64 GB

Software Configuration:

    OpenSolaris 2009.06
    Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).

Key Points and Best Practices

  • Each JVM executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Each JVM was bound to a separate processor set containing one core, reducing memory access latency by using the physical memory closest to that processor.

See Also

Disclosure Statement

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 8/25/2009 on http://www.spec.org.
Sun SPARC T5240 (2 chips, 16 cores) 422782 SPECjbb2005 bops, 26424 SPECjbb2005 bops/JVM; Sun SPARC T5240 (2 chips, 16 cores) 384934 SPECjbb2005 bops, 24058 SPECjbb2005 bops/JVM; IBM Power 570 (4 chips, 8 cores) 402923 SPECjbb2005 bops, 100731 SPECjbb2005 bops/JVM.

Sun watts were measured on the system during the test.

IBM p 570 4P (2 building blocks) power specifications calculated as 80% of maximum input power reported 7/8/09 in 'Facts and Features Report': ftp://ftp.software.ibm.com/common/ssi/pm/br/n/psb01628usen/PSB01628USEN.PDF

Wednesday Aug 26, 2009

Sun SPARC Enterprise T5220 with 1.6GHz UltraSPARC T2 Sets Single Chip World Record on SPECjbb2005

Significance of Results

A Sun SPARC Enterprise T5220 server equipped with one UltraSPARC T2 processor at 1.6GHz delivered a World Record single-chip result of 231464 SPECjbb2005 bops, 28933 SPECjbb2005 bops/JVM. The Sun SPARC Enterprise T5220 consumed an average of 520 Watts of power during the execution of this benchmark.

  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 3% better performance than the Fujitsu TX100 result of 223691 SPECjbb2005 bops, which used one 3.16 GHz Xeon X3380 processor.
  • The Sun SPARC Enterprise T5220 (one 1.6 GHz UltraSPARC T2 chip) demonstrated 8% better performance than the IBM x3200 result of 214578 SPECjbb2005 bops, which used one 3.16 GHz Xeon X3380 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 10% better performance than the Fujitsu RX100 result of 211144 SPECjbb2005 bops, which used one 3.16 GHz Xeon X3380 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 19% better performance than the IBM x3350 result of 194256 SPECjbb2005 bops, which used one 3 GHz Xeon X3370 processor.
  • The Sun SPARC Enterprise T5220 server (one 1.6 GHz UltraSPARC T2 chip) demonstrated 2.6x the performance of the IBM p570 result of 88089 SPECjbb2005 bops, which used one 4.7 GHz POWER6 processor.
  • One Sun SPARC Enterprise T5220 (one 1.6GHz UltraSPARC T2 chip, 2RU) has 2.1 times the power/performance of the IBM Power 570 (4RU), which used two 4.7GHz POWER6 chips.
  • The Sun SPARC Enterprise T5220 used OpenSolaris 2009.06 and the Sun JDK 1.6.0_14 Performance Release to obtain this result.

Performance Landscape

SPECjbb2005 Performance Chart (ordered by performance)

bops : SPECjbb2005 Business Operations per Second (bigger is better)

System Chips Cores Threads GHz Type bops bops/JVM
Sun SPARC Enterprise T5220 1 8 64 1.6 UltraSPARC T2 231464 28933
Sun Blade T6320 1 8 64 1.6 UltraSPARC T2 229576 28697
Fujitsu TX100 1 4 4 3.16 Intel Xeon 223691 111846
IBM x3200 M2 1 4 4 3.16 Intel Xeon 214578 107289
Fujitsu RX100 1 4 4 3.16 Intel Xeon 211144 105572
IBM Power 570 2 4 8 4.7 POWER6 205917 102959
IBM x3350 1 4 4 3.0 Intel Xeon 194256 97128
Sun SPARC Enterprise T5220 1 8 64 1.4 UltraSPARC T2 192055 24007
IBM Power 570 1 2 4 4.7 POWER6 88089 88089

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5220
      1x 1.6 GHz UltraSPARC T2 processor
      64 GB

Software Configuration:

    OpenSolaris 2009.06
    Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).

Key Points and Best Practices

  • Each JVM executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Each JVM was bound to a separate processor set containing one core, reducing memory access latency by using the physical memory closest to that processor.

See Also

Disclosure Statement

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 8/25/2009 on http://www.spec.org.
Sun SPARC T5220 231464 SPECjbb2005 bops, 28933 SPECjbb2005 bops/JVM (submitted to SPEC for review); IBM p 570 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM; Fujitsu TX100 223691 SPECjbb2005 bops, 111846 SPECjbb2005 bops/JVM; IBM x3350 194256 SPECjbb2005 bops, 97128 SPECjbb2005 bops/JVM; Sun SPARC Enterprise T5120 192055 SPECjbb2005 bops, 24007 SPECjbb2005 bops/JVM.

Sun watts were measured on the system during the test.

IBM p 570 2P (1 building blocks) power specifications calculated as 80% of maximum input power reported 7/8/09 in "Facts and Features Report": ftp://ftp.software.ibm.com/common/ssi/pm/br/n/psb01628usen/PSB01628USEN.PDF

Wednesday Jul 22, 2009

Why does 1.6 beat 4.7?

Sun has upgraded the UltraSPARC T2 and UltraSPARC T2 Plus processors to 1.6 GHz. As described in some detail in yesterday's post, new results show SPEC CPU2006 performance improvements vs. previous systems that often exceed the clock speed improvement.  The scaling can be attributed to both memory system improvements and software improvements, such as the Sun Studio 12 Update 1 compiler.

A MHz improvement within a product line is often useful.  If yesterday's chip runs at speed n and today's at n*1.12 then, intuitively, sure, I'll take today's.

Comparing MHz across product lines is often counter-intuitive.  Consider that Sun's new systems provide:

  • up to 68% more throughput than the 4.7 GHz POWER6+ [1], and
  • up to 3x the throughput of the Itanium 9150N [2].

The comparisons are particularly striking when one takes into account the cache size advantage for both the POWER6+ and the Itanium 9150N, and the MHz advantage for the POWER6+:

Processor | GHz | Number of hw cache levels | Size of last cache (per chip) | SPECint_rate_base2006
UltraSPARC T2 / UltraSPARC T2 Plus | 1.6 | 2 | 4 MB | 1 chip: 89; 2 chips: 171; 4 chips: 338
POWER6+ | 4.7 | 3 | 32 MB | Best 2-chip result: 102 (UltraSPARC T2 Plus delivers 68% more integer throughput [1])
Itanium 9150N | 1.6 | 3 | 24 MB | Best 4-chip result: 114 (UltraSPARC T2 Plus delivers 3x the integer throughput [2])

These are per-chip results, not per-core or per-thread. Sun's CMT processors are designed for overall system throughput: how much work the overall system can get done.

A mystery: With comparatively smaller caches and modest clock rates, why do the Sun CMT processors win?

The performance hole: Memory latency. From the point of view of a CPU chip, the big performance problem is that memory latency is inordinately long compared to chip cycle times.

A hardware designer can attempt to cover up that latency with very large caches, as in the POWER6+ and Itanium, and this works well when running a small number of modest-sized applications. Large caches become less helpful, though, as workloads become more complex.

MHz isn't everything. In fact, MHz hardly counts at all when the problem is memory latency. Suppose the hot part of an application looks like this:

  loop:
       computational instruction
       computational instruction
       computational instruction
       memory access instruction
       branch to loop

For an application that looks like this, the computational instructions may complete in only a few cycles, while the memory access instruction may easily require on the order of 100ns - which, for a 1 GHz chip, is on the order of 100 cycles. If the processor speed is increased by a factor of 4, but memory speed is not, then memory is still 100ns away, and when measured in cycles, it is now 400 cycles distant. The overall loop hardly speeds up at all.

Lest the reader think I am making this up - consider page 8 of this IBM talk from April 2008 regarding the POWER6:

[Figure: POWER6 cache and memory latencies]

The IBM POWER systems have some impressive performance characteristics - if your application is tiny enough to fit in its first or second level cache. But memory latency is not impressive. If your workload requires multiple concurrent threads accessing a large memory space, Sun's CMT approach just might be a better fit.

Operating System Overhead: A context switch from one process to another is mediated by operating system services. The OS saves context from the currently running process - typically dozens of program registers and other state (such as virtual address space information); decides which process to run next (which may require access to several OS data structures); and loads the context for the new process (registers, virtual address context, etc.). If the system is running many processes, then caches are unlikely to be helpful during this context switch, and thousands of cycles may be spent on main memory accesses.

Design for throughput: Sun's CMT approach handles the complexity of real-world applications by allowing up to 64 hardware threads to be simultaneously on-chip. When a long-latency stall occurs, such as an access to main memory, the chip switches to executing instructions on behalf of other, non-stalled threads, thus improving overall system throughput. No operating system intervention is required, as resources are shared among the threads on the chip.

[1] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090427-07263.html
[2] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090522-07485.html

Competitive results retrieved from www.spec.org   20 July 2009.  Sun's CMT results have been submitted to SPEC.  SPEC, SPECfp, SPECint are registered trademarks of the Standard Performance Evaluation Corporation.

Tuesday Jul 21, 2009

Sun T5440 Oracle BI EE World Record Performance

Oracle BI EE Sun SPARC Enterprise T5440 World Record Performance

The Sun SPARC Enterprise T5440 server running the new 1.6 GHz UltraSPARC T2 Plus processor delivered world record performance on Oracle Business Intelligence Enterprise Edition (BI EE) tests using Sun's ZFS.
  • The Sun SPARC Enterprise T5440 server with four 1.6 GHz UltraSPARC T2 Plus processors delivered the best single system performance of 28K concurrent users on the Oracle BI EE benchmark. This result used Solaris 10 with Solaris Containers and the Oracle 11g Database software.

  • The benchmark demonstrates the scalability of an Oracle Business Intelligence Cluster with 4 nodes running in Solaris Containers within a single Sun SPARC Enterprise T5440 server.

  • The Sun SPARC Enterprise T5440 server with internal SSD and the ZFS file system showed significant I/O performance improvement over traditional disk for Business Intelligence Web Catalog activity.

Performance Landscape

System Chips Cores Threads GHz Type Users
1 x Sun SPARC Enterprise T5440 4 32 256 1.6 UltraSPARC T2 Plus 28,000
5 x Sun Fire T2000 1 8 32 1.2 UltraSPARC T1 10,000

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5440
      4 x 1.6 GHz UltraSPARC T2 Plus processors
      256 GB
      STK2540 (6 x 146GB)

Software Configuration:

    Solaris 10 5/09
    Oracle BIEE 10.1.3.4 64-bit
    Oracle 11g R1 Database

Benchmark Description

The objective of this benchmark is to highlight how Oracle BI EE can support pervasive deployments in large enterprises, using minimal hardware, by simulating an organization that needs to support more than 25,000 active concurrent users, each operating in mixed mode: ad-hoc reporting, application development, and report viewing.

The user population was divided into a mix of administrative users and business users. A maximum of 28,000 concurrent users were actively interacting and working in the system during the steady-state period. The tests executed 580 transactions per second, with think times of 60 seconds per user between requests. In the test scenario, 95% of the workload consisted of business users viewing reports and navigating within dashboards. The remaining 5% of the concurrent users, categorized as administrative users, were doing application development.

The benchmark scenario used a typical business user sequence of dashboard navigation, report viewing, and drill down. For example, a Service Manager logs into the system and navigates to his own set of dashboards, viz. "Service Manager". The user then selects the "Service Effectiveness" dashboard, which shows him four distinct reports: "Service Request Trend", "First Time Fix Rate", "Activity Problem Areas", and "Cost Per Completed Service Call 2002 till 2005". The user then proceeds to view the "Customer Satisfaction" dashboard, which also contains a set of 4 related reports. He then proceeds to drill down on some of the reports to see the detail data. Then the user proceeds to more dashboards, for example "Customer Satisfaction" and "Service Request Overview". After navigating through these dashboards, he logs out of the application.

This benchmark did not use a synthetic database schema. The benchmark tests were run on a full production version of the Oracle Business Intelligence Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a true customer scenario.

Key Points and Best Practices

Since the server has 32 cores, we created four Solaris Containers, each with 8 cores dedicated to it. A total of four instances of the BI server plus Presentation server (collectively referred to as an 'instance' from here onwards) were installed, one instance per container. All four BI instances were clustered using the BI Cluster software components.

The ZFS file system was used to overcome the 'Too many links' error that appears with ~28,000 concurrent users. Earlier runs hit the UFS limit of 32,767 subdirectories per directory (LINK_MAX) with ~28K users online, producing thousands of errors once new directories could no longer be created within a directory. The Web Catalog stores each user's profile on disk, creating at least one dedicated directory per user. With more than 25,000 concurrent users, clearly ZFS is the way to go.

See Also

Oracle Business Intelligence website; the BUSINESS INTELLIGENCE tag has other results

Disclosure Statement

Oracle Business Intelligence Enterprise Edition benchmark, see http://www.oracle.com/solutions/business_intelligence/resource-library-whitepapers.html for more. Results as of 7/20/09.

Sun T5440 World Record SAP-SD 4-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode)

Sun SPARC Enterprise T5440 Server World Record Four Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record performance result with four processors on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark as of July 21, 2009.
  • The Sun SPARC Enterprise T5440 server with four 1.6GHz UltraSPARC T2 Plus processors (32 cores, 256 threads) achieved 4,720 SAP SD benchmark users running the SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using the Oracle 10g database and the Solaris 10 OS.
  • The Sun SPARC Enterprise T5440 server with four 1.6GHz UltraSPARC T2 Plus processors beats the IBM System 550 by 26% using Oracle 10g and Solaris 10, with the same number of processors.
  • The Sun SPARC Enterprise T5440 server with four 1.6GHz UltraSPARC T2 Plus processors beats the HP ProLiant DL585 G6 using Oracle 10g and Solaris 10, with the same number of processors.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun SPARC Enterprise servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark; 10-30% of this is due to the extra overhead of processing the larger character strings required by Unicode encoding. Refer to the SAP Note for more details (https://service.sap.com/sap/support/notes/1139642 Note: User and password for SAP Service Marketplace required).
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to SAP Note for more details (https://service.sap.com/sap/support/notes/1139642 Note: User and password for SAP Service Marketplace required).

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun SPARC Enterprise T5440 Server, 4x UltraSPARC T2 Plus @1.6GHz, 256 GB | Solaris 10 / Oracle10g | 4,720 | 2009 6.0 EP4 (Unicode) | 25,830 | 6,458 | 21-Jul-09
HP ProLiant DL585 G6, 4x AMD Opteron 8439 SE @2.8GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 4,665 | 2009 6.0 EP4 (Unicode) | 25,530 | 6,383 | 10-Jul-09
HP ProLiant BL685c G6, 4x AMD Opteron 8435 @2.6GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 4,422 | 2009 6.0 EP4 (Unicode) | 24,230 | 6,058 | 29-May-09
IBM System 550, 4x POWER6 @5GHz, 64 GB | AIX 6.1 / DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09
HP ProLiant DL585 G5, 4x AMD Opteron 8393 SE @3.1GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,430 | 2009 6.0 EP4 (Unicode) | 18,730 | 4,683 | 24-Apr-09
HP ProLiant BL685 G6, 4x AMD Opteron 8389 @2.9GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 3,118 | 2009 6.0 EP4 (Unicode) | 17,050 | 4,263 | 24-Apr-09
NEC Express5800, 4x Intel Xeon X7460 @2.66GHz, 64 GB | Windows Server 2008 Enterprise Edition / SQL Server 2008 | 2,957 | 2009 6.0 EP4 (Unicode) | 16,170 | 4,043 | 28-May-09
Dell PowerEdge M905, 4x AMD Opteron 8384 @2.7GHz, 96 GB | Windows Server 2003 Enterprise Edition / SQL Server 2005 | 2,129 | 2009 6.0 EP4 (Unicode) | 11,770 | 2,943 | 18-May-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun SPARC Enterprise T5440 server
      4 x 1.6 GHz UltraSPARC T2 Plus processors (4 processors / 32 cores / 256 threads)
      256 GB memory
      3 x STK2540 each with 12 x 73GB/15KRPM disks

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle10g

SAE (Strategic Applications Engineering) and ISV-E (ISV Engineering) submitted this result for the SAP-SD 2-Tier benchmark; it was approved and published by SAP.

Certified Results:

          Performance: 4,720 benchmark users
          SAP Certification: 2009026

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Sun SPARC Enterprise T5440 Server Benchmark Details

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 07/21/09: Sun SPARC Enterprise T5440 Server (4 processors, 32 cores, 256 threads) 4,720 SAP SD Users, 4x 1.6 GHz UltraSPARC T2 Plus, 256 GB memory, Oracle10g, Solaris10, Cert# 2009026. HP ProLiant DL585 G6 (4 processors, 24 cores, 24 threads) 4,665 SAP SD Users, 4x 2.8 GHz AMD Opteron Processor 8439 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009025. HP ProLiant BL685c G6 (4 processors, 24 cores, 24 threads) 4,422 SAP SD Users, 4x 2.6 GHz AMD Opteron Processor 8435, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009021. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL585 G5 (4 processors, 16 cores, 16 threads) 3,430 SAP SD Users, 4x 3.1 GHz AMD Opteron Processor 8393 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009008. HP ProLiant BL685 G6 (4 processors, 16 cores, 16 threads) 3,118 SAP SD Users, 4x 2.9 GHz AMD Opteron Processor 8389, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009007. NEC Express5800 (4 processors, 24 cores, 24 threads) 2,957 SAP SD Users, 4x 2.66 GHz Intel Xeon Processor X7460, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009018. Dell PowerEdge M905 (4 processors, 16 cores, 16 threads) 2,129 SAP SD Users, 4x 2.7 GHz AMD Opteron Processor 8384, 96 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2009017. Sun Fire X4600M2 (8 processors, 32 cores, 32 threads) 7,825 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384, 128 GB memory, MaxDB 7.6, Solaris 10, Cert# 2008070. IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. 
HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071. SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

1.6 GHz SPEC CPU2006 - Rate Benchmarks

UltraSPARC T2 and T2 Plus Systems

Improved Performance Over 1.4 GHz

Reported 07/21/09

Significance of Results

Results are presented for the SPEC CPU2006 rate benchmarks run on systems based on the new 1.6 GHz UltraSPARC T2 and UltraSPARC T2 Plus processors. The new processors were tested in the Sun CMT family of systems, including the Sun SPARC Enterprise T5120, T5220, T5240, and T5440 servers and the Sun Blade T6320 server module.

SPECint_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 57% and 37% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5240 server equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 68% and 48% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor-based Sun CMT servers produced 59% to 68% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 integer throughput metrics.

  • On the four-chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 25% and 20% over the 1.4 GHz version of this server, as measured by the SPEC CPU2006 integer throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into the 2-chip Sun SPARC Enterprise T5240 server, delivered improvements of 20% and 17% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server, as measured by the SPEC CPU2006 integer throughput metrics.

  • On the single-chip Sun Blade T6320 server module, Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered performance improvements of 13% to 17% over the 1.4 GHz version of these servers, as measured by the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a SPECint_rate_base2006 score 3X that of the best 4-chip Itanium based system.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processors, delivered a SPECint_rate_base2006 score of 338, a World Record score for 4-chip systems running a single operating system instance (i.e. SMP, not clustered).
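
The generation-over-generation gains quoted in these bullets can be checked directly against the Performance Landscape tables on this page; a minimal sketch using the 4-chip SPECint_rate2006 values:

```python
# Check the quoted 1.6 GHz vs 1.4 GHz T5440 gains using the
# SPECint_rate2006 values from the 4-chip table on this page.
t5440_16ghz = {"base": 338, "peak": 360}  # 4 x 1.6 GHz UltraSPARC T2 Plus
t5440_14ghz = {"base": 270, "peak": 301}  # 4 x 1.4 GHz UltraSPARC T2 Plus

base_gain = (t5440_16ghz["base"] / t5440_14ghz["base"] - 1) * 100
peak_gain = (t5440_16ghz["peak"] / t5440_14ghz["peak"] - 1) * 100
print(f"base: +{base_gain:.0f}%, peak: +{peak_gain:.0f}%")  # ~25% and ~20%
```

The same arithmetic reproduces the other improvement percentages in this section from their respective table rows.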

SPECfp_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 35% and 22% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5240 server, equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 40% and 27% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor-based Sun CMT servers produced 18% to 24% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 floating-point throughput metrics.

  • On the four chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 20% and 17% when compared to 1.4 GHz processors in the same system, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into a Sun SPARC Enterprise T5240 server, delivered an improvement of 12% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server as measured by the SPEC CPU2006 floating-point throughput metrics.

  • On the single-chip Sun Blade T6320 server module and Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered performance improvements of 10% to 11% over the 1.4 GHz version of these servers, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a peak score 3X, and a base score 2.9X, that of the best 4-chip Itanium based system on the SPEC CPU2006 floating-point throughput metrics.
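
The Itanium comparison in the last bullet likewise follows from the 4-chip SPECfp table below; a quick sketch:

```python
# The "peak 3X, base 2.9X" Itanium comparison, using the 4-chip
# SPECfp_rate2006 table values from this page.
t5440 = {"base": 254, "peak": 270}     # Sun SPARC Enterprise T5440, 1.6 GHz
rx7640 = {"base": 87.4, "peak": 90.8}  # HP Integrity rx7640, best 4-chip Itanium

print(f"base: {t5440['base'] / rx7640['base']:.1f}x")  # ~2.9x
print(f"peak: {t5440['peak'] / rx7640['peak']:.1f}x")  # ~3.0x
```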

Performance Landscape

SPEC CPU2006 Performance Charts - bigger is better, selected results, please see www.spec.org for complete results. All results as of 7/17/09.

In the tables below
"Base" = SPECint_rate_base2006 or SPECfp_rate_base2006
"Peak" = SPECint_rate2006 or SPECfp_rate2006

SPECint_rate2006 results - 1 chip systems

System | Cores/Chips | Type | MHz | Base Copies | Base | Peak | Comments
Supermicro X8DAI | 4/1 | Xeon W3570 | 3200 | 8 | 127 | 136 | Best Nehalem result
HP ProLiant BL465c G6 | 6/1 | Opteron 2435 | 2600 | 6 | 82.1 | 104 | Best Istanbul result
Sun SPARC T5220 | 8/1 | UltraSPARC T2 | 1582 | 63 | 89.1 | 97.0 | New
Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1582 | 63 | 89.1 | 97.0 | New
Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1582 | 63 | 89.2 | 96.7 | New
Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1417 | 63 | 76.4 | 85.5 |
Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1417 | 63 | 76.2 | 83.9 |
IBM System p 570 | 2/1 | POWER6 | 4700 | 4 | 53.2 | 60.9 | Best POWER6 result

SPECint_rate2006 - 2 chip systems

System | Cores/Chips | Type | MHz | Base Copies | Base | Peak | Comments
Fujitsu CELSIUS R670 | 8/2 | Xeon W5580 | 3200 | 16 | 249 | 267 | Best Nehalem result
Sun Blade X6270 | 8/2 | Xeon X5570 | 2933 | 16 | 223 | 260 |
A+ Server 1021M-UR+B | 12/2 | Opteron 2439 SE | 2800 | 12 | 168 | 215 | Best Istanbul result
Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1582 | 127 | 171 | 183 | New
Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1415 | 127 | 142 | 157 |
IBM Power 520 | 4/2 | POWER6+ | 4700 | 8 | 101 | 124 | Best POWER6+ peak
IBM Power 520 | 4/2 | POWER6+ | 4700 | 8 | 102 | 122 | Best POWER6+ base
HP Integrity rx2660 | 4/2 | Itanium 9140M | 1666 | 4 | 58.1 | 62.8 | Best Itanium peak
HP Integrity BL860c | 4/2 | Itanium 9140M | 1666 | 4 | 61.0 | na | Best Itanium base

SPECint_rate2006 - 4 chip systems

System | Cores/Chips | Type | MHz | Base Copies | Base | Peak | Comments
SGI Altix ICE 8200EX | 16/4 | Xeon X5570 | 2933 | 32 | 466 | 499 | Best Nehalem result (clustered, not SMP)
Tyan Thunder n4250QE | 24/4 | Opteron 8439 SE | 2800 | 24 | 326 | 417 | Best Istanbul result
Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1596 | 255 | 338 | 360 | New. World record 4-chip SMP SPECint_rate_base2006
Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1414 | 255 | 270 | 301 |
IBM Power 550 | 8/4 | POWER6+ | 5000 | 16 | 215 | 263 | Best POWER6+ result
HP Integrity BL870c | 8/4 | Itanium 9150N | 1600 | 8 | 114 | na | Best Itanium result

SPECfp_rate2006 - 1 chip systems

System | Cores/Chips | Type | MHz | Base Copies | Base | Peak | Comments
Supermicro X8DAI | 4/1 | Xeon W3570 | 3200 | 8 | 102 | 106 | Best Nehalem result
HP ProLiant BL465c G6 | 6/1 | Opteron 2435 | 2600 | 6 | 65.2 | 72.2 | Best Istanbul result
Sun SPARC T5220 | 8/1 | UltraSPARC T2 | 1582 | 63 | 64.1 | 68.5 | New
Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1582 | 63 | 64.1 | 68.5 | New
Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1582 | 63 | 64.1 | 68.5 | New
Sun Blade T6320 | 8/1 | UltraSPARC T2 | 1417 | 63 | 58.1 | 62.3 |
Sun SPARC T5120 | 8/1 | UltraSPARC T2 | 1417 | 63 | 57.9 | 62.3 |
Sun SPARC T5220 | 8/1 | UltraSPARC T2 | 1417 | 63 | 57.9 | 62.3 |
IBM System p 570 | 2/1 | POWER6 | 4700 | 4 | 51.5 | 58.0 | Best POWER6 result

SPECfp_rate2006 - 2 chip systems

System | Cores/Chips | Type | MHz | Base Copies | Base | Peak | Comments
ASUS TS700-E6 | 8/2 | Xeon W5580 | 3200 | 16 | 201 | 207 | Best Nehalem result
A+ Server 1021M-UR+B | 12/2 | Opteron 2439 SE | 2800 | 12 | 133 | 147 | Best Istanbul result
Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1582 | 127 | 124 | 133 | New
Sun SPARC T5240 | 16/2 | UltraSPARC T2 Plus | 1415 | 127 | 111 | 119 |
IBM Power 520 | 4/2 | POWER6+ | 4700 | 8 | 88.7 | 105 | Best POWER6+ result
HP Integrity rx2660 | 4/2 | Itanium 9140M | 1666 | 4 | 54.5 | 55.8 | Best Itanium result

SPECfp_rate2006 - 4 chip systems

System | Cores/Chips | Type | MHz | Base Copies | Base | Peak | Comments
SGI Altix ICE 8200EX | 16/4 | Xeon X5570 | 2933 | 32 | 361 | 372 | Best Nehalem result
Tyan Thunder n4250QE | 24/4 | Opteron 8439 SE | 2800 | 24 | 259 | 285 | Best Istanbul result
Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1596 | 255 | 254 | 270 | New
Sun SPARC T5440 | 32/4 | UltraSPARC T2 Plus | 1414 | 255 | 212 | 230 |
IBM Power 550 | 8/4 | POWER6+ | 5000 | 16 | 188 | 222 | Best POWER6+ result
HP Integrity rx7640 | 8/4 | Itanium 2 9040 | 1600 | 8 | 87.4 | 90.8 | Best Itanium result

Results and Configuration Summary

Test Configurations:


Sun Blade T6320
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5120/T5220
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5240
2 x 1.6 GHz UltraSPARC T2 Plus
128 GB (32 x 4GB)
Solaris 10 5/09
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5440
4 x 1.6 GHz UltraSPARC T2 Plus
256 GB (64 x 4GB)
Solaris 10 5/09
Sun Studio 12 Update 1, gccfss V4.2.1

Results Summary:



Metric | T6320 | T5120 | T5220 | T5240 | T5440
SPECint_rate_base2006 | 89.2 | 89.1 | 89.1 | 171 | 338
SPECint_rate2006 | 96.7 | 97.0 | 97.0 | 183 | 360
SPECfp_rate_base2006 | 64.1 | 64.1 | 64.1 | 124 | 254
SPECfp_rate2006 | 68.5 | 68.5 | 68.5 | 133 | 270

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark, with over 7000 results published in the three years since it was introduced. It measures:

  • "Speed" - single-copy performance of chip, memory, and compiler
  • "Rate" - multi-copy throughput

The rate metrics are used for the throughput-oriented systems described on this page. These metrics include:

  • SPECint_rate2006: throughput for 12 integer benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • SPECfp_rate2006: throughput for 17 floating point benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

There are "base" variants of both the above metrics that require more conservative compilation, such as using the same flags for all benchmarks.

See www.spec.org for additional information.

Key Points and Best Practices

Results on this page for the Sun SPARC Enterprise T5120 server were measured on a Sun SPARC Enterprise T5220. The Sun SPARC Enterprise T5120 and Sun SPARC Enterprise T5220 are electronically equivalent: a T5120 can hold up to 4 disks, and a T5220 up to 8. This system was tested with 4 disks; therefore, results on this page apply to both the T5120 and the T5220.

Know when you need throughput vs. speed. The Sun CMT systems described on this page provide massive throughput, as demonstrated by the fact that up to 255 jobs are run on the 4-chip system, 127 on 2-chip, and 63 on 1-chip. Some of the competitive chips do have a speed advantage - e.g. Nehalem and Istanbul - but none of the competitive results undertake to run the large number of jobs tested on Sun's CMT systems.
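
The copy counts in the paragraph above track the hardware thread counts of these CMT systems: each UltraSPARC T2 / T2 Plus chip has 8 cores with 8 threads each, and the submissions run one fewer copy than the total thread count. A quick sketch:

```python
# Copies run vs. hardware threads for the CMT results on this page.
# Each UltraSPARC T2 / T2 Plus chip provides 8 cores x 8 threads = 64 threads.
for chips, copies in [(1, 63), (2, 127), (4, 255)]:
    threads = chips * 8 * 8
    assert copies == threads - 1  # one fewer copy than hardware threads
    print(f"{chips}-chip system: {threads} threads, {copies} copies")
```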

Use the latest compiler. The Sun Studio group is always working to improve the compiler. Sun Studio 12, and Sun Studio 12 Update 1, which are used in these submissions, provide updated code generation for a wide variety of SPARC and x86 implementations.

I/O still counts. Even in a CPU-intensive workload, some I/O remains. This point is explored in some detail at http://blogs.sun.com/jhenning/entry/losing_my_fear_of_zfs.

Disclosure Statement

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Competitive results from www.spec.org as of 16 July 2009.  Sun's new results quoted on this page have been submitted to SPEC.
Sun Blade T6320 89.2 SPECint_rate_base2006, 96.7 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 89.1 SPECint_rate_base2006, 97.0 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5240 171 SPECint_rate_base2006, 183 SPECint_rate2006, 124 SPECfp_rate_base2006, 133 SPECfp_rate2006;
Sun SPARC Enterprise T5440 338 SPECint_rate_base2006, 360 SPECint_rate2006, 254 SPECfp_rate_base2006, 270 SPECfp_rate2006;
Sun Blade T6320 76.4 SPECint_rate_base2006, 85.5 SPECint_rate2006, 58.1 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 76.2 SPECint_rate_base2006, 83.9 SPECint_rate2006, 57.9 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5240 142 SPECint_rate_base2006, 157 SPECint_rate2006, 111 SPECfp_rate_base2006, 119 SPECfp_rate2006;
Sun SPARC Enterprise T5440 270 SPECint_rate_base2006, 301 SPECint_rate2006, 212 SPECfp_rate_base2006, 230 SPECfp_rate2006;
IBM p 570 53.2 SPECint_rate_base2006, 60.9 SPECint_rate2006, 51.5 SPECfp_rate_base2006, 58.0 SPECfp_rate2006;
IBM Power 520 102 SPECint_rate_base2006, 124 SPECint_rate2006, 88.7 SPECfp_rate_base2006, 105 SPECfp_rate2006;
IBM Power 550 215 SPECint_rate_base2006, 263 SPECint_rate2006, 188 SPECfp_rate_base2006, 222 SPECfp_rate2006;
HP Integrity BL870c 114 SPECint_rate_base2006;
HP Integrity rx7640 87.4 SPECfp_rate_base2006, 90.8 SPECfp_rate2006.

Sun Blade T6320 World Record SPECjbb2005 performance

Significance of Results

The Sun Blade T6320 server module equipped with one UltraSPARC T2 processor running at 1.6 GHz delivered a World Record single-chip result while running the SPECjbb2005 benchmark.

  • The Sun Blade T6320 server module powered by one 1.6 GHz UltraSPARC T2 processor delivered 229576 SPECjbb2005 bops and 28697 SPECjbb2005 bops/JVM.
  • The Sun Blade T6320 server module (with one 1.6 GHz UltraSPARC T2 processor) demonstrated 2.6X better performance than the IBM System p 570 with one 4.7 GHz POWER6 processor.
  • The Sun Blade T6320 server module (with one 1.6 GHz UltraSPARC T2 processor) demonstrated 3% better performance than the Fujitsu TX100 result which used one 3.16 GHz Intel Xeon X3380 processor.
  • The Sun Blade T6320 server module (with one 1.6 GHz UltraSPARC T2 processor) demonstrated 7% better performance than the IBM x3200 result which used one 3.16 GHz Xeon X3380 processor.
  • The Sun Blade T6320 server module running the 1.6 GHz UltraSPARC T2 processor delivered 20% better performance than a Sun SPARC Enterprise T5120 with the 1.4 GHz UltraSPARC T2 processor.
  • The Sun Blade T6320 used the OpenSolaris 2009.06 operating system and the Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release JVM to obtain this leading result.

Performance Landscape

SPECjbb2005 Performance Chart (ordered by performance)

bops: SPECjbb2005 Business Operations per Second (bigger is better)

System | Chips | Cores | Threads | GHz | Type | SPECjbb2005 bops | SPECjbb2005 bops/JVM
Sun Blade T6320 | 1 | 8 | 64 | 1.6 | UltraSPARC T2 | 229576 | 28697
Fujitsu TX100 | 1 | 4 | 4 | 3.16 | Intel Xeon | 223691 | 111846
IBM x3200 M2 | 1 | 4 | 4 | 3.16 | Intel Xeon | 214578 | 107289
Fujitsu RX100 | 1 | 4 | 4 | 3.16 | Intel Xeon | 211144 | 105572
IBM x3350 | 1 | 4 | 4 | 3.0 | Intel Xeon | 194256 | 97128
Sun SE T5120 | 1 | 8 | 64 | 1.4 | UltraSPARC T2 | 192055 | 24007
IBM p 570 | 1 | 2 | 4 | 4.7 | POWER6 | 88089 | 88089

Complete benchmark results may be found at the SPEC benchmark website http://www.spec.org.

Results and Configuration Summary

Hardware Configuration:

    Sun Blade T6320
      1 x 1.6 GHz UltraSPARC T2 processor
      64 GB

Software Configuration:

    OpenSolaris 2009.06
    Java HotSpot(TM) 32-Bit Server, Version 1.6.0_14 Performance Release

Benchmark Description

SPECjbb2005 (Java Business Benchmark) measures the performance of a Java implemented application tier (server-side Java). The benchmark is based on the order processing in a wholesale supplier application. The performance of the user tier and the database tier are not measured in this test. The metrics given are number of SPECjbb2005 bops (Business Operations per Second) and SPECjbb2005 bops/JVM (bops per JVM instance).
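
The two metrics are related by the number of JVM instances used in the run; for the Sun Blade T6320 result above:

```python
# Total bops divided by bops/JVM recovers the number of JVM instances.
total_bops = 229576   # Sun Blade T6320 SPECjbb2005 bops
bops_per_jvm = 28697  # Sun Blade T6320 SPECjbb2005 bops/JVM
print(total_bops / bops_per_jvm)  # 8.0 -> eight JVMs, one per core
```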

Key Points and Best Practices

  • Enhancements to the JVM had a major impact on performance.
  • Each JVM executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Each JVM was bound to a separate processor set containing one core, reducing memory access latency by using the physical memory closest to the processor.

Disclosure Statement

SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Results as of 7/17/2009 on http://www.spec.org. SPECjbb2005, Sun Blade T6320 229576 SPECjbb2005 bops, 28697 SPECjbb2005 bops/JVM; IBM p 570 88089 SPECjbb2005 bops, 88089 SPECjbb2005 bops/JVM; Fujitsu TX100 223691 SPECjbb2005 bops, 111846 SPECjbb2005 bops/JVM; IBM x3350 194256 SPECjbb2005 bops, 97128 SPECjbb2005 bops/JVM; Sun SPARC Enterprise T5120 192055 SPECjbb2005 bops, 24007 SPECjbb2005 bops/JVM.