Monday Oct 26, 2015

Oracle E-Business Suite Applications R12.1.3 (OLTP X-Large): SPARC M7-8 World Record

Oracle's SPARC M7-8 server, using a four-chip Oracle VM Server for SPARC (LDom) virtualized server, produced a world record 20,000 users running the Oracle E-Business OLTP X-Large benchmark. The benchmark runs five Oracle E-Business online workloads concurrently: Customer Service, iProcurement, Order Management, Human Resources Self-Service, and Financials.

  • The virtualized four-chip LDom on the SPARC M7-8 was able to handle more users than the previous best result which used eight processors of Oracle's SPARC M6-32 server.

  • The SPARC M7-8 server using Oracle VM Server for SPARC provides enterprise applications with high availability: each application runs in its own environment, insulated from and independent of the others.

Performance Landscape

Oracle E-Business (3-tier) OLTP X-Large Benchmark
System        Chips   Total Online Users   Weighted Average Response Time (sec)   90th Percentile Response Time (sec)
SPARC M7-8    4       20,000               0.70                                   1.13
SPARC M6-32   8       18,500               0.61                                   1.16

Breakdown of the total number of users by component:

Users per Component
Component            SPARC M7-8     SPARC M6-32
Total Online Users   20,000 users   18,500 users
HR Self-Service      5,000 users    4,000 users
Order-to-Cash        2,500 users    2,300 users
iProcurement         2,700 users    2,400 users
Customer Service     7,000 users    7,000 users
Financials           2,800 users    2,800 users
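
The weighted average response time in the Performance Landscape table summarizes the five workloads in one number. A minimal sketch of a user-weighted average, using the SPARC M7-8 user counts above and purely hypothetical per-module response times (the published result does not disclose per-module times, and the benchmark's actual weighting may differ):

    # User counts come from the breakdown above; per-module response times are hypothetical.
    users = {"HR Self-Service": 5000, "Order-to-Cash": 2500, "iProcurement": 2700,
             "Customer Service": 7000, "Financials": 2800}
    resp_sec = {"HR Self-Service": 0.55, "Order-to-Cash": 0.90, "iProcurement": 0.80,
                "Customer Service": 0.65, "Financials": 0.75}

    weighted_avg = sum(users[m] * resp_sec[m] for m in users) / sum(users.values())
    print(f"user-weighted average response time: {weighted_avg:.2f} s")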

Configuration Summary

System Under Test:

SPARC M7-8 server
8 x SPARC M7 processors (4.13 GHz)
4 TB memory
2 x 600 GB SAS-2 HDD
using a Logical Domain with
4 x SPARC M7 processors (4.13 GHz)
2 TB memory
2 x Sun Storage Dual 16Gb Fibre Channel PCIe Universal HBA
2 x Sun Dual Port 10GBase-T Adapter
Oracle Solaris 11.3
Oracle E-Business Suite 12.1.3
Oracle Database 11g Release 2

Storage Configuration:

4 x Oracle ZFS Storage ZS3-2 appliances each with
2 x Read Flash Accelerator SSD
1 x Storage Drive Enclosure DE2-24P containing:
20 x 900 GB 10K RPM SAS-2 HDD
4 x Write Flash Accelerator SSD
1 x Sun Storage Dual 8Gb FC PCIe HBA
Used for database files, Solaris Zones OS, the EBS mid-tier application software stack,
and the db-tier Oracle server
2 x Sun Server X4-2L servers, each with
2 x Intel Xeon Processor E5-2650 v2
128 GB memory
1 x Sun Storage 6Gb SAS PCIe RAID HBA
4 x 400 GB SSD
14 x 600 GB HDD
Used for redo log files and database backup storage.

Benchmark Description

The Oracle E-Business OLTP X-Large benchmark simulates thousands of online users executing transactions typical of an internal Enterprise Resource Planning (ERP) deployment, simultaneously exercising five application modules: Customer Service, Human Resources Self-Service, iProcurement, Order Management, and Financials.

Each database tier uses a database instance of about 600 GB in size, supporting thousands of application users, accessing hundreds of objects (tables, indexes, SQL stored procedures, etc.).

Key Points and Best Practices

This test demonstrates Oracle virtualization technologies concurrently running multiple multi-tier, business-critical Oracle applications and databases on four SPARC M7 processors in a single SPARC M7-8 server, supporting thousands of users executing a high volume of complex transactions with a constrained (<1 sec) weighted average response time.

The Oracle E-Business LDom is further configured using Oracle Solaris Zones.

The 20,000-user result was achieved by load balancing the five Oracle E-Business Suite Applications 12.1.3 online workloads across two Oracle Solaris processor sets and redirecting all network interrupts to a dedicated third processor set.

Each application processor set (set-1 and set-2) concurrently ran two Oracle E-Business Suite application servers and two database server instances, each within its own Oracle Solaris Zone (4 zones per set).

Each application server's client-facing network interface was mapped to the locality group of the CPUs processing the related workload, guaranteeing memory locality between network structures and application server hardware resources.

All external storage was connected through at least two paths to multipath-capable Fibre Channel controller ports on the host, and the Oracle Solaris I/O multipathing feature was enabled.
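
The layout described above can be summarized as a simple data structure. A minimal Python sketch, where the per-set core counts are hypothetical (the published result does not break down cores per processor set) and only the set/zone topology reflects the text:

    # Sketch of the processor-set / zone layout; core splits are assumptions.
    layout = {
        "set-1": {"cores": 56, "zones": ["ebs-app-1", "db-1", "ebs-app-2", "db-2"]},
        "set-2": {"cores": 56, "zones": ["ebs-app-3", "db-3", "ebs-app-4", "db-4"]},
        "interrupt-set": {"cores": 16, "zones": []},   # all network interrupts bound here
    }

    assert sum(s["cores"] for s in layout.values()) == 128                # 4 x 32-core SPARC M7
    assert all(len(layout[s]["zones"]) == 4 for s in ("set-1", "set-2"))  # 4 zones per set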

See Also

Disclosure Statement

Oracle E-Business Suite R12 extra-large multiple-online module benchmark, SPARC M7-8, SPARC M7, 4.13 GHz, 4 chips, 128 cores, 1024 threads, 2 TB memory, 20,000 online users, average response time 0.70 sec, 90th percentile response time 1.13 sec, Oracle Solaris 11.3, Oracle Solaris Zones, Oracle VM Server for SPARC, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 10/25/2015.

Monday Nov 25, 2013

World Record Single System TPC-H @10000GB Benchmark on SPARC T5-4

Oracle's SPARC T5-4 server delivered world record single server performance of 377,594 QphH@10000GB with price/performance of $4.65/QphH@10000GB USD on the TPC-H @10000GB benchmark. This result shows that the 4-chip SPARC T5-4 server is significantly faster than the 8-chip server results from HP (Intel x86 based).

  • The SPARC T5-4 server with four SPARC T5 processors is 2.4 times faster than the HP ProLiant DL980 G7 server with eight x86 processors.

  • The SPARC T5-4 server delivered 4.8 times better performance per chip and 3.0 times better performance per core than the HP ProLiant DL980 G7 server.

  • The SPARC T5-4 server has 28% better price/performance than the HP ProLiant DL980 G7 server (for the price/QphH metric).

  • The SPARC T5-4 server with 2 TB memory is 2.4 times faster than the HP ProLiant DL980 G7 server with 4 TB memory (for the composite metric).

  • The SPARC T5-4 server took 9 hours, 37 minutes, 54 seconds for data loading while the HP ProLiant DL980 G7 server took 8.3 times longer.

  • The SPARC T5-4 server accomplished the refresh function in around a minute; the HP ProLiant DL980 G7 server took up to 7.1 times longer to do the same function.

This result demonstrates a complete data warehouse solution that shows the performance both of individual and concurrent query processing streams, faster loading, and refresh of the data during business operations. The SPARC T5-4 server delivers superior performance and cost efficiency when compared to the HP result.

Performance Landscape

The table lists the leading TPC-H @10000GB results for non-clustered systems.

TPC-H @10000GB, Non-Clustered Systems
System                 Processor                    P/C/T – Memory       Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database          Available
SPARC T5-4             3.6 GHz SPARC T5             4/64/512 – 2048 GB   377,594.3          $4.65             342,714.1      416,024.4           Oracle 11g R2     11/25/13
HP ProLiant DL980 G7   2.4 GHz Intel Xeon E7-4870   8/80/160 – 4096 GB   158,108.3          $6.49             185,473.6      134,780.5           SQL Server 2012   04/15/13

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity (bigger is better)
QthH = the Throughput Numerical Quantity (bigger is better)

The following table lists data load times and average refresh function times.

TPC-H @10000GB, Non-Clustered Systems
Database Load & Database Refresh
System                 Processor                    Data Loading (h:m:s)   T5 Advan   RF1 (sec)   T5 Advan   RF2 (sec)   T5 Advan
SPARC T5-4             3.6 GHz SPARC T5             09:37:54               8.3x       58.8        7.1x       62.1        6.4x
HP ProLiant DL980 G7   2.4 GHz Intel Xeon E7-4870   79:28:23               1.0x       416.4       1.0x       394.9       1.0x

Data Loading = database load time
RF1 = throughput average first refresh transaction
RF2 = throughput average second refresh transaction
T5 Advan = the ratio of time to the SPARC T5-4 server time
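
These ratios follow directly from the published times; a quick Python check using the values from the table above:

    # Reproduce the "T5 Advan" ratios from the published load and refresh times.
    def to_seconds(hms: str) -> int:
        """Convert an 'h:m:s' string to seconds."""
        h, m, s = (int(x) for x in hms.split(":"))
        return h * 3600 + m * 60 + s

    t5_load, hp_load = to_seconds("09:37:54"), to_seconds("79:28:23")
    print(f"data load ratio: {hp_load / t5_load:.1f}x")   # ~8.3x
    print(f"RF1 ratio: {416.4 / 58.8:.1f}x")              # ~7.1x
    print(f"RF2 ratio: {394.9 / 62.1:.1f}x")              # ~6.4x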

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server Under Test:

SPARC T5-4 server
4 x SPARC T5 processors (3.6 GHz total of 64 cores, 512 threads)
2 TB memory
2 x internal SAS (2 x 300 GB) disk drives
12 x 16 Gb FC HBA

External Storage:

24 x Sun Server X4-2L servers configured as COMSTAR nodes, each with
2 x 2.5 GHz Intel Xeon E5-2609 v2 processors
4 x Sun Flash Accelerator F80 PCIe Cards, 800 GB each
6 x 4 TB 7.2K RPM 3.5" SAS disks
1 x 8 Gb dual port HBA

2 x 48 port Brocade 6510 Fibre Channel Switches

Software Configuration:

Oracle Solaris 11.1
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 10000 GB (Scale Factor 10000)
TPC-H Composite: 377,594.3 QphH@10000GB
Price/performance: $4.65/QphH@10000GB USD
Available: 11/25/2013
Total 3 year Cost: $1,755,709 USD
TPC-H Power: 342,714.1
TPC-H Throughput: 416,024.4
Database Load Time: 9:37:54

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
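
The relationship between these numbers can be checked directly: the composite QphH is the geometric mean of the power and throughput metrics, and price/performance divides the total 3-year cost by QphH. A small Python check using the audited SPARC T5-4 figures above:

    import math

    # Audited SPARC T5-4 @10000GB figures from this report.
    qpph, qthh = 342_714.1, 416_024.4    # TPC-H Power and Throughput
    cost_usd = 1_755_709                 # total 3-year cost

    qphh = math.sqrt(qpph * qthh)        # composite = geometric mean
    print(f"QphH@10000GB ~ {qphh:,.0f}")                # ~377,594
    print(f"price/perf ~ ${cost_usd / qphh:.2f}/QphH")  # ~$4.65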

Key Points and Best Practices

  • COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI target platform. COMSTAR uses a modular approach to break the large task of handling all the different pieces of a SCSI target subsystem into independent functional modules that are glued together by the SCSI Target Mode Framework (STMF). The modules implementing SCSI-level functionality (disk, tape, medium changer, etc.) are not required to know about the underlying transport, and the modules implementing the transport protocol (FC, iSCSI, etc.) are not aware of the SCSI-level functionality of the packets they transport. The framework hides the details of allocation, provides execution context and cleanup for SCSI commands and associated resources, and simplifies the task of writing SCSI or transport modules.

  • The SPARC T5-4 server achieved a peak IO rate of 37 GB/sec from the Oracle database configured with this storage.

  • Twelve COMSTAR nodes were mirrored to another twelve COMSTAR nodes on which all of the Oracle database files were placed. IO performance was high and balanced across all the nodes.

  • Oracle Solaris 11.1 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric when comparing systems.

  • The SPARC T5-4 server and Oracle Solaris efficiently managed the system load of nearly two thousand Oracle Database parallel processes.

See Also

Disclosure Statement

TPC Benchmark, TPC-H, QphH, QthH, QppH are trademarks of the Transaction Processing Performance Council (TPC). Results as of 11/25/13, prices are in USD. SPARC T5-4 www.tpc.org/3293; HP ProLiant DL980 G7 www.tpc.org/3285.

Wednesday Jun 12, 2013

SPARC T5-4 Produces World Record Single Server TPC-H @3000GB Benchmark Result

Oracle's SPARC T5-4 server delivered world record single server performance of 409,721 QphH@3000GB with price/performance of $3.94/QphH@3000GB on the TPC-H @3000GB benchmark. This result shows that the 4-chip SPARC T5-4 server is significantly faster than the 8-chip server results from IBM (POWER7 based) and HP (Intel x86 based).

This result demonstrates a complete data warehouse solution that shows the performance both of individual and concurrent query processing streams, faster loading, and refresh of the data during business operations. The SPARC T5-4 server delivers superior performance and cost efficiency when compared to the IBM POWER7 result.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.1 times faster than the IBM Power 780 server with eight POWER7 processors and 2.5 times faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark. The SPARC T5-4 server also delivered better performance per core than these eight processor systems from IBM and HP.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.1 times faster than the IBM Power 780 server with eight POWER7 processors on the TPC-H @3000GB benchmark.

  • The SPARC T5-4 server has 38% better price/performance ($/QphH@3000GB) than the IBM Power 780 server on the TPC-H @3000GB benchmark.

  • The SPARC T5-4 server took 2 hours, 6 minutes, 4 seconds for data loading while the IBM Power 780 server took 2.8 times longer.

  • The SPARC T5-4 server executed the first refresh function (RF1) in 19.4 seconds; the IBM Power 780 server took 7.6 times longer.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.5 times faster than the HP ProLiant DL980 G7 server with the same number of cores on the TPC-H @3000GB benchmark.

  • The SPARC T5-4 server took 2 hours, 6 minutes, 4 seconds for data loading while the HP ProLiant DL980 G7 server took 4.1 times longer.

  • The SPARC T5-4 server executed the first refresh function (RF1) in 19.4 seconds; the HP ProLiant DL980 G7 server took 8.9 times longer.

  • The SPARC T5-4 server delivered 6% better performance than the SPARC Enterprise M9000-64 server and 2.1 times better than the SPARC Enterprise M9000-32 server on the TPC-H @3000GB benchmark.

Performance Landscape

The table lists the leading TPC-H @3000GB results for non-clustered systems.

TPC-H @3000GB, Non-Clustered Systems
System                   Processor                   P/C/T – Memory         Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database          Available
SPARC T5-4               3.6 GHz SPARC T5            4/64/512 – 2048 GB     409,721.8          $3.94             345,762.7      485,512.1           Oracle 11g R2     09/24/13
SPARC Enterprise M9000   3.0 GHz SPARC64 VII+        64/256/256 – 1024 GB   386,478.3          $18.19            316,835.8      471,428.6           Oracle 11g R2     09/22/11
SPARC T4-4               3.0 GHz SPARC T4            4/32/256 – 1024 GB     205,792.0          $4.10             190,325.1      222,515.9           Oracle 11g R2     05/31/12
SPARC Enterprise M9000   2.88 GHz SPARC64 VII        32/128/256 – 512 GB    198,907.5          $15.27            182,350.7      216,967.7           Oracle 11g R2     12/09/10
IBM Power 780            4.1 GHz POWER7              8/32/128 – 1024 GB     192,001.1          $6.37             210,368.4      175,237.4           Sybase 15.4       11/30/11
HP ProLiant DL980 G7     2.27 GHz Intel Xeon X7560   8/64/128 – 512 GB      162,601.7          $2.68             185,297.7      142,685.6           SQL Server 2008   10/13/10

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

The following table lists data load times and refresh function times during the power run.

TPC-H @3000GB, Non-Clustered Systems
Database Load & Database Refresh
System                 Processor                   Data Loading (h:m:s)   T5 Advan   RF1 (sec)   T5 Advan   RF2 (sec)   T5 Advan
SPARC T5-4             3.6 GHz SPARC T5            02:06:04               1.0x       19.4        1.0x       22.4        1.0x
IBM Power 780          4.1 GHz POWER7              05:51:50               2.8x       147.3       7.6x       133.2       5.9x
HP ProLiant DL980 G7   2.27 GHz Intel Xeon X7560   08:35:17               4.1x       173.0       8.9x       126.3       5.6x

Data Loading = database load time
RF1 = power test first refresh transaction
RF2 = power test second refresh transaction
T5 Advan = the ratio of time to T5 time

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Hardware Configuration:

SPARC T5-4 server
4 x SPARC T5 processors (3.6 GHz total of 64 cores, 512 threads)
2 TB memory
2 x internal SAS (2 x 300 GB) disk drives

External Storage:

12 x Sun Storage 2540-M2 arrays with Sun Storage 2501-M2 expansion trays, each with
24 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache
2 x Brocade 6510 Fibre Channel switches (48 x 16 Gb ports each)

Software Configuration:

Oracle Solaris 11.1
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 409,721.8 QphH@3000GB
Price/performance: $3.94/QphH@3000GB
Available: 09/24/2013
Total 3 year Cost: $1,610,564
TPC-H Power: 345,762.7
TPC-H Throughput: 485,512.1
Database Load Time: 2:06:04

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.

Key Points and Best Practices

  • Twelve of Oracle's Sun Storage 2540-M2 arrays with Sun Storage 2501-M2 expansion trays were used for the benchmark. Each contains 24 15K RPM drives and is connected to a single dual port 16Gb FC HBA using 2 ports through a Brocade 6510 Fibre Channel switch.

  • The SPARC T5-4 server achieved a peak IO rate of 33 GB/sec from the Oracle database configured with this storage.

  • Oracle Solaris 11.1 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric when comparing systems.

  • The SPARC T5-4 server and Oracle Solaris efficiently managed the system load of two thousand Oracle Database parallel processes.

  • Six Sun Storage 2540-M2/2501-M2 arrays were mirrored to another six Sun Storage 2540-M2/2501-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays.

  • The TPC-H Refresh Function (RF) simulates the periodic refresh portion of a data warehouse by adding new sales data and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are key for this function, and the SPARC T5-4 server outperformed both the IBM POWER7 server and the HP ProLiant DL980 G7 server. (See the RF columns above.)

See Also

Disclosure Statement

TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org, results as of 6/7/13. Prices are in USD. SPARC T5-4 www.tpc.org/3288; SPARC T4-4 www.tpc.org/3278; SPARC Enterprise M9000 www.tpc.org/3262; SPARC Enterprise M9000 www.tpc.org/3258; IBM Power 780 www.tpc.org/3277; HP ProLiant DL980 www.tpc.org/3285. 

Wednesday Nov 30, 2011

SPARC T4-4 Beats 8-CPU IBM POWER7 on TPC-H @3000GB Benchmark

Oracle's SPARC T4-4 server delivered a world record TPC-H @3000GB benchmark result for systems with four processors. This result beats eight processor results from IBM (POWER7) and HP (x86). The SPARC T4-4 server also delivered better performance per core than these eight processor systems from IBM and HP. Comparisons below are based upon system to system comparisons, highlighting Oracle's complete software and hardware solution.

This database world record result used Oracle's Sun Storage 2540-M2 arrays (rotating disk) connected to a SPARC T4-4 server running Oracle Solaris 11 and Oracle Database 11g Release 2 demonstrating the power of Oracle's integrated hardware and software solution.

  • The SPARC T4-4 server based configuration achieved a TPC-H scale factor 3000 world record for four processor systems of 205,792 QphH@3000GB with price/performance of $4.10/QphH@3000GB.

  • The SPARC T4-4 server with four SPARC T4 processors (total of 32 cores) is 7% faster than the IBM Power 780 server with eight POWER7 processors (total of 32 cores) on the TPC-H @3000GB benchmark.

  • The SPARC T4-4 server is 36% better in price performance compared to the IBM Power 780 server on the TPC-H @3000GB Benchmark.

  • The SPARC T4-4 server is 29% faster than the IBM Power 780 for data loading.

  • The SPARC T4-4 server is up to 3.4 times faster than the IBM Power 780 server for the Refresh Function.

  • The SPARC T4-4 server with four SPARC T4 processors is 27% faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark.

  • The SPARC T4-4 server is 52% faster than the HP ProLiant DL980 G7 server for data loading.

  • The SPARC T4-4 server is up to 3.2 times faster than the HP ProLiant DL980 G7 for the Refresh Function.

  • The SPARC T4-4 server achieved a peak IO rate from the Oracle database of 17 GB/sec. This rate was independent of the storage used, as demonstrated by the TPC-H @3000GB benchmark which used twelve Sun Storage 2540-M2 arrays (rotating disk) and the TPC-H @1000GB benchmark which used four Sun Storage F5100 Flash Array devices (flash storage). [*]

  • The SPARC T4-4 server showed linear scaling from TPC-H @1000GB to TPC-H @3000GB. This demonstrates that the SPARC T4-4 server can handle the increasingly larger databases required of DSS systems. [*]

  • The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase.

[*] The TPC believes that comparisons of results published with different scale factors are misleading and discourages such comparisons.

Performance Landscape

The table lists the leading TPC-H @3000GB results for non-clustered systems.

TPC-H @3000GB, Non-Clustered Systems
System                   Processor                   P/C/T – Memory         Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database          Available
SPARC Enterprise M9000   3.0 GHz SPARC64 VII+        64/256/256 – 1024 GB   386,478.3          $18.19            316,835.8      471,428.6           Oracle 11g R2     09/22/11
SPARC T4-4               3.0 GHz SPARC T4            4/32/256 – 1024 GB     205,792.0          $4.10             190,325.1      222,515.9           Oracle 11g R2     05/31/12
SPARC Enterprise M9000   2.88 GHz SPARC64 VII        32/128/256 – 512 GB    198,907.5          $15.27            182,350.7      216,967.7           Oracle 11g R2     12/09/10
IBM Power 780            4.1 GHz POWER7              8/32/128 – 1024 GB     192,001.1          $6.37             210,368.4      175,237.4           Sybase 15.4       11/30/11
HP ProLiant DL980 G7     2.27 GHz Intel Xeon X7560   8/64/128 – 512 GB      162,601.7          $2.68             185,297.7      142,685.6           SQL Server 2008   10/13/10

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

The following table lists data load times and refresh function times during the power run.

TPC-H @3000GB, Non-Clustered Systems
Database Load & Database Refresh
System                 Processor                   Data Loading (h:m:s)   T4 Advan   RF1 (sec)   T4 Advan   RF2 (sec)   T4 Advan
SPARC T4-4             3.0 GHz SPARC T4            04:08:29               1.0x       67.1        1.0x       39.5        1.0x
IBM Power 780          4.1 GHz POWER7              05:51:50               1.5x       147.3       2.2x       133.2       3.4x
HP ProLiant DL980 G7   2.27 GHz Intel Xeon X7560   08:35:17               2.1x       173.0       2.6x       126.3       3.2x

Data Loading = database load time
RF1 = power test first refresh transaction
RF2 = power test second refresh transaction
T4 Advan = the ratio of time to T4 time

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server
4 x SPARC T4 3.0 GHz processors (total of 32 cores, 128 threads)
1024 GB memory
8 x internal SAS (8 x 300 GB) disk drives

External Storage:

12 x Sun Storage 2540-M2 arrays, each with
12 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache

Software Configuration:

Oracle Solaris 11 11/11
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 205,792.0 QphH@3000GB
Price/performance: $4.10/QphH@3000GB
Available: 05/31/2012
Total 3 year Cost: $843,656
TPC-H Power: 190,325.1
TPC-H Throughput: 222,515.9
Database Load Time: 4:08:29

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.

Key Points and Best Practices

  • Twelve Sun Storage 2540-M2 arrays were used for the benchmark. Each Sun Storage 2540-M2 array contains 12 15K RPM drives and is connected to a single dual port 8Gb FC HBA using 2 ports. Each Sun Storage 2540-M2 array showed 1.5 GB/sec for sequential read operations and showed linear scaling, achieving 18 GB/sec with twelve Sun Storage 2540-M2 arrays. These were stand alone IO tests.

  • The peak IO rate measured from the Oracle database was 17 GB/sec.

  • Oracle Solaris 11 11/11 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric when comparing systems.

  • The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle Database parallel processes.

  • Six Sun Storage 2540-M2 arrays were mirrored to another six Sun Storage 2540-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays.

  • The TPC-H Refresh Function (RF) simulates the periodic refresh portion of a data warehouse by adding new sales data and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are key for this function, and the SPARC T4-4 server outperformed both the IBM POWER7 server and the HP ProLiant DL980 G7 server. (See the RF columns above.)

See Also

Disclosure Statement

TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB, available 10/13/10, 8 processors, 64 cores, 128 threads.

Monday Oct 03, 2011

SPARC T4-4 Beats IBM POWER7 and HP Itanium on TPC-H @1000GB Benchmark

Oracle's SPARC T4-4 server configured with SPARC-T4 processors, Oracle's Sun Storage F5100 Flash Array storage, Oracle Solaris, and Oracle Database 11g Release 2 achieved a TPC-H benchmark performance result of 201,487 QphH@1000GB with price/performance of $4.60/QphH@1000GB.

  • The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase.

  • The SPARC T4-4 server is 22% faster than the 8-socket IBM POWER7 server with the same number of cores. The SPARC T4-4 server has over twice the performance per socket compared to the IBM POWER7 server.

  • The SPARC T4-4 server achieves 33% better price/performance than the IBM POWER7 server.

  • The SPARC T4-4 server is up to 4 times faster than the IBM POWER7 server for the Refresh Function.

  • The SPARC T4-4 server is 44% faster than the HP Superdome 2 server. The SPARC T4-4 server has 5.7x the performance per socket of the HP Superdome 2 server.

  • The SPARC T4-4 server is 62% better on price/performance than the HP Itanium server.

  • The SPARC T4-4 server is up to 3.7 times faster than the HP Itanium server for the Refresh Function.

  • The SPARC T4-4 server delivers nearly the same performance as Oracle's SPARC Enterprise M8000 server, but with 52% better price/performance on the TPC-H @1000GB benchmark.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.14.2 specification which is the strictest level.

  • This TPC-H result demonstrates that the SPARC T4-4 server can deliver the performance required as DSS databases grow increasingly large. The server measured more than 16 GB/sec of IO throughput through Oracle Database 11g Release 2 software while maintaining a high CPU load.

Performance Landscape

The table below lists published non-cluster results from comparable enterprise class systems from Oracle, IBM and HP. Each system was configured with 512 GB of memory.

TPC-H @1000GB

System                   CPU Type                      Proc/Core/Thread   Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database     Available
SPARC Enterprise M8000   3 GHz SPARC64 VII+            16 / 64 / 128      209,533.6          $9.53             177,845.9      246,867.2           Oracle 11g   09/22/11
SPARC T4-4               3 GHz SPARC-T4                4 / 32 / 256       201,487.0          $4.60             181,760.6      223,354.2           Oracle 11g   10/30/11
IBM Power 780            4.14 GHz POWER7               8 / 32 / 128       164,747.2          $6.85             170,206.4      159,463.1           Sybase       03/31/11
HP Superdome 2           1.73 GHz Intel Itanium 9350   16 / 64 / 64       140,181.1          $12.15            139,181.0      141,188.3           Oracle 11g   10/20/10

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server
4 x SPARC-T4 3.0 GHz processors (total of 32 cores, 128 threads)
512 GB memory
8 x internal SAS (8 x 300 GB) disk drives

External Storage:

4 x Sun Storage F5100 Flash Array storage, each with
80 x 24 GB Flash Modules

Software Configuration:

Oracle Solaris 10 8/11
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 1000 GB (Scale Factor 1000)
TPC-H Composite: 201,487 QphH@1000GB
Price/performance: $4.60/QphH@1000GB
Available: 10/30/2011
Total 3 Year Cost: $925,525
TPC-H Power: 181,760.6
TPC-H Throughput: 223,354.2
Database Load Time: 1:22:39

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi-user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH.

Key Points and Best Practices

  • Four Sun Storage F5100 Flash Array devices were used for the benchmark. Each F5100 device contains 80 flash modules (FMODs). Twenty (20) FMODs from each F5100 device were connected to a single SAS 6 Gb HBA. A single F5100 device showed 4.16 GB/sec for sequential read and demonstrated linear scaling of 16.62 GB/sec with 4 x F5100 devices.

  • The IO rate from the Oracle database was over 16 GB/sec.

  • Oracle Solaris 10 8/11 required very little system tuning.

  • The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle parallel processes.

  • The Oracle database files for tables and indexes were managed by Oracle Automatic Storage Manager (ASM) with 4M stripe. Two F5100 devices were mirrored to another 2 F5100 devices under ASM. IO performance was high and balanced across all the FMODs.
  • The Oracle redo log files were mirrored across the F5100 devices using Oracle Solaris Volume Manager with 128K stripe.
  • Parallel degree on tables and indexes was set to 128. This setting worked the best for performance.
  • The TPC-H Refresh Function (RF) simulates the periodic refresh portion of a data warehouse by adding new sales data and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are key for this function, and the SPARC T4-4 server outperformed both the HP Superdome 2 and IBM POWER7 servers.

See Also

Disclosure Statement

TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 201,487 QphH@1000GB, $4.60/QphH@1000GB, avail 10/30/2011, 4 processors, 32 cores, 256 threads; SPARC Enterprise M8000 209,533.6 QphH@1000GB, $9.53/QphH@1000GB, avail 09/22/11, 16 processors, 64 cores, 128 threads; IBM Power 780 164,747.2 QphH@1000GB, $6.85/QphH@1000GB, avail 03/31/11, 8 processors, 32 cores, 128 threads; HP Integrity Superdome 2 140,181.1 QphH@1000GB, $12.15/QphH@1000GB, avail 10/20/10, 16 processors, 64 cores, 64 threads.

Tuesday Sep 27, 2011

SPARC T4-4 Server Sets World Record on PeopleSoft Payroll (N.A.) 9.1, Outperforms IBM Mainframe, HP Itanium

Oracle's SPARC T4-4 server achieved world record performance on the Unicode version of Oracle's PeopleSoft Enterprise Payroll (N.A) 9.1 extra-large volume model benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • The SPARC T4-4 server was able to process 1,460,544 payments/hour using PeopleSoft Payroll N.A. 9.1 (see the sketch after this list).

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 2.8x faster than IBM z10 EC 2097 Payroll 9.0 (UNICODE version) result of 87.4 minutes. The IBM mainframe is rated at 6,512 MIPS.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 3.1x faster than HP rx7640 Itanium2 non-UNICODE result of 96.17 minutes, on Payroll 9.0.

  • The average CPU utilization on the SPARC T4-4 server was only 30%, leaving significant room for business growth.

  • The SPARC T4-4 server processed payroll for 500,000 employees, 750,000 payments, in 30.84 minutes compared to the earlier world record result of 46.76 minutes on Oracle's SPARC Enterprise M5000 server.

  • The SPARC Enterprise M5000 server configured with eight 2.66 GHz SPARC64 VII+ processors has a result of 46.76 minutes on Payroll 9.1, which is 7% better than the result of 50.11 minutes on the SPARC Enterprise M5000 server configured with eight 2.53 GHz SPARC64 VII processors on Payroll 9.0. The difference in clock speed between the two processors is ~5%, which is close to the difference between the two results, showing that the impact of the Payroll 9.1 workload on the overall result is about the same as that of Payroll 9.0.
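
The headline payments-per-hour figure follows from the published workload size and elapsed time. A minimal Python sketch (the small difference from the published 1,460,544 figure comes from rounding of the elapsed time):

    # Payroll throughput from the published workload size and elapsed time.
    payments = 750_000      # payments processed for 500,000 employees
    elapsed_min = 30.84     # payroll processing result, in minutes

    payments_per_hour = payments * 60 / elapsed_min
    print(f"~{payments_per_hour:,.0f} payments/hour")   # ~1.46 million per hour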

Performance Landscape

PeopleSoft Payroll (N.A.) 9.1 – 500K Employees (7 Million SQL PayCalc, Unicode)

System                                   OS/Database          Payroll Processing Result (minutes)   Run 1 (minutes)   Num of Streams
SPARC T4-4, 4 x 3.0 GHz SPARC T4         Solaris/Oracle 11g   30.84                                 43.76             96
SPARC M5000, 8 x 2.66 GHz SPARC64 VII+   Solaris/Oracle 11g   46.76                                 66.28             32

PeopleSoft Payroll (N.A.) 9.0 – 500K Employees (3 Million SQL PayCalc, Non-Unicode)

System                                OS/Database          Payroll Processing Result (min)   Run 1 (min)   Run 2 (min)   Run 3 (min)   Num of Streams
Sun M5000, 8 x 2.53 GHz SPARC64 VII   Solaris/Oracle 11g   50.11                             73.88         534.20        1267.06       32
IBM z10 EC 2097, 9 x 4.4 GHz Gen1     z/OS / DB2           58.96                             80.5          250.68        462.6         8
IBM z10 EC 2097, 9 x 4.4 GHz Gen1     z/OS / DB2           87.4 **                           107.6         -             -             8
HP rx7640, 8 x 1.6 GHz Itanium2       HP-UX/Oracle 11g     96.17                             133.63        712.72        1665.01       32

** This result was run with Unicode. The IBM z10 EC 2097 UNICODE result of 87.4 minutes is 48% slower than IBM z10 EC 2097 non-UNICODE result of 58.96 minutes, both on Payroll 9.0, each configured with nine 4.4GHz Gen1 processors.

Payroll 9.1 Compared to Payroll 9.0

Please note that Payroll 9.1 is Unicode based, while Payroll 9.0 had both non-Unicode and Unicode versions of the workload. There are 7 million executions of an SQL statement for the PayCalc batch process in Payroll 9.1 and 3 million executions of the same SQL statement for the PayCalc batch process in Payroll 9.0. This is reflected in the elapsed time of that process (27.33 min for 9.1 and 23.78 min for 9.0). The elapsed times of all other batch processes are lower (better) on 9.1.

Configuration Summary

Hardware Configuration:

SPARC T4-4 server
4 x 3.0 GHz SPARC T4 processors
256 GB memory
Sun Storage F5100 Flash Array
80 x 24 GB FMODs

Software Configuration:

Oracle Solaris 10 8/11
PeopleSoft HRMS and Campus Solutions 9.10.303
PeopleSoft Enterprise (PeopleTools) 8.51.035
Oracle Database 11g Release 2 11.2.0.1 (64-bit)
Micro Focus COBOL Server Express 5.1 (64-bit)

Benchmark Description

The PeopleSoft 9.1 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates payroll data worksheets consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file that is used to transfer payroll funds directly into an employee's bank account.

Key Points and Best Practices

  • The SPARC T4-4 server with the Sun Storage F5100 Flash Array device had an average read throughput of up to 103 MB/sec and an average write throughput of up to 124 MB/sec while consuming 30% CPU on average.

  • The Sun Storage F5100 Flash Array device is a solid-state device that provides a read latency of only 0.5 msec. That is about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

See Also

  • Oracle PeopleSoft Benchmark White Papers
    oracle.com
  • PeopleSoft Enterprise Human Capital Management (Payroll)
    oracle.com

  • PeopleSoft Enterprise Payroll 9.1 Using Oracle for Solaris (Unicode) on an Oracle's SPARC T4-4 – White Paper
    oracle.com

  • SPARC T4-4 Server
    oracle.com
  • Oracle Solaris
    oracle.com
  • Oracle Database 11g Release 2 Enterprise Edition
    oracle.com
  • Sun Storage F5100 Flash Array
    oracle.com

Disclosure Statement

Oracle's PeopleSoft Payroll 9.1 benchmark, SPARC T4-4 30.84 min,
http://www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Friday Jun 03, 2011

SPARC Enterprise M8000 with Oracle 11g Beats IBM POWER7 on TPC-H @1000GB Benchmark

Oracle's SPARC Enterprise M8000 server configured with SPARC64 VII+ processors, Oracle's Sun Storage F5100 Flash Array storage, Oracle Solaris, and Oracle Database 11g Release 2 achieved a TPC-H performance result of 209,533 QphH@1000GB with price/performance of $9.53/QphH@1000GB.

Oracle's SPARC server surpasses the performance of the IBM POWER7 server on the 1 TB TPC-H decision support benchmark.

Oracle focuses on the performance of the complete hardware and software stack. Implementation details such as the number of cores or the number of threads obscure the important metric of delivered system performance. The SPARC Enterprise M8000 server delivers higher performance than the IBM Power 780 even though the SPARC64 VII+ processor core is 1.6x slower than the POWER7 processor core.

  • The SPARC Enterprise M8000 server is 27% faster than the IBM Power 780. IBM's reputed single-thread performance leadership does not provide benefit for throughput.

  • Oracle beats IBM Power with better performance. This shows that Oracle's focus on integrated system design provides more customer value than IBM's focus on per core performance.

  • The SPARC Enterprise M8000 server is up to 3.8 times faster than the IBM Power 780 for Refresh Function. Again, IBM's reputed single-thread performance leadership does not provide benefit for this important function.

  • The SPARC Enterprise M8000 server is 49% faster than the HP Superdome 2 (1.73 GHz Itanium 9350).

  • The SPARC Enterprise M8000 server has 22% better price/performance than the HP Superdome 2 (1.73 GHz Itanium 9350).

  • The SPARC Enterprise M8000 server is 2 times faster than the HP Superdome 2 (1.73 GHz Itanium 9350) for Refresh Function.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.14.0 specification which is the highest level.

  • One should focus on the performance of the complete hardware and software stack, since server implementation details such as the number of cores or the number of threads obscure the important metric of delivered system performance.

  • This TPC-H result demonstrates that the SPARC Enterprise M8000 server can handle the increasingly large databases required of DSS systems. The server delivered more than 16 GB/sec of IO throughput through Oracle Database 11g Release 2 software while maintaining a high CPU load.

Performance Landscape

The table below lists published results from comparable enterprise class systems from Oracle, HP and IBM. Each system was configured with 512 GB of memory.

TPC-H @1000GB

System                   CPU Type                      Proc/Core/Thread   Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database     Available
SPARC Enterprise M8000   3 GHz SPARC64 VII+            16 / 64 / 128      209,533.6          $9.53             177,845.9      246,867.2           Oracle 11g   09/22/11
IBM Power 780            4.14 GHz POWER7               8 / 32 / 128       164,747.2          $6.85             170,206.4      159,463.1           Sybase       03/31/11
HP Superdome 2           1.73 GHz Intel Itanium 9350   16 / 64 / 64       140,181.1          $12.15            139,181.0      141,188.3           Oracle 11g   10/20/10

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server:

SPARC Enterprise M8000 server
16 x SPARC64 VII+ 3.0 GHz processors (total of 64 cores, 128 threads)
512 GB memory
12 x internal SAS (12 x 300 GB) disk drives

External Storage:

4 x Sun Storage F5100 Flash Array device, each with
80 x 24 GB Flash Modules

Software:

Oracle Solaris 10 8/11
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 1000 GB (Scale Factor 1000)
TPC-H Composite: 209,533.6 QphH@1000GB
Price/performance: $9.53/QphH@1000GB
Available: 09/22/2011
Total 3 year Cost: $1,995,715
TPC-H Power: 177,845.9
TPC-H Throughput: 246,867.2
Database Load Time: 1:27:12

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi-user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH.

Key Points and Best Practices

  • Four Sun Storage F5100 Flash Array devices were used for the benchmark. Each F5100 device contains 80 Flash Modules (FMODs). Twenty (20) FMODs from each F5100 device were connected to a single SAS 6 Gb HBA. A single F5100 device showed 4.16 GB/sec for sequential read and demonstrated linear scaling of 16.62 GB/sec with 4 x F5100 devices.
  • The IO rate from the Oracle database was over 16 GB/sec.
  • Oracle Solaris 10 8/11 required very little system tuning.
  • The SPARC Enterprise M8000 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle parallel processes.
  • The Oracle database files were mirrored under Solaris Volume Manager (SVM); two F5100 arrays were mirrored to another two F5100 arrays. IO performance was good and balanced across all the FMODs. Because of the SVM mirror, one of the durability tests (the disk/controller failure test) was transparent to the Oracle database.

See Also

Disclosure Statement

SPARC Enterprise M8000 209,533.6 QphH@1000GB, $9.53/QphH@1000GB, avail 09/22/11; IBM Power 780 164,747.2 QphH@1000GB, $6.85/QphH@1000GB, avail 03/31/11; HP Integrity Superdome 2 140,181.1 QphH@1000GB, $12.15/QphH@1000GB, avail 10/20/10. TPC-H, QphH, $/QphH are trademarks of the Transaction Processing Performance Council (TPC). More info www.tpc.org.

Friday Mar 25, 2011

SPARC Enterprise M9000 with Oracle Database 11g Delivers World Record Single Server TPC-H @3000GB Result

Oracle's SPARC Enterprise M9000 server delivers single-system TPC-H @3000GB world record performance. The SPARC Enterprise M9000 server along with Oracle's Sun Storage 6180 arrays and running Oracle Database 11g Release 2 on the Oracle Solaris operating system proves the power of Oracle's integrated solution.

  • The SPARC Enterprise M9000 server configured with SPARC64 VII+ processors, Sun Storage 6180 arrays and running Oracle Solaris 10 combined with Oracle Database 11g Release 2 achieved World Record TPC-H performance of 386,478.3 QphH@3000GB for non-clustered systems.

  • The SPARC Enterprise M9000 server running the Oracle Database 11g Release 2 software is 2.5 times faster than the IBM p595 (POWER6) server which ran with Sybase IQ v.15.1 database software.

  • The SPARC Enterprise M9000 server is 3.4 times faster than the IBM p595 server for data loading.

  • The SPARC Enterprise M9000 server is 3.5 times faster than the IBM p595 server for Refresh Function.

  • The SPARC Enterprise M9000 server configured with Sun Storage 6180 arrays shows linear scaling up to the maximum delivered IO performance of 48.3 GB/sec as measured by vdbench.

  • The SPARC Enterprise M9000 server running the Oracle Database 11g Release 2 software is 2.4 times faster than the HP ProLiant DL980 server which used Microsoft SQL Server 2008 R2 Enterprise Edition software.

  • The SPARC Enterprise M9000 server is 2.9 times faster than the HP ProLiant DL980 server for data loading.

  • The SPARC Enterprise M9000 server is 4 times faster than the HP ProLiant DL980 server for Refresh Function.

  • The SPARC Enterprise M9000 server result using 64 SPARC64 VII+ processors delivered a 1.94x improvement over the previous Sun SPARC Enterprise M9000 server result, which used 32 SPARC64 VII processors.

  • Oracle's TPC-H result shows that the SPARC Enterprise M9000 server can handle the increasingly large databases required of DSS systems. The IO rate as measured by the Oracle database is over 40 GB/sec.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.14.0 specification which is the highest level.

Performance Landscape

TPC-H @3000GB, Non-Clustered Systems

System                       CPU Type                      Memory    Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database     Available
SPARC Enterprise M9000       3 GHz SPARC64 VII+            1024 GB   386,478.3          $18.19            316,835.8      471,428.6           Oracle 11g   09/22/11
Sun SPARC Enterprise M9000   2.88 GHz SPARC64 VII          512 GB    198,907.5          $15.27            182,350.7      216,967.7           Oracle 11g   12/09/10
HP ProLiant DL980 G7         2.27 GHz Intel Xeon X7560     512 GB    162,601.7          $2.68             185,297.7      142,685.6           SQL Server   10/13/10
IBM Power 595                5.0 GHz POWER6                512 GB    156,537.3          $20.60            142,790.7      171,607.4           Sybase       11/24/09

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server:

SPARC Enterprise M9000
64 x SPARC64 VII+ 3.0 GHz processors
1024 GB memory
4 x internal SAS (4 x 146 GB)

External Storage:

32 x Sun Storage 6180 arrays (each with 16 x 600 GB)

Software:

Oracle Solaris 10 9/10
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 386,478.3 QphH@3000GB
Price/performance: $18.19/QphH@3000GB
Available: 09/22/2011
Total 3 year Cost: $7,030,009
TPC-H Power: 316,835.8
TPC-H Throughput: 471,428.6
Database Load Time: 2:59:01

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of QphH to total HW/SW cost plus 3 years maintenance.

Key Points and Best Practices

  • The Sun Storage 6180 arrays showed linear scalability, reaching 48.3 GB/sec of sequential read bandwidth with thirty-two Sun Storage 6180 arrays; scaling could continue if more arrays were available.
  • Oracle Solaris 10 9/10 required very little system tuning.
  • The optimal Sun Storage 6180 array configuration for the benchmark was to set up 1 disk per volume instead of multiple disks per volume and let Oracle Automatic Storage Management (ASM) do the mirroring. Presenting as many volumes as possible to the Oracle database gave the highest scan rate.

  • The storage was managed by ASM with 4 MB stripe size. 1 MB is the default stripe size but 4 MB works better for large databases.

  • All the Oracle database files, except the TEMP tablespace, were mirrored under ASM: 16 Sun Storage 6180 arrays (256 disks) were mirrored to another 16 Sun Storage 6180 arrays using ASM (see the capacity sketch after this list). IO performance was good and balanced across all the disks. With the ASM mirror the benchmark passed the ACID (Atomicity, Consistency, Isolation and Durability) test.

  • Oracle database tables were 256-way partitioned. The parallel degree for each table was set to 256 to match the number of available cores. This setting worked the best for performance.

  • Oracle Database 11g Release 2 feature Automatic Parallel Degree Policy was set to AUTO for the benchmark. This enabled automatic degree of parallelism, statement queuing and in-memory parallel execution.
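
A rough capacity sketch for the mirrored storage layout described above, using the configuration listed earlier (32 arrays of 16 x 600 GB disks); it ignores the unmirrored TEMP tablespace and any formatting overhead:

    # Rough usable-capacity arithmetic for the ASM-mirrored layout (assumptions noted above).
    arrays, disks_per_array, disk_gb = 32, 16, 600
    raw_gb = arrays * disks_per_array * disk_gb      # 307,200 GB raw
    usable_gb = raw_gb // 2                          # 2-way ASM mirror: half is usable
    print(f"raw: {raw_gb:,} GB, usable after mirroring: ~{usable_gb:,} GB")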

See Also

Disclosure Statement

SPARC Enterprise M9000 386,478.3 QphH@3000GB, $18.19/QphH@3000GB, avail 09/22/11; IBM Power 595 156,537.3 QphH@3000GB, $20.60/QphH@3000GB, avail 11/24/09; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB, avail 10/13/10. TPC-H, QphH, $/QphH are trademarks of the Transaction Processing Performance Council (TPC). More info www.tpc.org.

Tuesday Oct 26, 2010

3D VTI Reverse Time Migration Scalability On Sun Fire X2270-M2 Cluster with Sun Storage 7210

This Oil & Gas benchmark shows the Sun Storage 7210 system delivers almost 2 GB/sec bandwidth and realizes near-linear scaling performance on a cluster of 16 Sun Fire X2270 M2 servers.

Oracle's Sun Storage 7210 system attached via QDR InfiniBand to a cluster of sixteen of Oracle's Sun Fire X2270 M2 servers was used to demonstrate the performance of a Reverse Time Migration application, an important application in the Oil & Gas industry. The total application throughput and computational kernel scaling are presented for two production sized grids of 800 samples.

  • Both the Reverse Time Migration I/O and the combined computation show near-linear scaling from 8 to 16 nodes on the Sun Storage 7210 system connected via QDR InfiniBand to a Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 2.0x improvement
      2486 x 1151 x 1231: 1.7x improvement
  • The computational kernel of the Reverse Time Migration has linear to super-linear scaling from 8 to 16 nodes in Oracle's Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231 : 2.2x improvement
      2486 x 1151 x 1231 : 2.0x improvement
  • Intel Hyper-Threading provides additional performance benefits to both the Reverse Time Migration I/O and computation when going from 12 to 24 OpenMP threads on the Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 8% - computational kernel; 2% - total application throughput
      2486 x 1151 x 1231: 12% - computational kernel; 6% - total application throughput
  • The Sun Storage 7210 system delivers the Velocity, Epsilon, and Delta data to the Reverse Time Migration at a steady rate even when timing includes memory initialization and data object creation:

      1243 x 1151 x 1231: 1.4 to 1.6 GBytes/sec
      2486 x 1151 x 1231: 1.2 to 1.3 GBytes/sec

    One can see that when doubling the size of the problem, the additional complexity of overlapping I/O and multiple node file contention only produces a small reduction in read performance.

Performance Landscape

Application Scaling

Performance and scaling results of the total application, including I/O, for the reverse time migration demonstration application are presented. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Application Scaling Across Multiple Nodes
Nodes   Grid Size 1243 x 1151 x 1231                           Grid Size 2486 x 1151 x 1231
        Total (s)   Kernel (s)   Total Spdup   Kernel Spdup    Total (s)   Kernel (s)   Total Spdup   Kernel Spdup
16      504         259          2.0           2.2\*           1024        551          1.7           2.0
14      565         279          1.8           2.0             1191        677          1.5           1.6
12      662         343          1.6           1.6             1426        817          1.2           1.4
10      784         394          1.3           1.4             1501        856          1.2           1.3
8       1024        560          1.0           1.0             1745        1108         1.0           1.0

\* Super-linear scaling due to the compute kernel fitting better into available cache
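As a quick cross-check, the speedup columns in the table above follow directly from the reported times (speedup = 8-node time divided by N-node time). A minimal Python sketch of that arithmetic, using only the 8- and 16-node times from the table:

    # Recompute total and kernel speedups relative to the 8-node baseline,
    # using the times (in seconds) reported in the table above.
    times = {
        # grid size: {nodes: (total_sec, kernel_sec)}
        "1243 x 1151 x 1231": {8: (1024, 560), 16: (504, 259)},
        "2486 x 1151 x 1231": {8: (1745, 1108), 16: (1024, 551)},
    }

    for grid, runs in times.items():
        base_total, base_kernel = runs[8]
        for nodes, (total, kernel) in sorted(runs.items()):
            print(f"{grid}, {nodes:2d} nodes: "
                  f"total {base_total / total:.1f}x, kernel {base_kernel / kernel:.1f}x")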

Application Scaling – Hyper-Threading Study

The effects of hyperthreading are presented when running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server.

Hyper-Threading Comparison – 12 versus 24 OpenMP Threads

Nodes   Threads     Grid Size 1243 x 1151 x 1231                   Grid Size 2486 x 1151 x 1231
        per Node    Total (s)   Kernel (s)   Total HT   Kernel HT   Total (s)   Kernel (s)   Total HT   Kernel HT
16      24          504         259          1.02       1.08        1024        551          1.06       1.12
16      12          515         279          1.00       1.00        1088        616          1.00       1.00

Read Performance

Read performance is presented for the velocity, epsilon and delta files running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Velocity, Epsilon, and Delta File Read and Memory Initialization Performance

                            Grid Size 1243 x 1151 x 1231               Grid Size 2486 x 1151 x 1231
Nodes   Overlap MB Read     Time (s)  vs 8-node  GB Read   Rate GB/s   Time (s)  vs 8-node  GB Read   Rate GB/s
16      2040                16.7      1.1        23.2      1.4         36.8      1.1        44.3      1.2
8       951                 14.8      1.0        22.1      1.6         33.0      1.0        43.2      1.3

Configuration Summary

Hardware Configuration:

16 x Sun Fire X2270 M2 servers, each with
2 x 2.93 GHz Intel Xeon X5670 processors
48 GB memory (12 x 4 GB at 1333 MHz)

Sun Storage 7210 system connected via QDR InfiniBand
2 x 18 GB SATA SSD (logzilla)
40 x 1 TB 7200 RPM SATA disk

Software Configuration:

SUSE Linux Enterprise Server SLES 10 SP 2
Oracle Message Passing Toolkit 8.2.1 (for MPI)
Sun Studio 12 Update 1 C++, Fortran, OpenMP

Benchmark Description

This Reverse Time Migration (RTM) demonstration application measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this version, each node reads in only the trace, velocity, and conditioning data to be processed by that node plus a four element inline 3-D array pad (spatial order of eight) shared with its neighbors to the left and right during the initialization phase. It represents a full RTM application including the data input, computation, communication, and final output image to be used by the next work flow step involving 3D volumetric seismic interpretation.

Key Points and Best Practices

This demonstration application represents a full Reverse Time Migration solution. Many references to the RTM application tend to focus on the compute kernel and ignore the complexity that the input, communication, and output bring to the task.

I/O Characterization without Optimal Checkpointing

Velocity, Epsilon, and Delta Files - Grid Reading

The additional amount of overlapping reads to share velocity, epsilon, and delta edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (z_dimension) x (4 bytes) x (3 files)

For this particular benchmark study, the additional 3-D pad overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 1231 x 4 x 3 = 2.04 GB extra
    8 nodes: 7 x 8 x 1151 x 1231 x 4 x 3 = 0.95 GB extra

For the first of the two test cases, the total size of the three files used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 1231 x 4 bytes = 7.05 GB per file x 3 files = 21.13 GB

With the additional 3-D pad, the total amount of data read is:

    16 nodes: 2.04 GB + 21.13 GB = 23.2 GB
    8 nodes: 0.95 GB + 21.13 GB = 22.1 GB

For the second of the two test cases, the total size of the three files used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 1231 x 4 bytes = 14.09 GB per file x 3 files = 42.27 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: 2.04 GB + 42.27 GB = 44.3 GB
    8 nodes: 0.95 GB + 42.27 GB = 43.2 GB

Note that the amount of overlapping data read increases, not only by the number of nodes, but as the y dimension and/or the z dimension increases.
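A minimal Python sketch of this pad-overlap arithmetic is shown below; the formula, the spatial order of eight, and the grid dimensions are taken from the text above, and the helper name is purely illustrative.

    # Extra overlapping bytes read for the velocity, epsilon and delta grid files:
    # each node shares a pad of (order_in_space) inline slices with its neighbors.
    def grid_pad_overlap_bytes(num_nodes, order_in_space, ny, nz, num_files=3):
        return (num_nodes - 1) * order_in_space * ny * nz * 4 * num_files

    ny, nz = 1151, 1231
    for nodes in (16, 8):
        extra = grid_pad_overlap_bytes(nodes, 8, ny, nz)
        print(f"{nodes} nodes: {extra / 1e9:.2f} GB extra")   # ~2.04 GB and ~0.95 GB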

Trace Reading

The additional amount of overlapping reads to share trace edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (4 bytes) x (number_of_time_slices)

For this particular benchmark study, the additional overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 4 x 800 = 442MB extra
    8 nodes: 7 x 8 x 1151 x 4 x 800 = 206MB extra

For the first case the size of the trace data file used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 4 bytes x 800 = 4.578 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 4.578 GB = 5.0 GB
    8 nodes: .206 GB + 4.578 GB = 4.8 GB

For the second case the size of the trace data file used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 4 bytes x 800 = 9.156 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 9.156 GB = 9.6 GB
    8 nodes: .206 GB + 9.156 GB = 9.4 GB

As the number of nodes is increased, the overlap causes more disk lock contention.
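The trace-read overlap follows the same pattern, only without the z dimension and with one value per time slice. A small sketch, again restating only the formula and numbers given above (the helper name is illustrative):

    # Extra overlapping trace bytes read to share edge data with neighbors.
    def trace_pad_overlap_bytes(num_nodes, order_in_space, ny, time_slices):
        return (num_nodes - 1) * order_in_space * ny * 4 * time_slices

    for nodes in (16, 8):
        extra = trace_pad_overlap_bytes(nodes, 8, 1151, 800)
        print(f"{nodes} nodes: {extra / 1e6:.0f} MB extra")   # ~442 MB and ~206 MB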

Writing Final Output Image

1243x1151x1231 - 7.1 GB per file:

    16 nodes: 78 x 1151 x 1231 x 4 = 442MB/node (7.1 GB total)
    8 nodes: 156 x 1151 x 1231 x 4 = 884MB/node (7.1 GB total)

2486x1151x1231 - 14.1 GB per file:

    16 nodes: 156 x 1151 x 1231 x 4 = 930 MB/node (14.1 GB total)
    8 nodes: 311 x 1151 x 1231 x 4 = 1808 MB/node (14.1 GB total)

Resource Allocation

It is best to allocate one node as the Oracle Grid Engine resource scheduler and MPI master host. This is especially true when running with 24 OpenMP threads in hyperthreading mode to avoid oversubscribing a node that is cooperating in delivering the solution.

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/20/2010.

Monday Oct 11, 2010

Sun SPARC Enterprise M9000 Server Delivers World Record Non-Clustered TPC-H @3000GB Performance

Oracle's Sun SPARC Enterprise M9000 server delivered a single-system TPC-H 3000GB world record performance. The Sun SPARC Enterprise M9000 server, running Oracle Database 11g Release 2 on the Oracle Solaris operating system proves the power of Oracle's integrated solution.

  • Oracle beats IBM Power with better performance and price/performance (3 Year TCO). This shows that Oracle's focus on integrated system design provides more customer value than IBM's focus on "per core performance"!

  • The Sun SPARC Enterprise M9000 server is 27% faster than the IBM Power 595.

  • The Sun SPARC Enterprise M9000 server is 22% faster than the HP ProLiant DL980 G7.

  • The Sun SPARC Enterprise M9000 server's price/performance is 26% lower (better) than that of the IBM Power 595.

  • The Sun SPARC Enterprise M9000 server is 2.7 times faster than the IBM Power 595 for data loading.

  • The Sun SPARC Enterprise M9000 server is 2.3 times faster than the HP ProLiant DL980 for data loading.

  • The Sun SPARC Enterprise M9000 server is 2.6 times faster than the IBM p595 for Refresh Function.

  • The Sun SPARC Enterprise M9000 server is 3 times faster than the HP ProLiant DL980 for Refresh Function.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.12.0 specification, which is the highest level. IBM is the only other vendor to secure the storage to this level.

  • One should focus on the performance of the complete hardware and software stack since server implementation details such as the number of cores or the number of threads will obscure the important metrics of delivered system performance and system price/performance.

  • The Sun SPARC Enterprise M9000 server configured with SPARC64 VII processors, Sun Storage 6180 arrays, and running the Oracle Solaris 10 operating system combined with Oracle Database 11g Release 2 achieved World Record TPC-H performance of 198,907.5 QphH@3000GB for non-clustered systems.

  • The Sun SPARC Enterprise M9000 server is over three times faster than the HP Itanium2 Superdome.

  • The Sun Storage 6180 array configuration (a total of 16 6180 arrays) in this benchmark delivered IO performance of over 21 GB/sec Sequential Read performance as measured by the vdbench tool.

  • This TPC-H result demonstrates that the Sun SPARC Enterprise M9000 server can handle the increasingly large databases required of DSS systems. The server delivered more than 18 GB/sec of real IO throughput as measured by the Oracle Database 11g Release 2 software.

  • Both Oracle and IBM had the same level of hardware discounting as allowed by TPC rules to provide an effective comparison of price/performance.

  • IBM has not shown any delivered I/O performance results for the high-end IBM POWER7 systems. In addition, they have not delivered any commercial benchmarks (TPC-C, TPC-H, etc.) which have heavy I/O demands.

Performance Landscape

TPC-H @3000GB, Non-Clustered Systems

System                      CPU Type                   Memory    Composite   $/perf     Power       Throughput   Database     Available
                                                                 (QphH)      ($/QphH)   (QppH)      (QthH)
Sun SPARC Enterprise M9000  2.88 GHz SPARC64 VII       512 GB    198,907.5   $15.27     182,350.7   216,967.7    Oracle       12/09/10
HP ProLiant DL980 G7        2.27 GHz Intel Xeon X7560  512 GB    162,601.7   $2.68      185,297.7   142,601.7    SQL Server   10/13/10
IBM Power 595               5.0 GHz POWER6             512 GB    156,537.3   $20.60     142,790.7   171,607.4    Sybase       11/24/09
Unisys ES7000 7600R         2.6 GHz Intel Xeon         1024 GB   102,778.2   $21.05     120,254.8   87,841.4     SQL Server   05/06/10
HP Integrity Superdome      1.6 GHz Intel Itanium      256 GB    60,359.3    $32.60     80,838.3    45,068.3     SQL Server   05/21/07

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server:

Sun SPARC Enterprise M9000
32 x SPARC64 VII 2.88 GHz processors
512 GB memory
4 x internal SAS (4 x 300 GB)

External Storage:

16 x Sun Storage 6180 arrays (16x 16 x 300 GB)

Software:

Operating System: Oracle Solaris 10 10/09
Database: Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 198,907.5 QphH@3000GB
Price/performance: $15.27/QphH@3000GB
Available: 12/09/2010
Total 3 year Cost: $3,037,900
TPC-H Power: 182,350.7
TPC-H Throughput: 216,967.7
Database Load Time: 3:40:11

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of QphH to total HW/SW cost plus 3 years maintenance.
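For readers relating the audited numbers to each other: the composite metric is the geometric mean of the power and throughput metrics, and price/performance is the total 3-year cost divided by the composite. The short Python sketch below reproduces the figures quoted in this post; it is only an arithmetic illustration, not part of the benchmark kit.

    from math import sqrt

    power = 182_350.7           # TPC-H Power (QppH@3000GB), from the audited results
    throughput = 216_967.7      # TPC-H Throughput (QthH@3000GB)
    total_3yr_cost = 3_037_900  # total 3-year cost in USD

    composite = sqrt(power * throughput)      # QphH@3000GB, the geometric mean
    price_perf = total_3yr_cost / composite   # $/QphH@3000GB

    print(f"QphH@3000GB   ~ {composite:,.1f}")   # ~198,907
    print(f"$/QphH@3000GB ~ {price_perf:.2f}")   # ~15.27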

Key Points and Best Practices

  • The Sun Storage 6180 array showed good scalability and these sixteen 6180 arrays showed over 21 GB/sec Sequential Read performance as measured by the vdbench tool.
  • Oracle Solaris 10 10/09 required little system tuning.
  • The optimal 6180 configuration for the benchmark was to set up one disk per volume instead of multiple disks per volume and let Oracle Solaris Volume Manager (SVM) handle mirroring. Presenting as many volumes as possible to the Oracle database gave the highest scan rate.

  • The storage was managed by SVM with a 1 MB stripe size to match the Oracle database I/O size. The default 16 KB stripe size is too small for this DSS benchmark.

  • All the Oracle files, except the TEMP tablespace, were mirrored under SVM. Eight 6180 arrays (128 disks) were mirrored to another eight 6180 arrays using a 128-way stripe. I/O performance was good and balanced across all the disks with a round robin order, and read performance was the same with or without the mirror. With the SVM mirror, the benchmark passed the ACID (Atomicity, Consistency, Isolation and Durability) test.

  • Oracle tables were 128-way partitioned and parallel degree for each table was set to 128 because the system had 128 cores. This setting worked the best for performance.

  • CPU usage during the Power run was not especially high. This is because the parallel degree was set to 128 for the tables and indexes, so most of the queries utilized 128 vCPUs while the system had 256 vCPUs available.

See Also

Disclosure Statement

Sun SPARC Enterprise M9000 198,907.5 QphH@3000GB, $15.27/QphH@3000GB, avail 12/09/10; IBM Power 595 156,537.3 QphH@3000GB, $20.60/QphH@3000GB, avail 11/24/09; HP Integrity Superdome 60,359.3 QphH@3000GB, $32.60/QphH@3000GB, avail 06/18/07. TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.

Monday Sep 20, 2010

Schlumberger's ECLIPSE 300 Performance Throughput On Sun Fire X2270 Cluster with Sun Storage 7410

Oracle's Sun Storage 7410 system, attached via QDR InfiniBand to a cluster of eight of Oracle's Sun Fire X2270 servers, was used to evaluate multiple job throughput of Schlumberger's Linux-64 ECLIPSE 300 compositional reservoir simulator processing their standard 2 Million Cell benchmark model with 8 rank parallelism (MM8 job).

  • The Sun Storage 7410 system showed little difference in performance (2%) compared to running the MM8 job with dedicated local disk.

  • When running 8 concurrent jobs on 8 different nodes, all against the Sun Storage 7410 system, the performance saw little degradation (5%) compared to a single MM8 job running on dedicated local disk.

Experiments were run changing how the cluster was utilized in scheduling jobs. Rather than running with the default compact mode, tests were run distributing the single job among the various nodes. Performance improvements were measured when changing from the default compact scheduling scheme (1 job to 1 node) to a distributed scheduling scheme (1 job to multiple nodes).

  • When running at 75% of the cluster capacity, distributed scheduling outperformed the compact scheduling by up to 34%. Even when running at 100% of the cluster capacity, the distributed scheduling is still slightly faster than compact scheduling.

  • When combining workloads, using the distributed scheduling allowed two MM8 jobs to finish 19% faster than the reference time and a concurrent PSTM workload to finish 2% faster.

The Oracle Solaris Studio Performance Analyzer and Sun Storage 7410 system analytics were used to identify a 3D Prestack Kirchhoff Time Migration (PSTM) as a potential candidate for consolidating with ECLIPSE. Both scheduling schemes are compared while running various job mixes of these two applications using the Sun Storage 7410 system for I/O.

These experiments showed a potential opportunity for consolidating applications using Oracle Grid Engine resource scheduling and Oracle Virtual Machine templates.

Performance Landscape

Results are presented below on a variety of experiments run using the 2009.2 ECLIPSE 300 2 Million Cell Performance Benchmark (MM8). The compute nodes are a cluster of Sun Fire X2270 servers connected with QDR InfiniBand. First, some definitions used in the tables below:

Local HDD: Each job runs on a single node to its dedicated direct attached storage.
NFSoIB: One node hosts its local disk for NFS mounting to other nodes over InfiniBand.
IB 7410: Sun Storage 7410 system over QDR InfiniBand.
Compact Scheduling: All 8 MM8 MPI processes run on a single node.
Distributed Scheduling: Allocate the 8 MM8 MPI processes across all available nodes.
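To make the two scheduling schemes concrete, the sketch below builds the rank-to-node placement for one 8-process MM8 job on a hypothetical list of host names; it is an illustration of the definitions above, not the actual Oracle Grid Engine configuration used in the tests.

    # Hypothetical host names for an 8-node cluster.
    hosts = [f"node{i:02d}" for i in range(1, 9)]

    def compact_placement(ranks=8):
        # Compact scheduling: all MPI ranks of one MM8 job land on a single node.
        return [hosts[0]] * ranks

    def distributed_placement(ranks=8):
        # Distributed scheduling: one MPI rank per node, spread across the cluster.
        return [hosts[r % len(hosts)] for r in range(ranks)]

    print("compact:    ", compact_placement())
    print("distributed:", distributed_placement())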

First Test

The first test compares the performance of a single MM8 job on a single node using local storage with that of a number of jobs running across the cluster, showing the effect of different storage solutions.

Compact Scheduling
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster Load Number of MM8 Jobs Local HDD Relative Throughput NFSoIB Relative Throughput IB 7410 Relative Throughput
13% 1 1.00 1.00\* 0.98
25% 2 0.98 0.97 0.98
50% 4 0.98 0.96 0.97
75% 6 0.98 0.95 0.95
100% 8 0.98 0.95 0.95

\* Performance measured on node hosting its local disk to other nodes in the cluster.

Second Test

This next test uses the Sun Storage 7410 system and compares the performance of running the MM8 job on one node using compact scheduling to running multiple jobs with compact scheduling and to running multiple jobs with distributed scheduling. The tests are run on an 8-node cluster, so each distributed job has only 1 MPI process per node.

Comparing Compact and Distributed Scheduling
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster Load   Number of MM8 Jobs   Compact Scheduling Relative Throughput   Distributed Scheduling\* Relative Throughput
13% 1 1.00 1.34
25% 2 1.00 1.32
50% 4 0.99 1.25
75% 6 0.97 1.10
100% 8 0.97 0.98

\* Each distributed job has 1 MPI process per node.

Third Test

This next test uses the Sun Storage 7410 system and compares the performance of running the MM8 job on 1 node using the compact scheduling to running multiple jobs with compact scheduling and to running multiple jobs with the distributed schedule. This test only uses 4 nodes, so each distributed job has two MPI processes per node.

Comparing Compact and Distributed Scheduling on 4 Nodes
Multiple Job Throughput Results Relative to Single Job
2009.2 ECLIPSE 300 MM8 2 Million Cell Performance Benchmark

Cluster Load   Number of MM8 Jobs   Compact Scheduling Relative Throughput   Distributed Scheduling\* Relative Throughput
25% 1 1.00 1.39
50% 2 1.00 1.28
100% 4 1.00 1.00

\* Each distributed job has two MPI processes per node.

Fourth Test

The last test involves running two different applications on the 4 node cluster. It compares the performance of running the cluster fully loaded and changing how the applications are run, either compact or distributed. The comparisons are made against the individual application running the compact strategy (as few nodes as possible). It shows that appropriately mixing jobs can give better job performance than running just one kind of application on a single cluster.

Multiple Job, Multiple Application Throughput Results
Comparing Scheduling Strategies
2009.2 ECLIPSE 300 MM8 2 Million Cell and 3D Kirchoff Time Migration (PSTM)

PSTM   MM8    ECLIPSE Compact       ECLIPSE Distributed    PSTM Distributed      PSTM Compact          Cluster
Jobs   Jobs   (1 node x 8           (4 nodes x 2           (4 nodes x 4          (2 nodes x 8          Load
              processes per job)    processes per job)     processes per job)    processes per job)
0      1      1.00                  1.40                   -                     -                     25%
0      2      1.00                  1.27                   -                     -                     50%
0      4      0.99                  0.98                   -                     -                     100%
1      2      -                     1.19                   1.02                  -                     100%
2      0      -                     -                      1.07                  0.96                  100%
1      0      -                     -                      1.08                  1.00                  50%

Results and Configuration Summary

Hardware Configuration:

8 x Sun Fire X2270 servers, each with
2 x 2.93 GHz Intel Xeon X5570 processors
24 GB memory (6 x 4 GB memory at 1333 MHz)
1 x 500 GB SATA
Sun Storage 7410 system, 24 TB total, QDR InfiniBand
4 x 2.3 GHz AMD Opteron 8356 processors
128 GB memory
2 Internal 233GB SAS drives (466 GB total)
2 Internal 93 GB read optimized SSD (186 GB total)
1 Sun Storage J4400 with 22 1 TB SATA drives and 2 18 GB write optimized SSD
20 TB RAID-Z2 (double parity) data and 2-way striped write optimized SSD or
11 TB mirrored data and mirrored write optimized SSD
QDR InfiniBand Switch

Software Configuration:

SUSE Linux Enterprise Server 10 SP 2
Scali MPI Connect 5.6.6
GNU C 4.1.2 compiler
2009.2 ECLIPSE 300
ECLIPSE license daemon flexlm v11.3.0.0
3D Kirchoff Time Migration

Benchmark Description

The benchmark is a home-grown study in resource usage options when running the Schlumberger ECLIPSE 300 Compositional reservoir simulator with 8 rank parallelism (MM8) to process Schlumberger's standard 2 Million Cell benchmark model. Schlumberger pre-built executables were used to process a 260x327x73 (2 Million Cell) sub-grid with 6,206,460 total grid cells and model 7 different compositional components within a reservoir. No source code modifications or executable rebuilds were conducted.

The ECLIPSE 300 MM8 job uses 8 MPI processes. It can run within a single node (compact) or across multiple nodes of a cluster (distributed). By using the MM8 job, it is possible to compare the performance between running each job on a separate node using local disk to using a shared network attached storage solution. The benchmark tests study the effect of increasing the number of MM8 jobs in a throughput model.

The first test compares the performance of running 1, 2, 4, 6 and 8 jobs on a cluster of 8 nodes using local disk, NFSoIB disk, and the Sun Storage 7410 system connected via InfiniBand. Results are compared against the time it takes to run 1 job with local disk. This test shows what performance impact there is when loading down a cluster.

The second test compares different methods of scheduling jobs on a cluster. The compact method involves putting all 8 MPI processes for a job on the same node. The distributed method involves using 1 MPI process per node. The results compare the performance against 1 job on one node.

The third test is similar to the second test, but uses only 4 nodes in the cluster, so when running distributed, there are 2 MPI processes per node.

The fourth test compares the compact and distributed scheduling methods on 4 nodes while running two MM8 jobs and one 16-way parallel 3D Prestack Kirchhoff Time Migration (PSTM).

Key Points and Best Practices

  • ECLIPSE is very sensitive to memory bandwidth and needs to be run on 1333 MHz or greater memory speeds. In order to maintain 1333 MHz memory, the maximum memory configuration for the processors used in this benchmark is 24 GB. Bios upgrades now allow 1333 MHz memory for up to 48 GB of memory. Additional nodes can be used to handle data sets that require more memory than available per node. Allocating at least 20% of memory per node for I/O caching helps application performance.

  • If allocating an 8-way parallel job (MM8) to a single node, it is best to use an ECLIPSE license for that particular node to avoid any additional network overhead of sharing a global license with all the nodes in a cluster.

  • Understanding the ECLIPSE MM8 I/O access patterns is essential to optimizing a shared storage solution. The analytics available on the Oracle Unified Storage 7410 provide valuable I/O characterization information even without source code access. A single MM8 job run shows an initial read and write load related to reading the input grid, parsing Petrel ascii input parameter files and creating an initial solution grid and runtime specifications. This is followed by a very long running simulation that writes data, restart files, and generates reports to the 7410. Due to the nature of the small block I/O, the mirrored configuration for the 7410 outperformed the RAID-Z2 configuration.

    A single MM8 job reads, processes, and writes approximately 240 MB of grid and property data in the first 36 seconds of execution. The actual read and write of the grid data, that is intermixed with this first stage of processing, is done at a rate of 240 MB/sec to the 7410 for each of the two operations.

    Then it calculates and reports the well connections, writing an average of roughly 260 KB/second as 32 operations/second of approximately 8 KB each; the actual size of each I/O operation varies between 2 and 100 KB and there are peaks every 20 seconds. The write cache on average operates at 8 accesses/second at approximately 61 KB/second (8 x 8 KB writes/sec); the relation between these figures is sketched after this list. As the number of concurrent jobs increases, the interconnect traffic and random I/O operations per second to the 7410 increase.

  • MM8 multiple job startup time is reduced on shared file systems, if each job uses separate input files.
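The write-rate figures quoted above are simply operations per second multiplied by the approximate operation size; a tiny sketch of that relation (the 8 KB value is the approximate average operation size reported above):

    # Aggregate write bandwidth ~= operations/second x average operation size.
    def write_bandwidth_kb(ops_per_sec, avg_op_kb=8):
        return ops_per_sec * avg_op_kb

    print(write_bandwidth_kb(32))   # ~256 KB/s, close to the observed ~260 KB/s
    print(write_bandwidth_kb(8))    # ~64 KB/s, close to the observed ~61 KB/s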

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/20/2010.

Wednesday Jun 09, 2010

PeopleSoft Payroll 500K Employees on Sun SPARC Enterprise M5000 World Record

Oracle's Sun SPARC Enterprise M5000 server combined with Oracle's Sun Storage F5100 Flash Array system has produced World Record Performance on PeopleSoft Payroll 9.0 (North American) 500K employees benchmark.
  • The Sun SPARC Enterprise M5000 server and the Sun Storage F5100 Flash Array system processed payroll for 500K employees using 32 payroll threads 18% faster than the IBM z10 EC 2097-709 mainframe, as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. This IBM mainframe is rated at 6,512 MIPS.

  • The IBM z10 mainframe with nine 4.4 GHz Gen1 processors has a list price over $6M.

  • The Sun SPARC Enterprise M5000 server together with the Sun Storage F5100 Flash Array system processed payroll for 500K employees using 32 payroll threads 92% faster than an HP rx7640, as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark.

  • The Sun Storage F5100 Flash Array system is a high-performance, high-density solid state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the 5 msec disk latencies measured on this benchmark.

  • The Sun SPARC Enterprise M5000 server used the Oracle Solaris 10 operating system and ran with the Oracle 11gR1 database for this benchmark.

Performance Landscape

500K Employees

System      Processor                     OS/Database          Payroll Processing   Run 1    Run 2    Run 3     Num of
                                                               Result (min)         (min)    (min)    (min)     Streams
Sun M5000   8x 2.53 GHz SPARC64 VII       Solaris/Oracle 11g   50.11                73.88    534.20   1267.06   32
IBM z10     9x 4.4 GHz Gen1, 6,512 MIPS   z/OS/DB2             58.96                80.5     250.68   462.6     8
HP rx7640   8x 1.6 GHz Itanium2           HP-UX/Oracle 11g     96.17                133.63   712.72   1665.01   32

Times under all Run columns above represent Payroll processing and Post-processing elapsed times and furthermore:

  • Run 1 = 32 parallel job streams & Single Check option = "No"
  • Run 2 = 32 sequential jobs for Pay Calculation process & 32 parallel job streams for the rest. Single Check option = "Yes"
  • Run 3 = One job stream & Single Check option = "Yes"

Times under the Result column represent payroll processing only.
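The percentage claims in the bullets above follow directly from the payroll processing results in the table; a small Python sketch using those published times:

    # Payroll processing results in minutes, from the table above.
    payroll_result_min = {"Sun M5000": 50.11, "IBM z10": 58.96, "HP rx7640": 96.17}

    sun = payroll_result_min["Sun M5000"]
    for rival in ("IBM z10", "HP rx7640"):
        pct_faster = (payroll_result_min[rival] / sun - 1) * 100
        print(f"Sun M5000 is {pct_faster:.0f}% faster than {rival}")
        # ~18% faster than the IBM z10, ~92% faster than the HP rx7640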

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M5000 (8 x 2.53 GHz/64 GB)
    1 x Sun Storage F5100 Flash Array (40 x 24 GB FMODs)
    1 x StorageTek 2510 (4 x 136 GB SAS 15K RPM)
    4 x Dual-Port SAS Fibre Channel Host Bus Adapters (HBA)

Software Configuration:

    Oracle Solaris 10 10/09
    Oracle PeopleSoft HCM and Campus Solutions 9.00.00.311 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.25 64-bit
    Oracle 11g R1 11.1.0.7 64-bit
    Micro Focus COBOL Server Express 4.0 SP4 64-bit

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

  • Paysheet Creation: generates payroll data worksheet for employees, consisting of std payroll information for each employee for given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collect at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of thirty-two job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun SPARC Enterprise M5000 (8 2.53GHz SPARC64 VII) 50.11 min, IBM z10 (9 gen1) 58.96 min, HP rx7640 (8 1.6GHz Itanium2) 96.17 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 6/3/2010.

Monday Mar 29, 2010

Sun Blade X6275/QDR IB/ Reverse Time Migration

Significance of Results

Oracle's Sun Blade X6275 cluster with a Lustre file system was used to demonstrate the performance potential of the system when running reverse time migration applications complete with I/O processing.

  • Reduced the Total Application run time for the Reverse Time Migration when processing 800 input traces for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes, by implementing algorithm I/O optimizations and taking advantage of MPI I/O features in HPC ClusterTools:

    • 1243x1151x1231 - Wall clock time reduced from 11.5 to 6.3 minutes (1.8x improvement)
    • 2486x1151x1231 - Wall clock time reduced from 21.5 to 13.5 minutes (1.6x improvement)
  • Reduced the I/O Intensive Trace-Input time for the Reverse Time Migration when reading 800 input traces for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node data requirement and avoiding unneeded synchronization:

    • 2486x1151x1231 : Time reduced from 121.5 to 3.2 seconds (38.0x improvement)
    • 1243x1151x1231 : Time reduced from 71.5 to 2.2 seconds (32.5x improvement)
  • Reduced the I/O Intensive Grid Initialization time for the Reverse Time Migration Grid when reading the Velocity, Epsilon, and Delta slices for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node grid data requirement:

    • 2486x1151x1231 : Time reduced from 15.6 to 4.9 seconds (3.2x improvement)
    • 1243x1151x1231 : Time reduced from 8.9 to 1.2 seconds (7.4x improvement)

Performance Landscape

In the tables below, the hyperthreading feature is enabled and the systems are fully utilized.

This first table presents the total application performance in minutes. The overall performance improved significantly because of the improved I/O performance and other benefits.


Total Application Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
Nodes   1243 x 1151 x 1231, 800 Samples                   2486 x 1151 x 1231, 800 Samples
        Original (min)   MPI I/O (min)   Improvement      Original (min)   MPI I/O (min)   Improvement
24      11.5             6.3             1.8x             21.5             13.5            1.6x
20      12.0             8.0             1.5x             21.9             15.4            1.4x
16      13.8             9.7             1.4x             26.2             18.0            1.5x
12      21.7             13.2            1.6x             29.5             23.1            1.3x

This next table presents the initialization I/O time. The results are presented in seconds and show the advantage of the improved MPI I/O strategy.


Initialization Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
Nodes   1243 x 1151 x 1231, 800 Samples                   2486 x 1151 x 1231, 800 Samples
        Original (sec)   MPI I/O (sec)   Improvement      Original (sec)   MPI I/O (sec)   Improvement
24      8.9              1.2             7.4x             15.6             4.9             3.2x
20      9.3              1.5             6.2x             16.9             3.9             4.3x
16      9.7              2.5             3.9x             17.4             11.3            1.5x
12      9.8              3.3             3.0x             22.5             14.9            1.5x

This last table presents the trace I/O time. The results are presented in seconds and show the significant advantage of the improved MPI I/O strategy.


Trace I/O Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
Nodes   1243 x 1151 x 1231, 800 Samples                   2486 x 1151 x 1231, 800 Samples
        Original (sec)   MPI I/O (sec)   Improvement      Original (sec)   MPI I/O (sec)   Improvement
24      71.5             2.2             32.5x            121.5            3.2             38.0x
20      67.7             2.4             28.2x            118.3            3.9             30.3x
16      64.2             2.7             23.7x            110.7            4.6             24.1x
12      69.9             4.2             16.6x            296.3            14.6            20.3x

Results and Configuration Summary

Hardware Configuration:

Oracle's Sun Blade 6048 Modular System with
12 x Oracle's Sun Blade X6275 Server Modules, each with
4 x 2.93 GHz Intel Xeon QC X5570 processors
12 x 4 GB memory at 1333 MHz
2 x 24 GB Internal Flash
QDR InfiniBand Lustre 1.8.0.1 File System

Software Configuration:

OS: 64-bit SUSE Linux Enterprise Server SLES 10 SP 2
MPI: Oracle Message Passing Toolkit 8.2.1 for I/O optimization to Lustre file system
MPI: Scali MPI Connect 5.6.6-59413 for original Lustre file system runs
Compiler: Oracle Solaris Studio 12 C++, Fortran, OpenMP

Benchmark Description

The primary objective of this Reverse Time Migration Benchmark is to present MPI I/O tuning techniques, exploit the power of Sun's HPC ClusterTools MPI I/O implementation, and demonstrate the world-class performance of Sun's Lustre File System to exploration geophysicists throughout the world. A Sun Blade 6048 Modular System with 12 Sun Blade X6275 server modules was clustered together with a QDR InfiniBand Lustre File System to show performance improvements in Reverse Time Migration throughput by using the Sun HPC ClusterTools MPI-IO features to implement specific algorithm I/O optimizations.

This Reverse Time Migration Benchmark measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this new I/O optimized version, each node reads in only the data to be processed by that node plus a 4-element inline pad shared with its neighbors to the left and right. This latest version essentially loads the boundary condition data during the initialization phase. The previous version handled boundary conditions by having each node read in all the trace, velocity, and conditioning data; alternatively, the master node would read in all the data and distribute it in its entirety to every node in the cluster. With the previous version, each node had full memory copies of all input data sets even when it only processed a subset of that data. The new version only holds the inline dimensions and pads to be processed by a particular node in memory.

Key Points and Best Practices

  • The original implementation of the trace I/O involved the master node reading in nx \* ny floats and communicating this trace data to all the other nodes in a synchronous manner, even though each node only used a subset of the trace data for each of the 800 time steps. The optimized I/O version has each node asynchronously read in only the (nx/num_procs + 8) \* ny floats that it will be processing; the additional 8 inline values are the 4-element pads of a node's left and right neighbors, needed to handle the initial boundary conditions (see the sketch after this list). The MPI_Barrier required by the original implementation for synchronization, together with the additional I/O for each node to load all the data values, significantly impacts performance. The I/O optimized version reads only the data values each node needs and does not require that MPI_Barrier synchronization. By performing such I/O optimizations, a significant improvement is seen in the trace I/O.

  • For the best MPI performance, allocate the X6275 nodes in blade by blade order and run with HyperThreading enabled. The "Binary Conditioning" part of the Reverse Time Migration specifically likes hyperthreading.

  • To get the best I/O performance, use a maximum of 70% of each node's available memory for the Reverse Time Migration application. Variations in execution time and I/O results can occur if the nodes have different memory size configurations.
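A minimal sketch of the per-node trace read described in the first bullet of this list; nx and ny stand for the inline and crossline trace dimensions, the values used are the smaller survey's dimensions from this post, and the exact handling of the remainder when nx does not divide evenly is not specified here, so integer division is only an approximation.

    # Per-rank trace read size for the I/O-optimized version, in floats:
    # each rank reads only its own inline slab plus the two 4-element neighbor pads,
    # instead of the full nx * ny trace grid read by the original implementation.
    def per_rank_trace_floats(nx, ny, num_procs, pad=4):
        return (nx // num_procs + 2 * pad) * ny   # remainder handling omitted

    nx, ny, num_procs = 1243, 1151, 24
    print("original :", nx * ny, "floats per rank")
    print("optimized:", per_rank_trace_floats(nx, ny, num_procs), "floats per rank")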

See Also

Monday Jan 25, 2010

Sun/Solaris Leadership in SAP SD Benchmarks and HP claims

COMMENTS ON SIGNIFICANT SAP SD 2-TIER RESULTS AND HP'S MISLEADING CLAIMS:

HP is making claims of "leadership" in the two-tier SAP SD benchmark by carefully fencing the claims by operating system (Linux and Windows) and conveniently omitting the actual leading results from Sun on Solaris.

HP's claims: ftp://ftp.hp.com/pub/c-products/servers/benchmarks/HP_ProLiant_785_585_385_SAP_perf_brief_121009.pdf

It is worthwhile to take a closer look at the results and the real leadership of Sun and Solaris in this SAP benchmark. All the SAP SD two-tier results discussed here can be seen at http://www.sap.com/solutions/benchmark/sd2tier.epx. All results here use the latest version of SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode).

Here is a summary of the HP claims and the counterpoints showing Sun and Solaris real leadership in performance and scalability.

HP claims the #1 position for 8-processor, 4-processor and 2-processor servers as follows:

  • 8-processor:  yes, but on Windows (8,280 SAP SD Benchmark users) and on Linux (8,022 SAP SD Benchmark users) with the HP Proliant DL785 G6 (8x 2.8GHz Opteron 8439 SE).
  • Formally correct statements; however, HP fails to mention Sun's actual #1 8-processor overall record result, by far, using Solaris: 10,000 SAP SD Benchmark users on a Sun Fire X4640 with eight 2.6GHz Opteron 8435 processors, leading by more than 20% at a lower clock speed.
  • 4-processor:  yes, but on Linux (4,370 SAP SD Benchmark users,  using the HP Proliant DL585 G6, 4x 2.8GHz AMD Opteron SE).
  • Again a formally correct statement; however, HP fails to mention that Sun holds the 4-processor overall record result using Solaris: 4,720 SAP SD Benchmark users, obtained on a Sun SPARC Enterprise T5440 with four 1.6GHz UltraSPARC T2 Plus processors.
  • 2-processor: Similarly HP claims the #1 and #2 rankings, but on Linux (3,171 SAP SD Benchmark users, HP Proliant DL380 G6, 2x 2.93GHz Xeon X5570) and (2,315 SAP SD Benchmark users, HP Proliant DL385 G6, 2x 2.6GHz Opteron 2435).
  • Again, HP omits the fact that Sun holds the #1 2-processor overall record result on Solaris: 3,800 SAP SD Benchmark users, obtained on a Sun Fire X4270 with 2x 2.93GHz Xeon X5570, a 20% and 64% lead for Sun over those two HP results (see the sketch below for the arithmetic).

The only conclusion is that Sun servers running Solaris have real leadership in the SAP SD two-tier benchmark. This fact is confirmed not only for the 2- to 8-processor servers but also at the high end, where the Sun M9000 with Solaris leads with the overall World Record for this benchmark, showing real record performance and top vertical scalability.
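The lead percentages quoted above come straight from the certified user counts listed in the disclosure table below; a small sketch of that arithmetic (system labels abbreviated for readability):

    # Certified SAP SD two-tier user counts (see the disclosure table below).
    users = {
        "Sun Fire X4640 (Solaris)": 10000, "HP DL785 G6 (Windows)": 8280,
        "Sun Fire X4270 (Solaris)": 3800, "HP DL380 G6 (Linux)": 3171,
        "HP DL385 G6 (Linux)": 2315,
    }

    def lead_pct(winner, other):
        return (users[winner] / users[other] - 1) * 100

    print(f"{lead_pct('Sun Fire X4640 (Solaris)', 'HP DL785 G6 (Windows)'):.0f}%")  # ~21%
    print(f"{lead_pct('Sun Fire X4270 (Solaris)', 'HP DL380 G6 (Linux)'):.0f}%")    # ~20%
    print(f"{lead_pct('Sun Fire X4270 (Solaris)', 'HP DL385 G6 (Linux)'):.0f}%")    # ~64%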

More details on the World Record Sun M9000 SAP SD 2-Tier results at BestPerf blog: http://blogs.sun.com/BestPerf/entry/sun_m9000_fastest_sap_2

SAP Benchmark Disclosure statement

Two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmarks as of Jan 22, 2010: 

Cert# Benchmark Server Users SAPS Procs Cores Thrds CPU CPU MHz Mem (MB) Operating System RDBMS Release
2009046 Sun SPARC M9000 32000 175600 64 256 512 SPARC64 VII 2880 1179648 Solaris 10 Oracle 10g
2009049 Sun Fire X4640 10000 55070 8 48 48 AMD Opt 8435 2600 262144 Solaris 10 Oracle 10g
2009052 HP ProL DL785 G6 8022 43800 8 48 48 AMD Opt 8439SE 2800 131072 SuSE Linux ES10 MaxDB 7.8
2009035 HP ProL DL785 G6 8280 45350 8 48 48 AMD Opt 8439SE 2800 131072 Windows 2008-EE SQL Server 2008
2009026 Sun SPARC T5440 4720 25830 4 32 256 UltraSPARC T2Plus 1600 262144 Solaris 10 Oracle 10g
2009025 HP ProL DL585 G6 4665 25530 4 24 24 AMD Opt 8439SE 2800 65536 Windows 2008-EE SQL Server 2008
2009051 HP ProL DL585 G6 4370 23850 4 24 24 AMD Opt 8439SE 2800 65536 SuSE Linux ES10 MaxDB 7.8
2009033 Sun Fire X4270 3800 21000 2 8 16 Intel Xeon X5570 2930 49152 Solaris 10 Oracle 10g
2009004 HP ProL DL380 G6 3300 18030 2 8 16 Intel Xeon X5570 2930 49152 Windows 2008-EE SQL Server 2008
2009006 HP ProL DL380 G6 3171 17380 2 8 16 Intel Xeon X5570 2930 49152 SuSE Linux ES10 MaxDB 7.8
2009050 HP ProL DL385 G6 2315 12650 2 12 12 AMD Opt 2435 2600 49152 SuSE Linux ES10 MaxDB 7.8

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Thursday Jan 21, 2010

SPARC Enterprise M4000 PeopleSoft NA Payroll 240K Employees Performance (16 Streams)

The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology, the Sun Storage F5100 flash array, has produced World Record Performance on PeopleSoft Payroll 9.0 (North American) 240K employees benchmark.

  • The Sun SPARC Enterprise M4000 server with four 2.53 GHz SPARC64 VII processors and the Sun Storage F5100 flash array using 16 job streams (payroll threads) is 55% faster than the HP rx6600 (4 x 1.6GHz Itanium2 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and the Sun Storage F5100 flash array is 2.1x faster than the 2027 MIPS IBM Z990 (6 Z990 Gen1 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10, while the IBM result was run with 8 payroll threads and used IBM DB2 for Z/OS 8.1 for the database.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array processed payroll for 240K employees using PeopleSoft Payroll 9.0 (North American) and Oracle 11gR1 running on Solaris 10 with different execution strategies, which resulted in a maximum CPU utilization of 45%, compared to HP's reported CPU utilization of 89%.

  • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology processed 16 Sequential Jobs and single run control with a total time of 534 minutes, an improvement of 19% compared to HP's time of 633 minutes.

  • Sun's FlashFire technology dramatically improves I/O performance for the PeopleSoft Payroll 9.0 (North American) benchmark, delivering a significant performance boost over a best-optimized configuration of 60+ FC disks.

  • The Sun Storage F5100 Flash Array is a high-performance, high-density solid state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the 5 msec disk latencies measured on this benchmark.

  • Sun estimates that the MIPS rating for a Sun SPARC Enterprise M4000 server is over 3000 MIPS.

Performance Landscape

240K Employees

System      Processor                   OS/Database            Payroll Processing   Run 1    Run 2    Run 3    Num of    Ver
                                                               Result (min)         (min)    (min)    (min)    Streams
Sun M4000   4x 2.53 GHz SPARC64 VII     Solaris/Oracle 11gR1   43.78                51.26    286.11   534.35   16        9.0
HP rx6600   4x 1.6 GHz Itanium2         HP-UX/Oracle 11g       68.07                81.17    350.16   633.25   16        9.0
IBM Z990    6x Gen1, 2027 MIPS          z/OS/DB2               91.70                107.34   328.66   544.80   8         9.0

Note: IBM benchmark documents show that 6 Gen1 procs is 2027 mips. 13 Gen1 processors were in this config but only 6 were available for testing.

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M4000 (4 x 2.53 GHz/32GB)
    1 x Sun Storage F5100 Flash Array (40 x 24GB FMODs)
    1 x Sun Storage J4200 (12 x 450GB SAS 15K RPM)

Software Configuration:

    Solaris 10 5/09
    Oracle PeopleSoft HCM 9.0 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.08 64-bit
    Micro Focus Server Express 4.0 SP4 64-bit
    Oracle RDBMS 11.1.0.7 64-bit
    HP's Mercury Interactive QuickTest Professional 9.0

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

  • Paysheet Creation: generates payroll data worksheet for employees, consisting of std payroll information for each employee for given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collect at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of sixteen job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun M4000 (4 2.53GHz SPARC64) 43.78 min, IBM Z990 (6 gen1) 91.70 min, HP rx6600 (4 1.6GHz Itanium2) 68.07 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 1/21/2010.

Tuesday Nov 24, 2009

Sun M9000 Fastest SAP 2-tier SD Benchmark on current SAP EP4 for SAP ERP 6.0 (Unicode)

The Sun SPARC Enterprise M9000 server (64 processors, 256 cores, 512 threads) set a World Record on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.
  • The Sun SPARC Enterprise M9000 server with 2.88 GHz SPARC64 VII processors achieved 32,000 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 server result is 8.6x faster than the only IBM 5GHz POWER6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any IBM 595 results on the current SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, even though this benchmark has been current for almost a year and IBM p595 systems have 8x more cores than the IBM System 550.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • This new result is 1.84x greater than the previous record result delivered on the Sun SPARC Enterprise M9000 server, which used 32 processors.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note 1139642 for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, each occupying just 1 byte. The new version of the benchmark requires Unicode characters, and the application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others (see the sketch after this list). This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details.
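The encoding overhead described in the last bullet can be made concrete with Python's built-in codecs (UTF-16-LE is used here only to exclude the byte-order mark):

    text = "ORDER 4711"                           # plain ASCII sample string
    print(len(text.encode("ascii")))              # 10 bytes: 1 byte per character
    print(len(text.encode("utf-16-le")))          # 20 bytes: 2 bytes per character

    # Characters outside the Basic Multilingual Plane need a surrogate pair: 4 bytes.
    print(len("\U0001F600".encode("utf-16-le")))  # 4 bytes for a single character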

Performance Landscape

SAP enhancement package 4 for SAP ERP 6.0 (Unicode) results, in decreasing order of performance.

(ERP 6.0 EP is the current version of the benchmark as of January 2009)

System                                                            OS / Database             Users    SAP ERP/ECC Release      SAPS      Date
Sun SPARC Enterprise M9000 (64 x 2.88 GHz SPARC64 VII, 1152 GB)   Solaris 10 / Oracle 10g   32,000   2009 6.0 EP4 (Unicode)   175,600   18-Nov-09
Sun SPARC Enterprise M9000 (32 x 2.88 GHz SPARC64 VII, 1024 GB)   Solaris 10 / Oracle 10g   17,430   2009 6.0 EP4 (Unicode)   95,480    12-Oct-09
IBM System 550 (4 x 5 GHz POWER6, 64 GB)                          AIX 6.1 / DB2 9.5         3,752    2009 6.0 EP4 (Unicode)   20,520    16-Jun-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users: 32,000
    Average dialog response time: 0.93 seconds
    Throughput:
        Fully processed order line items/hour: 3,512,000
        Dialog steps/hour: 10,536,000
    SAPS: 175,600
    SAP Certification: 2009046
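By SAP's definition, 100 SAPS corresponds to 2,000 fully business-processed order line items per hour, which in this benchmark equates to 6,000 dialog steps per hour; the certified throughput figures above tie back to the SAPS number as in this small sketch:

    # 100 SAPS = 2,000 fully processed order line items/hour = 6,000 dialog steps/hour.
    line_items_per_hour = 3_512_000
    dialog_steps_per_hour = 10_536_000

    print(line_items_per_hour / 2_000 * 100)    # 175,600 SAPS
    print(dialog_steps_per_hour / 6_000 * 100)  # 175,600 SAPS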

Hardware Configuration:

    Sun SPARC Enterprise M9000
      64 x 2.88GHz SPARC64 VII, 1152 GB memory

Software Configuration:

    Solaris 10
    SAP enhancement package 4 for SAP ERP 6.0 (unicode)
    Oracle10g

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmarks as of 11/18/09: Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 32,000 SAP SD Users, 64 x 2.88 GHz SPARC VII, 1152 GB memory, Oracle10g, Solaris10, Cert# 2009046. Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Monday Nov 02, 2009

Sun Ultra 27 Delivers Leading Single Frame Buffer SPECviewperf 10 Results

A Sun Ultra 27 workstation configured with an nVidia FX5800 graphics card delivered outstanding performance running the SPECviewperf® 10 benchmark.

  • When compared with other workstations running a single graphics card (i.e. not running two or more cards in SLI mode), the Sun Ultra 27 workstation places first in 6 of 8 subtests and second in the remaining two subtests.

  • The calculated geometric mean shows that the Sun Ultra 27 workstation is 11% faster than competing workstations (see the sketch after this list for how such a geometric mean is computed).

  • The optimum point for price/performance is the nVidia FX1800 graphics card.
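The aggregate comparison mentioned above is based on a geometric mean across the eight viewsets. The sketch below shows the calculation method, using the Sun Ultra 27 and Fujitsu Celsius M470 frame rates from the first comparison table below as example inputs; the post's 11% figure is its own aggregate across all competitors, so this is an illustration of the method rather than a reproduction of that exact number.

    from math import prod

    # Frames per second for the eight SPECviewperf 10 viewsets (from the table below).
    ultra27_fx5800 = [59.34, 68.81, 58.07, 246.09, 68.96, 152.01, 42.02, 36.04]
    celsius_fx3800 = [53.67, 65.25, 52.19, 227.37, 64.39, 139.2, 29.02, 33.27]

    ratios = [a / b for a, b in zip(ultra27_fx5800, celsius_fx3800)]
    geomean = prod(ratios) ** (1 / len(ratios))
    print(f"geometric-mean advantage: {(geomean - 1) * 100:.0f}%")  # ~13% in this pairing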

Results have been published on the SPEC web site at http://www.spec.org/gwpg/gpc.data/vp10/summary.html.

Performance Landscape

Performance of the Sun Ultra 27 versus the competition. Bigger is better for each of the eight tests, and performance is measured in frames per second. The percentage columns show how much faster the Sun Ultra 27 workstation is than each competing system on that test (a negative value means the competitor is faster).


                              3DSMAX          CATIA           ENSIGHT         MAYA
                              Perf      %     Perf      %     Perf      %     Perf      %
Sun Ultra 27 FX5800           59.34     -     68.81     -     58.07     -     246.09    -
HP xw4600 ATI FireGL V7700    49.71     19    48.05     43    57.11     2     268.62    -8
HP xw4600 FX4800              52.26     14    63.26     12    53.79     8     226.82    7
Fujitsu Celsius M470 FX3800   53.67     11    65.25     7     52.19     10    227.37    7

                              PROENGINEER     SOLIDWORKS      TEAMCENTER      UGS
                              Perf      %     Perf      %     Perf      %     Perf      %
Sun Ultra 27 FX5800           68.96     -     152.01    -     42.02     -     36.04     -
HP xw4600 ATI FireGL V7700    47.25     32    109.71    28    40.18     4     56.65     -57
HP xw4600 FX4800              61.15     11    131.31    14    28.42     32    33.43     7
Fujitsu Celsius M470 FX3800   64.39     7     139.2     8     29.02     31    33.27     8

Comparison of various frame buffers on the Sun Ultra 27 running SPECviewperf 10. Performance is reported for each test along with the difference in performance as compared to the FX5800 frame buffer. The runs in the table below were made with 3.2GHz W3570 processors.


         3DSMAX       CATIA        ENSIGHT      MAYA         PROENGR      SOLIDWRKS     TEAMCNTR      UGS
         Perf    %    Perf    %    Perf    %    Perf    %    Perf    %    Perf    %     Perf    %     Perf    %
FX5800   57.07   -    67.84   -    58.63   -    219.4   -    68.05   -    152.3   -     40.85   -     34.73   -
FX3800   57.17   0    66.57   2    54.91   7    206.4   6    66.48   2    146.3   4     38.48   6     33.12   5
FX1800   56.73   1    64.33   6    52.05   13   189.3   16   64.67   5    135.2   13    34.18   20    30.46   14
FX380    45.90   24   55.81   22   34.93   68   120.3   82   46.09   48   64.11   138   17.00   140   13.88   150

Results and Configuration Summary

Hardware Configuration:

    Sun Ultra 27 Workstation
    1 x 3.33 GHz Intel Xeon (tm) W3580
    2GB (1 x 2GB PC10600 1333MHz)
    1 x 500GB SATA
    nVidia Quadro FX380, FX1800, FX3800 & FX5800
    $7,529.00 (includes Microsoft Windows and monitor)

Software Configuration:

    OS: Microsoft Windows Vista Ultimate, 32-bit
    Benchmark: SPECviewperf 10

Benchmark Description

SPECviewperf measures the 3D graphics rendering performance of systems running under OpenGL. It is a synthetic benchmark designed to be a predictor of application performance: it measures the graphics subsystem (primarily the graphics bus, driver and graphics hardware) and its impact on the system without the full overhead of an application. SPECviewperf reports performance in frames per second.

Please go here for a more complete description of the tests.

Key Points and Best Practices

SPECviewperf measures the 3D rendering performance of systems running under OpenGL.

The SPECopc(SM) project group's SPECviewperf 10 is totally new performance evaluation software. In addition to features found in previous versions, it now provides the ability to compare performance of systems running in higher-quality graphics modes that use full-scene anti-aliasing, and measures how effectively graphics subsystems scale when running multithreaded graphics content. Since the SPECviewperf source and binaries have been upgraded to support these changes, no comparisons should be made between past results and current results for viewsets running under SPECviewperf 10.

SPECviewperf 10 requires OpenGL 1.5 and a minimum of 1GB of system memory. It currently supports Windows 32/64.

See Also

Disclosure Statement

SPEC® and the benchmark name SPECviewperf® are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Oct 18, 2009. For the latest SPECviewperf benchmark results, visit www.spec.org/gwpg.

Tuesday Oct 13, 2009

Sun T5440 Oracle BI EE Sun SPARC Enterprise T5440 World Record

The Oracle BI EE workload, a component of Oracle Fusion Middleware, was run on two Sun SPARC Enterprise T5440 servers and achieved world record performance.
  • Two Sun SPARC Enterprise T5440 servers with four 1.6 GHz UltraSPARC T2 Plus processors delivered the best performance of 50K concurrent users on the Oracle BI EE 10.1.3.4 benchmark with Oracle 11g database running on free and open Solaris 10.

  • The two node Sun SPARC Enterprise T5440 servers with Oracle BI EE running on Solaris 10 using 8 Solaris Containers shows 1.8x scaling over Sun's previous one node SPARC Enterprise T5440 server result with 4 Solaris Containers.

  • The two-node SPARC Enterprise T5440 configuration demonstrated the performance and scalability of the UltraSPARC T2 Plus processor, servicing 50K users with a 0.2776 sec response time.

  • The Sun SPARC Enterprise T5220 server was used as an NFS server with 4 internal SSDs and the ZFS file system which showed significant I/O performance improvement over traditional disk for Business Intelligence Web Catalog activity.

  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle BI EE performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

  • IBM has not published any POWER6 processor based results on this important benchmark.

Performance Landscape

System | Chips | GHz | Processor Type | Users
2 x Sun SPARC Enterprise T5440 | 8 | 1.6 | UltraSPARC T2 Plus | 50,000
1 x Sun SPARC Enterprise T5440 | 4 | 1.6 | UltraSPARC T2 Plus | 28,000
5 x Sun Fire T2000 | 1 | 1.2 | UltraSPARC T1 | 10,000

Results and Configuration Summary

Hardware Configuration:

    2 x Sun SPARC Enterprise T5440 (1.6GHz/128GB)
    1 x Sun SPARC Enterprise T5220 (1.2GHz/64GB) and 4 SSDs (used as NFS server)

Software Configuration:

    Solaris 10 5/09
    Oracle BI EE 10.1.3.4
    Oracle Fusion Middleware
    Oracle 11gR1

Benchmark Description

The objective of this benchmark is to highlight how Oracle BI EE can support pervasive deployments in large enterprises, using minimal hardware, by simulating an organization that needs to support more than 25,000 active concurrent users, each operating in mixed mode: ad-hoc reporting, application development, and report viewing.

The user population was divided into a mix of administrative users and business users. A maximum of 28,000 concurrent users were actively interacting and working in the system during the steady-state period. The tests executed 580 transactions per second, with think times of 60 seconds per user, between requests. In the test scenario 95% of the workload consisted of business users viewing reports and navigating within dashboards. The remaining 5% of the concurrent users, categorized as administrative users, were doing application development.

The benchmark scenario used a typical business user sequence of dashboard navigation, report viewing, and drill down. For example, a Service Manager logs into the system and navigates to his own set of dashboards, viz. "Service Manager". The user then selects the "Service Effectiveness" dashboard, which shows him four distinct reports, "Service Request Trend", "First Time Fix Rate", "Activity Problem Areas", and "Cost Per Completed Service Call 2002 till 2005". The user then proceeds to view the "Customer Satisfaction" dashboard, which also contains a set of 4 related reports. He then proceeds to drill down on some of the reports to see the detail data. Then the user proceeds to more dashboards, for example "Customer Satisfaction" and "Service Request Overview". After navigating through these dashboards, he logs out of the application.

This benchmark did not use a synthetic database schema. The benchmark tests were run on a full production version of the Oracle Business Intelligence Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a true customer scenario.

See Also

Disclosure Statement

Oracle BI EE benchmark results 10/13/2009, see

SAP 2-tier SD Benchmark on Sun SPARC Enterprise M9000/32 SPARC64 VII

Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark Sun SPARC Enterprise M9000/32 SPARC64 VII

World Record 32-Processor Result on the SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • The Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) set a 32-processor World Record on the SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark, as of Oct. 12, 2009.

  • The 32-way Sun SPARC Enterprise M9000 with 2.88 GHz SPARC64 VII processors achieved 17,430 users on the two-tier SAP Sales and Distribution (SD) standard SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark.

  • The Sun SPARC Enterprise M9000 result is 4.6x faster than the only IBM 5GHz Power6 unicode result, which was published on the IBM p550 using the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • IBM has not submitted any p595 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • HP has not submitted any Itanium2 results on the new SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each character was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details; a minimal sketch of the UTF-16 size arithmetic follows below.
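
The arithmetic behind that overhead is easy to see in a small, stand-alone C sketch (illustrative only, not part of the SAP benchmark kit): code points below U+10000 take two bytes each in UTF-16 and the rest take four, so a mostly-ASCII business string roughly doubles in size compared with a 1-byte-per-character encoding.

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Bytes needed for one Unicode code point in UTF-16: code points in the
     * Basic Multilingual Plane (below U+10000) take one 16-bit code unit,
     * all others take a surrogate pair (two code units). */
    static size_t utf16_bytes(uint32_t cp)
    {
        return (cp < 0x10000u) ? 2u : 4u;
    }

    int main(void)
    {
        /* Eight ASCII code points plus one supplementary-plane letter
         * (U+10400) to show the 4-byte case. */
        const uint32_t codepoints[] = { 'O','r','d','e','r',' ','4','7', 0x10400 };
        const size_t n = sizeof codepoints / sizeof codepoints[0];

        size_t utf16_total = 0;
        for (size_t i = 0; i < n; i++)
            utf16_total += utf16_bytes(codepoints[i]);

        /* The ASCII part alone is 8 bytes in a 1-byte encoding but 16 bytes
         * in UTF-16; the supplementary character adds 4 more. */
        printf("%zu code points need %zu bytes in UTF-16\n", n, utf16_total);
        return 0;
    }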

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | Date
Sun SPARC Enterprise M9000, 32 x SPARC64 VII @ 2.88 GHz, 1024 GB | Solaris 10, Oracle10g | 17,430 | 2009 6.0 EP4 (Unicode) | 95,480 | 12-Oct-09
IBM System 550, 4 x Power6 @ 5 GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 16-Jun-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Certified Result:

    Number of SAP SD benchmark users: 17,430
    Average dialog response time: 0.95 seconds
    Throughput:
      Fully processed order line items/hour: 1,909,670
      Dialog steps/hour: 5,729,000
    SAPS: 95,480
    SAP Certification: 2009038
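
For context (not part of the certified report): SAP defines 100 SAPS as 2,000 fully processed order line items per hour, which corresponds to 6,000 dialog steps per hour. The certified throughput is therefore self-consistent: 1,909,670 line items/hour ÷ 20 ≈ 95,480 SAPS, and 5,729,000 dialog steps/hour ÷ 60 gives the same figure.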

Hardware Configuration:

    Sun SPARC Enterprise M9000
      32 x 2.88GHz SPARC64 VII, 1024 GB memory
      6 x 6140 storage arrays

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle10g

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 10/12/09: Sun SPARC Enterprise M9000 (32 processors, 128 cores, 256 threads) 17,430 SAP SD Users, 32 x 2.88 GHz SPARC VII, 1024 GB memory, Oracle10g, Solaris10, Cert# 2009038. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 64 x 2.52 GHz SPARC64 VII, 1024GB memory, 39,100 SD benchmark users, 1.93 sec. avg. response time, Cert#2008042, Oracle 10g, Solaris 10, SAP ECC Release 6.0.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

Tuesday Sep 22, 2009

Sun X4270 Virtualized for Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

Two-Processor Performance using 8 Virtual CPU Solaris 10 Container Configuration:
  • Sun achieved 36% better performance using Solaris 10 and Solaris Containers than a similar configuration on SUSE Linux using VMware ESX Server 4.0 on the same benchmark; both used 8 virtual CPUs.
  • Solaris Containers are the best virtualization technology for SAP projects and have been supported for more than 4 years. Other virtualization technologies suffer various overheads that decrease performance.
  • The Sun Fire X4270 server with 48 GB memory and a Solaris 10 container configured with 8 virtual CPUs achieved 2,800 SAP SD Benchmark users and beat the Fujitsu PRIMERGY RX300 S5 server with 96 GB memory and SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0 by 36%. Both results used the same CPUs and were running the SAP ERP application release 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark.
  • The Sun and Fujitsu results were run at 50% and 48% CPU utilization, respectively. With these servers only half utilized, there is headroom for additional performance.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. See this SAP Note for more details. Note: username and password for SAP Service Marketplace required.
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to the above SAP Note for more details. Note: username and password for SAP Service Marketplace required.

SAP-SD 2-Tier Performance Landscape (in decreasing performance order).


SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results (New version of the benchmark as of January 2009)

System | OS / Database | Virtualized? | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun Fire X4270, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Solaris 10, Oracle 10g | no | 3,800 | 2009 6.0 EP4 (Unicode) | 21,000 | 10,500 | 21-Aug-09
IBM System 550, 4 x Power6 @ 5 GHz, 64 GB | AIX 6.1, DB2 9.5 | no | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09
HP ProLiant DL380 G6, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | SUSE Linux Ent Svr 10, MaxDB 7.8 | no | 3,171 | 2009 6.0 EP4 (Unicode) | 17,380 | 8,690 | 17-Apr-09
Sun Fire X4270, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Solaris 10 container (8 virtual CPUs), Oracle 10g | YES (50% util) | 2,800 | 2009 6.0 EP4 (Unicode) | 15,320 | 7,660 | 10-Sep-09
Fujitsu PRIMERGY RX300 S5, 2 x Intel Xeon X5570 @ 2.93 GHz, 96 GB | SUSE Linux Ent Svr 10 on VMware ESX Server 4.0, MaxDB 7.8 | YES (48% util) | 2,056 | 2009 6.0 EP4 (Unicode) | 11,230 | 5,615 | 04-Aug-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun StorageTek CSM200 with 32 x 73GB 15KRPM 4Gb FC-AL and 32 x 146GB 15KRPM 4Gb FC-AL Drives

Software Configuration:

    Solaris 10 container configured with 8 virtual CPUs
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Sun has submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

      Number of benchmark users: 2,800
      Average dialog response time: 0.971 s
      Fully processed order line items/hour: 306,330
      Dialog steps/hour: 919,000
      SAPS: 15,320
      SAP Certification: 2009034

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI OEM) to present the needed raw devices directly from the array; do not use any software layer in between.

  • Solaris 10 Container best practices how-to guide

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 09/10/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) run in 8 virtual cpu container, 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009034. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10, Cert# 2009006. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 2,800 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 48 GB memory, Oracle 10g, Solaris 10 container configured with 8 virtual CPUs, Cert# 2009034. Fujitsu PRIMERGY Model RX300 S5 (2 processors, 8 cores, 16 threads) 2,056 SAP SD Users, 2x 2.93 GHz Intel Xeon X5570, 96 GB memory, MaxDB 7.8, SUSE Linux Enterprise Server 10 on VMware ESX Server 4.0, Cert# 2009029.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Tuesday Sep 01, 2009

String Searching - Sun T5240 & T5440 Outperform IBM Cell Broadband Engine

Significance of Results

Sun SPARC Enterprise T5220, T5240 and T5440 servers ran benchmarks using the Aho-Corasick string searching algorithm. String searching, or pattern matching, is important to a variety of commercial, government and HPC applications. One of the core functions needed for text identification algorithms in data repositories is real-time string searching. For this benchmark, the IBM, HP and Sun systems all used the Aho-Corasick algorithm for string searching.

Sun SPARC Enterprise T5440

  • A 1.6 GHz Sun SPARC Enterprise T5440 server could search a book as tall as Mt. Everest (29,208 feet, 861 GB book) in 61 seconds, which corresponds to a string search rate of 14.2 GB/s.

  • A 1.6 GHz Sun SPARC Enterprise T5440 server can search at a rate of 14.2 GB/s, which corresponds to searching a book containing one terabyte of data (34,745 feet high) in only 70 seconds.

  • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching at a rate of 14.2 GB/s, which is 29.9 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

  • The 4-chip 1.6 GHz Sun SPARC Enterprise T5440 server performed string searching 3.7 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5440 server has a 1.7 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

  • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 12% improvement over the 1.4 GHz Sun SPARC Enterprise T5440.

  • The 1.6 GHz Sun SPARC Enterprise T5440 server demonstrated a 2x speedup over the 1.6 GHz Sun SPARC Enterprise T5240 server which demonstrated a 2.3x speedup over the 1.4 GHz Sun SPARC Enterprise T5220 server.

Sun SPARC Enterprise T5240

  • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching at a rate of 7.22 GB/s which is 15.4 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

  • The 2-chip 1.6 GHz Sun SPARC Enterprise T5240 server performed string searching 1.9 times as fast as the 4-chip HP DL-580 (2.93 GHz Xeon QC) server that performed string searching at a rate of 3.87 GB/s. The 1.6 GHz Sun SPARC Enterprise T5240 server has a 2.4 times advantage in delivered power-performance over the HP DL-580 (using a power consumption rate of 830 watts for the HP system as measured on other tests).

  • The 1.6 GHz Sun SPARC Enterprise T5240 server demonstrated a 14% speedup over the 1.4 GHz Sun SPARC Enterprise T5240 server.

Sun SPARC Enterprise T5220

  • The 1-chip 1.4 GHz Sun SPARC Enterprise T5220 server performed string searching at a rate of 3.16 GB/s which is 6.7 times as fast as the 2-chip IBM Cell Broadband Engine DD3 Blade that performed string searching at a rate of 0.475 GB/s.

Performance Landscape

System | Throughput (GB/s) | Chips | Cores
Sun SPARC Enterprise T5440 (1.6 GHz) | 14.2 | 4 | 32
Sun SPARC Enterprise T5440 (1.4 GHz) | 12.7 | 4 | 32
Sun SPARC Enterprise T5240 (1.6 GHz) | 7.2 | 2 | 16
Sun SPARC Enterprise T5240 (1.4 GHz) | 6.4 | 2 | 16
HP DL-580 (2.9 GHz) | 3.9 | 4 | 16
Sun SPARC Enterprise T5220 (1.4 GHz) | 3.2 | 1 | 8
IBM Cell Broadband Engine DD3 Blade (3.2 GHz) | 0.475 | 2 | 16

Results and Configuration Summary

Hardware Configuration:
    Sun SPARC Enterprise T5440 (1.6 GHz)
      4 x 1.6 GHz UltraSPARC T2 Plus processors
      256 GB
    Sun SPARC Enterprise T5440 (1.4 GHz)
      4 x 1.4 GHz UltraSPARC T2 Plus processors
      128 GB
    Sun SPARC Enterprise T5240 (1.6 GHz)
      2 x 1.6 GHz UltraSPARC T2 Plus processors
      64 GB
    Sun SPARC Enterprise T5240 (1.4 GHz)
      2 x 1.4 GHz UltraSPARC T2 Plus processors
      64 GB
    Sun SPARC Enterprise T5220 (1.4 GHz)
      1 x 1.4 GHz UltraSPARC T2 processor
      32 GB

Software Configuration:

    Sun SPARC Enterprise T5440 (1.6 GHz)
      OpenSolaris 2009.06
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5440 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5240 (1.6 GHz)
      OpenSolaris 2009.06
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5240 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)
    Sun SPARC Enterprise T5220 (1.4 GHz)
      Solaris 10 2008.07
      Sun Studio 12 (Sun C 5.9 2007.05)

Benchmark Description

One of the core functions needed for text identification algorithms in data repositories is real-time string searching. This string searching benchmark demonstrates the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code creation and speed of code execution. In IEEE Computer, Volume 41, Number 4, pp. 42-50, April 2008, IBM describes a variant of the Aho-Corasick string searching algorithm that uses deterministic finite automata. The algorithm first constructs a graph that represents a dictionary, then walks that graph using successive input characters from a text file. Each "state" in the graph includes a state transition table (STT) that is accessed using the next input character from the text file to determine the address of the next state in the graph. IBM defines an automaton as a two-step loop that: (1) obtains the address of the next state from the STT, and (2) fetches the next state in the graph.
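
To make that two-step loop concrete, here is a minimal, self-contained C sketch of the automaton walk over a pre-built state transition table. It hard-codes a tiny table for the single keyword "the"; the dictionary build, the 82 MB graph, and the OpenMP decomposition across automata described under Key Points below are omitted, and names such as stt and search are illustrative rather than Sun's or IBM's source.

    #include <stdio.h>
    #include <string.h>

    #define NSTATES  4      /* states for "", "t", "th", "the" */
    #define ALPHABET 256

    /* stt[state][byte] = next state: a hand-built table for the single
     * keyword "the".  The benchmark builds one such table per state of a
     * ~25,000-word dictionary graph. */
    static int stt[NSTATES][ALPHABET];

    static void build_stt(void)
    {
        for (int s = 0; s < NSTATES; s++)
            for (int c = 0; c < ALPHABET; c++)
                stt[s][c] = (c == 't') ? 1 : 0;   /* fall back to "" or "t" */
        stt[1]['h'] = 2;
        stt[2]['e'] = 3;
    }

    /* The two-step loop described above: (1) index the current state's
     * transition table with the next input byte, (2) move to that state. */
    static long search(const unsigned char *text, size_t len)
    {
        long matches = 0;
        int state = 0;
        for (size_t i = 0; i < len; i++) {
            state = stt[state][text[i]];
            if (state == 3)                       /* accepting state */
                matches++;
        }
        return matches;
    }

    int main(void)
    {
        const char *sample = "in the beginning the word was the word";
        build_stt();
        printf("matches: %ld\n",
               search((const unsigned char *)sample, strlen(sample)));
        return 0;
    }

In the benchmark itself, many such automata run concurrently (one OpenMP thread per automaton on the Sun servers, as noted under Key Points below), all indexing a single shared dictionary graph.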

IBM reports the performance of its Cell Broadband Engine (CBE) to execute this algorithm to search a 4.4 MB version of the King James Bible using a dictionary of the 20,000 most used words in the English language (average word length of 7.59 characters). Each of the 8 synergistic processing elements (SPEs) of each of the two CBEs executes 16 automata, for a total of 256 automata. All automata and hence all SPEs access a single, shared dictionary.

IBM describes elaborate optimizations of the Aho-Corasick algorithm, including state shuffling, state replication, alphabet shuffling and state caching. These optimizations were required to: (1) overcome "memory congestion", i.e., contention amongst the SPEs for access to the shared dictionary, and (2) compensate for the limited local storage that is associated with each SPE. These optimizations were necessary to achieve the performance reported for the CBE DD3 Blade.

IBM does not provide references that indicate where to obtain the dictionary and Bible. IBM reports the algorithmic performance in Gbits/s but does not indicate whether an 8-bit byte is extended to 10 bits as required for network transmission.

In order to closely approximate the dictionary and Bible that were used by IBM, Sun used a dictionary of 25,143 English words (the Open Solaris file cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/spell/list) for which the average word length is 7.2 characters, and a 4.6 MB version of the King James Bible (www.patriot.net/users/bmcgin/kjv12.zip). For reporting of results in Gbits/s, the length of a byte is assumed to be 8 bits.

Key Points and Best Practices

  • Power was measured during execution of the Aho-Corasick algorithm using a WattsUp power meter, and the average rate of power consumption is presented.

  • The Aho-Corasick algorithm as deployed on the IBM Cell Broadband Engine DD3 Blade required substantial optimization and tuning to achieve the reported performance, whereas on the Sun SPARC Enterprise T5220, T5240 or T5440 servers only a basic implementation of the algorithm and a simple compilation were needed.

  • In order to demonstrate the usefulness of Sun's UltraSPARC T2 and T2 Plus processors for both ease of code generation and speed of code execution, Sun implemented the Aho-Corasick algorithm using ANSI C. No optimizations of the algorithm were required to achieve the performance reported for the T5220, T5240 and T5440. The source code was compiled using the -m32 -xO3 and -xopenmp options. The dictionary is represented using a graph that comprises 82 MB. Each core of the T5220, T5240 or T5440 executes 8 automata using one OpenMP thread per automaton. Thus, the T5220 executes 64 total automata, the T5240 executes 128 total automata and the T5440 executes 256 total automata. All automata and hence all cores access a single, shared dictionary. Access to this dictionary is accelerated by the large, shared L2 caches of the Sun SPARC Enterprise T5220, T5240 and T5440.

See Also

Friday Aug 28, 2009

Sun X4270 World Record SAP-SD 2-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode)

Sun Fire X4270 Server World Record Two Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record 2-processor performance result on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark on the Sun Fire X4270 server.

  • The Sun Fire X4270 server with two Intel Xeon X5570 processors (8 cores, 16 threads) achieved 3,800 SAP SD Benchmark users running SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using Oracle 10g database and Solaris 10 operating system.

  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.

  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the IBM System 550 server using 4 POWER6 processors, 64 GB memory and the AIX 6.1 operating system.
  • The Sun Fire X4270 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Solaris 10 operating system beat the HP ProLiant BL460c G6 server using 2 Intel Xeon X5570 processors, 48 GB memory and the Windows Server 2008 operating system.

  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. Refer to SAP Note for more details. Note: username and password for SAP Service Marketplace required.

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to SAP Note for more details. Note: username and password for SAP Service Marketplace required.

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun Fire X4270, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Solaris 10, Oracle 10g | 3,800 | 2009 6.0 EP4 (Unicode) | 21,000 | 10,500 | 21-Aug-09
IBM System 550, 4 x Power6 @ 5 GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09
Sun Fire X4270, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Solaris 10, Oracle 10g | 3,700 | 2009 6.0 EP4 (Unicode) | 20,300 | 10,150 | 30-Mar-09
HP ProLiant BL460c G6, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,415 | 2009 6.0 EP4 (Unicode) | 18,670 | 9,335 | 04-Aug-09
Fujitsu PRIMERGY TX/RX 300 S5, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,328 | 2009 6.0 EP4 (Unicode) | 18,170 | 9,085 | 13-May-09
HP ProLiant BL460c G6, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,310 | 2009 6.0 EP4 (Unicode) | 18,070 | 9,035 | 27-Mar-09
HP ProLiant DL380 G6, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,300 | 2009 6.0 EP4 (Unicode) | 18,030 | 9,015 | 27-Mar-09
Fujitsu PRIMERGY BX920 S1, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,260 | 2009 6.0 EP4 (Unicode) | 17,800 | 8,900 | 18-Jun-09
NEC Express5800, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,250 | 2009 6.0 EP4 (Unicode) | 17,750 | 8,875 | 28-Jul-09
HP ProLiant DL380 G6, 2 x Intel Xeon X5570 @ 2.93 GHz, 48 GB | SuSE Linux Enterprise Server 10, MaxDB 7.8 | 3,171 | 2009 6.0 EP4 (Unicode) | 17,380 | 8,690 | 17-Apr-09

Complete benchmark results may be found at the SAP benchmark website: http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun Fire X4270
      2 x 2.93 GHz Intel Xeon X5570 processors (2 processors / 8 cores / 16 threads)
      48 GB memory
      Sun Storage 6780 with 48 x 73GB 15KRPM 4Gb FC-AL and 16 x 146GB 15KRPM 4Gb FC-AL Drives

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle 10g

Certified Results:

          Performance: 3800 benchmark users
          SAP Certification: 2009033

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Key Points and Best Practices

  • Set up the storage (LSI OEM) to present the needed raw devices directly from the array; do not use any software layer in between.

See Also

Benchmark Tags

World-Record, Performance, SAP-SD, Solaris, Oracle, Intel, X64, x86, HP, IBM, Application, Database

Disclosure Statement

    Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 08/21/09: Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,800 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009033. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,415 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009031. Fujitsu PRIMERGY TX/RX 300 S5 (2 processors, 8 cores, 16 threads) 3,328 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009014. HP ProLiant BL460c G6 (2 processors, 8 cores, 16 threads) 3,310 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009003. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,300 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009004. Fujitsu PRIMERGY BX920 S1 (2 processors, 8 cores, 16 threads) 3,260 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009024. NEC Express5800 (2 processors, 8 cores, 16 threads) 3,250 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009027. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 3,171 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, MaxDB 7.8, SuSE Linux Enterprise Server 10, Cert# 2009006. IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

    SAP, R/3, reg TM of SAP AG in Germany and other countries. More info: www.sap.com/benchmark

Wednesday Jul 22, 2009

Why does 1.6 beat 4.7?

Sun has upgraded the UltraSPARC T2 and UltraSPARC T2 Plus processors to 1.6 GHz. As described in some detail in yesterday's post, new results show SPEC CPU2006 performance improvements vs. previous systems that often exceed the clock speed improvement.  The scaling can be attributed to both memory system improvements and software improvements, such as the Sun Studio 12 Update 1 compiler.

A MHz improvement within a product line is often useful. If yesterday's chip runs at speed n and today's at n*1.12 then, intuitively, sure, I'll take today's.

Comparing MHz across product lines is often counter-intuitive.  Consider that Sun's new systems provide:

  • up to 68% more throughput than the 4.7 GHz POWER6+ [1], and
  • up to 3x the throughput of the Itanium 9150N [2].

The comparisons are particularly striking when one takes into account the cache size advantage for both the POWER6+ and the Itanium 9150N, and the MHz advantage for the POWER6+:

Processor | GHz | HW cache levels | Size of last cache (per chip) | SPECint_rate_base2006
UltraSPARC T2 / UltraSPARC T2 Plus | 1.6 | 2 | 4 MB | 1 chip: 89; 2 chips: 171; 4 chips: 338
POWER6+ | 4.7 | 3 | 32 MB | Best 2-chip result: 102. UltraSPARC T2 Plus delivers 68% more integer throughput [1]
Itanium 9150N | 1.6 | 3 | 24 MB | Best 4-chip result: 114. UltraSPARC T2 Plus delivers 3x the integer throughput [2]

These are per-chip results, not per-core or per-thread. Sun's CMT processors are designed for overall system throughput: how much work the overall system can get done.

A mystery: With comparatively smaller caches and modest clock rates, why do the Sun CMT processors win?

The performance hole: Memory latency. From the point of view of a CPU chip, the big performance problem is that memory latency is inordinately long compared to chip cycle times.

A hardware designer can attempt to cover up that latency with very large caches, as in the POWER6+ and Itanium, and this works well when running a small number of modest-sized applications. Large caches become less helpful, though, as workloads become more complex.

MHz isn't everything. In fact, MHz hardly counts at all when the problem is memory latency. Suppose the hot part of an application looks like this:

  loop:
       computational instruction
       computational instruction
       computational instruction
       memory access instruction
       branch to loop

For an application that looks like this, the computational instructions may complete in only a few cycles, while the memory access instruction may easily require on the order of 100ns - which, for a 1 GHz chip, is on the order of 100 cycles. If the processor speed is increased by a factor of 4, but memory speed is not, then memory is still 100ns away, and when measured in cycles, it is now 400 cycles distant. The overall loop hardly speeds up at all.
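
A rough, generic way to observe this effect (not part of any of the benchmarks above) is a pointer-chasing loop in C: every load depends on the previous one, so the processor spends nearly all of its time waiting on memory, and raising the clock rate barely changes the result. The array size and the random-cycle construction below are illustrative choices.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (32u * 1024u * 1024u)   /* 32M pointers (~256 MB): far bigger than any cache */

    int main(void)
    {
        size_t *next = malloc((size_t)N * sizeof *next);
        if (next == NULL)
            return 1;

        /* Sattolo's algorithm builds one random cycle through the array, so
         * every load depends on the previous load and hardware prefetchers
         * get no usable pattern. */
        for (size_t i = 0; i < N; i++)
            next[i] = i;
        srand(1);
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
        }

        /* The "hot loop" from the text: one dependent memory access per
         * iteration and almost no computation, so its speed is set by memory
         * latency, not by the processor's clock rate. */
        clock_t t0 = clock();
        size_t p = 0;
        for (size_t i = 0; i < N; i++)
            p = next[p];
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        printf("avg time per dependent load: %.1f ns (p=%zu)\n",
               secs * 1e9 / N, p);
        free(next);
        return 0;
    }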

Lest the reader think I am making this up - consider page 8 of this IBM talk from April, 2008 regarding the POWER6:

[Chart from the IBM presentation: POWER6 cache and memory latencies.]

The IBM POWER systems have some impressive performance characteristics - if your application is tiny enough to fit in its first or second level cache. But memory latency is not impressive. If your workload requires multiple concurrent threads accessing a large memory space, Sun's CMT approach just might be a better fit.

Operating System Overhead: A context switch from one process to another is mediated by operating system services. The OS parks the context of the currently running process - typically saving dozens of program registers and other context (such as virtual address space information); decides which process to run next (which may require access to several OS data structures); and loads the context for the new process (registers, virtual address context, etc.). If the system is running many processes, then caches are unlikely to be helpful during this context switch, and thousands of cycles may be spent on main memory accesses. The sketch below gives a rough feel for that cost.
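
One crude way to put a number on that overhead (again illustrative, not from any of the cited benchmarks) is to bounce one byte between two processes over a pair of pipes and time the round trips; each round trip includes two scheduler hand-offs plus the system-call work described above. On an idle multi-core machine the two processes may land on different cores, so treat the figure as a rough indication of scheduling cost rather than a precise context-switch latency.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define ROUNDS 100000

    int main(void)
    {
        int ping[2], pong[2];
        char b = 'x';

        if (pipe(ping) != 0 || pipe(pong) != 0)
            return 1;

        pid_t pid = fork();
        if (pid < 0)
            return 1;
        if (pid == 0) {                        /* child: echo every byte back */
            for (int i = 0; i < ROUNDS; i++) {
                if (read(ping[0], &b, 1) != 1) _exit(1);
                if (write(pong[1], &b, 1) != 1) _exit(1);
            }
            _exit(0);
        }

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < ROUNDS; i++) {     /* parent: send, wait for echo */
            if (write(ping[1], &b, 1) != 1) return 1;
            if (read(pong[0], &b, 1) != 1) return 1;
        }
        gettimeofday(&t1, NULL);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("avg round trip (two hand-offs): %.2f microseconds\n", us / ROUNDS);

        waitpid(pid, NULL, 0);
        return 0;
    }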

Design for throughput: Sun's CMT approach handles the complexity of real-world applications by allowing up to 64 processes to be simultaneously on-chip. When a long-latency stall occurs, such as an access to main memory, the chip switches to executing instructions on behalf of other, non-stalled threads, thus improving overall system throughput. No operating system intervention is required as resources are shared among the processes on the chip.

[1] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090427-07263.html
[2] http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090522-07485.html

Competitive results retrieved from www.spec.org   20 July 2009.  Sun's CMT results have been submitted to SPEC.  SPEC, SPECfp, SPECint are registered trademarks of the Standard Performance Evaluation Corporation.

Tuesday Jul 21, 2009

Sun T5440 Oracle BI EE World Record Performance

Oracle BI EE Sun SPARC Enterprise T5440 World Record Performance

The Sun SPARC Enterprise T5440 server running the new 1.6 GHz UltraSPARC T2 Plus processor delivered world record performance on Oracle Business Intelligence Enterprise Edition (BI EE) tests using Sun's ZFS.
  • The Sun SPARC Enterprise T5440 server with four 1.6 GHz UltraSPARC T2 Plus processors delivered the best single system performance of 28K concurrent users on the Oracle BI EE benchmark. This result used Solaris 10 with Solaris Containers and the Oracle 11g Database software.

  • The benchmark demonstrates the scalability of an Oracle Business Intelligence Cluster with 4 nodes running in Solaris Containers within a single Sun SPARC Enterprise T5440 server.

  • The Sun SPARC Enterprise T5440 server with internal SSDs and the ZFS file system showed significant I/O performance improvement over traditional disk for Business Intelligence Web Catalog activity.

Performance Landscape

System | Chips | Cores | Threads | GHz | Processor Type | Users
1 x Sun SPARC Enterprise T5440 | 4 | 32 | 256 | 1.6 | UltraSPARC T2 Plus | 28,000
5 x Sun Fire T2000 | 1 | 8 | 32 | 1.2 | UltraSPARC T1 | 10,000

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise T5440
      4 x 1.6 GHz UltraSPARC T2 Plus processors
      256 GB
      STK2540 (6 x 146GB)

Software Configuration:

    Solaris 10 5/09
    Oracle BIEE 10.1.3.4 64-bit
    Oracle 11g R1 Database

Benchmark Description

The objective of this benchmark is to highlight how Oracle BI EE can support pervasive deployments in large enterprises, using minimal hardware, by simulating an organization that needs to support more than 25,000 active concurrent users, each operating in mixed mode: ad-hoc reporting, application development, and report viewing.

The user population was divided into a mix of administrative users and business users. A maximum of 28,000 concurrent users were actively interacting and working in the system during the steady-state period. The tests executed 580 transactions per second, with think times of 60 seconds per user, between requests. In the test scenario 95% of the workload consisted of business users viewing reports and navigating within dashboards. The remaining 5% of the concurrent users, categorized as administrative users, were doing application development.

The benchmark scenario used a typical business user sequence of dashboard navigation, report viewing, and drill down. For example, a Service Manager logs into the system and navigates to his own set of dashboards, viz. "Service Manager". The user then selects the "Service Effectiveness" dashboard, which shows him four distinct reports, "Service Request Trend", "First Time Fix Rate", "Activity Problem Areas", and "Cost Per Completed Service Call 2002 till 2005". The user then proceeds to view the "Customer Satisfaction" dashboard, which also contains a set of 4 related reports. He then proceeds to drill down on some of the reports to see the detail data. Then the user proceeds to more dashboards, for example "Customer Satisfaction" and "Service Request Overview". After navigating through these dashboards, he logs out of the application.

This benchmark did not use a synthetic database schema. The benchmark tests were run on a full production version of the Oracle Business Intelligence Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a true customer scenario.

Key Points and Best Practices

Since the server has 32 cores, we created 4 Solaris Containers with 8 cores dedicated to each container. A total of four instances of BI server plus Presentation server (collectively referred to as an 'instance' from here on) were installed, one instance per container. All four BI instances were clustered using the BI Cluster software components.

The ZFS file system was used to overcome the 'Too many links' error seen with ~28,000 concurrent users. Earlier runs hit the UFS limit of 32,767 sub-directories per directory (LINK_MAX) with ~28K users online, producing thousands of errors because no new directories could be created beyond that point. The Web Catalog stores each user profile on disk by creating at least one dedicated directory per user, so with more than 25,000 concurrent users, ZFS is clearly the way to go. A minimal probe of the UFS limit follows.
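
The UFS ceiling is easy to reproduce with a small C probe (hypothetical, not part of the benchmark setup; the /tmp/catalogtest path is just an example): each subdirectory's '..' entry adds a hard link to its parent, so on UFS mkdir() starts failing with EMLINK ("Too many links") once the parent's link count reaches LINK_MAX (32,767), while on ZFS the same loop runs far past that point.

    #include <stdio.h>
    #include <errno.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    /* Create numbered subdirectories under one parent until mkdir() fails.
     * On UFS this stops with "Too many links" (EMLINK) near 32,765
     * subdirectories; on ZFS it reaches the loop limit instead. */
    int main(int argc, char **argv)
    {
        const char *parent = (argc > 1) ? argv[1] : "/tmp/catalogtest";
        char path[1024];
        long i;

        (void) mkdir(parent, 0755);          /* ignore error if it already exists */

        for (i = 0; i < 100000; i++) {
            (void) snprintf(path, sizeof path, "%s/user%ld", parent, i);
            if (mkdir(path, 0755) != 0) {
                printf("mkdir failed after %ld subdirectories: %s\n",
                       i, strerror(errno));
                return 1;
            }
        }
        printf("created %ld subdirectories without hitting a limit\n", i);
        return 0;
    }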

See Also:

Oracle Business Intelligence Website,  BUSINESS INTELLIGENCE has other results

Disclosure Statement

Oracle Business Intelligence Enterprise Edition benchmark, see http://www.oracle.com/solutions/business_intelligence/resource-library-whitepapers.html for more. Results as of 7/20/09.

Sun T5440 World Record SAP-SD 4-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode)

Sun SPARC Enterprise T5440 Server World Record Four Processor performance result on Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark

  • World Record performance result with four processors on the two-tier SAP ERP 6.0 enhancement pack 4 (unicode) standard sales and distribution (SD) benchmark as of July 21, 2009.
  • The Sun SPARC Enterprise T5440 Server with four 1.6 GHz UltraSPARC T2 Plus processors (32 cores, 256 threads) achieved 4,720 SAP SD Benchmark users running the SAP ERP application release 6.0 enhancement pack 4 benchmark with unicode software, using the Oracle10g database and Solaris 10 OS.
  • Sun SPARC Enterprise T5440 Server with four 1.6GHz UltraSPARC T2 Plus processors beats IBM System 550 by 26% using Oracle10g and Solaris 10 even though they both use the same number of processors.
  • Sun SPARC Enterprise T5440 Server with four 1.6GHz UltraSPARC T2 Plus processors beats HP ProLiant DL585 G6 using Oracle10g and Solaris 10 with the same number of processors.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun SPARC Enterprise servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher cpu requirements and so yields from 25-50% fewer users compared to the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark. 10-30% of this is due to the extra overhead from the processing of the larger character strings due to Unicode encoding. Refer to SAP Note for more details (https://service.sap.com/sap/support/notes/1139642 Note: User and password for SAP Service Marketplace required).
  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters meaning each was just 1 byte. The new version of the benchmark requires Unicode characters and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and use more bandwidth and storage for most character strings. Refer to SAP Note for more details (https://service.sap.com/sap/support/notes/1139642 Note: User and password for SAP Service Marketplace required).

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System | OS / Database | Users | SAP ERP/ECC Release | SAPS | SAPS/Proc | Date
Sun SPARC Enterprise T5440 Server, 4 x UltraSPARC T2 Plus @ 1.6 GHz, 256 GB | Solaris 10, Oracle10g | 4,720 | 2009 6.0 EP4 (Unicode) | 25,830 | 6,458 | 21-Jul-09
HP ProLiant DL585 G6, 4 x AMD Opteron 8439 SE @ 2.8 GHz, 64 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 4,665 | 2009 6.0 EP4 (Unicode) | 25,530 | 6,383 | 10-Jul-09
HP ProLiant BL685c G6, 4 x AMD Opteron 8435 @ 2.6 GHz, 64 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 4,422 | 2009 6.0 EP4 (Unicode) | 24,230 | 6,058 | 29-May-09
IBM System 550, 4 x Power6 @ 5 GHz, 64 GB | AIX 6.1, DB2 9.5 | 3,752 | 2009 6.0 EP4 (Unicode) | 20,520 | 5,130 | 16-Jun-09
HP ProLiant DL585 G5, 4 x AMD Opteron 8393 SE @ 3.1 GHz, 64 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,430 | 2009 6.0 EP4 (Unicode) | 18,730 | 4,683 | 24-Apr-09
HP ProLiant BL685 G6, 4 x AMD Opteron 8389 @ 2.9 GHz, 64 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 3,118 | 2009 6.0 EP4 (Unicode) | 17,050 | 4,263 | 24-Apr-09
NEC Express5800, 4 x Intel Xeon X7460 @ 2.66 GHz, 64 GB | Windows Server 2008 Enterprise Edition, SQL Server 2008 | 2,957 | 2009 6.0 EP4 (Unicode) | 16,170 | 4,043 | 28-May-09
Dell PowerEdge M905, 4 x AMD Opteron 8384 @ 2.7 GHz, 96 GB | Windows Server 2003 Enterprise Edition, SQL Server 2005 | 2,129 | 2009 6.0 EP4 (Unicode) | 11,770 | 2,943 | 18-May-09

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Results and Configuration Summary

Hardware Configuration:

    One Sun SPARC Enterprise T5440 Server
      4 x 1.6 GHz UltraSPARC T2 Plus processors (4 processors / 32 cores / 256 threads)
      256 GB memory
      3 x STK2540 each with 12 x 73GB/15KRPM disks

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    Oracle10g

Sun's SAE (Strategic Applications Engineering) and ISV-E (ISV Engineering) groups have submitted the following result for the SAP-SD 2-Tier benchmark. It was approved and published by SAP.

Certified Results

    Performance: 4,720 benchmark users
    SAP Certification: 2009026

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Sun SPARC Enterprise T5440 Server Benchmark Details

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 07/21/09: Sun SPARC Enterprise T5440 Server (4 processors, 32 cores, 256 threads) 4,720 SAP SD Users, 4x 1.6 GHz UltraSPARC T2 Plus, 256 GB memory, Oracle10g, Solaris10, Cert# 2009026. HP ProLiant DL585 G6 (4 processors, 24 cores, 24 threads) 4,665 SAP SD Users, 4x 2.8 GHz AMD Opteron Processor 8439 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009025. HP ProLiant BL685c G6 (4 processors, 24 cores, 24 threads) 4,422 SAP SD Users, 4x 2.6 GHz AMD Opteron Processor 8435, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009021. IBM System 550 (4 processors, 8 cores, 16 threads) 3,752 SAP SD Users, 4x 5 GHz Power6, 64 GB memory, DB2 9.5, AIX 6.1, Cert# 2009023. HP ProLiant DL585 G5 (4 processors, 16 cores, 16 threads) 3,430 SAP SD Users, 4x 3.1 GHz AMD Opteron Processor 8393 SE, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009008. HP ProLiant BL685 G6 (4 processors, 16 cores, 16 threads) 3,118 SAP SD Users, 4x 2.9 GHz AMD Opteron Processor 8389, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009007. NEC Express5800 (4 processors, 24 cores, 24 threads) 2,957 SAP SD Users, 4x 2.66 GHz Intel Xeon Processor X7460, 64 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009018. Dell PowerEdge M905 (4 processors, 16 cores, 16 threads) 2,129 SAP SD Users, 4x 2.7 GHz AMD Opteron Processor 8384, 96 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2009017. Sun Fire X4600M2 (8 processors, 32 cores, 32 threads) 7,825 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384, 128 GB memory, MaxDB 7.6, Solaris 10, Cert# 2008070. IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071. SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

1.6 GHz SPEC CPU2006 - Rate Benchmarks

UltraSPARC T2 and T2 Plus Systems

Improved Performance Over 1.4 GHz

Reported 07/21/09

Significance of Results

Results are presented for the SPEC CPU2006 rate benchmarks run on systems based on the new 1.6 GHz Sun UltraSPARC T2 and UltraSPARC T2 Plus processors. The new processors were tested in the Sun CMT family of systems, including the Sun SPARC Enterprise T5120, T5220, T5240 and T5440 servers and the Sun Blade T6320 server module.

SPECint_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 57% and 37% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5240 server equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 68% and 48% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 integer throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor-based Sun CMT servers produced 59% to 68% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 integer throughput metrics.

  • On the four-chip Sun SPARC Enterprise T5440 server, when compared versus the 1.4 GHz version of this server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 25% and 20% as measured by the SPEC CPU2006 integer throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into the 2-chip Sun SPARC Enterprise T5240 server, delivered improvements of 20% and 17% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server, as measured by the SPEC CPU2006 integer throughput metrics.

  • On the single-chip Sun Blade T6320 server module, Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered performance improvements of 13% to 17% over the 1.4 GHz version of these servers, as measured by the SPEC CPU2006 integer throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a SPECint_rate_base2006 score 3X the best 4-chip Itanium based system.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processors, delivered a SPECint_rate_base2006 score of 338, a World Record score for 4-chip systems running a single operating system instance (i.e. SMP, not clustered).

SPECfp_rate2006

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered 35% and 22% better results than the best 4-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5240 server, equipped with two 1.6 GHz UltraSPARC T2 Plus processor chips, produced 40% and 27% better results than the best 2-chip IBM POWER6+ based systems on the SPEC CPU2006 floating-point throughput metrics.

  • The single-chip 1.6 GHz UltraSPARC T2 processor based Sun CMT servers produced between 18% and 24% better results than the best single-chip IBM POWER6 based systems on the SPEC CPU2006 floating-point throughput metrics.

  • On the four chip Sun SPARC Enterprise T5440 server, the new 1.6 GHz UltraSPARC T2 Plus processor delivered performance improvements of 20% and 17% when compared to 1.4 GHz processors in the same system, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The new 1.6 GHz UltraSPARC T2 Plus processor, when put into a Sun SPARC Enterprise T5240 server, delivered an improvement of 12% when compared to the 1.4 GHz UltraSPARC T2 Plus processor based server as measured by the SPEC CPU2006 floating-point throughput metrics.

  • On the single-processor Sun Blade T6320 server module and the Sun SPARC Enterprise T5120 and T5220 servers, the new 1.6 GHz UltraSPARC T2 processor delivered a performance improvement of 10% to 11% over the 1.4 GHz version of these servers, as measured by the SPEC CPU2006 floating-point throughput metrics.

  • The Sun SPARC Enterprise T5440 server, equipped with four 1.6 GHz UltraSPARC T2 Plus processor chips, delivered a peak score 3x, and a base score 2.9x, that of the best 4-chip Itanium based system on the SPEC CPU2006 floating-point throughput metrics.

Performance Landscape

SPEC CPU2006 Performance Charts - bigger is better. Selected results are shown; please see www.spec.org for complete results. All results as of 7/17/09.

In the tables below
"Base" = SPECint_rate_base2006 or SPECfp_rate_base2006
"Peak" = SPECint_rate2006 or SPECfp_rate2006

SPECint_rate2006 results - 1 chip systems

System                  Cores/Chips  Processor Type   MHz   Base Copies  Base  Peak  Comments
Supermicro X8DAI        4/1          Xeon W3570       3200  8            127   136   Best Nehalem result
HP ProLiant BL465c G6   6/1          Opteron 2435     2600  6            82.1  104   Best Istanbul result
Sun SPARC T5220         8/1          UltraSPARC T2    1582  63           89.1  97.0  New
Sun SPARC T5120         8/1          UltraSPARC T2    1582  63           89.1  97.0  New
Sun Blade T6320         8/1          UltraSPARC T2    1582  63           89.2  96.7  New
Sun Blade T6320         8/1          UltraSPARC T2    1417  63           76.4  85.5
Sun SPARC T5120         8/1          UltraSPARC T2    1417  63           76.2  83.9
IBM System p 570        2/1          POWER6           4700  4            53.2  60.9  Best POWER6 result

SPECint_rate2006 - 2 chip systems

System                  Cores/Chips  Processor Type      MHz   Base Copies  Base  Peak  Comments
Fujitsu CELSIUS R670    8/2          Xeon W5580          3200  16           249   267   Best Nehalem result
Sun Blade X6270         8/2          Xeon X5570          2933  16           223   260
A+ Server 1021M-UR+B    12/2         Opteron 2439 SE     2800  12           168   215   Best Istanbul result
Sun SPARC T5240         16/2         UltraSPARC T2 Plus  1582  127          171   183   New
Sun SPARC T5240         16/2         UltraSPARC T2 Plus  1415  127          142   157
IBM Power 520           4/2          POWER6+             4700  8            101   124   Best POWER6+ peak
IBM Power 520           4/2          POWER6+             4700  8            102   122   Best POWER6+ base
HP Integrity rx2660     4/2          Itanium 9140M       1666  4            58.1  62.8  Best Itanium peak
HP Integrity BL860c     4/2          Itanium 9140M       1666  4            61.0  na    Best Itanium base

SPECint_rate2006 - 4 chip systems

System                  Cores/Chips  Processor Type      MHz   Base Copies  Base  Peak  Comments
SGI Altix ICE 8200EX    16/4         Xeon X5570          2933  32           466   499   Best Nehalem result (clustered, not SMP)
Tyan Thunder n4250QE    24/4         Opteron 8439 SE     2800  24           326   417   Best Istanbul result
Sun SPARC T5440         32/4         UltraSPARC T2 Plus  1596  255          338   360   New. World record 4-chip SMP SPECint_rate_base2006
Sun SPARC T5440         32/4         UltraSPARC T2 Plus  1414  255          270   301
IBM Power 550           8/4          POWER6+             5000  16           215   263   Best POWER6+ result
HP Integrity BL870c     8/4          Itanium 9150N       1600  8            114   na    Best Itanium result

SPECfp_rate2006 - 1 chip systems

System                  Cores/Chips  Processor Type   MHz   Base Copies  Base  Peak  Comments
Supermicro X8DAI        4/1          Xeon W3570       3200  8            102   106   Best Nehalem result
HP ProLiant BL465c G6   6/1          Opteron 2435     2600  6            65.2  72.2  Best Istanbul result
Sun SPARC T5220         8/1          UltraSPARC T2    1582  63           64.1  68.5  New
Sun SPARC T5120         8/1          UltraSPARC T2    1582  63           64.1  68.5  New
Sun Blade T6320         8/1          UltraSPARC T2    1582  63           64.1  68.5  New
Sun Blade T6320         8/1          UltraSPARC T2    1417  63           58.1  62.3
Sun SPARC T5120         8/1          UltraSPARC T2    1417  63           57.9  62.3
Sun SPARC T5220         8/1          UltraSPARC T2    1417  63           57.9  62.3
IBM System p 570        2/1          POWER6           4700  4            51.5  58.0  Best POWER6 result

SPECfp_rate2006 - 2 chip systems

System                  Cores/Chips  Processor Type      MHz   Base Copies  Base  Peak  Comments
ASUS TS700-E6           8/2          Xeon W5580          3200  16           201   207   Best Nehalem result
A+ Server 1021M-UR+B    12/2         Opteron 2439 SE     2800  12           133   147   Best Istanbul result
Sun SPARC T5240         16/2         UltraSPARC T2 Plus  1582  127          124   133   New
Sun SPARC T5240         16/2         UltraSPARC T2 Plus  1415  127          111   119
IBM Power 520           4/2          POWER6+             4700  8            88.7  105   Best POWER6+ result
HP Integrity rx2660     4/2          Itanium 9140M       1666  4            54.5  55.8  Best Itanium result

SPECfp_rate2006 - 4 chip systems

System                  Cores/Chips  Processor Type      MHz   Base Copies  Base  Peak  Comments
SGI Altix ICE 8200EX    16/4         Xeon X5570          2933  32           361   372   Best Nehalem result
Tyan Thunder n4250QE    24/4         Opteron 8439 SE     2800  24           259   285   Best Istanbul result
Sun SPARC T5440         32/4         UltraSPARC T2 Plus  1596  255          254   270   New
Sun SPARC T5440         32/4         UltraSPARC T2 Plus  1414  255          212   230
IBM Power 550           8/4          POWER6+             5000  16           188   222   Best POWER6+ result
HP Integrity rx7640     8/4          Itanium 2 9040      1600  8            87.4  90.8  Best Itanium result

Results and Configuration Summary

Test Configurations:


Sun Blade T6320
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5120/T5220
1.6 GHz UltraSPARC T2
64 GB (16 x 4GB)
Solaris 10 10/08
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5240
2 x 1.6 GHz UltraSPARC T2 Plus
128 GB (32 x 4GB)
Solaris 10 5/09
Sun Studio 12, Sun Studio 12 Update 1, gccfss V4.2.1

Sun SPARC Enterprise T5440
4 x 1.6 GHz UltraSPARC T2 Plus
256 GB (64 x 4GB)
Solaris 10 5/09
Sun Studio 12 Update 1, gccfss V4.2.1

Results Summary:



                        T6320   T5120   T5220   T5240   T5440
SPECint_rate_base2006   89.2    89.1    89.1    171     338
SPECint_rate2006        96.7    97.0    97.0    183     360
SPECfp_rate_base2006    64.1    64.1    64.1    124     254
SPECfp_rate2006         68.5    68.5    68.5    133     270

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark, with over 7000 results published in the three years since it was introduced. It measures:

  • "Speed" - single copy performance of chip, memory, compiler
  • "Rate" - multiple copy (throughput)

The rate metrics are used for the throughput-oriented systems described on this page. These metrics include:

  • SPECint_rate2006: throughput for 12 integer benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • SPECfp_rate2006: throughput for 17 floating point benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

There are "base" variants of both the above metrics that require more conservative compilation, such as using the same flags for all benchmarks.

See www.spec.org/cpu2006/ for additional information.

Key Points and Best Practices

Results on this page for the Sun SPARC Enterprise T5120 server were measured on a Sun SPARC Enterprise T5220. The Sun SPARC Enterprise T5120 and Sun SPARC Enterprise T5220 are electronically equivalent. A Sun SPARC Enterprise T5120 can hold up to 4 disks, and a T5220 can hold up to 8. This system was tested with 4 disks; therefore, the results on this page apply to both the T5120 and the T5220.

Know when you need throughput vs. speed. The Sun CMT systems described on this page provide massive throughput, as demonstrated by the fact that up to 255 jobs are run on the 4-chip system, 127 on the 2-chip system, and 63 on the 1-chip systems. Some of the competing chips do have a speed advantage - e.g. Nehalem and Istanbul - but none of the competing results run anywhere near as many concurrent jobs as Sun's CMT systems (see the sketch below).
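
As a rough sketch of where those copy counts come from (assuming what the configurations on this page imply: 8 cores per UltraSPARC T2 / T2 Plus chip, with 8 hardware threads per core):

    # Back-of-the-envelope sketch: hardware threads available vs. the SPEC
    # rate copies actually run in the submissions on this page (63, 127, 255).
    threads_per_chip = 8 * 8   # 8 cores x 8 threads per core = 64

    for chips, copies_run in [(1, 63), (2, 127), (4, 255)]:
        hw_threads = chips * threads_per_chip
        print(f"{chips}-chip system: {hw_threads} hardware threads, "
              f"{copies_run} benchmark copies run")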

Use the latest compiler. The Sun Studio group is always working to improve the compiler. Sun Studio 12, and Sun Studio 12 Update 1, which are used in these submissions, provide updated code generation for a wide variety of SPARC and x86 implementations.

I/O still counts. Even in a CPU-intensive workload, some I/O remains. This point is explored in some detail at http://blogs.sun.com/jhenning/entry/losing_my_fear_of_zfs.

Disclosure Statement

SPEC, SPECint, SPECfp reg tm of Standard Performance Evaluation Corporation. Competitive results from www.spec.org as of 16 July 2009.  Sun's new results quoted on this page have been submitted to SPEC.
Sun Blade T6320 89.2 SPECint_rate_base2006, 96.7 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 89.1 SPECint_rate_base2006, 97.0 SPECint_rate2006, 64.1 SPECfp_rate_base2006, 68.5 SPECfp_rate2006;
Sun SPARC Enterprise T5240 172 SPECint_rate_base2006, 183 SPECint_rate2006, 124 SPECfp_rate_base2006, 133 SPECfp_rate2006;
Sun SPARC Enterprise T5440 338 SPECint_rate_base2006, 360 SPECint_rate2006, 254 SPECfp_rate_base2006, 270 SPECfp_rate2006;
Sun Blade T6320 76.4 SPECint_rate_base2006, 85.5 SPECint_rate2006, 58.1 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5220/T5120 76.2 SPECint_rate_base2006, 83.9 SPECint_rate2006, 57.9 SPECfp_rate_base2006, 62.3 SPECfp_rate2006;
Sun SPARC Enterprise T5240 142 SPECint_rate_base2006, 157 SPECint_rate2006, 111 SPECfp_rate_base2006, 119 SPECfp_rate2006;
Sun SPARC Enterprise T5440 270 SPECint_rate_base2006, 301 SPECint_rate2006, 212 SPECfp_rate_base2006, 230 SPECfp_rate2006;
IBM p 570 53.2 SPECint_rate_base2006, 60.9 SPECint_rate2006, 51.5 SPECfp_rate_base2006, 58.0 SPECfp_rate2006;
IBM Power 520 102 SPECint_rate_base2006, 124 SPECint_rate2006, 88.7 SPECfp_rate_base2006, 105 SPECfp_rate2006;
IBM Power 550 215 SPECint_rate_base2006, 263 SPECint_rate2006, 188 SPECfp_rate_base2006, 222 SPECfp_rate2006;
HP Integrity BL870c 114 SPECint_rate_base2006;
HP Integrity rx7640 87.4 SPECfp_rate_base2006, 90.8 SPECfp_rate2006.

Friday Jul 10, 2009

World Record TPC-H@300GB Price-Performance for Windows on Sun Fire X4600 M2

Significance of Results

Sun and Microsoft combined to deliver World Record price-performance among Windows based results on the TPC-H benchmark at the 300GB scale factor. Running Microsoft's SQL Server 2008 Enterprise database and the Microsoft Windows Server 2008 operating system, the Sun Fire X4600 M2 server delivered 2.80 $/QphH@300GB (USD).

  • The Sun Fire X4600 M2 provides World Record price-performance of 2.80 $/QphH@300GB (USD) among Windows based TPC-H results at the 300GB scale factor. This is 14% better price-performance than the HP DL785 result.
  • The Sun Fire X4600 M2 trails HP's World Record single-system performance (HP: 57,684 QphH@300GB, Sun: 55,158 QphH@300GB) by less than 5%.
  • The Sun/SQL Server solution used fewer disks for the database (168) than the other top performers at the 300GB scale factor.
  • IBM required 79% more disks (300 total) than Sun for a result of 46,034 QphH@300GB, which Sun's QphH exceeds by 20%.
  • HP required 21% more disks (204 total) than Sun and achieved 3.24 $/QphH@300GB (USD), which is 16% worse than Sun's price-performance (the arithmetic behind these comparisons is sketched just after this list).
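
The comparisons above can be reproduced directly from the figures in the Performance Landscape table; a minimal sketch of the arithmetic (illustrative only, using the published results):

    # Sketch: recompute the comparison percentages quoted above from the
    # published TPC-H@300GB results listed in the Performance Landscape.
    sun = {"qphh": 55_158, "price_perf": 2.80, "disks": 168}
    hp  = {"qphh": 57_684, "price_perf": 3.24, "disks": 204}
    ibm = {"qphh": 46_034, "price_perf": 5.40, "disks": 300}

    # Sun trails HP's single-system performance by less than 5%
    print(round(100 * (hp["qphh"] - sun["qphh"]) / hp["qphh"], 1))                  # ~4.4

    # price/performance: Sun is ~14% better than HP; HP is ~16% worse than Sun
    print(round(100 * (hp["price_perf"] - sun["price_perf"]) / hp["price_perf"]))   # 14
    print(round(100 * (hp["price_perf"] - sun["price_perf"]) / sun["price_perf"]))  # 16

    # extra database disks relative to Sun's 168
    print(round(100 * (ibm["disks"] - sun["disks"]) / sun["disks"]))                # 79
    print(round(100 * (hp["disks"] - sun["disks"]) / sun["disks"]))                 # 21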

This is Sun's first published TPC-H SQL Server benchmark.

Performance Landscape

ch/co/th = chips, cores, threads
$/QphH = TPC-H Price/Performance metric (smaller is better)

System             ch/co/th  Processor              Database         QphH    $/QphH  Price     Disks  Available
Sun Fire X4600 M2  8/32/32   2.7 GHz Opteron 8384   SQL Server 2008  55,158  2.80    $154,284  168    07/06/09
HP DL785           8/32/32   2.7 GHz Opteron 8384   SQL Server 2008  57,684  3.24    $186,700  204    11/17/08
IBM x3950 M2       8/32/32   2.93 GHz Intel X7350   SQL Server 2005  46,034  5.40    $248,635  300    03/07/08

Complete benchmark results may be found at the TPC benchmark website http://www.tpc.org.

Results and Configuration Summary

Server:

    Sun Fire X4600 M2 with:
      8 x AMD Opteron 8384, 2.7 GHz QC processors
      256 GB memory
      3 x 73GB (15K RPM) internal SAS disks

Storage:

    14 x Sun Storage J4200 each consisting of 12 x 146GB 15,000 RPM SAS disks

Software:

    Operating System: Microsoft Windows Server 2008 Enterprise x64 Edition SP1
    Database Manager: SQL Server 2008 Enterprise x64 Edition SP1

Audited Results:

    Database Size: 300GB (Scale Factor)
    TPC-H Composite: 55,157.5 QphH@300GB
    Price/performance: $2.80 / QphH@300GB (USD)
    Available: July 6, 2009
    Total 3 Year Cost: $154,284.19 (USD)
    TPC-H Power: 67,095.6
    TPC-H Throughput: 45,343.5
    Database Load Time: 17 hours 29 minutes
    Storage Ratio: 76.82

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. The benchmark's queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multi user modes. The benchmark requires reporting of price/performance, which is the ratio of QphH to total HW/SW cost plus 3 years maintenance. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
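
As an illustration, and assuming the standard TPC-H definition of the composite metric as the geometric mean of the Power and Throughput figures, the audited numbers reported above for this result reproduce the headline metrics:

    # Sketch: derive the headline metrics from the audited figures above,
    # assuming QphH@300GB = sqrt(Power@300GB x Throughput@300GB).
    from math import sqrt

    power      = 67_095.6     # TPC-H Power@300GB
    throughput = 45_343.5     # TPC-H Throughput@300GB
    price      = 154_284.19   # total 3-year cost of ownership (USD)

    qphh = sqrt(power * throughput)
    print(round(qphh, 1))           # ~55157.5 QphH@300GB
    print(round(price / qphh, 2))   # ~2.8 $/QphH@300GB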

Key Points and Best Practices

SQL Server 2008 is able to take advantage of the lower latency that local memory access provides on the Sun Fire X4600 M2 server. This was achieved by setting the NUMA initialization parameter so that all NUMA optimizations were enabled.

Enabling the Windows large-page feature provided a significant performance improvement. Because SQL Server 2008 manages its own memory buffer, backing that buffer with large pages yielded a noticeable gain. Note that to use large pages, an application must be granted the Windows large-page (Lock Pages in Memory) privilege.
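
As a minimal, hypothetical illustration of the Windows large-page mechanism itself (independent of SQL Server's internals, and only meaningful on Windows), a process whose account holds the Lock Pages in Memory privilege can request large pages through VirtualAlloc:

    # Sketch: attempt one large-page allocation on Windows via ctypes.
    # Fails with ERROR_PRIVILEGE_NOT_HELD (1314) if the account lacks the
    # "Lock Pages in Memory" privilege.
    import ctypes
    from ctypes import wintypes

    MEM_COMMIT      = 0x00001000
    MEM_RESERVE     = 0x00002000
    MEM_LARGE_PAGES = 0x20000000
    PAGE_READWRITE  = 0x04

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetLargePageMinimum.restype = ctypes.c_size_t
    kernel32.VirtualAlloc.restype = ctypes.c_void_p
    kernel32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                      wintypes.DWORD, wintypes.DWORD]

    large_page = kernel32.GetLargePageMinimum()   # typically 2 MB on x64
    buf = kernel32.VirtualAlloc(None, large_page,
                                MEM_COMMIT | MEM_RESERVE | MEM_LARGE_PAGES,
                                PAGE_READWRITE)
    if buf:
        print(f"allocated one {large_page}-byte large page at {buf:#x}")
    else:
        print("large-page allocation failed, error", ctypes.get_last_error())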

The 64-bit Windows OS and 64-bit SQL Server software were able to utilize all 256 GB of memory available on the Sun Fire X4600 M2 server.

Disclosure Statement

TPC-H@300GB: Sun Fire X4600 M2 55,158 QphH@300GB, $2.80/QphH@300GB, availability 7/6/09; HP DL785, 57,684 QphH@300GB, $3.24/QphH@300GB, availability 11/17/08; IBM x3950 M2, 46,034 QphH@300GB, $5.40/QphH@300GB, availability 03/07/08; TPC-H, QphH, $/QphH tm of Transaction Processing Performance Council (TPC). More info www.tpc.org.

Monday Jun 15, 2009

Sun Fire X4600 M2 Server Two-tier SAP ERP 6.0 (Unicode) Standard Sales and Distribution (SD) Benchmark

Significance of Results

  • World Record performance result with 8 processors on the two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) standard Sales and Distribution (SD) benchmark as of June 10, 2009.
  • The Sun Fire X4600 M2 server with 8 AMD Opteron 8384 SE processors (32 cores, 32 threads) achieved 6,050 SAP SD benchmark users running the SAP ERP application release 6.0 Enhancement Pack 4 (Unicode) benchmark, using the MaxDB 7.8 database and the Solaris 10 OS.
  • This benchmark result highlights the optimal performance of SAP ERP on Sun Fire servers running the Solaris OS and the seamless multilingual support available for systems running SAP applications.
  • ZFS is used in this benchmark for the database and log files.
  • The Sun Fire X4600 M2 server beats both the HP ProLiant DL785 G5 and the NEC Express5800 running Windows, by 10% and 35% respectively, even though all three systems use the same number of processors.
  • In January 2009, a new version, the Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark, was released. This new release has higher CPU requirements and so yields 25-50% fewer users than the previous Two-tier SAP ERP 6.0 (non-unicode) Standard Sales and Distribution (SD) Benchmark; 10-30% of the difference is due to the extra overhead of processing the larger character strings that Unicode encoding requires. Refer to the SAP Note for more details (https://service.sap.com/sap/support/notes/1139642; user and password for SAP Service Marketplace required).

  • Unicode is a computing standard that allows for the representation and manipulation of text expressed in most of the world's writing systems. Before the Unicode requirement, this benchmark used ASCII characters, meaning each character took just 1 byte. The new version of the benchmark requires Unicode characters, and the Application layer (where ~90% of the cycles in this benchmark are spent) uses a new encoding, UTF-16, which uses 2 bytes to encode most characters (including all ASCII characters) and 4 bytes for some others. This requires computers to do more computation and to use more bandwidth and storage for most character strings, as the short sketch below illustrates. Refer to the above SAP Note for more details.
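
The byte-count effect is easy to see with a short sketch (the sample order string is hypothetical, not taken from the SAP benchmark kit):

    # Sketch: byte counts for the same text in a 1-byte encoding vs. UTF-16.
    order_text = "ORDER-4711 Widget 25 EA"        # hypothetical ASCII order line

    print(len(order_text.encode("ascii")))        # 23 bytes, 1 byte per character
    print(len(order_text.encode("utf-16-le")))    # 46 bytes, 2 bytes per character

    # characters outside the Basic Multilingual Plane need 4 bytes in UTF-16
    print(len("\N{MATHEMATICAL BOLD CAPITAL A}".encode("utf-16-le")))   # 4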

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(New version of the benchmark as of January 2009)

System                                                         OS / Database                                             Users  SAP ERP/ECC Release        SAPS    SAPS/Proc  Date
Sun Fire X4600 M2, 8x AMD Opteron 8384 SE @2.7GHz, 256 GB      Solaris 10, MaxDB 7.8                                     6,050  2009, 6.0 EP4 (Unicode)    33,230  4,154      10-Jun-09
HP ProLiant DL785 G5, 8x AMD Opteron 8393 SE @3.1GHz, 128 GB   Windows Server 2008 Enterprise Edition, SQL Server 2008   5,518  2009, 6.0 EP4 (Unicode)    30,180  3,772      24-Apr-09
NEC Express 5800, 8x Intel Xeon X7460 @2.66GHz, 256 GB         Windows Server 2008 Datacenter Edition, SQL Server 2008   4,485  2009, 6.0 EP4 (Unicode)    25,280  3,160      09-Feb-09
Sun Fire X4270, 2x Intel Xeon X5570 @2.93GHz, 48 GB            Solaris 10, Oracle 10g                                    3,700  2009, 6.0 EP4 (Unicode)    20,300  10,150     30-Mar-09

SAP ERP 6.0 (non-unicode) Results
(Old version of the benchmark, retired at the end of 2008)

System                                                       OS / Database                               Users  SAP ERP/ECC Release  SAPS    SAPS/Proc  Date
Sun Fire X4600 M2, 8x AMD Opteron 8384 @2.7GHz, 128 GB       Solaris 10, MaxDB 7.6                       7,825  2005, 6.0            39,270  4,909      09-Dec-08
IBM System x3650, 2x Intel Xeon X5570 @2.93GHz, 48 GB        Windows Server 2003 EE, DB2 9.5             5,100  2005, 6.0            25,530  12,765     19-Dec-08
HP ProLiant DL380 G6, 2x Intel Xeon X5570 @2.93GHz, 48 GB    Windows Server 2003 EE, SQL Server 2005     4,995  2005, 6.0            25,000  12,500     15-Dec-08

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.
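
As a quick sanity check on the SAPS/Proc column in the tables above (assuming it is simply the certified SAPS figure divided by the number of processors), the per-processor figures can be recomputed as follows:

    # Sketch: recompute SAPS per processor from the certified SAPS values.
    results = [
        ("Sun Fire X4600 M2",    33_230, 8),
        ("HP ProLiant DL785 G5", 30_180, 8),
        ("NEC Express 5800",     25_280, 8),
        ("Sun Fire X4270",       20_300, 2),
    ]
    for system, saps, processors in results:
        print(f"{system}: {saps / processors:,.0f} SAPS/Proc")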

Results and Configuration Summary

Hardware Configuration:

    One, Sun Fire X4600 M2
      8 x 2.7 GHz AMD Opteron 8384 SE processors (8 processors / 32 cores / 32 threads)
      256 GB memory
      3 x STK2540, 3 x STK2501 each with 12 x 146GB/15KRPM disks

Software Configuration:

    Solaris 10
    SAP ECC Release: 6.0 Enhancement Pack 4 (Unicode)
    MaxDB 7.8

Certified Results

    Performance: 6,050 benchmark users
    SAP Certification: 2009022

Key Points and Best Practices

  • This is the best 8 Processor SAP ERP 6.0 EP4 (Unicode) result as of June 10, 2009.
  • Two-tier SAP ERP 6.0 Enhancement Pack 4 (Unicode) Standard Sales and Distribution (SD) Benchmark on Sun Fire X4600 M2 (8 processors, 32 cores, 32 threads, 8x2.7 GHz AMD Opteron 8384 SE) was able to support 6,050 SAP SD Users on top of the Solaris 10 OS.
  • Since random writes are an important part of this benchmark, we used ZFS to help coalesce them into sequential writes.

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 06/10/09: Sun Fire X4600 M2 (8 processors, 32 cores, 32 threads) 6,050 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384 SE, 256 GB memory, MaxDB 7.8, Solaris 10, Cert# 2009022. HP ProLiant DL785 G5 (8 processors, 32 cores, 32 threads) 5,518 SAP SD Users, 8x 3.1 GHz AMD Opteron 8393 SE, 128 GB memory, SQL Server 2008, Windows Server 2008 Enterprise Edition, Cert# 2009009. NEC Express 5800 (8 processors, 48 cores, 48 threads) 4,485 SAP SD Users, 8x 2.66 GHz Intel Xeon X7460, 256 GB memory, SQL Server 2008, Windows Server 2008 Datacenter Edition, Cert# 2009001. Sun Fire X4270 (2 processors, 8 cores, 16 threads) 3,700 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, Oracle 10g, Solaris 10, Cert# 2009005. Sun Fire X4600M2 (8 processors, 32 cores, 32 threads) 7,825 SAP SD Users, 8x 2.7 GHz AMD Opteron 8384, 128 GB memory, MaxDB 7.6, Solaris 10, Cert# 2008070. IBM System x3650 M2 (2 Processors, 8 Cores, 16 Threads) 5,100 SAP SD users,2x 2.93 Ghz Intel Xeon X5570, DB2 9.5, Windows Server 2003 Enterprise Edition, Cert# 2008079. HP ProLiant DL380 G6 (2 processors, 8 cores, 16 threads) 4,995 SAP SD Users, 2x 2.93 GHz Intel Xeon x5570, 48 GB memory, SQL Server 2005, Windows Server 2003 Enterprise Edition, Cert# 2008071.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
