Monday Oct 26, 2015

Oracle E-Business Order-To-Cash Batch Large: SPARC T7-1 World Record

Oracle's SPARC T7-1 server set a world record running the Oracle E-Business Suite 12.1.3 Standard Large (100,000 Order/Inventory Lines) Order-To-Cash (Batch) workload.

  • The SPARC T7-1 server produced a world record hourly order line throughput of 273,973 (21.90 minutes elapsed time) on the Oracle E-Business Suite R12 (12.1.3) Large Order-To-Cash (Batch) benchmark, using a single SPARC T7-1 server for both the database and application tiers running Oracle Database 11g on Oracle Solaris 11 (the arithmetic is worked out after these bullets).

  • The SPARC T7-1 server demonstrated 12% better hourly order line throughput compared to a two-chip Cisco UCS B200 M4 (Intel Xeon Processor E5-2697 v3).
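
The headline numbers follow directly from the workload definition: the Large model processes a fixed batch of 100,000 order lines, so hourly throughput is just that line count scaled by the elapsed time. A minimal Python check of the figures above (all inputs from this post; the script is illustrative, not part of the benchmark kit):

    # Hourly order-line throughput for the E-Business O2C (Batch) Large model:
    # a fixed 100,000-line batch scaled to a per-hour rate.
    ORDER_LINES = 100_000

    def hourly_throughput(elapsed_minutes: float) -> float:
        return ORDER_LINES * 60.0 / elapsed_minutes

    t7_1 = hourly_throughput(21.90)   # ~273,973 lines/hour
    b200 = hourly_throughput(24.61)   # ~243,803 lines/hour
    print(f"T7-1 advantage: {(t7_1 / b200 - 1) * 100:.0f}%")  # ~12%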

Performance Landscape

Results for the Oracle E-Business 12.1.3 Order-To-Cash Batch Large model workload.

Batch Workload: Order-To-Cash Large Model

System              CPU                                   Order Lines/Hr   Elapsed Time (min)
SPARC T7-1          1 x SPARC M7 processor                273,973          21.90
Cisco UCS B200 M4   2 x Intel Xeon Processor E5-2697 v3   243,803          24.61
Cisco UCS B200 M3   2 x Intel Xeon Processor E5-2690      232,739          25.78

Configuration Summary

Hardware Configuration:

SPARC T7-1 server with
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle ZFS Storage ZS3-2 appliance (DB Data storage) with
40 x 900 GB 10K RPM SAS-2 HDD,
8 x Write Flash Accelerator SSD and
2 x Read Flash Accelerator SSD 1.6TB SAS
Oracle Flash Accelerator F160 PCIe Card (1.6 TB NVMe for DB Log storage)

Software Configuration:

Oracle Solaris 11.3
Oracle E-Business Suite R12 (12.1.3)
Oracle Database 11g (11.2.0.3.0)

Benchmark Description

The Oracle E-Business Suite Standard R12 Benchmark combines online transaction execution by simulated users with concurrent batch processing to model a typical scenario for a global enterprise. This benchmark ran one Batch component, Order-To-Cash, in the Large size.

Results can be published in four sizes and use one or more online/batch modules (the size definitions are summarized in the sketch after this list):

  • X-large: Maximum online users running all business flows between 10,000 and 20,000; 750,000 order to cash lines per hour and 250,000 payroll checks per hour.
    • Order to Cash Online — 2400 users
      • The percentage across the 5 transactions in Order Management module is:
        • Insert Manual Invoice — 16.66%
        • Insert Order — 33.33%
        • Order Pick Release — 16.66%
        • Ship Confirm — 16.66%
        • Order Summary Report — 16.66%
    • HR Self-Service — 4000 users
    • Customer Support Flow — 8000 users
    • Procure to Pay — 2000 users
  • Large: 10,000 online users; 100,000 order to cash lines per hour and 100,000 payroll checks per hour.
  • Medium: up to 3000 online users; 50,000 order to cash lines per hour and 10,000 payroll checks per hour.
  • Small: up to 1000 online users; 10,000 order to cash lines per hour and 5,000 payroll checks per hour.
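
For reference, the size definitions above reduce to a small lookup table, and the Order Management transaction mix should sum to 100%. A sketch of both (the dictionary layout and field names are illustrative, not part of the kit):

    # Published sizes for the E-Business Suite Standard R12 benchmark
    # (order-to-cash lines/hour and payroll checks/hour from the list above).
    SIZES = {
        "X-large": {"o2c_lines_hr": 750_000, "payroll_checks_hr": 250_000},
        "Large":   {"o2c_lines_hr": 100_000, "payroll_checks_hr": 100_000},
        "Medium":  {"o2c_lines_hr":  50_000, "payroll_checks_hr":  10_000},
        "Small":   {"o2c_lines_hr":  10_000, "payroll_checks_hr":   5_000},
    }

    # Order Management online transaction mix (X-large, 2400 users).
    O2C_MIX = {
        "Insert Manual Invoice": 16.66,
        "Insert Order":          33.33,
        "Order Pick Release":    16.66,
        "Ship Confirm":          16.66,
        "Order Summary Report":  16.66,
    }
    assert abs(sum(O2C_MIX.values()) - 100.0) < 0.1  # mix covers all OM activity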

Key Points and Best Practices

  • All system optimizations are described in the published report; see the link in the See Also section below.

See Also

Disclosure Statement

Oracle E-Business Large Order-To-Cash Batch workload, SPARC T7-1, 4.13 GHz, 1 chip, 32 cores, 256 threads, 256 GB memory, elapsed time 21.90 minutes, 273,973 hourly order line throughput, Oracle Solaris 11.3, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 10/25/2015.

Tuesday Mar 26, 2013

SPARC T5-8 Produces TPC-C Benchmark Single-System World Record Performance

Oracle's SPARC T5-8 server equipped with eight 3.6 GHz SPARC T5 processors obtained a result of 8,552,523 tpmC on the TPC-C benchmark. This result is a world record for single servers. Oracle demonstrated this world record database performance running Oracle Database 11g Release 2 Enterprise Edition with Partitioning.

  • The SPARC T5-8 server delivered a single system TPC-C world record of 8,552,523 tpmC with a price performance of $0.55/tpmC using Oracle Database 11g Release 2. This configuration is available 09/25/13.

  • The SPARC T5-8 server has 2.8x better performance than the 4-processor IBM x3850 X5 system equipped with Intel Xeon processors.

  • The SPARC T5-8 server delivers 1.7x the performance compared to the next best eight processor result.

  • The SPARC T5-8 server delivers 2.4x the performance per chip compared to the IBM Power 780 3-node cluster result.

  • The SPARC T5-8 server delivers 1.8x the performance per chip compared to the IBM Power 780 non-clustered result.

  • The SPARC T5-8 server delivers 1.4x the performance per chip compared to the IBM Flex x240 Xeon result.

  • The SPARC T5-8 server delivers 1.7x the performance per chip compared to the Sun Server X2-8 system equipped with Intel Xeon processors.

  • The SPARC T5-8 server demonstrated over 3.1 million 4 KB IOPS with 76% idle in a separate I/O-intensive workload, demonstrating its ability to process a large I/O load with substantial processing headroom.

  • This result showed Oracle's integrated hardware and software stacks provide industry leading performance.

  • The Oracle solution utilized Oracle Solaris 11.1 and Oracle Database 11g Enterprise Edition with Partitioning, demonstrating the stability and performance of this highly secure operating environment while producing world record TPC-C benchmark performance.

Performance Landscape

Select TPC-C results (sorted by tpmC, bigger is better)

System                  p/c/t        tpmC         Price/tpmC   Avail        Database        Memory Size
IBM Power 780 Cluster   24/192/768   10,366,254   1.38 USD     10/13/2010   IBM DB2 9.7     6 TB
SPARC T5-8              8/128/1024   8,552,523    0.55 USD     9/25/2013    Oracle 11g R2   4 TB
IBM Power 595           32/64/128    6,085,166    2.81 USD     12/10/2008   IBM DB2 9.5     4 TB
Sun Server X2-8         8/80/160     5,055,888    0.89 USD     7/10/2012    Oracle 11g R2   4 TB
IBM x3850 X5            4/40/80      3,014,684    0.59 USD     7/11/2011    IBM DB2 9.7     3 TB
IBM Flex x240           2/16/32      1,503,544    0.53 USD     8/16/2012    IBM DB2 9.7     768 GB
IBM Power 780           2/8/32       1,200,011    0.69 USD     10/13/2010   IBM DB2 9.5     512 GB

p/c/t - processors, cores, threads
Avail - availability date

Oracle and IBM TPC-C Response times

System                  tpmC         New Order 90th% (sec)   New Order Average (sec)
IBM Power 780 Cluster   10,366,254   2.100                   1.137
SPARC T5-8              8,552,523    0.410                   0.234
IBM Power 595           6,085,166    1.690                   1.220
IBM Power 780           1,200,011    0.694                   0.403

Oracle uses Response Time New Order Average and Response Time New Order 90th% for comparison between Oracle and IBM.

Graphs of Oracle's and IBM's Response Time New Order Average and Response Time New Order 90th% can be found in the full disclosure reports on TPC's website TPC-C Official Result Page.

Configuration Summary and Results

Hardware Configuration:

Server
SPARC T5-8
8 x 3.6 GHz SPARC T5
4 TB memory
2 x 600 GB 10K RPM SAS2 internal disks
12 x 8 Gb/s FC HBA

Data Storage
54 x Sun Server X3-2L systems configured as COMSTAR heads, each with
2 x 2.4 GHz Intel Xeon E5-2609 processors
16 GB memory
4 x Sun Flash Accelerator F40 PCIe Cards (400 GB each)
12 x 3 TB 7.2K RPM 3.5" SAS disks
2 x 600 GB 10K RPM SAS2 disks
2 x Brocade 6510 switches

Redo Storage
2 x Sun Server X3-2L systems configured as COMSTAR heads, each with
2 x 2.4 GHz Intel Xeon E5-2609 processors
16 GB memory
12 x 3 TB 7.2K RPM 3.5" SAS disks
2 x 600 GB 10K RPM SAS2 disks

Clients
16 x Sun Server X3-2 servers, each with
2 x 2.9 GHz Intel Xeon E5-2690 processors
64 GB memory
2 x 600 GB 10K RPM SAS2 disks

Software Configuration:

Oracle Solaris 11.1 SRU 4.5 (for SPARC T5-8)
Oracle Solaris 11.1 (for COMSTAR systems)
Oracle Database 11g Release 2 Enterprise Edition with Partitioning
Oracle iPlanet Web Server 7.0 U5
Oracle Tuxedo CFS-R

Results:

System: SPARC T5-8
tpmC: 8,552,523
Price/tpmC: 0.55 USD
Available: 9/25/2013
Database: Oracle Database 11g
Cluster: no
Response Time New Order Average: 0.234 seconds

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.
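
TPC-C drives each simulated terminal with a weighted mix of these five transaction types, and only New-Order completions count toward tpmC. The sketch below picks transactions using the approximate weights implied by the specification's minimum frequencies (keying and think times, which the spec also mandates, are omitted; this is an illustration, not the benchmark driver):

    import random

    # Approximate TPC-C mix: the spec sets minimum frequencies for four
    # transaction types; New-Order takes the remainder and defines tpmC.
    MIX = [
        ("New-Order",    45.0),
        ("Payment",      43.0),
        ("Order-Status",  4.0),
        ("Delivery",      4.0),
        ("Stock-Level",   4.0),
    ]

    def next_transaction(rng: random.Random) -> str:
        names, weights = zip(*MIX)
        return rng.choices(names, weights=weights, k=1)[0]

    rng = random.Random(42)
    sample = [next_transaction(rng) for _ in range(10_000)]
    print(f"New-Order share: {sample.count('New-Order') / len(sample):.1%}")  # ~45%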

Key Points and Best Practices

  • Oracle Database 11g Release 2 Enterprise Edition with Partitioning scales easily to this high level of performance.

  • COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI target platform. COMSTAR uses a modular approach to break the large task of handling all the different pieces in a SCSI target subsystem into independent functional modules, glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at the SCSI level (disk, tape, medium changer, etc.) do not need to know about the underlying transport, and the modules implementing the transport protocol (FC, iSCSI, etc.) are unaware of the SCSI-level functionality of the packets they transport. The framework hides the details of allocating execution context and cleaning up SCSI commands and associated resources, which simplifies the task of writing SCSI or transport modules.

  • Oracle iPlanet Web Server middleware is used for the client tier of the benchmark. Each web server instance supports more than a quarter-million users while satisfying the response time requirement from the TPC-C benchmark.

See Also

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). SPARC T5-8 (8/128/1024) with Oracle Database 11g Release 2 Enterprise Edition with Partitioning, 8,552,523 tpmC, $0.55 USD/tpmC, available 9/25/2013. IBM Power 780 Cluster (24/192/768) with DB2 ESE 9.7, 10,366,254 tpmC, $1.38 USD/tpmC, available 10/13/2010. IBM x3850 X5 (4/40/80) with DB2 ESE 9.7, 3,014,684 tpmC, $0.59 USD/tpmC, available 7/11/2011. IBM x3850 X5 (4/32/64) with DB2 ESE 9.7, 2,308,099 tpmC, $0.60 USD/tpmC, available 5/20/2011. IBM Flex x240 (2/16/32) with DB2 ESE 9.7, 1,503,544 tpmC, $0.53 USD/tpmC, available 8/16/2012. IBM Power 780 (2/8/32) with IBM DB2 9.5, 1,200,011 tpmC, $0.69 USD/tpmC, available 10/13/2010. Source: http://www.tpc.org/tpcc, results as of 3/26/2013.

Thursday Nov 08, 2012

Improved Performance on PeopleSoft Combined Benchmark using SPARC T4-4

Oracle's SPARC T4-4 server running Oracle's PeopleSoft HCM 9.1 combined online and batch benchmark achieved a world record 18,000 concurrent users experiencing subsecond response time while executing a PeopleSoft Payroll batch job of 500,000 employees in 32.4 minutes.

  • This result was obtained with a SPARC T4-4 server running Oracle Database 11g Release 2, a SPARC T4-4 server running PeopleSoft HCM 9.1 application server and a SPARC T4-2 server running Oracle WebLogic Server in the web tier.

  • The SPARC T4-4 server running the application tier used Oracle Solaris Zones which provide a flexible, scalable and manageable virtualization environment.

  • The average CPU utilization on the SPARC T4-2 server in the web tier was 17%, on the SPARC T4-4 server in the application tier it was 59%, and on the SPARC T4-4 server in the database tier it was 47% (online and batch), leaving significant headroom for additional processing across the three tiers.

  • The SPARC T4-4 server used for the database tier hosted Oracle Database 11g Release 2 using Oracle Automatic Storage Management (ASM) for database files management with I/O performance equivalent to raw devices.

Performance Landscape

Results are presented for the PeopleSoft HRMS Self-Service and Payroll combined benchmark. The new result with 128 streams shows a significant improvement in payroll batch processing time with little impact on the self-service response times; a worked comparison of the two runs follows the table.

PeopleSoft HRMS Self-Service and Payroll Benchmark

Systems                                               Users    Ave Response   Ave Response   Batch Time   Streams
                                                               Search (sec)   Save (sec)     (min)
SPARC T4-2 (web), SPARC T4-4 (app), SPARC T4-4 (db)   18,000   0.988          0.539          32.4         128
SPARC T4-2 (web), SPARC T4-4 (app), SPARC T4-4 (db)   18,000   0.944          0.503          43.3         64
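
The gain from the earlier 64-stream run is easy to quantify: doubling the batch streams cut the payroll batch time from 43.3 to 32.4 minutes while search/save response times moved by only a few hundredths of a second. A quick check of the speedup (inputs from the table above):

    # Effect of doubling payroll batch streams (64 -> 128) on the combined run.
    old_min, new_min = 43.3, 32.4
    speedup = old_min / new_min                # ~1.34x faster batch
    reduction = (1 - new_min / old_min) * 100  # ~25% shorter batch window
    print(f"Speedup: {speedup:.2f}x, batch time reduced {reduction:.0f}%")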

The following results are for the PeopleSoft HRMS Self-Service benchmark that was previously run. The results are not directly comparable with the combined results because they do not include the payroll component.

PeopleSoft HRMS Self-Service 9.1 Benchmark

Systems                                                  Users    Ave Response   Ave Response   Batch Time   Streams
                                                                  Search (sec)   Save (sec)     (min)
SPARC T4-2 (web), SPARC T4-4 (app), 2x SPARC T4-2 (db)   18,000   1.048          0.742          N/A          N/A

The following results are for the PeopleSoft Payroll benchmark that was previously run. The results are not directly comparable with the combined results because they do not include the self-service component.

PeopleSoft Payroll (N.A.) 9.1 - 500K Employees (7 Million SQL PayCalc, Unicode)

Systems           Users   Ave Response   Ave Response   Batch Time   Streams
                          Search (sec)   Save (sec)     (min)
SPARC T4-4 (db)   N/A     N/A            N/A            30.84        96

Configuration Summary

Application Configuration:

1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
512 GB memory
Oracle Solaris 11 11/11
PeopleTools 8.52
PeopleSoft HCM 9.1
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Java Platform, Standard Edition Development Kit 6 Update 32

Database Configuration:

1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
256 GB memory
Oracle Solaris 11 11/11
Oracle Database 11g Release 2
PeopleTools 8.52
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Micro Focus Server Express (COBOL v 5.1.00)

Web Tier Configuration:

1 x SPARC T4-2 server with
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
Oracle Solaris 11 11/11
PeopleTools 8.52
Oracle WebLogic Server 10.3.4
Java Platform, Standard Edition Development Kit 6 Update 32

Storage Configuration:

1 x Sun Server X2-4 as a COMSTAR head for data
4 x Intel Xeon X7550, 2.0 GHz
128 GB memory
1 x Sun Storage F5100 Flash Array (80 flash modules)
1 x Sun Storage F5100 Flash Array (40 flash modules)

1 x Sun Fire X4275 as a COMSTAR head for redo logs
12 x 2 TB SAS disks with Niwot RAID controller

Benchmark Description

This benchmark combines PeopleSoft HCM 9.1 HR Self Service online and PeopleSoft Payroll batch workloads to run on a unified database deployed on Oracle Database 11g Release 2.

The PeopleSoft HRSS benchmark kit is an Oracle standard benchmark kit run by all platform vendors to measure performance. It is an OLTP benchmark with moderately complex database SQL. The results are certified by Oracle and a white paper is published.

PeopleSoft HR SS defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consists of 14 scenarios that emulate users performing typical HCM transactions such as viewing a paycheck, promoting and hiring employees, updating an employee profile, and other typical HCM application transactions.

All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. The benchmark metric is the weighted average search/save response time across all the transactions.

The PeopleSoft 9.1 Payroll (North America) benchmark demonstrates system performance for a range of processing volumes in a specific configuration. This workload represents large batch runs typical of an ERP environment during a mass update. The benchmark measures five application business process run times for a database representing a large organization: Paysheet Creation, Payroll Calculation, Payroll Confirmation, Print Advice forms, and Create Direct Deposit File. The benchmark metric is the cumulative elapsed time taken to complete the Paysheet Creation, Payroll Calculation and Payroll Confirmation business application processes.

The benchmark metrics are taken for each respective benchmark while running simultaneously on the same database back-end. Specifically, the payroll batch processes are started when the online workload reaches steady state (the maximum number of online users) and overlap with online transactions for the duration of the steady state.

Key Points and Best Practices

  • Two PeopleSoft Domain sets with 200 application servers each on a SPARC T4-4 server were hosted in 2 separate Oracle Solaris Zones to demonstrate consolidation of multiple application servers, ease of administration and performance tuning.

  • Each Oracle Solaris Zone was bound to a separate processor set, each containing 15 cores (120 threads). The default set (one core each from the first and third processor sockets, 16 threads total) was used for network and disk interrupt handling. This improved performance by reducing memory access latency (using the physical memory closest to the processors), offloaded I/O interrupt handling to the default set's threads, freed up CPU resources for application server threads, and balanced the application workload across 240 threads (see the sanity check after this list).

  • A total of 128 PeopleSoft stream server processes were used on the database node to complete the payroll batch job for 500,000 employees in 32.4 minutes.
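
The processor-set arithmetic above accounts for every hardware thread on the application-tier server: a SPARC T4-4 has 4 sockets of 8 cores with 8 threads per core. A small sanity check of that accounting (topology figures as described above):

    # Thread accounting for the application-tier SPARC T4-4
    # (4 sockets x 8 cores x 8 threads = 256 hardware threads).
    THREADS_PER_CORE = 8
    total = 4 * 8 * THREADS_PER_CORE            # 256 threads on the box

    zone_threads = 2 * 15 * THREADS_PER_CORE    # 2 psets x 15 cores = 240 threads
    default_set = 2 * THREADS_PER_CORE          # 2 cores for interrupt handling
    assert zone_threads + default_set == total  # 240 + 16 == 256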

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 8 November 2012.

Tuesday Oct 02, 2012

Performance of Oracle Business Intelligence Benchmark on SPARC T4-4

Oracle's SPARC T4-4 server configured with four SPARC T4 3.0 GHz processors delivered 25,000 concurrent users on the Oracle Business Intelligence Enterprise Edition (BI EE) 11g benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • A SPARC T4-4 server running Oracle Business Intelligence Enterprise Edition 11g achieved 25,000 concurrent users with an average response time of 0.36 seconds with Oracle BI server cache set to ON.

  • The benchmark data shows that the underlying SPARC T4 server hardware and the Oracle BI EE 11g (11.1.1.6.0, 64-bit) platform scale within a single system, supporting 25,000 concurrent users while executing 415 transactions/sec (the implied per-user pacing is worked out after these bullets).

  • The benchmark demonstrated the scalability of Oracle Business Intelligence Enterprise Edition 11g 11.1.1.6.0, which was deployed in a vertical scale-out fashion on a single SPARC T4-4 server.

  • Oracle Internet Directory configured on SPARC T4 server provided authentication for the 25,000 Oracle BI EE users with sub-second response time.

  • A SPARC T4-4 with internal Solid State Drive (SSD) using the ZFS file system showed significant I/O performance improvement over traditional disk for the Web Catalog activity. In addition, ZFS helped get past the UFS limitation of 32767 sub-directories in a Web Catalog directory.

  • The multi-threaded 64-bit Oracle Business Intelligence Enterprise Edition 11g and the SPARC T4-4 server proved to be a successful combination, providing sub-second response times for end-user transactions while consuming only half of the available CPU resources at 25,000 concurrent users, leaving plenty of headroom for increased load.

  • The Oracle Business Intelligence on SPARC T4-4 server benchmark results demonstrate that comprehensive BI functionality built on a unified infrastructure with a unified business model yields best-in-class scalability, reliability and performance.

  • Oracle BI EE 11g is a newer version of Business Intelligence Suite with richer and superior functionality. Results produced with Oracle BI EE 11g benchmark are not comparable to results with Oracle BI EE 10g benchmark. Oracle BI EE 11g is a more difficult benchmark to run, exercising more features of Oracle BI.
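
The 25,000-user and 415 transactions/sec figures imply realistic per-user pacing rather than a saturation test: on average each simulated user submits roughly one transaction a minute. Worked out (inputs from the bullets above):

    # Implied per-user pacing for the BI EE 11g result.
    users, tps = 25_000, 415
    interval = users / tps  # seconds between transactions for one user
    print(f"~{interval:.0f} s between transactions per user")  # ~60 s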

Performance Landscape

Results for the Oracle BI EE 11g version of the benchmark. Results are not comparable to the Oracle BI EE 10g version of the benchmark.

Oracle BI EE 11g Benchmark

System                                  Number of Users   Response Time (sec)
1 x SPARC T4-4 (4 x SPARC T4 3.0 GHz)   25,000            0.36

Results for the Oracle BI EE 10g version of the benchmark. Results are not comparable to the Oracle BI EE 11g version of the benchmark.

Oracle BI EE 10g Benchmark

System                                    Number of Users
2 x SPARC T5440 (4 x SPARC T2+ 1.6 GHz)   50,000
1 x SPARC T5440 (4 x SPARC T2+ 1.6 GHz)   28,000

Configuration Summary

Hardware Configuration:

SPARC T4-4 server
4 x SPARC T4 processors, 3.0 GHz
128 GB memory
4 x 300 GB internal SSD

Storage Configuration:

Sun ZFS Storage 7120
16 x 146 GB disks

Software Configuration:

Oracle Solaris 10 8/11
Oracle Solaris Studio 12.1
Oracle Business Intelligence Enterprise Edition 11g (11.1.1.6.0)
Oracle WebLogic Server 10.3.5
Oracle Internet Directory 11.1.1.6.0
Oracle Database 11g Release 2

Benchmark Description

Oracle Business Intelligence Enterprise Edition (Oracle BI EE) delivers a robust set of reporting, ad-hoc query and analysis, OLAP, dashboard, and scorecard functionality with a rich end-user experience that includes visualization, collaboration, and more.

The Oracle BI EE benchmark test used five different business user roles - Marketing Executive, Sales Representative, Sales Manager, Sales Vice-President, and Service Manager. These roles included a maximum of 5 different pre-built dashboards. Each dashboard page had an average of 5 reports in the form of a mix of charts, tables and pivot tables, returning anywhere from 50 rows to approximately 500 rows of aggregated data. The test scenario also included drill-down into multiple levels from a table or chart within a dashboard.

The benchmark test scenario uses a typical business user sequence of dashboard navigation, report viewing, and drill-down. For example, a Service Manager logs into the system and navigates to his own set of dashboards via the Service Manager role. The BI user selects the Service Effectiveness dashboard, which shows four distinct reports, Service Request Trend, First Time Fix Rate, Activity Problem Areas, and Cost Per Completed Service Call, spanning 2002 to 2005. The user then proceeds to the Customer Satisfaction dashboard, which also contains a set of 4 related reports, and drills down on some of the reports to see the detailed data. The BI user continues to view more dashboards – Customer Satisfaction and Service Request Overview, for example. After navigating through those dashboards, the user logs out of the application. The benchmark test is executed against a full production version of the Oracle Business Intelligence 11g Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a real-world customer scenario.

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 30 September 2012.

SPARC T4-4 Delivers World Record First Result on PeopleSoft Combined Benchmark

Oracle's SPARC T4-4 servers running Oracle's PeopleSoft HCM 9.1 combined online and batch benchmark achieved a world record of 18,000 concurrent users while executing a PeopleSoft Payroll batch job for 500,000 employees in 43.32 minutes and maintaining online user response times under 2 seconds.

  • This world record is the first to run online and batch workloads concurrently.

  • This result was obtained with a SPARC T4-4 server running Oracle Database 11g Release 2, a SPARC T4-4 server running PeopleSoft HCM 9.1 application server and a SPARC T4-2 server running Oracle WebLogic Server in the web tier.

  • The SPARC T4-4 server running the application tier used Oracle Solaris Zones which provide a flexible, scalable and manageable virtualization environment.

  • The average CPU utilization on the SPARC T4-2 server in the web tier was 17%, on the SPARC T4-4 server in the application tier it was 59%, and on the SPARC T4-4 server in the database tier it was 35% (online and batch), leaving significant headroom for additional processing across the three tiers.

  • The SPARC T4-4 server used for the database tier hosted Oracle Database 11g Release 2 using Oracle Automatic Storage Management (ASM) for database files management with I/O performance equivalent to raw devices.

  • This is the first three-tier mixed-workload (online and batch) PeopleSoft benchmark to also process the PeopleSoft Payroll batch workload.

Performance Landscape

PeopleSoft HR Self-Service and Payroll Benchmark

Systems                                               Users    Ave Response   Ave Response   Batch Time   Streams
                                                               Search (sec)   Save (sec)     (min)
SPARC T4-2 (web), SPARC T4-4 (app), SPARC T4-4 (db)   18,000   0.944          0.503          43.32        64

Configuration Summary

Application Configuration:

1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
512 GB memory
1 x 600 GB SAS internal disk
4 x 300 GB SAS internal disks
1 x 100 GB and 2 x 300 GB internal SSDs
2 x 10 GbE HBAs
Oracle Solaris 11 11/11
PeopleTools 8.52
PeopleSoft HCM 9.1
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Java Platform, Standard Edition Development Kit 6 Update 32

Database Configuration:

1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
256 GB memory
1 x 600 GB SAS internal disk
2 x 300 GB SAS internal disks
Oracle Solaris 11 11/11
Oracle Database 11g Release 2
PeopleTools 8.52
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031

Web Tier Configuration:

1 x SPARC T4-2 server with
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
2 x 300 GB SAS internal disks
1 x 300 GB internal SSD
1 x 100 GB internal SSD
Oracle Solaris 11 11/11
PeopleTools 8.52
Oracle WebLogic Server 10.3.4
Java Platform, Standard Edition Development Kit 6 Update 32

Storage Configuration:

1 x Sun Server X2-4 as a COMSTAR head for data
4 x Intel Xeon X7550, 2.0 GHz
128 GB memory
1 x Sun Storage F5100 Flash Array (80 flash modules)
1 x Sun Storage F5100 Flash Array (40 flash modules)

1 x Sun Fire X4275 as a COMSTAR head for redo logs
12 x 2 TB SAS disks with Niwot RAID controller

Benchmark Description

This benchmark combines PeopleSoft HCM 9.1 HR Self Service online and PeopleSoft Payroll batch workloads to run on a unified database deployed on Oracle Database 11g Release 2.

The PeopleSoft HRSS benchmark kit is an Oracle standard benchmark kit run by all platform vendors to measure performance. It is an OLTP benchmark with moderately complex database SQL. The results are certified by Oracle and a white paper is published.

PeopleSoft HR SS defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consists of 14 scenarios that emulate users performing typical HCM transactions such as viewing a paycheck, promoting and hiring employees, updating an employee profile, and other typical HCM application transactions.

All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. The benchmark metric is the weighted average search/save response time across all the transactions.

The PeopleSoft 9.1 Payroll (North America) benchmark demonstrates system performance for a range of processing volumes in a specific configuration. This workload represents large batch runs typical of an ERP environment during a mass update. The benchmark measures five application business process run times for a database representing a large organization: Paysheet Creation, Payroll Calculation, Payroll Confirmation, Print Advice forms, and Create Direct Deposit File. The benchmark metric is the cumulative elapsed time taken to complete the Paysheet Creation, Payroll Calculation and Payroll Confirmation business application processes.

The benchmark metrics are taken for each respective benchmark while running simultaneously on the same database back-end. Specifically, the payroll batch processes are started when the online workload reaches steady state (the maximum number of online users) and overlap with online transactions for the duration of the steady state.

Key Points and Best Practices

  • Two Oracle PeopleSoft Domain sets with 200 application servers each on a SPARC T4-4 server were hosted in 2 separate Oracle Solaris Zones to demonstrate consolidation of multiple application servers, ease of administration and performance tuning.

  • Each Oracle Solaris Zone was bound to a separate processor set, each containing 15 cores (120 threads). The default set (one core each from the first and third processor sockets, 16 threads total) was used for network and disk interrupt handling. This improved performance by reducing memory access latency (using the physical memory closest to the processors), offloaded I/O interrupt handling to the default set's threads, freed up CPU resources for application server threads, and balanced the application workload across 240 threads.

See Also

Disclosure Statement

Oracle's PeopleSoft HR and Payroll combined benchmark, www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 09/30/2012.

Thursday Mar 29, 2012

Sun Server X2-8 (formerly Sun Fire X4800 M2) Delivers World Record TPC-C for x86 Systems

Oracle's Sun Server X2-8 (formerly Sun Fire X4800 M2 server) equipped with eight 2.4 GHz Intel Xeon Processor E7-8870 chips obtained a result of 5,055,888 tpmC on the TPC-C benchmark. This result is a world record for x86 servers. Oracle demonstrated this world record database performance running Oracle Database 11g Release 2 Enterprise Edition with Partitioning.

  • The Sun Server X2-8 delivered a new x86 TPC-C world record of 5,055,888 tpmC with a price performance of $0.89/tpmC using Oracle Database 11g Release 2 (the price/performance arithmetic is sketched after these bullets). This configuration is available 7/10/12.

  • The Sun Server X2-8 delivers 3.0x the performance of the next-best 8-processor result, an IBM System p 570 equipped with POWER6 processors.

  • The Sun Server X2-8 has 3.1x better price/performance than the 8-processor 4.7 GHz POWER6 IBM System p 570.

  • The Sun Server X2-8 has 1.6x better performance than the 4-processor IBM x3850 X5 system equipped with Intel Xeon processors.

  • This is the first TPC-C result on any system using eight Intel Xeon Processor E7-8800 Series chips.

  • The Sun Server X2-8 is the first x86 system to get over 5 million tpmC.

  • The Oracle solution utilized the Oracle Linux operating system and Oracle Database 11g Release 2 Enterprise Edition with Partitioning to produce the x86 world record TPC-C benchmark performance.
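
TPC-C price/performance is the total cost of the priced configuration divided by throughput, so the published $/tpmC figures imply the totals directly. A quick sketch using the numbers from the table below:

    # TPC-C: $/tpmC = priced configuration cost / tpmC, so the total
    # cost can be recovered from the two published figures.
    def total_cost(tpmc: float, price_per_tpmc: float) -> float:
        return tpmc * price_per_tpmc

    x2_8 = total_cost(5_055_888, 0.89)  # ~$4.5M priced configuration
    p570 = total_cost(1_616_162, 3.54)  # ~$5.7M for about 1/3 the throughput
    print(f"X2-8: ${x2_8/1e6:.1f}M, System p 570: ${p570/1e6:.1f}M")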

Performance Landscape

Select TPC-C results (sorted by tpmC, bigger is better)

System             p/c/t      tpmC        Price/tpmC   Avail        Database        Memory Size
Sun Server X2-8    8/80/160   5,055,888   0.89 USD     7/10/2012    Oracle 11g R2   4 TB
IBM x3850 X5       4/40/80    3,014,684   0.59 USD     7/11/2011    DB2 ESE 9.7     3 TB
IBM x3850 X5       4/32/64    2,308,099   0.60 USD     5/20/2011    DB2 ESE 9.7     1.5 TB
IBM System p 570   8/16/32    1,616,162   3.54 USD     11/21/2007   DB2 9.0         2 TB

p/c/t - processors, cores, threads
Avail - availability date

Oracle and IBM TPC-C Response times

System                   tpmC        New Order 90th% (sec)   New Order Average (sec)
Sun Server X2-8          5,055,888   0.210                   0.166
IBM x3850 X5             3,014,684   0.500                   0.272
Ratios - Oracle Better   1.6x        1.4x                    1.3x

Oracle uses average new order response time for comparison between Oracle and IBM.

Graphs of Oracle's and IBM's response times for New-Order can be found in the full disclosure reports on TPC's website TPC-C Official Result Page.

Configuration Summary and Results

Hardware Configuration:

Server
Sun Server X2-8
8 x 2.4 GHz Intel Xeon Processor E7-8870
4 TB memory
8 x 300 GB 10K RPM SAS internal disks
8 x Dual port 8 Gbs FC HBA

Data Storage
10 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with
1 x 3.06 GHz Intel Xeon X5675 processor
8 GB memory
10 x 2 TB 7.2K RPM 3.5" SAS disks
2 x Sun Storage F5100 Flash Array storage (1.92 TB each)
1 x Brocade 5300 switches

Redo Storage
2 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with
1 x 3.06 GHz Intel Xeon X5675 processor
8 GB memory
11 x 2 TB 7.2K RPM 3.5" SAS disks

Clients
8 x Sun Fire X4170 M2 servers, each with
2 x 3.06 GHz Intel Xeon X5675 processors
48 GB memory
2 x 300 GB 10K RPM SAS disks

Software Configuration:

Oracle Linux (Sun Fire X4800 M2)
Oracle Solaris 11 Express (COMSTAR for Sun Fire X4270 M2)
Oracle Solaris 10 9/10 (Sun Fire X4170 M2)
Oracle Database 11g Release 2 Enterprise Edition with Partitioning
Oracle iPlanet Web Server 7.0 U5
Tuxedo CFS-R Tier 1

Results:

System: Sun Server X2-8
tpmC: 5,055,888
Price/tpmC: 0.89 USD
Available: 7/10/2012
Database: Oracle Database 11g
Cluster: no
New Order Average Response: 0.166 seconds

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

Key Points and Best Practices

  • Oracle Database 11g Release 2 Enterprise Edition with Partitioning scales easily to this high level of performance.

  • COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI target platform. COMSTAR uses a modular approach to break the large task of handling all the different pieces in a SCSI target subsystem into independent functional modules, glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at the SCSI level (disk, tape, medium changer, etc.) do not need to know about the underlying transport, and the modules implementing the transport protocol (FC, iSCSI, etc.) are unaware of the SCSI-level functionality of the packets they transport. The framework hides the details of allocating execution context and cleaning up SCSI commands and associated resources, which simplifies the task of writing SCSI or transport modules.

  • Oracle iPlanet Web Server middleware is used for the client tier of the benchmark. Each web server instance supports more than a quarter-million users while satisfying the response time requirement from the TPC-C benchmark.

See Also

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). Sun Server X2-8 (8/80/160) with Oracle Database 11g Release 2 Enterprise Edition with Partitioning, 5,055,888 tpmC, $0.89 USD/tpmC, available 7/10/2012. IBM x3850 X5 (4/40/80) with DB2 ESE 9.7, 3,014,684 tpmC, $0.59 USD/tpmC, available 7/11/2011. IBM x3850 X5 (4/32/64) with DB2 ESE 9.7, 2,308,099 tpmC, $0.60 USD/tpmC, available 5/20/2011. IBM System p 570 (8/16/32) with DB2 9.0, 1,616,162 tpmC, $3.54 USD/tpmC, available 11/21/2007. Source: http://www.tpc.org/tpcc, results as of 7/15/2011.

Wednesday Sep 28, 2011

SPARC T4 Servers Set World Record on PeopleSoft HRMS 9.1

Oracle's SPARC T4-4 servers running Oracle's PeopleSoft HRMS Self-Service 9.1 benchmark and Oracle Database 11g Release 2 achieved World Record performance on Oracle Solaris 10.

  • Using two SPARC T4-4 servers to run the application and database tiers and one SPARC T4-2 server to run the webserver tier, Oracle demonstrated world record performance of 15,000 concurrent users running the PeopleSoft HRMS Self-Service 9.1 benchmark.

  • The combination of the SPARC T4 servers running the PeopleSoft HRMS 9.1 benchmark supports 3.8x more online users with faster response time compared to the best published result from IBM on the previous PeopleSoft HRMS 8.9 benchmark.

  • The average CPU utilization on the SPARC T4-4 server in the application tier handling 15,000 users was less than 50%, leaving significant room for application growth.

  • The SPARC T4-4 server on the application tier used Oracle Solaris Containers which provide a flexible, scalable and manageable virtualization environment.

Performance Landscape

PeopleSoft HRMS Self-Service 9.1 Benchmark

Systems                   Processors               Users    Ave Response -   Ave Response -
                                                            Search (sec)     Save (sec)
SPARC T4-2 (web)          2 x SPARC T4, 2.85 GHz   15,000   1.01             0.63
SPARC T4-4 (app)          4 x SPARC T4, 3.0 GHz
SPARC T4-4 (db)           4 x SPARC T4, 3.0 GHz

PeopleSoft HRMS Self-Service 8.9 Benchmark

Systems                   Processors               Users    Ave Response -   Ave Response -
                                                            Search (sec)     Save (sec)
IBM Power 570 (web/app)   12 x POWER5, 1.9 GHz     4,000    1.74             1.25
IBM Power 570 (db)        4 x POWER5, 1.9 GHz
IBM p690 (web)            4 x POWER4, 1.9 GHz      4,000    1.35             1.01
IBM p690 (app)            12 x POWER4, 1.9 GHz
IBM p690 (db)             6 x 4392 MIPS/Gen1

The main differences between version 9.1 and version 8.9 of the benchmark are:

  • the database expanded from 100K employees and 20K managers to 500K employees and 100K managers,
  • the manager data was expanded,
  • a new transaction, "Employee Add Profile," was added; the percentage of users executing it is less than 2%, and the transaction has a heavier footprint,
  • version 9.1 has a different benchmark metric (Average Response search/save time for x number of users) versus single user search/save time,
  • newer versions of the PeopleSoft application and PeopleTools software are used.

Configuration Summary

Application Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
512 GB main memory
5 x 300 GB SAS internal disks
2 x 100 GB internal SSDs
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
PeopleSoft HCM 9.1
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Web Server:

1 x SPARC T4-2 server
2 x SPARC T4 processors 2.85 GHz
256 GB main memory
1 x 300 GB SAS internal disk
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
Oracle WebLogic Server 11g (10.3.3)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Database Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
256 GB main memory
3 x 300 GB SAS internal disks
1 x Sun Storage F5100 Flash Array (80 flash modules)
Oracle Solaris 10 8/11
Oracle Database 11g Release 2

Benchmark Description

The purpose of the PeopleSoft HRMS Self-Service 9.1 benchmark is to measure comparative online performance of the selected processes in PeopleSoft Enterprise HCM 9.1 with Oracle Database 11g. The benchmark kit is an Oracle standard benchmark kit run by all platform vendors to measure performance. It is an OLTP benchmark with no dependency on remote COBOL calls and no batch workload; the database SQL is moderately complex. The results are certified by Oracle and a white paper is published.

PeopleSoft defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consists of 14 scenarios which emulate users performing typical HCM transactions such as viewing paychecks, promoting and hiring employees, updating employee profiles and other typical HCM application transactions.

All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. The benchmark metric is the Average Response Time for search and save for 15,000 users.

Key Points and Best Practices

  • The application tier was configured with two PeopleSoft application server instances on the SPARC T4-4 server, hosted in two separate Oracle Solaris Containers to demonstrate consolidation of multiple applications, ease of administration, and load balancing.

  • Each PeopleSoft Application Server instance running in an Oracle Solaris Container was configured to run 5 application server Domains with 30 application server instances, effectively handling the 15,000-user workload with zero application server queuing and minimal use of resources.

  • The web tier was configured with 20 WebLogic instances with a 4 GB JVM heap size to load balance transactions across 10 PeopleSoft Domains, enabling equitable distribution of transactions and scaling to a high number of users.

  • Internal SSDs were configured in the application tier to host the PeopleSoft Application Servers' object CACHE file systems, and in the web tier for the WebLogic servers' logging, providing sub-millisecond service times and faster server response.

See Also

Disclosure Statement

Oracle's PeopleSoft HRMS 9.1 benchmark, www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Tuesday Sep 27, 2011

SPARC T4-4 Server Sets World Record on PeopleSoft Payroll (N.A.) 9.1, Outperforms IBM Mainframe, HP Itanium

Oracle's SPARC T4-4 server achieved world record performance on the Unicode version of Oracle's PeopleSoft Enterprise Payroll (N.A) 9.1 extra-large volume model benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • The SPARC T4-4 server was able to process 1,460,544 payments/hour using PeopleSoft Payroll N.A. 9.1.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 2.8x faster than IBM z10 EC 2097 Payroll 9.0 (UNICODE version) result of 87.4 minutes. The IBM mainframe is rated at 6,512 MIPS.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 3.1x faster than HP rx7640 Itanium2 non-UNICODE result of 96.17 minutes, on Payroll 9.0.

  • The average CPU utilization on the SPARC T4-4 server was only 30%, leaving significant room for business growth.

  • The SPARC T4-4 server processed payroll for 500,000 employees, 750,000 payments, in 30.84 minutes compared to the earlier world record result of 46.76 minutes on Oracle's SPARC Enterprise M5000 server.

  • The SPARC Enterprise M5000 server configured with eight 2.66 GHz SPARC64 VII+ processors has a result of 46.76 minutes on Payroll 9.1, 7% better than the 50.11-minute result on the SPARC Enterprise M5000 server configured with eight 2.53 GHz SPARC64 VII processors on Payroll 9.0. The difference in clock speed between the two processors is about 5%, close to the difference between the two results, showing that the Payroll 9.1 workload affects the overall result about as much as Payroll 9.0 does (both figures are checked below).
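
Two of the claims above reduce to simple ratios: the payments/hour figure scales the 750,000-payment run to an hour, and the 9.1-versus-9.0 comparison weighs the result delta against the clock-speed delta. Checked in a few lines (inputs from the bullets above; the published payments/hour figure is within rounding of this estimate):

    # Payments/hour: 750,000 payments processed in 30.84 minutes.
    payments_per_hour = 750_000 * 60 / 30.84  # ~1.46M payments/hour

    # M5000 9.1-vs-9.0: result improvement vs clock-speed improvement.
    result_gain = 50.11 / 46.76 - 1           # ~7% faster result
    clock_gain = 2.66 / 2.53 - 1              # ~5% faster clock
    print(f"{payments_per_hour:,.0f} payments/hr; "
          f"result +{result_gain:.1%} vs clock +{clock_gain:.1%}")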

Performance Landscape

PeopleSoft Payroll (N.A.) 9.1 – 500K Employees (7 Million SQL PayCalc, Unicode)

System                                   OS/Database          Payroll Processing   Run 1       Num of
                                                              Result (minutes)     (minutes)   Streams
SPARC T4-4, 4 x 3.0 GHz SPARC T4         Solaris/Oracle 11g   30.84                43.76       96
SPARC M5000, 8 x 2.66 GHz SPARC64 VII+   Solaris/Oracle 11g   46.76                66.28       32

PeopleSoft Payroll (N.A.) 9.0 – 500K Employees (3 Million SQL PayCalc, Non-Unicode)

System                                OS/Database          Payroll Processing   Run 1    Run 2    Run 3     Num of
                                                           Result (minutes)                                 Streams
Sun M5000, 8 x 2.53 GHz SPARC64 VII   Solaris/Oracle 11g   50.11                73.88    534.20   1267.06   32
IBM z10 EC 2097, 9 x 4.4 GHz Gen1     Z/OS/DB2             58.96                80.5     250.68   462.6     8
IBM z10 EC 2097, 9 x 4.4 GHz Gen1     Z/OS/DB2             87.4 **              107.6    -        -         8
HP rx7640, 8 x 1.6 GHz Itanium2       HP-UX/Oracle 11g     96.17                133.63   712.72   1665.01   32

** This result was run with Unicode. The IBM z10 EC 2097 UNICODE result of 87.4 minutes is 48% slower than IBM z10 EC 2097 non-UNICODE result of 58.96 minutes, both on Payroll 9.0, each configured with nine 4.4GHz Gen1 processors.

Payroll 9.1 Compared to Payroll 9.0

Please note that Payroll 9.1 is Unicode based, while Payroll 9.0 had non-Unicode and Unicode versions of the workload. There are 7 million executions of an SQL statement for the PayCalc batch process in Payroll 9.1 and 3 million executions of the same SQL statement for the PayCalc batch process in Payroll 9.0. This is reflected in the PayCalc elapsed time (27.33 min for 9.1 and 23.78 min for 9.0). The elapsed times of all other batch processes are lower (better) on 9.1.
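
Read per execution, those numbers say 9.1 runs the statement far more efficiently, not just more often: the execution count more than doubled while PayCalc elapsed time grew only about 15%. A back-of-envelope comparison (derived from the figures above, not a published metric):

    # PayCalc SQL executions per minute, Payroll 9.1 vs 9.0.
    rate_91 = 7_000_000 / 27.33  # ~256K executions/min on 9.1
    rate_90 = 3_000_000 / 23.78  # ~126K executions/min on 9.0
    print(f"9.1 runs PayCalc SQL {rate_91 / rate_90:.1f}x faster per minute")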

Configuration Summary

Hardware Configuration:

SPARC T4-4 server
4 x 3.0 GHz SPARC T4 processors
256 GB memory
Sun Storage F5100 Flash Array
80 x 24 GB FMODs

Software Configuration:

Oracle Solaris 10 8/11
PeopleSoft HRMS and Campus Solutions 9.10.303
PeopleSoft Enterprise (PeopleTools) 8.51.035
Oracle Database 11g Release 2 11.2.0.1 (64-bit)
Micro Focus Server Express 5.1 (64-bit COBOL)

Benchmark Description

The PeopleSoft 9.1 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures five application business process run times for a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates payroll data worksheets consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file that is used to transfer payroll funds directly into an employee's bank account.
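
As the combined-benchmark posts above note, the reported Payroll Processing Result is the cumulative elapsed time of the first three processes only; Print Advice and Create Direct Deposit File are run and reported but not scored. A minimal sketch of how the metric is assembled (the timing values here are placeholders, not published results):

    # PeopleSoft Payroll (N.A.) metric: cumulative elapsed time of the first
    # three business processes; the last two are reported but not scored.
    SCORED = ("Paysheet Creation", "Payroll Calculation", "Payroll Confirmation")

    def payroll_metric(elapsed_min: dict) -> float:
        return sum(elapsed_min[p] for p in SCORED)

    run = {  # placeholder timings, minutes
        "Paysheet Creation": 2.0, "Payroll Calculation": 27.3,
        "Payroll Confirmation": 1.5, "Print Advice forms": 8.0,
        "Create Direct Deposit File": 0.5,
    }
    print(f"Payroll Processing Result: {payroll_metric(run):.2f} minutes")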

Key Points and Best Practices

  • The SPARC T4-4 server with the Sun Storage F5100 Flash Array device had an average read throughput of up to 103 MB/sec and an average write throughput of up to 124 MB/sec while consuming 30% CPU on average.

  • The Sun Storage F5100 Flash Array device is a solid-state device that provides a read latency of only 0.5 msec. That is about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

See Also

  • Oracle PeopleSoft Benchmark White Papers
    oracle.com
  • PeopleSoft Enterprise Human Capital Management (Payroll)
    oracle.com

  • PeopleSoft Enterprise Payroll 9.1 Using Oracle for Solaris (Unicode) on an Oracle's SPARC T4-4 – White Paper
    oracle.com

  • SPARC T4-4 Server
    oracle.com
  • Oracle Solaris
    oracle.com
  • Oracle Database 11g Release 2 Enterprise Edition
    oracle.com
  • Sun Storage F5100 Flash Array
    oracle.com

Disclosure Statement

Oracle's PeopleSoft Payroll 9.1 benchmark, SPARC T4-4 30.84 min,
http://www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Friday Jun 10, 2011

SPARC Enterprise M5000 Delivers First PeopleSoft Payroll 9.1 Benchmark

Oracle's M-series server sets a world record on Oracle's PeopleSoft Enterprise Payroll (N.A.) 9.1 extra-large volume model benchmark (Unicode). Oracle's SPARC Enterprise M5000 server ran faster than the previous-generation system's result even though the PeopleSoft Payroll 9.1 benchmark is more computationally demanding.

Oracle's SPARC Enterprise M5000 server configured with eight 2.66 GHz SPARC64 VII+ processors together with Oracle's Sun Storage F5100 Flash Array storage achieved world record performance on the Unicode version of Oracle's PeopleSoft Enterprise Payroll (N.A.) 9.1 extra-large volume model benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • The SPARC Enterprise M5000 server processed payroll payments for the 500K employees PeopleSoft Payroll 9.1 (Unicode) benchmark in 46.76 minutes compared to a previous result of 50.11 minutes for the PeopleSoft Payroll 9.0 (non-Unicode) benchmark configured with 2.53 GHz SPARC64 VII processors resulting in 7% better performance.

  • Note that the IBM z10 Gen1 mainframe running the PeopleSoft Payroll 9.0 (Unicode) benchmark was 48% slower than the 9.0 non-Unicode version. The IBM z10 mainframe with nine 4.4 GHz Gen1 processors has a list price over $6M and is rated at 6,512 MIPS.

  • The SPARC Enterprise M5000 server with the Sun Storage F5100 Flash Array system processed payroll for 500K employees, completing the end-to-end run in 66.28 minutes, 11% faster than the earlier published result of 73.88 minutes with Payroll 9.0 configured with 2.53 GHz SPARC64 VII processors (both comparisons are checked after these bullets).

  • The Sun Storage F5100 Flash Array device is a high performance, high-density solid-state flash array which provides a read latency of only 0.5 msec which is about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.
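
Both comparisons above are straightforward elapsed-time ratios; checking them (inputs from the bullets):

    # IBM z10 Unicode penalty on Payroll 9.0: 87.4 min vs 58.96 min.
    unicode_penalty = 87.4 / 58.96 - 1  # ~48% slower with Unicode
    # M5000 end-to-end Run 1: 66.28 min (9.1) vs 73.88 min (9.0).
    run1_gain = 73.88 / 66.28 - 1       # ~11% faster
    print(f"z10 Unicode penalty: +{unicode_penalty:.0%}; "
          f"M5000 Run 1 improvement: {run1_gain:.0%}")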

Performance Landscape

PeopleSoft Payroll (N.A.) 9.1 – 500K Employees (7 Million SQL PayCalc, Unicode)

System        Processor                   OS/Database          Payroll Processing   Run 1       Num of
                                                               Result (minutes)     (minutes)   Streams
SPARC M5000   8 x 2.66 GHz SPARC64 VII+   Solaris/Oracle 11g   46.76                66.28       32

PeopleSoft Payroll (N.A.) 9.0 – 500K Employees (3 Million SQL PayCalc, Non-Unicode)

System      Processor                  OS/Database          Payroll Processing   Run 1    Run 2    Run 3     Num of
                                                            Result (minutes)                                 Streams
Sun M5000   8 x 2.53 GHz SPARC64 VII   Solaris/Oracle 11g   50.11                73.88    534.20   1267.06   32
IBM z10     9 x 4.4 GHz Gen1           Z/OS/DB2             58.96                80.5     250.68   462.6     8
IBM z10     9 x 4.4 GHz Gen1           Z/OS/DB2             87.4 **              107.6    -        -         8
HP rx7640   8 x 1.6 GHz Itanium2       HP-UX/Oracle 11g     96.17                133.63   712.72   1665.01   32

** This result was run with Unicode

Payroll 9.1 Compared to Payroll 9.0

Please note that Payroll 9.1 is Unicode based and Payroll 9.0 is non-Unicode. There are 7 million executions of an SQL statement for the PayCalc batch process in Payroll 9.1 and 3 million executions of the same SQL statement for the PayCalc batch process in Payroll 9.0. This is reflected in the PayCalc elapsed time (27.33 min for 9.1 and 23.78 min for 9.0). The elapsed times of all other batch processes are lower (better) on 9.1.

Configuration Summary

Hardware Configuration:

SPARC Enterprise M5000 server
8 x 2.66 GHz SPARC64 VII+ processors
128 GB memory
2 x SAS HBA (SG-XPCIE8SAS-E-Z - PCIe HBA for Rack Servers)
Sun Storage F5100 Flash Array
40 x 24 GB FMODs
1 x StorageTek 2501 array with
12 x 146 GB SAS 15K RPM disks
1 x StorageTek 2540 array with
12 x 146 GB SAS 15K RPM disks

Software Configuration:

Oracle Solaris 10 09/10
PeopleSoft HRMS and Campus Solutions 9.10.303
PeopleSoft Enterprise (PeopleTools) 8.51.035
Oracle Database 11g Release 2 11.2.0.1 (64-bit)
Micro Focus Server Express 5.1 (64-bit COBOL)

Benchmark Description

The PeopleSoft 9.1 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures five application business process run times for a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates payroll data worksheets consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file that is used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collected at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of thirty-two job streams to be configured to run in parallel.

See Also

Disclosure Statement

Oracle's PeopleSoft Payroll 9.1 benchmark, SPARC Enterprise M5000 46.76 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 6/10/2011.

Thursday Dec 02, 2010

World Record TPC-C Result on Oracle's SPARC Supercluster with T3-4 Servers

Using 27 of Oracle's SPARC T3-4 servers, 138 Sun Storage F5100 Flash Array storage systems, and Oracle Database 11g Release 2 Enterprise Edition with Real Application Clusters (RAC) and Partitioning, Oracle delivered a world-record TPC-C benchmark result and demonstrated the world's fastest database performance.

  • The SPARC T3-4 server cluster delivered a world record TPC-C benchmark result of 30,249,688 tpmC at $1.01/tpmC (USD) using Oracle Database 11g Release 2 on a configuration available 6/1/2011.

  • The SPARC T3-4 server cluster is 2.9x faster than the IBM Power 780 (POWER7 3.86 GHz) cluster with the IBM DB2 9.7 database and has 27% better price/performance on the TPC-C benchmark (both ratios are checked after these bullets). Almost identical price discount levels were applied by Oracle and IBM.

  • The Oracle solution has three times the performance of the IBM configuration while using only twice the power during the TPC-C benchmark run (based upon IBM's own claims of energy usage from its August 17, 2010 press release).

  • The Oracle solution delivered 2.9x the performance in only 71% of the space compared to the IBM TPC-C benchmark result.

  • The SPARC T3-4 server with Sun Storage F5100 Flash Array storage solution demonstrates 3.2x faster response time than IBM Power 780 (POWER7 3.86 GHz) result on the TPC-C benchmark.

  • Oracle used a single-image database, whereas IBM used 96 separate database partitions across its 3-node cluster. It is notable that IBM ran 32 database images per node instead of running each server as a simple SMP.

  • IBM did not use its flagship OLTP product, DB2 Enterprise Edition; instead IBM used "DB2 InfoSphere Warehouse 9.7," a data warehouse and data management product.

  • The multi-node SPARC T3-4 server cluster is 7.4x faster than the HP Superdome (1.6 GHz Itanium2) solution and has 66% better price/performance on the TPC-C benchmark.

  • The Oracle solution utilized Oracle's Sun FlashFire technology to deliver this result. The Sun Storage F5100 Flash Array storage system was used for database storage.

  • Oracle Database 11g Enterprise Edition Release 2 with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record TPC-C benchmark performance.

  • This result showed Oracle's integrated hardware and software stacks provide industry leading performance.
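
The headline ratios in this list follow directly from the published figures, for example (numbers from the table below):

    # SPARC T3-4 cluster vs IBM Power 780 cluster on TPC-C.
    perf_ratio = 30_249_688 / 10_366_254  # ~2.9x the throughput
    price_perf_gain = 1 - 1.01 / 1.38     # ~27% better $/tpmC
    per_node = 30_249_688 / 27            # ~1.12M tpmC per SPARC T3-4 node
    print(f"{perf_ratio:.1f}x throughput, {price_perf_gain:.0%} better $/tpmC, "
          f"{per_node:,.0f} tpmC per node")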

Performance Landscape

TPC-C results (sorted by tpmC, bigger is better)

System                   tpmC         Price/tpmC   Avail      Database         Cluster   Racks
27 x SPARC T3-4          30,249,688   1.01 USD     6/1/2011   Oracle 11g RAC   Y         15
3 x IBM Power 780        10,366,254   1.38 USD     10/13/10   DB2 9.7          Y         10
HP Integrity Superdome   4,092,799    2.93 USD     08/06/07   Oracle 10g R2    N         46

Avail - Availability date
Racks - Clients, servers, storage, infrastructure

Oracle and IBM TPC-C Response times

System                                tpmC         New Order 90th% (sec)   New Order Average (sec)
27 x SPARC T3-4                       30,249,688   0.750                   0.352
3 x IBM Power 780                     10,366,254   2.1                     1.137
Response Time Ratio - Oracle Better   2.9x         2.8x                    3.2x

Oracle uses Average New Order Response time for comparison between Oracle and IBM.

Graphs of Oracle's and IBM's response times for New-Order can be found in the full disclosure reports on TPC's website TPC-C Official Result Page.

Configuration Summary and Results

Hardware Configuration:

15 racks used to hold

Servers
27 x SPARC T3-4 servers, each with
4 x 1.65 GHz SPARC T3 processors
512 GB memory
3 x 300 GB 10K RPM 2.5" SAS disks

Data Storage
69 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with
1 x 2.93 GHz Intel Xeon X5670 processor
8 GB memory
9 x 2 TB 7.2K RPM 3.5" SAS disks
2 x Sun Storage F5100 Flash Array storage (1.92 TB each)
1 x Brocade DCX switch

Redo Storage
28 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with
1 x 2.93 GHz Intel Xeon X5670 processor
8 GB memory
11 x 2 TB 7.2K RPM 3.5" SAS disks
2 x Brocade 5300 switches

Clients
81 x Sun Fire X4170 M2 servers, each with
2 x 2.93 GHz Intel X5670 processors
48 GB memory
2 x 146 GB 10K RPM 2.5" SAS disks

Software Configuration:

Oracle Solaris 10 9/10 (for SPARC T3-4 and Sun Fire X4170 M2)
Oracle Solaris 11 Express (COMSTAR for Sun Fire X4270 M2)
Oracle Database 11g Release 2 Enterprise Edition with Real Application Clusters and Partitioning
Oracle iPlanet Web Server 7.0 U5
Tuxedo CFS-R Tier 1

Results:

System: 27 x SPARC T3-4
tpmC: 30,249,688
Price/tpmC: 1.01 USD
Avail: 6/1/2011
Database: Oracle Database 11g RAC
Cluster: yes
Racks: 15
New Order Ave Response: 0.352 seconds

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

Key Points and Best Practices

  • Oracle Database 11g Release 2 Enterprise Edition with Real Application Clusters and Partitioning scales easily to this high level of performance.

  • Sun Storage F5100 Flash Array storage provides high performance, very low latency, and very high storage density.

  • COMSTAR (Common Multiprotocol SCSI Target), new in Oracle Solaris 11 Express, is the software framework that enables a Solaris host to serve as a SCSI target platform. COMSTAR uses a modular approach to break the large task of handling all the different pieces in a SCSI target subsystem into independent functional modules that are glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at the SCSI level (disk, tape, medium changer, etc.) do not need to know about the underlying transport, and the modules implementing the transport protocol (FC, iSCSI, etc.) are not aware of the SCSI-level functionality of the packets they are transporting. The framework hides the details of allocation, execution context, and cleanup of SCSI commands and associated resources, which simplifies the task of writing SCSI or transport modules.

  • Oracle iPlanet Web Server 7.0 U5 is used in the user tier of the benchmark, with each web server instance supporting more than a quarter-million users while satisfying the stringent response time requirements of the TPC-C benchmark.

See Also

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). 27-node SPARC T3-4 Cluster (4 x 1.65 GHz SPARC T3 processors) with Oracle Database 11g Release 2 Enterprise Edition with Real Application Clusters and Partitioning, 30,249,688 tpmC, $1.01/tpmC, Available 6/1/2011. IBM Power 780 Cluster (3 nodes using 3.86 GHz POWER7 processors) with IBM DB2 InfoSphere Warehouse Ent. Base Ed. 9.7, 10,366,254 tpmC, $1.38 USD/tpmC, available 10/13/2010. HP Integrity Superdome(1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC, available 8/06/07. Energy claims based upon IBM calculations and internal measurements. Source: http://www.tpc.org/tpcc, results as of 11/22/2010

World Record Performance on PeopleSoft Enterprise Financials Benchmark run on Sun SPARC Enterprise M4000 and M5000

Oracle's Sun SPARC Enterprise M4000 and M5000 servers have combined to produce a world record result on Oracle's PeopleSoft Enterprise Financial Management 9.0 benchmark.

  • The Sun SPARC Enterprise M4000 and M5000 servers configured with SPARC64 VII+ processors along with Oracle's Sun Storage F5100 Flash Array system achieved a world record result using PeopleSoft Enterprise Financial Management and Oracle Database 11g Release 2 software running on the Oracle Solaris 10 operating system.

  • The PeopleSoft Enterprise Financial Management solution processed online business transactions to support 1000 concurrent users using 32 application server threads with compliant response times while simultaneously completing complex batch jobs in record time.

  • The Sun Storage F5100 Flash Array system is a high-performance, high-density solid-state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

  • The Sun SPARC Enterprise M4000 and M5000 servers were able to process online users and concurrent batch jobs simultaneously in 34.72 minutes on this benchmark, which reflects a complex, multi-tier environment and utilizes a large back-end database of nearly 1 TB.

  • For this benchmark, the combination of Oracle's PeopleSoft Enterprise Financial Management 9.00.00.331, PeopleSoft PeopleTools 8.49.23, and Oracle WebLogic Server ran on the Sun SPARC Enterprise M4000 server, while Oracle Database 11g Release 2 ran on the Sun SPARC Enterprise M5000 server.

Performance Landscape

The following table lists the current result and the single previously disclosed result for this benchmark. Results are elapsed times, so smaller numbers are better.

Servers CPU Tier Batch (mins) Batch w/Online (mins)
Sun SPARC Enterprise M4000 2.66 GHz SPARC64 VII+ Web/App 33.09 34.72
Sun SPARC Enterprise M5000 2.66 GHz SPARC64 VII+ DB
SPARC T3-1 1.65 GHz SPARC T3 Web/App 35.82 37.01
Sun SPARC Enterprise M5000 2.5 GHz SPARC64 VII DB

Each result was produced by the Web/App server paired with the DB server listed on the following line.

Configuration Summary

Web/Application Tier Configuration:

1 x Sun SPARC Enterprise M4000
4 x 2.66 GHz SPARC64 VII+ processors
128 GB of memory

Database Tier Configuration:

1 x Sun SPARC Enterprise M5000
8 x 2.66 GHz SPARC64 VII+ processors
128 GB of memory
1 x Sun Storage F5100 Flash Array (74 x 24 GB FMODs)
2 x StorageTek 2540 (12 x 146 GB SAS 15K RPM)
1 x StorageTek 2501 (12 x 146 GB SAS 15K RPM)
1 x Dual-Port SAS Fibre Channel Host Bus Adapters (HBA)

Software Configurations:

Oracle Solaris 10 10/09
PeopleSoft Enterprise Financial Management/SCM 9.00.00.311 64-bit
PeopleSoft Enterprise (PeopleTools) 8.49.23 64-bit
Oracle Database 11g Release 2 11.1.0.6 64-bit
Oracle Tuxedo 9.1 RP36 with Jolt 9.1
Micro Focus COBOL Server Express 4.0 SP4 64-bit

Benchmark Description

This Day-in-the-Life benchmark measured the concurrent batch and online performance for a large database model. This scenario more accurately represents a production environment where users and scheduled batch jobs must run concurrently. This benchmark measured performance results during a Close-the-Books process.

The PeopleSoft Enterprise Financials 9 batch processes included in this benchmark are as follows:

  • Journal Generator: (AE) This process creates journals from accounting entries (AE) generated from various data sources, including non-PeopleSoft systems as well as PeopleSoft applications. In the benchmark, the Journal Generator (FS_JGEN) process is set up to create accounting entries from Oracle's PeopleSoft applications in the same database, such as PeopleSoft Enterprise Payables, Receivables, Asset Management, Expenses, and Cash Management. The process is run with the Edit and Post option turned on, to edit and post the journals created by Journal Generator. Journal Edit is an AE program and Post is a COBOL program.

  • Allocation: (AE) This process allocates balances held or accumulated in one or more entities to more than one business unit, department or other entities based on user-defined rules.

  • Journal Edit & Post: (AE & COBOL) Journal Edit validates journal transactions before posting them to the ledger. This validation ensures that journals are valid, for example: valid ChartFields values and combinations, debits and credits equal, and inter/intra-unit balanced, Journal Post process posts only valid, edited journals, ensures each journal line posts to the appropriate target detail ledgers, and then changes the journal's status to posted. In this benchmark, the Journal Edit & Post is also set up to edit and post Oracle's PeopleSoft applications from another database, such as PeopleSoft Enterprise Payroll data.

  • Summary Ledger: (AE) Summary Ledger processing summarizes detail ledger data across selected GL BUs. Summary Ledgers can be generated for reporting purposes or used in consolidations.

  • Consolidations: (COBOL) Consolidation processing summarizes ledger balances and generates elimination journal entries across business units based on user-defined rules.

  • SQR & nVision Reporting: Reporting will consist of nVision and SQR reports. A balance sheet, an income statement, and a trial balance will be generated for each GL BU by SQR processes GLS7002 and GLS7012. The consolidated results of the nVision reports are run by 10 nVision users using 4 standard delivered report request definitions such as BALANCE, INCOME, CONSBAL, and DEPTINC. Each of the nVision users will have ownership over 10 Business Units and each of the nVision users will submit multiple runs that are being executed in parallel to generate a total of 40 nVision reports.

Batch processes are run concurrently with more than 1000 emulated users executing 30 pre-defined online applications. Response times for the online applications are collected and must conform to a maximum time.

Key Points and Best Practices

The Sun SPARC Enterprise M4000 and M5000 servers were able to process online users and concurrent batch jobs simultaneously in 34.72 minutes.

The Sun Storage F5100 Flash Array system, which is highly tuned for IOPS, contributed to the result through reduced IO latency.

The family of Sun SPARC Enterprise M-series servers, with Sun Storage F5100 Flash Array systems, forms an ideal environment for hosting complex multi-tier applications. This is the second public disclosure of any system running this benchmark.

The Sun SPARC Enterprise M4000 server hosted the web and application server tiers providing good response time to emulated user requests. The benchmark specification allows 1000 users, but there is headroom for increased load.

The Sun SPARC Enterprise M5000 server was used for the database server along with a Sun Storage F5100 Flash Array system. The speed of the M-series server with the low latency of the Flash Array provided the overall low latency for user requests, even while completing complex batch jobs.

Despite the systems being lightly loaded, the increased frequency of the SPARC64 VII+ processors yielded lower latencies and faster elapsed times than previously disclosed results.

The low latency of the Sun Storage F5100 Flash Array storage contributed to the excellent response times of emulated users by making data quickly available to the database back-end. The array was configured as several RAID 0 volumes and data was distributed across the volumes, maximizing storage bandwidth.

The transaction processing capacity of the Sun SPARC Enterprise M5000 server enabled very fast batch processing times while supporting over 1000 online users.

While running the maximum workload specified by the benchmark, the systems were lightly loaded, providing headroom to grow.

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle's PeopleSoft Financials 9.0 benchmark, Oracle's Sun SPARC Enterprise M4000 (4 2.66 SPARC64 VII+), Oracle's Sun SPARC Enterprise M5000 (8 2.66 SPARC64 VII+), 34.72 min. Results as of 12/02/2010, see www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html for more about PeopleSoft.

Thursday Sep 23, 2010

Sun Storage F5100 Flash Array with PCI-Express SAS-2 HBAs Achieves Over 17 GB/sec Read

Oracle's Sun Storage F5100 Flash Array storage is a high-performance, high-density, solid-state flash array delivering 17 GB/sec sequential read throughput (1 MB reads) and 10 GB/sec sequential write throughput (1 MB writes).
  • Using PCI-Express SAS-2 HBAs reduces the HBA slot count by 50% (8 HBAs versus 16) compared to PCI-Express SAS-1 HBAs.

  • The Sun Storage F5100 Flash Array storage using 8 PCI-Express SAS-2 HBAs showed a 33% aggregate, sequential read bandwidth improvement over using 16 PCI-Express SAS-1 HBAs.

  • The Sun Storage F5100 Flash Array storage using 8 PCI-Express SAS-2 HBAs showed a 6% aggregate, sequential write bandwidth improvement over using 16 PCI-Express SAS-1 HBAs.

  • Each SAS port of the Sun Storage F5100 Flash Array storage delivered over 1 GB/sec sequential read performance.

  • Performance data is also presented utilizing smaller numbers of FMODs in the full configuration, demonstrating near-perfect scaling from 20 to 80 FMODs.

The Sun Storage F5100 Flash Array storage is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

Performance Landscape

Results for the PCI-Express SAS-2 HBAs were obtained using four hosts, each configured with 2 HBAs.

Results for the PCI-Express SAS-1 HBAs were obtained using four hosts, each configured with 4 HBAs.

Bandwidth Measurements

Sequential Read (Aggregate GB/sec) for 1 MB Transfers
HBA Configuration 1 FMOD 20 FMODs 40 FMODs 80 FMODs
8 SAS-2 HBAs 0.26 4.3 8.5 17.0
16 SAS-1 HBAs 0.26 3.2 6.4 12.8

Sequential Write (Aggregate GB/sec) for 1 MB Transfers
HBA Configuration 1 FMOD 20 FMODs 40 FMODs 80 FMODs
8 SAS-2 HBAs 0.14 2.7 5.2 10.3
16 SAS-1 HBAs 0.12 2.4 4.8 9.7
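
A quick sketch of the near-perfect FMOD scaling noted in the bullets above, computed from the SAS-2 sequential-read row (per-FMOD throughput stays essentially flat from 20 to 80 FMODs):

    # Aggregate sequential-read GB/sec by FMOD count (SAS-2 row above).
    read_gbps = {20: 4.3, 40: 8.5, 80: 17.0}

    base = read_gbps[20] / 20  # per-FMOD throughput at the 20-FMOD point
    for fmods, gbps in read_gbps.items():
        eff = (gbps / fmods) / base  # scaling efficiency vs. 20-FMOD baseline
        print(f"{fmods} FMODs: {gbps / fmods:.3f} GB/sec per FMOD ({eff:.0%})")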

Results and Configuration Summary

Storage Configuration:

Sun Storage F5100 Flash Array
80 Flash Modules
16 ports
4 domains (20 Flash Modules per domain)
CAM zoning - 5 Flash Modules per port

Server Configuration:

4 x Sun Fire X4270 servers, each with
16 GB memory
2 x 2.93 GHz Quad-core Intel Xeon X5570 processors
2 x PCI-Express SAS-2 External HBAs, firmware version SW1.1-RC5

Software Configuration:

OpenSolaris 2009.06 or Oracle Solaris 10 10/09
Vdbench 5.0

Benchmark Description

Two IO performance metrics on the Sun Storage F5100 Flash Array storage using Vdbench 5.0 were measured: 100% Sequential Read and 100% Sequential Write. This demonstrates the maximum performance and throughput of the storage system.

Vdbench is publicly available for download at: http://vdbench.org

Key Points and Best Practices

  • Please note that the Sun Storage F5100 Flash Array storage is a 4 KB-sector device. Doing IOs of less than 4 KB in size, or IOs not aligned on 4 KB boundaries, can impact performance on write operations (see the sketch after this list).
  • Drive each Flash Module with 8 outstanding IOs.
  • Both ports of each LSI PCI-Express SAS-2 HBA were used.
  • SPARC platforms will align with the 4K boundary size set by the Flash Array. x86/windows platforms don't necessarily have this alignment built in and can show lower performance.
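
The 4 KB-sector note above amounts to a simple rule: write IOs should start on a 4 KB boundary and be a multiple of 4 KB in size. A minimal pre-flight check (illustrative only, not part of any benchmark kit):

    SECTOR = 4096  # the flash array presents 4 KB sectors

    def aligned(offset: int, size: int) -> bool:
        """True if an IO at byte `offset` of `size` bytes is 4 KB-aligned."""
        return offset % SECTOR == 0 and size % SECTOR == 0

    assert aligned(0, 8192)         # OK: boundary start, whole sectors
    assert not aligned(512, 4096)   # misaligned start: expect slower writes
    assert not aligned(4096, 2048)  # sub-sector size: expect slower writes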

See Also

Disclosure Statement

The Sun Storage F5100 Flash Array storage delivered 17.0 GB/sec sequential read and 10.3 GB/sec sequential write. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 20, 2010.

SPARC T3-1 Performance on PeopleSoft Enterprise Financials 9.0 Benchmark

Oracle's SPARC T3-1 and Sun SPARC Enterprise M5000 servers combined with Oracle's Sun Storage F5100 Flash Array storage have produced the first worldwide disclosure and world record performance on the PeopleSoft Enterprise Financials 9.0 benchmark.

  • Using SPARC T3-1 and Sun SPARC Enterprise M5000 servers along with a Sun Storage F5100 Flash Array system, the Oracle solution processed online business transactions to support 1000 concurrent users using 32 application server threads with compliant response times while simultaneously completing complex batch jobs. This is the first publication of this benchmark by any vendor world-wide.

  • The Sun Storage F5100 Flash Array system is a high-performance, high-density solid-state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

  • The SPARC T3-1 and Sun SPARC Enterprise M5000 servers were able to process online users and concurrent batch jobs simultaneously in 38.66 minutes on this benchmark, which reflects a complex, multi-tier environment and utilizes a large back-end database of nearly 1 TB.

  • Both the SPARC T3-1 and Sun SPARC Enterprise M5000 servers used the Oracle Solaris 10 operating system.

  • For this benchmark, the combination of Oracle's PeopleSoft Enterprise Financials/SCM 9.00.00.331, PeopleSoft Enterprise (PeopleTools) 8.49.23, and Oracle WebLogic Server ran on the SPARC T3-1 server, while Oracle Database 11g Release 1 ran on the Sun SPARC Enterprise M5000 server.

Performance Landscape

Because this is the first worldwide disclosure of this benchmark, no competitive results exist with which the current result may be compared.

Batch Processing Times (elapsed minutes)

Batch Process Batch Alone* Batch with 1000 Online Users*
JGEN Subsystem 7.30 7.78
JEDIT1 2.52 3.77
ALLOCATION 6.05 10.15
ALLOC EDIT/POST 2.32 2.23
SUM LEDGER 1.00 1.18
CONSOLIDATION 1.50 1.55
Total Main Batch Stream 20.69 26.66
SQR/GL_LEDGER 8.92 9.12
SQR/GL_TBAL 3.33 3.35
SQR 11.83 12.00
nVisions 8.78 8.83
nVision 11.83 12.00
Max SQR and nVision Stream 11.83 12.00
Total Batch (sum of Main Batch and Max SQR) 32.52 38.66

* PeopleSoft Enterprise Financials batch processing and post-processing elapsed times.
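
The two "Total Batch" figures follow from the rows above, assuming (as the "Max SQR and nVision Stream" row suggests) that the SQR and nVision reporting streams run in parallel after the main batch stream, so only the longer one contributes:

    # Elapsed minutes from the table above.
    main_stream = {"alone": 20.69, "online": 26.66}
    sqr_stream = {"alone": 11.83, "online": 12.00}
    nvision_stream = {"alone": 11.83, "online": 12.00}

    for mode in ("alone", "online"):
        total = main_stream[mode] + max(sqr_stream[mode], nvision_stream[mode])
        print(f"Total batch ({mode}): {total:.2f} min")  # -> 32.52 and 38.66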

Results and Configuration Summary

Hardware Configuration:

1 x SPARC T3-1 (1 x T3 at 1.65 GHz with 128 GB of memory)
1 x Sun SPARC Enterprise M5000 (8 x SPARC64 at 2.53 GHz with 64 GB of memory)
1 x Sun Storage F5100 Flash Array (74 x 24 GB FMODs)
2 x StorageTek 2540 (12 x 146 GB SAS 15K RPM)
1 x StorageTek 2501 (12 x 146 GB SAS 15K RPM)
1 x Dual-Port SAS Fibre Channel Host Bus Adapters (HBA)

Software Configuration:

Oracle Solaris 10 10/09
Oracle's PeopleSoft Enterprise Financials/SCM 9.00.00.311 64-bit
Oracle's PeopleSoft Enterprise (PeopleTools) 8.49.23 64-bit
Oracle Database 11g Release 1 (11.1.0.6) 64-bit
Oracle Tuxedo 9.1 RP36 with Jolt 9.1
Micro Focus COBOL Server Express 4.0 SP4 64-bit

Benchmark Description

The PeopleSoft Enterprise Financials batch processes included in this benchmark are as follows:

  • Journal Generator: (AE) This process creates journals from accounting entries (AE) generated from various data sources, including non-PeopleSoft systems as well as PeopleSoft applications. In the benchmark, the Journal Generator (FS_JGEN) process is set up to create accounting entries from Oracle's PeopleSoft applications in the same database, such as PeopleSoft Enterprise Payables, Receivables, Asset Management, Expenses, and Cash Management. The process is run with the Edit and Post option turned on, to edit and post the journals created by Journal Generator. Journal Edit is an AE program and Post is a COBOL program.

  • Allocation: (AE) This process allocates balances held or accumulated in one or more entities to more than one business unit, department or other entities based on user-defined rules.

  • Journal Edit & Post: (AE & COBOL) Journal Edit validates journal transactions before posting them to the ledger. This validation ensures that journals are valid, for example: valid ChartFields values and combinations, debits and credits equal, and inter/intra-unit balanced, Journal Post process posts only valid, edited journals, ensures each journal line posts to the appropriate target detail ledgers, and then changes the journal's status to posted. In this benchmark, the Journal Edit & Post is also set up to edit and post Oracle's PeopleSoft applications from another database, such as PeopleSoft Enterprise Payroll data.

  • Summary Ledger: (AE) Summary Ledger processing summarizes detail ledger data across selected GL BUs. Summary Ledgers can be generated for reporting purposes or used in consolidations.

  • Consolidations: (COBOL) Consolidation processing summarizes ledger balances and generates elimination journal entries across business units based on user-defined rules.

  • SQR & nVision Reporting: Reporting will consist of nVision and SQR reports. A balance sheet, and income statement, and a trial balance will be generated for each GL BU by SQR processes GLS7002 and GLS7012. The consolidated results of the nVision reports are run by 10 nVision users using 4 standard delivered report request definitions such as BALANCE, INCOME, CONSBAL, and DEPTINC. Each of the nVision users will have ownership over 10 Business Units and each of the nVision users will submit multiple runs that are being executed in parallel to generate a total of 40 nVision reports.

Batch processes are run concurrently with more than 1000 emulated users executing 30 pre-defined online applications. Response times for the online applications are collected and must conform to a maximum time.

Key Points and Best Practices

Oracle's SPARC T3-1 and Oracle's Sun SPARC Enterprise M5000 servers published the first result for Oracle's PeopleSoft Enterprise Financials 9.0 benchmark for concurrent batch and 1000 online users using the large database model on Oracle 11g running Oracle Solaris 10.

The SPARC T3-1 and Sun SPARC Enterprise M5000 servers were able to process online users and concurrent batch jobs simultaneously in 38.66 minutes.

The Sun Storage F5100 Flash Array system, which is highly tuned for IOPS, contributed to the result through reduced IO latency.

The combination of the SPARC T3-1 and Sun SPARC Enterprise M5000 servers, with a Sun Storage F5100 Flash Array system, forms an ideal environment for hosting complex multi-tier applications. This is the first public disclosure of any system running this benchmark.

The SPARC T3-1 server hosted the web and application server tiers, providing good response time to emulated user requests. The benchmark specification allows 1000 users, but there is headroom for increased load.

The Sun SPARC Enterprise M5000 server was used for the database server along with a Sun Storage F5100 Flash Array system. The speed of the M-series server with the low latency of the Flash Array provided the overall low latency for user requests, even while completing complex batch jobs.

The parallelism of the SPARC T3-1 server, when used as an application and web server tier, is best taken advantage of by configuring sufficient server processes. With sufficient server processes distributed across the hardware cores, acceptable user response times are achieved.

The low latency of the Sun Storage F5100 Flash Array storage contributed to the excellent response times of emulated users by making data quickly available to the database back-end. The array was configured as several RAID 0 volumes and data was distributed across the volumes, maximizing storage bandwidth.

The transaction processing capacity of the Sun SPARC Enterprise M5000 server enabled very fast batch processing times while supporting over 1000 online users.

While running the maximum workload specified by the benchmark, the systems were lightly loaded, providing headroom to grow.

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle's PeopleSoft Financials 9.0 benchmark, Oracle's SPARC T3-1 (1 1.65GHz SPARC-T3), Oracle's SPARC Enterprise M5000 (8 2.53GHz SPARC64), 38.66 min. www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html Results 09/20/2010.

Tuesday Sep 21, 2010

Sun Flash Accelerator F20 PCIe Cards Outperform IBM on SPC-1C

Oracle's Sun Flash Accelerator F20 PCIe cards delivered outstanding value as measured by the SPC-1C benchmark, showing the advantage of Oracle's Sun FlashFire technology in terms of both performance and price/performance.
  • Three Sun Flash Accelerator F20 PCIe cards delivered an aggregate of 72,521.11 SPC-1C IOPS, achieving the best price/performance (TSC price / SPC-1C IOPS).

  • The Sun Flash Accelerator F20 PCIe cards delivered 61% better performance than the IBM System Storage EXP12S at one-fifth the TSC price.

  • The Sun Flash Accelerator F20 PCIe cards delivered 9x better price/performance (TSC price / SPC-1C IOPS) than the IBM System Storage EXP12S.

  • The Sun F20 PCIe Flash Accelerator cards and the workload generator were run and priced inside Oracle's Sun Fire X4270 M2 server. The storage and workload generator together used the same space (2 RU) as the IBM System Storage EXP12S by itself.

  • The Sun Flash Accelerator F20 PCIe cards delivered 6x better access density (SPC-1C IOPS / ASU Capacity (GB)) than the IBM System Storage EXP12S.

  • The Sun Flash Accelerator F20 PCIe cards delivered 1.5x better price / storage capacity (TSC / ASU Capacity (GB)) than the IBM System Storage EXP12S.

  • This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

System SPC-1C IOPS ASU Capacity (GB) TSC Data Protection Level LRT Response (µsecs) Access Density Price/Perf $/GB Identifier
Sun F5100 300,873.47 1374.390 $151,381 unprotected 330 218.9 $0.50 110.14 C00010
Sun F20 72,521.11 147.413 $15,554 unprotected 468 492.0 $0.21 105.51 C00011
IBM EXP12S 45,000.20 547.610 $87,468 unprotected 460 82.2 $1.94 159.76 E00001

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit (ASU) capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (µsecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU Capacity (GB), bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better.
$/GB – Derived metric of TSC / ASU Capacity (GB), smaller is better
Pricing for the IBM EXP12S included maintenance; pricing for the Sun F20 did not
Identifier – The SPC-1C submission identifier
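
The derived columns follow mechanically from the three base quantities; a minimal sketch reproducing the table's derived metrics:

    def derived(iops: float, asu_gb: float, tsc_usd: float) -> dict:
        """Derived SPC-1C metrics as defined in the legend above."""
        return {
            "access_density": iops / asu_gb,  # SPC-1C IOPS per ASU GB
            "price_perf": tsc_usd / iops,     # $ per SPC-1C IOPS
            "usd_per_gb": tsc_usd / asu_gb,   # $ per ASU GB
        }

    print(derived(72_521.11, 147.413, 15_554))  # Sun F20 row: ~492.0, ~$0.21, ~$105.5
    print(derived(45_000.20, 547.610, 87_468))  # IBM EXP12S row: ~82.2, ~$1.94, ~$159.7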

Results and Configuration Summary

Storage Configuration:

3 x Sun Flash Accelerator F20 PCIe cards (4 FMODs each)

Hardware Configuration:

1 x Sun Fire X4270 M2 server
12 GB memory
1 x 2.93 GHz Intel Xeon X5670 processor

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC Benchmark 1C™ (SPC-1C) is the first Storage Performance Council (SPC) component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs), intelligent enclosures, and storage software such as Logical Volume Managers. SPC-1C utilizes a workload identical to SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business-critical applications. Those applications are characterized by predominately random I/O operations and require both query and update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand alone configuration. An external enclosure may be used, but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

  • For best performance, ensure partitions start on a 4K-aligned boundary.

See Also

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Flash Accelerator F20 PCIe cards SPC-1C submission identifier C00011 results of 72,521.11 SPC-1C IOPS over a total ASU capacity of 147.413 GB using unprotected data protection, a SPC-1C LRT of 0.468 milliseconds, a 100% load over all ASU response time of 6.17 milliseconds and a total TSC price (not including three-year maintenance) of $15,554. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E Submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.35. Derived metrics: Access Density (SPC-1C IOPS / ASU Capacity (GB)); Price / Performance (TSC / SPC-1C IOPS); Price / Storage Capacity (TSC / ASU Capacity (GB))

The Sun Flash Accelerator F20 PCIe card is a single half-height, low-profile PCIe card. The IBM System Storage EXP12S is a 2RU (3.5") array.

Thursday Jun 10, 2010

Hyperion Essbase ASO World Record on Sun SPARC Enterprise M5000

Oracle's Sun SPARC Enterprise M5000 server is an excellent platform for implementing Oracle Essbase as demonstrated by the Aggregate Storage Option (ASO) benchmark.

  • Oracle's Sun SPARC Enterprise M5000 server with Oracle Solaris 10 and using Oracle's Sun Storage F5100 Flash Array system has achieved world record performance running the Oracle Essbase Aggregate Storage Option benchmark using Oracle Hyperion Essbase 11.1.1.3 and the Oracle 11g database.

  • The workload used over 1 billion records in a 15 dimensional database with millions of members. Oracle Hyperion is a component of Oracle Fusion Middleware.

  • The Sun Storage F5100 Flash Array system provides more than 20% improvement out of the box compared to a mid-size fibre channel disk array for default aggregation and user-based aggregation.

  • The Sun SPARC Enterprise M5000 server with Sun Storage F5100 Flash Array system and Oracle Hyperion Essbase 11.1.1.3 running on Oracle Solaris 10 provides less than 1 second query response times for 20K users in a 15 dimensional database.

  • The Sun Storage F5100 Flash Array system and Oracle Hyperion Essbase provide the best combination for large Essbase databases, leveraging ZFS and taking advantage of high bandwidth for faster load and aggregation.

  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle Hyperion's performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

Performance Landscape

System Database Size Data Load Default Aggregation User Aggregation
Sun M5000, 2.53 GHz SPARC64 VII 1000M 269 min 526 min 115 min
Sun M5000, 2.4 GHz SPARC64 VII 400M 120 min 448 min 18 min

Less time means a faster result.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise M5000
      4 x SPARC64 VII, 2.53 GHz
      64 GB memory
    Sun Storage F5100 Flash Array
      40 x 24 GB Flash modules

Software Configuration:

    Oracle Solaris 10
    Oracle Solaris ZFS
    Installer V 11.1.1.3
    Oracle Hyperion Essbase Client v 11.1.1.3
    Oracle Hyperion Essbase v 11.1.1.3
    Oracle Hyperion Essbase Administration services 64-bit
    Oracle Weblogic 9.2MP3 -- 64 bit
    Oracle Fusion Middleware
    Oracle RDBMS 11.1.0.7 64-bit

Benchmark Description

The benchmark highlights how Oracle Essbase can support pervasive deployments in large enterprises. It simulates an organization that needs to support a large Essbase Aggregate Storage database with over one billion data items, a large dimension with 14 million members, and 20 thousand active concurrent users, each operating in mixed mode: ad-hoc reporting and report viewing. The application for this benchmark was designed to model a scaled-out version of a financial business intelligence application.

The benchmark simulates typical administrative and user operations in an OLAP application environment. Administrative operations include dimension build, data load, and data aggregation. User testing modeled a total user base of 200,000, with 10 percent actively retrieving data from Essbase.

Key Points and Best Practices

  • Sun Storage F5100 Flash Array system has been used to accelerate the application performance.
  • Jumbo frames were enabled to speed up data loading.

See Also

Disclosure Statement

Oracle Essbase, www.oracle.com/solutions/mid/oracle-hyperion-enterprise.html, results 5/20/2010.

Wednesday Jun 09, 2010

PeopleSoft Payroll 500K Employees on Sun SPARC Enterprise M5000 World Record

Oracle's Sun SPARC Enterprise M5000 server combined with Oracle's Sun Storage F5100 Flash Array system has produced World Record Performance on PeopleSoft Payroll 9.0 (North American) 500K employees benchmark.
  • The Sun SPARC Enterprise M5000 server and the Sun Storage F5100 Flash Array system processed payroll for 500K employees using 32 payroll threads 18% faster than the IBM z10 EC 2097-709 mainframe, as measured by the payroll processing tasks of the PeopleSoft Payroll 9.0 (North American) benchmark. This IBM mainframe is rated at 6,512 MIPS.

  • The IBM z10 mainframe with nine 4.4 GHz Gen1 processors has a list price over $6M.

  • The Sun SPARC Enterprise M5000 server together with the Sun Storage F5100 Flash Array system processed payroll for 500K employees using 32 payroll threads 92% faster than an HP rx7640, as measured by the payroll processing tasks of the PeopleSoft Payroll 9.0 (North American) benchmark.

  • The Sun Storage F5100 Flash Array system is a high-performance, high-density solid-state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

  • The Sun SPARC Enterprise M5000 server used the Oracle Solaris 10 operating system and ran with the Oracle 11gR1 database for this benchmark.

Performance Landscape

500K Employees

System Processor OS/Database Payroll Result (min) Run 1 (min) Run 2 (min) Run 3 (min) Streams
Sun M5000 8x 2.53GHz SPARC64 VII Solaris/Oracle 11g 50.11 73.88 534.20 1267.06 32
IBM z10 9x 4.4GHz Gen1 (6,512 MIPS) z/OS/DB2 58.96 80.5 250.68 462.6 8
HP rx7640 8x 1.6GHz Itanium2 HP-UX/Oracle 11g 96.17 133.63 712.72 1665.01 32

Times in the Run columns above represent payroll processing plus post-processing elapsed times, where:

  • Run 1 = 32 parallel job streams & Single Check option = "No"
  • Run 2 = 32 sequential jobs for Pay Calculation process & 32 parallel job streams for the rest. Single Check option = "Yes"
  • Run 3 = One job stream & Single Check option = "Yes"

Times in the Result column represent payroll processing only.
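
The headline speedups quoted above come straight from the Result column (payroll processing only); a quick sketch:

    # Payroll-processing-only results, in minutes, from the table above.
    results_min = {"Sun M5000": 50.11, "IBM z10": 58.96, "HP rx7640": 96.17}

    sun = results_min["Sun M5000"]
    for system, minutes in results_min.items():
        if system != "Sun M5000":
            print(f"Sun M5000 is {minutes / sun - 1:.0%} faster than {system}")
            # -> 18% faster than IBM z10, 92% faster than HP rx7640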

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M5000 (8 x 2.53 GHz/64 GB)
    1 x Sun Storage F5100 Flash Array (40 x 24 GB FMODs)
    1 x StorageTek 2510 (4 x 136 GB SAS 15K RPM)
    4 x Dual-Port SAS Fibre Channel Host Bus Adapters (HBA)

Software Configuration:

    Oracle Solaris 10 10/09
    Oracle PeopleSoft HCM and Campus Solutions 9.00.00.311 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.25 64-bit
    Oracle 11g R1 11.1.0.7 64-bit
    Micro Focus COBOL Server Express 4.0 SP4 64-bit

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

  • Paysheet Creation: generates the payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculation and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deductions, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collect at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of thirty-two job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun SPARC Enterprise M5000 (8 2.53GHz SPARC64 VII) 50.11 min, IBM z10 (9 gen1) 58.96 min, HP rx7640 (8 1.6GHz Itanium2) 96.17 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 6/3/2010.

Wednesday Apr 14, 2010

Oracle Sun Storage F5100 Flash Array Delivers World Record SPC-1C Performance

Oracle's Sun Storage F5100 flash array delivered world record performance on the SPC-1C benchmark. The SPC-1C benchmark shows the advantage of Oracle's FlashFire technology.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance of 300,873.47 SPC-1C IOPS.

  • The Sun Storage F5100 flash array requires half the rack space of the next best result, the IBM System Storage EXP12S.

  • The Sun Storage F5100 flash array delivered nearly seven times better SPC-1C IOPS performance than the next best SPC-1C result, the IBM System Storage EXP12S with 8 SSDs.

  • The Sun Storage F5100 flash array delivered the world record SPC-1C LRT (response time) performance of 330 microseconds, and a full load response time of 2.63 milliseconds, which is over 2.5x better than the IBM System Storage EXP12S SPC-1C result.

  • Compared to the IBM result, the Sun Storage F5100 flash array delivered 2.7x better access density (SPC-1C IOPS/ ASU GB), 3.9x better price/performance (TSC/ SPC-1C IOPS) and 31% better tested $/GB (TSC/ ASU) as part of these SPC-1C benchmark results.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance using the SPC-1C workload driven by the Sun SPARC Enterprise M5000 server. This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

System SPC-1C IOPS ASU Capacity (GB) TSC Data Protection Level LRT Response (usecs) Access Density Price/Perf $/GB Identifier
Sun F5100 300,873.47 1374.390 $151,381 unprotected 330 218.9 $0.50 110.1 C00010
IBM EXP12S 45,000.20 547.610 $87,468 unprotected 460 82.2 $1.94 159.8 E00001

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (usecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU GB, bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better
$/GB – Derived metric of TSC / ASU, smaller is better
Identifier – The SPC-1C submission identifier

Results and Configuration Summary

Storage Configuration:

1 x Sun Storage F5100 flash array with 80 FMODs

Hardware Configuration:

1 x Sun SPARC Enterprise M5000 server
16 x StorageTek PCIe SAS Host Bus Adapter, 8 Port

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC-1C is the first SPC component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs), intelligent enclosures, and storage software such as Logical Volume Managers. SPC-1C utilizes a workload identical to SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business-critical applications. Those applications are characterized by predominately random I/O operations and require both query and update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand alone configuration. An external enclosure may be used but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E Submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.

The Sun Storage F5100 flash array is a 1RU (1.75") array. The IBM System Storage EXP12S is a 2RU (3.5") array.

Tuesday Apr 13, 2010

Oracle Sun Flash Accelerator F20 PCIe Card Accelerates Web Caching Performance

Using Oracle's Sun FlashFire technology, the Sun Flash Accelerator F20 PCIe Card is shown to be a high-performance and cost-effective caching device for web servers. Many current web and application servers are designed with an active cache that is used for holding items such as session objects, files, and web pages. The Sun F20 card is shown to be an excellent candidate to improve performance over HDD-based solutions.

  • The Sun Flash Accelerator F20 PCIe Card provides 2x better Quality of Service (QoS) at the same load as compared to 15K RPM high performance disk drives.

  • The Sun Flash Accelerator F20 PCIe Card enables scaling to 3x more users than 15K RPM high performance disk drives.

  • The Sun Flash Accelerator F20 PCIe Card provides 25% higher Quality of Service (QoS) than 15K RPM high performance disk drives at maximum rate.

  • The Sun Flash Accelerator F20 PCIe Card allows for easy expansion of the webcache. Each card provides an additional 96 GB of storage.

  • The Sun Flash Accelerator F20 PCIe Card used as a caching device offers Bitrate and Quality of Service (QoS) comparable to that provided by memory. While memory also provides excellent caching performance in comparison to disk, memory capacity is limited in servers.

Performance Landscape

Experiment results using three Sun Flash Accelerator F20 PCIe Cards.

Load Factor No Cache (Max Load) F20 Webcache (@Disk Max Load) F20 Webcache (Max Load) Memcache (@F20 Max Load)
Max Connections 7,000 7,000 27,000 27,000
Average Bitrate 445 Kbps 870 Kbps 602 Kbps 678 Kbps
Cache Hit Rate 0% 98% 99% 56%

QoS Bitrates %Connect %Connect %Connect %Connect
900 Kbps - 1 Mbps 0% 97% 0% 0%
800 Kbps 0% 3% 0% 6%
700 Kbps 0% 0% 64% 70%
600 Kbps 18% 0% 24% 15%
420 Kbps - 500 Kbps 88% 0% 12% 9%

Experiment results using two Sun Flash Accelerator F20 PCIe Cards.

Load Factor No Cache (Max Load) F20 Webcache (@Disk Max Load) F20 Webcache (Max Load) Memcache (@F20 Max Load)
Max Connections 7,000 7,000 22,000 27,000
Average Bitrate 445 Kbps 870 Kbps 622 Kbps 678 Kbps
Cache Hit Rate 0% 98% 80% 56%

QoS Bitrates %Connect %Connect %Connect %Connect
900 Kbps - 1 Mbps 0% 97% 0% 0%
800 Kbps 0% 3% 1% 6%
700 Kbps 0% 0% 68% 70%
600 Kbps 18% 0% 26% 15%
420 Kbps - 500 Kbps 88% 0% 5% 9%

Results and Configuration Summary

Hardware Configuration:

Sun Fire X4270, 72 GB memory
3 x Sun Flash Accelerator F20 PCIe Card
Sun Storage J4400 (12 15K RPM disks)

Software Configuration:

Sun Java System Web Server 7
OpenSolaris
Flickr Photo Download Workload
Oracle Solaris Zettabyte File System (ZFS)

Three configurations are compared:

  1. No cache, 12 x high-speed 15K RPM Disks
  2. 3 x Sun Flash Accelerator F20 PCIe Cards as cache device
  3. 64 GB server memory as cache device

Benchmark Description

This benchmark is based upon the description of the flickr website presented at http://highscalability.com/flickr-architecture. It measures the performance of an HTTP-based photo slide show workload. The workload randomly selects and downloads from 80 photos stored in 4 bins:

  • 20 large photos, 1800x1800p, 1 MB, 1% probability
  • 20 medium photos, 1000x1000p, 500 KB, 4% probability
  • 20 small photos, 540x540p, 100 KB, 35% probability
  • 20 thumbnail photos, 100x100p, 5 KB, 60% probability
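
Under those bin probabilities the expected transfer per download works out to roughly 68 KB, which is easy to verify (sizes approximated from the list above):

    bins = [(1024, 0.01), (500, 0.04), (100, 0.35), (5, 0.60)]  # (KB, probability)

    expected_kb = sum(size_kb * p for size_kb, p in bins)
    print(f"Expected download size: ~{expected_kb:.0f} KB")  # -> ~68 KB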

Benchmark metrics are:

  • Scalability – Number of persistent connections achieved
  • Quality of Service (QoS) – bitrate achieved by each user (see the sketch after this list)
    • max speed: 1 Mbps, min speed SLA: 420 Kbps
    • divides bitrates between max and min in 5 bands, corresponding to dial-in, T1, etc.
    • example: 900 Kbps, 800 Kbps, 700 Kbps, 600 Kbps, 500 Kbps
    • reports %users in each bitrate band
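
A minimal sketch of the QoS bucketing described above, using hypothetical per-connection bitrates for illustration (the band floors follow the example values listed):

    from collections import Counter

    BANDS = [900, 800, 700, 600, 500, 420]  # Kbps band floors, high to low

    def band(bitrate_kbps: float) -> str:
        """Map an achieved bitrate to its QoS band."""
        for floor in BANDS:
            if bitrate_kbps >= floor:
                return f">= {floor} Kbps"
        return "below 420 Kbps SLA"

    rates = [950, 870, 760, 640, 430, 410]  # hypothetical measurements
    shares = Counter(band(r) for r in rates)
    for b, n in shares.items():
        print(f"{b}: {n / len(rates):.0%} of connections")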

Three cases were tested:

  • Disk as OverFlow Cache – Contents are served from 12 high-performance 15K RPM disks configured in a ZFS zpool.
  • Sun Flash Accelerator F20 PCIe Card as Cache Device – Contents are served from 2 F20 cards, with 8 component DOMs configured in a ZFS zpool
  • Memory as Cache – Contents are served from tmpfs

Key Points and Best Practices

See Also

Disclosure Statement

Results as of 4/1/2010.

Wednesday Nov 18, 2009

Sun Flash Accelerator F20 PCIe Card Achieves 100K 4K IOPS and 1.1 GB/sec

Part of the Sun FlashFire family, the Sun Flash Accelerator F20 PCIe Card is a low-profile x8 PCIe card with 4 Solid State Disks-on-Modules (DOMs) delivering over 101K IOPS (4K IO) and 1.1 GB/sec throughput (1M reads).

The Sun F20 card is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

  • The Sun Flash Accelerator F20 PCIe Card demonstrates breakthrough performance of 101K IOPS for 4K random read
  • The Sun Flash Accelerator F20 PCIe Card can also perform 88K IOPS for 4K random write
  • The Sun Flash Accelerator F20 PCIe Card has unprecedented throughput of 1.1 GB/sec.
  • The Sun Flash Accelerator F20 PCIe Card (low-profile x8 size) has the IOPS performance of over 550 SAS drives or 1,100 SATA drives.

Performance Landscape

Bandwidth and IOPS Measurements

Test 4 DOMs 2 DOMs 1 DOM
Random 4K Read 101K IOPS 68K IOPS 35K IOPS
Maximum Delivered Random 4K Write 88K IOPS 44K IOPS 22K IOPS
Maximum Delivered 50-50 4K Read/Write 54K IOPS 27K IOPS 13K IOPS
Sequential Read (1M) 1.1 GB/sec 547 MB/sec 273 MB/sec
Maximum Delivered Sequential Write (1M) 567 MB/sec 243 MB/sec 125 MB/sec

Sustained Random 4K Write* 37K IOPS 18K IOPS 10K IOPS
Sustained 50/50 4K Read/Write* 34K IOPS 17K IOPS 8.6K IOPS

(*) Maximum delivered values were measured over a 1-minute period. Sustained write performance differs from maximum delivered performance: over time, wear-leveling and erase operations are required and impact write performance levels.

Latency Measurements

The Sun Flash Accelerator F20 PCIe Card is tuned for 4 KB or larger IO sizes; the write service time for IOs smaller than 4 KB can be 10 times higher than shown in the table below. It should also be noted that the service times shown below include both the latency and the time to transfer the data; the transfer time becomes the dominant portion of the service time for IOs over 64 KB in size.

Transfer Size Read Service Time (ms) Write Service Time (ms)
4 KB 0.32 0.22
8 KB 0.34 0.24
16 KB 0.37 0.27
32 KB 0.43 0.33
64 KB 0.54 0.46
128 KB 0.49 1.30
256 KB 1.31 2.15
512 KB 2.25 2.25

- Latencies are application-level latencies measured with the Vdbench tool.
- Please note that the FlashFire F20 card is a 4 KB-sector device. Doing IOs of less than 4 KB in size, or IOs not aligned on 4 KB boundaries, can result in significant performance degradation on write operations.
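
A rough sketch of the transfer-time component for large IOs, assuming the ~273 MB/sec single-DOM sequential-read rate from the bandwidth table above (an estimate, not a measured breakdown):

    PER_DOM_MBPS = 273.0  # single-DOM sequential read rate, from the table above

    for size_kb in (64, 128, 256, 512):
        xfer_ms = (size_kb / 1024) / PER_DOM_MBPS * 1000
        print(f"{size_kb:>4} KB: ~{xfer_ms:.2f} ms transfer component")
    # e.g. ~1.83 ms of the 2.25 ms 512 KB read service time is data transfer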

Results and Configuration Summary

Storage:

    Sun Flash Accelerator F20 PCIe Card
      4 x 24-GB Solid State Disks-on-Modules (DOMs)

Servers:

    1 x Sun Fire X4170

Software:

    OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
    Vdbench 5.0
    Required Flash Array Patches SPARC, ses/sgen patch 138128-01 or later & mpt patch 141736-05
    Required Flash Array Patches x86, ses/sgen patch 138129-01 or later & mpt patch 141737-05

Benchmark Description

Sun measured a wide variety of IO performance metrics on the Sun Flash Accelerator F20 PCIe Card using Vdbench 5.0, measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 Read/Write. This demonstrates the maximum performance and throughput of the storage system.

The Vdbench profile f20-parmfile.txt was used for the bandwidth and IOPS measurements, and the profile f20-latency.txt was used for the latency measurements.

Vdbench is publicly available for download at: http://vdbench.org

Key Points and Best Practices

  • Drive each Flash Module with 32 outstanding IOs, as shown in the benchmark profile above.
  • SPARC platforms align with the 4K boundary size set by the Flash Array. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.

See Also

Disclosure Statement

Sun Flash Accelerator F20 PCIe Card delivered 100K 4K read IOPS and 1.1 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 14, 2009.

Wednesday Nov 04, 2009

New TPC-C World Record Sun/Oracle

TPC-C Sun SPARC Enterprise T5440 with Oracle RAC World Record Database Result

Sun and Oracle demonstrate the world's fastest database performance. Sun Microsystems, using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 Flash Array systems, and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, delivered a world-record TPC-C benchmark result.

  • The 12-node Sun SPARC Enterprise T5440 server cluster result delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC and $2.36 $/tpmC (USD) using Oracle 11g R1 on a configuration available 3/19/10.

  • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.

  • The complete Oracle/Sun solution delivered 10.7x better computational density than the IBM configuration (computational density = performance per rack).

  • The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.

  • The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.

  • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.

  • The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.

  • Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.

  • This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.

More information on this benchmark will be posted in the next several days.
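
As a rough cross-check of the density and efficiency bullets above, a minimal sketch using the figures from the Performance Landscape table below (small rounding differences against the quoted 10.7x are expected, since the published claim was computed from unrounded inputs):

    sun = {"tpmC": 7_646_487, "racks": 9, "w_per_ktpmc": 9.6}
    ibm = {"tpmC": 6_085_166, "racks": 76, "w_per_ktpmc": 56.4}

    density = (sun["tpmC"] / sun["racks"]) / (ibm["tpmC"] / ibm["racks"])
    print(f"Computational density: {density:.1f}x")                       # ~10.6x
    print(f"Racks: {ibm['racks'] / sun['racks']:.1f}x fewer")             # ~8.4x
    print(f"Power/perf: {ibm['w_per_ktpmc'] / sun['w_per_ktpmc']:.1f}x")  # ~5.9x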

Performance Landscape

TPC-C results (sorted by tpmC, bigger is better)

System tpmC Price/tpmC Avail Database Cluster Racks w/KtpmC
12 x Sun SPARC Enterprise T5440 7,646,487 2.36 USD 03/19/10 Oracle 11g RAC Y 9 9.6
IBM Power 595 6,085,166 2.81 USD 12/10/08 IBM DB2 9.5 N 76 56.4
HP Integrity Superdome 4,092,799 2.93 USD 08/06/07 Oracle 10g R2 N 46 to be added

Avail - Availability date
w/KtpmC - Watts per 1000 tpmC
Racks - clients, servers, storage, infrastructure

Sun and IBM TPC-C Response Times

System tpmC New Order 90th% Response Time (sec) New Order Average Response Time (sec)
12 x Sun SPARC Enterprise T5440 7,646,487 0.170 0.168
IBM Power 595 6,085,166 1.69 1.22
Response Time Ratio - Sun Better 9.9x 7.3x

Sun uses the Average New Order response time (7.3x better) for the comparison between Sun's solution and IBM, although Sun is nearly 10x faster on New Order transactions that finish within the 90th percentile.

It is also interesting to note that none of Sun's response times, average or 90th percentile, for any transaction exceeds 0.25 seconds, while IBM does not have a single interactive transaction, not even the menu, below 0.50 seconds. Graphs of Sun's and IBM's response times for New-Order can be found in the full disclosure reports on TPC's website TPC-C Official Result Page.

Results and Configuration Summary

Hardware Configuration:

    9 racks used to hold

    Servers:
      12 x Sun SPARC Enterprise T5440
      4 x 1.6 GHz UltraSPARC T2 Plus
      512 GB memory
      10 GbE network for cluster
    Storage:
      60 x Sun Storage F5100 Flash Array
      61 x Sun Fire X4275, Comstar SAS target emulation
      24 x Sun StorageTek 6140 (16 x 300 GB SAS 15K RPM)
      6 x Sun Storage J4400
      3 x 80-port Brocade FC switches
    Clients:
      24 x Sun Fire X4170, each with
      2 x 2.53 GHz X5540
      48 GB memory

Software Configuration:

    Solaris 10 10/09
    OpenSolaris 6/09 (COMSTAR) for Sun Fire X4275
    Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning
    Tuxedo CFS-R Tier 1
    Sun Web Server 7.0 Update 5

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

See Also

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processor) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 3/19/10. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome(1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 11/5/09.

Thursday Oct 15, 2009

Oracle Flash Cache - SGA Caching on Sun Storage F5100

Overview and Significance of Results

Oracle and Sun's Flash Cache technology combines new features in Oracle Database with the Sun Storage F5100 flash array to improve database performance. In Oracle databases, the System Global Area (SGA) is a group of shared memory areas that are dedicated to an Oracle “instance” (Oracle processes in execution sharing a database). All Oracle processes use the SGA to hold information. The SGA is used to store incoming data (data and index buffers) and internal control information that is needed by the database. The size of the SGA is limited by the size of the available physical memory.

This benchmark tested and measured the performance of a new Oracle Database 11g Release 2 feature that allows SGA caching to extend beyond physical memory onto a large flash storage device such as the Sun Storage F5100 flash array.

One particular benchmark test demonstrated a dramatic performance improvement (almost 5x) using the Oracle Extended SGA feature on flash storage, reaching SGA sizes in the hundreds-of-GB range at a more reasonable cost than equivalently sized RAM and with much faster access times than disk I/O.

The workload consisted of a high volume of SQL select transactions accessing a very large table in a typical business-oriented OLTP database. To obtain a baseline, throughput and response times were measured by applying the workload against a traditional storage configuration constrained by disk I/O demand (a DB working set of about 3x the size of the data cache in the SGA). The workload was then executed with an added Sun Storage F5100 Flash Array configured to contain an Extended SGA of increasing size.

The tests showed throughput scaling along with increasing Flash Cache size.

Table of Results

F5100 Extended SGA Size (GB)   Query Txns/Min   Avg Response Time (sec)   Speedup Ratio
None (baseline)                        76,338                     0.118             N/A
25                                    169,396                     0.053             2.2
50                                    224,318                     0.037             2.9
75                                    300,568                     0.031             3.9
100                                   357,086                     0.025             4.6
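
The speedup ratio is simply the throughput at each cache size divided by the no-flash baseline; for example, 357,086 / 76,338 ≈ 4.7, the "almost 5x" improvement cited above.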

Configuration Summary

Server Configuration:

    Sun SPARC Enterprise M5000 Server
    8 x SPARC64 VII 2.4GHz Quad Core
    96 GB memory

Storage Configuration:

    8 x Sun Storage J4200 Arrays, 12x 146 GB 15K RPM disks each (96 disks total)
    1 x Sun Storage F5100 Flash Array

Software Configuration:

    Oracle 11gR2
    Solaris 10

Benchmark Description

The workload consisted in a high volume of SQL select transactions accessing a very large table in a typical business oriented OLTP database.

The database consisted of various tables: Products, Customers, Orders, Warehouse Inventory (Stock) data, etc. and the Stock table alone was 3x the size of the db cache size.

To obtain a baseline, throughput and response times were measured applying the workload against a traditional storage configuration and constrained by disk I/O demand. The workload was then executed with an added Sun Storage F5100 Flash Array configured to contain an Extended SGA of incremental size.

During all tests, the in-memory SGA data cache was limited to 25 GB.

The Extended SGA was allocated on a "raw" Solaris volume created with the Solaris Volume Manager (SVM) on a set of devices (Flash Modules) residing on the Sun Storage F5100 flash array.
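
As a minimal sketch of that layout (device names are hypothetical), several Flash Modules could be striped into a single SVM volume and the raw metadevice handed to the database:

    # Stripe 4 F5100 Flash Modules into one SVM volume with a 512 KB interlace
    metainit d100 1 4 c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0 -i 512k
    # The raw device is then available as /dev/md/rdsk/d100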

Key Points and Best Practices

In order to verify the performance improvement brought by extended SGA, the feature had to be tested with a large enough database size and with a workload requiring significant disk I/O activity to access the data. For that purpose, the size of the database needed to be a multiple of the physical memory size, avoiding the case in which the accessed data could be entirely or almost entirely cached in physical memory.

The above represents a typical “use case” in which the Flash Cache Extension is able to show remarkable performance advantages.

If the DB dataset is already entirely cached, if the DB I/O demand is not significant, if the application already saturates the CPU with non-database processing, or if large data caching is not productive (DSS-type queries), the Extended SGA may not improve performance.

Note also that the additional memory structures needed to manage the Extended SGA are allocated in the in-memory SGA, reducing its data caching capacity.

Increasing the Extended Cache beyond a specific threshold, dependent on various factors, may reduce the benefit of widening the Flash SGA and actually reduce the overall throughput.

This new cache is somewhat similar architecturally to the L2ARC on ZFS. Once written, flash cache buffers are read-only, and updates are only done into main memory SGA buffers. This feature is expected to primarily benefit read-only and read-mostly workloads.

A typical sizing of database flash cache is 2x to 10x the size of SGA memory buffers. Note that header information is stored in the SGA for each flash cache buffer (100 bytes per buffer in exclusive mode, 200 bytes per buffer in RAC mode), so the number of available SGA buffers is reduced as the flash cache size increases, and the SGA size should be increased accordingly.
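
As a rough worked example, assuming the default 8 KB database block size: a 100 GB flash cache holds about 13.1 million buffers, and at 100 bytes of header per buffer (exclusive mode) those headers consume roughly 1.3 GB of in-memory SGA, which should be budgeted for when sizing the cache.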

Two new init.ora parameters have been introduced, illustrated below:

    db_flash_cache_file = /lfdata/lffile_raw
    db_flash_cache_size = 100G

The db_flash_cache_file parameter takes a single file name, which can be a file system file, a raw device, or an ASM volume. The db_flash_cache_size parameter specifies the size of the flash cache. Note that for raw devices, the partition being used should start at cylinder 1 rather than cylinder 0 (to avoid the disk's volume label).
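
A minimal sketch of setting the parameters from SQL*Plus, assuming the hypothetical raw SVM volume from the sketch earlier; the new settings take effect after an instance restart:

    SQL> ALTER SYSTEM SET db_flash_cache_file='/dev/md/rdsk/d100' SCOPE=SPFILE;
    SQL> ALTER SYSTEM SET db_flash_cache_size=100G SCOPE=SPFILE;
    SQL> SHUTDOWN IMMEDIATE
    SQL> STARTUP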

See Also

Disclosure Statement

Results as of October 10, 2009 from Sun Microsystems.

Tuesday Oct 13, 2009

SPECweb2005 on Sun SPARC Enterprise T5440 World Record using Solaris Containers and Sun Storage F5100 Flash

The Sun SPARC Enterprise T5440 server with four 1.6 GHz UltraSPARC T2 Plus processors, using Solaris Containers, Sun open storage flash technology, and Sun Java System Web Server 7.0 Update 5, achieved a world record SPECweb2005 result.

  • Sun has obtained a world record SPECweb2005 performance result of 100,209 on the Sun SPARC Enterprise T5440, running Solaris 10 10/09, Sun Java System Web Server 7.0 Update 5, and the Java Hotspot™ Server VM.

  • This result demonstrates performance leadership of the Sun SPARC Enterprise T5440 server and its scalability, by using Solaris Containers to consolidate multiple web serving environments, and Sun OpenStorage Flash technology to store large datasets for fast data retrieval.

  • The Sun SPARC Enterprise T5440 delivers 21% greater SPECweb2005 performance than the HP DL370 G6 with 3.2GHz Xeon W5580 processors.

  • The Sun SPARC Enterprise T5440 delivers 40% greater SPECweb2005 performance than the HP DL585 G5 with four 3.1 GHz Opteron 8393 SE processors.

  • The Sun SPARC Enterprise T5440 delivers 2x the SPECweb2005 performance of the HP DL580 G5 with four 2.66GHz Xeon X7460 processors.

  • There are no IBM Power6 results on the SPECweb2005 benchmark.

  • This benchmark result clearly demonstrates that the Sun SPARC Enterprise T5440, running Solaris 10 10/09 and Sun Java System Web Server 7.0 Update 5, can support thousands of concurrent web server sessions and is an industry leader in web serving.

Performance Landscape

Server        Processor        SPECweb2005   Banking*   Ecomm*    Support*   Webserver        OS
Sun T5440     4x 1.6 T2 Plus       100,209    176,500   133,000     95,000   Java WebServer   Solaris
HP DL370 G6   2x 3.2 W5580          83,073    117,120   142,080     76,352   Rock             RedHat Linux
HP DL585 G5   4x 3.11 O8393         71,629    117,504   123,072     56,320   Rock             RedHat Linux
HP DL580 G5   4x 2.66 X7460         50,013     97,632    69,600     40,800   Rock             RedHat Linux

* Banking - SPECweb2005-Banking
  Ecomm - SPECweb2005-Ecommerce
  Support - SPECweb2005-Support

Results and Configuration Summary

Hardware Configuration:

  1 Sun SPARC Enterprise T5440 with

  • 4 x 1.6 GHz UltraSPARC T2 Plus processors (8 cores, 64 threads each)
  • 254 GB memory
  • 6 x 4Gb PCI Express 8-Port Host Adapter (SG-XPCIE8SAS-E-Z)
  • 1 x Sun Storage F5100 Flash Array (TA5100RASA4-80AA)
  • 1 x Sun Storage F5100 Flash Array (TA5100RASA4-40AA)

Server Software Configuration:

  • Solaris 10 10/09
  • JAVA System Web Server 7.0 Update 5
  • Java Hotspot™ Server VM

Network configuration:

  • 1 x Arista DCS-7124S 24-port 10GbE switch
  • 1 x Cisco 2970 series (WS-C2970G-24TS-E) switch for the three 1 GbE networks

Back-end Simulator:

  1 Sun Fire X4270 with

  • 2 x 2.93 GHz Intel X5570 Quad core
  • 48GB memory
  • Solaris 10 10/09
  • JSWS 7.0 Update 5
  • Java Hotspot™ Server VM

Clients:

  8 Sun Blade™ T6320

  • 1 x 1.417 GHz UltraSPARC-T2
  • 64 GB memory
  • Solaris 10 5/09
  • Java Hotspot™ Server VM

  8 Sun Blade™ X6270

  • 2 x 2.93 GHz Intel X5570 Quad core
  • 36 GB memory
  • Solaris 10 5/09
  • Java Hotspot™ Server VM

Benchmark Description

SPECweb2005, successor to SPECweb99 and SPECweb99_SSL, is an industry standard benchmark for evaluating Web Server performance developed by SPEC. The benchmark simulates multiple user sessions accessing a Web Server and generating static and dynamic HTTP requests. The major features of SPECweb2005 are:

  • Measures simultaneous user sessions
  • Dynamic content: currently PHP and JSP implementations
  • Page images requested using 2 parallel HTTP connections
  • Multiple, standardized workloads: Banking (HTTPS), E-commerce (HTTP and HTTPS), and Support (HTTP)
  • Simulates browser caching effects
  • File accesses more accurately simulate today's disk access patterns

Key Points and Best Practices

  • The server was divided into four Solaris Containers, with a single web server instance executed in each container (see the sketch after this list).
  • Four processor sets were created (with varying numbers of threads depending on the workload) to run the web server instances in. This reduces memory access latency by using the physical memory closest to the processor. All interrupts were run on the remaining threads.
  • Each web server was executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Two Sun Storage F5100 Flash Arrays (holding the target file set and logs) were shared by the four containers for fast data retrieval.
  • Use of Solaris Containers highlights the consolidation of multiple web serving environments on a single server.
  • Use of the Sun External I/O Expansion Unit and Sun Storage F5100 Flash Arrays highlights the expandability of the server.
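
  A minimal sketch of the container, processor-set, and FX-class setup described above (zone names, CPU ID ranges, and process names are hypothetical):

      # Create and boot one container per web server instance
      zonecfg -z web1 "create; set zonepath=/zones/web1; commit"
      zoneadm -z web1 install && zoneadm -z web1 boot

      # Create a processor set and bind the web server processes to it
      psrset -c 0-63                          # prints the new set id, e.g. 1
      psrset -b 1 $(pgrep -z web1 webservd)

      # Move the web server into the FX scheduling class to reduce context switches
      priocntl -s -c FX -i pid $(pgrep -z web1 webservd)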

    Disclosure Statement

    Sun SPARC Enterprise T5440 (8 cores, 1 chip) 100209 SPECweb2005, was submitted to SPEC for review on October 13, 2009.  HP ProLiant DL370 G6 (8 cores, 2 chips) 83,073 SPECweb2005. HP ProLiant DL585 G5 (16 cores, 4 chips) 71,629 SPECweb2005. HP ProLiant DL580 G5 (24 cores, 4 chips) 50,013 SPECweb2005. SPEC, SPECweb reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of Oct 10, 2009.

    Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance

    The Sun SPARC Enterprise M4000 server, combined with Sun FlashFire technology (the Sun Storage F5100 flash array), has produced world record performance on the PeopleSoft Payroll (North America) 9.0 benchmark.

    • A Sun SPARC Enterprise M4000 server with four new 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array is 33% faster than the HP rx6600 (4 x 1.6GHz Itanium2 processors) on the PeopleSoft Payroll (NA) 9.0 benchmark. The Sun solution used the Oracle 11g database running on Solaris 10.

    • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and the Sun Storage F5100 flash array is 35% faster than the 2027 MIPS IBM Z990 (6 Z990 Gen1 processors) on the PeopleSoft Payroll (NA) 9.0 benchmark, with the Oracle 11g database running on Solaris 10. The IBM result used IBM DB2 for Z/OS 8.1 for the database.

    • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array processed 250K employee payroll checks using PeopleSoft Payroll (NA) 9.0 and Oracle 11g running on Solaris 10. Four different execution strategies were run, with an average improvement of 25% compared to HP's results on the rx6600. Sun achieved these results with 8 concurrent jobs at only 25% CPU utilization, while HP required 16 concurrent jobs at 88% CPU utilization.

    • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology processed 8 Sequential Jobs and a single run control with a total time of 527.85 minutes, an improvement of 20% compared to HP's time of 633.09 minutes.

    • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology demonstrated a speedup of 81% going from 1 to 8 streams on the PeopleSoft Payroll (NA) 9.0 benchmark using the Oracle 11g database.

    • The Sun FlashFire technology dramatically improves I/O performance for the PeopleSoft Payroll benchmark, delivering a significant performance boost over a best-optimized configuration of 60+ FC disks.

    • The Sun Storage F5100 Flash Array is a high-performance, high-density solid state flash array providing a read latency of only 0.5 msec, about 10 times faster than the 5 msec disk latencies measured on this benchmark.

    • Sun estimates that the MIPS rating for a Sun SPARC Enterprise M4000 server is over 2742 MIPS.

    Performance Landscape

    250K Employees (times in minutes)

    System      Processor                OS/Database           Run 1     Run 2     Run 3   Version
    Sun M4000   4x 2.53GHz SPARC64 VII   Solaris/Oracle 11g    79.35    288.47    527.85   9.0
    HP rx6600   4x 1.6GHz Itanium2       HP-UX/Oracle 11g      81.17    350.16    633.25   9.0
    IBM Z990    6x Gen1 2027 MIPS        Z/OS/DB2             107.34    328.66    544.80   9.0
    HP rx6600   4x 1.6GHz Itanium2       HP-UX/Oracle 11g     105.70    369.59    633.09   9.0

    Note: IBM benchmark documents show that 6 Gen1 processors equal 2027 MIPS. 13 Gen1 processors were present in this configuration, but only 6 were available for testing.

    500K Employees (times in minutes)

    System      Processor                OS/Database           Run 1     Run 2     Run 3   Version
    HP rx7640   8x 1.6GHz Itanium2       HP-UX/Oracle 11g     133.63    712.72   1665.01   9.0

    Results and Configuration Summary

    Hardware Configuration:

      1 x Sun SPARC Enterprise M4000 (4 x 2.53 GHz/32GB)
      1 x Sun Storage F5100 Flash Array (40 x 24GB FMODs)
      1 x Sun Storage J4200 (12 x 450GB SAS 15K RPM)

    Software Configuration:

      Solaris 10 5/09
      Oracle PeopleSoft HCM 9.0
      Oracle PeopleSoft Enterprise (PeopleTools) 8.49
      Micro Focus Server Express 4.0 SP4
      Oracle RDBMS 11.1.0.7 64-bit
      HP's Mercury Interactive QuickTest Professional 9.0

    Benchmark Description

    The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

    The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

    • Paysheet Creation: generates a payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

    • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

    • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

    • Print Advice Forms: takes the information generated by Payroll Calculation and Confirmation and produces an Advice form for each employee reporting earnings, taxes, deductions, etc.

    • Create Direct Deposit File: takes the information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

    For the benchmark, at least four data points are collected with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of eight job streams to be configured to run in parallel.

    Key Points and Best Practices

    See Also

    Disclosure Statement

    Oracle PeopleSoft Payroll (NA) 9.0 benchmark, Sun M4000 (4 2.53GHz SPARC64) 79.35 min, IBM Z990 (6 gen1) 107.34 min, HP rx6600 (4 1.6GHz Itanium2) 105.70 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html Results 10/13/2009.

    Monday Oct 12, 2009

    MCAE MSC/NASTRAN faster on Sun F5100 and Fire X4270

    Significance of Results

    The Sun Storage F5100 Flash Array can double performance over internal hard disk drives, as shown by MDR3 benchmark tests of the I/O-intensive MSC/Nastran MCAE application on a Sun Fire X4270 server.

    The MD Nastran MDR3 benchmarks were run on a single Sun Fire X4270 server. The I/O-intensive test cases were run at different core counts, from one up to the maximum of 8 available cores, in SMP mode.

    The MSC/Nastran MD 2008 R3 module is an MCAE application based on the finite element method (FEM) of analysis. This computer-based numerical method inherently involves a substantial I/O component. The purpose was to evaluate the performance of the Sun Storage F5100 Flash Array relative to high-performance striped 15K RPM internal HDDs.

    • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "xx0cmd2" test case by 107% in the 8-core server configuration.

    • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "xl0tdf1"test case by 85% in the 8-core server configuration.

    The MD Nastran MDR3 test suite was designed to include some very I/O-intensive test cases, although some are not very scalable. These cases are called "xx0wmd0" and "xx0xst0". Both were run, and results are presented using a single-core server configuration.

    • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "xx0xst0"test case by 33% in the single-core server configuration.

    • The Sun Storage F5100 Flash Array outperformed the high performance 15K RPM SAS drives on the "xx0wmd0"test case by 20% in the single-core server configuration.

    Performance Landscape

    MD Nastran MDR3 Benchmark Tests

    Results in seconds

    Test Case   DMP   4 x 15K RPM 72 GB SAS HDD,   Sun F5100, r/w buff 4096,   Sun F5100
                      striped HW RAID0 (sec)       striped (sec)               Performance Advantage
    xx0cmd2       8                          959                         463   107%
    xl0tdf1       8                         1104                         596    85%
    xx0xst0       1                         1307                         980    33%
    xx0wmd0       1                        20250                       16806    20%

    Results and Configuration Summary

    Hardware Configuration:
      Sun Fire X4270
        2 x 2.93 GHz QC Intel Xeon X5570 processors
        24 GB memory
        4 x 72 GB 15K RPM striped (HW RAID0) SAS disks
      Sun Storage F5100 Flash Array
        20 x 24 GB flash modules
        Intel controller

    Software Configuration:

      O/S: 64-bit SUSE Linux Enterprise Server 10 SP 2
      Application: MSC/NASTRAN MD 2008 R3
      Benchmark: MDR3 Benchmark Test Suite
      HP MPI: 02.03.00.00 [7585] Linux x86-64

    Benchmark Description

    The benchmark tests are representative of typical MSC/Nastran applications including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

    The MD (Multi Discipline) Nastran 2008 application performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior and/or deformations are concerned. The new release includes the MARC module for general purpose nonlinear analyses and the Dytran module that employs an explicit solver to analyze crash and high velocity impact conditions.

    Please go here for a more complete description of the tests.

    Key Points and Best Practices

    • Based on the maximum physical memory on a platform, the user can stipulate the maximum portion of this memory that can be allocated to the Nastran job. This is done on the command line with the mem= option. On Linux-based systems where the platform has a large amount of memory and where the model does not have large scratch I/O requirements, the memory can be allocated to a tmpfs scratch space file system (see the sketch after this list). On Solaris x64 systems, advantage can be taken of ZFS for higher I/O performance.

    • The MD Nastran MDR3 test cases don't scale very well; a few do not scale at all, and the rest scale up to 8 cores at best.

    • The test cases for the MSC/Nastran module all have a substantial I/O component, with 15% to 25% of the total run times associated with I/O activity (primarily scratch files). The required scratch file size ranges from less than 1 GB up to about 140 GB. Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high-performance disk storage system such as a Lustre-based I/O system. High-performance interconnects such as InfiniBand, for inter-node cluster message passing as well as I/O transfer from the storage system, can also enhance performance substantially.
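
    A minimal sketch of the tmpfs approach on Linux (mount size, command name, and keywords depend on the local installation and are hypothetical here):

      # Back Nastran scratch I/O with memory when the model's scratch needs are small
      mkdir -p /scratch
      mount -t tmpfs -o size=32g tmpfs /scratch
      # mem= caps the Nastran job's memory; sdir= points scratch files at the tmpfs mount
      nast2008 xx0cmd2.dat mem=8gb sdir=/scratch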

    See Also

    Disclosure Statement

    MSC.Software is a registered trademark of MSC. All information on the MSC.Software website is copyrighted. MD Nastran MDR3 results from http://www.mscsoftware.com and this report as of October 12, 2009.

    Sunday Oct 11, 2009

    1.6 Million 4K IOPS in 1RU on Sun Storage F5100 Flash Array

    The Sun Storage F5100 Flash Array is a high performance high density solid state flash array delivering over 1.6M IOPS (4K IO) and 12.8GB/sec throughput (1M reads). The Flash Array is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

    • The Sun Storage F5100 Flash Array demonstrates breakthrough performance of 1.6M IOPS for 4K random reads
    • The Sun Storage F5100 Flash Array can also perform 1.2M IOPS for 4K random writes
    • The Sun Storage F5100 Flash Array has unprecedented throughput of 12.8 GB/sec.
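
    Averaged over the 80 Flash Modules, the 1,591K IOPS figure works out to roughly 20K random 4K read IOPS per module, in line with the 21K IOPS measured on a single module in the table below.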

    Performance Landscape

    Results were obtained using four hosts.

    Bandwidth and IOPS Measurements

    Test                                        80 FMs        40 FMs       20 FMs        1 FM
    Random 4K Read                         1,591K IOPS     796K IOPS    397K IOPS    21K IOPS
    Maximum Delivered Random 4K Write      1,217K IOPS     610K IOPS    304K IOPS    15K IOPS
    Maximum Delivered 50-50 4K Read/Write    850K IOPS     426K IOPS    213K IOPS    11K IOPS
    Sequential Read (1M)                   12.8 GB/sec    6.4 GB/sec   3.2 GB/sec  265 MB/sec
    Maximum Delivered Sequential Write (1M) 9.7 GB/sec    4.8 GB/sec   2.4 GB/sec  118 MB/sec
    Sustained Random 4K Write*               172K IOPS             -            -     9K IOPS

    FMs - Flash Modules

    (*) Maximum Delivered values are measured over a 1 minute period. Sustained write performance is measured over a 1 hour period and differs from maximum delivered performance. Over time, wear-leveling and erase operations are required and impact write performance levels.

    Latency Measurements

    The Sun Storage F5100 Flash Array is tuned for 4 KB or larger IO sizes; the write service time for IOs smaller than 4 KB can be 10 times higher than shown in the table below. Note also that the service times shown below include both the latency and the time to transfer the data; the transfer becomes the dominant portion of the service time for IOs over 64 KB in size.

    Transfer Size   Read Service Time (ms)   Write Service Time (ms)
    4 KB                              0.41                      0.28
    8 KB                              0.42                      0.35
    16 KB                             0.45                      0.72
    32 KB                             0.51                      0.77
    64 KB                             0.63                      1.52
    128 KB                            0.87                      2.99
    256 KB                            1.34                      6.03
    512 KB                            2.29                     12.14
    1024 KB                           4.19                     23.79

    - Latencies are application-level latencies measured with the vdbench tool.
    - Note that the F5100 Flash Array is a 4 KB sector device. IOs of less than 4 KB, or IOs not aligned on 4 KB boundaries, can result in significant performance degradation on write operations.

    Results and Configuration Summary

    Storage:

      Sun Storage F5100 Flash Array
        80 Flash Modules
        16 ports
        4 domains (20 Flash Modules per)
        CAM zoning - 5 Flash Modules per port

    Servers:

      4 x Sun SPARC Enterprise T5240
      4 HBAs each, firmware version 01.27.03.00-IT

    Software:

      OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
      Vdbench 5.0
      Required Flash Array Patches SPARC, ses/sgen patch 138128-01 or later & mpt patch 141736-05
      Required Flash Array Patches x86, ses/sgen patch 138129-01 or later & mpt patch 141737-05

    Benchmark Description

    Sun measured a wide variety of IO performance metrics on the Sun Storage F5100 Flash Array using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

    Vdbench profile parmfile.txt here

    Vdbench is publicly available for download at: http://vdbench.org
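
    The linked parmfile is not reproduced here; a minimal sketch of the 4K random read portion, assuming hypothetical device paths and one storage definition per Flash Module, each driven with 32 outstanding IOs:

      sd=sd1,lun=/dev/rdsk/c1t0d0s0,threads=32
      sd=sd2,lun=/dev/rdsk/c1t1d0s0,threads=32
      wd=rand4kread,sd=sd*,xfersize=4k,readpct=100,seekpct=100
      rd=run1,wd=rand4kread,iorate=max,elapsed=600,interval=5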

    Key Points and Best Practices

    • Drive each Flash Module with 32 outstanding IOs, as shown in the benchmark profile above.
    • LSI HBA firmware level should be at Phase 15 maxq.
    • For LSI HBAs, either use single-port HBAs or only 1 port per HBA.
    • SPARC platforms align with the 4K boundary size set by the Flash Array. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.

    See Also

    Disclosure Statement

    Sun Storage F5100 Flash Array delivered 1.6M 4K read IOPS and 12.8 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 12, 2009.

    TPC-C World Record Sun - Oracle

    TPC-C Sun SPARC Enterprise T5440 with Oracle RAC World Record Database Result

    Sun and Oracle demonstrate the world's fastest database performance. Sun Microsystems, using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 Flash Arrays, and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, delivered a world record TPC-C benchmark result.

    • The 12-node Sun SPARC Enterprise T5440 server cluster result delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC and $2.36 $/tpmC (USD) using Oracle 11g R1 on a configuration available 3/19/10.

    • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.

    • The complete Oracle/Sun solution delivered 10.7x better computational density than the IBM configuration (computational density = performance/rack).

    • The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.

    • The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.

    • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.

    • The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.

    • Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.

    • This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.

    More information on this benchmark will be posted in the next several days.

    Performance Landscape

    TPC-C results (sorted by tpmC, bigger is better)


    System                            tpmC        Price/tpmC   Avail      Database         Cluster   Racks   w/KtpmC
    12 x Sun SPARC Enterprise T5440   7,646,487   2.36 USD     03/19/10   Oracle 11g RAC   Y             9   9.6
    IBM Power 595                     6,085,166   2.81 USD     12/10/08   IBM DB2 9.5      N            76   56.4
    Bull Escala PL6460R               6,085,166   2.81 USD     12/15/08   IBM DB2 9.5      N            71   56.4
    HP Integrity Superdome            4,092,799   2.93 USD     08/06/07   Oracle 10g R2    N            46   (to be added)

    Avail - Availability date
    w/KtpmC - Watts per 1,000 tpmC
    Racks - clients, servers, storage, infrastructure

    Results and Configuration Summary

    Hardware Configuration:

      9 racks used to hold

      Servers:
        12 x Sun SPARC Enterprise T5440
        4 x 1.6 GHz UltraSPARC T2 Plus
        512 GB memory
        10 GbE network for cluster
      Storage:
        60 x Sun Storage F5100 Flash Array
        61 x Sun Fire X4275, Comstar SAS target emulation
        24 x Sun StorageTek 6140 (16 x 300 GB SAS 15K RPM)
        6 x Sun Storage J4400
        3 x 80-port Brocade FC switches
      Clients:
        24 x Sun Fire X4170, each with
        2 x 2.53 GHz X5540
        48 GB memory

    Software Configuration:

      Solaris 10 10/09
      OpenSolaris 6/09 (COMSTAR) for Sun Fire X4275
      Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning
      Tuxedo CFS-R Tier 1
      Sun Web Server 7.0 Update 5

    Benchmark Description

    TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

    POSTSCRIPT: Here are some comments on IBM's grasping-at-straws performance-per-core attacks on the TPC-C result:
    c0t0d0s0 blog: "IBM's Reaction to Sun & Oracle TPC-C"

    See Also

    Disclosure Statement

    TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Performance Processing Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processor) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 3/19/10. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome(1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 10/11/09.

    Thursday Jun 25, 2009

    Sun SSD Server Platform Bandwidth and IOPS (Speeds & Feeds)

    The Sun SSD (32 GB SATA 2.5" SSD) is the world's first enterprise-quality, open-standard Flash design. Built to an industry-standard JEDEC form factor, the module is being made available to developers and the OpenSolaris Storage community to foster Flash innovation. The Sun SSD delivers unprecedented IO performance, saves on power, space, and cooling, and will enable new levels of server optimization and datacenter efficiencies.

    • The Sun SSD demonstrated performance of 98K 4K random read IOPS on a Sun Fire X4450 server running the Solaris operating system.

    Performance Landscape

    Solaris 10 Results

    Test                    X4450        T5240
    Random Read (4K)        98.4K IOPS   71.5K IOPS
    Random Write (4K)       31.8K IOPS   14.4K IOPS
    50-50 Read/Write (4K)   14.9K IOPS   15.7K IOPS
    Sequential Read         764 MB/sec   1012 MB/sec
    Sequential Write        376 MB/sec   531 MB/sec

    Results and Configuration Summary

    Storage:

      4 x Sun SSD
      32 GB SATA 2.5" SSD (24 GB usable)
      2.5in drive form factor

    Servers:

      Sun SPARC Enterprise T5240 - 4 internal drive slots used (LSI driver)
      Sun Fire X4450 - 4 internal drive slots used (LSI driver)

    Software:

      OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
      Vdbench 5.0

    Benchmark Description

    Sun measured a wide variety of IO performance metrics on the Sun SSD using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

    Vdbench profile:

      # Workload definitions (storage definitions sd1-sd80 are defined elsewhere in the parmfile)
      wd=wm_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=100
      wd=ws_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=0
      wd=rm_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=100
      wd=rs_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=0
      wd=rwm_80dr,sd=sd*,readpct=50,rhpct=0,seekpct=100
      rd=default
      ### Random read and write tests, 4K transfer size, 32 threads
      rd=default,el=30m,in=6,forx=(4K),forth=(32),io=max,pause=20
      rd=run1_rm_80dr,wd=rm_80dr
      rd=run2_wm_80dr,wd=wm_80dr
      rd=run3_rwm_80dr,wd=rwm_80dr
      ### Sequential read and write tests, 512K transfer size, 32 threads
      rd=default,el=30m,in=6,forx=(512k),forth=(32),io=max,pause=20
      rd=run4_rs_80dr,wd=rs_80dr
      rd=run5_ws_80dr,wd=ws_80dr

    Vdbench is publicly available for download at: http://vdbench.org
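
    A minimal sketch of one such storage definition and of the invocation (device path and output directory are hypothetical):

      sd=sd1,lun=/dev/rdsk/c1t1d0s0,threads=32
      ./vdbench -f parmfile.txt -o /var/tmp/vdbench_out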

    Key Points and Best Practices

    • All measurements were done with the internal HBA and not the internal RAID.

    See Also

    Disclosure Statement

    Sun SSD delivered 71.5K 4K read IOPS and 1012 MB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of June 17, 2009.

    Friday Jun 19, 2009

    SSDs in HPC: Reducing the I/O Bottleneck BluePrint Best Practices

    The performance of High-Performance Computing (HPC) applications can be dramatically increased by simply using SSDs instead of traditional hard drives. To read about these findings, see the Sun BluePrint by Larry McIntosh and Michael Burke, "Solid State Drives in HPC: Reducing the I/O Bottleneck".

    There was a BestPerf blog posting on the NASTRAN/SSD results at:
    http://blogs.sun.com/BestPerf/entry/sun_fire_x2270_msc_nastran

    Our BestPerf authors will blog about more of their recent benchmarks in the coming weeks.

    Tuesday Jun 16, 2009

    Sun Fire X2270 MSC/Nastran Vendor_2008 Benchmarks

    Significance of Results

    The I/O intensive MSC/Nastran Vendor_2008 benchmark test suite was used to compare the performance on a Sun Fire X2270 server when using SSDs internally instead of HDDs.

    The effect on performance of increasing memory to augment I/O caching was also examined. The Sun Fire X2270 server was equipped with Intel QC Xeon X5570 (Nehalem) processors. The positive effect of adding memory to increase I/O caching is offset to some degree by the reduction in memory frequency that occurs when additional DIMMs populate the bays of each memory channel on each CPU socket with these Nehalem processors.

    • SSDs can significantly improve NASTRAN performance, especially on runs with larger core counts.
    • Additional memory in the server can also increase performance; however, in some systems additional memory reduces the memory frequency, which may offset the benefits of the increased capacity.
    • If SSDs are not used, striped disks will often improve the performance of I/O-bound MCAE applications.
    • To obtain the highest performance, it is recommended that SSDs be used and that servers be configured with the largest memory possible without decreasing memory frequency. One should always look at the workload characteristics and compare against this benchmark to set expectations correctly.

    SSD vs. HDD Performance

    The performance of two striped 30GB SSDs was compared to two striped 7200 rpm 500GB SATA drives on a Sun Fire X2270 server.

    • At the 8-core level (the maximum for a single node), SSDs were 2.2x faster for both the larger xx0cmd2 and the smaller xl0tdf1 cases.
    • For 1-core results, SSDs are up to 3% faster.
    • On the smaller md0mdf1 test case there was no increase in performance in the 1-, 2-, and 4-core configurations.

    Performance Enhancement with I/O Memory Caching

    Performance for Nastran can often be increased by adding memory to provide additional in-core space to cache I/O, thereby reducing the I/O demands.

    The main memory was doubled from 24 GB to 48 GB. At the 24 GB level, one 4 GB DIMM was placed in the first bay of each of the 3 CPU memory channels on each of the two CPU sockets of the Sun Fire X2270 platform. This configuration allows a memory frequency of 1333 MHz.

    At the 48 GB level, a second 4 GB DIMM was placed in the second bay of each of the 3 CPU memory channels on each socket. This reduces the memory frequency to 1066 MHz.
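
    In other words, 2 sockets x 3 channels x one 4 GB DIMM gives 24 GB at 1333 MHz, while 2 sockets x 3 channels x two 4 GB DIMMs gives 48 GB at 1066 MHz, so doubling the capacity costs roughly 20% in memory frequency.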

    Adding Memory With HDDs (SATA)

    • The additional server memory increased performance when running with the slower SATA drives at the higher core levels (e.g., 4 and 8 cores on a single node).
    • The larger xx0cmd2 case was 42% faster and the smaller xl0tdf1 case was 32% faster at the maximum 8-core level on a single system.
    • The special I/O-intensive getrag case was 8% faster at the 1-core level.

    Adding Memory With SSDs

    • At the maximum 8-core level (for a single node), the larger xx0cmd2 case was 47% faster in overall run time.
    • The effects were much smaller at lower core counts; in the 1-core tests, most test cases ran 5% to 14% slower, with the slower CPU memory frequency dominating over the added in-core space available for I/O caching versus direct transfer to SSD.
    • Only the special I/O-intensive getrag case was an exception, running 6% faster at the 1-core level.

    Increasing Performance with Two Striped (SATA) Drives

    The performance of multiple striped drives was also compared to a single drive. The study compared two striped internal 7200 rpm 500 GB SATA drives to a single internal SATA drive.

    • On a single node with 8 cores, the largest test xx0cmd2 was 40% faster, a smaller test case xl0tdf1 was 33% faster, and even the smallest test case md0mdf1 was 12% faster.

    • On 1 core, the added boost in performance with striped disks was from 4% to 13% on the various test cases.

    • On 1 core, the special I/O-intensive test case getrag was 29% faster.

    Performance Landscape

    Times in table are elapsed time (sec).


    MSC/Nastran Vendor_2008 Benchmark Test Suite

                        -------- 2 x 7200 RPM SATA HDDs ---------------    ----------- 2 x SSDs -----------
    Test         Cores  48 GB    24 GB    24 GB    Ratio    Ratio          48 GB    24 GB    Ratio  Ratio(24GB)
                        2xSATA   2xSATA   1xSATA   48GB/    2xSATA/        2xSSD    2xSSD    48GB/  2xSATA/
                        1067MHz  1333MHz  1333MHz  24GB     1xSATA         1067MHz  1333MHz  24GB   2xSSD

    vlosst1          1     133      127      134    1.05     0.95             133      126   1.05   1.01

    xx0cmd2          1     946      895      978    1.06     0.87             947      884   1.07   1.01
                     2     622      614      703    1.01     0.87             600      583   1.03   1.05
                     4     466      631      991    0.74     0.64             426      404   1.05   1.56
                     8    1049     1554     2590    0.68     0.60             381      711   0.53   2.18

    xl0tdf1          1    2226     2000     2081    1.11     0.96            2214     1939   1.14   1.03
                     2    1307     1240     1308    1.05     0.95            1315     1189   1.10   1.04
                     4     858      833     1030    1.03     0.81             744      751   0.99   1.11
                     8     912     1562     2336    0.58     0.67             674      712   0.95   2.19

    xl0imf1          1    1216     1151     1236    1.06     0.93            1228     1290   0.95   0.89

    md0mdf1          1     987      913      983    1.08     0.93             987      911   1.08   1.00
                     2     524      485      520    1.08     0.93             524      484   1.08   1.00
                     4     270      237      269    1.14     0.88             270      250   1.08   0.95

    Sol400_1         1    2555     2479     2674    1.03     0.93            2549     2402   1.06   1.03
    (xl1fn40_1)

    Sol400_S         1    2450     2302     2481    1.06     0.93            2449     2262   1.08   1.02
    (xl1fn40_S)

    getrag           1     778      843     1178    0.92     0.71             771      817   0.94   1.03
    (xx0xst0)

    Results and Configuration Summary

    Hardware Configuration:
      Sun Fire X2270
        1 x 2-socket rack-mounted server
        2 x 2.93 GHz QC Intel Xeon X5570 processors
        2 x internal striped SSDs
        2 x internal striped 7200 rpm 500GB SATA drives

    Software Configuration:

      O/S: Linux 64-bit SUSE SLES 10 SP 2
      Application: MSC/NASTRAN MD 2008
      Benchmark: MSC/NASTRAN Vendor_2008 Benchmark Test Suite
      HP MPI: 02.03.00.00 [7585] Linux x86-64
      Voltaire OFED-5.1.3.1_5 GridStack for SLES 10

    Benchmark Description

    The benchmark tests are representative of typical MSC/Nastran applications including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

    The MD (Multi Discipline) Nastran 2008 application performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior and/or deformations are concerned. The new release includes the MARC module for general purpose nonlinear analyses and the Dytran module that employs an explicit solver to analyze crash and high velocity impact conditions.

    • As of summer 2008, there is an official Solaris x64 version of the MD Nastran 2008 system that is certified and maintained.
    • The memory requirements for the test cases in the new MSC/Nastran Vendor 2008 benchmark test suite range from a few hundred megabytes to no more than 5 GB.

    Please go here for a more complete description of the tests.

    Key Points and Best Practices

    For more on Best Practices of SSD on HPC applications also see the Sun Blueprint:
    http://wikis.sun.com/display/BluePrints/Solid+State+Drives+in+HPC+-+Reducing+the+IO+Bottleneck

    Additional information on the MSC/Nastran Vendor 2008 benchmark test suite.

    • Based on the maximum physical memory on a platform, the user can stipulate the maximum portion of this memory that can be allocated to the Nastran job. This is done on the command line with the mem= option. On Linux-based systems where the platform has a large amount of memory and where the model does not have large scratch I/O requirements, the memory can be allocated to a tmpfs scratch space file system. On Solaris x64 systems, advantage can be taken of ZFS for higher I/O performance.

    • The MSC/Nastran Vendor 2008 test cases don't scale very well; a few do not scale at all, and the rest scale up to 8 cores at best.

    • The test cases for the MSC/Nastran module all have a substantial I/O component, with 15% to 25% of the total run times associated with I/O activity (primarily scratch files). The required scratch file size ranges from less than 1 GB up to about 140 GB. Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high-performance disk storage system such as a Lustre-based I/O system. High-performance interconnects such as InfiniBand, for inter-node cluster message passing as well as I/O transfer from the storage system, can also enhance performance substantially.

    See Also

    Disclosure Statement

    MSC.Software is a registered trademark of MSC. All information on the MSC.Software website is copyrighted. MSC/Nastran Vendor 2008 results from http://www.mscsoftware.com and this report as of June 9, 2009.

    About

    BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
