Wednesday Sep 25, 2013

SPARC T5-8 Delivers World Record Oracle OLAP Perf Version 3 Benchmark Result on Oracle Database 12c

Oracle's SPARC T5-8 server delivered world record query performance for systems running Oracle Database 12c for the Oracle OLAP Perf Version 3 benchmark.

  • The query throughput on the SPARC T5-8 server is 1.7x higher than that of an 8-chip Intel Xeon E7-8870 server. Both systems had sub-second average response times.

  • The SPARC T5-8 server with the Oracle Database demonstrated the ability to support at least 700 concurrent users querying OLAP cubes (with no think time), processing 2.33 million analytic queries per hour with an average response time of less than 1 second per query. This performance was enabled by keeping the entire cube in-memory utilizing the 4 TB of memory on the SPARC T5-8 server.

  • Assuming a 60 second think time between query requests, the SPARC T5-8 server can support approximately 39,450 concurrent users with the same sub-second response time.

  • The workload uses a set of realistic Business Intelligence (BI) queries that run against an OLAP cube based on a 4 billion row fact table of sales data. The 4 billion rows are partitioned by month spanning 10 years.

  • The combination of Oracle Database 12c with the Oracle OLAP option running on a SPARC T5-8 server supports live data updates occurring concurrently with minimally impacted user query executions.

Performance Landscape

Oracle OLAP Perf Version 3 Benchmark
Oracle cube based on 4 billion fact table rows
10 years of data partitioned by month

| System | Queries/hour | Users (0 sec think time) | Users (60 sec think time) | Average Response Time (sec) |
|---|---|---|---|---|
| SPARC T5-8 | 2,329,000 | 700 | 39,450 | <1 sec |
| 8-chip Intel Xeon E7-8870 | 1,354,000 | 120 | 22,675 | <1 sec |

Configuration Summary

SPARC T5-8:

1 x SPARC T5-8 server with
8 x SPARC T5 processors, 3.6 GHz
4 TB memory
Data Storage and Redo Storage
Flash Storage
Oracle Solaris 11.1 (11.1.8.2.0)
Oracle Database 12c Release 1 (12.1.0.1) with Oracle OLAP option

Sun Server X2-8:

1 x Sun Server X2-8 with
8 x Intel Xeon E7-8870 processors, 2.4 GHz
1 TB memory
Data Storage and Redo Storage
Flash Storage
Oracle Solaris 10 10/12
Oracle Database 12c Release 1 (12.1.0.1) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 3 benchmark is a workload designed to demonstrate and stress the ability of the OLAP Option to deliver fast queries, near real-time updates and rich calculations using a multi-dimensional model in the context of Oracle data warehousing.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g. South America) for a particular time period (e.g. Q4 of 2010) followed by additional queries which drill down into sales for individual countries (e.g. Chile, Peru, etc.) with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g. major household appliances) and then issue further queries drilling down into particular products (e.g. refrigerators, stoves, etc.), particular regions, particular customers, etc.

While the core of every OLAP Perf benchmark is real-world query performance, the benchmark itself offers numerous execution options such as varying data set sizes, number of users, number of queries for any given user and cube update frequency. Version 3 of the benchmark is executed with a much larger number of query streams than previous versions and uses a cube designed for near real-time updates. The results produced by version 3 of the benchmark are not directly comparable to results produced by previous versions of the benchmark.

The near real-time update capability is implemented along the following lines. A large Oracle cube, H, is built from a 4 billion row star schema, containing data up until the end of the last business day. A second small cube, D, is then created which will contain all of today's new data coming in from the outside world. It will be updated every L minutes with the data coming in within the last L minutes. A third cube, R, joins cubes H and D for reporting purposes much like a view might join data from two tables. Calculations are installed into cube R. The use of a reporting cube which draws data from different storage cubes is a common practice.

Query users are never locked out of query operations while new data is added to the update cube. The point of the demonstration is to show that an Oracle OLAP system can be designed which results in data being no more than L minutes out of date, where L may be as low as just a few minutes. This is what is meant by near real-time analytics.
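
The relationship between the three cubes can be pictured with a small toy model. The sketch below is only an analogy in plain Python, not the Oracle OLAP API: the class names `Cube` and `ReportingCube`, the (month, product) keys, and the sample figures are all hypothetical. It illustrates the one property described above, namely that a reporting layer which reads through to H and D sees new data as soon as D is updated, without H being rebuilt.

```python
from collections import defaultdict


class Cube:
    """Toy stand-in for a storage cube: sales totals keyed by (month, product)."""

    def __init__(self):
        self.cells = defaultdict(float)

    def load(self, rows):
        # Each row is (month, product, amount); amounts accumulate per cell.
        for month, product, amount in rows:
            self.cells[(month, product)] += amount


class ReportingCube:
    """Toy stand-in for cube R: stores nothing, reads through to its sources."""

    def __init__(self, historical, delta):
        self.sources = (historical, delta)

    def total(self, product):
        return sum(
            amount
            for cube in self.sources
            for (_, prod), amount in cube.cells.items()
            if prod == product
        )


h = Cube()  # historical data up to the end of the last business day
h.load([("2013-08", "stoves", 100.0), ("2013-08", "refrigerators", 250.0)])
d = Cube()  # today's data, refreshed every L minutes
r = ReportingCube(h, d)

before = r.total("stoves")
d.load([("2013-09", "stoves", 40.0)])  # one update interval of new data
after = r.total("stoves")
print(before, after)
```

Only the small cube D is touched by the update, which is why the update interval L can be kept short in a design of this kind.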

Key Points and Best Practices

  • Building and querying cubes with the Oracle OLAP option requires a large temporary tablespace. Normally temporary tablespaces would reside on disk storage. However, because the SPARC T5-8 server used in this benchmark had 4 TB of main memory, it was possible to use main memory for the OLAP temporary tablespace. This was accomplished by using a temporary, memory-based file system (TMPFS) for the temporary tablespace datafiles.

  • Since typical business intelligence users are often likely to issue similar queries, either with the same or different constants in the where clauses, setting the init.ora parameter "cursor_sharing" to "force" provides for additional query throughput and a larger number of potential users.

  • Assuming the normal Oracle Database initialization parameters (e.g. SGA, PGA, processes etc.) are appropriately set, out of the box performance for the Oracle OLAP workload should be close to what is reported here. Additional performance resulted from using memory for the OLAP temporary tablespace and setting "cursor_sharing" to "force".

  • Oracle OLAP Cube update performance was optimized by running update processes in the FX class with a priority greater than 0.

  • The maximum lag time between updates to the source fact table and data availability to query users (what was referred to as L in the benchmark description) was less than 3 minutes for the benchmark environment on the SPARC T5-8 server.
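
The benefit of the "cursor_sharing" setting mentioned above can be illustrated conceptually: when literals are replaced by placeholders, queries that differ only in their constants map to the same statement text and can reuse one parsed form. The following is a simplified Python sketch of that idea, not Oracle's implementation. The regular expression, the `:b0` placeholder naming, the `cursor_cache` dictionary and the view name `sales_cube_view` are all assumptions made for illustration.

```python
import re

# Quoted string literals, or standalone numeric literals.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Return (statement text with literals replaced by placeholders, literals)."""
    binds = []

    def to_placeholder(match):
        binds.append(match.group(0))
        return f":b{len(binds) - 1}"

    return LITERAL.sub(to_placeholder, sql), binds


cursor_cache = {}  # normalized statement text -> execution count


def execute(sql):
    """Pretend to run a query; report whether its parsed form was already cached."""
    text, _binds = normalize(sql)
    hit = text in cursor_cache
    cursor_cache[text] = cursor_cache.get(text, 0) + 1
    return hit


first = execute(
    "SELECT SUM(sales) FROM sales_cube_view WHERE region = 'Chile' AND year = 2010"
)
second = execute(
    "SELECT SUM(sales) FROM sales_cube_view WHERE region = 'Peru' AND year = 2011"
)
print(first, second, len(cursor_cache))
```

The second query differs from the first only in its constants, so it finds the cached entry. With many BI users issuing drill-down queries of the same shape, most statements would take this cheaper path.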

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 09/22/2013.

Tuesday Mar 26, 2013

SPARC T5-8 Delivers Oracle OLAP World Record Performance

Oracle's SPARC T5-8 server delivered world record query performance with near real-time analytic capability using the Oracle OLAP Perf Version 3 workload running Oracle Database 11g Release 2 on Oracle Solaris 11.

  • The maximum query throughput on the SPARC T5-8 server is 1.6x higher than that of the 8-chip Intel Xeon E7-8870 server. Both systems had sub-second response time.

  • The SPARC T5-8 server with the Oracle Database demonstrated the ability to support at least 600 concurrent users querying OLAP cubes (with no think time), processing 2.93 million analytic queries per hour with an average response time of 0.66 seconds per query. This performance was enabled by keeping the entire cube in-memory utilizing the 4 TB of memory on the SPARC T5-8 server.

  • Assuming a 60 second think time between query requests, the SPARC T5-8 server can support approximately 49,450 concurrent users with the same 0.66 sec response time.

  • The SPARC T5-8 server delivered 4.3x the maximum query throughput of a SPARC T4-4 server.

  • The workload uses a set of realistic BI queries that run against an OLAP cube based on a 4 billion row fact table of sales data. The 4 billion rows are partitioned by month spanning 10 years.

  • The combination of the Oracle Database with the Oracle OLAP option running on a SPARC T5-8 server supports live data updates occurring concurrently with minimally impacted user query executions.

Performance Landscape

Oracle OLAP Perf Version 3 Benchmark
Oracle cube based on 4 billion fact table rows
10 years of data partitioned by month

| System | Queries/hour | Users (0 sec think time) | Users (60 sec think time) | Average Response Time (sec) |
|---|---|---|---|---|
| SPARC T5-8 | 2,934,000 | 600 | 49,450 | 0.66 |
| 8-chip Intel Xeon E7-8870 | 1,823,000 | 120 | 30,500 | 0.19 |
| SPARC T4-4 | 686,500 | 150 | 11,580 | 0.71 |

Configuration Summary and Results

SPARC T5-8 Hardware Configuration:

1 x SPARC T5-8 server with
8 x SPARC T5 processors, 3.6 GHz
4 TB memory
Data Storage and Redo Storage
1 x Sun Storage F5100 Flash Array (with 80 FMODs)
Oracle Solaris 11.1
Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option

Sun Server X2-8 Hardware Configuration:

1 x Sun Server X2-8 with
8 x Intel Xeon E7-8870 processors, 2.4 GHz
512 GB memory
Data Storage and Redo Storage
3 x StorageTek 2540/2501 array pairs
Oracle Solaris 10 10/12
Oracle Database 11g Release 2 (11.2.0.2) with Oracle OLAP option

SPARC T4-4 Hardware Configuration:

1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB memory
Data Storage
1 x Sun Fire X4275 (using COMSTAR)
2 x Sun Storage F5100 Flash Array (each with 80 FMODs)
Redo Storage
1 x Sun Fire X4275 (using COMSTAR with 8 HDD)
Oracle Solaris 11 11/11
Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 3 benchmark is a workload designed to demonstrate and stress the ability of the OLAP Option to deliver fast queries, near real-time updates and rich calculations using a multi-dimensional model in the context of Oracle data warehousing.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g. South America) for a particular time period (e.g. Q4 of 2010) followed by additional queries which drill down into sales for individual countries (e.g. Chile, Peru, etc.) with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g. major household appliances) and then issue further queries drilling down into particular products (e.g. refrigerators, stoves, etc.), particular regions, particular customers, etc.

While the core of every OLAP Perf benchmark is real-world query performance, the benchmark itself offers numerous execution options such as varying data set sizes, number of users, number of queries for any given user and cube update frequency. Version 3 of the benchmark is executed with a much larger number of query streams than previous versions and uses a cube designed for near real-time updates. The results produced by version 3 of the benchmark are not directly comparable to results produced by previous versions of the benchmark.

The near real-time update capability is implemented along the following lines. A large Oracle cube, H, is built from a 4 billion row star schema, containing data up until the end of the last business day. A second small cube, D, is then created which will contain all of today's new data coming in from the outside world. It will be updated every L minutes with the data coming in within the last L minutes. A third cube, R, joins cubes H and D for reporting purposes much like a view might join data from two tables. Calculations are installed into cube R. The use of a reporting cube which draws data from different storage cubes is a common practice.

Query users are never locked out of query operations while new data is added to the update cube. The point of the demonstration is to show that an Oracle OLAP system can be designed which results in data being no more than L minutes out of date, where L may be as low as just a few minutes. This is what is meant by near real-time analytics.

Key Points and Best Practices

  • Update performance of the D cube was optimized by running update processes in the FX class with a priority greater than 0. The maximum lag time between updates to the source fact table and data availability to query users (what was referred to as L in the benchmark description) was less than 3 minutes for the benchmark environment on the SPARC T5-8 server.

  • Building and querying cubes with the Oracle OLAP option requires a large temporary tablespace. Normally temporary tablespaces would reside on disk storage. However, because the SPARC T5-8 server used in this benchmark had 4 TB of main memory, it was possible to use main memory for the OLAP temporary tablespace. This was done by using files in /tmp for the temporary tablespace datafiles.

  • Since typical BI users are often likely to issue similar queries, either with the same, or different, constants in the where clauses, setting the init.ora parameter "cursor_sharing" to "force" provides for additional query throughput and a larger number of potential users.

  • Assuming the normal Oracle initialization parameters (e.g. SGA, PGA, processes etc.) are appropriately set, out of the box performance for the OLAP Perf workload should be close to what is reported here. Additional performance resulted from (a) using memory for the OLAP temporary tablespace and (b) setting "cursor_sharing" to "force".

  • For a given number of query users with zero think time, the main measured metrics are the average query response time and the query throughput. A derived metric is the maximum number of users the system can support, with the same response time, assuming some non-zero think time. The calculation of this maximum is from the well-known "response-time law"

      N = (rt + tt) * tp

    where rt is the average response time, tt is the think time and tp is the measured throughput.

    Setting tt to 60 seconds, rt to 0.66 seconds and tp to 815 queries/sec (2,934,000 queries/hour), the above formula shows that the SPARC T5-8 server will support 49,450 concurrent users with a think time of 60 seconds and an average response time of 0.66 seconds.

    For more information about the "response-time law" see chapter 3 from the book "Quantitative System Performance" cited below.
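
The response-time law calculation described above is easy to reproduce. The sketch below applies N = (rt + tt) * tp to the throughput and response-time figures from the Performance Landscape table. The function name `supported_users` is illustrative only, and the results land close to, but not exactly on, the published user counts, which appear to be rounded.

```python
def supported_users(queries_per_hour, response_time_s, think_time_s):
    """Response-time law: N = (rt + tt) * tp, with tp converted to queries/sec."""
    throughput_per_s = queries_per_hour / 3600.0
    return (response_time_s + think_time_s) * throughput_per_s


# (queries/hour, average response time in seconds) as reported for each system
measured = {
    "SPARC T5-8": (2_934_000, 0.66),
    "8-chip Intel Xeon E7-8870": (1_823_000, 0.19),
    "SPARC T4-4": (686_500, 0.71),
}

estimates = {
    system: supported_users(qph, rt, think_time_s=60.0)
    for system, (qph, rt) in measured.items()
}

for system, users in estimates.items():
    print(f"{system}: about {users:,.0f} users at 60 sec think time")
```

Because the think time dominates the sum rt + tt, the derived user count is driven almost entirely by the measured throughput.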

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 03/26/2013.

Friday Feb 08, 2013

Improved Oracle Solaris 10 1/13 Secure Copy Performance for High Latency Networks

With Oracle Solaris 10 1/13, the performance of secure copy (scp) is significantly improved for high latency networks.

  • With a TCP receive window size of up to 1 MB enabled, Oracle Solaris 10 1/13 delivers up to 8 times faster transfer times over the latency range 50 - 200 msec compared to the previous Oracle Solaris 10 8/11.

  • The default TCP receive window size of 48 KB delivered similar performance in both Oracle Solaris 10 1/13 and Oracle Solaris 10 8/11.

  • In this study, settings above 1 MB for the TCP receive window size delivered similar performance to the 1 MB results.

  • The tuning of the TCP receive window has been available in Oracle Solaris for some time. This improved performance is available with Oracle Solaris 10 1/13 and Oracle Solaris 11.

Performance Landscape

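[Figure: T4-4_SSH_SCP.png, scp results for the SPARC T4-4 server]

[Figure: X4170M2_SSH_SCP.png, scp results for the Sun Fire X4170 M2 server]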

Configuration Summary

Test Systems:

SPARC T4-4 server
4 x SPARC T4 processor 3.0 GHz
1 TB memory
Oracle Solaris 10 1/13
Oracle Solaris 10 8/11

Sun Fire X4170 M2
2 x Intel Xeon X5675 3.06 GHz
48 GB memory
Oracle Solaris 10 1/13
Oracle Solaris 10 8/11

Driver System:

Sun Fire X4170 M2
2 x Intel Xeon X5675 3.06 GHz
48 GB memory
Oracle Solaris 10

Router / Programmable Delay System:

Sun Fire X4170 M2
2 x Intel Xeon X5675 3.06 GHz
48 GB memory
Oracle Solaris 10

Switch in between the router and the 2 test systems

Cisco Linksys SR2024C

Benchmark Description

This benchmark measures the scp performance between two systems with variable router delays in the network between the two systems. A file size of 48 MB was used while measuring the effects of varying the latency (network delays) and varying the TCP receive window size.

Key Points and Best Practices

  • The WAN emulator (a.k.a. hxbt) is used in the router to achieve delays. After the emulator was configured, network function and characteristics were verified using Netperf latency and bandwidth tests between the driver and test systems.

  • Transfers were performed over a private, dedicated 1 GbE network.

  • Files were transferred to and from /tmp (i.e. in memory) on the test systems to minimize effect of filesystem performance and variability on the measurements.

  • Larger TCP receive windows than default can be enabled using the system-wide parameter tcp_recv_hiwat (e.g. to enable 1024 KB windows using this method, use the command: ndd -set /dev/tcp tcp_recv_hiwat 1048576). To make this change persistent the command will have to be added to system startup scripts.

  • After increasing the enabled TCP receive buffer size, sshd on the target system must be restarted before any benefit can be observed (e.g. with the command /usr/sbin/svcadm restart svc:/network/ssh:default).

  • Note that tcp_recv_hiwat is a system-wide variable that adjusts the entire TCP stack. Care, therefore, must be taken to make sure that changes do not adversely affect your environment.

  • Geographically distant servers can be affected by connection latencies of the kind presented here.
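
Why the receive window matters at high latency can be seen from a standard back-of-the-envelope bound: a TCP connection can have at most one receive window of data in flight per round trip, so its throughput cannot exceed window / RTT. The sketch below applies that bound to the 48 MB file used here. It assumes the quoted latency is the round-trip time and that the 1 GbE link carries 125 MB/sec, and it ignores encryption cost, protocol overhead and TCP slow start, so the numbers are idealized ceilings rather than a model of the measured results.

```python
def min_transfer_seconds(file_bytes, window_bytes, rtt_seconds, link_bytes_per_s):
    """Lower bound on transfer time when throughput is capped by window / RTT."""
    window_limit = window_bytes / rtt_seconds     # bytes/sec allowed by the window
    rate = min(window_limit, link_bytes_per_s)    # the link itself is also a cap
    return file_bytes / rate


FILE_BYTES = 48 * 1024 * 1024    # 48 MB test file
LINK = 125_000_000               # 1 GbE, assumed 125 MB/sec
DEFAULT_WINDOW = 48 * 1024       # default 48 KB receive window
LARGE_WINDOW = 1024 * 1024       # 1 MB receive window

for rtt_ms in (50, 100, 200):
    rtt = rtt_ms / 1000.0
    small = min_transfer_seconds(FILE_BYTES, DEFAULT_WINDOW, rtt, LINK)
    large = min_transfer_seconds(FILE_BYTES, LARGE_WINDOW, rtt, LINK)
    print(f"{rtt_ms} msec: at least {small:.1f} s with 48 KB, {large:.1f} s with 1 MB")
```

The improvement of up to 8 times reported in this study is smaller than the idealized ratio between the two window sizes, which is what one would expect from a bound that leaves out the other costs of an scp transfer.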

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 2/08/2013.

Thursday Nov 08, 2012

SPARC T4-4 Delivers World Record Performance on Oracle OLAP Perf Version 2 Benchmark

Oracle's SPARC T4-4 server delivered world record performance with subsecond response time on the Oracle OLAP Perf Version 2 benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 11.

  • The SPARC T4-4 server achieved a throughput of 430,000 cube-queries/hour with an average response time of 0.85 seconds and a median response time of 0.43 seconds. This was achieved using only 60% of the available CPU resources, leaving plenty of headroom for future growth.

Performance Landscape

Oracle OLAP Perf Version 2 Benchmark
4 Billion Fact Table Rows

| System | Queries/hour | Users* | Average Response Time (sec) | Median Response Time (sec) |
|---|---|---|---|---|
| SPARC T4-4 | 430,000 | 7,300 | 0.85 | 0.43 |

* Users - the supported number of users with a given think time of 60 seconds

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB memory
Data Storage
1 x Sun Fire X4275 (using COMSTAR)
2 x Sun Storage F5100 Flash Array (each with 80 FMODs)
Redo Storage
1 x Sun Fire X4275 (using COMSTAR with 8 HDD)

Software Configuration:

Oracle Solaris 11 11/11
Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 2 benchmark is a workload designed to demonstrate and stress the Oracle OLAP product's core features of fast query, fast update, and rich calculations on a multi-dimensional model to support enhanced Data Warehousing.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle OLAP cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g. South America) for a particular time period (e.g. Q4 of 2010) followed by additional queries which drill down into sales for individual countries (e.g. Chile, Peru, etc.) with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g. major household appliances) and then issue further queries drilling down into particular products (e.g. refrigerators, stoves, etc.), particular regions, particular customers, etc.

Results from version 2 of the benchmark are not comparable with version 1. The primary difference is the type of queries along with the query mix.

Key Points and Best Practices

  • Since typical BI users are often likely to issue similar queries, with different constants in the where clauses, setting the init.ora parameter "cursor_sharing" to "force" will provide for additional query throughput and a larger number of potential users. Except for this setting, together with making full use of available memory, out of the box performance for the OLAP Perf workload should provide results similar to what is reported here.

  • For a given number of query users with zero think time, the main measured metrics are the average query response time, the median query response time, and the query throughput. A derived metric is the maximum number of users the system can support achieving the measured response time assuming some non-zero think time. The calculation of the maximum number of users follows from the well-known response-time law

      N = (rt + tt) * tp

    where rt is the average response time, tt is the think time and tp is the measured throughput.

    Setting tt to 60 seconds, rt to 0.85 seconds and tp to 119.44 queries/sec (430,000 queries/hour), the above formula shows that the T4-4 server will support 7,300 concurrent users with a think time of 60 seconds and an average response time of 0.85 seconds.

    For more information see chapter 3 from the book "Quantitative System Performance" cited below.
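
To show how strongly the derived user count depends on the assumed think time, the sketch below evaluates N = (rt + tt) * tp for the SPARC T4-4 figures at several think times. Only the 60 second case corresponds to the published result; the other think times are hypothetical values chosen for illustration, and the function name is likewise illustrative.

```python
def supported_users(throughput_per_s, response_time_s, think_time_s):
    """Response-time law: N = (rt + tt) * tp."""
    return (response_time_s + think_time_s) * throughput_per_s


tp = 430_000 / 3600.0   # measured throughput in queries/sec (about 119.44)
rt = 0.85               # measured average response time in seconds

# Think times other than 60 sec are hypothetical, for comparison only.
by_think_time = {tt: supported_users(tp, rt, tt) for tt in (0, 30, 60, 120)}

for tt, users in by_think_time.items():
    print(f"think time {tt:>3} sec: about {users:,.0f} users")
```

At 60 seconds the formula gives roughly 7,270 users, which the result above reports rounded to 7,300. Doubling the think time roughly doubles the number of users the same throughput can sustain.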

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 11/2/2012.

Tuesday Oct 02, 2012

World Record Oracle E-Business Consolidated Workload on SPARC T4-2

Oracle set a World Record for the Oracle E-Business Suite Standard Medium multiple-online module benchmark using Oracle's SPARC T4-2 and SPARC T4-4 servers which ran the application and database.

  • Oracle's SPARC T4 servers demonstrate performance leadership and world-record results on the Oracle E-Business Suite Applications R12 OLTP benchmark by publishing the first result using multiple concurrent online application modules with Oracle Database 11g Release 2 running Oracle Solaris.

  • This result shows that a multi-tier configuration of SPARC T4 servers running the Oracle E-Business Suite R12.1.2 application and Oracle Database 11g Release 2 is capable of supporting 4,100 online users with outstanding response times, executing a mix of complex transactions consolidating 4 Oracle E-Business modules (iProcurement, Order Management, Customer Service and HR Self-Service).

  • The SPARC T4-2 server in the application tier was about 65% utilized and the SPARC T4-4 server in the database tier was about 30% utilized, providing significant headroom for additional Oracle E-Business Suite R12.1.2 processing modules, more online users, and future growth.

  • Oracle E-Business Suite Applications were run in Oracle Solaris Containers on SPARC T4 servers, which provide a consolidation platform for multiple E-Business instances.

Performance Landscape

Multiple Online Modules (Self-Service, Order-Management, iProcurement, Customer-Service)
Medium Configuration

| System | Users | Average Response Time | 90th Percentile Response Time |
|---|---|---|---|
| SPARC T4-2 | 4,100 | 2.08 sec | 2.52 sec |

Configuration Summary

Application Tier Configuration:

1 x SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
3 x 300 GB internal disks
Oracle Solaris 10
Oracle E-Business Suite 12.1.2

Database Tier Configuration:

1 x SPARC T4-4 server
4 x SPARC T4 processors, 3.0 GHz
256 GB memory
2 x 300 GB internal disks
Oracle Solaris 10
Oracle Solaris Containers
Oracle Database 11g Release 2

Storage Configuration:

1 x Sun Storage F5100 Flash Array (80 x 24 GB flash modules)

Benchmark Description

The Oracle R12 E-Business Suite Standard Benchmark combines online transaction execution by simulated users with multiple online concurrent modules to model a typical scenario for a global enterprise. The online component exercises the common UI flows which are most frequently used by a majority of our customers. This benchmark utilized four concurrent flows of OLTP transactions, for Order to Cash, iProcurement, Customer Service and HR Self-Service and measured the response times. The selected flows model simultaneous business activities inclusive of managing customers, services, products and employees.

See Also

Disclosure Statement

Oracle E-Business Suite R12 medium multiple-online module benchmark, SPARC T4-2, SPARC T4, 2.85 GHz, 2 chips, 16 cores, 128 threads, 256 GB memory, SPARC T4-4, SPARC T4, 3.0 GHz, 4 chips, 32 cores, 256 threads, 256 GB memory, average response time 2.08 sec, 90th percentile response time 2.52 sec, Oracle Solaris 10, Oracle Solaris Containers, Oracle E-Business Suite 12.1.2, Oracle Database 11g Release 2, Results as of 9/30/2012.

Monday Oct 01, 2012

World Record Batch Rate on Oracle JD Edwards Consolidated Workload with SPARC T4-2

Oracle produced a World Record batch throughput for single system results on Oracle's JD Edwards EnterpriseOne Day-in-the-Life benchmark using Oracle's SPARC T4-2 server running Oracle Solaris Containers and consolidating JD Edwards EnterpriseOne, Oracle WebLogic servers and the Oracle Database 11g Release 2. The workload includes both online and batch components.

  • The SPARC T4-2 server delivered a result of 8,000 online users while concurrently executing a mix of JD Edwards EnterpriseOne Long and Short batch processes at 95.5 UBEs/min (Universal Batch Engines per minute).

  • In order to obtain this record benchmark result, the JD Edwards EnterpriseOne, Oracle WebLogic and Oracle Database 11g Release 2 servers were executed each in separate Oracle Solaris Containers which enabled optimal system resources distribution and performance together with scalable and manageable virtualization.

  • One SPARC T4-2 server running Oracle Solaris Containers and consolidating JD Edwards EnterpriseOne, Oracle WebLogic servers and the Oracle Database 11g Release 2 utilized only 55% of the available CPU power.

  • The Oracle DB server in a Shared Server configuration allows for optimized CPU resource utilization and significant memory savings on the SPARC T4-2 server without sacrificing performance.

  • This configuration with the SPARC T4-2 server achieved 33% more Users/core, 47% more UBEs/min and 78% more Users/rack unit than the IBM Power 770 server.

  • The SPARC T4-2 server with 2 processors ran the JD Edwards "Day-in-the-Life" benchmark and supported 8,000 concurrent online users while concurrently executing mixed batch workloads at 95.5 UBEs per minute. The IBM Power 770 server with twice as many processors supported 12,000 concurrent online users while concurrently executing mixed batch workloads at only 65 UBEs per minute.

  • This benchmark demonstrates more than 2x cost savings by consolidating the complete solution in a single SPARC T4-2 server compared to earlier published results of 10,000 users and 67 UBEs per minute on two SPARC T4-2 and SPARC T4-1.

  • The Oracle DB server used mirrored (RAID 1) volumes for the database providing high availability for the data without impacting performance.

Performance Landscape

JD Edwards EnterpriseOne Day in the Life (DIL) Benchmark
Consolidated Online with Batch Workload

| System | Rack Units (U) | Batch Rate (UBEs/m) | Online Users | Users / U | Users / Core | Version |
|---|---|---|---|---|---|---|
| SPARC T4-2 (2 x SPARC T4, 2.85 GHz) | 3 | 95.5 | 8,000 | 2,667 | 500 | 9.0.2 |
| IBM Power 770 (4 x POWER7, 3.3 GHz, 32 cores) | 8 | 65 | 12,000 | 1,500 | 375 | 9.0.2 |

Batch Rate (UBEs/m) — Batch transaction rate in UBEs per minute
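
The percentage comparisons quoted in the summary bullets follow directly from the table. The short sketch below recomputes them. The core count of 16 for the SPARC T4-2 is inferred from the table's own Users / Core column (8,000 users at 500 per core), and the dictionary layout and helper name are illustrative only.

```python
systems = {
    "SPARC T4-2": {"rack_units": 3, "ubes_per_min": 95.5, "users": 8000, "cores": 16},
    "IBM Power 770": {"rack_units": 8, "ubes_per_min": 65.0, "users": 12000, "cores": 32},
}


def percent_more(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


t4 = systems["SPARC T4-2"]
p770 = systems["IBM Power 770"]

advantage = {
    "users_per_core": percent_more(t4["users"] / t4["cores"],
                                   p770["users"] / p770["cores"]),
    "ubes_per_min": percent_more(t4["ubes_per_min"], p770["ubes_per_min"]),
    "users_per_rack_unit": percent_more(t4["users"] / t4["rack_units"],
                                        p770["users"] / p770["rack_units"]),
}

for metric, pct in advantage.items():
    print(f"{metric}: {pct:.0f}% more on the SPARC T4-2")
```

Note that these are normalized metrics: in absolute terms the IBM Power 770 result supported more online users, while the SPARC T4-2 result had the higher batch rate.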

Configuration Summary

Hardware Configuration:

1 x SPARC T4-2 server with
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
4 x 300 GB 10K RPM SAS internal disk
2 x 300 GB internal SSD
2 x Sun Storage F5100 Flash Arrays

Software Configuration:

Oracle Solaris 10
Oracle Solaris Containers
JD Edwards EnterpriseOne 9.0.2
JD Edwards EnterpriseOne Tools (8.98.4.2)
Oracle WebLogic Server 11g (10.3.4)
Oracle HTTP Server 11g
Oracle Database 11g Release 2 (11.2.0.1)

Benchmark Description

JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations.

Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company.

  • The workload consists of online transactions and the UBE (Universal Batch Engine) workload of 61 short and 4 long UBEs.

  • LoadRunner runs the DIL workload, collects the users' transaction response times and reports the key metric of Combined Weighted Average Transaction Response time.

  • The UBE processes workload runs from the JD Enterprise Application server.

    • Oracle's UBE processes come in three flavors:

      • Short UBEs < 1 minute engage in Business Report and Summary Analysis,

      • Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address,

      • Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs.

    • The UBE workload generates large numbers of PDF report files and log files.

    • The UBE Queues are categorized as the QBATCHD, a single threaded queue for large and medium UBEs, and the QPROCESS queue for short UBEs run concurrently.

Oracle's UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute.

Key Points and Best Practices

Two JD Edwards EnterpriseOne Application Servers, two Oracle WebLogic Servers 11g Release 1 coupled with two Oracle Web Tier HTTP server instances and one Oracle Database 11g Release 2 database on a single SPARC T4-2 server were hosted in separate Oracle Solaris Containers bound to four processor sets to demonstrate consolidation of multiple applications, web servers and the database with best resource utilizations.

  • Interrupt fencing was configured on all Oracle Solaris Containers to channel the interrupts to processors other than the processor sets used for the JD Edwards Application server, Oracle WebLogic servers and the database server.

  • An Oracle WebLogic vertical cluster was configured on each WebServer Container with twelve managed instances each to load balance users' requests and to provide the infrastructure that enables scaling to a high number of users with ease of deployment and high availability.

  • The database log writer was run in the real-time (RT) class and bound to a processor set.

  • The database redo logs were configured on the raw disk partitions.

  • The Oracle Solaris Container running the Enterprise Application server completed 61 Short UBEs, 4 Long UBEs concurrently as the mixed size batch workload.

  • The mixed size UBEs ran concurrently from the Enterprise Application server with the 8,000 online users driven by LoadRunner.

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 09/30/2012.

Oracle TimesTen In-Memory Database Performance on SPARC T4-2

The Oracle TimesTen In-Memory Database is optimized to run on Oracle's SPARC T4 processor platforms running Oracle Solaris 11 providing unsurpassed scalability, performance, upgradability, protection of investment and return on investment. The following demonstrate the value of combining Oracle TimesTen In-Memory Database with SPARC T4 servers and Oracle Solaris 11:

On a Mobile Call Processing test, the 2-socket SPARC T4-2 server outperforms:

  • Oracle's SPARC Enterprise M4000 server (4 x 2.66 GHz SPARC64 VII+) by 34%.

  • Oracle's SPARC T3-4 (4 x 1.65 GHz SPARC T3) by 2.7x, or 5.4x per processor.

Utilizing the TimesTen Performance Throughput Benchmark (TPTBM), the SPARC T4-2 server protects investments with:

  • 2.1x the overall performance of a 4-socket SPARC Enterprise M4000 server in read-only mode and 1.5x the performance in update-only testing. This is 4.2x more performance per processor than the SPARC64 VII+ 2.66 GHz based system.

  • 10x more performance per processor than the SPARC T2+ 1.4 GHz server.

  • 1.6x better performance per processor than the SPARC T3 1.65 GHz based server.

In replication testing, the two-socket SPARC T4-2 server is over 3x faster than a four-socket SPARC Enterprise T5440 server in both the asynchronous replication environment and the highly available 2-Safe replication environment. This testing emphasizes parallel replication between systems.

Performance Landscape

Mobile Call Processing Test Performance

System | Processor | Sockets | Cores | Tps | Tps/Socket
SPARC T4-2 | SPARC T4, 2.85 GHz | 2 | 16 | 218,400 | 109,200
M4000 | SPARC64 VII+, 2.66 GHz | 4 | 16 | 162,900 | 40,725
SPARC T3-4 | SPARC T3, 1.65 GHz | 4 | 64 | 80,400 | 20,100
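
The system-level and per-processor advantages quoted for the Mobile Call Processing test follow from dividing each throughput figure by the socket count. A minimal Python sketch of that arithmetic, using the Tps values from the table above (the function and variable names are illustrative and not part of any TimesTen tooling):

```python
# Tps and socket counts copied from the Mobile Call Processing table.
results = {
    "SPARC T4-2": (218_400, 2),
    "M4000": (162_900, 4),
    "SPARC T3-4": (80_400, 4),
}

def tps_per_socket(system):
    """Throughput divided by the number of processor sockets."""
    tps, sockets = results[system]
    return tps / sockets

def advantage(system, baseline, per_socket=False):
    """Ratio of one system's throughput to a baseline system's throughput."""
    if per_socket:
        return tps_per_socket(system) / tps_per_socket(baseline)
    return results[system][0] / results[baseline][0]

print(round(advantage("SPARC T4-2", "M4000"), 2))                        # 1.34
print(round(advantage("SPARC T4-2", "SPARC T3-4"), 1))                   # 2.7
print(round(advantage("SPARC T4-2", "SPARC T3-4", per_socket=True), 1))  # 5.4
```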

TimesTen Performance Throughput Benchmark (TPTBM) Read-Only

System | Processor | Sockets | Cores | Tps | Tps/Socket
SPARC T4-2 | SPARC T4, 2.85 GHz | 2 | 16 | 6.5M | 3.3M
SPARC T3-4 | SPARC T3, 1.65 GHz | 4 | 64 | 7.9M | 2.0M
M4000 | SPARC64 VII+, 2.66 GHz | 4 | 16 | 3.1M | 0.8M
T5440 | SPARC T2+, 1.4 GHz | 4 | 32 | 3.1M | 0.8M

TimesTen Performance Throughput Benchmark (TPTBM) Update-Only

System | Processor | Sockets | Cores | Tps | Tps/Socket
SPARC T4-2 | SPARC T4, 2.85 GHz | 2 | 16 | 547,800 | 273,900
M4000 | SPARC64 VII+, 2.66 GHz | 4 | 16 | 363,800 | 90,950
SPARC T3-4 | SPARC T3, 1.65 GHz | 4 | 64 | 240,250 | 60,125

TimesTen Replication Tests

System | Processor | Sockets | Cores | Asynchronous | 2-Safe
SPARC T4-2 | SPARC T4, 2.85 GHz | 2 | 16 | 38,024 | 13,701
SPARC T5440 | SPARC T2+, 1.4 GHz | 4 | 32 | 11,621 | 4,615

Configuration Summary

Hardware Configurations:

SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
1 x 8 Gbs FC Qlogic HBA
1 x 6 Gbs SAS HBA
4 x 300 GB internal disks
Sun Storage F5100 Flash Array (40 x 24 GB flash modules)
1 x Sun Fire X4275 server configured as COMSTAR head

SPARC T3-4 server
4 x SPARC T3 processors, 1.65 GHz
512 GB memory
1 x 8 Gbs FC Qlogic HBA
8 x 146 GB internal disks
1 x Sun Fire X4275 server configured as COMSTAR head

SPARC Enterprise M4000 server
4 x SPARC64 VII+ processors, 2.66 GHz
128 GB memory
1 x 8 Gbs FC Qlogic HBA
1 x 6 Gbs SAS HBA
2 x 146 GB internal disks
Sun Storage F5100 Flash Array (40 x 24 GB flash modules)
1 x Sun Fire X4275 server configured as COMSTAR head

Software Configuration:

Oracle Solaris 11 11/11
Oracle TimesTen 11.2.2.4

Benchmark Descriptions

TimesTen Performance Throughput BenchMark (TPTBM) is shipped with TimesTen and measures the total throughput of the system. The workload can test read-only, update-only, delete and insert operations as required.

Mobile Call Processing is a customer-based workload for processing calls made by mobile phone subscribers. The workload has a mixture of read-only, update, and insert-only transactions. The peak throughput performance is measured from multiple concurrent processes executing the transactions until a peak performance is reached via saturation of the available resources.

Parallel Replication tests use both asynchronous and 2-Safe replication methods. For asynchronous replication, transactions are processed in batches to maximize the throughput capabilities of the replication server and network. In 2-Safe replication, also known as no data-loss or high availability, transactions are replicated between servers immediately, emphasizing low latency. For both environments, performance is measured in the number of parallel replication servers and the maximum transactions-per-second for all concurrent processes.

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 1 October 2012.

Wednesday Nov 30, 2011

SPARC T4-4 Beats 8-CPU IBM POWER7 on TPC-H @3000GB Benchmark

Oracle's SPARC T4-4 server delivered a world record TPC-H @3000GB benchmark result for systems with four processors. This result beats eight processor results from IBM (POWER7) and HP (x86). The SPARC T4-4 server also delivered better performance per core than these eight processor systems from IBM and HP. Comparisons below are based upon system to system comparisons, highlighting Oracle's complete software and hardware solution.

This database world record result used Oracle's Sun Storage 2540-M2 arrays (rotating disk) connected to a SPARC T4-4 server running Oracle Solaris 11 and Oracle Database 11g Release 2 demonstrating the power of Oracle's integrated hardware and software solution.

  • The SPARC T4-4 server based configuration achieved a TPC-H scale factor 3000 world record for four processor systems of 205,792 QphH@3000GB with price/performance of $4.10/QphH@3000GB.

  • The SPARC T4-4 server with four SPARC T4 processors (total of 32 cores) is 7% faster than the IBM Power 780 server with eight POWER7 processors (total of 32 cores) on the TPC-H @3000GB benchmark.

  • The SPARC T4-4 server is 36% better in price performance compared to the IBM Power 780 server on the TPC-H @3000GB Benchmark.

  • The SPARC T4-4 server is 29% faster than the IBM Power 780 for data loading.

  • The SPARC T4-4 server is up to 3.4 times faster than the IBM Power 780 server for the Refresh Function.

  • The SPARC T4-4 server with four SPARC T4 processors is 27% faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark.

  • The SPARC T4-4 server is 52% faster than the HP ProLiant DL980 G7 server for data loading.

  • The SPARC T4-4 server is up to 3.2 times faster than the HP ProLiant DL980 G7 for the Refresh Function.

  • The SPARC T4-4 server achieved a peak IO rate from the Oracle database of 17 GB/sec. This rate was independent of the storage used, as demonstrated by the TPC-H @3000GB benchmark which used twelve Sun Storage 2540-M2 arrays (rotating disk) and the TPC-H @1000GB benchmark which used four Sun Storage F5100 Flash Array devices (flash storage). [*]

  • The SPARC T4-4 server showed linear scaling from TPC-H @1000GB to TPC-H @3000GB. This demonstrates that the SPARC T4-4 server can handle the increasingly larger databases required of DSS systems. [*]

  • The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase.

[*] The TPC believes that comparisons of results published with different scale factors are misleading and discourages such comparisons.

Performance Landscape

The table lists the leading TPC-H @3000GB results for non-clustered systems.

TPC-H @3000GB, Non-Clustered Systems
System | Processor | P/C/T – Memory | Composite (QphH) | $/perf ($/QphH) | Power (QppH) | Throughput (QthH) | Database | Available
SPARC Enterprise M9000 | 3.0 GHz SPARC64 VII+ | 64/256/256 – 1024 GB | 386,478.3 | $18.19 | 316,835.8 | 471,428.6 | Oracle 11g R2 | 09/22/11
SPARC T4-4 | 3.0 GHz SPARC T4 | 4/32/256 – 1024 GB | 205,792.0 | $4.10 | 190,325.1 | 222,515.9 | Oracle 11g R2 | 05/31/12
SPARC Enterprise M9000 | 2.88 GHz SPARC64 VII | 32/128/256 – 512 GB | 198,907.5 | $15.27 | 182,350.7 | 216,967.7 | Oracle 11g R2 | 12/09/10
IBM Power 780 | 4.1 GHz POWER7 | 8/32/128 – 1024 GB | 192,001.1 | $6.37 | 210,368.4 | 175,237.4 | Sybase 15.4 | 11/30/11
HP ProLiant DL980 G7 | 2.27 GHz Intel Xeon X7560 | 8/64/128 – 512 GB | 162,601.7 | $2.68 | 185,297.7 | 142,685.6 | SQL Server 2008 | 10/13/10

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

The following table lists data load times and refresh function times during the power run.

TPC-H @3000GB, Non-Clustered Systems
Database Load & Database Refresh
System | Processor | Data Loading (h:m:s) | T4 Advan | RF1 (sec) | T4 Advan | RF2 (sec) | T4 Advan
SPARC T4-4 | 3.0 GHz SPARC T4 | 04:08:29 | 1.0x | 67.1 | 1.0x | 39.5 | 1.0x
IBM Power 780 | 4.1 GHz POWER7 | 05:51:50 | 1.5x | 147.3 | 2.2x | 133.2 | 3.4x
HP ProLiant DL980 G7 | 2.27 GHz Intel Xeon X7560 | 08:35:17 | 2.1x | 173.0 | 2.6x | 126.3 | 3.2x

Data Loading = database load time
RF1 = power test first refresh transaction
RF2 = power test second refresh transaction
T4 Advan = the ratio of time to T4 time

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server
4 x SPARC T4 3.0 GHz processors (total of 32 cores, 256 threads)
1024 GB memory
8 x internal SAS (8 x 300 GB) disk drives

External Storage:

12 x Sun Storage 2540-M2 array storage, each with
12 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache

Software Configuration:

Oracle Solaris 11 11/11
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 205,792.0 QphH@3000GB
Price/performance: $4.10/QphH@3000GB
Available: 05/31/2012
Total 3 year Cost: $843,656
TPC-H Power: 190,325.1
TPC-H Throughput: 222,515.9
Database Load Time: 4:08:29

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
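
As a rough check on the audited figures, the composite and the price/performance can be recomputed from the published numbers. The sketch below assumes the TPC-H definition of the composite as the geometric mean of the Power and Throughput quantities, which this post does not spell out; the function names are illustrative.

```python
import math

def composite_qphh(power_qpph, throughput_qthh):
    """Composite metric as the geometric mean of Power and Throughput
    (assumed TPC-H definition, not stated in this post)."""
    return math.sqrt(power_qpph * throughput_qthh)

def price_performance(total_cost_usd, qphh):
    """Total 3-year cost divided by the composite metric, in $/QphH."""
    return total_cost_usd / qphh

# Audited SPARC T4-4 figures at scale factor 3000.
qphh = composite_qphh(190_325.1, 222_515.9)
print(f"{qphh:,.1f} QphH@3000GB")                                    # 205,792.0 QphH@3000GB
print(f"${price_performance(843_656, 205_792.0):.2f}/QphH@3000GB")   # $4.10/QphH@3000GB
```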

Key Points and Best Practices

  • Twelve Sun Storage 2540-M2 arrays were used for the benchmark. Each Sun Storage 2540-M2 array contains 12 15K RPM drives and is connected to a single dual port 8Gb FC HBA using 2 ports. Each Sun Storage 2540-M2 array showed 1.5 GB/sec for sequential read operations and showed linear scaling, achieving 18 GB/sec with twelve Sun Storage 2540-M2 arrays. These were stand alone IO tests.

  • The peak IO rate measured from the Oracle database was 17 GB/sec.

  • Oracle Solaris 11 11/11 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric in which to compare systems.

  • The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle Database parallel processes.

  • Six Sun Storage 2540-M2 arrays were mirrored to another six Sun Storage 2540-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays.

  • The TPC-H Refresh Function (RF) simulates the periodic refresh of a data warehouse by adding new sales data and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are key for this function, and the SPARC T4-4 server outperformed both the IBM POWER7 server and the HP ProLiant DL980 G7 server. (See the RF columns above.)

Disclosure Statement

TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 205,792.0 QphH@3000GB, $4.10/QphH@3000GB, available 5/31/12, 4 processors, 32 cores, 256 threads; IBM Power 780 192,001.1 QphH@3000GB, $6.37/QphH@3000GB, available 11/30/11, 8 processors, 32 cores, 128 threads; HP ProLiant DL980 G7 162,601.7 QphH@3000GB, $2.68/QphH@3000GB, available 10/13/10, 8 processors, 64 cores, 128 threads.

Wednesday Nov 02, 2011

SPARC T4-2 Server Beats 2-Socket 3.46 GHz x86 on Black-Scholes Option Pricing Test

Oracle's SPARC T4-2 server (two SPARC T4 processors at 2.85 GHz) delivered 21% better performance compared to a two-socket x86 server (with two Intel X5690 3.46 GHz processors) running a Black-Scholes options pricing test on 10 million options.

  • The hyper-threads of the Intel processor did not deliver additional performance; they actually reduced performance by 6%. The performance of hyper-threading on Intel processors will vary depending on workload.

  • This test shows how delivered performance is not easily predicted just by processor frequency alone. It is vital that hardware and software be designed in tandem in order to deliver best performance.

Performance Landscape

Black-Scholes options pricing, 10 million options, results in seconds, 100 iterations of the test, smaller is better.

System Time (sec)
SPARC T4-2 (2 x SPARC T4, 2.85 GHz, 128 software threads) 9.2
2-socket x86 (2 x X5690, 3.46 GHz, 12 software threads) 11.7

Advantage SPARC T4-2 21% faster

The hyper-threads of the Intel processor did not deliver additional performance, causing a reduction in performance of 6%.

Configuration Summary

SPARC Configuration:

SPARC T4-2 server
2 x SPARC T4 processor 2.85 GHz
128 GB memory
Oracle Solaris 10 8/11

Intel Configuration:

Sun Fire X4270 M2
2 x Intel Xeon X5690 3.46 GHz, Hyper-Threading and Turbo Boost active
48 GB memory
Oracle Linux 6.1

Benchmark Description

Black-Scholes option pricing model is a financial market algorithm that uses the Black-Scholes partial differential equation (PDE) to calculate prices for European stock options. The key idea is that the value of the option fluctuates over time with the actual value of the stock. The reported time is just for calculating the options, no I/O component. The computation is floating point intensive and requires the calculation of logarithms, exponentials and square roots.
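
For reference, the closed-form Black-Scholes price of a European option can be sketched in a few lines of Python. This is not the benchmark code, which is not shown in this post; it is a minimal illustration of the logarithm, exponential, and square-root work each option requires, and the parameter names and sample values are assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(spot, strike, rate, volatility, years):
    """Return (call, put) prices for a European option on a
    non-dividend-paying stock."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    discount = math.exp(-rate * years)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put

# Hypothetical sample option: at the money, 5% rate, 20% volatility, 1 year.
call, put = black_scholes(spot=100.0, strike=100.0, rate=0.05,
                          volatility=0.2, years=1.0)
print(f"call={call:.4f} put={put:.4f}")
```

A test like the one described here would repeat this calculation over the full array of options for each iteration, so throughput depends mainly on floating-point and math-library performance rather than I/O.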

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 11/1/2011.

Monday Oct 03, 2011

SPARC T4-4 Servers Set World Record on SPECjEnterprise2010, Beats IBM POWER7, Cisco x86

Oracle produced a world record SPECjEnterprise2010 benchmark result of 40,104.86 SPECjEnterprise2010 EjOPS using four of Oracle's SPARC T4-4 servers in the application tier and two more SPARC T4-4 servers for the database server.

  • The four SPARC T4-4 server configuration (sixteen SPARC T4 processors total, 3.0 GHz) demonstrated 2.4x better performance compared to the IBM Power 780 server (eight POWER7 processors, 3.86 GHz) result of 16,646.34 SPECjEnterprise2010 EjOPS.

  • In the database tier, two SPARC T4-4 servers with a total of eight SPARC T4 processors at 3.0 GHz, processed 2.4x more transactions compared to the IBM result of 16,646.34 SPECjEnterprise2010 EjOPS which used four POWER7 processors at 3.55 GHz.

  • The four SPARC T4-4 server configuration demonstrated 1.5x better performance compared to the Cisco UCS B440 M2 Blade Server result of 26,118.67 SPECjEnterprise2010 EjOPS.

  • The four SPARC T4-4 server configuration demonstrated 2.3x better performance compared to the Cisco UCS B440 M1 Blade Server result of 17,301.86 SPECjEnterprise2010 EjOPS.

  • This result demonstrated less than 1 second average response times for all SPECjEnterprise2010 transactions and 90% of all transaction times took less than 1 second.

  • This result demonstrated a sustained Java EE 5 transaction load generated by approximately 320,000 users.

  • This result using 16 Oracle WebLogic 10.3.5 server instances demonstrated 4.8x better performance per application server instance when compared to the IBM result which used 32 WebSphere instances.

  • The SPARC T4-4 servers delivered a 6.7x price/performance advantage over the IBM Power 780 for the servers used in the application tier (see disclosure statement below for details). This price/performance advantage in the application tier was accomplished with a SPARC T4-4 server configuration with 2 TB of total memory compared to the IBM solution with 0.5 TB of memory.

  • The SPARC T4-4 servers had a 1.9x advantage over IBM in performance per space for the application tier (see disclosure statement below for details) even though the Oracle solution had four servers.

  • The four SPARC T4-4 servers used for the application tier used Oracle Solaris Containers to consolidate four Oracle WebLogic application server instances on each server to achieve this result.

  • The two SPARC T4-4 servers used for the database tier hosted Oracle Database 11g Release 2 and Oracle RAC cluster software using Oracle Automatic Storage Management (ASM).

  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle WebLogic Server's on-going, record-setting Java application server performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

Performance Landscape

Complete benchmark results are at the SPEC website, SPECjEnterprise2010 Results.

SPECjEnterprise2010 Performance Chart
as of 10/11/2011
Submitter | EjOPS* | Java EE Server | DB Server
Oracle | 40,104.86 | 4 x SPARC T4-4; 4 chips, 32 cores, 3.0 GHz SPARC T4; Oracle WebLogic 11g (10.3.5) | 2 x SPARC T4-4; 4 chips, 32 cores, 3.0 GHz SPARC T4; Oracle 11g DB 11.2.0.2
Cisco | 26,118.67 | 2 x Cisco UCS B440 M2; 4 chips, 40 cores, 2.4 GHz Xeon E7-4870; Oracle WebLogic 11g (10.3.5) | 1 x Cisco UCS C460 M2; 4 chips, 40 cores, 2.4 GHz Xeon E7-4870; Oracle 11g DB 11.2.0.2
Cisco | 17,301.86 | 2 x Cisco UCS B440 M1; 4 chips, 32 cores, 2.26 GHz Xeon X7560; Oracle WebLogic 10.3.4 | 1 x Cisco UCS C460 M1; 4 chips, 32 cores, 2.26 GHz Xeon X7560; Oracle 11g DB 11.2.0.2
IBM | 16,646.34 | 1 x IBM Power 780; 8 chips, 64 cores, 3.86 GHz POWER7; WebSphere Application Server V7.0 | 1 x IBM Power 750 Express; 4 chips, 32 cores, 3.55 GHz POWER7; IBM DB2 Universal Database 9.7

* SPECjEnterprise2010 EjOPS (bigger is better)

Configuration Summary

Application Servers:

4 x SPARC T4-4 servers, each with
4 x 3.0 GHz SPARC T4 processors
512 GB memory
2 x 10GbE NIC
Oracle Solaris 10 8/11
Oracle WebLogic Server 11g Release 1 (10.3.5)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_26 (Java SE 6 Update 26)

Database Servers:

2 x SPARC T4-4 servers, each with
4 x 3.0 GHz SPARC T4 processors
1024 GB memory
2 x 10GbE NIC
4 x 8Gb FC HBA
Oracle Solaris 10 8/11
Oracle Database 11g Enterprise Edition Release 11.2.0.2
Oracle Real Application Clusters 11g Release 2

Storage Servers:

8 x Sun Fire X4270 M2 (12-Drive)
1 x 3.0 GHz Intel Xeon
8 GB memory
1 x 8Gb FC HBA
Oracle Solaris 11 Express 2010.11
8 x Sun Storage F5100 Flash Arrays

Switch Hardware:

2 x Sun Network 10GbE 72-port Top of Rack (ToR) Switch
1 x Brocade 5300 80-port Fiber Channel Switch

Benchmark Description

SPECjEnterprise2010 is the third generation of the SPEC organization's J2EE end-to-end industry standard benchmark application. The new SPECjEnterprise2010 benchmark has been re-designed and developed to cover the Java EE 5 specification's significantly expanded and simplified programming model, highlighting the major features used by developers in the industry today. This provides a real world workload driving the Application Server's implementation of the Java EE specification to its maximum potential and allowing maximum stressing of the underlying hardware and software systems:
  • The web container, servlets, and web services
  • The EJB container
  • JPA 1.0 Persistence Model
  • JMS and Message Driven Beans
  • Transaction management
  • Database connectivity
Moreover, SPECjEnterprise2010 also heavily exercises all parts of the underlying infrastructure that make up the application environment, including hardware, JVM software, database software, JDBC drivers, and the system network.

The primary metric of the SPECjEnterprise2010 benchmark is jEnterprise Operations Per Second (SPECjEnterprise2010 EjOPS). The primary metric for the SPECjEnterprise2010 benchmark is calculated by adding the metrics of the Dealership Management Application in the Dealer Domain and the Manufacturing Application in the Manufacturing Domain. There is NO price/performance metric in this benchmark.

Key Points and Best Practices

  • Four Oracle WebLogic server instances on each SPARC T4-4 server were hosted in 4 separate Oracle Solaris Containers to demonstrate consolidation of multiple application servers.
  • Each Oracle Solaris Container was bound to a separate processor set, each containing 7 cores (56 threads in total). This was done to improve performance by reducing memory access latency by using the physical memory closest to the processors. The default set was used for network and disk interrupt handling.
  • The Oracle WebLogic application servers were executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • The Oracle database processes were run in 2 processor sets using psrset(1M) and executed in the FX scheduling class. This improved performance by reducing memory access latency and reducing context switches.
  • The Oracle log writer process was run in a separate processor set containing 2 threads and run in the RT scheduling class. This ensured that the log writer had the most efficient use of CPU resources.

Disclosure Statement

SPEC and the benchmark name SPECjEnterprise are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 10/11/2011. SPARC T4-4, 40,104.86 SPECjEnterprise2010 EjOPS; Cisco UCS B440 M2, 26,118.67 SPECjEnterprise2010 EjOPS; Cisco UCS B440 M1, 17,301.86 SPECjEnterprise2010 EjOPS; IBM Power 780, 16,646.34 SPECjEnterprise2010 EjOPS.

SPECjEnterprise2010 models contemporary Java-based applications that run on large Java EE (Java Enterprise Edition) servers, backed by network infrastructure and database servers. Focusing on the critical Java EE server hardware & OS, the IBM result includes a Java EE server with a list price of $1.30 million. The Oracle Java EE servers have a list price of $0.47 million. The Java EE server price versus delivered EjOPS is $77.97/EjOPS for IBM versus $11.67/EjOPS for Oracle. Oracle's $/perf advantage is 6.7x better than IBM ($77.97/$11.67).

Pricing details for IBM, IBM p780 512GB based on public pricing at http://tpc.org/results/FDR/TPCH/TPC-H_1TB_IBM780_Sybase-FDR.pdf. Adjusted hardware costs to license all 64 cores. AIX pricing at: http://www-304.ibm.com/easyaccess3/fileserve?contentid=214347 and AIX Standard Edition V7.1 per processor (5765-G98-0017 64*2,600=$166,400). This gives application tier hardware & OS Price/perf: $77.97/EjOPS (1297956/16646.34)

Pricing details for Oracle, four SPARC T4-4 512 GB, HW acquisition price from Oracle's price list: $467,856 http://www.oracle.com. This gives application tier hardware & OS Price/perf: $11.67/EjOPS (467856/40104.86)

The Oracle application tier servers occupy 20U of space, 40,104.86/20=2005 EjOPS/U. The IBM application tier server occupies 16U of space, 16,646.34/16=1040 EjOPS/U. 2005/1040=1.9x
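
The price/performance and performance-per-space ratios quoted in this disclosure can be reproduced with a short calculation. A minimal Python sketch, using only the list prices, EjOPS results, and rack-unit figures given above (the dictionary layout and function names are illustrative):

```python
# Figures copied from the disclosure statement above.
ibm = {"price_usd": 1_297_956, "ejops": 16_646.34, "rack_units": 16}
oracle = {"price_usd": 467_856, "ejops": 40_104.86, "rack_units": 20}

def dollars_per_ejops(tier):
    """Application tier hardware and OS list price per delivered EjOPS."""
    return tier["price_usd"] / tier["ejops"]

def ejops_per_rack_unit(tier):
    """Delivered EjOPS per rack unit occupied by the application tier."""
    return tier["ejops"] / tier["rack_units"]

price_advantage = dollars_per_ejops(ibm) / dollars_per_ejops(oracle)
space_advantage = ejops_per_rack_unit(oracle) / ejops_per_rack_unit(ibm)

print(f"IBM ${dollars_per_ejops(ibm):.2f}/EjOPS, "
      f"Oracle ${dollars_per_ejops(oracle):.2f}/EjOPS")
print(f"price/performance advantage: {price_advantage:.1f}x")      # 6.7x
print(f"performance per space advantage: {space_advantage:.1f}x")  # 1.9x
```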

SPARC T4-4 Beats IBM POWER7 and HP Itanium on TPC-H @1000GB Benchmark

Oracle's SPARC T4-4 server configured with SPARC-T4 processors, Oracle's Sun Storage F5100 Flash Array storage, Oracle Solaris, and Oracle Database 11g Release 2 achieved a TPC-H benchmark performance result of 201,487 QphH@1000GB with price/performance of $4.60/QphH@1000GB.

  • The SPARC T4-4 server benchmark results demonstrate a complete solution of building Decision Support Systems including data loading, business questions and refreshing data. Each phase usually has a time constraint and the SPARC T4-4 server shows superior performance during each phase.

  • The SPARC T4-4 server is 22% faster than the 8-socket IBM POWER7 server with the same number of cores. The SPARC T4-4 server has over twice the performance per socket compared to the IBM POWER7 server.

  • The SPARC T4-4 server achieves 33% better price/performance than the IBM POWER7 server.

  • The SPARC T4-4 server is up to 4 times faster than the IBM POWER7 server for the Refresh Function.

  • The SPARC T4-4 server is 44% faster than the HP Superdome 2 server. The SPARC T4-4 server has 5.7x the performance per socket of the HP Superdome 2 server.

  • The SPARC T4-4 server is 62% better on price/performance than the HP Itanium server.

  • The SPARC T4-4 server is up to 3.7 times faster than the HP Itanium server for the Refresh Function.

  • The SPARC T4-4 server delivers nearly the same performance as Oracle's SPARC Enterprise M8000 server, but with 52% better price/performance on the TPC-H @1000GB benchmark.

  • Oracle used Storage Redundancy Level 3 as defined by the TPC-H 2.14.2 specification which is the strictest level.

  • This TPC-H result demonstrates that the SPARC T4-4 server can deliver the performance while running the increasingly larger databases required of DSS systems. The server measured more than 16 GB/sec of IO throughput through Oracle Database 11g Release 2 software while maintaining a high CPU load.

Performance Landscape

The table below lists published non-cluster results from comparable enterprise class systems from Oracle, IBM and HP. Each system was configured with 512 GB of memory.

TPC-H @1000GB

System | CPU type | Proc/Core/Thread | Composite (QphH) | $/perf ($/QphH) | Power (QppH) | Throughput (QthH) | Database | Available
SPARC Enterprise M8000 | 3 GHz SPARC64 VII+ | 16 / 64 / 128 | 209,533.6 | $9.53 | 177,845.9 | 246,867.2 | Oracle 11g | 09/22/11
SPARC T4-4 | 3 GHz SPARC-T4 | 4 / 32 / 256 | 201,487.0 | $4.60 | 181,760.6 | 223,354.2 | Oracle 11g | 10/30/11
IBM Power 780 | 4.14 GHz POWER7 | 8 / 32 / 128 | 164,747.2 | $6.85 | 170,206.4 | 159,463.1 | Sybase | 03/31/11
HP Superdome 2 | 1.73 GHz Intel Itanium 9350 | 16 / 64 / 64 | 140,181.1 | $12.15 | 139,181.0 | 141,188.3 | Oracle 11g | 10/20/10

QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server
4 x SPARC T4 3.0 GHz processors (total of 32 cores, 256 threads)
512 GB memory
8 x internal SAS (8 x 300 GB) disk drives

External Storage:

4 x Sun Storage F5100 Flash Array storage, each with
80 x 24 GB Flash Modules

Software Configuration:

Oracle Solaris 10 8/11
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 1000 GB (Scale Factor 1000)
TPC-H Composite: 201,487 QphH@1000GB
Price/performance: $4.60/QphH@1000GB
Available: 10/30/2011
Total 3 Year Cost: $925,525
TPC-H Power: 181,760.6
TPC-H Throughput: 223,354.2
Database Load Time: 1:22:39

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB and 10000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH.

Key Points and Best Practices

  • Four Sun Storage F5100 Flash Array devices were used for the benchmark. Each F5100 device contains 80 flash modules (FMODs). Twenty (20) FMODs from each F5100 device were connected to a single SAS 6 Gb HBA. A single F5100 device showed 4.16 GB/sec for sequential read and demonstrated linear scaling of 16.62 GB/sec with 4 x F5100 devices.

  • The IO rate from the Oracle database was over 16 GB/sec.

  • Oracle Solaris 10 8/11 required very little system tuning.

  • The SPARC T4-4 server and Oracle Solaris efficiently managed the system load of over one thousand Oracle parallel processes.

  • The Oracle database files for tables and indexes were managed by Oracle Automatic Storage Management (ASM) with a 4M stripe. Two F5100 devices were mirrored to another 2 F5100 devices under ASM. IO performance was high and balanced across all the FMODs.
  • The Oracle redo log files were mirrored across the F5100 devices using Oracle Solaris Volume Manager with 128K stripe.
  • Parallel degree on tables and indexes was set to 128. This setting worked the best for performance.
  • The TPC-H Refresh Function simulates the periodic refresh of a data warehouse by adding new sales data and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are key for this function, and the SPARC T4-4 server outperformed both the HP Superdome 2 and IBM POWER7 servers.

Disclosure Statement

TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org. SPARC T4-4 201,487 QphH@1000GB, $4.60/QphH@1000GB, avail 10/30/2011, 4 processors, 32 cores, 256 threads; SPARC Enterprise M8000 209,533.6 QphH@1000GB, $9.53/QphH@1000GB, avail 09/22/11, 16 processors, 64 cores, 128 threads; IBM Power 780 164,747.2 QphH@1000GB, $6.85/QphH@1000GB, avail 03/31/11, 8 processors, 32 cores, 128 threads; HP Integrity Superdome 2 140,181.1 QphH@1000GB, $12.15/QphH@1000GB, avail 10/20/10, 16 processors, 64 cores, 64 threads.

SPARC T4-4 Produces World Record Oracle OLAP Capacity

Oracle's SPARC T4-4 server delivered world record capacity on the Oracle OLAP Perf workload.

  • The SPARC T4-4 server was able to operate on a cube with a 3 billion row fact table of sales data containing 4 dimensions which represents as many as 70 quintillion aggregate rows (70 followed by 18 zeros).

  • The SPARC T4-4 server supported 3,500 cube-queries/minute against the Oracle OLAP cube with an average response time of 1.5 seconds and the median response time of 0.15 seconds.

Performance Landscape

Oracle OLAP Perf Benchmark
System       Fact Table Num of Rows   Cube-Queries/minute   Median Response (seconds)   Average Response (seconds)
SPARC T4-4   3 Billion                3,500                 0.15                        1.5

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB main memory
2 x Sun Storage F5100 Flash Array

Software Configuration:

Oracle Solaris 10 8/11
Oracle Database 11g Enterprise Edition with Oracle OLAP option

Benchmark Description

OLAP Perf is a workload designed to demonstrate and stress the Oracle OLAP product's core functionalities of fast query, fast update, and rich calculations on a dimensional model to support Enhanced Data Warehousing. The workload uses a set of realistic business intelligence (BI) queries that run against an OLAP cube.

Key Points and Best Practices

  • The SPARC T4-4 server is estimated to support 2,400 interactive users with this fast response time assuming only 5 seconds between query requests.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/3/2011.

Friday Sep 30, 2011

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on ZFS Encryption Tests

Oracle continues to lead in enterprise security. Oracle's SPARC T4 processors combined with Oracle's Solaris ZFS file system demonstrate faster file system encryption than equivalent systems based on the Intel Xeon Processor 5600 Sequence chips which use AES-NI security instructions.

Encryption is the process where data is encoded for privacy and a key is needed by the data owner to access the encoded data. The benefits of using ZFS encryption are:

  • The SPARC T4 processor is 3.5x to 5.2x faster than the Intel Xeon Processor X5670 that has the AES-NI security instructions in creating encrypted files.

  • ZFS encryption is integrated with the ZFS command set. Like other ZFS operations, encryption operations such as key changes and re-key are performed online.

  • Data is encrypted using AES (Advanced Encryption Standard) with key lengths of 256, 192, and 128 bits in the CCM and GCM operation modes.

  • The flexibility of encrypting specific file systems is a key feature.

  • ZFS encryption is inheritable to descendent file systems. Key management can be delegated through ZFS delegated administration.

  • ZFS encryption uses the Oracle Solaris Cryptographic Framework which gives it access to SPARC T4 processor and Intel Xeon X5670 processor (Intel AES-NI) hardware acceleration or to optimized software implementations of the encryption algorithms automatically.

Performance Landscape

Below are results running two different ciphers for ZFS encryption. Results are presented for runs without any cipher, labeled clear, and a variety of different key lengths.

Encryption Using AES-CCM Ciphers

MB/sec – 5 File Create* Encryption
Clear AES-256-CCM AES-192-CCM AES-128-CCM
SPARC T4-2 server 3,803 3,167 3,335 3,225
SPARC T3-2 server 2,286 1,554 1,561 1,594
2-Socket 2.93 GHz Xeon X5670 3,325 750 764 773

Speedup T4-2 vs X5670 1.1x 4.2x 4.4x 4.2x
Speedup T4-2 vs T3-2 1.7x 2.0x 2.1x 2.0x

Encryption Using AES-GCM Ciphers

MB/sec – 5 File Create* Encryption
Clear AES-256-GCM AES-192-GCM AES-128-GCM
SPARC T4-2 server 3,618 3,929 3,164 2,613
SPARC T3-2 server 2,278 1,451 1,455 1,449
2-Socket 2.93 GHz Xeon X5670 3,299 749 748 753

Speedup T4-2 vs X5670 1.1x 5.2x 4.2x 3.5x
Speedup T4-2 vs T3-2 1.6x 2.7x 2.2x 1.8x

(*) Maximum Delivered values measured over 5 concurrent mkfile operations.
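The speedup rows in both tables are simple ratios of the MB/sec figures. The sketch below recomputes them from the table values; the dictionary layout and the `speedups` helper are illustrative names, not part of the original test scripts.

```python
# MB/sec from the AES-CCM and AES-GCM tables above.
# Column order: clear, 256-bit key, 192-bit key, 128-bit key.
results = {
    "CCM": {
        "T4-2": [3803, 3167, 3335, 3225],
        "T3-2": [2286, 1554, 1561, 1594],
        "X5670": [3325, 750, 764, 773],
    },
    "GCM": {
        "T4-2": [3618, 3929, 3164, 2613],
        "T3-2": [2278, 1451, 1455, 1449],
        "X5670": [3299, 749, 748, 753],
    },
}


def speedups(numerator, denominator):
    """Return per-column throughput ratios rounded to one decimal place."""
    return [round(n / d, 1) for n, d in zip(numerator, denominator)]


for mode, rows in results.items():
    print(mode, "T4-2 vs X5670:", speedups(rows["T4-2"], rows["X5670"]))
    print(mode, "T4-2 vs T3-2: ", speedups(rows["T4-2"], rows["T3-2"]))
```

The encrypted columns give the 3.5x to 5.2x range quoted in the summary, while the clear column shows the two platforms are close when no cipher is applied.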

Configuration Summary

Storage Configuration:

Sun Storage 6780 array
16 x 15K RPM drives
RAID 0 pool
Write-back cache enabled
Controller cache mirroring disabled for maximum bandwidth for test
Eight 8 Gb/sec ports per host

Server Configuration:

SPARC T4-2 server
2 x SPARC T4 2.85 GHz processors
256 GB memory
Oracle Solaris 11

SPARC T3-2 server
2 x SPARC T3 1.6 GHz processors
Oracle Solaris 11 Express 2010.11

Sun Fire X4270 M2 server
2 x Intel Xeon X5670, 2.93 GHz processors
Oracle Solaris 11

Benchmark Description

The benchmark ran the UNIX command mkfile(1M). mkfile is a simple single-threaded program that creates a file of a specified size. The script ran 5 mkfile operations in the background and recorded the peak bandwidth observed during the test.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of December 16, 2011.

SPARC T4 Processor Beats Intel (Westmere AES-NI) on AES Encryption Tests

The cryptography benchmark suite was internally developed by Oracle to measure the maximum throughput of in-memory, on-chip encryption operations that a system can perform. Multiple threads are used to achieve the maximum throughput.

  • Oracle's SPARC T4 processor running Oracle Solaris 11 is 1.5x faster on AES 256-bit key CFB mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 1.7x faster on AES 256-bit key CBC mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 3.6x faster on AES 256-bit key CCM mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption with authentication of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 1.4x faster on AES 256-bit key GCM mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption with authentication of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 9% faster on single-threaded AES 256-bit key CFB mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 1.8x faster on AES 256-bit key CFB mode encryption than the SPARC T3 running Solaris 11 Express.

  • AES CFB mode is used by the Oracle Database 11g for Transparent Data Encryption (TDE) which provides security to database storage.

Performance Landscape

Encryption Performance – AES-CFB

Performance is presented for in-memory AES-CFB128 mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-256-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 10,963 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 7,526 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,023 Oracle Solaris 11 Express, libpkcs11
Intel X5690 3.47 12 2,894 Oracle Solaris 11, libsoftcrypto
SPARC T4 2.85 1 712 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 653 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 425 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 331 Oracle Solaris 11 Express, libpkcs11
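The CFB claims in the summary bullets can be re-derived from the AES-256-CFB table above. This is a sketch with illustrative variable names; the numbers are copied from the table and nothing else is assumed.

```python
# MB/sec from the AES-256-CFB table above.
t4_max = 10963            # SPARC T4, 64 threads
x5690_linux_max = 7526    # Intel X5690, Oracle Linux 6.1, 12 threads
t3_max = 6023             # SPARC T3, 32 threads
t4_single = 712           # SPARC T4, 1 thread
x5690_linux_single = 653  # Intel X5690, Oracle Linux 6.1, 1 thread

max_vs_x5690 = t4_max / x5690_linux_max
max_vs_t3 = t4_max / t3_max
single_vs_x5690_percent = (t4_single / x5690_linux_single - 1) * 100

print(f"T4 vs X5690 (max throughput): {max_vs_x5690:.1f}x")
print(f"T4 vs T3 (max throughput):    {max_vs_t3:.1f}x")
print(f"T4 vs X5690 (single thread):  {single_vs_x5690_percent:.0f}% faster")
```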

AES-192-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 12,451 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 8,677 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,175 Oracle Solaris 11 Express, libpkcs11
Intel X5690 3.47 12 2,976 Oracle Solaris 11, libsoftcrypto
SPARC T4 2.85 1 816 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 752 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 461 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 371 Oracle Solaris 11 Express, libpkcs11

AES-128-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 14,388 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 10,214 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,390 Oracle Solaris 11 Express, libpkcs11
Intel X5690 3.47 12 3,115 Oracle Solaris 11, libsoftcrypto
SPARC T4 2.85 1 953 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 886 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 509 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 395 Oracle Solaris 11 Express, libpkcs11

Encryption Performance – AES-CBC

Performance is presented for in-memory AES-CBC mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-256-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 11,588 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 7,171 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 6,704 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 5,980 Oracle Solaris 11 Express, libpkcs11
SPARC T4 2.85 1 748 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 592 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 569 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 336 Oracle Solaris 11 Express, libpkcs11

AES-192-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 13,216 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 8,211 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 7,588 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,333 Oracle Solaris 11 Express, libpkcs11
SPARC T4 2.85 1 862 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 672 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 643 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 358 Oracle Solaris 11 Express, libpkcs11

AES-128-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 15,323 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 9,785 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 8,746 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,347 Oracle Solaris 11 Express, libpkcs11
SPARC T4 2.85 1 1,017 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 781 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 739 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 434 Oracle Solaris 11 Express, libpkcs11

Encryption Performance – AES-CCM

Performance is presented for in-memory AES-CCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-256-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 5,850 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,860 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,613 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 480 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 258 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 190 Oracle Linux 6.1, IPP/AES-NI

AES-192-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 6,709 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,930 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,715 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 565 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 293 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 206 Oracle Linux 6.1, IPP/AES-NI

AES-128-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 7,856 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 2,031 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,838 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 664 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 321 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 225 Oracle Linux 6.1, IPP/AES-NI

Encryption Performance – AES-GCM

Performance is presented for in-memory AES-GCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-256-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 6,871 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 4,794 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 12 1,685 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 691 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 571 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 253 Oracle Solaris 11, libsoftcrypto

AES-192-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 7,450 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 5,054 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 12 1,724 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 727 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 618 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 268 Oracle Solaris 11, libsoftcrypto

AES-128-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 7,987 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 5,315 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 12 1,781 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 765 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 655 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 281 Oracle Solaris 11, libsoftcrypto

Configuration Summary

SPARC T4-1 server
1 x SPARC T4 processor, 2.85 GHz
128 GB memory
Oracle Solaris 11

SPARC T3-1 server
1 x SPARC T3 processor, 1.65 GHz
128 GB memory
Oracle Solaris 11 Express

Sun Fire X4270 M2 server
2 x Intel Xeon X5690, 3.47 GHz
Hyper-Threading enabled
Turbo Boost enabled
24 GB memory
Oracle Linux 6.1

Sun Fire X4270 M2 server
2 x Intel Xeon X5690, 3.47 GHz
Hyper-Threading enabled
Turbo Boost enabled
24 GB memory
Oracle Solaris 11 Express

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-memory and on-chip using various ciphers, including AES-128-CFB, AES-192-CFB, AES-256-CFB, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CCM, AES-192-CCM, AES-256-CCM, AES-128-GCM, AES-192-GCM and AES-256-GCM.

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various ciphers. They were run using optimized libraries for each platform to obtain the best possible performance.

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 1/13/2012.

Thursday Sep 29, 2011

SPARC T4 Processor Outperforms IBM POWER7 and Intel (Westmere AES-NI) on OpenSSL AES Encryption Test

Oracle's SPARC T4 processor is faster than the Intel Xeon X5690 (with AES-NI) and the IBM POWER7.

  • On single-thread OpenSSL encryption, the 2.85 GHz SPARC T4 processor is 4.3 times faster than the 3.5 GHz IBM POWER7 processor.

  • On single-thread OpenSSL encryption, the 2.85 GHz SPARC T4 processor is 17% faster than the 3.46 GHz Intel Xeon X5690 processor.

The SPARC T4 processor has Encryption Instruction Accelerators for encryption and decryption for AES and many other ciphers. The Intel Xeon X5690 processor has AES-NI instructions which accelerate only AES ciphers. The IBM POWER7 does not have cryptographic instructions, but cryptographic coprocessors are available.

Performance Landscape

The table below shows results when running the OpenSSL speed command with the AES-256-CBC cipher. The reported results are for a message size of 8192 bytes. Results are reported for a single thread and for running on all available hardware threads (no over subscribing).

OpenSSL Performance with AES-256-CBC Encryption
Performance (MB/sec)
Processor                    1 Thread   Maximum Throughput (at number of threads)
SPARC T4, 2.85 GHz           769        11,967 (64)
Intel Xeon X5690, 3.46 GHz   660        7,362 (12)
IBM POWER7, 3.5 GHz          179        2,860 (est*)

(est*) The performance of the IBM POWER7 is estimated at 16 times the rate of the single thread performance. The estimate is considered an upper bound on expected performance for this processor.
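The POWER7 estimate and the single-thread comparisons follow directly from the table. A minimal sketch, assuming only the published single-thread rates and the 16x multiplier stated in the footnote (variable names are illustrative):

```python
# Single-thread AES-256-CBC rates from the table above (MB/sec).
single_thread = {
    "SPARC T4": 769,
    "Intel Xeon X5690": 660,
    "IBM POWER7": 179,
}

# The footnote estimates POWER7 maximum throughput at 16x its single-thread rate.
power7_upper_bound = 16 * single_thread["IBM POWER7"]

t4_vs_power7 = single_thread["SPARC T4"] / single_thread["IBM POWER7"]
t4_vs_x5690_percent = (single_thread["SPARC T4"] / single_thread["Intel Xeon X5690"] - 1) * 100

print(f"POWER7 upper bound: {power7_upper_bound} MB/sec")
print(f"T4 vs POWER7: {t4_vs_power7:.1f}x, T4 vs X5690: {t4_vs_x5690_percent:.0f}% faster")
```

The computed bound is 2,864 MB/sec, which the table reports rounded to 2,860.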

Configuration Summary

SPARC Configuration:

SPARC T4-1 server
1 x SPARC T4 processor, 2.85 GHz
64 GB memory
Oracle Solaris 11

Intel Configuration:

Sun Fire X4270 M2 server
1 x Intel Xeon X5690 processor, 3.46 GHz
24 GB memory
Oracle Solaris 11

Software Configuration:

OpenSSL 1.0.0.d
gcc 3.4.3

Benchmark Description

The in-memory SSL performance was measured with the openssl command. openssl has an option for measuring the speed of various ciphers and message sizes. The actual command used to measure the speed of AES-256-CBC was:

openssl speed -multi {number of threads} -evp aes-256-cbc

openssl runs for several minutes and measures the speed, in units of MB/sec, of the specified cipher for messages of sizes 16 bytes to 8192 bytes.

Key Points and Best Practices

  • The Encryption Instruction Accelerators are accessed through a platform independent API for cryptographic engines.
  • The OpenSSL libraries use the API. The default is to not use the Encryption Instruction Accelerators.
  • Cryptography is compute intensive. Using all available hardware threads, both the SPARC T4 processor and the Intel Xeon processor were able to saturate the memory bandwidth of their respective systems.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4-1 Server Outperforms Intel (Westmere AES-NI) on IPsec Encryption Tests

Oracle's SPARC T4 processor has significantly greater performance than the Intel Xeon X5690 processor when both are using Oracle Solaris 11 secure IP networking (IPsec). The SPARC T4 processor using IPsec AES-256-CCM mode achieves line speed over a 10 GbE network.

  • On IPsec, SPARC T4 processor is 23% faster than the 3.46 GHz Intel Xeon X5690 processor (Intel AES-NI).

  • The SPARC T4 processor is only at 23% utilization when running at its maximum throughput making it 3.6 times more efficient at secure networking than the 3.46 GHz Intel Xeon X5690 processor.

  • The 3.46 GHz Intel Xeon X5690 processor is nearly fully utilized at its maximum throughput leaving little CPU for application processing.

  • The SPARC T4 processor using IPsec AES-256-CCM mode achieves line speed over a 10 GbE network.

  • The SPARC T4 processor approaches line speed with fewer than one-quarter the number of IPsec streams required for the Intel Xeon X5690 processor to achieve its peak throughput. The SPARC T4 processor supports the additional streams with minimal extra CPU utilization.

IPsec provides general purpose networking security which is transparent to applications. This is ideal for supplying the capability to those networking applications that don't have cryptography built-in. IPsec provides for more than Virtual Private Networking (VPN) deployments where the technology is often first encountered.

Performance Landscape

Performance was measured using the AES-256-CCM cipher in megabits per second (Mb/sec) aggregate over sufficient numbers of TCP/IP streams to achieve line rate threshold (SPARC T4 processor) or drive a peak throughput (Intel Xeon X5690).

Processor          GHz    AES Decrypt                          AES Encrypt
                          B/W (Mb/sec)   CPU Util   Streams    B/W (Mb/sec)   CPU Util   Streams
– Peak performance
SPARC T4           2.85   9,800          23%        96         9,800          20%        78
Intel Xeon X5690   3.46   8,000          83%                   4,700          81%
– Load at which SPARC T4 processor performance crosses 9000 Mb/sec
SPARC T4           2.85   9,300          19%        17         9,200          15%        17
Intel Xeon X5690   3.46   4,700          41%                   3,200          47%
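The headline figures appear to come from the peak AES Decrypt columns. The sketch below recomputes them under that assumption; the variable names are illustrative, and the "efficiency" figure is taken here to be the ratio of CPU utilization at peak, which is one plausible reading of the summary.

```python
# Peak AES Decrypt figures from the table above.
t4_bandwidth_mbps, t4_cpu_percent = 9800, 23
x5690_bandwidth_mbps, x5690_cpu_percent = 8000, 83

# Throughput advantage of the SPARC T4 at peak.
bandwidth_ratio = t4_bandwidth_mbps / x5690_bandwidth_mbps

# CPU utilization ratio at peak (assumed basis of the "3.6 times" figure).
cpu_utilization_ratio = x5690_cpu_percent / t4_cpu_percent

print(f"bandwidth ratio: {bandwidth_ratio:.3f}")
print(f"CPU utilization ratio: {cpu_utilization_ratio:.1f}")
```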

Configuration Summary

SPARC Configuration:

SPARC T4-1 server
1 x SPARC T4 processor 2.85 GHz
128 GB memory
Oracle Solaris 11
Single 10-Gigabit Ethernet XAUI Adapter

Intel Configuration:

Sun Fire X4270 M2
1 x Intel Xeon X5690 3.46 GHz, Hyper-Threading and Turbo Boost active
48 GB memory
Oracle Solaris 11
Sun Dual Port 10GbE PCIe 2.0 Networking Card with Intel 82599 10GbE Controller

Driver Systems Configuration:

2 x Sun Blade 6000 chassis each with
1 x Sun Blade 6000 Virtualized Ethernet Switched Network Express Module 10GbE (NEM)
10 x Sun Blade X6270 M2 server modules each with
2 x Intel Xeon X5680 3.33 GHz, Hyper-Threading and Turbo Boost active
48 GB memory
Oracle Solaris 11
Dual 10-Gigabit Ethernet Fabric Expansion Module (FEM)

Benchmark Configuration:

Netperf 2.4.5 network benchmark adapted for testing bandwidth of multiple streams in aggregate.

Benchmark Description

The results here are derived from runs of the Netperf 2.4.5 benchmark. Netperf is a client/server benchmark measuring network performance providing a number of independent tests, including the TCP streaming bandwidth tests used here.

Netperf is, however, a single-stream benchmark, and demonstrating peak network bandwidth over a 10 GbE line under encryption requires many streams.

The Netperf documentation provides an example of using the software to drive multiple streams. The example is not sufficient to develop the workload because it does not scale beyond a single driver node, which limits the processing power that can be applied. This in turn limits how many full bandwidth streams can be supported. We chose to have a single server process on the target system (containing either the SPARC T4 processor or the Intel Xeon processor) and to spawn one or more Netperf client processes on each system in a cluster of driver systems. The client processes are managed by the mpirun program of the Oracle Message Passing Toolkit.

Tabular results include aggregate bandwidth and CPU utilization. The aggregate bandwidth is computed by dividing the total traffic of the client processes by the overall runtime. CPU utilization on the target system is the average of that reported by all of the Netperf client processes.
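The two reported metrics can be sketched as follows. The function names and the sample inputs are hypothetical and chosen only to illustrate the calculation described above; they are not measurements from the benchmark.

```python
def aggregate_bandwidth_mbps(bytes_per_client, runtime_seconds):
    """Total traffic of all client processes divided by the overall runtime, in Mb/sec."""
    total_bits = 8 * sum(bytes_per_client)
    return total_bits / runtime_seconds / 1e6


def mean_cpu_utilization(reported_percentages):
    """Average of the target-system CPU utilization reported by each client process."""
    return sum(reported_percentages) / len(reported_percentages)


# Hypothetical run: 4 client processes, each moving 15 GB in a 60 second run.
example_bandwidth = aggregate_bandwidth_mbps([15e9] * 4, 60)
example_cpu = mean_cpu_utilization([22, 24, 23, 23])

print(example_bandwidth, example_cpu)
```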

IPsec is configured in the operating system of each participating server transparently to Netperf and applied to the dedicated network connecting the target system to the driver systems.

Key Points and Best Practices

  • Line speed is defined as data bandwidth within 10% of the theoretical maximum bit rate of the network line. For 10 GbE, bandwidth greater than 9000 Mb/sec is defined as line speed.

  • IPsec provides network security that is configured completely in the operating system and is transparent to the application.

  • Peak bandwidths under IPsec are achieved only in aggregate with multiple client network streams to the target server.

  • Oracle Solaris receiver fanout must be increased from the default to support the large numbers of streams at quoted peak rates.

  • The ixgbe network driver, which applies to servers with Intel 82599 10GbE controllers (the driver systems and the Intel Xeon target system), was limited to a single receive queue to maximize utilization of the extra fanout.

  • IPsec is configured to make a unique security association (SA) for each connection to avoid a bottleneck over the large stream counts.

  • Jumbo frames are enabled (MTU of 9000) and network interrupt blanking (sometimes called interrupt coalescence) is disabled.

  • The TCP streaming bandwidth tests, which run continuously for minutes and multiple times to determine statistical significance, are configured to use message sizes of 1,048,576 bytes.

  • IPsec configuration defines that each SA is established through the use of a preshared key and Internet Key Exchange (IKE).

  • IPsec encryption uses the Solaris Cryptographic Framework which applies the appropriate accelerated provider on both the SPARC T4 processor and the Intel Xeon processor.

  • There is no need to configure a specific authentication algorithm for IPsec. With the Encapsulating Security Payload (ESP) security protocol and AES-256-CCM chosen as the encryption algorithm, the encapsulation is self-authenticating.
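The line speed definition in the first bullet can be written as a small predicate. This is a sketch: the function name and default parameters are illustrative, and the sample values are the peak decrypt bandwidths reported in the Performance Landscape table.

```python
def is_line_speed(bandwidth_mbps, link_rate_mbps=10_000, tolerance=0.10):
    """True when bandwidth is within `tolerance` of the link's theoretical maximum."""
    return bandwidth_mbps > link_rate_mbps * (1 - tolerance)


# Peak AES decrypt bandwidth from the Performance Landscape table (Mb/sec).
print("SPARC T4:", is_line_speed(9800))
print("Intel Xeon X5690:", is_line_speed(8000))
```

By this definition the SPARC T4 result qualifies as line speed and the Intel Xeon X5690 peak does not.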

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on SSL Network Tests

Oracle's SPARC T4 processor is faster and more efficient than the Intel Xeon X5690 processor (with AES-NI) when running network SSL throughput tests.

  • The SPARC T4 processor at 2.85 GHz is 20% faster than the 3.46 GHz Intel Xeon X5690 processor on single stream network SSL encryption.

  • The SPARC T4 processor requires fewer streams to attain near-linespeed of a 10 GbE secure network and does this with 5 times less CPU resources compared to the Intel Xeon X5690 processor.

  • Oracle's SPARC T4-2 server using 8 threads achieves line speed over a 10 GbE network with only 9% CPU utilization.

  • Oracle's Sun Fire X4270 M2 with two Intel Xeon X5690 processors achieves line speed with 12 threads, but at 45% CPU utilization.

The SPARC T4 processor has hardware support via Encryption Instruction Accelerators for encryption and decryption for AES and many other ciphers. The Intel Xeon X5690 processor has AES-NI instructions which accelerate only AES ciphers.

Performance Landscape

The following table shows single stream results running encrypted (SSL Read) and unencrypted (Clear Text) messages of 1 MB in size. These tests were run with the uperf benchmark and used the AES-256-CBC cipher. They were run across a 10 GbE connection. Write messages saw similar performance.

Single Stream Network Communication with Uperf
Performance (Mb/sec)
Processor                    Clear Text   SSL Read
SPARC T4, 2.85 GHz           4,194        1,678
Intel Xeon X5690, 3.46 GHz   5,591        1,398

The next table shows how many streams it takes to achieve 90% of the 10 GbE network bandwidth (9000 Mb/sec) for encrypted read messages of 1 MB in size. These tests were run with the uperf benchmark and used the AES-256-CBC cipher. Write messages saw similar performance.

Uperf SSL Read with AES-256-CBC
Processor                    Number of Streams for 90% Network Utilization   CPU Utilization
SPARC T4, 2.85 GHz           8                                               9%
Intel Xeon X5690, 3.46 GHz   12                                              45%
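The comparisons in the summary can be recomputed from the two tables above. The sketch below uses illustrative variable names; the "SSL retained" ratio is an extra derived figure, not one reported in the original results.

```python
# Mb/sec from the single-stream table above.
clear_text = {"SPARC T4": 4194, "Intel Xeon X5690": 5591}
ssl_read = {"SPARC T4": 1678, "Intel Xeon X5690": 1398}

# CPU utilization (%) at 90% network utilization, from the second table.
cpu_percent = {"SPARC T4": 9, "Intel Xeon X5690": 45}

ssl_speedup = ssl_read["SPARC T4"] / ssl_read["Intel Xeon X5690"]
cpu_ratio = cpu_percent["Intel Xeon X5690"] / cpu_percent["SPARC T4"]

# Fraction of single-stream clear-text bandwidth that survives SSL encryption.
ssl_retained = {name: ssl_read[name] / clear_text[name] for name in ssl_read}

print(f"single-stream SSL speedup: {ssl_speedup:.2f}x")
print(f"CPU utilization ratio: {cpu_ratio:.0f}x")
for name, fraction in ssl_retained.items():
    print(f"{name}: {fraction:.0%} of clear-text bandwidth under SSL")
```

Note that the Intel system is faster on clear text; the SPARC T4 advantage appears only once encryption is enabled.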

Configuration Summary

SPARC T4 Configuration:

2 x SPARC T4-2 servers each with
2 x SPARC T4 processors, 2.85 GHz
128 GB memory
1 x 10-Gigabit Ethernet XAUI Adapter
Oracle Solaris 11
Back-to-back 10 GbE connection

Intel Configuration:

2 x Sun Fire X4270 M2 servers each with
2 x Intel Xeon X5690 processors, 3.46 GHz
48 GB memory
1 x Sun Dual Port 10GbE PCIe 2.0 Networking Card with Intel 82599 10GbE Controller
Oracle Solaris 11
Back-to-back 10 GbE connection

Software Configuration:

OpenSSL 1.0.0.d
uperf 1.0.3
gcc 3.4.3

Benchmark Description

Uperf is an open source benchmark program for simulating and measuring network performance. Uperf is able to measure the performance of various protocols, including TCP, UDP, SCTP and SSL. The uperf benchmark uses an input-defined workload to test network performance. This input workload can be used to model complex situations or to isolate simple tasks. The workload used for these tests was simple network reads and simple network writes.

Key Points and Best Practices

  • The Encryption Instruction Accelerators are accessed through a platform independent API for cryptographic engines.
  • The OpenSSL libraries use the API. The default is to not use the Encryption Instruction Accelerators.
  • Cryptography is compute intensive. Using 8 streams (8 threads), the SPARC T4 processor was able to match the bandwidth of the 10 GbE network.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

Wednesday Sep 28, 2011

SPARC T4 Servers Set World Record on Oracle E-Business Suite R12 X-Large Order to Cash

With Oracle's SPARC T4-2 server running the application and SPARC T4-4 server running the database, Oracle set a world record result for the Oracle E-Business Suite Standard X-Large Order to Cash (OLTP) benchmark.

  • The combination of a SPARC T4-2 server running the Oracle E-Business Suite R12.1.2 application and a SPARC T4-4 server running the Oracle Database 11g Release 2 database enabled 2400 Order to Cash users of the X-Large Benchmark to simultaneously execute a large volume of medium to heavy transactions with an average response time of 2.4 seconds.

  • The SPARC T4-2 server in the application tier and the SPARC T4-4 server in the database tier are only about half utilized providing significant headroom for additional Oracle E-Business Suite R12.1.2 processing modules and future growth.

Performance Landscape

This is the first published result for the X-large benchmark using Oracle E-Business Order Management module.

OLTP Workload: Order to Cash
X-Large Configuration
System       Users   Average Response Time   90th Percentile Response Time
SPARC T4-2   2400    2.413 sec.              3.114 sec.

Configuration Summary

Application Tier Configuration:

1 x SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
Oracle Solaris 10 8/11
Oracle E-Business Suite 12.1.2

Database Tier Configuration:

1 x SPARC T4-4 server
4 x SPARC T4 processors, 3.0 GHz
256 GB memory
Oracle Solaris 10 8/11
Oracle Database 11g Release 2

Storage Configuration:

1 x Sun Storage F5100 Flash Array

Benchmark Description

The Oracle R12 E-Business Suite Standard Benchmark combines online transaction execution by simulated users with concurrent batch processing to model a typical scenario for a global enterprise. This benchmark ran one OLTP component, Order to Cash, in the Extra-Large size. The goal is to obtain reference response times.

Results can be published in four sizes and utilize different combinations:

  • X-large: Maximum online users running all business flows between 10,000 and 20,000; 750,000 order to cash lines per hour and 250,000 payroll checks per hour.
    • Order to Cash Online -- 2400 users
      • The percentage across the 5 transactions in Order Management module is:
        • Insert Manual Invoice -- 16.66%
        • Insert Order -- 32.33%
        • Order Pick Release -- 16.66%
        • Ship Confirm -- 16.66%
        • Order Summary Report -- 16.66%
    • HR Self-Service -- 4000 users
    • Customer Support Flow -- 8000 users
    • Procure to Pay -- 2000 users
  • Large: 10,000 online users; 100,000 order to cash lines per hour and 100,000 payroll checks per hour.
  • Medium: up to 3000 online users; 50,000 order to cash lines per hour and 10,000 payroll checks per hour.
  • Small: up to 1000 online users; 10,000 order to cash lines per hour and 5,000 payroll checks per hour.

See Also

Disclosure Statement

Oracle E-Business X-Large Order to Cash benchmark, SPARC T4-2, SPARC T4, 2.85 GHz, 2 chips, 16 cores, 128 threads, 256 GB memory, SPARC T4-4, SPARC T4, 3.0 GHz, 4 chips, 32 cores, 256 threads, 256 GB memory, average response time 2.413 sec, 90th percentile response time 3.114 sec, Oracle Solaris 10 8/11, Oracle E-Business Suite 12.1.2, Oracle Database 11g Release 2, Results as of 9/26/2011.

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on Oracle Database Tablespace Encryption Queries

Oracle's SPARC T4 processor with Encryption Instruction Accelerators greatly improves performance over software implementations. This will greatly expand the use of TDE for many customers.

  • Oracle's SPARC T4-2 server is over 42% faster than Oracle's Sun Fire X4270 M2 (Intel AES-NI) when running DSS-style queries referencing an encrypted tablespace.

Oracle's Transparent Data Encryption (TDE) feature of the Oracle Database simplifies the encryption of data within datafiles, preventing unauthorized access to it from the operating system. Tablespace encryption allows encryption of the entire contents of a tablespace.

TDE tablespace encryption has been certified with Siebel, PeopleSoft, and Oracle E-Business Suite applications.

Performance Landscape

Total Query Time (time in seconds)
System | GHz | AES-128 | AES-192 | AES-256
SPARC T4-2 server | 2.85 | 588 | 588 | 588
Sun Fire X4270 M2 (Intel X5690) | 3.46 | 836 | 841 | 842
SPARC T4-2 Advantage | | 42% | 43% | 43%
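
The advantage row can be reproduced from the two rows of timings. The short Python sketch below is only an illustration; the dictionary layout and the names `QUERY_TIME_SEC`, `advantage_percent` and `ADVANTAGE` are assumptions made for this example, and the timings are copied from the table above.

```python
# Total query times in seconds, copied from the table above.
QUERY_TIME_SEC = {
    "AES-128": {"sparc_t4_2": 588, "x4270_m2": 836},
    "AES-192": {"sparc_t4_2": 588, "x4270_m2": 841},
    "AES-256": {"sparc_t4_2": 588, "x4270_m2": 842},
}


def advantage_percent(sparc_sec, intel_sec):
    """Percent by which the Intel total query time exceeds the SPARC total."""
    return (intel_sec / sparc_sec - 1.0) * 100.0


# One advantage figure per AES key length.
ADVANTAGE = {
    cipher: advantage_percent(t["sparc_t4_2"], t["x4270_m2"])
    for cipher, t in QUERY_TIME_SEC.items()
}

if __name__ == "__main__":
    for cipher, pct in ADVANTAGE.items():
        print(f"{cipher}: SPARC T4-2 advantage {pct:.1f}%")
```

Rounded to whole percentages, the computed values agree with the 42%, 43% and 43% shown in the table.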

Configuration Summary

SPARC Configuration:

SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
2 x Sun Storage F5100 Flash Array
Oracle Solaris 11
Oracle Database 11g Release 2

Intel Configuration:

Sun Fire X4270 M2 server
2 x Intel Xeon X5690 processors, 3.46 GHz
48 GB memory
2 x Sun Storage F5100 Flash Array
Oracle Linux 5.7
Oracle Database 11g Release 2

Benchmark Description

To test the performance of TDE, a 1 TB database was created. To demonstrate secure transactions, four 25 GB tables emulating customer private data were created: clear text, encrypted AES-128, encrypted AES-192, and encrypted AES-256. Eight queries of varying complexity that join on the customer table were executed.

The time spent scanning the customer table during each query was measured and query plans analyzed to ensure a fair comparison, e.g. no broken queries. The total query time for all queries is reported.

Key Points and Best Practices

  • Oracle Database 11g Release 2 is required for SPARC T4 processor Encryption Instruction Accelerators support with TDE tablespaces.

  • TDE tablespaces support the SPARC T4 processor Encryption Instruction Accelerators for Advanced Encryption Standard (AES) only.

  • AES-CFB is the mode used in the Oracle database with TDE.

  • Prior to using TDE tablespaces you must create a wallet and set up an encryption key. Here is one method to do that:

  • Create a wallet entry in $ORACLE_HOME/network/admin/sqlnet.ora.
    ENCRYPTION_WALLET_LOCATION=
    (SOURCE=(METHOD=FILE)(METHOD_DATA=
    (DIRECTORY=/oracle/app/oracle/product/11.2.0/dbhome_1/encryption_wallet)))
    
    Set an encryption key. This also opens the wallet.
    $ sqlplus / as sysdba
    SQL> ALTER SYSTEM SET ENCRYPTION KEY IDENTIFIED BY "tDeDem0";
    
    On subsequent instance startup open the wallet.
    $ sqlplus / as sysdba
    SQL> STARTUP;
    SQL> ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "tDeDem0";
    
  • TDE tablespace encryption and decryption occur on physical writes and reads of database blocks, respectively.

  • For parallel queries using direct path reads, decryption overhead varies inversely with the complexity of the query.

    For a simple full table scan query, overhead can be reduced and performance improved by reducing the degree of parallelism (DOP) of the query.

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4 Servers Set World Record on PeopleSoft HRMS 9.1

Oracle's SPARC T4-4 servers running Oracle's PeopleSoft HRMS Self-Service 9.1 benchmark and Oracle Database 11g Release 2 achieved World Record performance on Oracle Solaris 10.

  • Using two SPARC T4-4 servers to run the application and database tiers and one SPARC T4-2 server to run the webserver tier, Oracle demonstrated world record performance of 15,000 concurrent users running the PeopleSoft HRMS Self-Service 9.1 benchmark.

  • The combination of the SPARC T4 servers running the PeopleSoft HRMS 9.1 benchmark supports 3.8x more online users with faster response time compared to the best published result from IBM on the previous PeopleSoft HRMS 8.9 benchmark.

  • The average CPU utilization on the SPARC T4-4 server in the application tier handling 15,000 users was less than 50%, leaving significant room for application growth.

  • The SPARC T4-4 server on the application tier used Oracle Solaris Containers which provide a flexible, scalable and manageable virtualization environment.

Performance Landscape

PeopleSoft HRMS Self-Service 9.1 Benchmark
Systems | Processors | Users | Ave Response - Search (sec) | Ave Response - Save (sec)
SPARC T4-2 (web); SPARC T4-4 (app); SPARC T4-4 (db) | 2 x SPARC T4, 2.85 GHz; 4 x SPARC T4, 3.0 GHz; 4 x SPARC T4, 3.0 GHz | 15,000 | 1.01 | 0.63

PeopleSoft HRMS Self-Service 8.9 Benchmark
Systems | Processors | Users | Ave Response - Search (sec) | Ave Response - Save (sec)
IBM Power 570 (web/app); IBM Power 570 (db) | 12 x POWER5, 1.9 GHz; 4 x POWER5, 1.9 GHz | 4,000 | 1.74 | 1.25
IBM p690 (web); IBM p690 (app); IBM p690 (db) | 4 x POWER4, 1.9 GHz; 12 x POWER4, 1.9 GHz; 6 x 4392 MPIS/Gen1 | 4,000 | 1.35 | 1.01
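
The user-count comparison quoted in the highlights follows from the rows above. The sketch below is only a worked calculation; the names `RESULTS`, `USER_RATIO` and `response_ratio` are invented for this example, and the two benchmark versions differ in the ways listed below, so the ratios are indicative rather than like-for-like.

```python
# Users and average response times (seconds), copied from the table above.
RESULTS = {
    "SPARC T4 (HRMS 9.1)": {"users": 15000, "search": 1.01, "save": 0.63},
    "IBM Power 570 (HRMS 8.9)": {"users": 4000, "search": 1.74, "save": 1.25},
    "IBM p690 (HRMS 8.9)": {"users": 4000, "search": 1.35, "save": 1.01},
}

SPARC = RESULTS["SPARC T4 (HRMS 9.1)"]

# Ratio of online users supported: 15,000 versus 4,000.
USER_RATIO = SPARC["users"] / RESULTS["IBM Power 570 (HRMS 8.9)"]["users"]


def response_ratio(other_name, metric):
    """How many times longer the other system's response time is than SPARC's."""
    return RESULTS[other_name][metric] / SPARC[metric]


if __name__ == "__main__":
    print(f"user ratio: {USER_RATIO:.2f}x")
    for name in ("IBM Power 570 (HRMS 8.9)", "IBM p690 (HRMS 8.9)"):
        for metric in ("search", "save"):
            print(f"{name} {metric}: {response_ratio(name, metric):.2f}x")
```

The user ratio works out to 3.75, which the highlights round to 3.8x.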

The main differences between version 9.1 and version 8.9 of the benchmark are:

  • the database expanded from 100K employees and 20K managers to 500K employees and 100K managers,
  • the manager data was expanded,
  • a new transaction, "Employee Add Profile," was added; less than 2% of users execute it, and the transaction has a heavier footprint,
  • version 9.1 has a different benchmark metric (Average Response search/save time for x number of users) versus single user search/save time,
  • newer versions of the PeopleSoft application and PeopleTools software are used.

Configuration Summary

Application Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
512 GB main memory
5 x 300 GB SAS internal disks
2 x 100 GB internal SSDs
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
PeopleSoft HCM 9.1
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Web Server:

1 x SPARC T4-2 server
2 x SPARC T4 processors 2.85 GHz
256 GB main memory
1 x 300 GB SAS internal disk
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
Oracle WebLogic Server 11g (10.3.3)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Database Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
256 GB main memory
3 x 300 GB SAS internal disks
1 x Sun Storage F5100 Flash Array (80 flash modules)
Oracle Solaris 10 8/11
Oracle Database 11g Release 2

Benchmark Description

The purpose of the PeopleSoft HRMS Self-Service 9.1 benchmark is to measure comparative online performance of the selected processes in PeopleSoft Enterprise HCM 9.1 with Oracle Database 11g. The benchmark kit is an Oracle standard benchmark kit run by all platform vendors to measure the performance. It is an OLTP benchmark with no dependency on remote COBOL calls and no batch workload, and its database SQL statements are moderately complex. The results are certified by Oracle and a white paper is published.

PeopleSoft defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consists of 14 scenarios which emulate users performing typical HCM transactions such as viewing paychecks, promoting and hiring employees, updating employee profiles and other typical HCM application transactions.

All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. The benchmark metric is the Average Response Time for search and save for 15,000 users.

Key Points and Best Practices

  • The application tier was configured with two PeopleSoft application server instances on the SPARC T4-4 server hosted in two separate Oracle Solaris Containers to demonstrate consolidation of multiple applications, ease of administration, and load balancing.

  • Each PeopleSoft Application Server instance running in an Oracle Solaris Container was configured to run 5 application server Domains with 30 application server instances to be able to effectively handle the 15,000 users workload with zero application server queuing and minimal use of resources.

  • The web tier was configured with 20 WebLogic instances, each with a 4 GB JVM heap size, to load balance transactions across 10 PeopleSoft Domains. That enables equitable distribution of transactions and scaling to a high number of users.

  • Internal SSDs were configured in the application tier to host PeopleSoft Application Servers object CACHE file systems and in the web tier for WebLogic servers' logging, providing near zero millisecond service time and faster server response time.

Disclosure Statement

Oracle's PeopleSoft HRMS 9.1 benchmark, www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Tuesday Sep 27, 2011

SPARC T4-2 Servers Set World Record on JD Edwards EnterpriseOne Day in the Life Benchmark with Batch, Outperforms IBM POWER7

Using Oracle's SPARC T4-2 server for the application tier and a SPARC T4-1 server for the database tier, a world record result was produced running Oracle's JD Edwards EnterpriseOne application Day in the Life (DIL) benchmark concurrently with a batch workload.

  • The SPARC T4-2 server running online and batch with JD Edwards EnterpriseOne 9.0.2 is 1.7x faster and has better response time than the IBM Power 750 system which only ran the online component of JD Edwards EnterpriseOne 9.0 Day in the Life test.

  • The combination of SPARC T4 servers delivered a Day in the Life benchmark result of 10,000 online users with 0.35 seconds of average transaction response time running concurrently with 112 Universal Batch Engine (UBE) processes at 67 UBEs/minute.

  • This is the first JD Edwards EnterpriseOne benchmark for 10,000 users and payroll batch on a SPARC T4-2 server for the application tier and the database tier with Oracle Database 11g Release 2. All servers ran with the Oracle Solaris 10 operating system.

  • The single-thread performance of the SPARC T4 processor produced sub-second response for the online components and provided dramatic performance for the batch jobs.

  • The SPARC T4 servers, JD Edwards EnterpriseOne 9.0.2, and Oracle WebLogic Server 11g Release 1 support 17% more users per JAS (Java Application Server) than the SPARC T3-1 server for this benchmark.

  • The SPARC T4-2 server provided a 6.7x better batch processing rate than the previous SPARC T3-1 server record result and had 2.5x faster response time.

  • The SPARC T4-2 server used Oracle Solaris Containers, which provide flexible, scalable and manageable virtualization.

  • JD Edwards EnterpriseOne uses Oracle Fusion Middleware WebLogic Server 11g R1 and Oracle Fusion Middleware Cluster Web Tier Utilities 11g HTTP server.

  • The combination of the SPARC T4-2 server and Oracle JD Edwards EnterpriseOne in the application tier with a SPARC T4-1 server in the database tier measured low CPU utilization providing headroom for growth.

Performance Landscape

JD Edwards EnterpriseOne Day in the Life Benchmark
Online with Batch Workload

System | Online Users | Resp Time (sec) | Batch Concur (# of UBEs) | Batch Rate (UBEs/m) | Version
2xSPARC T4-2 (app+web); SPARC T4-1 (db) | 10000 | 0.35 | 112 | 67 | 9.0.2
SPARC T3-1 (app+web); SPARC Enterprise M3000 (db) | 5000 | 0.88 | 19 | 10 | 9.0.1

Resp Time (sec) — Response time of online jobs reported in seconds
Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs
Batch Rate (UBEs/m) — Batch transaction rate in UBEs per minute

JD Edwards EnterpriseOne Day in the Life Benchmark
Online Workload Only

System | Online Users | Response Time (sec) | Version
SPARC T3-1, 1 x SPARC T3 (1.65 GHz), Solaris 10 (app); M3000, 1 x SPARC64 VII (2.75 GHz), Solaris 10 (db) | 5000 | 0.52 | 9.0.1
IBM Power 750, POWER7 (3.55 GHz) (app+db) | 4000 | 0.61 | 9.0

IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere
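
The ratios quoted in the highlights can be checked against the two tables above. The Python sketch below is a worked calculation only; the variable names are invented for this example, and reading the 1.7x IBM comparison as a ratio of response times is an assumption, since the text does not say which metric it uses.

```python
# Figures copied from the "Online with Batch Workload" table.
T4_RESP_SEC, T4_BATCH_RATE = 0.35, 67   # SPARC T4 result, 10,000 users
T3_RESP_SEC, T3_BATCH_RATE = 0.88, 10   # SPARC T3-1 result, 5,000 users

# Figure copied from the "Online Workload Only" table.
IBM_RESP_SEC = 0.61                     # IBM Power 750, 4,000 users

# Batch processing rate improvement over the SPARC T3-1 record.
BATCH_RATE_RATIO = T4_BATCH_RATE / T3_BATCH_RATE

# Response time improvement over the SPARC T3-1 record.
RESP_RATIO_VS_T3 = T3_RESP_SEC / T4_RESP_SEC

# Assumption: the 1.7x IBM comparison is a response time ratio.
RESP_RATIO_VS_IBM = IBM_RESP_SEC / T4_RESP_SEC

if __name__ == "__main__":
    print(f"batch rate vs T3-1:    {BATCH_RATE_RATIO:.1f}x")
    print(f"response time vs T3-1: {RESP_RATIO_VS_T3:.2f}x")
    print(f"response time vs IBM:  {RESP_RATIO_VS_IBM:.2f}x")
```

Rounded to one decimal place, these give the 6.7x, 2.5x and 1.7x figures stated above.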

Configuration Summary

Application Tier Configuration:

1 x SPARC T4-2 server with
2 x 2.85 GHz SPARC T4 processors
128 GB main memory
6 x 300 GB 10K RPM SAS internal HDD
Oracle Solaris 10 9/10
JD Edwards EnterpriseOne 9.0.2 with Tools 8.98.3.3

Web Tier Configuration:

1 x SPARC T4-2 server with
2 x 2.85 GHz SPARC T4 processors
256 GB main memory
2 x 300 GB SSD
4 x 300 GB 10K RPM SAS internal HDD
Oracle Solaris 10 9/10
Oracle WebLogic Server 11g Release 1

Database Tier Configuration:

1 x SPARC T4-1 server with
1 x 2.85 GHz SPARC T4 processor
128 GB main memory
6 x 300 GB 10K RPM SAS internal HDD
2 x Sun Storage F5100 Flash Array
Oracle Solaris 10 9/10
Oracle Database 11g Release 2

Benchmark Description

JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations.

Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company.

  • The workload consists of online transactions and the Universal Batch Engine (UBE) workload of 42 short, 8 medium and 4 long UBEs.

  • LoadRunner runs the DIL workload, collects the users' transaction response times and reports the key metric of Combined Weighted Average Transaction Response time.

  • The UBE processes workload runs from the JD Enterprise Application server.

    • Oracle's UBE processes come in three flavors:
      • Short UBEs < 1 minute engage in Business Report and Summary Analysis,
      • Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address,
      • Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs.
    • The UBE workload generates large numbers of PDF report files and log files.
    • The UBE queues are QBATCHD, a single-threaded queue for large and medium UBEs, and QPROCESS, a queue for short UBEs that run concurrently.

Oracle's UBE process performance metric is the maximum number of concurrent UBE processes at a given transaction rate in UBEs/minute.
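
The UBE categories and queues described above can be sketched as a small classifier. This is an illustration only, not part of the DIL kit: the function names are hypothetical, the treatment of a job lasting exactly 1 or 2 minutes is an assumption (the text leaves those boundaries open), and "long" is assumed to correspond to the "large" UBEs that go to QBATCHD.

```python
# Workload mix copied from the description above.
WORKLOAD_MIX = {"short": 42, "mid": 8, "long": 4}

# Queue per category: QPROCESS for short UBEs, QBATCHD for the rest.
QUEUE_FOR_CLASS = {"short": "QPROCESS", "mid": "QBATCHD", "long": "QBATCHD"}


def classify_ube(runtime_minutes):
    """Classify a UBE by run time: short < 1 min, long > 2 min, else mid.

    Assumption: jobs of exactly 1 or 2 minutes are counted as mid.
    """
    if runtime_minutes < 1:
        return "short"
    if runtime_minutes > 2:
        return "long"
    return "mid"


def queue_for(runtime_minutes):
    """Return the queue name a UBE of this run time would be submitted to."""
    return QUEUE_FOR_CLASS[classify_ube(runtime_minutes)]


# Number of UBE definitions in the mixed-size batch workload.
TOTAL_UBES = sum(WORKLOAD_MIX.values())

if __name__ == "__main__":
    for minutes in (0.5, 1.5, 3.0):
        print(minutes, classify_ube(minutes), queue_for(minutes))
    print("UBEs in mix:", TOTAL_UBES)
```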

Key Points and Best Practices

One JD Edwards EnterpriseOne Application Server and two Oracle WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T4-2 servers were hosted in three separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers.

  • Interrupt fencing was configured on all Oracle Solaris Containers to channel the interrupts to processors other than the processor sets used for the JD Edwards Application server and WebLogic servers.

  • Processor 0 was left alone for clock interrupts.

  • The applications were executed in the FX scheduling class to improve performance by reducing the frequency of context switches.

  • A WebLogic vertical cluster was configured on each WebServer Container with twelve managed instances each to load balance users' requests and to provide the infrastructure that enables scaling to a high number of users with ease of deployment and high availability.

  • The database server was run in an Oracle Solaris Container hosted on the SPARC T4-2 server.

  • The database log writer was run in the real time RT class and bound to a processor set.

  • The database redo logs were configured on the raw disk partitions.

  • The private network between the SPARC T4-2 servers was configured with a 10 GbE interface.

  • The Oracle Solaris Container on the Enterprise Application server ran 42 Short UBEs, 8 Medium UBEs and 4 Long UBEs concurrently as the mixed size batch workload.

  • The mixed size UBEs ran concurrently from the application server with the 10,000 online users driven by LoadRunner.

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4 Servers Set World Record on Siebel Loyalty Batch

Oracle's SPARC T4-2 and SPARC T4-4 servers running Oracle's Siebel Loyalty Batch engine delivered a world record result for batch processing.

  • The SPARC T4-2 and SPARC T4-4 servers running Siebel Loyalty Batch engine, part of Siebel Loyalty Solution, with Oracle Database 11g Release 2 running on Oracle Solaris 10 achieved 7.65M TPH on Accrual (Reward) processing using three Siebel Servers.

  • The world record result was achieved with 24M members and 50M records in the base transaction table.

  • Siebel Loyalty Application was configured with 50 Active Promotions with three Assign Points and four Update Attributes.

  • Oracle's Siebel Server scaled near linearly on SPARC T4 systems achieving 2.72M TPH on a single Siebel Server to 7.65M TPH with three Siebel Servers.

  • The average CPU utilization on the database tier server was 25% and on the application tier server was 65%, leaving significant room for application growth.
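
The "near linear" scaling claim can be quantified from the two throughput figures above. The sketch below is a worked calculation under the usual definition of scaling efficiency (measured throughput divided by the single-server throughput times the server count); the names are invented for this example.

```python
# Throughput in transactions per hour (TPH), copied from the highlights above.
SINGLE_SERVER_TPH = 2.72e6   # one Siebel Server
THREE_SERVER_TPH = 7.65e6    # three Siebel Servers
SERVER_COUNT = 3


def scaling_efficiency(measured_tph, single_tph, servers):
    """Fraction of ideal linear scaling that was actually achieved."""
    return measured_tph / (single_tph * servers)


EFFICIENCY = scaling_efficiency(THREE_SERVER_TPH, SINGLE_SERVER_TPH, SERVER_COUNT)

# Speedup going from one server to three.
SPEEDUP = THREE_SERVER_TPH / SINGLE_SERVER_TPH

if __name__ == "__main__":
    print(f"speedup with {SERVER_COUNT} servers: {SPEEDUP:.2f}x")
    print(f"scaling efficiency: {EFFICIENCY:.1%}")
```

Under these figures, three servers deliver a little under 94% of ideal linear scaling.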

Performance Landscape

System | Processor | TPH | Version
3 x SPARC T4-2 (app); 1 x SPARC T4-4 (db) | SPARC T4, 2.85 GHz; SPARC T4, 3.0 GHz | 7.65M | 8.1.1.1FP
2 x SPARC T3-2 (app); 1 x SPARC T3-1 (app); 1 x SPARC M5000 (db) | SPARC T3, 1.65 GHz; SPARC T3, 1.65 GHz; SPARC64 VII, 2.52 GHz | 3.9M | 8.1.1.1FP
Customer (app); Customer (db) | 4 x Intel E5540, 2.53 GHz; 1 x Itanium, 1.6 GHz | 1.5M | 8.1.x

Configuration Summary

Hardware Configuration:

3 x SPARC T4-2 servers, each with
2 x SPARC T4 processors, 2.85 GHz
128 GB main memory
1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
256 GB main memory
1 x Sun Storage 6180 array
16 disk drives
CSM200 with 16 disk drives

Software Configuration:

Oracle Solaris 10
Siebel Server 8.1.1.1FP
Oracle Database 11g Release 2 Enterprise Edition 11.2.0.1

Benchmark Description

Siebel Loyalty enables companies to simulate and process loyalty rewards for their activities across channels and process very high volume accrual and tier assessment transactions via batch process.

The benchmark simulates a workload of Accrual Batch Transactions Processing which imports data through Enterprise Integration Manager (EIM), evaluates eligible promotions and calculates rewards. The key performance metric is transactions per hour (TPH). Key aspects of the workload simulation include:

  • Batch Engine evaluating all accrual promotions and applying all actions in one go,
  • Users do not have control over the sequence in which promotions are applied,
  • Promotion actions (assign/redeem points) are rolled back in case of failure.

The number of active promotions and, in particular, the Assign Point action have a very significant impact on performance. The load simulated 50 Active promotions with 3 for Assign Points and 7 Update attribute actions configured.

The number of members and the number of queued transactions in the backend database have a significant impact on the performance. The benchmark had 24 million members and 52 million records in the base transaction table. The simplified process flow of the benchmark is:

  • calculate accruals based on promotions,
  • credit points to members,
  • initiate any other actions specified in promotions.

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4-4 Server Sets World Record on PeopleSoft Payroll (N.A.) 9.1, Outperforms IBM Mainframe, HP Itanium

Oracle's SPARC T4-4 server achieved world record performance on the Unicode version of Oracle's PeopleSoft Enterprise Payroll (N.A.) 9.1 extra-large volume model benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • The SPARC T4-4 server was able to process 1,460,544 payments/hour using PeopleSoft Payroll (N.A.) 9.1.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 2.8x faster than IBM z10 EC 2097 Payroll 9.0 (UNICODE version) result of 87.4 minutes. The IBM mainframe is rated at 6,512 MIPS.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 3.1x faster than HP rx7640 Itanium2 non-UNICODE result of 96.17 minutes, on Payroll 9.0.

  • The average CPU utilization on the SPARC T4-4 server was only 30%, leaving significant room for business growth.

  • The SPARC T4-4 server processed payroll for 500,000 employees, 750,000 payments, in 30.84 minutes compared to the earlier world record result of 46.76 minutes on Oracle's SPARC Enterprise M5000 server.

  • The SPARC Enterprise M5000 server configured with eight 2.66 GHz SPARC64 VII processors has a result of 46.76 minutes on Payroll 9.1. That is 7% better than the result of 50.11 minutes on the SPARC Enterprise M5000 server configured with eight 2.53 GHz SPARC64 VII processors on Payroll 9.0. The difference in clock speed between the two processors is ~5%. That is close to the difference in the two results, thereby showing that the impact of the Payroll 9.1 benchmark on the overall result is about the same as that of Payroll 9.0.

Performance Landscape

PeopleSoft Payroll (N.A.) 9.1 – 500K Employees (7 Million SQL PayCalc, Unicode)

System | OS/Database | Payroll Processing Result (minutes) | Run 1 (minutes) | Num of Streams
SPARC T4-4, 4 x 3.0 GHz SPARC T4 | Solaris/Oracle 11g | 30.84 | 43.76 | 96
SPARC M5000, 8 x 2.66 GHz SPARC64 VII+ | Solaris/Oracle 11g | 46.76 | 66.28 | 32

PeopleSoft Payroll (N.A.) 9.0 – 500K Employees (3 Million SQL PayCalc, Non-Unicode)

System | OS/Database | Payroll Processing Result (minutes) | Run 1 (minutes) | Run 2 (minutes) | Run 3 (minutes) | Num of Streams
Sun M5000, 8 x 2.53 GHz SPARC64 VII | Solaris/Oracle 11g | 50.11 | 73.88 | 534.20 | 1267.06 | 32
IBM z10 EC 2097, 9 x 4.4 GHz Gen1 | Z/OS /DB2 | 58.96 | 80.5 | 250.68 | 462.6 | 8
IBM z10 EC 2097, 9 x 4.4 GHz Gen1 | Z/OS /DB2 | 87.4 ** | 107.6 | - | - | 8
HP rx7640, 8 x 1.6 GHz Itanium2 | HP-UX/Oracle 11g | 96.17 | 133.63 | 712.72 | 1665.01 | 32

** This result was run with Unicode. The IBM z10 EC 2097 UNICODE result of 87.4 minutes is 48% slower than IBM z10 EC 2097 non-UNICODE result of 58.96 minutes, both on Payroll 9.0, each configured with nine 4.4GHz Gen1 processors.
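
The comparisons quoted for this benchmark can be reproduced from the Payroll Processing Result columns and the clock speeds given above. The Python sketch below is a worked calculation only; the names are invented for this example. The throughput line uses the rounded 750,000 payment count from the highlights, which is presumably why it lands slightly below the quoted 1,460,544 payments/hour.

```python
# Payroll Processing Result in minutes, copied from the tables above.
RESULT_MIN = {
    "SPARC T4-4 (9.1, Unicode)": 30.84,
    "M5000 2.66 GHz (9.1, Unicode)": 46.76,
    "M5000 2.53 GHz (9.0)": 50.11,
    "IBM z10 (9.0, non-Unicode)": 58.96,
    "IBM z10 (9.0, Unicode)": 87.4,
    "HP rx7640 (9.0, non-Unicode)": 96.17,
}

T4 = RESULT_MIN["SPARC T4-4 (9.1, Unicode)"]

# How many times faster the SPARC T4-4 result is than each competitor.
SPEEDUP_VS_IBM = RESULT_MIN["IBM z10 (9.0, Unicode)"] / T4
SPEEDUP_VS_HP = RESULT_MIN["HP rx7640 (9.0, non-Unicode)"] / T4

# M5000 9.1 versus 9.0 improvement and the clock speed difference, in percent.
M5000_GAIN_PCT = (RESULT_MIN["M5000 2.53 GHz (9.0)"]
                  / RESULT_MIN["M5000 2.66 GHz (9.1, Unicode)"] - 1) * 100
CLOCK_GAIN_PCT = (2.66 / 2.53 - 1) * 100

# Unicode penalty on the IBM z10, in percent.
IBM_UNICODE_PENALTY_PCT = (RESULT_MIN["IBM z10 (9.0, Unicode)"]
                           / RESULT_MIN["IBM z10 (9.0, non-Unicode)"] - 1) * 100

# Approximate hourly throughput from the rounded payment count.
PAYMENTS_PER_HOUR = 750_000 / T4 * 60

if __name__ == "__main__":
    print(f"vs IBM z10 Unicode: {SPEEDUP_VS_IBM:.2f}x")
    print(f"vs HP rx7640:       {SPEEDUP_VS_HP:.2f}x")
    print(f"M5000 9.1 vs 9.0:   {M5000_GAIN_PCT:.1f}% (clock {CLOCK_GAIN_PCT:.1f}%)")
    print(f"IBM Unicode cost:   {IBM_UNICODE_PENALTY_PCT:.1f}%")
    print(f"payments per hour:  {PAYMENTS_PER_HOUR:,.0f}")
```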

Payroll 9.1 Compared to Payroll 9.0

Please note that Payroll 9.1 is Unicode based and Payroll 9.0 had non-Unicode and Unicode versions of the workload. There are 7 million executions of an SQL statement for the PayCalc batch process in Payroll 9.1 and 3 million executions of the same SQL statement for the PayCalc batch process in Payroll 9.0. This is reflected in the elapsed time (27.33 min for 9.1 and 23.78 min for 9.0). The elapsed times of all other batch processes are lower (better) on 9.1.

Configuration Summary

Hardware Configuration:

SPARC T4-4 server
4 x 3.0 GHz SPARC T4 processors
256 GB memory
Sun Storage F5100 Flash Array
80 x 24 GB FMODs

Software Configuration:

Oracle Solaris 10 8/11
PeopleSoft HRMS and Campus Solutions 9.10.303
PeopleSoft Enterprise (PeopleTools) 8.51.035
Oracle Database 11g Release 2 11.2.0.1 (64-bit)
Micro Focus COBOLServer Express 5.1 (64-bit)

Benchmark Description

The PeopleSoft 9.1 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures five application business process run times for a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates payroll data worksheets consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file that is used to transfer payroll funds directly into an employee's bank account.

Key Points and Best Practices

  • The SPARC T4-4 server with the Sun Storage F5100 Flash Array device had an average read throughput of up to 103 MB/sec and an average write throughput of up to 124 MB/sec while consuming 30% CPU on average.

  • The Sun Storage F5100 Flash Array device is a solid-state device that provides a read latency of only 0.5 msec. That is about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

See Also

  • Oracle PeopleSoft Benchmark White Papers
    oracle.com
  • PeopleSoft Enterprise Human Capital Management (Payroll)
    oracle.com

  • PeopleSoft Enterprise Payroll 9.1 Using Oracle for Solaris (Unicode) on an Oracle's SPARC T4-4 – White Paper
    oracle.com

  • SPARC T4-4 Server
    oracle.com
  • Oracle Solaris
    oracle.com
  • Oracle Database 11g Release 2 Enterprise Edition
    oracle.com
  • Sun Storage F5100 Flash Array
    oracle.com

Disclosure Statement

Oracle's PeopleSoft Payroll 9.1 benchmark, SPARC T4-4 30.84 min,
http://www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
