Monday Oct 03, 2011

Sun ZFS Storage 7420 Appliance Doubles NetApp FAS3270A on SPC-1 Benchmark

Oracle's Sun ZFS Storage 7420 appliance delivered outstanding performance and price/performance on the SPC Benchmark 1, beating results published on the NetApp FAS3270A.

  • The Sun ZFS Storage 7420 appliance delivered 137,066.20 SPC-1 IOPS at $2.99 $/SPC-1 IOPS on the SPC-1 benchmark.

  • The Sun ZFS Storage 7420 appliance outperformed the NetApp FAS3270A by 2x on the SPC-1 benchmark.

  • The Sun ZFS Storage 7420 appliance outperformed the NetApp FAS3270A by 2.5x on price/performance on the SPC-1 benchmark.

Performance Landscape

SPC-1 Performance Chart (in decreasing performance order)

System SPC-1
IOPS
$/SPC-1
IOPS
ASU
Capacity
(GB)
TSC Price Data
Protection
Level
Date Results
Identifier
Huawei Symantec S6800T 150,061.17 $3.08 43,937.515 $461,471.75 Mirroring 08/31/11 A00107
Sun ZFS Storage 7420 137,066.20 $2.99 23,703.035 $409,933 Mirroring 10/03/11 A00108
Huawei Symantec S5600T 102,471.66 $2.73 35,945.185 $279,914.53 Mirroring 08/25/11 A00106
Pillar Axiom 600 70,102.27 $7.32 32,000.000 $513,112 Mirroring 04/19/11 A00104
NetApp FAS3270A 68,034.63 $7.48 21,659.386 $509,200.79 RAID DP 11/09/10 AE00004
Sun Storage 6780 62,261.80 $6.89 13,742.218 $429,294 Mirroring 06/01/10 A00094
NetApp FAS3170 60,515.34 $10.01 19,628,500 $605,492 RAID-DP 06/10/08 A00066
IBM V7000 56,510.85 $7.24 14,422.309 $409,410.86 Mirroring 10/22/10 A00097
IBM V7000 53,014.29 $7.52 24,433.592 $389,425.11 Mirroring 03/14/11 A00103

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7420 appliance in clustered configuration
2 x Sun ZFS Storage 7420 controllers, each with
4 x 2.0 GHz Intel Xeon X7550 processors
512 GB memory, 64 x 8 GB 1066 MHz DDR3 DIMMs
4 x 512 GB SSD flash-enabled read-cache
12 x Sun Disk shelves
10 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

1 x SPARC T3-2 server
2 x 1.65 GHz SPARC T3 processors
128 GB memory
6 x 8 Gb FC connections to the Sun ZFS Storage 7420 appliance
Oracle Solaris 10 9/10

Benchmark Description

SPC Benchmark-1 (SPC-1): is the first industry standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominately random I/O operations as typified by multi-user OLTP, database, and email servers environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price/performance results using a variety of host platforms and storage network topologies.

SPC1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Is easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS are registered trademarks of Storage Performance Council (SPC). Results as of October 2, 2011, for more information see www.storageperformance.org. Sun ZFS Storage 7420 Appliance http://www.storageperformance.org/results/benchmark_results_spc1#a00108; NetApp FAS3270A http://www.storageperformance.org/results/benchmark_results_spc1#ae00004.

SPARC T4-4 Produces World Record Oracle OLAP Capacity

Oracle's SPARC T4-4 server delivered world record capacity on the Oracle OLAP Perf workload.

  • The SPARC T4-4 server was able to operate on a cube with a 3 billion row fact table of sales data containing 4 dimensions which represents as many as 70 quintillion aggregate rows (70 followed by 18 zeros).

  • The SPARC T4-4 server supported 3,500 cube-queries/minute against the Oracle OLAP cube with an average response time of 1.5 seconds and the median response time of 0.15 seconds.

Performance Landscape

Oracle OLAP Perf Benchmark
System Fact Table
Num of Rows
Cube-Queries/
minute
Median Response
seconds
Average Response
seconds
SPARC T4-4 3 Billion 3,500 0.15 1.5

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB main memory
2 x Sun Storage F5100 Flash Array

Software Configuration:

Oracle Solaris 10 8/11
Oracle Database 11g Enterprise Edition with Oracle OLAP option

Benchmark Description

OLAP Perf is a workload designed to demonstrate and stress the Oracle OLAP product's core functionalities of fast query, fast update, and rich calculations on a dimensional model to support Enhanced Data Warehousing. The workload uses a set of realistic business intelligence (BI) queries that run against an OLAP cube.

Key Points and Best Practices

  • The SPARC T4-4 server is estimated to support 2,400 interactive users with this fast response time assuming only 5 seconds between query requests.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/3/2011.

Friday Sep 30, 2011

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on ZFS Encryption Tests

Oracle continues to lead in enterprise security. Oracle's SPARC T4 processors combined with Oracle's Solaris ZFS file system demonstrate faster file system encryption than equivalent systems based on the Intel Xeon Processor 5600 Sequence chips which use AES-NI security instructions.

Encryption is the process where data is encoded for privacy and a key is needed by the data owner to access the encoded data. The benefits of using ZFS encryption are:

  • The SPARC T4 processor is 3.5x to 5.2x faster than the Intel Xeon Processor X5670 that has the AES-NI security instructions in creating encrypted files.

  • ZFS encryption is integrated with the ZFS command set. Like other ZFS operations, encryption operations such as key changes and re-key are performed online.

  • Data is encrypted using AES (Advanced Encryption Standard) with key lengths of 256, 192, and 128 in the CCM and GCM operation modes.

  • The flexibility of encrypting specific file systems is a key feature.

  • ZFS encryption is inheritable to descendent file systems. Key management can be delegated through ZFS delegated administration.

  • ZFS encryption uses the Oracle Solaris Cryptographic Framework which gives it access to SPARC T4 processor and Intel Xeon X5670 processor (Intel AES-NI) hardware acceleration or to optimized software implementations of the encryption algorithms automatically.

Performance Landscape

Below are results running two different ciphers for ZFS encryption. Results are presented for runs without any cipher, labeled clear, and a variety of different key lengths.

Encryption Using AES-CCM Ciphers

MB/sec – 5 File Create* Encryption
Clear AES-256-CCM AES-192-CCM AES-128-CCM
SPARC T4-2 server 3,803 3,167 3,335 3,225
SPARC T3-2 server 2,286 1,554 1,561 1,594
2-Socket 2.93 GHz Xeon X5670 3,325 750 764 773

Speedup T4-2 vs X5670 1.1x 4.2x 4.4x 4.2x
Speedup T4-2 vs T3-2 1.7x 2.0x 2.1x 2.0x

Encryption Using AES-GCM Ciphers

MB/sec – 5 File Create* Encryption
Clear AES-256-GCM AES-192-GCM AES-128-GCM
SPARC T4-2 server 3,618 3,929 3,164 2,613
SPARC T3-2 server 2,278 1,451 1,455 1,449
2-Socket 2.93 GHz Xeon X5670 3,299 749 748 753

Speedup T4-2 vs X5670 1.1x 5.2x 4.2x 3.5x
Speedup T4-2 vs T3-2 1.6x 2.7x 2.2x 1.8x

(*) Maximum Delivered values measured over 5 concurrent mkfile operations.

Configuration Summary

Storage Configuration:

Sun Storage 6780 array
16 x 15K RPM drives
Raid 0 pool
Write back cache enable
Controller cache mirroring disabled for maximum bandwidth for test
Eight 8 Gb/sec ports per host

Server Configuration:

SPARC T4-2 server
2 x SPARC T4 2.85 GHz processors
256 GB memory
Oracle Solaris 11

SPARC T3-2 server
2 x SPARC T3 1.6 GHz processors
Oracle Solaris 11 Express 2010.11

Sun Fire X4270 M2 server
2 x Intel Xeon X5670, 2.93 GHz processors
Oracle Solaris 11

Benchmark Description

The benchmark ran the UNIX command mkfile (1M). Mkfile is a simple single threaded program to create a file of a specified size. The script ran 5 mkfile operations in the background and observed the peak bandwidth observed during the test.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of December 16, 2011.

SPARC T4 Processor Beats Intel (Westmere AES-NI) on AES Encryption Tests

The cryptography benchmark suite was internally developed by Oracle to measure the maximum throughput of in-memory, on-chip encryption operations that a system can perform. Multiple threads are used to achieve the maximum throughput.

  • Oracle's SPARC T4 processor running Oracle Solaris 11 is 1.5x faster on AES 256-bit key CFB mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 1.7x faster on AES 256-bit key CBC mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 3.6x faster on AES 256-bit key CCM mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption with authentication of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 1.4x faster on AES 256-bit key GCM mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption with authentication of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 9% faster on single-threaded AES 256-bit key CFB mode encryption than the Intel Xeon X5690 processor running Oracle Linux 6.1 for in-memory encryption of 32 KB blocks.

  • The SPARC T4 processor running Oracle Solaris 11 is 1.8x faster on AES 256-bit key CFB mode encryption than the SPARC T3 running Solaris 11 Express.

  • AES CFB mode is used by the Oracle Database 11g for Transparent Data Encryption (TDE) which provides security to database storage.

Performance Landscape

Encryption Performance – AES-CFB

Performance is presented for in-memory AES-CFB128 mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performance on 32 KB of pseudo-random data (same data for each run).

AES-256-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 10,963 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 7,526 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,023 Oracle Solaris 11 Express, libpkcs11
Intel X5690 3.47 12 2,894 Oracle Solaris 11, libsoftcrypto
SPARC T4 2.85 1 712 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 653 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 425 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 331 Oracle Solaris 11 Express, libpkcs11

AES-192-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 12,451 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 8,677 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,175 Oracle Solaris 11 Express, libpkcs11
Intel X5690 3.47 12 2,976 Oracle Solaris 11, libsoftcrypto
SPARC T4 2.85 1 816 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 752 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 461 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 371 Oracle Solaris 11 Express, libpkcs11

AES-128-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 14,388 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 10,214 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 32 6,390 Oracle Solaris 11 Express, libpkcs11
Intel X5690 3.47 12 3,115 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 953 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 886 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 509 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 395 Oracle Solaris 11 Express, libpkcs11

Encryption Performance – AES-CBC

Performance is presented for in-memory AES-CBC mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performance on 32 KB of pseudo-random data (same data for each run).

AES-256-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 11,588 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 7,171 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 6,704 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 5,980 Oracle Solaris 11 Express, libpkcs11
SPARC T4 2.85 1 748 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 592 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 569 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 336 Oracle Solaris 11 Express, libpkcs11

AES-192-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 13,216 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 8,211 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 7,588 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,333 Oracle Solaris 11 Express, libpkcs11
SPARC T4 2.85 1 862 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 672 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 643 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 358 Oracle Solaris 11 Express, libpkcs11

AES-128-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 15,323 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 9,785 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 8,746 Oracle Linux 6.1, IPP/AES-NI
SPARC T3 1.65 32 6,347 Oracle Solaris 11 Express, libpkcs11
SPARC T4 2.85 1 1,017 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 781 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 1 739 Oracle Solaris 11, libsoftcrypto
SPARC T3 1.65 1 434 Oracle Solaris 11 Express, libpkcs11

Encryption Performance – AES-CCM

Performance is presented for in-memory AES-CCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performance on 32 KB of pseudo-random data (same data for each run).

AES-256-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 5,850 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,860 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,613 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 480 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 258 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 190 Oracle Linux 6.1, IPP/AES-NI

AES-192-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 6,709 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,930 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,715 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 565 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 293 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 206 Oracle Linux 6.1, IPP/AES-NI

AES-128-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 7,856 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 2,031 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 1,838 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 664 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 321 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 225 Oracle Linux 6.1, IPP/AES-NI

Encryption Performance – AES-GCM

Performance is presented for in-memory AES-GCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performance on 32 KB of pseudo-random data (same data for each run).

AES-256-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 6,871 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 4,794 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 12 1,685 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 691 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 571 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 253 Oracle Solaris 11, libsoftcrypto

AES-192-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 7,450 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 5,054 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 12 1,724 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 727 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 618 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 268 Oracle Solaris 11, libsoftcrypto

AES-128-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Th Performance Software Environment
SPARC T4 2.85 64 7,987 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 12 5,315 Oracle Linux 6.1, IPP/AES-NI
Intel X5690 3.47 12 1,781 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 765 Oracle Linux 6.1, IPP/AES-NI
SPARC T4 2.85 1 655 Oracle Solaris 11, libsoftcrypto
Intel X5690 3.47 1 281 Oracle Solaris 11, libsoftcrypto

Configuration Summary

SPARC T4-1 server
1 x SPARC T4 processor, 2.85 GHz
128 GB memory
Oracle Solaris 11

SPARC T3-1 server
1 x SPARC T3 processor, 1.65 GHz
128 GB memory
Oracle Solaris 11 Express

Sun Fire X4270 M2 server
2 x Intel Xeon X5690, 3.47 GHz
Hyper-Threading enabled
Turbo Boost enabled
24 GB memory
Oracle Linux 6.1

Sun Fire X4270 M2 server
2 x Intel Xeon X5690, 3.47 GHz
Hyper-Threading enabled
Turbo Boost enabled
24 GB memory
Oracle Solaris 11 Express

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-memory and on-chip using various ciphers, including AES-128-CFB, AES-192-CFB, AES-256-CFB, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CCM, AES-192-CCM, AES-256-CCM, AES-128-GCM, AES-192-GCM and AES-256-GCM.

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various ciphers. They were run using optimized libraries for each platform to obtain the best possible performance.

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 1/13/2012.

Thursday Sep 29, 2011

SPARC T4 Processor Outperforms IBM POWER7 and Intel (Westmere AES-NI) on OpenSSL AES Encryption Test

Oracle's SPARC T4 processor is faster than the Intel Xeon X5690 (with AES-NI) and the IBM POWER7.

  • On single-thread OpenSSL encryption, the 2.85 GHz SPARC T4 processor is 4.3 times faster than the 3.5 GHz IBM POWER7 processor.

  • On single-thread OpenSSL encryption, the 2.85 GHz SPARC T4 processor is 17% faster than the 3.46 GHz Intel Xeon X5690 processor.

The SPARC T4 processor has Encryption Instruction Accelerators for encryption and decryption for AES and many other ciphers. The Intel Xeon X5690 processor has AES-NI instructions which accelerate only AES ciphers. The IBM POWER7 does not have cryptographic instructions, but cryptographic coprocessors are available.

Performance Landscape

The table below shows results when running the OpenSSL speed command with the AES-256-CBC cipher. The reported results are for a message size of 8192 bytes. Results are reported for a single thread and for running on all available hardware threads (no over subscribing).

OpenSSL Performance with
AES-256-CBC Encryption
Processor Performance (MB/sec)
1 Thread Maximum Throughput
(at number of threads)
SPARC T4, 2.85 GHz 769 11,967 (64)
Intel Xeon X5690, 3.46 GHz 660 7,362 (12)
IBM POWER7, 3.5 GHz 179 2,860 (est*)

(est*) The performance of the IBM POWER7 is estimated at 16 times the rate of the single thread performance. The estimate is considered an upper bound on expected performance for this processor.

Configuration Summary

SPARC Configuration:

SPARC T4-1 server
1 x SPARC T4 processors, 2.85 GHz
64 GB memory
Oracle Solaris 11

Intel Configuration:

Sun Fire X4270 M2 server
1 x Intel Xeon X5690 processors, 3.46 GHz
24 GB memory
Oracle Solaris 11

Software Configuration:

OpenSSL 1.0.0.d
gcc 3.4.3

Benchmark Description

The in-memory SSL performance was measured with the openssl command. openssl has an option for measuring the speed of various ciphers and message sizes. The actual command used to measure the speed of AES-256-CBC was:

openssl speed -multi {number of threads} -evp aes-256-cbc

openssl runs for several minutes and measures the speed, in units of MB/sec, of the specified cipher for messages of sizes 16 bytes to 8192 bytes.

Key Points and Best Practices

  • The Encryption Instruction Accelerators are accessed through a platform independent API for cryptographic engines.
  • The OpenSSL libraries use the API. The default is to not use the Encryption Instruction Accelerators.
  • Cryptography is compute intensive. Using all available threads streams, both the SPARC T4 processor and the Intel Xeon processor were able to saturate the memory bandwidth of the respective systems.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on SSL Network Tests

Oracle's SPARC T4 processor is faster and more efficient than the Intel Xeon X5690 processor (with AES-NI) when running network SSL thoughput tests.

  • The SPARC T4 processor at 2.85 GHz is 20% faster than the 3.46 GHz Intel Xeon X5690 processor on single stream network SSL encryption.

  • The SPARC T4 processor requires fewer streams to attain near-linespeed of a 10 GbE secure network and does this with 5 times less CPU resources compared to the Intel Xeon X5690 processor.

  • Oracle's SPARC T4-2 server using 8 threads achieves line speed over a 10 GbE network with only 9% CPU utilization.

  • Oracle's Sun Fire X4270 M2 with two Intel Xeon X5690 processors achieves line speed with 8 threads, but at 45% CPU utilization.

The SPARC T4 processor has hardware support via Encryption Instruction Accelerators for encryption and decryption for AES and many other ciphers. The Intel Xeon X5690 processor has AES-NI instructions which accelerate only AES ciphers.

Performance Landscape

The following table shows single stream results running encrypted (SSL Read) and unencrypted (Clear Text) messages of 1 MB in size. These tests were run with the uperf benchmark and used the AES-256-CBC cipher. They were run across a 10 GbE connection. Write messages saw similar performance.

Single Stream Network Communication with Uperf
Processor Performance (Mb/sec)
Clear Text SSL Read
SPARC T4, 2.85 GHz 4,194 1,678
Intel Xeon X5690, 3.46 GHz 5,591 1,398

The next table shows how many streams it takes to achieve 90% of the 10 GbE network bandwidth (9000 Mb/sec) for encrypted read messages of 1 MB in size. These tests were run with the uperf benchmark and used the AES-256-CBC cipher. Write messages saw similar performance.

Uperf SSL Read with AES-256-CBC
Processor Number of
Streams for 90%
Network Utilization
CPU Utilization
SPARC T4, 2.85 GHz 8 9%
Intel Xeon X5690, 3.46 GHz 12 45%

Configuration Summary

SPARC T4 Configuration:

2 x SPARC T4-2 servers each with
2 x SPARC T4 processors, 2.85 GHz
128 GB memory
1 x 10-Gigabit Ethernet XAUI Adapter
Oracle Solaris 11
Back-to-back 10 GbE connection

Intel Configuration:

2 x Sun Fire X4270 M2 servers each with
2 x Intel Xeon X5690 processors, 3.46 GHz
48 GB memory
1 x Sun Dual Port 10GbE PCIe 2.0 Networking Card with Intel 82599 10GbE Controller
Oracle Solaris 11
Back-to-back 10 GbE connection

Software Configuration:

OpenSSL 1.0.0.d
uperf 1.0.3
gcc 3.4.3

Benchmark Description

Uperf is an open source benchmark program for simulating and measuring network performance. Uperf is able to measure the performance of various protocols, including TCP, UDP, SCTP and SSL. The uperf benchmark uses an input-defined workload to test network performance. This input workload can be used to model complex situations or to isolate simple tasks. The workload used for these tests was simple network reads and simple network writes.

Key Points and Best Practices

  • The Encryption Instruction Accelerators are accessed through a platform independent API for cryptographic engines.
  • The OpenSSL libraries use the API. The default is to not use the Encryption Instruction Accelerators.
  • Cryptography is compute intensive. Using 8 streams, the SPARC T4 processor was able to match the bandwidth of the 10 GbE network with 8 threads.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

Wednesday Sep 28, 2011

SPARC T4 Servers Set World Record on Oracle E-Business Suite R12 X-Large Order to Cash

With Oracle's SPARC T4-2 server running the application and SPARC T4-4 server running the database, Oracle set a world record result for the Oracle E-Business Suite Standard X-Large Order to Cash (OLTP) benchmark.

  • The combination of a SPARC T4-2 server running the Oracle E-Business Suite R12.1.2 application and a SPARC T4-4 server running the Oracle Database 11g Release 2 database enabled 2400 Order to Cash users of the X-Large Benchmark to simultaneously execute a large volume of medium to heavy transactions with an average response time of 2.4 seconds.

  • The SPARC T4-2 server in the application tier and the SPARC T4-4 server in the database tier are only about half utilized providing significant headroom for additional Oracle E-Business Suite R12.1.2 processing modules and future growth.

Performance Landscape

This is the first published result for the X-large benchmark using Oracle E-Business Order Management module.

OLTP Workload: Order to Cash
X-Large Configuration
System Users Average
Response Time
90th Percentile
Response Time
SPARC T4-2 2400 2.413 sec. 3.114 sec.

Configuration Summary

Application Tier Configuration:

1 x SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
Oracle Solaris 10 8/11
Oracle E-Business Suite 12.1.2

Database Tier Configuration:

1 x SPARC T4-4 server
4 x SPARC T4 processors, 3.0 GHz
256 GB memory
Oracle Solaris 10 8/11
Oracle Database 11g Release 2

Storage Configuration:

1 x Sun Storage F5100 Flash Array

Benchmark Description

The Oracle R12 E-Business Suite Standard Benchmark combines online transaction execution by simulated users with concurrent batch processing to model a typical scenario for a global enterprise. This benchmark ran one OLTP component, Order to Cash, in the Extra-Large size. The goal is to obtain reference response times.

Results can be published in four sizes and utilize different combination

  • X-large: Maximum online users running all business flows between 10,000 to 20,000; 750,000 order to cash lines per hour and 250,000 payroll checks per hour.
    • Order to Cash Online -- 2400 users
      • The percentage across the 5 transactions in Order Management module is:
        • Insert Manual Invoice -- 16.66%
        • Insert Order -- 32.33%
        • Order Pick Release -- 16.66%
        • Ship Confirm -- 16.66%
        • Order Summary Report -- 16.66%
    • HR Self-Service -- 4000 users
    • Customer Support Flow -- 8000 users
    • Procure to Pay -- 2000 users
  • Large: 10,000 online users; 100,000 order to cash lines per hour and 100,000 payroll checks per hour.
  • Medium: up to 3000 online users; 50,000 order to cash lines per hour and 10,000 payroll checks per hour.
  • Small: up to 1000 online users; 10,000 order to cash lines per hour and 5,000 payroll checks per hour.

See Also

Disclosure Statement

Oracle E-Business X-Large Order to Cash benchmark, SPARC T4-2, SPARC T4, 2.85 GHz, 2 chips, 16 cores, 128 threads, 256 GB memory, SPARC T4-4, SPARC T4, 3.0 GHz, 4 chips, 32 cores, 256 threads, 256 GB memory, average response time 2.413 sec, 90th percentile response time 3.114 sec, Oracle Solaris 10 8/11, Oracle E-Business Suite 12.1.2, Oracle Database 11g Release 2, Results as of 9/26/2011.

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on Oracle Database Tablespace Encryption Queries

Oracle's SPARC T4 processor with Encryption Instruction Accelerators greatly improves performance over software implementations. This will greatly expand the use of TDE for many customers.

  • Oracle's SPARC T4-2 server is over 42% faster than Oracle's Sun Fire X4270 M2 (Intel AES-NI) when running DSS-style queries referencing an encrypted tablespace.

Oracle's Transparent Data Encryption (TDE) feature of the Oracle Database simplifies the encryption of data within datafiles preventing unauthorized access to it from the operating system. Tablespace encryption allows encryption of the entire contents of a tablespace.

TDE tablespace encryption has been certified with Siebel, PeopleSoft, and Oracle E-Business Suite applications

Performance Landscape

Total Query Time (time in seconds)
System GHz AES-128 AES-192 AES-256
SPARC T4-2 server 2.85 588 588 588
Sun Fire X4270 M2 (Intel X5690) 3.46 836 841 842
SPARC T4-2 Advantage
42% 43% 43%

Configuration Summary

SPARC Configuration:

SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
2 x Sun Storage F5100 Flash Array
Oracle Solaris 11
Oracle Database 11g Release 2

Intel Configuration:

Sun Fire X4270 M2 server
2 x Intel Xeon X5690 processors, 3.46 GHz
48 GB memory
2 x Sun Storage F5100 Flash Array
Oracle Linux 5.7
Oracle Database 11g Release 2

Benchmark Description

To test the performance of TDE, a 1 TB database was created. To demonstrate secure transactions, four 25 GB tables emulating customer private data were created: clear text, encrypted AES-128, encrypted AES-192, and encrypted AES-256. Eight queries of varying complexity that join on the customer table were executed.

The time spent scanning the customer table during each query was measured and query plans analyzed to ensure a fair comparison, e.g. no broken queries. The total query time for all queries is reported.

Key Points and Best Practices

  • Oracle Database 11g Release 2 is required for SPARC T4 processor Encryption Instruction Accelerators support with TDE tablespaces.

  • TDE tablespaces support the SPARC T4 processor Encryption Instruction Accelerators for Advanced Encryption Standard (AES) only.

  • AES-CFB is the mode used in the Oracle database with TDE

  • Prior to using TDE tablespaces you must create a wallet and setup an encryption key. Here is one method to do that:

  • Create a wallet entry in $ORACLE_HOME/network/admin/sqlnet.ora.
    ENCRYPTION_WALLET_LOCATION=
    (SOURCE=(METHOD=FILE)(METHOD_DATA=
    (DIRECTORY=/oracle/app/oracle/product/11.2.0/dbhome_1/encryption_wallet)))
    
    Set an encryption key. This also opens the wallet.
    $ sqlplus / as sysdba
    SQL> ALTER SYSTEM SET ENCRYPTION KEY IDENTIFIED BY "tDeDem0";
    
    On subsequent instance startup open the wallet.
    $ sqlplus / as sysdba
    SQL> STARTUP;
    SQL> ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "tDeDem0";
    
  • TDE tablespace encryption and decryption occur on physical writes and reads of database blocks, respectively.

  • For parallel query using direct path reads decryption overhead varies inversely with the complexity of the query.

    For a simple full table scan query overhead can be reduced and performance improved by reducing the degree of parallelism (DOP) of the query.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4 Servers Set World Record on PeopleSoft HRMS 9.1

Oracle's SPARC T4-4 servers running Oracle's PeopleSoft HRMS Self-Service 9.1 benchmark and Oracle Database 11g Release 2 achieved World Record performance on Oracle Solaris 10.

  • Using two SPARC T4-4 servers to run the application and database tiers and one SPARC T4-2 server to run the webserver tier, Oracle demonstrated world record performance of 15,000 concurrent users running the PeopleSoft HRMS Self-Service 9.1 benchmark.

  • The combination of the SPARC T4 servers running the PeopleSoft HRMS 9.1 benchmark supports 3.8x more online users with faster response time compared to the best published result from IBM on the previous PeopleSoft HRMS 8.9 benchmark.

  • The average CPU utilization on the SPARC T4-4 server in the application tier handling 15,000 users was less than 50%, leaving significant room for application growth.

  • The SPARC T4-4 server on the application tier used Oracle Solaris Containers which provide a flexible, scalable and manageable virtualization environment.

Performance Landscape

PeopleSoft HRMS Self-Service 9.1 Benchmark
Systems Processors Users Ave Response -
Search (sec)
Ave Response -
Save (sec)
SPARC T4-2 (web)
SPARC T4-4 (app)
SPARC T4-4 (db)
2 x SPARC T4, 2.85 GHz
4 x SPARC T4, 3.0 GHz
4 x SPARC T4, 3.0 GHz
15,000 1.01 0.63
PeopleSoft HRMS Self-Service 8.9 Benchmark
IBM Power 570 (web/app)
IBM Power 570 (db)
12 x POWER5, 1.9 GHz
4 x POWER5, 1.9 GHz
4,000 1.74 1.25
IBM p690 (web)
IBM p690 (app)
IBM p690 (db)
4 x POWER4, 1.9 GHz
12 x POWER4, 1.9 GHz
6 x 4392 MPIS/Gen1
4,000 1.35 1.01

The main differences between version 9.1 and version 8.9 of the benchmark are:

  • the database expanded from 100K employees and 20K managers to 500K employees and 100K managers,
  • the manager data was expanded,
  • a new transaction, "Employee Add Profile," was added, the percent of users executing it is less then 2%, and the transaction has a heavier footprint,
  • version 9.1 has a different benchmark metric (Average Response search/save time for x number of users) versus single user search/save time,
  • newer versions of the PeopleSoft application and PeopleTools software are used.

Configuration Summary

Application Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
512 GB main memory
5 x 300 GB SAS internal disks,
2 x 100 GB internal SSDs
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
PeopleSoft HCM 9.1
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 031
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Web Server:

1 x SPARC T4-2 server
2 x SPARC T4 processors 2.85 GHz
256 GB main memory
1 x 300 GB SAS internal disks
1 x 300 GB internal SSD
Oracle Solaris 10 8/11
PeopleSoft PeopleTools 8.51.02
Oracle WebLogic Server 11g (10.3.3)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.6.0_20

Database Server:

1 x SPARC T4-4 server
4 x SPARC T4 processors 3.0 GHz
256 GB main memory
3 x 300 GB SAS internal disks
1 x Sun Storage F5100 Flash Array (80 flash modules)
Oracle Solaris 10 8/11
Oracle Database 11g Release 2

Benchmark Description

The purpose of the PeopleSoft HRMS Self-Service 9.1 benchmark is to measure comparative online performance of the selected processes in PeopleSoft Enterprise HCM 9.1 with Oracle Database 11g. The benchmark kit is an Oracle standard benchmark kit run by all platform vendors to measure the performance. It's an OLTP benchmark with no dependency on remote COBOL calls, there is no batch workload, and DB SQLs are moderately complex. The results are certified by Oracle and a white paper is published.

PeopleSoft defines a business transaction as a series of HTML pages that guide a user through a particular scenario. Users are defined as corporate Employees, Managers and HR administrators. The benchmark consists of 14 scenarios which emulate users performing typical HCM transactions such as viewing paychecks, promoting and hiring employees, updating employee profiles and other typical HCM application transactions.

All these transactions are well-defined in the PeopleSoft HR Self-Service 9.1 benchmark kit. The benchmark metric is the Average Response Time for search and save for 15,000 users..

Key Points and Best Practices

  • The application tier was configured with two PeopleSoft application server instances on the SPARC T4-4 server hosted in two separate Oracle Solaris Containers to demonstrate consolidation of multiple application, ease of administration, and load balancing.

  • Each PeopleSoft Application Server instance running in an Oracle Solaris Container was configured to run 5 application server Domains with 30 application server instances to be able to effectively handle the 15,000 users workload with zero application server queuing and minimal use of resources.

  • The web tier was configured with 20 WebLogic instances and with 4 GB JVM heap size to load balance transactions across 10 PeopleSoft Domains. That enables equitable distribution of transactions and scaling to high number of users.

  • Internal SSDs were configured in the application tier to host PeopleSoft Application Servers object CACHE file systems and in the web tier for WebLogic servers' logging providing near zero millisecond service time and faster server response time.

See Also

Disclosure Statement

Oracle's PeopleSoft HRMS 9.1 benchmark, www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Tuesday Sep 27, 2011

SPARC T4-2 Servers Set World Record on JD Edwards EnterpriseOne Day in the Life Benchmark with Batch, Outperforms IBM POWER7

Using Oracle's SPARC T4-2 server for the application tier and a SPARC T4-1 server for the database tier, a world record result was produced running the Oracle's JD Edwards EnterpriseOne application Day in the Life (DIL) benchmark concurrently with a batch workload.

  • The SPARC T4-2 server running online and batch with JD Edwards EnterpriseOne 9.0.2 is 1.7x faster and has better response time than the IBM Power 750 system which only ran the online component of JD Edwards EnterpriseOne 9.0 Day in the Life test.

  • The combination of SPARC T4 servers delivered a Day in the Life benchmark result of 10,000 online users with 0.35 seconds of average transaction response time running concurrently with 112 Universal Batch Engine (UBE) processes at 67 UBEs/minute.

  • This is the first JD Edwards EnterpriseOne benchmark for 10,000 users and payroll batch on a SPARC T4-2 server for the application tier and the database tier with Oracle Database 11g Release 2. All servers ran with the Oracle Solaris 10 operating system.

  • The single-thread performance of the SPARC T4 processor produced sub-second response for the online components and provided dramatic performance for the batch jobs.

  • The SPARC T4 servers, JD Edwards EnterpriseOne 9.0.2, and Oracle WebLogic Server 11g Release 1 support 17% more users per JAS (Java Application Server) than the SPARC T3-1 server for this benchmark.

  • The SPARC T4-2 server provided a 6.7x better batch processing rate than the previous SPARC T3-1 server record result and had 2.5x faster response time.

  • The SPARC T4-2 server used Oracle Solaris Containers, which provide flexible, scalable and manageable virtualization.

  • JD Edwards EnterpriseOne uses Oracle Fusion Middleware WebLogic Server 11g R1 and Oracle Fusion Middleware Cluster Web Tier Utilities 11g HTTP server.

  • The combination of the SPARC T4-2 server and Oracle JD Edwards EnterpriseOne in the application tier with a SPARC T4-1 server in the database tier measured low CPU utilization providing headroom for growth.

Performance Landscape

JD Edwards EnterpriseOne Day in the Life Benchmark
Online with Batch Workload

System Online
Users
Resp
Time (sec)
Batch
Concur
(# of UBEs)
Batch
Rate
(UBEs/m)
Version
2xSPARC T4-2 (app+web)
SPARC T4-1 (db)
10000 0.35 112 67 9.0.2
SPARC T3-1 (app+web)
SPARC Enterprise M3000 (db)
5000 0.88 19 10 9.0.1

Resp Time (sec) — Response time of online jobs reported in seconds
Batch Concur (# of UBEs) — Batch concurrency presented in the number of UBEs
Batch Rate (UBEs/m) — Batch transaction rate in UBEs per minute

Edwards EnterpriseOne Day in the Life Benchmark
Online Workload Only

System Online
Users
Response
Time (sec)
Version
SPARC T3-1, 1 x SPARC T3 (1.65 GHz), Solaris 10 (app)
M3000, 1 x SPARC64 VII (2.75 GHz), Solaris 10 (db)
5000 0.52 9.0.1
IBM Power 750, POWER7 (3.55 GHz) (app+db) 4000 0.61 9.0

IBM result from http://www-03.ibm.com/systems/i/advantages/oracle/, IBM used WebSphere

Configuration Summary

Application Tier Configuration:

1 x SPARC T4-2 server with
2 x 2.85 GHz SPARC T4 processors
128 GB main memory
6 x 300 GB 10K RPM SAS internal HDD
Oracle Solaris 10 9/10
JD Edwards EnterpriseOne 9.0.2 with Tools 8.98.3.3

Web Tier Configuration:

1 x SPARC T4-2 server with
2 x 2.85 GHz SPARC T4 processors
256 GB main memory
2 x 300 GB SSD
4 x 300 GB 10K RPM SAS internal HDD
Oracle Solaris 10 9/10
Oracle WebLogic Server 11g Release 1

Database Tier Configuration:

1 x SPARC T4-1 server with
1 x 2.85 GHz SPARC T4 processor
128 GB main memory
6 x 300 GB 10K RPM SAS internal HDD
2 x Sun Storage F5100 Flash Array
Oracle Solaris 10 9/10
Oracle Database 11g Release 2

Benchmark Description

JD Edwards EnterpriseOne is an integrated applications suite of Enterprise Resource Planning (ERP) software. Oracle offers 70 JD Edwards EnterpriseOne application modules to support a diverse set of business operations.

Oracle's Day in the Life (DIL) kit is a suite of scripts that exercises most common transactions of JD Edwards EnterpriseOne applications, including business processes such as payroll, sales order, purchase order, work order, and manufacturing processes, such as ship confirmation. These are labeled by industry acronyms such as SCM, CRM, HCM, SRM and FMS. The kit's scripts execute transactions typical of a mid-sized manufacturing company.

  • The workload consists of online transactions and the UBE – Universal Business Engine workload of 42 short, 8 medium and 4 long UBEs.

  • LoadRunner runs the DIL workload, collects the user’s transactions response times and reports the key metric of Combined Weighted Average Transaction Response time.

  • The UBE processes workload runs from the JD Enterprise Application server.

    • Oracle's UBE processes come as three flavors:
      • Short UBEs < 1 minute engage in Business Report and Summary Analysis,
      • Mid UBEs > 1 minute create a large report of Account, Balance, and Full Address,
      • Long UBEs > 2 minutes simulate Payroll, Sales Order, night only jobs.
    • The UBE workload generates large numbers of PDF files reports and log files.
    • The UBE Queues are categorized as the QBATCHD, a single threaded queue for large and medium UBEs, and the QPROCESS queue for short UBEs run concurrently.

Oracle’s UBE process performance metric is Number of Maximum Concurrent UBE processes at transaction rate, UBEs/minute.

Key Points and Best Practices

One JD Edwards EnterpriseOne Application Server and two Oracle WebLogic Servers 11g R1 coupled with two Oracle Fusion Middleware 11g Web Tier HTTP Server instances on the SPARC T4-2 servers were hosted in three separate Oracle Solaris Containers to demonstrate consolidation of multiple application and web servers.

  • Interrupt fencing was configured on all Oracle Solaris Containers to channel the interrupts to processors other than the processor sets used for the JD Edwards Application server and WebLogic servers.

  • Processor 0 was left alone for clock interrupts.

  • The applications were executed in the FX scheduling class to improve performance by reducing the frequency of context switches.

  • A WebLogic vertical cluster was configured on each WebServer Container with twelve managed instances each to load balance users' requests and to provide the infrastructure that enables scaling to high number of users with ease of deployment and high availability.

  • The database server was run in an Oracle Solaris Container hosted on the SPARC T4-2 server.

  • The database log writer was run in the real time RT class and bound to a processor set.

  • The database redo logs were configured on the raw disk partitions.

  • The private network between the SPARC T4-2 servers was configured with a 10 GbE interface.

  • The Oracle Solaris Container on the Enterprise Application server ran 42 Short UBEs, 8 Medium UBEs and 4 Long UBEs concurrently as the mixed size batch workload.

  • The mixed size UBEs ran concurrently from the application server with the 10000 online users driven by the LoadRunner.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4 Servers Set World Record on Siebel Loyalty Batch

Oracle's SPARC T4-2 and SPARC T4-4 servers running Oracle's Siebel Loyalty Batch engine delivered a world record result for batch processing.

  • The SPARC T4-2 and SPARC T4-4 servers running Siebel Loyalty Batch engine, part of Siebel Loyalty Solution, with Oracle Database 11g Release 2 running on Oracle Solaris 10 achieved 7.65M TPH on Accrual (Reward) processing using three Siebel Servers.

  • The world record result was achieved with 24M members and 50M records in the base transaction table.

  • Siebel Loyalty Application was configured with 50 Active Promotions with three Assign Points and four Update Attributes.

  • Oracle's Siebel Server scaled near linearly on SPARC T4 systems achieving 2.72M TPH on a single Siebel Server to 7.65M TPH with three Siebel Servers.

  • The average CPU utilization on the database tier server was 25% and on the application tier server was 65%, leaving significant room for application growth.

Performance Landscape

System Processor TPH Version
3 x SPARC T4-2 (app)
1 x SPARC T4-4 (db)
SPARC T4, 2.85 GHz
SPARC T4, 3.0 GHz
7.65M 8.1.1.1FP
2 x SPARC T3-2 (app)
1 x SPARC T3-1 (app)
1 x SPARC M5000 (db)
SPARC T3, 1.65 GHz
SPARC T3, 1.65 GHz
SPARC64 VII, 2.52 GHz
3.9M 8.1.1.1FP
Customer (app)
Customer (db)
4 x Intel E5540, 2.53 GHz
1 x Itanium, 1.6 GHz
1.5M 8.1.x

Configuration Summary

Hardware Configuration:

3 x SPARC T4-2 servers, each with
2 x SPARC T4 processors, 2.85 GHz
128 GB main memory
1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
256 GB main memory
1 x Sun Storage 6180 array
16 disk drives
CSM200 with 16 disk drives

Software Configuration:

Oracle Solaris 10
Siebel Server 8.1.1.1FP
Oracle Database 11g Release 2 Enterprise Edition 11.2.0.1

Benchmark Description

Siebel Loyalty enables companies to simulate and process loyalty rewards for their activities across channels and process very high volume accrual and tier assessment transactions via batch process.

The benchmark simulates a workload of Accrual Batch Transactions Processing which imports data through Enterprise Integration Manager (EIM), evaluates eligible promotion and calculates rewards. The key performance metric is transactions per hour (TPH). Key aspects of the workload simulation include:

  • Batch Engine evaluating all accrual promotions and applying all actions in one go,
  • Users do not have control over the sequence in which promotion applied,
  • Promotion actions (assign/redeem points) are rolled back in case of failure.
The number of active promotions and, in particular, the Assign Point action has very significant impact on performance. The load simulated 50 Active promotions with 3 for Assign Points and 7 Update attribute actions configured.

The number of members and the number of queued transactions in the backend database have significant impact on the performance. The benchmark had 24 million members and 52 million records in the base transaction table. The simplified process flow of the benchmark is:

  • calculate accruals base on promotions,
  • credit points to members,
  • initiate any other actions specified in promotions.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/26/2011.

SPARC T4-4 Server Sets World Record on PeopleSoft Payroll (N.A.) 9.1, Outperforms IBM Mainframe, HP Itanium

Oracle's SPARC T4-4 server achieved world record performance on the Unicode version of Oracle's PeopleSoft Enterprise Payroll (N.A) 9.1 extra-large volume model benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 10.

  • The SPARC T4-4 server was able to process 1,460,544 payments/hour using PeopleSoft Payroll N.A 9.1.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 2.8x faster than IBM z10 EC 2097 Payroll 9.0 (UNICODE version) result of 87.4 minutes. The IBM mainframe is rated at 6,512 MIPS.

  • The SPARC T4-4 server UNICODE result of 30.84 minutes on Payroll 9.1 is 3.1x faster than HP rx7640 Itanium2 non-UNICODE result of 96.17 minutes, on Payroll 9.0.

  • The average CPU utilization on the SPARC T4-4 server was only 30%, leaving significant room for business growth.

  • The SPARC T4-4 server processed payroll for 500,000 employees, 750,000 payments, in 30.84 minutes compared to the earlier world record result of 46.76 minutes on Oracle's SPARC Enterprise M5000 server.

  • The SPARC Enterprise M5000 server configured with eight 2.66 GHz SPARC64 VII processors has a result of 46.76 minutes on Payroll 9.1. That is 7% better than the result of 50.11 minutes on the SPARC Enterprise M5000 server configured with eight 2.53 GHz SPARC64 VII processors on Payroll 9.0. The difference in clock speed between the two processors is ~5%. That is close to the difference in the two results, thereby showing that the impact of the Payroll 9.1 benchmark on the overall result is about the same as that of Payroll 9.0.

Performance Landscape

PeopleSoft Payroll (N.A.) 9.1 – 500K Employees (7 Million SQL PayCalc, Unicode)

System OS/Database Payroll Processing
Result (minutes)
Run 1
(minutes)
Num of
Streams
SPARC T4-4, 4 x 3.0 GHz SPARC T4 Solaris/Oracle 11g 30.84 43.76 96
SPARC M5000, 8 x 2.66 GHz SPARC64 VII+ Solaris/Oracle 11g 46.76 66.28 32

PeopleSoft Payroll (N.A.) 9.0 – 500K Employees (3 Million SQL PayCalc, Non-Unicode)

System OS/Database Time in Minutes Num of
Streams
Payroll
Processing
Result
Run 1 Run 2 Run 3
Sun M5000, 8 x 2.53 GHz SPARC64 VII Solaris/Oracle 11g 50.11 73.88 534.20 1267.06 32
IBM z10 EC 2097, 9 x 4.4 GHz Gen1 Z/OS /DB2 58.96 80.5 250.68 462.6 8
IBM z10 EC 2097, 9 x 4.4 GHz Gen1 Z/OS /DB2 87.4 ** 107.6 - - 8
HP rx7640, 8 x 1.6 GHz Itanium2 HP-UX/Oracle 11g 96.17 133.63 712.72 1665.01 32

** This result was run with Unicode. The IBM z10 EC 2097 UNICODE result of 87.4 minutes is 48% slower than IBM z10 EC 2097 non-UNICODE result of 58.96 minutes, both on Payroll 9.0, each configured with nine 4.4GHz Gen1 processors.

Payroll 9.1 Compared to Payroll 9.0

Please note that Payroll 9.1 is Unicode based and Payroll 9.0 had non-Unicode and Unicode versions of the workload. There are 7 million executions of an SQL statement for the PayCalc batch process in Payroll 9.1 and 3 million executions of the same SQL statement for the PayCalc batch process in Payroll 9.0. This gets reflected in the elapsed time (27.33 min for 9.1 and 23.78 min for 9.0). The elapsed times of all other batch processes is lower (better) on 9.1.

Configuration Summary

Hardware Configuration:

SPARC T4-4 server
4 x 3.0 GHz SPARC T4 processors
256 GB memory
Sun Storage F5100 Flash Array
80 x 24 GB FMODs

Software Configuration:

Oracle Solaris 10 8/11
PeopleSoft HRMS and Campus Solutions 9.10.303
PeopleSoft Enterprise (PeopleTools) 8.51.035
Oracle Database 11g Release 2 11.2.0.1 (64-bit)
Micro Focus COBOLServer Express 5.1 (64-bit)

Benchmark Description

The PeopleSoft 9.1 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

To measure five application business process run times for a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates payroll data worksheets consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: The process takes the information generated by Payroll Calculations and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deduction, etc.

  • Create Direct Deposit File: The process takes information generated by the above processes and produces an electronic transmittal file that is used to transfer payroll funds directly into an employee's bank account.

Key Points and Best Practices

  • The SPARC T4-4 server with the Sun Storage F5100 Flash Array device had an average read throughput of up to 103 MB/sec and an average write throughput of up to 124 MB/sec while consuming 30% CPU on average.

  • The Sun Storage F5100 Flash Array device is a solid-state device that provides a read latency of only 0.5 msec. That is about 10 times faster than the normal disk latencies of 5 msec measured on this benchmark.

See Also

  • Oracle PeopleSoft Benchmark White Papers
    oracle.com
  • PeopleSoft Enterprise Human Capital Management (Payroll)
    oracle.com

  • PeopleSoft Enterprise Payroll 9.1 Using Oracle for Solaris (Unicode) on an Oracle's SPARC T4-4 – White Paper
    oracle.com

  • SPARC T4-4 Server
    oracle.com
  • Oracle Solaris
    oracle.com
  • Oracle Database 11g Release 2 Enterprise Edition
    oracle.com
  • Sun Storage F5100 Flash Array
    oracle.com

Disclosure Statement

Oracle's PeopleSoft Payroll 9.1 benchmark, SPARC T4-4 30.84 min,
http://www.oracle.com/us/solutions/benchmark/apps-benchmark/peoplesoft-167486.html, results 9/26/2011.

Monday Sep 19, 2011

Halliburton ProMAX® Seismic Processing on Sun Blade X6270 M2 with Sun ZFS Storage 7320

Halliburton/Landmark's ProMAX® 3D Pre-Stack Kirchhoff Time Migration's (PSTM) single workflow scalability and multiple workflow throughput using various scheduling methods are evaluated on a cluster of Oracle's Sun Blade X6270 M2 server modules attached to Oracle's Sun ZFS Storage 7320 appliance.

Two resource scheduling methods, compact and distributed, are compared while increasing the system load with additional concurrent ProMAX® workflows.

  • Multiple concurrent 24-process ProMAX® PSTM workflow throughput is constant; 10 workflows on 10 nodes finish as fast as 1 workflow on one compute node. Additionally, processing twice the data volume yields similar traces/second throughput performance.

  • A single ProMAX® PSTM workflow has good scaling from 1 to 10 nodes of a Sun Blade X6270 M2 cluster scaling 4.5X. ProMAX® scales to 4.7X on 10 nodes with one input data set and 6.3X with two consecutive input data sets (i.e. twice the data).

  • A single ProMAX® PSTM workflow has near linear scaling of 11x on a Sun Blade X6270 M2 server module when running from 1 to 12 processes.

  • The 12-thread ProMAX® workflow throughput using the distributed scheduling method is equivalent or slightly faster than the compact scheme for 1 to 6 concurrent workflows.

Performance Landscape

Multiple 24-Process Workflow Throughput Scaling

This test measures the system throughput scalability as concurrent 24-process workflows are added, one workflow per node. The per workflow throughput and the system scalability are reported.

Aggregate system throughput scales linearly. Ten concurrent workflows finish in the same time as does one workflow on a single compute node.

Halliburton ProMAX® Pre-Stack Time Migration - Multiple Workflow Scaling


Single Workflow Scaling

This test measures single workflow scalability across a 10-node cluster. Utilizing a single data set, performance exhibits near linear scaling of 11x at 12 processes, and per-node scaling of 4x at 6 nodes; performance flattens quickly reaching a peak of 60x at 240 processors and per-node scaling of 4.7x with 10 nodes.

Running with two consecutive input data sets in the workflow, scaling is considerably improved with peak scaling ~35% higher than obtained using a single data set. Doubling the data set size minimizes time spent in workflow initialization, data input and output.

Halliburton ProMAX® Pre-Stack Time Migration - Single Workflow Scaling

This next test measures single workflow scalability across a 10-node cluster (as above) but limiting scheduling to a maximum of 12-process per node; effectively restricting a maximum of one process per physical core. The speedup relative to a single process, and single node are reported.

Utilizing a single data set, performance exhibits near linear scaling of 37x at 48 processes, and per-node scaling of 4.3x at 6 nodes. Performance of 55x at 120 processors and per-node scaling of 5x with 10 nodes is reached and scalability is trending higher more strongly compared to the the case of two processes running per physical core above. For equivalent total process counts, multi-node runs using only a single process per physical core appear to run between 28-64% more efficiently (96 and 24 processes respectively). With a full compliment of 10 nodes (120 processes) the peak performance is only 9.5% lower than with 2 processes per vcpu (240 processes).

Running with two consecutive input data sets in the workflow, scaling is considerably improved with peak scaling ~35% higher than obtained using a single data set.

Halliburton ProMAX® Pre-Stack Time Migration - Single Workflow Scaling

Multiple 12-Process Workflow Throughput Scaling, Compact vs. Distributed Scheduling

The fourth test compares compact and distributed scheduling of 1, 2, 4, and 6 concurrent 12-processor workflows.

All things being equal, the system bi-section bandwidth should improve with distributed scheduling of a fixed-size workflow; as more nodes are used for a workflow, more memory and system cache is employed and any node memory bandwidth bottlenecks can be offset by distributing communication across the network (provided the network and inter-node communication stack do not become a bottleneck). When physical cores are not over-subscribed, compact and distributed scheduling performance is within 3% suggesting that there may be little memory contention for this workflow on the benchmarked system configuration.

With compact scheduling of two concurrent 12-processor workflows, the physical cores become over-subscribed and performance degrades 36% per workflow. With four concurrent workflows, physical cores are oversubscribed 4x and performance is seen to degrade 66% per workflow. With six concurrent workflows over-subscribed compact scheduling performance degrades 77% per workflow. As multiple 12-processor workflows become more and more distributed, the performance approaches the non over-subscribed case.

Halliburton ProMAX® Pre-Stack Time Migration - Multiple Workflow Scaling

141616 traces x 624 samples


Test Notes

All tests were performed with one input data set (70808 traces x 624 samples) and two consecutive input data sets (2 * (70808 traces x 624 samples)) in the workflow. All results reported are the average of at least 3 runs and performance is based on reported total wall-clock time by the application.

All tests were run with NFS attached Sun ZFS Storage 7320 appliance and then with NFS attached legacy Sun Fire X4500 server. The StorageTek Workload Analysis Tool (SWAT) was invoked to measure the I/O characteristics of the NFS attached storage used on separate runs of all workflows.

Configuration Summary

Hardware Configuration:

10 x Sun Blade X6270 M2 server modules, each with
2 x 3.33 GHz Intel Xeon X5680 processors
48 GB DDR3-1333 memory
4 x 146 GB, Internal 10000 RPM SAS-2 HDD
10 GbE
Hyper-Threading enabled

Sun ZFS Storage 7320 Appliance
1 x Storage Controller
2 x 2.4 GHz Intel Xeon 5620 processors
48 GB memory (12 x 4 GB DDR3-1333)
2 TB Read Cache (4 x 512 GB Read Flash Accelerator)
10 GbE
1 x Disk Shelf
20.0 TB RAID-Z (20 x 1 TB SAS-2, 7200 RPM HDD)
4 x Write Flash Accelerators

Sun Fire X4500
2 x 2.8 GHz AMD 290 processors
16 GB DDR1-400 memory
34.5 TB RAID-Z (46 x 750 GB SATA-II, 7200 RPM HDD)
10 GbE

Software Configuration:

Oracle Linux 5.5
Parallel Virtual Machine 3.3.11 (bundled with ProMAX)
Intel 11.1.038 Compilers
Libraries: pthreads 2.4, Java 1.6.0_01, BLAS, Stanford Exploration Project Libraries

Benchmark Description

The ProMAX® family of seismic data processing tools is the most widely used Oil and Gas Industry seismic processing application. ProMAX® is used for multiple applications, from field processing and quality control, to interpretive project-oriented reprocessing at oil companies and production processing at service companies. ProMAX® is integrated with Halliburton's OpenWorks® Geoscience Oracle Database to index prestack seismic data and populate the database with processed seismic.

This benchmark evaluates single workflow scalability and multiple workflow throughput of the ProMAX® 3D Prestack Kirchhoff Time Migration (PSTM) while processing the Halliburton benchmark data set containing 70,808 traces with 8 msec sample interval and trace length of 4992 msec. Benchmarks were performed with both one and two consecutive input data sets.

Each workflow consisted of:

  • reading the previously constructed MPEG encoded processing parameter file
  • reading the compressed seismic data traces from disk
  • performing the PSTM imaging
  • writing the result to disk

Workflows using two input data sets were constructed by simply adding a second identical seismic data read task immediately after the first in the processing parameter file. This effectively doubled the data volume read, processed, and written.

This version of ProMAX® currently only uses Parallel Virtual Machine (PVM) as the parallel processing paradigm. The PVM software only used TCP networking and has no internal facility for assigning memory affinity and processor binding. Every compute node is running a PVM daemon.

The ProMAX® processing parameters used for this benchmark:

Minimum output inline = 65
Maximum output inline = 85
Inline output sampling interval = 1
Minimum output xline = 1
Maximum output xline = 200 (fold)
Xline output sampling interval = 1
Antialias inline spacing = 15
Antialias xline spacing = 15
Stretch Mute Aperature Limit with Maximum Stretch = 15
Image Gather Type = Full Offset Image Traces
No Block Moveout
Number of Alias Bands = 10
3D Amplitude Phase Correction
No compression
Maximum Number of Cache Blocks = 500000

Primary PSTM business metrics are typically time-to-solution and accuracy of the subsurface imaging solution.

Key Points and Best Practices

  • Multiple job system throughput scales perfectly; ten concurrent workflows on 10 nodes each completes in the same time and has the same throughput as a single workflow running on one node.
  • Best single workflow scaling is 6.6x using 10 nodes.

    When tasked with processing several similar workflows, while individual time-to-solution will be longer, the most efficient way to run is to fully distribute them one workflow per node (or even across two nodes) and run these concurrently, rather than to use all nodes for each workflow and running consecutively. For example, while the best-case configuration used here will run 6.6 times faster using all ten nodes compared to a single node, ten such 10-node jobs running consecutively will overall take over 50% longer to complete than ten jobs one per node running concurrently.

  • Throughput was seen to scale better with larger workflows. While throughput with both large and small workflows are similar with only one node, the larger dataset exhibits 11% and 35% more throughput with four and 10 nodes respectively.

  • 200 processes appears to be a scalability asymptote with these workflows on the systems used.
  • Hyperthreading marginally helps throughput. For the largest model run on 10 nodes, 240 processes delivers 11% more performance than with 120 processes.

  • The workflows do not exhibit significant I/O bandwidth demands. Even with 10 concurrent 24-process jobs, the measured aggregate system I/O did not exceed 100 MB/s.

  • 10 GbE was the only network used and, though shared for all interprocess communication and network attached storage, it appears to have sufficient bandwidth for all test cases run.

See Also

Disclosure Statement

The following are trademarks or registered trademarks of Halliburton/Landmark Graphics: ProMAX®, GeoProbe®, OpenWorks®. Results as of 9/1/2011.

Thursday Sep 15, 2011

Sun Fire X4800 M2 Servers (now known as Sun Server X2-8) Produce World Record on SAP SD-Parallel Benchmark

Oracle delivered an SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Sales and Distribution - Parallel (SD Parallel) Benchmark world record result using eight of Oracle's Sun Fire X4800 M2 servers (now known as Sun Server X2-8), Oracle Solaris 10 and Oracle Database 11g Real Application Clusters (RAC) software that achieved 180,000 users as of 10/03/2011.

  • The eight Sun Fire X4800 M2 servers delivered a world record result of 180,000 users on the SAP SD Parallel Benchmark.

  • The eight Sun Fire X4800 M2 server SD Parallel result of 180,000 users delivered 43% more performance compared to the IBM Power 795 server SD two-tier result of 126,063 users.

Performance Landscape

Selected SAP Sales and Distribution (SD) benchmark results are presented in decreasing order of performance. All benchmarks were using SAP enhancement package 4 for SAP ERP 6.0 (Unicode).

System OS
Database
Users SAPS Type Cert #
Eight Sun Fire X4800 M2
8 x Intel Xeon E7-8870 @2.4 GHz
512 GB
Oracle Solaris 10
Oracle 11g RAC
180,000 1,016,380 Parallel 2011037
Six Sun Fire X4800 M2
8 x Intel Xeon E7-8870 @2.4 GHz
512 GB
Oracle Solaris 10
Oracle 11g RAC
137,904 765,470 Parallel 2011038
IBM Power 795
32 x POWER7 @4.0 GHz
4096 GB
AIX 7.1
DB2 9.7
126,063 688,630 Two-Tier 2010046
Four Sun Fire X4800 M2
8 x Intel Xeon E7-8870 @2.4 GHz
512 GB
Oracle Solaris 10
Oracle 11g RAC
94,736 546,050 Parallel 2011039
Two Sun Fire X4800 M2
8 x Intel Xeon E7-8870 @2.4 GHz
512 GB
Oracle Solaris 10
Oracle 11g RAC
49,860 274,080 Parallel 2011040
Four Sun Fire X4470
4 x Intel Xeon X7560 @2.26 GHz
256 GB
Solaris 10
Oracle 11g RAC
40,000 221,020 Parallel 2010039

Complete benchmark results and descriptions can be found at the SAP standard applications benchmark website.
For SD benchmark results website: Two-Tier or Three-Tier. For SD Parallel benchmark results website: SD Parallel.

Configuration and Results Summary

Hardware Configuration:

8 x Sun Fire X4800 M2 servers, each with
8 x Intel Xeon E7-8870 @ 2.4 GHz (8 processors, 80 cores, 160 threads)
512 GB memory

Software Configuration:

SAP enhancement package 4 for SAP ERP 6.0
Oracle Database 11g Real Application Clusters (RAC)
Oracle Solaris 10

Results Summary:

Number of SAP SD benchmark users:
180,000
Average dialog response time:
0.63 seconds
Throughput:

Fully processed order line items per hour:
20,327,670

Dialog steps/hour:
60,983,000

SAPS:
1,016,380
Average database request time (dialog/update):
0.010 sec / 0.055 sec
SAP Certification:
2011037

Benchmark Description

The SAP Standard Application Sales and Distribution - Parallel (SD Parallel) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

The SD Parallel Benchmark consists of the same transactions and user interaction steps as the two-tier and three-tier SD Benchmark. This means that the SD Parallel Benchmark runs the same business processes as the SD Benchmark. The difference between the benchmarks is the technical data distribution. Additionally, the benchmark requires equal distribution of the benchmark users across all database nodes for the used benchmark clients (round-robin method). Following this rule, all database nodes work on data of all clients. This avoids unrealistic configurations such as having only one client per database node.

The SAP Benchmark Council agreed to give the parallel benchmark a different name so that the difference can be easily recognized by any interested parties - customers, prospects, and analysts. The naming convention is SD Parallel for Sales & Distribution - Parallel.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Disclosure Statement

SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Sales and Distribution Benchmark, results as of 10/03/2011.

SD Parallel, 8 x Sun Fire X4800 M2 (each 8 processors, 80 cores, 160 threads) 180,000 SAP SD Users, Oracle Solaris 10, Oracle 11g Real Application Clusters (RAC), Certification Number 2011037.
SD Parallel, 6 x Sun Fire X4800 M2 (each 8 processors, 80 cores, 160 threads) 137,904 SAP SD Users, Oracle Solaris 10, Oracle 11g Real Application Clusters (RAC), Certification Number 2011038.
SD Parallel, 4 x Sun Fire X4470 (each 4 processors, 32 cores, 64 threads) 40,000 SAP SD Users, Oracle Solaris 10, Oracle 11g Real Application Clusters (RAC), Certification Number 2010039.
SD Two-Tier, IBM Power 795 (32 processors, 256 cores, 1024 threads) 126,063 SAP SD Users, AIX 7.1, DB2 9.7, Certification Number 2010046.

SAP, R/3 are registered trademarks of SAP AG in Germany and other countries. More information may be found at www.sap.com/benchmark.

Monday Sep 12, 2011

SPARC Enterprise M9000 Produces World Record SAP ATO Benchmark

Oracle delivered an SAP enhancement package 4 for SAP ERP 6.0 Assemble-to-Order (ATO) benchmark world record result using Oracle's SPARC Enterprise M9000 server running Oracle Solaris 10 and Oracle Database 11g along with SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode). The SAP ATO benchmark integrates process chains across SAP Business Suite components, include Financials, Logistics, Human Resources, Basis and Cross Application.

  • The SPARC Enterprise M9000 server containing 64 SPARC64 VII+ 3.0 GHz processors, running Oracle Solaris 10 and Oracle Database 11g along with SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) delivered a world record 206,000 fully processed assembly orders per hour on the SAP enhancement package 4 for SAP ERP 6.0 ATO benchmark.

  • The SPARC Enterprise M9000 server result shows it can more than consolidate the work of the three-tier HP solution which used 80 different servers.

  • Oracle produced the first SAP ATO benchmark result using Unicode encoding.

  • The SAP ATO benchmark uses multiple components of the SAP Business Suite. See more detail at the SAP ATO benchmark webpage.

Performance Landscape

SAP ATO 2-Tier Performance Table (select results in decreasing performance order)

System OS
Database
Assembly Orders
per hour(*)
SAP
ERP/ECC
Release
Cert Num
SPARC Enterprise M9000
64 x SPARC64 VII+ @3.0 GHz
2048 GB
Oracle Solaris 10
Oracle 11g
206,360 SAP ERP6.0*
(Unicode)
2011033
Fujitsu Siemens Primepower 2000
128 x SPARC64 @560 MHz
128 GB
Solaris 8
Oracle 8.1.7
34,260 4.6B
(non-Unicode)
2001018
HP 9000 Superdome
64 x PA-RISC 8600 @552 MHz
128 GB
HP-UX 11.11
Oracle 8.16
18,870 4.6B
(non-Unicode)
2001014
Fujitsu Siemens Primepower 900
16 x SPARC64 V @1.35 GHz
64 GB
Solaris 8
Oracle 9i
12,170 4.6C
(non-Unicode)
2003012
HP rx5670
4 x Itanium II @1.0 GHz
24 GB
HP-UX 11i
Oracle 9i
3,090 4.6C
(non-Unicode)
2002069

(*) SAP enhancement package 4 for SAP ERP6.0 (Unicode)

SAP ATO 3-Tier Performance Table (top results in decreasing performance order)

System OS
Database
Assembly Orders
per hour(*)
SAP
ERP/ECC
Release
Cert Num
HP 9000 Superdome Enterprise Server
64 x PA-RISC 8700 @ 750MHz
128 GB
HP-UX 11i
Oracle 9i
144,090 4.6 C
(non-Unicode)
2002003
HP 9000 Superdome Enterprise Server
64 x PA-RISC 8700 @750 MHz
128 GB
HP-UX 11i
Oracle 9i
130,570 4.6 C
(non-Unicode)
2001047

(*) Assembly Order: Request to assemble pre-manufactured parts and assemblies to finished products according to an existing sales order.

Complete benchmark results may be found at the SAP benchmark website: http://www.sap.com/benchmark.

Configuration Summary and Results

Hardware Configuration:

SPARC Enterprise M9000
64 SPARC64 VII+ 3.0 GHz processor
2048 GB memory

Software Configuration:

Oracle Solaris 10
SAP enhancement package 4 for SAP ERP 6.0 (Unicode)
Oracle Database 11g

Certified Result:

Fully business processed Assembly Orders/hour:
206,360
SAP Certification Number:
2011033

Benchmark Description

The SAP ATO benchmark integrates process chains across SAP Business Suite components. The ATO scenario is characterized by high volume sales, short production times (from hours to one day), and individual assembly for such products as PCs, pumps, and cars. In general, each benchmark user has its own master data, such as material, vendor, or customer master data to avoid data locking situations. However, the ATO Benchmark has been designed to handle and overcome data locking situations - the ATO benchmark users access common master data, such as material, vendor, or customer master data. (source: http://www12.sap.com/solutions/benchmark/ato.epx).

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Disclosure Statement

SAP, R/3 are registered trademarks of SAP AG in Germany and other countries. More information may be found at www.sap.com/benchmark

Two-tier SAP ATO standard SAP ERP 6.0 2005/EP4 (Unicode) application benchmarks as of 09/04/11:
Oracle's SPARC Enterprise M9000 (64 processors, 256 cores, 512 threads) 206,360 Assembly Orders/hour, 64 x 3.0 GHz SPARC VIII, 2048 GB memory, Oracle 11g, Oracle Solaris 10, Certification Number 2011033.

Two-tier SAP ATO standard 4.6 C application benchmarks as of 09/04/11:
Fujitsu Siemens Primepower 900 (16-way SMP) 12,170 Assembly Orders/hour, 16 x 1.35 GHz SPARC64 V, 64 GB memory, Oracle 9i, Solaris 8, Certification Number 2003012.
HP rx5670 (4 processors SMP) 3,090 Assembly Orders/hour, 4 x 1.0 GHz Itanium II, 24 GB memory, Oracle 9i, HP-UX 11i, Certification Number 2002069.

Two-tier SAP ATO standard 4.6 B application benchmarks as of 09/04/11:
HP 9000 Superdome (64-way SMP) 18,8770 Assembly Orders/hour, 64 x 552 MHz PA-RISC 8600, 128 GB memory, Oracle 8.1.6, HP-UX 11.11, Certification Number 2001014.
Fujitsu Siemens Primepower 2000 (128 processors SMP) 34,260 Assembly Orders/hour, 128 x 560 MHz SPARC64, 128 GB memory, Oracle 8.1.7, Solaris 8, Certification Number 2001018.

Three-tier SAP ATO standard 4.6 C application benchmarks as of 09/04/11:
HP 9000 Superdome Enterprise Server (64 processors SMP) 144,090 Assembly Orders/hour, 64 x 750 MHz PA-RISC 8700, 128 GB memory, Oracle 9i, HP-UX 11i, Certification Number 2002003
HP 9000 Superdome Enterprise Server (64 processors SMP) 130,570 Assembly Orders/hour, 64 x 750 MHz PA-RISC 8700, 128 GB memory, Oracle 9i, HP-UX 11i, Certification Number 2001047

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.

Index Pages
Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today