Tuesday May 31, 2016

SAP Two-Tier Standard Sales and Distribution (SD) Benchmark: SPARC M7-8 World Record, 8 Processors

Oracle's SPARC M7-8 server produced a world record result for 8-processors on the SAP two-tier Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0 (8 chips / 256 cores / 2048 threads).

  • The SPARC M7-8 server achieved 130,000 SAP SD benchmark users running the two-tier SAP Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0.

  • The SPARC M7-8 server is 1.5x faster per core than x86-based HPE Integrity Superdome X running the two-tier SAP Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0.

  • The SPARC M7-8 server result was run with Oracle Solaris 11 and used Oracle Database 12c.

  • Previously the SPARC T7-2 server set the 2-chip server world record achieving 30,800 SAP SD benchmark users running the two-tier SAP Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0.

Performance Landscape

SAP SD two-tier performance table, in decreasing performance order, with SAP Enhancement Package 5 for SAP ERP 6.0 results (current version of the benchmark as of May 2012).

SAP SD Two-Tier Benchmark
System                     Processor                             OS                                      Database             Users    Resp Time (sec)  Users/core  Cert#
SPARC M7-8                 8 x SPARC M7 (8 x 32-core)            Oracle Solaris 11                       Oracle Database 12c  130,000  0.93             508         2016020
HPE Integrity Superdome X  16 x Intel E7-8890 v3 (16 x 18-core)  Windows Server 2012 R2 Datacenter Ed.   SQL Server 2014      100,000  0.99             347         2016002

The number of cores shown is per chip; to get system totals, multiply by the number of chips.

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Configuration Summary and Results

Database/Application Server:

1 x SPARC M7-8 server with
8 x SPARC M7 processors (4.13 GHz, total of 8 processors / 256 cores / 2048 threads)
4 TB memory
Oracle Solaris 11.3
Oracle Database 12c

Database Storage:
7 x Sun Server X3-2L each with
2 x Intel Xeon Processors E5-2609 (2.4 GHz)
16 GB memory
4 x Sun Flash Accelerator F40 PCIe Card
12 x 3 TB SAS disks
Oracle Solaris 11

REDO log Storage:
1 x Pillar FS-1 Flash Storage System, with
2 x FS1-2 Controller (Netra X3-2)
2 x FS1-2 Pilot (X4-2)
4 x DE2-24P Disk enclosure
96 x 300 GB 10000 RPM SAS Disk Drive Assembly

Certified Results (published by SAP)

Number of SAP SD benchmark users: 130,000
Average dialog response time: 0.93 seconds
Throughput:
  Fully processed order line items per hour: 14,269,670
  Dialog steps per hour: 42,809,000
  SAPS: 713,480
Average database request time (dialog/update): 0.018 sec / 0.039 sec
SAP Certification: 2016020
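These throughput figures are internally consistent with SAP's published SAPS definition (100 SAPS = 2,000 fully processed order line items per hour = 6,000 dialog steps per hour). A quick sketch of the arithmetic, using the core counts from the performance table above:

```python
# Derived from SAP's SAPS definition: 100 SAPS = 2,000 fully processed
# order line items per hour = 6,000 dialog steps per hour.
line_items_per_hour = 14_269_670
dialog_steps_per_hour = 42_809_000

saps_from_items = line_items_per_hour / 2_000 * 100    # ~713,483; published as 713,480
saps_from_steps = dialog_steps_per_hour / 6_000 * 100  # ~713,483

# Users/core column of the performance table:
m7_users_per_core = 130_000 / (8 * 32)    # ~508 (SPARC M7-8)
hpe_users_per_core = 100_000 / (16 * 18)  # ~347 (Superdome X)
print(round(m7_users_per_core / hpe_users_per_core, 1))  # ~1.5x per core
```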

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is an ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard application benchmarks, SAP Enhancement Package 5 for SAP ERP 6.0 as of 5/14/16:

SPARC M7-8 (8 processors, 256 cores, 2048 threads) 130,000 SAP SD users, 8 x 4.13 GHz SPARC M7, 4 TB memory, Oracle Database 12c, Oracle Solaris 11, Cert# 2016020
SPARC T7-2 (2 processors, 64 cores, 512 threads) 30,800 SAP SD users, 2 x 4.13 GHz SPARC M7, 1 TB memory, Oracle Database 12c, Oracle Solaris 11, Cert# 2015050
HPE Integrity Superdome X (16 processors, 288 cores, 576 threads) 100,000 SAP SD users, 16 x 2.5 GHz Intel Xeon Processor E7-8890 v3, 4096 GB memory, SQL Server 2014, Windows Server 2012 R2 Datacenter Edition, Cert# 2016002

SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More information at www.sap.com/benchmark.

Wednesday Apr 13, 2016

SHA Digest Encryption: SPARC T7-2 Beats x86 E5 v4

Oracle's cryptography benchmark measures security performance on important Secure Hash Algorithm (SHA) functions. Oracle's SPARC M7 processor with its security software in silicon is faster than current and recent x86 servers. In this test, the performance of on-processor digest operations is measured for three sizes of plaintext inputs (64, 1024 and 8192 bytes) using three SHA2 digests (SHA512, SHA384, SHA256) and the older, weaker SHA1 digest. Multiple parallel threads are used to measure each processor's maximum throughput. Oracle's SPARC T7-2 server shows dramatically faster digest computation compared to current x86 two processor servers.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 10 times faster computing multiple parallel SHA512 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon Processor E5-2699 v4 running Oracle Linux 7.2.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 10 times faster computing multiple parallel SHA256 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon Processor E5-2699 v4 running Oracle Linux 7.2.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 3.6 times faster computing multiple parallel SHA1 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon Processor E5-2699 v4 running Oracle Linux 7.2.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 17 times faster computing multiple parallel SHA512 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon Processor E5-2699 v3 running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 14 times faster computing multiple parallel SHA256 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon Processor E5-2699 v3 running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 4.8 times faster computing multiple parallel SHA1 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon Processor E5-2699 v3 running Oracle Linux 6.5.

  • SHA1 and SHA2 operations are an integral part of Oracle Solaris, while on Linux they are performed using the add-on Cryptography for Intel Integrated Performance Primitives for Linux (library).

Oracle has also measured AES (CFB, GCM, CCM, CBC) cryptographic performance on the SPARC M7 processor.

Performance Landscape

Presented below are results for computing SHA1, SHA256, SHA384 and SHA512 digests for input plaintext sizes of 64, 1024 and 8192 bytes. Results are presented as MB/sec (10**6). All SPARC M7 processor results were run as part of this benchmark effort. All other results were run during previous benchmark efforts.

Digest Performance – SHA512

Performance is presented for SHA512 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors                                    Performance (MB/sec)
                                              64 B input  1024 B input  8192 B input
2 x SPARC M7, 4.13 GHz                            39,201       167,072       184,944
2 x SPARC T5, 3.6 GHz                             18,717        73,810        78,997
2 x Intel Xeon Processor E5-2699 v4, 2.2 GHz       6,973        15,412        17,616
2 x Intel Xeon Processor E5-2699 v3, 2.3 GHz       3,949         9,214        10,681
2 x Intel Xeon Processor E5-2697 v2, 2.7 GHz       2,681         6,631         7,701

Digest Performance – SHA384

Performance is presented for SHA384 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors                                    Performance (MB/sec)
                                              64 B input  1024 B input  8192 B input
2 x SPARC M7, 4.13 GHz                            39,697       166,898       185,194
2 x SPARC T5, 3.6 GHz                             18,814        73,770        78,997
2 x Intel Xeon Processor E5-2699 v4, 2.2 GHz       6,909        15,353        17,618
2 x Intel Xeon Processor E5-2699 v3, 2.3 GHz       4,061         9,263        10,678
2 x Intel Xeon Processor E5-2697 v2, 2.7 GHz       2,774         6,669         7,706

Digest Performance – SHA256

Performance is presented for SHA256 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors                                    Performance (MB/sec)
                                              64 B input  1024 B input  8192 B input
2 x SPARC M7, 4.13 GHz                            45,148       113,648       119,929
2 x SPARC T5, 3.6 GHz                             21,140        49,483        51,114
2 x Intel Xeon Processor E5-2699 v4, 2.2 GHz       5,103        11,174        12,037
2 x Intel Xeon Processor E5-2699 v3, 2.3 GHz       3,446         7,785         8,463
2 x Intel Xeon Processor E5-2697 v2, 2.7 GHz       2,404         5,570         6,037

Digest Performance – SHA1

Performance is presented for SHA1 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors                                    Performance (MB/sec)
                                              64 B input  1024 B input  8192 B input
2 x SPARC M7, 4.13 GHz                            47,640        92,515        97,545
2 x SPARC T5, 3.6 GHz                             21,052        40,107        41,584
2 x Intel Xeon Processor E5-2699 v4, 2.2 GHz       8,566        23,901        26,752
2 x Intel Xeon Processor E5-2699 v3, 2.3 GHz       6,677        18,165        20,405
2 x Intel Xeon Processor E5-2697 v2, 2.7 GHz       4,649        13,245        14,842

Configuration Summary

SPARC T7-2 server
2 x SPARC M7 processor, 4.13 GHz
1 TB memory
Oracle Solaris 11.3

SPARC T5-2 server
2 x SPARC T5 processor, 3.60 GHz
512 GB memory
Oracle Solaris 11.2

Oracle Server X6-2L system
2 x Intel Xeon Processor E5-2699 v4, 2.20 GHz
256 GB memory
Oracle Linux 7.2
Intel Integrated Performance Primitives for Linux, Version 9.0 (Update 2) 17 Feb 2016

Oracle Server X5-2 system
2 x Intel Xeon Processor E5-2699 v3, 2.30 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Sun Server X4-2 system
2 x Intel Xeon Processor E5-2697 v2, 2.70 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level digest computation, in-cache and on-chip, using various digests, including SHA1 and SHA2 (SHA256, SHA384, SHA512).

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various digests. They were run using optimized libraries for each platform to obtain the best possible performance. The encryption tests were run with pseudo-random data of sizes 64 bytes, 1024 bytes and 8192 bytes. The benchmark tests were designed to run out of cache, so memory bandwidth and latency are not the limitations.
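The measurement idea (repeatedly hashing one in-cache buffer and reporting bytes hashed per second) can be sketched in a few lines. This is a minimal single-threaded sketch using Python's standard hashlib rather than the optimized, multithreaded vendor libraries benchmarked above, so absolute numbers will differ by orders of magnitude from the tables:

```python
import hashlib
import os
import time

def digest_throughput(name="sha512", size=8192, iterations=10_000):
    """Return MB/sec (10**6 bytes) for repeated digests of one small buffer.

    The buffer is reused every iteration so it stays in cache, mirroring
    the in-cache design of the benchmark described above.
    """
    buf = os.urandom(size)                  # pseudo-random input, same data each run
    ctor = getattr(hashlib, name)
    start = time.perf_counter()
    for _ in range(iterations):
        ctor(buf).digest()
    elapsed = time.perf_counter() - start
    return size * iterations / elapsed / 1e6

for algo in ("sha1", "sha256", "sha384", "sha512"):
    print(f"{algo:7s} 8192 B input: {digest_throughput(algo):9.1f} MB/sec")
```

A full benchmark would run many such loops in parallel threads or processes to find each processor's maximum aggregate throughput.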

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 4/13/2016.

AES Encryption: SPARC T7-2 Beats x86 E5 v4

Oracle's cryptography benchmark measures security performance on important AES security modes. Oracle's SPARC M7 processor with its software in silicon security is faster than x86 servers that have the AES-NI instructions. In this test, the performance of on-processor encryption operations is measured (32 KB encryptions). Multiple threads are used to measure each processor's maximum throughput. Oracle's SPARC T7-2 server shows dramatically faster encryption compared to current x86 two processor servers.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 3.3 times faster executing AES-CFB 256-bit key encryption (in cache) than the Intel Xeon Processor E5-2699 v4 (with AES-NI) running Oracle Linux 7.2.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 3.1 times faster executing AES-CFB 128-bit key encryption (in cache) than the Intel Xeon Processor E5-2699 v4 (with AES-NI) running Oracle Linux 7.2.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 4.0 times faster executing AES-CFB 256-bit key encryption (in cache) than Intel Xeon Processor E5-2699 v3 (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 3.7 times faster executing AES-CFB 128-bit key encryption (in cache) than Intel Xeon Processor E5-2699 v3 (with AES-NI) running Oracle Linux 6.5.

  • AES-CFB encryption is used by Oracle Database for Transparent Data Encryption (TDE) which provides security for database storage.

Oracle has also measured SHA digest performance on the SPARC M7 processor.

Performance Landscape

Presented below are results for running encryption using the AES cipher with the CFB, CBC, GCM and CCM modes for key sizes of 128, 192 and 256. Decryption performance was similar and is not presented. Results are presented as MB/sec (10**6). All SPARC M7 processor results were run as part of this benchmark effort. All other results were run during previous benchmark efforts.

Encryption Performance – AES-CFB (used by Oracle Database)

Performance is presented for in-cache AES-CFB128 mode encryption. Key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-CFB Microbenchmark Performance (MB/sec)
Processor         GHz   Chips  MB/sec   Software Environment
AES-256-CFB
SPARC M7          4.13  2      126,948  Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2       53,794  Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v4  2.20  2       39,034  Oracle Linux 7.2, IPP/AES-NI
Intel E5-2699 v3  2.30  2       31,924  Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2  2.70  2       19,964  Oracle Linux 6.5, IPP/AES-NI
AES-192-CFB
SPARC M7          4.13  2      144,299  Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2       60,736  Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v4  2.20  2       45,351  Oracle Linux 7.2, IPP/AES-NI
Intel E5-2699 v3  2.30  2       37,157  Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2  2.70  2       23,218  Oracle Linux 6.5, IPP/AES-NI
AES-128-CFB
SPARC M7          4.13  2      166,324  Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2       68,691  Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v4  2.20  2       54,179  Oracle Linux 7.2, IPP/AES-NI
Intel E5-2699 v3  2.30  2       44,388  Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2  2.70  2       27,755  Oracle Linux 6.5, IPP/AES-NI

Encryption Performance – AES-CBC

Performance is presented for in-cache AES-CBC mode encryption. Key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-CBC Microbenchmark Performance (MB/sec)
Processor         GHz   Chips  MB/sec   Software Environment
AES-256-CBC
SPARC M7          4.13  2      134,278  Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2       56,788  Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v4  2.20  2       38,943  Oracle Linux 7.2, IPP/AES-NI
Intel E5-2699 v3  2.30  2       31,894  Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2  2.70  2       19,961  Oracle Linux 6.5, IPP/AES-NI
AES-192-CBC
SPARC M7          4.13  2      152,961  Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2       63,937  Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v4  2.20  2       45,285  Oracle Linux 7.2, IPP/AES-NI
Intel E5-2699 v3  2.30  2       37,021  Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2  2.70  2       23,224  Oracle Linux 6.5, IPP/AES-NI
AES-128-CBC
SPARC M7          4.13  2      175,151  Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2       72,870  Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v4  2.20  2       54,076  Oracle Linux 7.2, IPP/AES-NI
Intel E5-2699 v3  2.30  2       44,103  Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2  2.70  2       27,730  Oracle Linux 6.5, IPP/AES-NI

Encryption Performance – AES-GCM (used by ZFS Filesystem)

Performance is presented for in-cache AES-GCM mode encryption with authentication. Key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-GCM Microbenchmark Performance (MB/sec)
Processor         GHz   Chips  MB/sec   Software Environment
AES-256-GCM
SPARC M7          4.13  2      74,221   Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2      34,022   Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2  2.70  2      15,338   Oracle Solaris 11.1, libsoftcrypto + libumem
AES-192-GCM
SPARC M7          4.13  2      81,448   Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2      36,820   Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2  2.70  2      15,768   Oracle Solaris 11.1, libsoftcrypto + libumem
AES-128-GCM
SPARC M7          4.13  2      86,223   Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2      38,845   Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2  2.70  2      16,405   Oracle Solaris 11.1, libsoftcrypto + libumem

Encryption Performance – AES-CCM (alternative used by ZFS Filesystem)

Performance is presented for in-cache AES-CCM mode encryption with authentication. Key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-CCM Microbenchmark Performance (MB/sec)
Processor         GHz   Chips  MB/sec   Software Environment
AES-256-CCM
SPARC M7          4.13  2      67,669   Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2      28,909   Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2  2.70  2      19,447   Oracle Linux 6.5, IPP/AES-NI
AES-192-CCM
SPARC M7          4.13  2      77,711   Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2      33,116   Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2  2.70  2      22,634   Oracle Linux 6.5, IPP/AES-NI
AES-128-CCM
SPARC M7          4.13  2      90,729   Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5          3.60  2      38,529   Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2  2.70  2      26,951   Oracle Linux 6.5, IPP/AES-NI

Configuration Summary

SPARC T7-2 server
2 x SPARC M7 processor, 4.13 GHz
1 TB memory
Oracle Solaris 11.3

SPARC T5-2 server
2 x SPARC T5 processor, 3.60 GHz
512 GB memory
Oracle Solaris 11.2

Oracle Server X6-2L system
2 x Intel Xeon Processor E5-2699 v4, 2.20 GHz
256 GB memory
Oracle Linux 7.2
Intel Integrated Performance Primitives for Linux, Version 9.0 (Update 2) 17 Feb 2016

Oracle Server X5-2 system
2 x Intel Xeon Processor E5-2699 v3, 2.30 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Sun Server X4-2 system
2 x Intel Xeon Processor E5-2697 v2, 2.70 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-cache and on-chip using various ciphers, including AES-128-CFB, AES-192-CFB, AES-256-CFB, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CCM, AES-192-CCM, AES-256-CCM, AES-128-GCM, AES-192-GCM and AES-256-GCM.

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various ciphers. They were run using optimized libraries for each platform to obtain the best possible performance. The encryption tests were run with pseudo-random data of size 32 KB. The benchmark tests were designed to run out of cache, so memory bandwidth and latency are not the limitations.
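A point worth making concrete: CFB mode turns a block cipher's forward (encrypt) function into a stream cipher, and that same forward function is used for both encryption and decryption. The sketch below shows the CFB chaining structure only; since Python's standard library has no AES, it substitutes a hypothetical keyed function built from HMAC-SHA256 where real CFB (such as the AES-CFB128 benchmarked above) would call the AES forward function:

```python
import hashlib
import hmac
import os

BLOCK = 16  # AES block size in bytes

def keystream_block(key: bytes, prev: bytes) -> bytes:
    # Stand-in for the block cipher's forward function E_K(prev).
    # NOT real AES; illustrative only.
    return hmac.new(key, prev, hashlib.sha256).digest()[:BLOCK]

def cfb_xcrypt(key: bytes, iv: bytes, data: bytes, decrypt: bool = False) -> bytes:
    """CFB chaining: C_i = P_i XOR E_K(C_{i-1}), with C_0 = IV."""
    out = bytearray()
    prev = iv
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        ks = keystream_block(key, prev)
        res = bytes(a ^ b for a, b in zip(chunk, ks))
        # The feedback register always takes the *ciphertext* block,
        # which is `res` when encrypting and `chunk` when decrypting.
        prev = chunk if decrypt else res
        out += res
    return bytes(out)

key, iv = os.urandom(32), os.urandom(BLOCK)
msg = os.urandom(32 * 1024)  # 32 KB, matching the benchmark's input size
ct = cfb_xcrypt(key, iv, msg)
assert cfb_xcrypt(key, iv, ct, decrypt=True) == msg  # round-trips
```

Because decryption also only needs the forward function, hardware that accelerates the encrypt direction (the SPARC M7 crypto instructions, or AES-NI) speeds up both directions of CFB.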

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 4/13/2016.

Thursday Mar 24, 2016

PeopleSoft Human Capital Management 9.1 FP2: SPARC M7-8 Results Using Oracle Advanced Security Transparent Data Encryption

Using Oracle Advanced Security Transparent Data Encryption (TDE), Oracle's SPARC M7-8 server, with the SPARC M7 processor's software-in-silicon cryptography instructions, produced results on Oracle's PeopleSoft Human Capital Management 9.1 FP2 benchmark that were nearly identical to results run without TDE (clear-text runs). The benchmark consists of three different components: PeopleSoft HR Self-Service Online, PeopleSoft Payroll Batch, and the combined PeopleSoft HR Self-Service Online and PeopleSoft Payroll Batch. The benchmarks were run on a virtualized two-chip, 1 TB LDom of the SPARC M7-8 server.

Using TDE enforces data-at-rest encryption in the database layer. Applications and users authenticated to the database continue to have access to application data transparently (no application code or configuration changes are required), while attacks from OS users attempting to read sensitive data from tablespace files and attacks from thieves attempting to read information from acquired disks or backups are denied access to the clear-text data.

  • The PeopleSoft HR online-only and the PeopleSoft HR online combined with PeopleSoft Payroll batch showed similar Search/Save average response times using TDE compared to the corresponding clear-text runs.

  • The PeopleSoft Payroll batch-only run showed only around 4% degradation in batch throughput using TDE compared to the clear-text run.

  • The PeopleSoft HR online combined with PeopleSoft Payroll batch run showed less than 5% degradation in batch throughput (payments per hour) using TDE compared to the clear-text result.

  • On the combined benchmark, the virtualized two-chip LDom of the SPARC M7-8 server with TDE demonstrated around 5 times better Search and around 8 times better Save average response times running nearly double the number of online users for the online component compared to the ten-chip x86 clear-text database solution from Cisco.

  • On the PeopleSoft Payroll batch run and using only a single chip in the virtualized two-chip LDom on the SPARC M7-8 server, the TDE solution demonstrated 1.7 times better batch throughput compared to a four-chip Cisco UCS B460 M4 server with clear-text database.

  • On the PeopleSoft Payroll batch run and using only a single chip in the virtualized two-chip LDom on the SPARC M7-8 server, the TDE solution demonstrated around 2.3 times better batch throughput compared to a nine-chip IBM zEnterprise z196 server (EC 2817-709, 9-way, 8943 MIPS) with clear-text database.

  • On the combined benchmark, the two SPARC M7 processor LDom (in SPARC M7-8) can run the same number of online users with TDE as a dynamic domain (PDom) of eight SPARC M6 processors (in SPARC M6-32) with clear-text database with better online response times, batch elapsed times and batch throughput.

Performance Landscape

All results presented are taken from Oracle's PeopleSoft benchmark white papers.

The first table presents the combined results, running both the PeopleSoft HR Self-Service Online and Payroll Batch tests concurrently.

PeopleSoft HR Self-Service Online And Payroll Batch Using Oracle Database 11g
System                                     Processors                          Chips Used  Users   Search/Save        Batch Elapsed Time  Batch Pay/Hr
SPARC M7-8 (Secure with TDE)               SPARC M7                            2           35,000  0.55 sec/0.34 sec  23.72 min           1,265,969
SPARC M7-8 (Unsecure)                      SPARC M7                            2           35,000  0.67 sec/0.42 sec  22.71 min           1,322,272
SPARC M6-32 (Unsecure)                     SPARC M6                            8           35,000  1.80 sec/1.12 sec  29.2 min            1,029,440
Cisco 1 x B460 M4, 3 x B200 M3 (Unsecure)  Intel E7-4890 v2, Intel E5-2697 v2  10          18,000  2.70 sec/2.60 sec  21.70 min           1,383,816

The following results are from running only the PeopleSoft HR Self-Service Online test.

PeopleSoft HR Self-Service Online Using Oracle Database 11g
System                                     Processors                          Chips Used  Users   Search/Save Avg Response Times
SPARC M7-8 (Secure with TDE)               SPARC M7                            2           40,000  0.52 sec/0.31 sec
SPARC M7-8 (Unsecure)                      SPARC M7                            2           40,000  0.55 sec/0.33 sec
SPARC M6-32 (Unsecure)                     SPARC M6                            8           40,000  2.73 sec/1.33 sec
Cisco 1 x B460 M4, 3 x B200 M3 (Unsecure)  Intel E7-4890 v2, Intel E5-2697 v2  10          20,000  0.35 sec/0.17 sec

The following results are from running only the PeopleSoft Payroll Batch test. For the SPARC M7-8 server results, only one of the LDom's processors was used; processor sets were used to restrict the test to a single SPARC M7 processor.

PeopleSoft Payroll Batch Using Oracle Database 11g
System                        Processors                        Chips Used  Batch Elapsed Time  Batch Pay/Hr
SPARC M7-8 (Secure with TDE)  SPARC M7                          1           13.34 min           2,251,034
SPARC M7-8 (Unsecure)         SPARC M7                          1           12.85 min           2,336,872
SPARC M6-32 (Unsecure)        SPARC M6                          2           18.27 min           1,643,612
Cisco UCS B460 M4 (Unsecure)  Intel E7-4890 v2                  4           23.02 min           1,304,655
IBM z196 (Unsecure)           zEnterprise (5.2 GHz, 8943 MIPS)  9           30.50 min           984,551

Configuration Summary

System Under Test:

SPARC M7-8 server with
8 x SPARC M7 processor (4.13 GHz)
4 TB memory
Virtualized as an Oracle VM Server for SPARC (LDom) with
2 x SPARC M7 processor (4.13 GHz)
1 TB memory

Storage Configuration:

2 x Oracle ZFS Storage ZS3-2 appliance (DB Data) each with
40 x 300 GB 10K RPM SAS-2 HDD,
8 x Write Flash Accelerator SSD and
2 x Read Flash Accelerator SSD 1.6TB SAS
2 x Oracle Server X5-2L as COMSTAR nodes (DB redo logs & App object cache) each with
2 x Intel Xeon Processor E5-2630 v3
32 GB memory
4 x 1.6 TB NVMe SSD

Software Configuration:

Oracle Solaris 11.3
Oracle Database 11g Release 2 (11.2.0.3.0)
PeopleSoft Human Capital Management 9.1 FP2
PeopleSoft PeopleTools 8.52.03
Oracle Java SE 6u32
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 043
Oracle WebLogic Server 11g (10.3.5)

Benchmark Description

The PeopleSoft Human Capital Management benchmark simulates thousands of online employees, managers and Human Resource administrators executing transactions typical of a Human Resources Self Service application for the Enterprise. Typical transactions are: viewing paychecks, promoting and hiring employees, updating employee profiles, etc. The database tier uses a database instance of about 500 GB in size, containing information for 500,480 employees. The application tier for this test includes web and application server instances, specifically Oracle WebLogic Server 11g, PeopleSoft Human Capital Management 9.1 FP2 and Oracle Java SE 6u32.

Key Points and Best Practices

In the combined HR online and Payroll batch run, the LDom had one Oracle Solaris Zone of 7 cores containing the Web tier, two Oracle Solaris Zones of 16 cores each containing the Application tier, and one Oracle Solaris Zone of 23 cores containing the Database tier; two cores were dedicated to network and disk interrupt handling. In the HR online-only run, the LDom had one Zone of 12 cores for the Web tier, two Zones of 18 cores each for the Application tier, and one Zone of 14 cores for the Database tier; two cores were dedicated to network and disk interrupt handling. In the Payroll batch-only run, the LDom had one Zone of 31 cores containing the Database tier; one core was dedicated to disk interrupt handling.

All database data files, recovery files and Oracle Clusterware files for the PeopleSoft test were created with the Oracle Automatic Storage Management (Oracle ASM) volume manager for the ease of management provided by its integrated storage management.

In the application tier on the LDom, 5 PeopleSoft application domains with 350 application servers (70 per domain) were hosted in two separate Oracle Solaris Zones for a total of 10 domains with 700 application server processes.

All PeopleSoft Application processes and the 32 Web Server JVM instances were executed in the Oracle Solaris FX scheduler class.

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 24, 2016.

Yahoo Cloud Serving Benchmark: SPARC T7-4 with Flash Storage and Oracle NoSQL Beats x86 E5 v3 Per Chip

Oracle's SPARC T7-4 server delivered 1.8 million ops/sec on 1.2 billion records for the Yahoo Cloud Serving Benchmark (YCSB) 95% read/5% update workload. Oracle NoSQL Database was used in these tests. NoSQL is important for Big Data Analysis and for Cloud Computing.

  • In the run comparing the performance of a single SPARC M7 processor to one Intel Xeon Processor E5-2699 v3 for the YCSB 95% read/5% update workload, the SPARC M7 processor was 2.6 times better per chip than the x86 processor and on a per core basis, 1.4 times better than the x86 processor.

  • The SPARC T7-4 server showed low average latency of 0.86 msec on read and 5.37 msec on write while achieving 1.8 million ops/sec.

  • The SPARC T7-4 server delivered 313K inserts/sec on 1.2 billion records with a low average latency of 2.75 msec.

  • One processor performance on the SPARC T7-4 server was over half a million (519K ops/sec) on 300 million records for the YCSB 95% read/5% update workload.

  • The SPARC T7-4 server scaling from 1 to 4 processors was 3.5x while maintaining low latency.

These results show the SPARC T7-4 server can handle a large database while achieving high throughput with low latency for cloud computing.

Performance Landscape

This table presents single chip results comparing the SPARC M7 processor (in a SPARC T7-4 server) to the Intel Xeon Processor E5-2699 v3 (in a 2-socket x86 server). All of the following results were run as part of this benchmark effort.

Comparing Single Chip Performance on YCSB Benchmark
             Insert                           Mixed Load (95% Read/5% Update)
Processor    Throughput  Avg Write Latency    Throughput  Avg Read Latency  Avg Write Latency
             (ops/sec)   (msec)               (ops/sec)   (msec)            (msec)
SPARC M7     89,177      2.42                 519,352     0.82              3.61
E5-2699 v3   55,636      1.18                 202,701     0.71              2.30
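The headline per-chip and per-core ratios follow directly from the single-chip mixed-load throughputs; the core counts used here (32 per SPARC M7 chip, 18 per E5-2699 v3 chip) are the processors' public specifications:

```python
m7_ops, x86_ops = 519_352, 202_701  # mixed-load throughput, single chip

per_chip = m7_ops / x86_ops                # ~2.56, quoted as "2.6 times"
per_core = (m7_ops / 32) / (x86_ops / 18)  # ~1.44, quoted as "1.4 times"
scaling = 1_814_911 / 519_352              # ~3.5x from 1 to 4 processors
```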

The following table shows the performance of the Yahoo Cloud Serving Benchmark at multiple processor counts on the SPARC T7-4 server.

SPARC T7-4 server running YCSB benchmark
              Insert                           Mixed Load (95% Read/5% Update)
CPUs  Shards  Throughput  Avg Write Latency    Throughput  Avg Read Latency  Avg Write Latency
              (ops/sec)   (msec)               (ops/sec)   (msec)            (msec)
4     16      313,044     2.75                 1,814,911   0.86              5.37
3     12      245,145     2.63                 1,412,424   0.82              5.49
2     8       169,720     2.54                 974,243     0.82              4.76
1     4       89,177      2.42                 519,352     0.82              3.61

Configuration Summary

SPARC System:

SPARC T7-4 server
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB)
8 x Oracle Flash Accelerator F160 PCIe card
8 x Sun Dual Port 10 GbE PCIe 2.0 Low Profile Adapter, Base-T

x86 System:

Oracle Server X5-2L server
2 x Intel Xeon Processor E5-2699 v3 (2.3 GHz)
384 GB memory
1 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Emulex
1 x Sun Dual 10GbE SFP+ PCIe 2.0 Low Profile Adapter

External Storage: COMSTAR (Common Multiprotocol SCSI TARget)
2 x Sun Server X3-2L servers configured as COMSTAR nodes, each with
2 x Intel Xeon Processor E5-2609 (2.4 GHz)
4 x Sun Flash Accelerator F40 PCIe Cards, 400 GB each
1 x 8 Gb dual port HBA

Software Configuration:

Oracle Solaris 11.3 (11.3.1.2.0)
Logical Domains Manager v3.3.0.0.17 (running on the SPARC T7-4)
Oracle NoSQL Database, Enterprise Edition 12c R1.3.3.4
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)

Benchmark Description

The Yahoo Cloud Serving Benchmark (YCSB) is a performance benchmark for cloud database serving systems. The benchmark documentation says:

With the many new serving databases available including Sherpa, BigTable, Azure and many more, it can be difficult to decide which system is right for your application, partially because the features differ between systems, and partially because there is not an easy way to compare the performance of one system versus another. The goal of the Yahoo Cloud Serving Benchmark (YCSB) project is to develop a framework and common set of workloads for evaluating the performance of different "key-value" and "cloud" serving stores.
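The 95% read/5% update mix used in these runs is YCSB's read-mostly mix (Workload B in the core workloads). A toy, self-contained sketch of that operation mix against an in-memory dict, which stands in for the real Oracle NoSQL store (all names here are illustrative, not YCSB's API):

```python
import random

def run_mix(store, keys, ops=100_000, read_frac=0.95, seed=7):
    """Issue a read-mostly mix of operations against a key-value store."""
    rng = random.Random(seed)
    reads = updates = 0
    for _ in range(ops):
        k = rng.choice(keys)
        if rng.random() < read_frac:
            _ = store[k]                    # read an existing record
            reads += 1
        else:
            store[k] = rng.getrandbits(64)  # update an existing record
            updates += 1
    return reads, updates

store = {i: 0 for i in range(10_000)}  # stand-in for the loaded NoSQL store
reads, updates = run_mix(store, list(store))
```

The real benchmark drives this mix from many client threads (here, eight Sun Server X4-2/X4-2L client systems) and reports aggregate throughput and per-operation latency.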

Key Points and Best Practices

  • The SPARC T7-4 server showed 3.5x scaling from 1 to 4 sockets while maintaining low latency.

  • Two Oracle VM Server for SPARC logical domains (LDoms) were created per processor, for a total of seven guest LDoms plus the primary domain. Each domain was configured with 240 GB memory accessing two PCIe IO slots using the Direct IO feature.

  • The eight Oracle Flash Accelerator F160 PCIe cards demonstrated excellent IO capability, delivering 812K read IOPS (over 100K IOPS per card) during the 1.8 million ops/sec benchmark run.

  • Balanced memory bandwidth was delivered across all four processors achieving an average total of 254 GB/sec during the 1.8 million ops/sec run.

  • The 1.2 billion records were loaded into 16 Shards with the replication factor set to 3.

  • Each LDom hosted two Storage Nodes, so two processor sets were created per LDom, one for each Storage Node. The default processor set additionally handled OS and IO interrupts. The processor sets provided isolation and ensured a balanced load.

  • Fixed priority class was assigned to Oracle NoSQL Storage Node java processes.

  • The ZFS record size was set to 16K (default 128K) and this worked best for the 95% read/5% update workload.

  • A total of eight Sun Server X4-2 and Sun Server X4-2L systems were used as clients for generating the workload.

  • The LDoms and client systems were connected through a 10 GbE network.

  • Oracle Server X5-2L system configuration was as follows:
       1 chip (the other chip disabled by psradm)
       2 x Sun Server X3-2L system COMSTAR nodes (total 8 x Sun Flash Accelerator F40 PCIe Cards)
       1 x Sun Server X4-2 system as client connected through a 10 GbE network
       A processor set for NoSQL processes and the default processor set for OS and IO interrupts
       Fixed priority class for NoSQL Storage Node java processes
       ZFS 16K record size
       1 Shard (100M records)
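Several of the Solaris-level tunings above (processor sets for isolation, the fixed-priority scheduling class, and the 16K ZFS record size) can be sketched as follows; CPU ranges, pset IDs, dataset names, and PIDs are illustrative placeholders:

```shell
# Create a processor set for one Storage Node (CPU IDs are placeholders);
# psrset prints the ID of the newly created set, e.g. 1
psrset -c 8-63

# Bind the Storage Node's java process (PID is a placeholder) to that set
psrset -b 1 12345

# Move the same process into the fixed-priority (FX) scheduling class
priocntl -s -c FX -i pid 12345

# Use a 16K ZFS record size (default is 128K) for the 95% read / 5% update data
zfs set recordsize=16k tank/kvstore
```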

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 24, 2016.

Tuesday Mar 22, 2016

Siebel PSPP: SPARC T7-2 World Record Result, Beats IBM

Oracle set a new world record for the Siebel Platform Sizing and Performance Program (PSPP) benchmark using Oracle's SPARC T7-2 server for the application server with Oracle's Siebel CRM 8.1.1.4 Industry Applications and Oracle Database 12c running on Oracle Solaris.

  • The SPARC T7-2 server running the application tier achieved 55,000 users with sub-second response time and with throughput of 457,909 business transactions per hour on the Siebel PSPP benchmark.

  • The SPARC T7-2 server in the application tier delivered 3.3 times more users on a per chip basis compared to published IBM POWER8 based server results.

  • For the new Oracle results, eight cores of a SPARC T7-2 server were used for the database tier running at 32% utilization (as measured by mpstat). The IBM result used 6 cores at about 75% utilization for the database/gateway tier.

  • The SPARC T7-2 server in the application tier delivered nearly the same number of users on a per core basis compared to published IBM POWER8 based server results.

  • The SPARC T7-2 server in the application tier delivered nearly 2.8 times more users on a per chip basis compared to earlier published SPARC T5-2 server results.

  • The SPARC T7-2 server in the application tier delivered nearly 1.4 times more users on a per core basis compared to earlier published SPARC T5-2 server results.

  • The SPARC T7-2 server used Oracle Solaris Zones which provide flexible, scalable and manageable virtualization to scale the application within and across multiple nodes.
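As a minimal sketch (zone name and path are hypothetical), an Oracle Solaris Zone like those used to scale the application tier can be created with:

```shell
# Define, install, and boot a zone for one Siebel application server instance
zonecfg -z siebelzone1 <<'EOF'
create
set zonepath=/zones/siebelzone1
EOF

zoneadm -z siebelzone1 install
zoneadm -z siebelzone1 boot
zlogin siebelzone1     # log in to configure the Siebel server inside the zone
```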

The Siebel 8.1.1.4 PSPP workload includes Siebel Call Center and Order Management System.

Performance Landscape

Application Server                        TPH      Users   Users/  Users/  Response Times
                                                           Chip    Core    Call Ctr   Order Mgmt
1 x SPARC T7-2
(2 x SPARC M7 @4.13 GHz)                  457,909  55,000  27,500    859   0.045 sec  0.257 sec
3 x IBM S824
(each 2 x 8 active core LPARs,
POWER8 @4.1 GHz)                          418,976  50,000   8,333  1,041   0.031 sec  0.175 sec
2 x SPARC T5-2
(each with 2 x SPARC T5 @3.6 GHz)         333,339  40,000  10,000    625   0.110 sec  0.608 sec

TPH – Business transactions throughput per hour

Configuration Summary

Application Server:

1 x SPARC T7-2 server with
2 x SPARC M7 processors, 4.13 GHz
1 TB memory
6 x 300 GB SAS internal disks
Oracle Solaris 11.3
Siebel CRM 8.1.1.4 SIA

Web/Database/Gateway Server:

1 x SPARC T7-2 server with
2 x SPARC M7 processors, 4.13 GHz (20 active cores: 8 cores for DB, 12 for Web/Gateway)
512 GB memory
6 x 300 GB SAS internal disks
2 x 1.6 TB NVMe SSD
Oracle Solaris 11.3
Siebel CRM 8.1.1.4 SIA
iPlanet Web Server 7
Oracle Database 12c (12.1.0.2)

Benchmark Description

The Siebel PSPP benchmark includes Call Center and Order Management:

  • Siebel Financial Services Call Center – Provides the most complete solution for sales and service, allowing customer service and telesales representatives to provide superior customer support, improve customer loyalty, and increase revenues through cross-selling and up-selling.

    High-level description of the use cases tested: Incoming Call Creates Opportunity, Quote and Order, and Incoming Call Creates Service Request. Three complex business transactions are executed simultaneously for a specific number of concurrent users. The ratios of these three scenarios were 30%, 40%, and 30% respectively, together accounting for 70% of all transactions simulated in this benchmark. Between one user operation and the next, think time averaged approximately 10, 13, and 35 seconds respectively.

  • Siebel Order Management – Oracle's Siebel Order Management allows employees such as salespeople and call center agents to create and manage quotes and orders through their entire life cycle. Siebel Order Management can be tightly integrated with back-office applications allowing users to perform tasks such as checking credit, confirming availability, and monitoring the fulfillment process.

    High-level description of the use cases tested: Order & Order Items Creation and Order Updates. Two complex Order Management transactions were executed simultaneously for a specific number of concurrent users, concurrently with the three Call Center scenarios above. The two scenarios ran at a 50% ratio each, together accounting for 30% of all transactions simulated in this benchmark. Between one user operation and the next, think time averaged approximately 20 and 67 seconds respectively.

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 22, 2016.

Thursday Mar 17, 2016

OLTPbenchmark Workload, Open-Source Benchmark: SPARC T7-1 Performance Beats IBM S824, Beats x86 E5-2699 v3

OLTPbenchmark is an open-source database benchmarking tool that includes an On-Line Transaction Processing (OLTP) transactional workload derived from the industry standard TPC-C workload.

Oracle's SPARC T7-1 server demonstrated OLTP performance that is 2.76 times faster per chip than the Intel Xeon Processor E5-2699 v3 and 5.47 times faster per chip than the IBM POWER8 (3.5 GHz) processor. This means that a SPARC T7-1 is 1.38 times faster than a 2-chip x86 E5 v3 based server. The SPARC T7-1 server is also 1.37 times faster than an IBM Power System S824 (POWER8) server. On per-core performance, the SPARC M7 processor used in the SPARC T7-1 server outperformed the IBM POWER8 processor. All of these tests used Oracle Database 12c Release 1 (12.1.0.2) Enterprise Edition for the database.

Comparing the SPARC T7-1 server to the 2-chip x86 E5 v3 server equipped with two 2.3 GHz Intel Xeon Processor E5-2699 v3 chips, we see the following advantages for the SPARC T7-1 server.

  • On a per chip basis, the SPARC T7-1 server demonstrated 2.76 times better performance compared to the 2-chip x86 E5 v3 server.

  • At the system level, the SPARC T7-1 server demonstrated 1.38 times better performance compared to the 2-chip x86 E5 v3 server.

Comparing the SPARC T7-1 server to an IBM Power System S824 server equipped with four 3.5 GHz POWER8 processors (6 cores each), we see the following advantages for the SPARC T7-1 server.

  • On a per chip basis, the SPARC T7-1 server demonstrated nearly 5.47 times better performance compared to an IBM Power System S824 server.

  • On a per core basis, the SPARC T7-1 server demonstrated nearly 3% better performance compared to an IBM Power System S824 server.

  • At the system level, the SPARC T7-1 server demonstrated nearly 1.37 times better performance compared to the IBM Power System S824 server.

The OLTPbenchmark transactional workload is based upon the TPC-C benchmark specification. Details of the configuration and parameters used are available in the reports referenced in the See Also section.

Performance Landscape

All OLTPbenchmark server results were run as part of this benchmark effort (except as noted). All results are run with Oracle Database 12c Release 1 Enterprise Edition. Results are ordered by TPM/core, highest to lowest.

OLTPbenchmark Transactional Workload
Relative Performance to x86 System

System                                            TPM    TPM/chip  TPM/core
SPARC T7-1
1 x SPARC M7 (32 cores/chip, 32 total)            1.38x    2.76x     1.55x
IBM Power System S824
4 x POWER8 (6 cores/chip, 24 total)               1.01x    0.50x     1.51x
Oracle Server X5-2
2 x Intel E5-2699 v3 (18 cores/chip, 36 total)    1.00x    1.00x     1.00x

TPM – OLTPbenchmark transactions per minute

Results on the IBM Power System S824 were run by Oracle engineers using Oracle Database 12c.

Configuration Summary

Systems Under Test:

SPARC T7-1 server with
1 x SPARC M7 processor (4.13 GHz)
512 GB memory
2 x 600 GB 10K RPM SAS2 HDD
1 x Sun Dual Port 10 GbE PCIe 2.0 Networking card with Intel 82599 10 GbE Controller
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
Oracle Solaris 11.3
Oracle Database 12c Release 1 (12.1.0.2) Enterprise Edition
Oracle Grid Infrastructure 12c Release 1 (12.1.0.2)

Oracle Server X5-2 with
2 x Intel Xeon processor E5-2699 v3 (2.3 GHz)
512 GB memory
2 x 600 GB 10K RPM SAS2 HDD
1 x Sun Dual Port 10 GbE PCIe 2.0 Networking card with Intel 82599 10 GbE Controller
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
Oracle Linux 6.5
Oracle Database 12c Release 1 (12.1.0.2) Enterprise Edition
Oracle Grid Infrastructure 12c Release 1 (12.1.0.2)

IBM Power System S824 with
4 x POWER8 (3.5 GHz)
512 GB memory
4 x 300 GB 15K RPM SAS HDD
1 x 10 GbE Network Interface
1 x 16 Gb Fibre Channel HBA
AIX 7.1 TL3 SP3
Oracle Database 12c Release 1 (12.1.0.2) Enterprise Edition
Oracle Grid Infrastructure 12c Release 1 (12.1.0.2)

Storage Servers:

1 x Oracle Server X5-2L with
2 x Intel Xeon Processor E5-2630 v3 (2.4 GHz)
32 GB memory
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
4 x 1.6 TB NVMe SSD
2 x 600 GB SAS HDD
Oracle Solaris 11.3

1 x Oracle Server X5-2L with
2 x Intel Xeon Processor E5-2630 v3 (2.4 GHz)
32 GB memory
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
14 x 600 GB SAS HDD
Oracle Solaris 11.3

Benchmark Description

The OLTPbenchmark workload, as described on the OLTPbenchmark website:

This is a database performance testing tool that allows you to conduct database workload replay, industry-standard benchmark testing, and scalability testing under various loads, such as scaling a population of users who execute order-entry transactions against a wholesale supplier database.
OLTPbenchmark supports many databases including Oracle, SQL Server, DB2, TimesTen, MySQL, MariaDB, PostgreSQL, Greenplum, Postgres Plus Advanced Server, Redis and Trafodion SQL on Hadoop.

Key Points and Best Practices

  • For these tests, an 800 warehouse database was created to compare directly with results posted by Intel.

  • To improve the scalability, the OrderLine table was partitioned and loaded into a separate tablespace using the OLTPbenchmark GUI. The default blocksize was 8K and the OrderLine tablespace blocksize was 16K.

  • To reduce latency of Oracle "cache buffers chains" wait events, the OLTPbenchmark kit was modified by adding partitioning to the NEW_ORDER table as well as the ORDERS_I1 and ORDERS_I2 indexes.

  • To reduce latency of Oracle "library cache: mutex X" wait events, the workarounds recommended in an Intel blog were applied.

  • Refer to the detailed configuration documents in the See Also section below for the list of Oracle parameters.
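A rough sketch of the tablespace and partitioning changes described above; the DDL below is illustrative (column list, sizes, and partition counts are assumptions, not the exact benchmark kit modifications):

```shell
sqlplus / as sysdba <<'EOF'
-- A 16K buffer cache must exist before a 16K-blocksize tablespace can be created
ALTER SYSTEM SET db_16k_cache_size=4G SCOPE=BOTH;

CREATE TABLESPACE orderline_ts
  DATAFILE '+DATA' SIZE 200G
  BLOCKSIZE 16K;

-- Hash partitioning spreads hot blocks across partitions, reducing
-- "cache buffers chains" latch contention (columns shown are illustrative)
CREATE TABLE order_line (
  ol_w_id NUMBER, ol_d_id NUMBER, ol_o_id NUMBER,
  ol_number NUMBER, ol_i_id NUMBER, ol_amount NUMBER
)
PARTITION BY HASH (ol_w_id) PARTITIONS 32
TABLESPACE orderline_ts;
EOF
```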

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 17, 2016.

OLTPbenchmark TPC-C Workload, Open-Source Benchmark: SPARC T7-1 Performance Beats IBM S824, Beats x86 E5-2699 v3

This page moved to new location.

Tuesday Mar 15, 2016

Oracle Advanced Security – Transparent Data Encryption: Secure Database on SPARC M7 Processor Performance Nearly the Same as Clear

Oracle's SPARC T7-1 server is faster and more efficient than a two-processor x86 server (Intel Xeon Processor E5-2699 v3) in processing I/O intensive database queries when running the Oracle Advanced Security Transparent Data Encryption (TDE) feature of Oracle Database 12c.

  • The single-processor SPARC T7-1 server is up to 1.4 times faster than the two-processor x86 system for all queries tested, with and without TDE enabled. On a per chip basis, Oracle's SPARC M7 processor delivers over twice the performance of the Intel Xeon Processor E5-2699 v3 (Haswell).

  • The SPARC T7-1 server is more efficient than the two-processor x86 system for all queries tested, with and without TDE enabled, as measured by CPU utilization. For example, on Query A the CPU utilization nearly doubled on the x86 server (41% in clear to 79% with TDE), while on the same query the SPARC T7-1 server's CPU utilization rose only from 30% in clear to 38% with TDE.

In a head-to-head comparison of system performance using Oracle's Transparent Data Encryption, the SPARC T7-1 single processor system with one SPARC M7 (4.13 GHz) processor outperforms a two-processor x86 server with Intel Xeon Processor E5-2699 v3 (2.3 GHz) processors. The two systems were configured with the same storage environment, 256 GB of memory, the same version of Oracle Database 12c, and with the same high-level of tunings. All tests run with TDE security used the hardware instructions available on the processors (SPARC or x86).

Performance Landscape

In the first table below, results are presented for three different queries and a full table scan. The results labeled "clear" were executed in clear text or without Transparent Data Encryption. The results labeled "TDE" are with AES-128 encryption enabled for all of the data tables used in the tablespace with the default parameter of db_block_checking=false.
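A minimal sketch of enabling AES-128 tablespace encryption as described; names, the wallet password, and sizes are placeholders, and a keystore must already be configured in sqlnet.ora:

```shell
sqlplus / as sysdba <<'EOF'
-- Open the software keystore (password is a placeholder)
ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY "wallet_pwd";

-- Create an AES-128 encrypted tablespace for the secure copy of the tables
CREATE TABLESPACE tde_ts
  DATAFILE '+DATA' SIZE 1T
  ENCRYPTION USING 'AES128'
  DEFAULT STORAGE (ENCRYPT);
EOF
```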

Query Times (seconds – smaller is better)

System          Security             Query A  Query B  Query C  Full Table Scan
SPARC T7-1      clear                   64.0     61.0     54.8     52.7
                TDE                     65.3     62.8     56.3     53.4
                TDE to clear ratio     1.02x    1.03x    1.03x    1.01x
Two x86 E5 v3   clear                   69.6     68.6     61.7     54.5
                TDE                     89.4     88.7     73.5     58.1
                TDE to clear ratio      1.3x     1.3x     1.2x     1.1x

Comparing SPARC and x86 on Query Times
SPARC advantage – clear                1.09x    1.12x    1.13x    1.03x
SPARC advantage – TDE                  1.37x    1.41x    1.31x    1.09x

From the table above, the average increase in the query's execution time for the SPARC T7-1 server with TDE enabled is about 2%. The average slow down for the x86 server is about 20%.

Looking into the utilization of the individual processor cores reveals that the single-processor SPARC T7-1 server, with 32 cores, has an average core utilization of 36% with TDE enabled. The SPARC T7-1 server still has plenty of cycles and additional processing capability to handle other work. The two-processor x86 E5 v3 server, with a total of 36 cores, shows an average core utilization of over 79% with TDE enabled. This means there is little to no room in the processors for handling additional work beyond executing just one of these queries individually without affecting the query's execution time and resources. These results are in the table below.

Average Core Utilization (smaller is better)

System          Security             Query A  Query B  Query C  Full Table Scan
SPARC T7-1      clear                    30%      32%      27%      21%
                TDE                      38%      40%      36%      31%
                TDE to clear ratio      1.3x     1.3x     1.3x     1.5x
Two x86 E5 v3   clear                    41%      40%      38%      41%
                TDE                      79%      73%      80%      86%
                TDE to clear ratio      1.9x     1.8x     2.1x     2.1x

Comparing SPARC and x86 on Utilization
SPARC advantage – clear                1.37x    1.25x    1.41x    1.95x
SPARC advantage – TDE                  2.08x    1.83x    2.22x    2.77x

Configuration Summary

SPARC Configuration:

SPARC T7-1 server with
1 x SPARC M7 processor (4.13 GHz, 32 cores)
256 GB memory
Flash storage
Oracle Solaris 11.3
Oracle Enterprise Database 12c

x86 Configuration:

Oracle Server X5-2L system with
2 x Intel Xeon Processor E5-2699 v3 (2.3 GHz, 36 total cores)
256 GB memory
Flash storage
Oracle Solaris 11.3
Oracle Enterprise Database 12c

Note that the two systems were configured with the same storage environment, the same version of Oracle Database 12c, and with the same high-level of tunings.

Benchmark Description

The benchmark executes a set of queries on a table of approximately 1 TB in size. The database contains two copies of the table, one built with encryption and one without. The tablespaces used the same layout on the storage and the same DBMS parameters. Each query is executed individually after a restart of the database; the average of 5 executions is reported as the execution time, and other system statistics were gathered during those runs.

Description of the queries:

  • Query A: Determines how the market share of a given nation within a region has changed over two years for a given part type.
  • Query B: Identifies customers who might have a problem with parts shipped to them.
  • Query C: Determines how much average yearly revenue would be lost if orders were no longer filled for small quantities of certain parts.
  • Full Table Scan: Full table scan of the largest table, over 700 GB of data.

Key Points and Best Practices

  • For each system, the 1 TB of data is spread evenly across the flash storage in 1 MB stripes. This was determined to be the most efficient stripe size for a data warehouse environment with large sequential read operations. With each system having the same amount of memory and database software, the same tuning parameters were used on each system to ensure a fair comparison and that each query induced roughly the same amount of I/O throughput per query.

  • Efficiency was verified by looking at not only the average processor utilization (as measured by Oracle Solaris tool pgstat(1M)), but also by measuring the average processor core utilization at the hardware level.

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 14, 2016.

Thursday Nov 19, 2015

SPECvirt_2013: SPARC T7-2 World Record Performance for Two- and Four-Chip Systems

Oracle's SPARC T7-2 server delivered a world record SPECvirt_sc2013 result for systems with two to four chips.

  • The SPARC T7-2 server produced a result of 3198 @ 179 VMs SPECvirt_sc2013.

  • The two-chip SPARC T7-2 server beat the best four-chip x86 Intel E7-8890 v3 server (HP ProLiant DL580 Gen9), demonstrating that the SPARC M7 processor is 2.1 times faster than the Intel Xeon Processor E7-8890 v3 (chip-to-chip comparison).

  • The two-chip SPARC T7-2 server beat the best two-chip x86 Intel E5-2699 v3 server results by nearly 2 times (Huawei FusionServer RH2288H V3, HP ProLiant DL360 Gen9).

  • The two-chip SPARC T7-2 server delivered nearly 2.2 times the performance of the four-chip IBM Power System S824 server solution which used 3.5 GHz POWER8 six core chips.

  • The SPARC T7-2 server, running the Oracle Solaris 11.3 operating system, utilizes embedded virtualization technologies such as Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • The SPARC T7-2 server result used Oracle VM Server for SPARC 3.3 and Oracle Solaris Zones providing a flexible, scalable and manageable virtualization environment.

Performance Landscape

Complete benchmark results are at the SPEC website, SPECvirt_sc2013 Results. The following table highlights the leading two- and four-chip results for the benchmark; bigger is better.

SPECvirt_sc2013
Leading Two- to Four-Chip Results

System / Processor                      Chips  Result @ VMs  Virtualization Software
SPARC T7-2
SPARC M7 (4.13 GHz, 32core)               2    3198 @ 179    Oracle VM Server for SPARC 3.3,
                                                             Oracle Solaris Zones
HP ProLiant DL580 Gen9
Intel E7-8890 v3 (2.5 GHz, 18core)        4    3020 @ 168    Red Hat Enterprise Linux 7.1 KVM
Lenovo System x3850 X6
Intel E7-8890 v3 (2.5 GHz, 18core)        4    2655 @ 147    Red Hat Enterprise Linux 6.6 KVM
Huawei FusionServer RH2288H V3
Intel E5-2699 v3 (2.3 GHz, 18core)        2    1616 @ 95     Huawei FusionSphere V1R5C10
HP ProLiant DL360 Gen9
Intel E5-2699 v3 (2.3 GHz, 18core)        2    1614 @ 95     Red Hat Enterprise Linux 7.1 KVM
IBM Power S824
POWER8 (3.5 GHz, 6core)                   4    1370 @ 79     PowerVM Enterprise Edition 2.2.3

Configuration Summary

System Under Test Highlights:

Hardware:
1 x SPARC T7-2 server, with
2 x 4.13 GHz SPARC M7
1 TB memory
2 Sun Dual Port 10GBase-T Adapter
2 Sun Storage Dual 16 Gb Fibre Channel PCIe Universal HBA

Software:
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.3 (LDom)
Oracle Solaris Zones
Oracle iPlanet Web Server 7.0.20
Oracle PHP 5.3.29
Dovecot v2.2.18
Oracle WebLogic Server Standard Edition Release 10.3.6
Oracle Database 12c Enterprise Edition (12.1.0.2.0)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.7.0_85-b15

Storage:
3 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
Oracle Solaris 11.3

1 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
4x 400 GB SSDs
Oracle Solaris 11.3

Benchmark Description

SPECvirt_sc2013 is SPEC's updated benchmark addressing performance evaluation of datacenter servers used in virtualized server consolidation. SPECvirt_sc2013 measures the end-to-end performance of all system components including the hardware, virtualization platform, and the virtualized guest operating system and application software. It utilizes several SPEC workloads representing applications that are common targets of virtualization and server consolidation. The workloads were made to match a typical server consolidation scenario of CPU resource requirements, memory, disk I/O, and network utilization for each workload. These workloads are modified versions of SPECweb2005, SPECjAppServer2004, SPECmail2008, and SPEC CPU2006. The client-side SPECvirt_sc2013 harness controls the workloads. Scaling is achieved by running additional sets of virtual machines, called "tiles", until overall throughput reaches a peak.

Key Points and Best Practices

  • The SPARC T7-2 server, running Oracle Solaris 11.3, utilizes embedded virtualization technologies such as Oracle VM Server for SPARC and Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • In order to provide a high level of data integrity and availability, all the benchmark data sets are stored on mirrored (RAID1) storage.

  • Using Oracle VM Server for SPARC to bind the SPARC M7 processor with its local memory optimized the memory use in this virtual environment.

  • This improved result used a fractional tile to fully saturate the system.
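The domain setup sketched below shows the general Oracle VM Server for SPARC (ldm) flow behind the memory-binding point above; the domain name, core count, and memory size are illustrative, and the exact socket/memory binding used in the benchmark is not reproduced here:

```shell
# Create a guest domain sized to one SPARC M7 socket and its local memory
ldm add-domain ldg1
ldm set-core 32 ldg1       # whole-core allocation (count is illustrative)
ldm set-memory 480G ldg1
ldm bind-domain ldg1
ldm start-domain ldg1
```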

See Also

Disclosure Statement

SPEC and the benchmark name SPECvirt_sc are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/19/2015. SPARC T7-2, SPECvirt_sc2013 3198@179 VMs; HP ProLiant DL580 Gen9, SPECvirt_sc2013 3020@168 VMs; Lenovo x3850 X6; SPECvirt_sc2013 2655@147 VMs; Huawei FusionServer RH2288H V3, SPECvirt_sc2013 1616@95 VMs; HP ProLiant DL360 Gen9, SPECvirt_sc2013 1614@95 VMs; IBM Power S824, SPECvirt_sc2013 1370@79 VMs.

Friday Nov 13, 2015

SPECjbb2015: SPARC T7-1 World Record for 1 Chip Result

Updated November 30, 2015 to point to published results and add latest, best x86 two-chip result.

Oracle's SPARC T7-1 server, using Oracle Solaris and Oracle JDK, produced world record one-chip SPECjbb2015 benchmark (MultiJVM metric) results beating all previous one- and two-chip results in the process. This benchmark was designed by the industry to showcase Java performance in the Enterprise. Performance is expressed in terms of two metrics, max-jOPS which is the maximum throughput number, and critical-jOPS which is critical throughput under service level agreements (SLAs).

  • The SPARC T7-1 server achieved 120,603 SPECjbb2015-MultiJVM max-jOPS and 60,280 SPECjbb2015-MultiJVM critical-jOPS on the SPECjbb2015 benchmark.

  • The one-chip SPARC T7-1 server delivered 2.5 times more max-jOPS performance per chip than the best two-chip result which was run on the Cisco UCS C220 M4 server using Intel v3 processors. The SPARC T7-1 server also produced 4.3 times more critical-jOPS performance per chip compared to the Cisco UCS C220 M4. The Cisco result enabled the COD BIOS option.

  • The SPARC T7-1 server delivered 2.7 times more max-jOPS performance per chip than the IBM Power S812LC using POWER8 chips. The SPARC T7-1 server also produced 4.6 times more critical-jOPS performance per chip compared to the IBM server. The SPARC M7 processor also delivered 1.45 times more critical-jOPS performance per core than IBM POWER8 processor.

  • The one-chip SPARC T7-1 server delivered 3 times more max-jOPS performance per chip than the two-chip result on the Lenovo Flex System x240 M5 using Intel v3 processors. The SPARC T7-1 server also produced 2.8 times more critical-jOPS performance per chip compared to the Lenovo. The Lenovo result did not enable the COD BIOS option.

  • The SPARC T5-2 server achieved 80,889 SPECjbb2015-MultiJVM max-jOPS and 37,422 SPECjbb2015-MultiJVM critical-jOPS on the SPECjbb2015 benchmark.

  • The one-chip SPARC T7-1 server demonstrated a 3 times max-jOPS performance improvement per chip compared to the previous generation two-chip SPARC T5-2 server.

From SPEC's press release: "The SPECjbb2015 benchmark is based on the usage model of a worldwide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations. It exercises Java 7 and higher features, using the latest data formats (XML), communication using compression, and secure messaging."

The Cluster on Die (COD) mode is a BIOS setting that effectively splits the chip in half, making the operating system think it has twice as many chips as it does (in this case, four 9-core chips). Intel has said that COD is appropriate only for highly NUMA-optimized workloads. Dell has shown that bandwidth to the other half of a chip split by COD is 3.7x lower.

Performance Landscape

One- and two-chip results of SPECjbb2015 MultiJVM from www.spec.org as of November 30, 2015.

SPECjbb2015
One- and Two-Chip Results

System                                       SPECjbb2015-MultiJVM     OS                   JDK     Notes
                                             max-jOPS  critical-jOPS
SPARC T7-1
1 x SPARC M7 (4.13 GHz, 1x 32core)            120,603      60,280     Oracle Solaris 11.3  8u66    -
Cisco UCS C220 M4
2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      97,551      28,318     Red Hat 6.5          8u60    COD
Dell PowerEdge R730
2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      94,903      29,033     SUSE 12              8u60    COD
Cisco UCS C220 M4
2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      92,463      31,654     Red Hat 6.5          8u60    COD
Lenovo Flex System x240 M5
2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      80,889      43,654     Red Hat 6.5          8u60    -
SPARC T5-2
2 x SPARC T5 (3.6 GHz, 2x 16core)              80,889      37,422     Oracle Solaris 11.2  8u66    -
Oracle Server X5-2L
2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      76,773      26,458     Oracle Solaris 11.2  8u60    -
Sun Server X4-2
2 x Intel E5-2697 v2 (2.7 GHz, 2x 12core)      52,482      19,614     Oracle Solaris 11.1  8u60    -
HP ProLiant DL120 Gen9
1 x Intel Xeon E5-2699 v3 (2.3 GHz, 18core)    47,334       9,876     Red Hat 7.1          8u51    -
IBM Power S812LC
1 x POWER8 (2.92 GHz, 10core)                  44,883      13,032     Ubuntu 14.04.3       J9 VM   -

* Note COD: result uses the non-default BIOS setting of Cluster on Die (COD), which splits the chip in two. This requires specific NUMA optimization, in that memory traffic to the other half of the chip can see a 3.7x decrease in bandwidth.

Configuration Summary

Systems Under Test:

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB dimms)
Oracle Solaris 11.3 (11.3.1.5.0)
Java HotSpot 64-Bit Server VM, version 1.8.0_66

SPARC T5-2
2 x SPARC T5 processors (3.6 GHz)
512 GB memory (32 x 16 GB dimms)
Oracle Solaris 11.2
Java HotSpot 64-Bit Server VM, version 1.8.0_66

Benchmark Description

The benchmark description, as found at the SPEC website.

The SPECjbb2015 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

Features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service level agreements (SLAs) specifying response times ranging from 10ms to 100ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Key Points and Best Practices

  • For the SPARC T5-2 server results, processor sets were used to isolate the different JVMs used during the test.

See Also

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results from http://www.spec.org as of 11/30/2015. SPARC T7-1 120,603 SPECjbb2015-MultiJVM max-jOPS, 60,280 SPECjbb2015-MultiJVM critical-jOPS; Cisco UCS C220 M4 97,551 SPECjbb2015-MultiJVM max-jOPS, 28,318 SPECjbb2015-MultiJVM critical-jOPS; Dell PowerEdge R730 94,903 SPECjbb2015-MultiJVM max-jOPS, 29,033 SPECjbb2015-MultiJVM critical-jOPS; Cisco UCS C220 M4 92,463 SPECjbb2015-MultiJVM max-jOPS, 31,654 SPECjbb2015-MultiJVM critical-jOPS; Lenovo Flex System x240 M5 80,889 SPECjbb2015-MultiJVM max-jOPS, 43,654 SPECjbb2015-MultiJVM critical-jOPS; SPARC T5-2 80,889 SPECjbb2015-MultiJVM max-jOPS, 37,422 SPECjbb2015-MultiJVM critical-jOPS; Oracle Server X5-2L 76,773 SPECjbb2015-MultiJVM max-jOPS, 26,458 SPECjbb2015-MultiJVM critical-jOPS; Sun Server X4-2 52,482 SPECjbb2015-MultiJVM max-jOPS, 19,614 SPECjbb2015-MultiJVM critical-jOPS; HP ProLiant DL120 Gen9 47,334 SPECjbb2015-MultiJVM max-jOPS, 9,876 SPECjbb2015-MultiJVM critical-jOPS; IBM Power S812LC 44,883 SPECjbb2015-MultiJVM max-jOPS, 13,032 SPECjbb2015-MultiJVM critical-jOPS.

Monday Oct 26, 2015

Real-Time Enterprise: SPARC T7-1 Faster Than x86 E5 v3

A goal of the modern business is the real-time enterprise, where analytics run simultaneously with transaction processing on the same system to provide the most effective decision making. Oracle Database 12c Enterprise Edition utilizing the In-Memory option is designed to let the same database perform transactions at the highest performance while completing analytical calculations, which once took days or hours, orders of magnitude faster.

Oracle's SPARC M7 processor has deep innovations to take the real-time enterprise to the next level of performance. In this test, both OLTP transactions and analytical queries were run in a single database instance using the same features of Oracle Database 12c Enterprise Edition utilizing the In-Memory option, in order to compare the SPARC M7 processor against a generic x86 processor. On both systems, the OLTP transactions and the analytical queries each took about half of the processing load of the server.

In this test Oracle's SPARC T7-1 server is compared to a two-chip x86 E5 v3 based server. On analytical queries the SPARC M7 processor is 8.2x faster than the x86 E5 v3 processor. Simultaneously on OLTP transactions the SPARC M7 processor is 2.9x faster than the x86 E5 v3 processor. In addition, the SPARC T7-1 server had better OLTP transactional response time than the x86 E5 v3 server.

The SPARC M7 processor does this by using the Data Accelerator co-processor (DAX). DAX is not a SIMD instruction set, but rather an actual co-processor that offloads in-memory queries, freeing the cores for other processing. The DAX has direct access to the memory bus and can execute scans at near full memory bandwidth. Oracle makes the DAX API available to other applications, so this kind of acceleration is not limited to the Oracle database; it is open.

The results below were obtained running a set of OLTP transactions and analytic queries simultaneously against two schemas: a real-time online orders system and a related historical orders schema configured as a real cardinality database (RCDB) star schema. The in-memory analytics RCDB queries are executed using the Oracle Database 12c In-Memory columnar feature.

  • The SPARC T7-1 server and the x86 E5 v3 server both ran OLTP transactions and the in-memory analytics on the same database instance using Oracle Database 12c Enterprise Edition utilizing the In-Memory option.

  • The SPARC T7-1 server ran the in-memory analytics RCDB based queries 8.2x faster per chip than a two-chip x86 E5 v3 server on the 48 stream test.

  • The SPARC T7-1 server delivers 2.9x higher OLTP transaction throughput results per chip than a two-chip x86 E5 v3 server on the 48 stream test.

Performance Landscape

The table below compares the SPARC T7-1 server and 2-chip x86 E5 v3 server while running OLTP and in-memory analytics against tables in the same database instance. The same set of transactions and queries were executed on each system.

Real-Time Enterprise Performance Chart
48 RCDB DSS Streams, 224 OLTP Users

System                              OLTP Transactions                           Analytic Queries
                                    Trans/sec   Per-Chip Adv   Avg Resp Time    Queries/min   Per-Chip Adv
SPARC T7-1
1 x SPARC M7 (32core)               338 K       2.9x           11 (msec)        267           8.2x
x86 E5 v3 server
2 x Intel E5-2699 v3 (2x 18core)    236 K       1.0x           12 (msec)        65            1.0x

The number of cores listed is per chip.
The per-chip advantage is computed by normalizing to a single chip's performance.
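As a quick sanity check, the per-chip advantage figures in the table above can be reproduced from the reported throughput and chip counts (a sketch; all numbers are taken from the table):

```python
# Reproduce the "Per-Chip Adv" columns: normalize each system's
# throughput by its chip count, then compare against the x86 baseline.
def per_chip_advantage(throughput, chips, base_throughput, base_chips):
    """Ratio of per-chip throughput relative to the baseline system."""
    return (throughput / chips) / (base_throughput / base_chips)

# OLTP: SPARC T7-1 (1 chip, 338K trans/sec) vs x86 E5 v3 (2 chips, 236K)
print(round(per_chip_advantage(338_000, 1, 236_000, 2), 1))  # 2.9

# Analytics: 267 vs 65 queries per minute
print(round(per_chip_advantage(267, 1, 65, 2), 1))           # 8.2
```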

Configuration Summary

SPARC Server:

1 X SPARC T7-1 server
1 X SPARC M7 processor
256 GB Memory
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition Release 12.1.0.2.10

x86 Server:

1 X Oracle Server X5-2L
2 X Intel Xeon Processor E5-2699 v3
256 GB Memory
Oracle Linux 6 Update 5 (3.8.13-16.2.1.el6uek.x86_64)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.10

Benchmark Description

The Real-Time Enterprise benchmark simulates the demands of customers who want to simultaneously run both their OLTP database and the related historical warehouse DSS data that would be based on that OLTP data. It answers the question of how a system will perform when doing data analysis while at the same time executing real-time on-line transactions.

The OLTP workload simulates an Order Inventory System that exercises both reads and writes with a potentially large number of users, stressing lock management and connectivity as well as database access.

The number of customers, orders, and users is fully parameterized. This benchmark is based on a 100 GB dataset, 15 million customers, 600 million orders, and up to 580 users. The workload consists of a number of transaction types including show-expenses, part-cost, supplier-phone, low-inv, high-inv, update-price, update-phone, update-cost, and new-order.

The real cardinality database (RCDB) schema was created to showcase the potential speedup one may see moving from on disk, row format data warehouse/Star Schema, to utilizing Oracle Database 12c's In-Memory feature for analytical queries.

The workload consists of as many as 2,304 unique queries asking questions such as "In 2014, what was the total revenue of single item orders", or "In August 2013, how many orders exceeded a total price of $50". Questions like these can help a company see where to focus for further revenue growth or identify weaknesses in their offerings.

RCDB scale factor 1050 represents a 1.05 TB data warehouse. It is transformed into a star schema of 1.0 TB, and then becomes 110 GB in size when loaded in memory. It consists of 1 fact table, and 4 dimension tables with over 10.5 billion rows. There are 56 columns with most cardinalities varying between 5 and 2,000, a primary key being an example of something outside this range.

Two reports are generated: one for the OLTP-Perf workload and one for the RCDB DSS workload. For the analytical DSS workload, queries per minute and average query elapsed times are reported. For the OLTP-Perf workload, both transactions-per-seconds in thousands and OLTP average response times in milliseconds are reported.

Key Points and Best Practices

  • This benchmark utilized the SPARC M7 processor's co-processor DAX for query acceleration.

  • All SPARC T7-1 server results were run with out-of-the-box tuning for Oracle Solaris.

  • All Oracle Server X5-2L system results were run with out-of-the-box tuning for Oracle Linux, except for the following setting in /etc/sysctl.conf to enable large pages for the Oracle Database:

    • vm.nr_hugepages=98304

  • To create an in memory area, the following was added to the init.ora:

      inmemory_size = 120g

  • An example of how to set a table to be in memory is below:

      ALTER TABLE CUSTOMER INMEMORY MEMCOMPRESS FOR QUERY HIGH

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

In-Memory Database: SPARC T7-1 Faster Than x86 E5 v3

Fast analytics on large databases are critical to transforming key business processes. Oracle's SPARC M7 processors are specifically designed to accelerate in-memory analytics using Oracle Database 12c Enterprise Edition utilizing the In-Memory option. The SPARC M7 processor outperforms an x86 E5 v3 chip by up to 10.8x on analytics queries. In order to test real-world deep analysis on the SPARC M7 processor, a scenario with over 2,300 analytical queries was run against a real cardinality database (RCDB) star schema. This benchmark was audited by Enterprise Strategy Group (ESG), an IT research, analyst, strategy, and validation firm focused on the global IT community.

The SPARC M7 processor does this by using the Data Accelerator co-processor (DAX). DAX is not a SIMD instruction set, but rather an actual co-processor that offloads in-memory queries, freeing the cores for other processing. The DAX has direct access to the memory bus and can execute scans at near full memory bandwidth. Oracle makes the DAX API available to other applications, so this kind of acceleration is not just for the Oracle database; it is open.

  • The SPARC M7 processor delivers up to a 10.8x Query Per Minute speedup per chip over the Intel Xeon Processor E5-2699 v3 when executing analytical queries using the In-Memory option of Oracle Database 12c.

  • Oracle's SPARC T7-1 server delivers up to a 5.4x Query Per Minute speedup over the 2-chip x86 E5 v3 server when executing analytical queries using the In-Memory option of Oracle Database 12c.

  • The SPARC T7-1 server delivers over 143 GB/sec of memory bandwidth which is up to 7x more than the 2-chip x86 E5 v3 server when the Oracle Database 12c is executing the same analytical queries against the RCDB.

  • The SPARC T7-1 server scanned over 48 billion rows per second through the database.

  • The SPARC T7-1 server compresses the on-disk RCDB star schema by around 6x when using the MEMCOMPRESS FOR QUERY HIGH setting (more information below) and by nearly 10x compared to a standard data warehouse row format version of the same database.

Performance Landscape

The table below compares the SPARC T7-1 server and the 2-chip x86 E5 v3 server. The single-chip x86 E5 v3 figures are actual measurements taken on a single-chip configuration.

The number of cores is per chip; multiply by the number of chips to get the system total.

RCDB Performance Chart
2,304 Queries

System                              Elapsed   Queries      System   Chip    DB Memory
                                    Seconds   Per Minute   Adv      Adv     Bandwidth
SPARC T7-1
1 x SPARC M7 (32core)               381       363          5.4x     10.8x   143 GB/sec
x86 E5 v3 server
2 x Intel E5-2699 v3 (2x 18core)    2059      67           1.0x     2.0x    20 GB/sec
x86 E5 v3 server
1 x Intel E5-2699 v3 (18core)       4096      34           0.5x     1.0x    10 GB/sec

Fused Decompress + Scan

The In-Memory feature of Oracle Database 12c puts tables in columnar format. There are different levels of compression that can be applied. One of these is Oracle Zip (OZIP), which is used with the "MEMCOMPRESS FOR QUERY HIGH" setting. Typically, when compression is applied to data, in order to operate on it the data must be:

    (1) Decompressed
    (2) Written back to memory in uncompressed form
    (3) Scanned and the results returned.

When OZIP is applied to the data inside of an In-Memory Columnar Unit (or IMCU, an N sized chunk of rows), the DAX is able to take this data in its compressed format and operate (scan) directly upon it, returning results in a single step. This not only saves on compute power by not having the CPU do the decompression step, but also on memory bandwidth as the uncompressed data is not put back into memory. Only the results are returned. To illustrate this, a microbenchmark was used which measured the amount of rows that could be scanned per second.
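As a rough illustration (a toy model, not Oracle's implementation), the memory-traffic saving of a fused decompress-and-scan can be sketched by counting the bytes each approach moves; the ~6.2x figure is the LINEORDER compression ratio reported below:

```python
# Toy model of memory traffic (in arbitrary units) for a conventional
# decompress-then-scan versus a fused scan on compressed data.
def conventional_traffic(compressed, uncompressed):
    # (1) read compressed, (2) write uncompressed back to memory,
    # (3) read uncompressed again to scan it
    return compressed + 2 * uncompressed

def fused_traffic(compressed):
    # the co-processor scans the compressed form directly
    return compressed

comp, uncomp = 1.0, 6.2  # ~6.2x compression ratio (assumed here)
saving = conventional_traffic(comp, uncomp) / fused_traffic(comp)
print(round(saving, 1))  # 13.4x less traffic in this simplified model
```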

Compression

This performance test was run on a Scale Factor 1750 database, which represents a 1.75 TB row format data warehouse. The database is then transformed into a star schema which ends up around 1.1 TB in size. The star schema is then loaded in memory with a setting of "MEMCOMPRESS FOR QUERY HIGH", which focuses on performance with somewhat more aggressive compression. This memory area is a separate part of the System Global Area (SGA) which is defined by the database initialization parameter "inmemory_size". See below for an example. Here is a breakdown of each table in memory with compression ratios.

Table Name   Original Size (Bytes)   In-Memory Size (Bytes)   Compression Ratio
LINEORDER    1,103,524,528,128       178,586,451,968          6.2x
DATE         11,534,336              1,179,648                9.8x
PART         11,534,336              1,179,648                9.8x
SUPPLIER     11,534,336              1,179,648                9.8x
CUSTOMER     11,534,336              1,179,648                9.8x
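The compression ratios above follow directly from the raw sizes; for example (figures from the table):

```python
# Recompute the compression ratios from original and in-memory sizes.
sizes = {
    "LINEORDER": (1_103_524_528_128, 178_586_451_968),
    "DATE":      (11_534_336, 1_179_648),
}
for table, (original, in_memory) in sizes.items():
    print(table, f"{original / in_memory:.1f}x")
# LINEORDER 6.2x
# DATE 9.8x
```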

Configuration Summary

SPARC Server:

1 X SPARC T7-1 server
1 X SPARC M7 processor
512 GB memory
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition Release 12.1.0.2.13

x86 Server:

1 X Oracle Server X5-2L
2 X Intel Xeon Processor E5-2699 v3
512 GB memory
Oracle Linux 6 Update 5 (3.8.13-16.2.1.el6uek.x86_64)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.13

Benchmark Description

The real cardinality database (RCDB) benchmark was created to showcase the potential speedup one may see moving from on disk, row format data warehouse/Star Schema, to utilizing Oracle Database 12c's In-Memory feature for analytical queries.

The workload consists of 2,304 unique queries asking questions such as "In 2014, what was the total revenue of single item orders", or "In August 2013, how many orders exceeded a total price of $50". Questions like these can help a company see where to focus for further revenue growth or identify weaknesses in their offerings.

RCDB scale factor 1750 represents a 1.75 TB data warehouse. It is transformed into a star schema of 1.1 TB, and then becomes 179 GB in size when loaded in memory. It consists of 1 fact table, and 4 dimension tables with over 10.5 billion rows. There are 56 columns with most cardinalities varying between 5 and 2,000, a primary key being an example of something outside this range.

One problem with many industry-standard generated databases is that, as they have grown in size, the cardinalities of the generated columns have become exceedingly unrealistic. For instance, one industry-standard benchmark uses a schema where, at scale factor 1 TB, the number of parts is SF * 800,000. A 1 TB database that calls for 800 million unique parts is not very realistic. RCDB therefore attempts to take some of these unrealistic cardinalities and size them to be more representative of at least a section of customer data. Obviously one cannot encompass every database in one schema; this is just an example.

We carefully scaled each system so that the optimal number of users was run on each system under test, avoiding artificial bottlenecks. Each user ran an equal number of queries, and the same queries were run on each system, allowing for a fair comparison of the results.

Key Points and Best Practices

  • This benchmark utilized the SPARC M7 processor's co-processor DAX for query acceleration.

  • All SPARC T7-1 server results were run with out-of-the-box tuning for Oracle Solaris.

  • All Oracle Server X5-2L system results were run with out-of-the-box tuning for Oracle Linux, except for the following setting in /etc/sysctl.conf to enable large pages for the Oracle Database:

    • vm.nr_hugepages=64520

  • To create an in memory area, the following was added to the init.ora:

      inmemory_size = 200g

  • An example of how to set a table to be in memory is below:

      ALTER TABLE CUSTOMER INMEMORY MEMCOMPRESS FOR QUERY HIGH

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

In-Memory Aggregation: SPARC T7-2 Beats 4-Chip x86 E7 v2

Oracle's SPARC T7-2 server demonstrates better performance, both in throughput and in number of users, compared to a four-chip x86 E7 v2 server. The workload consists of a realistic set of business intelligence (BI) queries in a multi-user environment against a 500 million row fact table using Oracle Database 12c Enterprise Edition utilizing the In-Memory option.

  • The SPARC M7 chip delivers 2.3 times more query throughput per hour compared to an x86 E7 v2 chip.

  • The two-chip SPARC T7-2 server delivered 13% more query throughput per hour compared to a four-chip x86 E7 v2 server.

  • The two-chip SPARC T7-2 server supported over 10% more users than a four-chip x86 E7 v2 server.

  • Both the SPARC server and x86 server ran with just under 5 second average response time.

Performance Landscape

The results below were run as part of this benchmark. All results use 500,000,000 fact table rows and had an average CPU utilization of 100%.

In-Memory Aggregation
500 Million Row Fact Table

System                       Users   Queries    Queries per     Average
                                     per Hour   Hour per Chip   Response Time
SPARC T7-2
2 x SPARC M7 (32core)        190     127,540    63,770          4.99 (sec)
x86 E7 v2
4 x E7-8895 v2 (4x 15core)   170     112,470    28,118          4.92 (sec)

The number of cores listed is per chip.

Configuration Summary

SPARC Configuration:

SPARC T7-2
2 x 4.13 GHz SPARC M7 processors
1 TB memory (32 x 32 GB)
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition (12.1.0.2.0)

x86 Configuration:

Sun Server X4-4
4 x Intel Xeon Processor E7-8895 v2 processors
1 TB memory (64 x 16 GB)
Oracle Linux Server 6.5 (kernel 2.6.32-431.el6.x86_64)
Oracle Database 12c Enterprise Edition (12.1.0.2.0)

Benchmark Description

The benchmark is designed to highlight the efficacy of the Oracle Database 12c In-Memory Aggregation facility (join and aggregation optimizations) together with the fast scan and filtering capability of Oracle's in-memory column store facility.

The benchmark runs analytic queries such as those seen in typical customer business intelligence (BI) applications. These are done in the context of a star schema database. The key metrics are query throughput, number of users, and average response times.

The implementation of the workload used to achieve the results is based on a schema consisting of 9 dimension tables together with a 500 million row fact table.

The query workload consists of randomly generated star-style queries simulating a collection of ad-hoc business intelligence users. Up to 300 concurrent users have been run, with each user running approximately 500 queries. The implementation includes a relatively small materialized view, which contains some precomputed data. The creation of the materialized view takes only a few minutes.

Key Points and Best Practices

The reported results were obtained by using the following settings on both systems except where otherwise noted:

  1. starting with a completely cold shared pool
  2. without making use of the result cache
  3. without using dynamic sampling or adaptive query optimization
  4. running all queries in parallel, where
    parallel_max_servers = 1600 (on the SPARC T7-2) or
    parallel_max_servers = 240 (on the Sun Server X4-4)
    each query hinted with PARALLEL(4)
    parallel_degree_policy = limited
  5. having appropriate queries rewritten to the materialized view, MV3, defined as
    SELECT
    /*+ append vector_transform */
    d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name,
    d2.department_name, d2.category_name, d2.type_name, d3.all_customers_name,
    d3.region_name, d3.country_name, d3.state_province_name, d4.all_channels_name,
    d4.class_name, d4.channel_name, d5.all_ages_name, d5.age_name, d6.all_sizes_name,
    d6.household_size_name, d7.all_years_name, d7.years_customer_name, d8.all_incomes_name,
    d8.income_name, d9.all_status_name, d9.marital_status_name,
    SUM(f.sales) AS sales,
    SUM(f.units) AS units,
    SUM(f.measure_3) AS measure_3,
    SUM(f.measure_4) AS measure_4,
    SUM(f.measure_5) AS measure_5,
    SUM(f.measure_6) AS measure_6,
    SUM(f.measure_7) AS measure_7,
    SUM(f.measure_8) AS measure_8,
    SUM(f.measure_9) AS measure_9,
    SUM(f.measure_10) AS measure_10
    FROM time_dim d1, product_dim d2, customer_dim_500M_10 d3, channel_dim d4, age_dim d5,
    household_size_dim d6, years_customer_dim d7, income_dim d8, marital_status_dim d9,
    units_fact_500M_10 f
    WHERE d1.day_id = f.day_id AND
    d2.item_id = f.item_id AND
    d3.customer_id = f.customer_id AND
    d4.channel_id = f.channel_id AND
    d5.age_id = f.age_id AND
    d6.household_size_id = f.household_size_id AND
    d7.years_customer_id = f.years_customer_id AND
    d8.income_id = f.income_id AND
    d9.marital_status_id = f.marital_status_id
    GROUP BY d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name,
    d2.department_name, d2.category_name, d2.type_name, d3.all_customers_name,
    d3.region_name, d3.country_name, d3.state_province_name, d4.all_channels_name,
    d4.class_name, d4.channel_name, d5.all_ages_name, d5.age_name, d6.all_sizes_name,
    d6.household_size_name, d7.all_years_name, d7.years_customer_name, d8.all_incomes_name,
    d8.income_name, d9.all_status_name, d9.marital_status_name

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of October 25, 2015.

Memory and Bisection Bandwidth: SPARC T7 and M7 Servers Faster Than x86 and POWER8

The STREAM benchmark measures delivered memory bandwidth on a variety of memory intensive tasks. Delivered memory bandwidth is key to a server delivering high performance on a wide variety of workloads. The STREAM benchmark is typically run where each chip in the system gets its memory requests satisfied from local memory. This report presents performance of Oracle's SPARC M7 processor based servers and compares their performance to x86 and IBM POWER8 servers.

Bisection bandwidth on a server is a measure of the cross-chip data bandwidth between the processors of a system where no memory access is local to the processor. Systems with large cross-chip penalties show dramatically lower bisection bandwidth. Real-world ad hoc workloads tend to perform better on systems with better bisection bandwidth because their memory usage characteristics tend to be chaotic.

IBM says the sustained or delivered bandwidth of the IBM POWER8 12-core chip is 230 GB/sec. This number is a peak bandwidth calculation: 230.4 GB/sec = 9.6 GHz * 3 (r+w) * 8 bytes. A similar calculation is used by IBM for the POWER8 dual-chip module (two 6-core chips) to show a sustained or delivered bandwidth of 192 GB/sec (192.0 GB/sec = 8.0 GHz * 3 (r+w) * 8 bytes). Peaks are the theoretical limits used for marketing hype, but true measured delivered bandwidth is the only useful comparison to help one understand the delivered performance of real applications.
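For reference, the quoted peak figures can be reproduced directly from that formula:

```python
# Peak (theoretical) bandwidth as quoted: memory clock in GHz
# times 3 transfers (r+w) times 8 bytes per transfer.
def peak_gb_per_sec(ghz, transfers=3, bytes_per_transfer=8):
    return ghz * transfers * bytes_per_transfer

print(round(peak_gb_per_sec(9.6), 1))  # 230.4  (POWER8 12-core chip)
print(round(peak_gb_per_sec(8.0), 1))  # 192.0  (POWER8 dual-chip module)
```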

The STREAM benchmark is easy to run and anyone can measure memory bandwidth on a target system (see Key Points and Best Practices section).

  • The SPARC M7-8 server delivers over 1 TB/sec on the STREAM benchmark. This is over 2.4 times the triad bandwidth of an eight-chip x86 E7 v3 server.

  • The SPARC T7-4 delivered 2.2 times the STREAM triad bandwidth of a four-chip x86 E7 v3 server and 1.7 times the triad bandwidth of a four-chip IBM Power System S824 server.

  • The SPARC T7-2 delivered 2.5 times the STREAM triad bandwidth of a two-chip x86 E5 v3 server.

  • The SPARC M7-8 server delivered over 8.5 times the triad bisection bandwidth of an eight-chip x86 E7 v3 server.

  • The SPARC T7-4 server delivered over 2.7 times the triad bisection bandwidth of a four-chip x86 E7 v3 server and 2.3 times the triad bisection bandwidth of a four-chip IBM Power System S824 server.

  • The SPARC T7-2 server delivered over 2.7 times the triad bisection bandwidth of a two-chip x86 E5 v3 server.

Performance Landscape

The following SPARC, x86, and IBM S824 STREAM results were run as part of this benchmark effort. The IBM S822L result is from the referenced web location. The following SPARC results were all run using 32 GB DIMMs.

Maximum STREAM Benchmark Performance

System       Chips   Bandwidth (MB/sec - 10^6)
                     Copy      Scale     Add         Triad
SPARC M7-8   8       995,402   995,727   1,092,742   1,086,305
x86 E7 v3    8       346,771   354,679   445,550     442,184
SPARC T7-4   4       512,080   510,387   556,184     555,374
IBM S824     4       251,533   253,216   322,399     319,561
IBM S822L    4       252,743   247,314   295,556     305,955
x86 E7 v3    4       230,027   232,092   248,761     251,161
SPARC T7-2   2       259,198   259,380   285,835     285,905
x86 E5 v3    2       105,622   105,808   113,116     112,521
SPARC T7-1   1       131,323   131,308   144,956     144,706

All of the following bisection bandwidth results were run as part of this benchmark effort.

Bisection Bandwidth Benchmark Performance (Nonlocal STREAM)

System       Chips   Bandwidth (MB/sec - 10^6)
                     Copy      Scale     Add       Triad
SPARC M7-8   8       383,479   381,219   375,371   375,851
SPARC T5-8   8       172,195   172,354   250,620   250,858
x86 E7 v3    8       42,636    42,839    43,753    43,744
SPARC T7-4   4       142,549   142,548   142,645   142,729
SPARC T5-4   4       75,926    75,947    76,975    77,061
IBM S824     4       53,940    54,107    60,746    60,939
x86 E7 v3    4       41,636    47,740    51,206    51,333
SPARC T7-2   2       127,372   127,097   129,833   129,592
SPARC T5-2   2       91,530    91,597    91,761    91,984
x86 E5 v3    2       45,211    45,331    47,414    47,251

The following SPARC results were all run using 16 GB DIMMs.

SPARC T7 Servers – 16 GB DIMMs
Maximum STREAM Benchmark Performance

System       Chips   Bandwidth (MB/sec - 10^6)
                     Copy      Scale     Add       Triad
SPARC T7-4   4       520,779   521,113   602,137   600,330
SPARC T7-2   2       262,586   262,760   302,758   302,085
SPARC T7-1   1       132,154   132,132   168,677   168,654

Configuration Summary

SPARC Configurations:

SPARC M7-8
8 x SPARC M7 processors (4.13 GHz)
4 TB memory (128 x 32 GB dimms)

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB dimms)
1 TB memory (64 x 16 GB dimms)

SPARC T7-2
2 x SPARC M7 processors (4.13 GHz)
1 TB memory (32 x 32 GB dimms)
512 GB memory (32 x 16 GB dimms)

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB dimms)
256 GB memory (16 x 16 GB dimms)

Oracle Solaris 11.3
Oracle Solaris Studio 12.4

x86 Configurations:

Oracle Server X5-8
8 x Intel Xeon Processor E7-8995 v3
2 TB memory (128 x 16 GB dimms)

Oracle Server X5-4
4 x Intel Xeon Processor E7-8995 v3
1 TB memory (64 x 16 GB dimms)

Oracle Server X5-2
2 x Intel Xeon Processor E5-2699 v3
256 GB memory (16 x 16 GB dimms)

Oracle Linux 7.1
Intel Parallel Studio XE Composer Version 2016 compilers

Benchmark Description

STREAM

The STREAM benchmark measures sustainable memory bandwidth (in MB/s) for simple vector compute kernels. All memory accesses are sequential, so the benchmark portrays how fast regular data can be moved through the system. Properly run, the benchmark displays the characteristics of the memory system of the machine and not the advantages of running from the system's memory caches.

STREAM counts the bytes read plus the bytes written to memory. For the simple Copy kernel, this is exactly twice the number obtained from the bcopy convention. STREAM does this because three of the four kernels (Scale, Add and Triad) do arithmetic, so it makes sense to count both the data read into the CPU and the data written back from the CPU. The Copy kernel does no arithmetic, but, for consistency, counts bytes the same way as the other three.
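The byte-counting convention above can be sketched as follows (assuming the standard 8-byte double-precision elements):

```python
# Bytes counted per array element for each STREAM kernel
# (bytes read plus bytes written, 8-byte doubles).
ELEM = 8  # sizeof(double)
bytes_per_element = {
    "Copy":  2 * ELEM,  # c[j] = a[j]          : 1 read + 1 write
    "Scale": 2 * ELEM,  # b[j] = s*c[j]        : 1 read + 1 write
    "Add":   3 * ELEM,  # c[j] = a[j] + b[j]   : 2 reads + 1 write
    "Triad": 3 * ELEM,  # a[j] = b[j] + s*c[j] : 2 reads + 1 write
}

def stream_mb_per_sec(kernel, array_elements, seconds):
    """Bandwidth in MB/s (10^6 bytes/sec), as STREAM reports it."""
    return bytes_per_element[kernel] * array_elements / seconds / 1e6

# e.g. a hypothetical Triad pass over 10^9 elements taking 1 second:
print(stream_mb_per_sec("Triad", 10**9, 1.0))  # 24000.0
```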

The sequential nature of the memory references is the benchmark's biggest weakness. The benchmark does not expose limitations in a system's interconnect when moving data from anywhere in the system to anywhere else.

Bisection Bandwidth – Easy Modification of STREAM Benchmark

To test for bisection bandwidth, processes are bound to processors in sequential order. The memory is allocated in reverse order, so that the memory is placed non-local to the process. The benchmark is then run. If the system is capable of page migration, this feature must be turned off.

Key Points and Best Practices

The STREAM benchmark code was compiled for the SPARC M7 processor based systems with the following flags (using cc):

    -fast -m64 -W2,-Avector:aggressive -xautopar -xreduction -xpagesize=4m

The benchmark code was compiled for the x86 based systems with the following flags (Intel icc compiler):

    -O3 -m64 -xCORE-AVX2 -ipo -openmp -mcmodel=medium -fno-alias -nolib-inline

On Oracle Solaris, binding is accomplished by setting either the environment variable SUNW_MP_PROCBIND or the OpenMP variables OMP_PROC_BIND and OMP_PLACES.

    export OMP_NUM_THREADS=512
    export SUNW_MP_PROCBIND=0-511

On Oracle Linux systems using Intel compiler, binding is accomplished by setting the environment variable KMP_AFFINITY.

    export OMP_NUM_THREADS=72
    export KMP_AFFINITY='verbose,granularity=fine,proclist=[0-71],explicit'

The source code change in the file stream.c to do the reverse allocation is shown below in diff format (the `<` lines are the reverse-allocation version, the `>` lines are the original):

    <     for (j=STREAM_ARRAY_SIZE-1; j>=0; j--) {
    <         a[j] = 1.0;
    <         b[j] = 2.0;
    <         c[j] = 0.0;
    <     }
    ---
    >     for (j=0; j<STREAM_ARRAY_SIZE; j++) {
    >         a[j] = 1.0;
    >         b[j] = 2.0;
    >         c[j] = 0.0;
    >     }
    

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Hadoop TeraSort: SPARC T7-4 Top Per-Chip Performance

Oracle's SPARC T7-4 server using virtualization delivered an outstanding single server result running the Hadoop TeraSort benchmark. The SPARC T7-4 server was run with and without security. Even the secure runs on the SPARC M7 processor based server performed much faster per chip compared to competitive unsecure results.

  • The SPARC T7-4 server on a per chip basis is 4.7x faster than an IBM POWER8 based cluster on the 10 TB Hadoop TeraSort benchmark.

  • The SPARC T7-4 server running with ZFS encryption enabled on the 10 TB Hadoop TeraSort benchmark is 4.6x faster than an unsecure x86 v2 cluster on a per chip basis.

  • The SPARC T7-4 server running with ZFS encryption (AES-256-GCM) enabled on the 10 TB Hadoop TeraSort benchmark is 4.3x faster than an unsecure (plain-text) IBM POWER8 cluster on a per chip basis.

  • The SPARC T7-4 server ran the 10 TB Hadoop TeraSort benchmark in 4,259 seconds.

Performance Landscape

The following table presents results for the 10 TB Hadoop TeraSort benchmark. The sort rates are determined by taking the dataset size (10^13 bytes) and dividing by the elapsed time in minutes. These rates are further normalized by the number of nodes or chips used in obtaining the results.
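That rate computation can be sketched as follows (elapsed times from the results in this section):

```python
# Sort rate in GB/min: dataset bytes / 10^9, divided by elapsed minutes,
# then normalized per node or per chip.
def sort_rate_gb_per_min(dataset_bytes, elapsed_seconds):
    return dataset_bytes / 1e9 / (elapsed_seconds / 60.0)

rate = sort_rate_gb_per_min(10**13, 4259)  # SPARC T7-4, unsecure run
print(round(rate, 1))      # 140.9 GB/min for the single node
print(round(rate / 4, 1))  # 35.2 GB/min per chip (4 chips)
```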

10 TB Hadoop TeraSort Performance Landscape

System                            Security      Nodes   Total   Time    Sort Rate (GB/min)
                                                        Chips   (sec)   Per Node   Per Chip
SPARC T7-4
SPARC M7 (4.13 GHz)               unsecure      1       4       4,259   140.9      35.2
SPARC T7-4
SPARC M7 (4.13 GHz)               AES-256-GCM   1       4       4,657   128.8      32.2
IBM Power System S822L
POWER8 (3.0 GHz)                  unsecure      8       32      2,490   30.1       7.5
Dell R720xd/VMware
Intel Xeon E5-2680 v2 (2.8 GHz)   unsecure      32      64      1,054   17.8       8.9
Cisco UCS CPA C240 M3
Intel Xeon E5-2665 (2.4 GHz)      unsecure      16      32      3,112   12.0       6.0

Configuration Summary

Server:

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB)
6 x 600 GB 10K RPM SAS-2 HDD
10 GbE
Oracle Solaris 11.3 (11.3.0.29)
Oracle Solaris Studio 12.4
Java SE Runtime Environment (build 1.7.0_85-b33)
Hadoop 1.2.1

External Storage (Common Multiprotocol SCSI TARget, or COMSTAR, enables a system to be seen as a SCSI target device):

16 x Sun Server X3-2L
2 x Intel Xeon E5-2609 (2.4 GHz)
16 GB memory (2 x 8 GB)
2 x 600 GB SAS-2 HDD
12 x 3 TB SAS-1 HDD
4 x Sun Flash Accelerator F40 PCIe Card
Oracle Solaris 11.1 (11.1.16.5.0)
Please note: These devices are only used as storage. No Hadoop is run on these COMSTAR storage nodes. There was no compression or encryption done on these COMSTAR storage nodes.

Benchmark Description

The Hadoop TeraSort benchmark sorts 100-byte records by a contained 10-byte random key. Hadoop TeraSort is characterized by high I/O bandwidth between each compute/data node of a Hadoop cluster and the disk drives that are attached to that node.

Note: benchmark size is measured by power-of-ten not power-of-two bytes; 1 TB sort is sorting 10^12 Bytes = 10 billion 100-byte rows using an embedded 10-Byte key field of random characters, 100 GB sort is sorting 10^11 Bytes = 1 billion 100-byte rows, etc.

Key Points and Best Practices

  • The SPARC T7-4 server was configured with 15 Oracle Solaris Zones. Each Zone was running one Hadoop data-node with HDFS layered on an Oracle Solaris ZFS volume.

  • Hadoop uses a distributed, shared-nothing, batch processing framework employing divide-and-conquer serial Map and Reduce JVM tasks, with performance coming from scale-out concurrency (i.e., more tasks) rather than parallelism. Only one job scheduler and task manager can be configured per data/compute node, and both have inherent scaling limitations (the Hadoop design target being small compute nodes, and hundreds or even thousands of them).

  • Multiple data-nodes significantly help improve overall system utilization – HDFS becomes more distributed with more processes servicing file system operations, and more task-trackers are managing all the MapReduce work.

  • On large-node systems, virtualization is required to improve utilization by increasing the number of independent data/compute nodes, each running its own Hadoop processes.

  • I/O bandwidth to the local disk drives and network communication bandwidth are the primary determinants of Hadoop performance. Typically, Hadoop reads input data files from HDFS during the Map phase of computation and stores intermediate files back to HDFS. Then, during the subsequent Reduce phase of computation, Hadoop reads the intermediate files and outputs the final result. The Map and Reduce phases are executed concurrently by multiple Map tasks and Reduce tasks. Tasks are purpose-built stand-alone serial applications, often written in Java (but they can be written in any programming language or script).

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

Competitive results found at: Dell R720xd/VMware, IBM S822L, Cisco C240 M3

Yahoo Cloud Serving Benchmark: SPARC T7-4 With Oracle NoSQL Beats x86 E5 v3 Per Chip

Oracle's SPARC T7-4 server delivered 1.9 million ops/sec on 1.6 billion records for the Yahoo Cloud Serving Benchmark (YCSB) 95% read/5% update workload. Oracle NoSQL Database was used in these tests. NoSQL is important for Big Data Analysis and for Cloud Computing.

  • One-processor performance on the SPARC T7-4 server was 2.5 times better than a single-chip Intel Xeon E5-2699 v3 result for the YCSB 95% read/5% update workload.

  • The SPARC T7-4 server showed low average latency of 1.12 msec on read and 4.90 msec on write while achieving nearly 1.9 million ops/sec.

  • The SPARC T7-4 server delivered 325K inserts/sec on 1.6 billion records with a low average latency of 2.65 msec.

  • One processor performance on the SPARC T7-4 server was over half a million (511K ops/sec) on 400 million records for the YCSB 95% read/5% update workload.

  • Near-linear scaling from 1 to 4 processors was 3.7x while maintaining low latency.

These results show the SPARC T7-4 server can handle a large database while achieving high throughput with low latency for cloud computing.

Performance Landscape

This table presents single chip results comparing the SPARC M7 processor (in a SPARC T7-4 server) to the Intel Xeon Processor E5-2699 v3 (in a 2-socket x86 server). All of the following results were run as part of this benchmark effort.

Comparing Single Chip Performance on YCSB Benchmark

Processor      Insert                          Mixed Load (95% Read/5% Update)
               Throughput   Avg Latency        Throughput   Avg Latency   Avg Latency
               (ops/sec)    Write (msec)       (ops/sec)    Read (msec)   Write (msec)
SPARC M7       89,617       2.40               510,824      1.07          3.80
E5-2699 v3     55,636       1.18               202,701      0.71          2.30

The following table shows the performance of the Yahoo Cloud Serving Benchmark at multiple processor counts on the SPARC T7-4 server.

SPARC T7-4 server running YCSB benchmark

CPUs  Shards  Insert                          Mixed Load (95% Read/5% Update)
              Throughput   Avg Latency        Throughput   Avg Latency   Avg Latency
              (ops/sec)    Write (msec)       (ops/sec)    Read (msec)   Write (msec)
4     16      325,167      2.65               1,890,394    1.12          4.90
3     12      251,051      2.57               1,428,813    1.12          4.68
2     8       170,963      2.52               968,146      1.11          4.37
1     4       89,617       2.40               510,824      1.07          3.80

Configuration Summary

System Under Test:

SPARC T7-4 server
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB)
8 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Emulex
8 x Sun Dual Port 10 GbE PCIe 2.0 Low Profile Adapter, Base-T

Oracle Server X5-2L server
2 x Intel Xeon E5-2699 v3 processors (2.3 GHz)
384 GB memory
1 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Emulex
1 x Sun Dual 10GbE SFP+ PCIe 2.0 Low Profile Adapter

External Storage (Common Multiprotocol SCSI Target, or COMSTAR, enables a system to be seen as a SCSI target device):

16 x Sun Server X3-2L servers configured as COMSTAR nodes, each with
2 x Intel Xeon E5-2609 processors (2.4 GHz)
4 x Sun Flash Accelerator F40 PCIe Cards, 400 GB each
1 x 8 Gb dual port HBA
Please note: These devices are used only as storage. No NoSQL processing runs on these COMSTAR storage nodes, and no query acceleration is done on them.

Software Configuration:

Oracle Solaris 11.3 (11.3.1.2.0)
Logical Domains Manager v3.3.0.0.17 (running on the SPARC T7-4)
Oracle NoSQL Database, Enterprise Edition 12c R1.3.2.5
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)

Benchmark Description

The Yahoo Cloud Serving Benchmark (YCSB) is a performance benchmark for cloud database systems. The benchmark documentation says:

    With the many new serving databases available including Sherpa, BigTable, Azure and many more, it can be difficult to decide which system is right for your application, partially because the features differ between systems, and partially because there is not an easy way to compare the performance of one system versus another. The goal of the Yahoo Cloud Serving Benchmark (YCSB) project is to develop a framework and common set of workloads for evaluating the performance of different "key-value" and "cloud" serving stores.
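As a minimal, hypothetical illustration of the workload mix used in these results (not the actual YCSB driver), the 95% read/5% update pattern against a key-value store can be sketched as:

```python
import random

def run_mixed_load(store, keys, ops, read_fraction=0.95, seed=42):
    """Drive a YCSB-style mixed workload against a key-value store:
    each operation is a read with probability read_fraction,
    otherwise an update of the chosen key."""
    rng = random.Random(seed)
    reads = updates = 0
    for i in range(ops):
        key = rng.choice(keys)
        if rng.random() < read_fraction:
            _ = store[key]          # read
            reads += 1
        else:
            store[key] = f"v{i}"    # update
            updates += 1
    return reads, updates

store = {f"user{i}": "init" for i in range(1000)}
reads, updates = run_mixed_load(store, list(store), ops=10_000)
```

The real benchmark runs many such client threads in parallel against Oracle NoSQL Database shards and measures throughput and per-operation latency; the sketch only shows the operation mix.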

Key Points and Best Practices

  • The SPARC T7-4 server showed 3.7x scaling from 1 to 4 sockets while maintaining low latency.

  • Four Oracle VM Server for SPARC logical domains (LDoms) were created per processor, for a total of sixteen LDoms. Each LDom was configured with 120 GB of memory and two PCIe I/O slots accessed via SR-IOV (Single Root I/O Virtualization).

  • The Sun Flash Accelerator F40 PCIe Card demonstrated excellent IO capability and performed 841K read IOPS (3.5K IOPS per disk) during the 1.9 million ops/sec benchmark run.

  • There was no performance loss from Fibre Channel SR-IOV (Single Root IO Virtualization) compared to native.

  • Balanced memory bandwidth was delivered across all four processors, achieving an average total of 304 GB/sec during the 1.9 million ops/sec run.

  • The 1.6 billion records were loaded into 16 Shards with the replication factor set to 3.

  • Each LDom was associated with a processor set (16 total). The default processor set additionally handled OS and I/O interrupts. Processor sets were used to ensure a balanced load.

  • Fixed priority class was assigned to Oracle NoSQL Storage Node java processes.

  • The ZFS record size was set to 16K (default 128K) and this worked the best for 95% read/5% update workload.

  • A total of eight Sun Server X4-2 and Sun Server X4-2L systems were used as clients for generating the workload.

  • The LDoms and client systems were connected through a 10 GbE network.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of Oct 25, 2015.

Graph PageRank: SPARC M7-8 Beats x86 E5 v3 Per Chip

Graph algorithms are used in many big data and analytics workloads. This report presents performance using the PageRank algorithm. Oracle's SPARC M7 processor based systems provide better performance than an x86 E5 v3 based system.

  • Oracle's SPARC M7-8 server was able to deliver 3.2 times faster per chip performance than a two-chip x86 E5 v3 server running a PageRank algorithm implemented using Parallel Graph AnalytiX (PGX) from Oracle Labs on a medium sized graph.

Performance Landscape

The graph used for these results has 41,652,230 nodes and 1,468,365,182 edges, using 22 GB of memory. All of the following results were run as part of this benchmark effort. Performance is a measure of processing rate; bigger is better.

PageRank Algorithm

Server                                        Performance   SPARC Advantage
SPARC M7-8                                    281.1         3.2x faster per chip
  8 x SPARC M7 (4.13 GHz, 8x 32core)
x86 E5 v3 server                              22.2          1.0
  2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)

The number of cores shown is per processor.

Configuration Summary

Systems Under Test:

SPARC M7-8 server with
4 x SPARC M7 processors (4.13 GHz)
4 TB memory
Oracle Solaris 11.3
Oracle Solaris Studio 12.4

Oracle Server X5-2 with
2 x Intel Xeon Processor E5-2699 v3 (2.3 GHz)
384 GB memory
Oracle Linux
gcc 4.7.4

Benchmark Description

Graphs are a core part of many analytics workloads. They are very data intensive and stress computers. Each algorithm typically traverses the entire graph multiple times, performing arithmetic operations (single- or double-precision floating point) during the traversal.

The mathematics of PageRank are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. The PageRank algorithm counts the number and quality of links to a page to produce a rough estimate of the importance of a website.
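The PGX implementation used in these results is not shown here, but the basic power-iteration form of PageRank can be sketched in Python (illustrative only; the damping factor and iteration count are conventional choices, not taken from the benchmark):

```python
def pagerank(edges, n, damping=0.85, iters=50):
    """Power-iteration PageRank over a directed graph given as
    a list of (src, dst) edges on nodes 0..n-1."""
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # every node gets the teleport term
        new = [(1.0 - damping) / n] * n
        # distribute each node's rank along its out-edges
        for s, d in edges:
            new[d] += damping * rank[s] / out_deg[s]
        # dangling nodes (no out-edges) spread their rank uniformly
        dangling = sum(rank[i] for i in range(n) if out_deg[i] == 0)
        new = [r + damping * dangling / n for r in new]
        rank = new
    return rank

# tiny example: node 2 is linked by both 0 and 1, so it ranks highest
r = pagerank([(0, 2), (1, 2), (2, 0)], n=3)
assert max(range(3), key=lambda i: r[i]) == 2
```

The benchmarked implementation traverses a 1.5-billion-edge graph with this same access pattern, which is what makes memory bandwidth and parallel traversal speed dominate.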

Key Points and Best Practices

  • This algorithm is implemented using PGX (Parallel Graph AnalytiX) from Oracle Labs, a fast, parallel, in-memory graph analytic framework.
  • The graph used for these results is based on real world data from Twitter and has 41,652,230 nodes and 1,468,365,182 edges using 22 GB of memory.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of October 25, 2015.

Graph Breadth First Search Algorithm: SPARC T7-4 Beats 4-Chip x86 E7 v2

Graph algorithms are used in many big data and analytics workloads. Oracle's SPARC T7 processor based systems provide better performance than x86 systems with the Intel Xeon E7 v2 family of processors.

  • The SPARC T7-4 server was able to deliver 3.1x better performance than a four chip x86 server running a breadth-first search (BFS) on a large graph.

Performance Landscape

The problem is identified by "Scale" and the approximate amount of memory used. Results are listed as edge traversals in billions (ETB) per second (bigger is better). The SPARC M7 processor results were run as part of this benchmark effort. The x86 results were taken from a previous benchmark effort.

Breadth-First Search (BFS)

Scale  Dataset  ETB/sec                      Speedup
       (GB)     SPARC T7-4  x86 (4x E7 v2)   T7-4/x86
30     580      1.68        0.54             3.1x
29     282      1.76        0.62             2.8x
28     140      1.70        0.99             1.7x
27     70       1.56        1.07             1.5x
26     35       1.67        1.19             1.4x

Configuration Summary

Systems Under Test:

SPARC T7-4 server with
4 x SPARC M7 processors (4.13 GHz)
1 TB memory
Oracle Solaris 11.3
Oracle Solaris Studio 12.4

Sun Server X4-4 system with
4 x Intel Xeon E7-8895 v2 processors (2.8 GHz)
1 TB memory
Oracle Solaris 11.2
Oracle Solaris Studio 12.4

Benchmark Description

Graphs are a core part of many analytics workloads. They are very data intensive and stress computers. This benchmark does a breadth-first search on a randomly generated graph. It reports the number of graph edges traversed (in billions) per second (ETB/sec). To generate the graph, the data generator from the graph500 benchmark was used.

A description of breadth-first search, taken from Introduction to Algorithms, page 594:

Given a graph G = (V, E) and a distinguished source vertex s, breadth-first search systematically explores the edges of G to "discover" every vertex that is reachable from s. It computes the distance (smallest number of edges) from s to each reachable vertex. It also produces a "breadth-first tree" with root s that contains all reachable vertices. For any vertex v reachable from s, the simple path in the breadth-first tree from s to v corresponds to a "shortest path" from s to v in G, that is, a path containing the smallest number of edges. The algorithm works on both directed and undirected graphs.

Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L., Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
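The procedure quoted above can be sketched in Python (an illustrative serial version, not the benchmark code, which traverses billions of edges in parallel):

```python
from collections import deque

def bfs(adj, source):
    """Breadth-first search on an adjacency-list graph.
    Returns dist[v] = smallest number of edges from source to v
    (-1 if unreachable), parent[] encoding the breadth-first tree,
    and the number of edges traversed."""
    n = len(adj)
    dist = [-1] * n
    parent = [-1] * n
    dist[source] = 0
    queue = deque([source])
    edges_traversed = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            edges_traversed += 1
            if dist[v] == -1:       # first time v is discovered
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent, edges_traversed

# path 0-1-2 plus an isolated node 3 (undirected, so edges appear twice)
adj = [[1], [0, 2], [1], []]
dist, parent, _ = bfs(adj, 0)
assert dist == [0, 1, 2, -1]
```

The benchmark's ETB/sec metric corresponds to the `edges_traversed` count divided by elapsed time, at graph sizes of up to 2^30 vertices.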

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of October 25, 2015.

Neural Network Models Using Oracle R Enterprise: SPARC T7-4 Beats 4-Chip x86 E7 v3

Oracle's SPARC T7-4 server executing neural network algorithms using Oracle R Enterprise (ORE) is up to two times faster than a four-chip x86 E7 v3 server.

  • For a neural network with two hidden layers, 10-neuron with 5-neuron hyperbolic tangent, the SPARC T7-4 server is 1.5 times faster than a four-chip x86 E7 v3 server on calculation time.

  • For a neural network with two hidden layers, 20-neuron with 10-neuron hyperbolic tangent, the SPARC T7-4 server is 2.0 times faster than a four-chip x86 E7 v3 server on calculation time.

Performance Landscape

Oracle R Enterprise Statistics in Oracle Database
(250 million rows)

Neural Network                              Elapsed Calculation Time        SPARC
with Two Hidden Layers                      4-chip x86 E7 v3   SPARC T7-4   Advantage
10-neuron + 5-neuron hyperbolic tangent     520.1 sec          337.3 sec    1.5x
20-neuron + 10-neuron hyperbolic tangent    1128.4 sec         578.1 sec    2.0x

Configuration Summary

SPARC Configuration:

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB dimms)
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition
Oracle R Enterprise 1.5
Oracle Solaris Studio 12.4 with 4/15 patch set

x86 Configuration:

Oracle Server X5-4
4 x Intel Xeon Processor E7-8895 v3 (2.6 GHz)
512 GB memory
Oracle Linux 6.4
Oracle Database 12c Enterprise Edition
Oracle R Enterprise 1.5

Storage Configuration:

Oracle Server X5-2L
2 x Intel Xeon Processor E5-2699 v3
512 GB memory
4 x 1.6 TB 2.5-inch NVMe PCIe 3.0 SSD
2 x Sun Storage Dual 16Gb FC PCIe HBA
Oracle Solaris 11.3

Benchmark Description

The benchmark is designed to run various statistical analyses using Oracle R Enterprise (ORE) with historical aviation data. The benchmark data is about 35 GB, a single table holding 250 million rows. A neural network, one of the most popular algorithms, was run against the dataset to generate comparable results.

The neural network algorithms support various configurations. In this workload, two were used: a neural net with two hidden layers of 10 and 5 neurons with hyperbolic tangent activation, and a neural net with two hidden layers of 20 and 10 neurons with hyperbolic tangent activation.
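ORE's own API is not shown here. As a purely illustrative sketch of the network shape being benchmarked (two tanh hidden layers of 10 and 5 neurons; the weights, input width, and initialization are made up for the example), a forward pass looks like:

```python
import math
import random

def tanh_layer(x, weights, biases):
    """One fully connected layer with hyperbolic tangent activation."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Propagate input x through a list of (weights, biases) layers."""
    for weights, biases in layers:
        x = tanh_layer(x, weights, biases)
    return x

rng = random.Random(0)
def init(n_out, n_in):
    """Random weights and zero biases for an n_in -> n_out layer."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)],
            [0.0] * n_out)

# two hidden layers: 10 tanh neurons, then 5 tanh neurons, then 1 output
layers = [init(10, 8), init(5, 10), init(1, 5)]
y = forward([0.1] * 8, layers)
assert len(y) == 1 and -1.0 < y[0] < 1.0
```

Training such a network over 250 million rows repeats passes like this (plus backpropagation) across the full table, which is why elapsed calculation time scales with both compute throughput and data bandwidth.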

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

SPECjEnterprise2010: SPARC T7-1 World Record with Single Application Server Using 1 to 4 Chips

Oracle's SPARC T7-1 servers have set a world record for the SPECjEnterprise2010 benchmark for solutions using a single application server with one to four chips. The result of 25,818.85 SPECjEnterprise2010 EjOPS used two SPARC T7-1 servers, one server for the application tier and the other server for the database tier.

  • The SPARC T7-1 servers obtained a result of 25,093.06 SPECjEnterprise2010 EjOPS using encrypted data. This secured result used Oracle Advanced Security Transparent Data Encryption (TDE) for the application database tablespaces with the AES-256-CFB cipher. The network connection between the application server and the database server was also encrypted using secure JDBC.

  • The SPARC T7-1 server solution delivered 34% more performance compared to the two-chip IBM x3650 M5 server result of 19,282.14 SPECjEnterprise2010 EjOPS.

  • The SPARC T7-1 server solution delivered 14% more performance compared to the four-chip IBM Power System S824 server result of 22,543.34 SPECjEnterprise2010 EjOPS.

  • The SPARC T7-1 server based results demonstrated 20% more performance compared to the Oracle Server X5-2 system result of 21,504.30 SPECjEnterprise2010 EjOPS. Oracle holds the top x86 two-chip application server SPECjEnterprise2010 result.

  • The application server used Oracle Fusion Middleware components including the Oracle WebLogic 12.1 application server and Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.8.0_60. The database server was configured with Oracle Database 12c Release 1.

  • For the secure result, the application data was encrypted in the Oracle database using the Oracle Advanced Security Transparent Data Encryption (TDE) feature. Hardware accelerated cryptography support in the SPARC M7 processor for the AES-256-CFB cipher was used to provide data security.

  • The performance impact of the secure SPARC T7-1 server configuration with encryption was less than 3% compared to the peak result.

  • This result demonstrated average response times of less than 1 second for all SPECjEnterprise2010 transactions and represents Java EE 5.0 transactions generated by over 210,000 users.

Performance Landscape

Select single application server results. Complete benchmark results are at the SPEC website, SPECjEnterprise2010 Results.

SPECjEnterprise2010 Performance Chart
as of 10/25/2015

Oracle: 25,818.85 EjOPS*
  Java EE Server: 1 x SPARC T7-1 (1 x 4.13 GHz SPARC M7), Oracle WebLogic 12c (12.1.3)
  DB Server: 1 x SPARC T7-1 (1 x 4.13 GHz SPARC M7), Oracle Database 12c (12.1.0.2)

Oracle: 25,093.06 EjOPS* (Secure)
  Java EE Server: 1 x SPARC T7-1 (1 x 4.13 GHz SPARC M7), Oracle WebLogic 12c (12.1.3), Network Data Encryption for JDBC
  DB Server: 1 x SPARC T7-1 (1 x 4.13 GHz SPARC M7), Oracle Database 12c (12.1.0.2), Transparent Data Encryption

IBM: 22,543.34 EjOPS*
  Java EE Server: 1 x IBM Power S824 (4 x 3.5 GHz POWER8), WebSphere Application Server V8.5
  DB Server: 1 x IBM Power S824 (4 x 3.5 GHz POWER8), IBM DB2 10.5 FP3

Oracle: 21,504.30 EjOPS* (COD)
  Java EE Server: 1 x Oracle Server X5-2 (2 x 2.3 GHz Intel Xeon E5-2699 v3), Oracle WebLogic 12c (12.1.3)
  DB Server: 1 x Oracle Server X5-2 (2 x 2.3 GHz Intel Xeon E5-2699 v3), Oracle Database 12c (12.1.0.2)

IBM: 19,282.14 EjOPS*
  Java EE Server: 1 x System x3650 M5 (2 x 2.6 GHz Intel Xeon E5-2697 v3), WebSphere Application Server V8.5
  DB Server: 1 x System x3850 X6 (4 x 2.8 GHz Intel Xeon E7-4890 v2), IBM DB2 10.5 FP5

* SPECjEnterprise2010 EjOPS (bigger is better)

The Cluster on Die (COD) mode is a BIOS setting that effectively splits each chip in half, making the operating system think it has twice as many chips as it does (in this case, four 9-core chips). Intel has stated that COD is appropriate only for highly NUMA-optimized workloads. Dell has shown that bandwidth to the other half of a chip split by COD is 3.7x slower.

Configuration Summary

Application Server:

1 x SPARC T7-1 server, with
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
2 x 600 GB SAS HDD
2 x 400 GB SAS SSD
3 x Sun Dual Port 10 GbE PCIe 2.0 Networking card with Intel 82599 10 GbE Controller
Oracle Solaris 11.3 (11.3.0.0.30)
Oracle WebLogic Server 12c (12.1.3)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.8.0_60

Database Server:

1 x SPARC T7-1 server, with
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB)
2 x 600 GB SAS HDD
1 x Sun Dual Port 10 GbE PCIe 2.0 Networking card with Intel 82599 10 GbE Controller
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
Oracle Solaris 11.3 (11.3.0.0.30)
Oracle Database 12c (12.1.0.2)

Storage Servers:

1 x Oracle Server X5-2L (8-Drive), with
2 x Intel Xeon Processor E5-2699 v3 (2.3 GHz)
32 GB memory
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
4 x 1.6 TB NVMe SSD
2 x 600 GB SAS HDD
Oracle Solaris 11.3 (11.3.0.0.30)
1 x Oracle Server X5-2L (24-Drive), with
2 x Intel Xeon Processor E5-2699 v3 (2.3 GHz)
32 GB memory
1 x Sun Storage 16 Gb Fibre Channel Universal HBA
14 x 600 GB SAS HDD
Oracle Solaris 11.3 (11.3.0.0.30)

1 x Brocade 6510 16 Gb FC switch

Benchmark Description

SPECjEnterprise2010 is the third generation of the SPEC organization's J2EE end-to-end industry standard benchmark application. The SPECjEnterprise2010 benchmark has been designed and developed to cover the Java EE 5 specification's significantly expanded and simplified programming model, highlighting the major features used by developers in the industry today. This provides a real-world workload driving the Application Server's implementation of the Java EE specification to its maximum potential and allowing maximum stressing of the underlying hardware and software systems, exercising:

  • The web zone, servlets, and web services
  • The EJB zone
  • JPA 1.0 Persistence Model
  • JMS and Message Driven Beans
  • Transaction management
  • Database connectivity
Moreover, SPECjEnterprise2010 also heavily exercises all parts of the underlying infrastructure that make up the application environment, including hardware, JVM software, database software, JDBC drivers, and the system network.

The primary metric of the SPECjEnterprise2010 benchmark is jEnterprise Operations Per Second (SPECjEnterprise2010 EjOPS). The primary metric for the SPECjEnterprise2010 benchmark is calculated by adding the metrics of the Dealership Management Application in the Dealer Domain and the Manufacturing Application in the Manufacturing Domain. There is NO price/performance metric in this benchmark.

Key Points and Best Practices

  • Four Oracle WebLogic server instances on the SPARC T7-1 server were hosted in 4 separate Oracle Solaris Zones.
  • The Oracle WebLogic application servers were executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • The Oracle log writer process was run in the RT scheduling class.

See Also

Disclosure Statement

SPEC and the benchmark name SPECjEnterprise are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 10/25/2015. SPARC T7-1, 25,818.85 SPECjEnterprise2010 EjOPS (unsecure); SPARC T7-1, 25,093.06 SPECjEnterprise2010 EjOPS (secure); Oracle Server X5-2, 21,504.30 SPECjEnterprise2010 EjOPS (unsecure); IBM Power S824, 22,543.34 SPECjEnterprise2010 EjOPS (unsecure); IBM x3650 M5, 19,282.14 SPECjEnterprise2010 EjOPS (unsecure);

AES Encryption: SPARC T7-2 Beats x86 E5 v3

Oracle's cryptography benchmark measures security performance on important AES security modes. Oracle's SPARC M7 processor with its security software in silicon is faster than x86 servers that have the AES-NI instructions. In this test, the performance of on-processor encryption operations is measured (32 KB encryptions). Multiple threads are used to measure each processor's maximum throughput. Oracle's SPARC T7-2 server shows dramatically faster encryption compared to current x86 two-processor servers.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 4.0 times faster executing AES-CFB 256-bit key encryption (in cache) than Intel Xeon E5-2699 v3 processors (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 3.7 times faster executing AES-CFB 128-bit key encryption (in cache) than Intel Xeon E5-2699 v3 processors (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 6.4 times faster executing AES-CFB 256-bit key encryption (in cache) than the Intel Xeon E5-2697 v2 processors (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 6.0 times faster executing AES-CFB 128-bit key encryption (in cache) than the Intel Xeon E5-2697 v2 processors (with AES-NI) running Oracle Linux 6.5.

  • AES-CFB encryption is used by Oracle Database for Transparent Data Encryption (TDE) which provides security for database storage.

Oracle has also measured SHA digest performance on the SPARC M7 processor.

Performance Landscape

Presented below are results for running encryption using the AES cipher with the CFB, CBC, GCM and CCM modes for key sizes of 128, 192 and 256 bits. Decryption performance was similar and is not presented. Results are presented as MB/sec (10**6). All SPARC M7 processor results were run as part of this benchmark effort. All other results were run during previous benchmark efforts.

Encryption Performance – AES-CFB (used by Oracle Database)

Performance is presented for in-cache AES-CFB128 mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (the same data for each run).

AES-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CFB
SPARC M7 4.13 2 126,948 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 53,794 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 31,924 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 19,964 Oracle Linux 6.5, IPP/AES-NI
AES-192-CFB
SPARC M7 4.13 2 144,299 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 60,736 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 37,157 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 23,218 Oracle Linux 6.5, IPP/AES-NI
AES-128-CFB
SPARC M7 4.13 2 166,324 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 68,691 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 44,388 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 27,755 Oracle Linux 6.5, IPP/AES-NI

Encryption Performance – AES-CBC

Performance is presented for in-cache AES-CBC mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (the same data for each run).

AES-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CBC
SPARC M7 4.13 2 134,278 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 56,788 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 31,894 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 19,961 Oracle Linux 6.5, IPP/AES-NI
AES-192-CBC
SPARC M7 4.13 2 152,961 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 63,937 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 37,021 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 23,224 Oracle Linux 6.5, IPP/AES-NI
AES-128-CBC
SPARC M7 4.13 2 175,151 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 72,870 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 44,103 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 27,730 Oracle Linux 6.5, IPP/AES-NI

Encryption Performance – AES-GCM (used by ZFS Filesystem)

Performance is presented for in-cache AES-GCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (the same data for each run).

AES-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-GCM
SPARC M7 4.13 2 74,221 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 34,022 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 15,338 Oracle Solaris 11.1, libsoftcrypto + libumem
AES-192-GCM
SPARC M7 4.13 2 81,448 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 36,820 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 15,768 Oracle Solaris 11.1, libsoftcrypto + libumem
AES-128-GCM
SPARC M7 4.13 2 86,223 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 38,845 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 16,405 Oracle Solaris 11.1, libsoftcrypto + libumem

Encryption Performance – AES-CCM (alternative used by ZFS Filesystem)

Performance is presented for in-cache AES-CCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (the same data for each run).

AES-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CCM
SPARC M7 4.13 2 67,669 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 28,909 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 19,447 Oracle Linux 6.5, IPP/AES-NI
AES-192-CCM
SPARC M7 4.13 2 77,711 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 33,116 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 22,634 Oracle Linux 6.5, IPP/AES-NI
AES-128-CCM
SPARC M7 4.13 2 90,729 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 38,529 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 26,951 Oracle Linux 6.5, IPP/AES-NI

Configuration Summary

SPARC T7-2 server
2 x SPARC M7 processor, 4.13 GHz
1 TB memory
Oracle Solaris 11.3

SPARC T5-2 server
2 x SPARC T5 processor, 3.60 GHz
512 GB memory
Oracle Solaris 11.2

Oracle Server X5-2 system
2 x Intel Xeon E5-2699 v3 processors, 2.30 GHz
256 GB memory
Oracle Linux 6.5

Sun Server X4-2 system
2 x Intel Xeon E5-2697 v2 processors, 2.70 GHz
256 GB memory
Oracle Linux 6.5

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-cache and on-chip using various ciphers, including AES-128-CFB, AES-192-CFB, AES-256-CFB, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CCM, AES-192-CCM, AES-256-CCM, AES-128-GCM, AES-192-GCM and AES-256-GCM.

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various ciphers. They were run using optimized libraries for each platform to obtain the best possible performance. The encryption tests were run with pseudo-random data of size 32 KB. The benchmark tests were designed to run from cache, so memory bandwidth and latency are not limiting factors.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

SHA Digest Encryption: SPARC T7-2 Beats x86 E5 v3

Oracle's cryptography benchmark measures security performance on important Secure Hash Algorithm (SHA) functions. Oracle's SPARC M7 processor with its security software in silicon is faster than current and recent x86 servers. In this test, the performance of on-processor digest operations is measured for three sizes of plaintext inputs (64, 1024 and 8192 bytes) using three SHA2 digests (SHA512, SHA384, SHA256) and the older, weaker SHA1 digest. Multiple parallel threads are used to measure each processor's maximum throughput. Oracle's SPARC T7-2 server shows dramatically faster digest computation compared to current x86 two processor servers.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 17 times faster computing multiple parallel SHA512 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon E5-2699 v3 processors running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 14 times faster computing multiple parallel SHA256 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon E5-2699 v3 processors running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 4.8 times faster computing multiple parallel SHA1 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon E5-2699 v3 processors running Oracle Linux 6.5.

  • SHA1 and SHA2 operations are an integral part of Oracle Solaris, while on Linux they are performed using the add-on Cryptography for Intel Integrated Performance Primitives for Linux (library).

Oracle has also measured AES (CFB, GCM, CCM, CBC) cryptographic performance on the SPARC M7 processor.

Performance Landscape

Presented below are results for computing SHA1, SHA256, SHA384 and SHA512 digests for input plaintext sizes of 64, 1024 and 8192 bytes. Results are presented as MB/sec (10**6). All SPARC M7 processor results were run as part of this benchmark effort. All other results were run during previous benchmark efforts.

Digest Performance – SHA512

Performance is presented for SHA512 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 39,201 167,072 184,944
2 x SPARC T5, 3.6 GHz 18,717 73,810 78,997
2 x Intel Xeon E5-2699 v3, 2.3 GHz 3,949 9,214 10,681
2 x Intel Xeon E5-2697 v2, 2.7 GHz 2,681 6,631 7,701

Digest Performance – SHA384

Performance is presented for SHA384 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 39,697 166,898 185,194
2 x SPARC T5, 3.6 GHz 18,814 73,770 78,997
2 x Intel Xeon E5-2699 v3, 2.3 GHz 4,061 9,263 10,678
2 x Intel Xeon E5-2697 v2, 2.7 GHz 2,774 6,669 7,706

Digest Performance – SHA256

Performance is presented for SHA256 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 45,148 113,648 119,929
2 x SPARC T5, 3.6 GHz 21,140 49,483 51,114
2 x Intel Xeon E5-2699 v3, 2.3 GHz 3,446 7,785 8,463
2 x Intel Xeon E5-2697 v2, 2.7 GHz 2,404 5,570 6,037

Digest Performance – SHA1

Performance is presented for SHA1 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 47,640 92,515 97,545
2 x SPARC T5, 3.6 GHz 21,052 40,107 41,584
2 x Intel Xeon E5-2699 v3, 2.3 GHz 6,677 18,165 20,405
2 x Intel Xeon E5-2697 v2, 2.7 GHz 4,649 13,245 14,842
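
The headline multiples in the summary bullets can be reproduced from the 8192-byte columns of the tables above; a quick sanity check in Python (values transcribed from the tables):

```python
# Throughput in MB/sec for 8192-byte inputs, transcribed from the tables above
# (2 x SPARC M7 vs 2 x Intel Xeon E5-2699 v3).
m7 = {"SHA512": 184_944, "SHA256": 119_929, "SHA1": 97_545}
xeon = {"SHA512": 10_681, "SHA256": 8_463, "SHA1": 20_405}

for algo in m7:
    print(f"{algo}: {m7[algo] / xeon[algo]:.1f}x")  # 17.3x, 14.2x, 4.8x
```

These ratios match the 17x, 14x and 4.8x claims in the summary bullets.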

Configuration Summary

SPARC T7-2 server
2 x SPARC M7 processor, 4.13 GHz
1 TB memory
Oracle Solaris 11.3

SPARC T5-2 server
2 x SPARC T5 processor, 3.60 GHz
512 GB memory
Oracle Solaris 11.2

Oracle Server X5-2 system
2 x Intel Xeon E5-2699 v3 processors, 2.30 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Sun Server X4-2 system
2 x Intel Xeon E5-2697 v2 processors, 2.70 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-cache and on-chip using various digests, including SHA1 and SHA2 (SHA256, SHA384, SHA512).

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the digests. They were run using optimized libraries for each platform to obtain the best possible performance. The tests were run with pseudo-random data of sizes 64 bytes, 1024 bytes and 8192 bytes. The tests were designed so that the input data stays resident in cache, so memory bandwidth and latency are not the limiting factors.
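
The shape of such a measurement loop can be sketched in ordinary Python with hashlib. This is an illustrative, single-threaded approximation only, not the Oracle test harness, which ran many parallel threads against platform-optimized libraries; the function and parameter names are our own:

```python
import hashlib
import os
import time

# Illustrative, single-threaded sketch of the measurement loop. The actual
# Oracle tests ran many parallel threads against platform-optimized crypto
# libraries; names here are assumptions, not the benchmark's own.
def digest_rate(algo: str, size: int, iters: int = 20_000) -> float:
    data = os.urandom(size)           # pseudo-random input, fixed per run
    hasher = getattr(hashlib, algo)   # e.g. hashlib.sha512
    start = time.perf_counter()
    for _ in range(iters):
        hasher(data).digest()
    secs = time.perf_counter() - start
    return size * iters / secs / 1e6  # MB/sec, decimal (10^6 bytes), as in the tables

if __name__ == "__main__":
    for algo in ("sha1", "sha256", "sha384", "sha512"):
        for size in (64, 1024, 8192):
            print(f"{algo:>6} {size:>4}B: {digest_rate(algo, size):10,.1f} MB/sec")
```

As in the tables above, per-digest throughput grows with input size because the fixed per-call overhead is amortized over more bytes.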

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

SPECvirt_sc2013: SPARC T7-2 World Record for 2 and 4 Chip Systems

Oracle has had a new result accepted by SPEC as of November 19, 2015. This new result may be found here.

Oracle's SPARC T7-2 server delivered a world record SPECvirt_sc2013 result for systems with two to four chips.

  • The SPARC T7-2 server produced a result of 3026 @ 168 VMs SPECvirt_sc2013.

  • The two-chip SPARC T7-2 server beat the best two-chip x86 Intel E5-2699 v3 server results by nearly 1.9 times (Huawei FusionServer RH2288H V3, HP ProLiant DL360 Gen9).

  • The two-chip SPARC T7-2 server delivered nearly 2.2 times the performance of the four-chip IBM Power System S824 server solution which used 3.5 GHz POWER8 six core chips.

  • The SPARC T7-2 server result, run with the Oracle Solaris 11.3 operating system, used Oracle VM Server for SPARC 3.3 and Oracle Solaris Zones, built-in virtualization technologies that provide a low-overhead, flexible, scalable and manageable virtualization environment.

Performance Landscape

Complete benchmark results are at the SPEC website, SPECvirt_sc2013 Results. The following table highlights the leading two- and four-chip results for the benchmark; bigger is better.

SPECvirt_sc2013
Leading Two to Four-Chip Results
System
Processor
Chips Result @ VMs Virtualization Software
SPARC T7-2
SPARC M7 (4.13 GHz, 32core)
2 3026 @ 168 Oracle VM Server for SPARC 3.3
Oracle Solaris Zones
HP DL580 Gen9
Intel E7-8890 v3 (2.5 GHz, 18core)
4 3020 @ 168 Red Hat Enterprise Linux 7.1 KVM
Lenovo System x3850 X6
Intel E7-8890 v3 (2.5 GHz, 18core)
4 2655 @ 147 Red Hat Enterprise Linux 6.6 KVM
Huawei FusionServer RH2288H V3
Intel E5-2699 v3 (2.3 GHz, 18core)
2 1616 @ 95 Huawei FusionSphere V1R5C10
HP DL360 Gen9
Intel E5-2699 v3 (2.3 GHz, 18core)
2 1614 @ 95 Red Hat Enterprise Linux 7.1 KVM
IBM Power S824
POWER8 (3.5 GHz, 6core)
4 1370 @ 79 PowerVM Enterprise Edition 2.2.3
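
The per-system multiples quoted in the summary bullets follow directly from this table; a quick check (scores transcribed from the table above):

```python
t7_2 = 3026  # SPARC T7-2, 2 chips
others = {
    "Huawei FusionServer RH2288H V3 (2 chips)": 1616,
    "HP ProLiant DL360 Gen9 (2 chips)": 1614,
    "IBM Power S824 (4 chips)": 1370,
}
for name, score in others.items():
    print(f"SPARC T7-2 vs {name}: {t7_2 / score:.2f}x")  # 1.87x, 1.87x, 2.21x
```

These reproduce the "nearly 1.9 times" and "nearly 2.2 times" claims above.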

Configuration Summary

System Under Test Highlights:

Hardware:
1 x SPARC T7-2 server, with
2 x 4.13 GHz SPARC M7
1 TB memory
2 Sun Dual Port 10GBase-T Adapter
2 Sun Storage Dual 16 Gb Fibre Channel PCIe Universal HBA

Software:
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.3 (LDom)
Oracle Solaris Zones
Oracle iPlanet Web Server 7.0.20
Oracle PHP 5.3.29
Dovecot v2.2.18
Oracle WebLogic Server Standard Edition Release 10.3.6
Oracle Database 12c Enterprise Edition (12.1.0.2.0)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.7.0_85-b15

Storage:
3 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
Oracle Solaris 11.3

1 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
4x 400 GB SSDs
Oracle Solaris 11.3

Benchmark Description

SPECvirt_sc2013 is SPEC's updated benchmark addressing performance evaluation of datacenter servers used in virtualized server consolidation. SPECvirt_sc2013 measures the end-to-end performance of all system components including the hardware, virtualization platform, and the virtualized guest operating system and application software. It utilizes several SPEC workloads representing applications that are common targets of virtualization and server consolidation. The workloads were made to match a typical server consolidation scenario of CPU resource requirements, memory, disk I/O, and network utilization for each workload. These workloads are modified versions of SPECweb2005, SPECjAppServer2004, SPECmail2008, and SPEC CPU2006. The client-side SPECvirt_sc2013 harness controls the workloads. Scaling is achieved by running additional sets of virtual machines, called "tiles", until overall throughput reaches a peak.

Key Points and Best Practices

  • The SPARC T7-2 server, running Oracle Solaris 11.3, uses built-in virtualization technologies such as Oracle VM Server for SPARC and Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • In order to provide a high level of data integrity and availability, all the benchmark data sets are stored on mirrored (RAID1) storage.

  • Oracle VM Server for SPARC was used to bind each SPARC M7 processor to its local memory, which optimized system memory use in this virtual environment.

See Also

Disclosure Statement

SPEC and the benchmark name SPECvirt_sc are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 10/25/2015. SPARC T7-2, SPECvirt_sc2013 3026@168 VMs; HP DL580 Gen9, SPECvirt_sc2013 3020@168 VMs; Lenovo x3850 X6; SPECvirt_sc2013 2655@147 VMs; Huawei FusionServer RH2288H V3, SPECvirt_sc2013 1616@95 VMs; HP ProLiant DL360 Gen9, SPECvirt_sc2013 1614@95 VMs; IBM Power S824, SPECvirt_sc2013 1370@79 VMs.

Live Migration: SPARC T7-2 Oracle VM Server for SPARC Performance

One of the features that Oracle VM Server for SPARC offers is Live Migration, which is the process of securely moving an active logical domain (LDom, Virtual Machine) between different physical machines while maintaining application services to users. Memory, storage, and network connectivity of the logical domain are transferred from the original logical domain's machine to the destination target machine with all data compressed and encrypted.

  • Oracle's Live Migration is secure by default using SSL (AES256_GCM_SHA384) to encrypt migration network traffic to protect sensitive data from exploitation and to eliminate the requirement for additional hardware and dedicated networks. Additional authentication schemes can be set up to increase security for the source and target machines. VMware vMotion and IBM PowerVM do not support Secure Live Migration by default (see below).

  • An enterprise Java workload with a 74 GB footprint in a 128 GB VM running on Oracle's SPARC T7-2 server migrated to another SPARC T7-2 server in just 95 seconds with 30 seconds suspension time to the user.

Performance Landscape

Results from moving an active workload as well as two different idle workloads. The LDom was allocated 128 GB of memory.

Mission-Critical LDom Live Migration
Benchmark Test Total Migration
Time (sec)
Data Moved
(GB)
Network Bandwidth
(MB/sec)
Enterprise Java Workload/Active 95 74.3 835.3
After Active Workload/Idle 13 1.9 236.1
Out of the Box/Idle 13 1.1 135.4

Enterprise Java Workload Performance
Test Conditions Average Operations per Second
During Live Migration 347,370
No Migration 596,914
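
From the table above, the workload retained well over half of its steady-state throughput while its domain was being migrated; a quick check (values transcribed from the table):

```python
# Average operations/sec from the Enterprise Java Workload Performance table.
during, baseline = 347_370, 596_914
print(f"Throughput retained during live migration: {during / baseline:.0%}")  # 58%
```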

Configuration Summary

2 x SPARC T7-2
2 x SPARC M7 processors (4.13 GHz)
512 GB memory (32 x 16 GB DDR4-2133 DIMMs)
6 x 600 GB 10K RPM SAS-2 HDD
10 GbE (built-in network device)
Oracle Solaris 11.3 (11.3.0.26.0)
Oracle VM Server for SPARC (LDoms v3.3.0.0 Integration 17)

The configuration of the LDoms on the source machine is:

Source Machine Configuration
LDom vcpus Memory
Primary/control 128 (16-cores) 128 GB
Guest0 128 (16-cores) 110 GB
Guest1 (Migration) 128 (16-cores) 128 GB
Guest2 128 (16-cores) 110 GB

The configuration of the LDoms on the target machine is:

Target Machine Configuration
LDom vcpus Memory
Primary/control 128 (16-cores) 128 GB

Benchmark Description

By running a Java workload on a logical domain and then starting a Live Migration of that domain to a target machine, the major performance metrics of live migration can be measured:

  • Total Migration Time (the total time it takes to migrate a logical domain).
  • Effect on Application Performance (how much an application's performance degrades because it is being migrated).

Three logical domains (Guest0, Guest1, Guest2) were configured on the source machine to represent a more realistic environment in which all of the source machine's resources (vcpus and memory) are in use, with the same Java workload running on each LDom.

Three different experiments are run:

  • Enterprise Java Workload/Active: starting the same Java workload at the same time on three logical domains (Guest0, Guest1, and Guest2), the Live Migration of Guest1 is executed after an arbitrary amount of time.
  • After Active Workload/Idle: after running a Java workload on three logical domains (Guest0, Guest1, and Guest2), so the memory of each has been touched, and no workload is running on any of them, the Live Migration of Guest1 is executed.
  • Out of the Box/Idle: as soon as the three logical domains are installed or rebooted (Guest0, Guest1, and Guest2) with Oracle Solaris and no workload is running on any of them, the Live Migration of Guest1 is executed.

Key Points and Best Practices

  • The network interconnection between the primaries on source and target machines is 10 GbE built-in network device configured to use Jumbo Frames (MTU=9000) in order to get higher bandwidth during the live migration.

  • The Enterprise Java Workload Performance on the non-migrated logical domains (Guest0, Guest2) was not affected before, during, and after the live migration of Guest1.

  • IBM PowerVM does not support Secure Live Migration by default; IBM's technology is named Live Partition Mobility, and details can be found in Cloud Security Guidelines for IBM Power Systems, January 2015, p. 89, section 4.10.1, "Live Partition Mobility".

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

ZFS Encryption: SPARC T7-1 Performance

Oracle's SPARC T7-1 server can encrypt/decrypt at near clear text throughput. The SPARC T7-1 server can encrypt/decrypt on the fly and have CPU cycles left over for the application.

  • The SPARC T7-1 server performed 475,123 clear-text 8K read IOPS. With AES-256-CCM enabled on the file system, 8K read IOPS only dropped 3.2% to 461,038.

  • The SPARC T7-1 server performed 461,038 AES-256-CCM 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 224,360 AES-256-CCM 8K read IOPS. The SPARC M7 processor result is 4.1 times faster per chip.

  • The SPARC T7-1 server performed 460,600 AES-192-CCM 8K read IOPS and a two chip x86 E5-2660 v3 server performed 228,654 AES-192-CCM 8K read IOPS. The SPARC M7 processor result is 4.0 times faster per chip.

  • The SPARC T7-1 server performed 465,114 AES-128-CCM 8K read IOPS and a two chip x86 E5-2660 v3 server performed 231,911 AES-128-CCM 8K read IOPS. The SPARC M7 processor result is 4.0 times faster per chip.

  • The SPARC T7-1 server performed 475,123 clear text 8K read IOPS and a two chip x86 E5-2660 v3 server performed 438,483 clear text 8K read IOPS. The SPARC M7 processor result is 2.2 times faster per chip.

Performance Landscape

Results presented below are for random read performance for 8K size. All of the following results were run as part of this benchmark effort.

Read Performance – 8K
Encryption SPARC T7-1 2 x E5-2660 v3
IOPS Resp Time % Busy IOPS Resp Time % Busy
Clear 475,123 0.8 msec 43% 438,483 0.8 msec 95%
AES-256-CCM 461,038 0.83 msec 56% 224,360 1.6 msec 97%
AES-192-CCM 460,600 0.83 msec 56% 228,654 1.5 msec 97%
AES-128-CCM 465,114 0.82 msec 57% 231,911 1.5 msec 96%
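
The per-chip multiples in the summary bullets divide the two-chip x86 result by two before comparing; a check with values transcribed from the bullets above:

```python
# 8K random-read IOPS: one-chip SPARC T7-1 vs two-chip x86 E5-2660 v3.
# The per-chip comparison halves the x86 figures.
t7 = {"clear": 475_123, "aes-256-ccm": 461_038, "aes-192-ccm": 460_600, "aes-128-ccm": 465_114}
x86 = {"clear": 438_483, "aes-256-ccm": 224_360, "aes-192-ccm": 228_654, "aes-128-ccm": 231_911}

for mode in t7:
    ratio = t7[mode] / (x86[mode] / 2)
    print(f"{mode:12s}: {ratio:.1f}x per chip")  # 2.2x, 4.1x, 4.0x, 4.0x
```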

IOPS – IO operations per second
Resp Time – response time
% Busy – percent CPU usage

Configuration Summary

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle Solaris 11.3
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Oracle Server X5-2L system
2 x Intel Xeon Processor E5-2660 V3 (2.60 GHz)
256 GB memory
Oracle Solaris 11.3
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Storage SAN
2 x Brocade 300 FC switches
2 x Sun Storage 6780 array with 64 disk drives / 16 GB Cache

Benchmark Description

The benchmark tests the performance of running an encrypted ZFS file system compared to the non-encrypted (clear text) ZFS file system. The tests were executed with Oracle's Vdbench tool Version 5.04.03. Three different encryption methods are tested, AES-256-CCM, AES-192-CCM and AES-128-CCM.

Key Points and Best Practices

  • The ZFS file system was configured with data cache disabled, metadata cache enabled, 4 pools, 128 LUNs, and 192 file systems with an 8K record size. The data cache was disabled to ensure data would be decrypted as it was read from storage. This is not a recommended setting for normal customer operations.

  • The tests were executed with Oracle's Vdbench tool against 192 file systems. Each file system was run with a queue depth of 2. The script used for testing is listed below.

  • hd=default,jvms=16
    sd=sd001,lun=/dev/zvol/rdsk/p1/vol001,size=5g,hitarea=100m
    sd=sd002,lun=/dev/zvol/rdsk/p1/vol002,size=5g,hitarea=100m
    #
    # sd003 through sd191 statements here
    #
    sd=sd192,lun=/dev/zvol/rdsk/p4/vol192,size=5g,hitarea=100m
    
    # VDBENCH work load definitions for run
    # Sequential write to fill storage.
    wd=swrite1,sd=sd*,readpct=0,seekpct=eof
    
    # Random Read work load.
    wd=rread,sd=sd*,readpct=100,seekpct=random,rhpct=100
    
    # VDBENCH Run Definitions for actual execution of load.
    rd=default,iorate=max,elapsed=3h,interval=10
    rd=seqwritewarmup,wd=swrite1,forxfersize=(1024k),forthreads=(16) 
    
    rd=default,iorate=max,elapsed=10m,interval=10
    
    rd=rread8k-50,wd=rread,forxfersize=(8k),iorate=curve, \
    curve=(95,90,80,70,60,50),forthreads=(2)
    

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Virtualized Storage: SPARC T7-1 Performance

Oracle's SPARC T7-1 server using SR-IOV enabled HBAs can achieve near native throughput. The SPARC T7-1 server, with its dramatically improved compute engine, can also achieve near native throughput with Virtual Disk (VDISK).

  • The SPARC T7-1 server produced 604,219 8K read IOPS (IO operations per second) with native Oracle Solaris 11.3 using 8 Gb FC HBAs. The SPARC T7-1 server using Oracle VM Server for SPARC 3.1 with 4 LDOM VDISK produced near-native performance of 603,766 8K read IOPS. With SR-IOV enabled using 2 LDOMs, the SPARC T7-1 server produced 604,966 8K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 delivered 2.8 times the virtualized IO throughput of a Sun Server X3-2 system (two Intel Xeon E5-2690 processors, running a popular virtualization product). The virtualized x86 system produced 209,166 8K virtualized read IOPS. The native performance of the x86 system was 338,458 8K read IOPS.

  • The SPARC T7-1 server is able to produce 891,025 4K Read IOPS with native Oracle Solaris 11.3 using 8 Gb FC HBAs. The SPARC T7-1 server using Oracle VM Server for SPARC 3.1 with 4 LDOM VDISK produced near native performance of 849,493 4K read IOPS. With SR-IOV enabled using 2 LDOMs, the SPARC T7-1 server produced 891,338 4K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 delivered 3.8 times the virtualized IO throughput of a Sun Server X3-2 system (two Intel Xeon E5-2690 processors, running a popular virtualization product). The virtualized x86 system produced 219,830 4K virtualized read IOPS. The native performance of the x86 system was 346,868 4K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 delivered 1.3 times higher throughput with 16 Gb FC HBAs compared to 8 Gb FC HBAs. This is quite impressive considering the 16 Gb HBAs were still attached to 8 Gb switches and storage.

Performance Landscape

Results presented below are for read performance for 8K size and then for 4K size. All of the following results were run as part of this benchmark effort.

Read Performance — 8K

System 8K Read IOPS Performance
Native Virtual Disk SR-IOV
SPARC T7-1 (16 Gb FC) 796,849 N/A 797,221
SPARC T7-1 (8 Gb FC) 604,219 603,766 604,966
Sun Server X3-2 (8 Gb FC) 338,458 209,166 N/A

Read Performance — 4K

System 4K Read IOPS Performance
Native Virtual Disk SR-IOV
SPARC T7-1 (16 Gb FC) 1,185,392 N/A 1,231,808
SPARC T7-1 (8 Gb FC) 891,025 849,493 891,338
Sun Server X3-2 (8 Gb FC) 346,868 219,830 N/A
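
The virtualization overhead and the 16 Gb uplift quoted above follow from the 8K table; a quick check (IOPS transcribed from the tables):

```python
# 8K read IOPS on the SPARC T7-1, transcribed from the tables above.
native_8g, vdisk_8g, sriov_8g = 604_219, 603_766, 604_966
native_16g, sriov_16g = 796_849, 797_221

print(f"VDISK overhead vs native:  {1 - vdisk_8g / native_8g:.2%}")  # well under 1%
print(f"SR-IOV vs native:          {sriov_8g / native_8g:.3f}")      # ~1.001
print(f"16 Gb vs 8 Gb FC (native): {native_16g / native_8g:.2f}x")   # ~1.32x
```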

Configuration Summary

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.1
4 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Qlogic
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Sun Server X3-2 system
2 x Intel Xeon Processor E5-2690 (2.90 GHz)
128 GB memory
Oracle Solaris 11.2
Popular Virtualization Software
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Storage SAN
Brocade 5300 Switch
2 x Sun Storage 6780 array with 64 disk drives / 16 GB Cache
2 x Sun Storage 2540-M2 arrays with 36 disk drives / 1.5 GB Cache

Benchmark Description

The benchmark tests operating system IO efficiency of native and virtual machine environments. The test accesses storage devices raw and with no operating system buffering. The storage space accessed fit within the cache controller on the storage arrays for low latency and highest throughput. All accesses were random 4K or 8K reads.

Tests were executed with Oracle's Vdbench Version 5.04.03 tool against 32 LUNs. Each LUN was run with a queue depth of 32.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Oracle Internet Directory: SPARC T7-2 World Record

Oracle's SPARC T7-2 server running Oracle Internet Directory (OID, Oracle's LDAP Directory Server) on Oracle Solaris 11 on a virtualized processor configuration achieved a record result on the Oracle Internet Directory benchmark.

  • The SPARC T7-2 server, virtualized to use a single processor, achieved world record performance running Oracle Internet Directory benchmark with 50M users.

  • The SPARC T7-2 server and Oracle Internet Directory using Oracle Database 12c running on Oracle Solaris 11 achieved record result of 1.18M LDAP searches/sec with an average latency of 0.85 msec with 1000 clients.

  • The SPARC T7 server demonstrated 25% better throughput and 23% better latency for LDAP searches/sec than a similarly configured SPARC T5 server benchmark environment.

  • Oracle Internet Directory achieved near-linear scalability on the virtualized single processor domain on the SPARC T7-2 server, from 79K LDAP searches/sec with 2 cores to 1.18M LDAP searches/sec with 32 cores.

  • Oracle Internet Directory and the virtualized single processor domain on the SPARC T7-2 server achieved up to 22,408 LDAP modify/sec with an average latency of 2.23 msec for 50 clients.

Performance Landscape

A virtualized single SPARC M7 processor in a SPARC T7-2 server was used for the test results presented below. The SPARC T7-2 server and SPARC T5-2 server results were run as part of this benchmark effort. The remaining results were part of a previous benchmark effort.

Oracle Internet Directory Tests
System chips/
cores
Search Modify Add
ops/sec lat (msec) ops/sec lat (msec) ops/sec lat (msec)
SPARC T7-2 1/32 1,177,947 0.85 22,400 2.2 1,436 11.1
SPARC T5-2 2/32 944,624 1.05 16,700 2.9 1,000 15.95
SPARC T4-4 4/32 682,000 1.46 12,000 4.0 835 19.0

Scaling runs were also made on the virtualized single processor domain on the SPARC T7-2 server.

Scaling of Search Tests – SPARC T7-2, One Processor
Cores Clients ops/sec Latency (msec)
32 1000 1,177,947 0.85
24 1000 863,343 1.15
16 500 615,563 0.81
8 500 280,029 1.78
4 100 156,114 0.64
2 100 79,300 1.26
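
The near-linear scaling claim can be checked against this table: efficiency relative to the 2-core result stays at roughly 90% or better across the range. A quick check (values transcribed from the table above):

```python
# LDAP searches/sec by core count, from the scaling table above.
scaling = {2: 79_300, 4: 156_114, 8: 280_029, 16: 615_563, 24: 863_343, 32: 1_177_947}
base_cores, base_ops = 2, scaling[2]

for cores, ops in scaling.items():
    efficiency = (ops / base_ops) / (cores / base_cores)
    print(f"{cores:2d} cores: {ops:>9,} ops/sec, scaling efficiency {efficiency:.0%}")
```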

Configuration Summary

System Under Test:

SPARC T7-2
2 x SPARC M7 processors, 4.13 GHz
512 GB memory
6 x 600 GB internal disks
1 x Sun Storage ZS3-2 (used for database and log files)
Flash storage (used for redo logs)
Oracle Solaris 11.3
Oracle Internet Directory 11g Release 1 PS7 (11.1.1.7.0)
Oracle Database 12c Enterprise Edition 12.1.0.2 (64-bit)

Benchmark Description

Oracle Internet Directory (OID) is Oracle's LDAPv3 Directory Server. The throughput for five key operations is measured — Search, Compare, Modify, Mix and Add.

LDAP Search Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Search operations. The salient characteristics of this test scenario are as follows:

  • SLAMD SearchRate job was used.
  • The BaseDN of the search is the root of the DIT, the scope is SUBTREE, the search filter is of the form UID=, and DN and UID are the required attributes.
  • Each LDAP search operation matches a single entry.
  • The total number of concurrent clients was 1000, distributed across two client nodes.
  • Each client binds to OID once and performs repeated LDAP Search operations, each search operation resulting in the lookup of a unique entry in such a way that no client looks up the same entry twice and no two clients lookup the same entry and all entries are searched randomly.
  • In one run of the test, random entries from the 50 Million entries are looked up in as many LDAP Search operations.
  • Test job was run for 60 minutes.

LDAP Compare Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Compare operations on the userpassword attribute. The salient characteristics of this test scenario are as follows:

  • SLAMD CompareRate job was used.
  • Each LDAP Compare operation compares the userpassword attribute of a user.
  • The total number of concurrent clients was 1000, distributed across two client nodes.
  • Each client binds to OID once and performs repeated LDAP compare operations.
  • In one run of the test, random entries from the 50 Million entries are compared in as many LDAP compare operations.
  • Test job was run for 60 minutes.

LDAP Modify Operations Test

This test scenario consisted of concurrent clients binding once to OID and then performing repeated LDAP Modify operations. The salient characteristics of this test scenario are as follows:

  • SLAMD LDAP modrate job was used.
  • A total of 50 concurrent LDAP clients were used.
  • Each client updates a unique entry each time and a total of 50 Million entries are updated.
  • Test job was run for 60 minutes.
  • Value length was set to 11.
  • Attribute that is being modified is not indexed.

LDAP Mixed Load Test

The test scenario involved both the LDAP search and LDAP modify clients enumerated above.

  • The ratio involved 60% LDAP search clients, 30% LDAP bind and 10% LDAP modify clients.
  • A total of 1000 concurrent LDAP clients were used, distributed across 2 client nodes.
  • Test job was run for 60 minutes.

LDAP Add Load Test

The test scenario involved concurrent clients adding new entries as follows.

  • The SLAMD standard AddRate job was used.
  • A total of 500,000 entries were added.
  • A total of 16 concurrent LDAP clients were used.
  • SLAMD adds entries of the inetorgperson objectclass with 21 attributes (including operational attributes).

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

Oracle Stream Explorer DDOS Attack: SPARC T7-4 World Record

A single processor of Oracle's SPARC T7-4 server achieved a world record result running an Oracle Stream Explorer platform benchmark. The Oracle Stream Explorer platform is used to process multiple event streams to detect patterns and trends in real time. The benchmark detects malicious IP addresses that cause a distributed denial of service (DDOS) attack.

  • A single SPARC M7 processor of a SPARC T7-4 server running Oracle Stream Explorer achieved a throughput result of 1.505 million ops/sec.

  • The SPARC M7 processor achieved 2.9 times the throughput of an x86 Intel Xeon Processor E7-8895 v3 based server.

Performance Landscape

All of the following results were run as part of this benchmark effort.

Oracle Stream Explorer Throughput Test
One Processor Performance
System Throughput
SPARC T7-4 1.505 M ops/sec
Oracle Server X5-4 0.522 M ops/sec
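
The 2.9x claim in the summary bullet follows directly from this table:

```python
# Single-processor throughput from the table above, in ops/sec.
sparc_m7, xeon_e7 = 1_505_000, 522_000
print(f"SPARC M7 vs Xeon E7-8895 v3: {sparc_m7 / xeon_e7:.1f}x")  # 2.9x
```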

Configuration Summary

SPARC Server:

SPARC T7-4
4 x SPARC M7 processors
1 TB memory
Oracle Solaris 11.3
Oracle Stream Explorer 11.1.1.7 (PS6)
Oracle JDK 6

x86 Server:

Oracle Server X5-4
4 x Intel Xeon Processor E7-8895 v3
1 TB memory
Oracle Solaris 11.3
Oracle Stream Explorer 11.1.1.7 (PS6)
Oracle JDK 6

Benchmark Description

The benchmark detects malicious IP addresses that cause a distributed denial of service (DDOS) attack on a system. The benchmark determines which IP address sent the most packets. The benchmark has a dedicated load generator program for each Oracle Stream Explorer platform instance.

The Oracle Stream Explorer platform instance is always in a listening mode. When it receives data on its network socket, it starts incrementing the packet counter. Different Oracle Stream Explorer platform instances are deployed on different network sockets. The packet counter is printed out in regular intervals as the throughput for benchmarking purposes.
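
The listener structure described above (count packets arriving on a socket, report the counter at regular intervals) can be sketched as follows. This is an illustrative stand-in, not Oracle Stream Explorer code; the UDP transport, port number, and reporting interval are assumptions:

```python
import socket
import threading
import time

def packet_counter(port: int, interval: float, stop: threading.Event) -> None:
    """Count packets arriving on a UDP socket, reporting the rate periodically.

    Illustrative only: transport, port and interval are made-up details, not
    taken from the benchmark.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))
    sock.settimeout(0.1)  # short timeout so the loop can notice the stop flag
    count, last = 0, time.monotonic()
    while not stop.is_set():
        try:
            sock.recv(2048)
            count += 1
        except socket.timeout:
            pass
        now = time.monotonic()
        if now - last >= interval:
            print(f"port {port}: {count / (now - last):,.0f} packets/sec")
            count, last = 0, now
    sock.close()
```

Each Oracle Stream Explorer platform instance in the benchmark played an analogous role on its own network socket, with a dedicated load generator feeding it.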

Key Points and Best Practices

  • The load generator was run on the system under test. One processor was used for the event processing, the other processors were used for the load generation.

  • On the SPARC T7-4 server, three SPARC M7 processors were assigned the task of running the 200 load generators. This was accomplished using the psrset command.

  • On the Oracle Server X5-4 system, three Intel Xeon Processor E7-8895 v3 processors were assigned the task of running the 36 load generators.

  • Only 25 cores of the SPARC M7 processor were required to satisfy the workload. The 200 Oracle Stream Explorer applications were bound eight per core.

  • All 18 cores of the Intel Xeon Processor E7-8895 v3 were required to satisfy the workload. The 36 Oracle Stream Explorer applications were bound two per core.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

Oracle FLEXCUBE Universal Banking: SPARC T7-1 World Record

Oracle's SPARC T7-1 servers running Oracle FLEXCUBE Universal Banking Release 12 along with Oracle Database 12c Enterprise Edition with Oracle Real Application Clusters on Oracle Solaris 11 produced record results for two processor solutions.

  • Two SPARC T7-1 servers each running Oracle FLEXCUBE Universal Banking Release 12 (v 12.0.1) and Oracle Real Application Clusters 12c database on Oracle Solaris 11 achieved record End of Year batch processing of 25 million accounts with 200 branches in 4 hrs 34 minutes (total of two processors).

  • A single SPARC T7-1 server running Oracle FLEXCUBE Universal Banking Release 12 and processing 100 branches completed the workload in a similar time to the two-node, 200-branch End of Year run, demonstrating good scaling of the application.

  • The customer-representative workload covered all 25 million accounts, including savings accounts, current accounts, loans and TD accounts, created on the basis of 25 million Customer IDs across 200 branches.

  • Oracle's SPARC M7 and T7 servers running Oracle Solaris 11, with built-in Silicon Secured Memory and Oracle Database 12c, can benefit global retail and corporate financial institutions that are running Oracle FLEXCUBE Universal Banking Release 12. The co-engineered Oracle software and hardware unlock agile capabilities demanded by modern business environments.

  • The SPARC T7-1 system and Oracle Solaris are able to provide a combination of uniquely essential characteristics that resonate with core values for a modern financial services institution.

  • The SPARC M7 processor based systems are capable of delivering higher performance and lower total cost of ownership (TCO) than older SPARC infrastructure, without introducing the unseen tax and risk of migrating applications away from older SPARC systems.

Performance Landscape

Oracle FLEXCUBE Universal Banking Release 12
End of Year Batch Processing
System Branches Time in Minutes
2 x SPARC T7-1 200 274 (min)
1 x SPARC T7-1 100 268 (min)
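
The scaling observation can be quantified from this table: doubling both nodes and branches increased elapsed time by only about 2%. A quick check (values transcribed from the table above):

```python
one_node = {"branches": 100, "minutes": 268}
two_node = {"branches": 200, "minutes": 274}
# Ideal scale-out: two nodes handle twice the branches in the same elapsed time.
efficiency = one_node["minutes"] / two_node["minutes"]
print(f"Scale-out efficiency: {efficiency:.0%}")  # 98%
```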

Configuration Summary

Systems Under Test:

2 x SPARC T7-1 each with
1 x SPARC M7 processor, 4.13 GHz
256 GB memory
Oracle Solaris 11.3 (11.3.0.27.0)
Oracle Database 12c (RAC/ASM 12.1.0.2 BP7)
Oracle FLEXCUBE Universal Banking Release 12

Storage Configuration:

Oracle ZFS Storage ZS4-4 appliance

Benchmark Description

The Oracle FLEXCUBE Universal Banking Release 12 benchmark models an actual customer bank with End of Cycle transaction batch jobs which typically execute during non-banking hours. This benchmark includes accrual for savings and term deposit accounts, interest capitalization for savings accounts, interest payout for term deposit accounts and consumer loan processing.

This benchmark helps banks refine their infrastructure requirements for the volumes and scale of operations for business expansion. The end of cycle can be year, month or day, with year having the most processing followed by month and then day.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
