Thursday Nov 19, 2015

SPECvirt_2013: SPARC T7-2 World Record Performance for Two- and Four-Chip Systems

Oracle's SPARC T7-2 server delivered a world record SPECvirt_sc2013 result for systems with two to four chips.

  • The SPARC T7-2 server produced a result of 3198 @ 179 VMs SPECvirt_sc2013.

  • The two-chip SPARC T7-2 server beat the best four-chip x86 Intel E7-8890 v3 server (HP ProLiant DL580 Gen9), demonstrating that the SPARC M7 processor is 2.1 times faster than the Intel Xeon Processor E7-8890 v3 (chip-to-chip comparison).

  • The two-chip SPARC T7-2 server beat the best two-chip x86 Intel E5-2699 v3 server results by nearly 2 times (Huawei FusionServer RH2288H V3, HP ProLiant DL360 Gen9).

  • The two-chip SPARC T7-2 server delivered nearly 2.2 times the performance of the four-chip IBM Power System S824 server solution, which used 3.5 GHz POWER8 six-core chips.

  • The SPARC T7-2 server, running the Oracle Solaris 11.3 operating system, utilizes embedded virtualization products such as Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • The SPARC T7-2 server result used Oracle VM Server for SPARC 3.3 and Oracle Solaris Zones providing a flexible, scalable and manageable virtualization environment.

Performance Landscape

Complete benchmark results are at the SPEC website, SPECvirt_sc2013 Results. The following table highlights the leading two- and four-chip results for the benchmark; bigger is better.

SPECvirt_sc2013
Leading Two- to Four-Chip Results
System                          Processor                               Chips  Result @ VMs  Virtualization Software
SPARC T7-2                      SPARC M7 (4.13 GHz, 32 cores)             2     3198 @ 179   Oracle VM Server for SPARC 3.3,
                                                                                              Oracle Solaris Zones
HP ProLiant DL580 Gen9          Intel E7-8890 v3 (2.5 GHz, 18 cores)      4     3020 @ 168   Red Hat Enterprise Linux 7.1 KVM
Lenovo System x3850 X6          Intel E7-8890 v3 (2.5 GHz, 18 cores)      4     2655 @ 147   Red Hat Enterprise Linux 6.6 KVM
Huawei FusionServer RH2288H V3  Intel E5-2699 v3 (2.3 GHz, 18 cores)      2     1616 @ 95    Huawei FusionSphere V1R5C10
HP ProLiant DL360 Gen9          Intel E5-2699 v3 (2.3 GHz, 18 cores)      2     1614 @ 95    Red Hat Enterprise Linux 7.1 KVM
IBM Power S824                  POWER8 (3.5 GHz, 6 cores)                 4     1370 @ 79    PowerVM Enterprise Edition 2.2.3

Configuration Summary

System Under Test Highlights:

Hardware:
1 x SPARC T7-2 server, with
2 x 4.13 GHz SPARC M7
1 TB memory
2 Sun Dual Port 10GBase-T Adapter
2 Sun Storage Dual 16 Gb Fibre Channel PCIe Universal HBA

Software:
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.3 (LDom)
Oracle Solaris Zones
Oracle iPlanet Web Server 7.0.20
Oracle PHP 5.3.29
Dovecot v2.2.18
Oracle WebLogic Server Standard Edition Release 10.3.6
Oracle Database 12c Enterprise Edition (12.1.0.2.0)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.7.0_85-b15

Storage:
3 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
Oracle Solaris 11.3

1 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
4x 400 GB SSDs
Oracle Solaris 11.3

Benchmark Description

SPECvirt_sc2013 is SPEC's updated benchmark addressing performance evaluation of datacenter servers used in virtualized server consolidation. SPECvirt_sc2013 measures the end-to-end performance of all system components including the hardware, virtualization platform, and the virtualized guest operating system and application software. It utilizes several SPEC workloads representing applications that are common targets of virtualization and server consolidation. The workloads were made to match a typical server consolidation scenario of CPU resource requirements, memory, disk I/O, and network utilization for each workload. These workloads are modified versions of SPECweb2005, SPECjAppServer2004, SPECmail2008, and SPEC CPU2006. The client-side SPECvirt_sc2013 harness controls the workloads. Scaling is achieved by running additional sets of virtual machines, called "tiles", until overall throughput reaches a peak.

Key Points and Best Practices

  • The SPARC T7-2 server, running Oracle Solaris 11.3, utilizes embedded virtualization products such as Oracle VM Server for SPARC and Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • In order to provide a high level of data integrity and availability, all the benchmark data sets are stored on mirrored (RAID1) storage.

  • Using Oracle VM Server for SPARC to bind the SPARC M7 processor with its local memory optimized the memory use in this virtual environment.

  • This improved result used a fractional tile to fully saturate the system.

See Also

Disclosure Statement

SPEC and the benchmark name SPECvirt_sc are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/19/2015. SPARC T7-2, SPECvirt_sc2013 3198@179 VMs; HP ProLiant DL580 Gen9, SPECvirt_sc2013 3020@168 VMs; Lenovo x3850 X6, SPECvirt_sc2013 2655@147 VMs; Huawei FusionServer RH2288H V3, SPECvirt_sc2013 1616@95 VMs; HP ProLiant DL360 Gen9, SPECvirt_sc2013 1614@95 VMs; IBM Power S824, SPECvirt_sc2013 1370@79 VMs.

Monday Oct 26, 2015

Real-Time Enterprise: SPARC T7-1 Faster Than x86 E5 v3

A goal of the modern business is the real-time enterprise, where analytics are run simultaneously with transaction processing on the same system to provide the most effective decision making. Oracle Database 12c Enterprise Edition utilizing the In-Memory option is designed so that the same database can perform transactions at the highest performance while completing analytical calculations, which once took days or hours, orders of magnitude faster.

Oracle's SPARC M7 processor has deep innovations to take the real-time enterprise to the next level of performance. In this test, both OLTP transactions and analytical queries were run in a single database instance using the same features of Oracle Database 12c Enterprise Edition utilizing the In-Memory option, in order to compare the SPARC M7 processor against a generic x86 processor. On both systems, the OLTP transactions and the analytical queries each took about half of the processing load of the server.

In this test Oracle's SPARC T7-1 server is compared to a two-chip x86 E5 v3 based server. On analytical queries the SPARC M7 processor is 8.2x faster than the x86 E5 v3 processor. Simultaneously on OLTP transactions the SPARC M7 processor is 2.9x faster than the x86 E5 v3 processor. In addition, the SPARC T7-1 server had better OLTP transactional response time than the x86 E5 v3 server.

The SPARC M7 processor does this by using the Data Accelerator co-processor (DAX). DAX is not a SIMD instruction set, but rather an actual co-processor that offloads in-memory queries, which frees the cores for other processing. The DAX has direct access to the memory bus and can execute scans at near full memory bandwidth. Oracle makes the DAX API available to other applications, so this kind of acceleration is not just for the Oracle database; it is open.

The results below were obtained running a set of OLTP transactions and analytic queries simultaneously against two schemas: a real-time online orders system and a related historical orders schema configured as a real cardinality database (RCDB) star schema. The in-memory analytics RCDB queries are executed using the Oracle Database 12c In-Memory columnar feature.

  • The SPARC T7-1 server and the x86 E5 v3 server both ran OLTP transactions and the in-memory analytics on the same database instance using Oracle Database 12c Enterprise Edition utilizing the In-Memory option.

  • The SPARC T7-1 server ran the in-memory analytics RCDB based queries 8.2x faster per chip than a two-chip x86 E5 v3 server on the 48 stream test.

  • The SPARC T7-1 server delivers 2.9x higher OLTP transaction throughput results per chip than a two-chip x86 E5 v3 server on the 48 stream test.

Performance Landscape

The table below compares the SPARC T7-1 server and 2-chip x86 E5 v3 server while running OLTP and in-memory analytics against tables in the same database instance. The same set of transactions and queries were executed on each system.

Real-Time Enterprise Performance Chart
48 RCDB DSS Streams, 224 OLTP Users

                                      OLTP Transactions                                 Analytic Queries
System                                Trans Per   Per Chip    Average Response          Queries Per   Per Chip
                                      Second      Advantage   Time                      Minute        Advantage
SPARC T7-1                            338 K       2.9x        11 msec                   267           8.2x
1 x SPARC M7 (32 cores)
x86 E5 v3 server                      236 K       1.0x        12 msec                   65            1.0x
2 x Intel E5-2699 v3 (2 x 18 cores)

The number of cores listed is per chip.
The Per Chip Advantage is computed by normalizing to a single chip's performance.

Configuration Summary

SPARC Server:

1 X SPARC T7-1 server
1 X SPARC M7 processor
256 GB Memory
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition Release 12.1.0.2.10

x86 Server:

1 X Oracle Server X5-2L
2 X Intel Xeon Processor E5-2699 v3
256 GB Memory
Oracle Linux 6 Update 5 (3.8.13-16.2.1.el6uek.x86_64)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.10

Benchmark Description

The Real-Time Enterprise benchmark simulates the demands of customers who want to simultaneously run both their OLTP database and the related historical warehouse DSS data that would be based on that OLTP data. It answers the question of how a system will perform when doing data analysis while at the same time executing real-time on-line transactions.

The OLTP workload simulates an Order Inventory System that exercises both reads and writes with a potentially large number of users, stressing lock management and connectivity as well as database access.

The number of customers, orders and users is fully parametrized. This benchmark is based on a 100 GB dataset, 15 million customers, 600 million orders and up to 580 users. The workload consists of a number of transaction types including show-expenses, part-cost, supplier-phone, low-inv, high-inv, update-price, update-phone, update-cost, and new-order.

The real cardinality database (RCDB) schema was created to showcase the potential speedup one may see moving from on disk, row format data warehouse/Star Schema, to utilizing Oracle Database 12c's In-Memory feature for analytical queries.

The workload consists of as many as 2,304 unique queries asking questions such as "In 2014, what was the total revenue of single item orders", or "In August 2013, how many orders exceeded a total price of $50". Questions like these can help a company see where to focus for further revenue growth or identify weaknesses in their offerings.
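As an illustration of the style of these questions, a hypothetical version of the first one is sketched in SQL below. The table and column names (a LINEORDER fact table joined to a DATE_DIM dimension) are illustrative only and are not the actual RCDB definitions.

    -- "In 2014, what was the total revenue of single item orders?"
    -- (assumes lo_quantity = 1 identifies a single-item order; names are illustrative)
    SELECT SUM(lo.lo_revenue) AS total_revenue
      FROM lineorder lo
      JOIN date_dim  d ON d.d_datekey = lo.lo_orderdate
     WHERE d.d_year      = 2014
       AND lo.lo_quantity = 1;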

RCDB scale factor 1050 represents a 1.05 TB data warehouse. It is transformed into a star schema of 1.0 TB, and then becomes 110 GB in size when loaded in memory. It consists of 1 fact table, and 4 dimension tables with over 10.5 billion rows. There are 56 columns with most cardinalities varying between 5 and 2,000, a primary key being an example of something outside this range.

Two reports are generated: one for the OLTP-Perf workload and one for the RCDB DSS workload. For the analytical DSS workload, queries per minute and average query elapsed times are reported. For the OLTP-Perf workload, both transactions per second (in thousands) and OLTP average response times (in milliseconds) are reported.

Key Points and Best Practices

  • This benchmark utilized the SPARC M7 processor's co-processor DAX for query acceleration.

  • All SPARC T7-1 server results were run with out-of-the-box tuning for Oracle Solaris.

  • All Oracle Server X5-2L system results were run with out of the box tunings for Oracle Linux except for the setting in /etc/sysctl.conf to get large pages for the Oracle Database:

    • vm.nr_hugepages=98304

  • To create an in memory area, the following was added to the init.ora:

      inmemory_size = 120g

  • An example of how to set a table to be in memory is below:

      ALTER TABLE CUSTOMER INMEMORY MEMCOMPRESS FOR QUERY HIGH

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

In-Memory Database: SPARC T7-1 Faster Than x86 E5 v3

Fast analytics on large databases are critical to transforming key business processes. Oracle's SPARC M7 processors are specifically designed to accelerate in-memory analytics using Oracle Database 12c Enterprise Edition utilizing the In-Memory option. The SPARC M7 processor outperforms an x86 E5 v3 chip by up to 10.8x on analytics queries. In order to test real world deep analysis on the SPARC M7 processor a scenario with over 2,300 analytical queries was run against a real cardinality database (RCDB) star schema. This benchmark was audited by Enterprise Strategy Group (ESG). ESG is an IT research, analyst, strategy, and validation firm focused on the global IT community.

The SPARC M7 processor does this by using the Data Accelerator co-processor (DAX). DAX is not a SIMD instruction set, but rather an actual co-processor that offloads in-memory queries, which frees the cores for other processing. The DAX has direct access to the memory bus and can execute scans at near full memory bandwidth. Oracle makes the DAX API available to other applications, so this kind of acceleration is not just for the Oracle database; it is open.

  • The SPARC M7 processor delivers up to a 10.8x Query Per Minute speedup per chip over the Intel Xeon Processor E5-2699 v3 when executing analytical queries using the In-Memory option of Oracle Database 12c.

  • Oracle's SPARC T7-1 server delivers up to a 5.4x Query Per Minute speedup over the 2-chip x86 E5 v3 server when executing analytical queries using the In-Memory option of Oracle Database 12c.

  • The SPARC T7-1 server delivers over 143 GB/sec of memory bandwidth which is up to 7x more than the 2-chip x86 E5 v3 server when the Oracle Database 12c is executing the same analytical queries against the RCDB.

  • The SPARC T7-1 server scanned over 48 billion rows per second through the database.

  • The SPARC T7-1 server compresses the on-disk RCDB star schema by around 6x when using the Memcompress For Query High setting (more information below) and by nearly 10x compared to a standard data warehouse row format version of the same database.

Performance Landscape

The table below compares the SPARC T7-1 server and a 2-chip x86 E5 v3 server. The x86 E5 v3 server single-chip comparisons are from actual measurements against a single-chip configuration.

The number of cores is per chip; multiply by the number of chips to get the system total.

RCDB Performance Chart
2,304 Queries

System                                Elapsed    Queries Per   System   Chip    DB Memory
                                      Seconds    Minute        Adv      Adv     Bandwidth
SPARC T7-1                            381        363           5.4x     10.8x   143 GB/sec
1 x SPARC M7 (32 cores)
x86 E5 v3 server                      2059       67            1.0x     2.0x    20 GB/sec
2 x Intel E5-2699 v3 (2 x 18 cores)
x86 E5 v3 server                      4096       34            0.5x     1.0x    10 GB/sec
1 x Intel E5-2699 v3 (18 cores)

Fused Decompress + Scan

The In-Memory feature of Oracle Database 12c puts tables in columnar format. There are different levels of compression that can be applied. One of these is Oracle Zip (OZIP) which is used with the "MEMCOMPRESS FOR QUERY HIGH" setting. Typically when compression is applied to data, in order to operate on it, the data must be:

    (1) Decompressed
    (2) Written back to memory in uncompressed form
    (3) Scanned and the results returned.

When OZIP is applied to the data inside of an In-Memory Columnar Unit (or IMCU, an N sized chunk of rows), the DAX is able to take this data in its compressed format and operate (scan) directly upon it, returning results in a single step. This not only saves on compute power by not having the CPU do the decompression step, but also on memory bandwidth as the uncompressed data is not put back into memory. Only the results are returned. To illustrate this, a microbenchmark was used which measured the amount of rows that could be scanned per second.

Compression

This performance test was run on a Scale Factor 1750 database, which represents a 1.75 TB row format data warehouse. The database is then transformed into a star schema which ends up around 1.1 TB in size. The star schema is then loaded in memory with a setting of "MEMCOMPRESS FOR QUERY HIGH", which focuses on performance with somewhat more aggressive compression. This memory area is a separate part of the System Global Area (SGA) which is defined by the database initialization parameter "inmemory_size". See below for an example. Here is a breakdown of each table in memory with compression ratios.

Table Name   Original Size (Bytes)   In-Memory Size (Bytes)   Compression Ratio
LINEORDER        1,103,524,528,128          178,586,451,968                6.2x
DATE                    11,534,336                1,179,648                9.8x
PART                    11,534,336                1,179,648                9.8x
SUPPLIER                11,534,336                1,179,648                9.8x
CUSTOMER                11,534,336                1,179,648                9.8x

Configuration Summary

SPARC Server:

1 X SPARC T7-1 server
1 X SPARC M7 processor
512 GB memory
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition Release 12.1.0.2.13

x86 Server:

1 X Oracle Server X5-2L
2 X Intel Xeon Processor E5-2699 v3
512 GB memory
Oracle Linux 6 Update 5 (3.8.13-16.2.1.el6uek.x86_64)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.13

Benchmark Description

The real cardinality database (RCDB) benchmark was created to showcase the potential speedup one may see moving from on disk, row format data warehouse/Star Schema, to utilizing Oracle Database 12c's In-Memory feature for analytical queries.

The workload consists of 2,304 unique queries asking questions such as "In 2014, what was the total revenue of single item orders", or "In August 2013, how many orders exceeded a total price of $50". Questions like these can help a company see where to focus for further revenue growth or identify weaknesses in their offerings.

RCDB scale factor 1750 represents a 1.75 TB data warehouse. It is transformed into a star schema of 1.1 TB, and then becomes 179 GB in size when loaded in memory. It consists of 1 fact table, and 4 dimension tables with over 10.5 billion rows. There are 56 columns with most cardinalities varying between 5 and 2,000, a primary key being an example of something outside this range.

One problem with many industry standard generated databases is that as they have grown in size the cardinalities for the generated columns have become exceedingly unrealistic. For instance, one industry standard benchmark uses a schema where at scale factor 1 TB it calls for the number of parts to be SF * 800,000. A 1 TB database that calls for 800 million unique parts is not very realistic. Therefore RCDB attempts to take some of these unrealistic cardinalities and size them to be more representative of at least a section of customer data. Obviously one cannot encompass every database in one schema; this is just an example.

We carefully scaled each system so that the optimal number of users was run on each system under test and no artificial bottlenecks were created. Each user ran an equal number of queries, and the same queries were run on each system, allowing for a fair comparison of the results.

Key Points and Best Practices

  • This benchmark utilized the SPARC M7 processor's co-processor DAX for query acceleration.

  • All SPARC T7-1 server results were run with out of the box tuning for Oracle Solaris.

  • All Oracle Server X5-2L system results were run with out of the box tunings for Oracle Linux except for the setting in /etc/sysctl.conf to get large pages for the Oracle Database:

    • vm.nr_hugepages=64520

  • To create an in memory area, the following was added to the init.ora:

      inmemory_size = 200g

  • An example of how to set a table to be in memory is below:

      ALTER TABLE CUSTOMER INMEMORY MEMCOMPRESS FOR QUERY HIGH
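After the tables are populated, the standard Oracle Database 12c view V$IM_SEGMENTS can be queried to confirm population status and to compare on-disk size with in-memory size, the basis of the compression ratios shown above. This is a generic check, not the exact script used for this report:

    SELECT segment_name,
           ROUND(bytes/1024/1024)         AS disk_mb,
           ROUND(inmemory_size/1024/1024) AS inmemory_mb,
           ROUND(bytes/inmemory_size, 1)  AS compression_ratio,
           populate_status
      FROM v$im_segments
     ORDER BY bytes DESC;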

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Memory and Bisection Bandwidth: SPARC T7 and M7 Servers Faster Than x86 and POWER8

The STREAM benchmark measures delivered memory bandwidth on a variety of memory intensive tasks. Delivered memory bandwidth is key to a server delivering high performance on a wide variety of workloads. The STREAM benchmark is typically run where each chip in the system gets its memory requests satisfied from local memory. This report presents performance of Oracle's SPARC M7 processor based servers and compares their performance to x86 and IBM POWER8 servers.

Bisection bandwidth on a server is a measure of the cross-chip data bandwidth between the processors of a system where no memory access is local to the processor. Systems with large cross-chip penalties show dramatically lower bisection bandwidth. Real-world ad hoc workloads tend to perform better on systems with better bisection bandwidth because their memory usage characteristics tend to be chaotic.

IBM says the sustained or delivered bandwidth of the IBM POWER8 12-core chip is 230 GB/sec. This number is a peak bandwidth calculation: 230.4 GB/sec = 9.6 GHz * 3 (r+w) * 8 bytes. A similar calculation is used by IBM for the POWER8 dual-chip module (two 6-core chips) to show a sustained or delivered bandwidth of 192 GB/sec (192.0 GB/sec = 8.0 GHz * 3 (r+w) * 8 bytes). Peaks are theoretical limits used for marketing hype, but true measured delivered bandwidth is the only useful comparison for understanding the delivered performance of real applications.

The STREAM benchmark is easy to run and anyone can measure memory bandwidth on a target system (see Key Points and Best Practices section).

  • The SPARC M7-8 server delivers over 1 TB/sec on the STREAM benchmark. This is over 2.4 times the triad bandwidth of an eight-chip x86 E7 v3 server.

  • The SPARC T7-4 delivered 2.2 times the STREAM triad bandwidth of a four-chip x86 E7 v3 server and 1.7 times the triad bandwidth of a four-chip IBM Power System S824 server.

  • The SPARC T7-2 delivered 2.5 times the STREAM triad bandwidth of a two-chip x86 E5 v3 server.

  • The SPARC M7-8 server delivered over 8.5 times the triad bisection bandwidth of an eight-chip x86 E7 v3 server.

  • The SPARC T7-4 server delivered over 2.7 times the triad bisection bandwidth of a four-chip x86 E7 v3 server and 2.3 times the triad bisection bandwidth of a four-chip IBM Power System S824 server.

  • The SPARC T7-2 server delivered over 2.7 times the triad bisection bandwidth of a two-chip x86 E5 v3 server.

Performance Landscape

The following SPARC, x86, and IBM S824 STREAM results were run as part of this benchmark effort. The IBM S822L result is from the referenced web location. The following SPARC results were all run using 32 GB DIMMs.

Maximum STREAM Benchmark Performance
System Chips Bandwidth (MB/sec - 10^6)
Copy Scale Add Triad
SPARC M7-8 8 995,402 995,727 1,092,742 1,086,305
x86 E7 v3 8 346,771 354,679 445,550 442,184
SPARC T7-4 4 512,080 510,387 556,184 555,374
IBM S824 4 251,533 253,216 322,399 319,561
IBM S822L 4 252,743 247,314 295,556 305,955
x86 E7 v3 4 230,027 232,092 248,761 251,161
SPARC T7-2 2 259,198 259,380 285,835 285,905
x86 E5 v3 2 105,622 105,808 113,116 112,521
SPARC T7-1 1 131,323 131,308 144,956 144,706

All of the following bisection bandwidth results were run as part of this benchmark effort.

Bisection Bandwidth Benchmark Performance (Nonlocal STREAM)
System Chips Bandwidth (MB/sec - 10^6)
Copy Scale Add Triad
SPARC M7-8 8 383,479 381,219 375,371 375,851
SPARC T5-8 8 172,195 172,354 250,620 250,858
x86 E7 v3 8 42,636 42,839 43,753 43,744
SPARC T7-4 4 142,549 142,548 142,645 142,729
SPARC T5-4 4 75,926 75,947 76,975 77,061
IBM S824 4 53,940 54,107 60,746 60,939
x86 E7 v3 4 41,636 47,740 51,206 51,333
SPARC T7-2 2 127,372 127,097 129,833 129,592
SPARC T5-2 2 91,530 91,597 91,761 91,984
x86 E5 v3 2 45,211 45,331 47,414 47,251

The following SPARC results were all run using 16 GB DIMMs.

SPARC T7 Servers – 16 GB DIMMs
Maximum STREAM Benchmark Performance
System Chips Bandwidth (MB/sec - 10^6)
Copy Scale Add Triad
SPARC T7-4 4 520,779 521,113 602,137 600,330
SPARC T7-2 2 262,586 262,760 302,758 302,085
SPARC T7-1 1 132,154 132,132 168,677 168,654

Configuration Summary

SPARC Configurations:

SPARC M7-8
8 x SPARC M7 processors (4.13 GHz)
4 TB memory (128 x 32 GB dimms)

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB dimms)
1 TB memory (64 x 16 GB dimms)

SPARC T7-2
2 x SPARC M7 processors (4.13 GHz)
1 TB memory (32 x 32 GB dimms)
512 GB memory (32 x 16 GB dimms)

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB dimms)
256 GB memory (16 x 16 GB dimms)

Oracle Solaris 11.3
Oracle Solaris Studio 12.4

x86 Configurations:

Oracle Server X5-8
8 x Intel Xeon Processor E7-8895 v3
2 TB memory (128 x 16 GB dimms)

Oracle Server X5-4
4 x Intel Xeon Processor E7-8895 v3
1 TB memory (64 x 16 GB dimms)

Oracle Server X5-2
2 x Intel Xeon Processor E5-2699 v3
256 GB memory (16 x 16 GB dimms)

Oracle Linux 7.1
Intel Parallel Studio XE Composer Version 2016 compilers

Benchmark Description

STREAM

The STREAM benchmark measures sustainable memory bandwidth (in MB/s) for simple vector compute kernels. All memory accesses are sequential, so a picture of how fast regular data may be moved through the system is portrayed. Properly run, the benchmark displays the characteristics of the memory system of the machine and not the advantages of running from the system's memory caches.

STREAM counts the bytes read plus the bytes written to memory. For the simple Copy kernel, this is exactly twice the number obtained from the bcopy convention. STREAM does this because three of the four kernels (Scale, Add and Triad) do arithmetic, so it makes sense to count both the data read into the CPU and the data written back from the CPU. The Copy kernel does no arithmetic, but, for consistency, counts bytes the same way as the other three.
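For reference, the four kernels and the byte counting described above look like the simplified C sketch below (array sizing, timing, and the OpenMP parallelization of the public stream.c are omitted for brevity):

    #define N 10000000L                /* STREAM_ARRAY_SIZE: large enough to overflow the caches */
    static double a[N], b[N], c[N];
    static const double scalar = 3.0;

    /* Bytes counted per kernel = (arrays touched) * 8 bytes * N */
    void copy(void)  { for (long j = 0; j < N; j++) c[j] = a[j];                 }  /* 2*8*N */
    void scale(void) { for (long j = 0; j < N; j++) b[j] = scalar * c[j];        }  /* 2*8*N */
    void add(void)   { for (long j = 0; j < N; j++) c[j] = a[j] + b[j];          }  /* 3*8*N */
    void triad(void) { for (long j = 0; j < N; j++) a[j] = b[j] + scalar * c[j]; }  /* 3*8*N */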

The sequential nature of the memory references is the benchmark's biggest weakness. The benchmark does not expose limitations in a system's interconnect to move data from anywhere in the system to anywhere.

Bisection Bandwidth – Easy Modification of STREAM Benchmark

To test for bisection bandwidth, processes are bound to processors in sequential order. The memory is allocated in reverse order, so that the memory is placed non-local to the process. The benchmark is then run. If the system is capable of page migration, this feature must be turned off.

Key Points and Best Practices

The STREAM benchmark code was compiled for the SPARC M7 processor based systems with the following flags (using cc):

    -fast -m64 -W2,-Avector:aggressive -xautopar -xreduction -xpagesize=4m

The benchmark code was compiled for the x86 based systems with the following flags (Intel icc compiler):

    -O3 -m64 -xCORE-AVX2 -ipo -openmp -mcmodel=medium -fno-alias -nolib-inline

On Oracle Solaris, binding is accomplished by setting either the environment variable SUNW_MP_PROCBIND or the OpenMP variables OMP_PROC_BIND and OMP_PLACES.

    export OMP_NUM_THREADS=512
    export SUNW_MP_PROCBIND=0-511

On Oracle Linux systems using the Intel compiler, binding is accomplished by setting the environment variable KMP_AFFINITY.

    export OMP_NUM_THREADS=72
    export KMP_AFFINITY='verbose,granularity=fine,proclist=[0-71],explicit'

The source code change in the file stream.c to do the reverse allocation is shown below (the "<" lines are the modified reverse-order initialization loop; the ">" lines are the original forward loop):

    <     for (j=STREAM_ARRAY_SIZE-1; j>=0; j--) { 
                a[j] = 1.0; 
                b[j] = 2.0; 
                c[j] = 0.0; 
            }
    ---
    >     for (j=0; j<STREAM_ARRAY_SIZE; j++) {
                a[j] = 1.0; 
                b[j] = 2.0; 
                c[j] = 0.0; 
            }
    

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Yahoo Cloud Serving Benchmark: SPARC T7-4 With Oracle NoSQL Beats x86 E5 v3 Per Chip

Oracle's SPARC T7-4 server delivered 1.9 million ops/sec on 1.6 billion records for the Yahoo Cloud Serving Benchmark (YCSB) 95% read/5% update workload. Oracle NoSQL Database was used in these tests. NoSQL is important for Big Data Analysis and for Cloud Computing.

  • One-processor performance on the SPARC T7-4 server was 2.5 times better than a single-chip Intel Xeon E5-2699 v3 for the YCSB 95% read/5% update workload.

  • The SPARC T7-4 server showed low average latency of 1.12 msec on read and 4.90 msec on write while achieving nearly 1.9 million ops/sec.

  • The SPARC T7-4 server delivered 325K inserts/sec on 1.6 billion records with a low average latency of 2.65 msec.

  • One processor performance on the SPARC T7-4 server was over half a million (511K ops/sec) on 400 million records for the YCSB 95% read/5% update workload.

  • Near-linear scaling of 3.7x from 1 to 4 processors was achieved while maintaining low latency.

These results show the SPARC T7-4 server can handle a large database while achieving high throughput with low latency for cloud computing.

Performance Landscape

This table presents single chip results comparing the SPARC M7 processor (in a SPARC T7-4 server) to the Intel Xeon Processor E5-2699 v3 (in a 2-socket x86 server). All of the following results were run as part of this benchmark effort.

Comparing Single-Chip Performance on the YCSB Benchmark
                  Insert                                 Mixed Load (95% Read/5% Update)
Processor         Throughput   Avg Write Latency         Throughput   Avg Read Latency   Avg Write Latency
                  (ops/sec)    (msec)                    (ops/sec)    (msec)             (msec)
SPARC M7          89,617       2.40                      510,824      1.07               3.80
E5-2699 v3        55,636       1.18                      202,701      0.71               2.30

The following table shows the performance of the Yahoo Cloud Serving Benchmark at multiple processor counts on the SPARC T7-4 server.

SPARC T7-4 Server Running the YCSB Benchmark
                    Insert                                 Mixed Load (95% Read/5% Update)
CPUs   Shards       Throughput   Avg Write Latency         Throughput   Avg Read Latency   Avg Write Latency
                    (ops/sec)    (msec)                    (ops/sec)    (msec)             (msec)
4      16           325,167      2.65                      1,890,394    1.12               4.90
3      12           251,051      2.57                      1,428,813    1.12               4.68
2      8            170,963      2.52                      968,146      1.11               4.37
1      4            89,617       2.40                      510,824      1.07               3.80

Configuration Summary

System Under Test:

SPARC T7-4 server
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB)
8 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Emulex
8 x Sun Dual Port 10 GbE PCIe 2.0 Low Profile Adapter, Base-T

Oracle Server X5-2L server
2 x Intel Xeon E5-2699 v3 processors (2.3 GHz)
384 GB memory
1 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Emulex
1 x Sun Dual 10GbE SFP+ PCIe 2.0 Low Profile Adapter

External Storage (Common Multiprotocol SCSI TARget, or COMSTAR, enables a system to be seen as a SCSI target device):

16 x Sun Server X3-2L servers configured as COMSTAR nodes, each with
2 x Intel Xeon E5-2609 processors (2.4 GHz)
4 x Sun Flash Accelerator F40 PCIe Cards, 400 GB each
1 x 8 Gb dual port HBA
Please note: These devices are only used as storage. No NoSQL is run on these COMSTAR storage nodes. There is no query acceleration done on these COMSTAR storage nodes.

Software Configuration:

Oracle Solaris 11.3 (11.3.1.2.0)
Logical Domains Manager v3.3.0.0.17 (running on the SPARC T7-4)
Oracle NoSQL Database, Enterprise Edition 12c R1.3.2.5
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)

Benchmark Description

The Yahoo Cloud Serving Benchmark (YCSB) is a performance benchmark for cloud databases and their serving systems. The benchmark documentation says:

    With the many new serving databases available including Sherpa, BigTable, Azure and many more, it can be difficult to decide which system is right for your application, partially because the features differ between systems, and partially because there is not an easy way to compare the performance of one system versus another. The goal of the Yahoo Cloud Serving Benchmark (YCSB) project is to develop a framework and common set of workloads for evaluating the performance of different "key-value" and "cloud" serving stores.
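YCSB is driven by a properties file and a small command-line client. The sketch below shows the core properties for a 95% read/5% update mix of the kind used here; the "basic" binding, thread count, and record/operation counts are placeholders and are not the configuration used for the published results.

    # workload.properties -- core YCSB properties for a 95% read / 5% update mix (placeholder values)
    workload=com.yahoo.ycsb.workloads.CoreWorkload
    recordcount=400000000
    operationcount=100000000
    readproportion=0.95
    updateproportion=0.05
    requestdistribution=zipfian

    # load the data set, then run the mixed workload
    bin/ycsb load basic -P workload.properties -threads 64
    bin/ycsb run  basic -P workload.properties -threads 64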

Key Points and Best Practices

  • The SPARC T7-4 server showed 3.7x scaling from 1 to 4 sockets while maintaining low latency.

  • Four Oracle VM Server for SPARC logical domains (LDoms) were created per processor, for a total of sixteen LDoms. Each LDom was configured with 120 GB of memory accessing two PCIe IO slots under SR-IOV (Single Root IO Virtualization).

  • The Sun Flash Accelerator F40 PCIe Card demonstrated excellent IO capability and performed 841K read IOPS (3.5K IOPS per disk) during the 1.9 million ops/sec benchmark run.

  • There was no performance loss from Fibre Channel SR-IOV (Single Root IO Virtualization) compared to native.

  • Balanced memory bandwidth was delivered across all four processors achieving an average total of 304 GB/sec during 1.9 million ops/sec run.

  • The 1.6 billion records were loaded into 16 Shards with the replication factor set to 3.

  • Each LDom is associated with a processor set (16 total). The default processor set was additionally used for OS and IO interrupts. The processor sets were used to ensure a balanced load.

  • Fixed priority class was assigned to Oracle NoSQL Storage Node java processes.

  • The ZFS record size was set to 16K (default 128K) and this worked the best for 95% read/5% update workload.

  • A total of eight Sun Server X4-2 and Sun Server X4-2L systems were used as clients for generating the workload.

  • The LDoms and client systems were connected through a 10 GbE network.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of Oct 25, 2015.

AES Encryption: SPARC T7-2 Beats x86 E5 v3

Oracle's cryptography benchmark measures security performance on important AES security modes. Oracle's SPARC M7 processor with its software in silicon security is faster than x86 servers that have the AES-NI instructions. In this test, the performance of on-processor encryption operations is measured (32 KB encryptions). Multiple threads are used to measure each processor's maximum throughput. Oracle's SPARC T7-2 server shows dramatically faster encryption compared to current x86 two processor servers.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 4.0 times faster executing AES-CFB 256-bit key encryption (in cache) than Intel Xeon E5-2699 v3 processors (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 3.7 times faster executing AES-CFB 128-bit key encryption (in cache) than Intel Xeon E5-2699 v3 processors (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 6.4 times faster executing AES-CFB 256-bit key encryption (in cache) than the Intel Xeon E5-2697 v2 processors (with AES-NI) running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 6.0 times faster executing AES-CFB 128-bit key encryption (in cache) than the Intel Xeon E5-2697 v2 processors (with AES-NI) running Oracle Linux 6.5.

  • AES-CFB encryption is used by Oracle Database for Transparent Data Encryption (TDE) which provides security for database storage.

Oracle has also measured SHA digest performance on the SPARC M7 processor.

Performance Landscape

Presented below are results for running encryption using the AES cipher with the CFB, CBC, GCM and CCM modes for key sizes of 128, 192 and 256 bits. Decryption performance was similar and is not presented. Results are presented as MB/sec (10**6). All SPARC M7 processor results were run as part of this benchmark effort. All other results were run during previous benchmark efforts.

Encryption Performance – AES-CFB (used by Oracle Database)

Performance is presented for in-cache AES-CFB128 mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CFB
SPARC M7 4.13 2 126,948 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 53,794 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 31,924 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 19,964 Oracle Linux 6.5, IPP/AES-NI
AES-192-CFB
SPARC M7 4.13 2 144,299 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 60,736 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 37,157 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 23,218 Oracle Linux 6.5, IPP/AES-NI
AES-128-CFB
SPARC M7 4.13 2 166,324 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 68,691 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 44,388 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 27,755 Oracle Linux 6.5, IPP/AES-NI

Encryption Performance – AES-CBC

Performance is presented for in-cache AES-CBC mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CBC
SPARC M7 4.13 2 134,278 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 56,788 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 31,894 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 19,961 Oracle Linux 6.5, IPP/AES-NI
AES-192-CBC
SPARC M7 4.13 2 152,961 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 63,937 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 37,021 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 23,224 Oracle Linux 6.5, IPP/AES-NI
AES-128-CBC
SPARC M7 4.13 2 175,151 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 72,870 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2699 v3 2.30 2 44,103 Oracle Linux 6.5, IPP/AES-NI
Intel E5-2697 v2 2.70 2 27,730 Oracle Linux 6.5, IPP/AES-NI

Encryption Performance – AES-GCM (used by ZFS Filesystem)

Performance is presented for in-cache AES-GCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-GCM
SPARC M7 4.13 2 74,221 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 34,022 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 15,338 Oracle Solaris 11.1, libsoftcrypto + libumem
AES-192-GCM
SPARC M7 4.13 2 81,448 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 36,820 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 15,768 Oracle Solaris 11.1, libsoftcrypto + libumem
AES-128-GCM
SPARC M7 4.13 2 86,223 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 38,845 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 16,405 Oracle Solaris 11.1, libsoftcrypto + libumem

Encryption Performance – AES-CCM (alternative used by ZFS Filesystem)

Performance is presented for in-cache AES-CCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CCM
SPARC M7 4.13 2 67,669 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 28,909 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 19,447 Oracle Linux 6.5, IPP/AES-NI
AES-192-CCM
SPARC M7 4.13 2 77,711 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 33,116 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 22,634 Oracle Linux 6.5, IPP/AES-NI
AES-128-CCM
SPARC M7 4.13 2 90,729 Oracle Solaris 11.3, libsoftcrypto + libumem
SPARC T5 3.60 2 38,529 Oracle Solaris 11.2, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 26,951 Oracle Linux 6.5, IPP/AES-NI

Configuration Summary

SPARC T7-2 server
2 x SPARC M7 processor, 4.13 GHz
1 TB memory
Oracle Solaris 11.3

SPARC T5-2 server
2 x SPARC T5 processor, 3.60 GHz
512 GB memory
Oracle Solaris 11.2

Oracle Server X5-2 system
2 x Intel Xeon E5-2699 v3 processors, 2.30 GHz
256 GB memory
Oracle Linux 6.5

Sun Server X4-2 system
2 x Intel Xeon E5-2697 v2 processors, 2.70 GHz
256 GB memory
Oracle Linux 6.5

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-cache and on-chip using various ciphers, including AES-128-CFB, AES-192-CFB, AES-256-CFB, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CCM, AES-192-CCM, AES-256-CCM, AES-128-GCM, AES-192-GCM and AES-256-GCM.

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various ciphers. They were run using optimized libraries for each platform to obtain the best possible performance. The encryption tests were run with pseudo-random data of size 32 KB. The benchmark tests were designed to run out of cache, so memory bandwidth and latency are not the limitations.
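Oracle's test harness is not public, but the same style of in-cache measurement can be sketched with OpenSSL's EVP API, shown below for AES-256-CFB on a 32 KB buffer with a single thread. This is an illustration only; the published results used Oracle-written multi-threaded tests and platform-optimized libraries, not OpenSSL.

    /* aes_cfb_sketch.c: compile with `cc aes_cfb_sketch.c -lcrypto` */
    #include <openssl/evp.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        unsigned char key[32] = {0}, iv[16] = {0};   /* fixed key/IV: throughput test only */
        static unsigned char in[32 * 1024], out[32 * 1024 + EVP_MAX_BLOCK_LENGTH];
        const int iters = 100000;
        int outlen;

        memset(in, 0xA5, sizeof(in));                /* stand-in for the pseudo-random plaintext */
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {
            EVP_EncryptInit_ex(ctx, EVP_aes_256_cfb128(), NULL, key, iv);
            EVP_EncryptUpdate(ctx, out, &outlen, in, sizeof(in));
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("AES-256-CFB, one thread: %.0f MB/sec\n",
               (double)iters * sizeof(in) / 1e6 / sec);

        EVP_CIPHER_CTX_free(ctx);
        return 0;
    }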

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

SHA Digest Encryption: SPARC T7-2 Beats x86 E5 v3

Oracle's cryptography benchmark measures security performance on important Secure Hash Algorithm (SHA) functions. Oracle's SPARC M7 processor with its security software in silicon is faster than current and recent x86 servers. In this test, the performance of on-processor digest operations is measured for three sizes of plaintext inputs (64, 1024 and 8192 bytes) using three SHA2 digests (SHA512, SHA384, SHA256) and the older, weaker SHA1 digest. Multiple parallel threads are used to measure each processor's maximum throughput. Oracle's SPARC T7-2 server shows dramatically faster digest computation compared to current x86 two processor servers.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 17 times faster computing multiple parallel SHA512 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon E5-2699 v3 processors running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 14 times faster computing multiple parallel SHA256 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon E5-2699 v3 processors running Oracle Linux 6.5.

  • SPARC M7 processors running Oracle Solaris 11.3 ran 4.8 times faster computing multiple parallel SHA1 digests of 8 KB inputs (in cache) than Cryptography for Intel Integrated Performance Primitives for Linux (library) on Intel Xeon E5-2699 v3 processors running Oracle Linux 6.5.

  • SHA1 and SHA2 operations are an integral part of Oracle Solaris, while on Linux they are performed using the add-on Cryptography for Intel Integrated Performance Primitives for Linux (library).

Oracle has also measured AES (CFB, GCM, CCM, CBC) cryptographic performance on the SPARC M7 processor.

Performance Landscape

Presented below are results for computing SHA1, SHA256, SHA384 and SHA512 digests for input plaintext sizes of 64, 1024 and 8192 bytes. Results are presented as MB/sec (10**6). All SPARC M7 processor results were run as part of this benchmark effort. All other results were run during previous benchmark efforts.

Digest Performance – SHA512

Performance is presented for SHA512 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 39,201 167,072 184,944
2 x SPARC T5, 3.6 GHz 18,717 73,810 78,997
2 x Intel Xeon E5-2699 v3, 2.3 GHz 3,949 9,214 10,681
2 x Intel Xeon E5-2697 v2, 2.7 GHz 2,681 6,631 7,701

Digest Performance – SHA384

Performance is presented for SHA384 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 39,697 166,898 185,194
2 x SPARC T5, 3.6 GHz 18,814 73,770 78,997
2 x Intel Xeon E5-2699 v3, 2.3 GHz 4,061 9,263 10,678
2 x Intel Xeon E5-2697 v2, 2.7 GHz 2,774 6,669 7,706

Digest Performance – SHA256

Performance is presented for SHA256 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 45,148 113,648 119,929
2 x SPARC T5, 3.6 GHz 21,140 49,483 51,114
2 x Intel Xeon E5-2699 v3, 2.3 GHz 3,446 7,785 8,463
2 x Intel Xeon E5-2697 v2, 2.7 GHz 2,404 5,570 6,037

Digest Performance – SHA1

Performance is presented for SHA1 digest. The digest was computed for 64, 1024 and 8192 bytes of pseudo-random input data (same data for each run).

Processors Performance (MB/sec)
64B input 1024B input 8192B input
2 x SPARC M7, 4.13 GHz 47,640 92,515 97,545
2 x SPARC T5, 3.6 GHz 21,052 40,107 41,584
2 x Intel Xeon E5-2699 v3, 2.3 GHz 6,677 18,165 20,405
2 x Intel Xeon E5-2697 v2, 2.7 GHz 4,649 13,245 14,842

Configuration Summary

SPARC T7-2 server
2 x SPARC M7 processor, 4.13 GHz
1 TB memory
Oracle Solaris 11.3

SPARC T5-2 server
2 x SPARC T5 processor, 3.60 GHz
512 GB memory
Oracle Solaris 11.2

Oracle Server X5-2 system
2 x Intel Xeon E5-2699 v3 processors, 2.30 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Sun Server X4-2 system
2 x Intel Xeon E5-2697 v2 processors, 2.70 GHz
256 GB memory
Oracle Linux 6.5
Intel Integrated Performance Primitives for Linux, Version 8.2 (Update 1) 07 Nov 2014

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption, in-cache and on-chip using various digests, including SHA1 and SHA2 (SHA256, SHA384, SHA512).

The benchmark results were obtained using tests created by Oracle which use various application interfaces to perform the various digests. They were run using optimized libraries for each platform to obtain the best possible performance. The digest tests were run with pseudo-random data of sizes 64 bytes, 1024 bytes and 8192 bytes. The benchmark tests were designed to run out of cache, so memory bandwidth and latency are not the limitations.
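As with the AES tests, the Oracle-written digest tests are not public; a single-threaded sketch of the same kind of in-cache measurement using OpenSSL's EVP digest API (SHA512 over 8 KB inputs) is shown below for illustration only.

    /* sha_sketch.c: compile with `cc sha_sketch.c -lcrypto` */
    #include <openssl/evp.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int main(void)
    {
        static unsigned char buf[8192];              /* one of the three input sizes tested */
        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int mdlen;
        const int iters = 200000;

        memset(buf, 0x5A, sizeof(buf));
        EVP_MD_CTX *ctx = EVP_MD_CTX_create();

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; i++) {
            EVP_DigestInit_ex(ctx, EVP_sha512(), NULL);
            EVP_DigestUpdate(ctx, buf, sizeof(buf));
            EVP_DigestFinal_ex(ctx, md, &mdlen);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("SHA512, 8 KB inputs, one thread: %.0f MB/sec\n",
               (double)iters * sizeof(buf) / 1e6 / sec);

        EVP_MD_CTX_destroy(ctx);
        return 0;
    }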

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

SPECvirt_sc2013: SPARC T7-2 World Record for 2 and 4 Chip Systems

Oracle has had a new result accepted by SPEC as of November 19, 2015. This new result may be found here.

Oracle's SPARC T7-2 server delivered a world record SPECvirt_sc2013 result for systems with two to four chips.

  • The SPARC T7-2 server produced a result of 3026 @ 168 VMs SPECvirt_sc2013.

  • The two-chip SPARC T7-2 server beat the best two-chip x86 Intel E5-2699 v3 server results by nearly 1.9 times (Huawei FusionServer RH2288H V3, HP ProLiant DL360 Gen9).

  • The two-chip SPARC T7-2 server delivered nearly 2.2 times the performance of the four-chip IBM Power System S824 server solution, which used 3.5 GHz POWER8 six-core chips.

  • The SPARC T7-2 server, running the Oracle Solaris 11.3 operating system, utilizes embedded virtualization products such as Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • The SPARC T7-2 server result used Oracle VM Server for SPARC 3.3 and Oracle Solaris Zones providing a flexible, scalable and manageable virtualization environment.

Performance Landscape

Complete benchmark results are at the SPEC website, SPECvirt_sc2013 Results. The following table highlights the leading two- and four-chip results for the benchmark; bigger is better.

SPECvirt_sc2013
Leading Two- to Four-Chip Results
System                          Processor                               Chips  Result @ VMs  Virtualization Software
SPARC T7-2                      SPARC M7 (4.13 GHz, 32 cores)             2     3026 @ 168   Oracle VM Server for SPARC 3.3,
                                                                                              Oracle Solaris Zones
HP DL580 Gen9                   Intel E7-8890 v3 (2.5 GHz, 18 cores)      4     3020 @ 168   Red Hat Enterprise Linux 7.1 KVM
Lenovo System x3850 X6          Intel E7-8890 v3 (2.5 GHz, 18 cores)      4     2655 @ 147   Red Hat Enterprise Linux 6.6 KVM
Huawei FusionServer RH2288H V3  Intel E5-2699 v3 (2.3 GHz, 18 cores)      2     1616 @ 95    Huawei FusionSphere V1R5C10
HP DL360 Gen9                   Intel E5-2699 v3 (2.3 GHz, 18 cores)      2     1614 @ 95    Red Hat Enterprise Linux 7.1 KVM
IBM Power S824                  POWER8 (3.5 GHz, 6 cores)                 4     1370 @ 79    PowerVM Enterprise Edition 2.2.3

Configuration Summary

System Under Test Highlights:

Hardware:
1 x SPARC T7-2 server, with
2 x 4.13 GHz SPARC M7
1 TB memory
2 Sun Dual Port 10GBase-T Adapter
2 Sun Storage Dual 16 Gb Fibre Channel PCIe Universal HBA

Software:
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.3 (LDom)
Oracle Solaris Zones
Oracle iPlanet Web Server 7.0.20
Oracle PHP 5.3.29
Dovecot v2.2.18
Oracle WebLogic Server Standard Edition Release 10.3.6
Oracle Database 12c Enterprise Edition (12.1.0.2.0)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.7.0_85-b15

Storage:
3 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
Oracle Solaris 11.3

1 x Oracle Server X5-2L, with
2 x Intel Xeon Processor E5-2630 v3 8-core 2.4 GHz
32 GB memory
4 x Oracle Flash Accelerator F160 PCIe Card
4x 400 GB SSDs
Oracle Solaris 11.3

Benchmark Description

SPECvirt_sc2013 is SPEC's updated benchmark addressing performance evaluation of datacenter servers used in virtualized server consolidation. SPECvirt_sc2013 measures the end-to-end performance of all system components including the hardware, virtualization platform, and the virtualized guest operating system and application software. It utilizes several SPEC workloads representing applications that are common targets of virtualization and server consolidation. The workloads were made to match a typical server consolidation scenario of CPU resource requirements, memory, disk I/O, and network utilization for each workload. These workloads are modified versions of SPECweb2005, SPECjAppServer2004, SPECmail2008, and SPEC CPU2006. The client-side SPECvirt_sc2013 harness controls the workloads. Scaling is achieved by running additional sets of virtual machines, called "tiles", until overall throughput reaches a peak.

Key Points and Best Practices

  • The SPARC T7-2 server, running Oracle Solaris 11.3, utilizes embedded virtualization products such as Oracle VM Server for SPARC and Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • In order to provide a high level of data integrity and availability, all the benchmark data sets are stored on mirrored (RAID1) storage.

  • Using Oracle VM Server for SPARC to bind the SPARC M7 processor with its local memory optimized system memory use in this virtual environment.

See Also

Disclosure Statement

SPEC and the benchmark name SPECvirt_sc are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 10/25/2015. SPARC T7-2, SPECvirt_sc2013 3026@168 VMs; HP DL580 Gen9, SPECvirt_sc2013 3020@168 VMs; Lenovo x3850 X6, SPECvirt_sc2013 2655@147 VMs; Huawei FusionServer RH2288H V3, SPECvirt_sc2013 1616@95 VMs; HP ProLiant DL360 Gen9, SPECvirt_sc2013 1614@95 VMs; IBM Power S824, SPECvirt_sc2013 1370@79 VMs.

ZFS Encryption: SPARC T7-1 Performance

Oracle's SPARC T7-1 server can encrypt/decrypt at near clear text throughput. The SPARC T7-1 server can encrypt/decrypt on the fly and have CPU cycles left over for the application.

  • The SPARC T7-1 server performed 475,123 clear-text 8K read IOPS. With AES-256-CCM enabled on the file system, 8K read IOPS drop only 3.2% to 461,038.

  • The SPARC T7-1 server performed 461,038 AES-256-CCM 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 224,360 AES-256-CCM 8K read IOPS. The SPARC M7 processor result is 4.1 times faster per chip.

  • The SPARC T7-1 server performed 460,600 AES-192-CCM 8K read IOPS and a two chip x86 E5-2660 v3 server performed 228,654 AES-192-CCM 8K read IOPS. The SPARC M7 processor result is 4.0 times faster per chip.

  • The SPARC T7-1 server performed 465,114 AES-128-CCM 8K read IOPS and a two chip x86 E5-2660 v3 server performed 231,911 AES-128-CCM 8K read IOPS. The SPARC M7 processor result is 4.0 times faster per chip.

  • The SPARC T7-1 server performed 475,123 clear text 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 438,483 clear text 8K read IOPS. The SPARC M7 processor result is 2.2 times faster per chip.

Performance Landscape

Results presented below are for random read performance for 8K size. All of the following results were run as part of this benchmark effort.

Read Performance – 8K
                  SPARC T7-1                            2 x E5-2660 v3
Encryption        IOPS      Resp Time   % Busy          IOPS      Resp Time   % Busy
Clear             475,123   0.8 msec    43%             438,483   0.8 msec    95%
AES-256-CCM       461,038   0.83 msec   56%             224,360   1.6 msec    97%
AES-192-CCM       465,114   0.83 msec   56%             228,654   1.5 msec    97%
AES-128-CCM       465,114   0.82 msec   57%             231,911   1.5 msec    96%

IOPS – IO operations per second
Resp Time – response time
% Busy – percent cpu usage

Configuration Summary

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle Solaris 11.3
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Oracle Server X5-2L system
2 x Intel Xeon Processor E5-2660 V3 (2.60 GHz)
256 GB memory
Oracle Solaris 11.3
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Storage SAN
2 x Brocade 300 FC switches
2 x Sun Storage 6780 array with 64 disk drives / 16 GB Cache

Benchmark Description

The benchmark tests the performance of running an encrypted ZFS file system compared to the non-encrypted (clear text) ZFS file system. The tests were executed with Oracle's Vdbench tool Version 5.04.03. Three different encryption methods are tested, AES-256-CCM, AES-192-CCM and AES-128-CCM.
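For reference, encryption in Oracle Solaris 11 ZFS is a per-dataset property set when the file system is created. A minimal example of creating an AES-256-CCM file system of the kind tested is shown below; the pool and file system names are illustrative, and primarycache=metadata is one way to express the "data cache disabled, meta cache enabled" setting described under Key Points.

    # create an encrypted file system with an 8K record size, caching metadata only
    # (pool/dataset names are illustrative)
    zfs create -o encryption=aes-256-ccm -o keysource=passphrase,prompt \
               -o recordsize=8k -o primarycache=metadata p1/fs001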

Key Points and Best Practices

  • The ZFS file system was configured with data cache disabled, meta cache enabled, 4 pools, 128 LUNs, and 192 file systems with an 8K record size. Data cache was disabled to ensure data would be decrypted as it was read from storage. This is not a recommended setting for normal customer operations.

  • The tests were executed with Oracle's Vdbench tool against 192 file systems. Each file system was run with a queue depth of 2. The script used for testing is listed below.

  • hd=default,jvms=16
    sd=sd001,lun=/dev/zvol/rdsk/p1/vol001,size=5g,hitarea=100m
    sd=sd002,lun=/dev/zvol/rdsk/p1/vol002,size=5g,hitarea=100m
    #
    # sd003 through sd191 statements here
    #
    sd=sd192,lun=/dev/zvol/rdsk/p4/vol192,size=5g,hitarea=100m
    
    # VDBENCH work load definitions for run
    # Sequential write to fill storage.
    wd=swrite1,sd=sd*,readpct=0,seekpct=eof
    
    # Random Read work load.
    wd=rread,sd=sd*,readpct=100,seekpct=random,rhpct=100
    
    # VDBENCH Run Definitions for actual execution of load.
    rd=default,iorate=max,elapsed=3h,interval=10
    rd=seqwritewarmup,wd=swrite1,forxfersize=(1024k),forthreads=(16) 
    
    rd=default,iorate=max,elapsed=10m,interval=10
    
    rd=rread8k-50,wd=rread,forxfersize=(8k),iorate=curve, \
    curve=(95,90,80,70,60,50),forthreads=(2)
    

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Virtualized Storage: SPARC T7-1 Performance

Oracle's SPARC T7-1 server using SR-IOV enabled HBAs can achieve near native throughput. The SPARC T7-1 server, with its dramatically improved compute engine, can also achieve near native throughput with Virtual Disk (VDISK).

  • The SPARC T7-1 server is able to produce 604,219 8K read IOs per second (IOPS) with native Oracle Solaris 11.3 using 8 Gb FC HBAs. The SPARC T7-1 server using Oracle VM Server for SPARC 3.1 with 4 LDoms and virtual disks (VDISK) produced near-native performance of 603,766 8K read IOPS. With SR-IOV enabled using 2 LDoms, the SPARC T7-1 server produced 604,966 8K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 ran 2.8 times faster virtualized IO throughput than a Sun Server X3-2L system (two Intel Xeon E5-2690, running a popular virtualization product). The virtualized x86 system produced 209,166 8K virtualized reads. The native performance of the x86 system was 338,458 8K read IOPS.

  • The SPARC T7-1 server is able to produce 891,025 4K Read IOPS with native Oracle Solaris 11.3 using 8 Gb FC HBAs. The SPARC T7-1 server using Oracle VM Server for SPARC 3.1 with 4 LDOM VDISK produced near native performance of 849,493 4K read IOPS. With SR-IOV enabled using 2 LDOMs, the SPARC T7-1 server produced 891,338 4K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 ran 3.8 times faster virtualized IO throughput than a Sun Server X3-2L system (Intel Xeon E5-2690, running a popular virtualization product). The virtualized x86 system produced 219,830 4K virtualized reads. The native performance of the x86 system was 346,868 4K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 ran 1.3 times faster with 16 Gb HBA compared to 8 Gb HBAs. This is quite impressive considering it was still attached to 8 Gb switches and storage.

Performance Landscape

Results presented below are for read performance, first for 8K and then for 4K transfer sizes. All of the following results were run as part of this benchmark effort.

Read Performance — 8K

System                       8K Read IOPS
                             Native     Virtual Disk   SR-IOV
SPARC T7-1 (16 Gb FC)        796,849    N/A            797,221
SPARC T7-1 (8 Gb FC)         604,219    603,766        604,966
Sun Server X3-2 (8 Gb FC)    338,458    209,166        N/A

Read Performance — 4K

System                       4K Read IOPS
                             Native       Virtual Disk   SR-IOV
SPARC T7-1 (16 Gb FC)        1,185,392    N/A            1,231,808
SPARC T7-1 (8 Gb FC)         891,025      849,493        891,338
Sun Server X3-2 (8 Gb FC)    346,868      219,830        N/A
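
The near-native claim for the virtualized SPARC configurations can be checked directly from the 8K table above. A minimal Python sketch, using only the table values:

    # 8K read IOPS from the table above: virtualized (VDISK) vs native.
    native_iops = {"SPARC T7-1 (8 Gb FC)": 604_219, "Sun Server X3-2 (8 Gb FC)": 338_458}
    vdisk_iops = {"SPARC T7-1 (8 Gb FC)": 603_766,        # Oracle VM Server for SPARC VDISK
                  "Sun Server X3-2 (8 Gb FC)": 209_166}   # popular x86 virtualization product

    for system, native in native_iops.items():
        print(f"{system}: virtualized delivers {vdisk_iops[system] / native:.1%} of native")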

Configuration Summary

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.1
4 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Qlogic
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Sun Server X3-2 system
2 x Intel Xeon Processor E5-2690 (2.90 GHz)
128 GB memory
Oracle Solaris 11.2
Popular Virtualization Software
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Storage SAN
Brocade 5300 Switch
2 x Sun Storage 6780 array with 64 disk drives / 16 GB Cache
2 x Sun Storage 2540-M2 arrays with 36 disk drives / 1.5 GB Cache

Benchmark Description

The benchmark tests the operating system IO efficiency of native and virtual machine environments. The test accesses storage devices raw, with no operating system buffering. The storage space accessed fit within the controller cache on the storage arrays for low latency and highest throughput. All accesses were random 4K or 8K reads.

Tests were executed with Oracle's Vdbench Version 5.04.03 tool against 32 LUNs. Each LUN was run with a queue depth of 32.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Oracle Internet Directory: SPARC T7-2 World Record

Oracle's SPARC T7-2 server running Oracle Internet Directory (OID, Oracle's LDAP Directory Server) on Oracle Solaris 11 on a virtualized processor configuration achieved a record result on the Oracle Internet Directory benchmark.

  • The SPARC T7-2 server, virtualized to use a single processor, achieved world record performance running the Oracle Internet Directory benchmark with 50 million users.

  • The SPARC T7-2 server and Oracle Internet Directory using Oracle Database 12c running on Oracle Solaris 11 achieved a record result of 1.18M LDAP searches/sec with an average latency of 0.85 msec with 1000 clients.

  • The SPARC T7-2 server demonstrated 25% better LDAP search throughput and 23% better latency than a similarly configured SPARC T5 server benchmark environment.

  • Oracle Internet Directory achieved near-linear scalability on the virtualized single-processor domain on the SPARC T7-2 server, scaling from 79K LDAP searches/sec with 2 cores to 1.18M LDAP searches/sec with 32 cores.

  • Oracle Internet Directory and the virtualized single processor domain on the SPARC T7-2 server achieved up to 22,408 LDAP modify/sec with an average latency of 2.23 msec for 50 clients.

Performance Landscape

A virtualized single SPARC M7 processor in a SPARC T7-2 server was used for the test results presented below. The SPARC T7-2 server and SPARC T5-2 server results were run as part of this benchmark effort. The remaining results were part of a previous benchmark effort.

Oracle Internet Directory Tests
System       Chips/   Search                 Modify               Add
             Cores    ops/sec     lat (msec) ops/sec   lat (msec) ops/sec   lat (msec)
SPARC T7-2   1/32     1,177,947   0.85       22,400    2.2        1,436     11.1
SPARC T5-2   2/32     944,624     1.05       16,700    2.9        1,000     15.95
SPARC T4-4   4/32     682,000     1.46       12,000    4.0        835       19.0

Scaling runs were also made on the virtualized single processor domain on the SPARC T7-2 server.

Scaling of Search Tests – SPARC T7-2, One Processor
Cores Clients ops/sec Latency (msec)
32 1000 1,177,947 0.85
24 1000 863,343 1.15
16 500 615,563 0.81
8 500 280,029 1.78
4 100 156,114 0.64
2 100 79,300 1.26
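
The near-linear scaling claim can be checked against the table above; the short Python sketch below computes the speedup relative to the 2-core run (purely an illustration derived from the published table values).

    # Search throughput (ops/sec) by core count, copied from the scaling table.
    ops_by_cores = {2: 79_300, 4: 156_114, 8: 280_029, 16: 615_563, 24: 863_343, 32: 1_177_947}
    base_cores, base_ops = 2, ops_by_cores[2]

    for cores, ops in sorted(ops_by_cores.items()):
        speedup = ops / base_ops
        ideal = cores / base_cores
        print(f"{cores:2d} cores: {speedup:5.2f}x speedup, {speedup / ideal:.0%} of ideal linear")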

Configuration Summary

System Under Test:

SPARC T7-2
2 x SPARC M7 processors, 4.13 GHz
512 GB memory
6 x 600 GB internal disks
1 x Sun Storage ZS3-2 (used for database and log files)
Flash storage (used for redo logs)
Oracle Solaris 11.3
Oracle Internet Directory 11g Release 1 PS7 (11.1.1.7.0)
Oracle Database 12c Enterprise Edition 12.1.0.2 (64-bit)

Benchmark Description

Oracle Internet Directory (OID) is Oracle's LDAPv3 Directory Server. The throughput of five key operations is measured — Search, Compare, Modify, Mix and Add.

LDAP Search Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Search operations; an illustrative sketch of the search pattern follows the list below. The salient characteristics of this test scenario are as follows:

  • SLAMD SearchRate job was used.
  • The base DN of the search is the root of the DIT, the scope is SUBTREE, the search filter is of the form UID=<value>, and DN and UID are the requested attributes.
  • Each LDAP search operation matches a single entry.
  • The total number of concurrent clients was 1000, distributed across two client nodes.
  • Each client binds to OID once and performs repeated LDAP Search operations; each search looks up a unique entry, such that no client looks up the same entry twice, no two clients look up the same entry, and all entries are searched in random order.
  • In one run of the test, random entries from the 50 Million entries are looked up in as many LDAP Search operations.
  • Test job was run for 60 minutes.
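
The sketch below illustrates the kind of search each client repeats, using the Python ldap3 library. The host, port, bind credentials, base DN and UID value are placeholders rather than details from the published configuration; the benchmark itself drove this pattern through the SLAMD SearchRate job, not through code like this.

    # Illustrative only: one LDAP search of the kind the SearchRate job repeats.
    # Host, port, credentials, base DN and the uid value are placeholders.
    from ldap3 import Server, Connection, SUBTREE

    server = Server("ldap://oid-host.example.com:3060")
    conn = Connection(server, user="cn=orcladmin", password="********", auto_bind=True)

    # Subtree search from the root of the DIT, filtering on a single UID and
    # requesting only the uid attribute; each such search matches one entry.
    conn.search(search_base="dc=example,dc=com",
                search_filter="(uid=user0000042)",
                search_scope=SUBTREE,
                attributes=["uid"])
    for entry in conn.entries:
        print(entry.entry_dn, entry.uid)
    conn.unbind()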

LDAP Compare Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Compare operations on the userpassword attribute. The salient characteristics of this test scenario are as follows:

  • SLAMD CompareRate job was used.
  • Each LDAP compare operation matches the userpassword attribute of a user.
  • The total number of concurrent clients was 1000, distributed across two client nodes.
  • Each client binds to OID once and performs repeated LDAP compare operations.
  • In one run of the test, random entries from the 50 Million entries are compared in as many LDAP compare operations.
  • Test job was run for 60 minutes.

LDAP Modify Operations Test

This test scenario consisted of concurrent clients binding once to OID and then performing repeated LDAP Modify operations. The salient characteristics of this test scenario are as follows:

  • SLAMD LDAP modrate job was used.
  • A total of 50 concurrent LDAP clients were used.
  • Each client updates a unique entry each time and a total of 50 Million entries are updated.
  • Test job was run for 60 minutes.
  • Value length was set to 11.
  • The attribute being modified is not indexed.

LDAP Mixed Load Test

The test scenario involved both the LDAP search and LDAP modify clients enumerated above.

  • The ratio involved 60% LDAP search clients, 30% LDAP bind and 10% LDAP modify clients.
  • A total of 1000 concurrent LDAP clients were used and were distributed on 2 client nodes.
  • Test job was run for 60 minutes.

LDAP Add Load Test

The test scenario involved concurrent clients adding new entries as follows.

  • The SLAMD standard add rate job was used.
  • A total of 500,000 entries were added.
  • A total of 16 concurrent LDAP clients were used.
  • SLAMD adds inetorgperson objectclass entries with 21 attributes (including operational attributes).

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

Oracle FLEXCUBE Universal Banking: SPARC T7-1 World Record

Oracle's SPARC T7-1 servers running Oracle FLEXCUBE Universal Banking Release 12 along with Oracle Database 12c Enterprise Edition with Oracle Real Application Clusters on Oracle Solaris 11 produced record results for two-processor solutions.

  • Two SPARC T7-1 servers each running Oracle FLEXCUBE Universal Banking Release 12 (v 12.0.1) and Oracle Real Application Clusters 12c database on Oracle Solaris 11 achieved record End of Year batch processing of 25 million accounts with 200 branches in 4 hrs 34 minutes (total of two processors).

  • A single SPARC T7-1 server running Oracle FLEXCUBE Universal Banking Release 12 processing 100 branches was able to complete the workload in similar time as the two node 200 branches End of Year workload, demonstrating good scaling of the application.

  • The customer-representative workload covered all 25 million accounts, including savings accounts, current accounts, loans and term deposit (TD) accounts, created on the basis of 25 million Customer IDs across 200 branches.

  • Oracle's SPARC M7 and T7 servers running Oracle Solaris 11 with built-in Silicon Secured Memory and Oracle Database 12c can benefit global retail and corporate financial institutions that are running Oracle FLEXCUBE Universal Banking Release 12. The co-engineered Oracle software and hardware unlock agile capabilities demanded by modern business environments.

  • The SPARC T7-1 system and Oracle Solaris are able to provide a combination of uniquely essential characteristics that resonate with core values for a modern financial services institution.

  • The SPARC M7 processor based systems are capable of delivering higher performance and lower total cost of ownership (TCO) than older SPARC infrastructure, without introducing the unseen tax and risk of migrating applications away from older SPARC systems.

Performance Landscape

Oracle FLEXCUBE Universal Banking Release 12
End of Year Batch Processing
System             Branches   Time (minutes)
2 x SPARC T7-1     200        274
1 x SPARC T7-1     100        268

Configuration Summary

Systems Under Test:

2 x SPARC T7-1 each with
1 x SPARC M7 processor, 4.13 GHz
256 GB memory
Oracle Solaris 11.3 (11.3.0.27.0)
Oracle Database 12c (RAC/ASM 12.1.0.2 BP7)
Oracle FLEXCUBE Universal Banking Release 12

Storage Configuration:

Oracle ZFS Storage ZS4-4 appliance

Benchmark Description

The Oracle FLEXCUBE Universal Banking Release 12 benchmark models an actual customer bank with End of Cycle transaction batch jobs which typically execute during non-banking hours. This benchmark includes accrual for savings and term deposit accounts, interest capitalization for savings accounts, interest payout for term deposit accounts, and consumer loan processing.

This benchmark helps banks refine their infrastructure requirements for the volumes and scale of operations for business expansion. The end of cycle can be year, month or day, with year having the most processing followed by month and then day.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

PeopleSoft Human Capital Management 9.1 FP2: SPARC M7-8 World Record

This result demonstrates how Oracle's SPARC M7-8 server using Oracle VM Server for SPARC (LDoms) provides mission critical enterprise virtualization.

  • The virtualized two-chip, 1 TB LDom of the SPARC M7-8 server set a world record two-chip PeopleSoft Human Capital Management (HCM) 9.1 FP2 benchmark result, supporting 35,000 HR Self-Service online users with response times under one second, while simultaneously running the Payroll batch workload.

  • The virtualized two-chip LDom of the SPARC M7-8 server demonstrated 4 times better Search and 6 times better Save average response times running nearly double the number of online users along with payroll batch, compared to the ten-chip x86 solution from Cisco.

  • Using only a single chip in the virtualized two-chip LDom on the SPARC M7-8 server, the batch-only run demonstrated 1.8 times better throughput (payments/hour) compared to a four-chip Cisco UCS B460 M4 server.

  • Using only a single chip in the virtualized two-chip LDom on the SPARC M7-8 server, the batch-only run demonstrated 2.3 times better throughput (payments/hour) compared to a nine-chip IBM zEnterprise z196 server (EC 2817-709, 9-way, 8943 MIPS).

  • This record result demonstrates that a two SPARC M7 processor LDom (in a SPARC M7-8) can run the same number of online users as a dynamic domain (PDom) of eight SPARC M6 processors (in a SPARC M6-32), with better online response times, batch elapsed times and batch throughput (payments/hour).

  • The SPARC M7-8 server provides enterprise applications with high availability and security, where each application executes in its own environment, independent of the others.

Performance Landscape

The first table presents the combined results, running both the PeopleSoft HR Self-Service Online and Payroll Batch tests concurrently.

PeopleSoft HR Self-Service Online And Payroll Batch Using Oracle Database 11g
System (Processors)                     Chips Used   Users    Search/Save           Batch Elapsed Time   Batch Pay/Hr
SPARC M7-8 (SPARC M7), LDom1            2            35,000   0.67 sec / 0.42 sec   22.71 min            1,322,272
SPARC M7-8 (SPARC M7), LDom2            2            35,000   0.85 sec / 0.50 sec   22.96 min            1,307,875
SPARC M6-32 (SPARC M6)                  8            35,000   1.80 sec / 1.12 sec   29.2 min             1,029,440
Cisco 1 x B460 M4, 3 x B200 M3
(Intel E7-4890 v2, Intel E5-2697 v2)    10           18,000   2.70 sec / 2.60 sec   21.70 min            1,383,816

The following results are from running only the PeopleSoft HR Self-Service Online test.

PeopleSoft HR Self-Service Online Using Oracle Database 11g
System (Processors)                     Chips Used   Users    Search/Save Avg Response Times
SPARC M7-8 (SPARC M7), LDom1            2            40,000   0.55 sec / 0.33 sec
SPARC M7-8 (SPARC M7), LDom2            2            40,000   0.56 sec / 0.32 sec
SPARC M6-32 (SPARC M6)                  8            40,000   2.73 sec / 1.33 sec
Cisco 1 x B460 M4, 3 x B200 M3
(Intel E7-4890 v2, Intel E5-2697 v2)    10           20,000   0.35 sec / 0.17 sec

The following results are from running only the PeopleSoft Payroll Batch test. For the SPARC M7-8 server results, only one of the processors was used per LDom. This was accomplished using processor sets to further restrict the test to a single SPARC M7 processor.

PeopleSoft Payroll Batch Using Oracle Database 11g
System (Processors)                         Chips Used   Batch Elapsed Time   Batch Pay/Hr
SPARC M7-8 (SPARC M7), LDom1                1            13.06 min            2,299,296
SPARC M7-8 (SPARC M7), LDom2                1            12.85 min            2,336,872
SPARC M6-32 (SPARC M6)                      2            18.27 min            1,643,612
Cisco UCS B460 M4 (Intel E7-4890 v2)        4            23.02 min            1,304,655
IBM z196 zEnterprise (5.2 GHz, 8943 MIPS)   9            30.50 min            984,551
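
The payments-per-hour column follows directly from the payroll population and the elapsed time. Assuming one payment per employee for the 500,480-employee database described in the Benchmark Description below, a short Python check reproduces the SPARC M7-8 rows.

    # payments/hour = payments processed / elapsed hours
    # (500,480 employees, one payment each assumed; see Benchmark Description)
    payments = 500_480
    for ldom, elapsed_min in [("LDom1", 13.06), ("LDom2", 12.85)]:
        print(f"{ldom}: {payments / (elapsed_min / 60):,.0f} payments/hour")
    # LDom1: 2,299,296   LDom2: 2,336,872 -- matching the table above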

Configuration Summary

System Under Test:

SPARC M7-8 server with
8 x SPARC M7 processor (4.13 GHz)
4 TB memory
Virtualized as two Oracle VM Server for SPARC (LDom) each with
2 x SPARC M7 processor (4.13 GHz)
1 TB memory

Storage Configuration:

2 x Oracle ZFS Storage ZS3-2 appliance (DB Data) each with
40 x 300 GB 10K RPM SAS-2 HDD,
8 x Write Flash Accelerator SSD and
2 x Read Flash Accelerator SSD 1.6TB SAS
2 x Oracle Server X5-2L (DB redo logs & App object cache) each with
2 x Intel Xeon Processor E5-2630 v3
32 GB memory
4 x 1.6 TB NVMe SSD

Software Configuration:

Oracle Solaris 11.3
Oracle Database 11g Release 2 (11.2.0.3.0)
PeopleSoft Human Capital Management 9.1 FP2
PeopleSoft PeopleTools 8.52.03
Oracle Java SE 6u32
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 043
Oracle WebLogic Server 11g (10.3.5)

Benchmark Description

The PeopleSoft Human Capital Management benchmark simulates thousands of online employees, managers and Human Resource administrators executing transactions typical of a Human Resources Self Service application for the Enterprise. Typical transactions are: viewing paychecks, promoting and hiring employees, updating employee profiles, etc. The database tier uses a database instance of about 500 GB in size, containing information for 500,480 employees. The application tier for this test includes web and application server instances, specifically Oracle WebLogic Server 11g, PeopleSoft Human Capital Management 9.1 FP2 and Oracle Java SE 6u32.

Key Points and Best Practices

In the HR online along with Payroll batch run, each LDom had one Oracle Solaris Zone of 7 cores containing the Web tier, two Oracle Solaris Zones of 16 cores each containing the Application tier, and one Oracle Solaris Zone of 23 cores containing the Database tier; two cores were dedicated to network and disk interrupt handling. In the HR online only run, each LDom had one Oracle Solaris Zone of 12 cores containing the Web tier, two Oracle Solaris Zones of 18 cores each containing the Application tier, and one Oracle Solaris Zone of 14 cores containing the Database tier; two cores were dedicated to network and disk interrupt handling. In the Payroll batch only run, each LDom had one Oracle Solaris Zone of 31 cores containing the Database tier; one core was dedicated to disk interrupt handling.

All database data files, recovery files and Oracle Clusterware files for the PeopleSoft test were created with the Oracle Automatic Storage Management (Oracle ASM) volume manager for the added benefit of the ease of management provided by Oracle ASM integrated storage management solution.

In the application tier on each LDom, 5 PeopleSoft application domains with 350 application servers (70 per domain) were hosted in two separate Oracle Solaris Zones for a total of 10 domains with 700 application server processes.

All PeopleSoft Application processes and the 32 Web Server JVM instances were executed in the Oracle Solaris FX scheduler class.

See Also

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Oracle E-Business Payroll Batch Extra-Large: SPARC T7-1 World Record

Oracle's SPARC T7-1 server set a world record running the Oracle E-Business Suite 12.1.3 Standard Extra-Large (250,000 Employees) Payroll (Batch) workload.

  • The SPARC T7-1 server produced a world record result of 1,527,494 employee records processed per hour (9.82 min elapsed time) on the Oracle E-Business Suite R12 (12.1.3) Extra-Large Payroll (Batch) benchmark.

  • The SPARC T7-1 server, equipped with one 4.13 GHz SPARC M7 processor, demonstrated 36% better hourly employee throughput compared to a two-chip Cisco UCS B200 M4 (Intel Xeon E5-2697 v3).

  • The SPARC T7-1 server, equipped with one 4.13 GHz SPARC M7 processor, demonstrated 40% better hourly employee throughput compared to a two-chip IBM S824 (POWER8, using 12 cores total).

Performance Landscape

This is the world record result for the Payroll Extra-Large model using Oracle E-Business 12.1.3 workload.

Batch Workload: Payroll Extra-Large Model
System Processor Employees/Hr Elapsed Time
SPARC T7-1 1 x SPARC M7 (4.13 GHz) 1,527,494 9.82 minutes
Cisco UCS B200 M4 2 x Intel Xeon Processor E5-2697 v3 1,125,281 13.33 minutes
IBM S824 2 x POWER8 (3.52 GHz) 1,090,909 13.75 minutes
Cisco UCS B200 M3 2 x Intel Xeon Processor E5-2697 v2 1,017,639 14.74 minutes
Cisco UCS B200 M3 2 x Intel Xeon Processor E5-2690 839,865 17.86 minutes
Sun Server X3-2L 2 x Intel Xeon Processor E5-2690 789,473 19.00 minutes
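
The hourly employee throughput figures are derived from the fixed Extra-Large payroll population (250,000 employees) and the elapsed batch time; the Python fragment below reproduces the SPARC T7-1 number (small differences come only from rounding of the elapsed time).

    # employees/hour = 250,000 employees / elapsed hours (Extra-Large payroll model)
    employees = 250_000
    elapsed_minutes = 9.82                     # SPARC T7-1 elapsed time from the table
    print(f"{employees / (elapsed_minutes / 60):,.0f} employees/hour")
    # ~1,527,495, consistent with the published 1,527,494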

Configuration Summary

Hardware Configuration:

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle ZFS Storage ZS3-2 appliance (DB Data storage) with
40 x 900 GB 10K RPM SAS-2 HDD,
8 x Write Flash Accelerator SSD and
2 x Read Flash Accelerator SSD 1.6 TB SAS
Oracle Flash Accelerator F160 PCIe Card (1.6 TB NVMe for DB Log storage)

Software Configuration:

Oracle Solaris 11.3
Oracle E-Business Suite R12 (12.1.3)
Oracle Database 11g (11.2.0.3.0)

Benchmark Description

The Oracle E-Business Suite Standard R12 Benchmark combines online transaction execution by simulated users with concurrent batch processing to model a typical scenario for a global enterprise. This benchmark ran one Batch component, Payroll, in the Extra-Large size.

Results can be published in four sizes and use one or more online/batch modules:

  • X-large: Maximum online users running all business flows between 10,000 and 20,000; 750,000 order to cash lines per hour and 250,000 payroll checks per hour.
    • Order to Cash Online — 2400 users
      • The percentage across the 5 transactions in Order Management module is:
        • Insert Manual Invoice — 16.66%
        • Insert Order — 32.33%
        • Order Pick Release — 16.66%
        • Ship Confirm — 16.66%
        • Order Summary Report — 16.66%
    • HR Self-Service — 4000 users
    • Customer Support Flow — 8000 users
    • Procure to Pay — 2000 users
  • Large: 10,000 online users; 100,000 order to cash lines per hour and 100,000 payroll checks per hour.
  • Medium: up to 3000 online users; 50,000 order to cash lines per hour and 10,000 payroll checks per hour.
  • Small: up to 1000 online users; 10,000 order to cash lines per hour and 5,000 payroll checks per hour.

Key Points and Best Practices

  • All system optimizations are in the published report which is referenced in the See Also section below.

See Also

Disclosure Statement

Oracle E-Business X-Large Payroll Batch workload, SPARC T7-1, 4.13 GHz, 1 chip, 32 cores, 256 threads, 256 GB memory, elapsed time 9.82 minutes, 1,527,494 hourly employee throughput, Oracle Solaris 11.3, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 10/25/2015.

Oracle E-Business Suite Applications R12.1.3 (OLTP X-Large): SPARC M7-8 World Record

Oracle's SPARC M7-8 server, using a four-chip Oracle VM Server for SPARC (LDom) virtualized server, produced a world record 20,000 users running the Oracle E-Business OLTP X-Large benchmark. The benchmark runs five Oracle E-Business online workloads concurrently: Customer Service, iProcurement, Order Management, Human Resources Self-Service, and Financials.

  • The virtualized four-chip LDom on the SPARC M7-8 was able to handle more users than the previous best result, which used eight processors of Oracle's SPARC M6-32 server.

  • The SPARC M7-8 server using Oracle VM Server for SPARC provides enterprise applications high availability, where each application executes in its own environment, insulated and independent of the others.

Performance Landscape

Oracle E-Business (3-tier) OLTP X-Large Benchmark
System        Chips   Total Online Users   Weighted Average          90th Percentile
                                           Response Time (sec)       Response Time (sec)
SPARC M7-8    4       20,000               0.70                      1.13
SPARC M6-32   8       18,500               0.61                      1.16

Breakdown of the total number of users by component.

Users per Component
Component            SPARC M7-8     SPARC M6-32
Total Online Users   20,000 users   18,500 users
HR Self-Service      5000 users     4000 users
Order-to-Cash        2500 users     2300 users
iProcurement         2700 users     2400 users
Customer Service     7000 users     7000 users
Financial            2800 users     2800 users

Configuration Summary

System Under Test:

SPARC M7-8 server
8 x SPARC M7 processors (4.13 GHz)
4 TB memory
2 x 600 GB SAS-2 HDD
using a Logical Domain with
4 x SPARC M7 processors (4.13 GHz)
2 TB memory
2 x Sun Storage Dual 16Gb Fibre Channel PCIe Universal HBA
2 x Sun Dual Port 10GBase-T Adapter
Oracle Solaris 11.3
Oracle E-Business Suite 12.1.3
Oracle Database 11g Release 2

Storage Configuration:

4 x Oracle ZFS Storage ZS3-2 appliances each with
2 x Read Flash Accelerator SSD
1 x Storage Drive Enclosure DE2-24P containing:
20 x 900 GB 10K RPM SAS-2 HDD
4 x Write Flash Accelerator SSD
1 x Sun Storage Dual 8Gb FC PCIe HBA
Used for Database files, Zones OS, EBS Mid-Tier Apps software stack
and db-tier Oracle Server
2 x Sun Server X4-2L server with
2 x Intel Xeon Processor E5-2650 v2
128 GB memory
1 x Sun Storage 6Gb SAS PCIe RAID HBA
4 x 400 GB SSD
14 x 600 GB HDD
Used for Redo log files, db backup storage.

Benchmark Description

The Oracle E-Business OLTP X-Large benchmark simulates thousands of online users executing transactions typical of an internal Enterprise Resource Planning deployment, simultaneously executing five application modules: Customer Service, Human Resources Self Service, iProcurement, Order Management and Financials.

Each database tier uses a database instance of about 600 GB in size, supporting thousands of application users, accessing hundreds of objects (tables, indexes, SQL stored procedures, etc.).

Key Points and Best Practices

This test demonstrates virtualization technologies running concurrently various Oracle multi-tier business critical applications and databases on four SPARC M7 processors contained in a single SPARC M7-8 server supporting thousands of users executing a high volume of complex transactions with constrained (<1 sec) weighted average response time.

The Oracle E-Business LDom is further configured using Oracle Solaris Zones.

This result of 20,000 users was achieved by load balancing the Oracle E-Business Suite Applications 12.1.3 five online workloads across two Oracle Solaris processor sets and redirecting all network interrupts to a dedicated third processor set.

Each application processor set (set-1 and set-2) was concurrently running two Oracle E-Business Suite Application servers and two database server instances, each within its own Oracle Solaris Zone (4 Zones per set).

Each application server network interface (to a client) was configured to map to the locality group associated with the CPUs processing the related workload, to guarantee memory locality of network structures and application server hardware resources.

All external storage was connected with at least two paths to the host's multipath-capable Fibre Channel controller ports, and the Oracle Solaris I/O multipathing feature was enabled.

See Also

Disclosure Statement

Oracle E-Business Suite R12 extra-large multiple-online module benchmark, SPARC M7-8, SPARC M7, 4.13 GHz, 4 chips, 128 cores, 1024 threads, 2 TB memory, 20,000 online users, average response time 0.70 sec, 90th percentile response time 1.13 sec, Oracle Solaris 11.3, Oracle Solaris Zones, Oracle VM Server for SPARC, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 10/25/2015.

Oracle E-Business Order-To-Cash Batch Large: SPARC T7-1 World Record

Oracle's SPARC T7-1 server set a world record running the Oracle E-Business Suite 12.1.3 Standard Large (100,000 Order/Inventory Lines) Order-To-Cash (Batch) workload.

  • The SPARC T7-1 server produced a world record order line throughput of 273,973 per hour (21.90 min elapsed time) on the Oracle E-Business Suite R12 (12.1.3) Large Order-To-Cash (Batch) benchmark, using a SPARC T7-1 server for the database and application tiers running Oracle Database 11g on Oracle Solaris 11.

  • The SPARC T7-1 server demonstrated 12% better hourly order line throughput compared to a two-chip Cisco UCS B200 M4 (Intel Xeon Processor E5-2697 v3).

Performance Landscape

Results for the Oracle E-Business 12.1.3 Order-To-Cash Batch Large model workload.

Batch Workload: Order-To-Cash Large Model
System CPU Order Lines/Hr Elapsed Time (min)
SPARC T7-1 1 x SPARC M7 processor 273,973 21.90
Cisco UCS B200 M4 2 x Intel Xeon Processor E5-2697 v3 243,803 24.61
Cisco UCS B200 M3 2 x Intel Xeon Processor E5-2690 232,739 25.78

Configuration Summary

Hardware Configuration:

SPARC T7-1 server with
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle ZFS Storage ZS3-2 appliance (DB Data storage) with
40 x 900 GB 10K RPM SAS-2 HDD,
8 x Write Flash Accelerator SSD and
2 x Read Flash Accelerator SSD 1.6TB SAS
Oracle Flash Accelerator F160 PCIe Card (1.6 TB NVMe for DB Log storage)

Software Configuration:

Oracle Solaris 11.3
Oracle E-Business Suite R12 (12.1.3)
Oracle Database 11g (11.2.0.3.0)

Benchmark Description

The Oracle E-Business Suite Standard R12 Benchmark combines online transaction execution by simulated users with concurrent batch processing to model a typical scenario for a global enterprise. This benchmark ran one Batch component, Order-To-Cash, in the Large size.

Results can be published in four sizes and use one or more online/batch modules:

  • X-large: Maximum online users running all business flows between 10,000 and 20,000; 750,000 order to cash lines per hour and 250,000 payroll checks per hour.
    • Order to Cash Online — 2400 users
      • The percentage across the 5 transactions in Order Management module is:
        • Insert Manual Invoice — 16.66%
        • Insert Order — 32.33%
        • Order Pick Release — 16.66%
        • Ship Confirm — 16.66%
        • Order Summary Report — 16.66%
    • HR Self-Service — 4000 users
    • Customer Support Flow — 8000 users
    • Procure to Pay — 2000 users
  • Large: 10,000 online users; 100,000 order to cash lines per hour and 100,000 payroll checks per hour.
  • Medium: up to 3000 online users; 50,000 order to cash lines per hour and 10,000 payroll checks per hour.
  • Small: up to 1000 online users; 10,000 order to cash lines per hour and 5,000 payroll checks per hour.

Key Points and Best Practices

  • All system optimizations are in the published report, find link in See Also section below.

See Also

Disclosure Statement

Oracle E-Business Large Order-To-Cash Batch workload, SPARC T7-1, 4.13 GHz, 1 chip, 32 cores, 256 threads, 256 GB memory, elapsed time 21.90 minutes, 273,973 hourly order line throughput, Oracle Solaris 11.3, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 10/25/2015.

SAP Two-Tier Standard Sales and Distribution SD Benchmark: SPARC T7-2 World Record 2 Processors

Oracle's SPARC T7-2 server produces a world record result for 2-processors on the SAP two-tier Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0 (2 chips / 64 cores / 512 threads).

  • The SPARC T7-2 server achieved 30,800 SAP SD benchmark users running the two-tier SAP Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0.

  • The SPARC T7-2 server achieved 1.9 times more users than the Dell PowerEdge R730 server result.

  • The SPARC T7-2 server achieved 1.5 times more users than the IBM Power System S824 server result.

  • The SPARC T7-2 server achieved 1.9 times more users than the HP ProLiant DL380 Gen9 server result.

  • The SPARC T7-2 server result was run with Oracle Solaris 11 and used Oracle Database 12c.

Performance Landscape

The SAP SD two-tier performance table below lists, in decreasing performance order, the leading two-processor systems plus the four-processor IBM Power System S824 server, with SAP Enhancement Package 5 for SAP ERP 6.0 results (the current version of the benchmark as of May 2012).

SAP SD Two-Tier Benchmark
System / Processor / OS / Database                             Users     Resp Time (sec)   Version   Cert#
SPARC T7-2, 2 x SPARC M7 (2x 32core),
Oracle Solaris 11, Oracle Database 12c                         30,800    0.96              EHP5      2015050
IBM Power S824, 4 x POWER8 (4x 6core),
AIX 7, DB2 10.5                                                21,212    0.98              EHP5      2014016
Dell PowerEdge R730, 2 x Intel E5-2699 v3 (2x 18core),
Red Hat Enterprise Linux 7, SAP ASE 16                         16,500    0.99              EHP5      2014033
HP ProLiant DL380 Gen9, 2 x Intel E5-2699 v3 (2x 18core),
Red Hat Enterprise Linux 6.5, SAP ASE 16                       16,101    0.99              EHP5      2014032

Version – Version of SAP, EHP5 refers to SAP ERP 6.0 Enhancement Package 5 for SAP ERP 6.0

The number of cores presented is per chip; to get system totals, multiply by the number of chips.

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Configuration Summary and Results

Database/Application Server:

1 x SPARC T7-2 server with
2 x SPARC M7 processors (4.13 GHz, total of 2 processors / 64 cores / 512 threads)
1 TB memory
Oracle Solaris 11.3
Oracle Database 12c

Database Storage:
3 x Sun Server X3-2L each with
2 x Intel Xeon Processors E5-2609 (2.4 GHz)
16 GB memory
4 x Sun Flash Accelerator F40 PCIe Card
12 x 3 TB SAS disks
Oracle Solaris 11

REDO log Storage:
1 x Pillar FS-1 Flash Storage System, with
2 x FS1-2 Controller (Netra X3-2)
2 x FS1-2 Pilot (X4-2)
4 x DE2-24P Disk enclosure
96 x 300 GB 10000 RPM SAS Disk Drive Assembly

Certified Results (published by SAP)

Number of SAP SD benchmark users: 30,800
Average dialog response time: 0.96 seconds
Throughput:
  Fully processed order line items per hour: 3,372,000
  Dialog steps per hour: 10,116,000
  SAPS: 168,600
Average database request time (dialog/update): 0.022 sec / 0.047 sec
SAP Certification: 2015050
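
The three throughput figures above are tied together by the SAPS definition: 100 SAPS corresponds to 2,000 fully business processed order line items per hour, which in the SD benchmark equates to 6,000 dialog steps per hour. A small Python check using the certified order line item figure:

    # 100 SAPS = 2,000 fully processed order line items/hour = 6,000 dialog steps/hour
    order_line_items_per_hour = 3_372_000          # from the certified results above
    dialog_steps_per_hour = order_line_items_per_hour * 3
    saps = order_line_items_per_hour // 20
    print(dialog_steps_per_hour, saps)             # 10116000 168600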

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is an ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard application benchmarks, SAP Enhancement Package 5 for SAP ERP 6.0 as of 10/23/15:

SPARC T7-2 (2 processors, 64 cores, 512 threads) 30,800 SAP SD users, 2 x 4.13 GHz SPARC M7, 1 TB memory, Oracle Database 12c, Oracle Solaris 11, Cert# 2015050.
IBM Power System S824 (4 processors, 24 cores, 192 threads) 21,212 SAP SD users, 4 x 3.52 GHz POWER8, 512 GB memory, DB2 10.5, AIX 7, Cert# 2014016.
Dell PowerEdge R730 (2 processors, 36 cores, 72 threads) 16,500 SAP SD users, 2 x 2.3 GHz Intel Xeon Processor E5-2699 v3, 256 GB memory, SAP ASE 16, RHEL 7, Cert# 2014033.
HP ProLiant DL380 Gen9 (2 processors, 36 cores, 72 threads) 16,101 SAP SD users, 2 x 2.3 GHz Intel Xeon Processor E5-2699 v3, 256 GB memory, SAP ASE 16, RHEL 6.5, Cert# 2014032.

SAP, R/3, reg TM of SAP AG in Germany and other countries. More info www.sap.com/benchmark

SPARC T7-1 Delivers 1-Chip World Records for SPEC CPU2006 Rate Benchmarks

This page has been updated on November 19, 2015. The SPARC T7-1 server results have been published at www.spec.org.

Oracle's SPARC T7-1 server delivered world record SPEC CPU2006 rate benchmark results for systems with one chip. This was accomplished with Oracle Solaris 11.3 and Oracle Solaris Studio 12.4 software.

  • The SPARC T7-1 server achieved world record scores of 1200 SPECint_rate2006, 1120 SPECint_rate_base2006, 832 SPECfp_rate2006, and 801 SPECfp_rate_base2006.

  • The SPARC T7-1 server beat the 1 chip Fujitsu CELSIUS C740 with an Intel Xeon Processor E5-2699 v3 by 1.7x on the SPECint_rate2006 benchmark. The SPARC T7-1 server beat the 1 chip NEC Express5800/R120f-1M with an Intel Xeon Processor E5-2699 v3 by 1.8x on the SPECfp_rate2006 benchmark.

  • The SPARC T7-1 server beat the 1 chip IBM Power S812LC server with a POWER8 processor by 1.9 times on the SPECint_rate2006 benchmark and by 1.8 times on the SPECfp_rate2006 benchmark.

  • The SPARC T7-1 server beat the 1 chip Fujitsu SPARC M10-4S with a SPARC64 X+ processor by 2.2x on the SPECint_rate2006 benchmark and by 1.8x on the SPECfp_rate2006 benchmark.

  • The SPARC T7-1 server improved upon the previous generation SPARC platform, which used the SPARC T5 processor, by 2.5 times on the SPECint_rate2006 benchmark and by 2.3 times on the SPECfp_rate2006 benchmark.

The SPEC CPU2006 benchmarks are derived from the compute-intensive portions of real applications, stressing chip, memory hierarchy, and compilers. The benchmarks are not intended to stress other computer components such as networking, the operating system, or the I/O system. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on Java computing, enterprise computing, and network file systems.

Performance Landscape

Complete benchmark results are at the SPEC website. The tables below provide the new Oracle results, as well as select results from other vendors.

Presented are single chip SPEC CPU2006 rate results. Only the best results published at www.spec.org per chip type are presented (best Intel, IBM, Fujitsu, Oracle chips).

SPEC CPU2006 Rate Results – One Chip
System Chip Peak Base
  SPECint_rate2006
SPARC T7-1 SPARC M7 (4.13 GHz, 32 cores) 1200 1120
Fujitsu CELSIUS C740 Intel E5-2699 v3 (2.3 GHz, 18 cores) 715 693
IBM Power S812LC POWER8 (2.92 GHz, 10 cores) 642 482
Fujitsu SPARC M10-4S SPARC64 X+ (3.7 GHz, 16 cores) 546 479
SPARC T5-1B SPARC T5 (3.6 GHz, 16 cores) 489 441
IBM Power 710 Express POWER7 (3.55 GHz, 8 cores) 289 255
  SPECfp_rate2006
SPARC T7-1 SPARC M7 (4.13 GHz, 32 cores) 832 801
NEC Express5800/R120f-1M Intel E5-2699 v3 (2.3 GHz, 18 cores) 474 460
IBM Power S812LC POWER8 (2.92 GHz, 10 cores) 468 394
Fujitsu SPARC M10-4S SPARC64 X+ (3.7 GHz, 16 cores) 462 418
SPARC T5-1B SPARC T5 (3.6 GHz, 16 cores) 369 350
IBM Power 710 Express POWER7 (3.55 GHz, 8 cores) 248 229

The following table compares the performance of the single-chip SPARC M7 processor based server to the best published two-chip POWER8 processor based server.

SPEC CPU2006 Rate Results
Comparing One SPARC M7 Chip to Two POWER8 Chips
System Chip Peak Base
  SPECint_rate2006
SPARC T7-1 1 x SPARC M7 (4.13 GHz, 32core) 1200 1120
IBM Power S822LC 2 x POWER8 (2.92 GHz, 2x 10core) 1100 853
  SPECfp_rate2006
SPARC T7-1 1 x SPARC M7 (4.13 GHz, 32 cores) 832 801
IBM Power S822LC 2 x POWER8 (2.92 GHz, 2x 10core) 888 745

Configuration Summary

System Under Test:

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB DIMMs)
800 GB on 4 x 400 GB SAS SSD (mirrored)
Oracle Solaris 11.3
Oracle Solaris Studio 12.4 with 4/15 Patch Set

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark. It measures:

  • Speed — single copy performance of chip, memory, compiler
  • Rate — multiple copy (throughput)

The benchmark is also divided into integer intensive applications and floating point intensive applications:

  • integer: 12 benchmarks derived from applications such as artificial intelligence chess playing, artificial intelligence go playing, quantum computer simulation, perl, gcc, XML processing, and pathfinding
  • floating point: 17 benchmarks derived from applications, including chemistry, physics, genetics, and weather.

It is also divided depending upon the amount of optimization allowed:

  • base: optimization is consistent per compiled language, all benchmarks must be compiled with the same flags per language.
  • peak: specific compiler optimization is allowed per application.

The overall metrics for the benchmark which are commonly used are:

  • SPECint_rate2006, SPECint_rate_base2006: integer, rate
  • SPECfp_rate2006, SPECfp_rate_base2006: floating point, rate
  • SPECint2006, SPECint_base2006: integer, speed
  • SPECfp2006, SPECfp_base2006: floating point, speed
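
For readers unfamiliar with how the overall numbers are formed: each reported metric is a geometric mean of the per-benchmark ratios (for the rate metrics, each ratio reflects how many copies were run and how fast they ran relative to the reference machine). The Python sketch below shows only the aggregation step; the ratio values are invented for illustration and are not from any published result.

    from math import prod

    # Hypothetical per-benchmark rate ratios (illustration only).
    ratios = [1180.0, 1250.0, 990.0, 1310.0, 1150.0]
    overall = prod(ratios) ** (1.0 / len(ratios))    # geometric mean, as SPEC reports
    print(f"overall rate metric: {overall:.0f}")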

Key Points and Best Practices

  • Jobs were bound using pbind.

See Also

Disclosure Statement

SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of November 19, 2015 from www.spec.org.
SPARC T7-1: 1200 SPECint_rate2006, 1120 SPECint_rate_base2006, 832 SPECfp_rate2006, 801 SPECfp_rate_base2006; SPARC T5-1B: 489 SPECint_rate2006, 440 SPECint_rate_base2006, 369 SPECfp_rate2006, 350 SPECfp_rate_base2006; Fujitsu SPARC M10-4S: 546 SPECint_rate2006, 479 SPECint_rate_base2006, 462 SPECfp_rate2006, 418 SPECfp_rate_base2006. IBM Power 710 Express: 289 SPECint_rate2006, 255 SPECint_rate_base2006, 248 SPECfp_rate2006, 229 SPECfp_rate_base2006; Fujitsu CELSIUS C740: 715 SPECint_rate2006, 693 SPECint_rate_base2006; NEC Express5800/R120f-1M: 474 SPECfp_rate2006, 460 SPECfp_rate_base2006; IBM Power S822LC: 1100 SPECint_rate2006, 853 SPECint_rate_base2006, 888 SPECfp_rate2006, 745 SPECfp_rate_base2006; IBM Power S812LC: 642 SPECint_rate2006, 482 SPECint_rate_base2006, 468 SPECfp_rate2006, 394 SPECfp_rate_base2006.

SPARC T7-4 Delivers 4-Chip World Record for SPEC OMP2012

Oracle's SPARC T7-4 server delivered world record performance on the SPEC OMP2012 benchmark for systems with four chips. This was accomplished with Oracle Solaris 11.3 and Oracle Solaris Studio 12.4 software.

  • The SPARC T7-4 server delivered a world record result for systems with four chips of 27.9 SPECompG_peak2012 and 26.4 SPECompG_base2012.

  • The SPARC T7-4 server beat the four chip HP ProLiant DL580 Gen9 with Intel Xeon Processor E7-8890 v3 by 29% on the SPECompG_peak2012 benchmark.

  • This SPEC OMP2012 benchmark result demonstrates that the SPARC M7 processor performs well on floating-point intensive technical computing and modeling workloads.

Performance Landscape

Complete benchmark results are at the SPEC website, SPEC OMP2012 Results. The table below provides the new Oracle result as well as the previous best four chip results.

SPEC OMP2012 Results
Four Chip Results
System Processor Peak Base
SPARC T7-4 SPARC M7, 4.13 GHz 27.9 26.4
HP ProLiant DL580 Gen9 Intel Xeon E7-8890 v3, 2.5 GHz 21.5 20.4
Cisco UCS C460 M4 Intel Xeon E7-8890 v3, 2.5 GHz -- 20.8

Configuration Summary

Systems Under Test:

SPARC T7-4
4 x 4.13 GHz SPARC M7 processors
2 TB memory (64 x 32 GB DIMMs)
4 x 600 GB SAS 10,000 RPM HDD (mirrored)
Oracle Solaris 11.3 (11.3.0.30.0)
Oracle Solaris Studio 12.4 with 4/15 Patch Set

Benchmark Description

The following was taken from the SPEC website.

SPEC OMP2012 focuses on compute intensive performance, which means these benchmarks emphasize the performance of:

  • the computer processor (CPU),
  • the memory architecture,
  • the parallel support libraries, and
  • the compilers.

It is important to remember the contribution of the latter three components. SPEC OMP performance intentionally depends on more than just the processor.

SPEC OMP2012 contains a suite that focuses on parallel computing performance using the OpenMP parallelism standard.

The suite can be used to measure along the following vector:

  • Compilation method: Consistent compiler options across all programs of a given language (the base metrics) and, optionally, compiler options tuned to each program (the peak metrics).

SPEC OMP2012 is not intended to stress other computer components such as networking, the operating system, graphics, or the I/O system. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on graphics, distributed Java computing, webservers, and network file systems.

Key Points and Best Practices

  • Jobs were bound using OMP_PLACES.

See Also

Disclosure Statement

SPEC and the benchmark name SPEC OMP are registered trademarks of the Standard Performance Evaluation Corporation. Results as of November 11, 2015 from www.spec.org. SPARC T7-4 (4 chips, 128 cores, 1024 threads): 27.9 SPECompG_peak2012, 26.4 SPECompG_base2012; HP ProLiant DL580 Gen9 (4 chips, 72 cores, 144 threads): 21.5 SPECompG_peak2012, 20.4 SPECompG_base2012; Cisco UCS C460 M4 (4 chips, 72 cores, 144 threads): 20.8 SPECompG_base2012.

Friday Mar 20, 2015

Oracle ZFS Storage ZS4-4 Shows 1.8x Generational Performance Improvement on SPC-2 Benchmark

The Oracle ZFS Storage ZS4-4 appliance delivered 1.8x improved performance and 1.3x improved price performance over the previous generation Oracle ZFS Storage ZS3-4 appliance as shown by the SPC-2 benchmark.

  • Running the SPC-2 benchmark, the Oracle ZFS Storage ZS4-4 appliance delivered SPC-2 Price-Performance of $17.09 and an overall score of 31,486.23 SPC-2 MBPS.

  • Oracle ZFS Storage appliances continue their strong price-performance showing by occupying three of the top five SPC-2 price-performance positions.

  • Oracle holds three of the top four performance results on the SPC-2 benchmark for HDD-based systems.

  • The Oracle ZFS Storage ZS4-4 appliance has a 7.6x price-performance advantage over the IBM DS8870 and 2x performance advantage as measured by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 5.0x performance advantage over the new Fujitsu DX200 S3 as measured by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 4.6x price-performance advantage over the Fujitsu ETERNUS DX8870 S2 and a 1.9x performance advantage as shown by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 5.6x price-performance advantage over the Hitachi Virtual Storage Platform (VSP) and a 2.4x performance advantage as measured by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 1.6x price-performance advantage over the HP XP7 disk array as shown by the SPC-2 benchmark (HP even discounted their hardware 63%).

Performance Landscape

SPC-2 Price-Performance

Below is a table of the top SPC-2 Price-Performance results for HDD storage based systems, presented in increasing price-performance order (as of 03/17/2015). The complete set of results may be found at SPC2 top 10 Price-Performance list.

System                         SPC-2 MBPS   $/SPC-2 MBPS   Results Identifier
Oracle ZFS Storage ZS3-2       16,212.66    $12.08         BE00002
Fujitsu Eternus DX200 S3       6,266.50     $15.42         B00071
SGI InfiniteStorage 5600       8,855.70     $15.97         B00065
Oracle ZFS Storage ZS4-4       31,486.23    $17.09         B00072
Oracle ZFS Storage ZS3-4       17,244.22    $22.53         B00067
NEC Storage M700               14,408.89    $25.10         B00066
Sun StorageTek 2530            663.51       $26.48         B00026
HP XP7 storage                 43,012.53    $28.30         B00070
Fujitsu ETERNUS DX80 S2        2,685.50     $28.48         B00055
SGI InfiniteStorage 5500-SP    4,064.49     $28.57         B00059
Hitachi Unified Storage VM     11,274.83    $32.64         B00069

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
Results Identifier = A unique identification of the result

SPC-2 Performance

The following table lists the top SPC-2 Performance results for HDD storage based systems, presented in decreasing performance order (as of 03/17/2015). The complete set of results may be found at the SPC2 top 10 Performance list.

HDD Based Systems                          SPC-2 MBPS   $/SPC-2 MBPS   TSC Price    Results Identifier
HP XP7 storage                             43,012.52    $28.30         $1,217,462   B00070
Oracle ZFS Storage ZS4-4                   31,486.23    $17.09         $538,050     B00072
Oracle ZFS Storage ZS3-4                   17,244.22    $22.53         $388,472     B00067
Oracle ZFS Storage ZS3-2                   16,212.66    $12.08         $195,915     BE00002
Fujitsu ETERNUS DX8870 S2                  16,038.74    $79.51         $1,275,163   B00063
IBM System Storage DS8870                  15,423.66    $131.21        $2,023,742   B00062
IBM SAN VC v6.4                            14,581.03    $129.14        $1,883,037   B00061
Hitachi Virtual Storage Platform (VSP)     13,147.87    $95.38         $1,254,093   B00060
HP StorageWorks P9500 XP Storage Array     13,147.87    $88.34         $1,161,504   B00056

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result
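
The price-performance column is simply the total system price divided by the throughput metric; the Python fragment below reproduces the two Oracle ZS4-4 and ZS3-4 rows from the table above.

    # $/SPC-2 MBPS = TSC Price / SPC-2 MBPS (values from the table above)
    rows = {"Oracle ZFS Storage ZS4-4": (538_050, 31_486.23),
            "Oracle ZFS Storage ZS3-4": (388_472, 17_244.22)}
    for system, (tsc_price, mbps) in rows.items():
        print(f"{system}: ${tsc_price / mbps:.2f} per SPC-2 MBPS")   # $17.09, $22.53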

Complete SPC-2 benchmark results may be found at
http://www.storageperformance.org/results/benchmark_results_spc2.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS4-4 storage system in clustered configuration
2 x Oracle ZFS Storage ZS4-4 controllers with
8 x Intel Xeon processors
3 TB memory
24 x Oracle Storage Drive Enclosure DE2-24P, each with
24 x 300 GB 10K RPM SAS-2 drives

Benchmark Description

SPC Benchmark 2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominately by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields, which require simple sequential process of one or more large files such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-2 and SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). Results as of March 17, 2015, for more information see www.storageperformance.org.

Oracle ZFS Storage ZS4-4 - B00072, Oracle ZFS Storage ZS3-2 - BE00002, Oracle ZFS Storage ZS3-4 - B00067, Fujitsu ETERNUS DX80 S2, B00055, Fujitsu ETERNUS DX8870 S2 - B00063, Fujitsu ETERNUS DX200 S3 - B00071, HP StorageWorks P9500 XP Storage Array - B00056, HP XP7 Storage Array - B00070, Hitachi Unified Storage VM - B00069, Hitachi Virtual Storage Platform (VSP) - B00060, IBM SAN VC v6.4 - B00061, IBM System Storage DS8870 - B00062, IBM XIV Storage System Gen3 - BE00001, NEC Storage M700 - B00066, SGI InfiniteStorage 5500-SP - B00059, SGI InfiniteStorage 5600 - B00065, Sun StorageTek 2530 - B00026.

Wednesday Jun 25, 2014

Oracle ZFS Storage ZS3-2 Delivers World Record Price-Performance on SPC-2/E

The Oracle ZFS Storage ZS3-2 appliance delivered a world record Price-Performance result, world record energy result and excellent overall performance for the SPC-2/E benchmark.

  • The Oracle ZFS Storage ZS3-2 appliance delivered the top SPC-2 Price-Performance of $12.08 and an overall score of 16,212.66 SPC-2 MBPS on the SPC-2/E benchmark.

  • The Oracle ZFS Storage ZS3-2 appliance produced the top Performance-Energy SPC-2/E benchmark result of 3.67 SPC2 MBPS / watt.

  • Oracle holds the top two performance results on the SPC-2 benchmark for HDD based systems.

  • The Oracle ZFS Storage ZS3-2 appliance has an 11x price-performance advantage over the IBM DS8870.

  • The Oracle ZFS Storage ZS3-2 appliance has an 8x price-performance advantage over the Hitachi Virtual Storage Platform (VSP).

  • The Oracle ZFS Storage ZS3-2 appliance has a 7.3x price-performance advantage over the HP P9500 XP disk array.

Performance Landscape

SPC-2 Price-Performance

Below is a table of the top SPC-2 Price-Performance results for HDD storage based systems, presented in increasing price-performance order (as of 06/25/2014). The complete set of results may be found at SPC2 top 10 Price-Performance list.

System                         SPC-2 MBPS   $/SPC-2 MBPS   Results Identifier
Oracle ZFS Storage ZS3-2       16,212.66    $12.08         BE00002
SGI InfiniteStorage 5600       8,855.70     $15.97         B00065
Oracle ZFS Storage ZS3-4       17,244.22    $22.53         B00067
NEC Storage M700               14,408.89    $25.10         B00066
Sun StorageTek 2530            663.51       $26.48         B00026
Fujitsu ETERNUS DX80 S2        2,685.50     $28.48         B00055
SGI InfiniteStorage 5500-SP    4,064.49     $28.57         B00059
Hitachi Unified Storage VM     11,274.83    $32.64         B00069

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
Results Identifier = A unique identification of the result

SPC-2/E Results

The table below lists all SPC-2/E results. The SPC-2/E benchmark extends the SPC-2 benchmark by additionally measuring power consumption during the SPC-2 benchmark run.

System                         SPC-2 MBPS   $/SPC-2 MBPS   TSC Price    SPC-2 MBPS/watt   Results Identifier
Oracle ZFS Storage ZS3-2       16,212.66    $12.08         $195,915     3.67              BE00002
IBM XIV Storage System Gen3    7,467.99     $152.34        $1,137,641   0.81              BE00001

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
TSC Price = Total Cost of Ownership Metric
SPC2 MBPS / watt = Number of SPC2 MB/second produced per watt consumed. Higher is Better.
Results Identifier = A unique identification of the result
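
The energy metric divides sustained throughput by measured power; dividing the two published columns back out gives the implied average power draw during the run (a derived illustration in Python, not a separately reported figure).

    # implied average power (watts) = SPC-2 MBPS / (SPC-2 MBPS per watt)
    results = {"Oracle ZFS Storage ZS3-2": (16_212.66, 3.67),
               "IBM XIV Storage System Gen3": (7_467.99, 0.81)}
    for system, (mbps, mbps_per_watt) in results.items():
        print(f"{system}: ~{mbps / mbps_per_watt:,.0f} watts average")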

SPC-2 Performance

The following table lists the top SPC-2 Performance results for HDD storage based systems, presented in decreasing performance order (as of 06/25/2014). The complete set of results may be found at the SPC2 top 10 Performance list.

System                                     SPC-2 MBPS   $/SPC-2 MBPS   TSC Price    Results Identifier
Oracle ZFS Storage ZS3-4                   17,244.22    $22.53         $388,472     B00067
Oracle ZFS Storage ZS3-2                   16,212.66    $12.08         $195,915     BE00002
Fujitsu ETERNUS DX8870 S2                  16,038.74    $79.51         $1,275,163   B00063
IBM System Storage DS8870                  15,423.66    $131.21        $2,023,742   B00062
IBM SAN VC v6.4                            14,581.03    $129.14        $1,883,037   B00061
Hitachi Virtual Storage Platform (VSP)     13,147.87    $95.38         $1,254,093   B00060
HP StorageWorks P9500 XP Storage Array     13,147.87    $88.34         $1,161,504   B00056

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at
http://www.storageperformance.org/results/benchmark_results_spc2.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-2 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-2 controllers, each with
4 x 2.1 GHz 8-core Intel Xeon processors
512 GB memory
12 x Sun Disk shelves, each with
24 x 300 GB 10K RPM SAS-2 drives

Benchmark Description

SPC Benchmark 2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominately by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields, which require simple sequential process of one or more large files such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

SPC Benchmark 2/Energy (SPC-2/E): consists of the complete set of SPC-2 performance measurement and reporting plus the measurement and reporting of energy use. This benchmark extension provides measurement and reporting to complete storage configurations, complementing SPC-2C/E, which focuses on storage component configurations.

See Also

Disclosure Statement

SPC-2 and SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). Results as of June 25, 2014, for more information see www.storageperformance.org.

Fujitsu ETERNUS DX80 S2, B00055, Fujitsu ETERNUS DX8870 S2 - B00063, HP StorageWorks P9500 XP Storage Array - B00056, Hitachi Unified Storage VM - B00069, Hitachi Virtual Storage Platform (VSP) - B00060, IBM SAN VC v6.4 - B00061, IBM System Storage DS8870 - B00062, IBM XIV Storage System Gen3 - BE00001, NEC Storage M700 - B00066, Oracle ZFS Storage ZS3-2 - BE00002, Oracle ZFS Storage ZS3-4 - B00067, SGI InfiniteStorage 5500-SP - B00059, SGI InfiniteStorage 5600 - B00065, Sun StorageTek 2530 - B00026.

Thursday Mar 27, 2014

SPARC M6-32 Produces SAP SD Two-Tier Benchmark World Record for 32-Processor Systems

Oracle's SPARC M6-32 server produced a world record result for 32-processors on the SAP two-tier Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement Package 5 for SAP ERP 6.0 (32 chips / 384 cores / 3072 threads).

  • SPARC M6-32 server achieved 140,000 SAP SD benchmark users with a low average dialog response time of 0.58 seconds running the SAP two-tier Sales and Distribution (SD) Standard Application Benchmark using SAP Enhancement package 5 for SAP ERP 6.0.

  • The SPARC M6-32 delivered 2.5 times more users than the IBM Power 780 result using SAP Enhancement Package 5 for SAP ERP 6.0. The IBM result also had 1.7 times worse average dialog response time compared to the SPARC M6-32 server result.

  • The SPARC M6-32 delivered 3.0 times more users than the Fujitsu PRIMEQUEST 2800E (with Intel Xeon E7-8890 v2 processors) result. The Fujitsu result also had 1.7 times worse average dialog response time compared to the SPARC M6-32 server result.

  • The SPARC M6-32 server solution was run with Oracle Solaris 11 and used Oracle Database 11g.

Performance Landscape

SAP SD two-tier performance table (in decreasing performance order). Results are shown for SAP ERP 6.0 Enhancement Package 4 for SAP ERP 6.0 (old version of the benchmark, obsolete at the end of April 2012) and for SAP ERP 6.0 Enhancement Package 5 for SAP ERP 6.0 (current version of the benchmark as of May 2012).

System (Processor) | Ch / Co / Th – Memory | OS | Database | Users | Resp Time (sec) | Version | Cert#
Fujitsu SPARC M10-4S (SPARC64 X @3.0 GHz) | 40 / 640 / 1280 – 10 TB | Solaris 11 | Oracle 11g | 153,000 | 0.87 | EHP5 | 2013014
SPARC M6-32 Server (SPARC M6 @3.6 GHz) | 32 / 384 / 3072 – 16 TB | Solaris 11 | Oracle 11g | 140,000 | 0.58 | EHP5 | 2014008
IBM Power 795 (POWER7 @4 GHz) | 32 / 256 / 1024 – 4 TB | AIX 7.1 | DB2 9.7 | 126,063 | 0.98 | EHP4 | 2010046
IBM Power 780 (POWER7+ @3.72 GHz) | 12 / 96 / 384 – 1536 GB | AIX 7.1 | DB2 10 | 57,024 | 0.98 | EHP5 | 2012033
Fujitsu PRIMEQUEST 2800E (Intel Xeon E7-8890 v2 @2.8 GHz) | 8 / 120 / 240 – 1024 GB | Windows Server 2012 SE | SQL Server 2012 | 47,500 | 0.97 | EHP5 | 2014003
IBM Power 760 (POWER7+ @3.41 GHz) | 8 / 48 / 192 – 1024 GB | AIX 7.1 | DB2 10 | 25,488 | 0.99 | EHP5 | 2013004

Version – Version of SAP, EHP5 refers to SAP ERP 6.0 Enhancement Package 5 for SAP ERP 6.0 and EHP4 refers to SAP ERP 6.0 Enhancement Package 4 for SAP ERP 6.0

Ch / Co / Th – Total chips, cores and threads

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Configuration Summary and Results

Hardware Configuration:

1 x SPARC M6-32 server with
32 x 3.6 GHz SPARC M6 processors (total of 32 processors / 384 cores / 3072 threads)
16 TB memory
6 x Sun Server X3-2L each with
2 x Intel Xeon E5-2609 2.4 GHz Processors
16 GB Memory
4 x Flash Accelerator F40
12 x 3 TB SAS disks
2 x Sun Server X3-2L each with
2 x Intel Xeon E5-2609 2.4 GHz Processors
16 GB Memory
1 x 8-Port 6Gbps SAS-2 RAID PCI Express HBA
12 x 3 TB SAS disks

Software Configuration:

Oracle Solaris 11
SAP Enhancement Package 5 for SAP ERP 6.0
Oracle Database 11g Release 2

Certified Results (published by SAP)

Number of SAP SD benchmark users:
140,000
Average dialog response time:
0.58 seconds
Throughput:

  Fully processed order line items per hour:
15,878,670
  Dialog steps per hour:
47,636,000
  SAPS:
793,930
Average database request time (dialog/update):
0.020 sec / 0.041 sec
SAP Certification:
2014008
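
As a cross-check, the certified throughput figures are internally consistent with SAP's standard definition of SAPS (100 SAPS = 2,000 fully processed order line items per hour = 6,000 dialog steps per hour):

\[
\text{SAPS} = \frac{47{,}636{,}000\ \text{dialog steps/hour}}{60} \approx \frac{15{,}878{,}670\ \text{order line items/hour}}{20} \approx 793{,}933,
\]

which matches the certified 793,930 SAPS to within rounding.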

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is an ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard application benchmarks, SAP Enhancement Package 5 for SAP ERP 6.0 as of 3/26/14:

SPARC M6-32 (32 processors, 384 cores, 3072 threads) 140,000 SAP SD users, 32 x 3.6 GHz SPARC M6, 16 TB memory, Oracle Database 11g, Oracle Solaris 11, Cert# 2014008. Fujitsu SPARC M10-4S (40 processors, 640 cores, 1280 threads) 153,000 SAP SD users, 40 x 3.0 GHz SPARC64 X, 10 TB memory, Oracle Database 11g, Oracle Solaris 11, Cert# 2013014. IBM Power 780 (12 processors, 96 cores, 384 threads) 57,024 SAP SD users, 12 x 3.72 GHz IBM POWER7+, 1536 GB memory, DB2 10, AIX 7.1, Cert# 2012033. Fujitsu PRIMEQUEST 2800E (8 processors, 120 cores, 240 threads) 47,500 SAP SD users, 8 x 2.8 GHz Intel Xeon Processor E7-8890 v2, 1024 GB memory, SQL Server 2012, Windows Server 2012 Standard Edition, Cert# 2014003. IBM Power 760 (8 processors, 48 cores, 192 threads) 25,488 SAP SD users, 8 x 3.41 GHz IBM POWER7+, 1024 GB memory, DB2 10, AIX 7.1, Cert# 2013004.

Two-tier SAP Sales and Distribution (SD) standard application benchmarks, SAP Enhancement Package 4 for SAP ERP 6.0 as of 3/26/14:

IBM Power 795 (32 processors, 256 cores, 1024 threads) 126,063 SAP SD users, 32 x 4 GHz IBM POWER7, 4 TB memory, DB2 9.7, AIX 7.1, Cert# 2010046.

SAP and R/3 are registered trademarks of SAP AG in Germany and other countries. More information at www.sap.com/benchmark.

Wednesday Mar 05, 2014

SPARC T5-2 Delivers World Record 2-Socket SPECvirt_sc2010 Benchmark

Oracle's SPARC T5-2 server delivered a world record two-chip SPECvirt_sc2010 result of 4270 @ 264 VMs, establishing the performance superiority of the SPARC T5 processor in virtualized environments with Oracle Solaris 11, which includes Oracle VM Server for SPARC and Oracle Solaris Zones as standard virtualization products.

  • The SPARC T5-2 server has 2.3x better performance than an HP BL620c G7 blade server (with two Westmere EX processors) which used VMware ESX 4.1 U1 virtualization software (best SPECvirt_sc2010 result on two-chip servers using VMware software).

  • The SPARC T5-2 server has 1.6x better performance than an IBM Flex System x240 server (with two Sandy Bridge processors) which used Kernel-based Virtual Machines (KVM).

  • This is the first SPECvirt_sc2010 result using Oracle production level software: Oracle Solaris 11.1, Oracle WebLogic Server 10.3.6, Oracle Database 11g Enterprise Edition, Oracle iPlanet Web Server 7 and Oracle Java Development Kit 7 (JDK). The only exception was the Dovecot mail server.

Performance Landscape

Complete benchmark results are at the SPEC website, SPECvirt_sc2010 Results. The following table highlights the leading two-chip results for the benchmark, bigger is better.

SPECvirt_sc2010
Leading Two-Chip Results
System | Processor | Result @ VMs | Virtualization Software
SPARC T5-2 | 2 x SPARC T5, 3.6 GHz | 4270 @ 264 | Oracle VM Server for SPARC 3.0, Oracle Solaris Zones
IBM Flex System x240 | 2 x Intel E5-2690, 2.9 GHz | 2741 @ 168 | Red Hat Enterprise Linux 6.4 KVM
HP ProLiant BL620c G7 | 2 x Intel E7-2870, 2.4 GHz | 1878 @ 120 | VMware ESX 4.1 U1

Configuration Summary

System Under Test Highlights:

1 x SPARC T5-2 server, with
2 x 3.6 GHz SPARC T5 processors
1 TB memory
Oracle Solaris 11.1
Oracle VM Server for SPARC 3.0
Oracle iPlanet Web Server 7.0.15
Oracle PHP 5.3.14
Dovecot 2.1.17
Oracle WebLogic Server 11g (10.3.6)
Oracle Database 11g (11.2.0.3)
Java HotSpot(TM) 64-Bit Server VM on Solaris, version 1.7.0_51

Benchmark Description

The SPECvirt_sc2010 benchmark is SPEC's first benchmark addressing performance of virtualized systems. It measures the end-to-end performance of all system components that make up a virtualized environment.

The benchmark utilizes several previous SPEC benchmarks which represent tasks commonly run in virtualized environments. The workloads included are derived from SPECweb2005, SPECjAppServer2004 and SPECmail2008. Scaling of the benchmark is achieved by running additional sets of virtual machines until overall throughput reaches a peak. The benchmark includes quality-of-service criteria that must be met for a successful run.

Key Points and Best Practices

  • The SPARC T5-2 server, running Oracle Solaris 11.1, utilizes the embedded virtualization products Oracle VM Server for SPARC and Oracle Solaris Zones, which provide a low-overhead, flexible, scalable and manageable virtualization environment.

  • In order to provide a high level of data integrity and availability, all the benchmark data sets are stored on mirrored (RAID1) storage.

See Also

Disclosure Statement

SPEC and the benchmark name SPECvirt_sc are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 3/5/2014. SPARC T5-2, SPECvirt_sc2010 4270 @ 264 VMs; IBM Flex System x240, SPECvirt_sc2010 2741 @ 168 VMs; HP Proliant BL620c G7, SPECvirt_sc2010 1878 @ 120 VMs.

Friday Feb 14, 2014

SPARC M6-32 Delivers Oracle E-Business and PeopleSoft World Record Benchmarks, Linear Data Warehouse Scaling in a Virtualized Configuration

This result demonstrates how the combination of Oracle virtualization technologies for SPARC and Oracle's SPARC M6-32 server allow the deployment and concurrent high performance execution of multiple Oracle applications and databases sized for the Enterprise.

  • In an 8-chip Dynamic Domain (also known as PDom), the SPARC M6-32 server set an Oracle E-Business 12.1.3 X-Large world record with 14,660 online users running five simultaneous E-Business modules.

  • In a second 8-chip Dynamic Domain, the SPARC M6-32 server set a PeopleSoft HCM 9.1 HR Self-Service online world record, supporting 35,000 users while simultaneously running a batch workload that completed in 29.17 minutes. This was done with a database of 600,480 employees. Two other separate tests were run: one supporting 40,000 online users only, and a batch-only workload that completed in 18.27 minutes.

  • In a third Dynamic Domain with 16-chips on the SPARC M6-32 server, a data warehouse test was run that showed near-linear scaling.

  • On the SPARC M6-32 server, several critical application instances were virtualized: an Oracle E-Business application and database, a PeopleSoft application and database, and a Decision Support database instance using Oracle Database 12c.

  • In this Enterprise Virtualization benchmark a SPARC M6-32 server utilized all levels of Oracle Virtualization features available for SPARC servers. The 32-chip SPARC M6 based server was divided in three separate Dynamic Domains (also known as PDoms), available only on the SPARC Enterprise M-Series systems, which are completely electrically isolated and independent hardware partitions. Each PDom was subsequently split into multiple hypervisor-based Oracle VM for SPARC partitions (also known as LDoms), each one running its own Oracle Solaris kernel and managing its own CPUs and I/O resources. The hardware resources allocated to each Oracle VM for SPARC partition were then organized in various Oracle Solaris Zones, to further refine application tier isolation and resources management. The three PDoms were dedicated to the enterprise applications as follows:

    • Oracle E-Business PDom: Oracle E-Business 12.1.3 Suite World Record Extra-Large benchmark, exercising five Online Modules: Customer Service, Human Resources Self Service, iProcurement, Order Management and Financial, with 14,660 users and an average user response time under 2 seconds.

    • PeopleSoft PDom: PeopleSoft Human Capital Management (HCM) 9.1 FP2 World Record Benchmark, using PeopleTools 8.52 and an Oracle Database 11g Release 2, with 35,000 users, at an average user Search Time of 1.46 seconds and Save Time of 0.93 seconds. An online run with 40,000 users, had an average user Search Time of 2.17 seconds and Save Time of 1.39 seconds, and a Payroll batch run completed in 29.17 minutes elapsed time for more than 500,000 employees.

    • Decision Support PDom: An Oracle Database 12c instance executing a Decision Support workload on about 30 billion rows of data and achieving linear scalability, i.e. on the 16 chips comprising the PDom, the workload ran 16x faster than on a single chip. Specifically, the 16-chip PDom processed about 320M rows/sec whereas a single chip could process about 20M rows/sec.

  • The SPARC M6-32 server is ideally suited for large-memory utilization. In this virtualized environment, three critical applications made use of 16 TB of physical memory. Each of the Oracle VM Server for SPARC environments utilized from 4 to 8 TB of memory, more than the limits of other virtualization solutions.

  • SPARC M6-32 Server Virtualization Layout Highlights

    • The Oracle E-Business application instances were run in a dedicated Dynamic Domain consisting of 8 SPARC M6 processors and 4 TB of memory. The PDom was split into four symmetric Oracle VM Server for SPARC (LDoms) environments of 2 chips and 1 TB of memory each, two dedicated to the Application Server tier and the other two to the Database Server tier. Each Logical Domain was subsequently divided into two Oracle Solaris Zones, for a total of eight, one for each E-Business Application server and one for each Oracle Database 11g instance.

    • The PeopleSoft application was run in a dedicated Dynamic Domain (PDom) consisting of 8 SPARC M6 processors and 4 TB of memory. The PDom was split into two Oracle VM Server for SPARC (LDoms) environments one of 6 chips and 3 TB of memory, reserved for the Web and Application Server tiers, and a second one of 2 chips and 1 TB of memory, reserved for the Database tier. Two PeopleSoft Application Servers, a Web Server instance, and a single Oracle Database 11g instance were each executed in their respective and exclusive Oracle Solaris Zone.

    • The Oracle Database 12c Decision Support workload was run in a Dynamic Domain consisting of 16 SPARC M6 processors and 8 TB of memory.

  • All the Oracle Applications and Database instances were running concurrently and at a high level of performance in a virtualized environment. Running three Enterprise level application environments on a single SPARC M6-32 server offers centralized administration, simplified physical layout, high availability and security features (as each PDom and LDom runs its own Oracle Solaris operating system copy physically and logically isolated from the other environments), enabling the coexistence of multiple versions of Oracle Solaris and application software on a single physical server.

  • Dynamic Domains and Oracle VM Server for SPARC guests were configured with independent direct I/O domains, allowing for fast and isolated I/O paths, providing secure and high performance I/O access.

Performance Landscape

Oracle E-Business Test using Oracle Database 11g
SPARC M6-32 PDom, 8 SPARC M6 Processors, 4 TB Memory
Total Online Users | Weighted Average Response Time (sec) | 90th Percentile Response Time (sec)
14,660 | 0.81 | 0.88
Multiple Online Modules X-Large Configuration (HR Self-Service, Order Management, iProcurement, Customer Service, Financial)

PeopleSoft HR Self-Service Online Plus Payroll Batch using Oracle Database 11g
SPARC M6-32 PDom, 8 SPARC M6 Processors, 4 TB Memory
HR Self-Service | Payroll Batch
Online Users | Average User Search / Save Time (sec) | Transactions per Second | Elapsed (min)
35,000 | 1.46 / 0.93 | 116 | 29.17

HR Self-Service Only | Payroll Batch Only
Online Users | Average User Search / Save Time (sec) | Transactions per Second | Elapsed (min)
40,000 | 2.17 / 1.39 | 132 | 18.27

Oracle Database 12c Decision Support Query Test
SPARC M6-32 PDom, 16 SPARC M6 Processors, 8 TB Memory
Parallelism (Chips Used) | Rows Processing Rate (rows/sec) | Scaling Normalized to 1 Chip
16 | 319,981,734 | 15.9
8 | 162,545,303 | 8.1
4 | 80,943,271 | 4.0
2 | 40,458,329 | 2.0
1 | 20,086,829 | 1.0

Configuration Summary

System Under Test:

SPARC M6-32 server with
32 x SPARC M6 processors (3.6 GHz)
16 TB memory

Storage Configuration:

6 x Sun Storage 2540-M2 each with
8 x Expansion Trays (each tray equipped with 12 x 300 GB SAS drives)
7 x Sun Server X3-2L each with
2 x Intel Xeon E5-2609 2.4 GHz Processors
16 GB Memory
4 x Sun Flash Accelerator F40 PCIe 400 GB cards
Oracle Solaris 11.1 (COMSTAR)
1 x Sun Server X3-2L with
2 x Intel Xeon E5-2609 2.4 GHz Processors
16 GB Memory
12 x 3 TB SAS disks
Oracle Solaris 11.1 (COMSTAR)

Software Configuration:

Oracle Solaris 11.1 (11.1.10.5.0), Oracle E-Business
Oracle Solaris 11.1 (11.1.10.5.0), PeopleSoft
Oracle Solaris 11.1 (11.1.9.5.0), Decision Support
Oracle Database 11g Release 2, Oracle E-Business and PeopleSoft
Oracle Database 12c Release 1, Decision Support
Oracle E-Business Suite 12.1.3
PeopleSoft Human Capital Management 9.1 FP2
PeopleSoft PeopleTools 8.52.03
Oracle Java SE 6u32
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 043
Oracle WebLogic Server 11g (10.3.4)

Oracle Dynamic Domains (PDoms) resources:


PDom | Oracle E-Business | PeopleSoft | Oracle DSS
Processors | 8 | 8 | 16
Memory | 4 TB | 4 TB | 8 TB
Oracle Solaris | 11.1 (11.1.10.5.0) | 11.1 (11.1.10.5.0) | 11.1 (11.1.9.5.0)
Oracle Database | 11g | 11g | 12c
Oracle VM for SPARC / Oracle Solaris Zones | 4 LDom / 8 Zones | 2 LDom / 4 Zones | None
Storage | 7 x Sun Server X3-2L | 1 x Sun Server X3-2L (12 x 3 TB SAS), 2 x Sun Storage 2540-M2/2501 pairs | 4 x Sun Storage 2540-M2/2501 pairs

Benchmark Description

This benchmark consists of three different applications running concurrently. It shows that large, enterprise workloads can be run on a single system and without performance impact between application environments.

The three workloads are:

  • Oracle E-Business Suite Online

    • This test simulates thousands of online users executing transactions typical of an internal Enterprise Resource Planning (ERP) deployment, including 5 application modules: Customer Service, Human Resources Self Service, Procurement, Order Management and Financial.

    • Each database tier uses a database instance of about 600 GB in size, supporting thousands of application users and accessing hundreds of objects (tables, indexes, SQL stored procedures, etc.).

    • The application tier includes multiple web and application server instances, specifically Apache Web Server, Oracle Application Server 10g and Oracle Java SE 6u32.

  • PeopleSoft Human Capital Management

    • This test simulates thousands of online employees, managers and Human Resource administrators executing transactions typical of a Human Resources Self Service application for the Enterprise. Typical transactions are: viewing paychecks, promoting and hiring employees, updating employee profiles, etc.

    • The database tier uses a database instance of about 500 GB in size, containing information for 500,480 employees.

    • The application tier for this test includes web and application server instances, specifically Oracle WebLogic Server 11g, PeopleSoft Human Capital Management 9.1 and Oracle Java SE 6u32.

  • Decision Support Workload using the Oracle Database.

    • The query processes 30 billion rows stored in the Oracle Database, making heavy use of Oracle parallel query processing features. It performs multiple aggregations and summaries by reading and processing all the rows of the database.

Key Points and Best Practices

Oracle E-Business Environment

The Oracle E-Business Suite setup consisted of 4 Oracle E-Business environments running 5 online Oracle E-Business modules simultaneously.

The Oracle E-Business environments were deployed on 4 Oracle VM for SPARC, respectively 2 for the Application tier and 2 for the Database tier. Each LDom included 2 SPARC M6 processor chips. The Application LDom was further split into 2 Oracle Solaris Zones, each one containing one Oracle E-Business Application instance. Similarly, on the Database tier, each LDom was further divided into 2 Oracle Solaris Zones, each containing an Oracle Database instance. Applications on the same LDom shared a 10 GbE network link to connect to the Database tier LDom. Each Application in a Zone was connected to its own dedicated Database Zone. The communication between the two Zones was implemented via Oracle Solaris 11 virtual network, which provides high performance, low latency transfers at memory speed using large frames (9000 bytes vs typical 1500 bytes frames).

The Oracle E-Business setup made use of the Oracle Database Shared Server feature in order to limit memory utilization, as well as the number of database server processes. The Oracle Database configuration and optimization was substantially out-of-the-box, except for properly sizing the Oracle Database memory areas (System Global Area and Program Global Area).

In the Oracle E-Business Application LDom handling Customer Service and HR Self Service modules, 28 Forms servers and 8 OC4J application servers were hosted in the two separate Oracle Solaris Zones, for a total of 56 forms servers and 16 applications servers.

All the Oracle Database server processes and the listener processes were executed in the Oracle Solaris FX scheduler class.

PeopleSoft Environment

The PeopleSoft Application Oracle VM for SPARC had one Oracle Solaris Zone of 12 cores containing the web tier and two Oracle Solaris Zones of 57 cores total containing the Application tier. The Database tier was contained in an Oracle VM for SPARC consisting of one Oracle Solaris Zone of 24 cores. One core, in the Application Oracle VM, was dedicated to network and disk interrupt handling.

All database data files, recovery files and Oracle Clusterware files for the PeopleSoft test were created with the Oracle Automatic Storage Management (Oracle ASM) volume manager for the added benefit of the ease of management provided by Oracle ASM integrated storage management solution.

In the application tier, 5 PeopleSoft domains with 350 application servers (70 per each domain) were hosted in the two separate Oracle Solaris Zones for a total of 10 domains with 700 application server processes.

All PeopleSoft Application processes and Web Server JVM instances were executed in the Oracle Solaris FX scheduler class.

Oracle Decision Support Environment

The decision support workload showed how the combination of a large memory (8 TB) and a large number of processors (16 chips comprising 1536 virtual CPUs), together with the Oracle parallel query facility, can linearly increase the performance of certain decision support queries as the number of CPUs increases.

The large memory was used to cache the entire 30 billion row Oracle table in memory. There are a number of ways to accomplish this. The method deployed in this test was to allocate sufficient memory for Oracle's "keep cache" and direct the table to the "keep cache."

To demonstrate scalability, it was necessary to ensure that the number of Oracle parallel servers was always equal to the number of available virtual CPUs. This was accomplished by the combination of providing a degree of parallelism hint to the query and setting both parallel_max_servers and parallel_min_servers to the number of virtual CPUs.
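
A minimal sketch of the two tunings described in the preceding paragraphs (keep-cache placement and matched parallel servers), assuming the python-oracledb driver and illustrative table names, credentials, cache size and a 1536-vCPU stage; the benchmark's actual scripts are not published:

```python
# Sketch only: keep-cache placement and parallel-server sizing as described above.
# Table name, credentials, DSN and cache size are illustrative assumptions.
import oracledb  # pip install oracledb

vcpus = 1536  # virtual CPUs online for this stage of the scalability test

conn = oracledb.connect(user="dss", password="secret", dsn="dbhost/dsspdb")
cur = conn.cursor()

# Size the KEEP buffer pool and direct the fact table to it so the whole
# table can be cached in memory.
cur.execute("ALTER SYSTEM SET db_keep_cache_size = 6000G")
cur.execute("ALTER TABLE sales STORAGE (BUFFER_POOL KEEP)")

# Pin the parallel server pool to exactly the number of available virtual CPUs.
cur.execute(f"ALTER SYSTEM SET parallel_max_servers = {vcpus}")
cur.execute(f"ALTER SYSTEM SET parallel_min_servers = {vcpus}")

# Degree-of-parallelism hint so the query fans out across all parallel servers.
cur.execute(f"SELECT /*+ FULL(s) PARALLEL(s, {vcpus}) */ SUM(amount) FROM sales s")
print(cur.fetchone())
```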

The number of virtual CPUs for each stage of the scalability test was adjusted using the psradm command available in Oracle Solaris.

See Also

Disclosure Statement

Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. PeopleSoft results as of 02/14/2014. Other results as of 09/22/2013.

Oracle E-Business Suite R12 extra-large multiple-online module benchmark, SPARC M6-32, SPARC M6, 3.6 GHz, 8 chips, 96 cores, 768 threads, 4 TB memory, 14,660 online users, average response time 0.81 sec, 90th percentile response time 0.88 sec, Oracle Solaris 11.1, Oracle Solaris Zones, Oracle VM for SPARC, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 9/22/2013.

Monday Nov 25, 2013

World Record Single System TPC-H @10000GB Benchmark on SPARC T5-4

Oracle's SPARC T5-4 server delivered world record single server performance of 377,594 QphH@10000GB with price/performance of $4.65/QphH@10000GB USD on the TPC-H @10000GB benchmark. This result shows that the 4-chip SPARC T5-4 server is significantly faster than the 8-chip server results from HP (Intel x86 based).

  • The SPARC T5-4 server with four SPARC T5 processors is 2.4 times faster than the HP ProLiant DL980 G7 server with eight x86 processors.

  • The SPARC T5-4 server delivered 4.8 times better performance per chip and 3.0 times better performance per core than the HP ProLiant DL980 G7 server.

  • The SPARC T5-4 server has 28% better price/performance than the HP ProLiant DL980 G7 server (for the price/QphH metric).

  • The SPARC T5-4 server with 2 TB memory is 2.4 times faster than the HP ProLiant DL980 G7 server with 4 TB memory (for the composite metric).

  • The SPARC T5-4 server took 9 hours, 37 minutes, 54 seconds for data loading while the HP ProLiant DL980 G7 server took 8.3 times longer.

  • The SPARC T5-4 server accomplished the refresh function in around a minute, the HP ProLiant DL980 G7 server took up to 7.1 times longer to do the same function.

This result demonstrates a complete data warehouse solution that shows the performance both of individual and concurrent query processing streams, faster loading, and refresh of the data during business operations. The SPARC T5-4 server delivers superior performance and cost efficiency when compared to the HP result.

Performance Landscape

The table lists the leading TPC-H @10000GB results for non-clustered systems.

TPC-H @10000GB, Non-Clustered Systems
System (Processor) | P/C/T – Memory | Composite (QphH) | $/perf ($/QphH) | Power (QppH) | Throughput (QthH) | Database | Available
SPARC T5-4 (3.6 GHz SPARC T5) | 4/64/512 – 2048 GB | 377,594.3 | $4.65 | 342,714.1 | 416,024.4 | Oracle 11g R2 | 11/25/13
HP ProLiant DL980 G7 (2.4 GHz Intel Xeon E7-4870) | 8/80/160 – 4096 GB | 158,108.3 | $6.49 | 185,473.6 | 134,780.5 | SQL Server 2012 | 04/15/13

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity (bigger is better)
QthH = the Throughput Numerical Quantity (bigger is better)

The following table lists data load times and average refresh function times.

TPC-H @10000GB, Non-Clustered Systems
Database Load & Database Refresh
System (Processor) | Data Loading (h:m:s) | T5 Advan | RF1 (sec) | T5 Advan | RF2 (sec) | T5 Advan
SPARC T5-4 (3.6 GHz SPARC T5) | 09:37:54 | 8.3x | 58.8 | 7.1x | 62.1 | 6.4x
HP ProLiant DL980 G7 (2.4 GHz Intel Xeon E7-4870) | 79:28:23 | 1.0x | 416.4 | 1.0x | 394.9 | 1.0x

Data Loading = database load time
RF1 = throughput average first refresh transaction
RF2 = throughput average second refresh transaction
T5 Advan = the ratio of time to the SPARC T5-4 server time

Complete benchmark results found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server Under Test:

SPARC T5-4 server
4 x SPARC T5 processors (3.6 GHz total of 64 cores, 512 threads)
2 TB memory
2 x internal SAS (2 x 300 GB) disk drives
12 x 16 Gb FC HBA

External Storage:

24 x Sun Server X4-2L servers configured as COMSTAR nodes, each with
2 x 2.5 GHz Intel Xeon E5-2609 v2 processors
4 x Sun Flash Accelerator F80 PCIe Cards, 800 GB each
6 x 4 TB 7.2K RPM 3.5" SAS disks
1 x 8 Gb dual port HBA

2 x 48 port Brocade 6510 Fibre Channel Switches

Software Configuration:

Oracle Solaris 11.1
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 10000 GB (Scale Factor 10000)
TPC-H Composite: 377,594.3 QphH@10000GB
Price/performance: $4.65/QphH@10000GB USD
Available: 11/25/2013
Total 3 year Cost: $1,755,709 USD
TPC-H Power: 342,714.1
TPC-H Throughput: 416,024.4
Database Load Time: 9:37:54

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
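
For reference, the composite metric is the geometric mean of the Power and Throughput metrics, so the figures in the Performance Landscape above can be checked directly:

\[
\text{QphH@Size} = \sqrt{\text{QppH@Size} \times \text{QthH@Size}}, \qquad
\sqrt{342{,}714.1 \times 416{,}024.4} \approx 377{,}594.3\ \text{QphH@10000GB}.
\]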

Key Points and Best Practices

  • COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI Target platform. COMSTAR uses a modular approach to break the huge task of handling all the different pieces in a SCSI target subsystem into independent functional modules which are glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at SCSI level (disk, tape, medium changer etc.) are not required to know about the underlying transport. And the modules implementing the transport protocol (FC, iSCSI, etc.) are not aware of the SCSI-level functionality of the packets they are transporting. The framework hides the details of allocation providing execution context and cleanup of SCSI commands and associated resources and simplifies the task of writing the SCSI or transport modules.

  • The SPARC T5-4 server achieved a peak IO rate of 37 GB/sec from the Oracle database configured with this storage.

  • Twelve COMSTAR nodes were mirrored to another twelve COMSTAR nodes on which all of the Oracle database files were placed. IO performance was high and balanced across all the nodes.

  • Oracle Solaris 11.1 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric when comparing systems.

  • The SPARC T5-4 server and Oracle Solaris efficiently managed the system load of nearly two thousand Oracle Database parallel processes.

See Also

Disclosure Statement

TPC Benchmark, TPC-H, QphH, QthH, QppH are trademarks of the Transaction Processing Performance Council (TPC). Results as of 11/25/13, prices are in USD. SPARC T5-4 www.tpc.org/3293; HP ProLiant DL980 G7 www.tpc.org/3285.

Thursday Sep 26, 2013

SPARC M6-32 Delivers Oracle E-Business and PeopleSoft World Record Benchmarks, Linear Data Warehouse Scaling in a Virtualized Configuration

This result has been superseded. Please see the latest result.

 This result demonstrates how the combination of Oracle virtualization technologies for SPARC and Oracle's SPARC M6-32 server allow the deployment and concurrent high performance execution of multiple Oracle applications and databases sized for the Enterprise.

  • In an 8-chip Dynamic Domain (also known as PDom), the SPARC M6-32 server set an Oracle E-Business 12.1.3 X-Large world record with 14,660 online users running five simultaneous E-Business modules.

  • In a second 8-chip Dynamic Domain, the SPARC M6-32 server set a PeopleSoft HCM 9.1 HR Self-Service online world record, supporting 34,000 users while simultaneously running a batch workload that completed in 29.7 minutes. This was done with a database of 600,480 employees. In a separate test, a batch-only workload completed in 21.2 minutes.

  • In a third Dynamic Domain with 16-chips on the SPARC M6-32 server, a data warehouse test was run that showed near-linear scaling.

  • On the SPARC M6-32 server, several critical application instances were virtualized: an Oracle E-Business application and database, a PeopleSoft application and database, and a Decision Support database instance using Oracle Database 12c.

  • In this Enterprise Virtualization benchmark a SPARC M6-32 server utilized all levels of Oracle Virtualization features available for SPARC servers. The 32-chip SPARC M6 based server was divided in three separate Dynamic Domains (also known as PDoms), available only on the SPARC Enterprise M-Series systems, which are completely electrically isolated and independent hardware partitions. Each PDom was subsequently split into multiple hypervisor-based Oracle VM for SPARC partitions (also known as LDoms), each one running its own Oracle Solaris kernel and managing its own CPUs and I/O resources. The hardware resources allocated to each Oracle VM for SPARC partition were then organized in various Oracle Solaris Zones, to further refine application tier isolation and resources management. The three PDoms were dedicated to the enterprise applications as follows:

    • Oracle E-Business PDom: Oracle E-Business 12.1.3 Suite World Record Extra-Large benchmark, exercising five Online Modules: Customer Service, Human Resources Self Service, iProcurement, Order Management and Financial, with 14,660 users and an average user response time under 2 seconds.

    • PeopleSoft PDom: PeopleSoft Human Capital Management (HCM) 9.1 FP2 World Record Benchmark, using PeopleTools 8.52 and an Oracle Database 11g Release 2, with 34,000 users, at an average user Search Time of 1.11 seconds and Save Time of 0.77 seconds, and a Payroll batch run completed in 29.7 minutes elapsed time for more than 500,000 employees.

    • Decision Support PDom: An Oracle Database 12c instance executing a Decision Support workload on about 30 billion rows of data and achieving linear scalability, i.e. on the 16 chips comprising the PDom, the workload ran 16x faster than on a single chip. Specifically, the 16-chip PDom processed about 320M rows/sec whereas a single chip could process about 20M rows/sec.

  • The SPARC M6-32 server is ideally suited for large-memory utilization. In this virtualized environment, three critical applications made use of 16 TB of physical memory. Each of the Oracle VM Server for SPARC environments utilized from 4 to 8 TB of memory, more than the limits of other virtualization solutions.

  • SPARC M6-32 Server Virtualization Layout Highlights

    • The Oracle E-Business application instances were run in a dedicated Dynamic Domain consisting of 8 SPARC M6 processors and 4 TB of memory. The PDom was split into four symmetric Oracle VM Server for SPARC (LDoms) environments of 2 chips and 1 TB of memory each, two dedicated to the Application Server tier and the other two to the Database Server tier. Each Logical Domain was subsequently divided into two Oracle Solaris Zones, for a total of eight, one for each E-Business Application server and one for each Oracle Database 11g instance.

    • The PeopleSoft application was run in a dedicated Dynamic Domain (PDom) consisting of 8 SPARC M6 processors and 4 TB of memory. The PDom was split into two Oracle VM Server for SPARC (LDoms) environments one of 6 chips and 3 TB of memory, reserved for the Web and Application Server tiers, and a second one of 2 chips and 1 TB of memory, reserved for the Database tier. Two PeopleSoft Application Servers, a Web Server instance, and a single Oracle Database 11g instance were each executed in their respective and exclusive Oracle Solaris Zone.

    • The Oracle Database 12c Decision Support workload was run in a Dynamic Domain consisting of 16 SPARC M6 processors and 8 TB of memory.

  • All the Oracle Applications and Database instances were running at high level of performance and concurrently in a virtualized environment. Running three Enterprise level application environments on a single SPARC M6-32 server offers centralized administration, simplified physical layout, high availability and security features (as each PDom and LDom runs its own Oracle Solaris operating system copy physically and logically isolated from the other environments), enabling the coexistence of multiple versions Oracle Solaris and application software on a single physical server.

  • Dynamic Domains and Oracle VM Server for SPARC guests were configured with independent direct I/O domains, allowing for fast and isolated I/O paths, providing secure and high performance I/O access.

Performance Landscape

Oracle E-Business Test using Oracle Database 11g
SPARC M6-32 PDom, 8 SPARC M6 Processors, 4 TB Memory
Total Online Users | Weighted Average Response Time (sec) | 90th Percentile Response Time (sec)
14,660 | 0.81 | 0.88
Multiple Online Modules X-Large Configuration (HR Self-Service, Order Management, iProcurement, Customer Service, Financial)

PeopleSoft HR Self-Service Online Plus Payroll Batch using Oracle Database 11g
SPARC M6-32 PDom, 8 SPARC M6 Processors, 4 TB Memory
HR Self-Service | Payroll Batch
Online Users | Average User Search / Save Time (sec) | Transactions per Second | Elapsed (min)
34,000 | 1.11 / 0.77 | 113 | 29.7

Payroll Batch Only
Elapsed (min): 21.17

Oracle Database 12c Decision Support Query Test
SPARC M6-32 PDom, 16 SPARC M6 Processors, 8 TB Memory
Parallelism (Chips Used) | Rows Processing Rate (rows/sec) | Scaling Normalized to 1 Chip
16 | 319,981,734 | 15.9
8 | 162,545,303 | 8.1
4 | 80,943,271 | 4.0
2 | 40,458,329 | 2.0
1 | 20,086,829 | 1.0

Configuration Summary

System Under Test:

SPARC M6-32 server with
32 x SPARC M6 processors (3.6 GHz)
16 TB memory

Storage Configuration:

6 x Sun Storage 2540-M2 each with
8 x Expansion Trays (each tray equipped with 12 x 300 GB SAS drives)
7 x Sun Server X3-2L each with
2 x Intel Xeon E5-2609 2.4 GHz Processors
16 GB Memory
4 x Sun Flash Accelerator F40 PCIe 400 GB cards
Oracle Solaris 11.1 (COMSTAR)
1 x Sun Server X3-2L with
2 x Intel Xeon E5-2609 2.4 GHz Processors
16 GB Memory
12 x 3 TB SAS disks
Oracle Solaris 11.1 (COMSTAR)

Software Configuration:

Oracle Solaris 11.1 (11.1.10.5.0), Oracle E-Business
Oracle Solaris 11.1 (11.1.10.5.0), PeopleSoft
Oracle Solaris 11.1 (11.1.9.5.0), Decision Support
Oracle Database 11g Release 2, Oracle E-Business and PeopleSoft
Oracle Database 12c Release 1, Decision Support
Oracle E-Business Suite 12.1.3
PeopleSoft Human Capital Management 9.1 FP2
PeopleSoft PeopleTools 8.52.03
Oracle Java SE 6u32
Oracle Tuxedo, Version 10.3.0.0, 64-bit, Patch Level 043
Oracle WebLogic Server 11g (10.3.4)

Oracle Dynamic Domains (PDoms) resources:


PDom | Oracle E-Business | PeopleSoft | Oracle DSS
Processors | 8 | 8 | 16
Memory | 4 TB | 4 TB | 8 TB
Oracle Solaris | 11.1 (11.1.10.5.0) | 11.1 (11.1.10.5.0) | 11.1 (11.1.9.5.0)
Oracle Database | 11g | 11g | 12c
Oracle VM for SPARC / Oracle Solaris Zones | 4 LDom / 8 Zones | 2 LDom / 4 Zones | None
Storage | 7 x Sun Server X3-2L | 1 x Sun Server X3-2L (12 x 3 TB SAS), 2 x Sun Storage 2540-M2/2501 pairs | 4 x Sun Storage 2540-M2/2501 pairs

Benchmark Description

This benchmark consists of three different applications running concurrently. It shows that large, enterprise workloads can be run on a single system and without performance impact between application environments.

The three workloads are:

  • Oracle E-Business Suite Online

    • This test simulates thousands of online users executing transactions typical of an internal Enterprise Resource Processing, including 5 application modules: Customer Service, Human Resources Self Service, Procurement, Order Management and Financial.

    • Each database tier uses a database instance of about 600 GB in size, and supporting thousands of application users, accessing hundreds of objects (tables, indexes, SQL stored procedures, etc.).

    • The application tier includes multiple web and application server instances, specifically Apache Web Server, Oracle Application Server 10g and Oracle Java SE 6u32.

  • PeopleSoft Human Capital Management

    • This test simulates thousands of online employees, managers and Human Resource administrators executing transactions typical of a Human Resources Self Service application for the Enterprise. Typical transactions are: viewing paychecks, promoting and hiring employees, updating employee profiles, etc.

    • The database tier uses a database instance of about 500 GB in size, containing information for 500,480 employees.

    • The application tier for this test includes web and application server instances, specifically Oracle WebLogic Server 11g, PeopleSoft Human Capital Management 9.1 and Oracle Java SE 6u32.

  • Decision Support Workload using the Oracle Database.

    • The query processes 30 billion rows stored in the Oracle Database, making heavy use of Oracle parallel query processing features. It performs multiple aggregations and summaries by reading and processing all the rows of the database.

Key Points and Best Practices

Oracle E-Business Environment

The Oracle E-Business Suite setup consisted 4 Oracle E-Business environments running 5 online Oracle E-Business modules simultaneously. The Oracle E-Business environments were deployed on 4 Oracle VM for SPARC, respectively 2 for the Application tier and 2 for the Database tier. Each LDom included 2 SPARC M6 processor chips. The Application LDom was further split into 2 Oracle Solaris Zones, each one containing one Oracle E-Business Application instance. Similarly, on the Database tier, each LDom was further divided into 2 Oracle Solaris Zones, each containing an Oracle Database instance. Applications on the same LDom shared a 10 GbE network link to connect to the Database tier LDom. Each Application in a Zone was connected to its own dedicated Database Zone. The communication between the two Zones was implemented via Oracle Solaris 11 virtual network, which provides high performance, low latency transfers at memory speed using large frames (9000 bytes vs typical 1500 bytes frames).

The Oracle E-Business setup made use of the Oracle Database Shared Server feature in order to limit memory utilization, as well as the number of database Server processes. The Oracle Database configuration and optimization was substantially out-of-the-box, except for proper sizing the Oracle Database memory areas (System Global Area and Program Global Area).

In the Oracle E-Business Application LDom handling Customer Service and HR Self Service modules, 28 Forms servers and 8 OC4J application servers were hosted in the two separate Oracle Solaris Zones, for a total of 56 forms servers and 16 applications servers.

All the Oracle Database server processes and the listener processes were executed in the Oracle Solaris FX scheduler class.

PeopleSoft Environment

The PeopleSoft Application Oracle VM for SPARC had one Oracle Solaris Zone of 12 cores containing the web tier and two Oracle Solaris Zones of 28 cores each containing the Application tier. The Database tier was contained in an Oracle VM for SPARC consisting of one Oracle Solaris Zone of 24 cores. One and a half cores, in the Application Oracle VM, were dedicated to network and disk interrupt handling.

All database data files, recovery files and Oracle Clusterware files for the PeopleSoft test were created with the Oracle Automatic Storage Management (Oracle ASM) volume manager for the added benefit of the ease of management provided by Oracle ASM integrated storage management solution.

In the application tier, 5 PeopleSoft domains with 350 application servers (70 per each domain) were hosted in the two separate Oracle Solaris Zones for a total of 10 domains with 700 application server processes.

All PeopleSoft Application processes and Web Server JVM instances were executed in the Oracle Solaris FX scheduler class.

Oracle Decision Support Environment

The decision support workload showed how the combination of a large memory (8 TB) and a large number of processors (16 chips comprising 1536 virtual CPUs) together with Oracle parallel query facility can linearly increase the performance of certain decision support queries as the number of CPUs increase.

The large memory was used to cache the entire 30 billion row Oracle table in memory. There are a number of ways to accomplish this. The method deployed in this test was to allocate sufficient memory for Oracle's "keep cache" and direct the table to the "keep cache."

To demonstrate scalability, it was necessary to ensure that the number of Oracle parallel servers was always equal to the number of available virtual CPUs. This was accomplished by the combination of providing a degree of parallelism hint to the query and setting both parallel_max_servers and parallel_min_servers to the number of virtual CPUs.

The number of virtual CPUs for each stage of the scalability test was adjusted using the psradm command available in Oracle Solaris.

See Also

Disclosure Statement

Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 09/22/2013.

Oracle E-Business Suite R12 extra-large multiple-online module benchmark, SPARC M6-32, SPARC M6, 3.6 GHz, 8 chips, 96 cores, 768 threads, 4 TB memory, 14,660 online users, average response time 0.81 sec, 90th percentile response time 0.88 sec, Oracle Solaris 11.1, Oracle Solaris Zones, Oracle VM for SPARC, Oracle E-Business Suite 12.1.3, Oracle Database 11g Release 2, Results as of 9/20/2013.

SPARC T5-2 Server Beats x86 Server on Oracle Database Transparent Data Encryption

Database security is becoming increasingly important. Oracle Database Advanced Security Transparent Data Encryption (TDE) stops would-be attackers from bypassing the database and reading sensitive information from storage by enforcing data-at-rest encryption in the database layer. Oracle's SPARC T5-2 server outperformed x86 systems when running Oracle Database 12c with Transparent Data Encryption.

  • The SPARC T5-2 server sustained more than 8.0 GB/sec of read bandwidth while decrypting using Transparent Data Encryption (TDE) in Oracle Database 12c. This was the bandwidth available on the system and matched the rate for querying the non-encrypted data.

  • The SPARC T5-2 server achieves about 1.5x higher decryption rate per socket using Oracle Database 12c with TDE than a Sun Server X4-2 system.

  • The SPARC T5-2 server achieves more than double the decryption rate per socket using Oracle Database 12c with TDE than a Sun Server X3-2 system.

Performance Landscape

Table of Size 250 GB Encrypted with AES-128-CFB
Full Table Scan with Degree of Parallelism 128
System | Chips | Table Scan Rate (Clear) | Table Scan Rate (Encrypted) | SPARC T5-2 Advantage
SPARC T5-2 | 2 | 8.4 GB/sec | 8.3 GB/sec | 1.0
Sun Server X4-2L | 2 | 8.2 GB/sec | 5.6 GB/sec | 1.5

SPARC T5-2 | 1 | 8.4 GB/sec | 4.2 GB/sec | 1.0
Sun Server X4-2L | 1 | 8.2 GB/sec | 2.8 GB/sec | 1.5
Sun Server X3-2L | 1 | 8.2 GB/sec | 2.0 GB/sec | 2.1

Configuration Summary

Systems Under Test:

SPARC T5-2
2 x SPARC T5 processors, 3.6 GHz
256 GB memory
Oracle Solaris 11.1
Oracle Database 12c

Sun Server X3-2L
2 x Intel Xeon E5-2690 processor, 2.90 GHz
64 GB memory
Oracle Solaris 11.1
Oracle Database 12c

Sun Server X4-2L
2 x Intel Xeon E5-2697 v2 processor, 2.70 GHz
256 GB memory
Oracle Solaris 11.1
Oracle Database 12c

Storage:

Flash Storage

Benchmark Description

The purpose of the benchmark is to show the query performance of a database using data encryption to keep the data secure. The benchmark creates a 250 GB table. It is loaded both into a clear text (no encryption) tablespace and an AES-128 encrypted tablespace. Full table scans of the tables were timed.
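
A minimal sketch of the kind of timing performed, assuming the python-oracledb driver and hypothetical table names (SALES_CLEAR and SALES_TDE) preloaded into the clear and encrypted tablespaces; the actual benchmark scripts and schema are not published:

```python
# Sketch only: time a parallel full scan of a clear vs. TDE-encrypted table copy
# and convert the elapsed time into GB/sec for a table of known size.
import time
import oracledb  # pip install oracledb

TABLE_GB = 250  # table size used in this benchmark

def scan_rate_gbps(cur, table):
    start = time.perf_counter()
    # Force a parallel full table scan, matching the degree of parallelism of 128
    # used in the published results.
    cur.execute(f"SELECT /*+ FULL(t) PARALLEL(t, 128) */ COUNT(*) FROM {table} t")
    cur.fetchone()
    return TABLE_GB / (time.perf_counter() - start)

conn = oracledb.connect(user="bench", password="secret", dsn="dbhost/pdb1")
cur = conn.cursor()
for table in ("SALES_CLEAR", "SALES_TDE"):
    print(f"{table}: {scan_rate_gbps(cur, table):.1f} GB/sec")
```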

Key Points and Best Practices

The Oracle Database feature, Transparent Data Encryption (TDE), simplifies the encryption of data within datafiles, preventing unauthorized access to it from the operating system. Transparent Data Encryption allows encryption of the entire contents of a tablespace.

With hardware acceleration of the encryption routines, the SPARC T5-2 server can achieve nearly the same query rate whether the table is encrypted or not up to a limit of about 4 GB/sec per chip.

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 23 September 2013.

Wednesday Sep 25, 2013

SPARC T5-8 Delivers World Record Oracle OLAP Perf Version 3 Benchmark Result on Oracle Database 12c

Oracle's SPARC T5-8 server delivered world record query performance for systems running Oracle Database 12c for the Oracle OLAP Perf Version 3 benchmark.

  • The query throughput on the SPARC T5-8 server is 1.7x higher than that of an 8-chip Intel Xeon E7-8870 server. Both systems had sub-second average response times.

  • The SPARC T5-8 server with the Oracle Database demonstrated the ability to support at least 700 concurrent users querying OLAP cubes (with no think time), processing 2.33 million analytic queries per hour with an average response time of less than 1 second per query. This performance was enabled by keeping the entire cube in-memory utilizing the 4 TB of memory on the SPARC T5-8 server.

  • Assuming a 60 second think time between query requests, the SPARC T5-8 server can support approximately 39,450 concurrent users with the same sub-second response time.

  • The workload uses a set of realistic Business Intelligence (BI) queries that run against an OLAP cube based on a 4 billion row fact table of sales data. The 4 billion rows are partitioned by month spanning 10 years.

  • The combination of Oracle Database 12c with the Oracle OLAP option running on a SPARC T5-8 server supports live data updates occurring concurrently with minimally impacted user query execution.

Performance Landscape

Oracle OLAP Perf Version 3 Benchmark
Oracle cube based on a 4 billion row fact table
10 years of data partitioned by month
System | Queries/hour | Users (0 sec think time) | Users (60 sec think time) | Average Response Time (sec)
SPARC T5-8 | 2,329,000 | 700 | 39,450 | <1 sec
8-chip Intel Xeon E7-8870 | 1,354,000 | 120 | 22,675 | <1 sec
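
The 60-second think-time figures follow from simple arithmetic on the measured throughput; assuming each user's cycle is roughly 60 seconds of think time plus a sub-second response, the supported population for the SPARC T5-8 server is approximately

\[
\frac{2{,}329{,}000\ \text{queries/hour}}{3600\ \text{s/hour}} \times (60 + 1)\ \text{s} \approx 39{,}500\ \text{users},
\]

consistent with the 39,450 users reported above (the exact figure depends on the measured sub-second response time).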

Configuration Summary

SPARC T5-8:

1 x SPARC T5-8 server with
8 x SPARC T5 processors, 3.6 GHz
4 TB memory
Data Storage and Redo Storage
Flash Storage
Oracle Solaris 11.1 (11.1.8.2.0)
Oracle Database 12c Release 1 (12.1.0.1) with Oracle OLAP option

Sun Server X2-8:

1 x Sun Server X2-8 with
8 x Intel Xeon E7-8870 processors, 2.4 GHz
1 TB memory
Data Storage and Redo Storage
Flash Storage
Oracle Solaris 10 10/12
Oracle Database 12c Release 1 (12.1.0.1) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 3 benchmark is a workload designed to demonstrate and stress the ability of the OLAP Option to deliver fast query, near real-time updates and rich calculations using a multi-dimensional model in the context of Oracle data warehousing.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g., South America) for a particular time period (e.g., Q4 of 2010), followed by additional queries which drill down into sales for individual countries (e.g., Chile, Peru, etc.), with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g., major household appliances) and then issue further queries drilling down into particular products (e.g., refrigerators, stoves, etc.), particular regions, particular customers, etc.

While the core of every OLAP Perf benchmark is real world query performance, the benchmark itself offers numerous execution options such as varying data set sizes, number of users, numbers of queries for any given user and cube update frequency. Version 3 of the benchmark is executed with a much larger number of query streams than previous versions and used a cube designed for near real-time updates. The results produced by version 3 of the benchmark are not directly comparable to results produced by previous versions of the benchmark.

The near real-time update capability is implemented along the following lines. A large Oracle cube, H, is built from a 4 billion row star schema, containing data up until the end of last business day. A second small cube, D, is then created which will contain all of today's new data coming in from outside the world. It will be updated every L minutes with the data coming in within the last L minutes. A third cube, R, joins cubes H and D for reporting purposes much like a view might join data from two tables. Calculations are installed into cube R. The use of a reporting cube which draws data from different storage cubes is a common practice.

Query users are never locked out of query operations while new data is added to the update cube. The point of the demonstration is to show that an Oracle OLAP system can be designed which results in data being no more than L minutes out of date, where L may be as low as just a few minutes. This is what is meant by near real-time analytics.

Key Points and Best Practices

  • Building and querying cubes with the Oracle OLAP option requires a large temporary tablespace. Normally temporary tablespaces would reside on disk storage. However, because the SPARC T5-8 server used in this benchmark had 4 TB of main memory, it was possible to use main memory for the OLAP temporary tablespace. This was accomplished by using a temporary, memory-based file system (TMPFS) for the temporary tablespace datafiles.

  • Since typical business intelligence users are often likely to issue similar queries, either with the same or different constants in the where clauses, setting the init.ora parameter "cursor_sharing" to "force" provides for additional query throughput and a larger number of potential users.

  • Assuming the normal Oracle Database initialization parameters (e.g. SGA, PGA, processes etc.) are appropriately set, out of the box performance for the Oracle OLAP workload should be close to what is reported here. Additional performance resulted from using memory for the OLAP temporary tablespace and from setting "cursor_sharing" to force.

  • Oracle OLAP Cube update performance was optimized by running update processes in the FX class with a priority greater than 0.

  • The maximum lag time between updates to the source fact table and data availability to query users (what was referred to as L in the benchmark description) was less than 3 minutes for the benchmark environment on the SPARC T5-8 server.

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 09/22/2013.

SPARC T5 Encryption Performance Tops Intel E5-2600 v2 Processor

The cryptography benchmark suite was developed by Oracle to measure security performance on important AES security modes. Oracle's SPARC T5 processor with its security software in silicon is faster than x86 servers that have the AES-NI instructions. In this test, the performance of on-processor encryption operations is measured (32 KB encryptions). Multiple threads are used to measure each processor's maximum throughput. The SPARC T5-8 shows dramatically faster encryption.

  • A SPARC T5 processor running Oracle Solaris 11.1 is 2.7 times faster executing AES-CFB 256-bit key encryption (in cache) than the Intel E5-2697 v2 processor (with AES-NI) running Oracle Linux 6.3. AES-CFB encryption is used by Oracle Database for Transparent Data Encryption (TDE) which provides security for database storage.

  • On the AES-CFB 128-bit key encryption, the SPARC T5 processor is 2.5 times faster than the Intel E5-2697 v2 processor (with AES-NI) running Oracle Linux 6.3 for in-cache encryption. AES-CFB mode is used by Oracle Database for Transparent Data Encryption (TDE) which provides security for database storage.

  • The IBM POWER7+ has three hardware security units for 8-core processors, but IBM has not publicly shown any measured performance results on AES-CFB or other encryption modes.

Performance Landscape

Presented below are results for running encryption using the AES cipher with the CFB, CBC, CCM and GCM modes for key sizes of 128, 192 and 256. Decryption performance was similar and is not presented. Results are presented as MB/sec (10**6).

Encryption Performance – AES-CFB

Performance is presented for in-cache AES-CFB128 mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).
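
As an illustration of what is being measured, the following is a minimal single-threaded sketch of an in-cache 32 KB AES-CFB encryption loop, using the third-party Python cryptography package (version 3.1 or later) as an assumed stand-in for the Oracle-internal suite; the published numbers come from running one such loop per hardware thread and summing the rates.

```python
# Minimal sketch: repeatedly encrypt the same 32 KB buffer with AES-256-CFB and
# report throughput in MB/sec (MB = 10**6 bytes, as in the tables below).
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 32 * 1024          # 32 KB buffer, small enough to stay in cache
ITERS = 20000

key = os.urandom(32)       # 256-bit key; use 24 or 16 bytes for AES-192/AES-128
iv = os.urandom(16)
data = os.urandom(BLOCK)   # pseudo-random data, same buffer for every iteration

encryptor = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()

start = time.perf_counter()
for _ in range(ITERS):
    encryptor.update(data)            # continue the CFB stream over the buffer
elapsed = time.perf_counter() - start
encryptor.finalize()

print(f"AES-256-CFB single-thread: {BLOCK * ITERS / 1e6 / elapsed:.0f} MB/sec")
```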

AES-CFB
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CFB
SPARC T5 3.60 2 54,396 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 19,960 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 12,823 Oracle Linux 6.3, IPP/AES-NI
AES-192-CFB
SPARC T5 3.60 2 61,000 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 23,217 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 14,928 Oracle Linux 6.3, IPP/AES-NI
AES-128-CFB
SPARC T5 3.60 2 68,695 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 27,740 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 17,824 Oracle Linux 6.3, IPP/AES-NI

Encryption Performance – AES-GCM

Performance is presented for in-cache AES-GCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-GCM
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-GCM
SPARC T5 3.60 2 34,101 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 15,338 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2690 2.90 2 13,520 Oracle Linux 6.3, IPP/AES-NI
AES-192-GCM
SPARC T5 3.60 2 36,852 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 15,768 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2690 2.90 2 14,159 Oracle Linux 6.3, IPP/AES-NI
AES-128-GCM
SPARC T5 3.60 2 39,003 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 16,405 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2690 2.90 2 14,877 Oracle Linux 6.3, IPP/AES-NI

Encryption Performance – AES-CCM

Performance is presented for in-cache AES-CCM mode encryption with authentication. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption/authentication was performed on 32 KB of pseudo-random data (same data for each run).

AES-CCM
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CCM
SPARC T5 3.60 2 29,431 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 19,447 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 12,493 Oracle Linux 6.3, IPP/AES-NI
AES-192-CCM
SPARC T5 3.60 2 33,715 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 22,634 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 14,507 Oracle Linux 6.3, IPP/AES-NI
AES-128-CCM
SPARC T5 3.60 2 39,188 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 26,951 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 17,256 Oracle Linux 6.3, IPP/AES-NI

Encryption Performance – AES-CBC

Performance is presented for in-cache AES-CBC mode encryption. Multiple key sizes of 256-bit, 192-bit and 128-bit are presented. The encryption was performed on 32 KB of pseudo-random data (same data for each run).

AES-CBC
Microbenchmark Performance (MB/sec)
Processor GHz Chips Performance Software Environment
AES-256-CBC
SPARC T5 3.60 2 56,933 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 19,962 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 12,822 Oracle Linux 6.3, IPP/AES-NI
AES-192-CBC
SPARC T5 3.60 2 63,767 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 23,224 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 14,915 Oracle Linux 6.3, IPP/AES-NI
AES-128-CBC
SPARC T5 3.60 2 72,508 Oracle Solaris 11.1, libsoftcrypto + libumem
Intel E5-2697 v2 2.70 2 27,733 Oracle Linux 6.3, IPP/AES-NI
Intel E5-2690 2.90 2 17,823 Oracle Linux 6.3, IPP/AES-NI

Configuration Summary

SPARC T5-2 server
2 x SPARC T5 processor, 3.6 GHz
512 GB memory
Oracle Solaris 11.1 SRU 4.2

Sun Server X4-2L server
2 x E5-2697 v2 processors, 2.70 GHz
256 GB memory
Oracle Linux 6.3

Sun Server X3-2 server
2 x E5-2690 processors, 2.90 GHz
128 GB memory
Oracle Linux 6.3

Benchmark Description

The benchmark measures cryptographic capabilities in terms of general low-level encryption performed in cache (32 KB encryptions) and on chip, using various ciphers, including AES-128-CFB, AES-192-CFB, AES-256-CFB, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CCM, AES-192-CCM, AES-256-CCM, AES-128-GCM, AES-192-GCM and AES-256-GCM.

The benchmark results were obtained using tests created by Oracle which use various application interfaces to exercise the ciphers listed above. They were run using optimized libraries for each platform to obtain the best possible performance.
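
For illustration, the sketch below measures multi-threaded, in-cache AES-256-CFB throughput on a 32 KB buffer using the standard Java javax.crypto (JCE) API. It is not the Oracle test harness: the published results used platform-optimized libraries (libsoftcrypto/libumem on Oracle Solaris, IPP with AES-NI on Linux), so absolute numbers from this sketch will differ, and JDK 7-era installations may need the unlimited-strength policy files for 256-bit keys.

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.security.SecureRandom;
    import java.util.concurrent.*;
    import java.util.concurrent.atomic.AtomicLong;

    public class AesCfbThroughput {
        static final int BUF_SIZE = 32 * 1024;   // one 32 KB encryption per call, as in the benchmark
        static final int THREADS = Runtime.getRuntime().availableProcessors();
        static final long RUN_MS = 10_000;       // measurement interval in milliseconds

        public static void main(String[] args) throws Exception {
            SecureRandom rnd = new SecureRandom();
            byte[] key = new byte[32];           // 256-bit key; use 24 or 16 bytes for 192/128-bit
            byte[] iv = new byte[16];
            byte[] data = new byte[BUF_SIZE];    // same pseudo-random plaintext for every thread
            rnd.nextBytes(key);
            rnd.nextBytes(iv);
            rnd.nextBytes(data);

            AtomicLong bytesDone = new AtomicLong();
            long deadline = System.currentTimeMillis() + RUN_MS;
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);

            for (int t = 0; t < THREADS; t++) {
                pool.submit(() -> {
                    try {
                        Cipher c = Cipher.getInstance("AES/CFB/NoPadding");
                        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                                new IvParameterSpec(iv));
                        byte[] out = new byte[BUF_SIZE];
                        while (System.currentTimeMillis() < deadline) {
                            c.update(data, 0, BUF_SIZE, out, 0);  // encrypt one 32 KB buffer
                            bytesDone.addAndGet(BUF_SIZE);
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(RUN_MS + 5_000, TimeUnit.MILLISECONDS);

            double mbPerSec = bytesDone.get() / 1e6 / (RUN_MS / 1000.0);
            System.out.printf("AES-256-CFB, %d threads: %.0f MB/sec%n", THREADS, mbPerSec);
        }
    }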

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/23/2013.

Tuesday Sep 10, 2013

Oracle ZFS Storage ZS3-4 Delivers World Record SPC-2 Performance

The Oracle ZFS Storage ZS3-4 storage system delivered a world record performance result on the SPC-2 benchmark along with excellent price-performance.

  • The Oracle ZFS Storage ZS3-4 storage system delivered an overall score of 17,244.22 SPC-2 MBPS™ and a price-performance of $22.53 per SPC-2 MBPS on the SPC-2 benchmark.

  • This is over a 1.6X generational improvement in performance and over a 1.5X generational improvement in price-performance over Oracle's previous Sun ZFS Storage 7420 SPC-2 benchmark result.

  • The Oracle ZFS Storage ZS3-4 storage system delivers 6.8X better overall throughput and nearly 1.2X better price-performance than the IBM DS3524 Express Turbo, which holds IBM's best overall price-performance score on the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system delivers over 1.1X the overall throughput and 5.8X better price-performance than the IBM DS8870, which holds IBM's best overall performance score on the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system delivers over 1.3X the overall throughput and 3.9X better price-performance than the HP StorageWorks P9500 XP Disk Array on the SPC-2 benchmark.

Performance Landscape

SPC-2 Performance Chart (in decreasing performance order)

System SPC-2 MB/s $/SPC-2 MB/s ASU Capacity (GB) TSC Price Data Protection Level Date Results Identifier
Oracle ZFS Storage ZS3-4 17,244.22 $22.53 31,611 $388,472 Mirroring 09/10/13 B00067
Fujitsu DX8700 S2 16,039 $79.51 71,404 $1,275,163 Mirroring 12/03/12 B00063
IBM DS8870 15,424 $131.21 30,924 $2,023,742 RAID-5 10/03/12 B00062
IBM SAN VC v6.4 14,581 $129.14 74,492 $1,883,037 RAID-5 08/01/12 B00061
NEC Storage M700 14,409 $25.13 53,550 $361,613 Mirroring 08/19/12 B00066
Hitachi VSP 13,148 $95.38 129,112 $1,254,093 RAID-5 07/27/12 B00060
HP StorageWorks P9500 13,148 $88.34 129,112 $1,161,504 RAID-5 03/07/12 B00056
Sun ZFS Storage 7420 10,704 $35.24 31,884 $377,225 Mirroring 04/12/12 B00058
IBM DS8800 9,706 $270.38 71,537 $2,624,257 RAID-5 12/01/10 B00051
HP XP24000 8,725 $187.45 18,401 $1,635,434 Mirroring 09/08/08 B00035

SPC-2 MB/s = the Performance Metric
$/SPC-2 MB/s = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection Level = the Data Protection Metric
TSC Price = the Total Cost of Ownership Metric
Results Identifier = a unique identification of the result

SPC-2 Price-Performance Chart (in increasing price-performance order)

System SPC-2 MB/s $/SPC-2 MB/s ASU Capacity (GB) TSC Price Data Protection Level Date Results Identifier
SGI InfiniteStorage 5600 8,855.70 $15.97 28,748 $141,393 RAID6 03/06/13 B00065
Oracle ZFS Storage ZS3-4 17,244.22 $22.53 31,611 $388,472 Mirroring 09/10/13 B00067
Sun Storage J4200 548.80 $22.92 11,995 $12,580 Unprotected 07/10/08 B00033
NEC Storage M700 14,409 $25.13 53,550 $361,613 Mirroring 08/19/12 B00066
Sun Storage J4400 887.44 $25.63 23,965 $22,742 Unprotected 08/15/08 B00034
Sun StorageTek 2530 672.05 $26.15 1,451 $17,572 RAID5 08/16/07 B00026
Sun StorageTek 2530 663.51 $26.48 854 $17,572 Mirroring 08/16/07 B00025
Fujitsu ETERNUS DX80 1,357.55 $26.70 4,681 $36,247 Mirroring 03/15/10 B00050
IBM DS3524 Express Turbo 2,510 $26.76 14,374 $67,185 RAID-5 12/31/10 B00053
Fujitsu ETERNUS DX80 S2 2,685.50 $28.48 17,231 $76,475 Mirroring 08/19/11 B00055

SPC-2 MB/s = the Performance Metric
$/SPC-2 MB/s = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection Level = the Data Protection Metric
TSC Price = the Total Cost of Ownership Metric
Results Identifier = a unique identification of the result

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org/results/benchmark_results_spc2.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-4 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-4 controllers, each with
4 x 2.4 GHz 10-core Intel Xeon processors
1024 GB memory
16 x Sun Disk shelves, each with
24 x 300 GB 15K RPM SAS-2 drives

Benchmark Description

SPC Benchmark-2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business-critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominantly by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below, as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields, which require simple sequential processing of one or more large files, such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-2 and SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). Results as of September 10, 2013, for more information see www.storageperformance.org. Oracle ZFS Storage ZS3-4 B00067, Fujitsu ET 8700 S2 B00063, IBM DS8870 B00062, IBM S.V.C 6.4 B00061, NEC Storage M700 B00066, Hitachi VSP B00060, HP P9500 XP Disk Array B00056, IBM DS8800 B00051.

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
