Friday Nov 13, 2015

SPECjbb2015: SPARC T7-1 World Record for 1 Chip Result

Updated November 30, 2015 to point to published results and add latest, best x86 two-chip result.

Oracle's SPARC T7-1 server, running Oracle Solaris and Oracle JDK, produced world-record one-chip SPECjbb2015 benchmark results (MultiJVM metric), beating all previous one- and two-chip results in the process. The benchmark was designed by the industry to showcase Java performance in the enterprise. Performance is expressed in two metrics: max-jOPS, the maximum throughput, and critical-jOPS, the critical throughput under service level agreements (SLAs).

  • The SPARC T7-1 server achieved 120,603 SPECjbb2015-MultiJVM max-jOPS and 60,280 SPECjbb2015-MultiJVM critical-jOPS on the SPECjbb2015 benchmark.

  • The one-chip SPARC T7-1 server delivered 2.5 times more max-jOPS performance per chip than the best two-chip result which was run on the Cisco UCS C220 M4 server using Intel v3 processors. The SPARC T7-1 server also produced 4.3 times more critical-jOPS performance per chip compared to the Cisco UCS C220 M4. The Cisco result enabled the COD BIOS option.

  • The SPARC T7-1 server delivered 2.7 times more max-jOPS performance per chip than the IBM Power S812LC using POWER8 chips. The SPARC T7-1 server also produced 4.6 times more critical-jOPS performance per chip compared to the IBM server. The SPARC M7 processor also delivered 1.45 times more critical-jOPS performance per core than IBM POWER8 processor.

  • The one-chip SPARC T7-1 server delivered 3 times more max-jOPS performance per chip than the two-chip result on the Lenovo Flex System x240 M5 using Intel v3 processors. The SPARC T7-1 server also produced 2.8 times more critical-jOPS performance per chip compared to the Lenovo. The Lenovo result did not enable the COD BIOS option.

  • The SPARC T5-2 server achieved 80,889 SPECjbb2015-MultiJVM max-jOPS and 37,422 SPECjbb2015-MultiJVM critical-jOPS on the SPECjbb2015 benchmark.

  • The one-chip SPARC T7-1 server demonstrated a 3 times max-jOPS performance improvement per chip compared to the previous generation two-chip SPARC T5-2 server.

From SPEC's press release: "The SPECjbb2015 benchmark is based on the usage model of a worldwide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations. It exercises Java 7 and higher features, using the latest data formats (XML), communication using compression, and secure messaging."

The Cluster on Die (COD) mode is a BIOS setting that effectively splits the chip in half, making the operating system think it has twice as many chips as it does (in this case, four 9-core chips). Intel has said that COD is appropriate only for highly NUMA-optimized workloads. Dell has shown that bandwidth to the other half of a chip split by COD is 3.7x lower.

Performance Landscape

One- and two-chip results of SPECjbb2015 MultiJVM from www.spec.org as of November 30, 2015.

SPECjbb2015
One- and Two-Chip Results (SPECjbb2015-MultiJVM)

System                      Processor (clock, cores)                     max-jOPS  critical-jOPS  OS                   JDK    Notes
--------------------------  -------------------------------------------  --------  -------------  -------------------  -----  -----
SPARC T7-1                  1 x SPARC M7 (4.13 GHz, 1x 32core)            120,603         60,280  Oracle Solaris 11.3  8u66   -
Cisco UCS C220 M4           2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      97,551         28,318  Red Hat 6.5          8u60   COD
Dell PowerEdge R730         2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      94,903         29,033  SUSE 12              8u60   COD
Cisco UCS C220 M4           2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      92,463         31,654  Red Hat 6.5          8u60   COD
Lenovo Flex System x240 M5  2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      80,889         43,654  Red Hat 6.5          8u60   -
SPARC T5-2                  2 x SPARC T5 (3.6 GHz, 2x 16core)              80,889         37,422  Oracle Solaris 11.2  8u66   -
Oracle Server X5-2L         2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core)      76,773         26,458  Oracle Solaris 11.2  8u60   -
Sun Server X4-2             2 x Intel E5-2697 v2 (2.7 GHz, 2x 12core)      52,482         19,614  Oracle Solaris 11.1  8u60   -
HP ProLiant DL120 Gen9      1 x Intel Xeon E5-2699 v3 (2.3 GHz, 18core)    47,334          9,876  Red Hat 7.1          8u51   -
IBM Power S812LC            1 x POWER8 (2.92 GHz, 10core)                  44,883         13,032  Ubuntu 14.04.3       J9 VM  -

* Note COD: the result uses the non-default BIOS setting Cluster on Die (COD), which splits the chip in two. This requires specific NUMA optimization, because memory traffic to the other half of the chip can see a 3.7x decrease in bandwidth.
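
The per-chip and per-core comparisons quoted in the bullets above can be reproduced from the table. The following is a minimal sketch rather than anything taken from the SPEC disclosures: the helper names are illustrative, and the only inputs are the scores, chip counts and core counts listed in the table.

```python
# Figures copied from the results table; helper names are illustrative only.

def per_unit(score, units):
    """Normalize a system-level score to a single chip (or core)."""
    return score / units


def advantage(score_a, units_a, score_b, units_b):
    """Ratio of per-unit scores, system A over system B."""
    return per_unit(score_a, units_a) / per_unit(score_b, units_b)


# SPARC T7-1 (1 chip) vs. Cisco UCS C220 M4 (2 chips), best two-chip max-jOPS result
max_jops_vs_cisco = advantage(120_603, 1, 97_551, 2)
crit_jops_vs_cisco = advantage(60_280, 1, 28_318, 2)

# SPARC M7 (32 cores) vs. POWER8 (10 cores), critical-jOPS per core
crit_per_core_vs_power8 = advantage(60_280, 32, 13_032, 10)

print(f"max-jOPS per chip vs. Cisco:       {max_jops_vs_cisco:.1f}x")
print(f"critical-jOPS per chip vs. Cisco:  {crit_jops_vs_cisco:.1f}x")
print(f"critical-jOPS per core vs. POWER8: {crit_per_core_vs_power8:.2f}x")
```

Dividing by chip count is what lets a one-chip system be compared against two-chip results; the same function with core counts gives the per-core figure.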

Configuration Summary

Systems Under Test:

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB dimms)
Oracle Solaris 11.3 (11.3.1.5.0)
Java HotSpot 64-Bit Server VM, version 1.8.0_66

SPARC T5-2
2 x SPARC T5 processors (3.6 GHz)
512 GB memory (32 x 16 GB dimms)
Oracle Solaris 11.2
Java HotSpot 64-Bit Server VM, version 1.8.0_66

Benchmark Description

The benchmark description, as found at the SPEC website.

The SPECjbb2015 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

Features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service level agreements (SLAs) specifying response times ranging from 10ms to 100ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Key Points and Best Practices

  • For the SPARC T5-2 server results, processor sets were used to isolate the different JVMs used during the test.

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results from http://www.spec.org as of 11/30/2015. SPARC T7-1 120,603 SPECjbb2015-MultiJVM max-jOPS, 60,280 SPECjbb2015-MultiJVM critical-jOPS; Cisco UCS C220 M4 97,551 SPECjbb2015-MultiJVM max-jOPS, 28,318 SPECjbb2015-MultiJVM critical-jOPS; Dell PowerEdge R730 94,903 SPECjbb2015-MultiJVM max-jOPS, 29,033 SPECjbb2015-MultiJVM critical-jOPS; Cisco UCS C220 M4 92,463 SPECjbb2015-MultiJVM max-jOPS, 31,654 SPECjbb2015-MultiJVM critical-jOPS; Lenovo Flex System x240 M5 80,889 SPECjbb2015-MultiJVM max-jOPS, 43,654 SPECjbb2015-MultiJVM critical-jOPS; SPARC T5-2 80,889 SPECjbb2015-MultiJVM max-jOPS, 37,422 SPECjbb2015-MultiJVM critical-jOPS; Oracle Server X5-2L 76,773 SPECjbb2015-MultiJVM max-jOPS, 26,458 SPECjbb2015-MultiJVM critical-jOPS; Sun Server X4-2 52,482 SPECjbb2015-MultiJVM max-jOPS, 19,614 SPECjbb2015-MultiJVM critical-jOPS; HP ProLiant DL120 Gen9 47,334 SPECjbb2015-MultiJVM max-jOPS, 9,876 SPECjbb2015-MultiJVM critical-jOPS; IBM Power S812LC 44,883 SPECjbb2015-MultiJVM max-jOPS, 13,032 SPECjbb2015-MultiJVM critical-jOPS.

Monday Oct 26, 2015

Real-Time Enterprise: SPARC T7-1 Faster Than x86 E5 v3

A goal of the modern business is the real-time enterprise, where analytics run simultaneously with transaction processing on the same system to support the most effective decision making. Oracle Database 12c Enterprise Edition utilizing the In-Memory option is designed so that a single database can process transactions at the highest performance while completing analytical calculations that once took hours or days orders of magnitude faster.

Oracle's SPARC M7 processor has deep innovations to take the real-time enterprise to the next level of performance. In this test, both OLTP transactions and analytical queries were run in a single database instance, using the same features of Oracle Database 12c Enterprise Edition with the In-Memory option on each system, in order to measure the advantage of the SPARC M7 processor over a generic x86 processor. On both systems, the OLTP transactions and the analytical queries each took about half of the server's processing load.

In this test Oracle's SPARC T7-1 server is compared to a two-chip x86 E5 v3 based server. On analytical queries the SPARC M7 processor is 8.2x faster than the x86 E5 v3 processor. Simultaneously on OLTP transactions the SPARC M7 processor is 2.9x faster than the x86 E5 v3 processor. In addition, the SPARC T7-1 server had better OLTP transactional response time than the x86 E5 v3 server.

The SPARC M7 processor does this by using the Data Accelerator co-processor (DAX). DAX is not a SIMD instruction set, but rather an actual co-processor that offloads in-memory queries, which frees the cores for other processing. The DAX has direct access to the memory bus and can execute scans at near full memory bandwidth. Oracle makes the DAX API available to other applications, so this kind of acceleration is not limited to Oracle Database; it is open.

The results below were obtained running a set of OLTP transactions and analytic queries simultaneously against two schemas: a real-time online orders system and a related historical orders schema configured as a real cardinality database (RCDB) star schema. The in-memory analytics RCDB queries are executed using the Oracle Database 12c In-Memory columnar feature.

  • The SPARC T7-1 server and the x86 E5 v3 server both ran OLTP transactions and the in-memory analytics on the same database instance using Oracle Database 12c Enterprise Edition utilizing the In-Memory option.

  • The SPARC T7-1 server ran the in-memory analytics RCDB based queries 8.2x faster per chip than a two-chip x86 E5 v3 server on the 48 stream test.

  • The SPARC T7-1 server delivers 2.9x higher OLTP transaction throughput results per chip than a two-chip x86 E5 v3 server on the 48 stream test.

Performance Landscape

The table below compares the SPARC T7-1 server and 2-chip x86 E5 v3 server while running OLTP and in-memory analytics against tables in the same database instance. The same set of transactions and queries were executed on each system.

Real-Time Enterprise Performance Chart
48 RCDB DSS Streams, 224 OLTP users

                                                    OLTP Transactions                    Analytic Queries
System            Processor                         Trans Per  Per Chip   Average        Queries Per  Per Chip
                                                    Second     Advantage  Response Time  Minute       Advantage
----------------  --------------------------------  ---------  ---------  -------------  -----------  ---------
SPARC T7-1        1 x SPARC M7 (32core)             338 K      2.9x       11 (msec)      267          8.2x
x86 E5 v3 server  2 x Intel E5-2699 v3 (2x 18core)  236 K      1.0        12 (msec)      65           1.0

The number of cores listed is per chip.
The Per Chip Advantage is computed by normalizing to a single chip's performance.

Configuration Summary

SPARC Server:

1 X SPARC T7-1 server
1 X SPARC M7 processor
256 GB Memory
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition Release 12.1.0.2.10

x86 Server:

1 X Oracle Server X5-2L
2 X Intel Xeon Processor E5-2699 v3
256 GB Memory
Oracle Linux 6 Update 5 (3.8.13-16.2.1.el6uek.x86_64)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.10

Benchmark Description

The Real-Time Enterprise benchmark simulates the demands of customers who want to simultaneously run both their OLTP database and the related historical warehouse DSS data that would be based on that OLTP data. It answers the question of how a system will perform when doing data analysis while at the same time executing real-time on-line transactions.

The OLTP workload simulates an Order Inventory System that exercises both reads and writes, with a potentially large number of users stressing lock management and connectivity as well as database access.

The number of customers, orders and users is fully parameterized. This benchmark is based on a 100 GB dataset with 15 million customers, 600 million orders and up to 580 users. The workload consists of a number of transaction types including show-expenses, part-cost, supplier-phone, low-inv, high-inv, update-price, update-phone, update-cost, and new-order.

The real cardinality database (RCDB) schema was created to showcase the potential speedup one may see moving from on disk, row format data warehouse/Star Schema, to utilizing Oracle Database 12c's In-Memory feature for analytical queries.

The workload consists of as many as 2,304 unique queries asking questions such as "In 2014, what was the total revenue of single item orders", or "In August 2013, how many orders exceeded a total price of $50". Questions like these can help a company see where to focus for further revenue growth or identify weaknesses in their offerings.

RCDB scale factor 1050 represents a 1.05 TB data warehouse. It is transformed into a star schema of 1.0 TB, and then becomes 110 GB in size when loaded in memory. It consists of 1 fact table, and 4 dimension tables with over 10.5 billion rows. There are 56 columns with most cardinalities varying between 5 and 2,000, a primary key being an example of something outside this range.

Two reports are generated: one for the OLTP-Perf workload and one for the RCDB DSS workload. For the analytical DSS workload, queries per minute and average query elapsed times are reported. For the OLTP-Perf workload, both transactions per second (in thousands) and OLTP average response times (in milliseconds) are reported.

Key Points and Best Practices

  • This benchmark utilized the SPARC M7 processor's co-processor DAX for query acceleration.

  • All SPARC T7-1 server results were run with out-of-the-box tuning for Oracle Solaris.

  • All Oracle Server X5-2L system results were run with out of the box tunings for Oracle Linux except for the setting in /etc/sysctl.conf to get large pages for the Oracle Database:

    • vm.nr_hugepages=98304

  • To create an in memory area, the following was added to the init.ora:

      inmemory_size = 120g

  • An example of how to set a table to be in memory is below:

      ALTER TABLE CUSTOMER INMEMORY MEMCOMPRESS FOR QUERY HIGH
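
As a sanity check on the settings above, the huge page count can be converted into a pool size and compared with the in-memory area. This is a sketch under one stated assumption: the source does not give the huge page size, so the common x86-64 Linux default of 2 MiB is assumed here (the actual value is reported as Hugepagesize in /proc/meminfo). The function name is illustrative.

```python
# Assumption: 2 MiB huge pages (not stated in the source; check Hugepagesize in /proc/meminfo).
HUGE_PAGE_BYTES = 2 * 1024**2


def hugepage_pool_gib(nr_hugepages, page_bytes=HUGE_PAGE_BYTES):
    """Size of the reserved huge page pool in GiB."""
    return nr_hugepages * page_bytes / 1024**3


pool_gib = hugepage_pool_gib(98_304)   # vm.nr_hugepages from /etc/sysctl.conf
inmemory_gib = 120                     # inmemory_size = 120g from init.ora

print(f"huge page pool: {pool_gib:.0f} GiB, in-memory area: {inmemory_gib} GiB")
print("pool covers the in-memory area:", pool_gib >= inmemory_gib)
```

Under that assumption the pool is larger than the in-memory area, which leaves room for the rest of the SGA.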

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

In-Memory Database: SPARC T7-1 Faster Than x86 E5 v3

Fast analytics on large databases are critical to transforming key business processes. Oracle's SPARC M7 processors are specifically designed to accelerate in-memory analytics using Oracle Database 12c Enterprise Edition utilizing the In-Memory option. The SPARC M7 processor outperforms an x86 E5 v3 chip by up to 10.8x on analytics queries. In order to test real world deep analysis on the SPARC M7 processor a scenario with over 2,300 analytical queries was run against a real cardinality database (RCDB) star schema. This benchmark was audited by Enterprise Strategy Group (ESG). ESG is an IT research, analyst, strategy, and validation firm focused on the global IT community.

The SPARC M7 processor does this by using the Data Accelerator co-processor (DAX). DAX is not a SIMD instruction set, but rather an actual co-processor that offloads in-memory queries, which frees the cores for other processing. The DAX has direct access to the memory bus and can execute scans at near full memory bandwidth. Oracle makes the DAX API available to other applications, so this kind of acceleration is not limited to Oracle Database; it is open.

  • The SPARC M7 processor delivers up to a 10.8x Query Per Minute speedup per chip over the Intel Xeon Processor E5-2699 v3 when executing analytical queries using the In-Memory option of Oracle Database 12c.

  • Oracle's SPARC T7-1 server delivers up to a 5.4x Query Per Minute speedup over the 2-chip x86 E5 v3 server when executing analytical queries using the In-Memory option of Oracle Database 12c.

  • The SPARC T7-1 server delivers over 143 GB/sec of memory bandwidth which is up to 7x more than the 2-chip x86 E5 v3 server when the Oracle Database 12c is executing the same analytical queries against the RCDB.

  • The SPARC T7-1 server scanned over 48 billion rows per second through the database.

  • The SPARC T7-1 server compresses the on-disk RCDB star schema by around 6x when using the Memcompress For Query High setting (more information following below) and by nearly 10x compared to a standard data warehouse row format version of the same database.

Performance Landscape

The table below compares the SPARC T7-1 server and the 2-chip x86 E5 v3 server. The x86 E5 v3 single-chip comparisons are from actual measurements against a single-chip configuration.

The number of cores is listed per chip; multiply by the number of chips to get the system total.

RCDB Performance Chart
2,304 Queries

System            Processor                         Elapsed  Queries Per  System  Chip   DB Memory
                                                    Seconds  Minute       Adv     Adv    Bandwidth
----------------  --------------------------------  -------  -----------  ------  -----  ----------
SPARC T7-1        1 x SPARC M7 (32core)             381      363          5.4x    10.8x  143 GB/sec
x86 E5 v3 server  2 x Intel E5-2699 v3 (2x 18core)  2059     67           1.0x    2.0x   20 GB/sec
x86 E5 v3 server  1 x Intel E5-2699 v3 (18core)     4096     34           0.5x    1.0x   10 GB/sec
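
The Queries Per Minute and advantage columns appear to follow directly from the elapsed times for the 2,304-query run. The sketch below reproduces them; the dictionary keys and function name are illustrative, and the formula (queries divided by elapsed minutes) is inferred from the table rather than quoted from the benchmark kit.

```python
QUERIES = 2_304  # total queries in the run, from the table title


def queries_per_minute(elapsed_seconds, queries=QUERIES):
    """Throughput implied by running `queries` in `elapsed_seconds`."""
    return queries / elapsed_seconds * 60


# Elapsed seconds copied from the table
elapsed = {
    "SPARC T7-1 (1 chip)": 381,
    "x86 E5 v3 (2 chips)": 2_059,
    "x86 E5 v3 (1 chip)": 4_096,
}
qpm = {name: queries_per_minute(sec) for name, sec in elapsed.items()}

# System advantage: whole T7-1 vs. whole two-chip x86 server.
system_adv = qpm["SPARC T7-1 (1 chip)"] / qpm["x86 E5 v3 (2 chips)"]
# Chip advantage: one SPARC M7 vs. the measured single-chip x86 configuration.
chip_adv = qpm["SPARC T7-1 (1 chip)"] / qpm["x86 E5 v3 (1 chip)"]

for name, value in qpm.items():
    print(f"{name:22s} {value:6.0f} queries/min")
print(f"system advantage {system_adv:.1f}x, chip advantage {chip_adv:.1f}x")
```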

Fused Decompress + Scan

The In-Memory feature of Oracle Database 12c puts tables in columnar format. There are different levels of compression that can be applied. One of these is Oracle Zip (OZIP) which is used with the "MEMCOMPRESS FOR QUERY HIGH" setting. Typically when compression is applied to data, in order to operate on it, the data must be:

    (1) Decompressed
    (2) Written back to memory in uncompressed form
    (3) Scanned and the results returned.

When OZIP is applied to the data inside of an In-Memory Columnar Unit (or IMCU, an N-sized chunk of rows), the DAX is able to take this data in its compressed format and operate (scan) directly upon it, returning results in a single step. This not only saves on compute power by not having the CPU do the decompression step, but also on memory bandwidth as the uncompressed data is not put back into memory. Only the results are returned. To illustrate this, a microbenchmark was used which measured the number of rows that could be scanned per second.

Compression

This performance test was run on a Scale Factor 1750 database, which represents a 1.75 TB row format data warehouse. The database is then transformed into a star schema which ends up around 1.1 TB in size. The star schema is then loaded in memory with a setting of "MEMCOMPRESS FOR QUERY HIGH", which focuses on performance with somewhat more aggressive compression. This memory area is a separate part of the System Global Area (SGA) which is defined by the database initialization parameter "inmemory_size". See below for an example. Here is a breakdown of each table in memory with compression ratios.

Table Name  Original Size (Bytes)  In Memory Size (Bytes)  Compression Ratio
----------  ---------------------  ----------------------  -----------------
LINEORDER       1,103,524,528,128         178,586,451,968               6.2x
DATE                   11,534,336               1,179,648               9.8x
PART                   11,534,336               1,179,648               9.8x
SUPPLIER               11,534,336               1,179,648               9.8x
CUSTOMER               11,534,336               1,179,648               9.8x
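
The compression ratios are simply original size divided by in-memory size. A short sketch using the byte counts from the table (variable names are illustrative) also shows that the in-memory sizes sum to roughly the 179 GB quoted in the benchmark description, taking 1 GB as 10^9 bytes:

```python
# (original bytes, in-memory bytes) copied from the table
tables = {
    "LINEORDER": (1_103_524_528_128, 178_586_451_968),
    "DATE": (11_534_336, 1_179_648),
    "PART": (11_534_336, 1_179_648),
    "SUPPLIER": (11_534_336, 1_179_648),
    "CUSTOMER": (11_534_336, 1_179_648),
}

# Compression ratio = original size / in-memory size
ratios = {name: original / in_memory for name, (original, in_memory) in tables.items()}

# Total in-memory footprint, in decimal gigabytes
total_in_memory_gb = sum(in_memory for _, in_memory in tables.values()) / 1e9

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:4.1f}x")
print(f"total in memory: {total_in_memory_gb:.0f} GB")
```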

Configuration Summary

SPARC Server:

1 X SPARC T7-1 server
1 X SPARC M7 processor
512 GB memory
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition Release 12.1.0.2.13

x86 Server:

1 X Oracle Server X5-2L
2 X Intel Xeon Processor E5-2699 v3
512 GB memory
Oracle Linux 6 Update 5 (3.8.13-16.2.1.el6uek.x86_64)
Oracle Database 12c Enterprise Edition Release 12.1.0.2.13

Benchmark Description

The real cardinality database (RCDB) benchmark was created to showcase the potential speedup one may see moving from on disk, row format data warehouse/Star Schema, to utilizing Oracle Database 12c's In-Memory feature for analytical queries.

The workload consists of 2,304 unique queries asking questions such as "In 2014, what was the total revenue of single item orders", or "In August 2013, how many orders exceeded a total price of $50". Questions like these can help a company see where to focus for further revenue growth or identify weaknesses in their offerings.

RCDB scale factor 1750 represents a 1.75 TB data warehouse. It is transformed into a star schema of 1.1 TB, and then becomes 179 GB in size when loaded in memory. It consists of 1 fact table, and 4 dimension tables with over 10.5 billion rows. There are 56 columns with most cardinalities varying between 5 and 2,000, a primary key being an example of something outside this range.

One problem with many industry standard generated databases is that as they have grown in size, the cardinalities for the generated columns have become exceedingly unrealistic. For instance, one industry standard benchmark uses a schema that, at the 1 TB scale factor, calls for the number of parts to be SF * 800,000. A 1 TB database that calls for 800 million unique parts is not very realistic. Therefore RCDB attempts to take some of these unrealistic cardinalities and size them to be more representative of at least a section of customer data. Obviously, one cannot encompass every database in one schema; this is just an example.

We carefully scaled each system so that the optimal number of users was run on each system under test so that we did not create artificial bottlenecks. Each user ran an equal number of queries and the same queries were run on each system, allowing for a fair comparison of the results.

Key Points and Best Practices

  • This benchmark utilized the SPARC M7 processor's co-processor DAX for query acceleration.

  • All SPARC T7-1 server results were run with out of the box tuning for Oracle Solaris.

  • All Oracle Server X5-2L system results were run with out of the box tunings for Oracle Linux except for the setting in /etc/sysctl.conf to get large pages for the Oracle Database:

    • vm.nr_hugepages=64520

  • To create an in memory area, the following was added to the init.ora:

      inmemory_size = 200g

  • An example of how to set a table to be in memory is below:

      ALTER TABLE CUSTOMER INMEMORY MEMCOMPRESS FOR QUERY HIGH

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

In-Memory Aggregation: SPARC T7-2 Beats 4-Chip x86 E7 v2

Oracle's SPARC T7-2 server demonstrates better performance both in throughput and number of users compared to a four-chip x86 E7 v2 server. The workload consists of a realistic set of business intelligence (BI) queries in a multi-user environment against a 500 million row fact table using Oracle Database 12c Enterprise Edition utilizing the In-Memory option.

  • The SPARC M7 chip delivers 2.3 times more query throughput per hour compared to an x86 E7 v2 chip.

  • The two-chip SPARC T7-2 server delivered 13% more query throughput per hour compared to a four-chip x86 E7 v2 server.

  • The two-chip SPARC T7-2 server supported over 10% more users than a four-chip x86 E7 v2 server.

  • Both the SPARC server and x86 server ran with just under 5 second average response time.

Performance Landscape

The results below were run as part of this benchmark. All results use 500,000,000 fact table rows and had average CPU utilization of 100%.

In-Memory Aggregation
500 Million Row Fact Table

System      Processor                   Users  Queries   Queries per Hour  Average
                                               per Hour  per Chip          Response Time
----------  --------------------------  -----  --------  ----------------  -------------
SPARC T7-2  2 x SPARC M7 (32core)       190    127,540   63,770            4.99 (sec)
x86 E7 v2   4 x E7-8895 v2 (4x 15core)  170    112,470   28,118            4.92 (sec)

The number of cores is listed per chip.

Configuration Summary

SPARC Configuration:

SPARC T7-2
2 x 4.13 GHz SPARC M7 processors
1 TB memory (32 x 32 GB)
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition (12.1.0.2.0)

x86 Configuration:

Sun Server X4-4
4 x Intel Xeon Processor E7-8895 v2
1 TB memory (64 x 16 GB)
Oracle Linux Server 6.5 (kernel 2.6.32-431.el6.x86_64)
Oracle Database 12c Enterprise Edition (12.1.0.2.0)

Benchmark Description

The benchmark is designed to highlight the efficacy of the Oracle Database 12c In-Memory Aggregation facility (join and aggregation optimizations) together with the fast scan and filtering capability of Oracle's in-memory column store facility.

The benchmark runs analytic queries such as those seen in typical customer business intelligence (BI) applications. These are done in the context of a star schema database. The key metrics are query throughput, number of users and average response times.

The implementation of the workload used to achieve the results is based on a schema consisting of 9 dimension tables together with a 500 million row fact table.

The query workload consists of randomly generated star-style queries simulating a collection of ad-hoc business intelligence users. Up to 300 concurrent users have been run, with each user running approximately 500 queries. The implementation includes a relatively small materialized view, which contains some precomputed data. The creation of the materialized view takes only a few minutes.

Key Points and Best Practices

The reported results were obtained by using the following settings on both systems except where otherwise noted:

  1. starting with a completely cold shared pool
  2. without making use of the result cache
  3. without using dynamic sampling or adaptive query optimization
  4. running all queries in parallel, where
    parallel_max_servers = 1600 (on the SPARC T7-2) or
    parallel_max_servers = 240 (on the Sun Server X4-4)
    each query hinted with PARALLEL(4)
    parallel_degree_policy = limited
  5. having appropriate queries rewritten to the materialized view, MV3, defined as
    SELECT
    /*+ append vector_transform */
    d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name,
    d2.department_name, d2.category_name, d2.type_name, d3.all_customers_name,
    d3.region_name, d3.country_name, d3.state_province_name, d4.all_channels_name,
    d4.class_name, d4.channel_name, d5.all_ages_name, d5.age_name, d6.all_sizes_name,
    d6.household_size_name, d7.all_years_name, d7.years_customer_name, d8.all_incomes_name,
    d8.income_name, d9.all_status_name, d9.marital_status_name,
    SUM(f.sales) AS sales,
    SUM(f.units) AS units,
    SUM(f.measure_3) AS measure_3,
    SUM(f.measure_4) AS measure_4,
    SUM(f.measure_5) AS measure_5,
    SUM(f.measure_6) AS measure_6,
    SUM(f.measure_7) AS measure_7,
    SUM(f.measure_8) AS measure_8,
    SUM(f.measure_9) AS measure_9,
    SUM(f.measure_10) AS measure_10
    FROM time_dim d1, product_dim d2, customer_dim_500M_10 d3, channel_dim d4, age_dim d5,
    household_size_dim d6, years_customer_dim d7, income_dim d8, marital_status_dim d9,
    units_fact_500M_10 f
    WHERE d1.day_id = f.day_id AND
    d2.item_id = f.item_id AND
    d3.customer_id = f.customer_id AND
    d4.channel_id = f.channel_id AND
    d5.age_id = f.age_id AND
    d6.household_size_id = f.household_size_id AND
    d7.years_customer_id = f.years_customer_id AND
    d8.income_id = f.income_id AND
    d9.marital_status_id = f.marital_status_id
    GROUP BY d1.calendar_year_name, d1.calendar_quarter_name, d2.all_products_name,
    d2.department_name, d2.category_name, d2.type_name, d3.all_customers_name,
    d3.region_name, d3.country_name, d3.state_province_name, d4.all_channels_name,
    d4.class_name, d4.channel_name, d5.all_ages_name, d5.age_name, d6.all_sizes_name,
    d6.household_size_name, d7.all_years_name, d7.years
    

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of October 25, 2015.

Memory and Bisection Bandwidth: SPARC T7 and M7 Servers Faster Than x86 and POWER8

The STREAM benchmark measures delivered memory bandwidth on a variety of memory intensive tasks. Delivered memory bandwidth is key to a server delivering high performance on a wide variety of workloads. The STREAM benchmark is typically run where each chip in the system gets its memory requests satisfied from local memory. This report presents performance of Oracle's SPARC M7 processor based servers and compares their performance to x86 and IBM POWER8 servers.

Bisection bandwidth on a server is a measure of the cross-chip data bandwidth between the processors of a system where no memory access is local to the processor. Systems with large cross-chip penalties show dramatically lower bisection bandwidth. Real-world ad hoc workloads tend to perform better on systems with better bisection bandwidth because their memory usage characteristics tend to be chaotic.

IBM says the sustained or delivered bandwidth of the IBM POWER8 12-core chip is 230 GB/s. This number is a peak bandwidth calculation: 230.4 GB/sec = 9.6 GHz * 3 (r+w) * 8 byte. A similar calculation is used by IBM for the POWER8 dual-chip-module (two 6-core chips) to show a sustained or delivered bandwidth of 192 GB/sec (192.0 GB/sec = 8.0 GHz * 3 (r+w) * 8 byte). Peaks are the theoretical limits used for marketing hype, but true measured delivered bandwidth is the only useful comparison to help one understand delivered performance of real applications.
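
The peak figures quoted above are plain products of link rate, a read/write factor and transfer width. The sketch below only reproduces that arithmetic; the function and parameter names are illustrative and are not IBM terminology.

```python
def peak_bandwidth_gb_per_sec(link_rate_ghz, rw_factor, bytes_per_transfer):
    """Theoretical peak: link rate x read/write factor x bytes per transfer."""
    return link_rate_ghz * rw_factor * bytes_per_transfer


# POWER8 12-core chip: 9.6 GHz * 3 (r+w) * 8 bytes
power8_12core = peak_bandwidth_gb_per_sec(9.6, 3, 8)

# POWER8 dual-chip module (two 6-core chips): 8.0 GHz * 3 (r+w) * 8 bytes
power8_dcm = peak_bandwidth_gb_per_sec(8.0, 3, 8)

print(f"12-core chip peak: {power8_12core:.1f} GB/sec")
print(f"dual-chip module peak: {power8_dcm:.1f} GB/sec")
```

Nothing in this calculation involves running code on the machine, which is why it describes a limit and not a delivered rate.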

The STREAM benchmark is easy to run and anyone can measure memory bandwidth on a target system (see Key Points and Best Practices section).

  • The SPARC M7-8 server delivers over 1 TB/sec on the STREAM benchmark. This is over 2.4 times the triad bandwidth of an eight-chip x86 E7 v3 server.

  • The SPARC T7-4 delivered 2.2 times the STREAM triad bandwidth of a four-chip x86 E7 v3 server and 1.7 times the triad bandwidth of a four-chip IBM Power System S824 server.

  • The SPARC T7-2 delivered 2.5 times the STREAM triad bandwidth of a two-chip x86 E5 v3 server.

  • The SPARC M7-8 server delivered over 8.5 times the triad bisection bandwidth of an eight-chip x86 E7 v3 server.

  • The SPARC T7-4 server delivered over 2.7 times the triad bisection bandwidth of a four-chip x86 E7 v3 server and 2.3 times the triad bisection bandwidth of a four-chip IBM Power System S824 server.

  • The SPARC T7-2 server delivered over 2.7 times the triad bisection bandwidth of a two-chip x86 E5 v3 server.

Performance Landscape

The following SPARC, x86, and IBM S824 STREAM results were run as part of this benchmark effort. The IBM S822L result is from the referenced web location. The following SPARC results were all run using 32 GB dimms.

Maximum STREAM Benchmark Performance

                   Bandwidth (MB/sec - 10^6)
System      Chips       Copy      Scale        Add      Triad
----------  -----  ---------  ---------  ---------  ---------
SPARC M7-8      8    995,402    995,727  1,092,742  1,086,305
x86 E7 v3       8    346,771    354,679    445,550    442,184
SPARC T7-4      4    512,080    510,387    556,184    555,374
IBM S824        4    251,533    253,216    322,399    319,561
IBM S822L       4    252,743    247,314    295,556    305,955
x86 E7 v3       4    230,027    232,092    248,761    251,161
SPARC T7-2      2    259,198    259,380    285,835    285,905
x86 E5 v3       2    105,622    105,808    113,116    112,521
SPARC T7-1      1    131,323    131,308    144,956    144,706

All of the following bisection bandwidth results were run as part of this benchmark effort.

Bisection Bandwidth Benchmark Performance (Nonlocal STREAM)

                   Bandwidth (MB/sec - 10^6)
System      Chips     Copy    Scale      Add    Triad
----------  -----  -------  -------  -------  -------
SPARC M7-8      8  383,479  381,219  375,371  375,851
SPARC T5-8      8  172,195  172,354  250,620  250,858
x86 E7 v3       8   42,636   42,839   43,753   43,744
SPARC T7-4      4  142,549  142,548  142,645  142,729
SPARC T5-4      4   75,926   75,947   76,975   77,061
IBM S824        4   53,940   54,107   60,746   60,939
x86 E7 v3       4   41,636   47,740   51,206   51,333
SPARC T7-2      2  127,372  127,097  129,833  129,592
SPARC T5-2      2   91,530   91,597   91,761   91,984
x86 E5 v3       2   45,211   45,331   47,414   47,251

The following SPARC results were all run using 16 GB dimms.

SPARC T7 Servers – 16 GB DIMMS
Maximum STREAM Benchmark Performance

                   Bandwidth (MB/sec - 10^6)
System      Chips     Copy    Scale      Add    Triad
----------  -----  -------  -------  -------  -------
SPARC T7-4      4  520,779  521,113  602,137  600,330
SPARC T7-2      2  262,586  262,760  302,758  302,085
SPARC T7-1      1  132,154  132,132  168,677  168,654

Configuration Summary

SPARC Configurations:

SPARC M7-8
8 x SPARC M7 processors (4.13 GHz)
4 TB memory (128 x 32 GB dimms)

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB dimms)
1 TB memory (64 x 16 GB dimms)

SPARC T7-2
2 x SPARC M7 processors (4.13 GHz)
1 TB memory (32 x 32 GB dimms)
512 GB memory (32 x 16 GB dimms)

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB dimms)
256 GB memory (16 x 16 GB dimms)

Oracle Solaris 11.3
Oracle Solaris Studio 12.4

x86 Configurations:

Oracle Server X5-8
8 x Intel Xeon Processor E7-8895 v3
2 TB memory (128 x 16 GB dimms)

Oracle Server X5-4
4 x Intel Xeon Processor E7-8895 v3
1 TB memory (64 x 16 GB dimms)

Oracle Server X5-2
2 x Intel Xeon Processor E5-2699 v3
256 GB memory (16 x 16 GB dimms)

Oracle Linux 7.1
Intel Parallel Studio XE Composer Version 2016 compilers

Benchmark Description

STREAM

The STREAM benchmark measures sustainable memory bandwidth (in MB/s) for simple vector compute kernels. All memory accesses are sequential, so the results portray how fast regular data can be moved through the system. Properly run, the benchmark displays the characteristics of the memory system of the machine and not the advantages of running from the system's memory caches.

STREAM counts the bytes read plus the bytes written to memory. For the simple Copy kernel, this is exactly twice the number obtained from the bcopy convention. STREAM does this because three of the four kernels (Scale, Add and Triad) do arithmetic, so it makes sense to count both the data read into the CPU and the data written back from the CPU. The Copy kernel does no arithmetic, but, for consistency, counts bytes the same way as the other three.
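
The byte-counting convention above can be sketched in a few lines of Python. This is only an illustration of the accounting, not the benchmark code; it assumes 8-byte double-precision elements and the usual STREAM kernel definitions shown in the comments, and the function names are hypothetical.

```python
# Sketch of STREAM-style bandwidth accounting (not the benchmark itself).
BYTES_PER_ELEMENT = 8  # assumption: 8-byte double-precision array elements

# Number of arrays each kernel touches per element (reads plus writes).
ARRAYS_TOUCHED = {
    "Copy": 2,   # c[j] = a[j]            : 1 read + 1 write
    "Scale": 2,  # b[j] = s * c[j]        : 1 read + 1 write
    "Add": 3,    # c[j] = a[j] + b[j]     : 2 reads + 1 write
    "Triad": 3,  # a[j] = b[j] + s * c[j] : 2 reads + 1 write
}


def stream_bytes(kernel, n_elements):
    """Bytes counted for one pass of a kernel over n_elements."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements


def bcopy_bytes(n_elements):
    """The bcopy convention counts only the bytes copied."""
    return BYTES_PER_ELEMENT * n_elements


def bandwidth_mb_per_sec(kernel, n_elements, seconds):
    """Bandwidth in MB/sec, where 1 MB = 10^6 bytes."""
    return stream_bytes(kernel, n_elements) / seconds / 1e6


n = 10_000_000
print(stream_bytes("Copy", n) / bcopy_bytes(n))   # ratio of the two conventions
print(bandwidth_mb_per_sec("Triad", n, 0.01))     # hypothetical 10 ms pass
```

The first printed value shows why the Copy result is twice the bcopy figure: both the read of a[] and the write of c[] are counted.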

The sequential nature of the memory references is the benchmark's biggest weakness. The benchmark does not expose limitations in a system's interconnect to move data from anywhere in the system to anywhere.

Bisection Bandwidth – Easy Modification of STREAM Benchmark

To test for bisection bandwidth, processes are bound to processors in sequential order. The memory is allocated in reverse order, so that the memory is placed non-local to the process. The benchmark is then run. If the system is capable of page migration, this feature must be turned off.
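
The effect of reverse allocation can be illustrated with a small model. The source does not say how pages end up on a particular processor; this sketch assumes first-touch page placement and a static, contiguous split of loop iterations across threads. Both are assumptions, and all function names are hypothetical.

```python
# Toy model of data placement, assuming first-touch allocation and
# static contiguous partitioning of loop iterations (both assumptions).

def home_chip(thread, n_threads, n_chips):
    """Chip a thread runs on when threads are bound to processors in order."""
    return thread // (n_threads // n_chips)


def placement(n_threads, n_chips, reverse_init):
    """For each compute thread: (chip it runs on, chip holding its data)."""
    pairs = []
    for t in range(n_threads):
        # Thread that first touched chunk t during array initialization.
        # With the reversed loop, thread 0 initializes the last chunk, etc.
        toucher = n_threads - 1 - t if reverse_init else t
        pairs.append((home_chip(t, n_threads, n_chips),
                      home_chip(toucher, n_threads, n_chips)))
    return pairs


def remote_fraction(n_threads, n_chips, reverse_init):
    """Fraction of threads whose data lives on a different chip."""
    pairs = placement(n_threads, n_chips, reverse_init)
    return sum(run != mem for run, mem in pairs) / len(pairs)


print(remote_fraction(512, 2, reverse_init=False))  # forward initialization
print(remote_fraction(512, 2, reverse_init=True))   # reversed initialization
```

Under these assumptions, with an even number of chips every thread's data sits on the opposite half of the machine, so all traffic crosses the bisection.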

Key Points and Best Practices

The stream benchmark code was compiled for the SPARC M7 processor based systems with the following flags (using cc):

    -fast -m64 -W2,-Avector:aggressive -xautopar -xreduction -xpagesize=4m

The benchmark code was compiled for the x86 based systems with the following flags (Intel icc compiler):

    -O3 -m64 -xCORE-AVX2 -ipo -openmp -mcmodel=medium -fno-alias -nolib-inline

On Oracle Solaris, binding is accomplished by setting either the environment variable SUNW_MP_PROCBIND or the OpenMP variables OMP_PROC_BIND and OMP_PLACES.

    export OMP_NUM_THREADS=512
    export SUNW_MP_PROCBIND=0-511

On Oracle Linux systems using the Intel compiler, binding is accomplished by setting the environment variable KMP_AFFINITY.

    export OMP_NUM_THREADS=72
    export KMP_AFFINITY='verbose,granularity=fine,proclist=[0-71],explicit'

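The source code change in the file stream.c to do the reverse allocation is shown below as a diff. Lines marked < are the modified loop, which initializes the arrays in reverse order; lines marked > are the original forward loop: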

    <     for (j=STREAM_ARRAY_SIZE-1; j>=0; j--) { 
                a[j] = 1.0; 
                b[j] = 2.0; 
                c[j] = 0.0; 
            }
    ---
    >     for (j=0; j<STREAM_ARRAY_SIZE; j++) {
                a[j] = 1.0; 
                b[j] = 2.0; 
                c[j] = 0.0; 
            }
    

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Hadoop TeraSort: SPARC T7-4 Top Per-Chip Performance

Oracle's SPARC T7-4 server using virtualization delivered an outstanding single server result running the Hadoop TeraSort benchmark. The SPARC T7-4 server was run with and without security. Even the secure runs on the SPARC M7 processor based server performed much faster per chip compared to competitive unsecure results.

  • The SPARC T7-4 server on a per chip basis is 4.7x faster than an IBM POWER8 based cluster on the 10 TB Hadoop TeraSort benchmark.

  • The SPARC T7-4 server running with ZFS encryption enabled on the 10 TB Hadoop TeraSort benchmark is 3.6x faster than an unsecure x86 v2 cluster on a per chip basis.

  • The SPARC T7-4 server running with ZFS encryption (AES-256-GCM) enabled on the 10 TB Hadoop TeraSort benchmark is 4.3x faster than an unsecure (plain-text) IBM POWER8 cluster on a per chip basis.

  • The SPARC T7-4 server ran the 10 TB Hadoop TeraSort benchmark in 4,259 seconds.

Performance Landscape

The following table presents results for the 10 TB Hadoop TeraSort benchmark. The rate results are determined by taking the dataset size (10^13 bytes, or 10,000 GB) and dividing by the time (in minutes). These rates are further normalized by the number of systems (nodes) or chips used in obtaining the results.

10 TB Hadoop TeraSort Performance Landscape
System | Security | Nodes | Total Chips | Time (sec) | Sort Rate Per Node (GB/min) | Sort Rate Per Chip (GB/min)
SPARC T7-4, SPARC M7 (4.13 GHz) | unsecure | 1 | 4 | 4,259 | 140.9 | 35.2
SPARC T7-4, SPARC M7 (4.13 GHz) | AES-256-GCM | 1 | 4 | 4,657 | 128.8 | 32.2
IBM Power System S822L, POWER8 (3.0 GHz) | unsecure | 8 | 32 | 2,490 | 30.1 | 7.5
Dell R720xd/VMware, Intel Xeon E5-2680 v2 (2.8 GHz) | unsecure | 32 | 64 | 1,054 | 17.8 | 8.9
Cisco UCS CPA C240 M3, Intel Xeon E5-2665 (2.4 GHz) | unsecure | 16 | 32 | 3,112 | 12.0 | 6.0
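
The rate calculation described before the table can be sketched as follows. The times, node counts, and chip counts are taken from the table; the function name is hypothetical. The computed values agree with the published ones to within rounding of the last digit.

```python
DATASET_BYTES = 10**13  # 10 TB sort, measured in power-of-ten bytes
GB = 10**9


def sort_rates(seconds, nodes, chips):
    """Return (GB/min per node, GB/min per chip) for a 10 TB sort."""
    total_gb_per_min = DATASET_BYTES / GB / (seconds / 60.0)
    return total_gb_per_min / nodes, total_gb_per_min / chips


results = [
    # (system, time in seconds, nodes, total chips)
    ("SPARC T7-4, unsecure", 4259, 1, 4),
    ("SPARC T7-4, AES-256-GCM", 4657, 1, 4),
    ("IBM Power System S822L", 2490, 8, 32),
    ("Dell R720xd/VMware", 1054, 32, 64),
    ("Cisco UCS CPA C240 M3", 3112, 16, 32),
]

for name, seconds, nodes, chips in results:
    per_node, per_chip = sort_rates(seconds, nodes, chips)
    print(f"{name}: {per_node:.1f} GB/min per node, "
          f"{per_chip:.1f} GB/min per chip")
```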

Configuration Summary

Server:

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB)
6 x 600 GB 10K RPM SAS-2 HDD
10 GbE
Oracle Solaris 11.3 (11.3.0.29)
Oracle Solaris Studio 12.4
Java SE Runtime Environment (build 1.7.0_85-b33)
Hadoop 1.2.1

External Storage (Common Multiprotocol SCSI TARget, or COMSTAR enables system to be seen as a SCSI target device):

16 x Sun Server X3-2L
2 x Intel Xeon E5-2609 (2.4 GHz)
16 GB memory (2 x 8 GB)
2 x 600 GB SAS-2 HDD
12 x 3 TB SAS-1 HDD
4 x Sun Flash Accelerator F40 PCIe Card
Oracle Solaris 11.1 (11.1.16.5.0)
Please note: These devices are only used as storage. No Hadoop is run on these COMSTAR storage nodes. There was no compression or encryption done on these COMSTAR storage nodes.

Benchmark Description

The Hadoop TeraSort benchmark sorts 100-byte records by a contained 10-byte random key. Hadoop TeraSort is characterized by high I/O bandwidth between each compute/data node of a Hadoop cluster and the disk drives that are attached to that node.

Note: benchmark size is measured by power-of-ten not power-of-two bytes; 1 TB sort is sorting 10^12 Bytes = 10 billion 100-byte rows using an embedded 10-Byte key field of random characters, 100 GB sort is sorting 10^11 Bytes = 1 billion 100-byte rows, etc.

Key Points and Best Practices

  • The SPARC T7-4 server was configured with 15 Oracle Solaris Zones. Each Zone was running one Hadoop data-node with HDFS layered on an Oracle Solaris ZFS volume.

  • Hadoop uses a distributed, shared-nothing, batch processing framework that employs divide-and-conquer serial Map and Reduce JVM tasks, with performance coming from scale-out concurrency (e.g., more tasks) rather than parallelism within a task. Only one job scheduler and task manager can be configured per data/compute node, and both have inherent scaling limitations (the Hadoop design target being small compute nodes, and hundreds or even thousands of them).

  • Multiple data-nodes significantly help improve overall system utilization: HDFS becomes more distributed with more processes servicing file system operations, and more task-trackers are managing all the MapReduce work.

  • On large-node systems, virtualization is required to improve utilization by increasing the number of independent data/compute nodes, each running its own Hadoop processes.

  • I/O bandwidth to the local disk drives and network communication bandwidth are the primary determinants of Hadoop performance. Typically, Hadoop reads input data files from HDFS during the Map phase of computation, and stores intermediate files back to HDFS. Then during the subsequent Reduce phase of computation, Hadoop reads the intermediate files, and outputs the final result. The Map and Reduce phases are executed concurrently by multiple Map tasks and Reduce tasks. Tasks are purpose-built stand-alone serial applications often written in Java (but can be written in any programming language or script).
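
The Map and Reduce structure described above can be sketched in plain Python. This is a single-process toy, not Hadoop code. The record and key sizes come from the benchmark description; routing records to reducers by key range (so that concatenated reducer outputs are globally sorted) is an assumption of this sketch, and all names are hypothetical.

```python
import random

RECORD_BYTES = 100  # from the benchmark description
KEY_BYTES = 10      # from the benchmark description


def make_records(count, seed=0):
    """Generate random 100-byte records; the first 10 bytes act as the key."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(RECORD_BYTES))
            for _ in range(count)]


def map_phase(records, n_reducers):
    """Route each record to a reducer by the first key byte (range partition)."""
    buckets = [[] for _ in range(n_reducers)]
    for rec in records:
        buckets[rec[0] * n_reducers // 256].append(rec)
    return buckets


def reduce_phase(bucket):
    """Each reducer sorts its own bucket as an independent serial task."""
    return sorted(bucket, key=lambda rec: rec[:KEY_BYTES])


def terasort_sketch(records, n_reducers=4):
    """Concatenate reducer outputs in partition order to get a global sort."""
    output = []
    for bucket in map_phase(records, n_reducers):
        # In Hadoop these tasks would run concurrently on many nodes.
        output.extend(reduce_phase(bucket))
    return output


sorted_records = terasort_sketch(make_records(1000))
print(len(sorted_records), "records sorted")
```

Because each reducer works only on its own bucket, adding more data/compute nodes (here, more buckets) adds concurrency without any task needing to be parallel internally.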

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

Competitive results found at: Dell R720xd/VMware, IBM S822L, Cisco C240 M3

Graph PageRank: SPARC M7-8 Beats x86 E5 v3 Per Chip

Graph algorithms are used in many big data and analytics workloads. This report presents performance using the PageRank algorithm. Oracle's SPARC M7 processor based systems provide better performance than an x86 E5 v3 based system.

  • Oracle's SPARC M7-8 server was able to deliver 3.2 times faster per chip performance than a two-chip x86 E5 v3 server running a PageRank algorithm implemented using Parallel Graph AnalytiX (PGX) from Oracle Labs on a medium sized graph.

Performance Landscape

The graph used for these results has 41,652,230 nodes and 1,468,365,182 edges using 22 GB of memory. All of the following results were run as part of this benchmark effort. Performance is a measure of processing rate, bigger is better.

PageRank Algorithm
Server | Performance | SPARC Advantage
SPARC M7-8, 8 x SPARC M7 (4.13 GHz, 8x 32core) | 281.1 | 3.2x faster per chip
x86 E5 v3 server, 2 x Intel E5-2699 v3 (2.3 GHz, 2x 18core) | 22.2 | 1.0

The core counts shown are per processor.

Configuration Summary

Systems Under Test:

SPARC M7-8 server with
8 x SPARC M7 processors (4.13 GHz)
4 TB memory
Oracle Solaris 11.3
Oracle Solaris Studio 12.4

Oracle Server X5-2 with
2 x Intel Xeon Processor E5-2699 v3 (2.3 GHz)
384 GB memory
Oracle Linux
gcc 4.7.4

Benchmark Description

Graphs are a core part of many analytics workloads. They are very data intensive and stress computers. Each algorithm typically traverses the entire graph multiple times while performing arithmetic during the traversal, which can include single- or double-precision floating point operations.

The mathematics of PageRank are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. The PageRank algorithm counts the number and quality of links to a page to determine a rough estimate of the importance of the website.
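
A minimal sketch of the PageRank iteration follows, to show why the workload repeatedly traverses every edge while doing floating point arithmetic. It is plain Python and is not how PGX implements the algorithm; the damping factor of 0.85, the fixed iteration count, and the tiny example graph are assumptions.

```python
def pagerank(edges, n_nodes, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    out_degree = [0] * n_nodes
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / n_nodes] * n_nodes
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread over all nodes.
        dangling = sum(rank[v] for v in range(n_nodes) if out_degree[v] == 0)
        base = (1.0 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = [base] * n_nodes
        for src, dst in edges:  # one full pass over every edge per iteration
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Hypothetical 4-node graph: a 3-cycle plus one extra node linking into it.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(edges, 4)
for node, value in enumerate(ranks):
    print(node, round(value, 4))
```

Node 0 receives links from two nodes and ends up with the highest rank, while node 3, which nothing links to, keeps only the baseline share.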

Key Points and Best Practices

  • This algorithm is implemented using PGX (Parallel Graph AnalytiX) from Oracle Labs, a fast, parallel, in-memory graph analytic framework.
  • The graph used for these results is based on real world data from Twitter and has 41,652,230 nodes and 1,468,365,182 edges using 22 GB of memory.

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of October 25, 2015.

Graph Breadth First Search Algorithm: SPARC T7-4 Beats 4-Chip x86 E7 v2

Graph algorithms are used in many big data and analytics workloads. Oracle's SPARC M7 processor based systems provide better performance than x86 systems with the Intel Xeon E7 v2 family of processors.

  • The SPARC T7-4 server was able to deliver 3.1x better performance than a four chip x86 server running a breadth-first search (BFS) on a large graph.

Performance Landscape

The problem is identified by "Scale" and the approximate amount of memory used. Results are listed as edge traversals in billions (ETB) per second (bigger is better). The SPARC M7 processor results were run as part of this benchmark effort. The x86 results were taken from a previous benchmark effort.

Breadth-First Search (BFS)
Scale | Dataset (GB) | SPARC T7-4 (ETB/sec) | x86 4 x E7 v2 (ETB/sec) | Speedup T7-4/x86
30 | 580 | 1.68 | 0.54 | 3.1x
29 | 282 | 1.76 | 0.62 | 2.8x
28 | 140 | 1.70 | 0.99 | 1.7x
27 | 70 | 1.56 | 1.07 | 1.5x
26 | 35 | 1.67 | 1.19 | 1.4x

Configuration Summary

Systems Under Test:

SPARC T7-4 server with
4 x SPARC M7 processors (4.13 GHz)
1 TB memory
Oracle Solaris 11.3
Oracle Solaris Studio 12.4

Sun Server X4-4 system with
4 x Intel Xeon E7-8895 v2 processors (2.8 GHz)
1 TB memory
Oracle Solaris 11.2
Oracle Solaris Studio 12.4

Benchmark Description

Graphs are a core part of many analytics workloads. They are very data intensive and stress computers. This benchmark does a breadth-first search on a randomly generated graph. It reports the number of graph edges traversed (in billions) per second (ETB/sec). To generate the graph, the data generator from the graph500 benchmark was used.

The following description of breadth-first search is taken from Introduction to Algorithms, page 594:

Given a graph G = (V, E) and a distinguished source vertex s, breadth-first search systematically explores the edges of G to "discover" every vertex that is reachable from s. It computes the distance (smallest number of edges) from s to each reachable vertex. It also produces a "breadth-first tree" with root s that contains all reachable vertices. For any vertex v reachable from s, the simple path in the breadth-first tree from s to v corresponds to a "shortest path" from s to v in G, that is, a path containing the smallest number of edges. The algorithm works on both directed and undirected graphs.

Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L., Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.
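
The quoted description can be sketched as a short queue-based search. This is a textbook-style illustration in Python, not the benchmark code; the example graph is hypothetical, and the edge counter is a simple tally of edges examined rather than the benchmark's exact ETB/sec accounting.

```python
from collections import deque


def bfs(adjacency, source):
    """Breadth-first search from source.

    Returns (distance, parent, edges_examined), where distance maps each
    reachable vertex to its smallest number of edges from source and
    parent encodes the breadth-first tree.
    """
    distance = {source: 0}
    parent = {source: None}
    edges_examined = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            edges_examined += 1
            if v not in distance:      # first discovery gives shortest distance
                distance[v] = distance[u] + 1
                parent[v] = u
                queue.append(v)
    return distance, parent, edges_examined


# Hypothetical directed graph; vertex "x" is not reachable from "s".
graph = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"], "d": [], "x": ["s"]}
distance, parent, examined = bfs(graph, "s")
print(distance)
print(examined, "edges examined")
```

Following parent links from any reached vertex back to s yields a shortest path, which is the breadth-first tree property the quotation describes.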

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of October 25, 2015.

Neural Network Models Using Oracle R Enterprise: SPARC T7-4 Beats 4-Chip x86 E7 v3

Oracle's SPARC T7-4 server executing neural network algorithms using Oracle R Enterprise (ORE) is up to two times faster than a four-chip x86 E7 v3 server.

  • For a neural network with two hidden layers, 10-neuron with 5-neuron hyperbolic tangent, the SPARC T7-4 server is 1.5 times faster than a four-chip x86 E7 v3 server on calculation time.

  • For a neural network with two hidden layers, 20-neuron with 10-neuron hyperbolic tangent, the SPARC T7-4 server is 2.0 times faster than a four-chip x86 E7 v3 server on calculation time.

Performance Landscape

Oracle Enterprise R Statistics in Oracle Database
(250 million rows)
Neural Network with Two Hidden Layers | Elapsed Calculation Time, 4-chip x86 E7 v3 | Elapsed Calculation Time, SPARC T7-4 | SPARC Advantage
10-neuron + 5-neuron hyperbolic tangent | 520.1 (sec) | 337.3 (sec) | 1.5x
20-neuron + 10-neuron hyperbolic tangent | 1128.4 (sec) | 578.1 (sec) | 2.0x

Configuration Summary

SPARC Configuration:

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB dimms)
Oracle Solaris 11.3
Oracle Database 12c Enterprise Edition
Oracle R Enterprise 1.5
Oracle Solaris Studio 12.4 with 4/15 patch set

x86 Configuration:

Oracle Server X5-4
4 x Intel Xeon Processor E7-8895 v3 (2.6 GHz)
512 GB memory
Oracle Linux 6.4
Oracle Database 12c Enterprise Edition
Oracle R Enterprise 1.5

Storage Configuration:

Oracle Server X5-2L
2 x Intel Xeon Processor E5-2699 v3
512 GB memory
4 x 1.6 TB 2.5-inch NVMe PCIe 3.0 SSD
2 x Sun Storage Dual 16Gb FC PCIe HBA
Oracle Solaris 11.3

Benchmark Description

The benchmark is designed to run various statistical analyses using Oracle R Enterprise (ORE) with historical aviation data. The size of the benchmark data is about 35 GB, a single table holding 250 million rows. One of the most popular algorithms, neural network, has been used against the dataset to generate comparable results.

The neural network algorithms support various features. In this workload, the following two neural network features have been used: neural net with two hidden layers 10-neuron with 5-neuron hyperbolic tangent and neural net with two hidden layers 20-neuron with 10-neuron hyperbolic tangent.
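
To make the two network shapes concrete, here is a minimal forward-pass sketch in plain Python. It is not Oracle R Enterprise code. The number of input features (4), the single linear output, and the random weights are assumptions chosen only for illustration; only the hidden layer sizes and the hyperbolic tangent activation come from the description above.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def build_network(n_inputs, hidden_sizes, rng):
    """Layers for: inputs -> hidden 1 -> hidden 2 -> one output."""
    sizes = [n_inputs, *hidden_sizes, 1]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def dense(inputs, layer, activation):
    """Apply one layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Two tanh hidden layers followed by a linear output."""
    hidden1, hidden2, output = layers
    h1 = dense(inputs, hidden1, math.tanh)
    h2 = dense(h1, hidden2, math.tanh)
    return dense(h2, output, lambda z: z)


def count_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
small = build_network(4, (10, 5), rng)   # 10-neuron + 5-neuron
large = build_network(4, (20, 10), rng)  # 20-neuron + 10-neuron
print(forward([0.1, -0.2, 0.3, 0.4], small))
print(count_parameters(small), count_parameters(large))
```

With these assumed input and output sizes, the larger network has roughly three times as many weights to fit as the smaller one.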

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

ZFS Encryption: SPARC T7-1 Performance

Oracle's SPARC T7-1 server can encrypt/decrypt at near clear text throughput. The SPARC T7-1 server can encrypt/decrypt on the fly and have CPU cycles left over for the application.

  • The SPARC T7-1 server performed 475,123 clear text 8K read IOPS. With AES-256-CCM enabled on the file system, 8K read IOPS drop only about 3% to 461,038.

  • The SPARC T7-1 server performed 461,038 AES-256-CCM 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 224,360 AES-256-CCM 8K read IOPS. The SPARC M7 processor result is 4.1 times faster per chip.

  • The SPARC T7-1 server performed 460,600 AES-192-CCM 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 228,654 AES-192-CCM 8K read IOPS. The SPARC M7 processor result is 4.0 times faster per chip.

  • The SPARC T7-1 server performed 465,114 AES-128-CCM 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 231,911 AES-128-CCM 8K read IOPS. The SPARC M7 processor result is 4.0 times faster per chip.

  • The SPARC T7-1 server performed 475,123 clear text 8K read IOPS and a two-chip x86 E5-2660 v3 server performed 438,483 clear text 8K read IOPS. The SPARC M7 processor result is 2.2 times faster per chip.
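
The per-chip comparisons above divide each system's IOPS by its chip count before taking the ratio. A small sketch of that normalization, together with the relative cost of turning encryption on, is shown below; the function names are hypothetical and the inputs are the IOPS figures quoted in the bullets.

```python
def per_chip_speedup(iops_a, chips_a, iops_b, chips_b):
    """Ratio of per-chip IOPS of system A to per-chip IOPS of system B."""
    return (iops_a / chips_a) / (iops_b / chips_b)


def encryption_drop_pct(clear_iops, encrypted_iops):
    """Percentage of clear text IOPS lost when encryption is enabled."""
    return 100.0 * (clear_iops - encrypted_iops) / clear_iops


# One-chip SPARC T7-1 versus two-chip x86 E5-2660 v3, AES-256-CCM
print(round(per_chip_speedup(461_038, 1, 224_360, 2), 1))

# Clear text comparison
print(round(per_chip_speedup(475_123, 1, 438_483, 2), 1))

# Cost of AES-256-CCM relative to clear text on each system
print(round(encryption_drop_pct(475_123, 461_038), 1))  # SPARC T7-1
print(round(encryption_drop_pct(438_483, 224_360), 1))  # x86
```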

Performance Landscape

Results presented below are for random read performance for 8K size. All of the following results were run as part of this benchmark effort.

Read Performance – 8K
Encryption SPARC T7-1 2 x E5-2660 v3
IOPS Resp Time % Busy IOPS Resp Time % Busy
Clear 475,123 0.8 msec 43% 438,483 0.8 msec 95%
AES-256-CCM 461,038 0.83 msec 56% 224,360 1.6 msec 97%
AES-192-CCM 460,600 0.83 msec 56% 228,654 1.5 msec 97%
AES-128-CCM 465,114 0.82 msec 57% 231,911 1.5 msec 96%

IOPS – IO operations per second
Resp Time – response time
% Busy – percent cpu usage

Configuration Summary

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle Solaris 11.3
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Oracle Server X5-2L system
2 x Intel Xeon Processor E5-2660 V3 (2.60 GHz)
256 GB memory
Oracle Solaris 11.3
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Storage SAN
2 x Brocade 300 FC switches
2 x Sun Storage 6780 array with 64 disk drives / 16 GB Cache

Benchmark Description

The benchmark tests the performance of running an encrypted ZFS file system compared to the non-encrypted (clear text) ZFS file system. The tests were executed with Oracle's Vdbench tool Version 5.04.03. Three different encryption methods are tested, AES-256-CCM, AES-192-CCM and AES-128-CCM.

Key Points and Best Practices

  • The ZFS file system was configured with data cache disabled, meta cache enabled, 4 pools, 128 luns, and 192 file systems with 8K record size. Data cache was disabled to ensure data would be decrypted as it was read from storage. This is not a recommended setting for normal customer operations.

  • The tests were executed with Oracle's Vdbench tool against 192 file systems. Each file system was run with a queue depth of 2. The script used for testing is listed below.

    hd=default,jvms=16
    sd=sd001,lun=/dev/zvol/rdsk/p1/vol001,size=5g,hitarea=100m
    sd=sd002,lun=/dev/zvol/rdsk/p1/vol002,size=5g,hitarea=100m
    #
    # sd003 through sd191 statements here
    #
    sd=sd192,lun=/dev/zvol/rdsk/p4/vol192,size=5g,hitarea=100m
    
    # VDBENCH work load definitions for run
    # Sequential write to fill storage.
    wd=swrite1,sd=sd*,readpct=0,seekpct=eof
    
    # Random Read work load.
    wd=rread,sd=sd*,readpct=100,seekpct=random,rhpct=100
    
    # VDBENCH Run Definitions for actual execution of load.
    rd=default,iorate=max,elapsed=3h,interval=10
    rd=seqwritewarmup,wd=swrite1,forxfersize=(1024k),forthreads=(16) 
    
    rd=default,iorate=max,elapsed=10m,interval=10
    
    rd=rread8k-50,wd=rread,forxfersize=(8k),iorate=curve, \
    curve=(95,90,80,70,60,50),forthreads=(2)
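
As a rough cross-check of the queue depth and the response times in the read performance table, Little's law relates outstanding I/Os, throughput, and response time. The sketch below assumes every one of the 192 file systems keeps both of its queue slots busy at the peak rate; that is an assumption, and the function name is hypothetical.

```python
def mean_response_msec(outstanding_ios, iops):
    """Little's law: outstanding I/Os = throughput x response time."""
    return 1000.0 * outstanding_ios / iops


outstanding = 192 * 2  # 192 file systems, queue depth of 2 each

for label, iops in [("Clear", 475_123), ("AES-256-CCM", 461_038)]:
    print(label, round(mean_response_msec(outstanding, iops), 2), "msec")
```

For the SPARC T7-1 rows this estimate lands close to the measured response times, which suggests the queues were essentially full at the reported IOPS.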
    

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Virtualized Storage: SPARC T7-1 Performance

Oracle's SPARC T7-1 server using SR-IOV enabled HBAs can achieve near native throughput. The SPARC T7-1 server, with its dramatically improved compute engine, can also achieve near native throughput with Virtual Disk (VDISK).

  • The SPARC T7-1 server is able to produce 604,219 8K read IO/Second (IOPS) with native Oracle Solaris 11.3 using 8 Gb FC HBAs. The SPARC T7-1 server using Oracle VM Server for SPARC 3.1 with 4 LDOM VDISK produced near native performance of 603,766 8K read IOPS. With SR-IOV enabled using 2 LDOMs, the SPARC T7-1 server produced 604,966 8K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 delivered 2.8 times the virtualized IO throughput of a Sun Server X3-2 system (two Intel Xeon E5-2690 processors, running a popular virtualization product). The virtualized x86 system produced 209,166 8K read IOPS. The native performance of the x86 system was 338,458 8K read IOPS.

  • The SPARC T7-1 server is able to produce 891,025 4K Read IOPS with native Oracle Solaris 11.3 using 8 Gb FC HBAs. The SPARC T7-1 server using Oracle VM Server for SPARC 3.1 with 4 LDOM VDISK produced near native performance of 849,493 4K read IOPS. With SR-IOV enabled using 2 LDOMs, the SPARC T7-1 server produced 891,338 4K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 delivered 3.8 times the virtualized IO throughput of a Sun Server X3-2 system (two Intel Xeon E5-2690 processors, running a popular virtualization product). The virtualized x86 system produced 219,830 4K read IOPS. The native performance of the x86 system was 346,868 4K read IOPS.

  • The SPARC T7-1 server running Oracle VM Server for SPARC 3.1 ran 1.3 times faster with 16 Gb HBAs than with 8 Gb HBAs, even though it was still attached to 8 Gb switches and storage.

Performance Landscape

Results presented below are for read performance for 8K size and then for 4K size. All of the following results were run as part of this benchmark effort.

Read Performance — 8K

System 8K Read IOPS Performance
Native Virtual Disk SR-IOV
SPARC T7-1 (16 Gb FC) 796,849 N/A 797,221
SPARC T7-1 (8 Gb FC) 604,219 603,766 604,966
Sun Server X3-2 (8 Gb FC) 338,458 209,166 N/A

Read Performance — 4K

System 4K Read IOPS Performance
Native Virtual Disk SR-IOV
SPARC T7-1 (16 Gb FC) 1,185,392 N/A 1,231,808
SPARC T7-1 (8 Gb FC) 891,025 849,493 891,338
Sun Server X3-2 (8 Gb FC) 346,868 219,830 N/A

Configuration Summary

SPARC T7-1 server
1 x SPARC M7 processor (4.13 GHz)
256 GB memory (16 x 16 GB)
Oracle Solaris 11.3
Oracle VM Server for SPARC 3.1
4 x Sun Storage 16 Gb Fibre Channel PCIe Universal FC HBA, Qlogic
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Sun Server X3-2 system
2 x Intel Xeon Processor E5-2690 (2.90 GHz)
128 GB memory
Oracle Solaris 11.2
Popular Virtualization Software
4 x StorageTek 8 Gb Fibre Channel PCIe HBA

Storage SAN
Brocade 5300 Switch
2 x Sun Storage 6780 array with 64 disk drives / 16 GB Cache
2 x Sun Storage 2540-M2 arrays with 36 disk drives / 1.5 GB Cache

Benchmark Description

The benchmark tests operating system IO efficiency of native and virtual machine environments. The test accesses storage devices raw and with no operating system buffering. The storage space accessed fit within the controller cache on the storage arrays for low latency and highest throughput. All accesses were random 4K or 8K reads.

Tests were executed with Oracle's Vdbench Version 5.04.03 tool against 32 LUNs. Each LUN was run with a queue depth of 32.

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/25/2015.

Oracle Internet Directory: SPARC T7-2 World Record

Oracle's SPARC T7-2 server running Oracle Internet Directory (OID, Oracle's LDAP Directory Server) on Oracle Solaris 11 on a virtualized processor configuration achieved a record result on the Oracle Internet Directory benchmark.

  • The SPARC T7-2 server, virtualized to use a single processor, achieved world record performance running Oracle Internet Directory benchmark with 50M users.

  • The SPARC T7-2 server and Oracle Internet Directory using Oracle Database 12c running on Oracle Solaris 11 achieved a record result of 1.18M LDAP searches/sec with an average latency of 0.85 msec with 1000 clients.

  • The SPARC T7-2 server demonstrated 25% better throughput and 23% better latency for LDAP searches than a similarly configured SPARC T5-2 server benchmark environment.

  • Oracle Internet Directory achieved near linear scalability on the virtualized single processor domain on the SPARC T7-2 server, scaling from 79K LDAP searches/sec with 2 cores to 1.18M LDAP searches/sec with 32 cores.

  • Oracle Internet Directory and the virtualized single processor domain on the SPARC T7-2 server achieved up to 22,408 LDAP modify/sec with an average latency of 2.23 msec for 50 clients.

Performance Landscape

A virtualized single SPARC M7 processor in a SPARC T7-2 server was used for the test results presented below. The SPARC T7-2 server and SPARC T5-2 server results were run as part of this benchmark effort. The remaining results were part of a previous benchmark effort.

Oracle Internet Directory Tests
System | chips/cores | Search ops/sec | Search lat (msec) | Modify ops/sec | Modify lat (msec) | Add ops/sec | Add lat (msec)
SPARC T7-2 | 1/32 | 1,177,947 | 0.85 | 22,400 | 2.2 | 1,436 | 11.1
SPARC T5-2 | 2/32 | 944,624 | 1.05 | 16,700 | 2.9 | 1,000 | 15.95
SPARC T4-4 | 4/32 | 682,000 | 1.46 | 12,000 | 4.0 | 835 | 19.0

Scaling runs were also made on the virtualized single processor domain on the SPARC T7-2 server.

Scaling of Search Tests – SPARC T7-2, One Processor
Cores Clients ops/sec Latency (msec)
32 1000 1,177,947 0.85
24 1000 863,343 1.15
16 500 615,563 0.81
8 500 280,029 1.78
4 100 156,114 0.64
2 100 79,300 1.26
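
One way to quantify "near linear" scaling is to compare per-core throughput at each core count with the 2-core baseline. The sketch below uses the ops/sec figures from the scaling table; the function name is hypothetical, and because the client counts differ between rows, the result is only a rough indicator.

```python
SEARCH_SCALING = [
    # (cores, search ops/sec) from the scaling table
    (2, 79_300),
    (4, 156_114),
    (8, 280_029),
    (16, 615_563),
    (24, 863_343),
    (32, 1_177_947),
]


def scaling_efficiency(results):
    """Per-core throughput at each size relative to the smallest run."""
    base_cores, base_ops = results[0]
    per_core_base = base_ops / base_cores
    return {cores: (ops / cores) / per_core_base for cores, ops in results}


efficiency = scaling_efficiency(SEARCH_SCALING)
for cores, eff in efficiency.items():
    print(f"{cores:2d} cores: {eff:.0%} of linear")
```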

Configuration Summary

System Under Test:

SPARC T7-2
2 x SPARC M7 processors, 4.13 GHz
512 GB memory
6 x 600 GB internal disks
1 x Sun Storage ZS3-2 (used for database and log files)
Flash storage (used for redo logs)
Oracle Solaris 11.3
Oracle Internet Directory 11g Release 1 PS7 (11.1.1.7.0)
Oracle Database 12c Enterprise Edition 12.1.0.2 (64-bit)

Benchmark Description

Oracle Internet Directory (OID) is Oracle's LDAPv3 Directory Server. The throughput for five key operations is measured: Search, Compare, Modify, Mix and Add.

LDAP Search Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Search operations. The salient characteristics of this test scenario are as follows:

  • SLAMD SearchRate job was used.
  • BaseDN of the search is root of the DIT, the scope is SUBTREE, the search filter is of the form UID=, and DN and UID are the required attributes.
  • Each LDAP search operation matches a single entry.
  • The total number of concurrent clients was 1000, distributed between two client nodes.
  • Each client binds to OID once and performs repeated LDAP Search operations. Each search operation looks up a unique entry, such that no client looks up the same entry twice, no two clients look up the same entry, and all entries are searched randomly.
  • In one run of the test, random entries from the 50 million entries are looked up in as many LDAP Search operations.
  • Test job was run for 60 minutes.

LDAP Compare Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Compare operations on userpassword attribute. The salient characteristics of this test scenario are as follows:

  • SLAMD CompareRate job was used.
  • Each LDAP compare operation matches the user password of a user.
  • The total number of concurrent clients was 1000, distributed between two client nodes.
  • Each client binds to OID once and performs repeated LDAP compare operations.
  • In one run of the test, random entries from the 50 million entries are compared in as many LDAP compare operations.
  • Test job was run for 60 minutes.

LDAP Modify Operations Test

This test scenario consisted of concurrent clients binding once to OID and then performing repeated LDAP Modify operations. The salient characteristics of this test scenario are as follows:

  • SLAMD LDAP modrate job was used.
  • A total of 50 concurrent LDAP clients were used.
  • Each client updates a unique entry each time and a total of 50 million entries are updated.
  • Test job was run for 60 minutes.
  • Value length was set to 11.
  • Attribute that is being modified is not indexed.

LDAP Mixed Load Test

The test scenario involved both the LDAP search and LDAP modify clients enumerated above.

  • The ratio involved 60% LDAP search clients, 30% LDAP bind and 10% LDAP modify clients.
  • A total of 1000 concurrent LDAP clients were used and were distributed on 2 client nodes.
  • Test job was run for 60 minutes.

LDAP Add Load Test

The test scenario involved concurrent clients adding new entries as follows.

  • SLAMD standard add rate job was used.
  • A total of 500,000 entries were added.
  • A total of 16 concurrent LDAP clients were used.
  • SLAMD adds inetorgperson objectclass entries with 21 attributes (includes operational attributes).

Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.

SPARC T7-1 Delivers 1-Chip World Records for SPEC CPU2006 Rate Benchmarks

This page has been updated on November 19, 2015. The SPARC T7-1 server results have been published at www.spec.org.

Oracle's SPARC T7-1 server delivered world record SPEC CPU2006 rate benchmark results for systems with one chip. This was accomplished with Oracle Solaris 11.3 and Oracle Solaris Studio 12.4 software.

  • The SPARC T7-1 server achieved world record scores of 1200 SPECint_rate2006, 1120 SPECint_rate_base2006, 832 SPECfp_rate2006, and 801 SPECfp_rate_base2006.

  • The SPARC T7-1 server beat the 1 chip Fujitsu CELSIUS C740 with an Intel Xeon Processor E5-2699 v3 by 1.7x on the SPECint_rate2006 benchmark. The SPARC T7-1 server beat the 1 chip NEC Express5800/R120f-1M with an Intel Xeon Processor E5-2699 v3 by 1.8x on the SPECfp_rate2006 benchmark.

  • The SPARC T7-1 server beat the 1 chip IBM Power S812LC server with a POWER8 processor by 1.9 times on the SPECint_rate2006 benchmark and by 1.8 times on the SPECfp_rate2006 benchmark.

  • The SPARC T7-1 server beat the 1 chip Fujitsu SPARC M10-4S with a SPARC64 X+ processor by 2.2x on the SPECint_rate2006 benchmark and by 1.8x on the SPECfp_rate2006 benchmark.

  • The SPARC T7-1 server improved upon the previous generation SPARC platform which used the SPARC T5 processor by 2.5x on the SPECint_rate2006 benchmark and by 2.3x on the SPECfp_rate2006 benchmark.

The SPEC CPU2006 benchmarks are derived from the compute-intensive portions of real applications, stressing chip, memory hierarchy, and compilers. The benchmarks are not intended to stress other computer components such as networking, the operating system, or the I/O system. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on Java computing, enterprise computing, and network file systems.

Performance Landscape

Complete benchmark results are at the SPEC website. The tables below provide the new Oracle results, as well as select results from other vendors.

Presented are single chip SPEC CPU2006 rate results. Only the best results published at www.spec.org per chip type are presented (best Intel, IBM, Fujitsu, Oracle chips).

SPEC CPU2006 Rate Results – One Chip
System Chip Peak Base
  SPECint_rate2006
SPARC T7-1 SPARC M7 (4.13 GHz, 32 cores) 1200 1120
Fujitsu CELSIUS C740 Intel E5-2699 v3 (2.3 GHz, 18 cores) 715 693
IBM Power S812LC POWER8 (2.92 GHz, 10 cores) 642 482
Fujitsu SPARC M10-4S SPARC64 X+ (3.7 GHz, 16 cores) 546 479
SPARC T5-1B SPARC T5 (3.6 GHz, 16 cores) 489 440
IBM Power 710 Express POWER7 (3.55 GHz, 8 cores) 289 255
  SPECfp_rate2006
SPARC T7-1 SPARC M7 (4.13 GHz, 32 cores) 832 801
NEC Express5800/R120f-1M Intel E5-2699 v3 (2.3 GHz, 18 cores) 474 460
IBM Power S812LC POWER8 (2.92 GHz, 10 cores) 468 394
Fujitsu SPARC M10-4S SPARC64 X+ (3.7 GHz, 16 cores) 462 418
SPARC T5-1B SPARC T5 (3.6 GHz, 16 cores) 369 350
IBM Power 710 Express POWER7 (3.55 GHz, 8 cores) 248 229
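The speedup factors quoted in the highlights can be recomputed from the peak columns of this table. The following is a minimal sketch in Python; the dictionary and function names are illustrative only, and the scores are copied from the rows above.

```python
# Peak scores copied from the one-chip table above.
specint_peak = {
    "SPARC T7-1": 1200,
    "Fujitsu CELSIUS C740": 715,
    "IBM Power S812LC": 642,
    "Fujitsu SPARC M10-4S": 546,
    "SPARC T5-1B": 489,
}
specfp_peak = {
    "SPARC T7-1": 832,
    "NEC Express5800/R120f-1M": 474,
    "IBM Power S812LC": 468,
    "Fujitsu SPARC M10-4S": 462,
    "SPARC T5-1B": 369,
}


def speedups(scores, baseline="SPARC T7-1"):
    """Return {system: baseline_score / system_score} for every other system."""
    top = scores[baseline]
    return {name: top / score for name, score in scores.items() if name != baseline}


int_ratios = speedups(specint_peak)
fp_ratios = speedups(specfp_peak)

for metric, ratios in (("SPECint_rate2006", int_ratios), ("SPECfp_rate2006", fp_ratios)):
    for name, ratio in ratios.items():
        # One decimal place matches the precision used in the highlights.
        print(f"{metric}: SPARC T7-1 is {ratio:.1f}x {name}")
```

Because each ratio is a plain quotient of two published scores, any row can be swapped in as the baseline by passing a different `baseline` argument.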

The following table compares the single-chip SPARC M7 processor based server to the best published two-chip POWER8 processor based server.

SPEC CPU2006 Rate Results
Comparing One SPARC M7 Chip to Two POWER8 Chips
System Chip Peak Base
  SPECint_rate2006
SPARC T7-1 1 x SPARC M7 (4.13 GHz, 32 cores) 1200 1120
IBM Power S822LC 2 x POWER8 (2.92 GHz, 2 x 10 cores) 1100 853
  SPECfp_rate2006
SPARC T7-1 1 x SPARC M7 (4.13 GHz, 32 cores) 832 801
IBM Power S822LC 2 x POWER8 (2.92 GHz, 2 x 10 cores) 888 745

Configuration Summary

System Under Test:

SPARC T7-1
1 x SPARC M7 processor (4.13 GHz)
512 GB memory (16 x 32 GB dimms)
800 GB on 4 x 400 GB SAS SSD (mirrored)
Oracle Solaris 11.3
Oracle Solaris Studio 12.4 with 4/15 Patch Set

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark. It measures:

  • Speed — single copy performance of chip, memory, compiler
  • Rate — multiple copy (throughput)

The benchmark is also divided into integer intensive applications and floating point intensive applications:

  • integer: 12 benchmarks derived from applications such as artificial intelligence chess playing, artificial intelligence go playing, quantum computer simulation, perl, gcc, XML processing, and pathfinding
  • floating point: 17 benchmarks derived from applications, including chemistry, physics, genetics, and weather.

It is also divided depending upon the amount of optimization allowed:

  • base: optimization is consistent per compiled language, all benchmarks must be compiled with the same flags per language.
  • peak: specific compiler optimization is allowed per application.

The overall metrics for the benchmark which are commonly used are:

  • SPECint_rate2006, SPECint_rate_base2006: integer, rate
  • SPECfp_rate2006, SPECfp_rate_base2006: floating point, rate
  • SPECint2006, SPECint_base2006: integer, speed
  • SPECfp2006, SPECfp_base2006: floating point, speed
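Each of these overall metrics is a geometric mean of per-benchmark ratios (12 ratios for the integer metrics, 17 for floating point). The sketch below shows that aggregation step only; the function name is illustrative, and the sample ratios are invented for the example and are not measured values from any published result.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed through logarithms."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark ratios, made up for illustration only.
example_ratios = [900.0, 1100.0, 1300.0, 1500.0]
composite = geometric_mean(example_ratios)
print(f"composite score: {composite:.0f}")
```

A geometric mean keeps one unusually high or low benchmark from dominating the composite the way it would in an arithmetic mean.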

Key Points and Best Practices

  • Jobs were bound using pbind.

Disclosure Statement

SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of November 19, 2015 from www.spec.org.
SPARC T7-1: 1200 SPECint_rate2006, 1120 SPECint_rate_base2006, 832 SPECfp_rate2006, 801 SPECfp_rate_base2006; SPARC T5-1B: 489 SPECint_rate2006, 440 SPECint_rate_base2006, 369 SPECfp_rate2006, 350 SPECfp_rate_base2006; Fujitsu SPARC M10-4S: 546 SPECint_rate2006, 479 SPECint_rate_base2006, 462 SPECfp_rate2006, 418 SPECfp_rate_base2006. IBM Power 710 Express: 289 SPECint_rate2006, 255 SPECint_rate_base2006, 248 SPECfp_rate2006, 229 SPECfp_rate_base2006; Fujitsu CELSIUS C740: 715 SPECint_rate2006, 693 SPECint_rate_base2006; NEC Express5800/R120f-1M: 474 SPECfp_rate2006, 460 SPECfp_rate_base2006; IBM Power S822LC: 1100 SPECint_rate2006, 853 SPECint_rate_base2006, 888 SPECfp_rate2006, 745 SPECfp_rate_base2006; IBM Power S812LC: 642 SPECint_rate2006, 482 SPECint_rate_base2006, 468 SPECfp_rate2006, 394 SPECfp_rate_base2006.

SPARC T7-4 Delivers 4-Chip World Record for SPEC OMP2012

Oracle's SPARC T7-4 server delivered world record performance on the SPEC OMP2012 benchmark for systems with four chips. This was accomplished with Oracle Solaris 11.3 and Oracle Solaris Studio 12.4 software.

  • The SPARC T7-4 server delivered world record results for systems with four chips: 27.9 SPECompG_peak2012 and 26.4 SPECompG_base2012.

  • The SPARC T7-4 server beat the four chip HP ProLiant DL580 Gen9 with Intel Xeon Processor E7-8890 v3 by 29% on the SPECompG_peak2012 benchmark.

  • This SPEC OMP2012 benchmark result demonstrates that the SPARC M7 processor performs well on floating-point intensive technical computing and modeling workloads.

Performance Landscape

Complete benchmark results are at the SPEC website, SPEC OMP2012 Results. The table below provides the new Oracle result as well as the previous best four chip results.

SPEC OMP2012 Results
Four Chip Results
System Processor Peak Base
SPARC T7-4 SPARC M7, 4.13 GHz 27.9 26.4
HP ProLiant DL580 Gen9 Intel Xeon E7-8890 v3, 2.5 GHz 21.5 20.4
Cisco UCS C460 M4 Intel Xeon E7-8890 v3, 2.5 GHz -- 20.8
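The percentage advantage can be derived from this table, with the caveat that the Cisco entry has no peak score. The following Python sketch, with illustrative names, treats the "--" cell as a missing value instead of zero so that no comparison is reported where none is possible.

```python
# (peak, base) SPECompG scores from the four-chip table; None marks "--".
results = {
    "SPARC T7-4": (27.9, 26.4),
    "HP ProLiant DL580 Gen9": (21.5, 20.4),
    "Cisco UCS C460 M4": (None, 20.8),
}


def percent_advantage(ours, theirs):
    """Percent by which `ours` exceeds `theirs`, or None if either is missing."""
    if ours is None or theirs is None:
        return None
    return (ours / theirs - 1.0) * 100.0


t7_peak, t7_base = results["SPARC T7-4"]
for system, (peak, base) in results.items():
    if system == "SPARC T7-4":
        continue
    for label, ours, theirs in (("peak", t7_peak, peak), ("base", t7_base, base)):
        adv = percent_advantage(ours, theirs)
        text = "n/a (no published score)" if adv is None else f"{adv:.1f}%"
        print(f"SPARC T7-4 vs {system}, {label}: {text}")
```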

Configuration Summary

Systems Under Test:

SPARC T7-4
4 x 4.13 GHz SPARC M7 processors
2 TB memory (64 x 32 GB dimms)
4 x 600 GB SAS 10,000 RPM HDD (mirrored)
Oracle Solaris 11.3 (11.3.0.30.0)
Oracle Solaris Studio 12.4 with 4/15 Patch Set

Benchmark Description

The following was taken from the SPEC website.

SPEC OMP2012 focuses on compute intensive performance, which means these benchmarks emphasize the performance of:

  • the computer processor (CPU),
  • the memory architecture,
  • the parallel support libraries, and
  • the compilers.

It is important to remember the contribution of the latter three components. SPEC OMP performance intentionally depends on more than just the processor.

SPEC OMP2012 contains a suite that focuses on parallel computing performance using the OpenMP parallelism standard.

The suite can be used to measure along the following vector:

  • Compilation method: Consistent compiler options across all programs of a given language (the base metrics) and, optionally, compiler options tuned to each program (the peak metrics).

SPEC OMP2012 is not intended to stress other computer components such as networking, the operating system, graphics, or the I/O system. Note that there are many other SPEC benchmarks, including benchmarks that specifically focus on graphics, distributed Java computing, webservers, and network file systems.

Key Points and Best Practices

  • Jobs were bound using OMP_PLACES.

Disclosure Statement

SPEC and the benchmark name SPEC OMP are registered trademarks of the Standard Performance Evaluation Corporation. Results as of November 11, 2015 from www.spec.org. SPARC T7-4 (4 chips, 128 cores, 1024 threads): 27.9 SPECompG_peak2012, 26.4 SPECompG_base2012; HP ProLiant DL580 Gen9 (4 chips, 72 cores, 144 threads): 21.5 SPECompG_peak2012, 20.4 SPECompG_base2012; Cisco UCS C460 M4 (4 chips, 72 cores, 144 threads): 20.8 SPECompG_base2012.

Tuesday Feb 18, 2014

SPARC T5-2 Produces SPECjbb2013-MultiJVM World Record for 2-Chip Systems

From www.spec.org

Defects Identified in SPECjbb®2013

December 9, 2014 - SPEC has identified a defect in its SPECjbb®2013 benchmark suite. SPEC has suspended sales of the benchmark software and is no longer accepting new submissions of SPECjbb®2013 results for publication on SPEC's website. Current SPECjbb®2013 licensees will receive a free copy of the new version of the benchmark when it becomes available.

SPEC is advising SPECjbb®2013 licensees and users of the SPECjbb®2013 metrics that the recently discovered defect impacts the comparability of results. This defect can significantly impact the amount of work done during the measurement period, resulting in an inflated SPECjbb®2013 metric. SPEC recommends that users not utilize these results for system comparisons without a full understanding of the impact of these defects on each benchmark result.

Additional information is available here.

The SPECjbb2013 benchmark shows modern Java application performance. Oracle's SPARC T5-2 set a two-chip world record, which is 1.8x faster than the best two-chip x86-based server. Using Oracle Solaris and Oracle Java, Oracle delivered this two-chip world record result on the MultiJVM SPECjbb2013 metric.

  • The SPARC T5-2 server achieved 114,492 SPECjbb2013-MultiJVM max-jOPS and 43,963 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark. This result is a two-chip world record.

  • The SPARC T5-2 server running SPECjbb2013 is 1.8x faster than the Cisco UCS C240 M3 server (2.7 GHz Intel Xeon E5-2697 v2) based on both the SPECjbb2013-MultiJVM max-jOPS and SPECjbb2013-MultiJVM critical-jOPS metrics.

  • The SPARC T5-2 server running SPECjbb2013 is 2x faster than the HP ProLiant ML350p Gen8 server (2.7 GHz Intel Xeon E5-2697 v2) based on SPECjbb2013-MultiJVM max-jOPS and 1.3x faster based on SPECjbb2013-MultiJVM critical-jOPS.

  • The new Oracle results were obtained using Oracle Solaris 11 along with Oracle Java SE 8 on the SPARC T5-2 server.

  • The SPARC T5-2 server running SPECjbb2013 on a per chip basis is 1.3x faster than the NEC Express5800/A040b server (2.8 GHz Intel Xeon E7-4890 v2) based on both the SPECjbb2013-MultiJVM max-jOPS and SPECjbb2013-MultiJVM critical-jOPS metrics.

  • There are no IBM POWER7 or POWER7+ based server results on the SPECjbb2013 benchmark. IBM has published POWER7+ based server results on SPECjbb2005, which was retired by SPEC in 2013.

Performance Landscape

Results of SPECjbb2013 from www.spec.org as of March 6, 2014. These are the leading 2-chip SPECjbb2013 MultiJVM results.

SPECjbb2013 - 2-Chip MultiJVM Results
System Processor SPECjbb2013-MultiJVM JDK
max-jOPS critical-jOPS
SPARC T5-2 2xSPARC T5, 3.6 GHz 114,492 43,963 Oracle Java SE 8
Cisco UCS C240 M3 2xIntel E5-2697 v2, 2.7 GHz 63,079 23,797 Oracle Java SE 7u45
HP ProLiant ML350p Gen8 2xIntel E5-2697 v2, 2.7 GHz 62,393 24,310 Oracle Java SE 7u45
IBM System x3650 M4 BD 2xIntel E5-2695 v2, 2.4 GHz 59,124 22,275 IBM SDK V7 SR6 (*)
HP ProLiant ML350p Gen8 2xIntel E5-2697 v2, 2.7 GHz 57,594 32,103 Oracle Java SE 7u40
HP ProLiant BL460c Gen8 2xIntel E5-2697 v2, 2.7 GHz 56,367 30,078 Oracle Java SE 7u40
Sun Server X4-2, DDR3-1600 2xIntel E5-2697 v2, 2.7 GHz 52,664 20,553 Oracle Java SE 7u40
HP ProLiant DL360e Gen8 2xIntel E5-2470 v2, 2.4 GHz 48,772 17,915 Oracle Java SE 7u40

* IBM SDK V7 SR6 – IBM SDK, Java Technology Edition, Version 7, Service Refresh 6

The following table compares the SPARC T5 processor to the Intel E7 v2 processor.

SPECjbb2013 - Results Using JDK 8
Per Chip Comparison
System Processor SPECjbb2013-MultiJVM SPECjbb2013-MultiJVM/Chip JDK
max-jOPS critical-jOPS max-jOPS critical-jOPS
SPARC T5-2 2xSPARC T5, 3.6 GHz 114,492 43,963 57,246 21,981 Oracle Java SE 8
NEC Express5800/A040b 4xIntel E7-4890 v2, 2.8 GHz 177,753 65,529 44,438 16,382 Oracle Java SE 8

SPARC per Chip Advantage 1.29x 1.34x
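The per-chip columns and the advantage row follow from dividing each system's score by its chip count. The following Python sketch reproduces that arithmetic; the `Result` class and its field names are illustrative, and the inputs are the published scores from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    system: str
    chips: int
    max_jops: int
    critical_jops: int

    def per_chip(self):
        """Return (max-jOPS per chip, critical-jOPS per chip)."""
        return self.max_jops / self.chips, self.critical_jops / self.chips


t5_2 = Result("SPARC T5-2", 2, 114_492, 43_963)
nec = Result("NEC Express5800/A040b", 4, 177_753, 65_529)

t5_max, t5_crit = t5_2.per_chip()
nec_max, nec_crit = nec.per_chip()

print(f"max-jOPS per chip advantage: {t5_max / nec_max:.2f}x")         # 1.29x
print(f"critical-jOPS per chip advantage: {t5_crit / nec_crit:.2f}x")  # 1.34x
```

Normalizing by chip count is what makes a two-chip and a four-chip system comparable here; the raw totals alone would favor the larger machine.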

Configuration Summary

System Under Test:

SPARC T5-2 server
2 x SPARC T5, 3.60 GHz
512 GB memory (32 x 16 GB dimms)
Oracle Solaris 11.1
Oracle Java SE 8

Benchmark Description

The SPECjbb2013 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

From SPEC's press release, "SPECjbb2013 replaces SPECjbb2005. The new benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is expected to be used widely by all those interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community."

SPECjbb2013 features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service-level agreements (SLAs) specifying response times ranging from 10ms to 500ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 3/6/2014, see http://www.spec.org for more information.  SPARC T5-2 114,492 SPECjbb2013-MultiJVM max-jOPS, 43,963 SPECjbb2013-MultiJVM critical-jOPS; NEC Express5800/A040b 177,753 SPECjbb2013-MultiJVM max-jOPS, 65,529 SPECjbb2013-MultiJVM critical-jOPS; Cisco UCS c240 M3 63,079 SPECjbb2013-MultiJVM max-jOPS, 23,797 SPECjbb2013-MultiJVM critical-jOPS; HP ProLiant ML350p Gen8 62,393 SPECjbb2013-MultiJVM max-jOPS, 24,310 SPECjbb2013-MultiJVM critical-jOPS; IBM System X3650 M4 BD 59,124 SPECjbb2013-MultiJVM max-jOPS, 22,275 SPECjbb2013-MultiJVM critical-jOPS; HP ProLiant ML350p Gen8 57,594 SPECjbb2013-MultiJVM max-jOPS, 32,103 SPECjbb2013-MultiJVM critical-jOPS; HP ProLiant BL460c Gen8 56,367 SPECjbb2013-MultiJVM max-jOPS, 30,078 SPECjbb2013-MultiJVM critical-jOPS; Sun Server X4-2 52,664 SPECjbb2013-MultiJVM max-jOPS, 20,553 SPECjbb2013-MultiJVM critical-jOPS; HP ProLiant DL360e Gen8 48,772 SPECjbb2013-MultiJVM max-jOPS, 17,915 SPECjbb2013-MultiJVM critical-jOPS.

Monday Sep 23, 2013

SPARC T5-2 Delivers Best 2-Chip MultiJVM SPECjbb2013 Result

From www.spec.org

Defects Identified in SPECjbb®2013

December 9, 2014 - SPEC has identified a defect in its SPECjbb®2013 benchmark suite. SPEC has suspended sales of the benchmark software and is no longer accepting new submissions of SPECjbb®2013 results for publication on SPEC's website. Current SPECjbb®2013 licensees will receive a free copy of the new version of the benchmark when it becomes available.

SPEC is advising SPECjbb®2013 licensees and users of the SPECjbb®2013 metrics that the recently discovered defect impacts the comparability of results. This defect can significantly impact the amount of work done during the measurement period, resulting in an inflated SPECjbb®2013 metric. SPEC recommends that users not utilize these results for system comparisons without a full understanding of the impact of these defects on each benchmark result.

Additional information is available here.

SPECjbb2013 is a new benchmark designed to show modern Java server performance. Oracle's SPARC T5-2 set a world record as the fastest two-chip system, beating the just-introduced two-chip x86-based servers. Oracle, using Oracle Solaris and Oracle JDK, delivered this two-chip world record result on the MultiJVM SPECjbb2013 metric. SPECjbb2013 is the replacement for SPECjbb2005 (SPECjbb2005 will soon be retired by SPEC).

  • Oracle's SPARC T5-2 server achieved 81,084 SPECjbb2013-MultiJVM max-jOPS and 39,129 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark. This result is a two chip world record.

  • There are no IBM POWER7 or POWER7+ based server results on the SPECjbb2013 benchmark. IBM has published POWER7+ based server results on SPECjbb2005, which will soon be retired by SPEC.

  • The 2-chip SPARC T5-2 server running SPECjbb2013 is 30% faster than the 2-chip Cisco UCS B200 M3 server (2.7 GHz E5-2697 v2 Ivy Bridge-based) based on SPECjbb2013-MultiJVM max-jOPS.

  • The 2-chip SPARC T5-2 server running SPECjbb2013 is 66% faster than the 2-chip Cisco UCS B200 M3 server (2.7 GHz E5-2697 v2 Ivy Bridge-based) based on SPECjbb2013-MultiJVM critical-jOPS.

  • These results were obtained using Oracle Solaris 11 along with Java Platform, Standard Edition, JDK 7 Update 40 on the SPARC T5-2 server.

From SPEC's press release, "SPECjbb2013 replaces SPECjbb2005. The new benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is expected to be used widely by all those interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community."

Performance Landscape

Results of SPECjbb2013 from www.spec.org as of September 22, 2013 and this report.

SPECjbb2013
System Processor SPECjbb2013-MultiJVM JDK
type # max-jOPS critical-jOPS
SPARC T5-2 SPARC T5, 3.6 GHz 2 81,084 39,129 Oracle JDK 7u40
Cisco UCS B200 M3, DDR3-1866 Intel E5-2697 v2, 2.7 GHz 2 62,393 23,505 Oracle JDK 7u40
Sun Server X4-2, DDR3-1600 Intel E5-2697 v2, 2.7 GHz 2 52,664 20,553 Oracle JDK 7u40
Cisco UCS C220 M3 Intel E5-2690, 2.9 GHz 2 41,954 16,545 Oracle JDK 7u11

The above table represents all of the published results on www.spec.org. SPEC allows for self publication of SPECjbb2013 results. See below for locations where full reports were made available.

Configuration Summary

System Under Test:

SPARC T5-2 server
2 x SPARC T5, 3.60 GHz
512 GB memory (32 x 16 GB dimms)
Oracle Solaris 11.1
Oracle JDK 7 Update 40

Benchmark Description

The SPECjbb2013 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

SPECjbb2013 replaces SPECjbb2005. New features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service-level agreements (SLAs) specifying response times ranging from 10ms to 500ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 9/23/2013, see http://www.spec.org for more information. SPARC T5-2 81,084 SPECjbb2013-MultiJVM max-jOPS, 39,129 SPECjbb2013-MultiJVM critical-jOPS, result from https://blogs.oracle.com/BestPerf/resource/jbb2013/sparct5-922.pdf; Cisco UCS B200 M3 62,393 SPECjbb2013-MultiJVM max-jOPS, 23,505 SPECjbb2013-MultiJVM critical-jOPS, result from http://www.cisco.com/en/US/prod/collateral/ps10265/le_41704_pb_specjbb2013b200.pdf; Sun Server X4-2 52,664 SPECjbb2013-MultiJVM max-jOPS, 20,553 SPECjbb2013-MultiJVM critical-jOPS, result from https://blogs.oracle.com/BestPerf/entry/20130918_x4_2_specjbb2013; Cisco UCS C220 M3 41,954 SPECjbb2013-MultiJVM max-jOPS, 16,545 SPECjbb2013-MultiJVM critical-jOPS, result from www.spec.org.

Wednesday Sep 18, 2013

Sun Server X4-2 Performance Running SPECjbb2013 MultiJVM Benchmark

From www.spec.org

Defects Identified in SPECjbb®2013

December 9, 2014 - SPEC has identified a defect in its SPECjbb®2013 benchmark suite. SPEC has suspended sales of the benchmark software and is no longer accepting new submissions of SPECjbb®2013 results for publication on SPEC's website. Current SPECjbb®2013 licensees will receive a free copy of the new version of the benchmark when it becomes available.

SPEC is advising SPECjbb®2013 licensees and users of the SPECjbb®2013 metrics that the recently discovered defect impacts the comparability of results. This defect can significantly impact the amount of work done during the measurement period, resulting in an inflated SPECjbb®2013 metric. SPEC recommends that users not utilize these results for system comparisons without a full understanding of the impact of these defects on each benchmark result.

Additional information is available here.

Oracle's Sun Server X4-2 system, using Oracle Solaris and Oracle JDK, produced a SPECjbb2013 benchmark (MultiJVM metric) result. This benchmark was designed by the industry to showcase Java server performance.

  • The Sun Server X4-2 system is 24% faster on SPECjbb2013-MultiJVM max-jOPS than the Dell PowerEdge R720, the fastest published two-socket system based on the Intel Xeon E5-2600 (Sandy Bridge).

  • The Sun Server X4-2 system is 22% faster on SPECjbb2013-MultiJVM critical-jOPS than the same Dell PowerEdge R720 result.

  • The Sun Server X4-2 runs SPECjbb2013 (MultiJVM metric) at 70% of the published T5-2 SPECjbb2013-MultiJVM max-jOPS.

  • The Sun Server X4-2 runs SPECjbb2013 (MultiJVM metric) at 88% of the published T5-2 SPECjbb2013-MultiJVM critical-jOPS.

  • The combination of Oracle Solaris 11.1 and Oracle JDK 7 update 40 delivered a result of 52,664 SPECjbb2013-MultiJVM max-jOPS and 20,553 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark.

From SPEC's press release, "SPECjbb2013 replaces SPECjbb2005. The new benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is expected to be used widely by all those interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community."

Performance Landscape

Top two-socket results of SPECjbb2013 MultiJVM as of October 8, 2013.

SPECjbb2013
System Processor DDR3 SPECjbb2013-MultiJVM OS JDK
max-jOPS critical-jOPS
SPARC T5-2 2 x 3.6 GHz SPARC T5 1600 75,658 23,334 Solaris 11.1 7u17
Cisco UCS B200 M3 2 x 2.7 GHz Intel E5-2697 v2 1866 62,393 23,505 RHEL 6.4 7u40
Sun Server X4-2 2 x 2.7 GHz Intel E5-2697 v2 1600 52,664 20,553 Solaris 11.1 7u40
Dell PowerEdge R720 2 x 2.9 GHz Intel Xeon E5-2690 1600 42,431 16,779 RHEL 6.4 7u21
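The percentages in the highlights can be recovered directly from the flattened table rows. The following Python sketch parses the rows as they appear above and recomputes the Sun Server X4-2 comparisons; the parsing rule (only the two jOPS columns carry a thousands separator, and every system name ends before " 2 x ") is an assumption that holds for these four rows, not a general table parser.

```python
import re

# Rows copied from the two-socket table above (flattened, space-separated).
ROWS = [
    "SPARC T5-2 2 x 3.6 GHz SPARC T5 1600 75,658 23,334 Solaris 11.1 7u17",
    "Cisco UCS B200 M3 2 x 2.7 GHz Intel E5-2697 v2 1866 62,393 23,505 RHEL 6.4 7u40",
    "Sun Server X4-2 2 x 2.7 GHz Intel E5-2697 v2 1600 52,664 20,553 Solaris 11.1 7u40",
    "Dell PowerEdge R720 2 x 2.9 GHz Intel Xeon E5-2690 1600 42,431 16,779 RHEL 6.4 7u21",
]

# Assumption: the jOPS columns are the only tokens with a thousands separator.
JOPS_TOKEN = re.compile(r"\b\d{1,3}(?:,\d{3})+\b")


def parse_row(row):
    """Return (system, (max_jops, critical_jops)) for one flattened row."""
    system = row.split(" 2 x ", 1)[0]
    max_jops, critical_jops = (
        int(token.replace(",", "")) for token in JOPS_TOKEN.findall(row)
    )
    return system, (max_jops, critical_jops)


def percent_of(a, b):
    """a expressed as a percentage of b."""
    return 100.0 * a / b


def percent_faster(a, b):
    """Percent by which a exceeds b."""
    return 100.0 * (a / b - 1.0)


table = dict(parse_row(row) for row in ROWS)
x4 = table["Sun Server X4-2"]
t5 = table["SPARC T5-2"]
dell = table["Dell PowerEdge R720"]

for i, metric in enumerate(("max-jOPS", "critical-jOPS")):
    print(f"{metric}: {percent_faster(x4[i], dell[i]):.0f}% faster than Dell PowerEdge R720, "
          f"{percent_of(x4[i], t5[i]):.0f}% of SPARC T5-2")
```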

The above table includes published results from www.spec.org.

Configuration Summary

System Under Test:

Sun Server X4-2
2 x Intel E5-2697 v2, 2.7 GHz
Hyper-Threading enabled
Turbo Boost enabled
128 GB memory (16 x 8 GB dimms)
Oracle Solaris 11.1 (11.1.4.2.0)
Oracle JDK 7u40

Benchmark Description

The SPECjbb2013 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

SPECjbb2013 replaces SPECjbb2005. New features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service-level agreements (SLAs) specifying response times ranging from 10ms to 500ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results from http://www.spec.org as of 10/8/2013. SPARC T5-2 75,658 SPECjbb2013-MultiJVM max-jOPS, 23,334 SPECjbb2013-MultiJVM critical-jOPS; Cisco UCS B200 M3 62,393 SPECjbb2013-MultiJVM max-jOPS, 23,505 SPECjbb2013-MultiJVM critical-jOPS; Dell PowerEdge R720 42,431 SPECjbb2013-MultiJVM max-jOPS, 16,779 SPECjbb2013-MultiJVM critical-jOPS; Sun Server X4-2 52,664 SPECjbb2013-MultiJVM max-jOPS, 20,553 SPECjbb2013-MultiJVM critical-jOPS.

Tuesday Mar 26, 2013

SPARC T5-2 Achieves SPECjbb2013 Benchmark World Record Result

From www.spec.org

Defects Identified in SPECjbb®2013

December 9, 2014 - SPEC has identified a defect in its SPECjbb®2013 benchmark suite. SPEC has suspended sales of the benchmark software and is no longer accepting new submissions of SPECjbb®2013 results for publication on SPEC's website. Current SPECjbb®2013 licensees will receive a free copy of the new version of the benchmark when it becomes available.

SPEC is advising SPECjbb®2013 licensees and users of the SPECjbb®2013 metrics that the recently discovered defect impacts the comparability of results. This defect can significantly impact the amount of work done during the measurement period, resulting in an inflated SPECjbb®2013 metric. SPEC recommends that users not utilize these results for system comparisons without a full understanding of the impact of these defects on each benchmark result.

Additional information is available here.

Oracle, using Oracle Solaris and Oracle JDK, delivered a two socket server world record result on the SPECjbb2013 benchmark, Multi-JVM metric. This benchmark was designed by the industry to showcase Java server performance. SPECjbb2013 is the replacement for SPECjbb2005 (SPECjbb2005 will soon be retired by SPEC).

  • Oracle's SPARC T5-2 server achieved 75,658 SPECjbb2013-MultiJVM max-jOPS and 23,334 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark. This result is a two chip world record. (Oracle has submitted this result for review by SPEC.)

  • There are no IBM POWER7 or POWER7+ based server results on the SPECjbb2013 benchmark. IBM has published POWER7+ based server results on SPECjbb2005, which will soon be retired by SPEC.

  • The 2-chip SPARC T5-2 server running SPECjbb2013 is 1.9x faster than the 2-chip HP ProLiant ML350p Gen8 server (2.9 GHz E5-2690 Sandy Bridge-based) based on SPECjbb2013-MultiJVM max-jOPS.

  • The 2-chip SPARC T5-2 server is 15% faster than the 4-chip HP ProLiant DL560p server (2.7 GHz E5-4650 Sandy Bridge-based) based on SPECjbb2013-MultiJVM max-jOPS.

  • The 2-chip SPARC T5-2 server is 6.1x faster than the 1-chip HP ProLiant ML310e Gen8 (3.6 GHz E3-1280v2 Ivy Bridge-based) based on SPECjbb2013-MultiJVM max-jOPS.

  • The Sun Server X3-2 system running Oracle Solaris 11 is 5% faster than the HP ProLiant ML350p Gen8 server running Windows Server 2008 based on SPECjbb2013-MultiJVM max-jOPS.

  • Oracle's SPARC T4-2 server achieved 34,804 SPECjbb2013-MultiJVM max-jOPS and 10,101 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark. (Oracle has submitted this result for review by SPEC.)

  • Oracle's Sun Server X3-2 system achieved 41,954 SPECjbb2013-MultiJVM max-jOPS and 13,305 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark. (Oracle has submitted this result for review by SPEC.)

  • Oracle's Sun Server X2-4 system achieved 65,211 SPECjbb2013-MultiJVM max-jOPS and 22,057 SPECjbb2013-MultiJVM critical-jOPS on the SPECjbb2013 benchmark. (Oracle has submitted this result for review by SPEC.)

  • SPECjbb2013 demonstrates better performance on Oracle hardware and software, engineered to work together, than alternatives from HP.

  • These results were obtained using Oracle Solaris 11 along with Java Platform, Standard Edition, JDK 7 Update 17 on the SPARC T5-2 server, SPARC T4-2 server, Sun Server X3-2 and Sun Server X2-4.

From SPEC's press release, "SPECjbb2013 replaces SPECjbb2005. The new benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is expected to be used widely by all those interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community."

Performance Landscape

Results of SPECjbb2013 from www.spec.org as of March 26, 2013 and this report.

SPECjbb2013
System Processor SPECjbb2013-MultiJVM OS JDK
max-jOPS critical-jOPS
SPARC T5-2 2 x SPARC T5 75,658 23,334 Oracle Solaris 11.1 Oracle JDK 7u17
HP DL560p Gen8 4 x Intel E5-4650 66,007 16,577 Windows 2008 R2 Oracle JDK 7u15
Sun Server X2-4 4 x Intel E7-4870 65,211 22,057 Oracle Solaris 11.1 Oracle JDK 7u17
Sun Server X3-2 2 x Intel E5-2690 41,954 13,305 Oracle Solaris 11.1 Oracle JDK 7u17
HP ML350p Gen8 2 x Intel E5-2690 40,047 12,308 Windows 2008 R2 Oracle JDK 7u15
SPARC T4-2 2 x SPARC T4 34,804 10,101 Oracle Solaris 11.1 Oracle JDK 7u17
Supermicro X8DTN+ 2 x Intel E5690 20,977 6,188 RHEL 6.3 Oracle JDK 7u11
HP ML310e Gen8 1 x Intel E3-1280v2 12,315 2,908 Windows 2008 R2 Oracle JDK 7u15
Intel R1304BT 1 x Intel 1260L 6,198 1,722 Windows 2008 R2 Oracle JDK 7u11

The above table represents all of the published results on www.spec.org. SPEC allows for self publication of SPECjbb2013 results.

Configuration Summary

Systems Under Test:

SPARC T5-2 server
2 x SPARC T5, 3.60 GHz
512 GB memory (32 x 16 GB dimms)
Oracle Solaris 11.1
Oracle JDK 7 Update 17

Sun Server X2-4
4 x Intel Xeon E7-4870, 2.40 GHz
Hyper-Threading enabled
Turbo Boost enabled
128 GB memory (32 x 4 GB dimms)
Oracle Solaris 11.1
Oracle JDK 7 Update 17

Sun Server X3-2
2 x Intel E5-2690, 2.90 GHz
Hyper-Threading enabled
Turbo Boost enabled
128 GB memory (32 x 4 GB dimms)
Oracle Solaris 11.1
Oracle JDK 7 Update 17

SPARC T4-2 server
2 x SPARC T4, 2.85 GHz
256 GB memory (32 x 8 GB dimms)
Oracle Solaris 11.1
Oracle JDK 7 Update 17

Benchmark Description

The SPECjbb2013 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

SPECjbb2013 replaces SPECjbb2005. New features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service-level agreements (SLAs) specifying response times ranging from 10ms to 500ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 3/26/2013, see http://www.spec.org for more information. SPARC T5-2 75,658 SPECjbb2013-MultiJVM max-jOPS, 23,334 SPECjbb2013-MultiJVM critical-jOPS. Sun Server X2-4 65,211 SPECjbb2013-MultiJVM max-jOPS, 22,057 SPECjbb2013-MultiJVM critical-jOPS. Sun Server X3-2 41,954 SPECjbb2013-MultiJVM max-jOPS, 13,305 SPECjbb2013-MultiJVM critical-jOPS. SPARC T4-2 34,804 SPECjbb2013-MultiJVM max-jOPS, 10,101 SPECjbb2013-MultiJVM critical-jOPS. HP ProLiant DL560p Gen8 66,007 SPECjbb2013-MultiJVM max-jOPS, 16,577 SPECjbb2013-MultiJVM critical-jOPS. HP ProLiant ML350p Gen8 40,047 SPECjbb2013-MultiJVM max-jOPS, 12,308 SPECjbb2013-MultiJVM critical-jOPS. Supermicro X8DTN+ 20,977 SPECjbb2013-MultiJVM max-jOPS, 6,188 SPECjbb2013-MultiJVM critical-jOPS. HP ProLiant ML310e Gen8 12,315 SPECjbb2013-MultiJVM max-jOPS, 2,908 SPECjbb2013-MultiJVM critical-jOPS. Intel R1304BT 6,198 SPECjbb2013-MultiJVM max-jOPS, 1,722 SPECjbb2013-MultiJVM critical-jOPS.

SPARC T5 Systems Deliver SPEC CPU2006 Rate Benchmark Multiple World Records

Oracle's SPARC T5 processor based systems delivered world record performance on the SPEC CPU2006 rate benchmarks. This was accomplished with Oracle Solaris 11.1 and Oracle Solaris Studio 12.3 software.

SPARC T5-8

  • The SPARC T5-8 server delivered world record SPEC CPU2006 rate benchmark results for systems with eight processors.

  • The SPARC T5-8 server achieved scores of 3750 SPECint_rate2006, 3490 SPECint_rate_base2006, 3020 SPECfp_rate2006, and 2770 SPECfp_rate_base2006.

  • The SPARC T5-8 server beat the 8 processor IBM Power 760 with POWER7+ processors by 1.7x on the SPECint_rate2006 benchmark and 2.2x on the SPECfp_rate2006 benchmark.

  • The SPARC T5-8 server beat the 8 processor IBM Power 780 with POWER7 processors by 35% on the SPECint_rate2006 benchmark and 14% on the SPECfp_rate2006 benchmark.

  • The SPARC T5-8 server beat the 8 processor HP DL980 G7 with Intel Xeon E7-4870 processors by 1.7x on the SPECint_rate2006 benchmark and 2.1x on the SPECfp_rate2006 benchmark.

SPARC T5-1B

  • The SPARC T5-1B server module delivered world record SPEC CPU2006 rate benchmark results for systems with one processor.

  • The SPARC T5-1B server module achieved scores of 467 SPECint_rate2006, 436 SPECint_rate_base2006, 369 SPECfp_rate2006, and 350 SPECfp_rate_base2006.

  • The SPARC T5-1B server module beat the 1 processor IBM Power 710 Express with a POWER7 processor by 62% on the SPECint_rate2006 benchmark and 49% on the SPECfp_rate2006 benchmark.

  • The SPARC T5-1B server module beat the 1 processor NEC Express5800/R120d-1M with an Intel Xeon E5-2690 processor by 31% on the SPECint_rate2006 benchmark. The SPARC T5-1B server module beat the 1 processor Huawei RH2288 V2 with an Intel Xeon E5-2690 processor by 44% on the SPECfp_rate2006 benchmark.

  • The SPARC T5-1B server module beat the 1 processor Supermicro A+ 1012G-MTF with an AMD Opteron 6386 SE processor by 51% on the SPECint_rate2006 benchmark and 65% on the SPECfp_rate2006 benchmark.

Performance Landscape

Complete benchmark results are at the SPEC website, SPEC CPU2006 Results. The tables below provide the new Oracle results, as well as select results from other vendors.

SPEC CPU2006 Rate Results – Eight Processors
System Processor ch/co/th * Peak Base
SPECint_rate2006
SPARC T5-8 SPARC T5, 3.6 GHz 8/128/1024 3750 3490
IBM Power 780 POWER7, 3.92 GHz 8/64/256 2770 2420
HP DL980 G7 Xeon E7-4870, 2.4 GHz 8/80/160 2180 2070
IBM Power 760 POWER7+, 3.42 GHz 8/48/192 2170 1480
Dell PowerEdge C6145 Opteron 6180 SE, 2.5 GHz 8/96/96 1670 1440
SPECfp_rate2006
SPARC T5-8 SPARC T5, 3.6 GHz 8/128/1024 3020 2770
IBM Power 780 POWER7, 3.92 GHz 8/64/256 2640 2410
HP DL980 G7 Xeon E7-4870, 2.4 GHz 8/80/160 1430 1380
IBM Power 760 POWER7+, 3.42 GHz 8/48/192 1400 1130
Dell PowerEdge C6145 Opteron 6180 SE, 2.5 GHz 8/96/96 1310 1200

* ch/co/th — chips / cores / threads enabled

SPEC CPU2006 Rate Results – One Processor
System Processor ch/co/th * Peak Base
SPECint_rate2006
SPARC T5-1B SPARC T5, 3.6 GHz 1/16/128 467 436
NEC Express5800/R120d-1M Xeon E5-2690, 2.9 GHz 1/8/16 357 343
Supermicro A+ 1012G-MTF Opteron 6386 SE, 2.8 GHz 1/16/16 309 269
IBM Power 710 Express POWER7, 3.556 GHz 1/8/32 289 255
SPECfp_rate2006
SPARC T5-1B SPARC T5, 3.6 GHz 1/16/128 369 350
Huawei RH2288 V2 Xeon E5-2690, 2.9 GHz 1/8/16 257 250
IBM Power 710 Express POWER7, 3.556 GHz 1/8/32 248 229
Supermicro A+ 1012G-MTF Opteron 6386 SE, 2.8 GHz 1/16/16 223 199

* ch/co/th — chips / cores / threads enabled
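The ratios and percentages quoted in the highlights can be reproduced from the peak scores in the two tables. The following is a minimal sketch of that arithmetic; the dictionary layout and function names are illustrative choices, not part of any SPEC tooling.

```python
# Peak scores copied from the two tables above (system -> score).
PEAK = {
    "SPECint_rate2006": {
        "SPARC T5-8": 3750, "IBM Power 780": 2770,
        "HP DL980 G7": 2180, "IBM Power 760": 2170,
        "SPARC T5-1B": 467, "NEC Express5800/R120d-1M": 357,
        "Supermicro A+ 1012G-MTF": 309, "IBM Power 710 Express": 289,
    },
    "SPECfp_rate2006": {
        "SPARC T5-8": 3020, "IBM Power 780": 2640,
        "HP DL980 G7": 1430, "IBM Power 760": 1400,
        "SPARC T5-1B": 369, "Huawei RH2288 V2": 257,
        "IBM Power 710 Express": 248, "Supermicro A+ 1012G-MTF": 223,
    },
}


def ratio(metric, ours, theirs):
    """Return how many times larger the `ours` peak score is than `theirs`."""
    return PEAK[metric][ours] / PEAK[metric][theirs]


def percent_faster(metric, ours, theirs):
    """Return the advantage of `ours` over `theirs` as a whole-number percentage."""
    return round((ratio(metric, ours, theirs) - 1) * 100)


if __name__ == "__main__":
    for metric in PEAK:
        x = ratio(metric, "SPARC T5-8", "IBM Power 760")
        pct = percent_faster(metric, "SPARC T5-1B", "IBM Power 710 Express")
        print(f"{metric}: T5-8 vs Power 760 = {x:.1f}x, "
              f"T5-1B vs Power 710 Express = +{pct}%")
```

Only peak scores are compared here, which matches how the highlights are stated; the base scores in the tables would give different ratios.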

Configuration Summary

Systems Under Test:

SPARC T5-8
8 x 3.6 GHz SPARC T5 processors
4 TB memory (128 x 32 GB dimms)
2 TB on 8 x 600 GB 10K RPM SAS disks, arranged as 4 x 2-way mirrors
Oracle Solaris 11.1 (SRU 4.6)
Oracle Solaris Studio 12.3 1/13 PSE

SPARC T5-1B
1 x 3.6 GHz SPARC T5 processor
256 GB memory (16 x 16 GB dimms)
157 GB on 2 x 300 GB 10K RPM SAS disks (mirrored)
Oracle Solaris 11.1 (SRU 3.4)
Oracle Solaris Studio 12.3 1/13 PSE

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark. It measures:

  • Speed — single copy performance of chip, memory, compiler
  • Rate — multiple copy (throughput)

The benchmark is also divided into integer intensive applications and floating point intensive applications:

  • integer: 12 benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • floating point: 17 benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

It is also divided depending upon the amount of optimization allowed:

  • base: optimization is consistent per compiled language, all benchmarks must be compiled with the same flags per language.
  • peak: specific compiler optimization is allowed per application.

The overall metrics for the benchmark which are commonly used are:

  • SPECint_rate2006, SPECint_rate_base2006: integer, rate
  • SPECfp_rate2006, SPECfp_rate_base2006: floating point, rate
  • SPECint2006, SPECint_base2006: integer, speed
  • SPECfp2006, SPECfp_base2006: floating point, speed

Disclosure Statement

SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of March 26, 2013 from www.spec.org and this report. SPARC T5-8: 3750 SPECint_rate2006, 3490 SPECint_rate_base2006, 3020 SPECfp_rate2006, 2770 SPECfp_rate_base2006; SPARC T5-1B: 467 SPECint_rate2006, 436 SPECint_rate_base2006, 369 SPECfp_rate2006, 350 SPECfp_rate_base2006.

SPARC T5 Systems Produce Oracle TimesTen Benchmark World Record

The Oracle TimesTen In-Memory Database is optimized to run on Oracle's SPARC T5 processor platforms running Oracle Solaris 11. In this series of tests, systems with the new SPARC T5 processor were significantly faster than systems based on other processors. Two tests were run to explore TimesTen performance: a Mobile Call Processing test (based on customer workload) and Oracle's TimesTen Performance Throughput Benchmark (TPTBM). TimesTen version 11.2.2.4 was used for all tests.

  • On the TimesTen Performance Throughput Benchmark (TPTBM), SPARC T5-8 server produced a world record 59.9 million read transactions per second.

  • On the Mobile Call Processing test, the SPARC T5 processor achieves 2.4 times more throughput than the Intel Xeon E7-4870 processor. The two-chip SPARC T5-2 server is 22% faster than an x86 server with four Intel E7-4870 2.4 GHz processors.

  • On the TimesTen Performance Throughput Benchmark (TPTBM) read-only workload, the SPARC T5 processor achieves 2.2 times higher throughput than the Intel Xeon E7-4870 processor. On the same workload, the two-chip SPARC T5-2 server produces 10% more throughput than an x86 server with four Intel E7-4870 processors and has almost twice the performance of a 2-chip Intel E5-2680 system.

  • With the TPTBM read-only workload, the SPARC T5-8 server delivers 3.8x more throughput than a SPARC T5-2 Server, showing excellent scalability.

  • The SPARC T5 processor delivers over twice the performance of the previous generation SPARC T4 processor and over 4x the performance of the SPARC T3 processor, all in the same amount of space.

  • The SPARC T5-2 server delivers 2.4x the performance of the SPARC T4-2 server in the same 3U space. This is better performance than that of the SPARC T4-4 server which occupies 5U.

Performance Landscape

Mobile Call Processing Test Performance

Processor Tps
SPARC T5, 3.6 GHz 367,600
Intel Xeon E7-4870, 2.4 GHz 302,000
SPARC T4, 2.85 GHz 230,500

All systems measured using Oracle Solaris 11 and Oracle TimesTen In-Memory Database 11.2.2.4.1
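The table lists one Tps figure per processor type, while the highlights quote both a system-level gain (22%) and a per-processor gain (2.4x). A minimal sketch of how the two relate is shown below. It assumes that each table row is a whole-system measurement, with the SPARC T5 row taken on the two-chip SPARC T5-2 and the Intel row on the four-chip x86 server, as the highlight text suggests; the variable names are illustrative.

```python
# Assumption: table rows are whole-system Tps; chip counts come from the highlight text.
T5_2 = {"tps": 367_600, "chips": 2}      # SPARC T5-2 server
X86_E7 = {"tps": 302_000, "chips": 4}    # x86 server with four Intel Xeon E7-4870


def per_chip(system):
    """Throughput delivered by each processor chip in the system."""
    return system["tps"] / system["chips"]


# System-to-system advantage, as a whole-number percentage.
system_gain_pct = round((T5_2["tps"] / X86_E7["tps"] - 1) * 100)

# Chip-to-chip advantage, as a ratio rounded to one decimal place.
chip_ratio = round(per_chip(T5_2) / per_chip(X86_E7), 1)

if __name__ == "__main__":
    print(f"System level: +{system_gain_pct}%")
    print(f"Per processor: {chip_ratio}x")
```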

TimesTen Performance Throughput Benchmark (TPTBM) Read-Only

System      Processor                    Chips  Tps    Tps/Chip
SPARC T5-8  SPARC T5, 3.6 GHz            8      59.9M  7.5M
SPARC T5-2  SPARC T5, 3.6 GHz            2      15.9M  7.9M
x86         Intel Xeon E7-4870, 2.4 GHz  4      14.5M  3.6M
SPARC T4-4  SPARC T4, 3.0 GHz            4      14.2M  3.6M
x86*        Intel Xeon E5-2680, 2.7 GHz  2      8.5M   4.3M
SPARC T4-2  SPARC T4, 2.85 GHz           2      6.5M   3.3M
SPARC T3-4  SPARC T3, 1.65 GHz           4      7.9M   1.9M
T5440       SPARC T2+, 1.4 GHz           4      3.1M   0.8M

All systems measured using Oracle Solaris 11 and Oracle TimesTen In-Memory Database 11.2.2.4.1

*Intel E5-2680 using Oracle Linux and Oracle TimesTen In-Memory Database 11.2.2.4.1
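The per-chip and scalability claims for the read-only workload follow from the Tps and chip counts in the table. The sketch below recomputes them; because the table values are already rounded to 0.1M, the recomputed per-chip figures can differ from the published column in the last digit. The names used are illustrative.

```python
# (chips, read-only Tps in millions), copied from the TPTBM read-only table.
RESULTS = {
    "SPARC T5-8": (8, 59.9),
    "SPARC T5-2": (2, 15.9),
    "x86 E7-4870": (4, 14.5),
    "SPARC T4-4": (4, 14.2),
    "x86 E5-2680": (2, 8.5),
    "SPARC T4-2": (2, 6.5),
    "SPARC T3-4": (4, 7.9),
    "T5440": (4, 3.1),
}

# Throughput per processor chip, in millions of Tps.
per_chip = {name: tps / chips for name, (chips, tps) in RESULTS.items()}

# Scaling from the two-chip to the eight-chip SPARC T5 system.
speedup = RESULTS["SPARC T5-8"][1] / RESULTS["SPARC T5-2"][1]
chip_growth = RESULTS["SPARC T5-8"][0] / RESULTS["SPARC T5-2"][0]
efficiency = speedup / chip_growth  # 1.0 would be perfectly linear

if __name__ == "__main__":
    for name, value in per_chip.items():
        print(f"{name:<12} {value:.2f}M Tps per chip")
    print(f"T5-8 vs T5-2: {speedup:.1f}x on {chip_growth:.0f}x the chips "
          f"({efficiency:.0%} scaling efficiency)")
```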

TimesTen Performance Throughput Benchmark (TPTBM) Update-Only

Processor Tps
SPARC T5, 3.6 GHz 1,031.7K
Intel Xeon E7-4870, 2.4 GHz 988.1K
Intel Xeon E5-2680, 2.7 GHz * 944.3K
SPARC T4, 3.0 GHz 678.0K

All systems measured using Oracle Solaris 11 and Oracle TimesTen In-Memory Database 11.2.2.4.1

*Intel E5-2680 using Oracle Linux and Oracle TimesTen In-Memory Database 11.2.2.4.1

Configuration Summary

Hardware Configurations:

SPARC T5-8 server
8 x SPARC T5 processors, 3.6 GHz
2 TB memory
1 x 8 Gbs FC Qlogic HBA
1 x 6 Gbs SAS HBA
2 x 300 GB internal disks
Oracle Solaris 11
TimesTen 11.2.2.4.1
1 x Sun Fire X4275 server configured as COMSTAR redo head (log)

SPARC T5-2 server
2 x SPARC T5 processors, 3.6 GHz
512 GB memory
1 x 8 Gbs FC Qlogic HBA
1 x 6 Gbs SAS HBA
2 x 300 GB internal disks
Oracle Solaris 11
TimesTen 11.2.2.4.1
1 x Sun Fire X4275 server configured as COMSTAR redo head (log)

SPARC T4-4 server
4 x SPARC T4 processors, 3.0 GHz
1 TB memory
1 x 8 Gbs FC Qlogic HBA
1 x 6 Gbs SAS HBA
6 x 300 GB internal disks
Oracle Solaris 11
TimesTen 11.2.2.4.1
Sun Storage F5100 Flash Array (80 x 24 GB flash modules)
1 x Sun Fire X4275 server configured as COMSTAR redo head (log)

SPARC T4-2 server
2 x SPARC T4 processors, 2.85 GHz
256 GB memory
1 x 8 Gbs FC Qlogic HBA
1 x 6 Gbs SAS HBA
4 x 300 GB internal disks
Oracle Solaris 11
TimesTen 11.2.2.4.1
Sun Storage F5100 Flash Array (40 x 24 GB flash modules)
1 x Sun Fire X4275 server configured as COMSTAR head

SPARC T3-4 server
4 x SPARC T3 processors, 1.65 GHz
512 GB memory
1 x 8 Gbs FC Qlogic HBA
8 x 146 GB internal disks
Oracle Solaris 11
TimesTen 11.2.2.4.1
1 x Sun Fire X4275 server configured as COMSTAR head

Intel Server x86_64
2 x Intel Xeon E5-2680 processors, 2.7 GHz
256 GB memory
4 x SSD SAS disks (log)
1 x 600 GB internal disk
Oracle Linux
TimesTen 11.2.2.4.1

Sun Server X2-4
4 x Intel Xeon E7-4870 processors, 2.4 GHz
512 GB memory
1 x 8 Gbs FC Qlogic HBA
6 x 146 GB internal disks
Oracle Solaris 11
TimesTen 11.2.2.4.1
1 x Sun Fire X4275 server configured as COMSTAR redo head (log)

Benchmark Descriptions

TimesTen Performance Throughput BenchMark (TPTBM) is shipped with TimesTen and measures the total throughput of the system. The benchmark workloads can be reads, inserts, updates, and delete operations, or a mix of them as required.

Mobile Call Processing is a customer-based workload for processing calls made by mobile phone subscribers. The workload has a mixture of read-only, update, and insert-only transactions. The peak throughput performance is measured from multiple concurrent processes executing the transactions until a peak performance is reached via saturation of the available resources.

Key Points and Best Practices

The Mobile Call Processing test utilized Oracle Solaris processor sets in all environments for optimum performance. This feature isolates running processes from other processes in the system. It was combined with parameters that limit memory pages to the lgroup within the processor set, and each processor set was confined to a single processor within the system.

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 26 March 2013.

SPARC T5-2 Achieves ZFS File System Encryption Benchmark World Record

Oracle continues to lead in enterprise security. Oracle's SPARC T5 processors combined with the Oracle Solaris ZFS file system demonstrate faster file system encryption than equivalent x86 systems using the Intel Xeon Processor E5-2600 Sequence chips which have AES-NI security instructions.

Encryption is the process where data is encoded for privacy and a key is needed by the data owner to access the encoded data.

  • The SPARC T5-2 server is 3.4x faster than a 2 processor Intel Xeon E5-2690 server running Oracle Solaris 11.1 that uses the AES-NI GCM security instructions for creating encrypted files.

  • The SPARC T5-2 server is 2.2x faster than a 2 processor Intel Xeon E5-2690 server running Oracle Solaris 11.1 that uses the AES-NI CCM security instructions for creating encrypted files.

  • The SPARC T5-2 server consumes a significantly smaller percentage of system resources than a 2 processor Intel Xeon E5-2690 server.

Performance Landscape

Below are results running two different ciphers for ZFS encryption. Results are presented for runs without any cipher, labeled clear, and a variety of different key lengths. The results represent the maximum delivered values measured for 3 concurrent sequential write operations using 1M blocks. Performance is measured in MB/sec (bigger is better). System utilization is reported as %CPU as measured by iostat (smaller is better).

The results for the x86 server were obtained using Oracle Solaris 11.1 with performance bug fixes.

Encryption Using AES-GCM Ciphers

GCM Encryption: 3 Concurrent Sequential Writes

System                       Clear         AES-256-GCM   AES-192-GCM   AES-128-GCM
                             MB/sec  %CPU  MB/sec  %CPU  MB/sec  %CPU  MB/sec  %CPU
SPARC T5-2 server            3,918   7     3,653   14    3,676   15    3,628   14
SPARC T4-2 server            2,912   11    2,662   31    2,663   30    2,779   31
2-Socket Intel Xeon E5-2690  3,969   42    1,062   58    1,067   58    1,076   57
SPARC T5-2 vs x86 server     1.0x          3.4x          3.4x          3.4x

Encryption Using AES-CCM Ciphers

CCM Encryption: 3 Concurrent Sequential Writes

System                       Clear         AES-256-CCM   AES-192-CCM   AES-128-CCM
                             MB/sec  %CPU  MB/sec  %CPU  MB/sec  %CPU  MB/sec  %CPU
SPARC T5-2 server            3,862   7     3,665   15    3,622   14    3,707   12
SPARC T4-2 server            2,945   11    2,471   26    2,801   26    2,442   25
2-Socket Intel Xeon E5-2690  3,868   42    1,566   64    1,632   63    1,689   66
SPARC T5-2 vs x86 server     1.0x          2.3x          2.2x          2.2x
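The "SPARC T5-2 vs x86 server" rows are the ratio of the two systems' write bandwidth for each cipher. A minimal sketch that recomputes those rows from the MB/sec values in the two tables is shown below; the data structure and function name are illustrative.

```python
# Write bandwidth in MB/sec, copied from the GCM and CCM tables above.
MB_PER_SEC = {
    "GCM": {
        "SPARC T5-2": {"clear": 3918, "AES-256": 3653, "AES-192": 3676, "AES-128": 3628},
        "x86": {"clear": 3969, "AES-256": 1062, "AES-192": 1067, "AES-128": 1076},
    },
    "CCM": {
        "SPARC T5-2": {"clear": 3862, "AES-256": 3665, "AES-192": 3622, "AES-128": 3707},
        "x86": {"clear": 3868, "AES-256": 1566, "AES-192": 1632, "AES-128": 1689},
    },
}


def speedups(mode):
    """Return the SPARC T5-2 / x86 bandwidth ratio per cipher, to one decimal place."""
    sparc = MB_PER_SEC[mode]["SPARC T5-2"]
    x86 = MB_PER_SEC[mode]["x86"]
    return {cipher: round(sparc[cipher] / x86[cipher], 1) for cipher in sparc}


if __name__ == "__main__":
    for mode in MB_PER_SEC:
        print(mode, speedups(mode))
```

The %CPU columns are deliberately left out of the calculation, since utilization on multithreaded cores does not translate linearly into spare capacity.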

Configuration Summary

Storage Configuration:

Sun Storage 6780 array
4 CSM2 trays, each with 16 83GB 15K RPM drives
8x 8 GB/sec Fiber Channel ports per host
R0 Write cache enabled, controller mirroring off for peak write bandwidth
8 Drive R0 512K stripe pools mirrored via ZFS to storage

Sun Storage 6580 array
9 CSM2 trays, each with 16 136GB 15K RPM drives
8x 4 GB/sec Fiber Channel ports per host
R0 Write cache enabled, controller mirroring off for peak write bandwidth
4 Drive R0 512K stripe pools mirrored via ZFS to storage

Server Configuration:

SPARC T5-2 server
2 x SPARC T5 3.6 GHz processors
512 GB memory
Oracle Solaris 11.1

SPARC T4-2 server
2 x SPARC T4 2.85 GHz processors
256 GB memory
Oracle Solaris 11.1

Sun Server X3-2L server
2 x Intel Xeon E5-2690, 2.90 GHz processors
128 GB memory
Oracle Solaris 11.1

Switch Configuration:

Brocade 5300 FC switch

Benchmark Description

This benchmark evaluates secure file system performance by measuring the rate at which encrypted data can be written. The Vdbench tool was used to generate the IO load. The test performed 3 concurrent sequential write operations using 1M blocks to 3 separate files.

Key Points and Best Practices

  • ZFS encryption is integrated with the ZFS command set. Like other ZFS operations, encryption operations such as key changes and re-key are performed online.

  • Data is encrypted using AES (Advanced Encryption Standard) with key lengths of 256, 192, and 128 in the CCM and GCM operation modes.

  • The flexibility of encrypting specific file systems is a key feature.

  • ZFS encryption is inheritable to descendent file systems. Key management can be delegated through ZFS delegated administration.

  • ZFS encryption uses the Oracle Solaris Cryptographic Framework which gives it access to SPARC T5 and Intel Xeon E5-2690 processor hardware acceleration or to optimized software implementations of the encryption algorithms automatically.

  • On modern computers with multiple threads per core, simple statistics like %utilization measured in tools like iostat and vmstat are not "hard" indications of the resources that might be available for other processing. For example, 90% idle may not mean that 10 times the work can be done. So drawing numerical conclusions must be done carefully.

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 26, 2013.

SPARC T5-2 Obtains Oracle Internet Directory Benchmark World Record Performance

Oracle's SPARC T5-2 server running Oracle Internet Directory (OID, Oracle's LDAP Directory Server) on Oracle Solaris 11 achieved a record result for LDAP searches/second with 1000 clients.

  • The SPARC T5-2 server running Oracle Internet Directory on Oracle Solaris 11 achieved a result of 944,624 LDAP searches/sec with an average latency of 1.05 ms with 1000 clients.

  • The SPARC T5-2 server running Oracle Internet Directory demonstrated 2.7x better throughput and 39% better latency than a similarly configured OID and SPARC T4 benchmark environment.

  • The SPARC T5-2 server running Oracle Internet Directory demonstrates 39% better throughput and latency for LDAP searches in a core-to-core comparison against an x86 system configured with two Intel Xeon X5675 processors.

  • Oracle Internet Directory achieved near-linear scaling on the SPARC T5-2 server, from 68,399 LDAP searches/sec with 2 cores to 944,624 LDAP searches/sec with 32 cores.

  • Oracle Internet Directory and the SPARC T5-2 server achieved up to 12,453 LDAP modifies/sec with an average latency of 3.9 msec for 50 clients.

Performance Landscape

Oracle Internet Directory Tests
System      c/c/th    Search               Modify               Add
                      ops/sec  lat (msec)  ops/sec  lat (msec)  ops/sec  lat (msec)
SPARC T5-2  2/32/256  944,624  1.05        12,453   3.9         888      17.9
SPARC T4-4  4/32/256  682,000  1.46        12,000   4.0         835      19.0

In order to compare the SPARC T5-2 to a 12-core x86 system, only 1 processor and 12 cores were used in the SPARC T5-2.

Oracle Internet Directory Tests – Comparing Against x86
System                 c/c/th   Search               Compare              Authentication
                                ops/sec  lat (msec)  ops/sec  lat (msec)  ops/sec  lat (msec)
SPARC T5-2             1/12/96  417,000  1.19        274,185  1.82        149,623  3.30
x86 2 x Intel X5675    2/12/24  299,000  1.66        202,433  2.46        119,198  4.19

Scaling runs were also made on the SPARC T5-2 server.

Scaling of Search Tests – SPARC T5-2
Cores Clients ops/sec Latency (msec)
32 1000 944,624 1.05
24 1000 823,741 1.21
16 500 560,709 0.88
8 500 270,601 1.84
4 100 145,879 0.68
2 100 68,399 1.46
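Two quick checks can be made on the scaling table. First, Little's Law (requests in flight = throughput x response time) should roughly recover the client count in each row, assuming each client issues its next search as soon as the previous one returns. Second, per-core throughput shows how close the scaling is to linear. The sketch below is an illustration under those assumptions, with hypothetical names; note that the client count also changes between rows, so the rows are not a pure core-count comparison.

```python
# (cores, clients, searches/sec, average latency in msec), copied from the table above.
SCALING = [
    (32, 1000, 944_624, 1.05),
    (24, 1000, 823_741, 1.21),
    (16, 500, 560_709, 0.88),
    (8, 500, 270_601, 1.84),
    (4, 100, 145_879, 0.68),
    (2, 100, 68_399, 1.46),
]


def in_flight(ops_per_sec, latency_msec):
    """Little's Law: mean requests in flight = throughput * mean response time."""
    return ops_per_sec * latency_msec / 1000.0


def per_core(cores, ops_per_sec):
    """Search throughput delivered per enabled core."""
    return ops_per_sec / cores


# Compare the largest configuration (first row) with the smallest (last row).
top_cores, _, top_ops, _ = SCALING[0]
base_cores, _, base_ops, _ = SCALING[-1]
speedup = top_ops / base_ops
efficiency = speedup / (top_cores / base_cores)  # 1.0 would be perfectly linear

if __name__ == "__main__":
    for cores, clients, ops, lat in SCALING:
        print(f"{cores:>2} cores: {in_flight(ops, lat):7.1f} in flight "
              f"vs {clients} clients, {per_core(cores, ops):,.0f} searches/sec per core")
    print(f"2 -> 32 cores: {speedup:.1f}x throughput, {efficiency:.0%} of linear")
```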

Configuration Summary

System Under Test:

SPARC T5-2
2 x SPARC T5 processors, 3.6 GHz
512 GB memory
4 x 300 GB internal disks
Flash Storage (used for database and log files)
1 x Sun Storage 2540-M2 (used for redo logs)
Oracle Solaris 11.1
Oracle Internet Directory 11g Release 1 PS6 (11.1.1.7.0)
Oracle Database 11g Enterprise Edition 11.2.0.3 (64-bit)

Benchmark Description

Oracle Internet Directory (OID) is Oracle's LDAPv3 Directory Server. The throughput of five key operations is measured: Search, Compare, Modify, Mix and Add.

LDAP Search Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Search operations. The salient characteristics of this test scenario are as follows:

  • SLAMD SearchRate job was used.
  • BaseDN of the search is the root of the DIT, the scope is SUBTREE, the search filter is of the form UID=, and DN and UID are the required attributes.
  • Each LDAP search operation matches a single entry.
  • The total number of concurrent clients was 1000, distributed among two client nodes.
  • Each client binds to OID once and performs repeated LDAP Search operations. Each search operation looks up a unique entry, so that no client looks up the same entry twice, no two clients look up the same entry, and all entries are searched randomly.
  • In one run of the test, random entries from the 50 Million entries are looked up in as many LDAP Search operations.
  • Test job was run for 60 minutes.

LDAP Compare Operations Test

This test scenario involved concurrent clients binding once to OID and then performing repeated LDAP Compare operations on the userpassword attribute. The salient characteristics of this test scenario are as follows:

  • SLAMD CompareRate job was used.
  • Each LDAP compare operation matches the password of a user.
  • The total number of concurrent clients was 1000, distributed among two client nodes.
  • Each client binds to OID once and performs repeated LDAP compare operations.
  • In one run of the test, random entries from the 50 Million entries are compared in as many LDAP compare operations.
  • Test job was run for 60 minutes.

LDAP Modify Operations Test

This test scenario consisted of concurrent clients binding once to OID and then performing repeated LDAP Modify operations. The salient characteristics of this test scenario are as follows:

  • SLAMD LDAP modrate job was used.
  • A total of 50 concurrent LDAP clients were used.
  • Each client updates a unique entry each time and a total of 50 Million entries are updated.
  • Test job was run for 60 minutes.
  • Value length was set to 11.
  • Attribute that is being modified is not indexed.

LDAP Mixed Load Test

The test scenario involved both the LDAP search and LDAP modify clients enumerated above.

  • The ratio involved 60% LDAP search clients, 30% LDAP bind and 10% LDAP modify clients.
  • A total of 1000 concurrent LDAP clients were used and were distributed on 2 client nodes.
  • Test job was run for 60 minutes.

LDAP Add Load Test

The test scenario involved concurrent clients adding new entries as follows.

  • SLAMD standard add rate job was used.
  • A total of 500,000 entries were added.
  • A total of 16 concurrent LDAP clients were used.
  • SLAMD adds an inetorgperson objectclass entry with 21 attributes (includes operational attributes).

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 26 March 2013.

Friday Feb 22, 2013

Oracle Produces World Record SPECjbb2013 Result with Oracle Solaris and Oracle JDK

From www.spec.org

Defects Identified in SPECjbb®2013

December 9, 2014 - SPEC has identified a defect in its SPECjbb®2013 benchmark suite. SPEC has suspended sales of the benchmark software and is no longer accepting new submissions of SPECjbb®2013 results for publication on SPEC's website. Current SPECjbb®2013 licensees will receive a free copy of the new version of the benchmark when it becomes available.

SPEC is advising SPECjbb®2013 licensees and users of the SPECjbb®2013 metrics that the recently discovered defect impacts the comparability of results. This defect can significantly impact the amount of work done during the measurement period, resulting in an inflated SPECjbb®2013 metric. SPEC recommends that users not utilize these results for system comparisons without a full understanding of the impact of these defects on each benchmark result.

Additional information is available here.

Oracle, using Oracle Solaris and Oracle JDK, delivered a world record result on the SPECjbb2013 benchmark (Composite metric). This benchmark was designed by the industry to showcase Java server performance. SPECjbb2013 is the replacement for SPECjbb2005 (SPECjbb2005 will soon be retired by SPEC).

  • Oracle Solaris is 1.8x faster on the SPECjbb2013-Composite max-jOPS metric than the Red Hat Enterprise Linux result.

  • Oracle Solaris is 2.2x faster on the SPECjbb2013-Composite critical-jOPS metric than the Red Hat Enterprise Linux result.

  • The combination of Oracle Solaris 11.1 and Oracle JDK 7 update 15 delivered a result of 37,007 SPECjbb2013-Composite max-jOPS and 13,812 SPECjbb2013-Composite critical-jOPS on the SPECjbb2013 benchmark.
    (Oracle has submitted this result for review by SPEC and it is currently under review.)

From SPEC's press release, "SPECjbb2013 replaces SPECjbb2005. The new benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is expected to be used widely by all those interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community."

Performance Landscape

Results of SPECjbb2013 from www.spec.org as of February 22, 2013 and this report.

SPECjbb2013
System             Processor          SPECjbb2013-Composite    OS               JDK
                                      max-jOPS  critical-jOPS
Sun Server X2-4    4 x Intel E7-4870  37,007    13,812         Solaris 11.1     Oracle JDK 7u15
Supermicro X8DTN+  2 x Intel E5690    20,977    6,188          RHEL 6.3         Oracle JDK 7u11
Intel R1304BT      1 x Intel 1260L    6,198     1,722          Windows 2008 R2  Oracle JDK 7u11

The above table represents all of the published results on www.spec.org. SPEC allows for self publication of SPECjbb2013 results. AnandTech has taken advantage of this and has some results on their website which were run on Intel Xeon E5-2660, AMD Opteron 6380, and AMD Opteron 6376 systems. This information can be viewed at www.anandtech.com. Unfortunately, AnandTech did not follow SPEC's Fair Use requirements in disclosing information about their runs, so it is not possible to include the results in the table above.

SPECjbb2013
System                   Processor             SPECjbb2013-MultiJVM     OS                   JDK
                                               max-jOPS  critical-jOPS
HP ProLiant DL560p Gen8  4 x Intel E5-4650     66,007    16,577         Windows Server 2008  Oracle JDK 7u15
HP ProLiant ML350p Gen8  2 x Intel E5-2690     40,047    12,308         Windows Server 2008  Oracle JDK 7u15
HP ProLiant ML310e Gen8  1 x Intel E3-1280v2   12,315    2,908          Windows 2008 R2      Oracle JDK 7u15

Configuration Summary

System Under Test:

Sun Server X2-4
4 x Intel Xeon E7-4870, 2.40 GHz
Hyper-Threading enabled
Turbo Boost enabled
128 GB memory (32 x 4 GB dimms)
Oracle Solaris 11.1
Oracle JDK 7 update 15

Benchmark Description

The SPECjbb2013 benchmark has been developed from the ground up to measure performance based on the latest Java application features. It is relevant to all audiences who are interested in Java server performance, including JVM vendors, hardware developers, Java application developers, researchers and members of the academic community.

SPECjbb2013 replaces SPECjbb2005. New features include:

  • A usage model based on a world-wide supermarket company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases and data-mining operations.
  • Both a pure throughput metric and a metric that measures critical throughput under service-level agreements (SLAs) specifying response times ranging from 10ms to 500ms.
  • Support for multiple run configurations, enabling users to analyze and overcome bottlenecks at multiple layers of the system stack, including hardware, OS, JVM and application layers.
  • Exercising new Java 7 features and other important performance elements, including the latest data formats (XML), communication using compression, and messaging with security.
  • Support for virtualization and cloud environments.

Disclosure Statement

SPEC and the benchmark name SPECjbb are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of 2/22/2013, see http://www.spec.org for more information. Sun Server X2-4 37007 SPECjbb2013-Composite max-jOPS, 13812 SPECjbb2013-Composite critical-jOPS.

Thursday Apr 12, 2012

Sun Fire X4270 M3 SAP Enhancement Package 4 for SAP ERP 6.0 (Unicode) Two-Tier Standard Sales and Distribution (SD) Benchmark

Oracle's Sun Fire X4270 M3 server (now known as Sun Server X3-2L) achieved 8,320 SAP SD Benchmark users running SAP enhancement package 4 for SAP ERP 6.0 with unicode software using Oracle Database 11g and Oracle Solaris 10.

  • The Sun Fire X4270 M3 server using Oracle Database 11g and Oracle Solaris 10 beat both the IBM Flex System x240 and IBM System x3650 M4 servers running DB2 9.7 and Windows Server 2008 R2 Enterprise Edition.

  • The Sun Fire X4270 M3 server running Oracle Database 11g and Oracle Solaris 10 beat the HP ProLiant BL460c Gen8 server using SQL Server 2008 and Windows Server 2008 R2 Enterprise Edition by 6%.

  • The Sun Fire X4270 M3 server using Oracle Database 11g and Oracle Solaris 10 beat Cisco UCS C240 M3 server running SQL Server 2008 and Windows Server 2008 R2 Datacenter Edition by 9%.

  • The Sun Fire X4270 M3 server running Oracle Database 11g and Oracle Solaris 10 beat the Fujitsu PRIMERGY RX300 S7 server using SQL Server 2008 and Windows Server 2008 R2 Enterprise Edition by 10%.

Performance Landscape

SAP-SD 2-Tier Performance Table (in decreasing performance order).

SAP ERP 6.0 Enhancement Pack 4 (Unicode) Results
(benchmark version from January 2009 to April 2012)

All systems: 2 x Intel Xeon E5-2690 @ 2.90 GHz, 128 GB memory, SAP ERP/ECC release 2009 6.0 EP4 (Unicode)

System                     OS                         Database             Users  SAPS    SAPS/Proc  Date
Sun Fire X4270 M3          Oracle Solaris 10          Oracle Database 11g  8,320  45,570  22,785     10-Apr-12
IBM Flex System x240       Windows Server 2008 R2 EE  DB2 9.7              7,960  43,520  21,760     11-Apr-12
HP ProLiant BL460c Gen8    Windows Server 2008 R2 EE  SQL Server 2008      7,865  42,920  21,460     29-Mar-12
IBM System x3650 M4        Windows Server 2008 R2 EE  DB2 9.7              7,855  42,880  21,440     06-Mar-12
Cisco UCS C240 M3          Windows Server 2008 R2 DE  SQL Server 2008      7,635  41,800  20,900     06-Mar-12
Fujitsu PRIMERGY RX300 S7  Windows Server 2008 R2 EE  SQL Server 2008      7,570  41,320  20,660     06-Mar-12

Complete benchmark results may be found at the SAP benchmark website http://www.sap.com/benchmark.

Configuration and Results Summary

Hardware Configuration:

Sun Fire X4270 M3
2 x 2.90 GHz Intel Xeon E5-2690 processors
128 GB memory
Sun StorageTek 6540 with 4 * 16 * 300GB 15Krpm 4Gb FC-AL

Software Configuration:

Oracle Solaris 10
Oracle Database 11g
SAP enhancement package 4 for SAP ERP 6.0 (Unicode)

Certified Results (published by SAP):

Number of benchmark users:
8,320
Average dialog response time:
0.95 seconds
Throughput:

Fully processed order line:
911,330

Dialog steps/hour:
2,734,000

SAPS:
45,570
SAP Certification:
2012014
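The SAPS figure can be cross-checked against the certified throughput numbers. SAP defines 100 SAPS as 2,000 fully processed order line items per hour, which corresponds to 6,000 dialog steps per hour. The sketch below applies that definition to the values above and also recomputes the user-count advantages quoted in the highlights; the variable and function names are illustrative, and the rounding to the nearest 10 is an assumption about how the published figure was derived.

```python
# Certified values for the Sun Fire X4270 M3, copied from the results above.
LINE_ITEMS_PER_HOUR = 911_330
DIALOG_STEPS_PER_HOUR = 2_734_000
PROCESSORS = 2

# 100 SAPS = 2,000 fully processed order line items/hour = 6,000 dialog steps/hour.
saps_from_line_items = LINE_ITEMS_PER_HOUR * 100 / 2_000
saps_from_dialog_steps = DIALOG_STEPS_PER_HOUR * 100 / 6_000

# Assumption: the published SAPS value is rounded to the nearest 10.
published_saps = round(saps_from_line_items, -1)
saps_per_proc = published_saps / PROCESSORS

# Benchmark users from the performance table.
USERS = {
    "Sun Fire X4270 M3": 8320,
    "HP ProLiant BL460c Gen8": 7865,
    "Cisco UCS C240 M3": 7635,
    "Fujitsu PRIMERGY RX300 S7": 7570,
}


def lead_pct(competitor, ours="Sun Fire X4270 M3"):
    """Whole-number percentage by which `ours` supports more users than `competitor`."""
    return round((USERS[ours] / USERS[competitor] - 1) * 100)


if __name__ == "__main__":
    print(f"SAPS from line items:   {saps_from_line_items:,.1f}")
    print(f"SAPS from dialog steps: {saps_from_dialog_steps:,.1f}")
    print(f"SAPS per processor:     {saps_per_proc:,.0f}")
    for name in list(USERS)[1:]:
        print(f"Lead over {name}: {lead_pct(name)}%")
```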

Benchmark Description

The SAP Standard Application SD (Sales and Distribution) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing, and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

Disclosure Statement

Two-tier SAP Sales and Distribution (SD) standard SAP SD benchmark based on SAP enhancement package 4 for SAP ERP 6.0 (Unicode) application benchmark as of 04/11/12: Sun Fire X4270 M3 (2 processors, 16 cores, 32 threads) 8,320 SAP SD Users, 2 x 2.90 GHz Intel Xeon E5-2690, 128 GB memory, Oracle 11g, Solaris 10, Cert# 2012014. IBM Flex System x240 (2 processors, 16 cores, 32 threads) 7,960 SAP SD Users, 2 x 2.90 GHz Intel Xeon E5-2690, 128 GB memory, DB2 9.7, Windows Server 2008 R2 EE, Cert# 2012016. IBM System x3650 M4 (2 processors, 16 cores, 32 threads) 7,855 SAP SD Users, 2 x 2.90 GHz Intel Xeon E5-2690, 128 GB memory, DB2 9.7, Windows Server 2008 R2 EE, Cert# 2012010. Cisco UCS C240 M3 (2 processors, 16 cores, 32 threads) 7,635 SAP SD Users, 2 x 2.90 GHz Intel Xeon E5-2690, 128 GB memory, SQL Server 2008, Windows Server 2008 R2 DE, Cert# 2012011. Fujitsu PRIMERGY RX300 S7 (2 processors, 16 cores, 32 threads) 7,570 SAP SD Users, 2 x 2.90 GHz Intel Xeon E5-2690, 128 GB memory, SQL Server 2008, Windows Server 2008 R2 EE, Cert# 2012008. HP ProLiant DL380p Gen8 (2 processors, 16 cores, 32 threads) 7,865 SAP SD Users, 2 x 2.90 GHz Intel Xeon E5-2690, 128 GB memory, SQL Server 2008, Windows Server 2008 R2 EE, Cert# 2012012.

SAP, R/3 are registered trademarks of SAP AG in Germany and other countries. More information may be found at www.sap.com/benchmark.

Tuesday Apr 10, 2012

SPEC CPU2006 Results on Oracle's Sun x86 Servers

Oracle's new Sun x86 servers delivered world records on the benchmarks SPECfp2006 and SPECint_rate2006 for two processor servers. This was accomplished with Oracle Solaris 11 and Oracle Solaris Studio 12.3 software.

  • The Sun Fire X4170 M3 (now known as Sun Server X3-2) server achieved a world record result for the SPECfp2006 benchmark with a score of 96.8.

  • The Sun Blade X6270 M3 server module (now known as Sun Blade X3-2B) produced best integer throughput performance for all 2-socket servers with a SPECint_rate2006 score of 705.

  • The Sun x86 servers with Intel Xeon E5-2690 2.9 GHz processors produced a cross-generational performance improvement up to 1.8x over the previous generation, Sun x86 M2 servers.

Performance Landscape

Complete benchmark results are at the SPEC website, SPEC CPU2006 Results. The tables below provide the new Oracle results as well as select results from other vendors.

SPECint2006
System Processor c/c/t * Peak Base O/S Compiler
Fujitsu PRIMERGY BX924 S3 Intel E5-2690, 2.9 GHz 2/16/16 60.8 56.0 RHEL 6.2 Intel 12.1.2.273
Sun Fire X4170 M3 Intel E5-2690, 2.9 GHz 2/16/32 58.5 54.3 Oracle Linux 6.1 Intel 12.1.0.225
Sun Fire X4270 M2 Intel X5690, 3.47 GHz 2/12/12 46.2 43.9 Oracle Linux 5.5 Intel 12.0.1.116

SPECfp2006
System Processor c/c/t * Peak Base O/S Compiler
Sun Fire X4170 M3 Intel E5-2690, 2.9 GHz 2/16/32 96.8 86.4 Oracle Solaris 11 Studio 12.3
Sun Blade X6270 M3 Intel E5-2690, 2.9 GHz 2/16/32 96.0 85.2 Oracle Solaris 11 Studio 12.3
Sun Fire X4270 M3 Intel E5-2690, 2.9 GHz 2/16/32 95.9 85.1 Oracle Solaris 11 Studio 12.3
Fujitsu CELSIUS R920 Intel E5-2687, 2.9 GHz 2/16/16 93.8 87.6 RHEL 6.1 Intel 12.1.2.273
Sun Fire X4270 M2 Intel X5690, 3.47 GHz 2/12/24 64.2 59.2 Oracle Solaris 10 Studio 12.2

Only 2-chip server systems are listed below; workstations are excluded.

SPECint_rate2006
System Processor Base Copies c/c/t * Peak Base O/S Compiler
Sun Blade X6270 M3 Intel E5-2690, 2.9 GHz 32 2/16/32 705 632 Oracle Solaris 11 Studio 12.3
Sun Fire X4270 M3 Intel E5-2690, 2.9 GHz 32 2/16/32 705 630 Oracle Solaris 11 Studio 12.3
Sun Fire X4170 M3 Intel E5-2690, 2.9 GHz 32 2/16/32 702 628 Oracle Solaris 11 Studio 12.3
Cisco UCS C220 M3 Intel E5-2690, 2.9 GHz 32 2/16/32 697 671 RHEL 6.2 Intel 12.1.0.225
Sun Blade X6270 M2 Intel X5690, 3.47 GHz 24 2/12/24 410 386 Oracle Linux 5.5 Intel 12.0.1.116

SPECfp_rate2006
System Processor Base Copies c/c/t * Peak Base O/S Compiler
Cisco UCS C240 M3 Intel E5-2690, 2.9 GHz 32 2/16/32 510 496 RHEL 6.2 Intel 12.1.2.273
Sun Fire X4270 M3 Intel E5-2690, 2.9 GHz 64 2/16/32 497 461 Oracle Solaris 11 Studio 12.3
Sun Blade X6270 M3 Intel E5-2690, 2.9 GHz 32 2/16/32 497 460 Oracle Solaris 11 Studio 12.3
Sun Fire X4170 M3 Intel E5-2690, 2.9 GHz 64 2/16/32 495 464 Oracle Solaris 11 Studio 12.3
Sun Fire X4270 M2 Intel X5690, 3.47 GHz 24 2/12/24 273 265 Oracle Linux 5.5 Intel 12.0.1.116

* c/c/t — chips / cores / threads enabled
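The "up to 1.8x" cross-generational figure can be checked against the peak scores in the tables above. The following is a minimal Python sketch; the dictionary layout and function name are illustrative, and pairing the best Sun M3 peak result with the Sun M2 result in each table is an assumption made here for the comparison.

```python
# Peak scores copied from the tables above: (best Sun M3 result, Sun M2 result).
PEAK_SCORES = {
    "SPECint2006": (58.5, 46.2),
    "SPECfp2006": (96.8, 64.2),
    "SPECint_rate2006": (705, 410),
    "SPECfp_rate2006": (497, 273),
}


def generational_speedups(scores):
    """Return the new/old ratio for each metric."""
    return {metric: new / old for metric, (new, old) in scores.items()}


speedups = generational_speedups(PEAK_SCORES)
for metric, ratio in speedups.items():
    print(f"{metric}: {ratio:.2f}x")

# The metric with the largest generational gain.
best_metric = max(speedups, key=speedups.get)
print("largest gain:", best_metric)
```

Under these pairings the largest ratio comes from the floating point rate metric, which is where the rounded 1.8x figure appears to originate.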

Configuration Summary and Results

Hardware Configuration:

Sun Fire X4170 M3 server
2 x 2.90 GHz Intel Xeon E5-2690 processors
128 GB memory (16 x 8 GB 2Rx4 PC3-12800R-11, ECC)

Sun Fire X4270 M3 server
2 x 2.90 GHz Intel Xeon E5-2690 processors
128 GB memory (16 x 8 GB 2Rx4 PC3-12800R-11, ECC)

Sun Blade X6270 M3 server module
2 x 2.90 GHz Intel Xeon E5-2690 processors
128 GB memory (16 x 8 GB 2Rx4 PC3-12800R-11, ECC)

Software Configuration:

Oracle Solaris 11 11/11 (SRU2)
Oracle Solaris Studio 12.3 (patch update 1 nightly build 120313)
Oracle Linux Server Release 6.1
Intel C++ Studio XE 12.1.0.225
SPEC CPU2006 V1.2

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark. It measures:

  • Speed — single copy performance of chip, memory, compiler
  • Rate — multiple copy (throughput)

The benchmark is also divided into integer intensive applications and floating point intensive applications:

  • integer: 12 benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • floating point: 17 benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

It is also divided depending upon the amount of optimization allowed:

  • base: optimization is consistent per compiled language, all benchmarks must be compiled with the same flags per language.
  • peak: specific compiler optimization is allowed per application.

The overall metrics for the benchmark which are commonly used are:

  • SPECint_rate2006, SPECint_rate_base2006: integer, rate
  • SPECfp_rate2006, SPECfp_rate_base2006: floating point, rate
  • SPECint2006, SPECint_base2006: integer, speed
  • SPECfp2006, SPECfp_base2006: floating point, speed

See here for additional information.
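As an illustration of how an overall metric is assembled, SPEC CPU2006 reports each overall score as the geometric mean of the per-benchmark ratios. The sketch below uses made-up ratios rather than measured data, and the function name is hypothetical.

```python
import math


def overall_score(ratios):
    """Geometric mean of per-benchmark ratios."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark ratios, for illustration only.
example_ratios = [50.0, 100.0, 200.0]
print(f"{overall_score(example_ratios):.1f}")
```

A geometric mean keeps a single very fast benchmark from dominating the overall score the way it would with an arithmetic mean.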

See Also

Disclosure Statement

SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of 10 April 2012 from www.spec.org and this report.

SPEC CPU2006 Results on Oracle's Netra Server X3-2

Oracle's Netra Server X3-2 (formerly Sun Netra X4270 M3), equipped with the new Intel Xeon processor E5-2658, is up to 2.5x faster than the previous generation Netra systems on SPEC CPU2006 workloads.

Performance Landscape

Complete benchmark results are at the SPEC website, SPEC CPU2006 results. The tables below provide the new Oracle results and previous generation results.

SPECint2006
System Processor c/c/t * Peak Base O/S Compiler
Netra Server X3-2 Intel E5-2658, 2.1 GHz 2/16/32 38.5 36.0 Oracle Linux 6.1 Intel 12.1.0.225
Sun Netra X4270 Intel L5518, 2.13 GHz 2/8/16 27.9 25.0 Oracle Linux 5.4 Intel 11.1
Sun Netra X4250 Intel L5408, 2.13 GHz 2/8/8 20.3 17.9 SLES 10 SP1 Intel 11.0

SPECfp2006
System Processor c/c/t * Peak Base O/S Compiler
Netra Server X3-2 Intel E5-2658, 2.1 GHz 2/16/32 65.3 61.6 Oracle Linux 6.1 Intel 12.1.0.225
Sun Netra X4270 Intel L5518, 2.13 GHz 2/8/16 32.5 29.4 Oracle Linux 5.4 Intel 11.1
Sun Netra X4250 Intel L5408, 2.13 GHz 2/8/8 18.5 17.7 SLES 10 SP1 Intel 11.0

SPECint_rate2006
System Processor Base Copies c/c/t * Peak Base O/S Compiler
Netra Server X3-2 Intel E5-2658, 2.1 GHz 32 2/16/32 477 455 Oracle Linux 6.1 Intel 12.1.0.225
Sun Netra X4270 Intel L5518, 2.13 GHz 16 2/8/16 201 189 Oracle Linux 5.4 Intel 11.1
Sun Netra X4250 Intel L5408, 2.13 GHz 8 2/8/8 103 82.0 SLES 10 SP1 Intel 11.0

SPECfp_rate2006
System Processor Base Copies c/c/t * Peak Base O/S Compiler
Netra Server X3-2 Intel E5-2658, 2.1 GHz 32 2/16/32 392 383 Oracle Linux 6.1 Intel 12.1.0.225
Sun Netra X4270 Intel L5518, 2.13 GHz 16 2/8/16 155 153 Oracle Linux 5.4 Intel 11.1
Sun Netra X4250 Intel L5408, 2.13 GHz 8 2/8/8 55.9 52.3 SLES 10 SP1 Intel 11.0

* c/c/t — chips / cores / threads enabled

Configuration Summary

Hardware Configuration:

Netra Server X3-2
2 x 2.10 GHz Intel Xeon E5-2658 processors
128 GB memory (16 x 8 GB 2Rx4 PC3-12800R-11, ECC)

Software Configuration:

Oracle Linux Server Release 6.1
Intel C++ Studio XE 12.1.0.225
SPEC CPU2006 V1.2

Benchmark Description

SPEC CPU2006 is SPEC's most popular benchmark. It measures:

  • Speed — single copy performance of chip, memory, compiler
  • Rate — multiple copy (throughput)

The benchmark is also divided into integer intensive applications and floating point intensive applications:

  • integer: 12 benchmarks derived from real applications such as perl, gcc, XML processing, and pathfinding
  • floating point: 17 benchmarks derived from real applications, including chemistry, physics, genetics, and weather.

It is also divided depending upon the amount of optimization allowed:

  • base: optimization is consistent per compiled language, all benchmarks must be compiled with the same flags per language.
  • peak: specific compiler optimization is allowed per application.

The overall metrics for the benchmark which are commonly used are:

  • SPECint_rate2006, SPECint_rate_base2006: integer, rate
  • SPECfp_rate2006, SPECfp_rate_base2006: floating point, rate
  • SPECint2006, SPECint_base2006: integer, speed
  • SPECfp2006, SPECfp_base2006: floating point, speed

See here for additional information.

See Also

Disclosure Statement

SPEC and the benchmark names SPECfp and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. Results as of 10 July 2012 from www.spec.org and this report.

Thursday Mar 29, 2012

Sun Server X2-8 (formerly Sun Fire X4800 M2) Delivers World Record TPC-C for x86 Systems

Oracle's Sun Server X2-8 (formerly Sun Fire X4800 M2 server) equipped with eight 2.4 GHz Intel Xeon Processor E7-8870 chips obtained a result of 5,055,888 tpmC on the TPC-C benchmark. This result is a world record for x86 servers. Oracle demonstrated this world record database performance running Oracle Database 11g Release 2 Enterprise Edition with Partitioning.

  • The Sun Server X2-8 delivered a new x86 TPC-C world record of 5,055,888 tpmC with a price performance of $0.89/tpmC using Oracle Database 11g Release 2. This configuration is available 7/10/12.

  • The Sun Server X2-8 delivers 3.0x better performance than the next 8-processor result, an IBM System p 570 equipped with POWER6 processors.

  • The Sun Server X2-8 has 3.1x better price/performance than the 8-processor 4.7 GHz POWER6 IBM System p 570.

  • The Sun Server X2-8 has 1.6x better performance than the 4-processor IBM x3850 X5 system equipped with Intel Xeon processors.

  • This is the first TPC-C result on any system using eight Intel Xeon Processor E7-8800 Series chips.

  • The Sun Server X2-8 is the first x86 system to get over 5 million tpmC.

  • The Oracle solution utilized Oracle Linux operating system and Oracle Database 11g Enterprise Edition Release 2 with Partitioning to produce the x86 world record TPC-C benchmark performance.

Performance Landscape

Select TPC-C results (sorted by tpmC, bigger is better)

System p/c/t tpmC Price/tpmC Avail Database Memory Size
Sun Server X2-8 8/80/160 5,055,888 0.89 USD 7/10/2012 Oracle 11g R2 4 TB
IBM x3850 X5 4/40/80 3,014,684 0.59 USD 7/11/2011 DB2 ESE 9.7 3 TB
IBM x3850 X5 4/32/64 2,308,099 0.60 USD 5/20/2011 DB2 ESE 9.7 1.5 TB
IBM System p 570 8/16/32 1,616,162 3.54 USD 11/21/2007 DB2 9.0 2 TB

p/c/t - processors, cores, threads
Avail - availability date

Oracle and IBM TPC-C Response times

System tpmC New Order 90th% Response Time (sec) New Order Average Response Time (sec)

Sun Server X2-8 5,055,888 0.210 0.166
IBM x3850 X5 3,014,684 0.500 0.272
Ratios - Oracle Better 1.6x 1.4x 1.3x

Oracle uses average new order response time for comparison between Oracle and IBM.

Graphs of Oracle's and IBM's response times for New-Order can be found in the full disclosure reports on TPC's website TPC-C Official Result Page.

Configuration Summary and Results

Hardware Configuration:

Server
Sun Server X2-8
8 x 2.4 GHz Intel Xeon Processor E7-8870
4 TB memory
8 x 300 GB 10K RPM SAS internal disks
8 x dual-port 8 Gb/s FC HBAs

Data Storage
10 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with
1 x 3.06 GHz Intel Xeon X5675 processor
8 GB memory
10 x 2 TB 7.2K RPM 3.5" SAS disks
2 x Sun Storage F5100 Flash Array storage (1.92 TB each)
1 x Brocade 5300 switch

Redo Storage
2 x Sun Fire X4270 M2 servers configured as COMSTAR heads, each with
1 x 3.06 GHz Intel Xeon X5675 processor
8 GB memory
11 x 2 TB 7.2K RPM 3.5" SAS disks

Clients
8 x Sun Fire X4170 M2 servers, each with
2 x 3.06 GHz Intel Xeon X5675 processors
48 GB memory
2 x 300 GB 10K RPM SAS disks

Software Configuration:

Oracle Linux (Sun Fire X4800 M2)
Oracle Solaris 11 Express (COMSTAR for Sun Fire X4270 M2)
Oracle Solaris 10 9/10 (Sun Fire X4170 M2)
Oracle Database 11g Release 2 Enterprise Edition with Partitioning
Oracle iPlanet Web Server 7.0 U5
Tuxedo CFS-R Tier 1

Results:

System: Sun Server X2-8
tpmC: 5,055,888
Price/tpmC: 0.89 USD
Available: 7/10/2012
Database: Oracle Database 11g
Cluster: no
New Order Average Response: 0.166 seconds

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

Key Points and Best Practices

  • Oracle Database 11g Release 2 Enterprise Edition with Partitioning scales easily to this high level of performance.

  • COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI Target platform. COMSTAR uses a modular approach to break the huge task of handling all the different pieces in a SCSI target subsystem into independent functional modules, which are glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at the SCSI level (disk, tape, medium changer, etc.) are not required to know about the underlying transport, and the modules implementing the transport protocol (FC, iSCSI, etc.) are not aware of the SCSI-level functionality of the packets they are transporting. The framework hides the details of allocation, provides the execution context and cleanup of SCSI commands and associated resources, and simplifies the task of writing the SCSI or transport modules.

  • Oracle iPlanet Web Server middleware is used for the client tier of the benchmark. Each web server instance supports more than a quarter-million users while satisfying the response time requirement from the TPC-C benchmark.
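The separation COMSTAR makes between SCSI-level modules and transport modules, noted in the key points above, can be pictured with a small Python sketch. This is an analogy only: every class and method name below is hypothetical and does not correspond to the actual COMSTAR or STMF programming interfaces.

```python
from typing import Protocol


class LogicalUnit(Protocol):
    """SCSI-level module (disk, tape, ...); knows nothing about transports."""

    def execute(self, command: bytes) -> bytes: ...


class Transport(Protocol):
    """Transport module (FC, iSCSI, ...); knows nothing about SCSI semantics."""

    def receive(self) -> bytes: ...

    def send(self, payload: bytes) -> None: ...


class Framework:
    """Stand-in for the glue layer that routes commands between both sides."""

    def __init__(self, lu: LogicalUnit) -> None:
        self._lu = lu

    def serve_one(self, transport: Transport) -> None:
        command = transport.receive()
        transport.send(self._lu.execute(command))


class EchoDisk:
    """Toy logical unit that acknowledges every command."""

    def execute(self, command: bytes) -> bytes:
        return b"OK:" + command


class LoopbackTransport:
    """Toy transport that delivers one canned command and records replies."""

    def __init__(self, incoming: bytes) -> None:
        self._incoming = incoming
        self.sent = []

    def receive(self) -> bytes:
        return self._incoming

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


transport = LoopbackTransport(b"READ")
Framework(EchoDisk()).serve_one(transport)
print(transport.sent)
```

Because each side depends only on the glue layer, a new transport or a new device type can be added without touching the other side, which is the property the modular design is meant to provide.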

See Also

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Processing Performance Council (TPC). Sun Server X2-8 (8/80/160) with Oracle Database 11g Release 2 Enterprise Edition with Partitioning, 5,055,888 tpmC, $0.89 USD/tpmC, available 7/10/2012. IBM x3850 X5 (4/40/80) with DB2 ESE 9.7, 3,014,684 tpmC, $0.59 USD/tpmC, available 7/11/2011. IBM x3850 X5 (4/32/64) with DB2 ESE 9.7, 2,308,099 tpmC, $0.60 USD/tpmC, available 5/20/2011. IBM System p 570 (8/16/32) with DB2 9.0, 1,616,162 tpmC, $3.54 USD/tpmC, available 11/21/2007. Source: http://www.tpc.org/tpcc, results as of 7/15/2011.

Friday Sep 30, 2011

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on ZFS Encryption Tests

Oracle continues to lead in enterprise security. Oracle's SPARC T4 processors combined with Oracle's Solaris ZFS file system demonstrate faster file system encryption than equivalent systems based on the Intel Xeon Processor 5600 Sequence chips which use AES-NI security instructions.

Encryption is the process where data is encoded for privacy and a key is needed by the data owner to access the encoded data. The benefits of using ZFS encryption are:

  • The SPARC T4 processor is 3.5x to 5.2x faster than the Intel Xeon Processor X5670 that has the AES-NI security instructions in creating encrypted files.

  • ZFS encryption is integrated with the ZFS command set. Like other ZFS operations, encryption operations such as key changes and re-key are performed online.

  • Data is encrypted using AES (Advanced Encryption Standard) with key lengths of 256, 192, and 128 bits in the CCM and GCM operation modes.

  • The flexibility of encrypting specific file systems is a key feature.

  • ZFS encryption is inheritable to descendant file systems. Key management can be delegated through ZFS delegated administration.

  • ZFS encryption uses the Oracle Solaris Cryptographic Framework, which automatically gives it access to hardware acceleration on the SPARC T4 processor and the Intel Xeon X5670 processor (Intel AES-NI), or to optimized software implementations of the encryption algorithms.

Performance Landscape

Below are results running two different ciphers for ZFS encryption. Results are presented for runs without any cipher, labeled clear, and a variety of different key lengths.

Encryption Using AES-CCM Ciphers

MB/sec – 5 File Create* Encryption
Clear AES-256-CCM AES-192-CCM AES-128-CCM
SPARC T4-2 server 3,803 3,167 3,335 3,225
SPARC T3-2 server 2,286 1,554 1,561 1,594
2-Socket 2.93 GHz Xeon X5670 3,325 750 764 773

Speedup T4-2 vs X5670 1.1x 4.2x 4.4x 4.2x
Speedup T4-2 vs T3-2 1.7x 2.0x 2.1x 2.0x

Encryption Using AES-GCM Ciphers

MB/sec – 5 File Create* Encryption
Clear AES-256-GCM AES-192-GCM AES-128-GCM
SPARC T4-2 server 3,618 3,929 3,164 2,613
SPARC T3-2 server 2,278 1,451 1,455 1,449
2-Socket 2.93 GHz Xeon X5670 3,299 749 748 753

Speedup T4-2 vs X5670 1.1x 5.2x 4.2x 3.5x
Speedup T4-2 vs T3-2 1.6x 2.7x 2.2x 1.8x

(*) Maximum Delivered values measured over 5 concurrent mkfile operations.
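The speedup rows in both tables are the ratio of the MB/sec values, rounded to one decimal place. A minimal Python sketch that reproduces them from the table data follows; the data layout and function name are illustrative choices.

```python
# MB/sec values copied from the AES-CCM and AES-GCM tables above.
# Column order: Clear, AES-256, AES-192, AES-128.
RESULTS = {
    "CCM": {
        "SPARC T4-2": [3803, 3167, 3335, 3225],
        "SPARC T3-2": [2286, 1554, 1561, 1594],
        "Xeon X5670": [3325, 750, 764, 773],
    },
    "GCM": {
        "SPARC T4-2": [3618, 3929, 3164, 2613],
        "SPARC T3-2": [2278, 1451, 1455, 1449],
        "Xeon X5670": [3299, 749, 748, 753],
    },
}


def speedups(table, system, baseline):
    """Per-column throughput ratio of system over baseline, to one decimal."""
    return [round(a / b, 1) for a, b in zip(table[system], table[baseline])]


for mode, table in RESULTS.items():
    for baseline in ("Xeon X5670", "SPARC T3-2"):
        row = speedups(table, "SPARC T4-2", baseline)
        print(f"AES-{mode} T4-2 vs {baseline}:", row)
```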

Configuration Summary

Storage Configuration:

Sun Storage 6780 array
16 x 15K RPM drives
Raid 0 pool
Write back cache enable
Controller cache mirroring disabled for maximum bandwidth for test
Eight 8 Gb/sec ports per host

Server Configuration:

SPARC T4-2 server
2 x SPARC T4 2.85 GHz processors
256 GB memory
Oracle Solaris 11

SPARC T3-2 server
2 x SPARC T3 1.6 GHz processors
Oracle Solaris 11 Express 2010.11

Sun Fire X4270 M2 server
2 x Intel Xeon X5670, 2.93 GHz processors
Oracle Solaris 11

Benchmark Description

The benchmark ran the UNIX command mkfile (1M). Mkfile is a simple single-threaded program that creates a file of a specified size. The script ran 5 mkfile operations in the background and recorded the peak bandwidth observed during the test.
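The benchmark script itself is not shown in this post. As a rough analogue of its shape, the Python sketch below creates five zero-filled files concurrently and reports the aggregate write rate. It is not the original test: it uses Python threads instead of mkfile, tiny file sizes, and an average over the whole run rather than the peak bandwidth, and all names and sizes are assumptions chosen for illustration.

```python
import os
import tempfile
import threading
import time


def create_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of zeros to path, roughly what mkfile does."""
    block = bytes(chunk)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the file system


def concurrent_create(directory, files=5, size_bytes=8 << 20):
    """Create the files concurrently and return the aggregate MB/sec."""
    paths = [os.path.join(directory, f"file{i}") for i in range(files)]
    threads = [
        threading.Thread(target=create_file, args=(p, size_bytes)) for p in paths
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return files * size_bytes / (1 << 20) / elapsed


with tempfile.TemporaryDirectory() as d:
    rate = concurrent_create(d)
    sizes = [os.path.getsize(os.path.join(d, f"file{i}")) for i in range(5)]

print(f"{rate:.0f} MB/sec aggregate over 5 concurrent file creates")
```

To exercise an encrypted file system, the target directory would be placed on a ZFS file system created with encryption enabled; that setup is outside the scope of this sketch.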

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of December 16, 2011.

Monday Sep 19, 2011

Halliburton ProMAX® Seismic Processing on Sun Blade X6270 M2 with Sun ZFS Storage 7320

Halliburton/Landmark's ProMAX® 3D Pre-Stack Kirchhoff Time Migration's (PSTM) single workflow scalability and multiple workflow throughput using various scheduling methods are evaluated on a cluster of Oracle's Sun Blade X6270 M2 server modules attached to Oracle's Sun ZFS Storage 7320 appliance.

Two resource scheduling methods, compact and distributed, are compared while increasing the system load with additional concurrent ProMAX® workflows.

  • Multiple concurrent 24-process ProMAX® PSTM workflow throughput is constant; 10 workflows on 10 nodes finish as fast as 1 workflow on one compute node. Additionally, processing twice the data volume yields similar traces/second throughput performance.

  • A single ProMAX® PSTM workflow scales well from 1 to 10 nodes of a Sun Blade X6270 M2 cluster, reaching 4.5X. ProMAX® scales to 4.7X on 10 nodes with one input data set and 6.3X with two consecutive input data sets (i.e. twice the data).

  • A single ProMAX® PSTM workflow has near linear scaling of 11x on a Sun Blade X6270 M2 server module when running from 1 to 12 processes.

  • The 12-thread ProMAX® workflow throughput using the distributed scheduling method is equivalent to or slightly faster than the compact scheme for 1 to 6 concurrent workflows.

Performance Landscape

Multiple 24-Process Workflow Throughput Scaling

This test measures the system throughput scalability as concurrent 24-process workflows are added, one workflow per node. The per workflow throughput and the system scalability are reported.

Aggregate system throughput scales linearly. Ten concurrent workflows finish in the same time as does one workflow on a single compute node.

Halliburton ProMAX® Pre-Stack Time Migration - Multiple Workflow Scaling


Single Workflow Scaling

This test measures single workflow scalability across a 10-node cluster. Utilizing a single data set, performance exhibits near linear scaling of 11x at 12 processes, and per-node scaling of 4x at 6 nodes; performance flattens quickly reaching a peak of 60x at 240 processors and per-node scaling of 4.7x with 10 nodes.

Running with two consecutive input data sets in the workflow, scaling is considerably improved with peak scaling ~35% higher than obtained using a single data set. Doubling the data set size minimizes time spent in workflow initialization, data input and output.

Halliburton ProMAX® Pre-Stack Time Migration - Single Workflow Scaling

This next test measures single workflow scalability across a 10-node cluster (as above) but limits scheduling to a maximum of 12 processes per node, effectively restricting each physical core to at most one process. The speedups relative to a single process and to a single node are reported.

Utilizing a single data set, performance exhibits near linear scaling of 37x at 48 processes, and per-node scaling of 4.3x at 6 nodes. Performance of 55x at 120 processors and per-node scaling of 5x with 10 nodes is reached, and scalability is trending higher more strongly compared to the case of two processes running per physical core above. For equivalent total process counts, multi-node runs using only a single process per physical core appear to run between 28% and 64% more efficiently (96 and 24 processes respectively). With a full complement of 10 nodes (120 processes), the peak performance is only 9.5% lower than with 2 processes per physical core (240 processes).

Running with two consecutive input data sets in the workflow, scaling is considerably improved with peak scaling ~35% higher than obtained using a single data set.

Halliburton ProMAX® Pre-Stack Time Migration - Single Workflow Scaling
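One way to read the per-node scaling figures quoted in the two single-workflow tests is as parallel efficiency, meaning the speedup over one node divided by the node count. The Python sketch below applies that standard definition to the single-data-set numbers from the text; the dictionary layout is illustrative.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear scaling achieved on the given node count."""
    return speedup / nodes


# Per-node scaling figures quoted in the text (single data set).
PER_NODE_SCALING = {
    "24 processes per node": {6: 4.0, 10: 4.7},
    "12 processes per node": {6: 4.3, 10: 5.0},
}

for config, points in PER_NODE_SCALING.items():
    for nodes, speedup in points.items():
        eff = parallel_efficiency(speedup, nodes)
        print(f"{config}, {nodes} nodes: {eff:.0%} of linear")
```

In both configurations the efficiency drops between 6 and 10 nodes, which is the flattening the text describes.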

Multiple 12-Process Workflow Throughput Scaling, Compact vs. Distributed Scheduling

The fourth test compares compact and distributed scheduling of 1, 2, 4, and 6 concurrent 12-processor workflows.

All things being equal, the system bisection bandwidth should improve with distributed scheduling of a fixed-size workflow; as more nodes are used for a workflow, more memory and system cache is employed and any node memory bandwidth bottlenecks can be offset by distributing communication across the network (provided the network and inter-node communication stack do not become a bottleneck). When physical cores are not over-subscribed, compact and distributed scheduling performance is within 3%, suggesting that there may be little memory contention for this workflow on the benchmarked system configuration.

With compact scheduling of two concurrent 12-processor workflows, the physical cores become over-subscribed and performance degrades 36% per workflow. With four concurrent workflows, physical cores are oversubscribed 4x and performance is seen to degrade 66% per workflow. With six concurrent workflows over-subscribed compact scheduling performance degrades 77% per workflow. As multiple 12-processor workflows become more and more distributed, the performance approaches the non over-subscribed case.

Halliburton ProMAX® Pre-Stack Time Migration - Multiple Workflow Scaling

141616 traces x 624 samples


Test Notes

All tests were performed with one input data set (70808 traces x 624 samples) and two consecutive input data sets (2 * (70808 traces x 624 samples)) in the workflow. All results reported are the average of at least 3 runs and performance is based on reported total wall-clock time by the application.

All tests were run with NFS attached Sun ZFS Storage 7320 appliance and then with NFS attached legacy Sun Fire X4500 server. The StorageTek Workload Analysis Tool (SWAT) was invoked to measure the I/O characteristics of the NFS attached storage used on separate runs of all workflows.

Configuration Summary

Hardware Configuration:

10 x Sun Blade X6270 M2 server modules, each with
2 x 3.33 GHz Intel Xeon X5680 processors
48 GB DDR3-1333 memory
4 x 146 GB, Internal 10000 RPM SAS-2 HDD
10 GbE
Hyper-Threading enabled

Sun ZFS Storage 7320 Appliance
1 x Storage Controller
2 x 2.4 GHz Intel Xeon 5620 processors
48 GB memory (12 x 4 GB DDR3-1333)
2 TB Read Cache (4 x 512 GB Read Flash Accelerator)
10 GbE
1 x Disk Shelf
20.0 TB RAID-Z (20 x 1 TB SAS-2, 7200 RPM HDD)
4 x Write Flash Accelerators

Sun Fire X4500
2 x 2.8 GHz AMD Opteron 290 processors
16 GB DDR1-400 memory
34.5 TB RAID-Z (46 x 750 GB SATA-II, 7200 RPM HDD)
10 GbE

Software Configuration:

Oracle Linux 5.5
Parallel Virtual Machine 3.3.11 (bundled with ProMAX)
Intel 11.1.038 Compilers
Libraries: pthreads 2.4, Java 1.6.0_01, BLAS, Stanford Exploration Project Libraries

Benchmark Description

The ProMAX® family of seismic data processing tools is the most widely used Oil and Gas Industry seismic processing application. ProMAX® is used for multiple applications, from field processing and quality control, to interpretive project-oriented reprocessing at oil companies and production processing at service companies. ProMAX® is integrated with Halliburton's OpenWorks® Geoscience Oracle Database to index prestack seismic data and populate the database with processed seismic.

This benchmark evaluates single workflow scalability and multiple workflow throughput of the ProMAX® 3D Prestack Kirchhoff Time Migration (PSTM) while processing the Halliburton benchmark data set containing 70,808 traces with 8 msec sample interval and trace length of 4992 msec. Benchmarks were performed with both one and two consecutive input data sets.
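The data set dimensions quoted in this post are mutually consistent, which a few lines of Python can show: a 4992 msec trace sampled every 8 msec gives the 624 samples per trace listed in the test notes. The uncompressed size estimate assumes 4-byte samples, which the post does not state.

```python
TRACES = 70_808
SAMPLE_INTERVAL_MS = 8
TRACE_LENGTH_MS = 4_992
BYTES_PER_SAMPLE = 4  # assumption: 32-bit samples; not stated in the post

samples_per_trace = TRACE_LENGTH_MS // SAMPLE_INTERVAL_MS
total_samples = TRACES * samples_per_trace
approx_mb = total_samples * BYTES_PER_SAMPLE / 1e6

print(f"samples per trace: {samples_per_trace}")
print(f"total samples: {total_samples:,}")
print(f"uncompressed size under the 4-byte assumption: {approx_mb:.1f} MB")
```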

Each workflow consisted of:

  • reading the previously constructed MPEG encoded processing parameter file
  • reading the compressed seismic data traces from disk
  • performing the PSTM imaging
  • writing the result to disk

Workflows using two input data sets were constructed by simply adding a second identical seismic data read task immediately after the first in the processing parameter file. This effectively doubled the data volume read, processed, and written.

This version of ProMAX® currently only uses Parallel Virtual Machine (PVM) as the parallel processing paradigm. The PVM software only uses TCP networking and has no internal facility for assigning memory affinity and processor binding. Every compute node runs a PVM daemon.

The ProMAX® processing parameters used for this benchmark:

Minimum output inline = 65
Maximum output inline = 85
Inline output sampling interval = 1
Minimum output xline = 1
Maximum output xline = 200 (fold)
Xline output sampling interval = 1
Antialias inline spacing = 15
Antialias xline spacing = 15
Stretch Mute Aperature Limit with Maximum Stretch = 15
Image Gather Type = Full Offset Image Traces
No Block Moveout
Number of Alias Bands = 10
3D Amplitude Phase Correction
No compression
Maximum Number of Cache Blocks = 500000

Primary PSTM business metrics are typically time-to-solution and accuracy of the subsurface imaging solution.

Key Points and Best Practices

  • Multiple job system throughput scales perfectly; ten concurrent workflows on 10 nodes each complete in the same time and have the same throughput as a single workflow running on one node.
  • Best single workflow scaling is 6.6x using 10 nodes.

    When tasked with processing several similar workflows, while individual time-to-solution will be longer, the most efficient way to run is to fully distribute them one workflow per node (or even across two nodes) and run these concurrently, rather than to use all nodes for each workflow and run them consecutively. For example, while the best-case configuration used here will run 6.6 times faster using all ten nodes compared to a single node, ten such 10-node jobs running consecutively will overall take over 50% longer to complete than ten jobs one per node running concurrently.

  • Throughput was seen to scale better with larger workflows. While throughput with both large and small workflows is similar with only one node, the larger dataset exhibits 11% and 35% more throughput with four and 10 nodes respectively.

  • 200 processes appears to be a scalability asymptote with these workflows on the systems used.
  • Hyper-Threading marginally helps throughput. For the largest model run on 10 nodes, 240 processes deliver 11% more performance than 120 processes.

  • The workflows do not exhibit significant I/O bandwidth demands. Even with 10 concurrent 24-process jobs, the measured aggregate system I/O did not exceed 100 MB/s.

  • 10 GbE was the only network used and, though shared for all interprocess communication and network attached storage, it appears to have sufficient bandwidth for all test cases run.
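The "over 50% longer" figure in the key points above follows from simple arithmetic on the 6.6x best-case speedup. The Python sketch below works it through; the function name is hypothetical, times are normalized to the single-node run time, and the concurrent case relies on the perfect throughput scaling reported for one workflow per node.

```python
def batch_times(jobs, nodes, speedup_all_nodes, single_node_time=1.0):
    """Return (concurrent, consecutive) wall-clock time for a batch of jobs.

    concurrent:  one job per node, all started together
    consecutive: each job uses every node, jobs run one after another
    """
    if jobs > nodes:
        raise ValueError("this sketch assumes at most one job per node")
    concurrent = single_node_time
    consecutive = jobs * single_node_time / speedup_all_nodes
    return concurrent, consecutive


concurrent, consecutive = batch_times(jobs=10, nodes=10, speedup_all_nodes=6.6)
extra = consecutive / concurrent - 1
print(f"consecutive 10-node runs take {extra:.0%} longer")
```

The batch finishes sooner when run concurrently, at the cost of each individual workflow taking the full single-node time.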

See Also

Disclosure Statement

The following are trademarks or registered trademarks of Halliburton/Landmark Graphics: ProMAX®, GeoProbe®, OpenWorks®. Results as of 9/1/2011.

Thursday Sep 15, 2011

Sun Fire X4800 M2 Servers (now known as Sun Server X2-8) Produce World Record on SAP SD-Parallel Benchmark

Oracle delivered an SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Sales and Distribution - Parallel (SD Parallel) Benchmark world record result using eight of Oracle's Sun Fire X4800 M2 servers (now known as Sun Server X2-8), Oracle Solaris 10 and Oracle Database 11g Real Application Clusters (RAC) software that achieved 180,000 users as of 10/03/2011.

  • The eight Sun Fire X4800 M2 servers delivered a world record result of 180,000 users on the SAP SD Parallel Benchmark.

  • The eight Sun Fire X4800 M2 server SD Parallel result of 180,000 users delivered 43% more performance compared to the IBM Power 795 server SD two-tier result of 126,063 users.

Performance Landscape

Selected SAP Sales and Distribution (SD) benchmark results are presented in decreasing order of performance. All benchmarks were using SAP enhancement package 4 for SAP ERP 6.0 (Unicode).

System (processors, memory) OS / Database Users SAPS Type Cert #
Eight Sun Fire X4800 M2 (8 x Intel Xeon E7-8870 @2.4 GHz, 512 GB) Oracle Solaris 10 / Oracle 11g RAC 180,000 1,016,380 Parallel 2011037
Six Sun Fire X4800 M2 (8 x Intel Xeon E7-8870 @2.4 GHz, 512 GB) Oracle Solaris 10 / Oracle 11g RAC 137,904 765,470 Parallel 2011038
IBM Power 795 (32 x POWER7 @4.0 GHz, 4096 GB) AIX 7.1 / DB2 9.7 126,063 688,630 Two-Tier 2010046
Four Sun Fire X4800 M2 (8 x Intel Xeon E7-8870 @2.4 GHz, 512 GB) Oracle Solaris 10 / Oracle 11g RAC 94,736 546,050 Parallel 2011039
Two Sun Fire X4800 M2 (8 x Intel Xeon E7-8870 @2.4 GHz, 512 GB) Oracle Solaris 10 / Oracle 11g RAC 49,860 274,080 Parallel 2011040
Four Sun Fire X4470 (4 x Intel Xeon X7560 @2.26 GHz, 256 GB) Solaris 10 / Oracle 11g RAC 40,000 221,020 Parallel 2010039

Complete benchmark results and descriptions can be found at the SAP standard applications benchmark website.
For SD benchmark results website: Two-Tier or Three-Tier. For SD Parallel benchmark results website: SD Parallel.

Configuration and Results Summary

Hardware Configuration:

8 x Sun Fire X4800 M2 servers, each with
8 x Intel Xeon E7-8870 @ 2.4 GHz (8 processors, 80 cores, 160 threads)
512 GB memory

Software Configuration:

SAP enhancement package 4 for SAP ERP 6.0
Oracle Database 11g Real Application Clusters (RAC)
Oracle Solaris 10

Results Summary:

Number of SAP SD benchmark users:
180,000
Average dialog response time:
0.63 seconds
Throughput:

Fully processed order line items per hour:
20,327,670

Dialog steps/hour:
60,983,000

SAPS:
1,016,380
Average database request time (dialog/update):
0.010 sec / 0.055 sec
SAP Certification:
2011037
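The throughput figures in this summary are related through the SAPS unit, which SAP defines so that 100 SAPS corresponds to 2,000 fully processed order line items per hour, or equivalently 6,000 dialog steps per hour. The Python sketch below applies that definition to the reported numbers and also reproduces the 43% user-count advantage over the IBM Power 795 result; the function names are illustrative.

```python
def saps_from_dialog_steps(steps_per_hour):
    """100 SAPS = 6,000 dialog steps per hour."""
    return steps_per_hour / 60


def saps_from_order_lines(lines_per_hour):
    """100 SAPS = 2,000 fully processed order line items per hour."""
    return lines_per_hour / 20


REPORTED_SAPS = 1_016_380

from_steps = saps_from_dialog_steps(60_983_000)
from_lines = saps_from_order_lines(20_327_670)
print(f"SAPS from dialog steps: {from_steps:,.0f}")
print(f"SAPS from order lines:  {from_lines:,.0f}")

# User-count advantage of the eight-server result over the IBM Power 795.
advantage = 180_000 / 126_063 - 1
print(f"user advantage: {advantage:.0%}")
```

Both derivations land within a few SAPS of the certified value, with the small difference attributable to rounding in the published figures.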

Benchmark Description

The SAP Standard Application Sales and Distribution - Parallel (SD Parallel) Benchmark is a two-tier ERP business test that is indicative of full business workloads of complete order processing and invoice processing and demonstrates the ability to run both the application and database software on a single system. The SAP Standard Application SD Benchmark represents the critical tasks performed in real-world ERP business environments.

The SD Parallel Benchmark consists of the same transactions and user interaction steps as the two-tier and three-tier SD Benchmark. This means that the SD Parallel Benchmark runs the same business processes as the SD Benchmark. The difference between the benchmarks is the technical data distribution. Additionally, the benchmark requires equal distribution of the benchmark users across all database nodes for the used benchmark clients (round-robin method). Following this rule, all database nodes work on data of all clients. This avoids unrealistic configurations such as having only one client per database node.

The SAP Benchmark Council agreed to give the parallel benchmark a different name so that the difference can be easily recognized by any interested parties - customers, prospects, and analysts. The naming convention is SD Parallel for Sales & Distribution - Parallel.

SAP is one of the premier world-wide ERP application providers, and maintains a suite of benchmark tests to demonstrate the performance of competitive systems on the various SAP products.

See Also

Disclosure Statement

SAP enhancement package 4 for SAP ERP 6.0 (Unicode) Sales and Distribution Benchmark, results as of 10/03/2011.

SD Parallel, 8 x Sun Fire X4800 M2 (each 8 processors, 80 cores, 160 threads) 180,000 SAP SD Users, Oracle Solaris 10, Oracle 11g Real Application Clusters (RAC), Certification Number 2011037.
SD Parallel, 6 x Sun Fire X4800 M2 (each 8 processors, 80 cores, 160 threads) 137,904 SAP SD Users, Oracle Solaris 10, Oracle 11g Real Application Clusters (RAC), Certification Number 2011038.
SD Parallel, 4 x Sun Fire X4470 (each 4 processors, 32 cores, 64 threads) 40,000 SAP SD Users, Oracle Solaris 10, Oracle 11g Real Application Clusters (RAC), Certification Number 2010039.
SD Two-Tier, IBM Power 795 (32 processors, 256 cores, 1024 threads) 126,063 SAP SD Users, AIX 7.1, DB2 9.7, Certification Number 2010046.

SAP, R/3 are registered trademarks of SAP AG in Germany and other countries. More information may be found at www.sap.com/benchmark.
About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
