Friday Mar 20, 2015

Oracle ZFS Storage ZS4-4 Shows 1.8x Generational Performance Improvement on SPC-2 Benchmark

The Oracle ZFS Storage ZS4-4 appliance delivered 1.8x the performance and 1.3x better price-performance of the previous-generation Oracle ZFS Storage ZS3-4 appliance, as shown by the SPC-2 benchmark.

  • Running the SPC-2 benchmark, the Oracle ZFS Storage ZS4-4 appliance delivered SPC-2 Price-Performance of $17.09 and an overall score of 31,486.23 SPC-2 MBPS.

  • Oracle ZFS Storage appliances continue their strong price-performance showing by occupying three of the top five SPC-2 price-performance results.

  • Oracle holds three of the top four performance results on the SPC-2 benchmark for HDD-based systems.

  • The Oracle ZFS Storage ZS4-4 appliance has a 7.6x price-performance advantage and a 2x performance advantage over the IBM DS8870, as measured by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 5.0x performance advantage over the new Fujitsu DX200 S3, as measured by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 4.6x price-performance advantage and a 1.9x performance advantage over the Fujitsu ET8700 S2, as shown by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 4.6x price-performance advantage and a 1.96x performance advantage over the Hitachi Virtual Storage Platform (VSP), as measured by the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS4-4 appliance has a 1.6x price-performance advantage over the HP XP7 disk array, as shown by the SPC-2 benchmark (even though HP discounted its hardware by 63%).

Performance Landscape

SPC-2 Price-Performance

Below is a table of the top SPC-2 Price-Performance results for HDD-based storage systems, presented in increasing price-performance order (as of 03/17/2015). The complete set of results may be found at the SPC-2 Top 10 Price-Performance list.

System                         SPC-2 MBPS   $/SPC-2 MBPS   Results Identifier
Oracle ZFS Storage ZS3-2        16,212.66      $12.08         BE00002
Fujitsu ETERNUS DX200 S3         6,266.50      $15.42         B00071
SGI InfiniteStorage 5600         8,855.70      $15.97         B00065
Oracle ZFS Storage ZS4-4        31,486.23      $17.09         B00072
Oracle ZFS Storage ZS3-4        17,244.22      $22.53         B00067
NEC Storage M700                14,408.89      $25.10         B00066
Sun StorageTek 2530                663.51      $26.48         B00026
HP XP7 storage                  43,012.53      $28.30         B00070
Fujitsu ETERNUS DX80 S2          2,685.50      $28.48         B00055
SGI InfiniteStorage 5500-SP      4,064.49      $28.57         B00059
Hitachi Unified Storage VM      11,274.83      $32.64         B00069

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
Results Identifier = A unique identification of the result
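
For reference, the price-performance column is simply the TSC price divided by the SPC-2 MBPS result. A minimal sketch of the arithmetic follows, using the ZS4-4 figures reported in these tables ($538,050 TSC price, 31,486.23 SPC-2 MBPS); the helper name is illustrative only:

    # Illustrative only: derive $/SPC-2 MBPS from the published TSC price and throughput.
    def price_performance(tsc_price_usd: float, spc2_mbps: float) -> float:
        """Return the SPC-2 price-performance metric in $ per SPC-2 MBPS."""
        return tsc_price_usd / spc2_mbps

    # Oracle ZFS Storage ZS4-4 (B00072): 538,050 / 31,486.23 ~= 17.09
    print(round(price_performance(538_050, 31_486.23), 2))  # 17.09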

SPC-2 Performance

The following table lists the top SPC-2 Performance results for HDD-based storage systems, presented in decreasing performance order (as of 03/17/2015). The complete set of results may be found at the SPC-2 Top 10 Performance list.

HDD-Based System                           SPC-2 MBPS   $/SPC-2 MBPS   TSC Price    Results Identifier
HP XP7 storage                              43,012.52      $28.30      $1,217,462      B00070
Oracle ZFS Storage ZS4-4                    31,486.23      $17.09        $538,050      B00072
Oracle ZFS Storage ZS3-4                    17,244.22      $22.53        $388,472      B00067
Oracle ZFS Storage ZS3-2                    16,212.66      $12.08        $195,915      BE00002
Fujitsu ETERNUS DX8700 S2                   16,038.74      $79.51      $1,275,163      B00063
IBM System Storage DS8870                   15,423.66     $131.21      $2,023,742      B00062
IBM SAN VC v6.4                             14,581.03     $129.14      $1,883,037      B00061
Hitachi Virtual Storage Platform (VSP)      13,147.87      $95.38      $1,254,093      B00060
HP StorageWorks P9500 XP Storage Array      13,147.87      $88.34      $1,161,504      B00056

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at
http://www.storageperformance.org/results/benchmark_results_spc2.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS4-4 storage system in clustered configuration
2 x Oracle ZFS Storage ZS4-4 controllers with
8 x Intel Xeon processors
3 TB memory
24 x Oracle Storage Drive Enclosure DE2-24P, each with
24 x 300 GB 10K RPM SAS-2 drives

Benchmark Description

SPC Benchmark 2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominantly by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below, as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields that require simple sequential processing of one or more large files, such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-2 and SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). Results as of March 17, 2015, for more information see www.storageperformance.org.

Oracle ZFS Storage ZS4-4 - B00072, Oracle ZFS Storage ZS3-2 - BE00002, Oracle ZFS Storage ZS3-4 - B00067, Fujitsu ETERNUS DX80 S2 - B00055, Fujitsu ETERNUS DX8700 S2 - B00063, Fujitsu ETERNUS DX200 S3 - B00071, HP StorageWorks P9500 XP Storage Array - B00056, HP XP7 Storage Array - B00070, Hitachi Unified Storage VM - B00069, Hitachi Virtual Storage Platform (VSP) - B00060, IBM SAN VC v6.4 - B00061, IBM System Storage DS8870 - B00062, IBM XIV Storage System Gen3 - BE00001, NEC Storage M700 - B00066, SGI InfiniteStorage 5500-SP - B00059, SGI InfiniteStorage 5600 - B00065, Sun StorageTek 2530 - B00026.

Wednesday Jun 25, 2014

Oracle ZFS Storage ZS3-2 Delivers World Record Price-Performance on SPC-2/E

The Oracle ZFS Storage ZS3-2 appliance delivered a world record Price-Performance result, world record energy result and excellent overall performance for the SPC-2/E benchmark.

  • The Oracle ZFS Storage ZS3-2 appliance delivered the top SPC-2 Price-Performance of $12.08 and an overall score of 16,212.66 SPC-2 MBPS on the SPC-2/E benchmark.

  • The Oracle ZFS Storage ZS3-2 appliance produced the top Performance-Energy SPC-2/E benchmark result of 3.67 SPC-2 MBPS/watt.

  • Oracle holds the top two performance results on the SPC-2 benchmark for HDD based systems.

  • The Oracle ZFS Storage ZS3-2 appliance has an 11x price-performance advantage over the IBM DS8870.

  • The Oracle ZFS Storage ZS3-2 appliance has an 8x price-performance advantage over the Hitachi Virtual Storage Platform (VSP).

  • The Oracle ZFS Storage ZS3-2 appliance has a 7.3x price-performance advantage over the HP P9500 XP disk array.

Performance Landscape

SPC-2 Price-Performance

Below is a table of the top SPC-2 Price-Performance results for HDD-based storage systems, presented in increasing price-performance order (as of 06/25/2014). The complete set of results may be found at the SPC-2 Top 10 Price-Performance list.

System                         SPC-2 MBPS   $/SPC-2 MBPS   Results Identifier
Oracle ZFS Storage ZS3-2        16,212.66      $12.08         BE00002
SGI InfiniteStorage 5600         8,855.70      $15.97         B00065
Oracle ZFS Storage ZS3-4        17,244.22      $22.53         B00067
NEC Storage M700                14,408.89      $25.10         B00066
Sun StorageTek 2530                663.51      $26.48         B00026
Fujitsu ETERNUS DX80 S2          2,685.50      $28.48         B00055
SGI InfiniteStorage 5500-SP      4,064.49      $28.57         B00059
Hitachi Unified Storage VM      11,274.83      $32.64         B00069

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
Results Identifier = A unique identification of the result

SPC-2/E Results

The table below lists all SPC-2/E results. The SPC-2/E benchmark extends the SPC-2 benchmark by additionally measuring power consumption during the SPC-2 benchmark run.

System                         SPC-2 MBPS   $/SPC-2 MBPS   TSC Price    SPC-2 MBPS/watt   Results Identifier
Oracle ZFS Storage ZS3-2        16,212.66      $12.08        $195,915         3.67            BE00002
IBM XIV Storage System Gen3      7,467.99     $152.34      $1,137,641         0.81            BE00001

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
TSC Price = Total Cost of Ownership Metric
SPC-2 MBPS/watt = the number of SPC-2 MB/second produced per watt consumed (higher is better)
Results Identifier = A unique identification of the result
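
Because both throughput and energy efficiency are published, the implied average power draw during the run can be backed out of the two metrics. A minimal sketch of the arithmetic (helper name illustrative only):

    # Illustrative only: implied average power = throughput / (throughput per watt).
    def implied_power_watts(spc2_mbps: float, mbps_per_watt: float) -> float:
        return spc2_mbps / mbps_per_watt

    # Oracle ZFS Storage ZS3-2 (BE00002): 16,212.66 / 3.67 ~= 4,418 watts average over the run
    print(round(implied_power_watts(16_212.66, 3.67)))  # 4418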

SPC-2 Performance

The following table lists the top SPC-2 Performance results for HDD-based storage systems, presented in decreasing performance order (as of 06/25/2014). The complete set of results may be found at the SPC-2 Top 10 Performance list.

System                                     SPC-2 MBPS   $/SPC-2 MBPS   TSC Price    Results Identifier
Oracle ZFS Storage ZS3-4                    17,244.22      $22.53        $388,472      B00067
Oracle ZFS Storage ZS3-2                    16,212.66      $12.08        $195,915      BE00002
Fujitsu ETERNUS DX8700 S2                   16,038.74      $79.51      $1,275,163      B00063
IBM System Storage DS8870                   15,423.66     $131.21      $2,023,742      B00062
IBM SAN VC v6.4                             14,581.03     $129.14      $1,883,037      B00061
Hitachi Virtual Storage Platform (VSP)      13,147.87      $95.38      $1,254,093      B00060
HP StorageWorks P9500 XP Storage Array      13,147.87      $88.34      $1,161,504      B00056

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at
http://www.storageperformance.org/results/benchmark_results_spc2.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-2 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-2 controllers, each with
4 x 2.1 GHz 8-core Intel Xeon processors
512 GB memory
12 x Sun Disk shelves, each with
24 x 300 GB 10K RPM SAS-2 drives

Benchmark Description

SPC Benchmark 2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominantly by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below, as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields that require simple sequential processing of one or more large files, such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

SPC Benchmark 2/Energy (SPC-2/E): Consists of the complete set of SPC-2 performance measurements and reporting plus the measurement and reporting of energy use. This benchmark extension provides measurement and reporting for complete storage configurations, complementing SPC-2C/E, which focuses on storage component configurations.

See Also

Disclosure Statement

SPC-2 and SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). Results as of June 25, 2014, for more information see www.storageperformance.org.

Fujitsu ETERNUS DX80 S2 - B00055, Fujitsu ETERNUS DX8700 S2 - B00063, HP StorageWorks P9500 XP Storage Array - B00056, Hitachi Unified Storage VM - B00069, Hitachi Virtual Storage Platform (VSP) - B00060, IBM SAN VC v6.4 - B00061, IBM System Storage DS8870 - B00062, IBM XIV Storage System Gen3 - BE00001, NEC Storage M700 - B00066, Oracle ZFS Storage ZS3-2 - BE00002, Oracle ZFS Storage ZS3-4 - B00067, SGI InfiniteStorage 5500-SP - B00059, SGI InfiniteStorage 5600 - B00065, Sun StorageTek 2530 - B00026.

Monday Nov 25, 2013

World Record Single System TPC-H @10000GB Benchmark on SPARC T5-4

Oracle's SPARC T5-4 server delivered world record single server performance of 377,594 QphH@10000GB with price/performance of $4.65/QphH@10000GB USD on the TPC-H @10000GB benchmark. This result shows that the 4-chip SPARC T5-4 server is significantly faster than the 8-chip, Intel x86-based server result from HP.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.4 times faster than the HP ProLiant DL980 G7 server with eight x86 processors.

  • The SPARC T5-4 server delivered 4.8 times better performance per chip and 3.0 times better performance per core than the HP ProLiant DL980 G7 server.

  • The SPARC T5-4 server has 28% better price/performance than the HP ProLiant DL980 G7 server (for the price/QphH metric).

  • The SPARC T5-4 server with 2 TB memory is 2.4 times faster than the HP ProLiant DL980 G7 server with 4 TB memory (for the composite metric).

  • The SPARC T5-4 server took 9 hours, 37 minutes, 54 seconds for data loading while the HP ProLiant DL980 G7 server took 8.3 times longer.

  • The SPARC T5-4 server accomplished the refresh function in around a minute, while the HP ProLiant DL980 G7 server took up to 7.1 times longer for the same function.

This result demonstrates a complete data warehouse solution that shows the performance both of individual and concurrent query processing streams, faster loading, and refresh of the data during business operations. The SPARC T5-4 server delivers superior performance and cost efficiency when compared to the HP result.

Performance Landscape

The table lists the leading TPC-H @10000GB results for non-clustered systems.

TPC-H @10000GB, Non-Clustered Systems

System (Processor; P/C/T – Memory)                                       Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database          Available
SPARC T5-4 (3.6 GHz SPARC T5; 4/64/512 – 2048 GB)                            377,594.3            $4.65         342,714.1       416,024.4         Oracle 11g R2     11/25/13
HP ProLiant DL980 G7 (2.4 GHz Intel Xeon E7-4870; 8/80/160 – 4096 GB)        158,108.3            $6.49         185,473.6       134,780.5         SQL Server 2012   04/15/13

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity (bigger is better)
QthH = the Throughput Numerical Quantity (bigger is better)

The following table lists data load times and average refresh function times.

TPC-H @10000GB, Non-Clustered Systems
Database Load & Database Refresh

System (Processor)                                   Data Loading (h:m:s)   T5 Advan   RF1 (sec)   T5 Advan   RF2 (sec)   T5 Advan
SPARC T5-4 (3.6 GHz SPARC T5)                              09:37:54            8.3x       58.8        7.1x       62.1       6.4x
HP ProLiant DL980 G7 (2.4 GHz Intel Xeon E7-4870)          79:28:23            1.0x      416.4        1.0x      394.9       1.0x

Data Loading = database load time
RF1 = throughput average first refresh transaction
RF2 = throughput average second refresh transaction
T5 Advan = the ratio of time to the SPARC T5-4 server time

Complete benchmark results may be found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Server Under Test:

SPARC T5-4 server
4 x SPARC T5 processors (3.6 GHz total of 64 cores, 512 threads)
2 TB memory
2 x internal SAS (2 x 300 GB) disk drives
12 x 16 Gb FC HBA

External Storage:

24 x Sun Server X4-2L servers configured as COMSTAR nodes, each with
2 x 2.5 GHz Intel Xeon E5-2609 v2 processors
4 x Sun Flash Accelerator F80 PCIe Cards, 800 GB each
6 x 4 TB 7.2K RPM 3.5" SAS disks
1 x 8 Gb dual port HBA

2 x 48 port Brocade 6510 Fibre Channel Switches

Software Configuration:

Oracle Solaris 11.1
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 10000 GB (Scale Factor 10000)
TPC-H Composite: 377,594.3 QphH@10000GB
Price/performance: $4.65/QphH@10000GB USD
Available: 11/25/2013
Total 3 year Cost: $1,755,709 USD
TPC-H Power: 342,714.1
TPC-H Throughput: 416,024.4
Database Load Time: 9:37:54

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
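
As a concrete check against the audited numbers above, QphH@SF is the geometric mean of the power and throughput metrics. A minimal sketch of the arithmetic (function name illustrative only):

    import math

    # Illustrative only: the TPC-H composite is the geometric mean of QppH and QthH.
    def composite_qphh(qpph: float, qthh: float) -> float:
        return math.sqrt(qpph * qthh)

    # SPARC T5-4 @10000GB: sqrt(342,714.1 * 416,024.4) ~= 377,595, matching the published 377,594.3 QphH
    print(round(composite_qphh(342_714.1, 416_024.4), 1))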

Key Points and Best Practices

  • COMSTAR (Common Multiprotocol SCSI Target) is the software framework that enables an Oracle Solaris host to serve as a SCSI target platform. COMSTAR uses a modular approach, breaking the large task of handling all the different pieces of a SCSI target subsystem into independent functional modules that are glued together by the SCSI Target Mode Framework (STMF). The modules implementing functionality at the SCSI level (disk, tape, medium changer, etc.) are not required to know about the underlying transport, and the modules implementing the transport protocol (FC, iSCSI, etc.) are not aware of the SCSI-level functionality of the packets they are transporting. The framework hides the details of allocation, execution context, and cleanup of SCSI commands and associated resources, and it simplifies the task of writing SCSI or transport modules.

  • The SPARC T5-4 server achieved a peak IO rate of 37 GB/sec from the Oracle database configured with this storage.

  • Twelve COMSTAR nodes were mirrored to another twelve COMSTAR nodes on which all of the Oracle database files were placed. IO performance was high and balanced across all the nodes.

  • Oracle Solaris 11.1 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric when comparing systems.

  • The SPARC T5-4 server and Oracle Solaris efficiently managed the system load of nearly two thousand Oracle Database parallel processes.

See Also

Disclosure Statement

TPC Benchmark, TPC-H, QphH, QthH, QppH are trademarks of the Transaction Processing Performance Council (TPC). Results as of 11/25/13, prices are in USD. SPARC T5-4 www.tpc.org/3293; HP ProLiant DL980 G7 www.tpc.org/3285.

Wednesday Sep 25, 2013

SPARC T5-8 Delivers World Record Oracle OLAP Perf Version 3 Benchmark Result on Oracle Database 12c

Oracle's SPARC T5-8 server delivered world record query performance for systems running Oracle Database 12c for the Oracle OLAP Perf Version 3 benchmark.

  • The query throughput on the SPARC T5-8 server is 1.7x higher than that of an 8-chip Intel Xeon E7-8870 server. Both systems had sub-second average response times.

  • The SPARC T5-8 server with the Oracle Database demonstrated the ability to support at least 700 concurrent users querying OLAP cubes (with no think time), processing 2.33 million analytic queries per hour with an average response time of less than 1 second per query. This performance was enabled by keeping the entire cube in-memory utilizing the 4 TB of memory on the SPARC T5-8 server.

  • Assuming a 60 second think time between query requests, the SPARC T5-8 server can support approximately 39,450 concurrent users with the same sub-second response time.

  • The workload uses a set of realistic Business Intelligence (BI) queries that run against an OLAP cube based on a 4 billion row fact table of sales data. The 4 billion rows are partitioned by month spanning 10 years.

  • The combination of Oracle Database 12c with the Oracle OLAP option running on a SPARC T5-8 server supports live data updates occurring concurrently with minimally impacted user query executions.

Performance Landscape

Oracle OLAP Perf Version 3 Benchmark
Oracle cube based on 4 billion fact table rows
10 years of data partitioned by month

System                       Queries/hour   Users (0 sec think time)   Users (60 sec think time)   Average Response Time (sec)
SPARC T5-8                     2,329,000              700                       39,450                      <1 sec
8-chip Intel Xeon E7-8870      1,354,000              120                       22,675                      <1 sec

Configuration Summary

SPARC T5-8:

1 x SPARC T5-8 server with
8 x SPARC T5 processors, 3.6 GHz
4 TB memory
Data Storage and Redo Storage
Flash Storage
Oracle Solaris 11.1 (11.1.8.2.0)
Oracle Database 12c Release 1 (12.1.0.1) with Oracle OLAP option

Sun Server X2-8:

1 x Sun Server X2-8 with
8 x Intel Xeon E7-8870 processors, 2.4 GHz
1 TB memory
Data Storage and Redo Storage
Flash Storage
Oracle Solaris 10 10/12
Oracle Database 12c Release 1 (12.1.0.1) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 3 benchmark is a workload designed to demonstrate and stress the ability of the OLAP Option to deliver fast query, near real-time updates and rich calculations using a multi-dimensional model in the context of an Oracle data warehouse.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g., South America) for a particular time period (e.g., Q4 of 2010), followed by additional queries that drill down into sales for individual countries (e.g., Chile, Peru, etc.), with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g., major household appliances) and then issue further queries drilling down into particular products (e.g., refrigerators, stoves, etc.), particular regions, particular customers, etc.

While the core of every OLAP Perf benchmark is real world query performance, the benchmark itself offers numerous execution options such as varying data set sizes, number of users, numbers of queries for any given user and cube update frequency. Version 3 of the benchmark is executed with a much larger number of query streams than previous versions and used a cube designed for near real-time updates. The results produced by version 3 of the benchmark are not directly comparable to results produced by previous versions of the benchmark.

The near real-time update capability is implemented along the following lines. A large Oracle cube, H, is built from a 4 billion row star schema, containing data up until the end of last business day. A second small cube, D, is then created which will contain all of today's new data coming in from outside the world. It will be updated every L minutes with the data coming in within the last L minutes. A third cube, R, joins cubes H and D for reporting purposes much like a view might join data from two tables. Calculations are installed into cube R. The use of a reporting cube which draws data from different storage cubes is a common practice.

Query users are never locked out of query operations while new data is added to the update cube. The point of the demonstration is to show that an Oracle OLAP system can be designed which results in data being no more than L minutes out of date, where L may be as low as just a few minutes. This is what is meant by near real-time analytics.
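
To make the update cadence concrete, here is a minimal, purely illustrative sketch of the refresh loop described above; the function names and the 3-minute interval are hypothetical stand-ins, and the real system performs incremental Oracle OLAP cube builds rather than these placeholder calls:

    import time

    REFRESH_INTERVAL_MIN = 3  # "L" in the description above: the target staleness bound, in minutes

    def load_new_rows_since(last_load_ts):
        """Hypothetical: fetch fact rows that arrived after last_load_ts."""
        return []

    def update_cube_d(rows):
        """Hypothetical: incrementally refresh the small 'today' cube D with the new rows."""
        pass

    def refresh_loop():
        last_load_ts = time.time()
        while True:
            rows = load_new_rows_since(last_load_ts)
            last_load_ts = time.time()
            update_cube_d(rows)  # queries against reporting cube R (H joined with D) keep running
            time.sleep(REFRESH_INTERVAL_MIN * 60)  # cube R is never more than about L minutes stale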

Key Points and Best Practices

  • Building and querying cubes with the Oracle OLAP option requires a large temporary tablespace. Normally temporary tablespaces would reside on disk storage. However, because the SPARC T5-8 server used in this benchmark had 4 TB of main memory, it was possible to use main memory for the OLAP temporary tablespace. This was accomplished by using a temporary, memory-based file system (TMPFS) for the temporary tablespace datafiles.

  • Since typical business intelligence users are often likely to issue similar queries, either with the same or different constants in the where clauses, setting the init.ora parameter "cursor_sharing" to "force" provides for additional query throughput and a larger number of potential users.

  • Assuming the normal Oracle Database initialization parameters (e.g. SGA, PGA, processes etc.) are appropriately set, out of the box performance for the Oracle OLAP workload should be close to what is reported here. Additional performance resulted from using memory for the OLAP temporary tablespace and setting "cursor_sharing" to force.

  • Oracle OLAP Cube update performance was optimized by running update processes in the FX class with a priority greater than 0.

  • The maximum lag time between updates to the source fact table and data availability to query users (what was referred to as L in the benchmark description) was less than 3 minutes for the benchmark environment on the SPARC T5-8 server.

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 09/22/2013.

Tuesday Sep 10, 2013

Oracle ZFS Storage ZS3-4 Delivers World Record SPC-2 Performance

The Oracle ZFS Storage ZS3-4 storage system delivered a world record performance result on the SPC-2 benchmark along with excellent price-performance.

  • The Oracle ZFS Storage ZS3-4 storage system delivered an overall score of 17,244.22 SPC-2 MBPS™ and SPC-2 price-performance of $22.53 on the SPC-2 benchmark.

  • This is over a 1.6X generational improvement in performance and over a 1.5X generational improvement in price-performance compared with Oracle's Sun ZFS Storage 7420 SPC-2 benchmark result.

  • The Oracle ZFS Storage ZS3-4 storage system has 6.8X better overall throughput and nearly 1.2X better price-performance than the IBM DS3524 Express Turbo, which is IBM's best overall price-performance score on the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system has over 1.1X better overall throughput and 5.8X better price-performance than the IBM DS8870, which is IBM's best overall performance score on the SPC-2 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system has over 1.3X better overall throughput and 3.9X better price-performance than the HP StorageWorks P9500 XP Disk Array on the SPC-2 benchmark.

Performance Landscape

SPC-2 Performance Chart (in decreasing performance order)

System                                   SPC-2 MB/s   $/SPC-2 MB/s   ASU Capacity (GB)   TSC Price    Data Protection Level   Date       Results Identifier
Oracle ZFS Storage ZS3-4                  17,244.22       $22.53          31,611            $388,472      Mirroring           09/10/13       B00067
Fujitsu DX8700 S2                         16,039          $79.51          71,404          $1,275,163      Mirroring           12/03/12       B00063
IBM DS8870                                15,424         $131.21          30,924          $2,023,742      RAID-5              10/03/12       B00062
IBM SAN VC v6.4                           14,581         $129.14          74,492          $1,883,037      RAID-5              08/01/12       B00061
NEC Storage M700                          14,409          $25.13          53,550            $361,613      Mirroring           08/19/12       B00066
Hitachi VSP                               13,148          $95.38         129,112          $1,254,093      RAID-5              07/27/12       B00060
HP StorageWorks P9500                     13,148          $88.34         129,112          $1,161,504      RAID-5              03/07/12       B00056
Sun ZFS Storage 7420                      10,704          $35.24          31,884            $377,225      Mirroring           04/12/12       B00058
IBM DS8800                                 9,706         $270.38          71,537          $2,624,257      RAID-5              12/01/10       B00051
HP XP24000                                 8,725         $187.45          18,401          $1,635,434      Mirroring           09/08/08       B00035

SPC-2 MB/s = the Performance Metric
$/SPC-2 MB/s = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

SPC-2 Price-Performance Chart (in increasing price-performance order)

System                        SPC-2 MB/s   $/SPC-2 MB/s   ASU Capacity (GB)   TSC Price    Data Protection Level   Date       Results Identifier
SGI InfiniteStorage 5600       8,855.70       $15.97          28,748            $141,393      RAID-6              03/06/13       B00065
Oracle ZFS Storage ZS3-4      17,244.22       $22.53          31,611            $388,472      Mirroring           09/10/13       B00067
Sun Storage J4200                548.80       $22.92          11,995             $12,580      Unprotected         07/10/08       B00033
NEC Storage M700              14,409          $25.13          53,550            $361,613      Mirroring           08/19/12       B00066
Sun Storage J4400                887.44       $25.63          23,965             $22,742      Unprotected         08/15/08       B00034
Sun StorageTek 2530              672.05       $26.15           1,451             $17,572      RAID-5              08/16/07       B00026
Sun StorageTek 2530              663.51       $26.48             854             $17,572      Mirroring           08/16/07       B00025
Fujitsu ETERNUS DX80           1,357.55       $26.70           4,681             $36,247      Mirroring           03/15/10       B00050
IBM DS3524 Express Turbo       2,510          $26.76          14,374             $67,185      RAID-5              12/31/10       B00053
Fujitsu ETERNUS DX80 S2        2,685.50       $28.48          17,231             $76,475      Mirroring           08/19/11       B00055

SPC-2 MB/s = the Performance Metric
$/SPC-2 MB/s = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org/results/benchmark_results_spc2.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-4 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-4 controllers, each with
4 x 2.4 GHz 10-core Intel Xeon processors
1024 GB memory
16 x Sun Disk shelves, each with
24 x 300 GB 15K RPM SAS-2 drives

Benchmark Description

SPC Benchmark 2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominantly by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below, as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields that require simple sequential processing of one or more large files, such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-2 and SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). Results as of September 10, 2013, for more information see www.storageperformance.org. Oracle ZFS Storage ZS3-4 B00067, Fujitsu ET 8700 S2 B00063, IBM DS8870 B00062, IBM S.V.C 6.4 B00061, NEC Storage M700 B00066, Hitachi VSP B00060, HP P9500 XP Disk Array B00056, IBM DS8800 B00051.

Oracle ZFS Storage ZS3-4 Produces Best 2-Node Performance on SPECsfs2008 NFSv3

The Oracle ZFS Storage ZS3-4 storage system delivered world record two-node performance on the SPECsfs2008 NFSv3 benchmark, beating results published on NetApp's dual-controller and four-node high-end FAS6240 storage systems.

  • The Oracle ZFS Storage ZS3-4 storage system delivered a world record two-node result of 450,702 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 0.70 msec on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system delivered 2.4x higher throughput than the dual-controller NetApp FAS6240 and 4.5x higher throughput than the dual-controller NetApp FAS3270 on the SPECsfs2008_nfs.v3 benchmark at less than half the list price of either result.

  • The Oracle ZFS Storage ZS3-4 storage system had 42 percent higher throughput than the four-node NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-4 storage system has 54 percent better Overall Response Time than the four-node NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

Performance Landscape

Two-node results for SPECsfs2008_nfs.v3 are presented below (in decreasing SPECsfs2008_nfs.v3 Ops/sec order), along with other select results.

Sponsor   System       Nodes   Disks   Throughput (Ops/sec)   Overall Response Time (msec)
Oracle    ZS3-4          2       464         450,702                    0.70
IBM       SONAS 1.2      2      1975         403,326                    3.23
NetApp    FAS6240        4       288         260,388                    1.53
NetApp    FAS6240        2       288         190,675                    1.17
EMC       VG8            -       312         135,521                    1.92
Oracle    7320           2       136         134,140                    1.51
EMC       NS-G8          -       100         110,621                    2.32
NetApp    FAS3270        2       360         101,183                    1.66

Throughput SPECsfs2008_nfs.v3 Ops/sec — the Performance Metric
Overall Response Time — the corresponding Response Time Metric
Nodes — Nodes and Controllers are being used interchangeably

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-4 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-4 controllers, each with
8 x 2.4 GHz Intel Xeon E7-4870 processors
2 TB memory
2 x 10GbE NICs
20 x Sun Disk shelves
18 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 8 x 73 GB SAS-2 flash-enabled write-cache

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

See Also

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of September 10, 2013, for more information see www.spec.org. Oracle ZFS Storage ZS3-4 Appliance 450,702 SPECsfs2008_nfs.v3 Ops/sec, 0.70 msec ORT, NetApp Data ONTAP 8.1 Cluster-Mode (4-node FAS6240) 260,388 SPECsfs2008_nfs.v3 Ops/Sec, 1.53 msec ORT, NetApp FAS6240 190,675 SPECsfs2008_nfs.v3 Ops/Sec, 1.17 msec ORT. NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT.

Nodes refer to the item in the SPECsfs2008 disclosed Configuration Bill of Materials that have the Processing Elements that perform the NFS Processing Function. These are the first item listed in each of disclosed Configuration Bill of Materials except for EMC where it is both the first and third items listed, and HP, where it is the second item listed as Blade Servers. The number of nodes is from the QTY disclosed in the Configuration Bill of Materials as described above. Configuration Bill of Materials list price for Oracle result of US$ 423,644. Configuration Bill of Materials list price for NetApp FAS3270 result of US$ 1,215,290. Configuration Bill of Materials list price for NetApp FAS6240 result of US$ 1,028,118. Oracle pricing from https://shop.oracle.com/pls/ostore/f?p=dstore:home:0, traverse to "Storage and Tape" and then to "NAS Storage". NetApp's pricing from http://www.netapp.com/us/media/na-list-usd-netapp-custom-state-new-discounts.html.

Oracle ZFS Storage ZS3-2 Beats Comparable NetApp on SPECsfs2008 NFSv3

Oracle ZFS Storage ZS3-2 storage system delivered outstanding performance on the SPECsfs2008 NFSv3 benchmark, beating results published on NetApp's fastest midrange platform, the NetApp FAS3270, the NetApp FAS6240 and the EMC Gateway NS-G8 Server Failover Cluster.

  • The Oracle ZFS Storage ZS3-2 storage system delivered 210,535 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 1.12 msec on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system delivered 10% higher throughput than the NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system has 52% higher throughput than the NetApp FAS3270 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system has 5% better Overall Response Time than the NetApp FAS6240 on the SPECsfs2008 NFSv3 benchmark.

  • The Oracle ZFS Storage ZS3-2 storage system has 33% better Overall Response Time than the NetApp FAS3270 on the SPECsfs2008 NFSv3 benchmark.

Performance Landscape

Results for SPECsfs2008 NFSv3 for competitive systems, presented in decreasing SPECsfs2008_nfs.v3 Ops/sec order.

Sponsor   System     Throughput (Ops/sec)   Overall Response Time (msec)
Oracle    ZS3-2            210,535                     1.12
NetApp    FAS6240          190,675                     1.17
EMC       VG8              135,521                     1.92
EMC       NS-G8            110,621                     2.32
NetApp    FAS3270          101,183                     1.66
NetApp    FAS3250          100,922                     1.76

Throughput SPECsfs2008_nfs.v3 Ops/sec = the Performance Metric
Overall Response Time = the corresponding Response Time Metric

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.

Configuration Summary

Storage Configuration:

Oracle ZFS Storage ZS3-2 storage system in clustered configuration
2 x Oracle ZFS Storage ZS3-2 controllers, each with
4 x 2.1 GHz Intel Xeon E5-2658 processors
512 GB memory
8 x Sun Disk shelves
3 x shelves with 24 x 900 GB 10K RPM SAS-2 drives
3 x shelves with 20 x 900 GB 10K RPM SAS-2 drives
2 x shelves with 20 x 900 GB 10K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

 

See Also

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of September 10, 2013, for more information see www.spec.org. Oracle ZFS Storage ZS3-2 Appliance 210,535 SPECsfs2008_nfs.v3 Ops/sec, 1.12 msec ORT, NetApp FAS6240 190,675 SPECsfs2008_nfs.v3 Ops/Sec, 1.17 msec ORT, EMC Celerra VG8 Server Failover Cluster, 2 Data Movers (1 stdby) / Symmetrix VMAX 135,521 SPECsfs2008_nfs.v3 Ops/Sec, 1.92 msec ORT, EMC Celerra Gateway NS-G8 Server Failover Cluster, 3 Datamovers (1 stdby) / Symmetrix V-Max 110,621 SPECsfs2008_nfs.v3 Ops/Sec, 2.32 msec ORT. NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT. NetApp FAS3250 100,922 SPECsfs2008_nfs.v3 Ops/Sec, 1.76 msec ORT.

Wednesday Jun 12, 2013

SPARC T5-4 Produces World Record Single Server TPC-H @3000GB Benchmark Result

Oracle's SPARC T5-4 server delivered world record single server performance of 409,721 QphH@3000GB with price/performance of $3.94/QphH@3000GB on the TPC-H @3000GB benchmark. This result shows that the 4-chip SPARC T5-4 server is significantly faster than the 8-chip, POWER7-based server result from IBM and the 8-chip, Intel x86-based server result from HP.

This result demonstrates a complete data warehouse solution that shows the performance both of individual and concurrent query processing streams, faster loading, and refresh of the data during business operations. The SPARC T5-4 server delivers superior performance and cost efficiency when compared to the IBM POWER7 result.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.1 times faster than the IBM Power 780 server with eight POWER7 processors and 2.5 times faster than the HP ProLiant DL980 G7 server with eight x86 processors on the TPC-H @3000GB benchmark. The SPARC T5-4 server also delivered better performance per core than these eight processor systems from IBM and HP.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.1 times faster than the IBM Power 780 server with eight POWER7 processors on the TPC-H @3000GB benchmark.

  • The SPARC T5-4 server has 38% better price/performance ($/QphH@3000GB) than the IBM Power 780 server on the TPC-H @3000GB benchmark.

  • The SPARC T5-4 server took 2 hours, 6 minutes, 4 seconds for data loading while the IBM Power 780 server took 2.8 times longer.

  • The SPARC T5-4 server executed the first refresh function (RF1) in 19.4 seconds, while the IBM Power 780 server took 7.6 times longer.

  • The SPARC T5-4 server with four SPARC T5 processors is 2.5 times faster than the HP ProLiant DL980 G7 server with the same number of cores on the TPC-H @3000GB benchmark.

  • The SPARC T5-4 server took 2 hours, 6 minutes, 4 seconds for data loading while the HP ProLiant DL980 G7 server took 4.1 times longer.

  • The SPARC T5-4 server executed the first refresh function (RF1) in 19.4 seconds, while the HP ProLiant DL980 G7 server took 8.9 times longer.

  • The SPARC T5-4 server delivered 6% better performance than the SPARC Enterprise M9000-64 server and 2.1 times better than the SPARC Enterprise M9000-32 server on the TPC-H @3000GB benchmark.

Performance Landscape

The table lists the leading TPC-H @3000GB results for non-clustered systems.

TPC-H @3000GB, Non-Clustered Systems

System (Processor; P/C/T – Memory)                                        Composite (QphH)   $/perf ($/QphH)   Power (QppH)   Throughput (QthH)   Database          Available
SPARC T5-4 (3.6 GHz SPARC T5; 4/64/512 – 2048 GB)                             409,721.8            $3.94         345,762.7       485,512.1         Oracle 11g R2     09/24/13
SPARC Enterprise M9000 (3.0 GHz SPARC64 VII+; 64/256/256 – 1024 GB)           386,478.3           $18.19         316,835.8       471,428.6         Oracle 11g R2     09/22/11
SPARC T4-4 (3.0 GHz SPARC T4; 4/32/256 – 1024 GB)                             205,792.0            $4.10         190,325.1       222,515.9         Oracle 11g R2     05/31/12
SPARC Enterprise M9000 (2.88 GHz SPARC64 VII; 32/128/256 – 512 GB)            198,907.5           $15.27         182,350.7       216,967.7         Oracle 11g R2     12/09/10
IBM Power 780 (4.1 GHz POWER7; 8/32/128 – 1024 GB)                            192,001.1            $6.37         210,368.4       175,237.4         Sybase 15.4       11/30/11
HP ProLiant DL980 G7 (2.27 GHz Intel Xeon X7560; 8/64/128 – 512 GB)           162,601.7            $2.68         185,297.7       142,685.6         SQL Server 2008   10/13/10

P/C/T = Processors, Cores, Threads
QphH = the Composite Metric (bigger is better)
$/QphH = the Price/Performance metric in USD (smaller is better)
QppH = the Power Numerical Quantity
QthH = the Throughput Numerical Quantity

The following table lists data load times and refresh function times during the power run.

TPC-H @3000GB, Non-Clustered Systems
Database Load & Database Refresh

System (Processor)                                   Data Loading (h:m:s)   T5 Advan   RF1 (sec)   T5 Advan   RF2 (sec)   T5 Advan
SPARC T5-4 (3.6 GHz SPARC T5)                              02:06:04            1.0x       19.4        1.0x       22.4       1.0x
IBM Power 780 (4.1 GHz POWER7)                             05:51:50            2.8x      147.3        7.6x      133.2       5.9x
HP ProLiant DL980 G7 (2.27 GHz Intel Xeon X7560)           08:35:17            4.1x      173.0        8.9x      126.3       5.6x

Data Loading = database load time
RF1 = power test first refresh transaction
RF2 = power test second refresh transaction
T5 Advan = the ratio of time to T5 time

Complete benchmark results may be found at the TPC benchmark website http://www.tpc.org.

Configuration Summary and Results

Hardware Configuration:

SPARC T5-4 server
4 x SPARC T5 processors (3.6 GHz total of 64 cores, 512 threads)
2 TB memory
2 x internal SAS (2 x 300 GB) disk drives

External Storage:

12 x Sun Storage 2540-M2 arrays with Sun Storage 2501-M2 expansion trays, each with
24 x 15K RPM 300 GB drives, 2 controllers, 2 GB cache
2 x Brocade 6510 Fibre Channel Switches (48 x 16 Gb ports each)

Software Configuration:

Oracle Solaris 11.1
Oracle Database 11g Release 2 Enterprise Edition

Audited Results:

Database Size: 3000 GB (Scale Factor 3000)
TPC-H Composite: 409,721.8 QphH@3000GB
Price/performance: $3.94/QphH@3000GB
Available: 09/24/2013
Total 3 year Cost: $1,610,564
TPC-H Power: 345,762.7
TPC-H Throughput: 485,512.1
Database Load Time: 2:06:04

Benchmark Description

The TPC-H benchmark is a performance benchmark established by the Transaction Processing Performance Council (TPC) to demonstrate Data Warehousing/Decision Support Systems (DSS). TPC-H measurements are produced for customers to evaluate the performance of various DSS systems. These queries and updates are executed against a standard database under controlled conditions. Performance projections and comparisons between different TPC-H Database sizes (100GB, 300GB, 1000GB, 3000GB, 10000GB, 30000GB and 100000GB) are not allowed by the TPC.

TPC-H is a data warehousing-oriented, non-industry-specific benchmark that consists of a large number of complex queries typical of decision support applications. It also includes some insert and delete activity that is intended to simulate loading and purging data from a warehouse. TPC-H measures the combined performance of a particular database manager on a specific computer system.

The main performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@SF, where SF is the number of GB of raw data, referred to as the scale factor). QphH@SF is intended to summarize the ability of the system to process queries in both single and multiple user modes. The benchmark requires reporting of price/performance, which is the ratio of the total HW/SW cost plus 3 years maintenance to the QphH. A secondary metric is the storage efficiency, which is the ratio of total configured disk space in GB to the scale factor.
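
As a quick worked example of the price/performance metric reported for the result above, the figure is the total 3-year cost divided by QphH. A minimal sketch (function name illustrative only):

    # Illustrative only: TPC-H price/performance = total 3-year system cost / QphH.
    def price_per_qphh(total_3yr_cost_usd: float, qphh: float) -> float:
        return total_3yr_cost_usd / qphh

    # SPARC T5-4 @3000GB: 1,610,564 / 409,721.8 ~= 3.93, close to the published $3.94/QphH@3000GB
    # (the audited figure follows the TPC's rounding rules).
    print(round(price_per_qphh(1_610_564, 409_721.8), 2))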

Key Points and Best Practices

  • Twelve of Oracle's Sun Storage 2540-M2 arrays with Sun Storage 2501-M2 expansion trays were used for the benchmark. Each contains 24 15K RPM drives and is connected to a single dual port 16Gb FC HBA using 2 ports through a Brocade 6510 Fibre Channel switch.

  • The SPARC T5-4 server achieved a peak IO rate of 33 GB/sec from the Oracle database configured with this storage.

  • Oracle Solaris 11.1 required very little system tuning.

  • Some vendors try to make the point that storage ratios are of customer concern. However, storage ratio size has more to do with disk layout and the increasing capacities of disks – so this is not an important metric when comparing systems.

  • The SPARC T5-4 server and Oracle Solaris efficiently managed the system load of two thousand Oracle Database parallel processes.

  • Six Sun Storage 2540-M2/2501-M2 arrays were mirrored to another six Sun Storage 2540-M2/2501-M2 arrays on which all of the Oracle database files were placed. IO performance was high and balanced across all the arrays.

  • The TPC-H Refresh Function (RF) simulates the periodic refresh portion of a data warehouse by adding new sales data and deleting old sales data. Parallel DML (parallel insert and delete in this case) and database log performance are key for this function, and the SPARC T5-4 server outperformed both the IBM POWER7 server and the HP ProLiant DL980 G7 server. (See the RF columns above.)

See Also

Disclosure Statement

TPC-H, QphH, $/QphH are trademarks of Transaction Processing Performance Council (TPC). For more information, see www.tpc.org, results as of 6/7/13. Prices are in USD. SPARC T5-4 www.tpc.org/3288; SPARC T4-4 www.tpc.org/3278; SPARC Enterprise M9000 www.tpc.org/3262; SPARC Enterprise M9000 www.tpc.org/3258; IBM Power 780 www.tpc.org/3277; HP ProLiant DL980 www.tpc.org/3285. 

Tuesday Mar 26, 2013

SPARC T5-8 Delivers Oracle OLAP World Record Performance

Oracle's SPARC T5-8 server delivered world record query performance with near real-time analytic capability using the Oracle OLAP Perf Version 3 workload running Oracle Database 11g Release 2 on Oracle Solaris 11.

  • The maximum query throughput on the SPARC T5-8 server is 1.6x higher than that of the 8-chip Intel Xeon E7-8870 server. Both systems had sub-second response time.

  • The SPARC T5-8 server with the Oracle Database demonstrated the ability to support at least 600 concurrent users querying OLAP cubes (with no think time), processing 2.93 million analytic queries per hour with an average response time of 0.66 seconds per query. This performance was enabled by keeping the entire cube in-memory utilizing the 4 TB of memory on the SPARC T5-8 server.

  • Assuming a 60 second think time between query requests, the SPARC T5-8 server can support approximately 49,450 concurrent users with the same 0.66 sec response time.

  • The SPARC T5-8 server delivered 4.3 times the maximum query throughput of a SPARC T4-4 server.

  • The workload uses a set of realistic BI queries that run against an OLAP cube based on a 4 billion row fact table of sales data. The 4 billion rows are partitioned by month spanning 10 years.

  • The combination of the Oracle Database with the Oracle OLAP option running on a SPARC T5-8 server supports live data updates occurring concurrently with minimally impacted user query executions.

Performance Landscape

Oracle OLAP Perf Version 3 Benchmark
Oracle cube based on 4 billion fact table rows
10 years of data partitioned by month

System                       Queries/hour   Users (0 sec think time)   Users (60 sec think time)   Average Response Time (sec)
SPARC T5-8                     2,934,000              600                       49,450                       0.66
8-chip Intel Xeon E7-8870      1,823,000              120                       30,500                       0.19
SPARC T4-4                       686,500              150                       11,580                       0.71

Configuration Summary and Results

SPARC T5-8 Hardware Configuration:

1 x SPARC T5-8 server with
8 x SPARC T5 processors, 3.6 GHz
4 TB memory
Data Storage and Redo Storage
1 x Sun Storage F5100 Flash Array (with 80 FMODs)
Oracle Solaris 11.1
Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option

Sun Server X2-8 Hardware Configuration:

1 x Sun Server X2-8 with
8 x Intel Xeon E7-8870 processors, 2.4 GHz
512 GB memory
Data Storage and Redo Storage
3 x StorageTek 2540/2501 array pairs
Oracle Solaris 10 10/12
Oracle Database 11g Release 2 (11.2.0.2) with Oracle OLAP option

SPARC T4-4 Hardware Configuration:

1 x SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB memory
Data Storage
1 x Sun Fire X4275 (using COMSTAR)
2 x Sun Storage F5100 Flash Array (each with 80 FMODs)
Redo Storage
1 x Sun Fire X4275 (using COMSTAR with 8 HDD)
Oracle Solaris 11 11/11
Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 3 benchmark is a workload designed to demonstrate and stress the ability of the OLAP Option to deliver fast query, near real-time updates and rich calculations using a multi-dimensional model in the context of an Oracle data warehouse.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g., South America) for a particular time period (e.g., Q4 of 2010), followed by additional queries that drill down into sales for individual countries (e.g., Chile, Peru, etc.), with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g., major household appliances) and then issue further queries drilling down into particular products (e.g., refrigerators, stoves, etc.), particular regions, particular customers, etc.

While the core of every OLAP Perf benchmark is real world query performance, the benchmark itself offers numerous execution options such as varying data set sizes, number of users, numbers of queries for any given user and cube update frequency. Version 3 of the benchmark is executed with a much larger number of query streams than previous versions and used a cube designed for near real-time updates. The results produced by version 3 of the benchmark are not directly comparable to results produced by previous versions of the benchmark.

The near real-time update capability is implemented along the following lines. A large Oracle cube, H, is built from a 4 billion row star schema, containing data up until the end of last business day. A second small cube, D, is then created which will contain all of today's new data coming in from outside the world. It will be updated every L minutes with the data coming in within the last L minutes. A third cube, R, joins cubes H and D for reporting purposes much like a view might join data from two tables. Calculations are installed into cube R. The use of a reporting cube which draws data from different storage cubes is a common practice.

Query users are never locked out of query operations while new data is added to the update cube. The point of the demonstration is to show that an Oracle OLAP system can be designed which results in data being no more than L minutes out of date, where L may be as low as just a few minutes. This is what is meant by near real-time analytics.

Key Points and Best Practices

  • Update performance of the D cube was optimized by running update processes in the FX class with a priority greater than 0. The maximum lag time between updates to the source fact table and data availability to query users (what was referred to as L in the benchmark description) was less than 3 minutes for the benchmark environment on the SPARC T5-8 server.

  • Building and querying cubes with the Oracle OLAP option requires a large temporary tablespace. Normally temporary tablespaces would reside on disk storage. However, because the SPARC T5-8 server used in this benchmark had 4 TB of main memory, it was possible to use main memory for the OLAP temporary tablespace. This was done by using files in /tmp for the temporary tablespace datafiles.

  • Since typical BI users are often likely to issue similar queries, either with the same, or different, constants in the where clauses, setting the init.ora parameter "cursor_sharing" to "force" provides for additional query throughput and a larger number of potential users.

  • Assuming the normal Oracle initialization parameters (e.g. SGA, PGA, processes etc.) are appropriately set, out of the box performance for the OLAP Perf workload should be close to what is reported here. Additional performance resulted from (a) using memory for the OLAP temporary tablespace and (b) setting "cursor_sharing" to force.

  • For a given number of query users with zero think time, the main measured metrics are the average query response time and the query throughput. A derived metric is the maximum number of users the system can support, with the same response time, assuming some non-zero think time. The calculation of this maximum is from the well-known "response-time law"

      N = (rt + tt) * tp

    where rt is the average response time, tt is the think time and tp is the measured throughput.

    Setting tt to 60 seconds, rt to 0.66 seconds and tp to 815 queries/sec (2,934,000 queries/hour), the above formula shows that the SPARC T5-8 server will support 49,450 concurrent users with a think time of 60 seconds and an average response time of 0.66 seconds.

    For more information about the "response-time law" see chapter 3 from the book "Quantitative System Performance" cited below.
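
A minimal sketch of that calculation in code (names are illustrative only):

    # Illustrative only: the "response-time law" N = (rt + tt) * tp from the bullet above.
    def supported_users(response_time_s: float, think_time_s: float, throughput_qps: float) -> float:
        return (response_time_s + think_time_s) * throughput_qps

    # 2,934,000 queries/hour ~= 815 queries/sec; (0.66 + 60) * 815 ~= 49,400 concurrent users,
    # in line with the approximately 49,450 users quoted above.
    print(round(supported_users(0.66, 60.0, 2_934_000 / 3600)))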

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 03/26/2013.

SPARC T5-2 Achieves ZFS File System Encryption Benchmark World Record

Oracle continues to lead in enterprise security. Oracle's SPARC T5 processors combined with the Oracle Solaris ZFS file system demonstrate faster file system encryption than equivalent x86 systems using Intel Xeon E5-2600 series processors, which include the AES-NI security instructions.

Encryption is the process where data is encoded for privacy and a key is needed by the data owner to access the encoded data.

  • The SPARC T5-2 server is 3.4x faster than a 2 processor Intel Xeon E5-2690 server running Oracle Solaris 11.1 that uses the AES-NI GCM security instructions for creating encrypted files.

  • The SPARC T5-2 server is 2.2x faster than a 2 processor Intel Xeon E5-2690 server running Oracle Solaris 11.1 that uses the AES-NI CCM security instructions for creating encrypted files.

  • The SPARC T5-2 server consumes a significantly smaller percentage of system resources than a 2 processor Intel Xeon E5-2690 server.

Performance Landscape

Below are results running two different ciphers for ZFS encryption. Results are presented for runs without any cipher, labeled clear, and a variety of different key lengths. The results represent the maximum delivered values measured for 3 concurrent sequential write operations using 1M blocks. Performance is measured in MB/sec (bigger is better). System utilization is reported as %CPU as measured by iostat (smaller is better).

The results for the x86 server were obtained using Oracle Solaris 11.1 with performance bug fixes.

Encryption Using AES-GCM Ciphers

GCM Encryption: 3 Concurrent Sequential Writes

System                         Clear          AES-256-GCM    AES-192-GCM    AES-128-GCM
                               MB/sec  %CPU   MB/sec  %CPU   MB/sec  %CPU   MB/sec  %CPU
SPARC T5-2 server              3,918   7      3,653   14     3,676   15     3,628   14
SPARC T4-2 server              2,912   11     2,662   31     2,663   30     2,779   31
2-Socket Intel Xeon E5-2690    3,969   42     1,062   58     1,067   58     1,076   57
SPARC T5-2 vs x86 server       1.0x           3.4x           3.4x           3.4x

Encryption Using AES-CCM Ciphers

CCM Encryption: 3 Concurrent Sequential Writes

System                         Clear          AES-256-CCM    AES-192-CCM    AES-128-CCM
                               MB/sec  %CPU   MB/sec  %CPU   MB/sec  %CPU   MB/sec  %CPU
SPARC T5-2 server              3,862   7      3,665   15     3,622   14     3,707   12
SPARC T4-2 server              2,945   11     2,471   26     2,801   26     2,442   25
2-Socket Intel Xeon E5-2690    3,868   42     1,566   64     1,632   63     1,689   66
SPARC T5-2 vs x86 server       1.0x           2.3x           2.2x           2.2x

Configuration Summary

Storage Configuration:

Sun Storage 6780 array
4 CSM2 trays, each with 16 83GB 15K RPM drives
8x 8 Gb/sec Fibre Channel ports per host
R0 Write cache enabled, controller mirroring off for peak write bandwidth
8 Drive R0 512K stripe pools mirrored via ZFS to storage

Sun Storage 6580 array
9 CSM2 trays, each with 16 136GB 15K RPM drives
8x 4 Gb/sec Fibre Channel ports per host
R0 Write cache enabled, controller mirroring off for peak write bandwidth
4 Drive R0 512K stripe pools mirrored via ZFS to storage

Server Configuration:

SPARC T5-2 server
2 x SPARC T5 3.6 GHz processors
512 GB memory
Oracle Solaris 11.1

SPARC T4-2 server
2 x SPARC T4 2.85 GHz processors
256 GB memory
Oracle Solaris 11.1

Sun Server X3-2L server
2 x Intel Xeon E5-2690, 2.90 GHz processors
128 GB memory
Oracle Solaris 11.1

Switch Configuration:

Brocade 5300 FC switch

Benchmark Description

This benchmark evaluates secure file system performance by measuring the rate at which encrypted data can be written. The Vdbench tool was used to generate the IO load. The test performed 3 concurrent sequential write operations using 1M blocks to 3 separate files.
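The load itself was generated with Vdbench, but the access pattern is simple enough to illustrate with a short sketch: three threads, each streaming 1 MiB blocks sequentially into its own file. The file paths and file size below are assumptions for illustration only, not the actual Vdbench configuration used in the benchmark.

    import threading

    BLOCK_SIZE = 1024 * 1024     # 1 MiB blocks, as in the benchmark
    NUM_BLOCKS = 1024            # 1 GiB per file (assumed size for the sketch)
    FILES = ["/encpool/fs/file0", "/encpool/fs/file1", "/encpool/fs/file2"]  # hypothetical paths

    def sequential_writer(path):
        buf = b"\0" * BLOCK_SIZE
        with open(path, "wb") as f:
            for _ in range(NUM_BLOCKS):
                f.write(buf)     # one 1 MiB sequential write per iteration

    threads = [threading.Thread(target=sequential_writer, args=(p,)) for p in FILES]
    for t in threads:
        t.start()                # 3 concurrent sequential write streams
    for t in threads:
        t.join()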

Key Points and Best Practices

  • ZFS encryption is integrated with the ZFS command set. Like other ZFS operations, encryption operations such as key changes and re-key are performed online.

  • Data is encrypted using AES (Advanced Encryption Standard) with key lengths of 256, 192, and 128 in the CCM and GCM operation modes.

  • The flexibility of encrypting specific file systems is a key feature.

  • ZFS encryption is inheritable to descendent file systems. Key management can be delegated through ZFS delegated administration.

  • ZFS encryption uses the Oracle Solaris Cryptographic Framework which gives it access to SPARC T5 and Intel Xeon E5-2690 processor hardware acceleration or to optimized software implementations of the encryption algorithms automatically.

  • On modern computers with multiple threads per core, simple statistics like %utilization measured in tools like iostat and vmstat are not "hard" indications of the resources that might be available for other processing. For example, 90% idle may not mean that 10 times the work can be done. So drawing numerical conclusions must be done carefully.

See Also

Disclosure Statement

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of March 26, 2013.

Thursday Nov 08, 2012

SPARC T4-4 Delivers World Record Performance on Oracle OLAP Perf Version 2 Benchmark

Oracle's SPARC T4-4 server delivered world record performance with subsecond response time on the Oracle OLAP Perf Version 2 benchmark using Oracle Database 11g Release 2 running on Oracle Solaris 11.

  • The SPARC T4-4 server achieved throughput of 430,000 cube-queries/hour with an average response time of 0.85 seconds and the median response time of 0.43 seconds. This was achieved by using only 60% of the available CPU resources leaving plenty of headroom for future growth.

Performance Landscape

Oracle OLAP Perf Version 2 Benchmark
4 Billion Fact Table Rows

System       Queries/hour   Users*    Response Time (sec)
                                      Average     Median
SPARC T4-4   430,000        7,300     0.85        0.43

* Users - the supported number of users with a given think time of 60 seconds

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB memory
Data Storage
1 x Sun Fire X4275 (using COMSTAR)
2 x Sun Storage F5100 Flash Array (each with 80 FMODs)
Redo Storage
1 x Sun Fire X4275 (using COMSTAR with 8 HDD)

Software Configuration:

Oracle Solaris 11 11/11
Oracle Database 11g Release 2 (11.2.0.3) with Oracle OLAP option

Benchmark Description

The Oracle OLAP Perf Version 2 benchmark is a workload designed to demonstrate and stress the Oracle OLAP product's core features of fast query, fast update, and rich calculations on a multi-dimensional model to support enhanced Data Warehousing.

The bulk of the benchmark entails running a number of concurrent users, each issuing typical multidimensional queries against an Oracle OLAP cube. The cube has four dimensions: time, product, customer, and channel. Each query user issues approximately 150 different queries. One query chain may ask for total sales in a particular region (e.g., South America) for a particular time period (e.g., Q4 of 2010) followed by additional queries which drill down into sales for individual countries (e.g., Chile, Peru, etc.) with further queries drilling down into individual stores, etc. Another query chain may ask for yearly comparisons of total sales for some product category (e.g., major household appliances) and then issue further queries drilling down into particular products (e.g., refrigerators, stoves, etc.), particular regions, particular customers, etc.

Results from version 2 of the benchmark are not comparable with version 1. The primary difference is the type of queries along with the query mix.

Key Points and Best Practices

  • Since typical BI users are often likely to issue similar queries, with different constants in the WHERE clauses, setting the init.ora parameter "cursor_sharing" to "force" will provide additional query throughput and a larger number of potential users. Apart from this setting, and making full use of available memory, out of the box performance for the OLAP Perf workload should provide results similar to what is reported here.

  • For a given number of query users with zero think time, the main measured metrics are the average query response time, the median query response time, and the query throughput. A derived metric is the maximum number of users the system can support achieving the measured response time assuming some non-zero think time. The calculation of the maximum number of users follows from the well-known response-time law

      N = (rt + tt) * tp

    where rt is the average response time, tt is the think time and tp is the measured throughput.

    Setting tt to 60 seconds, rt to 0.85 seconds and tp to 119.44 queries/sec (430,000 queries/hour), the above formula shows that the T4-4 server will support 7,300 concurrent users with a think time of 60 seconds and an average response time of 0.85 seconds.

    For more information see chapter 3 from the book "Quantitative System Performance" cited below.

See Also

Disclosure Statement

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 11/2/2012.

Thursday Apr 19, 2012

Sun ZFS Storage 7420 Appliance Delivers 2-Node World Record SPECsfs2008 NFS Benchmark

Oracle's Sun ZFS Storage 7420 appliance delivered world record two-node performance on the SPECsfs2008 NFS benchmark, beating results published on NetApp's dual-controller and 4-node high-end FAS6240 storage systems.

  • The Sun ZFS Storage 7420 appliance delivered a world record two-node result of 267,928 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 1.31 msec on the SPECsfs2008 NFS benchmark.

  • The Sun ZFS Storage 7420 appliance delivered 1.4x higher throughput than the dual-controller NetApp FAS6240 and 2.6x higher throughput than the dual-controller NetApp FAS3270 on the SPECsfs2008_nfs.v3 benchmark at less than half the list price of either result.

  • The Sun ZFS Storage 7420 appliance required 10 percent less rack space than the dual-controller NetApp FAS6240.

  • The Sun ZFS Storage 7420 appliance had 3 percent higher throughput than the 4-node NetApp FAS6240 on the SPECsfs2008_nfs.v3 benchmark.

  • The Sun ZFS Storage 7420 appliance required 25 percent less rack space than the 4-node NetApp FAS6240.

  • The Sun ZFS Storage 7420 appliance has 14 percent better Overall Response Time than the 4-node NetApp FAS6240 on the SPECsfs2008_nfs.v3 benchmark.

Performance Landscape

SPECsfs2008_nfs.v3 Performance Chart (in decreasing SPECsfs2008_nfs.v3 Ops/sec order)

Sponsor   System     Throughput   Overall Response   Nodes   Memory (GB)        Disks   Rack Units
                     (Ops/sec)    Time (msec)                Including Flash            (Controllers+Disks)
Oracle    7420       267,928      1.31               2       6,728              280     54
NetApp    FAS6240    260,388      1.53               4       2,256              288     72
NetApp    FAS6240    190,675      1.17               2       1,128              288     60
EMC       VG8        135,521      1.92                       280                312
Oracle    7320       134,140      1.51               2       4,968              136     26
EMC       NS-G8      110,621      2.32                       264                100
NetApp    FAS3270    101,183      1.66               2       40                 360     66

Throughput SPECsfs2008_nfs.v3 Ops/sec — the Performance Metric
Overall Response Time — the corresponding Response Time Metric
Nodes — Nodes and Controllers are being used interchangeably

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7420 appliance in clustered configuration
2 x Sun ZFS Storage 7420 controllers, each with
4 x 2.4 GHz Intel Xeon E7-4870 processors
1 TB memory
4 x 512 GB SSD flash-enabled read-cache
2 x 10GbE NICs
12 x Sun Disk shelves
10 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

4 x Sun Fire X4270 M2 servers, each with
2 x 3.3 GHz Intel Xeon X5680 processors
144 GB memory
1 x 10 GbE NIC
Oracle Solaris 10 9/10

Switches:

1 x 24-port 10Gb Ethernet Switch

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

See Also

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of April 18, 2012, for more information see www.spec.org. Sun ZFS Storage 7420 Appliance 267,928 SPECsfs2008_nfs.v3 Ops/sec, 1.31 msec ORT, NetApp Data ONTAP 8.1 Cluster-Mode (4-node FAS6240) 260,388 SPECsfs2008_nfs.v3 Ops/Sec, 1.53 msec ORT, NetApp FAS6240 190,675 SPECsfs2008_nfs.v3 Ops/Sec, 1.17 msec ORT. NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT.

Nodes refer to the item in the SPECsfs2008 disclosed Configuration Bill of Materials that have the Processing Elements that perform the NFS Processing Function. These are the first item listed in each of disclosed Configuration Bill of Materials except for EMC where it is both the first and third items listed, and HP, where it is the second item listed as Blade Servers. The number of nodes is from the QTY disclosed in the Configuration Bill of Materials as described above. Configuration Bill of Materials list price for Oracle result of US$ 423,644. Configuration Bill of Materials list price for NetApp FAS3270 result of US$ 1,215,290. Configuration Bill of Materials list price for NetApp FAS6240 result of US$ 1,028,118. Oracle pricing from https://shop.oracle.com/pls/ostore/f?p=dstore:home:0, traverse to "Storage and Tape" and then to "NAS Storage". NetApp's pricing from http://www.netapp.com/us/media/na-list-usd-netapp-custom-state-new-discounts.html.

Sunday Apr 15, 2012

Sun ZFS Storage 7420 Appliance Delivers Top High-End Price/Performance Result for SPC-2 Benchmark

Oracle's Sun ZFS Storage 7420 appliance delivered leading high-end price/performance on the SPC Benchmark 2 (SPC-2).

  • The Sun ZFS Storage 7420 appliance delivered a result of 10,704 SPC-2 MB/s at $35.24 $/SPC-2 MB/s on the SPC-2 benchmark.

  • The Sun ZFS Storage 7420 appliance beats the IBM DS8800 result by over 10% on SPC-2 MB/s and has 7.7x better $/SPC-2 MB/s.

  • The Sun ZFS Storage 7420 appliance achieved the best price/performance for the top 18 posted unique performance results on the SPC-2 benchmark.

Performance Landscape

SPC-2 Performance Chart (in decreasing performance order)

System   SPC-2 MB/s   $/SPC-2 MB/s   ASU Capacity (GB)   TSC Price   Data Protection Level   Date   Results Identifier
HP StorageWorks P9500 13,148 $88.34 129,112 $1,161,504 RAID-5 03/07/12 B00056
Sun ZFS Storage 7420 10,704 $35.24 31,884 $377,225 Mirroring 04/12/12 B00058
IBM DS8800 9,706 $270.38 71,537 $2,624,257 RAID-5 12/01/10 B00051
HP XP24000 8,725 $187.45 18,401 $1,635,434 Mirroring 09/08/08 B00035
Hitachi Storage Platform V 8,725 $187.49 18,401 $1,635,770 Mirroring 09/08/08 B00036
TMS RamSan-630 8,323 $49.37 8,117 $410,927 RAID-5 05/10/11 B00054
IBM XIV 7,468 $152.34 154,619 $1,137,641 RAID-1 10/19/11 BE00001
IBM DS8700 7,247 $277.22 32,642 $2,009,007 RAID-5 11/30/09 B00049
IBM SAN Vol Ctlr 4.2 7,084 $463.66 101,155 $3,284,767 RAID-5 07/12/07 B00024
Fujitsu ETERNUS DX440 S2 5,768 $66.50 42,133 $383,576 Mirroring 04/12/12 B00057
IBM DS5300 5,634 $74.13 16,383 $417,648 RAID-5 10/21/09 B00045
Sun Storage 6780 5,634 $47.03 16,383 $264,999 RAID-5 10/28/09 B00047
IBM DS5300 5,544 $75.33 14,043 $417,648 RAID-6 10/21/09 B00046
Sun Storage 6780 5,544 $47.80 14,043 $264,999 RAID-6 10/28/09 B00048
IBM DS5300 4,818 $93.80 16,383 $451,986 RAID-5 09/25/08 B00037
Sun Storage 6780 4,818 $53.61 16,383 $258,329 RAID-5 02/02/09 B00039
IBM DS5300 4,676 $96.67 14,043 $451,986 RAID-6 09/25/08 B00038
Sun Storage 6780 4,676 $55.25 14,043 $258,329 RAID-6 02/03/09 B00040
IBM SAN Vol Ctlr 4.1 4,544 $400.78 51,265 $1,821,301 RAID-5 09/12/06 B00011
IBM SAN Vol Ctlr 3.1 3,518 $563.93 20,616 $1,983,785 Mirroring 12/14/05 B00001
Fujitsu ETERNUS8000 1100 3,481 $238.93 4,570 $831,649 Mirroring 03/08/07 B00019
IBM DS8300 3,218 $539.38 15,393 $1,735,473 Mirroring 12/14/05 B00006
IBM Storwize V7000 3,133 $71.32 29,914 $223,422 RAID-5 12/13/10 B00052

SPC-2 MB/s = the Performance Metric
$/SPC-2 MB/s = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7420 appliance in clustered configuration
2 x Sun ZFS Storage 7420 controllers, each with
4 x 2.0 GHz Intel Xeon X7550 processors
512 GB memory, 64 x 8 GB 1066 MHz DDR3 DIMMs
16 x Sun Disk shelves, each with
24 x 300 GB 15K RPM SAS-2 drives

Server Configuration:

1 x Sun Fire X4470 server, with
4 x 2.4 GHz Intel Xeon E7-4870 processors
512 GB memory
8 x 8 Gb FC connections to the Sun ZFS Storage 7420 appliance
Oracle Solaris 11 11/11

2 x Sun Fire X4470 servers, each with
4 x 2.4 GHz Intel Xeon E7-4870 processors
256 GB memory
8 x 8 Gb FC connections to the Sun ZFS Storage 7420 appliance
Oracle Solaris 11 11/11

Benchmark Description

SPC Benchmark-2 (SPC-2): Consists of three distinct workloads designed to demonstrate the performance of a storage subsystem during the execution of business critical applications that require the large-scale, sequential movement of data. Those applications are characterized predominately by large I/Os organized into one or more concurrent sequential patterns. A description of each of the three SPC-2 workloads is listed below as well as examples of applications characterized by each workload.

  • Large File Processing: Applications in a wide range of fields that require simple sequential processing of one or more large files, such as scientific computing and large-scale financial processing.
  • Large Database Queries: Applications that involve scans or joins of large relational tables, such as those performed for data mining or business intelligence.
  • Video on Demand: Applications that provide individualized video entertainment to a community of subscribers by drawing from a digital film library.

SPC-2 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-2, SPC-2 MB/s, $/SPC-2 MB/s are registered trademarks of Storage Performance Council (SPC). Results as of April 12, 2012, for more information see www.storageperformance.org. Sun ZFS Storage 7420 Appliance http://www.storageperformance.org/results/benchmark_results_spc2#b00058; IBM DS8800 http://www.storageperformance.org/results/benchmark_results_spc2#b00051.

Monday Feb 27, 2012

Sun ZFS Storage 7320 Appliance 33% Faster Than NetApp FAS3270 on SPECsfs2008

Oracle's Sun ZFS Storage 7320 appliance delivered outstanding performance on the SPECsfs2008 NFS benchmark, beating results published on NetApp's fastest midrange platform, the NetApp FAS3270, and the EMC Gateway NS-G8 Server Failover Cluster.

  • The Sun ZFS Storage 7320 appliance delivered 134,140 SPECsfs2008_nfs.v3 Ops/sec with an Overall Response Time (ORT) of 1.51 msec on the SPECsfs2008 NFS benchmark.

  • The Sun ZFS Storage 7320 appliance has 33% higher throughput than the NetApp FAS3270 on the SPECsfs2008 NFS benchmark.

  • The Sun ZFS Storage 7320 appliance required less than half the rack space of the NetApp FAS3270.

  • The Sun ZFS Storage 7320 appliance has 9% better Overall Response Time than the NetApp FAS3270 on the SPECsfs2008 NFS benchmark.

Performance Landscape

SPECsfs2008_nfs.v3 Performance Chart (in decreasing SPECsfs2008_nfs.v3 Ops/sec order)

Sponsor   System     Throughput   Overall Response   Memory   Disks   Exported        Rack Units
                     (Ops/sec)    Time (msec)        (GB)             Capacity (TB)   (Controllers+Disks)
EMC       VG8        135,521      1.92               280      312     19.2
Oracle    7320       134,140      1.51               288      136     37.0            26
EMC       NS-G8      110,621      2.32               264      100     17.6
NetApp    FAS3270    101,183      1.66               40       360     110.1           66

Throughput SPECsfs2008_nfs.v3 Ops/sec = the Performance Metric
Overall Response Time = the corresponding Response Time Metric

Complete SPECsfs2008 benchmark results may be found at http://www.spec.org/sfs2008/results/sfs2008.html.

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7320 appliance in clustered configuration
2 x Sun ZFS Storage 7320 controllers, each with
2 x 2.4 GHz Intel Xeon E5620 processors
144 GB memory
4 x 512 GB SSD flash-enabled read-cache
6 x Sun Disk shelves
4 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

3 x Sun Fire X4270 M2 servers, each with
2 x 2.4 GHz Intel Xeon E5620 processors
12 GB memory
1 x 10 GbE connection to the Sun ZFS Storage 7320 appliance
Oracle Solaris 10 8/11

Benchmark Description

SPECsfs2008 is the latest version of the Standard Performance Evaluation Corporation (SPEC) benchmark suite measuring file server throughput and response time, providing a standardized method for comparing performance across different vendor platforms. SPECsfs2008 results summarize the server's capabilities with respect to the number of operations that can be handled per second, as well as the overall latency of the operations. The suite is a follow-on to the SFS97_R1 benchmark, adding a CIFS workload, an updated NFSv3 workload, support for additional client platforms, and a new test harness and reporting/submission framework.

See Also

Disclosure Statement

SPEC and SPECsfs are registered trademarks of Standard Performance Evaluation Corporation (SPEC). Results as of February 22, 2012, for more information see www.spec.org. Sun ZFS Storage 7320 Appliance 134,140 SPECsfs2008_nfs.v3 Ops/sec, 1.51 msec ORT, NetApp FAS3270 101,183 SPECsfs2008_nfs.v3 Ops/Sec, 1.66 msec ORT, EMC Celerra Gateway NS-G8 Server Failover Cluster, 3 Datamovers (1 stdby) / Symmetrix V-Max 110,621 SPECsfs2008_nfs.v3 Ops/Sec, 2.32 msec ORT.

Monday Oct 03, 2011

Sun ZFS Storage 7420 Appliance Doubles NetApp FAS3270A on SPC-1 Benchmark

Oracle's Sun ZFS Storage 7420 appliance delivered outstanding performance and price/performance on the SPC Benchmark 1, beating results published on the NetApp FAS3270A.

  • The Sun ZFS Storage 7420 appliance delivered 137,066.20 SPC-1 IOPS at $2.99 $/SPC-1 IOPS on the SPC-1 benchmark.

  • The Sun ZFS Storage 7420 appliance outperformed the NetApp FAS3270A by 2x on the SPC-1 benchmark.

  • The Sun ZFS Storage 7420 appliance outperformed the NetApp FAS3270A by 2.5x on price/performance on the SPC-1 benchmark.

Performance Landscape

SPC-1 Performance Chart (in decreasing performance order)

System   SPC-1 IOPS   $/SPC-1 IOPS   ASU Capacity (GB)   TSC Price   Data Protection Level   Date   Results Identifier
Huawei Symantec S6800T 150,061.17 $3.08 43,937.515 $461,471.75 Mirroring 08/31/11 A00107
Sun ZFS Storage 7420 137,066.20 $2.99 23,703.035 $409,933 Mirroring 10/03/11 A00108
Huawei Symantec S5600T 102,471.66 $2.73 35,945.185 $279,914.53 Mirroring 08/25/11 A00106
Pillar Axiom 600 70,102.27 $7.32 32,000.000 $513,112 Mirroring 04/19/11 A00104
NetApp FAS3270A 68,034.63 $7.48 21,659.386 $509,200.79 RAID DP 11/09/10 AE00004
Sun Storage 6780 62,261.80 $6.89 13,742.218 $429,294 Mirroring 06/01/10 A00094
NetApp FAS3170 60,515.34 $10.01 19,628.500 $605,492 RAID-DP 06/10/08 A00066
IBM V7000 56,510.85 $7.24 14,422.309 $409,410.86 Mirroring 10/22/10 A00097
IBM V7000 53,014.29 $7.52 24,433.592 $389,425.11 Mirroring 03/14/11 A00103

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.

Configuration Summary

Storage Configuration:

Sun ZFS Storage 7420 appliance in clustered configuration
2 x Sun ZFS Storage 7420 controllers, each with
4 x 2.0 GHz Intel Xeon X7550 processors
512 GB memory, 64 x 8 GB 1066 MHz DDR3 DIMMs
4 x 512 GB SSD flash-enabled read-cache
12 x Sun Disk shelves
10 x shelves with 24 x 300 GB 15K RPM SAS-2 drives
2 x shelves with 20 x 300 GB 15K RPM SAS-2 drives and 4 x 73 GB SAS-2 flash-enabled write-cache

Server Configuration:

1 x SPARC T3-2 server
2 x 1.65 GHz SPARC T3 processors
128 GB memory
6 x 8 Gb FC connections to the Sun ZFS Storage 7420 appliance
Oracle Solaris 10 9/10

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominately random I/O operations as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price/performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

See Also

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS are registered trademarks of Storage Performance Council (SPC). Results as of October 2, 2011, for more information see www.storageperformance.org. Sun ZFS Storage 7420 Appliance http://www.storageperformance.org/results/benchmark_results_spc1#a00108; NetApp FAS3270A http://www.storageperformance.org/results/benchmark_results_spc1#ae00004.

SPARC T4-4 Produces World Record Oracle OLAP Capacity

Oracle's SPARC T4-4 server delivered world record capacity on the Oracle OLAP Perf workload.

  • The SPARC T4-4 server was able to operate on a cube with a 3 billion row fact table of sales data containing 4 dimensions which represents as many as 70 quintillion aggregate rows (70 followed by 18 zeros).

  • The SPARC T4-4 server supported 3,500 cube-queries/minute against the Oracle OLAP cube with an average response time of 1.5 seconds and the median response time of 0.15 seconds.

Performance Landscape

Oracle OLAP Perf Benchmark

System       Fact Table    Cube-Queries/   Median Response   Average Response
             Num of Rows   minute          (seconds)         (seconds)
SPARC T4-4   3 Billion     3,500           0.15              1.5

Configuration Summary and Results

Hardware Configuration:

SPARC T4-4 server with
4 x SPARC T4 processors, 3.0 GHz
1 TB main memory
2 x Sun Storage F5100 Flash Array

Software Configuration:

Oracle Solaris 10 8/11
Oracle Database 11g Enterprise Edition with Oracle OLAP option

Benchmark Description

OLAP Perf is a workload designed to demonstrate and stress the Oracle OLAP product's core functionalities of fast query, fast update, and rich calculations on a dimensional model to support Enhanced Data Warehousing. The workload uses a set of realistic business intelligence (BI) queries that run against an OLAP cube.

Key Points and Best Practices

  • The SPARC T4-4 server is estimated to support 2,400 interactive users with this fast response time assuming only 5 seconds between query requests.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/3/2011.

Friday Sep 30, 2011

SPARC T4-2 Server Beats Intel (Westmere AES-NI) on ZFS Encryption Tests

Oracle continues to lead in enterprise security. Oracle's SPARC T4 processors combined with Oracle's Solaris ZFS file system demonstrate faster file system encryption than equivalent systems based on the Intel Xeon Processor 5600 Sequence chips which use AES-NI security instructions.

Encryption is the process where data is encoded for privacy and a key is needed by the data owner to access the encoded data. The benefits of using ZFS encryption are:

  • The SPARC T4 processor is 3.5x to 5.2x faster than the Intel Xeon Processor X5670, which has the AES-NI security instructions, at creating encrypted files.

  • ZFS encryption is integrated with the ZFS command set. Like other ZFS operations, encryption operations such as key changes and re-key are performed online.

  • Data is encrypted using AES (Advanced Encryption Standard) with key lengths of 256, 192, and 128 in the CCM and GCM operation modes.

  • The flexibility of encrypting specific file systems is a key feature.

  • ZFS encryption is inheritable to descendent file systems. Key management can be delegated through ZFS delegated administration.

  • ZFS encryption uses the Oracle Solaris Cryptographic Framework which gives it access to SPARC T4 processor and Intel Xeon X5670 processor (Intel AES-NI) hardware acceleration or to optimized software implementations of the encryption algorithms automatically.

Performance Landscape

Below are results running two different ciphers for ZFS encryption. Results are presented for runs without any cipher, labeled clear, and a variety of different key lengths.

Encryption Using AES-CCM Ciphers

MB/sec – 5 File Create* Encryption

System                          Clear    AES-256-CCM   AES-192-CCM   AES-128-CCM
SPARC T4-2 server               3,803    3,167         3,335         3,225
SPARC T3-2 server               2,286    1,554         1,561         1,594
2-Socket 2.93 GHz Xeon X5670    3,325    750           764           773

Speedup T4-2 vs X5670           1.1x     4.2x          4.4x          4.2x
Speedup T4-2 vs T3-2            1.7x     2.0x          2.1x          2.0x

Encryption Using AES-GCM Ciphers

MB/sec – 5 File Create* Encryption

System                          Clear    AES-256-GCM   AES-192-GCM   AES-128-GCM
SPARC T4-2 server               3,618    3,929         3,164         2,613
SPARC T3-2 server               2,278    1,451         1,455         1,449
2-Socket 2.93 GHz Xeon X5670    3,299    749           748           753

Speedup T4-2 vs X5670           1.1x     5.2x          4.2x          3.5x
Speedup T4-2 vs T3-2            1.6x     2.7x          2.2x          1.8x

(*) Maximum Delivered values measured over 5 concurrent mkfile operations.

Configuration Summary

Storage Configuration:

Sun Storage 6780 array
16 x 15K RPM drives
RAID-0 pool
Write-back cache enabled
Controller cache mirroring disabled for maximum bandwidth for test
Eight 8 Gb/sec ports per host

Server Configuration:

SPARC T4-2 server
2 x SPARC T4 2.85 GHz processors
256 GB memory
Oracle Solaris 11

SPARC T3-2 server
2 x SPARC T3 1.6 GHz processors
Oracle Solaris 11 Express 2010.11

Sun Fire X4270 M2 server
2 x Intel Xeon X5670, 2.93 GHz processors
Oracle Solaris 11

Benchmark Description

The benchmark ran the UNIX command mkfile(1M). Mkfile is a simple single-threaded program that creates a file of a specified size. The script ran 5 mkfile operations in the background and recorded the peak bandwidth observed during the test.

See Also

Disclosure Statement

Copyright 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of December 16, 2011.

Wednesday Dec 08, 2010

Sun Blade X6275 M2 Cluster with Sun Storage 7410 Performance Running Seismic Processing Reverse Time Migration

This Oil & Gas benchmark highlights both the computational performance improvements of the Sun Blade X6275 M2 server module over the previous generation server module and the linear scalability achievable for the total application throughput using a Sun Storage 7410 system to deliver almost 2 GB/sec effective write performance.

Oracle's Sun Storage 7410 system attached via 10 Gigabit Ethernet to a cluster of Oracle's Sun Blade X6275 M2 server modules was used to demonstrate the performance of a 3D VTI Reverse Time Migration application, a heavily used geophysical imaging and modeling application for Oil & Gas Exploration. The total application throughput scaling and computational kernel performance improvements are presented for imaging two production sized grids using 800 input samples.

  • The Sun Blade X6275 M2 server module showed up to a 40% performance improvement over the previous generation server module with super-linear scalability to 16 nodes for the 9-Point Stencil used in this Reverse Time Migration computational kernel.

  • The balanced combination of Oracle's Sun Storage 7410 system over 10 GbE to the Sun Blade X6275 M2 server module cluster showed linear scalability for the total application throughput, including the I/O and MPI communication, to produce a final 3-D seismic depth imaged cube for interpretation.

  • The final image write time from the Sun Blade X6275 M2 server module nodes to Oracle's Sun Storage 7410 system achieved 10GbE line speed of 1.25 GBytes/second or better write performance. The effects of I/O buffer caching on the Sun Blade X6275 M2 server module nodes and 34 GByte write optimized cache on the Sun Storage 7410 system gave up to 1.8 GBytes/second effective write performance.

Performance Landscape

Server Generational Performance Improvements

Performance improvements for the Reverse Time Migration computational kernel using a Sun Blade X6275 M2 cluster are compared to the previous generation Sun Blade X6275 cluster. Hyper-threading was enabled for both configurations allowing 24 OpenMP threads for the Sun Blade X6275 M2 server module nodes and 16 for the Sun Blade X6275 server module nodes.

Sun Blade X6275 M2 Performance Improvements

                 Grid Size - 1243 x 1151 x 1231                       Grid Size - 2486 x 1151 x 1231
Number   X6275 Kernel   X6275 M2 Kernel   X6275 M2     X6275 Kernel   X6275 M2 Kernel   X6275 M2
Nodes    Time (sec)     Time (sec)        Speedup      Time (sec)     Time (sec)        Speedup
16       306            242               1.3          728            576               1.3
14       355            271               1.3          814            679               1.2
12       435            346               1.3          945            797               1.2
10       541            390               1.4          1156           890               1.3
8        726            555               1.3          1511           1193              1.3

Application Scaling

Performance and scaling results of the total application, including I/O, for the reverse time migration demonstration application are presented. Results were obtained using a Sun Blade X6275 M2 server cluster with a Sun Storage 7410 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server node.

Application Scaling Across Multiple Nodes

                 Grid Size - 1243 x 1151 x 1231                        Grid Size - 2486 x 1151 x 1231
Number   Total Time   Kernel Time   Total     Kernel       Total Time   Kernel Time   Total     Kernel
Nodes    (sec)        (sec)         Speedup   Speedup      (sec)        (sec)         Speedup   Speedup
16       501          242           2.1*      2.3*         1060         576           2.0       2.1*
14       583          271           1.8       2.0          1219         679           1.7       1.8
12       681          346           1.6       1.6          1420         797           1.5       1.5
10       807          390           1.3       1.4          1688         890           1.2       1.3
8        1058         555           1.0       1.0          2085         1193          1.0       1.0

* Super-linear scaling due to the compute kernel fitting better into available cache for larger node counts

Image File Effective Write Performance

The performance for writing the final 3D image from a Sun Blade X6275 M2 server cluster over 10 Gigabit Ethernet to a Sun Storage 7410 system is presented. Each server node allocated one core for MPI I/O, thus allowing 22 OpenMP compute threads per node with hyperthreading enabled. Captured performance analytics from the Sun Storage 7410 system indicate effective use of its 34 Gigabyte write optimized cache.

Image File Effective Write Performance

         Grid Size - 1243 x 1151 x 1231        Grid Size - 2486 x 1151 x 1231
Number   Write Time   Write Performance        Write Time   Write Performance
Nodes    (sec)        (GB/sec)                 (sec)        (GB/sec)
16       4.8          1.5                      10.2         1.4
14       5.0          1.4                      10.2         1.4
12       4.0          1.8                      11.3         1.3
10       4.3          1.6                      9.1          1.6
8        4.6          1.5                      9.7          1.5

Note: Performance results better than 1.3 GB/sec are related to I/O buffer caching on the server nodes.

Configuration Summary

Hardware Configuration:

8 x Sun Blade X6275 M2 server modules (2 nodes per module, 16 nodes total), each node with
2 x 2.93 GHz Intel Xeon X5670 processors
48 GB memory (12 x 4 GB at 1333 MHz)
1 x QDR InfiniBand Host Channel Adapter

Sun Datacenter InfiniBand Switch IB-36
Sun Network 10 GbE Switch 72p

Sun Storage 7410 system connected via 10 Gigabit Ethernet
4 x 17 GB STEC ZeusIOPs SSD mirrored - 34 GB
40 x 750 GB 7500 RPM Seagate SATA disks mirrored - 14.4 TB
No L2ARC Readzilla Cache

Software Configuration:

Oracle Enterprise Linux Server release 5.5
Oracle Message Passing Toolkit 8.2.1c (for MPI)
Oracle Solaris Studio 12.2 C++, Fortran, OpenMP

Benchmark Description

This Vertical Transverse Isotropy (VTI) Anisotropic Reverse Time Depth Migration (RTM) application measures the total time it takes to image 800 samples of various production size grids and write the final image to disk for the next work flow step involving 3-D seismic volume interpretation. In doing so, it reports the compute, interprocessor communication, and I/O performance of the individual functions that comprise the total solution. Unlike most references for the Reverse Time Migration, that focus solely on the performance of the 3D stencil compute kernel, this demonstration code additionally reports the total throughput involved in processing large data sets with a full 3D Anisotropic RTM application. It provides valuable insight into configuration and sizing for specific seismic processing requirements. The performance effects of new processors, interconnects, I/O subsystems, and software technologies can be evaluated while solving a real Exploration business problem.

This benchmark study uses the "in-core" implementation of this demonstration code where each node reads in only the trace, velocity, and conditioning data to be processed by that node plus a 4 element array pad (based on spatial order 8) shared with its neighbors to the left and right during the initialization phase. It maintains previous, current, and next wavefield state information for each of the source, receiver, and anisotropic wavefields in memory. The second two grid dimensions used in this benchmark are specifically chosen to be prime numbers to exaggerate the effects of data alignment. Algorithm adaptations for processing higher orders in space and alternative "out-of-core" solutions using SSDs for wave state checkpointing are implemented in this demonstration application to better understand the effects of problem size scaling. Care is taken to handle absorption boundary conditioning and a variety of imaging conditions appropriately.

RTM Application Structure:

Read Processing Parameter File, Determine Domain Decomposition, and Initialize Data Structures, and Allocate Memory.

Read Velocity, Epsilon, and Delta Data Based on Domain Decomposition and create source, receiver, & anisotropic previous, current, and next wave states.

First Loop over Time Steps

Compute 3D Stencil for Source Wavefield (a,s) - 8th order in space, 2nd order in time
Propagate over Time to Create s(t,z,y,x) & a(t,z,y,x)
Inject Estimated Source Wavelet
Apply Absorption Boundary Conditioning (a)
Update Wavefield States and Pointers
Write Snapshot of Wavefield (out-of-core) or Push Wavefield onto Stack (in-core)
Communicate Boundary Information

Second Loop over Time Steps
Compute 3D Stencil for Receiver Wavefield (a,r) - 8th order in space, 2nd order in time
Propagate over Time to Create r(t,z,y,x) & a(t,z,y,x)
Read Receiver Trace and Inject Receiver Wavelet
Apply Absorption Boundary Conditioning (a)
Update Wavefield States and Pointers
Communicate Boundary Information
Read in Source Wavefield Snapshot (out-of-core) or Pop Off of Stack (in-core)
Cross-correlate Source and Receiver Wavefields
Update image using image conditioning parameters

Write 3D Depth Image i(z,x,y) = Sum over time steps s(t,z,x,y) * r(t,z,x,y) or other imaging conditions.
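The outline above can be condensed into a short control-flow sketch. The helper functions (stencil_3d, inject, absorb, exchange_halo, cross_correlate) are hypothetical placeholders standing in for the real distributed-array operations; what matters here is the two-pass structure with the in-core wavefield stack.

    # Hypothetical stand-ins so the sketch runs; the real code operates on large
    # distributed 3-D wavefield arrays and exchanges halos over MPI.
    def new_wavefield():            return 0.0
    def new_image():                return 0.0
    def stencil_3d(prev, curr):     return curr      # 8th order in space, 2nd order in time
    def inject(wave, data, t):      return wave
    def absorb(wave):               return wave
    def exchange_halo(wave):        return wave
    def cross_correlate(src, rcv):  return src * rcv

    def rtm_in_core(n_timesteps, source_wavelet, receiver_traces):
        """Sketch of the in-core, two-pass RTM structure outlined above."""
        stack, image = [], new_image()

        # First loop over time steps: propagate the source wavefield forward and
        # push each state onto an in-memory stack (the in-core checkpoint).
        prev, curr = new_wavefield(), new_wavefield()
        for t in range(n_timesteps):
            nxt = stencil_3d(prev, curr)
            nxt = inject(nxt, source_wavelet, t)
            nxt = absorb(nxt)
            prev, curr = curr, nxt
            stack.append(curr)
            curr = exchange_halo(curr)

        # Second loop over time steps: propagate the receiver wavefield backward,
        # pop the matching source state, and cross-correlate into the image.
        prev, curr = new_wavefield(), new_wavefield()
        for t in reversed(range(n_timesteps)):
            nxt = stencil_3d(prev, curr)
            nxt = inject(nxt, receiver_traces, t)
            nxt = absorb(nxt)
            prev, curr = curr, nxt
            curr = exchange_halo(curr)
            image += cross_correlate(stack.pop(), curr)
        return image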

Key Points and Best Practices

This demonstration application represents a full Reverse Time Migration solution. Many references to the RTM application tend to focus on the compute kernel and ignore the complexity that the input, communication, and output bring to the task.

Image File MPI Write Performance Tuning

Changing the Image File Write from MPI non-blocking to MPI blocking and setting Oracle Message Passing Toolkit MPI environment variables revealed an 18x improvement in write performance to the Sun Storage 7410 system going from:

    86.8 to 4.8 seconds for the 1243 x 1151 x 1231 grid size
    183.1 to 10.2 seconds for the 2486 x 1151 x 1231 grid size

The Swat Sun Storage 7410 analytics data capture indicated an initial write performance of about 100 MB/sec with the MPI non-blocking implementation. After modifying to MPI blocking writes, Swat showed between 1.3 and 1.8 GB/sec with up to 13000 write ops/sec to write the final output image. The Swat results are consistent with the actual measured performance and provide valuable insight into the Reverse Time Migration application I/O performance.

The reason for this vast improvement has to do with whether the MPI file mode is sequential or not (MPI_MODE_SEQUENTIAL, O_SYNC, O_DSYNC). The MPI non-blocking routines, MPI_File_iwrite_at and MPI_Wait, typically used for overlapping I/O and computation, do not support sequential file access mode. Therefore, the application could not take full performance advantage of the Sun Storage 7410 system write optimized cache. In contrast, the MPI blocking routine, MPI_File_write_at, defaults to MPI sequential mode, and the performance advantages of the write optimized cache are realized. Since writing the final image occurs at the end of RTM execution, there is no need to overlap the I/O with computation.
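For illustration only, a minimal mpi4py sketch of the blocking per-rank image write is shown below. The actual application uses the Oracle Message Passing Toolkit from compiled code, and the output file name and slab size here are assumptions; the point is that each rank issues one large MPI_File_write_at-style blocking request at its own byte offset.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank holds one contiguous slab of the final depth image (size assumed).
    slab = np.zeros(1_000_000, dtype=np.float32)

    fh = MPI.File.Open(comm, "depth_image.bin",            # hypothetical output file
                       MPI.MODE_CREATE | MPI.MODE_WRONLY)
    offset = rank * slab.nbytes                            # byte offset of this rank's slab
    fh.Write_at(offset, slab)                              # blocking write (MPI_File_write_at)
    fh.Close()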

Additional MPI parameters used:

    setenv SUNW_MP_PROCBIND true
    setenv MPI_SPIN 1
    setenv MPI_PROC_BIND 1

Adjusting the Level of Multithreading for Performance

The level of multithreading (8, 10, 12, 22, or 24) for various components of the RTM should be adjustable based on the type of computation taking place. It is best to use the OpenMP num_threads clause to adjust the level of multithreading for each particular work task. Use numactl to specify how the threads are allocated to cores in accordance with the OpenMP parallelism level.

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 12/07/2010.

Tuesday Oct 26, 2010

3D VTI Reverse Time Migration Scalability On Sun Fire X2270-M2 Cluster with Sun Storage 7210

This Oil & Gas benchmark shows the Sun Storage 7210 system delivers almost 2 GB/sec bandwidth and realizes near-linear scaling performance on a cluster of 16 Sun Fire X2270 M2 servers.

Oracle's Sun Storage 7210 system attached via QDR InfiniBand to a cluster of sixteen of Oracle's Sun Fire X2270 M2 servers was used to demonstrate the performance of a Reverse Time Migration application, an important application in the Oil & Gas industry. The total application throughput and computational kernel scaling are presented for two production sized grids of 800 samples.

  • Both the Reverse Time Migration I/O and combined computation shows near-linear scaling from 8 to 16 nodes on the Sun Storage 7210 system connected via QDR InfiniBand to a Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 2.0x improvement
      2486 x 1151 x 1231: 1.7x improvement
  • The computational kernel of the Reverse Time Migration has linear to super-linear scaling from 8 to 16 nodes in Oracle's Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231 : 2.2x improvement
      2486 x 1151 x 1231 : 2.0x improvement
  • Intel Hyper-Threading provides additional performance benefits to both the Reverse Time Migration I/O and computation when going from 12 to 24 OpenMP threads on the Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 8% - computational kernel; 2% - total application throughput
      2486 x 1151 x 1231: 12% - computational kernel; 6% - total application throughput
  • The Sun Storage 7210 system delivers the Velocity, Epsilon, and Delta data to the Reverse Time Migration at a steady rate even when timing includes memory initialization and data object creation:

      1243 x 1151 x 1231: 1.4 to 1.6 GBytes/sec
      2486 x 1151 x 1231: 1.2 to 1.3 GBytes/sec

    One can see that when doubling the size of the problem, the additional complexity of overlapping I/O and multiple node file contention only produces a small reduction in read performance.

Performance Landscape

Application Scaling

Performance and scaling results of the total application, including I/O, for the reverse time migration demonstration application are presented. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Application Scaling Across Multiple Nodes

                 Grid Size - 1243 x 1151 x 1231                        Grid Size - 2486 x 1151 x 1231
Number   Total Time   Kernel Time   Total     Kernel       Total Time   Kernel Time   Total     Kernel
Nodes    (sec)        (sec)         Speedup   Speedup      (sec)        (sec)         Speedup   Speedup
16       504          259           2.0       2.2*         1024         551           1.7       2.0
14       565          279           1.8       2.0          1191         677           1.5       1.6
12       662          343           1.6       1.6          1426         817           1.2       1.4
10       784          394           1.3       1.4          1501         856           1.2       1.3
8        1024         560           1.0       1.0          1745         1108          1.0       1.0

* Super-linear scaling due to the compute kernel fitting better into available cache

Application Scaling – Hyper-Threading Study

The effects of hyperthreading are presented when running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server.

Hyper-Threading Comparison – 12 versus 24 OpenMP Threads

                            Grid Size - 1243 x 1151 x 1231                          Grid Size - 2486 x 1151 x 1231
Number   Threads    Total Time   Kernel Time   Total HT   Kernel HT     Total Time   Kernel Time   Total HT   Kernel HT
Nodes    per Node   (sec)        (sec)         Speedup    Speedup       (sec)        (sec)         Speedup    Speedup
16       24         504          259           1.02       1.08          1024         551           1.06       1.12
16       12         515          279           1.00       1.00          1088         616           1.00       1.00

Read Performance

Read performance is presented for the velocity, epsilon and delta files running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Velocity, Epsilon, and Delta File Read and Memory Initialization Performance

                        Grid Size - 1243 x 1151 x 1231                            Grid Size - 2486 x 1151 x 1231
Number   Overlap        Time    Time Relative   Total GBytes   Read Rate    Time    Time Relative   Total GBytes   Read Rate
Nodes    MBytes Read    (sec)   8-node          Read           GB/s         (sec)   8-node          Read           GB/s
16       2040           16.7    1.1             23.2           1.4          36.8    1.1             44.3           1.2
8        951            14.8    1.0             22.1           1.6          33.0    1.0             43.2           1.3

Configuration Summary

Hardware Configuration:

16 x Sun Fire X2270 M2 servers, each with
2 x 2.93 GHz Intel Xeon X5670 processors
48 GB memory (12 x 4 GB at 1333 MHz)

Sun Storage 7210 system connected via QDR InfiniBand
2 x 18 GB SATA SSD (logzilla)
40 x 1 TB 7200 RPM SATA disks

Software Configuration:

SUSE Linux Enterprise Server SLES 10 SP 2
Oracle Message Passing Toolkit 8.2.1 (for MPI)
Sun Studio 12 Update 1 C++, Fortran, OpenMP

Benchmark Description

This Reverse Time Migration (RTM) demonstration application measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this version, each node reads in only the trace, velocity, and conditioning data to be processed by that node plus a four element inline 3-D array pad (spatial order of eight) shared with its neighbors to the left and right during the initialization phase. It represents a full RTM application including the data input, computation, communication, and final output image to be used by the next work flow step involving 3D volumetric seismic interpretation.

Key Points and Best Practices

This demonstration application represents a full Reverse Time Migration solution. Many references to the RTM application tend to focus on the compute kernel and ignore the complexity that the input, communication, and output bring to the task.

I/O Characterization without Optimal Checkpointing

Velocity, Epsilon, and Delta Files - Grid Reading

The additional amount of overlapping reads to share velocity, epsilon, and delta edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (z_dimension) x (4 bytes) x (3 files)

For this particular benchmark study, the additional 3-D pad overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 1231 x 4 x 3 = 2.04 GB extra
    8 nodes: 7 x 8 x 1151 x 1231 x 4 x 3 = 0.95 GB extra

For the first of the two test cases, the total size of the three files used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 1231 x 4 bytes = 7.05 GB per file x 3 files = 21.13 GB

With the additional 3-D pad, the total amount of data read is:

    16 nodes: 2.04 GB + 21.13 GB = 23.2 GB
    8 nodes: 0.95 GB + 21.13 GB = 22.1 GB

For the second of the two test cases, the total size of the three files used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 1231 x 4 bytes = 14.09 GB per file x 3 files = 42.27 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: 2.04 GB + 42.27 GB = 44.3 GB
    8 nodes: 0.95 GB + 42.27 GB = 43.2 GB

Note that the amount of overlapping data read increases, not only by the number of nodes, but as the y dimension and/or the z dimension increases.
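The formula is easy to parameterize; the short sketch below reproduces the velocity/epsilon/delta read totals quoted above for the 16-node and 8-node cases.

    def grid_read_gb(nodes, order_in_space=8, ny=1151, nz=1231, nx=1243,
                     files=3, bytes_per_sample=4):
        base = nx * ny * nz * bytes_per_sample * files                           # the three full grid files
        pad = (nodes - 1) * order_in_space * ny * nz * bytes_per_sample * files  # overlapping edge reads
        return base / 1e9, pad / 1e9, (base + pad) / 1e9

    for nodes in (16, 8):
        base, pad, total = grid_read_gb(nodes)
        print(f"{nodes} nodes: {base:.2f} GB files + {pad:.2f} GB pad = {total:.1f} GB")
    # 16 nodes: 21.13 GB files + 2.04 GB pad = 23.2 GB
    #  8 nodes: 21.13 GB files + 0.95 GB pad = 22.1 GB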

Trace Reading

The additional amount of overlapping reads to share trace edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (4 bytes) x (number_of_time_slices)

For this particular benchmark study, the additional overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 4 x 800 = 442MB extra
    8 nodes: 7 x 8 x 1151 x 4 x 800 = 206MB extra

For the first case the size of the trace data file used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 4 bytes x 800 = 4.578 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 4.578 GB = 5.0 GB
    8 nodes: .206 GB + 4.578 GB = 4.8 GB

For the second case the size of the trace data file used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 4 bytes x 800 = 9.156 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 9.156 GB = 9.6 GB
    8 nodes: .206 GB + 9.156 GB = 9.4 GB

As the number of nodes is increased, the overlap causes more disk lock contention.

Writing Final Output Image

1243x1151x1231 - 7.1 GB per file:

    16 nodes: 78 x 1151 x 1231 x 4 = 442MB/node (7.1 GB total)
    8 nodes: 156 x 1151 x 1231 x 4 = 884MB/node (7.1 GB total)

2486x1151x1231 - 14.1 GB per file:

    16 nodes: 156 x 1151 x 1231 x 4 = 930 MB/node (14.1 GB total)
    8 nodes: 311 x 1151 x 1231 x 4 = 1808 MB/node (14.1 GB total)

Resource Allocation

It is best to allocate one node as the Oracle Grid Engine resource scheduler and MPI master host. This is especially true when running with 24 OpenMP threads in hyperthreading mode to avoid oversubscribing a node that is cooperating in delivering the solution.

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/20/2010.

Thursday Sep 23, 2010

Sun Storage F5100 Flash Array with PCI-Express SAS-2 HBAs Achieves Over 17 GB/sec Read

Oracle's Sun Storage F5100 Flash Array storage is a high-performance, high-density, solid-state flash array delivering 17 GB/sec sequential read throughput (1 MB reads) and 10 GB/sec sequential write throughput (1 MB writes).

  • Using PCI-Express SAS-2 HBAs, the HBA slot count can be reduced by 50% compared to PCI-Express SAS-1 HBAs.

  • The Sun Storage F5100 Flash Array storage using 8 PCI-Express SAS-2 HBAs showed a 33% aggregate, sequential read bandwidth improvement over using 16 PCI-Express SAS-1 HBAs.

  • The Sun Storage F5100 Flash Array storage using 8 PCI-Express SAS-2 HBAs showed a 6% aggregate, sequential write bandwidth improvement over using 16 PCI-Express SAS-1 HBAs.

  • Each SAS port of the Sun Storage F5100 Flash Array storage delivered over 1 GB/sec sequential read performance.

  • Performance data is also presented utilizing smaller numbers of FMODs in the full configuration, demonstrating near perfect scaling from 20 to 80 FMODs.

The Sun Storage F5100 Flash Array storage is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

Performance Landscape

Results for the PCI-Express SAS-2 HBAs were obtained using four hosts, each configured with 2 HBAs.

Results for the PCI-Express SAS-1 HBAs were obtained using four hosts, each configured with 4 HBAs.

Bandwidth Measurements

Sequential Read (Aggregate GB/sec) for 1 MB Transfers
                       FMODs
HBA Configuration      1       20      40      80
8 SAS-2 HBAs           0.26    4.3     8.5     17.0
16 SAS-1 HBAs          0.26    3.2     6.4     12.8

Sequential Write (Aggregate GB/sec) for 1 MB Transfers
                       FMODs
HBA Configuration      1       20      40      80
8 SAS-2 HBAs           0.14    2.7     5.2     10.3
16 SAS-1 HBAs          0.12    2.4     4.8     9.7

Results and Configuration Summary

Storage Configuration:

Sun Storage F5100 Flash Array
80 Flash Modules
16 ports
4 domains (20 Flash Modules per domain)
CAM zoning - 5 Flash Modules per port

Server Configuration:

4 x Sun Fire X4270 servers, each with
16 GB memory
2 x 2.93 GHz Quad-core Intel Xeon X5570 processors
2 x PCI-Express SAS-2 External HBAs, firmware version SW1.1-RC5

Software Configuration:

OpenSolaris 2009.06 or Oracle Solaris 10 10/09
Vdbench 5.0

Benchmark Description

Two I/O performance metrics were measured on the Sun Storage F5100 Flash Array storage using Vdbench 5.0: 100% Sequential Read and 100% Sequential Write. These demonstrate the maximum performance and throughput of the storage system.

Vdbench is publicly available for download at: http://vdbench.org

Key Points and Best Practices

  • Please note that the Sun Storage F5100 Flash Array storage is a 4KB sector device. Doing I/Os of less than 4KB in size, or I/Os not aligned on 4KB boundaries, can impact performance on write operations (see the alignment sketch after this list).
  • Drive each Flash Module with 8 outstanding IOs.
  • Both ports of each LSI PCI-Express SAS-2 HBA were used.
  • SPARC platforms align with the 4K boundary size set by the Flash Array. x86/Windows platforms do not necessarily have this alignment built in and can show lower performance.
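As an illustration of the first point in this list, the sketch below shows one way to widen an arbitrary byte range to 4 KiB boundaries before issuing a write. The constant and helper names are chosen here for illustration and are not part of any Sun or Solaris API.

    SECTOR = 4096    # the F5100 Flash Modules present 4 KiB sectors

    def align_down(x, a=SECTOR):
        return x - (x % a)

    def align_up(x, a=SECTOR):
        return x + (-x % a)

    # Example: a 10,000-byte update starting at byte offset 6,000 is widened to a
    # single 4 KiB-aligned range (the caller would read-modify-write the edges).
    start, length = 6000, 10000
    aligned_start = align_down(start)
    aligned_len = align_up(start + length) - aligned_start
    print(aligned_start, aligned_len)    # 4096 12288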

See Also

Disclosure Statement

The Sun Storage F5100 Flash Array storage delivered 17.0 GB/sec sequential read and 10.3 GB/sec sequential write. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 20, 2010.

Wednesday Sep 22, 2010

Oracle Solaris 10 9/10 ZFS OLTP Performance Improvements

Oracle Solaris ZFS has seen significant performance improvements in the Oracle Solaris 10 9/10 release compared to the previous release, Oracle Solaris 10 10/09.
  • A 28% reduction in response time, holding the load constant, in an OLTP workload test comparing the Oracle Solaris 10 9/10 release to Oracle Solaris 10 10/09.
  • A 19% increase in IOPS throughput, holding the response time constant at 28 msec, in an OLTP workload test comparing the Oracle Solaris 10 9/10 release to Oracle Solaris 10 10/09.
  • OLTP workload throughput rates of at least 800 IOPS, using Oracle's Sun SPARC Enterprise T5240 server and Oracle's StorageTek 2540 array, were used in calculating the above improvement percentages.

Performance Landscape

8K Block Random Read/Write OLTP-Style Test

         Response Time (msec)
IOPS     Oracle Solaris 10 9/10    Oracle Solaris 10 10/09
100      5.1                       8.3
500      11.7                      24.6
800      20.1                      28.1
900      23.9                      32.0
950      28.8                      34.4

Results and Configuration Summary

Storage Configuration:

1 x StorageTek 2540 Array
12 x 73 GB 15K RPM HDDs
2 RAID5 5+1 volumes
1 RAID0 host stripe across the volumes

Server Configuration:

1 x Sun SPARC Enterprise T5240 server with
8 GB memory
2 x 1.6 GHz UltraSPARC T2 Plus processors

Software Configuration:

Oracle Solaris 10 10/09
Oracle Solaris 10 9/10
ZFS
SVM

Benchmark Description

The test consists of a mixture of random 8K block reads and writes accessing a significant portion of the available storage. As such, the workload is not very "cache friendly" and, hence, illustrates the capability of the system to more fully utilize the processing capability of the back-end storage.

See Also

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/20/2010.

Tuesday Sep 21, 2010

Sun Flash Accelerator F20 PCIe Cards Outperform IBM on SPC-1C

Oracle's Sun Flash Accelerator F20 PCIe cards delivered outstanding value as measured by the SPC-1C benchmark, showing the advantage of Oracle's Sun FlashFire technology in terms of both performance and price/performance.
  • Three Sun Flash Accelerator F20 PCIe cards delivered an aggregate of 72,521.11 SPC-1C IOPS, achieving the best price/performance (TSC price / SPC-1C IOPS).

  • The Sun Flash Accelerator F20 PCIe cards delivered 61% better performance than the IBM System Storage EXP12S at 1/5th the TSC price.

  • The Sun Flash Accelerator F20 PCIe cards delivered 9x better price/performance (TSC price / SPC-1C IOPS) than the IBM System Storage EXP12S.

  • The Sun Flash Accelerator F20 PCIe cards and the workload generator were run and priced inside Oracle's Sun Fire X4270 M2 server. The storage and workload generator together used the same space (2 RU) as the IBM System Storage EXP12S alone.

  • The Sun Flash Accelerator F20 PCIe cards delivered 6x better access density (SPC-1C IOPS / ASU Capacity (GB)) than the IBM System Storage EXP12S.

  • The Sun Flash Accelerator F20 PCIe cards delivered 1.5x better price / storage capacity (TSC / ASU Capacity (GB)) than the IBM System Storage EXP12S.

  • This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

System       SPC-1C IOPS™   ASU Capacity (GB)   TSC        Data Protection Level   LRT Response (µsecs)   Access Density   Price/Perf   $/GB     Identifier
Sun F5100    300,873.47     1374.390            $151,381   unprotected             330                    218.9            $0.50        110.14   C00010
Sun F20      72,521.11      147.413             $15,554    unprotected             468                    492.0            $0.21        105.51   C00011
IBM EXP12S   45,000.20      547.610             $87,468    unprotected             460                    82.2             $1.94        159.76   E00001

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit (ASU) capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (µsecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU Capacity (GB), bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better.
$/GB – Derived metric of TSC / ASU Capacity (GB), smaller is better
Pricing for the IBM EXP12S included maintenance, pricing for the Sun F20 did not
Identifier – The SPC-1C submission identifier
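
The derived metrics defined above can be recomputed directly from the table; a short Python check using the Sun F20 row:

    spc1c_iops = 72_521.11      # SPC-1C IOPS
    asu_capacity_gb = 147.413   # ASU Capacity (GB)
    tsc_price = 15_554          # TSC ($), not including maintenance

    access_density = spc1c_iops / asu_capacity_gb   # ~492 SPC-1C IOPS per GB
    price_per_iops = tsc_price / spc1c_iops         # ~$0.21 per SPC-1C IOPS
    price_per_gb   = tsc_price / asu_capacity_gb    # ~$105.51 per GB

    print(f"Access Density: {access_density:.1f}")
    print(f"Price/Perf:     ${price_per_iops:.2f}")
    print(f"$/GB:           ${price_per_gb:.2f}")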

Results and Configuration Summary

Storage Configuration:

3 x Sun Flash Accelerator F20 PCIe cards (4 FMODs each)

Hardware Configuration:

1 x Sun Fire X4270 M2 server
12 GB memory
1 x 2.93 GHz Intel Xeon X5670 processor

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC Benchmark 1C™ (SPC-1C) is the first Storage Performance Council (SPC) component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs), intelligent enclosures, and storage software such as logical volume managers. SPC-1C utilizes an identical workload to SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business critical applications. Those applications are characterized by predominately random I/O operations and require both query and update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand alone configuration. An external enclosure may be used, but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

  • For best performance, ensure partitions start on a 4K aligned boundary.

See Also

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Flash Accelerator F20 PCIe cards SPC-1C submission identifier C00011 results of 72,521.11 SPC-1C IOPS over a total ASU capacity of 147.413 GB using unprotected data protection, a SPC-1C LRT of 0.468 milliseconds, a 100% load over all ASU response time of 6.17 milliseconds and a total TSC price (not including three-year maintenance) of $15,554. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.35. Derived metrics: Access Density (SPC-1C IOPS / ASU Capacity (GB)); Price / Performance (TSC / SPC-1C IOPS); Price / Storage Capacity (TSC / ASU Capacity (GB))

The Sun Flash Accelerator F20 PCIe card is a single half-height, low-profile PCIe card. The IBM System Storage EXP12S is a 2RU (3.5") array.

Monday Aug 23, 2010

Repriced: SPC-1 Sun Storage 6180 Array (8Gb) 1.9x Better Than IBM DS5020 in Price-Performance

Results are presented on Oracle's Sun Storage 6180 array with 8Gb connectivity for the SPC-1 benchmark.
  • The Sun Storage 6180 array is more than 1.9 times better in price-performance compared to the IBM DS5020 system as measured by the SPC-1 benchmark.

  • The Sun Storage 6180 array delivers 50% more SPC-1 IOPS than the previous generation Sun Storage 6140 array and IBM DS4700 on the SPC-1 benchmark.

  • The Sun Storage 6180 array is more than 3.1 times better in price-performance compared to the NetApp FAS3040 system as measured by the SPC-1 benchmark.

  • The Sun Storage 6180 array betters the Hitachi 2100 system by 34% in price-performance on the SPC-1 benchmark.

  • The Sun Storage 6180 array has 16% better IOPS/disk drive performance than the Hitachi 2100 on the SPC-1 benchmark.

Performance Landscape

Select results for the SPC-1 benchmark comparing competitive systems (ordered by performance), data as of August 6th, 2010 from the Storage Performance Council website.

Sponsor   System          SPC-1 IOPS   $/SPC-1 IOPS   ASU Capacity (GB)   TSC Price   Data Protection Level   Results Identifier
Hitachi   HDS 2100        31,498.58    $5.85          3,967.500           $187,321    Mirroring               A00076
NetApp    FAS3040         30,992.39    $13.58         12,586.586          $420,800    RAID6                   A00062
Oracle    SS6180 (8Gb)    26,090.03    $4.37          5,145.060           $114,042    Mirroring               A00084
IBM       DS5020 (8Gb)    26,090.03    $8.46          5,145.060           $220,778    Mirroring               A00081
Fujitsu   DX80            19,492.86    $3.45          5,355.400           $67,296     Mirroring               A00082
Oracle    STK6140 (4Gb)   17,395.53    $4.93          1,963.269           $85,823     Mirroring               A00048
IBM       DS4700 (4Gb)    17,195.84    $11.67         1,963.270           $200,666    Mirroring               A00046

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result
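
The price-performance claims in the bullets above follow directly from the $/SPC-1 IOPS column; a short Python check using the values copied from the table:

    price_perf = {                 # $/SPC-1 IOPS from the table above
        "Oracle SS6180 (8Gb)": 4.37,
        "IBM DS5020 (8Gb)": 8.46,
        "NetApp FAS3040": 13.58,
        "Hitachi HDS 2100": 5.85,
    }

    base = price_perf["Oracle SS6180 (8Gb)"]
    for system, value in price_perf.items():
        if not system.startswith("Oracle"):
            print(f"{system}: {value / base:.2f}x the $/SPC-1 IOPS of the SS6180")
    # IBM DS5020 -> ~1.94x (the "more than 1.9 times better" claim)
    # NetApp     -> ~3.11x (the "more than 3.1 times better" claim)
    # Hitachi    -> ~1.34x (the "34% better" claim)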

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

80 x 146.8GB 15K RPM drives
4 x Qlogic QLE 2560 HBA

Server Configuration:

IBM system x3850 M2

Software Configuration:

MS Windows 2003 Server SP2
SPC-1 benchmark kit

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry standard storage benchmark and the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominately random I/O operations as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price-performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1, SPC-1 IOPS, and $/SPC-1 IOPS are registered trademarks of the Storage Performance Council (SPC). More info at www.storageperformance.org; results as of 8/6/2010. Sun Storage 6180 array 26,090.03 SPC-1 IOPS, ASU Capacity 5,145.060 GB, $/SPC-1 IOPS $4.37, Data Protection Mirroring, Cost $114,042, Ident. A00084.

Repriced: SPC-2 (RAID 5 & 6 Results) Sun Storage 6180 Array (8Gb) Outperforms IBM DS5020 by up to 64% in Price-Performance

Results are presented on Oracle's Sun Storage 6180 array with 8 Gb connectivity for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6180 array outperforms the IBM DS5020 system by 62% in price-performance for SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6180 array outperforms the IBM DS5020 system by 64% in price-performance for SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6180 array is over 50% faster than the previous generation systems, the Sun Storage 6140 array and IBM DS4700, on the SPC-2 benchmark using RAID 5 data protection.

Performance Landscape

Select results from Oracle and IBM competitive systems for the SPC-2 benchmark (in performance order), data as of August 7th, 2010 from the Storage Performance Council website.

Sponsor   System   SPC-2 MBPS   $/SPC-2 MBPS   ASU Capacity (GB)   TSC Price   Data Protection Level   Results Identifier
Oracle    SS6180   1,286.74     $56.88         3,504.693           $73,190     RAID 6                  B00044
IBM       DS5020   1,286.74     $93.26         3,504.693           $120,002    RAID 6                  B00042
Oracle    SS6180   1,244.89     $50.40         3,504.693           $62,747     RAID 5                  B00043
IBM       DS5020   1,244.89     $81.73         3,504.693           $101,742    RAID 5                  B00041
IBM       DS4700   823.62       $106.73        1,748.874           $87,903     RAID 5                  B00028
Oracle    ST6140   790.67       $67.82         1,675.037           $53,622     RAID 5                  B00017
Oracle    ST2540   735.62       $37.32         2,177.548           $27,451     RAID 5                  B00021
Oracle    ST2530   672.05       $26.15         1,451.699           $17,572     RAID 5                  B00026

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

Sun Storage 6180 array with 4GB cache
30 x 146.8GB 15K RPM drives (for RAID 5)
36 x 146.8GB 15K RPM drives (for RAID 6)
4 x PCIe 8 Gb single port HBA

Server Configuration:

IBM system x3850 M2

Software Configuration:

Microsoft Windows 2003 Server SP2
SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

See Also

Disclosure Statement

SPC-2, SPC-2 MBPS, and $/SPC-2 MBPS are registered trademarks of the Storage Performance Council (SPC). More info at www.storageperformance.org; results as of 8/9/2010. Sun Storage 6180 Array 1,286.74 SPC-2 MBPS, $/SPC-2 MBPS $56.88, ASU Capacity 3,504.693 GB, Protect RAID 6, Cost $73,190, Ident. B00044. Sun Storage 6180 Array 1,244.89 SPC-2 MBPS, $/SPC-2 MBPS $50.40, ASU Capacity 3,504.693 GB, Protect RAID 5, Cost $62,747, Ident. B00043.

Thursday Jun 10, 2010

Hyperion Essbase ASO World Record on Sun SPARC Enterprise M5000

Oracle's Sun SPARC Enterprise M5000 server is an excellent platform for implementing Oracle Essbase as demonstrated by the Aggregate Storage Option (ASO) benchmark.

  • Oracle's Sun SPARC Enterprise M5000 server with Oracle Solaris 10 and using Oracle's Sun Storage F5100 Flash Array system has achieved world record performance running the Oracle Essbase Aggregate Storage Option benchmark using Oracle Hyperion Essbase 11.1.1.3 and the Oracle 11g database.

  • The workload used over 1 billion records in a 15 dimensional database with millions of members. Oracle Hyperion is a component of Oracle Fusion Middleware.

  • The Sun Storage F5100 Flash Array system provides more than 20% improvement out of the box compared to a mid-size Fibre Channel disk array for default aggregation and user-based aggregation.

  • The Sun SPARC Enterprise M5000 server with Sun Storage F5100 Flash Array system and Oracle Hyperion Essbase 11.1.1.3 running on Oracle Solaris 10 provides less than 1 second query response times for 20K users in a 15 dimensional database.

  • The Sun Storage F5100 Flash Array system and Oracle Hyperion Essbase provide an excellent combination for large Essbase databases, leveraging ZFS and taking advantage of the array's high bandwidth for faster loads and aggregations.

  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle Hyperion's performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

Performance Landscape

System                            Database Size   Data Load   Default Aggregation   User Aggregation
Sun M5000, 2.53 GHz SPARC64 VII   1000M           269 min     526 min               115 min
Sun M5000, 2.4 GHz SPARC64 VII    400M            120 min     448 min               18 min

Less time means a faster result.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise M5000
      4 x SPARC64 VII, 2.53 GHz
      64 GB memory
    Sun Storage F5100 Flash Array
      40 x 24 GB Flash modules

Software Configuration:

    Oracle Solaris 10
    Oracle Solaris ZFS
    Installer V 11.1.1.3
    Oracle Hyperion Essbase Client v 11.1.1.3
    Oracle Hyperion Essbase v 11.1.1.3
    Oracle Hyperion Essbase Administration services 64-bit
    Oracle Weblogic 9.2MP3 -- 64 bit
    Oracle Fusion Middleware
    Oracle RDBMS 11.1.0.7 64-bit

Benchmark Description

The benchmark highlights how Oracle Essbase can support pervasive deployments in large enterprises. It simulates an organization that needs to support a large Essbase Aggregate Storage database with over one billion data items, a large dimension with 14 million members, and 20 thousand active concurrent users, each operating in mixed mode: ad-hoc reporting and report viewing. The application for this benchmark was designed to model a scaled-out version of a financial business intelligence application.

The benchmark simulates typical administrative and user operations in an OLAP application environment. Administrative operations include dimension build, data load, and data aggregation. User testing modeled a total user base of 200,000, with 10 percent actively retrieving data from Essbase.

Key Points and Best Practices

  • Sun Storage F5100 Flash Array system has been used to accelerate the application performance.
  • Jumbo frames were enabled to speed up data loading.

See Also

Disclosure Statement

Oracle Essbase, www.oracle.com/solutions/mid/oracle-hyperion-enterprise.html, results 5/20/2010.

Wednesday Apr 14, 2010

Oracle Sun Storage F5100 Flash Array Delivers World Record SPC-1C Performance

Oracle's Sun Storage F5100 flash array delivered world record performance on the SPC-1C benchmark. The SPC-1C benchmark shows the advantage of Oracle's FlashFire technology.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance of 300,873.47 SPC-1C IOPS.

  • The Sun Storage F5100 flash array requires half the rack space of the next best result, the IBM System Storage EXP12S.

  • The Sun Storage F5100 flash array delivered nearly seven times better SPC-1C IOPS performance than the next best SPC-1C result, the IBM System Storage EXP12S with 8 SSDs.

  • The Sun Storage F5100 flash array delivered the world record SPC-1C LRT (response time) performance of 330 microseconds, and a full load response time of 2.63 milliseconds, which is over 2.5x better than the IBM System Storage EXP12S SPC-1C result.

  • Compared to the IBM result, the Sun Storage F5100 flash array delivered 2.7x better access density (SPC-1C IOPS/ ASU GB), 3.9x better price/performance (TSC/ SPC-1C IOPS) and 31% better tested $/GB (TSC/ ASU) as part of these SPC-1C benchmark results.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance using the SPC-1C workload driven by the Sun SPARC Enterprise M5000 server. This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

System       SPC-1C IOPS   ASU Capacity (GB)   TSC        Data Protection Level   LRT Response (usecs)   Access Density   Price/Perf   $/GB    Identifier
Sun F5100    300,873.47    1374.390            $151,381   unprotected             330                    218.9            $0.50        110.1   C00010
IBM EXP12S   45,000.20     547.610             $87,468    unprotected             460                    82.2             $1.94        159.8   E00001

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (usecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU GB, bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better
$/GB – Derived metric of TSC / ASU, smaller is better
Identifier – The SPC-1C submission identifier

Results and Configuration Summary

Storage Configuration:

1 x Sun Storage F5100 flash array with 80 FMODs

Hardware Configuration:

1 x Sun SPARC Enterprise M5000 server
16 x StorageTek PCIe SAS Host Bus Adapter, 8 Port

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC-1C is the first SPC component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs) intelligent enclosures, and storage software such as Logical Volume Managers. SPC-1C utilizes an identical workload as SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business critical applications. Those applications are characterized by predominately random I/O operations and require both queries as well as update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand alone configuration. An external enclosure may be used but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E Submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.

The Sun Storage F5100 flash array is a 1RU (1.75") array. The IBM System Storage EXP12S is a 2RU (3.5") array.

Monday Mar 29, 2010

Sun Blade X6275/QDR IB/ Reverse Time Migration

Significance of Results

Oracle's Sun Blade X6275 cluster with a Lustre file system was used to demonstrate the performance potential of the system when running reverse time migration applications complete with I/O processing.

  • Reduced the Total Application run time for the Reverse Time Migration when processing 800 input traces for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes, by implementing algorithm I/O optimizations and taking advantage of MPI I/O features in HPC ClusterTools:

    • 1243x1151x1231 - Wall clock time reduced from 11.5 to 6.3 minutes (1.8x improvement)
    • 2486x1151x1231 - Wall clock time reduced from 21.5 to 13.5 minutes (1.6x improvement)
  • Reduced the I/O Intensive Trace-Input time for the Reverse Time Migration when reading 800 input traces for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node data requirement and avoiding unneeded synchronization:

    • 2486x1151x1231 : Time reduced from 121.5 to 3.2 seconds (38.0x improvement)
    • 1243x1151x1231 : Time reduced from 71.5 to 2.2 seconds (32.5x improvement)
  • Reduced the I/O Intensive Grid Initialization time for the Reverse Time Migration Grid when reading the Velocity, Epsilon, and Delta slices for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node grid data requirement:

    • 2486x1151x1231 : Time reduced from 15.6 to 4.9 seconds (3.2x improvement)
    • 1243x1151x1231 : Time reduced from 8.9 to 1.2 seconds (7.4x improvement)

Performance Landscape

In the tables below, the hyperthreading feature is enabled and the systems are fully utilized.

This first table presents the total application performance in minutes. The overall performance improved significantly because of the improved I/O performance and other benefits.


Total Application Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
        1243 x 1151 x 1231, 800 Samples                     2486 x 1151 x 1231, 800 Samples
Nodes   Original (mins)   MPI I/O (mins)   Improvement      Original (mins)   MPI I/O (mins)   Improvement
24      11.5              6.3              1.8x             21.5              13.5             1.6x
20      12.0              8.0              1.5x             21.9              15.4             1.4x
16      13.8              9.7              1.4x             26.2              18.0             1.5x
12      21.7              13.2             1.6x             29.5              23.1             1.3x

This next table presents the initialization I/O time. The results are presented in seconds and show the advantage of the improved MPI I/O strategy.


Initialization Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
        1243 x 1151 x 1231, 800 Samples                    2486 x 1151 x 1231, 800 Samples
Nodes   Original (sec)   MPI I/O (sec)   Improvement       Original (sec)   MPI I/O (sec)   Improvement
24      8.9              1.2             7.4x              15.6             4.9             3.2x
20      9.3              1.5             6.2x              16.9             3.9             4.3x
16      9.7              2.5             3.9x              17.4             11.3            1.5x
12      9.8              3.3             3.0x              22.5             14.9            1.5x

This last table presents the trace I/O time. The results are presented in seconds and show the significant advantage of the improved MPI I/O strategy.


Trace I/O Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode
        1243 x 1151 x 1231, 800 Samples                    2486 x 1151 x 1231, 800 Samples
Nodes   Original (sec)   MPI I/O (sec)   Improvement       Original (sec)   MPI I/O (sec)   Improvement
24      71.5             2.2             32.5x             121.5            3.2             38.0x
20      67.7             2.4             28.2x             118.3            3.9             30.3x
16      64.2             2.7             23.7x             110.7            4.6             24.1x
12      69.9             4.2             16.6x             296.3            14.6            20.3x

Results and Configuration Summary

Hardware Configuration:

Oracle's Sun Blade 6048 Modular System with
12 x Oracle's Sun Blade X6275 Server Modules, each with
4 x 2.93 GHz Intel Xeon QC X5570 processors
12 x 4 GB memory at 1333 MHz
2 x 24 GB Internal Flash
QDR InfiniBand Lustre 1.8.0.1 File System

Software Configuration:

OS: 64-bit SUSE Linux Enterprise Server SLES 10 SP 2
MPI: Oracle Message Passing Toolkit 8.2.1 for I/O optimization to Lustre file system
MPI: Scali MPI Connect 5.6.6-59413 for original Lustre file system runs
Compiler: Oracle Solaris Studio 12 C++, Fortran, OpenMP

Benchmark Description

The primary objective of this Reverse Time Migration benchmark is to present MPI I/O tuning techniques, exploit the power of Sun's HPC ClusterTools MPI I/O implementation, and demonstrate the world-class performance of Sun's Lustre File System to exploration geophysicists throughout the world. A Sun Blade 6048 Modular System with 12 Sun Blade X6275 server modules was clustered together with a QDR InfiniBand Lustre File System to show performance improvements in Reverse Time Migration throughput by using the Sun HPC ClusterTools MPI-IO features to implement specific algorithm I/O optimizations.

This Reverse Time Migration benchmark measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this new I/O optimized version, each node reads in only the data to be processed by that node plus a 4 element inline pad shared with its neighbors to the left and right. This latest version essentially loads the boundary condition data during the initialization phase. The previous version handled boundary conditions by having each node read in all the trace, velocity, and conditioning data, or, alternatively, by having the master node read in all the data and distribute it in its entirety to every node in the cluster. With the previous version, each node had full memory copies of all input data sets even when it only processed a subset of that data. The new version only holds the inline dimensions and pads to be processed by a particular node in memory.

Key Points and Best Practices

  • The original implementation of the trace I/O involved the master node reading in nx * ny floats and communicating this trace data to all the other nodes in a synchronous manner. Each node only used a subset of the trace data for each of the 800 time steps. The optimized I/O version has each node asynchronously read in only the (nx/num_procs + 8) * ny floats that it will be processing. The additional 8 inline values in the optimized I/O version are the 4 element pads of a node's left and right neighbors, needed to handle initial boundary conditions. The MPI_Barrier needed by the original implementation for synchronization, and the additional I/O for each node to load all the data values, truly impact performance. In the I/O optimized version, each node reads only the data values it needs and does not require the same MPI_Barrier synchronization as the original version of the Reverse Time Migration benchmark. By performing such I/O optimizations, a significant improvement is seen in the trace I/O (see the sketch after this list).

  • For the best MPI performance, allocate the X6275 nodes in blade-by-blade order and run with hyperthreading enabled. The "Binary Conditioning" part of the Reverse Time Migration benefits particularly from hyperthreading.

  • To get the best I/O performance, use a maximum of 70% of each node's available memory for the Reverse Time Migration application. Execution times and I/O results may vary if the nodes have different memory size configurations.
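
Below is a minimal sketch of the per-rank trace read described in the first bullet, written with mpi4py purely for illustration; the actual benchmark used Sun HPC ClusterTools MPI-IO, the file name here is a placeholder, and the grid sizes are taken from the 2486 x 1151 survey. Each rank reads only its own inline slab plus a 4-element pad on each side, with no global barrier.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, nprocs = comm.Get_rank(), comm.Get_size()

    nx, ny, pad = 2486, 1151, 4                 # inlines, crosslines, halo width
    chunk = -(-nx // nprocs)                    # inlines owned by each rank (ceiling division)
    start = max(rank * chunk - pad, 0)          # include the left halo
    stop = min((rank + 1) * chunk + pad, nx)    # include the right halo

    fh = MPI.File.Open(comm, "traces.bin", MPI.MODE_RDONLY)
    buf = np.empty((stop - start) * ny, dtype=np.float32)
    fh.Read_at(start * ny * 4, buf)             # independent read of this rank's slab only
    fh.Close()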

See Also

Thursday Jan 21, 2010

SPARC Enterprise M4000 PeopleSoft NA Payroll 240K Employees Performance (16 Streams)

The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology, the Sun Storage F5100 flash array, has produced world record performance on the PeopleSoft Payroll 9.0 (North American) 240K employees benchmark.

  • The Sun SPARC Enterprise M4000 server with four 2.53 GHz SPARC64 VII processors and the Sun Storage F5100 flash array using 16 job streams (payroll threads) is 55% faster than the HP rx6600 (4 x 1.6GHz Itanium2 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and the Sun Storage F5100 flash array is 2.1x faster than the 2027 MIPS IBM Z990 (6 Z990 Gen1 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10, while the IBM result was run with 8 payroll threads and used IBM DB2 for Z/OS 8.1 for the database.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array processed payroll for 240K employees using PeopleSoft Payroll 9.0 (North American) and Oracle 11gR1 running on Solaris 10 with different execution strategies, which resulted in a maximum CPU utilization of 45%, compared to HP's reported CPU utilization of 89%.

  • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology processed 16 Sequential Jobs and single run control with a total time of 534 minutes, an improvement of 19% compared to HP's time of 633 minutes.

  • Sun's FlashFire technology dramatically improves I/O performance for the PeopleSoft Payroll 9.0 (North American) benchmark, delivering a significant performance boost over a well-optimized configuration of 60+ FC disks.

  • The Sun Storage F5100 Flash Array is a high-performance, high-density solid-state flash array that provides a read latency of only 0.5 msec, about 10 times faster than the 5 msec disk latencies measured on this benchmark.

  • Sun estimates that the MIPS rating for a Sun SPARC Enterprise M4000 server is over 3000 MIPS.

Performance Landscape

240K Employees

System      Processor                OS/Database            Run 1   Run 2    Run 3    Payroll Processing Result   Num of Streams   Ver
                                                            (min)   (min)    (min)    (min)
Sun M4000   4x 2.53GHz SPARC64 VII   Solaris/Oracle 11gR1   43.78   51.26    286.11   534.35                      16               9.0
HP rx6600   4x 1.6GHz Itanium2       HP-UX/Oracle 11g       68.07   81.17    350.16   633.25                      16               9.0
IBM Z990    6x Gen1 2027 MIPS        Z/OS/DB2               91.70   107.34   328.66   544.80                      8                9.0

Note: IBM benchmark documents show that 6 Gen1 processors equal 2027 MIPS. The tested configuration contained 13 Gen1 processors, but only 6 were available for testing.

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M4000 (4 x 2.53 GHz/32GB)
    1 x Sun Storage F5100 Flash Array (40 x 24GB FMODs)
    1 x Sun Storage J4200 (12 x 450GB SAS 15K RPM)

Software Configuration:

    Solaris 10 5/09
    Oracle PeopleSoft HCM 9.0 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.08 64-bit
    Micro Focus Server Express 4.0 SP4 64-bit
    Oracle RDBMS 11.1.0.7 64-bit
    HP's Mercury Interactive QuickTest Professional 9.0

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

  • Paysheet Creation: generates a payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice Forms: takes the information generated by Payroll Calculation and Confirmation and produces an Advice for each employee to report earnings, taxes, deductions, etc.

  • Create Direct Deposit File: takes the information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, at least three data points are collected with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of sixteen job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun M4000 (4 2.53GHz SPARC64) 43.78 min, IBM Z990 (6 gen1) 91.70 min, HP rx6600 (4 1.6GHz Itanium2) 68.07 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 1/21/2010.

Wednesday Nov 18, 2009

Sun Flash Accelerator F20 PCIe Card Achieves 100K 4K IOPS and 1.1 GB/sec

Part of the Sun FlashFire family, the Sun Flash Accelerator F20 PCIe Card is a low-profile x8 PCIe card with 4 Solid State Disks-on-Modules (DOMs) delivering over 101K IOPS (4K IO) and 1.1 GB/sec throughput (1M reads).

The Sun F20 card is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

  • The Sun Flash Accelerator F20 PCIe Card demonstrates breakthrough performance of 101K IOPS for 4K random read
  • The Sun Flash Accelerator F20 PCIe Card can also perform 88K IOPS for 4K random write
  • The Sun Flash Accelerator F20 PCIe Card has unprecedented throughput of 1.1 GB/sec.
  • The Sun Flash Accelerator F20 PCIe Card (low-profile x8 size) has the IOPS performance of over 550 SAS drives or 1,100 SATA drives.

Performance Landscape

Bandwidth and IOPS Measurements

Test                                      4 DOMs       2 DOMs       1 DOM
Random 4K Read                            101K IOPS    68K IOPS     35K IOPS
Maximum Delivered Random 4K Write         88K IOPS     44K IOPS     22K IOPS
Maximum Delivered 50-50 4K Read/Write     54K IOPS     27K IOPS     13K IOPS
Sequential Read (1M)                      1.1 GB/sec   547 MB/sec   273 MB/sec
Maximum Delivered Sequential Write (1M)   567 MB/sec   243 MB/sec   125 MB/sec

Sustained Random 4K Write*                37K IOPS     18K IOPS     10K IOPS
Sustained 50/50 4K Read/Write*            34K IOPS     17K IOPS     8.6K IOPS

(*) Maximum Delivered values are measured over a 1 minute period. Sustained write performance differs from maximum delivered performance. Over time, wear-leveling and erase operations are required and impact write performance levels.

Latency Measurements

The Sun Flash Accelerator F20 PCIe Card is tuned for 4 KB or larger I/O sizes; the write service time for I/Os smaller than 4 KB can be 10 times higher than shown in the table below. Note also that the service times shown include both the latency and the time to transfer the data, and the transfer time becomes the dominant portion of the service time for I/Os over 64 KB in size (a rough model of this appears after the table).

Transfer Size   Read Service Time (ms)   Write Service Time (ms)
4 KB            0.32                     0.22
8 KB            0.34                     0.24
16 KB           0.37                     0.27
32 KB           0.43                     0.33
64 KB           0.54                     0.46
128 KB          0.49                     1.30
256 KB          1.31                     2.15
512 KB          2.25                     2.25

- Latencies are application-level latencies measured with the Vdbench tool.
- Please note that the FlashFire F20 card is a 4KB sector device. Doing IOs of less than 4KB in size, or not aligned on 4KB boundaries, can result in significant performance degradation on write operations.
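
A rough back-of-the-envelope model of the large-transfer rows above, assuming service time ≈ small-I/O latency + transfer size / bandwidth, and assuming the single-DOM sequential read rate (273 MB/sec from the bandwidth table) as the per-request transfer bandwidth; both choices are illustrative assumptions, not measured parameters.

    base_latency_ms = 0.32                   # 4 KB read service time from the table
    per_dom_bw_bytes_per_ms = 273e6 / 1000   # ~273 MB/sec single-DOM sequential read

    for size_kb in (64, 256, 512):
        transfer_ms = size_kb * 1024 / per_dom_bw_bytes_per_ms
        est = base_latency_ms + transfer_ms
        print(f"{size_kb:3d} KB read: transfer ~{transfer_ms:.2f} ms, estimate ~{est:.2f} ms")
    # At 512 KB the transfer term (~1.9 ms) dwarfs the 0.32 ms base latency,
    # which is why transfer time dominates service time for large I/Os.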

Results and Configuration Summary

Storage:

    Sun Flash Accelerator F20 PCIe Card
      4 x 24-GB Solid State Disks-on-Modules (DOMs)

Servers:

    1 x Sun Fire X4170

Software:

    OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
    Vdbench 5.0
    Required Flash Array Patches SPARC, ses/sgen patch 138128-01 or later & mpt patch 141736-05
    Required Flash Array Patches x86, ses/sgen patch 138129-01 or later & mpt patch 141737-05

Benchmark Description

Sun measured a wide variety of IO performance metrics on the Sun Flash Accelerator F20 PCIe Card using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

The Vdbench profile f20-parmfile.txt was used for the bandwidth and IOPS measurements; the profile f20-latency.txt was used for the latency measurements.

Vdbench is publicly available for download at: http://vdbench.org
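
For illustration only, here is a minimal Python sketch of the kind of 50-50 random 4K read/write workload described above: fixed-size, 4 KB-aligned I/Os issued by a pool of worker threads to keep a constant number of requests outstanding. The device path, capacity, and I/O count are placeholders, and the 32 outstanding I/Os per module follow the Key Points below; this is not the Vdbench profile itself.

    import os, random, threading

    DEV = "/dev/rdsk/cXtYdZs0"   # placeholder path to one flash module
    IO_SIZE = 4096               # 4 KB, matching the card's sector size
    OUTSTANDING = 32             # outstanding I/Os per module
    DEVICE_BYTES = 24 * 10**9    # approximate module capacity (placeholder)
    IOS_PER_WORKER = 10_000

    def worker(fd: int) -> None:
        data = os.urandom(IO_SIZE)
        for _ in range(IOS_PER_WORKER):
            # choose a 4 KB-aligned offset so no I/O straddles a sector
            offset = random.randrange(DEVICE_BYTES // IO_SIZE) * IO_SIZE
            if random.random() < 0.5:
                os.pread(fd, IO_SIZE, offset)
            else:
                os.pwrite(fd, data, offset)

    fd = os.open(DEV, os.O_RDWR)
    threads = [threading.Thread(target=worker, args=(fd,)) for _ in range(OUTSTANDING)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.close(fd)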

Key Points and Best Practices

  • Drive each Flash Module with 32 outstanding IOs, as shown in the benchmark profile above.
  • SPARC platforms align with the 4K boundary size set by the Flash Array. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.

See Also

Disclosure Statement

Sun Flash Accelerator F20 PCIe Card delivered 100K 4K read IOPS and 1.1 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 14, 2009.

Wednesday Oct 28, 2009

SPC-2 Sun Storage 6780 Array RAID 5 & RAID 6 51% better $/performance than IBM DS5300

Significance of Results

Results on the Sun Storage 6780 Array with 8Gb connectivity are presented for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6780 array outperforms the IBM DS5300 by 51% in price performance for SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6780 array outperforms the IBM DS5300 by 51% in price performance for SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6780 Array has 62% better performance than the Fujitsu 800/1100 and delivers a price performance advantage of 5.6x as measured by the SPC-2 benchmark.

  • The Sun Storage 6780 array with 8Gb connectivity improved performance by 36% over the 4Gb connected solution as measured by the SPC-2 benchmark.

Performance Landscape

SPC-2 Performance Chart (in increasing price-performance order)

Sponsor   System         SPC-2 MBPS   $/SPC-2 MBPS   ASU Capacity (GB)   TSC Price   Data Protection Level   Date       Results Identifier
Sun       SS6780 (8Gb)   5,634.17     $44.88         16,383.186          $252,873    RAID 5                  10/27/09   B00047
IBM       DS5300 (8Gb)   5,634.17     $67.75         16,383.186          $381,720    RAID 5                  10/21/09   B00045
Sun       SS6780 (8Gb)   5,543.88     $45.61         14,042.731          $252,873    RAID 6                  10/27/09   B00048
IBM       DS5300 (8Gb)   5,543.88     $68.85         14,042.731          $381,720    RAID 6                  10/21/09   B00046
Sun       SS6780 (4Gb)   4,818.43     $53.61         16,383.186          $258,329    RAID 5                  02/03/09   B00039
IBM       DS5300 (4Gb)   4,818.43     $93.80         16,383.186          $451,986    RAID 5                  09/25/08   B00037
Sun       SS6780 (4Gb)   4,675.50     $55.25         14,042.731          $258,329    RAID 6                  02/03/09   B00040
IBM       DS5300 (4Gb)   4,675.50     $96.67         14,042.731          $451,986    RAID 6                  09/25/08   B00038
Fujitsu   800/1100       3,480.68     $238.93        4,569.845           $831,649    Mirroring               03/08/07   B00019

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

    8 x CM200 trays, each with 16 x 146GB 15K RPM drives
    8 x Qlogic 8Gb HBA

Server Configuration:

    4 x IBM x3650
      2 x 2.93 GHz Intel X5570
      5 GB memory

Software Configuration:

    Microsoft Windows Server 2003 Enterprise Edition (32-bit) with SP2
    SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

See Also

Benchmark Tags

$/Perf, performance, bandwidth, OpenStorage, Storage

Disclosure Statement

SPC-2, SPC-2 MBPS, and $/SPC-2 MBPS are registered trademarks of the Storage Performance Council (SPC). More info at www.storageperformance.org. Sun Storage 6780 Array 5,634.17 SPC-2 MBPS, $/SPC-2 MBPS $44.88, ASU Capacity 16,383.186 GB, Protect RAID 5, Cost $252,873.00, Ident. B00047. Sun Storage 6780 Array 5,543.88 SPC-2 MBPS, $/SPC-2 MBPS $45.61, ASU Capacity 14,042.731 GB, Protect RAID 6, Cost $252,873.00, Ident. B00048.

Publication Rules

See here for publication rules.

About

BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
