Wednesday Dec 08, 2010

Sun Blade X6275 M2 Cluster with Sun Storage 7410 Performance Running Seismic Processing Reverse Time Migration

This Oil & Gas benchmark highlights both the computational performance improvements of the Sun Blade X6275 M2 server module over the previous generation server and the linear scalability achievable for the total application throughput using a Sun Storage 7410 system to deliver almost 2 GB/sec effective I/O write performance.

Oracle's Sun Storage 7410 system attached via 10 Gigabit Ethernet to a cluster of Oracle's Sun Blade X6275 M2 server modules was used to demonstrate the performance of a 3D VTI Reverse Time Migration application, a heavily used geophysical imaging and modeling application for Oil & Gas Exploration. The total application throughput scaling and computational kernel performance improvements are presented for imaging two production sized grids using 800 input samples.

  • The Sun Blade X6275 M2 server module showed up to a 40% performance improvement over the previous generation server module with super-linear scalability to 16 nodes for the 9-Point Stencil used in this Reverse Time Migration computational kernel.

  • The balanced combination of Oracle's Sun Storage 7410 system over 10 GbE to the Sun Blade X6275 M2 server module cluster showed linear scalability for the total application throughput, including the I/O and MPI communication, to produce a final 3-D seismic depth imaged cube for interpretation.

  • The final image write time from the Sun Blade X6275 M2 server module nodes to Oracle's Sun Storage 7410 system achieved 10GbE line speed of 1.25 GBytes/second or better write performance. The effects of I/O buffer caching on the Sun Blade X6275 M2 server module nodes and 34 GByte write optimized cache on the Sun Storage 7410 system gave up to 1.8 GBytes/second effective write performance.

Performance Landscape

Server Generational Performance Improvements

Performance improvements for the Reverse Time Migration computational kernel using a Sun Blade X6275 M2 cluster are compared to the previous generation Sun Blade X6275 cluster. Hyper-threading was enabled for both configurations allowing 24 OpenMP threads for the Sun Blade X6275 M2 server module nodes and 16 for the Sun Blade X6275 server module nodes.

Sun Blade X6275 M2 Performance Improvements

Grid Size - 1243 x 1151 x 1231

| Number Nodes | X6275 Kernel Time (sec) | X6275 M2 Kernel Time (sec) | X6275 M2 Speedup |
|---|---|---|---|
| 16 | 306 | 242 | 1.3 |
| 14 | 355 | 271 | 1.3 |
| 12 | 435 | 346 | 1.3 |
| 10 | 541 | 390 | 1.4 |
| 8 | 726 | 555 | 1.3 |

Grid Size - 2486 x 1151 x 1231

| Number Nodes | X6275 Kernel Time (sec) | X6275 M2 Kernel Time (sec) | X6275 M2 Speedup |
|---|---|---|---|
| 16 | 728 | 576 | 1.3 |
| 14 | 814 | 679 | 1.2 |
| 12 | 945 | 797 | 1.2 |
| 10 | 1156 | 890 | 1.3 |
| 8 | 1511 | 1193 | 1.3 |

Application Scaling

Performance and scaling results of the total application, including I/O, for the reverse time migration demonstration application are presented. Results were obtained using a Sun Blade X6275 M2 server cluster with a Sun Storage 7410 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server node.

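Application Scaling Across Multiple Nodes

Grid Size - 1243 x 1151 x 1231

| Number Nodes | Total Time (sec) | Kernel Time (sec) | Total Speedup | Kernel Speedup |
|---|---|---|---|---|
| 16 | 501 | 242 | 2.1\* | 2.3\* |
| 14 | 583 | 271 | 1.8 | 2.0 |
| 12 | 681 | 346 | 1.6 | 1.6 |
| 10 | 807 | 390 | 1.3 | 1.4 |
| 8 | 1058 | 555 | 1.0 | 1.0 |

Grid Size - 2486 x 1151 x 1231

| Number Nodes | Total Time (sec) | Kernel Time (sec) | Total Speedup | Kernel Speedup |
|---|---|---|---|---|
| 16 | 1060 | 576 | 2.0 | 2.1\* |
| 14 | 1219 | 679 | 1.7 | 1.8 |
| 12 | 1420 | 797 | 1.5 | 1.5 |
| 10 | 1688 | 890 | 1.2 | 1.3 |
| 8 | 2085 | 1193 | 1.0 | 1.0 |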

\* Super-linear scaling due to the compute kernel fitting better into available cache for larger node counts
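The speedup columns can be reproduced from the reported times. The following is a minimal sketch, assuming speedup is measured relative to the 8-node run and that ideal scaling is nodes / 8; the function names are illustrative and not part of the benchmark code.

```python
# Total and kernel times (sec) for the 1243 x 1151 x 1231 grid,
# copied from the application scaling table.
TIMES = {
    16: (501, 242),
    14: (583, 271),
    12: (681, 346),
    10: (807, 390),
    8: (1058, 555),
}
BASE_NODES = 8  # assumption: speedups are relative to the 8-node run


def speedup(nodes, column):
    """Speedup over the 8-node run; column 0 = total, 1 = kernel."""
    return TIMES[BASE_NODES][column] / TIMES[nodes][column]


def efficiency(nodes, column):
    """Measured speedup divided by the ideal speedup nodes / 8."""
    return speedup(nodes, column) / (nodes / BASE_NODES)


for nodes in sorted(TIMES, reverse=True):
    print(nodes,
          round(speedup(nodes, 0), 1),
          round(speedup(nodes, 1), 1),
          round(efficiency(nodes, 1), 2))
```

An efficiency above 1.0 for the kernel at 16 nodes corresponds to the super-linear entries marked in the table.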

Image File Effective Write Performance

The performance for writing the final 3D image from a Sun Blade X6275 M2 server cluster over 10 Gigabit Ethernet to a Sun Storage 7410 system is presented. Each server allocated one core per node for MPI I/O, thus allowing 22 OpenMP compute threads per node with hyperthreading enabled. Captured performance analytics from the Sun Storage 7410 system indicate effective use of its 34 Gigabyte write optimized cache.

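Image File Effective Write Performance

Grid Size - 1243 x 1151 x 1231

| Number Nodes | Write Time (sec) | Write Performance (GB/sec) |
|---|---|---|
| 16 | 4.8 | 1.5 |
| 14 | 5.0 | 1.4 |
| 12 | 4.0 | 1.8 |
| 10 | 4.3 | 1.6 |
| 8 | 4.6 | 1.5 |

Grid Size - 2486 x 1151 x 1231

| Number Nodes | Write Time (sec) | Write Performance (GB/sec) |
|---|---|---|
| 16 | 10.2 | 1.4 |
| 14 | 10.2 | 1.4 |
| 12 | 11.3 | 1.3 |
| 10 | 9.1 | 1.6 |
| 8 | 9.7 | 1.5 |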

Note: Performance results better than 1.3GB/sec related to I/O buffer caching on server nodes.
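The effective write rates follow from the image size and the measured write time. The sketch below assumes 4-byte samples and 1 GB = 10^9 bytes; the helper names are hypothetical. It reproduces most table entries to one decimal place; a few differ slightly, presumably because the published times are rounded.

```python
BYTES_PER_SAMPLE = 4     # assumption: one single-precision value per grid point
LINE_RATE_GB_S = 1.25    # 10 Gigabit Ethernet line speed quoted in the text


def image_bytes(nx, ny, nz):
    """Size in bytes of the final 3D image for an nx x ny x nz grid."""
    return nx * ny * nz * BYTES_PER_SAMPLE


def effective_write_gb_s(nx, ny, nz, write_seconds):
    """Effective write rate in GB/sec (1 GB taken as 1e9 bytes)."""
    return image_bytes(nx, ny, nz) / write_seconds / 1e9


rate = effective_write_gb_s(1243, 1151, 1231, 4.8)   # 16-node run
print(round(rate, 1), rate > LINE_RATE_GB_S)
```

A rate above the 10 GbE line speed is only possible as an effective figure, which is consistent with the note above attributing it to I/O buffer caching on the server nodes.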

Configuration Summary

Hardware Configuration:

8 x Sun Blade X6275 M2 server modules (2 nodes per module), each node with
2 x 2.93 GHz Intel Xeon X5670 processors
48 GB memory (12 x 4 GB at 1333 MHz)
1 x QDR InfiniBand Host Channel Adapter

Sun Datacenter InfiniBand Switch IB-36
Sun Network 10 GbE Switch 72p

Sun Storage 7410 system connected via 10 Gigabit Ethernet
4 x 17 GB STEC ZeusIOPs SSD mirrored - 34 GB
40 x 750 GB 7500 RPM Seagate SATA disks mirrored - 14.4 TB
No L2ARC Readzilla Cache

Software Configuration:

Oracle Enterprise Linux Server release 5.5
Oracle Message Passing Toolkit 8.2.1c (for MPI)
Oracle Solaris Studio 12.2 C++, Fortran, OpenMP

Benchmark Description

This Vertical Transverse Isotropy (VTI) Anisotropic Reverse Time Depth Migration (RTM) application measures the total time it takes to image 800 samples of various production size grids and write the final image to disk for the next work flow step involving 3-D seismic volume interpretation. In doing so, it reports the compute, interprocessor communication, and I/O performance of the individual functions that comprise the total solution. Unlike most references for the Reverse Time Migration, that focus solely on the performance of the 3D stencil compute kernel, this demonstration code additionally reports the total throughput involved in processing large data sets with a full 3D Anisotropic RTM application. It provides valuable insight into configuration and sizing for specific seismic processing requirements. The performance effects of new processors, interconnects, I/O subsystems, and software technologies can be evaluated while solving a real Exploration business problem.

This benchmark study uses the "in-core" implementation of this demonstration code, where each node reads in only the trace, velocity, and conditioning data to be processed by that node plus a 4 element array pad (based on spatial order 8) shared with its neighbors to the left and right during the initialization phase. It maintains previous, current, and next wavefield state information for each of the source, receiver, and anisotropic wavefields in memory. The second two grid dimensions used in this benchmark are specifically chosen to be prime numbers to exaggerate the effects of data alignment. Algorithm adaptations for processing higher orders in space and alternative "out-of-core" solutions using SSDs for wave state checkpointing are implemented in this demonstration application to better understand the effects of problem size scaling. Care is taken to handle absorption boundary conditioning and a variety of imaging conditions appropriately.

RTM Application Structure:

Read Processing Parameter File, Determine Domain Decomposition, Initialize Data Structures, and Allocate Memory.

Read Velocity, Epsilon, and Delta Data Based on Domain Decomposition and create source, receiver, & anisotropic previous, current, and next wave states.

First Loop over Time Steps

Compute 3D Stencil for Source Wavefield (a,s) - 8th order in space, 2nd order in time
Propagate over Time to Create s(t,z,y,x) & a(t,z,y,x)
Inject Estimated Source Wavelet
Apply Absorption Boundary Conditioning (a)
Update Wavefield States and Pointers
Write Snapshot of Wavefield (out-of-core) or Push Wavefield onto Stack (in-core)
Communicate Boundary Information

Second Loop over Time Steps
Compute 3D Stencil for Receiver Wavefield (a,r) - 8th order in space, 2nd order in time
Propagate over Time to Create r(t,z,y,x) & a(t,z,y,x)
Read Receiver Trace and Inject Receiver Wavelet
Apply Absorption Boundary Conditioning (a)
Update Wavefield States and Pointers
Communicate Boundary Information
Read in Source Wavefield Snapshot (out-of-core) or Pop Off of Stack (in-core)
Cross-correlate Source and Receiver Wavefields
Update image using image conditioning parameters

Write 3D Depth Image i(z,x,y) = Sum over time steps s(t,z,x,y) \* r(t,z,x,y) or other imaging conditions.
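The two-loop structure above can be illustrated with a toy example. The following is a minimal single-process sketch in one spatial dimension, not the benchmark code: it assumes an isotropic medium with a constant Courant factor, zero-valued edge cells in place of absorption boundary conditioning, no MPI boundary exchange, and the simple zero-lag cross-correlation imaging condition. All function and variable names are hypothetical. It keeps the 8th order in space, 2nd order in time stencil and the in-core push and pop of source wavefield states.

```python
# Standard 8th-order central-difference weights for a second derivative
# on a unit-spaced grid.
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]
PAD = 4  # half the spatial order, matching the 4 element pad


def second_derivative(u, i):
    """8th-order approximation of the second derivative of u at index i."""
    acc = COEFFS[0] * u[i]
    for k in range(1, PAD + 1):
        acc += COEFFS[k] * (u[i - k] + u[i + k])
    return acc


def advance(prev, cur, courant_sq):
    """One 2nd-order-in-time update; the PAD cells at each edge stay zero."""
    nxt = [0.0] * len(cur)
    for i in range(PAD, len(cur) - PAD):
        nxt[i] = 2.0 * cur[i] - prev[i] + courant_sq * second_derivative(cur, i)
    return nxt


def migrate(n, wavelet, trace, src, rec, courant_sq=0.1):
    """Toy 1D migration: forward source loop, reverse receiver loop, image."""
    steps = len(wavelet)
    stack = []
    prev, cur = [0.0] * n, [0.0] * n
    for t in range(steps):                   # first loop over time steps
        nxt = advance(prev, cur, courant_sq)
        nxt[src] += wavelet[t]               # inject source wavelet
        prev, cur = cur, nxt                 # update wavefield states
        stack.append(cur)                    # in-core: push onto stack
    image = [0.0] * n
    prev, cur = [0.0] * n, [0.0] * n
    for t in reversed(range(steps)):         # second loop, reversed in time
        nxt = advance(prev, cur, courant_sq)
        nxt[rec] += trace[t]                 # inject receiver trace sample
        prev, cur = cur, nxt
        source_state = stack.pop()           # in-core: pop matching state
        for i in range(n):
            image[i] += source_state[i] * cur[i]   # cross-correlate
    return image


wavelet = [1.0, 0.5, 0.0, -0.5, -1.0] + [0.0] * 45
trace = [0.0] * 40 + [0.2, 0.4, 0.2] + [0.0] * 7
image = migrate(64, wavelet, trace, src=20, rec=40)
print(len(image))
```

The stack makes the memory cost of the in-core variant visible: one full wavefield is retained per time step, which is what the out-of-core variant trades for snapshot I/O.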

Key Points and Best Practices

This demonstration application represents a full Reverse Time Migration solution. Many references to the RTM application tend to focus on the compute kernel and ignore the complexity that the input, communication, and output bring to the task.

Image File MPI Write Performance Tuning

Changing the Image File Write from MPI non-blocking to MPI blocking and setting Oracle Message Passing Toolkit MPI environment variables revealed an 18x improvement in write performance to the Sun Storage 7410 system going from:

    86.8 to 4.8 seconds for the 1243 x 1151 x 1231 grid size
    183.1 to 10.2 seconds for the 2486 x 1151 x 1231 grid size

The Swat Sun Storage 7410 analytics data capture indicated an initial write performance of about 100 MB/sec with the MPI non-blocking implementation. After modifying to MPI blocking writes, Swat showed between 1.3 and 1.8 GB/sec with up to 13000 write ops/sec to write the final output image. The Swat results are consistent with the actual measured performance and provide valuable insight into the Reverse Time Migration application I/O performance.

The reason for this vast improvement has to do with whether the MPI file mode is sequential or not (MPI_MODE_SEQUENTIAL, O_SYNC, O_DSYNC). The MPI non-blocking routines, MPI_File_iwrite_at and MPI_Wait, typically used for overlapping I/O and computation, do not support sequential file access mode. Therefore, the application could not take full advantage of the Sun Storage 7410 system write optimized cache. In contrast, the MPI blocking routine, MPI_File_write_at, defaults to MPI sequential mode and the performance advantages of the write optimized cache are realized. Since writing the final image is at the end of RTM execution, there is no need to overlap the I/O with computation.

Additional MPI parameters used:

    setenv SUNW_MP_PROCBIND true
    setenv MPI_SPIN 1
    setenv MPI_PROC_BIND 1

Adjusting the Level of Multithreading for Performance

The level of multithreading (8, 10, 12, 22, or 24) for various components of the RTM should be adjustable based on the type of computation taking place. It is best to use the OpenMP num_threads clause to adjust the level of multithreading for each particular work task. Use numactl to specify how the threads are allocated to cores in accordance with the OpenMP parallelism level.

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 12/07/2010.

Tuesday Oct 26, 2010

3D VTI Reverse Time Migration Scalability On Sun Fire X2270-M2 Cluster with Sun Storage 7210

This Oil & Gas benchmark shows the Sun Storage 7210 system delivers almost 2 GB/sec bandwidth and realizes near-linear scaling performance on a cluster of 16 Sun Fire X2270 M2 servers.

Oracle's Sun Storage 7210 system attached via QDR InfiniBand to a cluster of sixteen of Oracle's Sun Fire X2270 M2 servers was used to demonstrate the performance of a Reverse Time Migration application, an important application in the Oil & Gas industry. The total application throughput and computational kernel scaling are presented for two production sized grids of 800 samples.

  • Both the Reverse Time Migration I/O and combined computation show near-linear scaling from 8 to 16 nodes on the Sun Storage 7210 system connected via QDR InfiniBand to a Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 2.0x improvement
      2486 x 1151 x 1231: 1.7x improvement
  • The computational kernel of the Reverse Time Migration has linear to super-linear scaling from 8 to 16 nodes in Oracle's Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231 : 2.2x improvement
      2486 x 1151 x 1231 : 2.0x improvement
  • Intel Hyper-Threading provides additional performance benefits to both the Reverse Time Migration I/O and computation when going from 12 to 24 OpenMP threads on the Sun Fire X2270 M2 server cluster:

      1243 x 1151 x 1231: 8% - computational kernel; 2% - total application throughput
      2486 x 1151 x 1231: 12% - computational kernel; 6% - total application throughput
  • The Sun Storage 7210 system delivers the Velocity, Epsilon, and Delta data to the Reverse Time Migration at a steady rate even when timing includes memory initialization and data object creation:

      1243 x 1151 x 1231: 1.4 to 1.6 GBytes/sec
      2486 x 1151 x 1231: 1.2 to 1.3 GBytes/sec

    One can see that when doubling the size of the problem, the additional complexity of overlapping I/O and multiple node file contention only produces a small reduction in read performance.

Performance Landscape

Application Scaling

Performance and scaling results of the total application, including I/O, for the reverse time migration demonstration application are presented. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

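Application Scaling Across Multiple Nodes

Grid Size - 1243 x 1151 x 1231

| Number Nodes | Total Time (sec) | Kernel Time (sec) | Total Speedup | Kernel Speedup |
|---|---|---|---|---|
| 16 | 504 | 259 | 2.0 | 2.2\* |
| 14 | 565 | 279 | 1.8 | 2.0 |
| 12 | 662 | 343 | 1.6 | 1.6 |
| 10 | 784 | 394 | 1.3 | 1.4 |
| 8 | 1024 | 560 | 1.0 | 1.0 |

Grid Size - 2486 x 1151 x 1231

| Number Nodes | Total Time (sec) | Kernel Time (sec) | Total Speedup | Kernel Speedup |
|---|---|---|---|---|
| 16 | 1024 | 551 | 1.7 | 2.0 |
| 14 | 1191 | 677 | 1.5 | 1.6 |
| 12 | 1426 | 817 | 1.2 | 1.4 |
| 10 | 1501 | 856 | 1.2 | 1.3 |
| 8 | 1745 | 1108 | 1.0 | 1.0 |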

\* Super-linear scaling due to the compute kernel fitting better into available cache

Application Scaling – Hyper-Threading Study

The effects of hyperthreading are presented when running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server.

Hyper-Threading Comparison – 12 versus 24 OpenMP Threads

Grid Size - 1243 x 1151 x 1231

| Number Nodes | Threads per Node | Total Time (sec) | Kernel Time (sec) | Total HT Speedup | Kernel HT Speedup |
|---|---|---|---|---|---|
| 16 | 24 | 504 | 259 | 1.02 | 1.08 |
| 16 | 12 | 515 | 279 | 1.00 | 1.00 |

Grid Size - 2486 x 1151 x 1231

| Number Nodes | Threads per Node | Total Time (sec) | Kernel Time (sec) | Total HT Speedup | Kernel HT Speedup |
|---|---|---|---|---|---|
| 16 | 24 | 1024 | 551 | 1.06 | 1.12 |
| 16 | 12 | 1088 | 616 | 1.00 | 1.00 |

Read Performance

Read performance is presented for the velocity, epsilon and delta files running the reverse time migration demonstration application. Results were obtained using a Sun Fire X2270 M2 server cluster with a Sun Storage 7210 system for the file server. The servers were running with hyperthreading enabled, allowing for 24 OpenMP threads per server.

Velocity, Epsilon, and Delta File Read and Memory Initialization Performance

Grid Size - 1243 x 1151 x 1231

| Number Nodes | Overlap MBytes Read | Time (sec) | Time Relative 8-node | Total GBytes Read | Read Rate GB/s |
|---|---|---|---|---|---|
| 16 | 2040 | 16.7 | 1.1 | 23.2 | 1.4 |
| 8 | 951 | 14.8 | 1.0 | 22.1 | 1.6 |

Grid Size - 2486 x 1151 x 1231

| Number Nodes | Overlap MBytes Read | Time (sec) | Time Relative 8-node | Total GBytes Read | Read Rate GB/s |
|---|---|---|---|---|---|
| 16 | 2040 | 36.8 | 1.1 | 44.3 | 1.2 |
| 8 | 951 | 33.0 | 1.0 | 43.2 | 1.3 |

Configuration Summary

Hardware Configuration:

16 x Sun Fire X2270 M2 servers, each with
2 x 2.93 GHz Intel Xeon X5670 processors
48 GB memory (12 x 4 GB at 1333 MHz)

Sun Storage 7210 system connected via QDR InfiniBand
2 x 18 GB SATA SSD (logzilla)
40 x 1 TB 7200 RPM SATA disks

Software Configuration:

SUSE Linux Enterprise Server SLES 10 SP 2
Oracle Message Passing Toolkit 8.2.1 (for MPI)
Sun Studio 12 Update 1 C++, Fortran, OpenMP

Benchmark Description

This Reverse Time Migration (RTM) demonstration application measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this version, each node reads in only the trace, velocity, and conditioning data to be processed by that node plus a four element inline 3-D array pad (spatial order of eight) shared with its neighbors to the left and right during the initialization phase. It represents a full RTM application including the data input, computation, communication, and final output image to be used by the next work flow step involving 3D volumetric seismic interpretation.

Key Points and Best Practices

This demonstration application represents a full Reverse Time Migration solution. Many references to the RTM application tend to focus on the compute kernel and ignore the complexity that the input, communication, and output bring to the task.

I/O Characterization without Optimal Checkpointing

Velocity, Epsilon, and Delta Files - Grid Reading

The additional amount of overlapping reads to share velocity, epsilon, and delta edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (z_dimension) x (4 bytes) x (3 files)

For this particular benchmark study, the additional 3-D pad overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 1231 x 4 x 3 = 2.04 GB extra
    8 nodes: 7 x 8 x 1151 x 1231 x 4 x 3 = 0.95 GB extra

For the first of the two test cases, the total size of the three files used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 1231 x 4 bytes = 7.05 GB per file x 3 files = 21.13 GB

With the additional 3-D pad, the total amount of data read is:

    16 nodes: 2.04 GB + 21.13 GB = 23.2 GB
    8 nodes: 0.95 GB + 21.13 GB = 22.1 GB

For the second of the two test cases, the total size of the three files used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 1231 x 4 bytes = 14.09 GB per file x 3 files = 42.27 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: 2.04 GB + 42.27 GB = 44.3 GB
    8 nodes: 0.95 GB + 42.27 GB = 43.2 GB

Note that the amount of overlapping data read increases, not only by the number of nodes, but as the y dimension and/or the z dimension increases.
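The overlap equation and the totals above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 1 GB = 10^9 bytes; the function names are illustrative only.

```python
BYTES_PER_VALUE = 4
FILES = 3  # velocity, epsilon, and delta


def grid_overlap_bytes(nodes, order_in_space, ny, nz):
    """Extra bytes read to share edge planes with neighboring nodes."""
    return (nodes - 1) * order_in_space * ny * nz * BYTES_PER_VALUE * FILES


def grid_file_bytes(nx, ny, nz):
    """Combined size in bytes of the three input grid files."""
    return nx * ny * nz * BYTES_PER_VALUE * FILES


def total_read_gb(nodes, order_in_space, nx, ny, nz):
    """Total data read in GB, including the overlap."""
    total = grid_overlap_bytes(nodes, order_in_space, ny, nz)
    total += grid_file_bytes(nx, ny, nz)
    return total / 1e9


for nodes in (16, 8):
    print(nodes,
          round(total_read_gb(nodes, 8, 1243, 1151, 1231), 1),
          round(total_read_gb(nodes, 8, 2486, 1151, 1231), 1))
```

Because the overlap term does not depend on the x dimension, doubling the grid from 1243 to 2486 planes leaves the extra 2.04 GB and 0.95 GB unchanged.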

Trace Reading

The additional amount of overlapping reads to share trace edge data with neighbors can be calculated using the following equation:

    (number_nodes - 1) x (order_in_space) x (y_dimension) x (4 bytes) x (number_of_time_slices)

For this particular benchmark study, the additional overlap for the 16 and 8 node cases is:

    16 nodes: 15 x 8 x 1151 x 4 x 800 = 442MB extra
    8 nodes: 7 x 8 x 1151 x 4 x 800 = 206MB extra

For the first case the size of the trace data file used for the 1243 x 1151 x 1231 case is

    1243 x 1151 x 4 bytes x 800 = 4.578 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 4.578 GB = 5.0 GB
    8 nodes: .206 GB + 4.578 GB = 4.8 GB

For the second case the size of the trace data file used for the 2486 x 1151 x 1231 case is

    2486 x 1151 x 4 bytes x 800 = 9.156 GB

With the additional pad based on the number of nodes, the total amount of data read is:

    16 nodes: .442 GB + 9.156 GB = 9.6 GB
    8 nodes: .206 GB + 9.156 GB = 9.4 GB

As the number of nodes is increased, the overlap causes more disk lock contention.
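The trace-reading figures can be verified the same way. The sketch below assumes 1 GB = 10^9 bytes and uses hypothetical function names; the inputs are the values given in the text.

```python
BYTES_PER_VALUE = 4


def trace_overlap_bytes(nodes, order_in_space, ny, time_slices):
    """Extra trace bytes read to share edge data with neighboring nodes."""
    return (nodes - 1) * order_in_space * ny * BYTES_PER_VALUE * time_slices


def trace_file_bytes(nx, ny, time_slices):
    """Size in bytes of the trace data file."""
    return nx * ny * BYTES_PER_VALUE * time_slices


def trace_read_gb(nodes, order_in_space, nx, ny, time_slices):
    """Total trace data read in GB, including the overlap."""
    total = trace_overlap_bytes(nodes, order_in_space, ny, time_slices)
    total += trace_file_bytes(nx, ny, time_slices)
    return total / 1e9


for nodes in (16, 8):
    print(nodes,
          round(trace_read_gb(nodes, 8, 1243, 1151, 800), 1),
          round(trace_read_gb(nodes, 8, 2486, 1151, 800), 1))
```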

Writing Final Output Image

1243x1151x1231 - 7.1 GB per file:

    16 nodes: 78 x 1151 x 1231 x 4 = 442MB/node (7.1 GB total)
    8 nodes: 156 x 1151 x 1231 x 4 = 884MB/node (7.1 GB total)

2486x1151x1231 - 14.1 GB per file:

    16 nodes: 156 x 1151 x 1231 x 4 = 930 MB/node (14.1 GB total)
    8 nodes: 311 x 1151 x 1231 x 4 = 1808 MB/node (14.1 GB total)
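The plane counts per node above are consistent with dividing the x dimension across the nodes and rounding up. The following sketch makes that assumption explicit; the names are hypothetical. Note that for the 2486 x 1151 x 1231 grid the plain products come to about 884 MB and 1763 MB per node, while the quoted 930 MB and 1808 MB are close to what eight extra planes per node would give. That reading is an assumption, not something the source states.

```python
BYTES_PER_VALUE = 4


def planes_per_node(nx, nodes):
    """Planes along x written by one node, using ceiling division."""
    return -(-nx // nodes)


def slab_mb(planes, ny, nz, extra_planes=0):
    """Size in MB (1e6 bytes) of one node's slab of the output image."""
    return (planes + extra_planes) * ny * nz * BYTES_PER_VALUE / 1e6


for nx in (1243, 2486):
    for nodes in (16, 8):
        planes = planes_per_node(nx, nodes)
        print(nx, nodes, planes, round(slab_mb(planes, 1151, 1231)))
```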

Resource Allocation

It is best to allocate one node as the Oracle Grid Engine resource scheduler and MPI master host. This is especially true when running with 24 OpenMP threads in hyperthreading mode to avoid oversubscribing a node that is cooperating in delivering the solution.

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 10/20/2010.

Thursday Sep 23, 2010

Sun Storage F5100 Flash Array with PCI-Express SAS-2 HBAs Achieves Over 17 GB/sec Read

Oracle's Sun Storage F5100 Flash Array storage is a high-performance, high-density, solid-state flash array delivering 17 GB/sec sequential read throughput performance (1 MB reads) and 10 GB/sec sequential write throughput performance (1 MB writes).
  • Use the PCI-Express SAS-2 HBAs and the slot count can be reduced 50%, compared to the PCI-Express SAS-1 HBAs.

  • The Sun Storage F5100 Flash Array storage using 8 PCI-Express SAS-2 HBAs showed a 33% aggregate, sequential read bandwidth improvement over using 16 PCI-Express SAS-1 HBAs.

  • The Sun Storage F5100 Flash Array storage using 8 PCI-Express SAS-2 HBAs showed a 6% aggregate, sequential write bandwidth improvement over using 16 PCI-Express SAS-1 HBAs.

  • Each SAS port of the Sun Storage F5100 Flash Array storage delivered over 1 GB/sec sequential read performance.

  • Performance data is also presented utilizing smaller numbers of FMODs in the full configuration, demonstrating near perfect scaling from 20 to 80 FMODs.

The Sun Storage F5100 Flash Array storage is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

Performance Landscape

Results for the PCI-Express SAS-2 HBAs were obtained using four hosts, each configured with 2 HBAs.

Results for the PCI-Express SAS-1 HBAs were obtained using four hosts, each configured with 4 HBAs.

Bandwidth Measurements

Sequential Read (Aggregate GB/sec) for 1 MB Transfers

| HBA Configuration | 1 FMOD | 20 FMODs | 40 FMODs | 80 FMODs |
|---|---|---|---|---|
| 8 SAS-2 HBAs | 0.26 | 4.3 | 8.5 | 17.0 |
| 16 SAS-1 HBAs | 0.26 | 3.2 | 6.4 | 12.8 |

Sequential Write (Aggregate GB/sec) for 1 MB Transfers

| HBA Configuration | 1 FMOD | 20 FMODs | 40 FMODs | 80 FMODs |
|---|---|---|---|---|
| 8 SAS-2 HBAs | 0.14 | 2.7 | 5.2 | 10.3 |
| 16 SAS-1 HBAs | 0.12 | 2.4 | 4.8 | 9.7 |
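The headline percentages can be derived from the bandwidth tables. This is a small sketch with illustrative names, assuming the per-port figure is the 80-FMOD aggregate divided by the 16 array ports listed in the storage configuration.

```python
# Aggregate GB/sec by FMOD count, copied from the bandwidth tables.
READ = {
    "SAS-2": {1: 0.26, 20: 4.3, 40: 8.5, 80: 17.0},
    "SAS-1": {1: 0.26, 20: 3.2, 40: 6.4, 80: 12.8},
}
WRITE = {
    "SAS-2": {1: 0.14, 20: 2.7, 40: 5.2, 80: 10.3},
    "SAS-1": {1: 0.12, 20: 2.4, 40: 4.8, 80: 9.7},
}
ARRAY_PORTS = 16


def pct_gain(new, old):
    """Percentage improvement of new over old."""
    return (new / old - 1.0) * 100.0


read_gain = pct_gain(READ["SAS-2"][80], READ["SAS-1"][80])
write_gain = pct_gain(WRITE["SAS-2"][80], WRITE["SAS-1"][80])
per_port_read = READ["SAS-2"][80] / ARRAY_PORTS
scaling_20_to_80 = READ["SAS-2"][80] / READ["SAS-2"][20]  # ideal is 4.0

print(round(read_gain), round(write_gain),
      round(per_port_read, 2), round(scaling_20_to_80, 2))
```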

Results and Configuration Summary

Storage Configuration:

Sun Storage F5100 Flash Array
80 Flash Modules
16 ports
4 domains (20 Flash Modules per domain)
CAM zoning - 5 Flash Modules per port

Server Configuration:

4 x Sun Fire X4270 servers, each with
16 GB memory
2 x 2.93 GHz Quad-core Intel Xeon X5570 processors
2 x PCI-Express SAS-2 External HBAs, firmware version SW1.1-RC5

Software Configuration:

OpenSolaris 2009.06 or Oracle Solaris 10 10/09
Vdbench 5.0

Benchmark Description

Two IO performance metrics on the Sun Storage F5100 Flash Array storage using Vdbench 5.0 were measured: 100% Sequential Read and 100% Sequential Write. This demonstrates the maximum performance and throughput of the storage system.

Vdbench is publicly available for download at: http://vdbench.org

Key Points and Best Practices

  • Please note that the Sun Storage F5100 Flash Array storage is a 4KB sector device. Doing IOs of less than 4KB in size, or IOs not aligned on 4KB boundaries, can impact performance on write operations.
  • Drive each Flash Module with 8 outstanding IOs.
  • Both ports of each LSI PCI-Express SAS-2 HBA were used.
  • SPARC platforms will align with the 4K boundary size set by the Flash Array. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.
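The 4KB alignment rule in the first point above can be expressed as a simple check. The following is a generic sketch, not tied to any particular tool; the helper names are hypothetical.

```python
SECTOR_BYTES = 4096  # the array is described as a 4KB sector device


def is_aligned(offset, length, sector=SECTOR_BYTES):
    """True if an I/O starts on a sector boundary and covers whole sectors."""
    return length > 0 and offset % sector == 0 and length % sector == 0


def align_down(offset, sector=SECTOR_BYTES):
    """Largest sector boundary at or below offset."""
    return offset - (offset % sector)


def align_up(offset, sector=SECTOR_BYTES):
    """Smallest sector boundary at or above offset."""
    return align_down(offset + sector - 1, sector)


print(is_aligned(8192, 4096), is_aligned(512, 4096), align_up(63 * 512))
```

A partition starting at byte offset 63 x 512, a common legacy default, is not on a 4KB boundary, so every 4KB I/O issued to it would straddle two device sectors.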

Disclosure Statement

The Sun Storage F5100 Flash Array storage delivered 17.0 GB/sec sequential read and 10.3 GB/sec sequential write. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 20, 2010.

Wednesday Sep 22, 2010

Oracle Solaris 10 9/10 ZFS OLTP Performance Improvements

Oracle Solaris ZFS has seen significant performance improvements in the Oracle Solaris 10 9/10 release compared to the previous release, Oracle Solaris 10 10/09.
  • A 28% reduction in response time, holding the load constant, in an OLTP workload test comparing Oracle Solaris 10 9/10 to Oracle Solaris 10 10/09.
  • A 19% increase in IOPS throughput, holding the response time of 28 msec constant, in an OLTP workload test comparing Oracle Solaris 10 9/10 to Oracle Solaris 10 10/09.
  • OLTP workload throughput rates of at least 800 IOPS, measured using Oracle's Sun SPARC Enterprise T5240 server and Oracle's StorageTek 2540 array, were used in calculating the above improvement percentages.

Performance Landscape

8K Block Random Read/Write OLTP-Style Test

| IOPS | Response Time (msec), Oracle Solaris 10 9/10 | Response Time (msec), Oracle Solaris 10 10/09 |
|---|---|---|
| 100 | 5.1 | 8.3 |
| 500 | 11.7 | 24.6 |
| 800 | 20.1 | 28.1 |
| 900 | 23.9 | 32.0 |
| 950 | 28.8 | 34.4 |
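The two headline percentages can be recovered from this table. The sketch below assumes the 28% figure comes from the 800 IOPS row and the 19% figure from comparing 950 IOPS on the new release (28.8 msec) with 800 IOPS on the old release (28.1 msec); the choice of rows is an inference from the numbers, not stated in the source.

```python
# IOPS -> (response msec on Solaris 10 9/10, response msec on Solaris 10 10/09)
RESPONSE_MS = {
    100: (5.1, 8.3),
    500: (11.7, 24.6),
    800: (20.1, 28.1),
    900: (23.9, 32.0),
    950: (28.8, 34.4),
}


def response_reduction_pct(iops):
    """Percentage drop in response time at a fixed load."""
    new_ms, old_ms = RESPONSE_MS[iops]
    return (old_ms - new_ms) / old_ms * 100.0


def iops_increase_pct(new_iops, old_iops):
    """Percentage gain in IOPS at roughly equal response time."""
    return (new_iops / old_iops - 1.0) * 100.0


print(round(response_reduction_pct(800)), round(iops_increase_pct(950, 800)))
```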

Results and Configuration Summary

Storage Configuration:

1 x StorageTek 2540 Array
12 x 73 GB 15K RPM HDDs
2 RAID5 5+1 volumes
1 RAID0 host stripe across the volumes

Server Configuration:

1 x Sun SPARC Enterprise T5240 server with
8 GB memory
2 x 1.6 GHz UltraSPARC T2 Plus processors

Software Configuration:

Oracle Solaris 10 10/09
Oracle Solaris 10 9/10
ZFS
SVM

Benchmark Description

IOPs test consisting of a mixture of random 8K block reads and writes accessing a significant portion of the available storage. As such the workload is not very "cache friendly" and, hence, illustrates the capability of the system to more fully utilize the processing capability of the back end storage.

Disclosure Statement

Copyright 2010, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 9/20/2010.

Tuesday Sep 21, 2010

Sun Flash Accelerator F20 PCIe Cards Outperform IBM on SPC-1C

Oracle's Sun Flash Accelerator F20 PCIe cards delivered outstanding value as measured by the SPC-1C benchmark, showing the advantage of Oracle's Sun FlashFire technology in terms of both performance and price/performance.
  • Three Sun Flash Accelerator F20 PCIe cards delivered an aggregate of 72,521.11 SPC-1C IOPS, achieving the best price/performance (TSC price / SPC-1C IOPS).

  • The Sun Flash Accelerator F20 PCIe cards delivered 61% better performance at 1/5th the TSC price than the IBM System Storage EXP12S.

  • The Sun Flash Accelerator F20 PCIe cards delivered 9x better price/performance (TSC price / SPC-1C IOPS) than the IBM System Storage EXP12S.

  • The Sun Flash Accelerator F20 PCIe cards and workload generator were run and priced inside Oracle's Sun Fire X4270 M2 server. The storage and workload generator used the same space (2 RU) as the IBM System Storage EXP12S by itself.

  • The Sun Flash Accelerator F20 PCIe cards delivered 6x better access density (SPC-1C IOPS / ASU Capacity (GB)) than the IBM System Storage EXP12S.

  • The Sun Flash Accelerator F20 PCIe cards delivered 1.5x better price / storage capacity (TSC / ASU Capacity (GB)) than the IBM System Storage EXP12S.

  • This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

| System | SPC-1C IOPS™ | ASU Capacity (GB) | TSC | Data Protection Level | LRT Response (µsecs) | Access Density | Price/Perf | $/GB | Identifier |
|---|---|---|---|---|---|---|---|---|---|
| Sun F5100 | 300,873.47 | 1374.390 | $151,381 | unprotected | 330 | 218.9 | $0.50 | 110.14 | C00010 |
| Sun F20 | 72,521.11 | 147.413 | $15,554 | unprotected | 468 | 492.0 | $0.21 | 105.51 | C00011 |
| IBM EXP12S | 45,000.20 | 547.610 | $87,468 | unprotected | 460 | 82.2 | $1.94 | 159.76 | E00001 |

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit (ASU) capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (µsecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU Capacity (GB), bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better.
$/GB – Derived metric of TSC / ASU Capacity (GB), smaller is better
Pricing for the IBM EXP12S included maintenance, pricing for the Sun F20 did not
Identifier – The SPC-1C submission identifier
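The derived columns follow directly from the three measured values in each row. This is a minimal sketch using the table's figures; the dictionary layout and function name are illustrative.

```python
# System -> (SPC-1C IOPS, ASU capacity in GB, TSC price in dollars)
RESULTS = {
    "Sun F5100": (300873.47, 1374.390, 151381.0),
    "Sun F20": (72521.11, 147.413, 15554.0),
    "IBM EXP12S": (45000.20, 547.610, 87468.0),
}


def derived_metrics(iops, asu_gb, tsc):
    """Access density, price/performance, and price per GB for one result."""
    return {
        "access_density": iops / asu_gb,   # bigger is better
        "price_perf": tsc / iops,          # smaller is better
        "dollars_per_gb": tsc / asu_gb,    # smaller is better
    }


f20 = derived_metrics(*RESULTS["Sun F20"])
ibm = derived_metrics(*RESULTS["IBM EXP12S"])

print(round(f20["access_density"], 1), round(f20["price_perf"], 2))
print(round(ibm["price_perf"] / f20["price_perf"], 1))        # price/perf ratio
print(round(f20["access_density"] / ibm["access_density"], 1))  # density ratio
```

The ratios between the two rows are the basis for the "9x better price/performance" and "6x better access density" claims above.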

Results and Configuration Summary

Storage Configuration:

3 x Sun Flash Accelerator F20 PCIe cards (4 FMODs each)

Hardware Configuration:

1 x Sun Fire X4270 M2 server
12 GB memory
1 x 2.93 GHz Intel Xeon X5670 processor

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC Benchmark 1C™ (SPC-1C) is the first Storage Performance Council (SPC) component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs), intelligent enclosures, and storage software such as Logical Volume Managers. SPC-1C utilizes an identical workload to SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business critical applications. Those applications are characterized by predominantly random I/O operations and require both queries and update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand alone configuration. An external enclosure may be used, but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

  • For best performance, ensure partitions start on a 4K aligned boundary.

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Flash Accelerator F20 PCIe cards SPC-1C submission identifier C00011 results of 72,521.11 SPC-1C IOPS over a total ASU capacity of 147.413 GB using unprotected data protection, a SPC-1C LRT of 0.468 milliseconds, a 100% load over all ASU response time of 6.17 milliseconds and a total TSC price (not including three-year maintenance) of $15,554. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E Submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.35. Derived metrics: Access Density (SPC-1C IOPS / ASU Capacity (GB)); Price / Performance (TSC / SPC-1C IOPS); Price / Storage Capacity (TSC / ASU Capacity (GB)).

The Sun Flash Accelerator F20 PCIe card is a single half-height, low-profile PCIe card. The IBM System Storage EXP12S is a 2RU (3.5") array.

Monday Aug 23, 2010

Repriced: SPC-1 Sun Storage 6180 Array (8Gb) 1.9x Better Than IBM DS5020 in Price-Performance

Results are presented on Oracle's Sun Storage 6180 array with 8Gb connectivity for the SPC-1 benchmark.
  • The Sun Storage 6180 array is more than 1.9 times better in price-performance compared to the IBM DS5020 system as measured by the SPC-1 benchmark.

  • The Sun Storage 6180 array delivers 50% more SPC-1 IOPS than the previous generation Sun Storage 6140 array and IBM DS4700 on the SPC-1 benchmark.

  • The Sun Storage 6180 array is more than 3.1 times better in price-performance compared to the NetApp FAS3040 system as measured by the SPC-1 benchmark.

  • The Sun Storage 6180 array betters the Hitachi 2100 system by 34% in price-performance on the SPC-1 benchmark.

  • The Sun Storage 6180 array has 16% better IOPS/disk drive performance than the Hitachi 2100 on the SPC-1 benchmark.

Performance Landscape

Select results for the SPC-1 benchmark comparing competitive systems (ordered by performance), data as of August 6th, 2010 from the Storage Performance Council website.

| Sponsor | System | SPC-1 IOPS | $/SPC-1 IOPS | ASU Capacity (GB) | TSC Price | Data Protection Level | Results Identifier |
|---|---|---|---|---|---|---|---|
| Hitachi | HDS 2100 | 31,498.58 | $5.85 | 3,967.500 | $187,321 | Mirroring | A00076 |
| NetApp | FAS3040 | 30,992.39 | $13.58 | 12,586.586 | $420,800 | RAID6 | A00062 |
| Oracle | SS6180 (8Gb) | 26,090.03 | $4.37 | 5,145.060 | $114,042 | Mirroring | A00084 |
| IBM | DS5020 (8Gb) | 26,090.03 | $8.46 | 5,145.060 | $220,778 | Mirroring | A00081 |
| Fujitsu | DX80 | 19,492.86 | $3.45 | 5,355.400 | $67,296 | Mirroring | A00082 |
| Oracle | STK6140 (4Gb) | 17,395.53 | $4.93 | 1,963.269 | $85,823 | Mirroring | A00048 |
| IBM | DS4700 (4Gb) | 17,195.84 | $11.67 | 1,963.270 | $200,666 | Mirroring | A00046 |

SPC-1 IOPS = the Performance Metric
$/SPC-1 IOPS = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric
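
The price-performance column can be reproduced from the other columns. The sketch below assumes that $/SPC-1 IOPS is the TSC price divided by SPC-1 IOPS and uses the Oracle SS6180 and IBM DS5020 rows of the table; the function and variable names are illustrative only.

```python
def price_performance(tsc_price, spc1_iops):
    """Dollars per SPC-1 IOPS: total price divided by measured IOPS (lower is better)."""
    return tsc_price / spc1_iops


# (TSC price in dollars, SPC-1 IOPS) copied from the results table
results = {
    "Oracle SS6180 (8Gb)": (114_042, 26_090.03),
    "IBM DS5020 (8Gb)": (220_778, 26_090.03),
}

ratios = {name: price_performance(*row) for name, row in results.items()}

# How many times more the IBM result costs per SPC-1 IOPS than the Oracle result.
advantage = ratios["IBM DS5020 (8Gb)"] / ratios["Oracle SS6180 (8Gb)"]

if __name__ == "__main__":
    for name, value in ratios.items():
        print(f"{name}: ${value:.2f} per SPC-1 IOPS")
    print(f"Price-performance advantage: {advantage:.2f}x")
```

Because both systems report the same SPC-1 IOPS, the price-performance advantage here reduces to the ratio of the two TSC prices.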

Complete SPC-1 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

80 x 146.8GB 15K RPM drives
4 x Qlogic QLE 2560 HBA

Server Configuration:

IBM System x3850 M2

Software Configuration:

MS Windows 2003 Server SP2
SPC-1 benchmark kit

Benchmark Description

SPC Benchmark-1 (SPC-1) is the first industry standard storage benchmark and is the most comprehensive performance analysis environment ever constructed for storage subsystems. The I/O workload in SPC-1 is characterized by predominately random I/O operations as typified by multi-user OLTP, database, and email server environments. SPC-1 uses a highly efficient multi-threaded workload generator to thoroughly analyze direct attach or network storage subsystems. The SPC-1 benchmark enables companies to rapidly produce valid performance and price-performance results using a variety of host platforms and storage network topologies.

SPC-1 is built to:

  • Provide a level playing field for test sponsors.
  • Produce results that are powerful and yet simple to use.
  • Provide value for engineers as well as IT consumers and solution integrators.
  • Be easy to run, easy to audit/verify, and easy to use to report official results.

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1, SPC-1 IOPS, $/SPC-1 IOPS are registered trademarks of Storage Performance Council (SPC). More info www.storageperformance.org, results as of 8/6/2010. Sun Storage 6180 array 26,090.03 SPC-1 IOPS, ASU Capacity 5,145.060GB, $/SPC-1 IOPS $4.37, Data Protection Mirroring, Cost $114,042, Ident. A00084.

Repriced: SPC-2 (RAID 5 & 6 Results) Sun Storage 6180 Array (8Gb) Outperforms IBM DS5020 by up to 64% in Price-Performance

Results are presented on Oracle's Sun Storage 6180 array with 8 Gb connectivity for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6180 array outperforms the IBM DS5020 system by 62% in price-performance for SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6180 array outperforms the IBM DS5020 system by 64% in price-performance for SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6180 array is over 50% faster than the previous generation systems, the Sun Storage 6140 array and IBM DS4700, on the SPC-2 benchmark using RAID 5 data protection.

Performance Landscape

Select results from Oracle and IBM competitive systems for the SPC-2 benchmark (in performance order), data as of August 7th, 2010 from the Storage Performance Council website.

| Sponsor | System | SPC-2 MBPS | $/SPC-2 MBPS | ASU Capacity (GB) | TSC Price | Data Protection Level | Results Identifier |
|---|---|---|---|---|---|---|---|
| Oracle | SS6180 | 1,286.74 | $56.88 | 3,504.693 | $73,190 | RAID 6 | B00044 |
| IBM | DS5020 | 1,286.74 | $93.26 | 3,504.693 | $120,002 | RAID 6 | B00042 |
| Oracle | SS6180 | 1,244.89 | $50.40 | 3,504.693 | $62,747 | RAID 5 | B00043 |
| IBM | DS5020 | 1,244.89 | $81.73 | 3,504.693 | $101,742 | RAID 5 | B00041 |
| IBM | DS4700 | 823.62 | $106.73 | 1,748.874 | $87,903 | RAID 5 | B00028 |
| Oracle | ST6140 | 790.67 | $67.82 | 1,675.037 | $53,622 | RAID 5 | B00017 |
| Oracle | ST2540 | 735.62 | $37.32 | 2,177.548 | $27,451 | RAID 5 | B00021 |
| Oracle | ST2530 | 672.05 | $26.15 | 1,451.699 | $17,572 | RAID 5 | B00026 |

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price-Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.

Results and Configuration Summary

Storage Configuration:

Sun Storage 6180 array with 4GB cache
30 x 146.8GB 15K RPM drives (for RAID 5)
36 x 146.8GB 15K RPM drives (for RAID 6)
4 x PCIe 8 Gb single port HBA

Server Configuration:

IBM System x3850 M2

Software Configuration:

Microsoft Windows 2003 Server SP2
SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

See Also

Disclosure Statement

SPC-2, SPC-2 MBPS, $/SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). More info www.storageperformance.org, results as of 8/9/2010. Sun Storage 6180 Array 1,286.74 SPC-2 MBPS, $/SPC-2 MBPS $56.88, ASU Capacity 3,504.693 GB, Protect RAID 6, Cost $73,190, Ident. B00044. Sun Storage 6180 Array 1,244.89 SPC-2 MBPS, $/SPC-2 MBPS $50.40, ASU Capacity 3,504.693 GB, Protect RAID 5, Cost $62,747, Ident. B00043.

Thursday Jun 10, 2010

Hyperion Essbase ASO World Record on Sun SPARC Enterprise M5000

Oracle's Sun SPARC Enterprise M5000 server is an excellent platform for implementing Oracle Essbase as demonstrated by the Aggregate Storage Option (ASO) benchmark.

  • Oracle's Sun SPARC Enterprise M5000 server with Oracle Solaris 10 and using Oracle's Sun Storage F5100 Flash Array system has achieved world record performance running the Oracle Essbase Aggregate Storage Option benchmark using Oracle Hyperion Essbase 11.1.1.3 and the Oracle 11g database.

  • The workload used over 1 billion records in a 15 dimensional database with millions of members. Oracle Hyperion is a component of Oracle Fusion Middleware.

  • The Sun Storage F5100 Flash Array system provides more than 20% improvement out of the box compared to a mid-size Fibre Channel disk array for default aggregation and user-based aggregation.

  • The Sun SPARC Enterprise M5000 server with Sun Storage F5100 Flash Array system and Oracle Hyperion Essbase 11.1.1.3 running on Oracle Solaris 10 provides less than 1 second query response times for 20K users in a 15 dimensional database.

  • The Sun Storage F5100 Flash Array system and Oracle Hyperion Essbase provide the best combination for large Essbase databases, leveraging ZFS and taking advantage of high bandwidth for faster load and aggregation.

  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle Hyperion's performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

Performance Landscape

| System | Database Size | Data Load | Default Aggregation | User Aggregation |
|---|---|---|---|---|
| Sun M5000, 2.53 GHz SPARC64 VII | 1000M | 269 min | 526 min | 115 min |
| Sun M5000, 2.4 GHz SPARC64 VII | 400M | 120 min | 448 min | 18 min |

Less time means a faster result.

Results and Configuration Summary

Hardware Configuration:

    Sun SPARC Enterprise M5000
      4 x SPARC64 VII, 2.53 GHz
      64 GB memory
    Sun Storage F5100 Flash Array
      40 x 24 GB Flash modules

Software Configuration:

    Oracle Solaris 10
    Oracle Solaris ZFS
    Installer V 11.1.1.3
    Oracle Hyperion Essbase Client v 11.1.1.3
    Oracle Hyperion Essbase v 11.1.1.3
    Oracle Hyperion Essbase Administration services 64-bit
    Oracle Weblogic 9.2MP3 -- 64 bit
    Oracle Fusion Middleware
    Oracle RDBMS 11.1.0.7 64-bit

Benchmark Description

The benchmark highlights how Oracle Essbase can support pervasive deployments in large enterprises. It simulates an organization that needs to support a large Essbase Aggregate Storage database with over one billion data items, a large dimension with 14 million members, and 20 thousand active concurrent users, each operating in mixed mode: ad-hoc reporting and report viewing. The application for this benchmark was designed to model a scaled-out version of a financial business intelligence application.

The benchmark simulates typical administrative and user operations in an OLAP application environment. Administrative operations include dimension build, data load, and data aggregation. User testing modeled a total user base of 200,000 with 10 percent actively retrieving data from Essbase.

Key Points and Best Practices

  • Sun Storage F5100 Flash Array system has been used to accelerate the application performance.
  • Jumbo frames were enabled for faster data loading.

See Also

Disclosure Statement

Oracle Essbase, www.oracle.com/solutions/mid/oracle-hyperion-enterprise.html, results 5/20/2010.

Wednesday Apr 14, 2010

Oracle Sun Storage F5100 Flash Array Delivers World Record SPC-1C Performance

Oracle's Sun Storage F5100 flash array delivered world record performance on the SPC-1C benchmark. The SPC-1C benchmark shows the advantage of Oracle's FlashFire technology.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance of 300,873.47 SPC-1C IOPS.

  • The Sun Storage F5100 flash array requires half the rack space of the next best result, the IBM System Storage EXP12S.

  • The Sun Storage F5100 flash array delivered nearly seven times better SPC-1C IOPS performance than the next best SPC-1C result, the IBM System Storage EXP12S with 8 SSDs.

  • The Sun Storage F5100 flash array delivered the world record SPC-1C LRT (response time) performance of 330 microseconds, and a full load response time of 2.63 milliseconds, which is over 2.5x better than the IBM System Storage EXP12S SPC-1C result.

  • Compared to the IBM result, the Sun Storage F5100 flash array delivered 2.7x better access density (SPC-1C IOPS/ ASU GB), 3.9x better price/performance (TSC/ SPC-1C IOPS) and 31% better tested $/GB (TSC/ ASU) as part of these SPC-1C benchmark results.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance using the SPC-1C workload driven by the Sun SPARC Enterprise M5000 server. This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

| System | SPC-1C IOPS | ASU Capacity (GB) | TSC | Data Protection Level | LRT Response (usecs) | Access Density | Price/Perf | $/GB | Identifier |
|---|---|---|---|---|---|---|---|---|---|
| Sun F5100 | 300,873.47 | 1374.390 | $151,381 | unprotected | 330 | 218.9 | $0.50 | 110.1 | C00010 |
| IBM EXP12S | 45,000.20 | 547.610 | $87,486 | unprotected | 460 | 82.2 | $1.94 | 159.8 | E00001 |

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (usecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU GB, bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better
$/GB – Derived metric of TSC / ASU, smaller is better
Identifier – The SPC-1C submission identifier
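
The three derived metrics defined above follow directly from the measured columns. The sketch below recomputes them from the values printed in the table; the class and attribute names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Spc1cResult:
    name: str
    iops: float         # SPC-1C IOPS
    asu_gb: float       # ASU capacity in GB
    tsc_dollars: float  # total price of the tested storage configuration

    @property
    def access_density(self):
        """SPC-1C IOPS per GB of ASU capacity (bigger is better)."""
        return self.iops / self.asu_gb

    @property
    def price_per_iops(self):
        """TSC dollars per SPC-1C IOPS (smaller is better)."""
        return self.tsc_dollars / self.iops

    @property
    def dollars_per_gb(self):
        """TSC dollars per GB of ASU capacity (smaller is better)."""
        return self.tsc_dollars / self.asu_gb


# Values as printed in the table
sun = Spc1cResult("Sun F5100", 300_873.47, 1374.390, 151_381)
ibm = Spc1cResult("IBM EXP12S", 45_000.20, 547.610, 87_486)

if __name__ == "__main__":
    for r in (sun, ibm):
        print(f"{r.name}: density {r.access_density:.1f}, "
              f"${r.price_per_iops:.2f}/IOPS, ${r.dollars_per_gb:.1f}/GB")
```

Dividing one system's metric by the other's gives the comparison factors quoted in the highlights, for example `sun.access_density / ibm.access_density` for access density.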

Results and Configuration Summary

Storage Configuration:

1 x Sun Storage F5100 flash array with 80 FMODs

Hardware Configuration:

1 x Sun SPARC Enterprise M5000 server
16 x StorageTek PCIe SAS Host Bus Adapter, 8 Port

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC-1C is the first SPC component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs), intelligent enclosures, and storage software such as Logical Volume Managers. SPC-1C uses the same workload as SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business critical applications. Those applications are characterized by predominately random I/O operations and require both query and update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand alone configuration. An external enclosure may be used but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E Submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.

The Sun Storage F5100 flash array is a 1RU (1.75") array. The IBM System Storage EXP12S is a 2RU (3.5") array.

Monday Mar 29, 2010

Sun Blade X6275/QDR IB/ Reverse Time Migration

Significance of Results

Oracle's Sun Blade X6275 cluster with a Lustre file system was used to demonstrate the performance potential of the system when running reverse time migration applications complete with I/O processing.

  • Reduced the Total Application run time for the Reverse Time Migration when processing 800 input traces for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes, by implementing algorithm I/O optimizations and taking advantage of MPI I/O features in HPC ClusterTools:

    • 1243x1151x1231 - Wall clock time reduced from 11.5 to 6.3 minutes (1.8x improvement)
    • 2486x1151x1231 - Wall clock time reduced from 21.5 to 13.5 minutes (1.6x improvement)
  • Reduced the I/O Intensive Trace-Input time for the Reverse Time Migration when reading 800 input traces for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node data requirement and avoiding unneeded synchronization:

    • 2486x1151x1231 : Time reduced from 121.5 to 3.2 seconds (38.0x improvement)
    • 1243x1151x1231 : Time reduced from 71.5 to 2.2 seconds (32.5x improvement)
  • Reduced the I/O Intensive Grid Initialization time for the Reverse Time Migration Grid when reading the Velocity, Epsilon, and Delta slices for two production sized surveys from a QDR Infiniband Lustre file system on 24 X6275 nodes running HPC ClusterTools, by modifying the algorithm to minimize the per node grid data requirement:

    • 2486x1151x1231 : Time reduced from 15.6 to 4.9 seconds (3.2x improvement)
    • 1243x1151x1231 : Time reduced from 8.9 to 1.2 seconds (7.4x improvement)

Performance Landscape

In the tables below, the hyperthreading feature is enabled and the systems are fully utilized.

This first table presents the total application performance in minutes. The overall performance improved significantly because of the improved I/O performance and other benefits.


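Total Application Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode

| Nodes | 1243 x 1151 x 1231 (800 Samples) Original Time (mins) | 1243 x 1151 x 1231 (800 Samples) MPI I/O Time (mins) | 1243 x 1151 x 1231 (800 Samples) Improvement | 2486 x 1151 x 1231 (800 Samples) Original Time (mins) | 2486 x 1151 x 1231 (800 Samples) MPI I/O Time (mins) | 2486 x 1151 x 1231 (800 Samples) Improvement |
|---|---|---|---|---|---|---|
| 24 | 11.5 | 6.3 | 1.8x | 21.5 | 13.5 | 1.6x |
| 20 | 12.0 | 8.0 | 1.5x | 21.9 | 15.4 | 1.4x |
| 16 | 13.8 | 9.7 | 1.4x | 26.2 | 18.0 | 1.5x |
| 12 | 21.7 | 13.2 | 1.6x | 29.5 | 23.1 | 1.3x |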

This next table presents the initialization I/O time. The results are presented in seconds and show the advantage of the improved MPI I/O strategy.


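Initialization Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode

| Nodes | 1243 x 1151 x 1231 (800 Samples) Original Time (sec) | 1243 x 1151 x 1231 (800 Samples) MPI I/O Time (sec) | 1243 x 1151 x 1231 (800 Samples) Improvement | 2486 x 1151 x 1231 (800 Samples) Original Time (sec) | 2486 x 1151 x 1231 (800 Samples) MPI I/O Time (sec) | 2486 x 1151 x 1231 (800 Samples) Improvement |
|---|---|---|---|---|---|---|
| 24 | 8.9 | 1.2 | 7.4x | 15.6 | 4.9 | 3.2x |
| 20 | 9.3 | 1.5 | 6.2x | 16.9 | 3.9 | 4.3x |
| 16 | 9.7 | 2.5 | 3.9x | 17.4 | 11.3 | 1.5x |
| 12 | 9.8 | 3.3 | 3.0x | 22.5 | 14.9 | 1.5x |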

This last table presents the trace I/O time. The results are presented in seconds and show the significant advantage of the improved MPI I/O strategy.


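Trace I/O Time Performance Comparison
Reverse Time Migration - SMP Threads and MPI Mode

| Nodes | 1243 x 1151 x 1231 (800 Samples) Original Time (sec) | 1243 x 1151 x 1231 (800 Samples) MPI I/O Time (sec) | 1243 x 1151 x 1231 (800 Samples) Improvement | 2486 x 1151 x 1231 (800 Samples) Original Time (sec) | 2486 x 1151 x 1231 (800 Samples) MPI I/O Time (sec) | 2486 x 1151 x 1231 (800 Samples) Improvement |
|---|---|---|---|---|---|---|
| 24 | 71.5 | 2.2 | 32.5x | 121.5 | 3.2 | 38.0x |
| 20 | 67.7 | 2.4 | 28.2x | 118.3 | 3.9 | 30.3x |
| 16 | 64.2 | 2.7 | 23.7x | 110.7 | 4.6 | 24.1x |
| 12 | 69.9 | 4.2 | 16.6x | 296.3 | 14.6 | 20.3x |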

Results and Configuration Summary

Hardware Configuration:

Oracle's Sun Blade 6048 Modular System with
12 x Oracle's Sun Blade X6275 Server Modules, each with
4 x 2.93 GHz Intel Xeon QC X5570 processors
12 x 4 GB memory at 1333 MHz
2 x 24 GB Internal Flash
QDR InfiniBand Lustre 1.8.0.1 File System

Software Configuration:

OS: 64-bit SUSE Linux Enterprise Server SLES 10 SP 2
MPI: Oracle Message Passing Toolkit 8.2.1 for I/O optimization to Lustre file system
MPI: Scali MPI Connect 5.6.6-59413 for original Lustre file system runs
Compiler: Oracle Solaris Studio 12 C++, Fortran, OpenMP

Benchmark Description

The primary objective of this Reverse Time Migration Benchmark is to present MPI I/O tuning techniques, exploit the power of Sun's HPC ClusterTools MPI I/O implementation, and demonstrate the world-class performance of Sun's Lustre File System to Exploration Geophysicists throughout the world. A Sun Blade 6048 Modular System with 12 Sun Blade X6275 server modules was clustered together with a QDR InfiniBand Lustre File System to show performance improvements in the Reverse Time Migration Throughput by using the Sun HPC ClusterTools MPI-IO features to implement specific algorithm I/O optimizations.

This Reverse Time Migration Benchmark measures the total time it takes to image 800 samples of various production size grids and write the final image to disk. In this new I/O optimized version, each node reads in only the data to be processed by that node plus a 4 element inline pad shared with its neighbors to the left and right. This latest version essentially loads the boundary condition data during the initialization phase. The previous version handled boundary conditions by having each node read in all the trace, velocity, and conditioning data or, alternatively, by having the master node read in all the data and distribute it in its entirety to every node in the cluster. With the previous version, each node had full memory copies of all input data sets even when it only processed a subset of that data. The new version only holds the inline dimensions and pads to be processed by a particular node in memory.

Key Points and Best Practices

  • The original implementation of the trace I/O involved the master node reading in nx \* ny floats and communicating this trace data to all the other nodes in a synchronous manner. Each node only used a subset of the trace data for each of the 800 time steps. The optimized I/O version has each node asynchronously read in only the (nx/num_procs + 8) \* ny floats that it will be processing. The additional 8 inline values for the optimized I/O version are the 4 element pads of a node's left and right neighbors to handle initial boundary conditions. The MPI_Barrier needed for synchronization in the original implementation, together with the additional I/O for each node to load all the data values, significantly hurts performance. For the I/O optimized version, each node reads only the data values it needs and does not require the same MPI_Barrier synchronization as the original version of the Reverse Time Migration Benchmark. These I/O optimizations yield a significant improvement in the Trace I/O time.

  • For the best MPI performance, allocate the X6275 nodes in blade by blade order and run with HyperThreading enabled. The "Binary Conditioning" part of the Reverse Time Migration benefits in particular from hyperthreading.

  • To get the best I/O performance, use a maximum of 70% of each node's available memory for the Reverse Time Migration application. Execution time and I/O results may vary if the nodes have different memory size configurations.
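
The per-node read sizes described in the first point above can be sketched as follows. This is an illustration under stated assumptions rather than the benchmark's code: it takes the first grid dimension as the inline dimension nx and the second as ny, uses integer division for nx/num_procs, lets the last rank absorb any remainder, and clamps the pads at the grid edges. All function names are hypothetical.

```python
PAD = 4  # inline pad elements shared with each neighbor, per the text


def floats_original(nx, ny):
    """Trace floats every node received per time step in the original version."""
    return nx * ny


def floats_optimized(nx, ny, num_procs, pad=PAD):
    """Trace floats each node reads in the optimized version: (nx/num_procs + 8) * ny."""
    return (nx // num_procs + 2 * pad) * ny


def inline_range(rank, num_procs, nx, pad=PAD):
    """Half-open inline index range [start, stop) read by `rank`, pads included.

    Assumption: the last rank absorbs any remainder and the pads are
    clamped at the edges of the grid.
    """
    base = nx // num_procs
    start = rank * base
    stop = nx if rank == num_procs - 1 else start + base
    return max(0, start - pad), min(nx, stop + pad)


if __name__ == "__main__":
    nx, ny, nodes = 1243, 1151, 24
    before = floats_original(nx, ny)
    after = floats_optimized(nx, ny, nodes)
    print(f"original: {before} floats, optimized: {after} floats "
          f"({before / after:.1f}x less data per node)")
    print("rank 1 reads inlines", inline_range(1, nodes, nx))
```

Under these assumptions, each of the 24 nodes on the smaller grid reads about one twentieth of the data it handled originally, and no node waits on a master broadcast.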

See Also

Thursday Jan 21, 2010

SPARC Enterprise M4000 PeopleSoft NA Payroll 240K Employees Performance (16 Streams)

The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology, the Sun Storage F5100 flash array, has produced World Record Performance on the PeopleSoft Payroll 9.0 (North American) 240K employees benchmark.

  • The Sun SPARC Enterprise M4000 server with four 2.53 GHz SPARC64 VII processors and the Sun Storage F5100 flash array using 16 job streams (payroll threads) is 55% faster than the HP rx6600 (4 x 1.6GHz Itanium2 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and the Sun Storage F5100 flash array is 2.1x faster than the 2027 MIPS IBM Z990 (6 Z990 Gen1 processors) as measured for payroll processing tasks in the PeopleSoft Payroll 9.0 (North American) benchmark. The Sun result used the Oracle 11gR1 database running on Solaris 10 while the IBM result was run with 8 payroll threads and used IBM DB2 for Z/OS 8.1 for the database.

  • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array processed payroll for 240K employees using PeopleSoft Payroll 9.0 (North American) and Oracle 11gR1 running on Solaris 10 with different execution strategies, which resulted in a maximum CPU utilization of 45% compared to HP's reported CPU utilization of 89%.

  • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology processed 16 Sequential Jobs and single run control with a total time of 534 minutes, an improvement of 19% compared to HP's time of 633 minutes.

  • Sun's FlashFire technology dramatically improves IO performance for the PeopleSoft Payroll 9.0 (North American) benchmark, with a significant performance boost over the best optimized FC disks (60+).

  • The Sun Storage F5100 Flash Array is a high-performance, high-density solid state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the normal disk latency of 5 msec measured on this benchmark.

  • Sun estimates that the MIPS rating for a Sun SPARC Enterprise M4000 server is over 3000 MIPS.

Performance Landscape

240K Employees

| System | Processor | OS/Database | Payroll Processing Result (min) | Run 1 (min) | Run 2 (min) | Run 3 (min) | Num of Streams | Ver |
|---|---|---|---|---|---|---|---|---|
| Sun M4000 | 4x 2.53GHz SPARC64 VII | Solaris/Oracle 11gR1 | 43.78 | 51.26 | 286.11 | 534.35 | 16 | 9.0 |
| HP rx6600 | 4x 1.6GHz Itanium2 | HP-UX/Oracle 11g | 68.07 | 81.17 | 350.16 | 633.25 | 16 | 9.0 |
| IBM Z990 | 6x Gen1 2027 MIPS | Z/OS/DB2 | 91.70 | 107.34 | 328.66 | 544.80 | 8 | 9.0 |

Note: IBM benchmark documents show that 6 Gen1 processors are rated at 2027 MIPS. 13 Gen1 processors were in this configuration, but only 6 were available for testing.
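
The "faster than" figures quoted in the highlights are consistent with a ratio of elapsed times. The sketch below assumes that "N% faster" means the baseline time divided by the Sun time, minus one; the function names are illustrative, and the times are the minutes reported in the table.

```python
def speedup(baseline_minutes, measured_minutes):
    """How many times faster the measured run is than the baseline run."""
    return baseline_minutes / measured_minutes


def percent_faster(baseline_minutes, measured_minutes):
    """Speedup expressed as a percentage gain over the baseline."""
    return (speedup(baseline_minutes, measured_minutes) - 1.0) * 100.0


# Payroll processing result times in minutes
SUN_M4000, HP_RX6600, IBM_Z990 = 43.78, 68.07, 91.70

# Run 3 times in minutes (16 sequential jobs, single run control)
SUN_RUN3, HP_RUN3 = 534.35, 633.25

if __name__ == "__main__":
    print(f"vs HP rx6600: {percent_faster(HP_RX6600, SUN_M4000):.0f}% faster")
    print(f"vs IBM Z990:  {speedup(IBM_Z990, SUN_M4000):.1f}x faster")
    print(f"Run 3 vs HP:  {percent_faster(HP_RUN3, SUN_RUN3):.0f}% improvement")
```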

Results and Configuration Summary

Hardware Configuration:

    1 x Sun SPARC Enterprise M4000 (4 x 2.53 GHz/32GB)
    1 x Sun Storage F5100 Flash Array (40 x 24GB FMODs)
    1 x Sun Storage J4200 (12 x 450GB SAS 15K RPM)

Software Configuration:

    Solaris 10 5/09
    Oracle PeopleSoft HCM 9.0 64-bit
    Oracle PeopleSoft Enterprise (PeopleTools) 8.49.08 64-bit
    Micro Focus Server Express 4.0 SP4 64-bit
    Oracle RDBMS 11.1.0.7 64-bit
    HP's Mercury Interactive QuickTest Professional 9.0

Benchmark Description

The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

The benchmark measures the run times of five application business processes for a database representing a large organization. The five processes are:

  • Paysheet Creation: Generates the payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

  • Payroll Calculation: Looks at Paysheets and calculates checks for those employees.

  • Payroll Confirmation: Takes information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

  • Print Advice forms: Takes the information generated by Payroll Calculation and Confirmation and produces an Advice for each employee to report Earnings, Taxes, Deductions, etc.

  • Create Direct Deposit File: Takes information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

For the benchmark, we collect at least three data points with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of sixteen job streams to be configured to run in parallel.

Key Points and Best Practices

Please see the white paper for information on PeopleSoft payroll best practices using flash.

See Also

Disclosure Statement

Oracle PeopleSoft Payroll 9.0 benchmark, Sun M4000 (4 2.53GHz SPARC64) 43.78 min, IBM Z990 (6 gen1) 91.70 min, HP rx6600 (4 1.6GHz Itanium2) 68.07 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html, results 1/21/2010.

Wednesday Nov 18, 2009

Sun Flash Accelerator F20 PCIe Card Achieves 100K 4K IOPS and 1.1 GB/sec

Part of the Sun FlashFire family, the Sun Flash Accelerator F20 PCIe Card is a low-profile x8 PCIe card with 4 Solid State Disks-on-Modules (DOMs) delivering over 101K IOPS (4K IO) and 1.1 GB/sec throughput (1M reads).

The Sun F20 card is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

  • The Sun Flash Accelerator F20 PCIe Card demonstrates breakthrough performance of 101K IOPS for 4K random read
  • The Sun Flash Accelerator F20 PCIe Card can also perform 88K IOPS for 4K random write
  • The Sun Flash Accelerator F20 PCIe Card has unprecedented throughput of 1.1 GB/sec.
  • The Sun Flash Accelerator F20 PCIe Card (low-profile x8 size) has the IOPS performance of over 550 SAS drives or 1,100 SATA drives.

Performance Landscape

Bandwidth and IOPS Measurements

| Test | 4 DOMs | 2 DOMs | 1 DOM |
|---|---|---|---|
| Random 4K Read | 101K IOPS | 68K IOPS | 35K IOPS |
| Maximum Delivered Random 4K Write | 88K IOPS | 44K IOPS | 22K IOPS |
| Maximum Delivered 50-50 4K Read/Write | 54K IOPS | 27K IOPS | 13K IOPS |
| Sequential Read (1M) | 1.1 GB/sec | 547 MB/sec | 273 MB/sec |
| Maximum Delivered Sequential Write (1M) | 567 MB/sec | 243 MB/sec | 125 MB/sec |
| Sustained Random 4K Write\* | 37K IOPS | 18K IOPS | 10K IOPS |
| Sustained 50/50 4K Read/Write\* | 34K IOPS | 17K IOPS | 8.6K IOPS |

(\*) Maximum Delivered values measured over a 1 minute period. Sustained write performance differs from maximum delivered performance. Over time, wear-leveling and erase operations are required and impact write performance levels.

Latency Measurements

The Sun Flash Accelerator F20 PCIe Card is tuned for 4 KB or larger IO sizes; the write service time for IOs smaller than 4 KB can be 10 times more than shown in the table below. It should also be noted that the service times shown below include both the latency and the time to transfer the data. The transfer time becomes the dominant portion of the service time for IOs over 64 KB in size.

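| Transfer Size | Read Service Time (ms) | Write Service Time (ms) |
|---|---|---|
| 4 KB | 0.32 | 0.22 |
| 8 KB | 0.34 | 0.24 |
| 16 KB | 0.37 | 0.27 |
| 32 KB | 0.43 | 0.33 |
| 64 KB | 0.54 | 0.46 |
| 128 KB | 0.49 | 1.30 |
| 256 KB | 1.31 | 2.15 |
| 512 KB | 2.25 | 2.25 |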

- Latencies are measured application latencies via vdbench tool.
- Please note that the FlashFire F20 card is a 4KB sector device. Doing IOs of less than 4KB in size, or not aligned on 4KB boundaries, can result in significant performance degradation on write operations.
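
One way to see how transfer time comes to dominate the service time is to convert a few of the reported service times into the throughput a single outstanding IO would achieve. This is a derived illustration, not a measured result; it assumes 1 KB = 1024 bytes and reports MB as 10^6 bytes, and the names are hypothetical.

```python
KB = 1024  # assumed bytes per KB

# (read ms, write ms) service times from the latency table, keyed by transfer size in KB
SERVICE_TIME_MS = {
    4: (0.32, 0.22),
    64: (0.54, 0.46),
    512: (2.25, 2.25),
}


def throughput_mb_per_sec(transfer_kb, service_ms):
    """Throughput in MB/sec (10^6 bytes) of one request completing in `service_ms`."""
    return (transfer_kb * KB) / (service_ms / 1000.0) / 1e6


if __name__ == "__main__":
    for size_kb, (read_ms, write_ms) in SERVICE_TIME_MS.items():
        print(f"{size_kb:>3} KB: read {throughput_mb_per_sec(size_kb, read_ms):6.1f} MB/sec, "
              f"write {throughput_mb_per_sec(size_kb, write_ms):6.1f} MB/sec")
```

A 128-fold increase in transfer size, from 4 KB to 512 KB, raises the read service time only about 7-fold, so the fixed per-IO latency matters less and less as requests grow.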

Results and Configuration Summary

Storage:

    Sun Flash Accelerator F20 PCIe Card
      4 x 24-GB Solid State Disks-on-Modules (DOMs)

Servers:

    1 x Sun Fire X4170

Software:

    OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
    Vdbench 5.0
    Required Flash Array Patches SPARC, ses/sgen patch 138128-01 or later & mpt patch 141736-05
    Required Flash Array Patches x86, ses/sgen patch 138129-01 or later & mpt patch 141737-05

Benchmark Description

Sun measured a wide variety of IO performance metrics on the Sun Flash Accelerator F20 PCIe Card using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

The Vdbench profile f20-parmfile.txt was used for the bandwidth and IOPS measurements, and the Vdbench profile f20-latency.txt was used for the latency measurements.

Vdbench is publicly available for download at: http://vdbench.org

Key Points and Best Practices

  • Drive each Flash Module with 32 outstanding IOs as shown in the benchmark profile above.
  • SPARC platforms will align with the 4K boundary size set by the Flash Array. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.

See Also

Disclosure Statement

Sun Flash Accelerator F20 PCIe Card delivered 100K 4K read IOPS and 1.1 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 14, 2009.

Wednesday Oct 28, 2009

SPC-2 Sun Storage 6780 Array RAID 5 & RAID 6 51% better $/performance than IBM DS5300

Significance of Results

Results on the Sun Storage 6780 Array with 8Gb connectivity are presented for the SPC-2 benchmark using RAID 5 and RAID 6.
  • The Sun Storage 6780 array outperforms the IBM DS5300 by 51% in price performance for SPC-2 benchmark using RAID 5 data protection.

  • The Sun Storage 6780 array outperforms the IBM DS5300 by 51% in price performance for SPC-2 benchmark using RAID 6 data protection.

  • The Sun Storage 6780 Array has 62% better performance than the Fujitsu 800/1100 and delivers a price performance advantage of 5.6x as measured by the SPC-2 benchmark.

  • The Sun Storage 6780 array with 8Gb connectivity improved performance by 36% over the 4Gb-connected solution as measured by the SPC-2 benchmark.

Performance Landscape

SPC-2 Performance Chart (in increasing price-performance order)

Sponsor  System        SPC-2 MBPS  $/SPC-2 MBPS  ASU Capacity (GB)  TSC Price  Data Protection Level  Date      Results Identifier
Sun      SS6780 (8Gb)  5,634.17    $44.88        16,383.186         $252,873   RAID 5                 10/27/09  B00047
IBM      DS5300 (8Gb)  5,634.17    $67.75        16,383.186         $381,720   RAID 5                 10/21/09  B00045
Sun      SS6780 (8Gb)  5,543.88    $45.61        14,042.731         $252,873   RAID 6                 10/27/09  B00048
IBM      DS5300 (8Gb)  5,543.88    $68.85        14,042.731         $381,720   RAID 6                 10/21/09  B00046
Sun      SS6780 (4Gb)  4,818.43    $53.61        16,383.186         $258,329   RAID 5                 02/03/09  B00039
IBM      DS5300 (4Gb)  4,818.43    $93.80        16,383.186         $451,986   RAID 5                 09/25/08  B00037
Sun      SS6780 (4Gb)  4,675.50    $55.25        14,042.731         $258,329   RAID 6                 02/03/09  B00040
IBM      DS5300 (4Gb)  4,675.50    $96.67        14,042.731         $451,986   RAID 6                 09/25/08  B00038
Fujitsu  800/1100      3,480.68    $238.93       4,569.845          $831,649   Mirroring              03/08/07  B00019

SPC-2 MBPS = the Performance Metric
$/SPC-2 MBPS = the Price/Performance Metric
ASU Capacity = the Capacity Metric
Data Protection = Data Protection Metric
TSC Price = Total Cost of Ownership Metric
Results Identifier = A unique identification of the result Metric

Complete SPC-2 benchmark results may be found at http://www.storageperformance.org.
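
As an illustration of how the price-performance figures relate to the table, the sketch below recomputes $/SPC-2 MBPS as TSC Price divided by SPC-2 MBPS and derives the 51% advantage from the 8Gb rows. It is a reader's check written in Python, not part of the SPC-2 kit; the function names are hypothetical and the input values are copied from the table above.

```python
def price_per_mbps(tsc_price: float, mbps: float) -> float:
    """Price-performance: total price divided by SPC-2 MBPS."""
    return tsc_price / mbps


def advantage_pct(competitor: float, ours: float) -> float:
    """Percentage by which the competitor's $/MBPS exceeds ours."""
    return (competitor / ours - 1.0) * 100.0


# (TSC price in dollars, SPC-2 MBPS) for the 8Gb results in the table
results = {
    "RAID 5": {"Sun": (252_873, 5_634.17), "IBM": (381_720, 5_634.17)},
    "RAID 6": {"Sun": (252_873, 5_543.88), "IBM": (381_720, 5_543.88)},
}

for level, rows in results.items():
    sun = price_per_mbps(*rows["Sun"])
    ibm = price_per_mbps(*rows["IBM"])
    print(f"{level}: Sun ${sun:.2f}, IBM ${ibm:.2f}, "
          f"advantage {advantage_pct(ibm, sun):.0f}%")
```

Because both arrays report the same SPC-2 MBPS at each protection level, the price-performance difference here comes entirely from the difference in TSC Price.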

Results and Configuration Summary

Storage Configuration:

    8 x CM200 trays, each with 16 x 146GB 15K RPM drives
    8 x Qlogic 8Gb HBA

Server Configuration:

    4 x IBM x3650
      2 x 2.93 GHz Intel X5570
      5 GB memory

Software Configuration:

    Microsoft Windows Server 2003 Enterprise Edition (32-bit) with SP2
    SPC-2 benchmark kit

Benchmark Description

The SPC Benchmark-2™ (SPC-2) is a series of related benchmark performance tests that simulate the sequential component of demands placed upon on-line, non-volatile storage in server class computer systems. SPC-2 provides measurements in support of real world environments characterized by:
  • Large numbers of concurrent sequential transfers.
  • Demanding data rate requirements, including requirements for real time processing.
  • Diverse application techniques for sequential processing.
  • Substantial storage capacity requirements.
  • Data persistence requirements to ensure preservation of data without corruption or loss.

Key Points and Best Practices

  • This benchmark was performed using RAID 5 and RAID 6 protection.
  • The controller stripe size was set to 512k.
  • No volume manager was used.

See Also

Benchmark Tags

$/Perf, performance, bandwidth, OpenStorage, Storage

Disclosure Statement

SPC-2, SPC-2 MBPS, $/SPC-2 MBPS are registered trademarks of Storage Performance Council (SPC). More info www.storageperformance.org. Sun Storage 6780 Array 5,634.17 SPC-2 MBPS, $/SPC-2 MBPS $44.88, ASU Capacity 16,383.186 GB, Protect RAID 5, Cost $252,873.00, Ident. B00047. Sun Storage 6780 Array 5,543.88 SPC-2 MBPS, $/SPC-2 MBPS $45.61, ASU Capacity 14,042.731 GB, Protect RAID 6, Cost $252,873.00, Ident. B00048.

Publication Rules

See here for publication rules.

Wednesday Oct 14, 2009

Oracle Open World (OOW) BestPerf Index 14 October 2009

Here is a BestPerf blog index to a variety of benchmarks announced at Oracle Open World and others talked about at the conference.

Colors used:

Benchmark
Best Practices
Other

ORACLEOPENWORLD

CMT Servers

Oct 11, 2009 TPC-C World Record Sun - Oracle
Oct 13, 2009 Sun T5440 Oracle BI EE Sun T5440 World Record
Oct 13, 2009 SPECweb2005 Sun T5440 World Record, Solaris Containers and Sun Storage F5100
Sep 01, 2009 String Searching - Sun T5240 & T5440 Outperform IBM Cell Broadband Engine
Aug 27, 2009 Sun T5240 Beats 4-Chip IBM Power 570 POWER6 System on SPECjbb2005
Aug 26, 2009 Sun T5220 Sets Single Chip World Record on SPECjbb2005
Aug 12, 2009 SPECmail2009 on Sun T5240 and Sun Java System Messaging Server 6.3
Jul 23, 2009 World Record Performance of Sun CMT Servers
Jul 22, 2009 Why does 1.6 beat 4.7?
Jul 21, 2009 Zeus ZXTM Traffic Manager World Record on Sun T5240
Jul 21, 2009 Sun T5440 World Record SAP-SD 4-Processor Two-tier SAP ERP 6.0 EP4 (Unicode)

SPARC64 Servers

Oct 13, 2009 SAP 2-tier SD Benchmark on Sun M9000/32 SPARC64 VII
Oct 13, 2009 Oracle PeopleSoft Payroll Sun M4000 and Sun Storage F5100 World Record Performance
Oct 12, 2009 Best Practices: M4000 Sun Storage F5100 is a good option for PeopleSoft Payroll
Oct 13, 2009 Oracle Hyperion Sun M5000 and Sun Storage 7410
Oct 13, 2009 SPECcpu2006 Results On MSeries Servers, New SPARC64 VII

X86 Servers

Oct 13, 2009 SAP 2-tier SD-Parallel on Sun Blade X6270 1-node, 2-node and 4-node
Aug 28, 2009 Sun X4270 World Record SAP-SD 2-Processor Two-tier SAP ERP 6.0 EP 4 (Unicode)
Oct 02, 2009 Sun X4270 VMware VMmark benchmark achieves excellent result
Sep 22, 2009 Sun X4270 Virtualized for Two-tier SAP ERP 6.0 EP4 (Unicode) Standard Sales and Distribution Benchmark

HPC Benchmarks

Oct 13, 2009 Halliburton ProMAX Oil & Gas Appl on Sun 6048/X6275 Cluster and Oracle Database
Oct 13, 2009 MCAE ABAQUS faster on Sun F5100 and Sun X4270 - Single Node World Record
Oct 12, 2009 MCAE ANSYS faster on Sun F5100 and Sun X4270
Oct 12, 2009 MCAE MSC/NASTRAN faster on Sun F5100 and Fire X4270
Oct 13, 2009 CP2K Life Sciences, Ab-initio Chem - Sun C48 with Sun Blade X6275 - QDR InfiniBand
Oct 09, 2009 X6275 Cluster Demonstrates Performance and Scalability on WRF 2.5km CONUS Dataset

Specific Storage Benchmarks

Oct 12, 2009 SPC-2 Sun Storage 6180 RAID 5 & RAID 6 Over 70% Better $/Performance than IBM
Oct 12, 2009 SPC-1 Sun Storage 6180 Over 70% Better $/Performance than IBM
Oct 12, 2009 1.6 Million 4K IOPS in 1RU on Sun Storage F5100 Flash Array

Additional CMT Server Benchmarks

Jul 21, 2009 1.6 GHz SPEC CPU2006 - Rate Benchmarks
Jul 21, 2009 Sun Blade T6320 World Record SPECjbb2005 performance
Jul 21, 2009 Sun T5440 SPECjbb2005 Beats IBM POWER6 Chip-to-Chip

Tuesday Oct 13, 2009

Sun T5440 Oracle BI EE Sun SPARC Enterprise T5440 World Record

A workload for Oracle BI EE, a component of Oracle Fusion Middleware, was run on two Sun SPARC Enterprise T5440 servers and achieved world record performance.
  • Two Sun SPARC Enterprise T5440 servers with four 1.6 GHz UltraSPARC T2 Plus processors delivered the best performance of 50K concurrent users on the Oracle BI EE 10.1.3.4 benchmark with Oracle 11g database running on free and open Solaris 10.

  • The two-node Sun SPARC Enterprise T5440 configuration with Oracle BI EE running on Solaris 10 using 8 Solaris Containers shows 1.8x scaling over Sun's previous one-node SPARC Enterprise T5440 server result with 4 Solaris Containers.

  • The two-node SPARC Enterprise T5440 configuration demonstrated the performance and scalability of the UltraSPARC T2 Plus processor: 50K users can be serviced with a 0.2776 sec response time.

  • The Sun SPARC Enterprise T5220 server was used as an NFS server with 4 internal SSDs and the ZFS file system which showed significant I/O performance improvement over traditional disk for Business Intelligence Web Catalog activity.

  • Oracle Fusion Middleware provides a family of complete, integrated, hot pluggable and best-of-breed products known for enabling enterprise customers to create and run agile and intelligent business applications. Oracle BI EE performance demonstrates why so many customers rely on Oracle Fusion Middleware as their foundation for innovation.

  • IBM has not published any POWER6 processor based results on this important benchmark.

Performance Landscape

System                          Chips  GHz  Processor Type      Users
2 x Sun SPARC Enterprise T5440  8      1.6  UltraSPARC T2 Plus  50,000
1 x Sun SPARC Enterprise T5440  4      1.6  UltraSPARC T2 Plus  28,000
5 x Sun Fire T2000              1      1.2  UltraSPARC T1       10,000
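
The 1.8x scaling quoted for the two-node result follows directly from the user counts in the table. A short Python sketch of that arithmetic is shown here; the function name is hypothetical, and the "fraction of ideal" figure is an added illustration that assumes ideal scaling means doubling users when the node count doubles.

```python
def scaling_factor(users_new: int, users_old: int) -> float:
    """Ratio of supported concurrent users between two configurations."""
    return users_new / users_old


two_node_users = 50_000   # 2 x Sun SPARC Enterprise T5440
one_node_users = 28_000   # 1 x Sun SPARC Enterprise T5440

factor = scaling_factor(two_node_users, one_node_users)
efficiency = factor / 2   # node count doubled, so ideal scaling would be 2.0x

print(f"{factor:.1f}x scaling, {efficiency:.0%} of ideal")
```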

Results and Configuration Summary

Hardware Configuration:

    2 x Sun SPARC Enterprise T5440 (1.6GHz/128GB)
    1 x Sun SPARC Enterprise T5220 (1.2GHz/64GB) and 4 SSDs (used as NFS server)

Software Configuration:

    Solaris 10 05/09
    Oracle BI EE 10.1.3.4
    Oracle Fusion Middleware
    Oracle 11gR1

Benchmark Description

The objective of this benchmark is to highlight how Oracle BI EE can support pervasive deployments in large enterprises, using minimal hardware, by simulating an organization that needs to support more than 25,000 active concurrent users, each operating in mixed mode: ad-hoc reporting, application development, and report viewing.

The user population was divided into a mix of administrative users and business users. A maximum of 28,000 concurrent users were actively interacting and working in the system during the steady-state period. The tests executed 580 transactions per second, with a think time of 60 seconds per user between requests. In the test scenario, 95% of the workload consisted of business users viewing reports and navigating within dashboards. The remaining 5% of the concurrent users, categorized as administrative users, were doing application development.

The benchmark scenario used a typical business user sequence of dashboard navigation, report viewing, and drill down. For example, a Service Manager logs into the system and navigates to his own set of dashboards, viz. "Service Manager". The user then selects the "Service Effectiveness" dashboard, which shows him four distinct reports: "Service Request Trend", "First Time Fix Rate", "Activity Problem Areas", and "Cost Per completed Service Call - 2002 till 2005". The user then proceeds to view the "Customer Satisfaction" dashboard, which also contains a set of 4 related reports. He then proceeds to drill down on some of the reports to see the detail data. Then the user proceeds to more dashboards, for example "Customer Satisfaction" and "Service Request Overview". After navigating through these dashboards, he logs out of the application.

This benchmark did not use a synthetic database schema. The benchmark tests were run on a full production version of the Oracle Business Intelligence Applications with a fully populated underlying database schema. The business processes in the test scenario closely represent a true customer scenario.

See Also

Disclosure Statement

Oracle BI EE benchmark results 10/13/2009, see