Wednesday Apr 14, 2010

Oracle Sun Storage F5100 Flash Array Delivers World Record SPC-1C Performance

Oracle's Sun Storage F5100 flash array delivered world record performance on the SPC-1C benchmark. The SPC-1C benchmark shows the advantage of Oracle's FlashFire technology.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance of 300,873.47 SPC-1C IOPS.

  • The Sun Storage F5100 flash array requires half the rack space of the next best result, the IBM System Storage EXP12S.

  • The Sun Storage F5100 flash array delivered nearly seven times better SPC-1C IOPS performance than the next best SPC-1C result, the IBM System Storage EXP12S with 8 SSDs.

  • The Sun Storage F5100 flash array delivered world record SPC-1C LRT (response time) performance of 330 microseconds, and a full-load response time of 2.63 milliseconds, over 2.5x better than the IBM System Storage EXP12S SPC-1C result.

  • Compared to the IBM result, the Sun Storage F5100 flash array delivered 2.7x better access density (SPC-1C IOPS / ASU GB), 3.9x better price/performance (TSC / SPC-1C IOPS), and 31% better tested $/GB (TSC / ASU) as part of these SPC-1C benchmark results.

  • The Sun Storage F5100 flash array delivered world record SPC-1C performance using the SPC-1C workload driven by the Sun SPARC Enterprise M5000 server. This type of workload is similar to database acceleration workloads where the storage is used as a low-latency cache. Typically these applications do not require data protection.

Performance Landscape

System       SPC-1C      ASU       TSC       Data         LRT       Access   Price  $/GB   Identifier
             IOPS        Capacity            Protection   Response  Density  /Perf
                         (GB)                Level        (usecs)
Sun F5100    300,873.47  1374.390  $151,381  unprotected  330       218.9    $0.50  110.1  C00010
IBM EXP12S   45,000.20   547.610   $87,486   unprotected  460       82.2     $1.94  159.8  E00001

SPC-1C IOPS – SPC-1C performance metric, bigger is better
ASU Capacity – Application storage unit capacity (in GB)
TSC – Total price of tested storage configuration, smaller is better
Data Protection Level – Data protection level used in benchmark
LRT Response (usecs) – Average response time (microseconds) of the 10% BSU load level test run, smaller is better
Access Density – Derived metric of SPC-1C IOPS / ASU GB, bigger is better
Price/Perf – Derived metric of TSC / SPC-1C IOPS, smaller is better
$/GB – Derived metric of TSC / ASU, smaller is better
Identifier – The SPC-1C submission identifier
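
As a worked example, the derived metrics follow directly from the base numbers in the table; for the Sun F5100 row:

  Access Density = 300,873.47 SPC-1C IOPS / 1374.390 ASU GB ≈ 218.9 IOPS per GB
  Price/Perf = $151,381 TSC / 300,873.47 SPC-1C IOPS ≈ $0.50 per SPC-1C IOPS
  $/GB = $151,381 TSC / 1374.390 ASU GB ≈ $110.1 per GB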

Results and Configuration Summary

Storage Configuration:

1 x Sun Storage F5100 flash array with 80 FMODs

Hardware Configuration:

1 x Sun SPARC Enterprise M5000 server
16 x StorageTek PCIe SAS Host Bus Adapter, 8 Port

Software Configuration:

Oracle Solaris 10
SPC-1C benchmark kit

Benchmark Description

SPC-1C is the first SPC component-level benchmark applicable across a broad range of storage component products such as disk drives, host bus adapters (HBAs), intelligent enclosures, and storage software such as logical volume managers. SPC-1C uses the same workload as SPC-1, which is designed to demonstrate the performance of a storage component product while performing the typical functions of business-critical applications. Those applications are characterized by predominately random I/O operations and require both queries and update operations. Examples of those types of applications include OLTP, database operations, and mail server implementations.

SPC-1C configurations consist of one or more HBAs/Controllers and one of the following storage device configurations:

  • One, two, or four storage devices in a stand-alone configuration. An external enclosure may be used, but only to provide power and/or connectivity for the storage devices.

  • A "Small Storage Subsystem" configured in no larger than a 4U enclosure profile (1 - 4U, 2 - 2U, 4 - 1U, etc.).

Key Points and Best Practices

See Also

Disclosure Statement

SPC-1C, SPC-1C IOPS, SPC-1C LRT are trademarks of Storage Performance Council (SPC), see www.storageperformance.org for more information. Sun Storage F5100 flash array SPC-1C submission identifier C00010 results of 300,873.47 SPC-1C IOPS over a total ASU capacity of 1374.390 GB using unprotected data protection, a SPC-1C LRT of 0.33 milliseconds, a 100% load over all ASU response time of 2.63 milliseconds and a total TSC price (including three-year maintenance) of $151,381. This compares with IBM System Storage EXP12S SPC-1C/E Submission identifier E00001 results of 45,000.20 SPC-1C IOPS over a total ASU capacity of 547.61 GB using unprotected data protection level, a SPC-1C LRT of 0.46 milliseconds, a 100% load over all ASU response time of 6.95 milliseconds and a total TSC price (including three-year maintenance) of $87,468.

The Sun Storage F5100 flash array is a 1RU (1.75") array. The IBM System Storage EXP12S is a 2RU (3.5") array.

Tuesday Apr 13, 2010

Oracle Sun Flash Accelerator F20 PCIe Card Accelerates Web Caching Performance

Using Oracle's Sun FlashFire technology, the Sun Flash Accelerator F20 PCIe Card is shown to be a high-performance and cost-effective caching device for web servers. Many current web and application servers are designed with an active cache that holds items such as session objects, files, and web pages. The Sun F20 card is shown to be an excellent candidate for improving performance over HDD-based solutions.

  • The Sun Flash Accelerator F20 PCIe Card provides 2x better Quality of Service (QoS) at the same load as compared to 15K RPM high performance disk drives.

  • The Sun Flash Accelerator F20 PCIe Card enables scaling to 3x more users than 15K RPM high performance disk drives.

  • The Sun Flash Accelerator F20 PCIe Card provides 25% higher Quality of Service (QoS) than 15K RPM high performance disk drives at maximum rate.

  • The Sun Flash Accelerator F20 PCIe Card allows for easy expansion of the webcache. Each card provides an additional 96 GB of storage.

  • The Sun Flash Accelerator F20 PCIe Card used as a caching device offers Bitrate and Quality of Service (QoS) comparable to that provided by memory. While memory also provides excellent caching performance in comparison to disk, memory capacity is limited in servers.

Performance Landscape

Experiment results using three Sun Flash Accelerator F20 PCIe Cards.

Load Factor          No Cache    F20 Webcache  F20 Webcache  Memcache
                     (Max Load)  (@Disk Load)  (Max Load)    (@F20 Load)
Max Connections      7,000       7,000         27,000        27,000
Average Bitrate      445 Kbps    870 Kbps      602 Kbps      678 Kbps
Cache Hit Rate       0%          98%           99%           56%

QoS Bitrates         %Connect    %Connect      %Connect      %Connect
900 Kbps - 1 Mbps    0%          97%           0%            0%
800 Kbps             0%          3%            0%            6%
700 Kbps             0%          0%            64%           70%
600 Kbps             18%         0%            24%           15%
420 Kbps - 500 Kbps  88%         0%            12%           9%

Experiment results using two Sun Flash Accelerator F20 PCIe Cards.

Load Factor          No Cache    F20 Webcache  F20 Webcache  Memcache
                     (Max Load)  (@Disk Load)  (Max Load)    (@F20 Load)
Max Connections      7,000       7,000         22,000        27,000
Average Bitrate      445 Kbps    870 Kbps      622 Kbps      678 Kbps
Cache Hit Rate       0%          98%           80%           56%

QoS Bitrates         %Connect    %Connect      %Connect      %Connect
900 Kbps - 1 Mbps    0%          97%           0%            0%
800 Kbps             0%          3%            1%            6%
700 Kbps             0%          0%            68%           70%
600 Kbps             18%         0%            26%           15%
420 Kbps - 500 Kbps  88%         0%            5%            9%

Results and Configuration Summary

Hardware Configuration:

Sun Fire X4270, 72 GB memory
3 x Sun Flash Accelerator F20 PCIe Cards
Sun Storage J4400 (12 x 15K RPM disks)

Software Configuration:

Sun Java System Web Server 7
OpenSolaris
Flickr Photo Download Workload
Oracle Solaris Zettabyte File System (ZFS)

Three configurations are compared:

  1. No cache, 12 x high-speed 15K RPM Disks
  2. 3 x Sun Flash Accelerator F20 PCIe Cards as cache device
  3. 64 GB server memory as cache device

Benchmark Description

This benchmark is based upon the description of the flickr website presented at http://highscalability.com/flickr-architecture. It measures the performance of an HTTP-based photo slide show workload. The workload randomly selects and downloads photos from a pool of 80 photos stored in 4 bins:

  • 20 large photos, 1800x1800px, 1 MB, 1% probability
  • 20 medium photos, 1000x1000px, 500 KB, 4% probability
  • 20 small photos, 540x540px, 100 KB, 35% probability
  • 20 thumbnail photos, 100x100px, 5 KB, 60% probability
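
Taken together, the four bins imply an average download of roughly 68 KB per request:

  0.01 x 1 MB + 0.04 x 500 KB + 0.35 x 100 KB + 0.60 x 5 KB ≈ 10 + 20 + 35 + 3 = 68 KB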

Benchmark metrics are:

  • Scalability – Number of persistent connections achieved
  • Quality of Service (QoS) – bitrate achieved by each user
    • max speed: 1 Mbps, min speed SLA: 420 Kbps
    • divides bitrates between max and min in 5 bands, corresponding to dial-in, T1, etc.
    • example: 900 Kbps, 800 Kbps, 700 Kbps, 600 Kbps, 500 Kbps
    • reports %users in each bitrate band

Three cases were tested:

  • Disk as Overflow Cache – Contents are served from 12 high-performance 15K RPM disks configured in a ZFS zpool.
  • Sun Flash Accelerator F20 PCIe Card as Cache Device – Contents are served from 2 F20 cards, with 8 component DOMs, configured in a ZFS zpool.
  • Memory as Cache – Contents are served from tmpfs.

Key Points and Best Practices

See Also

Disclosure Statement

Results as of 4/1/2010.

Wednesday Nov 18, 2009

Sun Flash Accelerator F20 PCIe Card Achieves 100K 4K IOPS and 1.1 GB/sec

Part of the Sun FlashFire family, the Sun Flash Accelerator F20 PCIe Card is a low-profile x8 PCIe card with 4 Solid State Disks-on-Modules (DOMs) delivering over 101K IOPS (4K IO) and 1.1 GB/sec throughput (1M reads).

The Sun F20 card is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

  • The Sun Flash Accelerator F20 PCIe Card demonstrates breakthrough performance of 101K IOPS for 4K random read
  • The Sun Flash Accelerator F20 PCIe Card can also perform 88K IOPS for 4K random write
  • The Sun Flash Accelerator F20 PCIe Card has unprecedented throughput of 1.1 GB/sec.
  • The Sun Flash Accelerator F20 PCIe Card (low-profile x8 size) has the IOPS performance of over 550 SAS drives or 1,100 SATA drives.

Performance Landscape

Bandwidth and IOPS Measurements

Test                                      4 DOMs      2 DOMs      1 DOM
Random 4K Read                            101K IOPS   68K IOPS    35K IOPS
Maximum Delivered Random 4K Write         88K IOPS    44K IOPS    22K IOPS
Maximum Delivered 50-50 4K Read/Write     54K IOPS    27K IOPS    13K IOPS
Sequential Read (1M)                      1.1 GB/sec  547 MB/sec  273 MB/sec
Maximum Delivered Sequential Write (1M)   567 MB/sec  243 MB/sec  125 MB/sec

Sustained Random 4K Write (*)             37K IOPS    18K IOPS    10K IOPS
Sustained 50/50 4K Read/Write (*)         34K IOPS    17K IOPS    8.6K IOPS

(*) Maximum Delivered values measured over a 1 minute period. Sustained write performance differs from maximum delivered performance. Over time, wear-leveling and erase operations are required and impact write performance levels.

Latency Measurements

The Sun Flash Accelerator F20 PCIe Card is tuned for 4 KB or larger IOs; the write service time for IOs smaller than 4 KB can be 10 times higher than shown in the table below. Note also that the service times shown below include both the latency and the time to transfer the data; the transfer time becomes the dominant portion of the service time for IOs over 64 KB in size.

Transfer Size   Read (ms)   Write (ms)
4 KB            0.32        0.22
8 KB            0.34        0.24
16 KB           0.37        0.27
32 KB           0.43        0.33
64 KB           0.54        0.46
128 KB          0.49        1.30
256 KB          1.31        2.15
512 KB          2.25        2.25

- Latencies are application-level latencies measured with the Vdbench tool.
- Please note that the FlashFire F20 card is a 4 KB sector device. IOs smaller than 4 KB, or not aligned on 4 KB boundaries, can result in significant performance degradation on write operations.
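
As a rough sanity check on the transfer-time effect, assume for illustration that a 512 KB read is served by a single DOM at the 273 MB/sec sequential rate shown above: the data transfer alone takes about 512 KB / 273 MB/sec ≈ 1.9 ms, which accounts for most of the 2.25 ms read service time in the table.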

Results and Configuration Summary

Storage:

    Sun Flash Accelerator F20 PCIe Card
      4 x 24-GB Solid State Disks-on-Modules (DOMs)

Servers:

    1 x Sun Fire X4170

Software:

    OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
    Vdbench 5.0
    Required Flash Array Patches SPARC, ses/sgen patch 138128-01 or later & mpt patch 141736-05
    Required Flash Array Patches x86, ses/sgen patch 138129-01 or later & mpt patch 141737-05

Benchmark Description

Sun measured a wide variety of IO performance metrics on the Sun Flash Accelerator F20 PCIe Card using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

The Vdbench profile f20-parmfile.txt was used for the bandwidth and IOPS measurements, and the profile f20-latency.txt was used for the latency measurements.

Vdbench is publicly available for download at: http://vdbench.org

Key Points and Best Practices

  • Drive each flash module with 32 outstanding IOs, as in the benchmark profile referenced above and sketched below.
  • SPARC platforms align with the 4K boundary size set by the F20 card. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.
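
A minimal Vdbench sketch of this setting, in the same parameter style as the profiles used for these tests (the device path is hypothetical; the actual f20-parmfile.txt profile is not reproduced here). The forth=(32) parameter keeps 32 IOs outstanding per device:

    # hypothetical raw device path for one DOM
    sd=sd1,lun=/dev/rdsk/c1t0d0s2
    # 4K random read workload across all defined devices
    wd=rm,sd=sd*,readpct=100,rhpct=0,seekpct=100
    # run at the maximum rate with 32 outstanding IOs per device
    rd=run1_rm,wd=rm,el=30m,in=6,forx=(4K),forth=(32),io=max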

See Also

Disclosure Statement

Sun Flash Accelerator F20 PCIe Card delivered 100K 4K read IOPS and 1.1 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 14, 2009.

Wednesday Nov 04, 2009

New TPC-C World Record Sun/Oracle

TPC-C Sun SPARC Enterprise T5440 with Oracle RAC World Record Database Result

Sun and Oracle demonstrate the world's fastest database performance. Sun Microsystems, using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 flash arrays, and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, delivered a world-record TPC-C benchmark result.

  • The 12-node Sun SPARC Enterprise T5440 server cluster delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC at $2.36/tpmC (USD) using Oracle 11g R1, on a configuration available 3/19/10.

  • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.

  • The complete Oracle/Sun solution delivered 10.7x better computational density than the IBM configuration (computational density = performance/rack).

  • The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.

  • The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.

  • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.

  • The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.

  • Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.

  • This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.

More information on this benchmark will be posted in the next several days.

Performance Landscape

TPC-C results (sorted by tpmC, bigger is better)


System                           tpmC       Price/tpmC  Avail     Database        Cluster  Racks  w/KtpmC
12 x Sun SPARC Enterprise T5440  7,646,487  2.36 USD    03/19/10  Oracle 11g RAC  Y        9      9.6
IBM Power 595                    6,085,166  2.81 USD    12/10/08  IBM DB2 9.5     N        76     56.4
HP Integrity Superdome           4,092,799  2.93 USD    08/06/07  Oracle 10g R2   N        46     to be added

Avail - Availability date
w/KtpmC - Watts per 1000 tpmC
Racks - clients, servers, storage, infrastructure

Sun and IBM TPC-C Response Times

System                            tpmC       New Order 90th%  New Order Average
                                             Response Time    Response Time
12 x Sun SPARC Enterprise T5440   7,646,487  0.170            0.168
IBM Power 595                     6,085,166  1.69             1.22
Response Time Ratio (Sun better)             9.9x             7.3x

Sun cites a 7x comparison to highlight the difference in response times between Sun's solution and IBM's (average: 1.22 / 0.168 ≈ 7.3x), though Sun is nearly 10x faster on New Order transactions completing within the 90th percentile (1.69 / 0.170 ≈ 9.9x).

It is also interesting to note that none of Sun's response times, average or 90th percentile, for any transaction exceeds 0.25 seconds, while IBM does not have even one interactive transaction, not even the menu, below 0.50 seconds. Graphs of Sun's and IBM's response times for New-Order can be found in the full disclosure reports on the TPC website's TPC-C Official Result Page.

Results and Configuration Summary

Hardware Configuration:

    9 racks used to hold

    Servers:
      12 x Sun SPARC Enterprise T5440, each with
      4 x 1.6 GHz UltraSPARC T2 Plus
      512 GB memory
      10 GbE network for cluster
    Storage:
      60 x Sun Storage F5100 Flash Array
      61 x Sun Fire X4275, Comstar SAS target emulation
      24 x Sun StorageTek 6140 (16 x 300 GB SAS 15K RPM)
      6 x Sun Storage J4400
      3 x 80-port Brocade FC switches
    Clients:
      24 x Sun Fire X4170, each with
      2 x 2.53 GHz X5540
      48 GB memory

Software Configuration:

    Solaris 10 10/09
    OpenSolaris 6/09 (COMSTAR) for Sun Fire X4275
    Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning
    Tuxedo CFS-R Tier 1
    Sun Web Server 7.0 Update 5

Benchmark Description

TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

See Also

Disclosure Statement

TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Performance Processing Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processor) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 3/19/10. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome(1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 11/5/09.

Thursday Oct 15, 2009

Oracle Flash Cache - SGA Caching on Sun Storage F5100

Overview and Significance of Results

Oracle and Sun's Flash Cache technology combines new features in Oracle with the Sun Storage F5100 to improve database performance. In Oracle databases, the System Global Area (SGA) is a group of shared memory areas dedicated to an Oracle "instance" (the Oracle processes in execution sharing a database). All Oracle processes use the SGA to hold information. The SGA is used to store incoming data (data and index buffers) and internal control information needed by the database. The size of the SGA is limited by the size of the available physical memory.

This benchmark tested and measured the performance of a new Oracle Database 11g (Release 2) feature that allows SGA caching to extend beyond physical memory onto a large flash storage device such as the Sun Storage F5100 flash array.

One particular benchmark test demonstrated a dramatic performance improvement (almost 5x) using the Oracle Extended SGA feature on flash storage, reaching SGA sizes in the hundreds-of-GB range at a more reasonable cost than equivalently sized RAM and with much faster access times than disk I/O.

The workload consisted of a high volume of SQL select transactions accessing a very large table in a typical business-oriented OLTP database. To obtain a baseline, throughput and response times were measured by applying the workload against a traditional storage configuration constrained by disk I/O demand (a DB working set of about 3x the size of the data cache in the SGA). The workload was then executed with an added Sun Storage F5100 flash array configured to contain an Extended SGA of increasing size.

The tests showed throughput scaling along with increasing Flash Cache size.

Table of Results

F5100 Extended SGA Size (GB)  Query Txns/Min  Avg Response Time (secs)  Speedup Ratio
none (baseline)               76,338          0.118                     N/A
25                            169,396         0.053                     2.2
50                            224,318         0.037                     2.9
75                            300,568         0.031                     3.9
100                           357,086         0.025                     4.6
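
(The top-line speedup quoted above follows directly from the table: 357,086 / 76,338 ≈ 4.7, i.e., almost 5x.)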
Configuration Summary

Server Configuration:

    Sun SPARC Enterprise M5000 Server
    8 x SPARC64 VII 2.4GHz Quad Core
    96 GB memory

Storage Configuration:

    8 x Sun Storage J4200 Arrays, 12x 146 GB 15K RPM disks each (96 disks total)
    1 x Sun Storage F5100 Flash Array

Software Configuration:

    Oracle 11gR2
    Solaris 10

Benchmark Description

The workload consisted of a high volume of SQL select transactions accessing a very large table in a typical business-oriented OLTP database.

The database consisted of various tables: Products, Customers, Orders, Warehouse Inventory (Stock) data, etc.; the Stock table alone was 3x the size of the DB cache.

To obtain a baseline, throughput and response times were measured by applying the workload against a traditional storage configuration constrained by disk I/O demand. The workload was then executed with an added Sun Storage F5100 flash array configured to contain an Extended SGA of increasing size.

During all tests, the in-memory SGA data cache was limited to 25 GB.

The Extended SGA was allocated on a "raw" Solaris volume created with the Solaris Volume Manager (SVM) on a set of devices (Flash Modules) residing on the Sun Storage F5100 flash array.
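
A minimal sketch of creating such a striped SVM volume, with hypothetical flash-module device names and interlace size (the actual layout used in the benchmark is not documented here):

    # stripe 4 flash-module slices into metadevice d100 (device names hypothetical)
    metainit d100 1 4 c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0 -i 512k
    # the raw volume is then available as /dev/md/rdsk/d100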

Key Points and Best Practices

In order to verify the performance improvement brought by extended SGA, the feature had to be tested with a large enough database size and with a workload requiring significant disk I/O activity to access the data. For that purpose, the size of the database needed to be a multiple of the physical memory size, avoiding the case in which the accessed data could be entirely or almost entirely cached in physical memory.

The above represents a typical “use case” in which the Flash Cache Extension is able to show remarkable performance advantages.

If the DB dataset is already entirely cached, if the DB I/O demand is not significant, if the application already saturates the CPU with non-database processing, or if large data caching is not productive (DSS-type queries), the Extended SGA may not improve performance.

It is also relevant to know that the additional memory structures needed to manage the Extended SGA are allocated in the in-memory SGA, therefore reducing its data caching capacity.

Increasing the Extended Cache beyond a specific threshold, dependent on various factors, may reduce the benefit of widening the Flash SGA and actually reduce the overall throughput.

This new cache is somewhat similar architecturally to the L2ARC on ZFS. Once written, flash cache buffers are read-only, and updates are only done into main memory SGA buffers. This feature is expected to primarily benefit read-only and read-mostly workloads.

A typical sizing of database flash cache is 2x to 10x the size of SGA memory buffers. Note that header information is stored in the SGA for each flash cache buffer (100 bytes per buffer in exclusive mode, 200 bytes per buffer in RAC mode), so the number of available SGA buffers is reduced as the flash cache size increases, and the SGA size should be increased accordingly.
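
As a rough illustration of this overhead, assuming an 8 KB database block size (an assumption; the block size is not stated above), a 100 GB flash cache holds about 100 GB / 8 KB ≈ 13.1 million buffers; at 100 bytes of header per buffer in exclusive mode, those headers consume roughly 1.3 GB of in-memory SGA.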

Two new init.ora parameters have been introduced, illustrated below:

    db_flash_cache_file = /lfdata/lffile_raw
    db_flash_cache_size = 100G

The db_flash_cache_file parameter takes a single file name, which can be a file system file, a raw device, or an ASM volume. The db_flash_cache_size parameter specifies the size of the flash cache. Note that for raw devices, the partition being used should start at cylinder 1 rather than cylinder 0 (to avoid the disk's volume label).

See Also

Disclosure Statement

Results as of October 10, 2009 from Sun Microsystems.

Tuesday Oct 13, 2009

SPECweb2005 on Sun SPARC Enterprise T5440 World Record using Solaris Containers and Sun Storage F5100 Flash

The Sun SPARC Enterprise T5440 server with 1.6 GHz UltraSPARC T2 Plus processors, Solaris Containers, Sun OpenStorage flash technology, and Sun Java System Web Server 7.0 Update 5 achieved a world record SPECweb2005 result.

  • Sun has obtained a world record SPECweb2005 performance result of 100,209 on the Sun SPARC Enterprise T5440, running Solaris 10 10/09, Sun Java System Web Server 7.0 Update 5, and the Java HotSpot™ Server VM.

  • This result demonstrates performance leadership of the Sun SPARC Enterprise T5440 server and its scalability, by using Solaris Containers to consolidate multiple web serving environments, and Sun OpenStorage Flash technology to store large datasets for fast data retrieval.

  • The Sun SPARC Enterprise T5440 delivers 21% greater SPECweb2005 performance than the HP DL370 G6 with 3.2 GHz Xeon W5580 processors.

  • The Sun SPARC Enterprise T5440 delivers 40% greater SPECweb2005 performance than the HP DL585 G5 with four 3.11 GHz Opteron 8393 SE processors.

  • The Sun SPARC Enterprise T5440 delivers 2x the SPECweb2005 performance of the HP DL580 G5 with four 2.66 GHz Xeon X7460 processors.

  • There are no IBM Power6 results on the SPECweb2005 benchmark.

  • This benchmark result clearly demonstrates that the Sun SPARC Enterprise T5440 running Solaris 10 10/09 and Sun Java System Webserver 7.0 Update 5 can support thousands of concurrent web server sessions and is an industry leader in web serving with a Sun solution.

Performance Landscape

Server       Processor       SPECweb2005  Banking*  Ecomm*   Support*  Webserver       OS
Sun T5440    4x 1.6 T2 Plus  100,209      176,500   133,000  95,000    Java WebServer  Solaris
HP DL370 G6  2x 3.2 W5580    83,073       117,120   142,080  76,352    Rock            RedHat Linux
HP DL585 G5  4x 3.11 O8393   71,629       117,504   123,072  56,320    Rock            RedHat Linux
HP DL580 G5  4x 2.66 X7460   50,013       97,632    69,600   40,800    Rock            RedHat Linux

* Banking - SPECweb2005-Banking
  Ecomm - SPECweb2005-Ecommerce
  Support - SPECweb2005-Support

Results and Configuration Summary

Hardware Configuration:

  1 Sun SPARC Enterprise T5440 with

  • 4 x 1.6 GHz UltraSPARC T2 Plus processors (each 8 cores, 64 threads)
  • 254 GB memory
  • 6 x 4Gb PCI Express 8-Port Host Adapter (SG-XPCIE8SAS-E-Z)
  • 1 x Sun Storage F5100 Flash Array (TA5100RASA4-80AA)
  • 1 x Sun Storage F5100 Flash Array (TA5100RASA4-40AA)

Server Software Configuration:

  • Solaris 10 10/09
  • JAVA System Web Server 7.0 Update 5
  • Java Hotspot™ Server VM

Network configuration:

  • 1 x Arista DCS-7124s 24-port 10GbE switch
  • 1 x Cisco 2970 series (WS-C2970G-24TS-E) switch for the three 1 GbE networks

Back-end Simulator:

  1 Sun Fire X4270 with

  • 2 x 2.93 GHz Intel X5570 Quad core
  • 48GB memory
  • Solaris 10 10/09
  • JSWS 7.0 Update 5
  • Java Hotspot™ Server VM

Clients:

  8 Sun Blade™ T6320

  • 1 x 1.417 GHz UltraSPARC-T2
  • 64 GB memory
  • Solaris 10 5/09
  • Java Hotspot™ Server VM

  8 Sun Blade™ X6270

  • 2 x 2.93 GHz Intel X5570 Quad core
  • 36 GB memory
  • Solaris 10 5/09
  • Java Hotspot™ Server VM

Benchmark Description

SPECweb2005, successor to SPECweb99 and SPECweb99_SSL, is an industry standard benchmark for evaluating Web Server performance developed by SPEC. The benchmark simulates multiple user sessions accessing a Web Server and generating static and dynamic HTTP requests. The major features of SPECweb2005 are:

  • Measures simultaneous user sessions
  • Dynamic content: currently PHP and JSP implementations
  • Page images requested using 2 parallel HTTP connections
  • Multiple, standardized workloads: Banking (HTTPS), E-commerce (HTTP and HTTPS), and Support (HTTP)
  • Simulates browser caching effects
  • File accesses more accurately simulate today's disk access patterns

Key Points and Best Practices

  • The server was divided into four Solaris Containers, and a single web server instance was executed in each container.
  • Four processor sets were created (with varying numbers of threads depending on the workload) to run the web servers in. This was done to reduce memory access latency by using the physical memory closest to the processor; all interrupts were run on the remaining threads. A sketch of this setup follows the list.
  • Each web server was executed in the FX scheduling class to improve performance by reducing the frequency of context switches.
  • Two Sun Storage F5100 Flash Arrays (holding the target file set and logs) were shared by the four containers for fast data retrieval.
  • Use of Solaris Containers highlights the consolidation of multiple web serving environments on a single server.
  • Use of the Sun External I/O Expansion Unit and Sun Storage F5100 Flash Arrays highlights the expandability of the server.
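
A minimal sketch of the processor-set and scheduling-class setup described above; the thread IDs, set ID, and PID are hypothetical, and the exact sets used in the submission are not reproduced here:

    # create a processor set from selected hardware threads (IDs hypothetical;
    # list continues through the desired thread IDs); psrset prints the new
    # set ID, assumed to be 1 below
    psrset -c 0 1 2 3
    # bind a running web server instance (PID hypothetical) to set 1
    psrset -b 1 12345
    # move that process into the FX (fixed-priority) scheduling class
    priocntl -s -c FX -i pid 12345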

    Disclosure Statement

    Sun SPARC Enterprise T5440 (32 cores, 4 chips) 100,209 SPECweb2005, was submitted to SPEC for review on October 13, 2009. HP ProLiant DL370 G6 (8 cores, 2 chips) 83,073 SPECweb2005. HP ProLiant DL585 G5 (16 cores, 4 chips) 71,629 SPECweb2005. HP ProLiant DL580 G5 (24 cores, 4 chips) 50,013 SPECweb2005. SPEC and SPECweb are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of Oct 10, 2009.

    Oracle PeopleSoft Payroll (NA): Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance

    The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology, the Sun Storage F5100 flash array, has produced World Record Performance on PeopleSoft Payroll (North America) 9.0 benchmark.

    • A Sun SPARC Enterprise M4000 server with four new 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array is 33% faster than the HP rx6600 (4 x 1.6GHz Itanium2 processors) on the PeopleSoft Payroll (NA) 9.0 benchmark. The Sun solution used the Oracle 11g database running on Solaris 10.

    • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and the Sun Storage F5100 flash array is 35% faster than the 2027 MIPs IBM Z990 (6 Z990 Gen1 processors) on the PeopleSoft Payroll (NA) 9.0 benchmark with Oracle 11g database running on Solaris 10. The IBM result used IBM DB2 for Z/OS 8.1 for the database.

    • The Sun SPARC Enterprise M4000 server with four 2.53GHz SPARC64 VII processors and a Sun Storage F5100 flash array processed 250K employee payroll checks using PeopleSoft Payroll (NA) 9.0 and Oracle 11g running on Solaris 10. Four different execution strategies were run, with an average improvement of 25% compared to HP's results on the rx6600. Sun achieved these results with 8 concurrent jobs at only 25% CPU utilization, while HP required 16 concurrent jobs at 88% CPU utilization.

    • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology processed 8 sequential jobs with a single run control in a total time of 527.85 minutes, an improvement of 20% over HP's time of 633.09 minutes.

    • The Sun SPARC Enterprise M4000 server combined with Sun FlashFire technology demonstrated a speedup of 81% going from 1 to 8 streams on the PeopleSoft Payroll (NA) 9.0 benchmark using the Oracle 11g database.

    • The Sun FlashFire technology dramatically improves I/O performance for the PeopleSoft Payroll benchmark, with a significant performance boost over best-case optimized FC disk configurations (60+ drives).

    • The Sun Storage F5100 flash array is a high-performance, high-density solid state flash array which provides a read latency of only 0.5 msec, about 10 times faster than the normal ~5 msec disk latency measured on this benchmark.

    • Sun estimates that the MIPS rating for a Sun SPARC Enterprise M4000 server is over 2742 MIPS.

    Performance Landscape

    250K Employees (times in minutes)

    System      Processor               OS/Database         Run 1   Run 2   Run 3   Version
    Sun M4000   4x 2.53GHz SPARC64 VII  Solaris/Oracle 11g  79.35   288.47  527.85  9.0
    HP rx6600   4x 1.6GHz Itanium2      HP-UX/Oracle 11g    81.17   350.16  633.25  9.0
    IBM Z990    6x Gen1 2027 MIPS       Z/OS/DB2            107.34  328.66  544.80  9.0
    HP rx6600   4x 1.6GHz Itanium2      HP-UX/Oracle 11g    105.70  369.59  633.09  9.0

    Note: IBM benchmark documents show that 6 Gen1 processors rate at 2027 MIPS. 13 Gen1 processors were in this configuration, but only 6 were available for testing.

    500K Employees (times in minutes)

    System      Processor               OS/Database         Run 1   Run 2   Run 3    Version
    HP rx7640   8x 1.6GHz Itanium2      HP-UX/Oracle 11g    133.63  712.72  1665.01  9.0

    Results and Configuration Summary

    Hardware Configuration:

      1 x Sun SPARC Enterprise M4000 (4 x 2.53 GHz/32GB)
      1 x Sun Storage F5100 Flash Array (40 x 24GB FMODs)
      1 x Sun Storage J4200 (12 x 450GB SAS 15K RPM)

    Software Configuration:

      Solaris 10 5/09
      Oracle PeopleSoft HCM 9.0
      Oracle PeopleSoft Enterprise (PeopleTools) 8.49
      Micro Focus Server Express 4.0 SP4
      Oracle RDBMS 11.1.0.7 64-bit
      HP's Mercury Interactive QuickTest Professional 9.0

    Benchmark Description

    The PeopleSoft 9.0 Payroll (North America) benchmark is a performance benchmark established by PeopleSoft to demonstrate system performance for a range of processing volumes in a specific configuration. This information may be used to determine the software, hardware, and network configurations necessary to support processing volumes. This workload represents large batch runs typical of OLTP workloads during a mass update.

    The benchmark measures the run times of five application business processes against a database representing a large organization. The five processes are:

    • Paysheet Creation: generates a payroll data worksheet for employees, consisting of standard payroll information for each employee for a given pay cycle.

    • Payroll Calculation: looks at paysheets and calculates checks for those employees.

    • Payroll Confirmation: takes the information generated by Payroll Calculation and updates the employees' balances with the calculated amounts.

    • Print Advice Forms: takes the information generated by Payroll Calculation and Confirmation and produces an advice for each employee to report earnings, taxes, deductions, etc.

    • Create Direct Deposit File: takes the information generated by the above processes and produces an electronic transmittal file used to transfer payroll funds directly into an employee's bank account.

    For the benchmark, at least four data points are collected with different numbers of job streams (parallel jobs). This batch benchmark allows a maximum of eight job streams to be configured to run in parallel.

    Key Points and Best Practices

    See Also

    Disclosure Statement

    Oracle PeopleSoft Payroll (NA) 9.0 benchmark, Sun M4000 (4 2.53GHz SPARC64) 79.35 min, IBM Z990 (6 gen1) 107.34 min, HP rx6600 (4 1.6GHz Itanium2) 105.70 min, www.oracle.com/apps_benchmark/html/white-papers-peoplesoft.html Results 10/13/2009.

    Monday Oct 12, 2009

    MCAE MSC/NASTRAN faster on Sun F5100 and Fire X4270

    Significance of Results

    The Sun Storage F5100 flash array can double performance over internal hard disk drives, as shown by the I/O-intensive MSC/Nastran MCAE application MDR3 benchmark tests on a Sun Fire X4270 server.

    The MD Nastran MDR3 benchmarks were run on a single Sun Fire X4270 server. The I/O-intensive test cases were run in SMP mode at different core counts, from one up to the maximum of 8 available cores.

    The MSC/Nastran MD 2008 R3 module is an MCAE application based on the finite element method (FEM) of analysis. This computer-based numerical method inherently involves a substantial I/O component. The purpose was to evaluate the performance of the Sun Storage F5100 flash array relative to high-performance 15K RPM internal striped HDDs.

    • The Sun Storage F5100 flash array outperformed the high performance 15K RPM SAS drives on the "xx0cmd2" test case by 107% in the 8-core server configuration.

    • The Sun Storage F5100 flash array outperformed the high performance 15K RPM SAS drives on the "xl0tdf1" test case by 85% in the 8-core server configuration.

    The MD Nastran MDR3 test suite was designed to include some very I/O-intensive test cases, albeit some are not very scalable. These are the cases called "xx0wmd0" and "xx0xst0"; both were run, and results are presented, using a single-core server configuration.

    • The Sun Storage F5100 flash array outperformed the high performance 15K RPM SAS drives on the "xx0xst0" test case by 33% in the single-core server configuration.

    • The Sun Storage F5100 flash array outperformed the high performance 15K RPM SAS drives on the "xx0wmd0" test case by 20% in the single-core server configuration.

    Performance Landscape

    MD Nastran MDR3 Benchmark Tests

    Results in seconds

    Test Case  DMP  4x 15K RPM 72 GB SAS HDD  Sun F5100      Sun F5100
                    striped HW RAID0          r/w buff 4096  Performance
                                              striped        Advantage
    xx0cmd2    8    959                       463            107%
    xl0tdf1    8    1104                      596            85%
    xx0xst0    1    1307                      980            33%
    xx0wmd0    1    20250                     16806          20%

    Results and Configuration Summary

    Hardware Configuration:
      Sun Fire X4270
        2 x 2.93 GHz QC Intel Xeon X5570 processors
        24 GB memory
        4 x 72 GB 15K RPM striped (HW RAID0) SAS disks
      Sun Storage F5100 Flash Array
        20 x 24 GB flash modules
        Intel controller

    Software Configuration:

      O/S: 64-bit SUSE Linux Enterprise Server 10 SP 2
      Application: MSC/NASTRAN MD 2008 R3
      Benchmark: MDR3 Benchmark Test Suite
      HP MPI: 02.03.00.00 [7585] Linux x86-64

    Benchmark Description

    The benchmark tests are representative of typical MSC/Nastran applications including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

    The MD (Multi Discipline) Nastran 2008 application performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior and/or deformations are concerned. The new release includes the MARC module for general purpose nonlinear analyses and the Dytran module that employs an explicit solver to analyze crash and high velocity impact conditions.

    Please go here for a more complete description of the tests.

    Key Points and Best Practices

    • Based on the maximum physical memory on a platform, the user can stipulate the maximum portion of this memory that can be allocated to the Nastran job. This is done on the command line with the mem= option. On Linux-based systems, where the platform has a large amount of memory and where the model does not have large scratch I/O requirements, the memory can be allocated to a tmpfs scratch space file system (see the sketch after this list). On Solaris x64 systems, advantage can be taken of ZFS for higher I/O performance.

    • The MD Nastran MDR3 test cases don't scale very well; a few don't scale at all, and the rest scale on up to 8 cores at best.

    • The test cases for the MSC/Nastran module all have a substantial I/O component, where 15% to 25% of the total run times are associated with I/O activity (primarily scratch files). The required scratch file size ranges from less than 1 GB up to about 140 GB. Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high-performance disk storage system, and can be further enhanced by implementing a Lustre-based I/O system. High-performance interconnects such as InfiniBand, for inter-node cluster message passing as well as I/O transfer from the storage system, can also enhance performance substantially.
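
    A minimal sketch of the tmpfs scratch-space approach on a Linux system follows. The mount point, size, and job name are hypothetical, and the mem= and sdirectory= keywords follow common MSC Nastran submittal usage; check your site's wrapper script:

      # create and mount a RAM-backed scratch file system (size hypothetical)
      mkdir -p /scratch
      mount -t tmpfs -o size=20g tmpfs /scratch
      # submit the job, capping Nastran memory use and pointing scratch at tmpfs
      nastran xx0cmd2 mem=16gb sdirectory=/scratch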

    See Also

    Disclosure Statement

    MSC.Software is a registered trademark of MSC. All information on the MSC.Software website is copyrighted. MD Nastran MDR3 results from http://www.mscsoftware.com and this report as of October 12, 2009.

    Sunday Oct 11, 2009

    1.6 Million 4K IOPS in 1RU on Sun Storage F5100 Flash Array

    The Sun Storage F5100 Flash Array is a high performance high density solid state flash array delivering over 1.6M IOPS (4K IO) and 12.8GB/sec throughput (1M reads). The Flash Array is designed to accelerate IO-intensive applications, such as databases, at a fraction of the power, space, and cost of traditional hard disk drives. It is based on enterprise-class SLC flash technology, with advanced wear-leveling, integrated backup protection, solid state robustness, and 3M hours MTBF reliability.

    • The Sun Storage F5100 Flash Array demonstrates breakthrough performance of 1.6M IOPS for 4K random reads
    • The Sun Storage F5100 Flash Array can also perform 1.2M IOPS for 4K random writes
    • The Sun Storage F5100 Flash Array has unprecedented throughput of 12.8 GB/sec.

    Performance Landscape

    Results were obtained using four hosts.

    Bandwidth and IOPS Measurements

    Test                                     80 FMODs     40 FMODs    20 FMODs    1 FMOD
    Random 4K Read                           1,591K IOPS  796K IOPS   397K IOPS   21K IOPS
    Maximum Delivered Random 4K Write        1,217K IOPS  610K IOPS   304K IOPS   15K IOPS
    Maximum Delivered 50-50 4K Read/Write    850K IOPS    426K IOPS   213K IOPS   11K IOPS
    Sequential Read (1M)                     12.8 GB/sec  6.4 GB/sec  3.2 GB/sec  265 MB/sec
    Maximum Delivered Sequential Write (1M)  9.7 GB/sec   4.8 GB/sec  2.4 GB/sec  118 MB/sec

    Sustained Random 4K Write (*)            172K IOPS    -           -           9K IOPS

    (*) Maximum Delivered values measured over a 1 minute period. Sustained write performance was measured over a 1 hour period and differs from maximum delivered performance. Over time, wear-leveling and erase operations are required and impact write performance levels.

    Latency Measurements

    The Sun Storage F5100 flash array is tuned for 4 KB or larger IOs; the write service time for IOs smaller than 4 KB can be 10 times higher than shown in the table below. Note also that the service times shown below include both the latency and the time to transfer the data; the transfer time becomes the dominant portion of the service time for IOs over 64 KB in size.

    Transfer Size  Read (ms)  Write (ms)
    4 KB           0.41       0.28
    8 KB           0.42       0.35
    16 KB          0.45       0.72
    32 KB          0.51       0.77
    64 KB          0.63       1.52
    128 KB         0.87       2.99
    256 KB         1.34       6.03
    512 KB         2.29       12.14
    1024 KB        4.19       23.79

    - Latencies are application-level latencies measured with the Vdbench tool.
    - Please note that the F5100 Flash Array is a 4 KB sector device. IOs smaller than 4 KB, or not aligned on 4 KB boundaries, can result in significant performance degradation on write operations.

    Results and Configuration Summary

    Storage:

      Sun Storage F5100 Flash Array
        80 Flash Modules
        16 ports
        4 domains (20 Flash Modules per domain)
        CAM zoning - 5 Flash Modules per port

    Servers:

      4 x Sun SPARC Enterprise T5240
      4 HBAs per server (16 total), firmware version 01.27.03.00-IT

    Software:

      OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
      Vdbench 5.0
      Required Flash Array Patches SPARC, ses/sgen patch 138128-01 or later & mpt patch 141736-05
      Required Flash Array Patches x86, ses/sgen patch 138129-01 or later & mpt patch 141737-05

    Benchmark Description

    Sun measured a wide variety of IO performance metrics on the Sun Storage F5100 Flash Array using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

    The Vdbench profile parmfile.txt was used for these measurements.

    Vdbench is publicly available for download at: http://vdbench.org

    Key Points and Best Practices

    • Drive each flash module with 32 outstanding IOs, as in the benchmark profile referenced above.
    • LSI HBA firmware level should be at Phase 15 maxq.
    • For LSI HBAs, either use single-port HBAs or only 1 port per HBA.
    • SPARC platforms align with the 4K boundary size set by the Flash Array. x86/Windows platforms don't necessarily have this alignment built in and can show lower performance.

    See Also

    Disclosure Statement

    Sun Storage F5100 Flash Array delivered 1.6M 4K read IOPS and 12.8 GB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of September 12, 2009.

    TPC-C World Record Sun - Oracle

    TPC-C Sun SPARC Enterprise T5440 with Oracle RAC World Record Database Result

    Sun and Oracle demonstrate the world's fastest database performance. Sun Microsystems, using 12 Sun SPARC Enterprise T5440 servers, 60 Sun Storage F5100 flash arrays, and Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, delivered a world-record TPC-C benchmark result.

    • The 12-node Sun SPARC Enterprise T5440 server cluster delivered a world record TPC-C benchmark result of 7,646,486.7 tpmC at $2.36/tpmC (USD) using Oracle 11g R1, on a configuration available 3/19/10.

    • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the IBM Power 595 (5GHz) with IBM DB2 9.5 database by 26% and has 16% better price/performance on the TPC-C benchmark.

    • The complete Oracle/Sun solution delivered 10.7x better computational density than the IBM configuration (computational density = performance/rack).

    • The complete Oracle/Sun solution used 8 times fewer racks than the IBM configuration.

    • The complete Oracle/Sun solution has 5.9x better power/performance than the IBM configuration.

    • The 12-node Sun SPARC Enterprise T5440 server cluster beats the performance of the HP Superdome (1.6GHz Itanium2) by 87% and has 19% better price/performance on the TPC-C benchmark.

    • The Oracle/Sun solution utilized Sun FlashFire technology to deliver this result. The Sun Storage F5100 flash array was used for database storage.

    • Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning scales and effectively uses all of the nodes in this configuration to produce the world record performance.

    • This result showed Sun and Oracle's integrated hardware and software stacks provide industry-leading performance.

    More information on this benchmark will be posted in the next several days.

    Performance Landscape

    TPC-C results (sorted by tpmC, bigger is better)


    System                           tpmC       Price/tpmC  Avail     Database        Cluster  Racks  w/KtpmC
    12 x Sun SPARC Enterprise T5440  7,646,487  2.36 USD    03/19/10  Oracle 11g RAC  Y        9      9.6
    IBM Power 595                    6,085,166  2.81 USD    12/10/08  IBM DB2 9.5     N        76     56.4
    Bull Escala PL6460R              6,085,166  2.81 USD    12/15/08  IBM DB2 9.5     N        71     56.4
    HP Integrity Superdome           4,092,799  2.93 USD    08/06/07  Oracle 10g R2   N        46     to be added

    Avail - Availability date
    w/KtpmC - Watts per 1000 tpmC
    Racks - clients, servers, storage, infrastructure

    Results and Configuration Summary

    Hardware Configuration:

      9 racks used to hold

      Servers:
        12 x Sun SPARC Enterprise T5440, each with
        4 x 1.6 GHz UltraSPARC T2 Plus
        512 GB memory
        10 GbE network for cluster
      Storage:
        60 x Sun Storage F5100 Flash Array
        61 x Sun Fire X4275, Comstar SAS target emulation
        24 x Sun StorageTek 6140 (16 x 300 GB SAS 15K RPM)
        6 x Sun Storage J4400
        3 x 80-port Brocade FC switches
      Clients:
        24 x Sun Fire X4170, each with
        2 x 2.53 GHz X5540
        48 GB memory

    Software Configuration:

      Solaris 10 10/09
      OpenSolaris 6/09 (COMSTAR) for Sun Fire X4275
      Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning
      Tuxedo CFS-R Tier 1
      Sun Web Server 7.0 Update 5

    Benchmark Description

    TPC-C is an OLTP system benchmark. It simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses.

    POSTSCRIPT: Here are some comments on IBM's grasping-at-straws performance-per-core attacks on the TPC-C result:
    c0t0d0s0 blog: "IBM's Reaction to Sun & Oracle TPC-C"

    See Also

    Disclosure Statement

    TPC Benchmark C, tpmC, and TPC-C are trademarks of the Transaction Performance Processing Council (TPC). 12-node Sun SPARC Enterprise T5440 Cluster (1.6GHz UltraSPARC T2 Plus, 4 processor) with Oracle 11g Enterprise Edition with Real Application Clusters and Partitioning, 7,646,486.7 tpmC, $2.36/tpmC. Available 3/19/10. IBM Power 595 (5GHz Power6, 32 chips, 64 cores, 128 threads) with IBM DB2 9.5, 6,085,166 tpmC, $2.81/tpmC, available 12/10/08. HP Integrity Superdome(1.6GHz Itanium2, 64 processors, 128 cores, 256 threads) with Oracle 10g Enterprise Edition, 4,092,799 tpmC, $2.93/tpmC. Available 8/06/07. Source: www.tpc.org, results as of 10/11/09.

    Thursday Jun 25, 2009

    Sun SSD Server Platform Bandwidth and IOPS (Speeds & Feeds)

    The Sun SSD (32 GB SATA 2.5" SSD) is the world's first enterprise-quality, open-standard Flash design. Built to an industry-standard JEDEC form factor, the module is being made available to developers and the OpenSolaris Storage community to foster Flash innovation. The Sun SSD delivers unprecedented IO performance, saves on power, space, and cooling, and will enable new levels of server optimization and datacenter efficiencies.

    • The Sun SSD demonstrated performance of 98K 4K random read IOPS on a Sun Fire X4450 server running the Solaris operating system.

    Performance Landscape

    Solaris 10 Results

    Test                   Sun Fire X4450  Sun SPARC Enterprise T5240
    Random Read (4K)       98.4K IOPS      71.5K IOPS
    Random Write (4K)      31.8K IOPS      14.4K IOPS
    50-50 Read/Write (4K)  14.9K IOPS      15.7K IOPS
    Sequential Read        764 MB/sec      1012 MB/sec
    Sequential Write       376 MB/sec      531 MB/sec

    Results and Configuration Summary

    Storage:

      4 x Sun SSD
      32 GB SATA 2.5" SSD (24 GB usable)
      2.5in drive form factor

    Servers:

      Sun SPARC Enterprise T5240 - 4 internal drive slots used (LSI driver)
      Sun Fire X4450 - 4 internal drive slots used (LSI driver)

    Software:

      OpenSolaris 2009.06 or Solaris 10 10/09 (MPT driver enhancements)
      Vdbench 5.0

    Benchmark Description

    Sun measured a wide variety of IO performance metrics on the Sun SSD using Vdbench 5.0 measuring 100% Random Read, 100% Random Write, 100% Sequential Read, 100% Sequential Write, and 50-50 read/write. This demonstrates the maximum performance and throughput of the storage system.

    Vdbench profile:

      # workload definitions: wm = random write, ws = sequential write,
      # rm = random read, rs = sequential read, rwm = 50/50 random read/write
      wd=wm_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=100
      wd=ws_80dr,sd=sd*,readpct=0,rhpct=0,seekpct=0
      wd=rm_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=100
      wd=rs_80dr,sd=(sd1-sd80),readpct=100,rhpct=0,seekpct=0
      wd=rwm_80dr,sd=sd*,readpct=50,rhpct=0,seekpct=100
      rd=default
      ### Random read and write tests, 4K transfer size, 32 threads
      rd=default,el=30m,in=6,forx=(4K),forth=(32),io=max,pause=20
      rd=run1_rm_80dr,wd=rm_80dr
      rd=run2_wm_80dr,wd=wm_80dr
      rd=run3_rwm_80dr,wd=rwm_80dr
      ### Sequential read and write tests, 512K transfer size, 32 threads
      rd=default,el=30m,in=6,forx=(512k),forth=(32),io=max,pause=20
      rd=run4_rs_80dr,wd=rs_80dr
      rd=run5_ws_80dr,wd=ws_80dr

    Vdbench is publicly available for download at: http://vdbench.org

    Key Points and Best Practices

    • All measurements were done with the internal HBA and not the internal RAID.

    See Also

    Disclosure Statement

    Sun SSD delivered 71.5K 4K read IOPS and 1012 MB/sec sequential read. Vdbench 5.0 (http://vdbench.org) was used for the test. Results as of June 17, 2009.

    Friday Jun 19, 2009

    SSDs in HPC: Reducing the I/O Bottleneck BluePrint Best Practices

    The performance of High-Performance Computing (HPC) applications can be dramatically increased by simply using SSDs instead of traditional hard drives. To read about these findings, see the Sun BluePrint by Larry McIntosh and Michael Burke, called "Solid State Drives in HPC: Reducing the I/O Bottleneck".

    There was a BestPerf blog posting on the NASTRAN/SSD results at:
    http://blogs.sun.com/BestPerf/entry/sun_fire_x2270_msc_nastran

    Our BestPerf authors will blog about more of their recent benchmarks in the coming weeks.

    Tuesday Jun 16, 2009

    Sun Fire X2270 MSC/Nastran Vendor_2008 Benchmarks

    Significance of Results

    The I/O intensive MSC/Nastran Vendor_2008 benchmark test suite was used to compare the performance on a Sun Fire X2270 server when using SSDs internally instead of HDDs.

    The effect on performance of increasing memory to augment I/O caching was also examined. The Sun Fire X2270 server was equipped with Intel QC Xeon X5570 (Nehalem) processors. The positive effect of adding memory to increase I/O caching is offset to some degree by the reduction in memory frequency when additional DIMMs populate the bays of each memory channel on each CPU socket of these Nehalem processors.

    • SSDs can significantly improve NASTRAN performance, especially on runs with larger core counts.
    • Additional memory in the server can also increase performance; however, in some systems additional memory can decrease the memory frequency, which may offset the benefits of the increased capacity.
    • If SSDs are not used, striped disks will often improve the performance of I/O-bound MCAE applications.
    • To obtain the highest performance, it is recommended that SSDs be used and that servers be configured with the largest memory possible without decreasing memory frequency. One should always look at the workload characteristics and compare against this benchmark to correctly set expectations.

    SSD vs. HDD Performance

    The performance of two striped 30GB SSDs was compared to two striped 7200 rpm 500GB SATA drives on a Sun Fire X2270 server.

    • At the 8-core level (the maximum for a single node), SSDs were 2.2x faster for the larger xx0cmd2 and the smaller xl0tdf1 cases.
    • For 1-core results SSDs are up to 3% faster.
    • On the smaller mdomdf1 test case there was no increase in performance on the 1-, 2-, and 4-cores configurations.

    Performance Enhancement with I/O Memory Caching

    Performance for Nastran can often be increased by adding memory to provide additional in-core space to cache I/O and thereby reduce the I/O demands.

    The main memory was doubled from 24GB to 48GB. At the 24GB level one 4GB DIMM was placed in the first bay of each of the 3 CPU memory channels on each of the two CPU sockets on the Sun Fire X2270 platform. This configuration allows a memory frequency of 1333MHz.

    At the 48GB level a second 4GB DIMM was placed in the second bay of each of the 3 CPU memory channels on each socket. This reduces the memory frequency to 1066MHz.

    Adding Memory With HDDs (SATA)

    • The additional server memory increased performance when running with the slower SATA drives at the higher core levels (e.g., 4 and 8 cores on a single node).
    • The larger xx0cmd2 case was 42% faster and the smaller xl0tdf1 case was 32% faster at the maximum 8-core level on a single system.
    • The special I/O intensive getrag case was 8% faster at the 1-core level.

    Adding Memory With SSDs

    • At the maximum 8-core level (for a single node), the larger xx0cmd2 case was 47% faster in overall run time.
    • The effects were much smaller at lower core counts; in the tests at the 1-core level, most test cases ran 5% to 14% slower, with the slower CPU memory frequency dominating over the added in-core space available for I/O caching versus direct transfer to SSD.
    • Only the special I/O-intensive getrag case was an exception, running 6% faster at the 1-core level.

    Increasing Performance with Two Striped (SATA) Drives

    The performance of multiple striped drives was also compared to that of a single drive. The study compared two striped internal 7200 rpm 500GB SATA drives to a single internal SATA drive.

    • On a single node with 8 cores, the largest test xx0cmd2 was 40% faster, a smaller test case xl0tdf1 was 33% faster and even the smallest test case mdomdf1 case was 12% faster.

    • On 1-core the added boost in performance with striped disks was from 4% to 13% on the various test cases.

    • On 1-core, the special I/O-intensive test case getrag was 29% faster.

    Performance Landscape

    MSC/Nastran Vendor_2008 Benchmark Test Suite
    Sun Fire X2270, 2 x X5570 QC 2.93 GHz; times in the table are elapsed time (sec)

                         -------- 2 x 7200 RPM SATA HDDs --------             -------- 2 x SSDs --------
    Test         Cores   48 GB    24 GB    24 GB    Ratio      Ratio          48 GB    24 GB    Ratio   Ratio (24GB)
                         1067MHz  2xSATA   1xSATA   (2xSATA)   2xSATA/        1067MHz  1333MHz  48GB/   2xSATA/
                                  1333MHz  1333MHz  48GB/24GB  1xSATA                           24GB    2xSSD

    vlosst1      1       133      127      134      1.05       0.95           133      126      1.05    1.01

    xx0cmd2      1       946      895      978      1.06       0.87           947      884      1.07    1.01
                 2       622      614      703      1.01       0.87           600      583      1.03    1.05
                 4       466      631      991      0.74       0.64           426      404      1.05    1.56
                 8       1049     1554     2590     0.68       0.60           381      711      0.53    2.18

    xl0tdf1      1       2226     2000     2081     1.11       0.96           2214     1939     1.14    1.03
                 2       1307     1240     1308     1.05       0.95           1315     1189     1.10    1.04
                 4       858      833      1030     1.03       0.81           744      751      0.99    1.11
                 8       912      1562     2336     0.58       0.67           674      712      0.95    2.19

    xl0imf1      1       1216     1151     1236     1.06       0.93           1228     1290     0.95    0.89

    mdomdf1      1       987      913      983      1.08       0.93           987      911      1.08    1.00
                 2       524      485      520      1.08       0.93           524      484      1.08    1.00
                 4       270      237      269      1.14       0.88           270      250      1.08    0.95

    Sol400_1     1       2555     2479     2674     1.03       0.93           2549     2402     1.06    1.03
    (xl1fn40_1)

    Sol400_S     1       2450     2302     2481     1.06       0.93           2449     2262     1.08    1.02
    (xl1fn40_S)

    getrag       1       778      843      1178     0.92       0.71           771      817      0.94    1.03
    (xx0xst0)

    Results and Configuration Summary

    Hardware Configuration:
      Sun Fire X2270
        1 2-socket rack mounted server
        2 x 2.93 GHz QC Intel Xeon X5570 processors
        2 x internal striped SSDs
        2 x internal striped 7200 rpm 500GB SATA drives

    Software Configuration:

      O/S: Linux 64-bit SUSE SLES 10 SP 2
      Application: MSC/NASTRAN MD 2008
      Benchmark: MSC/NASTRAN Vendor_2008 Benchmark Test Suite
      HP MPI: 02.03.00.00 [7585] Linux x86-64
      Voltaire OFED-5.1.3.1_5 GridStack for SLES 10

    Benchmark Description

    The benchmark tests are representative of typical MSC/Nastran applications including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

    The MD (Multi Discipline) Nastran 2008 application performs both structural (stress) analysis and thermal analysis. These analyses may be either static or transient dynamic and can be linear or nonlinear as far as material behavior and/or deformations are concerned. The new release includes the MARC module for general purpose nonlinear analyses and the Dytran module that employs an explicit solver to analyze crash and high velocity impact conditions.

    • As of Summer 2008, there is an official Solaris x64 version of the MD Nastran 2008 system that is certified and maintained.
    • The memory requirements for the test cases in the new MSC/Nastran Vendor 2008 benchmark test suite range from a few hundred megabytes to no more than 5 GB.

    Please go here for a more complete description of the tests.

    Key Points and Best Practices

    For more on Best Practices of SSD on HPC applications also see the Sun Blueprint:
    http://wikis.sun.com/display/BluePrints/Solid+State+Drives+in+HPC+-+Reducing+the+IO+Bottleneck

    Additional information on the MSC/Nastran Vendor 2008 benchmark test suite.

    • Based on the maximum physical memory on a platform, the user can stipulate the maximum portion of this memory that can be allocated to the Nastran job. This is done on the command line with the mem= option. On Linux-based systems, where the platform has a large amount of memory and where the model does not have large scratch I/O requirements, the memory can be allocated to a tmpfs scratch space file system. On Solaris x64 systems, advantage can be taken of ZFS for higher I/O performance.

    • The MSC/Nastran Vendor 2008 test cases don't scale very well; a few don't scale at all, and the rest scale on up to 8 cores at best.

    • The test cases for the MSC/Nastran module all have a substantial I/O component, where 15% to 25% of the total run times are associated with I/O activity (primarily scratch files). The required scratch file size ranges from less than 1 GB up to about 140 GB. Performance will be enhanced by using the fastest available drives and striping more than one of them together, or by using a high-performance disk storage system, and can be further enhanced by implementing a Lustre-based I/O system. High-performance interconnects such as InfiniBand, for inter-node cluster message passing as well as I/O transfer from the storage system, can also enhance performance substantially.

    See Also

    Disclosure Statement

    MSC.Software is a registered trademark of MSC. All information on the MSC.Software website is copyrighted. MSC/Nastran Vendor 2008 results from http://www.mscsoftware.com and this report as of June 9, 2009.

    About

    BestPerf is the source of Oracle performance expertise. In this blog, Oracle's Strategic Applications Engineering group explores Oracle's performance results and shares best practices learned from working on Enterprise-wide Applications.
