Tuesday Jun 29, 2010

Identifying Ideal Oracle Database Objects for Flash Storage and Accelerators

The Sun Storage F5100 Flash Array and Sun Flash Accelerator F20 PCIe Card help accelerate I/O-bound applications such as databases. The following guidelines help identify Oracle database objects that can benefit from flash storage. Even though the title explicitly states "Oracle", some of these guidelines also apply to other databases and to non-database products. Exercise discretion: evaluate and experiment before implementing these recommendations as they are.

  • Heavily used database tables and indexes are ideal for flash storage

    • - Database workloads with no I/O bottlenecks may not show significant performance gains
    • - Database workloads with severe I/O bottlenecks can fully realize the benefits of flash devices

      • The Top 5 Timed Foreground Events section of any AWR report collected on the target database system helps determine whether disk I/O is a bottleneck

        • A large number of waits, together with a large share of DB time spent waiting under the User I/O wait class, indicates I/O contention on the system
  • Identify the I/O intensive tables and indexes in a database with the help of Oracle Enterprise Manager Database Control, a web-based tool for managing Oracle database(s)

    • - The "Performance" page in OEM Database Control helps you quickly identify and analyze performance problems
    • - Both historical and real-time database activity can be viewed from the "Performance" page
      • The same page also provides information about the top resource-consuming database objects
  • An alternative way to identify the I/O intensive objects in a database is to analyze AWR reports generated over a period of time, especially while the database is busy

    • - Scan through the SQL ordered by .. tables in each AWR report
    • - Look for the top INSERT and UPDATE statements with the highest elapsed and DB times
      • Database tables that are updated frequently and repeatedly, along with the indexes created on those tables, are good candidates for the flash devices

    • - SQL ordered by Reads is useful in identifying the database tables with a large number of physical reads
      • Database tables from which large amounts of data are read from physical disks are also good candidates for the flash devices

        • To identify I/O intensive indexes, look through the explain plans of the top SQL statements sorted by physical reads

  • Examine the File IO Stats section in any AWR report that was collected on the target database system

    • - Consider moving the database files with heavy reads, writes and relatively high average buffer wait time to flash volumes
  • Examine the Segments by Physical Reads, Segments by Physical Writes and Segments by Buffer Busy Waits sections of the AWR report

    • - Database tables and indexes with a large number of physical reads, physical writes and buffer busy waits may benefit from flash acceleration
  • Sun flash storage may not be ideal for storing Oracle redo logs

    • - Sun Flash Modules (FMODs) in the F5100 array and the F20 Flash Accelerator Card are optimized for a 4K sector size

      • A redo log write that is not aligned with the beginning of a 4K physical sector results in significant performance degradation

    • - In general, Oracle redo log files default to a block size that is equal to the physical sector size of the disk, which is typically 512 bytes

      • The majority of recent Oracle Database platforms detect the 4K sector size on Sun flash devices
      • Oracle database automatically creates redo log files with a 4K block size on file systems created on Sun flash devices
        • However, with a 4K block size for the redo logs, redo wastage increases significantly, which may offset the expected performance gains
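
The redo wastage trade-off mentioned in the last bullet can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a simplified model in which every redo log flush is padded out to the next redo block boundary. The flush sizes are made-up illustration values, not measurements, and `padding_bytes` and `total_wastage` are hypothetical helper names.

```python
def padding_bytes(write_size: int, block_size: int) -> int:
    """Bytes of padding needed to round a write up to a whole number of blocks."""
    return (-write_size) % block_size


def total_wastage(write_sizes, block_size: int) -> int:
    """Total padding across all flushes for a given redo block size."""
    return sum(padding_bytes(size, block_size) for size in write_sizes)


# Hypothetical redo flush sizes in bytes (illustration only, not measured data).
flushes = [300, 700, 1500, 5000]

for block_size in (512, 4096):
    wasted = total_wastage(flushes, block_size)
    print(f"{block_size:>4}-byte redo blocks: {wasted} bytes of padding")
```

Under this model the same small flushes leave far more unused space in 4K blocks than in 512-byte blocks, which is why a commit-heavy workload with many small redo writes is the case to test before placing redo logs on flash.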

F5100 Flash Storage and F20 PCIe Flash Accelerator Card as Oracle Database Smart Flash Cache

In addition to placing I/O intensive database objects on flash, customers running Oracle 11g Release 2 or later can use flash devices to turn on the "Database Smart Flash Cache" feature and reduce physical disk I/O. The Database Smart Flash Cache is a transparent extension of the database buffer cache using flash storage technology. The flash storage acts as a Level 2 cache to the (Level 1) SGA. Database Smart Flash Cache can significantly improve the performance of Oracle databases by reducing the amount of disk I/O at a much lower cost than adding an equivalent amount of RAM.

F20 Flash Accelerator offers an additional benefit - since it is a PCIe card, the I/O operations bypass disk controller overhead.

The database flash cache can be enabled by setting appropriate values for the following Oracle database parameters.


	db_flash_cache_file
	db_flash_cache_size

See the "Configuring Database Smart Flash Cache" section of the Oracle Database Administrator's Guide 11g Release 2 (11.2) for step-by-step instructions on configuring Database Smart Flash Cache on flash devices.
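
As a rough illustration of setting these two parameters, the sketch below renders the corresponding ALTER SYSTEM statements as text. It assumes the values are written to the server parameter file (SCOPE=SPFILE) and picked up at the next instance restart. The device path and cache size are hypothetical placeholders, and `flash_cache_statements` is an illustrative helper, not part of any Oracle tool. The Administrator's Guide remains the authoritative procedure.

```python
def flash_cache_statements(cache_file: str, cache_size: str) -> list:
    """Render ALTER SYSTEM statements for the two flash cache parameters."""
    return [
        f"ALTER SYSTEM SET db_flash_cache_file = '{cache_file}' SCOPE=SPFILE;",
        f"ALTER SYSTEM SET db_flash_cache_size = {cache_size} SCOPE=SPFILE;",
    ]


# Hypothetical flash device path and cache size, for illustration only.
for statement in flash_cache_statements("/dev/rdsk/FLASH_DEVICE", "32G"):
    print(statement)
```

Generating the statements rather than typing them keeps the file name and size in one place, which helps when the same settings are applied to several test instances.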

Wednesday Jan 20, 2010

PeopleSoft NA Payroll 240K EE Benchmark with 16 Job Streams : Another Home Run for Sun

Poor Steve A.[1] ... This entry is not about Steve A. though. It is about the new PeopleSoft NA Payroll benchmark result that Sun published today.

First things first. Here is the direct URL to our latest benchmark results:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (16 job streams[2] -- referred to simply as 'streams' from here on)

The summary of the benchmark test results is shown below for the 16 stream benchmarks only. These numbers were extracted from the very first page of the benchmark results white papers, where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to customers. The results in the following table are sorted by hourly throughput (payments/hour) in descending order. The goal is to achieve as much hourly throughput as possible. Click on the link underneath each hourly throughput value to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000

Vendor | OS | Hardware Config | #Job Streams | Elapsed Time (min) | Hourly Throughput (Payments per Hour)
Sun | Solaris 10 5/09 | 1 x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory; 1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes; 1 x Sun Storage J4200 Array for redo logs | 16 | 43.78 | 493,376
HP | HP-UX | 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory; 1 x HP StorageWorks EVA 8100 | 16 | 68.07 | 317,320

This is all public information. Feel free to compare the hardware configurations and the data presented in both rows and draw your own conclusions. Since both Sun and HP used the same benchmark toolkit and workload, and ran the benchmark with the same number of job streams, the comparison should be pretty straightforward.

If you want to compare the 8 stream results, check the other blog entry: PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise. Sun used the same hardware to run both benchmark tests, with 8 and 16 streams respectively. We could have gotten away with 20+ Flash Modules (FMODs), but we wanted to keep the benchmark environment consistent with our prior benchmark effort on the same workload with 8 job streams. Because the hardware setup is identical, we can now easily demonstrate the advantage of parallelism (simply by comparing the test results from the 8 and 16 stream benchmarks) and how resilient and scalable the F5100 Flash Array is.

Our benchmarks showed an improvement of ~55% in overall throughput when the number of job streams was increased from 8 to 16. Our 16 stream results also showed a ~55% improvement in overall throughput over HP's published results with the same number of streams, at a maximum average CPU utilization of 45% compared to HP's 89%. The half-populated Sun Storage F5100 Flash Array played the key role in both benchmark efforts by demonstrating superior I/O performance over traditional disk-based arrays.
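
The throughput figures and both ~55% claims can be cross-checked from the published elapsed times. This is a minimal sketch, assuming hourly throughput is simply the 360,000 payments divided by the elapsed time in hours. The 67.85 minute elapsed time is Sun's published 8 stream result, and the function names are illustrative.

```python
PAYMENTS = 360_000  # number of payments in the 240K employee benchmark


def payments_per_hour(elapsed_minutes: float) -> int:
    """Hourly throughput for a run that processed all payments in the given time."""
    return round(PAYMENTS * 60 / elapsed_minutes)


def improvement_pct(new: float, old: float) -> float:
    """Relative throughput improvement of `new` over `old`, in percent."""
    return (new / old - 1) * 100


sun_16 = payments_per_hour(43.78)  # Sun, 16 streams
hp_16 = payments_per_hour(68.07)   # HP, 16 streams
sun_8 = payments_per_hour(67.85)   # Sun, 8 streams

print(sun_16, hp_16, sun_8)
print(f"Sun 16 vs Sun 8 streams: {improvement_pct(sun_16, sun_8):.1f}%")
print(f"Sun 16 vs HP 16 streams: {improvement_pct(sun_16, hp_16):.1f}%")
```

The computed values match the published throughput numbers, and both ratios come out at roughly 55%.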

Before concluding, I would like to highlight a few known facts (just for the benefit of those people who may fall for the PR trickery):

  1. 8 job streams != 16 job streams. In other words, the results of an 8 stream run are not comparable to those of a 16 stream run.
  2. The throughput should go up with an increased number of job streams [ only up to some extent -- do not forget that there is a saturation point for everything ]. For example, the throughput with 16 streams might be higher than the 8 stream throughput.
  3. The Law of Diminishing Returns applies to the software world too, not just to economics. So there is no guarantee that the throughput will be much better with 24 or 32 job streams.

Other blog posts and documents of interest:

  1. Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card
  2. PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (8 streams benchmark white paper)
  3. PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise
  4. App benchmarks, incorrect conclusions and the Sun Storage F5100
  5. Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance

Notes:

[1] Steve A. tried his best to make everyone else believe that HP's 16 job stream NA Payroll 240K EE benchmark results are on par with Sun's 8 stream benchmark results. Apparently Steve A. failed and gave up after we showed the world a few screenshots from a benchmark that was published and eventually withdrawn [ by HP ]. You can read all his arguments, comparisons, etc. in the comments section of my other blog entry, PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise, as well as in Joerg Moellenkamp's blog entries on the same topic.

[2] In PeopleSoft terminology, a job stream is something that is equivalent to a thread.

Tuesday Nov 10, 2009

PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise

During the "Sun day" keynote at OOW 09, John Fowler stated that we are #1 in PeopleSoft North American Payroll performance. Later, Vince Carbone from our Performance Technologies group compared our benchmark numbers with HP's and IBM's on the BestPerf group blog, in Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance. Meanwhile, Joerg Moellenkamp had been clarifying a few things on his blog, in App benchmarks, incorrect conclusions and the Sun Storage F5100. Interestingly, all of this happened while we had no concrete evidence in hand to show the outside world. We got our benchmark results validated right before Oracle OpenWorld, which allowed us to speak about them publicly [ and we used that to the extent we could ]. However, the Oracle folks were busy with their scheduled tasks for OOW 09 and could not work on the benchmark results white paper until now. Finally, the white paper with the NA Payroll benchmark results is available on the Oracle Applications benchmark web site. Here is the URL:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000

Once again the summary of results is shown below, but in a slightly different format. These numbers were extracted from the very first page of the benchmark results white papers, where PeopleSoft usually highlights the significance of the results and the actual numbers of interest. The results are sorted by hourly throughput (payments/hour) in descending order. The goal is to achieve as much hourly throughput as possible. Since the following table also contains one 16 stream result, exercise caution when comparing 8 stream results with 16 stream results. In general, 16 parallel job streams are expected to yield better throughput than 8 parallel job streams. Hence comparing a 16 stream number with an 8 stream number is not an exact apples-to-apples comparison. It is more like comparing an apple to another apple that is half its size. Click on the link underneath each hourly throughput value to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000

Vendor | OS | Hardware Config | #Job Streams | Elapsed Time (min) | Hourly Throughput (Payments per Hour)
Sun | Solaris 10 5/09 | 1 x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory; 1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes; 1 x Sun Storage J4200 Array for redo logs | 8 | 67.85 | 318,349
HP | HP-UX | 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory; 1 x HP StorageWorks EVA 8100 | 16 | 68.07 | 317,320
HP | HP-UX | 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory; 1 x HP StorageWorks EVA 8100 | 8 | 89.77 | 240,615*
IBM | z/OS | 1 x IBM zSeries 990 model 2084-B16 with 313 Feature with 6 x IBM z990 Gen1 processors (populated: 13, used: 6) and 32 GB memory; 1 x IBM TotalStorage DS8300 with dual 4-way processors | 8 | 91.7 | 235,551

This is all public information -- so feel free to draw your own conclusions. *At the time of writing, HP's 8 stream results had been pulled from the Oracle Applications benchmark web site for reasons I do not know. Hopefully they will show up again on the same web site soon. If they do not reappear even after a month, we can probably assume that the result was withdrawn.

As these benchmark results have already been discussed by different people in different blogs, I do not have much to add. The only thing I want to highlight is that this particular workload is moderately CPU intensive but very I/O bound. Hence the better the I/O sub-system, the better the performance. Vince provided insight on Why Sun Storage F5100 is a good option for this workload, while Jignesh Shah from our ISV-Engineering organization focused on the performance of this benchmark workload with the F20 PCIe Card.

Also, when dealing with NA Payroll, good out-of-the-box performance is very unlikely; a lot of database tuning is required as well. As the data sets are very large, we partitioned the data in some of the very hot objects, which showed a good improvement in query response times. So if you are a PeopleSoft customer running the Payroll application with millions of rows of non-partitioned data, consider partitioning the data. [Updated 11/30/09] Sun published a best practices blueprint document with a variety of tuning tips like these, in addition to the recommended practices for the F5100 Flash Array and the Flash Accelerator F20 PCIe Card. You can download the blueprint from the following location:

    Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card
