Monday Sep 24, 2012

E-Business Suite : Role of CHUNK_SIZE in Oracle Payroll

Different batch processes in Oracle Payroll flow have the ability to spawn multiple child processes (or threads) to complete the work in hand. The number of child processes to fork is controlled by the THREADS parameter in APPS.PAY_ACTION_PARAMETERS view.

THREADS parameter

The default value for THREADS parameter is 1, which is fine for a single-processor system but not optimal for the modern multi-core multi-processor systems. Setting the THREADS parameter to a value equal to or less than the total number of [virtual] processors available on the system may improve the performance of payroll processing. However on the down side, since multiple child processes operate against the same set of payroll tables in HR schema, database may experience undesired consequences such as buffer busy waits and index contention, which results in giving up some of the gains achieved by using multiple child processes/threads to process the work. Couple of other action parameters, CHUNK_SIZE and CHUNK_SHUFFLE, help alleviate the database contention.

eg.,

Set a value for THREADS parameter as shown below.

CONNECT APPS/APPS_PASSWORD

UPDATE PAY_ACTION_PARAMETERS
SET PARAMETER_VALUE = DESIRED_VALUE
WHERE PARAMETER_NAME = 'THREADS';

COMMIT;

(I am not aware of any maximum value for THREADS parameter)


CHUNK_SIZE parameter

The size of each commit unit for the batch process is controlled by the CHUNK_SIZE action parameter. In other words, chunking is the act of splitting the assignment actions into commit groups of desired size represented by the CHUNK_SIZE parameter. The default value is 20, and each thread processes one chunk at a time -- which means each child process inserts or processes 20 assignment actions at any time.

When multiple threads are configured, each thread picks up a chunk to process, completes the assignment actions and then picks up another chunk. This is repeated until all the chunks are exhausted.

It is possible to use different chunk sizes in different batch processes. During the initial phase of processing, CHUNK_SIZE number of assignment actions are inserted into relevant table(s). When multiple child processes are inserting data at the same time into the same set of tables, as explained earlier, database may experience contention. The default value of 20 is mostly optimal in such a case. Experiment with different values for the initial phase by +/-10 for CHUNK_SIZE parameter and observe the performance impact. A larger value may make sense during the main processing phase. Again experimentation is the key in finding the suitable value for your environment. Start with a large value such as 2000 for the chunk size, then increment or decrement the size by 500 at a time until an optimal value is found.

eg.,

Set a value for CHUNK_SIZE parameter as shown below.

CONNECT APPS/APPS_PASSWORD

UPDATE PAY_ACTION_PARAMETERS
SET PARAMETER_VALUE = DESIRED_VALUE
WHERE PARAMETER_NAME = 'CHUNK_SIZE';

COMMIT;

CHUNK_SIZE action parameter accepts a value that is as low as 1 or as high as 16000.


CHUNK SHUFFLE parameter

By default, chunks of assignment actions are processed sequentially by all threads - which may not be a good thing especially given that all child processes/threads performing similar actions against the same set of tables almost at the same time. By saying not a good thing, I mean to say that the default behavior leads to contention in the database (in data blocks, for example).

It is possible to relieve some of that database contention by randomizing the processing order of chunks of assignment actions. This behavior is controlled by the CHUNK SHUFFLE action parameter. Chunk processing is not randomized unless explicitly configured.

eg.,

Set chunk shuffling as shown below.

CONNECT APPS/APPS_PASSWORD

UPDATE PAY_ACTION_PARAMETERS
SET PARAMETER_VALUE = 'Y'
WHERE PARAMETER_NAME = 'CHUNK SHUFFLE';

COMMIT;

Finally I recommend checking the following document out for additional details and additional pay action tunable parameters that may speed up the processing of Oracle Payroll.
    My Oracle Support Doc ID: 226987.1 Oracle 11i & R12 Human Resources (HRMS) & Benefits (BEN) Tuning & System Health Checks

Also experiment with different combinations of parameters and values until the right set of action parameters and values are found for your deployment.

Wednesday Jun 16, 2010

PeopleSoft NA Payroll 500K EE Benchmark on Solaris : The Saga Continues ..

Few clarifications before we start.

Difference between 240K and 500K EE PeopleSoft NA Payroll benchmarks

Not too long ago Sun published PeopleSoft NA Payroll 240K EE benchmark results with 16 job streams and 8 job streams. First of all, I want to make sure everyone understands the fact that PeopleSoft NA Payroll 240K and 500K EE benchmarks are two completely different benchmarks. The 240K database model represents a large sized organization where as 500K database model represents an extra-large sized organization. Vendors [who benchmark] have the flexibility of configuring 8, 16, 24 or 32 parallel job streams (or threads) in those two benchmarks to parallellize the work being done.

Now that the clarifications are out of the way, here is the direct URL for the 500K Payroll benchmark results that Oracle|Sun published last week. (document will be updated shortly to fix the branding issues)

    PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M5000 (500K EE 32 job streams)

What's changed at Sun & Oracle?

The 500K payroll benchmark work was started few months ago when Sun is an independent entity. By the time the benchmark work is complete, Sun was part of Oracle Corporation. However it has no impact whatsoever on the way we have been operating and interacting with the PeopleSoft benchmark team for the past few years. We (Sun) still have to package all the benchmark results and submit for validation just like any other vendor. It is still the same laborious process that we have to go through from both ends of Oracle (that is, PeopleSoft & Sun). I just mentioned this to highlight Oracle's non-compromising nature on anything at any level in publishing quality benchmarks.

SUMMARY OF 500K NA PAYROLL BENCHMARK RESULTS

The following bar chart summarizes all the published benchmark results by different vendors. Each 3D bar on X-axis represent one vendor, and the Y-axis shows the throughput (#payments/hour) achieved by corresponding vendor. Actual throughput and the vendor name is also shown in each of the 3D bar for clarity. Common sense dictates that higher the throughput, the better it is.

The numbers in the following table were extracted from the very first page of the benchmark results white papers where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to the customers. The results in the following table are sorted by the hourly throughput (payments/hour) in the descending order. The goal of this benchmark is to achieve as much hourly throughput as possible. Click on the link that is underneath the hourly throughput values to open corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 500,480 & Number of payments: 750,720
Vendor OS Hardware Config #Job Streams Elapsed Time (min) Hourly Throughput
Payments per Hour
Sun Solaris 10 10/09 1x Sun SPARC Enterprise M5000 with 8 x 2.53 GHz SPARC64 VII Quad-Core CPUs & 64G RAM
1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes. Capacity: 960 GB
1 x Sun Storage 2510 Array for redo logs. Capacity: 272 GB. Total storage capacity: 1.2 TB
32 50.11 898,886
IBM z/OS 1.10 1 x IBM System z10 Enterprise Class Model 2097-709 with 8 x 4.4 GHz IBM System z10 Gen1 CPUs & 32G RAM
1 x IBM TotalStorage DS8300. Total capacity: 9 TB
8\* 58.96 763,962
HP HP-UX B.11.31 1 x HP Integrity rx7640 with 8 x 1.6 GHz Intel Itanium2 9150 Dual-Core CPUs & 64G RAM
1 x HP StorageWorks EVA 8100. Total capacity: 8 TB
32 96.17 468,370

This is all public information. Feel free to compare the hardware configurations and the data presented in all three rows and draw your own conclusions. Since all vendors used the same benchmark toolkit, comparisons should be pretty straight forward.

Sun Storage F5100 Flash Array, the differentiator

Of all these benchmark results, clearly the F5100 storage array is the key differentiator. The payroll workload is I/O intensive, and requires low latency storage for better throughput (it is implicit that less latency means less I/O waits).

There is a lot of noise from some of the outside blog readers (I do not know who those readers are or who they work for) when Sun published the very first NA Payroll 240K EE benchmark with eight job streams using an F5100 array that has 40 flash modules (FMOD). Few people thought it is necessary to have those many flash modules to get that kind of performance that Sun demonstrated in the benchmark. Now that we have the 500K benchmark result as well, I want to highlight another fact that it is the same F5100 that was used in all the three NA Payroll benchmarks that Sun published in the last 10 months. Even though other vendors increased the number of disk drives when moved from 240K to 500K EE benchmark environment, Sun hasn't increased the flash modules in F5100 -- the number of FMODs remained at 40 even in 500K EE benchmark. This fact implicitly suggests at least two things -- 1. F5100 array is resilient, scales and performs consistently even with increased load. 2. May be 40 flash modules are not needed in 240K EE Payroll benchmark. Hopefully this will silence those naysayers and critics now.

While we are on the same topic, the storage capacity in the other array that was used to store the redo logs was in fact reduced from 5.3 TB in a J4200 array that was used in 240K EE/16 job stream benchmark to 272 GB in a 2510 array that was used in 500K EE/32 job stream benchmark. Of course, in both cases, the redo logs consumed only 20 GB on disk - but since the arrays were connected to the database server, we have to report the total capacity of the array(s) whether it is being used or not.

Notes on CPU utilization and IBM's #job streams

Even though I highlighted the I/O factor in the above paragraph, it is hard to ignore the fact that the NA Payroll workload is CPU intensive too. Even when multiple job streams are configured, each stream runs as a single-thread process -- hence it is vital to have a server with powerful processors for better [overall] performance.

Observant readers might have noticed couple of interesting things.

  1. The maximum average CPU usage that Sun reported in 500K EE benchmark in any scenario by any process is only 43.99% (less than half of the total processing capacity)

    The reason is simple. The SUT, M5000, has eight quad-core processors and each core is capable of running two hardware threads in parallel. Hence there are 64 virtual CPUs on the system, and since we ran only 32 job streams, only half of the total available CPU power was in use.

    Customers in a similar situation have the flexibility to consolidate another workload onto the same system to take advantage of the available/remaining CPU cycles.

  2. IBM's 500K EE benchmark result is only with 8 job streams

    I do not know the exact reason - but if I have to speculate, it is as good as anyone's guess. Based on the benchmark results white paper, it appears that the z10 system (mainframe) has eight single core processors, and perhaps that is why they ran the whole benchmark with only eight job streams.

Also See:

Wednesday Jan 20, 2010

PeopleSoft NA Payroll 240K EE Benchmark with 16 Job Streams : Another Home Run for Sun

Poor Steve A.[1] ... This entry is not about Steve A. though. It is about the new PeopleSoft NA Payroll benchmark result that Sun published today.

First things first. Here is the direct URL to our latest benchmark results:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (16 job streams[2] -- simply referred as 'stream' hereonwards)

The summary of the benchmark test results is shown below only for the 16 stream benchmarks. These numbers were extracted from the very first page of the benchmark results white papers where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to the customers. The results in the following table are sorted by the hourly throughput (payments/hour) in the descending order. The goal is to achieve as much hourly throughput as possible. Click on the link that is underneath the hourly throughput values to open corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000
Vendor OS Hardware Config #Job Streams Elapsed Time (min) Hourly Throughput
Payments per Hour
Sun Solaris 10 5/09 1x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory
1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes
1 x Sun Storage J4200 Array for redo logs
16 43.78 493,376
HP HP-UX 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory
1 x HP StorageWorks EVA 8100
16 68.07 317,320

This is all public information. Feel free to compare the hardware configurations and the data presented in both of the rows and draw your own conclusions. Since both Sun and HP used the same benchmark toolkit, workload and ran the benchmark with the same number of job streams, comparison should be pretty straight forward.

If you want to compare the 8 stream results, check the other blog entry: PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise. Sun used the same hardware to run both benchmark tests with 8 and 16 streams respectively. We could have gotten away with 20+ Flash Modules (FMODs), but we want to keep the benchmark environment consistent with our prior benchmark effort around the same benchmark workload with 8 job streams. Due to the same hardware setup, now we can easily demonstrate the advantage of parallelism (simply by comparing the test results from 8 and 16 stream benchmarks) and how resilient and scalable the F5100 Flash array is.

Our benchmarks showed an improvement of ~55% in overall throughput when the number of job streams were increased from 8 to 16. Also our 16 stream results showed ~55% improvement in overall throughput over HP's published results with the same number of streams at a maximum average CPU utilization of 45% compared to HP's maximum average CPU utilization of 89%. The half populated Sun Storage F5100 Flash Array played the key role in both of those benchmark efforts by demonstrating superior I/O performance over the traditional disk based arrays.

Before concluding, I would like to highlight a few known facts (just for the benefit of those people who may fall for the PR trickery):

  1. 8 job streams != 16 job streams. In other words, the results from an 8 stream effort is not comparable to that of a 16 stream result.
  2. The throughput should go up with increased number of job streams [ only up to some extent -- do not forget that there will be a saturation point for everything ]. For example, the throughput with 16 streams might be higher compared to the 8 stream throughput.
  3. The Law of Diminishing Returns applies to the software world too, not just for the economics. So, there is no guarantee that the throughput will be much better with 24 or 32 job streams.

Other blog posts and documents of interest:

  1. Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card
  2. PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (8 streams benchmark white paper)
  3. PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise
  4. App benchmarks, incorrect conclusions and the Sun Storage F5100
  5. Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance
































Notes:

[1] Steve A. tried so hard and his best to make everyone else believe that HP's 16 job stream NA Payroll 240K EE benchmark results are on par with Sun's 8 stream benchmark results. Apparently Steve A. failed and gave up after we showed the world a few screenshots from a published and eventually withdrawn benchmark [ by HP ]. You can read all his arguments, comparisons etc., in the comments section of my other blog entry PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise as well as in Joerg Moellenkamp's blog entries around the same topic.

[2] In PeopleSoft terminology, a job stream is something that is equivalent to a thread.

Tuesday Nov 10, 2009

PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise

During the "Sun day" keynote at OOW 09, John Fowler stated that we are #1 in PeopleSoft North American Payroll performance. Later Vince Carbone from our Performance Technologies group went on comparing our benchmark numbers with HP's and IBM's in BestPerf's group blog at Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance. Meanwhile Jeorg Moellenkamp had been clarifying few things in his blog at App benchmarks, incorrect conclusions and the Sun Storage F5100. Interestingly it all happened while we have no concrete evidence in our hands to show to the outside world. We got our benchmark results validated right before the Oracle OpenWorld, which gave us the ability to speak about it publicly [ and we used it to the extent we could use ]. However Oracle folks were busy with their scheduled tasks for OOW 09 and couldn't work on the benchmark results white paper until now. Finally the white paper with the NA Payroll benchmark results is available on Oracle Applications benchmark web site. Here is the URL:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000

Once again the summary of results is shown below but in a slightly different format. These numbers were extracted from the very first page of the benchmark results white papers where PeopleSoft usually highlights the significance of the results and the actual numbers that they are interested in. The results are sorted by the hourly throughput (payments/hour) in the descending order. The goal is to achieve as much hourly throughput as possible. Since there is one 16 stream result as well in the following table, exercise caution when comparing 8 stream results with 16 stream results. In general, 16 parallel job streams are supposed to yield better throughput when compared to 8 parallel job streams. Hence comparing a 16 stream number with an 8 stream number is not an exact apple-to-apple comparison. It is more like comparing an apple to another apple that is half in size. Click on the link that is underneath the hourly throughput values to open corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000
Vendor OS Hardware Config #Job Streams Elapsed Time (min) Hourly Throughput
Payments per Hour
Sun Solaris 10 5/09 1x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory
1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes
1 x Sun Storage J4200 Array for redo logs
8 67.85 318,349
HP HP-UX 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory
1 x HP StorageWorks EVA 8100
16 68.07 317,320
HP HP-UX 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory
1 x HP StorageWorks EVA 8100
8 89.77 240,615\*
IBM z/OS 1 x IBM zSeries 990 model 2084-B16 with 313 Feature with 6 x IBM z990 Gen1 processors (populated: 13, used: 6) and 32 GB memory
1 x IBM TotalStorage DS8300 with dual 4-way processors
8 91.7 235,551

This is all public information -- so, feel free to draw your own conclusions. \*At this time of writing, HP's 8 stream results were pulled out of Oracle Applications benchmark web site for some reason I do not know why. Hopefully it will show up again on the same web site soon. If it doesn't re-appear even after a month, probably we can simply assume that the result is withdrawn.

As these benchmark results were already discussed by different people in different blogs, I have nothing much to add. The only thing that I want to highlight is that this particular workload is moderately CPU intensive, but very I/O bound. Hence the better the I/O sub-system, the better the performance. Vince provided an insight on Why Sun Storage F5100 is a good option for this workload, while Jignesh Shah from our ISV-Engineering organization focused on the performance of this benchmark workload with F20 PCIe Card.

Also when dealing with NA Payroll, it is very unlikely to achieve a nice out-of-the-box performance. It requires a lot of database tuning too. As the data sets are very large, we partitioned the data in some of the very hot objects and it showed good improvement in query response times. So if you are a PeopleSoft customer running Payroll application with millions of rows of non-partitioned data, consider partitioning the data. [Updated 11/30/09]We are currently working on a best practices blueprint document for PeopleSoft North American Payroll that presents a variety of tuning tips like these in addition to the recommended practices for F5100 flash array and flash accelerator F20 PCIe card. Stay tuned .. Sun published a best practices blueprint document with a variety of tuning tips like these in addition to the recommended practices for F5100 flash array and flash accelerator F20 PCIe card. You can download the blueprint from the following location:

    Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card

Related Blog Post:

About

Benchmark announcements, HOW-TOs, Tips and Troubleshooting

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today