PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise

During the "Sun day" keynote at OOW 09, John Fowler stated that we are #1 in PeopleSoft North American Payroll performance. Later, Vince Carbone from our Performance Technologies group compared our benchmark numbers with HP's and IBM's in BestPerf's group blog at Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance. Meanwhile, Joerg Moellenkamp clarified a few things in his blog at App benchmarks, incorrect conclusions and the Sun Storage F5100. Interestingly, it all happened while we had no concrete evidence in our hands to show to the outside world. We got our benchmark results validated right before Oracle OpenWorld, which gave us the ability to speak about them publicly [ and we used it to the extent we could ]. However, Oracle folks were busy with their scheduled tasks for OOW 09 and couldn't work on the benchmark results white paper until now. Finally, the white paper with the NA Payroll benchmark results is available on the Oracle Applications benchmark web site. Here is the URL:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000

Once again, the summary of results is shown below, but in a slightly different format. These numbers were extracted from the very first page of the benchmark results white papers, where PeopleSoft usually highlights the significance of the results and the actual numbers that they are interested in. The results are sorted by hourly throughput (payments/hour) in descending order. The goal is to achieve as much hourly throughput as possible. Since the following table also contains one 16 stream result, exercise caution when comparing 8 stream results with 16 stream results. In general, 16 parallel job streams are expected to yield better throughput than 8 parallel job streams. Hence comparing a 16 stream number with an 8 stream number is not an exact apples-to-apples comparison. It is more like comparing an apple to another apple that is half its size. Click on the link underneath each hourly throughput value to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000

| Vendor | OS | Hardware Config | #Job Streams | Elapsed Time (min) | Hourly Throughput (Payments per Hour) |
|---|---|---|---|---|---|
| Sun | Solaris 10 5/09 | 1 x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory; 1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes; 1 x Sun Storage J4200 Array for redo logs | 8 | 67.85 | 318,349 |
| HP | HP-UX | 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory; 1 x HP StorageWorks EVA 8100 | 16 | 68.07 | 317,320 |
| HP | HP-UX | 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory; 1 x HP StorageWorks EVA 8100 | 8 | 89.77 | 240,615\* |
| IBM | z/OS | 1 x IBM zSeries 990 model 2084-B16 with 313 Feature with 6 x IBM z990 Gen1 processors (populated: 13, used: 6) and 32 GB memory; 1 x IBM TotalStorage DS8300 with dual 4-way processors | 8 | 91.7 | 235,551 |

This is all public information -- so, feel free to draw your own conclusions. \*At the time of writing, HP's 8 stream results had been pulled from the Oracle Applications benchmark web site for a reason I do not know. Hopefully they will show up again on the same web site soon. If they do not reappear even after a month, we can probably assume that the result has been withdrawn.
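For readers who want to check the arithmetic, the hourly throughput figures in the table appear to follow directly from the 360,000 payments and the elapsed time in minutes. The following is only a minimal sketch of that calculation; the function and variable names are my own, and the formula is an assumption inferred from the published numbers rather than something stated in the white papers.

```python
# Sketch: recompute hourly throughput from the elapsed times in the table.
# Assumption: throughput = payments / elapsed minutes * 60 (inferred, not
# taken from the benchmark white papers).

PAYMENTS = 360_000  # number of payments in the benchmark workload


def hourly_throughput(elapsed_min, payments=PAYMENTS):
    """Return payments processed per hour for a given elapsed time in minutes."""
    return payments / elapsed_min * 60


# Elapsed times (minutes) as listed in the table above.
results = {
    "Sun M4000, 8 streams": 67.85,
    "HP rx6600, 16 streams": 68.07,
    "HP rx6600, 8 streams": 89.77,
    "IBM z990, 8 streams": 91.7,
}

for label, elapsed in results.items():
    print(f"{label}: {round(hourly_throughput(elapsed)):,} payments/hour")
```

Rounded to the nearest payment, these values agree with the throughput column in the table, which suggests the published numbers are a straightforward conversion of elapsed time.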

As these benchmark results were already discussed by different people in different blogs, I have not much to add. The only thing that I want to highlight is that this particular workload is moderately CPU intensive, but heavily I/O bound. Hence the better the I/O sub-system, the better the performance. Vince provided some insight in Why Sun Storage F5100 is a good option for this workload, while Jignesh Shah from our ISV-Engineering organization focused on the performance of this benchmark workload with the F20 PCIe Card.

Also, when dealing with NA Payroll, it is very unlikely that you will get good out-of-the-box performance. It requires a lot of database tuning too. As the data sets are very large, we partitioned the data in some of the very hot objects, and it showed good improvement in query response times. So if you are a PeopleSoft customer running the Payroll application with millions of rows of non-partitioned data, consider partitioning the data. [Updated 11/30/09] Sun published a best practices blueprint document with a variety of tuning tips like these, in addition to the recommended practices for the F5100 flash array and the flash accelerator F20 PCIe card. You can download the blueprint from the following location:

    Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card

Related Blog Post:

Comments:

[Trackback] A few days ago i wrote about a recent Peoplesoft benchmark at a part of the article "App benchmarks, incorrect conclusions and the Sun Storage F5100" There was one part missing due to increased workload at the OOW 09 - the official paper about the b...

Posted by c0t0d0s0.org on November 11, 2009 at 05:31 PM PST #

There never was an eight-stream result on the HP hardware...Joerg at c0t0d0s0.org just made that up.

Go to the original blog that Vince Carbone wrote...where he himself says that the slower HP number he posted came about because he added the post-processing times in. This is NOT the way Oracle uses this benchmark, and when you make this same comparison, the Sun hardware is still (barely) equal to the HP.

Giri...you should read the whole thing here...including Vince's responses.

http://blogs.sun.com/BestPerf/entry/oracle_peoplesoft_payroll_sun_sparc

Now...here is the fundamental question -- the one NO ONE at Sun or Oracle can answer about the F5100 in this test:

"Why did Sun need a COMBINATION of 40x24GBSSDs AND 12x15KRPMHDD to (barely) match the performance of a system that only used 58x15KHDDs -- in an IOPS BOUND application?"

Moreover...the Sun SSD result used RAID-0 and the HP result used RAID-1.

Why are the SSDs in the F5100 so slow that it takes 40 of them in RAID-0 to match the performance of 58 HDDs in RAID-1?

Also...why did the Sun M4000 need twice as many cores as the HP result to deliver merely equal performance?

http://www.oracle.com/apps_benchmark/doc/peoplesoft/performance-report/ps9-na-pay-9_ora_hp_rx6600.pdf

http://www.oracle.com/apps_benchmark/doc/peoplesoft/performance-report/PS9-NA-PAY-9_ORA_Sun_M4000.pdf

Posted by Steve A. on November 15, 2009 at 07:22 AM PST #

Interestingly, you now try to show your lack of understanding here:

1. I never wrote, that there was no 8-stream result on that page .... it was removed ... don't try to turn the fact.

2. Regarding the number of cores: Didn't you looked at the utilization graph ? 25% of a 4 socket machine ? ;) sounds like just one socket was used by this system.

3. Please at least try to understand the mechanisms in the Peoplesoft benchmark.

4. As it doesn't seem you understand it at c0t0d0s0.org, i write it here again: The Sun Benchmark used JBODS, you write like the HP benchmark just uses disks, but they used an enterprise grade array there.... you know ... battery backup for caches, dual controller, 8 GB cache and so on ... but after telling this to you again and again, you still don't get this.

5. I wrote to you, that the 40 FMODs were used because of the capacity as far as i know.

Posted by Joerg M. on November 15, 2009 at 03:10 PM PST #

[Steve A.]There never was an eight-stream result on the HP hardware...Joerg at c0t0d0s0.org just made that up.
----------------------------
[Giri] I'm sorry to rain on your parade, Steve -- but since you are under the impression that Sun folks made up those numbers, here is the evidence that you are looking for. The following two screenshots not only prove that there was an eight-stream result on the HP hardware [ back in 2008 ], but also show that the elapsed time in that case was 89.77 minutes.

http://technopark02.googlepages.com/PSFT_NApayroll_HP_8streamResults.png
http://technopark02.googlepages.com/PSFT_NApayroll_HP_8streamResults_Exp.png

If you want a fair comparison, that is what you should compare Sun's eight-stream result against. Also notice the 21.70 minute improvement in their benchmark result just by doubling the number of streams on hardware similar to what they used for the eight-stream benchmark test. In case you did not notice yet, there is certainly a lot of difference between the eight- and sixteen-stream results.

If you really want to know why that result was pulled from Oracle's web site, you had better ask the HP folks or Oracle, certainly not us.

Also, as the one year old document was removed from Oracle's web site, I thought it improper to post the entire document 'as is' - hence the relevant screenshots. If you want to look at the entire document, once again HP or Oracle is your best bet.

Hopefully the charade ends with this comment now that you have all the information/facts/etc., that you need.

Posted by Giri Mandalika on November 15, 2009 at 03:48 PM PST #

[Trackback] It's really seldom, that i'm spotlighting a reader in a blog entry, but i think it's necessary to give some perspective to the reader Steve A. It's really hard to annoy me that much, that i'm getting angry (despite the belief of many colleagues that i...

Posted by c0t0d0s0.org on November 15, 2009 at 09:54 PM PST #

"4. As it doesn't seem you understand it at c0t0d0s0.org, i write it here again: The Sun Benchmark used JBODS, you write like the HP benchmark just uses disks, but they used an enterprise grade array there.... you know ... battery backup for caches, dual controller, 8 GB cache and so on ... but after telling this to you again and again, you still don't get this."

You apparently (still) don't "get" the fact that the F5100 used 2.5GBytes of DRAM cache in front of all those SSDs, AND Sun used RAID-0, while HP HDDs were RAID-1.

Raid-0 is much faster than RAID-1, remember??

Also...the EVA8100 is an old and dog-slow array -- evidenced by its lousy showings across multiple benchmarks.

So, you forgot to answer...why did Sun need 40 x $2,000each SSDs (and then another 12x15K HDD) to match 58x $350each HDDs?

Posted by Steve A. on November 15, 2009 at 09:56 PM PST #

Of course HP didn't use RAID-0 ... ? ;) Do you really think they've used an 58-way mirror ? ;)

And please stop to speak of "matching" with 8 streams Sun was 33% faster ... the systems was just at 25%. At a system that barely loaded the components.

Please stop making a fool out of you by asking the already answered question again and again and try to making a point that isn't one.

BTW: I want a formal apology from you ... you've accused me of lying by stating that I've made up the numbers, and the documents provided by Giri clearly show that there was an 8 streams result with the mentioned performance numbers.

Posted by Joerg on November 16, 2009 at 12:01 AM PST #

Joerg...you said:

"Of course HP didn't use RAID-0 ... ? ;) Do you really think they've used an 58-way mirror ? ;)"

Wow!!! What can you possibly mean by this "58-way mirror" comment?

Did you think that RAID-1 on a pool of 58 spindles is done by mirroring the contents of a single disk to 58 other disks?!?

That "58-way mirror" would be very silly...and virtually impossible to do with any commercially available technology.

Joerg, FYI Raid-1 on 58 disks is done by mirroring each of 29 disks to another disk. 29 mirrored pairs = 58 disks, right?

Now...do you know that RAID-0 is merely striping? Do you know that that's what Sun did with its F5100 JBOF configuration?

Maybe you want to rephrase your comment above.

By the way, all of the other PeopleSoft Payroll benchmarks use RAID-1, but Sun's test of the F5100 JBOF was the only result to ever use RAID-0...which is faster...but you lose everything if any one of the SSDs or HDDs fails.

"...the documents provided by Giri clearly shows.."

Giri didn't post documents, he posted bit-images of selected pages of a document no one else has ever seen or heard of. They show nothing, because they were never audited or published. There is no record at Oracle that this result was ever published, nor is there at HP.

Even Sun's own Vince Carbone never referenced these purported results when he did his original blog. Why do you think even Vince Carbone never saw these numbers?

Now...the numbers Sun and Vince Carbone DID reference...
http://blogs.sun.com/BestPerf/entry/oracle_peoplesoft_payroll_sun_sparc

...are ALL from the September 2009 result that I have been referencing from the very beginning.

Joerg, maybe you still haven't read Vince's comments there?

Giri, the image capture files you posted at technopark02 don't match the format of the other 9.0 benchmark results that Oracle has posted since 2008. Rather these image files use the results format from the older 8.9 version benchmarks. Can you say where you obtained these image files?

As regards the M4000 running only 25% CPU utilization? Well, wouldn't low CPU utilization be a natural result of (a) running only 8 threads on a 16 core machine, and (b) CPUs that are sitting around waiting for dog-slow SSDs?

FYI...SSDs take forever to complete writes once the SSD DRAM write buffers fill up, and the write-penalty for wear-leveling is worse than RAID-5 on mechanical disk!

You can see this by looking at the single-threaded results (charts in Figure 1.) and notice that the Sun SSD result with one thread is virtually identical to the single-thread result on the HDD system.

So much for all this nonsense about 16 threads vs 8 threads. Remember that this is an IOPS bound application running in batch mode. It goes faster when there are more spindles or faster spindles....but why doesn't it go faster with SSDs?

After all this noise...we STILL don't have an answer to the relevant question...that no one at Sun or Oracle has answered:

For an IOPS BOUND application that only needs 200GBytes of storage, why did Sun need a COMBINATION of:

- 40x24GB SSDs (at $2,000 each)
- with 2.5GBytes of DRAM write cache in front of them
- AND another 12x15KRPMHDDs
- all running without fault-tolerance in RAID-0

All this...to just barely match the performance of a system that only used 58x15K mechanical HDDs?

Posted by Steve A. on November 16, 2009 at 07:33 AM PST #

[Steve A.] Giri didn't post documents, he posted bit-images of selected pages of a document no one else has ever seen or heard of. They show nothing, because they were never audited or published. There is no record at Oracle that this result was ever published, nor is there at HP.
--

[Giri] Wrong again, Steve. If you haven't seen the document that had those numbers, it does not necessarily mean that they were never audited or published. Those numbers were audited and the document was available on Oracle's web site until October 2009 before HP's sixteen-stream result appeared on the same web site. That is how I got hold of that document (by the way, Vince had a copy of that document too -- don't just assume things the way you like to see especially those that you have no control over. Also remember that we are not obligated to post that document anywhere or to prove anything to some anonymous troll).

In other words, for a PeopleSoft benchmark result to appear on Oracle's web site, the following two must be true:

1. the results are validated (audited), and
2. the vendor who ran the benchmark is willing to make the benchmark results public

Unless those two conditions were met, Oracle wouldn't even post the benchmark white paper on their web site.

If you have no proper context or understanding or the history of this benchmark, please spend some time doing research, communicating with different parties at HP & Oracle and in getting the facts right before stating what you know or what you assume as 'facts'.

Posted by Giri Mandalika on November 16, 2009 at 01:00 PM PST #

@Giri: Just don't take this guy too seriously. There was already some speculation, that Steve A. is indeed Matt Bryant (an utter HP shill) or just a copycat.

@Steve: Sorry Steve, you clearly need professional help. I have this 8 stream document too. Do you really believe that we would make up such documents just to prove a point? I don't know in what companies you worked in the past, but at Sun this is a career-limiting move.

Regarding this 25%: I hoped you did at least the basic research. Let's assume every stream is a thread. Let's further assume that this thread isn't dividable in sub threads. The SPARC 64 VII is a 8 thread processor. An m4000 has 32 threads. 8 of 32 ? 25% ;) Sounds like 8 threads running at full speed, and 24 almost idling because the application doesn't have enough threads to load an M4000 CPU wise.

Regarding Cache Size: Oh ... when you want to count this way ... There is almost a 1 GB when you take the caches under the hard disks into consideration (58\*16 MB) ;)

Regarding RAID-1 vs. RAID-0: I didn't thought, that i have to explain the basics to you: A RAID 1 has pretty much the same read performance than a RAID0 for multithreaded loads, especially when using techniques like split seeks. With writes this may look different, especially when using a software raid. But you have to take into consideration, that you have a RAID controller in this configuration, so the system has to transmit the data just once to the storage, and the storage takes care of the mirroring. Why did i talked about an 58 mirror? just to show you, that the HP configuration didn't used RAID1, they must have used RAID10 (or they took a number RAID1 volumes and did partitions, which would be a kind of RAID 0 on app level)

Regarding your stupid comment "You can see this by looking at the single-threaded results (charts in Figure 1.) and notice that the Sun SSD result with one thread is virtually identical to the single-thread result on the HDD system." You are aware that the 8 way stream HP and the 8 way Sun result use a different scaling ? ;)

So let's look in the actual numbers for runtime in minutes.

1 Thread Sun:
Paysheet: 27.32, PayCalc 232.17, PayConfirm 188.28, PrintAdvice 78.35, Direct Deposit 1.73

1 Thread HP - 8 streams
Paysheet: 34.35, PayCalc 275.17, PayConfirm 231.9, PrintAdvice 89.13, DirectDeposit 1.98

1 Thread HP - 16 streams
Paysheet: 37.38, PayCalc 276.92, PayConfirm 230.35, PrintAdvice 86.85, Direct Deposit 1.75

Now stop this silly performance discussion about nearly identical performance ...

The question why the benchmark used the 40 SSD variant and 12 disks was answered several times now, but somehow you aren't able to see it. But I should mention that I didn't answer to you; my comments are more for the public, to put your comments in perspective. And by the way, you are the only one so far who didn't get it ...

Posted by Joerg M. on November 16, 2009 at 10:25 PM PST #

@Giri

This entire discussion began with the unsubstantiated performance claims Sun was making for the F5100 Flash storage array.

http://blogs.sun.com/BestPerf/entry/why_sun_storage_f5100_is

I am not sure why the 2008 HP result is relevant -- everyone knows this is overwhelmingly an IOPS (spindle) bound application -- you even said so yourself, and Vince Carbone also says this above.

So...let's return to Sun's performance claims for the F5100.

1) Sun claims that in I/O and latency intensive applications, the F5100 loaded with 80 SSDs is good for a "million IOPS", and can replace 3,000 fast HDDs.

http://www.sun.com/storage/disk_systems/sss/f5100/features.xml

2) That means that a single Sun SSD module can replace 37.5 HDDs, right?

3) The point here is that Sun's claims for the F5100 don't hold water. Sun SSDs can't replace 37, 10, 5 or even 3HDDs.

In fact these SSDs can only replace 1.1 mechanical HDDs in this application.

Why?

@Joerg

You continue to claim "The question why the benchmark used the 40 SSD variant and 12 disks was answered several times now..."

No. The only answer you ever gave was that 40 SSDs were needed to provide enough capacity for the database.

Then...I informed you that the size of the benchmark database was only 200GB, and could have EASILY FIT ON TEN of Sun's SSDs in RAID-0.

You have NOT answered...so please stop claiming "I already answered".

So...can ANYONE at Sun now answer the question:

For an IOPS BOUND application that only needs 200GBytes of storage, why did Sun need a COMBINATION of:

- 40x24GB SSDs (at $2,000 each)
- with 2.5GBytes of DRAM write cache in front of them
- AND another 12x15KRPMHDDs for redo logs.
- all running without fault-tolerance in RAID-0

All this...to just barely match the performance of a system that only used 58x15K mechanical HDDs?

Posted by Steve A. on November 19, 2009 at 02:54 AM PST #

No, I dont like this comparison between SUN and HP machine. It looks like IBM FUD. I dont want SUN to do these kind of comparisons. SUN can let the bench be, but dont compare. It is not a fair comparison, and I hoped SUN plays fair? But not?

How can you compare HP 4 dual core 1.6GHz vs SUN 4 quad 2.53GHz? Even a dog would understand that it is not fair. This is something IBM could do, but SUN didnt. I thought? But maybe I am wrong?

I would like to have this comparison removed. Seriously. This is not fair play, and only looks bad.

Posted by Kebabbert on January 27, 2010 at 10:04 PM PST #

good........

Posted by Electronic Cigarette on February 18, 2010 at 08:42 PM PST #
