Friday May 31, 2013

Oracle Internet Directory 11g Benchmark on SPARC T5

SUMMARY

System Under Test (SUT)     Oracle's SPARC T5-2 server
Software     Oracle Internet Directory 11gR1-PS6
Target Load     50 million user entries
Reference URL     OID/T5 benchmark white paper

Oracle Internet Directory (OID) is an LDAP v3 directory server with a multi-threaded, multi-process, multi-instance architecture that uses the Oracle Database as its directory store.


BENCHMARK WORKLOAD DESCRIPTION

Five test scenarios were executed in this benchmark, each exercising a different type of LDAP operation. The key metrics are throughput -- the number of operations completed per second -- and latency -- the time in milliseconds it takes to complete an operation.
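To make these two metrics concrete, here is a minimal, hypothetical sketch (not part of the benchmark kit) of how a client harness might derive throughput and latency from a timed loop of operations, in Python:

    import time

    def measure(op, n):
        """Run op() n times; return (throughput in ops/sec, avg latency in ms)."""
        latencies = []
        start = time.perf_counter()
        for _ in range(n):
            t0 = time.perf_counter()
            op()                                   # one LDAP operation
            latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
        return n / elapsed, (sum(latencies) / n) * 1000.0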

TEST SCENARIOS & RESULTS

1. LDAP Search operation : search for and retrieve specific entries from the directory

In this test scenario, each LDAP search operation matches a single unique entry. Entries are looked up in random order, no client looks up the same entry twice, and no two clients look up the same entry.

#Clients    Throughput (ops/sec)    Latency (ms)
1,000       944,624                 1.05
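For illustration only, here is a minimal sketch of one client's search loop using the Python ldap3 module. The host, port, bind credentials and directory suffix are hypothetical placeholders, not the benchmark's actual configuration:

    from ldap3 import Server, Connection

    # Hypothetical OID endpoint and bind credentials.
    server = Server('oid.example.com', port=3060)
    conn = Connection(server, 'cn=orcladmin', 'secret', auto_bind=True)

    # Each search matches exactly one unique entry, e.g. by uid.
    conn.search('cn=users,dc=example,dc=com', '(uid=user1000042)',
                attributes=['cn', 'mail'])
    print(conn.entries)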

2. LDAP Add operation : add entries, their object classes, attributes and values to the directory

In this test scenario, 16 concurrent LDAP clients added 500,000 entries of object class inetOrgPerson, each with 21 attributes, to the directory.

#Clients    Throughput (ops/sec)    Latency (ms)
16          1,000                   15.95
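A comparable sketch of the add operation with ldap3; only a handful of the 21 attributes are shown, and all names and DNs are hypothetical:

    from ldap3 import Server, Connection

    server = Server('oid.example.com', port=3060)  # hypothetical endpoint
    conn = Connection(server, 'cn=orcladmin', 'secret', auto_bind=True)

    # Add one inetOrgPerson entry; the benchmark entries carried 21 attributes.
    conn.add('uid=user500001,cn=users,dc=example,dc=com',
             object_class=['top', 'person', 'organizationalPerson',
                           'inetOrgPerson'],
             attributes={'cn': 'User 500001', 'sn': 'User500001',
                         'mail': 'user500001@example.com'})
    print(conn.result['description'])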

3. LDAP Compare operation : compare a given attribute value to the attribute value in a directory entry

In this test scenario, the userpassword attribute was compared. That is, each LDAP Compare operation checks a supplied password value against the stored password of a unique user.

#Clients    Throughput (ops/sec)    Latency (ms)
1,000       594,426                 1.68
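A minimal ldap3 sketch of the compare operation; the DN and password are hypothetical:

    from ldap3 import Server, Connection

    server = Server('oid.example.com', port=3060)  # hypothetical endpoint
    conn = Connection(server, 'cn=orcladmin', 'secret', auto_bind=True)

    # True if the supplied value matches the entry's stored userpassword.
    matched = conn.compare('uid=user1000042,cn=users,dc=example,dc=com',
                           'userpassword', 'user1000042-password')
    print(matched)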

4. LDAP Modify operation : add, delete or replace attributes for entries

In this test scenario, 50 concurrent LDAP clients each updated a unique entry per operation, for a total of 50 million updated entries. The attribute being modified was not indexed.

#Clients    Throughput (ops/sec)    Latency (ms)
50          16,735                  2.98
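A minimal ldap3 sketch of a modify of one attribute on one entry (hypothetical DN and attribute value):

    from ldap3 import Server, Connection, MODIFY_REPLACE

    server = Server('oid.example.com', port=3060)  # hypothetical endpoint
    conn = Connection(server, 'cn=orcladmin', 'secret', auto_bind=True)

    # Replace a single (non-indexed, per this scenario) attribute.
    conn.modify('uid=user1000042,cn=users,dc=example,dc=com',
                {'description': [(MODIFY_REPLACE, ['updated by client 42'])]})
    print(conn.result['description'])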

5. LDAP Authentication operation : authenticates the credentials of a user

In this test scenario, 1,000 concurrent LDAP clients authenticated 50 million users.

#Clients    Throughput (ops/sec)    Latency (ms)
1,000       305,307                 3.27
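In LDAP terms an authentication is a simple bind as the user's own DN; here is a minimal ldap3 sketch (hypothetical DN and password):

    from ldap3 import Server, Connection

    server = Server('oid.example.com', port=3060)  # hypothetical endpoint

    # Bind as the end user rather than as an administrator.
    conn = Connection(server, 'uid=user1000042,cn=users,dc=example,dc=com',
                      'user1000042-password')
    print(conn.bind())  # True if OID accepts the credentials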

BONUS: LDAP Mixed operations Test

In this test scenario, 1,000 LDAP clients performed LDAP Search, Bind and Modify operations concurrently.
Operation breakdown (load distribution): Search 65%, Bind 30%, Modify 5%

LDAP Operation    #Clients    Throughput (ops/sec)    Latency (ms)
Search            650         188,832                 3.86
Bind              300         87,159                  1.08
Modify            50          14,528                  12
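A rough sketch of how such a mixed workload could be driven with Python's standard library. The client bodies are stubs standing in for the per-operation loops sketched earlier, and the 650/300/50 split mirrors the stated distribution:

    import concurrent.futures

    # 65% search, 30% bind, 5% modify -- the mix used in this test.
    MIX = {'search': 650, 'bind': 300, 'modify': 50}

    def client(op_name, requests=100):
        # Stub: a real client would loop here issuing op_name operations,
        # as in the per-operation sketches above.
        return '%s client issued %d requests' % (op_name, requests)

    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as pool:
        futures = [pool.submit(client, op)
                   for op, count in MIX.items() for _ in range(count)]
        results = [f.result() for f in futures]

    print(len(results), 'clients completed')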

And finally:

HARDWARE CONFIGURATION

1 x Oracle SPARC T5-2 Server
    » 2 x 3.6 GHz SPARC T5 sockets each with 16 Cores (Total Cores: 32) and 8 MB L3 cache
    » 512 GB physical memory
    » 2 x 10 GbE cards
    » 1 x Sun Storage F5100 Flash Array with 80 flash modules
    » Oracle Solaris 11.1 operating system

ACKNOWLEDGEMENTS

Major credit goes to our colleague, Ramaprakash Sathyanarayan.

Friday Apr 12, 2013

Siebel 8.1.1.4 Benchmark on SPARC T5

Barely six months after announcing Siebel 8.1.1.4 benchmark results on Oracle SPARC T4 servers, we have a brand new set of Siebel 8.1.1.4 benchmark results on Oracle SPARC T5 servers. There have been no updates to the Siebel benchmark kit in the last couple of years, so we continued to use the Siebel 8.1.1.4 benchmark workload to measure the performance of Siebel Financial Services Call Center and Order Management business transactions on the recently announced SPARC T5 servers.

Benchmark Details

The latest Siebel 8.1.1.4 benchmark was executed on a mix of SPARC T5-2, SPARC T4-2 and SPARC T4-1 servers. The benchmark test simulated the actions of a large corporation with 40,000 concurrent active users. To date, this is the highest user count we have achieved in a Siebel benchmark.


User Load Breakdown & Achieved Throughput

Siebel Application Module         %Total Load    #Users    Business Trx per Hour
Financial Services Call Center    70             28,000    273,786
Order Management                  30             12,000    59,553
Total                             100            40,000    333,339

Average Transaction Response Times for both Financial Services Call Center and Order Management transactions were under one second.


Software & Hardware Specification

Application Server: Siebel 8.1.1.4 -- 2 x SPARC T5-2 (per server: 2 chips, 32 cores, 256 vCPUs, 3.6 GHz SPARC-T5, 512 GB memory) -- Solaris 10 1/13 (S10U11)
Database Server: Oracle 11g R2 11.2.0.2 -- 1 x SPARC T4-2 (2 chips, 16 cores, 128 vCPUs, 2.85 GHz SPARC-T4, 256 GB memory) -- Solaris 10 8/11 (S10U10)
Web Server: iPlanet Web Server 7.0.9 (7 U9) -- 1 x SPARC T4-1 (1 chip, 8 cores, 64 vCPUs, 2.85 GHz SPARC-T4, 128 GB memory) -- Solaris 10 8/11 (S10U10)
Load Generator: Oracle Application Test Suite 9.21.0043 -- 1 x SunFire X4200 (2 chips, 4 cores, 4 vCPUs, 2.6 GHz AMD Opteron 285 SE, 16 GB memory) -- Windows 2003 R2 SP2
Load Drivers (Agents): Oracle Application Test Suite 9.21.0043 -- 8 x SunFire X4170 (per server: 2 chips, 12 cores, 12 vCPUs, 2.93 GHz Intel Xeon X5670, 48 GB memory) -- Windows 2003 R2 SP2

Additional Notes:

  • Siebel Gateway Server was configured to run on one of the application server nodes
  • Four Siebel application servers were configured in the Siebel Enterprise to handle 40,000 concurrent users
    - Each SPARC T5-2 was configured to run two Siebel application server instances
    - The Siebel application server instances on each SPARC T5-2 were isolated from one another using Oracle Solaris Zones virtualization
    - The 40,000 concurrent user sessions were load balanced across all four Siebel application server instances
  • The Siebel database was hosted on a Sun Storage F5100 Flash Array consisting of 80 x 24 GB flash modules (FMODs)
    - The Siebel 8.1.1.4 benchmark workload is not I/O intensive and does not require flash storage for better I/O performance
  • Fourteen iPlanet Web Server virtual servers were configured with the Siebel Web Server Extension (SWSE) plug-in to handle the 40,000 concurrent user load
    - All fourteen iPlanet Web Server instances forwarded HTTP requests from Siebel clients to all four Siebel application server instances in a round-robin fashion
  • Oracle Application Test Suite (OATS) was stable and held up amazingly well over the entire duration of the test run
  • The benchmark test results were validated and thoroughly audited by the Siebel benchmark and PSR teams
    - Nothing new here. All Sun published Siebel benchmarks, including the SPARC T4 one, were properly audited before being released to the outside world

Resource Utilization

Component #Users CPU% Memory Footprint
Gateway/Application Server 20,000 67.03 205.54 GB
Application Server 20,000 66.09 206.24 GB
Database Server 40,000 33.43 108.72 GB
Web Server 40,000 29.48 14.03 GB

Finally, how does this benchmark stack up against other published benchmarks? Short answer is "very well". Head over to the Oracle Siebel Benchmark White Papers webpage to do the comparison yourself.



[Credit to our hard working colleagues in SAE, Siebel PSR, benchmark and Oracle Platform Integration (OPI) teams. Special thanks to Sumti Jairath and Venkat Krishnaswamy for the last minute fire drill]

Copy of this blog post is also available at:
Siebel 8.1.1.4 Benchmark on SPARC T5

Tuesday Feb 12, 2013

OBIEE 11g Benchmark on SPARC T4

Just like the Siebel 8.1.x/SPARC T4 benchmark post, this one too was overdue by at least four months. In any case, I hope Oracle BI customers already knew about the OBIEE 11g/SPARC T4 benchmark effort. Here I will try to provide a few additional and interesting details that aren't covered in the following Oracle PR, posted on oracle.com on 09/30/2012.

    SPARC T4 Server Delivers Outstanding Performance on Oracle Business Intelligence Enterprise Edition 11g


Benchmark Details

System Under Test

The entire BI middleware stack, including the WebLogic 11g Server, OBI Server, OBI Presentation Server and Java Host, was installed and configured on a single SPARC T4-4 server consisting of four 8-core 3.0 GHz SPARC T4 processors (32 cores in total) and 128 GB physical memory, running the Oracle Solaris 10 8/11 operating system.

BI users were authenticated against Oracle Internet Directory (OID) in this benchmark; hence the OID software, which was part of Oracle Identity Management 11.1.1.6.0, was also installed and configured on the system under test (SUT). The Oracle BI Server's query cache was turned on, so most query results were cached in the OBIS layer. This resulted in minimal database activity, making it practical to run the Oracle 11g R2 database server with the OBIEE database on the same box as well.

The Oracle BI database was hosted on a Sun ZFS Storage 7120 Appliance. The BI Web Catalog was on a ZFS pool built on a couple of SSDs.


Test Scenario

In this benchmark, 25,000 concurrent users assumed five different business user roles -- Marketing Executive, Sales Representative, Sales Manager, Sales Vice-president, and Service Manager. The load was distributed equally among those five roles. Each BI user accessed five different pre-built dashboards, with each dashboard having an average of five reports -- a mix of charts, tables and pivot tables -- returning 50-500 rows of aggregated data. The test scenario included drilling down into multiple levels from a table or chart within a dashboard. There was a 60-second think time between requests, per user.


BI Setup & Test Results

OBIEE 11g (11.1.1.6.0) was deployed on the SUT in a vertical scale-out fashion. Two Oracle BI Presentation Server processes, one Oracle BI Server process, one Java Host process and two WebLogic Managed Server instances handled 25,000 concurrent user sessions smoothly. This configuration delivered a sub-second overall average transaction response time (an average of averages over a duration of 120 minutes). On average, 450 business transactions were executed per second, which triggered 750 SQL executions per second.

It took only 52% CPU on average (~5% system CPU, the rest in user land) to achieve the throughput outlined above. Since 25,000 unique test/BI users hammered different dashboards consistently, not surprisingly the bulk of the CPU was spent in the Oracle BI Presentation Server layer, which took a whopping 29%. The BI Server consumed about 10-11%, and the rest was shared by the Java Host, OID, the WebLogic Managed Server instances and the Oracle database.


So, what is the key takeaway from this whole exercise?

SPARC T4 rocks the Oracle BI world. OBIEE 11g on SPARC T4 is an ideal combination that may work well for the majority of OBIEE deployments on the Solaris platform. Or, in marketing jargon: the excellent vertical and horizontal scalability of the SPARC T4 server gives customers the option to scale up as well as scale out, supporting large BI EE installations with minimal hardware investment.

Evaluate and decide for yourself.

[Credit to our colleagues in Oracle FMW PSR, ISVe teams and SCA lab support engineers]

Wednesday Jan 30, 2013

Siebel 8.1.1.4 Benchmark on SPARC T4

Siebel is a multi-threaded native application that performs well on Oracle's T-series SPARC hardware. We have several versions of Siebel benchmarks published on previous generation T-series servers, ranging from the Sun Fire T2000 to the Oracle SPARC T3-4. So, it is natural to see that tradition extended to the current generation SPARC T4 as well.

Benchmark Details

A 29,000 user Siebel 8.1.1.4 benchmark on a mix of SPARC T4-1 and T4-2 servers was announced during the Oracle OpenWorld 2012 event. In this benchmark, Siebel application server instances ran on three SPARC T4-2/Solaris 10 8/11 systems, whereas the Oracle 11gR2 database server was configured on a single SPARC T4-1/Solaris 11 11/11 system. Several iPlanet Web Server 7 U9 instances with the Siebel Web Plug-in (SWE) installed ran on one SPARC T4-1/Solaris 10 8/11 system. The Siebel database was hosted on a single Sun Storage F5100 flash array consisting of 80 flash modules (FMODs), each with a capacity of 24 GB.

Siebel Call Center and Order Management System are the modules that were tested in this benchmark. The workload had 70% of the virtual users running Siebel Call Center transactions and the remaining 30% running Siebel Order Management System transactions. This benchmark on T4 exhibited sub-second average response times for both modules.

Load balancing at various layers including web and test client systems ensured near uniform load across all web and application server instances. All three Siebel application server systems consumed ~78% CPU on average. The database and web server systems consumed ~53% and ~18% CPU respectively.

All these details are supposed to be available in a standard Oracle|Siebel benchmark template, but for some reason I couldn't find it on Oracle's Siebel Benchmark White Papers web page yet. Meanwhile, check out the following PR, posted on oracle.com on 09/28/2012.

    SPARC T4 Servers Set World Record on Siebel CRM 8.1.1.4 Benchmark

Looks like the large number of vusers (29,000 to be precise) sets this benchmark apart from the other benchmarks published with the same Siebel 8.1.1.4 benchmark workload.

[Credit to our colleagues in Siebel PSR, benchmark, SAE and ISVe teams]

Sunday Jan 30, 2011

PeopleSoft Financials 9.0 (Day-in-the-Life) Benchmark on Oracle Sun

It is very rare to see a vendor publish a benchmark on competing products of their own, let alone products that are 100% compatible with each other. Well, it happened at Oracle|Sun. M-series and T-series hardware were the subjects of two similar, comparable benchmarks, and PeopleSoft Financials 9.0 DIL (Day-in-the-Life) was the benchmarked workload.

Benchmark report URLs

PeopleSoft Financials 9.0 on Oracle's SPARC T3-1 Server
PeopleSoft Financials 9.0 on Oracle's Sun SPARC Enterprise M4000 Server

Brief description of workload

The benchmark workload simulated Financial Control and Reporting business processes that a customer typically performs when closing their books at period end. "Closing the books" generally involves Journal generation, editing & posting; General Ledger allocations, summary & consolidations and reporting in GL. The applications that were involved in this process are: General Ledger, Asset Management, Cash Management, Expenses, Payables and Receivables.

The benchmark execution simulated the processing required for closing the books (background batch processes) along with some online concurrent transaction activity by 1000 end users.

Summary of Benchmark Test Results

The following table summarizes the test results of the "close the books" business processes. For the online transaction response times, check the benchmark reports (too many transactions to summarize here). Feel free to draw your own conclusions.

As of this writing, no other vendor has published any benchmark result with the PeopleSoft Financials 9.0 workload.

(If the following table is illegible, click here for a cleaner copy of the test results.)

Configuration 1
  DB + Process Scheduler: 1 x Sun SPARC Enterprise M5000 Server (8 x 2.53 GHz QC SPARC64 VII processors, 128 GB RAM)
  App + Web: 1 x SPARC T3-1 Server (1 x 1.65 GHz 16-core SPARC T3 processor, 128 GB RAM)
  Elapsed Time: batch only 24.15m (reporting: 11.67m); batch + 1K users 25.03m (reporting: 11.98m)
  Journal Lines per Hour: batch only 6,355,093; batch + 1K users 6,141,258
  Ledger Lines per Hour: batch only 6,209,682; batch + 1K users 5,991,382

Configuration 2
  DB + Process Scheduler: 1 x Sun SPARC Enterprise M5000 Server (8 x 2.66 GHz QC SPARC64 VII+ processors, 128 GB RAM)
  App + Web: 1 x Sun SPARC Enterprise M4000 Server (4 x 2.66 GHz QC SPARC64 VII+ processors, 128 GB RAM)
  Elapsed Time: batch only 21.74m (reporting: 11.35m); batch + 1K users 23.30m (reporting: 11.42m)
  Journal Lines per Hour: batch only 7,059,591; batch + 1K users 6,597,239
  Ledger Lines per Hour: batch only 6,898,060; batch + 1K users 6,436,236

Software Versions

Oracle’s PeopleSoft Enterprise Financials/SCM 9.00.00.331
Oracle’s PeopleSoft Enterprise (PeopleTools) 8.49.23 64-bit
Oracle’s PeopleSoft Enterprise (PeopleTools) 8.49.23 32-bit on Windows Server 2003 SP2 for generating reports using nVision
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 64-bit + RDBMS patch 9699654
Oracle Tuxedo 9.1 RP36 Jolt 9.1 64-bit
Oracle WebLogic Server 9.2 MP3 64-bit (Java version "1.5.0_12")
MicroFocus Server Express 4.0 SP4 64-bit
Oracle Solaris 10 10/09 and 09/10

Acknowledgments

This is one of the most complex and stressful benchmarks I have ever been involved in. It was a collaborative effort across different teams within Oracle Corporation. A sincere thanks to the PeopleSoft benchmark team for providing adequate support throughout the execution of the benchmark and for the swift validation of the benchmark results numerous times (yes, "numerous" -- it is not a typo.)

Sunday Oct 24, 2010

SPARC T3 reiterates Siebel CRM's Supremacy on T-series Hardware

It has been mentioned and proven several times that Sun/Oracle's T-series hardware is the best fit for deploying and running Siebel CRM. Feel free to browse through the list of Siebel benchmarks that Sun published in the past on T-series:

        2004-2010 : A Look Back at Sun Published Oracle Benchmarks

Oracle Corporation announced the availability of SPARC T3 servers at Oracle OpenWorld 2010, and sure enough there is a Siebel CRM benchmark on the SPARC T3-1 server to support the server launch event. Check the following web page for high-level details of the benchmark.

        SPARC T3-1 Server Posts a High Score on New Siebel CRM 8.1.1 Benchmark

I intend to provide the missing pieces of information in this blog post.

First of all, it is not a "Platform Sizing and Performance Program" (PSPP) benchmark. Siebel 8.1.1 was used to run the benchmark, and there is no Siebel PSPP benchmark kit available for v8.1.1 as of today. Hence the test results from this benchmark exercise are not directly comparable to the Siebel 8.0 PSPP benchmark results.

Workload

The benchmark workload consists of a mix of Siebel Financial Services Call Center and Siebel Web Services / EAI transactions. The FINS Call Center transactions create a bunch of Opportunities, Quotes and Orders, whereas the Web Services / EAI transactions submit new Service Requests (SRs), and search for and update existing SRs. The transaction mix is 40% FINS Call Center transactions and 60% Web Services / EAI transactions.

Software Versions

  • Siebel CRM 8.1.1
  • Oracle RDBMS 11g R2 (11.2.0.1), 64-bit
  • iPlanet Web Server 7.0 Update 8, 32-bit
  • Solaris 10 09/10 in the application-tier and
  • Solaris 10 10/09 in the web- and database-tiers

Hardware Configuration

  • Application Server : 1 x SPARC T3-1 Server (2 RU system)
    • One socket 16-Core 1.65 GHz SPARC T3 processor, 128 hardware threads, 6 MB L2 Cache, 64 GB RAM
  • Web Server + Database Server : 1 x Sun SPARC Enterprise T5240 Server (2 RU system)
    • Two 1.165 GHz UltraSPARC T2 Plus processors (8 cores per socket, 16 cores and 128 hardware threads in total), 4 MB L2 Cache, 64 GB RAM

Virtualization Technology

iPlanet Web Server and the Oracle 11g Database Server were configured on a single Sun SPARC Enterprise T5240 Server. Those software layers were isolated from each other with the help of Oracle Solaris Containers virtualization technology. Resource allocations are shown below.

Tier        #vCPUs    Memory (GB)
Database    96        48
Web         32        16

Test Results

#vUsers: 13,000
Avg Trx Response Time (sec): FINS 0.43, EAI 0.2
Business Trx Throughput/HR: FINS 48,409, EAI 116,449
Avg CPU Utilization (%): App 58, DB 42, Web 37
Avg Memory Footprint (GB): App 52, DB + Web 35

Why stop at 13K users?

Notice that the average CPU utilization on the application server node (SPARC T3-1) is only ~58%. The application server node had room to accommodate more online vusers; however, there was not enough free memory left on the server to scale beyond 13,000 concurrent users. That is the main reason we stopped at the 13,000 user count in this benchmark.

Siebel Best Practices

Check the following presentation:

        Siebel on Oracle Solaris : Best Practices, Tuning Tips

Acknowledgments

Credit to all our peers at Oracle Corporation who helped us with the hardware, workload, verification, validation, etc., in a timely manner. Jenny also deserves special credit for patiently spending an enormous amount of time running the benchmark.

Wednesday Jun 16, 2010

PeopleSoft NA Payroll 500K EE Benchmark on Solaris : The Saga Continues ..

A few clarifications before we start.

Difference between 240K and 500K EE PeopleSoft NA Payroll benchmarks

Not too long ago, Sun published PeopleSoft NA Payroll 240K EE benchmark results with 16 job streams and 8 job streams. First of all, I want to make sure everyone understands that the PeopleSoft NA Payroll 240K and 500K EE benchmarks are two completely different benchmarks. The 240K database model represents a large sized organization, whereas the 500K database model represents an extra-large sized organization. Vendors [who benchmark] have the flexibility of configuring 8, 16, 24 or 32 parallel job streams (or threads) in those two benchmarks to parallelize the work being done.

Now that the clarifications are out of the way, here is the direct URL for the 500K Payroll benchmark results that Oracle|Sun published last week. (The document will be updated shortly to fix the branding issues.)

    PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M5000 (500K EE 32 job streams)

What's changed at Sun & Oracle?

The 500K payroll benchmark work was started a few months ago when Sun was an independent entity. By the time the benchmark work was complete, Sun was part of Oracle Corporation. However, that had no impact whatsoever on the way we have been operating and interacting with the PeopleSoft benchmark team for the past few years. We (Sun) still had to package all the benchmark results and submit them for validation just like any other vendor. It is still the same laborious process from both ends of Oracle (that is, PeopleSoft & Sun). I mention this only to highlight Oracle's uncompromising nature at every level when publishing quality benchmarks.

SUMMARY OF 500K NA PAYROLL BENCHMARK RESULTS

The following bar chart summarizes all the published benchmark results from different vendors. Each 3D bar on the X-axis represents one vendor, and the Y-axis shows the throughput (#payments/hour) achieved by the corresponding vendor. The actual throughput and the vendor name are also shown in each 3D bar for clarity. Common sense dictates that the higher the throughput, the better.

The numbers in the following table were extracted from the very first page of the benchmark results white papers, where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to customers. The results in the following table are sorted by hourly throughput (payments/hour) in descending order. The goal of this benchmark is to achieve as much hourly throughput as possible. Click the link underneath each hourly throughput value to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 500,480 & Number of payments: 750,720

Vendor: Sun -- OS: Solaris 10 10/09 -- #Job Streams: 32 -- Elapsed Time: 50.11 min -- Hourly Throughput: 898,886 payments/hour
  1 x Sun SPARC Enterprise M5000 with 8 x 2.53 GHz SPARC64 VII Quad-Core CPUs & 64 GB RAM
  1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data and indexes (capacity: 960 GB)
  1 x Sun Storage 2510 Array for redo logs (capacity: 272 GB). Total storage capacity: 1.2 TB

Vendor: IBM -- OS: z/OS 1.10 -- #Job Streams: 8* -- Elapsed Time: 58.96 min -- Hourly Throughput: 763,962 payments/hour
  1 x IBM System z10 Enterprise Class Model 2097-709 with 8 x 4.4 GHz IBM System z10 Gen1 CPUs & 32 GB RAM
  1 x IBM TotalStorage DS8300. Total capacity: 9 TB

Vendor: HP -- OS: HP-UX B.11.31 -- #Job Streams: 32 -- Elapsed Time: 96.17 min -- Hourly Throughput: 468,370 payments/hour
  1 x HP Integrity rx7640 with 8 x 1.6 GHz Intel Itanium2 9150 Dual-Core CPUs & 64 GB RAM
  1 x HP StorageWorks EVA 8100. Total capacity: 8 TB

(* See the notes below on IBM's job stream count.)

This is all public information. Feel free to compare the hardware configurations and the data presented in all three rows and draw your own conclusions. Since all vendors used the same benchmark toolkit, the comparison should be pretty straightforward.

Sun Storage F5100 Flash Array, the differentiator

Of all these benchmark results, the F5100 storage array is clearly the key differentiator. The payroll workload is I/O intensive and requires low latency storage for better throughput (implicitly, lower latency means fewer I/O waits).

There was a lot of noise from some blog readers (I do not know who those readers are or who they work for) when Sun published the very first NA Payroll 240K EE benchmark with eight job streams using an F5100 array with 40 flash modules (FMODs). A few people thought that many flash modules were necessary to get the kind of performance that Sun demonstrated in the benchmark. Now that we have the 500K benchmark result as well, I want to highlight the fact that the same F5100 was used in all three NA Payroll benchmarks that Sun published in the last 10 months. Even though other vendors increased the number of disk drives when they moved from the 240K to the 500K EE benchmark environment, Sun did not increase the number of flash modules in the F5100 -- it remained at 40 even in the 500K EE benchmark. This implicitly suggests at least two things: 1. the F5100 array is resilient, and scales and performs consistently even with increased load; 2. maybe 40 flash modules are not even needed in the 240K EE Payroll benchmark. Hopefully this will silence those naysayers and critics now.

While we are on the same topic, the storage capacity of the other array, used to store the redo logs, was in fact reduced: from 5.3 TB in the J4200 array used in the 240K EE/16 job stream benchmark to 272 GB in the 2510 array used in the 500K EE/32 job stream benchmark. Of course, in both cases the redo logs consumed only 20 GB on disk; but since the arrays were connected to the database server, we have to report the total capacity of the array(s) whether it is being used or not.

Notes on CPU utilization and IBM's #job streams

Even though I highlighted the I/O factor in the above paragraphs, it is hard to ignore the fact that the NA Payroll workload is CPU intensive too. Even when multiple job streams are configured, each stream runs as a single-threaded process -- hence it is vital to have a server with powerful processors for better [overall] performance.

Observant readers might have noticed a couple of interesting things.

  1. The maximum average CPU usage that Sun reported in the 500K EE benchmark in any scenario by any process is only 43.99% (less than half of the total processing capacity)

    The reason is simple. The SUT, M5000, has eight quad-core processors and each core is capable of running two hardware threads in parallel. Hence there are 64 virtual CPUs on the system, and since we ran only 32 job streams, only half of the total available CPU power was in use.

    Customers in a similar situation have the flexibility to consolidate another workload onto the same system to take advantage of the available/remaining CPU cycles.

  2. IBM's 500K EE benchmark result is only with 8 job streams

    I do not know the exact reason, but my speculation is as good as anyone's guess. Based on the benchmark results white paper, it appears that the z10 system (mainframe) has eight single-core processors, and perhaps that is why they ran the whole benchmark with only eight job streams.


Friday Mar 26, 2010

2004-2010 : A Look Back at Sun Published Oracle Benchmarks

Now that Sun Microsystems is a legacy, I got the idea of a reminiscent [farewell] blog post for the company that gave me the much needed break when I was a graduate student back in 2002. As I spent more than 50% of my time benchmarking different Oracle products on Sun hardware, it is fitting to fill this blog entry with a recollection of the benchmarks I was actively involved in over the past 6+ years. Without further ado, the list follows.

2004

 1.  10,000 user Siebel 7.5.2 PSPP benchmark on a combination of SunFire v440, v890 and E2900 servers. Database: Oracle 9i

2005

 2.  8,000 user Siebel 7.7 PSPP benchmark on a combination of SunFire v490, v890, T2000 and E2900 servers. Database: Oracle 9i

 3.  12,500 user Siebel 7.7 PSPP benchmark on a combination of SunFire v490, v890, T2000 and E2900 servers. Database: Oracle 9i

2006

 4.  10,000 user Siebel Analytics 7.8.4 benchmark on multiple SunFire T2000 servers. Database: Oracle 10g

2007

 5.  10,000 user Siebel 8.0 PSPP benchmark on two T5220 servers. Database: Oracle 10g R2

2008

 6.  Oracle E-Business Suite 11i Payroll benchmark for 5,000 employees. Database: Oracle 10g R1

 7.  14,000 user Siebel 8.0 PSPP benchmark on a single T5440 server. Database: Oracle 10g R2

 8.  10,000 user Siebel 8.0 PSPP benchmark on a single T5240 server. Database: Oracle 10g R2

2009

 9.  4,000 user PeopleSoft HR Self-Service 8.9 benchmark on a combination of M3000 and T5120 servers. Database: Oracle 10g R2

 10.  28,000 user Oracle Business Intelligence Enterprise Edition (OBIEE) 10.1.3.4 benchmark on a single T5440 server. Database: Oracle 11g R1

 11.  50,000 user Oracle Business Intelligence Enterprise Edition (OBIEE) 10.1.3.4 benchmark on two T5440 servers. Database: Oracle 11g R1

 12.  PeopleSoft North American Payroll 9.0 240K EE 8-stream benchmark on a single M4000 server with F5100 Flash Array storage. Database: Oracle 11g R1

2010

 13.  PeopleSoft North American Payroll 9.0 240K EE 16-stream benchmark on a single M4000 server with F5100 Flash Array storage. Database: Oracle 11g R1

 14.  6,000 user PeopleSoft Campus Solutions 9.0 benchmark on a combination of X6270 blades and M4000 server. Database: Oracle 11g R1


Although challenging and exhilarating, benchmarks aren't always pleasant to work on, and are really not for people with weak hearts. While running most of these benchmarks, my blood pressure shot up several times, leaving me wondering why I keep working on time sensitive and/or politically, strategically incorrect benchmarks (apparently not every benchmark finds a home somewhere on the public network). Nevertheless, in the best interest of my employer, the showdown must go on.

Thursday Feb 18, 2010

PeopleSoft Campus Solutions 9.0 benchmark on Sun SPARC Enterprise M4000 and X6270 blades

Oracle|Sun published PeopleSoft Campus Solutions 9.0 benchmark results today. Here is the direct URL to the benchmark results white paper:

      PeopleSoft Enterprise Campus Solutions 9.0 using Oracle 11g on a Sun SPARC Enterprise M4000 & Sun Blade X6270 Modules

Sun published three PeopleSoft benchmarks on the SPARC platform over the last 12-month period -- one OLTP and two batch benchmarks[1]. The latest benchmark is somewhat special for at least a couple of reasons:

  • Campus Solutions 9.0 workload has both online transactions and batch processes, and
  • This is the very first time Sun published a PeopleSoft benchmark on x64 hardware running Oracle Enterprise Linux

The summary of the benchmark test results is shown below. These numbers were extracted from the very first page of the benchmark results white papers, where Oracle|PeopleSoft highlights the significance of the test results and the actual numbers that are of interest to customers. Test results are sorted by hourly throughput (invoices & transcripts per hour) in descending order. Click the link underneath each vendor name to open the corresponding benchmark result.

While analyzing these test results, remember that the higher the throughput, the better. In the case of online transactions, it is desirable to keep the response times as low as possible.

(A prettier version of the following table is at:
        Oracle PeopleSoft Campus Solutions 9.0 Benchmark Test Results
)

Oracle PeopleSoft Campus Solutions 9.0 Benchmark Test Results

Response/elapsed times at peak load (6,000 users); per-tier CPU% and memory footprint are shown for each server.

Vendor: Sun -- Online Avg Response Times (sec): Logon 0.64, LSC 0.78, Page Load 0.82, Page Save 1.57 -- Batch Throughput/hr: Invoice 31,797, Transcripts 36,652
  DB: 1 x M4000 with 2 x 2.53 GHz SPARC64 VII QC processors, 32 GB RAM; 1 x Sun Storage Flash Accelerator F20 with 4 x 24 GB FMODs; 1 x ST2540 array with 11 x 136.7 GB SAS 15K RPM drives -- Solaris 10 -- CPU: 37.29%, Mem: 20.94 GB
  APP: 2 x X6270 blades with 2 x 2.93 GHz Xeon 5570 QC processors, 24 GB RAM -- OEL4 U8 -- CPU: 41.69%*, Mem: 4.99 GB*
  WEB+PS: 1 x X6270 blade with 2 x 2.8 GHz Xeon 5560 QC processors, 24 GB RAM -- OEL4 U8 -- CPU: 33.08%, Mem: 6.03 GB

Vendor: HP -- Online Avg Response Times (sec): Logon 0.71, LSC 0.91, Page Load 0.83, Page Save 1.63 -- Batch Throughput/hr: Invoice 22,753, Transcripts 30,257
  DB: 1 x Integrity rx6600 with 4 x 1.6 GHz Itanium 9050 DC processors, 32 GB RAM; 1 x HP StorageWorks EVA8100 array with 58 x 146 GB drives -- HP-UX 11iv3 -- CPU: 61%, Mem: 30 GB
  APP: 2 x BL460c blades with 2 x 3.16 GHz Xeon 5460 QC processors, 16 GB RAM -- RHEL4 U5 -- CPU: 61.81%, Mem: 3.6 GB
  WEB: 1 x BL460c blade with 2 x 3 GHz Xeon 5160 DC processors, 8 GB RAM -- RHEL4 U5 -- CPU: 44.36%, Mem: 3.77 GB
  PS: 1 x BL460c blade with 2 x 3 GHz Xeon 5160 DC processors, 8 GB RAM -- RHEL4 U5 -- CPU: 21.90%, Mem: 1.48 GB

Vendor: HP -- Online Avg Response Times (sec): Logon 0.72, LSC 1.17, Page Load 0.94, Page Save 1.80 -- Batch Throughput/hr: Invoice 17,621, Transcripts 25,423
  DB: 1 x ProLiant DL580 G4 with 4 x 3.4 GHz Xeon 7140M DC processors, 32 GB RAM; 1 x HP StorageWorks XP128 array with 28 x 73 GB drives -- Win2003 R2 -- CPU: 70.37%, Mem: 21.26 GB
  APP: 4 x BL480c G1 blades with 2 x 3 GHz Xeon 5160 DC processors, 12 GB RAM -- Win2003 R2 -- CPU: 65.61%, Mem: 2.17 GB
  WEB: 1 x BL460c G1 blade with 2 x 3 GHz Xeon 5160 DC processors, 12 GB RAM -- Win2003 R2 -- CPU: 54.11%, Mem: 3.13 GB
  PS: 1 x BL460c G1 blade with 2 x 3 GHz Xeon 5160 DC processors, 12 GB RAM -- Win2003 R2 -- CPU: 32.44%, Mem: 1.40 GB

This is all public information. Feel free to compare the hardware configurations and the data presented in the table and draw your own conclusions. Since both Sun and HP used the same benchmark workload and toolkit, and ran the benchmark with the same number of concurrent users and job streams for the batch processes, the comparison should be pretty straightforward.

Hopefully the following paragraphs will provide relevant insights into the benchmark and the application.

Caution in interpreting the Online Transaction Response Times

Average response times for the online transactions were measured using HP's QuickTest Pro (QTP) tool; this is a benchmark requirement. QTP test scripts have a dependency on the web browser (IE in particular), hence they are extremely sensitive to web browser latencies, remote desktop/VNC latencies and other latencies induced by the operating system. Be aware that all these latencies are factored into the transaction response times, so the final average transaction response times might be skewed a little. In other words, the reported average transaction response times may not be very accurate; in most cases we might be looking at approximate values, and the actual values might be better than the ones reported in the benchmark report. (I really wish Oracle|PeopleSoft would throw away some of the skewed samples to make the data more accurate and reliable.) Please keep this in mind when looking at the response times of the online transactions.

Quick note about Consolidation

In our benchmark environment, we had the PeopleSoft Process Scheduler (batch server) set up on the same node as the web server. In general, Oracle recommends setting up the process scheduler either on the database server node or on a dedicated system. However, in the benchmark environment we chose not to run the process scheduler on the database server node, as it would hurt the performance of the online transactions. At the same time, we noticed plenty of idle CPU cycles on the web server node even at the peak load of 6,000 concurrent users, so we decided to run the process scheduler on the web server node. If customers are not comfortable with this kind of setup, they can use any supported virtualization technology (e.g., Logical Domains or Containers on Solaris, Oracle VM on OEL) to separate the process scheduler from the web server, allocating system resources as they like. It is just a matter of choice.

PeopleSoft Load Balancing

PeopleSoft has a load balancing mechanism built into the web server to forward incoming requests to an appropriate application server in the enterprise, and within the application server to send each request to an appropriate application server process, PSAPPSRV. (I'm not 100% sure, but I think the application server balances the load among application server processes in a round-robin fashion on *nix platforms, whereas on Windows it forwards all requests to a single application server process until that process reaches the configured limit before moving on to the next available one.) However, this built-in load balancing is not perfect. Most of the time, the number of requests processed by each of the identically configured application server processes [running on different application server nodes in the enterprise] may not be even. This minor shortcoming can lead to uneven resource usage across the nodes in a PeopleSoft deployment. You can notice this in the CPU and memory usage reported for the two app server nodes in the benchmark environment (check the benchmark results white paper).
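As a toy illustration (entirely hypothetical, not PeopleSoft code) of why an even request count does not guarantee even resource usage, the following simulation hands each process the same number of requests in round-robin order, yet the accumulated work drifts apart because individual request costs vary:

    import itertools
    import random

    processes = ['PSAPPSRV_1', 'PSAPPSRV_2', 'PSAPPSRV_3', 'PSAPPSRV_4']
    work = {p: 0.0 for p in processes}
    rr = itertools.cycle(processes)

    # Round robin equalizes the request count per process, but the cost
    # of each request varies, so total work per process diverges.
    random.seed(7)
    for _ in range(10_000):
        work[next(rr)] += random.expovariate(1.0)  # random request cost

    for proc, total in work.items():
        print(proc, round(total))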

Sun Flash Accelerator F20 PCIe Card

To reduce I/O latency, hot tables and hot indexes were placed on a Sun Flash Accelerator F20 PCIe Card in this benchmark. The F20 card has a total capacity of 96 GB with 4 x 24 GB Flash Modules (FMODs). Although this workload is only moderately I/O intensive, the batch processes generate a burst of I/O for a few minutes in the steady state of the benchmark. The flash accelerator handled that burst of I/O activity pretty well, and as a result the performance of the batch processing improved.

Check the white paper Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card to learn more about the top flash products offered by Oracle|Sun and how they can be deployed in a PeopleSoft environment for maximum benefit.

Solaris specific Tuning

On almost all versions of Solaris 10, the kernel uses 4M as the maximum page size despite the fact that the underlying hardware supports pages as large as 256M. Large pages may improve the performance of memory intensive workloads, such as the Oracle database, by reducing the number of virtual-to-physical address translations and thereby the expensive dTLB/iTLB misses. In the benchmark environment, the following values were set in the /etc/system configuration file of the database server node to enable 256M pages for the process heap and ISM.


* 256M (0x10000000 bytes) pages for process heap
set max_uheap_lpsize=0x10000000

* 256M (0x10000000 bytes) pages for ISM
set mmu_ism_pagesize=0x10000000

While we are on the same topic: the Linux configuration was out-of-the-box; no OS tuning was performed on the Linux nodes in this benchmark.

Tuning Tip for Solaris Customers

Even though we did not set up the middle-tier on a Solaris box in this benchmark, this particular tuning tip is still valid and may help customers running the application server on Solaris. Consider lowering the shell limit for file descriptors to 512 or less if it is set to any value greater than 512. As of today (until the release of PeopleTools 8.50), certain parts of the PeopleSoft code call the file control routine, fcntl(), and the file close routine, fclose(), in a loop "ulimit -n" times to close a bunch of files that were opened to perform a specific task. In general, PeopleSoft processes won't open hundreds of files, so this behavior results in a ton of dummy calls that error out. Those system calls are not cheap: they consume CPU cycles, and it gets worse when a number of PeopleSoft processes exhibit this behavior simultaneously. (A high system CPU% is one of the symptoms that helps identify this behavior.) Oracle|PeopleSoft is currently trying to address this performance issue. Meanwhile, customers can lower the file descriptor shell limit to reduce its intensity and impact.
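To see why a high descriptor limit magnifies the cost, here is a minimal Python sketch of the pattern described above (an illustration, not PeopleSoft code): it attempts to close every descriptor up to the soft limit, open or not, so nearly every call is wasted work:

    import os
    import resource

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    closed = failed = 0
    for fd in range(3, soft):          # skip stdin/stdout/stderr
        try:
            os.close(fd)
            closed += 1
        except OSError:
            failed += 1                # fd was not open; the call still burned CPU

    print('limit=%d: %d real closes, %d wasted close() calls'
          % (soft, closed, failed))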

We did not observe this behavior on OEL while running this benchmark. But be sure to trace the system calls and figure out whether the shell limit for file descriptors needs to be lowered on Linux or other supported platforms as well.


______________________________________

Footnotes:

1. PeopleSoft benchmarks on Sun platform in year 2009-2010

  1. PeopleSoft HRMS 8.9 SELF-SERVICE Using ORACLE on Sun SPARC Enterprise M3000 and Enterprise T5120 Servers -- online transactions (OLTP)
  2. PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (8 streams) -- batch workload
  3. PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (16 streams) -- batch workload



2. *HP's benchmark results white paper did not show the CPU and memory breakdown numbers separately for each of the application server nodes. It only shows the average CPU and memory utilization for all app server nodes under "App Servers". Sun's average CPU and memory numbers [shown in the above table] were calculated in the same way for consistency.

Wednesday Jan 20, 2010

PeopleSoft NA Payroll 240K EE Benchmark with 16 Job Streams : Another Home Run for Sun

Poor Steve A.[1] ... This entry is not about Steve A., though. It is about the new PeopleSoft NA Payroll benchmark result that Sun published today.

First things first. Here is the direct URL to our latest benchmark results:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (16 job streams[2] -- simply referred to as 'streams' from here on)

The summary of the benchmark test results is shown below, only for the 16 stream benchmarks. These numbers were extracted from the very first page of the benchmark results white papers, where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to customers. The results in the following table are sorted by hourly throughput (payments/hour) in descending order. The goal is to achieve as much hourly throughput as possible. Click the link underneath each hourly throughput value to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000

Vendor: Sun -- OS: Solaris 10 5/09 -- #Job Streams: 16 -- Elapsed Time: 43.78 min -- Hourly Throughput: 493,376 payments/hour
  1 x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory
  1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data and indexes
  1 x Sun Storage J4200 Array for redo logs

Vendor: HP -- OS: HP-UX -- #Job Streams: 16 -- Elapsed Time: 68.07 min -- Hourly Throughput: 317,320 payments/hour
  1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory
  1 x HP StorageWorks EVA 8100

This is all public information. Feel free to compare the hardware configurations and the data presented in both rows and draw your own conclusions. Since both Sun and HP used the same benchmark toolkit and workload, and ran the benchmark with the same number of job streams, the comparison should be pretty straightforward.

If you want to compare the 8 stream results, check the other blog entry: PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise. Sun used the same hardware to run both benchmark tests, with 8 and 16 streams respectively. We could have gotten away with 20+ Flash Modules (FMODs), but we wanted to keep the benchmark environment consistent with our prior benchmark effort around the same workload with 8 job streams. Because of the identical hardware setup, we can now easily demonstrate the advantage of parallelism (simply by comparing the test results from the 8 and 16 stream benchmarks) and how resilient and scalable the F5100 Flash Array is.

Our benchmarks showed an improvement of ~55% in overall throughput when the number of job streams was increased from 8 to 16. Our 16 stream results also showed ~55% higher overall throughput than HP's published results with the same number of streams, at a maximum average CPU utilization of 45% compared to HP's 89%. The half populated Sun Storage F5100 Flash Array played the key role in both benchmark efforts, demonstrating superior I/O performance over traditional disk based arrays.

Before concluding, I would like to highlight a few known facts (just for the benefit of those people who may fall for the PR trickery):

  1. 8 job streams != 16 job streams. In other words, the results from an 8 stream effort are not comparable to those of a 16 stream effort.
  2. Throughput should go up as the number of job streams is increased [only up to a point -- there is a saturation point for everything]. For example, the throughput with 16 streams should be higher than the 8 stream throughput.
  3. The Law of Diminishing Returns applies to the software world too, not just economics. So there is no guarantee that the throughput will be much better with 24 or 32 job streams.

Other blog posts and documents of interest:

  1. Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card
  2. PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (8 streams benchmark white paper)
  3. PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise
  4. App benchmarks, incorrect conclusions and the Sun Storage F5100
  5. Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance

Notes:

[1] Steve A. tried his best to make everyone else believe that HP's 16 job stream NA Payroll 240K EE benchmark results were on par with Sun's 8 stream results. Apparently Steve A. failed and gave up after we showed the world a few screenshots from a published, and eventually withdrawn, benchmark [by HP]. You can read all his arguments, comparisons, etc., in the comments section of my other blog entry, PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise, as well as in Joerg Moellenkamp's blog entries around the same topic.

[2] In PeopleSoft terminology, a job stream is roughly equivalent to a thread.
