Friday Oct 08, 2010

Is it really Solaris versus Windows & Linux?

(Even though the title explicitly says "Solaris Versus ..", this blog entry is equally applicable to any operating system with a few changes.)

Lately I have seen quite a few e-mails and heard a few customer representatives talking about the performance of their application(s) on Solaris, Windows and Linux. Typically those reports go like the following, with a bunch of supporting numbers and no hardware configuration specified whatsoever.

  • "Transaction X is nearly twice as slow on Solaris compared to the same transaction running on Windows or Linux"
  • "Transaction X runs much faster on my Windows laptop than on a Solaris box"

Lack of awareness, and taking the hardware completely out of the discussion and context, are the biggest problems with complaints like these. Those claims make sense only when the underlying hardware is the same in all test cases. For example, comparing a single-user, single-threaded transaction running on Windows, Linux and Solaris on x86 hardware is appropriate (as long as the type and speed of the processors are identical), but not against Solaris running on SPARC hardware. This is mainly because the processor architectures of the x86 and SPARC platforms are completely different.

Besides, these days Oracle offers two types of SPARC hardware -- T-series and M-series -- which serve different purposes even though they are compatible with each other. It is also hard to compare and analyze the performance differences between the different SPARC offerings (T- and M-series) without a proper understanding of the characteristics of the CPUs in use. Choosing the right hardware for the right job is the key.

It is improper to compare business transactions running on x86 systems with those running on SPARC systems, or even between different types of SPARC systems, and to attribute the hardware's strengths or weaknesses to the operating system that runs on top of the bare metal. If there is that much discrepancy among different operating environments, it is worth spending some time understanding the differences in the test hardware before spending enormous amounts of time trying to tune the application and the operating system.

The bottom line: in addition to the software (application + OS), hardware plays an important role in the performance and scalability of an application. So, unless the test hardware is identical for all test cases on the different operating systems, do not focus on the operating system alone and make hasty decisions to switch to another operating platform. Carefully choose appropriate hardware for the task at hand.

Thursday Sep 23, 2010

OOW 2010 : Accelerate and Bullet-Proof Your Siebel CRM Deployment with Oracle's Sun Servers

The best practices slides from today's OpenWorld presentation can be downloaded from the following location.

        Siebel on Oracle Solaris : Best Practices, Tuning Tips

The entire presentation with proper disclaimers and Oracle Solaris Cluster specific slides will be posted on Oracle's web site soon. Stay tuned.

Wednesday Jun 16, 2010

PeopleSoft NA Payroll 500K EE Benchmark on Solaris : The Saga Continues ..

A few clarifications before we start.

Difference between 240K and 500K EE PeopleSoft NA Payroll benchmarks

Not too long ago Sun published PeopleSoft NA Payroll 240K EE benchmark results with 16 job streams and 8 job streams. First of all, I want to make sure everyone understands that the PeopleSoft NA Payroll 240K and 500K EE benchmarks are two completely different benchmarks. The 240K database model represents a large-sized organization, whereas the 500K database model represents an extra-large-sized organization. Vendors [who benchmark] have the flexibility of configuring 8, 16, 24 or 32 parallel job streams (or threads) in those two benchmarks to parallelize the work being done.

Now that the clarifications are out of the way, here is the direct URL for the 500K Payroll benchmark results that Oracle|Sun published last week. (The document will be updated shortly to fix the branding issues.)

    PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M5000 (500K EE 32 job streams)

What's changed at Sun & Oracle?

The 500K payroll benchmark work was started a few months ago when Sun was an independent entity. By the time the benchmark work was complete, Sun was part of Oracle Corporation. However, the acquisition had no impact whatsoever on the way we have been operating and interacting with the PeopleSoft benchmark team for the past few years. We (Sun) still had to package all the benchmark results and submit them for validation just like any other vendor. It is still the same laborious process that we have to go through from both ends of Oracle (that is, PeopleSoft & Sun). I mention this only to highlight Oracle's uncompromising standards, at every level, for publishing quality benchmarks.

SUMMARY OF 500K NA PAYROLL BENCHMARK RESULTS

The following bar chart summarizes all the benchmark results published by different vendors. Each 3D bar on the X-axis represents one vendor, and the Y-axis shows the throughput (#payments/hour) achieved by the corresponding vendor. The actual throughput and the vendor name are also shown in each 3D bar for clarity. The higher the throughput, the better.

The numbers in the following table were extracted from the very first page of the benchmark results white papers, where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to customers. The results in the following table are sorted by hourly throughput (payments/hour) in descending order. The goal of this benchmark is to achieve as much hourly throughput as possible. Click on the link underneath the hourly throughput values to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 500,480 & Number of payments: 750,720

Sun
    OS: Solaris 10 10/09
    Hardware: 1 x Sun SPARC Enterprise M5000 with 8 x 2.53 GHz SPARC64 VII Quad-Core CPUs & 64G RAM
    Storage: 1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes (capacity: 960 GB); 1 x Sun Storage 2510 Array for redo logs (capacity: 272 GB). Total storage capacity: 1.2 TB
    #Job Streams: 32
    Elapsed Time: 50.11 min
    Hourly Throughput: 898,886 payments/hour

IBM
    OS: z/OS 1.10
    Hardware: 1 x IBM System z10 Enterprise Class Model 2097-709 with 8 x 4.4 GHz IBM System z10 Gen1 CPUs & 32G RAM
    Storage: 1 x IBM TotalStorage DS8300. Total capacity: 9 TB
    #Job Streams: 8*
    Elapsed Time: 58.96 min
    Hourly Throughput: 763,962 payments/hour

HP
    OS: HP-UX B.11.31
    Hardware: 1 x HP Integrity rx7640 with 8 x 1.6 GHz Intel Itanium2 9150 Dual-Core CPUs & 64G RAM
    Storage: 1 x HP StorageWorks EVA 8100. Total capacity: 8 TB
    #Job Streams: 32
    Elapsed Time: 96.17 min
    Hourly Throughput: 468,370 payments/hour

This is all public information. Feel free to compare the hardware configurations and the data presented for all three vendors and draw your own conclusions. Since all vendors used the same benchmark toolkit, comparisons should be pretty straightforward.

Sun Storage F5100 Flash Array, the differentiator

Across all these benchmark results, the F5100 storage array is clearly the key differentiator. The payroll workload is I/O intensive and requires low-latency storage for better throughput (lower latency implicitly means less time spent waiting on I/O).

There was a lot of noise from some blog readers (I do not know who those readers are or who they work for) when Sun published the very first NA Payroll 240K EE benchmark with eight job streams using an F5100 array with 40 flash modules (FMODs). A few people questioned whether it was really necessary to have that many flash modules to get the kind of performance that Sun demonstrated in the benchmark. Now that we have the 500K benchmark result as well, I want to highlight another fact: it is the same F5100 that was used in all three NA Payroll benchmarks that Sun published in the last 10 months. Even though other vendors increased the number of disk drives when they moved from the 240K to the 500K EE benchmark environment, Sun did not add any flash modules to the F5100 -- the number of FMODs remained at 40 even in the 500K EE benchmark. This implicitly suggests at least two things: 1. the F5100 array is resilient, and scales and performs consistently even with increased load; 2. maybe 40 flash modules are not needed for the 240K EE Payroll benchmark. Hopefully this will silence those naysayers and critics.

While we are on the same topic, the capacity of the other array, which was used to store the redo logs, was in fact reduced: from 5.3 TB in a J4200 array in the 240K EE/16 job stream benchmark to 272 GB in a 2510 array in the 500K EE/32 job stream benchmark. Of course, in both cases the redo logs consumed only 20 GB on disk -- but since the arrays were connected to the database server, we have to report the total capacity of the array(s) whether it is being used or not.

Notes on CPU utilization and IBM's #job streams

Even though I highlighted the I/O factor in the above paragraph, it is hard to ignore the fact that the NA Payroll workload is CPU intensive too. Even when multiple job streams are configured, each stream runs as a single-threaded process -- hence it is vital to have a server with powerful processors for better [overall] performance.

Observant readers might have noticed a couple of interesting things.

  1. The maximum average CPU usage that Sun reported in the 500K EE benchmark, in any scenario and by any process, is only 43.99% (less than half of the total processing capacity).

    The reason is simple. The SUT, M5000, has eight quad-core processors and each core is capable of running two hardware threads in parallel. Hence there are 64 virtual CPUs on the system, and since we ran only 32 job streams, only half of the total available CPU power was in use.

    Customers in a similar situation have the flexibility to consolidate another workload onto the same system to take advantage of the available/remaining CPU cycles.

  2. IBM's 500K EE benchmark result is only with 8 job streams

    I do not know the exact reason, so any speculation on my part is as good as anyone else's guess. Based on the benchmark results white paper, it appears that the z10 system (mainframe) has eight single-core processors, and perhaps that is why they ran the whole benchmark with only eight job streams.


Friday Mar 26, 2010

2004-2010 : A Look Back at Sun Published Oracle Benchmarks

Since Sun Microsystems is now a legacy company, I got the idea of writing a reminiscent [farewell] blog post for the company that gave me the much-needed break when I was a graduate student back in 2002. As I spent more than 50% of my time benchmarking different Oracle products on Sun hardware, it seemed fitting to fill this blog entry with a recollection of the benchmarks I was actively involved in over the past 6+ years. Without further ado, the list follows.

2004

 1.  10,000 user Siebel 7.5.2 PSPP benchmark on a combination of SunFire v440, v890 and E2900 servers. Database: Oracle 9i

2005

 2.  8,000 user Siebel 7.7 PSPP benchmark on a combination of SunFire v490, v890, T2000 and E2900 servers. Database: Oracle 9i

 3.  12,500 user Siebel 7.7 PSPP benchmark on a combination of SunFire v490, v890, T2000 and E2900 servers. Database: Oracle 9i

2006

 4.  10,000 user Siebel Analytics 7.8.4 benchmark on multiple SunFire T2000 servers. Database: Oracle 10g

2007

 5.  10,000 user Siebel 8.0 PSPP benchmark on two T5220 servers. Database: Oracle 10g R2

2008

 6.  Oracle E-Business Suite 11i Payroll benchmark for 5,000 employees. Database: Oracle 10g R1

 7.  14,000 user Siebel 8.0 PSPP benchmark on a single T5440 server. Database: Oracle 10g R2

 8.  10,000 user Siebel 8.0 PSPP benchmark on a single T5240 server. Database: Oracle 10g R2

2009

 9.  4,000 user PeopleSoft HR Self-Service 8.9 benchmark on a combination of M3000 and T5120 servers. Database: Oracle 10g R2

 10.  28,000 user Oracle Business Intelligence Enterprise Edition (OBIEE) 10.1.3.4 benchmark on a single T5440 server. Database: Oracle 11g R1

 11.  50,000 user Oracle Business Intelligence Enterprise Edition (OBIEE) 10.1.3.4 benchmark on two T5440 servers. Database: Oracle 11g R1

 12.  PeopleSoft North American Payroll 9.0 240K EE 8-stream benchmark on a single M4000 server with F5100 Flash Array storage. Database: Oracle 11g R1

2010

 13.  PeopleSoft North American Payroll 9.0 240K EE 16-stream benchmark on a single M4000 server with F5100 Flash Array storage. Database: Oracle 11g R1

 14.  6,000 user PeopleSoft Campus Solutions 9.0 benchmark on a combination of X6270 blades and M4000 server. Database: Oracle 11g R1


Although challenging and exhilarating, benchmarks aren't always pleasant to work on, and they are really not for the faint of heart. While running most of these benchmarks, my blood pressure shot up several times, leaving me wondering why I keep working on time-sensitive and/or politically or strategically inconvenient benchmarks (apparently not every benchmark finds a home somewhere on the public network). Nevertheless, in the best interest of my employer, the showdown must go on.

Wednesday Jan 20, 2010

PeopleSoft NA Payroll 240K EE Benchmark with 16 Job Streams : Another Home Run for Sun

Poor Steve A.[1] ... This entry is not about Steve A. though. It is about the new PeopleSoft NA Payroll benchmark result that Sun published today.

First things first. Here is the direct URL to our latest benchmark results:

        PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (16 job streams[2] -- simply referred to as 'streams' from here onwards)

The summary of the benchmark test results is shown below, only for the 16 stream benchmarks. These numbers were extracted from the very first page of the benchmark results white papers, where Oracle|PeopleSoft highlights the significance of the results and the actual numbers that are of interest to customers. The results in the following table are sorted by hourly throughput (payments/hour) in descending order. The goal is to achieve as much hourly throughput as possible. Click on the link underneath the hourly throughput values to open the corresponding benchmark result.

Oracle PeopleSoft North American Payroll 9.0 - Number of employees: 240,000 & Number of payments: 360,000

Sun
    OS: Solaris 10 5/09
    Hardware: 1 x Sun SPARC Enterprise M4000 with 4 x 2.53 GHz SPARC64-VII Quad-Core processors and 32 GB memory
    Storage: 1 x Sun Storage F5100 Flash Array with 40 Flash Modules for data, indexes; 1 x Sun Storage J4200 Array for redo logs
    #Job Streams: 16
    Elapsed Time: 43.78 min
    Hourly Throughput: 493,376 payments/hour

HP
    OS: HP-UX
    Hardware: 1 x HP Integrity rx6600 with 4 x 1.6 GHz Intel Itanium2 9000 Dual-Core processors and 32 GB memory
    Storage: 1 x HP StorageWorks EVA 8100
    #Job Streams: 16
    Elapsed Time: 68.07 min
    Hourly Throughput: 317,320 payments/hour

This is all public information. Feel free to compare the hardware configurations and the data presented for both vendors and draw your own conclusions. Since both Sun and HP used the same benchmark toolkit and workload, and ran the benchmark with the same number of job streams, the comparison should be pretty straightforward.

If you want to compare the 8 stream results, check the other blog entry: PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise. Sun used the same hardware to run both benchmark tests, with 8 and 16 streams respectively. We could have gotten away with 20+ Flash Modules (FMODs), but we wanted to keep the benchmark environment consistent with our prior benchmark effort around the same workload with 8 job streams. Because the hardware setup is the same, we can now easily demonstrate the advantage of parallelism (simply by comparing the test results from the 8 and 16 stream benchmarks) and how resilient and scalable the F5100 Flash Array is.

Our benchmarks showed an improvement of ~55% in overall throughput when the number of job streams was increased from 8 to 16. Also, our 16 stream results showed a ~55% improvement in overall throughput over HP's published results with the same number of streams, at a maximum average CPU utilization of 45% compared to HP's maximum average CPU utilization of 89%. The half-populated Sun Storage F5100 Flash Array played the key role in both of those benchmark efforts by demonstrating superior I/O performance over traditional disk-based arrays.

Before concluding, I would like to highlight a few known facts (just for the benefit of those people who may fall for the PR trickery):

  1. 8 job streams != 16 job streams. In other words, the results from an 8 stream effort are not comparable to those from a 16 stream effort.
  2. The throughput should go up with an increased number of job streams [ only up to a point -- do not forget that there is a saturation point for everything ]. For example, the throughput with 16 streams is expected to be higher than the throughput with 8 streams.
  3. The Law of Diminishing Returns applies to the software world too, not just to economics. So, there is no guarantee that the throughput will be much better with 24 or 32 job streams.

Other blog posts and documents of interest:

  1. Best Practices for Oracle PeopleSoft Enterprise Payroll for North America using the Sun Storage F5100 Flash Array or Sun Flash Accelerator F20 PCIe Card
  2. PeopleSoft Enterprise Payroll 9.0 using Oracle for Solaris on a Sun SPARC Enterprise M4000 (8 streams benchmark white paper)
  3. PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise
  4. App benchmarks, incorrect conclusions and the Sun Storage F5100
  5. Oracle PeopleSoft Payroll (NA) Sun SPARC Enterprise M4000 and Sun Storage F5100 World Record Performance

Notes:

[1] Steve A. tried his best to make everyone else believe that HP's 16 job stream NA Payroll 240K EE benchmark results were on par with Sun's 8 stream benchmark results. Apparently Steve A. failed and gave up after we showed the world a few screenshots from a published and eventually withdrawn benchmark [ by HP ]. You can read all his arguments, comparisons, etc. in the comments section of my other blog entry PeopleSoft North American Payroll on Sun Solaris with F5100 Flash Array : A blog Reprise, as well as in Joerg Moellenkamp's blog entries on the same topic.

[2] In PeopleSoft terminology, a job stream is something that is equivalent to a thread.

Friday Oct 09, 2009

Sun achieves the Magic Number 50,000 on T5440 with Oracle Business Intelligence EE 10.1.3.4

Less than two months ago, Sun Microsystems published an Oracle Business Intelligence benchmark with the best single system performance of 28,000 concurrent BI EE users at ~75% CPU utilization. Sun and Oracle Corporation announced another Oracle Business Intelligence benchmark result today with two identical T5440 servers in the Oracle BI Cluster serving 50,000 concurrent BI EE users.

An Oracle white paper with Sun's 50,000 user benchmark results can be accessed from Oracle's Business Intelligence web.

The hardware specifications for each of the T5440s are similar to the hardware that was used in the prior benchmark effort on a single T5440 server. However, this time the Presentation Catalog (also frequently referred to as the Web Catalog) was moved to a T5220 server where the NFS server was running. Besides this, the only other change from the earlier 28,000 user benchmark exercise is the addition of another T5440 to the test rig.

The following graph shows the scalability of the application from one node to four nodes to eight nodes running on T5440 servers.

OBIEE on T5440 : Scalability Graph

Without further ado, here is the summary of the benchmark results along with their significance and some interesting facts:

  • One of the major goals of this benchmark effort is to show the horizontal and vertical scalability of the application (OBIEE) by highlighting the superior performance and resilience of the underlying hardware (T5440) and operating system (Solaris). Needless to say, the goal has been met.

  • Another goal of this benchmark is to show a decent number of concurrent BI EE users executing transactions with good response times. Since we already showed the maximum load that can be achieved on a single BI instance (7,500 users) and on a single T5440 server running multiple BI instances (28,000 users), this time we did not attempt to reach the peak number that can be achieved with the two T5440 servers in the benchmark environment. Given that an additional server in the test setup takes care of the Presentation Catalog and the database, 2 x 28,000 = 56,000 BI EE users would have been an achievable target -- but we opted to stop at the "magic" and "respectable" number 50,000 instead.

  • The entire benchmark run lasted about 9 hours 45 minutes, of which 8 hours were ramp-up, during which the 50,000 BI virtual users logged into the application a few users at a time. The LoadRunner tool reported only 4 errors for the entire duration of the run, and there were zero errors in the 60 minute steady state period during which the statistics reported in the document were collected.

  • Two Sun SPARC Enterprise T5440 servers each with 4 x 8-Core 1.6 GHz UltraSPARC T2 Plus processors delivered the best performance of 50,000 concurrent BI EE users at around 63% CPU utilization.

  • The BI EE Cluster was deployed on two T5440 servers running Solaris 10 5/09 operating system. All the nodes in the BI Cluster were consolidated onto two T5440 servers using the free and efficient Solaris Containers virtualization technology.

  • The Presentation Catalog was hosted on a ZFS file system created on top of four internal Solid State Drives (SSDs). The Catalog was shared among all eight BI nodes in the cluster as an NFS share. A T5220 server powered by one 8-Core 1.2 GHz UltraSPARC T2 processor was used to run the NFS server. Due to the minimal database activity, the Oracle 11g database was also hosted on the same server, running the Solaris 10 5/09 operating system.

  • Solid State Drives (SSDs) with the ZFS file system showed a significant I/O performance improvement over traditional disks for the Presentation Catalog activity. In addition, ZFS helped get past the UFS limitation of 32,767 sub-directories in a Presentation Catalog directory.

  • Caching was turned ON at the application server, which led to minimal database activity on the server. Note that the caching mechanism was turned ON in the prior benchmark exercise as well.

  • The low-end CoolThreads CMT server T5220 and the mid-range T5440 server once again proved to be ideal candidates for deploying and running multi-threaded workloads, exhibiting resilient performance when handling a large number of simultaneous requests from 50,000 BI EE virtual users. The T5220 handled a large number of concurrent asynchronous read/write requests from eight different NFS clients.

  • NFS v3 was configured at the NFS server as well as at the NFS client nodes. NFS version 4 is the default on Solaris 10, and it might have worked as expected. However, a handful of bug reports prompted us to go with the more mature and less buggy version 3.

  • 3,283 watts was the average power consumption while all 50,000 concurrent BI users were in the steady state of the benchmark test. That is, for similarly configured workloads, the T5440 server supports about 15.2 users per watt of power consumed and about 5,000 users per rack unit.

  • A summary of the results with system-wide averages of CPU and memory utilization is shown below. The latest (50,000 user) results are in the last row.

    #Vusers  Clustered  #BI Nodes  #CPU  #Core  RAM     CPU (avg)  Memory (avg)  Avg Trx Response Time  #Trx/sec
     7,500   No          1          1      8     32 GB  72.85%      18.11 GB     0.22 sec                 155
    28,000   Yes         4          4     32    128 GB  75.04%      76.16 GB     0.25 sec                 580
    50,000   Yes         8          8     64    256 GB  63.32%     172.21 GB     0.28 sec                1031

TOPOLOGY DIAGRAM

The topology diagram in the benchmark results white paper is almost illegible. Here is the original topology diagram that was inserted into the white paper.

OBIEE on T5440 : 50K User Benchmark Topology

Quite frankly, I'm not very proud of this drawing -- but that's the best I could come up with in a short span of time. Rather than showing the flow of communication between each and every component in the benchmark setup, I simplified the drawing by introducing a "black box" of sorts -- a "private network" -- in the middle, which kept the drawing from getting messy.


CPU USAGE GRAPH

The following two-dimensional graph shows the CPU utilization patterns at all three nodes in the benchmark setup during the 60 minute steady state of the benchmark run. This graph was generated using the free gnuplot tool with sar data as the input (a rough sketch of the steps involved follows the graph).

OBIEE on T5440 : 50K User Benchmark CPU Usage Graph
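
For those curious about the mechanics, here is a rough sketch of how such a CPU utilization graph can be produced from sar data. The file names, sampling interval and the exact awk/gnuplot details are illustrative assumptions; the graph above was not necessarily generated with this exact script.

    # collect CPU utilization every 60 seconds for an hour (on each node)
    sar -u -o /var/tmp/sar_node1.bin 60 60

    # extract %usr + %sys for each sample into a two-column data file
    sar -u -f /var/tmp/sar_node1.bin | awk '$1 ~ /:/ && $2 ~ /^[0-9]/ {n++; print n, $2 + $3}' > node1_cpu.dat

    # plot the extracted data with gnuplot
    gnuplot <<'EOF'
    set terminal png
    set output "node1_cpu.png"
    set xlabel "sample (1 minute apart)"
    set ylabel "CPU utilization (%usr + %sys)"
    plot "node1_cpu.dat" using 1:2 with lines title "node 1"
    EOF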

COMPETITIVE LANDSCAPE

And finally, here is a quick summary of all the results published by different vendors so far with a similar benchmark kit. Feel free to draw your own conclusions. All of this is public information. Check the corresponding benchmark reports by clicking on the URLs under the "#Users" column.

Server: 2 x Sun SPARC Enterprise T5440 (APP) + 1 x Sun SPARC Enterprise T5220 (NFS, DB)
    Processors: 8 chips / 64 cores / 512 threads, 1.6 GHz UltraSPARC T2 Plus (T5440s)
                1 chip / 8 cores / 64 threads, 1.2 GHz UltraSPARC T2 (T5220)
    #Users: 50,000
    OS: Solaris 10 5/09

Server: 1 x Sun SPARC Enterprise T5440
    Processors: 4 chips / 32 cores / 256 threads, 1.6 GHz UltraSPARC T2 Plus
    #Users: 28,000
    OS: Solaris 10 5/09

Server: 5 x Sun Fire T2000
    Processors: 1 chip / 8 cores / 32 threads, 1.2 GHz UltraSPARC T1
    #Users: 10,000
    OS: Solaris 10 11/06

Server: 3 x HP DL380 G4
    Processors: 2 chips / 4 cores / 4 threads, 2.8 GHz Intel Xeon
    #Users: 5,800
    OS: OEL

Server: 1 x IBM x3755
    Processors: 4 chips / 8 cores / 8 threads, 2.8 GHz AMD Opteron
    #Users: 4,000
    OS: RHEL4


Before you go, do not forget to check the best practices for configuring / deploying Oracle Business Intelligence on top of Solaris 10 running on Sun CMT hardware.

Related Blog Posts:
T5440 Rocks [again] with Oracle Business Intelligence Enterprise Edition Workload

Monday Aug 17, 2009

Oracle Business Intelligence on Sun : Few Best Practices

(Updated on 10/16/09 with additional content; the blog entry was also restructured for clarity and easy navigation.)

The following suggested best practices are applicable to all Oracle BI EE deployments on Sun hardware (CMT and M-class) running Solaris 10 or later. These recommendations are based on our observations from the 50,000 user benchmark on the Sun SPARC Enterprise T5440. This is not a complete list, and your mileage may vary.

Hardware : Firmware

Ensure that the system's firmware is up-to-date.

Solaris Recommendations

  • Upgrade to the latest update release of Solaris 10.

  • Solaris runs in 64-bit mode by default on the SPARC platform. Consider running 64-bit BI EE on Solaris.

      The 64-bit BI EE platform is immune to the 4 GB virtual memory limitation of the 32-bit BI EE platform -- hence it can potentially support even more users and larger caches, as long as the hardware resources are available.

  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 uses a maximum of 4M pages even when 256M pages are a good fit. (A sketch showing how to verify the settings in this section follows at the end of this list.)

      256M pages can be enabled with the following /etc/system tunables.
      
      * 256M pages for the process heap
      set max_uheap_lpsize=0x10000000
      
      * 256M pages for ISM
      set mmu_ism_pagesize=0x10000000
      
      

  • Increase the file descriptor limits by adding the following lines to /etc/system on all BI nodes. (See the verification sketch at the end of this list.)
      
      * file descriptor limits
      set rlim_fd_cur=65536
      set rlim_fd_max=65536
      
      
  • On larger systems with more CPUs or CPU cores, try not to deploy Oracle BI EE in the global zone.

      In our benchmark testing, we have observed unpredictable and abnormal behavior of the BI server process (nqsserver) in the global zone under moderate loads. This behavior is clearly noticeable when there are more than 64 vcpus allocated to the global zone.

  • If the BI Presentation Catalog is stored on a local file system, create a ZFS file system to hold the catalog.

      If there are more than 25,000 authorized users in a BI deployment, the default UFS file system may run into a "Too many links" error when the Presentation Server tries to create more than 32,767 sub-directories (refer to LINK_MAX on Solaris).

  • Store the Presentation Catalog on a disk with faster I/O, such as a Solid State Drive (SSD). For uniform reads and writes across different disk drives [ and of course for better performance ], we recommend creating a ZFS file system on top of a zpool with multiple SSDs.

    Here is an example that shows the ZFS file system creation steps for the BI Presentation Catalog.

    
    # zpool create -f BIshare c1t2d0s6 c1t3d0s0 c1t4d0s6 c1t5d0s6
    
    # zpool list
    NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
    BIshare   118G    97K   118G     0%  ONLINE  -
    
    # zfs create BIshare/WebCat
    
    # fstyp /dev/dsk/c1t2d0s6
    zfs
    
    # zpool status -v
      pool: BIshare
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            BIshare     ONLINE       0     0     0
              c1t2d0s6  ONLINE       0     0     0
              c1t3d0s0  ONLINE       0     0     0
              c1t4d0s6  ONLINE       0     0     0
              c1t5d0s6  ONLINE       0     0     0
    
    errors: No known data errors
    
    

    Observe the I/O activity on the ZFS file system by running the zpool iostat -v command.
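
To verify the large page and file descriptor settings described in this list once the /etc/system changes have taken effect (a reboot is required for /etc/system changes), something like the following minimal sketch can be used. The process name (nqsserver) comes from the note on the global zone above; picking a PID with pgrep/head is an illustrative assumption.

    # list the page sizes supported by the platform
    pagesize -a

    # check the page sizes in use by a running BI server process
    # (look for 256M entries in the "Pgsz" column)
    pmap -xs `pgrep nqsserver | head -1`

    # verify the file descriptor limit seen by newly started processes
    ulimit -n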

Solaris : ZFS Recommendations

  • If the file system is mainly used for storing the Presentation Catalog, consider setting the ZFS record size to 8K, because reads from and writes to the BI Catalog are relatively small (8K or less). (A verification sketch appears at the end of this section.)

    eg.,
    
            # zfs set recordsize=8K BIshare/WebCat
    
    

    In the case of a database, you may have to set the ZFS record size to the database block size.

  • Even though disabling the ZFS Intent Log (ZIL) may improve the performance of synchronous write operations, it is not a recommended practice. Doing so may compromise data integrity.

      Disabling the ZIL on an NFS server can lead to client-side corruption.

  • When running CPU intensive workloads, consider disabling ZFS metadata compression to provide more CPU cycles to the application.

      Starting with Solaris 10 11/06, metadata compression can be disabled and enabled dynamically as shown below.

      To disable the metadata compression:

      
              # echo zfs_mdcomp_disable/W0t1 | mdb -kw
      
      

      To enable the metadata compression:

      
              # echo zfs_mdcomp_disable/W0t0 | mdb -kw
      
      

      To permanently disable the metadata compression, set the following /etc/system tunable.

      
              set zfs:zfs_mdcomp_disable=1
      
      
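The following is a minimal verification sketch for the ZFS recommendations above. The pool and file system names (BIshare/WebCat) come from the earlier example; the commands are standard ZFS and mdb usage.

    # confirm the record size (and compression setting) of the catalog file system
    zfs get recordsize,compression BIshare/WebCat

    # confirm the current value of the metadata compression tunable
    # (1 = metadata compression disabled, 0 = enabled)
    echo zfs_mdcomp_disable/D | mdb -k
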

Solaris : NFS Recommendations

One of the requirements of OBIEE is that the BI Presentation Catalog must be shared across the different BI nodes in the BI Cluster. (There is only one copy of the Presentation Catalog.) Unless the catalog is replicated on each node, there is no choice but to share it across the nodes. One way to do this is to create an NFS share of the top-level directory of the catalog and then mount it over NFS on the BI nodes (a minimal share/mount sketch appears at the end of this section).

  • Version 4 is the default NFS version on Solaris 10. However, it appears that, as of this writing, NFS v4 is not as mature as v3. So we recommend experimenting with both versions to see which one best fits the needs of the BI deployment.

    To enable NFS v3 on both the server and the client, edit /etc/default/nfs and make the changes shown below.

      NFS Server
      NFS_SERVER_VERSMIN=3
      NFS_SERVER_VERSMAX=3

      NFS Client
      NFS_CLIENT_VERSMIN=3
      NFS_CLIENT_VERSMAX=3
  • Experiment with the following NFS tunables (also in /etc/default/nfs).

      NFS Server
      
      NFSD_SERVERS=<desired_number>    <-- on CMT systems with a large number of hardware threads you can go as high as 512
      NFS_SERVER_DELEGATION=[ON|OFF]   <-- ON is the default. Experiment with OFF
      NFSMAPID_DOMAIN=<network_domain_where_BI_was_deployed>
      
      NFS Client
      NFSMAPID_DOMAIN=<network_domain_where_BI_was_deployed>
  • Monitor the DNLC hit rate and tune the directory name look-up cache (DNLC).

      To monitor the DNLC hit rate, run the "vmstat -s | grep cache" command. It is ideal to see a hit rate of 95% or above.

      Add the following tunable parameter to /etc/system on the NFS server, with the desired size for the DNLC.

      
              set ncsize=<desired_number>
      
      
  • Mounting NFS Share

    Mount the NFS share that contains the Presentation Services Catalog on all the NFS clients (BI nodes in this context) using the following mount options:

    
            rw, forcedirectio, nocto
    
    
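Putting the server and client pieces together, here is a minimal share/mount sketch. The host name (nfsserver), the client-side mount point (/export/WebCat) and the /BIshare/WebCat path (the default mountpoint of the ZFS file system created earlier) are illustrative assumptions; only the mount options above come from the benchmark setup.

    # on the NFS server: share the catalog file system
    # (for a persistent share, the ZFS sharenfs property or /etc/dfs/dfstab can be used instead)
    share -F nfs -o rw /BIshare/WebCat

    # on each BI node (NFS client): mount the share with the recommended options
    mkdir -p /export/WebCat
    mount -F nfs -o rw,forcedirectio,nocto,vers=3 nfsserver:/BIshare/WebCat /export/WebCat
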

Oracle BI EE Cluster Deployment Recommendations

  • Ensure that all the BI components in the cluster are configured in a many-to-many fashion.

  • For proper load balancing, configure all BI nodes in the BI Cluster to be identical.

  • When planning to add an identically configured new node to the BI Cluster, simply clone an existing, well-configured BI node running in a non-global zone (a minimal cloning sketch follows this list).

      Cloning a BI node running in a dedicated zone results in an exact copy of the BI node being cloned. This approach is simple, less error-prone, and eliminates the need to configure the newly added node from scratch.
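
As referenced above, here is a minimal zone-cloning sketch, assuming the existing BI node runs in a non-global zone called bizone1 and the new node will be bizone2. The zone names, zonepath and network settings are illustrative assumptions, and the BI configuration inside the cloned zone (host names, ports) still needs to be adjusted afterwards.

    # export the configuration of the existing BI zone and adapt it for the new zone
    zonecfg -z bizone1 export -f /tmp/bizone2.cfg
    # edit /tmp/bizone2.cfg: change zonepath, IP address, etc., then create the new zone
    zonecfg -z bizone2 -f /tmp/bizone2.cfg

    # clone the existing zone (the source zone must not be running) and boot the copy
    zoneadm -z bizone1 halt
    zoneadm -z bizone2 clone bizone1
    zoneadm -z bizone2 boot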

Oracle BI Presentation Services Configuration Recommendations

  • Increase the file descriptor limit. Edit SAROOTDIR/setup/systunesrv.sh to increase the value from 1024 to a value of your choice. In addition, you must increase the shell limit using the ulimit -n command.

    eg.,
    
    	ulimit -n 2048
    
    

  • Configure 256M large pages for the JVM heap of the Chart server and the OC4J web server (this recommendation is equally applicable to other web servers such as WebLogic or Sun Java System Web Server). Also use parallel GC, and restrict the number of parallel GC threads to 1/8th of the number of virtual CPUs.

    eg.,

    
    	-XX:LargePageSizeInBytes=256M -XX:+UseParallelGC -XX:ParallelGCThreads=8
    
    

  • The Oracle BI Presentation Server keeps the access information of all users in the Web Catalog. When there are a large number of unique BI users, it can take a significant amount of time to look up a user if all the users reside in a single directory. To avoid this, hash the user directories. This can be achieved by adding the following entry to SADATADIR/web/config/instanceconfig.xml.

    eg.,

    
    	<Catalog>
    		<HashUserHomeDirectories>2</HashUserHomeDirectories>
    	</Catalog>
    
    

    HashUserHomeDirectories specifies the number of characters to use to hash user names into sub-directories. When this element is turned on, for example, the default name for user Steve's home directory becomes /users/st/steve.

  • BI Server and BI Presentation Server processes create many temporary files while rendering reports and dashboards for a user, which can result in significant I/O activity on the system. The I/O waits can be minimized by pointing the temporary directories to a memory-resident file system such as /tmp on Solaris. To achieve this, add the following line to the instanceconfig.xml configuration file.

    eg.,

    
    	<TempDir>/tmp/OracleBISAW</TempDir>
    
    

    Similarly, the temporary directory (SATEMPDIR) can be pointed to a memory-resident file system such as /tmp to minimize I/O waits.

  • Consider tuning the value of CacheMaxEntries in instanceconfig.xml. A value of 20,000 was used in the 50,000 user OBIEE benchmark on T5440 servers. Be aware that the Presentation Services process (sawserver64) consumes more virtual memory when this parameter is set to a high value.

    eg.,
    
    	<CacheMaxEntries>20000</CacheMaxEntries>
    
    
  • If the Presentation Services log contains errors such as "The queue for the thread pool AsyncLogon is at it's maximum capacity of 50 jobs.", consider increasing the size of the Presentation Services' asynchronous job queue. The default value is 50.

    The following example increases the job queue size to 200.

    
    	<ThreadPoolDefaults>
    		<AsyncLogon>
    			<MaxQueue>200</MaxQueue>
    		</AsyncLogon>
    	</ThreadPoolDefaults>
    
    
  • Increase the query cache expiry time, especially when the BI deployment is expected to handle a large number of concurrent users. The default is 60 minutes. However, under very high loads, a cache entry may be removed before one hour if many queries are being run. Hence it is necessary to tune the CacheMaxExpireMinutes parameter in the Presentation Services' instanceconfig.xml.

    The following example increases the query cache expiry time to 3 hours.

    
    	<CacheMaxExpireMinutes>180</CacheMaxExpireMinutes>
    
    
  • Consider increasing the Presentation Services' cache timeout values to keep the cached data intact for longer periods.

    The following example increases the cache timeout values to 5 hours in instanceconfig.xml configuration file.

    
    	<AccountIndexRefreshSecs>18000</AccountIndexRefreshSecs>
    	<AccountCacheTimeoutSecs>18000</AccountCacheTimeoutSecs>
    	<CacheTimeoutSecs>18000</CacheTimeoutSecs>
    	<CacheCleanupSecs>18000</CacheCleanupSecs>
    	<PrivilegeCacheTimeoutSecs>18000</PrivilegeCacheTimeoutSecs>
    
    

Oracle BI Server Configuration Recommendations

  • Enable caching at the BI server, and control/tune the cache expiry time for each of the tables based on your organization's needs.

  • Unless the repository needs to be edited online frequently, consider setting up "read only" mode for the repository. It may ease lock contention to some extent.

  • Increase the session limit and the number of requests per session limit, especially when the BI deployment is expected to handle a large number of concurrent users. Also increase the number of BI server threads.

    The following configuration was used in 50,000 user OBIEE benchmark on T5440 servers.

    
    (Source configuration file: NQSConfig.INI)
    
    [ CACHE ]
    ENABLE = YES;
    DATA_STORAGE_PATHS = "/export/oracle/OracleBIData/cache" 500 MB;
    MAX_ROWS_PER_CACHE_ENTRY = 0;
    MAX_CACHE_ENTRY_SIZE = 10 MB;
    MAX_CACHE_ENTRIES = 5000;
    POPULATE_AGGREGATE_ROLLUP_HITS = NO;
    USE_ADVANCED_HIT_DETECTION = NO;
    
    // Cluster-aware cache
    GLOBAL_CACHE_STORAGE_PATH = "/export/oracle/OracleBIsharedRepository/GlobalCacheDirectory" 2048 MB;
    MAX_GLOBAL_CACHE_ENTRIES = 10000;
    CACHE_POLL_SECONDS = 300;
    CLUSTER_AWARE_CACHE_LOGGING = NO;
    
    [ SERVER ]
    READ_ONLY_MODE = YES;
    MAX_SESSION_LIMIT = 20000 ;
    MAX_REQUEST_PER_SESSION_LIMIT = 1500 ;
    SERVER_THREAD_RANGE = 512-2048;
    SERVER_THREAD_STACK_SIZE = 0;
    DB_GATEWAY_THREAD_RANGE = 512-512;
    
    #SERVER_HOSTNAME_OR_IP_ADDRESSES = "ALLNICS";
    CLUSTER_PARTICIPANT = YES;
    
    


Saturday Dec 20, 2008

Siebel on Sun Solaris: More Performance with Fewer mprotect() Calls

By default, each transaction in the Siebel CRM application makes a large number of mprotect() calls, which in turn may degrade the performance of Siebel. When the load on the Siebel application servers is very high, the mprotect() calls are serialized by the operating system kernel, resulting in a high number of context switches and low CPU utilization.

If a Siebel deployment exhibits the above-mentioned pathological conditions, the performance/scalability of the application can be improved by limiting the number of mprotect() calls made by the application server processes at run time. To achieve this behavior, set the value of Siebel CRM's AOM tunable parameter MemProtection to FALSE. Starting with the release of Siebel 7.7, MemProtection is a hidden parameter with a default value of TRUE. To set its value to FALSE, run the following command from the command-line version of Siebel Server Manager, srvrmgr.


change param MemProtection=False for comp <component_alias_name> server <siebel_server_name>

where:

component_alias_name is the alias name of the AOM component to be configured. e.g., SCCObjMgr_enu is the alias for the Call Center Object Manager, and

siebel_server_name is the name of the Siebel Server for which the component is being configured.

Note that this parameter is not a dynamic one - hence the Siebel application server(s) must be restarted for this parameter to be effective.

Run truss -c -p <pid_of_any_busy_siebmtshmw_process> before and after the change to see how the mprotect system call count varies.
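
For example, the verification might look like the following rough sketch (the user name under which the Siebel servers run is an illustrative assumption):

    # find a busy AOM process; siebmtshmw is the multi-threaded Siebel object manager binary
    prstat -u siebel -s cpu

    # count its system calls for a minute or so;
    # interrupting truss with Ctrl-C prints the per-syscall call counts
    truss -c -p <pid_of_any_busy_siebmtshmw_process>

    # compare the "mprotect" call count in the summary before and after
    # setting MemProtection to FALSE and restarting the Siebel servers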

For more information about this tunable on Solaris platform, check Siebel Performance Tuning Guide Version 7.7 or later in Siebel Bookshelf.

See Also:
Siebel on Sun CMT hardware : Best Practices

(Originally posted on blogger at:
Siebel on Sun Solaris: More Performance with Less mprotect() Calls)

Sunday Nov 30, 2008

PeopleSoft on Solaris 10: Fixing the "msgget: No space left on device" Error

(Crossposting the 8+ month old blog entry from my other blog hosted on blogger. Source URL:
http://technopark02.blogspot.com/2008/03/peoplesoft-fixing-msgget-no-space-left.html
)

When a large number of application server processes are configured in a single PeopleSoft domain, or cumulatively across multiple domains, it is very likely that the PeopleSoft application server domain boot process will fail with errors like:

Booting server processes ...
exec PSSAMSRV -A -- -C psappsrv.cfg -D CS90SPV -S PSSAMSRV :
        Failed.
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:681: ERROR: Failure to create message queue
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:248: ERROR: System init function failed, Uunixerr = : 
                   msgget: No space left on device
113954.ben15!tmboot.29708.1.-2: CMDTUX_CAT:825: ERROR: Process PSSAMSRV at ben15 failed with /T 
                   tperrno (TPEOS - operating system error)

In this particular example, the PeopleSoft Enterprise is running on a Solaris 10 system. Fortunately, the error message is very clear in this case, and the failure is related to message queues. During the domain boot-up process, there is a call to msgget() to create a message queue. If the call to msgget() succeeds, it returns a non-negative integer that serves as the identifier for the newly created message queue. However, in the case of a failure, it returns -1 and sets errno to EACCES, EEXIST, ENOENT or ENOSPC depending on the underlying reason.

From the above error messages it is clear that msgget() failed with errno set to ENOSPC (No space left on device). The man page of msgget(2) has the following explanation for the ENOSPC error code on Solaris:

ERRORS
     The msgget() function will fail if:
     ...
     ...
     ENOSPC    A message queue identifier is to  be  created  but
               the  system-imposed limit on the maximum number of
               allowed  message  queue  identifiers  system  wide
               would be exceeded. See NOTES.

NOTES
     ...
     ...

     The system-imposed limit on  the  number  of  message  queue
     identifiers  is  maintained on a per-project basis using the
     project.max-msg-ids resource control.

This gives enough clues to suspect the configured limit on the number of message queue identifiers.

Prior to the release of Solaris 10, the /etc/system System V IPC tunable, msgsys:msginfo_msgmni, was used to control the maximum number of message queues that can be created. The default value on pre-Solaris 10 systems is 50.

With the release of Solaris 10, the majority of the System V IPC tunables were obsoleted, and equivalent resource controls were created for the remaining tunables to reduce the administrative overhead. On Solaris 10 and later versions, System V IPC can be tuned on a per-project basis using the newly introduced resource controls.

On any Solaris 10 system, the resource control project.max-msg-ids replaced the old /etc/system tunable msginfo_msgmni, and the default value has been raised to 128.

Now back to the failure in the PeopleSoft environment. Let's first check the current value configured for project.max-msg-ids.

  • Get the project ID.
     % id -p
    uid=222227(psft) gid=2294(dba) projid=3(default)
  • Examine the project.max-msg-ids resource control for the project with ID 3, using the prctl utility.
     % prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
    project.max-msg-ids
            privileged        128       -   deny                                 -
            system          16.8M     max   deny                                 -

Alternatively, run the command ipcs -q to check the number of active message queues. Note that the project with ID '3' is configured to create a maximum of 128 (the default) message queues. When the boot failure occurs, the number of active message queues in the ipcs -q output will be at or very close to the configured value of project.max-msg-ids.
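
For instance, a quick way to count the active message queue identifiers (a minimal sketch; it assumes each message queue entry in the ipcs output starts with the letter 'q'):

     % ipcs -q | grep -c "^q"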

Since it appears that the configured PeopleSoft domain(s) need more than 128 message queues in order to bring up all the application server processes that constitute the PeopleSoft Enterprise, the solution is to increase the value of the resource control project.max-msg-ids beyond 128. For the sake of simplicity, let's increase it to 256 (that is, 2 x the default value). Again, the prctl utility can be used to set the new value for the resource control.

  • Assume the privileges of the 'root' user
     % su
    Password:
  • Increase the maximum value for the message queue identifiers to 256 using the prctl utility.
     # prctl -n project.max-msg-ids -r -v 256 -i project 3
  • Verify the new maximum value for the message queue identifiers
     # prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
    project.max-msg-ids
            privileged        256       -   deny                                 -
            system          16.8M     max   deny                                 -

With this change, the PeopleSoft Enterprise should boot up without the Failure to create message queue .. msgget: No space left on device errors.

Before we conclude, note that the above-mentioned solution does not persist across operating system reboots. To make it persistent, create a new project using the projadd command; the man page for projadd(1M) has an example showing the creation of a project, and a brief sketch follows.
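
As a minimal sketch (the project name, ID, user and group are illustrative assumptions; the resource control value is the one we arrived at above), the persistent equivalent might look like this:

     # projadd -p 100 -c 'PeopleSoft app server' -U psft -G dba -K 'project.max-msg-ids=(privileged,256,deny)' PSFT
     # usermod -K project=PSFT psft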

Friday Nov 21, 2008

Oracle on Solaris 10 : Fixing the 'ORA-27102: out of memory' Error

(Crossposting the 2+ year old blog entry from my other blog hosted on blogger. Source URL:
http://technopark02.blogspot.com/2006/09/solaris-10oracle-fixing-ora-27102-out.html)

Symptom:

As part of a database tuning effort you increase the SGA/PGA sizes, and Oracle greets you with an ORA-27102: out of memory error message, even though the system has enough free memory to serve the needs of Oracle.

SQL> startup
ORA-27102: out of memory
SVR4 Error: 22: Invalid argument

Diagnosis
$ oerr ORA 27102
27102, 00000, "out of memory"
// *Cause: Out of memory
// *Action: Consult the trace file for details

Not so helpful. Let's look at the alert log for some clues.

% tail -2 alert.log
WARNING: EINVAL creating segment of size 0x000000028a006000
fix shm parameters in /etc/system or equivalent

Oracle is trying to create a 10G shared memory segment (the size depends on the SGA/PGA sizes), but the operating system (Solaris in this example) responded with an invalid argument (EINVAL) error. There is a little hint about setting shm parameters in /etc/system.

Prior to Solaris 10, the shmsys:shminfo_shmmax parameter had to be set in /etc/system to the maximum size of a shared memory segment that can be created. 8M is the default value on Solaris 9 and prior versions, whereas 1/4th of the physical memory is the default on Solaris 10 and later. On a Solaris 10 (or later) system, the current limit can be verified as shown below:

% prtconf | grep Mem
Memory size: 32760 Megabytes

% id -p
uid=59008(oracle) gid=10001(dba) projid=3(default)

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      7.84GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Now it is clear that the system is using the default value of roughly 8G (7.84 GB, about one quarter of the 32 GB of physical memory) in this scenario, whereas the application (Oracle) is trying to create a larger (10G) shared memory segment. Hence the failure.

So, the solution is to configure the system with a value large enough for the shared segment being created, so Oracle succeeds in starting up the database instance.

On Solaris 9 and prior releases, it can be done by adding the following line to /etc/system, followed by a reboot for the system to pick up the new value.

set shminfo_shmmax = 0x000000028a006000

However, the shminfo_shmmax parameter was obsoleted with the release of Solaris 10, and Sun doesn't recommend setting it in /etc/system even though it still works as expected.

On Solaris 10 and later, this value can be changed dynamically on a per-project basis with the help of the resource control facilities. This is how we do it on Solaris 10 and later:

% prctl -n project.max-shm-memory -r -v 10G -i project 3

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      10.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Note that changes made with the prctl command on a running system are temporary and will be lost when the system is rebooted. To make the changes permanent, create a project with the projadd command and associate it with the user account as shown below:

% projadd -p 3  -c 'eBS benchmark' -U oracle -G dba  -K 'project.max-shm-memory=(privileged,10G,deny)' OASB
% usermod -K project=OASB oracle
Finally, make sure the project was created, using the projects -l or cat /etc/project commands.

% projects -l
...
...
OASB
        projid : 3
        comment: "eBS benchmark"
        users  : oracle
        groups : dba
        attribs: project.max-shm-memory=(privileged,10737418240,deny)

% cat /etc/project
...
...
OASB:3:eBS benchmark:oracle:dba:project.max-shm-memory=(privileged,10737418240,deny)

With these changes, Oracle starts the database up normally.

SQL> startup
ORACLE instance started.

Total System Global Area 1.0905E+10 bytes
Fixed Size                  1316080 bytes
Variable Size            4429966096 bytes
Database Buffers         6442450944 bytes
Redo Buffers               31457280 bytes
Database mounted.
Database opened.

Related information:

  1. What's New in Solaris System Tuning in the Solaris 10 Release?
  2. Resource Controls (overview)
  3. System Setup Recommendations for Solaris 8 and Solaris 9
  4. Man page of prctl(1)
  5. Man page of projadd


Addendum : Oracle RAC settings

Anonymous Bob suggested the following settings for Oracle RAC in the form of a comment for the benefit of others who run into similar issue(s) when running Oracle RAC. I'm pasting the comment as is (Disclaimer: I have not verified these settings):

Thanks for a great explanation, I would like to add one comment that will help those with an Oracle RAC installation. Modifying the default project covers oracle processes great and is all that is needed for a single instance DB. In RAC however, the CRS process starts the DB and it is a root owned process and root does not use the default project. To fix ORA-27102 issue for RAC I added the following lines to an init script that runs before the init.crs script fires.

# Recommended Oracle RAC system params
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536

# For root processes like crsd
prctl -n project.max-shm-memory -r -v 8G -i project system
prctl -n project.max-shm-ids -r -v 512 -i project system

# For oracle processes like sqlplus
prctl -n project.max-shm-memory -r -v 8G -i project default
prctl -n project.max-shm-ids -r -v 512 -i project default

So simple yet it took me a week working with Oracle and SUN to come up with that answer...Hope that helps someone out.

Bob
# posted by Blogger Bob : 6:48 AM, April 25, 2008
