Friday Oct 09, 2009

Sun achieves the Magic Number 50,000 on T5440 with Oracle Business Intelligence EE 10.1.3.4

Less than two months ago, Sun Microsystems published an Oracle Business Intelligence benchmark with the best single system performance of 28,000 concurrent BI EE users at ~75% CPU utilization. Sun and Oracle Corporation announced another Oracle Business Intelligence benchmark result today with two identical T5440 servers in the Oracle BI Cluster serving 50,000 concurrent BI EE users.

An Oracle white paper with Sun's 50,000 user benchmark results can be accessed from Oracle's Business Intelligence web site.

The hardware specification of each T5440 is similar to the hardware used in the prior benchmark effort on a single T5440 server. This time, however, the Presentation Catalog (also frequently referred to as the Web Catalog) was moved to the T5220 server on which the NFS server was running. Besides this, the only other change from the earlier 28,000 user benchmark exercise is the addition of a second T5440 to the test rig.

The following graph shows the scalability of the application from one node to four nodes to eight nodes running on T5440 servers.

OBIEE on T5440 : Scalability Graph

Without further ado, here is the summary of the benchmark results along with their significance and some interesting facts:

  • One of the major goals of this benchmark effort was to show the horizontal and vertical scalability of the application (OBIEE) by highlighting the superior performance and the resilience of the underlying hardware (T5440) and the operating system (Solaris). Needless to say, that goal has been met.

  • Another goal of this benchmark was to show a decent number of concurrent BI EE users executing transactions with good response times. Since we already showed the maximum load that can be achieved on a single BI instance (7,500 users) and on a single T5440 server running multiple BI instances (28,000 users), this time we did not attempt to find the peak number that can be achieved with the two T5440 servers in the benchmark environment. Now that an additional server in the test setup is taking care of the Presentation Catalog and the database server, 2 x 28,000 = 56,000 BI EE users would have been an achievable target -- but we opted to stop at the "magic" and "respectable" number of 50,000 instead.

  • The entire benchmark run lasted about 9 hours 45 minutes, of which 8 hours were ramp-up time during which the 50,000 BI virtual users logged into the application a few users at a time. The LoadRunner tool reported only 4 errors for the entire duration of the run, and there were zero errors in the 60 minute steady state period during which the statistics reported in the document were collected.

  • Two Sun SPARC Enterprise T5440 servers each with 4 x 8-Core 1.6 GHz UltraSPARC T2 Plus processors delivered the best performance of 50,000 concurrent BI EE users at around 63% CPU utilization.

  • The BI EE Cluster was deployed on two T5440 servers running Solaris 10 5/09 operating system. All the nodes in the BI Cluster were consolidated onto two T5440 servers using the free and efficient Solaris Containers virtualization technology.

  • The Presentation Catalog was hosted on a ZFS file system created on top of four internal Solid State Drive (SSD) disks. The Catalog was shared among all eight BI nodes in the cluster as an NFS share. A T5220 server with one 8-Core 1.2 GHz UltraSPARC T2 processor was used to run the NFS server. Due to the minimal database activity, the Oracle 11g database was also hosted on the same server. Solaris 10 5/09 was the operating system.

  • Solid State Drive (SSD) disks with ZFS file system showed significant I/O performance improvement over traditional disks for the Presentation Catalog activity. In addition, ZFS helped get past the UFS limitation of 32767 sub-directories in a Presentation Catalog directory.

  • Caching was turned ON at the application server, which led to minimal database activity on the server. Note that the caching mechanism was turned ON in the prior benchmark exercise as well.

  • The low-end CoolThreads CMT server T5220 and the mid-range T5440 server once again proved to be ideal candidates for deploying and running multi-threaded workloads, exhibiting resilient performance while handling a large number of simultaneous requests from 50,000 BI EE virtual users. The T5220 handled a large number of concurrent asynchronous read/write requests from eight different NFS clients.

  • NFS v3 was configured at the NFS server as well as at the NFS client nodes. NFS version 4 is the default on Solaris 10, and it might have worked as expected. However, a handful of bug reports prompted us to go with the more mature and less buggy version 3.

  • 3,283 watts is the average power consumption when all 50,000 concurrent BI users are in the steady state of the benchmark test. That is, in the case of similarly configured workloads, this T5440 based configuration supports 15.2 users per watt of energy consumed and 5,000 users per rack unit.

  • A summary of the results with system-wide averages of CPU and memory utilization is shown below. The latest (50,000 user) results appear in the last row.

    #Vusers  Clustered  #BI Nodes  #CPUs  #Cores  RAM     Avg CPU  Avg Memory  Avg Trx Response Time  #Trx/sec
    7,500    No         1          1      8       32 GB   72.85%   18.11 GB    0.22 sec               155
    28,000   Yes        4          4      32      128 GB  75.04%   76.16 GB    0.25 sec               580
    50,000   Yes        8          8      64      256 GB  63.32%   172.21 GB   0.28 sec               1031

TOPOLOGY DIAGRAM

The topology diagram in the benchmark results white paper is almost illegible. Here is the original topology diagram that was inserted into the white paper.

OBIEE on T5440 : 50K User Benchmark Topology

Quite frankly, I'm not very proud of this drawing -- but it is the best that I could come up with in a short span of time. Rather than showing the flow of communication between each and every component in the benchmark setup, I simplified the drawing by introducing a "black box" of sorts -- a "private network" -- in the middle, which kept the drawing from getting messy.


CPU USAGE GRAPH

The following two-dimensional graph shows the CPU utilization patterns on all 3 nodes in the benchmark setup during the 60 minute steady state of the benchmark run. This graph was generated with the free gnuplot tool, using sar data as the input.

OBIEE on T5440 : 50K User Benchmark CPU Usage Graph
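
For anyone who wants to reproduce this kind of graph, here is a minimal sketch of one way to turn the sar data into a gnuplot chart. The data file, node name and sar binary file name are hypothetical, and the awk field positions assume the standard Solaris "sar -u" output (%usr, %sys, %wio, %idle).

    # extract "time  busy%" pairs from the sar data collected during the steady state
    sar -u -f /var/adm/sa/sa09 | awk '$1 ~ /:/ && $2 ~ /^[0-9]/ {print $1, 100 - $NF}' > node1_cpu.dat

    # cpu_usage.gp -- a minimal gnuplot script
    set terminal png size 800,400
    set output "cpu_usage.png"
    set xdata time
    set timefmt "%H:%M:%S"
    set format x "%H:%M"
    set yrange [0:100]
    set ylabel "CPU utilization (%)"
    plot "node1_cpu.dat" using 1:2 with lines title "BI node 1"

    # generate the chart
    gnuplot cpu_usage.gp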

COMPETITIVE LANDSCAPE

And finally, here is a quick summary of all the results published so far by different vendors with a similar benchmark kit. Feel free to draw your own conclusions. All of this is public information. Check the corresponding benchmark reports by clicking on the URLs under the "#Users" column.

Server                                     Chips  Cores  Threads  GHz  Processor Type      #Users  OS
2 x Sun SPARC Enterprise T5440 (APP)       8      64     512      1.6  UltraSPARC T2 Plus  50,000  Solaris 10 5/09
1 x Sun SPARC Enterprise T5220 (NFS,DB)    1      8      64       1.2  UltraSPARC T2
1 x Sun SPARC Enterprise T5440             4      32     256      1.6  UltraSPARC T2 Plus  28,000  Solaris 10 5/09
5 x Sun Fire T2000                         1      8      32       1.2  UltraSPARC T1       10,000  Solaris 10 11/06
3 x HP DL380 G4                            2      4      4        2.8  Intel Xeon          5,800   OEL
1 x IBM x3755                              4      8      8        2.8  AMD Opteron         4,000   RHEL4


Before you go, do not forget to check the best practices for configuring / deploying Oracle Business Intelligence on top of Solaris 10 running on Sun CMT hardware.

Related Blog Posts:
T5440 Rocks [again] with Oracle Business Intelligence Enterprise Edition Workload

Monday Aug 17, 2009

T5440 Rocks [again] with Oracle Business Intelligence Enterprise Edition Workload

A while ago, I blogged about how we scaled Siebel 8.0 up to 14,000 concurrent users by consolidating the entire Siebel stack on a single Sun SPARC® Enterprise T5440 server with 4 x 1.4 GHz eight-core UltraSPARC® T2 Plus Processors. An OLTP workload was used in that performance benchmark effort.

We repeated a similar effort by collaborating with Oracle Corporation, but with an OLAP workload this time around. Today Sun and Oracle announced the 28,000 user Oracle Business Intelligence Enterprise Edition (OBIEE) 10.1.3.4 benchmark results on a single Sun SPARC Enterprise T5440 server with 4 x 1.6 GHz eight-core UltraSPARC T2 Plus Processors running Solaris 10 5/09 operating system. An Oracle white paper with Sun's 28,000 user benchmark results is available on Oracle's benchmark web site.

Some of the notes and key takeaways from this benchmark are as follows:

  • Key specifications for the Sun SPARC Enterprise T5440 system under test are: 4 x UltraSPARC T2 Plus processors, 32 cores, 256 compute threads and 128 GB of memory in a 4RU space.

  • The entire OBIEE solution was deployed on a single Sun SPARC Enterprise T5440 server using Oracle BI Cluster software.

  • The BI Cluster was configured with 4 x BI nodes. Each of those BI nodes was configured to run inside a Solaris Container.

    1. Each Solaris Container was configured with one physical processor (that is, 8 cores or 64 virtual CPUs) and 32 GB of physical memory (see the zone configuration sketch after this list).

    2. Each BI node was configured to run the BI Server, the Presentation Server and the OC4J Web Server.

    3. Two of the BI nodes had the BI Cluster Controller running (primary & secondary).

    4. One of the four Containers shared CPU and memory resources with the Oracle 11g RDBMS and the host operating system, which were running in the global zone.

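      For illustration, here is a minimal sketch of how such a container might be carved out with the zonecfg and zoneadm commands. The zone name, zonepath, network interface and IP address below are hypothetical, and the dedicated-cpu / capped-memory resources are just one way to express "one processor and 32 GB of memory" -- binding the zone to a resource pool would work as well.

      # zonecfg -z bi01
      zonecfg:bi01> create
      zonecfg:bi01> set zonepath=/zones/bi01
      zonecfg:bi01> add net
      zonecfg:bi01:net> set physical=e1000g0
      zonecfg:bi01:net> set address=192.168.10.101
      zonecfg:bi01:net> end
      zonecfg:bi01> add dedicated-cpu
      zonecfg:bi01:dedicated-cpu> set ncpus=64      <-- one UltraSPARC T2 Plus processor = 64 virtual CPUs
      zonecfg:bi01:dedicated-cpu> end
      zonecfg:bi01> add capped-memory
      zonecfg:bi01:capped-memory> set physical=32g
      zonecfg:bi01:capped-memory> end
      zonecfg:bi01> verify
      zonecfg:bi01> commit
      zonecfg:bi01> exit
      # zoneadm -z bi01 install
      # zoneadm -z bi01 boot
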
  • Caching was turned ON at the application server, which led to minimal database activity on the server.

    1. In other words, one can use these results only to size the hardware requirements for a complete BI EE deployment excluding the database server.

    2. All the OBIEE benchmark results published so far are with caching turned ON. This fact was not explicitly mentioned in some of the benchmark results white papers. Check the Competitive Landscape section for pointers to the benchmark results published by different vendors.

  • From our experiments with the OBIEE benchmark workload, it appears that a BI deployment with a single non-clustered BI node could reasonably scale up to 7,500 active users on a T5440 server. To scale beyond 7,500 concurrent users, you might need another instance of BI. Of course, your mileage may vary.

  • BI EE exhibited excellent horizontal scalability when multiple BI nodes were clustered using BI Cluster software. Four BI nodes in the Cluster were able to handle 28,000 concurrent users with minimal impact on the overall average transaction response times.

      It appeared as though we could simply add more BI nodes to the BI Cluster to cope with an increase in the user base. However, due to the limited hardware resources, we could not try running beyond 4 nodes in the BI Cluster. As of today, the theoretical limit for the number of BI nodes in a Cluster is 16.

  • The underlying hardware must behave well in order for the application to scale and perform well -- so, credit goes to the UltraSPARC T2 Plus powered Sun SPARC Enterprise T5440 server as well. In other words, it is fair to say the combination of (T5440 + OBIEE) performs and scales well on Solaris.

  • A summary of the results with system-wide averages of CPU and memory utilization is shown below.

    #Vusers  Clustered  #BI Nodes  #CPUs  #Cores  RAM     Avg CPU  Avg Memory  Avg Trx Response Time  #Trx/sec
    7,500    No         1          1      8       32 GB   72.85%   18.11 GB    0.22 sec               155
    28,000   Yes        4          4      32      128 GB  75.04%   76.16 GB    0.25 sec               580
  • Internal Solid State Drive (SSD) with ZFS file system showed significant I/O performance improvement over traditional disk for the BI catalog activity. In addition, ZFS helped get past the UFS limitation of 32,767 sub-directories in a BI catalog directory.

  • The benchmark demonstrated that the 64-bit BI EE platform is immune to the 4 GB virtual memory limitation of the 32-bit BI EE platform -- hence it can potentially support even more users and larger caches as long as the hardware resources are available.

      Solaris runs in 64-bit mode by default on SPARC platform. Consider running 64-bit BI EE on Solaris.

  • 2,107 watts is the average power consumption when all the 28,000 concurrent users are in the steady state of the benchmark test. That is, in the case of similarly configured workloads, T5440 supports 13.2 users per watt of the power consumed; and supports 7,000 users per rack unit.

TOPOLOGY DIAGRAM:

A picture is worth a thousand words. The following topology diagrams say it all about the configuration.

1. Single Node BI Non-Cluster Configuration : 7,500 Concurrent Users

Even though the Solaris Container is shown in a cloud-like graphical form, it has nothing to do with "Cloud Computing". It is just a side effect of a fancy drawing.

2. Four Node BI Cluster Configuration : 28,000 Concurrent Users

COMPETITIVE LANDSCAPE

Here is a quick summary of all the results that are published by different vendors. Feel free to draw your own conclusions. All this is public information. Check the corresponding benchmark reports by clicking on the URLs under the "#Users" column.

Server                          Chips  Cores  Threads  GHz  Processor Type      #Users  OS
1 x Sun SPARC Enterprise T5440  4      32     256      1.6  UltraSPARC T2 Plus  28,000  Solaris 10 5/09
5 x Sun Fire T2000              1      8      32       1.2  UltraSPARC T1       10,000  Solaris 10 11/06
3 x HP DL380 G4                 2      4      4        2.8  Intel Xeon          5,800   OEL
1 x IBM x3755                   4      8      8        2.8  AMD Opteron         4,000   RHEL4

CAUTION

Although T5440 possesses a ton of great qualities, it might not be suitable for deploying workloads with heavy single-threaded dependencies. The T5440 is an excellent hardware platform for multi-threaded, and moderately single-threaded/multi-process workloads. When in doubt, it is a good idea to leverage Sun Microsystems' Try & Buy program to try the workloads on the T5440 server before making the final call.


Check the second part of this blog post for the best practices for configuring / deploying Oracle Business Intelligence on top of Solaris 10 running on Sun CMT hardware.

Related Blog Posts:

Oracle Business Intelligence on Sun : Few Best Practices

(Updated on 10/16/09 with additional content; the blog entry was also restructured for clarity and easier navigation)

The following suggested best practices are applicable to all Oracle BI EE deployments on Sun hardware (CMT and M-class) running Solaris 10 or later. These recommendations are based on our observations from the 50,000 user benchmark on the Sun SPARC Enterprise T5440. This is not a complete list, and your mileage may vary.

Hardware : Firmware

Ensure that the system's firmware is up-to-date.

Solaris Recommendations

  • Upgrade to the latest update release of Solaris 10.

  • Solaris runs in 64-bit mode by default on SPARC platform. Consider running 64-bit BI EE on Solaris.

      64-bit BI EE platform is immune to the 4 GB virtual memory limitation of the 32-bit BI EE platform -- hence can potentially support even more users and have larger caches as long as the hardware resources are available.

  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 will use a maximum of 4M pages even when 256M pages are a good fit.

      256M pages can be enabled with the following /etc/system tunables.
      
      * 256M pages for the process heap
      set max_uheap_lpsize=0x10000000
      
      * 256M pages for ISM
      set mmu_ism_pagesize=0x10000000
      
      
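      A quick way to confirm that the large pages are actually being used after the change is to list the page sizes supported by the platform and then inspect a running BI server process with pmap; the pgrep pattern below is just a placeholder for whichever process you want to check.

      # list the supported page sizes, then look for heap/ISM mappings backed by 256M pages
      pagesize -a
      pmap -xs `pgrep -x nqsserver | head -1` | grep 256M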

  • Increase the file descriptor limits by adding the following lines to /etc/system on all BI nodes.
      
      * file descriptor limits
      set rlim_fd_cur=65536
      set rlim_fd_max=65536
      
      
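      After a reboot, the new limits can be confirmed from a shell; the values printed should simply reflect the tunables above.

      # expected to report 65536 once the /etc/system change is in effect
      ulimit -n
      prctl -n process.max-file-descriptor $$
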
  • On larger systems with more CPUs or CPU cores, try not to deploy Oracle BI EE in the global zone.

      In our benchmark testing, we have observed unpredictable and abnormal behavior of the BI server process (nqsserver) in the global zone under moderate loads. This behavior is clearly noticeable when there are more than 64 vcpus allocated to the global zone.

  • If the BI presentation catalog is stored on a local file system, create a ZFS file system to hold the catalog.

      If there are more than 25,000 authorized users in a BI deployment, the default UFS file system may run into a "Too many links" error when the Presentation Server tries to create more than 32,767 sub-directories (refer to LINK_MAX on Solaris).

  • Store the Presentation Catalog on a disk with faster I/O such as a Solid State Drive (SSD). For uniform reads and writes across the different disk drives [ and of course for better performance ], we recommend creating a ZFS file system on top of a zpool with multiple SSDs.

    Here is an example that shows the ZFS file system creation steps for the BI Presentation Catalog.

    
    # zpool create -f BIshare c1t2d0s6 c1t3d0s0 c1t4d0s6 c1t5d0s6
    
    # zpool list
    NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
    BIshare   118G    97K   118G     0%  ONLINE  -
    
    # zfs create BIshare/WebCat
    
    # fstyp /dev/dsk/c1t2d0s6
    zfs
    
    # zpool status -v
      pool: BIshare
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            BIshare     ONLINE       0     0     0
              c1t2d0s6  ONLINE       0     0     0
              c1t3d0s0  ONLINE       0     0     0
              c1t4d0s6  ONLINE       0     0     0
              c1t5d0s6  ONLINE       0     0     0
    
    errors: No known data errors
    
    

    Observe the I/O activity on the ZFS file system by running the zpool iostat -v command.

Solaris : ZFS Recommendations

  • If the file system is mainly used for storing the Presentation Catalog, consider setting the ZFS record size to 8K. This is because of the relatively small (8K or less) reads/writes from/into the BI Catalog.

    eg.,
    
            # zfs set recordsize=8K BIshare/WebCat
    
    

    In the case of database, you may have to set the ZFS record size to the database block size.

  • Even though disabling ZFS Intent Log (ZIL) may improve the performance of synchronous write operations, it is not a recommended practice to disable ZIL. Doing so may compromise the data integrity.

      Disabling the ZIL on an NFS Server can lead to client side corruption.

  • When running CPU intensive workloads, consider disabling ZFS metadata compression to provide more CPU cycles to the application.

      Starting with Solaris 10 11/06, metadata compression can be disabled and enabled dynamically as shown below.

      To disable the metadata compression:

      
              # echo zfs_mdcomp_disable/W0t1 | mdb -kw
      
      

      To enable the metadata compression:

      
              # echo zfs_mdcomp_disable/W0t0 | mdb -kw
      
      

      To permanently disable the metadata compression, set the following /etc/system tunable.

      
              set zfs:zfs_mdcomp_disable=1
      
      

Solaris : NFS Recommendations

One of the requirements of OBIEE is that the BI Presentation Catalog must be shared across the different BI nodes in the BI Cluster (there will be only one copy of the Presentation Catalog). Unless the catalog is replicated on the different nodes, there is no choice but to share it. One way to do this is to create an NFS share with the top level directory of the catalog, and then mount it over NFS on the BI nodes.

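Continuing with the BIshare/WebCat ZFS file system from the earlier example, the catalog directory could be shared from the NFS server in either of the following ways. This is a rough sketch; adjust the share options to your own security requirements.

    # one-off share from the command line
    share -F nfs -o rw /BIshare/WebCat

    # or let ZFS manage the share
    zfs set sharenfs=rw BIshare/WebCat

    # for a share that persists across reboots without the ZFS sharenfs property,
    # add the share command to /etc/dfs/dfstab
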
  • Version 4 is the default NFS version on Solaris 10. However it appears that as of this writing, NFS v4 is not as mature as v3. So we recommend experimenting with both versions to see which one fits well to the needs of the BI deployment.

    To enable NFS v3 on both server and the client, edit /etc/default/nfs and make the changes as shown below.

      NFS Server (/etc/default/nfs)
      NFS_SERVER_VERSMIN=3
      NFS_SERVER_VERSMAX=3

      NFS Client (/etc/default/nfs)
      NFS_CLIENT_VERSMIN=3
      NFS_CLIENT_VERSMAX=3
  • Experiment with the following NFS tunables.

      NFS Server (/etc/default/nfs)

      NFSD_SERVERS=<desired_number>          <-- on CMT systems with a large number of hardware threads, you can go as high as 512
      NFS_SERVER_DELEGATION=[ON|OFF]         <-- ON is the default. Experiment with OFF
      NFSMAPID_DOMAIN=<network_domain_where_BI_was_deployed>

      NFS Client (/etc/default/nfs)
      NFSMAPID_DOMAIN=<network_domain_where_BI_was_deployed>
  • Monitor the DNLC hit rate and tune the directory name look-up cache (DNLC).

      To monitor the DNLC hit rate, run "vmstat -s | grep cache" command. It is ideal to see a hit rate of 95% or above.

      Add the following tunable parameter to /etc/system on NFS server with a desired value for the DNLC cache.

      
              set ncsize=<desired_number>
      
      
  • Mounting NFS Share

    Mount the NFS share that contains the Presentation Services Catalog on all the NFS clients (BI nodes in this context) using the following mount options:

    
            rw, forcedirectio, nocto
    
    
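    Putting it together, a hypothetical client-side mount of the catalog share (the NFS server host name and mount point are placeholders) might look like the following; nfsstat -m can then be used to confirm the negotiated NFS version and the mount options.

    # mkdir -p /export/WebCat
    # mount -F nfs -o vers=3,rw,forcedirectio,nocto t5220-nfs:/BIshare/WebCat /export/WebCat

    # equivalent /etc/vfstab entry for a persistent mount (single line)
    t5220-nfs:/BIshare/WebCat  -  /export/WebCat  nfs  -  yes  vers=3,rw,forcedirectio,nocto

    # verify the mount options in effect
    # nfsstat -m /export/WebCat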

Oracle BI EE Cluster Deployment Recommendations

  • Ensure that all the BI components in the cluster are configured in a many-to-many fashion

  • For proper load balancing, configure all BI nodes to be identical in the BI Cluster

  • When planning to add an identically configured new node to the BI Cluster, simply clone an existing well-configured BI node running in a non-global zone.

      Cloning a BI node running in a dedicated zone results in an exact copy of the BI node being cloned. This approach is simple, less error prone and eliminates the need to configure the newly added node from scratch.
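
      A rough sketch of the cloning procedure is shown below, using the hypothetical zone names bi04 (an existing BI node) and bi05 (the new node). The source zone must be halted while it is being cloned, and node specific settings (zonepath, IP address, and the BI configuration inside the zone) still need to be adjusted for the new node.

      # export the configuration of the existing BI node and adapt it for the new one
      zonecfg -z bi04 export -f /tmp/bi05.cfg
      vi /tmp/bi05.cfg                          <-- change zonepath, IP address, etc.
      zonecfg -z bi05 -f /tmp/bi05.cfg

      # clone the installed zone and boot the new BI node
      zoneadm -z bi04 halt
      zoneadm -z bi05 clone bi04
      zoneadm -z bi05 boot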

Oracle BI Presentation Services Configuration Recommendations

  • Increase the file descriptor limit. Edit SAROOTDIR/setup/systunesrv.sh to increase the value from 1024 to any other value of your choice. In addition, you must increase the shell limit using the ulimit -n command.

    eg.,
    
    	ulimit -n 2048
    
    

  • Configure 256M large pages for the JVM heap of the Chart server and the OC4J web server (this recommendation is equally applicable to other web servers such as WebLogic or the Sun Java System Web Server). Also use parallel GC, and restrict the number of parallel GC threads to 1/8th of the number of virtual CPUs.

    eg.,

    
    	-XX:LargePageSizeInBytes=256M -XX:+UseParallelGC -XX:ParallelGCThreads=8
    
    

  • The Oracle BI Presentation Server keeps the access information of all the users in the Web Catalog. When there are a large number of unique BI users, it can take a significant amount of time to look up a user if all the users reside in a single directory. To avoid this, hash the user directories. This can be achieved by having the following entry in SADATADIR/web/config/instanceconfig.xml

    eg.,

    
    	<Catalog>
    		<HashUserHomeDirectories>2</HashUserHomeDirectories>
    	</Catalog>
    
    

    HashUserHomeDirectories specifies the number of characters to use to hash user names into sub directories. When this element is turned on, for example, the default name for user Steve's home directory would become /users/st/steve.

  • BI Server and BI Presentation Server processes create many temporary files while rendering reports and dashboards for a user. This can result in significant I/O activity on the system. The I/O waits can be minimized by pointing the temporary directories to a memory resident file system such as /tmp on Solaris OS. To achieve this, add the following line to the instanceconfig.xml configuration file.

    eg.,

    
    	<TempDir>/tmp/OracleBISAW</TempDir>
    
    

    Similarly the Temporary directory (SATEMPDIR) can be pointed to a memory resident file system such as /tmp to minimize the I/O waits.

  • Consider tuning the value of CacheMaxEntries in instanceconfig.xml. A value of 20,000 was used in the 50,000 user OBIEE benchmark on T5440 servers. Be aware that the Presentation Services process (sawserver64) consumes more virtual memory when this parameter is set to a high value.

    eg.,
    
    	<CacheMaxEntries>20000</CacheMaxEntries>
    
    
  • If the presentation services log contains errors such as "The queue for the thread pool AsyncLogon is at it's maximum capacity of 50 jobs.", consider increasing the Presentation Services' asynchronous job queue. 50 is the default value.

    The following example increases the job queue size to 200.

    
    	<ThreadPoolDefaults>
    		<AsyncLogon>
    			<MaxQueue>200</MaxQueue>
    		</AsyncLogon>
    	</ThreadPoolDefaults>
    
    
  • Increase the query cache expiry time, especially when the BI deployment is supposed to handle a large number of concurrent users. The default is 60 minutes. However, under very high loads a cache entry may be removed before one hour if many queries are being run. Hence it is necessary to tune the CacheMaxExpireMinutes parameter in Presentation Services' instanceconfig.xml.

    The following example increases the query cache expiry time to 3 hours.

    
    	<CacheMaxExpireMinutes>180</CacheMaxExpireMinutes>
    
    
  • Consider increasing the Presentation Services' cache timeout values to keep the cached data intact for longer periods.

    The following example increases the cache timeout values to 5 hours in instanceconfig.xml configuration file.

    
    	<AccountIndexRefreshSecs>18000</AccountIndexRefreshSecs>
    	<AccountCacheTimeoutSecs>18000</AccountCacheTimeoutSecs>
    	<CacheTimeoutSecs>18000</CacheTimeoutSecs>
    	<CacheCleanupSecs>18000</CacheCleanupSecs>
    	<PrivilegeCacheTimeoutSecs>18000</PrivilegeCacheTimeoutSecs>
    
    

Oracle BI Server Configuration Recommendations

  • Enable caching at the BI server, and control/tune the cache expiry time for each of the tables based on your organization's needs.

  • Unless the repository needs to be edited online frequently, consider setting up the "read only" mode for the repository. It may ease lock contention to some extent.

  • Increase the session limit and the number of requests per session limit, especially when the BI deployment is expected to handle a large number of concurrent users. Also increase the number of BI server threads.

    The following configuration was used in 50,000 user OBIEE benchmark on T5440 servers.

    
    (Source configuration file: NQSConfig.INI)
    
    [ CACHE ]
    ENABLE = YES;
    DATA_STORAGE_PATHS = "/export/oracle/OracleBIData/cache" 500 MB;
    MAX_ROWS_PER_CACHE_ENTRY = 0;
    MAX_CACHE_ENTRY_SIZE = 10 MB;
    MAX_CACHE_ENTRIES = 5000;
    POPULATE_AGGREGATE_ROLLUP_HITS = NO;
    USE_ADVANCED_HIT_DETECTION = NO;
    
    // Cluster-aware cache
    GLOBAL_CACHE_STORAGE_PATH = "/export/oracle/OracleBIsharedRepository/GlobalCacheDirectory" 2048 MB;
    MAX_GLOBAL_CACHE_ENTRIES = 10000;
    CACHE_POLL_SECONDS = 300;
    CLUSTER_AWARE_CACHE_LOGGING = NO;
    
    [ SERVER ]
    READ_ONLY_MODE = YES;
    MAX_SESSION_LIMIT = 20000 ;
    MAX_REQUEST_PER_SESSION_LIMIT = 1500 ;
    SERVER_THREAD_RANGE = 512-2048;
    SERVER_THREAD_STACK_SIZE = 0;
    DB_GATEWAY_THREAD_RANGE = 512-512;
    
    #SERVER_HOSTNAME_OR_IP_ADDRESSES = "ALLNICS";
    CLUSTER_PARTICIPANT = YES;
    
    

Related Blog Posts

Saturday Nov 15, 2008

Yet Another Siebel 8.0 PSPP Benchmark on Sun CMT Hardware ..

.. Sun SPARC Enterprise T5240.

(This blog entry also serves as a summary page for all the Siebel 8.0 benchmarks that Sun published so far.)

Yesterday Sun published a brand new 10,000 user Siebel 8.0 benchmark result using a combination of T5240 and T5120 servers. In this benchmark, a Sun SPARC Enterprise T5240 server equipped with two 1.2 GHz, 8-Core UltraSPARC T2 Plus processors served as the system under test on which we ran the Siebel Gateway and Enterprise application servers. Two Sun SPARC Enterprise T5120 servers equipped with one 1.2 GHz, 8-Core UltraSPARC T2 processor were configured to run the Oracle database and the Sun Java System Web servers.

A copy of the latest benchmark publication is available on Oracle Applications' benchmark web site at:
        Siebel CRM Release 8.0 Industry Applications and Oracle 10g R2 DB on Sun SPARC Enterprise T5120 & T5240 servers running Solaris 10

For some reason, the topology diagram in the benchmark publication document was messed up, especially the fonts -- probably due to the odt -> doc -> pdf conversion. A clean copy of the diagram is shown below.

Significance of the Siebel 8.0 on T5240 benchmark

In case anyone wonders why we need another Siebel 8.0 benchmark on CMT hardware, especially when we already published a couple of Siebel 8.0 benchmarks on T5220 and T5440 systems -- 10,000 users on T5120/T5220 and 14,000 users on T5440 -- the answer is simple: to show linear scalability.

In the first benchmark that Sun published in January 2008, we showed the scalability of the application, Siebel, on T5220 systems. We were able to scale up to 5,000 concurrent users on a single T5220 system (running the Siebel application servers) with a 1.2 GHz, 8-Core UltraSPARC T2 processor. We used two such systems to publish the 10,000 user benchmark in that first installment.

The goal of the second benchmark, which we published in October 2008 during the T5440 server launch, was to showcase how to consolidate multiple workloads on a T5440 server. We demonstrated it by deploying the whole Siebel Enterprise -- Sun Java System Web Server along with the Siebel Web Server plug-in Siebel Web Engine (SWE), Siebel Gateway Server, Siebel Application Server and the Oracle Database Server -- on a single Sun SPARC Enterprise T5440 server equipped with four 1.4 GHz, 8-Core UltraSPARC T2 Plus processors. We ran 14,000 concurrent virtual users against this setup to make it a benchmark publication. Since we ran all tiers of the Siebel Enterprise on the same box, it is hard to compare the scalability numbers from this benchmark against the numbers that we published in the 10,000 user benchmark on T5120/T5220 servers.

In April 2008, Sun launched its first multi-processor CMT system, the Sun SPARC Enterprise T5240. The T5240 holds two UltraSPARC T2 Plus processors and is supposed to exhibit 2x the performance[1] of a T5220. In other words, two T5220 servers can be consolidated onto a single T5240. To prove this, we re-ran the 10,000 user benchmark that we published back in January 2008 by replacing the two T5220 servers in the application tier with a T5240 server, keeping the remaining hardware configuration for the web and database servers intact. The results from this benchmark speak for themselves -- but for your convenience, here is a quick summary of the results.



#Users  #Units x Server Model  Business Transactions Throughput/hour  Projected Daily Transactions  Benchmark Publication URL & Date
10,000  2 x T5220              142,061                                 1,136,488                     10K/T5220, 01/2008
10,000  1 x T5240              141,205                                 1,129,640                     10K/T5240, 11/2008


If you are a Sun-Oracle customer, make sure to check the Siebel on Sun CMT hardware : Best Practices web page for some useful tips.

Related entries:

  1. Siebel 8.0 on Sun SPARC Enterprise T5440 - More Bang for the Buck!!
  2. Sun publishes 10,000 user Siebel 8.0 PSPP benchmark on Niagara 2 systems
  3. Siebel CRM 8.0 PSPP UltraSPARC T2 beats POWER6 and sets World Record


[1] There is no unique definition for the word 'performance' -- it has different meanings based on the context.

Monday Oct 13, 2008

Siebel on Sun CMT hardware : Best Practices

The following suggested best practices are applicable to all Siebel deployments on CMT hardware (Tx00, T5x20, T5x40) running Solaris 10 [Note: some of this tuning applies to Siebel running on conventional hardware running Solaris]. These recommendations are based on our observations from the 14,000 user benchmark on Sun SPARC Enterprise T5440. Your mileage may vary.

All Tiers
  • Ensure that the system's firmware is up-to-date.

  • Upgrade to the latest update release of Solaris 10.

      Note to customers running Siebel on Solaris 10 5/08: apply kernel patch 137137-07 as soon as it is available on the sunsolve.sun.com web site. Patch 137137-07 and later revisions, as well as Solaris 10 10/08, will have the workaround for a critical Siebel specific bug. Oracle Corporation will eventually fix the bug in their codebase -- in the meantime, Solaris is covering for Siebel and all other 32-bit applications with their own memory allocators that return unaligned mutexes. Check the RFE 6729759 Need to accommodate non-8-byte-aligned mutexes and Oracle's Siebel support document 735451.1 Do NOT apply Kernel Patch 137111-04 on Solaris 10 for more details.


  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 will use a maximum of 4M pages even when 256M pages are a good fit.

      256M pages can be enabled with the following /etc/system tunable.
      * 256M pages
      set max_uheap_lpsize=0x10000000


  • Proactively avoid running into stdio's 256 file descriptor limitation.

      Set the following in a shell or add the following lines to the shell's profile (bash/ksh).
      ulimit -n 2048
      export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1:$LD_PRELOAD_32

      Technically the file descriptor limit can be set to as high as 65536. However from the application's perspective, 2048 is a reasonable limit.


  • Improve scalability with an MT-hot memory allocation library such as libumem or mtmalloc.

    To improve the scalability of multi-threaded workloads, preload an MT-hot, object-caching memory allocation library such as libumem(3LIB) or mtmalloc(3MALLOC).

      eg., To preload the libumem library, set the LD_PRELOAD_32 environment variable in the shell (bash/ksh) as shown below.

      export LD_PRELOAD_32=/usr/lib/libumem.so.1:$LD_PRELOAD_32

      Web and the Application servers in the Siebel Enterprise stack are 32-bit. However Oracle 10g or 11g RDBMS on Solaris 10 SPARC is 64-bit. Hence the path to the libumem library in the PRELOAD statement differs slightly in the database-tier as shown below.

      export LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so.1:$LD_PRELOAD_64

    Be aware that the trade-off is the increase in memory footprint -- you may notice 5 to 20% increase in the memory footprint with one of these MT-hot memory allocation libraries preloaded. Also not every Siebel application module benefits from MT-hot memory allocators. The recommendation is to experiment before implementing in production environments.
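
      One way to double-check that the preloaded allocator actually made it into the Siebel processes is pldd, which lists the dynamic libraries mapped into a running process. The pgrep pattern below (siebmtshmw, the multi-threaded Siebel server process) is only an example -- substitute the process you care about.

      pldd `pgrep siebmtshmw | head -1` | grep libumem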

  • TCP/IP tunables

    The application fared well with the following set of TCP/IP parameters on Solaris 10 5/08.

    ndd -set /dev/tcp tcp_time_wait_interval 60000
    ndd -set /dev/tcp tcp_conn_req_max_q 1024
    ndd -set /dev/tcp tcp_conn_req_max_q0 4096
    ndd -set /dev/tcp tcp_ip_abort_interval 60000
    ndd -set /dev/tcp tcp_keepalive_interval 900000
    ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
    ndd -set /dev/tcp tcp_rexmit_interval_max 10000
    ndd -set /dev/tcp tcp_rexmit_interval_min 3000
    ndd -set /dev/tcp tcp_smallest_anon_port 1024
    ndd -set /dev/tcp tcp_slow_start_initial 2
    ndd -set /dev/tcp tcp_xmit_hiwat 799744
    ndd -set /dev/tcp tcp_recv_hiwat 799744
    ndd -set /dev/tcp tcp_max_buf  8388608
    ndd -set /dev/tcp tcp_cwnd_max  4194304
    ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
    ndd -set /dev/udp udp_xmit_hiwat 799744
    ndd -set /dev/udp udp_recv_hiwat 799744
    ndd -set /dev/udp udp_max_buf 8388608
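
    Keep in mind that ndd settings do not survive a reboot. One common approach on Solaris 10 is to re-apply them at boot time from a legacy rc script or an SMF service; the script name below is hypothetical.

    # cat /etc/rc2.d/S99ndd_tcp_tune
    #!/sbin/sh
    # re-apply the TCP/IP tunables listed above at boot time
    ndd -set /dev/tcp tcp_time_wait_interval 60000
    ndd -set /dev/tcp tcp_conn_req_max_q 1024
    ndd -set /dev/tcp tcp_conn_req_max_q0 4096
    # (remaining ndd settings from the list above)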

Siebel Application Tier
  • All T-series systems (T1000/T2000, T5120/T5220, T5140/T5240, T5440) support the 256M page size. However, Siebel's siebmtshw script restricts the page size to 4M. Comment out the following lines in $SIEBEL_HOME/siebsrvr/bin/siebmtshw.
      # This will set 4M page size for Heap and 64 KB for stack
      MPSSHEAP=4M
      MPSSSTACK=64K
      MPSSERRFILE=/tmp/mpsserr
      LD_PRELOAD=/usr/lib/mpss.so.1
      export MPSSHEAP MPSSSTACK MPSSERRFILE LD_PRELOAD

  • Experiment with a smaller number of Siebel Object Managers.

      Configure the Object Managers in such a way that each OM handles at least 200 active users. Siebel's standard recommendation of 100 or fewer users per Object Manager is suitable for conventional systems, but it is not ideal for CMT systems like the Tx000, T5x20, T5x40 and T5440. Sun's CMT systems are ideal for running multi-threaded processes with tons of LWPs per process. Besides, there will be a significant improvement in the overall memory footprint with fewer Siebel Object Managers.

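      For reference, Object Manager sizing is normally adjusted through the MaxTasks, MaxMTServers and MinMTServers component parameters (via srvrmgr or the Siebel administration views). The numbers below are purely illustrative -- 1,000 tasks spread over 5 OM processes works out to roughly 200 users per process -- and SCCObjMgr_enu is simply the usual alias of the Call Center Object Manager.

      srvrmgr> change param MaxTasks=1000,MaxMTServers=5,MinMTServers=5 for comp SCCObjMgr_enu
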
  • Try Oracle 11g R1 client in the application-tier. Oracle 10g R2 clients may crash under high load. For the symptoms of the crash, check Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0.

      Oracle 10g R2 10.2.0.4 32-bit client is supposed to have a fix for the process crash issue - however it wasn't verified in our test environment.


Siebel Database Tier
  • Eliminate double buffering by forcing the file system to use direct I/O.

    Oracle database caches the data in its own cache within the shared global area (SGA), known as the database block buffer cache. Database reads and writes are cached in the block buffer cache so that subsequent accesses to the same blocks do not need to re-read the data from the operating system. On the other hand, file systems on Solaris default to reading the data through the global file system cache for improved I/O operations. That is, by default each read is potentially cached twice -- one copy in the operating system's file system cache, and the other copy in Oracle's block buffer cache. In addition to double caching, there is also some extra CPU overhead for the code which manages the operating system's file system cache. The solution is to eliminate double caching by forcing the file system to bypass the OS file system cache when reading from and writing to the disk.

      In the 14,000 user benchmark setup, the UFS file systems (holding the data files and the redo logs) were mounted with the forcedirectio option.

      eg.,
      mount -o forcedirectio /dev/dsk/<partition> <mountpoint>


  • Store the data files separately from the redo log files -- if the data files and the redo log files are stored on the same disk drive and that disk drive fails, the files cannot be used in the database recovery procedures.

      In the 14,000 user benchmark setup, two Sun StorageTek 2540 arrays were connected to the T5440 -- one array held the data files, whereas the other held the Oracle redo log files.

  • Size online redo logs to control the frequency of log switches.

      In the 14,000 user benchmark setup, two online redo logs were configured, each with 10 GB of disk space. When all 14,000 concurrent users are online, there is only one log switch in a 60 minute period.

  • If the storage array supports the read-ahead feature, enable it. When 'read-ahead enabled' is set to true, the write will be committed to the cache as opposed to the disk, and the OS signals the application that the write has been committed.


    Oracle Database Initialization Parameters

  • Set Oracle's initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to an appropriate value. The DB_FILE_MULTIBLOCK_READ_COUNT parameter specifies the maximum number of blocks read in one I/O operation during a sequential scan.

      In the 14,000 user benchmark configuration, DB_BLOCK_SIZE was set to 8 kB. During the benchmark run, the average reads were around 18.5 kB per second. Hence setting DB_FILE_MULTIBLOCK_READ_COUNT to a high value does not necessarily improve the I/O performance. A value of 8 for the database init parameter DB_FILE_MULTIBLOCK_READ_COUNT seemed to perform better.


  • On T5240 and T5440 servers, set the database initialization parameter CPU_COUNT to 64. Otherwise, by default Oracle RDBMS assumes 128 and 256 for the CPU_COUNT on T5240 and T5440 respectively. Oracle's optimizer might use a completely different execution plan when it notices such a large number for the CPU_COUNT; and the resulting execution plan need not necessarily be an optimal one. In the 14,000 user benchmark, setting CPU_COUNT to 64 produced optimal execution plans.


  • On T5240 and T5440 servers, explicitly set the database initialization parameter _enable_NUMA_optimization to FALSE. On these multi-socket servers, _enable_NUMA_optimization will be set to TRUE by default. During the 14,000 user benchmark run, we noticed intermittent shadow process crashes with the default behavior. We didn't realize any additional gains either with the default NUMA optimizations.
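
    For convenience, the database initialization parameters discussed above can be summarized in a small init.ora/spfile excerpt. The layout below is illustrative; the block size line simply restates the value mentioned earlier, and the exact values should be validated against your own workload.

    # init.ora excerpt (illustrative)
    db_block_size=8192
    db_file_multiblock_read_count=8
    cpu_count=64
    _enable_NUMA_optimization=FALSE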

Siebel Web Tier
  • Upgrade to the latest service pack of Sun Java Web Server 6.1 (32-bit).

  • Run the Sun Java Web Server in multi-process mode by setting the MaxProcs directive in magnus.conf to a value that is greater than 1. In the multi-process mode, the web server can handle requests using multiple processes with multiple threads in each process.

      When you specify a value greater than 1 for the MaxProcs, the web server relies on the operating system to distribute connections among/between multiple web server processes. However many modern operating systems including Solaris do not distribute connections evenly, particularly when there are a small number of concurrent connections.

  • Tune the maximum number of simultaneous requests by setting the RqThrottle parameter in the magnus.conf file to an appropriate value. A value of 1024 was used in the 14,000 user benchmark.

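    The two magnus.conf directives mentioned above might look like the following excerpt; the MaxProcs value of 4 is only an example, while RqThrottle 1024 is the value used in the 14,000 user benchmark.

    # magnus.conf excerpt (illustrative)
    MaxProcs 4
    RqThrottle 1024
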
Siebel 8.0 on Sun SPARC Enterprise T5440 - More Bang for the Buck!!

Today Sun announced the 14,000 user Siebel 8.0 PSPP benchmark results on a single Sun SPARC Enterprise T5440. An Oracle white paper with Sun's 14,000 user benchmark results is available on Oracle's Siebel benchmark web site. The content in this blog post complements the benchmark white paper.

Some of the notes and highlights from this competitive benchmark are as follows:

  • Key specifications for the Sun SPARC Enterprise T5440 system under test, are: 4 x UltraSPARC T2 Plus processors, 32 cores, 256 compute threads and 128 GB of memory in a 4RU space.

  • The entire Siebel 8.0 solution was deployed on a single Sun SPARC Enterprise T5440 including the web, gateway, application, and database servers.

      Nine load driver clients with dual-core Opteron and Xeon processors were used to generate the 14,000 concurrent user load.

  • The Web, Application and Database servers were isolated from each other by creating three Solaris Containers (non-global zones, or local zones), one dedicated to each of those servers.

      The Solaris 10 Binary Application Guarantee Program guarantees binary compatibility for all applications running under native Solaris host operating system environments as well as under Solaris 10 running as a guest operating system in a virtualized platform environment.

  • The Siebel Gateway server and the Siebel Application servers were installed and configured in one of the three Solaris Containers. Two identical Siebel Application server instances were configured, with each instance handling a 7,000 user load.

      From our experiments with the Siebel 8.0 benchmark workload, it appears that a single instance of the Siebel Application server could scale up to 10,000 active users. The Siebel Connection Broker (SCBroker) component becomes the bottleneck at peak load in a single instance of the Siebel Application server.

  • To keep it simple, the benchmark publication white paper limits itself to an overview of the system configuration. The full details are available in the diagram below.

    Topology Diagram



    The breakdown of the approximate averages of CPU and memory utilization by each tier is shown below.

    Tier    CPU    Memory
    Web     78%    4.50 GB
    App     76%    69.00 GB
    DB      72%    20.00 GB

    System-wide averages are as follows:

    Tier              CPU    Memory
    Web + App + DB    82%    93.5 GB


  • 1276 watts is the average power consumption when all the 14,000 concurrent users are in the steady state of the benchmark test. That is, in the case of similarly configured workloads, T5440 supports 10.97 users per watt of the power consumed; and supports 3500 users per rack unit.

Based on the above notes, the Sun SPARC Enterprise T5440 is inexpensive, requires less power and a smaller data center footprint, is ideal for consolidation and, equally importantly, scales well.



Vendor-to-Vendor comparison

How does our new 14,000 user benchmark result compare with the high watermark benchmark results published by other vendors using the same Siebel 8.0 PSPP workload?

Besides Sun, IBM and HP are the only other vendors who have published benchmark results so far with the Siebel 8.0 PSPP benchmark workload. IBM's highest user count is 7,000, whereas HP's is 5,200. Here is a quick comparison of the throughputs based on the results published by Sun, IBM and HP with the highest number of active users.

Sun Microsystems' 14,000 user benchmark on a single T5440 outperformed:

  • IBM's 7,000 user benchmark result by 1.9x

  • HP's 5,200 user benchmark result by 2.5x
      HP published the 5,200 user result with a combination of 2 x BL460c running Windows Server 2003 and 1 x rx6600 HP system running HP-UX.

  • Sun's own 10,000 user benchmark result on a combination of 2 x T5120 and 2 x T5220s by 1.4x

From the operating system perspective, Solaris outperformed AIX, Windows Server 2003 and HP-UX. Linux is nowhere to be found in the competitive landscape.

A simple comparison of all the published Siebel 8.0 benchmark results (as of today) by all vendors justifies the title of this blog post. As IBM and HP do not post the list prices of all of their servers, I am not even attempting to show a price/performance comparison here. On the other hand, Sun openly lists all the list prices at the store.sun.com web site.

CAUTION

Although T5440 possesses a ton of great qualities, it might not be suitable for deploying workloads with heavy single-threaded dependencies. The T5440 is an excellent hardware platform for multi-threaded, and moderately single-threaded/multi-process workloads. When in doubt, it is a good idea to leverage Sun Microsystems' Try & Buy program to try the workloads on this new and shiny T5440 before making the final call.



I would like to share the tuning information from the OS and the underlying hardware perspective for a couple of reasons: 1. Oracle's benchmark white paper does not include any of the system specific tuning information, and 2. it may take quite a bit of time for Oracle Corporation to update the Siebel Tuning Guide for Solaris with some of the tuning information that you find here.

Check the second part of this blog post for the best practices running Siebel on Sun CMT hardware.

Tuesday Apr 08, 2008

Running Batch Workloads on Sun's CMT Servers

Ever since Sun introduced Chip Multi-Threading (CMT) hardware in the form of the UltraSPARC T1 based T1000/T2000, our internal mail aliases have been inundated with a variety of customer stories, the majority of which go something like 'batch jobs are taking 12+ hours on a T2000, whereas they take only 3 or 4 hours on a US-IV+ based v490'. Even two and a half years after the introduction of this revolutionary CMT hardware, it appears that the majority of Sun customers are still under the impression that Sun's CMT systems like the T2000 and T5220 are not capable of handling CPU intensive batch workloads. That is not a valid concern. CMT processors like the UltraSPARC T1, T2 and T2 Plus can handle batch workloads just as well as any other traditional/conventional processor, viz. UltraSPARC-IV+, SPARC64-VI, AMD Opteron, Intel Xeon, IBM POWER6. However, CMT awareness and a little effort are required at the customer end to achieve good throughput on CMT systems.

First of all, end users must realize that the maximum clock speed of the existing CMT processor line-up (UltraSPARC T1, UltraSPARC T2, UltraSPARC T2 Plus) is only 1.4 GHz; and on top of that, each strand (individual hardware thread) within a core shares the CPU cycles with the other strands that operate on the same core (note: each core operates at the speed of the processor). Based on these facts, it is no surprise to see batch jobs taking longer to complete when only one or a very few single-threaded batch jobs are submitted to the system. In such cases, the system resources are fairly under-utilized in addition to the longer elapsed times. One possible trick to achieve the required throughput in the expected time frame is to split up the workload into multiple jobs. For example, if an EDU customer needs to generate 1,000 transcripts, the customer should consider submitting 4 individual jobs with 250 transcripts each, or 8 jobs with 125 transcripts each, rather than submitting one job for all 1,000 transcripts. Ideally the customer should observe the resource utilization (CPU%, for example) and experiment with the number of jobs submitted until the system achieves the desired throughput within the expected time frame.

Case study: Oracle E-Business Suite Payroll 11i workload on Sun SPARC Enterprise T5220

In order to prove beyond a reasonable doubt that the aforementioned methodology works, let's take Oracle's E-Business Suite 11.5.10 Payroll workload as an example. On a single T5220 with one 1.4 GHz UltraSPARC T2 processor, acting as the batch, application and database server, 4 payroll threads generated 5,000 paychecks in 31.53 minutes while consuming only 6.04% CPU on average. ~9,500 paychecks is the projected hourly throughput. This is a classic example of what the majority of Sun's CMT customers are experiencing as of today, i.e., longer batch processing times with little resource consumption. Keep in mind that each UltraSPARC T2 and UltraSPARC T2 Plus processor can execute up to 64 jobs in parallel (on a side note, the UltraSPARC T1 processor can execute up to 32 jobs in parallel). So to put the idling resources to effective use, and thereby to improve the elapsed times and the overall throughput, a few experiments were conducted with 64 payroll threads, and the results are very impressive. With a maximum of 64 payroll threads, it took only 4.63 minutes to process 5,000 paychecks at an average of 40.77% CPU utilization. In other words, a similarly configured T5220 can process ~64,700 paychecks per hour at less than half of the available CPU cycles. Here is a word of caution: just because the processor can execute 64 threads in parallel, it doesn't mean it is always optimal to submit 64 parallel jobs on systems like the T5220. A very high number of batch jobs (payroll threads in this particular scenario) might be an overkill for simple tasks like the NACHA component of the Payroll process.

The following white paper has more detailed information about the nature of the workload and the results from the experiments with various number of threads for different components of the Oracle Applications' Payroll batch workload. Refer to the same white paper for the exact tuning information as well.

Link to the white paper:
     E-Business Suite Payroll 11i (11.5.10) using Oracle 10g on a Sun SPARC Enterprise T5220

Here is the summary of the results that were extracted from the white paper:

Hardware configuration
          1x Sun SPARC Enterprise T5220 for running the application, batch and the database servers
              Specifications: 1x 1.4 GHz 8-core UltraSPARC T2 processor with 64 GB memory

Software configuration
          Oracle E-Business Suite 11.5.10
          Oracle 10g R1 10.1.0.4 RDBMS
          Solaris 10 8/07

Results
Oracle E-Business Suite 11i Payroll - Number of employees: 5,000
Component           #Threads  Time (min)  Avg. CPU%  Hourly Throughput
Payroll process     64        1.87        90.56      160,714
PrePayments         64        0.20        46.33      1,500,000
Ext. Proc. Archive  64        1.90        90.77      157,895
NACHA               8         0.05        2.52       6,000,000
Check Writer        24        0.38        9          782,609
Costing             48        0.23        32.5       1,285,714
Total or Average    NA        4.63        40.77      64,748

It is evident from the average CPU% that the Payroll process and the External Process Archive components are extremely CPU intensive, and hence take longer to complete. That is why 64 threads were configured for those components, to run the system at its full potential. Light-weight components like NACHA need fewer threads to complete the job efficiently. Configuring 64 threads for NACHA would have a negative impact on the throughput. In other words, we would be wasting CPU cycles for no apparent improvement.

It is the responsibility of the customers to tune the application and the workload appropriately. One size doesn't fit all.

The Payroll 11i results on the T5220 demonstrate clearly that Sun's CMT systems are capable of handling batch workloads well. It would be interesting to see how well they perform against other systems equipped with traditional processors with higher clock speeds. For this comparison, we can use a couple of results that were published by UNISYS and IBM with the same workload. The following table summarizes the results from the two white papers listed below. For the sake of completeness, Sun's CMT results are included as well.

Source URLs:
  1. E-Business Suite Payroll 11i (11.5.10) using Oracle 10g on a UNISYS ES7000/one Enterprise Server
  2. E-Business Suite Payroll 11i (11.5.10) using Oracle 10g for Novell SUSE Linux on IBM eServer xSeries 366 Servers
Oracle E-Business Suite 11i Payroll - Number of employees: 5,000
UNISYS -- Linux: RHEL 4 Update 3
    DB/App/Batch server: 1x Unisys ES7000/one Enterprise Server (4x 3.0 GHz Dual-Core Intel Xeon 7041 processors, 32 GB memory)
    #Threads: 12 [1]    Time: 5.18 min    Avg. CPU: 53.22%    Hourly Throughput: 57,915

IBM -- Novell SUSE Linux Enterprise Server 9 SP1
    DB, App servers: 2x IBM eServer xSeries 366 4-way server (4x 3.66 GHz Intel Xeon MP Processors (EM64T), 32 GB memory)
    #Threads: 12    Time: 8.42 min    Avg. CPU: 50+% [2]    Hourly Throughput: 35,644

Sun -- Solaris 10 8/07
    DB/App/Batch server: 1x Sun SPARC Enterprise T5220 (1x 1.4 GHz 8-core UltraSPARC T2 processor, 64 GB memory)
    #Threads: 8 to 64    Time: 4.63 min    Avg. CPU: 40.77%    Hourly Throughput: 64,748

The results speak for themselves: one 1.4 GHz UltraSPARC T2 processor outperformed four 3 GHz / 3.66 GHz processors in terms of the average CPU utilization and, most importantly, in the hourly throughput (the hourly throughput calculation relies on the total elapsed time).

Before we conclude, let us reiterate a few things purely based on the factual evidence presented in this blog post:

  • Sun's CMT servers like T2000, T5220, T5240 (two socket system with UltraSPARC T2 Plus processors) are good to run batch workloads like Oracle Applications Payroll 11i

  • Sun's CMT servers like T2000, T5220, T5240 are good to run the Oracle 10g RDBMS when the DML/DDL/SQL statements that make up the majority of the workload are not very complex, and

  • When the application is tuned appropriately, the performance of CMT processors can outperform some of the traditional processors that were touted to deliver the best single thread performance

Footnotes

1. There is a note in the UNISYS/Payroll 11i white paper that says "[...] the gains {from running increased numbers of threads} decline at higher numbers of parallel threads." This is quite contrary to what Sun observed in its Payroll 11i experiments on the UltraSPARC T2 based T5220. A higher number of parallel threads (maximum: 64) improved the throughput on the T5220, whereas UNISYS' observation is based on their experiments with a maximum of 12 parallel threads. Moral of the story: do NOT treat all hardware alike.

2. IBM's Payroll 11i white paper has no references to the average CPU numbers. 50+% was derived from the "Figure 3: Average CPU Utilization".
