Monday Aug 17, 2009

Oracle Business Intelligence on Sun : Few Best Practices

(Updated on 10/16/09 with additional content; the entry was also restructured for clarity and easier navigation)

The following suggested best practices are applicable to all Oracle BI EE deployments on Sun hardware (CMT and M-class) running Solaris 10 or later. These recommendations are based on our observations from the 50,000 user benchmark on the Sun SPARC Enterprise T5440. This is not a complete list, and your mileage may vary.

Hardware : Firmware

Ensure that the system's firmware is up-to-date.

Solaris Recommendations

  • Upgrade to the latest update release of Solaris 10.

  • Solaris runs in 64-bit mode by default on the SPARC platform. Consider running 64-bit BI EE on Solaris.

      The 64-bit BI EE platform is not subject to the 4 GB virtual memory limit of the 32-bit BI EE platform, so it can potentially support more users and larger caches, as long as the hardware resources are available.

  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 will use a maximum of 4M pages even when 256M pages are a good fit.

      256M pages can be enabled with the following /etc/system tunables.
      
      * 256M pages for the process heap
      set max_uheap_lpsize=0x10000000
      
      * 256M pages for ISM
      set mmu_ism_pagesize=0x10000000
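      
      These settings take effect after a reboot. To verify that the BI processes are actually getting 256M pages, run pagesize and pmap against a running BI process (the process id below is a placeholder):
      
      # pagesize -a
      # pmap -xs <pid_of_nqsserver> | grep 256M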
      
      

  • Increase the file descriptor limits by adding the following lines to /etc/system on all BI nodes.
      
      * file descriptor limits
      set rlim_fd_cur=65536
      set rlim_fd_max=65536
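      
      After a reboot, the new limit of a running BI process can be verified with the plimit command (the process id is a placeholder):
      
      # plimit <pid_of_nqsserver> | grep nofiles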
      
      
  • On larger systems with more CPUs or CPU cores, try not to deploy Oracle BI EE in the global zone.

      In our benchmark testing, we have observed unpredictable and abnormal behavior of the BI server process (nqsserver) in the global zone under moderate loads. This behavior is clearly noticeable when there are more than 64 vcpus allocated to the global zone.

  • If the BI presentation catalog is stored on a local file system, create a ZFS file system to hold the catalog.

      If there are more than 25,000 authorized users in a BI deployment, the default UFS file system may run into the "Too many links" error when the Presentation Server tries to create more than 32,767 sub-directories (refer to LINK_MAX on Solaris).

  • Store the Presentation Catalog on a disk with faster I/O, such as a Solid State Drive (SSD). For uniform reads and writes across the disk drives [ and of course for better performance ], we recommend creating a ZFS file system on top of a zpool backed by multiple SSDs.

    Here is an example that shows the ZFS file system creation steps for the BI Presentation Catalog.

    
    # zpool create -f BIshare c1t2d0s6 c1t3d0s0 c1t4d0s6 c1t5d0s6
    
    # zpool list
    NAME      SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
    BIshare   118G    97K   118G     0%  ONLINE  -
    
    # zfs create BIshare/WebCat
    
    # fstyp /dev/dsk/c1t2d0s6
    zfs
    
    # zpool status -v
      pool: BIshare
     state: ONLINE
     scrub: none requested
    config:
    
            NAME        STATE     READ WRITE CKSUM
            BIshare     ONLINE       0     0     0
              c1t2d0s6  ONLINE       0     0     0
              c1t3d0s0  ONLINE       0     0     0
              c1t4d0s6  ONLINE       0     0     0
              c1t5d0s6  ONLINE       0     0     0
    
    errors: No known data errors
    
    

     Observe the I/O activity on the ZFS file system by running the zpool iostat -v command.
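
     For example, to sample the pool created above every 5 seconds:

     # zpool iostat -v BIshare 5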

Solaris : ZFS Recommendations

  • If the file system is mainly used for storing the Presentation Catalog, consider setting the ZFS record size to 8K. This is because the reads and writes from/to the BI Catalog are relatively small (8K or less).

    eg.,
    
            # zfs set recordsize=8K BIshare/WebCat
    
    

     In the case of a database, set the ZFS record size to match the database block size.
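
     For example, for a database with an 8K block size (the pool and file system names below are hypothetical):

             # zfs set recordsize=8K dbpool/oradata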

  • Even though disabling the ZFS Intent Log (ZIL) may improve the performance of synchronous write operations, disabling it is not a recommended practice because it may compromise data integrity.

      Disabling the ZIL on an NFS server can lead to client-side corruption.

  • When running CPU intensive workloads, consider disabling ZFS metadata compression to free up CPU cycles for the application.

      Starting with Solaris 10 11/06, metadata compression can be disabled and enabled dynamically as shown below.

      To disable the metadata compression:

      
              # echo zfs_mdcomp_disable/W0t1 | mdb -kw
      
      

      To enable the metadata compression:

      
              # echo zfs_mdcomp_disable/W0t0 | mdb -kw
      
      

      To permanently disable the metadata compression, set the following /etc/system tunable.

      
              set zfs:zfs_mdcomp_disable=1
      
      

Solaris : NFS Recommendations

One of the requirements of OBIEE is that the BI Presentation Catalog must be shared across the BI nodes in the BI Cluster; there is only one copy of the presentation catalog. Unless the catalog is replicated on every node, there is no choice but to share it across the nodes. One way to do this is to create an NFS share for the top-level directory of the catalog, and then mount it over NFS on each BI node.
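
For example, assuming the catalog resides on the ZFS file system created earlier, it can be exported from the node that owns it with a single ZFS property (the recommended client-side mount options are covered later in this section):

        # zfs set sharenfs=on BIshare/WebCat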

  • Version 4 is the default NFS version on Solaris 10. However, as of this writing NFS v4 does not appear to be as mature as v3, so we recommend experimenting with both versions to see which one better fits the needs of the BI deployment.

    To enable NFS v3 on both the server and the client, edit /etc/default/nfs and make the changes shown below.

      NFS Server
      NFS_SERVER_VERSMIN=3
      NFS_SERVER_VERSMAX=3

      NFS Client
      NFS_CLIENT_VERSMIN=3
      NFS_CLIENT_VERSMAX=3
  • Experiment with the following NFS tunables.

      NFS Server
      
      NFSD_SERVERS=<desired_number> <-- on CMT systems with a large number of hardware threads, you can go as high as 512
      NFS_SERVER_DELEGATION=[ON|OFF] <-- ON is the default. Experiment with OFF
      NFSMAPID_DOMAIN=<network_domain_where_BI_was_deployed>
      
      
      NFS Client
      NFSMAPID_DOMAIN=<network_domain_where_BI_was_deployed>
  • Monitor the DNLC hit rate and tune the directory name look-up cache (DNLC).

      To monitor the DNLC hit rate, run the "vmstat -s | grep cache" command. Ideally, the hit rate should be 95% or above.

      Add the following tunable parameter to /etc/system on the NFS server with the desired size for the DNLC.

      
              set ncsize=<desired_number>
      
      
  • Mounting NFS Share

    Mount the NFS share that contains the Presentation Services Catalog on all the NFS clients (BI nodes in this context) using the following mount options:

    
            rw, forcedirectio, nocto
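
            # hypothetical mount invocation; the server name and mount point are placeholders
            mount -F nfs -o rw,forcedirectio,nocto binode1:/BIshare/WebCat /export/webcat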
    
    

Oracle BI EE Cluster Deployment Recommendations

  • Ensure that all the BI components in the cluster are configured in a many-to-many fashion.

  • For proper load balancing, configure all the nodes in the BI Cluster identically.

  • When planning to add an identically configured new node to the BI Cluster, simply clone an existing well-configured BI node running in a non-global zone.

      Cloning a BI node running in a dedicated zone results in an exact copy of the BI node being cloned. This approach is simple, less error-prone, and eliminates the need to configure the newly added node from scratch.
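
      A rough sketch of the cloning steps on Solaris 10 follows (the zone names and the configuration file path are placeholders; the source zone must be halted before it can be cloned):

      # zonecfg -z bizone1 export -f /tmp/bizone1.cfg
        (edit the zonepath, network settings and other node-specific properties in the file)
      # zonecfg -z bizone2 -f /tmp/bizone1.cfg
      # zoneadm -z bizone1 halt
      # zoneadm -z bizone2 clone bizone1
      # zoneadm -z bizone2 boot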

Oracle BI Presentation Services Configuration Recommendations

  • Increase the file descriptor limit. Edit SAROOTDIR/setup/systunesrv.sh to increase the value from the default of 1024 to a value of your choice. In addition, increase the shell limit using the ulimit -n command.

    eg.,
    
    	ulimit -n 2048
    
    

  • Configure 256M large pages for the JVM heap of the Chart server and the OC4J web server (this recommendation is equally applicable to other web servers such as WebLogic or Sun Java System Web Server). Also use the parallel garbage collector, and restrict the number of parallel GC threads to one-eighth of the number of virtual CPUs.

    eg.,

    
    	-XX:LargePageSizeInBytes=256M -XX:+UseParallelGC -XX:ParallelGCThreads=8
    
    

  • The Oracle BI Presentation Server keeps the access information of all the users in the Web Catalog. When there is a large number of unique BI users, looking up a user can take a significant amount of time if all the users reside in a single directory. To avoid this, hash the user directories by adding the following entry to SADATADIR/web/config/instanceconfig.xml.

    eg.,

    
    	<Catalog>
    		<HashUserHomeDirectories>2</HashUserHomeDirectories>
    	</Catalog>
    
    

    HashUserHomeDirectories specifies the number of characters to use to hash user names into sub directories. When this element is turned on, for example, the default name for user Steve's home directory would become /users/st/steve.

  • The BI Server and BI Presentation Server processes create many temporary files while rendering reports and dashboards for a user, which can result in significant I/O activity on the system. The I/O waits can be minimized by pointing the temporary directories to a memory-resident file system such as /tmp on Solaris. To achieve this, add the following line to the instanceconfig.xml configuration file.

    eg.,

    
    	<TempDir>/tmp/OracleBISAW</TempDir>
    
    

     Similarly, the temporary directory (SATEMPDIR) can be pointed to a memory-resident file system such as /tmp to minimize I/O waits.

  • Consider tuning the value of CacheMaxEntries in instanceconfig.xml. A value of 20,000 was used in the 50,000 user OBIEE benchmark on T5440 servers. Be aware that the Presentation Services process (sawserver64) consumes more virtual memory when this parameter is set to a high value.

    eg.,
    
    	<CacheMaxEntries>20000</CacheMaxEntries>
    
    
  • If the Presentation Services log contains errors such as "The queue for the thread pool AsyncLogon is at it's maximum capacity of 50 jobs.", consider increasing the size of the Presentation Services' asynchronous job queue. The default queue size is 50.

    The following example increases the job queue size to 200.

    
    	<ThreadPoolDefaults>
    		<AsyncLogon>
    			<MaxQueue>200</MaxQueue>
    		</AsyncLogon>
    	</ThreadPoolDefaults>
    
    
  • Increase the query cache expiry time, especially when the BI deployment is expected to handle a large number of concurrent users. The default is 60 minutes; however, under very high loads a cache entry may be removed sooner than one hour if many queries are being run. Hence it may be necessary to tune the CacheMaxExpireMinutes parameter in the Presentation Services' instanceconfig.xml.

    The following example increases the query cache expiry time to 3 hours.

    
    	<CacheMaxExpireMinutes>180</CacheMaxExpireMinutes>
    
    
  • Consider increasing the Presentation Services' cache timeout values to keep the cached data intact for longer periods.

    The following example increases the cache timeout values to 5 hours in instanceconfig.xml configuration file.

    
    	<AccountIndexRefreshSecs>18000</AccountIndexRefreshSecs>
    	<AccountCacheTimeoutSecs>18000</AccountCacheTimeoutSecs>
    	<CacheTimeoutSecs>18000</CacheTimeoutSecs>
    	<CacheCleanupSecs>18000</CacheCleanupSecs>
    	<PrivilegeCacheTimeoutSecs>18000</PrivilegeCacheTimeoutSecs>
    
    

Oracle BI Server Configuration Recommendations

  • Enable caching at the BI Server, and control/tune the cache expiry time for each of the tables based on your organization's needs.

  • Unless the repository needs to be edited online frequently, consider setting up the repository in "read only" mode. It may ease lock contention to some extent.

  • Increase the session limit and the number of requests per session, especially when the BI deployment is expected to handle a large number of concurrent users. Also increase the number of BI Server threads.

    The following configuration was used in the 50,000 user OBIEE benchmark on T5440 servers.

    
    (Source configuration file: NQSConfig.INI)
    
    [ CACHE ]
    ENABLE = YES;
    DATA_STORAGE_PATHS = "/export/oracle/OracleBIData/cache" 500 MB;
    MAX_ROWS_PER_CACHE_ENTRY = 0;
    MAX_CACHE_ENTRY_SIZE = 10 MB;
    MAX_CACHE_ENTRIES = 5000;
    POPULATE_AGGREGATE_ROLLUP_HITS = NO;
    USE_ADVANCED_HIT_DETECTION = NO;
    
    // Cluster-aware cache
    GLOBAL_CACHE_STORAGE_PATH = "/export/oracle/OracleBIsharedRepository/GlobalCacheDirectory" 2048 MB;
    MAX_GLOBAL_CACHE_ENTRIES = 10000;
    CACHE_POLL_SECONDS = 300;
    CLUSTER_AWARE_CACHE_LOGGING = NO;
    
    [ SERVER ]
    READ_ONLY_MODE = YES;
    MAX_SESSION_LIMIT = 20000 ;
    MAX_REQUEST_PER_SESSION_LIMIT = 1500 ;
    SERVER_THREAD_RANGE = 512-2048;
    SERVER_THREAD_STACK_SIZE = 0;
    DB_GATEWAY_THREAD_RANGE = 512-512;
    
    #SERVER_HOSTNAME_OR_IP_ADDRESSES = "ALLNICS";
    CLUSTER_PARTICIPANT = YES;
    
    

Related Blog Posts

Wednesday Feb 18, 2009

Sun Blueprint : MySQL in Solaris Containers

With the growing number of under-utilized servers making data center management costs a major concern, customers are actively looking for solutions to consolidate their workloads to:

  • improve server utilization
  • improve data center space utilization
  • reduce power and cooling requirements
  • lower capital and operating expenditures
  • reduce their carbon footprint, and so on

To cater to those customers, Sun offers several virtualization technologies such as Logical Domains, Solaris Containers, and xVM free of cost for the SPARC and x86/x64 platforms.

To help our customers who are planning to consolidate their MySQL databases on systems running Solaris 10, we put together a document with installation steps and best practices for running MySQL inside a Solaris Container. Although the document focuses on the Solaris Containers technology, the majority of the tuning tips, including the ZFS tips, are applicable to all MySQL instances running on Solaris under different virtualization technologies.

You can access the blueprint document at the following location:

        Running MySQL Database in Solaris Containers

The blueprint document briefly explains the MySQL server and Solaris Containers technology, introduces different options for installing MySQL server on Solaris 10, shows the steps involved in installing and running Solaris Zones and MySQL, and finally provides a few best practices for running MySQL optimally inside a Solaris Container.

Feel free to leave a comment if you notice any incorrect information, or if you have generic suggestions to improve documents like these.

Acknowledgments

Many thanks to Prashant Srinivasan, John David Duncan and Margaret B. for their help in different phases of this blueprint.

Monday Oct 13, 2008

Siebel on Sun CMT hardware : Best Practices

The following suggested best practices are applicable to all Siebel deployments on CMT hardware (Tx00, T5x20, T5x40) running Solaris 10 [Note: some of this tuning applies to Siebel running on conventional hardware running Solaris]. These recommendations are based on our observations from the 14,000 user benchmark on Sun SPARC Enterprise T5440. Your mileage may vary.

All Tiers
  • Ensure that the system's firmware is up-to-date.

  • Upgrade to the latest update release of Solaris 10.

      Note to customers running Siebel on Solaris 10 5/08: apply kernel patch 137137-07 as soon as it is available on the sunsolve.sun.com web site. Patch 137137-07 (and later revisions) and Solaris 10 10/08 contain the workaround for a critical Siebel-specific bug. Oracle Corporation will eventually fix the bug in their codebase; in the meantime, Solaris is covering for Siebel and all other 32-bit applications with their own memory allocators that return unaligned mutexes. Check the RFE 6729759 Need to accommodate non-8-byte-aligned mutexes and Oracle's Siebel support document 735451.1 Do NOT apply Kernel Patch 137111-04 on Solaris 10 for more details.


  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 will use a maximum of 4M pages even when 256M pages are a good fit.

      256M pages can be enabled with the following /etc/system tunable.
      * 256M pages
      set max_uheap_lpsize=0x10000000


  • Proactively avoid running into stdio's 256 file descriptor limitation.

      Set the following in a shell or add the following lines to the shell's profile (bash/ksh).
      ulimit -n 2048
      export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1:$LD_PRELOAD_32

      Technically, the file descriptor limit can be set as high as 65536. However, from the application's perspective, 2048 is a reasonable limit.


  • Improve scalability with an MT-hot memory allocation library such as libumem or libmtmalloc.

    To improve the scalability of multi-threaded workloads, preload an MT-hot, object-caching memory allocation library such as libumem(3LIB) or mtmalloc(3MALLOC).

      eg., To preload the libumem library, set the LD_PRELOAD_32 environment variable in the shell (bash/ksh) as shown below.

      export LD_PRELOAD_32=/usr/lib/libumem.so.1:$LD_PRELOAD_32

      The Web and Application servers in the Siebel Enterprise stack are 32-bit, whereas Oracle 10g or 11g RDBMS on Solaris 10 SPARC is 64-bit. Hence the path to the libumem library in the preload statement differs slightly in the database tier, as shown below.

      export LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so.1:$LD_PRELOAD_64

    Be aware that the trade-off is an increase in memory footprint -- you may notice a 5 to 20% increase in the memory footprint with one of these MT-hot memory allocation libraries preloaded. Also, not every Siebel application module benefits from MT-hot memory allocators. The recommendation is to experiment before implementing them in production environments.

  • TCP/IP tunables

    The application fared well with the following set of TCP/IP parameters on Solaris 10 5/08.

    ndd -set /dev/tcp tcp_time_wait_interval 60000
    ndd -set /dev/tcp tcp_conn_req_max_q 1024
    ndd -set /dev/tcp tcp_conn_req_max_q0 4096
    ndd -set /dev/tcp tcp_ip_abort_interval 60000
    ndd -set /dev/tcp tcp_keepalive_interval 900000
    ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
    ndd -set /dev/tcp tcp_rexmit_interval_max 10000
    ndd -set /dev/tcp tcp_rexmit_interval_min 3000
    ndd -set /dev/tcp tcp_smallest_anon_port 1024
    ndd -set /dev/tcp tcp_slow_start_initial 2
    ndd -set /dev/tcp tcp_xmit_hiwat 799744
    ndd -set /dev/tcp tcp_recv_hiwat 799744
    ndd -set /dev/tcp tcp_max_buf  8388608
    ndd -set /dev/tcp tcp_cwnd_max  4194304
    ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
    ndd -set /dev/udp udp_xmit_hiwat 799744
    ndd -set /dev/udp udp_recv_hiwat 799744
    ndd -set /dev/udp udp_max_buf 8388608

Siebel Application Tier
  • All T-series systems (T1000/T2000, T5120/T5220, T5140/T5240, T5440) support the 256M page size. However, Siebel's siebmtshw script restricts the page size to 4M. Comment out the following lines in $SIEBEL_HOME/siebsrvr/bin/siebmtshw.
      # This will set 4M page size for Heap and 64 KB for stack
      MPSSHEAP=4M
      MPSSSTACK=64K
      MPSSERRFILE=/tmp/mpsserr
      LD_PRELOAD=/usr/lib/mpss.so.1
      export MPSSHEAP MPSSSTACK MPSSERRFILE LD_PRELOAD

  • Experiment with a smaller number of Siebel Object Managers.

      Configure the Object Managers so that each OM handles at least 200 active users. Siebel's standard recommendation of 100 or fewer users per Object Manager is suitable for conventional systems, but it is not ideal for CMT systems such as the Tx00, T5x20, T5x40 and T5440. Sun's CMT systems are well suited to running multi-threaded processes with a large number of LWPs per process. Besides, the overall memory footprint improves significantly with fewer Siebel Object Managers.

  • Try the Oracle 11g R1 client in the application tier. Oracle 10g R2 clients may crash under high load. For the symptoms of the crash, check Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0.

      The Oracle 10g R2 10.2.0.4 32-bit client is supposed to have a fix for the process crash issue; however, it wasn't verified in our test environment.


Siebel Database Tier
  • Eliminate double buffering by forcing the file system to use direct I/O.

    Oracle database caches data in its own cache within the System Global Area (SGA), known as the database block buffer cache. Database reads and writes are cached in the block buffer cache so that subsequent accesses to the same blocks do not need to re-read the data from the operating system. On the other hand, file systems on Solaris default to reading data through the global file system cache for improved I/O operations. That is, by default each read is potentially cached twice: one copy in the operating system's file system cache, and another copy in Oracle's block buffer cache. In addition to double caching, there is also some extra CPU overhead for the code that manages the operating system's file system cache. The solution is to eliminate double caching by forcing the file system to bypass the OS file system cache when reading from and writing to the disk.

      In the 14,000 user benchmark setup, the UFS file systems (holding the data files and the redo logs) were mounted with the forcedirectio option.

      eg.,
      mount -o forcedirectio /dev/dsk/<partition> <mountpoint>
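
      To make the setting persistent across reboots, add the forcedirectio option to the corresponding /etc/vfstab entry; for example (the device names and mount point below are placeholders):

      /dev/dsk/c2t0d0s6  /dev/rdsk/c2t0d0s6  /oradata  ufs  2  yes  forcedirectio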


  • Store the data files separate from the redo log files -- if the data files and the redo log files are stored on the same disk drive and that disk drive fails, the files cannot be used in the database recovery procedures.

      In the 14,000 user benchmark setup, two Sun StorageTek 2540 arrays were connected to the T5440: one array held the data files, whereas the other held the Oracle redo log files.

  • Size online redo logs to control the frequency of log switches.

      In the 14,000 user benchmark setup, two online redo logs were configured, each with 10 GB of disk space. When all 14,000 concurrent users are online, there is only one log switch in a 60 minute period.

  • If the storage array supports the read-ahead feature, enable it. When 'read-ahead enabled' is set to true, the write will be committed to the cache as opposed to the disk, and the OS signals the application that the write has been committed.


    Oracle Database Initialization Parameters

  • Set Oracle's initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to an appropriate value. The DB_FILE_MULTIBLOCK_READ_COUNT parameter specifies the maximum number of blocks read in one I/O operation during a sequential scan.

      In the 14,000 user benchmark configuration, DB_BLOCK_SIZE was set to 8 kB, and during the benchmark run the average read size was around 18.5 kB. Hence setting DB_FILE_MULTIBLOCK_READ_COUNT to a high value does not necessarily improve the I/O performance; a value of 8 for the database init parameter DB_FILE_MULTIBLOCK_READ_COUNT performed better.


  • On T5240 and T5440 servers, set the database initialization parameter CPU_COUNT to 64. Otherwise, by default Oracle RDBMS assumes 128 and 256 for the CPU_COUNT on T5240 and T5440 respectively. Oracle's optimizer might use a completely different execution plan when it notices such a large number for the CPU_COUNT; and the resulting execution plan need not necessarily be an optimal one. In the 14,000 user benchmark, setting CPU_COUNT to 64 produced optimal execution plans.


  • On T5240 and T5440 servers, explicitly set the database initialization parameter _enable_NUMA_optimization to FALSE. On these multi-socket servers, _enable_NUMA_optimization will be set to TRUE by default. During the 14,000 user benchmark run, we noticed intermittent shadow process crashes with the default behavior. We didn't realize any additional gains either with the default NUMA optimizations.
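
    The database initialization parameters discussed in the last three items can be put together in a text initialization parameter file as shown below (a sketch based on the values mentioned above; adjust for your own environment):

    db_file_multiblock_read_count=8
    cpu_count=64
    _enable_NUMA_optimization=FALSE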

Siebel Web Tier
  • Upgrade to the latest service pack of Sun Java Web Server 6.1 (32-bit).

  • Run the Sun Java Web Server in multi-process mode by setting the MaxProcs directive in magnus.conf to a value that is greater than 1. In the multi-process mode, the web server can handle requests using multiple processes with multiple threads in each process.

      When you specify a value greater than 1 for MaxProcs, the web server relies on the operating system to distribute connections among multiple web server processes. However, many modern operating systems, including Solaris, do not distribute connections evenly, particularly when there is a small number of concurrent connections.

  • Tune the maximum number of simultaneous requests by setting the RqThrottle parameter in the magnus.conf file to an appropriate value. A value of 1024 was used in the 14,000 user benchmark.
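
    Combined with the multi-process recommendation above, the relevant magnus.conf directives would look something like the following (the MaxProcs value here is illustrative; 1024 is the RqThrottle value used in the benchmark):

    MaxProcs 4
    RqThrottle 1024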