Monday Dec 08, 2008

Consolidating Siebel CRM 8.0 on a Single Sun SPARC Enterprise Server, T5440

.. blueprint document is now available on wikis.sun.com. Here is the direct link to the blueprint:
            Consolidating Oracle Siebel CRM 8 on a Single Sun SPARC Enterprise Server.

The Siebel 8.0 Platform Sizing and Performance Program (PSPP) benchmark workload was used for all of the performance tests, which ran with Solaris Containers and Logical Domains (LDoms) on a single Sun SPARC Enterprise T5440 server running Solaris 10 5/08 (Containers) and Solaris 10 10/08 (LDoms). The blueprint focuses on three major configurations -- performance at 3,500 user (small), 7,000 user (medium) and 14,000 user (large) loads. Hence this blueprint complements the 14,000 user Siebel 8.0 benchmark that we published back in October 2008.

The blueprint details the resource allocations for all tiers of a typical Siebel deployment to support 3,500, 7,000 and 14,000 concurrent users, offers performance tuning tips specific to Solaris and Sun CMT systems, and presents the results of the 3,500, 7,000 and 14,000 user performance tests using Solaris 10's virtualization technologies -- Solaris Containers and Logical Domains.

Notes:

  1. All the performance tests were conducted either with Solaris Containers or with Logical Domains, but never with a mix of the two technologies.

  2. Resource allocations were identical in both cases -- that is, with Solaris Containers and with Logical Domains. (A minimal zonecfg sketch showing how such allocations are expressed for a Container follows.)
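
For readers unfamiliar with how such allocations are expressed, here is a minimal zonecfg sketch for carving out CPU and memory for one Solaris Container. The zone name, zonepath and resource values are hypothetical and are not the allocations used in the blueprint; the blueprint itself documents the exact splits for the small, medium and large configurations.

  % zonecfg -z siebelapp
  zonecfg:siebelapp> create
  zonecfg:siebelapp> set zonepath=/zones/siebelapp
  zonecfg:siebelapp> add dedicated-cpu
  zonecfg:siebelapp:dedicated-cpu> set ncpus=64
  zonecfg:siebelapp:dedicated-cpu> end
  zonecfg:siebelapp> add capped-memory
  zonecfg:siebelapp:capped-memory> set physical=64g
  zonecfg:siebelapp:capped-memory> end
  zonecfg:siebelapp> verify
  zonecfg:siebelapp> commit
  zonecfg:siebelapp> exit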

Friday Nov 21, 2008

Oracle on Solaris 10 : Fixing the 'ORA-27102: out of memory' Error

(Crossposting the 2+ year old blog entry from my other blog hosted on blogger. Source URL:
http://technopark02.blogspot.com/2006/09/solaris-10oracle-fixing-ora-27102-out.html)

Symptom:

As part of a database tuning effort you increase the SGA/PGA sizes, and Oracle greets you with an ORA-27102: out of memory error -- even though the system has enough free memory to serve the needs of Oracle.

SQL> startup
ORA-27102: out of memory
SVR4 Error: 22: Invalid argument

Diagnosis
$ oerr ORA 27102
27102, 00000, "out of memory"
// *Cause: Out of memory
// *Action: Consult the trace file for details

Not so helpful. Let's look at the alert log for some clues.

% tail -2 alert.log
WARNING: EINVAL creating segment of size 0x000000028a006000
fix shm parameters in /etc/system or equivalent

Oracle is trying to create a ~10 GB shared memory segment (the exact size depends on the SGA/PGA sizes), but the operating system (Solaris in this example) responded with an invalid argument (EINVAL) error. There is a little hint about setting the shm parameters in /etc/system.

Prior to Solaris 10, the shmsys:shminfo_shmmax parameter had to be set in /etc/system to the maximum size of a shared memory segment that can be created. 8 MB is the default value on Solaris 9 and prior versions, whereas one quarter of the physical memory is the default on Solaris 10 and later. On a Solaris 10 (or later) system, it can be verified as shown below:

% prtconf | grep Mem
Memory size: 32760 Megabytes

% id -p
uid=59008(oracle) gid=10001(dba) projid=3(default)

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      7.84GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Now it is clear that the system is using the default value of roughly 8 GB (one quarter of the 32 GB of physical memory) in this scenario, whereas the application (Oracle) is trying to create a larger (~10 GB) shared memory segment. Hence the failure.

So, the solution is to configure the system with a value large enough for the shared segment being created, so Oracle succeeds in starting up the database instance.

On Solaris 9 and prior releases, it can be done by adding the following line to /etc/system, followed by a reboot for the system to pick up the new value.

set shminfo_shmmax = 0x000000028a006000

However, the shminfo_shmmax parameter was made obsolete with the release of Solaris 10, and Sun doesn't recommend setting it in /etc/system even though it still works as expected.

On Solaris 10 and later, this value can be changed dynamically on a per-project basis with the help of the resource control facilities, as shown below:

% prctl -n project.max-shm-memory -r -v 10G -i project 3

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      10.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Note that changes made with the prctl command on a running system are temporary and will be lost when the system is rebooted. To make the changes permanent, create a project with the projadd command and associate it with the user account as shown below:

% projadd -p 3  -c 'eBS benchmark' -U oracle -G dba  -K 'project.max-shm-memory=(privileged,10G,deny)' OASB
% usermod -K project=OASB oracle
Finally, verify that the project was created, using the projects -l or cat /etc/project commands.

% projects -l
...
...
OASB
        projid : 3
        comment: "eBS benchmark"
        users  : oracle
        groups : dba
        attribs: project.max-shm-memory=(privileged,10737418240,deny)

% cat /etc/project
...
...
OASB:3:eBS benchmark:oracle:dba:project.max-shm-memory=(privileged,10737418240,deny)

With these changes, Oracle would start the database up normally.

SQL> startup
ORACLE instance started.

Total System Global Area 1.0905E+10 bytes
Fixed Size                  1316080 bytes
Variable Size            4429966096 bytes
Database Buffers         6442450944 bytes
Redo Buffers               31457280 bytes
Database mounted.
Database opened.
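
As an optional sanity check (not part of the original fix), the active shared memory segments and their sizes can be listed with ipcs once the instance is up; the Oracle-owned segment should now reflect the larger size in the SEGSZ column.

% ipcs -mb | grep oracle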

Related information:

  1. What's New in Solaris System Tuning in the Solaris 10 Release?
  2. Resource Controls (overview)
  3. System Setup Recommendations for Solaris 8 and Solaris 9
  4. Man page of prctl(1)
  5. Man page of projadd


Addendum : Oracle RAC settings

Blog reader Bob suggested the following settings for Oracle RAC in a comment, for the benefit of others who run into similar issues when running Oracle RAC. I'm pasting the comment as is (disclaimer: I have not verified these settings):

Thanks for a great explanation, I would like to add one comment that will help those with an Oracle RAC installation. Modifying the default project covers oracle processes great and is all that is needed for a single instance DB. In RAC however, the CRS process starts the DB and it is a root owned process and root does not use the default project. To fix ORA-27102 issue for RAC I added the following lines to an init script that runs before the init.crs script fires.

# Recommended Oracle RAC system params
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536

# For root processes like crsd
prctl -n project.max-shm-memory -r -v 8G -i project system
prctl -n project.max-shm-ids -r -v 512 -i project system

# For oracle processes like sqlplus
prctl -n project.max-shm-memory -r -v 8G -i project default
prctl -n project.max-shm-ids -r -v 512 -i project default

So simple yet it took me a week working with Oracle and SUN to come up with that answer...Hope that helps someone out.

Bob
# posted by Blogger Bob : 6:48 AM, April 25, 2008

Saturday Nov 15, 2008

Yet Another Siebel 8.0 PSPP Benchmark on Sun CMT Hardware ..

.. Sun SPARC Enterprise T5240.

(This blog entry also serves as a summary page for all the Siebel 8.0 benchmarks that Sun published so far.)

Yesterday Sun published a brand new 10,000 user Siebel 8.0 benchmark result using a combination of T5240 and T5120 servers. In this benchmark, a Sun SPARC Enterprise T5240 server equipped with two 1.2 GHz, 8-Core UltraSPARC T2 Plus processors served as the system under test on which we ran the Siebel Gateway and Enterprise application servers. Two Sun SPARC Enterprise T5120 servers equipped with one 1.2 GHz, 8-Core UltraSPARC T2 processor were configured to run the Oracle database and the Sun Java System Web servers.

A copy of the latest benchmark publication is available on Oracle Applications' benchmark web site at:
        Siebel CRM Release 8.0 Industry Applications and Oracle 10g R2 DB on Sun SPARC Enterprise T5120 & T5240 servers running Solaris 10

For some reason, the topology diagram in the benchmark publication document was messed up, especially the fonts -- probably due to the odt -> doc -> pdf conversion. A clean copy of the diagram is shown below.

Significance of the Siebel 8.0 on T5240 benchmark

In case anyone wonders why we need another Siebel 8.0 benchmark on CMT hardware, especially when we already published a couple of Siebel 8.0 benchmarks on T5220 and T5440 systems -- 10,000 users on T5120/T5220 and 14,000 users on T5440 -- the answer is simple: to show linear scalability.

In the first benchmark that Sun published in January 2008, we showed the scalability of the application, Siebel, on T5220 systems. We were able to scale up to 5,000 concurrent users on a single T5220 system (running the Siebel application servers) with a 1.2 GHz, 8-core UltraSPARC T2 processor. We used two such systems to publish the 10,000 user benchmark in that first installment.

The second benchmark, published in October 2008 during the T5440 server launch, showcased how to consolidate multiple workloads on a single T5440 server. We demonstrated it by deploying the whole Siebel Enterprise -- Sun Java System Web Server along with the Siebel Web Server plug-in Siebel Web Engine (SWE), Siebel Gateway Server, Siebel Application Server and the Oracle Database Server -- on a single Sun SPARC Enterprise T5440 server equipped with four 1.4 GHz, 8-core UltraSPARC T2 Plus processors. We ran 14,000 concurrent virtual users against this setup for the benchmark publication. Since we ran all tiers of the Siebel Enterprise on the same box, it is hard to compare the scalability numbers from this benchmark against the numbers that we published in the 10,000 user benchmark on T5120/T5220 servers.

In April 2008, Sun launched its first multi-processor CMT system, the Sun SPARC Enterprise T5240. The T5240 holds two UltraSPARC T2 Plus processors and is expected to exhibit 2x the performance[1] of a T5220. In other words, two T5220 servers can be consolidated onto a single T5240. To prove this, we re-ran the 10,000 user benchmark that we published back in January 2008, replacing the two T5220 servers in the application tier with a single T5240 server and keeping the remaining hardware configuration for the web and database servers intact. The results from this benchmark speak for themselves, but for your convenience here is a quick summary.



#users   #units x Server Model   Business Transactions Throughput/hour   Projected Daily Transactions   Benchmark Publication (URL & Date)
10,000   2 x T5220               142,061                                  1,136,488                      10K/T5220, 01/2008
10,000   1 x T5240               141,205                                  1,129,640                      10K/T5240, 11/2008


If you are a Sun-Oracle customer, make sure to check the Siebel on Sun CMT hardware : Best Practices web page for some useful tips.

Related entries:

  1. Siebel 8.0 on Sun SPARC Enterprise T5440 - More Bang for the Buck!!
  2. Sun publishes 10,000 user Siebel 8.0 PSPP benchmark on Niagara 2 systems
  3. Siebel CRM 8.0 PSPP UltraSPARC T2 beats POWER6 and sets World Record


[1] There is no unique definition for the word 'performance' -- it has different meanings based on the context.

Saturday Nov 01, 2008

Ramifications of the Solaris 10 kernel patch 137111

Summary

A recent code change in Solaris 10 inadvertently exposed an inherent bug in some 32-bit applications that rely on their own memory allocators. Due to this change, some third-party applications that worked without kernel update (KU) 137111 may crash on Solaris 10/SPARC once KU 137111 (any revision) is applied.

Symptoms & the Cause

It was identified that the majority of such application failures are due to the applications' custom memory allocators, which incorrectly return 4-byte aligned mutexes in place of the required 8-byte aligned mutexes. In Solaris, the mutex_t and pthread_mutex_t structures are defined to be aligned on an 8-byte boundary. Both of those structures contain an upad64_t member, which is a double even for 32-bit applications. The natural alignment of a double is 8 bytes, and per the SPARC Compliance Definition 2.4, structures must be aligned according to their strictest member. That is, applications which create 4-byte aligned mutexes are technically non-compliant on Solaris/SPARC (for the sake of simplicity, such code will be referred to as the non-complying code for the remainder of this blog entry).

Due to a change in the implementation of userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, non-compliant applications on Solaris/SPARC may fail with signal SEGV, with a call stack similar to the following one or with similar call stacks containing the function mutex_trylock_process.

  *_atomic_cas_64(0x141f2c, 0x0, 0xff000000, 0x1651, 0xff000000, 0x466d90)
  set_lock_byte64(0x0, 0x1651, 0xff000000, 0x0, 0xfec82a00, 0x0)
  fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)
  ...
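
If a core file is available, a quick way to check whether a crash matches this pattern is to look for mutex_trylock_process (or the functions shown above) in the core's stack, for example:

  % pstack core | grep mutex_trylock_process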

Patches & the Next Steps

Note that only non-compliant 32-bit applications are affected by KU 137111. All compliant 32-bit applications continue to run as expected even with KU 137111 -- hence customers, partners, ISVs and other software vendors must understand that this is not a Solaris issue. Customers running into this issue must work with the respective software vendors to obtain a patch/fix. We suggest that ISVs and other software vendors proactively check their 32-bit native code for discrepancies like the one mentioned in this blog entry.

In our testing of some of the enterprise applications, we identified Oracle's Siebel CRM as one application that is vulnerable to KU 137111. It appears that IBM's Lotus Domino Server is also prone to crashing on Solaris 10 with the same kernel patch. For these two known cases, Oracle/Siebel and IBM/Lotus Domino customers running Solaris should approach Oracle and IBM respectively -- not Sun Microsystems -- for a proper fix.

As it may take some time for the ISVs / software vendors to identify and fix the non-complying code in their applications, Sun is planning to provide an interim fix for the mutex byte alignment issue in the form of a Solaris kernel patch. As of this writing, we expect the fix to be integrated into KU 137137-07. The fix is already available in the latest update of Solaris, Solaris 10 10/08. Those who cannot upgrade to Solaris 10 10/08 from prior versions of Solaris 10 must wait for the patch KU 137137-07 [updated 12/07/08: KU 137137-09].

Note that the fix in Solaris is a tentative one that allows the non-complying code to run on SPARC hardware for the time being. There is no guarantee that the non-complying code will continue to run 'as is' with future Solaris kernel patches and/or major updates/releases of the Solaris operating system. So the best long-term solution is for the software vendors to fix the non-compliant code before it is too late.

Acknowledgments

Steve S and Roger F of Sun Microsystems.

Monday Oct 13, 2008

Siebel on Sun CMT hardware : Best Practices

The following suggested best practices are applicable to all Siebel deployments on CMT hardware (Tx000, T5x20, T5x40) running Solaris 10 [note: some of this tuning also applies to Siebel running on conventional hardware with Solaris]. These recommendations are based on our observations from the 14,000 user benchmark on the Sun SPARC Enterprise T5440. Your mileage may vary.

All Tiers
  • Ensure that the system's firmware is up-to-date.

  • Upgrade to the latest update release of Solaris 10.

      Note to customers running Siebel on Solaris 10 5/08: apply kernel patch 137137-07 as soon as it is available on the sunsolve.sun.com web site. Patch 137137-07 (and later revisions) and Solaris 10 10/08 have the workaround for a critical Siebel-specific bug. Oracle Corporation will eventually fix the bug in their code base; in the meantime, Solaris is covering for Siebel and all other 32-bit applications whose own memory allocators return unaligned mutexes. Check the RFE 6729759 Need to accommodate non-8-byte-aligned mutexes and Oracle's Siebel support document 735451.1 Do NOT apply Kernel Patch 137111-04 on Solaris 10 for more details.


  • Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 uses a maximum of 4M pages even when 256M pages are a good fit. (A quick way to verify large-page usage is sketched at the end of this list.)

      256M pages can be enabled with the following /etc/system tunable.
      * 256M pages
      set max_uheap_lpsize=0x10000000


  • Proactively avoid running into stdio's 256 file descriptor limitation.

      Set the following in a shell or add the following lines to the shell's profile (bash/ksh).
      ulimit -n 2048
      export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1:$LD_PRELOAD_32

      Technically, the file descriptor limit can be set as high as 65536. However, from the application's perspective, 2048 is a reasonable limit.


  • Improve scalability with an MT-hot memory allocation library such as libumem or mtmalloc.

    To improve the scalability of multi-threaded workloads, preload an MT-hot, object-caching memory allocation library such as libumem(3LIB) or mtmalloc(3MALLOC).

      e.g., to preload the libumem library, set the LD_PRELOAD_32 environment variable in the shell (bash/ksh) as shown below.

      export LD_PRELOAD_32=/usr/lib/libumem.so.1:$LD_PRELOAD_32

      The web and application servers in the Siebel Enterprise stack are 32-bit; however, the Oracle 10g or 11g RDBMS on Solaris 10 SPARC is 64-bit. Hence the path to the libumem library in the preload statement differs slightly in the database tier, as shown below.

      export LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so.1:$LD_PRELOAD_64

    Be aware that the trade-off is an increase in memory footprint -- you may notice a 5 to 20% increase in the memory footprint with one of these MT-hot memory allocation libraries preloaded. Also, not every Siebel application module benefits from MT-hot memory allocators. The recommendation is to experiment before implementing this in production environments.

  • TCP/IP tunables

    The application fared well with the following set of TCP/IP parameters on Solaris 10 5/08.

    ndd -set /dev/tcp tcp_time_wait_interval 60000
    ndd -set /dev/tcp tcp_conn_req_max_q 1024
    ndd -set /dev/tcp tcp_conn_req_max_q0 4096
    ndd -set /dev/tcp tcp_ip_abort_interval 60000
    ndd -set /dev/tcp tcp_keepalive_interval 900000
    ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
    ndd -set /dev/tcp tcp_rexmit_interval_max 10000
    ndd -set /dev/tcp tcp_rexmit_interval_min 3000
    ndd -set /dev/tcp tcp_smallest_anon_port 1024
    ndd -set /dev/tcp tcp_slow_start_initial 2
    ndd -set /dev/tcp tcp_xmit_hiwat 799744
    ndd -set /dev/tcp tcp_recv_hiwat 799744
    ndd -set /dev/tcp tcp_max_buf  8388608
    ndd -set /dev/tcp tcp_cwnd_max  4194304
    ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
    ndd -set /dev/udp udp_xmit_hiwat 799744
    ndd -set /dev/udp udp_recv_hiwat 799744
    ndd -set /dev/udp udp_max_buf 8388608
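
    As referenced in the large-pages bullet above, here is a quick way to verify that a running Siebel or database process actually received 256M pages after the max_uheap_lpsize tunable was set; the process id below is a placeholder.

      % pmap -xs 12345 | grep 256M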

Siebel Application Tier
  • All T-series systems (T1000/T2000, T5120/T5220, T5140/T5240, T5440) support the 256M page size. However, Siebel's siebmtshw script restricts the page size to 4M. Comment out the following lines in $SIEBEL_HOME/siebsrvr/bin/siebmtshw.
      # This will set 4M page size for Heap and 64 KB for stack
      MPSSHEAP=4M
      MPSSSTACK=64K
      MPSSERRFILE=/tmp/mpsserr
      LD_PRELOAD=/usr/lib/mpss.so.1
      export MPSSHEAP MPSSSTACK MPSSERRFILE LD_PRELOAD

  • Experiment with a smaller number of Siebel Object Managers.

      Configure the Object Managers so that each OM handles at least 200 active users. Siebel's standard recommendation of 100 or fewer users per Object Manager is suitable for conventional systems, but it is not ideal for CMT systems like the Tx000, T5x20, T5x40 and T5440. Sun's CMT systems are ideal for running multi-threaded processes with large numbers of LWPs per process. Besides, there will be a significant improvement in the overall memory footprint with fewer Siebel Object Managers. (A srvrmgr sketch for adjusting the relevant parameters follows this list.)

  • Try the Oracle 11g R1 client in the application tier. Oracle 10g R2 clients may crash under high load; for the symptoms of the crash, check Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0.

      The Oracle 10g R2 10.2.0.4 32-bit client is supposed to include a fix for the process crash issue; however, that was not verified in our test environment.
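
    As a sketch of the Object Manager consolidation mentioned above, the tasks-per-OM settings are typically adjusted through srvrmgr parameters such as MaxTasks, MinMTServers and MaxMTServers. The component alias and values below are placeholders, not the settings used in the 14,000 user benchmark.

      srvrmgr> change param MaxTasks=500 for comp SCCObjMgr_enu
      srvrmgr> change param MinMTServers=2 for comp SCCObjMgr_enu
      srvrmgr> change param MaxMTServers=2 for comp SCCObjMgr_enu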


Siebel Database Tier
  • Eliminate double buffering by forcing the file system to use direct I/O.

    The Oracle database caches data in its own cache within the shared global area (SGA), known as the database block buffer cache. Database reads and writes are cached in the block buffer cache so that subsequent accesses to the same blocks do not need to re-read the data from the operating system. On the other hand, file systems on Solaris default to reading data through the global file system cache for improved I/O. That is, by default each read is potentially cached twice: one copy in the operating system's file system cache, and another copy in Oracle's block buffer cache. In addition to the double caching, there is also extra CPU overhead for the code that manages the operating system's file system cache. The solution is to eliminate the double caching by forcing the file system to bypass the OS file system cache when reading from and writing to the disk.

      In the 14,000 user benchmark setup, the UFS file systems (holding the data files and the redo logs) were mounted with the forcedirectio option.

      e.g.,
      mount -o forcedirectio /dev/dsk/<partition> <mountpoint>


  • Store the data files separately from the redo log files -- if the data files and redo log files are stored on the same disk drive and that drive fails, the files cannot be used in the database recovery procedures.

      In the 14,000 user benchmark setup, two Sun StorageTek 2540 arrays were connected to the T5440 -- one array held the data files, whereas the other held the Oracle redo log files.

  • Size online redo logs to control the frequency of log switches.

      In the 14,000 user benchmark setup, two online redo logs were configured, each with 10 GB of disk space. With all 14,000 concurrent users online, there was only one log switch in a 60-minute period.

  • If the storage array supports the read-ahead feature, enable it. When 'read-ahead enabled' is set to true, the write will be committed to the cache as opposed to the disk, and the OS signals the application that the write has been committed.


    Oracle Database Initialization Parameters

  • Set Oracle's initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to an appropriate value. The DB_FILE_MULTIBLOCK_READ_COUNT parameter specifies the maximum number of blocks read in one I/O operation during a sequential scan.

      In the 14,000 user benchmark configuration, DB_BLOCK_SIZE was set to 8 KB. During the benchmark run, the average read size was around 18.5 KB, so setting DB_FILE_MULTIBLOCK_READ_COUNT to a high value does not necessarily improve I/O performance. A value of 8 for DB_FILE_MULTIBLOCK_READ_COUNT seems to perform better.


  • On T5240 and T5440 servers, set the database initialization parameter CPU_COUNT to 64. Otherwise, Oracle RDBMS assumes a default CPU_COUNT of 128 on the T5240 and 256 on the T5440. Oracle's optimizer might choose a completely different execution plan when it sees such a large CPU_COUNT, and the resulting plan is not necessarily an optimal one. In the 14,000 user benchmark, setting CPU_COUNT to 64 produced optimal execution plans.


  • On T5240 and T5440 servers, explicitly set the database initialization parameter _enable_NUMA_optimization to FALSE. On these multi-socket servers, _enable_NUMA_optimization is set to TRUE by default. During the 14,000 user benchmark run, we noticed intermittent shadow process crashes with the default behavior, and we did not see any additional gains from the default NUMA optimizations either. (The database parameter settings from these bullets are collected in the sketch following this list.)
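
    Pulling the database settings from the bullets above together, the corresponding init.ora/spfile entries would look roughly like the sketch below (values taken from the bullets above). Treat the underscore parameter with the usual caution and validate these settings in a test environment first.

      db_block_size                   = 8192
      db_file_multiblock_read_count   = 8
      cpu_count                       = 64
      _enable_NUMA_optimization       = FALSE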

Siebel Web Tier
  • Upgrade to the latest service pack of Sun Java Web Server 6.1 (32-bit).

  • Run the Sun Java Web Server in multi-process mode by setting the MaxProcs directive in magnus.conf to a value that is greater than 1. In the multi-process mode, the web server can handle requests using multiple processes with multiple threads in each process.

      When you specify a value greater than 1 for MaxProcs, the web server relies on the operating system to distribute connections among the web server processes. However, many modern operating systems, including Solaris, do not distribute connections evenly, particularly when there is a small number of concurrent connections.

  • Tune the maximum number of simultaneous requests by setting the RqThrottle parameter in the magnus.conf file to an appropriate value. A value of 1024 was used in the 14,000 user benchmark. (See the magnus.conf sketch following this list.)
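
    For reference, the relevant magnus.conf directives might look like the following. RqThrottle 1024 is the value used in the 14,000 user benchmark; the MaxProcs value shown here is only illustrative.

      MaxProcs 4
      RqThrottle 1024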

Siebel 8.0 on Sun SPARC Enterprise T5440 - More Bang for the Buck!!

Today Sun announced the 14,000 user Siebel 8.0 PSPP benchmark results on a single Sun SPARC Enterprise T5440. An Oracle white paper with Sun's 14,000 user benchmark results is available on Oracle's Siebel benchmark web site. The content in this blog post complements the benchmark white paper.

Some of the notes and highlights from this competitive benchmark are as follows:

  • Key specifications for the Sun SPARC Enterprise T5440 system under test: 4 x UltraSPARC T2 Plus processors, 32 cores, 256 compute threads and 128 GB of memory in a 4 RU space.

  • The entire Siebel 8.0 solution was deployed on a single Sun SPARC Enterprise T5440 including the web, gateway, application, and database servers.

      Nine load driver clients with dual-core Opteron and Xeon processors were used to generate the 14,000 concurrent user load.

  • The web, application and database servers were isolated from each other by creating three Solaris Containers (non-global zones, or local zones), one dedicated to each of those servers.

      The Solaris 10 Binary Application Guarantee Program guarantees binary compatibility for all applications running in native Solaris host operating system environments as well as under the Solaris 10 OS running as a guest operating system in a virtualized platform environment.

  • The Siebel Gateway server and the Siebel Application servers were installed and configured in one of the three Solaris Containers. Two identical Siebel Application server instances were configured, each handling a 7,000 user load.

      From our experiments with the Siebel 8.0 benchmark workload, it appears that a single instance of the Siebel Application server could scale up to 10,000 active users; the Siebel Connection Broker (SCBroker) component becomes the bottleneck at peak load in a single Siebel Application server instance.

  • To keep it simple, the benchmark publication white paper limits itself to an overview of the system configuration. The full details are available in the diagram below.

    Topology Diagram



    The breakdown of the approximate averages of CPU and memory utilization by each tier is shown below.

    Tier   CPU   Memory
    Web    78%    4.50 GB
    App    76%   69.00 GB
    DB     72%   20.00 GB

    System-wide averages are as follows:

    Tier             CPU   Memory
    Web + App + DB   82%   93.5 GB


  • 1,276 watts was the average power consumption with all 14,000 concurrent users in the steady state of the benchmark test. That is, for similarly configured workloads, the T5440 supports about 10.97 users per watt of power consumed (14,000 / 1,276 W) and 3,500 users per rack unit (14,000 / 4 RU).

Based on the above notes, the Sun SPARC Enterprise T5440 is inexpensive, requires less power and less data center footprint, is ideal for consolidation and, equally importantly, scales well.



Vendor-to-Vendor comparison

How does our new 14,000 user benchmark result compare with the high watermark benchmark results published by other vendors using the same Siebel 8.0 PSPP workload?

Besides Sun, IBM and HP are the only other vendors who have published benchmark results with the Siebel 8.0 PSPP benchmark workload so far. IBM's highest user count is 7,000, whereas HP's is 5,200. Here is a quick comparison of the throughputs based on the results published by Sun, IBM and HP with the highest number of active users.

Sun Microsystems' 14,000 user benchmark on a single T5440 outperformed:

  • IBM's 7,000 user benchmark result by 1.9x

  • HP's 5,200 user benchmark result by 2.5x
      HP published the 5,200 user result with a combination of 2 x BL460c running Windows Server 2003 and 1 x rx6600 HP system running HP-UX.

  • Sun's own 10,000 user benchmark result on a combination of 2 x T5120 and 2 x T5220s by 1.4x

From the operating system perspective, Solaris outperformed AIX, Windows Server 2003 and HP-UX. Linux is nowhere to be found in the competitive landscape.

A simple comparison of all the Siebel 8.0 benchmark results published by all vendors (as of today) justifies the title of this blog post. As IBM and HP do not post the list prices of all of their servers, I am not even attempting to show a price/performance comparison here. On the other hand, Sun openly lists all of its prices on the store.sun.com web site.

CAUTION

Although the T5440 possesses a ton of great qualities, it might not be suitable for deploying workloads with heavy single-threaded dependencies. The T5440 is an excellent hardware platform for multi-threaded and moderately single-threaded/multi-process workloads. When in doubt, it is a good idea to leverage Sun Microsystems' Try & Buy program to try the workloads on this new and shiny T5440 before making the final call.



I would like to share the tuning information from the OS and underlying hardware perspective for a couple of reasons: 1. Oracle's benchmark white paper does not include any of the system-specific tuning information, and 2. it may take quite a bit of time for Oracle Corporation to update the Siebel Tuning Guide for Solaris with some of the tuning information that you will find here.

Check the second part of this blog post for the best practices for running Siebel on Sun CMT hardware.

Sunday May 25, 2008

Deploying TWiki 4.2.0 on Sun Java Web Server 7.0

(The following instructions are based on Manish Kapur's Installing TWiki on Sun Java System Web Server. These instructions complement Manish's 2005 blog post)

The goal is to get TWiki up and running on the Sun Java Web Server (formerly known as Sun ONE Web Server / iPlanet Web Server). The assumption is that the Sun Java Web Server is running on the Solaris operating system.

Steps:
  1. Create 'twiki' user.
    % mkdir /export/twiki
    % useradd -d /export/twiki -s /bin/bash twiki
    % cd /export
    % chown twiki:other twiki
    % passwd twiki

  2. Install Sun Java Web Server 7.0 Update 2 in the /export/twiki/sjws7u2 directory. Select the 'Custom' installation and choose port '8080' to run the web server.

    Sun Java Web Server 7.0 Update 2 is available for free at Sun Downloads web site.
            View by Category -> Web & Proxy Servers -> Web Servers -> Web Server 7.0 Update 2.

  3. Install TWiki 4.2.0

    Download TWiki tgz file
    % mkdir /export/twiki/sjws7u2/https-<hostname>/docs/twiki
    % cp TWiki-4.2.0.tgz /export/twiki/sjws7u2/https-<hostname>/docs/twiki
    % cd /export/twiki/sjws7u2/https-<hostname>/docs/twiki
    % gunzip TWiki-4.2.0.tgz
    % tar xf TWiki-4.2.0.tar

  4. Enable CGI on the web server

    • cd /export/twiki/sjws7u2/https-<hostname>/config

    • Edit the 'default' section of the obj.conf file to include the following two lines

      NameTrans fn="pfx2dir" from="/twiki/view" dir="/export/twiki/sjws7u2/https-<hostname>/docs/twiki/bin/view" name="cgi"

      Service fn="send-cgi" type="magnus-internal/cgi" nice="10"


      Hopefully the following diff gives an idea of where to insert those two lines. In the example, 't2000-240-08' is the hostname.

       % diff -C 3 obj.conf obj.conf.orig
      
      *** obj.conf    Wed May 14 01:36:34 2008
      --- obj.conf.orig       Wed May 14 01:02:52 2008
      ***************
      *** 10,17 ****
        AuthTrans fn="match-browser" browser="*MSIE*" ssl-unclean-shutdown="true"
        NameTrans fn="ntrans-j2ee" name="j2ee"
        NameTrans fn="pfx2dir" from="/mc-icons" dir="/export/twiki/sjws7u2/lib/icons" name="es-internal"
      - # Consider anything in the directory /export/twiki/sjws7u2/https-t2000-240-08/docs/twiki/bin to be a CGI
      - NameTrans fn="pfx2dir" from="/twiki/view" dir="/export/twiki/sjws7u2/https-t2000-240-08/docs/twiki/bin/view" name="cgi"
        PathCheck fn="uri-clean"
        PathCheck fn="check-acl" acl="default"
        PathCheck fn="find-pathinfo"
      --- 10,15 ----
      ***************
      *** 20,27 ****
        ObjectType fn="type-j2ee"
        ObjectType fn="type-by-extension"
        ObjectType fn="force-type" type="text/plain"
      - # Run CGI processes with a "Nice" level of 10
      - Service fn="send-cgi" type="magnus-internal/cgi" nice="10"
        Service method="(GET|HEAD)" type="magnus-internal/directory" fn="index-common"
        Service method="(GET|HEAD|POST)" type="*~magnus-internal/*" fn="send-file"
        Service method="TRACE" fn="service-trace"
      --- 18,23 ----

  5. Install the Revision Control System (RCS) and the GNU diff utilities (diffutils), and update the PATH variable in the user's .profile.

    Those packages are available in 'ready to install' form at the sunfreeware.com and blastwave.org web sites. Perhaps the easiest way is to use pkg-get to pull those packages, with the required dependencies, from Blastwave.

    Assuming RCS and diffutils are available under /usr/csw/bin directory,
    % cat ~/.profile | grep PATH
    export PATH=/usr/bin:/usr/sbin:/usr/local/bin:/usr/sfw/bin:/usr/csw/bin:$PATH

  6. Configure TWiki by editing its *.cfg files.

    1. Create the config file LocalLib.cfg.

      1. There is a template for this config file in twiki/bin/LocalLib.cfg.txt. Copy LocalLib.cfg.txt to LocalLib.cfg.
        % cd /export/twiki/sjws7u2/https-<hostname>/docs/twiki/bin
        % cp LocalLib.cfg.txt LocalLib.cfg
      2. Edit LocalLib.cfg to update the $twikiLibPath variable.
        $twikiLibPath = "/export/twiki/sjws7u2/https-<hostname>/docs/twiki/lib";

    2. Manually create the config file LocalSite.cfg.
      % cd /export/twiki/sjws7u2/https-<hostname>/docs/twiki/lib
      % cp TWiki.spec LocalSite.cfg

    3. Update the following variables with appropriate values in LocalSite.cfg.

      $TWiki::cfg{DefaultUrlHost}, $TWiki::cfg{ScriptUrlPath}, $TWiki::cfg{PubUrlPath}, $TWiki::cfg{PubDir}, $TWiki::cfg{TemplateDir}, $TWiki::cfg{DataDir}, $TWiki::cfg{LocalesDir}, $TWiki::cfg{OS}, $TWiki::cfg{RCS}{EgrepCmd} and $TWiki::cfg{RCS}{FgrepCmd}.

      This diff output shows how they were configured in the demo TWiki that I used for the proof of concept (POC). The hardware it was deployed on is a Sun Fire T2000, so Cool Stack was installed on the server to take advantage of its optimized Perl.

  7. Exit the current shell and reconnect.

  8. Restart the Sun Java Web Server.
    % cd /export/twiki/sjws7u2/https-<hostname>/bin
    % ./stopserv
    % ./startserv
  9. Manually execute the 'view' script from the command line.
    % cd /export/twiki/sjws7u2/https-<hostname>/docs/twiki/bin
    % ./view
    You should see valid HTML (not errors) on stdout. If you see errors, check /export/twiki/sjws7u2/https-<hostname>/logs/errors for the error messages. Fix the issues and re-run 'view' from the command line. Continue this exercise until the script returns valid HTML.

  10. Finally verify the TWiki installation through the web browser.
            Open http://<webhost>:<port>/twiki/view in a web browser.
References:
  1. Installing TWiki on Sun Java System Web Server by Manish Kapur, Sun Microsystems
  2. Using PHP on Sun Java System Web Server by Joe McCabe, Sun Microsystems


A copy of these instructions is also available on my other weblog:
http://technopark02.blogspot.com/2008/05/deploying-twiki-420-on-sun-java-web.html

Tuesday Apr 08, 2008

Running Batch Workloads on Sun's CMT Servers

Ever since Sun introduced Chip Multi-Threading (CMT) hardware in the form of the UltraSPARC T1 based T1000/T2000, our internal mail aliases have been inundated with customer stories, the majority of which go like 'batch jobs are taking 12+ hours on a T2000, whereas they take only 3 or 4 hours on a US-IV+ based v490'. Two and a half years after the introduction of this revolutionary CMT hardware, it appears that the majority of Sun customers are still under the impression that Sun's CMT systems like the T2000 and T5220 are not capable of handling CPU-intensive batch workloads. That is not a valid concern. CMT processors like the UltraSPARC T1, T2 and T2 Plus can handle batch workloads just as well as any traditional/conventional processor, viz. UltraSPARC-IV+, SPARC64-VI, AMD Opteron, Intel Xeon, IBM POWER6. However, CMT awareness and a little effort are required at the customer end to achieve good throughput on CMT systems.

First of all, end users must realize that the maximum clock speed of the existing CMT processor line-up (UltraSPARC T1, UltraSPARC T2, UltraSPARC T2 Plus) is only 1.4 GHz; and on top of that, each strand (individual hardware thread) within a core shares the CPU cycles with the other strands on the same core (note: each core operates at the speed of the processor). Given these facts, it is no surprise to see batch jobs taking longer to complete when only one or a very few single-threaded batch jobs are submitted to the system. In such cases, the system resources are under-utilized in addition to the longer elapsed times. One trick to achieve the required throughput in the expected time frame is to split the workload into multiple jobs. For example, if an EDU customer needs to generate 1,000 transcripts, the customer should consider submitting 4 individual jobs with 250 transcripts each, or 8 jobs with 125 transcripts each, rather than one job for all 1,000 transcripts (a trivial shell sketch of this idea follows). Ideally, the customer should observe the resource utilization (CPU%, for example) and experiment with the number of jobs submitted until the system achieves the desired throughput within the expected time frame.
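
As a trivial illustration of the splitting idea (the script name and its options are hypothetical; in practice the splitting is done through the application's own batch or concurrent manager), the 1,000-transcript example could be run as four parallel 250-transcript jobs from a shell:

    #!/bin/bash
    # Run 4 chunks of 250 transcripts each in parallel instead of one 1000-transcript job.
    for chunk in 1 2 3 4
    do
        ./generate_transcripts --start $(( (chunk - 1) * 250 + 1 )) --count 250 &
    done
    wait    # block until all four background jobs complete
    echo "all 4 transcript batches finished"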

Case study: Oracle E-Business Suite Payroll 11i workload on Sun SPARC Enterprise T5220

To prove that the aforementioned methodology works beyond a reasonable doubt, let's take Oracle's E-Business Suite 11.5.10 Payroll workload as an example. On a single T5220 with one 1.4 GHz UltraSPARC T2 processor acting as the batch, application and database server, 4 payroll threads generated 5,000 paychecks in 31.53 minutes while consuming only 6.04% CPU on average -- a projected hourly throughput of ~9,500 paychecks. This is a classic example of what the majority of Sun's CMT customers are experiencing today, i.e., long batch processing times with little resource consumption. Keep in mind that each UltraSPARC T2 and UltraSPARC T2 Plus processor can execute up to 64 jobs in parallel (on a side note, the UltraSPARC T1 processor can execute up to 32 jobs in parallel). So, to put the idling resources to effective use and thereby improve the elapsed times and the overall throughput, a few experiments were conducted with 64 payroll threads, and the results are very impressive. With a maximum of 64 payroll threads, it took only 4.63 minutes to process 5,000 paychecks at an average of 40.77% CPU utilization. In other words, a similarly configured T5220 can process ~64,700 paychecks per hour using less than half of the available CPU cycles. A word of caution: just because the processor can execute 64 threads in parallel doesn't mean it is always optimal to submit 64 parallel jobs on systems like the T5220. A very high number of batch jobs (payroll threads in this particular scenario) might be overkill for simple tasks like the NACHA step of the Payroll process.

The following white paper has more detailed information about the nature of the workload and the results from the experiments with various numbers of threads for the different components of the Oracle Applications Payroll batch workload. Refer to the same white paper for the exact tuning information as well.

Link to the white paper:
     E-Business Suite Payroll 11i (11.5.10) using Oracle 10g on a Sun SPARC Enterprise T5220

Here is the summary of the results that were extracted from the white paper:

Hardware configuration
          1x Sun SPARC Enterprise T5220 for running the application, batch and the database servers
              Specifications: 1x 1.4 GHz 8-core UltraSPARC T2 processor with 64 GB memory

Software configuration
          Oracle E-Business Suite 11.5.10
          Oracle 10g R1 10.1.0.4 RDBMS
          Solaris 10 8/07

Results
Oracle E-Business Suite 11i Payroll - Number of employees: 5,000
Component            #Threads   Time (min)   Avg. CPU%   Hourly Throughput
Payroll process          64        1.87         90.56           160,714
PrePayments              64        0.20         46.33         1,500,000
Ext. Proc. Archive       64        1.90         90.77           157,895
NACHA                     8        0.05          2.52         6,000,000
Check Writer             24        0.38          9.00           782,609
Costing                  48        0.23         32.50         1,285,714
Total or Average         NA        4.63         40.77            64,748

It is evident from the average CPU% that the Payroll process and the External Process Archive components are extremely CPU intensive and hence take longer to complete. That is why 64 threads were configured for those components -- to run at the full potential of the system. Lightweight components like NACHA need fewer threads to complete the job efficiently; configuring 64 threads for NACHA would have a negative impact on the throughput. In other words, we would be wasting CPU cycles for no apparent improvement.

It is the responsibility of the customers to tune the application and the workload appropriately. One size doesn't fit all.

The Payroll 11i results on the T5220 demonstrate clearly that Sun's CMT systems are capable of handling batch workloads well. It would be interesting to see how well they perform against other systems equipped with traditional processors running at higher clock speeds. For this comparison, we can use a couple of results published by UNISYS and IBM with the same workload. The following table summarizes the results from the two white papers listed below; for the sake of completeness, Sun's CMT results are included as well.

Source URLs:
  1. E-Business Suite Payroll 11i (11.5.10) using Oracle 10g on a UNISYS ES7000/one Enterprise Server
  2. E-Business Suite Payroll 11i (11.5.10) using Oracle 10g for Novell SUSE Linux on IBM eServer xSeries 366 Servers
Oracle E-Business Suite 11i Payroll - Number of employees: 5,000
Vendor   OS                                          #Threads   Time (min)   Avg. CPU%   Hourly Throughput
UNISYS   Linux: RHEL 4 Update 3                      12 [1]     5.18         53.22%      57,915
IBM      Novell SUSE Linux Enterprise Server 9 SP1   12         8.42         50+% [2]    35,644
Sun      Solaris 10 8/07                             8 to 64    4.63         40.77%      64,748

Hardware configurations:
  UNISYS: DB/App/Batch server: 1x Unisys ES7000/one Enterprise Server (4x 3.0 GHz Dual-Core Intel Xeon 7041 processors, 32 GB memory)
  IBM:    DB, App servers: 2x IBM eServer xSeries 366 4-way server (4x 3.66 GHz Intel Xeon MP processors (EM64T), 32 GB memory)
  Sun:    DB/App/Batch server: 1x Sun SPARC Enterprise T5220 (1x 1.4 GHz 8-core UltraSPARC T2 processor, 64 GB memory)

The better results -- Sun's row in the table above -- speak for themselves: one 1.4 GHz UltraSPARC T2 processor outperformed four 3.0 GHz / 3.66 GHz Xeon processors in terms of average CPU utilization and, most importantly, in hourly throughput (the hourly throughput calculation relies on the total elapsed time).

Before we conclude, let us reiterate a few things purely based on the factual evidence presented in this blog post:

  • Sun's CMT servers like the T2000, T5220 and T5240 (a two-socket system with UltraSPARC T2 Plus processors) are well suited to batch workloads like Oracle Applications Payroll 11i,

  • Sun's CMT servers like the T2000, T5220 and T5240 are well suited to running the Oracle 10g RDBMS when the DML/DDL/SQL statements that make up the majority of the workload are not very complex, and

  • when the application is tuned appropriately, CMT processors can outperform some of the traditional processors that were touted to deliver the best single-thread performance.

Footnotes

1. There is a note in the UNISYS/Payroll 11i white paper that says "[...] the gains {from running increased numbers of threads} decline at higher numbers of parallel threads." This is quite contrary to what Sun observed in its Payroll 11i experiments on the UltraSPARC T2 based T5220: a higher number of parallel threads (maximum: 64) improved the throughput on the T5220, whereas UNISYS' observation is based on experiments with a maximum of 12 parallel threads. Moral of the story: do NOT treat all hardware alike.

2. IBM's Payroll 11i white paper makes no reference to average CPU numbers; the 50+% figure was derived from "Figure 3: Average CPU Utilization".

Friday Jan 04, 2008

Sun publishes 10,000 user Siebel 8.0 PSPP benchmark on Niagara 2 systems

Here is the link to the benchmark white paper publication:

Siebel CRM Release 8.0 Industry Applications and Oracle 10gR2 DB on Sun SPARC Enterprise T5120/T5220 servers running the Solaris 10 OS

Some key highlights from the white paper:


  • All tiers of the Siebel CRM Release 8.0 architecture ran on chip multithreading (CMT) UltraSPARC T2 processor based T5120/T5220 systems running Solaris 10 8/07.
  • The multithreading capability of the UltraSPARC T2 processor allowed each of the active Object Manager (OM) processes to run hundreds of lightweight processes (LWPs), thus utilizing the available resources very effectively. A total of 30 Object Managers serviced the workload of 10,000 concurrent users.
  • While supporting 10,000 concurrent Siebel users, the entire Sun solution based on Sun SPARC Enterprise T5120/T5220 servers running Siebel CRM Release 8.0 and Oracle 10g R2 on top of Solaris 10 consumed 1,202 watts in a 6U rack space. As a result, the T5120/T5220 solution supports about 8.3 users per watt of energy consumed (10,000 / 1,202 W) and about 1,666 users per rack unit (10,000 / 6 RU).
  • Apparently this is the best Siebel 8.0 benchmark result out there in terms of the number of users and price/performance. Feel free to compare Sun's 10,000 user result with the other published results that you may find on the Oracle Siebel Benchmark White Papers web page.

Tuesday Nov 27, 2007

Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0

First things first: Oracle 11g Release 1 for Solaris/SPARC is available now and can be downloaded from here.

In some Siebel 8.0 environments where the Oracle database is used, customers might notice intermittent Siebel Object Manager crashes under high load, when the work is actively being done by a large number of LWPs spread across a small number of Object Managers. The call stack usually looks something like:

/export/home/siebel/siebsrvr/lib/libsslcosd.so:0x4ad24
/lib/libc.so.1:0xc5924
/lib/libc.so.1:0xba868
/export/home/oracle/lib32/libclntsh.so.10.1:kpuhhfre+0x8d0 [ Signal 11 (SEGV)]
/export/home/oracle/lib32/libclntsh.so.10.1:kpufhndl0+0x35ac

/export/home/siebel/siebsrvr/lib/libsscdo90.so:0x3140c
/export/home/siebel/siebsrvr/lib/libsscfdm.so:0x677944
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cQCSSLockSqlCursorGDelete6M_v_+0x264
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSDbConnOCacheSqlCursor6MpnMCSSSqlCursor__I_+0x734
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSDbConnRTryCacheSqlCursor6MpnMCSSSqlCursor_b_I_+0xcc
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjOCacheSqlCursor6MpnMCSSSqlCursor_b_I_+0x6e4
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjHRelease6Mb_v_+0x4d0
/export/home/siebel/siebsrvr/lib/libsscfom.so:__1cKCSSBusComp2T5B6M_v_+0x790
/export/home/siebel/siebsrvr/lib/libsscacmbc.so:__1cJCSSBCBase2T5B6M_v_+0x640
/export/home/siebel/siebsrvr/lib/libsscasabc.so:0x19275c
/export/home/siebel/siebsrvr/lib/libsscfom.so:0x318774
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cWCSSSWEFrameMgrInternalPReleaseBOScreen6M_v_+0x10c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalJBuildView6MpkHnLBUILDSCREEN_pnSCSSSWEViewBookmark_ipnJCSSBusObj_pnPCSSDrilldownDef_p1pI_I_+0x18b0
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalJBuildView6MrpnKCSSSWEView_pkHnLBUILDSCREEN_pnSCSSSWEViewBookmark_ipnJCSSBusObj_pnPCSSDrilldownDef_p4_I_+0x50
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrUActionBuildViewAsync6MpnbACSSSWEActionBuildViewAsync_pnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x38c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrODoPostedAction6MpnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x104
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrSCheckPostedActions6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo_ri_I_+0x378
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalSInvokeAppletMethod6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo_rnOCSSStringArray__I_+0x12f8
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorMInvokeMethod6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x88c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorP_ProcessCommand6MpnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x860
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorOProcessCommand6MpnUCSSSWEGenericRequest_pnVCSSSWEGenericResponse_rpnNWWEReqModInfo_rpnJWWECbInfo__I_+0x9bc
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorOProcessCommand6MpnRCSSSWEHttpRequest_pnSCSSSWEHttpResponse_rpnJWWECbInfo__I_+0xd8
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cSCSSServiceSWEIfaceHRequest6MpnMCSSSWEReqRec_pnRCSSSWEResponseRec__I_+0x404
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cSCSSServiceSWEIfaceODoInvokeMethod6MpkHrknOCCFPropertySet_r3_I_+0xa80
/export/home/siebel/siebsrvr/lib/libsscfom.so:__1cKCSSServiceMInvokeMethod6MpkHrknOCCFPropertySet_r3_I_+0x244
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:__1cOCSSSIOMSessionTModInvokeSrvcMethod6MpkHp13rnISSstring__i_+0x124
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:
__1cOCSSSIOMSessionMRPCMiscModel6MnMSISOMRPCCode_nMSISOMArgType_LpnSCSSSISOMRPCArgList_4ripv_i_+0x5bc
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:
__1cOCSSSIOMSessionJHandleRPC6MnMSISOMRPCCode_nMSISOMArgType_LpnSCSSSISOMRPCArgList_4ripv_i_+0xb98
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cJCSSClientLHandleOMRPC6MpnMCSSClientReq__I_+0x78
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cJCSSClientNHandleRequest6MpnMCSSClientReq__I_+0x2f8
/export/home/siebel/siebsrvr/lib/libsssasos.so:0x8c2c4
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cLSOMMTServerQSessionHandleMsg6MpnJsmiSisReq__i_+0x1bc
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cNsmiMainThreadUCompSessionHandleMsg6MpnJsmiSisReq__i_+0x16c
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cM_smiMessageQdDOProcessMessage6MpnM_smiMsgQdDItem_li_i_+0x93c
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cM_smiMessageQdDOProcessRequest6Fpv1r1_i_+0x244
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cN_smiWorkQdDueuePProcessWorkItem6Mpv1r1_i_+0xd4
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cN_smiWorkQdDueueKWorkerTask6Fpv_i_+0x300
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cQSmiThrdEntryFunc6Fpv_i_+0x46c
/export/home/siebel/siebsrvr/lib/libsslcosd.so:0x5be88
/export/home/siebel/siebsrvr/mw/lib/libmfc400su.so:__1cP_AfxThreadEntry6Fpv_I_+0x100
/export/home/siebel/siebsrvr/mw/lib/libkernel32.so:__1cIMwThread6Fpv_v_+0x23c
/lib/libc.so.1:0xc57f8

Setting the Siebel environment variable SIEBEL_STDERROUT to 1 produces the following heap dump in the StdErrOut directory under the Siebel enterprise logs directory.

% more stderrout_7762_23511113.txt
********** Internal heap ERROR 17112 addr=35dddae8 *********

***** Dump of memory around addr 35dddae8:
35DDCAE0 00000000 00000000 [........]
35DDCAF0 00000000 00000000 00000000 00000000 [................]
Repeat 243 times
35DDDA30 00000000 00000000 00003181 00300020 [..........1..0. ]
35DDDA40 0949D95C 35D7A888 10003179 00000000 [.I.\5.....1y....]
35DDDA50 0949D95C 0949D8B8 35D7A89C C0000075 [.I.\.I..5......u]
...
...
******************************************************
HEAP DUMP heap name="Alloc environm" desc=949d8b8
extent sz=0x1024 alt=32767 het=32767 rec=0 flg=3 opc=2
parent=949d95c owner=0 nex=0 xsz=0x1038
EXTENT 0 addr=364fb324
Chunk 364fb32c sz= 4144 free " "
EXTENT 1 addr=364f8ebc
Chunk 364f8ec4 sz= 4144 free " "
EXTENT 2 addr=364f7d5c
Chunk 364f7d64 sz= 4144 free " "
EXTENT 3 addr=364f6d04
Chunk 364f6d0c sz= 4144 recreate "Alloc statemen " latch=0
ds 2c38df34 sz= 4144 ct= 1
...
...
EXTENT 406 addr=35ddda54
Chunk 35ddda5c sz= 116 free " "
Chunk 35dddad0 sz= 24 BAD MAGIC NUMBER IN NEXT CHUNK (6)
freeable assoc with mark prv=0 nxt=0

Dump of memory from 0x35DDDAD0 to 0x35DDEAE8
35DDDAD0 20000019 35DDDA5C 00000000 00000000 [ ...5..\........]
35DDDAE0 00000095 0000008B 00000006 35DDDAD0 [............5...]
35DDDAF0 00000000 00000000 00000095 35DDDB10 [............5...]
...
...
EXTENT 2080 addr=d067a6c
Chunk d067a74 sz= 2220 freeable "Alloc statemen " ds=2b0fffe4
Chunk d068320 sz= 1384 freeable assoc with mark prv=0 nxt=0
Chunk d068888 sz= 4144 freeable "Alloc statemen " ds=2b174550
Chunk d0698b8 sz= 4144 recreate "Alloc statemen " latch=0
ds 1142ea34 sz= 112220 ct= 147
223784cc sz= 4144
240ea014 sz= 884
28eac1bc sz= 900
2956df7c sz= 900
1ae38c34 sz= 612
92adaa4 sz= 884
2f6b96ac sz= 640
c797bc4 sz= 668
2965dde4 sz= 912
1cf6ad4c sz= 656
10afa5e4 sz= 656
2f6732bc sz= 700
27cb3964 sz= 716
1b91c1fc sz= 584
a7c28ac sz= 884
169ac284 sz= 900
...
...
Chunk 2ec307c8 sz= 12432 free " "
Chunk 3140a3f4 sz= 4144 free " "
Chunk 31406294 sz= 4144 free " "
Bucket 6 size=16400
Bucket 7 size=32784
Total free space = 340784
UNPINNED RECREATABLE CHUNKS (lru first):
PERMANENT CHUNKS:
Chunk 949f3c8 sz= 100 perm "perm " alo=100
Permanent space = 100
******************************************************
Hla: 255

ORA-21500: internal error code, arguments: [17112], [0x35DDDAE8], [], [], [], [], [], []
Errors in file :
ORA-21500: internal error code, arguments: [17112], [0x35DDDAE8], [], [], [], [], [], []

----- Call Stack Trace -----
NOTE: +offset is used to represent that the
function being called is offset bytes from
the _PROCEDURE_LINKAGE_TABLE_.
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
F2365738 CALL +23052 D7974250 ? D797345C ? DD20 ?
D79741AC ? D79735F8 ?
F2ECD7A0 ?
F286DDB8 PTR_CALL 00000000 949A018 ? 14688 ? B680B0 ?
F2365794 ? F2ECD7A0 ? 14400 ?
F286E18C CALL +77460 949A018 ? 0 ? F2F0E8D4 ?
1004 ? 1000 ? 1000 ?
F286DFF8 CALL +66708 949A018 ? 0 ? 42D8 ? 1 ?
D79743E0 ? 949E594 ?
...
...
__1cN_smiWorkQdDueu CALL __1cN_smiWorkQdDueu 1C8F608 ? 18F55F38 ?
30A5A008 ? 1A98E208 ?
FDBDF178 ? FDBE0424 ?
__1cQSmiThrdEntryFu PTR_CALL 00000000 1C8F608 ? FDBE0424 ?
1AB6EDE0 ? FDBDF178 ? 0 ?
1500E0 ?
__1cROSDWslThreadSt PTR_CALL 00000000 1ABE8250 ? 600140 ? 600141 ?
105F76E8 ? 0 ? 1AC74864 ?
__1cP_AfxThreadEntr PTR_CALL 00000000 0 ? FF30A420 ? 203560 ?
1AC05AF0 ? E2 ? 1AC05A30 ?
__1cIMwThread6Fpv_v PTR_CALL 00000000 D7A7DF6C ? 17F831E0 ? 0 ? 1 ?
0 ? 17289C ?
_lwp_start()+0 ? 00000000 1 ? 0 ? 1 ? 0 ? FCE6C710 ?
1AC72680 ?

----- Argument/Register Address Dump -----

Argument/Register addr=d7974250. Dump of memory from 0xD7974210 to 0xD7974350
D7974200 0949A018 00014688 00B680B0 F2365794
D7974220 F2ECD7A0 00014400 D7974260 F286DDB8 800053FC 80000000 00000002 D79743E0
D7974240 4556454E 545F3231 35303000 32000000 F23654D4 F2365604 00000000 0949A018
D7974260 00000000 0949A114 0000000A F2ECD7A0 FC873504 00001004 F2365794 F2F0E8D4
D7974280 0949A018 00000000 F2F0E8D4 00001004 00001000 00001000 D79742D0 F286E18C
D79742A0 F6CB7400 FC872CC0 00000004 00CDE34C 4556454E 545F3437 31313200 00000000
D79742C0 00000000 00000001 D79743E0 00000000 F2F0E8D4 00001004 00001000 00007530
D79742E0 00007400 00000001 FC8731D4 00003920 0949A018 00000000 000042D8 00000001
D7974300 D79743E0 0949E594 D7974330 F286DFF8 D7974338 F244D5A8 00000000 01000000
D7974320 364FBFFF 01E2C000 00001000 00000000 00001000 00000000 F2ECD7A0 F23654D4
D7974340 000000FF 00000001 F2F0E8D4 F25F9824
Argument/Register addr=d797345c. Dump of memory from 0xD797341C to 0xD797355C
D7973400 F2365738
...
...
Argument/Register addr=1eb21388. Dump of memory from 0x1EB21348 to 0x1EB21488
1EB21340 00000000 00000000 FE6CE088 FE6CE0A4 0000000A 00000010
1EB21360 00000000 00000000 00000000 00000010 00000000 00000000 00000000 FEB99308
1EB21380 00000000 00000000 00000000 1C699634 FFFFFFFF 00000000 FFFFFFFF 00000001
1EB213A0 00000000 00000000 00000000 00000000 00000081 00000000 F16F1038 67676767
1EB213C0 00000000 FEB99308 00000000 00000000 FEB99308 FEB46EF4 00000000 00000000
1EB213E0 00000000 00000000 FEB99308 00000000 00000000 00000001 0257E3B4 03418414
1EB21400 00000000 00000000 FEB99308 FEB99308 00000000 00000000 00000000 00000000
1EB21420 00000000 00000000 00000000 00000000 1EB213B0 00000000 00000041 00000000
1EB21440 0031002D 00410036 00510038 004C0000 00000000 00000000 00000000 00000000
1EB21460 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1EB21480 00000109 00000000

----- End of Call Stack Trace -----

Although I'm not sure exactly what the underlying issue behind the core dump is, my suspicion is that there is some memory corruption in the Oracle client's code, and that the Siebel Object Manager crash is due to Oracle bug 5682121 - Multithreaded OCI clients do not mutex properly for LOB operations. The fix for this particular bug is expected in the Oracle 10.2.0.4 release and is already available as part of Oracle 11.1.0.6.0. If you notice the symptoms of failure described in this blog post, upgrade the Oracle client in the application tier to Oracle 11gR1 and see if it brings stability to the Siebel environment. (A quick way to check which client library the object managers have loaded is sketched below.)
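
One quick way to confirm which Oracle client library the running Siebel Object Manager processes have actually loaded (before and after the upgrade) is pldd; the process id below is a placeholder for a siebmtshmw process id.

% pldd 7762 | grep libclntsh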

Original blog post is at:
http://technopark02.blogspot.com/2007/11/solarissparc-oracle-11gr1-client-for.html

About

Benchmark announcements, HOW-TOs, Tips and Troubleshooting
