Sunday Nov 30, 2008

PeopleSoft on Solaris 10: Fixing the "msgget: No space left on device" Error

(Crossposting the 8+ month old blog entry from my other blog hosted on blogger. Source URL:
http://technopark02.blogspot.com/2008/03/peoplesoft-fixing-msgget-no-space-left.html
)

When a large number of application server processes are configured in a single PeopleSoft domain or in multiple domains cumulative, it is very likely that the PeopleSoft application server domain boot process may fail with errors like:

Booting server processes ...
exec PSSAMSRV -A -- -C psappsrv.cfg -D CS90SPV -S PSSAMSRV :
        Failed.
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:681: ERROR: Failure to create message queue
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:248: ERROR: System init function failed, Uunixerr = : 
                   msgget: No space left on device
113954.ben15!tmboot.29708.1.-2: CMDTUX_CAT:825: ERROR: Process PSSAMSRV at ben15 failed with /T 
                   tperrno (TPEOS - operating system error)

In this particular example, the PeopleSoft Enterprise is running on a Solaris 10 system. Fortunately the error message is very clear in this case; and the failure is related to the message queues. During the domain boot up process, there is a call to msgget() to create a message queue. If the call to msgget() succeeds, it returns a non-negative integer that serves as the identifier for the newly created message queue. However in the case of a failure, it returns -1 and sets the error number to EACCES, EEXIST, ENOENT or ENOSPC depending on the underlying reason.

From the above error messages it clear that the msgget() failed with the errno set to ENOSPC (No space left on device). Man page of msgget(2) has the following explanation for ENOSPC error code on Solaris:

ERRORS
     The msgget() function will fail if:
     ...
     ...
     ENOSPC    A message queue identifier is to  be  created  but
               the  system-imposed limit on the maximum number of
               allowed  message  queue  identifiers  system  wide
               would be exceeded. See NOTES.

NOTES
     ...
     ...

     The system-imposed limit on  the  number  of  message  queue
     identifiers  is  maintained on a per-project basis using the
     project.max-msg-ids resource control.

It has enough clues to suspect the configured number for the message queue identifiers.

Prior to the release of Solaris 10, the /etc/system System V IPC tunable, msgsys:msginfo_msgmni, was used to control the maximum number of message queues that can be created. The default value on pre-Solaris 10 systems is 50.

With the release of Solaris 10, majority of the System V IPC tunables were obsoleted and equivalent resource controls were created for the remaining tunables to reduce the administrative overhead. On Solaris 10 and later versions, System V IPC can be tuned on a per project basis using the newly introduced resource controls.

On any Solaris 10 system, the resource control, project.max-msg-ids, replaced the old /etc/system tunable, msginfo_msgmni. And the default value has been raised to 128.

Now back to the failure in PeopleSoft environment. Let's first check the current value configured for project.max-msg-ids.

  • Get the project ID.
     % id -p
    uid=222227(psft) gid=2294(dba) projid=3(default)
  • Examine the project.max-msg-ids resource control for the project with ID 3, using the prctl utility.
     % prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
    project.max-msg-ids
            privileged        128       -   deny                                 -
            system          16.8M     max   deny                                 -

Alternatively run the command ipcs -q to check the number of active message queues. Note that the project with id '3' is configured to create a maximum of 128 (default) message queues. In any case, the number of active message queues from the ipcs -q output may almost match with the configured value for the project.max-msg-ids.

Since it appears the configured PeopleSoft domain(s) needs more than 128 message queues in order to bring up all the application server processes that constitute the PeopleSoft Enterprise, the solution is to increase the value for the resource control, project.max-msg-ids, to any value beyond 128. For the sake of simplicity, let's increase it to 256 (2 \* default value, that is). Again prctl utility can be used to set the new value for the resource control.

  • Assume the privileges of the 'root' user
     % su
    Password:
  • Increase the maximum value for the message queue identifiers to 256 using the prctl utility.
     # prctl -n project.max-msg-ids -r -v 256 -i project 3
  • Verify the new maximum value for the message queue identifiers
     # prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
    project.max-msg-ids
            privileged        256       -   deny                                 -
            system          16.8M     max   deny                                 -

With this change, the PeopleSoft Enterprise should boot up at least with no Failure to create message queue .. msgget: No space left on device errors.

Before we conclude, note that the above mentioned solution is not persistent across multiple operating system reboots. To make it persistent, create a new project using the projadd command. The man page for projadd(1M) has an example showing the creation of a project.

Friday Nov 21, 2008

Oracle on Solaris 10 : Fixing the 'ORA-27102: out of memory' Error

(Crossposting the 2+ year old blog entry from my other blog hosted on blogger. Source URL:
http://technopark02.blogspot.com/2006/09/solaris-10oracle-fixing-ora-27102-out.html)

Symptom:

As part of a database tuning effort you increase the SGA/PGA sizes; and Oracle greets with an ORA-27102: out of memory error message. The system had enough free memory to serve the needs of Oracle.

SQL> startup
ORA-27102: out of memory
SVR4 Error: 22: Invalid argument

Diagnosis
$ oerr ORA 27102
27102, 00000, "out of memory"
// \*Cause: Out of memory
// \*Action: Consult the trace file for details

Not so helpful. Let's look the alert log for some clues.

% tail -2 alert.log
WARNING: EINVAL creating segment of size 0x000000028a006000
fix shm parameters in /etc/system or equivalent

Oracle is trying to create a 10G shared memory segment (depends on SGA/PGA sizes), but operating system (Solaris in this example) responded with an invalid argument (EINVAL) error message. There is a little hint about setting shm parameters in /etc/system.

Prior to Solaris 10, shmsys:shminfo_shmmax parameter has to be set in /etc/system with maximum memory segment value that can be created. 8M is the default value on Solaris 9 and prior versions; where as 1/4th of the physical memory is the default on Solaris 10 and later. On a Solaris 10 (or later) system, it can be verified as shown below:

% prtconf | grep Mem
Memory size: 32760 Megabytes

% id -p
uid=59008(oracle) gid=10001(dba) projid=3(default)

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      7.84GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Now it is clear that the system is using the default value of 8G in this scenario, where as the application (Oracle) is trying to create a memory segment (10G) larger than 8G. Hence the failure.

So, the solution is to configure the system with a value large enough for the shared segment being created, so Oracle succeeds in starting up the database instance.

On Solaris 9 and prior releases, it can be done by adding the following line to /etc/system, followed by a reboot for the system to pick up the new value.

set shminfo_shmmax = 0x000000028a006000

However shminfo_shmmax parameter was obsoleted with the release of Solaris 10; and Sun doesn't recommend setting this parameter in /etc/system even though it works as expected.

On Solaris 10 and later, this value can be changed dynamically on a per project basis with the help of resource control facilities . This is how we do it on Solaris 10 and later:

% prctl -n project.max-shm-memory -r -v 10G -i project 3

% prctl -n project.max-shm-memory -i project 3
project: 3: default
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged      10.0GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

Note that changes made with the prctl command on a running system are temporary, and will be lost when the system is rebooted. To make the changes permanent, create a project with projadd command and associate it with the user account as shown below:

% projadd -p 3  -c 'eBS benchmark' -U oracle -G dba  -K 'project.max-shm-memory=(privileged,10G,deny)' OASB
% usermod -K project=OASB oracle
Finally make sure the project is created with projects -l or cat /etc/project commands.

% projects -l
...
...
OASB
        projid : 3
        comment: "eBS benchmark"
        users  : oracle
        groups : dba
        attribs: project.max-shm-memory=(privileged,10737418240,deny)

% cat /etc/project
...
...
OASB:3:eBS benchmark:oracle:dba:project.max-shm-memory=(privileged,10737418240,deny)

With these changes, Oracle would start the database up normally.

SQL> startup
ORACLE instance started.

Total System Global Area 1.0905E+10 bytes
Fixed Size                  1316080 bytes
Variable Size            4429966096 bytes
Database Buffers         6442450944 bytes
Redo Buffers               31457280 bytes
Database mounted.
Database opened.

Related information:

  1. What's New in Solaris System Tuning in the Solaris 10 Release?
  2. Resource Controls (overview)
  3. System Setup Recommendations for Solaris 8 and Solaris 9
  4. Man page of prctl(1)
  5. Man page of projadd


Addendum : Oracle RAC settings

Anonymous Bob suggested the following settings for Oracle RAC in the form of a comment for the benefit of others who run into similar issue(s) when running Oracle RAC. I'm pasting the comment as is (Disclaimer: I have not verified these settings):

Thanks for a great explanation, I would like to add one comment that will help those with an Oracle RAC installation. Modifying the default project covers oracle processes great and is all that is needed for a single instance DB. In RAC however, the CRS process starts the DB and it is a root owned process and root does not use the default project. To fix ORA-27102 issue for RAC I added the following lines to an init script that runs before the init.crs script fires.

# Recommended Oracle RAC system params
ndd -set /dev/udp udp_xmit_hiwat 65536
ndd -set /dev/udp udp_recv_hiwat 65536

# For root processes like crsd
prctl -n project.max-shm-memory -r -v 8G -i project system
prctl -n project.max-shm-ids -r -v 512 -i project system

# For oracle processes like sqlplus
prctl -n project.max-shm-memory -r -v 8G -i project default
prctl -n project.max-shm-ids -r -v 512 -i project default

So simple yet it took me a week working with Oracle and SUN to come up with that answer...Hope that helps someone out.

Bob
# posted by Blogger Bob : 6:48 AM, April 25, 2008

Saturday Nov 01, 2008

Ramifications of the Solaris 10 kernel patch 137111

Summary

A recent code change in Solaris 10 inadvertantly exposed an inherent bug in some of the 32-bit applications that rely on their own memory allocators. Due to this, some of the 3rd party applications which were working earlier without the KU 137111 may crash on Solaris 10/SPARC with the KU 137111 (any revision).

Symptoms & the Cause

It was identified that majority of such application failures are mainly due to the applications' custom memory allocator that incorrectly returns 4-byte aligned mutexes in place of the required 8-byte aligned mutexes. In Solaris, mutex_t and pthread_mutex_t structures have been defined to be aligned on an 8-byte boundary. Both of those structures contain the upad64_t member, which is a double even for the 32-bit applications. The natural alignment of a double is 8 bytes; and per the SPARC Compliance Definition 2.4, the structures must be aligned according to their strictest member. That is, applications which create 4-byte aligned mutexes are technically non-compliant on Solaris/SPARC (for the sake of simplicity, such code will be referred to as the non-complying code for the remainder of this blog entry).

Due to a change in the implementation of the userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, all non-compliant applications on Solaris/SPARC may fail with the signal SEGV with a callstack similar to the following one or with similar callstacks containing the function mutex_trylock_process.

  \*_atomic_cas_64(0x141f2c, 0x0, 0xff000000, 0x1651, 0xff000000, 0x466d90)
  set_lock_byte64(0x0, 0x1651, 0xff000000, 0x0, 0xfec82a00, 0x0)
  fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)
  ...

Patches & the Next Steps

Note that only non-compliant 32-bit applications will be affected by the KU 137111. All other complying 32-bit applications continue to run as expected even with the KU 137111 - hence the customers, partners, ISVs and the other software vendors must understand the fact that it is not a Solaris issue. Customers running into this issue must work with the respective software vendors to obtain a patch/fix. We suggest the ISVs and the rest of the software vendors to pro-actively check their 32-bit native code for any discrepancies like the one mentioned in this blog entry.

In our testing of some of the enterprise applications, we have identified Oracle's Siebel CRM as one of the potential applications that is vulnerable to the KU 137111. It appears that IBM's Lotus Domino Server is also prone to a crash on Solaris 10 with the same kernet patch. Speaking of these two known cases, Oracle/Siebel and IBM/Lotus Domino customers (running Solaris) should approach Oracle and IBM Corporations respectively but not Sun Microsystems for a proper fix.

As it may take some time for the ISVs / software vendors to identify and fix the non-complying code in their applications, Sun is planning to provide an interim fix to the mutex byte alignment issue in the form of a Solaris kernel patch. As of this writing, we expect the fix to be integrated into the KU 137137-07. The fix is already available in the latest update of the Solaris, Solaris 10 10/08. Those who cannot upgrade to Solaris 10 10/08 from the prior versions of Solaris 10 must wait for the patch KU [Updated 12/07/08] 137137-07 137137-09.

One must note that the fix in Solaris is a tentative one that allows the non-complying code to run on SPARC hardware for the time being. There is no guarantee that the non-complying code continues to run 'as is' in the future with new Solaris kernel patches and/or major updates/releases of the Solaris operating system. So the best long term solution is for the software vendors to fix the non-compliant code before it is too late.

Acknowledgments

Steve S and Roger F of Sun Microsystems.

Tuesday Nov 27, 2007

Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0

First things first - Oracle 11g Release 1 for Solaris/SPARC is available now; and can be downloaded from here.

In some Siebel 8.0 environments where Oracle database is being used, customers might be noticing intermittent Siebel object manager crashes under high loads when the work is actively being done by tons of LWPs with less number of object managers. Usually the call stack might look something like:

/export/home/siebel/siebsrvr/lib/libsslcosd.so:0x4ad24
/lib/libc.so.1:0xc5924
/lib/libc.so.1:0xba868
/export/home/oracle/lib32/libclntsh.so.10.1:kpuhhfre+0x8d0 [ Signal 11 (SEGV)]
/export/home/oracle/lib32/libclntsh.so.10.1:kpufhndl0+0x35ac

/export/home/siebel/siebsrvr/lib/libsscdo90.so:0x3140c
/export/home/siebel/siebsrvr/lib/libsscfdm.so:0x677944
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cQCSSLockSqlCursorGDelete6M_v_+0x264
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSDbConnOCacheSqlCursor6MpnMCSSSqlCursor__I_+0x734
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSDbConnRTryCacheSqlCursor6MpnMCSSSqlCursor_b_I_+0xcc
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjOCacheSqlCursor6MpnMCSSSqlCursor_b_I_+0x6e4
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjHRelease6Mb_v_+0x4d0
/export/home/siebel/siebsrvr/lib/libsscfom.so:__1cKCSSBusComp2T5B6M_v_+0x790
/export/home/siebel/siebsrvr/lib/libsscacmbc.so:__1cJCSSBCBase2T5B6M_v_+0x640
/export/home/siebel/siebsrvr/lib/libsscasabc.so:0x19275c
/export/home/siebel/siebsrvr/lib/libsscfom.so:0x318774
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cWCSSSWEFrameMgrInternalPReleaseBOScreen6M_v_+0x10c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalJBuildView6MpkHnLBUILDSCREEN_pnSCSSSWEViewBookmark_ipnJCSSBusObj_pnPCSSDrilldownDef_p1pI_I_+0x18b0
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalJBuildView6MrpnKCSSSWEView_pkHnLBUILDSCREEN_pnSCSSSWEViewBookmark_ipnJCSSBusObj_pnPCSSDrilldownDef_p4_I_+0x50
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrUActionBuildViewAsync6MpnbACSSSWEActionBuildViewAsync_pnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x38c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrODoPostedAction6MpnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x104
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrSCheckPostedActions6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo_ri_I_+0x378
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalSInvokeAppletMethod6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo_rnOCSSStringArray__I_+0x12f8
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorMInvokeMethod6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x88c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorP_ProcessCommand6MpnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x860
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorOProcessCommand6MpnUCSSSWEGenericRequest_pnVCSSSWEGenericResponse_rpnNWWEReqModInfo_rpnJWWECbInfo__I_+0x9bc
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorOProcessCommand6MpnRCSSSWEHttpRequest_pnSCSSSWEHttpResponse_rpnJWWECbInfo__I_+0xd8
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cSCSSServiceSWEIfaceHRequest6MpnMCSSSWEReqRec_pnRCSSSWEResponseRec__I_+0x404
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cSCSSServiceSWEIfaceODoInvokeMethod6MpkHrknOCCFPropertySet_r3_I_+0xa80
/export/home/siebel/siebsrvr/lib/libsscfom.so:__1cKCSSServiceMInvokeMethod6MpkHrknOCCFPropertySet_r3_I_+0x244
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:__1cOCSSSIOMSessionTModInvokeSrvcMethod6MpkHp13rnISSstring__i_+0x124
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:
__1cOCSSSIOMSessionMRPCMiscModel6MnMSISOMRPCCode_nMSISOMArgType_LpnSCSSSISOMRPCArgList_4ripv_i_+0x5bc
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:
__1cOCSSSIOMSessionJHandleRPC6MnMSISOMRPCCode_nMSISOMArgType_LpnSCSSSISOMRPCArgList_4ripv_i_+0xb98
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cJCSSClientLHandleOMRPC6MpnMCSSClientReq__I_+0x78
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cJCSSClientNHandleRequest6MpnMCSSClientReq__I_+0x2f8
/export/home/siebel/siebsrvr/lib/libsssasos.so:0x8c2c4
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cLSOMMTServerQSessionHandleMsg6MpnJsmiSisReq__i_+0x1bc
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cNsmiMainThreadUCompSessionHandleMsg6MpnJsmiSisReq__i_+0x16c
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cM_smiMessageQdDOProcessMessage6MpnM_smiMsgQdDItem_li_i_+0x93c
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cM_smiMessageQdDOProcessRequest6Fpv1r1_i_+0x244
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cN_smiWorkQdDueuePProcessWorkItem6Mpv1r1_i_+0xd4
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cN_smiWorkQdDueueKWorkerTask6Fpv_i_+0x300
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cQSmiThrdEntryFunc6Fpv_i_+0x46c
/export/home/siebel/siebsrvr/lib/libsslcosd.so:0x5be88
/export/home/siebel/siebsrvr/mw/lib/libmfc400su.so:__1cP_AfxThreadEntry6Fpv_I_+0x100
/export/home/siebel/siebsrvr/mw/lib/libkernel32.so:__1cIMwThread6Fpv_v_+0x23c
/lib/libc.so.1:0xc57f8

Setting the Siebel environment variable SIEBEL_STDERROUT to 1 shows the following heap dump in StdErrOut directory under Siebel enterprise logs directory.

% more stderrout_7762_23511113.txt
\*\*\*\*\*\*\*\*\*\* Internal heap ERROR 17112 addr=35dddae8 \*\*\*\*\*\*\*\*\*

\*\*\*\*\* Dump of memory around addr 35dddae8:
35DDCAE0 00000000 00000000 [........]
35DDCAF0 00000000 00000000 00000000 00000000 [................]
Repeat 243 times
35DDDA30 00000000 00000000 00003181 00300020 [..........1..0. ]
35DDDA40 0949D95C 35D7A888 10003179 00000000 [.I.\\5.....1y....]
35DDDA50 0949D95C 0949D8B8 35D7A89C C0000075 [.I.\\.I..5......u]
...
...
\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
HEAP DUMP heap name="Alloc environm" desc=949d8b8
extent sz=0x1024 alt=32767 het=32767 rec=0 flg=3 opc=2
parent=949d95c owner=0 nex=0 xsz=0x1038
EXTENT 0 addr=364fb324
Chunk 364fb32c sz= 4144 free " "
EXTENT 1 addr=364f8ebc
Chunk 364f8ec4 sz= 4144 free " "
EXTENT 2 addr=364f7d5c
Chunk 364f7d64 sz= 4144 free " "
EXTENT 3 addr=364f6d04
Chunk 364f6d0c sz= 4144 recreate "Alloc statemen " latch=0
ds 2c38df34 sz= 4144 ct= 1
...
...
EXTENT 406 addr=35ddda54
Chunk 35ddda5c sz= 116 free " "
Chunk 35dddad0 sz= 24 BAD MAGIC NUMBER IN NEXT CHUNK (6)
freeable assoc with mark prv=0 nxt=0

Dump of memory from 0x35DDDAD0 to 0x35DDEAE8
35DDDAD0 20000019 35DDDA5C 00000000 00000000 [ ...5..\\........]
35DDDAE0 00000095 0000008B 00000006 35DDDAD0 [............5...]
35DDDAF0 00000000 00000000 00000095 35DDDB10 [............5...]
...
...
EXTENT 2080 addr=d067a6c
Chunk d067a74 sz= 2220 freeable "Alloc statemen " ds=2b0fffe4
Chunk d068320 sz= 1384 freeable assoc with mark prv=0 nxt=0
Chunk d068888 sz= 4144 freeable "Alloc statemen " ds=2b174550
Chunk d0698b8 sz= 4144 recreate "Alloc statemen " latch=0
ds 1142ea34 sz= 112220 ct= 147
223784cc sz= 4144
240ea014 sz= 884
28eac1bc sz= 900
2956df7c sz= 900
1ae38c34 sz= 612
92adaa4 sz= 884
2f6b96ac sz= 640
c797bc4 sz= 668
2965dde4 sz= 912
1cf6ad4c sz= 656
10afa5e4 sz= 656
2f6732bc sz= 700
27cb3964 sz= 716
1b91c1fc sz= 584
a7c28ac sz= 884
169ac284 sz= 900
...
...
Chunk 2ec307c8 sz= 12432 free " "
Chunk 3140a3f4 sz= 4144 free " "
Chunk 31406294 sz= 4144 free " "
Bucket 6 size=16400
Bucket 7 size=32784
Total free space = 340784
UNPINNED RECREATABLE CHUNKS (lru first):
PERMANENT CHUNKS:
Chunk 949f3c8 sz= 100 perm "perm " alo=100
Permanent space = 100
\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
Hla: 255

ORA-21500: internal error code, arguments: [17112], [0x35DDDAE8], [], [], [], [], [], []
Errors in file :
ORA-21500: internal error code, arguments: [17112], [0x35DDDAE8], [], [], [], [], [], []

----- Call Stack Trace -----
NOTE: +offset is used to represent that the
function being called is offset bytes from
the _PROCEDURE_LINKAGE_TABLE_.
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
F2365738 CALL +23052 D7974250 ? D797345C ? DD20 ?
D79741AC ? D79735F8 ?
F2ECD7A0 ?
F286DDB8 PTR_CALL 00000000 949A018 ? 14688 ? B680B0 ?
F2365794 ? F2ECD7A0 ? 14400 ?
F286E18C CALL +77460 949A018 ? 0 ? F2F0E8D4 ?
1004 ? 1000 ? 1000 ?
F286DFF8 CALL +66708 949A018 ? 0 ? 42D8 ? 1 ?
D79743E0 ? 949E594 ?
...
...
__1cN_smiWorkQdDueu CALL __1cN_smiWorkQdDueu 1C8F608 ? 18F55F38 ?
30A5A008 ? 1A98E208 ?
FDBDF178 ? FDBE0424 ?
__1cQSmiThrdEntryFu PTR_CALL 00000000 1C8F608 ? FDBE0424 ?
1AB6EDE0 ? FDBDF178 ? 0 ?
1500E0 ?
__1cROSDWslThreadSt PTR_CALL 00000000 1ABE8250 ? 600140 ? 600141 ?
105F76E8 ? 0 ? 1AC74864 ?
__1cP_AfxThreadEntr PTR_CALL 00000000 0 ? FF30A420 ? 203560 ?
1AC05AF0 ? E2 ? 1AC05A30 ?
__1cIMwThread6Fpv_v PTR_CALL 00000000 D7A7DF6C ? 17F831E0 ? 0 ? 1 ?
0 ? 17289C ?
_lwp_start()+0 ? 00000000 1 ? 0 ? 1 ? 0 ? FCE6C710 ?
1AC72680 ?

----- Argument/Register Address Dump -----

Argument/Register addr=d7974250. Dump of memory from 0xD7974210 to 0xD7974350
D7974200 0949A018 00014688 00B680B0 F2365794
D7974220 F2ECD7A0 00014400 D7974260 F286DDB8 800053FC 80000000 00000002 D79743E0
D7974240 4556454E 545F3231 35303000 32000000 F23654D4 F2365604 00000000 0949A018
D7974260 00000000 0949A114 0000000A F2ECD7A0 FC873504 00001004 F2365794 F2F0E8D4
D7974280 0949A018 00000000 F2F0E8D4 00001004 00001000 00001000 D79742D0 F286E18C
D79742A0 F6CB7400 FC872CC0 00000004 00CDE34C 4556454E 545F3437 31313200 00000000
D79742C0 00000000 00000001 D79743E0 00000000 F2F0E8D4 00001004 00001000 00007530
D79742E0 00007400 00000001 FC8731D4 00003920 0949A018 00000000 000042D8 00000001
D7974300 D79743E0 0949E594 D7974330 F286DFF8 D7974338 F244D5A8 00000000 01000000
D7974320 364FBFFF 01E2C000 00001000 00000000 00001000 00000000 F2ECD7A0 F23654D4
D7974340 000000FF 00000001 F2F0E8D4 F25F9824
Argument/Register addr=d797345c. Dump of memory from 0xD797341C to 0xD797355C
D7973400 F2365738
...
...
Argument/Register addr=1eb21388. Dump of memory from 0x1EB21348 to 0x1EB21488
1EB21340 00000000 00000000 FE6CE088 FE6CE0A4 0000000A 00000010
1EB21360 00000000 00000000 00000000 00000010 00000000 00000000 00000000 FEB99308
1EB21380 00000000 00000000 00000000 1C699634 FFFFFFFF 00000000 FFFFFFFF 00000001
1EB213A0 00000000 00000000 00000000 00000000 00000081 00000000 F16F1038 67676767
1EB213C0 00000000 FEB99308 00000000 00000000 FEB99308 FEB46EF4 00000000 00000000
1EB213E0 00000000 00000000 FEB99308 00000000 00000000 00000001 0257E3B4 03418414
1EB21400 00000000 00000000 FEB99308 FEB99308 00000000 00000000 00000000 00000000
1EB21420 00000000 00000000 00000000 00000000 1EB213B0 00000000 00000041 00000000
1EB21440 0031002D 00410036 00510038 004C0000 00000000 00000000 00000000 00000000
1EB21460 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1EB21480 00000109 00000000

----- End of Call Stack Trace -----

Although I'm not sure what exactly is the underlying issue for the core dump, my suspicion is that there is some memory corruption in Oracle client's code; and the Siebel Object Manager crash is due to the Oracle bug 5682121 - Multithreaded OCI clients do not mutex properly for LOB operations. The fix to this particular bug would be available in Oracle 10.2.0.4 release; and is already available as part of Oracle 11.1.0.6.0. In case if you notice the symptoms of failure as described in this blog post, upgrade the Oracle client in the application-tier to Oracle 11gR1 and see if it brings stability to the Siebel environment.

Original blog post is at:
http://technopark02.blogspot.com/2007/11/solarissparc-oracle-11gr1-client-for.html

Friday Aug 10, 2007

Oracle 10gR2/Solaris x64: Must Have Patches for E-Business Suite 11.5.10

If you have an Oracle E-Business Suite 11.5.10 database running on Oracle 10gR2 (10.2.0.2) / Solaris x86-64 platform, make sure you have the following two Oracle patches to avoid concurrency issues and intermittent Oracle shadow process crashes.

Oracle patches

1) 4770693 BUG: Intel Solaris: Unnecessary latch sleeps by default
2) 5666714 BUG: ORA-7445 ON DELETE

Symptoms

1) If the top 5 database timed events look something similar to the following in AWR, it is very likely that the database is running into the bug 4770693 Intel Solaris: Unnecessary latch sleeps by default.

Top 5 Timed Events

EventWaitsTime(s)Avg Wait(ms)% Total Call TimeWait Class
latch: cache buffers chains 94,301 169,403 1,796 86.1Concurrency
CPU time-- 5,478-- 2.8--
wait list latch free 247,466 4,756 19 2.4Other
buffer busy waits 14,928 1,382 93 .7Concurrency
db file sequential read 98,750 552 6 .3User I/O

Apply Oracle server patch 4770693 to get rid of the concurrency issue(s). Note that the fix will be part of the release 10.2.0.3.

2) If the application becomes unstable and if you notice core dumps in cdump directory, have a look at the corresponding stack traces generated in udump directory. If the call stack looks similar to the following stack, apply Oracle server patch 5666714 to overcome this problem.

Alert log will have the following errors:

Errors in file /opt/oracle/admin/VIS/udump/vis_ora_1040.trc:
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
Fri Jun 15 01:30:38 2007
Errors in file /opt/oracle/admin/VIS/udump/vis_ora_1040.trc:
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [9] [] [] []
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
Fri Jun 15 01:30:38 2007
Errors in file /opt/oracle/admin/VIS/udump/vis_ora_1040.trc:
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [9] [] [] []
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [9] [] [] []
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] [] 

% more vis_ora_1040.trc
...
...
\*\*\* 2007-06-15 01:30:38.403
\*\*\* SERVICE NAME:(VIS) 2007-06-15 01:30:38.402
\*\*\* SESSION ID:(1111.4) 2007-06-15 01:30:38.402
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x878
\*\*\* 2007-06-15 01:30:38.403
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
Current SQL statement for this session:
INSERT /\*+ IDX(0) \*/ INTO "INV"."MLOG$_MTL_SUPPLY" (dmltype$$,old_new$$,snaptime$$,change_vector$$,m_row$$) 
VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:m)
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst()+23          ?        0000000000000001     00177A9EC 000000000 0061D0A60
                                                   000000000
ksedmp()+636         ?        0000000000000001     001779481 000000000 00000000B
                                                   000000000
ssexhd()+729         ?        0000000000000001     000E753CE 000000000 0061D0B90
                                                   000000000
_sigsetjmp()+25      ?        0000000000000001     0FDDCB7E6 0FFFFFD7F 0061D0B50
                                                   000000000
call_user_handler()  ?        0000000000000001     0FDDC0BA2 0FFFFFD7F 0061D0EF0
+589                                               000000000
sigacthandler()+163  ?        0000000000000001     0FDDC0D88 0FFFFFD7F 000000002
                                                   000000000
kglsim_pin_simhp()+  ?        0000000000000001     0FFFFFFFF 0FFFFFFFF 00000000B
173                                                000000000
kxsGetRuntimeLock()  ?        0000000000000001     001EBF830 000000000 005E5D868
+683                                               000000000
kksfbc()+7361        ?        0000000000000001     001FB60A6 000000000 005E5D868
                                                   000000000
opiexe()+1691        ?        0000000000000001     0029045D0 000000000 0FFDF9250
                                                   0FFFFFD7F
opiall0()+1316       ?        0000000000000001     0028E9FB9 000000000 000000001
                                                   000000000
opikpr()+536         ?        0000000000000001     00290B2DD 000000000 0000000B7
                                                   000000000
opiodr()+1087        ?        0000000000000001     000E7BE1C 000000000 000000001
                                                   000000000
rpidrus()+217        ?        0000000000000001     000E8058E 000000000 0FFDFA6B8
                                                   0FFFFFD7F
skgmstack()+163      ?        0000000000000001     003F611D0 000000000 005E5D868
                                                   000000000
rpidru()+129         ?        0000000000000001     000E808A6 000000000 005E6FAD0
                                                   000000000
rpiswu2()+431        ?        0000000000000001     000E7FD8C 000000000 0FFDFB278
                                                   0FFFFFD7F
kprball()+1189       ?        0000000000000001     000E86E6A 000000000 0FFDFB278
                                                   0FFFFFD7F
kntxslt()+3150       ?        0000000000000001     0030601F3 000000000 005F7C538
                                                   000000000
kntxit()+998         ?        0000000000000001     003058EBB 000000000 005F7C538
                                                   000000000
0000000001E4866E     ?        0000000000000001     001E4864B 000000000 000000000
                                                   000000000
delrow()+9170        ?        0000000000000001     0032020B7 000000000 000000002
                                                   000000000
qerdlFetch()+640     ?        0000000000000001     0033545F5 000000000 0EF38B020
                                                   000000003
delexe()+909         ?        0000000000000001     0032034EA 000000000 005E6FC50
                                                   000000000
opiexe()+9267        ?        0000000000000001     002906368 000000000 000000001
                                                   000000000
opiodr()+1087        ?        0000000000000001     000E7BE1C 000000000 0FFDFCD10
                                                   0FFFFFD7F
ttcpip()+1168        ?        0000000000000001     003D031AD 000000000 0FFDFEDF4
                                                   0FFFFFD7F
opitsk()+1212        ?        0000000000000001     000E77C41 000000000 000E7BA00
                                                   000000000
opiino()+931         ?        0000000000000001     000E7B0D8 000000000 005E5B8F0
                                                   000000000
opiodr()+1087        ?        0000000000000001     000E7BE1C 000000000 000000000
                                                   000000000
opidrv()+748         ?        0000000000000001     000E76A11 000000000 0FFDFF6D8
                                                   0FFFFFD7F
sou2o()+86           ?        0000000000000001     000E73E6B 000000000 000000000
                                                   000000000
opimai_real()+127    ?        0000000000000001     000E3A7C4 000000000 000000000
                                                   000000000
main()+95            ?        0000000000000001     000E3A694 000000000 000000000
                                                   000000000
0000000000E3A4D7     ?        0000000000000001     000E3A4DC 000000000 000000000
                                                   000000000
 
--------------------- Binary Stack Dump ---------------------
 
========== FRAME [1] (ksedst()+23 -> 0000000000000001) ==========
Dump of memory from 0x00000000061D0910 to 0x00000000061D0920
0061D0910 061D0920 00000000 0177A9EC 00000000  [ .........w.....]
========== FRAME [2] (ksedmp()+636 -> 0000000000000001) ==========
Dump of memory from 0x00000000061D0920 to 0x00000000061D0A60
0061D0920 061D0A60 00000000 01779481 00000000  [`.........w.....]
0061D0930 0000000B 00000000 061D0EF0 00000000  [................]
0061D0940 05E5B96C 00000000 05E5C930 00000000  [l.......0.......]
0061D0950 05E5C930 00000000 FE0D2000 FFFFFD7F  [0........ ......]
0061D0960 061D0A40 00000000 00000000 00000000  [@...............]
0061D0970 00000000 00000000 00000000 00000000  [................]
...
...

After installing the Oracle database patches, check the installed patches by running opatch lsinventory on your database server.

% opatch lsinventory
Invoking OPatch 10.2.0.1.0

Oracle interim Patch Installer version 10.2.0.1.0
Copyright (c) 2005, Oracle Corporation.  All rights reserved..


Oracle Home       : /oracle/product/10.1.0
Central Inventory : /export/home/oracle/oraInventory
   from           : /oracle/product/10.1.0/oraInst.loc
OPatch version    : 10.2.0.1.0
OUI version       : 10.2.0.1.0
OUI location      : /oracle/product/10.1.0/oui
Log file location : /oracle/product/10.1.0/cfgtoollogs/opatch/opatch-2007_Aug_10_21-56-03-PDT_Fri.log

Lsinventory Output file location : 
/oracle/product/10.1.0/cfgtoollogs/opatch/lsinv/lsinventory-2007_Aug_10_21-56-03-PDT_Fri.txt

--------------------------------------------------------------------------------
Installed Top-level Products (3): 

Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Products                                         10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 1                            10.2.0.2.0
There are 3 products installed in this Oracle Home.


Interim patches (2) :

Patch  4770693      : applied on Thu Aug 02 15:27:23 PDT 2007
   Created on 12 Jul 2006, 11:52:39 hrs US/Pacific
   Bugs fixed:
     4770693

Patch  5666714      : applied on Fri Jul 20 10:21:33 PDT 2007
   Created on 29 Nov 2006, 04:52:58 hrs US/Pacific
   Bugs fixed:
     5666714

--------------------------------------------------------------------------------

OPatch succeeded.
About

Benchmark announcements, HOW-TOs, Tips and Troubleshooting

Search

Archives
« July 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today