Saturday Dec 06, 2008

Is Oracle's PeopleSoft Really a Multi-Threaded Application?

Perhaps the answer to this question is irrelevant to many of the PeopleSoft end users - but I assume it is important for the administrators to know what kind of application they are dealing with.

Here is a snapshot of the process statistics (prstat output) for the PeopleSoft application server processes running on a Solaris 10 system:

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP      
 10864 psft      353M  235M sleep   60   10   0:13:42 1.5% PSAPPSRV/11
 10855 psft      353M  235M sleep    2   10   0:13:55 1.5% PSAPPSRV/11
 10846 psft      353M  235M sleep    3   10   0:14:04 1.5% PSAPPSRV/11
 10870 psft      353M  235M sleep    0   10   0:13:50 1.5% PSAPPSRV/11
 10873 psft      353M  235M sleep    1   10   0:13:57 1.4% PSAPPSRV/11
 10852 psft      353M  235M sleep    0   10   0:13:57 1.4% PSAPPSRV/11
 10858 psft      353M  235M sleep   60   10   0:13:47 1.3% PSAPPSRV/11
 10849 psft      349M  231M cpu0    20   10   0:13:55 1.3% PSAPPSRV/11
 10867 psft      353M  235M sleep   60   10   0:13:53 1.3% PSAPPSRV/11
 10861 psft      349M  231M sleep   60   10   0:13:56 1.2% PSAPPSRV/11

Notice the number of LWPs (represented by NLWP in the snapshot) associated with each of those PSAPPSRV processes. Just by looking at the above snapshot, one may be under the impression that PSAPPSRV (the PeopleSoft Application Server process) is a multi-threaded process, because it appears that there are 11 worker threads actively processing user requests.

To dig a little deeper, the prstat utility on Solaris can report statistics for each of the light-weight processes (LWPs) associated with a process (here I'm assuming that the intended audience can differentiate a process from a light-weight process). With the -L option, prstat prints one line per LWP for a given process. Let's have a closer look at those stats.

  PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
10864 psft      353M  235M cpu32    0   10   0:01:37 1.5% PSAPPSRV/1
10864 psft      353M  235M sleep   59    0   0:00:00 0.0% PSAPPSRV/11
10864 psft      353M  235M sleep   29   10   0:00:00 0.0% PSAPPSRV/10
10864 psft      353M  235M sleep   28   10   0:00:00 0.0% PSAPPSRV/9
10864 psft      353M  235M sleep   28   10   0:00:00 0.0% PSAPPSRV/8
10864 psft      353M  235M sleep   59    0   0:00:00 0.0% PSAPPSRV/7
10864 psft      353M  235M sleep   59    0   0:00:00 0.0% PSAPPSRV/6
10864 psft      353M  235M sleep   51    2   0:00:00 0.0% PSAPPSRV/5
10864 psft      353M  235M sleep   59    0   0:00:00 0.0% PSAPPSRV/4
10864 psft      353M  235M sleep   29   10   0:00:00 0.0% PSAPPSRV/3
10864 psft      353M  235M sleep   59    0   0:00:00 0.0% PSAPPSRV/2

Notice the PIDs in the first column: it is the same PID, 10864, on every line, which confirms that the above snapshot is the per-LWP breakdown of the stats for a single process. Now check the TIME column, which shows the cumulative execution time, per LWP in this case. Except for LWP #1, the execution time of every LWP is zero. In other words, even though the PSAPPSRV process spawned 10 more LWPs, they are not doing any work at all. When I asked my counterpart at Oracle Corporation why multiple LWPs are created, I was told that they are a side effect of loading the JRE(s) into the process address space at run time. It also appears that a PeopleSoft application server process (PSAPPSRV) can process only one user request (transaction) at a time. This is a by-design limitation of PeopleSoft Enterprise.
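The check described above, looking for LWPs with a non-zero TIME, can be sketched as a small shell filter. This is only an illustration: the function name busy_lwps is made up for this example, and the sample holds just three of the eleven rows from the per-LWP snapshot above.

```shell
#!/bin/sh
# Count the LWPs in `prstat -L` style output that have accumulated CPU time.
# Assumes the column layout shown above, where TIME is the 8th field.
busy_lwps() {
    awk '$1 ~ /^[0-9]+$/ && $8 != "0:00:00" { n++ } END { print n + 0 }'
}

# Sample: the header plus three of the LWP rows for PID 10864.
sample='  PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/LWPID
10864 psft      353M  235M cpu32    0   10   0:01:37 1.5% PSAPPSRV/1
10864 psft      353M  235M sleep   59    0   0:00:00 0.0% PSAPPSRV/11
10864 psft      353M  235M sleep   29   10   0:00:00 0.0% PSAPPSRV/10'

# Only LWP #1 has a non-zero TIME in the sample.
printf '%s\n' "$sample" | busy_lwps
```

On a live Solaris system the same filter could be fed from something like `prstat -L -p 10864 1 1 | busy_lwps`, where the PID is the one from the snapshot.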

So the bottom line is: PeopleSoft application server processes (PSAPPSRV) are not multi-threaded, even though they appear to be multi-threaded from the operating system's perspective.

Before we conclude, be aware that the discussion in this blog post applies only to the PeopleSoft application server processes, PSAPPSRV. You cannot generalize it to the whole of PeopleSoft Enterprise. For example, application engine processes (PSAESRV) that run under the control of the Process Scheduler are actually multi-threaded processes. However, expanding the discussion to the Process Scheduler/Application Engine is beyond the scope of this blog post.

Acknowledgements:
Sanjay Goyal

(Originally posted on blogger at:
http://technopark02.blogspot.com/2008/03/is-peoplesoft-really-multi-threaded.html)

Sunday Nov 30, 2008

PeopleSoft on Solaris 10: Fixing the "msgget: No space left on device" Error

(Crossposting the 8+ month old blog entry from my other blog hosted on blogger. Source URL:
http://technopark02.blogspot.com/2008/03/peoplesoft-fixing-msgget-no-space-left.html
)

When a large number of application server processes are configured in a single PeopleSoft domain, or cumulatively across multiple domains, it is very likely that the PeopleSoft application server domain boot process will fail with errors like:

Booting server processes ...
exec PSSAMSRV -A -- -C psappsrv.cfg -D CS90SPV -S PSSAMSRV :
        Failed.
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:681: ERROR: Failure to create message queue
113954.ben15!PSSAMSRV.29746.1.0: LIBTUX_CAT:248: ERROR: System init function failed, Uunixerr = : 
                   msgget: No space left on device
113954.ben15!tmboot.29708.1.-2: CMDTUX_CAT:825: ERROR: Process PSSAMSRV at ben15 failed with /T 
                   tperrno (TPEOS - operating system error)

In this particular example, PeopleSoft Enterprise is running on a Solaris 10 system. Fortunately, the error message is very clear in this case: the failure is related to message queues. During the domain boot process, there is a call to msgget() to create a message queue. If the call to msgget() succeeds, it returns a non-negative integer that serves as the identifier of the newly created message queue. If it fails, it returns -1 and sets errno to EACCES, EEXIST, ENOENT or ENOSPC, depending on the underlying reason.

From the above error messages it is clear that msgget() failed with errno set to ENOSPC (No space left on device). The man page for msgget(2) has the following explanation for the ENOSPC error code on Solaris:

ERRORS
     The msgget() function will fail if:
     ...
     ...
     ENOSPC    A message queue identifier is to  be  created  but
               the  system-imposed limit on the maximum number of
               allowed  message  queue  identifiers  system  wide
               would be exceeded. See NOTES.

NOTES
     ...
     ...

     The system-imposed limit on  the  number  of  message  queue
     identifiers  is  maintained on a per-project basis using the
     project.max-msg-ids resource control.

That is enough of a clue to suspect the configured limit on the number of message queue identifiers.

Prior to the release of Solaris 10, the /etc/system System V IPC tunable, msgsys:msginfo_msgmni, was used to control the maximum number of message queues that can be created. The default value on pre-Solaris 10 systems is 50.

With the release of Solaris 10, the majority of the System V IPC tunables were obsoleted, and equivalent resource controls were created for the remaining tunables to reduce the administrative overhead. On Solaris 10 and later versions, System V IPC can be tuned on a per-project basis using the newly introduced resource controls.

On Solaris 10, the resource control project.max-msg-ids replaced the old /etc/system tunable, msginfo_msgmni, and the default value was raised to 128.

Now back to the failure in PeopleSoft environment. Let's first check the current value configured for project.max-msg-ids.

  • Get the project ID.
     % id -p
    uid=222227(psft) gid=2294(dba) projid=3(default)
  • Examine the project.max-msg-ids resource control for the project with ID 3, using the prctl utility.
     % prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
    project.max-msg-ids
            privileged        128       -   deny                                 -
            system          16.8M     max   deny                                 -

Alternatively, run the command ipcs -q to check the number of active message queues. Note that the project with ID 3 is configured to create a maximum of 128 (the default) message queues. When the domain boot fails with this error, the number of active message queues in the ipcs -q output should be at or close to the value configured for project.max-msg-ids.
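As a rough sketch of that comparison, the queue rows in the ipcs -q output can be counted and set against the configured limit. The function name count_msg_queues is made up for this example, and the sample output is illustrative only: the IDs, keys and row count are invented, and the filter assumes the Solaris layout in which every message queue row starts with the type letter q.

```shell
#!/bin/sh
# Count message queues in Solaris-style `ipcs -q` output, where each
# queue row begins with the type letter "q".
count_msg_queues() {
    awk '$1 == "q" { n++ } END { print n + 0 }'
}

# Illustrative sample only; not captured from the system discussed above.
sample_ipcs='IPC status from <running system> as of Tue Mar 11 10:00:00 PDT 2008
T         ID      KEY        MODE        OWNER    GROUP
Message Queues:
q          0   0x0000bea1 --rw-rw-rw-     psft      dba
q          1   0x0000bea2 --rw-rw-rw-     psft      dba
q          2   0x0000bea3 --rw-rw-rw-     psft      dba'

limit=128   # privileged value of project.max-msg-ids reported by prctl
used=$(printf '%s\n' "$sample_ipcs" | count_msg_queues)
echo "message queues in use: $used of $limit"
```

On a real system the input would come from `ipcs -q | count_msg_queues`. Keep in mind that ipcs lists queues for the whole system rather than for a single project, so the count is only an approximation of the project's usage.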

Since it appears that the configured PeopleSoft domain(s) need more than 128 message queues to bring up all the application server processes that constitute the PeopleSoft Enterprise, the solution is to increase the value of the resource control, project.max-msg-ids, beyond 128. For the sake of simplicity, let's increase it to 256 (that is, 2 * the default value). Again, the prctl utility can be used to set the new value for the resource control.

  • Assume the privileges of the 'root' user
     % su
    Password:
  • Increase the maximum value for the message queue identifiers to 256 using the prctl utility.
     # prctl -n project.max-msg-ids -r -v 256 -i project 3
  • Verify the new maximum value for the message queue identifiers
     # prctl -n project.max-msg-ids -i project 3
    project: 3: default
    NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
    project.max-msg-ids
            privileged        256       -   deny                                 -
            system          16.8M     max   deny                                 -

With this change, the PeopleSoft Enterprise should at least boot up without the "Failure to create message queue ... msgget: No space left on device" errors.

Before we conclude, note that the above-mentioned solution does not persist across operating system reboots. To make it persistent, create a new project using the projadd command. The man page for projadd(1M) has an example showing the creation of a project.
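A minimal sketch of what the persistent setting might look like follows. The helper rctl_attr is made up for this example, and the project name user.psft is an assumption based on the psft user shown in the id -p output earlier. The script only builds the attribute string; the actual projadd/projmod invocations are left as comments because they need root on a Solaris system.

```shell
#!/bin/sh
# Format a resource control attribute in the name=(privilege,value,action)
# syntax used by the project database.
rctl_attr() {
    # $1 = resource control name, $2 = privileged limit
    printf '%s=(privileged,%s,deny)\n' "$1" "$2"
}

attr=$(rctl_attr project.max-msg-ids 256)
echo "$attr"

# On Solaris, as root, the attribute could then be attached to a new project
# for the psft user (project name user.psft is an assumption):
#   projadd -U psft -K "$attr" user.psft
# or added to the existing default project instead:
#   projmod -s -K "$attr" default
```

Either way, the new limit applies to processes started in that project after the change, so the PeopleSoft domain would have to be booted from a fresh login session.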
