Disabling PMF action script with SUNW.gds

Let me step back and first describe why one would want to disable the PMF action script when using GDS.

Sun Cluster comes with a generic data service (resource type SUNW.gds) that lets you create a resource that starts, stops and probes a custom application quickly and easily. There is even a GUI-based wizard for it.

The central part of a data service is the application monitor (the probe). GDS uses two monitoring methods by default: the Process Monitor Facility (PMF), and the script/command provided with the Probe_command property:

  1. GDS creates two PMF tags by default:
    1. for the monitor (<resourcegroupname>,<resourcename>,0.mon), which registers the process ids (pids) from /opt/SUNWscgds/bin/gds_probe;
    2. for the service (<resourcegroupname>,<resourcename>,0.svc), which registers the pids started by the script/command provided with the Start_command property. The Child_mon_level property denotes the level down to which forked child processes are monitored.
  2. The script/command provided with Probe_command is run periodically at a configurable interval (Thorough_probe_interval) and signals the health of the application via its return code.
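
Because the tag names follow a fixed pattern, a script can derive them from the resource group and resource names. The following is a minimal sketch, assuming ksh or any POSIX shell; the function names pmf_svc_tag and pmf_mon_tag and the names my-rg and my-rs are hypothetical, not part of Sun Cluster:

```shell
# Hypothetical helpers: build the PMF tag names that GDS uses for a resource.
# $1 = resource group name, $2 = resource name
pmf_svc_tag() {
    printf '%s,%s,0.svc\n' "$1" "$2"
}

pmf_mon_tag() {
    printf '%s,%s,0.mon\n' "$1" "$2"
}

# Example use on a cluster node (my-rg and my-rs are placeholders):
#   /usr/cluster/bin/pmfadm -L                                # list registered tags
#   /usr/cluster/bin/pmfadm -l "$(pmf_svc_tag my-rg my-rs)"   # details for one tag
```

Listing the tags this way is a quick check of which pids PMF is actually tracking for the resource.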


The two monitoring methods are combined because this allows faster detection of an application failure:

  • If all processes registered with the service tag are gone, PMF determines that the resource has failed (external monitoring).
  • If the Probe_command returns a code other than zero, gds_probe determines that the resource has failed (internal monitoring).
So a loss of all pids of the service tag is recognized immediately, while a failure reported through the probe is recognized only every Thorough_probe_interval seconds.
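
To make the probe side concrete, here is a minimal sketch of what a Probe_command script could do, assuming the application writes its pid to a file. The function name probe_pidfile and the pid file path are hypothetical. The value 100 is used here simply as a clearly nonzero return code; check the SUNW.gds documentation for how specific values are interpreted.

```shell
# Hypothetical probe logic: healthy if the pid recorded in a pid file
# belongs to a live process. Returns 0 when healthy, 100 otherwise.
# Assumes the probe runs with enough privilege to signal the application.
probe_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 100
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 100 ;;   # empty or non-numeric content
    esac
    # kill -0 sends no signal; it only checks that the process exists.
    kill -0 "$pid" 2>/dev/null || return 100
    return 0
}

# In a real probe script the last lines might be (path is an assumption):
#   probe_pidfile /var/run/myapp.pid
#   exit $?
```

A probe like this only shows that a process exists. A more thorough probe would also exercise the application, for example by connecting to it.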

With that in mind, note that some applications do not leave a process behind once they have started, or their final process(es) are not children (at a monitored level) of the script/command provided with Start_command.

For those types of applications, one might want to rely solely on the script/command registered with the Probe_command property for monitoring, and switch off PMF.

There is no way to simply turn off PMF with GDS, but you can turn off the restart that PMF triggers when all pids registered with the service tag are gone:

/usr/cluster/bin/pmfadm -s <resourcegroupname>,<resourcename>,0.svc

Using this within the script registered through Start_command obviously requires that the script "knows" its resource group name and resource name. You can pass both to the script by setting Start_command to, e.g.,

Start_command="/path/to/my-start-script.ksh -R %RS_NAME -G %RG_NAME"

and then use the following code within my-start-script.ksh:

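#!/bin/ksh
# Parse the resource (-R) and resource group (-G) names passed in by GDS.
while getopts 'R:G:' opt
do
   case "${opt}" in
           R)  RESOURCE=${OPTARG};;
           G)  RESOURCEGROUP=${OPTARG};;
   esac
done
# Keep at least one pid registered under the service tag for 60 seconds.
/usr/bin/sleep 60 &
# Tell PMF to stop restarting the command for the service tag.
/usr/cluster/bin/pmfadm -s "${RESOURCEGROUP},${RESOURCE},0.svc"
[...rest of your code to start the application...]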

The "/usr/bin/sleep 60 &" is needed to guarantee that at least one pid stays registered with PMF for the service tag long enough for "pmfadm -s" to stop the restart behavior (which is performed by the PMF action script).
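
If you want the start script to notice when "pmfadm -s" itself fails, the call can be wrapped in a small function. This is only a sketch: the function name disable_pmf_restart and the overridable PMFADM variable are hypothetical additions, and whether the start script should abort or merely log on failure is a design choice left to you.

```shell
# Path to pmfadm; overridable so the function can be exercised off-cluster.
PMFADM=${PMFADM:-/usr/cluster/bin/pmfadm}

# Hypothetical helper: stop PMF restarts for the service tag of a resource.
# $1 = resource group name, $2 = resource name
# Returns 0 if "pmfadm -s" succeeded, 1 otherwise.
disable_pmf_restart() {
    tag="$1,$2,0.svc"
    if "$PMFADM" -s "$tag"; then
        return 0
    fi
    echo "warning: could not disable PMF restart for tag $tag" >&2
    return 1
}

# In the start script, after the option parsing:
#   /usr/bin/sleep 60 &
#   disable_pmf_restart "${RESOURCEGROUP}" "${RESOURCE}" || exit 1
```

Without such a check, a failed "pmfadm -s" would go unnoticed and PMF would still restart the resource once the service tag becomes empty.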

After that, even if all pids registered for the service tag are gone, PMF will no longer perform any action for this resource. Monitoring then depends entirely on the registered Probe_command.

Note that this also means an application fault is detected only once every Thorough_probe_interval seconds for this resource. Within that interval the application may have failed without the failure being detected immediately.
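
As a rough illustration of that delay, the sketch below estimates an upper bound under the assumption that a failure happens right after a probe has finished and that the next probe takes up to its full Probe_timeout to report. The function name worst_case_detection_secs is hypothetical, and the estimate ignores any retry or failure-history handling.

```shell
# Rough upper bound, in seconds, on how long a failure can go unnoticed:
# one full probe interval plus the time the probe itself may take.
# $1 = Thorough_probe_interval, $2 = Probe_timeout
worst_case_detection_secs() {
    expr "$1" + "$2"
}

# Example: a 60-second interval with a 30-second probe timeout.
#   worst_case_detection_secs 60 30
#
# To shorten the interval on an existing resource (assumes the Sun Cluster 3.2
# or later command set; <resourcename> is a placeholder):
#   /usr/cluster/bin/clresource set -p Thorough_probe_interval=30 <resourcename>
```

A shorter interval narrows the detection window at the cost of running the probe more often.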

