Creating Sun Cluster Agents for simple tasks.

I often hear questions like the following asked in various forums:

  • How do I send an e-mail whenever a resource group fails over?

  • How do I automate tweaking of this configuration file just before my application is started by SunCluster?

All of these can be considered variants of the following question: "How do I run my own custom script to do this custom task just before/after a given resource/group starts?"

    As people already familiar with SunCluster would know, this calls for an Agent, and SunCluster does have tools for such things, such as the Sun Cluster Agent Builder GUI and the Generic Data Service (GDS) model (supported by scdsbuilder as well). Those tools are focused more on "applications", with the notion that a set of user-level daemons/processes would be running which constitute the application.

    Dealing with different types of applications is a rather complex topic in itself. For today's blog, let me quickly get to a very basic approach, which consists of a few very simple steps.

    - Create a script that performs the specific action you want, such as:

    #!/bin/sh
    # Log the arguments the framework passed to this start script.
    logger -p user.notice "Start script called with $*"
    exit 0

    Note the "exit 0" at the end: whenever the SunCluster framework calls your script, this tells the SC framework that the start operation was successful. Replace the logger command with whatever it is that you want to do.

    Let us say you name this script startmyscagent.

    - Create a script which does the STOP action for whatever you did in the start script. If there is no need for this, skip this step. In this example, we use /bin/true for the STOP action.

    - Install your scripts in a specific location on all cluster nodes. I suggest a location in /opt, such as /opt/myscagent.

    - Create a Resource Type Registration (RTR) file for your agent. Call it "myscagent.rtr". It would look like the following (cut and paste into your favourite editor).

    RESOURCE_TYPE = "myscagent";
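The single RESOURCE_TYPE line shown above does not by itself say which scripts to run. As a rough sketch, a fuller myscagent.rtr could be written out from the shell as below. The RT_BASEDIR, START and STOP entries are assumptions pieced together from the script name and install location used in the steps above, not a copy of the original file, so check them against the rt_reg(4) man page before registering the type.

```shell
#!/bin/sh
# Sketch only: writes a candidate RTR file for the "myscagent" type.
# RT_BASEDIR, START and STOP are ASSUMED entries; verify them with rt_reg(4).
RTR_FILE="${RTR_FILE:-./myscagent.rtr}"

cat > "$RTR_FILE" <<'EOF'
RESOURCE_TYPE = "myscagent";
RT_BASEDIR=/opt/myscagent;
START=startmyscagent;
STOP=/bin/true;
EOF
```

The resulting file would then be copied to /opt/myscagent on every cluster node, next to the start script, before the resource type is registered.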

    This is it! Your "Agent development" is done! Now you can use SunCluster admin commands to register this resource type and create resources of this type. For example:

    - scrgadm -at myscagent -f /opt/myscagent/myscagent.rtr
    - scrgadm -ag RG1
    - scswitch -og RG1
    - scrgadm -aj myresource1 -g RG1 -t myscagent
    - scswitch -ej myresource1

    Now the start script you implemented will be called every time the RG starts. Sometimes it is desirable to have the start script of myscagent called *just before* another existing resource on the cluster is started. For example, to make sure your start script is always called BEFORE a resource named "resource2" is started:

    - scrgadm -c -j resource2 -y Resource_dependencies=myresource1

    To be sure, this approach is very primitive. It does not differentiate between different resources of the same resource type: the action taken is exactly the same, regardless of how many different resources you create of type "myscagent". Sure, you can create another agent called "myotherscagent" or whatever, but if the actions taken by the two RTs are very similar, that becomes cumbersome to maintain. Additionally, there is no notion of resource properties or resource monitoring, nor does it show you how to implement more advanced methods like VALIDATE and MONITOR_CHECK.

    For simple tasks, this approach gets the job done though, and in the process you get a first-hand understanding of how the SunCluster Resource Group Manager (RGM) calls methods on resource type implementations.

    For a deep dive into this, check out the Sun Cluster Agent Developer Guide on the Sun Docs site.

    Ashutosh Tripathi
    Senior Software Engineer, SunCluster Engineering


    The limitation about not being able to do different things when different resources are created from the myscagent resource type can be easily overcome. Let's say we have resource groups RG1 and RG2 which, when started, should send email to E1 and E2 respectively. The RGM passes the RT/RG/R names to the start script as command-line arguments, so these can be used to branch to different actions in the start method for the dummy agents created in the respective resource groups.
    Similarly, for different actions for resources in the same resource group, we can branch on the name of the myscagent resource.
    For sophisticated/fine-grained communication, the events framework can be explored!

    Posted by Suraj Verma on October 19, 2006 at 05:28 AM PDT #
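The branching described in the comment above could look roughly like the following start script. It is a sketch that assumes the RGM invokes the method with -R <resource> -T <type> -G <group> arguments; the e-mail addresses and the function names (pick_recipient, parse_args, main) are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of a START method that branches on the resource group name.
# ASSUMPTION: the RGM calls this as: -R <resource> -T <type> -G <group>.
# The addresses and helper names below are hypothetical.

pick_recipient() {
    # Map a resource group name to a mail recipient.
    case "$1" in
        RG1) echo "e1@example.com" ;;
        RG2) echo "e2@example.com" ;;
        *)   echo "root" ;;
    esac
}

parse_args() {
    # Collect the resource, type and group names into shell variables.
    RESOURCE=
    RTYPE=
    RGROUP=
    OPTIND=1
    while getopts 'R:T:G:' opt "$@"; do
        case "$opt" in
            R) RESOURCE=$OPTARG ;;
            T) RTYPE=$OPTARG ;;
            G) RGROUP=$OPTARG ;;
            *) return 1 ;;
        esac
    done
}

main() {
    parse_args "$@" || return 1
    # Without a group name there is nothing to branch on.
    [ -n "$RGROUP" ] || return 0
    echo "Resource $RESOURCE ($RTYPE) started in $RGROUP on $(uname -n)" |
        mailx -s "Sun Cluster: $RGROUP started" "$(pick_recipient "$RGROUP")"
    # Report success to the RGM even if the mail could not be sent.
    return 0
}

main "$@"
```

Branching on $RESOURCE instead of $RGROUP in pick_recipient would cover the second case mentioned, where several myscagent resources live in the same resource group.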

    Thanks for making this process accessible. Did you mean to type "logger" in your example script?

    Posted by Todd Vander Does on September 27, 2007 at 03:36 AM PDT #

    Hi Todd,

    Hope you found the blog useful. And yes, I did mean to type "logger" instead of syslog. I have corrected it; thanks for catching it.


    Posted by ashu on October 02, 2007 at 07:31 AM PDT #

    Hi All,

    May I know if this suggested method works on Solaris Cluster 3.2? Also, do you know where to get more info on events framework mentioned in Suraj Verma's comment?

    Thanks in advance.


    Posted by Paul Liong on January 13, 2008 at 04:50 PM PST #

    Hi Paul,

    Yes, this method works on SC3.2.

    The event framework Suraj mentioned is focused on communicating events happening on the cluster (such as failovers and membership changes) to an external client which may be interested in those events. The framework is called the Cluster Reconfiguration Notification Protocol (CRNP). You can read about it in the Data Services Developer's Guide.

    If the kind of thing you want to do is not really about communicating with external machines, but rather about doing more complex failover actions on the cluster itself, the scdsbuilder GUI mentioned in the blog might be a better place to start. It can generate code for you, which you can modify to your liking to achieve specific and complex tasks.


    Posted by ashu on January 14, 2008 at 06:00 AM PST #

    Hi Ashu,

    Thanks for your quick reply. In effect, we hope to kick off a script to start up an application once the SC 3.2 Oracle Resource is started. So, is there any other way to achieve this purpose rather than create a new agent?

    Thanks in advance.


    Posted by Paul Liong on January 14, 2008 at 09:54 AM PST #

    Hi Ashu,

    Sorry, let me clarify my earlier question. The script is to kick off an application running on another Sun box. So, is there any simpler way to perform this action?

    Thanks & Regards

    Posted by Paul Liong on January 14, 2008 at 05:18 PM PST #

    Hi Paul,

    Is there a reason the other Sun Box can't be part of the same cluster where Oracle is? If it were, what you wanna do can be just a simple "dependency" between the Oracle resource and the other application resource.

    My blog entry here was really meant to address "simple actions", not so much to manage a new application, for which we do recommend writing an Agent.

    However, assuming that you want something quick and dirty: how about simply doing an ssh to the remote box and starting up the application? This "script" which does an ssh can then be plugged into the SC framework exactly as I described above. You would put a Resource_dependency on the Oracle resource.

    Be careful with the security issues of doing this though. Please research various ways of doing ssh in an automated way to minimize security exposure.

    Does the above help?

    Posted by ashu on January 15, 2008 at 04:48 AM PST #
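The quick and dirty ssh idea from the comment above might be sketched as follows. Everything named here is a hypothetical placeholder: the remote host and user, the remote command, the key path and the start_remote function. BatchMode=yes is used so that ssh fails instead of prompting for a password when it is run unattended.

```shell
#!/bin/sh
# Sketch of a START method that launches an application on a remote box.
# REMOTE_HOST, REMOTE_USER, REMOTE_CMD and SSH_KEY are hypothetical values.
SSH="${SSH:-ssh}"
REMOTE_HOST="${REMOTE_HOST:-apphost}"
REMOTE_USER="${REMOTE_USER:-appuser}"
REMOTE_CMD="${REMOTE_CMD:-/opt/app/bin/start}"
SSH_KEY="${SSH_KEY:-/opt/myscagent/id_rsa}"

start_remote() {
    # The exit status of ssh becomes the status of the start operation,
    # so a failed remote start is reported to the framework as a failure.
    "$SSH" -o BatchMode=yes -i "$SSH_KEY" \
        "$REMOTE_USER@$REMOTE_HOST" "$REMOTE_CMD"
}

# The framework passes arguments to its methods; with none, the script
# only defines the function, which makes it safe to try out by hand.
if [ "$#" -gt 0 ]; then
    start_remote
fi
```

A dedicated key that is restricted on the remote side to this one command is one way to limit the security exposure mentioned in the comment.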

    Hi Ashu,

    Thanks. It really helps.


    Posted by Paul Liong on January 15, 2008 at 11:45 PM PST #

    Hi Ashu,

    Is there any way to set the "start timeout value" for the suggested RTR:

    RESOURCE_TYPE = "myscagent";

    In other words, is it OK to modify it as follows:

    RESOURCE_TYPE = "myscagent";
    { Property = Start_timeout; MIN=5; DEFAULT=45; }

    Thanks & Regards

    Posted by Paul Liong on January 24, 2008 at 04:49 PM PST #

    Hi Paul,

    Yes, you got it; that is the right syntax. The default Start_timeout is 1 hour (3600 seconds), which might be too long, so what you are doing is very reasonable. However, I must point out that there WOULD be scenarios where you hit that Start_timeout, so you should think about what happens next. At a minimum, consider this: if what your Start script is doing is interrupted in the middle (with a TERM signal followed by a KILL signal), would it leave your system/application in an "indeterminate" state?

    If so, you might wanna make your STOP implementation a bit more robust by having it do "cleanup" activities so that after the STOP method is run, your system is in a well defined state. If you encounter any errors in the STOP method while doing cleanup, you should exit non-zero so that the resource can be put into STOP_FAILED state and the administrator can then set things right.

    Note that the STOP method should not fail if there is nothing for it to do (i.e., the system is already in a clean state). This is sometimes known as the "idempotency of the STOP method" rule.


    Posted by ashu on February 01, 2008 at 05:33 AM PST #
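The idempotency rule from the comment above can be illustrated with a small STOP method sketch. The marker file STATE_FILE and the stop_cleanup function are hypothetical; the sketch assumes the START method creates that file while it works, so a leftover file means an interrupted start that needs cleaning up.

```shell
#!/bin/sh
# Sketch of an idempotent STOP method.
# STATE_FILE is a hypothetical marker the START method is assumed to create.
STATE_FILE="${STATE_FILE:-/var/run/myscagent.state}"

stop_cleanup() {
    # Nothing left behind: the system is already clean, so report success.
    [ -e "$STATE_FILE" ] || return 0
    # Cleanup failed: return non-zero so the framework can flag the
    # resource (STOP_FAILED) and the administrator can step in.
    rm -f "$STATE_FILE" || return 1
    return 0
}

stop_cleanup
```

Running it twice in a row succeeds both times, which is the behaviour the rule asks for.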

    Hi Ashu,

    Thanks for your further reply and advice. By the way, what other default properties come with a customer-defined agent?

    Thanks & Regards

    Posted by Paul Liong on February 03, 2008 at 03:32 PM PST #

    Hi Paul,

    Customer defined Agents inherit all the system properties. Do "man r_properties" to read more about them. Many of those properties deal with advanced options having to do with fault monitoring etc. However there are some which would be useful even for the simplest of Agents. Check out "Failover_mode" in particular.

    Additionally there are resource group properties (do "man rg_properties") which might be interesting. "Pingpong_interval" is interesting in particular.


    Posted by ashu on February 04, 2008 at 03:15 AM PST #
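As a sketch of how the two properties named in the comment above could be changed on the resource and resource group created earlier, the following uses the same scrgadm -c -y form as the Resource_dependencies example in the blog. The values SOFT and 600 are purely illustrative, and the tune_agent function is a hypothetical wrapper; consult r_properties and rg_properties for what the settings mean before using them.

```shell
#!/bin/sh
# Sketch: change one resource property and one resource group property.
# SOFT and 600 are illustrative values only, not recommendations.
SCRGADM="${SCRGADM:-scrgadm}"

tune_agent() {
    resource=$1
    group=$2
    # Resource-level property (see r_properties).
    "$SCRGADM" -c -j "$resource" -y Failover_mode=SOFT || return 1
    # Resource-group-level property (see rg_properties).
    "$SCRGADM" -c -g "$group" -y Pingpong_interval=600
}

# Example invocation: ./tune.sh myresource1 RG1
if [ "$#" -eq 2 ]; then
    tune_agent "$1" "$2"
fi
```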

    Hi Ashu,

    Noted & thanks for your further update.


    Posted by Paul Liong on February 04, 2008 at 04:02 PM PST #


    Oracle Solaris Cluster Engineering Blog

