Monday Mar 16, 2009

Using a Sailfin Cluster (Load-balancing with a Simple INVITE Application)

This blog covers the following items:

  1. Deploy a simple INVITE application on a SailFin cluster (application provided).

  2. Create the SIPp INVITE scenario file (ready scenario file provided).

  3. Run the SIPp scenario.

  4. Verify that the requests are load-balanced.

Before You Start
a) Install SIPp on your machine. Get it from http://sipp.sourceforge.net/ .
b) Create the SailFin cluster as described in Quick Start with SailFin Clustering. Once the CLB-enabled cluster is created, follow the steps below to try it out.


Deploy a Simple INVITE Application on the Cluster
From <sailfin-install-location>/bin, run asadmin deploy --target <cluster-name> <path-to-deployable-sar-file>.
(The application source is available here. The deployable application SAR file is available here.)
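For example, assuming a cluster named sip-cluster and the SAR file saved to /tmp (both are placeholder names for illustration, not values from this blog):

cd <sailfin-install-location>/bin
# cluster name and SAR path below are placeholders; substitute your own
./asadmin deploy --target sip-cluster /tmp/SimpleInviteApp.sar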


Create the SIPp INVITE scenario file
* Get the invite.xml file available here,
or
* Get the default uac scenario XML from SIPp (by running "sipp -sd uac > invite.xml") and modify it for use with SailFin.
Modifications required (shown in the sketch below) are:
- Replace the first <recv response="200" rtd="true"> with <recv response="200" rtd="true" rrs="true">.
- Replace "ACK sip:[service]@[remote_ip]:[remote_port] SIP/2.0" with "ACK [next_url] SIP/2.0".
- Replace "BYE sip:[service]@[remote_ip]:[remote_port] SIP/2.0" with "BYE [next_url] SIP/2.0".


Now run SIPp as follows:
sipp -sf invite.xml -m <number-of-calls-to-create> -p <sipp-client-port-of-your-choice> <host-name-of-any-one-instance>:<sip-port-of-that-instance>

For example:
Use "sipp -sf invite.xml -m 10 -p 7000 sailfin-cluster-machine.india.sun.com:35060" to send 10 requests to an instance that is running on machine sailfin-cluster-machine.india.sun.com and whose sip port is 35060.
Now, to find the SIP port of an instance open the domain.xml at <sailfin-install-location>/domains/<your-domain-name>/config/domain.xml and search for the instance name. Look for the SIP_PORT system property information for that instance. If you don't find it then the default 35060 port is being used.
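For illustration, the entry in domain.xml typically looks something like this (the instance name, references and port value below are placeholders, not values from your domain):

<!-- placeholder names and port value; your domain.xml will differ -->
<server name="sip-instance-1" node-agent-ref="my-node-agent" config-ref="my-cluster-config">
  ...
  <system-property name="SIP_PORT" value="35061"/>
</server>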


Checking that requests are being Load Balanced

From Admin UI:
a) Open the Admin UI in a browser at <hostname-of-machine-with-domain>:4848. Enter the default username and password (admin/adminadmin).
b) Click the arrow next to “Clusters” in the left pane. Click the arrow next to “<Your-Cluster>”. Click one of the instances.
c) Click the “Monitor” tab, then the “Call Flow” sub-tab. Check the “Enabled” check box to enable Call Flow monitoring and click the “Save” button. Repeat this for each instance in your cluster.
d) Run the above SIPp command again on the command line.
e) Click the “Refresh” button and scroll down to the Call Flow Data table. Do this on the Call Flow screen of each instance to confirm that each instance has served some requests.

From Command Line:
a) Enable SIP module logging by running "./asadmin set <cluster-name>-config.log-service.module-log-levels.property.sip=FINE" .
b) Run the above SIPp command again.
c) Observe the server logs for all the instances.
These logs are available at <sailfin-install-location>/nodeagents/<agent-name>/<instance-name>/logs/server.log. Search for the text "The first line" to see the SIP messages being processed; you will find that requests are processed by all of the instances (a quick grep for this is sketched below).
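For example, assuming all instances run under the same node agent (the paths below are placeholders; adjust them for your install):

cd <sailfin-install-location>/nodeagents/<agent-name>
# one count per instance log; every instance should show a non-zero count
grep -c "The first line" */logs/server.log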

Tuesday Feb 17, 2009

Notes on Sailfin Cluster Failure Management and GMS


Here are some short notes on how a SailFin cluster deals with instance failures. They are useful for troubleshooting and for debugging failure scenarios. If you are unfamiliar with SailFin clustering, please read Quick Start with Sailfin Clustering first.

SailFin relies on the Group Management Service (GMS) for its failure management, which includes detecting an instance's failure and raising the appropriate notification. Below is a list of some types of instance failure that GMS helps detect:
  1. Software Failure:
    a. Node Agent and instance process dying
    b. Instance process alone dying [Transient Failure]

  2. Hardware Failure:
    a. Network Failure [cable snap at the machine's end or at the router's end]
    b. Power Failure [of the machine hosting a sailfin instance]


Notes on how GMS works:
  1. Each instance in a SailFin cluster has a GMS service running in it, which starts when the instance starts. The GMS services running on all the instances of a cluster form a logical GMS group.

  2. Using the GMS service each member of the group is able to send and receive signals. Using a heartbeat mechanism the GMS services are able to detect states such as addition, failure or recovery of a group member.

  3. These states are registered as events and are logged in the instance's server.log file under <sailfin-installation>/nodeagents/<agent-name>/<instance-name>/logs/server.log. For example, if an instance is shut down using the "asadmin stop-instance <instance-name>" command, all other instances that are part of the group detect this shutdown, and you will see a PEER_STOP_EVENT registered in the server log files of those instances.

Below is a list of some important GMS events along with their significance:
  1. PEER_STOP_EVENT: Indicates a planned shutdown of an instance (using the asadmin stop-instance command).
  2. ADD_EVENT: Indicates that an instance has been started (using the asadmin start-instance command) and that its GMS service has joined the logical GMS group.
  3. JOINED_AND_READY_EVENT: Indicates that startup of an instance is complete.
  4. IN_DOUBT_EVENT: Indicates that GMS suspects that an instance has failed. (Try this by killing an instance and its associated node-agent's process and notice the messages in the logs of other instances)
  5. FAILURE_EVENT: Indicates confirmation of failure of an instance by GMS.
These log messages also indicate the instance associated with the event. This information is quite handy when debugging failure-based scenarios; a quick way to pull these events out of the logs is sketched below.
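For example, assuming all instances run under the same node agent (the paths below are placeholders; adjust them for your install):

cd <sailfin-install-location>/nodeagents/<agent-name>
# list the GMS lifecycle events recorded by each instance, prefixed with the log file they came from
grep -E "PEER_STOP_EVENT|ADD_EVENT|JOINED_AND_READY_EVENT|IN_DOUBT_EVENT|FAILURE_EVENT" */logs/server.log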

Node Agent as a Watchdog:
One other failure-detection mechanism is a non-GMS one: the node agent acts as a watchdog for the instance. It detects instance process failure and attempts a restart of the instance. This is the transient failure listed as item 1(b) above; a simple way to observe it is sketched below.
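For example (the instance name, PID and node-agent log path below are placeholders; the exact java command line and log location depend on your setup), kill only the instance's JVM process, leaving the node agent running, and watch the node agent restart it:

# find the PID of the instance's JVM (its command line contains the instance name)
ps -ef | grep <instance-name> | grep java
# kill only the instance process; the node agent keeps running
kill -9 <instance-pid>
# watch the node-agent log (location is an assumption; adjust for your install) while it restarts the instance
tail -f <sailfin-install-location>/nodeagents/<agent-name>/agent/logs/server.log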

CLB (Converged Load Balancer) as a Listener of GMS Events:
The CLB is a listener of GMS events and adjusts its behavior according to each event's significance. For example, the CLB considers an instance available to serve requests until it receives a FAILURE_EVENT for that instance from GMS. On receiving a FAILURE_EVENT, the CLB stops forwarding requests to the failed instance. The instance is added back to the CLB's list of available instances only after the CLB receives a JOINED_AND_READY_EVENT for that instance.

GMS failure detection and notification times can vary depending on the hardware used and on certain configurable GMS and SailFin settings. For information on this and on other functionality provided by GMS, please read the documentation available at shoal.dev.java.net and swik.net/GlassFish+Shoal.


