State of the Cluster with the get-health Command

GlassFish 3.1 uses the Group Management Service (GMS), part of Project Shoal, to provide dynamic membership information about a cluster, including the state of its instances and of the Domain Administration Server (DAS). The asadmin subcommand "get-health" gives you a snapshot of this state. For example, here is my cluster before it has been started:

  hostname% ./asadmin get-health mycluster
  instance1 not started
  instance2 not started
  instance3 not started
  Command get-health executed successfully.

...and after the cluster has started:

  hostname% ./asadmin get-health mycluster
  instance1 started since Fri Feb 25 16:27:16 EST 2011
  instance2 started since Fri Feb 25 16:27:15 EST 2011
  instance3 started since Fri Feb 25 16:27:15 EST 2011
  Command get-health executed successfully.
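If you want to consume this output from a script rather than read it, the line format is simple enough to parse. Here is a minimal Python sketch, not part of GlassFish, that turns get-health output into a map from instance name to state. It assumes the output looks like the examples in this post (one instance per line, an optional "since" timestamp, and a trailing "Command ... executed successfully." line). The function name parse_get_health is my own.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as "instance1 started since Fri Feb 25 16:27:16 EST 2011"
# or "instance1 not started". The state names are only the ones shown in this
# post; other GlassFish versions may print additional states.
_LINE = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?P<state>not started|started|stopped|failed|rejoined)"
    r"(?:\s+since\s+(?P<since>.+))?$"
)


def parse_get_health(output: str) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each instance name to (state, since-timestamp or None)."""
    result: Dict[str, Tuple[str, Optional[str]]] = {}
    for raw in output.splitlines():
        match = _LINE.match(raw.strip())
        # Non-matching lines, such as the final "Command get-health executed
        # successfully." line or blank lines, are ignored.
        if match:
            result[match.group("name")] = (match.group("state"), match.group("since"))
    return result
```

The timestamp is deliberately kept as a string: zone abbreviations such as EST are ambiguous, so converting it to a datetime is left to the caller.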

Besides this basic information, the GMS-based system can tell you about instances that have been shut down or that have failed (see instance1 and instance2 in the next example). Because this information comes from the GMS group members themselves (the instances and the DAS), the state reported by a running instance remains correct even after the DAS is restarted. In the following output, notice the "stopped" and "failed" states. After the DAS restarts, the startup time for instance3 is still valid. Instances 1 and 2, however, are now shown as "not started": because they are down, they cannot tell the restarted DAS when they stopped or failed.

  hostname% ./asadmin get-health mycluster
  instance1 stopped since Fri Feb 25 16:57:56 EST 2011
  instance2 failed since Fri Feb 25 16:58:26 EST 2011
  instance3 started since Fri Feb 25 16:27:15 EST 2011
  Command get-health executed successfully.

  hostname% ./asadmin restart-domain
  Successfully restarted the domain
  Command restart-domain executed successfully.

  hostname% ./asadmin get-health mycluster
  instance1 not started
  instance2 not started
  instance3 started since Fri Feb 25 16:27:15 EST 2011
  Command get-health executed successfully.

Note that you can see these events in the server.log as well. Here are some of the messages related to the shutdown and failure above:

  • GMS1017: Received PlannedShutdownEvent Announcement from member: instance1 with shutdown type: INSTANCE_SHUTDOWN of group: mycluster
  • GMS1007: Received FailureSuspectedEvent for member: instance2 of group: mycluster
  • GMS1019: member: instance2 of group: mycluster has failed
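To pull these events out of a large server.log, a simple scan for the message IDs is often enough. The following Python sketch is hypothetical (gms_events is not a GlassFish tool). It assumes each log record contains the message ID followed somewhere by "member: <name>", as in the messages quoted above, and it only knows about the three IDs shown here.

```python
import re
from typing import Iterable, List, Tuple

# Message IDs taken from the server.log excerpts above; this is an assumed,
# incomplete list, not the full set of GMS messages.
GMS_EVENTS = {
    "GMS1017": "planned shutdown",
    "GMS1007": "failure suspected",
    "GMS1019": "failed",
}

_CODE = re.compile(r"\b(GMS\d{4})\b")
_MEMBER = re.compile(r"member:\s*(\S+)")


def gms_events(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (member, event) pairs, in log order, for the IDs listed above."""
    events: List[Tuple[str, str]] = []
    for line in lines:
        code = _CODE.search(line)
        if not code or code.group(1) not in GMS_EVENTS:
            continue  # not one of the shutdown/failure messages we track
        member = _MEMBER.search(line)
        if member:
            events.append((member.group(1), GMS_EVENTS[code.group(1)]))
    return events
```

Typical use is gms_events(open("server.log")), which yields the shutdown and failure history per instance in the order it was logged.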

The asadmin get-health command is not new, but there is one new feature in GlassFish 3.1. If you have an instance configured as a service that is restarted automatically, it could fail and restart so quickly that GMS never gets a chance to confirm the failure. In this case, you could see a message such as:

  hostname% ./asadmin get-health mycluster
  instance1 rejoined since Fri Feb 25 17:01:14 EST 2011
  ...

In a case like this, it is important to find out what is happening on that instance. If an instance fails and stays down, the problem becomes obvious quickly. But if an instance fails and restarts often, the problem may go unnoticed unless you look through the server logs. So an instance in the "rejoined" state could be a sign that the instance in question is failing repeatedly. Here are some of the messages you would see in server.log related to the rejoin:

  • GMS1053: member: instance1 was restarted at 3:45:41 PM EST on Feb 26, 2011.
  • GMS1054: Note that there was no Failure notification sent out for this instance that was previously started at ....
  • GMS1024: Adding Join member: instance1 group: mycluster StartupState: INSTANCE_STARTUP rejoining: missed reporting past FAILURE of this instance that had joined the group at ....
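Because each unnoticed restart leaves a GMS1053 message behind, counting those messages per instance gives a rough measure of how often an instance is failing without anyone seeing it. This is a minimal Python sketch that assumes the message text matches the GMS1053 line quoted above. The function name frequent_rejoins and the threshold of 3 are my own illustrative choices, not GlassFish settings.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed format, based on the GMS1053 message quoted above:
# "GMS1053: member: instance1 was restarted at ..."
_RESTART = re.compile(r"GMS1053:\s*member:\s*(\S+)\s+was restarted")


def frequent_rejoins(lines: Iterable[str], threshold: int = 3) -> Dict[str, int]:
    """Return members whose GMS1053 restart count reaches the threshold."""
    counts = Counter()
    for line in lines:
        match = _RESTART.search(line)
        if match:
            counts[match.group(1)] += 1
    # Keep only the instances that rejoined suspiciously often.
    return {member: n for member, n in counts.items() if n >= threshold}
```

A nonempty result is a hint to look at that instance's own logs for the cause of the repeated failures.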

Whenever you create and start a cluster, it's a good idea to use the asadmin get-health command to make sure communication is working properly among the instances and the DAS. In my next blog post, I'll show how you can use the "validate-multicast" subcommand to help diagnose the problem if one or more instances are not being found by asadmin get-health.
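That post-start check is easy to automate. The Python sketch below runs asadmin get-health for a cluster and reports every instance that is not plainly "started", so "rejoined" counts as a problem for the reasons given above. The asadmin path, the exit-code convention, and the function names unhealthy and check_cluster are assumptions for illustration; adjust them for your installation.

```python
import re
import subprocess
import sys
from typing import List

# Assumed line format, matching the get-health output shown in this post.
_LINE = re.compile(r"^(\S+)\s+(not started|started|stopped|failed|rejoined)\b")


def unhealthy(output: str) -> List[str]:
    """Return "name: state" for every instance not reported as "started"."""
    problems: List[str] = []
    for raw in output.splitlines():
        match = _LINE.match(raw.strip())
        if match and match.group(2) != "started":
            problems.append(f"{match.group(1)}: {match.group(2)}")
    return problems


def check_cluster(cluster: str, asadmin: str = "./asadmin") -> int:
    """Run get-health and return an exit code (0 means all instances started)."""
    # Assumes the asadmin script exists at this path and the DAS is running.
    proc = subprocess.run([asadmin, "get-health", cluster],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        print(proc.stderr or proc.stdout, file=sys.stderr)
        return proc.returncode
    problems = unhealthy(proc.stdout)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

For example, sys.exit(check_cluster("mycluster")) at the end of a start script exits with 0 when every instance is started, and otherwise prints lines such as "instance3: not started" and exits with 1, so the script fails fast when an instance has not joined the group.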

For more information on the get-health subcommand, run asadmin get-health --help. A version of it is attached here as get-health.txt. For more information on clusters and GMS, please see the article "Clustering in GlassFish Version 3.1".
