Monday Feb 28, 2011

Shoal 1.5.29 released - GlassFish 3.1's runtime dynamic clustering service

As part of today's release of GlassFish 3.1, which brings clustering, centralized administration, high availability, improved automated delegated transaction recovery, and a host of other features on top of the Java EE 6 platform, I am delighted to announce the release of Shoal 1.5.29, the latest version of the Runtime Dynamic Clustering Framework library and the underlying runtime clustering engine for GlassFish 3.1.

Shoal's Group Management Service (GMS) is employed by many GlassFish modules for group communications and group lifecycle event notifications. GlassFish modules that use GMS include the HA (in-memory replication) module, the transaction service module, the IIOP Failover Loadbalancer, and the EJB Timer Service (for timer migrations).

The In-memory Replication module (also part of Project Shoal) is a caching backing store module built on top of Shoal GMS. The replication module is used in GlassFish for HTTPSession, EJB Stateful Session Bean, and Single Sign-On state replication, and is also employed by Project Metro Web Services to make Reliable Messaging and Secure Conversations highly available in the GlassFish 3.1 release.

This Shoal GMS release incorporates one major change: the default transport provider for Shoal has changed from JXTA to Grizzly; specifically, this release uses Grizzly version 1.9.28 as the transport provider. Grizzly's NIO-based transport gives us better performance: in our internal benchmarks, GlassFish 3.1 HA is about 34% better than GlassFish 2.x, partly due to the move to Grizzly as the transport under GMS, with the HA module using GMS messaging APIs for its replication logic. Moreover, the Grizzly developers are co-located with, and part of, the GlassFish team, allowing for faster support within the team. The JXTA 2.5 transport is still available via a source code build; however, since it is not tested as part of the extensive GlassFish 3.1 HA and GMS testing, it is not included in the pre-built shoal-gms jar.

Additional features in this release include a new Master Change Event notification, sent when group mastership moves to another member because the existing group master failed or was shut down administratively. Another new feature is the REJOIN sub-event, delivered as part of the JoinNotificationSignal and JoinedAndReadyNotificationSignal. It covers the case where a member fails and restarts before GMS's failure detection algorithm has confirmed the failure; in such cases a failure notification for an already-restarted member would be confusing, so a REJOIN sub-event is sent as part of the member's JoinNotificationSignal and JoinedAndReadyNotificationSignal instead.
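In application code, a GMS client reacts to these signals through a registered callback. The sketch below illustrates the dispatch pattern only; the interfaces are simplified, hypothetical stand-ins for the real Shoal signal types, not the actual Shoal API.

```java
// Illustrative sketch of reacting to JOIN notifications that carry a REJOIN
// sub-event. JoinSignal is a simplified, hypothetical stand-in for the Shoal
// GMS signal types, not the actual Shoal API.
public class RejoinAwareCallback {

    /** Minimal stand-in for a GMS join notification signal. */
    public interface JoinSignal {
        String memberToken();
        boolean hasRejoinSubevent();  // true if the member failed and restarted quickly
    }

    /** Decide how to treat an incoming join notification. */
    public static String classify(JoinSignal signal) {
        if (signal.hasRejoinSubevent()) {
            // The member restarted before failure detection confirmed the
            // failure: treat as a failure plus restart, e.g. reconcile state.
            return "REJOIN:" + signal.memberToken();
        }
        return "JOIN:" + signal.memberToken();
    }

    public static void main(String[] args) {
        JoinSignal fresh = new JoinSignal() {
            public String memberToken() { return "inst1"; }
            public boolean hasRejoinSubevent() { return false; }
        };
        JoinSignal restarted = new JoinSignal() {
            public String memberToken() { return "inst2"; }
            public boolean hasRejoinSubevent() { return true; }
        };
        System.out.println(classify(fresh));      // JOIN:inst1
        System.out.println(classify(restarted));  // REJOIN:inst2
    }
}
```

The point of the sub-event is that a consumer can branch on it rather than treating a quick restart as a plain join.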

We look forward to the community's continued feedback on the Shoal modules. Please download Shoal as a library for use in your projects and send us your valuable feedback and RFEs for improvements. We welcome your feedback at users AT

You can download the Shoal 1.5.29 library from here:

Thursday Oct 21, 2010

How To Configure & Test High Availability with GlassFish Server 3.1 Using A Single Machine

Update 02/28/2011 : GlassFish 3.1 shipped today with clustering, centralized administration, high availability and a host of new features providing these features on the Java EE 6 platform.  Get the download here.

The Shoal runtime clustering (Shoal GMS) and HA (Shoal Cache) team has been busy working on bringing GMS and In-memory replication capabilities to GlassFish Server 3.1 which is the release-in-development at the time of writing this entry. 

In this entry, I will describe the steps to configure and test High Availability with GlassFish Server 3.1. I trust this will help the community and customers run their own Java EE 5 or Java EE 6 apps with HA and give us feedback.

Single Machine Step-by-Step instructions to setup cluster and HA 

So here are the steps to do this on a single machine:

  • Download the GlassFish 3.1 final build from here. Pick either the full Java EE distro or the web distro, and use either the zip distribution or the executable installer. As a convenience, the latest promoted build is also aliased as latest-glassfish-<platform>.(sh/exe) for the full Java EE distro and latest-web-<platform>.(sh/exe) for the web distro.
    • Change to the <install_dir>/glassfish/bin directory.
  • Ensure you have multicast enabled on your network so that Shoal GMS and Cache can work in this environment. Run this in two terminals:
    • ./asadmin validate-multicast 
    • This should show messages being sent and received between the two terminals, confirming that basic multicast support exists on your network.
    • Your messages would look like this in one of the terminals:
      • Will use port 2,048
      • Will use address
      • Will use bind interface null
      • Will use wait period 2,000 (in milliseconds)
      • Listening for data...
      • Sending message with content "" every 2,000 milliseconds
      • Received data from (loopback)
      • Received data from
      • Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
      • Command validate-multicast executed successfully.
  • Run the command to start the domain : 
    • ./asadmin start-domain 
  • Next create a cluster using the command line interface : 
    • ./asadmin create-cluster <cluster-name>  
    • In the above step, the multicast address and port that Shoal GMS/Cache will use are auto-generated for you. If you want to set a specific multicast address and port of your choice for Shoal GMS to use, then do this:
    • ./asadmin create-cluster --multicastaddress 229.x.x.x --multicastport yyyyy <cluster-name>
      • where each x is an integer between 0 and 254, and yyyyy is an available port of your choice above 1024 (if you are not a super user).
  • Next, create two or more instances belonging to this cluster:
    • ./asadmin create-local-instance --cluster <clustername> <instancename>

      Note down the HTTP Port of each instance as you create them - you will need it when testing out failover. 

  • Next, start the cluster : ./asadmin start-cluster <clustername>
  • Check that the cluster started fine: ./asadmin get-health <clustername>. The get-health command reports data based on GMS's auto-discovery of the instances in the cluster as the cluster started up. You should see output similar to the following:
    • inst1 started since Thu Oct 21 14:45:10 PDT 2010
    • inst2 started since Thu Oct 21 14:45:19 PDT 2010
  • You can also use ./asadmin list-instances command to see if the clustered instances are running. 
  • Now you are ready to deploy an application and try out HA using the port-hopping technique to test failover without an LB. 
    • Note that you can do port hopping only when both instances are on a single physical machine or on the same virtual machine instance. If you go beyond a single machine, you will need to front the cluster with an LB capable of sticky sessions and round robin.
  • Before you try your own app, deploy the ClusterJSP application: this GlassFish-supplied ear file is tested to establish a baseline confirming that basic HA functionality is working. The clusterjsp file is here (it will be part of the samples soon):
    • ./asadmin deploy --availabilityenabled=true --target <clustername> <path-to>/clusterjsp.ear
    • The availabilityenabled flag is the only requirement in the deploy command to HA-enable your application. Beyond that, a web application does need the <distributable/> element added to the web.xml packaged with the application. This tells the web container that the application is ready to be used in a distributed system such as a cluster.
  • Access the first instance's URL on your favorite browser : 
    • http://<host>:<first instance port>/clusterjsp
  • The clusterjsp browser window should look like the following :

Note in the image above the "Served from Server Instance : inst1" meta information, which tells you that this page was served from the first instance, "inst1" being the first instance's name.

Also note that under the section "Data retrieved from the HttpSession:" there is an entry stating jreplicaLocation=inst2. This is an HttpSession cookie sent back by the ShoalCache layer to the web container, which forwards it to the browser so that an LB can potentially use it: this session's replica instance is inst2. An LB capable of handling such information, such as the latest upcoming version of the GlassFish LB Plugin that works with the Oracle iPlanet Web Server, can fail over to the exact replica instance when a primary fails, thereby saving broadcast-type network calls in the replication layer to find out which instance has the session to be resumed on the failover instance. This is particularly useful in larger clusters.
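For reference, the <distributable/> element mentioned in the deploy step above is a single empty element in the application's web.xml. A minimal Servlet 3.0 deployment descriptor carrying it might look like this (names and namespaces as in the Java EE 6 schema):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
    <!-- Marks the web application as safe to run in a cluster;
         session attributes must then be Serializable. -->
    <distributable/>
</web-app>
```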

  • Add some session data as a key and a value, e.g. Name of Session Attribute: John and Value: Doe.
The above page shows that the first instance has saved the session data to the second instance and holds it in the first instance's web container active cache.
  • Now to simulate failover, you can port hop to the next instance or any random instance in your cluster, say second instance 
    • http://<host>:<second instance port>/clusterjsp
    • Before doing the above, you can optionally stop the instance that served the first request, i.e. the first instance, using
      • the ./asadmin stop-instance <instance name> command, or
      • find the process id of the first instance using jps -mlvV | grep <instancename> and terminate the process with kill -9 <pid>
      • Run the ./asadmin get-health command again to see the status of the cluster. You should see output similar to the following if you killed the instance:
      • inst1 failed since Thu Oct 21 15:17:47 PDT 2010
      • inst2 started since Thu Oct 21 14:45:19 PDT 2010
    • On the second instance's page you will see that the session data written on the first instance was saved in the cluster and retrieved when the page loaded on the second instance; the session was resumed on the second instance. Your page should look similar to the following:

Note above that the second instance served the page, and the session data written by the first instance was retrieved from the replica cache by the replication module on the second instance. Also note that the second instance lists the first instance as its replica in this two-instance cluster, even though we know that instance no longer exists, as it was killed or stopped.

  • At this point, any session data written from this page on the second instance would not be highly available in a two-instance cluster, as the first instance is no longer around.
  • Go to the terminal window and restart the first instance: ./asadmin start-instance <firstinstancename>
  • Go back to the browser that has the page served from the second instance and add some session data, e.g. Name of Session Attribute: Jane, Value of Session Attribute: Doe. Your page should look like the following:

Note that the session parameters Jane = Doe have been added to the session; the session should now be highly available, as you restarted the first instance and then wrote the session parameters on the second instance.

  • At this time, simulate a second failover by port-hopping to the first instance: http://<host>:<first instance port>/clusterjsp/HaJsp.jsp
  • Your page should look like the following : 

As you can see above, inst1 retrieved all the session parameters, demonstrating the high availability of sessions in the two-instance cluster.

If all goes well as above, you now have a baseline with which to compare your experience when you deploy your own application to try out GlassFish 3.1 High Availability for sessions.

You can also try a cluster with 3 or more instances to see High Availability in action. You will see that for each new session the replication module chooses a different replica instance; this is a change from the buddy-replication mechanism of the GlassFish v2.x High Availability feature. We will have a more detailed blog entry on this new approach later.
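The per-session replica choice can be pictured as a simple rotation over the other cluster members. The sketch below is a toy illustration of that idea under an assumed per-session counter; it is not GlassFish's actual selection code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration of choosing a different replica instance for each new
// session by rotating over the other cluster members. This is NOT the actual
// GlassFish 3.1 selection algorithm, just a sketch of the idea.
public class ReplicaChooser {

    /** Pick a replica for the next session, skipping the local instance. */
    public static String chooseReplica(List<String> members, String self, long sessionCounter) {
        List<String> candidates = members.stream()
                .filter(m -> !m.equals(self))
                .sorted()
                .collect(Collectors.toList());
        if (candidates.isEmpty()) {
            return null;  // single-instance "cluster": nothing to replicate to
        }
        return candidates.get((int) (sessionCounter % candidates.size()));
    }

    public static void main(String[] args) {
        List<String> cluster = Arrays.asList("inst1", "inst2", "inst3");
        for (long counter = 0; counter < 4; counter++) {
            System.out.println("session " + counter + " -> "
                    + chooseReplica(cluster, "inst1", counter));
        }
        // Successive sessions on inst1 replicate to inst2, inst3, inst2, inst3.
    }
}
```

Contrast this with v2.x buddy replication, where every session on inst1 would have been replicated to the same fixed buddy instance.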

With the above instructions you should now be able to deploy your own application. If you see issues with it but not with clusterjsp, that gives you a reason to investigate what is different about your application's behavior that could contribute to the issue. Most often the issue is a non-Serializable object in your session, which worked fine when deployed to a non-clustered single instance because there was no need to ship session objects to another replica instance. Once a non-Serializable object is involved in a distributed setup such as a cluster, issues start to show up, so look out for those situations. Start by scouring the server log for these indications.
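One way to catch such problems before deploying to a cluster is to verify that everything you put into the session actually serializes. The self-contained check below uses plain JDK serialization, which imposes the same requirement that session replication does; the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Quick check that an object graph destined for the HttpSession can actually
// be serialized -- the same requirement session replication imposes.
public class SessionAttributeCheck {

    /** Returns true if the whole object graph serializes without error. */
    public static boolean isSerializable(Object attribute) {
        if (!(attribute instanceof Serializable)) {
            return false;
        }
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(attribute);  // walks the entire graph
            return true;
        } catch (Exception e) {
            return false;  // e.g. NotSerializableException from a nested field
        }
    }

    // A typical trap: a Serializable holder with a non-Serializable field.
    static class Holder implements Serializable {
        Thread worker = new Thread();  // Thread is not Serializable
    }

    public static void main(String[] args) {
        System.out.println(isSerializable("Jane Doe"));   // true
        System.out.println(isSerializable(new Holder())); // false
    }
}
```

Note that the second case would pass an `instanceof Serializable` check yet still fail at replication time, which is exactly the kind of failure the server log would surface as a NotSerializableException.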

If you do see issues that you believe belong in GlassFish HA component, please send us feedback on the user list : users at glassfish dot dev dot java dot net 

You can also file issues at the GlassFish issue tracker  here.  GlassFish HA issues are filed under the "failover" subcomponent.


Shreedhar Ganapathy

