Friday Jan 11, 2008

GlassFish Hidden Nugget: Automatic Distributed Transaction Recovery Service

GlassFish v2, v2 ur1, and later releases support transaction recovery, both manual and automated: transactions left incomplete when an instance fails can be completed either manually or automatically.

Part of the new feature set in the cluster profile is a little-known feature called Automatic Distributed Transaction Recovery, which builds on the support that Project Shoal provides for it.

Essentially, Automatic Distributed Transaction Recovery in GlassFish works as follows.

Consider the following scenario:

  • A cluster of three instances: instance1, instance2, and instance3
  • Two XA resources, X and Y, used by each GlassFish instance
  • A transaction starts on instance1
  • The Transaction Manager on instance1 asks resource X to pre-commit (prepare)
  • The Transaction Manager on instance1 asks resource Y to pre-commit (prepare)
  • The Transaction Manager on instance1 asks resource X to commit

Now instance1 crashes, before resource Y has been asked to commit.
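The state left behind by this sequence can be made concrete with a small sketch. It is an illustration only, not GlassFish code: the class, enum, and method names are hypothetical, and it assumes that recovery simply drives every pre-committed but uncommitted resource to commit, as described later in this entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from GlassFish.
final class InDoubtTransactionSketch {

    enum State { ACTIVE, PREPARED, COMMITTED }

    // Replays the scenario up to the crash: both resources pre-commit,
    // but only X is asked to commit before instance1 goes down.
    static Map<String, State> stateAtCrash() {
        Map<String, State> resources = new LinkedHashMap<>();
        resources.put("X", State.ACTIVE);
        resources.put("Y", State.ACTIVE);
        resources.put("X", State.PREPARED);
        resources.put("Y", State.PREPARED);
        resources.put("X", State.COMMITTED);
        return resources;
    }

    // Recovery completes the commit for every resource still in PREPARED
    // and returns how many resources it had to finish.
    static int completeCommit(Map<String, State> resources) {
        int completed = 0;
        for (Map.Entry<String, State> entry : resources.entrySet()) {
            if (entry.getValue() == State.PREPARED) {
                entry.setValue(State.COMMITTED);
                completed++;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        Map<String, State> resources = stateAtCrash();
        System.out.println("At crash: " + resources);
        System.out.println("Completed by recovery: " + completeCommit(resources));
        System.out.println("After recovery: " + resources);
    }
}
```

In this sketch resource Y is the one left in doubt, which is exactly the work that a recovering instance has to finish.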

The Transaction Service component in one of the surviving members (instance2 or instance3) receives a notification signal that a failure recovery operation needs to be performed for instance1. This signal from Shoal is called FailureRecoverySignal.

This notification signal is delivered to the Transaction Service component in only one selected instance. The selection algorithm runs in Shoal's GMS component and takes advantage of the identically ordered cluster view provided to it by the underlying group communication provider (the default provider is Jxta).
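To see why an identically ordered view is enough to pick a single recoverer without extra coordination, here is a minimal sketch. It is not Shoal's actual selection algorithm, which this entry does not spell out. It assumes one simple rule (choose the member that follows the failed one in the ordered view taken before the failure, wrapping around), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: every survivor evaluates the same pure function over
// the same ordered view, so all of them arrive at the same answer.
final class RecoverySelectionSketch {

    // previousView is the ordered view as it was before the failure.
    // Returns the selected recoverer, or null if none can be chosen.
    static String selectRecoverer(List<String> previousView, String failedMember) {
        int index = previousView.indexOf(failedMember);
        if (index < 0 || previousView.size() < 2) {
            return null;
        }
        return previousView.get((index + 1) % previousView.size());
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("instance1", "instance2", "instance3");
        System.out.println(selectRecoverer(view, "instance1"));
        System.out.println(selectRecoverer(view, "instance3"));
    }
}
```

Because the function depends only on the shared view and the failed member's name, instance2 and instance3 would each compute the same result independently, and only the selected one needs to act.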

The Transaction Service component in this instance, say instance2, now enters its auto-recovery block. It starts by waiting for a designated time (60 seconds by default) to allow the failed instance1 to start back up.

If instance1 is starting up, its own Transaction Service component performs self recovery to complete the transactions that had finished phase 1 (pre-commit).

In instance2, once the wait times out, the Transaction Service component checks whether instance1 is part of the group view. If it is not, instance2 tries to acquire a lock on the failed instance's transaction logs through Shoal's FailureRecoverySignal. If that succeeds (indicating that the failed instance did not start up), it acquires the transaction log and starts recovery of transactions, i.e., it completes the commit operations for the pre-committed transactions. If the lock acquisition fails, it gives up, confirms through Shoal's group view that the failed instance did start up, and logs this fact.
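The wait, check, lock, and recover sequence above can be sketched as follows. This is a simplified illustration under stated assumptions, not GlassFish or Shoal code: the lock is modeled as an in-memory map, the group view is supplied by the caller, the actual log processing is left as a placeholder, and all names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch of the delegated recovery flow.
final class DelegatedRecoverySketch {

    enum Outcome { FAILED_MEMBER_RESTARTED, LOCK_NOT_ACQUIRED, RECOVERED, INTERRUPTED }

    // Stand-in for the lock on a failed member's logs: failed member -> owner.
    private final ConcurrentMap<String, String> logLocks = new ConcurrentHashMap<>();

    // Atomic: succeeds only if nobody holds the lock for failedMember yet.
    boolean tryLock(String failedMember, String owner) {
        return logLocks.putIfAbsent(failedMember, owner) == null;
    }

    void unlock(String failedMember, String owner) {
        logLocks.remove(failedMember, owner);
    }

    Outcome recoverFor(String self, String failedMember,
                       Supplier<Set<String>> groupView, long waitMillis) {
        try {
            // Give the failed member time to restart and recover on its own.
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.INTERRUPTED;
        }
        // Read the view only after the wait so that a restart is noticed.
        if (groupView.get().contains(failedMember)) {
            return Outcome.FAILED_MEMBER_RESTARTED;
        }
        if (!tryLock(failedMember, self)) {
            return Outcome.LOCK_NOT_ACQUIRED;
        }
        try {
            // Placeholder for the real work: read the failed member's log from
            // the shared disk and complete the commit of pre-committed transactions.
            return Outcome.RECOVERED;
        } finally {
            unlock(failedMember, self);
        }
    }
}
```

With the default described above, the caller would pass 60000 for waitMillis; a shorter value is convenient when experimenting.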


If the failed instance1 starts up while instance2 is still performing recovery, the Transaction Service component in instance1 first checks with Shoal whether any other instance in the group has a recovery operation in progress for its resources. If so, it waits for that recovery to complete and then completes startup. The ability to check for recovery operations in progress comes from a related Shoal feature called Failure Fencing[1]. If no recovery operation is in progress, startup proceeds with a self recovery, which recovers any incomplete transactions in instance1's logs.
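The startup decision described above boils down to a single check against the fence state. The sketch below models that check with an in-memory map. It is only an illustration: the class and method names are hypothetical and it does not use Shoal's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of failure fencing as seen by a restarting member.
final class FailureFencingSketch {

    enum StartupAction { WAIT_FOR_DELEGATED_RECOVERY, SELF_RECOVERY }

    // Fenced member -> member currently recovering on its behalf.
    private final Map<String, String> fences = new ConcurrentHashMap<>();

    // Called by the recoverer before it touches the failed member's logs.
    void raiseFence(String recoverer, String failedMember) {
        fences.put(failedMember, recoverer);
    }

    // Called by the recoverer once recovery is finished.
    void lowerFence(String recoverer, String failedMember) {
        fences.remove(failedMember, recoverer);
    }

    boolean isFenced(String member) {
        return fences.containsKey(member);
    }

    // A restarting member must not touch its own logs while it is fenced.
    StartupAction onStartup(String self) {
        return isFenced(self)
                ? StartupAction.WAIT_FOR_DELEGATED_RECOVERY
                : StartupAction.SELF_RECOVERY;
    }
}
```

The point of the fence is mutual exclusion on the logs: at any moment either the delegate or the restarted owner processes them, never both.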

Now suppose instance2 fails while it is recovering instance1's transaction logs. The fact that instance2 was in the process of recovering for instance1 is known to the remaining members of the group (i.e., instance3) through the failure fencing recovery state recorded in Shoal's Distributed State Cache. As a result, instance3's Transaction Service gets the failure recovery signal not only for instance2's failure but also for instance1's. This facility covers cases where cascading or multiple failures occur.
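The cascading case can be pictured as a lookup over the recorded recovery state. The sketch below is an assumption-laden illustration rather than Shoal's implementation: the shared state is modeled as a plain map from each member under recovery to the member recovering it, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: work out everything a newly selected recoverer
// inherits when the failed member was itself in the middle of a recovery.
final class CascadingRecoverySketch {

    // recoveryInProgress maps each member under recovery to its recoverer.
    static List<String> membersToRecover(String failedMember,
                                         Map<String, String> recoveryInProgress) {
        List<String> result = new ArrayList<>();
        result.add(failedMember);
        // A sorted copy keeps the iteration order stable across callers.
        for (Map.Entry<String, String> entry : new TreeMap<>(recoveryInProgress).entrySet()) {
            boolean wasRecoveredByFailed = entry.getValue().equals(failedMember);
            if (wasRecoveredByFailed && !result.contains(entry.getKey())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For the scenario in this entry, a map recording that instance1 was being recovered by instance2 makes a failure of instance2 yield both instance2 and instance1 as members to recover.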

Note that for automatic distributed transaction recovery to work, the transaction logs of all instances in the cluster must be accessible for auto recovery, which requires that the logs be mounted on a shared/mirrored disk[2].


[1] More on Shoal's Automated Delegated Recovery Selection
[2] Distributed Transaction Recovery

 

 

 

Saturday Dec 08, 2007

Excellent Article on Shoal by non-Sun authors

Just came across this excellent introductory article on the Shoal clustering framework on Java.net, which I believe is to be published this coming Tuesday, going by the posted date (12/11/2007).

Notably, this article is by authors whom we, at the Shoal community, have not yet corresponded with. This is great news, as it tells us that there is quiet adoption of this framework. The article lucidly explains salient aspects of Shoal's clustering approach and how easy it is to integrate it into your application or infrastructure.

We hope this will make it even easier for users to adopt this technology.

Do send us your questions at the Shoal users mailing list.


Wednesday Dec 05, 2007

Shoal Whitepaper on Scalable Dynamic Clustering

It's been a while since I blogged. I have been rather busy managing the GlassFish v2 ur1 release, which is just around the corner (around Dec 13/14).

Meanwhile, Mohamed Abdelaziz and I put together a whitepaper that gives details on the scalability and dynamic clustering aspects of Shoal. The paper provides a good overview of the self-composing nature of Shoal and dives deeper into the set of protocols that provide the basis for building fault-tolerant infrastructures.

Shoal is coming along very well in various spaces, going by the increasing hits to our website from Telco and Financial majors. We are continuously improving Shoal into a robust component based on feedback from our user community. So keep 'em coming to the users mailing list.

Tuesday Aug 14, 2007

Shoal Clustering User Guide Part 1

In this series of blog entries, I will provide a guide for new users on how to use Shoal for your application.

This particular blog entry provides a basic, Shoal Clustering 101 type of introduction. Subsequent entries will let you get closer to the metal on how to use this technology.

What is Shoal? 

Shoal is a technology/framework that gives consuming applications the ability to participate in a group and thereby communicate with the group, register interest in being notified of group events, and share application data across group members. This functionality enables Shoal to be used as a clustering framework in enterprise infrastructure software while also being suitable for other use cases.

Shoal's core is the Group Management Service (GMS) which provides client APIs to interact with a group while allowing group communication libraries to be integrated through a service provider interface implementation. Currently with a Jxta service provider, Shoal takes advantage of advanced Jxta features for robustness, reliability and scalability.

While Shoal itself is scalable to many instances in a group (up to 64 nodes in our tests), the scaling is always determined by the size and characteristics of the employing application/product.

Shoal Downloads  

One of the first things you will want to know is where to get Shoal downloads.

The Shoal download is available here. Pick the latest zip file for the latest and greatest stable version. The zip file contains two jars of interest: shoal-gms.jar, which contains Shoal's client API, the GMS core implementation, and a Jxta service provider implementation, and jxta.jar, which is the Jxta peer-to-peer platform.

Shoal Documentation 

Now that you have the jars, you will need documentation to see how to integrate Shoal into your product. Shoal offers APIs that let consuming applications participate in a cluster. Shoal's JavaDocs containing the APIs are available here. In the JavaDoc, select the com.sun.enterprise.ee.cms.core package. Look in the Description section for a simple introduction to the API.

As the Description mentions, the GMSFactory class located in the com.sun.enterprise.ee.cms.core package is the entry point for getting Shoal's Group Management Service.

Code Snippet 

The following code snippet uses the GMSFactory to start the GMS module, then uses the GroupManagementService reference to register interest in events and join the group, and finally shows the API to call when the process is ready to leave the group:


// initializes GMS and the underlying group communication provider
final GroupManagementService gms = GMSFactory.startGMSModule(
        serverIdentifierName, groupIdentifierName,
        GroupManagementService.MemberType.CORE, configProperties);

// register for group events

// register to receive notification when a process joins the group
gms.addActionFactory(new JoinNotificationActionFactoryImpl(this));

// register to receive notification when a group member leaves on a planned shutdown
gms.addActionFactory(new PlannedShutdownActionFactoryImpl(this));

// register to receive notification when a group member is suspected to have failed
gms.addActionFactory(new FailureSuspectedActionFactoryImpl(this));

// register to receive notification when a group member is confirmed failed
gms.addActionFactory(new FailureNotificationActionFactoryImpl(this));

// register to receive notification when this process is selected to perform
// recovery operations on a failed member's resources
gms.addActionFactory(serviceName, new FailureRecoveryActionFactoryImpl(this));

// register to receive messages from other group members to this registered component
gms.addActionFactory(new MessageActionFactoryImpl(this), componentName);

// joins the group
gms.join();

// leaves the group gracefully
gms.shutdown(GMSConstants.shutdownType.INSTANCE_SHUTDOWN);

As the above code snippet shows, it is extremely easy to integrate Shoal into your application, putting you on the road to clustering it and to taking remedial action when group events occur.
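For readers new to this registration style, the general idea is an observer pattern: the application registers interest per event type and is called back when a matching event occurs. The sketch below shows that idea with local stand-in types only. It deliberately does not use Shoal's classes, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Local stand-ins only; these are NOT Shoal types.
final class GroupEventDispatchSketch {

    enum EventType { JOIN, PLANNED_SHUTDOWN, FAILURE_SUSPECTED, FAILURE }

    interface Listener {
        void onEvent(EventType type, String memberName);
    }

    private final Map<EventType, List<Listener>> listeners = new EnumMap<>(EventType.class);

    // Registers interest in a single kind of group event.
    void register(EventType type, Listener listener) {
        listeners.computeIfAbsent(type, t -> new ArrayList<>()).add(listener);
    }

    // Delivers an event to every listener registered for its type and
    // returns how many listeners were notified.
    int publish(EventType type, String memberName) {
        List<Listener> targets =
                listeners.getOrDefault(type, Collections.<Listener>emptyList());
        for (Listener listener : targets) {
            listener.onEvent(type, memberName);
        }
        return targets.size();
    }
}
```

A caller would register a listener for JOIN, for example, and receive a callback only when a member joins; events of other types pass it by.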

A Simple Shoal Example (sources)

Look through this example code to see the full source of a Simple Shoal Example. Look in the runSimpleSample() method to understand the steps needed to make Shoal an in-process component.

Questions? Comments? 

If you have questions on the above, please send your questions to the Shoal users alias: users [at] shoal [dot] dev [dot] java [dot] net or post your questions as a comment to this blog entry.

In the next blog entry, I will go a bit deeper explaining specific parts of Shoal.


Tuesday Aug 07, 2007

Shoal Clustering Framework 1.0 Early Access available

It's been a fairly long time since I blogged.

Over the past few months, we (the GlassFish HA team and the Jxta team) have been concentrating on improving GlassFish high availability features and addressing associated bugs. In the process, Shoal's Group Management Service benefited from intensive QE cycles on 8-node GlassFish clusters under scores of scenarios and test cases. We have been focused on fixing bugs, improving performance, and progressively gating changes in Shoal to manage risks for the upcoming GlassFish v2 FCS release.

At the moment, Shoal is in good shape and we decided to release the Shoal 1.0 Early Access version a couple of days ago. This will be followed by the 1.0 Final release before the GlassFish v2 release once we know that we have delivered the final acceptable bits for that product.

After that, our next step is to address the unique requirements that arise out of the Sailfin project, which is building a SIP-capable application server based on GlassFish.

We'd love to hear feedback from our user community about your success stories, issues and enhancement requests using Shoal.

We have seen anecdotal evidence of how useful and easy-to-use this library is and it would help improve our adoption and project growth if more specific feedback is available. Our Statcounter statistics are showing companies from very interesting industry segments going through the Shoal site, downloads and documents so this is a huge boost to our commitment to build a good group communications API based library.

If you have a success story or feedback using Shoal that you can share, drop us a line at the Shoal user alias

users[at]shoal[dot]dev[dot]java[dot]net

and we will highlight it in our blogs and on the Shoal web site.

Also we welcome code and design contributions from experienced clustering and distributed systems developers.

 

 

 
