Thursday Oct 21, 2010

How To Configure & Test High Availability with GlassFish Server 3.1 Using A Single Machine

Update 02/28/2011 : GlassFish 3.1 shipped today with clustering, centralized administration, and high availability, along with a host of new features, on the Java EE 6 platform. Get the download here.

The Shoal runtime clustering (Shoal GMS) and HA (Shoal Cache) team has been busy working on bringing GMS and In-memory replication capabilities to GlassFish Server 3.1 which is the release-in-development at the time of writing this entry. 

In this entry, I will describe the steps to configure and test High Availability with GlassFish Server 3.1, and I trust this will help the community and customers run their own Java EE 5 or Java EE 6 apps with HA and give us feedback. 

Single Machine Step-by-Step instructions to setup cluster and HA 

Here are the steps to set this up on a single machine:

  • Download the GlassFish 3.1 final build from here. Pick either the full Java EE distro or the web distro, as either the zip distribution or the executable installer. As a convenience, the latest promoted build is also aliased as latest-glassfish-<platform>.(sh/exe) for the full Java EE distro and latest-web-<platform>.(sh/exe) for the web distro.
    • Change to the <install_dir>/glassfish/bin directory.
  • Ensure that multicast is enabled on your network so that Shoal GMS and Shoal Cache can work in this environment. Run this on two terminals : 
    • ./asadmin validate-multicast 
    • This should show messages being sent and received between the two terminals, confirming that basic multicast support exists on your network.
    • Your messages would look like this in one of the terminals :
      • Will use port 2,048
      • Will use address
      • Will use bind interface null
      • Will use wait period 2,000 (in milliseconds)
      • Listening for data...
      • Sending message with content "" every 2,000 milliseconds
      • Received data from (loopback)
      • Received data from
      • Exiting after 20 seconds. To change this timeout, use the --timeout command line option.
      • Command validate-multicast executed successfully.
  • Run the command to start the domain : 
    • ./asadmin start-domain 
  • Next create a cluster using the command line interface : 
    • ./asadmin create-cluster <cluster-name>  
    • In the above step, the multicast address and port that Shoal GMS/Cache will use are auto-generated for you. If you want to set a specific multicast address and port of your choice for Shoal GMS to use, then do this : 
    • ./asadmin create-cluster --multicastaddress 229.x.x.x --multicastport yyyyy <cluster-name>
      • where each x is an integer between 0 and 255 and yyyyy is any available port number over 1024 (ports below 1024 require super-user privileges). 
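As a quick sanity check on the values you pick, the constraints above (a class D multicast address and a non-privileged port) can be expressed as a small Python snippet. This is purely illustrative - it is not part of asadmin:

```python
import ipaddress

def valid_gms_multicast(addr: str, port: int) -> bool:
    """Check that addr is a valid IPv4 multicast address (224.0.0.0/4)
    and port is a non-privileged port a non-root user can bind."""
    try:
        ip = ipaddress.IPv4Address(addr)
    except ValueError:
        return False
    return ip.is_multicast and 1024 < port <= 65535

print(valid_gms_multicast("229.9.1.2", 2048))    # True
print(valid_gms_multicast("192.168.1.1", 2048))  # False - not a multicast address
```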
  • Next, create two or more instances belonging to this cluster : 
    • ./asadmin create-local-instance --cluster <clustername> <instancename>

      Note down the HTTP Port of each instance as you create them - you will need it when testing out failover. 

  • Next, start the cluster : ./asadmin start-cluster <clustername>
  • Check if the cluster started fine : ./asadmin get-health <clustername>. The get-health command reports data based on GMS's auto-discovery of the instances in the cluster as the cluster starts up. You should see output similar to the following :
    • inst1 started since Thu Oct 21 14:45:10 PDT 2010
    • inst2 started since Thu Oct 21 14:45:19 PDT 2010
  • You can also use ./asadmin list-instances command to see if the clustered instances are running. 
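If you want to script around the cluster state, the get-health output shown above is easy to parse. Here is an illustrative Python sketch, assuming the one-instance-per-line format shown above:

```python
def parse_get_health(output: str) -> dict:
    """Parse lines like 'inst1 started since Thu Oct 21 14:45:10 PDT 2010'
    into a dict of {instance_name: (state, timestamp)}."""
    health = {}
    for line in output.strip().splitlines():
        # name, state, the literal word 'since', then the timestamp
        name, state, _, when = line.split(None, 3)
        health[name] = (state, when)
    return health

sample = """inst1 started since Thu Oct 21 14:45:10 PDT 2010
inst2 started since Thu Oct 21 14:45:19 PDT 2010"""
print(parse_get_health(sample)["inst1"][0])  # started
```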
  • Now you are ready to deploy an application and try out HA using the port-hopping technique to test failover without an LB. 
    • Note that you can do port hopping only when all instances are on a single physical machine or on the same virtual machine. If you go beyond a single machine, you will need to front the cluster with an LB capable of sticky sessions and round robin. 
  • Deploy the ClusterJSP application first, before you try your own app: it is a GlassFish-supplied, tested ear file that establishes a baseline showing that basic HA functionality is working. The clusterjsp file is here (it will be part of the samples soon) : 
    • ./asadmin deploy --availabilityenabled=true --target <clustername> <path-to>/clusterjsp.ear
    • The availabilityenabled flag is the only requirement in the deploy command to HA-enable your application. Beyond that, for a web application, you do need to add the <distributable/> element to the web.xml packaged with the application. This tells the web container that the application is ready to be used in a distributed system such as a cluster.
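For reference, a minimal web.xml carrying that element might look like the following sketch (the schema version here assumes Servlet 3.0 / Java EE 6; match it to your application):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         version="3.0">
    <!-- Tells the web container this app's sessions may be replicated -->
    <distributable/>
</web-app>
```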
  • Access the first instance's URL on your favorite browser : 
    • http://<host>:<first instance port>/clusterjsp
  • The clusterjsp browser window should look like the following :

Note in the image above the "Served from Server Instance : inst1" meta information, which tells you that this page was served from the first instance, "inst1".

Also note that under the section "Data retrieved from the HttpSession:" there is an entry stating jreplicaLocation=inst2. This is an HttpSession cookie sent back by the Shoal Cache layer to the web container, which forwards it to the browser so that an LB can potentially use it: it says this session's replica instance is inst2. An LB capable of handling this information, such as the upcoming version of the GlassFish LB Plugin that works with the Oracle iPlanet Web Server, can fail over to the exact replica instance when a primary fails. This saves broadcast-type network calls in the replication layer to find out which instance holds a particular session to be resumed on the failover instance, and is particularly useful in larger clusters. 

  • Add some session data as a key and a value, e.g. Name of Session Attribute: John and Value: Doe. 
The page now shows that the first instance has saved the session data to the second instance and also has it in the first instance web container's active cache. 
  • Now to simulate failover, you can port hop to the next instance or any random instance in your cluster, say second instance 
    • http://<host>:<second instance port>/clusterjsp
    • Before doing the above, you can optionally stop the instance that served the first request, i.e. the first instance, using 
      • the ./asadmin stop-instance <instance name> command, or 
      • find the process id of the first instance using jps -mlvV | grep <instancename> and terminate the process using kill -9 <pid>
      • Run the ./asadmin get-health command again to see the status of the cluster. You should see output similar to the following if you killed the instance :
      • inst1 failed since Thu Oct 21 15:17:47 PDT 2010
      • inst2 started since Thu Oct 21 14:45:19 PDT 2010
    • On the second instance's page you will see that the session data written on the first instance was saved in the cluster and retrieved when the page was loaded on the second instance. The session was resumed on the second instance. Your page would look similar to the following : 

Note above that the second instance served the page, and that the session data written by the first instance was retrieved from the replica cache by the replication module in the second instance. Also note that the second instance lists the first instance as its replica in this two-instance cluster, even though we know that instance no longer exists, as it was killed or stopped. 

  • At this point, any session data written from this page on the second instance would not be highly available in a two-instance cluster, as the first instance is no longer around. 
  • Go to the terminal window and restart the first instance : ./asadmin start-instance <firstinstancename> 
  • Go back to the browser that has the page served from second instance and add some session data, ex. Name of Session Attribute : Jane, Value of Session Attribute: Doe. Your page should look like the following:

Note that the session parameters Jane = Doe have been added to the session, and the session is again highly available, since you restarted the first instance before writing the session parameters on the second instance. 

  • At this time, simulate a second failover by port-hopping to the first instance : http://<host>:<first instance port>/clusterjsp/HaJsp.jsp
  • Your page should look like the following : 

As you can see above, inst1 retrieved all the session parameters; this shows the high availability of sessions in the two-instance cluster. 

If all goes well as above, you now have a baseline with which to compare your experience when you deploy your own application to try out GlassFish 3.1 High Availability for sessions.

You can also try a cluster with 3 or more instances to see High Availability in action. You will see that for each new session, the replication module chooses a different replica instance; this is a change from the buddy-replication-based mechanism of the GlassFish v2.x High Availability feature. We will have a more detailed blog entry on this new approach later. 

With the above instructions, you should now be able to deploy your own application. If you see issues with your application but not with clusterjsp, that gives you a reason to investigate what is different about your application's behavior. Most often, the issue is a non-Serializable object in your session: it worked fine when you deployed to a non-clustered single instance because there was no need to ship session objects to another replica instance, but once the session is replicated across a cluster, such objects start causing failures. So look out for those situations, and start by scouring the server log for these indications. 
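To illustrate the class of problem, here is a small Python sketch. It is only an analogy - GlassFish uses Java serialization of HttpSession attributes, not pickle - but the failure mode is the same: a value that cannot be serialized cannot be replicated:

```python
import pickle

def find_unreplicatable(session: dict) -> list:
    """Return session keys whose values cannot be serialized - the
    analogue of a non-Serializable HttpSession attribute that only
    breaks once replication to another instance is attempted."""
    bad = []
    for key, value in session.items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(key)
    return bad

# A string value serializes fine; a function/handle-like value does not:
session = {"John": "Doe", "callback": (lambda: None)}
print(find_unreplicatable(session))  # ['callback']
```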

If you do see issues that you believe belong in GlassFish HA component, please send us feedback on the user list : users at glassfish dot dev dot java dot net 

You can also file issues at the GlassFish issue tracker  here.  GlassFish HA issues are filed under the "failover" subcomponent.

Wednesday Oct 28, 2009

GlassFish v2.1.1 and Sailfin 2.0 (Sun GlassFish Communications Server 2.0 ) released!

Great news: both GlassFish v2.1.1 (Sun GlassFish Enterprise Server 2.1.1) and Sailfin 2.0 (Sun GlassFish Communications Server 2.0) were released today. 

GlassFish v2.1.1 has over 200 bug fixes and improvements. On the high availability side, we have refined the replication module further and on the fault tolerance side, Shoal GMS Clustering (version 1.1) now leverages the GlassFish Node Agent's ability to detect process losses quickly to inform Shoal GMS Members about failures. This improved our software failure detection timeout from around 9-12 seconds to around 4-5 seconds - an important improvement in supporting failover and other recovery oriented computing components.  

Sailfin 2.0 introduces High Availability for SIP and converged applications, using an enhanced predictive replication algorithm that supports high traffic and a large number of active sessions, in addition to supporting Rolling Upgrade with session state retained. With many improvements in Shoal's clustering core, support for directing network traffic to specific network interfaces, and improved health state transitions, this makes for added reliability and scalability in Sailfin's clustering capabilities. 

Congratulations to the entire Sailfin and GlassFish developer community for splendid work in successfully releasing these products. 


Friday Sep 25, 2009

How Does Sailfin (Sun GlassFish Communications Server) SIP Session Replication Module Select Replica Instances?

For scalable deployments of middleware with high availability, persisting session state to all instances in the cluster can be a sub-optimal solution. Replicating sessions to every instance results in significantly higher network traffic just for replicating state, reducing the bandwidth available for growing application user requests. This approach of sharing sessions across all instances is perhaps suited to small clusters with a limited number of concurrent requests.

One of the better approaches for securing scaling advantages is buddy replication. In this approach, each instance selects one (or more) other instance(s) in the cluster to which it replicates any and all of its sessions. This is a superior approach and, in fact, works for fairly large deployments. There are factors to consider here in terms of the overhead the replication subsystem must handle at the cost of performance, particularly when a large number of concurrent sessions are being processed and later expired. One such overhead is the need for instances to form ring-like replica partnerships based on a certain order in which buddies become available and are selected. When a buddy instance fails, there is the cost of re-adjusting and forming a new buddy relationship with another surviving instance; when the original buddy recovers, there is the cost of re-adjusting again so that one of the instances in the cluster uses the recovered instance as a replica partner. Think of this as a chain-based ring whose links are randomly removed or added, with the constant goal of keeping the ring connected and the overhead of relinking each time a link changes.
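The re-linking overhead described above can be sketched with a toy model. This is illustrative Python, not GlassFish's actual implementation; instance names and the "next live instance" rule are assumptions for the sketch:

```python
class BuddyRing:
    """Toy model of ring-style buddy replication: each instance
    replicates to the next live instance in a fixed cluster order,
    and partnerships must be re-linked on every failure or rejoin."""

    def __init__(self, instances):
        self.order = list(instances)   # fixed cluster order
        self.live = set(instances)

    def buddy(self, inst):
        """Next live instance after inst in ring order, or None."""
        i = self.order.index(inst)
        for step in range(1, len(self.order)):
            candidate = self.order[(i + step) % len(self.order)]
            if candidate in self.live:
                return candidate
        return None  # no surviving buddy

    def fail(self, inst):
        self.live.discard(inst)  # everyone pointing at inst must re-link

    def rejoin(self, inst):
        self.live.add(inst)      # and re-link again when it returns

ring = BuddyRing(["inst1", "inst2", "inst3"])
print(ring.buddy("inst1"))  # inst2
ring.fail("inst2")
print(ring.buddy("inst1"))  # inst3  (re-linked around the failure)
```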

There is also a cost to be considered (if such were the design approach) each time the cluster shape changes: dynamically changing/updating any cookie information pertaining to replica locations that is sent back as part of the response headers to the LB. Typically, that cost should also be avoided through more efficient means.

In the case of GlassFish, we have fairly successfully used buddy replication with each instance having a single replica buddy. When an instance that was processing requests fails and the LB directs a request to any random instance, we use the approach of locating the session in the cluster. This has worked well for reasonably large mission-critical environments whose scalability and availability requirements fall within the boundaries of this approach.

In Sailfin 2.0, the scaling and reliability needs of telco applications are typically very high, and we needed a scalable approach to ensure that the SIP Session Replication overhead sustained good performance along with the added reliability and availability. We therefore used a consistent hashing algorithm to dynamically assign a replica instance for each new session. We did this by leveraging the consistent hashing mechanism that the Sailfin Converged Load Balancer (CLB) uses for proxying requests to a target instance using a BEKey. In the case of replication, the same logic of using a hashed key for target instance assignment is taken a bit further.

For replica selection, for each new session we pre-calculate the most likely target instance the CLB would fail over to if the current primary instance serving the session were to fail in the future. This gives us the instance to which the current primary instance should replicate. This gave us significant benefits: no client cookie updates were required to carry replica partner information dynamically, and no readjustment of replica partnerships was needed when a particular instance failed, as the hashing algorithm provides another instance to replicate to with just an API call. When the failed instance comes back into the cluster, the unexpired sessions it owned in its prior incarnation migrate back to it, maintaining a balanced set of sessions across the cluster, and the replica selection algorithm assigns this primary's original failover instance as its replication partner.

Since this is based on a hashed selection algorithm with a predetermined failover target, replica selection is dynamic and does not require knowledge of a particular order of instances being ready in the cluster in order to designate a replication partner. More importantly, since failover occurs to the specific instance where the replica data is located, there is significantly less network overhead in locating a particular session in the cluster when a request within the session scope is sent to the CLB. This leaves more bandwidth available to serve a larger number of user sessions. This approach is thus superior to the buddy replication approach, and it helped us scale to higher throughput and sustain a larger number of long-running sessions.
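The idea of deriving both the primary and its failover target from the same hash, so that the LB's failover instance and the replica coincide, can be sketched as follows. This uses rendezvous (highest-random-weight) hashing as one flavor of consistent hashing, purely for illustration - it is not Sailfin's actual algorithm or BEKey format:

```python
import hashlib

def pick(key: str, live_instances: list) -> str:
    """Deterministically map a session key to one live instance:
    every caller with the same key and the same live list gets the
    same answer, with no coordination messages."""
    def weight(inst):
        return hashlib.md5((key + ":" + inst).encode()).hexdigest()
    return max(live_instances, key=weight)

cluster = ["inst1", "inst2", "inst3"]
primary = pick("session-42", cluster)
# The replica is the instance the LB would fail over to, i.e. the one
# the same hash selects once the primary is removed from the live list:
replica = pick("session-42", [i for i in cluster if i != primary])
print(primary != replica)  # True
```

Because both sides evaluate the same pure function, the failed-over request lands exactly where the replica data already lives, with no broadcast lookup.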

It must be emphasized that system-level and application-server-level tuning and sizing are essential to ensure sustained performance, scalability and reliability, in addition to the improvements provided by the SSR replication scheme and other parts of the Sailfin v2 server (aka Sun GlassFish Communications Server 2.0).

As always, we welcome your feedback and encourage you to try Sailfin and send us any inputs and questions you may have in this respect.

Sailfin Promoted Builds are available here : Sailfin Downloads

Tuesday Sep 08, 2009

Sun GlassFish Communications Server (Sailfin) adds High Availability Feature

Project Sailfin is building version 2.0 of its JSR 289 compliant application server. The Sailfin 2.0 release, also known as Sun GlassFish Communications Server 2.0, will have a notable new feature: the Sip Session Replication component. Sailfin 2.0 will provide High Availability of SIP artifacts, bringing resilience and availability of conversational state to telco deployments. Sailfin 2.0 is targeted for release around end of October/early November 2009. 

High availability through the Sip Session Replication component (aka SSR component) allows for replication of SIP artifacts such as SIP Application Sessions, SIP Sessions, Timers, and Dialog Fragments, in addition to Converged Sessions. Combined with existing GlassFish replication support for HTTP sessions, deployments can now be highly available for both SIP protocol-only applications and converged (SIP and HTTP) applications. To support the large-scale load that can typically be expected with telco applications, the HA team employed a dynamic replica selection algorithm for each SIP artifact, based on a consistent hashing algorithm. This obviates the need for buddy-based replication, where one would need to react to cluster shape changes and re-partner with another instance when a failure occurs - an expensive operation under high load conditions. (See this blog entry for more details.)

The SSR component along with all of Sailfin is undergoing intensive quality testing including 24x7 longevity, scalability, reliability and fault tolerance testing at this time and we are making progress every day. 

Turning on SSR in Sailfin 2.0 builds is extremely easy, similar to how it is with GlassFish. You only need to deploy your SIP (JSR 289 compliant) or converged application with the availability-enabled option checked in the Admin GUI console, or use the --availabilityenabled=true switch with the asadmin deploy command when deploying your SIP archive (sar) file. 

Here's your call to action: go ahead and download the latest promoted build of Sailfin v2, deploy your SAR archive with availability-enabled set to true (SSR enabled), and provide us feedback. 

Sunday Jun 21, 2009

GlassFish Clustering: Meaning and impact of configuring attributes of the group-management-service element

Based on a query from one of our field colleagues, I added an FAQ entry on the meaning of the group-management-service element attributes and the impact of changing their default values, as this is not yet covered by our official documentation (it will be updated with this information soon).

These configuration attributes determine Shoal GMS's behavior with respect to Discovery and Health Monitoring of cluster instances in a GlassFish cluster. It is very useful and important to understand these underpinnings of GlassFish's runtime clustering engine.

Here's the FAQ entry :

Tuesday Feb 10, 2009

Performance Monitor enters Sun's GlassFish offerings

Sun today announced an array of commercial offerings based around the successful open source GlassFish project's application server. As part of these offerings is an enterprise management enabler tool aptly called Enterprise Manager :)

The Enterprise Manager is a collection of utilities designed to enhance your production deployment experience with Sun GlassFish Enterprise Server v2.1

The Enterprise Manager comprises three components :

  • Performance Advisor
  • SNMP Monitoring support
  • Performance Monitor

The Enterprise Manager is included when you purchase a support subscription for Sun GlassFish Enterprise Server v2.1. For more details on the Enterprise Manager look here.  For more details on the Sun GlassFish Portfolio subscription plans, look here.

Of the above Enterprise Manager components, the Performance Advisor, and SNMP Monitoring support components are covered in a few other blogs. In this entry, I'll introduce you to the Performance Monitor.

The Performance Monitor is a closed source product from Sun built on top of VisualVM 1.1 and the NetBeans 6.5 Platform. It is a monitoring tool that provides dynamic visualization of GlassFish Server as it runs your applications.

Some of its key features include :

  • Monitoring local and remote Java processes
  • Trendline representation through charts for JVM monitoring artifacts such as heap, CPU utilization, threads, classes, etc., much like JConsole.
  • Data collection, processing and charting for GlassFish monitoring artifacts. There are extensive charts that leverage the GlassFish JMX based APIs and provide views into monitoring the health of your GlassFish deployment. 

The Performance Monitor is designed with easy-to-use features including:

  • Easy connectivity
    • Local or remote GlassFish servers    
    • Choice of SSL/TLS based secure, or plain JMX connections
  • Logical fine-grained views of GlassFish servers, clusters, node agents, deployed artifacts and services
  • Embellishments such as
    • tool tips explaining what each chart is about and its current numbers,
    • the ability to hide individual chart lines within a chart,
    • hiding/unhiding charts from the page view,
    • dragging a top-level tab to the side or bottom of the page for a lateral or stacked view of the charts.
  • Detailed charts covering telemetry data for common, critical services and resources in production systems, based on feedback from existing customers on the need for charting these monitoring artifacts
    • these include HTTP services including per-listener views, web services, pools such as thread pools, JDBC and other connection pool statistics, etc.

Here are some screenshots for your reference :

A collective view of a few web tier monitoring data:   

A view of the Http Service graphs :

Perf Monitor Http Listener View

A View of the Node Agent Page with status of instances the Node Agent is managing:

Perf Monitor Node Agent View

The Performance Monitor tool is immensely helpful in diagnosing problems before they turn into production bottlenecks and show stoppers. We hope that this offering along with GlassFish support subscription will help our customers be successful with their GlassFish deployments.

Tuesday Oct 21, 2008

Economic woes resulting in slashed tech budgets? Sun's GlassFish+MySQL makes perfect sense

With signs of global economic turmoil underway, the financial system is under severe stress: credit is vanishing, and government departments, small and medium businesses, and large corporations alike face potential budget cuts for technology spending. Case in point :

With Sun's open source "stack" and cost-effective support subscriptions, affected sectors of the economy can take advantage of standards based, high quality, open source software with the low cost support services from Sun. This is a good time to consider moving away from expensive proprietary software stacks to Sun's open source software.

Take the case of Sun's commercially supported GlassFish + MySQL offering: both are rock-solid products with features otherwise available only in high-cost closed-source products. This offering gives you the opportunity to move seamlessly from expensive licenses to an annual support subscription that reduces your cost of ownership. For example, unlimited deployment of GlassFish + MySQL starts at $65,000. GlassFish + MySQL is a very compelling offering that will help you justify the move on both features and costs.

Contact Sun to find out how you can save significantly with this offering.

For more on the value proposition, read Arun's blog entry here.

Wednesday Mar 12, 2008

GlassFish High Availability Session at Sun Tech Days Hyderabad

At the recent Sun Tech Days event at Hyderabad, I gave a talk covering GlassFish's High Availability features, particularly the In-Memory Replication support, as part of GlassFish Day (Feb 29th). 

I had the privilege of talking to a full house of around 500 people. The session covered an introduction to HA, how easy it is to create and configure a cluster of instances, and how to configure an application to enable in-memory replication based availability. The session elicited very good questions, ranging from the basics to involved ones in areas from sizing the heap to sticky session support. I spent an hour after the session outside the hall answering questions posed by interested folks from several companies.

Many attendees wanted to get a copy of the slide deck. Look here for it.

Needless to say, we would very much appreciate any feedback or questions on GlassFish's High Availability. Please send these to us at the GlassFish user mailing alias.

Tuesday Mar 11, 2008

Notes from GlassFish Booth at Sun Tech Days, Hyderabad '08

I am back after a two week visit to India. I took the week off last week to be with my folks and to unwind and recharge in my home city, Mumbai.

The week before that was an incredible one for me, as I saw the hi-tech boom in India first hand. The swelling, enthusiastic crowds at the Sun Tech Days event in Hyderabad on Feb 27, 28 and 29 were thrilling to experience, to say the least. The event was highly successful in attracting budding youngsters and experienced professionals alike. I am told the content of the event was very fulfilling for attendees.

Some notes of interest :

  • Each session had between 1000 and 1500 attendees. Community Day was on Feb 29th with tracks such as GlassFish, OpenSolaris, Netbeans, etc. and each track had an expected 450 people based on hall capacity. In the GlassFish track, we saw that increase to around 500.
  • At the GlassFish booth, I met quite a lot of visitors, many of whom had either vaguely heard about GlassFish or never heard of it. About a third of them work with an application server in a professional capacity. Others were developers learning the ropes at work, or students who were excited to see an open source appserver like GlassFish.
  • GlassFish is relatively new here - people have heard about it and are now beginning to try it out. So this was a huge opportunity to spread the word about GlassFish as a compelling open source project delivering a fully Java EE 5 compliant application server packing a whole bunch of features that commercial vendors usually provide for the cost of an expensive license and services attachment.
  • A common theme of questions from professionals was the availability of migration tools to move from BEA and WebSphere. They are looking for detailed documentation and engineering services support for migration, to enable them to recommend GlassFish in their own organizations and their clients'. We are working on this on multiple fronts, including through the migrate2glassfish project.
  • Many professionals I talked to had tried JBoss before but due to their employer or customer platform preference, they were on WebSphere or WebLogic.
    They were impressed with GlassFish's Administration ease of use and GlassFish's published SpecJ numbers.
  • The companies whose people came to the booth read like the who's who of the hi-tech majors' list, such as Wipro, Satyam, Infosys, Cap Gemini, etc.
  • Most users were still developing on J2EE 1.4. Many are beginning to move to Java EE 5.
  • Many developers I met were on Eclipse. I demoed GlassFish and NetBeans 6.0 when possible at the booth and many went back saying they will surely try NB with GF.
Based on the reactions and questions that came from people at the booth, here are the things I think attracted many of these folks to consider converting from their existing middleware to GlassFish : 


  • GlassFish's open source status
  • Free for development and deployment
  • No license fee for product purchase
  • Strong community support
  • Sun's backing and commercial support offering (the concept of indemnification was new for some but many understood impact for their customers abroad)
  • Administration ease of use
  • Market leading product differentiators such as Grizzly, Virtual server support in Web tier, Web Services support, Easy cluster creation and management, Call Flow Monitoring, High Availability Options through In-memory replication and HADB, Inclusion of a quality MQ product, Netbeans and Eclipse IDE integration, etc.

Many of the professionals working with other application servers said the Administration, Clustering, High Availability, Loadbalancer support, Webservices support, migration support, and IDE integration would motivate them to try out or switch to GlassFish. 

Looking forward to hearing from these new GlassFish users at our user community mailing list.




Friday Jan 11, 2008

GlassFish Hidden Nugget: Automatic Distributed Transaction Recovery Service

GlassFish v2 and v2 ur1 releases (and later) have support for transaction recovery (both manual and automated) in the sense that incomplete transactions at the time of an instance failure can be committed either manually or automatically.

Part of the new feature set in the cluster profile is a little known feature called Automated Distributed Transaction Recovery that comes out of Project Shoal's support for it. 

Essentially, Automatic Distributed Transaction Recovery in GlassFish works as follows :

Consider the following :

  • a cluster of three instances : instance1, instance2, and instance3
  • Two XA resources used by each GlassFish instance
  • a transaction starts on instance 1,
  • Transaction Manager on instance1 asks resource X to pre-commit,
  • Transaction Manager on instance1 asks resource Y to pre-commit,
  • Transaction Manager on instance1 asks resource X to do a commit,

Now, instance1 crashes

The Transaction Service component in one of the surviving members, instance2 or instance3, gets a notification signal that a failure recovery operation needs to be performed for instance1. This signal from Shoal is called FailureRecoverySignal.

This notification signal comes to the Transaction Service component in only one particular selected instance as a result of a selection algorithm run in Shoal's GMS component that takes advantage of the identically ordered cluster view provided to it by the underlying group communication provider (default provider is Jxta).

The Transaction Service component in this instance, say instance2, now goes into its autorecovery block. It starts by waiting for a designated time (default 60 seconds) to allow the failed instance1 to start back up.

If instance1 is starting up, its own Transaction Service component would do self recovery to complete phase 1 transactions.

In instance2, after the wait timeout occurs, the transaction service component checks whether instance1 is back in the group view. If not, it tries to acquire a lock on the failed instance's transaction logs through Shoal's FailureRecoverySignal. If the lock is acquired (indicating that the failed instance did not start up), it reads the transaction log and starts recovery of transactions, i.e. completes the commit operations for the pre-committed transactions. If the lock acquisition fails, it gives up, verifies through Shoal's group view that the failed instance did start up, and logs this fact.

If, during the recovery operations being performed by instance2, the failed instance1 starts up, its transaction service component first checks with Shoal whether a recovery operation for its resources is in progress in any other instance in the group. If so, it waits for the recovery operations to complete and then finishes startup. This ability to check for recovery operations in progress comes from a related Shoal feature called Failure Fencing [1]. If no recovery operations are in progress, startup proceeds with a self-recovery that recovers any incomplete transactions in instance1's logs.

If, during recovery of instance1's transaction logs, instance2 also fails, the fact that instance2 was in the process of recovering for instance1 is known to the remaining members of the group (i.e., instance3) through the failure-fencing recovery state recorded in Shoal's Distributed State Cache. As a result, when instance3's transaction service gets the failure recovery signal, it gets it not only for instance2's failure but also for instance1's. This facility covers cases where cascading or multiple failures occur.

Note that, for automatic distributed transaction recovery to work, the transaction logs of all instances in the cluster must be mounted on a shared/mirrored disk so that any instance can access them for auto recovery [2].
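The decision logic described above can be summarized in code. The following is a minimal, self-contained sketch of the survivor's decision, not GlassFish's actual implementation: GroupView and RecoveryLock are hypothetical stand-ins for Shoal's group view and the lock obtained through FailureRecoverySignal.

```java
import java.util.Set;

/**
 * Sketch of the recovery decision run by the survivor that Shoal's
 * selection algorithm picked, after the designated wait period.
 * All names here are illustrative, not GlassFish APIs.
 */
public class RecoveryDecision {

    public enum Outcome { SELF_RECOVERY_BY_RESTARTED_INSTANCE, DELEGATED_RECOVERY, LOCK_LOST }

    public interface GroupView { boolean contains(String member); }
    public interface RecoveryLock { boolean tryAcquire(String failedMember); }

    public static Outcome decide(String failed, GroupView view, RecoveryLock lock) {
        // Step 1: if the failed instance rejoined during the wait period,
        // it performs self-recovery and the survivor does nothing.
        if (view.contains(failed)) {
            return Outcome.SELF_RECOVERY_BY_RESTARTED_INSTANCE;
        }
        // Step 2: fence the failed instance's transaction logs via the lock.
        if (lock.tryAcquire(failed)) {
            return Outcome.DELEGATED_RECOVERY;  // read logs, complete pre-committed commits
        }
        // Step 3: lock not acquired -- the failed instance came back up after all.
        return Outcome.LOCK_LOST;
    }

    public static void main(String[] args) {
        Set<String> view = Set.of("instance2", "instance3");  // instance1 is gone
        System.out.println(decide("instance1", view::contains, m -> true));
        // prints DELEGATED_RECOVERY
    }
}
```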

[1] More on Shoal's Automated Delegated Recovery Selection
[2] Distributed Transaction Recovery




Friday Jan 26, 2007

Differences between GlassFish v1 and GlassFish v2

I frequently see the search term "Differences between GlassFish v1 and GlassFish v2" in the Google search referrer links on my Statcounter account. So I guess it's a good topic to blog about.

GlassFish v1 was the first release of the Java EE 5 application server from the GlassFish Project. This release featured an application server environment where one could create a domain with one appserver instance in it that was Java EE 5 compatible, and included a good-looking administration console in addition to performance enablers such as the NIO-based Grizzly HTTP connector framework. It was released around JavaOne in 2006. At the time, the community asked for more, given the positive responses we received on v1. The community asked about our roadmap and insisted that clustering capabilities were a dire need, among other things. We listened to the community and came up with the plan for GlassFish v2. In the meantime, we have also shipped update releases to GlassFish v1, i.e., ur1 and ur1 p01.

GlassFish v2 development is in progress and has reached Milestone 4 (the one we use for the beta release). Primarily, the v2 release offers the ability to create a domain with many standalone instances and clusters of instances. In essence, you could have one or more domains, each with its own appserver instances, some or all of which may be part of one or more clusters (an instance can only be part of one cluster, but you can have multiple clusters, each with its own instances). In addition, there is the HA component, which provides in-memory session state persistence between instances in a cluster. Numerous improvements to the base v1 code have been made to make it more performant. We have dynamic clustering coming in from Shoal, which enables components in a clustered instance, and the administrator, to get clustered instance status. v2 also contains a dynamic IIOP endpoint failover load balancer.

The other notable difference from v1 is that you can run GlassFish v2 in various profiles, namely the Developer, Cluster, or Enterprise profile. A domain can be created with a specific profile name. With the Developer profile you get the single-instance domain that is well suited for development. The difference between the Cluster and Enterprise profiles is somewhat subtle: the Enterprise profile carries almost all of what the Cluster profile offers, except that it is aimed at large-scale deployments in enterprises. The Cluster profile allows you to create within your domain not only non-clustered standalone instances but also clusters with instances attached to them. In other words, one can have multiple clusters, each with its own set of homogeneously configured instances. The instances can be spread over many machines, and their lifecycle is managed by an entity called a Node Agent, which sits on each machine and responds to the domain administration server's instructions to start or stop an instance on that machine. One can deploy, in a single action, applications to a target cluster, and all instances in that cluster will have the application deployed along with corresponding resources.

The differences between these profiles are covered well in section 4.1.2 of this One Pager document.

There is more as per this blog in the Aquarium:
 • Update Center
 • WS Interoperability (WSIT) functionality
 • JBI support
 • In-memory replication
 • Dynamic clustering from Shoal
 • Optimized ORB
 • Startup architecture improvements
 • Improved web tier: Comet, virtual hosting, in-memory JSP compilation
 • JSF used in the Admin GUI

Well, you get the idea of how much progress has been made from v1 last year to v2 this year.


Send in comments if you would like to hear more details and I will go dig around to get you the necessary information.

Also send in your successful GlassFish deployment stories to the Stories blog.


Wednesday Oct 25, 2006

SAP Netweaver - Welcome to Java EE 5 Community

A colleague pointed me to the SAP Developer Newsletter, which states that the Netweaver Application Server has been released covering the entire set of Java EE 5 standards.

It is indeed a very welcome sign that Netweaver is already a Java EE 5 implementation. It goes to show that SAP sees significant value to be achieved with Java EE 5's developer-ease-of-use oriented specs enabling rapid development of enterprise applications.

That said, I wonder about the claim of being the first in the industry, given that GlassFish/Sun Java System Application Server 9 has been around since May 2006. As the GlassFish codebase is fundamentally the innards of the Java EE 5 SDK and Sun Java System Application Server 9, I would tend to believe that Netweaver is ONE of the first. Anyway, that's another discussion.

Welcome to the Java EE 5 world Netweaver! Here's a toast to a larger Java EE 5 community!
Cheers! :)

Thursday Oct 19, 2006

Java EE SDK: Did you know?

This post is more of a clarification and a useful pointer for the Java EE 5 SDK, although it may contain slightly dated pieces of info.

We receive many questions on how performant GlassFish is, its relation to the Java EE 5 SDK, and to Sun Java System Application Server version 9.

While the responses we give vary a bit depending on how narrowly scoped the questions are, the one thing that I feel needs to be communicated well is that the SpecJ numbers that were submitted and published for Sun Java System Application Server 9.0 back in May '06 are exactly applicable to the Java EE 5 SDK! And to GlassFish v1 (the b48 FCS build)! That's because the code bases are exactly the same; the difference is that GlassFish is the open source project where the Java EE 5 SDK was developed.

The SpecJ results page also lists the tuning parameters used.

This actually ends up being a little connect-the-dots challenge for learners and adopters alike, so it is very useful to clarify here. I hope more people read this post and benefit.

When you download the Java EE SDK, you are not just getting a Reference Implementation of the Java EE spec, but also a highly price-performant application server.

Did you know you could use this application server for single instance deployment in your production system?

Here are some useful links :
Java EE 5 SDK Download
Tom Daly's Blog on SpecJ results
And an insightful blog written around the SpecJ submission time by our performance guru Scott Oaks (interesting comments as well)
GlassFish V2 builds


Tuesday Oct 10, 2006

Introduction to the GlassFish Domain and Domain Model

Much of the content in this blog applies to GlassFish v2 (with clustering support).

The GlassFish Project's application server comes with a very nice browser-based administration console and a rich command line interface. Behind these easy-to-use interfaces lies the JMX-compliant administration infrastructure that instruments management of these configuration entities and makes it possible to configure, operate, and perform lifecycle operations on the server. At the heart of it all is the domain configuration model that provides the basis for the domain.xml configuration repository, which holds the values applied to a given domain's configuration.

The concept of a domain in GlassFish can be related to an administrative area. Under a domain, one can have standalone instances (and, with GlassFish v2, clusters of instances). The administrator thus configures and manages all the operational aspects of the instances/clusters under a given domain through the Domain Administration Server (DAS). A domain can span multiple machines, and instances belonging to a domain can be located on multiple machines. Each machine's instances are started/stopped by a lightweight Java process called the node agent. The node agent performs operations requested by the DAS, such as starting or stopping one or more instances it is managing on a specific machine. Additionally, the node agent synchronizes configuration state between the DAS and the instances it manages.

A domain can have standalone instances and/or clusters of instances. Java EE (J2EE) applications can be deployed to the domain, a specific instance, or a specific cluster. An already deployed application can be referenced by an instance or cluster by creating references to the application deployed on the domain. When an application is deployed to a specific instance, the application is placed at the domain level and an application-ref is created under the specific instance. Similarly, resources such as JDBC resources and related connection pools can be deployed/created at the domain level, at a specific instance, or at a specific cluster. A resource that already exists can be used by other instances by creating a resource reference to the resource on the domain. When a resource is deployed to a particular instance, the resource is created at the domain level and a resource-ref is created in the instance referencing the resource in the domain.

All of the above functionality is based on the foundations provided by our domain model. The domain model is specified in the domain DTD. The DTD specifies configuration artifacts and their related attributes and sub-elements through a pseudo-hierarchical and largely referential structure. At the top level is the domain element, under which all configuration elements lie.

Below, these subelements are explained in more detail.
A domain comprises a set of 'applications', 'resources', 'configs', 'servers', 'clusters', 'loadbalancers', 'lb-configs' and 'node-agents'. These are subelements of the top-level root element 'domain'. In addition, special token-value elements called 'system-property' and name-value pair elements called 'property' can be defined in the domain and at all sub-levels for customization and extension.

The element 'configs' comprises a set of 'config' subelements, each of which is a set of configuration elements such as 'http-service', 'iiop-service', 'jms-service', etc.

A 'config' can be shared by one or more 'servers' through references ('config-ref'). In other words, a particular configuration can be referenced by one or more instances but customized for specifics like ports through 'system-property' definitions under that particular instance.
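To illustrate config sharing, here is a simplified, hand-written domain.xml fragment (element names follow the domain DTD, but the config name, instance names, and port values are invented for this example, and most required attributes are omitted for brevity):

```xml
<configs>
  <config name="shared-config">
    <http-service>
      <!-- the listener port resolves from the HTTP_PORT system property -->
      <http-listener id="http-listener-1" port="${HTTP_PORT}"/>
    </http-service>
  </config>
</configs>
<servers>
  <!-- two instances share one config, each overriding the port -->
  <server name="instance1" config-ref="shared-config">
    <system-property name="HTTP_PORT" value="38080"/>
  </server>
  <server name="instance2" config-ref="shared-config">
    <system-property name="HTTP_PORT" value="38081"/>
  </server>
</servers>
```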

The element 'node-agents' comprises one or more 'node-agent' entities, which are lightweight Java processes living on specific machines that are part of the administrative domain, as explained in the earlier paragraphs. The 'node-agent' manages the lifecycle of the instances installed on the machine it manages, and plays an important role in synchronizing configuration state from the domain administration server to those instances.

The element 'servers' comprises one or more subelements of type 'server', each referring to an individual instance.
Each 'server' instance references a 'config' through the attribute 'config-ref'. By default, creating an instance results in an '<instance-name>-config' being created, with auto-selected ports, representing that particular instance's configuration. This config can be shared, as explained above, by referencing it from another instance.

The subelement 'applications' comprises the set of applications deployed to the particular domain, instance, or cluster.
An application can be any of several recognized types, namely j2ee-application, web-module, ejb-module, connector-module, appclient-module, lifecycle-module, and mbean.

When an application is deployed in GlassFish, it is placed in the domain and made available to instances through 'application-ref'. In other words, instances ('server' elements) reference an application through the subelement 'application-ref'.

The subelement 'resources' comprises a set of resource types that support the deployed applications. Several recognized types of resources are defined in the domain DTD, the most common of which are 'jdbc-resource', 'connector-resource', 'jdbc-connection-pool', 'connector-connection-pool', 'persistence-manager-factory-resource', 'resource-adapter-config', etc.

Resources are referenced by instances through resource-refs. In other words, a particular 'server' references the resources applicable to it through the subelement 'resource-ref'.
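Putting the reference model together, a simplified illustrative fragment might look like the following (element and attribute names follow the domain DTD; the application, pool, resource, and server names are invented, and many required attributes are omitted for brevity):

```xml
<applications>
  <!-- deployed once, at domain level -->
  <web-module name="hello-app" context-root="/hello"
              location="applications/hello-app"/>
</applications>
<resources>
  <jdbc-connection-pool name="sample-pool"
      datasource-classname="org.apache.derby.jdbc.ClientDataSource"/>
  <jdbc-resource jndi-name="jdbc/sample" pool-name="sample-pool"/>
</resources>
<servers>
  <server name="instance1" config-ref="instance1-config">
    <!-- the instance points back at artifacts defined at domain level -->
    <application-ref ref="hello-app"/>
    <resource-ref ref="jdbc/sample"/>
  </server>
</servers>
```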

In a later blog, I will explain the loadbalancers and lb-configs elements.

Java EE/J2EE Web Hosting: What's your view as an Application Deployer?

In an earlier blog, I sent out a questionnaire to hosting providers to get their view of Java web hosting.
Although the blog did not get any direct comment responses, our engagement with the industry players is beginning to get us some responses.

In this blog, the focus is on user experience of customers of web hosting firms. 

Given that the basic Java web hosting environments currently are shared hosting offerings, it would be interesting to hear experiences of what works (ease of use) and what doesn't (pain points) with Java EE/J2EE-based application server hosting providers.

Some questions come to mind:
For the application deployer:

  1. Are full fledged J2EE hosting solutions (not just a web container) available at a reasonable cost?
  2. How far do you get control over the app server instance?
  3. How do you administer your applications? Do hosting providers give you dedicated access to the administration consoles?
  4. How do you administer/configure your instance? What level of control do hosting providers offer?
  5. If the instance is shared, what level of freedom do you have when you deploy your app or configure a resource and the instance needs to be restarted?
  6. What protection do you have against the instance crashing because someone else's app brings it down?
  7. Do Hosting Providers offer any monitoring infrastructure to allow you to monitor your apps/instance?
  8. Do web hosters offer dedicated application server instances for a reasonable price?
  9. What application server is typically offered to you when you choose a J2EE / JavaEE web hoster?
Your responses will again help us make future releases of GlassFish address some of these problems through technological means.


Shreedhar Ganapathy

