Friday Mar 21, 2008

Upgrade Methods

In a data-center environment, the goal is to minimize system downtime and maximize the features available in the products in use. That is why organizations strive to deploy high-availability (HA) solutions for the applications running in production. As technology evolves and products improve, customer-centric organizations look for more innovative features, better performance, and competitive pricing in the software that runs their data centers. When such high-value products reach the market, organizations adopt them to replace the software products or versions currently in use. At the same time, upgrading or migrating an existing environment to the latest release with minimal downtime is a challenging task. This is particularly true of the HA solutions available on the market.

Rest assured that, with Sun Cluster software as the HA solution for your data center, there is no need to worry about getting the latest version of Sun Cluster, whether your setup is new or old. The beauty of Sun Cluster software is that it includes multiple methods for upgrading the cluster software, along with the Solaris Operating System and other software in the cluster stack.

The process of upgrading Sun Cluster to the latest version is less complex than one might expect. Some Sun Cluster upgrade methods involve just a single switchover, which is the only system downtime the upgrade requires. Without losing any data and without significant downtime, you can be up and running the latest HA solution from Sun Cluster. If you are wondering whether a hassle-free process with minimal downtime is possible, the answer is yes; definitely count on us. We are going to tell you how.

This article describes the different upgrade methods available with Sun Cluster 3.2. You can choose the method that best fits your requirements and comfort level.

Rolling Upgrade 
In a rolling upgrade, you upgrade the cluster software to an update release on one or more nodes at a time, depending on the number of nodes and your cluster topology. The cluster services continue on the other nodes except for the time it takes to switch services from the node(s) to be upgraded to the upgraded node(s). The rolling upgrade method is supported only for upgrading minor updates of Sun Cluster software and the Solaris Operating System.

Cluster downtime is limited to the time needed for the cluster to switch services over to an upgraded node.
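As a rough sketch, a rolling upgrade of a single node could look like the following. The node name (phys-node-1) and resource-group name (rg-apache) are placeholders, and the exact command set should be verified against the Sun Cluster 3.2 upgrade documentation for your release:

```shell
# 1. Move resource groups and device groups off the node to be upgraded.
clnode evacuate phys-node-1        # phys-node-1 is a placeholder node name

# 2. Shut the node down and boot it into non-cluster mode.
shutdown -g0 -y -i0
# At the OpenBoot prompt:
#   ok boot -x

# 3. Apply the Solaris and Sun Cluster update release on this node,
#    then reboot it back into the cluster.
init 6

# 4. Optionally switch services back to the upgraded node, then
#    repeat the procedure on the next node.
clresourcegroup switch -n phys-node-1 rg-apache    # rg-apache is a placeholder
```

Services remain online on the other nodes throughout; the only interruption is the switchover itself.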

Dual Partition Upgrade 
In a dual-partition upgrade, you divide the cluster into two groups of nodes. You bring down one group of nodes and upgrade those nodes, while the other group of nodes continues to provide services. After you complete the upgrade of the first group of nodes, you switch services to those upgraded nodes. You then upgrade the remaining nodes and boot them back into the cluster to join the rest of the upgraded nodes.

Cluster downtime is limited to the time needed for the cluster to switch services over to the upgraded partition.

Live Upgrade Method 
The Live Upgrade method uses standard Solaris Live Upgrade. It requires one additional hard disk, called the "alternate root" disk. The current root disk continues to host the cluster services until the upgrade operations are successfully completed and committed on the alternate root disk.

A live upgrade maintains your previous cluster configuration until you have upgraded all the nodes and you commit to the upgrade. If the upgraded configuration causes a problem, you can revert to your previous cluster configuration until you can rectify the problem.

Cluster downtime is the single reboot time of the systems.
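On each node, the Live Upgrade flow could be sketched as follows. The boot-environment names, disk slice, and media path are placeholders; consult the Solaris Live Upgrade and Sun Cluster documentation for the options that apply to your configuration:

```shell
# 1. Create the alternate boot environment ("alternate root") on a spare disk.
lucreate -c current_be -n upgraded_be -m /:/dev/dsk/c1t1d0s0:ufs

# 2. Upgrade the alternate boot environment from the installation media.
luupgrade -u -n upgraded_be -s /cdrom/cdrom0

# 3. Activate the upgraded boot environment and reboot into it.
luactivate upgraded_be
init 6

# If the upgraded configuration causes a problem, reactivate the old
# boot environment and reboot to fall back:
#   luactivate current_be
#   init 6
```

The fallback path is what makes this method attractive: the previous configuration stays intact on the original root disk until you commit to the upgrade.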

Standard Upgrade Method 
In a standard upgrade, you shut down the entire cluster before you upgrade the cluster nodes, and you return the cluster to the production environment after all nodes have been upgraded successfully. The cluster is out of operation until the upgrade of the Sun Cluster software, along with the Solaris Operating System if necessary, is complete.

If downtime is not a significant concern, this can be the most efficient method to upgrade a cluster. Use a Cluster Control Panel tool, such as cconsole or cssh, to access all nodes at once and perform the upgrade on all nodes simultaneously from the master console window.
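A minimal sketch of the standard method follows. The commands reflect Sun Cluster 3.2 syntax and should be checked against the documentation for your release:

```shell
# 1. From any one node, shut down the entire cluster at once.
cluster shutdown -g0 -y

# 2. Boot each node into non-cluster mode and upgrade Solaris and the
#    Sun Cluster packages (for example, with scinstall -u update from
#    the installation media), repeating on every node.
#   ok boot -x

# 3. When every node has been upgraded, boot all nodes back into
#    the cluster.
init 6
```

Because all nodes are down together, the per-node steps can be driven in parallel from a Cluster Control Panel master console window.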

Arun Kurse/Venugopal Ns
Solaris Cluster Engineering

Friday Dec 21, 2007

IEEE Cluster 2007 Conference

The IEEE Cluster 2007 conference was held in Austin, Texas this year. There were plenty of hands-on tutorials, paper presentations, poster sessions and panel discussions all related to cluster computing which encompasses both high-performance cluster computing and high-availability clustering.

I had the privilege to co-author and present a poster paper named "CHAF - An Object Oriented Framework for Configuring Applications in a Clustered Environment", which was implemented in Sun Cluster 3.2. The live demonstration of this implementation ran on a laptop with a lab cluster at the back end. My session and demo were so well received that a Sun customer later referenced them in an email to Sun.

Notable topics that were the focus of several research papers and panel discussions at the conference included multi-core and virtualization. Our very own Andy Bechtolsheim gave the opening keynote on "Scaling to Petaflops", discussing the challenges and opportunities associated with peta-scale and the work Sun has done and continues to do in this area. More details and links on these topics can be found on my personal blog.

I also got an opportunity to visit the impressive Texas Advanced Computing Center at the University of Texas at Austin, where a new supercomputer (using Sun machines and the new Sun Magnum switch) is being built. With some 4,000 nodes, it will be the largest supercomputer in the world when it becomes operational.

The closing keynote on "the Challenges and Rewards of Petascale Clusters" by Mark Seager from Lawrence Livermore National Labs reminded the attendees that today's mainstream technologies (e.g., virtual machines and object-oriented design) came from the research community some 20 years ago, and projected that the parallel programming done in research mode today will be a mainstream technology in the near future. I have no doubt, none whatsoever, that Sun will play a key role in it!

Augustus Diraviam
Solaris Cluster Engineering

Friday Dec 14, 2007

Sun Tech Days in Shanghai, China

Sun Tech Days 2007 in Shanghai, China, was held in the Shanghai International Convention Center. Over 1,000 registered attendees were there during the first part of the Tech Days event, and around 200 stayed on for the third day, the start of the second part, known as the Community Event, which was open to all software developers.

During the first two days, I helped staff the G11N (globalization) booth, where I answered questions related to Sun Cluster, including some having to do with ZFS. At the Community Event on the third day, I had the opportunity to present Open High Availability Cluster (OHAC) in the afternoon, as part of the umbrella OpenSolaris talks.

Of the 200 attendees who stayed for the Community Event, 50 came to the OHAC presentation. A quick poll at the start of the presentation indicated that about half of them were familiar with the concept of an HA cluster. Even though the Sun Cluster presentation was scheduled for only 50 minutes, it was a good 50 minutes: I showed how Sun Cluster actually works, the advantage of running Sun Cluster in the kernel, and how a true HA solution like Sun Cluster compares with other software whose built-in HA capabilities do not deliver real HA. The live demo took a big chunk of the presentation time, so there was only enough time for two questions during the official Q&A session at the end.

Based on the number of questions interactively asked during the live demo and even more questions asked after the official presentation and Q&A session, the interest in Sun Cluster technology is definitely there and growing!

Leland Chen
Solaris Cluster Engineering

Tuesday Dec 04, 2007

Sun Cluster Geographic Edition is now Open-Source

The source code for the Sun Cluster Geographic Edition product is now available in the HA Clusters community! In addition to browsing the Open High Availability Cluster Geographic Edition source code, you can download it and build it with either the Sun Studio or the gcc compiler.

This source code release represents the second phase of the complete Sun Cluster open-sourcing roadmap. The first phase, the Sun Cluster Agents, occurred last June, and the third and final phase, the Sun Cluster core gate, will happen sometime next year.

I'm particularly pleased that, in addition to product code, this release of the Geographic Edition source includes test code, man pages, and globalization source.

Nick Solter

HA Clusters community facilitator and
Sun Cluster developer

Wednesday Nov 21, 2007

Sun Tech Days in Beijing and Tokyo

I attended Sun Tech Days in Beijing and Tokyo a couple of weeks ago. It was a three-day event, with one day dedicated to OpenSolaris and NetBeans. The attendees in Beijing were mostly Java developers and university students, while in Tokyo the attendees were mostly developers from big enterprises with hands-on experience with Sun Cluster and/or other clustering technologies.

I gave a presentation on Open High Availability Cluster in both cities. The audience, especially in Tokyo, was very engaged in the talk. I got dozens of questions, mostly on the hardware (network interfaces, storage devices) supported by Open HA Cluster, quorum, live upgrade, and Oracle RAC on Open HA Cluster. There was even one question about Sun Cluster Geographic Edition.

I also got the opportunity to demonstrate Open HA Cluster on my laptop in the booth area. A lot of people stopped by, especially in Beijing, to look at the demo. I had a failover Apache service configured in two zones on my laptop, running Solaris Express 9/07 and Sun Cluster Express 7/07. When the Apache service went down (killed with kill -9), Open HA Cluster would restart the service. When the zone was rebooted, Open HA Cluster would fail over the service to the other zone. Many of the questions I got were about the hardware configuration on the laptop, scalable services, and Solaris zones. I also got a lot of compliments on my little IBM ThinkPad X61s. It demonstrated that Open HA Cluster is ready to run on most laptops with off-the-shelf hardware.

A couple of days after the event, I received an email from a manager at a company in Beijing, inquiring about documentation on setting up zones and Open HA Cluster on Solaris Express. This is encouraging. I am glad to see someone out there exploring this technology of ours.

Elaine Ding
Solaris Cluster Engineering


Oracle Solaris Cluster Engineering Blog

