News, tips, partners, and perspectives for the Oracle Solaris operating system

Eating our own dog food

Guest Author

You have heard the saying "practice what you preach", and here at the Solaris Cluster Oasis we often talk about how important high availability is for your critical applications. Beyond the good sense of using our own products, there is no substitute for actually using your own product day in and day out. It gives us engineers a very important dose of reality, in that any problem with the product has a direct impact on our own daily work. That raises the question: how is the Solaris Cluster group dealing with its own high availability needs?

In this blog entry we teamed up with our Solaris Community Labs team to give regular visitors to the Oasis a peek into how Solaris Cluster helps run key pieces of our own internal infrastructure. While a lot of Sun's internal infrastructure uses Solaris Cluster, for this blog entry we ended up choosing one of the internal clusters used directly by the Solaris Cluster engineering team: it hosts the team's home directories (yes, that's right, the home directories where all of our stuff lives are on Solaris Cluster) and developer source code trees.

See below for a block diagram of the cluster; more details about the configuration follow the diagram.


Here are some more specifications of the cluster:

- Two T2000 servers

- Storage consists of four 6140 arrays presenting RAID 5 LUNs. We chose the 6140s to provide the RAID, partly because they were already there, and partly to leverage the disk cache on these boxes to improve performance

- Two zpools configured as RAID 1+0, one for home directories and another for workspaces (workspace is engineer-speak for a source code tree)

- Running Solaris 10 5/08 (S10U5) and Solaris Cluster 3.2 2/08 (SC3.2U1)
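For readers curious what such a setup looks like on the command line, here is a rough sketch of how the mirrored pools and an HA-NFS resource group might be created with the Solaris Cluster 3.2 CLI. The device names, hostnames, pool names, and paths below are illustrative placeholders, not the actual values used on this cluster.

```shell
# Create a mirrored (RAID 1+0) zpool from RAID 5 LUNs presented by the
# 6140 arrays (device names are placeholders).
zpool create homes mirror c4t0d0 c5t0d0 mirror c4t1d0 c5t1d0

# Resource group for the highly available home-directory NFS service;
# Pathprefix points at the directory holding the HA-NFS dfstab file.
clresourcegroup create -p Pathprefix=/homes/SUNW.nfs homes-rg

# Logical hostname the NFS clients mount from; it fails over with the group.
clreslogicalhostname create -g homes-rg -h homes-lh homes-lh-rs

# HAStoragePlus resource imports/exports the zpool during a failover.
clresource create -g homes-rg -t SUNW.HAStoragePlus \
    -p Zpools=homes homes-hasp-rs

# The HA-NFS resource itself, dependent on the storage being available.
clresource create -g homes-rg -t SUNW.nfs \
    -p Resource_dependencies=homes-hasp-rs homes-nfs-rs

# Bring the whole service online under cluster management.
clresourcegroup online -M homes-rg
```

A second, analogous resource group for the workspace pool would be mastered on the other node, giving the active-active split described below.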

High availability was a key requirement for this deployment, as downtime for a home directory server with a large number of users was simply not an option. For the developer source code, too, downtime would mean that long-running builds would have to be restarted, a costly loss of time; not to mention that having lots of very annoyed developers roaming the corridors of your workplace is never a good thing :-)

Note that it is not sufficient to merely move the NFS services from one node to the other during a failover; one also has to make sure that any client state (including file locks) is failed over. This ensures that the clients truly see no impact (apart from perhaps a momentary pause). Additionally, hosting each zpool on a different cluster node means that the compute power of both nodes is utilized while both are up, and services continue when one of them is down.

Not only did the users benefit from the high availability, but the cluster administrators also gained maintenance flexibility. Recently, the SAN fabric connected to this cluster was migrated from 2 Gbps to 4 Gbps, and a firmware update (performed in single-user mode) was needed on the Fibre Channel host bus adapters (FC-HBAs). The work was completed without impacting services, and the users never noticed. This was achieved simply by moving one of the zpools (along with the associated NFS shares and HA IP addresses) from one node to the other (with a simple click in the GUI) and upgrading the FC-HBA firmware on the idle node. Once the update was complete, we repeated the same steps on the other node, and the work was done!
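The same evacuation can be done from the command line. Here is a minimal sketch, assuming a resource group named homes-rg and cluster nodes named node1 and node2 (illustrative names, not the real ones):

```shell
# Evacuate the service from node1 by switching its resource group to node2;
# the zpool, NFS shares, and HA IP address all move together.
clresourcegroup switch -n node2 homes-rg

# node1 is now idle: boot it to single-user mode, flash the FC-HBA
# firmware, and rejoin it to the cluster. Then move the group back.
clresourcegroup switch -n node1 homes-rg
```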

While the above sounds useful for sure, we think there is a subtler point here: confidence in the product. Allow us to explain. Doing a hardware upgrade on a live production system as described above is interesting and useful, but what really matters is that the system administrator can do it without taking a planned outage. That is only possible if the administrator has full confidence that, no matter what, the applications will keep running and the end users will not be impacted. That is the TRUE value of having a rock-solid product.

We hope our readers found this example useful. I am happy to report that the cluster has been performing very well, and we haven't (yet) had any episodes of angry engineers roaming our corridors. Touch wood!

During the course of writing this blog entry, I got curious about the origins of the phrase "eating one's own dog food". Some googling led me to this page; apparently the phrase has its origins in TV advertising and came into IT jargon via Microsoft. Interesting...

Ashutosh Tripathi - Solaris Cluster Engineering

Rob Lagunas - Solaris Community Labs


Comments (5)
  • Abby Wednesday, April 1, 2009

    Do you have more information concerning your setup in the slide?

    Can this function with ZFS write logging to SSD enabled?

    How are the two backend SANs mirrored?

  • Paul Monday, December 28, 2009

    How do you manage to not have an outage given that SC is going to export the pool to move it to the other node?

  • ashu Monday, January 4, 2010

    Hi Paul,

    You are right that while the pool is being moved, the filesystem is not available. However, the way HA-NFS failover works, the NFS server is also stopped while the filesystem is unavailable, so the NFS clients simply keep trying to contact the server rather than erroring out. After the failover is complete, the NFS clients simply continue.

    So, from an end-user perspective, they just see a momentary blip in service, no errors. If the failover is fast enough (which, with Solaris Cluster, it indeed is, thanks to some pretty deep integration between SC and Solaris/NFS), the end effect is not really noticeable.

    Strictly speaking, though, is there an "outage", however small and low-impact it may be? You are right: yes, there is.



  • Paul Monday, January 4, 2010

    OK. I was kind of hoping that you'd done something that allowed failover without any blips in service at all.

    Can you give a little more detail as to what you're doing to the NFS server so clients don't explode?

    I would assume taking down the IP so the clients report server not responding and then retry, but if you're doing something more interesting it'd be nice to know.



  • ashu Friday, January 8, 2010

    Hi Paul,

    The details of how SC makes NFS failovers as smooth and fast as possible could certainly fill another blog entry. At a high level, though, the key pieces that come to mind are lock failover, in-kernel state preservation (for NFSv4), a way to very cheaply and quickly determine whether the NFS server is healthy, and avoiding triggering the grace period for NFS shares which are NOT failing over (NFSv4).

    IP failover is certainly part of it, as you mentioned. Another part is simply years of hammering at issues as they happened, to arrive at a solution which is rock solid. This involved not only Cluster but also changes and improvements in Solaris/NFS. As I write this, many specifics come to mind, but the larger point is that this solution has been "baking" for a long time, and a very close integration of SC and Solaris/NFS is the key.


