Monday Jan 26, 2009

Solaris Cluster 3.2U2 is now available

On behalf of the entire Solaris Cluster team, I am happy to announce the release of Solaris Cluster 3.2 1/09 today. This new release brings more features for high availability, virtualization, disaster recovery, flexibility, diagnosability, and ease of use. It also adds support for the latest versions of many third-party software applications.

If you would like to run Oracle RAC in Solaris Containers (non-global zones), also called a zone cluster, check it out! If you would rather choose whether or not a device is fenced in the event of a failure, you now have that option. You no longer need to create a separate partition for /globaldevices before installing your cluster, and you can use ZFS for your root partition. "Exclusive-IP Zones" are now supported with Solaris Cluster to help you isolate Solaris Containers at the IP level. A new configuration checker helps ensure that your cluster is not vulnerable in the event of failures.


Click on the links below to see what some of our team members have to say about the new features of Solaris Cluster 3.2 1/09.

  • Solaris Containers Cluster + Quorum Enhancements Flash / iPod
  • Solaris Containers Cluster for Oracle RAC Flash / iPod
  • Configuration Checker Flash / iPod
  • Optional Fencing + Optional Dedicated Partition Flash / iPod
  • Exclusive IP in Solaris Containers Flash / iPod
  • Agents Flash / iPod
  • Business Continuity with Solaris Cluster Geographic Edition Flash / iPod NEW

You can also read the detailed description of the new features in the Sun Cluster 3.2 1/09 Release Notes. Stay tuned for more details here.

Download Solaris Cluster 3.2 1/09 software here.

Jatin Jhala

Engineering Manager/Solaris Cluster 3.2U2 Release Lead

Monday Sep 29, 2008

Provisioning Sun Cluster using Sun N1 Service Provisioning System

The Sun Cluster product comes with various utilities that help you manage a single instance of a cluster. These include a tool to install the Sun Cluster software and various data services, a GUI to manage Sun Cluster configuration details and status, and so on. However, if you are planning to deploy many clusters in your datacenter, you will probably feel the need for a tool that eases the pain of planning and deploying multiple configurations.

The Sun Cluster framework plugin for N1 SPS is designed to fill that gap. If you are not very familiar with the N1 SPS technology, take a look here. Essentially, it is free-to-download software that can provision the OS and deploy applications. N1 SPS allows developers to build customized models for specific applications that integrate with the main SPS infrastructure. The Sun Cluster plugin for N1 SPS is one such plugin, and it seamlessly integrates basic Sun Cluster deployment scenarios into the SPS framework.

The main functionality provided by this plugin is to create Sun Cluster configurations. In addition, you can do the following:

    o delete Sun Cluster configurations
    o add nodes to existing cluster configurations
    o remove nodes from existing cluster configurations
    o install or uninstall Sun Cluster framework software
    o install or uninstall Sun Cluster agent software
    o manage quorum disks.

Once you import the Sun Cluster framework plugin into the SPS framework and perform a few setup tasks (refer to the readme file included in the plugin), you can map out your planned cluster configurations. The software provides a view of the available hosts and allows you to define variables specific to a cluster or a host. With a few more clicks, your brand-new clusters will be ready to use!

The software is being made available as a free, unsupported download here. Documentation for the software can be found here.

Happy deploying!

Asit Chakraborti
Solaris Cluster Engineering

Tuesday Sep 16, 2008

An Invitation from Solaris Cluster to Visit Us at Oracle Open World

We published a blog entry on this website on Monday, August 18th, inviting you all to come see us at Oracle Open World next week at the Moscone Center in San Francisco. If that wasn't enough to convince you (or even if it was), please take a look at "An Invitation from Sun Microsystems" posted here.

Both Paul and I will be at OOW and hope to see you there. And we have a special gift for everyone who can guess what the strange background noises are in the video (no, those are not the sounds of the Sun demo booth being built...).

Burt Clouse
Senior Engineering Manager, Solaris Cluster

Monday Jul 14, 2008

LDoms guest domains supported as Solaris Cluster nodes

Folks, when we announced support for Solaris Cluster in LDoms I/O domains late last year in this blog entry, we also hinted at support for LDoms guest domains. It has taken a bit longer than we envisaged, but I am pleased to report that SC Marketing has just announced support for LDoms guest domains with Solaris Cluster!

So, what exactly does "support" mean here? It means that you can create an LDoms guest domain running Solaris, treat that guest domain as a cluster node by installing SC software inside it (specific version and patch information is noted later in the blog), and have the SC software work with the virtual devices in the guest domain. The technically inclined reader would, at this point, have several questions pop into their head: How exactly does SC work with virtual devices? What do I have to do to make SC recognize these devices? Are there any differences between how SC is configured in LDoms guest domains and in non-virtualized environments? Read on below for a high-level summary of the specifics:

  • For shared storage devices (i.e., those accessible from multiple cluster nodes), the virtual device must be backed by a full SCSI LUN. That means no file-backed virtual devices, no slices, and no volumes. This limitation exists because SC needs advanced features in the storage devices to guarantee data integrity, and those features are available only for virtual storage devices backed by full SCSI LUNs. (See the first example after this list.)

  • You may need unshared storage (i.e., storage accessed from only one cluster node) for things such as installing the OS image for the guest domain. For such usage, any type of virtual device can be used, including devices backed by files in the I/O domain. However, make sure to configure such virtual devices to be synchronous; check the LDoms documentation and release notes on how to do that. Currently (as of July 2008), you need to add "set vds:vd_file_write_flags = 0" to the /etc/system file in the I/O domain exporting the file (see the second example after this list). This is required because the Cluster stores some key configuration information on the root filesystem (in /etc/cluster) and expects that information to be written synchronously to disk. If the root filesystem of the guest domain is on a file in the I/O domain, it needs this setting to be synchronous.

  • Network-based storage (NAS, etc.) is fine when used from within the guest domain. Check the cluster support matrix for specifics; LDoms guest domains do not change this support.

  • For the cluster private interconnect, the LDoms virtual device "vnet" works just fine; however, the virtual switch to which it maps must have the option "mode=sc" specified. Essentially, you add the argument "mode=sc" to the ldm add-vsw subcommand when creating the virtual switch that will be used for the cluster private interconnect inside the guest domains (see the third example after this list). This option enables a fastpath in the I/O domain for the Cluster heartbeat packets so that they do not compete with application network packets in the I/O domain for resources. This greatly improves the reliability of the Cluster heartbeats, even under heavy load, leading to very stable cluster membership for applications to work with. Note, however, that good engineering practices should still be followed when sizing your server resources (in both the I/O domain and the guest domains) for the application load expected on the system.

  • With this announcement, all features of Solaris Cluster supported in non-virtualized environments are supported in LDoms guest domains, unless explicitly noted in the SC release notes. Some limitations come from LDoms themselves, such as the lack of jumbo frame support over virtual networks or the lack of link-based failure detection with IPMP in guest domains. Check the LDoms documentation and release notes for such limitations, as support for these missing features is improving all the time.

  • For support of specific applications with LDoms guest domains and SC, check with your ISV. Support for applications in LDoms guest domains is improving all the time, so check often.

  • Software version requirements: LDoms_1.0.3 or higher, and S10U5 with patches 137111-01, 137042-01, 138042-02, and 138056-01 or higher, are required in BOTH the LDoms guest domains and the I/O domains exporting virtual devices to the guest domains. Solaris Cluster SC32U1 (3.2 2/08) with patch 126106-15 or higher is required in the LDoms guest domains.

  • Licensing for SC in LDoms guest domains follows the same model as for the I/O domains. You basically pay for the physical server, irrespective of how many guest domains and I/O domains are deployed on that physical server.
  • That covers the high-level overview of how SC is deployed inside LDoms guest domains. Check out the SC release notes for additional details and some sample configurations. The whole virtualization space is evolving very rapidly, and new developments are happening all the time. Keep this blog page bookmarked and visit it frequently to find out how Solaris Cluster is evolving along with this space.
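
To make the shared-storage rule above concrete, here is a rough sketch of exporting a full SCSI LUN (rather than a file or volume) from the I/O domain and adding it to a guest domain. The device path (slice 2 is the conventional whole-disk device), volume, service, and domain names are placeholders for illustration, so substitute your own:

    # ldm add-vdsdev /dev/dsk/c3t4d0s2 sharedvol1@primary-vds0
    # ldm add-vdisk shareddisk1 sharedvol1@primary-vds0 guest1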

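For the synchronous-write setting mentioned in the unshared-storage bullet, a minimal sketch in the I/O domain that exports the file-backed disk is the following; remember that /etc/system changes take effect only after that domain is rebooted:

    # echo 'set vds:vd_file_write_flags = 0' >> /etc/system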

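And for the private interconnect, a sketch of creating a virtual switch with "mode=sc" and giving a guest domain a vnet on it; the switch, backing network device, and domain names are again made up for illustration:

    # ldm add-vsw mode=sc net-dev=e1000g1 private-vsw1 primary
    # ldm add-vnet private-net1 private-vsw1 guest1
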
Ashutosh Tripathi
Solaris Cluster Engineering

Friday Jul 11, 2008

Introduction to PxFS and insight on global mounting

If you are using Sun Cluster software, you are using the Proxy File System (PxFS). Global devices are made possible by PxFS, and global devices are central to device management in a cluster. The source is out there, and now is a good time to explain some of the PxFS magic. I will give an overview of the PxFS architecture, with source references, over multiple blog entries. In this entry I will introduce PxFS and explain global mounting.

PxFS is a protocol layer that distributes a disk-based file system among cluster nodes in a POSIX-compliant and highly available manner. POSIX-compliant simultaneous access from multiple nodes is possible without demanding file-level locking from applications. The only requirement on the administrator for a global mount is to make sure the mount point exists on all cluster nodes. After that, add "-g" to the mount command and your mount becomes global. The following explains the terminology.

First, let me show how easy it is to create and mount a UFS file system globally, without even using a dedicated physical device. I will create a lofi device, format it as UFS, and mount it globally.

Note: Do not try this in Solaris 9, as that Solaris version has an lofs bug that can panic the system.

    # mkfile 100m /var/tmp/100m
    # LOFIDEV=`lofiadm -a /var/tmp/100m`
    # yes | newfs ${LOFIDEV}

Let us mount the above lofi device cluster-wide (make sure the target directory exists on all nodes).

    # mount -g ${LOFIDEV} /mnt

Done! You can access /mnt on any node of the cluster and transparently reach the UFS file system on the lofi device on node1.

Now let us get into the details of global mounting. I will take the example of globally mounting a file system on shared storage. We have a three-node cluster, with node2 and node3 having a direct connection to the shared storage. The SVM metadevice "/dev/md/mydg/dsk/d42" is being mounted globally on the directory "/global/answer" from node1.
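
For reference, that scenario boils down to the following commands; the metadevice and mount point are the ones from this example, and the mount point must be created on every cluster node before the mount is issued:

    # mkdir -p /global/answer
    # mount -g /dev/md/mydg/dsk/d42 /global/answer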

For code reference, startup of the PxFS services happens here.

The mount subsystem is an HA service. An HA service, in cluster parlance, is a service with failover capability. Any HA service has one primary and one or more secondaries, and any of the secondaries can become the primary if the current primary dies. This promotion of a secondary to primary is transparent to applications.

For any cluster setup, there is always exactly one mount-service primary. All other nodes have mount-service secondaries. Every cluster node also has a mount client, created when global mounts are first enabled for the node.

The mount primary and secondary are two faces of the mount replica object, which is created when the node joins the cluster. This is the code that creates the mount replica server. The replica framework ensures that there is only one primary at a time and promotes a secondary to primary when needed.

Now for the sequence of operations during a global mount. Refer to the picture below; the various steps of the global mount are numbered in sequence. Pointing your mouse at a number pops up a tooltip explaining that step, with links to the corresponding code.

[Figure: image map with tooltips for steps 1 through 6 of the global mount sequence]

Here are the steps from the above image map for easy reading.

1. The global mount command, mount -g, can be issued from any cluster node. It gets into the kernel, and the generic mount code redirects the call to PxFS. At this point, the directory to be mounted on is locked.

2. The PxFS client tells the mount server about this global mount request via the mount client on that node. The mount client holds the server reference.

3. The mount server in turn asks every client except the originating node, in this case node1, to lock the mount point.

4. For shared devices, the mount server creates a PxFS primary and secondary. The node on which the device is primaried becomes the PxFS primary. For local devices, the mount is non-HA and an unreplicated PxFS server is created. The lofi device example above results in an unreplicated PxFS server being created on node1.

   The PxFS server does a hidden mount of the device. Details of the mount are contained in the PxFS server object.

5. The mount server passes a reference to the newly created server to all mount clients and asks the clients to do a user-visible PxFS mount.

6. The mount client creates and adds a vfs_t entry of the same type as the underlying file system.

Now the mount is visible on all clients. The mount subsystem does some more magic, like starting a file-system replica when a node joins the cluster, or creating a new PxFS secondary or primary when a node that is connected to storage joins the cluster. The next installment will be about how regular file access works in PxFS.

Thanks to Walter Zorn for the JavaScript library that made the tooltips so much easier.

Binu Philip
Solaris Cluster Engineering
