Virtualization is attracting a lot of attention these days. In an era of power-packed machines, few would want to dedicate an entire piece of costly hardware to a single cause. From running multiple applications to multiple operating systems (OS) to multiple hardware domains, it is all about making the most of a box. But to truly leverage virtualization, applications need to cooperate: most modern software runs on virtual platforms as if the virtualization were transparent, which is exactly the intent.
For this experiment, VMware was chosen as the virtualization technology, mostly because it has been around for quite some time now. Support for the Solaris(TM) 10 OS (both 32- and 64-bit versions) as a guest operating system is already present in multiple flavors of VMware. VMware ESX Server 3.0.1 (http://www.vmware.com/products/vi/esx/), with the features it provides, was the platform of choice. Documentation for VMware ESX Server can be found at http://www.vmware.com/support/pubs/vi_pubs.html. I did try my hands at VMware Workstation and VMware Server as well, but in a clustered setup with Sun Cluster 3.2 software they had problems in areas like private interconnects and device fencing.
The aim was to run Sun Cluster 3.2 software on top of Solaris 10 guest OSes, and thereby cluster VMware virtual machines running the Solaris OS. The initial assumption was that, since the Solaris OS runs on VMware without problems, Sun Cluster software should work just fine! It makes sense to mention up front that these are initial steps in this direction, and the Sun Cluster team is continuously investigating various virtualization techniques and Sun Cluster support for them. The setup described here was done on 32-bit Solaris (purely due to the hardware available at the time), but I strongly believe that things won't look or work any differently in a 64-bit environment.
Given below are the various aspects of the setup. All mention of nodes and guests here refers to the virtual Solaris guests on VMware ESX Server, which are to be clustered using Sun Cluster 3.2 software.
P.S.: Images are shown as thumbnails for the sake of brevity of the blog. Please click on the images to enlarge them.
I) Number of cluster nodes
The maximum number of nodes that can be supported is dictated by Sun Cluster software; having VMware in the picture doesn't affect this aspect. The setup here has been tried with two- and three-node clusters. For the purpose of illustration, we have up to 3 physical hosts (dual-CPU SunFire V20z machines with 4 GB RAM), with each of the clustered nodes on a different physical host. However, this could easily be extrapolated to the cluster members being on the same physical machine, or a combination thereof.
II) Storage

VMware ESX Server provides various ways to add virtual SCSI storage devices to the Solaris guests running on it. The VMware virtual devices can be backed by:
Direct-attached SCSI storage
Fibre Channel SAN arrays
iSCSI SAN arrays
In all cases where the VMware ESX Server abstracts any of the above underlying storage devices, so that the guest just sees a plain SCSI disk, the guest has no direct control over the device. The problem for clustering is that SCSI reservations don't work as expected in all such cases. The Sun Cluster fencing algorithms for shared devices require SCSI reservations, so reservations not working rules these devices out as shared storage. One could, however, still use such devices where the intent is not to share them between cluster nodes, that is, as local devices.
However, VMware ESX has a feature called Raw Device Mapping (RDM), which allows guest operating systems direct access to the devices, bypassing the VMware layer. More information on RDM can be found in the VMware documentation.
RDM works only with Fibre Channel or iSCSI storage. In the setup here, a SAN storage box connected through Fibre Channel was used to map LUNs to the physical hosts. These LUNs could then be mapped onto the VMware guests using RDM. SCSI reservations (both SCSI-2 Reserve/Release and SCSI-3) have been found to work fine with RDM, so these RDM devices can be used as shared devices between the cluster nodes. They can, of course, also serve as local devices for a node.
One point to note here is that the virtual SCSI controllers for the guest OSes need to be different for the local and the shared disks. This is a VMware requirement when sharing disks. Also the compatibility mode for RDM, to allow direct access to the storage from the guest, should be “Physical”. For detailed information, please refer to VMware ESX documentation.
Figure 1 (click to enlarge) is a screenshot of the storage configuration on a physical host. It shows the LUNs from the SAN storage that the ESX Server sees.
Figure 1. Storage Configuration (SAN through Fibre Channel) on a VMware ESX Server
Figure 2 (click to enlarge) is a peek at what the device configuration for a Solaris guest OS looks like. It shows that a few devices (hard disks) are Mapped Raw LUNs; this, of course, is done through RDM. Each such RDM mapping shows the virtual HBA adapter for the guest (vmhba1 here), the LUN ID from the SAN storage that is being mapped (28 here), and the SCSI bus location for that device (SCSI 1:0 for this guest). The disks which show “Virtual Disk” against them are devices abstracted by the VMware layer to the guest OS. Note that there are two SCSI controllers for the guest OS: SCSI Controller 0 is used for the local devices, and SCSI Controller 1 is used for devices that are shared with other guests. Also note that the compatibility mode for the RDM-mapped device is “Physical”. This makes sure that the guest OS has direct and uninhibited access to the device.
For sharing devices (mapped through RDM) between guests on the same physical host, one should enable “SCSI Bus Sharing” in the “Virtual Machine Properties” for the SCSI controller that caters to the shared devices, and set it to “Virtual”. (In Figure 2, SCSI Controller 1 is set up for sharing disks across physical hosts instead, hence the different setting.) Then choose “Use an existing Virtual Disk” while adding a hard disk, and select the “.vmdk” file for the device that is intended to be shared. For example, Figure 2 shows the location of the .vmdk file for “Hard Disk 2”.
Sharing RDM-mapped devices between guest OSes across physical hosts involves setting the SCSI Bus Sharing to "Physical", as shown in Figure 2, and mapping the same LUN from the SAN storage to the physical hosts running VMware ESX. Using RDM, one would then map the same LUN as a device on all guest OSes that need to share it. For example, Node 1 in this setup has LUN ID 28 mapped as "Hard Disk 2". The same LUN should be mapped as a hard disk in every other guest OS that is intended to share LUN ID 28 with Node 1.
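As a rough illustration only, the per-guest configuration that the VI Client generates for such a setup corresponds to ".vmx" entries along these lines (the datastore path and file name below are hypothetical; always let the VI Client generate the actual entries):

```
# Hypothetical .vmx fragment for a guest sharing an RDM-backed disk
# across physical hosts. Shown only to illustrate the shape of the config.
scsi1.present = "TRUE"
scsi1.sharedBus = "physical"            # SCSI Bus Sharing set to "Physical"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "/vmfs/volumes/storage1/node1/shared_lun28.vmdk"
scsi1:0.mode = "independent-persistent"
```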
Figure 3 (click to enlarge) shows the guest Solaris OS view of the devices added to it. Controller “c1” has the two local disks shown in Figure 2, and controller “c2” has the shared disks.
Figure 3. Guest Solaris OS Showing Devices Presented To It From VMware ESX
In addition to SCSI devices, the guests could also use iSCSI or NAS devices. The functioning and setup for them would be similar to that on a normal Solaris machine.
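On the Solaris guest itself, the newly presented devices can be verified with the usual tools. A minimal sketch (controller numbers will differ per setup):

```shell
# Rebuild the device tree so newly added virtual disks show up
devfsadm -C

# Non-interactively list all disks the guest sees; in this setup,
# c1 devices are the local virtual disks and c2 devices are the
# shared RDM-backed disks (as in Figure 3)
echo | format
```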
III) Quorum devices

Both SCSI and Quorum Server type quorum devices were tried out without problems. Do keep in mind that a SCSI quorum device should be added to the guest via RDM; the guest OS must have direct access to the device.
NAS Quorum is expected to work as is.
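For reference, quorum devices are configured with the standard Sun Cluster 3.2 commands, exactly as on physical hardware. A sketch, with hypothetical DID device and quorum-server names (check the clquorum(1CL) man page for exact option syntax):

```shell
# Add a shared RDM-backed disk (DID device d4 here, hypothetical)
# as a SCSI quorum device
clquorum add d4

# Or add a Quorum Server type quorum device
# (host address, port, and name are placeholders)
clquorum add -t quorum_server -p qshost=10.0.0.5 -p port=9000 qs1

# Verify quorum configuration and vote counts
clquorum status
```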
IV) Networking

VMware ESX Server's networking features are rich indeed. With virtual switches, VLAN support on virtual switches, NIC teaming, and so on, networking worked really well. In fact, for the setups here, a single NIC on the physical host was used to route traffic for the guest OSes. This included both public and private interconnect traffic in a clustered scenario. However, segregation at the virtual switch level or the physical NIC level can easily be achieved, and would be ideal in a production environment. One could either have separate VLANs (on the same virtual switch) for the public and private traffic, or have dedicated physical NICs mapped onto different virtual switches for each type of traffic.
Do note that the "pcn" driver, which gets loaded by default in Solaris running on VMware, can be a little unstable. It is therefore advisable to install the VMware Tools on all the Solaris guest OSes involved and switch to the "vmxnet" driver, which is pretty stable.
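To check which driver a guest is actually using, something along these lines works on Solaris (interface names are examples):

```shell
# Plumbed interfaces: pcn0 means the default AMD driver is in use,
# vmxnet0 means the VMware Tools driver has taken over
ifconfig -a

# Show which drivers are bound to devices in the device tree
prtconf -D | grep -i -e pcn -e vmxnet
```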
Figure 4 (click to enlarge) is a screenshot of the network configuration on a physical host. It shows the different virtual switches and the associated physical NICs, which cater to the traffic from the virtual machines on a physical host. Each virtual switch has a “Virtual Machine Port Group”, which is the interface to the external world for the guest OSes. In a typical production setup for Sun Cluster software, one could have a Port Group (say “VM Network”) for all public network traffic, and another dedicated Port Group (say “Interconnect Network 1”) for the private interconnect traffic between the cluster members.
Figure 4. Networking Setup on a Physical Host Running VMware ESX Server.
In this setup, since all traffic (public and private) from the clustered Solaris guests goes through a single Port Group, both the public and the private interconnect adapters for the guest show the same “VM Network” against them in Figure 2 shown earlier. We have leveraged single-NIC support for the private interconnects here. This can save a PCI slot on a VMware guest, which the user may want to use for adding more devices to the guest OS. Single-NIC support will be available to customers pretty soon in a Sun Cluster 3.2 patch.
Note that the maximum number of PCI slots available to each guest OS in VMware ESX Server 3.0.1 is 5. This means that the total number of NICs + SCSI controllers must be <= 5. For more information, refer to the VMware ESX documentation.
V) Closing comments
The hardware setup used for this experiment:
3 Dual CPU (2 X 1.792 GHz) SunFire V20zs with 4 GB RAM, 1 local disk, QLA 2300 FC-AL Adapter for SAN.
Sun StorEdge 3510 for SAN, connected to the physical hosts through Fibre Channel.
The cluster functions just as a normal cluster would. We created Solaris Volume Manager metasets/VxVM disk groups and configured Sun Cluster agents to make applications highly available. All in all, Sun Cluster 3.2 software runs in a virtualized VMware setup as expected, clustering Solaris guests and adding yet another dimension to usability and availability. An overview of the configured cluster can be seen here.
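Once installed, the usual Sun Cluster 3.2 status commands apply unchanged inside the VMware guests. For example (the metaset name below is a placeholder):

```shell
# Overall cluster health
cluster status

# Per-node, quorum, and resource-group views
clnode status
clquorum status
clresourcegroup status

# Shared storage, e.g. a Solaris Volume Manager metaset
metaset -s myset
```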
Sorry for the long post! But that was a handful of things to mention! Feedback/comments are welcome, as always.
Sun Cluster Engineering.