ScalDeviceGroup and ScalMountPoint RTs
By hhnguyen on Nov 20, 2006
Sun Cluster 3.2 brings you a set of cool features that enhance the RAS (Reliability, Availability & Serviceability) of your applications. One such feature is a pair of resource types (RTs): ScalDeviceGroup and ScalMountPoint. It is a small feature, but an important one for the availability of your applications with respect to the storage and file system resources they use.
The ScalDeviceGroup RT increases the availability and diagnosability of your applications by monitoring the health of the logical volumes they use and indirectly causing your applications to be taken offline when it detects a problem with those volumes. The volumes must be managed by a volume manager; currently only Solaris Cluster Volume Manager and Veritas Cluster Volume Manager are supported. If your application depends on the ScalDeviceGroup resource through an offline-restart dependency, it works this way: as soon as the ScalDeviceGroup resource detects a problem with the storage that it is monitoring, it disables itself; because of the offline-restart dependency, your application is then immediately taken offline.
ScalDeviceGroup is a scalable RT, which means you can have different applications on different nodes using the same storage resources, with one ScalDeviceGroup resource monitoring those storage resources on each of the nodes. The great flexibility here is that detecting an actual or potential storage problem on one node does not necessarily cause the applications on all nodes to be taken offline: only the application on the node where the ScalDeviceGroup resource detected the problem is taken offline. For example, if one node loses its connection to shared storage while the remaining nodes still have access to it, the application on the former is taken offline while the applications on the other nodes stay online.
When the storage problem is fixed, enable the ScalDeviceGroup resource by issuing the following command:
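The command itself did not survive in this copy of the post. A minimal sketch of what it presumably looked like, assuming the resource is named scal_dg_rs as in Example 1 and that the standard clresource enable subcommand was meant. The enable_resource helper and its command -v guard are hypothetical additions so that the snippet only prints the command on a machine without the cluster tools:

```shell
# Hypothetical helper: enable a resource, or only print the command
# when the Sun Cluster CLI is not installed on this machine.
enable_resource() {
    if command -v clresource >/dev/null 2>&1; then
        clresource enable "$1"
    else
        echo "would run: clresource enable $1"
    fi
}

# Assumption: the resource name used in Example 1.
enable_resource scal_dg_rs
```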
Once the ScalDeviceGroup resource comes online, because of the offline-restart dependency, your application will immediately be brought online. It is as simple as that.
The ScalMountPoint RT increases the availability and diagnosability of your applications from the point of view of the file system mountpoints that they use. As with the ScalDeviceGroup RT, your application and the ScalMountPoint resource need to be set up with the offline-restart dependency relationship; as soon as the ScalMountPoint resource detects a problem with the mountpoint, your application is immediately taken offline. Note that the entity the ScalMountPoint RT monitors is the mountpoint where the file system is mounted: it periodically checks whether the mountpoint directory points to an available (and usable) file system. Currently, the ScalMountPoint RT supports only two types of file systems: Sun's Shared QFS file system and NetApp's NAS file system. When the problem with the mountpoint is fixed, enable the ScalMountPoint resource by issuing the following command:
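Here too the command is missing from this copy. A hedged sketch, assuming the resource and mountpoint names from Example 2 (scal_mountpoint_rs and /a_mountpoint_dir) and the standard clresource enable subcommand. The mountpoint_usable check is a hypothetical, simplified stand-in for "the problem is fixed"; it is not the probe that the ScalMountPoint RT itself runs:

```shell
# Assumptions: names taken from Example 2.
MOUNTPOINT=/a_mountpoint_dir
RS=scal_mountpoint_rs

# Hypothetical check: the directory exists and can be listed.
mountpoint_usable() {
    [ -d "$1" ] && ls "$1" >/dev/null 2>&1
}

if ! mountpoint_usable "$MOUNTPOINT"; then
    echo "$MOUNTPOINT is still not usable; not enabling $RS"
elif command -v clresource >/dev/null 2>&1; then
    # On a cluster node: re-enable the resource.
    clresource enable "$RS"
else
    # No cluster CLI here: only show the command.
    echo "would run: clresource enable $RS"
fi
```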
Once the ScalMountPoint resource comes online, your application will immediately be brought online. It is as simple as that.
The ScalDeviceGroup and the ScalMountPoint RTs bridge a gap between the applications and the underlying storage resources, whether volumes or file system mountpoints. Previously, applications had no visibility into the underlying storage resources and hence could not be controlled when those resources went bad. For example, if your application depends on a storage resource, say a volume, and that volume goes bad, the ScalDeviceGroup resource will detect the failure during its periodic probe and will indirectly cause your application to be taken offline, so that your application does not get into a "restart-and-fail" loop. Taking the application offline gracefully can prevent data loss or corruption. The big advantage here is that you will have a clue as to why your application suddenly went offline. Plus, you can tell your System Administrator what went wrong with the storage, as both the ScalDeviceGroup and the ScalMountPoint resources update their Status field with a brief message describing the kind of failure they detected in the storage resources. This Status field can be seen in the regular output of the Sun Cluster "clrs status" command. Having this much visibility into the state of the storage resources, while still being in the application layer, greatly increases the RAS of your applications and reduces potential system downtime.
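As a small illustration of the status check mentioned above, the sketch below asks for the status of just the two storage resources. The resource names are assumptions taken from Examples 1 and 2, clrs is the short form of clresource, and the command -v guard is an addition so the snippet only prints the command where the cluster tools are not installed:

```shell
# Assumption: resource names from Examples 1 and 2.
RESOURCES="scal_dg_rs scal_mountpoint_rs"

if command -v clresource >/dev/null 2>&1; then
    # Unquoted on purpose so each name is passed as its own argument.
    clresource status $RESOURCES
else
    echo "would run: clresource status $RESOURCES"
fi
```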
Although both the ScalDeviceGroup and the ScalMountPoint RTs were designed for use with Oracle RAC, the latter can be used by any scalable application. The ScalDeviceGroup RT, however, has a hard dependency on a resource of type SUNW.rac_svm (for Solaris Cluster Volume Manager) or SUNW.rac_cvm (for Veritas Cluster Volume Manager), so it can be used only with Oracle RAC.
Example 1: Configuring your application and the ScalDeviceGroup resource
Step i) Registering the ScalDeviceGroup RT with Sun Cluster
clresourcetype register ScalDeviceGroup
Step ii) Creating a Resource Group
clresourcegroup create -n Node1,Node2,Node3 scal_dg_rg
Step iii) Creating the ScalDeviceGroup resource (for Solaris Cluster Volume Manager)
clresource create -t ScalDeviceGroup -g scal_dg_rg -x DiskGroupName=dg1 scal_dg_rs
Step iv) Setting the offline-restart dependency between your application resource and the ScalDeviceGroup resource
clresource set -p Resource_dependencies_offline_restart=scal_dg_rs my_application_rs
Step v) Bringing online the Resource Group containing the ScalDeviceGroup resource
clresourcegroup online scal_dg_rg
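After these steps it can be useful to confirm that the dependency was recorded and that the resource group is online. A minimal sketch, assuming the names used in Example 1 and the clresource show and clresourcegroup status subcommands; the guard is an addition so the snippet only prints the commands on a machine without the cluster tools:

```shell
# Assumptions: names from Example 1.
APP_RS=my_application_rs
DG_RG=scal_dg_rg

if command -v clresource >/dev/null 2>&1; then
    # Show the offline-restart dependency set in Step iv.
    clresource show -p Resource_dependencies_offline_restart "$APP_RS"
    # Show on which nodes the resource group is online.
    clresourcegroup status "$DG_RG"
else
    echo "would run: clresource show -p Resource_dependencies_offline_restart $APP_RS"
    echo "would run: clresourcegroup status $DG_RG"
fi
```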
Example 2: Configuring your application and the ScalMountPoint resources
Step i) Registering the ScalMountPoint RT with Sun Cluster
clresourcetype register ScalMountPoint
Step ii) Creating a Resource Group
clresourcegroup create -n Node1,Node2,Node3 scal_mountpoint_rg
Step iii) Creating the ScalMountPoint resource
clresource create -t ScalMountPoint -g scal_mountpoint_rg \
    -x MountPointDir=/a_mountpoint_dir -x FileSystemType=nas \
    -x TargetFileSystem=netapp21:/export/database1 scal_mountpoint_rs
Step iv) Setting the offline-restart dependency between your application resource and the ScalMountPoint resource
clresource set -p Resource_dependencies_offline_restart=scal_mountpoint_rs my_application_rs
Step v) Bringing online the Resource Group containing the ScalMountPoint resource
clresourcegroup online scal_mountpoint_rg
Sun Cluster Engineering