ScalDeviceGroup and ScalMountPoint RTs

Sun Cluster 3.2 brings you a set of cool features that enhance the RAS (Reliability, Availability & Serviceability) of your applications. One such cool feature is the pair of resource types (RTs) ScalDeviceGroup and ScalMountPoint. It is a small feature, but it is very important for the availability of your applications with respect to the storage and file system resources that they use.

The ScalDeviceGroup RT increases the availability and diagnosability of your applications by monitoring the health of the logical volumes they use (volume-manager based; currently only Solaris Cluster Volume Manager and Veritas Cluster Volume Manager are supported) and indirectly causing your applications to be taken offline when it detects a problem with those volumes. If your application and the ScalDeviceGroup resource are set up with an offline-restart dependency, your application is taken offline as soon as the ScalDeviceGroup resource detects a problem with the storage. It works this way: as soon as the ScalDeviceGroup resource detects a problem with the storage it is monitoring, it disables itself; because of the offline-restart dependency, your application is immediately taken offline.

ScalDeviceGroup is a scalable RT, which means you can have different applications on different nodes using the same storage, with one ScalDeviceGroup resource monitoring that storage on each of the nodes. The great flexibility here is that the detection of an actual or potential storage problem on one node does not necessarily cause the applications on all nodes to be taken offline; only the application on the node where the ScalDeviceGroup resource detected the problem is taken offline. For example, if one node loses its connection to shared storage while the remaining nodes still have access to that storage, the application on the former is taken offline while the applications on the rest of the nodes stay online. When the storage problem is fixed, enable the ScalDeviceGroup resource by issuing the following command (substituting the name of your ScalDeviceGroup resource for the operand):

clresource enable <resource_name>

Once the ScalDeviceGroup resource comes back online, your application is immediately brought back online because of the offline-restart dependency. It is as simple as that.
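Because ScalDeviceGroup is a scalable resource, you can also check its state on each node before and after re-enabling it. A small sketch, assuming the resource names scal_dg_rs and my_application_rs used in Example 1 below:

clresource status scal_dg_rs my_application_rs

The output lists the state of each resource on every node in the resource group's node list, which tells you at a glance on which node the application was taken offline.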

The ScalMountPoint RT increases the availability and diagnosability of your applications from the point of view of the file system mountpoints that they use. As with the ScalDeviceGroup RT, your application and the ScalMountPoint resource need to be set up with an offline-restart dependency; as soon as the ScalMountPoint resource detects a problem with the mountpoint, your application is immediately taken offline. Note that the entity the ScalMountPoint RT monitors is the mountpoint where the file system is mounted: it periodically checks whether the mountpoint directory points to an available (and usable) file system. Currently, the ScalMountPoint RT supports only two types of file systems: Sun's shared QFS file system and NetApp's NAS file system. When the problem with the mountpoint is fixed, enable the ScalMountPoint resource by issuing the following command (again substituting the name of your ScalMountPoint resource):

clresource enable <resource_name>

Once the ScalMountPoint resource comes back online, your application is immediately brought back online. It is as simple as that.
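Before re-enabling the resource, you may want to confirm from the node itself that the mountpoint is again backed by the expected file system. A minimal sketch, assuming the mountpoint /a_mountpoint_dir used in Example 2 below:

df -k /a_mountpoint_dir

If the mount is healthy again, df reports the target file system backing that directory.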

The ScalDeviceGroup and ScalMountPoint RTs bridge a gap between applications and the underlying storage resources, whether volumes or file system mountpoints. Previously, applications had no visibility into the underlying storage resources and hence could not be controlled when a storage resource went bad. For example, if your application depends on a storage resource, say a volume, and that volume goes bad, the ScalDeviceGroup resource detects the failure during its periodic probe and indirectly causes your application to be taken offline, so that your application does not get into a "restart-and-fail" loop. Taking the application offline gracefully can prevent data loss or corruption. The big advantage here is that you get a clue as to why your application suddenly went offline. You can also tell your system administrator what went wrong with the storage, because both the ScalDeviceGroup and ScalMountPoint resources update their Status field with a brief message describing the kind of failure they detected. This Status field appears in the regular output of the Sun Cluster "clrs status" command. Having this much visibility into the state of the storage resources, while still being in the application layer, greatly increases the RAS of your applications and reduces potential system downtime.
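For example, here is a hedged sketch of what that output might look like for a ScalDeviceGroup resource. The resource name, node names, and message text are purely illustrative; the exact wording depends on the failure that was detected:

clrs status scal_dg_rs

=== Cluster Resources ===

Resource Name   Node Name   State     Status Message
-------------   ---------   -----     --------------
scal_dg_rs      Node1       Offline   Faulted - Problem detected with device group dg1
scal_dg_rs      Node2       Online    Online
scal_dg_rs      Node3       Online    Online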

Though both the ScalDeviceGroup and the ScalMountPoint RTs were designed to be used for the Oracle RAC application, the latter can be used by any scalable application. Because of a hard dependency on a resource of type SUNW.rac_svm (for Solaris Cluster Volume Manager) or SUNW.rac_cvm (for Veritas Cluster Volume Manager), the ScalDeviceGroup RT can be used only with Oracle RAC.
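Because of that hard dependency, a resource of type SUNW.rac_svm (or SUNW.rac_cvm) must already be online before you create a ScalDeviceGroup resource. The following is only a rough sketch of that prerequisite setup for Solaris Cluster Volume Manager; the resource group and resource names (rac-framework-rg, rac_framework, rac_svm) are illustrative, and the authoritative procedure is in the Sun Cluster Data Service for Oracle RAC Guide:

clresourcetype register SUNW.rac_framework SUNW.rac_svm
clresourcegroup create -S -n Node1,Node2,Node3 rac-framework-rg
clresource create -g rac-framework-rg -t SUNW.rac_framework rac_framework
clresource create -g rac-framework-rg -t SUNW.rac_svm -y Resource_dependencies=rac_framework rac_svm
clresourcegroup online rac-framework-rg

The rac_svm resource created here is the one referenced by the Resource_dependencies setting in Example 1 below.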

Example 1: Configuring your application and the ScalDeviceGroup resource

Step i) Registering ScalDeviceGroup RT with Sun Cluster

clresourcetype register ScalDeviceGroup

Step ii) Creating a scalable Resource Group

clresourcegroup create -S -n Node1,Node2,Node3 scal_dg_rg

Step iii) Creating the ScalDeviceGroup resource (for Solaris Cluster Volume Manager)

clresource create -t ScalDeviceGroup -g scal_dg_rg \
    -x DiskGroupName=dg1 -y Resource_dependencies=rac_svm scal_dg_rs

Step iv) Setting the offline-restart dependency between your application resource and the ScalDeviceGroup resource

clresource set -p Resource_dependencies_offline_restart=scal_dg_rs my_application_rs

Step v) Bringing online the Resource Group containing the ScalDeviceGroup resource

clresourcegroup online scal_dg_rg
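
At this point you can verify that everything came up as expected. A small check using the names from this example:

clresourcegroup status scal_dg_rg
clresource status scal_dg_rs

Both commands should show the resource group and the ScalDeviceGroup resource online on Node1, Node2 and Node3.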

Example 2: Configuring your application and the ScalMountPoint resource

Step i) Registering the ScalMountPoint RT with Sun Cluster

clresourcetype register ScalMountPoint

Step ii) Creating a scalable Resource Group

clresourcegroup create -S -n Node1,Node2,Node3 scal_mountpoint_rg

Step iii) Creating the ScalMountPoint resource

clresource create -t ScalMountPoint -g scal_mountpoint_rg \
    -x MountPointDir=/a_mountpoint_dir -x FileSystemType=nas \
    -x TargetFileSystem=netapp21:/export/database1 scal_mountpoint_rs

Step iv) Setting the offline-restart dependency between your application resource and the ScalMountPoint resource

clresource set -p Resource_dependencies_offline_restart=scal_mountpoint_rs my_application_rs

Step v) Bringing online the Resource Group containing the ScalMountPoint resource

clresourcegroup online scal_mountpoint_rg
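
As a side note, if you are creating the application resource from scratch rather than modifying an existing one, the offline-restart dependency can be declared at creation time instead of with a separate clresource set. A sketch, in which the application's resource type (SUNW.my_app_rt) and resource group (my_app_rg) are hypothetical placeholders:

clresource create -t SUNW.my_app_rt -g my_app_rg \
    -p Resource_dependencies_offline_restart=scal_mountpoint_rs my_application_rs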

Krishnendu Sadhukhan
Sun Cluster Engineering

Comments:

why is cluster command so different between sc3.1 and sc3.2?

Posted by henry on November 20, 2006 at 12:12 PM PST #

Hi, sc3.2 has a new set of CLIs which are object-oriented and easier to use and remember. sc3.2 also still supports the old CLIs - you can use whichever you're comfortable with. The new CLIs are prefixed with "cl" while the old ones have the "sc" prefix.

Posted by zoram on November 20, 2006 at 04:43 PM PST #
