Sun Cluster Service Level Management
By hhnguyen on Oct 24, 2006
The soon-to-be-released Sun Cluster 3.2 has a great feature called "Sun Cluster Service Level Management". With this feature, you can view telemetry on how the cluster uses system resources. After a very easy setup with the clsetup/scsetup command, you can view CPU, memory, swap and network utilization for cluster nodes, resource groups, and individual system components such as disks and adapters.
Another interesting feature is CPU control for Sun Cluster resource groups. This functionality is built on the CPU control facility available in the Solaris operating system. For example, in Sun Cluster 3.2 running on Solaris 10, you can:
- Assign CPU shares to resource groups running in the global zone or in non-global zones.
- Set the minimum or maximum number of processors in a dedicated processor set for resource groups.
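To make the idea of CPU shares concrete, here is a minimal Python sketch. It assumes share-based control behaves like the Solaris fair share scheduler, where shares are relative weights and each group's guaranteed CPU fraction under full contention is its shares divided by the total. The resource group names and share values are hypothetical, and this is an illustration of the arithmetic, not of how Sun Cluster applies the shares.

```python
def cpu_fractions(shares):
    """Return each group's CPU fraction under proportional sharing."""
    total = sum(shares.values())
    if total == 0:
        raise ValueError("at least one group needs a non-zero share")
    return {name: count / total for name, count in shares.items()}


# Hypothetical resource groups and share values, for illustration only.
shares = {"nfs-rg": 30, "db-rg": 60, "web-rg": 10}
fractions = cpu_fractions(shares)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of CPU when all groups are busy")
```

Because shares are weights, a group can use more than its fraction whenever the other groups are idle; the fraction only matters when the groups compete for CPU.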
By monitoring system resource usage through Sun Cluster, you can collect data that reflects how a service using specific system resources is performing, and discover resource bottlenecks, overloads, or even underutilized hardware resources. Based on this data, you can assign applications to nodes that have the necessary resources and choose which node to fail over to.
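As a rough illustration of using utilization data to choose a failover target, the sketch below picks the standby node with the lowest average CPU utilization. The node names, the sample values, and the selection rule itself are assumptions made for this example; Sun Cluster does not necessarily choose nodes this way.

```python
from statistics import mean


def pick_failover_node(cpu_samples, current_primary):
    """Return the standby node with the lowest average CPU utilization."""
    candidates = {
        node: mean(values)
        for node, values in cpu_samples.items()
        if node != current_primary and values
    }
    if not candidates:
        raise ValueError("no standby node has utilization data")
    return min(candidates, key=candidates.get)


# Hypothetical CPU utilization samples (percent) per node.
cpu_samples = {
    "node1": [72.0, 80.5, 77.0],
    "node2": [35.0, 41.5, 38.0],
    "node3": [55.0, 60.0, 52.5],
}

target = pick_failover_node(cpu_samples, current_primary="node1")
print(target)
```

In practice you would probably weigh memory, swap and network data as well, but the same pattern applies: collect per-node data first, then rank the candidates.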
Sun Cluster Service Level Management uses its own Derby-based database to store telemetry data and needs to be configured along with a Sun Cluster HAStoragePlus resource. The database needs highly available storage (in the form of a mount point) that is monitored by the HAStoragePlus resource, so that all the nodes of the cluster can access the telemetry data.
Here is a small experiment you can carry out on the Sun Cluster 3.2 beta software to see the power of Service Level Management. In my experiment, I configured a Highly Available NFS (HA-NFS) service and monitored its system resource utilization. I used filebench and four Sun Fire V210 NFS clients to generate traffic. A load of about 20,000 files was driven for 3 minutes. System resource utilization for the disks, network, resource group and nodes was observed using Sun Cluster Manager and the command line interface. A threshold was set on write throughput for the disks, which were configured in an SVM metaset, to produce an alarm if throughput exceeded 50 KB/sec.
After configuring the HAStoragePlus resource for Derby, all you need to do is:
- Permit system resource monitoring on resource groups. In this case, HA-NFS.
- View and enable monitoring of additional telemetry attributes other than the default ones.
- View and optionally modify the polling interval for telemetry data collection.
- Set a threshold on a telemetry attribute. In this case, wbyte.rate for a disk.
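To show what a threshold on a write-rate attribute amounts to, here is a small Python sketch. It derives a write rate from a cumulative bytes-written counter sampled at a fixed polling interval and flags every interval that exceeds the 50 KB/sec limit used in the experiment. The sample values, the 60-second interval, and the choice of 1 KB = 1024 bytes are assumptions; how Sun Cluster computes wbyte.rate internally is not covered here.

```python
THRESHOLD_KB_PER_SEC = 50.0  # limit used in the experiment


def write_rates_kb(samples):
    """Turn (timestamp_sec, cumulative_bytes) samples into KB/sec rates."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((t1, (b1 - b0) / 1024 / elapsed))
    return rates


def alarms(samples, threshold=THRESHOLD_KB_PER_SEC):
    """Return the (timestamp, rate) pairs that exceed the threshold."""
    return [(t, rate) for t, rate in write_rates_kb(samples) if rate > threshold]


# Hypothetical counter readings taken every 60 seconds.
samples = [(0, 0), (60, 1_228_800), (120, 6_144_000), (180, 7_372_800)]

for timestamp, rate in alarms(samples):
    print(f"t={timestamp}s: write rate {rate:.1f} KB/sec exceeds threshold")
```

A shorter polling interval catches brief bursts that a longer one would average away, which is one reason to review the interval before setting a threshold.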
Here are some more graphs generated by Sun Cluster Manager for resource utilization.
With more features coming its way, I think Service Level Management will be a much-needed and well-liked feature of Sun Cluster. With the help of Service Level Management:
- There won't be any need for hefty shell scripts to monitor disk utilization, disk space and throughput.
- Similarly, there won't be any need for third-party monitoring products.
- It is possible to define Service Level Agreements (SLAs) for Sun Cluster.
- You get a consolidated view of system resource utilization on a per-resource-group and per-resource basis.
- You can view 24 hours of resource utilization data.
Sun Cluster Engineering