Sunday Dec 10, 2006

Making SMF services Highly Available with Sun Cluster

If you have written an SMF service for an application that runs on a single node and want to make that service highly available across multiple nodes, you can use Sun Cluster. Although the Sun Cluster service model differs from the SMF service model, adding high availability takes only a few simple commands, with no need to write a new agent script or any code.

To support SMF services, Sun Cluster 3.2 provides the following three new resource types:
- Proxy_SMF_failover resource type allows an SMF service to be managed as a failover resource.
- Proxy_SMF_multimaster resource type allows an SMF service to be online on more than one node simultaneously (without any load balancing).
- Proxy_SMF_scalable resource type allows an SMF service to run as a scalable resource (using the Sun Cluster load-balancing facility).

Here is an example that shows how easy it is to make a DNS server (an SMF service) highly available. We encapsulate the DNS server SMF service in an SMF proxy resource, dns_rs. Note that SC 3.1u4 already provides an HA-DNS agent, which is the recommended option for making a DNS server highly available on Sun Cluster; this is just an example.

1. Create a text file that specifies the name of the SMF service and the location of its manifest, as in the example below, and save it in any convenient location, such as /tmp.

# cat /tmp/dns_svcs
<svc:/network/dns/server:default>,</var/svc/manifest/network/dns/server.xml>
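If you prefer to script step 1, a minimal sketch could write the file with a here-document and run a rough format check on it. The check only assumes the `<FMRI>,<manifest path>` layout of the example line above; it does not confirm that the manifest exists, and Sun Cluster itself may accept entries this pattern rejects. The SVC_FILE variable and the check_proxied_file function are illustrative names, not part of Sun Cluster.

```shell
#!/bin/sh
# Hypothetical helper for step 1: write and sanity-check the file that is
# later passed to the Proxied_service_instances extension property.
SVC_FILE=${SVC_FILE:-/tmp/dns_svcs}

# One entry per line: <service FMRI>,<manifest path>, as in the example above.
cat > "$SVC_FILE" <<'EOF'
<svc:/network/dns/server:default>,</var/svc/manifest/network/dns/server.xml>
EOF

# check_proxied_file FILE: return 1 if FILE is missing, empty, or has a line
# that does not look like <svc:/...>,</....xml> (an assumed pattern based on
# the example; the manifest path itself is not checked).
check_proxied_file() {
    if [ ! -r "$1" ] || [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    # grep -v exits 0 only if at least one line fails the pattern.
    if grep -v -E '^<svc:/[^>]+>,</[^>]+\.xml>$' "$1" >/dev/null; then
        echo "malformed entry in $1" >&2
        return 1
    fi
    return 0
}

check_proxied_file "$SVC_FILE" && echo "ok: $SVC_FILE"
```

Run it on the node where you will create the resource; it should print `ok: /tmp/dns_svcs`.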


2. Register the Proxy SMF failover resource type.

# clrt register SUNW.Proxy_SMF_failover

3. Create a resource group, dns_rg, specifying the list of cluster nodes on which the service can run (in this example, plift1 and plift2).

# clrg create -n plift1,plift2 dns_rg

4. Add a resource, dns_rs, to manage the SMF service. Specify the path and name of the text file created in step 1 as the value of the extension property Proxied_service_instances.

# clrs create -g dns_rg -t Proxy_SMF_failover -x Proxied_service_instances=/tmp/dns_svcs dns_rs

5. Bring the resource group into the managed state and online.

# clrg online -M dns_rg

THAT'S IT!
The service is up and running, with all the supervision and failover capability provided by Sun Cluster. If a failure occurs, the resource group is automatically restarted or switched over to a different node.
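Steps 2 through 5 can be collected into one small script. This sketch runs the same commands as the post, except that it spells out the full type name SUNW.Proxy_SMF_failover in the clrs create call. The run wrapper and the DRY_RUN variable are hypothetical conveniences that print the commands instead of executing them, so you can review them before running for real.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 2-5; names and paths are the ones used in this post.
RG=dns_rg
RS=dns_rs
NODES=plift1,plift2
SVC_FILE=/tmp/dns_svcs

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop at the first command that fails.
setup_dns_ha() {
    run clrt register SUNW.Proxy_SMF_failover &&
    run clrg create -n "$NODES" "$RG" &&
    run clrs create -g "$RG" -t SUNW.Proxy_SMF_failover \
        -x Proxied_service_instances="$SVC_FILE" "$RS" &&
    run clrg online -M "$RG"
}
```

`DRY_RUN=1 setup_dns_ha` lists the four commands; `setup_dns_ha` on its own, as root on a cluster node, executes them.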

6. Verify the status of the resource group and the SMF proxy resource.

# clrg status
=== Cluster Resource Groups ===
Group Name   Node Name   Suspended   Status
----------   ---------   ---------   ------
dns_rg       plift1      No          Online
             plift2      No          Offline

# clrs status
=== Cluster Resources ===
Resource Name   Node Name   State     Status Message
-------------   ---------   -----     --------------
dns_rs          plift1      Online    Online but not monitored
                plift2      Offline   Offline
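When you script step 6, you may want to find the node on which a group is online instead of reading the table by eye. This is a rough awk sketch that assumes the `clrg status` layout matches the sample output above (one row that names the group, then indented rows for its other nodes); online_node is a hypothetical name.

```shell
#!/bin/sh
# online_node GROUP: read `clrg status` output on stdin and print the node(s)
# on which GROUP is Online. Hypothetical helper; assumes the layout above.
online_node() {
    awk -v grp="$1" '
        /^===/ || /^---/ || /^Group Name/ { next }  # banner, header, rule
        NF == 4 { cur = $1; node = $2; st = $4 }     # row that names a group
        NF == 3 { node = $1; st = $3 }               # further node of same group
        (NF == 3 || NF == 4) && cur == grp && st == "Online" { print node }
    '
}
```

For example, `clrg status dns_rg | online_node dns_rg` would print plift1 for the output shown above.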

7. Verify that the DNS server SMF service is online on plift1 and offline on plift2.

Log on to plift1 and run the commands below.

# svcs -a | grep dns
online 11:38:14 svc:/network/dns/server:default

# svcs -l svc:/network/dns/server:default
fmri svc:/network/dns/server:default
name BIND DNS server
enabled true
state online
next_state none
state_time Wed Nov 15 11:38:14 2006
restarter svc:/system/cluster/sc_restarter:default
dependency require_all/none file://localhost/etc/named.conf (online)
dependency require_all/none svc:/system/filesystem/minimal (online)
dependency require_any/error svc:/network/loopback (online)
dependency optional_all/error svc:/milestone/network (online)

(named is the DNS-specific daemon started by the DNS server service.)

# ps -efj | grep named
root 7773 1 7773 7773 0 11:38:15 ? 0:00 /usr/sbin/named
root 7785 7598 7784 7594 0 11:40:06 pts/4 0:00 grep named

Log on to plift2 and repeat the commands above to verify that the DNS SMF service is offline there.
# svcs -a | grep dns
offline 11:38:09 svc:/network/dns/server:default

# svcs -l svc:/network/dns/server:default
fmri svc:/network/dns/server:default
name BIND DNS server
enabled true
state offline
next_state none
state_time Wed Nov 15 11:38:09 2006
restarter svc:/system/cluster/sc_restarter:default
dependency require_all/none file://localhost/etc/named.conf (online)
dependency require_all/none svc:/system/filesystem/minimal (online)
dependency require_any/error svc:/network/loopback (online)
dependency optional_all/error svc:/milestone/network (online)

# ps -efj | grep name
root 15318 15287 15317 15283 0 11:40:11 pts/1 0:00 grep name
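The manual checks in step 7 can be wrapped in a small function. This sketch uses `svcs -H -o state` (no header, state column only) and `pgrep -x named` instead of `ps -efj | grep`, which avoids the grep process showing up in its own results, as it does in the listings above. The function name check_dns_node is hypothetical, and the expected states (online on the active node, offline on the other) are taken from the outputs shown here.

```shell
#!/bin/sh
# check_dns_node online|offline: hypothetical helper for step 7.
# Returns 0 if the DNS server SMF state and the named process both match.
check_dns_node() {
    expected=$1
    fmri=svc:/network/dns/server:default
    # -H suppresses the header line; -o state prints only the STATE column.
    state=$(svcs -H -o state "$fmri") || return 1
    # pgrep -x matches the process name exactly and never reports itself.
    if pgrep -x named >/dev/null 2>&1; then running=yes; else running=no; fi
    case $expected in
        online)  [ "$state" = online ] && [ "$running" = yes ] ;;
        offline) [ "$state" = offline ] && [ "$running" = no ] ;;
        *)       echo "usage: check_dns_node online|offline" >&2; return 2 ;;
    esac
}
```

Here you would run `check_dns_node online` on plift1 and `check_dns_node offline` on plift2; both should return 0.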

8. You can switch the resource group dns_rg over from plift1 to plift2, which takes it offline on plift1 and brings it online on plift2.

# clrg switch -n plift2 dns_rg

Check the status of the resource group and its resource.

# clrg status
=== Cluster Resource Groups ===
Group Name   Node Name   Suspended   Status
----------   ---------   ---------   ------
dns_rg       plift1      No          Offline
             plift2      No          Online

# clrs status
=== Cluster Resources ===
Resource Name   Node Name   State     Status Message
-------------   ---------   -----     --------------
dns_rs          plift1      Offline   Offline
                plift2      Online    Online but not monitored

Verify that the SMF service has stopped on plift1 and is running on plift2.

On plift1.

# svcs -a | grep dns
disabled Nov_07 svc:/network/dns/client:default
offline 11:42:22 svc:/network/dns/server:default

# ps -efj | grep name
root 7808 7598 7807 7594 0 11:43:47 pts/4 0:00 grep name

On plift2.

# svcs -a | grep dns
disabled Nov_04 svc:/network/dns/client:default
online 11:42:22 svc:/network/dns/server:default

# ps -efj | grep name
root 15331 15287 15330 15283 0 11:43:52 pts/1 0:00 grep name
root 15324 1 15324 15324 0 11:42:23 ? 0:00 /usr/sbin/named
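Step 8 could also be scripted as a switchover followed by a status check. The post does not say whether `clrg switch` returns only after the group is fully online on the target node, so this sketch polls `clrg status` as a safety net. switch_and_wait, its timeout argument and the 5-second poll interval are hypothetical, and the awk expression assumes the status table layout shown above (node name in the first or second column, status in the last).

```shell
#!/bin/sh
# switch_and_wait GROUP NODE [TIMEOUT]: hypothetical helper for step 8.
# Switches GROUP to NODE, then polls `clrg status GROUP` until NODE reports
# Online or TIMEOUT seconds (default 60) have passed.
switch_and_wait() {
    grp=$1
    node=$2
    timeout=${3:-60}
    elapsed=0

    clrg switch -n "$node" "$grp" || return 1

    while :; do
        # NODE is in column 1 or 2 of its row; the group status is the last column.
        rg_state=$(clrg status "$grp" | awk -v n="$node" '$1 == n || $2 == n { print $NF }')
        [ "$rg_state" = Online ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 5
        elapsed=$((elapsed + 5))
    done
}
```

For this example, `switch_and_wait dns_rg plift2 120` would return 0 once plift2 shows the group Online.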

If you want to decouple the SMF service from Sun Cluster control, all you need to do is disable the SMF proxy resource and then delete it.
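A minimal sketch of that cleanup, in the order given above (disable first, then delete), with the same kind of hypothetical DRY_RUN wrapper. It removes only the proxy resource; the post does not cover removing the dns_rg group or unregistering the resource type, so those steps are left out.

```shell
#!/bin/sh
# Hypothetical cleanup: take the DNS service out of Sun Cluster control.
RS=dns_rs

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Disable first, then delete; stop if disabling fails.
teardown_dns_proxy() {
    run clrs disable "$RS" &&
    run clrs delete "$RS"
}
```

`DRY_RUN=1 teardown_dns_proxy` prints the two commands without running them.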

Now that is as cool as it can get. So go ahead and try it out.

Harish Mallya
Sun Cluster Engineering.
