Wednesday Jul 04, 2012

How to configure a zone cluster on Solaris Cluster 4.0

This is a short overview on how to configure a zone cluster on Solaris Cluster 4.0. It is a little different from Solaris Cluster 3.2/3.3 because Solaris Cluster 4.0 runs only on Solaris 11. The name of the zone cluster must be unique throughout the global Solaris Cluster and must be configured on a global Solaris Cluster. Please read all the requirements for zone clusters in the Solaris Cluster Software Installation Guide for SC 4.0.
For Solaris Cluster 3.2/3.3 please refer to my previous blog Configuration steps to create a zone cluster in Solaris Cluster 3.2/3.3.

A. Configure the zone cluster into the already running global cluster
  • Check if zone cluster can be created
    # cluster show-netprops
    To change the number of zone clusters use
    # cluster set-netprops -p num_zoneclusters=12
    Note: 12 zone clusters is the default; the value can be customized!

  • Create config file (zc1config) for zone cluster setup e.g:
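    A minimal sketch of such a file might look like this (all hostnames, IP addresses, the zonepath and the net0 link are placeholders to adapt to your environment; see the Installation Guide for the full syntax):
    create
    set zonepath=/zones/zc1
    add node
    set physical-host=phys-node1
    set hostname=zc1-node1
    add net
    set address=192.168.10.11
    set physical=net0
    end
    end
    add node
    set physical-host=phys-node2
    set hostname=zc1-node2
    add net
    set address=192.168.10.12
    set physical=net0
    end
    end
    commit
    exit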

  • Configure zone cluster
    # clzc configure -f zc1config zc1
    Note: If not using a config file, the configuration can also be done interactively with # clzc configure zc1

  • Check zone configuration
    # clzc export zc1

  • Verify zone cluster
    # clzc verify zc1
    Note: The following message is only a notice and appears with several clzc commands:
    Waiting for zone verify commands to complete on all the nodes of the zone cluster "zc1"...

  • Install the zone cluster
    # clzc install zc1
    Note: Monitor the consoles of the global cluster nodes to see how the install proceeds! (The output differs between the nodes.) It is very important that all global cluster nodes have the same set of ha-cluster packages installed!

  • Boot the zone cluster
    # clzc boot zc1

  • Log in to the non-global zones of zone cluster zc1 on all nodes and finish the Solaris installation.
    # zlogin -C zc1

  • Check status of zone cluster
    # clzc status zc1

  • Log in to the non-global zones of zone cluster zc1 and configure the shell environment for root (add /usr/cluster/bin to PATH and /usr/cluster/man to MANPATH)
    # zlogin -C zc1

  • If using an additional name service, configure /etc/nsswitch.conf in the non-global zones of the zone cluster.
    hosts: cluster files
    netmasks: cluster files
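    Note: On Solaris 11 the file /etc/nsswitch.conf is generated by SMF, so the lookup order is set through the svc:/system/name-service/switch service. A possible sketch (the exact property names should be checked against your Solaris 11 release):
    # svccfg -s svc:/system/name-service/switch setprop config/host = astring: '"cluster files"'
    # svccfg -s svc:/system/name-service/switch setprop config/netmask = astring: '"cluster files"'
    # svcadm refresh svc:/system/name-service/switch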

  • Configure /etc/inet/hosts of the zone cluster zones
    Enter all the logical hosts of the non-global zones.
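    For example (address and name are placeholders only):
    192.168.10.20   zc1-lh      # logical host used by app-rg below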



B. Add resource groups and resources to zone cluster
  • Create a resource group in zone cluster
    # clrg create -n <zone-hostname-node1>,<zone-hostname-node2> app-rg

    Note1: Use # cluster status for an overview of the zone cluster resource groups.
    Note2: You can also run all zone cluster commands from the global cluster by adding the -Z option, e.g.:
    # clrg create -Z zc1 -n <zone-hostname-node1>,<zone-hostname-node2> app-rg

  • Set up the logical host resource for zone cluster
    In the global zone do:
    # clzc configure zc1
    clzc:zc1> add net
    clzc:zc1:net> set address=<zone-logicalhost-ip>
    clzc:zc1:net> end
    clzc:zc1> commit
    clzc:zc1> exit
    Note: Check that the logical host is in the /etc/hosts file
    In zone cluster do:
    # clrslh create -g app-rg -h <zone-logicalhost> <zone-logicalhost>-rs

  • Set up storage resource for zone cluster
    Register HAStoragePlus
    # clrt register SUNW.HAStoragePlus

    Example1) ZFS storage pool
    In the global zone do:
    Configure the zpool, e.g.: # zpool create zdata mirror cXtXdX cXtXdX
    and
    # clzc configure zc1
    clzc:zc1> add dataset
    clzc:zc1:dataset> set name=zdata
    clzc:zc1:dataset> end
    clzc:zc1> verify
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do:
    # clrs create -g app-rg -t SUNW.HAStoragePlus -p zpools=zdata app-hasp-rs


    Example2) HA filesystem
    In the global zone do:
    Configure SVM diskset and SVM devices.
    and
    # clzc configure zc1
    clzc:zc1> add fs
    clzc:zc1:fs> set dir=/data
    clzc:zc1:fs> set special=/dev/md/datads/dsk/d0
    clzc:zc1:fs> set raw=/dev/md/datads/rdsk/d0
    clzc:zc1:fs> set type=ufs
    clzc:zc1:fs> add options [logging]
    clzc:zc1:fs> end
    clzc:zc1> verify
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do:
    # clrs create -g app-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/data app-hasp-rs


    Example3) Global filesystem as loopback file system
    In the global zone configure the global filesystem and add it to /etc/vfstab on all global cluster nodes, e.g.:
    /dev/md/datads/dsk/d0 /dev/md/datads/rdsk/d0 /global/fs ufs 2 yes global,logging
    and
    # clzc configure zc1
    clzc:zc1> add fs
    clzc:zc1:fs> set dir=/zone/fs (zc-lofs-mountpoint)
    clzc:zc1:fs> set special=/global/fs (globalcluster-mountpoint)
    clzc:zc1:fs> set type=lofs
    clzc:zc1:fs> end
    clzc:zc1> verify
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do (create the scalable resource group first if not already done):
    # clrg create -p desired_primaries=2 -p maximum_primaries=2 app-scal-rg
    # clrs create -g app-scal-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/zone/fs hasp-rs

    More details on adding storage are available in the Installation Guide for zone clusters

  • Switch resource group and resources online in the zone cluster
    # clrg online -eM app-rg
    # clrg online -eM app-scal-rg
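    Note: To verify, the status of the resource groups and resources can be checked inside the zone cluster (or from the global zone by adding -Z zc1):
    # clrg status
    # clrs status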

  • Test: Switch the resource groups to another node in the zone cluster
    # clrg switch -n zonehost2 app-rg
    # clrg switch -n zonehost2 app-scal-rg

  • Add a supported data service to the zone cluster
    Documentation for SC 4.0 is available here



  • Example output:



    Appendix: To delete a zone cluster do:
    # clrg delete -Z zc1 -F +

    Note: The zone cluster can only be uninstalled after all resource groups have been removed from the zone cluster. The command 'clrg delete -F +' can be used in the zone cluster to delete the resource groups recursively.
    # clzc halt zc1
    # clzc uninstall zc1

    Note: If the clzc command does not uninstall the zone successfully, then run 'zoneadm -z zc1 uninstall -F' on the nodes where zc1 is configured
    # clzc delete zc1

Thursday Sep 24, 2009

Configuration steps to create a zone cluster on Solaris Cluster 3.2/3.3

This is a short overview on how to configure a zone cluster. It is highly recommended to use Solaris 10 5/09 (Update 7) with the July 2009 patch baseline (or higher) and Sun Cluster 3.2 1/09 with Sun Cluster 3.2 core patch revision -33 or higher. The name of the zone cluster must be unique throughout the global Sun Cluster and must be configured on a global Sun Cluster. Please read the requirements for zone clusters in the Sun Cluster Software Installation Guide.
For Solaris Cluster 4.0 please refer to blog How to configure a zone cluster on Solaris Cluster 4.0


A. Configure the zone cluster into the global cluster
  • Check if zone cluster can be created
    # cluster show-netprops
    To change the number of zone clusters use
    # cluster set-netprops -p num_zoneclusters=12
    Note: 12 zone clusters is the default; the value can be customized!

  • Configure filesystem for zonepath on all physical nodes
    # mkdir -p /zones/zc1
    # chmod 0700 /zones/zc1

  • Create config file (zc1config) for zone cluster setup e.g:
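    A minimal sketch of such a file might look like this (hostnames, IP addresses and the e1000g0 interface are placeholders to adapt; repeat the 'add node' block for the second cluster node; the sysid scope pre-answers the system identification of the zones; see the Installation Guide for the full syntax):
    create
    set zonepath=/zones/zc1
    add node
    set physical-host=phys-node1
    set hostname=zc1-node1
    add net
    set address=192.168.10.11
    set physical=e1000g0
    end
    end
    add sysid
    set root_password=<encrypted-root-password>
    end
    commit
    exit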

  • Configure zone cluster
    # clzc configure -f zc1config zc1
    Note: If not using a config file, the configuration can also be done interactively with # clzc configure zc1

  • Check zone configuration
    # clzc export zc1

  • Verify zone cluster
    # clzc verify zc1
    Note: The following message is only a notice and appears with several clzc commands:
    Waiting for zone verify commands to complete on all the nodes of the zone cluster "zc1"...

  • Install the zone cluster
    # clzc install zc1
    Note: Monitor the console of the global zone to see how the install proceeds!

  • Boot the zone cluster
    # clzc boot zc1

  • Check status of zone cluster
    # clzc status zc1

  • Log in to the non-global zones of zone cluster zc1 and configure the shell environment for root (add /usr/cluster/bin to PATH and /usr/cluster/man to MANPATH)
    # zlogin -C zc1

B. Add resource groups and resources to zone cluster
  • Create a resource group in zone cluster
    # clrg create -n zone-hostname-node1,zone-hostname-node2 app-rg

    Note: Use # cluster status for an overview of the zone cluster resource groups

  • Set up the logical host resource for zone cluster
    In the global zone do:
    # clzc configure zc1
    clzc:zc1> add net
    clzc:zc1:net> set address=<zone-logicalhost-ip>
    clzc:zc1:net> end
    clzc:zc1> commit
    clzc:zc1> exit
    Note: Check that the logical host is in the /etc/hosts file
    In zone cluster do:
    # clrslh create -g app-rg -h <zone-logicalhost> <zone-logicalhost>-rs

  • Set up storage resource for zone cluster
    Register HAStoragePlus
    # clrt register SUNW.HAStoragePlus

    Example1) ZFS storage pool
    In the global zone do:
    Configure the zpool, e.g.: # zpool create zdata mirror cXtXdX cXtXdX
    and
    # clzc configure zc1
    clzc:zc1> add dataset
    clzc:zc1:dataset> set name=zdata
    clzc:zc1:dataset> end
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do:
    # clrs create -g app-rg -t SUNW.HAStoragePlus -p zpools=zdata app-hasp-rs


    Example2) HA filesystem
    In the global zone do:
    Configure SVM diskset and SVM devices.
    and
    # clzc configure zc1
    clzc:zc1> add fs
    clzc:zc1:fs> set dir=/data
    clzc:zc1:fs> set special=/dev/md/datads/dsk/d0
    clzc:zc1:fs> set raw=/dev/md/datads/rdsk/d0
    clzc:zc1:fs> set type=ufs
    clzc:zc1:fs> end
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do:
    # clrs create -g app-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/data app-hasp-rs

    More details on adding storage

  • Switch resource group and resources online in the zone cluster
    # clrg online -eM app-rg

  • Test: Switch the resource group to another node in the zone cluster
    # clrg switch -n zonehost2 app-rg

  • Add a supported data service to the zone cluster
    Further details about supported data services are available at:
    - Zone Clusters - How to Deploy Virtual Clusters and Why
    - Running Oracle Real Application Clusters on Solaris Zone Cluster

Example output:



Appendix: To delete a zone cluster do:
# clrg delete -Z zc1 -F +

Note: The zone cluster can only be uninstalled after all resource groups have been removed from the zone cluster. The command 'clrg delete -F +' can be used in the zone cluster to delete the resource groups recursively.
# clzc halt zc1
# clzc uninstall zc1

Note: If the clzc command does not uninstall the zone successfully, run 'zoneadm -z zc1 uninstall -F' on the nodes where zc1 is configured
# clzc delete zc1

Monday Mar 09, 2009

memory leaks in "rgmd -z global" process

A memory leak occurs in the "rgmd -z global" process on Sun Cluster 3.2 1/09 Update2. The global zone instance of the rgmd process leaks memory in most situations, e.g. when basic commands such as "scstat" or "cluster show" are run. The problem is severe: the rgmd heap grows to a large size and eventually crashes the Sun Cluster node.

The issue only happens if one of the following Sun Cluster core patches is active.
126106-27 or -29 or -30 Sun Cluster 3.2: CORE patch for Solaris 10
126107-28 or -30 or -31 Sun Cluster 3.2: CORE patch for Solaris 10_x86
Because these patches are also part of the Sun Cluster 3.2 1/09 Update2 release, the issue also occurs on freshly installed Sun Cluster 3.2 1/09 Update2 systems.

The error can look as follows:
Analyze the growth of the memory allocation with prstat (or similar tools):
# prstat
3942 root 61M 11M sleep 101 - 0:00:02 0.7% rgmd/41
Some time later the increase of the memory allocation is visible:
3942 root 61M 20M sleep 101 - 0:01:15 0.7% rgmd/41
or
# pmap -x <pid_of_rgmd-z_global> | grep heap
00022000 47648 6992 6984 - rwx-- [ heap ]
Some time later the increase of the memory allocation is visible:
00022000 47648 15360 15352 - rwx-- [ heap ]

When the memory is exhausted, the Sun Cluster node panics with the following message:
Feb 25 07:59:23 node1 RGMD[1843]: [ID 381173 daemon.error] RGM: Could not allocate 1024 bytes; node is out of swap space; aborting node.
...
Feb 25 08:10:05 node1 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
Feb 25 08:10:05 node1 unix: [ID 836849 kern.notice]
Feb 25 08:10:05 node1 \^Mpanic[cpu0]/thread=2a100047ca0:
Feb 25 08:10:05 node1 unix: [ID 562397 kern.notice] Failfast: Aborting zone "global" (zone ID 0) because "globalrgmd" died 30 seconds ago.
Feb 25 08:10:06 node1 unix: [ID 100000 kern.notice]
...

Update 20.Mar.2009:
Available now:
Alert 1020253.1 Memory Leak in the "rgmd" Process of Solaris Cluster 3.2 may Cause a failfast Panic

Update 17.Jun.2009:
The -33 revision of the Sun Cluster core patch is the first released version which fixes this issue.
126106-33 Sun Cluster 3.2: CORE patch for Solaris 10
126107-33 Sun Cluster 3.2: CORE patch for Solaris 10_x86


Workaround: Use the previous revision -19 to prevent the issue.
126106-19 Sun Cluster 3.2: CORE patch for Solaris 10
126107-19 Sun Cluster 3.2: CORE patch for Solaris 10_x86

The issue is reported in bug 6808508 (description: scalable services coredump during the failover due to network failure). A fix is in progress. This blog will be updated when the fix is available.

About

I'm still mostly blogging about Solaris Cluster and support, independently of whether it's for Sun Microsystems or Oracle. :-)
