Wednesday Jul 04, 2012

How to configure a zone cluster on Solaris Cluster 4.0

This is a short overview on how to configure a zone cluster on Solaris Cluster 4.0. The procedure differs slightly from Solaris Cluster 3.2/3.3 because Solaris Cluster 4.0 runs only on Solaris 11. The name of the zone cluster must be unique throughout the global Solaris Cluster and must be configured on a global Solaris Cluster. Please read all the requirements for zone clusters in the Solaris Cluster Software Installation Guide for SC4.0.
For Solaris Cluster 3.2/3.3 please refer to my previous blog Configuration steps to create a zone cluster in Solaris Cluster 3.2/3.3.

A. Configure the zone cluster into the already running global cluster
  • Check whether the zone cluster can be created
    # cluster show-netprops
    To change the number of zone clusters, use
    # cluster set-netprops -p num_zoneclusters=12
    Note: 12 zone clusters is the default; the value can be customized.

  • Create a config file (zc1config) for the zone cluster setup, e.g. the sketch below:
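    A minimal sketch of such a clzc command file (assumptions for illustration only: zonepath /zones/zc1, global cluster nodes clnode1/clnode2, zone hostnames zc1-node1/zc1-node2, addresses 192.168.10.11/12 on adapter net0 - adapt all of these to your environment):
    create
    set zonepath=/zones/zc1
    add node
    set physical-host=clnode1
    set hostname=zc1-node1
    add net
    set address=192.168.10.11
    set physical=net0
    end
    end
    add node
    set physical-host=clnode2
    set hostname=zc1-node2
    add net
    set address=192.168.10.12
    set physical=net0
    end
    end
    commit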

  • Configure zone cluster
    # clzc configure -f zc1config zc1
    Note: If you are not using a config file, the configuration can also be done interactively with # clzc configure zc1

  • Check zone configuration
    # clzc export zc1

  • Verify zone cluster
    # clzc verify zc1
    Note: The following message is a notice and comes up on several clzc commands
    Waiting for zone verify commands to complete on all the nodes of the zone cluster "zc1"...

  • Install the zone cluster
    # clzc install zc1
    Note: Monitor the consoles of the global cluster nodes to see how the installation proceeds (the output differs between the nodes). It is very important that all global cluster nodes have the same set of ha-cluster packages installed! (A quick check is sketched below.)
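    One possible way to compare the installed cluster packages on each global cluster node (assuming the ha-cluster* IPS package naming used by Oracle Solaris Cluster 4.0):
    # pkg list 'ha-cluster*'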

  • Boot the zone cluster
    # clzc boot zc1

  • Log in to the non-global zones of zone cluster zc1 on all nodes and finish the Solaris installation.
    # zlogin -C zc1

  • Check status of zone cluster
    # clzc status zc1

  • Log in to the non-global zones of zone cluster zc1 and configure the shell environment for root (add /usr/cluster/bin to PATH and /usr/cluster/man to MANPATH), e.g. as sketched below.
    # zlogin -C zc1
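    A minimal sketch, assuming root's shell reads /root/.profile in each zone cluster node:
    export PATH=$PATH:/usr/cluster/bin
    export MANPATH=$MANPATH:/usr/cluster/man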

  • If using an additional name service, configure /etc/nsswitch.conf of the zone cluster non-global zones so that cluster is consulted first (see the SMF note after this item):
    hosts: cluster files
    netmasks: cluster files
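    On Solaris 11, /etc/nsswitch.conf is generated from SMF, so a possible way to set the hosts entry (assuming the svc:/system/name-service/switch service and its config/host property; other databases such as netmasks are set via their corresponding config/ properties):
    # svccfg -s svc:/system/name-service/switch setprop config/host = astring: '"cluster files"'
    # svcadm refresh svc:/system/name-service/switch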

  • Configure /etc/inet/hosts of the zone cluster zones
    Enter all the logical hosts of the non-global zones, e.g. as sketched below.
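    A minimal sketch of such an entry (the address and hostname are placeholders):
    192.168.10.20   zc1-lh    # logical host for the failover resource group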



B. Add resource groups and resources to zone cluster
  • Create a resource group in zone cluster
    # clrg create -n <zone-hostname-node1>,<zone-hostname-node2> app-rg

    Note1: Use the command # cluster status for a zone cluster resource group overview.
    Note2: You can also run all commands for the zone cluster from the global cluster by adding the option -Z to the command, e.g.:
    # clrg create -Z zc1 -n <zone-hostname-node1>,<zone-hostname-node2> app-rg

  • Set up the logical host resource for zone cluster
    In the global zone do:
    # clzc configure zc1
    clzc:zc1> add net
    clzc:zc1:net> set address=<zone-logicalhost-ip>
    clzc:zc1:net> end
    clzc:zc1> commit
    clzc:zc1> exit
    Note: Check that the logical host is in the /etc/hosts file
    In zone cluster do:
    # clrslh create -g app-rg -h <zone-logicalhost> <zone-logicalhost>-rs

  • Set up storage resource for zone cluster
    Register HAStoragePlus
    # clrt register SUNW.HAStoragePlus

    Example1) ZFS storage pool
    In the global zone do:
    Configure the zpool, e.g.: # zpool create zdata mirror cXtXdX cXtXdX
    and
    # clzc configure zc1
    clzc:zc1> add dataset
    clzc:zc1:dataset> set name=zdata
    clzc:zc1:dataset> end
    clzc:zc1> verify
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do:
    # clrs create -g app-rg -t SUNW.HAStoragePlus -p zpools=zdata app-hasp-rs


    Example2) HA filesystem
    In the global zone do:
    Configure the SVM disk set and SVM devices (a sketch follows this example).
    and
    # clzc configure zc1
    clzc:zc1> add fs
    clzc:zc1:fs> set dir=/data
    clzc:zc1:fs> set special=/dev/md/datads/dsk/d0
    clzc:zc1:fs> set raw=/dev/md/datads/rdsk/d0
    clzc:zc1:fs> set type=ufs
    clzc:zc1:fs> add options [logging]
    clzc:zc1:fs> end
    clzc:zc1> verify
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do:
    # clrs create -g app-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/data app-hasp-rs
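    A minimal sketch of the SVM setup in the global zone for Example2 (assumptions: the disk set name datads matches the device paths above; the node names clnode1/clnode2 and the DID device /dev/did/rdsk/d5 are placeholders):
    # metaset -s datads -a -h clnode1 clnode2
    # metaset -s datads -a /dev/did/rdsk/d5
    # metainit -s datads d0 1 1 /dev/did/rdsk/d5s0
    # newfs /dev/md/datads/rdsk/d0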


    Example3) Global filesystem as loopback file system
    In the global zone, configure the global filesystem and add it to /etc/vfstab on all global nodes, e.g.:
    /dev/md/datads/dsk/d0 /dev/md/datads/rdsk/d0 /global/fs ufs 2 yes global,logging
    and
    # clzc configure zc1
    clzc:zc1> add fs
    clzc:zc1:fs> set dir=/zone/fs (zc-lofs-mountpoint)
    clzc:zc1:fs> set special=/global/fs (globalcluster-mountpoint)
    clzc:zc1:fs> set type=lofs
    clzc:zc1:fs> end
    clzc:zc1> verify
    clzc:zc1> commit
    clzc:zc1> exit
    Check setup with # clzc show -v zc1
    In the zone cluster do: (Create scalable rg if not already done)
    # clrg create -p desired_primaries=2 -p maximum_primaries=2 app-scal-rg
    # clrs create -g app-scal-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/zone/fs hasp-rs

    More details on adding storage are available in the Installation Guide for zone clusters.

  • Switch resource group and resources online in the zone cluster
    # clrg online -eM app-rg
    # clrg online -eM app-scal-rg

  • Test: Switch the resource groups to another node in the zone cluster
    # clrg switch -n zonehost2 app-rg
    # clrg switch -n zonehost2 app-scal-rg

  • Add supported data services to the zone cluster (a sketch follows this item)
    The documentation for SC4.0 data services is available in the Oracle Solaris Cluster 4.0 documentation library.
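    A minimal sketch using HA for Apache as an example (assumptions: the SUNW.apache resource type is delivered to the zone cluster, the Apache binaries live in /usr/apache2/2.2/bin, and the logical host resource <zone-logicalhost>-rs from above exists):
    In the zone cluster do:
    # clrt register SUNW.apache
    # clrs create -g app-rg -t SUNW.apache -p Bin_dir=/usr/apache2/2.2/bin -p Port_list=80/tcp -p Resource_dependencies=<zone-logicalhost>-rs apache-rs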






    Appendix: To delete a zone cluster do:
    # clrg delete -Z zc1 -F +

    Note: The zone cluster can only be uninstalled after all resource groups have been removed from the zone cluster. The command 'clrg delete -F +' can be used within the zone cluster to delete the resource groups recursively.
    # clzc halt zc1
    # clzc uninstall zc1

    Note: If the clzc command does not succeed in uninstalling the zone, run 'zoneadm -z zc1 uninstall -F' on the nodes where zc1 is configured
    # clzc delete zc1

Wednesday Jun 17, 2009

Ready for Sun Cluster 3.2 1/09 Update2?

Now it's time to install/upgrade to Sun Cluster 3.2 1/09 Update2. The major bugs of Sun Cluster 3.2 1/09 Update2 are fixed in
126106-33 or higher Sun Cluster 3.2: CORE patch for Solaris 10
126107-33 or higher Sun Cluster 3.2: CORE patch for Solaris 10_x86
126105-33 or higher Sun Cluster 3.2: CORE patch for Solaris 9

This means the core patch should be applied immediately after the installation of the Sun Cluster 3.2 1/09 Update2 software. The installation approach in short (a command-level sketch follows the list):

  • Install Sun Cluster 3.2 1/09 Update2 with java enterprise installer

  • Install the necessary Sun Cluster 3.2 core patch as mentioned above

  • Configure Sun Cluster 3.2 with scinstall

  • Further details are available in the Sun Cluster Software Installation Guide for Solaris OS.
    Installation services delivered by Oracle Advanced Customer Services are also available.
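    A minimal command-level sketch of these steps on one node (assumptions: the installer is started from the Java ES media directory, the core patch is unpacked under /var/tmp, and the node runs Solaris 10 SPARC so patch 126106-33 applies; use the matching CORE patch for your platform):
    # ./installer
    # patchadd /var/tmp/126106-33
    # /usr/cluster/bin/scinstall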

Monday May 04, 2009

Cluster configuration repository can get corrupted when installing Sun Cluster 3.2 1/09 Update2


    The issue only occurs if Sun Cluster 3.2 1/09 Update2 is installed with a non-default netmask for the cluster interconnect.

    Problems seen if the system is affected:
    Errors with:
          * did devices
          * quorum device
          * The command 'scstat -i' can look like:
    -- IPMP Groups --
                     Node Name       Group       Status    Adapter   Status
                     ---------       -----       ------    -------   ------
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode2 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode1 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode2 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode1 - unexpected error.
    IPMP Group: scnode2    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode1    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode2    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode1    sc_ipmp0    Online    qfe0      Online


    How does the problem occur?
    After the installation of the Sun Cluster 3.2 1/09 Update2 product with the Java installer, it is necessary to run the # scinstall command. If you choose the "Custom" installation instead of the "Typical" installation, it is possible to change the default netmask of the cluster interconnect. The following questions come up within the installation procedure if the default netmask question is answered with 'no'.

    Example scinstall:
           Is it okay to accept the default netmask (yes/no) [yes]? no
           Maximum number of nodes anticipated for future growth [64]? 4
           Maximum number of private networks anticipated for future growth [10]?
           Maximum number of virtual clusters expected [12]? 0
           What netmask do you want to use [255.255.255.128]?
    Prevent the issue by answering the virtual clusters question with '1', or with a higher value if serious consideration of future growth makes that necessary.
    Do NOT answer the virtual clusters question with '0'!



    In the /etc/cluster/ccr/global/infrastructure file the error shows up as an empty entry for cluster.properties.private_netmask. Furthermore, some other lines do not reflect the correct netmask values chosen within scinstall.
    Wrong infrastructure file:
    cluster.state enabled
    cluster.properties.cluster_id 0x49F82635
    cluster.properties.installmode disabled
    cluster.properties.private_net_number 172.16.0.0
    cluster.properties.cluster_netmask 255.255.248.0
    cluster.properties.private_netmask
    cluster.properties.private_subnet_netmask 255.255.255.248
    cluster.properties.private_user_net_number 172.16.4.0
    cluster.properties.private_user_netmask 255.255.254.0

    cluster.properties.private_maxnodes 6
    cluster.properties.private_maxprivnets 10
    cluster.properties.zoneclusters 0
    cluster.properties.auth_joinlist_type sys

    If the virtual clusters question is answered with the value '1', the correct netmask entries are:
    cluster.properties.cluster_id 0x49F82635
    cluster.properties.installmode disabled
    cluster.properties.private_net_number 172.16.0.0
    cluster.properties.cluster_netmask 255.255.255.128
    cluster.properties.private_netmask 255.255.255.128
    cluster.properties.private_subnet_netmask 255.255.255.248
    cluster.properties.private_user_net_number 172.16.0.64
    cluster.properties.private_user_netmask 255.255.255.224

    cluster.properties.private_maxnodes 6
    cluster.properties.private_maxprivnets 10
    cluster.properties.zoneclusters 1
    cluster.properties.auth_joinlist_type sys


    Workaround if the problem has already occurred:
    1.) Boot all nodes in non-cluster mode with 'boot -x'
    2.) Change the wrong values in /etc/cluster/ccr/global/infrastructure on all nodes. See the example above.
    3.) Write a new checksum for all infrastructure files on all nodes. Use -o (master file) on the node which boots up first.
    scnode1 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure -o
    scnode2 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure
    scnode1 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure
    scnode2 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure
    4.) First reboot scnode1 (master infrastructure file) into the cluster, then the other nodes.
    This issue is reported in bug 6825948.


    Update 17.Jun.2009:
    The -33 revision of the Sun Cluster core patch is the first released version which fixes this issue at installation time.
    126106-33 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-33 Sun Cluster 3.2: CORE patch for Solaris 10_x86

About

I'm still mostly blogging about Solaris Cluster and support, regardless of whether it's for Sun Microsystems or Oracle. :-)
