Thursday Jan 21, 2010

Configuration of 3rd mediator host in campus cluster

One of the new features in Sun Cluster 3.2 11/09 update3 is the Solaris Volume Manager three-mediator support. This is very helpful for two-room campus cluster configurations. It's recommended to install the 3rd mediator host in a third room. Please refer to the Guidelines for Mediators for details.
In the Solaris Cluster 3.3 docs you can find the Configuring Dual-String Mediators documentation.

Advantages:
-- The 3rd mediator host is used to obtain a majority of the Solaris Volume Manager mediator votes when one room is lost due to an error.
-- The 3rd mediator host only needs a public network connection. (It does not need to be part of the cluster and needs no connection to the shared storage.)
Consider:
-- One more room is necessary.
-- One more host to administer; but if a Sun Cluster quorum server is used, the 3rd mediator can run on the same host.


This example shows how to add a 3rd mediator host to an existing Solaris Volume Manager diskset

Configuration steps on the 3rd mediator host
  • A) Add root to sysadmin in /etc/group
    # vi /etc/group
    sysadmin::14:root

  • B) Create metadb and dummy diskset
    # metadb -afc 3 c0t0d0s7 c1t0d0s7
    # metaset -s <dummyds> -a -h <3rd_mediator_host>

  • Note: A good name for the dummy diskset can be a combination of the diskset name used on the campus cluster and the campus cluster name, e.g. 'setnameofcampuscluster_campusclustername'. If using more than one set it could be e.g. 'sets_of_campusclustername'. Or, if the host serves more than one cluster, it's possible to create one set with a specific name for each cluster. This can be helpful for monitoring/configuration purposes. But keep in mind this is not required; one set is enough for all clusters that use this 3rd mediator host.
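    For illustration, assuming a campus cluster named 'campus1' with a diskset 'oradata' and a 3rd mediator host named 'medhost3' (all names are hypothetical), the dummy diskset could be created as:
    # metaset -s oradata_campus1 -a -h medhost3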

    Configuration steps on the cluster nodes
  • A) Add 3rd mediator host to /etc/hosts
    # echo <ipaddress hostname> >> /etc/hosts

  • B) Add 3rd mediator host to existing diskset on one cluster node.
    # metaset -s <setname> -a -m <3rd_mediator_host>

  • ATTENTION: If using the 3rd mediator host for more than one cluster, each cluster node and diskset must have a unique name across all clusters, and a diskset cannot be named 'shared' or 'admin'.

    Example output:


    Hint1: On the cluster nodes and on the third mediator the configuration is stored in the file /etc/lvm/meddb

    Hint2: Use a script to monitor the mediator status on the cluster nodes. Similar to:

    Download the mediatorcheck.ksh script here!
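    Since the script itself is not reproduced here, below is a minimal sketch of such a check in ksh. It assumes the diskset name is passed as an argument and that medstat(1M) prints one mediator host per line with the status in the second column; verify the parsing against your own medstat output before relying on it.

    #!/usr/bin/ksh
    # mediator status check (sketch) - run on a cluster node
    # Usage: medcheck.ksh <setname>

    SET=$1
    if [ -z "$SET" ]; then
        echo "Usage: $0 <setname>"
        exit 1
    fi

    # medstat -s <setname> lists the mediator hosts of the diskset and their status.
    # Warn about every mediator whose status column is not "Ok" (assumed output format).
    medstat -s "$SET" | awk '
        NR > 1 && NF >= 2 && $2 != "Ok" { print "WARNING: mediator " $1 " has status " $2; bad=1 }
        END { exit bad }
    '
    if [ $? -ne 0 ]; then
        echo "Mediator problem detected for diskset $SET"
        exit 1
    fi
    echo "All mediators Ok for diskset $SET"
    exit 0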

    Thursday Sep 24, 2009

    Configuration steps to create a zone cluster on Solaris Cluster 3.2/3.3

    This is a short overview of how to configure a zone cluster. It is highly recommended to use Solaris 10 5/09 update7 with patch baseline July 2009 (or higher) and Sun Cluster 3.2 1/09 with Sun Cluster 3.2 core patch revision -33 or higher. The name of the zone cluster must be unique throughout the global Sun Cluster and must be configured on a global Sun Cluster. Please read the requirements for zone clusters in the Sun Cluster Software Installation Guide.
    For Solaris Cluster 4.0 please refer to the blog post How to configure a zone cluster on Solaris Cluster 4.0.


    A. Configure the zone cluster in the global cluster
    • Check if zone cluster can be created
      # cluster show-netprops
      To change the number of zone clusters use
      # cluster set-netprops -p num_zoneclusters=12
      Note: 12 zone clusters is the default; the value can be customized.

    • Configure filesystem for zonepath on all physical nodes
      # mkdir -p /zones/zc1
      # chmod 0700 /zones/zc1

    • Create a config file (zc1config) for the zone cluster setup, e.g.:
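      Since the config file content is not shown here, below is a minimal sketch of what such a clzc command file could contain, assuming two global-cluster nodes named node1/node2, zone hostnames zc1-node1/zc1-node2, a public adapter e1000g0 and example IP addresses (all values are placeholders to adapt):

      create
      set zonepath=/zones/zc1
      set autoboot=true
      add sysid
      set root_password=<encrypted_root_password>
      end
      add node
      set physical-host=node1
      set hostname=zc1-node1
      add net
      set address=192.168.10.11
      set physical=e1000g0
      end
      end
      add node
      set physical-host=node2
      set hostname=zc1-node2
      add net
      set address=192.168.10.12
      set physical=e1000g0
      end
      end
      commit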

    • Configure zone cluster
      # clzc configure -f zc1config zc1
      Note: If not using a config file, the configuration can also be done interactively with # clzc configure zc1

    • Check zone configuration
      # clzc export zc1

    • Verify zone cluster
      # clzc verify zc1
      Note: The following message is a notice and comes up on several clzc commands
      Waiting for zone verify commands to complete on all the nodes of the zone cluster "zc1"...

    • Install the zone cluster
      # clzc install zc1
      Note: Monitor the console of the global zone to see how the install proceeds!

    • Boot the zone cluster
      # clzc boot zc1

    • Check status of zone cluster
      # clzc status zc1

    • Log in to the non-global zones of zone cluster zc1 and configure the shell environment for root (for PATH: /usr/cluster/bin, for MANPATH: /usr/cluster/man); see the example below
      # zlogin -C zc1
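      For example, inside each zone of zc1 the environment could be set in root's profile (assuming root's home directory is / and a sh/ksh login shell that reads /.profile):
      # echo 'PATH=$PATH:/usr/cluster/bin; export PATH' >> /.profile
      # echo 'MANPATH=$MANPATH:/usr/cluster/man; export MANPATH' >> /.profile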

    B. Add resource groups and resources to zone cluster
    • Create a resource group in zone cluster
      # clrg create -n zone-hostname-node1,zone-hostname-node2 app-rg

      Note: Use the command # cluster status for a zone cluster resource group overview

    • Set up the logical host resource for zone cluster
      In the global zone do:
      # clzc configure zc1
      clzc:zc1> add net
      clzc:zc1:net> set address=<zone-logicalhost-ip>
      clzc:zc1:net> end
      clzc:zc1> commit
      clzc:zc1> exit
      Note: Check that logical host is in /etc/hosts file
      In zone cluster do:
      # clrslh create -g app-rg -h <zone-logicalhost> <zone-logicalhost>-rs

    • Set up storage resource for zone cluster
      Register HAStoragePlus
      # clrt register SUNW.HAStoragePlus

      Example1) ZFS storage pool
      In the global zone do:
      Configure a zpool, e.g.: # zpool create zdata mirror cXtXdX cXtXdX
      and
      # clzc configure zc1
      clzc:zc1> add dataset
      clzc:zc1:dataset> set name=zdata
      clzc:zc1:dataset> end
      clzc:zc1> exit
      Check setup with # clzc show -v zc1
      In the zone cluster do:
      # clrs create -g app-rg -t SUNW.HAStoragePlus -p zpools=zdata app-hasp-rs


      Example2) HA filesystem
      In the global zone do:
      Configure the SVM diskset and SVM devices (a sketch of this step follows below).
      and
      # clzc configure zc1
      clzc:zc1> add fs
      clzc:zc1:fs> set dir=/data
      clzc:zc1:fs> set special=/dev/md/datads/dsk/d0
      clzc:zc1:fs> set raw=/dev/md/datads/rdsk/d0
      clzc:zc1:fs> set type=ufs
      clzc:zc1:fs> end
      clzc:zc1> exit
      Check setup with # clzc show -v zc1
      In the zone cluster do:
      # clrs create -g app-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/data app-hasp-rs

      More details of adding storage
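      The SVM preparation mentioned in Example2 above is only summarized; a minimal sketch of that step in the global zone, assuming a diskset named datads on both cluster nodes and a shared DID device d5 (node and device names are examples only), could look like:
      # metaset -s datads -a -h node1 node2
      # metaset -s datads -a /dev/did/rdsk/d5
      # metainit -s datads d0 1 1 /dev/did/rdsk/d5s0
      # newfs /dev/md/datads/rdsk/d0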

    • Switch resource group and resources online in the zone cluster
      # clrg online -eM app-rg

    • Test: Switch the resource group to another node in the zone cluster
      # clrg switch -n zonehost2 app-rg

    • Add a supported data service to the zone cluster
      Further details about supported data services are available at:
      - Zone Clusters - How to Deploy Virtual Clusters and Why
      - Running Oracle Real Application Clusters on Solaris Zone Cluster

    Example output:



    Appendix: To delete a zone cluster do:
    # clrg delete -Z zc1 -F +

    Note: The zone cluster can only be uninstalled after all resource groups have been removed from the zone cluster. The command 'clrg delete -F +' can be used in the zone cluster to delete the resource groups recursively.
    # clzc halt zc1
    # clzc uninstall zc1

    Note: If the clzc command does not successfully uninstall the zone, run 'zoneadm -z zc1 uninstall -F' on the nodes where zc1 is configured
    # clzc delete zc1

    Monday May 04, 2009

    Cluster configuration repository can get corrupted when installing Sun Cluster 3.2 1/09 Update2


    The issue only occurs if Sun Cluster 3.2 1/09 Update2 is installed with a non-default netmask for the cluster interconnect.

    Problems seen if the system is affected:
    Errors with:
          * did devices
          * quorum device
          * The output of 'scstat -i' can look like:
    -- IPMP Groups --
                     Node Name       Group    Status    Adapter   Status
                     ---------       -----    ------    -------   ------
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode2 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode1 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode2 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode1 - unexpected error.
    IPMP Group: scnode2    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode1    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode2    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode1    sc_ipmp0    Online    qfe0      Online


    How does the problem occur?
    After installing the Sun Cluster 3.2 1/09 Update2 product with the Java installer, it's necessary to run the scinstall command. If choosing the "Custom" installation instead of the "Typical" installation, it's possible to change the default netmask of the cluster interconnect. The following questions come up within the installation procedure when answering the default netmask question with 'no'.

    Example scinstall:
           Is it okay to accept the default netmask (yes/no) [yes]? no
           Maximum number of nodes anticipated for future growth [64]? 4
           Maximum number of private networks anticipated for future growth [10]?
           Maximum number of virtual clusters expected [12]? 0
           What netmask do you want to use [255.255.255.128]?
    Prevent the issue by answering the virtual clusters question with '1', or with a higher value if future growth is anticipated.
    Do NOT answer the virtual clusters question with '0'!


    Example of the whole scinstall log when the corrupted CCR occurs:

    In the /etc/cluster/ccr/global/infrastructure file the error shows up as an empty entry for cluster.properties.private_netmask. Furthermore, some other lines do not reflect the correct netmask values chosen within scinstall.
    Wrong infrastructure file:
    cluster.state enabled
    cluster.properties.cluster_id 0x49F82635
    cluster.properties.installmode disabled
    cluster.properties.private_net_number 172.16.0.0
    cluster.properties.cluster_netmask 255.255.248.0
    cluster.properties.private_netmask
    cluster.properties.private_subnet_netmask 255.255.255.248
    cluster.properties.private_user_net_number 172.16.4.0
    cluster.properties.private_user_netmask 255.255.254.0

    cluster.properties.private_maxnodes 6
    cluster.properties.private_maxprivnets 10
    cluster.properties.zoneclusters 0
    cluster.properties.auth_joinlist_type sys

    If the virtual clusters question is answered with the value '1', the correct netmask entries are:
    cluster.properties.cluster_id 0x49F82635
    cluster.properties.installmode disabled
    cluster.properties.private_net_number 172.16.0.0
    cluster.properties.cluster_netmask 255.255.255.128
    cluster.properties.private_netmask 255.255.255.128
    cluster.properties.private_subnet_netmask 255.255.255.248
    cluster.properties.private_user_net_number 172.16.0.64
    cluster.properties.private_user_netmask 255.255.255.224

    cluster.properties.private_maxnodes 6
    cluster.properties.private_maxprivnets 10
    cluster.properties.zoneclusters 1
    cluster.properties.auth_joinlist_type sys


    Workaround if the problem has already occurred:
    1.) Boot all nodes in non-cluster-mode with 'boot -x'
    2.) Change the wrong values of /etc/cluster/ccr/global/infrastructure on all nodes. See example above.
    3.) Write a new checksum for the infrastructure file on all nodes. Use -o (master file) on the node which will boot up first.
    scnode1 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure -o
    scnode2 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure
    4.) First reboot scnode1 (the node with the master infrastructure file) into the cluster, then the other nodes. A quick consistency check before rebooting is shown below.
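    As a quick consistency check before the reboots in step 4, the edited values can be compared across the nodes, e.g.:
    scnode1 # grep netmask /etc/cluster/ccr/global/infrastructure
    scnode2 # grep netmask /etc/cluster/ccr/global/infrastructure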
    This is reported in bug 6825948.


    Update 17.Jun.2009:
    The -33 revision of the Sun Cluster core patch is the first released version which fixes this issue at installation time.
    126106-33 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-33 Sun Cluster 3.2: CORE patch for Solaris 10_x86
