Monday Sep 07, 2009

Entries in infrastructure file if using tagged VLAN for cluster interconnect

In some cases it's necessary to add a tagged VLAN id to the cluster interconnect. This example shows the difference in the cluster interconnect configuration with and without a tagged VLAN id. The interface e1000g2 has a "normal" setup (no VLAN id) and the interface e1000g1 is assigned VLAN id 2. The Ethernet switch in use must be configured with the tagged VLAN id before the cluster interconnect can be configured. Use "clsetup" to assign a VLAN id to the cluster interconnect.

Entries for "normal" cluster interconnect interface in /etc/cluster/ccr/global/infrastructure - no tagged VLAN:
cluster.nodes.1.adapters.1.name e1000g2
cluster.nodes.1.adapters.1.properties.device_name e1000g
cluster.nodes.1.adapters.1.properties.device_instance 2
cluster.nodes.1.adapters.1.properties.transport_type dlpi
cluster.nodes.1.adapters.1.properties.lazy_free 1
cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_timeout 10000
cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_quantum 1000
cluster.nodes.1.adapters.1.properties.nw_bandwidth 80
cluster.nodes.1.adapters.1.properties.bandwidth 70
cluster.nodes.1.adapters.1.properties.ip_address 172.16.1.129
cluster.nodes.1.adapters.1.properties.netmask 255.255.255.128
cluster.nodes.1.adapters.1.state enabled
cluster.nodes.1.adapters.1.ports.1.name 0
cluster.nodes.1.adapters.1.ports.1.state enabled


Entries for cluster interconnect interface in /etc/cluster/ccr/global/infrastructure - with tagged VLAN:
cluster.nodes.1.adapters.2.name e1000g2001
cluster.nodes.1.adapters.2.properties.device_name e1000g
cluster.nodes.1.adapters.2.properties.device_instance 1
cluster.nodes.1.adapters.2.properties.vlan_id 2
cluster.nodes.1.adapters.2.properties.transport_type dlpi
cluster.nodes.1.adapters.2.properties.lazy_free 1
cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_timeout 10000
cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_quantum 1000
cluster.nodes.1.adapters.2.properties.nw_bandwidth 80
cluster.nodes.1.adapters.2.properties.bandwidth 70
cluster.nodes.1.adapters.2.properties.ip_address 172.16.2.1
cluster.nodes.1.adapters.2.properties.netmask 255.255.255.128
cluster.nodes.1.adapters.2.state enabled
cluster.nodes.1.adapters.2.ports.1.name 0
cluster.nodes.1.adapters.2.ports.1.state enabled

The tagged VLAN interface name is a combination of the VLAN id and the underlying network interface. In this example, e1000g2001, the 2 after e1000g is the VLAN id and the 1 at the end is the instance of the e1000g driver. Normally this would be the e1000g1 interface, but with the VLAN id it becomes the interface e1000g2001.
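
For reference, the name follows the standard Solaris VLAN naming scheme, shown here as a worked calculation for this example:

VLAN instance (PPA) = (VLAN id * 1000) + driver instance
VLAN id 2 on e1000g instance 1:  (2 * 1000) + 1 = 2001  ->  e1000g2001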

The ifconfig -a of the above configuration is:
# ifconfig -a

lo0: flags=20010008c9 mtu 8232 index 1
       inet 127.0.0.1 netmask ff000000
e1000g0: flags=9000843 mtu 1500 index 2
      inet 10.16.65.63 netmask fffff800 broadcast 10.16.55.255
      groupname sc_ipmp0
      ether 0:14:4f:20:6a:18
e1000g2: flags=201008843 mtu 1500 index 4
      inet 172.16.1.129 netmask ffffff80 broadcast 172.16.1.255
      ether 0:14:4f:20:6a:1a
e1000g2001: flags=201008843 mtu 1500 index 3
      inet 172.16.2.1 netmask ffffff80 broadcast 172.16.2.127
      ether 0:14:4f:20:6a:19

clprivnet0: flags=1009843 mtu 1500 index 5
      inet 172.16.4.1 netmask fffffe00 broadcast 172.16.5.255
      ether 0:0:0:0:0:1

Thursday Jul 23, 2009

Sun Cluster 3.x command line overview

I put together a quick reference guide for Sun Cluster 3.x. The guide includes the "old" command line, which is used for Sun Cluster 3.0, 3.1 and 3.2, as well as the new object based command line introduced with Sun Cluster 3.2. Please do not expect the whole command line in these two pages; it is meant as a reminder of the most used commands within Sun Cluster 3.x. I added the pictures to this blog, but the pdf file is also available for download.





Further reference guides are available:
Sun Cluster 3.2 Quick Reference Guide
German Sun Cluster 3.2 Quick Reference Guide
Sun Cluster 3.1 command line cheat sheet

Wednesday Jun 17, 2009

Ready for Sun Cluster 3.2 1/09 Update2?

Now it's time to install/upgrade to Sun Cluster 3.2 1/09 Update2. The major bugs of Sun Cluster 3.2 1/09 Update2 are fixed in
126106-33 or higher Sun Cluster 3.2: CORE patch for Solaris 10
126107-33 or higher Sun Cluster 3.2: CORE patch for Solaris 10_x86
126105-33 or higher Sun Cluster 3.2: CORE patch for Solaris 9

This means the core patch should be applied immediately after the installation of the Sun Cluster 3.2 1/09 Update2 software. The installation approach in short:

  • Install Sun Cluster 3.2 1/09 Update2 with java enterprise installer

  • Install the necessary Sun Cluster 3.2 core patch as mentioned above (see the example after this list)

  • Configure Sun Cluster 3.2 with scinstall

  • Further details available in Sun Cluster Software Installation Guide for Solaris OS.
    Also Installation services delivered by Oracle Advanced Customer Services are available.
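
As a rough illustration of the core patch step (step 2 above), applying the patch on one node could look like this. This is only a sketch: the patch zip is assumed to have been downloaded to /var/tmp, and on x86 the patch id 126107-33 is used instead.

  # cd /var/tmp
  # unzip 126106-33.zip
  # patchadd /var/tmp/126106-33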

    Friday May 08, 2009

    Administration of zpool devices in Sun Cluster 3.2 environment


    Configure zpools carefully in Sun Cluster 3.2, because it's possible to use the same physical device in different zpools on different nodes at the same time. This means the zpool command does NOT care whether the physical device is already in use by another zpool on another node. E.g. if node1 has an active zpool with device c3t3d0, then it's possible to create a new zpool with c3t3d0 on another node (assumption: c3t3d0 is the same shared device on all cluster nodes).

    Output of testing: if problems occurred due to administration mistakes, the following errors have been seen:

    NODE1# zpool import tank
    cannot import 'tank': I/O error

    NODE2# zpool import tankothernode
    cannot import 'tankothernode': one or more devices is currently unavailable

    NODE2# zpool import tankothernode
    cannot import 'tankothernode': no such pool available

    NODE1# zpool import tank
    cannot import 'tank': pool may be in use from other system, it was last accessed by NODE2 (hostid: 0x83083465) on Fri May 8 13:34:41 2009
    use '-f' to import anyway
    NODE1# zpool import -f tank
    cannot import 'tank': one or more devices is currently unavailable


    Furthermore the zpool command also uses the disk without any warning if it is already used by a Solaris Volume Manager diskset or a Symantec (Veritas) Volume Manager diskgroup.

    Summary for Sun Cluster environment:
    ALWAYS MANUALLY CHECK THAT THE DEVICE YOU ARE USING FOR A ZPOOL IS FREE!!!
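
    A minimal manual check, as a sketch, could look like the following (c3t3d0 is the example device from above; run the checks on every cluster node, and the volume manager commands only apply if that volume manager is installed):

    # zpool status | grep c3t3d0      # already part of a zpool on this node?
    # metaset | grep c3t3d0           # part of a Solaris Volume Manager diskset?
    # vxdisk list | grep c3t3d0       # part of a Veritas Volume Manager diskgroup?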


    This is addressed in bug 6783988.

    Monday May 04, 2009

    cluster configuration repository can get corrupted on installation of Sun Cluster 3.2 1/09 Update2


    The issue only occurs if Sun Cluster 3.2 1/09 Update2 is installed with a non-default netmask for the cluster interconnect.

    Problems seen if a system is affected:
    Errors with:
          * did devices
          * quorum device
          * the command 'scstat -i', whose output can look like:
    -- IPMP Groups --
                     Node Name       Group       Status    Adapter   Status
                     ---------       -----       ------    -------   ------
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode2 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode1 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode2 - unexpected error.
    scrconf: RPC: Authentication error; why = Client credential too weak
    scrconf: Failed to get zone information for scnode1 - unexpected error.
    IPMP Group: scnode2    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode1    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode2    sc_ipmp0    Online    qfe0      Online
    IPMP Group: scnode1    sc_ipmp0    Online    qfe0      Online


    How does the problem occur?
    After the installation of the Sun Cluster 3.2 1/09 Update2 product with the java installer it's necessary to run the scinstall command. If you choose the "Custom" installation instead of the "Typical" installation, it's possible to change the default netmask of the cluster interconnect. The following questions come up within the installation procedure if the default netmask question is answered with 'no'.

    Example scinstall:
           Is it okay to accept the default netmask (yes/no) [yes]? no
           Maximum number of nodes anticipated for future growth [64]? 4
           Maximum number of private networks anticipated for future growth [10]?
           Maximum number of virtual clusters expected [12]? 0
           What netmask do you want to use [255.255.255.128]?
    Prevent the issue by answering the virtual clusters question with '1', or with a higher value if future growth has to be considered.
    Do NOT answer the virtual clusters question with '0'!


    Example of the whole scinstall log when the corrupted CCR occurs:

    In the /etc/cluster/ccr/global/infrastructure file the error shows up as an empty entry for cluster.properties.private_netmask. Furthermore some other lines do not reflect the correct netmask values as chosen within scinstall.
    Wrong infrastructure file:
    cluster.state enabled
    cluster.properties.cluster_id 0x49F82635
    cluster.properties.installmode disabled
    cluster.properties.private_net_number 172.16.0.0
    cluster.properties.cluster_netmask 255.255.248.0
    cluster.properties.private_netmask
    cluster.properties.private_subnet_netmask 255.255.255.248
    cluster.properties.private_user_net_number 172.16.4.0
    cluster.properties.private_user_netmask 255.255.254.0

    cluster.properties.private_maxnodes 6
    cluster.properties.private_maxprivnets 10
    cluster.properties.zoneclusters 0
    cluster.properties.auth_joinlist_type sys

    If the virtual clusters question is answered with the value '1', then the correct netmask entries are:
    cluster.properties.cluster_id 0x49F82635
    cluster.properties.installmode disabled
    cluster.properties.private_net_number 172.16.0.0
    cluster.properties.cluster_netmask 255.255.255.128
    cluster.properties.private_netmask 255.255.255.128
    cluster.properties.private_subnet_netmask 255.255.255.248
    cluster.properties.private_user_net_number 172.16.0.64
    cluster.properties.private_user_netmask 255.255.255.224

    cluster.properties.private_maxnodes 6
    cluster.properties.private_maxprivnets 10
    cluster.properties.zoneclusters 1
    cluster.properties.auth_joinlist_type sys


    Workaround if the problem has already occurred:
    1.) Boot all nodes in non-cluster-mode with 'boot -x'
    2.) Change the wrong values of /etc/cluster/ccr/global/infrastructure on all nodes. See example above.
    3.) Write a new checksum for all infrastructure files on all nodes. Use -o (master file) on the node which is booting up first.
    scnode1 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure -o
    scnode2 # /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/global/infrastructure
    4.) First reboot scnode1 (master infrastructure file) into the cluster, then the other nodes.
    This is reported in bug 6825948.


    Update 17.Jun.2009:
    The -33 revision of the Sun Cluster core patch is the first released version which fixes this issue at installation time.
    126106-33 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-33 Sun Cluster 3.2: CORE patch for Solaris 10_x86

    Friday Apr 24, 2009

    Upgrade to Sun Cluster 3.2 1/09 Update2 and SUNWscr preremove script

    There is a missing/old preremove script in Sun Cluster 3.2 2/08 Update1 which is equivalent to the patches
    126106-12 until -19 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-12 until -19 Sun Cluster 3.2: CORE patch for Solaris 10_x86
    126105-12 until -19 Sun Cluster 3.2: CORE patch for Solaris 9

    This means the issue can occur in case of an upgrade (using scinstall -u) from Sun Cluster 3.2 to Sun Cluster 3.2 Update1 or Update2.
    More details are available in Missing preremove script in Sun Cluster 3.2 core patch revision 12 and higher.
    The issue: if the mentioned Sun Cluster core patches are installed, it is not possible to remove the SUNWscr package during the upgrade to Sun Cluster 3.2 1/09 Update2.

    The problem looks like this:
    # ./scinstall -u update
    Starting upgrade of Sun Cluster framework software
    Saving current Sun Cluster configuration
    Do not boot this node into cluster mode until upgrade is complete.
    Renamed "/etc/cluster/ccr" to "/etc/cluster/ccr.upgrade".
    ** Removing Sun Cluster framework packages **
        ...
        Removing SUNWscrtlh..done
        Removing SUNWscr.....failed
        scinstall: Failed to remove "SUNWscr"
        Removing SUNWscscku..done
        ...
    scinstall: scinstall did NOT complete successfully!



    Workaround:
    Before the upgrade to Sun Cluster 3.2 Update1/Update2 install the following patch which delivers a correct preremove script for Sun Cluster 3.2
    140016 Sun Cluster 3.2: CORE patch for Solaris 9
    140017 Sun Cluster 3.2: CORE patch for Solaris 10
    140018 Sun Cluster 3.2: CORE patch for Solaris 10_x86

    If one of the following patches is already installed, then the above patches are not necessary, because these patches also include a correct preremove script for the SUNWscr package (a quick check of the installed revision is shown after the list).
    126106-27 or higher Sun Cluster 3.2: CORE patch for Solaris 10
    126107-28 or higher Sun Cluster 3.2: CORE patch for Solaris 10_x86
    126105-26 or higher Sun Cluster 3.2: CORE patch for Solaris 9
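
    To check whether one of these revisions is already installed before starting the upgrade, a quick look at the patch list is enough (a sketch; 126106 is the Solaris 10 sparc core patch, use 126107 or 126105 accordingly):

    # showrev -p | grep 126106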

    This is reported in bugs 6676771 and 6747530 with further details.

    Wednesday Apr 08, 2009

    nested mounts may fail to mount in the correct order on Sun Cluster 3.2

    In case of Sun Cluster 3.2 it's possible that nested mounts will be mounted in the wrong order. As a result, the data on these file systems becomes inaccessible to users.

    The issue happens if one of the following Sun Cluster core patches is active and nested mounts are managed with the resource type SUNW.HAStoragePlus.
    126106-27 or -29 or -30 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-28 or -30 or -31 Sun Cluster 3.2: CORE patch for Solaris 10_x86
    126105-26 or -28 or -29 Sun Cluster 3.2: CORE patch for Solaris 9

    The error can look like this.
    The correct output of df -k should be:
    /dev/vx/dsk/datadg/vol01 480751 1048 431628 1% /test
    /dev/vx/dsk/datadg/vol02 288639 1042 258734 1% /test/test2
    /dev/vx/dsk/datadg/vol03 577295 1041 518525 1% /test/test3

    The mount order is defined in the HAStoragePlus resource test-rs
    # clrs show -v test-rs | grep FilesystemMountPoints
    FilesystemMountPoints: /test /test/test2 /test/test3

    But due to runtime problems the filesystems get mounted in the wrong order and the df -k output can look like:
    /dev/vx/dsk/datadg/vol02 480751 1048 431628 1% /test/test2
    /dev/vx/dsk/datadg/vol03 480751 1048 431628 1% /test/test3
    /dev/vx/dsk/datadg/vol01 480751 1048 431628 1% /test
    In this specific case, /test/test2 and /test/test3 were mounted first followed by an overlay mount of /test. Due to this, data in /test/test2 and /test/test3 would not be accessible and show the same information as /test.

    Workaround:
    It's possible to split the SUNW.HAStoragePlus resource. For the example above, change the resource test-rs and remove the FilesystemMountPoints /test/test2 and /test/test3. Then create a new resource test1-rs with the mentioned FilesystemMountPoints and add a resource dependency.
    The commands to change this specific configuration will be:
    # clrs set -p FilesystemMountPoints=/test test-rs
    # clrs create -g test-rg -t SUNW.HAStoragePlus -p FilesystemMountPoints=/test/test2,/test/test3 -p Resource_dependencies=test-rs -p AffinityOn=True test1-rs

    Due to this change the test1-rs starts after the test-rs and the problem is solved.
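
    To verify the new dependency after the change, the resource properties can be checked, e.g. (a sketch; resource names as in the example above):

    # clrs show -v test1-rs | grep Resource_dependencies
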
    Details available in:
    Alert 1020328.1 Nested Mounts Managed by a SUNW.HAStoragePlus Resource may Fail to Mount in the Correct Order on Solaris Cluster 3.2

    Update 17.Jun.2009:
    The -33 revision of the Sun Cluster core patch is the first released version which fixes this issue.
    126106-33 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-33 Sun Cluster 3.2: CORE patch for Solaris 10_x86

    Monday Mar 09, 2009

    memory leaks in "rgmd -z global" process

    A memory leak occurs in the "rgmd -z global" process on Sun Cluster 3.2 1/09 Update2. The global zone instance of the rgmd process leaks memory in most situations, e.g. when "scstat", "cluster show" and other basic commands are run. The problem is severe: the rgmd heap grows to a large size and crashes the Sun Cluster node.

    The issue only happens if one of the following Sun Cluster core patches is active.
    126106-27 or -29 or -30 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-28 or -30 or -31 Sun Cluster 3.2: CORE patch for Solaris 10_x86
    Because these patches are also part of the Sun Cluster 3.2 1/09 Update2 release, the issue also occurs on freshly installed Sun Cluster 3.2 1/09 Update2 systems.

    The error can look as follows:
    Analyze the growth of the memory allocation with prstat and pmap (or similar tools):
    # prstat
    3942 root 61M 11M sleep 101 - 0:00:02 0.7% rgmd/41
    Some time later the increase in memory allocation is visible:
    3942 root 61M 20M sleep 101 - 0:01:15 0.7% rgmd/41
    or
    # pmap -x <pid_of_rgmd-z_global> | grep heap
    00022000 47648 6992 6984 - rwx-- [ heap ]
    Some time later the increase in memory allocation is visible:
    00022000 47648 15360 15352 - rwx-- [ heap ]
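
    A simple way to watch the growth over time is a loop like the following (only a sketch; it assumes a single rgmd process in the global zone and an interval of 10 minutes):

    # while true; do date; pmap -x `pgrep -z global rgmd` | grep heap; sleep 600; done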

    When the memory is full the Sun Cluster node panics with the following message:
    Feb 25 07:59:23 node1 RGMD[1843]: [ID 381173 daemon.error] RGM: Could not allocate 1024 bytes; node is out of swap space; aborting node.
    ...
    Feb 25 08:10:05 node1 cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
    Feb 25 08:10:05 node1 unix: [ID 836849 kern.notice]
    Feb 25 08:10:05 node1 ^Mpanic[cpu0]/thread=2a100047ca0:
    Feb 25 08:10:05 node1 unix: [ID 562397 kern.notice] Failfast: Aborting zone "global" (zone ID 0) because "globalrgmd" died 30 seconds ago.
    Feb 25 08:10:06 node1 unix: [ID 100000 kern.notice]
    ...

    Update 20.Mar.2009:
    Available now:
    Alert 1020253.1 Memory Leak in the "rgmd" Process of Solaris Cluster 3.2 may Cause a failfast Panic

    Update 17.Jun.2009:
    The -33 revision of the Sun Cluster core patch is the first released version which fixes this issue.
    126106-33 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-33 Sun Cluster 3.2: CORE patch for Solaris 10_x86


    Workaround: Use the previous version -19 to prevent the issue.
    126106-19 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-19 Sun Cluster 3.2: CORE patch for Solaris 10_x86

    The issue is reported in bug 6808508 (description: scalable services coredump during the failover due to network failure). A fix is in progress. This blog will be updated when the fix is available.

    Monday Feb 16, 2009

    Sun Cluster 3.2 1/09 Update2 Patches

    The Sun Cluster 3.2 1/09 Update2 is released. Click here for further information.

    The package versions of Sun Cluster 3.2 1/09 Update2 are the same for the core framework and the agents as for Sun Cluster 3.2 and Sun Cluster 3.2 2/08 Update1. Therefore it's possible to patch up an existing Sun Cluster 3.2 or Sun Cluster 3.2 2/08 Update1 installation.

    The package versions of Sun Cluster Geographic Edition 3.2 1/09 Update2 are NOT the same as for Sun Cluster Geographic Edition 3.2. Therefore an upgrade is necessary for the Geographic Edition.
    But don't worry about that, because unlike core Sun Cluster 3.2 the Geographic Edition framework does not deliver updates through patches. The update can be done without interruption of the service. Click here for details.

    The following patches (with the mentioned revision) are included in Sun Cluster 3.2 1/09 Update2, so the complete list is a combination of the Sun Cluster 3.2 2/08 Update1 patches and this list. If these patches are installed on a Sun Cluster 3.2 or Sun Cluster 3.2 2/08 Update1 release, then the features for framework & agents are identical. It's always necessary to read the "Special Install Instructions" of a patch, but I made a note behind some patches where it's very important to read them (using the shortcut SIIOTP). Furthermore I made a note when a new resource type comes with the patch.

    New additional included patch revisions of Sun Cluster 3.2 1/09 Update2 for Solaris 10 05/08 update5 or higher
    126106-27 Sun Cluster 3.2: CORE patch for Solaris 10
    Note:
    Delivers SUNW.rac_udlm:3, SUNW.rac_framework:4, SUNW.crs_framework:2, SUNW.ScalMountPoint:3, SUNW.ScalDeviceGroup:3, SUNW.rac_svm:3, SUNW.rac_cvm:3 and SUNW.LogicalHostname:3 (but LogicalHostname was introduced in revision -17). Please read SIIOTP
    125514-05 Sun Cluster 3.2: Solaris Volume Manager (Mediator) Patch
    125992-03 Sun Cluster 3.2: SC Checks patch for Solaris 10
    126008-02 Sun Cluster 3.2: HA-DB Patch for Solaris 10
    126014-05 Sun Cluster 3.2: Ha-Apache Patch for Solaris 10
    126017-02 Sun Cluster 3.2: HA-DNS Patch for Solaris 10
    126020-04 Sun Cluster 3.2: HA-Containers Patch for Solaris 10 Note: Please read SIIOTP
    126023-03 Sun Cluster 3.2: Sun Cluster HA for Java Web Server, Patch for Solaris 10
    126026-02 Sun Cluster 3.2: HA-Kerberos Patch for Solaris 10
    126032-05 Sun Cluster 3.2: Ha-MYSQL Patch for Solaris 10 Note: Please read SIIOTP
    126035-05 Sun Cluster 3.2: HA-NFS Patch for Solaris 10
    126044-04 Sun Cluster 3.2: HA-PostgreSQL Patch for Solaris 10 Note: Please read SIIOTP
    126047-10 Sun Cluster 3.2: Ha-Oracle patch for Solaris 10 Note: Please read SIIOTP
    126050-03 Sun Cluster 3.2: HA-Oracle E-business suite Patch for Solaris 10
    126059-04 Sun Cluster 3.2: HA-SAPDB Patch for Solaris 10
    126062-06 Sun Cluster 3.2: HA-SAP-WEB-AS Patch for Solaris 10
    126068-05 Sun Cluster 3.2: HA-Sybase Patch for Solaris 10 Note: Please read SIIOTP
    126080-03 Sun Cluster 3.2: HA-Sun Java Systems App Server Patch for Solaris 10
    126083-02 Sun Cluster 3.2: HA-Sun Java Message Queue Patch for Solaris 10
    126092-03 Sun Cluster 3.2: HA-Websphere MQ Patch Note: Please read SIIOTP
    126095-05 Sun Cluster 3.2: Localization patch for Solaris 9 sparc and Solaris 10 sparc
    128556-03 Sun Cluster 3.2: Man Pages Patch for Solaris 9 and Solaris 10, sparc
    139921-02 Sun Cluster 3.2: JFreeChart patch for Solaris 10


    New additional included patch revisions of Sun Cluster 3.2 1/09 Update2 for Solaris 10 x86 05/08 update5 or higher
    126107-28 Sun Cluster 3.2: CORE patch for Solaris 10_x86
    Note:
    Delivers SUNW.rac_framework:4, SUNW.crs_framework:2, SUNW.ScalMountPoint:3, SUNW.ScalDeviceGroup:3, SUNW.rac_svm:3 and SUNW.LogicalHostname:3 (but LogicalHostname was introduced in revision -17). Please read SIIOTP
    125515-05 Sun Cluster 3.2: Solaris Volume Manager (Mediator) Patch
    125993-03 Sun Cluster 3.2: Sun Cluster 3.2: SC Checks patch for Solaris 10_x86
    126009-04 Sun Cluster 3.2: HA-DB Patch for Solaris 10_x86
    126015-06 Sun Cluster 3.2: HA-Apache Patch for Solaris 10_x86
    126018-04 Sun Cluster 3.2: HA-DNS Patch for Solaris 10_x86
    126021-04 Sun Cluster 3.2: HA-Containers Patch for Solaris 10_x86 Note: Please read SIIOTP
    126024-04 Sun Cluster 3.2: Sun Cluster HA for Java Web Server, Patch for Solaris 10_x86
    126027-04 Sun Cluster 3.2: HA-Kerberos Patch for Solaris 10_x86
    126033-06 Sun Cluster 3.2: Ha-MYSQL Patch for Solaris 10_x86 Note: Please read SIIOTP
    126036-06 Sun Cluster 3.2: HA-NFS Patch for Solaris 10_x86
    126045-05 Sun Cluster 3.2: HA-PostgreSQL Patch for Solaris 10_x86 Note: Please read SIIOTP
    126048-10 Sun Cluster 3.2: Ha-Oracle patch for Solaris 10_x86 Note: Please read SIIOTP
    126060-05 Sun Cluster 3.2: HA-SAPDB Patch for Solaris 10_x86
    126063-07 Sun Cluster 3.2: HA-SAP-WEB-AS Patch for Solaris 10_x86
    126069-04 Sun Cluster 3.2: HA_Sybase Patch for Solaris 10_x86 Note: Please read SIIOTP
    126081-04 Sun Cluster 3.2: HA-Sun Java Systems App Server Patch for Solaris 10_x86
    126084-04 Sun Cluster 3.2: HA-Sun Java Message Queue Patch for Solaris 10_x86
    126093-05 Sun Cluster 3.2: HA-Websphere MQ Patch for Solaris 10_x86 Note: Please read SIIOTP
    126096-05 Sun Cluster 3.2: Localization patch for Solaris 10 amd64 ??
    128557-03 Sun Cluster 3.2: Man Pages Patch for Solaris 10_x86
    139922-02 Sun Cluster 3.2: JFreeChart patch for Solaris 10_x86


    New additional included patch revisions of Sun Cluster 3.2 1/09 Update2 for Solaris 9 8/05 update8 or higher
    126105-26 Sun Cluster 3.2: CORE patch for Solaris 9
    Note:
    Delivers SUNW.rac_udlm:3, SUNW.rac_framework:4, SUNW.crs_framework:2, SUNW.ScalMountPoint:3, SUNW.ScalDeviceGroup:3, SUNW.rac_svm:3, SUNW.rac_cvm:3 and SUNW.LogicalHostname:3 (but LogicalHostname was introduced in revision -18). Please read SIIOTP
    125513-04 Sun Cluster 3.2: Solaris Volume Manager (Mediator) Patch
    125991-03 Sun Cluster 3.2: Sun Cluster 3.2: SC Checks patch for Solaris 9
    126007-02 Sun Cluster 3.2: HA-DB Patch for Solaris 9
    126013-05 Sun Cluster 3.2: HA-Apache Patch for Solaris 9
    126016-02 Sun Cluster 3.2: HA-DNS Patch for Solaris 9
    126022-03 Sun Cluster 3.2: Sun Cluster HA for Java Web Server, Patch for Solaris 9
    126031-05 Sun Cluster 3.2: Ha-MYSQL Patch for Solaris 9 Note: Please read SIIOTP
    126034-05 Sun Cluster 3.2: HA-NFS Patch for Solaris 9
    126043-04 Sun Cluster 3.2: HA-PostgreSQL Patch for Solaris 9 Note: Please read SIIOTP
    126046-10 Sun Cluster 3.2: HA-Oracle patch for Solaris 9 Note: Please read SIIOTP
    126049-03 Sun Cluster 3.2: HA-Oracle E-business suite Patch for Solaris 9
    126058-04 Sun Cluster 3.2: HA-SAPDB Patch for Solaris 9
    126061-06 Sun Cluster 3.2: HA-SAP-WEB-AS Patch for Solaris 9
    126067-05 Sun Cluster 3.2: HA-Sybase Patch for Solaris 9 Note: Please read SIIOTP
    126079-03 Sun Cluster 3.2: HA-Sun Java Systems App Server Patch for Solaris 9
    126082-02 Sun Cluster 3.2: HA-Sun Java Message Queue Patch for Solaris 9
    126091-03 Sun Cluster 3.2: HA-Websphere MQ Patch Note: Please read SIIOTP
    126095-05 Sun Cluster 3.2: Localization patch for Solaris 9 sparc and Solaris 10 sparc
    128556-03 Sun Cluster 3.2: Man Pages Patch for Solaris 9 and Solaris 10, sparc
    139920-02 Sun Cluster 3.2: JFreeChart patch for Solaris 9


    The quorum server is an alternative to the traditional quorum disk. The quorum server runs outside of the Sun Cluster and is accessed through the public network. Therefore the quorum server can be on a different architecture.

    Included patch revisions in Sun Cluster 3.2 1/09 Update2 for quorum server feature:
    127404-02 Sun Cluster 3.2: Quorum Server Patch for Solaris 9
    127405-03 Sun Cluster 3.2: Quorum Server Patch for Solaris 10
    127406-03 Sun Cluster 3.2: Quorum Server Patch for Solaris 10_x86


    If some patches must be applied when the node is in noncluster mode, you can apply them in a rolling fashion, one node at a time, unless a patch's instructions require that you shut down the entire cluster. Follow procedures in How to Apply a Rebooting Patch (Node) in Sun Cluster System Administration Guide for Solaris OS to prepare the node and boot it into noncluster mode. For ease of installation, consider applying all patches at once to a node that you place in noncluster mode.
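
    As a sketch of the per-node flow (the patch id and location are only examples; always follow the procedure referenced above):

    node1# scswitch -S -h node1         # evacuate all resource and device groups from node1
    node1# shutdown -g0 -y -i0
    ok boot -x                          # boot node1 into noncluster mode
    node1# patchadd /var/tmp/126106-33  # apply the patch(es) that require noncluster mode
    node1# init 6                       # reboot node1 back into the cluster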

    Information about patch management available at Oracle Enterprise Manager Ops Center.

    Thursday Feb 12, 2009

    scalable service does not failover after network outage

    If a network outage occurs on the IPMP group which is used by the scalable resource group, then the scalable resource can NOT fail over to the other host.

    The issue only happens if one of the following Sun Cluster core patches is active.
    126106-27 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-28 Sun Cluster 3.2: CORE patch for Solaris 10_x86
    126105-26 Sun Cluster 3.2: CORE patch for Solaris 9
    Because these patches are also part of the Sun Cluster 3.2 1/09 Update2 release, the issue also occurs on freshly installed Sun Cluster 3.2 1/09 Update2 systems.

    The error can look as follows:
    Feb 10 16:56:51 node1 in.mpathd[174]: NIC failure detected on e1000g0 of group ipmp0
    Feb 10 16:56:51 node1 in.mpathd[174]: Successfully failed over from NIC e1000g0 to NIC e1000g4
    Feb 10 16:57:18 node1 in.mpathd[174]: All Interfaces in group ipmp0 have failed
    Feb 10 16:57:19 node1 SC[SUNW.apache:4.1,apache-rg,apache-rs,SSM_IPMP_CALLBACK]: IPMP group ipmp0 has failed, so scalable resource apache-rs in resource group apache-rg may not be able to respond to client requests. A request will be issued to relocate resource apache-rs off of this node.
    Feb 10 16:57:23 node1 genunix: NOTICE: core_log: ssm_ipmp_callbac[2130] core dumped: /var/core/core.ssm_ipmp_callbac.2130.1227135261.0

    Update 7.Apr.2009:
    Solution: The bug 6774504 is fixed in
    126106-28 or higher Sun Cluster 3.2: CORE patch for Solaris 10
    126107-29 or higher Sun Cluster 3.2: CORE patch for Solaris 10_x86
    126105-27 or higher Sun Cluster 3.2: CORE patch for Solaris 9
    But the mentioned releases of the patches still have trouble with the rgmd process. Please refer to memory leaks in "rgmd -z global" process.

    Workaround: Use the previous version -19 if using scalable services
    126106-19 Sun Cluster 3.2: CORE patch for Solaris 10
    126107-19 Sun Cluster 3.2: CORE patch for Solaris 10_x86
    126105-19 Sun Cluster 3.2: CORE patch for Solaris 9

    The issue is reported in bug 6774504 (description: scalable services coredump during the failover due to network failure). A fix is in progress. This blog will be updated when the fix is available.

    Wednesday Jan 28, 2009

    private interconnect and patch 138888/138889


    In specific Sun Cluster 3.x configurations a cluster node can not join the cluster. Most of the time this issue comes up after the installation of kernel update patch
    138888-01 until 139555-08 or higher SunOS 5.10: Kernel Patch OR
    138889-01 until 139556-08 or higher SunOS 5.10_x86: Kernel Patch
    AND
    Sun Cluster 3.x using an Ethernet switch (with VLAN) for the private interconnect
    AND
    Sun Cluster 3.x using e1000g, nxge, bge or ixgb (GLDv3) interfaces for the private interconnect.

    The issue looks similar to the following messages during the boot up of the cluster node.
    ...
    Jan 25 15:46:14 node1 genunix: [ID 279084 kern.notice] NOTICE: CMM: node reconfiguration #2 completed.
    Jan 25 15:46:15 node1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g1 constructed
    Jan 25 15:46:15 node1 ip: [ID 856290 kern.notice] ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
    Jan 25 15:46:16 node1 genunix: [ID 884114 kern.notice] NOTICE: clcomm: Adapter e1000g3 constructed
    Jan 25 15:47:15 node1 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path node1:e1000g1 - node2:e1000g1 errors during initiation
    Jan 25 15:47:15 node1 genunix: [ID 618107 kern.warning] WARNING: Path node1:e1000g1 - node2:e1000g1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    Jan 25 15:47:16 node1 genunix: [ID 604153 kern.notice] NOTICE: clcomm: Path node1:e1000g3 - node2:e1000g3 errors during initiation
    Jan 25 15:47:16 node1 genunix: [ID 618107 kern.warning] WARNING: Path node1:e1000g3 - node2:e1000g3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
    ...
    Jan 25 16:33:51 node1 genunix: [ID 224783 kern.notice] NOTICE: clcomm: Path node1:e1000g1 - node2:e1000g1 has been deleted
    Jan 25 16:33:51 node1 genunix: [ID 638544 kern.notice] NOTICE: clcomm: Adapter e1000g1 has been disabled
    Jan 25 16:33:51 node1 genunix: [ID 224783 kern.notice] NOTICE: clcomm: Path node1:e1000g3 - node2:e1000g3 has been deleted
    Jan 25 16:33:51 node1 genunix: [ID 638544 kern.notice] NOTICE: clcomm: Adapter e1000g3 has been disabled
    Jan 25 16:33:51 node1 ip: [ID 856290 kern.notice] ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast

    Update 6.Mar.2009:
    Available now:
    Alert 1020193.1 Kernel Patches/Changes may Stop Sun Cluster Nodes From Joining the Cluster

    Update 26.Jun.2009:
    The issue is fixed in the patches
    141414-01 or higher SunOS 5.10: kernel patch OR
    137104-02 or higher SunOS 5.10_x86: dls patch

    Both patches require the 13955[56]-08 kernel update patch which is included in Solaris 10 5/09 update7. If using Solaris 10 5/09 update7 then Sun Cluster 3.2 requires the Sun Cluster core patch in revision -33 or higher. So, to get this one fixed it's recommended to use Solaris 10 5/09 update7 (patch 13955[56]-8 or higher & 141414-01(sparc) or 137104-02(x86)) with the Sun Cluster 3.2 core patch -33 or higher.
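
    A quick way to check the current state of a node is to look at the running kernel patch level and at the fix patches (a sketch):

    # uname -v                               # running kernel patch level, e.g. Generic_139555-08
    # showrev -p | egrep '141414|137104'     # is the fix already installed?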


    Choose one of the corrective actions (if you do not install the patches with the fix):
    • Before installing the mentioned patches, configure VLAN tagging on the Sun interface and on the switch. This makes VLAN tagged packets expected and prevents drops. This means the interface name moves to e.g. e1000g810000. After the configuration change to e.g. e1000g810000 it's recommended to reboot the Sun Cluster hosts. Configuration details.

    • If using the above mentioned kernel update patch, enable QoS (Quality of Service) on the Ethernet switch. The switch should be able to handle priority tagging. Please refer to the switch documentation because each switch is different.

    • Do not install the above mentioned kernel update patch if using VLAN in Sun Cluster 3.x private interconnect.

    The mentioned kernel update patch delivers some new features in the GLDv3 architecture. It makes packets 802.1q standard compliant by including priority tagging. Therefore the following Sun Cluster 3.x configurations should not be affected:
    * Sun Cluster 3.x which uses ce, ge, hme, qfe, ipge or ixge network interfaces.
    * Sun Cluster 3.x which has back-to-back connections for the private interconnect.
    * Sun Cluster 3.x on Solaris 8 or Solaris 9.

    Sunday Jan 18, 2009

    ce_taskq_disable and Sun Cluster 3.x


    The /etc/system variable "set ce:ce_taskq_disable=1" is always under discussion with Sun Cluster 3.x. Now some new features are available which make this value unnecessary. If the following conditions are met, then remove "set ce:ce_taskq_disable=1" from /etc/system.


    Overview: There are two enhancements which have been implemented:

    6281341: ce_taskq_disable should be able to set on per instance basis. (fixed in solaris patches)
    6487117: Sun Cluster should automatically request for intr mode RX processing for private interconnects. (fixed in Sun Cluster patches)

    These enhancements are integrated in
    Solaris 10:
    118777-12 SunOS 5.10: Sun GigaSwift Ethernet 1.0 driver patch (bundled in Solaris 10 5/08 Update5 onwards)
    125915-01 SunOS 5.10: dlpi.h patch 125915-01 Obsoleted by: 128004-01 SunOS 5.10: dlpi.h patch
    128004-01 SunOS 5.10: headerfile patch (bundled in Solaris 10 5/08 Update5 onwards)
    120500-20 Sun Cluster 3.1: Core Patch for Solaris 10
    or
    126106-18 Sun Cluster 3.2: CORE patch for Solaris 10

    Solaris 10_x86:
    118778-11 SunOS 5.10_x86: Sun GigaSwift Ethernet 1.0 driver patch (bundled in Solaris 10 5/08 Update5 onwards)
    125916-01 SunOS 5.10_x86: dlpi.h patch 125916-01 Obsoleted by: 128005-01 SunOS 5.10_x86: dlpi.h patch
    128005-01 SunOS 5.10_x86: headerfile patch (bundled in Solaris 10 5/08 Update5 onwards)
    120501-20 Sun Cluster 3.1: Core Patch for Solaris 10_x86
    or
    126107-18 Sun Cluster 3.2: CORE patch for Solaris 10_x86

    Solaris 9:
    112817-32 SunOS 5.9: Sun GigaSwift Ethernet 1.0 driver patch
    126849-01 SunOS 5.9: patch usr/include/sys/dlpi.h
    117949-35 Sun Cluster 3.1: Core Patch for Solaris 9
    or
    126105-18 Sun Cluster 3.2: CORE patch for Solaris 9

    Note: There are NO patches for Solaris 8 and Solaris 9_x86 because the 6281341 is not backported to these releases.

    Overall recommendation for Solaris 10: Due to some other issues it makes sense to use Solaris 10 10/08 Update6 (patches are bundled) instead of Solaris 10 5/08 Update5 and the mentioned Sun Cluster 3.x core patch.
    This is especially due to Alert 1019642.1: Failure to run clock thread may lead to a system hang.


    Otherwise, if you cannot install these patches, the workaround is:

    If using supported network adapters which use the *ce* network driver for the private interconnect, uncomment (activate) in /etc/system:
    set ce:ce_taskq_disable=1
    The Sun Cluster installation automatically adds this value to the /etc/system file.
    Additionally, consider using the following settings in case of performance issues in the public network. Beware: this tuning always depends on the network infrastructure!
    set ce:ce_ring_size=1024
    set ce:ce_comp_ring_size=4096

    Note: If using the *ce* network driver only for the public network, the default value of ce_taskq_disable=0 is ok.
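
    To see which value is currently active on a running system, the kernel variable can be inspected (a sketch; only meaningful while the ce driver is loaded):

    # echo "ce_taskq_disable/D" | mdb -k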


    Need to know:
    In case of Sun Cluster 3.1 and 3.2 remove the following entry from /etc/system if active.
    set ce:ce_reclaim_pending=1
    This value was only necessary for Sun Cluster 3.0.


    Further details are available in Technical Instructions 1017839.1

    Friday Jan 16, 2009

    blogging T-shirt




    Great, I won a T-Shirt.

    Thank you very much. I got this T-Shirt because you are reading this blog. This is due to the fact that the hit rate of this blog increased last year in fall. I try to keep the quality...
    See also "I'm Blogging This for Sun"

    Thursday Dec 11, 2008

    Tips to configure IPMP with Sun Cluster 3.x

    Configure IPMP (probe based or link based):
    Set up IPMP (IP network multipathing) groups on all nodes for all public network interfaces which are used for an HA dataservice. This article describes a summary of possibilities and known issues. An overview of IPMP can be found in the System Administration Guide: IP Services.


    Example probe-based IPMP group active-active with interfaces qfe0 and qfe4 with one production IP:

    Entry of /etc/hostname.qfe0:
    <production_IP_host> netmask + broadcast + group ipmp1 up \
    addif <test_IP_host> netmask + broadcast + deprecated -failover up

    Entry of /etc/hostname.qfe4:
    <test_IP_host> netmask + broadcast + group ipmp1 deprecated -failover up
    The IPMP group name ipmp1 is freely chosen in this example!

    If the defaultrouter is NOT 100% available please read
    Technical Instruction 1010640.1: Summary of typical IPMP Configurations
    and
    Technical Instruction 1001790.1: The differences between Network Adapter Failover (Sun Cluster 3.0) and IP Multipathing (Sun Cluster 3.1)

    Notes:
    * Do not use the test IP for normal applications.
    * When using Solaris 9 12/02 or later & Sun Cluster 3.1 Update1 or later there is no need for an IPMP test address if you have only 1 IP address in the IPMP group. (RFE 4511634, 4741473)
      e.g. /etc/hostname.qfe0 entry:
        <production_IP_host> netmask + broadcast + group ipmp1 up
    * The test IPs of all adapters in the same IPMP group must belong to a single IP subnet.


    Example link-based IPMP group active-active with interfaces qfe0 and qfe4 with one production IP:

    Entry of /etc/hostname.qfe0:
    <production_IP_host> netmask + broadcast + group ipmp1 up
    Entry of /etc/hostname.qfe4:
    <dummy_IP_host> netmask + broadcast + deprecated group ipmp1 up

    Notes:
    * Do NOT use the 0.0.0.0 IP address as dummy_IP_host for link based IPMP due to bug 6457375.
    * In this case the recommendation is to use a valid IP address, but it can also be a dummy IP address.

    The bug 6457375 is fixed in kernel update patch 138888-01 (sparc) or 138889-01 (x86). These kernel patches are based on Solaris 10 10/08 Update6. Now it's possible to use the 0.0.0.0 IP address as described in the following example:
    Entry of /etc/hostname.qfe0:
    <production_IP_host> netmask + broadcast + group ipmp1 up
    Entry of /etc/hostname.qfe4:
    group ipmp1 up

    Further Details:
    Technical Instruction 1008064.1: IPMP Link-based Only Failure Detection with Solaris 10 Operating System (OS)
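
    Whichever variant is used, the group membership can be checked quickly on each node, e.g. (a sketch; interfaces and group name as in the examples above):

    # ifconfig qfe0 | grep groupname
    # ifconfig qfe4 | grep groupname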


    Hints / Checkpoints for all configurations:
    • You need an additional IP for each logical host.

    • If there is a firewall being used between clients and a HA service running on this cluster, and if this HA service is using UDP and does not bind to a specific address, the IP stack chooses the source address for all outgoing packets from the routing table. So, as there is no guarantee that the same source address is chosen for all packets - the routing table might change - it is necessary to configure all addresses available on a network interface as valid source addresses in the firewall. More details can be found in the blog Why a logical IP is marked as DEPRECATED?

    • IPMP groups as active-standby configuration is also possible.

    • In the /etc/default/mpathd file, the value of TRACK_INTERFACES_ONLY_WITH_GROUPS must be yes (default).

    • In case of Sun Cluster 3.1 the FAILBACK value in the /etc/default/mpathd file must be yes (default). Bug 6429808. Fixed in Sun Cluster 3.2.

    • Use only one IPMP group per subnet. It's not supported to use more than one IPMP group in the same subnet.

    • The SC installer adds an IPMP group to all public network adapters. If desired remove the IPMP configuration for network adapters that will NOT be used for HA dataservices.

    • Remove IPMP groups from dman interfaces (SunFire 12/15/20/25K) if they exist. (Bug 6309869)

    Thursday Oct 02, 2008

    Sun Cluster 3.2 and VxVM 5.0 patch 124361-06

    There are some issues after you have installed the

    Patch-ID# 124361-06
    Synopsis: VRTSvxvm 5.0_MP1_RP5: Rolling Patch 5 for Volume Manager 5.0 MP1


    This patch changes the handling of VxVM devices, which leads to conflicts with Sun Cluster 3.2.

    Seen errors:
    a)
    host0 # scswitch -z -D testdg -h host1
    Sep 26 10:20:17 host0 Cluster.CCR: build_devlink_list: readlink failed for /dev/vx/dsk//global/.devices/node@1/dev/vx/dsk/testdg: No such file or directory

    Sep 26 10:23:41 host0 SC[SUNW.HAStoragePlus:6,test-rg,test-hastp-rs,hastorageplus_prenet_start]: Failed to analyze the device special file associated with file system mount point /test/data/AB: No such file or directory


    b)
    host0 # clrg create test-rg
    host0 # clresource create -g test-rg -t SUNW.HAStoragePlus -p FileSystemMountPoints="/testdata" test-rs
    clresource: host1 - Failed to analyze the device special file associated with file system mount point /testdata: No such file or directory.

    clresource: (C189917) VALIDATE on resource test-rs, resource group test-rg, exited with non-zero exit status.
    clresource: (C720144) Validation of resource test-rs in resource group test-rg on node node1 failed.
    clresource: (C891200) Failed to create resource "test-rs".


    On the other node:
    host1# Sep 26 14:27:38 host1 SC[SUNW.HAStoragePlus:4,test-rg,test-rs,hastorageplus_validate]: Failed to analyze the device special file associated with file system mount point /testdata: No such file or directory.
    Sep 26 14:27:38 host1 Cluster.RGM.rgmd: VALIDATE failed on resource <test-rs>, resource group <test-rg>, time used: 0% of timeout <1800, seconds>

    Workaround:
    Do not install patch 124361-06; use patch 124361-05.
    Important if the patch is already installed: before backing out 124361-06, ensure that Solaris 10 patch 125731-02 is installed to avoid bug 6622037.
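
    If the patch is already installed and needs to be backed out, the sequence is roughly (a sketch; verify the prerequisite first as noted above):

    # showrev -p | grep 125731      # 125731-02 or higher must be installed
    # patchrm 124361-06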


    Update 10.Oct.2008: The new patch 122058-11 is released, which fixes the problem and obsoletes 124361-06.
    Patch-ID# 122058-11
    Synopsis: VRTSvxvm 5.0MP3: Maintenance Patch for Volume Manager 5.0


    Update 24.Oct.2008:
    Basically the problems all arise when 124361-06 is installed and a VxVM volume is created on a Sun Cluster configuration. With patch 124361-06, when a VxVM volume is created it creates the special device under /devices and then we have symbolic links under /dev/vx/[r]dsk/<dg>/ that point to the /devices entries. This behaviour does not happen when 122058-11 is installed; the special files are created under /dev/vx/[r]dsk/<dg>/ and NOT under /devices.

    Check if the devices are correct. It's quite important that NO links exist in the mentioned directory of a device group. Two workarounds are available if the wrong links exist. Be sure that 122058-11 is already installed and the volumes are inactive.

    Workaround1:
    node1# cd /global/.devices/node@1/dev/vx/dsk/testdg
    node1# ls -l
    total 4
    lrwxrwxrwx 1 root root 46 Oct 15 16:27 vol01 -> /devices/pseudo/vxio@0:testdg,vol01,59000,blk
    lrwxrwxrwx 1 root root 46 Oct 15 16:27 vol02 -> /devices/pseudo/vxio@0:testdg,vol02,59001,blk
    node1# rm vol01 vol02
    node1# cd /global/.devices/node@1/dev/vx/rdsk/testdg
    node1# ls -l
    total 4
    lrwxrwxrwx 1 root root 46 Oct 15 16:27 vol01 -> /devices/pseudo/vxio@0:testdg,vol01,59000,raw
    lrwxrwxrwx 1 root root 46 Oct 15 16:27 vol02 -> /devices/pseudo/vxio@0:testdg,vol02,59001,raw
    node1# rm vol01 vol02
    node1#
    node1# cldg sync testdg
    node1#
    node1# ls -l /global/.devices/node@1/dev/vx/dsk/testdg
    total 0
    brw------- 1 root root 282, 59000 Oct 15 16:32 vol01
    brw------- 1 root root 282, 59001 Oct 15 16:32 vol02
    node1# ls -l /global/.devices/node@1/dev/vx/rdsk/testdg
    total 0
    crw------- 1 root root 282, 59000 Oct 15 16:32 vol01
    crw------- 1 root root 282, 59001 Oct 15 16:32 vol02

    Workaround2:
    If symlinks exist, remove the symbolic links
    node1# rm /dev/vx/[r]dsk/testdg/symlink
    and then recreate the special files using the cluster command
    node1# /usr/cluster/lib/dcs/scvxvmlg

    Afterwards, to be safe, a reconfiguration boot of all nodes is recommended.

    Update 30.Oct.2008:
    Available now:
    Alert 1019694.1 Sun Cluster Resource "HAstoragePlus" May Fail if Veritas Volume Manager Patch 124361-06 is Installed

    Tuesday Aug 19, 2008

    SAS/SATA HBA 375-3487 for Sun Storage J4200/J4400

    The Sun Storage J4x00 arrays were released a few days ago.

    At this time the following SAS/SATA HBA is necessary for the J4200 and J4400:
    Option: SG-XPCIE8SAS-E-Z
    Part number: 375-3487
    Codename: Pandora
    - Details in System Handbook
    - Sun StorageTek PCI Express SAS 8-Channel Internal HBA Installation Guide

    IMPORTANT: Revision -02 of the Pandora HBA is a requirement.
    This means 375-3487-02 for the J4200/J4400; the 375-3487-01 can only be used with the ST2530.
    See also Sun Storage J4200/J4400 Array Release Notes

    How to identify the Revision?
    1.) Hardware: Look at the part number label on the HBA itself.
    2.) Solaris: Run
    # prtpicl -v > /tmp/prtpicl.out
    # vi /tmp/prtpicl.out
    Search for
    subsystem-id with 0x3150
    verify
    device-id is 0x58 and
    vendor-id is 0x1000
    revision-id of 0x2 == 375-3487-01
    revision-id of 0x8 == 375-3487-02
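
    Instead of searching the whole prtpicl output manually, a quick filter can help (a sketch; the output still has to be matched to the HBA entry with subsystem-id 0x3150):

    # prtpicl -v | egrep 'subsystem-id|device-id|vendor-id|revision-id'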



    Sample output of 375-3487-01 in X4600 with Solaris 10 5/08 x86:
    ...
        pci1000,3150 (obp-device, a40000098c)
         :DeviceID 0
         :UnitAddress 4,4
         :pcie-capid-reg 0x1
         :pcie-capid-pointer 0x68
         :pci-msi-capid-pointer 0x98
         :pci-msix-capid-pointer 0xb0
         :device-id 0x58
         :vendor-id 0x1000
         :revision-id 0x2
         :class-code 0x10000
         :unit-address 0
         :subsystem-id 0x3150
         :subsystem-vendor-id 0x1000
         :interrupts 0x1
         :devsel-speed 0
         :power-consumption 01 00 00 00 01 00 00 00
          :model SCSI bus controller
          :compatible (a4000009adTBL)
          | pciex1000,58.1000.3150.2 |
          | pciex1000,58.1000.3150 |
          | pciex1000,58.2 |
          | pciex1000,58 |
          | pciexclass,010000 |
          | pciexclass,0100 |
          | pci1000,58.1000.3150.2 |
          | pci1000,58.1000.3150 |
          | pci1000,3150 |
          | pci1000,58.2 |
          | pci1000,58 |
          | pciclass,010000 |
          | pciclass,0100 |
    ...


    Patch 125081-16 (sparc) or 125082-16 (x86) are required. They are embedded in Solaris 10 5/08 Update5.

    Wednesday Aug 06, 2008

    Sun SPARC Enterprise Mx000 with active bge interface

    The Sun SPARC Enterprise Server M4000, M5000, M8000 or M9000 can sporadically hang at boot time
    a) if the system is part of Sun Cluster
    and
    b) if the system has a configured bge network interface


    Example of boot hang:
    ...
    Booting as part of a cluster
    NOTICE: CMM: Node node1 (nodeid = 1) with votecount = 1 added.
    NOTICE: CMM: Node node2 (nodeid = 2) with votecount = 1 added.
    NOTICE: CMM: Quorum device 2 (/dev/did/rdsk/d7s2) added; votecount = 5, bitmask of nodes with configured paths = 0x3f.
    NOTICE: clcomm: Adapter bge3 constructed
    ... now the system hang at this point ...


    Solution: Install revision 138042-02 (or higher) of the SunOS 5.10 MAC patch.

    Monday Jun 30, 2008

    prevent reservation conflict panic if using active/passive storage controller

    Reservation conflicts can happen in a Sun Cluster environment if using active/passive storage controllers e.g. SE6540, SE6140, FLX380.

    First of all, you should always consider disabling the auto-failback flag if using MPxIO on shared devices. This can also prevent reservation conflict panics.

    Change the auto-failback value in /kernel/drv/scsi_vhci.conf to disable.
    e.g. /kernel/drv/scsi_vhci.conf:
    ...
    # Automatic failback configuration
    # possible values are auto-failback="enable" or auto-failback="disable"
    auto-failback="disable";
    ...


    Furthermore the reservation conflict panic was seen when one cluster node is down and the shared storage array made some (at least 2 or 3) failovers between the active/passive controllers. The behavior always depends on the design of the storage array controller.

    Two workarounds are available at the moment:

    1.) In case of Sun Cluster 3.2, force the cluster to do scsi3 reservations even in 2 node cluster configurations. If you have 3 or more nodes, the cluster should do scsi3 reservations anyway.

    Be aware of Alert 1019005.1. In case of SE6540/SE6140/FLX380 use firmware 6.60.11.xx (which is part of CAM 6.1) or higher. To avoid trouble update this code before enabling SCSI3 reservations.

    To force the Sun Cluster 3.2 to do scsi3 reservations run the command:
    # cluster set -p global_fencing=prefer3

    Verify the setting using:
    # cluster show | grep -i scsi
       Type:                       scsi
       Access Mode:        scsi3


    2.) Allow Reservation on Unowned LUNs in SE6540/SE6140. You should prefer workaround #1, but in case of Sun Cluster 3.1 you cannot force the scsi3 reservation mechanism for 2 node clusters, so there is a need to use scsi2 reservations.

    The bit "Allow Reservation on Unowned LUNs" determines the controller response to Reservation/Release commands that are received for LUNs that are not owned by the controller. The value needs to be changed from 0x01 to 0x00. Beware this setting will be lost after a NVSRAM update!

    Using CAM management software do the following:
    # cd /opt/SUNWsefms/bin/

    For 6540/FLX380/FLX240/FLX280 run:
    # ./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=0x02

    For 6140 and 6130 run:
    # ./service -d -c set -q nvsram region=0xf2 offset=0x19 value=0x00 host=0x00

    Reboot both controllers in order to make the change active:
    # ./service -d -c reset -t a

    Wait at least 5 minutes until the A controller is up again.
    # ./service -d -c reset -t b


    Why did this not happen before? With the changes in patch 125081-14 (sparc) or 125082-14 (x86) Sun delivers a new MPxIO driver. Due to these changes the problem can be triggered.

    Tuesday May 27, 2008

    Missing preremove script in Sun Cluster 3.2 core patch revision 12 and higher.

    In my last blog I stated that the Sun Cluster 3.2 GA release with the -12 Sun Cluster core patch is the same as Sun Cluster 3.2 2/08 aka Update1. This is still true, but the preremove script of the SUNWscr package is missing in the Sun Cluster 3.2 core patch revision -12 and higher. This is documented as internal bug 6676771. Therefore it's NOT possible to remove the SUNWscr package when revision -12 or higher of the Sun Cluster 3.2 core patch is installed. (Lower revisions of the core patches are NOT affected.) The removal of the SUNWscr package is necessary in case of an upgrade using the command "scinstall -u update".


    The fastest workaround is described in the special install instructions of the Sun Cluster core patch revision -12:
    NOTE 5: After removing this patch, remove the SunCluster smf service for
            service tag.
            svcadm disable /system/cluster/sc_svtag:default
            svccfg delete /system/cluster/sc_svtag:default
    Execute these commands before the start of Sun Cluster upgrade.


    To fix the issue immediately, it's possible to change the preremove script of the SUNWscr package. At the moment the preremove script of SUNWscr is NOT delivered with the Sun Cluster core patch, therefore the workaround is persistent.

    Add the following to the /var/sadm/pkg/SUNWscr/install/preremove script (version 1.3):

    1.) New subroutine (before the main part)
    remove_svtag()
    {
          STCLIENT=${BASEDIR}/usr/bin/stclient
          CL_URN_FILE=${BASEDIR}/var/sadm/servicetag/cl.urn
          if [ -f ${CL_URN_FILE} ]; then
             # read the urn from the file
             URN=`cat ${CL_URN_FILE}`
             if [ -f ${STCLIENT} ]; then
             ${STCLIENT} -d -i ${URN} >/dev/null 2>&1
             fi
             rm -f ${CL_URN_FILE}
          fi
          return 0
    }


    2.) In the part of SVCADMD="/usr/sbin/svcadm disable -s" add
    $SVCADMD svc:/system/cluster/sc_svtag:default

    3.) Near the end of main routine before the line "if [ ${errs} -ne 0 ]; then" add
    # Remove service tag for cluster
    remove_svtag || errs=`expr ${errs} + 1`


    Or download the new preremove version 1.5 script for SUNWscr package and replace the 1.3 version.
    # cd /var/sadm/pkg/SUNWscr/install/
    # cp preremove preremove.old
    # cp preremove_version1.5_SUNWscr preremove

    Thursday Apr 10, 2008

    Sun Cluster 3.2 2/08 Update1 Patches

    The Sun Cluster 3.2 2/08 Update1 is released. Click here for further information.

    The package versions of Sun Cluster 3.2 2/08 Update1 are the same for the core framework and the agents as for Sun Cluster 3.2. Therefore it's possible to patch up an existing Sun Cluster 3.2 installation.

    The package versions of Sun Cluster Geographic Edition 3.2 2/08 Update1 are NOT the same as for Sun Cluster Geographic Edition 3.2. Therefore an upgrade is necessary for the Geographic Edition.

    The following patches (with the mentioned revisions) are included in Sun Cluster 3.2 2/08 Update1. If these patches are installed on a Sun Cluster 3.2 release, then the features for framework & agents are identical. It is always necessary to read the "Special Install Instructions" of a patch, but I added a note behind those patches where reading them is especially important (using the shortcut SIIOTP). Furthermore I added a note when a new resource type comes with a patch.
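
    To compare the lists below against an installed system, the patch revision can be checked with showrev or patchadd. A small sketch, using the SPARC core patch as an example; substitute any patch ID from the lists:

    # showrev -p | grep 126106
    # patchadd -p | grep 126106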

    Included patch revisions in Sun Cluster 3.2 2/08 Update1 for Solaris 10 11/06 update3 or higher
    125511-02 Sun Cluster 3.2: Core Patch for Solaris 10
    126106-12 Sun Cluster 3.2: CORE patch for Solaris 10 Note: Deliver SUNW.HAStoragePlus:6 Please read SIIOTP
    125508-06 Sun Cluster 3.2: Manageability and Serviceability Agent for Solaris 10 Note: Please read SIIOTP
    125514-02 Sun Cluster 3.2: Solaris Volume Manager (Mediator) Patch
    125517-04 Sun Cluster 3.2: OPS Core Patch for Solaris 10 Note: Deliver SUNW.rac_framework:3 Please read SIIOTP
    125992-01 Sun Cluster 3.2: SC Checks patch for Solaris 10
    125998-01 Sun Cluster 3.2: Sun Cluster Reliable Datagram Transport Patch
    126008-01 Sun Cluster 3.2: HA-DB Patch for Solaris 10
    126011-02 Sun Cluster 3.2: HA-DHCP Patch for Solaris 10
    126014-03 Sun Cluster 3.2: Ha-Apache Patch for Solaris 10
    126017-01 Sun Cluster 3.2: HA-DNS Patch for Solaris 10
    126020-02 Sun Cluster 3.2: HA-Containers Patch for Solaris 10 Note: Please read SIIOTP
    126023-02 Sun Cluster 3.2: Sun Cluster HA for Java Web Server, Patch for Solaris 10
    126026-01 Sun Cluster 3.2: HA-Kerberos Patch for Solaris 10
    126029-01 Sun Cluster 3.2: HA-LiveCache Patch for Solaris 10
    126032-03 Sun Cluster 3.2: Ha-MYSQL Patch for Solaris 10 Note: Please read SIIOTP
    126035-02 Sun Cluster 3.2: HA-NFS Patch for Solaris 10
    126041-01 Sun Cluster 3.2: HA-N1 Grid Engine Patch for Solaris 10
    126044-02 Sun Cluster 3.2: HA-PostgreSQL Patch for Solaris 10 Note: Please read SIIOTP
    126047-05 Sun Cluster 3.2: Ha-Oracle patch for Solaris 10 Note: Please read SIIOTP
    126050-02 Sun Cluster 3.2: HA-Oracle E-business suite Patch for Solaris 10
    126053-01 Sun Cluster 3.2: HA-Oracle Application Server Patch for Solaris 10
    126056-01 Sun Cluster 3.2: HA-SAP Patch for Solaris 10 Note: Deliver SUNW.sap_as:4 Please read SIIOTP
    126059-03 Sun Cluster 3.2: HA-SAPDB Patch for Solaris 10
    126062-03 Sun Cluster 3.2: HA-SAP-WEB-AS Patch for Solaris 10
    126065-03 Sun Cluster 3.2: HA-Siebel Patch for Solaris 10
    126068-03 Sun Cluster 3.2: HA-Sybase Patch for Solaris 10 Note: Deliver SUNW.sybase:5 Please read SIIOTP
    126071-01 Sun Cluster 3.2: HA-Tomcat Patch for Solaris 10
    126074-01 Sun Cluster 3.2: HA-BEA WebLogic Patch for Solaris 10
    126080-02 Sun Cluster 3.2: HA-Sun Java Systems App Server Patch for Solaris 10
    126083-01 Sun Cluster 3.2: HA-Sun Java Message Queue Patch for Solaris 10
    126092-02 Sun Cluster 3.2: HA-Websphere MQ Patch
    126095-04 Sun Cluster 3.2: Localization patch for Solaris 9 sparc and Solaris 10 sparc
    128481-02 Sun Cluster 3.2: Grid Service Provisioning for Solaris 10
    128556-01 Sun Cluster 3.2: Man Pages Patch for Solaris 9 and Solaris 10, sparc


    Included patch revisions in Sun Cluster 3.2 2/08 Update1 for Solaris 10 x86 11/06 update3 or higher
    125512-02 Sun Cluster 3.2: Core Patch for Solaris 10_x86
    126107-12 Sun Cluster 3.2: CORE patch for Solaris 10_x86 Note: Deliver SUNW.HAStoragePlus:6 Please read SIIOTP
    125509-06 Sun Cluster 3.2: Manageability and Serviceability Agent for Solaris 10_x86 Note: Please read SIIOTP
    125515-02 Sun Cluster 3.2: Solaris Volume Manager (Mediator) Patch
    125518-05 Sun Cluster 3.2: OPS Core Patch for Solaris 10_x86 Note: Deliver SUNW.rac_framework:3 Please read SIIOTP
    125993-01 Sun Cluster 3.2: SC Checks patch for Solaris 10_x86
    126009-03 Sun Cluster 3.2: HA-DB Patch for Solaris 10_x86
    126012-03 Sun Cluster 3.2: HA-DHCP Patch for Solaris 10_x86
    126015-04 Sun Cluster 3.2: HA-Apache Patch for Solaris 10_x86
    126018-03 Sun Cluster 3.2: HA-DNS Patch for Solaris 10_x86
    126021-02 Sun Cluster 3.2: HA-Containers Patch for Solaris 10_x86 Note: Please read SIIOTP
    126024-03 Sun Cluster 3.2: Sun Cluster HA for Java Web Server, Patch for Solaris 10_x86
    126027-03 Sun Cluster 3.2: HA-Kerberos Patch for Solaris 10_x86
    126030-03 Sun Cluster 3.2: HA-LiveCache Patch for Solaris 10_x86
    126033-04 Sun Cluster 3.2: Ha-MYSQL Patch for Solaris 10_x86 Note: Please read SIIOTP
    126036-03 Sun Cluster 3.2: HA-NFS Patch for Solaris 10_x86
    126042-01 Sun Cluster 3.2: HA-N1 Grid Engine Patch for Solaris 10_x86
    126045-03 Sun Cluster 3.2: HA-PostgreSQL Patch for Solaris 10_x86 Note: Please read SIIOTP
    126048-06 Sun Cluster 3.2: Ha-Oracle patch for Solaris 10_x86 Note: Please read SIIOTP
    126054-02 Sun Cluster 3.2: HA-Oracle Application Server Patch for Solaris 10_x86
    126057-03 Sun Cluster 3.2: HA-SAP Patch for Solaris 10_x86 Note: Deliver SUNW.sap_as:4 Please read SIIOTP
    126060-04 Sun Cluster 3.2: HA-SAPDB Patch for Solaris 10_x86
    126063-04 Sun Cluster 3.2: HA-SAP-WEB-AS Patch for Solaris 10_x86
    126069-02 Sun Cluster 3.2: HA_Sybase Patch for Solaris 10_x86 Note: Deliver SUNW.sybase:5 Please read SIIOTP
    126072-01 Sun Cluster 3.2: HA-Tomcat Patch for Solaris 10_x86
    126075-03 Sun Cluster 3.2: HA-BEA WebLogic Patch for Solaris 10_x86
    126081-03 Sun Cluster 3.2: HA-Sun Java Systems App Server Patch for Solaris 10_x86
    126084-03 Sun Cluster 3.2: HA-Sun Java Message Queue Patch for Solaris 10_x86
    126090-01 Sun Cluster 3.2: HA Websphere Message Broker Patch for Solaris 10_x86
    126093-04 Sun Cluster 3.2: HA-Websphere MQ Patch for Solaris 10_x86
    126096-04 Sun Cluster 3.2: Localization patch for Solaris 10 amd64
    128482-02 Sun Cluster 3.2: Grid Service Provisioning for Solaris 10_x86
    128557-01 Sun Cluster 3.2: Man Pages Patch for Solaris 10_x86


    Included patch revisions in Sun Cluster 3.2 2/08 Update1 for Solaris 9 8/05 update8 or higher
    125510-02 Sun Cluster 3.2: Core Patch for Solaris 9
    126105-12 Sun Cluster 3.2: CORE patch for Solaris 9 Note: Deliver SUNW.HAStoragePlus:6 Please read SIIOTP
    125507-06 Sun Cluster 3.2: Manageability and Serviceability Agent for Solaris 9 Note: Please read SIIOTP
    125513-01 Sun Cluster 3.2: Solaris Volume Manager (Mediator) Patch
    125516-04 Sun Cluster 3.2: OPS Core Patch for Solaris 9 Note: Deliver SUNW.rac_framework:3 Please read SIIOTP
    125597-01 Sun Cluster 3.2: Sun Cluster Reliable Datagram Transport Patch
    125991-01 Sun Cluster 3.2: SC Checks patch for Solaris 9
    126004-01 Sun Cluster 3.2: HA-Agfa Patch for Solaris 9
    126007-01 Sun Cluster 3.2: HA-DB Patch for Solaris 9
    126010-02 Sun Cluster 3.2: HA-DHCP Patch for Solaris 9
    126013-03 Sun Cluster 3.2: HA-Apache Patch for Solaris 9
    126016-01 Sun Cluster 3.2: HA-DNS Patch for Solaris 9
    126022-02 Sun Cluster 3.2: Sun Cluster HA for Java Web Server, Patch for Solaris 9
    126025-01 Sun Cluster 3.2: HA-Oracle Application Server Patch for Solaris 9
    126028-01 Sun Cluster 3.2: HA-LiveCache Patch for Solaris 9
    126031-03 Sun Cluster 3.2: Ha-MYSQL Patch for Solaris 9 Note: Please read SIIOTP
    126034-02 Sun Cluster 3.2: HA-NFS Patch for Solaris 9
    126040-01 Sun Cluster 3.2: HA-N1 Grid Engine Patch for Solaris 9
    126043-02 Sun Cluster 3.2: HA-PostgreSQL Patch for Solaris 9 Note: Please read SIIOTP
    126046-05 Sun Cluster 3.2: HA-Oracle patch for Solaris 9 Note: Please read SIIOTP
    126049-02 Sun Cluster 3.2: HA-Oracle E-business suite Patch for Solaris 9
    126055-01 Sun Cluster 3.2: HA-SAP Patch for Solaris 9 Note: Deliver SUNW.sap_as:4 Please read SIIOTP
    126058-03 Sun Cluster 3.2: HA-SAPDB Patch for Solaris 9
    126061-03 Sun Cluster 3.2: HA-SAP-WEB-AS Patch for Solaris 9
    126064-03 Sun Cluster 3.2: HA-Siebel Patch for Solaris 9
    126067-03 Sun Cluster 3.2: HA-Sybase Patch for Solaris 9 Note: Deliver SUNW.sybase:5 Please read SIIOTP
    126070-01 Sun Cluster 3.2: HA-Tomcat Patch for Solaris 9
    126073-01 Sun Cluster 3.2: HA-BEA WebLogic Patch for Solaris 9
    126079-02 Sun Cluster 3.2: HA-Sun Java Systems App Server Patch for Solaris 9
    126082-01 Sun Cluster 3.2: HA-Sun Java Message Queue Patch for Solaris 9
    126085-02 Sun Cluster 3.2: HA-SWIFTAlliance Access Patch Note: Please read SIIOTP
    126091-02 Sun Cluster 3.2: HA-Websphere MQ Patch
    126095-04 Sun Cluster 3.2: Localization patch for Solaris 9 sparc and Solaris 10 sparc
    126097-01 Sun Cluster 3.2: HA-SWIFTAllianceGateway patch for Solaris 9
    128480-02 Sun Cluster 3.2: Grid Service Provisioning for Solaris 9
    128556-01 Sun Cluster 3.2: Man Pages Patch for Solaris 9 and Solaris 10, sparc


    The quorum server is an alternative to the traditional quorum disk. The quorum server runs outside of the Sun Cluster and is accessed through the public network. Therefore the quorum server can run on a different hardware architecture.
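
    For reference, registering a quorum server as a quorum device on the cluster side looks roughly like the following sketch. Host name, port and device name are placeholders; check the clquorum(1CL) man page for the exact property names:

    # clquorum add -t quorum_server -p qshost=qs-host.example.com,port=9000 qs1
    # clquorum status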

    Included patch revisions in Sun Cluster 3.2 2/08 Update1 for quorum server feature:
    127404-01 Sun Cluster 3.2: Quorum Server Patch for Solaris 9
    127405-02 Sun Cluster 3.2: Quorum Server Patch for Solaris 10
    127406-02 Sun Cluster 3.2: Quorum Server Patch for Solaris 10_x86
    127408-01 Sun Cluster 3.2: Quorum Man Pages Patch for Solaris 9 and Solaris 10, sparc
    127409-01 Sun Cluster 3.2: Quorum Man Pages Patch for Solaris 10_x86


    If some patches must be applied when the node is in noncluster mode, you can apply them in a rolling fashion, one node at a time, unless a patch's instructions require that you shut down the entire cluster. Follow procedures in How to Apply a Rebooting Patch (Node) in Sun Cluster System Administration Guide for Solaris OS to prepare the node and boot it into noncluster mode. For ease of installation, consider applying all patches at once to a node that you place in noncluster mode.
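
    The per-node sequence for a rebooting patch looks roughly like this sketch (node name and patch directory are placeholders; the Sun Cluster System Administration Guide remains the authoritative procedure):

    # clnode evacuate phys-node-1        (switch resource and device groups off the node)
    # shutdown -g0 -y -i0
    ok boot -x                           (SPARC OpenBoot prompt, boots the node in noncluster mode)
    # patchadd -M /var/tmp/patches 126106-12
    # init 6                             (reboot the node back into the cluster)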

    Information about patch management is available at Oracle Enterprise Manager Ops Center.
