Introduction to SunCluster

Sun Cluster supports four main topologies:

- Clustered pair - two or more pairs of nodes that operate under a single cluster administrative framework.

- Pair+N - a pair of nodes is directly connected to shared storage, and the remaining N nodes access the shared storage through the cluster interconnect.

- N+1 (star) - contains some number of primary nodes and one secondary node. The secondary node is connected to all of the shared storage and can take over for any of the N primary nodes should they fail.

- N*N (scalable) - enables every shared storage device in the cluster to connect to every node in the cluster. Since all of the N nodes can see the disks, failover can occur to any node.



Quorum Rules

  • A quorum device must be available to both nodes in a 2-node cluster

  • QD info is maintained globally in the CCR db

  • A QD can contain user data

  • The maximum and optimal number of votes contributed by QDs must be N-1 (where N is the number of nodes in the cluster)

  • If the number of QDs equals or exceeds the number of nodes, the cluster cannot come up easily when too many QDs have failed or are errored

  • QDs are not required in clusters with more than 2 nodes, but they are recommended for higher cluster availability

  • QDs are automatically configured after the Sun Cluster software installation is done, but you can also choose to manually assign the QD

  • QDs are configured using DID devices, or you can use a Quorum Server (3.2 only); a manual configuration example follows this list
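
A quorum device can be assigned manually once the cluster is up. A hedged sketch; the DID name d4 matches the quorum device that appears in the CCR dump later in this post, adjust it to your own shared disk:

# /usr/cluster/bin/scconf -a -q globaldev=d4        (3.1 syntax)
# /usr/cluster/bin/clquorum add d4                  (3.2 syntax)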

Quorum Math and Consequences

  • A running cluster is always aware of the following quorum math:

    • Total possible Q votes (number of nodes + disk quorum votes)
    • Total present Q votes (number of booted nodes + available QD votes)
    • Total needed Q votes (more than 50% of the possible votes)

    Consequences (a worked example follows this list):

    • A node that cannot find an adequate number of Q votes will freeze, waiting for other nodes to join the cluster

    • A node that is booted in the cluster but can no longer find the needed number of votes panics the kernel (this occurs at a reconfiguration, when the votes are checked and the quorum device is accessed)
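
A quick worked example for the two-node cluster used throughout this post: 2 node votes + 1 QD vote = 3 total possible votes, so the needed quorum is 2 (more than 50% of 3). With both nodes booted and the QD available, 3 votes are present. If one node dies, the survivor still holds 1 node vote + 1 QD vote = 2 and keeps quorum; a node booting alone without access to the QD only sees 1 vote and freezes, waiting for its peer. This matches the check_cluster_quorum() lines in the CMM debug buffer below: total configured votes = 3, required quorum = 2.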

Installmode Flag

  • Allows cluster nodes to be rebooted during or after the initial installation without causing the other (active) node(s) to panic; you can check whether the flag is still set as shown below.


# Reporting the cluster membership and quorum vote information

# /usr/cluster/bin/scstat -q
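
With the 3.2 object-oriented CLI the same information is available through (an assumption about your release, the older scstat remains valid):

# /usr/cluster/bin/clquorum status

and you can verify that the installmode flag has been cleared with:

# /usr/cluster/bin/scconf -p | grep -i "install mode"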

Quorum device / data integrity issues

- Quorum devices acquire quorum vote counts that are based on the number of node connections to the device. When you set up a quorum device, it acquires a maximum vote count of N-1, where N is the number of nodes with nonzero vote counts that are connected to the quorum device. For example, a quorum device that is connected to two nodes with nonzero vote counts has a quorum count of one (two minus one).


- Split brain occurs when the cluster interconnect between nodes is lost and the cluster becomes partitioned into subclusters, each of which believes that it is the only partition.

- Amnesia occurs if all the nodes leave the cluster in staggered groups. An example is a two-node cluster with nodes A and B. If node A goes down, the configuration data in the CCR is updated on node B only, and not node A. If node B goes down at a later time, and if node A is rebooted, node A will be running with old contents of the CCR. This state is called amnesia and might lead to running a cluster with stale configuration information.


- Failure fencing limits node access to multihost disks by preventing access to the disks. When a node leaves the cluster (it either fails or becomes partitioned), failure fencing ensures that the node can no longer access the disks. Only current member nodes have access to the disks, ensuring data integrity. The Sun Cluster system uses SCSI disk reservations to implement failure fencing. Using SCSI reservations, failed nodes are fenced away from the multihost disks, preventing them from accessing those disks.
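
To see which nodes currently hold registrations and reservations on a shared disk, the cluster ships small utilities. A hedged example; scsi -c inkeys/inresv applies to SCSI-3 PGR devices, the pgre equivalent for SCSI-2 quorum devices is shown further down:

# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2
# /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2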

You may see messages like:

NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.

This happens if you are trying to start the cluster from a node that left the cluster before the last scshutdown, i.e. a node that was not the last one to leave the cluster (this is how amnesia is prevented).

Cluster membership

- Cluster membership is handled through the Cluster Membership Monitor (CMM), which performs the following actions:

- Enforcing a consistent membership view on all nodes (quorum)

- Driving synchronized reconfiguration in response to membership changes

- Handling cluster partitioning

- Ensuring full connectivity among all cluster members by leaving unhealthy nodes out of the cluster until they are repaired.

CMM debug buffer example:

c-220ra-2-epar03 # mdb -k

Loading modules: [ unix krtld genunix dtrace specfs ufs sd pcipsy ip hook neti sctp arp usba fcp fctl nca md zfs random logindmux ptm cpc sppp crypto wrsmd fcip nfs ipc lofs ]

> *cmm_dbg_buf/s
0x30003b8a008:  th 300023ad000 tm 412444414: device_type_registry_impl::initiali
ze()
th 300023ad000 tm 412444459: Registered device_type_registry in nameserver
th 300023ad000 tm 412444460: quorum_algorithm_init called
th 300023ad000 tm 412444474: quorum_impl: Attempting to get ref to type registry
th 300023ad000 tm 412444476: device_type_registry retrieved from nameserver.
th 300023ad000 tm 412444477: In initialize_quorum_info
th 300023ad000 tm 412444479: Creating event objects for quorum
th 300023ad000 tm 412444513: initialize_device_quorum_info being called
th 300023ad000 tm 412444525: Deleting old quorum info
th 300023ad000 tm 412444527: 
----------Quorum Info Table-----------
th 300023ad000 tm 412444528: List of nodes:
th 300023ad000 tm 412444529:    Node 1: votes = 1, key = 0x45ae2ec800000001
th 300023ad000 tm 412444530:    Node 2: votes = 1, key = 0x45ae2ec800000002
th 300023ad000 tm 412444532: List of quorum devices:
th 300023ad000 tm 412444533:    Quorum device 1: gdevname = '/dev/did/rdsk/d4s2'
, votes = 1, nodes_with_configured_paths = 0x3
th 300023ad000 tm 412444534: Total number of configured votes = 3.
th 300023ad000 tm 412444535: 
--------End of Quorum Info Table--------

th 300023ad000 tm 412444536: Calling process device quorum info
th 300023ad000 tm 412444537: get_quorum_device called for type scsi2
th 300023ad000 tm 412444539: get_quorum_device: loading module clq_scsi2
th 300023ad000 tm 412450398: SCSI2: module loading.
th 300023ad000 tm 412450399: device_type_registry_impl::register_device_type() register type scsi2
th 300023ad000 tm 412450401: SCSI2(/dev/did/rdsk/d4s2)::quorum_open
th 300023ad000 tm 412485050: SCSI2(/dev/did/rdsk/d4s2): quorum_open - vn_open() returned 0 after 1 trials
th 300023ad000 tm 412485052: quorum_disk_initialize_reserved(30002dbd680) called.
th 300023ad000 tm 412485082: quorum_ioctl_with_retries: ioctl DKIOCGGEOM returned error (13). Will retry in 2 seconds.
th 300023ad000 tm 412685038: quorum_ioctl_with_retries: ioctl DKIOCGGEOM returned error (13). Will retry in 2 seconds.
th 300023ad000 tm 412885032: quorum_ioctl_with_retries: ioctl DKIOCGGEOM returned error (13). Will retry in 2 seconds.
th 300023ad000 tm 413085033: quorum_ioctl_with_retries: ioctl DKIOCGGEOM returned error (13).
th 300023ad000 tm 413085035: quorum_scsi_get_sblkno: ioctl DKIOCGGEOM returned error (13).
th 300023ad000 tm 413085036: quorum_scsi_get_sblkno(30002dbd680) retd  error 13.
th 300023ad000 tm 413085038: quorum_disk_initialize_reserved(30002dbd680) retd sblkno (0xffffffffffffffff).
th 300023ad000 tm 413085040: SCSI2(/dev/did/rdsk/d4s2): quorum_open: Failed to initialize reserved sectors. Returned 13. Will retry later.
th 300023ad000 tm 413085042: Returned from process device quorum info
th 300023ad000 tm 413085044: Binding quorum to nameserver
th 300023ad000 tm 413085048: 'quorum' bound to local name server
th 300023ad000 tm 413109145: in unknown state
th 30003e72ca0 tm 413109291: CMM sender_thread starting
th 30003e79940 tm 413109312: in start state
th 30003e78fe0 tm 413109357: boot_delay thread starting
th 30003e79940 tm 413109360: in check_node_health
th 30003e79940 tm 413109361: in begin state
th 30003e79940 tm 413109363: seqnum = 1
th 30003e79620 tm 413433948: node_is_reachable(1,1224059757); current incn = 0
th 30003e79620 tm 413433950: node 1 is up; new incn = 1224059757
th 300023acce0 tm 413434043: state_changed as sender seqnum greater than mine. Sender = 1
th 30003e79940 tm 413434048: seqnum = 2
th 300023acce0 tm 413434122: state_changed as sender seqnum greater than mine. Sender = 1
th 30003e79940 tm 413434126: seqnum = 3
th 30003e79940 tm 413434392: in connectivity state
th 30003e79940 tm 413434393: connectivity state: membership = 0x3
th 30003e79940 tm 413434394: Connectivity matrix while in connectivity_state:
th 30003e79940 tm 413434395:    heard_from[1] : 0x3
th 30003e79940 tm 413434396:    heard_from[2] : 0x3
th 30003e79940 tm 413434414: in quorum_acquisition state
th 30003e79940 tm 413434538: in delay_in_case_of_split_brain
th 30003e79940 tm 413434542: delay_in_case_of_split_brain: 2 votes up, 0 down.
th 30003e79940 tm 413434544: acquire_quorum_devices(0x3) called.
th 30003e79940 tm 413434545: partition(0x3) can have quorum
th 30003e79940 tm 413434547: update_membership(0x3) called.
th 30003e79940 tm 413434548: reset_quorum_devices_bus(0x3) called.
th 30003e79940 tm 413434550: take_ownership: Calling quorum_read_keys
th 30003e79940 tm 413434551: quorum_disk_initialize_reserved(30002dbd680) called.
th 30003e79940 tm 413434580: quorum_ioctl_with_retries: ioctl DKIOCGGEOM returned error (13). Will retry in 2 seconds.
th 30003e79940 tm 413645908: quorum_disk_initialize_reserved(30002dbd680) retd sblkno (0x10ddbe5).
th 30003e79940 tm 413647728: quorum_pgre_choosing_write: wrote 0 for sblkno (0x10ddbe7).
th 30003e79940 tm 413648325: quorum_pgre_number_write: wrote 0x0 for sblkno (0x10ddbe7).
th 30003e79940 tm 413649874: SCSI2(/dev/did/rdsk/d4s2): quorum_read_keys returned 1 keys:
th 30003e79940 tm 413649876: key[0]: key=0x45ae2ec800000001
th 30003e79940 tm 413649878: Completed quorum_read_keys
th 30003e79940 tm 413649879: take_ownership: device=1: owner_id=0, preempt_nodes=0x0
th 30003e79940 tm 413649880: acquire_quorum_devices_lock_held returing {nodeid = 2, owned_devices = 0x0, preempted = 0}
th 30003e79940 tm 413649894: in quorum_check state
th 30003e79940 tm 413650092: check_cluster_quorum(): nodes=0x3, devices=0x1, votes=3, total configured votes = 3, required quorum = 2
th 30003e79940 tm 413650095: cluster has reached quorum
th 30003e79940 tm 413650147: in step1 state
th 30003e79940 tm 413650923: in step2 state
th 30003e79940 tm 413651064: in step3 state
th 30003e79940 tm 413651593: in step4 state
th 30003e79940 tm 413651810: in step5 state
th 30003e79940 tm 413651954: in step6 state
th 30003e79940 tm 413652094: in step7 state
th 30003e79940 tm 413716143: in step8 state
th 30003e79940 tm 413722994: in step9 state
th 30003e79940 tm 413723160: in step10 state
th 30003e79940 tm 413723344: in step11 state
th 30003e79940 tm 413723523: in step12 state
th 30003e79940 tm 413757758: in step13 state
th 30003e79940 tm 413757917: in end_sync state
th 30003e79940 tm 413758049: end state: seq_num = 3, membership = 0x3
th 30003aad020 tm 413804257: ccr_refreshed called
th 30003aad020 tm 413806306: SCSI2(/dev/did/rdsk/d4s2): quorum_read_keys returned 1 keys:
th 30003aad020 tm 413806308: key[0]: key=0x45ae2ec800000001
th 30003aad020 tm 413806310: SCSI2(/dev/did/rdsk/d4s2): quorum_register.
th 30003aad020 tm 413806311:    key=0x45ae2ec800000002
th 30003aad020 tm 413807217: SCSI2(/dev/did/rdsk/d4s2): quorum_register read check has succeeded
th 30003aad020 tm 413807219: SCSI2: quorum_register wrote key 0x45ae2ec800000002 to sblkno 0x10ddbe7.
th 30003aad020 tm 413807220: Registration key placed on QD 1
th 30003aad020 tm 413807222: SCSI2(/dev/did/rdsk/d4s2): quorum_enable_failfast
th 30003e78fe0 tm 419109018: boot_delay thread exiting
th 30003e79620 tm 107710027: node_is_unreachable(1,1224059757,12000) called
th 30003e79620 tm 107710029: node 1 is down
th 30003e79620 tm 107710032: Starting split brain timer for node 1
th 30003e79940 tm 107710050: in return state
th 30003e79940 tm 107710079: in check_node_health
th 30003e79940 tm 107710081: in begin state
th 30003e79940 tm 107710082: seqnum = 4
th 30003e79940 tm 107710148: in connectivity state
th 30003e79940 tm 107710149: connectivity state: membership = 0x2
th 30003e79940 tm 107710150: Connectivity matrix while in connectivity_state:
th 30003e79940 tm 107710151:    heard_from[2] : 0x2
th 30003e79940 tm 107710166: in quorum_acquisition state
th 30003e79940 tm 107710170: in delay_in_case_of_split_brain
th 30003e79940 tm 107710173: delay_in_case_of_split_brain: 1 votes up, 1 down.
th 30003e79940 tm 107710175: acquire_quorum_devices(0x2) called.
th 30003e79940 tm 107710176: partition(0x2) can have quorum
th 30003e79940 tm 107710177: update_membership(0x2) called.
th 30003e79940 tm 107710179: reset_quorum_devices_bus(0x2) called.
th 30003e79940 tm 107710180: Node 1 is not in cluster. Calling quorum_reset (qid=1)
th 30003e79940 tm 107710181: SCSI2(/dev/did/rdsk/d4s2): quorum_reset_bus called.
th 30003e79940 tm 107710379: quorum_reset_bus(qid=1) succeeded.
th 30003e79940 tm 107710381: take_ownership: Calling quorum_read_keys
th 30003e79940 tm 108028469: SCSI2(/dev/did/rdsk/d4s2): quorum_read_keys returned 2 keys:
th 30003e79940 tm 108028471: key[0]: key=0x45ae2ec800000001
th 30003e79940 tm 108028472: key[1]: key=0x45ae2ec800000002
th 30003e79940 tm 108028473: Completed quorum_read_keys
th 30003e79940 tm 108028475: find_resv_owner: Calling quorum_read_reservations
th 30003e79940 tm 108028476: SCSI2(/dev/did/rdsk/d4s2): quorum_read_reservations
th 30003e79940 tm 108028677: SCSI2(/dev/did/rdsk/d4s2): quorum_read_reservations returned:
th 30003e79940 tm 108028678: resv[1]: key=0x45ae2ec800000001
th 30003e79940 tm 108028681: Quorum: preempt_nodes_and_reserve called
th 30003e79940 tm 108028682: SCSI2(/dev/did/rdsk/d4s2): quorum_preempt called for key 0x45ae2ec800000001
th 30003e79940 tm 108029322: quorum_pgre_choosing_write: wrote 1 for sblkno (0x10ddbe7).
th 30003e79940 tm 108030474: quorum_pgre_number_read: read 0x0 for sblkno (0x10ddbe6).
th 30003e79940 tm 108030537: quorum_pgre_number_read: read 0x0 for sblkno (0x10ddbe7).
th 30003e79940 tm 108031857: quorum_pgre_lock: my_num = 0x1.
th 30003e79940 tm 108032904: quorum_pgre_number_write: wrote 0x1 for sblkno (0x10ddbe7).
th 30003e79940 tm 108033503: quorum_pgre_choosing_write: wrote 0 for sblkno (0x10ddbe7).
th 30003e79940 tm 108034654: quorum_pgre_choosing_read: read 0 for sblkno (0x10ddbe6).
th 30003e79940 tm 108035252: quorum_pgre_number_read: read 0x0 for sblkno (0x10ddbe6).
th 30003e79940 tm 108036445: quorum_pgre_key_find: match found on disk for key.
th 30003e79940 tm 108039465: SCSI2(/dev/did/rdsk/d4s2): quorum_preempt: issuing a MHIOCTKOWN.
th 30003e79940 tm 108640284: quorum_pgre_number_write: wrote 0x0 for sblkno (0x10ddbe7).                              
th 30003e79940 tm 108640293: take_ownership: device=1: owner_id=2, preempt_nodes=0x1
th 30003e79940 tm 108640295: acquire_quorum_devices_lock_held returing {nodeid = 2, owned_devices = 0x1, preempted = 0}
th 30003e79940 tm 108640313: in quorum_check state
th 30003e79940 tm 108640324: check_cluster_quorum(): nodes=0x2, devices=0x1, votes=2, total configured votes = 3, required quorum = 2

Cluster configuration repository

  • The Cluster Configuration Repository (CCR) is a private, cluster-wide, distributed database for storing information that pertains to the configuration and state of the cluster. The CCR tables live in /etc/cluster/ccr/*; the infrastructure table of this two-node cluster looks like this:

ccr_gennum      27
ccr_checksum    48DE7EF8D15BA5B6D4F80084A3ACEFB8
cluster.name    cluster32
cluster.state   enabled
cluster.properties.cluster_id   0x45AE2EC8
cluster.properties.installmode  disabled
cluster.properties.private_net_number   172.16.0.0
cluster.properties.private_netmask      255.255.248.0
cluster.properties.private_subnet_netmask       255.255.255.128
cluster.properties.private_user_net_number      172.16.4.0
cluster.properties.private_user_netmask 255.255.254.0
cluster.properties.private_maxnodes     64
cluster.properties.private_maxprivnets  10
cluster.properties.auth_joinlist_type   sys
cluster.properties.auth_joinlist_hostslist      .
cluster.properties.transport_heartbeat_timeout  10000
cluster.properties.transport_heartbeat_quantum  1000
cluster.properties.udp_session_timeout  480
cluster.properties.cmm_version  1
cluster.nodes.1.name    c-220ra-1-epar03
cluster.nodes.1.state   enabled
cluster.nodes.1.properties.private_hostname     clusternode1-priv
cluster.nodes.1.properties.quorum_vote  1
cluster.nodes.1.properties.quorum_resv_key      0x45AE2EC800000001
cluster.nodes.1.adapters.1.name qfe3
cluster.nodes.1.adapters.1.state        enabled
cluster.nodes.1.adapters.1.properties.device_name       qfe
cluster.nodes.1.adapters.1.properties.device_instance   3
cluster.nodes.1.adapters.1.properties.transport_type    dlpi
cluster.nodes.1.adapters.1.properties.lazy_free 1
cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_timeout    10000
cluster.nodes.1.adapters.1.properties.dlpi_heartbeat_quantum    1000
cluster.nodes.1.adapters.1.properties.nw_bandwidth      80
cluster.nodes.1.adapters.1.properties.bandwidth 10
cluster.nodes.1.adapters.1.properties.ip_address        172.16.0.129
cluster.nodes.1.adapters.1.properties.netmask   255.255.255.128
cluster.nodes.1.adapters.1.ports.1.name 0
cluster.nodes.1.adapters.1.ports.1.state        enabled
cluster.nodes.1.adapters.2.name qfe7
cluster.nodes.1.adapters.2.state        enabled
cluster.nodes.1.adapters.2.properties.device_name       qfe
cluster.nodes.1.adapters.2.properties.device_instance   7
cluster.nodes.1.adapters.2.properties.transport_type    dlpi
cluster.nodes.1.adapters.2.properties.lazy_free 1
cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_timeout    10000
cluster.nodes.1.adapters.2.properties.dlpi_heartbeat_quantum    1000
cluster.nodes.1.adapters.2.properties.nw_bandwidth      80
cluster.nodes.1.adapters.2.properties.bandwidth 10
cluster.nodes.1.adapters.2.properties.ip_address        172.16.1.1
cluster.nodes.1.adapters.2.properties.netmask   255.255.255.128
cluster.nodes.1.adapters.2.ports.1.name 0
cluster.nodes.1.adapters.2.ports.1.state        enabled
cluster.nodes.1.cmm_version     1
cluster.nodes.2.name    c-220ra-2-epar03
cluster.nodes.2.state   enabled
cluster.nodes.2.properties.quorum_vote  1
cluster.nodes.2.properties.quorum_resv_key      0x45AE2EC800000002
cluster.nodes.2.properties.private_hostname     clusternode2-priv
cluster.nodes.2.adapters.1.name qfe3
cluster.nodes.2.adapters.1.properties.device_name       qfe
cluster.nodes.2.adapters.1.properties.device_instance   3
cluster.nodes.2.adapters.1.properties.transport_type    dlpi
cluster.nodes.2.adapters.1.properties.lazy_free 1
cluster.nodes.2.adapters.1.properties.dlpi_heartbeat_timeout    10000
cluster.nodes.2.adapters.1.properties.dlpi_heartbeat_quantum    1000
cluster.nodes.2.adapters.1.properties.nw_bandwidth      80
cluster.nodes.2.adapters.1.properties.bandwidth 10
cluster.nodes.2.adapters.1.properties.ip_address        172.16.0.130
cluster.nodes.2.adapters.1.properties.netmask   255.255.255.128
cluster.nodes.2.adapters.1.state        enabled
cluster.nodes.2.adapters.1.ports.1.name 0
cluster.nodes.2.adapters.1.ports.1.state        enabled
cluster.nodes.2.adapters.2.name qfe7
cluster.nodes.2.adapters.2.properties.device_name       qfe
cluster.nodes.2.adapters.2.properties.device_instance   7
cluster.nodes.2.adapters.2.properties.transport_type    dlpi
cluster.nodes.2.adapters.2.properties.lazy_free 1
cluster.nodes.2.adapters.2.properties.dlpi_heartbeat_timeout    10000
cluster.nodes.2.adapters.2.properties.dlpi_heartbeat_quantum    1000
cluster.nodes.2.adapters.2.properties.nw_bandwidth      80
cluster.nodes.2.adapters.2.properties.bandwidth 10
cluster.nodes.2.adapters.2.properties.ip_address        172.16.1.2
cluster.nodes.2.adapters.2.properties.netmask   255.255.255.128
cluster.nodes.2.adapters.2.state        enabled
cluster.nodes.2.adapters.2.ports.1.name 0
cluster.nodes.2.adapters.2.ports.1.state        enabled
cluster.nodes.2.cmm_version     1
cluster.blackboxes.1.name       switch1
cluster.blackboxes.1.state      enabled
cluster.blackboxes.1.properties.type    switch
cluster.blackboxes.1.ports.1.name       1
cluster.blackboxes.1.ports.1.state      enabled
cluster.blackboxes.1.ports.2.name       2
cluster.blackboxes.1.ports.2.state      enabled
cluster.blackboxes.2.name       switch2
cluster.blackboxes.2.state      enabled
cluster.blackboxes.2.properties.type    switch
cluster.blackboxes.2.ports.1.name       1
cluster.blackboxes.2.ports.1.state      enabled
cluster.blackboxes.2.ports.2.name       2
cluster.blackboxes.2.ports.2.state      enabled
cluster.cables.1.properties.end1        cluster.nodes.1.adapters.1.ports.1
cluster.cables.1.properties.end2        cluster.blackboxes.1.ports.1
cluster.cables.1.state  enabled
cluster.cables.2.properties.end1        cluster.nodes.1.adapters.2.ports.1
cluster.cables.2.properties.end2        cluster.blackboxes.2.ports.1
cluster.cables.2.state  enabled
cluster.cables.3.properties.end1        cluster.nodes.2.adapters.1.ports.1
cluster.cables.3.properties.end2        cluster.blackboxes.1.ports.2
cluster.cables.3.state  enabled
cluster.cables.4.properties.end1        cluster.nodes.2.adapters.2.ports.1
cluster.cables.4.properties.end2        cluster.blackboxes.2.ports.2
cluster.cables.4.state  enabled
cluster.quorum_devices.1.name   d4
cluster.quorum_devices.1.state  enabled
cluster.quorum_devices.1.properties.votecount   1
cluster.quorum_devices.1.properties.gdevname    /dev/did/rdsk/d4s2
cluster.quorum_devices.1.properties.path_1      enabled
cluster.quorum_devices.1.properties.path_2      enabled
cluster.quorum_devices.1.properties.access_mode scsi2
cluster.quorum_devices.1.properties.type        scsi2

- The CCR structures contain the following types of information:

- Cluster and node names

- Cluster transport configuration

- The names of Solaris Volume Manager disk sets or VERITAS disk groups

- A list of nodes that can master each disk group

- Operational parameter values for data services

- Paths to data service callback methods

- DID device configuration

- Current cluster status
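
The CCR tables are not meant to be edited by hand; if you ever have to touch one while booted in non-cluster mode, the ccr_checksum shown at the top of the file must be regenerated, for example with ccradm (a hedged sketch, the flags are from memory, so check the tool before relying on it):

# /usr/cluster/lib/sc/ccradm -i /etc/cluster/ccr/infrastructure -o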

    Tip: getting the install date/time from the cluster ID:

SolarisCAT(live/10U)> 2time 0x45AE2EC8
        Wed Jan 17 15:12:24 2007
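
If you do not have SolarisCAT at hand, the cluster_id is simply the Unix timestamp of the installation, so any localtime conversion does the job (a hedged equivalent; the output is shown in the local timezone):

# perl -e 'print scalar localtime(0x45AE2EC8), "\n"'
Wed Jan 17 15:12:24 2007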

You will also find the cluster ID in the reservation/registration keys:

c-220ra-1-epar03 # /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/rdsk/c1t0d0s2

key[0]=0x45ae2ec800000001.
key[1]=0x45ae2ec800000002.

Shutting down cluster

  • scshutdown -y -g 30
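
    (on 3.2 the object-oriented equivalent is assumed to be available as well: # /usr/cluster/bin/cluster shutdown -y -g 30)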

Booting nodes in non-cluster mode

  • boot -x

Cluster daemons

c-220ra-1-epar03 # ps -edf | grep cluster
    root     4     0   0   Sep 30 ?          60:41 cluster
    root   293     1   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/clexecd
    root   272     1   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/failfastd
    root   252     1   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/qd_userd
    root   294   293   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/clexecd
    root   613     1   0   Sep 30 ?           2:46 /usr/lib/inet/xntpd -c /etc/inet/ntp.conf.cluster
    root   800     1   0   Sep 30 ?           0:20 /usr/cluster/lib/sc/cl_eventd
    root   889     1   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/scprivipd
    root   782     1   0   Sep 30 ?           0:27 /usr/cluster/bin/pnmd
    root   828     1   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/rpc.pmfd
    root   795     1   0   Sep 30 ?           0:01 /usr/cluster/lib/sc/cl_ccrad
    root  3322     1   0   Sep 30 ?           0:32 /usr/cluster/lib/sc/cl_eventlogd
    root   918     1   0   Sep 30 ?           8:25 /usr/cluster/lib/sc/rgmd
    root   777     1   0   Sep 30 ?           0:00 /usr/cluster/lib/sc/sc_zonesd
    root   981     1   0   Sep 30 ?           0:33 /usr/cluster/lib/sc/sc_delegated_restarter
    root   789     1   0   Sep 30 ?           1:13 /usr/cluster/lib/sc/scdpmd
    root   874     1   0   Sep 30 ?           1:47 /usr/cluster/lib/sc/rpc.fed


  • cluster -- a system process created by the kernel to encapsulate the kernel threads that make up the core of the cluster framework. It directly panics the kernel if it is sent a KILL signal (SIGKILL); other signals have no effect.

  • clexecd -- this is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (e.g. scshutdown). A failfast driver panics the kernel if this daemon is killed and not restarted within 30 seconds.

  • cl_eventd -- this daemon registers and forwards cluster events (e.g. nodes entering and leaving the cluster). Starting with SC 3.1 10/03, user applications can register themselves to receive cluster events. Automatically restarted if killed.

  • cl_eventlogd - This daemon logs cluster events into a binary log file.

  • rgmd -- This is the resource group manager, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted within 30 seconds.

    To start rgmd in debug mode:


disable failfast:

# /usr/cluster/lib/sc/cmm_ctl -f

node 1 is not up

node level failfast control disabled on node

(for SC 3.0/3.1, use /usr/cluster/dtk/bin/cmm_ctl)

stop rgmd, then start it in debug mode:

# /usr/cluster/bin/rgmd -d

Sample output (about half a second of data):


15:20:42 c-220ra-1-epar03 Cluster.
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource type = SUNW.gds:6
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = apache-ip-rg
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <1>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource = apache-ip-1
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ----------rt_basedir =/usr/cluster/lib/rgm/rt/hascip
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource type = SUNW.SharedAddress:2
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_add_rg(), rg_name = zonetest
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = zonetest
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <1>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = test-rg
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <1>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource = test-rs
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ----------rt_basedir =/opt/SUNWscsmf/bin
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource type = SUNW.Proxy_SMF_failover
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = rg2
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <1>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = rg1
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <1>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource = rs1
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ----------rt_basedir =/usr/cluster/lib/rgm/rt/hascip
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource type = SUNW.SharedAddress:2
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = jmt-rg
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource = jmt-gds-rs
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ----------rt_basedir =/opt/SUNWscgds/bin
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource type = SUNW.gds:6
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- rg = apache-ip-rg
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <1>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rg_node = <2>
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource = apache-ip-1
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ----------rt_basedir =/usr/cluster/lib/rgm/rt/hascip
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: ---------- resource type = SUNW.SharedAddress:2
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_get_resources()
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_add_r(), r_name = zonetest, rg_name = zonetest
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_get_lni 1 
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_get_lni 2 
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_get_lni 1 
Oct 15 15:20:42 c-220ra-1-epar03 Cluster.RGM.rgmd: rgm_get_lni 2 


  • rpc.fed -- this is the "fork-and-exec" daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted within 35 seconds.

# truss -p (pid fed)
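
(a hedged one-liner to avoid looking the pid up by hand: # truss -p `pgrep -f rpc.fed`)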

  • scguieventd -- this daemon processes cluster events for the SunPlex Manager GUI, so that the display can be updated in real time. It is not automatically restarted if it stops. If you are having trouble with SunPlex Manager, you might have to restart the daemon or reboot the specific node.

  • rpc.pmfd -- This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster daemons, and for most application daemons and application fault monitors. A failfast driver panics the kernel if this daemon is killed and not restarted within 35 seconds.


disable failfast:
# /usr/cluster/lib/sc/cmm_ctl -f
node 1 is not up
node level failfast control disabled on node

(for SC 3.0/3.1, use /usr/cluster/dtk/bin/cmm_ctl)

then restart the daemon in debug mode:

# /usr/cluster/lib/sc/rpc.pmfd -d

  • pnmd -- This is the public network management daemon, which manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted.
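
    The IPMP group status that pnmd aggregates can be displayed cluster-wide with (a hedged example):

# /usr/cluster/bin/scstat -i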

  • scdpmd -- the multi-threaded disk path monitoring (DPM) daemon runs on each node. It is started by an rc script when a node boots and monitors the availability of the logical paths that are visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). Automatically restarted.
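
    The state of the monitored disk paths can be listed with the scdpm command (a hedged example; all:all means every path from every node):

# /usr/cluster/bin/scdpm -p all:all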

  • scqsd - quorum server daemon (does not run on cluster nodes)

  • failfastd - The failfast daemon allows the kernel to panic if certain essential daemons have failed. A failfast driver panics the kernel if this daemon is killed and not restarted within 30 seconds.

To disable failfast:
c-220ra-2-epar03 # /usr/cluster/lib/sc/cmm_ctl -f
node level failfast control disabled on node 1
node level failfast control disabled on node 2

To enable failfast:
c-220ra-2-epar03 # /usr/cluster/lib/sc/cmm_ctl -F
node level failfast control enabled on node 1
node level failfast control enabled on node 2

(for SC 3.0/3.1, use /usr/cluster/dtk/bin/cmm_ctl)

  • scprivipd - This daemon provisions IP addresses on the clprivnet0 interface on behalf of zones. Automatically restarted.

  • qd_userd - This daemon serves as a proxy whenever any quorum device activity requires execution of some command in userland (for example, a NAS quorum device).

  • sc_delegated_restarter - This daemon restarts cluster applications that are written as SMF services and then placed under control of the cluster using the Sun Cluster 3.2 SMF proxy feature. Automatically restarted.

  • sc_zonesd - This daemon monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone booting and failure. A failfast driver panics the kernel if this daemon is killed and not restarted within 35 seconds.

  • cl_ccrad - This daemon provides access from userland management applications to the CCR. Automatically restarted.
