Sun Cluster 3.2, Truecopy, and the DID Namespace
By hhnguyen on Nov 26, 2006
Sun Cluster 3.2 (SC3.2) adds support for using Truecopy data replication within a single cluster, expanding on the existing support for Truecopy replication between two different clusters with Odyssey. In addition to providing automatic failover of replicated device groups within a cluster, SC3.2 adds support for using Solaris Volume Manager (SVM) with Truecopy-replicated devices. Previously, only Veritas Volume Manager (VxVM) could be used with Sun Cluster and Truecopy.
When used with Sun Cluster, SVM disksets use DID devices. This gives SVM a consistent device namespace on all cluster nodes, since the Solaris name for a given device may differ from node to node. With Sun Cluster, SVM cannot import a diskset on a node where it cannot find the expected device names.
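As an illustration, a diskset built on DID devices is referenced by the same names from every node. This is a sketch only; the diskset name, host names, and instance number are hypothetical:

```shell
# On any cluster node: create a diskset and add a disk by its DID name.
# "myset", "node-1", "node-2", and d6 are hypothetical.
metaset -s myset -a -h node-1 node-2
metaset -s myset -a /dev/did/rdsk/d6

# Both nodes refer to the disk as /dev/did/rdsk/d6, even though its
# underlying Solaris name (e.g. c1t0d0 vs. c3t2d0) may differ per node.
```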
This limitation is the reason SVM was initially unsupported for use with Truecopy. Truecopy replicates data between two physical devices, and in a cluster these devices will have different DID names, which means that SVM will look for the wrong devices when a failover or switchover occurs. For SVM to function properly with Truecopy-replicated devices, the DID namespace must be manipulated so that each pair of replicated devices has the same DID name.
To accomplish this, two new options have been added to the scdidadm(1M) command and to its new counterpart cldevice(1CL). The -T option automatically discovers which DID devices are Truecopy replica pairs and remaps them to have the same DID instance number. The -t option can be used to manually combine the DID instances of replicated devices. 'cldevice replicate' and 'cldevice rename' are the corresponding new command-line interfaces.
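As a sketch, the automatic form might be invoked as follows (the node name is hypothetical):

```shell
# Automatic discovery: find the Truecopy replica pairs shared between
# the local node and node-2, and remap them so each pair has the same
# DID instance number on both nodes.
cldevice replicate -D node-2
```

The manual form ('cldevice rename', or scdidadm -t) covers pairs the automatic discovery does not handle; see the man pages for its arguments.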
These commands take the DID instance for the replica on the local node (the node on which the command is being run) and rename it to have the same DID instance number as the replica on the remote node. For example, in a two-node cluster, if DID instance d6 on node 1 is replicated to DID instance d7 on node 2, then running 'cldevice replicate -D node-2' on node 1 will cause the DID instance for d6 to be removed and a new device path for d7 to be added. This way, when a diskset is moved between nodes 1 and 2, SVM can find the device by the same name on both.
During the SC3.2 Beta program, there was some confusion about which node the remapping command should be run on: the node with the primary replica, or the node with the secondary replica. It generally does not matter. The only real difference is which DID instance gets removed and which one gains a new path. As stated above, it is the instance on the local node that gets removed, so reversing the above example, if 'cldevice replicate -D node-1' were run on node 2, then the DID instance for d7 would be removed, and d6 would gain a new path. In practice, we recommend running the command on the node that holds the secondary replica, since certain operations, such as cleaning up the DID driver data structures, cannot occur if the instance being removed is in use.
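Putting that recommendation together with the earlier example (node names and instance numbers are hypothetical, and d7 on node 2 is assumed to be the secondary replica):

```shell
# Run on node 2, which holds the secondary replica (d7), so that the
# instance being removed is not the one currently in use.
cldevice replicate -D node-1

# Result: the DID instance for d7 is removed on node 2, and d6 gains
# a new device path there. Both nodes now see the pair as d6.
```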
VxVM does not require the device name to be the same on both nodes when importing disk groups. However, when a disk group is registered as a global device group, a connectivity check is performed that will fail unless every disk in the disk group shows a DID device path from every node configured as a potential primary. Because of this, DID remapping also needs to be done when using Truecopy with VxVM.
Sun Cluster Engineering