Tuesday Apr 22, 2014

Configure failover LDom (Oracle VM Server for SPARC) on Solaris Cluster 4.1 by using 'live' migration

This blog shows an example of how to configure 'Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC' on Solaris Cluster 4.1. It also gives some hints for such a configuration. This setup requires Solaris Cluster 4.1 SRU3 or higher and Oracle VM Server 3.0 or higher.
Essentially, this is a summary of
Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC Guide
and
Oracle VM Server for SPARC 3.0 Administration Guide.
Please check these guides for further restrictions and requirements.

This procedure is specifically for 'live' migration of guest LDoms, which means the OS in the LDom is not shut down during the failover. In earlier OVM releases this was called 'warm' migration. However, the word 'live' is used in this example. A 'cold' migration means that the OS in the guest LDom will be stopped before migration.

Let's start:
The necessary services must be identical on all the potential control domains (primary domains) which run as Solaris Cluster 4.1 nodes. It is expected that Oracle VM Server is already installed.

1) Prepare all primary domains which should manage the failover LDom with the necessary services.
all_primaries# ldm add-vconscon port-range=5000-5100 primary-vcc0 primary
all_primaries# svcadm enable svc:/ldoms/vntsd:default
all_primaries# ldm add-vswitch net-dev=net0 public-vsw1 primary
all_primaries# ldm add-vdiskserver primary-vds0 primary

To verify:
all_primaries# ldm list-bindings primary


2) Set failure-policy on all primary domains:
all_primaries# ldm set-domain failure-policy=reset primary
To verify:
all_primaries# ldm list -o domain primary


3) Create failover guest domain (fgd0) on one primary domain.
Simple example:
primaryA# ldm add-domain fgd0
primaryA# ldm set-vcpu 16 fgd0
primaryA# ldm set-mem 8G fgd0



4) Add public network to failover guest domain:
primaryA# ldm add-vnet public-net0 public-vsw1 fgd0
To verify:
primaryA# ldm list-bindings fgd0

For more details on setting up guest LDoms, refer to the Oracle VM Server for SPARC 3.0 Administration Guide.


5) Set necessary values on failover guest domain fgd0:
primaryA# ldm set-domain master=primary fgd0
primaryA# ldm set-var auto-boot?=false fgd0

To verify run:
primaryA# ldm list -o domain fgd0
auto-boot?=false is a “must have” to prevent data corruption. More details are available in DocID 1585422.1 Solaris Cluster and HA VM Server Agent SUNW.ldom Data Corruption may Occur in a failover Guest Domain when "auto-boot?=true" is set


6) Select boot device for failover guest domain fgd0.
Possible options for the root file system of a domain with 'live' migration are: Solaris Cluster global filesystem (UFS/SVM), NFS, iSCSI, and SAN LUNs, because all of them are accessible from both nodes at the same time. The recommendation is to use a full raw disk because 'live' migration is expected. The full raw disk can be provided via SAN or iSCSI to all primary domains.
Remember that ZFS as the root filesystem can ONLY be used with 'cold' migration, because 'live' migration requires both nodes to access the root file system at the same time, which is not possible with ZFS.
Using the Solaris Cluster global filesystem is an alternative, but the performance is not as good as with root on a raw disk.
Details are available in DocID 1366967.1 Solaris Cluster Root Filesystem Configurations for a Guest LDom Controlled by a SUNW.ldom Resource

So, root on a raw disk is selected.
Add boot device to fgd0:
all_primaries# ldm add-vdsdev /dev/did/rdsk/d7s2 boot_fgd0@primary-vds0
primaryA# ldm add-vdisk root_fgd0 boot_fgd0@primary-vds0 fgd0



6a) Optional: Configure the MAC addresses of the LDom. The LDom Manager assigns MAC addresses automatically, but the following issues can occur:
* Duplicate MAC addresses if other guest LDoms are down when creating a new LDom.
* The MAC address can change after a failover of an LDom.
Assigning your own MAC addresses is possible. This example uses the suggested range 00:14:4F:FC:00:00 – 00:14:4F:FF:FF:FF as described in
Assigning MAC Addresses Automatically or Manually of Oracle VM Server for SPARC 3.0 Administration Guide.
Example:
Identify current automatically assigned MAC addresses
primaryA# ldm list -l fgd0
To see the HOSTID, which is similar to the MAC, an 'ldm bind fgd0' is necessary. Unbind fgd0 afterwards with 'ldm unbind fgd0'.
MAC: 00:14:4f:fb:50:dc → change to 00:14:4f:fc:50:dc
HOSTID: 0x84fb50dc → change to 0x84fc50dc
public-net: 00:14:4f:fa:01:49 → change to 00:14:4f:fc:01:49
primaryA# ldm set-domain mac-addr=00:14:4f:fc:50:dc fgd0
primaryA# ldm set-domain hostid=0x84fc50dc fgd0
primaryA# ldm set-vnet mac-addr=00:14:4f:fc:01:49 public-net0 fgd0
primaryA# ldm list-constraints fgd0 (this shows assigned MAC now)
If it is necessary to change the MAC addresses of an already configured failover guest LDom, refer to DocID 1559415.1 Solaris Cluster HA-LDom Agent do not Preserve hostid and MAC Address Upon Failover
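
The manual renumbering above (fourth octet changed to fc) can be sketched as small shell helpers. This is only an illustration: manual_mac, manual_hostid and in_manual_range are hypothetical function names, not part of ldm, and the rule "replace the fourth octet with fc" is simply what the example does by hand. Two automatically assigned addresses that differ only in the fourth octet would collide after the rewrite, so check the results for duplicates.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ldm): derive manually assigned values
# from the automatically assigned ones, as done by hand in the example.

# Replace the fourth octet of a MAC address with "fc".
manual_mac() {
    echo "$1" | tr 'A-F' 'a-f' |
        awk -F: '{ printf "%s:%s:%s:fc:%s:%s\n", $1, $2, $3, $5, $6 }'
}

# Apply the same change to the matching byte of the hostid.
manual_hostid() {
    echo "$1" | tr 'A-F' 'a-f' | sed 's/^0x\(..\)..\(....\)$/0x\1fc\2/'
}

# Succeed only if the MAC is inside 00:14:4f:fc:00:00 - 00:14:4f:ff:ff:ff.
in_manual_range() {
    case "$(echo "$1" | tr 'A-F' 'a-f')" in
        00:14:4f:f[c-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]) return 0 ;;
        *) return 1 ;;
    esac
}

manual_mac 00:14:4f:fb:50:dc      # prints 00:14:4f:fc:50:dc
manual_hostid 0x84fb50dc          # prints 0x84fc50dc
manual_mac 00:14:4f:fa:01:49      # prints 00:14:4f:fc:01:49
```

The printed values are the ones passed to 'ldm set-domain' and 'ldm set-vnet' in the example above.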


7) Bind and start the fgd0
primaryA# ldm bind fgd0
primaryA# ldm start fgd0


8) Login to LDom using console:
primaryA# telnet localhost 5000


9) Install Solaris 10 or Solaris 11 on the LDom by using an install server
To identify the MAC address of the LDom, run the following in the console of fgd0:
{0} ok devalias net
{0} ok cd /virtual-devices@100/channel-devices@200/network@0
{0} ok .properties
local-mac-address 00 14 4f fc 01 49
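
The OBP prints the address with spaces, while install-server client definitions usually want the colon-separated form. A minimal sketch of the conversion, assuming the six octets are copied from the .properties output (obp_mac_to_colon is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: turn the space-separated octets shown by
# ".properties" into the usual colon-separated MAC notation.
obp_mac_to_colon() {
    echo "$*" | awk '{ printf "%s:%s:%s:%s:%s:%s\n", $1, $2, $3, $4, $5, $6 }'
}

obp_mac_to_colon 00 14 4f fc 01 49    # prints 00:14:4f:fc:01:49
```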

For different installation methods, please refer to Installing Oracle Solaris OS on a Guest Domain of Oracle VM Server for SPARC 3.0 Administration Guide


10) Install HA-LDom (HA for Oracle VM Server Package) on all primary domain nodes if not already done
all_primaries# pkg info ha-cluster/data-service/ha-ldom
all_primaries# pkg install ha-cluster/data-service/ha-ldom



11) Check that 'cluster' is the first entry in /etc/nsswitch.conf
all_primaries# svccfg -s name-service/switch listprop config/host
config/host astring "files dns"
all_primaries# svccfg -s name-service/switch listprop config/ipnodes
config/ipnodes astring "files dns"
all_primaries# svccfg -s name-service/switch listprop config/netmask
config/netmask astring files
If not, add it:
all_primaries# svccfg -s name-service/switch setprop config/host = astring: '("cluster files dns")'
all_primaries# svccfg -s name-service/switch setprop config/ipnodes = astring: '("cluster files dns")'
all_primaries# svccfg -s name-service/switch setprop config/netmask = astring: '("cluster files")'

More Details in DocID 1554887.1 Solaris Cluster: HA LDom Migration Fails With "Failed to establish connection with ldmd(1m) on target"
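
The rule behind step 11 is simply "the word cluster comes first, the rest keeps its order". A minimal sketch of that rule, with cluster_first as a hypothetical helper name; the svccfg and svcadm lines are left as comments because they only make sense on a cluster node, and the refresh step is an assumption about how the change becomes active on your system:

```shell
#!/bin/sh
# Hypothetical helper: return the lookup order with "cluster" as the first
# entry, keeping the other entries in their original order.
cluster_first() {
    echo "$1" | awk '{
        out = "cluster"
        for (i = 1; i <= NF; i++)
            if ($i != "cluster")
                out = out " " $i
        print out
    }'
}

cluster_first "files dns"            # prints cluster files dns
cluster_first "cluster files dns"    # prints cluster files dns (unchanged)

# Sketch of applying the result on a node (not executed here):
#   new=$(cluster_first "files dns")
#   svccfg -s name-service/switch setprop config/host = astring: "(\"$new\")"
#   svcadm refresh name-service/switch
```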


12) Create resource group for failover LDom fgd0 for primary domains
primaryA# clrg create -n primaryA,primaryB fldom-rg


13) Register SUNW.HAStoragePlus if not already done:
primaryA# clrt register SUNW.HAStoragePlus


14) Create HAStoragePlus resource for boot device:
primaryA# clrs create -g fldom-rg -t SUNW.HAStoragePlus -p GlobalDevicePaths=/dev/global/dsk/d7s2 fgd0-has-rs
Using d7s2 is a requirement!


15) Enable LDom resource group on current node:
primaryA# clrg online -M -n primaryA fldom-rg


16) Register SUNW.ldom
primaryA# clrt register SUNW.ldom


17) Setup password file for non-interactive 'live' migration on all primary nodes
all_primaries# vi /.pass
add root password to this file
all_primaries# chmod 600 /.pass
Requirements:
* The first line of the file must contain the password
* The password must be plain text
* The password must not exceed 256 characters in length
A newline character at the end of the password and all lines that follow the first line are ignored.
These details are from Performing Non-Interactive Migrations of Oracle VM Server for SPARC 3.0 Administration Guide
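
The requirements above can be checked before the resource is created. The following is a minimal sketch, not an official tool: check_pass_file is a hypothetical function name, and the mode test only mirrors the 'chmod 600' used in this step.

```shell
#!/bin/sh
# Hypothetical helper: check a password file against the requirements
# listed above. Returns 0 if the file looks usable, 1 otherwise.
check_pass_file() {
    f=$1
    [ -f "$f" ] || { echo "$f: not a regular file" >&2; return 1; }

    # The file should only be readable and writable by its owner (mode 600).
    perms=$(ls -l "$f" | cut -c1-10)
    [ "$perms" = "-rw-------" ] ||
        { echo "$f: permissions are $perms, expected -rw-------" >&2; return 1; }

    # Only the first line counts: it must not be empty and must not
    # exceed 256 characters.
    len=$(awk 'NR == 1 { print length($0); exit }' "$f")
    len=${len:-0}
    if [ "$len" -lt 1 ] || [ "$len" -gt 256 ]; then
        echo "$f: first line has $len characters, expected 1 to 256" >&2
        return 1
    fi
    return 0
}

# Usage on each primary domain:
#   check_pass_file /.pass && echo "password file looks fine"
```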



18) Create SUNW.ldom resource
primaryA# clrs create -g fldom-rg -t SUNW.ldom -p Domain_name=fgd0 -p Password_file=/.pass -p resource_dependencies=fgd0-has-rs fgd0-rs

Notice: The domain configuration is retrieved by the “ldm list-constraints -x ldom” command from Solaris Cluster and stored in the CCR. This info is used to create or destroy the domain on the node where the resource group is brought online or offline.


19) Check Migration_type property. It should be MIGRATE for 'live' migration:
primaryA# clrs show -v fgd0-rs | grep Migration_type
If not MIGRATE then set it:
primaryA# clrs set -p Migration_type=MIGRATE fgd0-rs


20) To stop/start the SUNW.ldom resource
primaryA# clrs disable fgd0-rs
primaryA# clrs enable fgd0-rs


21) Verify the setup by switching failover LDom to other node and back.
primaryA# clrg switch -n primaryB fldom-rg
primaryA# clrg switch -n primaryA fldom-rg
To monitor the migration process, run 'ldm list -o status fgd0' on the target primary domain.
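
If the switch test is repeated often, it can be wrapped in a small script. This is a sketch under assumptions: switch_round_trip and the RUN variable are made up for this example, and by default the script only prints the commands (dry run). Set RUN to an empty value on a cluster node to really execute them.

```shell
#!/bin/sh
# RUN=echo (default) only prints the commands; RUN= (empty) executes them.
RUN=${RUN-echo}

# Hypothetical helper: switch the resource group to each given node in
# turn and show the resource group status after every switch.
switch_round_trip() {
    rg=$1
    shift
    for node in "$@"; do
        $RUN clrg switch -n "$node" "$rg" || return 1
        $RUN clrg status "$rg"
    done
}

# Same order as in the step above: to primaryB and back to primaryA.
switch_round_trip fldom-rg primaryB primaryA
```

Stopping at the first failed switch keeps the LDom where it is, so the state can be inspected before anything else is tried.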


22) Tune your timeout values depending on your system.
primaryA# clrs set -p STOP_TIMEOUT=1200 fgd0-rs
Details are in DocID 1423937.1 Solaris Cluster: HA LDOM Migration Fails With "Migration of domain timed out, the domain state is now shut off"


23) Consider further tuning of timeout values as described in
SPARC: Tuning the HA for Oracle VM Server Fault Monitor of Oracle Solaris Cluster Data Service for Oracle VM Server for SPARC Guide
For less frequent probing, the following setting may be used:
primaryA# clrs set -p Thorough_probe_interval=180 -p Probe_timeout=90 fgd0-rs

Last but not least, it's not supported to run Solaris Cluster within a failover LDom!

Wednesday Aug 06, 2008

Sun SPARC Enterprise Mx000 with active bge interface

The Sun SPARC Enterprise Server M4000, M5000, M8000 or M9000 can sporadically hang at boot time
a) if the system is part of a Sun Cluster
and
b) if the system has a configured bge network interface


Example of boot hang:
...
Booting as part of a cluster
NOTICE: CMM: Node node1 (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node node2 (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 2 (/dev/did/rdsk/d7s2) added; votecount = 5, bitmask of nodes with configured paths = 0x3f.
NOTICE: clcomm: Adapter bge3 constructed
... now the system hangs at this point ...


Solution: Install 138042-02 (or higher) of SunOS 5.10: MAC patch

About

I'm still mostly blogging about Solaris Cluster and support, regardless of whether it is for Sun Microsystems or Oracle. :-)
