Monday Nov 12, 2012

New Solaris Cluster!

We released Oracle Solaris Cluster 4.1 recently. OSC offers both High Availability (HA) and Scalable Services capabilities. HA delivers automatic restart of software on the same cluster node and/or automatic failover from a failed node to a working cluster node. Software and support are available for both x86 and SPARC systems.

The Scalable Services features manage multiple cluster nodes that all provide a load-balanced service, such as web servers or app servers.

OSC 4.1 includes the ability to recover services from software failures and from failures of hardware components such as DIMMs, CPUs, and I/O cards, as well as a global file system, rolling upgrades, and much more.

Oracle Availability Engineering posted a brief description and links to details. Or, you can just download it now!

Friday Mar 21, 2008

High-availability Networking for Solaris Containers

Here's another example of Containers that can manage their own affairs.

Sometimes you want to closely manage the devices that a Solaris Container uses. This is easy to do from the global zone: by default a Container does not have direct access to devices. It does have indirect access to some devices, e.g. via a file system that is available to the Container.

By default, zones use NICs that they share with the global zone, and perhaps with other zones. In the past these were just called "zones." Starting with Solaris 10 8/07, these are now referred to as "shared-IP zones." The global zone administrator manages all networking aspects of shared-IP zones.
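
For comparison, here is a minimal sketch of how the global zone administrator assigns networking to a shared-IP zone in zonecfg(1M). The zone name, NIC and address below are invented for illustration:
global# zonecfg -z dusk
zonecfg:dusk> add net
zonecfg:dusk:net> set physical=bge0
zonecfg:dusk:net> set address=192.168.1.10/24
zonecfg:dusk:net> end
zonecfg:dusk> exit
The zone's address is set here, in the global zone; the zone itself cannot change it.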

Sometimes it would be easier to give direct control of a Container's devices to its owner. An excellent example of this is the option of allowing a Container to manage its own network interfaces. This enables it to configure IP Multipathing for itself, as well as IP Filter and other network features. Using IPMP increases the availability of the Container by creating redundant network paths to the Container. When configured correctly, this can prevent the failure of a network switch, network cable or NIC from blocking network access to the Container.

As described at docs.sun.com, to use IP Multipathing you must choose two network devices of the same type, e.g. two ethernet NICs. Those NICs are placed into an IPMP group through the use of the command ifconfig(1M). Usually this is done by placing the appropriate ifconfig parameters into files named /etc/hostname.<NIC-instance>, e.g. /etc/hostname.bge0.

An IPMP group is associated with an IP address. Packets leaving any NIC in the group have a source address of the IPMP group. Packets with a destination address of the IPMP group can enter through either NIC, depending on the state of the NICs in the group.

Delegating network configuration to a Container requires use of the new IP Instances feature. It's easy to create a zone that uses this feature, making this an "exclusive-IP zone." One new line in zonecfg(1M) will do it:

zonecfg:twilight> set ip-type=exclusive
Of course, you'll need at least two network devices in the IPMP group. Using IP Instances will dedicate these two NICs to this Container exclusively. Also, the Container will need direct access to the two network devices. Configuring all of that looks like this:
global# zonecfg -z twilight
zonecfg:twilight> create
zonecfg:twilight> set zonepath=/zones/roots/twilight
zonecfg:twilight> set ip-type=exclusive
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge1
zonecfg:twilight:net> end
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge2
zonecfg:twilight:net> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge1
zonecfg:twilight:device> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge2
zonecfg:twilight:device> end
zonecfg:twilight> exit
As usual, the Container must be installed and booted with zoneadm(1M):
global# zoneadm -z twilight install
global# zoneadm -z twilight boot
Now you can log in to the Container's console and answer the usual configuration questions:
global# zlogin -C twilight
<answer questions>
<the zone automatically reboots>

After the Container reboots, you can configure IPMP. There are two methods. One uses link-based failure detection and one uses probe-based failure detection.

Link-based detection requires the use of a NIC which supports this feature. Some NICs that support this are hme, eri, ce, ge, bge, qfe and vnet (part of Sun's Logical Domains). They are able to detect failure of the link immediately and report that failure to Solaris. Solaris can then take appropriate steps to ensure that network traffic continues to flow on the remaining NIC(s).
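
One way to check whether a particular NIC reports its link state is dladm(1M). This is only a sketch; the exact syntax and output of dladm vary across Solaris releases:
# dladm show-dev bge1
bge1            link: up        speed: 1000  Mbps       duplex: full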

Other NICs do not support this link-based failure detection, and must use probe-based detection. This method uses ICMP packets ("pings") from the NICs in the IPMP group to detect failure of a NIC. This requires one IP address per NIC, in addition to the IP address of the group.

Regardless of the method used, configuration can be accomplished manually or via files /etc/hostname.<NIC-instance>. First I'll describe the manual method.

Link-based Detection

Using link-based detection is easiest. The commands to configure IPMP look like these, whether they are run in an exclusive-IP zone for its own NICs, or in the global zone for its NICs and for the NICs used by shared-IP Containers:
# ifconfig bge1 plumb
# ifconfig bge1 twilight group ipmp0 up
# ifconfig bge2 plumb
# ifconfig bge2 group ipmp0 up
Note that those commands achieve the desired network configuration only until the next time that Solaris boots. To make the configuration persistent across reboots, you must put the same information into configuration files. Inserting those parameters into the files is also easy:
/etc/hostname.bge1:
twilight group ipmp0 up

/etc/hostname.bge2:
group ipmp0 up
Those two files will be used to configure networking the next time that Solaris boots. Of course, an IP address entry for twilight is required in /etc/inet/hosts.
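
For reference, the corresponding /etc/inet/hosts entry might look like this, using the address that appears in the ifconfig output below:
129.152.2.72    twilight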

If you have entered the ifconfig commands directly, you are finished. You can test your IPMP group with the if_mpadm command, which can be run in the global zone to test an IPMP group there, or in an exclusive-IP zone to test one of its own groups:

# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
...

# if_mpadm -d bge1
# ifconfig -a
...
bge1: flags=289000842 mtu 0 index 4
        inet 0.0.0.0 netmask 0
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
bge2:1: flags=201000843 mtu 1500 index 5
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
...

# if_mpadm -r bge1
# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
        inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
        groupname ipmp0
        ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000
        groupname ipmp0
        ether 0:14:4f:fb:ca:b
...

If you are using link-based detection, that's all there is to it!

Probe-based Detection

As mentioned above, using probe-based detection requires more IP addresses:

/etc/hostname.bge1:
twilight netmask + broadcast + group ipmp0 up addif twilight-test-bge1 \
deprecated -failover netmask + broadcast + up
/etc/hostname.bge2:
twilight-test-bge2 deprecated -failover netmask + broadcast + group ipmp0 up
Three hostname-and-IP-address entries will, of course, be needed in /etc/inet/hosts.
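
A sketch of those three entries, assuming hypothetical test addresses on the same subnet as the group address:
129.152.2.72    twilight
129.152.2.73    twilight-test-bge1
129.152.2.74    twilight-test-bge2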

All that's left is a reboot of the Container. If a reboot is not practical at this time, you can accomplish the same effect by using ifconfig(1M) commands:

twilight# ifconfig bge1 plumb
twilight# ifconfig bge1 twilight netmask + broadcast + group ipmp0 up addif \
twilight-test-bge1 deprecated -failover netmask + broadcast + up
twilight# ifconfig bge2 plumb
twilight# ifconfig bge2 twilight-test-bge2 deprecated -failover netmask + \
broadcast + group ipmp0 up

Conclusion

Whether link-based failure detection or probe-based failure detection is used, we have a Container with these network properties:

  1. Two network interfaces
  2. Automatic failover between the two NICs
You can expand on this with more NICs to achieve even greater resiliency or bandwidth, as shown in the sketch below.
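
For example, with link-based detection, adding a hypothetical third NIC such as bge3 to the same group only requires assigning that NIC to the Container in zonecfg(1M) and creating one more configuration file:
/etc/hostname.bge3:
group ipmp0 up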
