Thursday Aug 20, 2009

Why are packets going out of the "wrong" interface?

I often refer to this blog post by James Carlson, so to help others, and me, find it, here is "Packets out of the wrong interface". Thanks, James, for all the help over the years!

Steffen

VLANs and Aggregations

Every once in a while I see the question of whether it is possible to use IEEE 802.1q VLANs together with IEEE 802.3ad Link Aggregation. I frequently have to check for myself. So to remind myself, and to share with others, here is a quick demonstration of how to get the two working together.

My test system is running build 05 of the upcoming Solaris 10 10/09 (update 8). The system has four bge interfaces, and I will use numbers 1 and 2. (This should work just as well with previous updates of Solaris 10, and with Sun Trunking in Solaris 9, except for the zones parts. I am using zones just to isolate my traffic generation and easily get it to use a specific data link.)

Starting out, things look like this.

global# dladm show-dev
bge0            link: up        speed: 1000  Mbps       duplex: full
bge1            link: unknown   speed: 0     Mbps       duplex: unknown
bge2            link: unknown   speed: 0     Mbps       duplex: unknown
bge3            link: unknown   speed: 0     Mbps       duplex: unknown
global# dladm show-link
bge0            type: non-vlan  mtu: 1500       device: bge0
bge1            type: non-vlan  mtu: 1500       device: bge1
bge2            type: non-vlan  mtu: 1500       device: bge2
bge3            type: non-vlan  mtu: 1500       device: bge3
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
I have my switch set up to aggregate ports 1 and 2, and here is how I do it with Solaris 10.
global# dladm create-aggr -d bge1 -d bge2 1
global# dladm show-link
bge0            type: non-vlan  mtu: 1500       device: bge0
bge1            type: non-vlan  mtu: 1500       device: bge1
bge2            type: non-vlan  mtu: 1500       device: bge2
bge3            type: non-vlan  mtu: 1500       device: bge3
aggr1           type: non-vlan  mtu: 1500       aggregation: key 1
VLAN-tagged interfaces are used by accessing the underlying data link with the VLAN ID placed in front of the instance number: the link number is the VLAN ID multiplied by 1000, plus the instance. For bge1 and VLAN 111 that would be bge111001. For aggr1 it would be aggr111001.
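
The naming rule can be written out as a small calculation. The following Python sketch assumes the Solaris 10 convention of VLAN ID times 1000 plus the device instance; the function name vlan_link_name is made up for illustration and is not part of any Solaris tool.

```python
def vlan_link_name(driver: str, instance: int, vlan_id: int) -> str:
    """Build a Solaris 10 style VLAN link name, e.g. bge1 + VLAN 111 -> bge111001."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= instance <= 999:
        raise ValueError("instance must fit in three digits (0-999)")
    # Assumed convention: link number = VLAN ID * 1000 + device instance.
    return f"{driver}{vlan_id * 1000 + instance}"


print(vlan_link_name("bge", 1, 111))   # bge111001
print(vlan_link_name("aggr", 1, 111))  # aggr111001
print(vlan_link_name("aggr", 1, 112))  # aggr112001
```

The last two names are the ones used for the zones in this setup.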

For this setup I am using zones zone111 and zone112, each configured as an exclusive IP Instance. The zone configuration looks like this.

global# zonecfg -z zone111 info
zonename: zone111
zonepath: /zones/zone111
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address not specified
        physical: aggr111001
        defrouter not specified
Once configured, installed, and booted, the network configuration of zone111 is:
global# zlogin zone111 ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
aggr111001: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 172.16.111.141 netmask ffffff00 broadcast 172.16.111.255
        ether 0:3:ba:e3:42:8c
It turns out that configuring this was easy compared to showing that the link aggregation was really working. While the full list of links known when the zones are running includes the aggregation and the VLANs on the aggregation, tools such as netstat or nicstat do not include them. As it turns out, they only report on interfaces that are plumbed up in that IP Instance. It is also not possible to plumb either bge1 or bge2, since they are members of the aggregation.
global# dladm show-link
bge0            type: non-vlan  mtu: 1500       device: bge0
bge1            type: non-vlan  mtu: 1500       device: bge1
bge2            type: non-vlan  mtu: 1500       device: bge2
bge3            type: non-vlan  mtu: 1500       device: bge3
aggr1           type: non-vlan  mtu: 1500       aggregation: key 1
aggr111001      type: vlan 111  mtu: 1500       aggregation: key 1
aggr112001      type: vlan 112  mtu: 1500       aggregation: key 1
global# netstat -i
Name  Mtu  Net/Dest      Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue
lo0   8232 loopback      localhost      98     0     98     0     0      0
bge0  1500 pinebarren    pinebarren     43101  0     7181   0     0      0
So I ended up using kstat(1M) to get the number of outbound packets. I am interested in outbound traffic as that is what Solaris can affect when distributing traffic across links in an aggregation--the switch determines that for inbound traffic.

This example shows data on instance 2 of the bge interface for kstat value opackets.

global# kstat -m bge -i 2 -s opackets
module: bge                             instance: 2
name:   mac                             class:    net
        opackets                        2542
With kstat I can see that for different connections either bge1 or bge2 has packets going out on it. A good test for me was scp to a remote system. Neither ping nor traceroute caused the necessary hashing to use both links in the aggregation.
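
To make the comparison less manual, two kstat samples can be subtracted per link. This is only a sketch: it assumes the text was captured with the parseable form of the command (kstat -p, which prints module:instance:name:statistic followed by the value), and the counter values in the example are invented for illustration, not measurements from the test system.

```python
def parse_kstat_p(text: str) -> dict:
    """Map a link name such as 'bge2' to the integer counter on each line."""
    counters = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(None, 1)
        module, instance, _name, _stat = key.split(":")
        counters[f"{module}{instance}"] = int(value)
    return counters


def deltas(before: dict, after: dict) -> dict:
    """Packets sent on each link between the two samples."""
    return {link: after[link] - before[link] for link in before}


# Invented sample values, shaped like `kstat -p` output for opackets.
before = parse_kstat_p("bge:1:mac:opackets\t1000\nbge:2:mac:opackets\t2542\n")
after = parse_kstat_p("bge:1:mac:opackets\t1400\nbge:2:mac:opackets\t2542\n")
print(deltas(before, after))
```

A connection that hashes to one link shows up as a non-zero delta on that link only; repeating the test with several connections should show both links being used.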

Steffen

Monday Jun 01, 2009

OpenSolaris 2009.06 Delivers Crossbow (Network Virtualization and Resource Control)

Today OpenSolaris 2009.06, the third release of OpenSolaris, is announced and available for download. Among the many features in this version is the delivery of Project Crossbow, in a fully supported distribution. This brings network virtualization, including Virtual NICs (VNICs), bandwidth control and management, flow (QoS) creation and management, virtual switches, and other features to OpenSolaris.

Network virtualization joins a number of other features already in OpenSolaris, such as vanity naming (allowing custom names for data links), snooping on loopback for better observability, a re-architected IPMP with an administrative interface, and Network Automagic (NWAM--automatic configuration of desktop networking based on available wired and wireless network services).

Congratulations to everyone who made all this possible!

Steffen

PS: Regarding "fully supported", please notice the new support prices and durations!

Monday Jan 19, 2009

IPMP Re-architecture is delivered

In the process of working on some zones and IPMP testing, I ran into a little difficulty. After probing for some insight, I was reminded by Peter Memishian that the IPMP Re-Architecture (part of Project Clearview) bits were going to be in Nevada/SXCE build 107, and that I could BFU the latest bits onto an existing Nevada install. Well!!! [For Peter's own perspective of this, see his recent blog.]

Since I was already playing with build 105 because the Crossbow features are now integrated, I decided to apply the IPMP bits to a 105 installation. [Note: The IPMP Re-architecture is expected to be in Solaris Express Community Edition (SX-CE) build 107 or so (due to be out early Feb 2009), and thus in OpenSolaris 2009.spring (I don't know what its final name will be). Early access to IPS packages for OpenSolaris 2008.11 should appear in the bi-weekly developer repository shortly after SX-CE has the feature included. There is no intention to back port the re-architecture to Solaris 10.]

I am impressed! The bits worked right away, and once I got used to the slightly different way of monitoring IPMP, I really liked what I saw.

Being accustomed to using IPMP on Solaris 10, and from beta testing Crossbow on previous Nevada bits, I used the long-standing (Solaris 10 and prior) IPMP configuration style. For my testing, I am using link failure testing only, so no probe addresses are configured. [For examples of the new configuration format, see the section Using the New IPMP Configuration Style below. (15 Feb 2009)]

global# cat /etc/hostname.bge1
group shared

global# cat /etc/hostname.bge2
group shared

global# cat /etc/hostname.bge3
group shared standby
In my test case bge1 and bge2 are active interfaces, and bge3 is a standby interface.
global# ifconfig -a4
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 139.164.63.125 netmask ffffff00 broadcast 139.164.63.255
        ether 0:3:ba:e3:42:8b
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8c
bge2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8d
bge3: flags=261000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY,INACTIVE,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8e
ipmp0: flags=8201000842<BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        inet 0.0.0.0 netmask 0
        groupname shared
You will notice that all three interfaces are up and part of group shared. What is different from the old IPMP is that automatically another interface was created, with the flag IPMP. This is the interface that will be used for all the data IP addresses.

Because I used the old format for the /etc/hostname.* files, the backward compatibility of the new IPMP automatically created the ipmp0 interface and assigned it a name. If I wish to have control over that name, I must configure IPMP slightly differently. More on that later.

The new command ipmpstat(1M) is also introduced to get enhanced information regarding the IPMP configuration.

My test is really about using zones and IPMP, so here is what things look like when I bring up three zones that are also configured the traditional way, with network definitions using the bge interfaces. [Using the new format, I would replace bge with either ipmp0 (keep in mind that 0 (zero) is set dynamically) or shared. For more details on the new format, go to Using the New IPMP Configuration Style below. (15 Feb 2009)]

global# for i in 1 2 3; do zonecfg -z shared${i} info net; done
net:
        address: 10.1.14.141/26
        physical: bge1
        defrouter: 10.1.14.129
net:
        address: 10.1.14.142/26
        physical: bge1
        defrouter: 10.1.14.129
net:
        address: 10.1.14.143/26
        physical: bge2
        defrouter: 10.1.14.129
After booting the zones, note that the zones' IP addresses are on logical interfaces on ipmp0, not the previous way of being logical interfaces on bge.
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone shared1
        inet 127.0.0.1 netmask ff000000
lo0:2: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone shared2
        inet 127.0.0.1 netmask ff000000
lo0:3: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone shared3
        inet 127.0.0.1 netmask ff000000
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 139.164.63.125 netmask ffffff00 broadcast 139.164.63.255
        ether 0:3:ba:e3:42:8b
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8c
bge2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8d
bge3: flags=261000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY,INACTIVE,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8e
ipmp0: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        zone shared1
        inet 10.1.14.141 netmask ffffffc0 broadcast 10.1.14.191
        groupname shared
ipmp0:1: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        zone shared2
        inet 10.1.14.142 netmask ffffffc0 broadcast 10.1.14.191
ipmp0:2: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        zone shared3
        inet 10.1.14.143 netmask ffffffc0 broadcast 10.1.14.191
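
The netmask and broadcast values in this output follow directly from the /26 prefix in the zone configuration. A short check using Python's standard ipaddress module, included only as an illustration of the arithmetic:

```python
import ipaddress

# Address and prefix length as given in the zonecfg output for zone shared1.
iface = ipaddress.ip_interface("10.1.14.141/26")
net = iface.network

print(net.network_address)        # 10.1.14.128
print(net.broadcast_address)      # 10.1.14.191
print(f"{int(net.netmask):08x}")  # ffffffc0, the form ifconfig displays

# The defrouter configured for the zones must sit in the same subnet.
print(ipaddress.ip_address("10.1.14.129") in net)  # True
```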
For address information, here are the pre and post boot ipmpstat outputs.
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
0.0.0.0                   down   ipmp0       --          --
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.143               up     ipmp0       bge1        bge2 bge1
10.1.14.142               up     ipmp0       bge2        bge2 bge1
10.1.14.141               up     ipmp0       bge1        bge2 bge1
What's really neat is that it shows which interface(s) are used for outbound traffic. A different interface will be selected for each new remote IP address. That is the level of outbound load spreading at this time.
global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       shared      ok        --        bge2 bge1 (bge3)
There is no group difference before or after.
global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       shared      ok        --        bge2 bge1 (bge3)
The FDT column lists the probe-based failure detection time, and is empty since that is disabled in this setup. bge3 is listed third and in parentheses since that interface is not being used for data traffic at this time.
global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      ipmp0       is-----   up        disabled  ok
bge2        yes     ipmp0       -------   up        disabled  ok
bge1        yes     ipmp0       --mb---   up        disabled  ok
Also, there are no differences for interface status. In both cases bge1 is used for multicast and broadcast traffic, and bge3 is inactive and in standby mode.
global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      ipmp0       is-----   up        disabled  ok
bge2        yes     ipmp0       -------   up        disabled  ok
bge1        yes     ipmp0       --mb---   up        disabled  ok
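
The FLAGS column can be decoded mechanically. The sketch below covers only the four letters that appear in this output (i, s, m, b), with meanings taken from the description above; any other letter is reported as unknown rather than guessed. The function names are hypothetical helpers, not part of ipmpstat.

```python
FLAG_MEANINGS = {
    "i": "inactive",
    "s": "standby",
    "m": "multicast",
    "b": "broadcast",
}


def decode_flags(flags: str) -> list:
    """Turn a FLAGS string such as 'is-----' into readable words."""
    return [FLAG_MEANINGS.get(ch, f"unknown({ch})") for ch in flags if ch != "-"]


def parse_ipmpstat_i(text: str) -> dict:
    """Parse the human-readable `ipmpstat -i` table, keyed by interface."""
    rows = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        iface, active, _group, flags, link, _probe, state = line.split()
        rows[iface] = {
            "active": active == "yes",
            "flags": decode_flags(flags),
            "link": link,
            "state": state,
        }
    return rows


SAMPLE = """\
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      ipmp0       is-----   up        disabled  ok
bge2        yes     ipmp0       -------   up        disabled  ok
bge1        yes     ipmp0       --mb---   up        disabled  ok
"""

for name, info in parse_ipmpstat_i(SAMPLE).items():
    print(name, info)
```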
The probe and target output is uninteresting in this setup as I don't have probe based failure detection on. I am including them for completeness.
global# ipmpstat -p
ipmpstat: probe-based failure detection is disabled

global# ipmpstat -t
INTERFACE   MODE      TESTADDR            TARGETS
bge3        disabled  --                  --
bge2        disabled  --                  --
bge1        disabled  --                  --
So let's see what happens on a link 'failure' as I turn off the switch port going to bge1.

On the console, the indication is a link failure.

Jan 15 14:49:07 global in.mpathd[210]: The link has gone down on bge1
Jan 15 14:49:07 global in.mpathd[210]: IP interface failure detected on bge1 of group shared
The various ipmpstat outputs reflect the failure of bge1 and failover to bge3, which had been in standby mode, and to bge2. I had expected both IP addresses to end up on bge3. Instead, IPMP determines how to best spread the IPs across the available interfaces.

The address output shows that .143 is now on bge3, while .141 has moved to bge2.

global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.143               up     ipmp0       bge3        bge3 bge2
10.1.14.142               up     ipmp0       bge2        bge3 bge2
10.1.14.141               up     ipmp0       bge2        bge3 bge2
The group status has changed, with bge1 now shown in brackets as it is in failed mode.
global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       shared      degraded  --        bge3 bge2 [bge1]
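
The INTERFACES column packs three states into one field: plain names are in use, names in parentheses are not currently used for data traffic, and names in brackets have failed. A minimal Python sketch of that reading, with a hypothetical helper name:

```python
def classify_interfaces(field: str) -> dict:
    """Split an `ipmpstat -g` INTERFACES field by its bracket notation."""
    result = {"in_use": [], "not_in_use": [], "failed": []}
    for token in field.split():
        if token.startswith("(") and token.endswith(")"):
            result["not_in_use"].append(token[1:-1])
        elif token.startswith("[") and token.endswith("]"):
            result["failed"].append(token[1:-1])
        else:
            result["in_use"].append(token)
    return result


print(classify_interfaces("bge2 bge1 (bge3)"))  # before the failure
print(classify_interfaces("bge3 bge2 [bge1]"))  # after bge1 went down
```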
The interface status makes it clear that bge1 is down. Broadcast and multicast are now handled by bge2.
global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        yes     ipmp0       -s-----   up        disabled  ok
bge2        yes     ipmp0       --mb---   up        disabled  ok
bge1        no      ipmp0       -------   down      disabled  failed
As expected, the only difference in the ifconfig output is for bge1, showing that it is in failed state. The zones continue to be shown using the ipmp0 interface. This took me a little bit of getting used to. Before, ifconfig was sufficient to fully see what the state is. Now, I must use ipmpstat as well.

global# ifconfig -a4
...
bge1: flags=211000803<UP,BROADCAST,MULTICAST,IPv4,FAILED,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8c
bge2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8d
bge3: flags=221000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared
        ether 0:3:ba:e3:42:8e
ipmp0: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        zone shared1
        inet 10.1.14.141 netmask ffffffc0 broadcast 10.1.14.191
        groupname shared
ipmp0:1: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        zone shared2
        inet 10.1.14.142 netmask ffffffc0 broadcast 10.1.14.191
ipmp0:2: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 6
        zone shared3
        inet 10.1.14.143 netmask ffffffc0 broadcast 10.1.14.191
"Repairing" the interface, things return to normal.
Jan 15 15:13:03 global in.mpathd[210]: The link has come up on bge1
Jan 15 15:13:03 global in.mpathd[210]: IP interface repair detected on bge1 of group shared
Note here only one IP address ended up getting moved back to bge1.
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.143               up     ipmp0       bge1        bge2 bge1
10.1.14.142               up     ipmp0       bge2        bge2 bge1
10.1.14.141               up     ipmp0       bge2        bge2 bge1
Interface bge3 is back in standby mode.
global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
ipmp0       shared      ok        --        bge2 bge1 (bge3)
All three interfaces are up, only two are active, and broadcast and multicast stayed on bge2 (no need to change that now).
global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      ipmp0       is-----   up        disabled  ok
bge2        yes     ipmp0       --mb---   up        disabled  ok
bge1        yes     ipmp0       -------   up        disabled  ok
As a further example of rebalancing of the IP address, here is what happens with four IP addresses spread across two interfaces.
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.144               up     ipmp0       bge2        bge2 bge1
10.1.14.143               up     ipmp0       bge1        bge2 bge1
10.1.14.142               up     ipmp0       bge2        bge2 bge1
10.1.14.141               up     ipmp0       bge1        bge2 bge1

Jan 15 16:19:09 global in.mpathd[210]: The link has gone down on bge1
Jan 15 16:19:09 global in.mpathd[210]: IP interface failure detected on bge1 of group shared

global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.144               up     ipmp0       bge2        bge3 bge2
10.1.14.143               up     ipmp0       bge3        bge3 bge2
10.1.14.142               up     ipmp0       bge2        bge3 bge2
10.1.14.141               up     ipmp0       bge3        bge3 bge2

Jan 15 18:11:35 global in.mpathd[210]: The link has come up on bge1
Jan 15 18:11:35 global in.mpathd[210]: IP interface repair detected on bge1 of group shared

global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.144               up     ipmp0       bge2        bge2 bge1
10.1.14.143               up     ipmp0       bge1        bge2 bge1
10.1.14.142               up     ipmp0       bge2        bge2 bge1
10.1.14.141               up     ipmp0       bge1        bge2 bge1
There is even spreading of the IP addresses across any two active interfaces.
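
The spreading can be confirmed by counting addresses per INBOUND interface. The sketch below parses the human-readable ipmpstat -a tables shown above (normal state and bge1 failed); the helper name inbound_counts is an assumption for illustration only.

```python
from collections import Counter


def inbound_counts(text: str) -> Counter:
    """Count how many data addresses are bound to each inbound interface."""
    counts = Counter()
    for line in text.strip().splitlines()[1:]:  # skip the header line
        _address, _state, _group, inbound = line.split()[:4]
        if inbound != "--":  # "--" means the address is not bound anywhere
            counts[inbound] += 1
    return counts


NORMAL = """\
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.144               up     ipmp0       bge2        bge2 bge1
10.1.14.143               up     ipmp0       bge1        bge2 bge1
10.1.14.142               up     ipmp0       bge2        bge2 bge1
10.1.14.141               up     ipmp0       bge1        bge2 bge1
"""

BGE1_FAILED = """\
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.144               up     ipmp0       bge2        bge3 bge2
10.1.14.143               up     ipmp0       bge3        bge3 bge2
10.1.14.142               up     ipmp0       bge2        bge3 bge2
10.1.14.141               up     ipmp0       bge3        bge3 bge2
"""

print(dict(inbound_counts(NORMAL)))       # two addresses each on bge2 and bge1
print(dict(inbound_counts(BGE1_FAILED)))  # two addresses each on bge2 and bge3
```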

Using the New IPMP Configuration Style

In the previous examples, I used the old style of configuring IPMP with the /etc/hostname.xyzN files. Those files should work on all older versions of Solaris as well as with the re-architecture bits. This section briefly covers the new format.

A new file that is introduced is the hostname.ipmp-group configuration file. It must follow the same format as any other data link configuration, ASCII characters followed by a number. I will use the same group name as above; however, I have to add a number to the end--thus the group name will be shared0. If you don't have the trailing number, the old style of IPMP setup will be used.

I create a file to define the IPMP group. Note that it contains only the keyword ipmp.

global# cat /etc/hostname.shared0
ipmp
The other files for the NICs reference the IPMP group name.

global# cat /etc/hostname.bge1
group shared0 up

global# cat /etc/hostname.bge2
group shared0 up

global# cat /etc/hostname.bge3
group shared0 standby up
One note that may not be obvious. I am not using the keyword -failover as I am not using test addresses. Thus the interfaces are also not listed as deprecated in the ifconfig output.

global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
shared0: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 139.164.63.125 netmask ffffff00 broadcast 139.164.63.255
        ether 0:3:ba:e3:42:8b
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
        ether 0:3:ba:e3:42:8c
bge2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
        ether 0:3:ba:e3:42:8d
bge3: flags=261000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY,INACTIVE,CoS> mtu 1500 index 6
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
        ether 0:3:ba:e3:42:8e
After booting the zones, which are still configured to use bge1 or bge2, things look like this.
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone shared1
        inet 127.0.0.1 netmask ff000000
lo0:2: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone shared2
        inet 127.0.0.1 netmask ff000000
lo0:3: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone shared3
        inet 127.0.0.1 netmask ff000000
shared0: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
shared0:1: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        zone shared1
        inet 10.1.14.141 netmask ffffffc0 broadcast 10.1.14.191
shared0:2: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        zone shared2
        inet 10.1.14.142 netmask ffffffc0 broadcast 10.1.14.191
shared0:3: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        zone shared3
        inet 10.1.14.143 netmask ffffffc0 broadcast 10.1.14.191
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 139.164.63.125 netmask ffffff00 broadcast 139.164.63.255
        ether 0:3:ba:e3:42:8b
bge1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
        ether 0:3:ba:e3:42:8c
bge2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
        ether 0:3:ba:e3:42:8d
bge3: flags=261000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY,INACTIVE,CoS> mtu 1500 index 6
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
        ether 0:3:ba:e3:42:8e
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.143               up     shared0     bge1        bge2 bge1
10.1.14.142               up     shared0     bge2        bge2 bge1
10.1.14.141               up     shared0     bge1        bge2 bge1
0.0.0.0                   up     shared0     --          --

global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
shared0     shared0     ok        --        bge2 bge1 (bge3)

global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      shared0     is-----   up        disabled  ok
bge2        yes     shared0     -------   up        disabled  ok
bge1        yes     shared0     --mb---   up        disabled  ok
Things are the same as before, except that I have now specified the IPMP group name (shared0 instead of the previous ipmp0). I find this very useful as the name can help identify the purpose, and when debugging, different IPMP group names using context-appropriate text should be very helpful.

I find the integration, or rather the backward compatibility, great. Not only will the old or existing IPMP setup work, the existing zonecfg network setup works as well. This means the same configuration files will work pre- and post-re-architecture!

Let's take a look at how things look within a zone.

shared1# ifconfig -a4
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
shared0:1: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        inet 10.1.14.141 netmask ffffffc0 broadcast 10.1.14.191
shared1# netstat -rnf inet

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              10.1.14.129          UG        1          2 shared0
10.1.14.128          10.1.14.141          U         1          0 shared0:1
127.0.0.1            127.0.0.1            UH        1         33 lo0:1
The zone's network is on the link shared0 using a logical IP, and everything else looks as it has always looked. This output is actually while bge1 is down. IPMP hides all the details in the non-global zone.

Using Probe-based Failover

The configurations so far have been with link-based failure detection. IPMP has the ability to do probe-based failure detection, where ICMP packets are sent to other nodes on the network. This allows failure detection well beyond what link-based detection can do, covering the whole switch and items past it, up to and including routers. In order to use probe-based failure detection, test addresses are required on the physical NICs. For my configuration, I use test addresses on a completely different subnet, and my router is another system running Solaris 10. The router happens to be a zone with two NICs and configured as an exclusive IP Instance.

I am using a completely different subnet as I want to isolate the global zone from the non-global zones, and the setup is also using the defrouter zonecfg option, and I don't want to interfere with that setup.

The IPMP setup is as follows. I have added test addresses on the 172.16.10.0/24 subnet, and the interfaces are set to not fail over.

global# cat /etc/hostname.shared0
ipmp

global# cat /etc/hostname.bge1
172.16.10.141/24 group shared0 -failover up

global# cat /etc/hostname.bge2
172.16.10.142/24 group shared0 -failover up

global# cat /etc/hostname.bge3
172.16.10.143/24 group shared0 -failover standby up
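
The new-style files follow a regular pattern, so their contents can be generated. The Python sketch below only builds the strings shown in this section (link-based and probe-based variants); the function name and its parameters are hypothetical, and the name check is a simplified reading of the "ASCII characters followed by a number" rule.

```python
import re


def ipmp_hostname_files(group, nics, standby=(), test_addrs=None):
    """Return {path: contents} for new-style IPMP /etc/hostname.* files."""
    # Simplified check: letters followed by digits, e.g. "shared0".
    if not re.fullmatch(r"[A-Za-z]+[0-9]+", group):
        raise ValueError("group interface name must be letters followed by a number")
    test_addrs = test_addrs or {}
    files = {f"/etc/hostname.{group}": "ipmp"}
    for nic in nics:
        words = []
        if nic in test_addrs:
            words.append(test_addrs[nic])  # test address for probing
        words += ["group", group]
        if nic in test_addrs:
            words.append("-failover")      # test addresses must not fail over
        if nic in standby:
            words.append("standby")
        words.append("up")
        files[f"/etc/hostname.{nic}"] = " ".join(words)
    return files


link_based = ipmp_hostname_files("shared0", ["bge1", "bge2", "bge3"], standby=["bge3"])
probe_based = ipmp_hostname_files(
    "shared0",
    ["bge1", "bge2", "bge3"],
    standby=["bge3"],
    test_addrs={
        "bge1": "172.16.10.141/24",
        "bge2": "172.16.10.142/24",
        "bge3": "172.16.10.143/24",
    },
)
for path, contents in probe_based.items():
    print(path, "->", contents)
```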
This is the state of the system before bringing up any zones.
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
shared0: flags=8201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS,IPMP> mtu 1500 index 2
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname shared0
bge0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 139.164.63.125 netmask ffffff00 broadcast 139.164.63.255
        ether 0:3:ba:e3:42:8b
bge1: flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,CoS> mtu 1500 index 4
        inet 172.16.10.141 netmask ffffff00 broadcast 172.16.10.255
        groupname shared0
        ether 0:3:ba:e3:42:8c
bge2: flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,CoS> mtu 1500 index 5
        inet 172.16.10.142 netmask ffffff00 broadcast 172.16.10.255
        groupname shared0
        ether 0:3:ba:e3:42:8d
bge3: flags=269040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE,CoS> mtu 1500 index 6
        inet 172.16.10.143 netmask ffffff00 broadcast 172.16.10.255
        groupname shared0
        ether 0:3:ba:e3:42:8e
The ipmpstat output is different now.
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
0.0.0.0                   up     shared0     --          --

global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
shared0     shared0     ok        10.00s    bge2 bge1 (bge3)

global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      shared0     is-----   up        ok        ok
bge2        yes     shared0     -------   up        ok        ok
bge1        yes     shared0     --mb---   up        ok        ok
The Failure Detection Time is now set. And the probe information option lists an ongoing update of the probe results.
global# ipmpstat -p
TIME      INTERFACE   PROBE  NETRTT    RTT       RTTAVG    TARGET
0.14s     bge3        426    0.48ms    0.56ms    0.68ms    172.16.10.16
0.24s     bge2        426    0.50ms    0.98ms    0.74ms    172.16.10.16
0.26s     bge1        424    0.42ms    0.71ms    1.72ms    172.16.10.16
1.38s     bge1        425    0.42ms    0.50ms    1.57ms    172.16.10.16
1.79s     bge2        427    0.54ms    0.86ms    0.76ms    172.16.10.16
1.93s     bge3        427    0.45ms    0.53ms    0.66ms    172.16.10.16
2.79s     bge1        426    0.38ms    0.56ms    1.44ms    172.16.10.16
2.85s     bge2        428    0.34ms    0.41ms    0.71ms    172.16.10.16
3.15s     bge3        428    0.44ms    4.55ms    1.14ms    172.16.10.16
^C
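
The probe lines are easier to compare once summarized per interface. This sketch averages the RTT column from the captured output above; the helper name mean_rtt_ms is an assumption, and it parses the human-readable table rather than any machine-readable mode of ipmpstat.

```python
from collections import defaultdict
from statistics import mean


def mean_rtt_ms(text: str) -> dict:
    """Average the RTT column (in ms) per interface from `ipmpstat -p` output."""
    samples = defaultdict(list)
    for line in text.strip().splitlines()[1:]:  # skip the header line
        _time, iface, _probe, _netrtt, rtt, _rttavg, _target = line.split()
        if rtt.endswith("ms"):
            samples[iface].append(float(rtt[:-2]))
    return {iface: round(mean(values), 2) for iface, values in samples.items()}


PROBES = """\
TIME      INTERFACE   PROBE  NETRTT    RTT       RTTAVG    TARGET
0.14s     bge3        426    0.48ms    0.56ms    0.68ms    172.16.10.16
0.24s     bge2        426    0.50ms    0.98ms    0.74ms    172.16.10.16
0.26s     bge1        424    0.42ms    0.71ms    1.72ms    172.16.10.16
1.38s     bge1        425    0.42ms    0.50ms    1.57ms    172.16.10.16
1.79s     bge2        427    0.54ms    0.86ms    0.76ms    172.16.10.16
1.93s     bge3        427    0.45ms    0.53ms    0.66ms    172.16.10.16
2.79s     bge1        426    0.38ms    0.56ms    1.44ms    172.16.10.16
2.85s     bge2        428    0.34ms    0.41ms    0.71ms    172.16.10.16
3.15s     bge3        428    0.44ms    4.55ms    1.14ms    172.16.10.16
"""

print(mean_rtt_ms(PROBES))
```

With only three samples per interface, a single slow probe (the 4.55ms one on bge3) dominates the average, which is worth keeping in mind when reading short captures.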
The target information option shows the current probe targets.
global# ipmpstat -t
INTERFACE   MODE      TESTADDR            TARGETS
bge3        multicast 172.16.10.143       172.16.10.16
bge2        multicast 172.16.10.142       172.16.10.16
bge1        multicast 172.16.10.141       172.16.10.16
Once the zones are up and running and bge1 is down, the status output changes accordingly.
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.143               up     shared0     bge2        bge3 bge2
10.1.14.142               up     shared0     bge3        bge3 bge2
10.1.14.141               up     shared0     bge2        bge3 bge2
0.0.0.0                   up     shared0     --          --

global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
shared0     shared0     degraded  10.00s    bge3 bge2 [bge1]

global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        yes     shared0     -s-----   up        ok        ok
bge2        yes     shared0     --mb---   up        ok        ok
bge1        no      shared0     -------   down      failed    failed

global# ipmpstat -p
TIME      INTERFACE   PROBE  NETRTT    RTT       RTTAVG    TARGET
0.46s     bge2        839    0.43ms    0.98ms    1.17ms    172.16.10.16
1.15s     bge3        840    0.32ms    0.37ms    0.65ms    172.16.10.16
1.48s     bge2        840    0.37ms    0.45ms    1.08ms    172.16.10.16
2.56s     bge3        841    0.45ms    0.54ms    0.63ms    172.16.10.16
3.17s     bge2        841    0.40ms    0.51ms    1.01ms    172.16.10.16
3.93s     bge3        842    0.40ms    0.47ms    0.61ms    172.16.10.16
4.61s     bge2        842    0.63ms    0.75ms    0.98ms    172.16.10.16
5.17s     bge3        843    0.38ms    0.46ms    0.59ms    172.16.10.16
5.72s     bge2        843    0.36ms    0.44ms    0.91ms    172.16.10.16
^C

global# ipmpstat -t
INTERFACE   MODE      TESTADDR            TARGETS
bge3        multicast 172.16.10.143       172.16.10.16
bge2        multicast 172.16.10.142       172.16.10.16
bge1        multicast 172.16.10.141       172.16.10.16
Without showing the details here, the non-global zones continue to function.

With all three interfaces brought down, things look like this.

Jan 19 13:51:22 global in.mpathd[61]: The link has gone down on bge2
Jan 19 13:51:22 global in.mpathd[61]: IP interface failure detected on bge2 of group shared0
Jan 19 13:52:04 global in.mpathd[61]: The link has gone down on bge3
Jan 19 13:52:04 global in.mpathd[61]: All IP interfaces in group shared0 are now unusable
global# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
10.1.14.143               up     shared0     --          --
10.1.14.142               up     shared0     --          --
10.1.14.141               up     shared0     --          --
0.0.0.0                   up     shared0     --          --

global# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
shared0     shared0     failed    10.00s    [bge3 bge2 bge1]

global# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
bge3        no      shared0     -s-----   down      failed    failed
bge2        no      shared0     -------   down      failed    failed
bge1        no      shared0     -------   down      failed    failed

global# ipmpstat -p
^C

global# ipmpstat -t
INTERFACE   MODE      TESTADDR            TARGETS
bge3        multicast 172.16.10.143       --
bge2        multicast 172.16.10.142       --
bge1        multicast 172.16.10.141       --
The whole IPMP group shared0 is down, and all of the ipmpstat output reflects that: no probes are listed, no probe RTT reports are updated, and no probe targets remain.
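
To watch for these state changes from a script, the INTERFACES column of ipmpstat -g can be split apart. The following is only a sketch, not anything shipped with Solaris. It assumes the column layout shown in the transcripts above, where plain names are active, names in parentheses are inactive (the standby bge3), and names in brackets have failed. The function name ipmp_members is my own invention.

```shell
# Classify IPMP group members from "ipmpstat -g" output read on stdin.
# Assumes the columns shown above: GROUP GROUPNAME STATE FDT INTERFACES.
# Prints one line per interface: group, group state, interface, kind.
ipmp_members() {
    awk '$1 != "GROUP" && NF >= 5 {
        bracket = 0; paren = 0
        for (i = 5; i <= NF; i++) {
            name = $i
            if (name == "--") continue          # no interfaces listed
            if (name ~ /^\[/) bracket = 1       # start of a failed list
            if (name ~ /^\(/) paren = 1         # start of an inactive list
            kind = bracket ? "failed" : (paren ? "inactive" : "active")
            close_b = (name ~ /\]$/)
            close_p = (name ~ /\)$/)
            gsub(/\[/, "", name); gsub(/\]/, "", name)
            gsub(/\(/, "", name); gsub(/\)/, "", name)
            print $1, $3, name, kind
            if (close_b) bracket = 0
            if (close_p) paren = 0
        }
    }'
}
```

It would be used as ipmpstat -g | ipmp_members, and the result could be filtered for "failed" to raise an alert. Parsing the human-readable output is fragile, so treat this as a starting point only.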

An additional scenario might be to have two separate paths, and have something other than a link failure force the failover.

Wednesday Mar 26, 2008

How to BFU a System

Sometimes you want to try out a new feature not yet delivered into Solaris Nevada, and you have to apply binaries using BFU. I imagine if you do this all the time, you know all the tricks and gotchas. I don't do it often enough and sometimes get caught up in some details. So here are the steps I tend to use.

First, get the latest BFU package from the ON (OS/Net) Consolidation. I typically only use the SUNWonbld tar file for my hardware.

Download the bits you want to install, such as those for Crossbow Beta or Clearview's snoop on loopback.

To make life a little simpler, I add the following to root's .profile file.

if [ -d /opt/onbld ]
then
   FASTFS=/opt/onbld/bin/`uname -p`/fastfs ; export FASTFS
   BFULD=/opt/onbld/bin/`uname -p`/bfuld ; export BFULD
   GZIPBIN=/usr/bin/gzip ; export GZIPBIN
   PATH=$PATH:/opt/onbld/bin
fi

Now to apply the bits. After unpacking the bits into a temporary location, let's say /tmp/bfu, install the onbld package.

# pkgadd -d onbld all

Processing package instance  from 

OS-Net Build Tools(sparc) 11.11,REV=2008.03.18.14.39
Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

...

Installation of  was successful.
#
I re-read my .profile, and verify that the necessary BFU variables are set.
# . /.profile
# echo $FASTFS
/opt/onbld/bin/sparc/fastfs
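
Rather than echoing each variable by hand, the check can be wrapped in a small function. This is just a sketch of my own, with check_bfu_env being a made-up name. It assumes the three variables set in the .profile fragment above and only confirms that each one names an executable file.

```shell
# Report any BFU helper variable that is unset or not executable.
# The variable names are the ones set in the .profile fragment above.
check_bfu_env() {
    rc=0
    for var in FASTFS BFULD GZIPBIN; do
        eval "value=\${$var:-}"        # indirect lookup of the variable
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            rc=1
        elif [ ! -x "$value" ]; then
            echo "$var=$value is not executable" >&2
            rc=1
        fi
    done
    return $rc
}
```

Running check_bfu_env before bfu returns non-zero and names the offending variable if something is missing.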
Now apply the BFU (this one is for Crossbow beta). You must use the full pathname!
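
If you tend to forget the full pathname, a tiny helper can supply it. This is only a sketch; abs_path is a hypothetical name of mine, and it simply prefixes the current directory to a relative path without resolving symbolic links or ".." components.

```shell
# Print the argument as an absolute path (hypothetical helper).
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;            # already absolute
        *)  printf '%s\n' "`pwd`/$1" ;;      # prefix the current directory
    esac
}
```

With that defined, bfu "`abs_path nightly-nd`" is equivalent to spelling out the pwd prefix by hand.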

Note: you may want to do this from the console, in case you lose your network connection.

# bfu `pwd`/nightly-nd
Copying /opt/onbld/bin/bfu to /tmp/bfu.1000
Executing /tmp/bfu.1000 /tmp/bfu/nightly-nd

...

Entering post-bfu protected environment (shell: ksh).
Edit configuration files as necessary, then reboot.

bfu#
Note that you end up in the BFU shell. Now run the automatic conflict resolution (acr) script.
bfu# /opt/onbld/bin/acr
Getting ACR information from /tmp/bfu/nightly-nd... ok

updating //platform/sun4v/boot_archive
Finished.  See /tmp/acr.nhaqVi/allresults for complete log.
bfu#

bfu# exit
Exiting post-bfu protected environment.  To reenter, type:
LD_NOAUXFLTR=1 LD_LIBRARY_PATH=/tmp/bfulib LD_LIBRARY_PATH_64=/tmp/bfulib/64 
PATH=/tmp/bfubin /tmp/bfubin/ksh
#
It's time to reboot and run with the new bits!