Wednesday Feb 24, 2010

My thoughts on configuring zones with shared IP instances and the 'defrouter' parameter

I occasionally receive calls or emails with questions about routing issues when using Solaris Zones in the (default) shared IP Instance configuration. Everything works well when the non-global zones are on the same IP subnet (let's say 172.16.1.0/24) as the global zone. Routing gets a little tricky when the non-global zones are on a different subnet.

My general recommendation is to isolate. This means:

  • Separate subnets for the global zone (administration, backup) and the non-global zones (applications, data).
  • Separate data-links for the global and non-global zones.
    • The non-global zones can share a data-link.
    • Non-global zones on different IP subnets use different data-links.
Using separate data-links is not always possible, and I was concerned whether sharing a data-link would actually work.

So I did some testing, and exchanged some emails because of a comment I made regarding PSARC/2008/057 and the automatic removal of a default route when the zone is halted.

It turns out I have been overly restrictive in suggesting that the global and non-global zones not share a data-link. While I think that is a good administrative policy, separating administrative and application traffic, it is not a requirement. It is OK to have the global zone and one or more non-global zones share the same data-link. However, if the non-global zones are to have different default routes, they must be on subnets that the global zone is not on.

My test case running Solaris 10 10/09 has the global zone on the 129.154.53.0/24 network and the non-global zone on the 172.16.27.0/24 network.

global# ifconfig -a
...
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.132 netmask ffffff00 broadcast 129.154.53.255
        ether 0:14:4f:ac:57:c4
e1000g0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        zone shared1
        inet 172.16.27.27 netmask ffffff00 broadcast 172.16.27.255

global# zonecfg -z shared1 info net
net:
        address: 172.16.27.27/24
        physical: e1000g0
        defrouter: 172.16.27.16
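
For reference, the defrouter property can be set on an existing net resource with zonecfg; here is a minimal sketch, assuming the net resource shown above is already defined:

global# zonecfg -z shared1
zonecfg:shared1> select net physical=e1000g0
zonecfg:shared1:net> set defrouter=172.16.27.16
zonecfg:shared1:net> end
zonecfg:shared1> commit
zonecfg:shared1> exit

The zone picks up the change the next time it is booted.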

The routing tables as seen from both zones are:
global# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              129.154.53.215       UG        1        123
default              172.16.27.16         UG        1          7 e1000g0
129.154.53.0         129.154.53.132       U         1         50 e1000g0
224.0.0.0            129.154.53.132       U         1          0 e1000g0
127.0.0.1            127.0.0.1            UH        3         80 lo0

shared1# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              172.16.27.16         UG        1          7 e1000g0
172.16.27.0          172.16.27.27         U         1          3 e1000g0:1
224.0.0.0            172.16.27.27         U         1          0 e1000g0:1
127.0.0.1            127.0.0.1            UH        4         78 lo0:1
While the global zone shows both routes, only the default applying to its subnet will be used. And for traffic leaving the non-global zone, only its default will be used.

You may notice that the Interface for the global zone's default route is blank. That is because I have set the default route via /etc/defaultrouter. I noticed that if it is determined via the router discovery daemon, it will be listed as being on e1000g0! This does not affect the behavior; however, it may be visually confusing, which is probably why I initially leaned toward saying not to share the data-link.
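
For completeness, a statically configured default route in the global zone is just the router's address in /etc/defaultrouter; in this setup the file would contain the gateway shown in the global zone's routing table above.

global# cat /etc/defaultrouter
129.154.53.215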

There are multiple ways to determine which route will be used, including ping(1M) and traceroute(1M). I like the output of the 'route get' command.

global# route get 172.16.29.1
   route to: 172.16.29.1
destination: default
       mask: default
    gateway: 129.154.53.1
  interface: e1000g0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh    rtt,ms rttvar,ms  hopcount      mtu     expire
       0         0         0         0         0         0      1500         0

shared1# route get 172.16.28.1
   route to: 172.16.28.1
destination: default
       mask: default
    gateway: 172.16.27.16
  interface: e1000g0:1
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh    rtt,ms rttvar,ms  hopcount      mtu     expire
       0         0         0         0         0         0      1500         0
This quickly shows which interface and gateway are being used. If there are multiple default routes, repeated invocations will show the selection rotating among them.
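
To watch that rotation, repeating the lookup in a small loop is a quick way to see it; a minimal sketch from the global zone (the gateway line should alternate when more than one default route is present):

global# for i in 1 2 3 4; do route -n get 172.16.29.1 | grep gateway; done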

Thanks to Erik Nordmark and Penny Cotten for their insights on this topic!

Steffen Weiberle

Thursday Aug 20, 2009

Why are packets going out of the "wrong" interface?

I often refer to this blog by James Carlson, so to help others, and me, find it, here is Packets out of the wrong interface. Thanks James for all the help over the years!

Steffen

Tuesday Apr 14, 2009

Using IPMP with link based failure detection

Solaris has had a feature to increase network availability called IP Multipathing (IPMP). Initially it required a test address on every data link in an IPMP group, where the test addresses were used as the source IP address to probe network elements for path availability. One of the benefits of probe-based failure detection is that it can extend beyond the directly connected link(s), and verify paths through the attached switch(es) to what typically is a router or other redundant element to provide available services.

Having one IP address (whether public or private, non-routable) per data link, plus the separate address(es) for the application(s), turns out to be a lot of addresses to allocate and administer. And since the default of five probes spaced two seconds apart meant a failure would take at least ten (10) seconds to be detected, something more was needed.

So in the Solaris 9 timeframe the ability to also do link based failure detection was delivered. It requires specific NICs whose driver has the ability to notify the system that a link has failed. The Introduction to IPMP in the Solaris 10 Systems Administrators Guide on IP Services lists the NICs that support link state notification. Solaris 10 supports configuring IPMP with only link based failure detection.

global# more /etc/hostname.bge[12]
::::::::::::::
/etc/hostname.bge1
::::::::::::::
10.1.14.140/26 group ipmp1 up
::::::::::::::
/etc/hostname.bge2
::::::::::::::
group ipmp1 standby up
On system boot, there will be an indication on the console that since no test addresses are defined, probe-based failure detection is disabled.

Apr 10 10:57:20 in.mpathd[168]: No test address configured on interface bge2; disabling probe-based failure detection on it
Apr 10 10:57:20 in.mpathd[168]: No test address configured on interface bge1; disabling probe-based failure detection on it
Looking at the interfaces configured,
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.1.14.140 netmask ffffffc0 broadcast 10.1.14.191
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
bge2: flags=69000842<BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE> mtu 0 index 4
        inet 0.0.0.0 netmask 0
        groupname ipmp1
        ether 0:3:ba:e3:42:8d
you will notice that two of the three bge entries have no address (0.0.0.0). The data address is on the physical interface bge1, while bge1:1 and bge2 carry 0.0.0.0. On the failure of bge1,
Apr 10 14:34:53 global bge: NOTICE: bge1: link down
Apr 10 14:34:53 global in.mpathd[168]: The link has gone down on bge1
Apr 10 14:34:53 global in.mpathd[168]: NIC failure detected on bge1 of group ipmp1
Apr 10 14:34:53 global in.mpathd[168]: Successfully failed over from NIC bge1 to NIC bge2


global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=19000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 3
        inet 0.0.0.0 netmask 0
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8d
bge2:1: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 10.1.14.140 netmask ffffffc0 broadcast 10.1.14.191
the data address is migrated onto bge2:1. I find this a little confusing. However, I don't know any way around it on Solaris 10. The IPMP Re-architecture makes this a lot easier!
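
If you want to exercise a failover and failback without physically pulling a cable, if_mpadm(1M) can offline and reattach an interface in the group; a minimal sketch, assuming the other interface in the group is healthy and FAILBACK is left at its default of yes:

global# if_mpadm -d bge1
global# if_mpadm -r bge1

The first command moves the data address over to bge2, and the second moves it back.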

Using link-based IPMP with non-global zones

Configuring a shared IP Instance non-global zone and utilizing IPMP managed in the global zone is very easy.

The IPMP configuration is very simple. Interface bge1 is active, and bge2 is in stand-by mode.

global# more /etc/hostname.bge[12]
::::::::::::::
/etc/hostname.bge1
::::::::::::::
group ipmp1 up
::::::::::::::
/etc/hostname.bge2
::::::::::::::
group ipmp1 standby up
My zone configuration is:
global# zonecfg -z zone1 info
zonename: zone1
zonepath: /zones/zone1
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
net:
        address: 10.1.14.141/26
        physical: bge1
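
The net resource above can be added with zonecfg along these lines (a sketch, assuming the zone itself is already configured):

global# zonecfg -z zone1
zonecfg:zone1> add net
zonecfg:zone1:net> set address=10.1.14.141/26
zonecfg:zone1:net> set physical=bge1
zonecfg:zone1:net> end
zonecfg:zone1> commit
zonecfg:zone1> exit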
Prior to booting, the network configuration is:
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone zone1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8d
After booting, the network looks like this:
global# ifconfig -a4
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        zone zone1
        inet 127.0.0.1 netmask ff000000
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8c
bge1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        zone zone1
        inet 10.1.14.141 netmask ffffffc0 broadcast 10.1.14.191
bge2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname ipmp1
        ether 0:3:ba:e3:42:8d

So a simple case for the use of IPMP, without the need for test addresses! Other IPMP configurations, such as more than two data links, or active-active, are also supported with link based failure detection. The more links involved, the more test addresses are saved with link based failure detection. Since writing this entry I was involved in a customer configuration where this is saving several hundred IP addresses and their management (such as avoiding duplicate addresses). That customer is willing to forgo the benefit of probes testing past the local switch port.
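
As an illustration, an active-active, link-based pair could be as simple as dropping the standby keyword from the second interface's file; this is a sketch, not taken from my test system:

global# more /etc/hostname.bge[12]
::::::::::::::
/etc/hostname.bge1
::::::::::::::
10.1.14.140/26 group ipmp1 up
::::::::::::::
/etc/hostname.bge2
::::::::::::::
group ipmp1 up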

Steffen

Thursday Feb 14, 2008

Patches for Using IP Instances with ce NICs are Available

The [Solaris 10] patches to be able to use IP Instances with the Cassini ethernet interface, known as ce, are available on sunsolve.sun.com for Solaris 10 users with a maintenance contract or subscription. (This is for Solaris 10 8/07, or a prior update patched to that level. These patches are included in Solaris 10 5/08, and also in patch clusters or bundles delivered at or around the same time, and since then.)

The SPARC patches are:

  • 137042-01 SunOS 5.10: zoneadmd patch
  • 118777-12 SunOS 5.10: Sun GigaSwift Ethernet 1.0 driver patch

The x86 patches are:

  • 137043-01 SunOS 5.10_x86: zoneadmd patch
  • 118778-11 SunOS 5.10_x86: Sun GigaSwift Ethernet 1.0 driver patch

I have not been able to try out the released patches myself, yet.

Steffen

Thursday Dec 20, 2007

One Step Closer to IP Instances with ce

With the availability of Solaris Nevada build 80 [1], the ability to use IP Instances with the GigaSwift line of NICs and the ce driver becomes possible. The fix for CR 6616075 to zoneadmd(1M) has been integrated into the OpenSolaris code base and is available in build 80. The necessary fix to the ce driver, tracked in CR 6606507, has already been delivered. With this combination, a zone can have an exclusive IP Instance using a ce-based link.

Zone configuration information:

global# zonecfg -z ce1 info net
net:
        address not specified
        physical: ce1
global#
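
For reference, the relevant pieces of such a configuration can be created with zonecfg roughly as follows (a sketch; the rest of the zone configuration is omitted):

global# zonecfg -z ce1
zonecfg:ce1> set ip-type=exclusive
zonecfg:ce1> add net
zonecfg:ce1:net> set physical=ce1
zonecfg:ce1:net> end
zonecfg:ce1> commit
zonecfg:ce1> exit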

And the view from the non-global zone:

ce1# zonename
ce1
ce1# cat /etc/release
                  Solaris Express Community Edition snv_80 SPARC
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 17 December 2007
ce1# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.200.153 netmask ffffff00 broadcast 192.168.200.255
        ether 0:3:ba:68:1d:5f
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
ce1#

More when the soak time in Nevada is complete and the backport to Solaris 10 is available.

Thanks to the engineers who put energy into these fixes!

Happy Holidays!

Steffen

[1] As of 20 December 2007, build 80 is available within Sun only. Availability on opensolaris.org will be announced on opensolaris-announce@opensolaris.org.

Wednesday Dec 05, 2007

More good news for IP Instances

Progress continues on the use of IP Instances across the full line of SPARC systems. The e1000g Intel PCI-X Gigabit Ethernet UTP and MMF adapters are now supported on the Sun Fire UltraSPARC servers. The NICs are:
  • x7285a - Sun PCI-X Dual GigE UTP Low Profile. RoHS-6 compliant
  • x7286a - Sun PCI-X GigE MMF Low Profile, RoHS-6 compliant
The NICs are supported on the V490, V890, E2900, E4900, E6900, E20K, and E25K systems. This is an alternative for those waiting for the GigaSwift (ce) NIC to be supported, or who don't need quad-port cards. Since the driver used is the e1000g, which is a GLDv3 driver, full support for IP Instances is available using these cards.

Monday Nov 05, 2007

Using IP Instances with VLANs or How to Make a Few NICs Look Like Many

[Minor editorial and clarification updates 2009.09.28]

Solaris 10 8/07 includes a new feature for zone networking. IP Instances is the facility to give a non-global zone its own complete control over the IP stack, which previously was shared with and controlled by the global zone.

A zone that has an exclusive IP Instance can set interface parameters using ifconfig(1M), put an interface into promiscuous mode to run snoop(1M), be a DHCP client or server, set ndd(1M) variables, have its own IPsec policies, etc.

One requirement for an exclusive IP Instance is that it must have exclusive access to a link name. This is any NIC, VLAN-tagged NIC component, or aggregation at this time. When they become available, virtual NICs will make this much simpler, as a single NIC can be presented to the zones using a number of VNICs, effectively multiplexing access to that NIC. A link name is an entry that can be found in /dev, such as /dev/bge0, /dev/bge321001 (VLAN tag 321 on bge1), aggr2, and so on.

To see what link names are available on a system, use dladm(1M) with the show-link option. For example:

global# dladm show-link
bge0            type: non-vlan  mtu: 1500       device: bge0
bge1            type: non-vlan  mtu: 1500       device: bge1
bge2            type: non-vlan  mtu: 1500       device: bge2
bge3            type: non-vlan  mtu: 1500       device: bge3

As folks have started to use IP Instances to isolate their zones, they have noticed that they don't have enough link names (I'll use just 'link' in the rest of this entry) to assign to the zones they have configured, or wish to configure, as exclusive. So, how does a global zone administrator configure a large number of zones as exclusive?

Let's consider the following situation: a web service with three tiers, each tier on a different network.

If each server has only one NIC, the total number of switch ports required is at least eight (8). If each server has a management port, that is another eight ports, even if they are on a different, management network. Add to that at least three switch ports going to the router.

Consolidating the servers onto a single Solaris 10 instance using exclusive IP Instances requires at least eight NICs for the services (one per service), and at least one more for the global zone and management. (We'll ignore any service processor requirements, since those are separate anyway, and access could be via either a serial interface or a network.)

One option to consider is using VLANs and VLAN tagging. When using VLAN tagging, additional information is put onto the ethernet frame by the sender which allows the receiver to associate that frame with a specific VLAN. The specification allows up to 4094 VLAN tags, from 1 to 4094. For more information on administering VLANs in Solaris 10, see Administering Virtual Local Area Networks in the Solaris 10 System Administrator Collection.

VLANs are a method to collapse multiple ethernet broadcast domains (whether hubs or switches) into a single network unit (usually a switch). Typically, a single IP subnet, such as 192.168.54.0/24, is on one broadcast domain. Within such a switch frame, you can have a large number of virtual switches, consolidating network infrastructure while still isolating broadcast domains. Often, the use of VLANs is completely hidden from the systems tied to the switch, as a port on the switch is configured for only one VLAN. With VLAN tagging, a single port can allow a system to connect to multiple VLANs, and therefore multiple networks. Both the switch and the system must be configured for VLAN tagging for this to work properly. VLAN tagging has been used for years, and is robust and reliable.

Any one network interface can have multiple VLANs configured for it, but a given VLAN ID can exist only once on each interface. Thus it is possible to put multiple networks or broadcast domains on a single interface, but not to put the same VLAN on a single interface more than once. For example, you can put VLANs 111, 112, and 113 on interface bge1, but you cannot put VLAN 111 on bge1 more than once. You can, however, put VLAN 111 on both interfaces bge1 and bge2.

Using the case shown above, if the three web servers are on the same network, say 10.1.111.0/24, you would want to have three interfaces that are all connected to a VLAN capable switch, and configure each interface with a VLAN tag that is the same as the VLAN ID on the switch.

For example, if the VLAN tag is 111 and the interfaces are bge1 through bge3, the link names you would assign to the three web servers would be bge111001, bge111002, and bge111003.
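
The naming follows the Solaris 10 VLAN convention: the instance number (PPA) of the tagged link is the VLAN ID multiplied by 1000, plus the instance number of the underlying device. Worked out for the examples above:

VLAN ID 111 on bge1:  111 * 1000 + 1 = 111001  ->  bge111001
VLAN ID 111 on bge2:  111 * 1000 + 2 = 111002  ->  bge111002
VLAN ID 111 on bge3:  111 * 1000 + 3 = 111003  ->  bge111003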

Introducing zones into the setup, the web servers can be run in three separate zones, and with exclusive IP Instances, they can be totally separate and each assigned a VLAN-tagged interface. Web Server 1 could have bge111001, Web Server 2 could have bge111002, and Web Server 3 could have bge111003.

global# zonecfg -z web1 info net
net:
        address not specified
        physical: bge111001

global# zonecfg -z web2 info net
net:
        address not specified
        physical: bge111002

global# zonecfg -z web3 info net
net:
        address not specified
        physical: bge111003

Within the zones, you could configure IP addresses 10.1.111.1/24 through 10.1.111.3/24.
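
Inside each exclusive zone, the address is configured the usual way, for example with a hostname file for the tagged link; a sketch for web1:

web1# cat /etc/hostname.bge111001
10.1.111.1/24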

Similarly, for the authentication tier, using VLAN ID 112, you could assign the zones auth1 through auth3 to bge112001, bge112002, and bge112003, respectively. And for application servers app1 and app2 on VLAN ID 113, bge113001 and bge113002. This can be repeated until some limit is reached, whether it is network bandwidth, system resource limits, or the maximum number of concurrent VLANs on either the switch or Solaris.

This configuration could look like the following diagram.

Web Server 1, Auth Server 1, and Application Server 1 share the use of NIC1, yet are all on different VLANs (111, 112, and 113, respectively). The same for instances 2 and 3, except that there is no third application server. All traffic between the three web servers will stay within the switch, as will traffic between the authentication servers. Traffic between the tiers is passed between the IP networks by the router. NICg is showing that the global zone also has a network interface.

Using this technique, the maximum number of zones with exclusive IP Instances you could deploy on a single system that are on the same subnet is limited to the number of interfaces that are capable of doing VLAN tagging. In the above example, with three bge interfaces on the system, the maximum number of exclusive zones on a single subnet would be three. (I have intentionally reserved bge0 for the global zone, but it would be possible to use it as well, making sure the global zone uses a different VLAN ID altogether, such as 1 or 2.)
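
For instance, if the global zone were moved to VLAN ID 2 over bge0, the tagged link would be bge2000 (2 * 1000 + 0), and the global zone's own hostname file might look like this (a sketch with an illustrative address):

global# cat /etc/hostname.bge2000
192.168.2.10/24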

Wednesday May 30, 2007

In two places at once?

Some background. Like any other mobile workforce, Sun employees have a need to access internal network services while not in the office. While we use commercial products, Sun engineers have also been working on a *product* called punchin. Punchin is a Sun-created VPN technology that uses native IPsec/IKE from the operating system in which it runs. It is the primary Solaris VPN solution for Solaris servers and clients, and will be expanding to other operating systems such as MacOS X in the near future.

Security policy states that if a system is 'punched in', it must not be on the public network at the same time. In other words, while the VPN tunnel is up, access to the Internet directly is restricted, especially access from the Internet to the system. While a system is on the VPN, it can not also be your Internet-facing personal web server, for example.

Bringing up the VPN is an interactive process, requiring a challenge/response sequence. If you are like me, you may have a system at home and while at work need to access from that system some data on the corporate network. This is a catch-22, since the connection you use remotely to activate the VPN breaks as you start the VPN establishment process (enforcing the policy of being on only one network at a time).

Introduce Solaris Containers, or zones. Each zone looks like its own system. However, they share a single kernel and a single IP stack. But wait, there is this new thing called IP Instances that allows zones configured with an exclusive IP Instance to have their own IP stack (they already have their own TCP and UDP for all practical purposes). And wouldn't it be great if I could do this with just one NIC? Hey, Project Crossbow has IP Instances and VNICs. Great!

Now for the reality check. As I was told not so long ago, Rome was not built in one day. IP Instances are in Solaris Nevada and targeted for Solaris 10 7/07. VNICs are only available in a snapshot applied via BFU to Nevada build 61. [See also Note 1 below.]

So, lets see how to do this with just IP Instances.

First, since each IP Instance (at a minimum, the global zone and one non-global zone) needs its own NIC, I need at least two NICs. Not all NICs support IP Instances, so the one(s) for the non-global zone(s) must support IP Instances, and thus must be using GLDv3 drivers.

In my case, I am using a Sun Blade 100 with an on-board eri 100Mbps Ethernet interface. I purchased an Intel PRO/1000 MT Server NIC, which uses the e1000g driver. Here is a list of NICs that are known to work with IP Instances and VNICs.

After installing Solaris Nevada, I created my non-global zone with the following configuration:

global# zonecfg -z vpnzone info
zonename: vpnzone
zonepath: /zones/vpnzone
brand: native
autoboot: true
bootargs: 
pool: 
limitpriv: 
scheduling-class:
ip-type: exclusive
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
inherit-pkg-dir:
        dir: /etc/crypto/certs
fs:
        dir: /usr/local
        special: /zones/vpnzone/usr-local
        raw not specified
        type: lofs
        options: []
net:
        address not specified
        physical: e1000g0
global#
I had to include an additional inherit-pkg-dir directive for this sparse zone, because currently some of the crypto stuff is not duplicated into a non-global zone. Without it, even the digest command would fail, for example. I also needed to provide a private directory for /usr/local, since that is where the Punchin packages get installed by default.

Once I installed and configured vpnzone, I was able to install and configure the Punchin client.

However, this required two NICs. So to use just one, I created a VNIC for my VPN zone.

global# dladm show-dev
eri0            link: unknown   speed:     0Mb  duplex: unknown
e1000g0         link: up        speed:   100Mb  duplex: full
global# dladm show-link
eri0            type: legacy    mtu: 1500       device: eri0
e1000g0         type: non-vlan  mtu: 1500       device: e1000g0
global# dladm create-vnic -d e1000g0 -m 0:4:23:e0:5f:1 1
global# dladm show-link
eri0            type: legacy    mtu: 1500       device: eri0
e1000g0         type: non-vlan  mtu: 1500       device: e1000g0
vnic1           type: non-vlan  mtu: 1500       device: vnic1
global# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 192.168.1.58 netmask ffffff00 broadcast 192.168.1.255
        ether 0:4:23:e0:5f:6b
global# 
I chose to provide my own MAC address, based on the address of the base NIC.

I modified the non-global zone configuration:

global# zonecfg -z vpnzone info
zonename: vpnzone
zonepath: /zones/vpnzone
brand: native
autoboot: true
bootargs: 
pool: 
limitpriv: 
scheduling-class:
ip-type: exclusive
inherit-pkg-dir:
        dir: /lib
inherit-pkg-dir:
        dir: /platform
inherit-pkg-dir:
        dir: /sbin
inherit-pkg-dir:
        dir: /usr
inherit-pkg-dir:
        dir: /etc/crypto/certs
fs:
        dir: /usr/local
        special: /zones/vpnzone/usr-local
        raw not specified
        type: lofs
        options: []
net:
        address not specified
        physical: vnic1
global#
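
The change itself can be made with zonecfg along these lines (a sketch; the zone has to be rebooted to pick up the new link):

global# zonecfg -z vpnzone
zonecfg:vpnzone> select net physical=e1000g0
zonecfg:vpnzone:net> set physical=vnic1
zonecfg:vpnzone:net> end
zonecfg:vpnzone> commit
zonecfg:vpnzone> exit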
Now I can access the system at home while I am not there, zlogin into vpnzone, punchin, and be connected to our internal network. This is really significant for me, since at home I have 6Mbps download compared to only 600Kbps in the office. So downloading the DVD ISO that I used to create this setup took a tenth of the time it would have taken at work.

[1] I also used the SUNWonbld package. This package is specific to build 61!

Because I install BFUs a lot, I have added the following to my .profile

# set up the environment that bfu expects, if the ON build tools (onbld) are installed
if [ -d /opt/onbld ]
then
   FASTFS=/opt/onbld/bin/`uname -p`/fastfs ; export FASTFS
   BFULD=/opt/onbld/bin/`uname -p`/bfuld ; export BFULD
   GZIPBIN=/usr/bin/gzip ; export GZIPBIN
   # and put the onbld tools on the PATH
   PATH=$PATH:/opt/onbld/bin
fi

Saturday May 12, 2007

Network performance differences within an IP Instance vs. across IP Instances

When consolidating or co-locating multiple applications on the same system, inter-application network traffic typically stays within the system, since the shared IP in the kernel recognizes that the destination address is on the same system, and thus loops it back up the stack without ever putting the data on a physical network. This has introduced some challenges for customers deploying Solaris Containers (specifically zones) where different Containers are on different subnets, and it is expected that traffic between them leaves the system (maybe through a router or firewall to restrict or monitor inter-tier traffic).

With IP Instances in Solaris Nevada build 57 and targeted for Solaris 10 7/07, there is the ability to configure zones with exclusive IP Instances, thus forcing all traffic leaving a zone out onto the network. This introduces additional network stack processing on both the transmit and the receive sides. Prompted by some customer questions regarding this, I performed a simple test to measure the difference.

On two systems, a V210 with two 1.336GHz CPUs and 8GB memory, and an x4200 with two dual-core Opteron XXXX and 8GB memory, I ran FTP transfers between zones. My switch is a Netgear GS716T Smart Switch with 1Gbps ports. The V210 has four bge interfaces and the x4200 has four e1000g interfaces.

I created four zones. Zones x1 and x2 have eXclusive IP Instances, while zones s1 and s2 have Shared IP Instances (IP is shared with the global zone). Both systems are running Solaris 10 7/07 build 06.

Relevant zonecfg info is as follows (all zones are sparse):


v210# zonecfg -z x1 info
zonename: x1
zonepath: /localzones/x1
...
ip-type: exclusive
net:
        address not specified
        physical: bge1

v210# zonecfg -z s1 info
zonename: s1
zonepath: /localzones/s1
...
ip-type: shared
net:
        address: 10.10.10.11/24
        physical: bge3
 
As a test user in each zone, I created a file using 'mkfile 1000m /tmp/file1000m'. Then I used ftp to transfer it between zones. No tuning was done whatsoever.

The results are as follows.

V210: (bge)

Exclusive to Exclusive
x1# /usr/bin/time ftp x2 << EOF\^Jcd /tmp\^Jbin\^Jput file1000m\^JEOF

real       17.0
user        0.2
sys        11.2

Exclusive to Shared
x1# /usr/bin/time ftp s2 << EOF\^Jcd /tmp\^Jbin\^Jput file1000m\^JEOF

real       17.3
user        0.2
sys        11.6

Shared to Shared
s2# /usr/bin/time ftp s1 << EOF\^Jcd /tmp\^Jbin\^Jput file1000m\^JEOF

real        6.6
user        0.1
sys         5.3


X4200: (e1000g)

Exclusive to Exclusive
x1# /usr/bin/time ftp x2 << EOF\^Jcd /tmp\^Jbin\^Jput file1000m\^JEOF

real        9.1
user        0.0
sys         4.0

Exclusive to Shared
x1# /usr/bin/time ftp s2 << EOF\^Jcd /tmp\^Jbin\^Jput file1000m\^JEOF

real        9.1
user        0.0
sys         4.1

Shared to Shared
s2# /usr/bin/time ftp s1 << EOF\^Jcd /tmp\^Jbin\^Jput file1000m\^JEOF

real        4.0
user        0.0
sys         3.5
I ran each test several times and picked a result that seemed average across the runs. Not very scientific, and a table might be nicer.

Something I noticed that surprised me was that time spent in IP and the driver is measurable on the V210 with bge, and much less so on the x4200 with e1000g.
