As often happens, a customer question resulted in this write-up. The customer had to quickly consider how they deploy a large number of zones on an M8000. They would be configuring up to twelve separate links for the different networks, and double that for IPMP. I wrote up the following. Thanks to Penny Cotten, Jim Eggers, Gordon Lythgoe, Peter Memishian, and Erik Nordmark for the feedback as I was preparing this. Also, you may see some of this in future documentation.
Datalink: An interface at Layer 2 of the OSI protocol stack, which is represented in a system as a STREAMS DLPI (v2) interface. Such an interface can be plumbed under protocol stacks such as TCP/IP. In the context of Solaris 10 Zones, datalinks are physical interfaces (e.g. e1000g0, bge1), aggregations (aggr3), or VLAN-tagged interfaces (e1000g111000 (VLAN tag 111 on e1000g0), bge111001, aggr111003). A datalink may also be referred to as a physical interface, such as when referring to a Network Interface Card (NIC). The datalink is the 'physical' property configured with the zone configuration tool zonecfg(1M).
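The VLAN-tagged names in the examples above follow a numeric pattern: the number after the driver name is the VLAN tag multiplied by 1000, plus the instance number of the underlying link. The following is a minimal Python sketch of that decoding, assuming the convention holds for the drivers in use. The function name decode_vlan_link is made up for illustration and is not part of any Solaris tool.

```python
import re

def decode_vlan_link(name):
    """Split a Solaris 10 style datalink name into (driver, vlan_tag, instance).

    Assumes the naming convention: number = vlan_tag * 1000 + instance.
    A vlan_tag of 0 means the name refers to an untagged link.
    """
    # The driver name is everything up to the final run of digits,
    # so "e1000g111000" splits into "e1000g" and "111000".
    match = re.match(r"^(.*\D)(\d+)$", name)
    if match is None:
        raise ValueError("not a datalink name: %r" % name)
    driver, number = match.group(1), int(match.group(2))
    vlan_tag, instance = divmod(number, 1000)
    return driver, vlan_tag, instance

for link in ("e1000g0", "e1000g111000", "bge111001", "aggr111003"):
    print(link, decode_vlan_link(link))
```

With this convention, aggr111003 decodes to VLAN tag 111 on aggr3, matching the example in the definition.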
Non-global Zone: A non-global zone is any zone, whether native or branded, that is configured, installed, and managed using the zonecfg(1M) and zoneadm(1M) commands in Solaris 10. A branded zone may be either Solaris 8 or Solaris 9.
Zone network configuration: shared versus exclusive IP Instances
Since Solaris 10 8/07, a zone can be configured with either the default shared IP Instance or an exclusive IP Instance.
When configured as shared, zone networking includes the following characteristics.
- All datalink and IP, TCP, UDP, SCTP, IPsec, etc. configuration is done in the global zone.
- All zones share the network configuration settings, including datalink, IP, TCP, UDP, etc. This includes ndd(1M) settings.
- All IP addresses, netmasks, and routes are set by the global zone and can not be altered in a non-global zone.
- Non-global zones can not utilize DHCP (neither client nor server). There is a work-around that may allow a zone to be a DHCP server.
- By default a privileged user in a non-global zone cannot put a datalink into promiscuous mode, and thus cannot run tools like snoop(1M). Changing this requires adding the net_rawaccess privilege to the zone from the global zone, and also requires identifying which interface(s) to allow promiscuous mode on via the 'match' zonecfg parameter. Warning: This allows the non-global zone to send arbitrary packets on those interfaces.
- IPMP configuration is managed in the global zone and applies to all zones using the datalinks in the IPMP group. A non-global zone configured with one datalink from an IPMP group can, and in fact must, use all datalinks in that group. Non-global zones can use multiple IPMP groups, but a zone must be configured with only one datalink from each IPMP group.
- Only default routes apply to the non-global zones, as determined by the IP address(es) assigned to the zone. Non-default static routes are not supported to direct traffic leaving a non-global zone.
- Multiple zones can share a datalink.
When configured as exclusive, zone networking includes the following characteristics.
- All network configuration can be done within the non-global zone. It can also be done indirectly from the global zone, via zlogin(1) or by editing the files in the non-global zone's root file system.
- IP and above configurations can not be seen directly within the global zone (e.g. running ifconfig(1M) in the global zone will not show the details of a non-global zone).
- The non-global zone's interface(s) can be configured via DHCP, and the zone can be a DHCP server.
- A privileged user in the non-global zone can fully manipulate IP address, netmask, routes, ndd variables, logical interfaces, ARP cache, IPsec policy and keys, IP Filter, etc.
- A privileged user in the non-global zone can put the assigned interface(s) into promiscuous mode (e.g. can run snoop).
- The non-global zone can have unique IPsec properties.
- IPMP must be managed within the non-global zone.
- A datalink can only be used by a single running zone at any one time.
- Commands such as snoop(1M) and dladm(1M) can be used on datalinks in use by running zones.
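To make the difference between the two lists concrete, here is a small Python sketch that renders the network part of a zonecfg(1M) command file for either mode. The helper name render_zonecfg, the datalink names, and the addresses are placeholders chosen for illustration. The sketch assumes that addresses and the optional default router are set in the zone configuration only for shared IP zones, while an exclusive IP zone is just assigned its datalink and is then configured from inside the zone.

```python
def render_zonecfg(ip_type, nets):
    """Return zonecfg(1M) subcommands for the network part of a zone.

    ip_type: "shared" or "exclusive".
    nets: list of dicts with "physical" and, for shared zones, "address"
          and optionally "defrouter".
    """
    if ip_type not in ("shared", "exclusive"):
        raise ValueError("ip_type must be 'shared' or 'exclusive'")
    lines = ["set ip-type=%s" % ip_type]
    for net in nets:
        lines.append("add net")
        lines.append("set physical=%s" % net["physical"])
        if ip_type == "shared":
            # Shared IP: the address (and default router) come from the global zone.
            lines.append("set address=%s" % net["address"])
            if "defrouter" in net:
                lines.append("set defrouter=%s" % net["defrouter"])
        # Exclusive IP: only the datalink is assigned here; IP settings
        # are made inside the non-global zone.
        lines.append("end")
    return "\n".join(lines)

# Placeholder values, not taken from a real system.
print(render_zonecfg("shared", [{"physical": "e1000g0",
                                 "address": "192.168.1.10/24",
                                 "defrouter": "192.168.1.1"}]))
print()
print(render_zonecfg("exclusive", [{"physical": "e1000g1"}]))
```

The generated text could be saved to a file and reviewed before being passed to zonecfg with its -f option for an existing zone.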
It is possible to mix shared and exclusive IP zones on a system. All shared zones will be sharing the configuration and run time data (routes, ARP, IPsec) of the global zone. Each exclusive zone will have its own configuration and run time data, which can not be shared with the global zone or any other exclusive zones.
IP Multipathing (IPMP)
By default, all IPMP configuration is managed in the global zone and affects all non-global zones whose network configuration includes even one datalink (the net->physical property in zonecfg(1M)) in the IPMP group. A zone configured with datalinks that are part of IPMP groups must configure each IP address on only one of the datalinks in the IPMP group. It is not necessary to configure an IP address on each datalink in the group. The global zone's IPMP infrastructure will manage the fail-over and fail-back of datalinks on behalf of all the shared IP non-global zones.
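When many zones are being planned, the shared IP rules can be checked mechanically before anything is configured. The Python sketch below checks two of them: each IP address is configured only once, and a zone uses only one datalink from each IPMP group. The function name check_zone_ipmp and the data layout are assumptions made for this illustration; nothing here reads the real system configuration.

```python
def check_zone_ipmp(zone_nets, ipmp_groups):
    """Return a list of problems found in a shared IP zone's planned net entries.

    zone_nets: list of (address, datalink) pairs planned for one zone.
    ipmp_groups: dict mapping IPMP group name -> set of member datalinks.
    """
    problems = []
    seen_addresses = set()
    for address, _link in zone_nets:
        # Each IP address should be configured on only one datalink.
        if address in seen_addresses:
            problems.append("address %s is configured more than once" % address)
        seen_addresses.add(address)
    zone_links = {link for _address, link in zone_nets}
    for group, members in sorted(ipmp_groups.items()):
        # A zone should name only one datalink from each IPMP group.
        used = sorted(zone_links & set(members))
        if len(used) > 1:
            problems.append("zone uses %d datalinks from IPMP group %s: %s"
                            % (len(used), group, ", ".join(used)))
    return problems

# Placeholder group and addresses for illustration.
groups = {"ipmp0": {"e1000g0", "e1000g1"}}
print(check_zone_ipmp([("10.0.0.5/24", "e1000g0")], groups))
print(check_zone_ipmp([("10.0.0.5/24", "e1000g0"),
                       ("10.0.0.6/24", "e1000g1")], groups))
```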
For exclusive IP zones, the IPMP configuration for a zone must be managed from within the non-global zone, either via the configuration files or zlogin(1).
The choice between probe-based and link-based failure detection can be made on a per-IPMP group basis, and does not affect whether the zone can be configured as shared or exclusive IP Instance. Care must be taken when selecting test IP addresses, since they will be configured in the global zone and thus may affect routing for either the global or the non-global zones.
Routing and Zones
The normal case for shared-IP zones is that they use the same datalinks and the same IP subnet prefixes as the global zone. In that case the routing in the shared-IP zones is the same as in the global zone. The global zone can use static or dynamic routing to populate its routing table, which is then used by all the shared-IP zones.
In some cases different zones need different IP routing. The best approach to accomplish this is to make those zones be exclusive-IP zones. If this is not possible, then one can use some limited support for routing differentiation across shared-IP zones. This limited support only handles static default routes, and only works reliably when the shared-IP zones use disjoint IP subnets.
All routing is managed by the zone that owns the IP Instance. The global zone owns the 'default' IP Instance that all shared IP zones use. Any exclusive IP zone manages the routes for just that zone. Different routing policies, routing daemons, and configurations can be used in each IP Instance.
For shared IP zones, only default static routes are supported. If multiple default routes apply to a non-global zone, care must be taken that all the default routes are able to reach all the destinations that the zone needs to reach. A round robin policy is used when multiple default routes are available and a new route needs to be determined.
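A toy model shows why every default router needs to reach every destination. The Python sketch below hands out default routers in round robin order as new destinations appear. It is only an illustration of the policy as described here, not the kernel's actual route selection code, and the function name and addresses are made up.

```python
from itertools import cycle

def assign_default_routes(default_routers, destinations):
    """Pair each new destination with the next default router in turn."""
    if not default_routers:
        raise ValueError("at least one default router is required")
    chooser = cycle(default_routers)
    # Each destination gets whichever router is next in the rotation,
    # regardless of which router can actually reach it.
    return {dest: next(chooser) for dest in destinations}

routes = assign_default_routes(
    ["192.168.1.1", "192.168.1.2"],
    ["10.1.0.5", "10.2.0.5", "10.3.0.5"],
)
for dest, router in routes.items():
    print(dest, "via", router)
```

Because the choice does not depend on the destination, a destination that only one of the routers can reach would work for some connections and fail for others.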
The zonecfg(1M) 'defrouter' property can be used to define a default router for a specific shared IP zone. When a zone is started and the parameter is set, a default route on the interface configured for that zone will be created if it does not already exist. As of Solaris 10 10/09, when a zone stops, the default route is not deleted.
Default routes on the same datalink and IP subnet are shared across non-global zones. If a non-global zone is on the same datalink and subnet as the global zone, default route(s) configured for one zone will apply for all other zones on that datalink and IP subnet.
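The sharing rule can be sketched by grouping zones on the pair of datalink and IP subnet. The Python example below uses the standard ipaddress module to work out which default routers end up applying to each zone. The zone names, datalinks, and addresses are invented, and the function shared_default_routes is a planning aid written for this illustration, not a Solaris interface.

```python
import ipaddress
from collections import defaultdict

def shared_default_routes(zones):
    """Map each zone to the default routers that apply to it.

    zones: dict of zone name -> (datalink, "address/prefix", defrouter or None).
    Default routers are shared by all zones on the same datalink and subnet.
    """
    routers = defaultdict(set)
    for link, cidr, defrouter in zones.values():
        if defrouter is not None:
            subnet = ipaddress.ip_interface(cidr).network
            routers[(link, subnet)].add(defrouter)
    result = {}
    for name, (link, cidr, _defrouter) in zones.items():
        subnet = ipaddress.ip_interface(cidr).network
        result[name] = sorted(routers[(link, subnet)])
    return result

# Placeholder layout: "web" shares a datalink and subnet with the global zone.
zones = {
    "global": ("e1000g0", "192.168.1.5/24", None),
    "web": ("e1000g0", "192.168.1.10/24", "192.168.1.1"),
    "db": ("e1000g1", "192.168.2.10/24", "192.168.2.1"),
}
for name, routes in shared_default_routes(zones).items():
    print(name, routes)
```

In this layout the default router defined for the web zone also applies to the global zone, because both are on e1000g0 in the same subnet, while the db zone keeps its own.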
Inter-zone network traffic isolation
There are several ways to restrict network traffic between non-global shared IP zones.
The /dev/ip ndd(1M) parameter 'ip_restrict_interzone_loopback', managed from the global zone, will force traffic out of the system on a datalink if the source and destination zones do not share a datalink. The default configuration is to allow inter-zone networking using internal loopback of IP datagrams, with the value of this parameter set to '0'. When the value is set to '1', traffic to an IP address in another zone in the shared IP Instance that is not on the same datalink will be put onto the external network. Whether the destination is reached will depend on the full network configuration of the system and the external network. This applies whether the source and destination IP addresses are on the same or different IP subnets. This parameter applies to all IP Instances active on the system, including exclusive IP Instance zones. In the case of exclusive IP zones, this will apply only if the zone has more than one datalink configured with IP addresses.
For two zones on the same system to communicate with 'ip_restrict_interzone_loopback' set to '1', the following conditions must be met.
- There is a network path to the destination. If on the same subnet, the switch(es) must allow the connection. If on different subnets, routes must be in place for packets to pass reliably between the two zones.
- The destination address is not on the same datalink (as this would break the datalink rules).
- The destination is not on a datalink in an IPMP group that the sending datalink is also in.
The 'ip_restrict_interzone_loopback' parameter is available in Solaris 10 8/07 and later.
A route(1M) action to prevent traffic between two IP addresses is available. Using the '-reject' flag will generate an ICMP unreachable when this route is attempted. The '-blackhole' flag will silently discard datagrams.
The IP Filter action 'intercept_loopback' will filter traffic between sockets on a system, including traffic between zones and loopback traffic within a zone. Using this action prevents traffic between shared IP zones. It does not force traffic out of the system using a datalink. More information is in the ipf.conf(4) or ipf(4) manual page.
Solaris 10 1/06 and later support IEEE 802.3ad link aggregations using the dladm(1M) datalink administration command. Combining two or more datalinks into an aggregation effectively reduces the number of datalinks available. Thus it is important to consider the trade-offs between aggregations and IPMP when requiring either network availability or increased network bandwidth. Full traffic patterns must be understood as part of the decision making process.
For the 'ce' NIC, Sun Trunking 1.3.1 is available for Solaris 10.
Some considerations when making a decision between link aggregation and IPMP are the following.
- Link aggregation requires support and configuration of aggregations on both ends of the link, i.e. both the system and the switch.
- Most switches only support link aggregation within a switch, not spanning two or more switches.
- Traffic between a single pair of IP addresses will typically only utilize one link in either an aggregation or IPMP group.
- Link aggregation only provides availability between the switch ports and the system. IPMP using probe-based failure detection can redirect traffic around internal switch problems or network issues behind the switches.
- For link aggregation, multiple hashing policies are available, and they can be set differently for inbound and outbound traffic.
- IPMP probe-based failure detection requires test addresses for each datalink in the IPMP group, which are in addition to the application or data address(es).
- IPMP link-based failure detection will cause a fail-over or fail-back based on link state only. Solaris 10 supports IPMP configured in link-based-only mode. If IPMP is configured with probe-based failure detection, a link failure will also cause a fail-over, and a link restore will cause a fail-back.
- A physical interface can be in only one aggregation. VLANs can be configured over an aggregation.
- A datalink can be in only one IPMP group.
- An IPMP group can use aggregations as the underlying datalinks.
Note, this is for Solaris 10. OpenSolaris has differences. Maybe something for another day.
I hope this is helpful! Steffen