What happened to my packets? -- or -- Dual default routes and shared IP zones

I recently received a call from someone who has helped me out a lot on some performance issues (thanks, Jim Fiori), and I was glad to be able to return even a small part of those favors!

He had been contacted to help a customer who was ready to deploy a web application, and they were experiencing intermittent lack of connection to the web site. Interestingly, they were also using zones, a bunch of them (OK, a handful)--and so right up my alley.

The customer was running a multi-tiered web application on an x4600 (so Solaris on x86 as well!), with the web server, web router, and application tiers in different zones. They were using shared IP Instances, so all the network configuration was being done in the global zone.

Initially, we had to modify some configuration parameters, especially regarding default routes. Although the system was installed with Solaris 10 5/08, it had more recent patches, so we could use the defrouter feature introduced in Solaris 10 10/08 to make setting up routes for the non-global zones a little easier. This was needed because the global zone was using only one NIC, and it was not going to be on the networks that the non-global zones were on.
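As a rough illustration of the defrouter property, a zone's net resource can carry its own default router in zonecfg. The zone name web1 and the interface bge1 are taken from the output shown later in this entry; the addresses are placeholders from the documentation range, not the customer's values, and the zone is assumed to be configured already.

```shell
# Sketch only: the addresses are made-up placeholders (192.0.2.0/24).
# Run in the global zone; the web1 zone must already be configured.
zonecfg -z web1 "add net; set physical=bge1; set address=192.0.2.10/24; set defrouter=192.0.2.1; end"

# Review the resulting net resources.
zonecfg -z web1 info net
```

A change like this takes effect the next time the zone boots.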

What made the configuration a little unusual was that the web server needs a default router to the Internet, while the application server needs a route to other systems behind a different router. Individually, everything is fine. However, the web1 zone also needs to be on the network that the application and web router are on, so it ends up having two interfaces.

Let's look at web1 when it is the only zone running.

web1# ifconfig -a4
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet netmask ff000000
bge1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet netmask ffffff00 broadcast
bge2:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet netmask ffffff00 broadcast
web1# netstat -rn
Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default                                   UG        1          0 bge1
                                          U         1          0 bge1:1
                                          U         1          0 bge2:1
                                          U         1          0 bge1:1
                                          UH        5         34 lo0:1

The zone is on two interfaces, bge1 and bge2, and has a default route that uses bge1. However, when zone app1 is running, there is a second default route, on bge2. The same is true if app2 or odr are running. Note that these three zones are only on bge2.

app1# ifconfig -a4
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet netmask ff000000
bge2:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet netmask ffffff00 broadcast
app1# netstat -rn
Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default                                   UG        1          0 bge2
                                          U         1          0 bge2:1
                                          U         1          0 bge2:1
                                          UH        3         51 lo0:1

In the meantime, this is what happens in web1.

web1# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- --------- 
default                                   UG        1          0 bge2
default                                   UG        1          0 bge1
                                          U         1          0 bge1:1
                                          U         1          0 bge2:4
                                          U         1          0 bge1:1
                                          UH        6        132 lo0:4

With any of the other zones running, web1 now has two default routes. This happens only in web1, because it is the only zone with both a public-facing data link (bge1) and the shared data link (bge2).

Traffic to any system on either of the two directly attached networks will have no issues. Every time IP needs to determine a new path for a system not on either of those two networks, it will pick a route, and it will round-robin between the two default routes. Thus approximately half the time, connections will fail to establish, and existing connections may stop working if they have been idle for a while.
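To make the "half the time" effect concrete, here is a toy model in shell. It does not touch the routing table. It simply alternates new off-link destinations between two labels standing in for the bge1 and bge2 default routes, assuming strict alternation and assuming that only the bge1 route can reach the Internet.

```shell
#!/bin/sh
# Toy model only: alternates new destinations between two default routes
# and counts how many land on the route that cannot reach the Internet.
# The route choice here is a simplification, not the real IP code path.

simulate() {
    total=$1          # number of new off-link destinations to route
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$total" ]; do
        # Even picks stand for the bge1 default route (reaches the Internet),
        # odd picks stand for the bge2 default route (does not).
        if [ $((i % 2)) -eq 0 ]; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "reachable=$ok unreachable=$failed"
}

simulate 10
```

With ten new destinations the model sends five down each route, which matches the intermittent failures the customer was seeing.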

This is how IP is supposed to work, so there is technically nothing wrong. It is a feature of zones and a shared IP Instance. [2009.06.23: For background on why IP works this way, see James' blog].

The only problem is that this is not what the customer wants!

One option would be to force all traffic between the web and application tier out the bge1 interface, putting it on the wire. This may not be desirable for security reasons, and it introduces latency since traffic now goes on the wire. Another option would be to use exclusive IP Instances for the web servers. For each web zone, and this example only has one, it would require two additional data links (NICs). That would add up. Also, this configuration is targeted to be used with Solaris Cluster's scalable services, and those must be in shared IP Instance zones. Hummm....as I like to say.

We didn't know about the shared IP Instance restriction of Solaris Cluster, and as the customer was considering how they were going to add additional NICs to all the systems, something slowly developed in my mind. How about creating a shared, dummy network between the web and application tier? They had one spare NIC, and with shared IP it does not even need to be connected to a switch port, since IP will loop all traffic back anyway!

The more I thought about it, the more I liked it, and I could not see anything wrong with it. At least not technically as I understood Solaris. Operationally, for the customer, it might be a little awkward.

Here is what I was thinking of...

With this configuration, the web1 zone has a default route only to the Internet, and it can reach odr (and, if necessary, app1 and app2) directly via the new network. app1 and app2 have only a single default route, to the Intranet. The nice thing is that bge3 does not even need an active link. This is visible in the ifconfig output, where bge3 does not show the RUNNING flag, which indicates the port is not connected (or, in my case, has been disabled on the switch).

global# ifconfig -a4
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet netmask ffffff00 broadcast
        ether 0:3:ba:e3:42:8b
bge1: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet netmask 0
        ether 0:3:ba:e3:42:8c
bge2: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet netmask 0
        ether 0:3:ba:e3:42:8d 
bge3: flags=1000802<BROADCAST,MULTICAST,IPv4> mtu 1500 index 5 
        inet netmask 0
        ether 0:3:ba:e3:42:8e

And within web1 there is now only one default route.

web1# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- --------- 
default                                   UG        1         17 bge1
                                          U         1          2 bge1:1
                                          U         1          2 bge3:1
                                          U         1          0 bge1:1
                                          UH        4        120 lo0:1

In the customer's case, multiple systems were being used, so the private networks were connected together so that a web zone on one system could access an odr zone on another. I am showing the simple, single system case since it is so convenient.
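For the single-system case, the zone configuration behind this layout might look roughly like the following. This is a sketch, not the customer's actual configuration: every address is a placeholder, and it assumes web1 previously had a net resource on bge2 that is now replaced by one on bge3.

```shell
# Sketch only: all addresses are made-up placeholders.
# web1: drop the shared bge2 link, keep the public network on bge1 with a
# default router, and add the private network on bge3 without one.
zonecfg -z web1 "remove net physical=bge2"
zonecfg -z web1 "add net; set physical=bge1; set address=192.0.2.10/24; set defrouter=192.0.2.1; end"
zonecfg -z web1 "add net; set physical=bge3; set address=198.51.100.10/24; end"

# odr: intranet on bge2 with a default router, private network on bge3 without one.
zonecfg -z odr "add net; set physical=bge2; set address=203.0.113.20/24; set defrouter=203.0.113.1; end"
zonecfg -z odr "add net; set physical=bge3; set address=198.51.100.20/24; end"
```

Because no net resource on bge3 sets a defrouter, the private network never contributes a default route to any zone.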

If I were using Solaris Express Community Edition (SX-CE) or OpenSolaris 2009.06 Developer Builds, with the Crossbow bits and virtual NICs (VNICs) available, I wouldn't even have needed to use that physical interface. Both are available here.
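With Crossbow, the spare physical NIC could in principle be replaced by a VNIC on an etherstub. The following is an untested sketch: stub0 and vnic0 are made-up names, the address is a placeholder, and it assumes a shared IP zone can use the VNIC as its physical interface in the same way it used bge3.

```shell
# Sketch only: stub0, vnic0 and the address are made-up. Requires the Crossbow bits.
# Create a virtual switch that is not tied to any physical NIC.
dladm create-etherstub stub0

# Create a VNIC on top of the etherstub.
dladm create-vnic -l stub0 vnic0

# The zones would then use vnic0 in place of bge3, for example:
zonecfg -z web1 "add net; set physical=vnic0; set address=198.51.100.10/24; end"
```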

I hope this trick might help others out in the future.



Sorry for commenting on such an old blog entry, but...

Could this be done using a loopback interface instead of a physical interface under Solaris 10?

Posted by David on October 18, 2010 at 05:45 AM EDT #

Hi David,

You have to use a physical interface. They are shared between the two zones, even if the interface is not plumbed. Loopback interfaces are per zone only.

What is it you are trying to do at a high level? Maybe there is a different way to get where you want to go.


Posted by Steffen Weiberle on October 18, 2010 at 07:08 AM EDT #

Hi Steffen,

I'm OK with my stuff here, Steffen.

I thought your solution was awesome, considering that CrossBow was not yet available under Solaris 10!

My question was more of a curiosity... could a loopback interface on a global zone be instantiated and shared into individual zones to carry out the role the physical interface did in your solution?

For example, with the loopbacks defined:
(global) lo0, (zonea) lo0:2, (zoneb) lo0:3

Could zonea lo0:2 communicate with zoneb lo0:3?

It sounds like the answer is no: loopbacks cannot route between zones on a global network to make the virtual network...

I am REALLY looking forward to CrossBow in Solaris!

Posted by David on October 18, 2010 at 09:52 AM EDT #

Hi Steffen, I'm sorry for this very late reply.

@ David,

Like Steffen said, AFAIK, no: you cannot interconnect zones using an instantiated loopback interface, but you can do it by assigning a virtual interface (virtual NIC) to each zone and connecting those zones through a virtual switch/etherstub (the main concept of the CrossBow project).

I have tried that in my home lab using vmware under ubuntu and SXCE b115 (it's been quite some time ago), using this small concept diagram http://i165.photobucket.com/albums/u66/rossonieri_1/SPEEDY/solaris-zones-fw/zone12.png

My partial reference was that from Mr. Sunay's blog here http://blogs.sun.com/sunay/entry/network_in_a_box_creating (thanks to Steffen for the link) to do some kind of dynamic inter-zones routing (Quagga), and some little experiment to do zone inline IPS (Snort), inter-zones NAT and firewalling etc.

At that time, the system itself (SXCE b115) was quite stable for small home lab traffic. My pending home lab project was to replicate the idea using Xen virtual interfaces, but I found Xen to be considerably more complex than CrossBow (different software and a different conceptual approach, I guess), and the project has been stuck ever since (I have been retired from IT for some time now and do all of this only as a hobby).

I wrote a small tutorial on inter-zone routing/firewalling using SXCE b115, but it is in Bahasa Indonesia: http://opensource.telkomspeedy.com/forum/viewtopic.php?id=7959


Posted by abdi on November 25, 2010 at 12:52 PM EST #
