Thursday Apr 16, 2009

What happened to my packets? -- or -- Dual default routes and shared IP zones

I recently received a call from someone who has helped me out a lot on some performance issues (thanks, Jim Fiori), and I was glad to be able to return even a small part of those favors!

He had been contacted to help a customer who was ready to deploy a web application, and they were experiencing intermittent lack of connection to the web site. Interestingly, they were also using zones, a bunch of them (OK, a handful)--and so right up my alley.

The customer was running a multi-tiered web application on an x4600 (so Solaris on x86 as well!), with the web server, web router, and application tiers in different zones. They were using shared IP Instances, so all the network configuration was being done in the global zone.

Initially, we had to modify some configuration parameters, especially regarding default routes. Since the system was installed with Solaris 10 5/08 and had more recent patches, we could use the defrouter feature introduced in 10/08 to make setting up routes for the non-global zones a little easier. This was needed because the global zone was using only one NIC, and it was not going to be on the networks that the non-global zones were on.
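
As a rough sketch of what the defrouter setting looks like, a default router can be attached to an existing net resource of a zone from the global zone. The zone name, interface, and router address here are taken from the output shown further down; the customer's actual zonecfg session is not reproduced in this post, so treat the exact values as assumptions.

```shell
global# zonecfg -z web1
zonecfg:web1> select net physical=bge1
zonecfg:web1:net> set defrouter=172.16.1.1
zonecfg:web1:net> end
zonecfg:web1> commit
zonecfg:web1> exit
global# zonecfg -z web1 info net
```

With a shared IP Instance the change should take effect the next time the zone is booted.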

What made the configuration a little unique was that the web server needs a default router to the Internet, while the application server needs a route to other systems behind a different router. Individually, everything is fine. However, the web1 zone also needs to be on the network that the application and web router are on, so it ends up having two interfaces.

Let's look at web1 when it is the only zone running.

web1# ifconfig -a4
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge1:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 172.16.1.41 netmask ffffff00 broadcast 172.16.1.255
bge2:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 192.168.51.41 netmask ffffff00 broadcast 192.168.51.255
web1# netstat -rn
Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              172.16.1.1           UG        1          0 bge1
172.16.1.0           172.16.1.41          U         1          0 bge1:1
192.168.51.0         192.168.51.41        U         1          0 bge2:1
224.0.0.0            172.16.1.41          U         1          0 bge1:1
127.0.0.1            127.0.0.1            UH        5         34 lo0:1

The zone is on two interfaces, bge1 and bge2, and has a default route that uses bge1. However, when zone app1 is running, there is a second default route, on bge2. The same is true if app2 or odr is running. Note that these three zones are only on bge2.

app1# ifconfig -a4
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bge2:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 192.168.51.43 netmask ffffff00 broadcast 192.168.51.255
app1# netstat -rn
Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              192.168.51.1         UG        1          0 bge2
192.168.51.0         192.168.51.43        U         1          0 bge2:1
224.0.0.0            192.168.51.43        U         1          0 bge2:1
127.0.0.1            127.0.0.1            UH        3         51 lo0:1

In the meantime, this is what happens in web1.

web1# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- --------- 
default              192.168.51.1         UG        1          0 bge2
default              172.16.1.1           UG        1          0 bge1 
172.16.1.0           172.16.1.41          U         1          0 bge1:1
192.168.51.0         192.168.51.41        U         1          0 bge2:4
224.0.0.0            172.16.1.41          U         1          0 bge1:1
127.0.0.1            127.0.0.1            UH        6        132 lo0:4

With any of the other zones running, web1 now has two default routes. This happens only in web1, because it is the only zone on both the public-facing data link (bge1) and the shared data link (bge2).
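
A quick way to spot this condition is to count the default entries in the routing table. The helper below is only a sketch: the function name count_default_routes is made up for this example, and the sample text is an abbreviated copy of the web1 table above.

```shell
# Count the "default" entries in netstat -rn output read from stdin.
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Abbreviated copy of the web1 routing table shown above.
sample='default              192.168.51.1         UG        1          0 bge2
default              172.16.1.1           UG        1          0 bge1
172.16.1.0           172.16.1.41          U         1          0 bge1:1'

printf '%s\n' "$sample" | count_default_routes
```

Inside a zone, the same function can be fed live data with netstat -rn | count_default_routes; any result greater than 1 means the zone sees more than one default route.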

Traffic to any system on either the 192.168.51.0 or the 172.16.1.0 network will have no issues. Every time IP needs to determine a new path to a system that is not on either of those two networks, it picks a default route, and it round-robins between the two. Thus approximately half the time connections will fail to establish, and existing connections may stop working if they have been idle for a while.

This is how IP is supposed to work, so there is technically nothing wrong. It is a feature of zones and a shared IP Instance. [2009.06.23: For background on why IP works this way, see James' blog].

The only problem is that this is not what the customer wants!

One option would be to force all traffic between the web and application tiers out the bge1 interface, putting it on the wire. This may not be desirable for security reasons, and it introduces latency. Another option would be to use exclusive IP Instances for the web servers. Each web zone, and this example has only one, would then require two additional data links (NICs). That would add up. Also, this configuration is targeted to be used with Solaris Cluster's scalable services, and those must be in shared IP Instance zones. Hmmm... as I like to say.

We didn't know about the shared IP Instance restriction of Solaris Cluster, and as the customer was considering how they were going to add additional NICs to all the systems, something slowly developed in my mind. How about creating a shared, dummy network between the web and application tier? They had one spare NIC, and with shared IP it does not even need to be connected to a switch port, since IP will loop all traffic back anyway!

The more I thought about it, the more I liked it, and I could not see anything wrong with it. At least not technically as I understood Solaris. Operationally, for the customer, it might be a little awkward.

Here is what I was thinking of...

With this configuration, the web1 zone has a default router only to the Internet, and it can reach odr (and, if necessary, app1 and app2) directly via the new network. app1 and app2 have only a single default route, to the intranet. The nice thing is that bge3 does not even need to have a link. That is visible in the ifconfig output, where bge3 is not showing the RUNNING flag, which indicates the port is not connected (or in my case has been disabled on the switch).

global# ifconfig -a4
...
bge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 129.154.53.125 netmask ffffff00 broadcast 129.154.53.255
        ether 0:3:ba:e3:42:8b
bge1: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 0.0.0.0 netmask 0
        ether 0:3:ba:e3:42:8c
bge2: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 0.0.0.0 netmask 0
        ether 0:3:ba:e3:42:8d 
bge3: flags=1000802<BROADCAST,MULTICAST,IPv4> mtu 1500 index 5 
        inet 0.0.0.0 netmask 0
        ether 0:3:ba:e3:42:8e
...

And within web1 there is now only one default route.

web1# netstat -rn

Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- --------- 
default              172.16.1.1           UG        1         17 bge1 
172.16.1.0           172.16.1.41          U         1          2 bge1:1
192.168.52.0         192.168.52.41        U         1          2 bge3:1
224.0.0.0            172.16.1.41          U         1          0 bge1:1
127.0.0.1            127.0.0.1            UH        4        120 lo0:1

In the customer's case, multiple systems were being used, so the private networks were connected together so that a web zone on one system could access an odr zone on another. I am showing the simple, single system case since it is so convenient.
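
For the single system case, the change to web1 might look roughly like the session below. This is a sketch rather than the session I actually ran: it assumes web1 has exactly one net resource on bge2, and it takes the new address from the routing table above. Note that the new net resource has no defrouter, so the only default route left in web1 is the one on bge1.

```shell
global# zonecfg -z web1
zonecfg:web1> remove net physical=bge2
zonecfg:web1> add net
zonecfg:web1:net> set physical=bge3
zonecfg:web1:net> set address=192.168.52.41/24
zonecfg:web1:net> end
zonecfg:web1> commit
zonecfg:web1> exit
```

The odr zone needs an address of its own on the same 192.168.52.0 network on bge3; that address does not appear in the output above, so it is left out here.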

If I were using Solaris Express Community Edition (SX-CE) or OpenSolaris 2009.06 Developer Builds, with the Crossbow bits and virtual NICs (VNICs) available, I wouldn't even have needed to use that physical interface. Both are available here.

I hope this trick might help others out in the future.

Steffen
