Thursday Jan 04, 2007

From the Trenches: a Niagara Jumpstart "gotcha"

Occasionally, things happen which remind me how subtle networking can be, even at the lower levels of the OSI stack.

I was with a customer yesterday, and they were upgrading their nice shiny new T2000 to Solaris 10 Update 3 using JET.

They were having an odd problem with this. The box would do the usual RARP/ARP exchange and tftp its kernel and miniroot over from the server, but would hang just after whoami returned "no domain" - in other words, it stalled just before the point where it would report that it was configuring its devices.

After a few 'phone calls to their Jumpstart guru, the problem was found to be that the T2000's network interfaces were trying to autonegotiate speed and duplex at this point; as the switches it was connected to had speed and duplex set statically, autonegotiation failed and the Jumpstart stalled.

The fix is to set these properties statically when initiating the Jumpstart; instead of the familiar "boot net - install", we used "boot net:speed=100,duplex=full, - install".

NB: the second, oddly-placed comma after "full" in the line above is necessary!

Tuesday Jan 02, 2007

Thoughts on Global Server Load Balancing

As well as doing all manner of stuff relating to Security, I occasionally get to do a bunch of Networking work; as with security, I like to do the whole piece, from requirements capture through design to implementation. I first encountered Layer 4-7 load balancing on Alteon ACEDirector 3s back in the days of WebOS 5.2 - which I figure must have been about 2001 - and found it really rather cool, especially when it came to deciding which load balancing algorithm to use based on the connection and state model of the protocol being balanced.

Later on, when Global Server Load Balancing (GSLB) was introduced, I initially wondered what the reasoning was behind it, then figured that there were all sorts of shortcomings with it which added up to a verdict of "avoid at all costs", then eventually realised that there's one or two cases where it's actually The Right Thing to do. I occasionally hear from various customers who are part-way down the path I've taken, so I figured I'd get some thoughts down on what GSLB is, when you should use it, and when there are better ways to solve the problem.

Why Customers Want "GSLB"

Back in the dotcom days, we'd get folk coming to us with words to the effect of "here's my $10 million in venture capital, build me a resilient service infrastructure to present my creation to the world at large".

We delivered on their requests, many times, including disaster recovery (DR) environments for those whose industry requirements stipulated them, or those who simply wanted them.

Then, when folks' money started getting tight, their requirements shifted a bit. They came back to us, saying "I've got this DR site, sitting there soaking up power and rack rental space, and while my main site is running normally, this other site is not actually doing me any good. How can I put the kit there to work, delivering my creation to the world such that it can still pick up as the DR site if the main one goes down? Oh, and if this site is in another country from the main one, how do I do this without having to put a huge expensive link between the two?"

Thus was GSLB born; the first time I came across it was in Alteon WebOS 8, probably around 2002-3. The following year, pretty much all the load balancer manufacturers had an implementation.

Taxonomy of GSLB

Most GSLB implementations work by exploiting the fact that queries tend to arrive at infrastructures based upon client resolutions of service addresses by DNS, rather than explicit references to IP addresses. Switches performing GSLB across the various datacentres are set up as the domain's primary external DNS servers; the DNS service itself is typically backed off to a pair of load-balanced DNS "slaves" hidden from external view. The switches handle DNS requests from clients, and use round-trip time (RTT) measurements to determine whether a given client is closer to one GSLB-equipped datacentre than another.

If a load-balanced service fails at one datacentre, the GSLB system in the switches simply starts answering DNS queries for the service with the other site's virtual IP address (vip), rather than the failed site's.

Also - and this is the main reason why folk wanted GSLB - if multiple sites are up, the GSLB switches can compare their RTTs to the client's DNS server, such that the site with the lowest RTT is the one to which the service request is pointed. Thus, we get GSLB as "active-active bandwidth-weighted load balancing by DNS", which fulfils the customer requirements of "making the DR site do useful work".
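The decision logic just described can be sketched in a few lines. This is a toy illustration under my own assumptions (the site names, addresses and RTT figures are all hypothetical; the real logic lives in the switch firmware): the DNS answer handed back is simply the vip of the healthy site with the lowest measured RTT.

```python
# Toy sketch of the GSLB DNS decision: answer with the vip of the
# healthy site whose measured RTT to the client's DNS server is lowest.
# All names, addresses and RTTs are hypothetical.

SITES = {
    "london":   {"vip": "192.0.2.10",    "rtt_ms": 12.0, "up": True},
    "new_york": {"vip": "198.51.100.10", "rtt_ms": 85.0, "up": True},
}

def gslb_answer(sites):
    """Return the vip to put in the DNS response."""
    healthy = {name: s for name, s in sites.items() if s["up"]}
    if not healthy:
        raise RuntimeError("no site available")
    best = min(healthy, key=lambda name: healthy[name]["rtt_ms"])
    return healthy[best]["vip"]

print(gslb_answer(SITES))      # nearest healthy site wins: 192.0.2.10
SITES["london"]["up"] = False  # simulate the London site failing
print(gslb_answer(SITES))      # DNS now resolves to 198.51.100.10
```

Note that both the failover behaviour and the "active-active by DNS" behaviour fall out of the same selection step; that is really all GSLB is doing.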

Implicit Assumptions of GSLB

Perhaps the most significant implicit assumption in GSLB - which Alteon later claimed to have figured out a workaround for, although I never quite sorted out the nature of this workaround in my head - is that the client's DNS server (which contacts the GSLB-presented DNS address) is located logically close to the actual client trying to access the service being presented. By "logically close to", I mean "sufficiently far away from the GSLB infrastructure itself, in terms of hop count and link bandwidth bottlenecks, that the speed of the effective aggregate link between client and infrastructure is the same as that of the effective link between the DNS server chain the client is using and the infrastructure".

This assumption isn't always correct, especially when you consider that DNS is hierarchical and thus the server which makes the request of the GSLB-presented DNS service may be some levels removed from the client. Also, the "logical closeness" assumption doesn't wash too well when you consider entities such as the AOL mega-proxy.
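To make the failure mode concrete, here's a sketch under my own hypothetical figures: the GSLB device can only measure RTT to the client's resolver, so when the resolver sits somewhere quite different from the client (the mega-proxy case), the site selection goes wrong.

```python
# Hypothetical RTTs (ms) from each site to the client's DNS resolver
# versus to the actual client. GSLB can only see the resolver figures.
RTT = {
    "site_a": {"resolver": 10, "client": 90},
    "site_b": {"resolver": 80, "client": 15},
}

chosen_by_gslb = min(RTT, key=lambda s: RTT[s]["resolver"])
actually_best  = min(RTT, key=lambda s: RTT[s]["client"])

print(chosen_by_gslb)  # site_a - picked on resolver proximity
print(actually_best)   # site_b - the client is really nearer here
```

The "logical closeness" assumption is precisely the claim that these two minimisations pick the same site.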

When, and When Not, to Use GSLB

GSLB works best when you have multiple independent datacentres which you want to have doing useful work, such that the bandwidth of any existing link between them is small (assuming that GSLB negotiation traffic is light when compared to live service transaction traffic), and the datacentres do not need to perform back-end synchronisation or multi-phase commits between them.

This qualifier is what has relegated GSLB to the small niche it now occupies. While http and many other protocols are stateless, the transaction data they carry very often isn't. If a non-read-only transaction is performed against one GSLB site and that site subsequently goes down, you'd want the other site(s) to have a record of the data written in the transaction. This usually means the back-end databases at each site need to do either regular synchronisations or multi-phase write commits, at which point the bandwidth of the links between the sites frequently needs to be raised to the point where, rather than use GSLB, you might as well do regular active-active load balancing. While active-active is rather trickier to set up and maintain than active-standby, it's still simpler than GSLB, and has the advantage that it's easier to weight the distribution of load across your sites to cater for any differences in hardware performance between them.
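To put a rough number on that tradeoff, here's a hedged back-of-envelope sketch (all figures hypothetical, and it ignores protocol overhead and bursts) of the sustained inter-site bandwidth that write replication alone consumes:

```python
# Back-of-envelope: sustained inter-site replication traffic in Mbit/s,
# assuming every write must be shipped to the other site.
# Figures are illustrative, not from any real deployment.

def replication_bandwidth_mbps(tx_per_sec, write_fraction, bytes_per_write):
    """Approximate replication traffic in Mbit/s."""
    return tx_per_sec * write_fraction * bytes_per_write * 8 / 1_000_000

# e.g. 2000 transactions/s, 20% writes, 4 KiB per write record:
print(round(replication_bandwidth_mbps(2000, 0.2, 4096), 1))  # 13.1
```

Even this modest hypothetical workload wants a double-digit-Mbit/s inter-site link, which is exactly the cost GSLB was supposed to avoid.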


So, in the (relatively unlikely) event that your services are read-only and the bandwidth between your datacentres is small, go right ahead and use GSLB. In all other circumstances there's usually a better way to approach the problem, since you'll need the extra inter-site bandwidth anyway.


There are actually a couple more points worth considering. First, if you're going to avoid GSLB, it's often useful to source the upstream links for your datacentres from the same supplier, such that any vips you need to fail over between datacentres are in the same upstream subnet. Second, if you are going to go down the GSLB route, beware the latency involved in DNS map updates; while a GSLB device will readily "push" a map to its upstream server as part of a failover event, you are at the mercy of the "ripple carry" latency involved in reaching the client-end DNS server before a client will be redirected to the live site.
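A crude way to reason about that "ripple carry" is to assume the worst case: each caching resolver in the chain refreshed the record just before the failover, so the cache TTLs stack up before a client is guaranteed to see the new vip. The TTL figures below are hypothetical, and real resolver chains vary.

```python
# Worst-case seconds before a client is guaranteed to see the
# post-failover vip, assuming each cache in the chain refreshed
# just before the failover so the TTLs stack. Figures hypothetical.

def worst_case_failover_seconds(ttls):
    """Sum of record TTLs (seconds) along the resolver chain."""
    return sum(ttls)

# e.g. GSLB device 30 s, ISP resolver 30 s, client-side cache 30 s:
print(worst_case_failover_seconds([30, 30, 30]))  # 90
```

This is why GSLB deployments run with aggressively short TTLs on the advertised records, trading DNS query load for failover visibility.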

Monday Dec 04, 2006

The Myth of "Dialtone"

Way back when - it must have been about 2001 - Scott McNealy bought into the concept of "webtone", where the idea was that "IP network access must be as reliable as the dialtone on your conventional telephone".

The thing is, the Internet may well already be more reliable than that.

The subtle point is that the limited vocabulary of telephone tones means they can be used to hide shortcomings in the 'phone network.

For instance, if you pick your 'phone up and get a dialtone, all that means is that you have connectivity to your nearest PBX. In IP terms, this is exactly equivalent to being able to ping your default gateway (i.e. only one step better than being able to ping your own address).

If you dial a number which is also on your local PBX, it's about as easy for the PBX to make a circuit connection to the other handset as it is for a router to route packets between directly-connected subnets. The only usual difference is that the PBX needs to watch for a hang-up signal so that it can drop the connection, though it should be borne in mind that such active communication termination makes billing easier. Even then, there are subtleties involved; CCITT and Bell used to differ here, in that one would drop the circuit only when the calling party sent a hangup, while the other would drop it only when the called party did.

If you call a number outside the scope of your local PBX, then the usual thing happens where the PBX reaches up to its nearest Class 5 switch - at which point, the traffic "disappears" into the cloud of interconnected Class 5 switches in the same way as an IP request disappears into the cloud of interconnected Internet routers when you try to connect to a remote IP address.

However, if the appropriate Class 5 switch can't set up a call circuit as a result of link congestion, you get an engaged tone.

That's right, an "engaged" tone. Not a "switch unavailable" tone or a "link full" tone, nor do you have to sit waiting interminably as you would if you were on a browser in the IP world, while whatever "connecting..." telltale spins away to itself for a couple of minutes before giving you something informative along the lines of "connection timed out" or "server unavailable".

If you've ever tried 'phoning someone, had the line come back as engaged, called them again some minutes later and had them react with surprise and denial when you asked who they were talking to the first time you called, then the trunk was actually saturated the first time; the network simply reported the congestion as an engaged tone.

This also varies between countries; when making international calls, sometimes I've even had a "ringing" tone to mask conditions of line saturation, rather than an "engaged" tone.

The only time you would typically get a "number unreachable" tone is if the Class 5 switch providing the nearest link to the PBX holding the destination number is unreachable on its out-of-band signalling link; as Class 5 switches are deployed in highly-available pairs (just like resilient IP network infrastructures), this doesn't happen very often.

The fact that connection failure reports in IP networks are so informative is what makes failures more apparent, and hence perceived reliability lower.


Mobile 'phones are at least starting to change the rules a bit; when a hex (or BSC, if you're in the game) reaches capacity, at least you now get a "Network busy" error on your 'phone's screen rather than just an engaged tone or a dropped connection attempt. Folk will begin to notice such bandwidth issues, and therefore dynamically-modifiable BSCs with adaptive coverage are being developed (Queen Mary College, University of London have some particularly nifty stuff).



