DOS attacks suck

Particularly when you're having a major content push.

The other day we were having our earnings announcement (can you get more visible?) and I walked into work at around 8:30am. I'd been getting paged every once in a while from our site monitors that something was occasionally failing or slow.

We looked around (my ops engineers and I) and saw a significant jump in incoming traffic - something north of 50mb/s - when we normally run about 3-5mb/s. The servers were still fine, as the switches were handling the attacks. Unfortunately it was only 8:30am, and our peak traffic loads hit at around 10am. Looking at the switches, we were running at about 98% of capacity. The big question: did we have the capacity to live through the earnings announcement?

As the load goes up, we see more pages from the monitors saying that some sites are slow or can't be reached... The earnings announcement is at 2pm PST. By noon we're pretty well maxed out on the switches, but we don't want to take downtime because of the earnings...

1pm. We attempt a content push - and it fails. This is the content that will be used for the announcement at 2. Great. We then attempt to reboot one switch, hoping against hope we can bring it back before the announcement. The good news: it boots pretty fast (10 min). The bad news: the other switch succumbs to the load and dies (reboots). Switch 1 has some issues because switch 2 never really lets go. Great.

1:30pm. I'm getting calls from all kinds of people - VPs, you name it. If we blow this, we're in serious trouble. One nice thing is that the incoming traffic has started to fall off; the DOS is blowing over (but we're still semi-offline - some of the sites are fine, others are not so fine). We finally restart both switches and get them to split the load as they're supposed to...

1:45pm. We attempt the content push again. It finally goes through. Lots of sweat and hard work to get everything restarted, but in the end we get it all out there at *1:56pm*. Yikes. Way too close for me. I go back to my office and finalize the plans to move to the new network. We've tested that one against much higher levels of attack and it's proven to be more resilient. Gotta get there soon, before I blow a blood vessel.

Sometimes I think network hardware providers are behind DOS attacks - I've been hit 3 times, and each one caused me to buy new network gear (newer stuff handles attacks better - there's no doubt about it).

Comments:

Interesting to see how one of the biggest sites in the world deals with DOS attacks. Thanks for the view into your world.

Posted by Danese Cooper on May 03, 2004 at 06:26 AM PDT #

DoS attacks do suck. My respect for your luck in meeting the deadline and your openness about it. The sad thing is that you never get to find out who the buggers are that start this kind of thing.

Posted by Stefan Keller on June 10, 2004 at 02:08 AM PDT #

About

I run the engineering group responsible for Sun.com and the high volume websites at Sun.

Will Snow
Sr. Engineering Director
