Monday Jun 08, 2009

Virtual chassis or big core switch?

We're building out our next generation infrastructure and upgrading critical parts of it to 10GbE. One of the things we've been debating is whether to use a big core switch or take advantage of some of the virtual chassis technologies now available (clustering many smaller switches to look like a big chassis switch).

We're well on our way to proving out the virtual chassis technology from Juniper. It fits our data flow rates, and it's easy to manage. Over time, we've put in network gear from several vendors. That let us use the best from each vendor, but it creates a maintenance hassle: your engineers have to know two or three command sets. So now we're moving towards standardizing on one or two vendors to simplify maintenance.

We also realized several things about our rack layouts. We always have limited space, so we typically laid our systems into the racks on an as-needed basis: find the right amount of space, make sure we had enough power, and just dump the systems in as we could (splitting the systems so no one service was in a single rack/row, etc.).

Our next generation design takes into account data providers and data consumers, making sure that they connect via high bandwidth connections. Our front-ends (data consumers) don't typically consume more than 1Gb connections to the backends (data providers), but if several front-ends talk to one backend, and the backend is on a 1Gb link to the top-of-rack switch, you can get into some serious problems. We're able to work around that on our very high bandwidth backends by trunking connections, but even then it's suboptimal. So we're going to make sure we have 10Gb connections to the backends, and keep the number of "all-10Gb" racks to a minimum (to keep costs under control). Our rack layouts will definitely change, focusing on high availability and very high performance while staying within our power constraints.
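The arithmetic behind that worry is simple enough to sketch. Here's a back-of-the-envelope oversubscription check; the front-end count and link speeds are invented numbers for illustration, not our actual rack figures:

```shell
#!/bin/sh
# Illustrative numbers only: N front-ends, each on a 1Gb link,
# all pulling from one backend behind a single 1Gb uplink.
frontends=8
fe_gb=1          # Gb/s each front-end can demand
be_uplink_gb=1   # Gb/s on the backend's top-of-rack uplink

demand=$((frontends * fe_gb))
ratio=$((demand / be_uplink_gb))
echo "worst case: ${demand}Gb/s of demand over a ${be_uplink_gb}Gb/s uplink (${ratio}:1)"

# Move the same backend to 10GbE and the worst case fits with headroom.
echo "on 10GbE: ${demand}Gb/s demand vs 10Gb/s uplink"
```

Trunking several 1Gb links narrows the gap, but a single 10Gb uplink removes it entirely for any realistic number of front-ends per backend.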

I'll talk more (probably) in a future blog about the other things we're doing in the network.

Driving to Manteca

I had to drive to Manteca this morning. "Why?", you wonder? Well, I have 2 greyhounds. They're the sweetest dogs in the world - just the best dogs. But they're fragile. And they tend to break or injure their legs.

I managed to rescue two dogs that are incredible, but have needed a little extra care here and there. Max (maximum speed) is a gigantic (75lb) white male with brindle patches. He's got a little problem with an osteoma on his right front leg. Just figuring that out cost me over $1000 at the local animal hospital. And we still don't know what it really is. We finally took him to an orthopedic surgeon who specializes in greyhounds. Of course, he's located in Manteca. Which, if you don't know your California cities, is about 90 minutes from Menlo Park. If you're lucky. Max was on bed rest and such for 4 weeks - he's doing much better, thanks.

And so of course, Max gets better, so we take him and Molly (a dark brindle) to the park to run. Molly is an Italian sports car to Max's American muscle car. She's light (50lbs) and very, very fast. She was trained as a spoiler (she spoils the really fast dogs' racing by nipping at their backs). The dogs took off in the dog run, and because it's been a while, they go from 0 to full speed in about a second flat. Things are fine for the first 3 minutes, and then Molly barks and starts whining in an "umm, something is wrong" kind of way. Great. She broke her ankle - technically, dislocated her central metatarsal.

So that's why I drove to Manteca this morning. To see the specialist.

On the way home I was really surprised by the quality of driving along the 580 corridor. It was scary - and I don't get scared driving. I learned how to drive in Los Angeles. Doing 80mph 10 feet off the next guy's bumper is old hat to me. And I have a pro racing license (well, it's probably expired by now, but I had one). Anyway, I'm driving along and people are slicing and dicing at 80mph - slotting through the 10ft gaps. With semis playing just to keep things interesting. At least twice I actually slammed on my brakes and prayed a lot that the guy behind me on his cell (with no headset, of course) was paying attention at exactly that moment. I'd have been in the slow lane at a mere 75, but that was the Lane of Death, with all the semi-trucks dicing in and out. All in all, kinda fun, but weird, as I can't quite figure out what the hurry was all about. Does everyone start work at 10am? (Note: I was on my hands-free in 3 separate meetings for the drive out and back.)

Tuesday Feb 17, 2009

A new job, while keeping the old one...

I'm pleased to announce that the DotSun Engineering team (my group) is joining Cloud Computing Engineering effective today. This team is well known (well, at least inside of Sun we're well known!) for operating sun.com and Sun's other high volume websites. Under the new name Web Engineering, the team will continue to run the current sites, as well as begin work to deliver the next generation of sun.com. The Web Engineering team will also have new responsibilities, including building the public web face of Sun's cloud.

The DotSun Technical Support team will also be joining Cloud Computing Engineering under the new Web Engineering group, where they will continue to support the sites.

There's the official announcement. Once things settle down, I'll try to convey what we're doing. I'm excited to join my old bosses/mentors Jim Parkinson and Lew Tucker.

At the same time, it's hard to say goodbye to a great organization - we were one of the few consolidated web groups in Corporate America. Between my boss and myself, we've talked to quite a few companies both in the USA and abroad, and consistently we found that the web groups tended to be split along artificial lines - engineering in one place, design in another, and publishing in yet another. DotSun (actually .Sun) was a single group with all aspects integrated. In many ways this was a great thing. In some ways it made us a huge target. I think it's hard for most companies to have an accurate idea of what they spend on the web. Heck, even with a consolidated group, we didn't own every site; we just tried hard to own the high volume, high visibility sites. Nevertheless, we had a significant budget - including all the way down to hardware and hosting. We enjoyed support from the highest levels of the company, and my boss Curtis worked incessantly to make sure the execs didn't forget why they were funding us at the levels required.

All said, that's not what's causing the breakup of the group. Many factors came into play, and maybe it was just time. As I said, happy and sad, all at the same time. Did I mention a little worried? If I know Jim and Lew, we'll have our work cut out for us!

Oh and I guess I have to change my tag line - we're not in marketing anymore - we're in a real software group! Yeah! (I think...)

Monday Feb 02, 2009

How many times can we learn the same lesson

How many times can we learn the same lesson?

Seriously - I spent 2 hours on Friday debugging an issue that we've seen many times before. And that's annoying. It turned out that GlassFish by default has a really terrible configuration for running as a server (particularly under load).

An app was deployed and tests were run, and the issue was never encountered because the automated testing doesn't preserve sessions long enough to evoke the nasty behavior (at least, I think that's why the load testing never saw the issue).

It took @creechy and me about 20 minutes of wandering around the system to figure out what was wrong, and 5 minutes to fix it. Not having start/stop scripts was another problem (and one I'm fixing today).
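For what it's worth, the missing start/stop script doesn't need to be fancy. Here's a minimal sketch of a wrapper around GlassFish's `asadmin start-domain` and `asadmin stop-domain`; the install path and domain name are assumptions you'd adjust for your own hosts:

```shell
#!/bin/sh
# Minimal GlassFish control wrapper. ASADMIN path and DOMAIN name
# are placeholder assumptions, not our actual layout.
ASADMIN=${ASADMIN:-/opt/glassfish/bin/asadmin}
DOMAIN=${DOMAIN:-domain1}

gfctl() {
	case "$1" in
	start)   "$ASADMIN" start-domain "$DOMAIN" ;;
	stop)    "$ASADMIN" stop-domain "$DOMAIN" ;;
	restart) gfctl stop && gfctl start ;;
	*)       echo "usage: gfctl start|stop|restart"; return 2 ;;
	esac
}
```

Drop `gfctl "$1"` at the bottom and install it in your rc/SMF framework, and the "how do I bounce this thing?" question goes away for the next engineer.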

I guess it's hard as the organization grows to make sure the institutional knowledge is shared - most of the lead engineers know to just send out a "has anyone deployed X on Y? Any special things I should do?" message. We've done so much of this so many times that several of the engineers can make the changes in their sleep. And that's a problem when a new engineering group or new engineer joins the staff. Transferring that knowledge is next to impossible - you have to know to ask the question or you just don't get the right info. Wikis are great for documenting what you've done, but they tend to get out of date pretty fast, and new versions of software come out with different config issues.

We'll do better next time - but I hate the pain of downtime on a production site!

Sunday Jan 25, 2009

Loosely coupled systems are better

My team and I have never been part of IT. We've always been somewhere on the periphery, building, hosting, and running web systems. I don't know if that's good or bad, but there have been side effects.

One of those side effects is that we never had access to back end IT systems inside of Sun's wide area network. We just ran things ourselves out on the edge. If someone inside needed data from our databases, we'd either copy the database daily or open SSH tunnels from inside to the external databases.

This turned out to have lots of advantages, and some disadvantages. The biggest disadvantage was that we really wanted access to some of those IT systems (customer record systems, etc.) Another disadvantage was that we had to maintain the SSH tunnels which became problematic over time.
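Part of what made the tunnels fragile was that a dead connection would just hang. OpenSSH's keepalive options at least make a dead tunnel fail fast so a supervisor can restart it. A sketch of a client-side config for one such tunnel - the host name and port are invented examples, not our actual systems:

```
# ~/.ssh/config fragment (names and ports are illustrative)
Host edge-db
    HostName edgedb.example.com
    User tunnel
    LocalForward 5432 127.0.0.1:5432
    ServerAliveInterval 30
    ServerAliveCountMax 3
    ExitOnForwardFailure yes
```

With that in place, `ssh -N edge-db` holds the forward open and exits within ~90 seconds of the far end going quiet, instead of hanging forever; something like autossh or an SMF service can then restart it.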

So we created persistent VPN tunnels. And have paid the price ever since. Since we're not part of core IT, things happen on the network, we don't get notified, and the VPN drops. This has happened so often that I've thought of dropping the VPN entirely. I'm starting to consider an alternative: treat the VPN as a really terrible phone line. It only works occasionally, so use it that way.

Unfortunately my engineering teams have come to depend on the VPN - and that's going to have to change. Which means more work, and work I really don't want to do.

Tuesday Dec 23, 2008

Scripting storage (amber road aka fishworks)

Most people have taken the next two weeks off; I've decided to get a few things done that I don't get a chance to do... Like working with new hardware, trying out some software development, etc.

I recently got several Amber Road (fishworks) boxes, and I've developed a couple of rather simplistic (crude) scripts just to understand the capabilities of the boxes.

It's pretty amazing; with very little documentation, I hacked out a couple of rather useful scripts. One creates a new project, one creates a LUN in a given project and hands back the IQN (iSCSI Qualified Name). I was working on another to clone a LUN and drop it in another project.

Why, you ask? Well, we're beginning to shift to using xVM for some of our hosting projects. xVM gives us the ability to dynamically move between front ends, expand the footprint for a particular app, etc. And running all this off iSCSI backed by ZFS really makes sense. We can install once, make a "gold master" and then just clone that as many times as we want. Really fast, really easy, and an efficient use of disk.
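Underneath, the gold-master trick is just ZFS snapshot-and-clone: snapshot the master once, then clone it per guest, with each clone initially consuming no extra space. A sketch of the flow as a dry-run script that only prints the commands it would run; the dataset names are invented for illustration:

```shell
#!/bin/sh
# Dry run of the gold-master cloning flow. Dataset names are made up;
# remove the echo prefixes to actually run the zfs commands.
print_clone_cmds() {
	echo "zfs snapshot tank/xvm/gold@master"
	for guest in web01 web02 web03; do
		echo "zfs clone tank/xvm/gold@master tank/xvm/$guest"
	done
}
print_clone_cmds
```

On the 7000 series the same idea applies to the iSCSI LUNs themselves, which is exactly why the scripting interface below is so handy.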

So, for your edification, two simple scripts for amber road. Use them at your own risk; I don't guarantee that they work on anything other than my own personal machine.

#!/bin/ksh
# Create a project named $1
# SERVER must be set to the appliance login, e.g. root@amberroad

if [ x"$1" = x ]; then
	echo "ERROR: You must specify a project name"
	exit 1
fi

# create the project with the defaults
# 	set shareiscsi=on
#	set sharenfs=off

ssh -T $SERVER <<EOF
script
	try {
		run('shares project $1');
	} catch (err) {
		if (err.code == EAKSH_NAS_BADPROJECTNAME) {
			printf('Error! "$1" is an invalid project name!\\n');
		}
		if (err.code == EAKSH_ENTITY_EXISTS) {
			printf('Error! "$1" already exists!\\n');
		}
	}
	run('set shareiscsi=on');
	run('set sharenfs=off');
	run('commit');
.
EOF

# list the projects so we can see the new one
ssh -T $SERVER <<EOF
script
	run('shares');
	projects = list();
	for (i = 0; i < projects.length; i++) {
		printf("\\t%s\\n", projects[i]);
	}
.
EOF

And the other script - create a LUN in a project for use as an iSCSI target:

#!/bin/ksh
# Create a LUN in a given PROJECT
# USAGE: create-lun.ksh <project> <lun> [<size>]
# SERVER must be set to the appliance login, e.g. root@amberroad

if [ x"$1" = x ]; then
	echo "Invalid argument!"
	echo "usage: $0 <project> <lun> [<size>]"
	exit 1
fi
if [ x"$2" = x ]; then
	echo "Invalid argument!"
	echo "usage: $0 <project> <lun> [<size>]"
	exit 1
fi
if [ x"$3" = x ]; then
	SIZE=10g	# default volume size - pick whatever suits you
else
	SIZE=$3
fi
echo "Creating LUN $2 in PROJECT $1 of size $SIZE"

ssh -T $SERVER <<EOF
script
	run('shares select $1');
	try {
		run('lun $2');
	} catch (err) {
		if (err.code == EAKSH_NAS_BADLUNNAME) {
			printf('Error! "$2" is a bad LUN name!\\n');
		}
		if (err.code == EAKSH_NAS_LUNEXISTS) {
			printf('Error! "$2" already exists!\\n');
		}
		if (err.code == EAKSH_ENTITY_BADSELECT) {
			printf('Error! PROJECT "$1" is invalid!\\n');
		}
	}
	run('set volsize=$SIZE');
	run('set sparse=true');
	run('commit');
.
EOF

echo "Created $2 ($SIZE) in $1"
# don't know why it takes a while to assign an IQN, but if you don't wait you get an error!
echo "Waiting for IQN to be assigned..."
sleep 10
ssh -T $SERVER <<EOF
script
	run('shares select $1');
	run('select $2');
	printf("IQN: %s\\n", get('iqn'));
	run('cd ../../..');
.
EOF


Thursday Nov 13, 2008

First Look at Amber Road (Sun 7000 unified storage system)

I received two amber road (Sun 7000 Unified Storage System) boxes today and hooked them up in record time. I can't begin to tell you how impressive the install was. Simple, to the point, the minimum necessary to get the boxes running - and it *just worked*.

I've installed lots of different kinds of hardware here at Sun - and frankly this was almost un-Sun-like in its simplicity. Don't confuse folks, don't make it hard. Default to the right thing. 5 minutes and I have a usable *system*.

Stunning. And the performance from the tests I've run is just incredible. I thought I'd made a mistake in my calculations, and no, it was *that freaking fast*.

If you've not had the pleasure, take the time and try out the box. What can you lose, other than a few hours trying out some really great hardware and software? It is, after all, Fully Integrated Software and Hardware (fishworks!)

Monday Nov 10, 2008

Sun Storage 7000 Unified Storage Systems...

Wow, that's a mouthful - Sun's naming alone should scare you off. Ok, if you're not scared off, let me tell you a little story about the 7000 series storage devices. I like the name Amber Road better, but I like Fishworks even better. Gah. If you can figure out our naming, then good for you!

I've known about the 7000 series for a while now - and my staff have seen demos from Bryan Cantrill. It was a great idea then, and now you get to see it too. I've had access to demo systems for the past week, and my devices start landing this week - frankly, my staff and I can't wait (you know I'm a hardware junkie, right?)

So there are many good reasons why the 7000 series is cool - the integrated flash devices, the hardware itself, blah, blah, blah. Here's the amazing part - the hardware isn't even the coolest feature. It's the software. The ability to *in real time* drop in new tracing events to see what's really happening on the device is just unbelievable.

How many times have you seen your NAS devices suddenly "go slow"? And you have *no clue* as to why. I can tell you it happens often when running big infrastructure. You dig around for a while and maybe you can figure out that it's one machine, and if you're particularly good you can figure out one user on one machine and slap their hands. With the 7000 you get the ability, in real time, to dig into what's being done, using which protocol, by user, by file - by whatever you want. It's stunning to see, and incredibly useful in managing the infrastructure. For a detailed overview of the capabilities, check out Bryan Cantrill's presentation on analytics.

I'm deploying 7000 series boxes into my infrastructure next week - and I can't wait. It's great kit, but it's even *better* software.

So check out the 7000 series storage systems. Heck, try 'em out - what have you got to lose? They're *free* to try!

Thursday Oct 30, 2008

Building out more datacenter space

It's time for me to build out more space in one of my datacenters, and I'm having a hard time coming up with the "right" design. Do I put lots of small switches in the cabinets and home-run a few cables (4 per cabinet), or do I put in big core switches and home-run everything?

Of course the big switch vendors want me to buy the big switches, but dang, that's a *huge* investment. I've been very successful deploying the switch-per-rack-function model.

Is it better to chew rack units on a big frame and be expandable, or chew rack units in every rack? Very tough question. Add in the overhead of managing lots of small switches and things get very interesting. There is clearly an inflection point, but where is it? How would you calculate it?
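One crude way to hunt for that inflection point is to price both designs as a function of rack count. A toy model - every price below is an invented placeholder, so plug in real quotes before drawing conclusions:

```shell
#!/bin/sh
# Toy cost comparison: top-of-rack switches vs one big core switch.
# All prices are invented placeholders.
racks=20
tor_switch_cost=4000        # per small switch
switches_per_rack=2         # e.g. separate FE and BE switches
core_chassis_cost=120000    # big chassis, line cards included
cabling_per_rack=1500       # home-run cabling back to the core

tor_total=$((racks * switches_per_rack * tor_switch_cost))
core_total=$((core_chassis_cost + racks * cabling_per_rack))
echo "ToR design:  \$$tor_total"
echo "Core design: \$$core_total"
# The inflection point is the rack count where the two totals cross.
# The operational cost of managing N small switches tips the scale
# further toward the core design as N grows.
```

With these made-up numbers the core design wins at 20 racks; halve the rack count and the ToR design wins. The crossover moves a lot with cabling cost, which is exactly why the home-run question matters.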

The switch-per-rack-function model starts out like this: you build a network that's redundant at the firewall and load balancer layers, and then you do odd and even racks. Each rack has a front-end network (FE), a back-end network (BE), a service processor network (SP) and a serial console (just in case!).

You've already chewed through 4RU per rack, and that's not including the 2RU you need for additional power strips, 'cause the power density is so high that the vertical strips (208V/30A) don't have enough outlets!

Thursday Aug 14, 2008

Space shuttle launch from passing airliner

Some lucky folks got to watch a space shuttle launch from a passing Air Canada flight. And they videotaped it too! I imagine the experience was much better in person, but it still conveys the excitement.

That's one thing on my big "to do" list - seeing a space shuttle launch up-close-and-personal. I've wanted to do that ever since I was a kid and my dad worked for Rockwell. I got to take a tour of the facility that built the shuttles here in California. It was truly amazing. At one point the tour guide pointed out that we were standing on the wing and the wall on the left was the shuttle's skin. Freaky stuff. I even watched them gluing on the shuttle tiles (talk about stunningly boring - at that point it took about 30 minutes to glue on one tile). Of course, I better get a move on - the shuttle is scheduled for retirement in 2010!

via [PointNiner]

Coolest photo app *ever*

This has got to be the coolest photosynth software I've ever seen. Changes between night and day, provides real-time navigation through a series of photos, etc. I wonder where I can get it?

Monday Jun 23, 2008

Server growth

I asked my operations manager to run some stats to determine the number of servers we have in production, and the results provided a pretty interesting graph... And very good justification for the increasing size of the operations group. It may not seem like a lot of servers in the larger scope of websites (some people run thousands of servers), but if you realize that most of the applications we run are on very few servers (2-6), suddenly it looks like a much bigger problem.

Sunday Mar 30, 2008

2008 Submarine Cable Map

From a link on Slashdot, the 2008 submarine cable map.

Monday Mar 17, 2008

Been busy...

I find that with all the social media sites like Twitter, Facebook, LinkedIn and my personal blog, I'm not getting much time to blog for work. Well, that and figuring out how much of the seedy underbelly of hosting the high volume websites for Sun I can really talk about (expose :)...

In some cases, I'd love to share parts of the hosting process/environment/challenges but the details I'd have to share might be considered security concerns. So instead, I just share what I can. Oh well.



How not to roll cable...

How Not To Roll Cable Up Stairs


I run the engineering group responsible for sun.com and the high volume websites at Sun.

Will Snow
Sr. Engineering Director
