THE INDUSTRIAL REVOLUTION, FINALLY
By Gregp on Oct 16, 2006
I've commented frequently upon a central paradox of IT: software and hardware components are the products of fierce, high-volume competition, yet their final assembly by IT organizations is one-of-a-kind artisanship. To quote Scott McNealy, I've never toured a datacenter with the reaction "Wow, this looks just like the one I visited yesterday!"
We ought to ask why this is so, because it is supremely inefficient. Practically all IT organizations speak of the commoditization of computers, but seldom of computing. Partly, this is because computers and storage are simple to understand and quantify compared to the enormous complexity of their assembly into systems that deliver some (with hope, predictable) level of business service. This complexity not only is expensive, it's viscous. Business innovation, the central goal of IT, suffers.
There is certainly a school of thought that this complexity is inherent and the proper (read: profitable) thing for a vendor to do is insulate the IT customer from it with "services and solutions". From our vantage point, this is a punt. It's far better to attack the composition of systems to provide useful service as an engineering problem, not as an Exercise Left to the Reader.
And it is precisely in this spirit that Project Blackbox was born. We went back to engineering first principles: how do you transport, physically assemble, power, cool and ultimately recycle computing infrastructure? Take the joules-in, BTUs-out problem as one of engineering co-design. Something that can be quantitative, efficient, and manufacturable in volume.
Many unquestioned assumptions were put on the table. "Why do we build datacenters?" (Because of latency and administrative scale issues.) "Why do we build machine rooms?" (To let people and machines cohabitate, you know, to mount tapes, clean out chad, and punch buttons...) "Why do we have hot-swap fine-grained FRUs?" (To give the cohabitants something to do?)
Where we ended up with Project Blackbox is admittedly not for everyone. It is designed for ferocious scale, complete lights-out, fail-in-place, virtualization, uber-fast provisioning, and brutal efficiency. And I'd like to emphasize that we expect that the most efficient way to deliver computing and storage services is with containers. Full stop.
While we've tried to keep the project as stealthy as possible, we have disclosed aspects of it during its development to select sets of potential customers and analysts. Feedback has been categorically positive, from "I need ten of these tomorrow. No really, I'm not kidding." to a giddy "This is classic Sun! Why didn't someone do this before?"
Yeah, this seems obvious, so why don't we build datacenters this way? It's the same kind of reaction one had to luggage with wheels, in-line skates, or parabolic skis. Obvious in hindsight, so why did it take so long? Well, obvious at one level, but most definitely dependent upon basic technological progress (in these cases, advances in bearings, plastics and laminates).
For containerized computing, the underlying enabler is the confluence of power density, lights-out management, horizontal scale and virtualization.
Let's look at power density. Half-a-dozen years ago, we were indeed building mondo datacenters, but at quite approachable power densities: typically under 100 watts/ft2. But as we continued to compress physical dimensions (the 1RU server and, now, blades) while simultaneously running hotter chips with more DRAM and disk, watts-per-rack skyrocketed.
Today, 10 kilowatts/rack is standard fare, and many folks are facing 15, 20 and even 25 kW. A standard rack fits nicely over a 2ft x 2ft floor tile. Thus, a 20 kW rack "projects" 5 kW/ft2. If my datacenter is rated at 100 W/ft2, then I can place only one such 20 kW rack for every fifty floor tiles! Even a completely modern, leading-edge datacenter at 500 W/ft2 allows only one 20 kW rack for every ten tiles.
(It's the square root, natch: for the 500 W/ft2 facility it's a rack, two empty tiles, then a rack, in both x and y. For the 100 W/ft2 case, it's a rack, six empty tiles, ...)
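For anyone who wants to check the tile arithmetic, here is a minimal sketch in Python. The function names are mine, purely for illustration, and it assumes a square grid of 2ft x 2ft tiles with the rack-to-rack pitch rounded to the nearest whole tile, as in the approximation above. A facility that must never exceed its rated density would round the pitch up instead.

```python
import math

TILE_SIDE_FT = 2.0                  # raised-floor tile, 2 ft x 2 ft
TILE_AREA_FT2 = TILE_SIDE_FT ** 2   # 4 ft2 per tile


def projected_density(rack_watts):
    """W/ft2 if the rack's whole load lands on the single tile it sits on."""
    return rack_watts / TILE_AREA_FT2


def tiles_per_rack(rack_watts, room_w_per_ft2):
    """Floor tiles the room must set aside per rack to stay at its rated density."""
    return rack_watts / (room_w_per_ft2 * TILE_AREA_FT2)


def grid_pitch(rack_watts, room_w_per_ft2):
    """Rack-to-rack pitch in tiles along x and y (square grid, nearest whole tile)."""
    return round(math.sqrt(tiles_per_rack(rack_watts, room_w_per_ft2)))


if __name__ == "__main__":
    rack = 20_000  # a 20 kW rack
    print(f"projected density: {projected_density(rack):.0f} W/ft2")
    for room in (100, 500):
        tiles = tiles_per_rack(rack, room)
        pitch = grid_pitch(rack, room)
        print(f"{room} W/ft2 room: one rack per {tiles:.0f} tiles, "
              f"{pitch - 1} empty tiles between racks")
```

Plugging in other rack wattages or room ratings shows how quickly the empty floor grows as racks get hotter.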
No wonder that people are out of space, power or cooling (they are all interrelated). And no wonder I get people jumping out of their chairs wanting ten Blackboxes "tomorrow"!
[Aside: don't confuse power density (watts per unit volume) with power efficiency (watts per unit of performance). Even super power-efficient designs such as the eight-core UltraSPARC T1 can lead to high power densities, for the simple reason that cramming processors closer together allows them to be more cheaply and effectively interconnected. Low-power processors do not necessarily imply low-power-density systems. But because you use fewer of them overall, they most definitely can cut the power costs of delivering a certain throughput or level of service.]
Actually, the higher the power density, the more desirable containerized computing becomes. A standard TEU-sized (8ft x 20ft) container can readily handle eight 25 kW racks. That's 200 kW over 160 ft2, or a power density of 1,250 watts/ft2. And we really aren't breaking a sweat at these levels; they could easily be doubled owing to the dedicated heat exchanger for each rack position in the cooling loop.
Lights-out management (LOM) is another technology enabler. Simply put, we've had a lot of pressure from our customers to make sure that no one is required to interact with a functioning server or storage system. Again, this is a long way from the implicit assumption left over from the mainframe era that there are "operators" for computers.
[Another aside: we are constantly reminded that if you want to build very reliable systems, the best thing you can do is keep people's fingers away from them. There are significantly non-zero probabilities that an operator coming in physical contact with a system, despite all best intents and training, will break something; not infrequently, by disconnecting a wrong cable or wrong disk drive.]
When we mix in virtualization and/or horizontal scale, we finally get to the place where a bit of code doesn't have to run on a particular computer; it only has to run on some computer. Thus, we can use mature techniques such as load balancing, along with emerging ones such as O/S paravirtualization and dynamic relocation, to abstract applications from computers. And that leads to service strategies such as fail-in-place, and a wholesale re-evaluation of things like hot swap and redundant power supplies.
Clearly, this level of physical engineering attacks only a focused part of the complexity-at-scale problem, which is manifold. Given this qualification, Project Blackbox is a real, tangible step towards the purposeful engineering and mass production of modular infrastructure. The cobbler's children no longer have to go barefoot, and the industrial revolution can finally arrive for scalable computing.
However the market develops, I know my wife, Laurie, is relieved that Project Blackbox is finally, well, out of the box. For the past two years, whenever I saw a container anywhere, whether on the road, on a train, stacked aboard a ship, or sitting motionless at some job site, I'd predictably mumble "that could be one of ours...". And that would lead to my pleading for a commercial driver's license so I could haul them around on an 18-wheeler to different events. "Have you seen the way that you drive?" is the inevitable reply. Of course, I know she's right (and she's a far better driver than I, for the record).
But, Laurie, please, I'll only drive it on the weekends, and just around the block!