Walking with a limp

You can't please everybody. This customary parent or mentor wisdom is usually thrown at our lack of academic focus, or at our design choices as engineers. Generalists are useful, but complete knowledge was EOL announced when we left The Garden of Eden (or got evicted rather), and Last Shipped around Leonardo Da Vinci. Generalists cover just a different subset of the knowledge tree, exemplifying another form of a specialist. Shall we say Horizontal Specialists. Last time I had an antagonistic experience with an HMO General Practicioner, I voiced what I really thought about him. He called building security. Next time I will just use the Horizontal Specialist sobriquet, and walk away. But this blog is about the specialization of machines and devices. I will leave human and medical doctor specialization out to avoid entangling my beloved employer.

For computing devices the specialization dilemma is captured by the historical name we have been using for the processors and the servers we make, General Purpose . A machine suitable for many uses, possibly beyond its original designer's intent, say the optimists. An engineering specialty so narrow that its practicioners do not know much about the software and workloads above it, say the cynics. They are both right, "generalist" machines designed by "specialist" humans.

Those who drive teenagers to school may have seen how they drag their feet on their way to class. This teenager's mother commands him to stop being lazy and please pick up his feet. His father lectures him that men don't do whatever their mother's say, men do what they want. In an attempt to please both parents the poor teenager drags one foot and picks up the other, walking with a limp. Are we designing a limp into our machines by trying to please everybody? The CMT throughput server philosophy postulates that we can run faster if we don't limp, and indeed the UltraSPARC T1 based products we are launching these very days do just that. We decided to please throughput horizontal integer workloads at the expense of floating point single thread.

The immediate payback we get from creating more specialized general purpose processors and systems, is that a whole set of applications and deployment architectures take far less boxes. This benefit is compounded by the power efficiency of these UltraSPARC T1 boxes. Before you accuse me of spewing out unquantified generalities, I will offer quantification along two dimensions, within and across Moore's law process generations.

For purists interested in architectural prowess, comparing within a given process manufacturing technology is the fair comparison. Dealt a hand of cards (wafer cost, transistors, complexity, and power), architecture is the game of playing them best. But given exponential transistor increases across generations, ignoring the impact of process technology and just relying on architecture leads to certain defeat. We must compare both within and across generations.

Within a generation CMT is roughly an order of magnitude improvement. Take the web facing workloads I care about for some of my work, they run about 8 to 10 times faster on a Niagara system than on a contemporary general purpose Sparc processor consuming the same power, made in the same 90nm process, but having the limp of pleasing the single thread and Floating point constraints. Getting one order of magnitude out of the same power envelope and manufacturing technology is pretty compelling, but in the interest of full disclosure let's show the card we pulled out of our sleeve: Memory. CMT is all about making the most of system memory bandwidth, and in comparing within a generation CMT plays with much more memory bandwidth in its hand. Did we cheat? Not really, architecture is also about interfaces and optimizing the memory interface is part of playing our cards.

Comparing across generations is projecting the throughput ratio between CMT on the current vs. the next process technology nodes, while keeping the power and cost as invariants. If you were expecting the answer to be another order of magnitude, I'd like to have some of what you are smoking. Architectural order of magnitude improvements are kind of rare, so now we fall back to riding Moore's Law. Having lowered your expectations, here are the good news. Unlike previous limping approaches, CMTs will give you nice integer factors (2x for example) across Moore's law cycles. And that is all we can ask from a new architecture, a solid one time jump to a different curve, and climbing the new curve at least with the same rate as before the jump.

Our competitors don't neatly align their offerings to facilitate my simplistic two dimensional comparison within and across process technology, they put their stuff out and compete. I will just say that the competitive data gives me a warm feeling about this cool technology, and defer to my fellow Sun bloggers covering competitive angles much better and deeper than me.
I recommend following Welcome to the CMT era, a great repository of all things CMT at Sun by Richard McDougall so I can walk away from trying to please everybody and shift back to my original topic, generality vs. specialization.

Is the sacrifice in generality worth the benefit. Does the sacrifice break our axiomatic belief in layered modular system design, in not caring and not counting on the implementation details of other modules or layers. I will state a claim so counterintuitive that you might want to invite me in to explain myself and take a Breathalyzer test. We claim that a specialized CMT architecture actually broadens the applicability of the processor beyond what was possible with its original general purpose sibling. It derives enough generality to apply to other elements of the IP and telephony network infrastructure, which hopefully is the subject of a future posting, for which I have at most the heading. But if I don't get to it, feel free to give me a call and invite me.

[ Technorati: NiagaraCMT, ]


Post a Comment:
  • HTML Syntax: NOT allowed



« June 2016