what can you do?
By relling on Aug 17, 2004
Q: What can you do with a billion transistors?
A: a whole lot of things!
Intel has announced that they expect to see 1 billion transistor designs in the 2005 timeframe. I'm not talking about memories, which are already at the billion-transistor size. I'm talking about logic devices at that density. Wow! Way back in grad school, I did some studies on wafer-scale integration where we were trying to produce multiprocessor designs which could use, essentially, more than one design on a wafer. The defect problems led us to develop all sorts of exotic interconnect strategies and make all sorts of irritating compromises. We always dreamed of having 100 million transistors, because then all of our design dreams would come true. Now, that density is almost everywhere. Late-model Itaniums are around 400 million transistors. The new Power5 has around 270 million transistors. Niagara will have lots of transistors too, though I don't know the exact count -- some details will be made public at the Hot Chips conference next week.
Q: So, what does all of this have to do with clusters?
A: a whole lot!
Much of the work put into clusters is intended to solve the problem of using more transistors. Let me explain. Back in the mid-80s, many companies, including the startup I was at, were trying to use lots of processors in clusters to solve compute-intensive problems. For the company I was at, the focus was on solving fluid dynamics problems. These sorts of problems can be solved with parallel systems, but it is not easy. In such systems we were always frustrated by the granularity of the division of work. If you got the proper mix of work-per-processor, memory-per-processor, and interconnect-use then you could get some impressive throughput. If not, then it might suck worse than a laboring uniprocessor. Debugging programs was very, very difficult and even keeping the hardware running was a big challenge.
Over the years, Moore's law has proven to be quite reliable in predicting what we can do next. In the mid 1980's we barely had 32-bit microprocessors which required a handful of supporting chips to make a system, while the memory size was about 4 MBytes and there were no on-processor caches. Today, we can easily have on-chip caches of 4 MBytes (about 200 million transistors with 6-transistor SRAM designs). In 2005, Intel implies, we could see 64 Mbytes of on-processor cache. But why waste all of those transistors on memory? Why not use the space to reduce the distance?
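The 6-transistor SRAM arithmetic above is easy to check yourself. Here's a back-of-the-envelope sketch; it assumes the classic 6T cell per bit and ignores tag arrays, decoders, and sense amps, so real caches cost somewhat more:

```python
# Rough transistor budget for an on-chip SRAM cache.
# Assumes a classic 6-transistor (6T) cell per bit; ignores tag,
# decode, and sense-amp overhead, so this is a lower bound.

TRANSISTORS_PER_BIT = 6
BITS_PER_BYTE = 8

def sram_transistors(cache_bytes):
    """Approximate transistor count for a cache of the given size."""
    return cache_bytes * BITS_PER_BYTE * TRANSISTORS_PER_BIT

mib = 2 ** 20
print(sram_transistors(4 * mib))   # 4 MBytes -> about 200 million transistors
```

Run it and you get 201,326,592 transistors for a 4 MByte cache -- a budget that simply didn't exist in the mid-80s.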
Q: What does distance have to do with anything?
A: everything in a cluster is a function of distance!
It is exceptionally difficult to move information faster than the speed of light. I won't say impossible because of Clarke's First Law. When we talk about on-chip timing difficulties at high speed, we do get concerned about sending data across a chip. For example, assuming a 95% phase velocity, we can move data 1 cm across a chip in about 35 picoseconds. There are a whole bunch of engineering trade-offs which make this difficult to achieve in a meaningful way on a chip, but I won't bore you with the details. When we go off-chip, this time increases dramatically. As we go across a printed circuit board, distance increases by about an order of magnitude and the engineering trade-offs make the achievable delay even worse (delay is directly proportional to distance). If you go off of the printed circuit board, add more orders of magnitude.
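The 35 picosecond figure falls straight out of the physics. A minimal sketch, assuming the same 95%-of-c phase velocity (real interconnects add drivers, repeaters, and routing detours on top of this):

```python
# Wire propagation delay as a pure function of distance.
# Assumes signals travel at 95% of the speed of light -- a lower
# bound; real on-chip wires are slower due to RC effects.

C = 299_792_458.0  # speed of light in vacuum, m/s

def propagation_delay_ps(distance_m, velocity_fraction=0.95):
    """One-way signal delay in picoseconds."""
    return distance_m / (velocity_fraction * C) * 1e12

print(round(propagation_delay_ps(0.01)))  # 1 cm across a chip -> 35 ps
```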
What this means, practically, is that you really want your processors very close to each other. In the early 1990's when Sun introduced its first symmetric multiprocessor (SMP) system designs, there was a whole bunch of academic literature which said that they would never scale because we couldn't get the right mix of distance and processing power. I recall being at the Usenix conference where a distinguished panelist from a well-known Unix development company stated categorically that SMPs would never scale past 4 processors. Later this became 8... soon after it was 64... today it is over 100. The key is the distance. You will not be able to build a cluster system where the distance is on the order of 100 m (e.g., Ethernet) which will be faster than a similar cluster system with a distance of 1 m (e.g., a Starcat backplane). And there is no way you could possibly beat a system with a distance of 1 cm (e.g., Niagara). In a nutshell, this is why processor designs like Niagara are very, very cool. We have the number of transistors needed to shrink the distance between processors. We are still running a cluster of sorts, but the distance between the nodes is 1 cm versus 100 m. We could have only dreamed of this back in the 1980's but it is reality today.
Q: What about clustering for availability -- isn't a Niagara a SPOF?
A: well, yes, actually, but that is fodder for another day...