Change is Good
By marchamilton on Apr 09, 2008
Almost all modern microprocessors are multi-core, that is, they have two or more CPU cores packaged into a single chip which fits into a socket on your computer. Until just a few years ago, that was not the case. In part, what enables multi-core and CMT processors is Moore's Law, which observes that the number of transistors in an integrated circuit roughly doubles every 18 months or so (as transistor sizes shrink due to process improvements). For at least twenty years, ending in about 2005, companies that designed CPUs used those smaller, more numerous transistors primarily to run their CPUs at faster clock rates. But this had diminishing returns. Many of us remember upgrading our 1 GHz PC to a 2 GHz PC, only to find that our web browser or word processor really didn't run much faster. While there are many application bottlenecks (such as getting data from the network and reading or writing to your disk drive), the main thing that slows down most applications is reading and writing data in your system's memory. And memory speeds have only doubled every six years or so, meaning that buying a processor with twice the clock rate, when your application was limited by memory speed, did little to improve overall performance. So for twenty years, microprocessor designers relied on a technique called instruction level parallelism to actually get more work done as the CPU speed increased. Basically, instruction level parallelism looks for opportunities within a single program to execute several independent instructions at the same time. But as clock rates reach multiple gigahertz, even the best instruction level parallelism can't keep up, at least so say some of the best known computer architects in the world, like John Hennessy.
If you don't recognize the name, John Hennessy is President of Stanford University and one of the best known professors of electrical engineering and computer science, thanks to his seminal textbook Computer Architecture: A Quantitative Approach. Professor Hennessy gave a talk at USC in 2005 which I attended with my teenage daughter (who isn't particularly into computer architecture). In his talk, he called 2005 the end of the road for instruction level parallelism. He then went on to describe Sun's Niagara 1 (UltraSPARC T1) processor, which would soon become the world's first processor with eight cores, each able to execute four separate programs, or threads, in parallel. Rather than look for diminishing opportunities to execute instructions from the same program in parallel, the Niagara processor does almost no instruction level parallelism but instead executes entire programs, or threads, in parallel, up to 32 in the original Niagara 1 processor. My daughter left the talk rather excited, saying, "Dad, now I understand what Sun is doing and why you get so excited about it."
Sun subsequently open sourced the design of the Niagara processor, then went on to build the Niagara 2 (UltraSPARC T2) processor, which is able to run a total of 64 threads in parallel, 8 on each of its 8 cores. Today, Sun launched the T5140 and T5240 servers, the world's first two-socket CMT servers. Powered by two UltraSPARC T2 Plus processors, these servers can execute an amazing 128 threads at the same time. So, you might ask, why would you ever need 128 threads on your desktop? I can think of one thread to run my word processor, one to download a web page in the background, and if I'm feeling really creative maybe another thread to be encoding a video. But 128 threads, who needs that?
The answer, of course, is that you don't need 128 threads today on your desktop, but lots of server applications can use this many threads and more. Think of a web server responding to 128 page view requests, or a server running Sun's MySQL database updating 128 bank account records, or just about anything else you would do on the web today. We have been quietly shipping these servers for several weeks already, and plenty of customers have had a chance to find things to do with their 128 threads. HPCVL, one of Canada's regional HPC centers, is already using several dozen of these servers to run their highly threaded HPC applications. If you are wondering how it might work for you, you can try one free for sixty days; we even pay the return shipping if you don't end up buying it.
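To give a feel for the programming model, here is a minimal, generic sketch (in Python, not Sun code, and with a made-up `handle_request` function) of the kind of thread-level parallelism these servers exploit: many independent requests serviced at the same time by a pool of worker threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "page view" handler: each request is independent of the
# others, which is what lets a machine with many hardware threads
# service many requests concurrently. In a real web server this would
# do I/O: read from a socket, query a database, render a page.
def handle_request(request_id):
    return f"response for request {request_id}"

# One worker per hardware thread; 128 matches the T5140/T5240 example.
with ThreadPoolExecutor(max_workers=128) as pool:
    responses = list(pool.map(handle_request, range(128)))

print(len(responses))  # prints 128
```

The point is that no single request runs any faster than on a conventional CPU; the win comes from throughput, because workloads like web and database serving are naturally made up of many independent threads.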
One thing to note: your operating system will have to know how to handle 128 threads effectively. Unless you are using Solaris, the operating system you are probably running on your x86 server most likely handles no more than 8 or perhaps 16 threads effectively. Luckily, Sun's Solaris operating system has been able to run powerful servers with this many threads for years, and it helps you get the maximum performance out of your T5140 or T5240 server. In the same 1RU or 2RU space of a typical x86 server, you can get five times or more the performance. Of course no single server or CPU is optimized for all applications, so we will still keep building our other servers, including our M9000 server (which runs up to 256 threads using a more conventional 64-socket SMP design) and our AMD and Intel powered rack mount and blade servers. In fact, I'm headed to Intel's Portland engineering center later this week to give a talk to some of their engineers. Not on the T5140, of course, but on Solaris. While Intel's current 4-core processors only run 1 thread per core today, Intel has certainly talked about its plans to build CPUs with more cores and more threads per core, which is exactly why they are spending so much time working with Sun engineers to optimize Solaris for their current and future x86-based CPUs.
And on the topic of change is good, I decided to change my hair style today. Actually, I lost a bet which is why I ended up with this change.