Wednesday Jun 06, 2007
By jmeyer on Jun 06, 2007
A Look at POWER6's Lagging Architecture
Ever since Sun announced the world's first multi-core microprocessor, most of the high-volume chip-designing world (that world would be Sun, IBM, AMD, and Intel) has realized that the race is on to see who can build a microprocessor with the most cores and the most threads to handle the modern applications that thread-rich, Internet-based computing has spawned. That "most" includes everyone except IBM.
Declaring the end of instruction-level parallelism (ILP) and the advent of the era of thread-level parallelism (TLP) as far back as IDF 2003, Intel has already begun shipping quad-core processors (Sun is the first to ship systems). AMD will follow suit later this year. Sun, of course, set the bar in 2005 with its 8-core, 32-thread UltraSPARC T1 and will be shipping the 8-core, 64-thread "Niagara 2" processor in systems later this year. I'm guessing we'll probably beat the 2-core, 4-thread POWER6 to the volume market by a pretty solid margin, mostly because IBM failed to announce any availability of POWER6 in the volume (or high-end, for that matter) markets.
The reason for this enthusiasm around multithreading is described brilliantly in a whitepaper titled The Landscape of Parallel Computing Research: A View From Berkeley, written by a multidisciplinary group of Berkeley researchers, including the father of RISC, David Patterson. They see microprocessor performance hitting a "brick wall" due to three factors:
- The Power Wall: "Power is expensive, but transistors are 'free'. That is, we can put more transistors on a chip than we have the power to turn on." Particularly true on a 65nm chip like POWER6 or Niagara 2, unless you do something about it.
- The Memory Wall: "Load and store is slow, but multiply is fast. Modern microprocessors can take 200 clocks to access Dynamic Random Access Memory (DRAM), but even floating-point multiplies may take only four clock cycles." And increasing the size of the already great big caches traditionally used to mask memory latency isn't giving us a good return on the transistor investment anymore.
- The ILP Wall: "There are diminishing returns on finding more ILP. ... Increasing parallelism is the primary method of improving processor performance". They dismiss increasing clock frequency as the primary method of improving processor performance as old conventional wisdom.
To illustrate the point, Figure 2 of the whitepaper shows the impact of the old faster-clockrate, bigger-caches mentality that has prevailed in microprocessor design for the last thirty years:
The upper right part of the chart shows processor performance lagging since 2002. The green line coincides remarkably closely with the doubling of performance most people incorrectly attribute to Moore's Law. (Moore's Law is a statement about transistor density, not performance; but given that performance tracked transistor density so closely from 1986 to 2002, one could be forgiven for conflating the two. But not any longer.)
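For a feel of what that green line implies, here is a minimal sketch of the compounding arithmetic. It assumes the line represents the roughly 52% per year rate quoted elsewhere in this post; the function names are mine, purely for illustration.

```python
import math

# Assumption: the green line is a steady ~52% performance gain per year.
ANNUAL_GROWTH = 0.52


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def cumulative_gain(annual_growth: float, years: int) -> float:
    """Total performance multiplier after `years` of compound growth."""
    return (1 + annual_growth) ** years


if __name__ == "__main__":
    months = 12 * doubling_time_years(ANNUAL_GROWTH)
    print(f"doubling time: {months:.1f} months")
    gain = cumulative_gain(ANNUAL_GROWTH, 2002 - 1986)
    print(f"1986 to 2002: about {gain:.0f}x")
```

At that rate performance doubles in a bit under 20 months, which is why the curve is so easily mistaken for Moore's Law.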
So, higher clock frequencies are no longer the key to performance, and neither is spending your free Moore's Law transistors on bigger caches. So what did IBM do? They more than doubled their clock rate (2.2GHz to 4.7GHz) and quadrupled the size of their L2 on-chip caches (1.92MB on POWER5+, 8MB on POWER6). And what performance speed-up did they get for their efforts?
To answer that, let's look at IBM's proprietary rPerf benchmark which, according to IBM, is a benchmark that "simulates some of the system operations such as CPU, cache and memory. However, the model does not simulate disk or network I/O operations." Take a look at the right to see how well doubling the frequency and quadrupling the cache size worked for POWER6 [\*] (the green line is, once again, roughly 52% performance increase per year).
POWER6 is clearly in the microprocessor category of diminishing performance returns described by the Berkeley whitepaper because it has pinned its hopes on old, unimaginative, and out of date techniques that the rest of the industry has largely abandoned. For all IBM's hype around POWER6 being "convention shattering", it's still only two cores per chip and two threads per core, just like POWER5+. Moreover, they sacrificed the out-of-order execution that gave POWER5+ a boost to get the frequency and cache size increases. In most respects, its enhancements are completely evolutionary and not at all revolutionary. And in some respects, they're actually going backwards.
There's more. POWER6 is pretty much all we're going to see from IBM for the next three years at least (plus or minus still more clock speed increases!), since POWER7 won't be around until 2010 at the earliest. And there's still no word from IBM on when the missing entry-level and high-end POWER6 systems will show up.
So how is Sun stacking up against this? Sun has already publicly stated that we will be releasing three new 100% binary compatible SPARC processors over the course of the next eighteen months, each one optimized and targeted to different application workloads. And at the 2007 Sun Analyst Summit, Sun's Vice President of Systems, John Fowler, gave a presentation that included this performance roadmap for SPARC (I've updated the little sunburst milestones and once again, I've drawn in the bright green line representing roughly 52% performance increase per year):
We're already devastating IBM with Sun's CoolThreads™ servers in terms of performance, rack space, power, and cooling. By the end of this year, we'll do it again. And as I said in a previous blog entry, when the ROCK systems bring the power of chip multi-threading to the high end, there will be absolutely no reason any customer would want a POWER system unless that particular shade of IBM blue went better with his datacenter décor.
The reason for all this is that years ago, Sun recognized that we were facing exactly those problems that were mentioned in the Berkeley whitepaper and decided to do something extraordinary. Rather than focus on clock frequency and cache size, like IBM, we decided to question everything that everyone knew about how to make a microprocessor go fast. The result was we made a big bet on chip multi-threading, or CMT, that's been paying off since 2005. That's why Sun's CoolThreads servers are the fastest-ramping product line in Sun Microsystems history and why customers are so enthusiastically endorsing them.
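To make the CMT bet a little more concrete, here is a back-of-the-envelope sketch of how extra hardware threads can cover for the Memory Wall. It is a deliberately simplified textbook-style model, not a description of how any Sun processor is actually implemented. The 200-clock DRAM stall comes from the Berkeley quote above; the 50 compute cycles between misses is a number I made up for illustration, and `core_utilization` is a hypothetical helper.

```python
DRAM_STALL_CYCLES = 200          # from the Berkeley whitepaper quote
COMPUTE_CYCLES_PER_MISS = 50     # hypothetical workload figure, for illustration


def core_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles a core stays busy in a simple interleaving model.

    Each thread computes for `compute_cycles`, then waits `stall_cycles`
    on memory. While one thread waits, the core runs another, so the busy
    fraction grows with the thread count until it saturates at 1.0.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    period = compute_cycles + stall_cycles
    return min(1.0, threads * compute_cycles / period)


if __name__ == "__main__":
    # 4 threads per core matches UltraSPARC T1 (32 threads / 8 cores);
    # 8 matches "Niagara 2" (64 threads / 8 cores).
    for n in (1, 2, 4, 8):
        u = core_utilization(n, COMPUTE_CYCLES_PER_MISS, DRAM_STALL_CYCLES)
        print(f"{n} thread(s) per core: {u:.0%} busy")
```

Under these made-up numbers a single-threaded core spends most of its time waiting on memory no matter how fast its clock is, while a core with several threads keeps working. Real workloads, caches, and pipelines are far messier than this model.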
And you ain't seen nothin' yet. That's what's keeping IBM up at night.
\* This chart is based on normalizing the following rPerf numbers from IBM to 12.27: POWER5+@[…]GHz: 12.27; POWER5+@[…]GHz: 13.83; POWER6@3.5GHz: 15.85; POWER6@4.2GHz: 18.38; POWER6@4.7GHz: 20.13.
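For anyone who wants to redo the normalization, a minimal sketch follows. The rPerf values are the ones listed in the footnote; the clock speeds of the two POWER5+ entries are not legible there, so I label them only by position. The clock and cache ratios use the figures quoted in the post (2.2GHz to 4.7GHz, 1.92MB to 8MB). The `normalize` helper is just an illustrative name.

```python
# rPerf values as listed in the footnote. The two POWER5+ labels are
# placeholders: their clock speeds are not legible in the footnote.
RPERF = [
    ("POWER5+ (first entry)", 12.27),
    ("POWER5+ (second entry)", 13.83),
    ("POWER6 @ 3.5GHz", 15.85),
    ("POWER6 @ 4.2GHz", 18.38),
    ("POWER6 @ 4.7GHz", 20.13),
]
BASELINE = 12.27  # the value everything is normalized to


def normalize(scores, baseline):
    """Return (label, score / baseline) pairs."""
    return [(label, value / baseline) for label, value in scores]


if __name__ == "__main__":
    for label, ratio in normalize(RPERF, BASELINE):
        print(f"{label:24s} {ratio:.2f}x")
    # Ratios of the inputs IBM increased, from the figures in the post.
    print(f"clock ratio: {4.7 / 2.2:.2f}x")
    print(f"L2 cache ratio: {8 / 1.92:.2f}x")
```

The top POWER6 entry lands at roughly 1.64 times the baseline, against a clock ratio of about 2.1 and a cache ratio of about 4.2.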
Monday May 14, 2007
By jmeyer on May 14, 2007
For those of you whom I have not had the pleasure of meeting yet, my name is John Meyer, and I'm one of the seven SPARC server technical specialists in Sun Microsystems' U.S. field organization. My travels have taken me to many a different customer since I joined Sun over 13 years ago, but it was not until my good buddy, fellow office practical joker, and mentor Dave Edstrom started blogging recently that I thought I would take up the keyboard and start getting my thoughts out to cyberspace as well. I'm very passionate about Sun in general and SPARC in particular.
"Why should my long-winded rants, sanctimonious barbs, and brutal enthusiasm be confined to Sun's internal e-mail aliases? This is good stuff!", I thought to myself. It was also a great chance for me to Photoshop a cool logo and name my blog after my favorite scene from one of the funniest movies ever made (not to mention a cult classic among the systems engineers in the local Sun office).
If you're wondering why I'm so excited about SPARC after all these years, have a look at this chart, which summarizes the state of the SPARC ecosystem in early 1994, the year I started work here. Our biggest competition in the systems area was DEC's Alpha and we stacked up like this:
| MY WORLD IN 1994 | SuperSPARC | Alpha 21064A |
|---|---|---|
| Frequency (MHz) | 33 - 75 | 200 - 300 |
| Operating System | Solaris 2.2 was slow and buggy | Ultrix and VMS were world-class |
| Service and Support | Break-fix, immature | Legendary |
You should keep in mind that 1994 was a time before most people knew what the Internet was (I had to explain it to my mother when I landed the job at Sun; now she has a webpage, downloads music from iTunes, and tells me how to auction stuff on eBay). It was also a time when clock rate really was the leading indicator of performance in a system. Caches were still fairly new and uncomplicated, and interconnects were relatively simple. We were getting beaten hands-down on benchmarks for all the right technical reasons by Alpha, and I mean by miles.
But you know what the punchline is: DEC and Alpha are both gone forever, and SPARC is not just still around but on the leading edge of microprocessor and open systems technology. If we could win with what we had back then, we can certainly win with what we've got now.
I'm convinced that the reason for our victory is that our customers sensed the same thing that I've always known about Sun: we are one of the very few systems companies left who are doing anything really interesting. When not a single customer or analyst thought there was anything wrong with proprietary hardware and software, Sun saw open systems as a business strategy and forced all competitors to follow suit. When most people thought a network was no more than a tailpipe to a mainframe, our motto was "The Network is the Computer" (pretty obvious now, eh?) and we developed Java, the lingua franca of the Internet. And now we're often told by the purveyors of conventional wisdom that Sun's investments in SPARC and Solaris are a waste of money and resources. Why develop a processor and operating system when Intel and Linus (or Bill) can do those things for you? The reason is simple: if we only did what everyone else is doing, why in the world would anyone buy from us? That's usually when those purveyors of conventional wisdom fall silent.
I firmly believe that chip multithreading is the single most important technological advance the microprocessor world has seen in at least two decades, and we're the only ones who have it. A few months ago, I met with the CIO and CTO of a major financial firm in New York for a SPARC futures briefing. When I got to the part about ROCK, I told them, "This is the point where we finally put IBM out of the server business forever." They thought I was kidding and smiled, but when I told them that I was dead serious, they knew I meant it as only a fevered lunatic can mean something.
I also know that Solaris is unquestionably the most reliable, scalable, performant, and secure (not to mention technologically interesting) operating system in the world today. Don't take my word for it: when was the last time you saw anything in AIX, HP-UX, Linux, or Windows beat out thin-film solar energy panels and powdered inhalable insulin as the Wall Street Journal's number one most innovative technology of the year?
In short, it's just fun to work at Sun right now. Our customers know they get a competitive differentiator, not just a world-class piece of electronics, from us.
So I hope you'll tune in occasionally to read my thoughts, tolerate my rants, forgive my mistakes, and participate with me in this stunning new world of SPARC.