Tuesday Nov 20, 2007

POWER6 Goes Thud: Part VII

Whither the POWER6 product line? After half a year, it's still missing! What went wrong?

Tuesday Aug 28, 2007

POWER6 Goes Thud: Part VI

For a company that invented virtualization, IBM is still trying to catch up to Sun.

POWER6 Goes Thud: Part V

Three months is a long time to be in labor with octuplets and have only one baby to show for it.

Monday Jul 02, 2007

Confessions of an iPhone Lemming

This past Friday, June 29th, I joined the legions of people standing in line to buy Apple's new iPhone, finally ending six months of anticipation and anxiety. Berkeley Breathed even drew a cartoon about me for last Sunday's comics:

It really wasn't that bad. Plan A was to camp out on Apple's on-line store until 6:00PM Eastern and blast away at the order form so I could get one in a few days with rush delivery. But on the 28th I formulated Plan B, which was to check the local AT&T store to see how long the line was. Seeing no one there, I came back on the morning of the 29th and to my astonishment, there was still no one in line. I asked the store clerk what was up and he said to come back at 4:30 when they would close, form a line, and re-open at 6:00. Just to be sure, I got there at 3:30 and only 12 people were ahead of me. By 6:30 I'd not only gotten my iPhone, but I'd probably gotten one of the last 8GB ones at the store. Not bad for a Plan B.

But why line up for this thing at all? After all, cyberspace has been full of iPhone pooh-poohs:

  • AT&T is the sole carrier available for iPhone and their service is less than desirable (I complained that it took 3 hours to activate my phone until my friend told me his took over 24, and AT&T's portal for accessing my account is still not working more than two days later)
  • AT&T's EDGE network is too slow
  • there are no third-party apps available for iPhone (although this is about to change)
  • there is no instant messaging capability
  • there is no GPS (especially nasty if you want to get the most out of iPhone's incredible Maps widget)
  • there is no Bluetooth sync capability
  • one should never buy version 1.0 of anything
  • the next rev will be out before the battery runs down
  • etc. etc. etc.

All of these criticisms are right on the mark, and completely beside the point. To these people I say: You just don't get it, do you?

I don't mean that in the mushy, kumbaya sense of participating in one of the biggest cultural events to come out of the Left Coast since last month or whenever. I am, after all, a senior technologist in the world's only other innovative technology company. What I mean is, these may be reasons why you shouldn't get an iPhone; they are not reasons why I shouldn't get one. Let me explain to you where I'm coming from.

  1. I use an Apple MacBook Pro for everything. Before iPhone, I used to sync my Treo 680 using Missing Sync, which is basically the only thing anyone should use to sync a Palm device. Still, it was a third-party app in the way between my phone and my native desktop.
  2. Speaking of third-party apps, when I first entertained thoughts of buying an iPhone and did an inventory of all the apps on my Treo that I would have to do without, I realized that most of them were little $15 or $25 widgets I bought just to manage the phone itself or replace missing functionality in PalmOS. Here's a sample:
    • I used the amazing and wonderful DateBk6 on my Treo for calendar management and to handle time zones, which PalmOS does not. This was crucial, since my customer travel schedule means I'm in a different time zone almost every week. But wonderful app that it is, chalk up another third-party piece of software. In the path DateBk6 -> Missing Sync -> iCal, lots of information got lost (time zones, icons, etc.). On iPhone, I've got MacOS, just like on my desktop, so everything important to me migrates seamlessly. And free software such as jsCalendarSync means I'm only one step away from Sun's calendar server, and I can make my updated calendar available to others when I want them to see it, not every time I change something (unlike other real-time SyncML solutions).
    • I used Contacts5 to get decent address book functionality along with photo support. But once again, in the path Contacts5 -> Missing Sync -> Address Book, all the information that made Contacts5 worth buying didn't carry over. iPhone has no such problems.
    • I used Pocket Tunes to play my iTunes music on my Treo's measly 1GB memory card. But I had to maintain a separate, dummy play list in iTunes just to do so and of course, my iTunes playlist data did not copy over. Ditto for photos and iPhoto, despite the Missing Sync's photo conduit.
    • ChatterEmail was great as far as it went, and was much better than the bundled Versamail, but mostly it made me not check e-mail unless I absolutely had to.
    • Don't even get me started on how much of a pain it was to convert all my videos to ASF to get them to play (poorly) on the Treo.
    • I also had third-party backup utilities, new application sandboxers, and other little system-management tools, most of which I'd forgotten about until I did my Treo inventory.
    The most staggering result of this inventory is what I discovered when I went into Quicken to see how much I'd spent on all this stuff in the last year: over $300, about $15-$20 at a time! The only thing I'll truly miss is my shopping list manager. Everything else I can pretty much do without. My handheld device should be a communications and multimedia portal ... period! I don't care if it doesn't do my taxes, pay my rent, record my many failures at dieting, or anything else the Microsoft or Palm people are telling you distinguishes their clunky platforms (talk about people who don't get it!).
  3. I don't mind debugging version 1.0 of almost anything that comes from Apple. Besides, all those third-party Treo apps I told you about didn't exactly play nice with each other, even after five and a half years of PalmOS maturity. The most egregious example of this is when my Treo kept rebooting when I was on a call and someone else tried to call me at the same time. That's probably worth the price of iPhone admission right there.
  4. It will be nice to update my iPhone software through iTunes like an iPod, rather than back the whole thing up, wipe the firmware clean, restore everything, and then go looking for the pieces that didn't make it, as happened all too often on my Treo.
  5. I was already an AT&T wireless customer from the Cingular acquisition and despite all the whining about their lousy service, no one has ever made any compelling argument about how much better it is elsewhere.

In almost every respect, I come out way ahead of where I was three days ago before my purchase.

If all this sounds like a long-winded rant designed more to justify the $600+ expense to myself than to you, I really don't care. The iPhone is simply a beautiful combination of complex technologies, made all the more beautiful by its ability to hide their complexity. Like all Apple products, it just works and I wanted it. So iPhone, here I come. And when the next rev comes out in another year (or less) and is twice as good, I'll buy that one too.

In my next entry, I'll tell you what I like, what I don't like, and what I'll wish for in this magnificent little device.

Lemmings of the world, rejoice! It's a nice cliff and the water's warm!

Wednesday Jun 06, 2007

POWER6 Goes Thud: Part IV

A Look at POWER6's Lagging Architecture

Ever since Sun announced the world's first multi-core microprocessor, most of the high-volume chip-designing world (that world would be Sun, IBM, AMD, and Intel) has realized that the race is on to see who can build a microprocessor with the most cores and the most threads to handle the modern applications that thread-rich, Internet-based computing has spawned. That "most" includes everyone except IBM.

Declaring the end of instruction-level parallelism (ILP) and the advent of the era of thread-level parallelism (TLP) as far back as IDF 2003, Intel has already begun shipping quad-core processors (Sun is the first to ship systems). AMD will follow suit later this year. Sun, of course, set the bar in 2005 with its 8-core, 32-thread UltraSPARC T1 and will be shipping the 8-core, 64-thread "Niagara 2" processor in systems later this year. I'm guessing we'll probably beat the 2-core, 4-thread POWER6 to the volume market by a pretty solid margin, mostly because IBM failed to announce any availability of POWER6 in the volume (or high-end, for that matter) markets.

The reason for this enthusiasm around multithreading is described brilliantly in a whitepaper titled The Landscape of Parallel Computing Research: A View From Berkeley, written by a multidisciplinary group of Berkeley researchers, including the father of RISC, David Patterson. They see microprocessor performance hitting a "brick wall" due to three factors:

  • The Power Wall: "Power is expensive, but transistors are 'free'. That is, we can put more transistors on a chip than we have the power to turn on." Particularly true on a 65nm chip like POWER6 or Niagara 2, unless you do something about it.
  • The Memory Wall: "Load and store is slow, but multiply is fast. Modern microprocessors can take 200 clocks to access Dynamic Random Access Memory (DRAM), but even floating-point multiplies may take only four clock cycles." And increasing the size of the already great big caches traditionally used to mask memory latency isn't giving us a good return on the transistor investment anymore.
  • The ILP Wall: "There are diminishing returns on finding more ILP. ... Increasing parallelism is the primary method of improving processor performance". They dismiss increasing clock frequency as the primary method of improving processor performance as old conventional wisdom.

To illustrate the point, Figure 2 of the whitepaper shows the impact of the old faster-clockrate, bigger-caches mentality that has prevailed in microprocessor design for the last thirty years:

The Brick Wall

The upper right part of the chart shows processor performance lagging since 2002. The green line coincides remarkably closely with the doubling of performance most people attribute incorrectly to Moore's Law. (Moore's Law is a statement about transistor density, not performance; but given the fact that performance had been tracking so closely with transistor density from 1986 to 2002, one could be forgiven for collapsing the two -- but not any longer.)

So, higher clock frequencies are no longer the key to performance, and neither is spending your free Moore's Law transistors on bigger caches. So what did IBM do? They more than doubled their clock rate (2.2GHz to 4.7GHz) and quadrupled the size of their L2 on-chip caches (1.92MB on POWER5+, 8MB on POWER6). And what performance speed-up did they get for their efforts?

rPerf Relative to POWER5+

To answer that, let's look at IBM's proprietary rPerf benchmark which, according to IBM, is a benchmark that "simulates some of the system operations such as CPU, cache and memory. However, the model does not simulate disk or network I/O operations." Take a look at the chart to see how well doubling the frequency and quadrupling the cache size worked for POWER6 [*] (the green line is, once again, roughly 52% performance increase per year).

POWER6 is clearly in the microprocessor category of diminishing performance returns described by the Berkeley whitepaper because it has pinned its hopes on old, unimaginative, and out of date techniques that the rest of the industry has largely abandoned. For all IBM's hype around POWER6 being "convention shattering", it's still only two cores per chip and two threads per core, just like POWER5+. Moreover, they sacrificed the out-of-order execution that gave POWER5+ a boost to get the frequency and cache size increases. In most respects, its enhancements are completely evolutionary and not at all revolutionary. And in some respects, they're actually going backwards.

There's more. POWER6 is pretty much all we're going to see from IBM for the next three years at least (plus or minus still more clock speed increases!), since POWER7 won't be around until 2010 at the earliest. And there's still no word from IBM on when the missing entry-level and high-end POWER6 systems will show up.

So how is Sun stacking up against this? Sun has already publicly stated that we will be releasing three new 100% binary compatible SPARC processors over the course of the next eighteen months, each one optimized and targeted to different application workloads. And at the 2007 Sun Analyst Summit, Sun's Vice President of Systems, John Fowler, gave a presentation that included this performance roadmap for SPARC (I've updated the little sunburst milestones and once again, I've drawn in the bright green line representing roughly 52% performance increase per year):

SPARC Processor Performance Increases

We're already devastating IBM with Sun's CoolThreads™ servers in terms of performance, rack space, power, and cooling. By the end of this year, we'll do it again. And as I said in a previous blog entry, when the ROCK systems bring the power of chip multi-threading to the high end, there will be absolutely no reason any customer would want a POWER system unless that particular shade of IBM blue went better with his datacenter décor.

The reason for all this is that years ago, Sun recognized that we were facing exactly those problems that were mentioned in the Berkeley whitepaper and decided to do something extraordinary. Rather than focus on clock frequency and cache size, like IBM, we decided to question everything that everyone knew about how to make a microprocessor go fast. The result was we made a big bet on chip multi-threading, or CMT, that's been paying off since 2005. That's why Sun's CoolThreads servers are the fastest-ramping product line in Sun Microsystems history and why customers are so enthusiastically endorsing them.

And you ain't seen nothin' yet. That's what's keeping IBM up at night.

* This chart is based on normalizing the following rPerf numbers from IBM to 12.27: POWER5+@1.9GHz: 12.27; POWER5+@2.1GHz: 13.83; POWER6@3.5GHz: 15.85; POWER6@4.2GHz: 18.38; POWER6@4.7GHz: 20.13.
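
If you'd like to check the normalization yourself, here is a minimal Python sketch. It uses only the rPerf figures quoted in the footnote; the names `RPERF`, `BASELINE`, and `normalize` are my own for illustration and have nothing to do with how IBM computes rPerf.

```python
# rPerf figures quoted in the footnote; the baseline is POWER5+ at 1.9GHz.
RPERF = {
    "POWER5+@1.9GHz": 12.27,
    "POWER5+@2.1GHz": 13.83,
    "POWER6@3.5GHz": 15.85,
    "POWER6@4.2GHz": 18.38,
    "POWER6@4.7GHz": 20.13,
}
BASELINE = "POWER5+@1.9GHz"


def normalize(scores, baseline):
    """Return each score divided by the baseline system's score."""
    base = scores[baseline]
    return {name: value / base for name, value in scores.items()}


if __name__ == "__main__":
    for name, ratio in normalize(RPERF, BASELINE).items():
        print(f"{name:<16} {ratio:.2f}x")
```

On these figures the 4.7GHz POWER6 lands at roughly 1.64 times the POWER5+ baseline.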

POWER6 Goes Thud: Part III

I was going to devote another blog entry to exposing the benchmarketing perfidy that IBM pulled in their POWER6 announcement the other day, but I can't possibly top the great analyses performed by the enigmatic, masked BM Seer. Thanks, Seer!

Monday Jun 04, 2007

POWER6 Goes Thud: Part II

A Look at IBM's TPC-C Results

One has to wonder what IBM was thinking when they published results on the fifteen-year-old TPC-C benchmark as part of their drive to impress the world about the performance characteristics of the single POWER6™ system they announced the other day. IBM's reasons couldn't possibly have included helping their customers make intelligent buying decisions by reflecting a modern, real-world workload with a reasonable database architecture. Criticisms of TPC-C over the years are legion, and it's probably not useful to repeat them here. Suffice it to say that Gartner, IDC, and Oracle have weighed in, and this is one of the reasons that the Transaction Processing Performance Council (TPC) announced a new benchmark in March, TPC-E, which goes much further to reflect 21st-century OLTP workloads.

According to TPC-C's description: "TPC-C simulates a complete environment where a population of terminal operators executes transactions against a database. The benchmark is centered around the principal activities (transactions) of an order-entry environment. These transactions include entering and delivering orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses." It should be noted that there is no set piece of code to execute: the submitter is allowed to architect the system and write the SQL code however he wishes. And this is where the fun begins.

Let's take a look at IBM's configuration, which you can see for yourself in the full disclosure report (FDR) that they submitted to the TPC.

IBM's TPC-C Hardware Configuration

Pretty impressive database engine, isn't it? But did I mention that according to the FDR (page 27), the total database table size was about 13TB? Let me say that again, so you'll know it's not a misprint: IBM used almost 120 terabytes worth of disks (3,482 of them, to be exact) to store 13 terabytes of table data. When was the last time you asked your boss to give you money for 42 4Gb fibre channel connections to access 13TB of data? Or funds for 3,482 spindles so you could use only 10% of the space on them? Where do you think you'll be able to dig up 3,312 36GB disk drives? Are you even running DB2 in your datacenter?
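
The storage arithmetic is easy to reproduce. A rough Python sketch follows, using only the figures quoted above; it assumes decimal units (1TB = 1000GB), and the function names are mine, not anything from the FDR.

```python
# Figures quoted above from the FDR discussion.
# Assumption: decimal units, 1TB = 1000GB.
TABLE_DATA_TB = 13     # total database table size
SMALL_DRIVES = 3312    # number of 36GB drives
SMALL_DRIVE_GB = 36
QUOTED_RAW_TB = 120    # "almost 120 terabytes worth of disks"


def raw_capacity_tb(drive_count, drive_gb):
    """Raw capacity of a set of identical drives, in decimal terabytes."""
    return drive_count * drive_gb / 1000


def utilization(data_tb, raw_tb):
    """Fraction of raw capacity actually holding table data."""
    return data_tb / raw_tb


if __name__ == "__main__":
    small_tb = raw_capacity_tb(SMALL_DRIVES, SMALL_DRIVE_GB)
    share = utilization(TABLE_DATA_TB, QUOTED_RAW_TB)
    print(f"36GB drives alone: {small_tb:.1f}TB raw")
    print(f"Table data as a share of quoted raw capacity: {share:.0%}")
```

The 36GB drives by themselves account for about 119TB, and 13TB of table data works out to just under 11% of the quoted 120TB.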

I should mention that nothing IBM did here was in violation of the rules of the TPC, but it strains the imagination to see how their result could possibly be useful to anyone making OLTP database performance comparisons. This is why Sun hasn't published a TPC-C result since 2001, preferring instead iGEN OLTP or running real applications such as Oracle Apps or SAP.

If you still believe that IBM's TPC-C results actually tell you something useful, then take a look at the following chart, compiled by my colleague Pavel Anni using IBM's previous FDR:

IBM's TPC-C RESULTS          | System p 570 (POWER5+) | System p 570 (POWER6) | Change
Number of CPUs               | 8                      | 8                     | 0%
Number of Cores              | 16                     | 16                    | 0%
Number of Threads            | 32                     | 32                    | 0%
Amount of RAM, GB            | 512                    | 768                   | 50%
Frequency, GHz               | 2.2                    | 4.7                   | 114%
tpmC                         | 1,025,170              | 1,616,162             | 58%
$/tpmC                       | 4.42                   | 3.54                  | -20%
List price, server           | $3,407,192             | $2,141,159            | -37%
List price, complete         | $7,621,279             | $9,419,485            | 24%
3-Year Maintenance           | $346,608               | $567,162              | 64%
Discount                     | $3,423,619             | $4,273,465            | 25%
Price, including 3-Year TCO  | $4,544,268             | $5,713,181            | 26%
Database Software            | DB2 UDB 8.2            | DB2 Enterprise 9      | N/A
Availability Date            | May 31, 2006           | November 21, 2007     | N/A
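
The percentages in the last column are plain relative changes between the two results. As a quick sanity check, here is a small Python sketch that recomputes a few of them from the values in the chart; `pct_change` and `ROWS` are illustrative names of my own.

```python
def pct_change(old, new):
    """Percentage change from old to new, e.g. 512 -> 768 is +50."""
    return (new - old) / old * 100


# (metric, previous result, POWER6 result) as listed in the chart above
ROWS = [
    ("Amount of RAM, GB", 512, 768),
    ("Frequency, GHz", 2.2, 4.7),
    ("tpmC", 1_025_170, 1_616_162),
    ("List price, server", 3_407_192, 2_141_159),
    ("List price, complete", 7_621_279, 9_419_485),
    ("3-Year Maintenance", 346_608, 567_162),
]

if __name__ == "__main__":
    for metric, old, new in ROWS:
        print(f"{metric:<22} {pct_change(old, new):+.0f}%")
```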

At least IBM has given you something to compare the performance of the old and new processors, right? Think again. If that were IBM's benign intent, why did they add 50% more memory? Why did they use a different version of DB2? Customers looking to upgrade their POWER5+ systems to POWER6 have no idea how much of the 58% performance improvement was due to better performance of the new processor and how much was due to the larger memory configuration and newer software. And considering a 114% increase in clock rate (and a quadrupling of their level-2 cache sizes), why did they achieve only a 58% performance boost? And where are IBM's power consumption and cooling numbers? TPC-C does not require reporting such data, but I'd be willing to bet that POWER6's much-ballyhooed power reduction features were inactive for this run. Especially since their press release said that customers could choose higher performance or lower power consumption, but not both.

So IBM (1) wasn't interested in running a real-world configuration on a modern-day workload, even though TPC-E is available; and (2) wasn't interested in giving customers a valid apples-to-apples comparison with the old benchmark. So why did they do this?

Giving customers useful information is not what this is about. If you don't believe me, here's what IBM Fellow Bruce Lindsay said (page 73): "Well, the benchmarking business is dirty work. The idea is to get the numbers by hook and by crook. ... [M]uch of what we're doing in the TPC-C realm these days is in a performance range that goes beyond what any user is doing." There's a word for this in the industry: benchmarketing. And IBM is the best in the business.

They understand that customers are hungry for a number, any number, however irrelevant, because it's much harder for a customer to take the time to benchmark his own code on a try-and-buy system or in a benchmark center. I can understand this; I used to be in the exact same situation in a previous job. I know how hard it is to be under the gun to finish a proposal or make a project deadline with no time to do the kind of rigorous performance bake-off that is truly necessary to make an intelligent purchasing decision. But IBM is well aware of your predicament too, has been taking advantage of it for years, and will continue to do so until their customers speak up.

TPC, TPC Benchmark C, TPC-C, tpmC are trademarks of the TPC. Please see www.tpc.org for more details.

Sunday Jun 03, 2007

POWER6 Goes Thud: Part I

Or, How Clark W. Griswold Wound Up With the Wagonqueen Family Truckster

A little over a month ago, Sun announced seven new Sun SPARC Enterprise Servers, along with new virtualization capabilities, new reliability/availability/serviceability features, breathtaking memory and I/O bandwidth, and new world records on seven performance benchmarks. Every one of the new servers -- entry level, mid-range, and high-end -- is shipping to customers today. Including, of course, the ones that have those new capabilities and set those performance records.

Can you imagine if we had issued a press release announcing:

  1. only one of the seven servers, say, the eight-way mid-range M5000
  2. extravagant performance over existing servers, providing as "proof"
  3. "ultra-high frequency" processors but absolutely no data about whether the machines will turn your datacenter into a puddle of molten metal
  4. a promise for a set of software features and virtualization technologies that exist today only in other people's operating systems
  5. absolutely no word on where the missing six servers were or when they would be available

We wouldn't have had the nerve to face our customers the next day. Apparently, IBM doesn't grapple with perception issues the same way we do, because what I just described is exactly what IBM foisted on the public two weeks ago when they gave us an IOU for a new POWER6™ product line.

Wagonqueen Family Truckster

I'm reminded of the scene in National Lampoon's Vacation in which the hapless Clark W. Griswold drives into Lou Glutz Motors to pick up his new Antarctic blue Sports Wagon with the C.B. radio and Rally Fun-Pack only to be told by the slimy salesman, Ed, that it won't be in for another six weeks. When Clark demands that his trade-in be returned immediately so he can take his business elsewhere, he discovers that it has been smashed pancake-thin in the scrapyard metal crusher. Knowing that Clark has nowhere else to turn, Ed convinces him that the metallic pea Wagonqueen Family Truckster (there are dozens of them, unsold, on the lot) is the car he really wants to take his family across country to Walley World.

I predict IBM will be forced to do the same thing by selling customers down-clocked versions of POWER6 for a long time (they announced 3.5GHz, 4.2GHz, and 4.7GHz parts, and benchmarked the 4.7s). If you look closely at the full disclosure report that IBM turned in with its TPC-C results, you will notice that it lists an availability date of November 21, 2007. IBM announced the POWER6-ized System p 570 on May 21, 2007. I'll do the math for you: that's exactly six months to the day in between. Would you like to take a stab at the absolute limit the Transaction Processing Performance Council places on the period of time between announcement of a result and the availability of a system? If you guessed "exactly six months", give yourself a pat on the back. Next question: what price does IBM pay if they don't make that November deadline and have to rescind the result? If you guessed "precisely $0.00", go have a congratulatory beer. But they will have gotten six full months of penalty-free hype, which is, after all, the point of running TPC-C in this day and age in the first place. In the meantime, IBM will try to sell customers a bunch of down-clocked Family Trucksters whose performance on real-world applications can only be guessed at.
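
For anyone who doesn't trust my math, here is a short Python sketch of the date arithmetic. The two dates are the ones cited above; `whole_months_between` is a hypothetical helper of my own, not anything defined by the TPC.

```python
from datetime import date

ANNOUNCED = date(2007, 5, 21)   # System p 570 announcement date cited above
AVAILABLE = date(2007, 11, 21)  # availability date cited from the FDR


def whole_months_between(start, end):
    """Count whole calendar months from start to end (end on or after start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


if __name__ == "__main__":
    months = whole_months_between(ANNOUNCED, AVAILABLE)
    days = (AVAILABLE - ANNOUNCED).days
    print(f"{months} months ({days} days)")
```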

Obviously, I can't say that IBM won't be shipping any 4.7GHz POWER6 systems in the next six months; I'm just saying that you probably won't get one. When a company has little to offer in the way of technology innovation except ratcheting up the processor's clock rate, it pushes the laws of physics into areas where it gets extremely low yields on those dies. These are of course sold at a huge price premium and then allocated only to the best customers, typically under an early-access program. The Griswolds out there will just have to wait to see the performance promised rather extravagantly by IBM's marketing department. And if IBM promises you a 4.7GHz system, how confident do you feel, based on their apparent lack of confidence, that they can deliver? Got a schedule you need to keep on your journey to Walley World?

When Sun's internal engineering aliases were abuzz with IBM's IOU-for-an-announcement, I sent out an e-mail predicting that IBM will follow standard Imperial procedure and dump their garbage before going to light speed. (I know, I'm mixing my movie metaphors. And only one person wrote back saying he got the reference.) But sometimes you just get the feeling that history is repeating itself.

I'll be taking a closer look at IBM's claims around performance, virtualization, AIX, and more in the following posts. In the meantime, don't let anyone try to sell you the Family Truckster ;-)

Monday May 14, 2007


For those of you whom I have not had the pleasure of meeting yet, my name is John Meyer, and I'm one of the seven SPARC server technical specialists in Sun Microsystems' U.S. field organization. My travels have taken me to many a different customer since I joined Sun over 13 years ago, but it was not until my good buddy, fellow office practical joker, and mentor Dave Edstrom started blogging recently that I thought I would take up the keyboard and start getting my thoughts out to cyberspace as well. I'm very passionate about Sun in general and SPARC in particular.

"Why should my long-winded rants, sanctimonious barbs, and brutal enthusiasm be confined to Sun's internal e-mail aliases? This is good stuff!", I thought to myself. It was also a great chance for me to Photoshop a cool logo and name my blog after my favorite scene from one of the funniest movies ever made (not to mention a cult classic among the systems engineers in the local Sun office).

If you're wondering why I'm so excited about SPARC after all these years, have a look at this chart, which summarizes the state of the SPARC ecosystem in early 1994, the year I started work here. Our biggest competition in the systems area was DEC's Alpha and we stacked up like this:

MY WORLD IN 1994    | SuperSPARC                     | Alpha 21064A
Frequency (MHz)     | 33 - 75                        | 200 - 300
Process (μm)        | 0.8                            | 0.5
Operating System    | Solaris 2.2 was slow and buggy | Ultrix and VMS were world-class
Service and Support | Break-fix, immature            | Legendary

You should keep in mind that 1994 was a time before most people knew what the Internet was (I had to explain it to my mother when I landed the job at Sun; now she has a webpage, downloads music from iTunes, and tells me how to auction stuff on eBay). It was also a time when clock rate really was the leading indicator of performance in a system. Caches were still fairly new and uncomplicated, and interconnects were relatively simple. We were getting beaten hands-down on benchmarks for all the right technical reasons by Alpha, and I mean by miles.

But you know what the punchline is: DEC and Alpha are both gone forever, and SPARC is not just still around but on the leading edge of microprocessor and open systems technology. If we could win with what we had back then, we can certainly win with what we've got now.

I'm convinced that the reason for our victory is that our customers sensed the same thing that I've always known about Sun: we are one of the very few systems companies left who are doing anything really interesting. When not a single customer or analyst thought there was anything wrong with proprietary hardware and software, Sun saw open systems as a business strategy and forced all competitors to follow suit. When most people thought a network was no more than a tailpipe to a mainframe, our motto was "The Network is the Computer" (pretty obvious now, eh?) and we developed Java, the lingua franca of the Internet. And now we're often told by the purveyors of conventional wisdom that Sun's investments in SPARC and Solaris are a waste of money and resources. Why develop a processor and operating system when Intel and Linus (or Bill) can do those things for you? The reason is simple: if we only did what everyone else is doing, why in the world would anyone buy from us? That's usually when those purveyors of conventional wisdom fall silent.

I firmly believe that chip multithreading is the single most important technological advance the microprocessor world has seen in at least two decades, and we're the only ones who have it. A few months ago, I met with the CIO and CTO of a major financial firm in New York for a SPARC futures briefing. When I got to the part about ROCK, I told them, "This is the point where we finally put IBM out of the server business forever." They thought I was kidding and smiled, but when I told them that I was dead serious, they knew I meant it as only a fevered lunatic can mean something.

I also know that Solaris is unquestionably the most reliable, scalable, performant, and secure (not to mention technologically interesting) operating system in the world today. Don't take my word for it: when was the last time you saw anything in AIX, HP-UX, Linux, or Windows beat out thin-film solar energy panels and powdered inhalable insulin as the Wall Street Journal's number one most innovative technology of the year?

In short, it's just fun to work at Sun right now. Our customers know they get a competitive differentiator, not just a world-class piece of electronics, from us.

So I hope you'll tune in occasionally to read my thoughts, tolerate my rants, forgive my mistakes, and participate with me in this stunning new world of SPARC.



