Friday Jan 25, 2008

Survivor Bias

Apropos of second generations, our second-generation ATCA lineup has been announced, and it has 10G all over. This eWeek article ran back in November, but may still be news to readers of my sparse blog.

10G and ATCA are a great fit because no optics are needed to run 10G between blades and fabric. The server 10G cost barrier is gone. Gone too is another 10G adoption barrier: the need to upgrade both the server and the switch at the same time. Our ATCA refresh is quite comprehensive, with UltraSPARC T2 and x86 processors, with 10G in the blades (Neptune), in the fabric, through RTMs, and with server and packet processing on the same blade (see previous blog entry).

As all things ATCA get interesting, some observers ask about the one or two competitors that quit. As a telco and carrier platform supplier we definitely like the technical, market, and standards attributes of ATCA, though some may argue that our perspective carries "Survivor Bias".

The Survivor Bias artifact, in finance and in statistics, is the hazard of excluding the population samples that did not survive. Like measuring the growth of all public companies yet overlooking the companies that went under. Or worse, the ones that went under and survived (ask me how I know).
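
A toy simulation makes the hazard concrete; every number below is invented for illustration, not market data:

    import random

    random.seed(1)

    # Invented distribution: each company's growth is a draw from a
    # normal; companies below a failure threshold go under and drop
    # out of the measured population.
    growth = [random.gauss(0.05, 0.30) for _ in range(10_000)]
    survivors = [g for g in growth if g > -0.40]

    true_avg = sum(growth) / len(growth)
    measured_avg = sum(survivors) / len(survivors)
    print(f"all companies: {true_avg:+.1%}, survivors only: {measured_avg:+.1%}")

Measuring only the survivors flatters the average; the dead tell no statistics.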

Recently I tasted Survivor Bias when our instructor wrapped up a leadership training course. The closing, the Grand Finale, was an attempt to put our professional travails in perspective by hearing what famous people thought really mattered in life, in retrospect, mind you. I silently cried foul. I felt that the success of these quotable luminaries tainted their collective wisdom with Survivor Bias. Their perspective devalued the priorities that consume most of my daily energies. And sure enough they no longer worry about the daily grind. Their job is done.





It took me a while to recognize two distinct objections at play: one has to do with judgment calls a posteriori, once the outcome is known; the other with the effect of survival itself on the validity of the data.

The temporal objection is against the natural diminution of bridges already crossed. If our preoccupation is to be a strong link in a much longer chain, then once the chain moves on, our opinions and priorities are of much lesser value. My objection against judging past priorities stands.

Outside of hindsight, the value of the survivor perspective may depend on the survival process. The Car Talk radio show had a puzzler where a mathematician recommends armor-plating WWII airplane wings and fuselage in the spots where the returning airplanes showed no bullet holes. The absence of holes does not mean those areas were never hit; it suggests vulnerability, if planes hit in those spots tend not to return.

This is just to illustrate how the validity of survivor data depends on whether we think the survival stochastics are random or correlated. The example is neat because, given a large sample, the mathematician can determine both the existence of the correlation and the areas to protect.
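
For the record, a sketch of the mathematician's inference using Bayes' rule, under the assumption that enemy fire lands uniformly, so a zone z covering a share a_z of the airframe receives that share of all hits. The share of holes observed in zone z on the returning planes is then

    \hat{p}_z = a_z \cdot \frac{P(\mathrm{return} \mid \mathrm{hit\ in\ } z)}{P(\mathrm{return})}

so finding \hat{p}_z well below a_z means a hit in zone z lowers the odds of coming home: armor zone z.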

Back to ATCA: our perspective is timely rather than "a posteriori", so we are safe there. And Sun's Netra platform success in the Telco market was not random, so on the correlation front we are peachy. Finally, we back our perspective with a new generation of ATCA products. Every generation reinforces wings and fuselage by looking at its bullet holes. And at the bullet holes of competitors that did not return from the sortie.

Today's Links:

Sun Netra ATCA Blade Server

UltraSPARC T2 on ATCA

Opteron on ATCA

High Throughput Packet Processing White Paper




Tuesday Oct 09, 2007

A Beef with Darwin

A special post indeed in this series of amorphous blogs initially inspired by CMT topics. The arrival of second generation CMT systems (the T5120 and family) is a critical evolutionary step, and a good excuse to contemplate the evolution theme.

Interpreting Darwin's evolution as the natural selection of "good" heritable traits, my beef is that, for selection to work, bad traits must “express themselves” before they propagate. If we have children in our twenties but get really sick in our seventies, it may be too late for Darwin to kick in. We could even postulate an evolution path towards species that are perfectly healthy until reproduction age, but not a day more. NiCd battery species, kind of.

It is not fair to mess with Darwin when he isn't around to respond. We can taunt contemporary scientists instead by stating that, thanks to evolution, the path to longevity is through stretching the reproduction age rather than expensive medicine. Let's postpone marriage by 40 years and watch life expectancy soar! There. Having taunted the health sciences community plus many other innocent bystanders, let's now run to technology topics for cover.

Darwinian world or not, computer systems don't randomly mutate traits over infinite product generations; traits are introduced deliberately and rather frequently. Before creationists celebrate the “deliberate” aspect, let me debunk intelligent creation with a single stroke, a three-finger stroke to be precise. The infamous CTRL-ALT-DEL single-handedly debunks both intelligent creation and evolution theories for computer systems. OK, I take back "single-handedly"...

Products and technologies are vulnerable in their long term survival, much more so than dinosaurs were in their day. Second generations, like the T5120, are then as much about product family continuity as about constant technology improvement. Improvements that are as natural as the desire for our children to go beyond our own reach.




CMT waltzes to the cadence of Moore's Law, and T2 boxes are here less than two years after the T1 boxes. True to the spirit of the Law, T2 systems doubled the number of physical threads per socket. The thread bump has replaced the speed bump. The bump is a nice integer 2x factor, actually more than 2x: after all, the T2 processor also has a faster pipeline, a larger cache with higher associativity, crypto acceleration, more memory bandwidth, a no-compromises floating point unit per core, and built-in 10G networking.

The links below have plenty of data on T2 systems across different workloads, so I will stick to the networking angles for now.

High Speed Networking evolved in multiples of ten: 100 Mbps around 1995, and two 10x factors since then took us to 10 Gbps. Processors doing 2x every two years over the same span would be a factor of 64, modulo sampling noise. As processors and networks evolve, the question is how to jump across these moving trains. When should the next network speed be adopted? The answer is easy if we agree on who is at the center, the host or the network.
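
For the record, the arithmetic behind "modulo sampling noise", taking 1995 to 2007 as the span (round numbers, not a careful survey):

    \text{network: } \frac{10\ \text{Gbps}}{100\ \text{Mbps}} = 10^2 = 100\times \qquad \text{processors: } 2^{12/2} = 2^6 = 64\times

Close enough that the two moving trains travel at comparable speeds, which is what makes the jumping question interesting.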

If your religion is host centric then, for their own sake, put servers on 10G as soon as they can do more than 1G. If network infrastructure is supreme, then build its temple only with processors and servers that can do more than 10G. Copernicus isn't around either to settle centricity questions, so we built these T2 based systems to satisfy both cases.

No-compromises 10G networking infrastructure on general purpose processors and general purpose COTS platforms. Specifically: dual 10G Ethernet interfaces, no I/O bus bottlenecks, multi-threaded networking, packet classification, virtualization, watertight domain isolation, policy, asymmetrical multi-processing, packet processing pipelines, Crossbow, data plane, crypto, and short packet efficiencies, all coming together in a single socket box or blade near you.

Just like a deep analysis of Darwin takes you to his actual writings, the proof points around Sun's Unified Network Platforms, consolidation, packet processing, and the movement towards the data plane on general purpose platforms may require going through white papers and kicking the tires with reference apps. Here is a start:

High Throughput Packet Processing White Paper

Radical Consolidation White Paper

Links:

T5120, T5220, and T6320 system and blade launch blogs

UltraSPARC T2 Systems Launch on the Web




Tuesday Aug 07, 2007

Ali G and Kanazawa on UltraSPARC T2 (nee Niagara 2)

With the announcement of the UltraSPARC T2 processor it is time to spare you my tired words on life, technology, and the pursuit of happiness, and defer to two great video guests: Hirokazu Kanazawa and Ali G.

Hirokazu Kanazawa, a most graceful martial artist, demonstrates the power of combination attacks. The 1-2 punch. Kanazawa teaches rapid succession of techniques, the second being more powerful than the first. Just like the succession of the CMT processors UltraSPARC T1 and T2 (aka Niagara 2), a well-executed combination is more effective than its individual parts.

Then we turn to none other than Ali G, hosting a science and technology expert panel. Ali G explores if massive computing capabilities (like the UltraSPARC T2), can make our lives better. He wisely probes if such machines can really tackle large multiplications without blowing up (a clear allusion to the power and thermal challenges of pre-CMT processors).

Niagara 2 themes, by two of the best.
Keep it real.

PLAY VIDEO (QuickTime)




Wednesday Jun 06, 2007

No Free Lunches

Today's motif is the nonexistence of free lunches. To dig into these allegations I turn to my brother, whose answers make more sense than my questions, and to the Internet. Metaphorically, a Free Lunch is getting something for nothing. The expression goes back to a time of taverns that enticed drinking patrons with free food; the phrase "no free lunch" was meant to expose the costs hidden in the price of the drinks.

“È finita la cuccagna!” (“The free lunch is over!”) is an Italian variant, as proclaimed at his inauguration by Fiorello La Guardia, the mayor after whom a New York airport is named. His “no-more-free-lunches” was a call against government graft. I don't know about government, but the proclamation still echoes at La Guardia airport, where you can hardly get a free or even a cheap meal.

“There is no such thing as a free lunch” was popularized by Milton Friedman, yet paradoxically Asset Diversification is seen as a case of a free lunch in the risk-reward tradeoff. Either Milton was not aware of this investment panacea, or financial planners are colluding to sell us unnecessary positions.

The laws of Thermodynamics dismiss the odds of free lunches in the physical world, but in spite of roadblocks like the No Free Lunch Theorems, there is little deterrence against the pursuit of free lunches in the world of information and optimization. People serve hard time for violating laws, not theorems. Here my brother weighs in categorically: a better solution to a problem just means that the previous solution was sub-optimal. Sleep 14 hours every other day and then try 7 hours every day. You may feel better rested on the same proportion of sleep, but calling the new sleep regime a free lunch does not make any formal sense. Overwhelmed by analogies, I retreat into this compromise:

A bona fide free lunch must be repeatable.

Not just a few free meals here and there, but a systematic way of repeating a win-win optimization. Let's be cautious that repeatability not become a demand for a perpetuum mobile or an infinite nutrition source, since the USPTO has stopped granting perpetual motion patents without a working prototype. A free lunch is something you can base your finite diet on.

After buying drinks for a couple of rounds, the net infrastructure is hungry for free lunches. It craves new services, higher ARPUs, lower OPEX, and a lower power footprint. What else is new in the world of slideware? Mobile devices and laptops on the left, some boxes on the right, and the proverbial cloud in the middle. Services flow from right to left, revenues go the other way. Psychiatrists whose patients see internets in Rorschach tests are perplexed; the cloud slideware has permeated everything. Me, I see gateways. Every computer inside or attached to the cloud transforms and moves stuff between its interfaces. I see complex and stateful transformations in the cloud. Everything in that cloud is a gateway, and I am not crazy, doctor; even the puny devices on the left are gateways, doctor, with humans on one side and the net on the other.

For IP convergence these packet gateways need the intelligence to straddle heterogeneous protocols at various layers. When built out of traditional processors, gateways exhibit a brain vs. muscle tradeoff: more complex packet processing means lower throughput. As carriers add services they lower the gateway throughput (or burden the per-subscriber cost). This work-throughput tradeoff is like carrying water with a bucket. For me it has a negative slope when plotted against distance, but it can be flattened if I summon enough muscular friends. By swinging buckets from one to the other we can maintain the water rate over any distance. This is repeatable, so the Bucket Brigade may represent a free lunch, until I run out of idle friends with buckets and the graph slopes down.

Similarly for gateways, we can use threads to pipeline packet processing. More complex processing inserts more stages in this “packet brigade” without sacrificing throughput. Wanting low thread-to-thread communication costs means the threads are ideally packed inside one processor, and the densest general purpose pool of vertical threads would be a CMT processor like Sun's upcoming Niagara 2. If said processor happens to have a couple of built-in 10G network pipes to get the "water" in and out, there you have your gateway engine. The rest is just a simple matter of programming...
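
For the code-inclined, here is a minimal sketch of the packet brigade in Python, with OS threads and queues standing in for CMT hardware threads; the stage functions are placeholders for real work like parsing, classification, and forwarding:

    import queue
    import threading

    def stage(inbox, outbox, work):
        # Each pipeline stage runs on its own thread: take a packet,
        # do this stage's share of the processing, pass it along.
        while True:
            pkt = inbox.get()
            if pkt is None:          # sentinel: propagate and stop
                outbox.put(None)
                return
            outbox.put(work(pkt))

    # Placeholder per-stage work; adding a stage adds latency,
    # not a throughput loss, as long as a thread is free to run it.
    stages = [lambda p: p, lambda p: p, lambda p: p]

    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage, args=(queues[i], queues[i + 1], w))
               for i, w in enumerate(stages)]
    for t in threads:
        t.start()

    for pkt in range(10):            # stand-in for packets off the wire
        queues[0].put(pkt)
    queues[0].put(None)

    while (out := queues[-1].get()) is not None:
        print("processed packet", out)

More stages mean more work per packet at the same packet rate, which is the whole free lunch.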

Seriously, the software angles deserve their own blog entry. Promised. Today we just point out that a neat way of parallelizing execution is actually serial (i.e. pipelined). In a thread rich future this is neat because it is a free lunch: increased computational work at constant throughput, repeatable. It works by tapping your idle friends (CMT threads), which incidentally can be summoned for a packet brigade using the just released Logical Domains (check out http://www.sun.com/ldoms).




Monday Feb 19, 2007

Russian Dolls

How much information entropy does my Thursday evening phone call have? Very little. It's Thursday evening. She knows I want her nod for the usual beers after soccer. She senses I am already at the bar; fait accompli. Not much entropy in her response either. I may get a rare yet firm “No” if I forgot a school open house, or God forbid, an anniversary. With proper attention to calendar minutiae, the entire phone call would carry zero information; the answer would always be “Yes”. With my fallible memory it carries one bit of information. The Grant or Deny bit. I don't squander any more bits on apologies over the phone. Love is never having to say I am sorry in front of ten implacable teammates at a bar.
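
For the record, the bookkeeping behind that single bit is Shannon's entropy; writing p for the probability of a Deny,

    H = -p \log_2 p - (1 - p) \log_2 (1 - p)

which peaks at exactly one bit when Grant and Deny are equally likely, and falls toward zero as calendar hygiene pushes p toward zero.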

But that permission bit takes much infrastructure and overhead. Digitizing her voice at, say, eight kilosamples per second, packetizing the cell phone air interface traffic, and establishing the call from the bar through signaling protocols. Oh, plus billing record updates (the constitution does not codify free phone speech any more than free beer). Thousands of bytes exchanged to settle whether I go home before or after a cold pitcher. Overhead paid for the flexibility to call anybody, the flexibility to carry digital information, and the flexibility of layered modular architectures that adapt to new uses with contained changes.
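
A back-of-the-envelope check on those thousands of bytes, assuming 8-bit samples and a ten-second exchange (both invented round numbers):

    8000\ \tfrac{\text{samples}}{\text{s}} \times 8\ \tfrac{\text{bits}}{\text{sample}} \times 10\ \text{s} \times 2\ \text{directions} = 1.28 \times 10^6\ \text{bits} \approx 160\ \text{KB}

of voice payload to move one bit of information, before counting signaling and billing traffic.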

Like Russian dolls one inside the other, and the last tiny one is her Yes or No bit.



We may contemplate the dolls individually, like engineers trained to think horizontally within the layer of interest. But it is tempting once in a while to reflect vertically about multiple layers. Open the dolls one by one. A Russian doll introspection of sorts.

The layered paths of high speed networks mimic these Russian dolls. Serial packet-switched wire protocols have become a bit parallel lately, as transceivers use multiple lanes (10 Gigabit Ethernet XAUI, for example), while the outer doll, the traditionally parallel bus that hosts network interfaces, has lately become kind of serial, like PCI-Ex using packets over serially abstracted lanes. In an actual system these dolls are tortuously encapsulated as data traverses the network interface and the I/O interface, and terminates in a really wide system memory, whose physical interface may paradoxically be serially packetized (à la FBDIMM).

My usual reflection when thinking across layers is awe. Surprise that this complex tangle works reliably or works at all. But coldly thought, the complexity is ostensibly an artifact of the modularity that ultimately simplifies each layer, so that we humans can get them right. One doll at a time.

A doll we have been crafting and fitting is codenamed Neptune (how original). It is an interface device attaching servers to 10 Gigabit networks. Neptune may soon get another name to avoid upsetting some trademark lawyer somewhere in the solar system, an official, original, and hard to remember name. But to me she will always be Neptune.

So far our systems were mostly attached to Gigabit networks, and as these systems get more powerful they don't deserve to be on Gigabit networks anymore. Visualize 10 Gigabits per second as a bigger door into and out of a server. Curiously, big doors are useful at both ends of the housing market: Monster Homes and Affordable Housing. Monster homes are big systems deployed for raw performance and a specific purpose; everything about them is big, not just the doors. Affordable housing stands for the systems aimed at accommodating multiple subscribers at minimal cost per subscriber. Multi-tenancy of subscribers, if you will. Their simultaneous traffic needs also require big 10 Gigabit doors. These systems are a natural fit for CMT processor architectures, but that is a different story.

(Listen to Bob Sellinger for the story that ties CMT processors, subscribers, and economics in his "Getting Ready for 4G" webcast, by clicking his link at the bottom. Later though, you are reading about dolls now).

Neptune is more than a big door into a server; it is at least two doors. Two 10 Gigabit Ethernet ports, because most infrastructure deployments are dual homed for redundancy.

Neptune does the serial-packets-to-memory-to-packets dance again when mediating between 10 Gigabit networks and PCI-Ex interfaces. And it uses every trick in the book to minimize the impact of such a byzantine path. It tackles the affordable housing problem where tenants have separate corridors to their apartment units, and in the process Neptune also solves the traditional network receive scalability problem. Traffic is segregated into separate internal and system resources, ultimately targeting different threads/cores to service different traffic components.

Where is the novelty? Up until now we used to first queue packets into the server, and then classify them for the purposes of distributing traffic up the stack. Neptune first classifies and then queues, and that makes all the difference. Now we can have asymmetrical resource usage models, we can apply policy, we can virtualize, and of course we eliminate the nasty head-of-line blocking introduced by first queuing and then sorting things out.
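
A sketch of the classify-then-queue idea in Python; the hash, the queue count, and the packet fields are illustrative assumptions, not the Neptune hardware interface:

    import hashlib
    from collections import deque

    NUM_QUEUES = 4   # stand-in for per-thread receive resources

    # One receive queue per servicing thread/core, so flows never
    # contend at a single head of line.
    rx_queues = [deque() for _ in range(NUM_QUEUES)]

    def classify(packet):
        # Classify first: hash the flow identity (here a toy 3-tuple)
        # to pick a queue before the packet is ever enqueued.
        flow_key = (packet["src"], packet["dst"], packet["proto"])
        digest = hashlib.md5(repr(flow_key).encode()).digest()
        return digest[0] % NUM_QUEUES

    def receive(packet):
        # Queue after classifying: each flow lands on the queue owned
        # by its servicing thread, with no sorting afterwards.
        rx_queues[classify(packet)].append(packet)

    receive({"src": "10.0.0.1", "dst": "10.0.0.2", "proto": 6})

Queue first and you must classify under a shared lock later; classify first and the rest of the path is embarrassingly parallel.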

Some Neptune uses are already in place, some are in development, and some are just sketched on napkins (blog drafts maybe?). Traffic spreading is already built into the Neptune device drivers, for example. Extending the reach of multiple container and virtualization technologies all the way into the network interface is a progression that spans Solaris containers, the upcoming Crossbow project in OpenSolaris, and Logical Domains, just to name a few of our Russian dolls. Besides these examples Neptune can and will fit into other dolls in various markets and communities beyond Sun products.

With its CMT lineage, Neptune matches processors with high degrees of threading and concurrency, so much so that we even put a mini-Neptune inside the upcoming Niagara 2 processor, conjuring the familiar metaphor of a doll inside a doll.


LINKS GALORE:

Neptune adapter

Niagara 2

Yes, you should listen to Sellinger now



RELATED BLOGS:


Crossbow & Neptune, by Markus Flierl

Simon Bullen Networking Blog


PODCAST:


Hal Stern talks about Neptune with Muller, Nordmark, and Saulsbury





Tuesday Nov 07, 2006

The more I know men the more I like my blog

Or was it my dog? Some days I relate to that locution, and I tie it somehow to Diogenes. The internet consensus suggests the author is unknown. Unknown is a very prolific author since the internet. Especially since blogs, these anonymous sources of wisdom, where anonymity is easy and attaining name recognition is hard.

There are no new ideas left; blogs are at best a new mold for old ideas. Blogs may carry a novel mindset, or merely a new mass publishing technique. A push-pull model of sorts for spam. The blogger mindset I see could descend from Diogenes the Cynic. Diogenes, self-proclaimed citizen of the world, presumably created the cosmopolitan concept. His Cynics placed reason above convention, and argued that if an act is not shameful in private it should not be shameful in public. Strikingly reminiscent of the blogger mindset, at least of the blogging community at Sun. “Of what use is a philosopher who doesn't hurt anybody's feelings?” With that one I rest my case on Diogenes' blogging paternity.

Unconvinced? OK, let's settle on a fictional ancestor, Blogenes, and move on. The point is that neither the ideas nor the mindset are new. The only novelty is in their expression. The blog expression starts with our ability to capture, share, and interact electronically. Capturing our thoughts wherever we are, even in the restrooms. And with his record on private vs. public acts Blogenes would have taken his laptop to the restrooms. Actually, who hasn't?

This new expression lets me write about old ideas, like convergence, like the unstoppable force of general purpose volume technologies, and of course, the smart use of concurrency to exploit the newfound parallelism of CMTs. But why inflict old ideas onto others? Because, remarkably, some old ideas are themselves finding new expressions in new silicon.

For example, I have been blogging about CMT (Chip Multi Threading) as the combination of multi-core processors with vertically threaded cores. Neither is new; the advantages of evolving into multi-core processors instead of faster processor clocks are undisputed by now. Vertical threads go back at least a decade, to when architects I know, like Laudon, Yamamoto, and Nemirovsky, examined the virtues of hiding memory latencies through multiple hardware threads on the same processor pipeline.

By now all Sun's CMT processors incorporate four vertical threads per core. Processor threads are hard to visualize, so here is a picture. They are vertical, and when the rubber meets the road CMT speed relies on four of them per processor core. In fact, simultaneous multi-threading has been tried in x86 architectures; it had only two wheels, wasn't that fast, and has already been abandoned.

Convergence is superficially the notion of unifying all networks (voice, signaling, data, video) into an all-IP network. For those of us seeking the simple certainty of dogma, “In the beginning God created the Packet”. And for the many whose only dogma is cost, convergence is all about general purpose volume market costs. And that is fine, because to a cynic placing reason above convention, convergence is not only about unifying the networks but also about unifying the processing building block used to build them. The same processing element for voice, signaling, data, and video; the same for control and data planes.

Now, when picking a building block for a converged network, I insist: infrastructure ain't built on laptop parts. Laptop processors are made in volume all right, but they neglect the central tenet of the converged network faith: The Packet. The new expression of a converged building block is a General Purpose processor designed for superior packet processing, and for now that means two things:

1) Vertical Threads
2) Native Network Interfaces

As simple as dogma.

I already touched on 10 Gigabit native network interfaces in my previous post, so let's talk Vertical Threads. Vertical Threads, as intended, hide cache miss latency, and packet practitioners know that caches are evil to start with. There is no temporal locality in packet arrivals other than in the short-term burst. Things may work fine for a small number of subscribers and collapse when the subscriber state no longer fits in the processor cache. Vertical Threads also eliminate the other packet nemesis: interrupts. A multi-threaded processor with 32 threads like the UltraSPARC T1, or its 64-thread Niagara 2 successor, can dedicate a few threads to ingress packet processing (dedicate as in "fully devote, without interrupts") and deliver no-sweat wire-speed packet processing in a general purpose processor.
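
To make "fully devote, without interrupts" concrete, a toy polling loop; the Ring class and its method are invented for the sketch, not a real driver API:

    from collections import deque

    class Ring:
        # A toy stand-in for a hardware receive ring.
        def __init__(self):
            self.slots = deque()
        def next_descriptor(self):
            return self.slots.popleft() if self.slots else None

    def poll_ring(ring, handle, budget=None):
        # A dedicated thread polls its ring forever: no interrupt
        # entry/exit cost, no context to save. On a vertically threaded
        # core, a cache miss in this loop simply yields the pipeline to
        # a sibling thread. The budget only makes the sketch terminate.
        while budget is None or budget > 0:
            pkt = ring.next_descriptor()
            if pkt is not None:
                handle(pkt)
            if budget is not None:
                budget -= 1

    ring = Ring()
    ring.slots.extend([b"pkt0", b"pkt1"])
    poll_ring(ring, lambda p: print("ingress", p), budget=4)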

Again, the converged building block is multi-threaded, not just multi-core. Multi as in “lots of threads”, not as in “more than one”. Vendors that don't have lots of threads complain that the software is not ready for the high levels of concurrency of lots of threads. Wait for software to catch up, they say.

Blogenes would retort with a categorical locution: “If you control something you make it work, if you don't you complain”. I bet these vendors do not control the software that needs to get ready for concurrency. I say this as bluntly as I say that volume does not mean laptops. After all “Of what use is a blogger who doesn't hurt anybody's feelings?”




Wednesday Sep 13, 2006

Win a BMW

To ruin a friendship, there is nothing like taking a long trip together or jointly owning a car. Way before Sun adopted the "Share" mantra I did the car sharing thing against all advice. Jim argued over lunch that front wheel drive was a ploy by car manufacturers to reduce assembly costs, and that good cars (Bimmers and Mercs) were rear wheel drive for a reason. Jim, my mentor, is articulate and hands-on in mechanical and electrical engineering. So off we went to see the first cheap BMW we saw in the paper, and off we bought it; 50/50. A 1977 gray-market import with a terminal case of New York State salt rust. We would work on it together, and alternate possession every Friday noon at our Checkpoint Charlie, the wafer fab parking lot.

We were too immersed in drivetrain catechism to notice the terminal rust. We gave it our best shot and fixed what we could. It gradually sank in that the drivetrain would outlast the body. Engines wear with mileage while rusty shells wear with time, so we put mileage on the car as often as we could to balance its demise. Like eating fondue and finishing the bread and the cheese at the same time. Not easy. The car became a long distance cruiser for IEEE meetings on the east coast, and was also loaned to a moonlighting friend going through medical school. It became the rusty airport limo at JFK. We squeezed nine years and untold miles out of the cheese fondue strategy. Not exactly trouble-free miles, but you learn more from old cars that break down than from new, reliable cars. We learned that electron losses are largely irreversible in metals, at least for car fenders. We also learned about sharing. Not as in giving a part of what you have, rather as in building trust by doing something together.

By the time I came to California the car was no longer safe to drive. I left it with Jim on Long Island, and never had the courage to ask if it was parted out. But I saved the key for gimmicks, that is, ceremoniously placing the BMW key on the table as my wager on a bet. A riskless bet, mind you, given that the key was the only part of the car that didn't rust to pieces. The Win-a-BMW fad peaked a couple of years ago, when we were working on how to integrate networking into our Niagara processor.

The bet was to name an absolute invariant of how a new Sun system would be used, reminiscent of the generality issue of Walking with a limp. It is hard to know what apps the system will run, hard to anticipate exactly what OS type and release will host the apps, or how storage will be attached to the system, or even to determine which instruction set (SPARC or Opteron) will dominate a particular deployment. Name the system invariant and win the BMW key.

The right answer is that the system will do IP over Ethernet for a living. Or in more formal terms, our systems are deployed presenting an application or a service through a network using the IP network layer over an Ethernet data link layer. Niagara 2 has integrated networking interfaces because when you do something for a living you'd better do it well. And these interfaces run at 10 Gigabits because, borrowing from Howard (mentor #2), a Niagara 2 does not deserve to be on Gigabit Ethernet.

Well, the same interface integration topic has been coming up in the press since our Niagara 2 preview. Some of the industry debate is captured by this article.

I would be ready to interject the BMW key gimmick into this debate (offer void where prohibited), but for integrated networking I already disclosed my answer. Now, what is the question?




Tuesday Sep 05, 2006

The unbearable lightness of being stateless

Ever gone through an ascetic period of feeling better by owning less? The lightness of moving to college with just a couple of bags, quitting a job and selling most possessions. No entanglements, no commitments. This lightness is not about travelling light, suitcases come with wheels nowadays. And it is neither about a hermit without belongings, nor a surf bum with or without waves. It is about a martial artist that cannot be disarmed, because he is the weapon, the comfort of a possession that cannot be taken away.

Here is the test. Take laptop, keys, passport, passwords, checkbook, credit card, and wardrobe. Stateless is being able to function again within a day of losing them all. You pass the test if you don't need all that stuff, or can recover the loss in short order. If the loss is a big setback, welcome to the club, and to the quotidian stress of preventing the loss.

We are doomed by modern life complexities to experience the lightness of being stateless for only a few, yet memorable, periods. But modernity also helps, through centralization, for example. Using banks and ATMs rather than stuffing currency in mattresses and wallets. Similarly for data. Service providers and employers help keep our electronic data in presumably safer places than laptops or digital cameras. Modernity helps statelessness by delegating the storage and protection burden to somebody else.

On the flip side, the stateless road warrior became an endangered species through the overexploitation of the laptop. Personal and corporate lives go with them in their hard drives. There is no lightness there, unless of course the laptop is used as a communication device rather than a storage device. A thin client that kept its diet except for some data caching here and there.

The crux is feeling as light as travelling with no luggage while avoiding the deprivation of owning nothing. The essence of being stateless is knowing that whatever we carry isn't critical to our functioning, or can be easily recreated. Bad things happen, and it is all about how fast we recover. The same goes for infrastructure computer systems, that is, the systems that centralize our funds instead of stuffed wallets, the systems that centralize our data instead of lugging our lives on a laptop, and of course the systems that provide the wireless network cloud so we can be stateless yet always connected.

Systems based on CMT processors, like the UltraSPARC T1 processor, or the just previewed second generation 64-thread CMT Niagara 2 processor, can be viewed as horizontal scaling within a chip. And soon they will become domainable with the introduction of "Logical Domains". These Logical Domains can also experience the lightness of being stateless. But what burden of baggage can these domains possibly want to shed? What entanglements and possessions is a server stressed out about? The burden is the I/O: the data stored in disks, the observable behavior over network attachments, and the idiosyncrasies of a modular I/O architecture. Without all these, servers are carefree souls.

The liberating part of Logical Domains is precisely that one can create surf bum domains that do not own any I/O, and in fact most domains in a CMT system will not own any I/O. These are not hermit domains crunching numbers away in seclusion, they are rather domains that rely on somebody else for I/O. They delegate the burden of I/O bus ownership, probing buses for devices, loading device drivers, and recovering when bad things happen; as they do happen. Applications and services can be hosted in multiple such stateless guest domains. And when bad things happen to a guest domain, they get back on their feet really fast, because they have no I/O bus topology to probe, and no I/O devices to initialize.

Early into this CMT blogging thread we claimed that a CMT system can mimic the attributes of discretely deployed horizontally scaled systems; now with Logical Domains it can surpass the master. It can sustain guests that lead an I/O stateless lifestyle. Every day.

Logical Domains are coming to SunFire T2000 and T1000 Servers among others. The free SunFire server trial program is in perfect harmony with the lightness of using a server without really owning it, not to mention the path to Nirvana through sharing the details of some impressive use of the box.




Saturday May 27, 2006

Intermezzo

A quick Intermezzo before I get to "The Unbearable Lightness of Being Stateless". A kind of update to bring other names, places, and faces to my narrative. The magic of number three, a name, a place, and a face.
Name: Montoya

I have been rambling about how CMT applies horizontal scaling within a chip, and how it eliminates memory sprawl in servers and network elements. Well, one of the emerging platforms for such server and network element deployments is ATCA, and it is only natural to put an UltraSPARC T1 in an ATCA blade. So we did. The internal code name is Montoya. Now we can do horizontal scaling within the processor, and extend it to the ATCA packet switched interconnect at the shelf or even rack level. 32 threads per Montoya means 384 threads per twelve-blade shelf. Some of the ATCA products are already announced and shipping. Montoya was kind of previewed around April at the CTIA show in Vegas, but what happens in Vegas stays in Vegas, so I cannot post a Montoya picture yet.

Well, I will go out on a limb and show you a Montoya picture:





This is Montoya the place. Picture taken last December. Montoya is a small surfer town on the South Atlantic. Waves, cool winds, deserted most of the year but packed with people in January, reachable by car through a single and busy undulating bridge. In engineering lingo, Montoya the place is high throughput, bursty, and I/O saturated. A great place for your next vacation. [Note to boss: take the Montoya team to Montoya the place for the Montoya the blade RR/GA celebration.]

Now a face. You were expecting a Montoya face. Wrong. The face is Ashley.



Thought Ashley or Ariel were female names? Wrong again. This is Ashley's pantomimed explanation of a stateless domain moving from "here" to "there", where "here" is around Ashley's solar plexus and "there" is around his right shoulder. So what? Anything worth moving is always moved from here to there or vice versa, but the beauty of Ashley's stateless domains is that they can be moved at all. Ashley's picture was taken in Santa Clara at the Multicore Expo in March, when part of the OpenSPARC community came physically together for the first time to talk about these multicore processor trends. Multicore Expo had the familiar charm of the old and small Interops. One could clearly see two distinct themes: low power multicores applied to embedded devices, where some of the cores are specialized, and server centric multicore architectures (with Sun's UltraSPARC T1 being the first such incarnation), where the cores are identical although the usage models may be asymmetrical...

Rather than a tired recounting of the Multicore Expo material, here is a link for the CMT presentations, including Ashley's preview of things to come around CMT virtualization. There are other sessions I really enjoyed, but I won't taint you with my opinions; just read them and let's compare notes through my comments section. My session was sandwiched between Teja presenting how to use their tools for FPGA-based packet processing, and Intel showing how you can use a dual-core Xeon for packet processing. Good stuff, but if you ask me, you don't have to use FPGAs anymore for fast packet processing, and you probably don't want to use Xeons for fast packet processing. Why? As I said, ask me. Today is just an Intermezzo about a name, a place, and a face.

And here is an update, a link for Montoya the blade:




Monday Jan 30, 2006

Teams vs. Individuals

We crossed Gregorian calendar boundaries, so some blog introspection is due. It yields two findings. One. My blogs got wordy, as if following a blog Moore's law of sorts. I do not promise shorter blogs, just some images to lower the overall word density. Like this canvas painting. I saw the canvas at Carlos' place in December and took this picture. Carlos says it won some award; I hope I am not breaking any law by reproducing it, but let's leave the canvas for later.

Two. Blogs resemble airports. High traffic structures for short visits. Hub airports send you on to other places, terminus airports don't. I expect my visitors to enjoy the destination over the journey. Shoe removal rituals at metal detectors, $7 airport pizza slices, does anybody enjoy the journey anymore? My blog wants to remain a small terminus.

Airports make the case for teams over individuals. Complex operations smoothly choreographed to move people and route airplanes. A network routing problem that, unlike the Internet, cannot do flow control by dropping airplanes. They drop passengers all right, but fortunately before boarding. The aviation case for individuals over teams is the P-51 Mustang fighter, designed and prototyped in about 120 days. It is hard to imagine the Mustang as the product of process and teamwork. It had to be all in the head of one person, and presumably one who worked on the German Messerschmitt before moving to the US. Complex engineering slows down when a single person cannot handle all design tradeoffs and delegates to a team. Human communications cannot be as fast and accurate as no communication. I rest my case with our own Andy Bechtolsheim; if you worked with him you know that no team can iterate at Andy's speed.

So who wins, the team or the individual?

In sports the team wins, at least in the soccer (aka football elsewhere) teams I coached or played on. A reductio ad absurdum proof of that is the legendary USA team that won the Sun World Cup in 2003; no stars there. In high tech we have a mix of teams and influential individuals. Looking at Microsoft's hiring philosophy through "How Would You Move Mount Fuji?" shows that missing a good hire is better than taking the risk of hiring the wrong person into the mix. A natural corollary of high-tech engineering being a team endeavor where individuals can do damage disproportionate to their potential contributions.

The teams vs. individuals, or centralized vs. distributed picture is equally ambiguous for computer systems. In Turning the Tables I implicitly sided with the distributed and horizontal team approach. Or at least hinted that cost and resiliency benefits make it a natural choice when possible. But sometimes systems are simpler and more efficient if they avoid or minimize the need to communicate. Would we host the entire net on one system if we had an infinitely powerful processor?

Well, the UltraSPARC T1 is a powerful performer, as shown in this analysis by the University of Aachen. And yet in some cases the system can be pushed to higher throughput by using Solaris 10 Containers to deploy multiple application instances. Solaris Containers provide a single point of platform administration and restore scalability to apps that were not necessarily written to exploit the degree of parallelism of a CMT. In a sense this creates a team out of an individual. Strange. And it gets stranger soon, as the CMT Hypervisor adopts Logical Domains (or LDOMs). With LDOMs we could deploy multiple isolated OS instances onto a single CMT. Like the little men in the canvas. Why do that? Fault containment. Recall that when things go wrong an individual can cause disproportionate damage; better to contain the scope of the damage. Software defects and most hardware faults are contained to the offending LDOM.

Is this all there is to LDOMs? Fat chance. This is just the beginning: now we can deploy different revision and patch levels of Solaris on different LDOMs, and also deploy heterogeneous Operating Systems ported by the OpenSPARC community to the Hypervisor API. Incidentally, the Hypervisor API was just published on the OpenSPARC site. That publication is what shook me out of my recent blogging lethargy and prompted this post.

Paraphrasing Woody Allen, the prerogative of the classics is that you keep finding in them new things you never saw before. Looking again at the canvas now, I find a heterogeneous team of little men, maybe collaborating, maybe isolated, doing their thing. A CMT canvas? So many CMT block diagrams, so many CMT StarOffice slides, and finally real art: a CMT painted in oil on canvas.

I also see entire heterogeneous systems (e.g. a wireless network infrastructure), collapsing into a much smaller number of processing elements than today, sharing one high bandwidth system memory. Not sure what Woody Allen has to say about this, but I care more about what Telcos say about this vision.

Is this all I can do with CMT domains? Not really. Stay tuned for "The Unbearable Lightness of Being Stateless" blog, covering neat uses of stateless domains. Stateless is the wrong word, but I hate to change the title when that is all I wrote so far. In the meantime, there is a hub airport you should visit, often, for the places it takes you. And unlike my blog, it has a team behind it: OpenSPARC.




Wednesday Dec 21, 2005

The problem with the world

My father-in-law says that the problem with the world is more people writing than reading. He has been saying it for decades. Who wrote that? I would riposte then, in my impertinent days before the Internet. We fight this battle one book at a time, or more. My father, for example, has multiple books in progress in different rooms. No, I haven't told them about blogs, and how blogs exacerbate the world's malady now that any impertinent like me can write and publish.

The Web started fine, no writer surplus when pages were predominantly read-only static content. Mundane stuff did not deserve to be on the net; I was homepageless for many years. Static content concentration resulted in browsers accessing the same content over time, so Web caches made sense. Caching content at large aggregation points, like corporate Internet access proxies and at service providers, saved bandwidth and shortened response times. If you use something often, keep it close to you. What a concept... Processors keep cache lines in fast on-chip memory, operating systems keep file system caches in system memory, and restaurants keep the most popular dishes pre-cooked ready to heat and serve.

Except for mutual fund fees, which are damn predictable regardless of future performance, past behavior may not be a good predictor of the future, warns the prospectus. Such warnings fit the Web cache case, though. Web usage evolved to include much more dynamic content. Content is now highly customized to our identities, and cannot be days old. Auction sites, brokerage houses, and blogs demand content that is personal and timely; the impact on infrastructure is simple, it drives more bandwidth and end-point capacity so that this content can be assembled and served fresh. The Moore-Shannon match I described in a previous post gives us more endpoint and channel capacity in the servers and the plumbing that make up the net. I mean no disrespect by skipping the sophisticated distributed caching and tiered processing that also make up the net; I am exaggerating to highlight how brute force caching is becoming less useful.

Expectedly, brute force caching is not the best culinary choice either. We sent men to the moon but haven't made a reheated pizza that tastes the same, and in spite of decades of civil aviation, pre-brewed airline coffee smell is as cruel a torture as coach-class legroom. Stashing pre-cooked dishes in a BIG refrigerator is brute force cuisine. I am here to advocate the Big Oven approach instead.

Incidentally, we faced the same choice when we created our CMT UltraSPARC T1 processor: allocate more transistors and power resources to caching, or save them for the processing resources themselves. Larger caches in processors ARE the brute force approach. In our case, just like most restaurants, we were optimizing for throughput (and particularly throughput per Watt), and there was a better solution: vertical threading. By making each of the eight cores in the UltraSPARC T1 vertically threaded, AND by having a wide memory interface (23 Gbytes/sec of bandwidth), the cores can keep retiring instructions in the face of long latency memory accesses, which is exactly the same problem tackled with caches. Long latency memory accesses are like cooking steps in culinary recipes: you must wait for the oven to do its thing, and that takes a while.

Cautious customers and, as of our product launch, also some competitors ask how an eight-core CMT can perform with just a 3 Mbyte L2 cache. Isn't our CMT like eight processors in a single socket; shouldn't it have eight times the cache of traditional processors to keep them individually busy? Well, the whole point is that CMT addresses the memory latency problem through vertical threads, and this makes it much less sensitive to cache size because it is less sensitive to cache misses in the first place. Instead of a large refrigerator full of pre-cooked dishes to be heated and garnished by a single overworked cook, we put in a large oven and hired eight nimble cooks. The cooks were taught how to handle four orders at a time (just like my father does with books), and whenever one of these orders goes in the oven they switch immediately to one of the other three that is not in the oven. That is why the large 23 Gbytes/sec oven is important: it holds up to four orders for each of the eight cooks at the same time.
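
For those who prefer equations to ovens, the arithmetic behind the four-orders rule: if a cook (thread) works for R cycles and then waits L cycles on the oven (memory), a core with T threads keeps its pipeline busy as long as the other threads cover the wait,

    (T - 1)\,R \ge L \quad\Longrightarrow\quad T \ge 1 + L/R

With memory latencies on the order of a hundred cycles and a few tens of cycles of work between misses, four threads per core goes a long way; the numbers are illustrative, not UltraSPARC T1 specifications.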

We explained this approach to cautious customers through architecture and modelling data, but the most convincing step, in computers as in food, is testing and tasting. Trust the explanations, but test the product. Two weeks back I heard about the "try and buy" program: evaluate the CMT box and only buy it if you want to keep it. Not many restaurants go that far. Some won't even let you into the kitchen to look inside those BIG refrigerators. I am not sure the "try and buy" program will satisfy competitors. After all, competitors react in disparage or embrace modalities. Questioning the cache size is an example of the disparage modality. The embrace modality was used by another competitor, first claiming they already had multi-core vertically threaded network processors, and more recently announcing CMT plans themselves.

Network processors (aka NPUs) are indeed vertically threaded processors. Unlike NPUs the UltraSPARC T1 is a vertically threaded general purpose processor, with all the software development advantages of standard tools and languages, full memory protection, virtual memory, cache coherency across cores (at L1 and of course the shared L2), arbitrarily large program memory, and no collaborative thread yielding constraints. The UltraSPARC T1 is a good foundation for I/O and network facing workloads without the programming quirks of network processors. Competitors arguing they already have CMT technology is akin to comedian Benny Hill's reaction when told about Neutron bombs that destroy people without damaging their buildings. “Oh, we already have them in England, we call them mortgages”. That is how similar they are...

As for embracing CMT as their future direction, that would be just flattering.




Friday Dec 09, 2005

Names more precious than oil

Higher driving and heating costs, attributed to supply-demand imbalances, are a reminder that energy is increasingly a scarce resource. Or is it? On one hand more oil reserves have been generally discovered than consumed each year, but on the other hand we may be about to exit that phase. So what's a man to do? A man is to forgo trying to predict the future, and buy energy positions that neutralize the cost of living impact of these runaway costs. Granted, serious imbalances would disrupt our way of life in ways that no hedging position can restore, but that is a bigger problem than a humble blogger can solve.

Having hedged the energy (and health care costs while at it), a man can then pour a glass of Cabernet and ponder about other scarce resources. We could worry next about the electromagnetic spectrum. Borrowing lines from real estate agents, they don't make spectrum any more, it does not grow on trees. Spectrum is crowded by radios, TVs, cell phones, garage door openers, wireless hot spots, radars, you name it. I am ever impressed at our ingenuity for getting more out of a limited spectrum over time. From the AMPS cellular system introducing frequency reuse with variable cell sizes [Bell System Technical Journal, 1979], to spread spectrum (CDMA) and the way it packs more information in the same channel band.

I have bored many an audience by repeating that we are essentially chasing Claude Shannon's upper bound on the amount of information per channel by exploiting the computation enabled by Moore's law. Shannon vs. Moore, 12 rounds. We have applied this to wireless links, to data center wiring, and even to processor chips' input/output interfaces. Through Moore's law, God gives us the transistors to store and crunch information, but doesn't give us the pins to get all that information in and out of these circuits.
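
The bound in question is the Shannon-Hartley channel capacity: for a channel of bandwidth B and signal-to-noise ratio S/N,

    C = B \log_2\!\left(1 + \frac{S}{N}\right)

Moore's law keeps buying the signal processing that pushes real links toward C; it cannot move C itself.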

We fought valiantly against Shannon's tyranny. 1000BASE-T moves Gigabit Ethernet bits over Unshielded Twisted Pair (UTP Ethernet was 1000 times slower when it started around 1987), EV-DO, pushing a couple of Megabits of packet data into the existing 1.25 MHz CDMA spectrum, could become a global wireless DSL of sorts (no more hotel Internet access fees!), and SERDES technology runs at 5 Gbps and beyond (per pin pair) in and out of ASICs and processors. All of this is enabled by signal processing at the transmission line endpoints, in the form of sophisticated coding and modulation, adaptive equalization, clock recovery, echo cancellation, path diversity, and so on. But Claudes will be Claudes, and eventually we'll reach the bound, or a point of diminishing returns. We can then go to higher capacity channels, like optics for cabled applications, and like Robert Drost's work on Proximity Communication for the immediate interfaces of future processor chips. No more sleep lost over scarcities other than the scarcity of sleep itself.

For those of us who always find something to worry about, I'll mention a really scarce resource in our global world: Names. Good names for our classes, methods, and variables. Good signal names for our ASICs' RTL. Good names for our children. I, for one, didn't consume a middle name when our first boy was born; we kept good names dry for later. Actually the first born, like a processor with a single register, doesn't need a name until a second baby arrives in the family. We couldn't quite convince the nurse, and had to name him before we could take the baby home.

Given names are contested (ask me: I share mine with a mermaid and a German detergent), but they are nothing compared to the stakes around brand names, trademarks, or domain names. Armies of lawyers descend. A land grab for those easy-to-remember, short-to-type names, devoid of negative connotations in most of the 2800 languages spoken on the planet. That is global scarcity. To make things worse, perfectly good names are condemned by unfortunate events: a "Titanic", an "Enron", they drop out of circulation, because human memory has no cold reset input to make us forget.

The creative pace of high-tech aggravates name scarcity. Think about project names, about industry initiatives, about technologies. A chronic name deficit forces name reuse, name overload, or even worse, acronyms. Incidentally, the cool CMT (Vade Retro, an acronym!) technology we just launched with the productization of the UltraSPARC T1 processor platforms was internally known for a while as Niagara. Not a bad name. Relatively short, no residual meaning inside Sun, and even visually metaphoric for throughput. A minor weakness: Niagara rhymes with a prescription medication of singular use, but outside adolescent circles, who would have the poor taste to bring that up? (Breaking news: a three-letter competitor spokesperson brought the medication up; our competitors must be employing adolescents.)

But the best part is the name efficiency of Sun's CMT play. No namespace clutter: one powerful technology, one name to remember (UltraSPARC T1), one multi-core processor, one socket. This simplicity helps my strained memory. So strained that when I am asked how the UltraSPARC T1 stacks up against our Intel-based competitors, I have to pull out an Intel roadmap cheat-sheet to be sure. Do they refer to Bensley with Lindenhurst, or the Truland platform, Paxville or Tulsa, Dempsey or Woodcrest, Sossaman through Whitefield on a Conroe platform, or all the way to Dunnington? The casual listener thinks I am doing a public reading of Harry Potter, and I haven't even invoked the Itanium namespace.

Energy, Spectrum, and Names. On the good side of each scarcity. CMT power savings, tackling Shannon, and wrapping it all up in a single name. I'd love to write about what we are bringing next to the Moore vs. Shannon fight, but you'll have to wait for the next round gong, or invite me for a preview.




Thursday Dec 08, 2005

Walking with a limp

You can't please everybody. This customary parent or mentor wisdom is usually thrown at our lack of academic focus, or at our design choices as engineers. Generalists are useful, but complete knowledge was EOL-announced when we left the Garden of Eden (or got evicted, rather), and Last Shipped around Leonardo Da Vinci. Generalists cover just a different subset of the knowledge tree, exemplifying another form of specialist. Shall we say, Horizontal Specialists. The last time I had an antagonistic experience with an HMO General Practitioner, I voiced what I really thought about him. He called building security. Next time I will just use the Horizontal Specialist sobriquet and walk away. But this blog is about the specialization of machines and devices. I will leave human and medical doctor specialization out of it, to avoid entangling my beloved employer.

For computing devices the specialization dilemma is captured by the historical name we have been using for the processors and the servers we make: General Purpose. A machine suitable for many uses, possibly beyond its original designer's intent, say the optimists. An engineering specialty so narrow that its practitioners do not know much about the software and workloads above it, say the cynics. They are both right: "generalist" machines designed by "specialist" humans.

Those who drive teenagers to school may have seen how they drag their feet on their way to class. The teenager's mother commands him to stop being lazy and please pick up his feet. His father lectures him that men don't do whatever their mothers say; men do what they want. In an attempt to please both parents the poor teenager drags one foot and picks up the other, walking with a limp. Are we designing a limp into our machines by trying to please everybody? The CMT throughput server philosophy postulates that we can run faster if we don't limp, and indeed the UltraSPARC T1 based products we are launching these very days do just that. We decided to please throughput-oriented horizontal integer workloads at the expense of single-thread floating point.

The immediate payback we get from creating more specialized general purpose processors and systems is that a whole set of applications and deployment architectures take far fewer boxes. This benefit is compounded by the power efficiency of these UltraSPARC T1 boxes. Before you accuse me of spewing out unquantified generalities, I will offer quantification along two dimensions: within and across Moore's law process generations.

For purists interested in architectural prowess, comparing within a given manufacturing process technology is the fair comparison. Dealt a hand of cards (wafer cost, transistors, complexity, and power), architecture is the game of playing them best. But given exponential transistor increases across generations, ignoring the impact of process technology and relying on architecture alone leads to certain defeat. We must compare both within and across generations.

Within a generation CMT is roughly an order of magnitude improvement. Take the web-facing workloads I care about for some of my work: they run about 8 to 10 times faster on a Niagara system than on a contemporary general purpose SPARC processor consuming the same power, made in the same 90nm process, but carrying the limp of pleasing the single-thread and floating point constraints. Getting one order of magnitude out of the same power envelope and manufacturing technology is pretty compelling, but in the interest of full disclosure let's show the card we pulled out of our sleeve: Memory. CMT is all about making the most of system memory bandwidth, and in a within-generation comparison CMT plays with much more memory bandwidth in its hand. Did we cheat? Not really; architecture is also about interfaces, and optimizing the memory interface is part of playing our cards.

Comparing across generations means projecting the throughput ratio between CMT on the current vs. the next process technology node, while keeping power and cost as invariants. If you were expecting the answer to be another order of magnitude, I'd like to have some of what you are smoking. Architectural order of magnitude improvements are rare, so here we fall back to riding Moore's law. Having lowered your expectations, here is the good news: unlike previous limping approaches, CMT will give you nice integer factors (2x, for example) across Moore's law cycles. And that is all we can ask from a new architecture: a solid one-time jump to a different curve, and then climbing the new curve at least at the same rate as before the jump.
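
To make the two-dimensional comparison concrete, here is a minimal sketch in Python; the gain figures are illustrative assumptions echoing the rough numbers above (call it 9x within the 90nm node, 2x per node shrink), not measured results.

    # Two-dimensional comparison: within a process node and across nodes.
    # Both gain figures are illustrative assumptions, not measurements.
    WITHIN_GEN_GAIN = 9.0  # assumed ~8-10x throughput at equal power, same 90nm node
    PER_NODE_GAIN = 2.0    # assumed ~2x threads (hence throughput) per node shrink

    def projected_throughput(node_shrinks):
        """Relative throughput: a one-time architectural jump, then
        riding Moore's law across node_shrinks process nodes."""
        return WITHIN_GEN_GAIN * PER_NODE_GAIN ** node_shrinks

    for g in range(3):
        print(f"after {g} node shrink(s): {projected_throughput(g):.0f}x baseline")
    # after 0 node shrink(s): 9x baseline
    # after 1 node shrink(s): 18x baseline
    # after 2 node shrink(s): 36x baseline

The point the toy numbers make: the jump to the new curve happens once, and after that the curve climbs at the ordinary Moore's law rate.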

Our competitors don't neatly align their offerings to facilitate my simplistic two-dimensional comparison within and across process technology; they put their stuff out and compete. I will just say that the competitive data gives me a warm feeling about this cool technology, and defer to my fellow Sun bloggers, who cover the competitive angles much better and deeper than I do. I recommend following Welcome to the CMT era, a great repository of all things CMT at Sun by Richard McDougall, so I can walk away from trying to please everybody and shift back to my original topic: generality vs. specialization.

Is the sacrifice in generality worth the benefit? Does the sacrifice break our axiomatic belief in layered modular system design, in not caring about and not counting on the implementation details of other modules or layers? I will state a claim so counterintuitive that you might want to invite me in to explain myself and take a Breathalyzer test. We claim that a specialized CMT architecture actually broadens the applicability of the processor beyond what was possible with its original general purpose sibling. It retains enough generality to apply to other elements of the IP and telephony network infrastructure, which hopefully is the subject of a future posting, for which I so far have at most the heading. But if I don't get to it, feel free to give me a call and invite me.



[ Technorati: NiagaraCMT, ]

Wednesday Dec 07, 2005

How to tell a hardware from a software person

They both write code on a screen at a very high abstraction level, and they test their code before integrating it into a larger blob through highly abstracted interfaces. Verilog looks just like a structured programming language to the uninitiated, and tools keep most coders equally removed from the ultimate assembly language and transistor-level details.

Some argue that the large tooling costs of modern semiconductors put a perfection burden on hardware designers that, through insomnia, molds them into somber personalities. Software engineers are pictured, by opposition, as carefree characters always able to land on their feet by recompiling and patching. This is all passé. Hardware design has adopted a train model where fixes are phased in at multiple pre-defined points, and the impact of software defects can result in damages exceeding semiconductor tooling costs.

To tell software and hardware people apart, just ask CUI BONO?
Specifically: cui bono from Moore's law? Who benefits from Moore's law?
Moore's law renders hardware achievements obsolete, while turning slow or bloated software into achievements. A colleague and I designed the first LAN controller with embedded memory, a first that enabled packaging the entire controller in 24 pins. We put in 2 kilobytes of RAM buffers, solved the embedded memory yield issues, went for beers, and felt great about our achievement. Our bragging rights for that chip lasted about as long as our beers. Darn law. So the test is simple: if you are a victim of Moore's law you are in hardware; if you are a beneficiary of Moore's law you are in software.

Hardware is further victimized by Moore's law's constant pressure on price. Sometimes this leads to downward-spiraling prices for a given hardware function, and other times to increased capacity at a roughly constant price. Server processors have followed the latter path, namely the speed bump regime: successive processors push the clock frequency higher and thus deliver a performance benefit instead of a cost reduction. Software executes faster, in turn making a bigger and more complex software edifice viable.

But if all good things must come to an end, how long will this regime last? Moore's law is not out of gas yet, but cranking up processor clocks is getting harder and less productive. You have heard the reasons: power dissipation vs. frequency, and the impact of system memory latencies. Interestingly, the UltraSPARC T1 CMT anticipates this new regime of exponential growth in transistors without a good incentive to push the frequency further. Will customers demand a price reduction now that the speed bump is dead? Not if we transition instead to a thread bump regime. UltraSPARC T1 inaugurates this transition, and consecutive CMT generations offering thread increases commensurate with Moore's law should provide the bumps. (Note to self: contact the Niagara ad agency with the idea "We are the Bumps in Thread Bumps".)
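
Here is a minimal sketch of the two regimes, with made-up growth rates (1.3x clock per generation for the old speed bumps, 2x threads per generation for thread bumps); the 1.2GHz and 32-thread starting points merely echo the UltraSPARC T1.

    # Speed bumps vs. thread bumps: two ways of spending Moore's law transistors.
    # The growth rates are illustrative assumptions, not roadmap commitments.

    def speed_bump(base_ghz, generations, freq_growth=1.3):
        """Classic regime: same thread count, higher clock each generation."""
        return [base_ghz * freq_growth ** g for g in range(generations)]

    def thread_bump(base_threads, generations, thread_growth=2.0):
        """CMT regime: similar clock, more hardware threads each generation."""
        return [int(base_threads * thread_growth ** g) for g in range(generations)]

    print("clock (GHz):", [f"{f:.1f}" for f in speed_bump(1.2, 4)])  # 1.2, 1.6, 2.0, 2.6
    print("threads:    ", thread_bump(32, 4))                        # 32, 64, 128, 256

Single-thread software rides the first curve automatically; riding the second takes some work, which is where the next paragraph picks up.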

But wait a minute: does this mean that our carefree software developers are no longer automatic beneficiaries? Indeed, this time around they may have to sweat a bit more to turn additional threads into software achievements. Oh, and maybe multi-core processor designers can have simpler lives now that there is more repetition and there are fewer unique circuits to design, verify, and lose sleep over. Not quite a reversal between victim and beneficiary, but we may need a better test than CUI BONO in the future.



[ Technorati: NiagaraCMT, ]

Tuesday Dec 06, 2005

The warmth of vacuum tubes

I grew up listening to vacuum tube nostalgia. Radio technicians could diagnose a radio receiver with just a screwdriver, and sometimes even fix it. But beyond that, the transition from vacuum tubes to semiconductors was a religious topic within Radio Amateur circles. It got harder to build your own gear, and some said it didn't sound the same (the non-linearities of transistor amplifiers, you know). But the main complaint was not technical: Radio Amateur operators missed vacuum tubes because they kept their hands warm during the cold winter nights.

Radio Amateur anecdotes are some of the most memorable stories I could tell; maybe some other day, on some other blog. And I would also join the mourning of the vacuum tube, if it weren't for a more profound and recent displacement I need to mourn: the displacement of the HF Radio Amateur at the hands of the Internet... A 3 kHz voice channel, shared, that may or may not work on a given day to a given place, displaced by a DSL line and a browser. I am not the only deserter. Just look at the roofs of a city like Montevideo, once home to the highest density of Yagi antennas on the planet; walk its streets today and there are few antennas to be seen. Victims of the ubiquitous Internet.

Yet I appreciate irony, and with modern life's Internet addiction filling my once HAM radio nights, I discover that the Athlon laptop gets warm just like the old vacuum tube transceivers. Déjà vu. We replaced hot vacuum tubes with cooler solid state radio, then things got pretty hot when we put lots of transistors in NMOS integrated circuits. I recall my first IC design in NMOS: clocked at a meager 10MHz, it required a ceramic package and got too hot to touch. Chips got cooler again with CMOS, so we started building bigger and faster semiconductors, up to the point that the semiconductors running a lowly laptop keep my hands warm. The Internet server infrastructure (replacing the Ham radio ether) requires major ventilation and air conditioning. At the rate we are going we might have to host the planet's infrastructure at the poles. How is that for an idea: dual-home the entire net infrastructure to the North and South poles. No single point of failure, affordable land, maximum redundancy, cooling by keeping the windows open, and solar cells 24 hours a day (well, for half the year). I digress, but you read it here first: hosting the net at the poles...

What is next? How do we pull the CMOS cool-device trick again? For the moniker we can certainly reuse the letters CM; that is a start. As for the substance, let me narrate a customer lab visit we had here in Newark. We were showing off our first UltraSPARC T1 bring-up machine, verbally conveying how naturally Horizontal Micro-scaling fits telephony infrastructure network elements. We brought Solaris up, showed our demo, and asked the customer to do the honors and check how many processors Solaris reported. Impressing somebody by printing the number 32 on a screen may not get you very far socially with your friends, or at a bar, but for a skeptical techie the number 32 out of a single processor socket was meaningful. He lived on that side of the fine line between skepticism and paranoia. He touched the processor and, feeling it cold, accused us of smoke and mirrors, basically of running the demo from a different machine. We proved the accuser wrong, and ended up making the unintended point that Niagara really is a Cooler technology. We earned the right to reuse the letters for the next cool technology: CMT.
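
If you want to repeat the party trick, here is a sketch of the check; on the Solaris machine itself one would count the lines printed by the psrinfo utility, and the Python line below is just a portable stand-in.

    # Portable stand-in for the demo check; on Solaris, count the lines
    # printed by the psrinfo(1M) utility instead.
    import os
    print(os.cpu_count())  # a single-socket UltraSPARC T1 box should report 32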

But before you start asking how to keep your hands warm in the CMT era, fear not: there is still Memory, that is, plenty of DIMMs to keep the operator warm. What a coincidence: every train of thought takes me back to the Memory theme.



[ Technorati: NiagaraCMT, ]
