I thought of a way of explaining the benefit of CMT (or more specifically,
interleaved multithreading -
see this article for details) using an analogy the other day.
Bear with me as I wax lyrical on computer history...
Way back in the origins of the computer, there was only one process
(as well as one processor). There was no operating system, so in turn
there were no concepts like scheduling, context switching or multi-processing.
What am I getting at? Well, let me pick out a few of the advances in
computing, so I can explain why interleaved multithreading is simply the next logical step.
The first computer operating systems
simply replaced (automated) some of the tasks that were undertaken
manually by a computer operator - load a program, load some utility
routines that could be used by the program (e.g. I/O routines), record
some accounting data at the completion of the job. They did nothing
during the execution of the job, but then there was nothing for them to
do - no other work could be done while the processor sat effectively
idle, such as while waiting for an I/O to complete.
Then multi-processing operating systems were developed. Suddenly we
had the opportunity to use the otherwise wasted CPU resource while one
program was stalled on an I/O. In this case the O.S. would switch in
another program. Generically this is known as scheduling, and
operating systems developed (and still develop) more sophisticated
ways of sharing out the CPU resources in order to achieve the best
throughput and responsiveness for the workload.
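As a toy illustration of why this pays off (a modern Python sketch, not how early operating systems were written - time.sleep() just stands in for a blocking I/O request): when several jobs spend most of their time blocked on I/O, overlapping them means the total wall time approaches that of the longest wait, not the sum.

```python
# Toy illustration: overlapping I/O waits. While a thread sleeps
# ("blocked on I/O"), the CPU is free for others, just as an OS
# switches in another program when one stalls on an I/O.
import threading
import time

def job(io_seconds):
    time.sleep(io_seconds)  # "blocked on I/O" - no CPU needed

start = time.monotonic()
jobs = [threading.Thread(target=job, args=(0.2,)) for _ in range(4)]
for j in jobs:
    j.start()
for j in jobs:
    j.join()
elapsed = time.monotonic() - start
# Four 0.2-second I/O waits overlap, so the elapsed time is close to
# 0.2 s rather than the 0.8 s of running the jobs back-to-back.
```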
At this point we had enshrined in the OS the idea that CPU resource
was precious, not plentiful, and there should be features designed
into the system to minimize its waste. This would reduce or delay the
need for that upgrade to a faster computer as we continued to add new
applications and features to existing applications. This is analogous
to conserving water to offset the need for new dams & reservoirs.
With CMT, we have now taken this concept into silicon. If
we think of a load or store to or from main (uncached) memory as a
type of I/O, then thread switching in interleaved multithreading is
just like the idea of a voluntary context switch.
We are not giving up the CPU for the duration of the "I/O", but we are
giving up the execution unit, knowing that if there is another thread
that can use it, it will.
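To make the analogy concrete, here is a small software simulation of the idea (a sketch only - real interleaved multithreading selects among ready hardware threads every cycle in silicon, and the operation lists, fixed latency and round-robin policy below are all simplifying assumptions). Each thread is a list of operations: "compute" takes one cycle of the execution unit, while ("mem", N) issues in one cycle and then stalls the thread for N cycles - our "I/O".

```python
def simulate(threads, interleave):
    """Run the threads to completion; return (total_cycles, busy_cycles).

    interleave=False: run one thread at a time, idling the execution
    unit through every memory stall. interleave=True: each cycle, issue
    from any ready thread (round-robin), so a stall is hidden whenever
    another thread has work - the "voluntary context switch" in silicon.
    """
    n = len(threads)
    pc = [0] * n         # next operation index for each thread
    ready_at = [0] * n   # cycle at which a stalled thread may issue again
    cycle = busy = 0
    last = -1            # last thread to issue (for round-robin order)
    while any(pc[i] < len(threads[i]) for i in range(n)):
        if interleave:
            order = [(last + 1 + k) % n for k in range(n)]
        else:
            # stay on the first unfinished thread, even while it stalls
            order = [next(i for i in range(n) if pc[i] < len(threads[i]))]
        for i in order:
            if pc[i] < len(threads[i]) and ready_at[i] <= cycle:
                op = threads[i][pc[i]]
                pc[i] += 1
                if op != "compute":              # ("mem", N): stall N cycles
                    ready_at[i] = cycle + 1 + op[1]
                busy += 1
                last = i
                break
        # if no thread was ready this cycle, the execution unit idled
        cycle += 1
    return cycle, busy
```

With two threads that each do a compute, a 3-cycle memory stall and another compute, the non-interleaved run idles through both stalls, while the interleaved run overlaps much of each thread's stall with the other's work - the same amount of useful work completes in fewer cycles.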
In a way, we are delaying the need to increase the clock rate or
pipelining capabilities of the cores by taking this step.
Now the underlying details of the implementation can be more complex
than this (and they are getting more complex as we release newer CPU
architectures like the UltraSPARC T2 Plus - see the
T5140 Systems Architecture Whitepaper for details), but this
analogy to I/Os and context switches helps me understand why we have
chosen this direction.
To continue to throw engineering resources at
faster, more complicated
CPU cores seems to be akin to the idea of the mainframe (the closest
descendant to early computers) - just make it do more of the same type
of work, faster.