UltraSPARC T1 and Solaris: Threading Throughout

We are unveiling several UltraSPARC T1 (aka Niagara) based servers today. If you haven't heard about this chip and the systems that will house it, you really should have a look. Seriously, this chip is impressive. Working at Sun, every now and again I'm fortunate enough to hear about new products and technologies we've got coming down the pipe. When I first heard about the Niagara (T1) processor, I was in disbelief.

32 logical CPUs presented by a single chip at 72 Watts?! Simply amazing.

Housed in the T1's 2-square-inch package are 8 processor cores, each capable of running 4 hardware threads simultaneously. For me, the gravity of all this really sank in when I invoked psrinfo(1M) on a test box and watched as the top of the output scrolled out of view in my xterm:
esaxe@ontario-mc25$ psrinfo
0       on-line   since 10/14/2000 20:54:37
1       on-line   since 10/14/2000 20:54:39
2       on-line   since 10/14/2000 20:54:39
3       on-line   since 10/14/2000 20:54:39
4       on-line   since 10/14/2000 20:54:39
5       on-line   since 10/14/2000 20:54:39
6       on-line   since 10/14/2000 20:54:39
7       on-line   since 10/14/2000 20:54:39
8       on-line   since 10/14/2000 20:54:39
9       on-line   since 10/14/2000 20:54:39
10      on-line   since 10/14/2000 20:54:39
11      on-line   since 10/14/2000 20:54:39
12      on-line   since 10/14/2000 20:54:39
13      on-line   since 10/14/2000 20:54:39
14      on-line   since 10/14/2000 20:54:39
15      on-line   since 10/14/2000 20:54:39
16      on-line   since 10/14/2000 20:54:39
17      on-line   since 10/14/2000 20:54:39
18      on-line   since 10/14/2000 20:54:39
19      on-line   since 10/14/2000 20:54:39
20      on-line   since 10/14/2000 20:54:39
21      on-line   since 10/14/2000 20:54:39
22      on-line   since 10/14/2000 20:54:39
23      on-line   since 10/14/2000 20:54:39
24      on-line   since 10/14/2000 20:54:39
25      on-line   since 10/14/2000 20:54:39
26      on-line   since 10/14/2000 20:54:39
27      on-line   since 10/14/2000 20:54:39
28      on-line   since 10/14/2000 20:54:39
29      on-line   since 10/14/2000 20:54:39
30      on-line   since 10/14/2000 20:54:39
31      on-line   since 10/14/2000 20:54:39
Yes, I know the system's clock is off by a few years. But seriously, output like this is something I'm used to seeing on monsters like the Sun Fire E25K and Sun Fire E6900 servers. It was mind-expanding indeed to see this sort of output from a small box with but a single physical processor.
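Those 32 logical CPU ids map straight back onto the chip's geometry. As a rough sketch (assuming a simple numbering scheme where consecutive ids belong to the same core, which is an illustration rather than a statement about how the T1 actually enumerates strands), you can recover a core and hardware strand from an id like this:

```python
CORES = 8
STRANDS_PER_CORE = 4

def cpu_to_core_strand(cpu_id):
    """Map a logical CPU id (0-31) to a (core, strand) pair,
    assuming ids are assigned consecutively within each core."""
    if not 0 <= cpu_id < CORES * STRANDS_PER_CORE:
        raise ValueError("invalid logical CPU id: %d" % cpu_id)
    return cpu_id // STRANDS_PER_CORE, cpu_id % STRANDS_PER_CORE

# Under this assumed numbering, logical CPUs 0-3 share core 0's
# pipeline, and 28-31 share core 7's.
```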

Like the UltraSPARC IV and UltraSPARC IV+, the T1 implements a Chip Multi-Threading (CMT) architecture, which simply means it is able to run multiple threads of execution simultaneously. For a nice bit of background on CMT technology, check out this article (I won't rehash it all here).

Solaris, as you might imagine, is a natural fit for CMT systems like the T1000 and T2000, as it has operated efficiently across systems with twice as many CPUs (and more) for years.

For CMT, however, achieving good performance requires more than simply being able to scale. In a previous blog entry I talked about some of the CMT scheduling optimizations we've implemented in Solaris. Andrei will be discussing these optimizations more specifically in the context of the T1 processor (so again, I won't rehash), but it is worth underscoring that, especially for threaded processor architectures, the optimized thread placement and load balancing performed by the scheduler is a huge performance win.
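To make the placement idea concrete, here is a toy model of the policy. This is not the actual Solaris dispatcher, just an illustration: it spreads runnable threads across cores breadth-first, so no core doubles up strands until every core has work, and each thread gets as much of a pipeline to itself as possible:

```python
def place_threads(n_threads, n_cores=8, strands_per_core=4):
    """Assign threads to cores breadth-first: use one strand on
    every core before using a second strand on any of them."""
    load = [0] * n_cores              # threads currently on each core
    placement = []
    for _ in range(n_threads):
        # Pick the least-loaded core (ties broken by lowest id).
        core = min(range(n_cores), key=lambda c: load[c])
        if load[core] >= strands_per_core:
            raise RuntimeError("more threads than hardware strands")
        load[core] += 1
        placement.append(core)
    return placement

# With 8 runnable threads, each lands on its own core:
# place_threads(8) -> [0, 1, 2, 3, 4, 5, 6, 7]
```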

Looking ahead, it's likely that workload characterization is going to be an important (and interesting) area of research. For example, we know that throughput on threaded processor architectures is maximized when threads running on the same core (sharing the same pipeline) are able to effectively execute in each other's stall cycles. CPI (Cycles Per Instruction) should therefore be an interesting metric to note when trying to characterize a given workload's scaling ability on this architecture. What other workload characteristics and metrics will be important/useful to collect and observe? We've got our work cut out for us. :)
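As a back-of-the-envelope illustration of the metric (the sample values below are made up, and these are not actual performance-counter event names), CPI is just cycles divided by retired instructions, and a high-CPI, stall-heavy thread is exactly the kind that leaves pipeline slots for a core-mate to execute in:

```python
def cpi(cycles, instructions):
    """Cycles Per Instruction: higher values mean more stall cycles
    per useful instruction, i.e. more slack for a sibling strand
    sharing the same core's pipeline."""
    if instructions <= 0:
        raise ValueError("need a positive instruction count")
    return cycles / instructions

# Two hypothetical sampled workloads:
compute_bound = cpi(1_200_000, 1_000_000)   # ~1.2: keeps the pipeline busy
memory_bound  = cpi(6_000_000, 1_000_000)   # ~6.0: mostly stalled on memory
```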

Comments:

hi there,

thanks for the info.

it will be nice to see some prices mentioned in some of the blogs -- easy friendly table with some common configuration vs prices for say 1 CPU, 2 CPU, 4 CPU systems.

thank you,

BR,
~A

Posted by anjan bacchu on December 06, 2005 at 10:23 AM PST #

Here is pricing information for the T1000 and the T2000. Both systems have one physical processor, and appear to Solaris as 32 logical CPUs (8 cores x 4 threads per core). HTH... -Eric

Posted by Eric Saxe on December 07, 2005 at 09:02 AM PST #
