Friday Feb 10, 2006

"Putting a large number of these medium-sized cores on a chip is not optimal." - FUD

This comment came from a blog. I'm not sure whether it was deliberate FUD or whether the author was simply unaware that the old rules of thumb are no longer relevant in today's world.

Today almost all business applications are either multi-threaded or multi-process [Firefox, Apache, Oracle, MySQL, PostgreSQL, Java, ...], so you are more interested in the performance of many threads than of a single thread. The old thinking goes: cache misses are bad, so do anything to reduce them. Build a bigger cache and keep other cores away from it, because on a cache miss the processor stalls, and that reduces performance. The real problem is not cache misses but processor stalls. The T1 has 4 hardware threads per core, so when one thread has a cache miss the core can work on another thread and the processor doesn't stall. So does the T1 have more cache misses than other processors? Maybe. Does that reduce the performance of the T1? No. More theory here, more practice here.
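To put rough numbers on the stall argument, here is a minimal sketch of an idealised model. It assumes every thread alternates between a burst of useful work and a wait on a cache miss, that the core runs one thread at a time, and that switching threads costs nothing. The function name and the cycle counts (25 and 75) are made up for illustration; they are not T1 measurements.

```python
def core_utilization(threads, compute_cycles, miss_cycles):
    """Fraction of cycles a core spends on useful work in the idealised model.

    Each thread computes for compute_cycles, then waits miss_cycles on a
    cache miss. While one thread waits, the core can run another one.
    """
    if threads < 1 or compute_cycles <= 0 or miss_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, miss_cycles >= 0")
    # One thread alone keeps the core busy compute / (compute + miss) of the time.
    # Extra threads fill the gaps until the core is saturated at 100%.
    return min(1.0, threads * compute_cycles / (compute_cycles + miss_cycles))


# Illustrative numbers only: 25 cycles of work per 75-cycle miss.
for n in range(1, 5):
    print(n, "thread(s):", format(core_utilization(n, 25, 75), ".2f"))
```

In this toy model the miss rate is identical in every row; only the number of threads available to cover the wait changes. That is the point of the paragraph above: the misses are still there, but the core no longer sits idle during them.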

[ T: NiagaraCMT CoolThreads CPU ]

Wednesday Dec 14, 2005

Why Does the Sun Fire T1000 use 180 Watts?

If you looked at computers in datacenters 5, 10, 15 or 20 years ago, the thing most cabinets had in common was one 30 Amp 220V power plug. From this we could see that each cabinet could draw 6600 Watts. So for years we had a simple relationship between floor area and the power/cooling requirements of the datacenter: we could just roll cabinets in and out without having to worry about power or cooling. Around 5 years ago small 2-processor servers started entering the datacenter. We could get 12 to 16 of these in a rack, which was great, but the power and cooling requirement per rack often increased to as much as 20000 Watts. This is just too much for the typical datacenter, so we end up putting servers in small piles, giving us the Lilliput server farm. So once power and cooling are factored in, the TCO per server is high.

When designing a 1U server for server farms, we are constrained by the server room door, 42RU (Rack Units), and by one power socket, 6600 Watts. Add in power sequencers, smaller racks etc. and we may have only 32RUs free, giving a power range of 140 Watts to 200 Watts per RU. The T1000 comes in at 180 Watts, putting it in the sweet spot of the power vs space trade-off.
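The arithmetic behind that budget can be checked in a few lines. This is only a sketch using the figures from the post (30 Amps, 220 Volts, 42 and 32 rack units, 180 Watts per server); the function names are my own. Plain division gives roughly 157 and 206 Watts per RU, in the same neighbourhood as the 140 to 200 Watt range quoted above. The post does not say how its range was rounded, so treat the difference as unexplained rather than as an error.

```python
def circuit_watts(amps, volts):
    """Power available from one circuit, ignoring any derating."""
    return amps * volts


def watts_per_rack_unit(amps, volts, rack_units):
    """Power budget per rack unit if the circuit is shared evenly."""
    return circuit_watts(amps, volts) / rack_units


budget = circuit_watts(30, 220)  # one 30 Amp 220V plug per cabinet
print("cabinet budget:", budget, "Watts")
print("per RU, 42 RU:", round(watts_per_rack_unit(30, 220, 42), 1))
print("per RU, 32 RU:", round(watts_per_rack_unit(30, 220, 32), 1))

# 32 one-RU servers at 180 Watts each, compared with the cabinet budget.
t1000_rack = 32 * 180
print("32 x 180 Watts:", t1000_rack, "fits" if t1000_rack <= budget else "does not fit")
```

With these numbers a rack with 32 free units filled with 180 Watt servers draws 5760 Watts, which stays under the 6600 Watt plug.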


Tuesday Dec 06, 2005

Is A Sun Fire T2000 SMP ?

Is A Sun Fire T2000 SMP (Symmetric MultiProcessor) ?

I've seen this question asked a few times. It needs to be broken down into 2 parts.

How many sockets does the T2000 have? - There is 1 socket for an UltraSPARC T1 processor.

The T1 processor contains 8 cores running 4 threads per core, giving us 32 simultaneous processing threads on 1 die.

Symmetric MultiProcessing describes the operating system feature of scheduling any task to any processing thread (previously we would have used the term CPU, but it has become confused). It requires some underlying hardware features, cache coherency for example, to function efficiently.

So the T2000 is an SMP computer, but all of the SMP functions happen on 1 piece of silicon. This is a wow moment! When you think of a T2000 as a computer having 8 cores, where cache coherency and other SMP processes take only a few cycles, you see why it can outperform a multi-socket Xeon box.

First login to a Sun Fire T2000

So what do you see when you first log in to a Sun Fire T2000?

Let's do a simple test:

which sh
file /usr/bin/sh
/usr/bin/sh: ELF 32-bit MSB executable
SPARC Version 1, dynamically linked, stripped

Do the same thing on any SPARC Solaris machine, from Sun or others, and you will get the same result. This means that an executable compiled today or ten years ago will run! So even though the Sun T2000 is the biggest thing to hit the computer industry in ten years, it has over 10 years' worth of software out of the box. The next question is how it will scale. When you look here and here you can see that the T2000 scales very well.
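If you are curious where `file` gets "ELF 32-bit MSB ... SPARC" from, those fields sit in the first 20 bytes of the binary. Below is a rough sketch that decodes them, based on the standard ELF header layout (class at byte 4, byte order at byte 5, machine type at offset 18). The helper names and the hand-built `SAMPLE_HEADER` are my own, so the example runs even without a SPARC box to hand; this is not how `file` itself is implemented.

```python
import struct

ELF_MAGIC = b"\x7fELF"
CLASSES = {1: "32-bit", 2: "64-bit"}                       # EI_CLASS values
BYTE_ORDERS = {1: "LSB", 2: "MSB"}                         # EI_DATA values
MACHINES = {2: "SPARC", 18: "SPARC32PLUS", 43: "SPARCV9"}  # a few e_machine values


def describe_elf(header):
    """Return (class, byte order, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF header")
    elf_class = CLASSES.get(header[4], "unknown class")
    byte_order = BYTE_ORDERS.get(header[5], "unknown byte order")
    # e_type and e_machine are 16-bit fields stored in the file's own byte order.
    endian = ">" if header[5] == 2 else "<"
    _e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return elf_class, byte_order, MACHINES.get(e_machine, "machine %d" % e_machine)


def describe_file(path):
    """Read the header from a file on disk, e.g. /usr/bin/sh."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))


# Hand-built header: 32-bit, big-endian, executable, SPARC.
SAMPLE_HEADER = ELF_MAGIC + bytes([1, 2, 1]) + bytes(9) + struct.pack(">HH", 2, 2)
print(describe_elf(SAMPLE_HEADER))
```

On a Solaris SPARC machine, `describe_file("/usr/bin/sh")` should report the same class, byte order and machine that `file` printed above.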

So let's have a quick look around.

System Configuration: Sun Microsystems sun4v Sun Fire T200
System clock frequency: 200 MHz
Memory size: 32760 Megabytes

========================= CPUs ===============================================

Location CPU Freq Implementation Mask
------------ ----- -------- ------------------- -----
MB/CMP0/P0 0 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P1 1 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P2 2 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P3 3 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P4 4 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P5 5 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P6 6 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P7 7 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P8 8 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P9 9 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P10 10 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P11 11 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P12 12 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P13 13 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P14 14 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P15 15 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P16 16 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P17 17 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P18 18 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P19 19 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P20 20 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P21 21 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P22 22 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P23 23 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P24 24 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P25 25 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P26 26 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P27 27 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P28 28 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P29 29 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P30 30 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P31 31 1200 MHz SUNW,UltraSPARC-T1

========================= IO Configuration =========================

Location Type Slot Path Name Model
----------- ----- ---- --------------------------------------------- ------------------------- ---------
IOBD/NET0 PCIE IOBD /pci@780/pci@0/pci@1/network@0 network-pciex8086,105e
IOBD/NET1 PCIE IOBD /pci@780/pci@0/pci@1/network@0,1 network-pciex8086,105e
IOBD/PCIE0 PCIE 0 /pci@780/pci@0/pci@8/fibre-channel@0 fibre-channel-pciex1077,2432
IOBD/PCIE0 PCIE 0 /pci@780/pci@0/pci@8/fibre-channel@0,1 fibre-channel-pciex1077,2432
IOBD/PCIE-1 PCIE IOBD /pci@7c0/pci@0/pci@1/pci@0/isa isa
IOBD/PCIE-1 PCIE IOBD /pci@7c0/pci@0/pci@1/pci@0/usb@5 usb-pciclass,0c0310
IOBD/PCIE-1 PCIE IOBD /pci@7c0/pci@0/pci@1/pci@0/usb@6 usb-pciclass,0c0310
IOBD/PCIE-1 PCIE IOBD /pci@7c0/pci@0/pci@1/pci@0/ide ide-pci10b9,5229
IOBD/PCIE-1 PCIE IOBD /pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2 LSILogic,sas-pci1000,50 LSI,1064
IOBD/PCIX1 PCIX 1 /pci@7c0/pci@0/pci@2/network@0 network-pciex8086,105e
IOBD/PCIX1 PCIX 1 /pci@7c0/pci@0/pci@2/network@0,1 network-pciex8086,105e

The first thing to note is that this Sun Fire T2000 has 32G of memory. The next thing is, wow, 32 processors and four gigabit Ethernet ports.
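Counting 32 rows by eye gets old quickly, so here is a small sketch that counts the CPU rows in a listing like the one above, grouped by chip. It assumes each row starts with a location of the form MB/CMP<chip>/P<n> followed by the CPU id and a frequency in MHz, exactly as pasted here; other machines may lay the listing out differently. The function name and the shortened `SAMPLE` text are my own.

```python
import re
from collections import Counter

# One CPU row: location, CPU id, frequency, implementation.
CPU_ROW = re.compile(
    r"^[ \t]*MB/CMP(\d+)/P(\d+)\s+(\d+)\s+(\d+)\s+MHz\s+(\S+)",
    re.MULTILINE,
)


def strands_per_chip(listing):
    """Count hardware threads (the rows Solaris shows as CPUs) per CMP chip."""
    counts = Counter()
    for row in CPU_ROW.finditer(listing):
        counts[int(row.group(1))] += 1
    return dict(counts)


# Shortened sample in the same layout as the listing above.
SAMPLE = """\
MB/CMP0/P0 0 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P1 1 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P2 2 1200 MHz SUNW,UltraSPARC-T1
MB/CMP0/P3 3 1200 MHz SUNW,UltraSPARC-T1
"""

print(strands_per_chip(SAMPLE))
```

Fed the full listing, this should report 32 entries, all on chip 0. Dividing by the 4 threads per core mentioned earlier gives the 8 cores.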

This reminds me of the first time I logged into an Enterprise 10000 (E10K) in 1997; from memory it was half loaded, with 32 UltraSPARC II processors and 32G of memory. The E10K was a bit bigger than a large refrigerator. Now we have a similar system that requires only 2U of rack space. Back then the first question was how we were going to use all that processing power.

We had a few large applications that needed that much processing power, and back then we needed to do some tuning of databases and web servers to get them to scale to 32 CPUs. Today, after 8 years, all these applications have defaults that make them run well on a Sun Fire E25K, where we can have over 140 cores. So to most Solaris applications a 32 "CPU" machine is mid-sized: no tuning necessary.

Another great thing about the E10K was that you could partition it on the fly. The unit of granularity was 4 CPUs, so you could have up to 16 OS instances running in one E10K chassis. People loved this feature, and that drove Sun to develop a rich resource management framework. In Solaris 10 you now have psrset, containers, zones and other resource management tools, so if you currently have a whole rack of lightly loaded systems, they can keep their IP addresses and resource QoS and move onto one T2000.

So just as the E10K was years ahead of the competition in the late 90s, the T2000 is years ahead of the competition today.



