UltraSPARC T1 in a Fireball - the ugly duckling
By denissheahan on Dec 06, 2005
Just after I arrived in the Niagara Arch group we taped out UltraSPARC T1 and
10 weeks later we had first silicon. The guys in the bringup team spent
many late nights and got Solaris booted in a couple of weeks.
Now what? We knew what the performance should be in theory but we needed
to prove this as quickly as possible. Silicon verification was ongoing
in Sunnyvale using all the available systems but we managed to beg one
from the team to do some performance evaluation.
The system that we received was, without exception, one of the ugliest I have
ever seen. It was called a Fireball and was an UltraSPARC T1 board jammed sideways
into a 4U deskside server. It had an industrial 6 inch fan on it that
was irritatingly loud. There were cables and wires everywhere to aid debug.
In the front were slots for 8 old full height SCSI disks. It looked like
no mechanical engineer had been involved in its creation. A picture of this
would later appear in Jonathan's blog.
Little did we know that we would come to love these systems and that
they would still be involved in performance testing a year later.
The system had limited I/O capabilities so we decided to initially test
a throughput CPU/memory Java benchmark. Initial chips were only
rated for 800MHz but you cannot keep a good performance engineer down.
We worked out a way to hack the reset code to drive the chips to 1.2GHz
by increasing the core voltage. As these were initial silicon samples
we didn't know what to expect. We tested a number of chips until we
found 3 that could run at 1.2GHz.
After that it was mostly software. When working with very early systems,
firmware and OS bits are hand built, and panics and powercycles are common.
Because Niagara is such an exciting new technology, however, people were
prepared to take a lot of pain to run early workloads. It took about
a week to get the right software stack in place but it was worth the wait.
The initial number at 800MHz was nearly 100k Ops/sec. When we
cranked it up to 1.2GHz we got 129k with minimal tuning. We were astounded
at how the UltraSPARC T1 threads absorbed work.
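
As a rough back-of-the-envelope check on those two numbers, the sketch below
compares the clock increase with the throughput increase. It assumes the
800MHz result was exactly 100k Ops/sec (the post only says "nearly"), so the
ratios are approximate, and the function name is made up for illustration.

```python
def scaling_ratios(f_low_mhz, f_high_mhz, ops_low, ops_high):
    """Return (clock ratio, throughput ratio, throughput gained per unit of clock)."""
    clock_ratio = f_high_mhz / f_low_mhz
    throughput_ratio = ops_high / ops_low
    return clock_ratio, throughput_ratio, throughput_ratio / clock_ratio


# Assumption: 100_000 stands in for "nearly 100k" Ops/sec at 800MHz.
clock_ratio, throughput_ratio, efficiency = scaling_ratios(800, 1200, 100_000, 129_000)

print(f"clock ratio:        {clock_ratio:.2f}x")
print(f"throughput ratio:   {throughput_ratio:.2f}x")
print(f"scaling efficiency: {efficiency:.2f}")
```

Since the 800MHz figure was a little under 100k, the real throughput ratio
would have been slightly higher than this estimate.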
The software to access hardware performance counters was not yet in Solaris
so we scrambled to add this functionality. What they revealed was that the
utilization of the Niagara pipe was nearly 70%. In 12 years of SPARC performance work
I had never seen a number that high. Not only was the silicon working
beautifully but the pre-silicon simulations had been right.
We rushed to run other CPU/memory benchmarks including an internal XML
test and got similar results.
A few weeks later I gave a presentation where I first showed the standard
CMT/Niagara slide with the 4 threads on a core and the 8 cores absorbing
all the stall. I'm sure most folks in the room were moaning to themselves
"here we go again". Then I put up a slide which simply stated "And now
it is real" with the Java throughput and XML results. The age of CMT had