JE and the Sun T1000/T2000 ("Niagara")
Over the past few months, I've had the opportunity to run some tests on a Sun T2000 (a/k/a "Niagara") box. Several people have asked me how JE does in a CMT (Chip Multi-Threading) environment, so I wanted to share my notes. I also ran tests on a T1000, which has the same processor speed as the T2000, and I've noted relevant differences where appropriate.
Much of this text is about the T2000 hardware and its configuration. If you aren't interested in that aspect of the machine, please skip ahead to the Performance section. I have also interspersed some comments about the Sunfire V20Z (dual 1.8GHz Opteron x86 box), since it represents a fairly well-known baseline for many readers.
The T2000 configuration, as tested, is an 8x4 (8 cores by 4 strands per core) UltraSPARC processor (1GHz), with 8GB of memory and two Seagate 73GB Serial SCSI drives.
The T1000 configuration, as tested, is
a 24 "strand" (6x4) processor (1GHz), with 2GB of memory and a single
Seagate 500GB SATA drive. Since the T1000 only ships with 0 or 1 disk
drive (there's only room for one drive inside the box), and the only
option is an 80GB drive, I chose the no-drive option and put my own in
(*).
Configuring the Hardware
The T2000 comes on a wooden pallet. It is apparently assembled in China.
Both machines are very slick looking. The T2000 is a 2U unit (the T1000 is 1U). Neither of them has a power switch. The T2000 has dual power supplies (it ships with two power cords), three fans, and four disk drive slots on the front panel. The power supplies, fans, and disk drives are all hot-swappable.
As others have noted, the green LEDs for power and the disk drives look cool. While most systems indicate disk activity by blinking the disk drive LEDs on, the Niagara series keeps the LED on to indicate inactivity and blinks it off for activity. You can also use the Advanced Lights Out Manager (ALOM) (a/k/a SCP or Service Control Processor) to turn on a white blinking "find me" LED, so that if you have multiple units you can remotely help someone in the data center find the right one. The Sun engineers seem to have made this LED brighter than the one on the V20Z (I suspect intentionally).
The ALOM is quite useful and lets you use the 'poweron', 'reset', and 'poweroff' commands and do some minimal configuration. The ALOM lets you talk to its own command-line processor or switch over (and back) to the actual system tty console. There's no video card on the T2000. You can also ssh into the ALOM if you don't want to mess with serial anymore.
The ALOM also allows you to enable and disable memory cards and NICs. At one point I installed some incompatible memory in the T1000 to bring it above 2GB. This caused the ALOM to disable all of the memory, including the original Sun memory that was still "good". After I removed the incompatible memory, the machine wouldn't boot, claiming an incorrect memory configuration. Uh oh. A call to Sun cleared that up: I just needed to use the 'enablecomponent' command to re-enable the Sun memory, and all was well. Nevertheless, it is nice that you can use the ALOM to disable memory and other devices on the main unit.
When you first power it on, the ALOM goes through many diagnostics and POSTs. Once the ALOM has booted, you can boot the machine, which takes quite a while. Fortunately, Solaris 10 doesn't need to be rebooted as often as, say, Windows. My V20Z has been up for 112 days.
The top-cover design of the T2000 chassis is better than the T1000's, which in turn is better than the V20Z's. The V20Z cover is a pain to remove and replace, so mostly I just leave it "almost closed". The T1000 is better, but I still have to give the cover a little shove forward to get it to slide backwards. The T2000 cover works pretty well: it comes off easily and goes back on pretty easily. There are sensors on both machines, so the SCP will tell you when the cover is off (and refuse to boot the machine). It is a really nice plus that Sun has printed diagrams and instructions right on the covers for easy reference (e.g. how to replace memory, fans, disks, etc.).
The fans are, like the V20Z's, very noisy, but this is expected for a machine that is going to live in the data center. When you plug the machine in, some of the fans start. My first reaction was "Gee, this machine isn't nearly as noisy as the V20Z". But after commanding the ALOM to power on the actual Niagara processors, either more fans start or the fans cycle to a higher speed (I didn't bother to determine which) and all hell breaks loose (noise-wise). I donned my noise-cancelling headphones, configured the machine as quickly as possible, and relocated it from my office to the data center.
The insides of the T2000 are neat and tidy. All Serial SCSI cables are neatly run and even tied down.
The disk drives on the T2000 are the newer 2.5" form factor, rather than the usual 3.5" size. My compliments to Sun for this forward thinking (improved space utilization) as well as for using Serial SCSI (the T1000 uses 3.5" SATA). They are serious about space and energy savings (more on this below). But I do wish they offered configurations with larger-capacity drives; the V20Z has the same problem. I was told by the T1000 Product Manager that the Niagara boxes are meant to be web servers and that all the big-iron storage is on another machine somewhere (presumably the Galaxy-based boxes that they're working on). The T2000 has PCI-E slots available, so you can attach more storage to it (e.g. a RAID or SAN).
The memory on both units is DDR2 PC2-4200 (533MHz), registered, ECC. There are 16 slots in the T2000 and 8 slots in the T1000. Both the T1000 and T2000 shipped with 512MB DIMMs (2GB total on the former and 8GB on the latter). If you buy non-Sun memory for these units, be sure to ask the vendor whether they've tested it on a T1000 or T2000.
The T2000 has a DVD drive, but the T1000 does not. Solaris 10 was pre-installed, so I haven't had to use the DVD drive yet. See the footnote below regarding Solaris installation on the T1000 (it has no DVD or CD drive). There is no video card on the machine (unlike the Sunfire V20Z), so to get the T2000 up and running, you need a serial terminal emulator. I used HyperTerminal on my notebook, set to 9600-8-N-1 and emulating a VT100, and it was ready for action. The Solaris install then asks you a bunch of the usual questions (approximately 5 minutes' worth) about your configuration (DHCP or fixed IP address, network addresses, gateways, netmasks, IPv6, time zone, time, root password, etc.). After rebooting, you have to use the serial terminal to create a non-root account, but then, since sshd is running on the unit by default, getting to the box remotely is easy. Once you're ssh'd in using your non-root account, you can su. For the most part, I just use xterm to get to the box.
Note to Sun engineers and Product Management: The Google Mini has a really clean way of configuring the unit out of the box that doesn't require any serial cables or magic. You plug your notebook into the Mini's management network port. The Mini runs a DHCP server, so your notebook gets assigned a non-routable Class C address by the Mini (the notebook and the Mini are the only things on this one-wire management network). You then point your web browser at a fixed non-routable Class C address (i.e. the Mini's fixed address) on a known port (printed in the quickstart guide), and you can do the whole configuration via your browser. Presto! No serial cable is necessary. Since the T2000 has a network management port, it seems that this would be pretty easy to do.
It is clear from all aspects of their design that these are data-center-centric machines, so don't order one with the intent of putting it next to your desk.
Loading up the software I needed (cvs, emacs, top, etc.) was easy. I went to sunfreeware.com for all the binaries. Be sure to pick up GNU tar, and then replace /usr/sbin/tar with the new one. The system comes with Java 1.5 pre-installed. It didn't take long to do a handful of pkgadd commands, and I had xterms and emacs windows on my Windows X server.
Performance
My benchmark comparisons are against a Sun V20Z with 1.8GHz Opterons running Solaris 10, and a Windows dual Xeon (2.4GHz, HyperThreading disabled). Unfortunately, I don't have a V40Z to run against. The results I talk about below are fairly general in nature and only meant to give a feel for what the box can do.
Overall, single-threaded performance is slower on the Niagara than on the Opteron and the Xeon, but that is not a surprise (the comments I see coming out of Sun seem to agree with this general assessment). What's interesting about the Niagara is that you get a truckload of "CPUs" (albeit virtual CPUs, since there are only 6 or 8 cores on a single chip) for a relatively small number of dollars. From my point of view as a Berkeley DB Java Edition ("JE") developer, this makes the T2000 a nice platform for testing multi-threaded scalability and concurrency. True, each of the strands is slower, but there sure are a lot of them.
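A quick way to see how many of those virtual CPUs Java actually gets is to ask the JVM. The sketch below assumes the usual Solaris behavior of presenting each hardware strand to the OS as its own processor, so an 8x4 T2000 should report 32 and the 6x4 T1000 24; mpstat likewise prints one row per virtual CPU. The class name is just for illustration.

```java
// Reports how many virtual CPUs (hardware strands) the JVM can schedule on.
// Assumption: the OS exposes each strand as a separate processor, so an
// 8-core x 4-strand T2000 is expected to report 32 and a 6x4 T1000, 24.
class StrandCount {
    static int strands() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Virtual CPUs visible to the JVM: " + strands());
    }
}
```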
I enabled the read and write disk caches on the T1000, T2000, and V20Z for all of these benchmarks, since I am more interested in CPU speed and multi-threaded performance than in disk speed for these tests.
The
primary tuning exercise for JE, and probably the fairest one for the
T2000, was to tune a customer benchmark that has 25 application
threads, all doing read-only queries against a database. The benchmark,
named JESearchRate,
simulates a multi-threaded database server application that uses JE.
The goal of the tuning exercise was to get mpstat to show 100% CPU
utilization on all processors.
The end result of the tuning was that I was able to get JESearchRate
running about twice as fast on the T2000 compared to the V20Z. All of
the changes that I made to JE for this tuning exercise were
incorporated into JE 3.0.11, with the shared latches option being
settable by a configuration parameter ("je.env.sharedLatches").
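One way to turn the option on without touching application code is through the je.properties file, which JE reads from the environment home directory when the environment is opened. The minimal sketch below writes that file with je.env.sharedLatches=true; the default directory /tmp/je-env and the class name are hypothetical placeholders. If you configure JE programmatically, the same name/value pair can instead be passed to EnvironmentConfig.setConfigParam.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

// Writes je.properties into a JE environment home so the environment opens
// with shared latches enabled. The default path below is a hypothetical example.
class EnableSharedLatches {
    static File writeJeProperties(File envHome) throws IOException {
        if (!envHome.isDirectory() && !envHome.mkdirs()) {
            throw new IOException("cannot create " + envHome);
        }
        Properties props = new Properties();
        props.setProperty("je.env.sharedLatches", "true");
        File file = new File(envHome, "je.properties");
        try (OutputStream out = new FileOutputStream(file)) {
            props.store(out, "Enable shared latches");  // also writes a timestamp comment
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        File envHome = new File(args.length > 0 ? args[0] : "/tmp/je-env");
        System.out.println("Wrote " + writeJeProperties(envHome));
    }
}
```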
I realize that this is a pretty short summary of what was a couple of months of work on JESearchRate tuning (I had access to a T2000 in Sun's lab before the Niagara boxes went GA), but it really is pretty significant. Many of Sun's competitors in the server space have argued that while CMT is a nice concept, it requires tuning your applications before you can realize the performance benefits. In fact, that is exactly what I did: I tuned JE for better multi-threaded performance, just as the other vendors predicted. But so what? The changes I made are beneficial to all JE users, whether they're running on 1.8GHz Pentium-M notebooks or on the biggest SMP iron available. The good news for me, a developer, is that I didn't have to pay an arm and a leg (only about $4k) to get my hands on 24 virtual CPUs (in the case of a T1000) that I can use for JE performance tuning.
During the recent release qualification of JE 3.0.11, I used the T2000 to run our stress tests. Normally, I run three different configurations of the stress test on a dual-processor machine, each configuration having four active threads performing transactions of various types. Depending on the configuration, the tests are both IO- and CPU-bound. The T2000 was able to run six of these concurrently pretty easily, and mpstat showed all processors in the 10 to 50% range (top showing 30-35% usage across all CPUs). Since each stress test uses a working set of about 30MB, they all easily fit into memory.
I also ran a JE "Contentious Update" test and compared the T2000 against the 2-way 2.4GHz Xeon Windows XP machine. This test is CPU-bound and fits in memory on all the tested machines, so disk and memory configurations aren't significant. For single-threaded performance, the Xeon machine is nearly three times faster than the T2000 (20K ops/sec vs. 7K ops/sec). But with two threads, the Xeon performance drops and the T2000 performance increases (14K ops/sec vs. 8K ops/sec). With four threads, the T2000 pulls ahead: 10K ops/sec for the Xeon vs. 13K ops/sec for the T2000. I found similar results with other tests: Xeon throughput drops as the number of concurrent threads increases, while T2000 throughput increases.
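To make the shape of such a test concrete, here is a minimal sketch of a contentious-update harness. It is hypothetical and is not JE's actual test: the class name, the 16 in-memory "records", and the single monitor guarding them are my assumptions standing in for real database records. It sweeps 1, 2, and 4 threads, the same thread counts reported above, and prints total throughput for each.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical contentious-update micro-benchmark (not JE's test): every
// thread updates the same 16 in-memory "records" through one monitor, so
// each added thread adds contention rather than independent work.
class ContentiousUpdate {
    private static final int RECORDS = 16;          // power of two for masking
    private final long[] records = new long[RECORDS];

    // All updates serialize on this object's monitor.
    private synchronized void update(int key) {
        records[key & (RECORDS - 1)]++;             // mask keeps index in range
    }

    // Runs `threads` updaters for roughly `millis` ms; returns total ops/sec.
    static double opsPerSecond(int threads, long millis) throws InterruptedException {
        ContentiousUpdate db = new ContentiousUpdate();
        AtomicBoolean running = new AtomicBoolean(true);
        AtomicLong totalOps = new AtomicLong();
        CountDownLatch go = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int firstKey = t;
            workers[t] = new Thread(() -> {
                long ops = 0;
                int key = firstKey;
                try {
                    go.await();                     // start all threads together
                } catch (InterruptedException e) {
                    return;
                }
                while (running.get()) {
                    db.update(key++);
                    ops++;
                }
                totalOps.addAndGet(ops);
            });
            workers[t].start();
        }
        long begin = System.nanoTime();
        go.countDown();
        Thread.sleep(millis);
        running.set(false);
        long end = System.nanoTime();
        for (Thread w : workers) {
            w.join();
        }
        return totalOps.get() / ((end - begin) / 1e9);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {1, 2, 4}) {
            System.out.printf("%d thread(s): %.0f ops/sec%n", threads, opsPerSecond(threads, 2000));
        }
    }
}
```

Running it on each machine and comparing the 1-, 2-, and 4-thread lines gives the same kind of comparison; the absolute numbers will of course differ from JE's.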
Another interesting test is a random read test with shared latches enabled. I ran this test on the V20Z, the T2000, and the Xeon. The results are shown in the graph below. Interestingly, the Xeon peaks at 2 threads, drops off at 3 and 4 threads, and then levels off. The T2000 scales up until about 3 threads and then levels off (above the Xeon). The V20Z drops from 1 to 3 threads and then levels off.
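Shared latches matter for read-heavy tests like this one: with exclusive latches, read-only threads queue behind each other on the same latch, while shared latches let them hold it together. The minimal sketch below models that idea with the JDK's ReentrantReadWriteLock; it illustrates the concept only and is not JE's internal latch implementation. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrates the idea behind shared latches: readers take the shared (read)
// side and can hold it simultaneously, while a writer needs the exclusive side.
// This is a concept sketch, NOT JE's latch implementation.
class SharedLatchDemo {
    private final ReentrantReadWriteLock latch = new ReentrantReadWriteLock();

    // Returns true if `readers` threads all held the shared latch at the same
    // time and an exclusive (write) request was refused while they held it.
    boolean readersShareLatch(int readers) throws InterruptedException {
        CountDownLatch allInside = new CountDownLatch(readers);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[readers];
        for (int i = 0; i < readers; i++) {
            threads[i] = new Thread(() -> {
                latch.readLock().lock();
                try {
                    allInside.countDown();  // this reader now holds the latch
                    release.await();        // keep holding it until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    latch.readLock().unlock();
                }
            });
            threads[i].start();
        }
        boolean overlapped = allInside.await(5, TimeUnit.SECONDS);
        // tryLock on the write side fails while other threads hold the read side.
        boolean writerAdmitted = latch.writeLock().tryLock();
        if (writerAdmitted) {
            latch.writeLock().unlock();
        }
        release.countDown();
        for (Thread t : threads) {
            t.join();
        }
        return overlapped && !writerAdmitted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("4 readers shared the latch: " + new SharedLatchDemo().readersShareLatch(4));
    }
}
```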
Power Consumption
Sun has made a big deal of the reduced power consumption of the Niagara boxes, so it seems fitting that I measure it. Our data center electricity bills cross my desk every month, so I'm glad to see Sun pushing hard on this issue. The T2000 and T1000 that I tested both met or beat Sun's data sheet figures, i.e. they drew no more than the stated maximum.
All current measurements were made with a line voltage of 120.9 VAC (60Hz).
T2000: The ALOM draws 0.3 amps with the main system powered off. With the main system powered on, booted, and idle, it draws 1.9 amps. I ran continuous fsync(2) calls on one of the two disks and couldn't see a noticeable change in the current drawn. This isn't surprising, since disk drives are generally spec'd to use no more than 12 watts (roughly 0.1 amps at this line voltage). Running 32 processes, all CPU-bound, it drew an additional 0.2 amps, for a total of 2.1 amps. The Sun T2000 data sheet says that the system draws 275W or less, so my system met or exceeded that specification.
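For anyone who wants to repeat the disk part of this measurement, continuous fsync load is easy to generate from Java. The minimal sketch below is hypothetical and not necessarily how the measurement above was made: it repeatedly rewrites one 512-byte block and calls FileChannel.force(true), which asks the OS to flush file data and metadata to the device. The file name and iteration count are placeholders.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Generates a steady stream of synchronous disk writes: each iteration
// rewrites the same 512-byte block and forces it to the device.
class FsyncLoop {
    static long run(Path file, int iterations) throws IOException {
        ByteBuffer block = ByteBuffer.allocate(512);
        long syncs = 0;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (int i = 0; i < iterations; i++) {
                block.clear();
                ch.write(block, 0);   // overwrite the block at offset 0
                ch.force(true);       // flush data and metadata before continuing
                syncs++;
            }
        }
        return syncs;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "fsync-load.dat");
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 10_000;
        long start = System.nanoTime();
        long syncs = run(file, n);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d syncs in %.1f s (%.0f/s)%n", syncs, secs, syncs / secs);
    }
}
```

With the drives' write caches enabled, as they were for these tests, a completed force() does not guarantee the data is on the platters, but it still keeps the drive busy with a steady stream of small writes.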
T1000: The ALOM draws 0.1 amps with the main system powered off. With the main system powered on, booted, and idle, it draws 1.2 amps. Running 24 processes, all CPU-bound, it drew another 0.15 amps, for a total of 1.35 amps. The Sun T1000 data sheet says that the system draws 220W or less, so my system met or exceeded that specification.
I was a little surprised by the difference in ALOM power between the T1000 and T2000. The Sun T1000 Product Manager says that the T2000's ALOM is on a riser card while the T1000's is on the system motherboard, which may account for the difference.
Xeon: The Xeon draws 1.2 amps idle and 1.75 amps when it's running something CPU bound.
V20Z: The SCP on this one is really stingy with power and only draws 0.1 amps. When the V20Z is idle, it draws 1.4 amps. When both processors are running JVMs at 100% each, it draws 1.6 amps.
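The comparison with the data sheets is easy to check. Multiplying the measured current by the 120.9 VAC line voltage gives apparent power in volt-amperes, which is an upper bound on real power, because real power is VA times a power factor of at most 1 (power factor wasn't measured here). For example, the loaded T2000 comes to 120.9 × 2.1 ≈ 254 VA, under its 275W figure, and the loaded T1000 to about 163 VA, under 220W. A small Java sketch that tabulates the idle and loaded readings for all four machines:

```java
// Converts measured line current to apparent power (volts x amps) at the
// 120.9 VAC line voltage used for all measurements. VA is an upper bound on
// real watts, since real power = VA x power factor and power factor <= 1.
class PowerDraw {
    static final double LINE_VOLTS = 120.9;

    static double voltAmps(double amps) {
        return LINE_VOLTS * amps;
    }

    public static void main(String[] args) {
        String[] names = {"T2000", "T1000", "Xeon", "V20Z"};
        double[] idleAmps = {1.9, 1.2, 1.2, 1.4};
        double[] loadAmps = {2.1, 1.35, 1.75, 1.6};
        // Data sheet maximums where quoted; NaN means no figure was given.
        double[] specWatts = {275, 220, Double.NaN, Double.NaN};
        for (int i = 0; i < names.length; i++) {
            String spec = Double.isNaN(specWatts[i])
                    ? ""
                    : String.format("  (data sheet <= %.0f W)", specWatts[i]);
            System.out.printf("%-6s idle %6.1f VA  loaded %6.1f VA%s%n",
                    names[i], voltAmps(idleAmps[i]), voltAmps(loadAmps[i]), spec);
        }
    }
}
```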
Some people have expressed concerns that the Niagara chips aren't binary compatible with, say, x86. This is true, but since I use Java, I don't care. All the utilities I want are available in binary form on sunfreeware.com. But I recognize that this may not be sufficient for all potential T2000 customers.
Conclusion
The Niagara is an interesting box from a developer's point of view because it offers a large number of strands (virtual CPUs) at a relatively low price point for concurrency testing. From a data center point of view, this box could be interesting for people who may be thinking of upgrading older boxes (like the Xeon used in my tests), since the Niagara delivers more cycles per watt than an older Intel machine.
(*) Note that if you choose this option, you must be prepared to do a network install of Solaris. Also, Sun doesn't ship the disk drive bracket with the unit, and you can't order it as a separate part. This may have changed, but I believe they expect most people to purchase the box with Sun's own 80GB drive (and Solaris pre-installed). Since I couldn't do a network install (I didn't have a Solaris SPARC box available), the Sun guys were really nice and imaged my 500GB drive with Solaris to get me going. They also graciously gave me one of the prototype disk brackets.