Niagara - Designed for Network throughput
We finally announce Niagara based servers to the public! Billed as the low cost, energy efficient, huge network throughput processors - marketing mumbo jumbo, you think? Well, try it and you will see. I was privileged enough that one of the earliest prototypes landed on my desk (or in my lab, to be precise) so Solaris networking could be tailored to take advantage of the chip. And boy, together with Solaris, this thing flies.
So you know that Niagara is a multi core, multi threaded chip, and Solaris takes advantage of it in multiple ways. Let me highlight some of them.
The load from the NIC is fanned out to multiple soft rings based on the source IP address and port information. Each soft ring is in turn tied to a Niagara thread and a perimeter, such that packets from a connection have locality to a specific H/W thread on a core, and the NIC has locality to a specific core. Think of this model as 4 H/W threads per core processing the NIC, such that if one thread stalls on a resource, the CPU cycles are not wasted. The result is amazing network performance for this beast - 5-6 times the performance of a typical x86 based CPU.
Imagine you are an ISP or someone wanting to consolidate multiple machines onto one physical machine. Well, Niagara based platforms lend themselves beautifully to this concept because there are so many H/W threads around, which appear as individual CPUs to Solaris. We have a project underway (details available on the Community page on OpenSolaris) which will allow you to carve the machine into multiple virtual machines (creating virtual network stacks), tie specific CPUs to them, and control the B/W utilization for each virtual machine on a shared NIC.
Real Time Networking/Offload
With the new drivers and stack architecture in Solaris 10, the stack controls the rate of interrupts and can dynamically switch the NIC between interrupt and polling mode. Coupled with the Niagara platform, Solaris can run the entire networking stack on one core and provide real time capabilities to the application. Meanwhile, the applications themselves run on different cores without worrying about networking interrupts pinning them down. You can get pretty bounded latencies, provided the application can do some admission control. We are also planning to hide the core running networking from the application, effectively getting TOE for free without suffering from the drawbacks of offloading networking to a separate piece of hardware.