Sunday Jul 12, 2009

Crossbow Launch, Talk and BOF at Community One and Java One

On June 1, 2009, during Community One and Java One in San Francisco, California, Crossbow was formally launched as part of OpenSolaris 2009.06. The morning started with a keynote where John Fowler, EVP of the Sun Systems group, formally announced OpenSolaris 2009.06 as the beta for the next enterprise release of Solaris (the release after Solaris 10). He and Greg Lavender then went on to show the Crossbow feature and the Virtual Wire demo. Later in the day I gave a talk on Crossbow, where Nicolas and Kais accompanied me and showed the Crossbow Virtual Wire demo in detail. Bill Franklin and some of his cohorts were dressed as Crossbow knights and charged into the room right after the talk. I think people got the shock of their lives. It was very entertaining.

The launch got a lot of visibility and very good press coverage, which can be seen on the Crossbow News page. On June 2, 2009 we held the Crossbow BOF in the evening. Great showing and great support from the community. So great stuff, and a good closure for Phase 1 of the Crossbow project. The team members were pretty happy and relieved. Now we are trying to get the next intermediate phase going so we can complete the story for the next enterprise release of Solaris, which might or might not be called Solaris 11. Key things are more analytics (dlstat/flowstat), some security/anti-spoofing features, more usability, etc. More details are being discussed on the Crossbow Discussion page.

Tuesday Mar 17, 2009

Crossbow: Virtualized switching and performance

Saw Cisco's unified fabric announcement. Seems like they are going after cloud computing, which pretty much promises to solve the world-hunger problem. Even if cloud computing can just solve the high data center cost problem and make compute, networking, and storage available on demand in a cheap manner, I am pretty much sold on it. The interesting part is that the world needs to move towards enabling people to bring their network onto the cloud and have compute, bandwidth, and storage available on demand. Talking about networking and network virtualization, this means that we need to go to open standards, open technology, and off-the-shelf hardware. The users of the cloud will not accept a vendor or provider lock-in. The cloud needs to be built in such a manner that users can take their physical network and migrate it to an operator's cloud, and at the same time have the ability to build their own clouds and migrate stuff between the two. Open Networking is the key ingredient here.

This essentially means that there is no room for custom ASICs and protocols, and the world of networking needs to change. This is what Jonathan was talking about to a certain extent around Open Networking and Crossbow. OpenSolaris with Crossbow makes things very interesting in this space. But it seems like people don't fully understand what Crossbow and OpenSolaris bring to the table. I saw a post from Scott Lowe and several others mentioning that Crossbow is pretty similar to VMware's network virtualization solutions and Cisco's Nexus 1000v virtual switches.

Let me take some time to explain a few very important things about Crossbow:
  • It's open source and part of OpenSolaris. You can download it right here.
  • It leverages NIC hardware switching and features to deliver isolation and performance for virtual machines. Crossbow not only includes H/W- and S/W-based VNICs and switches, it also offers virtualized routers, load balancers, and firewalls. The Virtual Network Machines can be created using Crossbow and Solaris Zones and have pretty amazing performance. All these are connected together using the Crossbow Virtual Wire. You don't need to buy fancy and expensive virtualized switches to create and use a Virtual Wire.
  • Using hardware virtualized lanes, the Crossbow technology scales to multiples of 10 gig traffic using off-the-shelf hardware.

Hardware based VNICs and Hardware based Switching

A picture is always worth a thousand words. The figure shows how Crossbow VNICs are built on top of real NIC hardware and how we do switching in hardware where possible. Crossbow also has a full-featured S/W layer where it can do S/W VNICs and switching as well; the hardware is leveraged when available. It's important to note that most NIC vendors do ship with the necessary NIC classifiers and Rx/Tx rings, and these are pretty much mandatory for 10 gig NICs, which form the backbone for a cloud.
Crossbow H/W based VNICs
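As a sketch, this is what a hardware-backed VNIC looks like from the command line. The physical NIC name e1000g0 below is an assumption for illustration; use whatever dladm show-link reports on your machine:

```shell
# Create a VNIC directly over a physical NIC (NIC name is hypothetical).
gz# dladm create-vnic -l e1000g0 vnic0

# Verify it; the MAC address is auto-generated by default.
gz# dladm show-vnic vnic0
```

If the NIC supports hardware classification and Rx/Tx rings, Crossbow can back the VNIC with a hardware lane; otherwise it transparently falls back to the software path.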

Virtual Wire: The essence of virtualized networking

The Crossbow Virtual Wire technology allows a person to take a full-featured physical network (multiple subnets, switches, and routers) and configure it within one or more hosts. This is the key to moving virtualized networks in and out of the cloud. The figure shows a two-subnet physical network with multiple switches and different link speeds, connected via a router, and how it can be virtualized in a single box. A full workshop on virtualized networking is available here.
Virtual Wire

Scaling and Performance

Crossbow leverages the NIC's features pretty aggressively to create virtualization lanes that help traffic scale across a large number of cores and threads. For people wanting to build real or virtual appliances using OpenSolaris, the performance and scaling across 10 Gig NICs is pretty essential. The figure below shows an overview of hardware lanes.
Crossbow Virtualization Architecture

More Information

There is a white paper and more detailed documents (including how to get started) at the Crossbow OpenSolaris page.

Monday Mar 02, 2009

Crossbow enables an Open Networking Platform

I came across this blog from Paul Murphy. You should read the second half of Paul's blog. What he says is pretty true. Crossbow delivered a brand new networking stack to Solaris which has scalability, virtualization, QoS, and better observability designed in (instead of patched in). The complete list of features delivered and in the works is here. Coupled with a full-fledged open source Quagga Routing Suite (RIP, OSPF, BGP, etc.), the IP Filter firewall, and a kernel load balancer, OpenSolaris becomes a pretty useful platform for building Open Networking appliances.

Apart from single-box functionality, imagine you want to deliver a virtual router or a load balancer; it would be pretty easy to do so. OpenSolaris offers Zones, where you can deliver a preconfigured zone as a router, load balancer, or firewall. The difference is that this zone would be fully portable to another machine running OpenSolaris and would have no performance penalty. After all, we, the Crossbow team, guarantee that our VNICs with Zones do not have any performance penalties. You can also build a fully portable and preconfigured virtual networking appliance using a Xen guest, which can be made to migrate between any OpenSolaris or Linux host.

I noticed that a couple of folks on Paul's blog were asking why Crossbow NIC virtualization is different. Well, it's not just the NIC being virtualized, but the entire data path along with it, called a Virtualization Lane. You can see the virtualization lane all the way from the NIC to the socket layer and back here. Not only is there one or more Virtualization Lanes per virtual machine; the bandwidth partitioning, Diffserv tagging, priority, CPU assignment, etc. are designed in as part of the architecture. The same concepts are used to scale the stack across multiples of 10 GigE NICs over a large number of cores and threads (out-of-this-world forwarding performance, anyone!).

And as mentioned before, Crossbow enables Virtual Wire: the ability to create a full-featured network without any physical wires. Think of running network simulations and testing in a whole new light!!

Tuesday Mar 04, 2008

Virtual Wire: Network in a Box (Sun Tech Day in Hyderabad)


I did a session for developers during the Sun Tech Day in Hyderabad. Raju Alluri had printed out 100 copies of the workshop, and we were carrying 100 DVDs with Crossbow ISO images (they are available on the web here). People just loved it. We had so underestimated the demand that the printouts and DVDs disappeared in less than a minute. I had a presentation with 30-odd slides, but I couldn't even get past slide 7 since the workshop was so interesting to people. And between the tech day presentation and the user group meeting in the evening, people pointed out a lot of interesting uses and why this can be such a powerful thing.

The idea that you can take any arbitrarily complex physical network, create it as a virtual wire, run your favorite workload, do performance analysis, and debug it is very appealing to people. Remember that we are not simulating the network. This is the real thing, i.e. real applications running and real packets flowing. If your application runs on any OS, it will run on this virtual network and will send and receive real packets!!

The concept is pretty useful even to people like us, because now we don't need to pester our lab staff to create a network for us to test or experiment on. And the best part is, we can use xVM and run Linux and Windows as hosts as well.

We are thinking of writing a book which reinvents how you learn networking in schools and universities. And oh, by the way, do people really care about the CCNA now that they can do all this on their laptop :) If someone is interested in contributing real examples for this workshop module and the book, you are more than welcome. Just drop us a line.

Thursday Feb 28, 2008

Virtual Wire: Network in a Box (Creating a real Network on your Laptop)

Crossbow: Network Virtualization & Resource Control


Create a real network comprising hosts, switches, and routers as a virtual network on a laptop. The virtual network (called a Virtual Wire) is created using the OpenSolaris project Crossbow technology, and the hosts etc. are created using Solaris Zones (a lightweight virtualization technology). All the steps necessary to create the virtual topology are explained.

The users can use this hands-on demo/workshop and the exercises at the end to become an expert in
  • Configuring IPv4 and IPv6 networks
  • Getting hands-on experience with OpenSolaris
  • Configuring and managing a real router
  • IP routing technologies including RIP, OSPF, and BGP
  • Debugging configuration and connectivity issues
  • Network performance and bottleneck analysis
The users of this module need not have access to a real network, router and switches. All they need is a laptop or desktop running OpenSolaris Project Crossbow snapshot 2/28/2008 or later which can be found at


Crossbow (Network Virtualization and Resource Control) allows users to create a Virtual Wire with fixed link speeds in a box. Multiple subnets connected via a Virtual Router are pretty easy to configure. This allows network administrators to do a full network configuration, verify IP addresses, subnet masks, and router ports and addresses. They can test connectivity and link speeds, and when fully satisfied, they can instantiate the configuration on the real network.

Another great application is debugging problems by recreating a real network in a box. If network administrators are having issues with connectivity or performance, they can create a virtual network and debug their issues using snoop, kernel stats, and dtrace. They don't need expensive H/W-based network analyzers.
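As a sketch of that kind of debugging (vnic1 is an assumed VNIC name, and the exact kstat module names vary by build):

```shell
# Capture packets flowing over a VNIC, just like on a physical NIC.
gz# snoop -d vnic1

# Check link properties such as the configured bandwidth limit.
gz# dladm show-linkprop -p maxbw vnic1

# Kernel statistics for the virtual links (module name may differ).
gz# kstat -m vnic
```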

The network developers and researchers working with protocols (like high speed TCP) can use OpenSolaris to write their implementation and then try it out with other production implementations. They can debug and fine tune their protocol quite a bit before sending even a single packet on the real network.

Note 1: Users can use Solaris Zones, Xen, or LDom guests to create the virtual hosts, while Crossbow provides the virtual network building blocks. There is no simulation, but real protocol code at work. Users run real applications on the hosts and clients, which generate real packets.

Note 2: The Solaris protocol code executed for a virtual network, or for Solaris acting as a real router or host, is common all the way to the bottom of the MAC layer. In the case of virtual networks, the device driver code for a physical NIC is the only code that is not needed.

Try it Yourself

Let's do a simple exercise. As part of this exercise, you will learn
  • How to configure a virtual network having two subnets connected via a Virtual Router, using Crossbow and Zones
  • How to set the various link speeds to simulate a multi-speed network
  • How to do some performance runs to verify connectivity
What you need:

A laptop or machine running Crossbow snapshot from Feb 28, 2008 or later

Virtual Network Example

Let's take a physical network. The example in Fig. 1a represents the real network showing how my desktop connects to the lab servers. The desktop is on one network while the server machines (host1 and host2) are on another. In addition, host1 has a 10/100 Mbps NIC limiting its connectivity to 100 Mbps.

Fig. 1a

We will represent the network shown in Fig. 1a on my Crossbow-enabled laptop as a virtual network. We use Zones to act as host1, host2, and the router, while the global zone (gz) acts as the client (as a user exercise, create another client zone and assign VNIC6 to it to act as the client).
Fig. 1b

Note 3: The Crossbow MAC layer itself does the switching between the VNICs. The etherstub is created as a dummy device to connect the various virtual NICs. Users can imagine an etherstub as a virtual switch, to help visualize the virtual network as a replacement for a physical network where each physical switch is replaced by a virtual switch (implemented by a Crossbow etherstub).

Create the Virtual Network

Let's start by creating the two etherstubs using the dladm command:
gz# dladm create-etherstub etherstub1
gz# dladm create-etherstub etherstub3
gz# dladm show-etherstub

Create the necessary virtual NICs. VNIC1 will later be limited to 100 Mbps, while the others have no limit:
gz# dladm create-vnic -l etherstub1 vnic1
gz# dladm create-vnic -l etherstub1 vnic2
gz# dladm create-vnic -l etherstub1 vnic3

gz# dladm create-vnic -l etherstub3 vnic6
gz# dladm create-vnic -l etherstub3 vnic9
gz# dladm show-vnic
LINK        OVER             SPEED  MACADDRESS         MACADDRTYPE       
vnic1       etherstub1      - Mbps  2:8:20:8d:de:b1    random            
vnic2       etherstub1      - Mbps  2:8:20:4a:b0:f1    random            
vnic3       etherstub1      - Mbps  2:8:20:46:14:52    random            
vnic6       etherstub3      - Mbps  2:8:20:bf:13:2f    random            
vnic9       etherstub3      - Mbps  2:8:20:ed:1:45     random            

Create the hosts and assign them the VNICs. Also create the Virtual Router and assign it VNIC3 and VNIC9, over etherstub1 and etherstub3 respectively. Both the Virtual Router and the hosts are created using Zones in this example, but you can easily use Xen or Logical Domains.

Create a base zone which we can clone. The first part is necessary only if you are on a ZFS filesystem.
gz# zfs create -o mountpoint=/vnm rpool/vnm
gz# chmod 700 /vnm

gz# zonecfg -z vnmbase
vnmbase: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vnmbase> create
zonecfg:vnmbase> set zonepath=/vnm/vnmbase
zonecfg:vnmbase> set ip-type=exclusive
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/opt
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> add inherit-pkg-dir
zonecfg:vnmbase:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vnmbase:inherit-pkg-dir> end
zonecfg:vnmbase> verify
zonecfg:vnmbase> commit
zonecfg:vnmbase> exit

This part takes 15-20 minutes
gz# zoneadm -z vnmbase install

Now let's create the two hosts and the Virtual Router as follows:
gz# zonecfg -z host1
host1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host1> create
zonecfg:host1> set zonepath=/vnm/host1
zonecfg:host1> set ip-type=exclusive
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/opt
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add inherit-pkg-dir
zonecfg:host1:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host1:inherit-pkg-dir> end
zonecfg:host1> add net
zonecfg:host1:net> set physical=vnic1
zonecfg:host1:net> end
zonecfg:host1> verify
zonecfg:host1> commit
zonecfg:host1> exit

gz# zoneadm -z host1 clone vnmbase
gz# zoneadm -z host1 boot

gz# zlogin -C host1

Connect to the console and go through the sysid config. For this example, we assign an IP address to vnic1; you can specify this during sysidcfg. For the default route, specify the Virtual Router's address on this subnet. You can say 'none' for naming service, IPv6, Kerberos, etc. for the purpose of this example.

Similarly create host2 and configure it with vnic2 i.e.
gz# zonecfg -z host2
host2: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:host2> create
zonecfg:host2> set zonepath=/vnm/host2
zonecfg:host2> set ip-type=exclusive
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/opt
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add inherit-pkg-dir
zonecfg:host2:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:host2:inherit-pkg-dir> end
zonecfg:host2> add net
zonecfg:host2:net> set physical=vnic2
zonecfg:host2:net> end
zonecfg:host2> verify
zonecfg:host2> commit
zonecfg:host2> exit

gz# zoneadm -z host2 clone vnmbase
gz# zoneadm -z host2 boot

gz# zlogin -C host2

Connect to the console and go through the sysid config. For this example, we assign an IP address to vnic2; you can specify this during sysidcfg. For the default route, specify the Virtual Router's address on this subnet. You can say 'none' for naming service, IPv6, Kerberos, etc. for the purpose of this example.

Let's now create the Virtual Router:
gz# zonecfg -z vRouter
vRouter: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:vRouter> create
zonecfg:vRouter> set zonepath=/vnm/vRouter
zonecfg:vRouter> set ip-type=exclusive
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/opt
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add inherit-pkg-dir
zonecfg:vRouter:inherit-pkg-dir> set dir=/etc/crypto
zonecfg:vRouter:inherit-pkg-dir> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic3
zonecfg:vRouter:net> end
zonecfg:vRouter> add net
zonecfg:vRouter:net> set physical=vnic9
zonecfg:vRouter:net> end
zonecfg:vRouter> verify
zonecfg:vRouter> commit
zonecfg:vRouter> exit

gz# zoneadm -z vRouter clone vnmbase
gz# zoneadm -z vRouter boot

gz# zlogin -C vRouter

Connect to the console and go through the sysid config. For this example, we assign IP addresses to vnic3 and vnic9; you can specify these during sysidcfg. For the default route, specify 'none'. You can say 'none' for naming service, IPv6, Kerberos, etc. for the purpose of this example. Let's enable forwarding on the Virtual Router to connect the 10.x.x.x and 20.x.x.x networks:
vRouter# svcadm enable network/ipv4-forwarding:default

Note 5: The above is done inside the virtual router. Make sure you are in the window where you did the 'zlogin -C vRouter' above.
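To double-check that forwarding actually came up inside the virtual router (a quick sanity check, not part of the original steps):

```shell
# SMF should report the forwarding service online.
vRouter# svcs ipv4-forwarding

# routeadm with no arguments summarizes the forwarding/routing state.
vRouter# routeadm
```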

Now let's bring up VNIC6 and configure it, including setting up routes, in the global zone. You can easily create another host called host3 as the client on the 20.x.x.x network by creating a host3 zone and assigning it an IP address.

Let's configure VNIC6. Open an xterm in the global zone:
gz# ifconfig vnic6 plumb up
gz# route add
gz# ping is alive
gz# ping is alive

Similarly, log in to host1 and/or host2 and verify connectivity:
host1# ping is alive
host1# ping is alive

Set up Link Speed

What we configured above are unlimited-bandwidth links. We can configure a link speed on each of the links. For this example, let's configure a link speed of 100 Mbps on VNIC1:
gz# dladm set-linkprop -p maxbw=100 vnic1

We could have configured the link speed (or bandwidth limit) while creating the VNIC itself by adding the
-p maxbw=100
option to the create-vnic command.
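For example, the one-step variant would look like this (same effect as creating the VNIC and then setting the property):

```shell
# Create the VNIC with a 100 Mbps limit in one step.
gz# dladm create-vnic -l etherstub1 -p maxbw=100 vnic1

# Verify the limit took effect.
gz# dladm show-linkprop -p maxbw vnic1
```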

Test the performance

Start 'netserver' (or the tool of your choice) in host1 and host2. You will have to install the tools in the relevant places:
host1# /opt/tools/netserver &
host2# /opt/tools/netserver &

gz# /opt/tools/netperf -H
TCP STREAM TEST to : histogram

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 49152  49152  49152    10.00    2089.87  

gz# /opt/tools/netperf -H
TCP STREAM TEST to : histogram
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 49152  49152  49152    10.00     98.78   

Note 6: Since host2 is on VNIC2, which has no limit, we get the max speed possible. host1 is configured over VNIC1, where we just set the link speed to 100 Mbps, and that's why we get only 98.78 Mbps.


When you are done, halt and uninstall the zone:
gz# zoneadm -z host1 halt
gz# zoneadm -z host1 uninstall

Then delete the zone:
gz# zonecfg -z host1
zonecfg:host1> delete
Are you sure you want to delete zone host1 (y/[n])? y
zonecfg:host1> exit

In the same way, delete the host2 and vRouter zones. Make sure you don't delete vnmbase, since re-creating it takes time.
gz# ifconfig vnic6 unplumb

After you have deleted the zones, you can delete the VNICs and etherstubs as follows:
# dladm delete-vnic vnic1			/* Delete VNIC */
# dladm delete-vnic vnic2
# dladm delete-vnic vnic3
# dladm delete-vnic vnic6
# dladm delete-vnic vnic9

# dladm delete-etherstub etherstub3		/* Delete etherstub */
# dladm delete-etherstub etherstub1

Make sure that VNICs are unplumbed (ifconfig vnic6 unplumb) and not assigned to a zone (delete the zone first) before you can delete them. You need to delete all the vnics on the etherstub before you can delete the etherstub.

User Exercises

Now that you are familiar with the concepts and the technology, you are ready to do some experiments of your own. Clean up the machine as mentioned above. The exercises below will help you master IP routing, configuring networks, and debugging performance bottlenecks.
  • Recreate the virtual network as shown in Fig. 1b, but this time create an additional zone called client and assign vnic6 to that client zone.
    	client Zone		vRouter		host1		host2
    		|		  |  |		  |		  |
    		---- etherstub3 ---  -------- etherstub 1----------
    Run all your connectivity tests by zlogin'ing into the client. Now change all IPv4 addresses to IPv6 addresses and verify that the client and hosts still have connectivity.
  • Leave the virtual network as in 1, but configure OSPF in vRouter instead of the default RIP. Verify that you still have connectivity. Note the steps needed to configure OSPF.
  • Configure the two networks as two separate autonomous systems, assign them unique ASNs, and configure unique BGP domains. Verify that connectivity still works. Note the steps needed to configure BGP domains.
  • Clean up everything and recreate the virtual network in 1 above, but instead of statically assigning the IP addresses to hosts and clients, configure NAT on the vRouter to give out addresses on the vnic3 subnet and the vnic9 subnet. While creating the hosts and clients, configure them to get their IP addresses through DHCP.
  • Clean up everything and recreate the virtual network in 1 above. Add an additional router, vRouter2, which has a VNIC on each of the two etherstubs.
    		---- vRouter1 ----
    	client			  hosts
    		---- vRouter2 ----
    This provides a redundant path from the client to the hosts. Experiment with running different routing protocols, assign different weights to each path, and see what path you take from client to host (use traceroute to detect it). Now configure the routing protocol on the two vRouters to be OSPF and play with link speeds to see how the path changes. Note the configuration and observations.
  • Clean up. Let's now introduce another Virtual Router between the two subnets, i.e.
    client Zone		vRouter1	vRouter2	host1	     host2
    	|		  |  | 		 |    |		  |	       |
    	---- etherstub3 ---  -etherstub2-    -----etherstub1----------
    Now set the link (VNIC) between vRouter1 and etherstub2 to 75 Mbps. Use SNMP from the client to retrieve the stats from vRouter1, and check where the packets are getting dropped when you run netperf from the client to host2.

    Remove the limit set earlier and instead set a link speed of 75 Mbps on the link between etherstub2 and vRouter2. Again use SNMP to get the stats out of vRouter1. Do you see similar results as on vRouter1 before? If not, can you explain why?

Conclusion and More resources

Use the real example above and configure the virtual network to get familiar with the techniques used. At this point, have a look at your own network and try to create a virtual version of it.

Get more details on the OpenSolaris Crossbow page

You can find high level presentations, architectural documents, man pages etc at

Join the mailing list at

Send in your questions or your configuration samples and we will put it in the use cases examples.

A similar Virtual Network example using global zone as a NAT can be found on Nicolas's blog at

Kais has an example of dynamic bandwidth partitioning at

Venu talks about some of the cool Crossbow features which allow virtualizing services using flowadm at

Tuesday Dec 06, 2005

Niagara - Designed for Network Throughput

We finally announced Niagara-based servers to the public! Billed as low-cost, energy-efficient, huge-network-throughput processors - marketing mumbo jumbo, you think?? Well, try it and you will see. I was privileged enough that one of the earliest prototypes landed on my desk (or in my lab, to be precise) so Solaris networking could be tailored to take advantage of the chip. And boy, together with Solaris, this thing rocks!!

So you know that Niagara is a multi-core, multi-threaded chip, and Solaris takes advantage of it in multiple ways. Let me highlight some of them.

Network performance

The load from the NIC is fanned out to multiple soft rings in the GLDv3 layer based on the src IP address and port information. Each soft ring in turn is tied to a Niagara thread and a Vertical Perimeter, such that packets from a connection have locality to a specific H/W thread on a core, and the NIC has locality to a specific core. Think of this model as 4 H/W threads per core processing the NIC, such that if one thread stalls for a resource, the CPU cycles are not wasted. The result is amazing network performance for this beast: 5-6 times the performance of your typical x86-based CPU.


Imagine you are an ISP or someone wanting to consolidate multiple machines on one physical machine. Well, Niagara-based platforms lend themselves beautifully to this concept because there are so many H/W threads around, which appear as individual CPUs to Solaris. We have a project underway called Crossbow (details available on the Network Community page on OpenSolaris) which will allow you to carve the machine into multiple virtual machines (create virtual network stacks), tie specific CPUs to them, and control the B/W utilization for each virtual machine on a shared NIC.

Real Time Networking/Offload

With GLDv3-based drivers and the FireEngine architecture in Solaris 10, the stack controls the rate of interrupts and can dynamically switch the NIC between interrupt and polling mode. Coupled with the Niagara platform, Solaris can run the entire networking stack on one core and provide real-time capabilities to the application. Meanwhile, the applications themselves run on different cores without worrying about networking interrupts pinning them down. You can get pretty bounded latencies, provided the application can do some admission control. We are also planning to hide the core running networking from the applications, effectively getting TOE for free without suffering from the drawbacks of offloading networking to a separate piece of hardware.

Tuesday Jun 14, 2005

The world of Solaris Networking

The D-Day has finally arrived. OpenSolaris is here. For me personally, it's a very nice feeling, since I can now talk about the architecture and implementation openly with people and point them to the code. Before coming to Sun, I had always been in research labs where collaboration is the way of life. God, how much I missed that part at Sun, and thankfully I am now hoping to get it back.

One of the big changes in Solaris 10 was project FireEngine, which allowed Solaris to perform and scale. The important thing that I couldn't tell people before was where the wins came from. The bulk of them came from a lockless design called the Vertical Perimeter, implemented by means of a serialization queue. This allows packets, once picked up for processing, to be taken all the way up to the socket layer or all the way down to the device driver. With the aid of the IP classifier, we bind connections to squeues (which in turn are bound to CPUs), and this allows us to get better locality and scaling. The squeues also allow us to track the entire backlog per CPU. The GLDv3-based drivers allow IP to control the interrupts, and based on the squeue backlog, the interrupts are controlled dynamically to achieve even higher performance and avoid the havoc caused by interrupts. Some day I will tell you stories of how we dealt with 1Gb NICs when they arrived and CPUs were still pretty slow.

Coming back to collaboration, you will notice that the Solaris networking architecture looks very different compared to the SVR4 STREAMS-based architecture or the BSD-based architecture. It opens new doors for us: it allows us to do stack virtualization and resource control (project Crossbow) and tons of new things. We have set up a networking community page which has a brief discussion of some of the new projects we are doing, and we would love to hear what you think about them. The discussion forum on the same page would be an easy way to talk. We are open to suggestions on how you would like to see this go forward.

Enjoy, just like I enjoyed Solaris for so many years!

Technorati Tag: OpenSolaris
Technorati Tag: Solaris

Thursday May 26, 2005

High Performance device driver framework (aka project Nemo)

A lot has happened since my last blog. I will talk about it one of these days. But the coolest thing we finished is called project Nemo. It's a high-performance device driver framework which makes writing device drivers for Solaris a breeze. It's technically the GLDv3 framework, but we like to call it Nemo instead :)

So what can Nemo do for you? Well, it switches dynamically between interrupt and polling mode (all controlled by IP) to boost performance. Any device driver which supports turning interrupts on/off can take advantage of this and boost performance by 20-25%, by cutting the number of interrupts in a more useful manner and improving latency at the same time. Way superior to interrupt coalescing, etc.

Ben also finds it pretty useful here. Hey Ben, as you mentioned, a lot of people are finding Ethernet pretty useful in storage as well. I'll have some follow-on news on our iSCSI front soon. The initiator is already done and will be part of an S10 update, while we are seeing some pretty impressive numbers on a Solaris 10 iSCSI target with 10Gb.

Coming back to Nemo, it also does trunking for both 10Gb and 1Gb NICs in a pretty simple way. We demo'd a trunk of two 10Gb NICs on a 2-CPU machine during the SunLabs open house in April, and we ran over 12 Gbps over the trunk! There are some other cool things Nemo will allow us to do, and one of these days I will tell you the details.

Wednesday Dec 08, 2004

Solaris networking external page and discussion forum ready!

I would like to welcome you all to our external Solaris Networking page on BigAdmin (yes, I know it took this long, but doing the code is a lot easier :). Check it out.

It currently has FireEngine-related information (FireEngine being the enhanced high-performance TCP/IP stack in Solaris 10), including the public white paper. We plan to move this forward to include every networking-related project. It also has a discussion forum. I would like to encourage people to ask any networking-related questions there and the experts to answer them there. This way, we build a kind of external FAQ for Solaris networking that would be very useful.

Saturday Oct 16, 2004

Solaris vs Red Hat

Sorry guys, the heading is not mine. It's coming from the discussion where Solaris 10 networking is being discussed. It is a pretty interesting discussion if you filter out a few of the usual postings where people don't really know the facts.

I was surprised to see a large number of people who know Solaris voicing their positive opinions. Normally, people from the Solaris world are not very vocal on discussion groups and public forums. So that is a surprising (and good) change. Guys, keep it up!

Someone asked why we are not targeting Windows. Come on guys, you've got to be serious. I am an engineer; do you think I design networking architecture targeted to beat Windows :) As pointed out in the comments, they are not even on my radar. Maybe in the next twenty years their technology will match our current stuff, but then we would have hopefully moved on :\^) And yes, as I am told, we do beat Windows 2003 by 20-30% on a 2 CPU x86 box (Opteron 2x2.2GHz with 2Gb RAM) on webbench (static, dynamic and ecommerce). There are probably more benchmarks, but frankly we haven't had time to compare or publish. Our sole aim right now is to improve real customer workloads, and we are depending on customers to tell us these numbers.

As for AIX and HP-UX (and I am going to get in trouble now with my bosses for saying this), they just don't exist in any significant manner. I have talked to a large number of customers in the past two years, since part of our approach is to understand what the customer is having trouble with and what he will need going forward, and let me be really honest, I don't see HP-UX at all and very little AIX. Yes, I do see IBM and HP machines, but they are all running Linux (please no flames, this is just my experience).

Again, when we are designing/writing new code, we do like to set some targets. When it comes to scaling across a large number of CPUs, we have always done very well, because that's where we focused. We never really looked at 1-2 CPU performance before, since it was always easy to add more processors on SPARC platforms. Linux, on the other hand, has really simple code that allowed it to perform very well on 1 CPU. So our challenge was to come up with an architecture that could beat Linux on the low end and still allow us to scale linearly on the high end, and sure enough, we created FireEngine. It's the same code that runs on SPARC platforms scaling linearly and runs pretty fast on 2 CPU x86 platforms. And as you add more CPUs on x86 (going to 4 and 8 and then dual core), we just become a very compelling architecture.
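The basic trick that lets one code path work at both ends is per-CPU fanout: bind each connection to a serialization queue owned by one CPU, so a connection's state never bounces between processors and no cross-CPU locking is needed on the hot path. Here is a rough sketch of that idea; the names and the hash are hypothetical, not the actual Solaris code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of per-CPU connection fanout; NCPU and all
 * names here are made up for the example. */
#define NCPU 8

typedef struct {
    uint32_t cpu;   /* CPU this queue is bound to */
    /* ... queued packets/work for connections bound here ... */
} squeue_t;

static squeue_t squeues[NCPU];

/*
 * Pick the queue for a connection from its 4-tuple. Every packet of
 * that connection lands on the same CPU, so per-connection state
 * needs no cross-CPU locks and stays warm in that CPU's caches. On
 * a 1-2 CPU box the fanout degenerates gracefully; on a big SMP it
 * spreads connections evenly, which is where the linear scaling
 * comes from.
 */
squeue_t *
conn_to_squeue(uint32_t saddr, uint32_t daddr,
    uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
    h ^= h >> 16;   /* mix the halves a bit */
    return (&squeues[h % NCPU]);
}
```

Since the mapping is a pure function of the 4-tuple, every packet of a given connection deterministically reaches the same queue with no shared lookup state to contend on.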

As for some people commenting on the validity of the numbers comparing Solaris 10 plus Apache with RHEL AS3 plus Apache: they are on the same H/W. It's a 2x2.2Ghz Opteron box (V20z) with 6Gb RAM and 2 Broadcom Gig NICs. The numbers were done on webbench and the other major web performance benchmark that we can't talk about since the numbers are not published yet. These numbers are for out-of-box Solaris 10 32-bit with no tuning at all (the entire FireEngine focus was on out-of-box performance for real customer workloads). And frankly, we are not really interested in benchmarks, because all the Linux web performance numbers (for instance SPECweb99) are published using TUX or Red Hat Content Accelerator. I haven't come across a single customer who is running TUX so far. So why doesn't someone publish a Linux Apache number without any benchmark specials, and we will be sure to put resources on meeting/beating those numbers. That, I think, would be a fairer comparison. And that's why I am far more impressed by customer quotes like the one from Bill Morgan, CIO at Philadelphia Stock Exchange Inc., where he said that Solaris 10 improved his trading capacity by 36%. Now we are not talking about a micro benchmark here but system-level capacity. This was on a 12-way E4800 (SPARC platform). Basically, they loaded Solaris 10 on the same H/W and were able to do 36% more stock transactions per second.

And once again, I am not really anti-Linux or anything. I just need something to compete against in a good-natured way (HP-UX, AIX, IRIX are not around anymore, and I still can't bring myself down to compete with Windows). Before FireEngine, it was the Linux guys who used to pull my leg, asking when I would make Solaris perform as well as Linux on 1 CPU. Well, Solaris does perform now, and some of the guys who used to pull my leg took me out for beer when they loaded Solaris Express on their systems. And knowing them, I might be buying the next round somewhere down the line.

Oh, before I end, I wanted to just touch on why we are not comparing against the RHEL AS4 beta. Well, it's not us doing the comparing but our customers. And that is because although Solaris 10 is due to ship now, things like FireEngine have been available and stable for almost a year. If I were to do the comparison, I would pick the latest Red Hat but compare it against the Solaris 10 update (due out 3-6 months after Solaris 10). And you know what, we haven't exactly been sitting around for the past year. The Solaris 10 update will improve performance over S10 FCS by another 20-25% on networking workloads.

More Solaris on x86 performance data is up, featuring performance on Solaris x86 platforms. BTW, a couple of you asked how the new TCP/IP stack can scale almost linearly. I am trying to understand how much I am allowed to say on blogs like this, but hopefully sometime next week I will give a more technical rundown for the geeks out there.

Monday Oct 11, 2004

Solaris 10 on x86 really performs

Someone pointed me to this article from George Colony, CEO of Forrester Research, and the real story from Tom Adelstein. Both are pretty interesting articles, but one of the feedback comments to Tom, "Untrue... Learn the Facts first", kind of got me motivated to write this blog. "Solaris 10 on x86" can really match Linux in performance and, better yet, scale linearly over a large number of CPUs (remember that 8 CPU x86 blades are here already, and then we will start seeing 8 CPU, dual-core blades). The new network architecture (FireEngine) in S10 allows the same code to give a huge performance win on 1 and 2 CPU configurations and linear scaling when more CPUs are added.

Take for instance web performance. We have improved 2 CPU performance by close to 50% (compared to Solaris 9) using a real web server like Apache, Sun One Web Server, Zeus, etc. without any gimmicks like kernel caching. It's just a plain web server with TCP/IP and a dumb NIC. Some of our Solaris Express customers are telling us that we are outperforming RHEL AS3 by almost 15-35% on the same hardware.

Interested in more numbers? On static and dynamic webbench, Solaris 10 is at par with RHEL AS3 on a 2 CPU V20z, while it's ahead by 15% on the webbench Ecommerce benchmark. On the same box, we can saturate a 1Gb NIC using only 8-9% of a 2.2Ghz Opteron processor. But the real killer deal is that our 10Gb drivers are coming up, and Alex Aizman from S2io just informed me that we are pushing close to 7.3Gbps of traffic on a V20z (with 2 x 1.6Ghz Opterons) with more than 20% CPU to spare. We haven't even ported the driver to the high-performance Nemo framework or enabled any hardware features yet. So I am expecting a huge upside in the next 2-3 months as the driver gets ported to Nemo (Paul and Yuzo should tell you more about Nemo sometime soon).

The improvements are not restricted to TCP. We are doing a FireEngine follow-up for UDP which improves the Tibco benchmark by 130% and the VolanoMark benchmark by 30%. The customer tells us that we are outperforming RHEL AS3 by almost 15% on the same hardware. Adi et al. can add some more details about UDP performance.

And the big killer feature on Opterons: you can run 64-bit Oracle or a web server on 64-bit Solaris to take advantage of the bigger VM space, but leave the bulk of your apps as 32-bit, and they run unchanged.

I am not claiming the best-performing OS title for Solaris 10 (at least not yet!), but guys, we are still ramping up! Every new project going into Solaris is now delivering double-digit performance improvements (the FireEngine architecture has opened the door), and soon I will claim that title :) I must add that all these gains come on the same hardware without applications needing to change at all. Just get the latest Solaris Express and see it for yourself.

And BTW, most of us at Sun are really pretty friendly towards Linux. Sure, we compete in a good-natured way. And Tom did hit the nail on the head regarding why people at Sun don't like Red Hat - it really has to do with them having transformed free Linux into a not-so-free Linux.

Thursday Oct 07, 2004

Thanks for the interest. More performance has been ordered!

Wow! Judging from the number of emails I received and the interest in network performance in general, it looks like real people also read these blogs (other than robots and crawlers). I really appreciate the interest. Keep those emails coming, and I will be more diligent about updating these logs frequently with interesting stuff and updates. As requested by most of you, I have placed a quarterly recurring order for more performance. The order tracking number is "sunay at sun dot com" :)

Sunday Oct 03, 2004

When will you have enough performance?

For someone who never had a web page, this blog business is really frightening, so bear with me if I seem like a novice. I wonder if someone actually reads these pages, or if it's just the robots, crawlers and zombies generating the hits ;\^) Anyway, since Carol (our PM) thinks this is a useful medium to tell people outside Sun what I am thinking, instead of them finding out when the product actually ships, here goes.

My name is Sunay Tripathi and I am a Sr. Staff Engineer in Solaris Networking and Security Technologies. Yes, we are the people who make the 'Net' work in 'The Network is the Computer'. I also go by the architect of FireEngine, the new TCP/IP stack in Solaris 10, for people who have tried Solaris 10 already and are pretty happy with the performance (which is most).

So what am I working on these days? Well, I hear 10Gb is happening. And I also hear that 10Gb is not enough. People want 20-30Gbps of bandwidth coming into 4 CPU Opteron blades and still want meaningful processing power left!! Well, you do that and watch the interrupts go up like crazy and the system behave in more twisted ways than you can imagine, and trust me, it's not nice. But FireEngine comes to the rescue. We can tame the interrupts and do exactly what people want. I'll tell you the details some other day, unless John Fowler beats me to it by blogging soon.

Fairness and security are what keep me awake these days. A large section of customers tell me that they see 'http' literally disappearing in the next 3-5 years and everything becoming 'https' (SSL); they don't want to sacrifice CPU just doing crypto, and they don't want crypto to overwhelm the rest of the traffic. Well, OK, they said QOS, but what they actually meant was fairness without any guarantees. I am hard pressed to see why CNN would go 'https', but they do have a point - Yahoo mail should really be SSL-protected by default!! So I am building fairness in as part of the architecture instead of as another add-on layer.

So let me tell you what else I do other than designing and writing code. I like to hang out with my old Stanford and IIT buddies, who keep telling me how we can combine forces to build the next big thing for the internet (some day). I also love watching my 11-month-old learn to walk. He is already hooked on my workstation and has his own desktop now. Not surprising, given that he sees his Mom and Dad spend 80% of their waking hours on these things. But what he really wants is my Acer Ferrari laptop running 64-bit Solaris, and I tell him, dream on buddy :) My other passions are fast cars (after fast code) and Taekwondo. I am a black belt and used to practice with Stanford Taekwondo. A string of injuries last year kept me away, but I have started training again and will be back soon.

Well, that's who I am. But let me tell you the real reason why I am doing this (apart from the fact that even Sin-yaw and the rest of the perf team have blogs) - I actually want to hear back from you guys. Tell me what latest and greatest thing you are working on or dreaming of, and how Solaris can make it happen for you. I am not sure how the feedback thing works on this blog, but you can always drop me a direct email. The address is pretty simple - sunay at sun dot com. I would also love to hear your opinions if you have already tried Solaris 10.

And as for when will you have enough performance? The answer is never!


Sunay Tripathi, Sun Distinguished Engineer, Solaris Core OS, writes a weblog on architecture for Solaris Networking Stack, GLDv3 (Nemo) framework, Crossbow Network Virtualization and related things

