Tuesday Mar 17, 2009

Crossbow: Virtualized switching and performance

Saw Cisco's unified fabric announcement. Seems like they are going after cloud computing, which pretty much promises to solve the world hunger problem. Even if cloud computing just solves the high data center cost problem and makes compute, networking, and storage available on demand at a low price, I am pretty much sold on it. The interesting part is that the world needs to move towards enabling people to bring their network onto the cloud and have compute, bandwidth, and storage available on demand. For networking and network virtualization, this means we need to move to open standards, open technology, and off-the-shelf hardware. The users of the cloud will not accept vendor or provider lock-in. The cloud needs to be built in such a manner that users can take their physical network and migrate it to an operator's cloud, and at the same time have the ability to build their own clouds and migrate things between the two. Open Networking is the key ingredient here.

This essentially means that there is no room for custom ASICs and protocols, and the world of networking needs to change. This is what Jonathan was talking about, to a certain extent, around Open Networking and Crossbow. OpenSolaris with Crossbow makes things very interesting in this space. But it seems like people don't fully understand what Crossbow and OpenSolaris bring to the table. I saw a post from Scott Lowe and several others mentioning that Crossbow is pretty similar to VMware's network virtualization solutions and Cisco's Nexus 1000V virtual switch.

Let me take some time to explain a few very important things about Crossbow:
  • It's open source and part of OpenSolaris. You can download it right here.
  • It leverages NIC hardware switching and features to deliver isolation and performance for virtual machines. Crossbow not only includes H/W and S/W based VNICs and switches, it also offers virtualized routers, load balancers, and firewalls. Virtual Network Machines can be created using Crossbow and Solaris Zones and have pretty amazing performance. All of these are connected together using the Crossbow Virtual Wire. You don't need to buy fancy and expensive virtualized switches to create and use a Virtual Wire (see the sketch after this list).
  • Using hardware virtualized lanes, the Crossbow technology scales to multiples of 10gig traffic using off-the-shelf hardware.
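
To give a feel for how these pieces are driven, here is a minimal sketch using the dladm command on a Crossbow-enabled OpenSolaris build. The link and VNIC names (e1000g0, vswitch0, vnic1, vnic2, vnic3) are placeholders for illustration, not taken from any particular setup.

# Create a VNIC directly on a physical NIC; hardware classification and
# switching are used when the NIC provides classifiers and Rx/Tx rings.
dladm create-vnic -l e1000g0 vnic1

# Create a software virtual switch (etherstub) and hang two VNICs off it.
# Traffic between vnic2 and vnic3 never leaves the box.
dladm create-etherstub vswitch0
dladm create-vnic -l vswitch0 vnic2
dladm create-vnic -l vswitch0 vnic3

# List the VNICs and the links they hang off.
dladm show-vnic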

Hardware based VNICs and Hardware based Switching

A picture is always worth a thousand words. The figure shows how Crossbow VNICs are built on top of real NIC hardware and how switching is done in hardware where possible. Crossbow also has a full-featured S/W layer where it can do S/W VNICs and switching as well; the hardware is leveraged when available. It's important to note that most NIC vendors do ship the necessary NIC classifiers and Rx/Tx rings, and they are pretty much mandatory for 10 gig NICs, which form the backbone of a cloud.
Crossbow H/W based VNICs

Virtual Wire: The essence of virtualized networking

The Crossbow Virtual Wire technology allows you to take a full-featured physical network (multiple subnets, switches, and routers) and configure it within one or more hosts. This is the key to moving virtualized networks in and out of the cloud. The figure shows a two-subnet physical network with multiple switches and different link speeds, connected via a router, and how it can be virtualized in a single box; a command-level sketch follows the figure. A full workshop on virtualized networking is available here.
Virtual Wire
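
As a rough, hedged sketch of what the single-box version of such a two-subnet network could look like, the commands below use one etherstub per subnet as an independent virtual switch. All names (subnet_a0, subnet_b0, the VNIC names) are made up for illustration, and the router itself would be a zone or guest doing IP forwarding between its two VNICs.

# One etherstub per subnet acts as an independent virtual switch.
dladm create-etherstub subnet_a0
dladm create-etherstub subnet_b0

# VNICs for the hosts on each subnet.
dladm create-vnic -l subnet_a0 host_a_vnic0
dladm create-vnic -l subnet_b0 host_b_vnic0

# Two VNICs for the router, one leg on each subnet.
dladm create-vnic -l subnet_a0 router_vnic0
dladm create-vnic -l subnet_b0 router_vnic1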

Scaling and Performance

Crossbow leverages the NIC's features pretty aggressively to create virtualization lanes that help traffic scale across a large number of cores and threads. For people wanting to build real or virtual appliances using OpenSolaris, performance and scaling across 10 gig NICs is pretty essential. The figure below shows an overview of hardware lanes, and a small CPU-binding sketch follows it.
Crossbow Virtualization Architecture
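
One knob that goes with the lanes is which CPUs a link's processing lands on. On Crossbow-enabled builds, the cpus link property should let you pin a NIC's (or VNIC's) lanes to a set of CPUs; the ixgbe0 name below is just an assumed 10 gig interface, so adjust for your hardware.

# Pin the link's packet processing to four CPUs and check the settings.
dladm set-linkprop -p cpus=8,9,10,11 ixgbe0
dladm show-linkprop -p cpus,maxbw,priority ixgbe0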

More Information

There is a white paper and more detailed documents (including how to get started) at the Crossbow OpenSolaris page.



Monday Mar 02, 2009

Crossbow enables an Open Networking Platform

I came across this blog from Paul Murphy. You should read the second half of Paul's blog; what he says is pretty true. Crossbow delivered a brand new networking stack to Solaris which has scalability, virtualization, QoS, and better observability designed in (instead of patched in). The complete list of features delivered and under way is here. Coupled with the full-fledged open source Quagga routing suite (RIP, OSPF, BGP, etc.), the IP Filter firewall, and a kernel load balancer, OpenSolaris becomes a pretty useful platform for building Open Networking appliances.

Apart from single-box functionality, imagine you want to deliver a virtual router or a load balancer; it would be pretty easy to do so. OpenSolaris offers Zones, where you can deliver a pre-configured zone as a router, load balancer, or firewall. The difference is that this zone would be fully portable to another machine running OpenSolaris and would have no performance penalty. After all, we (a.k.a. the Crossbow team) guarantee that our VNICs with Zones do not have any performance penalty. You can also build a fully portable, pre-configured virtual networking appliance using a Xen guest, which can be made to migrate between any OpenSolaris or Linux host.
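
As a rough sketch of what such a pre-configured router zone might look like, the steps below create a VNIC and hand it to an exclusive-IP zone. The zone name, zonepath, and VNIC name are illustrative, and inside the zone you would still enable forwarding and drop in Quagga or IP Filter.

# Create a VNIC for the router zone on the physical NIC.
dladm create-vnic -l e1000g0 router_vnic0

# Configure an exclusive-IP zone that owns the VNIC.
zonecfg -z vrouter0
zonecfg:vrouter0> create
zonecfg:vrouter0> set zonepath=/zones/vrouter0
zonecfg:vrouter0> set ip-type=exclusive
zonecfg:vrouter0> add net
zonecfg:vrouter0:net> set physical=router_vnic0
zonecfg:vrouter0:net> end
zonecfg:vrouter0> commit
zonecfg:vrouter0> exit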

I noticed that a couple of folks on Paul's blog were asking why Crossbow NIC virtualization is different. Well, it's not just the NIC being virtualized but the entire data path along with it, called a Virtualization Lane. You can see the virtualization lane all the way from the NIC to the socket layer and back here. Not only are there one or more Virtualization Lanes per virtual machine, but the bandwidth partitioning, Diffserv tagging, priority, CPU assignment, etc. are designed in as part of the architecture. The same concepts are used to scale the stack across multiple 10gigE NICs over a large number of cores and threads (out-of-this-world forwarding performance, anyone?).
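
The bandwidth and priority controls apply not just to whole VNICs but also to individual flows within a lane. A hedged example with flowadm, where the flow name, port, and limits are purely illustrative:

# Carve an HTTPS flow out of vnic1, cap it at 200 Mbps, and raise its priority.
flowadm add-flow -l vnic1 -a transport=tcp,local_port=443 -p maxbw=200M,priority=high https_flow0

# Show the flows configured over the link.
flowadm show-flow -l vnic1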

And as mentioned before, Crossbow enables Virtual Wire: the ability to create a full-featured network without any physical wires. Think of running network simulations and testing in a whole new light!

Sunday Dec 14, 2008

Crossbow - Network Virtualization Architecture Comes to Life

December 5th, 2008 was a joyous occasion and a humbling one at the same time. A vision that was created four years back was coming to life. I still remember the summer of 2004 when Sinyaw threw a challenge at me: can you change the world? And it was the fall of the same year when I unveiled the first set of Crossbow slides to him and Fred Zlotnik over a bottle of wine. Lots of planning, and we were finally ready to start, but there were still hurdles in the way. We were still trying to finish Nemo, a.k.a. GLDv3, a high performance device driver framework which was absolutely required for Crossbow (we needed absolute control over the hardware). Nemo finished mid-2005, but then Nicolas, Yuzo, and others left Sun and went to a startup. Thiru was still trying to finish Yosemite (the FireEngine follow-on). So in short, 2005 was basically more planning and prototyping (especially controlling the Rx rings and dynamic polling) on my part. I think it was early 2006 when work began on Crossbow in earnest. Kais moved over from the security group, Nicolas was back at Sun, and Thiru, Eric Cheng, Mike Lim (and of course me) came together to form the core team (which later expanded to 20+ people in early 2008). So it was a long-standing dream and almost three years of hard work that finally came to life when Crossbow Phase 1 integrated into Nevada Build 105 (and will be available in the OpenSolaris 2009.06 release).

Crossbow - H/W Virtualized Lanes that Scale (10gigE over multiple cores)

One of the key tenets of the Crossbow design is the concept of H/W Virtualization Lanes: essentially tying a NIC receive and transmit ring, a DMA channel, kernel threads, kernel queues, and processing CPUs together. There are no shared locks, counters, or anything else. Each lane gets to individually schedule its packet processing by switching its Rx ring independently between interrupt mode and poll mode (dynamic polling). Now you can see why Nemo was so important: without it, the stack couldn't control the H/W, and without Nemo, the NIC vendors wouldn't have played along with us in adding the features we wanted (stateless classification, Rx/Tx rings, etc.). Once a lane is created, we can program the classifier to spread packets between lanes based on IP addresses and ports for scaling reasons. With multiple cores and multiple threads being the way of life going forward, and 10+ GigE of bandwidth (soon we will have IPoIB working as well), scaling really matters (and we are not talking about achieving line rate on 10 GigE with jumbo grams; we are talking about the real world, with a mix of small and large packets, tens of thousands of connections, and thousands of threads).

To demonstrate the point, I captured a bunch of statistics while finishing the final touches on the data path and getting ready to beat some world records. The table below shows mpstat output along with the packets per second serviced for the Intel Oplin (10GigE) NIC on a Niagara 2 based system. The NIC has all 8 Rx/Tx rings enabled and 8 interrupts enabled (one for each Rx ring).
 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
 38    0   0    6    21    3   31    1    5   12    0    86    0   0   0  99
 39    0   0 2563  5506 3907 3282   28   34 1170    0   178    0  21   0  78
 40    0   0 2553  5117 3948 2410   38  150 1192    0   504    1  21   0  77
 41    0   0 2651  5221 4232 2011   25   53 1195    0   210    0  20   0  80
 42    0   0 3078  5700 4743 2069   21   28 1285    0   125    0  22   0  78
 43    0   0 3280  5837 4777 2118   19   24 1328    0   101    0  22   0  78
 44    0   0 3143 19566 18801 1773  50   44 1285    0    68    0  65   0  35
 45    0   0 4570  7748 6838 1984   23   27 1697    0   118    0  29  0  71

# netstat -ia 1
    input   e1000g    output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls 
4       0     1       0     0      61284   0     128820  0     0     
3       0     2       0     0      61015   0     129316  0     0     
4       0     2       0     0      60878   0     128922  0     0  

This link shows the interrupt binding, mpstat, and intrstat output. You can see that the NIC is trying very hard to spread the load, but because the stack sees this as one NIC, there is one CPU (number 44) where all 8 threads collide. It's like an 8-lane highway becoming a single lane during rush hour.

Now let's look at what happens when Crossbow enables a lane all the way up the stack for each Rx ring and also enables dynamic polling for each one individually. If you look at the corresponding mpstat and intrstat output and the packets-per-second rate, you will see that the lanes really do work independently of each other, resulting in almost linear spreading and a much higher packets-per-second rate. The benchmark represents a webserver workload, and needless to say, Crossbow with dynamic polling on a per-Rx-ring basis almost tripled the performance. The raw stats can be seen here.
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys wt idl
 37    0   0 2507 11906 10272 4267  265  326  489    0   776    4  28   0  68
 38    0   0 2111 11793 9840 6503  336  314  472    0   615    3  32   0  65
 39    0   0  500 10409 10164  565    7  125  174    0  1413    6  23   0  70
 40    0   1  660 10423 9982  950   23  288  272    0  3834    8  34   0  58
 41    0   1  658 10490 10108  847   16  238  237    0  2549    8  29   0  64
 42    0   0  584 10605 10299  708   12  181  207    0  1828    7  26   0  67
 43    0   0  732 10829 10559  598    9  141  193    0  1485    7  25   0  68
 44    0   1  306   487   25 1091   17  282  330    0  4083    9  17   0  74

# netstat -ia 1
     input   e1000g    output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls 
2       0     1       0     0      267619  0     522226  0     0     
2       0     2       0     0      275395  0     539920  0     0     
2       0     2       0     0      251023  0     482335  0     0     
And finally, below we print some statistics from the MAC per-Rx-ring data structure (mac_soft_ring_set_t). For each Rx ring, we track the number of packets received via the interrupt path, the number received via the poll path, and, for each time we polled the Rx ring, chains of fewer than 10 packets, chains between 10 and 50, and chains over 50. You can see that the poll path brings in a larger share of the packets, and in bigger chains.
Crossbow Virtualization Architecture
Keep in mind that for most OSes and most NICs, the interrupt path brings in one packet at a time. This makes the Crossbow architecture more efficient for scaling as well as for performance at higher loads on high-bandwidth NICs.

Crossbow and Network Virtualization

Once we have the ability to create these independent H/W lanes, programming the NIC classifier is easy. Instead of spreading the incoming traffic for scaling, we program the classifier to send packets for a MAC address to an individual lane. The MAC addresses are tied to individual Virtual NICs (VNICs), which are in turn attached to guest virtual machines or Solaris Containers (Zones). The separation for each virtual machine is enforced by the H/W, and packets are processed on the CPUs attached to the virtual machine (the poll thread and interrupts for a VNIC's Rx ring are bound to the assigned CPUs). The picture kind of looks like this:
Crossbow Virtualization Architecture
Since we always do dynamic polling for NICs and VNICs, enforcing a bandwidth limit is pretty easy. One can create a VNIC by simply specifying the B/W limit, priority, and CPU list in one shot, and the poll thread will enforce the limit by picking up only the packets that fit within it. It's something as simple as:
freya(67)% dladm create-vnic -l e1000g0 -p maxbw=100,cpus=2 my_guest_vm
The above command creates a VNIC called my_guest_vm with a random MAC address and assigns it a B/W limit of 100 Mbps. All the processing for this VNIC is tied to CPU 2. It's features like this that make Crossbow an integral part of the Sun Cloud Computing initiative due to roll out soon.
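
If you want to double-check what was created, something along these lines should show the VNIC's MAC address and properties (the exact output format varies by build):

# Verify the VNIC and its bandwidth/CPU settings.
dladm show-vnic my_guest_vm
dladm show-linkprop -p maxbw,cpus,priority my_guest_vm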

Anyway, this should give you a flavour. There is a white paper and more detailed documents (including how to get started) at the Crossbow OpenSolaris page.



About

Sunay Tripathi, Sun Distinguished Engineer, Solaris Core OS, writes a weblog on architecture for Solaris Networking Stack, GLDv3 (Nemo) framework, Crossbow Network Virtualization and related things
