Saturday Oct 01, 2011

See you at Oracle OpenWorld!

I will be speaking at Oracle OpenWorld 2011 next week in San Francisco and I hope you will join me to learn more about Oracle Solaris 11, zones, network virtualization/Crossbow, I/O scalability, SPARC SuperCluster, and Solaris on Exadata and Exalogic. I will be speaking at the following sessions:
  • Session ID: 14646
    Session Title: Delivering the Near-Impossible: Around-the-Clock Global Secure Infrastructure
    Venue / Room: Moscone South - 252
    Date and Time: 10/3/11, 17:00 - 18:00

  • Session ID: 16242
    Session Title: Oracle Solaris Technical Panel with the Core Solaris Developers
    Venue / Room: Moscone South - 236
    Date and Time: 10/5/11, 13:15 - 14:15

  • Session ID: 16243
    Session Title: Cloud-Scale Networking with Oracle Solaris 11 Network Virtualization
    Venue / Room: Moscone South - 236
    Date and Time: 10/5/11, 17:00 - 18:00
You will also find me at the Meet the Experts area at the Moscone South DEMOgrounds for a 1-1 chat on Monday 3-4pm and Tuesday 9:45-11am. See the complete list of Oracle Solaris-related events at OOW here:

You can also follow me on twitter at @ndroux for my live coverage of OOW and up-to-the-minute status.

See you there!

Monday Nov 15, 2010

Solaris 11 Express Released! On Crossbow, NUMA I/O, Exadata, and more…

After many years under development, Solaris 11 Express is now available from Oracle. This milestone makes the many features and improvements that we have been working on since Solaris 10 available with Oracle Premier Support! As the architect for Crossbow and NUMA I/O, I wanted to spend some time here to give you a quick introduction and my perspective on these features.

Solaris 11 Express includes Crossbow, which we integrated in Solaris a couple of years ago, and have been steadily improving since then. Crossbow provides network virtualization and resource control designed into the core networking stack. This tight integration allows us to provide the best performance, leveraging advanced NIC hardware features and providing scalability on Oracle Systems from the 8 sockets Nahalem-based Sun Fire x4800 to the four socket SPARC T3-4.

Management of Crossbow VNICs and QoS is also closely integrated with other Solaris administration tools and features. For example, VNICs and bandwidth limits can be easily managed with the common data link management tool diadm. Crossbow allows the Solaris Zones virtualization architecture to be taken to the next level, allowing each zone to have its own VNIC(s) and virtual link speed, improving separation between zones by automatically binding network kernel resources (threads and interrupts) to the CPUs belonging to a zone.

Crossbow features such as virtual switching, virtual NICs, bandwidth limits, and resource control can be combined with other networking features introduced by Solaris 11 Express (Load balancing, VRRP, bridging, revamped IP tunnels, improved observability) to provide the ideal environment to build fully virtual networks in a box for simulation, planning, debugging, and teaching. Thanks to these features and the high efficiency of Zones, Solaris 11 Express provides the foundation for an open networking platform.

While an integrated data path, QOS, resource control, and scalability built-in are key for performance, equally important is managing and placing these resources on large systems. The Sun Fire x4800 and Oracle SPARC T3-4 for instance provide several processor sockets, connected to multiple PCI Express I/O switches. On such large systems, the processors are divided into multiple NUMA (Non-Uniform Memory Access) nodes, connected through high-speed interconnect. I/O requests as well DMA transfers to and from devices must be routed through the CPU interconnect, and the distance between devices and the CPUs used to process I/O requests must be kept to a minimal for best overall system scalability.

NUMA I/O is a new Solaris kernel framework which is used by other Solaris I/O subsystems (such as a network stack) to register their I/O resources (kernel threads, interrupts, and so on) and define at a high-level the affinity between these resources. The NUMA I/O framework discovers the I/O topology of the machine, and places these I/O resources on the physical CPUs according to the affinities specified by the caller, as well as the NUMA and I/O hardware topology.

The Oracle Exadata Database Machine running Solaris 11 Express depends heavily on NUMA I/O to achieve best Infiniband RDSv3 performance, which is the protocol used by the Exadata database compute nodes (Sun Fire x4800 in the case of the Oracle Exadata X2-8) to communicate with the Exadata Storage Servers. NUMA I/O is designed to be a common framework, and work is in progress to leverage it from other Solaris I/O subsystems.

Learn more about these features and the many other innovations provided by Solaris 11 Express, such as IPS, new packaging system that redefines the OS software life cycle, ZFS crypto, a new installer, Zones improvements, etc, on the Solaris 11 Express site at There you will also find information on how to download Solaris 11 Express, details the type of support available, documentation, and many other community resources.


Thursday Feb 14, 2008

Private virtual networks for Solaris xVM and Zones with Crossbow

Virtualization is great: save money, save lab space, and save the planet. So far so good! But how do you connect these virtual machines, allocate them their share of the bandwidth, and how do they talk to the rest of the physical world? This is where the OpenSolaris Project Crossbow comes in. Today we are releasing a new pre-release snapshot of Crossbow, an exciting OpenSolaris project which enables network virtualization in Solaris, network bandwidth partitioning, and improved scalability of network traffic processing.

This new release of the project includes a new features which allows you to build complete virtual networks that are isolated from the physical network. Virtual machines and Zones can be connected to these virtual networks, and isolated from the rest of the physical network through firewall/NAT, etc. This is useful when you want to prototype a distributed application before deploying it on a physical network, or if you want to isolate and hide your virtual network.

This article shows how Crossbow can be used together with NAT to build a complete virtual network connecting multiple Zones within a Solaris host. The same technique applies to xVM Server x64 as well, since xVM uses Crossbow for its network virtualization needs. A detailed description of the Crossbow virtualization architecture can be found in my document here.

In this example, we will build the following network:

First we need to build our virtual network, this can be done very simply using Crossbow using etherstubs. An etherstub is a pseudo ethernet NIC which can be created with dladm(1M). VNICs can then be created on top of that etherstub. The Crossbow MAC layer of the stack will implicitly create a virtual switch between all the VNICs sharing the same etherstub. In the following example we create an etherstub and three VNICs for our virtual network.

# dladm create-etherstub etherstub0
# dladm create-vnic -d etherstub0 vnic0
# dladm create-vnic -d etherstub0 vnic1
# dladm create-vnic -d etherstub0 vnic2

By default Crossbow will assign a random MAC address to the VNICs, as we can see from the following command:

# dladm show-vnic
vnic0 etherstub0 0 Mbps 2:8:20:e7:1:6f random
vnic1 etherstub0 0 Mbps 2:8:20:53:b4:9 random
vnic2 etherstub0 0 Mbps 2:8:20:47:b:9c random

You could also assign a bandwidth limit to each VNIC by setting the maxbw property during VNIC creation. At this point we are done creating our virtual network. In the case of xVM, you would specify "etherstub0" instead of a physical NIC to connect the xVM domain to the virtual network. This would cause xVM to automatically create a VNIC on top of etherstub0 when booting the virtual machine. xVM configuration is described in the xVM configuration guide.

Now that we have our VNICs we can create our Zones. Zone test1 can be created as follows:

# zonecfg -z test1
test1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:test1> create
zonecfg:test1> set zonepath=/export/test1
zonecfg:test1> set ip-type=exclusive
zonecfg:test1> add inherit-pkg-dir
zonecfg:test1:inherit-pkg-dir> set dir=/opt
zonecfg:test1:inherit-pkg-dir> end
zonecfg:test1> add net
zonecfg:test1:net> set physical=vnic1
zonecfg:test1:net> end
zonecfg:test1> exit

Note that in this case the zone is assigned its own IP instance ("set ip-type=exclusive"). This allows the zone to configure its own VNIC which is connected to our virtual network. Now it's time to setup NAT between our external network and our internal virtual network. We'll be setting up NAT with IP Filter, which is part of OpenSolaris, based on the excellent NAT write up by Rich Teer.

In our example the global zone will be used to interface our private virtual network with the physical network. The global zone connects to the physical network via eri0, and to the virtual private network via vnic0, as shown by the figure above. The eri0 interface eri0 is configured the usual way, and in our case its address is assigned using DHCP:

# ifconfig eri0
eri0: flags=201000843 mtu 1500 index 2
inet netmask ffffff00 broadcast
ether 0:3:ba:94:65:f8

We will assign a static IP address to vnic0 in the global zone:

# ifconfig vnic0 plumb
# ifconfig vnic0 inet up
# ifconfig vnic0
vnic0: flags=201100843 mtu 9000 index 6
inet netmask ffffff00 broadcast
ether 2:8:20:e7:1:6f

Note that the usual configuration variables (e.g. /etc/hostname.) must be populated for the configuration to persist across reboots). We must also enable IPv4 forwarding on the global zone. Run routeadm(1M) to display the current configuration, and if "IPv4 forwarding" is disabled, enable it with the following command:

# routeadm -u -e ipv4-forwarding

Then we can enable NAT on the eri0 interface. We're using a simple NAT configuration in /etc/ipf/ipnat.conf:

# cat /etc/ipf/ipnat.conf
map eri0 -> 0/32 portmap tcp/udp auto
map eri0 -> 0/32

We also need to enable IP filtering on our physical network-facing NIC eri0. We run "ipnat -l" to verify that our NAT rules have been enabled.

# svcadm enable network/ipfilter
# ipnat -l
List of active MAP/Redirect filters:
map eri0 -> portmap tcp/udp auto
map eri0 ->

Now we can boot our zones:

# zoneadm -z test1 boot
# zoneadm -z test2 boot

Here I assigned the address to the vnic1 assigned to zone test1:

# zlogin test1
[Connected to zone 'test1' pts/2]
# ifconfig vnic1
vnic1: flags=201000863 mtu 9000 index 2
inet netmask ffffff00 broadcast
ether 2:8:20:53:b4:9
# netstat -nr

Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default UG 1 0
default UG 1 0 vnic1 U 1 0 vnic1 UH 1 2 lo0

Routing Table: IPv6
Destination/Mask Gateway Flags Ref Use If
--------------------------- --------------------------- ----- --- ------- -----
::1 ::1 UH 1 0 lo0

Note that the zone appears to be on a network and has what looks like a regular NIC with a regular MAC address. In reality, this zone is connected to a virtual network isolated from the physical network. From that non-global zone, we can now reach out to the physical network via NAT running in the global zone:

# ssh someuser@
Last login: Tue Feb 12 13:35:03 2008 from somehost

From the global zone, we can query NAT to see the translations taking place:

# ipnat -l
List of active MAP/Redirect filters:
map eri0 -> portmap tcp/udp auto
map eri0 ->

List of active sessions:
MAP 37153 <- -> 26333 [ 22]

Of course this is only the tip of the iceberg. You could deploy NAT from a non-global zone itself, or deploy a virtual router on your virtual network, you could enable additional filtering rules, etc, etc. Of course you are not limited to only one virtual network. You can create multiple virtual networks within a host, route between these networks, etc. We are exploring some of the possibilities as part of the Crossbow and Virtual Network Machines projects.




« August 2016