Tuesday Apr 29, 2014

Solaris 11.2 Networking Overview: Application-Driven SDN and Beyond

Today we are excited to announce Solaris 11.2 (Solaris 11.2 Beta available here). This release introduces significant improvements to the Solaris networking stack, substantially expanding its built-in network virtualization and SLA features to provide a distributed virtual network infrastructure for your private cloud, and enabling application-driven SLAs. Together, these features are the foundation of the built-in Application-driven Software-Defined Networking (SDN) capabilities of Solaris 11.2.

As the Chief Architect for Solaris Core Networking, I am pleased to introduce this significant set of exciting new features and their benefits.

  • Elastic Virtual Switch (EVS): provides a built-in distributed virtual network infrastructure that can be used to deploy virtual switches across a collection of machines. EVS provides centralized management and observability for ease of use, and for the monitoring of resources across all the nodes in a single view. Control is performed through easy-to-use administration tools or OpenStack.

    EVS currently supports VXLAN and VLAN for maximum flexibility, and for easily integrating in your existing environment. Our architecture is fabric-independent, and can be extended in the future to support additional types of network fabrics. EVS manages network configuration across the compute nodes and the network for you automatically, and dynamically adapts to the location of your workload.

    EVS is tightly integrated with the newly introduced Solaris kernel zones as well as native zones, allowing a zone's VNIC to easily connect to an elastic virtual switch.
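
    As a sketch of the administration model (evsadm(1M) is the EVS administration tool; the subcommands and property syntax below are illustrative and should be checked against the documentation), a tenant network with an IP network and one port could be created along these lines:

    # evsadm create-evs tenant0
    # evsadm add-ipnet -p subnet=192.0.2.0/24 tenant0/ipnet0
    # evsadm add-vport tenant0/vport0

    A zone can then be connected by pointing its anet resource at tenant0/vport0 in zonecfg(1M).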

  • OpenStack Neutron Networking: Solaris 11.2 includes a full distribution of OpenStack, taking advantage of the stability, performance, and security of Solaris. For networking, Solaris 11.2 includes an OpenStack Neutron plugin layered on top of EVS. This plugin allows you to leverage the new Solaris distributed virtual networking capabilities from OpenStack transparently.

  • VXLAN: Virtual Extensible LANs allow virtual segments to be layered on top of generic IP networks. VXLANs provide greater flexibility than VLANs, which typically require switch configuration and are limited to 4096 instances.

  • Datalink Multipathing Probing: DLMP was introduced in Solaris 11.1 to combine the benefits of link aggregation and IPMP. For instance, DLMP, like IPMP, does not require switch configuration and can fail over between multiple switches without relying on proprietary switch vendor extensions. DLMP is implemented as a new mode of the Solaris link aggregation feature, which allows it to be easily combined with our network virtualization features, providing highly available VNICs to VMs and zones.

    Solaris 11.2 adds probe-based failure detection to DLMP, allowing the use of layer-3 IP probes to one or more target nodes on the network. IP address consumption is reduced through transitive probing between the members of a DLMP aggregation.
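
    A minimal sketch of such a configuration follows (the aggregation mode flag is standard dladm(1M) syntax; the probe-related property name below is an assumption to verify against the Solaris 11.2 documentation):

    # dladm create-aggr -m dlmp -l net0 -l net1 aggr0
    # dladm set-linkprop -p probe-ip=+ aggr0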

  • High-Priority Hardware-assisted Flows: This new feature extends the set of SLAs supported by Solaris to allow flows to be associated with a high or normal traffic priority. Packets belonging to flows with a high priority are processed more quickly through the network stack using dedicated kernel resources. When possible, high-priority flows are segregated from other traffic across multiple hardware rings in the NIC. The interrupt throttling settings of the underlying NIC are also dynamically adjusted if possible.
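
    For example, marking Oracle database traffic as high priority could look like this with flowadm(1M) (the port number is arbitrary and the exact attribute and property syntax should be checked against the documentation):

    # flowadm add-flow -l net0 -a transport=tcp,local_port=1521 -p priority=high dbflow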

  • Application-Driven SLAs: In order to enable true application-driven SLAs, Solaris 11.2 provides new APIs allowing applications like Oracle RAC or JVMs to dynamically associate SLAs (bandwidth limit or priority) with network sockets. Processing critical network traffic such as heartbeats at a higher priority improves system uptime, and bandwidth capping allows fine control of bandwidth usage for better performance and isolation between different types of network traffic. Flows are dynamically created according to the configured SLAs, and they can be monitored with flowstat(1M).
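
    Flows created on behalf of applications show up alongside administratively created flows, so their statistics can be observed with the usual tools, for example refreshing every five seconds:

    # flowstat -i 5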

  • NUMA IO performance improvements for latency-sensitive workloads: The NUMA IO framework, which we introduced in Solaris 11 to improve Solaris performance and allow it to scale on large machines such as the SPARC M5-32, was updated with this release to avoid latency spikes on loaded systems. Instead of binding kernel IO threads to specific CPUs, they are now bound to a subset of CPUs, allowing the NUMA IO locality optimizations to be preserved, while letting the dispatcher pick the best available CPU according to the current load. This reduces the risk of latency spikes on a loaded system, and also leads to a better distribution of IO-related CPU processing. These new bindings are pool-optimized, meaning that the IO processing for a zone with its dedicated set of CPUs will be executed on the CPUs belonging to that zone for best isolation.

  • Network Monitoring: Network monitors are new in Solaris 11.2. They continuously monitor the network state and the system configuration for problems such as misconfiguration on the host or network. This allows common problems such as mismatched VLAN settings between switch and host, or misconfigured MTUs, to be detected and reported early, minimizing downtime.

  • Reflective Relay: One of the common use cases of virtualization is consolidation, where multiple physical machines are consolidated on a single server and run in virtual machines or zones. The Solaris built-in network virtualization provides a virtual software datapath between these VMs/zones. In some cases, the policies of the environment being consolidated require the traffic to go through a centralized server for traffic isolation, accounting, etc. The Reflective Relay feature allows the software virtual switch to be bypassed, and all packets to be sent on the physical network infrastructure.

  • Precision Time Protocol: PTP enables the synchronization between hosts. By optionally leveraging hardware-assisted timestamping, PTP can achieve synchronization that is more precise than NTP.
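
    PTP is delivered as a daemon managed by SMF; as a sketch (the service FMRI below is my assumption, verify with svcs(1)):

    # svcadm enable network/ptp
    # svcs -l network/ptp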

  • SR-IOV VNICs: Single-Root IO Virtualization, or SR-IOV, virtualizes IO devices to allow direct access to hardware from virtual machines, avoiding costly hypervisor overhead. SR-IOV VNICs encapsulate SR-IOV virtual functions within VNICs, allowing SR-IOV to be managed and monitored like any regular VNIC. Even when an SR-IOV virtual function is mapped into a guest for direct access, the corresponding VNIC remains present in the host operating system to allow for the control of the virtual function.

As you can see, Solaris 11.2 provides you with a significant set of new networking features and benefits, whether integrated within your existing infrastructure, or as the foundation of your next generation private cloud. The Solaris 11.2 Beta is now available for download; we are looking forward to your feedback!

Saturday Oct 01, 2011

See you at Oracle OpenWorld!

I will be speaking at Oracle OpenWorld 2011 next week in San Francisco and I hope you will join me to learn more about Oracle Solaris 11, zones, network virtualization/Crossbow, I/O scalability, SPARC SuperCluster, and Solaris on Exadata and Exalogic. I will be speaking at the following sessions:
  • Session ID: 14646
    Session Title: Delivering the Near-Impossible: Around-the-Clock Global Secure Infrastructure
    Venue / Room: Moscone South - 252
    Date and Time: 10/3/11, 17:00 - 18:00

  • Session ID: 16242
    Session Title: Oracle Solaris Technical Panel with the Core Solaris Developers
    Venue / Room: Moscone South - 236
    Date and Time: 10/5/11, 13:15 - 14:15

  • Session ID: 16243
    Session Title: Cloud-Scale Networking with Oracle Solaris 11 Network Virtualization
    Venue / Room: Moscone South - 236
    Date and Time: 10/5/11, 17:00 - 18:00
You will also find me at the Meet the Experts area at the Moscone South DEMOgrounds for a 1-1 chat on Monday 3-4pm and Tuesday 9:45-11am. See the complete list of Oracle Solaris-related events at OOW here: http://bit.ly/oow11-solaris

You can also follow me on twitter at @ndroux for my live coverage of OOW and up-to-the-minute status.

See you there!

Monday Nov 15, 2010

Solaris 11 Express Released! On Crossbow, NUMA I/O, Exadata, and more…

After many years under development, Solaris 11 Express is now available from Oracle. This milestone makes the many features and improvements that we have been working on since Solaris 10 available with Oracle Premier Support! As the architect for Crossbow and NUMA I/O, I wanted to spend some time here to give you a quick introduction and my perspective on these features.

Solaris 11 Express includes Crossbow, which we integrated in Solaris a couple of years ago and have been steadily improving since then. Crossbow provides network virtualization and resource control designed into the core networking stack. This tight integration allows us to provide the best performance, leveraging advanced NIC hardware features and providing scalability on Oracle systems from the eight-socket Nehalem-based Sun Fire x4800 to the four-socket SPARC T3-4.

Management of Crossbow VNICs and QoS is also closely integrated with other Solaris administration tools and features. For example, VNICs and bandwidth limits can be easily managed with the common datalink management tool dladm(1M). Crossbow allows the Solaris Zones virtualization architecture to be taken to the next level, allowing each zone to have its own VNIC(s) and virtual link speed, and improving separation between zones by automatically binding network kernel resources (threads and interrupts) to the CPUs belonging to a zone.
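
For example, creating a VNIC over a physical datalink and capping its bandwidth takes a single command (the link and VNIC names below are illustrative):

# dladm create-vnic -l e1000g0 -p maxbw=300M vnic0
# dladm show-linkprop -p maxbw vnic0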

Crossbow features such as virtual switching, virtual NICs, bandwidth limits, and resource control can be combined with other networking features introduced by Solaris 11 Express (Load balancing, VRRP, bridging, revamped IP tunnels, improved observability) to provide the ideal environment to build fully virtual networks in a box for simulation, planning, debugging, and teaching. Thanks to these features and the high efficiency of Zones, Solaris 11 Express provides the foundation for an open networking platform.

While an integrated data path, QoS, resource control, and built-in scalability are key for performance, equally important is managing and placing these resources on large systems. The Sun Fire x4800 and Oracle SPARC T3-4, for instance, provide several processor sockets connected to multiple PCI Express I/O switches. On such large systems, the processors are divided into multiple NUMA (Non-Uniform Memory Access) nodes connected through a high-speed interconnect. I/O requests as well as DMA transfers to and from devices must be routed through the CPU interconnect, and the distance between devices and the CPUs used to process I/O requests must be kept to a minimum for best overall system scalability.

NUMA I/O is a new Solaris kernel framework which is used by other Solaris I/O subsystems (such as the network stack) to register their I/O resources (kernel threads, interrupts, and so on) and define at a high level the affinity between these resources. The NUMA I/O framework discovers the I/O topology of the machine, and places these I/O resources on the physical CPUs according to the affinities specified by the caller, as well as the NUMA and I/O hardware topology.

The Oracle Exadata Database Machine running Solaris 11 Express depends heavily on NUMA I/O to achieve the best InfiniBand RDSv3 performance; RDSv3 is the protocol used by the Exadata database compute nodes (Sun Fire x4800 in the case of the Oracle Exadata X2-8) to communicate with the Exadata Storage Servers. NUMA I/O is designed to be a common framework, and work is in progress to leverage it from other Solaris I/O subsystems.

Learn more about these features and the many other innovations provided by Solaris 11 Express, such as IPS (a new packaging system that redefines the OS software life cycle), ZFS crypto, a new installer, Zones improvements, and more, on the Solaris 11 Express site at oracle.com. There you will also find information on how to download Solaris 11 Express, details on the types of support available, documentation, and many other community resources.


Tuesday May 26, 2009

Crossbow for Cloud Computing Architectures

The first phase of Crossbow was integrated in OpenSolaris last December. I recently posted a paper which shows how Crossbow technology such as virtual NICs (VNICs), virtual switches, Virtual Wires (vWires), and Virtual Network Machines (VNMs) can be used as a foundation to build isolated virtual networks for cloud computing architectures. The document is available on opensolaris.org. Please share your comments on the Crossbow mailing list at crossbow-discuss@opensolaris.org.

Thursday Feb 14, 2008

Private virtual networks for Solaris xVM and Zones with Crossbow

Virtualization is great: save money, save lab space, and save the planet. So far so good! But how do you connect these virtual machines, allocate them their share of the bandwidth, and how do they talk to the rest of the physical world? This is where the OpenSolaris Project Crossbow comes in. Today we are releasing a new pre-release snapshot of Crossbow, an exciting OpenSolaris project which enables network virtualization in Solaris, network bandwidth partitioning, and improved scalability of network traffic processing.

This new release of the project includes new features which allow you to build complete virtual networks that are isolated from the physical network. Virtual machines and Zones can be connected to these virtual networks, and isolated from the rest of the physical network through firewall/NAT, etc. This is useful when you want to prototype a distributed application before deploying it on a physical network, or when you want to isolate and hide your virtual network.

This article shows how Crossbow can be used together with NAT to build a complete virtual network connecting multiple Zones within a Solaris host. The same technique applies to xVM Server x64 as well, since xVM uses Crossbow for its network virtualization needs. A detailed description of the Crossbow virtualization architecture can be found in my document here.

In this example, we will build the following network:

First we need to build our virtual network. With Crossbow this can be done very simply using etherstubs. An etherstub is a pseudo ethernet NIC which can be created with dladm(1M). VNICs can then be created on top of that etherstub. The Crossbow MAC layer of the stack will implicitly create a virtual switch between all the VNICs sharing the same etherstub. In the following example we create an etherstub and three VNICs for our virtual network.

# dladm create-etherstub etherstub0
# dladm create-vnic -d etherstub0 vnic0
# dladm create-vnic -d etherstub0 vnic1
# dladm create-vnic -d etherstub0 vnic2

By default Crossbow will assign a random MAC address to the VNICs, as we can see from the following command:

# dladm show-vnic
vnic0 etherstub0 0 Mbps 2:8:20:e7:1:6f random
vnic1 etherstub0 0 Mbps 2:8:20:53:b4:9 random
vnic2 etherstub0 0 Mbps 2:8:20:47:b:9c random

You could also assign a bandwidth limit to each VNIC by setting the maxbw property during VNIC creation. At this point we are done creating our virtual network. In the case of xVM, you would specify "etherstub0" instead of a physical NIC to connect the xVM domain to the virtual network. This would cause xVM to automatically create a VNIC on top of etherstub0 when booting the virtual machine. xVM configuration is described in the xVM configuration guide.
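
For instance, a VNIC capped at 100 Mbps could be created as follows (using the same pre-release -d syntax as above; the exact option syntax for setting maxbw has varied across Crossbow snapshots, so verify against dladm(1M)):

# dladm create-vnic -d etherstub0 -p maxbw=100M vnic3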

Now that we have our VNICs we can create our Zones. Zone test1 can be created as follows:

# zonecfg -z test1
test1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:test1> create
zonecfg:test1> set zonepath=/export/test1
zonecfg:test1> set ip-type=exclusive
zonecfg:test1> add inherit-pkg-dir
zonecfg:test1:inherit-pkg-dir> set dir=/opt
zonecfg:test1:inherit-pkg-dir> end
zonecfg:test1> add net
zonecfg:test1:net> set physical=vnic1
zonecfg:test1:net> end
zonecfg:test1> exit

Note that in this case the zone is assigned its own IP instance ("set ip-type=exclusive"). This allows the zone to configure its own VNIC which is connected to our virtual network. Now it's time to set up NAT between our external network and our internal virtual network. We'll be setting up NAT with IP Filter, which is part of OpenSolaris, based on the excellent NAT write-up by Rich Teer.

In our example the global zone will be used to interface our private virtual network with the physical network. The global zone connects to the physical network via eri0, and to the virtual private network via vnic0, as shown by the figure above. The eri0 interface is configured the usual way, and in our case its address is assigned using DHCP:

# ifconfig eri0
eri0: flags=201000843 mtu 1500 index 2
inet netmask ffffff00 broadcast
ether 0:3:ba:94:65:f8

We will assign a static IP address to vnic0 in the global zone:

# ifconfig vnic0 plumb
# ifconfig vnic0 inet up
# ifconfig vnic0
vnic0: flags=201100843 mtu 9000 index 6
inet netmask ffffff00 broadcast
ether 2:8:20:e7:1:6f

Note that the usual configuration variables (e.g. /etc/hostname.) must be populated for the configuration to persist across reboots. We must also enable IPv4 forwarding on the global zone. Run routeadm(1M) to display the current configuration, and if "IPv4 forwarding" is disabled, enable it with the following command:

# routeadm -u -e ipv4-forwarding

Then we can enable NAT on the eri0 interface. We're using a simple NAT configuration in /etc/ipf/ipnat.conf:

# cat /etc/ipf/ipnat.conf
map eri0 -> 0/32 portmap tcp/udp auto
map eri0 -> 0/32

We also need to enable IP filtering on our physical network-facing NIC eri0, then run "ipnat -l" to verify that our NAT rules have been enabled.

# svcadm enable network/ipfilter
# ipnat -l
List of active MAP/Redirect filters:
map eri0 -> portmap tcp/udp auto
map eri0 ->

Now we can boot our zones:

# zoneadm -z test1 boot
# zoneadm -z test2 boot

Here is the address configuration of vnic1, which is assigned to zone test1:

# zlogin test1
[Connected to zone 'test1' pts/2]
# ifconfig vnic1
vnic1: flags=201000863 mtu 9000 index 2
inet netmask ffffff00 broadcast
ether 2:8:20:53:b4:9
# netstat -nr

Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default                                  UG        1          0
default                                  UG        1          0  vnic1
                                         U         1          0  vnic1
                                         UH        1          2  lo0

Routing Table: IPv6
Destination/Mask Gateway Flags Ref Use If
--------------------------- --------------------------- ----- --- ------- -----
::1 ::1 UH 1 0 lo0

Note that the zone appears to be on a network and has what looks like a regular NIC with a regular MAC address. In reality, this zone is connected to a virtual network isolated from the physical network. From that non-global zone, we can now reach out to the physical network via NAT running in the global zone:

# ssh someuser@
Last login: Tue Feb 12 13:35:03 2008 from somehost

From the global zone, we can query NAT to see the translations taking place:

# ipnat -l
List of active MAP/Redirect filters:
map eri0 -> portmap tcp/udp auto
map eri0 ->

List of active sessions:
MAP 37153 <- -> 26333 [ 22]

Of course this is only the tip of the iceberg. You could deploy NAT from a non-global zone itself, deploy a virtual router on your virtual network, enable additional filtering rules, and so on. And you are not limited to a single virtual network: you can create multiple virtual networks within a host, route between them, etc. We are exploring some of the possibilities as part of the Crossbow and Virtual Network Machines projects.

Monday Apr 02, 2007

Virtual Switching in Solaris with Crossbow VNICs

Virtual NICs, also known as VNICs, are core components of project Crossbow. They allow physical NICs to be shared by multiple Zones or virtual machines such as Xen domains. VNICs appear to the rest of the system as regular NICs. VNICs can be assigned a subset of the hardware resources (interrupts, rings, etc) made available by the underlying hardware.

In order to provide connectivity between the multiple Zones or virtual machines sharing a single physical NIC, the VNIC layer also provides a data path between the VNICs defined on top of the same underlying NIC. The VNICs sharing the same underlying NIC appear to be part of the same segment, i.e. connected to the same virtual switch. The virtual switch concept also allows fully virtual networks to be built within a machine.

A couple of days ago I posted a first draft design document describing the concept of virtual switches, how they are implemented by VNICs in Solaris, and how they can be used in practice.



