Thursday May 24, 2012

Solaris 11 VNICs on SR-IOV Virtual Functions

OVM Server for SPARC (a.k.a. LDoms) 2.2 provides support for SR-IOV. That is, an SR-IOV Virtual Function (VF) can be assigned to a logical domain. A VF provides bare metal like performance. This blog explains how to configure a VF so that VNICs can be created on top of the VF device, which is required to support Solaris 11 Zones in a logical domain.

The following example shows how to set up a VF so that VNICs can be created on it.


When a VF is created, by default only one MAC address (the primary MAC address) is assigned to it. In order to create VNICs, additional MAC addresses need to be assigned. This can be done either when the VF is created or later using the set-io command. This example assumes you have already created a VF. If the VF is assigned to a domain, that domain must be stopped before assigning additional MAC addresses.

The following command allocates 3 alternate MAC addresses using the automatic MAC address allocation method.

 Primary# ldm set-io alt-mac-addrs=auto,auto,auto /SYS/MB/NET0/IOVNET.PF0.VF0
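If you script this step, the comma-separated list of 'auto' entries for the alt-mac-addrs option can be generated rather than typed. The following is a small POSIX shell sketch; the count of 3 and the VF name are just this example's values, and the final command is echoed as a dry run since 'ldm' only exists on a control domain.

```shell
# Build an alt-mac-addrs value requesting N automatically allocated
# MAC addresses, e.g. N=3 -> "auto,auto,auto".
build_alt_macs() {
    n=$1
    list="auto"
    i=1
    while [ "$i" -lt "$n" ]; do
        list="$list,auto"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Echoed as a dry run; on a real control domain, drop the leading echo.
echo ldm set-io alt-mac-addrs="$(build_alt_macs 3)" /SYS/MB/NET0/IOVNET.PF0.VF0
```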


Now boot the logical domain to which the VF is assigned. You can check the MAC addresses assigned to a VF using the following dladm command.

ldg0# dladm show-phys -m net3
LINK                SLOT     ADDRESS            INUSE CLIENT
net3                primary  0:14:4f:f9:48:69   yes  net3
                    1        0:14:4f:fb:38:e    no   --
                    2        0:14:4f:fa:c8:7d   no   --
                    3        0:14:4f:fb:99:4b   no   --


Now we can create up to 3 VNICs on the net3 device using the dladm command. Creating more than that will fail. If more VNICs are desired, assign more MAC addresses using the 'ldm set-io' command.

ldg0# dladm create-vnic -l net3 vnic0
ldg0# dladm create-vnic -l net3 vnic1
ldg0# dladm create-vnic -l net3 vnic2
ldg0# dladm create-vnic -l net3 vnic3
May 20 22:16:18 vnic: WARNING: cannot detach client: 22
dladm: vnic creation over net3 failed: operation failed

SR-IOV feature in OVM Server for SPARC 2.2

One of the main features of OVM Server for SPARC (a.k.a. LDoms) 2.2 is SR-IOV support. This blog should help you understand the SR-IOV feature in LDoms a little better.

What is SR-IOV?

SR-IOV is an abbreviation for Single Root I/O Virtualization. It is a PCI-SIG standards-based I/O virtualization technology that enables a PCIe function, known as a Physical Function (PF), to create multiple lightweight PCIe functions, known as Virtual Functions (VFs). Once created, VFs appear and operate like regular PCIe functions. The address space of a VF is well contained, so a VF can be assigned to a virtual machine (a logical domain) with the help of the hypervisor. SR-IOV provides a higher granularity of sharing compared to the other forms of direct hardware access available in LDoms technology, namely PCIe bus assignment and Direct I/O. A few important things to understand about PFs and VFs:

  • A VF's configuration space provides access to registers that perform I/O only, that is, only DMA channels and related registers.
  • Common hardware-related configuration changes can only be performed via the PF, so a VF driver needs to contact the PF driver to perform the change on behalf of the VF. The PF driver owns the responsibility of ensuring that a VF does not impact other VFs or the PF in any way.

More details of SR-IOV can be found at the PCI-SIG website: PCI-SIG Single Root I/O Virtualization

What are the benefits of SR-IOV Virtual Functions?

  • Bare metal like performance.
    • None of the CPU overhead and latency issues that are seen in Virtual I/O.
  • Throughput limited only by the number of VFs from the same device that are actively performing I/O.
    • There is no throughput limitation from implementation artifacts of the kind that exist in Virtual I/O.
    • At a given time, if only one VF is performing I/O, it can potentially utilize the entire available bandwidth.
    • When multiple VFs are performing I/O, bandwidth allocation depends on how the SR-IOV card hardware allocates bandwidth to VFs. The devices supported in LDoms 2.2 apply a round-robin type of policy, which distributes the available bandwidth equally to all VFs that are performing I/O.
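As a rough illustration of the round-robin sharing described above, the ideal per-VF share is simply the link bandwidth divided by the number of VFs actively performing I/O. The numbers below are hypothetical, and real hardware may deviate from the perfectly equal split:

```shell
# Ideal per-VF bandwidth (in Mbps) under an equal round-robin split.
# Both arguments are example values, not measured figures.
per_vf_share() {
    link_mbps=$1
    active_vfs=$2
    echo $((link_mbps / active_vfs))
}

per_vf_share 10000 1    # one active VF can use the full 10 Gbps link
per_vf_share 10000 4    # four active VFs ideally get 2500 Mbps each
```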

In summary, a logical domain with an application that requires bare metal like I/O performance is the best candidate for SR-IOV. Before assigning an SR-IOV Virtual Function to a logical domain, it is important to understand the limitations that come with it; see below for more details.


LDoms 2.2 SR-IOV Limitations:

Understand the following limitations and plan ahead for how you will deal with them in your deployment.

  • The migration feature is disabled for a logical domain that has a VF assigned to it.
    • For all practical purposes, a VF looks like a physical device to a logical domain, which brings all the limitations of having a physical device in a logical domain.
  • Hard dependency on the root domain (the domain in which the PF device exists).
    • In LDoms 2.2, the Primary domain is the only root domain that is supported. That is, rebooting the Primary domain will impact the logical domains that have a VF assigned to them; the behavior is unpredictable, but the common expectation is an OS panic.
    • Caution: Prior to rebooting the Primary domain, ensure that all logical domains that have a VF assigned to them are properly shut down. See the LDoms 2.2 admin guide on how to set up a failure-policy to handle unplanned cases.
  • The Primary domain is the only root domain supported. That is, SR-IOV is supported only for SR-IOV cards on a PCIe bus owned by the Primary domain.
    • If a PCIe bus is assigned to another logical domain, typically to create failover configurations, then SR-IOV support for the cards on that bus is disabled; you will not see the Physical Functions from those cards.

What hardware is needed?

The following details may help you plan what hardware is needed to use the LDoms SR-IOV feature.

  • The SR-IOV feature is supported only on platforms based on the T3 and beyond. It is not available on T2 and T2+ platforms.
  • At release time, LDoms 2.2 supports two SR-IOV devices:
    • On-board SR-IOV Ethernet devices. T3 and T4 platforms have Intel Gigabit SR-IOV capable Ethernet devices on the motherboard, so you already have a device available in your system to explore this technology.
    • The Intel 10Gbps SR-IOV card with part numbers (X)1109A-Z, (X)1110A-Z, and (X)4871A-Z. See the LDoms 2.2 release notes for accurate information.
NOTE: Make sure to update the Fcode firmware on these cards to ensure all features work as expected. See the LDoms 2.2 release notes for details on where and how to update the card's firmware.

What software is needed?

The following are the software requirements; see the LDoms 2.2 release notes and admin guide for more details.

  • LDoms 2.2 firmware and LDoms Manager. See the LDoms release notes for the firmware versions for your platform.
  • The SR-IOV feature requires major Solaris framework support and PF drivers in the root domain. At this time, SR-IOV support is available only in Solaris 11 + SRU7 or later, so ensure that the Primary domain runs Solaris 11 + SRU7 or later.
  • Guest domains can run either Solaris 10 or Solaris 11. If Solaris 10 is used, ensure you have Update 9 or Update 10 with the VF driver patches installed; see the LDoms 2.2 release notes for the patch numbers. If Solaris 11 is used, ensure you have SRU7 or later installed.


References: LDoms 2.2 documentation 


How to create and assign SR-IOV VFs to logical domains?

This is an example that shows how to create 4 VFs from an on-board SR-IOV PF device and assign them to 4 logical domains on a T4-1 platform. The following diagram shows the end result of this example.

[Diagram: Example showing SR-IOV VFs assigned to 4 logical domains]


Run the 'ldm list-io' command to see all available PF devices. Note that the name of a PF device includes the slot in which the PF is located. For example, a PF named /SYS/MB/RISER1/PCIE4/IOVNET.PF0 is in the slot labeled PCIE4.

Primary# ldm ls-io
NAME                                      TYPE   DOMAIN   STATUS
----                                      ----   ------   ------
pci_0                                     BUS    primary
niu_0                                     NIU    primary
/SYS/MB/RISER0/PCIE0                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE1                      PCIE   -        EMP
/SYS/MB/RISER2/PCIE2                      PCIE   -        EMP
/SYS/MB/RISER0/PCIE3                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE4                      PCIE   primary  OCC
/SYS/MB/RISER2/PCIE5                      PCIE   primary  OCC
/SYS/MB/SASHBA0                           PCIE   primary  OCC
/SYS/MB/SASHBA1                           PCIE   primary  OCC
/SYS/MB/NET0                              PCIE   primary  OCC
/SYS/MB/NET2                              PCIE   primary  OCC
/SYS/MB/RISER1/PCIE4/IOVNET.PF0           PF     -
/SYS/MB/RISER1/PCIE4/IOVNET.PF1           PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF1     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF1     PF     -
/SYS/MB/NET0/IOVNET.PF0                   PF     -
/SYS/MB/NET0/IOVNET.PF1                   PF     -
/SYS/MB/NET2/IOVNET.PF0                   PF     -
/SYS/MB/NET2/IOVNET.PF1                   PF     -
Primary#
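When the listing is long, the PF entries can be picked out with a simple filter. This sketch runs on a captured sample of the output above (abbreviated here) rather than on a live system, since 'ldm' is only present on a control domain:

```shell
# Save an abbreviated copy of 'ldm ls-io' output as sample data.
cat > /tmp/ls-io.sample <<'EOF'
NAME                                      TYPE   DOMAIN   STATUS
----                                      ----   ------   ------
pci_0                                     BUS    primary
/SYS/MB/NET0                              PCIE   primary  OCC
/SYS/MB/NET0/IOVNET.PF0                   PF     -
/SYS/MB/NET0/IOVNET.PF1                   PF     -
EOF

# Print the name column of every row whose TYPE column is PF.
awk '$2 == "PF" { print $1 }' /tmp/ls-io.sample
```

On a live control domain, you would pipe 'ldm ls-io' directly into the same awk filter instead of using a saved file.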


Let's use the PF named /SYS/MB/NET0/IOVNET.PF0 for this example. The NET0 in the PF name indicates that this is an on-board device. Using the -l option, we can find additional details such as the device path and the maximum number of VFs the device supports. This device supports up to a maximum of 7 VFs.

Primary# ldm ls-io -l /SYS/MB/NET0/IOVNET.PF0
NAME                        TYPE   DOMAIN   STATUS
----                        ----   ------   ------
/SYS/MB/NET0/IOVNET.PF0     PF     -
[pci@400/pci@2/pci@0/pci@6/network@0]
    maxvfs = 7
Primary#

In the root domain, we can find the network device that maps to this PF by searching for the matching path in the /etc/path_to_inst file. This device maps to igb0.

Primary# grep pci@400/pci@2/pci@0/pci@6/network@0 /etc/path_to_inst
"/pci@400/pci@2/pci@0/pci@6/network@0" 0 "igb"
"/pci@400/pci@2/pci@0/pci@6/network@0,1" 1 "igb"
Primary#

In Solaris 11, auto vanity naming generates generic link names; you can find the link name for the device using the following command. You can see that igb0 maps to net0, so we are really using the net0 device in the Primary domain.

Primary# dladm show-phys -L
LINK     DEVICE    LOC
net0     igb0      /SYS/MB
net1     igb1      /SYS/MB
net2     igb2      /SYS/MB
net3     igb3      /SYS/MB
net4     ixgbe0    PCIE4
net5     ixgbe1    PCIE4
net6     igb4      PCIE5
net7     igb5      PCIE5
net8     igb6      PCIE5
net9     igb7      PCIE5
net10    vsw0      --
net11    usbecm2   --
Primary#
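The path-to-driver lookup above can also be scripted. This sketch parses a saved copy of the relevant /etc/path_to_inst lines (sample data, not a live file) to recover the driver name and instance number as a single name like igb0:

```shell
# Sample lines as they appear in /etc/path_to_inst; the fields are
# "device path" instance "driver".
cat > /tmp/path_to_inst.sample <<'EOF'
"/pci@400/pci@2/pci@0/pci@6/network@0" 0 "igb"
"/pci@400/pci@2/pci@0/pci@6/network@0,1" 1 "igb"
EOF

# Match the exact quoted device path, strip the quotes from the
# driver field, and join driver + instance, e.g. igb0.
awk -v path='"/pci@400/pci@2/pci@0/pci@6/network@0"' \
    '$1 == path { gsub(/"/, "", $3); print $3 $2 }' /tmp/path_to_inst.sample
```

On a live root domain, you would point the same awk command at /etc/path_to_inst itself.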


Create 4 VFs on the PF /SYS/MB/NET0/IOVNET.PF0 using the create-vf command. Note that creating VFs in the LDoms 2.2 release requires a reboot of the root domain; we can create multiple VFs and reboot only once. NOTE: Because this operation requires a reboot, plan ahead for how many VFs you would like to create and create them in advance. You might be tempted to create the maximum number of VFs for later use, but this may not be a good idea with devices that support a large number of VFs. For example, the Intel 10Gbps SR-IOV device supported in this release supports up to a maximum of 63 VFs, but T3 and T4 platforms support a maximum of 15 I/O domains per PCIe bus. So if you create more than 15 VFs on the same PCIe bus, plan how you would use them; typically you would have to assign multiple VFs to a single domain, since only 15 I/O domains are supported per PCIe bus.

Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF0
Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF1
Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF2
Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF3
Primary#


Reboot the Primary domain. Caution: If there are any I/O domains that have PCIe slots or VFs assigned to them, shut down those logical domains before rebooting the Primary domain.


Once the Primary domain has rebooted, the VFs are available to assign to other logical domains. Use the list-io command to see the VFs, and then assign them to I/O domains. You can see the VFs at the end of the listing. If the list is long, you can pass the PF name as an argument to limit the listing to VFs from that PF only.

Primary# ldm ls-io
NAME                                      TYPE   DOMAIN   STATUS
----                                      ----   ------   ------
pci_0                                     BUS    primary
niu_0                                     NIU    primary
/SYS/MB/RISER0/PCIE0                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE1                      PCIE   -        EMP
/SYS/MB/RISER2/PCIE2                      PCIE   -        EMP
/SYS/MB/RISER0/PCIE3                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE4                      PCIE   primary  OCC
/SYS/MB/RISER2/PCIE5                      PCIE   primary  OCC
/SYS/MB/SASHBA0                           PCIE   primary  OCC
/SYS/MB/SASHBA1                           PCIE   primary  OCC
/SYS/MB/NET0                              PCIE   primary  OCC
/SYS/MB/NET2                              PCIE   primary  OCC
/SYS/MB/RISER1/PCIE4/IOVNET.PF0           PF     -
/SYS/MB/RISER1/PCIE4/IOVNET.PF1           PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF1     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF1     PF     -
/SYS/MB/NET0/IOVNET.PF0                   PF     -
/SYS/MB/NET0/IOVNET.PF1                   PF     -
/SYS/MB/NET2/IOVNET.PF0                   PF     -
/SYS/MB/NET2/IOVNET.PF1                   PF     -
/SYS/MB/NET0/IOVNET.PF0.VF0               VF
/SYS/MB/NET0/IOVNET.PF0.VF1               VF
/SYS/MB/NET0/IOVNET.PF0.VF2               VF
/SYS/MB/NET0/IOVNET.PF0.VF3               VF
Primary#


Assign each VF to a logical domain using the 'add-io' command.

NOTE: LDoms 2.2 requires the logical domain to which a VF is being assigned to be stopped. So if the logical domains to which the VFs need to be assigned are running, stop them first and then assign the VFs.

Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 ldg0
Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF1 ldg1
Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF2 ldg2
Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF3 ldg3
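The four add-io commands follow an obvious pattern, so a small loop can generate them. This sketch only echoes the commands as a dry run (the domain names ldg0..ldg3 are this example's); on a real control domain, where the 'ldm' command exists, you would drop the 'echo':

```shell
# Generate the 'ldm add-io' commands that pair VF0..VF3 with
# domains ldg0..ldg3. Echoed as a dry run; remove 'echo' to execute.
pf=/SYS/MB/NET0/IOVNET.PF0
i=0
while [ "$i" -lt 4 ]; do
    echo ldm add-io "$pf.VF$i" "ldg$i"
    i=$((i + 1))
done
```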


Start the logical domains to use the VFs in them. You can start each domain individually or start all logical domains with 'ldm start -a'. NOTE: A VF device can also be used to boot over the network at the OBP prompt.

Primary# ldm start ldg0
LDom ldg0 started
Primary# ldm start ldg1
LDom ldg1 started
Primary# ldm start ldg2
LDom ldg2 started
Primary# ldm start ldg3
LDom ldg3 started


Log in to the guest domain and configure the VF device for use. The VF device appears like any other physical NIC device; you can only distinguish it by the device name reported by Solaris commands. The following commands show a VF on logical domain 'ldg0' running Solaris 11 and configure it for use with DHCP.

ldg0# dladm show-phys
LINK     MEDIA       STATE     SPEED   DUPLEX   DEVICE
net0     Ethernet    unknown   0       unknown  igbvf0
ldg0#
ldg0# ipadm create-ip net0
ldg0# ipadm create-addr -T dhcp net0/dhcp
ldg0# ifconfig net0
net0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
        inet  netmask ffffff00 broadcast
        ether 0:14:4f:f9:48:69
ldg0#

After this, the VF device can be used like any other network device by all applications, without the latency or performance overhead seen with Virtual I/O.

Raghuram Kothakota

