Saturday Mar 29, 2014

Fibre Channel SR-IOV

OVM Server for SPARC 3.1.1 introduces support for a new class of SR-IOV devices: Fibre Channel. Fibre Channel SR-IOV is an exciting new feature that brings native Fibre Channel performance to Logical domains. There is no additional overhead from software layers, because the SR-IOV Virtual Functions are implemented in hardware and accessed directly in the Logical domains. This technology greatly increases the utilization of HBA adapters and reduces cost by decreasing the number of adapters needed, along with the associated energy cost and, more importantly, the cost of FC switch ports, which is very high. This is all accomplished without losing performance, which is typically the case with virtualized I/O.

NOTE: The HBA adapter's hardware resources, including the FC port bandwidth, are divided among the Virtual Functions, so the total performance will not exceed what a single adapter can deliver; it is effectively divided among the Virtual Functions. VFs that are not performing I/O do not consume any bandwidth, so that bandwidth is available to the other VFs.

This feature is fully dynamic, like Ethernet SR-IOV. That is, you can create and destroy VFs dynamically without requiring a reboot of the Root domain, and you can dynamically add and remove VFs from Logical domains. A few constraints need to be met to accomplish this; see later in this blog for those details. This feature is also fully compatible with non-primary Root domains. That is, you can assign a PCIe bus to a Logical domain (known as a Root domain), create VFs, and then assign the VFs to IO domains. This provides a way to reduce single points of failure in a large deployment.

The following is an example view of 3 VFs from one port of an FC HBA assigned to three different IO domains. 

FC SR-IOV Example View

Getting Started

1. Install the required software:

Install the FC SR-IOV card in an appropriate PCIe bus and slot in your system. The following document provides the list of adapters that support FC SR-IOV.

Oracle VM Server for SPARC PCIe Direct I/O and SR-IOV Features (Doc ID 1325454.1)

NOTE: At this time only Emulex 16Gbps HBAs (part numbers 7101683, 7101684, 7101689, 7101690) are supported.

Ensure that the required Firmware is installed on your platform. You can find the Firmware support information in the release notes at the following URL:

Install or update the OS in the Root domain to a version that supports FC SR-IOV. You can find the OS versions that support FC SR-IOV in the release notes at the following URL:

Solaris OS Version Requirements

Ensure that the LDoms manager 3.1.1 software is installed in the control domain. Note that the LDoms manager 3.1.1 software is installed automatically if Solaris 11.1 SRU 17 or later is installed in the control domain. You can verify this with the "ldm -V" command.

2. Update the FC HBA adapter Firmware

Update the FC HBA adapter Firmware to the version that supports FC SR-IOV. The firmware for Emulex 16Gb HBAs can be found at:

It is important to power cycle the system (strictly, it is the HBA that needs the power cycle) after updating the Firmware.

3. FC Connection requirements

Ensure that the FC SR-IOV HBA is connected to a compatible FC switch that supports NPIV. It is very important to ensure that this condition is met. This feature is not supported if the FC port is directly connected to the storage.

4. Verify FC Physical Functions

Each FC HBA port shows up as one Physical Function. The FC PFs are named "IOVFC.PFx"; you can identify them in the output of the "ldm list-io" command. The following is an example output:

# ldm ls-io
NAME                                      TYPE   BUS      DOMAIN   STATUS   
----                                      ----   ---      ------   ------   
pci_0                                     BUS    pci_0    primary  IOV      
pci_1                                     BUS    pci_1    primary  IOV      
niu_0                                     NIU    niu_0    primary           
niu_1                                     NIU    niu_1    primary           
/SYS/MB/PCIE0                             PCIE   pci_0    primary  OCC      
/SYS/MB/PCIE2                             PCIE   pci_0    primary  OCC      
/SYS/MB/PCIE4                             PCIE   pci_0    primary  EMP      
/SYS/MB/PCIE6                             PCIE   pci_0    primary  EMP      
/SYS/MB/PCIE8                             PCIE   pci_0    primary  EMP      
/SYS/MB/SASHBA                            PCIE   pci_0    primary  OCC      
/SYS/MB/NET0                              PCIE   pci_0    primary  OCC      
/SYS/MB/PCIE1                             PCIE   pci_1    primary  EMP      
/SYS/MB/PCIE3                             PCIE   pci_1    primary  OCC      
/SYS/MB/PCIE5                             PCIE   pci_1    primary  OCC      
/SYS/MB/PCIE7                             PCIE   pci_1    primary  OCC      
/SYS/MB/PCIE9                             PCIE   pci_1    primary  OCC      
/SYS/MB/NET2                              PCIE   pci_1    primary  OCC      
/SYS/MB/NET0/IOVNET.PF0                   PF     pci_0    primary           
/SYS/MB/NET0/IOVNET.PF1                   PF     pci_0    primary           
/SYS/MB/PCIE5/IOVNET.PF0                  PF     pci_1    primary           
/SYS/MB/PCIE5/IOVNET.PF1                  PF     pci_1    primary           
/SYS/MB/PCIE7/IOVFC.PF0                   PF     pci_1    primary           
/SYS/MB/PCIE7/IOVFC.PF1                   PF     pci_1    primary           
/SYS/MB/NET2/IOVNET.PF0                   PF     pci_1    primary           
/SYS/MB/NET2/IOVNET.PF1                   PF     pci_1    primary           

5. Understand the capabilities of each FC Physical Function

An FC Physical Function has only one detail of interest: the maximum number of VFs it supports. You can use "ldm list-io -l <pf-name>" to find this information. For example:

# ldm list-io -l /SYS/MB/PCIE7/IOVFC.PF0
/SYS/MB/PCIE7/IOVFC.PF0                   PF     pci_1    primary
[pci@500/pci@1/pci@0/pci@6/SUNW,emlxs@0]
    maxvfs = 8

6. Create the Virtual Functions(VFs)

We recommend creating all VFs in one step; this is an optimized way to create the VFs, which you can then use as needed. There is no performance penalty if some VFs are unused. You can use the "ldm create-vf" command to accomplish this. For example:


# ldm create-vf -n max /SYS/MB/PCIE7/IOVFC.PF0
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF0
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF1
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF2
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF3
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF4
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF5
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF6
Created new vf: /SYS/MB/PCIE7/IOVFC.PF0.VF7
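
To confirm that the VFs were created, you can limit the listing to that PF; the new VFs appear at the end of the output. For example:

# ldm list-io /SYS/MB/PCIE7/IOVFC.PF0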


NOTE: If the IOV option is not enabled for the PCIe bus where the FC HBA is installed, the above command will fail. Enabling the IOV option is not a dynamic operation today, so you would have to reboot the Root domain to accomplish it. If you have to reboot the Root domain to set the IOV option, we recommend creating the VFs at the same time so that the VFs are available as soon as the reboot completes. This can be done with the following commands.

# ldm start-reconf <root domain>
# ldm set-io iov=on pci_X
# ldm create-vf -n max <PF-name>
# <reboot the root domain to effect the changes>

7. Understand VF WWN assignment

The LDoms manager automatically assigns a Port-WWN and a Node-WWN to each FC VF. The auto-allocated WWNs are unique only if all of the SPARC systems connected to a given SAN fabric are also connected to the same Ethernet multicast domain. If not, they won't be unique. Also, if you ever destroy and re-create the VFs, they may not get the same WWNs. Because you may use these WWNs for zoning or LUN masking, we recommend using manual WWN allocation. See the admin guide for more details.


You can manually set the WWNs using the following command. You can change the WWNs dynamically as long as that VF is not assigned to any domain.

# ldm set-io port-wwn=<Port WWN> node-wwn=<Node WWN> <vf-name>
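
For example, a minimal sketch with made-up WWN values (choose values that are unique within your SAN and fit your own naming scheme):

# ldm set-io port-wwn=10:00:00:14:4f:fc:00:01 node-wwn=20:00:00:14:4f:fc:00:01 /SYS/MB/PCIE7/IOVFC.PF0.VF0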

8. Configuration of SAN Storage

Configure your SAN storage to assign LUNs to each VF. It is highly recommended to use LUN masking so that the LUNs are visible only to the VF to which they are assigned. This is no different from how LUNs are assigned to different HBAs on different systems. NOTE: One important point to note here is that you can assign LUNs to IO domains such that they are not even visible in the Root domain. This provides the same level of secure access that you get with separate HBAs, which is not possible with virtual I/O methods.

9. Assigning VFs to Logical Domains

You can now assign VFs to Logical domains using the "add-io" command. For example, the following commands assign VFs to three different domains.

# ldm add-io /SYS/MB/PCIE7/IOVFC.PF0.VF0 ldg0
# ldm add-io /SYS/MB/PCIE7/IOVFC.PF0.VF1 ldg1
# ldm add-io /SYS/MB/PCIE7/IOVFC.PF0.VF2 ldg2

Make sure to set up the failure-policy settings to handle any unexpected Root domain reboot or crash. The following failure-policy causes the IO domains (here ldg0, ldg1 and ldg2) to be reset along with the Root domain.

# ldm set-domain failure-policy=reset primary
# ldm set-domain master=primary ldg0
# ldm set-domain master=primary ldg1
# ldm set-domain master=primary ldg2 

NOTE: You can also assign VFs dynamically. That is, if the given Logical domain is running the required OS version, you can simply run the same commands to dynamically add the VFs. The IO domain OS requirements are the same as those mentioned for the Root domains, which you can find at:

10. Using the VFs in IO domains

If you added the VFs statically, you can now start the IO domains and use the VFs like any FC HBA. For example, at the OBP prompt of the IO domain, you can run "probe-scsi-all" to see all LUNs visible to that VF. Installing the OS on, and booting from, a LUN is fully supported. Features like MPxIO are fully supported; for example, you can assign VFs from different Root domains and configure MPxIO.
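
As a quick sanity check of such a multipath setup, you can list the multipathed LUNs from the IO domain; this is only a sketch, assuming MPxIO is enabled in the IO domain (it can be toggled with the stmsboot command) and that the same LUNs are visible through VFs from two different Root domains:

ldg0# mpathadm list lu

Each LUN should report the expected total and operational path counts, one path per VF through which it is visible.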

Caution: Reboot/crash of a Root domain will impact the IO domains. Having VFs from different Root domains doesn't increase the availability of the IO domains.

Documentation

You can find the detailed documentation in the OVM Server for SPARC 3.1.1 Admin Guide

FC SR-IOV Limitations

There are a few limitations that need to be understood. These are:

  • No support when the HBA is connected to the storage directly.
  • No NPIV support on top of Virtual functions. NPIV on the physical function is supported as usual. 






Tuesday Aug 20, 2013

Live migration of a Guest with an SR-IOV VF using dynamic SR-IOV feature!

NOTE: This is only an example of how Dynamic SR-IOV can be exploited to accomplish Live migration of a Guest with an SR-IOV VF.

OVM Server for SPARC 3.1 introduces the Dynamic SR-IOV feature, which provides the capability to dynamically add and remove SR-IOV Virtual Functions to and from Logical domains. This is one use case showing how the Dynamic SR-IOV feature can be combined with rich Solaris I/O features to accomplish Live migration of a Logical domain that has an Ethernet SR-IOV Virtual Function assigned to it. The idea is to create a multipath configuration in the logical domain with a VF and a Virtual Network (vnet) device so that we can dynamically remove the VF before the live migration and then re-assign the same VF on the target system. The vnet device in the Logical domain serves two purposes: 1) it allows the VF to be dynamically removed from the domain, and 2) it provides communication for the applications for the duration of the Live migration, since the VF is removed at the start of the migration.

Solaris IPMP is the best multipath configuration available for a VF and Vnet pair today. We need to configure the Vnet device as a standby device so that the high-performance VF is used for communication whenever it is available. If you want a VF to be assigned again on the target system, the restriction today is to add a VF with the exact same name, that is, a VF from the same PCIe slot and with the same VF number as on the source system. When the same device is seen by the Solaris OS in the Logical domain, it is automatically added to the same IPMP group, so no manual intervention is required. Because the VF was configured as the active device, IPMP automatically redirects the traffic back to the VF. The following diagram shows this configuration visually:

Live migration of a Guest with an SR-IOV VF

The above configuration shows that we can also choose to use the same PF as the backend device for the Virtual Switch while a VF from that PF is assigned to the Guest domain. That is, both the Vnet and the VF use the same PF device. Note that you are free to use another NIC as the backend device for the virtual switch, but this example demonstrates that multiple network ports are not required for this configuration. Another recommended configuration is to use a separate Service domain to host the Virtual Switch; that way an admin can use the same setup to handle a planned Root domain reboot manually, that is, dynamically remove the VF from the Guest domain before rebooting the Root domain and dynamically add the VF back after the Root domain is up.
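
As a sketch of that single-port setup (assuming, as in the assumptions listed below, that the PF maps to the net0 device in the primary domain), the Virtual Switch could be created on that same device:

# ldm add-vsw net-dev=net0 primary-vsw0 primary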

The following are the high-level steps to create such a configuration.

Assumptions: 

  • A Guest domain named "ldg1" has already been created with a configuration that meets the Live migration requirements.
  • The Guest domain OS supports Dynamic SR-IOV; see the OVM Server for SPARC 3.1 Release Notes for the OS versions that support Dynamic SR-IOV.
  • The Physical Function /SYS/MB/NET0/IOVNET.PF0 is used for the VFs.
  • The network device "net0" on the primary domain maps to the PF /SYS/MB/NET0/IOVNET.PF0.
  • The desired number of SR-IOV Virtual Functions have already been created on the PF /SYS/MB/NET0/IOVNET.PF0. See the OVM Server for SPARC 3.1 Admin Guide for how to create the Virtual Functions. This example uses the VF named /SYS/MB/NET0/IOVNET.PF0.VF0.
  • A Virtual Switch (vsw) named "primary-vsw0" has already been created on the primary domain.
Steps to create the config:

  • Create and add a Vnet device to the Guest domain ldg1.
    • # ldm add-vnet vnet0 primary-vsw0 ldg1
  • Add a VF to the Guest domain ldg1
    • # ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 ldg1
  • Boot the Guest domain ldg1
  • Log in to the Guest domain and configure IPMP.
    • Use the "dladm show-phys" command to determine the netX names that the Solaris OS assigned to the Vnet and VF devices.
    • This example assumes net0 maps to the Vnet device and net1 maps to the VF device.
  • Configure IPMP. Note that this creates a simple active/standby IPMP configuration; you can adapt the IPMP configuration to your network needs. The important point is that the Vnet device is configured as the standby device (a quick verification is sketched after these steps).
    • # ipadm create-ip net0
    • # ipadm create-ip net1
    • # ipadm set-ifprop -p standby=on -m ip net0
    • # ipadm create-ipmp -i net0 -i net1 ipmp0
    • # ipadm create-addr -T static -a local=<ipaddr>/<netmask> ipmp0
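
Once the group is up, you can verify that the VF interface is active and the Vnet interface is the standby; for example (using the interface names assumed above):

# ipmpstat -g
# ipmpstat -i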

Live migration steps:

  • On the source system to live migrate:
    • # ldm remove-io  /SYS/MB/NET0/IOVNET.PF0.VF0 ldg1
    • # ldm migrate -p <password-file> ldg1 root@<target-system>
  • On the target system after live migration:
    • # ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 ldg1

The following YouTube video is a demo of live migration with an SR-IOV VF in a guest while a network performance test is running. The graphs show the traffic switching to the Vnet after the VF is removed and then switching back to the VF when the VF is added on the target system.

Oracle Open World 2012 Demo of Live migration with SR-IOV Virtual Function


Direct I/O and SR-IOV features are now extended to Non-Primary root domains!

Until now, the OVM Server for SPARC Direct I/O and SR-IOV features were limited to PCIe buses assigned to the primary domain. This restriction is removed with the release of OVM Server for SPARC 3.1. That is, you can now assign a PCIe bus to a logical domain and then assign PCIe slots or SR-IOV Virtual Functions from that PCIe bus to other domains. This opens up many creative opportunities; for example, it enables configurations such as the one below:

Non-Primary root domain example config

A config like the above, combined with the Dynamic SR-IOV feature in OVM Server for SPARC 3.1, opens up various deployment opportunities. Note that this does not yet increase the availability of the I/O domains, but it will in the near future. It does provide an opportunity to handle situations like rebooting a Root domain manually: to reboot one of the Root domains, an admin can remove the VFs from the I/O domains, reboot that Root domain, and once it is back up, add those VFs back to the I/O domains. The Solaris OS in the I/O domains automatically adds those VFs (the same exact VFs need to be assigned) back to the same multipath groups, so everything returns to normal.
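
For example, a rough sketch of that manual sequence for a single VF, using placeholder names in the style of the commands above:

# ldm remove-io <vf-name> <io-domain>
# <reboot the Root domain that owns the PF>
# ldm add-io <vf-name> <io-domain>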

OVM Server for SPARC 3.1 introduces Dynamic SR-IOV feature

OVM Server for SPARC 3.1 introduces a great enhancement to the PCIe SR-IOV feature. Until now, creating and destroying SR-IOV Virtual Functions (VFs) was a static operation; that is, it required a reboot of the root domain, and adding or removing VFs required the Guest domain to be stopped. Rebooting root domains can be disruptive, as it impacts the I/O domains that depend on them. Stopping a Guest domain to add or remove a VF is also disruptive to the applications running in it. OVM Server for SPARC 3.1 enhances the PCIe SR-IOV feature with Dynamic SR-IOV for Ethernet SR-IOV devices, which removes these restrictions. That is, we can now create and destroy VFs while a root domain is running, and we can also add and remove VFs from a Guest domain while it is running, without impacting applications. Note that OVM Server for SPARC 3.1 also introduces a feature named "Non-Primary Root domains", which extends the PCIe SR-IOV and Dynamic SR-IOV features to all Root domains in addition to the Primary domain. That is, you can perform all Dynamic SR-IOV operations on Non-Primary Root domains as well.

This feature is supported only when the OVM Server for SPARC 3.1 LDoms manager, the corresponding System Firmware, and a supported OS version are all installed. Refer to the OVM Server for SPARC 3.1 Release Notes for the exact supported OS and System Firmware versions. Dynamic IOV is enabled for a given logical domain only if all of the software components are installed and the other configuration requirements are met. If not, you can always use the static method to accomplish your changes.

There is still one operation that we could not make dynamic yet: enabling I/O virtualization for a given PCIe bus. For now, I/O virtualization for a given PCIe bus needs to be enabled ahead of time. If you are planning to create VFs, it is a good idea to create them at the same time, as you will be rebooting anyway. The following steps enable I/O virtualization for a PCIe bus. Note that this needs to be done only once per PCIe bus, while the bus is assigned to a Root domain.

Enable I/O Virtualization for a PCIe bus:

  • If the PCIe bus is already assigned to a domain: 
    • # ldm start-reconf <root-domain-name>
    • # ldm set-io iov=on <PCIe bus name>
    • # reboot  
  • You can also enable it while adding a PCIe bus to a logical domain:
    • # ldm add-io iov=on <PCIe bus name> <domain-name>
  • You can check if the IOV is enabled for a given PCIe bus with "ldm list-io <PCIe bus name>".

Dynamically create or destroy VFs:

  • Once IOV is enabled for a given PCIe bus, you can create or destroy VFs with create-vf and destroy-vf without requiring a delayed reconfiguration and a reboot. However, this requires the Physical Function network device to be either unplumbed or in a multipath configuration. The dynamic create/destroy VF operations perform hotplug offline and online operations on the PF device, which causes the PF to be detached and re-attached. As a result, the PF device needs to be either unplumbed (that is, not in use) or in a multipath (IPMP or aggregation) configuration so that the hotplug offline/online operations succeed. If that is not the case, the dynamic create/destroy operations will fail reporting that the device is busy; in such a case you can use the static method to create/destroy VFs (a quick way to check whether the PF device is in use is sketched after this list).
    • # ldm create-vf <pf-name>
    • # ldm destroy-vf <vf-name>
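
For example, before a dynamic create-vf you can check whether the network device that maps to the PF is plumbed in the Root domain (net0 here is only an assumed name for that device):

# dladm show-phys
# ipadm show-if net0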

Dynamically add or remove VFs:

  • You can now dynamically add and remove VFs to/from a logical domain. All you need to run is the 'add-io' and 'remove-io' commands, without stopping the Guest domain.
    • # ldm add-io <vf-name>  <domain-name>
    • # ldm remove-io <vf-name> <domain-name>

Troubleshooting:

  • The dynamic SR-IOV operations are disabled:
    • Check if the System Firmware that is released with OVM Server for SPARC 3.1 is installed on your system.
    • Check if the OVM Server for SPARC 3.1 LDoms manager is installed.
    • Check if the given domain has the required OS version supported. Check the OVM Server for SPARC 3.1 release notes for this information.
    • Check if IOV is enabled for the given PCIe bus, use "ldm ls-io <PCIe bus name>" to check this.
  • The create-vf or destroy-vf  failed to dynamically perform the operation:
    • Verify that the network device that maps to the PF is either not plumbed or is in a multipath (aggregation or IPMP) configuration.
      • You can obtain the device path for the PF using "ldm ls-io -l <pf-name>" and then map it to the corresponding device in the root domain by grepping for that path in /etc/path_to_inst. Then use the "dladm show-phys" command to map that to the netX device name (a rough sketch of this sequence follows this list).
  • Dynamically removing a VF from a Guest domain fails
    • Ensure that the VF is either not in use or is part of a multipath configuration.
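
The following is a rough sketch of that PF-to-netX mapping sequence, with the PF name and device path left as placeholders:

# ldm ls-io -l <pf-name>
# grep <device-path-from-previous-output> /etc/path_to_inst
# dladm show-phys -L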

Thursday May 24, 2012

Solaris11 VNICs on SR-IOV Virtual Functions

OVM Server for SPARC (a.k.a. LDoms) 2.2 provides support for SR-IOV. That is, an SR-IOV Virtual Function (VF) can be assigned to a Logical Domain. A VF provides bare-metal-like performance. This blog explains how to configure a VF so that VNICs can be created on top of the VF device, which is required to support Solaris 11 Zones in a logical domain.

The following example shows how to set up a VF so that VNICs can be created on it.

Step1: 

When a VF is created, by default only one MAC address (the primary MAC address) is assigned to it. In order to create VNICs, additional MAC addresses need to be assigned. This can be done either when the VF is created or later using the set-io command. This example assumes you have already created a VF. If the VF is assigned to a domain, that domain must be stopped before assigning additional MAC addresses.

The following command allocates 3 alternate MAC addresses using the automatic MAC address allocation method.

 Primary# ldm set-io alt-mac-addrs=auto,auto,auto /SYS/MB/NET0/IOVNET.PF0.VF0

Step2:

Now boot the logical domain to which the above VF is assigned. You can check the MAC addresses assigned to the VF using the following dladm command.

ldg0# dladm show-phys -m net3
LINK                SLOT     ADDRESS            INUSE CLIENT
net3                primary  0:14:4f:f9:48:69   yes  net3
                    1        0:14:4f:fb:38:e    no   --
                    2        0:14:4f:fa:c8:7d   no   --
                    3        0:14:4f:fb:99:4b   no   --

Step3:

Now we can create up to 3 VNICs on the net3 device using the dladm command; creating more than that will fail. If more VNICs are desired, assign more MAC addresses using the 'ldm set-io' command (a rough sketch follows the example below).

ldg0# dladm create-vnic -l net3 vnic0
ldg0# dladm create-vnic -l net3 vnic1
ldg0# dladm create-vnic -l net3 vnic2
ldg0# dladm create-vnic -l net3 vnic3
May 20 22:16:18 dt241-147.us.oracle.com vnic: WARNING: cannot detach client: 22
dladm: vnic creation over net3 failed: operation failed
ldg0#
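
If more alternate MAC addresses are needed, a rough sketch of the procedure (assuming the domain can be stopped, and that the full desired list of alternate addresses is supplied in one set-io invocation):

Primary# ldm stop ldg0
Primary# ldm set-io alt-mac-addrs=auto,auto,auto,auto,auto /SYS/MB/NET0/IOVNET.PF0.VF0
Primary# ldm start ldg0

After the domain boots, the additional addresses show up in 'dladm show-phys -m' and more VNICs can be created.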


SR-IOV feature in OVM Server for SPARC 2.2

One of the main features of OVM Server for SPARC (a.k.a. LDoms) 2.2 is SR-IOV support. This blog is intended to help you understand the SR-IOV feature in LDoms a little better.

What is SR-IOV?

SR-IOV is an abbreviation for Single Root I/O Virtualization. It is a PCI-SIG standards-based I/O virtualization technology that enables a PCIe function, known as a Physical Function (PF), to create multiple lightweight PCIe functions, known as Virtual Functions (VFs). After they are created, VFs show up and operate like regular PCIe functions. The address space for a VF is well contained so that a VF can be assigned to a Virtual Machine (a logical domain) with the help of the Hypervisor. SR-IOV provides a finer granularity of sharing compared to the other forms of direct hardware access available in LDoms technology, namely PCIe bus assignment and Direct I/O. A few important things to understand about PFs and VFs are:

  • A VF's configuration space provides access to the registers needed to perform I/O only; that is, only the DMA channels and related registers.
  • Common hardware-related configuration changes can only be performed via the PF, so a VF driver needs to contact the PF driver to perform the change on behalf of the VF. The PF driver owns the responsibility of ensuring that a VF does not impact other VFs or the PF in any way.

More details on SR-IOV can be found at the PCI-SIG website: PCI-SIG Single Root I/O Virtualization

What are the benefits of SR-IOV Virtual Functions?

  • Bare metal like performance.
    • No CPU overhead and latency issues that are seen in Virtual I/O.
  • Throughput that is limited only by the number of VFs from the same device that are actively performing I/O.
    • There is no throughput limitation due to implementation constraints of the kind that exist in Virtual I/O.
    • At a given time, if only one VF is performing I/O, it can potentially utilize the entire available bandwidth.
    • When multiple VFs are performing I/O, the bandwidth allocation depends on how the SR-IOV card hardware allocates bandwidth to the VFs. The devices supported in LDoms 2.2 apply a round-robin type of policy, which distributes the available bandwidth equally among all VFs that are performing I/O.
 

In summary, a logical domain with an application that requires bare-metal-like I/O performance is the best candidate for SR-IOV. Before assigning an SR-IOV Virtual Function to a logical domain, it is important to understand the limitations that come along with it; see below for more details.

 

LDoms2.2 SR-IOV Limitations:

Understand the following limitations and plan ahead for how you will deal with them in your deployment.

  • The migration feature is disabled for a logical domain that has a VF assigned to it.
    • For all practical purposes, a VF looks like a physical device in a logical domain. This brings all the limitations of having a physical device in a logical domain.
  • Hard dependency on the Root domain (the domain in which the PF device exists).
    • In LDoms 2.2, the Primary domain is the only root domain that is supported. That is, rebooting the Primary domain will impact the logical domains that have a VF assigned to them; the behavior is unpredictable, but the common expectation is an OS panic.
    • Caution: Prior to rebooting the Primary domain, ensure that all logical domains that have a VF assigned to them are properly shut down. See the LDoms 2.2 Admin Guide for how to set up a failure-policy to handle unplanned cases.
  • The Primary domain is the only root domain supported. That is, SR-IOV is supported only for SR-IOV cards that are in a PCIe bus owned by the Primary domain.
    • If a PCIe bus is assigned to another logical domain, typically to create failover configurations, then SR-IOV support for the cards on that bus is disabled. You will not see the Physical Functions from those cards.

What hardware is needed?

The following details may help you plan what hardware is needed to use the LDoms SR-IOV feature.

  • The SR-IOV feature is supported only on T3 and later platforms. It is not available on T2 and T2+ platforms.
  • LDoms 2.2 at release time supports two SR-IOV devices. These are:
    • On-board SR-IOV Ethernet devices. T3 and T4 platforms have Intel Gigabit SR-IOV-capable Ethernet devices on the motherboard, so you already have a device available in your system to explore this technology.
    • The Intel 10Gbps SR-IOV card with part numbers (X)1109A-Z, (X)1110A-Z and (X)4871A-Z. See the LDoms 2.2 Release Notes for accurate information.
NOTE: Make sure to update the Fcode firmware on these cards to ensure all features work as expected. See the LDoms 2.2 Release Notes for details on where and how to update the card's firmware.

What software is needed?

The following are the software requirements; see the LDoms 2.2 Release Notes and Admin Guide for more details.

  • LDoms 2.2 Firmware and LDoms manager. See the LDoms Release Notes for the Firmware versions for your platform.
  • The SR-IOV feature requires major Solaris framework support and PF drivers in the Root domain. At this time, SR-IOV support is available only in Solaris 11 with SRU 7 or later, so ensure that the Primary domain has Solaris 11 SRU 7 or later.
  • Guest domains can run either Solaris 10 or Solaris 11. If Solaris 10 is used, ensure you have Update 9 or Update 10 with the VF driver patches installed; see the LDoms 2.2 Release Notes for the patch numbers. If Solaris 11 is used, ensure you have SRU 7 or later installed (a quick way to check the installed level is sketched below).
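
As a quick check before assigning a VF, you can confirm the OS level in a domain; for example, on Solaris 11 the installed SRU is reported by the 'entire' package, and /etc/release shows the release and update level:

# pkg info entire
# cat /etc/release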

 

References: LDoms 2.2 documentation 

 

How to create and assign SR-IOV VFs to logical domains?

This is an example that shows how to create 4 VFs from an on-board SR-IOV PF device and assign them to 4 logical domains on a T4-1 platform. The following diagram shows the end result of this example.

Example Showing SR-IOV VFs assigned to 4 Logical domains

Step1:

Run the 'ldm list-io' command to see all available PF devices. Note that the name of the PF device includes details about which slot the PF is located in. For example, a PF named /SYS/MB/RISER1/PCIE4/IOVNET.PF0 is present in the slot labeled PCIE4.

Primary# ldm ls-io
NAME                                      TYPE   DOMAIN   STATUS
----                                      ----   ------   ------
pci_0                                     BUS    primary
niu_0                                     NIU    primary
/SYS/MB/RISER0/PCIE0                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE1                      PCIE   -        EMP
/SYS/MB/RISER2/PCIE2                      PCIE   -        EMP
/SYS/MB/RISER0/PCIE3                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE4                      PCIE   primary  OCC
/SYS/MB/RISER2/PCIE5                      PCIE   primary  OCC
/SYS/MB/SASHBA0                           PCIE   primary  OCC
/SYS/MB/SASHBA1                           PCIE   primary  OCC
/SYS/MB/NET0                              PCIE   primary  OCC
/SYS/MB/NET2                              PCIE   primary  OCC
/SYS/MB/RISER1/PCIE4/IOVNET.PF0           PF     -
/SYS/MB/RISER1/PCIE4/IOVNET.PF1           PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF1     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF1     PF     -
/SYS/MB/NET0/IOVNET.PF0                   PF     -
/SYS/MB/NET0/IOVNET.PF1                   PF     -
/SYS/MB/NET2/IOVNET.PF0                   PF     -
/SYS/MB/NET2/IOVNET.PF1                   PF     -
Primary#

Step2:

Let's use the PF named /SYS/MB/NET0/IOVNET.PF0 for this example. The PF name contains NET0, which indicates this is an on-board device. Using the -l option we can find additional details such as the device path and the maximum number of VFs the PF supports. This device supports up to a maximum of 7 VFs.

Primary# ldm ls-io -l /SYS/MB/NET0/IOVNET.PF0
NAME                                      TYPE   DOMAIN   STATUS
----                                      ----   ------   ------
/SYS/MB/NET0/IOVNET.PF0                   PF     -
[pci@400/pci@2/pci@0/pci@6/network@0]
    maxvfs = 7
Primary#

In the root domain, we can find the network device that maps to this PF by searching for the matching path in the /etc/path_to_inst file. This device maps to igb0.

Primary# grep pci@400/pci@2/pci@0/pci@6/network@0 /etc/path_to_inst
"/pci@400/pci@2/pci@0/pci@6/network@0" 0 "igb"
"/pci@400/pci@2/pci@0/pci@6/network@0,1" 1 "igb"
Primary#

In Solaris 11, auto vanity naming generates generic link names; you can find the link name for the device using the following command. You can see that igb0 maps to net0, so we are really using the net0 device in the Primary domain.

Primary# dladm show-phys -L
LINK        DEVICE       LOC
net0        igb0         /SYS/MB
net1        igb1         /SYS/MB
net2        igb2         /SYS/MB
net3        igb3         /SYS/MB
net4        ixgbe0       PCIE4
net5        ixgbe1       PCIE4
net6        igb4         PCIE5
net7        igb5         PCIE5
net8        igb6         PCIE5
net9        igb7         PCIE5
net10       vsw0         --
net11       usbecm2      --
Primary#

Step3:

Create 4 VFs on the PF /SYS/MB/NET0/IOVNET.PF0 using the create-vf command. Note that creating VFs in the LDoms 2.2 release requires a reboot of the root domain; we can create multiple VFs and reboot only once. NOTE: Because this operation requires a reboot, plan ahead for how many VFs you would like to create and create them in advance. You might be tempted to create the maximum number of VFs and use them later, but this may not be a good idea with devices that support a large number of VFs. For example, the Intel 10Gbps SR-IOV device supported in this release supports up to a maximum of 63 VFs, but T3 and T4 platforms can only support a maximum of 15 I/O domains per PCIe bus. So, creating more than 15 VFs on the same PCIe bus needs to be planned around how you would use them; typically you may have to assign multiple VFs to a domain, as only 15 I/O domains per PCIe bus are supported.

Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
Initiating a delayed reconfiguration operation on the primary domain.
All configuration changes for other domains are disabled until the primary
domain reboots, at which time the new configuration for the primary domain
will also take effect.
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF0
Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF1
Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF2
Primary# ldm create-vf /SYS/MB/NET0/IOVNET.PF0
------------------------------------------------------------------------------
Notice: The primary domain is in the process of a delayed reconfiguration.
Any changes made to the primary domain will only take effect after it reboots.
------------------------------------------------------------------------------
Created new VF: /SYS/MB/NET0/IOVNET.PF0.VF3
Primary#

Step4:

Reboot the Primary domain. Caution: If there are any I/O domains that have PCIe slots or VFs assigned to them, shut down those logical domains before rebooting the Primary domain.

Step5:

Once the Primary domain has rebooted, the VFs are available to assign to other logical domains. Use the list-io command to see the VFs and then assign them to I/O domains. You can see the VFs at the end of the output. If the list is long, you can use the PF name as the argument to limit the listing to VFs from that PF only.

Primary# ldm ls-io
NAME                                      TYPE   DOMAIN   STATUS
----                                      ----   ------   ------
pci_0                                     BUS    primary
niu_0                                     NIU    primary
/SYS/MB/RISER0/PCIE0                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE1                      PCIE   -        EMP
/SYS/MB/RISER2/PCIE2                      PCIE   -        EMP
/SYS/MB/RISER0/PCIE3                      PCIE   -        EMP
/SYS/MB/RISER1/PCIE4                      PCIE   primary  OCC
/SYS/MB/RISER2/PCIE5                      PCIE   primary  OCC
/SYS/MB/SASHBA0                           PCIE   primary  OCC
/SYS/MB/SASHBA1                           PCIE   primary  OCC
/SYS/MB/NET0                              PCIE   primary  OCC
/SYS/MB/NET2                              PCIE   primary  OCC
/SYS/MB/RISER1/PCIE4/IOVNET.PF0           PF     -
/SYS/MB/RISER1/PCIE4/IOVNET.PF1           PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P2/IOVNET.PF1     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF0     PF     -
/SYS/MB/RISER2/PCIE5/P0/P4/IOVNET.PF1     PF     -
/SYS/MB/NET0/IOVNET.PF0                   PF     -
/SYS/MB/NET0/IOVNET.PF1                   PF     -
/SYS/MB/NET2/IOVNET.PF0                   PF     -
/SYS/MB/NET2/IOVNET.PF1                   PF     -
/SYS/MB/NET0/IOVNET.PF0.VF0               VF
/SYS/MB/NET0/IOVNET.PF0.VF1               VF
/SYS/MB/NET0/IOVNET.PF0.VF2               VF
/SYS/MB/NET0/IOVNET.PF0.VF3               VF
Primary#

Step6:

Assign each VF to a logical domain using the 'add-io' command.

NOTE: LDoms2.2 requires the logical domain to which the VF is being assigned to be stopped. So, if the logical domains to which the VFs need to be assigned are running, then stop them and then assign VFs.

Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF0 ldg0
Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF1 ldg1
Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF2 ldg2
Primary# ldm add-io /SYS/MB/NET0/IOVNET.PF0.VF3 ldg3

Step7:

Start the logical domains to use the VFs in them. You can start each domain individually or start all logical domains with 'ldm start -a'. NOTE: A VF device can be used to boot over network at the OBP prompt too.

Primary# ldm start ldg0
LDom ldg0 started
Primary# ldm start ldg1
LDom ldg1 started
Primary# ldm start ldg2
LDom ldg2 started
Primary# ldm start ldg3
LDom ldg3 started

Step8:

Log in to the guest domain and configure the VF device for use. The VF device appears like any other physical NIC device; you can only distinguish it by the device name using Solaris commands. The following commands show a VF on the logical domain 'ldg0' running Solaris 11 and configure it for use with DHCP.

ldg0# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             unknown    0      unknown   igbvf0
ldg0#
ldg0# ipadm create-ip net0
ldg0# ipadm create-addr -T dhcp net0/dhcp
ldg0# ifconfig net0
net0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
        inet 10.129.241.141 netmask ffffff00 broadcast 10.129.241.255
        ether 0:14:4f:f9:48:69
ldg0#

After this, the VF device can be used like any other network device for all applications, without the latency or performance issues seen in Virtual I/O.