Tuesday Jul 07, 2015

PV IPoIB in Kernel Zones in Solaris 11.3

Paravirtualization of IP over InfiniBand (IPoIB) in kernel zones is a new feature in Solaris 11.3 that enhances the network virtualization offering in Solaris. It allows existing IP applications in the guest to run over InfiniBand
fabrics. Features such as kernel zone live migration and IPMP are supported
with paravirtualized IPoIB datalinks, making them an appealing option.

Moreover, the device management of these guest datalinks is similar to their
Ethernet counterparts, making them straightforward to configure and manage. In
the host, zonecfg is used to configure the kernel zone's automatic network
interface (anet): you select the datalink of the IB HCA port to paravirtualize
and assign it as the lower-link, set the Partition Key (P_Key) within the IB
fabric, and optionally choose the link mode, which can be either IPoIB-CM or IPoIB-UD.

The PV IPoIB datalink is a front-end guest driver backed by an IPoIB VNIC
in the host, created over a physical IB partition datalink per P_Key and port.

Creating a PV IPoIB datalink in a kernel zone is fairly simple. Here is an
example.

1. Find the IB datalink in the host to paravirtualize. 

I am selecting net7 for this example.

# ibadm
HCA             TYPE      STATE     IOV    ZONE
hermon0         physical  online    off    global

# dladm show-ib
LINK      HCAGUID        PORTGUID       PORT STATE   GWNAME       GWPORT   PKEYS
net5      21280001A0D220 21280001A0D222 2    up      --           --       8001,FFFF
net7      21280001A0D220 21280001A0D221 1    up      --           --       8001,FFFF
                                                  
# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         1000   full      igb0
net2              Ethernet             unknown    0      unknown   igb2
net3              Ethernet             unknown    0      unknown   igb3
net1              Ethernet             unknown    0      unknown   igb1
net4              Ethernet             up         10     full      usbecm0
net5              Infiniband           up         32000  full      ibp1
net7              Infiniband           up         32000  full      ibp0

2. Create an IPoIB PV datalink in the kernel zone.
To add an IPoIB PV interface to a kernel zone, say tzone1, use zonecfg to add
an anet and specify the lower-link and pkey, which are mandatory properties.
If linkmode is not specified, IPoIB-CM is the default link mode.

# zonecfg -z tzone1
    zonecfg:tzone1> add anet
    zonecfg:tzone1:anet> set lower-link=net7
    zonecfg:tzone1:anet> set pkey=0xffff
    zonecfg:tzone1:anet> info
    anet 1:
        lower-link: net7
        ...
        pkey: 0xffff
        linkmode not specified
        evs not specified
        vport not specified
        iov: off
        lro: auto
        id: 1
    ...
    zonecfg:tzone1> exit
#

3. Add additional IPoIB PV datalinks to the kernel zone.
Additional IPoIB PV interfaces, each with a lower-link and pkey, can be added
to the kernel zone as shown above; a sketch follows below. These datalinks can
be used exclusively to host native zones within the kernel zone.
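
For illustration, here is a minimal sketch that adds a second anet over the other P_Key (0x8001) that dladm show-ib reported for net7. The linkmode value ud (for IPoIB-UD) is an assumption for this example; check zonecfg(1m) for the exact values on your release.

# zonecfg -z tzone1
    zonecfg:tzone1> add anet
    zonecfg:tzone1:anet> set lower-link=net7
    zonecfg:tzone1:anet> set pkey=0x8001
    zonecfg:tzone1:anet> set linkmode=ud
    zonecfg:tzone1:anet> end
    zonecfg:tzone1> commit
    zonecfg:tzone1> exit
#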

4. The PV IPoIB datalinks appear within the kernel zone on boot.

root@tzone1:~# dladm 
LINK                CLASS     MTU    STATE    OVER
net1                phys      65520  up       --
net0                phys      65520  up       --

root@tzone1:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        static     ok           --         1.1.1.190/24
net1              ip         ok           --         --
   net1/v4        static     ok           --         2.2.2.190/24

Virtual NICs (VNICs) tzone1/net0 and tzone1/net1 are created in the
host kernel; these are the back end of the PV interfaces.

# dladm show-vnic
LINK            OVER           SPEED  MACADDRESS        MACADDRTYPE IDS
tzone1/net1     net7           32000  80:0:0:4d:fe:..   fixed       PKEY:0xffff
tzone1/net0     net7           32000  80:0:0:4e:fe:..   fixed       PKEY:0xffff

Wednesday Dec 10, 2014

Which Oracle Solaris Virtualization?

From time to time, as the product manager for Oracle Solaris Virtualization, I get asked by customers which virtualization technology they should choose. This is probably for two main reasons.

  1. Choice: Oracle Solaris provides a choice of virtualization technologies, so you can tailor your virtual infrastructure to best fit your application rather than forcing (and hence compromising) your application to fit a single option 
  2. No way back: There is a perception that once you make your choice, if you get it wrong there is no way back (or a very difficult way back), so it is really important to make the right choice

Understandably there is occasionally a lot of angst around this decision but, as always with Oracle Solaris, there is good news. First, the choice isn't as complex as it first seems, and the diagram below can help you get a feel for it. We now have many, many customers discovering that the combination of Oracle Solaris Zones inside OVM Server for SPARC instances (Logical Domains) gives them the best of both worlds.

Second, with Unified Archives in Oracle Solaris 11.2 you always have a way back. With a Unified Archive you can move from a Native Zone to a Kernel Zone to a Logical Domain to Bare Metal, and any and all combinations in between. You can test which type of virtualization is best for your applications and infrastructure and, if you don't like it, change to another type in a few minutes.
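
For illustration, here is a minimal sketch of one such move, converting a native zone into a kernel zone with a Unified Archive. The zone names and archive path are hypothetical, and the exact options may vary by release; see archiveadm(1m) and the solaris-kz brand documentation.

Create a clone archive of the native zone on the source system:

# archiveadm create -z myzone /export/archives/myzone.uar

Configure a kernel zone from the standard template and install it from the archive:

# zonecfg -z mykz create -t SYSsolaris-kz
# zoneadm -z mykz install -a /export/archives/myzone.uar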

BTW, if you want a more in-depth discussion of virtualization and how best to utilize it for consolidation, check out the Consolidation Using Oracle's SPARC Virtualization Technologies white paper.

Thursday Aug 14, 2014

VXLAN in Solaris 11.2

What is a VXLAN?

VXLAN, or Virtual eXtensible LAN, is essentially a tunneling mechanism used to provide isolated virtual Layer 2 (L2) segments that can span multiple physical L2 segments. Since it is a tunneling mechanism, it uses IP (IPv4 or IPv6) as its underlying network, which means we can have isolated virtual L2 segments over networks connected by IP. This allows Virtual Machines (VMs) to be in the same L2 segment even if they are located on systems in different physical networks. Some of the benefits of VXLAN include:

  • Better use of resources, i.e. VMs can be provisioned, based on system load, on systems that span different geographies.
  • VMs can be moved across systems without having to reconfigure the underlying physical network.
  • Fewer MAC address collision issues, i.e. MAC addresses may collide as long as they are in different VXLAN segments.

Isolated L2 segments can be supported by existing mechanisms such as VLANs, but VLANs don't scale: the number of VLANs is limited to 4094 (of the 4096 possible IDs, 0 and 4095 are reserved), while VXLAN's 24-bit segment ID can provide up to 16 million isolated L2 networks.

Additional details, including how the protocol works, can be found in the VXLAN draft IETF RFC. Note that Solaris uses the IANA-specified UDP port number of 4789 for VXLAN.

The following is a quick primer on administering VXLAN in Solaris 11.2 using the Solaris administrative utility dladm(1m). Solaris Elastic Virtual Switch (EVS) can be used to manage VXLAN deployment automatically in a cloud environment; that will be the subject of a future discussion.

The following illustrates how VXLANs are created on Solaris:

[Diagram: guests VM1, VM2 and VM3 with interfaces on VXLAN links vxlan1 and vxlan2 (VXLAN segments y and z), on systems with IP endpoints IPx]

where IPx is an IP address (IPv4 or IPv6) and VNIs y and z are different VXLAN segments. VM1, VM2 and VM3 are guests with interfaces configured on VXLAN segments y and z. vxlan1 and vxlan2 are VXLAN links, represented by a new class called VXLAN.

Creating VXLANs

To begin with we need to create VXLAN links in the segments that we want to use for guests - let's assume we want to create segments 100 and 101. Additionally, we want to create the VXLAN links on IP (remember, VXLANs are an overlay over IP networks), so we need the IP address over which to create the VXLAN links - let's assume our endpoint on this system is 10.10.10.1 (in the following example this IP address resides on net4).

# ipadm show-addr net4
ADDROBJ           TYPE     STATE        ADDR
net4/v4           static   ok           10.10.10.1/24

Create VXLAN segments 100 and 101 on this IP address.

# dladm create-vxlan -p addr=10.10.10.1,vni=100 vxlan1 
# dladm create-vxlan -p addr=10.10.10.1,vni=101 vxlan2    

Notes:

  • In the above example we explicitly provide the IP address; however, you could also:
    • provide a prefix and prefixlen to use an IP address that matches it, e.g.:
# dladm create-vxlan -p addr=10.10.10.0/24,vni=100 vxlan1
    • provide an interface (say net4 in our case) to pick an active address on that interface, e.g.:
# dladm create-vxlan -p interface=net4,vni=100 vxlan1
(note that you can't provide interface and addr together)

  • VXLAN links can be created on an IP address over any interface, including IPoIB links, but not over IPMP, loopback, or VNI (Virtual Network Interface) interfaces.
  • The IP address may belong to a VLAN segment.

Displaying VXLANs

Check if we have our VXLAN links:

# dladm show-vxlan                                          
LINK                ADDR                     VNI   MGROUP
vxlan1              10.10.10.1               100   224.0.0.1
vxlan2              10.10.10.1               101   224.0.0.1

One thing we haven't talked about so far is the MGROUP. Recall from the RFC that VXLAN uses IP multicast for broadcast, so we can assign a multicast address to each VXLAN segment that we create. If we don't specify a multicast address, the all-hosts multicast address (or all-nodes for IPv6) is assigned to the VXLAN segment. In the above case, since we didn't specify a multicast address, both vxlan1 and vxlan2 will use the all-hosts multicast address.
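
For illustration, a minimal sketch of supplying an explicit multicast group at creation time; the link name vxlan3 and the group 239.1.1.1 are hypothetical, and the mgroup property name is an assumption based on the dladm show-vxlan output above (check dladm(1m) on your release):

# dladm create-vxlan -p addr=10.10.10.1,vni=102,mgroup=239.1.1.1 vxlan3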

The VXLAN links created, vxlan1 and vxlan2, are just like other datalinks (physical, VNIC, VLAN, etc.) and can be displayed using:

# dladm show-link
LINK                CLASS     MTU    STATE    OVER
...
vxlan1              vxlan     1440   up       --
vxlan2              vxlan     1440   up       --

The STATE reflects the state of the VXLAN links, which is based on the status of the underlying IP address (10.10.10.1 in this case). Note that the MTU is reduced to account for the VXLAN encapsulation added to each packet on the link.

Now that we have our VXLAN links, we can create virtual NICs (VNICs) over them. Note that the VXLAN links themselves are not active links, i.e. you can't plumb an IP address or create flows on them, but they can be snooped.

# dladm create-vnic -l vxlan1 vnic1
# dladm create-vnic -l vxlan1 vnic2
# dladm create-vnic -l vxlan2 vnic3
# dladm create-vnic -l vxlan2 vnic4

# dladm show-vnic
LINK                OVER              SPEED  MACADDRESS        MACADDRTYPE VIDS
vnic1               vxlan1            10000  2:8:20:d9:df:5f   random      0
vnic2               vxlan1            10000  2:8:20:72:9a:70   random      0
vnic3               vxlan2            10000  2:8:20:19:c7:14   random      0
vnic4               vxlan2            10000  2:8:20:88:98:6d   random      0

You can see from the above that the process of creating a VNIC on a VXLAN link is no different from creating one on any other link, such as a physical link, aggregation, or etherstub. This means the VNICs created may belong to a VLAN, and properties (such as maxbw and priority) can be set on them; a short sketch follows.
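
For illustration, a minimal sketch combining both: a hypothetical vnic5 on VLAN ID 200 with a 100 Mbps bandwidth cap (maxbw defaults to Mbps):

# dladm create-vnic -l vxlan1 -v 200 -p maxbw=100 vnic5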

Once created, these VNICs can be assigned explicitly to Solaris zones. Alternatively, a VXLAN link can be set as the lower-link when configuring anet (automatic VNIC) links in Solaris zones, as sketched below.
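
For illustration, a minimal zonecfg sketch with a hypothetical zone named myzone:

# zonecfg -z myzone
zonecfg:myzone> add anet
zonecfg:myzone:anet> set lower-link=vxlan1
zonecfg:myzone:anet> end
zonecfg:myzone> exit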

For Logical Domains on SPARC, the virtual switch (add-vsw) can be created on the VXLAN device, which means the vnets created on that virtual switch will be part of the VXLAN segment.
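
For illustration, a minimal sketch using the ldm command; the switch name vxlan-vsw, the vnet name vnet1, and the domain names primary and ldom1 are hypothetical:

# ldm add-vsw net-dev=vxlan1 vxlan-vsw primary
# ldm add-vnet vnet1 vxlan-vsw ldom1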

Deleting VXLANs

A VXLAN link can be deleted once all the VNICs created over it have been deleted. Thus in our case:

# dladm delete-vnic vnic1   
# dladm delete-vnic vnic2 
# dladm delete-vnic vnic3     
# dladm delete-vnic vnic4  

# dladm delete-vxlan vxlan1
# dladm delete-vxlan vxlan2  

Additional Notes:
  • VXLAN is not supported with direct I/O for Solaris Kernel Zone and LDom guests.
  • Hardware capabilities such as checksum offload and LSO are not available for the encapsulated (inner) packet.
  • Some earlier implementations (e.g. Linux) might use a pre-IANA assigned port number. If so, such implementations might have to be configured to use the IANA port number (4789) to interoperate with Solaris VXLAN.
  • IP multicast must be available in the underlying network, and if communicating across different IP subnets, multicast routing must be available as well.
  • Modifying properties (IP address, multicast address or VNI) on a VXLAN link is currently not supported; you'd have to delete the VXLAN and re-create it.

Saturday Apr 19, 2014

The Technical Details: April 29 Oracle Solaris 11.2 Launch

You may have already heard that we're going to hold the Oracle Solaris 11.2 launch in New York City in a few days, and that there will also be a live webcast of the event.

One of the things the webcast will feature that won't be part of the live event is a set of additional technical presentations in which Solaris engineers will go into more detail about some of the new features being added. VP for Solaris core engineering Markus Flierl gives a quick rundown:


If this sounds interesting to you, you should register now. The event starts at 1 PM ET / 10 AM PT, with Mark Hurd and John Fowler. Markus then moves on to the more technical part of the in-person event, which will then be followed by the web-only deep-dive presentations.

During the live event, we'll have engineering folks and others on Twitter, tracking hashtag #solaris (apologies in advance to Stanislaw Lem fans).

Webcast: Announcing Oracle Solaris 11.2
Tuesday April 29, 2014
1 PM (ET) / 10:00am (PT)
REGISTER FOR THE WEBCAST

Friday Jan 10, 2014

Next OTN Virtual Sysadmin Day: January 28th, 2014

Glynn Foster notes that another OTN Virtual Sysadmin Day is coming up in just a couple of weeks, and talks about what's in store for the Oracle Solaris 11 track.

If you're not familiar with these, they're half-day, online, proctored hands-on labs, so you can learn more about various system administration technologies. They're also free--but you do need to register, and there's also some prep work to be done ahead of the event, so take a look at Glynn's blog post, and sign up today.

Thursday Feb 28, 2013

Building a "developer cloud" with Oracle Solaris 11

Orgad Kimchi describes how the new cloud technologies built into Oracle Solaris 11 can be used to build a virtualized "developer cloud."

By the way, this is the story of the "Oracle Solaris Remote Lab," an Oracle PartnerNetwork resource for developers which you'll be hearing more about.


After you read this, check out Orgad's earlier posts on other aspects of Solaris and virtualization.

Tuesday Sep 02, 2008

I Love Intel's Commitment to OpenSolaris

One of the things that excites me about using OpenSolaris is the amount of work that Intel is doing to improve the product for their processors. David Stewart, an engineering manager at Intel's Open Source Technology Center, has put together a series of five-minute videos highlighting their work. All are entertaining, informative and worth a watch...
