Monday Oct 01, 2012

What's up with LDoms: Part 4 - Virtual Networking Explained

I'm back from my summer break (and some pressing business that kept me away from this), ready to continue with Oracle VM Server for SPARC ;-)

In this article, we'll have a closer look at virtual networking.  Basic connectivity as we've seen it in the first, simple example, is easy enough.  But there are numerous options for the virtual switches and virtual network ports, which we will discuss in more detail now.   In this section, we will concentrate on virtual networking - the capabilities of virtual switches and virtual network ports - only.  Other options involving hardware assignment or redundancy will be covered in separate sections later on.

There are two basic components involved in virtual networking for LDoms: Virtual switches and virtual network devices.  The virtual switch should be seen just like a real ethernet switch.  It "runs" in the service domain and moves ethernet packets back and forth.  A virtual network device is plumbed in the guest domain.  It corresponds to a physical network device in the real world.  There, you'd be plugging a cable into the network port, and plug the other end of that cable into a switch.  In the virtual world, you do the same:  You create a virtual network device for your guest and connect it to a virtual switch in a service domain.  The result works just like in the physical world, the network device sends and receives ethernet packets, and the switch does all those things ethernet switches tend to do.

If you look at the reference manual of Oracle VM Server for SPARC, there are numerous options for virtual switches and network devices.  Don't be confused, it's rather straightforward, really.  Let's start with the simple case, and work our way to some more sophisticated options later on.

In many cases, you'll want to have several guests that communicate with the outside world on the same ethernet segment.  In the real world, you'd connect each of these systems to the same ethernet switch.  So, let's do the same thing in the virtual world:

root@sun # ldm add-vsw net-dev=nxge2 admin-vsw primary
root@sun # ldm add-vnet admin-net admin-vsw mars
root@sun # ldm add-vnet admin-net admin-vsw venus

We've just created a virtual switch called "admin-vsw" and connected it to the physical device nxge2.  In the physical world, we'd have powered up our ethernet switch and installed a cable between it and our big enterprise datacenter switch.  We then created a virtual network interface for each one of the two guest systems "mars" and "venus" and connected both to that virtual switch.  They can now communicate with each other and with any system reachable via nxge2.  If primary were running Solaris 10, the primary domain itself could not communicate with the guests over this vswitch without additional configuration.  This is different with Solaris 11, please see the Admin Guide for details.  Note that I've given both the vswitch and the vnet devices some sensible names, something I always recommend.

Unless told otherwise, the LDoms Manager software will automatically assign MAC addresses to all network elements that need one.  It will also make sure that these MAC addresses are unique and will reuse MAC addresses where possible, to play nice with all those friendly DHCP servers out there.  However, we can also assign MAC addresses manually.  (One reason might be firewall rules that work on MAC addresses.)  So let's give mars a manually assigned MAC address:

root@sun # ldm set-vnet mac-addr=0:14:4f:f9:c4:13 admin-net mars

Within the guest, these virtual network devices have their own device driver.  In Solaris 10, they'd appear as "vnet0".  Solaris 11 would apply its usual vanity naming scheme.  We can configure these interfaces just like any normal interface, give them an IP address and configure sophisticated routing rules, just like on bare metal.
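
Just to illustrate, here's roughly what that could look like from within the guest.  The interface names and the IP address are only examples - on Solaris 10 the device would typically show up as vnet0:

root@mars # ifconfig vnet0 plumb
root@mars # ifconfig vnet0 192.168.1.10 netmask 255.255.255.0 up

On Solaris 11, with its vanity name net0, you'd use ipadm instead:

root@mars # ipadm create-ip net0
root@mars # ipadm create-addr -T static -a 192.168.1.10/24 net0/v4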

In many cases, using Jumbo Frames helps increase throughput performance.  By default, these interfaces will run with the standard ethernet MTU of 1500 bytes.  To change this,  it is usually sufficient to set the desired MTU for the virtual switch.  This will automatically set the same MTU for all vnet devices attached to that switch.  Let's change the MTU size of our admin-vsw from the example above:

root@sun # ldm set-vsw mtu=9000 admin-vsw primary

Note that you can set the MTU to any value between 1500 and 16000.  Of course, whatever you set needs to be supported by the physical network, too.
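
To double-check that the new MTU really arrived in the guest, you can look at the link properties there.  On a Solaris 11 guest (again assuming the vanity name net0), that would be something like:

root@mars # dladm show-linkprop -p mtu net0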

Another very common area of network configuration is VLAN tagging. This can be a little confusing - my advice here is to be very clear on what you want, and perhaps draw a little diagram the first few times.  As always, keeping a configuration simple will help avoid errors of all kinds.  Nevertheless, VLAN tagging is very useful to consolidate different networks onto one physical cable.  And as such, this concept needs to be carried over into the virtual world.  Enough of the introduction, here's a little diagram to help explain how VLANs work in LDoms:
[Diagram: VLANs in LDoms]
Let's remember that any VLANs not explicitly tagged have the default VLAN ID of 1. In this example, we have a vswitch connected to a physical network that carries untagged traffic (VLAN ID 1) as well as VLANs 11, 22, 33 and 44.  There might also be other VLANs on the wire, but the vswitch will ignore all those packets.  We also have two vnet devices, one for mars and one for venus.  Venus will see traffic from VLANs 33 and 44 only.  For VLAN 44, venus will need to configure a tagged interface "vnet44000".  For VLAN 33, the vswitch will untag all incoming traffic for venus, so that venus will see this as "normal" or untagged ethernet traffic.  This is very useful to simplify guest configuration and also allows venus to perform Jumpstart or AI installations over this network even if the Jumpstart or AI server is connected via VLAN 33.  Mars, on the other hand, has full access to untagged traffic from the outside world, and also to VLANs 11, 22 and 33, but not 44.  On the command line, this looks like this:

root@sun # ldm add-vsw net-dev=nxge2 pvid=1 vid=11,22,33,44 admin-vsw primary
root@sun # ldm add-vnet admin-net pvid=1 vid=11,22,33 admin-vsw mars
root@sun # ldm add-vnet admin-net pvid=33 vid=44 admin-vsw venus
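
To complete the picture, here's a sketch of what venus would do with VLAN 44 internally - the IP address is, again, just an example.  On Solaris 10, the tagged interface follows the VLAN naming convention (VLAN ID times 1000 plus the instance number, hence vnet44000):

root@venus # ifconfig vnet44000 plumb
root@venus # ifconfig vnet44000 192.168.44.10 netmask 255.255.255.0 up

On Solaris 11, assuming the vnet shows up as net0, you'd create a VLAN link on top of it instead:

root@venus # dladm create-vlan -l net0 -v 44 vlan44
root@venus # ipadm create-ip vlan44
root@venus # ipadm create-addr -T static -a 192.168.44.10/24 vlan44/v4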

Finally, I'd like to point to a neat little option that will make your life easier in all those cases where configurations tend to change over the life of a guest system.  It's the "id=<somenumber>" option available for both vswitches and vnet devices.  Normally, Solaris in the guest would enumerate network devices sequentially.  However, it has ways of remembering this initial numbering.  This is good in the physical world.  In the virtual world, whenever you unbind (aka power off and disassemble) a guest system, remove and/or add network devices and bind the system again, chances are this numbering will change.  Configuration confusion will follow suit.  To avoid this, nail down the initial numbering by assigning each vnet device its device-id explicitly:

root@sun # ldm add-vnet admin-net id=1 admin-vsw venus

Please consult the Admin Guide for details on this, and how to decipher these network ids from Solaris running in the guest.

Thanks for reading this far.  Links for further reading are essentially only the Admin Guide and Reference Manual and can be found above.  I hope this is useful and, as always, I welcome any comments.


Friday Jun 29, 2012

What's up with LDoms: Part 2 - Creating a first, simple guest

Welcome back!

In the first part, we discussed the basic concepts of LDoms and how to configure a simple control domain.  We saw how resources were put aside for guest systems and what infrastructure we need for them.  With that, we are now ready to create a first, very simple guest domain.  In this first example, we'll keep things very simple.  Later on, we'll have a detailed look at things like sizing, IO redundancy, other types of IO as well as security.

For now, let's start with this very simple guest.  It'll have one core's worth of CPU, one crypto unit, 8GB of RAM, a single boot disk and one network port.  (If this were a T4 system, we'd not have to assign the crypto units.  Since this is T3, it makes lots of sense to do so.)  CPU and RAM are easy.  The network port we'll create by attaching a virtual network port to the vswitch we created in the primary domain.  This is very much like plugging a cable into a computer system on one end and a network switch on the other.  For the boot disk, we'll need two things: A physical piece of storage to hold the data - this is called the backend device in LDoms speak.  And then a mapping between that storage and the guest domain, giving it access to that virtual disk.  For this example, we'll use a ZFS volume for the backend.  We'll discuss what other options there are for this and how to choose the right one in a later article.  Here we go:

root@sun # ldm create mars

root@sun # ldm set-vcpu 8 mars 
root@sun # ldm set-mau 1 mars 
root@sun # ldm set-memory 8g mars

root@sun # zfs create rpool/guests
root@sun # zfs create -V 32g rpool/guests/mars.bootdisk
root@sun # ldm add-vdsdev /dev/zvol/dsk/rpool/guests/mars.bootdisk \
           mars.root@primary-vds
root@sun # ldm add-vdisk root mars.root@primary-vds mars

root@sun # ldm add-vnet net0 switch-primary mars

That's all, mars is now ready to power on.  There are just three commands between us and the OK prompt of mars:  We have to "bind" the domain, start it and connect to its console.  Binding is the process where the hypervisor actually puts all the pieces that we've configured together.  If we made a mistake, binding is where we'll be told (starting in version 2.1, a lot of sanity checking has been put into the config commands themselves, but binding will catch everything else).  Once bound, we can start (and of course later stop) the domain, which will trigger the boot process of OBP.  By default, the domain will then try to boot right away.  If we don't want that, we can set "auto-boot?" to false.  Finally, we'll use telnet to connect to the console of our newly created guest.  The output of "ldm list" shows us what port has been assigned to mars.  By default, the console service only listens on the loopback interface, so using telnet is not a large security concern here.

root@sun # ldm set-variable auto-boot\?=false mars
root@sun # ldm bind mars
root@sun # ldm start mars 

root@sun # ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    8     7680M    0.5%  1d 4h 30m
mars             active     -t----  5000    8     8G        12%  1s

root@sun # telnet localhost 5000

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

~Connecting to console "mars" in group "mars" ....
Press ~? for control options ..

{0} ok banner

SPARC T3-4, No Keyboard
Copyright (c) 1998, 2011, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.33.1, 8192 MB memory available, Serial # 87203131.
Ethernet address 0:21:28:24:1b:50, Host ID: 85241b50.

{0} ok 

We're done, mars is ready to install Solaris, preferably using AI, of course ;-)  But before we do that, let's have a little look at the OBP environment to see how our virtual devices show up here:

{0} ok printenv auto-boot?
auto-boot? =            false

{0} ok printenv boot-device
boot-device =           disk net

{0} ok devalias
root                     /virtual-devices@100/channel-devices@200/disk@0
net0                     /virtual-devices@100/channel-devices@200/network@0
net                      /virtual-devices@100/channel-devices@200/network@0
disk                     /virtual-devices@100/channel-devices@200/disk@0
virtual-console          /virtual-devices/console@1
name                     aliases

We can see that setting the OBP variable "auto-boot?" to false with the ldm command worked.  Of course, we'd normally set this to "true" to allow Solaris to boot right away once the LDom guest is started.  The setting for "boot-device" is the default "disk net", which means OBP would try to boot off the devices pointed to by the aliases "disk" and "net" in that order, which usually means "disk" once Solaris is installed on the disk image.  The actual devices these aliases point to are shown with the command "devalias".  Here, we have one line for both "disk" and "net".  The device paths speak for themselves.  Note that each of these devices has a second alias: "net0" for the network device and "root" for the disk device.  These are the very same names we've given these devices in the control domain with the commands "ldm add-vnet" and "ldm add-vdisk".  Remember this, as it is very useful once you have several dozen disk devices...
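
Putting these aliases to use is straightforward.  To boot from the virtual disk, or to kick off a network installation (the exact install arguments depend on your AI or Jumpstart setup), you'd type something like:

{0} ok boot root
{0} ok boot net:dhcp - install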

To wrap this up, in this part we've created a simple guest domain, complete with CPU, memory, boot disk and network connectivity.  This should be enough to get you going.  I will cover all the more advanced features and a little more theoretical background in several follow-on articles.  For some background reading, I'd recommend the following links:

What's up with LDoms: Part 1 - Introduction & Basic Concepts

LDoms - the correct name is Oracle VM Server for SPARC - have been around for quite a while now.  But to my surprise, I get more and more requests to explain how they work or to give advice on how to make good use of them.  This made me think that writing up a few articles discussing the different features would be a good idea.  Now - I don't intend to rewrite the LDoms Admin Guide or to copy and reformat the (hopefully) well known "Beginners Guide to LDoms" by Tony Shoumack from 2007.  Both documents are highly recommended - the Beginners Guide especially, although based on LDoms 1.0, is still a good place to start.  However, LDoms have come a long way since then, and I hope to contribute to their adoption by discussing how they work and what features there are today.

 In this and the following posts, I will use the term "LDoms" as a common abbreviation for Oracle VM Server for SPARC, just because it's a lot shorter and easier to type (and presumably, read).

So, just to get everyone on the same baseline, let's briefly discuss the basic concepts of virtualization with LDoms.  LDoms make use of a hypervisor as a layer of abstraction between real, physical hardware and virtual hardware.  This virtual hardware is then used to create a number of guest systems which each behave very similarly to a system running on bare metal:  Each has its own OBP, each will install its own copy of the Solaris OS and each will see a certain amount of CPU, memory, disk and network resources available to it.  Unlike some other type 1 hypervisors running on x86 hardware, the SPARC hypervisor is embedded in the system firmware and makes use of supporting functions in the sun4v SPARC instruction set as well as of the overall CPU architecture to fulfill its function.

The CMT architecture of the supporting CPUs (T1 through T4) provides a large number of cores and threads to the OS.  For example, the current T4 CPU has eight cores, each running eight threads, for a total of 64 threads per socket.  To the OS, this looks like 64 CPUs.
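
You can easily verify this on a live system: psrinfo -pv summarizes the physical processors with their cores and threads, and counting the lines of plain psrinfo output shows how many "CPUs" Solaris sees:

root@sun # psrinfo -pv
root@sun # psrinfo | wc -l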

The SPARC hypervisor, when creating guest systems, simply assigns a certain number of these threads exclusively to one guest, thus avoiding the overhead of having to schedule OS threads to CPUs, as do typical x86 hypervisors.  The hypervisor only assigns CPUs and then steps aside.  It is not involved in the actual work being dispatched from the OS to the CPU, all it does is maintain isolation between different guests.

Likewise, memory is assigned exclusively to individual guests.  Here,  the hypervisor provides generic mappings between the physical hardware addresses and the guest's views on memory.  Again, the hypervisor is not involved in the actual memory access, it only maintains isolation between guests.

During the initial setup of a system with LDoms, you start with one special domain, called the Control Domain.  Initially, this domain owns all the hardware available in the system, including all CPUs, all RAM and all IO resources.  If you were running the system unvirtualized, this is what you'd be working with.  To allow for guests, you first resize this initial domain (also called the primary domain in LDoms speak), assigning it a small amount of CPU and memory.  This frees up most of the available CPU and memory resources for guest domains.

IO is a little more complex, but still quite straightforward.  When LDoms 1.0 first came out, the only way to provide IO to guest systems was to create virtual disk and network services and attach guests to these services.  In the meantime, several different ways to connect guest domains to IO have been developed, the most recent one being SR-IOV support for network devices, released in version 2.2 of Oracle VM Server for SPARC.  I will cover these more advanced features in detail later.  For now, let's have a short look at the initial way IO was virtualized in LDoms:

For virtualized IO, you create two services, one "Virtual Disk Service" or vds, and one "Virtual Switch" or vswitch.  You can, of course, also create more of these, but that's more advanced than I want to cover in this introduction.  These IO services now connect real, physical IO resources like a disk LUN or a network port to the virtual devices that are assigned to guest domains.  For disk IO, the normal case would be to connect a physical LUN (or some other storage option that I'll discuss later) to one specific guest.  That guest would be assigned a virtual disk, which would appear to be just like a real LUN to the guest, while the IO is actually routed through the virtual disk service down to the physical device.  For network, the vswitch acts very much like a real, physical ethernet switch - you connect one physical port to it for outside connectivity and define one or more connections per guest, just like you would plug cables between a real switch and a real system. For completeness, there is another service that provides console access to guest domains which mimics the behavior of serial terminal servers.

The connections between the virtual devices on the guest's side and the virtual IO services in the primary domain are created by the hypervisor.  It uses so-called "Logical Domain Channels" or LDCs to create point-to-point connections between all of these devices and services.  These LDCs work very similarly to high speed serial connections and are configured automatically whenever the Control Domain adds or removes virtual IO.

To see all this in action, let's now look at a first example.  I will start with a newly installed machine and configure the control domain so that it's ready to create guest systems.

In a first step, after we've installed the software, let's start the virtual console service and downsize the primary domain. 

root@sun # ldm list
NAME     STATE    FLAGS   CONS  VCPU  MEMORY   UTIL  UPTIME
primary  active   -n-c--  UART  512   261632M  0.3%  2d 13h 58m

root@sun # ldm add-vconscon port-range=5000-5100 \
               primary-console primary
root@sun # svcadm enable vntsd
root@sun # svcs vntsd
STATE          STIME    FMRI
online          9:53:21 svc:/ldoms/vntsd:default

root@sun # ldm set-vcpu 16 primary
root@sun # ldm set-mau 1 primary
root@sun # ldm start-reconf primary
root@sun # ldm set-memory 7680m primary
root@sun # ldm add-config initial
root@sun # shutdown -y -g0 -i6 

So what have I done:

  • I've defined a range of ports (5000-5100) for the virtual network terminal service and then started that service.  The vntsd will later provide console connections to guest systems, very much like serial terminal servers do in the physical world.
  • Next, I assigned 16 vCPUs (on this platform, a T3-4, that's two cores) to the primary domain, freeing the rest up for future guest systems.  I also assigned one MAU to this domain.  A MAU is a crypto unit in the T3 CPU.  These need to be explicitly assigned to domains, just like CPU or memory.  (This is no longer the case with T4 systems, where crypto is always available everywhere.)
  • Before I reassigned the memory, I started what's called a "delayed reconfiguration" session.  That avoids actually doing the change right away, which would take a considerable amount of time in this case.  Instead, I'll need to reboot once I'm all done.  I've assigned 7680MB of RAM to the primary.  That's 8GB less the 512MB which the hypervisor uses for its own private purposes.  You can, depending on your needs, work with less.  I'll spend a dedicated article on sizing, discussing the pros and cons in detail.
  • Finally, just before the reboot, I saved my work on the ILOM, to make this configuration available after a power-cycle of the box.  (It'll always be available after a simple reboot, but the ILOM needs to know the configuration of the hypervisor after a power-cycle, before the primary domain is booted.)
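
As a quick sanity check, you can list the configurations saved on the ILOM and see which one will be used at the next power-cycle.  (In newer versions of the software, this subcommand is called list-spconfig.)

root@sun # ldm list-config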

Now, let's create a first disk service and a first virtual switch which is connected to the physical network device igb2.  We will later use these to connect virtual disks and virtual network ports of our guest systems to real-world storage and networks.

root@sun # ldm add-vds primary-vds primary
root@sun # ldm add-vswitch net-dev=igb2 switch-primary primary
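
If you want to make sure everything is in place before creating the first guest, have a quick look at the services of the primary domain.  This shows the console concentrator, the disk service and the virtual switch side by side:

root@sun # ldm list-services primary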

You are free to choose whatever names you like for the virtual disk service and the virtual switch.  I strongly recommend that you choose names that make sense to you and describe the function of each service in the context of your implementation.  For the vswitch, for example, you could choose names like "admin-vswitch" or "production-network" etc.

This already concludes the configuration of the control domain.  We've freed up considerable amounts of CPU and RAM for guest systems and created the necessary infrastructure - console, virtual disk service and vswitch - so that guest systems can actually interact with the outside world.  The system is now ready to create guests, which I'll describe in the next section.

For further reading, here are some recommended links:

Wednesday Jan 26, 2011

Logical Domains - sure secure

LDoms - Oracle VM Server for SPARC - are being used far and wide.  And I've been asked several times how secure they actually are.  One customer especially wanted to be very, very sure.  So we asked for independent expertise on the subject matter.  The results were quite pleasing, but not exactly night-time literature.  So I decided to add some generic deployment recommendations to the core results and came up with a whitepaper.  Publishing was delayed a bit due to the change of ownership, which resulted in a significant change in process.  The good thing about that is that it's now also up to date with the latest release of the software.  I am now happy and proud to present:


Secure Deployment of Oracle VM Server for SPARC


A big Thank You to Steffen Gundel of Cirosec, who laid the foundation for this paper with his study.


I do hope that it will be useful to some of you!


Enjoy!

Thursday Aug 12, 2010

LDoms and LDCs

Oracle VM Server for SPARC, as it's now called, uses Logical Domain Channels (LDCs) for inter-domain communication of all sorts, mostly for virtual devices. A question I've been asked multiple times was how many LDCs there are available, and how to tell how many you've used. Here are a few notes on that:



  1. How many LDCs are there, and how are they used?
    Find the detailed answer in the Release Notes of LDoms 3.0. Here is an example:


    • UltraSPARC T2 has 512 LDCs per domain

    • UltraSPARC T2+ has 768 LDCs per domain

    • Check the release notes of your server for more current models.

    You can use this formula to calculate how many you need:
    All LDCs = 15 + 1 + g + g * (n + d + c), where


    • g is the number of guest domains,

    • d is the number of virtual disks in each guest,

    • n is the number of virtual network ports in each guest and

    • c is the number of virtual console ports per guest (always one).

    If d and n aren't the same for all guests, the formula will be a little less elegant. But you get the idea.


  2. How can you check what LDCs your system uses, and for what?


    1. In the Control Domain, use the command ldm list-bindings -e for a list of all configured resources, including all LDCs. Here's some sample output from the virtual console section:

      VCC
          NAME                PORT-RANGE
          vconsole            5000-5050
              CLIENT                  PORT   LDC
              debian@vconsole         5001   0x18
              rivermuse@vconsole      5002   0x1c
              gentoo@vconsole         5000   0x11
              demo@vconsole           5008   0x3d
              install@vconsole        5012   0x50
              rac1@vconsole           5005   0x35
              rac2@vconsole           5006   0x66
      Use the additional option "-p" for parsable output. With that, the command ldm list-bindings -e -p|grep ldc=0|awk -F\| '{print $NF}' will give you a brief list of all LDC IDs used. If you like, count them with wc -l.

    2. Note that LDCs are always counted per domain, not per system.

    3. Additional info using kstat:
      At least for the virtual network devices, you can also use kstat to gather additional information. Get a first overview with kstat|grep ldc. If you like, dig deeper from there on.




Knowing all this should help you plan your next virtualization deployment. The available number of LDCs should usually be sufficient. Using the formula from above, you can easily figure that even on T2, the single-socket system, a maximum of 124 domains is possible. On T2+, this goes up to 188 - more than the supported 128 guests. Of course, you're likely to have more than the bare minimum of three devices per guest, so your actual maximum will vary. However, it should still be enough in most cases. And now, next time you're asked for an additional guest, you can quickly check and figure if you have enough LDCs available.
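
By the way, if you'd like to check your own setup against the formula before adding that next guest, a bit of shell arithmetic (ksh or bash) is all it takes. The numbers below are purely illustrative - 20 guests, each with 2 virtual network ports, 2 virtual disks and the mandatory console:

root@sun # echo $(( 15 + 1 + 20 + 20 * (2 + 2 + 1) ))
136

That's well below the limit of either CPU type, so there would be plenty of headroom left.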


 (This entry was updated on Feb. 19, 2013 and again on Jan 27, 2014)
