Monday Jun 13, 2011

Device Validation with Oracle VM Server for SPARC 2.1

Oracle VM Server for SPARC 2.1 (aka LDoms 2.1) was recently released. One of the major new features of this release is live migration; you can find more information about this cool feature on Liam's blog, and some experiments on Jeff's blog. Here, I would like to talk about a much more modest but actually very useful improvement: device validation.

Device Misconfiguration and Previous Versions of LDoms

Previous versions of LDoms are not very friendly when handling some obvious configuration mistakes, such as a misspelled path to a virtual disk backend. For example, let's say I have a virtual disk with a Solaris 11 installation on the ZFS volume ldoms/vdisk/solaris_11. To use this ZFS volume as a virtual disk, I have to add the device /dev/zvol/rdsk/ldoms/vdisk/solaris_11 to the virtual disk server (vds) with the following command:
  # ldm add-vdsdev /dev/zvol/rdsk/ldoms/vdisk/solaris_11 s11@primary-vds0
If I make a mistake in the device path, for example by forgetting the 's' at the end of 'ldoms', the system does not complain and I can successfully create and configure my domain:
  # ldm create ldg1
  # ldm set-vcpu 8 ldg1
  # ldm set-mem 8G ldg1
  # ldm add-vnet vnet0 primary-vsw0 ldg1
  # ldm add-vdsdev /dev/zvol/rdsk/ldom/vdisk/solaris_11 solaris_11@primary-vds0
  # ldm add-vdisk vdisk0 solaris_11@primary-vds0 ldg1
  # ldm bind ldg1
  # ldm start ldg1
However, if I access the console of my guest domain ldg1 and try to boot it, I get some obscure messages and my domain definitely does not boot:
  # telnet 0 5001
  Trying 0.0.0.0...
  Connected to 0.
  Escape character is '^]'.

  {0} ok boot vdisk0
  Boot device: /virtual-devices@100/channel-devices@200/disk@0  File and args: 
  WARNING: /virtual-devices@100/channel-devices@200/disk@0: Receiving packet from LDC
  but LDC is Not Up!
  WARNING: /virtual-devices@100/channel-devices@200/disk@0: Communication error with
  Virtual Disk Server using Port 0. Retrying.
  ...
  ERROR: /virtual-devices@100/channel-devices@200/disk@0: boot-read fail

 Can't open boot device
Understanding the problem from these error messages is not very straightforward, but if you have some experience with LDoms, you can conclude that there is a problem with the virtual disk used for booting. You probably also know that a good place to look for hints about the problem is the /var/adm/messages file:
  # cat /var/adm/messages
  ...
  Jun 10 22:55:03 dt92-416 vds: [ID 877446 kern.info] vd_setup_vd():
  /dev/zvol/rdsk/ldom/vdisk/solaris_11 is currently inaccessible (error 2)
Here, we find a message from the virtual disk server (vds) saying that /dev/zvol/rdsk/ldom/vdisk/solaris_11 is inaccessible. In addition, "error 2" means "No such file or directory" (ENOENT). So we can check the device path and notice that it is incorrect because an 's' is missing:
  # ls -l /dev/zvol/rdsk/ldom/vdisk/solaris_11
  ls: cannot access /dev/zvol/rdsk/ldom/vdisk/solaris_11: No such file or directory

  # ls -l /dev/zvol/rdsk/ldoms/vdisk/solaris_11
  lrwxrwxrwx 1 root root 0 May 12 14:58 /dev/zvol/rdsk/ldoms/vdisk/solaris_11 ->
  ../../../../..//devices/pseudo/zfs@0:3,raw
Now, we just have to reconfigure the virtual disk server with the right device path. But it is quite a long trip to identify such an obvious problem!
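
The log lookup above can be narrowed to just the relevant vds messages. The following is a minimal sketch, not something shipped with LDoms: the function name vds_inaccessible is hypothetical, the message format is assumed to match the excerpt shown above, and only the meaning of error 2 comes from this post (the other names are the usual errno.h values, added for illustration).

```shell
#!/bin/sh
# Hypothetical helper: list backends that vds reported as inaccessible.
# Assumes the log message format shown in the excerpt above.
vds_inaccessible() {
    # $1: log file to scan (for example /var/adm/messages)
    awk '
        BEGIN {
            # A few standard errno names; only error 2 is discussed in the post.
            name[2]  = "ENOENT"
            name[5]  = "EIO"
            name[6]  = "ENXIO"
            name[13] = "EACCES"
            name[16] = "EBUSY"
        }
        match($0, /[^ ]+ is currently inaccessible \(error [0-9]+\)/) {
            # Isolate "<path> is currently inaccessible (error <n>)".
            msg = substr($0, RSTART, RLENGTH)
            split(msg, f, " ")
            err = f[6]
            gsub(/[^0-9]/, "", err)   # "2)" becomes "2"
            print f[1], "error=" err, ((err in name) ? name[err] : "unknown")
        }
    ' "$1"
}
```

Running vds_inaccessible /var/adm/messages on the service domain would print one line per affected backend, with the path, the error number and its errno name. On Solaris, nawk or /usr/xpg4/bin/awk may be needed if the default awk lacks match().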

Improvement with Oracle VM Server for SPARC 2.1

The good news is that with Oracle VM Server for SPARC 2.1, the system will immediately notice such a problem and give you a clear error message. Let's try the same sequence again with Oracle VM Server for SPARC 2.1:
  # ldm create ldg1
  # ldm set-vcpu 8 ldg1
  # ldm set-mem 8G ldg1
  # ldm add-vnet vnet0 primary-vsw0 ldg1
  # ldm add-vdsdev /dev/zvol/rdsk/ldom/vdisk/solaris_11 solaris_11@primary-vds0
  Path /dev/zvol/rdsk/ldom/vdisk/solaris_11 is not valid on service domain primary
As you can see, I get an error message saying that the path is not valid; so I can immediately notice and correct my mistake.

However, there might be some cases where you know that the path you are indicating is not valid, for example because the device does not exist yet or because the service domain providing that device is not currently up. For these situations, the ldm add-vdsdev command has a -q option to quickly add the device without checking if it is valid:

  # ldm add-vdsdev -q /dev/zvol/rdsk/ldom/vdisk/solaris_11 solaris_11@primary-vds0
  # ldm add-vdisk vdisk0 solaris_11@primary-vds0 ldg1
With the -q option, the device is not checked, so no error is returned. This can be particularly useful when provisioning domains for future use, when it does not really matter whether the associated devices actually exist yet. It is also useful if you just want to avoid the overhead of the validation because you know that your path is correct.
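
The choice between a validated and an unvalidated add can be scripted. Below is a minimal sketch under stated assumptions: the function name vdsdev_cmd is hypothetical, it only prints the ldm command instead of running it, and it tests the path locally, which only makes sense when the script runs on the service domain that exports the backend.

```shell
#!/bin/sh
# Hypothetical helper: print the add-vdsdev command to run for a backend.
# Uses -q only when the backend does not exist yet (provisioning ahead of time).
vdsdev_cmd() {
    backend=$1   # e.g. /dev/zvol/rdsk/ldoms/vdisk/solaris_11
    volume=$2    # e.g. solaris_11@primary-vds0
    if [ -e "$backend" ]; then
        # Backend is present: let ldm validate it as usual.
        echo "ldm add-vdsdev $backend $volume"
    else
        # Backend is missing: skip validation and warn on stderr.
        echo "warning: $backend not found, adding with -q" >&2
        echo "ldm add-vdsdev -q $backend $volume"
    fi
}
```

The warning keeps a misspelled path visible even though -q would otherwise accept it silently.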

After you have configured your domain, virtual devices will be checked anyway when you bind the domain:

  # ldm bind ldg1
  Path /dev/zvol/rdsk/ldom/vdisk/solaris_11 is not valid on service domain primary
That way, you receive an error when you are about to use a domain which is incorrectly configured. At this point, you really need your virtual devices to be properly configured because you are actually going to use them (note that the ldm bind command also has a -q option to disable the device validation, if you really want to bind your domain anyway).
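
For scripts that drive ldm, the validation error can be turned into something easier to act on. This is a minimal sketch: the filter name invalid_paths is hypothetical, and it assumes the error text has exactly the format shown above.

```shell
#!/bin/sh
# Hypothetical filter: read ldm output on standard input and print
# "<path> <service-domain>" for each validation error in the format shown above.
invalid_paths() {
    sed -n 's/^ *Path \(.*\) is not valid on service domain \(.*\)$/\1 \2/p'
}
```

A possible use is ldm bind ldg1 2>&1 | invalid_paths; the 2>&1 is there because this post does not say whether the message goes to standard output or standard error.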

This example has shown the device validation for virtual disks, but device validation also occurs with the ldm add-vsw command to validate that the physical network device (net-dev) associated with a virtual switch is valid.

Backward Compatibility Mode

Although this should not be very frequent, there might be some cases where you don't want any device validation to occur, for example because you have some custom scripts and you never want the add-vsw, add-vdsdev or bind commands to fail. In such a case, it is possible to completely disable the device validation and go back to the same behavior as before Oracle VM Server for SPARC 2.1.

Device validation can be disabled by setting the device_validation SMF property of the Logical Domains Manager service (ldmd) to 0:

  # svccfg -s ldmd setprop ldmd/device_validation=0
  # svcadm refresh ldmd
  # svcadm restart ldmd
This setting will entirely disable the device validation. Device validation can be restored by setting the device_validation property back to -1:
  # svccfg -s ldmd setprop ldmd/device_validation=-1
  # svcadm refresh ldmd
  # svcadm restart ldmd
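
When a script needs to report which mode is active, the two values above can be mapped to a readable status. This is a minimal sketch: the function name describe_device_validation is hypothetical, and only the values 0 and -1 are described in this post, so anything else is reported as unknown.

```shell
#!/bin/sh
# Hypothetical helper: describe a device_validation value.
# Only 0 and -1 are described in this post; anything else is reported as such.
describe_device_validation() {
    case "$1" in
        0)  echo "device validation disabled" ;;
        -1) echo "device validation enabled" ;;
        *)  echo "value '$1' is not described in this post" ;;
    esac
}
```

One way to feed it is describe_device_validation "$(svcprop -p ldmd/device_validation ldmd)", assuming svcprop prints the bare property value for this service.
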
About

Alexandre Chartre is a senior principal engineer in the Oracle Virtualization Engineering organization. He is co-architect for Oracle VM Server for SPARC (LDoms).
