Disk dynamic reconfiguration in Oracle VM Server for SPARC

I recently saw an e-mail where a customer said you couldn't remove a virtual disk from a running guest, and my immediate response was "sure you can". I'll share the simple steps needed to add and remove a virtual disk in a running domain without any outage. This is a system running Oracle VM Server for SPARC 3.1 with a Solaris 11.1 guest domain named ldom0. I used NFS storage because it is easy to set up and lets me use live migration.
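
For the curious, here is a minimal sketch of how such an NFS-backed directory might be mounted on the control domain; the server name and export path are assumptions, not details from my setup:

# mkdir -p /ldomsnfs                               # hypothetical mount point for disk images
# mount -F nfs nfsserver:/export/ldoms /ldomsnfs   # hypothetical NFS server and export path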

Adding a virtual disk to a running domain

This entire sequence of commands, all run in the control domain, defines, adds, and then removes a disk while the guest domain runs:

# mkfile -n 20g /ldomsnfs/ldom0/disk1.img        # 1. create a sparse 20GB disk image file
# ldm add-vdsdev /ldomsnfs/ldom0/disk1.img vol01@primary-vds0 # 2. export it as a vds device
# ldm add-vdisk vdisk01 vol01@primary-vds0 ldom0 # 3. add the disk to the domain

# ldm rm-vdisk vdisk01 ldom0                     # 4. take it away from the domain
# ldm rm-vdsdev vol01@primary-vds0               # 5. undefine the vds device
# rm /ldomsnfs/ldom0/disk1.img                   # 6. reclaim the space

That's all there is to it. The new disk is available for the domain's use after step 3 until I take it away in step 4.
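
If you want to double-check from the control domain, you can list the domain's virtual disks and the exported backend devices; this is a quick sanity check (I omit the output, which varies by release):

# ldm list -o disk ldom0        # show the virtual disks bound to ldom0
# ldm list-services primary     # show the vds devices exported by the control domain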

Viewing reconfiguration from within the guest

Let's take a look from the guest domain's perspective. Before the new disk is added (that is, before step 3 above), the guest sees a single disk in the output of the format command:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3d1 
          /virtual-devices@100/channel-devices@200/disk@1
Specify disk (enter its number): ^C

There's only one disk until ldm add-vdisk is issued in the control domain. That triggers a dynamic reconfiguration event in the guest, which you can observe by running dmesg there:

# dmesg|tail
...snip...
Nov 20 12:03:23 ldom0 vdc: [ID 625787 kern.info] vdisk@0 is online using ldc@16,0
Nov 20 12:03:23 ldom0 cnex: [ID 799930 kern.info] channel-device: vdc0
Nov 20 12:03:23 ldom0 genunix: [ID 936769 kern.info] vdc0 is /virtual-devices@100/channel-devices@200/disk@0
Nov 20 12:03:23 ldom0 genunix: [ID 408114 kern.info] /virtual-devices@100/channel-devices@200/disk@0 (vdc0) online

Now the added disk appears in format, and it can be used immediately. In this case I created a temporary ZFS pool on it:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c3d0 
          /virtual-devices@100/channel-devices@200/disk@0
       1. c3d1 
          /virtual-devices@100/channel-devices@200/disk@1
Specify disk (enter its number): ^C
# zpool create temp c3d0
# zpool list
NAME    SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  19.9G  5.27G  14.6G  26%  1.00x  ONLINE  -
temp   19.9G   112K  19.9G   0%  1.00x  ONLINE  -

At this point I can just go ahead and use the added disk space. I could have done other things instead, such as attach the new disk to the existing ZFS pool as a mirror (see the sketch below), but this illustrates the point.
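
As an illustration of the mirroring alternative, here is a hypothetical sketch. It assumes the root pool sits on slice c3d1s0 and that the new disk has been given a matching SMI label with a c3d0s0 slice; neither step is shown in this post:

# zpool attach rpool c3d1s0 c3d0s0   # attach the new disk as a mirror of the root pool device
# zpool status rpool                 # watch the resilver progress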

What happens if I try to remove an in-use disk

It could be very damaging to remove a virtual device while it is in use, so the default behavior is that Solaris tells the logical domains manager that the device is in use and cannot be removed. That's a very important advantage of Oracle VM Server for SPARC: the logical domains framework and Solaris work cooperatively, in this and many other respects.

In this case, we're prevented from yanking the disk out from under the guest. If I try to remove it while the ZFS pool is still using it, I get an error message - exactly what you want:

# ldm rm-vdisk vdisk01 ldom0
Dynamic reconfiguration of the virtual device on domain ldom0
failed with error code (-122).
The OS on domain ldom0 did not report a reason for the failure.
Check the logs on that OS instance for any further information.
Failed to remove vdisk instance

The reason is "because it's in use!" :-) An administrator would log into the guest to see which file systems are mounted on that disk. This behavior can be overridden with the "-f" option, shown below, if you are certain you know what you're doing.
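
For completeness, a forced removal would look like the line below; use it only when you are certain nothing in the guest still needs the disk, since it bypasses the in-use check:

# ldm rm-vdisk -f vdisk01 ldom0   # forcibly remove the disk despite the guest's objection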

Removing the disk

I issued zpool destroy temp in the guest, repeated the ldm rm-vdisk, and this time it worked. Using zpool export temp would have worked just as well, and in that case I could add the virtual disk to a different domain, which could then run zpool import temp to access the data created by ldom0 (sketched below). With other file systems, a regular umount would have the same effect, making it possible to remove the disk without -f.
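
Here is a sketch of that export/import hand-off; ldom1 is a hypothetical second guest domain that is not part of this example:

# zpool export temp                              # in guest ldom0: cleanly release the pool
# ldm rm-vdisk vdisk01 ldom0                     # in the control domain: detach from ldom0...
# ldm add-vdisk vdisk01 vol01@primary-vds0 ldom1 # ...and attach to ldom1
# zpool import temp                              # in guest ldom1: the data from ldom0 is visible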

The format command now shows only one disk again, and dmesg shows the kernel messages from when the disk went offline:

Nov 20 12:42:10 ldom0 vdc: [ID 990228 kern.info] vdisk@0 is offline
Nov 20 12:42:10 ldom0 genunix: [ID 408114 kern.info] /virtual-devices@100/channel-devices@200/disk@0 (vdc0) offline

Summary

Solaris and the logical domains manager are engineered to work together in a coordinated fashion to provide operational flexibility. One benefit is that administrators can safely add and remove virtual devices while domains run, for operational tasks like adding or removing disk capacity or IOPS as needed. The same capabilities are available for virtual network devices, as the sketch below shows.
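
The virtual network equivalents follow the same pattern; the device name vnet1 and the virtual switch primary-vsw0 are assumptions about the configuration, not something shown above:

# ldm add-vnet vnet1 primary-vsw0 ldom0   # add a virtual network device to the running domain
# ldm rm-vnet vnet1 ldom0                 # remove it again once nothing in the guest uses it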

Comments:

Is there a way to dynamically reconfigure LUNs (/dev/rdsk) coming from an EMC VMAX in a control or I/O domain? The reason I ask is that once we add disks to our control domains at a primary site, all storage is SRDF'ed to a disaster recovery site where all the disk names are different (we are using MPxIO). This requires a lot of adjusting of the ldom XML. It would be fantastic if VDS were somehow able to handle the change and recognize disk name changes.

We can get around this issue by using an LVM like Veritas Volume Manager or even ZFS, but we want to take full advantage of Ops Center for management and live migration of LDOMs.

Posted by Saeed Ibrahim on January 05, 2014 at 04:07 PM MST #

Hi Saeed,

If I understand the situation, the issue is that you add LUNs on one control domain, but SRDF makes the same LUNs visible elsewhere too, and you'd like the names to be uniform and the 'ldm add-vdsdev' to be done on the SRDF remote site. Is that correct? I'm afraid I don't really have an answer for you - there's no way we can know that a particular device that appears on one control domain is the same as a (possibly differently named) device on a different control domain. Maybe the EMC folks can help, and perhaps they can suggest a way to maintain uniform naming. Apologies in advance if I misunderstood this.

regards, Jeff

Posted by Jeff on January 14, 2014 at 02:03 PM MST #
