Migratory Solaris Kernel Zones
By Jeff Victor-Oracle on Aug 26, 2014
Oracle Solaris 11.2 introduced Oracle Solaris Kernel Zones. Kernel Zones (KZs) offer a midpoint between traditional operating system virtualization and virtual machines. They exhibit the low overhead and low management effort of Solaris Zones, and add the best parts of the independence of virtual machines.
A Kernel Zone is a type of Solaris Zone that runs its own Solaris kernel. This gives each Kernel Zone complete independence of software packages, as well as other benefits.
One of the more interesting new abilities that Kernel Zones bring to Solaris Zones is the ability to "pause" a running KZ and "resume" it on a different computer - or the same computer, if you prefer.
Of what value is the ability to "pause" a zone? One potential use is moving a workload from a smaller computer (with too few CPUs, or insufficient RAM) to a larger one. Some workloads do not maintain much state, and can restart quickly, and so they wouldn't benefit from suspend/resume. Others, such as static (read-only) databases, may take 30 minutes to start and obtain good performance. The ability to suspend, but not stop, the workload and its operating system can be very valuable.
Another possible use of this ability is the staging of multiple KZs which have already booted and, perhaps, have started to run a workload. Instead of booting in a few minutes, the workload can continue from a known state in just a few seconds. Further, the suspended zone can be "unpaused" on the computer of your choice. Suspended kernel zones are like a nest of dozing ants, waiting to take action at a moment's notice.
This blog entry shows the steps to create and move a KZ, highlighting both the Solaris iSCSI implementation as well as kernel zones and their suspend/resume feature. Briefly, the steps are:
- Create shared storage
- Make the shared storage available to both computers: the one that will run the zone at first, and the one on which the zone will be resumed.
- Configure the zone on each system.
- Install the zone on one system.
- "Warm migrate" the zone by pausing it, and then, on the other computer, resuming it.
Links to relevant documentation and blogs are provided at the bottom.
The Kernel Zones suspend/resume feature requires the use of storage accessible by multiple computers. However, neither Kernel Zones nor suspend/resume requires a specific type of shared storage. In Solaris 11.2 the only types of shared storage that support zones are iSCSI and Fibre Channel. This blog entry uses iSCSI.
The example below uses three computers. One is the iSCSI target, i.e. the storage server. The other two run the KZ, one at a time. All three systems run Solaris 11.2, although the iSCSI features shown below also work on earlier updates of Solaris 11, and the target could instead be a ZFS Storage Appliance (the current family shares the brand name ZS3) or another type of iSCSI target.
In the commands shown below, the prompt "storage1#" indicates commands that would be entered into the iSCSI target. Similarly, "node1#" indicates commands that you would enter into the first computer that will run the kernel zone. The few commands preceded by the prompt "bothnodes#" must be run on both node1 and node2. The name of the kernel zone is "ant1".
For simplicity, the example below ignores security concerns. (More about security below.)
Finally, note that these commands should be run by a non-root user who prefaces each command with the pseudo-command "sudo".
Step 1. Provide shared storage for the kernel zone. The zone only needs one device for its zpool. Redundancy is provided by the zpool in the iSCSI target. (For a more detailed explanation, see the link to the COMSTAR documentation in the section "The Links" below.)

storage1# pkg install group/feature/storage-server   # Install necessary software.
storage1# svcadm enable stmf:default                  # Enable that software.
storage1# zfs create rpool/zvols                      # A dataset for the zvol.
storage1# zfs create -V 20g rpool/zvols/ant1          # Create a zvol as backing store.
storage1# stmfadm create-lu /dev/zvol/rdsk/rpool/zvols/ant1   # Create a back-end LUN.
Logical unit created: 600144F068D1CD00000053ECD3D20001
storage1# stmfadm list-lu
LU Name: 600144F068D1CD00000053ECD3D20001
storage1# stmfadm add-view 600144F068D1CD00000053ECD3D20001
storage1# stmfadm list-view -l 600144F068D1CD00000053ECD3D20001
View Entry: 0
    Host group   : All
    Target Group : All
    LUN          : Auto
storage1# svcadm enable -r svc:/network/iscsi/target:default   # Enable the target service.
storage1# svcs -l iscsi/target
fmri         svc:/network/iscsi/target:default
name         iscsi target
enabled      true
state        online
next_state   none
state_time   August 10, 2014 03:58:50 PM EST
logfile      /var/svc/log/network-iscsi-target:default.log
restarter    svc:/system/svc/restarter:default
manifest     /lib/svc/manifest/network/iscsi/iscsi-target.xml
dependency   require_any/error svc:/milestone/network (online)
dependency   require_all/none svc:/system/stmf:default (online)
storage1# itadm create-target
Target iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5 successfully created
storage1# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5  online   0
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default

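In Step 1 the LU name printed by "stmfadm create-lu" is pasted by hand into the later "stmfadm add-view" and "list-view" commands. If you script this step, you can capture the name instead. The following is only a sketch: parse_lu_name is a helper name I made up, and it assumes the output has the "Logical unit created: ..." form shown above. The demonstration line feeds it the text from this example rather than running any storage command.

```shell
#!/bin/sh
# parse_lu_name: hypothetical helper, not part of COMSTAR.
# Reads "stmfadm create-lu" output on stdin and prints only the LU name,
# assuming the "Logical unit created: <name>" format seen in Step 1.
parse_lu_name() {
    sed -n 's/^Logical unit created: *\([0-9A-Fa-f][0-9A-Fa-f]*\).*$/\1/p'
}

# Demonstration using the output text from Step 1 (nothing is created here).
echo "Logical unit created: 600144F068D1CD00000053ECD3D20001" | parse_lu_name
```

On the storage server this could be used as lu=$(stmfadm create-lu /dev/zvol/rdsk/rpool/zvols/ant1 | parse_lu_name), followed by stmfadm add-view "$lu".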
Step 2A. Configure initiators. Configuring iSCSI on the two iSCSI initiators uses exactly the same commands on each, so I'll just list them once.

bothnodes# svcadm enable network/iscsi/initiator
bothnodes# iscsiadm modify discovery --sendtargets enable   # The simplest discovery method.
bothnodes# iscsiadm add discovery-address 192.168.1.11      # IP address of the storage server.

At this point, the initiator will automatically discover all of the iSCSI LUNs offered by that target. One way to view the list of them is with the format(1M) command.

bothnodes# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
....
1. c0t600144F068D1CD00000053ECD3D20001d0

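Notice that the disk name in the format listing embeds the LU name that stmfadm reported on the storage server. A small sketch can use that to confirm that a node really sees the new LUN. Both function names are hypothetical, and the "c0t" prefix and "d0" suffix are simply copied from the listing above; they are an assumption and may differ on other systems.

```shell
#!/bin/sh
# lu_to_disk: hypothetical helper. Builds the disk name seen in this example
# from an LU name. The c0t...d0 pattern is an assumption taken from the
# format(1M) listing above.
lu_to_disk() {
    printf 'c0t%sd0\n' "$1"
}

# disk_listed: hypothetical helper. Reads a disk listing on stdin and
# succeeds only if the disk for the given LU name appears in it.
disk_listed() {
    grep -q "$(lu_to_disk "$1")"
}

# Demonstration against the listing line from Step 2A.
printf '1. c0t600144F068D1CD00000053ECD3D20001d0\n' |
    disk_listed 600144F068D1CD00000053ECD3D20001 && echo "LUN is visible"
```

On a real node the listing would come from format itself rather than from printf.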
Step 2B. On each of the two computers that will host the zone, identify the Storage Uniform Resource Identifiers ("SURI") - see suri(5) for more information. This command tells you the SURI of that LUN, in each of multiple formats. We'll need this SURI to specify the storage for the kernel zone.

bothnodes# suriadm lookup-uri c0t600144F068D1CD00000053ECD3D20001d0
iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
iscsi://house/target.iqn.1986-03.com.sun:02:238d10b8-cca8-ef7a-e095-e1132d91c4a5,lun.1
dev:dsk/c0t600144F068D1CD00000053ECD3D20001d0

Step 2C. When you suspend a kernel zone, its RAM pages must be stored temporarily in a file. In order to resume the zone on a different computer, the "suspend file" must be on storage that both computers can access. For this example, we'll use an NFS share. (Another iSCSI LUN could be used instead.) The method shown below is not particularly secure, although the suspended image is first encrypted. Secure methods would require the use of other Solaris features, but they are not the topic of this blog entry.

storage1# zfs create -p rpool/export/suspend
storage1# zfs set share.nfs=on rpool/export/suspend

That share must be made available on both nodes, with appropriate permissions.

node1# mkdir /mnt/suspend
node1# mount -F nfs storage1:/export/suspend /mnt/suspend
node2# mkdir /mnt/suspend
node2# mount -F nfs storage1:/export/suspend /mnt/suspend

Step 3. Configure a kernel zone, using the iSCSI LUN, the shared suspend file, and a system profile. You can configure a kernel zone very easily. The only required settings are the name and the use of the kernel zone template. The name of the latter is SYSsolaris-kz. That template specifies a VNIC, 2GB of dedicated RAM, 1 virtual CPU, and local storage that will be configured automatically when the zone is installed. We need shared storage instead of local storage, so one of the first steps will be deleting the local storage resource. That device will have an ID number of zero. After deleting that resource, we add the LUN, using the SURI determined earlier.

node1# zonecfg -z ant1
Use 'create' to begin configuring a new zone.
zonecfg:ant1> create -t SYSsolaris-kz
zonecfg:ant1> remove device id=0
zonecfg:ant1> add device
zonecfg:ant1:device> set storage=iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
zonecfg:ant1:device> set bootpri=0
zonecfg:ant1:device> info
device:
        match not specified
        storage: iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001
        id: 1
        bootpri: 0
zonecfg:ant1:device> end
zonecfg:ant1> add suspend
zonecfg:ant1:suspend> set path=/mnt/suspend/ant1.sus
zonecfg:ant1:suspend> end
zonecfg:ant1> exit

We can create a reusable configuration profile.

node1# sysconfig create-profile -o ant1
[The usual sysconfig conversation ensues...]

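The interactive zonecfg session in Step 3 can also be written as a command file and fed to "zonecfg -z ant1 -f", the same option used in Step 5 to apply an exported configuration. Here is a minimal sketch of a generator for such a file. The function name kz_zonecfg_commands and the temporary file name in the usage note are my own inventions; the subcommands themselves are exactly the ones typed in Step 3.

```shell
#!/bin/sh
# kz_zonecfg_commands: hypothetical helper. Prints the zonecfg subcommands
# from Step 3, one per line, for a given storage URI ($1) and suspend-file
# path ($2). It only prints text; it does not touch any zone.
kz_zonecfg_commands() {
    suri=$1
    suspend_path=$2
    cat <<EOF
create -t SYSsolaris-kz
remove device id=0
add device
set storage=$suri
set bootpri=0
end
add suspend
set path=$suspend_path
end
EOF
}

# Demonstration with the values used in this blog entry.
kz_zonecfg_commands \
    iscsi://house/luname.naa.600144f068d1cd00000053ecd3d20001 \
    /mnt/suspend/ant1.sus
```

To use it, redirect the output to a file (for example /tmp/ant1.zcfg, a name chosen only for illustration) and run zonecfg -z ant1 -f on that file. Keeping the commands in a file makes it easy to configure several similar zones with different LUNs.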
Step 4. Install and boot the kernel zone.

node1# zoneadm -z ant1 install -c ant1/sc_profile.xml
Progress being logged to /var/log/zones/zoneadm.20140815T155143Z.ant1.install
pkg cache: Using /var/pkg/publisher.
 Install Log: /system/volatile/install.15996/install_log
 AI Manifest: /tmp/zoneadm15390.W_a4NE/devel-ai-manifest.xml
  SC Profile: /usr/share/auto_install/sc_profiles/enable_sci.xml
Installation: Starting ...
        Creating IPS image
        Installing packages from:
            solaris
                origin:  http://pkg.oracle.com/solaris/release/
        The following licenses have been accepted and not displayed.
        Please review the licenses for the following packages post-install:
          consolidation/osnet/osnet-incorporation
        Package licenses may be viewed using the command:
          pkg info --license

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                            483/483   64276/64276  543.7/543.7    0B/s

PHASE                                          ITEMS
Installing new actions                   87530/87530
Updating package state database                 Done
Updating package cache                           0/0
Updating image state                            Done
Creating fast lookup database                   Done
Installation: Succeeded
        Done: Installation completed in 207.564 seconds.
node1# zoneadm -z ant1 boot

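Before suspending the zone in the next step, a script may want to confirm that the zone is actually running. One way, sketched here, is to read the parseable output of "zoneadm list -p", in which the first three colon-separated fields are the zone ID, the zone name, and the state. The helper name zone_state is hypothetical, and the sample line in the demonstration is made up for illustration, not captured from a real system.

```shell
#!/bin/sh
# zone_state: hypothetical helper. Reads one line of "zoneadm list -p"
# output on stdin and prints the third colon-separated field (the state).
zone_state() {
    cut -d: -f3
}

# Made-up sample line; only the first three fields matter to this helper.
printf '%s\n' '1:ant1:running:remaining-fields-ignored' | zone_state
```

On node1 the input would come from zoneadm -z ant1 list -p, and the script would proceed only if the result is "running".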
Step 5. With all of the hard work behind us, we can "warm migrate" the zone. The first step is preparation of the destination system - "node2" in our example - by applying the zone's configuration to the destination.
node1# zonecfg -z ant1 export -f /mnt/suspend/ant1.cfg
node2# zonecfg -z ant1 -f /mnt/suspend/ant1.cfg

Next, suspend the zone and detach it from node1. The "detach" operation does not delete anything. It merely tells node1 to cease considering the zone to be usable.

node1# zoneadm -z ant1 suspend
node1# zoneadm -z ant1 detach

A separate "resume" sub-command for zoneadm was not necessary. The "boot" sub-command fulfills that purpose.

node2# zoneadm -z ant1 attach
node2# zoneadm -z ant1 boot

Of course, "warm migration" is different from "live migration" in one important respect: the duration of the service outage. Live migration achieves a service outage that lasts a small fraction of a second. In one experiment, warm migration of a kernel zone created a service outage that lasted 30 seconds. It's not live migration, but is an important step forward, compared to other types of Solaris Zones.
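The order of the Step 5 commands matters: the configuration must exist on the destination before the attach, the suspend must come before the detach, and the attach must come before the boot. The sketch below collects the sequence into one place as a dry run. The function name warm_migrate_plan is hypothetical, and it only prints the commands with the prompt of the node on which each would be entered; actually running them on two hosts (over ssh, for instance) is left out.

```shell
#!/bin/sh
# warm_migrate_plan: hypothetical helper. Prints, in order, the Step 5
# commands for moving zone $1 from node $2 to node $3, using $4 as the
# shared path of the exported configuration. Nothing is executed.
warm_migrate_plan() {
    zone=$1 src=$2 dst=$3 cfg=$4
    printf '%s# zonecfg -z %s export -f %s\n' "$src" "$zone" "$cfg"
    printf '%s# zonecfg -z %s -f %s\n' "$dst" "$zone" "$cfg"
    printf '%s# zoneadm -z %s suspend\n' "$src" "$zone"
    printf '%s# zoneadm -z %s detach\n' "$src" "$zone"
    printf '%s# zoneadm -z %s attach\n' "$dst" "$zone"
    printf '%s# zoneadm -z %s boot\n' "$dst" "$zone"
}

# Demonstration with the names used in this blog entry.
warm_migrate_plan ant1 node1 node2 /mnt/suspend/ant1.cfg
```

Swapping the two node arguments gives the plan for moving the zone back, although the export and import of the configuration are only needed the first time a node hosts the zone.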
- This example used a zpool as back-end storage. That zpool provided data redundancy, so additional redundancy was not needed within the kernel zone. If unmirrored devices (e.g. physical disks) were specified in zonecfg, then data redundancy should be achieved within the zone. Fortunately, you can specify two devices in zonecfg, and "zoneadm ... install" will automatically mirror them.
- In a simple network configuration, the steps above create a kernel zone that has normal network access. More complicated networks may require additional steps, such as VLAN configuration, etc.
- Some steps regarding file permissions on the NFS mount were omitted for clarity. This is one of the security weaknesses of the steps shown above. All of the weaknesses can be addressed by using additional Solaris features. These include iSCSI features (iSNS, CHAP authentication, RADIUS), NFS security features (NFS ACLs, Kerberos), and RBAC, among others.
The Links
- The Kernel Zones documentation and the zonecfg(1M) man page describe kernel zones.
- The Configuring Storage Devices with COMSTAR section in the "Managing Devices in Oracle Solaris 11.2" documentation.
- The iSCSI Initiator documentation.
- The suriadm(1M) man page.
- The Install a kernel zone in 3 steps blog entry.
- The Tour of a kernel zone blog entry.
- The Kernel Zones for SAP blog entry.
- The Deploying SAS 9.4 with Oracle Solaris 11.2 Kernel Zones and Unified Archives white paper.
- And finally the Solaris 11.2: All the Posts meta-post.