Containers on NFS?
By jph on Jun 10, 2005
Thinking I needed a good story as a reason to start writing a blog, I hadn't joined the many friends who have been blogging for quite some time. But now I've found a good reason. By the way, for those of you who like short blog entries: I definitely don't plan to have entries this long too often ...
I guess an introduction is in order for my first Sun blog. My name is Joost Pronk van Hoogeveen (yes, I had to write my full surname on every sheet of every exam I ever took), although I normally just use Joost Pronk to make it easier for myself and others. I work in the Solaris group on the resource/workload management tools (Solaris Resource Management, Solaris Zones, Solaris Containers, Solaris Container Manager, ...). I expect that's probably what I'll blog about most... Oh, and I'm Dutch, so I might throw some of that in too.
Anyway, a few weeks ago I was at a Sun customer conference discussing the fact that we currently don't support running Solaris Containers on NFS. More precisely, we don't support having a Solaris Zone with its root filesystem on an NFS-mounted directory.

Solaris Containers
Before I go further I should discuss the Solaris Container versus Solaris Zones nomenclature. In short: Solaris Containers = Solaris Zones + Solaris Resource Management.
The longer version: Solaris Containers is the group of technologies that allow the administrator to create separate environments for their applications in Solaris. Solaris Zones is the technology that lets the admin create separate logical namespace environments, and Solaris Resource Management (RM) lets them assign amounts of system resources to their applications. To create a complete "Container" you'd want to use both technologies, not just one. Now, Solaris Zones is really new and cool and gets most of the limelight, but a good container has the RM stuff configured too. When talking about this I generally default to the "bigger" Solaris Container so as not to leave out RM, but sometimes you need to zoom in to be precise and talk about an individual piece, which is often a zone because that's "the new thing" people want info about.

Anyway, back to the story ...
So for a whole bunch of reasons, running a zone's root filesystem on an NFS-mounted directory doesn't work today. Most of them have to do with the fact that the global zone (which makes the initial NFS mount) has a different IP address than the non-global zone (which then wants to use this mount), plus credentials, the possible use of Kerberos, different domain names, and ...
What if the non-global zone doesn't see any of this because it's behind some other mechanism?
At breakfast a customer (Thomas Nau from the University of Ulm) and I were talking about this when he said: "Why don't we use lofi?" So we did it right then and there with two laptops, and it works (big grin).

So here's how we did it
First let me say this is a workaround hack. We didn't do anything illegal, and all the interfaces we used are regular Solaris interfaces, but it comes dangerously close to the "don't do this at home, folks" category. So think twice before you use it in a production environment, but it's definitely fun to do.
In short, what we do is: mount an NFS filesystem; create a file; lofiadm this file; newfs the new lofi block device; mount the device on the zonepath location; define a zone with the correct zonepath; install and boot the zone. (Sounds straightforward, right?)

Before we start
Make sure you have an NFS share that you can mount:
```
root@nfsserv # share
-               /export   rw   ""
root@nfsserv # pwd
/export/zones
```

Mount the filesystem and create a file
This means we create a mount point (/zonemount), mount the filesystem on it, and create the file:
```
root@zoneserver # mkdir /zonemount
root@zoneserver # mount nfsserv:/export/zones /zonemount
root@zoneserver # df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    8242397 5397348 2762626    67%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  443904     400  443504     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
fd                         0       0       0     0%    /dev/fd
swap                  443560      56  443504     1%    /var/run
swap                  446320    2816  443504     1%    /tmp
nfsserv:/export/zones 17088954 3549610 13368455  21%    /zonemount
root@zoneserver # cd /zonemount
root@zoneserver # mkfile 300M zone1_file
root@zoneserver # ls -l
total 614720
-rw-------   1 root     root     314572800 Jun  1  2005 zone1_file
```

lofiadm a new device and newfs this device
Now that we have a 300MB file, we can use lofiadm(1M) to create a new block device in /dev/lofi. In this case we choose to specify which device (/dev/lofi/1) we want. Once we've done that, we use newfs(1M) to put a filesystem on our device/file:
```
root@zoneserver # lofiadm -a /zonemount/zone1_file /dev/lofi/1
root@zoneserver # newfs /dev/lofi/1
newfs: construct a new file system /dev/rlofi/1: (y/n)? y
/dev/rlofi/1: 614400 sectors in 1024 cylinders of 1 tracks, 600 sectors
        300.0MB in 64 cyl groups (16 c/g, 4.69MB/g, 2240 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 9632, 19232, 28832, 38432, 48032, 57632, 67232, 76832, 86432,
 518432, 528032, 537632, 547232, 556832, 566432, 576032, 585632, 595232,
 604832,
```

Create a mount point and mount the new filesystem on it
Now that we have the device, we need to mount it where we plan to have our zonepaths. So first we mkdir(1) a new mount point, /zones/zone1, and then mount the new filesystem on top of it:
```
root@zoneserver # mkdir /zones/zone1
root@zoneserver # mount /dev/lofi/1 /zones/zone1
root@zoneserver # chmod 700 /zones/zone1
```
Note: Because we created the filesystem ourselves instead of zoneadm doing it for us, we need to use chmod(1) to make the directory accessible only to the global root user.

Configure and Install the zone
So now we'll quickly create a fairly standard non-global zone and tell it to install:
```
root@zoneserver # zonecfg -z zone1
zone1: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:zone1> create
zonecfg:zone1> set zonepath=/zones/zone1
zonecfg:zone1> add net
zonecfg:zone1:net> set physical=hme0
zonecfg:zone1:net> set address=192.168.1.101/24
zonecfg:zone1:net> end
zonecfg:zone1> verify
zonecfg:zone1> exit
root@zoneserver # zoneadm -z zone1 install
Preparing to install zone.
Creating list of files to copy from the global zone.
Copying <2568> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <946> packages on the zone.
Initialized <946> packages on zone.
Zone is initialized.
The file contains a log of the zone installation.
root@zoneserver # zoneadm list -cv
  ID NAME     STATUS       PATH
   0 global   running      /
   - zone1    installed    /zones/zone1
root@zoneserver # df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    8242397 5397356 2762618    67%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  442520     400  442120     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
fd                         0       0       0     0%    /dev/fd
swap                  442176      56  442120     1%    /var/run
swap                  445136    3016  442120     1%    /tmp
nfsserv:/export/zones 17088954 3549875 13368190  21%    /zonemount
/dev/lofi/1           288239   64799  194617    25%    /zones/zone1
root@zoneserver # zoneadm -z zone1 boot
root@zoneserver # zoneadm list -cv
  ID NAME     STATUS       PATH
   0 global   running      /
   1 zone1    running      /zones/zone1
```
Voila! It works
Note: df(1M) now shows us both mounts; the zone install used only 64799 KB of the 288239 KB available on our lofi device-file-thing.

Why this works
So why does this work? Well, all of the NFS credential stuff that would normally break booting a non-global zone from NFS is now hidden from the zone. All the NFS-related work is done by the global zone with its credentials; the NFS server never even sees any of the local zone's files, nor does it need to know about the IP addresses the non-global zone is using. On the other side, the non-global zone never really knows its files are not on the local system: all its files are encapsulated in the filesystem living in the lofi block device. That this block device is actually a file on an NFS mount is not visible to it, nor does it care.

So can I use it?
Well, that's up to you. First, as you can see above, this is a fairly laborious process, and the more steps there are, the more opportunities to mess up. Of course you can automate most of this, but the debuggability and measurability for someone who hasn't seen your setup still aren't very high.
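If you did want to automate it, the steps above could be wrapped in a small script. This is only a hedged sketch, not a supported tool: the names (nfsserv, /zonemount, zone1, /dev/lofi/1) are just the ones from this walkthrough, and a real run needs root in a Solaris 10 global zone. By default it only echoes the commands (DRYRUN=1); set DRYRUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch: automate the lofi-on-NFS zone filesystem setup shown above.
# Hypothetical helper using the walkthrough's names; echo-only unless DRYRUN=0.

DRYRUN=${DRYRUN:-1}
ZONENAME=${ZONENAME:-zone1}
NFSPATH=${NFSPATH:-nfsserv:/export/zones}
MNT=${MNT:-/zonemount}
SIZE=${SIZE:-300M}
LOFIDEV=${LOFIDEV:-/dev/lofi/1}
ZONEPATH=${ZONEPATH:-/zones/$ZONENAME}

run() {
    if [ "$DRYRUN" = "1" ]; then
        echo "$@"                  # dry run: just show the command
    else
        "$@" || exit 1             # real run: stop on the first failure
    fi
}

run mkdir -p "$MNT" "$ZONEPATH"
run mount "$NFSPATH" "$MNT"
run mkfile "$SIZE" "$MNT/${ZONENAME}_file"
run lofiadm -a "$MNT/${ZONENAME}_file" "$LOFIDEV"
# newfs asks for confirmation, so pipe it a 'y' on a real run
run sh -c "echo y | newfs $LOFIDEV"
run mount "$LOFIDEV" "$ZONEPATH"
run chmod 700 "$ZONEPATH"          # only global root may enter the zonepath
```

From there, zonecfg and zoneadm proceed exactly as in the transcript above.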
And then there is version compliance. Having the zone root on NFS begs you to use it to migrate zones from one system to another, but be aware that every zone has its package and patch list/revision embedded in it. So when you move zones around, they should only move to a system with an equal patch level, or you could see very strange behavior. What's going on there is worth a whole series of blog entries, and this one is getting pretty long as it is...
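If you do decide to move the file-backed zone to another (identically patched) machine, the setup unwinds in reverse order. A hedged sketch, using only the standard interfaces already shown (zoneadm(1M), umount(1M), lofiadm(1M)) and the names from this walkthrough; like the setup, it needs global-zone root, so by default it only echoes the commands.

```shell
#!/bin/sh
# Sketch: tear the setup down in reverse order, e.g. before moving the
# backing file to another system. Echo-only unless DRYRUN=0.

DRYRUN=${DRYRUN:-1}
run() {
    if [ "$DRYRUN" = "1" ]; then echo "$@"; else "$@" || exit 1; fi
}

run zoneadm -z zone1 halt      # stop the zone before touching its root fs
run umount /zones/zone1        # unmount the lofi-backed filesystem
run lofiadm -d /dev/lofi/1     # detach the lofi device from the file
run umount /zonemount          # finally release the NFS mount itself
```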
Anyway, have fun.