New Zones Features
By Jeff Victor-Oracle on Sep 05, 2007
This update to Solaris 10 has many new features. Of those, many enhance Solaris Containers either directly or indirectly. This update brings the most important changes to Containers since they were introduced in March of 2005. A brief introduction to them seems appropriate, but first a review of the previous update.
Solaris 10 11/06 added four features to Containers. One of them is called "configurable privileges" and allows the platform administrator to tailor the abilities of a Container to the needs of its application. I blogged about configurable privileges before, so I won't say any more here.
At least as important as that feature was the new ability to move (also called 'migrate') a Container from one Solaris 10 computer to another. This uses the 'detach' and 'attach' sub-commands to zoneadm(1M).
Other minor new features included:
- rename a zone (i.e. Container)
- move a zone to a different place in the file system on the same computer
New Features in Solaris 10 8/07 that Enhance Containers
New Resource Management Features

Solaris 10 8/07 has improved the resource management features of Containers. Some of these are new resource management features and some are improvements to the user interface. First I will describe three new "RM" features.
Earlier releases of Solaris 10 included the Resource Capping Daemon. This tool enabled you to place a 'soft cap' on the amount of RAM (physical memory) that an application, user or group of users could use. Excess usage would be detected by rcapd. When it did, physical memory pages owned by that entity would be paged out until the memory usage decreased below the cap.
Although it was possible to apply this tool to a zone, it was cumbersome and required cooperation from the administrator of the Container. In other words, the root user of a capped Container could change the cap. This made it inappropriate for potentially hostile environments, including service providers.
Solaris 10 8/07 enables the platform administrator to set a physical memory cap on a Container using an enhanced version of rcapd. Cooperation of the Container's administrator is not necessary - only the platform administrator can enable or disable this service or modify the caps. Further, usage has been greatly simplified to the following syntax:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set physical=500m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit

The next time the Container boots, this cap (500MB of RAM) will be applied to it. The cap can also be modified while the Container is running, with:
global# rcapadm -z myzone -m 600m

Because this cap does not reserve RAM, you can over-subscribe RAM usage. The only drawback is the possibility of paging.
For more details, see the online documentation.
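Once caps are in place, the capping daemon's activity can be watched from the global zone with rcapstat(1). A brief sketch (the interval and count below are just example values):

```shell
# Report physical memory cap statistics for all capped zones,
# sampling every 5 seconds, 5 times. Columns include current RSS,
# the cap, and recent paging activity per zone.
global# rcapstat -z 5 5
```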
Virtual memory (i.e. swap space) can also be capped. This is a 'hard cap.' In a Container which has a swap cap, an attempt by a process to allocate more VM than is allowed will fail. (If you are familiar with system calls: malloc() will fail with ENOMEM.)
The syntax is very similar to the physical memory cap:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set swap=1g
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit

This limit can also be changed for a running Container:
global# prctl -n zone.max-swap -v 2g -t privileged -r -e deny -i zone myzone

Just as with the physical memory cap, if you want to change the setting both for the running Container and for the next time it boots, you must use zonecfg as well as prctl or rcapadm.
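The current value of the swap cap on a running Container can also be inspected with prctl, without changing anything (the zone name 'myzone' is just this article's example):

```shell
# Display the current zone.max-swap resource control for the zone
global# prctl -n zone.max-swap -i zone myzone
```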
The third new memory cap is locked memory. This is the amount of physical memory that a Container can lock down, i.e. prevent from being paged out. Because a Container now has the proc_lock_memory privilege by default, it is wise to set this cap for all Containers.
Here is an example:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set locked=100m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
Simplified Resource Management Features

Dedicated CPUs
Many existing resource management features have a new, simplified user interface. For example, "dedicated-cpus" re-use the existing Dynamic Resource Pools features. But instead of needing many commands to configure them, configuration can be as simple as:
global# zonecfg -z myzone
zonecfg:myzone> add dedicated-cpu
zonecfg:myzone:dedicated-cpu> set ncpus=1-3
zonecfg:myzone:dedicated-cpu> end
zonecfg:myzone> exit

After using that command, when that Container boots, Solaris:
- removes a CPU from the default pool
- assigns that CPU to a newly created temporary pool
- associates that Container with that pool, i.e. only schedules that Container's processes on that CPU
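The temporary pool can be observed from the global zone with the existing pool tools; the pool name shown in the comment below follows the naming convention for temporary pools, assuming the example zone name:

```shell
# List resource pools and their utilization; the Container's
# temporary pool appears with a name like SUNWtmp_myzone.
global# poolstat

# Display the full active pool configuration, including
# processor set sizes.
global# pooladm
```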
Also, four existing project resource controls have been applied to Containers as zone-wide controls:
global# zonecfg -z myzone
zonecfg:myzone> set max-shm-memory=100m
zonecfg:myzone> set max-shm-ids=100
zonecfg:myzone> set max-msg-ids=100
zonecfg:myzone> set max-sem-ids=100
zonecfg:myzone> exit

Fair Share Scheduler
A commonly used method to prevent "CPU hogs" from impacting other workloads is to assign a number of CPU shares to each workload, or to each zone. The relative number of shares assigned per zone guarantees a relative minimum amount of CPU power. This is less wasteful than dedicating a CPU to a Container that will not completely utilize the dedicated CPU(s).
Several steps were needed to configure this in the past. Solaris 10 8/07 simplifies this greatly: now just two steps are needed. First, the system must use FSS as the default scheduler. This command tells the system to use FSS as the default scheduler the next time it boots:
global# dispadmin -d FSS

Also, the Container must be assigned some shares:
global# zonecfg -z myzone
zonecfg:myzone> set cpu-shares=100
zonecfg:myzone> exit

Shared Memory Accounting
One feature simplification is not a reduced number of commands, but reduced complexity in resource monitoring. Prior to Solaris 10 8/07, the accounting of shared memory pages had an unfortunate subtlety. If two processes in a Container shared some memory, per-Container summaries counted the shared memory usage once for every process that was sharing the memory. It would appear that a Container was using more memory than it really was.
This was changed in 8/07. Now, in the per-Container usage section of prstat and similar tools, shared memory pages are only counted once per Container.
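The per-Container summary mentioned above comes from prstat's zone-aware mode; a quick way to watch per-Container CPU and memory usage from the global zone is:

```shell
# Show per-process activity plus a one-line summary per zone,
# updating every 5 seconds.
global# prstat -Z 5
```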
Global Zone Resource Management

Solaris 10 8/07 adds the ability to persistently assign resource controls to the global zone and its processes. These controls can be applied:
- capped-memory: physical, swap, locked
- dedicated-cpu: ncpus, importance
global# zonecfg -z global
zonecfg:global> set cpu-shares=100
zonecfg:global> set scheduling-class=FSS
zonecfg:global> exit

Use those features with caution. For example, assigning a physical memory cap of 100MB to the global zone will surely cause problems...
New Boot Arguments

The following boot arguments can now be used:
| Argument or Option | Meaning |
|---|---|
| -s | Boot to the single-user milestone |
| -m <milestone> | Boot to the specified milestone |
| -i </path/to/init> | Boot the specified program as 'init'. This is only useful with branded zones. |
Allowed syntaxes include:
global# zoneadm -z myzone boot -- -s
global# zoneadm -z yourzone reboot -- -i /sbin/myinit
ozone# reboot -- -m verbose

In addition, these boot arguments can be stored with zonecfg, for later boots.
global# zonecfg -z myzone
zonecfg:myzone> set bootargs="-m verbose"
zonecfg:myzone> exit
Configurable Privileges

Of the three DTrace privileges, dtrace_proc and dtrace_user can now be assigned to a Container. This allows the use of DTrace from within a Container. Of course, even the root user in a Container is still not allowed to view or modify kernel data, but DTrace can be used in a Container to look at system call information and profiling data for user processes.
Also, the privilege proc_priocntl can be added to a Container to enable the root user of that Container to change the scheduling class of its processes.
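Both the DTrace privileges and proc_priocntl are granted through the Container's privilege limit. A sketch of how the limitpriv property might be set for the example zone (which privileges you add depends on your needs):

```shell
global# zonecfg -z myzone
zonecfg:myzone> set limitpriv=default,dtrace_proc,dtrace_user,proc_priocntl
zonecfg:myzone> exit
```

The change takes effect the next time the Container boots.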
IP Instances

This is a new feature that allows a Container to have exclusive access to one or more network interfaces. No other Container, even the global zone, can send or receive packets on that NIC.
This also allows a Container to control its own network configuration, including routing, IP Filter, the ability to be a DHCP client, and others. The syntax is simple:
global# zonecfg -z myzone
zonecfg:myzone> set ip-type=exclusive
zonecfg:myzone> add net
zonecfg:myzone:net> set physical=bge1
zonecfg:myzone:net> end
zonecfg:myzone> exit
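After such a Container boots, its IP type can be confirmed from the global zone, and its networking is managed from inside the zone rather than by the platform administrator. A hedged sketch, carrying over the bge1 interface from the example above:

```shell
# From the global zone: confirm the Container's IP type
global# zonecfg -z myzone info ip-type

# From inside the Container: the zone administrator manages the NIC
# directly, e.g. plumbing the interface and requesting a DHCP lease.
myzone# ifconfig bge1 plumb
myzone# ifconfig bge1 dhcp start
```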
IP Filter Improvements

Some network architectures call for two systems to communicate via a firewall box or other piece of network equipment. It is often desirable to create two Containers that communicate via an external device, for similar reasons. Unfortunately, prior to Solaris 10 8/07 that was not possible. In 8/07 the global zone administrator can configure such a network architecture with the existing IP Filter commands.
Upgrading and Patching Containers with Live Upgrade

Solaris 10 8/07 adds the ability to use Live Upgrade tools on a system with Containers. This makes it possible to apply an update to a zoned system, e.g. updating from Solaris 10 11/06 to Solaris 10 8/07. It also drastically reduces the downtime necessary to apply some patches.
The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching - each zone must be patched when a patch is applied. If the patch must be applied while the system is down, the downtime can be significant.
Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched while the Original Boot Environment (OBE) is still running its Containers and their applications. After the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time it takes to re-boot the system.
An additional benefit can be seen if there is a problem with the patch and that particular application environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem is investigated.
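The patching workflow described above might look like the following sketch; the boot environment name, patch directory, and patch ID are placeholders, not values from this article:

```shell
# Create an Alternate Boot Environment from the running system
global# lucreate -n s10-patched

# Apply a patch to the ABE while the OBE keeps running its Containers
global# luupgrade -t -n s10-patched -s /var/tmp/patches 123456-01

# Activate the ABE, then reboot into it (use init, not reboot,
# so the Live Upgrade activation completes properly)
global# luactivate s10-patched
global# init 6
```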
Branded Zones (Branded Containers)

Sometimes it would be useful to run an application in a Container, but the application is not yet available for Solaris, or is not available for the version of Solaris being run. To run such an application, a special Solaris environment could be created that only runs applications for that version of Solaris, or for that operating system.
Solaris 10 8/07 contains a new framework called Branded Zones. This framework enables the creation and installation of Containers that are not the default 'native' type of Containers, but have been tailored to run 'non-native' applications.
Solaris Containers for Linux Applications

The first brand to be integrated into Solaris 10 is called 'lx'. This brand is intended for x86 applications which run well on CentOS 3 or Red Hat Enterprise Linux 3, and it is specific to x86 computers. The name of this feature is Solaris Containers for Linux Applications.
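Configuring an lx Container uses the same zonecfg workflow, with a brand template selecting 'lx'; the zone name, zonepath, and install image path below are placeholders:

```shell
global# zonecfg -z centos
zonecfg:centos> create -t SUNWlx
zonecfg:centos> set zonepath=/zones/centos
zonecfg:centos> exit

# Install the zone from a Linux filesystem image
global# zoneadm -z centos install -d /path/to/centos_fs_image.tar
```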
This was only a brief introduction to these many new and improved features. Details are available in the usual places, including http://docs.sun.com, http://sun.com/bigadmin, and http://www.sun.com/software/solaris/utilization.jsp.