Containers in SX build 56

The many Resource Management (RM) features in Solaris have been developed and have evolved over several years and several releases. We have resource controls, resource pools, resource capping and the Fair Share Scheduler (FSS). We have rctls, projects, tasks, cpu-shares, processor sets and rcapd(1M). Each of these features has its own commands and syntax for configuration. In some cases, particularly with resource pools, the syntax is quite complex and long sequences of commands are needed to configure a pool. When you first look at RM, it is not immediately clear when to use one feature vs. another, or whether some combination of these features is needed to achieve your RM objectives.

In Solaris 10 we introduced Zones, a lightweight system virtualization capability. Marketing coined the term 'containers' to refer to the combination of Zones and RM within Solaris. However, the integration between the two was fairly weak. Within Zones we had the 'rctl' configuration option, which you could use to set a couple of zone-specific resource controls, and the 'pool' property, which could be used to bind the zone to an existing resource pool, but that was it. Just setting the 'zone.cpu-shares' rctl wouldn't actually give you the right CPU shares unless you also configured the system to use FSS, and that was a separate step that was easily overlooked. Without the correct configuration of these various, disparate components, even a simple test, such as a fork bomb within a zone, could disrupt the entire system.

As users started experimenting with Zones, we found that many of them were not leveraging the RM capabilities provided by the system. We would get dinged in evaluations because Zones without a correct RM configuration didn't provide all of the containment users needed. We always expected Zones and RM to be used together, but due to the complexity of the RM features and the loose integration between the two, few Zones users actually had a proper RM configuration. In addition, our RM for memory control was limited to rcapd running within a zone and capping RSS on projects, which wasn't really adequate.

About nine months ago the Zones engineering team started a project to improve this situation. We didn't want to just paper over the complexity with things like a GUI or wizards, so it took a good deal of design work before we felt we had hit upon some key abstractions that could truly simplify the interaction between the two components. Eventually we settled on the idea of organizing the RM features into 'dedicated' and 'capped' configurations for the zone. We enhanced resource pools with the idea of a 'temporary pool', which can be instantiated dynamically when a zone boots. We enhanced rcapd(1M) so that physical memory capping can be done from the global zone. Steve Lawrence did a lot of work to improve resident set size (RSS) accounting and added new rctls for maximum swap and locked memory; together these features significantly improve RM of memory for Zones. We then enhanced the Zones infrastructure to automatically set up the various RM features configured for the zone. Although the project made many smaller improvements, the key ideas are the two new configuration options in zonecfg(1M): when configuring a zone you can now configure 'dedicated-cpu' and 'capped-memory'. Going forward, as additional RM features are added, we anticipate this model will evolve gracefully to add 'dedicated-memory' and 'capped-cpu' configurations. We also think the concept can easily be extended to support RM for other key parts of the system, such as the network or storage subsystems.

Here is our simple diagram of how we eventually unified the RM view within Zones.
       | dedicated                | capped
-------+--------------------------+------------------------------
cpu    | temporary processor set  | cpu-cap rctl*
memory | temporary memory set*    | rcapd, swap and locked rctls

* Memory sets and cpu caps are under development but are not yet part of Solaris.

With these enhancements, it is now almost trivial to configure RM for a zone. For example, to configure a resource pool with a processor set of one to four CPUs, all you do in zonecfg is:
zonecfg:my-zone> add dedicated-cpu
zonecfg:my-zone:dedicated-cpu> set ncpus=1-4
zonecfg:my-zone:dedicated-cpu> set importance=10
zonecfg:my-zone:dedicated-cpu> end
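The ncpus value is a range rather than a fixed count, which is what lets the temporary processor set be sized dynamically. The sketch below is a minimal illustration of how such a value can be read as a minimum and maximum CPU count. It is not zonecfg's actual parser; the helper name parse_ncpus is hypothetical, and the correspondence to the pools properties pset.min and pset.max is an assumption about how the temporary pool is sized.

```python
# Hypothetical helper, not zonecfg's real parser: read a dedicated-cpu
# ncpus value as (minimum, maximum) CPUs for the zone's temporary
# processor set. A single value such as "2" is treated as a fixed size;
# a range such as "1-4" gives separate lower and upper bounds.


def parse_ncpus(value: str) -> tuple[int, int]:
    """Return (min_cpus, max_cpus) for an ncpus string like "1-4" or "2"."""
    parts = value.strip().split("-")
    if len(parts) == 1:
        lo = hi = int(parts[0])
    elif len(parts) == 2:
        lo, hi = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"malformed ncpus value: {value!r}")
    # A set must contain at least one CPU, and the range must not be inverted.
    if lo < 1 or hi < lo:
        raise ValueError(f"invalid ncpus range: {value!r}")
    return lo, hi


if __name__ == "__main__":
    lo, hi = parse_ncpus("1-4")
    # Assumption: these bounds would become the pset.min/pset.max of the
    # temporary pool created when the zone boots.
    print(f"pset.min={lo} pset.max={hi}")
```

Running it prints pset.min=1 pset.max=4 for the session above; the system is then free to grow or shrink the set within those bounds.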
To configure memory caps, you would do:
zonecfg:my-zone> add capped-memory
zonecfg:my-zone:capped-memory> set physical=50m
zonecfg:my-zone:capped-memory> set swap=128m
zonecfg:my-zone:capped-memory> set locked=10m
zonecfg:my-zone:capped-memory> end
All of the complexity of configuring the associated RM capabilities is then handled behind the scenes when the zone boots. Likewise, when you migrate a zone to a new host, these RM settings migrate too.
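Because the whole RM configuration is just a handful of zonecfg subcommands, the same settings can also be applied non-interactively by putting them in a command file and running zonecfg -z my-zone -f with that file. The sketch below assembles such a file from the values used in the sessions above. The helper rm_command_file and the file name my-zone-rm.cfg are hypothetical; only the subcommands and property names come from the examples in this post.

```python
# Hypothetical helper: build a zonecfg command file containing the
# dedicated-cpu and capped-memory settings from the interactive sessions,
# followed by "commit" to save the configuration.


def rm_command_file(ncpus: str, importance: int,
                    physical: str, swap: str, locked: str) -> str:
    """Return zonecfg subcommands, one per line, for the zone's RM settings."""
    lines = [
        "add dedicated-cpu",
        f"set ncpus={ncpus}",
        f"set importance={importance}",
        "end",
        "add capped-memory",
        f"set physical={physical}",
        f"set swap={swap}",
        f"set locked={locked}",
        "end",
        "commit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Redirect this output to a file (e.g. my-zone-rm.cfg), then run:
    #   zonecfg -z my-zone -f my-zone-rm.cfg
    print(rm_command_file("1-4", 10, "50m", "128m", "10m"), end="")
```

Keeping the settings in a file like this makes it easy to apply an identical RM configuration to several zones.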

Over the course of the project we discussed these ideas within the OpenSolaris Zones community, where we received much good input that we used in the final design and implementation. The full details of the project are available here and here.

This work is available in Solaris Express build 56, which was just posted. Hopefully folks using Zones will get a chance to try out the new features and let us know what they think. The entire core engineering team actively participates in the zones discuss list, and we're happy to try to answer any questions or just hear your thoughts.
