Tuesday Oct 01, 2013

CPU-DR for Zones

In my last entry, I described how to change the memory configuration of a running zone.  The natural next question is whether that also works with CPUs that have been assigned to a zone.  The answer, of course, is "yes".

You might wonder why that would be necessary in the first place.  After all, there's the Fair Share Scheduler, which is extremely capable of managing zones' CPU usage.  However, there are reasons to assign dedicated CPU resources to zones: licensing is one, SLAs with specified CPU requirements are another.  In such cases, you configure a fixed number of CPUs (more precisely, strands) for a zone.  Being able to change this configuration on the fly then becomes desirable.  I'll show how to do that in this blog entry.

In general, there are two ways to assign exclusive CPUs to a zone.  The classic approach is to use a resource pool with an associated processor set.  One or more zones can then be bound to that pool.  The easier solution is to use the parameter "dedicated-cpu" directly when configuring the zone.  In this second case, Solaris will create a temporary pool to manage these resources.  So effectively, the implementation is the same in both cases.  That makes it clear how to change the CPU configuration in both cases: by changing the pool.  If you do this in the classic approach, the change to the pool will be persistent.  If you are working with the temporary pool created for the zone, you will also need to change the zone's configuration if you want the change to survive a zone restart.
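
To make the two approaches concrete, here is a minimal sketch of each for a zone named orazone.  The pool and pset names ora_pool and ora_pset are made up for illustration, and the CPU counts are just example values.

```shell
# Approach 1 (classic): a persistent pool with its own processor set.
# "ora_pool" and "ora_pset" are hypothetical names.
pooladm -e                      # enable the resource pools facility
pooladm -s                      # save the current configuration to /etc/pooladm.conf
poolcfg -c 'create pset ora_pset (uint pset.min=1; uint pset.max=1)'
poolcfg -c 'create pool ora_pool'
poolcfg -c 'associate pool ora_pool (pset ora_pset)'
pooladm -c                      # activate the saved configuration
zonecfg -z orazone set pool=ora_pool

# Approach 2 (easier): let Solaris create a temporary pool at zone boot.
zonecfg -z orazone "add dedicated-cpu; set ncpus=1; end"
```

Use one approach or the other for a given zone, not both; zonecfg does not accept a pool setting together with dedicated-cpu.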

If you configured your zone with "dedicated-cpu", the temporary pool (and also the temporary processor set that goes along with it) will usually be called "SUNWtmp_<zonename>".  If not, you'll know the name of the pool.  In both cases, everything else is the same:

Let's assume a zone called orazone, currently configured with 1 CPU.  It's to be assigned a second CPU.  The current pool configuration is like this:
root@benjaminchen:~# pooladm                

system default
	string	system.comment 
	int	system.version 1
	boolean	system.bind-default true
	string	system.poold.objectives wt-load

	pool pool_default
		int	pool.sys_id 0
		boolean	pool.active true
		boolean	pool.default true
		int	pool.importance 1
		string	pool.comment 
		pset	pset_default

	pool SUNWtmp_orazone
		int	pool.sys_id 5
		boolean	pool.active true
		boolean	pool.default false
		int	pool.importance 1
		string	pool.comment 
		boolean	pool.temporary true
		pset	SUNWtmp_orazone

	pset pset_default
		int	pset.sys_id -1
		boolean	pset.default true
		uint	pset.min 1
		uint	pset.max 65536
		string	pset.units population
		uint	pset.load 687
		uint	pset.size 3
		string	pset.comment 

			int	cpu.sys_id 1
			string	cpu.comment 
			string	cpu.status on-line

			int	cpu.sys_id 3
			string	cpu.comment 
			string	cpu.status on-line

			int	cpu.sys_id 2
			string	cpu.comment 
			string	cpu.status on-line

	pset SUNWtmp_orazone
		int	pset.sys_id 2
		boolean	pset.default false
		uint	pset.min 1
		uint	pset.max 1
		string	pset.units population
		uint	pset.load 478
		uint	pset.size 1
		string	pset.comment 
		boolean	pset.temporary true

			int	cpu.sys_id 0
			string	cpu.comment 
			string	cpu.status on-line
As we can see in the definition of pset SUNWtmp_orazone, it has been assigned CPU #0.  To add another CPU to this pool, you'll need these two commands:
root@benjaminchen:~# poolcfg -dc 'modify pset SUNWtmp_orazone (uint pset.max=2)'
root@benjaminchen:~# poolcfg -dc 'transfer to pset SUNWtmp_orazone (cpu 1)'
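
If you do this more than once, it can help to generate the two commands and review them before running them.  The following is only a sketch: print_add_cpu is a hypothetical helper, not a Solaris tool, and it prints the commands instead of executing them.  The pset name is the one shown in the pooladm output above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the poolcfg commands that grow a
# processor set.  Arguments: pset name, new pset.max value, CPU id to move.
print_add_cpu() {
    pset=$1; max=$2; cpu=$3
    # Raise the upper limit first, then move the CPU into the pset.
    printf "poolcfg -dc 'modify pset %s (uint pset.max=%s)'\n" "$pset" "$max"
    printf "poolcfg -dc 'transfer to pset %s (cpu %s)'\n" "$pset" "$cpu"
}

print_add_cpu SUNWtmp_orazone 2 1
```

Once the printed commands look right, they can be pasted into a root shell in the global zone.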

To remove that CPU from the pool again, use these:

root@benjaminchen:~# poolcfg -dc 'transfer to pset pset_default (cpu 1)'
root@benjaminchen:~# poolcfg -dc 'modify pset SUNWtmp_orazone (uint pset.max=1)'

That's it.  If you used "dedicated-cpu" in your zone's configuration, you'll also need to change it there before the next zone reboot, or the change will be lost.  If you didn't, use the names of the pool and processor set you assigned to the zone in place of the temporary ones.
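
For the "dedicated-cpu" case, making the change persistent could look like the sketch below.  It assumes the zone orazone from the example and a new fixed count of two CPUs.

```shell
# Persist the new CPU count in the zone configuration so that the temporary
# pool is created with two CPUs at the next zone boot.
zonecfg -z orazone "select dedicated-cpu; set ncpus=2; end"

# Check the stored value.
zonecfg -z orazone info dedicated-cpu
```

This only edits the stored configuration; the running zone keeps whatever the pool currently provides.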


Monday Aug 19, 2013

Memory-DR for Zones

Zones allow you to limit their memory consumption.  The usual way to configure this is with the zone parameter "capped-memory" and its three sub-values "physical", "swap" and "locked".  "Physical" corresponds to the resource control "zone.max-rss", which is actual main memory.  "Swap" corresponds to "zone.max-swap", which is swap space, and "locked" corresponds to "zone.max-locked-memory", which is non-pageable memory, typically shared memory segments.  Swap and locked memory are hard limits that can't be exceeded.  RSS, the physical memory limit, is not quite as hard, being enforced by rcapd.  This daemon will try to page out the least active memory pages beyond the allowed amount.  Depending on the activity of the processes in question, this is more or less successful, but it will always result in paging activity, which slows down the memory-hungry processes in that zone.
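
The mapping between the three sub-values and their resource controls can be written down as a small lookup.  This is only a sketch: rctl_for is a hypothetical helper, and the loop prints the prctl commands that would display the current swap and locked limits of a zone named orazone rather than running them.

```shell
#!/bin/sh
# Map a capped-memory sub-value to the resource control named in the text.
rctl_for() {
    case "$1" in
        physical) echo "zone.max-rss" ;;
        swap)     echo "zone.max-swap" ;;
        locked)   echo "zone.max-locked-memory" ;;
        *)        echo "unknown capped-memory property: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the prctl commands that show the current limits.
# The physical cap is left out here because it is enforced by rcapd.
for prop in swap locked; do
    echo "prctl -n $(rctl_for "$prop") -i zone orazone"
done
```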

If you change any of these values using zonecfg, these changes will only be in effect after a reboot of the zone.  This is not as dynamic as one might be used to from the LDoms world.  But it can be, as I'd like to show you in a small example:

Let's assume a little zone with a memory configuration like this:

root@benjaminchen:~# zonecfg -z orazone info capped-memory
    physical: 512M
    [swap: 256M]
    [locked: 512M]

To change these values while the zone is in operation, you need to interact with two different sub-systems.   For physical memory, we'll need to talk to rcapd.  For swap and locked memory, we need prctl for the normal resource controls.  So, if I wanted to double all three limits for my zone, I'd need these commands:

root@benjaminchen:~# prctl -n zone.max-swap -v 512m -r -i zone orazone
root@benjaminchen:~# prctl -n zone.max-locked-memory -v 1g -r -i zone orazone
root@benjaminchen:~# rcapadm -z orazone -m 1g

These new values will be effective immediately - for rcapd after the next reconfigure-interval.  You can also change this interval with rcapadm.  Note that these changes are not persistent - if you reboot your zone, it will fall back to whatever was configured with zonecfg.  So to have both - persistent changes and immediate effect, you'll need to touch both tools.
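
The persistent half of that could look like the following sketch, which stores the doubled limits from the example in the configuration of the zone orazone.

```shell
# Store the doubled limits in the zone configuration so they survive a
# zone reboot.  Values match the prctl/rcapadm example above.
zonecfg -z orazone "select capped-memory; set physical=1g; set swap=512m; set locked=1g; end"

# Check the stored values.
zonecfg -z orazone info capped-memory
```

Run together with the prctl and rcapadm commands, this gives both the immediate effect and the persistent setting.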



Tuesday Apr 17, 2012

Solaris Zones: Virtualization that Speeds up Benchmarks

One of the first questions that typically comes up when I talk to customers about virtualization is the overhead involved.  Now we all know that virtualization with hypervisors comes with an overhead of some sort.  We should also all know that exactly how big that overhead is depends on the type of workload as much as it depends on the hypervisor used.  While there have been attempts to create standard benchmarks for this, quantifying hypervisor overhead is still mostly hidden in the mists of marketing and benchmark uncertainty.  However, what always raises eyebrows is when I come to Solaris Zones (called Containers in Solaris 10) as an alternative to hypervisor virtualization.  Since Zones are, greatly simplified, nothing more than a group of Unix processes contained by a set of rules which are enforced by the Solaris kernel, it is quite evident that there can't be much overhead involved.  Nevertheless, since many people think in hypervisor terms, there is almost always some doubt about this claim of zero overhead.  And as much as I find the explanation with technical details compelling, I also understand that seeing is so much better than believing.  So, look and see:

The Oracle benchmark teams are so convinced of the advantages of Solaris Zones that they actually use them in the configurations for public benchmarking.  Solaris resource management will also work in a non-Zones environment, but Zones make it so much easier to handle, especially with some of the more complex benchmark configurations.  There are numerous benchmark publications available using Solaris Containers, dating back to the days of the T5440.  Some recent examples, all of them world records, are:

The use of Solaris Zones is documented in all of these benchmark publications.

The benchmarking team also published a blog entry detailing how they make use of resource management with Solaris Zones to actually increase application performance.  That almost asks for calling this "negative overhead", if the term weren't somewhat misleading.

So, if you ever need to substantiate why Solaris Zones have no virtualization overhead, point to these (and probably some more) published benchmarks.


News, tips, and things worth knowing about SPARC, CMT, performance and its analysis, as well as experiences with Solaris on the server and the laptop.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

