Consolidation using Solaris Zones is widely adopted. In many cases, people run all zones on all available CPUs, which is great for overall utilization. In that case, Solaris does all the scheduling, making sure that the best CPU is chosen for each process and that all resources are distributed fairly among all applications. However, there are cases where you want to dedicate a certain set of CPUs to one or more zones, for example to comply with license restrictions or to enforce a stricter separation between different workloads.

This separation is achieved either by using the “dedicated-cpu” setting in the zone’s configuration, or by binding the zone to an existing resource pool, which in turn contains a processor set. The technology in both cases is the same: with “dedicated-cpu”, Solaris simply creates a temporary resource pool automatically when the zone is started. The effect of using a processor set is that the CPUs assigned to it are available exclusively to the zones associated with that set. These zones can use exactly those CPUs, no more and no less. Anything else running on the system (the global zone and any other zones) can no longer execute on these CPUs.
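As a quick sketch (using the made-up zone name “myzone”), this is how a zone would be given eight dedicated “CPUs” in zonecfg:

root@mars:~# zonecfg -z myzone
zonecfg:myzone> add dedicated-cpu
zonecfg:myzone:dedicated-cpu> set ncpus=8
zonecfg:myzone:dedicated-cpu> end
zonecfg:myzone> commit
zonecfg:myzone> exit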
In this article, I’ll discuss (and hopefully answer) the question of which CPUs to include in such a processor set, and show how to figure out which zones currently run on which CPUs.
To avoid unnecessary confusion, let me define a few terms first, since there are multiple names in use for the various concepts:
- A CPU is a processor, consisting of one or more cores, caches, and optionally some I/O and/or memory controllers.
- A Core is one computation or execution unit on a CPU. (Not to be confused with the pipelines that it contains.)
- A Strand is an entry point into a core, which makes the core’s services available to the operating system.
For example, a SPARC M7 CPU consists of 32 cores. Each core provides 8 strands, so an M7 CPU provides 32*8=256 strands to the OS. The OS treats each of these strands as a fully-fledged execution unit and therefore shows 256 “CPUs”.
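You can see this view with psrinfo (illustrative output, here from the two-socket, 16-core system used in the examples below):

root@mars:~# psrinfo -p          # number of physical processors (sockets)
2
root@mars:~# psrinfo | wc -l     # each strand is presented as one "CPU"
256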
All modern multi-core CPUs include multiple levels of caches. The L3 cache is usually shared by all cores. The L2 and L1 caches are closer to the cores; they are smaller but faster and often dedicated to one or a small number of cores. (The M7 CPU applies different strategies, but each core owns its own, exclusive L1 cache.) Now, if multiple strands of the same core are used by the same process (or application), this can lead to relatively high hit rates in these caches. If, on the other hand, different processes use the same core, they will compete for the limited cache space, overwriting each other’s entries. We call this behavior “cache thrashing”. Solaris does a good job trying to prevent this. However, when using many zones, it is common to assign different zones to different sets of cores. Use whole cores (complete sets of 8 strands) to avoid sharing of cores between zones or applications. This also makes the most sense with regard to license capping, since you usually license your application by the number of cores.
So how can you make sure that your zones are bound correctly to whole, exclusive cores?
Solaris knows about the relation between strands, cores and CPUs (as well as the memory hierarchy, which I’ll not cover here). You can query this relation using kstat. For historical reasons (from the times when there were no multi-core or multi-strand CPUs), Solaris uses the term “CPU” for what we now call a strand:
root@mars:~# kstat -m cpu_info -s core_id -i 150
module: cpu_info                        instance: 150
name:   cpu_info150                     class:    misc
        core_id                         18

root@mars:~# kstat -m cpu_info -s chip_id -i 150
module: cpu_info                        instance: 150
name:   cpu_info150                     class:    misc
        chip_id                         1
In the above example, the “cpu” with ID 150 is a strand of core 18, which is located on CPU (chip) 1. You can discover all available strands, cores and CPUs this way; see the sketch below.
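Rather than querying one strand at a time, kstat’s -p option dumps the whole mapping in parseable form, one line per strand (a sketch, output trimmed to the first few strands of the system used below):

root@mars:~# kstat -p -m cpu_info -s core_id | head -4
cpu_info:0:cpu_info0:core_id    0
cpu_info:1:cpu_info1:core_id    0
cpu_info:2:cpu_info2:core_id    0
cpu_info:3:cpu_info3:core_id    0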
Usually, when you configure a processor set for a resource pool, you just tell it the minimum and maximum number of strands it should contain (where min=max is quite common). Optionally, you can also specify individual CPU IDs (strands) or, since Solaris 11.2, core IDs. The commands to do this are “pooladm” and “poolcfg”. (There is also the command “psrset”, but it only creates a processor set, not a resource pool, and its settings are not persistent, so it needs to be run after every reboot.) I already described the use of these commands a while ago.
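As a minimal sketch (pool and pset names are made up; 16 strands correspond to two whole cores on this system), creating such a pool and binding a zone to it looks roughly like this:

root@mars:~# pooladm -e                # enable the resource pools facility
root@mars:~# poolcfg -c 'create pset my_pset (uint pset.min = 16; uint pset.max = 16)'
root@mars:~# poolcfg -c 'create pool my_pool'
root@mars:~# poolcfg -c 'associate pool my_pool (pset my_pset)'
root@mars:~# pooladm -c                # commit and activate the configuration
root@mars:~# zonecfg -z myzone set pool=my_pool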
Now, to figure out which strands, cores or CPUs are assigned to a specific zone, you’d need to use kstat to find the association between strand IDs in your processor set and the corresponding cores and CPUs. Done manually, that’s a little painful, which is why I wrote a little script to do this for you:
root@mars:~# ./zonecores -h
usage: zonecores [-Sscl]
-S report whole Socket use
-s report shared use
-c report whole core use
-l list cpu overview
With the “-l” command-line option, it will give you an overview of the available CPUs and which zones are running on them. Here’s an example from a SPARC system with two 16-core CPUs:
root@mars:~# ./zonecores -l
#
# Socket, Core, Strand and Zone Overview
#
Socket  Core  Strands                          Zones
0       0     0,1,2,3,4,5,6,7                  db2,
0       1     8,9,10,11,12,13,14,15            db2,
0       2     16,17,18,19,20,21,22,23          none
0       3     24,25,26,27,28,29,30,31          db2,
0       4     32,33,34,35,36,37,38,39          db2,
0       5     40,41,42,43,44,45,46,47          db2,
0       6     48,49,50,51,52,53,54,55          db2,
0       7     56,57,58,59,60,61,62,63          coreshare,db1,
0       8     64,65,66,67,68,69,70,71          db2,
0       9     72,73,74,75,76,77,78,79          none
0       10    80,81,82,83,84,85,86,87          none
0       11    88,89,90,91,92,93,94,95          none
0       12    96,97,98,99,100,101,102,103      none
0       13    104,105,106,107,108,109,110,111  none
0       14    112,113,114,115,116,117,118,119  none
0       15    120,121,122,123,124,125,126,127  none
1       16    128,129,130,131,132,133,134,135  none
1       17    136,137,138,139,140,141,142,143  none
1       18    144,145,146,147,148,149,150,151  none
1       19    152,153,154,155,156,157,158,159  none
1       20    160,161,162,163,164,165,166,167  none
1       21    168,169,170,171,172,173,174,175  none
1       22    176,177,178,179,180,181,182,183  none
1       23    184,185,186,187,188,189,190,191  none
1       24    192,193,194,195,196,197,198,199  none
1       25    200,201,202,203,204,205,206,207  none
1       26    208,209,210,211,212,213,214,215  none
1       27    216,217,218,219,220,221,222,223  none
1       28    224,225,226,227,228,229,230,231  none
1       29    232,233,234,235,236,237,238,239  none
1       30    240,241,242,243,244,245,246,247  db2,
1       31    248,249,250,251,252,253,254,255  none
Using the options -S and -c, you can check whether your zones use whole sockets (-S) or whole cores (-c). With -s, you can check whether several zones share one or more cores, which may or may not be intentional, depending on the use case. Here’s an example with various pools and zones on the same system as above:
root@mars:~# ./zonecores -Ssc
#
# Checking Socket Affinity (16 cores per socket)
#
INFO - Zone db2 using 2 sockets for 8 cores.
OK - Zone db1 using 1 sockets for 1 cores.
OK - Zone capped7 using default pool.
OK - Zone coreshare using 1 sockets for 1 cores.
#
# Checking Core Resource Sharing
#
OK - Core 0 used by only one zone.
OK - Core 1 used by only one zone.
OK - Core 3 used by only one zone.
OK - Core 30 used by only one zone.
OK - Core 4 used by only one zone.
OK - Core 5 used by only one zone.
OK - Core 6 used by only one zone.
INFO - Core 7 used by 2 zones!
-> coreshare
-> db1
OK - Core 8 used by only one zone.
#
# Checking Whole Core Assignments
#
OK - Zone db2 using all 8 strands of core 0.
OK - Zone db2 using all 8 strands of core 1.
OK - Zone db2 using all 8 strands of core 3.
OK - Zone db2 using all 8 strands of core 30.
OK - Zone db2 using all 8 strands of core 4.
OK - Zone db2 using all 8 strands of core 5.
FAIL - only 7 strands of core 6 in use for zone db2.
FAIL - only 1 strands of core 8 in use for zone db2.
OK - Zone db1 using all 8 strands of core 7.
OK - Zone coreshare using all 8 strands of core 7.
Info: 1 instances of core sharing found.
Info: 1 instances of socket spanning found.
Warning: 2 issues found with whole core assignments.
While this mostly speaks for itself, here are some comments:
- Zone db1 uses a resource pool with 8 strands from one core.
- Zone coreshare also uses that same pool.
- Zone db2 uses a resource pool with 64 strands coming from cores on two different CPUs. It only uses 7 of the 8 strands of core 6, while the 8th strand comes from core 8. This is probably not intentional. It would make more sense to use all 8 strands of the same core, which avoids cache sharing and reduces the number of cores to license by one. It might also be beneficial to use all 8 cores from the same CPU: in that case, Solaris would attempt to allocate memory local to that CPU to avoid remote memory access.
- Zone capped7 is configured with the option “capped-cpu: ncpus=7”. This is implemented with the zone.cpu-cap resource control, which limits the zone’s overall CPU consumption while its processes continue to run on all available CPUs in the default pool; see the sketch below.
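For comparison, here is a sketch of how such a capped zone would be configured; the cap limits overall CPU consumption, but does not restrict which CPUs the zone’s processes run on:

root@mars:~# zonecfg -z capped7
zonecfg:capped7> add capped-cpu
zonecfg:capped7:capped-cpu> set ncpus=7
zonecfg:capped7:capped-cpu> end
zonecfg:capped7> commit
zonecfg:capped7> exit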
The script is available for download here: zonecores
I also wrote a more detailed discussion of all of this, with examples of how to reconfigure your pool configuration, in MOS DocID 2116794.1.
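As a rough sketch of such a reconfiguration (the strand IDs and the pset name db2_pset are assumptions for illustration), the stray core-8 strand from the db2 example above could be swapped for the missing core-6 strand on the running system:

root@mars:~# poolcfg -dc 'transfer to pset pset_default (cpu 64)'  # return the stray strand of core 8
root@mars:~# poolcfg -dc 'transfer to pset db2_pset (cpu 55)'      # complete core 6 instead
root@mars:~# pooladm -s                                            # save the running configuration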
Some links to further reading:
- Solaris 11.3 System Administration Commands: poolcfg and pooladm
- Solaris 11.3 Zones Configuration Resources
- Whitepaper: Hard Partitioning with Oracle Solaris Zones
