Jeff Victor's Blog

All New Zonestat!

Jeff Victor
Principal Systems Engineer

Part 1


Recently I gave a brief overview of the enhancements available in Solaris 11 Express. I also hinted at more blog entries, mostly featuring Solaris Zones and network virtualization.


Before experimenting with new functions it's useful to have some tools to measure the results. With that in mind, this blog entry and its successor(s) will discuss new measurement tools that are in Solaris 11 Express: zonestat(1), flowstat(1M) and dlstat(1M). I will start with zonestat.


Zonestat Introduction



But first some history. For Solaris 10 I created an open-source tool I named "zonestat". That tool filled a need: one integrated view of the resource consumption and optional resource control settings of all running zones. The resources listed included CPUs, physical memory, virtual memory, and locked memory. Zonestat provided a "dashboard" that greatly eased the task of monitoring the resource usage of Solaris Zones.


That tool has these main drawbacks:


  1. It's written in Perl and uses a large set of existing Solaris commands
    to gather all of the data that it needs. Executing all of those commands
    for each data sample uses a significant amount of CPU time.
  2. It is a separate tool, not part of Solaris. It is not supported.
  3. It was originally intended as a prototype, a demonstration of what
    could be accomplished. I made a number of enhancements along the way,
    but for a while it wasn't clear whether it made sense to upgrade it for
    Solaris 11.

However, even with those shortcomings, that zonestat script was put into production
at a number of data centers.


In 2009 Solaris Engineering decided to write a fully supported version of
zonestat, as a new Solaris command. Instead of someone writing code in his
spare time (me), a member of the Solaris Zones Engineering Team (Steve
Lawrence) was assigned to write a comprehensive, efficient, fully featured tool
that achieved many of the same goals as the original zonestat, and many more.
Using the experience gained from the original zonestat script, a completely new
program (also called "zonestat") had the potential to solve all of the problems of the
open-source Perl script, and add new features which had been requested by
users of Solaris Zones, and other features which the Zones Engineering Team knew
would be useful.


And in Solaris 11 Express that potential was realized. Because the new zonestat performs almost all of the functions of the original zonestat script, and performs far more in addition, the rest of this blog entry (and the next one) will only discuss the new zonestat which is part of Solaris 11 Express.


The new zonestat(1) command has a plethora of options. These options
allow the user to list data:


  • for each of the system's zones, including the global zone and
    data specific to kernel processing but not directly attributable to any one
    zone
  • for any subset of zones
  • regarding one or more types of resources, in absolute units or as a
    portion of available or capped resources
  • regarding one or more instances of resources (e.g. a particular
    processor set)
  • that has been sorted by one or more output columns
  • that is human-readable output or output that can be easily parsed by
    a script or other program
  • that includes timestamps, in one of several formats
  • that includes regular aggregations (called "summary reports"), such as
    "highest value during the interval" or "average value during the interval"

Zonestat has a variety of uses. The most obvious is monitoring resource
usage of zones. Even if you don't use resource controls, zonestat will
help you by telling you when a zone is using a significant portion (or all!)
of the system's resources. Of course, zonestat really brings value to
systems that are using resource controls, making it easy to determine
which zones are near their caps - a sure sign that there is a problem
with that zone's workload or that the zone's cap is too low.


In addition, you can use zonestat to determine proper values for resource
controls. For example, you can deploy a workload in a zone and use
zonestat to determine the maximum amount of CPU capacity it uses. That
information will enable you to make better decisions about how many CPUs
to assign to that zone - if you have decided that the workload should have its own dedicated CPUs.
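
As a rough sketch of that sizing exercise, the Python snippet below takes a series of CPU USED values for one zone (for example, copied from successive zonestat intervals) and suggests a dedicated CPU count. The sample numbers, the function name and the 25% headroom factor are illustrative assumptions, not recommendations from the Solaris documentation.

```python
import math

def suggest_cpu_count(used_samples, headroom=1.25):
    """Suggest a dedicated-CPU count from observed CPU USED samples.

    used_samples: CPU USED values for one zone, in units of Solaris CPUs.
    headroom: assumed safety factor applied to the observed peak.
    """
    if not used_samples:
        raise ValueError("need at least one sample")
    peak = max(used_samples)
    # Round up to a whole CPU and never suggest fewer than one.
    return max(1, math.ceil(peak * headroom))

# Hypothetical samples gathered while the workload ran at peak load.
samples = [0.92, 1.10, 2.35, 3.10, 2.80]
print(suggest_cpu_count(samples))
```

The longer the observation period, the more likely it is that the peak you measure reflects the workload's real maximum.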


If you are not familiar with the resource management controls offered by
Solaris, you may wish to view the relevant documentation
before, during or after reading the rest of this entry. The book
"Oracle Solaris 10 System Virtualization Essentials" also describes all of
the resource controls available for Solaris 10 Zones, and how they can be
used to achieve various goals. Finally, the document
"Understanding the Security Capabilities of Solaris" approaches the same
content from a security perspective.


Now let's explore some of the interesting things you can do with zonestat.

Basics



The default output provides the data you would expect - basic information about the resource usage of all zones on the system. The command syntax can be simplified to this (omitting some features for now):

zonestat [options] interval [duration]

and the basic output looks like this:

GZ$ zonestat 5 2
Collecting data for first interval...
Interval: 1, Duration: 0:00:05
SUMMARY  Cpus/Online: 32/32  Physical: 31.8G  Virtual: 47.8G
         ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
    ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
 [total]  0.10 0.31%     -     - 3109M 9.52%     - 7379M 15.0%     -
[system]  0.01 0.04%     -     - 2797M 8.57%     - 7115M 14.5%     -
  global  0.08 0.51%     -     -  141M 0.43%     -  129M 0.26%     -
   zoneA  0.00 0.02%     -     - 43.7M 0.13%     - 35.4M 0.07%     -
   zoneB  0.00 0.02%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
   zoneC  0.00 0.04%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
   zoneD  0.00 0.02%     -     - 42.1M 0.12%     - 33.2M 0.06%     -
Interval: 2, Duration: 0:00:10
SUMMARY  Cpus/Online: 32/32  Physical: 31.8G  Virtual: 47.8G
         ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
    ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
 [total]  0.09 0.30%     -     - 3109M 9.52%     - 7379M 15.0%     -
[system]  0.01 0.03%     -     - 2797M 8.57%     - 7115M 14.5%     -
  global  0.08 0.51%     -     -  142M 0.43%     -  129M 0.26%     -
   zoneA  0.00 0.02%     -     - 43.7M 0.13%     - 35.4M 0.07%     -
   zoneB  0.00 0.02%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
   zoneC  0.00 0.02%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
   zoneD  0.00 0.02%     -     - 42.1M 0.12%     - 33.2M 0.06%     -

First, note that unlike other Solaris stat tools (e.g. vmstat), the first set of data is not a summary since the system booted. Instead, zonestat pauses for the time interval specified on the command line, at which point it displays data representing the first sample. (Zonestat doesn't actually collect the data. Its companion, zonestatd(1M), performs that service for all zonestat clients.)


Also, you probably noticed those two special lines, "[total]" and "[system]".
The first of those shows the total consumption of each resource
across the whole system. The lines labeled "[system]" show resource
consumption by the kernel or by processes that aren't associated with any
one zone.
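
If you want to post-process the human-readable output, the two bracketed pseudo-rows need to be told apart from real zones. Here is a minimal Python sketch, assuming the eleven-column default layout shown above; the field names are made up for this example and are not part of zonestat.

```python
SAMPLE = """\
Interval: 1, Duration: 0:00:05
SUMMARY Cpus/Online: 32/32 Physical: 31.8G Virtual: 47.8G
----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
ZONE USED %PART %CAP %SHRU USED PCT %CAP USED PCT %CAP
[total] 0.10 0.31% - - 3109M 9.52% - 7379M 15.0% -
[system] 0.01 0.04% - - 2797M 8.57% - 7115M 14.5% -
global 0.08 0.51% - - 141M 0.43% - 129M 0.26% -
zoneA 0.00 0.02% - - 43.7M 0.13% - 35.4M 0.07% -
zoneB 0.00 0.02% - - 42.0M 0.12% - 32.8M 0.06% -
zoneC 0.00 0.04% - - 42.0M 0.12% - 32.8M 0.06% -
zoneD 0.00 0.02% - - 42.1M 0.12% - 33.2M 0.06% -
"""

# Hypothetical field names, one per column of the default summary output.
FIELDS = ["zone", "cpu_used", "cpu_part", "cpu_cap", "cpu_shru",
          "phys_used", "phys_pct", "phys_cap",
          "virt_used", "virt_pct", "virt_cap"]

def parse_rows(text):
    """Return one dict per data row, skipping header and summary lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS) or parts[0] == "ZONE":
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows

rows = parse_rows(SAMPLE)
# "[total]" and "[system]" are pseudo-rows; everything else is a zone.
pseudo = {r["zone"]: r for r in rows if r["zone"].startswith("[")}
zones = [r for r in rows if not r["zone"].startswith("[")]
zone_cpu = sum(float(r["cpu_used"]) for r in zones)

print("zones:", [r["zone"] for r in zones])
print("CPU used by zones:", round(zone_cpu, 2))
print("CPU used by [system]:", pseudo["[system]"]["cpu_used"])
```

Because the displayed values are rounded, the zone rows plus "[system]" will not always add up exactly to "[total]".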


Zonestat can produce a great deal of information - more than will fit on
one line. Its various options allow you to view summary data - as provided
in the default - or to focus on a zone, or on a particular type or instance
of a resource, or a combination of those. Obviously, the header will be
tailored to the output requested.


The summary header looks like this:


Interval: 1, Duration: 0:00:05
SUMMARY  Cpus/Online: 32/32  Physical: 31.8G  Virtual: 47.8G
         ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
    ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP

The first line of each sample tells you the ordinal number of the
sample - not very useful if you're just checking a few seconds of data,
but pretty helpful when you're scanning through 3 days' worth of output.
The Duration field is similar, but measures the time elapsed since the command began.
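
When scanning a long capture, it can help to turn that first line into numbers. The Python sketch below assumes the line always has the "Interval: N, Duration: H:MM:SS" shape seen in these examples; the format used for runs longer than a day is not shown here, so treat the pattern as an assumption.

```python
import re
from datetime import timedelta

# Pattern based only on the sample lines in this entry.
INTERVAL_RE = re.compile(r"^Interval: (\d+), Duration: (\d+):(\d{2}):(\d{2})$")

def parse_interval_line(line):
    """Return (sample_number, elapsed) or None if the line does not match."""
    match = INTERVAL_RE.match(line.strip())
    if match is None:
        return None
    number, hours, minutes, seconds = (int(g) for g in match.groups())
    return number, timedelta(hours=hours, minutes=minutes, seconds=seconds)

number, elapsed = parse_interval_line("Interval: 2, Duration: 0:00:10")
print(number, elapsed.total_seconds())
```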


The SUMMARY line shows the quantity of CPUs that exist in the system and
how many of them are online. (I wrote an earlier blog entry about the
method that Solaris uses to count CPUs.) That line also shows the system's amount of
RAM ("Physical") and Virtual Memory (the size of RAM plus swap space on disk).


The ZONE column contains the name of the zone. The values in that row represent that zone's use of resources. The columns labeled USED show that zone's consumption of each resource. The unit depends on the resource.
For CPUs, a value of 1 represents a "Solaris CPU." For memory, the unit is specified in the output.


Besides those generic header elements, some are specific to a resource type.
%PART shows the CPU utilization, as a percentage of the compute capacity of the
processor set in which the zone's processes run. %CAP is the percentage of the
zone's CPU cap which has been used recently (if a cap has been applied to the zone).
%SHRU indicates the amount of CPU used as a percentage of the shares assigned to
the zone (if the Fair Share Scheduler is in use and shares have been assigned
to this zone). The latter may occasionally
show a surprising result: a value greater than 100%. I don't have space here to
explain the Fair Share Scheduler, but the short version is "FSS enforces a minimum
amount of available CPU capacity if there is contention for the CPUs, but it does
not enforce a maximum. If there isn't contention, any process which wants to consume
CPU cycles can do so - which can lead to a value greater than 100%."
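
To make the arithmetic behind %PART and %SHRU concrete, here is a small Python sketch. It is one reading of the descriptions above, not a copy of zonestat's implementation: %PART is taken as CPU USED divided by the number of CPUs in the processor set, and %SHRU as CPU USED divided by the zone's share-based entitlement. The function names and the share counts in the second example are hypothetical.

```python
def pct_part(cpu_used, pset_cpus):
    """CPU used as a percentage of the processor set's capacity."""
    return 100.0 * cpu_used / pset_cpus

def pct_shru(cpu_used, zone_shares, total_shares, pset_cpus):
    """CPU used as a percentage of the zone's FSS entitlement (assumed formula)."""
    entitled_cpus = pset_cpus * zone_shares / total_shares
    return 100.0 * cpu_used / entitled_cpus

# A zone using 0.92 CPUs in a 4-CPU processor set.
print(f"{pct_part(0.92, 4):.1f}%")          # prints 23.0%

# Hypothetical: a zone holds 10 of 40 shares in a 4-CPU set, so it is
# entitled to 1 CPU under contention. With no contention it uses 2 CPUs.
print(f"{pct_shru(2.0, 10, 40, 4):.0f}%")   # prints 200%
```

The second result shows how a zone can exceed 100% of its shares when no other zone is competing for the CPUs.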


The PHYSICAL section shows the amount of RAM used, the portion of the
system's memory (PCT) represented by that amount of RAM, and the portion
of the zone's RAM cap, if one has been set. The VIRTUAL section has
similar fields.

Comparing Usage to Caps


To show the data you might see when a zone has a RAM cap, let's set one. We
could do this in zonecfg(1M) for the next time the zone boots, but I don't
feel like rebooting the zone, so let's add that cap while the zone runs.
First, a quick check of the resource capping daemon. (In these examples, I am
logged in as a user who is configured to use non-default administrative privileges.
To temporarily gain those privileges, I will use the pfexec(1) command.)

GZ$ svcs rcap
STATE          STIME    FMRI
online         Oct_28   svc:/system/rcap:default

The service is online, and the cap is easy to set:

GZ$ pfexec rcapadm -z zoneB -m 512m
GZ$ zonestat -z zoneB 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
SUMMARY  Cpus/Online: 32/32  Physical: 31.8G  Virtual: 47.8G
         ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
    ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
 [total]  0.15 0.49%     -     - 3112M 9.53%     - 7382M 15.0%     -
[system]  0.01 0.05%     -     - 2797M 8.57%     - 7115M 14.5%     -
   zoneB  0.00 0.10%     -     - 42.5M 0.13% 8.31% 33.3M 0.06%     -

ZoneB is using 42.5MB, which is 0.13% of the system's memory (31.8GB),
and 8.31% of the 512MB cap that we set.
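
Those two percentages can be rechecked from the displayed values. The Python sketch below assumes that the K/M/G suffixes are binary units (an assumption on my part) and uses helper names invented for this example.

```python
# Assumed binary multipliers for the size suffixes in zonestat output.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def to_bytes(size):
    """Convert a size such as '42.5M' or '31.8G' to a number of bytes."""
    size = size.strip()
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    return float(size)

def pct_of(used, limit):
    """Return 'used' as a percentage of 'limit'."""
    return 100.0 * to_bytes(used) / to_bytes(limit)

print(f"{pct_of('42.5M', '512M'):.2f}% of the cap")
print(f"{pct_of('42.5M', '31.8G'):.2f}% of physical memory")
```

The cap percentage recomputed this way can differ from zonestat's in the last digit, presumably because zonestat works from unrounded values while the USED column is rounded for display.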


One of zonestat's most useful abilities is focusing on a small part of
the data that it can display. The previous
example demonstrated its ability to limit the output to one zone. We can
also limit the output to just one resource type, or "zoom in" further
to one instance of a resource.


Let's limit our view to the RAM used by that zone:

GZ$ zonestat -r physical-memory -z zoneB 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PHYSICAL-MEMORY  SYSTEM MEMORY
mem_default              31.8G
    ZONE  USED   PCT   CAP  %CAP
 [total] 3113M 9.53%     -     -
[system] 2797M 8.56%     -     -
   zoneB 42.5M 0.13%  512M 8.31%
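
If you drive zonestat from a script, the command line can be assembled from the same pieces used above. This Python sketch only builds and prints the argument list; the -r and -z options, the interval and the duration come from the examples in this entry, while the helper name and the use of comma-separated lists for several zones or resource types are assumptions to verify against zonestat(1).

```python
import shlex

def zonestat_argv(interval, duration=None, zones=(), resources=()):
    """Build an argument list of the form: zonestat [options] interval [duration]."""
    argv = ["zonestat"]
    if resources:
        # Assumption: several resource types are given as one comma-separated list.
        argv += ["-r", ",".join(resources)]
    if zones:
        # Assumption: several zones are given as one comma-separated list.
        argv += ["-z", ",".join(zones)]
    argv.append(str(interval))
    if duration is not None:
        argv.append(str(duration))
    return argv

argv = zonestat_argv(10, 1, zones=["zoneB"], resources=["physical-memory"])
print(shlex.join(argv))
```

On a system that has zonestat installed, the resulting list could be passed to subprocess.run() to capture the output for parsing.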

We can "zoom out" and look at all of the processor sets and their zone
assignments (something that was difficult in Solaris 10):

GZ$ pfexec zoneadm -z zoneA boot
GZ$ pfexec zoneadm -z zoneC boot
GZ$ pfexec zoneadm -z zoneD boot
GZ$ pfexec zonestat -r psets 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSOR_SET  TYPE           ONLINE/CPUS  MIN/MAX
pset_default   default-pset         16/16      1/-
    ZONE  USED   PCT   CAP  %CAP  SHRS  %SHR %SHRU
 [total]  0.17 1.10%     -     -     -     -     -
[system]  0.03 0.23%     -     -     -     -     -
  global  0.13 0.86%     -     -     -     -     -
PROCESSOR_SET  TYPE           ONLINE/CPUS  MIN/MAX
zoneD          dedicated-cpu          4/4      4/4
    ZONE  USED   PCT   CAP  %CAP  SHRS  %SHR %SHRU
 [total]  1.10 27.6%     -     -     -     -     -
[system]  0.00 0.00%     -     -     -     -     -
   zoneD  1.10 27.6%     -     -     -     -     -
PROCESSOR_SET  TYPE           ONLINE/CPUS  MIN/MAX
zoneC          dedicated-cpu          4/4      4/4
    ZONE  USED   PCT   CAP  %CAP  SHRS  %SHR %SHRU
 [total]  1.00 25.0%     -     -     -     -     -
[system]  0.08 2.00%     -     -     -     -     -
   zoneC  0.92 23.0%     -     -     -     -     -
PROCESSOR_SET  TYPE           ONLINE/CPUS  MIN/MAX
zoneB          dedicated-cpu          4/4      4/4
    ZONE  USED   PCT   CAP  %CAP  SHRS  %SHR %SHRU
 [total]  0.00 0.14%     -     -     -     -     -
[system]  0.00 0.00%     -     -     -     -     -
   zoneB  0.00 0.14%     -     -     -     -     -
PROCESSOR_SET  TYPE           ONLINE/CPUS  MIN/MAX
zoneA          dedicated-cpu          4/4      4/4
    ZONE  USED   PCT   CAP  %CAP  SHRS  %SHR %SHRU
 [total]  0.00 0.14%     -     -     -     -     -
[system]  0.00 0.00%     -     -     -     -     -
   zoneA  0.00 0.14%     -     -     -     -     -
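
The processor-set report can be reduced to a simple map of which zones run in which set. Here is a minimal Python sketch, assuming the layout shown above (a PROCESSOR_SET header, one line naming the set, then eight-column zone rows); the function name is made up for this example, and the sample is shortened to two sets.

```python
SAMPLE = """\
PROCESSOR_SET TYPE ONLINE/CPUS MIN/MAX
pset_default default-pset 16/16 1/-
ZONE USED PCT CAP %CAP SHRS %SHR %SHRU
[total] 0.17 1.10% - - - - -
[system] 0.03 0.23% - - - - -
global 0.13 0.86% - - - - -
PROCESSOR_SET TYPE ONLINE/CPUS MIN/MAX
zoneD dedicated-cpu 4/4 4/4
ZONE USED PCT CAP %CAP SHRS %SHR %SHRU
[total] 1.10 27.6% - - - - -
[system] 0.00 0.00% - - - - -
zoneD 1.10 27.6% - - - - -
"""

def zones_by_pset(text):
    """Map each processor set name to the list of zones reported under it."""
    result = {}
    current = None
    expect_pset = False
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "PROCESSOR_SET":
            expect_pset = True          # the next line names the set
        elif expect_pset:
            current = parts[0]
            result[current] = []
            expect_pset = False
        elif parts[0] == "ZONE" or parts[0].startswith("["):
            continue                    # column header or pseudo-row
        elif current is not None and len(parts) == 8:
            result[current].append(parts[0])
    return result

print(zones_by_pset(SAMPLE))
```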

With the basics out of the way, next time I will discuss some other options
that display other data and organize the output in different ways.





Comments ( 7 )
  • Craig S. Bell Tuesday, December 7, 2010

    Jeff, this is great stuff -- thanks for demonstrating. I very much look forward to having more comprehensive per-zone statistics.

    My first question: Does zonestatd(1M) retain all of the data, or does it only collect when somebody is looking with zonestat(1)?

    On my hosts, I would like to keep all of the raw telemetry, and run reports against it -- something analogous to "sar -f <file> -s <time> -e <time>".

    I looked at the manpages, but couldn't tell if this sort of reporting capability is there. Perhaps this is done through extended accounting tools?

    Thx... -c


  • Stephen Lawrence Tuesday, December 7, 2010

    zonestatd only collects data when a zonestat command is running.

    We would certainly like to extend zonestatd to allow data to be logged in a historical format, and add new sar-like options to re-run the data through the zonestat command. Would you find it sufficient to be able to replay the data through zonestat, or would you want a stable history-file format as well, to interrogate directly?

    As always, see Oracle's "future looking statements" disclosures.

    Currently, you can run zonestat -P to save the data in a ":" delimited format, which is a lot easier to parse for post-analysis.


  • Craig S. Bell Tuesday, December 7, 2010

    Stephen, Using zonestat(1) to read from logged data would be sufficient for my needs. Thx. -c


  • Danny Tuesday, December 7, 2010

    What about exposing zone metrics via SNMP? that's something I've always wanted to do on Sol10 for things like graphing per-zone CPU and memory consumption. I suppose one could write a MIB that executes zonestat under the covers, but a few ketats wouldn't go amiss...


  • Danny Tuesday, December 7, 2010

    That should say kstats obviously...


  • fese Thursday, February 3, 2011

    Pleasepleaseprettyplease do a backport to Sol10! We (and for sure lots of others) will be using Sol10 for a while, even after Sol11 arrives, and a zonestat should have been there from the beginning!


  • Jeffrey Victor Thursday, February 3, 2011

    Fese, see the first link in the section "Zonestat Introduction" in this blog entry. That zonestat was written for Solaris 10.

