Tuesday Oct 11, 2011

Solaris 11 is Coming!

Yes, the noise you hear is the pre-launch sequence for Solaris 11, which is getting closer every day!

The Launch Event will be held in New York City, at Gotham Hall on Broadway. Space is limited, so if you want to attend in person you should register online. A webcast of the event will also be available. Registration is available for both.

I have registered, so if you attend perhaps I will see you there!

Monday Sep 12, 2011

Oracle Solaris 11 Early Adopter Program

The next step in the release of Oracle Solaris 11 is here! Gold members of the Oracle Partner Network (OPN) may download the Early Adopter release to begin qualification of their applications on Oracle Solaris 11.

Oracle Solaris 11 includes the new major feature sets that are available in Solaris 11 Express, released in November 2010, and much more. You can find a complete description and download link of this Early Adopter release at oracle.com.

If you are not currently a member of the Oracle PartnerNetwork (OPN), you still have two choices:

  1. Learn more about OPN and register at http://www.oracle.com/partners/en/opn-program/index.html
  2. Begin to experience key new features by downloading Solaris 11 Express

We are looking forward to helping you learn all about these exciting new features and the benefits you will derive from them!

Thursday Aug 18, 2011

Oracle Solaris available for Exadata Database Machines

Oracle Solaris 11 Express is now available on Oracle Exadata Database Machines, enabling you to benefit from all of the OLTP and DW performance of Exadata hardware, and the industry-leading scalability and availability characteristics of Oracle Solaris.

For details, see the press release.

Tuesday Feb 08, 2011

Virtual Network - Part 3

This is the third in a series of blog entries that discuss the network virtualization features in Oracle Solaris 11 Express. Part 1 introduced the concept of network virtualization and listed the basic virtual network elements that Solaris 11 Express (S11E) provides. Part 2 expanded on the concepts and discussed the resource management features which can be applied to those virtual network elements (VNEs).

This blog entry assumes that you have some experience with Solaris Zones. If you don't, you can read my earlier blog entries, or buy the book "Oracle Solaris 10 System Virtualization Essentials" or read the documentation.

This entry will demonstrate the creation of some of these VNEs.

For today's examples, I will use an old Sun Fire T2000 that has one SPARC CMT (T1) chip and 32GB RAM. I will pretend that I am implementing a 3-tier architecture in this one system, where each tier is represented by one Solaris zone. The mythical example provides access to an employee database. The 3-tier service is named 'emp' and VNEs will use 'emp' in their names to reduce confusion regarding the dozens of VNEs we expect to create for the services this system will deliver.

The commands shown below use the prompt "GZ#" to indicate that the command is entered in the global zone by someone with sufficient privileges. Similarly, the prompt "emp-web1#" indicates a command which is entered in the zone "emp-web1" as a sufficiently privileged user.

Fortunately, Solaris network engineers gathered all of the actions regarding the management of network elements (virtual or physical) into one command: dladm(1M). You use dladm to create, destroy, and configure datalinks such as VNICs. You can also use it to list physical NICs:

GZ# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
e1000g2     phys      1500   unknown  --         --
e1000g1     phys      1500   down     --         --
e1000g3     phys      1500   unknown  --         --
We need three VNICs for our three zones, one VNIC per zone. They will also have useful names - one for each of the tiers - and will share e1000g0:
GZ# dladm create-vnic -l e1000g0 emp_web1
GZ# dladm create-vnic -l e1000g0 emp_app1
GZ# dladm create-vnic -l e1000g0 emp_db1
GZ# dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
e1000g0     phys      1500   up       --         --
e1000g2     phys      1500   unknown  --         --
e1000g1     phys      1500   down     --         --
e1000g3     phys      1500   unknown  --         --
emp_web1    vnic      1500   up       --         e1000g0
emp_app1    vnic      1500   up       --         e1000g0
emp_db1     vnic      1500   up       --         e1000g0
GZ# dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
emp_web1     e1000g0      0      2:8:20:3a:43:c8   random              0
emp_app1     e1000g0      0      2:8:20:36:a1:17   random              0
emp_db1      e1000g0      0      2:8:20:b4:5b:d3   random              0

The system has four NICs and three VNICs. Note that the name of a VNIC may not include a hyphen (-) but may include an underscore (_).

VNICs that share a NIC appear to be attached together via a virtual switch. That vSwitch is created automatically by Solaris. This diagram represents the NIC and NVEs we have created.

Now that these datalinks - the VNICs - exist, we can assign them to our zones. I'll assume that the zones already exist, and just need network assignment.

GZ# zonecfg -z emp-web1 info
zonename: emp-web1
zonepath: /zones/emp-web1
brand: ipkg
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: exclusive
hostid:
fs-allowed:
GZ# zonecfg -z emp-web1
zonecfg:emp-web1> add net
zonecfg:emp-web1:net> set physical=emp_web1
zonecfg:emp-web1:net> end
zonecfg:emp-web1> exit

Those steps can be followed for the other two zones and matching VNICs. After those steps are completed, our earlier diagram would look like this:

Packets passing from one zone to another within a Solaris instance do not leave the computer, if they are in the same subnet and use the same datalink. This greatly improves network bandwidth and latency. Otherwise, the packets will head for the zone's default router.

Therefore, in the above diagram packets sent from emp-web1 destined for emp-app1 would traverse the virtual switch, but not pass through e1000g0.

This zone is an "exclusive-IP" zone, meaning that it "owns" its own networking. What is its view of networking? That's easy to determine. The zlogin(1M) command inserts a complete command-line into the zone. By default, the command is run as the root user.

GZ# zoneadm -z emp-web1 boot
GZ# zlogin emp-web1 dladm show-link
LINK        CLASS     MTU    STATE    BRIDGE     OVER
emp_web1    vnic      1500   up       --         ?
GZ# zlogin emp-web1 dladm show-vnic
LINK         OVER         SPEED  MACADDRESS        MACADDRTYPE         VID
emp_web1     ?            0      2:8:20:3a:43:c8   random              0

Notice that the zone sees its own VNEs, but cannot see NEs or VNEs in the global zone, or in any other zone.

The other important new networking command in Solaris 11 Express is ipadm(1M). That command creates IP address assignments, enables and disables them, displays IP address configuration information, and performs other actions.

The following example shows the global zone's view before configuring IP in the zone:

GZ# ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
e1000g0    ok       bm--------4- ---
GZ# ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/?             static   ok           127.0.0.1/8
lo0/?             static   ok           127.0.0.1/8
lo0/?             static   ok           127.0.0.1/8
e1000g0/_a        static   ok           10.140.204.69/24
lo0/v6            static   ok           ::1/128
lo0/?             static   ok           ::1/128
lo0/?             static   ok           ::1/128
lo0/?             static   ok           ::1/128

At this point, not only does the zone know it has a datalink (which we saw above) but the IP tools show that it is there, ready for use. The next example shows this:

GZ# zlogin emp-web1 ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
GZ# zlogin emp-web1 ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128
An ethernet datalink without an IP address isn't very useful, so let's configure an IP interface and apply an IP address to it:
GZ# zlogin emp-web1 ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
GZ# zlogin emp-web1 ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
lo0/v6            static   ok           ::1/128

GZ# zlogin emp-web1 ipadm create-if emp_web1
GZ# zlogin emp-web1 ipadm show-if
IFNAME     STATE    CURRENT      PERSISTENT
lo0        ok       -m-v------46 ---
emp_web1   down     bm--------46 -46

GZ# zlogin emp-web1 ipadm create-addr -T static -a local=10.140.205.82/24 emp_web1/v4static
GZ# zlogin emp-web1 ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
emp_web1/v4static static   ok           10.140.205.82/24
lo0/v6            static   ok           ::1/128

GZ# zlogin emp-web1 ifconfig emp_web1
emp_web1: flags=1000843 mtu 1500 index 2
        inet 10.140.205.82 netmask ffffff00 broadcast 10.140.205.255
        ether 2:8:20:3a:43:c8

The last command above shows the "old" way of displaying IP address configuration. The command ifconfig(1) is still there, but the new tools dladm and ipadm provide a more consistent interface, with well-defined separation between datalink management and IP management.

Of course, if you want the zone's outbound packets to be routed to other networks, you must use the route(1M) command, the /etc/defaultrouter file, or both.

Next time, I'll show a new network measurement tool and the ability to control the amount of network bandwidth consumed.

Wednesday Jan 05, 2011

Virtual Networks

Network virtualization is one of the industry's hot topics. The potential to reduce cost while increasing network flexibility easily justifies the investment in time to understand the possibilities. This blog entry describes network virtualization and some concepts. Future entries will show the steps to create a virtual network.

Introduction to Network Virtualization

Network virtualization can be described as the process of creating a computer network which does not match the physical topology of a physical network. Usually this is achieved by using software tools of general-purpose computers or by using features of network hardware. A defining characteristic of a virtual network is the ability to re-configure the topology without manipulating any physical objects: devices or cables.

Such a virtual network mimics a physical network. Some types of virtual networks, for example virtual LANs (VLANs), can be implemented using features of network switches and computers. However, some other implementations do not require traditional network hardware such as routers and switches. All of the functionality of network hardware has been re-implemented in software, perhaps in the operating system.

Benefits of network virtualization (NV) include increased architectural flexibility, better bandwidth and latency characteristics, the ability to prioritize network traffic to meet desired performance goals, and lower cost from fewer devices, reduced total power consumption, etc.

The remainder of this blog entry will focus on a software-only implementation of NV.

A few years ago, networking engineers at Sun began working on a software project named "Crossbow." The goal was to create a comprehensive set of NV features within Solaris. Just like Solaris Zones, Crossbow would provide integrated features for creation and monitoring of general purpose virtual network elements that could be deployed in limitless configurations. Because these features are integrated into the operating system, they automatically take advantage of - and smoothly interoperate with - existing features. This is most noticeable in the integration of Solaris NV features and Solaris Zones. Also, because these NV features are a part of Solaris, future Solaris enhancements will be integrated with Solaris NV where appropriate.

The core NV features were first released in OpenSolaris 2009.06. Since then, those core features have matured and more details have been added. The result is the ability to re-implement entire networks as virtual networks using Solaris 11 Express. Here is an example of a virtual network architecture:

As you can guess from that example, you can create virtually :-) any network topology as a virtual network...

Oracle Solaris NV does more than is described here. This content focuses on the key features which might be used to consolidate workloads or entire networks into a Solaris system, using zones and NV features.

Virtual Network Elements

Solaris 11 Express implements the following virtual network elements.
  • NIC: OK, this isn't a virtual element, it's just on the list as a starting point.
    For a very long time, Solaris has managed Network Interface Connectors (NICs). Solaris offers tools to manage NICs, including bringing them up and down, and assigning various characteristics to them, such as IP addresses, assignment to IP Multipathing (IPMP) groups, etc. Note that up through Solaris 10, most of those configuration tasks were accomplished with the ifconfig(1M) command, but in Solaris 11 Express the dladm(1M) and ipadm(1M) commands perform those tasks, and a few more. You can monitor the use of NICs with dlstat(1M). The term "datalink" is now used consistently to refer to NICs and things like NICs, such as...

  • A VNIC is a pseudo interface created on a datalink (a NIC or an etherstub, described next). Each VNIC has its own MAC address, which can be generated automatically, but can be specified manually. For almost all purposes, a VNIC can be can be managed like a NIC. The dladm command creates, lists, deletes, and modifies VNICs. The dlstat command displays statistics about VNICs. The ipadm(1M) command configures IP interfaces on VNICs.
    Like NICs, VNICs have a number of properties that can be modified with dladm. These include the ability to force network processing of a VNIC to a certain set of CPUs, setting a cap (maximum) on permitted bandwidth for a VNIC, the relative priority of this VNIC versus other VNICs on the same NIC, and other properties.

  • Etherstubs are pseudo NICs, making internal networks possible. For a general understanding, think of them as virtual switches. The command dladm manages etherstubs.

  • A flow is a stream of packets that share particular attributes such as source IP address or TCP port number. Once defined, a flow can be managed as an entity, including capping bandwidth usage, setting relative priorities, etc. The new flowadm(1M) command enables you to create and manage flows. Even if you don't set resource controls, flows will benefit from dedicated kernel resources and more predictable, consistent performance. Further, you can directly observe detailed statistics on each flow, improving your ability to understand these streams of packets and set proper resource controls. Flows are managed with flowadm(1M) and monitored with flowstat(1M).

  • VLANs (Virtual LANs) have been around for a long time. For consistency, the commands dladm, dlstat and ipadm now manage VLANs.

  • InfiniBand partitions are virtual networks that use an InfiniBand fabric. They are managed with the same commands as VNICs and VLANs: dladm, dlstat, ipadm and others.

Summary

Solaris 11 Express provides a complete set of virtual network components which can be used to deploy virtual networks within a Solaris instance. The next blog entry will describe network resource management and security. Future entries will provide some examples.

Thursday Dec 09, 2010

All New Zonestat - Part 2

Part 2

Recently I introduced zonestat(1), a new command offered in Solaris 11 Express that replaces the zonestat Perl script that I had open-sourced a couple of years ago. Today I will complete the description of the new zonestat.

Fair Share Scheduler

One of the many useful resource controls offered by Solaris is the Fair Share Scheduler (FSS(7)). You can use FSS to tell Solaris "make sure that zoneXYZ can use a specific portion of the compute capacity of a set of 'Solaris CPUs' at a minimum. (For more information on FSS, see its man page and the Solaris 10 or Solaris 11 Express documentation.) FSS only enforces those minima if there is CPU contention.

That terse explanation used the phrase "of a set of CPUs" because the minimum portions of compute capacity enforced by Solaris can be calculated across all of the CPUs in the system, or across a set of CPUs that you have configured into a processor set. For my purpose here, the point is that you can create a resource pool - including a processor set, assign multiple zones to that pool, and tell Solaris to use the FSS scheduling algorithm for the processes to be scheduled on those CPUs. (See the libpool(3LIB) man page for more information.)

Zonestat will show information relating to FSS, including data that answers the question "is there CPU contention in any of my psets, and if there is, which zone(s) are using more than I expected?"

The next example uses two new zones, and assumes that most of the zones used earlier have been turned off for now. A dynamic resource pool, sharedDB, has been created. It will be the set of CPUs used by two zones which run database software. (This method is called "capped Containers" in Oracle licensing documents and is considered hard partitioning. It can be used to limit software license costs.) One of those zones, zoneDB-2, is more important than the other, zoneDB-1. To meet its SLA, zoneDB-2 must always be able to use at least 4 of the 6 CPUs in that pset.

# zonestat -r psets 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
pset_default            default-pset        22/22         1/-
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  0.12 0.56%     -     -      -     -     -
                            [system]  0.02 0.12%     -     -      -     -     -
                              global  0.09 0.44%     -     -      -     -     -

PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
sharedDB                   pool-pset          6/6         6/6
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  4.01 66.8%     -     -    300     -     -
                            [system]  0.00 0.00%     -     -      -     -     -
                            zoneDB-2  3.00 50.1%     -     -    200 66.6% 75.1%
                            zoneDB-1  1.00 16.7%     -     -    100 33.3% 50.2%

PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
zoneA                  dedicated-cpu          4/4         4/4
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  0.00 0.09%     -     -      -     -     -
                            [system]  0.00 0.00%     -     -      -     -     -
                               zoneA  0.00 0.09%     -     -      -     -     -
In the output above, zoneDB-1 is using the equivalent processing capacity of 1 Solaris CPU, and zoneDB-2 is using the equivalent of 3 Solaris CPUs. The PCT column indicates that zoneDB-2 is using 50% (3 out of 6) of the CPU capacity of the entire pset.

In addition, the SHRS column shows the number of FSS shares assigned to each of those zones, and %SHR is that zone's proportion of the total number of shares. In other words, zoneDB-2 is using 50% of the pset, but hasn't even used its enforced minimum of 66.6%.

This FSS configuration ensures that 200/300ths of 6 CPUs (i.e. 4 CPUs) are available to zoneDB-2. The %SHRU value of 75% tells us that the zone is using 75% of those 4 CPUs. Again, each of those two zones is allowed to use more than its share, as long as each zone can use its specified minimum.

Sorting the Output

You may have noticed in earlier examples that the zones were not listed in alphabetical order. By default, they are sorted by the amount of the resource that zonestat was reporting. If a resource is not specified, the output is sorted on CPU%. The two rows showing total and system data are always listed first.

You can change the sort order with the -S option, for example:

GZ$ zonestat -r processes -S used 2 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:02
PROCESSES                     SYSTEM LIMIT
system-limit                          292K
                                ZONE  USED   PCT   CAP  %CAP
                             [total]   110 0.36%     -     -
                            [system]     0 0.00%     -     -
                              global    61 0.20%     -     -
                               zoneD    26 0.08%     -     -
                               zoneA    23 0.07%     -     -

As the man page for zonestat(1) shows, many other resources can be monitored. I will not review each of them here.

Aggregated Data

But wait! There's more! ;-)

In addition to understanding resource usage over a short interval (e.g. 10 or 60 seconds) it is often to necessary to understand the peak usage or average usage over a longer period of time. For that purpose, zonestat provides Summary Reports.

A simple summary report is one that is appended to the per-sample data shown earlier. Here is the output for two samples and one summary report:

GZ$ zonestat -r processes -R high -S used 10 2
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSES                     SYSTEM LIMIT
system-limit                          292K
                                ZONE  USED   PCT   CAP  %CAP
                             [total]   109 0.36%     -     -
                            [system]     0 0.00%     -     -
                              global    61 0.20%     -     -
                               zoneD    25 0.08%     -     -
                               zoneA    23 0.07%     -     -

Interval: 2, Duration: 0:00:20
PROCESSES                     SYSTEM LIMIT
system-limit                          292K
                                ZONE  USED   PCT   CAP  %CAP
                             [total]   109 0.36%     -     -
                            [system]     0 0.00%     -     -
                              global    61 0.20%     -     -
                               zoneD    25 0.08%     -     -
                               zoneA    23 0.07%     -     -

Report: High Usage
    Start: Thu Dec  2 21:58:54 EST 2010
      End: Thu Dec  2 21:59:14 EST 2010
    Intervals: 2, Duration: 0:00:20
PROCESSES                     SYSTEM LIMIT
system-limit                          292K
                                ZONE  USED   PCT   CAP  %CAP
                             [total]   109 0.36%     -     -
                            [system]     0 0.00%     -     -
                              global    61 0.20%     -     -
                               zoneD    25 0.08%     -     -
                               zoneA    23 0.07%     -     -

If you only need the summary report, -q will be useful: it suppresses the individual samples of data.
GZ$ zonestat -q -r physical-memory -R high -S used 10 2
Report: High Usage
    Start: Thu Dec  2 22:03:54 EST 2010
      End: Thu Dec  2 22:04:14 EST 2010
    Intervals: 2, Duration: 0:00:20
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                                ZONE  USED   PCT   CAP  %CAP
                             [total] 5205M 15.9%     -     -
                            [system] 2790M 8.54%     -     -
                               zoneD 2229M 6.83%     -     -
                              global  141M 0.43%     -     -
                               zoneA 44.5M 0.13%     -     -
Let's assume that, from the data above, we determine that zoneD will never need more than 3 GB of RAM when it is operating correctly. We can add a RAM cap so that Solaris enforces that limit in case something goes awry:
GZ$ pfexec rcapadm -z zoneD -m 3g
GZ$ zonestat -q -r physical-memory -R high -S used 10 2
Report: High Usage
    Start: Thu Dec  2 22:37:15 EST 2010
      End: Thu Dec  2 22:37:35 EST 2010
    Intervals: 2, Duration: 0:00:20
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                                ZONE  USED   PCT   CAP  %CAP
                             [total] 5204M 15.9%     -     -
                            [system] 2789M 8.54%     -     -
                               zoneD 2229M 6.83% 3072M 72.5%
                              global  141M 0.43%     -     -
                               zoneA 44.5M 0.13%     -     -
In addition to a summary report at the end of the output, zonestat is able to generate periodic summary reports based on the collection of individual samples. For example, you might want to know what the peak memory usage is for each zone, per hour.

(A quick side note: zonestat does not perform continuous data collection. It collects data at an interval you specify. Therefore, the peak values reported by zonestat are the peak values of the values which were collected. In other words, zonestat reports the peak observed values.)

The example below collects data every 10 seconds for 24 hours. It reports the peak observed values every hour.

GZ$ zonestat -q -r physical-memory -R high 10 24h 60m
Report: High Usage
    Start: Mon Dec  5 16:42:01 EST 2010
      End: Mon Dec  5 17:42:01 EST 2010
    Intervals: 360, Duration: 1:00:00
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                                ZONE  USED   PCT   CAP  %CAP
                             [total] 3015M 9.23%     -     -
                            [system] 2791M 8.55%     -     -
                              global  136M 0.41%     -     -
                               zoneA 44.5M 0.13%     -     -
                               zoneD 45.7M 0.13% 3072M 1.48%

Report: High Usage
    Start: Mon Dec  5 17:42:01 EST 2010
      End: Mon Dec  5 18:42:01 EST 2010
    Intervals: 10, Duration: 1:00:00
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                                ZONE  USED   PCT   CAP  %CAP
                             [total] 3015M 9.23%     -     -
                            [system] 2791M 8.55%     -     -
                              global  136M 0.41%     -     -
                               zoneA 44.5M 0.13%     -     -
                               zoneD 64.3M 0.19% 3072M 2.09%

Report: High Usage
    Start: Mon Dec  5 18:42:01 EST 2010
      End: Mon Dec  5 19:42:01 EST 2010
    Intervals: 15, Duration: 1:00:00
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                                ZONE  USED   PCT   CAP  %CAP
                             [total] 3015M 9.23%     -     -
                            [system] 2791M 8.55%     -     -
                              global  136M 0.41%     -     -
                               zoneA 44.5M 0.13%     -     -
                               zoneD 65.1M 0.19% 3072M 2.11%
[further output deleted]

Parseable Data

At this point (especially after reading the sub-title of this section...) you might think that zonestat should have an option to generate output which is easy to parse. And you won't be disappointed: CSV (colon-separated-value) output is the result of the lower-case -p option:
GZ$ zonestat -p -q -r physical-memory -R high -S used 10 2
report-high:header:20101203T034655Z:20101203T034715Z:10:20
report-high:physical-memory:mem_default:[resource]:33423360K
report-high:physical-memory:mem_default:[total]:5329836K:15.94%:-:-
report-high:physical-memory:mem_default:[system]:2856556K:8.54%:-:-
report-high:physical-memory:mem_default:zoneD:2282968K:6.83%:3145728K:72.57%
report-high:physical-memory:mem_default:global:144608K:0.43%:-:-
report-high:physical-memory:mem_default:zoneA:45576K:0.13%:-:-
report-high:footer:20101203T034715Z10:20
Even a small number of options can generate a great deal of output:
GZ$ zonestat -p -r psets -R high 10 2
interval:header:since-last-interval:20101203T035201Z:1:10
interval:processor-set:default-pset:pset_default:[resource]:24:24:1:-:0-04-03.01
interval:processor-set:default-pset:pset_default:[total]:0.09:0.38%:-:-:-:-:-:0-00-00.92
interval:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15
interval:processor-set:default-pset:pset_default:global:0.07:0.31%:-:-:-:-:-:0-00-00.77
interval:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.50
interval:processor-set:dedicated-cpu:zoneD:[total]:1.00:25.01%:-:-:-:-:-:0-00-10.13
interval:processor-set:dedicated-cpu:zoneD:[system]:0.00:0.05%:-:-:-:-:-:0-00-00.02
interval:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.95%:-:-:-:-:-:0-00-10.10
interval:processor-set:dedicated-cpu:zoneA:[resource]:4:4:4:4:0-00-40.50
interval:processor-set:dedicated-cpu:zoneA:[total]:0.00:0.01%:-:-:-:-:-:0-00-00.00
interval:processor-set:dedicated-cpu:zoneA:[system]:0.00:0.00%:-:-:-:-:-:0-00-00.00
interval:processor-set:dedicated-cpu:zoneA:zoneA:0.00:0.01%:-:-:-:-:-:0-00-00.00
interval:footer:20101203T035201Z10:10
interval:header:since-last-interval:20101203T035211Z:2:20
interval:processor-set:default-pset:pset_default:[resource]:24:24:1:-:0-04-03.08
interval:processor-set:default-pset:pset_default:[total]:0.07:0.32%:-:-:-:-:-:0-00-00.79
interval:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15
interval:processor-set:default-pset:pset_default:global:0.06:0.26%:-:-:-:-:-:0-00-00.63
interval:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.51
interval:processor-set:dedicated-cpu:zoneD:[total]:1.00:25.01%:-:-:-:-:-:0-00-10.13
interval:processor-set:dedicated-cpu:zoneD:[system]:0.00:0.12%:-:-:-:-:-:0-00-00.04
interval:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.89%:-:-:-:-:-:0-00-10.08
interval:processor-set:dedicated-cpu:zoneA:[resource]:4:4:4:4:0-00-40.51
interval:processor-set:dedicated-cpu:zoneA:[total]:0.00:0.01%:-:-:-:-:-:0-00-00.00
interval:processor-set:dedicated-cpu:zoneA:[system]:0.00:0.00%:-:-:-:-:-:0-00-00.00
interval:processor-set:dedicated-cpu:zoneA:zoneA:0.00:0.01%:-:-:-:-:-:0-00-00.00
interval:footer:20101203T035211Z10:20
report-high:header:20101203T035151Z:20101203T035211Z:10:20
report-high:processor-set:default-pset:pset_default:[resource]:24:24:1:-:0-04-03.08
report-high:processor-set:default-pset:pset_default:[total]:0.09:0.38%:-:-:-:-:-:0-00-00.92
report-high:processor-set:default-pset:pset_default:[system]:0.01:0.06%:-:-:-:-:-:0-00-00.15
report-high:processor-set:default-pset:pset_default:global:0.07:0.31%:-:-:-:-:-:0-00-00.77
report-high:processor-set:dedicated-cpu:zoneD:[resource]:4:4:4:4:0-00-40.51
report-high:processor-set:dedicated-cpu:zoneD:[total]:1.00:25.07%:-:-:-:-:-:0-00-10.15
report-high:processor-set:dedicated-cpu:zoneD:[system]:0.00:0.12%:-:-:-:-:-:0-00-00.04
report-high:processor-set:dedicated-cpu:zoneD:zoneD:0.99:24.95%:-:-:-:-:-:0-00-10.10
report-high:processor-set:dedicated-cpu:zoneA:[resource]:4:4:4:4:0-00-40.51
report-high:processor-set:dedicated-cpu:zoneA:[total]:0.00:0.01%:-:-:-:-:-:0-00-00.00
report-high:processor-set:dedicated-cpu:zoneA:[system]:0.00:0.00%:-:-:-:-:-:0-00-00.00
report-high:processor-set:dedicated-cpu:zoneA:zoneA:0.00:0.01%:-:-:-:-:-:0-00-00.00
report-high:footer:20101203T035211Z10:20
With parseable output, you can easily write scripts that consume the output. Those scripts can further analyze the data, draw colorful graphs, and perform other data manipulation.

Miscellaneous Comments

A few parting notes:
  1. You can run zonestat in any zone. It will only receive data that should be visible to that zone. For example, when run in a non-global zone, zonestat will only display data about the processor set on which that zone's processes run.
  2. Zonestat gets all of the data from the zonestatd daemon, which is part of the service svc:/system/zones-monitoring:default. That service is enabled by default. Because that service is managed by SMF, if for any reason zonestatd stops, SMF will restart it.
  3. You can specify the interval at which zonestatd gathers data by setting the zones-monitoring:default service property config/sample_interval.

Summary

The new zonestat will provide hours of entertainment. It also provides answers to countless questions regarding your zones' use of system resources. Armed with that information, you can improve your understanding of the resource consumption of those zones, and improve the use of resource controls to ensure predictable performance of your workloads.

I hope that you have enjoyed learning about zonestat as much as I did. Check back during the first week of January for information on other new observability tools in Solaris 11 Express!

<script type="text/javascript"> var sc_project=2359564; var sc_invisible=1; var sc_security="22b325fd"; var sc_https=1; var sc_remove_link=1; var scJsHost = (("https:" == document.location.protocol) ? "https://secure." : "http://www."); document.write("");</script>

counter for tumblr

Tuesday Dec 07, 2010

All New Zonestat!

Part 1

Recently I gave a brief overview of the enhancements available in Solaris 11 Express. I also hinted at more blog entries, mostly featuring Solaris Zones and network virtualization.

Before experimenting with new functions it's useful to have some tools to measure the results. With that in mind, this blog entry and its successor(s) will discuss new measurement tools that are in Solaris 11 Express: zonestat(1), flowstat(1M) and dlstat(1M). I will start with zonestat.

Zonestat Introduction

But first some history. For Solaris 10 I created an open-source tool I named "zonestat". That tool filled a need: one integrated view of the resource consumption and optional resource control settings of all running zones. The resources listed included CPUs, physical memory, virtual memory, and locked memory. Zonestat provided a "dashboard" that greatly eased the task of monitoring the resource usage of Solaris Zones.

That tool has these main drawbacks:

  1. It's written in Perl and uses a large set of existing Solaris commands to gather all of the data that it needs. Executing all of those commands for each data sample uses a significant amount of CPU time.
  2. It is a separate tool, not part of Solaris. It is not supported.
  3. It was originally intended as a prototype, a demonstration of what could be accomplished. I made a number of enhancements along the way, but for a while it wasn't clear whether it made sense to upgrade it for Solaris 11.
However, even with those shortcomings, that zonestat script was put into production at a number of data centers.

In 2009 Solaris Engineering decided to write a fully supported version of zonestat, as a new Solaris command. Instead of someone writing code in his spare time (me), a member of the Solaris Zones Engineering Team (Steve Lawrence) was assigned to write a comprehensive, efficient, fully featured tool that achieved many of the same goals as the original zonestat, and many more. Using the experience gained from the original zonestat script, a completely new program (also called "zonestat") had the potential to solve all of the problems of the open-source Perl script, and add new features which had been requested by users of Solaris Zones, and other features which the Zones Engineering Team knew would be useful.

And in Solaris 11 Express that potential was realized. Because the new zonestat performs almost all of the functions of the original zonestat script, and performs far more in addition, the rest of this blog entry (and the next one) will only discuss the new zonestat which is part of Solaris 11 Express.

The new zonestat(1) command has a plethora of options. These options allow the user to list data:

  • for each of the system's zones, including the global zone and data specific to kernel processing but not directly attributable to any one zone
  • for any subset of zones
  • regarding one or more types of resources, in absolute units or as a portion of available or capped resources
  • regarding one or more instances of resources (e.g. a particular processor set)
  • that has been sorted by one or more output columns
  • that is human-readable output or output that can be easily parsed by a script or other program
  • that includes timestamps, in one of several formats
  • that includes regular aggregations (called "summary reports"), such as "highest value during the interval" or "average value during the interval"
Zonestat has a variety of uses. The most obvious is monitoring resource usage of zones. Even if you don't use resource controls, zonestat will help you by telling you when a zone is using a significant portion (or all!) of the system's resources. Of course, zonestat really brings value to systems that are using resource controls, making it easy to determine which zones are near their caps - a sure sign that there is a problem with that zone's workload or that the zone's cap is too low.

In addition, you can use zonestat to determine proper values for resource controls. For example, you can deploy a workload in a zone and use zonestat to determine the maximum amount of CPU capacity it uses. That information will enable you to make better decisions about how many CPUs to assign to that zone - if you have decided that the workload should use its own, dedicated CPUs to the zone.

If you are not familiar with the resource management controls offered by Solaris, you may wish to view the relevant documentation before, during or after reading the rest of this. The book "Oracle Solaris 10 System Virtualization Essentials" also describes all of the resource controls available for Solaris 10 Zones, and how they can be used to achieve various goals. Finally, the document "Understanding the Security Capabilities of Solaris" approaches the same content from a security perspective.

Now let's explore some of the interesting things you can do with zonestat.

Basics

The default output provides the data you would expect - basic information about the resource usage of all zones on the system. The command syntax can be simplified to this (omitting some features for now):

zonestat [options] interval [duration]
and the basic output looks like this:
GZ$ zonestat 5 2
Collecting data for first interval...
Interval: 1, Duration: 0:00:05
SUMMARY                  Cpus/Online: 32/32   Physical: 31.8G    Virtual: 47.8G
                    ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
               ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
            [total]  0.10 0.31%     -     - 3109M 9.52%     - 7379M 15.0%     -
           [system]  0.01 0.04%     -     - 2797M 8.57%     - 7115M 14.5%     -
             global  0.08 0.51%     -     -  141M 0.43%     -  129M 0.26%     -
              zoneA  0.00 0.02%     -     - 43.7M 0.13%     - 35.4M 0.07%     -
              zoneB  0.00 0.02%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
              zoneC  0.00 0.04%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
              zoneD  0.00 0.02%     -     - 42.1M 0.12%     - 33.2M 0.06%     -

Interval: 2, Duration: 0:00:10
SUMMARY                  Cpus/Online: 32/32   Physical: 31.8G    Virtual: 47.8G
                    ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
               ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
            [total]  0.09 0.30%     -     - 3109M 9.52%     - 7379M 15.0%     -
           [system]  0.01 0.03%     -     - 2797M 8.57%     - 7115M 14.5%     -
             global  0.08 0.51%     -     -  142M 0.43%     -  129M 0.26%     -
              zoneA  0.00 0.02%     -     - 43.7M 0.13%     - 35.4M 0.07%     -
              zoneB  0.00 0.02%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
              zoneC  0.00 0.02%     -     - 42.0M 0.12%     - 32.8M 0.06%     -
              zoneD  0.00 0.02%     -     - 42.1M 0.12%     - 33.2M 0.06%     -

First, note that unlike other Solaris stat tools (e.g. vmstat) the first set of data is not a summary since the system booted. Instead, zonestat pauses for the time interval specified on the command line, at which point it displays data representing the first sample. (Zonestat doesn't actually collect the data. Its companion, zonestatd(1M) performs that service for all zonestat clients.)

Also, you probably noticed those two special lines, "[total]" and "[system]". The first of those indicates data about the total quantity of each resource, across the whole system. The lines labeled "[system]" show resource consumption by the kernel or by processes that aren't associated with any one zone.

Zonestat can produce a great deal of information - more than will fit on one line. Its various options allow you to view summary data - as provided in the default - or to focus on a zone, or on a particular type or instance of a resource, or a combination of those. Obviously, the header will be tailored to the output requested.

The summary header looks like this:

Interval: 1, Duration: 0:00:05
SUMMARY                  Cpus/Online: 32/32   Physical: 31.8G    Virtual: 47.8G
                    ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
               ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
The first line of data, per sample, tells you the ordinal number of the sample - not very useful if you're just checking a few seconds of data, but pretty helpful when you're scanning through 3 days worth of output. The Duration field is similar, but is a measurement of time since the command began.

The SUMMARY line shows the quantity of CPUs that exist in the system and how many of them are online. (I wrote an earlier blog entry about the method that Solaris uses to count CPUs.) That line also shows the system's amount of RAM ("Physical") and Virtual Memory (the size of RAM plus swap space on disk).

The ZONE column contains the name of the zone. The values in that row represent that zone's use of resources. The columns labeled USED show that zone's consumption of each resource. The unit depends on the resource. For CPUs, a value of 1 represents a "Solaris CPU." For memory, the unit is specified in the output.

Besides those generic header elements, some are specific to a resource type. %PART shows the CPU utilization, as a percentage of the compute capacity of the processor set in which the zone's processes run. %CAP is the percentage of the zone's CPU cap which has been used recently (if a cap has been applied to the zone). %SHRU indicates the amount of CPU used as a percentage of the shares assigned to the zone (if the Fair Share Scheduler is in use and shares have been assigned to this zone). The latter may occasionally show a surprising result: a value greater than 100%. I don't have space here to explain the Fair Share Scheduler, but the short version is "FSS enforces a minimum amount of available CPU capicity if there is contention for the CPUs, but it does not enforce a maximum. If there isn't contention, any process which wants to consume CPU cycles can do so - which can lead to a value greater than 100%."

The PHYSICAL section shows the amount of RAM used, the portion of the system's memory (PCT) represented by that amount of RAM, and the portion of the zone's RAM cap, if one has been set. The VIRTUAL section has similar fields.

Comparing Usage to Caps

To show the data you might see when a zone has a RAM cap, let's set one. We could do this in zonecfg(1M) for the next time the zone boots, but I don't feel like rebooting the zone, so let's add that cap while the zone runs. First, a quick check of the resource capping daemon. (In these examples, I am logged in as a user which is configured to use non-default administrative privileges. To temporarily gain those privileges, I will use the pfexec(1) command.)
GZ$ svcs rcap
STATE          STIME    FMRI
online         Oct_28   svc:/system/rcap:default
The service is online, and the cap is easy to set:
GZ$ pfexec rcapadm -z zoneB -m 512m
GZ$ zonestat -z zoneB 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
SUMMARY                  Cpus/Online: 32/32   Physical: 31.8G    Virtual: 47.8G
                    ----------CPU---------- ----PHYSICAL----- -----VIRTUAL-----
               ZONE  USED %PART  %CAP %SHRU  USED   PCT  %CAP  USED   PCT  %CAP
            [total]  0.15 0.49%     -     - 3112M 9.53%     - 7382M 15.0%     -
           [system]  0.01 0.05%     -     - 2797M 8.57%     - 7115M 14.5%     -
              zoneB  0.00 0.10%     -     - 42.5M 0.13% 8.31% 33.3M 0.06%     -

ZoneB is using 42.5MB, which is 0.13% of the system's memory (31.8GB), and 8.31% of the 512MB cap that we set.

One of the many very useful abilities of zonestat is its ability to focus on a small part of the data which it can potentially display. The previous example demonstrated its ability to limit the output to one zone. We can also limit the output to just one resource type, or "zoom in" further to one instance of a resource.

Let's limit our view to the RAM used by that zone:

GZ$ zonestat -r physical-memory -z zoneB 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PHYSICAL-MEMORY              SYSTEM MEMORY
mem_default                          31.8G
                                ZONE  USED   PCT   CAP  %CAP
                             [total] 3113M 9.53%     -     -
                            [system] 2797M 8.56%     -     -
                               zoneB 42.5M 0.13%  512M 8.31%
We can "zoom out" and look at all of the processor sets and their zone assignments (something that was difficult in Solaris 10):
GZ$ pfexec zoneadm -z zoneA boot
GZ$ pfexec zoneadm -z zoneC boot
GZ$ pfexec zoneadm -z zoneD boot
GZ$ pfexec zonestat -r psets 10 1
Collecting data for first interval...
Interval: 1, Duration: 0:00:10
PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
pset_default            default-pset        16/16         1/-
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  0.17 1.10%     -     -      -     -     -
                            [system]  0.03 0.23%     -     -      -     -     -
                              global  0.13 0.86%     -     -      -     -     -

PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
zoneD                  dedicated-cpu          4/4         4/4
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  1.10 27.6%     -     -      -     -     -
                            [system]  0.00 0.00%     -     -      -     -     -
                               zoneD  1.10 27.6%     -     -      -     -     -

PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
zoneC                  dedicated-cpu          4/4         4/4
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  1.00 25.0%     -     -      -     -     -
                            [system]  0.08 2.00%     -     -      -     -     -
                               zoneC  0.92 23.0%     -     -      -     -     -

PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
zoneB                  dedicated-cpu          4/4         4/4
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  0.00 0.14%     -     -      -     -     -
                            [system]  0.00 0.00%     -     -      -     -     -
                               zoneB  0.00 0.14%     -     -      -     -     -

PROCESSOR_SET                   TYPE  ONLINE/CPUS     MIN/MAX
zoneA                  dedicated-cpu          4/4         4/4
                                ZONE  USED   PCT   CAP  %CAP   SHRS  %SHR %SHRU
                             [total]  0.00 0.14%     -     -      -     -     -
                            [system]  0.00 0.00%     -     -      -     -     -
                               zoneA  0.00 0.14%     -     -      -     -     -


With the basics out of the way, next time I will discuss some other options that display other data and organize the output in different ways.

<script type="text/javascript"> var sc_project=2359564; var sc_invisible=1; var sc_security="22b325fd"; var sc_https=1; var sc_remove_link=1; var scJsHost = (("https:" == document.location.protocol) ? "https://secure." : "http://www."); document.write("");</script>

counter for tumblr
About

Jeff Victor writes this blog to help you understand Oracle's Solaris and virtualization technologies.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Search

Archives
« May 2015
SunMonTueWedThuFriSat
     
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
      
Today