Tuesday Jul 07, 2015

Virtual Address Reservation in Solaris 11.3

For applications that need to place memory at fixed locations in their address space (like the Oracle SGA), Solaris 11.3 has a new feature called Virtual Address Reservation that provides support for such fixed address mappings. A fixed address mapping today can fail if the system has already assigned a mapping to the desired location. Because the system is free to choose any unused region in a process' address space for mapping things such as libraries, such conflicts can arise. Worse yet, if the application used mmap(2) with MAP_FIXED, the call would succeed but any existing mapping at that location would be destroyed.

Virtual Address Reservation in Solaris 11.3 provides the means to 'reserve' a portion of a process' address space, which prevents the system from using the reserved space for mapping operations that don't specify a fixed address. A VA Reservation guarantees that fixed address mappings within the reserved range succeed.

Creating a VA Reservation requires that the application be recompiled with a Mapfile (version 2) containing a RESERVE_SEGMENT directive that specifies the virtual address range to reserve. Multiple RESERVE_SEGMENT directives can be specified in the Mapfile to create multiple VA Reservations. The Mapfile below reserves the VA range from 0x300000000 to 0x300400000.

# cat Mapfile

$mapfile_version 2

RESERVE_SEGMENT myReservedVaName {
        VADDR = 0x300000000;
        SIZE = 0x400000;
};

# cc file.c -Mmapfile -m64

When the resulting a.out binary is executed, the specified virtual address range is reserved early during process startup, before libraries are mapped. Run pmap(1) on the running process to see its VA reservation(s); each one appears in the pmap output as "[ reserved ]".

0000000100000000        32K r-x----  /a.out
0000000100106000         8K rwx----  /a.out
0000000300000000      4096K -------  [ reserved ]
FFFFFFFF7F200000      2112K r-x----  /lib/sparcv9/libc.so.1

To use the reserved space, the application simply needs to specify a fixed address that corresponds to the Reserved VA range on calls to either mmap(2) or shmat(2).

Please note that VA Reservation only addresses possible conflicts related to fixed address mappings. Applications that use fixed address mappings should be well aware of other potential problems. For instance, my example above (on SPARC) reserves the VA space starting at 0x300000000. This could cause malloc failures if the process is memory intensive and the heap needs to grow larger than about 8G (the heap starts at around 0x100106000 and cannot grow past 0x300000000).

APIs for handling per-thread signals in Solaris

Solaris 11.3 introduces the following APIs to allow one process to interact directly
with a specific thread in a different process.

int proc_thr_kill(pid_t pid, pthread_t thread, int sig);
int proc_thr_sigqueue(pid_t pid, pthread_t thread, int sig, const union sigval value);
int proc_thr_sigqueue_wait(pid_t pid, pthread_t thread, int sig, const union sigval value,
    const struct timespec *timeout);

These APIs are patterned after the process-directed signal APIs kill(2),
sigqueue(3C) and sigqueue_wait(3C). They do not change the basics of signal
generation and reception: there is still no guarantee that a signal has been
received by the target. That depends on whether the target has blocked or
ignored the signal.

Use Case
These APIs can be used in any multi-process, multi-threaded application, between
threads of cooperating processes, where threads that handle specific tasks
need to receive signals. An example would be an application that deals with network I/O
in which each thread in a process handles one connection. In such a scenario,
the thread-directed signal APIs can force a specific thread to clean up
and abort because of errors, or ask it to dump status/debug info. The process/threads
receiving the signals must implement a signal handler that performs the desired
action (abort, dump) in response to a specific signal.

What does this mean for you?
Solaris threads of two independent, cooperating processes can now send and receive
signals on a per-thread basis.

Document Reference
See man pages for
 - proc_thr_kill(3C)
 - proc_thr_sigqueue(3C)
 - proc_thr_sigqueue_wait(3C)

PV IPoIB in Kernel Zones in Solaris 11.3

The Paravirtualization of IP over Infiniband (IPoIB) in kernel zones is a 
new feature in S11.3 enhancing the network virtualization offering in Solaris.
This allows for existing IP applications in the guest to run over Infiniband 
fabrics. Features such as Kernel zone Live Migration and IPMP are supported 
with the Paravirtualized IPoIB datalinks making it an appealing option.

Moreover, device management of these guest datalinks is similar to that of their
Ethernet counterparts, making them straightforward to configure and manage. On the
host, zonecfg is used to configure the kernel zone's automatic network interface
(anet): the link of the IB HCA port to paravirtualize and assign as the
lower-link, the Partition Key (P_Key) within the IB fabric, and the link mode,
which can be either IPoIB-CM or IPoIB-UD.

The PV IPoIB datalink is a front-end guest driver emulating an IPoIB VNIC 
in the host created over a physical IB partition datalink per P_Key and port.

To create a PV IPoIB datalink in a kernel zone the configuration is fairly 
simple. Here is an example showing how to create a PV IPoIB datalink in a 
kernel zone.

1. Find the IB datalink in the host to paravirtualize. 

I am selecting net7 for this example.

# ibadm
HCA             TYPE      STATE     IOV    ZONE
hermon0         physical  online    off    global

# dladm show-ib
net5      21280001A0D220 21280001A0D222 2    up      --           --       8001,FFFF
net7      21280001A0D220 21280001A0D221 1    up      --           --       8001,FFFF
# dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         1000   full      igb0
net2              Ethernet             unknown    0      unknown   igb2
net3              Ethernet             unknown    0      unknown   igb3
net1              Ethernet             unknown    0      unknown   igb1
net4              Ethernet             up         10     full      usbecm0
net5              Infiniband           up         32000  full      ibp1
net7              Infiniband           up         32000  full      ibp0

2. Create an IPoIB PV datalink for a kernel zone.
To add an IPoIB PV interface to a kernel zone, say tzone1, use zonecfg to add
an anet and specify the lower-link and pkey properties, which are mandatory.
If linkmode is not specified, IPoIB-CM is the default.

# zonecfg -z tzone1
    zonecfg:tzone1> add anet
    zonecfg:tzone1:anet> set lower-link=net7
    zonecfg:tzone1:anet> set pkey=0xffff
    zonecfg:tzone1:anet> info
    anet 1:
        lower-link: net7
        pkey: 0xffff
        linkmode not specified
        evs not specified
        vport not specified
        iov: off
        lro: auto
        id: 1

3. Add additional IPoIB PV datalinks to the kernel zone.
Additional IPoIB PV interfaces, each with a lower-link and pkey, can be added
to a kernel zone as shown above. These datalinks can be used exclusively
to host native zones within the kernel zone.

4. The PV IPoIB datalinks appear within the kernel zone on boot.

root@tzone1:~# dladm 
LINK                CLASS     MTU    STATE    OVER
net1                phys      65520  up       --
net0                phys      65520  up       --

root@tzone1:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        static     ok           --
net1              ip         ok           --         --
   net1/v4        static     ok           --

Virtual NICs (VNICs) tzone1/net0 and tzone1/net1 are created in the
host kernel; they are the backend of the PV interfaces.

# dladm show-vnic
tzone1/net1     net7           32000  80:0:0:4d:fe:..   fixed       PKEY:0xffff
tzone1/net0     net7           32000  80:0:0:4e:fe:..   fixed       PKEY:0xffff

New Security Extensions in Oracle Solaris 11.3

In Solaris 11.3, we've expanded the security extensions framework to give you more tools to defend your installations. In addition to Address Space Layout Randomization (ASLR), we now offer tools to set a non-executable stack (NXSTACK) and a non-executable heap (NXHEAP). We've also improved the sxadm(1M) utility to make it easier to manage security extension configurations.


When NXSTACK is enabled, the process stack memory segment is marked non-executable. This extension defends against attacks that rely on injecting malicious code and executing it on the stack. You can also configure NXSTACK to log each time a program tries to execute code on the stack. Log entries are output to /var/adm/messages.

Very few non-malicious programs need to execute code on the stack, so NXSTACK is enabled by default in Solaris 11.3. If you have a program that needs to execute on the stack and you are able to recompile it, you can pass the "-z nxstack=disable" flag to Solaris Studio. Otherwise, you can use sxadm either to disable NXSTACK or set it to work only on tagged binaries. Most core Solaris utilities are tagged for NXSTACK.

Note that NXSTACK takes the place of the "noexec_user_stack" and "noexec_user_stack_log" entries in /etc/system. You can still use those entries to configure non-executable stack, and they will take precedence over any configuration of NXSTACK. However, they are considered deprecated and you are encouraged to switch to using NXSTACK through sxadm.


When NXHEAP is enabled, the brk(2)-based heap memory segment is marked non-executable. This extension defends against attacks that rely on injecting code and executing it from the heap. You can also configure NXHEAP to log each time a program tries to execute code on the heap. NXHEAP log entries are also written to /var/adm/messages.

Some programs (such as interpreters) do have legitimate reasons to execute code from the heap, so NXHEAP is enabled by default only for tagged binaries. Most core Solaris utilities are already tagged for NXHEAP, and you can tag your own binaries by passing the linker flag "-z nxheap=enable" when compiling with Solaris Studio. Of course, NXHEAP can also be enabled or disabled globally with sxadm.


We've made all sorts of improvements to sxadm in Solaris 11.3, so I'm only going to focus on three new subcommands that will help you configure the new security extensions.

sxadm get

"sxadm get" allows you to observe the properties of security extensions. For example, NXSTACK and NXHEAP have log properties that show whether or not logging is enabled for those extensions. You can query the log property with:

$ sxadm get log nxstack nxheap
EXTENSION           PROPERTY                      VALUE
nxstack             log                           enable
nxheap              log                           enable  

And you can get an easily parsable format by passing the "-p" flag:

$ sxadm get -p log nxstack nxheap

You can also query all properties (equivalent to "sxadm status") with:

$ sxadm get all
EXTENSION           PROPERTY                      VALUE
aslr                model                         tagged-files
nxstack             model                         all
--                  log                           enable
nxheap              model                         tagged-files
--                  log                           enable  

sxadm set

"sxadm set" allows you to set individual properties of extensions without needing to use "sxadm enable". For example, you can disable NXSTACK logging with:

$ sxadm get log nxstack
EXTENSION           PROPERTY                      VALUE
nxstack             log                           enable
$ sxadm set log=disable nxstack
$ sxadm get log nxstack
EXTENSION           PROPERTY                      VALUE
nxstack             log                           disable

sxadm delcust

"sxadm delcust" allows you to restore the default configuration for one or more security extensions. For example:

$ sxadm get all nxstack
EXTENSION           PROPERTY                      VALUE
nxstack             model                         tagged-files
--                  log                           disable
$ sxadm delcust nxstack
$ sxadm get all nxstack
EXTENSION           PROPERTY                      VALUE
nxstack             model                         all
--                  log                           enable

Of course, all of these new subcommands also work with ASLR, even though it only has one "model" property. For example:

$ sxadm get all aslr
EXTENSION           PROPERTY                      VALUE
aslr                model                         tagged-files
$ sxadm set model=all aslr
$ sxadm get all aslr
EXTENSION           PROPERTY                      VALUE
aslr                model                         all
$ sxadm delcust aslr
$ sxadm get all aslr
EXTENSION           PROPERTY                      VALUE
aslr                model                         tagged-files 


I hope you've enjoyed this quick introduction to all the work we've put into the Security Extensions Framework for Solaris 11.3, and I hope you're able to use some or all of it to meet your organization's security needs. For a more detailed explanation of sxadm and the individual security extensions, please see the sxadm(1M) man page.

OpenSSL on Oracle Solaris 11.3

As with Solaris 11.2, Solaris 11.3 delivers two versions of OpenSSL: the non-FIPS 140 version (default) and the FIPS 140 version.  They are both based on OpenSSL 1.0.1o (as of July 7th, 2015).

There are no major features added to Solaris 11.3 OpenSSL; however, there are a couple of things that I would like to note.

EOL SSLv2 Support

SSLv2 protocol has been known to have issues for a while. Therefore, we have decided it's about time to remove SSLv2 support from Solaris OpenSSL. This should not be an issue for most applications out there, as nobody should be using SSLv2 protocols these days.  If your application still does, please consider moving on to more secure TLS protocols.

With Solaris 11.3, SSLv2 entry points are replaced with stub functions, and they are declared 'deprecated'.  Thus, if you are building an application which has references to the SSLv2 entry points, be prepared to see some compiler warnings like:

        warning:  "SSLv2_client_method" is deprecated, declared in : "/usr/include/openssl/ssl.h", line 2035

Now, some of you may wonder: why are we not removing SSLv3 from Solaris OpenSSL as well?
Unfortunately, some third-party applications still support only the SSLv3 protocol, so we feel that it's not yet time to remove SSLv3 support from the OpenSSL library. That's not to say SSLv3 is an acceptable protocol. RFC 7568, "Deprecating Secure Sockets Layer Version 3.0", was just published, stating that "SSLv3 MUST NOT be used. Negotiation of SSLv3 from any version of TLS MUST NOT be permitted." Fortunately, Oracle has been implementing compliance with this RFC for a while now, and most applications supported by Oracle Solaris 11.3 disable SSLv2 and SSLv3 by default. If you own an application that supports only SSLv3, it is time to move on to newer and more secure protocols such as TLS 1.2. We won't be supporting SSLv3 for much longer.

OpenSSL Thread and Fork Safety (Part 2)

With S11.2, we attempted to make OpenSSL thread and fork safe by default.  (See "OpenSSL Thread and Fork Safety" under "OpenSSL on Solaris 11.2")
However, the fix apparently wasn't complete, and we needed to extend the fix.

With Solaris 11.3 OpenSSL, the functions that register locking and thread identification callbacks are now replaced with stub functions. Instead of allowing other applications/libraries to specify their own locking and thread identification callback functions, Solaris OpenSSL now has an internal implementation of locking and thread identification that is not visible to the API caller. Applications may still call those functions, but the supplied callback functions will not be used by Solaris OpenSSL.


What does that mean for you?
OpenSSL is now thread and fork safe by default, finally.  You don't need to make any modification to
your application or to your library.  You can relax and have a beer or two.

That's all I have for now.

Changes to ZFS ARC Memory Allocation in 11.3

New in Solaris 11.3 is a kernel memory allocation subsystem called the
kernel object manager, or KOM. The first consumer of this subsystem is the
ZFS ARC.

Prior to Solaris 11.3, the ZFS ARC allocated its memory from the kernel heap
space using kmem caches. This has several drawbacks: first, internal
fragmentation can result in memory used by the ARC not being reclaimed by the
system. This problem is particularly acute if large pages are being used, since
the buffer size is considerably smaller than the large page size -- even one
buffer still allocated will prevent the system from freeing the large page.
Another drawback of ZFS ARC using the kernel heap is that all of the kernel
heap is non-relocatable in memory, and thus must reside in the kernel cage.
This can lead to issues allocating large pages or performing DR memory remove
operations once the ARC has grown large, even if it shrinks successfully. As a
workaround for the cage growth issue, many sysadmins have limited the size of
the ZFS ARC cache in /etc/system. Finally, scalability of ARC shrinking prior
to Solaris 11.3 is limited by heap page unmapping speed on large SPARC systems.

In Solaris 11.3, the ZFS ARC allocates its memory through KOM. The metadata
which is frequently accessed by ZFS (such as directory files) remains in
the kernel cage, but the vast majority of the cache which is not frequently
accessed by ZFS now resides outside of the kernel cage, where it can be
relocated by DR and page coalescing. KOM uses a slab size of 2M on x86 or 4M on
SPARC, so internal fragmentation is much less of an issue than it was with 256M
heap pages on SPARC. Scalability is vastly improved, as KOM takes advantage of
64-bit systems by using the seg_kpm framework for its address translations.

With this change, many systems which required limiting the ARC size will no
longer require a hard limit, since the system is able to manage its memory much
better. Metadata heavy workloads, and systems hosting kernel zones, will still
need to limit the ARC size through /etc/system tuning in Solaris 11.3, however.

Saturday Jan 31, 2015

Multi-CPU Binding (MCB)

I want to tell everyone about the cool, new Multi-CPU Binding API introduced in Solaris 11.2.  Bo Li and I wrote up something that explains what it does, its benefits, and how it is used in Solaris along with examples of how to use it:


Multi-CPU Binding (MCB) is new functionality that was added to Solaris 11.2 and is available through a new API called "processor_affinity(2)" and through the pbind(1M) command line tool.  MCB provides similar functionality to processor_bind(2), but can do much more than processor_bind(2):

  1. Bind specified threads to one or more CPUs, leaf locality groups (lgroups)*, or Processor Groups (PGs)**.

  2. Specify strong or weak affinity to CPUs where:

    • Strong affinity means that the threads must only run on the specified CPUs

    • Weak affinity means that the threads should always prefer to run on the specified CPUs, but when those CPUs are running higher priority threads, they will run on the closest available CPU where they have sufficient priority to run soonest

  3. Specify positive or negative affinity for CPUs (i.e., want to run or avoid running on specified CPUs).

  4. Enable or disable inheritance across fork(2), exec(2), and/or thr_create(3C).

  5. Query affinities of specified threads to CPUs, PGs, or lgroups.

* lgroups are the Solaris abstraction for telling which CPUs, memory, and I/O devices are within some latency of each other in a Non Uniform Memory Access (NUMA) machine

** PGs are the Solaris abstraction for performance relevant processor sharing relationships in CMT processors (eg. shared execution pipeline, FPU, cache, etc.)


Overall, MCB is more powerful and flexible than what was available in Solaris for affining threads to CPUs before MCB.

Before MCB, you could only do one or more of the following to affine a thread to one or more CPUs:

  • Bind one or more threads to one CPU and have this binding always be inherited across fork(2) and exec(2)
  • Set the affinity of one or more threads for a locality group (lgroup), which is the Solaris abstraction for the CPUs, memory, and I/O devices within some latency of each other in a Non Uniform Memory Access (NUMA) machine
  • Create an exclusive set of CPUs that can only run threads assigned to it, bind one or more threads to this processor set, and always have this processor set binding inherited across fork(2) and exec(2).

In contrast to the old functionality above, MCB has the following new functionality and benefits:

  1. Can bind to more than one CPU
    • The biggest benefit of MCB is that you can affine one or more threads to any set of CPUs that you want.  With this ability, you can bind threads to a NUMA node, processor chip, core, the CPUs sharing some performance relevant hardware component (eg. execution pipeline, FPU, cache, etc.), or an arbitrary set of CPUs.
    • Using a processor set is a way to affine a thread to a set of CPUs like MCB.  However, processor sets are exclusive so only threads assigned to the processor set can run on the CPUs in the processor set.  In contrast, MCB does not set aside CPUs for exclusive use by threads affined to those CPUs by MCB.  Hence, a thread having an MCB affinity for some CPUs does not prevent any other threads from running on those CPUs.
  2. More affinities
    • Having a positive and negative affinity to specify whether to run on or avoid the specified CPUs is a new feature that wasn't offered in the previous APIs for binding threads to CPUs
    • Being able to specify a strong or weak affinity is new for binding threads to CPUs, but isn't a completely new idea in Solaris.  The lgroup affinities already have the notion of strong and weak affinity.  The semantics are pretty different though.  The lgroup affinities mostly affect the order of preference for a thread's home lgroup.  In contrast, MCB strong and weak affinity affect where a thread must run or should prefer to run.  MCB affinities can cause the home lgroup of the thread to change to an lgroup that at least contains some of the specified CPUs, but it does not change the order of preference of home lgroups for the thread.
  3. More flexibility with inheritance
    • MCB has more flexibility with setting the inheritance of the MCB CPU affinities across fork(2), exec(2), or thr_create(3C).  It allows you to enable or disable inheritance of its CPU affinities separately across fork(2), exec(2), or thr_create(3C).

In contrast, the pre-existing APIs for binding threads to a CPU or a processor set make the bindings always be inherited across fork(2), exec(2), and thr_create(3C) so you can never disable any of the inheritance.  With lgroup affinities, you can enable or disable inheritance for fork(2), exec(2), and thr_create(3C), but you must enable or disable inheritance across all or none of these operations.

How is MCB used in Solaris?

Solaris optimizes performance for I/O on Non Uniform Memory Access (NUMA) machines where some I/O devices are closer to some CPUs and memory than others.  Part of what Solaris does for its NUMA I/O optimizations is place kernel I/O helper threads that help usher I/O from the application to the I/O device and vice versa near the I/O device.

Before Solaris 11.2, Solaris would bind each I/O helper thread to one CPU near its corresponding I/O device.  Unfortunately, this can cause performance issues when the CPU where the I/O helper thread is bound becomes very busy running higher priority threads or handling interrupts.  Since the I/O helper thread is bound to just one CPU, it can only run on that CPU and may have to wait a long time to run.  This can cause I/O performance to go down because the I/O takes longer to process.

In S11.2, MCB is used to overcome this problem by affining each I/O helper thread to one or more processor cores.  This gives the I/O helper threads more places to run and reduces the chance that they get stuck on a very busy CPU.  Also, MCB weak affinity can be used to specify that the I/O helper threads prefer to run on the specified CPUs but it is ok to run them on the closest available CPUs if the specified CPUs are too busy.



pbind(1M) is an existing tool to control and query the bindings of processes or LWPs to a CPU and has been modified to support affining threads to more than one CPU.

When specifying target CPUs, the user can either use their processor IDs directly or refer to them indirectly through a Processor Group (PG) or Locality Group (lgroup) ID.

Bind processes/LWPs

Below are equivalent ways of binding process 101048 to CPU 1. By default, the binding target type is CPU, the idtype is pid, and the binding affinity is strong:

    # pbind -b 1 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

    # pbind -b -c 1 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

    # pbind -b -c 1 -i pid 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

    # pbind -b -c 1 -s -i pid 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

Bind processes/LWPs to CPUs specified by Processor Group or Locality Group

    Binding process 101048 to the CPUs in Processor Group 1:

    # pbind -b -g 1 101048

    pbind(1M): pid 101048 strongly bound to Processor Group(s) 1

    Binding process 101048 to the CPUs in Locality Group 2:

    # pbind -b -l 2 101048

    pbind(1M): pid 101048 strongly bound to Locality Group(s) 0 2.

Weak binding

    # pbind -b 2 -w 101048

    pbind(1M): pid 101048 weakly bound to processor(s) 2.

Negative binding targets

    Weakly binding process 101048 to all CPUs but the ones in Processor Group 1:

    # pbind -b -g 1 -n -w 101048

    pbind(1M): pid 101048 weakly bound to Processor Group(s) 2.

Binding LWPs

When the user binds a process to the specified CPUs, all the LWPs belonging to that process are automatically bound to those CPUs. The user may also bind LWPs in the same process individually. LWPs are specified after a ‘/’ as comma-separated IDs or ranges.

    Strongly binding LWPs 2, 3, and 4 of process 116936 to CPU 2:

    # pbind -b -c 2 -i pid 116936/2-3,4

    pbind(1M): LWP 116936/2 strongly bound to processor(s) 2.

    pbind(1M): LWP 116936/3 strongly bound to processor(s) 2.

    pbind(1M): LWP 116936/4 strongly bound to processor(s) 2.

Query processes/LWPs binding

When querying for bindings of specific LWPs, the user may request that the resulting set of CPUs be identified through their IDs, the Processor Groups or the Locality Groups that contain them:

    # pbind -q 101048

    pbind(1M): pid 101048 weakly bound to processor(s) 2 3.

    # pbind -q -g 101048

    pbind(1M): pid 101048 weakly bound to Processor Group(s) 2.

    # pbind -q -l 101048

    pbind(1M): pid 101048 weakly bound to Locality Group(s) 0 2.

The user may also query all bindings for a specified CPU:

    # pbind -Q 2

    pbind(1M): LWP 101048/1 weakly bound to processor(s) 2 3.

    pbind(1M): LWP 102122/1 weakly bound to processor(s) 2 3.

Binding Inheritance

By default, bindings are inherited across exec(2), fork(2) and thr_create(3C), but inheritance across any of these can be disabled.  For example, the user could bind a shell process to a set of CPUs and specify that the binding is not inherited across fork(2).  That way, processes created by this shell are not bound to any CPUs.

    Bind a process/LWPs but request that the binding not be inherited across fork(2):

    # pbind -b -c 2 -f 101048                      

    pbind(1M): pid 101048 strongly bound to processor(s) 2.

Return values are explained in the man page. For more details, please refer to the pbind(1M) man page.



MCB introduces a new processor_affinity(2) system call to control and query the affinity to CPUs for processes or LWPs.

    int processor_affinity(procset_t *ps, uint_t *nids, id_t *ids, uint32_t *flags);

Each option and flag used in pbind(1M) maps directly to processor_affinity(2).  Similarly, the user may request the binding to be either strong or weak by specifying the flag PA_AFF_STRONG or PA_AFF_WEAK.  The target CPUs can be specified by their processor IDs, Processor Group (PG) ID, or Locality Group (lgroup) ID when used with the corresponding flag PA_TYPE_CPU, PA_TYPE_PG, or PA_TYPE_LGRP.

The ps argument identifies the LWP(s) to which the call applies, through a procset structure (see procset.h(3HEAD) for details).  The flags argument must contain a valid combination of the options given in the man page.

When setting affinities, the nids argument points to a location holding the number of CPU, PG or LGRP identifiers for which affinity is being set, and ids points to an array of those identifiers.  Exactly one affinity type must be specified, along with one affinity strength.  Negative affinity is a type modifier indicating that the given IDs should be avoided and that affinity of the specified type should be set for all of the other processors in the system.

When specifying multiple LWPs, the threads should all be bound to the same processor set, since they can only be affined to CPUs in their processor set.  Additionally, setting affinities succeeds if processor_affinity(2) is able to set an LWP's affinity for any of the specified CPUs, even if a subset of the specified CPUs is invalid, offline, or faulted.

Setting strong affinity for CPUs [0-3] to the current LWP:

    #include <stdio.h>
    #include <sys/processor.h>
    #include <sys/procset.h>
    #include <thread.h>

    procset_t ps;
    uint_t nids = 4;
    id_t ids[4] = { 0, 1, 2, 3 };
    uint32_t flags = PA_TYPE_CPU | PA_AFF_STRONG;

    /* Select only the calling LWP of the current process. */
    setprocset(&ps, POP_AND, P_PID, P_MYID, P_LWPID, thr_self());

    if (processor_affinity(&ps, &nids, ids, &flags) != 0) {
        fprintf(stderr, "Error setting affinity.\n");
    }



Setting weak affinity for CPUs in Processor Group 3 and 7 to process 300's LWP 2:

    #include <stdio.h>
    #include <sys/processor.h>
    #include <sys/procset.h>
    #include <thread.h>

    procset_t ps;
    uint_t nids = 2;                /* two Processor Group IDs follow */
    id_t ids[2] = { 3, 7 };
    uint32_t flags = PA_TYPE_PG | PA_AFF_WEAK;

    /* Select LWP 2 of process 300. */
    setprocset(&ps, POP_AND, P_PID, 300, P_LWPID, 2);

    if (processor_affinity(&ps, &nids, ids, &flags) != 0) {
        fprintf(stderr, "Error setting affinity.\n");
    }



Upon a successful query, nids will contain the number of CPUs, PGs or LGRPs for which the specified LWP(s) has affinity.  If ids is not NULL, processor_affinity(2) will store the IDs of the indicated type up to the initial nids value.  Additionally, flags will return the affinity strength and whether any type of inheritance is excluded.

When querying affinities, PA_TYPE_CPU, PA_TYPE_PG or PA_TYPE_LGRP may be specified to indicate that the returned identifiers must be the CPUs, Processor Groups, or Locality Groups that contain the processors for which the specified LWPs have affinity.  If no type is specified, the interface defaults to CPUs.

Querying and printing affinities for the current LWP:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/processor.h>
    #include <sys/procset.h>
    #include <thread.h>

    procset_t ps;
    uint_t nids = 0;
    id_t *ids;
    uint32_t flags = PA_QUERY;
    uint_t i;

    setprocset(&ps, POP_AND, P_PID, P_MYID, P_LWPID, thr_self());

    /* First call: pass NULL for ids to learn how many IDs there are. */
    if (processor_affinity(&ps, &nids, NULL, &flags) != 0) {
        fprintf(stderr, "Error querying number of ids.\n");
    } else {
        fprintf(stderr, "LWP %d has affinity for %u CPUs.\n",
            (int)thr_self(), nids);
    }

    /* Second call: allocate room for the IDs and fetch them. */
    flags = PA_QUERY;
    ids = calloc(nids, sizeof (id_t));
    if (processor_affinity(&ps, &nids, ids, &flags) != 0) {
        fprintf(stderr, "Error querying ids.\n");
    }

    if (nids == 0)
        printf("Current LWP has no affinity set.\n");
    else
        printf("Current LWP has affinity for the following CPU(s):\n");

    for (i = 0; i < nids; i++)
        printf(" %d", (int)ids[i]);


When clearing affinities, the caller can either specify a set of LWPs that should have their affinities revoked (through the ps argument), or pass no LWPs and instead specify a list of CPU, PG or LGRP identifiers for which all affinities must be cleared.  The example that follows uses the second form.

Clearing all affinities for CPUs 5 and 7:

    #include <stdio.h>
    #include <sys/processor.h>
    #include <sys/procset.h>
    #include <thread.h>

    uint_t nids = 2;
    id_t ids[2] = { 5, 7 };
    uint32_t flags = PA_CLEAR | PA_TYPE_CPU;

    /* No procset is given: clear every affinity for the listed CPUs. */
    if (processor_affinity(NULL, &nids, ids, &flags) != 0) {
        fprintf(stderr, "Error clearing affinity.\n");
    }

The return values are documented in the man page.  For more details, refer to processor_affinity(2).


processor_bind(2) binds processes or LWPs to a single CPU.  The interface is the same as in earlier Solaris versions, but its implementation has changed significantly to use MCB.  processor_bind(2) and processor_affinity(2) are implemented the same way, differing only in the limitations imposed by the number and types of arguments each accepts.  Calls to processor_bind(2) are essentially calls to processor_affinity(2) that allow setting and querying a binding to only a single CPU at a time.

    int processor_bind(idtype_t idtype, id_t id, processorid_t new_binding, processorid_t *old_binding);

This function binds the LWP (lightweight process) or set of LWPs specified by idtype and id to the processor specified by new_binding. If old_binding is not NULL, it will contain the previous binding of one of the specified LWPs, or PBIND_NONE if none were previously bound.

For more details, please refer to the manpage of processor_bind(2).

Wednesday Dec 10, 2014

Which Oracle Solaris Virtualization?

From time to time, as the product manager for Oracle Solaris Virtualization, I get asked by customers which virtualization technology they should choose. There are probably two main reasons for this.

  1. Choice: Oracle Solaris provides a choice of virtualization technologies so you can tailor your virtual infrastructure to best fit your application, rather than having to force (and hence compromise) your application to fit a single option
  2. No way back: There is a perception that once you make your choice, if you get it wrong there is no way back (or only a very difficult way back), so it is really important to make the right choice

Understandably, there is occasionally a lot of angst around this decision but, as always with Oracle Solaris, there is good news. First, the choice isn't as complex as it first seems, and below is a diagram that can help you get a feel for that choice. We now have many, many customers that are discovering that the combination of Oracle Solaris Zones inside OVM Server for SPARC instances (Logical Domains) gives them the best of both worlds.

Second, with Unified Archives in Oracle Solaris 11.2 you always have a way back. With a Unified Archive you can move from a Native Zone to a Kernel Zone to a Logical Domain to Bare Metal and any and all combinations in between. You can test which is the best type of virtualization for your applications and infrastructure and, if you don't like it, change to another type in a few minutes.

BTW if you want a more in-depth discussion of virtualization and how to best utilize it for consolidation, check out the Consolidation Using Oracle's SPARC Virtualization Technologies white paper.  

Wednesday May 07, 2014

Solaris-specific Providers for Puppet

As I mentioned in my previous post about Puppet, there are some new Solaris-specific Resource Types for Puppet 3.4.1 in Oracle Solaris 11.2.  All of these new Resource Types and Providers have been available on java.net since integration into the FOSS projects gate.  I am actively working with Puppet Labs to get this code pushed back upstream so that it's available for anybody to work with.

Here's a small description of a few (of 23) of the new Resource Types:

  • boot_environment
    • name - The boot_environment name (#namevar)
    • description - Description for the new boot environment
    • clone_be - Create a new boot environment from an existing inactive boot environment
    • options - Create the datasets for a new boot environment with specific ZFS properties.  Specified as a hash
    • zpool - Create the new boot environment in the specified zpool
    • activate - Activate the specified boot environment
  • pkg_publisher
    • name - The publisher name (#namevar)
    • origin - Which origin URI(s) to set.  For multiple origins, specify them as a list
    • enable - Enable the publisher
    • sticky - Set the publisher 'sticky'
    • searchfirst - Set the publisher first in the search order
    • searchafter - Set the publisher after the specified publisher in the search order
    • searchbefore - Set the publisher before the specified publisher in the search order
    • proxy - Use the specified web proxy URI to retrieve content for the specified origin or mirror
    • sslkey - The client SSL key
    • sslcert - The client SSL certificate
  • vnic
    • name - The name of the VNIC (#namevar)
    • temporary - Optional parameter that specifies that the VNIC is temporary
    • lower_link - The name of the physical datalink over which the VNIC is operating
    • mac_address - Sets the VNIC's MAC address based on the specified value
  • dns
    • name - A symbolic name for the DNS client settings to use.  This name is used for human reference only
    • nameserver - The IP address(es) the resolver is to query.  A maximum of 3 IP addresses may be specified.  Specify multiple addresses as a list
    • domain - The local domain name
    • search - The search list for host name lookup.  A maximum of 6 search entries may be specified.  Specify multiple search entries as a list
    • sortlist - Addresses returned by gethostbyname() to be sorted.  Entries must be specified in IP 'slash notation'.  A maximum of 10 sortlist entries may be specified.  Specify multiple entries as an array.
    • options - Set internal resolver variables.  Valid values are debug, ndots:n, timeout:n, retrans:n, attempts:n, retry:n, rotate, no-check-names, inet6.  For values with 'n', specify 'n' as an integer.  Specify multiple options as an array.

Other Resource Types are:

  • Datalink Management: etherstub, ip_tunnel, link_aggregation, solaris_vlan
  • IP Network Interfaces: address_object, address_property, interface_properties, ip_interface, ipmp_interface, link_properties, protocol_properties, vni_interface
  • pkg(5) Management: pkg_facet, pkg_mediator, pkg_variant
  • Naming Services: nis, nsswitch, ldap

The zones Resource Type has been updated to provide Kernel Zone and archive support as well.

Tuesday May 06, 2014

OpenSSL on Oracle Solaris 11.2

I'm sure you all wonder which version of OpenSSL is delivered with Oracle Solaris 11.2?
The answer is the latest and greatest OpenSSL 1.0.1h!

Now that I answered 80% of the questions you may have with regard to OpenSSL, I would like to announce three major features added to the Oracle Solaris 11.2 which I'm sure you'll all be excited to hear :-)

Inlined T4/T4+ instructions support and Engines

Background: S11.1 and earlier

Years and years ago, I worked on the SPARC T2/T3 crypto drivers.  On the SPARC T2/T3 processors, the crypto instructions are privileged; and therefore, the drivers are needed to access those instructions.  Thus, to make use of T2/T3 crypto hardware, OpenSSL had to use pkcs11 engine which adds lots of cycles going through the thick PKCS#11 session/object management layer, Solaris kernel layer, hypervisor layer to the hardware, and all the way back.  However, on SPARC T4/T4+ processors, crypto instructions are no longer privileged; and therefore, you can access them directly without drivers.  Valerie Fenwick has a nice article explaining the lower level specifics of the T4 hardware.

What does that mean to you?  Much improved performance!  No more PKCS#11 layer, no more copy-in/copy-out of the data from userland to kernel space, no more scheduling, no more hypervisor, NADA!  As much as I enjoyed working on the crypto drivers, I'm happy to see this driver-less transition! ;-)

Dan Anderson has a great blog entry describing the difference between the T3 and T4 based hardware.  As he described, on Solaris 11 and 11.1, we made the T4 instructions available to OpenSSL via the OpenSSL engine mechanism.  It was great for the time being, but to make T4 instruction support available directly from the OpenSSL website and to bypass the engine layer altogether, I was assigned to assassinate the t4 engine (Sorry, Dan) and embed the T4 instructions in OpenSSL's internal crypto module (a.k.a. adding inlined T4 instruction support).

S11.2 and beyond

As I was learning how OpenSSL development worked, I learned OpenSSL upstream engineers had already committed the inlined T4 instruction support to the OpenSSL 1.0.2 branch.  (Thanks for making my life easier, OpenSSL team!)  I was job-less for a second, but since OpenSSL 1.0.2 won't be available in time for Solaris 11.2 delivery, we decided to patch the inlined T4 instruction support to our OpenSSL 1.0.1g delivery bundled with Solaris 11.2.

With this change, you get T4/T4+ instruction support without engines; by default, you therefore get performance as good as the t4 engine's, and even better performance for some algorithms (e.g. SHA-1, MD5).

Other Engines

Oracle Solaris 11.2  killed not only the t4 engine, but also the aesni engine and the devcrypto engine.   The story for the aesni engine is pretty much similar to the one for the t4 engine.   It was introduced in Solaris 11 as Dan Anderson described in his article, and killed in Solaris 11.2.  AES-NI instruction support is now embedded in the OpenSSL upstream implementation (OpenSSL 1.0.1); and therefore, the separate engine is no longer needed.  The devcrypto engine was removed simply due to the lack of use.

With all these changes, the Oracle Solaris 11.2 OpenSSL is left with one and only one engine: pkcs11.  The pkcs11 engine is still necessary on T2/T3 platforms and on any platform with a hardware keystore (e.g. SCA 6000).  However, be sure to leave the pkcs11 engine disabled on T4/T4+ if you want maximum performance.  Again, I would like to emphasize that OpenSSL performance on T4/T4+ platforms is looking MUCH better than on T2/T3 platforms!  It's time to move on to the T4/T4+ platform, Y'all!!

OpenSSL FIPS-140 version support

It is important for many federal and financial service customers to have their cryptographic products FIPS-140 validated. The Oracle Solaris Cryptographic Framework recently achieved a FIPS 140-2 validation (yay!!), and it was very important to deliver a FIPS-140 validated OpenSSL with Solaris 11.2.

At the time Solaris 11 was released, OpenSSL 1.0.0 was the latest OpenSSL version available, and since OpenSSL 1.0.0 was not FIPS-140 validated, we only delivered non-FIPS-140 version of OpenSSL with Solaris 11.

Thanks to the OpenSSL upstream team (again), the best and greatest OpenSSL 1.0.1 can be compiled with a FIPS-140 validated module, and we are now delivering the FIPS-140 version of OpenSSL in addition to the non-FIPS-140 version of OpenSSL with Solaris 11.2.

When do you want to use FIPS-140 version of OpenSSL?

It's probably important to mention that the FIPS-140 version of OpenSSL is not for everybody.  FIPS-140 validated versions of cryptographic products come with a price tag.  Enabling FIPS-140 mode adds a lot of cycles at run time to satisfy the FIPS-140 verification requirements (e.g. POST, pair-wise consistency test, continuous RNG test, etc.).  In addition, inlined T4/T4+ instruction support is not available in the FIPS-140 version of OpenSSL, so you won't get the best performance when FIPS-140 mode is enabled.

That said, I recommend enabling FIPS-140 mode *only if* you need to.  The good news is that you will get the FIPS-140 compatible implementation even when FIPS-140 mode is disabled.  It's just that it runs much faster!  That's one of the reasons why the non-FIPS-140 version of OpenSSL is activated by default.

How to enable FIPS-140 version of OpenSSL

If you decide to enable FIPS-140 mode, here is how you can switch to the FIPS-140 version of OpenSSL.

Make sure you have the FIPS-140 version of the OpenSSL installed on the system.

# pkg mediator -a openssl
openssl  vendor            vendor     default
openssl  system            system     fips-140

To activate the fips-140 implementation
# pkg set-mediator -I fips-140 openssl

To check the currently activated OpenSSL implementation
# pkg mediator openssl

To change back to the default (non-FIPS-140) implementation
# pkg set-mediator -I default openssl

OpenSSL Thread and Fork Safety

OpenSSL provides an interface CRYPTO_set_locking_callback() for you (any application or library) to set your own locking callback function with the mutexes of your choice.
That sounds reasonable if the OpenSSL library is used only by applications.  However, when the OpenSSL library is used by another library, such design is asking for trouble.

We've seen a case where an OpenSSL application used a library which set a locking callback function, and the library got unloaded while the application continued using the OpenSSL library.  The application got a segfault because OpenSSL tried to reference the invalid locking callback function set by the unloaded library.  Whose fault is this?

You can argue that the library should have set the locking callback to NULL when it was unloaded.
Well, not quite.  Once the locking callback is set to NULL, the application is no longer thread-safe.

OpenSSL needed some changes to make applications and libraries thread and fork safe.

To fix this issue, the OpenSSL library (libcrypto.so) delivered with Solaris 11.2 sets up mutexes and a locking callback internally, and it ignores an attempt to set/change the locking callback.

What does that mean to you?
OpenSSL is now thread and fork safe by default.  You don't need to make any modification to your application or library.  You can relax and have a margarita or two.

That's all I have for now.

Note:  The version number delivered with Solaris 11.2 was updated from 1.0.1g to 1.0.1h on Jun 05, 2014. OpenSSL version 1.0.1g was delivered with Solaris 11.2 Beta.

Puppet Configuration in Solaris

What is Puppet?

Puppet is IT automation software that helps system administrators manage IT infrastructure. It automates tasks such as provisioning, configuration, patch management and compliance. Repetitive tasks are easily automated, deployment of critical applications occurs rapidly, and required system changes are proactively managed. Puppet scales to meet the needs of the environment, whether it is a simple deployment or a complex infrastructure, and works on-premise or in the cloud.

Puppet is now available as part of Oracle Solaris 11.2!

Use ntpdate or ntpd -q to set the date

Puppet can error out with some very strange messages if the clocks on both the master and agent aren't synchronized.  You can use ntpdate or ntpd -q to set the date just once if you'd like to manage the NTP service with Puppet, or you can configure NTP.

Install the required packages on both systems 

# pkg install puppet

This will install the puppet, facter and ruby-19 packages.

Configure the Puppet SMF instances

master # svccfg -s puppet:master setprop config/server = master.fqdn.company.com
master # svccfg -s puppet:master refresh
master # svcadm enable puppet:master

agent # svccfg -s puppet:agent setprop config/server = master.fqdn.company.com
agent # svccfg -s puppet:agent refresh

Test the connection to the master and configure authentication

Before enabling the puppet:agent service, you'll want to test the connection first in order to set up authentication.

agent # puppet agent --test --server master.fqdn.company.com

Info: Creating a new SSL key for agent.fqdn.company.com
Info: Caching certificate for ca
Info: Creating a new SSL certificate request for agent.fqdn.company.com
Info: Certificate Request fingerprint (SHA256):
**Exiting; no certificate found and waitforcert is disabled**

Now that the agent has created a new SSL key, authorization needs approval on the master.

Sign the SSL certificate on the master

master # puppet cert list
  "agent.fqdn.company.com" (SHA256)

master # puppet cert sign agent.fqdn.company.com
Notice: Signed certificate request for agent.fqdn.company.com
Notice: Removing file Puppet::SSL::CertificateRequest agent.fqdn.company.com at

Retest the agent to ensure it can connect

agent # puppet agent --test --server master.fqdn.company.com
Info: Caching certificate for agent.fqdn.company.com
Info: Caching certificate_revocation_list for ca
Info: Retrieving plugin
Info: Caching catalog for agent.fqdn.company.com
Info: Applying configuration version '1371232699'
Notice: Finished catalog run in 0.65 seconds

Enable the agent service

agent # svcadm enable puppet:agent

Additional configuration of /etc/puppet/puppet.conf on both master and agent (optional) 

Further customizations can be made in /etc/puppet/puppet.conf.  See Puppet's Configurables page for more details.

NOTE:  Puppet's configuration is done entirely via SMF stencils.  /etc/puppet/puppet.conf should not be edited directly, as any edits will be lost when the Puppet SMF service (re)starts.  Set a new value via svccfg(1M) instead:

# svccfg -s puppet:agent setprop config/<option> = <value>

# svccfg -s puppet:agent refresh

(substitute :master as needed)
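
If you set several options from a script, the two svccfg steps above can be wrapped in a small helper. The following Python sketch only builds the argument lists and prints them; the function name puppet_setprop_commands is an assumption made for illustration, and nothing here is part of Puppet or SMF.

```python
def puppet_setprop_commands(instance, option, value):
    """Build the svccfg command lines for one Puppet setting.

    instance is "agent" or "master". The commands are returned as
    argument lists and are NOT executed by this sketch.
    """
    fmri = "puppet:" + instance
    return [
        # Same tokens as: svccfg -s puppet:agent setprop config/<option> = <value>
        ["svccfg", "-s", fmri, "setprop", "config/" + option, "=", value],
        # Same tokens as: svccfg -s puppet:agent refresh
        ["svccfg", "-s", fmri, "refresh"],
    ]


for argv in puppet_setprop_commands("agent", "server", "master.fqdn.company.com"):
    print(" ".join(argv))
```

On a Solaris 11.2 system, each returned list could then be handed to subprocess.run(argv, check=True) by a suitably privileged user.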

Tuesday Apr 29, 2014

New in SMF Documentation for Oracle Solaris 11.2

The Service Management Facility guide is all new for the Oracle Solaris 11.2 release, with much more information including an example of creating a pair of services that start and stop an Oracle Database instance and an examination of the Puppet stencil service.

For more information about stencil services, see Solaris SMF Weblog, and see the svcio.1 and smf_stencil.4 man pages below.

Managing System Services in Oracle Solaris 11.2

Chapter 1, "Introduction to the Service Management Facility"

Chapter 2, "Getting Information About Services"
- Service states and contract processes
- Service dependencies and dependents
- New -L option to show service log files
- Property values in layers, snapshots, and customizations

Chapter 3, "Administering Services"
- Starting, restarting, stopping
- Re-reading configuration
- Configuring notification

Chapter 4, "Configuring Services"
- Setting and adding property values
- Adding service instances
- Using profiles to configure multiple systems

Chapter 5, "Using SMF to Control Your Application"
- Creating a service to start or stop an Oracle Database instance
- Using a stencil to create a configuration file

Appendix A, "SMF Best Practices and Troubleshooting"
- Repairing an instance that is in maintenance
- Diagnosing and repairing repository problems
- How to investigate problems starting services at system boot

User Commands                                            svcio(1)

     svcio - create text files  based  on  service  configuration

     /lib/svc/bin/svcio [-alux] [-f FMRI-instance] [-g group]
          [-i file] [-m mode] [-o file] [-O owner]
          [-R dir [-L opts [-p]]] [-S dir]

     The svcio utility reads a template known as  a  stencil  and
     emits  text  based on that file in conjunction with the pro-
     perties from a service instance.  In the typical case, svcio
     is used to generate application-specific configuration files
     for services that are managed by, but are not able  to  read
     their configurations from, SMF.

     If the stencil itself contains any errors, svcio  will  pro-
     vide  a  snippet  of  text  along with a line number and the
     cause of the error.  Unless the error would prevent  further
     progress,  each  error  is printed in the order it occurs in
     the file.

     Error messages are printed to the standard error stream.

     The following options are supported:


     -a

         Process all files configured for an instance.

         Specifically, svcio will look  at  all  property  groups
         with  the  type "configfile" and determine which stencil
         to use and where to write the resulting file by examining
         the values of the properties "path" and "stencil" within
         that property group.  For  example,  if  property  group
         "conf1"  is  of the appropriate type then svcio will use
         the value of "conf1/stencil" as the path of the  stencil
         file  and  "conf1/path" as the path of the file to which
         to write the output.  Additionally, the optional proper-
         ties  "owner"  and  "group" can be used to set the owner
         and group of the output file respectively. If  the  pro-
         perty  group  name  or property name contains a reserved
         character (see smf(5)) then it must be encoded.

     -f FMRI-instance

         The FMRI of  a  service  instance  to  run  the  stencil

     -g group

         The group to associate the output files with

     -i file

         The path to the stencil file (default is  stdin).   This
         option cannot be used with -a.


     -l

         Rather than outputting a text file, simply list all
         properties that would be referenced were a file to be
         output.

     -L opts

         Specify options to be passed to mount(2)  when  loopback
         mounting output files.  If this option is not specified,
         output files will  not  be  loopback  mounted.   The  -R
         switch  is  required  with  this option.  A regular file
         will be written to the specified output path, rooted  at
         prefix. This file will be loopback mounted to the
         specified output path, rooted at / or the value of the -R
         option.

     -m mode

         Set the mode for any output file (default is 644).

     -o file

         The path to the output file (default is  stdout).   This
         option cannot be used with -a.

     -O owner

         Set the owner of the output files

     -R prefix

         Set the root prefix for all output files.


     -p

         Create nonexistent intermediate directories in the
         output file path rooted at the value of the -R option.
         Note:  This option will not create directories that  are
         missing in the path to the mount point.

     -S dir

         Look for stencils in  this  directory  rather  than  the


     -u

         Unlink output files and undo loopback mounting.  No
         output files will be created.


     -x

         Terminate svcio on the first error rather than
         continuing to the next stencil.
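
As a rough illustration of the -a behavior described in the options above, the following Python sketch models how property groups of type "configfile" might be turned into a list of stencil and output path pairs. The dictionary layout, the function name configfile_jobs, and the sample property values are assumptions made for illustration; they are not part of svcio or libscf, and the real utility reads the SMF repository instead.

```python
def configfile_jobs(property_groups):
    """Return one job per property group of type "configfile".

    property_groups maps a property group name to a dict holding a
    "type" string and a "props" dict. This is a hypothetical in-memory
    stand-in for an instance's properties in the SMF repository.
    """
    jobs = []
    for name in sorted(property_groups):
        pg = property_groups[name]
        if pg.get("type") != "configfile":
            continue  # only "configfile" property groups are examined
        props = pg.get("props", {})
        if "stencil" not in props or "path" not in props:
            # Assumption of this sketch: incomplete groups are skipped.
            continue
        jobs.append({
            "pg": name,
            "stencil": props["stencil"],   # value of <pg>/stencil
            "path": props["path"],         # value of <pg>/path
            "owner": props.get("owner"),   # optional
            "group": props.get("group"),   # optional
        })
    return jobs


example = {
    "conf1": {"type": "configfile",
              "props": {"stencil": "app.conf.stencil",
                        "path": "/etc/app.conf",
                        "owner": "root"}},
    "general": {"type": "framework", "props": {"enabled": "true"}},
}
print(configfile_jobs(example))
```

Only "conf1" produces a job here, because "general" is not of type "configfile".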

     The following operands are supported:


         A  fault  management  resource  identifier  (FMRI)  that
         specifies  one or more instances (see smf(5)). FMRIs can
         be abbreviated by specifying the instance name,  or  the
         trailing portion of the service name. For example, given
         the FMRI:


         The following are valid abbreviations:


         The following are invalid abbreviations:


         If the  FMRI  specifies  a  service,  then  the  command
         applies  to  all  instances of that service, except when
         used with the -D option.

         Abbreviated forms of FMRIs are unstable, and should  not
         be used in scripts or other permanent tools.


         An FMRI that specifies an instance.

     Example 1 Processing All Configuration Files for an Instance

     This example processes all  configured  configuration  files
     for an instance:

       example% svcio -a -f svc:/service:instance

     Example 2 Removing All Configuration Files for an Instance

     This example unlinks and unmounts all configured  configura-
     tion files for an instance:

       example% svcio -au -f svc:/service:instance

     Example 3 Using an Unconfigured Stencil for an Instance

     This example produces an output  file  based  on  a  stencil
     that has not been configured:

       example% svcio -o /etc/svc.conf -i ~/svc.stencil \
       -f svc:/service1:instance

     The following exit values are returned:


         Successful command invocation.


         A fatal error occurred as a result of a failed system


         Invalid command line options were specified.


         A fatal error occurred as a result of an unexpected SMF


         An error occurred parsing a stencil.

     See attributes(5) for descriptions of the following attri-

    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    | Availability                | system/core-os              |
    | Interface Stability         | Committed                   |

     smf_stencil(4), svcs(1), svcprop(1), svcadm(1M), svccfg(1M),
     svc.startd(1M), stat(2), libscf(3LIB), smf(5)

File Formats                                       smf_stencil(4)

     smf_stencil - defines the relationship between  SMF  proper-
     ties and a flat configuration file

     A stencil file defines a mapping between SMF properties  and
     flat text files.  The Service Management Facility, described
     in smf(5),  uses  stencil  files  in  conjunction  with  the
     svcio(1)  utility to generate text-based configuration files
     from SMF properties by invoking svcio(1)  before  the  start
     and  refresh  methods  of  a property configured service are

     The language understood by svcio(1) is comprised of a  small
     set  of  expressions  that  can  be  combined  to  concisely
     describe the structure of a configuration file  and  how  to
     populate  that  file with data from the SMF repository.  The
     expressions comprising the language are listed below:

     I.    $%{property_fmri[:<transform><transform_expression>]}

       Retrieve and emit the value(s) associated with a property.

       <transform> can be one of the following characters,  which
       define how to handle <transform_expression>:

       -   emit <transform_expression> if  the  property  is  not
           defined

       +   emit <transform_expression> if the property is defined

       ?   <transform_expression>    is    of     the     form
           "<true>[/<false>]".  If the boolean property is true,
           then emit <true>, otherwise emit <false>.

       ,   emit <transform_expression>  as  a  delimiter  between
           values in multi-valued properties

       ^   <transform_expression>  is  of  the  form  "<p>[/<s>]"
           where  <p>  is  used as a prefix and <s> is used as a
           suffix when emitting property values

       ^*  Same as '^', but nothing is emitted if the property is
           undefined or empty

       '   <transform_expression>     takes      the      form
           "<pattern>/<replace>",  where  <pattern>  is  a shell
           pattern style glob (for details, see  the  File  Name
           Generation section of sh(1)).  The first substring to
           match <pattern> is replaced with <replace>

       ''  Same as ', but every substring that matches  <pattern>
           is replaced with <replace>

     II.   $%/regular_expression/ { <sub_elements> }

       Process <sub_elements> for each property FMRI and property
       group  FMRI  that  matches regular_expression. As the pro-
       perty group and property is specified as an FMRI they must
       be  encoded  if  they  contain  reserved  characters  (see
       smf(5)).

     III.  $%<number>

       Retrieve a marked subexpression from a regular expression.

     IV.   $%define name /regular_expression/ { <sub_elements> }

       Name a regular expression such that it can be  used  else-
       where in the stencil.

     V.    $%[regex_name[:<transform><transform_expression>]]

       Recall a previously defined regular expression (as in IV).
       In  this  case, the set of transform characters is limited
       to ^, ', and ''.

     VI.   $%define name arg1 arg2 ... argN { <sub_elements> }

       Name a macro such that it can be  used  elsewhere  in  the
       stencil.

       Note: In the text above, '[' and ']' denote the macro del-
       imiters  rather  than  optional parameters as they do in I
       and V.

     VII.  $%<arg_name>

       Retrieve the text associated with a macro argument.

     VIII. $%[name foo bar ... baz]

       Recall a previously defined macro (as in VI).

     IX.   $%<method_token>

       Retrieve the value of an environment variable  represented
       by a method token described in smf_method(5).

     X.    Literal Text

       Arbitrary text can  be  freely interspersed throughout the
       stencil without any denotative markers.

     XI.   ;comments

       A line that starts  with  a  ';',  ignoring  leading  whi-
       tespace,  is  considered a comment and not processed along
       with the rest of the file.
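
To make the transform characters of expression I more concrete, here is a minimal Python model of four of them (-, +, ? and ,). It is a sketch of the behavior as worded above, not svcio's implementation: the function name apply_transform, the use of None for an undefined property, and the space used to join values in the "-" case are all assumptions.

```python
def apply_transform(values, transform, expr):
    """Model a few stencil transforms from expression I.

    values is a list of strings (the property's values), or None when
    the property is not defined. expr is <transform_expression>.
    """
    if transform == "-":
        # Emit expr when the property is not defined. Emitting the
        # space-joined values otherwise is an assumption of this sketch.
        return expr if values is None else " ".join(values)
    if transform == "+":
        # Emit expr only when the property is defined.
        return expr if values is not None else ""
    if transform == "?":
        # expr has the form "<true>[/<false>]".
        true_text, _, false_text = expr.partition("/")
        is_true = bool(values) and values[0] == "true"
        return true_text if is_true else false_text
    if transform == ",":
        # expr is the delimiter between values of a multi-valued property.
        return expr.join(values or [])
    raise ValueError("transform not modeled in this sketch: " + transform)


print(apply_transform(["true"], "?", "on/off"))
print(apply_transform(["10.0.0.1", "10.0.0.2"], ",", ", "))
print(apply_transform(None, "-", "blueberry"))
```

The remaining transforms (^, ^*, ' and '') are left out on purpose, since their exact output for undefined or multi-valued properties is not spelled out in the text above.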

     Any of the special characters above can be escaped by
     preceding them with a backslash (\) character.  Additionally,
     the '\n' and '\t' sequences are expanded into newline and
     tab characters respectively.  Any non-special character
     preceded by '\' will emit only the character following the
     backslash.  Thus '\g' will be translated to 'g'.
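
The escape rules in the paragraph above can be sketched in a few lines of Python. This is a model of the described behavior only; the function name unescape is hypothetical, and keeping a lone trailing backslash unchanged is an assumption, since the text does not cover that case.

```python
def unescape(text):
    """Apply the stencil escape rules to a string.

    "\\n" and "\\t" become a newline and a tab; any other character
    that follows a backslash is emitted as-is without the backslash.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append({"n": "\n", "t": "\t"}.get(nxt, nxt))
            i += 2  # consume the backslash and the escaped character
        else:
            out.append(ch)  # ordinary character (or a trailing backslash)
            i += 1
    return "".join(out)


print(repr(unescape("\\g")))
print(repr(unescape("key\\tvalue\\n")))
```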

     I. $%{property_fmri[:<transform><transform_string>]}

       Example: $%{general/enabled:?on/off}

       This element will fetch the value (or values)  of  a  pro-
       perty  and  emit  a  string  subject to the transform, the
       transform string, and the values themselves.   <transform>
       is  a one- or two- character identifier that indicates how
       to modify a property value before emitting it, subject  to
       <transform_string>, as explained above.

       Note that nesting is allowed.  Imagine we  want  to  print
       the value of foo/b if foo/a is defined, but 'blueberry' if
       it is not.  This could be accomplished via the following:


       For the purposes of resolving FMRIs  into  values,  a  few
       shortcuts  are allowed.  Since svcio is always run against
       a specific instance, properties from that instance can  be
       shortened to "pg/prop" rather than a fully qualified FMRI.
       To  reference  properties  that  are  not  part   of   the
       instance,                     the                     full
       "svc:/service:instance/:properties/pg/prop" is required.

     II. $%/regular_expression/ { <sub_elements> }

       Example: $%/pg/(.*)/ {lorem ipsum}

       This element defines a regular expression to match against
       the  entire  set  of property FMRIs on a system.  For each
       property FMRI that matches, the subelements are evaluated.
       When evaluating subelements, svcio(1) iterates over match-
       ing properties in lexicographical  order.   svcio(1)  uses
       the  POSIX extended regular expression set (see regex(5)),
       and  supports  saving  subexpressions   via   parentheses.
       Finally,  as a convenience svcio will surround the regular
       expression with ^ and $ characters.  Should you want  your
       expression  to  match  the  middle of strings, prepend and
       append ".*".
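
       The effect of the implicit anchoring can be sketched in Python.
       This is only a model: Python's re module is not the POSIX
       extended regular expression engine that svcio(1) uses, and the
       helper name stencil_match is hypothetical.

```python
import re


def stencil_match(expression, fmri):
    """Model the implicit anchoring: wrap the expression in ^ and $.

    Assumption: this mimics the documented behavior only; it is not
    how svcio(1) is implemented.
    """
    return re.search("^" + expression + "$", fmri) is not None


# An unanchored-looking expression must still match the whole string.
print(stencil_match("prop", "pg/prop1"))      # False
# Prepending and appending ".*" matches the middle of the string.
print(stencil_match(".*prop.*", "pg/prop1"))  # True
```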

       Since a stencil can reference both properties associated with
       the operating instance and properties from other services or
       instances, regular expressions are matched against only a
       subset of FMRIs on the system.  If a regular expression
       includes the substring ":properties",  the  expression  is
       parsed for the service and/or instance where those proper-
       ties reside.  Once those properties are fetched, the regu-
       lar  expression  is matched only against that set.  If the
       regular expression does not contain  that  substring,  the
       only  properties  matched  are  those  associated with the
       operating instance.

       Note that the end of a regular expression is denoted by '/
       {'  so  it  is  not  necessary  to escape slash characters
       within the regular expression.

     III.  $%<number>

       Example: $%3

       This element emits the value from a  stored  subexpression
       in  a  preceding  regular  expression.  Using this element
       outside the context of a regular expression is  an  error.
       A valid use would be as follows:

       $%/foo/(.*)/ {
            $%1 = $%{foo/$%1}
       }

       In the preceding example, every property in property group
       foo would be emitted as "<property_name> = <property_value>".

       Since arbitrary subelements are allowed within  a  regular
       expression  block,  nested  regular expressions have their
       subexpression indices adjusted relative to  the  index  of
       the last subexpression of the containing expression.  For
       example:

       ;([a-zA-Z_-]*) is $%1
       $%/([a-zA-Z_-]*)/ {
            ;(.*) becomes $%2
            $%/$%1/(.*)/ {
                 $%2 = $%{$%1/$%2}
            }
       }

       In the preceding example,  every  property  group  for  an
       instance would be emitted in blocks as follows:

            prop1 = <prop1_value>
            prop2 = <prop2_value>
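
       The index adjustment for nested expressions can be modeled in
       a few lines of Python.  This is a sketch of the numbering rule
       only; the function name nested_group_index is hypothetical and
       Python's re module stands in for the POSIX engine.

```python
import re


def nested_group_index(outer_expression, inner_group):
    """Return the stencil index of a group in a nested expression.

    The inner group number is offset by the number of subexpressions
    in the containing (outer) expression.
    """
    outer_groups = re.compile(outer_expression).groups
    return outer_groups + inner_group


# The outer expression has one subexpression, so the first
# subexpression of the nested expression is referenced as $%2.
print(nested_group_index(r"([a-zA-Z_-]*)", 1))  # prints 2
```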

     IV.  $%define name /regular_expression/ { <sub_elements> }

       Example: $%define getProp //(.*)/ {dolor sit amet}

       This element follows the same basic rules as  element  II,
       but stores the element as a named regular expression  that
       can be invoked later in the stencil file.   Named  regular
       expressions are  not matched unless they are referenced as
       per element V, which immediately  follows.   Additionally,
       this element cannot be a child of any other.

     V. $%[regex_name:<transform><transform_string>]

       Example: $%[getProp:^restarter]

       This inserts  a  previously  defined  regular  expression,
       along  with all its subelements into the stencil as though
       the definition were copied and pasted.  Since the insertion
       is  performed literally, there are some special rules that
       govern how the insertion is done in order to allow such an
       element  to  be  meaningful  at  many levels of expression
       nesting.  First of  all,  all  subexpression  indices  are
       internally adjusted so that they do not collide with the
       outer regular expression context.  Second, a subset of the
       transformations   from   element  I  are  allowed.   These
       transforms operate on relative FMRIs within  the  inserted
       element.   Absolute FMRIs are left untouched.  This allows
       a stencil author to do useful things like prepend  to  the
       FMRI in order to express logical property nesting.  Here's
       an example:

       $%define PROPERTY /(.*)/ { $%1 = $%{$%1} }

       $%/([a-zA-Z_-]*)/ {

       When the insertion is done, the expression  will  function
       as follows:

       $%/([a-zA-Z_-]*)/ {
            $%/$%1/(.*)/ {
                 $%2 = $%{$%1/$%2}
            }
       }

       This is equivalent to the example in element III.

       It ends up this way because the rebasing during  substitu-
       tion changes the $%1 to $%2, since $%1 occurs in the outer
       expression.  And as a  result  of  the  prepend  transform
       applied   during   substitution,   the  string  "$%1/"  is
       prepended to both the regular expression (since regular
       expressions match FMRIs) and the element of type I,
       allowing it to resolve to a full pg/property specifier.
       The  subset  of allowed transforms is ^,',''.  Using other
       transforms is an error.
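
       The two steps described above, rebasing followed by the
       prepend transform, can be sketched in Python.  This models the
       documented result only and is not svcio(1)'s implementation;
       the function insert_named_regex and its parameters are
       hypothetical, and only the prepend (^) transform is modeled.

```python
import re


def insert_named_regex(regex, body, prefix, outer_groups):
    """Model inserting a named regular expression with a ^ transform.

    regex        -- the expression from the $%define, without slashes
    body         -- the sub-elements of the definition
    prefix       -- the transform string to prepend
    outer_groups -- number of subexpressions already in scope
    """
    # Step 1: rebase $%<number> references past the outer indices.
    def shift(match):
        return "$%" + str(int(match.group(1)) + outer_groups)

    body = re.sub(r"\$%(\d+)", shift, body)

    # Step 2: prepend the prefix to relative FMRIs inside $%{...}.
    # Absolute FMRIs (starting with "svc:/") are left untouched.
    def prepend(match):
        fmri = match.group(1)
        if fmri.startswith("svc:/"):
            return match.group(0)
        return "$%{" + prefix + fmri + "}"

    body = re.sub(r"\$%\{([^{}]*)\}", prepend, body)

    # The prefix is also prepended to the regular expression itself.
    return "$%/" + prefix + regex + "/ {" + body + "}"


print(insert_named_regex("(.*)", " $%1 = $%{$%1} ", "$%1/", 1))
# prints: $%/$%1/(.*)/ { $%2 = $%{$%1/$%2} }
```

       Rebasing runs before the prepend so that the "$%1" inside the
       prefix, which refers to the outer expression, is not shifted.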

     VI. $%define macroName arg1 arg2 ... argN { <sub_elements> }

       Example: $%define defaultHost { myMachine }
                $%define getGeneral prop { $%{general/$%prop} }

       Macros provide simple text substitution  with  respect  to
       the  arguments  defined for the macro.  When called subse-
       quent to definition, the text of the sub-elements is emit-
       ted  with  the  text  of  the  arguments substituted where
       appropriate.  See the elements below for more details.

     VII. $%<argName>

       Example: $%prop

       This element emits the corresponding value passed into the
       macro that uses argName as an argument.  For example:

       $%define someMacro someArg someOtherArg {
               $%someArg = $%{pg/$%someOtherArg}
       }

     VIII.  $%[macroName arg1 arg2 ... argN]

       Example: $%[getGeneral enabled]

       After a macro has been defined, the sub-elements it contains
       can be substituted into other parts of the stencil by using
       the form above.  When invoking a macro, spaces are used to
       delimit arguments.  To use a space within the value of an
       argument, escape that space with a '\'.  For example, if we
       have the macro:

       $%define theMacro variable value {
               $%variable = $%value
       }

       We can then use this form to substitute that text elsewhere
       in the stencil.  For example, we can call it as follows:

       $%[theMacro ciphers elGamal\ 3DES\ AES\ Blowfish]

       And the resulting text in the output file would be:

       ciphers = elGamal 3DES AES Blowfish
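
       The argument handling in this example can be modeled in
       Python as follows.  This is a sketch of the documented
       behavior, not svcio(1) code; split_args and expand_macro are
       hypothetical helper names.

```python
import re


def split_args(invocation):
    """Split on spaces that are not escaped with a backslash."""
    parts = re.split(r"(?<!\\) ", invocation)
    # Drop empty fields and turn each escaped space into a plain one.
    return [part.replace("\\ ", " ") for part in parts if part]


def expand_macro(params, body, args):
    """Substitute each $%<param> in body with the matching argument."""
    values = dict(zip(params, args))

    def substitute(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$%([A-Za-z_][A-Za-z0-9_]*)", substitute, body)


name, *args = split_args(r"theMacro ciphers elGamal\ 3DES\ AES\ Blowfish")
print(expand_macro(["variable", "value"], "$%variable = $%value", args))
# prints: ciphers = elGamal 3DES AES Blowfish
```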

     IX.  $%<method_token>

       Example: $%s

       Each of the single-character method  tokens  described  in
       smf_method(5) is available in stencils.  In particular,
       $%r, $%m, $%s,  $%i,  $%f,  and  $%%  are  understood  and
       expanded.   Due to the high chance of collision with macro
       variables (element VII), macro variables  have  precedence
       over method tokens when expansion occurs.  This means that
       if the macro variable $%someVar is encountered, it will be
       expanded  to  the value of $%someVar rather than 'service-
       nameomeVar'.  If output such  as  'service-nameomeVar'  is
       desired,  simply  escape a character in the macro variable
       as in $%s\omeVar.
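
       The precedence rule can be illustrated with a small Python
       model.  It is an assumption-laden sketch: the token values
       are placeholders, only $%s and $%i are modeled, only escapes
       of non-special characters are handled, and the function name
       expand is hypothetical.

```python
import re

# Placeholder values; a real run would use the actual service and
# instance names.
METHOD_TOKENS = {"s": "service-name", "i": "instance-name"}


def expand(text, macro_vars):
    """Expand $%name, giving macro variables precedence over tokens."""

    def replace(match):
        name = match.group(1)
        if name in macro_vars:
            # A macro variable with this name wins.
            return macro_vars[name]
        if name[0] in METHOD_TOKENS:
            # Otherwise the first character is a method token and
            # the rest of the name is literal text.
            return METHOD_TOKENS[name[0]] + name[1:]
        return match.group(0)

    expanded = re.sub(r"\$%([A-Za-z_][A-Za-z0-9_]*)", replace, text)
    # A backslash before a non-special character emits that character.
    return re.sub(r"\\(.)", r"\1", expanded)


print(expand("$%someVar", {"someVar": "42"}))     # prints: 42
print(expand("$%s\\omeVar", {"someVar": "42"}))   # prints: service-nameomeVar
```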

     X.  Literal text

       Example: Lorem ipsum dolor sit amet, consectetur adipisic-
                ing  elit,  sed  do  eiusmod tempor incididunt ut
                labore et dolore magna aliqua.

       Literal text can be freely interspersed within the stencil
       and  is emitted  without modification.  The examples above
       make limited use of literal text.  Text appearing inside a
       regular  expression  is emitted for each match, but is not
       emitted if there are no matches.  Text  appearing  outside
       all the preceding expression types is emitted in all
       cases.

     XI.  Comments

       Example: ;this is a comment
                     ;so is this
                \;this text will appear in the output file
                so will this, even with the ';' character

       To begin a comment, start the line with  a  ';'  character
       (not  including  whitespace).  The comment continues until
       the end of the line.  If having comments in the  resulting
       output file is desired, simply escape the ';' with a '\'.
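
       The comment rules can be modeled with a short Python sketch.
       It covers only comment removal and the escaped ';' case, and
       the function name strip_comments is hypothetical.

```python
def strip_comments(stencil_text):
    """Drop comment lines and unescape a leading escaped ';'."""
    output = []
    for line in stencil_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(";"):
            # First non-whitespace character is ';': a comment.
            continue
        if stripped.startswith("\\;"):
            # Escaped ';': keep the line, drop the backslash.
            indent = line[: len(line) - len(stripped)]
            line = indent + stripped[1:]
        output.append(line)
    return "\n".join(output)


sample = ";this is a comment\n     ;so is this\n\\;kept in output\nkept; too"
print(strip_comments(sample))
```

       A ';' that is not the first non-whitespace character on the
       line does not start a comment, so the last line is kept
       unchanged.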

       ;The following example creates a 'configuration file'
       ;that lists some details of the service
       $%define author {Alice}
       $%define reviewer {Bob}

       This file was written by $%[author] and verified by
       $%[reviewer]

       Preferences are $%{preferences/validated:+validated!}

       The following is a .ini style listing of all  the  proper-
       ties of service $%s and instance $%i:

       ;display a property in the form
       ;'   prop_name = prop_value'
       $%define display_property prop
       {\t$%prop = $%{/$%prop}\n}

       ;invokes display_property macro for each
       ;property matched
       $%define property //(.*)/ {$%[display_property $%1]}

       ;matches all property groups (lack of '/' prevents
       ;matching properties) and emits the property group
       ;name in brackets, with each property listed underneath.
       ;The expression '^$%1' means prepend all relative FMRIS
       ;in the regular expression named 'property' with the
       ;property group that satisfies this regular expression
       $%/([a-zA-Z0-9_-]*)/ {



     Suppose we have  a  service  'Foo'  with  just  the  default
     instance and the following properties:

       pg1/prop1 = val1
       pg1/prop2 = val2
       pg2/prop1 = val3 val4
       pg2/prop2 = val5
       preferences/validated = yes

     Applying svcio(1) to the example stencil with service 'Foo'
     would result in the following text:

       This file was written by Alice and verified by Bob

       Preferences are validated!

       The following is a .ini style listing of all  the  proper-
       ties of service Foo and instance default:

            prop1 = val1
            prop2 = val2

            prop1 = val3 val4
            prop2 = val5

            validated = yes

     It is also possible to  rewrite  the  example  stencil  more
     tersely, as shown below:

     $%define author {Alice}
     $%define reviewer {Bob}

     This file was written by $%[author] and verified by
     $%[reviewer]

     Preferences are $%{preferences/validated:+validated!}

     The following is a .ini style listing of all the  properties
     of service $%s and instance $%i:

     $%/([a-zA-Z0-9_-]*)/ {
     $%/$%1/(.*)/ {\t$%2 = $%{$%1/$%2}\n}
     }

     See attributes(5) for descriptions of the following attributes:

    |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
    | Availability                | system/core-os              |
    | Stability                   | Committed                   |

     svcio(1), sh(1), regex(5), svcs(1), svcprop(1),  svcadm(1M),
     svccfg(1M), svc.startd(1M), libscf(3LIB), smf(5)

New in IPS Documentation for Oracle Solaris 11.2

Documentation of the Image Packaging System on docs.oracle.com is in three books. All three books contain new information for the Oracle Solaris 11.2 release.

See also Tim Foster's Web Log


  • New pkg/mirror service
  • New pkg/depot service
  • New chapter about web server configuration, including a new section about configuring https access
  • New pkgrecv --clone option
  • New pkg install and pkg update troubleshooting section
  • New chapter about updating an image
  • New options for pkg subcommands:
    • -r: perform operation recursively on specified non-global zones
    • --sync-actuators: do not return until all actuators have finished
    • --ignore-missing: when updating or uninstalling, ignore packages that are not installed
  • New pkg exact-install command
  • New file attribute for setting system attributes

Copying and Creating Package Repositories in Oracle Solaris 11.2

Chapter 1, "Image Packaging System Package Repositories"
- New section about best practices

Chapter 2, "Copying IPS Package Repositories"
- Copying from a zip file (see also Release Engineering's blog) or iso file
- Using the pkgrecv command
- Using the new pkg/mirror service to automatically periodically update a repository

Chapter 3, "Providing Access To Your Repository"
- Using a ZFS share
- Using the pkg/server service

Chapter 4, "Maintaining Your Local IPS Package Repository"
- New repository update procedure
- Using the pkgrecv --clone option to clone a repository
- Using the new pkg/depot service to serve multiple repositories from a single location

Chapter 5, "Running the Depot Server Behind a Web Server"
- Caching, load balancing
- New section about configuring HTTPS repository access

Adding and Updating Software in Oracle Solaris 11.2

Chapter 1, "Introduction to the Image Packaging System"
- Incorporations and group packages, FMRIs, images

Chapter 2, "Getting Information About Software Packages"
- Packages that can be installed
- Package descriptions, licenses, dependencies, dependents
- Searching for packages

Chapter 3, "Installing and Updating Software Packages"
- New options for pkg subcommands regarding non-global zones, SMF actuators, and ignoring missing packages in a pkg update or uninstall
- New pkg exact-install command (see also Bart's blog)
- Updated information about non-global zones

Chapter 4, "Updating or Upgrading an Oracle Solaris Image"
- Ways to control the version to which to upgrade, including creating a custom incorporation package

Chapter 5, "Configuring Installed Images"
- Configuring publishers
- Variants and facets
- Freezing
- Incorporation constraints
- Mediations
- Groups

Appendix A, "Troubleshooting Package Installation and Update"
- All new - Begins with steps you should always do and then is organized by error message

Appendix B, "IPS Graphical User Interfaces"
- Package Manager and Package Update

Packaging and Delivering Software With the Image Packaging System in Oracle Solaris 11.2

Chapter 1, "IPS Design Goals, Concepts, and Terminology"
- General information about software self-assembly and package lifecycle
- Definitions, package components
- New file attribute, sysattr, for setting system attributes

Chapter 2, "Packaging Software With IPS"
- Updated procedures for publishing and delivering your package

Chapter 3, "Installing, Removing, and Updating Software Packages"
- How this works in the Image Packaging System

Chapter 4, "Specifying Package Dependencies"
- New firmware value for the fmri attribute of the origin dependency for specifying driver firmware compatibility

Chapter 5, "Allowing Variations"
- Variants and facets

Chapter 6, "Modifying Package Manifests Programmatically"
- Using pkgmogrify

Chapter 7, "Automating System Change as Part of Package Installation"
- Specifying actuators on package actions
- Delivering SMF services in IPS packages
- New or updated examples of a run-once service and a self-assembly service

Chapter 8, "Advanced Topics For Package Updating"
- Renaming, merging, splitting, obsoleting packages
- New or updated examples of preserving editable packaged content, preserving unpackaged content, sharing content across boot environments, overlaying files, and delivering a mediation

Chapter 9, "Signing IPS Packages"

Chapter 10, "Handling Non-Global Zones"

Chapter 11, "Modifying Published Packages"

Appendix A, "Classifying Packages"

Appendix B, "How IPS Is Used To Package the Oracle Solaris OS"


Saturday Apr 19, 2014

The Technical Details: April 29 Oracle Solaris 11.2 Launch

You may have already heard that we're going to hold the Oracle Solaris 11.2 launch in New York City in a few days, and that there will also be a live webcast of the event.

One of the things that the webcast will feature that won't be part of the live event will be additional technical presentations where Solaris engineers will go into more detail about some of the new features that are being added. VP for Solaris core engineering Markus Flierl gives a quick rundown:

If this sounds interesting to you, you should register now. The event starts at 1 PM ET / 10 AM PT, with Mark Hurd and John Fowler. Markus then moves on to the more technical part of the in-person event, which will then be followed by the web-only deep-dive presentations.

During the live event, we'll have engineering folks and others on Twitter, tracking hashtag #solaris (apologies in advance to Stanislaw Lem fans).

Webcast: Announcing Oracle Solaris 11.2
Tuesday April 29, 2014
1 PM (ET) / 10 AM (PT)

Friday Jan 10, 2014

Next OTN Virtual Sysadmin Day: January 28th, 2014

Glynn Foster notes that another OTN Virtual Sysadmin Day is coming up in just a couple of weeks, and talks about what's in store for the Oracle Solaris 11 track.

If you're not familiar with these, they're half-day, online, proctored hands-on labs, so you can learn more about various system administration technologies. They're also free--but you do need to register, and there's also some prep work to be done ahead of the event, so take a look at Glynn's blog post, and sign up today.


The Observatory is a blog for users of Oracle Solaris. Tune in here for tips, tricks and more as we explore the Solaris operating system from Oracle.

