Saturday Jan 31, 2015

Multi-CPU Binding (MCB)

I want to tell everyone about the cool, new Multi-CPU Binding API introduced in Solaris 11.2.  Bo Li and I wrote up something that explains what it does, its benefits, and how it is used in Solaris along with examples of how to use it:

INTRODUCTION

Multi-CPU Binding (MCB) is new functionality that was added to Solaris 11.2 and is available through a new API called "processor_affinity(2)" and through the pbind(1M) command line tool.  MCB provides similar functionality to processor_bind(2), but can do much more than processor_bind(2):

  1. Bind specified threads to one or more CPUs, leaf locality groups (lgroups)*, or Processor Groups (PGs)**.

  2. Specify strong or weak affinity to CPUs where:

    • Strong affinity means that the threads must only run on the specified CPUs

    •  Weak affinity means that the threads should always prefer to run on the specified CPUs but will run on the closest available CPU where they have sufficient priority to run soonest when the desired CPUs are running higher priority threads

  3. Specify positive or negative affinity for CPUs (ie. want to run or avoid running on specified CPUs)

  4. Enable or disable inheritance across fork(2), exec(2), and/or thr_create(3C).

  5. Query affinities of specified threads to CPUs, PGs, or lgroups.

* lgroups are the Solaris abstraction for telling which CPUs, memory, and I/O devices are within some latency of each other in a Non Uniform Memory Access (NUMA) machine

** PGs are the Solaris abstraction for performance relevant processor sharing relationships in CMT processors (eg. shared execution pipeline, FPU, cache, etc.)

BENEFITS

Overall, MCB is more powerful and flexible than what was available in Solaris for affining threads to CPUs before MCB.

Before MCB, you could only do one or more of the following to affine a thread to one or more CPUs:

  • Bind one or more threads to one CPU and have this binding always be inherited across fork(2) and exec(2)
  • Set one or more thread's affinity for a locality group (lgroup) which is the Solaris abstraction for the CPUs, memory, and I/O devices within some latency of each other in a Non Uniform Memory Acess (NUMA) machine
  • Create an exclusive set of CPUs that can only run threads assigned to it, bind one or more threads to this processor set, and always have this processor set binding inherited across fork(2) and exec(2).

In contrast to the old functionality above, MCB has the following new functionality and benefits:

  1. Can bind to more than one CPU
    • The biggest benefit of MCB is that you can affine one or more threads to any set of CPUs that you want.  With this ability, you can bind threads to a NUMA node, processor chip, core, the CPUs sharing some performance relevant hardware component (eg. execution pipeline, FPU, cache, etc.), or an arbitrary set of CPUs.
    • Using a processor set is a way to affine a thread to a set of CPUs like MCB.  However, processor sets are exclusive so only threads assigned to the processor set can run on the CPUs in the processor set.  In contrast, MCB does not set aside CPUs for exclusive use by threads affined to those CPUs by MCB.  Hence, a thread having an MCB affinity for some CPUs does not prevent any other threads from running on those CPUs.
  2. More affinities
    • Having a positive and negative affinity to specify whether to run on or avoid the specified CPUs is a new feature that wasn't offered in the previous APIs for binding threads to CPUs
    • Being able to specify a strong or weak affinity is new for binding threads to CPUs, but isn't a completely new idea in Solaris.  The lgroup affinities already have the notion of strong and weak affinity.  The semantics are pretty different though.  The lgroup affinities mostly affect the order of preference for a thread's home lgroup.  In contrast, MCB strong and weak affinity affect where a thread must run or should prefer to run.  MCB affinities can cause the home lgroup of the thread to change to an lgroup that at least contains some of the specified CPUs, but it does not change the order of preference of home lgroups for the thread.
  3. More flexibility with inheritance
    • MCB has more flexibility with setting the inheritance of the MCB CPU affinities across fork(2),exec(2), or thr_create(3C).  It allows you to enable or disable inheritance of its CPU affinities separately across fork(2), exec(2), or thr_create(3C).

In contrast, the pre-existing APIs for binding threads to a CPU or a processor set make the bindings always be inherited across fork(2), exec(2), and thr_create(3C) so you can never disable any of the inheritance.  With lgroup affinities, you can enable or disable inheritance for fork(2), exec(2), and thr_create(3C), but you must enable or disable inheritance across all or none of these operations.

How is MCB used in Solaris?

Solaris optimizes performance for I/O on Non Uniform Memory Access (NUMA) machines where some I/O devices are closer to some CPUs and memory than others.  Part of what Solaris does for its NUMA I/O optimizations is place kernel I/O helper threads that help usher I/O from the application to the I/O device and vice versa near the I/O device.

Before Solaris 11.2, Solaris would bind each I/O helper thread to one CPU near its corresponding I/O device.  Unfortunately, this can cause some performance issues when the CPU where the I/O helper thread is bound becomes very busy running higher priority threads or handling interrupts.  Since the I/O helper thread is bound to just one CPU, it can only run on that one CPU, isn't allowed to run on any other CPU, and can have to wait a long time to run.  This can cause I/O performance to go down because the I/O will take longer to process.

In S11.2, MCB is used to overcome this problem by affining each I/O helper thread to one or more processor cores.  This gives the I/O helper threads more places to run and reduces the chance that they get stuck on a very busy CPU.  Also, MCB weak affinity can be used to specify that the I/O helper threads prefer to run on the specified CPUs but it is ok to run them on the closest available CPUs if the specified CPUs are too busy.

Tool

pbind(1M)

pbind(1M) is an existing tool to control and query the bindings of processes or LWPs to a CPU and has been modified to support affining threads to more than one CPU.

When specifying target CPUs, the user could directly use their processor IDs or indirectly use their Processor Group (PG) or Locality Group (lgroup) ID.

Bind processes/LWPs

Below are the equivalent ways of binding process 101048 to CPU 1. By default, the binding target type is CPU and, idtype is pid and binding affinity is strong:

    # pbind -b 1 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

    # pbind -b -c 1 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

    # pbind -b -c 1 -i pid 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

    # pbind -b -c 1 -s -i pid 101048

    pbind(1M): pid 101048 strongly bound to processor(s) 1.

Bind processes/LWPs to CPUs specified by Processor Group or Locality Group

    Binding process 101048 to the CPUs in Processor Group 1:

    # pbind -b -g 1 101048

    pbind(1M): pid 101048 strongly bound to Processor Group(s) 1

    Binding process 101048 to the CPUs in Locality Group 2:

    # pbind -b -l 2 101048

    pbind(1M): pid 101048 strongly bound to Locality Group(s) 0 2.

Weak binding

    # pbind -b 2 -w 101048

    pbind(1M): pid 101048 weakly bound to processor(s) 2.

Negative binding targets

    Weakly binding process 101048 to all CPUs but the ones in Processor Group 1:

    # pbind -b -g 1 -n -w 101048

    pbind(1M): pid 101048 weakly bound to Processor Group(s) 2.

Binding LWPs

When the user binds a process the specified CPUs, all the LWPs belonging to that process will be automatically bound to those CPUs. The user may also bind LWPs in the same process individually. LWPs range could be specified after ‘/’ and separated by comma.

    Strongly binding LWP 2, 3, 4 of process 101048 to CPU 2:

    # pbind -b -c 2 -i pid 116936/2-3,4

    pbind(1M): LWP 116936/2 strongly bound to processor(s) 2.

    pbind(1M): LWP 116936/3 strongly bound to processor(s) 2.

    pbind(1M): LWP 116936/4 strongly bound to processor(s) 2.

Query processes/LWPs binding

When querying for bindings of specific LWPs, the user may request that the resulting set of CPUs be identified through their IDs, the Processor Groups or the Locality Groups that contain them:

    # pbind -q 101048

    pbind(1M): pid 101048 weakly bound to processor(s) 2 3.

    # pbind -q -g 101048

    pbind(1M): pid 101048 weakly bound to Processor Group(s) 2.

    # pbind -q -l 101048

    pbind(1M): pid 101048 weakly bound to Locality Group(s) 0 2.

The user may also query all bindings for a specified CPU

    # pbind -Q 2

    pbind(1M): LWP 101048/1 weakly bound to processor(s) 2 3.

    pbind(1M): LWP 102122/1 weakly bound to processor(s) 2 3.

Binding Inheritance

By default, bindings are inherited across exec(2), fork(2) and thr_create(3C), but inheritance across any of these can be disabled.  For example, the user could bind a shell process to a set of CPUs and specify the binding is not inherited in fork(2).  In this way, all processes created by this shell will not be bound to any CPUs.

    Bind processes/LWPs but request binding not inherited across fork(2):

    # pbind -b -c 2 -f 101048                      

    pbind(1M): pid 101048 strongly bound to processor(s) 2.

Explanation of return value is commented in the manpage. For more details, please refer to manpage of pbind(1M).

APIs

processor_affinity(2)

MCB introduces a new processor_affinity(2) system call to control and query the affinity to CPUs for processes or LWPs.

    int processor_affinity(procset_t *ps, uint_t *nids, id_t *ids, uint32_t *flags);

Each option and flag used in pbind(1M) could be directly mapped to processor_affinity(2).  Similarly, the user may request the binding to be either strong or weak by specifying flag PA_AFF_STRONG or PA_AFF_WEAK.  The target CPUs could be specified by their processor IDs, Processor Group (PG) or Locality Group (lgroup) ID when used with corresponding flag PA_TYPE_CPU, PA_TYPE_PG, or PA_TYPE_LGRP.

The ps argument identifies to which LWP(s) that the call should be applied through a procset structure (see procset.h(3HEAD) for details).  The flags argument must contain valid combinations of the options given in the manpage.

When setting affinities, the nids argument points to a memory position holding the number of CPU, PG or LGRP identifiers to which affinity is being set, and ids points to an array with the identifiers.  Only one type of affinity must be specified along with one affinity strength.  Negative affinity is a type modifier that indicates that the given IDs should be avoided and affinity of the specified type should be set to all of the other processors in the system.

When specifying multiple LWPs, the threads should all be bound to the same processor set since they can be affined to CPUs in their processor set.  Additionally, setting affinities will succeed if processor_affinity(2) is able to set a LWP's affinity for any of the specified CPUs even if a subset of the specified CPUs are are invalid, offline, or faulted.

Setting strong affinity for CPUs [0-3] to the current LWP:

    #include <sys/processor.h>

    #include <sys/procset.h>

    #include <thread.h>

    procset_t ps;

    uint_t nids = 4;

    id_t ids[4] = { 0, 1, 2, 3 };

    uint32_t flags = PA_TYPE_CPU | PA_AFF_STRONG;

    setprocset(&ps, POP_AND, P_PID, P_MYID, P_LWPID, thr_self());

    if (processor_affinity(&ps, &nids, ids, &flags) != 0) {

        fprintf(stderr, "Error setting affinity.\n");

        perror(NULL);

    }

Setting weak affinity for CPUs in Processor Group 3 and 7 to process 300's LWP 2:

    #include <sys/processor.h>

    #include <sys/procset.h>

    #include <thread.h>

    procset_t ps;

    uint_t nids = 4;

    id_t ids[4] = { 3, 7 };

    uint32_t flags = PA_TYPE_PG | PA_AFF_WEAK;

    setprocset(&ps, POP_AND, P_PID, 300, P_LWPID, 2);

    if (processor_affinity(&ps, &nids, ids, &flags) != 0) {

        fprintf(stderr, "Error setting affinity.\n");

        perror(NULL);

    }

Upon a successful query, nids will contain the number of CPUs, PGs or LGRPs for which the specified LWP(s) has affinity.  If ids is not NULL, processor_affinity(2) will store the IDs of the indicated type up to the initial nids value.  Additionally, flags will return the affinity strength and whether any type of inheritance is excluded.

When querying affinities, PA_TYPE_CPU, PA_TYPE_PG or PA_TYPE_LGRP may be specified to indicate that the returned identifiers must be either be the CPUs, Processor Groups, or Locality Groups that contain the processors for which the specified LWPs have affinity.  If no type is specified, the interface defaults to CPUs.

Querying and printing affinities for the current LWP:

    #include <sys/processor.h>

    #include <sys/procset.h>

    #include <thread.h>

    procset_t ps;

    uint_t nids;

    id_t *ids;

    uint32_t flags = PA_QUERY;

    int i;

    setprocset(&ps, POP_AND, P_PID, P_MYID, P_LWPID, thr_self());

    if (processor_affinity(&ps, &nids, NULL, &flags) != 0) {

        fprintf(stderr, "Error querying number of ids.\n");

        perror(NULL);

    } else {

        fprintf(stderr, "LWP %d has affinity for %d CPUs.\n",

            thr_self(), nids);

    }

    flags = PA_QUERY;

    ids = calloc(nids, sizeof (id_t));

    if (processor_affinity(&ps, &nids, ids, &flags) != 0) {

        fprintf(stderr, "Error querying ids.\n");

        perror(NULL);

    }

    if (nids == 0)

        printf("Current LWP has no affinity set.\n");

    else

        printf("Current LWP has affinity for the following CPU(s):\n");

    for (i = 0; i < nids; i++)

        printf(" %u", ids[i]);

    printf("\n");

When clearing affinities, the caller can either specify a set of LWPs that should have their affinities revoked (through the ps argument) or none or specify a list of CPU, PG or LGRP identifiers for which all affinities must be cleared.  See EXAMPLES below for details.

Clearing all affinities for CPUs 5 and 7:

    #include <sys/processor.h>

    #include <sys/procset.h>

    #include <thread.h>

    uint_t nids = 2;

    id_t ids[4] = { 5, 7 };

    uint32_t flags = PA_CLEAR | PA_TYPE_CPU;

    if (processor_affinity(NULL, &nids, ids, &flags) != 0) {

        fprintf(stderr, "Error clearing affinity.\n");

        perror(NULL);

    }

Explanation of return value is commented in the manpage. For more details, please refer to manpage of processor_affinity(2).

processor_bind(2)

The processor_bind(2) binds processes/LWPs to a single CPU.  The interface remains the same as early Solaris version, but its implementation changes significantly to use MCB.  The processor_bind(2) and processor_affinity(2) are implemented the same way only differing in the limitations imposed by the number and types of arguments each accepts.  The calls to processor_bind(2) are essentially calls to processor_affinity(2) which only allow setting and querying binding to a single CPU at a time.

    int processor_bind(idtype_t idtype, id_t id, processorid_t new_binding, processorid_t *old_binding);

This function binds the LWP (lightweight process) or set of LWPs specified by idtype and id to the processor specified by new_binding. If old_binding is not NULL, it will contain the previous binding of one of the specified LWPs, or PBIND_NONE if none were previously bound.

For more details, please refer to the manpage of processor_bind(2).


Monday Jun 30, 2008

Results of Power Management

I've put Gregg's blog on Power Management to use, and here are my impressive results.

[Read More]

Wednesday Jun 25, 2008

Some Power Management

There is some limited support for power management in OpenSolaris 2008.05, but by default it is not enabled (hopefully that will change in a future release).  There is an entry on the BigAdmin wiki page that describes the supported CPUs (from both Intel and AMD) and the simple steps for modifying the /etc/power.conf file.  But this blog entry from Mark Haywood provides quite a bit more information.  In it, he describes the CPU frequency scaling feature in detail. The feature slows the CPU when idle, so that the chip uses less power and generates less heat.

To follow his instructions, all I had to do was enter this at a command line:

   $ pfexec gedit /etc/power.conf

I added two lines to the end of /etc/power.conf:

   cpupm enable
   cpu-threshold 15s

Saved the file and then ran this command:

   $ pmconfig

And now OpenSolaris 2008.05 is happily slowing down my CPU when it is idle. 

About

The Observatory is a blog for users of Oracle Solaris. Tune in here for tips, tricks and more as we explore the Solaris operating system from Oracle.

Search

Archives
« April 2015
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
  
       
Today