Thursday Jan 08, 2009

CPU to core mapping

A frequently asked question among users of CMT platforms is "How do I know which CPUs share a core?". For most users, the best answer is, "don't worry about it", because Solaris does a good job of assigning software threads to CPUs and spreading them across cores such that the utilization of hardware resources is maximized. However, knowledge of the mapping is helpful to users who want to explicitly manage the assignment of threads to CPUs and cores, to squeeze out more performance, using techniques such as processor set binding and interrupt fencing.

For some processors and configurations, the core can be computed as a static function of the CPU ID, but this is not a general or easy-to-use solution. Instead, Solaris exposes this in a portable way via the "psrinfo -pv" command, as shown in this example on an M5000 server:

    % psrinfo -pv
    The physical processor has 2 cores and 4 virtual processors (0-3)
      The core has 2 virtual processors (0 1)
      The core has 2 virtual processors (2 3)
        SPARC64-VI (portid 1024 impl 0x6 ver 0x90 clock 2150 MHz)
    The physical processor has 2 cores and 4 virtual processors (40-43)
      The core has 2 virtual processors (40 41)
      The core has 2 virtual processors (42 43)
        SPARC64-VI (portid 1064 impl 0x6 ver 0x90 clock 2150 MHz)

The numbers in parentheses are the CPU IDs, as known to Solaris and used in commands such as mpstat and psradm. At this time, there are no supported programmatic interfaces to get this information.
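
Lacking a programmatic interface, one workaround is to parse the psrinfo text. Here is a minimal sketch, not a supported method. The function name cores_from_psrinfo is my own invention, the core number it prints is just a running counter in order of appearance (not a kernel identifier), and it assumes the core lines list CPU IDs separated by spaces exactly as in the example above:

```shell
# cores_from_psrinfo (hypothetical helper): read "psrinfo -pv" text on stdin
# and print one "CPU core" pair per line. The core value is a running index
# (0, 1, 2, ...) assigned in the order the "The core has ..." lines appear.
cores_from_psrinfo() {
    awk '
        /The core has/ {
            # Keep only the text between the last "(" and the ")" after it.
            ids = $0
            sub(/^.*\(/, "", ids)
            sub(/\).*$/, "", ids)
            # Assumes a space-separated list such as "0 1"; ranges such as
            # "0-3" are not expanded.
            n = split(ids, cpu, " ")
            for (i = 1; i <= n; i++)
                printf "%s %d\n", cpu[i], core
            core++
        }
    '
}
```

You would run it as "psrinfo -pv | cores_from_psrinfo". Because it scrapes human-readable output, it can break whenever the output format changes.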

Now for the confusing part. Unfortunately, "psrinfo -pv" only prints the core information on systems running OpenSolaris or Solaris Express, because psrinfo was enhanced by this CR:

    6316187 Need interface to determine core sharing by CPUs

which was never backported to a Solaris 10 update. I cannot predict when or whether this will be done. However, on Solaris 10, you can see core groupings using the unstable and less friendly kstat interface. Try this script, which I have named showcores:

    #!/bin/sh
    # showcores: print each CPU ID next to the core_id from its cpu_info kstat
    kstat cpu_info | \
        egrep "cpu_info |core_id" | \
        awk 'BEGIN { printf "%4s %4s", "CPU", "core" }
             # "module: cpu_info  instance: N" line: N is the CPU ID
             /module/ { printf "\n%4s", $4 }
             /core_id/ { printf "%4s", $2 }
             END { printf "\n" }'

    % showcores
     CPU core
       0   0
       1   0
       2   2
       3   2
      40  40
      41  40
      42  42
      43  42

The core_id extracted from the kstats is arbitrary, but CPUs with the same core_id share a physical core. Beware that the name and semantics of kstats such as core_id are unstable interfaces, which means they are not documented, not supported, and are subject to change.
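
If you want the grouping rather than the raw pairs, the showcores output can be folded by core_id. This is a minimal sketch: group_by_core is a hypothetical name of mine, and it assumes input in exactly the two-column layout printed above:

```shell
# group_by_core (hypothetical helper): read showcores-style output (a
# "CPU core" header, then one "cpu_id core_id" pair per line) and print
# each core_id followed by the CPU IDs that share it.
group_by_core() {
    awk '
        # Skip the header and anything else that is not two numbers.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            if ($2 in cpus) {
                cpus[$2] = cpus[$2] " " $1
            } else {
                order[++n] = $2        # remember first-seen order
                cpus[$2] = $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "core_id %s: %s\n", order[i], cpus[order[i]]
        }
    '
}
```

Run as "showcores | group_by_core", it prints one line per core, which is a convenient starting point when choosing CPUs for a processor set. It inherits the same instability as the kstats it depends on.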

Monday Nov 03, 2008

Faster Firmware

What a difference firmware can make! We take it for granted, and as administrators we probably do not update our system's firmware as often as we should, but I was recently involved in a performance investigation where it made a huge difference.

On a 128-CPU T5240 server, the throughput of an application peaked around 90 processes, but declined as more processes were added, until at 128 processes the throughput was just 25% of its peak value. Classic and severe anti-scaling. The puzzling part was that the usual suspects were innocent. mpstat showed that 99% of the time was spent in usr mode, so no kernel issues; plockstat did not show any contended userland mutexes; cpustat did not show per-process increases in cache misses, TLB misses, or any other counter; and a collector/analyzer profile did not show hot atomic functions or CAS operations. It did show a marked increase in the cost of the SPARC save instruction at function entry as the process count was raised. Curious.

We eventually upgraded the firmware, and the application scaled nicely up to 128 processes. If you want some advice and do not care about gory details, skip the next two paragraphs :)

It turns out that the hypervisor had a global lock that was limiting scalability, and the lock was eliminated by a firmware upgrade. Normally very little time is spent executing code in hyper-privileged mode on the Sun CMT servers. However, the hypervisor is responsible for maintaining "permanent" VA->PA mappings in the TLB. These mappings are used for the Solaris kernel nucleus, one 4MB mapping for text, and one 4MB mapping for data. Solaris cannot handle an MMU miss for these mappings, so when the processor traps to hypervisor for the miss, the hypervisor finds the mapping, stuffs it into the hardware TLB, and returns from the trap, so Solaris never sees the miss.

The above hypervisor action was protected in the old firmware by a single global lock. The application had a high TLB miss rate exceeding 200K/CPU/sec, so the permanent mappings were being continuously evicted - not an issue if we don't use the kernel much. But, the app also had a deep function stack, so it generated lots of spill/fill traps. These trap to kernel text, which is backed by a permanent mapping, which has been evicted, which causes a hypervisor trap, which hits the global lock. A perfect storm limiting scalability! Spill/fill is a lightweight trap that does not change accounting mode to sys, hence I did not see high sys time; instead, I saw high save time. In hindsight, I could have directly observed the instruction and stall cycles spent in hypervisor mode using:

    # cpustat -s -c pic0=Instr_cnt,pic1=Idle_strands,hpriv,nouser 10 10

Should you care? This issue is specific to firmware in the CMT server line, and it depends on your model:

  • T5140, T5240 (2-socket, 128-CPU): Definitely verify and upgrade your firmware if needed; get the latest version of patch 136936.
  • T5120, T5220 (1-socket, 64-CPU): I have not observed the scalability bottleneck on this smaller system, but you may get a small performance boost by upgrading; get the latest version of patch 136932.
  • T1000, T2000 (1-socket, 32-CPU): Probably not an issue, because the system is too small.
  • T5440 (4-socket, 256-CPU): Not an issue, as the first units shipped already contained a later version of the firmware containing the fix.

The CR is: 6669222 lock in mmu_miss can be eliminated to reduce contention
It was fixed in Sun System Firmware version 7.1.3.d.
To show the version of firmware installed on your system, log in to the service processor and verify you have version 7.1.3.d or later:

    sc> showhost
    Sun System Firmware 7.1.3.d 2008/07/11 08:55
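
If you have many systems to check, the comparison can be scripted against saved showhost output. This is only a sketch under stated assumptions: firmware_has_fix is a hypothetical name, the showhost text must first be captured from the service processor by whatever means you use, and the ordering rule (compare dot-separated fields left to right, digits numerically and letters alphabetically, with a missing field sorting first) is my guess at how the version strings order, not a documented scheme:

```shell
# firmware_has_fix (hypothetical helper): read saved "showhost" output on
# stdin. Exit status: 0 if the version is 7.1.3.d or later, 1 if it is
# older, 2 if no "Sun System Firmware" line was found.
firmware_has_fix() {
    awk -v min="7.1.3.d" '
        /Sun System Firmware/ {
            found = 1
            nv = split($4, v, ".")      # $4 is the version, e.g. 7.1.3.d
            nm = split(min, m, ".")
            n = (nv > nm) ? nv : nm
            for (i = 1; i <= n; i++) {
                a = v[i]; b = m[i]
                if (a ~ /^[0-9]+$/ && b ~ /^[0-9]+$/) {
                    a += 0; b += 0      # both numeric: compare as numbers
                } else {
                    a = a ""; b = b ""  # otherwise compare as strings
                }
                if (a > b) break                # newer: decided
                if (a < b) { older = 1; break } # older: decided
            }
            exit                        # first matching line only; END runs
        }
        END {
            if (!found) exit 2
            exit (older ? 1 : 0)
        }
    '
}
```

For example, "firmware_has_fix < showhost.txt && echo ok", where showhost.txt is a file name chosen here for illustration. When in doubt, trust your own reading of the showhost line over the script.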

To upgrade your firmware:

  1. Go to
  2. Click on the Patches and Updates link
  3. Type the patch number in the PatchFinder form (e.g., 136936 for the T5140 or T5240)
  4. Push the Find Patch button
  5. Click on the "Download Patch" link near the top.
  6. Unzip the download and refer to the file for instructions

If you have never upgraded firmware before, read the documentation and be careful!


Steve Sistare

