Some time ago I published an article on monitoring DAX activity on SPARC M8 (to a certain degree that whole article also applies to SPARC M7 or S7). This article really should have preceded the one on the DAX, as it targets the SPARC M8 core itself.
The goal is to give you a better understanding of the plethora of performance-relevant events that a SPARC M8 CPU can measure or count. Like all SPARC CPUs (Intel CPUs have similar event counters), SPARC M8 provides a lot of information about what is happening at any stage of the processing of code. Most of the counters require in-depth knowledge of how the pipeline works, but some are of general interest to anyone who analyzes the performance of a workload on SPARC M8.
Side note: there is also a huge number of counters that monitor other system components such as memory or I/O controllers (the DAX, for example, is monitored in that context). None of these will be covered in this article.
There is a class of "general" purpose events called PAPI_ events, and some are implemented in Solaris' cpustat command. It is important to note that SPARC M8 can count or monitor up to four different events at the same time:
cpustat -c pic0=<event>,pic1=<event>,pic2=<event>,pic3=<event> \
interval [count]
(You do not need to specify all four, and of course they can all be different.)
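If you script your measurements, the four-counter limit is easy to enforce before cpustat is ever started. The following is a minimal sketch, not part of Solaris: the helper name `build_cpustat_command` is made up for illustration, it only uses the `-c` and `-A cor` options shown in this article, and it assumes that any event may be placed on any of the four counters, which the generic syntax above suggests but does not state.

```python
def build_cpustat_command(events, interval, count=None, per_core=False):
    """Return an argument list for cpustat (hypothetical helper).

    events   -- one to four event names, assigned to pic0..pic3 in order
    interval -- sampling interval in seconds
    count    -- optional number of samples
    per_core -- if True, add '-A cor' to aggregate per core
    """
    if not 1 <= len(events) <= 4:
        raise ValueError("SPARC M8 counts between one and four events at a time")

    cmd = ["cpustat"]
    if per_core:
        cmd += ["-A", "cor"]

    # Build the pic0=<event>,pic1=<event>,... specification.
    spec = ",".join(f"pic{i}={name}" for i, name in enumerate(events))
    cmd += ["-c", spec, str(interval)]

    if count is not None:
        cmd.append(str(count))
    return cmd


if __name__ == "__main__":
    # Print the command line instead of running it, so the sketch
    # also works on a machine without cpustat.
    print(" ".join(build_cpustat_command(["PAPI_tot_ins", "PAPI_tot_cyc"], 10, count=6)))
```

The returned list could be handed to `subprocess.run` on a SPARC M8 system; that step is left out here on purpose.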
Here's my personal list of most important events:
Event Name | Meaning |
---|---|
PAPI_tot_ins, PAPI_tot_cyc | Total instructions/cycles (indicates overall load on pipeline) |
PAPI_fp_ops, PAPI_fp_ins | Total FP ops, FP instructions (amount of FP processing) |
PAPI_ld_ins, PAPI_sr_ins | Total load / store instructions (indicates amount of memory I/O caused by the code directly) |
PAPI_l1_dcm, PAPI_l1_icm | L1D-cache misses or L1I-cache misses (if high, try to shrink working set) |
PAPI_tlb_dm, PAPI_tlb_im | TLB misses, if high try to increase page size |
Let's look at some example output. In general, you need to assume the root role to get access to these counters:
root:~# cpustat -A cor -c pic0=PAPI_tot_ins,pic1=PAPI_fp_ops,pic2=PAPI_ld_ins,pic3=PAPI_sr_ins 10
  time cor event       pic0     pic1      pic2       pic3
10.007   0  tick    3576027    27050    612594     243438
10.008   1  tick  195732982   157958  32929437   24920092
10.029   2  tick 9129635998 28830441 259474445 1479961711
10.007   3  tick   67177212   436941  11214285    4594823
10.008   4  tick 1557865224   263392 212558823   56450586
10.008   5  tick   23716960    49871   3503560    1101415
10.007   6  tick    6420085     5978    975611     274779
10.008   7  tick  109398847  1233259  16217418    7222838
(This example also shows you how to aggregate events per core, which in general is a good idea on the highly threaded SPARC M8 core: all eight threads on a core share the core's resources.)
If you are only interested in the degree of saturation of the integer execution units of SPARC M8, the pgstat command is all you need:
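Raw counter values are hard to compare across cores, so it can help to turn them into ratios. The sketch below parses the sample output shown above and reports, per core, which share of all instructions were loads or stores. It assumes the column order from the example command (pic0 = PAPI_tot_ins, pic1 = PAPI_fp_ops, pic2 = PAPI_ld_ins, pic3 = PAPI_sr_ins); the function names are made up for illustration.

```python
# Sample data copied from the cpustat output in this article.
SAMPLE = """\
time cor event pic0 pic1 pic2 pic3
10.007 0 tick 3576027 27050 612594 243438
10.008 1 tick 195732982 157958 32929437 24920092
10.029 2 tick 9129635998 28830441 259474445 1479961711
10.007 3 tick 67177212 436941 11214285 4594823
10.008 4 tick 1557865224 263392 212558823 56450586
10.008 5 tick 23716960 49871 3503560 1101415
10.007 6 tick 6420085 5978 975611 274779
10.008 7 tick 109398847 1233259 16217418 7222838
"""


def parse_cpustat(text):
    """Return one dict per 'tick' line; the header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "tick":
            continue
        rows.append({
            "time": float(fields[0]),
            "core": int(fields[1]),
            "pics": [int(value) for value in fields[3:]],
        })
    return rows


def load_store_share(row):
    """Fraction of instructions that were loads or stores (assumed column order)."""
    total, _fp_ops, loads, stores = row["pics"]
    return (loads + stores) / total if total else 0.0


if __name__ == "__main__":
    rows = parse_cpustat(SAMPLE)
    busiest = max(rows, key=lambda row: row["pics"][0])
    print(f"busiest core by instruction count: {busiest['core']}")
    for row in rows:
        print(f"core {row['core']}: {load_store_share(row):.1%} loads/stores")
```

In the sample, core 2 executes by far the most instructions, which matches the core that pgstat reports as busy further down.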
root@t8-2cap-80tn:~# pgstat -v -B core 10
ID  RELATIONSHIP                 HW     UTIL  CAP   SW     USR   SYS   IDLE    CPUS
0   Core (Software)              -      -     -     0.0%   0.0%  0.0%  100.0%  0-7
0   Core (Floating_Point_Unit)   0.0%   13K   5.1B  -      -     -     -       0-7
0   Core (Integer_Pipeline)      0.0%   7.5M  20B   -      -     -     -       0-7
1   Core (Software)              -      -     -     0.0%   0.0%  0.0%  100.0%  8-15
1   Core (Integer_Pipeline)      0.0%   3.9M  20B   -      -     -     -       8-15
1   Core (Floating_Point_Unit)   0.0%   2.1K  5.1B  -      -     -     -       8-15
2   Core (Software)              -      -     -     12.5%  6.1%  6.4%  87.5%   16-23
2   Core (Integer_Pipeline)      16.0%  3.2B  20B   -      -     -     -       16-23
2   Core (Floating_Point_Unit)   0.1%   2.9M  5.1B  -      -     -     -       16-23
3   Core (Software)              -      -     -     0.1%   0.1%  0.0%  99.9%   24-31
3   Core (Floating_Point_Unit)   0.0%   79K   5.1B  -      -     -     -       24-31
3   Core (Integer_Pipeline)      0.2%   39M   20B   -      -     -     -       24-31
4   Core (Software)              -      -     -     0.0%   0.0%  0.0%  100.0%  32-39
4   Core (Floating_Point_Unit)   0.0%   7.7K  5.1B  -      -     -     -       32-39
4   Core (Integer_Pipeline)      0.1%   10M   20B   -      -     -     -       32-39
The column entitled HW shows you the ratio of processed instructions to the theoretical maximum. Anything beyond 60% can be considered close to overloading that particular core.
If you invoke cpustat -h you will be swamped with events that SPARC M8 can count. Nearly all of them are only of interest to the developers of these CPUs (and this is not restricted to the SPARC M8 CPU).
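To make the HW column concrete: in the sample output it is consistent with UTIL divided by CAP, for example 3.2B / 20B = 16.0% for the integer pipeline of core 2. The following sketch reproduces that arithmetic and applies the 60% rule of thumb. It assumes that the suffixes K, M and B stand for 10^3, 10^6 and 10^9, which fits the sample but is not taken from the pgstat documentation; the function names are made up for illustration.

```python
# Assumed meaning of the suffixes used in the UTIL and CAP columns.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_count(text):
    """Convert a pgstat-style value such as '3.2B' or '79K' to a float."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def hw_utilization(util, cap):
    """Utilization in percent, computed as UTIL / CAP."""
    return 100.0 * parse_count(util) / parse_count(cap)


def is_near_overload(util, cap, threshold=60.0):
    """Apply the rule of thumb from the text: beyond 60% is close to overload."""
    return hw_utilization(util, cap) > threshold


if __name__ == "__main__":
    # Integer pipeline of core 2 from the sample output.
    print(f"{hw_utilization('3.2B', '20B'):.1f}%")
    print("near overload:", is_near_overload("3.2B", "20B"))
```

The threshold is a parameter so that you can tighten or relax the rule of thumb for your own workload.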
If you want to dig deeper and use the non-generic events you might be able to figure out their meaning from their names, but there is no publicly accessible documentation of these events.
Solaris WebUI, the graphical interface to Solaris' StatsStore, visualizes some of the statistics presented above. The intro diagram shows a graphical representation of the number of "integer" and floating point instructions, and these are the only ones visualized via WebUI. The corresponding SSIDs (the unique identifiers of a certain statistic in Solaris' StatsStore) are //:class.cpu//:stat.integer-pipe-usage//:op.rate//:op.util and //:class.cpu//:stat.fpu-usage//:op.rate (please note the different aggregations used: percentage of the maximum for the integer load, and total number of instructions in the floating point case).