Friday May 23, 2014

Generic hardware counter events

A while back, Solaris introduced support for PAPI - which is probably as close as we can get to a de-facto standard for performance counter naming. For performance counter geeks like me, this is not quite enough information, I actually want to know the names of the raw counters used. Fortunately this is provided in the generic_events man page:

$ man generic_events
Reformatting page.  Please Wait... done

CPU Performance Counters Library Functions   generic_events(3CPC)

NAME
     generic_events - generic performance counter events

DESCRIPTION
     The Solaris  cpc(3CPC)  subsystem  implements  a  number  of
     predefined, generic performance counter events. Each generic
...
   Intel Pentium Pro/II/III Processor
       Generic Event          Platform Event          Event Mask
     _____________________________________________________________
     PAPI_ca_shr          l2_ifetch                 0xf
     PAPI_ca_cln          bus_tran_rfo              0x0
     PAPI_ca_itv          bus_tran_inval            0x0
     PAPI_tlb_im          itlb_miss                 0x0
     PAPI_btac_m          btb_misses                0x0
     PAPI_hw_int          hw_int_rx                 0x0
...

Monday Oct 19, 2009

Fishing with cputrack

I'm a great fan of the hardware performance counters that you find on most processors. Often you can look at the profile and instantly identify what the issue is. Sometimes though, it is not obvious, and that's where the performance counters can really help out.

I was looking at one such issue last week, the performance of the application was showing some variation, and it wasn't immediately obvious what the issue was. The usual suspects in these cases are:

  • Excessive system time
  • Process migration
  • Memory placement
  • Page size
  • etc.

Unfortunately, none of these seemed to explain the issue. So I hacked together the following script cputrackall which ran the test code under cputrack for all the possible performance counters. Dumped the output into a spreadsheet, and compared the fast and slow runs of the app. This is something of a "fishing trip" script, just gathering as much data as possible in the hope that something leaps out, but sometimes that's exactly what's needed. I regularly get to sit in front of a new chip before the tools like ripc have been ported, and in those situations the easiest thing to do is to look for hardware counter events that might explain the runtime performance. In this particular instance, it helped me to confirm my suspicion that there was a difference in branch misprediction rates that was causing the issue.

Thursday Apr 19, 2007

Summing cputrack output

Here's a quick awk script to sum cputrack output:

cputrack -o cputrack.log -c Instr_cnt,DTLB_miss sleep 2
awk -s '/tick/{t1+=$4;t2+=$5;}END{print t1,t2}' cputrack.log

Wednesday Apr 18, 2007

Detecting data TLB misses

There are a couple of easy ways that an application can be tested for whether it is encountering DTLB misses.

  • One way is to use the command trapstat, this command requires administrator privileges to run and either reports trap activity on a system-wide basis, or can be used to follow the traps that a single process encounters
  • The second way is to use cputrack to track the events recorded by the hardware performance counters on the processor. The particular counters will depend on the hardware. An example using an UltraSPARC III is:
cputrack -c pic0=Instr_cnt,pic1=DTLB_miss -p <pid>
About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
Free Download

Search

Categories
Archives
« May 2015
SunMonTueWedThuFriSat
     
1
2
3
4
6
7
8
9
10
11
12
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
      
Today
Bookmarks
The Developer's Edge
Solaris Application Programming
Publications
Webcasts
Presentations
OpenSPARC Book
Multicore Application Programming
Docs