Performance Counter Generic Events
By jonh on Jan 16, 2009
In OpenSolaris 2008.11 (Nevada build 100, if you want to be more precise) I introduced a sprinkling of sugar into the CPU performance counter subsystem known as Generic Events. Generic events should make this subsystem a bit easier to use for those of us who have bad memories or just don't like carting processor reference manuals around when using the performance counters. First, we'll take a quick look at the problem, and then at what has been added.
Solaris has an extremely capable CPU performance counter subsystem that, not surprisingly, provides the ability to measure a whole host of interesting and useful events that occur while the processor is executing. The number of events available varies by processor, but we're talking about things such as:
- Instructions executed
- Cycles the processor was busy
- Data cache misses at different levels in the cache hierarchy
- Instruction cache misses at different levels in the cache hierarchy
- TLB misses
- Floating point operations of various types
So, each processor allows us to keep a count of events we're interested in; just say which event(s) you want to measure and go. Sounds simple? You would think so, but it's not quite as easy to use as you might think. Every platform (Niagara2, AMD, Intel Core, etc.) implements its own set of events that can be measured, and these events are usually named according to the names used in the hardware vendor's programmer's reference manual for the processor in question. We use these event names with tools such as cputrack(1M) and cpustat(1M). As an example, let's measure a common event such as "instructions executed" on a T5140, which has 2 Niagara2 processors:
# cpustat -c Instr_cnt 1
   time cpu event      pic0
  1.003   0  tick     25458
  1.003   1  tick     13967
  1.013   2  tick       956
  1.023   3  tick      9240
<chop>
  1.123 127  tick       627
On SPARC this event is pretty easy to identify, as the name, Instr_cnt, is intuitive. However, let's say we want to measure the same thing on the other Solaris systems we have around the place with different processor types. Our event names for the commonly used "instruction count" on a sample of supported processors look like:
|Processor|Event Name|Optional Mask|
|UltraSPARC T2|Instr_cnt| |
|AMD Family 0xf/0x10|FR_retired_x86_instr_w_excp_intr| |
The third column in the above table refers to an optional mask field that some events on some processors use. The mask is usually used to subdivide an event type into more specific events related to that type. For example, on an AMD system we can measure floating point instructions using the FP_dispatched_fpu_ops event, which can be subdivided using a unit mask of 0x1 to measure floating point add instructions, or a unit mask of 0x2 to measure floating point multiply instructions.
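To make the mask mechanics concrete, here is a minimal sketch in plain Python (not Solaris code) of how a unit mask works: it is just a bitfield that narrows an event to specific sub-events. The bit values are taken from the AMD FP_dispatched_fpu_ops example above.

```python
# Unit mask bits for the AMD FP_dispatched_fpu_ops event (from the text).
FP_ADD = 0x1  # floating point add instructions
FP_MUL = 0x2  # floating point multiply instructions

def selects(unit_mask, sub_event_bit):
    """True if the sub-event's bit is set in the unit mask."""
    return bool(unit_mask & sub_event_bit)

# ORing the bits together counts both adds and multiplies in one event.
mask = FP_ADD | FP_MUL
print(hex(mask))  # 0x3
```

The point is simply that a single native event name plus a mask value encodes a family of related sub-events, which is one more processor-specific detail to remember.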
Another commonly used event type is that of "Total Cycles executed". Again, for the processors cited above:
|Processor|Event Name|Optional Mask|
|AMD Family 0xf/0x10|BU_cpu_clk_unhalted| |
So, on a Pentium 4 based system, to measure user mode instructions executed along with user mode cycles you'd have to do:

# cpustat -c instr_retired,emask=0x4 -c global_power_events,emask=0x1 1
So, how are you supposed to know what the magic values are? Well, you really need to consult the programmer's reference manual for the processor you're working on. Obviously, this can become extremely tedious after a while, especially if you're not working with these things every day and you have to work with several processor types.
Generic events provide the ability to refer to a particular event type by a single name across all platforms. So, for example, cycle count can now be referred to with the generic event name PAPI_tot_cyc, and instruction count with the PAPI_tot_ins event. By providing abstracted names for the commonly used event types, you no longer have to keep a crib sheet with all the different names and masks for a good number of events. The above Pentium 4 example now looks like:

# cpustat -c PAPI_tot_ins -c PAPI_tot_cyc 1
In fact, that is now the invocation across all our platforms to measure instructions executed and cycle count (assuming the platform in question defines both events and can also accommodate both at once!).
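Conceptually, a generic event is just a per-platform lookup from one portable name to the native event and optional mask. The sketch below is purely illustrative: the event names come from the examples in this post, but the table layout is invented here and is not how Solaris implements the mapping internally.

```python
# Hypothetical mapping table: (generic event, platform) -> (native event, mask).
# Names are from this post; the structure is an illustration only.
GENERIC_TO_NATIVE = {
    ("PAPI_tot_ins", "UltraSPARC T2"): ("Instr_cnt", None),
    ("PAPI_tot_ins", "AMD Family 0xf/0x10"):
        ("FR_retired_x86_instr_w_excp_intr", None),
    ("PAPI_tot_cyc", "AMD Family 0xf/0x10"): ("BU_cpu_clk_unhalted", None),
}

def native_event(generic, platform):
    """Resolve a generic name to (native_event, unit_mask) for a platform."""
    key = (generic, platform)
    if key not in GENERIC_TO_NATIVE:
        # A platform only defines the generic events it can map natively.
        raise KeyError(f"{generic} is not defined on {platform}")
    return GENERIC_TO_NATIVE[key]

print(native_event("PAPI_tot_ins", "UltraSPARC T2"))  # ('Instr_cnt', None)
```

The lookup failing for an undefined pair mirrors the caveat above: a platform only advertises the generic events it can actually accommodate.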
So, you may ask, why the PAPI prefix? In order not to invent yet another naming scheme for processor events, I decided to use the naming scheme defined by the PAPI project at the University of Tennessee. This project has put a good amount of thought and effort into producing a well defined naming scheme for event abstraction. If you know the PAPI work, you'll see that the Solaris generic event names equate to the PAPI single counter preset events.
Solaris now defines over 100 generic events, and each platform defines whichever generic events it can map onto a native platform event. Depending upon the hardware capabilities of the platform, some platforms can define a more extensive list of generic events than others. To discover the generic events implemented by a platform, look at the output of cpustat -h. Here's a sample from a Niagara2 based system:
# cpustat -h
Usage:
        cpustat [-c events] [-p period] [-nstD] [interval [count]]

        -c events specify processor events to be monitored
        -n        suppress titles
        -p period cycle through event list periodically
        -s        run user soaker thread for system-only events
        -t        include %tick register
        -D        enable debug mode
        -h        print extended usage information

        Use cputrack(1) to monitor per-process statistics.

        CPU performance counter interface: UltraSPARC T2

        event specification syntax:
        [picn=]<eventn>[,attr[n][=<val>]][,[picn=]<eventn>[,attr[n][=<val>]],...]

        Generic Events:

        event[0-1]: PAPI_tot_ins PAPI_l1_dcm PAPI_l1_icm PAPI_l2_icm
                 PAPI_l2_ldm PAPI_tlb_dm PAPI_tlb_im PAPI_tlb_tm
                 PAPI_br_tkn PAPI_br_ins PAPI_ld_ins PAPI_sr_ins

        See generic_events(3CPC) for descriptions of these events
<chop>
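If you ever want to compare the generic event lists across a fleet of machines, the names are easy to scrape out of saved cpustat -h output. The helper below is hypothetical (the sample text is abridged from the Niagara2 output above); on a real system you would just read cpustat -h directly.

```python
import re

# Abridged `cpustat -h` output from the Niagara2 sample above.
CPUSTAT_HELP = """\
Generic Events:

        event[0-1]: PAPI_tot_ins PAPI_l1_dcm PAPI_l1_icm PAPI_l2_icm
                 PAPI_l2_ldm PAPI_tlb_dm PAPI_tlb_im PAPI_tlb_tm
                 PAPI_br_tkn PAPI_br_ins PAPI_ld_ins PAPI_sr_ins
"""

def generic_events(help_text):
    """Return the sorted, de-duplicated PAPI_* names found in the text."""
    return sorted(set(re.findall(r"PAPI_\w+", help_text)))

print(len(generic_events(CPUSTAT_HELP)))  # 12
```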
You'll note from this output that a new man page, generic_events(3CPC), has been added. It defines the full set of generic events and, for each platform, the generic events that platform defines along with the underlying platform event and any mask it maps onto.
Hopefully, generic events will make the Solaris performance counter subsystem a bit more approachable, especially for the more common use cases. The performance counters open up a whole world of opportunities for understanding application and system behaviour, but they can sometimes be slightly daunting to use and to map onto real world behaviour. My next project to integrate, the DTrace CPU Performance Counter provider (which really should integrate soon, honest...), will hopefully open up a whole new world of exploration in the murky depths of the CPU performance counters. Watch this space.