Tuesday Jan 09, 2007

Understanding UltraSPARC-T1 performance counters

There are two performance counters on the UltraSPARC-T1 processor. The first can be programmed to collect one of eight different event types. The second is hardcoded to always count instructions. Each event type has a cost associated with it, typically the number of cycles for which the processor stalls before issuing more instructions from that thread. The events are described in the following table, together with order-of-magnitude estimates of the cost of each event:

Counter         Comment                                   Cost in cycles
SB_full         Cycles when store buffer is full          1
FP_Instr_cnt    Floating point instruction count          30
IC_miss         Instruction cache miss                    20
DC_miss         Data cache miss                           20
ITLB_miss       Instruction TLB miss                      100
DTLB_miss       Data TLB miss                             100
L2_imiss        Instruction fetches that miss L2 cache    100
L2_dmiss_ld     Loads that miss L2 cache                  100
Instr_cnt       Instruction count                         1
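As a rough illustration of how the costs in the table might be applied, the following Python sketch multiplies event counts by the per-event cost to get an order-of-magnitude stall estimate. The event counts are made-up sample values, not measurements, and the function name is hypothetical. The instruction counter is left out because it measures issued instructions rather than stall.

```python
# Order-of-magnitude stall cost per event, in cycles, taken from the table.
EVENT_COST_CYCLES = {
    "SB_full": 1,
    "FP_Instr_cnt": 30,
    "IC_miss": 20,
    "DC_miss": 20,
    "ITLB_miss": 100,
    "DTLB_miss": 100,
    "L2_imiss": 100,
    "L2_dmiss_ld": 100,
}


def estimate_stall_cycles(event_counts):
    """Return estimated stall cycles per event: count * cost from the table."""
    return {
        name: count * EVENT_COST_CYCLES[name]
        for name, count in event_counts.items()
    }


# Hypothetical event counts for one second of execution (illustration only).
sample_counts = {
    "DC_miss": 50_000_000,
    "L2_dmiss_ld": 5_000_000,
    "DTLB_miss": 100_000,
}

per_event = estimate_stall_cycles(sample_counts)
total_stall_cycles = sum(per_event.values())

for name, cycles in per_event.items():
    print(f"{name:12s} {cycles:>15,d} cycles")
print(f"{'total':12s} {total_stall_cycles:>15,d} cycles")
```

Because the costs are only order-of-magnitude estimates, the total is best read as a rough indication rather than an exact cycle count.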

The interpretation of these counters is different for the UltraSPARC-T1 than for previous generations of processors. When interpreting the results it is very important to recognise that the processor can tolerate many more stall cycles than earlier processors before performance suffers. The reasoning is as follows:

Each processor has eight cores, and each core executes four threads. Each core can issue one instruction per cycle, taken from one of its four threads. This means that every cycle, three of the threads on each core cannot issue an instruction; these threads can either be stalled or waiting for the opportunity to issue an instruction.

Assume that the processor is clocked at 1.2GHz. Eight cores, each issuing one instruction per cycle, give a processor that can issue 9.6 billion instructions per second. For each one of those instructions, there are three threads that could either be stalled or waiting to issue an instruction.

Hence there's a budget of 9.6 billion instructions per second. The Instr_cnt performance counter shows how much of this budget was actually used. This is a measure of the utilisation of the processor.

There is also a budget of 9.6 × 3 = 28.8 billion 'stall' cycles per second: cycles where the other threads can be stalled. All the hardware counter stall events come from this budget.
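The arithmetic behind the two budgets can be written out as a short Python sketch. The constants come from the text (eight cores, four threads per core, 1.2GHz); the function names and the sample instruction count are assumptions made for illustration.

```python
CORES = 8                   # cores per processor
THREADS_PER_CORE = 4        # hardware threads per core
CLOCK_HZ = 1_200_000_000    # assumed 1.2GHz clock


def issue_budget(cores=CORES, clock_hz=CLOCK_HZ):
    """Instructions per second: each core issues one instruction per cycle."""
    return cores * clock_hz


def stall_budget(cores=CORES, threads_per_core=THREADS_PER_CORE,
                 clock_hz=CLOCK_HZ):
    """Stall cycles per second: the threads on each core that are not issuing."""
    return issue_budget(cores, clock_hz) * (threads_per_core - 1)


def utilisation(instr_cnt_per_second):
    """Fraction of the issue budget actually used."""
    return instr_cnt_per_second / issue_budget()


print(f"issue budget: {issue_budget():,d} instructions/s")
print(f"stall budget: {stall_budget():,d} cycles/s")

# Hypothetical whole-processor instruction count for one second.
print(f"utilisation:  {utilisation(2_400_000_000):.0%}")
```

The utilisation figure assumes the instruction counts have already been summed over all hardware threads and scaled to one second.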

In previous processors, a high stall time would indicate a performance issue with the application, but on the UltraSPARC-T1 that is no longer necessarily the case. Until the number of cycles spent stalled exceeds the 'stall' budget, there may be no impact on the performance of the application.
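A minimal sketch of that comparison, in Python, might look like the following. It assumes the stall estimate has been summed across all hardware threads and expressed per second, and it uses the 28.8 billion figure from the text; the function names and the sample values are hypothetical.

```python
ISSUE_BUDGET = 9_600_000_000      # instructions per second at 1.2GHz
STALL_BUDGET = 3 * ISSUE_BUDGET   # 28.8 billion stall cycles per second


def stall_headroom(stall_cycles_per_second):
    """Stall cycles per second still available before the budget is used up."""
    return STALL_BUDGET - stall_cycles_per_second


def stalls_exceed_budget(stall_cycles_per_second):
    """True when the estimated stall no longer fits in the stall budget."""
    return stall_cycles_per_second > STALL_BUDGET


# Hypothetical whole-processor stall estimates, in cycles per second.
for estimate in (1_510_000_000, 30_000_000_000):
    if stalls_exceed_budget(estimate):
        verdict = "exceeds the stall budget"
    else:
        verdict = f"fits, with {stall_headroom(estimate):,d} cycles to spare"
    print(f"{estimate:>14,d} cycles/s: {verdict}")
```

A result that fits within the budget does not prove that stalls are harmless, only that they could in principle be hidden behind the other threads.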

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
