DProfile - Teaching Analyzer Perspectives

I've changed Dataspace Profiling to DProfile.

The application costs are broken down between execution time and memory-subsystem time in the Functions tab. You will be able to view and operate on the memory-subsystem time through all of the perspectives reviewed earlier.

Many of the perspectives in the perspectives table are built-in, such as the load object, function, PC, data object, and virtual and physical pages (8k, 64k, 512k and 4M).

All other perspectives are programmed into Performance Analyzer through the use of the .er.rc file using the expression grammar.

All of the profile perspectives of the machine are created by these expressions. This blog will cover the machine-independent and machine-dependent human readable expressions. Future entries will include tools that create expressions as well.

Time

Performance Analyzer has the Timeline view. A simpler view is provided by the Seconds and Minutes perspectives, which break down memory-subsystem time by second or by minute. Selecting a column heading changes the sort order of the objects. Selecting "Max Mem Time" orders the time intervals from most costly to least costly. Selecting Name orders them as a time series. Selecting the Graphical radio button gives you a graphical view of your application through time.

en_desc on
mobj_define Seconds (TSTAMP/1000000000)
mobj_define Minutes (TSTAMP/60000000000)

The first line in the .er.rc file instructs the engine behind Performance Analyzer to analyze the entire process tree (all descendants) created by the collect command.

The second line defines Seconds from the collected time stamps.

The third line defines Minutes from the collected time stamps.
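
To make the arithmetic behind these two definitions concrete, here is a minimal Python sketch. It assumes TSTAMP is a nanosecond count (which the divisors suggest, but which this entry does not state outright) and that the division truncates to a whole-number bucket. The function names are hypothetical and are not part of Performance Analyzer.

```python
# Assumption: TSTAMP is measured in nanoseconds and the division truncates.
NS_PER_SECOND = 1_000_000_000
NS_PER_MINUTE = 60 * NS_PER_SECOND  # 60000000000, the divisor used for Minutes


def seconds_bucket(tstamp_ns: int) -> int:
    """Mirror of (TSTAMP/1000000000): the second in which the event fell."""
    return tstamp_ns // NS_PER_SECOND


def minutes_bucket(tstamp_ns: int) -> int:
    """Mirror of (TSTAMP/60000000000): the minute in which the event fell."""
    return tstamp_ns // NS_PER_MINUTE


# A hypothetical event recorded 125.4 seconds into the run.
tstamp = 125_400_000_000
print(seconds_bucket(tstamp))  # 125
print(minutes_bucket(tstamp))  # 2
```

Every event whose time stamp lands in the same bucket is attributed to the same Seconds or Minutes object, which is why sorting by Name yields a time series.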

Software Execution

Threads

An application may span multiple processes and contain many threads. Two perspectives (objects) are useful in analysis: the Thread and the ThreadID.

Either one represents the Software Execution object within our application. The Thread is a unique identifier across the application, while the ThreadID is a unique identifier only within each Process of the application.

mobj_define Thread (PID*1000)+THRID
mobj_define ThreadID THRID
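
The following Python sketch mirrors the Thread expression to show why it separates threads that share a THRID in different processes. The PID and THRID values are made up for illustration, and the observation that the encoding only stays unique while THRID is below 1000 is simple arithmetic on the expression, not a statement from the tool's documentation.

```python
def thread_key(pid: int, thrid: int) -> int:
    """Mirror of (PID*1000)+THRID from the Thread definition."""
    return pid * 1000 + thrid


# Hypothetical values: thread 1 exists in two different processes.
print(thread_key(4021, 1))  # 4021001
print(thread_key(4022, 1))  # 4022001

# The keys stay distinct only while THRID < 1000; beyond that,
# a thread in one process can alias a thread in the next PID.
print(thread_key(4021, 1001) == thread_key(4022, 1))  # True
```

If a process in your experiment creates 1000 or more threads, a larger multiplier in the expression would be one way to keep the keys apart.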

The application will allocate memory in a Process, through the use of virtual memory. Solaris will allocate and map physical memory for this virtual memory.

mobj_define Vaddr VADDR
mobj_define Paddr PADDR
mobj_define Process PID

Since we're announcing the Sun Fire CoolThreads Servers using the UltraSPARC T1 processor, here are the hardware-specific Sun Fire CoolThreads Server formulas you add to your .er.rc file to identify the cache hierarchy and hardware objects in the system:

mobj_define UST1_Bank (PADDR&0xc0)>>6
mobj_define UST1_L2CacheLine (PADDR&0x3ffc0)>>6
mobj_define UST1_L1DataCacheLine (PADDR&0x7f0)>>4
mobj_define UST1_Strand (CPUID)
mobj_define UST1_Core (CPUID&0x1c)>>2
mobj_define VA_L2 VADDR>>6
mobj_define VA_L1 VADDR>>4
mobj_define PA_L2 PADDR>>6
mobj_define PA_L1 PADDR>>4
mobj_define Vpage_256M VADDR>>28
mobj_define Ppage_256M PADDR>>28

Niagara has L2 Cache and physical memory grouped by UST1_Bank. Based on Paddr, a Bank is selected; then accesses will be serviced by the portion of the L2 cache in that bank. If a reference misses the L2 Cache, the memory controller associated with that Bank will service the miss.
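
As a rough illustration of the bank selection, this Python sketch applies the UST1_Bank expression to a few made-up physical addresses. The function name is hypothetical; only the mask and shift come from the definition above.

```python
def ust1_bank(paddr: int) -> int:
    """Mirror of (PADDR&0xc0)>>6: bits 6 and 7 of the physical address."""
    return (paddr & 0xC0) >> 6


# Hypothetical addresses, each 64 bytes (0x40) after the previous one.
for paddr in (0x0000, 0x0040, 0x0080, 0x00C0, 0x0100):
    print(hex(paddr), ust1_bank(paddr))
# 0x0 0
# 0x40 1
# 0x80 2
# 0xc0 3
# 0x100 0
```

The two-bit mask yields four possible bank values, and consecutive 64-byte-aligned addresses rotate through them.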

UST1_L2CacheLine returns the unique identifier for any L2 Cache Line in an UltraSPARC_T1 processor.

UST1_L1DataCacheLine returns the unique identifier for an L1 Cache Line within one UST1_Core. This does not return the unique identifier for an L1 Cache Line within a UST1_Processor.

UST1_Strand returns the identifier for each UltraSPARC_T1 Virtual Processor.

UST1_Core returns the identifier for the UltraSPARC_T1 Execution Core.
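
The sketch below mirrors the cache-line and core expressions in Python so you can see which bits each one keeps. The function names and the sample address and CPUID are hypothetical; the masks and shifts are copied from the definitions above, and the value counts are derived only from those masks.

```python
def ust1_l2_cache_line(paddr: int) -> int:
    """Mirror of (PADDR&0x3ffc0)>>6: bits 6 through 17 of the address."""
    return (paddr & 0x3FFC0) >> 6


def ust1_l1_data_cache_line(paddr: int) -> int:
    """Mirror of (PADDR&0x7f0)>>4: bits 4 through 10 of the address."""
    return (paddr & 0x7F0) >> 4


def ust1_core(cpuid: int) -> int:
    """Mirror of (CPUID&0x1c)>>2: bits 2 through 4 of the CPU id."""
    return (cpuid & 0x1C) >> 2


paddr = 0x12345678  # hypothetical physical address
print(ust1_l2_cache_line(paddr))       # 345
print(ust1_l1_data_cache_line(paddr))  # 103

# Four consecutive CPUIDs (strands) share one core value.
print([ust1_core(cpu) for cpu in (12, 13, 14, 15)])  # [3, 3, 3, 3]

# Number of distinct values each mask can produce.
print((0x3FFC0 >> 6) + 1, (0x7F0 >> 4) + 1, (0x1C >> 2) + 1)  # 4096 128 8
```

Because UST1_L1DataCacheLine is derived from the address alone, two cores can report the same value for different L1 lines, which is why it is unique only within one UST1_Core.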

VA_L2 returns the identifier for Virtual Memory grouped in UltraSPARC_T1 L2CacheLine-sized chunks.

VA_L1 returns the identifier for Virtual Memory grouped in UltraSPARC_T1 L1DataCacheLine-sized chunks.

The previous two formulas are useful for relating Hardware View cache line costs back to the Program Address Space. You filter on a hardware object, then relate it back to the virtual memory allocations for that hardware object.

PA_L2 and PA_L1 provide similar grouping by physical address (Paddr). These formulas are useful for relating Hardware View cache line costs back to Solaris physical address allocations.

I'll show you how to use these formulas in my next entry.
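
To show what "grouped in cache-line-sized chunks" means in practice, here is a small Python sketch that buckets a handful of made-up virtual addresses with the VA_L2 and VA_L1 shifts. The sample addresses and function names are hypothetical; the same shifts applied to a physical address give PA_L2 and PA_L1.

```python
from collections import defaultdict


def va_l2(vaddr: int) -> int:
    """Mirror of VADDR>>6: one identifier per 64-byte chunk."""
    return vaddr >> 6


def va_l1(vaddr: int) -> int:
    """Mirror of VADDR>>4: one identifier per 16-byte chunk."""
    return vaddr >> 4


# Hypothetical sampled virtual addresses.
samples = [0x10000, 0x10008, 0x10010, 0x10040, 0x1007F]

by_l2_chunk = defaultdict(list)
for vaddr in samples:
    by_l2_chunk[va_l2(vaddr)].append(hex(vaddr))

for chunk, addrs in sorted(by_l2_chunk.items()):
    print(hex(chunk), addrs)
# 0x400 ['0x10000', '0x10008', '0x10010']
# 0x401 ['0x10040', '0x1007f']

print([hex(va_l1(v)) for v in samples])
# ['0x1000', '0x1000', '0x1001', '0x1004', '0x1007']
```

Addresses that fall in the same chunk are reported under one object, so the costs of neighbouring data that share a cache line are summed together.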
