Wednesday Nov 25, 2009

Viewing thread activity in the Performance Analyzer

The Sun Studio Performance Analyzer is one of the two tools that I use most frequently (the other is spot - which is now in SS12U1!). It's a very powerful tool, but a lot of that power is not immediately visible to users. I'm going to discuss a couple of ways I've used the analyzer to view parallel applications.

The most common first step for looking at the performance of parallel apps is to use the timeline. However, the timeline can look a bit cluttered with all of the call stack data. Often you are really just interested in the leaf node. Fortunately this can be configured from the data presentation dialog box. To get the view I want I'm only showing the top leaf in the call stack:

This results in a display of the samples in each routine, by default this can look very colourful. You can make it easier on the eye by selecting the colours used to display the graphic. In the following graphic I've picked green for one parallel routine that I'm interested in, and blue for another, then used a yellow to colour all the time waiting for more work to be assigned:

The graphic shows that the work is not evenly spread across all threads. The first few threads spend more time in the hot routines than the later threads. We can see this much more clearly using the 'threads' view of the data. To get this view you need to go back to the data presentation dialog and select the threads tab, it's also useful to select the 'cpus' tab at the same time.

The threads tab shows the activity of each thread for the currently displayed metrics. This is useful to see if one thread is doing more work than another. The cpus tab shows time that the app spends on each CPU in the machine - this can indicate whether a particular CPU is over subscribed. The thread activity looks like:

This confirms what we thought earlier that some of the threads are much more active than other threads. The top chart shows the user time, which indicates that all the threads spent the same amount of time running 'stuff', the middle chart shows the time that each thread spent running useful work, the lower chart shows the time spent in overhead. The exercise now is to try and improve the distribution of work across the threads......

Thursday Aug 16, 2007

Presenting at Stanford HPC conference

I'll be presenting at Stanford next week as part of their HPC conference (Register here). I plan to cover:

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
5
6
8
9
10
12
13
14
15
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
Bookmarks
The Developer's Edge
Solaris Application Programming
Publications
Webcasts
Presentations
OpenSPARC Book
Multicore Application Programming
Docs