Friday Sep 05, 2014

Fun with signal handlers

I recently had a couple of projects where I needed to write some signal handling code. I figured it would be helpful to write up a short article on my experiences.

The article contains two examples. The first is using a timer to write a simple profiler for an application - so you can find out what code is currently being executed. The second is potentially more esoteric - handling illegal instructions. This is probably worth explaining a bit.

When a SPARC processor hits an instruction that it does not understand, it traps. You typically see this if an application has gone off into the weeds and started executing the data segment or something. However, you can use this feature for doing something whenever the processor encounters an illegal instruction. If it's a valid instruction that isn't available on the processor, you could write emulation code. Or you could use it as a kind of break point that you insert into the code. Or you could use it to make up your own instruction set. That bit's left as an exercise for you. The article provides the template of how to do it.

Wednesday May 18, 2011

Profiling running applications

Sometimes you want to profile an application, but you either want to profile it after it has started running, or you want to profile it for part of a run. There are a couple of approaches that enable you to do this.

If you want to profile a running application, then there is the option (-P <pid>) for collect to attach to a PID:

$ collect -P <pid>

Behind the scenes this generates a script and passes the script to dbx, which attaches to the process, starts profiling, and then stops profiling after about 5 minutes. If your application is sensitive to being stopped for dbx to attach, then this is not the best way to go. The alternative approach is to start the application under collect, then collect the profile over the period of interest.

The flag -y <signal> will run the application under collect, but collect will not gather any data until profiling is enabled by sending the selected signal to the application. Here's an example of doing this:

First of all we need an application that runs for a bit of time. Since the compiler doesn't optimise out floating point operations unless the flag -fsimple is used, we can quickly write an app that spends a long time doing nothing:

$ more slow.c
int main()
  double d=0.0;
  for (int i=0;i<10000000000; i++) {d+=d;}

$ cc -g slow.c

The next step is to run the application under collect with the option -y SIGUSR1 to indicate that collect should not start collecting data until it receives the signal USR1.

$ collect -y SIGUSR1 ./a.out &
[1] 1187
Creating experiment database ...

If we look at the generated experiment we can see that it exists, but it contains no data.

$ er_print -func
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name
User CPU  User CPU
 sec.      sec.
0.        0.         

To start gathering data we send SIGUSR1 to the application, sending the signal again stops data collection. Sending the signal twice we can collect two seconds of data:

$ kill -SIGUSR1 1187;sleep 2;kill -SIGUSR1 1187
$ er_print -func
Functions sorted by metric: Exclusive User CPU Time

Excl.     Incl.      Name
User CPU  User CPU
 sec.      sec.
2.001     2.001      
2.001     2.001      main
0.        2.001      _start

Friday Oct 09, 2009

Webcast: Improving the performance of parallel codes using the Performance Analyzer

Earlier in the summer I recorded a slidecast on using the Performance Analyzer on parallel codes, it's just come out on the HPC portal.

Monday Sep 21, 2009

Profiling scripts

If you try to use the Sun Studio Performance Analyzer on something that is not an executable, you'll end up with an error message:

$ collect kstat
Target `kstat' is not a valid ELF executable

The most reliable workaround for this that I've discovered is as follows. First of all make up shell script that executes the command passed into it:

$ more

Then run the collect command as:

$ collect -F on /bin/sh <script> <params>

The -F on is required so that collect follows forked processes, otherwise collect will just profile the top /bin/sh which will do minimal work before forking off the actual command.

When loading the resulting experiment into the Analyzer you have to load all the descendant processes. You can do this by going to the filter dialog box and selecting all the processes, or you can take the easier route of placing en_desc on into your .er.rc file in your home directory (this will tell the analyzer to always load the descendant processes, which might make loading experiments slower, but will guarantee that you actually load all the data, and not just the top-level code).

One other thing to note is that each new process can contribute wall and wait time, so the wall time shown in the analyzer can be a multiple of the actual wall time. To see this in action do:

$ collect -F on /bin/sh kstat

The wall time on this will be a multiple of the actual runtime because each shell script contributes wall time while it waits for the kstat command to complete.

Friday Mar 20, 2009

University of Washington Presentation

I was presenting at the University of Washington, Seattle, on Wednesday on Solaris and Sun Studio. The talk covers the tools that are available in Solaris and Sun Studio. This is my "Grand Unified" presentation, it covers tools, compilers, optimisation, parallelisation, and debug.


Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
Free Download


« February 2015
The Developer's Edge
Solaris Application Programming
OpenSPARC Book
Multicore Application Programming