Thursday Apr 28, 2011

Catching the macro bug

I have to admit a dislike for macros. I've seen plenty of codes where it has been a Herculean task to figure out exactly what source code generated the particular assembly code. So perhaps I'm biased to begin with. However, I recently hit another annoyance with macros. The following code looks pretty benign:

#include <stdio.h>
#include <sys/time.h>

int timercmp(struct timeval *end, struct timeval *begin, struct timeval *result)

However, at compile time it produces the following error.

cc error.c
"error.c", line 4: syntax error before or at: struct
"error.c", line 4: syntax error before or at: )
"error.c", line 4: warning: old-style declaration or incorrect type for: tv_sec
"error.c", line 4: syntax error before or at: )
"error.c", line 4: warning: old-style declaration or incorrect type for: tv_sec
"error.c", line 4: syntax error before or at: )
"error.c", line 4: warning: old-style declaration or incorrect type for: tv_usec
"error.c", line 4: syntax error before or at: ->
"error.c", line 4: warning: old-style declaration or incorrect type for: tv_usec
"error.c", line 4: syntax error before or at: )
"error.c", line 4: warning: old-style declaration or incorrect type for: tv_sec
"error.c", line 4: identifier redefined: result
        current : function(pointer to struct timeval {long tv_sec, long tv_usec}) returning pointer to struct timeval {long tv_sec, long tv_usec}
        previous: function(pointer to struct timeval {long tv_sec, long tv_usec}) returning pointer to struct timeval {long tv_sec, long tv_usec} : "error.c", line 4
"error.c", line 4: syntax error before or at: ->
"error.c", line 4: warning: old-style declaration or incorrect type for: tv_sec
cc: acomp failed for error.c

The C++ compiler produces fewer errors:

 CC error.c
"error.c", line 4: Error: No direct declarator preceding "(".
1 Error(s) detected.

Of course, the problem is that timercmp is a macro defined in sys/time.h. This is revealed when the preprocessed source is examined:

$ cc -P error.c
$ tail error.i

int  ( ( ( struct timeval * end ) -> tv_sec == ( struct timeval * begin ) -> tv_sec ) ? ( ( struct timeval * end ) -> tv_usec struct timeval * result ( struct timeval * begin ) -> tv_usec ) : ( ( struct timeval * end ) -> tv_sec struct timeval * result ( struct timeval * begin ) -> tv_sec ) )

Now, we can narrow the problem down more rapidly by trying to compile the preprocessed code. This takes us to the exact line with the problem, and it's obvious from inspection exactly what is going on:

$ cc error.i
"error.i", line 1135: syntax error before or at: struct
"error.i", line 1135: syntax error before or at: )

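One possible way around the clash is to remove the macro before reusing its name. The following is a minimal sketch under stated assumptions, not necessarily the fix for the original code: the function body is hypothetical (only the prototype appears above), and is assumed to store end minus begin in result and return the sign of the difference.

```c
#include <stdio.h>
#include <sys/time.h>

/* timercmp is a function-like macro in sys/time.h on many systems.
   Drop it before declaring a function with the same name. */
#ifdef timercmp
#undef timercmp
#endif

/* Hypothetical semantics (an assumption, not from the original code):
   store end - begin in *result and return -1, 0 or 1 as the sign of
   the difference. Inputs are assumed to have tv_usec in [0, 1000000). */
int timercmp(struct timeval *end, struct timeval *begin, struct timeval *result)
{
    result->tv_sec  = end->tv_sec  - begin->tv_sec;
    result->tv_usec = end->tv_usec - begin->tv_usec;
    if (result->tv_usec < 0) {
        /* Borrow one second so that tv_usec stays non-negative. */
        result->tv_sec  -= 1;
        result->tv_usec += 1000000;
    }
    if (result->tv_sec < 0) {
        return -1;
    }
    return (result->tv_sec > 0 || result->tv_usec > 0) ? 1 : 0;
}
```

Renaming the function is usually the cleaner choice, since the #undef also takes the system macro away from the rest of the file.
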
Wednesday Nov 25, 2009

Viewing thread activity in the Performance Analyzer

The Sun Studio Performance Analyzer is one of the two tools that I use most frequently (the other is spot - which is now in SS12U1!). It's a very powerful tool, but a lot of that power is not immediately visible to users. I'm going to discuss a couple of ways I've used the analyzer to view parallel applications.

The most common first step for looking at the performance of parallel apps is to use the timeline. However, the timeline can look a bit cluttered with all of the call stack data. Often you are really just interested in the leaf node. Fortunately this can be configured from the data presentation dialog box. To get the view I want I'm only showing the top leaf in the call stack:

This results in a display of the samples in each routine. By default this can look very colourful. You can make it easier on the eye by selecting the colours used to display the graphic. In the following graphic I've picked green for one parallel routine that I'm interested in, and blue for another, then used yellow to colour all the time waiting for more work to be assigned:

The graphic shows that the work is not evenly spread across all threads. The first few threads spend more time in the hot routines than the later threads. We can see this much more clearly using the 'threads' view of the data. To get this view you need to go back to the data presentation dialog and select the threads tab. It's also useful to select the 'cpus' tab at the same time.

The threads tab shows the activity of each thread for the currently displayed metrics. This is useful to see if one thread is doing more work than another. The cpus tab shows time that the app spends on each CPU in the machine - this can indicate whether a particular CPU is over subscribed. The thread activity looks like:

This confirms what we thought earlier: some of the threads are much more active than others. The top chart shows the user time, which indicates that all the threads spent the same amount of time running 'stuff'; the middle chart shows the time that each thread spent running useful work; the lower chart shows the time spent in overhead. The exercise now is to try and improve the distribution of work across the threads...

Wednesday Jun 03, 2009

The Developer's Edge talk in Second Life

Just finished talking in Second Life. The slides from the talk are available from SLX. I've got into the habit of writing a transcript for my SL presentations - basically in case the audio fails for some reason.

The talk focuses a bit more on the way that people now get information (through blog posts, articles, indexed by search engines) and the Q&A after the talk was more about that than the technical content of the book. This is a domain that I've given a fair amount of thought to. When writing technical books there is a challenge to balance the information so that it includes the necessary details without writing material that will be out of date by the time that the book hits the press. Fortunately a large amount of the information that developers need is relatively long lived. The challenges come when describing a particular revision of the software, or a particular processor - details which can be very useful for people, but also details which may not age gracefully!

Wednesday Jan 28, 2009

A look inside the Sun compiler team

At the end of last year I was asked to appear in a short video for AMD and give an elevator pitch for performance analysis. The video is now up on the site.

They also took videos of some of the Solaris folks, as well as a few others from the compiler team.

Yuan Lin has a short talk about parallelisation. My manager, Fu-Hwa Wang, talks about the work of our organisation.

Tuesday Jan 13, 2009

Engelbart - Evolving Collective Intelligence

The December break is one of the few times when I'm able to find chunks of time to read. One of the books I was given was Doug Engelbart's "Evolving Collective Intelligence", which is really a (very) short (88 pages!) set of essays calling for action on using computers to improve our 'collective intelligence', meaning our ability to manage complexity. The book was a bit disappointing in that it didn't feel very focused and I didn't come away with a clear 'message'. However it did talk extensively about the 1968 demo.

The 1968 demo is described by the Stanford website that hosts the video as:

"was the public debut of the computer mouse. But the mouse was only one of many innovations demonstrated that day, including hypertext, object addressing and dynamic file linking, as well as shared-screen collaboration involving two persons at different sites communicating over a network with audio and video interface."

It was also described by Steven Levy as "The mother of all demos".

Now I've just got to find the time to watch nearly two hours of video....

Thursday Oct 30, 2008

The multi-core is complex meme (but it's not)

Hidden amongst the interesting stories (Gaming museum opening, Tennant stepping down) on the BBC was this little gem from Andrew Herbert, the head of Microsoft Research in the UK.

The article describes how multi-core computing is hard. Here's a snippet of it:

"For exciting, also read 'complicated'; this presents huge programming challenges as we have to address the immensely complex interplay between multiple processors (think of a juggler riding a unicycle on a high wire, and you're starting to get the idea)."

Now, I just happened to see this particular article, but there are plenty of other places where the same meme appears. And yes, writing a multi-threaded application can be very complex, but probably only if you do it badly :) I mean just how complex is:

#pragma omp parallel for

Ok, so it's not fair comparing using OpenMP to parallelise a loop with writing some convoluted juggler riding unicycle application, but let's take a look at the example he uses:

"Handwriting recognition systems, for example, work by either identifying pen movement, or by recognising the written image itself. Currently, each approach works, but each has certain drawbacks that hinder adoption as a serious interface.

Now, with a different processor focusing on each recognition approach, learning our handwriting style and combining results, multi-core PCs will dramatically increase the accuracy of handwriting recognition."

This sounds like a good example (better than the old examples of using a virus scanner on one core whilst you worked on the other), but to me it implies two independent tasks. Or two independent threads. Yes, having a multicore chip means that the two tasks can execute in parallel, but assuming a sufficiently fast single core processor we could use the same approach.

So yes, to get the best from multi-core, you need to use multi-threaded programming (or multi-process, or virtualisation, or consolidation, but that's not the current discussion). But multi-threaded programming, whilst it can be tricky, is pretty well understood, and more importantly, for quite a large range of codes easy to do using OpenMP.
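As a minimal sketch of how little is needed for a simple loop, the following C function uses the pragma shown earlier. The function name scale and its parameters are hypothetical, chosen only for illustration. It assumes that each iteration is independent of the others, which is what makes the loop safe to split across threads.

```c
/* Hypothetical helper: multiply every element of v by factor, in place. */
void scale(double *v, int n, double factor)
{
    int i;

    /* Each iteration touches a different element, so the iterations can
       be shared out between threads. If the compiler is not told to
       enable OpenMP, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        v[i] *= factor;
    }
}
```

The same source builds as a serial or a parallel routine depending on the compiler flag (-xopenmp for the Sun Studio compilers, -fopenmp for gcc).
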

So I'm going to put it the other way around. It's easy to find parallelism in today's environment, from the desktop, through gaming, or numeric codes. There are abundant examples of where many threads are simultaneously active. Where it gets really exciting (for exciting read "fun") is if you start looking at using CMT processors to parallelise things that previously were not practical to run with multiple threads.

Monday Oct 06, 2008

The healing power of WD40

I have a Boss ME-6 guitar effects pedal. I got it around 16 years ago, and it's been sat idle in the garage for quite a while now. Last time I tried it most of the buttons had stopped responding, so the unit was stuck with just the factory defaults. I'd feared I'd need to dust off my soldering skills to replace the buttons - although I'd not located exact replacements. However, as a last ditch attempt to get it working again, I doused the buttons with WD40, wriggled them around and put it together again.

Rather surprisingly the buttons started working again, and after a bit more exercise they were all back to normal.

Tuesday Jun 24, 2008

Redpoint - stay at home

One of my friends just contacted me. I'd not heard from him in about 7 years, so it was neat to have some vague contact. Anyway, I'd kept some kind of track of him through another friend - who'd pointed me to an EP that he'd released. The EP's called 'Stay at home', the music is electronica/ambient. This release and the later ones (which I've not listened to yet) are available for free. Anyway, I wrote a chunk of the book late at night, and I ended up doing a fair amount of the final sections whilst listening to Redpoint. So, thanks Ian!

Thursday May 15, 2008

Sun Studio technical articles

Another list of Sun Studio technical articles.

Thursday Feb 07, 2008

Knuth and the complexity of songs

One of my colleagues just pointed out this 1984 paper by Donald Knuth on the complexity of songs. It assesses the algorithmic complexity of several well known folk and popular songs.

Thursday Oct 04, 2007

OpenMP - getting the (maximum) number of threads

Just parallelising a code, and the code needs to detect the number of threads that are available. There are a few things to consider at this point.

  • To determine that the code is being compiled under OpenMP, check the _OPENMP #define. One of the benefits of using OpenMP is that the same code base can be used to generate serial and parallel versions of the code. The _OPENMP #define separates out the parallel specific code.
  • The call to find the maximum number of threads that are available to do work is omp_get_max_threads() (from omp.h). This should not be confused with the similarly named omp_get_num_threads(). The 'max' call returns the maximum number of threads that can be put to use in a parallel region. The 'num' call returns the number of threads that are currently doing something. There's a big difference between the two. In a serial region omp_get_num_threads will return 1; in a parallel region it will return the number of threads that are being used.
  • The call omp_set_num_threads will set the maximum number of threads that can be used (equivalent to setting the OMP_NUM_THREADS environment variable). The name of this call does not make it any easier to remember the call to get the (maximum) number of threads available.
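The points above can be sketched in C as follows. This is a hedged illustration rather than the code being parallelised: the three wrapper function names are hypothetical, while _OPENMP, omp_get_max_threads() and omp_get_num_threads() are the names discussed in the list. The serial fallbacks return 1 so that the same source compiles with or without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>   /* only available when compiling with OpenMP enabled */
#endif

/* Maximum number of threads a parallel region could use.
   A serial build has exactly one thread. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Number of threads in the team executing the calling code.
   Called from a serial region this is 1, even in an OpenMP build. */
int active_threads(void)
{
#ifdef _OPENMP
    return omp_get_num_threads();
#else
    return 1;
#endif
}

/* Team size as seen from inside a parallel region. */
int threads_in_parallel_region(void)
{
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread records the team size; the single construct ends
           with an implicit barrier, so n is set before the region exits. */
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Calling active_threads() from the main program returns 1 in either build, which is the difference between the 'num' and 'max' calls that the list warns about.
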

Monday Apr 23, 2007

Tuis for Ben The Hoose

I've just noticed that my friend Bob McNeill has won a third Tuis award for his Ben the Hoose album "the little cascade". The previous two were for his solo albums "covenant" and "turn the diesels".

Saturday Oct 21, 2006


I was trying to find some sequencing software to play bass & drum parts whilst I did some guitar practice. I initially tried Quartz but found the process of adding notes to be painful (may have been a mouse issue, but the software would not recognise a mouse-drag to play a longer note).

I also tried Anvil. I must admit that I almost didn't since the installer had to be run as Administrator. The free version of the software has its limitations, but was sufficient for my needs. Putting the music on the piano roll was pretty quick and easy. Adding a drum pattern was trivial thanks to the inclusion of looping capability.


Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge