By Darryl Gove on Oct 13, 2009
I had an e-mail which told the sorry tale of a new system which tool longer to build a project than an older system, of theoretically similar performance. The system showed low utilisation when doing the build indicating that it was probably spending a lot of time waiting for something.
The first thing to look at was a profile of the build process using `collect -F on`, which produced the interesting result that the build was taking just over 2 minutes of user time, a few seconds of system time, and thousands of seconds of "Other Wait" time.
"Other wait" often means waiting for network, or disk, or just sleeping. The other thing to realise about profiling multiple processes is that all the times are cumulative, so all the processes that are waiting accumulate "other wait" time. Hence it will be a rather large number if multiple processes are doing it. So this confirmed and half explained the performance issue. The build was slow because it was waiting for something.
Sorting the profile by "other wait" indicated two places that the wait was coming from, one was waitpid - meaning that the time was due to a process waiting for another process, well we knew that! The other was a door call. Tracing up the call stack eventually lead into the C and C++ compiler, which were calling gethostbyname. The routine doing the calling was "generate_prefix" which is the routine responsible for generating a random prefix for function names - the IP address of the machine was used as one of the inputs for the generation of a prefix.
The performance problem was due to
gethostbyname timing out, common reasons for this are missed configurations in the /etc/hosts and /etc/nsswitch.conf files. In this example adding the host name to the hosts file cured the problem.