By Jorgen Austvik on May 26, 2008
For some days I have been tracking down a problem with the Slony-I tests that only shows up on specific platforms, and only when the tests run in our big automated test setup. Of course, when we try to reproduce it in our own environment, everything just works.
It turns out that, due to an environment issue (PATH settings), different ps(1) commands were run on Solaris x86 and Solaris SPARC for the same user account. The Slony-I test suite parses ps(1) output to check whether a process is still running (see _check_pid() in slony1-engine/tests/support_funcs.sh), so with two ps(1) variants that format their output differently, the check could give the wrong answer on one of the platforms.
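I wrote a patch that used pgrep(1) to do the same thing in a slightly more robust way, but then Ståle (a manager!) came up to me and told me about kill -0 <pid>. Signal 0 is never actually delivered, but kill still performs its usual checks, so the exit status tells you whether the process exists without parsing any output at all. How embarrassing not to know about kill -0, and even more embarrassing to be told about it by a manager. Anyway, now you have been warned about kill -0; from knowledgeable managers, however, I cannot protect you.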
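For illustration, a process check built on kill -0 could look something like the minimal POSIX sh sketch below. The function name is_running is hypothetical; this is not the actual _check_pid() from support_funcs.sh. Keep two caveats in mind: kill -0 also fails (with "Operation not permitted") for a live process owned by another user, and a child that has exited but not yet been reaped with wait still counts as existing.

```shell
#!/bin/sh
# is_running PID: hypothetical helper, not Slony-I's _check_pid().
# kill -0 sends no signal; it only checks that PID exists and that we
# are allowed to signal it. The exit status is the answer.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Start a throwaway background process to demonstrate the check.
sleep 30 &
child=$!

if is_running "$child"; then
    echo "process $child is running"
fi

kill "$child"
# Reap the child; until then it is a zombie and kill -0 still succeeds.
# wait returns the child's (non-zero) exit status, so ignore it here.
wait "$child" 2>/dev/null || true

if ! is_running "$child"; then
    echo "process $child has exited"
fi
```

Unlike ps(1) parsing, this does not depend on which ps(1) happens to be first in PATH, since kill is a shell builtin in common shells.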