When threads go bad

When one thread in a multithreaded application hits a fatal error such as a segmentation fault, the default action for the resulting signal takes out the entire app, not just the thread that failed. Here's some example code:

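#include <pthread.h>
#include <stdio.h>

/* Child thread: increment an int at an address that should not be mapped. */
void *work(void *param)
{
  int *a;
  a = (int *)(1024 * 1024);
  (*a)++;                          /* faults here with SIGSEGV */
  printf("Child thread exit\n");   /* never reached */
  return 0;
}

int main()
{
  pthread_t thread;
  pthread_create(&thread, 0, work, 0);
  pthread_join(thread, 0);
  printf("Main thread exit\n");
  return 0;
}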

Compiling and running this produces:

% cc -O -mt pthread_error.c
% ./a.out
Segmentation Fault (core dumped)

Not entirely unexpected, that. The app died without the main thread having the chance to clean up resources etc. This is probably not ideal. However, it is possible to write a signal handler that catches the segmentation fault and terminates the child thread without causing the main thread to terminate. It's important to realise that there's probably little chance of actually recovering from the unspecified error, but this at least might give the app the chance to report the symptoms of its demise.

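#include <pthread.h>
#include <stdio.h>
#include <signal.h>

/* Child thread: increment an int at an address that should not be mapped. */
void *work(void *param)
{
  int *a;
  a = (int *)(1024 * 1024);
  (*a)++;                          /* faults here with SIGSEGV */
  printf("Child thread exit\n");   /* never reached */
  return 0;
}

/* Runs on the thread that faulted; pthread_exit ends only that thread. */
void hsignal(int i)
{
  printf("Signal %i\n", i);
  pthread_exit(0);
}

int main()
{
  pthread_t thread;
  sigset(SIGSEGV, hsignal);        /* install the handler before starting the thread */
  pthread_create(&thread, 0, work, 0);
  pthread_join(thread, 0);
  printf("Main thread exit\n");
  return 0;
}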

Compiling and running this version produces:

% cc -O -mt pthread_error.c
% ./a.out
Signal 11
Main thread exit

Comments:

printf is not supposed to be signal-safe.
I know that you know that, but since I don't see any reference to it in your blog, one of your readers may try to use it in some code :-) (I only mention it because, since you wrote Solaris Application Programming, you are likely to be quoted many times :-))

Clicking on the OpenSPARC link on your About page redirected me to http://box455.bluehost.com/suspended.page/disabled.cgi/opensparc.net which says that the domain opensparc.net is suspended :-(

I have been subscribed to your blog for quite some time and enjoy it very much: keep up the good work.

cheers,
-- paulo

Posted by paulo on November 23, 2009 at 03:20 AM PST #

@Paulo. Exactly! The list of async-signal-safe functions is here:
http://docs.sun.com/app/docs/doc/816-5137/gen-61908?a=view
If you look at the OpenSolaris code for printstack, you'll find that it uses write:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/walkstack.c

Thanks for pointing out the problem with OpenSPARC.net. Not sure what's going on there... hopefully the site will be back up soon.

Thanks,

Darryl.

Posted by Darryl Gove on November 23, 2009 at 04:13 AM PST #

And now we have no core file with which to debug the problem... it's almost always a bad idea to try to catch things like this.

Posted by John Levon on November 24, 2009 at 10:21 AM PST #

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
