Sunday Jul 15, 2007

Using Thread Analyzer with DBX

The Thread Analyzer is a new tool that detects programming errors such as data races and deadlocks in multi-threaded applications. Because the collector is an integral feature of dbx, you can collect data-race and deadlock detection experiments while debugging your application. The following example shows how to prepare the source code for an experiment, how to collect data in dbx, how to examine the experiment, and how to use dbx to isolate the problem.

The program (t.c) is a multi-threaded C program that contains data-race errors.

% cat t.c
#include <stdio.h>
#include <pthread.h>

int g;

void cleanup_handler(void *arg)
{
     g = 1;
}

void *work(void *arg)
{
     int i;
     pthread_cleanup_push(cleanup_handler, NULL);
     for (i=0; ; i++) {
         if ((i%1000)==0)
             pthread_testcancel();
     }
     pthread_cleanup_pop(0);
}

int main(int argc, char *argv[])
{

     pthread_t tid1;
     pthread_t tid2;

     pthread_create(&tid1, NULL, work, NULL);
     pthread_create(&tid2, NULL, work, NULL);

     pthread_cancel(tid1);
     pthread_cancel(tid2);

     pthread_join(tid1,NULL);
     pthread_join(tid2,NULL);

     printf("%d \n", g);
}

The following command instruments the source code for data-race collection:

cc -xinstrument=datarace -g -mt t.c -o a.out

Now we load the application into dbx, enable experiment collection, and start debugging:

dbx a.out
[t@null l@null]: collector tha races      # Specify Thread Analyzer settings;
                                          # use deadlock instead of races for deadlock detection
[t@null l@null]: collector enable         # Enable collector, open new experiment

The following command sets a breakpoint at line 31. It illustrates that any dbx debugging feature remains available while data is being collected.

[t@null l@null]: stop at 31
(2) stop at "t.c":31

[t@null l@null]: run     
Creating experiment database tha.1.er ...
Running: a.out
(process id 14046)
Reading libcollector.so
Reading tha.so
t@1 (l@1) stopped in main at line 31 in file "t.c"
   31        pthread_cancel(tid1);
[t@1 l@1]: threads                     # Print the list of all known threads                
                                    
o>    t@1  a  l@1   ?()   breakpoint       in  main()
        t@2  a  l@2   thread_hj_start_routine()   running          in  work()
        t@3  a  l@3   thread_hj_start_routine()   running          in  work()
[t@1 l@1]: status
*(2) stop at "t.c":31
[t@1 l@1]: delete 2                     # Remove breakpoint number 2
[t@1 l@1]: cont                          # Continue execution
1

execution completed, exit code is 0
[t@null l@null]: er_print -races tha.1.er     # Examine the data-race experiment

(You can also use the Thread Analyzer or the Performance Analyzer to examine the experiment. )

Total Races:  1 Experiment:  tha.1.er

Race #1, Vaddr: 0x214c0
      Access 1: Write, cleanup_handler + 0x00000070,
                       line 8 in "t.c"
      Access 2: Write, cleanup_handler + 0x00000070,
                       line 8 in "t.c"
  Total Traces: 1

The experiment reports a data race at line 8 in "t.c", where two threads write to g inside cleanup_handler. Now you can use dbx to examine the context of all threads when line 8 is reached:

[t@null l@null]: stop at 8              
(2) stop at "t.c":8
[t@null l@null]: run   
Creating experiment database tha.2.er ...
Running: a.out
(process id 14097)
t@2 (l@2) stopped in cleanup_handler at line 8 in file "t.c"
    8        g = 1;
t@3 (l@3) stopped in cleanup_handler at line 8 in file "t.c"
    8        g = 1;

[t@3 l@3]: threads                                                          
      t@1  a  l@1   ?()   running          in  __lwp_wait()
      t@2  a  l@2   thread_hj_start_routine()   running          in  __open()
*>   t@3  a  l@3   thread_hj_start_routine()   breakpoint       in  cleanup_handler()
[t@3 l@3]:

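The breakpoint messages show that both t@2 and t@3 reach line 8 in cleanup_handler, where each writes to the variable g with no synchronization. The threads command lists the state of every thread at that point, and the where command prints a procedure traceback for the current thread.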
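
One common way to remove a race like this is to guard the shared write with a mutex. The following is a minimal sketch under that assumption, not part of the original t.c: the mutex g_lock and the helper run_workers are hypothetical names introduced here, and the rest mirrors the structure of t.c above.

```c
#include <pthread.h>
#include <stddef.h>

static int g;
/* g_lock is a hypothetical mutex added for this sketch; it is not in t.c. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void cleanup_handler(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&g_lock);     /* serialize the write that raced at line 8 */
    g = 1;
    pthread_mutex_unlock(&g_lock);
}

static void *work(void *arg)
{
    unsigned i;                      /* unsigned so wrap-around is well defined */
    (void)arg;
    pthread_cleanup_push(cleanup_handler, NULL);
    for (i = 0; ; i++) {
        if ((i % 1000) == 0)
            pthread_testcancel();    /* cancellation point; the handler runs from here */
    }
    pthread_cleanup_pop(0);          /* never reached, but must pair with the push */
    return NULL;
}

/* Hypothetical helper: starts two workers, cancels them, returns the final g. */
int run_workers(void)
{
    pthread_t tid1;
    pthread_t tid2;

    pthread_create(&tid1, NULL, work, NULL);
    pthread_create(&tid2, NULL, work, NULL);

    pthread_cancel(tid1);
    pthread_cancel(tid2);

    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);

    return g;                        /* safe to read: both threads have been joined */
}
```

Calling run_workers() from main and printing the result should give the same program output as before. Rebuilding with -xinstrument=datarace and collecting a new experiment is the way to confirm whether the race is still reported.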

The Thread Analyzer and dbx are available on Solaris and on Linux. The Thread Analyzer is a new tool in Sun Studio 12. These tools are fully supported by the Sun Studio IDE.

Friday Jun 01, 2007

Runtime Memory Checking

Who took my memory?

Your app has gone through the development cycle, been tested, and finally been deployed. Everything goes well until one day a customer calls: "Hey, your binary is taking up 60 GB of memory, and there is nothing left!" Instead of mumbling "Then how do you remember my number?", you send a field application engineer to track down the problem at the customer site.

There is no debug information available, and you cannot recompile your binary at the customer site. How do you find the leaks? This is where dbx's runtime memory checking (RTC) comes in handy. RTC interposes on and instruments the binaries on the fly, so it does not require recompiling, relinking, or even debug information. Just load your application into dbx and run. For example:

% dbx a.out
(dbx) check -all                                                           
access checking - ON
memuse checking - ON
(dbx) run

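You can also enable runtime memory checking on a process that is already running, provided the process was started with the RTC audit library (rtcaudit.so) loaded through the link auditor: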

% which dbx
/opt/ss12/opt/SUNWspro/bin/dbx
% setenv LD_AUDIT_64 /opt/ss12/opt/SUNWspro/prod/lib/amd64/dbxruntime/rtcaudit.so
% a.out &
[1] 6759
% unsetenv LD_AUDIT_64
% dbx - 6759
(dbx) check -all                                                           
access checking - ON
memuse checking - ON
(dbx) cont

When access checking is turned on, RTC detects and reports the following kinds of errors:

        baf     # Bad free
        duf     # Duplicate free
        maf     # Misaligned free
        mar     # Misaligned read
        maw     # Misaligned write
        oom     # Out of memory
        rua     # Read from unallocated memory
        rui     # Read from uninitialized memory
        wro     # Write to read-only memory
        wua     # Write to unallocated memory

When leak checking is turned on, RTC reports the following kinds of errors:

        aib     # Possible memory leak - the only pointer points into the middle of the block
        air     # Possible memory leak - a pointer to the block exists only in a register
        mel     # Memory leak - no pointers to the block
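
To make the leak categories more concrete, here is a small sketch of code that avoids them, with comments marking where an aib or mel report would arise if the code were written differently. The function copy_and_measure is a hypothetical name used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies src into a fresh heap block, measures the
 * copy, releases the block, and returns the measured length. */
size_t copy_and_measure(const char *src)
{
    size_t len = strlen(src);
    size_t measured;
    const char *p;
    char *block = malloc(len + 1);

    if (block == NULL)
        return 0;
    memcpy(block, src, len + 1);

    /* Walk a second pointer through the block rather than advancing
     * `block` itself. If `block` were advanced and no other copy kept,
     * the only remaining pointer would point into the middle of the
     * block, which is the situation described by aib. */
    p = block;
    while (*p != '\0')
        p++;
    measured = (size_t)(p - block);

    /* Returning without this free would leave no pointer to the block,
     * which is the situation described by mel. */
    free(block);
    return measured;
}
```

For example, copy_and_measure("dbx") returns 3 and leaves no heap block behind.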

RTC reports memory errors with context information that includes details about allocation of a heap block. Here is a memory access error reported on an AMD64 box:

Read from unallocated (rua):
Attempting to read 1 byte at address 0x412de0
    which is just past heap block of size 1000 bytes at 0x4129f8
This block was allocated from:
        [1] rua() at line 37 in "access.c"
        [2] access() at line 7 in "access.c"
        [3] main() at line 7 in "main.c"

stopped in rua at line 38 in file "access.c"
   38           c = s[1000];
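
The report points at a read one element past a 1000-byte block. The source of access.c is not shown, so the following is only a sketch of what a corrected read might look like; the function name read_last_byte and the BLOCK_SIZE macro are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1000   /* matches the heap block size in the report above */

/* Hypothetical corrected counterpart of the reported read. */
char read_last_byte(void)
{
    char c;
    char *s = malloc(BLOCK_SIZE);

    if (s == NULL)
        return 0;

    memset(s, 'a', BLOCK_SIZE);   /* initialize the block so the read is not a rui */
    c = s[BLOCK_SIZE - 1];        /* last valid index; s[BLOCK_SIZE] is one past
                                     the end and is what rua reports */
    free(s);                      /* release the block so it is not reported as mel */
    return c;
}
```

Valid indexes for the block run from 0 to BLOCK_SIZE - 1, which is why the read of s[1000] in the report lands "just past" the heap block.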

Memory errors can be suppressed. The following command suppresses read from uninitialized memory (rui) errors in all functions in a.out:

(dbx) suppress rui in a.out

RTC instruments memory access assembly instructions for access checking. You can exclude load objects, object files and functions from being instrumented. The following command

(dbx) rtc skippatch a.out -f main

excludes the function main from being instrumented.

Runtime memory access checking and leak checking are available on Solaris SPARC and Intel x86/x64. AMD64 access checking is a new feature in Sun Studio 12. This advanced feature is fully supported by the debugger IDE.

