Adventures in Instruction Tracing: Part Deux!

Yesterday I began talking about cobbling together a user-land instruction trace tool using a variety of UltraSPARC and Solaris features. Today I'm going to cop out a little and present the comments for the user trap handler I wrote, which is the heart of the tool. For the truly bored, I've provided a hello.c and the associated (post-processed) trace.

-ejo

#if 0

illtrap_hndlr - Btrace3 ILLTRAP utrap handler

erik.oshaughnessy@sun.com


Btrace3 instruments the code of a program by substituting control
transfer instructions (CTIs) with specially prepared ILLTRAP
instructions.  When the code executes, the ILLTRAP instructions cause
the utrap handler to execute.  This allows us to observe the actual
path of execution by noting which ILLTRAP instructions are executed.

It is probably worth mentioning that the ILLTRAP instruction is a
sneaky way of hiding 22-bit constant data in your code.  According to
_The SPARC Architecture Manual_ Version 9, David Weaver and Tom
Germond, page 168:

"The ILLTRAP instruction causes an illegal_instruction exception. The
 const22 value is ignored by the hardware; specifically, this field is
 NOT reserved by the architecture for any future use."

I have taken advantage of this by crafting the ILLTRAP instructions so
that they serve as indices into the array of saved original
instructions. Used in a flat array, a const22 value gives us 2^22, or
roughly 4 million, points where code can be instrumented.  For the
current generation of software this number of probe points appears to
suffice, but it is not difficult to conceive of future applications
where the number of probe points exceeds this flat space.
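
As a rough illustration of the indexing trick, here is a minimal C sketch.
It is not the Btrace3 source (the real handler is SPARC assembly); the
names make_illtrap, is_illtrap, probe_index and instrument are
hypothetical, and the table size is shrunk for the example.  It relies
only on the SPARC V9 encoding of ILLTRAP, where op and op2 are both zero
and const22 occupies the low 22 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONST22_MASK 0x003FFFFFu  /* const22: bits 21:0 of the word   */
#define OP_OP2_MASK  0xC1C00000u  /* op: bits 31:30, op2: bits 24:22  */
#define NPROBES      16u          /* hypothetical, tiny probe table   */

static uint32_t ProbeTable[NPROBES];  /* saved original instructions */
static uint32_t nprobes;              /* next free probe table slot   */

/* Build an ILLTRAP word whose const22 field carries a table index.
 * Because op and op2 are zero, the word is simply the index itself. */
static uint32_t make_illtrap(uint32_t index)
{
    assert(index <= CONST22_MASK);
    return index;
}

/* An instruction word is an ILLTRAP when op == 0 and op2 == 0. */
static int is_illtrap(uint32_t insn)
{
    return (insn & OP_OP2_MASK) == 0;
}

/* Recover the probe table index hidden in an ILLTRAP word. */
static uint32_t probe_index(uint32_t insn)
{
    return insn & CONST22_MASK;
}

/* Replace one instruction with an ILLTRAP, saving the original. */
static void instrument(uint32_t *code, size_t i)
{
    assert(nprobes < NPROBES);
    ProbeTable[nprobes] = code[i];
    code[i] = make_illtrap(nprobes);
    nprobes++;
}
```

A handler built along these lines would fetch the trapping word, check it
with is_illtrap(), and look up ProbeTable[probe_index(word)] to find the
instruction it has to emulate.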

The first implementation of Btrace played a dangerous shell game,
storing the original instruction back into the executing stream of
instructions and re-inserting the ILLTRAP when handling the next
probed CTI.  Despite the odd corner case, this technique worked
surprisingly well in a single-threaded application.  It falls apart
when attempting to trace multi-threaded applications, since the
complexity of managing multiple stores into the instruction stream and
guaranteeing correctness of the instrumented program proved simply
overwhelming.

Btrace3 sidesteps the self-modifying code aspects of Btrace1 and 2 by
emulating the CTI instruction instead of storing the probed
instruction back into the instruction stream.  This is possible since
the utrap handler can choose where the trapping context begins
executing after the utrap handler returns.  Typical utrap handlers
either skip the trapping instruction or restart the trapping
instruction after modifying the state of the trapping context.  A CTI
instruction can be emulated by calculating the target address if the
branch is taken and then determining if the CTI is taken.  After
evaluating any conditions predicated by the type of instruction being
emulated, it is a simple matter to return with arbitrary PC and nPC
addresses.
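
To make the emulation step concrete, here is a hedged C sketch of how a
Bicc (branch on integer condition codes) might be emulated.  It is an
illustration only, not the handler's actual assembly; resume_t,
icc_cond_true and emulate_bicc are hypothetical names.  It assumes the
standard V9 Bicc layout (annul bit 29, cond in bits 28:25, disp22 in
bits 21:0) and the icc flags in the low four bits of %ccr.

```c
#include <assert.h>
#include <stdint.h>

/* icc flags as they sit in the low nibble of %ccr */
#define ICC_C 0x1u
#define ICC_V 0x2u
#define ICC_Z 0x4u
#define ICC_N 0x8u

typedef struct { uint64_t pc, npc; } resume_t;  /* where to resume */

/* Evaluate a 4-bit Bicc condition against saved condition codes.
 * Bit 3 of cond selects the negated form of the low three bits. */
static int icc_cond_true(unsigned cond, unsigned ccr)
{
    int n = (ccr & ICC_N) != 0, z = (ccr & ICC_Z) != 0;
    int v = (ccr & ICC_V) != 0, c = (ccr & ICC_C) != 0;
    int r;

    switch (cond & 7u) {
    case 0:  r = 0;           break;  /* never                    */
    case 1:  r = z;           break;  /* equal                    */
    case 2:  r = z | (n ^ v); break;  /* less or equal            */
    case 3:  r = n ^ v;       break;  /* less                     */
    case 4:  r = c | z;       break;  /* less or equal, unsigned  */
    case 5:  r = c;           break;  /* carry set                */
    case 6:  r = n;           break;  /* negative                 */
    default: r = v;           break;  /* overflow set             */
    }
    return (cond & 8u) ? !r : r;
}

/* Compute the PC/nPC pair to resume at after emulating a Bicc
 * that trapped at address pc with next-PC npc. */
static resume_t emulate_bicc(uint32_t insn, uint64_t pc, uint64_t npc,
                             unsigned ccr)
{
    assert((insn & 0xC1C00000u) == 0x00800000u);  /* op=0, op2=2 */

    unsigned annul = (insn >> 29) & 1u;
    unsigned cond  = (insn >> 25) & 0xFu;
    int64_t  disp  = (int64_t)(insn & 0x3FFFFFu);
    if (disp & 0x200000)
        disp -= 0x400000;                   /* sign-extend disp22 */
    uint64_t target = pc + (uint64_t)(disp * 4);
    int taken = icc_cond_true(cond, ccr);
    resume_t r;

    if (taken && annul && cond == 8u) {     /* ba,a: skip delay slot */
        r.pc = target;  r.npc = target + 4;
    } else if (taken) {                     /* delay slot, then target */
        r.pc = npc;     r.npc = target;
    } else if (annul) {                     /* untaken: annul delay slot */
        r.pc = npc + 4; r.npc = npc + 8;
    } else {                                /* untaken: fall through */
        r.pc = npc;     r.npc = npc + 4;
    }
    return r;
}
```

The other CTI classes would follow the same pattern with a different
target calculation and condition test; only the final PC/nPC selection
is shared.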

Since the instruction stream remains instrumented during the lifetime
of the process, the synchronization complexity of the tracing code
is greatly reduced.  The only remaining synchronization point is trace
buffer management, which is a relatively well understood problem.

The first Btrace implementation also uncovered the importance of
preserving the state of the trapping context before proceeding with
tracing activities.  The two macros UTRAP_PROLOG and UTRAP_EPILOG
preserve and restore the following registers: %ccr, %g1 - %g7.
Additionally, UTRAP_PROLOG writes ASI_PNF to %asi as required by the
__sparc_utrap_install(2) man page before calls are made to ABI
conforming functions.  Values are preserved on the stack, making the
utrap handler reasonably re-entrant.  The stack also has room for
sundry other values: %tick, %fsr, %l6 and %l7 (PC and nPC) from the
trapping context, and the ILLTRAP instruction at [%l6].  The macro
UTRAP_EPILOG restores the state of %ccr and %g1 - %g7, which may have
been corrupted by the utrap handler's activities.

The following is a pseudo pseudocode description of how illtrap_hndlr works:

illtrap_hndlr{

   sampled_tick_register = gethrtime()

   save trapping context

   original instruction = ProbeTable[ trapping ILLTRAP instruction ]

   tid = thr_self()      /* re-implemented to avoid linking libthread */

   cpuid = getcpuid()    /* new in Solaris 9 */

   switch( original instruction )
   
     case CALL:
        emulate CALL instruction

     case BPr:
        emulate BPr instruction	
   	
     case FBfcc:
        emulate FBfcc instruction	
   
     case FBPfcc:
        emulate FBPfcc instruction	
   
     case Bicc:
        emulate Bicc instruction	

     case BPcc:
        emulate BPcc instruction	

     case JMPL:
        emulate JMPL instruction	
     
     default:   /* getting to default is an error condition */

        unregister illtrap_hndlr
        restore trapping context
        restart trapping instruction, this time without the handler to
          catch it when it falls.  the program should die.

   /* end of switch */

   pick PC and nPC values based on contents of FLG register which
   indicates if the branch was taken and if the delay slot instruction was
   annulled.

   if( !TraceOn )
     restore trapping context
     start executing at PC and nPC computed during instruction emulation
   fi
   
   lock(TrcLock,tid)
   			
   if( cTrcSlot > TRCMAX )
     write(btrc->fd,TraceBuffer,TRCBUFSZ)
     cTrcSlot = 0;
   fi

   store btfrec_t built in emulate_* to TraceBuffer[cTrcSlot]

   cTrcSlot++

   unlock(TrcLock)

   restore trapping context

   start executing at PC and nPC computed during instruction emulation

}

Of course there's a little bit of hand waving, but that should give the
reader a basic idea of what's being done and in what order.
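
The trace buffer step at the end of the handler can be sketched in C as
follows.  This is an assumption-laden illustration, not the Btrace3 code:
rec_t stands in for btfrec_t (whose real layout is not shown here),
trace_put is a hypothetical name, TRCMAX is treated as a slot count, and
a C11 atomic_flag spin lock stands in for the handler's own
lock(TrcLock,tid).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

#define TRCMAX 128  /* hypothetical number of slots in the buffer */

/* Stand-in for btfrec_t; the real record layout is an assumption. */
typedef struct {
    uint64_t tick;  /* timestamp sampled on entry to the handler */
    uint64_t pc;    /* address of the probed CTI                 */
    uint32_t insn;  /* original instruction that was emulated    */
    uint32_t tid;   /* thread that hit the probe                 */
} rec_t;

static rec_t       TraceBuffer[TRCMAX];
static unsigned    cTrcSlot;                     /* next free slot */
static atomic_flag TrcLock = ATOMIC_FLAG_INIT;   /* simple spin lock */

/* Append one record, writing the buffer out to fd first when it is
 * full.  Returns the number of bytes flushed (0 if no flush was
 * needed), or -1 if the write failed. */
static long trace_put(int fd, const rec_t *rec)
{
    long flushed = 0;

    assert(fd >= 0);

    /* spin until the lock is acquired */
    while (atomic_flag_test_and_set_explicit(&TrcLock,
                                             memory_order_acquire))
        ;

    if (cTrcSlot >= TRCMAX) {
        flushed = (long)write(fd, TraceBuffer, sizeof TraceBuffer);
        cTrcSlot = 0;
    }
    TraceBuffer[cTrcSlot++] = *rec;

    atomic_flag_clear_explicit(&TrcLock, memory_order_release);
    return flushed;
}
```

Holding the lock across the write keeps the sketch simple at the cost of
stalling other threads during a flush; swapping between two buffers would
be one way to shorten that critical section.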

#endif