Episode I: The Phantom #defines

If you've spent any time at all looking at the Solaris code, especially assembly routines, you may have noticed that there are #defines that are used but that just don't seem to exist anywhere.

For example, in the swtch() code I referenced last time, you'll see this construct:

movq    %rbx, T_RBX(thread_t);

So what's T_RBX? If you look through the source code, you'll find it in usr/src/uts/intel/amd64/ml/mach_offsets.in, #defined as:

    #define   T_RBX           _CONST(T_LABEL + LABEL_RBX)
Yet if you search for a #define for T_LABEL, you won't find one anywhere.

How can this code possibly assemble?

The answer? genassym, the offsets.in and mach_offsets.in files, and assym.h.

But wait! There's no assym.h file, either! What's going on?!?!

Hold on, and I'll try to explain.

The genassym program, part of Solaris' stabs tools, is how Solaris generates calculated #defines for accessing structure members from assembly language code. It's documented at the beginning of the file usr/src/uts/sun4/ml/offsets.in:

\ Guidelines:
\
\ A blank line is required between structure/union/intrinsic names.
\
\ The general form is:
\
\       name size_define [shift_define]
\               member_name [offset_define]
\       {blank line}
\
\ If offset_define is not specified then the member_name is
\ converted to all caps and used instead.  If the size of an item is
\ a power of two then an optional shift count may be output using
\ shift_define as the name but only if shift_define was specified.
\
\ Arrays cause stabs to automatically output the per-array-item increment
\ in addition to the base address:
\
\        foo FOO_SIZE
\               array   FOO_ARRAY
\
\ results in:
\
\       #define FOO_ARRAY       0x0
\       #define FOO_ARRAY_INCR  0x4
\
\ which allows \#define's to be used to specify array items:
\
\       #define FOO_0   (FOO_ARRAY + (0 * FOO_ARRAY_INCR))
\       #define FOO_1   (FOO_ARRAY + (1 * FOO_ARRAY_INCR))
\       ...
\       #define FOO_n   (FOO_ARRAY + (n * FOO_ARRAY_INCR))
\
\ There are several examples below (search for _INCR).
\
\ There is currently no manner in which to identify "anonymous"
\ structures or unions so if they are to be used in assembly code
\ they must be given names.
\
\ When specifying the offsets of nested structures/unions each nested
\ structure or union must be listed separately then use the
\ "\#define" escapes to add the offsets from the base structure/union
\ and all of the nested structures/unions together.  See the many
\ examples already in this file.
This allows us to manipulate structure definitions and not have to go back and adjust #defines or correct assembly code along the way. This also provides a way to easily index arrays from within assembly code.

To continue our resume() example, in usr/src/uts/intel/amd64/ml/mach_offsets.in, you'd find this code:

\#define        LABEL_RBP       _CONST(_MUL(2, LABEL_VAL_INCR) + LABEL_VAL)
\#define        LABEL_RBX       _CONST(_MUL(3, LABEL_VAL_INCR) + LABEL_VAL)
\#define        LABEL_R12       _CONST(_MUL(4, LABEL_VAL_INCR) + LABEL_VAL)
\#define        LABEL_R13       _CONST(_MUL(5, LABEL_VAL_INCR) + LABEL_VAL)
\#define        LABEL_R14       _CONST(_MUL(6, LABEL_VAL_INCR) + LABEL_VAL)
\#define        LABEL_R15       _CONST(_MUL(7, LABEL_VAL_INCR) + LABEL_VAL)
\#define        T_RBP           _CONST(T_LABEL + LABEL_RBP)
\#define        T_RBX           _CONST(T_LABEL + LABEL_RBX)
\#define        T_R12           _CONST(T_LABEL + LABEL_R12)
\#define        T_R13           _CONST(T_LABEL + LABEL_R13)
\#define        T_R14           _CONST(T_LABEL + LABEL_R14)
\#define        T_R15           _CONST(T_LABEL + LABEL_R15)
Coupled with this code in usr/src/uts/i86pc/ml/offsets.in:
_kthread        THREAD_SIZE
        t_pcb                   T_LABEL
[ ... snip ... ]
_label_t
        val     LABEL_VAL
You can probably see how it begins to pull together at this point; when the assembly code references T_RBX, genassym predefines that to be the offset within the thread's label_t where %rbx will be stored, or in this case:
    T_LABEL + LABEL_RBX
or:
    T_LABEL + (3 * LABEL_VAL_INCR) + LABEL_VAL
where LABEL_VAL is the offset of the val field within a label_t, and LABEL_VAL_INCR is the offset that must be added to get to each successive member of the label_t's val array.

Thus the #define above becomes this for an amd64 kernel (given the sources as of the date this is being written):

    0x38 + (3 * 8) + 0
yielding an offset of 0x50 bytes into the _kthread structure for the original movq instruction, so the instruction I mentioned at the top of this post:
    movq    %rbx, T_RBX(thread_t);
becomes:
    movq    %rbx, 0x50(thread_t);

Now this may all seem overly complex, but what if the size of the label_t changed? What if elements were added to the _kthread structure such that the offset of the t_pcb element changed? All that would be required is a recompile of the kernel. No changes to header files, no further changes to source code, not even a change to the offsets.in file.

How does this magic happen? In summary, genassym parses the offsets.in and mach_offsets.in files and creates assym.h at build time.

That's why you won't find assym.h in the source tree, yet it is #included at the top of most assembly files, such as this snippet at the top of usr/src/uts/intel/ia32/ml/swtch.s:

#if defined(__lint)
#include 
#include 
#include 
#else   /* __lint */
#include "assym.h"
#endif  /* __lint */
and things still assemble.

I hope this helps explain where many of those magical #defines come from.

All in all, pretty nifty.

