GCC-style asm inlining support in Sun Studio 12 compilers

Sun Studio 12 Asm Statements



    In order to support developers used to Gcc's Inline Assembly Feature,
Sun Studio 12 has implemented a compatible interface to allow the C and C++
programmer to insert assembly instructions into the code stream generated
by the compiler.  There are several advantages to this feature above and
beyond those of the Inline Assembly feature supported by prior Sun Studio
releases.  These include allowing the routine containing the inline assembly
to be optimized, compatibility with Gcc, and more flexibility in the
compiler's ability to choose registers efficiently.

    In this new scheme, inline assembly is written as an asm statement
in the source language with the following form:

    asm("<inst> %0, %1\n" : <outputs> : <inputs> : <clobber list>);

Here <inst> is an assembly-language opcode, and <outputs> and <inputs> are
comma-separated lists of operands.  Each input or output consists of
a constraint string and an expression from the source language enclosed in
parentheses.  These expressions supply the values passed into the asm statement
or the locations in which its results are stored.  The clobber list
is a comma-separated list of strings that name machine registers (other than
inputs or outputs) that one or more of the instructions in the asm statement
are known to write to.  A typical function in C containing an asm statement
might look like this:

#include <stdio.h>
void foo() {
      int result, source = 3;

      asm("movl %1, %0\n" : "=m" (result) : "r" (source));
      printf("result = %d (expected 3)\n", result);
}

    The %0 and %1 in the above example are placeholders for "result" and "source",
respectively.  The compiler will evaluate "source" and load it into a free
register denoted by %1.  It will then generate the movl instruction to move
that register into the memory location corresponding to the variable "result",
denoted by %0.

    There is an alternative notation for placeholders that users may find
more readable.  Rather than using %0, %1, %2, etc.  to denote positional
arguments, the user may refer to arguments symbolically:

#include <stdio.h>
void foo() {
   int result, source = 3;

   asm("movl %[input], %[output]\n" : [output] "=m" (result) : [input] "r" (source));
   printf("result = %d (expected 3)\n", result);
}

    In the above example, the names input and output have no special
meaning.  They could be any names, but each must match a corresponding
square-bracketed name in the input or output lists of the asm statement.

    These are very simple examples.  In actuality, an asm statement
may have more than one instruction and the constraints can get quite
complex.  With flexibility of expression comes some degree of complexity
which we will try to demystify in the sections that follow.

The Instruction String

    The instruction(s) to be executed are contained in one or more
quoted strings which precede the first colon in an asm statement.
The compiler does not parse the contents of these strings except
to scan for placeholders that it needs to replace with the arguments
of the asm statement.  So, the compiler knows nothing of the semantics
of the instructions in an asm statement other than what it is told via
the constraints on the input and output arguments and the contents
of the clobber list.  Within the instruction strings, any percent sign
that does not introduce a placeholder must be doubled.  For example,
in the following asm statement, the %eax register must be written as
"%%eax" but in the clobber list, no percent sign is needed:

    asm("movl  %0, %%eax\n" : : "r" (foo) : "eax");

Inputs and Outputs

    For an asm statement to affect a program, it most often must
    be able to receive information from expressions in the source
    language and be able to assign to variables (or other lvalues)
    in the program.  This is accomplished by passing outputs and
    inputs into the asm statement in a manner similar to the arguments
    to a function call. 


    The source language expressions for inputs may be rvalues or lvalues.
    Outputs must be lvalues.  Expressions may be of arbitrary complexity
    and are enclosed in parentheses following the constraint string.

Unused inputs and outputs

    If there is no use of an input or output in an asm statement's
    instruction string, then no loads from or stores to that variable
    are generated.  This saves registers for those arguments that
    are used in the asm instruction string.  There is one exception
    to this rule: if an input or output is constrained to a specific
    hardware register (as opposed to a register class), then it must
    be loaded or stored even if it is not referred to in the instruction
    string.  This is because its value may be used implicitly by the
    instructions in the asm statement.


Register Constraints


        In the descriptions that follow, only one size of register
        is listed in the tables, but in most cases the size of the
        register actually chosen depends on the type of the source
        expression being loaded into or stored from it.  See
        "Matching register types to input and output types" below
        for more details about how the compiler chooses the size
        of register to use.

        Register classes

            The following constraints specify a class of integer
            register that the compiler may choose from when it
            needs a register within an asm statement:

            Constraint    Register class
            g or r        rax, rbx, rcx, rdx, rbp, rsi, rdi, rsp, r8 - r15
            R             eax, ebx, ecx, edx, ebp, esi, edi, esp (legacy registers)
            q             al, bl, cl, dl
            Q             ah, bh, ch, dh
            A             eax or edx (used for returning 64-bit values)

        Specific registers

            The following constraints may be used to lock a source
            variable or expression to a specific hardware register:

            Constraint    Register
                          64-bit    32-bit

            a             rax       eax
            b             rbx       ebx
            c             rcx       ecx
            d             rdx       edx
            di            rdi       edi
            si            rsi       esi

        Floating point

        XMM and MMX registers

            The following constraints are used to specify that the
            source variable or expression should occupy an XMM or
            MMX register:

            Constraint    Register class
            x             xmm0 - xmm15
            y             mm0 - mm7

             Note:  Be sure to specify -xarch=sse2 when using
                       these constraints if compiling in 32-bit mode.

        x87 Floating point stack

            The following constraints are used to refer to variables
            or expressions loaded on the x87 floating point stack:

            Constraint    Register
            f                        ST(0) - ST(7)
            t                        ST(0) (top of the FP stack)
            u                        ST(1) (register just below the top of the FP stack)

 Memory Constraints

        A memory constraint has the form "<m>"  where <m> is
        one of the following letters:

        Constraint    Description
        m                    Memory operand of any general addressing mode
        o                    Offsettable addressing mode
        V                    Non-offsettable addressing mode
        <                    Autodecrement addressing mode
        >                    Autoincrement addressing mode

        These constraints instruct the compiler to generate a
        memory reference wherever this argument's placeholder
        occurs in the instruction string.

 Immediate Constraints

        An immediate constraint has the form "<i>"  where <i> is
        one of the following letters:

        Constraint    Description
        i             Any sized constant
        e             Constant in range -2147483648 - 2147483647
        n             A constant less than a word wide
        I             Constant in range 0 - 31
        J             Constant in range 0 - 63
        K             0xff
        L             0xffff
        M             Constant in range 0 - 3
        N             Constant in range 0 - 255
        Z             Constant in range 0 - 0xffffffff
        E             Floating point operand (native const double)
        F             Floating point operand (const double)
        G             Standard 80387 floating point constant
        s             Constant not known at compile time (symbolic)

        These constraints instruct the compiler to generate an
        immediate operand wherever this argument's placeholder
        occurs in the instruction string.

 Digit Constraints

        Digit constraints are of the form "<n>" where <n> is a number
        which corresponds to the position of an output.  This constraint
        is only allowed on an input and the digit must refer to an output.
        It binds the constrained input to the same location (register or
        memory) that the compiler chooses for the indicated output.

        The example below illustrates the use of digit constraints.

        asm ("addl %1,%0 \n\t"
             : "=r" (foo)
             : "r" (bar), "0" (foo));

        The simple example above essentially implements foo = foo + bar;
        The "0" in the input constraint indicates that variable foo
        needs to be loaded into the same register which will also
        contain the output result. It is also possible to specify a
        particular register as shown below:

        asm ("addl %1,%0 \n\t"
             : "=a" (foo)
             : "b" (bar), "0" (foo));

        In this case, the compiler will generate code to load variable foo
        into register %eax (since that input is constrained to output 0 and
        output 0 is constrained to %eax by the "=a" constraint) and bar will
        be loaded into register %ebx and the result foo will be available in
        register %eax.

        Here is another example of using digit constraints to shift a value
        by a given shift count:

        int shift_count = 5;
        int shifted_value = 37;

        asm ("sarl %1, %0\n\t"
             : "=r" (shifted_value)
             : "c" ((char) shift_count), "0" (shifted_value));

        In this example, the variable "shift_count" is loaded into the %cl
        register (note that the cast is required to convert the 32-bit integer
        "shift_count" to an 8-bit value as required by the sarl instruction).
        The variable "shifted_value" is loaded into a register chosen by the
        compiler with the proviso that the compiler will choose the same
        register to hold the result of the sarl instruction as requested by
        the "0" digit constraint.

 Multiple Constraints

        More than one constraint letter may be used in a
        constraint string. When this occurs, the compiler
        looks at the input or output to determine which
        constraint is the best match for the given expression.
        If the constraint string contains an immediate constraint,
        and the input is a constant of the correct type, then
        the input will be treated as an immediate.  Otherwise,
        if the constraint string contains a memory constraint
        and the input or output is an lvalue, then a memory
        reference will be generated. Failing this, if the
        constraint string contains a register constraint then
        the input will be loaded into or the output will be
        written to a register. The example below illustrates
        usage of multiple constraints:

        asm ("mulq %3"
                    : "=a"(low),"=d"(high)
                    : "a"(word),"rm"(foo));

        The mulq instruction multiplies the contents of a 64-bit
        memory or register by the contents of %rax and the result
        is available in the %rdx, %rax register pair - the high
        64-bits in %rdx and low 64 bits in %rax.

        One of the operands of the multiply, the variable foo in
        the example above, can be available in either memory or in a
        register. The "rm" constraint used in the example allows the
        compiler to choose the most appropriate location.

        The example above also shows an interesting instance of constraints
        usage. Although there is no explicit reference to %0 or %1
        in the asm template, the mulq instruction implicitly returns
        the results in %rax and %rdx, therefore "=a" and "=d" must be
        indicated as output constraints. Similarly, the first input
        operand (word) is expected to be available in the %rax register.


    Certain modifier characters may be included in a constraint string to control
    how the compiler applies that constraint.  They are:

        Modifier    Description
        =                Operand is only written
        +                Operand is read and written
        &                Operand is clobbered early
        %                This operand and the following one are commutative   
        #                Ignore all characters up to the next comma as constraints
        *                Ignore the following character when choosing register preferences

            Note: If = or + are specified in a constraint string, they must be the first    
                      character in the string.

        The following example shows a use of the "+" modifier:

        asm ("sarl %1, %0\n\t"
             : "+r" (shifted_value)
             : "c" ((char) shift_count));

        The variable "shifted_value" in the example above is both an
        input and an output. The compiler would generate code to load
        "shifted_value" into a general purpose register and ensure
        that "shifted_value" is available as an output in that same
        register. The same effect can be achieved using digit constraints
        (see example above) as well.  However, if there is no explicit reference
        to the input parameter in the asm template, it is more concise to use
        "+" modifier instead.

        The compiler normally makes the assumption that all inputs to
        an asm statement are consumed before any outputs are written
        to in the instructions which constitute the asm's instruction
        string.  If this is not the case for a particular instruction
        sequence, the user must inform the compiler which outputs are
        written early (i.e. before the last input is used).  This
        rule allows the compiler to use registers efficiently by
        choosing the same register for an input and an output under
        normal conditions, but allows the user to override this
        behavior when it would be semantically incorrect to do so.
        The use of the early clobber ("&") modifier provides the means
        to communicate this information to the compiler.  A register
        chosen for an operand marked as early clobber may not be used
        to hold any of the input operands.  The following example illustrates
        the use of early clobber:

        asm (
            "    subq    %2,%2        \n"
            ".align 16            \n"
            "1:    movq    (%4,%2,8),%0    \n"
            "    adcq    (%5,%2,8),%0    \n"
            "    movq    %0,(%3,%2,8)    \n"
            "    leaq    1(%2),%2    \n"
            "    loop    1b        \n"
            "    sbbq    %0,%0        \n"
            : "=&a"(ret),"+c"(n),"=&r"(i)
            : "r"(rp),"r"(ap),"r"(bp)
            : "cc");

    Matching register types to input and output types

        The register chosen by the compiler must match the
        type of the input or output in the source code.  There
        are two ways for the user to affect what type of
        register the compiler will choose for any given input
        or output.  The first is to insert a size letter between
        the "%" and the digit in the placeholder in the instruction
        string such as:
            asm("movl %l1, %l0\n" : "=r" (result) : "r" (source));
        This will choose a 32-bit register for each of the
        registers chosen to hold "result" and "source".  The
        supported types are:

            Type letter    Register size
                b                    8-bits
                h                    16-bits
                l                    32-bits
                q                    64-bits

        The second way to affect the type of the register
        chosen is by changing the type of the source expression
        passed to the asm statement.  By default the type of
        register is chosen based on the type of the input or
        output expression.  Casting this expression will also
        influence the size of register chosen to hold that
        expression in the code generated for the asm statement.

 The Clobber List

        Some instructions implicitly modify a register or the
        user may insert a specific register name in the instruction
        string such as: 

                    asm("movl  %0, %%eax\n" : : "r" (var) : "eax");

        In such cases the modified register should be placed in the
        clobber list (the comma-separated list of strings following the
        third colon) to inform the compiler that this register is written
        to by the asm statement.  This allows the compiler to keep enough
        information about the liveness of registers around an asm
        statement to continue to do normal optimizations.  Without
        this information, the compiler would have to forgo many
        optimizations in any routine that contained asm statements.
        Note that outputs need not be placed in the clobber list.
        The compiler knows that they are written to already.

        The following example shows a use of clobber lists:

            __asm__("movl %0,%%ecx         \n\t"
                    "movl %1,%0               \n\t"
                    "movl %%ecx,%1               \n\t"
                    : "+r" (foo), "+r" (bar)
                    :
                    : "ecx");

        The values of variable foo and variable bar are swapped
        in the example above, using %ecx as an intermediate place holder.
        Any value held in the register %ecx earlier will be lost
        after executing the asm template; therefore, "ecx" must be
        mentioned in the clobber list.

Current Limitations and Known Bugs

    No alternative constraints

    Gcc allows an operand's constraint string to have more than one series
    of constraint letters in a comma-separated list from which the best
    matching constraint is chosen based on the cost of loading that operand
    for each legal alternative constraint.  Sun Studio 12 currently implements
    only the simpler multiple constraint syntax described above.

    Assembler is not operand sensitive

    At present, the Sun Studio 12 assembler requires that the type of the
    opcode for any given instruction matches the types of its operands.
    Gcc's assembler, by contrast, can infer the suffix required for an
    opcode from the types of the operands of the instruction.  This is a
    limitation when writing asm statements intended to work interchangeably
    on 32-bit and 64-bit platforms.  Most often such asm statements must
    be split into 32-bit and 64-bit versions surrounded by appropriate
    #ifdefs as in the following example:

        void f () {}

        int main () {
                void (*fptr)() = 0;
        #ifdef __amd64
                asm ("movq %[f], %[fptr]"
        #else
                asm ("movl %[f], %[fptr]"
        #endif
                     : [fptr] "=m" (fptr)
                     : [f] "r" (f));
                if ( fptr != f ) return 1;
                return 0;
        }

    As another example of operand sensitivity, the following
    program will fail to assemble because of type mismatches
    between the opcode and one of its operands:

        int main() {
            int a, res;
            char b;
            /* The input argument "b" is of the
               wrong type.  The movl instruction
               expects 32-bit integers as its operands. */
            asm("movl %1, %0\n\t" : "=r" (res): "c" (b));

            /* The sete instruction requires an
               8-bit result register, but res is
               a 32-bit integer. */
            asm("sete  %0\n\t" : "=r" (res));

            /* Variable "a" is an int, but the shrl
               instruction requires an 8-bit shift
               count in register %cl. */
            asm ("shrl %1, %0\n\t" : "+r" (res) : "c" (a));
            return 0;
        }

        The user will see assembly errors such as the following:
            "/tmp/srscott/yabeAAAJqaGsx", line 14 : Syntax error
            Near line: "movl %cl, %edx"
            "/tmp/srscott/yabeAAAJqaGsx", line 18 : Syntax error
            Near line: "sete  %eax"
            "/tmp/srscott/yabeAAAJqaGsx", line 23 : Syntax error
            Near line: "shrl %ecx, %eax"

        The following modifications will allow it to compile without
        errors:

        int main() {
            int a, res;
            char b;
            /* Cast second argument to required type. */
            asm("movl %1, %0\n\t" : "=r" (res): "c" ((int) b));

            /* Use an 8-bit lvalue for the output argument. */
            asm("sete  %0\n\t" : "=r" (b));

            /* Cast second argument to required type. */
            asm ("shrl %1, %0\n\t" : "+r" (res) : "c" ((char) a));
            return 0;
        }

Inefficiency of memory constraints

Memory constraints lead to an extra level of indirection which requires
an extra register to hold the address.  This will not impact correctness,
but is less efficient than the user intended when the address is simple
enough to fit one of the addressing modes supported for that instruction.

Immediate constraints do not work in C++

        The following program will compile and execute correctly when compiled
        using the Sun Studio 12 C compiler, but C++ has a bug relating to the
        "i" constraint that prevents successful compilation:

                int main() {
                        int res=0, inp=3;

                        asm("\tmovl %1, %0\n": "=m" (res) : "i" (4));
                        if (inp == 3 && res == 4) return 0;
                        return 1;
                }

        This problem can be worked around by storing the immediate value
        in a variable and using that variable with a "r" constraint:

                int main() {
                        int res=0, inp=3;
                        const int imm = 4;

                        asm("\tmovl %1, %0\n": "=m" (res) : "r" (imm));
                        if (inp == 3 && res == 4) return 0;
                        return 1;
                }

Support for x87 floating point constraints when optimizing

When optimizing, support for x87 floating point constraints is incomplete.  We intend to solidify this area in a future patch to Sun Studio 12.


    This article has attempted to explain the syntax and semantics of Sun Studio 12's new
Asm Statement and provide examples of how to work around known differences from the
Gcc Asm Statement.  This article reflects the current state of Sun Studio 12 with respect
to this feature as of the SS12 patch 1 release.  Some of what is described here may not
work with the Sun Studio 12 FCS release.  We intend to improve our compatibility with
Gcc in future patches of Sun Studio 12.  As we do so, many of the limitations and known bugs
described above will be removed.  We hope that you have found this article useful.  Any
comments are welcomed.


Is "inline assembler" code supported on SPARC, too (e.g. to support hand-optimized VIS/VIS2 code) ?

Posted by Roland Mainz on June 03, 2007 at 02:59 PM PDT #

The gcc style asm inlining is fully supported in the sparc compiler at all optimization levels, e.g. -xO0 to -xO5, and at -fast. We do not yet have support for non-optimized code. However, users can use an inline template for any instruction in the sparc ISA, including VIS/VIS2. The main source of information for inline templates is the manpage in Sun Studio. Additional useful information with examples can be found at: http://developers.sun.com/sunstudio/articles/inlining.html

Posted by Anoop Kumar on July 02, 2007 at 08:36 AM PDT #

Any idea why having a bsfq instruction in inline amd64 assembler under sun studio 12 would prevent iropt from inlining the whole function?

Posted by Andy on August 28, 2007 at 05:38 PM PDT #
