
Exploring ARM64 runtime patching alternatives

Guest Author
In this blog post, Oracle kernel engineer Thomas Tai gives an overview of using the Linux Alternatives Framework to perform runtime kernel patching.




Many modern CPUs come with dedicated instructions to optimize specific operations. For example, ARMv8 has CRC32 instructions to accelerate CRC calculations. The problem is that those instructions can only be executed by a processor that supports them. Although the CPU has a feature register to identify its capabilities, checking the register every time before executing such an instruction adds overhead. Fortunately, the Linux kernel has a set of macros and functions known as the Linux Alternatives Framework to help solve this problem. This blog post gives an overview of the framework.


Building and running the Linux kernel involves compiling the source code into an image file, loading the image file into memory, and then starting execution. The compiled kernel (vmlinux) is in Executable and Linkable Format (ELF). An ELF file is made up of multiple sections: the ".text" section stores the executable code, the ".data" section stores initialized data, and other sections store different types of data. Usually, the code is executed without modification. In some cases, portions of the code need to be replaced (patched) to either take advantage of hardware features or work around bugs.

Linux Alternatives Framework

The Linux Alternatives Framework is a set of macros that kernel developers can use to prepare their code for boot-time patching. It is available for multiple CPU architectures, including x86, ARM64, s390, and PA-RISC. The alternative macro stores the default (original) code in the .text 0 subsection and the replacement code in the .text 1 subsection. The macro also creates an 'alt_instr' structure containing the offsets of both code sequences, their lengths, and the CPU feature bit. The structure is stored in the .altinstructions section.

struct alt_instr {
    s32 orig_offset; /* offset to original instruction          */
    s32 alt_offset;  /* offset to replacement instruction       */
    u16 cpufeature;  /* cpufeature bit set for replacement      */
    u8 orig_len;     /* size of original instruction(s)         */
    u8 alt_len;      /* size of new instruction(s), <= orig_len */
};

At boot time, the Linux kernel walks through the .altinstructions section and compares each 'alt_instr' structure with the running CPU's features. If the machine does not have the specific feature, the default code remains unchanged. Otherwise, the kernel replaces the default code with the replacement code using the information available in the 'alt_instr' structure.

Syntax of the Framework's Macro

The macro syntax is similar to an if-then-else statement and is prefixed with the word alternative_. For example, the alternative_if is similar to the if statement, the alternative_if_not is similar to the if not, the alternative_else is similar to an else statement, and so on. The if macro marks the beginning of a code section, and the else macro starts a new code section. Finally, an endif macro ends the clause.

Let's take the following 'crc32_le' function in arch/arm64/lib/crc32.S as an example. The example function assumes that the machine does not have the specific hardware capability and branches to a routine that uses the software CRC function (b crc32_le_base). When the code runs on a machine with the hardware capability, the alternative macro causes the branch to be replaced by a NOP, and execution continues with the hardware CRC instructions.

Original code segment                  Prefix removed
-----------------------------------    ------------------------------------------------------------------------------------------
SYM_FUNC_START(crc32_le)               SYM_FUNC_START(crc32_le)    /* start of the function                                    */
alternative_if_not ARM64_HAS_CRC32     if_not ARM64_HAS_CRC32      /* assuming the runtime machine has no hardware CRC feature */
    b crc32_le_base                        b crc32_le_base         /* default branch to software CRC routine                   */
alternative_else_nop_endif             else_nop_endif              /* patch with nop if machine has hardware CRC feature       */
    __crc32                                __crc32                 /* a macro which uses hardware CRC instructions             */
SYM_FUNC_END(crc32_le)                 SYM_FUNC_END(crc32_le)      /* end of function                                          */

After expanding the example macro, we can see how it creates the 'alt_instr' structure and stores the replacement code in a separate subsection. You can refer to the following code block for a line-by-line explanation. In summary, the macro uses multiple assembler directives to calculate the offsets of the original and replacement code. The replacement code is then stored in .text subsection 1. Once an 'alt_instr' is created, the kernel can use it to patch the code at boot time.

// SYM_FUNC_START(crc32_le)
// alternative_if_not ARM64_HAS_CRC32
//     b crc32_le_base
// ....................................................................................
crc32_le:                            // function name
  .set .Lasm_alt_mode, 0             // set asm_alt_mode to 0. The asm_alt_mode controls which section
                                     // to use in the else statement at label 662.
                                     // mode 0 = the code after the else statement stores in .text 1
                                     // mode 1 = the code after the else statement stores in .text 0
  .pushsection .altinstructions, "a" // append following data to .altinstructions
  .word 661f - .                     // offset to original instruction
  .word 663f - .                     // offset to replacement instruction
  .hword ARM64_HAS_CRC32             // cpufeature bit set for replacement
  .byte 662f-661f                    // size of the original instruction(s)
  .byte 664f-663f                    // size of new instructions(s)
  .popsection                        // restore to .text 0 section
661:                                 // start of the original instruction
  b    crc32_le_base                 // original instruction (software CRC)

// alternative_else_nop_endif
//     __crc32
// SYM_FUNC_END(crc32_le)
// ....................................................................................
662:                                 // end of the original instruction
    .if .Lasm_alt_mode==0            // if mode == 0, store the following code in .text 1
    .subsection 1                    // store the following instructions in .text 1
    .endif
663:                                 // start of the replacement code
    nops (662b-661b) / AARCH64_INSN_SIZE // emit as many nops as there are original
                                     // instruction(s), i.e., the length of the replacement
                                     // code must be the same as the original code
664:                                 // end of the replacement code
    .if .Lasm_alt_mode==0
    .previous                        // restore to .text 0
    .endif
    .org . - (664b-663b) + (662b-661b) // This is a build time check to make sure the length
                                       // of the replacement code is the same length as the original
                                       // code.
                                       // (664b - 663b) is length of the replacement code
                                       // (662b - 661b) is length of the original code
                                       // - (664b-663b) + (662b-661b) must be 0. Otherwise,
                                       // if - (664b-663b) + (662b-661b) < 0, this .org directive causes a build-time error.
                                       // if - (664b-663b) + (662b-661b) > 0, the following line will cause a build-time error.
    .org . - (662b-661b) + (664b-663b)
    __crc32                          // Use hardware CRC

Examining the code with QEMU and GDB

Now that we have some idea of how the macro works, we can use QEMU and GDB to see how the Linux kernel performs the patching. On an ARMv8 host machine, start QEMU with -cpu host and use -S to make QEMU wait for the GDB connection.

qemu-system-aarch64 -machine virt,gic-version=3 -cpu host -m 8192 -nographic -gdb tcp::1234 -kernel Image -S

On a separate host terminal, start GDB and connect to the guest. Disassemble crc32_le before and after the patching.

# start gdb and load symbols from vmlinux
gdb vmlinux

# basic gdb setup and connect to the remote target
(gdb) set multiple-symbols ask
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000000040000000 in ?? ()

# set a breakpoint before the patching.
(gdb) hbreak arch/arm64/kernel/alternative.c:apply_alternatives_all
Hardware assisted breakpoint 1 at 0xffff8000113b4ea4: file arch/arm64/kernel/alternative.c, line 229.

# set a breakpoint after the patching. The breakpoint location in this example is
# arch/arm64/kernel/alternative.c, line 229 + 1. That is,
# arch/arm64/kernel/alternative.c, line 230
(gdb) hbreak arch/arm64/kernel/alternative.c:230
Hardware assisted breakpoint 2 at 0xffff8000113b4ec0: file arch/arm64/kernel/alternative.c, line 230.

# set breakpoints at crc32_le
(gdb) hbreak crc32_le
[0] cancel
[1] all
[2] arch/arm64/lib/crc32.S:crc32_le
[3] lib/crc32.c:crc32_le
> 2
# make note of the crc32_le address 0xffff800010590d60; we need it to disassemble the function
Hardware assisted breakpoint 3 at 0xffff800010590d60: file arch/arm64/lib/crc32.S, line 90.

# continue until we hit the first breakpoint
(gdb) continue

Breakpoint 1, apply_alternatives_all () at arch/arm64/kernel/alternative.c:229
229             stop_machine(__apply_alternatives_multi_stop, NULL, cpu_online_mask);

# disassemble the crc32_le before the patching.
# we should see a "b" to the software CRC routine (crc32_le_base in the source; gdb labels
# the target crc32_le here), which is the normal non-optimized version of calculating CRC.
(gdb) disassemble 0xffff800010590d60,+12
Dump of assembler code from 0xffff800010590d60 to 0xffff800010590d6c:
   0xffff800010590d60 <crc32_le+0>:     b       0xffff8000105ac928 <crc32_le> <== BEFORE PATCHING
   0xffff800010590d64 <crc32_le+4>:     cmp     x2, #0x10
   0xffff800010590d68 <crc32_le+8>:     b.lt    0xffff800010590e08 <crc32_le+168>
End of assembler dump.

# delete the old breakpoint
(gdb) delete 1

# continue until the 2nd breakpoint
(gdb) continue

Breakpoint 2, apply_alternatives_all () at arch/arm64/kernel/alternative.c:230
230     }

# disassemble the crc32_le after the patching. we should see the original
# code is patched with a nop, which causes the code to use the dedicated
# CRC instruction later in the code path.
(gdb) disassemble 0xffff800010590d60,+52
Dump of assembler code from 0xffff800010590d60 to 0xffff800010590d94:
   0xffff800010590d60 <crc32_le+0>:     nop  <== AFTER PATCHING
   0xffff800010590d64 <crc32_le+4>:     cmp     x2, #0x10
   0xffff800010590d68 <crc32_le+8>:     b.lt    0xffff800010590e08 <crc32_le+168>
   0xffff800010590d6c <crc32_le+12>:    and     x7, x2, #0x1f
   0xffff800010590d70 <crc32_le+16>:    and     x2, x2, #0xffffffffffffffe0
   0xffff800010590d74 <crc32_le+20>:    cbz     x7, 0xffff800010590de4 <crc32_le+132>
   0xffff800010590d78 <crc32_le+24>:    and     x8, x7, #0xf
   0xffff800010590d7c <crc32_le+28>:    ldp     x3, x4, [x1]
   0xffff800010590d80 <crc32_le+32>:    add     x8, x8, x1
   0xffff800010590d84 <crc32_le+36>:    add     x1, x1, x7
   0xffff800010590d88 <crc32_le+40>:    ldp     x5, x6, [x8]
   0xffff800010590d8c <crc32_le+44>:    tst     x7, #0x8
   0xffff800010590d90 <crc32_le+48>:    crc32x  w8, w0, x3 <== dedicated ARMv8 CRC instruction
End of assembler dump.


In this blog, we looked at the Linux Alternatives Framework. We discussed how this framework could enable CPU-specific instructions without incurring the runtime penalty of checking the feature register for every use and we gave a real-world example of the framework in operation.
