If you have heard about Spectre v2 and have ever wondered if and how it relates to BTI, BTB, BHI, BHB, Retbleed, RSBA, RRSBA, RSBU, BTC, PhantomJMP, Inception, SRSO, IBPB, IBRS, retpoline, STIBP, RSB, PBRSB, SLS, jmp2ret, safeRET, or even what all of this means, then this document will (hopefully) help you.

In 2018, Spectre and Meltdown were disclosed and introduced a new class of vulnerabilities based on speculative execution. Among all these vulnerabilities, Spectre variant 2 (Spectre v2) was, and still is, particularly dreadful. As correctly predicted by the discoverers of the Spectre vulnerability, it is not easy to fix, and it will haunt us for quite some time. Indeed, several years later, multiple Spectre v2 variants have been discovered, and mitigations keep evolving.

Spectre v2 variants are described in several research papers. Mitigations are spread across different documents, mostly from Intel and AMD, and they are sometimes adjusted and fine-tuned, for example before being added to the Linux kernel. As a result, it can be challenging to understand which mitigations are currently available, how they relate to Spectre v2, and how they interact with each other.

To better understand Spectre v2, this document provides an overview of the vulnerability and its different variants. It also attempts to reference and explain all possible mitigations, and it is expected to be kept as up-to-date and as accurate as possible.

All information in this document is related to x86 processors only (Intel and AMD). Although some information might pertain to other processors impacted by Spectre v2 (such as ARM processors), its accuracy hasn’t been checked for any processors other than x86. The document focuses on mitigations for Spectre v2 attacks against the kernel, either from userspace (userspace -> kernel) or from a guest VM (guest VM -> host kernel).

Introduction

Speculative Execution

Speculative execution is a mechanism used by processors to improve performance. It relies on predicting the result of prior instructions to anticipate the execution of subsequent instructions. With correct predictions, the processor can execute instructions earlier than their program order. A wrong prediction results in a mis-speculation: the execution state is then restored and the processor restarts the execution.

For the end user, speculative execution doesn’t have any directly visible effect because instructions are always correctly executed and this doesn’t change the correctness of the executed code. However, speculative execution can have side effects and change the micro-architectural state of the processor, for example by loading data in cache.

Indirect Branches

In this document, unless specified otherwise, references to indirect branches are only to near call indirect, near jump indirect and near return instructions.

Indirect branches are branch instructions that provide the ability to jump to a target address that is provided in a register or loaded from memory, or a return instruction from a previous subroutine call. On x86, this refers to the JMP and CALL instructions where the target is specified in a register or in memory, or to the RET instruction.

When the processor speculatively executes an indirect branch instruction, it predicts the destination address of the branch (the branch target) and continues the speculative execution at this address. The speculative execution of indirect branches is exploited by Spectre v2 to trigger a side-channel attack.

Indirect Branch Predictors

The processor predicts the target address of an indirect branch using an indirect branch predictor. There are two different indirect branch predictors, one for indirect jump and call instructions (JMP/CALL), and one for return instructions (RET). These predictors have some behaviors which are similar on Intel and AMD, but they also rely on mechanisms which are specific to each vendor. This explains why Intel and AMD are not necessarily impacted by the same Spectre v2 variants and why they sometimes require different mitigations.

JMP/CALL Predictor

Branch targets of indirect jump and call instructions are predicted using the JMP/CALL predictor. With this predictor, targets are predicted based on the branch instruction’s address using a Branch Target Buffer (BTB).

Intel – BHB

In addition to the BTB, Intel processors also use a Branch History Buffer (BHB) to predict different targets for the same indirect branch based on the history of previously executed branch instructions. This is used to improve the accuracy of branch predictions.

JMP/CALL predictions can be shared between processor threads running on the same processor core. In that case, a processor thread can influence JMP/CALL predictions on sibling processor threads.
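
To make the difference between a BTB-only prediction and a history-based prediction more concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: the class name, the dictionary-based BTB, the way the history is built from recent targets, and the addresses in the trace are all hypothetical and do not describe how any real processor indexes its predictors.

```python
class ToyIndirectPredictor:
    """Toy JMP/CALL predictor: a BTB keyed by the branch address, optionally
    refined by a short history of recent branch targets (a stand-in for a BHB)."""

    def __init__(self, use_history=False, history_len=4):
        self.use_history = use_history
        self.history_len = history_len
        self.history = []   # most recent branch targets (toy history)
        self.btb = {}       # lookup key -> last observed target

    def _key(self, branch_addr):
        if self.use_history:
            return (branch_addr, tuple(self.history))
        return (branch_addr,)

    def predict(self, branch_addr):
        """Return the predicted target, or None when there is no entry."""
        return self.btb.get(self._key(branch_addr))

    def resolve(self, branch_addr, actual_target):
        """Record the real target once the branch has executed."""
        self.btb[self._key(branch_addr)] = actual_target
        self.history = (self.history + [actual_target])[-self.history_len:]


def count_hits(predictor, trace):
    """Replay a list of (branch address, target) pairs and count correct predictions."""
    hits = 0
    for branch_addr, target in trace:
        if predictor.predict(branch_addr) == target:
            hits += 1
        predictor.resolve(branch_addr, target)
    return hits


# Hypothetical trace: the branch at 0x1000 goes to 0xA000 or 0xB000
# depending on where the earlier branch at 0x500 went.
TRACE = [(0x500, 0x600), (0x1000, 0xA000),
         (0x500, 0x700), (0x1000, 0xB000)] * 10

btb_only_hits = count_hits(ToyIndirectPredictor(use_history=False), TRACE)
with_history_hits = count_hits(ToyIndirectPredictor(use_history=True), TRACE)
```

In this toy trace, the address-only predictor always replays the previous target and never predicts correctly, while the history-based predictor becomes accurate once it has seen each history pattern. The same property is what makes the history an additional input that an attacker may try to control.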

RET Predictor

Branch targets of return instructions are predicted using the RET predictor. With this predictor, targets are predicted using a Return Stack Buffer (RSB). The RSB is a last-in-first-out buffer that keeps track of the return address every time a CALL instruction is executed. It is used to predict where the corresponding RET instruction will return. RET predictions are not shared between sibling processor threads.

AMD — RAP and RAS

In AMD documents, the RET predictor is called the Return Address Predictor (RAP), and the RSB is called the Return Address Store (RAS).

Notes:

  • The RSB can only store a limited number of entries (the limit depends on the type of processor), and it can become empty and underflow under certain conditions.

  • In some cases, RET instructions might be predicted using the JMP/CALL predictor instead of using the RET predictor. This behavior depends on the processor.
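
The RSB behavior described above can be sketched with a small Python model. This is only an illustration under stated assumptions: the class name, the default capacity of 16 entries, and the choice to drop the oldest entry on overflow are hypothetical, and real processors differ in size and in what they do when the RSB is empty.

```python
class ToyRSB:
    """Toy Return Stack Buffer: a bounded last-in-first-out buffer."""

    def __init__(self, capacity=16):
        self.capacity = capacity   # hypothetical size, processor dependent
        self.entries = []

    def on_call(self, return_addr):
        """A CALL records its return address; the oldest entry is lost when full."""
        if len(self.entries) == self.capacity:
            self.entries.pop(0)
        self.entries.append(return_addr)

    def on_ret(self):
        """A RET is predicted from the most recent entry.

        Returns None on underflow, i.e. when the RSB is empty and cannot
        provide a prediction.
        """
        if not self.entries:
            return None
        return self.entries.pop()


# Three nested calls on a 2-entry RSB: the outermost return underflows.
rsb = ToyRSB(capacity=2)
for return_addr in (0x10, 0x20, 0x30):
    rsb.on_call(return_addr)
predictions = [rsb.on_ret(), rsb.on_ret(), rsb.on_ret()]
```

The last prediction is `None` because the entry for the outermost call was evicted. This underflow case is where some processors fall back to another predictor.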

Intel — RSBA and RRSBA Behaviors

Some Intel processors can have an RSB Alternate (RSBA) behavior and use the JMP/CALL predictor to predict RET instructions when the RSB is empty. This can happen with the following processors:

  • Some older Intel processors based on the Skylake microarchitecture and its close derivatives;
  • More recent processors with the RSBA capability;
  • Processors with the RRSBA capability: these processors have a Restricted RSB Alternate (RRSBA) behavior.

For more information, see RSBU Mitigation.

AMD — BTC and PhantomJMP

With Branch Type Confusion (BTC), some AMD processors can speculatively execute any instruction (including RET instructions) as an indirect JMP instruction. This misprediction will cause a phantomJMP that will speculatively branch to a target address selected by the JMP/CALL predictor. For more information, see BTC Mitigation.

Spectre v2 Vulnerabilities

Overview

Spectre v2 is a processor vulnerability that allows an attacker to control the target address selected by the indirect branch predictor when an indirect branch is speculatively executed. In other words, the attacker controls indirect branch predictions and thereby chooses the address of the code that will be speculatively executed.

With Spectre v2, indirect branch predictions can be controlled between different privilege modes on the same processor thread. In addition, because processor threads using the same processor core may have a shared indirect branch predictor, indirect branch predictions can also be controlled between sibling processor threads.

As a result, Spectre v2 allows a non-privileged user to influence speculative execution inside the kernel and, combined with a side-channel attack, it can be used to leak sensitive data stored in the kernel. Similarly, Spectre v2 can also be triggered from a non-privileged user process to access data stored in another process, or from a guest VM to access data stored in the hypervisor or on the host system.

Variants

Spectre v2 was disclosed in 2018 and since then several different variants have been discovered. Each Spectre v2 variant builds on a better understanding of the behavior of the indirect branch predictor (which is an obscure mechanism) and fools it in a different way in order to influence the outcome of the prediction.

  • 2018 – SpectreBTB aka Branch Target Injection (BTI): the initial Spectre v2 vulnerability which is targeting the BTB.
  • 2018 – SpectreRSB and ret2spec: a variant targeting the RSB.
  • 2022 – SpectreBHB aka Branch History Injection (BHI): a variant targeting the BHB, exploited with eBPF (Intel only).
  • 2022 – Retbleed: a variant that exploits return instructions by forcing RET instructions to be predicted with the JMP/CALL predictor.
    The Retbleed vulnerability is different on Intel and AMD:
    • On Intel, it is based on Return Stack Buffer Underflow (RSBU).
    • On AMD, it is based on Branch Type Confusion (BTC) aka PhantomJMPs.
  • 2023 – Inception aka Speculative Return Stack Overflow (SRSO) (AMD only): a variant based on Retbleed that trains indirect branch predictors using speculative execution.
  • 2024 – Native BHI: a native exploit of BHI, without using eBPF.

In this document, BHI refers to both SpectreBHB and Native BHI, without distinction.

The References section has more details about the Spectre v2 variants.

The following table shows the instructions and prediction mechanisms targeted by the different vulnerabilities:

 


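| Vulnerability   | Targeted Instruction | Targeted Predictor |
|-----------------|----------------------|--------------------|
| BTI             | JMP or CALL          | BTB                |
| SpectreRSB      | RET                  | RSB                |
| BHI             | JMP or CALL          | BHB                |
| Retbleed (RSBU) | RET                  | RSB+BTB            |
| Retbleed (BTC)  | any                  | BTB                |
| SRSO            | any+RET              | BTB+RSB            |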

 

Spectre v2 Mitigations

Multiple mechanisms are available to mitigate Spectre v2 and its different variants. Unfortunately, there is usually no single solution to mitigate all Spectre v2 variants, and solutions often differ depending on the processor's vulnerabilities and capabilities.

The mitigation mechanisms can be processor hardware features (which might require a microcode update) or software sequences. The main mitigations are IBPB, IBRS and retpoline. They usually need to be complemented by additional mechanisms such as STIBP, SMEP, RSB Stuffing/Clearing or some vendor-specific or processor-specific mitigations.

Overview

Here is a simplified summary of the most common mitigations and the vulnerabilities they apply to. Beware that this is only a very high-level summary: there are plenty of corner cases and caveats when using any mitigation, and some vulnerabilities might require multiple mitigations.

| Vulnerability                        | Mitigation on Intel                                                     | Mitigation on AMD                     |
|--------------------------------------|-------------------------------------------------------------------------|---------------------------------------|
| BTI                                  | IBRS or retpoline                                                       | IBRS or retpoline                     |
| BTI between processor threads        | IBRS or STIBP                                                           | IBRS or STIBP                         |
| BHI                                  | BHI Mitigation: BHI_DIS_S or BHB Clearing or retpoline + RRSBA_DIS_S    | Not Vulnerable                        |
| SpectreRSB (user -> kernel)          | SMEP or RSB Clearing                                                    | SMEP or RSB Clearing                  |
| SpectreRSB (guest VM -> host kernel) | IBRS or RSB Clearing                                                    | IBRS or RSB Clearing                  |
| Retbleed (RSBU)                      | IBRS or RSB Stuffing                                                    | Not Vulnerable                        |
| Retbleed (BTC)                       | Not Vulnerable                                                          | BTC Mitigation, in particular jmp2ret |
| SRSO                                 | Not Vulnerable                                                          |                                       |

 

Notes:

  • All mitigations have an impact on performance.
  • IBPB can be used to mitigate all the listed Spectre v2 variants, except BTI between processor threads. But IBPB can have a large impact on performance.

Mitigation Mechanisms

Different mechanisms are available to mitigate Spectre v2 and/or some of its variants.

  • Indirect Branch Prediction Barrier (IBPB)

    IBPB is a processor feature to establish a barrier for indirect branch predictions. The barrier prevents indirect branches executed before the barrier from influencing predictions of indirect branches executed after the barrier. This applies to predictions of indirect JMP, indirect CALL and RET instructions.

    IBPB is probably the most effective mitigation for Spectre v2 as it works for the JMP/CALL predictor and the RET predictor (although there are some limitations, see below), and it also works inside the same privilege mode. But it comes with a cost because IBPB can have a significant overhead. So it is usually only used in specific cases where no other mitigation is possible.

    Limitations

    Although IBPB is a very effective mitigation for Spectre v2, it has a few limitations:

    • IBPB has no impact on the sharing of branch predictions between processor threads. See STIBP to prevent the sharing of branch predictions.

    • On processors with the post-barrier RSB issue, IBPB is not a fully effective barrier for RSB-based predictions and an additional mitigation is required, see Post-Barrier RSB Mitigation.

    • On AMD processors, IBPB might not be an effective mitigation for BTC.

      AMD – BTC and SBPB

      On some processors, IBPB doesn’t flush older branch type predictions. In that case, IBPB is not an effective mitigation for Branch Type Confusion (BTC).

      Also, on processors with the SBPB capability (or on Zen3/Zen4 processors with the appropriate microcode patch), the SBPB command is available and will behave like IBPB but without necessarily flushing branch type predictions. SBPB can be used instead of IBPB when there is no need to flush branch type predictions, for example when there is no need to mitigate BTC.

    Implementation

    • IBPB is supported if the processor enumerates:

      • On AMD: CPUID.(EAX=8000_0008h):EBX[12] as 1.
      • On Intel: CPUID.(EAX=7, ECX=0):EDX[26] as 1.

      Support for IBPB implies that MSR 49h (PRED_CMD) exists.

    • An IBPB command is executed when setting bit 0 (IBPB) to 1 in the MSR 49h (PRED_CMD).

    AMD – IBPB and RSB

    IBPB_RET: If the processor enumerates CPUID.(EAX=8000_0008h):EBX[30] as 1 then IBPB clears the RSB.

    Intel – IBPB and IBRS

    Support for IBPB and IBRS is enumerated the same way. So if IBPB is supported then IBRS is also supported.
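
The IBPB enumeration rules above can be summarized with a short Python sketch that decodes raw CPUID register values. The function and parameter names are hypothetical, and the register values are expected to be obtained by other means (for example from a CPUID dump); the sketch itself does not execute CPUID or write any MSR.

```python
PRED_CMD_MSR = 0x49       # MSR 49h (PRED_CMD)
PRED_CMD_IBPB = 1 << 0    # writing this bit issues an IBPB command


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def ibpb_support(vendor, cpuid_7_0_edx=0, cpuid_8000_0008_ebx=0):
    """Decode IBPB support from raw CPUID register values.

    vendor is "intel" or "amd" (hypothetical labels).
    cpuid_7_0_edx is EDX from CPUID.(EAX=7, ECX=0).
    cpuid_8000_0008_ebx is EBX from CPUID.(EAX=8000_0008h).
    """
    if vendor == "intel":
        # IBPB_RET is an AMD enumeration, so it is reported as None here.
        return {"ibpb": bit(cpuid_7_0_edx, 26), "ibpb_ret": None}
    if vendor == "amd":
        return {
            "ibpb": bit(cpuid_8000_0008_ebx, 12),
            "ibpb_ret": bit(cpuid_8000_0008_ebx, 30),
        }
    raise ValueError("unknown vendor: %r" % (vendor,))
```

For example, `ibpb_support("amd", cpuid_8000_0008_ebx=(1 << 12) | (1 << 30))` reports both IBPB and IBPB_RET as supported.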

  • Indirect Branch Restricted Speculation (IBRS)
    IBRS restricts the speculation of indirect branches. It is a processor feature that prevents software running in userspace (or in a guest VM) from influencing the prediction of indirect branch targets in the kernel. When enabled, it also prevents indirect branch predictions from being shared between sibling processor threads (like STIBP does).

    Intel — RSBA

    The IBRS prediction restriction also applies when the processor has an RSBA behavior and the JMP/CALL predictor is used to predict RET instructions. This makes IBRS an effective mitigation against the Retbleed RSB underflow (RSBU) vulnerability. For more information, see RSBU Mitigation.

    AMD — BTC

    IBRS prevents speculation at the predicted target of any instruction that is decoded as an indirect branch, regardless of the predicted branch type. So IBRS is an effective mitigation for BTC-IND. For more information, see BTC and SRSO Mitigations.

    IBRS was initially implemented as basic IBRS, which requires IBRS to be enabled each time the kernel is entered. Nowadays, IBRS is available with an always-on mode where IBRS remains on after it has been enabled, and there is no need to enable it each time the kernel is entered.

    Basic IBRS

    Basic IBRS should be enabled each time the kernel is entered in order to prevent indirect branch predictions done in userspace (or in a guest VM) from controlling indirect branch predictions done in the kernel.

    Always-On IBRS

    Always-on IBRS simplifies the usage of IBRS compared to basic IBRS. With always-on IBRS, IBRS only needs to be enabled once, instead of every time the kernel is entered. Like basic IBRS, it prevents predictions from userspace from controlling predictions in the kernel, and it also performs better than basic IBRS.

    When always-on IBRS is enabled, it also prevents the predicted target of a RET instruction from using an RSB entry created in a guest VM, and mitigates the SpectreRSB vulnerability between a guest VM and the host kernel. On a VMExit, the host kernel should ensure that always-on IBRS hasn’t been disabled by the guest VM, and re-enable it if it was.

    Always-on IBRS is implemented differently on Intel and AMD.

    Intel — Enhanced IBRS (eIBRS)

    On Intel processors, the always-on mode of IBRS is available as Enhanced IBRS (eIBRS), also known as IBRS ALL.

    AMD — Automatic IBRS (AutoIBRS)

    On AMD processors, the always-on mode of IBRS is available as Automatic IBRS (AutoIBRS).

    Note that when enabled, AutoIBRS prevents indirect branch predictions from being shared between sibling processor threads, but only when the processor is in privileged mode. So it doesn’t prevent indirect branch predictions from being shared between sibling processor threads when running userspace code, and STIBP should be used in that case.

    AMD documents an IBRS Always On capability which was introduced with the documentation of the IBPB, IBRS and STIBP features. The Linux kernel doesn’t use this capability but it does use Automatic IBRS. So it looks like this capability was superseded by Automatic IBRS.

    Limitations

    The restricted speculation enabled with IBRS only applies to the JMP/CALL predictor, and usually only to the BTB. So IBRS mitigates BTI, but it doesn’t necessarily mitigate BHI, see BHI Mitigation.

    Implementation

    Basic IBRS

    • Basic IBRS is supported if the processor enumerates:

      • On AMD: CPUID.(EAX=8000_0008h):EBX[14] as 1.
      • On Intel: CPUID.(EAX=7, ECX=0):EDX[26] as 1.

      Support for basic IBRS implies that MSR 48h (SPEC_CTRL) exists.

    • Basic IBRS is enabled/disabled by setting/clearing bit 0 (IBRS) in MSR 48h (SPEC_CTRL).

    Intel – IBRS and IBPB

    Support for IBPB and IBRS is enumerated the same way. So if IBRS is supported then IBPB is also supported.

    Always-On IBRS

    AMD – AutoIBRS

    • AutoIBRS is supported if the processor enumerates CPUID.(EAX=8000_0021h):EAX[8] as 1 (AutomaticIBRS).
    • AutoIBRS is enabled/disabled by setting/clearing bit 21 (AIBRSE) in the EFER register.

    Basic IBRS and AutoIBRS are enumerated and controlled differently. So an AMD processor can support both basic IBRS and AutoIBRS.

    Intel – eIBRS

    • eIBRS is supported if the processor enumerates:
      • CPUID.(EAX=7, ECX=0):EDX[26] as 1 (same as basic IBRS)
      • and MSR 10Ah (ARCH_CAPABILITIES)[1] as 1 (IBRS_ALL).
    • eIBRS is enabled/disabled by setting/clearing bit 0 (IBRS) in MSR 48h (SPEC_CTRL) (same as basic IBRS).

    Basic IBRS and eIBRS are controlled using the same mechanism. So an Intel processor can support either basic IBRS or eIBRS but not both.
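
The relationship between the IBRS modes and their control bits can be sketched as follows. This is a simplified illustration: the function name, the vendor labels and the boolean parameters (standing for capabilities that were already decoded from CPUID and ARCH_CAPABILITIES) are hypothetical.

```python
SPEC_CTRL_MSR = 0x48       # MSR 48h (SPEC_CTRL)
SPEC_CTRL_IBRS = 1 << 0    # bit 0 (IBRS)
EFER_AIBRSE = 1 << 21      # bit 21 (AIBRSE) in the EFER register


def ibrs_modes(vendor, basic_ibrs, ibrs_all=False, autoibrs=False):
    """Return the IBRS modes a processor offers, with their control bit.

    basic_ibrs: the basic IBRS CPUID bit is enumerated.
    ibrs_all:   Intel only, IBRS_ALL is enumerated in ARCH_CAPABILITIES.
    autoibrs:   AMD only, AutomaticIBRS is enumerated.
    The result maps a mode name to (control register, bit mask).
    """
    modes = {}
    if vendor == "intel":
        # Same control bit for both modes, so it is one or the other.
        if basic_ibrs:
            name = "eIBRS" if ibrs_all else "basic IBRS"
            modes[name] = ("MSR 48h (SPEC_CTRL)", SPEC_CTRL_IBRS)
    elif vendor == "amd":
        # Different control bits, so both modes can coexist.
        if basic_ibrs:
            modes["basic IBRS"] = ("MSR 48h (SPEC_CTRL)", SPEC_CTRL_IBRS)
        if autoibrs:
            modes["AutoIBRS"] = ("EFER", EFER_AIBRSE)
    else:
        raise ValueError("unknown vendor: %r" % (vendor,))
    return modes
```

With these assumptions, an Intel processor enumerating IBRS_ALL only reports eIBRS, while an AMD processor can report both basic IBRS and AutoIBRS.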

    Extra information

    Processors can provide additional information about IBRS.

    AMD

    • IBRS Always On – CPUID.(EAX=8000_0008h):EBX[16]
      When set, indicates that the processor prefers that IBRS is only set once during boot and not changed.

    • IBRS Preferred – CPUID.(EAX=8000_0008h):EBX[18]
      When set, indicates that the processor prefers using the IBRS feature instead of other software mitigations such as retpoline.

    • IBRS Same Mode – CPUID.(EAX=8000_0008h):EBX[19]
      When set, IBRS provides same mode speculation limits. For these processors, when IBRS is set, indirect branch predictions are not influenced by any prior indirect branches, regardless of the privilege mode and regardless of whether the prior indirect branches occurred before or after the setting of IBRS.

    These capabilities are described in the AMD documentation, and the Linux kernel doesn’t use any of them but it does use Automatic IBRS. So it looks like these capabilities have been superseded by Automatic IBRS.

  • Retpoline
    Retpoline is a software construct where indirect call (CALL) and jump (JMP) instructions are replaced with a RET instruction sequence. The retpoline sequence pushes the target address onto the stack, and executes a RET instruction to jump to that address. It also adds an entry to the RSB so that a speculative execution of the RET instruction gets trapped executing safe code. This way, retpoline mitigates the BTI and BHI vulnerabilities by not having any indirect JMP or CALL instructions and so the JMP/CALL predictor is not used.

    Limitations

    Retpoline is an effective mitigation when the speculative execution of RET instructions solely depends on the RET predictor and on the RSB. If the processor might predict some RET instructions using the JMP/CALL predictor (for example because of RSBA on Intel, or phantomJMPs on AMD) then additional mitigations are required.

    Intel — RSBA

    On Intel processors that have an RSBA behavior, targets of RET instructions can sometimes be predicted using the JMP/CALL predictor. This behavior can be exploited by the Retbleed RSBU and the BHI vulnerabilities. See RSBU Mitigation and BHI Mitigation.

    AMD — PhantomJMP

    Some AMD processors can be forced to speculatively execute any instruction (including RET instructions) as an indirect JMP instruction. This misprediction will cause a phantomJMP that will speculatively branch to a target address selected by the JMP/CALL predictor. This behavior should be prevented when using the retpoline mitigation. See the mitigation for BTC to prevent PhantomJMPs.

    Intel — Goldmont Plus and Tremont

    Intel recommends using eIBRS instead of retpoline when eIBRS is supported by the processor. In particular, retpoline may not be a fully effective mitigation for BTI on processors based on Intel Goldmont Plus and Tremont. Retpoline remains effective on other processors with eIBRS.

    Implementation Example

    Different implementations of the retpoline sequence are possible. As an example, here is the retpoline sequence used in the Linux kernel for the indirect jump with the RAX register (jmp *%rax):

              call do_rop
              int3
    do_rop:   mov %rax,(%rsp)
              ret
    • The first line (call do_rop) pushes the address of the next line (line with the int3 instruction) on the stack and on the RSB, and it jumps to label do_rop.

    • The execution continues at label do_rop (mov %rax, (%rsp)) and it writes the target of the indirect jump (which is in register RAX) on the stack. This overwrites the value pushed on the stack at the previous step, but doesn’t change the RSB. So values on the stack and on the RSB are now different.

    • Then the RET instruction is executed and it jumps to the address present on the stack (which is the target of the indirect jump). If the RET instruction is speculatively executed then the target is retrieved from the RSB so it jumps to the line with INT3 instruction, and the speculation stops there because the INT3 instruction is a speculation barrier.
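
The three steps above can be replayed with a small Python simulation that tracks the stack and the RSB separately. This is only a sketch: the function name, the use of Python lists for the stack and the RSB, and the `"int3"` marker standing for the address of the INT3 instruction are hypothetical.

```python
def retpoline_jump(target, rsb):
    """Simulate the retpoline sequence for `jmp *%rax` with RAX = target.

    rsb is a list used as a toy RSB (last element = most recent entry).
    Returns (architectural_target, speculative_target).
    """
    trap = "int3"   # stands for the address of the int3 after the call
    stack = []

    # call do_rop: the return address goes on the stack and on the RSB.
    stack.append(trap)
    rsb.append(trap)

    # mov %rax,(%rsp): only the stack is overwritten, the RSB is unchanged.
    stack[-1] = target

    # ret: the real target comes from the stack, the prediction from the RSB.
    speculative_target = rsb.pop()
    architectural_target = stack.pop()
    return architectural_target, speculative_target


# Even with an attacker-supplied entry already in the RSB, the speculative
# path of the retpoline's RET lands on the int3 trap.
rsb = [0xBAD]
result = retpoline_jump(0x401000, rsb)
```

Here `result` is `(0x401000, "int3")`: the architectural execution reaches the intended target, while the prediction is confined to the trap. The model assumes that the RET is predicted from the RSB only, which is exactly the condition discussed in the limitations above.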

  • Supervisor-Mode Execution Prevention (SMEP)
    When SMEP is enabled, it prevents the kernel from executing code on user mode pages, even speculatively. Userspace code can only insert its own return addresses into the RSB, not return addresses of targets on kernel pages. So SMEP prevents addresses added to the RSB from userspace from being used in the kernel, and mitigates SpectreRSB when entering the kernel from userspace.

    AMD — Limitation

    On AMD processors, SMEP provides an effective mitigation against SpectreRSB if the kernel and user virtual address spaces are disjoint with at least one unmapped 4K page separating them; otherwise RSB Stuffing should be used.

    Implementation

    • SMEP is supported if the processor enumerates CPUID.(EAX=7, ECX=0):EBX[7] as 1.

    • SMEP is enabled/disabled by setting/clearing bit 20 (SMEP) in register CR4.
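
As a minimal sketch of the two rules above, the following Python helpers decode SMEP support from a raw CPUID value and manipulate the SMEP bit in a CR4 value. The helper names are hypothetical and the code only works on integers passed by the caller; it does not read or write any processor register.

```python
CR4_SMEP = 1 << 20   # bit 20 (SMEP) in register CR4


def smep_supported(cpuid_7_0_ebx):
    """True if EBX[7] from CPUID.(EAX=7, ECX=0) is set."""
    return bool((cpuid_7_0_ebx >> 7) & 1)


def smep_enabled(cr4):
    """True if the SMEP bit is set in the given CR4 value."""
    return bool(cr4 & CR4_SMEP)


def with_smep(cr4, enable):
    """Return the CR4 value with the SMEP bit set or cleared."""
    if enable:
        return cr4 | CR4_SMEP
    return cr4 & ~CR4_SMEP
```

For example, `with_smep(0, True)` returns `0x100000`, and other CR4 bits are preserved in both directions.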

  • Single Thread Indirect Branch Predictors (STIBP)
    JMP/CALL predictions can be shared between processor threads running on the same processor core. As a result, code running on a processor thread may be able to control predictions of indirect branches executed on sibling processor threads. Note that predictions from the RET predictor are never shared between processor threads.

    Enabling STIBP prevents indirect branch predictions from being controlled by another processor thread. Enabling IBRS also prevents indirect branch predictions from being controlled by another processor thread. So there is no need to enable STIBP when IBRS is enabled.

    Intel — STIBP and eIBRS

    Recent Intel processors, including all processors with eIBRS, provide this isolation for indirect branch predictions between processor threads without the need to set STIBP.

    This statement is from the Intel documentation but it is ambiguous: does it mean that STIBP is not needed when a processor supports eIBRS but eIBRS is not enabled?

    AMD — STIBP and AutoIBRS

    When AutoIBRS is enabled, indirect branch predictions are prevented from being controlled by another processor thread only when executing kernel code. To protect userspace, STIBP has to be enabled when running user code.

    Implementation

    • STIBP is supported if the processor enumerates:

      • On AMD: CPUID.(EAX=8000_0008h):EBX[15] as 1.
      • On Intel: CPUID.(EAX=7, ECX=0):EDX[27] as 1.

      Support for STIBP implies that MSR 48h (SPEC_CTRL) exists.

    • STIBP is enabled/disabled by setting/clearing bit 1 (STIBP) in the MSR 48h (SPEC_CTRL).

    AMD – STIBP Always On

    STIBP Always On – CPUID.(EAX=8000_0008h):EBX[17]
    When set, indicates that the processor prefers that STIBP is only set once during boot and not changed.
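
The STIBP enumeration and control bits can be sketched in Python as follows. The function names and vendor labels are hypothetical, and the helpers only compute values: reading CPUID and writing MSR 48h must be done by privileged code that is not shown here.

```python
SPEC_CTRL_MSR = 0x48        # MSR 48h (SPEC_CTRL)
SPEC_CTRL_IBRS = 1 << 0     # bit 0 (IBRS)
SPEC_CTRL_STIBP = 1 << 1    # bit 1 (STIBP)


def stibp_supported(vendor, cpuid_7_0_edx=0, cpuid_8000_0008_ebx=0):
    """Decode STIBP support from raw CPUID register values."""
    if vendor == "intel":
        return bool((cpuid_7_0_edx >> 27) & 1)
    if vendor == "amd":
        return bool((cpuid_8000_0008_ebx >> 15) & 1)
    raise ValueError("unknown vendor: %r" % (vendor,))


def update_stibp(spec_ctrl, enable):
    """Return the SPEC_CTRL value with STIBP set or cleared.

    Other bits (such as IBRS) are preserved.
    """
    if enable:
        return spec_ctrl | SPEC_CTRL_STIBP
    return spec_ctrl & ~SPEC_CTRL_STIBP
```

For example, `update_stibp(SPEC_CTRL_IBRS, True)` returns `0x3`, with both IBRS and STIBP set.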

  • RSB Stuffing, Flushing and Clearing
    The RSB can be manipulated to prevent underflow, or to remove user-controlled entries.

    Mechanisms are available to stuff, flush or clear the RSB.

    • RSB stuffing is a software sequence to fill and overwrite the RSB. It is also known as RSB overwrite or RSB filling. After RSB stuffing, the RSB is not empty.

      RSB stuffing can be used to prevent RSB underflow and mitigate RSBU.

    • RSB flushing means removing the content of the RSB. After RSB flushing, the RSB is empty.

    • RSB clearing means clearing the entire content of the RSB. This can be done either by overwriting or removing the existing content. RSB flushing always totally clears the RSB. RSB stuffing can partially or totally clear the RSB.

      RSB clearing can be used to mitigate SpectreRSB.

    RSB Stuffing

    RSB stuffing is a sequence made of 32 CALL instructions with non-zero displacement. It can be used to totally or partially clear the RSB. It will totally clear the RSB on AMD processors which have no more than 32 RSB entries, and on Intel processors which do not support eIBRS.

    When RSB stuffing totally clears the RSB, it can be used as a RSB clearing sequence to mitigate SpectreRSB.
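
The difference between a total and a partial clearing can be illustrated with a toy Python model of the stuffing sequence. The function names, the list-based RSB, the `"user"` and `"benign"` markers and the 64-entry capacity used in the example are all hypothetical.

```python
def stuff_rsb(rsb_entries, capacity, calls=32, benign="benign"):
    """Model RSB stuffing: each CALL pushes a benign return address.

    rsb_entries is the RSB content before stuffing (last element = most
    recent entry). When the RSB is full, the oldest entry is dropped.
    Returns the RSB content after stuffing.
    """
    rsb = list(rsb_entries)
    for _ in range(calls):
        if len(rsb) == capacity:
            rsb.pop(0)
        rsb.append(benign)
    return rsb


def totally_cleared(rsb, benign="benign"):
    """True if no entry from before the stuffing sequence remains."""
    return all(entry == benign for entry in rsb)


# A 32-entry RSB full of user-controlled entries is totally cleared...
small = stuff_rsb(["user"] * 32, capacity=32)
# ...but a (hypothetical) 64-entry RSB is only partially cleared.
large = stuff_rsb(["user"] * 64, capacity=64)
```

In this model, `totally_cleared(small)` is true, whereas `large` still holds 32 older entries that a sufficiently long chain of RET instructions could reach.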

    Intel — Preventing RSB Underflow

    On Intel processors with RSBA behavior, RET instructions can be predicted using the JMP/CALL predictor when the RSB is empty. This behavior can defeat the retpoline mitigation and be used by the Retbleed (RSBU) vulnerability.

    Stuffing the RSB can be used to reduce the likelihood of the RSB becoming empty and underflowing. To do so, RSB stuffing should be applied before RET instructions at risk of RSB underflow, for example on a deep call stack, or when there can be an imbalance between CALL and RET instructions (e.g. on context switch).

    In any case, preventing RSB underflow is not easy because the RSB can become empty under several different conditions, including some asynchronous events.

    RSB Stuffing/Flushing/Clearing

    RSB clearing can be used on kernel entry from userspace or from a guest VM (i.e. on VMExit) to prevent SpectreRSB. The RSB clearing ensures that the RSB has no entry added from a lesser privilege mode.

    The RSB stuffing sequence will totally clear the RSB on AMD processors which have no more than 32 RSB entries, and on Intel processors which do not support eIBRS. Some specific events or commands can also flush or clear the RSB.

    AMD — RSB With More Than 32 Entries

    On AMD processors with more than 32 RSB entries, the additional RSB entries are cleared when setting IBRS to 1.

    Note: All AMD processors before Zen 5 have 32 RSB entries or less. Zen 5 processors have more than 32 RSB entries.

    AMD references RSB stuffing as mitigation V2-3 and says that “all current AMD processors have a return address predictor with 32 entries or less. Future processors that have more than 32 RSB entries are planned to be architected to not require software intervention”. But the more recent AMD Programmer’s Manual indicates in the description of IBRS that “Processors implementing more than 32 return predictions include hardware to clear the additional entries when software writes a 1 to IBRS”.

    So what should be done to clear the RSB on Zen 5 is unclear: should IBRS be set, or is no intervention required? Also on Zen 5, the RSB is flushed when the TLB is flushed, see ERAPS below.

    AMD — AutoIBRS and VMExit

    If AutoIBRS is enabled then the RSB is automatically cleared on VMExit.

    AMD — ERAPS

    On processors with the Enhanced Return Address Predictor Security (ERAPS) capability, the RSB is flushed when the TLB is flushed, even if the TLB is not entirely flushed. The TLB is flushed when using the INVPCID instruction, when writing to the CR3 register, or with some update to the CR4 register.

    The ERAPS capability was introduced on the 5th generation of AMD EPYC processors (EPYC 9xx5, aka Zen 5 or Turin).

  • LFENCE/JMP Sequence
    The LFENCE/JMP sequence is an alternative to retpoline. Note that this mitigation is not recommended anymore.

    The LFENCE/JMP sequence (aka AMD Retpoline, although it has nothing to do with the original retpoline sequence) replaces an indirect branch with a sequence where the load of the target address has finished before the branch is dispatched. Basically, this adds an LFENCE instruction right before the indirect branch instruction.

    For example, the indirect branch jmp *%rax is replaced with:

    lfence
    jmp *%rax

    The LFENCE instruction is a dispatch-serializing instruction: it stops the dispatch of later instructions until the branch target is in the RAX register and available at dispatch for the execution.

    AMD — Retpoline Alternative

    This mitigation was proposed by AMD as a faster alternative to retpoline. However, it was later discovered that the speculation window of this mitigation might be large enough to be exploited for Spectre v2. AMD recommends not to use this mitigation anymore.

    Intel — Goldmont Plus and Tremont

    On the Goldmont Plus and Tremont processors the retpoline may not be fully effective, and the LFENCE/JMP sequence may be an alternative, although this is not architecturally guaranteed. This is not a mitigation option that Intel is evaluating.

Specific Mitigations

Some vulnerabilities have specific mitigation requirements.

  • Post-Barrier RSB Mitigation
    Mitigation for ineffective RSB barrier.

    On some processors, commands that produce an RSB barrier (like IBPB) might not be fully effective for RSB-based predictions and require an additional mitigation.

    This issue can be fixed by using the RSB Stuffing sequence. If the original command (e.g. IBPB) was used only for producing an RSB barrier then it can be replaced by the RSB Stuffing sequence. Otherwise, the RSB Stuffing sequence should be used in addition to the original command.

    AMD — IBPB

    On AMD processors, this issue can affect the IBPB command, and the only solution is to use the RSB Stuffing sequence as explained above.

    Intel — IBPB and IBRS

    On Intel processors, this issue can affect the IBPB command. On processors with the eIBRS capability, it can also affect the IBRS command when used after a VMExit (to prevent a guest VM from controlling the RSB).

    In any case, the target that may be used across the RSB barrier is limited to the most-recent CALL instruction prior to the barrier. And the cross-barrier RSB target will not be used for RET instruction predictions made after the first post barrier CALL instruction.

    So this issue can be fixed with a simple software sequence to steer RSB predictions to benign code regions that restrict speculation. This can be done by ensuring that at least one CALL instruction is safely executed before the RSB-barrier.

    Note that processors with the PBRSB_NO capability are not impacted by this issue.

  • SLS Mitigation
    Mitigation for Straight Line Speculation (SLS).

    With Straight Line Speculation (SLS), instructions sequentially following a branch instruction (JMP, CALL or RET) might be executed speculatively. This behavior can be prevented by adding a speculation barrier after the impacted instruction. Typically:

    • an LFENCE instruction is added after a CALL instruction;
    • an INT3 instruction is added after a JMP or RET instruction.

    AMD – JMP, CALL and RET instructions

    • SLS impacts the indirect and direct JMP and CALL instructions.
    • SLS impacts the RET instruction.

    Intel – JMP, CALL and RSBU

    • SLS impacts only the indirect JMP and CALL instructions;

    • SLS doesn’t impact the RET instruction.

      But, in some cases, RSB Underflow can cause instructions after a RET to be speculatively executed. This behavior is also prevented by adding a speculation barrier (typically an INT3 instruction) after a RET instruction. For more information, see RSBU Mitigation.

    Note that SLS is not really a Spectre v2 variant, but its mitigation impacts instructions targeted by Spectre v2. So SLS mitigation often appears in software sequences used to mitigate Spectre v2 (such as retpoline).

  • BHI Mitigation (Intel)
    Mitigation for Branch History Injection (BHI).

    There are several ways to mitigate the BHI vulnerability depending on the capability of the processor.

    Hardware Mitigation

    Some processors have hardware features to mitigate BHI:

    • Processors with the BHI_NO capability are not impacted by the BHI vulnerability.

    • Processors with the RRSBA_DIS_S capability, and using the retpoline mitigation, can enable this capability to disable the use of alternate predictors on RSB underflow. This mitigates the BHI vulnerability only when combined with the retpoline mitigation.

    • Processors with the BHI_DIS_S capability can enable this capability to prevent predicted targets of indirect branches executed in the kernel from being selected based on branch history from branches executed in userspace (or in a guest VM).

      Note that BHI_DIS_S may not prevent predicted targets of indirect branches executed in userspace of a host from being based on branch history for branches executed in a guest VM. This can be prevented by using a software sequence to clear the BHB on VMExit.

    Other processors should use a software mitigation.

    Software Mitigation

    If the processor doesn’t have any hardware feature to mitigate BHI then a software sequence should be used to clear the BHB, and remove any potential attacker’s control of the BHB. The BHB should be cleared when entering the kernel from userspace or on a VMExit.

    TSX sequence

    If the processor supports Transactional Synchronization Extensions (TSX) then the TSX Abort command can be used to clear the BHB.

    This can be done with a TSX sequence like this:

            xbegin label
            xabort $0
    label:

    BHB Clearing Loop

    Otherwise a generic short or long software sequence should be used depending on the processor.

    Here is what the software sequence looks like (based on the Linux code):

                movl    $<value1>, %ecx
                call    1f
                jmp     5f
                .align 64, 0xcc
    1:          call 2f
                ret
                .align 64, 0xcc
    2:          movl    $<value2>, %eax
    3:          jmp     4f
                nop
    4:          sub     $1, %eax
                jnz     3b
                sub     $1, %ecx
                jnz     1b
                ret
    5:          lfence

    <value1> and <value2> are integer values which depend on the sequence used (short or long).

    • On processors prior to Alder Lake, <value1> and <value2> should both be set to 5. This is the short sequence.

    • On Alder Lake, Sapphire Rapids and newer processors, <value1> should be set to 12 and <value2> to 7. This is the long sequence.

      Note that Alder Lake, Sapphire Rapids and newer processors support BHI_DIS_S which can be used to mitigate BHI. Newer processors can even have the BHI_NO capability and not be vulnerable to BHI. So there is usually no need to use the long sequence.

  • RSBU Mitigations (Intel)
    Mitigation for Retbleed RSB Underflow (RSBU).

    On processors with an RSBA behavior, when the RSB is empty, RET instructions can be predicted using the JMP/CALL predictor instead of using the RET predictor. This behavior can be exploited by the Retbleed RSB Underflow (RSBU) vulnerability. Processors which do not have an RSBA behavior are not affected by the RSBU vulnerability.

    Restricted RSBA (RRSBA)

    Some processors with an RSBA behavior can have a Restricted RSBA (RRSBA) behavior where predicted targets of RET instructions when the RSB is empty can be restricted to targets belonging to the current privilege mode. The RRSBA behavior is effective if the processor has the RRSBA capability and:

    • eIBRS is enabled (processors with the RRSBA capability always support eIBRS).
    • or the processor also has the RRSBA_DIS_S capability. In that case, the RRSBA behavior is always effective regardless of the setting of RRSBA_DIS_S or of eIBRS.

    When the RRSBA behavior is effective, it mitigates the RSBU vulnerability.

    Mitigations

    The following mitigations are possible for RSBU:

    • Processors which do not have a RSBA behavior are not affected by the RSBU vulnerability.

    • Processors with the RRSBA_DIS_S capability are not affected by the RSBU vulnerability.

    • Processors with IBRS can enable IBRS to mitigate RSBU.

    • Otherwise, RSB Stuffing can be used to reduce the likelihood of RSB underflow.

    Also, on processors with an RSBA behavior which do not have the eIBRS capability (i.e. some Skylake processors without eIBRS), RSBU can be exploited to speculatively execute instructions following a RET instruction. This issue can be mitigated by adding a speculation barrier (such as an INT3 instruction) after RET instructions. Note that this mitigation is similar to the SLS mitigation.
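    The list of RSBU mitigations above can be read as a decision procedure. The following is a minimal sketch of that procedure; the `cpu_caps` structure, its field names and the `rsbu_mitigation` enum are hypothetical names made up for illustration and do not correspond to any real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability flags, named after the terms used in this document. */
struct cpu_caps {
    bool rsba;          /* processor has an RSBA behavior */
    bool rrsba_dis_s;   /* processor has the RRSBA_DIS_S capability */
    bool ibrs;          /* processor supports IBRS */
};

/* Hypothetical result type for the selection below. */
enum rsbu_mitigation {
    RSBU_NOT_AFFECTED,
    RSBU_USE_IBRS,
    RSBU_USE_RSB_STUFFING,
};

/* Walk the list of RSBU mitigations in the order given in the text. */
static enum rsbu_mitigation select_rsbu_mitigation(const struct cpu_caps *caps)
{
    assert(caps != NULL);

    if (!caps->rsba)
        return RSBU_NOT_AFFECTED;      /* no RSBA behavior */
    if (caps->rrsba_dis_s)
        return RSBU_NOT_AFFECTED;      /* RRSBA behavior is always effective */
    if (caps->ibrs)
        return RSBU_USE_IBRS;          /* enable IBRS */
    return RSBU_USE_RSB_STUFFING;      /* reduce the likelihood of underflow */
}
```

    The sketch only covers the four bullet points; the additional speculation barrier after RET instructions described above is a separate measure and is not modeled here.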

  • BTC and SRSO Mitigations (AMD)
    Mitigation for Branch Type Confusion (BTC) and Speculative Return Stack Overflow (SRSO). This includes the jmp2ret and safeRET sequences.

    Branch Type Confusion (BTC) happens when a BTB entry for an instruction collides with the entry for another instruction and is mispredicted with the wrong branch type. With BTC, any instruction, even a non-branch instruction, can be mispredicted. In particular, when an instruction is mispredicted as a JMP instruction, this creates a PhantomJMP.

    PhantomJMPs can be used to train the branch predictor using speculative execution (aka Training in Transient Execution or TTE), and manipulate the BTB or the RSB. PhantomJMPs combined with TTE to manipulate the RSB can be used to craft Speculative Return Stack Overflow (SRSO) attacks.

    BTC Variants

    Different BTC cases are defined depending on the actual instruction being processed:

    • BTC-NOBR: BTC on non-branch instructions and far branches (far branches are never predicted as taken).
    • BTC-DIR: BTC on RIP-relative branches (Jcc, near JMP, near CALL).
    • BTC-IND: BTC on indirect branch instructions (indirect JMP, indirect CALL).
    • BTC-RET: BTC on return instructions (RET)

    BTC Mitigations

    The following mitigations are possible for BTC:

    • IBPB can be used to mitigate all BTC variants as it flushes all BTB branch prediction information. Using IBPB on kernel entry (from userspace or on VMExit) mitigates all forms of BTC attacks from userspace or from a guest VM.

      However, on some processors, IBPB doesn’t flush older branch type predictions. In that case, IBPB is not an effective mitigation for BTC. IBPB flushes older branch type predictions on the following processors:

      • Zen and Zen2 processors;
      • Zen3 and Zen4 processors with the appropriate microcode;
      • processors with the IBPB_BRTYPE capability.
    • SLS mitigations can be used to mitigate BTC when a direct or indirect JMP or CALL instruction, or a RET instruction is predicted as a non-branch instruction.

    • When an instruction is mispredicted as a RET instruction, RSB Stuffing or SMEP can be used to ensure that addresses in the RSB are safe for speculation.

    • BTC-IND can be mitigated with IBRS which prevents speculation at the predicted target of any instruction that is decoded as an indirect branch, or with retpoline which eliminates all use of indirect branches.

    • BTC-RET can be mitigated with jmp2ret or safeRET.

    • On Zen 2 processors, BTC-NOBR can be mitigated by setting the SuppressBPOnNonBr bit in the DE_CFG2 MSR. When this bit is set, the branch prediction information on non-branch instructions is ignored, and speculation at the predicted target is prevented.

      However, although SuppressBPOnNonBr prevents speculation at the predicted target, it doesn’t prevent PhantomJMPs and Training in Transient Execution (TTE) so it doesn’t mitigate SRSO.

    SRSO Mitigation

    The following additional mitigations are available for SRSO:

    • Processors with the SRSO_NO capability are not vulnerable to any form of SRSO.

    • Processors with the SRSO_USER_KERNEL_NO capability are not vulnerable to SRSO across user/kernel boundaries.

    • Processors with the SRSO_MSR_FIX capability can use the BpSpecReduce bit of the BP_CFG MSR to mitigate SRSO across guest/host boundaries.

    • safeRET can be used to protect the kernel from SRSO.

    • Mitigation of SRSO across user/user and VM/VM boundaries requires the use of IBPB.

    jmp2ret and safeRET

    jmp2ret and safeRET are similar software constructs that mitigate BTC for return instructions (BTC-RET) executed in the kernel. In addition, safeRET also mitigates SRSO.

    Return Thunk and Training Function

    Both jmp2ret and safeRET ensure that an attacker-controlled BTB entry is never used for predicting privileged RET instructions. This is achieved by using a return thunk and a training function:

    • Return thunk. All RET instructions are consolidated into a single piece of code. Instead of directly executing the RET instruction, code jumps to a return thunk that executes the RET instruction.

    • Training function. When entering the kernel, software calls the return thunk training function to safely train the BTB entry for the RET instruction in the return thunk so that attacker-controlled prediction information is not used. The training function will remove any BTB information associated with the RET instruction in the return thunk, and then add a correct BTB entry for this instruction.

    As a result, the kernel has a unique RET instruction, and its BTB entry is protected. This prevents a BTC attack from targeting this RET instruction.

    jmp2ret Sequence Example

    Here is the jmp2ret sequence as it is implemented on Linux:

    .align 64
    .skip 64 - (retbleed_return_thunk - retbleed_untrain_ret), 0xcc
    retbleed_untrain_ret:
            .byte 0xf6    /* executes as: test $0xcc, %bl */
    retbleed_return_thunk:
            ret
            int3
            /* end of the test instruction */
            lfence
            jmp retbleed_return_thunk
            int3

    • retbleed_return_thunk is the jmp2ret return thunk, which code jumps to instead of directly executing the RET instruction. It just executes the RET instruction, like this:

      retbleed_return_thunk:
              ret
              int3

    • retbleed_untrain_ret is the jmp2ret training function. It executes like this:

      retbleed_untrain_ret:
              test $0xcc, %bl
              /* end of the test instruction */
              lfence
              jmp retbleed_return_thunk
              int3

      The TEST instruction is a dummy instruction and its result is ignored. But it overlaps with the RET instruction of the return thunk (in retbleed_return_thunk) and causes the BTB information associated with the position of the RET instruction to be discarded without being used (this untrains the RET instruction). Then the code jumps to retbleed_return_thunk to execute the RET instruction and returns. In addition, executing the RET instruction correctly trains the BTB that a return instruction is present at this location.

    Notes:

    • There is a precise alignment so that the RET instruction starts at a cacheline boundary. This is a requirement on some processors (Zen 1 and Zen 2) for the overlapping instruction (TEST) to remove the BTB entry at the location of the RET instruction.

    • INT3 instructions after the RET and JMP instructions prevent SLS.
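
    The overlap between the TEST instruction and the RET instruction can be checked at the byte level. The sketch below lays out the first bytes of the sequence and verifies that the same bytes decode as `test $0xcc, %bl` from one entry point and as `ret; int3` from the next byte. The array and helper names are made up for illustration; the byte values follow the standard x86 encodings (F6 /0 ib for TEST r/m8, imm8; C3 for RET; CC for INT3; 0F AE E8 for LFENCE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Machine code of the sequence, starting at retbleed_untrain_ret. */
static const uint8_t jmp2ret_bytes[] = {
    0xf6,               /* retbleed_untrain_ret: opcode of TEST r/m8, imm8 */
    0xc3,               /* retbleed_return_thunk: RET, also the ModRM byte selecting %bl */
    0xcc,               /* INT3, also the imm8 operand of TEST */
    0x0f, 0xae, 0xe8,   /* LFENCE, first instruction after the TEST */
};

/* Offsets of the two entry points inside jmp2ret_bytes. */
enum { UNTRAIN_OFFSET = 0, THUNK_OFFSET = 1 };

/*
 * Returns true if the 3 bytes at p encode "test $imm8, %bl"
 * (opcode 0xF6, ModRM 0xC3), and stores the immediate in *imm.
 */
static bool is_test_imm8_bl(const uint8_t *p, uint8_t *imm)
{
    assert(p != NULL && imm != NULL);
    if (p[0] != 0xf6 || p[1] != 0xc3)
        return false;
    *imm = p[2];
    return true;
}
```

    Decoding from `UNTRAIN_OFFSET` consumes the RET and INT3 bytes as operands of the TEST instruction, so the processor continues at the LFENCE; decoding from `THUNK_OFFSET` sees the RET instruction directly.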

    safeRET

    For safeRET, the return thunk and the training function include a safe return sequence which traps any speculative execution of the RET instruction. This way, the RET instruction is protected against RSB manipulation which might have occurred with SRSO.

    safeRET Sequence Example

    Here is the safeRET sequence as it is implemented on Linux:

    • srso_untrain_ret is the training function:

              .align 64
              .skip 64 - (srso_safe_ret - srso_untrain_ret), 0xcc
      srso_untrain_ret:
              .byte 0x48, 0xb8    /* executes as: movabs $0xccccc30824648d48, %rax */
      srso_safe_ret:
              lea 8(%rsp), %rsp
              ret
              int3
              int3
              /* end of the movabs instruction */
              lfence
              call srso_safe_ret
              ud2

      It will execute as:

      srso_untrain_ret:
              movabs $0xccccc30824648d48,%rax
              lfence
              call srso_safe_ret
              ud2

    • srso_return_thunk is the return thunk:

      srso_return_thunk:
              call srso_safe_ret
              ud2

    The difference with jmp2ret is that the return sequence (srso_safe_ret) is accessed with a CALL instruction instead of directly jumping to the RET instruction. The CALL instruction adds an RSB entry so that a speculative execution of the RET instruction will execute the UD2 undefined instruction and stop speculative execution.

    Because the return sequence is now accessed with a CALL instruction, the stack pointer needs to be adjusted (lea 8(%rsp), %rsp), before executing the RET instruction, so that it points to the address we effectively want to return to. When calling the training function (srso_untrain_ret), this extra instruction has to be overlapped in addition to the RET instruction. So the dummy instruction of the training function is now a MOVABS instruction instead of the TEST instruction used for jmp2ret.

    We can notice that in this sequence the LEA instruction starts at cacheline boundary and not the RET instruction. It is not clear if this is on purpose, or a mistake.
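
    The 64-bit immediate of the MOVABS instruction is simply the machine code of the safe return sequence read as a little-endian integer. The sketch below rebuilds that value from the instruction bytes; the array and function names are made up for illustration, and the byte values are the standard x86-64 encodings of the four instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes starting at srso_safe_ret. When decoding starts two bytes earlier,
 * at srso_untrain_ret, these 8 bytes form the immediate operand of MOVABS.
 */
static const uint8_t srso_safe_ret_bytes[8] = {
    0x48, 0x8d, 0x64, 0x24, 0x08,   /* lea 8(%rsp), %rsp */
    0xc3,                           /* ret */
    0xcc,                           /* int3 */
    0xcc,                           /* int3 */
};

/* Assemble a little-endian 64-bit value from 8 consecutive bytes. */
static uint64_t le64(const uint8_t *p)
{
    uint64_t value = 0;

    assert(p != NULL);
    for (size_t i = 0; i < 8; i++)
        value |= (uint64_t)p[i] << (8 * i);
    return value;
}
```

    Calling `le64(srso_safe_ret_bytes)` yields 0xccccc30824648d48, the immediate shown in the listing above. This is why the LEA, RET and both INT3 instructions are consumed as data when execution starts at srso_untrain_ret.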

Hardening Options

In addition to mitigations, some hardening solutions are possible to provide defense-in-depth measures and further reduce the possibility to exploit some vulnerabilities.

Clear Registers on Kernel Entry

Clearing registers on kernel entry prevents a user from controlling the register values and having them used during speculative execution.

Syscall Hardening

Syscall Weakness

Syscalls provide a large number of functions executed by kernel which can be invoked from userspace. Usually, syscalls are identified with a syscall number. When a syscall is invoked from userspace, the kernel is entered and it retrieves the kernel function which implements the syscall from a syscall table. Then that function is invoked using an indirect call.

With pseudocode, this would be something like this:

syscall_function syscall_table[] = {  syscall_read,
                                      syscall_write,
                                      syscall_open,
                                      ... };

long syscall(unsigned int syscall_number, ...)
{
        ...
        rv = syscall_table[syscall_number](...);
        ...
}

This specific indirect call provides access to many kernel functions which can be selected from userspace and for which the caller can inject custom parameter values thanks to the syscall arguments. This provides a handy entry point for some Spectre v2 vulnerabilities like BHI.

Hardening

To prevent syscalls from being a backdoor for Spectre v2 attacks, syscalls can be hardened by removing the single syscall indirect call, and replacing it with direct calls.

So instead of getting the kernel function which implements the syscall from a syscall table and executing an indirect call, a large switch statement is used. The user provided syscall number is then checked against each valid syscall number and, if it matches, the corresponding kernel function is invoked with a direct call.

With pseudocode, this would give:

long syscall(unsigned int syscall_number, ...)
{
        ...
        switch (syscall_number) {
        case 0: return syscall_read(...);
        case 1: return syscall_write(...);
        case 2: return syscall_open(...);
        ...
        }
        ...
}

This prevents an easily accessible indirect call from being the target of Spectre v2 attacks. Note that this won’t prevent all Spectre v2 attacks, but this makes them harder as they would need to target an indirect call deeper in the kernel which will certainly be more difficult to control.
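
The two dispatch schemes can be compared side by side with a small compilable example. This is only a toy sketch: the three `toy_*` functions, their numbering, the single `long` argument and the `TOY_ENOSYS` value are made up for illustration and are not the kernel's actual syscall entry code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy syscall implementations; names, numbers and behavior are made up. */
static long toy_read(long arg)  { return arg + 1; }
static long toy_write(long arg) { return arg + 2; }
static long toy_open(long arg)  { return arg + 3; }

/* Made-up error value returned for an invalid syscall number. */
#define TOY_ENOSYS (-38L)

typedef long (*syscall_fn)(long);

static const syscall_fn toy_syscall_table[] = { toy_read, toy_write, toy_open };

/* Original scheme: one indirect call whose target is selected by the caller. */
static long dispatch_indirect(unsigned int nr, long arg)
{
    size_t count = sizeof(toy_syscall_table) / sizeof(toy_syscall_table[0]);

    if (nr >= count)
        return TOY_ENOSYS;
    return toy_syscall_table[nr](arg);
}

/* Hardened scheme: the syscall number is compared, then a direct call is made. */
static long dispatch_direct(unsigned int nr, long arg)
{
    switch (nr) {
    case 0: return toy_read(arg);
    case 1: return toy_write(arg);
    case 2: return toy_open(arg);
    default: return TOY_ENOSYS;
    }
}

/* Both schemes must return the same result for any syscall number. */
static void check_same_behavior(unsigned int nr, long arg)
{
    assert(dispatch_indirect(nr, arg) == dispatch_direct(nr, arg));
}
```

Note that a compiler is free to lower a dense switch statement into a jump table, which reintroduces an indirect jump. GCC and Clang both accept the -fno-jump-tables option to prevent this.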

Clear Registers Before RET

Clearing registers before RET instructions makes it more difficult to exploit the Retbleed vulnerability (RSBU and BTC-RET) and to combine it with Training in Transient Execution (TTE). The clearing can be done on call-used registers or on all registers for extra protection.

The GCC compiler provides the -fzero-call-used-regs option to clear call-used registers at the end of each function, and modifiers (such as all-gpr) are available to clear more registers.

Call Depth Tracking (Intel)

RSB underflow can be exploited by the Retbleed RSBU vulnerability. In particular, RSB underflow can occur when a program is returning from a deep call stack due to executing more RET instructions than the number of entries in the RSB.

Tracking the depth of calls can be used to figure out when the RSB is getting empty and is likely to underflow. When this happens, the RSB can be stuffed to prevent an underflow. This helps reduce the risk of an RSBU attack.
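
The idea can be illustrated with a toy software model. This is a sketch under stated assumptions, not the Linux implementation: the RSB size of 16 entries, the `depth_tracker` structure and the policy of refilling the whole RSB right before it would underflow are all chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define RSB_ENTRIES 16   /* assumed RSB size, for illustration only */

/* Hypothetical tracker state. */
struct depth_tracker {
    int depth;           /* estimated number of valid RSB entries */
    unsigned int stuffs; /* number of times the RSB was refilled */
};

/* A CALL pushes one RSB entry; the RSB cannot hold more than its size. */
static void track_call(struct depth_tracker *t)
{
    assert(t != NULL);
    if (t->depth < RSB_ENTRIES)
        t->depth++;
}

/* A RET pops one RSB entry; refill the RSB first if it is already empty. */
static void track_ret(struct depth_tracker *t)
{
    assert(t != NULL);
    if (t->depth == 0) {
        /* This RET would underflow: stuff the RSB with benign entries. */
        t->stuffs++;
        t->depth = RSB_ENTRIES;
    }
    t->depth--;
}
```

With this model, a chain of 20 calls followed by 20 returns triggers exactly one refill: the first 16 returns consume the tracked entries, and the 17th return finds the tracker empty and stuffs the RSB before returning.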

CET-IBT (Intel)

The Intel Control-flow Enforcement Technology (CET) provides the Indirect Branch Tracking (IBT) capability, which forces indirect branches to land on an endbranch (ENDBR) instruction. An indirect branch can therefore no longer jump to an arbitrary address, but only to addresses which hold an ENDBR instruction (usually the beginning of a function).

CET-IBT limits, if not blocks, the speculative execution of an indirect branch if the target is not an ENDBR instruction. This constrains a BTI attack to target an address with an ENDBR instruction and so this reduces the attack surface, and limits speculation at the target address.

With CET-IBT, speculation may be limited with early implementations of CET, and it is completely blocked with later implementations.

Default Mitigations on Linux

This section describes default mitigations for Spectre v2 used on Linux kernel 6.13.0, when the kernel is built with all the available mitigations. Mitigations are listed with conditions under which they are used, like this:

  • Conditions → Mitigation

For the STIBP mitigation, we can have:

  • STIBP (always) : STIBP is always enabled.
  • STIBP (prctl) : STIBP is enabled when running a process for which speculation of indirect branch has been disabled with prctl(2).
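
The prctl(2) control mentioned above is the Linux speculation control interface. Here is a minimal sketch of how a process could query and disable indirect branch speculation for itself. The `PR_*` constants come from the Linux headers; the three helper function names are made up for illustration. This is Linux-specific, and the calls fail with -1 when the kernel or the processor does not offer this control.

```c
#include <assert.h>
#include <sys/prctl.h>

/*
 * Returns the PR_SPEC_* bitmask describing indirect branch speculation
 * control for the calling thread, or -1 (with errno set) on failure.
 */
static int indirect_branch_spec_state(void)
{
    return prctl(PR_GET_SPECULATION_CTRL,
                 (unsigned long)PR_SPEC_INDIRECT_BRANCH, 0UL, 0UL, 0UL);
}

/* Returns non-zero if the given state reports speculation as disabled. */
static int indirect_branch_spec_is_disabled(int state)
{
    assert(state >= -1);   /* prctl returns -1 or a non-negative bitmask */
    return state >= 0 &&
           (state & (PR_SPEC_DISABLE | PR_SPEC_FORCE_DISABLE)) != 0;
}

/*
 * Ask the kernel to disable indirect branch speculation for the calling
 * thread. Returns 0 on success, -1 (with errno set) on failure.
 */
static int disable_indirect_branch_spec(void)
{
    return prctl(PR_SET_SPECULATION_CTRL,
                 (unsigned long)PR_SPEC_INDIRECT_BRANCH,
                 (unsigned long)PR_SPEC_DISABLE, 0UL, 0UL);
}
```

A process for which `disable_indirect_branch_spec()` succeeds is one of the processes that the (prctl) conditions in this section refer to.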

Default Mitigations on Intel Processors

BTI Mitigation

  • if eIBRS is supported → eIBRS

  • else if basic IBRS is supported and the processor is impacted by Retbleed → basic IBRS

  • else → retpoline

BTI Between Processor Threads

  • if basic IBRS or retpoline is used → STIBP (prctl)

BHI Mitigation

  • if retpoline is used and RRSBA_DIS_S is supported → RRSBA_DIS_S

  • else if BHI_DIS_S is supported → BHI_DIS_S

  • else → BHB Software Clearing Sequence (short sequence)

Guest VM → Host Kernel Mitigation

  • if eIBRS is not used → RSB Stuffing on VMExit

  • else if processor has post-barrier RSB issue → Post-Barrier RSB Mitigation on VMExit
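
The Intel defaults listed above can be summarized as a set of small selection functions. This is an illustrative sketch of the conditions as they are listed in this section, not the kernel's actual selection code; the `intel_caps` structure, the enum names and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical processor description, using the terms of this section. */
struct intel_caps {
    bool eibrs;        /* eIBRS is supported */
    bool ibrs;         /* basic IBRS is supported */
    bool retbleed;     /* processor is impacted by Retbleed */
    bool rrsba_dis_s;  /* RRSBA_DIS_S is supported */
    bool bhi_dis_s;    /* BHI_DIS_S is supported */
    bool pbrsb;        /* processor has the post-barrier RSB issue */
};

enum bti_choice { BTI_EIBRS, BTI_IBRS, BTI_RETPOLINE };
enum bhi_choice { BHI_USE_RRSBA_DIS_S, BHI_USE_BHI_DIS_S, BHI_CLEAR_BHB_SHORT };
enum vmexit_choice { VMEXIT_NONE, VMEXIT_RSB_STUFFING, VMEXIT_PBRSB };

/* BTI Mitigation */
static enum bti_choice default_bti(const struct intel_caps *c)
{
    assert(c != NULL);
    if (c->eibrs)
        return BTI_EIBRS;
    if (c->ibrs && c->retbleed)
        return BTI_IBRS;
    return BTI_RETPOLINE;
}

/* BTI Between Processor Threads: STIBP (prctl) with basic IBRS or retpoline */
static bool default_stibp_prctl(const struct intel_caps *c)
{
    return default_bti(c) != BTI_EIBRS;
}

/* BHI Mitigation */
static enum bhi_choice default_bhi(const struct intel_caps *c)
{
    if (default_bti(c) == BTI_RETPOLINE && c->rrsba_dis_s)
        return BHI_USE_RRSBA_DIS_S;
    if (c->bhi_dis_s)
        return BHI_USE_BHI_DIS_S;
    return BHI_CLEAR_BHB_SHORT;
}

/* Guest VM to Host Kernel Mitigation */
static enum vmexit_choice default_vmexit(const struct intel_caps *c)
{
    if (default_bti(c) != BTI_EIBRS)
        return VMEXIT_RSB_STUFFING;
    return c->pbrsb ? VMEXIT_PBRSB : VMEXIT_NONE;
}
```

For example, a processor described with eIBRS, BHI_DIS_S and the post-barrier RSB issue would get eIBRS, BHI_DIS_S and the Post-Barrier RSB Mitigation on VMExit, without STIBP (prctl).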

Default Mitigations on AMD Processors

BTI Mitigation

  • if AutoIBRS is supported → AutoIBRS

  • else → retpoline

BTI Between Processor Threads

  • if processor is impacted by Retbleed → STIBP (always)

  • else → STIBP (prctl)

Guest VM → Host Kernel Mitigation

  • if AutoIBRS is not used → RSB Stuffing on VMExit

Retbleed and SRSO Mitigation

  • if processor is impacted by SRSO → safeRET

  • else if processor is impacted by Retbleed → jmp2ret

Additional Mitigations

In addition, the following mitigations are applied on both Intel and AMD:

  • RSB Stuffing on Context Switch (always enabled)

  • if SMEP is supported → SMEP

  • if IBPB is supported → IBPB on Context Switch (prctl)
    (IBPB is used on context switch with a process for which speculation of indirect branch has been disabled)

References

Vulnerabilities can have multiple references:

  • a Common Vulnerabilities and Exposures (CVE) number;
  • an AMD Security Bulletin (AMD-SB-XXXX);
  • an Intel Security Advisory (INTEL-SA-XXXX).

Vulnerabilities

01/2018 – Branch Target Injection (Spectre v2)

  • Names: Spectre v2, SpectreBTB, BTI

  • Platforms: Intel and AMD

  • References: CVE-2017-5715, INTEL-SA-00088

  • Vulnerability: The BTB can be trained from userspace (or from a guest VM) to mispredict a JMP/CALL indirect branch in the kernel.

  • Mitigations: STIBP, IBRS, IBPB, retpoline, LFENCE.

  • Notes: This is the initial Spectre v2 vulnerability and mitigations. Mitigations and recommendations have evolved with the discovery of new variants of the vulnerability.

07/2018 – SpectreRSB

03/2022 – Branch History Injection (BHI)

03/2022 – LFENCE/JMP Mitigation Update

  • Platforms: AMD only

  • References: CVE-2021-26401, AMD-SB-1036

  • Vulnerability: The speculation window of the AMD LFENCE/JMP mitigation may be large enough to be exploited for Spectre v2.

  • Mitigations: AMD recommends not using the LFENCE/JMP mitigation for Spectre v2, and using retpoline or IBRS instead.

  • Notes: The LFENCE/JMP sequence (aka AMD Retpoline) was a mitigation proposed by AMD as an alternative to retpoline because it has better performance. With the disclosure of this issue, AMD has recommended not to use this mitigation anymore.

03/2022 – Straight-Line Speculation (SLS)

  • Names: SLS

  • Platforms: Intel and AMD

  • References: CVE-2021-26341, AMD-SB-1026

  • Vulnerability: AMD processors may transiently execute instructions following an unconditional branch.

  • Mitigation: Place an INT3 after a RET or a JMP, and an LFENCE after a CALL.

  • Notes: This is not really a Spectre v2 variant, but it impacts instructions (JMP/CALL/RET) exploited by Spectre v2, and software sequences used for mitigating Spectre v2, like retpoline.

07/2022 – Return Stack Buffer Underflow (Retbleed – RSBU)

This issue is the Retbleed vulnerability on Intel. It is different from the Retbleed vulnerability on AMD.

07/2022 – Branch Type Confusion (Retbleed – BTC)

This issue is the Retbleed vulnerability on AMD. It is different from the Retbleed vulnerability on Intel.

08/2022 – Post-Barrier Return Stack Buffer Predictions (PBRSB)

  • Platforms: Intel and AMD

  • References:

  • Vulnerability: Commands that produce an RSB barrier (like IBPB) might not be fully effective.

  • Mitigation: RSB Stuffing or (Intel only) safely execute one CALL after the RSB barrier, before any RET.

08/2023 – Speculative Return Stack Overflow (Inception – SRSO)

04/2024 – Native BHI

Documents

AMD

Intel

Linux