This blog entry was contributed by: Indu Bhagat, a member of the Oracle Linux Toolchain Team. Indu defined and wrote the SFrame format specification and developed it as part of the GNU Toolchain.

Introduction

In the fast-paced world of software development, understanding the “how” and “why” behind program execution is vital. When an error strikes, performance bottlenecks emerge, or deep insights into runtime behavior are needed, a stack trace is your indispensable guide. Simply put, a stack trace is a detailed record of the active function calls at a specific moment in a program’s execution, showing the nested sequence that led to that point. It’s fundamental for debugging (answering “how did we get here?”), implementing runtime mechanisms like exception handling, and performing stack unwinding.

While a program’s execution stack contains valuable clues, simply looking at the raw stack isn’t enough to reconstruct a full, accurate call sequence. To really trace back through nested calls, you need additional information beyond what’s immediately on the stack. For instance, determining the location of the previous stack frame and the exact “call site” within the calling function often requires more than just inspecting stack frame contents. Sometimes a return address might be found on the stack, but just as often it could be in a CPU register. This is where stack trace formats become essential: they define the crucial metadata that bridges the gap, providing the necessary auxiliary information to reliably perform stack tracing.

Enter SFrame, a new, lightweight, and highly efficient stack trace format designed to address the challenges and limitations of existing solutions. SFrame aims to provide accurate stack traces with minimal overhead, making it ideal for performance-critical scenarios like live tracing, profiling, and online debugging. Its design is guided by three core principles:

  • Minimal Data: SFrame captures only the essential stack frame details, drastically reducing overhead.
  • Ease of Use: SFrame is designed for straightforward parsing and generation by tools, simplifying integration.
  • Online Operations: SFrame is perfectly suited for real-time debugging and analysis, enabling responsive insights.

SFrame is currently expanding its adoption across various platforms, with Indu Bhagat actively working with the wider GNU/Linux community on its next version – SFrame version 3. See the References section for useful links. More on SFrame version 3 in a subsequent blog post.

A very brief stack unwinding primer

The task of stack tracing can be summarized as: given a program counter (PC) value corresponding to the current execution point of a process and a pointer to its current stack frame, locate all the stack frames of the activation records that led to the current execution point, along with the corresponding activation call sites (PCs). For each frame, it is necessary to locate the stack frame corresponding to the previous activation, as well as the call site that led to executing the current function. This information is stored either in a register or at some location on the stack, with the details depending on the specific architecture and on the ABI that defines the calling conventions. For example, on x86 the return address of the current activation is always stored at some constant offset from the top of the stack, while on SPARC it can be found in a particular general-purpose register.

To complicate things further, even in the latter case the register has to be saved on the stack every time a function gets called, and this must be taken into account as we reconstruct the stack frames.

What a call frame format shall provide is a conceptual table with a row for each value of the PC and a column for each register. This table is called the Unwind Table.

The content of each cell in the table, t(row, column), is the location of the value of the register corresponding to column, at the PC corresponding to row.

For example, if we know that the return address, according to the ABI in use, can be found in register %i7, we look up the row corresponding to the current program counter and read the entry in the column for %i7, which contains the location of the value of that register at that particular execution point. If the current frame is the innermost one (and the compiler hasn’t been doing any other handling of registers), the location of the return address will be the physical register %i7, but if we are not in the innermost function or if the compiler has reused %i7 for general allocation, the location will likely be some offset within the current stack frame.

Note that the unwind table described here is only “conceptual”, meaning that it is not allocated explicitly in its entirety, due to the prohibitively large size of storing all the necessary information, most of it redundant. This means that the table needs to be computed on demand, when a backtrace is desired.
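As a rough illustration of a table that is sparse rather than having one row per PC, the sketch below keeps a row only where a location changes and finds the applicable row with a binary search. The row layout, the location tags ("reg", "cfa_offset") and the addresses are assumptions made up for this example; they do not correspond to any real format.

```python
from bisect import bisect_right

# Hypothetical sparse unwind table for one function. Each row applies from
# its start PC up to (but not including) the start PC of the next row.
# A location is ("reg", name) or ("cfa_offset", n); both tags are made up.
ROWS = [
    (0x1000, {"ra": ("reg", "%i7")}),       # return address still in a register
    (0x1010, {"ra": ("cfa_offset", -8)}),   # after the register was saved on the stack
]
START_PCS = [start for start, _ in ROWS]


def lookup(pc, column):
    """Return the location of `column` at `pc`, or None if pc precedes the table."""
    i = bisect_right(START_PCS, pc) - 1     # last row whose start PC is <= pc
    if i < 0:
        return None
    return ROWS[i][1].get(column)
```

With these rows, lookup(0x1004, "ra") yields the register location and lookup(0x1010, "ra") yields the stack slot. A real table would also record where the function ends, which this sketch leaves out.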

Why Another Stack Tracing Format? The Problem with the Status Quo

As it is today, the landscape of stack tracing methods is fragmented, with many existing solutions suffering from limitations or being tied to specific Application Binary Interfaces (ABIs) or processor architectures. The two most commonly used methods, while prevalent, each present significant drawbacks:

  • Frame Pointer Based Stack Tracing: This method relies on reserving a dedicated CPU register, known as the frame pointer, to consistently point to the base of the current stack frame. While seemingly straightforward, this approach comes at a cost. Dedicating a general-purpose register to a single purpose limits its availability for other computations, increasing “register pressure”. This may also lead to more frequent “register spilling” – where register contents must be saved to memory (the stack) and reloaded – resulting in larger, slower code. For this reason, modern optimizing compilers increasingly choose to forgo maintaining a frame pointer by default. Further, it is becoming increasingly clear that the frame pointer method has serious reliability issues as well. A rough estimate suggests that anywhere from 5-7% of stack traces may be affected by inaccuracies because the frame pointer is not yet completely set up in function prologues.
  • Exception Handling Frame (EH Frame) Based Stack Tracing: Rooted in the comprehensive DWARF debugging format, EH Frame encodes sophisticated rules for reconstructing the state of registers and the return address of the previous stack frame at any given instruction. Essentially, it’s a compact program composed of DWARF opcodes and expressions. To generate a backtrace, this encoded information must be interpreted and executed on demand by a specialized unwinder. While powerful and compact, this decoding process requires a non-trivial interpreter, adding complexity and overhead to the unwinding process.

Both of these widely used methods have inherent disadvantages that have unfortunately led to a fragmented ecosystem for stack tracing. The sentiment among developers has long highlighted frustrations with either of the above two methods.

As far back as 2012, Linux kernel maintainer Linus Torvalds famously expressed his dissatisfaction with DWARF unwinders on the kernel mailing list, pushing for simpler alternatives like the Oops Rewind Capability (ORC) stack trace format in the Linux kernel. Another custom stack trace format, born of the need for reliable stack traces, is used in the PARCA agent.

Fun fact and a walk down memory lane to the very beginnings of the SFrame format: SFrame wasn’t born out of a developer’s personal passion project. Instead, the first call for help came from an internal Oracle team that was building a critical application and needed a fast, low-overhead way of stack tracing. The team needed a reliable, “on-the-fly” method, but found EH Frame to be too complex.

One of the options explored was simplifying the DWARF frame info but this didn’t solve the core complexity issues. Then, in conversations with DWARF experts externally, it was concluded that an entirely new, simpler format may be necessary to fill this void.

The creation of the SFrame format was therefore driven by a clear, two-fold goal: to enable fast and low-overhead stack tracing, and to provide reliable and asynchronous stack tracing. SFrame achieves this by eliminating the need to preserve the frame pointer in application code, and by offering a robust alternative for scenarios where EH Frame-based tracing is either undesirable or technically infeasible.

The core insight was to achieve simplicity and efficiency not by sacrificing accuracy, but by exploiting established, implicit ABI rules. Instead of a general-purpose instruction set to reconstruct arbitrary register states, SFrame focuses on encoding only the minimal, directly computable stack trace information: the Canonical Frame Address (CFA), the Frame Pointer (FP) if explicitly saved, and the Return Address (RA). This design paradigm allows for a direct lookup within the unwind table rather than an interpretive execution, dramatically simplifying the stack tracing logic and reducing its runtime footprint.
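To make the "direct lookup" idea concrete, here is a minimal sketch of a single unwind step driven by one row that carries only CFA, FP and RA information. The Row fields, the register names "sp" and "fp", and the convention that the caller's stack pointer equals the CFA are simplifying assumptions in the style of x86_64; this is not the on-disk SFrame layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical, simplified unwind row (not the real SFrame encoding)."""
    cfa_base: str              # "sp" or "fp": register the CFA is computed from
    cfa_offset: int            # CFA = value(cfa_base) + cfa_offset
    ra_offset: int             # return address is stored at CFA + ra_offset
    fp_offset: Optional[int]   # saved FP at CFA + fp_offset, None if FP was not saved


def unwind_step(row, regs, memory):
    """Recover the caller's pc, sp and fp with a few direct reads, no interpreter."""
    cfa = regs[row.cfa_base] + row.cfa_offset
    caller_pc = memory[cfa + row.ra_offset]
    caller_fp = regs["fp"] if row.fp_offset is None else memory[cfa + row.fp_offset]
    return {"pc": caller_pc, "sp": cfa, "fp": caller_fp}


# Made-up state just after `push %rbp; mov %rsp, %rbp` on an x86_64-like target.
ROW = Row(cfa_base="fp", cfa_offset=16, ra_offset=-8, fp_offset=-16)
REGS = {"sp": 0x7000, "fp": 0x7000}
MEMORY = {0x7000: 0x7100,      # saved frame pointer of the caller
          0x7008: 0x401234}    # return address
CALLER = unwind_step(ROW, REGS, MEMORY)
```

Here CALLER holds the return address 0x401234 as the caller's pc, the CFA 0x7010 as its sp, and 0x7100 as its fp. Repeating the step with the caller's row would walk further up the stack.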

In that sense, SFrame shares many similarities with the kernel-specific ORC format, but unlike ORC, SFrame is designed to be a general-purpose format, not limited to kernel code. Currently, SFrame officially supports x86_64, aarch64, and s390x, with the flexibility to incorporate other architectures and targets in the future.

Compactness of SFrame vs EH Frame

EH Frame is known for being extremely compact. This is essential because its main purpose is to aid in the implementation of exception handling, and therefore it must be loaded into memory so the run-time libraries can find it. SFrame must satisfy similar compactness constraints. However, the two formats achieve compactness in different ways:

  • EH Frame only tracks movements of data: it provides one “row” for each PC where something of interest happens to the location of the data used for backtracing (as opposed to a “row” for each PC). This is done by making use of DWARF opcodes and expressions. Effectively, what is encoded in EH Frame is a program composed of DWARF opcodes and expressions that, once executed, recreates the full table of tracked locations (conceptually speaking). This is neat, but requires implementing some non-trivial logic in the interpreter (the interpreter in this context is what executes the DWARF instructions to build the table).
  • SFrame, on the other hand, encodes an explicit table rather than a program to re-create it, but it achieves compactness by not encoding or storing explicitly what can be inferred from the ABIs of the supported architectures, namely x86_64, aarch64, and s390x. This allows SFrame to be processed in a very fast and simple way, since no interpreter is necessary. For example, if in some architecture and ABI the return address is always stored at some fixed offset in each stack frame, the call frame info format doesn’t bother to encode it. The stack tracer or error handler knows about this, and can directly go and fetch the value from the right place on the stack.
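The difference between the two approaches can be sketched as follows. The tuple-based instructions are simplified stand-ins loosely named after DWARF CFA operations (real ones are byte-encoded and far more numerous), and the addresses describe a made-up x86_64 prologue; the explicit table is only an illustration of the idea, not SFrame's actual encoding.

```python
# "Program" style: rows must be recreated by interpreting instructions.
PROGRAM = [
    ("def_cfa", "sp", 8),          # at function entry: CFA = sp + 8
    ("advance_loc", 1),            # past `push %rbp` (1 byte)
    ("def_cfa_offset", 16),        # CFA = sp + 16
    ("advance_loc", 3),            # past `mov %rsp, %rbp` (3 bytes)
    ("def_cfa_register", "fp"),    # CFA = fp + 16
]


def run(program, start_pc):
    """Interpret the program and return rows of (start_pc, cfa_reg, cfa_offset)."""
    rows = []
    pc, reg, off = start_pc, None, None
    for op, *args in program:
        if op == "advance_loc":
            rows.append((pc, reg, off))    # close the row for the region just left
            pc += args[0]
        elif op == "def_cfa":
            reg, off = args
        elif op == "def_cfa_offset":
            off = args[0]
        elif op == "def_cfa_register":
            reg = args[0]
        else:
            raise ValueError(f"unknown op {op!r}")
    rows.append((pc, reg, off))            # row for the final region
    return rows


# "Explicit table" style: the same rows stored directly, ready for lookup.
EXPLICIT_TABLE = [(0x1000, "sp", 8), (0x1001, "sp", 16), (0x1004, "fp", 16)]
```

Running run(PROGRAM, 0x1000) reproduces EXPLICIT_TABLE. The interpreter is small here only because three operations are modelled; the explicit form needs no such loop at all.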

The following table compares the sizes of the SFrame and EH Frame sections in a few real-life programs. Generally speaking, SFrame is about as compact as EH Frame on x86_64 and roughly a third smaller on aarch64, so it provides CFI (Call Frame Information) in a comparably compact form that is at the same time much simpler to decode and use.

CFLAGS="-g -O2 -Wa,--gsframe" CXXFLAGS="-g -O2 -Wa,--gsframe"

Program      x86_64 ratio             aarch64 ratio
             (SFrame size/EH size)    (SFrame size/EH size)
addr2line    1.03                     0.67
as           1.03                     0.69
gdb          0.91                     0.69
ld           1.02                     0.68
objcopy      1.03                     0.67
readelf      1.04                     0.65
cc1          1.00                     0.65
cc1plus      1.00                     0.65
emacs        0.98                     0.65
AVG          1.00                     0.67
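The AVG row can be checked with a few lines of arithmetic. The sketch below assumes it is a plain arithmetic mean of the per-program ratios, rounded to two decimals; the ratios are copied from the table.

```python
from statistics import mean

# (x86_64 ratio, aarch64 ratio) per program, as listed in the table.
RATIOS = {
    "addr2line": (1.03, 0.67),
    "as":        (1.03, 0.69),
    "gdb":       (0.91, 0.69),
    "ld":        (1.02, 0.68),
    "objcopy":   (1.03, 0.67),
    "readelf":   (1.04, 0.65),
    "cc1":       (1.00, 0.65),
    "cc1plus":   (1.00, 0.65),
    "emacs":     (0.98, 0.65),
}


def average_ratios(ratios):
    """Return the mean x86_64 and aarch64 ratios, rounded to two decimals."""
    x86_64 = mean(pair[0] for pair in ratios.values())
    aarch64 = mean(pair[1] for pair in ratios.values())
    return round(x86_64, 2), round(aarch64, 2)
```

Under that assumption, average_ratios(RATIOS) gives 1.0 and 0.67, matching the AVG row.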

The Larger Picture: SFrame’s Strategic Advantages over Existing Methods

Bringing it all together, SFrame emerges as a viable solution for stack tracing, offering distinct technical and practical advantages over both frame pointer-based and EH Frame-based methods. These benefits directly address the long-standing challenges of performance, reliability, and deployment across diverse computing environments.

Enabling Smaller, Faster Programs: The fundamental motivation for compilers to abandon dedicated frame pointers was performance. By freeing up a general-purpose register that would otherwise be reserved, compilers gain greater flexibility in register allocation. This reduces “register pressure” and minimizes the need for costly “register spilling” to memory. SFrame, by design, eliminates the need for applications to preserve a frame pointer for stack tracing, directly contributing to more compact and more performant generated code.

Efficacy in Constrained Environments: SFrame’s minimalistic design allows for stack tracing without requiring dynamic memory allocation (i.e., it can be “malloc-free”). This makes it exceptionally well-suited for highly restricted or performance-critical environments, such as within the kernel, in error handling contexts, or in BPF programs, where memory allocation is either forbidden or comes with prohibitive overhead. A typical SFrame-based stack tracer can be implemented in just a few hundred lines of code, further emphasizing its lean footprint. Projects like PARCA, for instance, have shown keen interest in SFrame for its suitability in such demanding contexts.

Eliminating Missed Frames and Enhancing Accuracy: Frame pointer updates occur during a function’s prologue (where the stack frame for that function is being set up). Crucially, the frame pointer register isn’t updated at the very beginning; stack frame space must first be allocated, or some minimal register saves done, before the frame pointer is updated. This creates a brief window within the function prologue where the frame pointer still points to the caller’s stack frame. If an asynchronous event, like a signal, arrives during these few instructions, a frame pointer-based unwinder will treat the caller’s frame as if it were the current one, so a frame is skipped entirely and the stack trace is incomplete. SFrame, being out-of-band information, avoids this inherent race condition, ensuring more accurate and complete backtraces.
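The prologue window can be reproduced in a toy model. The sketch below assumes an x86_64-style layout (saved frame pointer at fp, return address at fp + 8) with made-up addresses and labels, and a hypothetical out-of-band rule for the function's first instruction that stands in for a table lookup. In this model the sampled PC already lies in the new function, so the entry that drops out of the chain walk is the one for its immediate caller.

```python
# Toy stack for main -> a -> b. All addresses and labels are made up.
FP_MAIN, FP_A, FP_B = 0x7F00, 0x7E00, 0x7D00
MEM = {
    FP_MAIN: 0,       FP_MAIN + 8: "_start+0x10",   # chain ends when fp == 0
    FP_A: FP_MAIN,    FP_A + 8: "main+0x20",
    FP_B: FP_A,       FP_B + 8: "a+0x30",
}


def fp_walk(pc, fp, mem):
    """Classic frame pointer chain walk."""
    trace = [pc]
    while fp != 0:
        trace.append(mem[fp + 8])   # return address stored above the saved fp
        fp = mem[fp]                # follow the link to the caller's frame
    return trace


def walk_with_entry_row(pc, sp, fp, mem):
    """Same walk, but the first step uses an out-of-band rule for the entry PC:
    the return address is at [sp] and fp still holds the caller's value."""
    return [pc] + fp_walk(mem[sp], fp, mem)


# Sample taken in the body of b: the chain is complete.
in_body = fp_walk("b+0x12", FP_B, MEM)

# Sample taken on b's first instruction: `call` has pushed the return address
# at FP_B + 8, but `push %rbp; mov %rsp, %rbp` has not run, so fp is still FP_A.
at_entry_fp_only = fp_walk("b+0x0", FP_A, MEM)
at_entry_with_row = walk_with_entry_row("b+0x0", FP_B + 8, FP_A, MEM)
```

Comparing at_entry_fp_only with at_entry_with_row shows the gap: the frame pointer walk alone loses the "a+0x30" entry, while the walk that consults the per-PC rule recovers all four.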

Unlocking Compiler Optimization Opportunities: Modern compiler optimizations, such as “shrink wrapping” (where a function’s prologue/epilogue code is minimized and only executed when necessary), often rely on the freedom to manage the stack frame without the strict requirement of maintaining a frame pointer at all points of execution. By providing an alternative stack tracing mechanism, SFrame opens the door for compilers to implement even more aggressive optimizations, further improving program performance and code size.

The Sole Reliable Method for Certain Architectures: On architectures like s390x, unwinding using frame pointers is fundamentally unsupported or riddled with issues due to the lack of proper 64-bit ABI specifications and compiler support for a reliable “backchain” mechanism. For s390x, SFrame stands out as the only viable and reliable method to obtain accurate stack traces, making it a critical enabling technology for s390x.

More Efficient Stack Tracing: Integrating SFrame stack tracer support into the kernel is expected to bring a significant improvement for profiling in Linux. Early tests using s390x support for SFrame have already shown notable improvements in both the sampling rate and the size of the recorded data when capturing call graphs with perf. This makes performance profiling more efficient and less resource-intensive.

Consolidating Stack Tracer Implementations: The reality is that the vast majority of GNU/Linux distributions do not build their software with frame pointers enabled, or do so only for a subset of architectures. This means that for consistent, reliable stack tracing across the ecosystem (e.g., for kernel-level tools), supporting a dedicated stack frame format like EH Frame or SFrame is unavoidable. SFrame offers a unified, general solution that can cater to the needs of diverse distros and architectures, reducing the necessity for multiple stack tracer implementations. Frame pointers, while simple in theory, cannot serve as a universal solution for reliable stack tracing.

Adoption and Ongoing Work

There is increasing interest in adopting SFrame stack tracing across the GNU/Linux ecosystem. The format is gaining traction within key projects, paving the way for its widespread use in modern Linux distributions and beyond.

Crucially, SFrame-based stack tracing for the Linux kernel is seeing active development, with recent patches that allow deferred stack unwinding for userspace stacks. These have recently been merged into the tree for the upcoming 6.17 Linux kernel release. This will enable robust, efficient user-space stack tracing directly from kernel code. Furthermore, the SFrame backtracer has now been merged into Glibc 2.42, officially released on July 28, 2025 (thanks to Claudiu Zissulescu for the valuable work), providing a foundational component for SFrame utilization at the C library level. These developments collectively underscore SFrame’s growing momentum and its future as a standard for reliable stack tracing.

More to come

In a subsequent blog, we will cover the SFrame format itself in more detail: what SFrame stack trace information looks like and how ELF binaries make use of it in the GNU/Linux ecosystem. There is also a lot of ongoing activity around SFrame’s evolution to support the growing number of use cases. All this and more in future blog posts.

References

Articles

Linux distributions

Presentations