Tracing the ptrace

The ptrace system call in Linux enables a process to observe and control the execution of another process. It is a core component of the Linux kernel’s application debugging infrastructure, widely used by various tools and applications. By understanding how ptrace works and how to use it, developers can create more powerful and flexible tools and applications.

Before proceeding, let’s focus this blog on the target x86_64/Linux architecture. Debuggers are highly architecture-dependent, and field/register names may vary across different architectures.

ptrace enables debuggers, tracers, and various introspection tools. It allows one process (the tracer) to control another process (the tracee) by stopping it, reading and writing its memory and registers, single-stepping, intercepting syscalls, setting breakpoints, and performing other operations. Below are some of the requests provided by ptrace to analyze the behavior of a tracee.

This article serves as a valuable resource for developers seeking to debug applications, create new debugging tools, and improve security software. Additionally, it addresses key challenges related to Linux system issues associated with ptrace, including process tracing, system call monitoring, and performance optimization. Whether you’re working on system-level debugging, reverse engineering, or enhancing security mechanisms, this guide will equip you with the knowledge needed to leverage ptrace effectively.

Some of the important `ptrace` requests and Errors

PTRACE_ATTACH: Attaches the tracer to a process, allowing it to observe and control the tracee. First, a tracee must be attached to the tracer. Once attached, subsequent commands apply to each thread individually. In a multithreaded process, each thread can either be attached to a tracer or left unattached.
PTRACE_SEIZE: Attach to the process specified in pid, making it a tracee of the calling process. Unlike PTRACE_ATTACH, PTRACE_SEIZE does not stop the process on attach. Debugging without disrupting or stopping processes is more suitable for real-time applications or high-priority processes.
PTRACE_DETACH: Detaches the tracer from the process, allowing the tracee to resume normal execution.
PTRACE_TRACEME: Instructs the tracee to allow itself to be traced by a parent process.
PTRACE_PEEKDATA: Reads data from the tracee’s memory at a specified address.
PTRACE_POKEDATA: Writes data to the tracee’s memory at a specified address.
PTRACE_GETREGS: Retrieves the registers (e.g., general-purpose registers, program counter, stack pointer) from the tracee.
PTRACE_SETREGS: Sets the registers of the tracee, which can be useful for manipulating the execution flow or state.
PTRACE_CONT: Continues the execution of the tracee after a stop, optionally with a signal.
PTRACE_SYSCALL: Stops the tracee every time it enters or exits a system call, useful for tracing system calls.
PTRACE_SINGLESTEP: Executes a single instruction in the tracee, then stops. This is useful for debugging at the instruction level.

More details can be found in man pages of ptrace provided in the references.

One importnt error we should notice is EPERM, The specified process cannot be traced. This could be because the tracer has insufficient privileges (the required capability is CAP_SYS_PTRACE).

Ptrace action sequence

The interaction between a tracer and a tracee follows a defined sequence of actions.

Attachment :

The tracer initiates the tracing process by attaching to the tracee. This can occur in one of the following ways:
- Tracee allows the tracing : The parent process calls fork(), and in the child process, the following call is made: ptrace(PTRACE_TRACEME, ...). The child then executes exec(). The PTRACE_TRACEME command informs the kernel that the child wishes to be traced by its parent.
- Tracer Attaching to a Running tracee : The tracer attaches to an existing, unrelated process by calling: ptrace(PTRACE_ATTACH, ...). With this, the tracee receives a SIGSTOP signal and transitions into a stopped state.

Kernel State Modification :

When a tracee is successfully attached, the kernel sets a specific flag in the task_struct. The PT_PTRACED flag is set to indicate that the process is under tracing.

Event Interception Setup :

The tracer specifies which events should cause the tracee to pause, using commands such as: PTRACE_SYSCALL: Stop at every system call and PTRACE_SINGLESTEP: Stop after each instruction. These commands are issued through ptrace().

Tracee Stops on Events :

When an intercepted event (e.g., a system call) occurs: The kernel suspends the tracee. The kernel sends a SIGTRAP signal to the tracer. The tracer is unblocked from its wait() or waitpid() call.

Tracer’s Actions :

Once notified, the tracer gains control of the tracee and may perform the following actions:
- Register Access : ptrace(PTRACE_GETREGS, …), ptrace(PTRACE_SETREGS, …)
- Memory Access : ptrace(PTRACE_PEEKDATA, …), ptrace(PTRACE_POKEDATA, …)

Code Injection :

The tracer can modify the tracee’s memory and instruction pointer to inject and execute arbitrary code.

Resuming Execution :

After performing the desired inspection or modification, the tracer resumes the tracee’s execution by issuing: ptrace(PTRACE_CONT, ...); The tracee continues execution either from where it was paused or from an address specified by the tracer.

Role of task_struct in Tracing :

The Linux kernel represents each process using the task_struct. When a process is being traced via ptrace, the kernel modifies the following fields:
- ptrace Field: The PT_PTRACED flag is set to indicate the process is being traced.
- parent Field: This pointer is updated to reference the tracer process. The original parent is replaced for the purpose of receiving and sending signals.

Lets have our own debugger

Basic strategies to trace a process

tracer spawns and traces a child(tracee). The tracee calls PTRACE_TRACEME, and then callsexecve(). Parent waitpid()‘s and uses ptrace.
Or
Attach to an existing process (tracee), the tracer uses PTRACE_ATTACH <pid>, waits for the stop, then controls it.

We require waitpid() in the tracer to receive stop notifications (signals, breakpoints, syscall stops).

Example 1 :

mytracee.c
----------------
/* LICENSE: GPLv2 */
#include <stdio.h>
#include <unistd.h>

int abc(int local_abc) {
    //lets  loop and get debugged by the tracer
    printf("Addeess of local_abc: %p\n", &local_abc);
    while(1) {
    }
    return(local_abc);
}

int main() {
    int local_main = 0x12345;
    int pid = getpid();
    printf("Tracee PID: %d\n", getpid());
    abc(local_main);
    return 0;
}

mytracer.c
----------------
/* LICENSE: GPLv2 */
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <sys/user.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

pid_t tracee_pid;
int status;
struct user_regs_struct regs;

long get_word(pid_t tracee_pid, void *addr) {
    long word;
    errno = 0;
    word = ptrace(PTRACE_PEEKTEXT, tracee_pid, addr, NULL);
    if (errno != 0) {
        perror("ptrace(PTRACE_PEEKTEXT)");
        ptrace(PTRACE_DETACH, tracee_pid, NULL, NULL);
        exit(1);
    }
    return word;
}

void set_word(pid_t tracee_pid, void *addr, long data) {
    errno = 0;
    if (ptrace(PTRACE_POKETEXT, tracee_pid, addr, data) == -1) {
        perror("ptrace(PTRACE_POKETEXT)");
        exit(1);
    }
}

int wait_and_dumpreg() {
    // wait for child/tracee to complete exec and stop in main
    waitpid(tracee_pid, &status, 0);

    if(!WIFSTOPPED(status)) {
        printf("Tracee error, did not stop");
        return(1);
    }

    if(ptrace(PTRACE_GETREGS, tracee_pid, NULL, &regs) == -1){
        printf("PTRACE_GETREGS Error..");
    } else {
        printf("RIP: 0x%llx RSP: 0x%llx \n",
               (unsigned long long)regs.rip,
               (unsigned long long)regs.rsp);
    }

    return(0);
}

int main(void){
    int rc = 0;
    long data;
    unsigned long addr;
    unsigned long txt;
    long original_data;

    tracee_pid = fork();
    if(tracee_pid == 0){
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("./tracee", "./tracee", NULL);
        perror("execl");
        _exit(1);
    }

    rc = wait_and_dumpreg();
    ptrace(PTRACE_CONT, tracee_pid, NULL, NULL);
    rc = wait_and_dumpreg();

    printf("Enter the memory address to read (in hexadecimal): ");
    if (scanf("%lx", &addr) != 1) {
        fprintf(stderr, "Invalid address format\n");
        ptrace(PTRACE_DETACH, tracee_pid, NULL, NULL);  // Detach from the target process
        return (1);
    }

    // Read data from the target process memory
    data = get_word(tracee_pid, (void *)addr);
    printf("Data at address 0x%lx: 0x%x\n", addr, data);

    data = 0x678910;
    printf("Write data 0x%lx at address 0x%lx\n", data, addr);
    set_word(tracee_pid, (void *)addr, data);

    data = get_word(tracee_pid, (void *)addr);
    printf("Read data back from address 0x%lx: 0x%x\n", addr, data);

    ptrace(PTRACE_CONT, tracee_pid, NULL, NULL);
    return (rc);
}

Compile and Test:

# cc -g mytracee.c -o tracee
# cc -g mytracer.c -o tracer

Terminal1	Terminal2	Comments
./tracer RIP: 0x7f3474255f30 RSP: 0x7ffe9065e2e0 Tracee PID: 2686108 Addeess of local_abc: 0x7ffe9065e1ec RIP: 0x4005f7 RSP: 0x7ffe9065e1e0 Enter the memory address to read (in hexadecimal): Enter the memory address to read (in hexadecimal): 0x7ffe9065e1ec Data at address 0x7ffe9065e1ec: 0x12345 Write data 0x678910 at address 0x7ffe9065e1ec Read data back from address 0x7ffe9065e1ec: 0x678910	# ps aux \|grep trace root 2686107 0.0 S+ 10:43 0:00 ./tracer root 2686108 99.9 R+ 10:43 9:00 ./tracee # kill -18 2686108	Tracer forked the tracee Tracer waits for the tracee to attach. Once attached, tracer prints the IP and SP Tracee prints the `PID` and the address of the local variable `local_abc` The tracer is in a sleep state, waiting for the tracee to reattach. Meanwhile, the tracee is running in an infinite loop, executing `abc` Let’s send a `SIGCONT` signal to the tracee to reattach it. The tracee reattaches, and the tracer prints the SP and IP. Let’s read the local_abc. We read the variable local_abc at `0x7ffe9065e1ec` and wrote the value `0x678910` into it. We read the value of local_abc again from the tracee.

Lets validate the values:

# gdb -p 2686108

(gdb) bt
#0  abc (local_abc=6785296) at mytracee.c:8
#1  0x0000000000400630 in main () at mytracee.c:17

(gdb) info registers
rsp 0x7ffe9065e1e0     <<<GDB RSP match with printed SP>>>

(gdb) disassemble 0x7f3474255f30
Dump of assembler code for function _start:

(gdb) disassemble 0x4005f7
Dump of assembler code for function abc:
<<<Printed IP points to function abc>>> 

# addr2line -fe tracee 0x4005f7
abc
mytracee.c:8

(gdb) p/x local_abc
$2 = 0x678910
<<<We read and write the veriables successfully in the tracee>>>

Inference :

We witnessed gdb a.out or strace a.out type tracing.
We examined how the tracee attaches after the exec call.
We are aware how to continue the tracee after attach.
We are aware how to wait for a tracee.
We are aware how tracee attached to tracer on signals.
We examined the local variables on tracee and changed them.
We examined the instruction pointer at various stages of execution.

long insert_breakpoint(pid_t pid, void *addr) {
    long data = get_word(pid, addr);  // Read original instruction
    long breakpoint = (data & ~0xFF) | BREAKPOINT; // Set breakpoint byte
    printf("Setting break point 0x%lx\n", breakpoint);
    set_word(pid, addr, breakpoint);  // Write breakpoint instruction
    return(data);
}

void remove_breakpoint(pid_t pid, void *addr, long original_data) {
    set_word(pid, addr, original_data);  // Restore original instruction
}

The above can be used to tweak the text section of the tracee

Example 2

mytracee.c
----------------
/* LICENSE: GPLv2 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    while (1) {
        sleep(5);
        getpid();
        getppid();
    }
}

mytracer.c
----------------
/* LICENSE: GPLv2 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>
#include <sys/user.h>

void do_tracer(pid_t tracee_pid) {
    int status;
    struct user_regs_struct regs;

    if (ptrace(PTRACE_ATTACH, tracee_pid, NULL, NULL) == -1) {
        perror("ptrace PTRACE_ATTACH");
        return;
    } else {
        printf("Tracee attached successfully\n");
    }

    // Wait for the tracee to stop on its first instruction
    waitpid(tracee_pid, &status, 0);

    if (WIFSTOPPED(status)) {
        printf("Child stopped, PID: %d\n", tracee_pid);

        while (1) {
            // Wait for the next stop (system call, breakpoint, etc.)
            if (ptrace(PTRACE_GETREGS, tracee_pid, NULL, &regs) == -1) {
                perror("ptrace GETREGS");
                exit(1);
            }

            printf("Trace syscall orig_rax=%lld:0x%llx \n",regs.orig_rax, regs.orig_rax);
            ptrace(PTRACE_SYSCALL, tracee_pid, NULL, NULL);

            // Wait for the tracee to stop again
            waitpid(tracee_pid, &status, 0);

            if (WIFEXITED(status)) {
                printf("Child exited with status %d\n", WEXITSTATUS(status));
                break;
            }
        }
    }
}

int main(int argc, char *argv[]) {
    pid_t tracee_pid;
    if (argc != 2) {
        fprintf(stderr, "Usage: %d <pid_of_program_to_debug>\n", argv[0]);
        exit(1);
    }
    tracee_pid = atoi(argv[1]);
    do_tracer(tracee_pid);

    return 0;
}

Complile and Test:

# cc -g mytracee.c -o tracee
# cc -g mytracer.c -o tracer

# ./tracee &
2690790
# ./tarcer 2690790
Tracee attached successfully
Child stopped, PID: 2690790
...
Trace syscall orig_rax=39:0x27
Trace syscall orig_rax=39:0x27
Trace syscall orig_rax=110:0x6e
Trace syscall orig_rax=110:0x6e
Trace syscall orig_rax=35:0x23
Trace syscall orig_rax=35:0x23
Trace syscall orig_rax=39:0x27
Trace syscall orig_rax=39:0x27
Trace syscall orig_rax=110:0x6e
Trace syscall orig_rax=110:0x6e
Trace syscall orig_rax=35:0x23
Trace syscall orig_rax=35:0x23
...

Sycall Table in x86_64 :

nanosleep : 35:0x23
getpid : 39:0x27
getppid : 110:0x6e

Manging EPERM with CAP_SYS_PTRACE:

Unprivileged processes are not permitted to trace other processes, due to security restrictions. The capability required for a process to act as a tracer is CAP_SYS_PTRACE. Let’s explore this with an example.

privileged_user # ./target &
[1] 1092690
# Target process started, PID: 1092690

unprivileged_user # ./tracer
Enter PID of the target process: 1092690
ptrace attach failed: Permission denied (EPERM): Operation not permitted


privileged_user # setcap cap_sys_ptrace=eip ./tracer

unprivileged_user # ./tracer
Enter PID of the target process: 1092690
Successfully attached to process 1092690

Inference :

We witnessed gdb -p <pid> type tracing.
The tracee loops trough sleep, getpid and getppid syscalls
We examined the sycalls correctly , similar to strace

ptrace(PTRACE_SINGLESTEP, child_pid, NULL, NULL);

Instead of tracing system calls, we can single-step the tracee’s code by using PTRACE_SINGLESTEP within the do_tracer loop of the tracer.

With this, we are now confident in proceeding with the design of a tracer based on ptrace.

NOTE : To keep this article simple and accessible, the examples provided are based on single-threaded programs. Please note that multi-threaded programs require additional handling and considerations, which are not covered in this basic guide.

Some of the softwares deployed drawn on `ptrace`

gdb: gdb is a debugger that uses ptrace to control the execution of a process and examine its memory and registers.
strace: strace is a system monitoring tool that uses ptrace to monitor the system calls made by a process.
SELinux: SELinux is security software that uses ptrace to monitor and control the behavior of processes.
qemu: qemu is virtualization software that uses ptrace to emulate the behavior of a process.
systemtap, valgrind perf: These tools also use ptrace to monitor, analyze and control the behavior of processes.

Precautions with `ptrace`:

While ptrace is a powerful tool for debugging and introspection, its use comes with several precautions and potential pitfalls.

Security Risks:

ptrace can be a powerful tool, but it can also be used to compromise the security of a system.

Privilege Escalation: ptrace can potentially be used by malicious processes to manipulate or monitor other processes
Access Control: ptrace can access to processes’ memory and registers and compromise the integrity of the system.

Performance Overhead:

ptrace introduces significant overhead, as it requires frequent context switches between the tracer and the tracee
ptrace can introduce delays in timing-critical code and may create race conditions, potentially affecting application performance and behaviour.
We must consider the potential for deadlocks and race conditions due to the blocking nature of ptrace.

ptrace is primarily used by tracers to attach to a process, enabling interaction during the initial stages of instrumentation setup. This capability is often used in conjunction with other mechanisms to support detailed process monitoring. Advanced debugging features, such as single-stepping, are typically associated with debuggers and fall outside the primary scope of ptrace‘s functionality.

Architecture and Operating system dependency:

ptrace is architecture-sensitive, so careful consideration is required when using it for debugging software. The available ptrace features may be limited depending on the platform.
Migrating software that uses ptrace across platforms and operating systems requires extra care.

Conclusion

ptrace is a powerful and versatile tool that enables detailed control and inspection of processes on Linux. However, It is crucial to be aware of the security implications, careful handling of permissions and inline to kernel security guidelines, while using ptrace. While ptrace can have some performance challenges, despite this, ptrace remains the most important tool for debugging, process inspection, and system-level monitoring. ptrace is an invaluable tool for systems programmers, security professionals and reverse engineers.

Reference

https://man7.org/linux/man-pages/man2/ptrace.2.html

Tracing the ptrace

Some of the important `ptrace` requests and Errors

Ptrace action sequence

Attachment :

Kernel State Modification :

Event Interception Setup :

Tracee Stops on Events :

Tracer’s Actions :

Code Injection :

Resuming Execution :

Role of task_struct in Tracing :

Lets have our own debugger

Basic strategies to trace a process

Example 1 :

Compile and Test:

Lets validate the values:

Inference :

Example 2

Complile and Test:

Manging EPERM with CAP_SYS_PTRACE:

Inference :

Some of the softwares deployed drawn on `ptrace`

Precautions with `ptrace`:

Security Risks:

Performance Overhead:

Architecture and Operating system dependency:

Conclusion

Reference

Partha Satapathy

Ksplice Known Exploit Detection for nftables (Netfilter), vlan, OpenVSwitch

Mount It Right: Rescuing Container Startup Times with favordynmods

Tracing the ptrace

Some of the important ptrace requests and Errors

Ptrace action sequence

Attachment :

Kernel State Modification :

Event Interception Setup :

Tracee Stops on Events :

Tracer’s Actions :

Code Injection :

Resuming Execution :

Role of task_struct in Tracing :

Lets have our own debugger

Basic strategies to trace a process

Example 1 :

Compile and Test:

Lets validate the values:

Inference :

Example 2

Complile and Test:

Manging EPERM with CAP_SYS_PTRACE:

Inference :

Some of the softwares deployed drawn on ptrace

Precautions with ptrace:

Security Risks:

Performance Overhead:

Architecture and Operating system dependency:

Conclusion

Reference

Authors

Partha Satapathy

Ksplice Known Exploit Detection for nftables (Netfilter), vlan, OpenVSwitch

Mount It Right: Rescuing Container Startup Times with favordynmods

Some of the important `ptrace` requests and Errors

Some of the softwares deployed drawn on `ptrace`

Precautions with `ptrace`: