Hardcore Container Debugging

Vish Abrams
Architect, Cloud Development

You've read a bunch of online tutorials and you've managed to containerize your application. You have even exposed a port so you can reach the application from the outside world, but when you connect, you're greeted with an error page: "Cannot connect to the database". It's time to start debugging! Below you will find some methods for debugging containers, as well as some information about the crashcart tool we have developed at Oracle to make debugging easier.

Debugging Strategies

Containers can be a challenge to debug, especially when you are a little fuzzy on exactly what a container is and how it works. Some people treat containers like miniature VMs, and go so far as to run an SSH daemon inside their container so that they can log in when things go crazy. Others stick a bunch of useful tools inside their container and use `docker exec` to get a shell inside it. But for those of us with slightly-more-sane operational practices, what do we do when things go wrong?

Debugging from the Host

If you are using a microcontainer, your container only contains a single application and its dependencies. That means no debugging tools, no shell, no help at all! Fortunately, a lot of debugging can be done from the host. One of the most important tools in your arsenal is nsenter. A Linux container is a combination of quite a few isolation and protection primitives, but the most important of these to understand are namespaces. Namespaces isolate your containerized process from other processes on the system. Nsenter allows you to enter existing namespaces.

For example, let's say you wanted to debug some networking issues in your container. You could start by entering the network namespace of your container. To do this, first determine the pid of a process in your container:

PID=$(docker inspect -f "{{.State.Pid}}" <container-id>)

To get a shell in the network namespace you can use nsenter with the pid:

sudo nsenter -n -t $PID

Files that represent the namespaces can be found in the proc filesystem, and you can also use the location directly:

sudo nsenter -n/proc/$PID/ns/net

Nsenter is pretty powerful, especially for dealing with network issues. You can list interfaces or dump traffic with tools that you have installed on your host. If you need access to most of the other namespaces of the container, you can enter them in a similar way.
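
As a rough sketch of the point above, you can also run a single host tool in the container's network namespace without starting a shell. The helper name `in_netns` and the interface name `eth0` are assumptions made for illustration, and the example assumes `ip` and `tcpdump` are installed on the host:

```shell
# in_netns PID CMD [ARGS...]
# Run CMD (a tool installed on the host) inside the network namespace
# of process PID. The helper name is hypothetical, not part of nsenter.
in_netns() {
    pid=$1
    shift
    # -t selects the target process, -n enters only its network namespace,
    # so the host's filesystem (and its tools) stay visible.
    sudo nsenter -t "$pid" -n -- "$@"
}

# Example usage, assuming $PID was set with docker inspect as shown earlier:
#   in_netns "$PID" ip addr                # list the container's interfaces
#   in_netns "$PID" tcpdump -n -i eth0     # dump traffic (eth0 is a guess)
```

Because only the network namespace is entered, the binaries come from the host while the interfaces, routes, and sockets they see belong to the container.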

There is, however, one type of debugging that is challenging: accessing the container's filesystem. You can't enter the mount namespace of the container without losing access to the mount namespace of the host, which is where all your tools live. There are various ways to get access to the container's files depending on which version of docker and which storage driver you are using. One fairly straightforward technique is to look in /proc/$PID/root, although absolute symlinks will be broken (they resolve against the host's root, not the container's) and you will have to manually translate file locations between the two views.
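
The manual translation between the two views can be sketched with a pair of small helpers. Both function names are hypothetical, and the second one resolves only a single level of symlink, so treat it as an illustration rather than a complete resolver:

```shell
# container_path PID PATH
# Print where an absolute path inside the container shows up on the host.
container_path() {
    printf '/proc/%s/root%s\n' "$1" "$2"
}

# resolve_container_link PID PATH
# PATH is a symlink inside the container. Print the host-side location of
# its target, re-rooting absolute targets under /proc/PID/root so they do
# not accidentally point at the host's own files.
resolve_container_link() {
    link=$(container_path "$1" "$2")
    target=$(sudo readlink "$link") || return 1
    case $target in
        /*) container_path "$1" "$target" ;;                  # absolute target
        *)  printf '%s/%s\n' "$(dirname "$link")" "$target" ;; # relative target
    esac
}

# Example usage, assuming $PID is a process in the container:
#   sudo cat "$(container_path "$PID" /etc/hosts)"
#   sudo cat "$(resolve_container_link "$PID" /etc/os-release)"
```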

Roadblocks to an Ideal Solution

The perfect solution would involve somehow mounting your debug tools inside the container when you need them, and then removing them when you are finished so you don't leave around any security vulnerabilities. There are two problems with this idea:

  1. You can't just load your stuff over what is in the container or you defeat the purpose of entering the mount namespace of the container. You have to put your debugging tools in a non-standard location and many tools are not happy when they are in other directories. There are issues with search paths for libraries and other problems that make loading your tools into a new location like /debug a non-starter.
  2. You cannot just mount a directory from the host into your container's mount namespace. For security reasons, bind mounting across namespaces is not allowed. You could definitely restart the container with your tools in a new volume mount, but this means a restart at the beginning and end of your debugging session, which can be very disruptive in some scenarios.

Removing the Roadblocks

The first thing we need is a set of debugging tools that are happy living in an alternative location. In order to be sure that things are going to run without a hitch, the entire build chain from binutils on should be built with a non-standard prefix. In addition, library dependencies should be static to make sure any libraries in the container don't conflict with our debugging tools and cause problems.

It turns out there is a pretty cool packaging system that builds in an alternative location: nix. Using nix allows us to load the /nix directory with our tools and, as long as the container itself wasn't built with nix, we are free from conflicts. To also support debugging containers built with nix, we could choose an alternate directory, like /dev/crashcart (it can be useful to prepend /dev because /dev is almost always a writable tmpfs in containers, which means we can mount things there even if the root filesystem happens to be read-only).

To clear the second roadblock we need a way to mount new things into the container namespace. One option for this is to create an rslave mount when you create the container. For example, you could load an rslave mount into your container namespace with docker's volume command:

docker run -v /tmp/mymaster:/dev/crashcart:rslave mycontainer

This makes /dev/crashcart in the container a slave mount of /tmp/mymaster on the host. That means if you bind mount a directory over /tmp/mymaster, it will be propagated to /dev/crashcart in the container. With this technique we can bind mount tools in on demand and remove them later, then use nsenter to enter the mount namespace and run them. There is still one drawback: you must create the special volume mount when you start every container. If you didn't run your container with the rslave mount, you still have to restart it to do your debugging. Wouldn't it be great if there were some way to do it without starting the container with a volume?
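
The on-demand attach and detach could look roughly like the following. This is a sketch under assumptions: /opt/debug-tools is a hypothetical host directory holding relocatable tools, the function names are made up, and propagation only works if /tmp/mymaster sits on a shared mount on the host:

```shell
MASTER=/tmp/mymaster      # host side of the rslave volume from the docker run line
TOOLS=/opt/debug-tools    # hypothetical host directory containing the debug tools

# Make the tools appear at /dev/crashcart inside the running container.
attach_tools() {
    sudo mount --bind "$TOOLS" "$MASTER"
}

# Remove them again when the debugging session is over.
detach_tools() {
    sudo umount "$MASTER"
}

# Example session, assuming $PID is a process in the container and the
# tools directory ships its own bin/bash (an assumption):
#   attach_tools
#   sudo nsenter -t "$PID" -m /dev/crashcart/bin/bash
#   detach_tools
```

Unmounting on the host propagates as well, so nothing is left behind in the container once the session ends.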

Enter crashcart

It turns out there is a method that can be used to mount tools in the container on demand. It involves some tricky hacks, and it should be noted that it will not work if user namespaces are in use unless you are on kernel 4.8 or later. The strategy is to mknod a block device inside the container's mount namespace and then use the mount syscall to mount the block device to the filesystem. In order to have a block device, we can package up our binaries into an ext3 filesystem and create a loopback block device using /dev/loop.
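
The host-side half of that strategy can be sketched in shell. This is not crashcart's actual code: the function names and the image file name are hypothetical, and the in-container steps are shown only as comments because they need mknod and mount to be available inside the container's mount namespace:

```shell
# attach_loop IMAGE
# Attach an existing ext3 image file to the first free loop device and
# print the device name (for example /dev/loop0).
attach_loop() {
    sudo losetup --find --show "$1"
}

# dev_numbers DEVICE
# Print the decimal major and minor numbers of a device node.
# GNU stat reports them in hex (%t and %T), while mknod expects decimal.
dev_numbers() {
    set -- $(stat -c '%t %T' "$1")
    printf '%d %d\n' "0x$1" "0x$2"
}

# Example, assuming crashcart.img is an ext3 image holding the tools:
#   LOOP=$(attach_loop crashcart.img)
#   set -- $(dev_numbers "$LOOP")      # $1 = major, $2 = minor
#
# Inside the container's mount namespace, the equivalent of the following
# would then be needed (the device node name is hypothetical):
#   mknod /dev/crashcart-dev b <major> <minor>
#   mount -t ext3 /dev/crashcart-dev /dev/crashcart
```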

Doing this manually is almost impossible if you don't already have mknod and mount inside your container, so we developed a Rust utility called crashcart to do it for you. Crashcart will mount the image into your container's namespace and run /dev/crashcart/bin/bash for you, using either a method similar to nsenter or by calling docker exec. This gives you access to any tools that have been put into the crashcart image.

Why Rust?

All of the reasons for using Rust covered in Building a Container Runtime in Rust apply to crashcart as well. Even though crashcart has under 700 lines of code and could be written in C, it is always nice to be memory safe to avoid potential security vulnerabilities. Rust can be a bit tough to read for newcomers, but we encourage people who are new to Rust to dive in and collaborate on this project. It is a fascinating language and has some very useful characteristics.

The Future

Crashcart provides a unique way to debug containers today, but things could definitely get better. We hope that making this tool available to the community will help things improve. Some ideas for potential improvements to the techniques follow:

  1. Building a crashcart image with nix is very slow because it cannot take advantage of the standard nix package servers. It would be possible to provide alternative package servers that support the new location, but we are hoping other packagers can create alternative crashcart images. Think of the build-image.sh script as a prototype; it would be great to see alternatives that could install packages from rpms or debs or other distributions.
  2. It would be nice to have an easier method to side-load binaries that wouldn't require such a complicated set of syscalls. One way to do this would be for the kernel to relax restrictions around mounting across namespaces. Another option would be for docker to create an rslave mount by default. This would also allow binaries to be mounted from the host more easily on demand.
