Anatomy of a Debian package

Ever wondered what a .deb file actually is? How is it put together, and what's inside it, besides the data that is installed to your system when you install the package? Today we're going to break out our sysadmin's toolbox and find out. (While we could just turn to deb(5), that would ruin the fun.) You'll need a Debian-based system to play along. Ubuntu and other derivatives should work just fine.

Finding a file to look at

Whenever APT downloads a package to install, it saves it in a package cache, located in /var/cache/apt/archives/. We can poke around in this directory to find a package to look at.
spang@sencha:~> cd /var/cache/apt/archives
spang@sencha:/var/cache/apt/archives>
spang@sencha:/var/cache/apt/archives> ls
apache2-utils_2.2.16-2_amd64.deb
app-install-data_2010.08.21_all.deb
apt_0.8.0_amd64.deb
apt_0.8.5_amd64.deb
aptitude_0.6.3-3.1_amd64.deb
...
nano, the text editor, ought to be a simple package. Let's take a look at that one.

spang@sencha:/var/cache/apt/archives> cp nano_2.2.5-1_amd64.deb ~/tmp/blog
spang@sencha:/var/cache/apt/archives> cd ~/tmp/blog

Digging in

Let's see what we can figure out about this file. The file command is a nifty tool that tries to figure out what kind of data a file contains.

spang@sencha:~/tmp/blog> file --raw --keep-going nano_2.2.5-1_amd64.deb 
nano_2.2.5-1_amd64.deb: Debian binary package (format 2.0)
- current ar archive
- archive file
Hmm, so file, which identifies filetypes by performing tests on their contents (rather than by looking at the file extension or something else cosmetic), must have a special test that identifies Debian packages. Since we passed the command the --keep-going option, though, it went on to report every other test that matches the file. That is useful because these later matches are less specific, and in our case they tell us what a "Debian binary package" actually is under the hood: an "ar" archive!
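To get a feel for what such a test looks like, here is a rough imitation in shell. It is a sketch, not file's actual logic: it assumes that an ar magic string at the start of the file, followed by a first member named debian-binary, is enough to call something a new-style .deb. The sample.deb it builds is a hand-made stand-in rather than a real package, and looks_like_deb is a name made up for this example.

```shell
# Build a tiny stand-in for a .deb by hand so this runs anywhere:
# the 8-byte ar magic, one 60-byte member header, and the member's data.
tmp=$(mktemp -d)
{
  printf '!<arch>\n'
  printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' debian-binary 0 0 0 100644 4
  printf '2.0\n'
} > "$tmp/sample.deb"

# Hypothetical helper: succeeds if the file starts like a new-style .deb.
looks_like_deb() {
  magic=$(head -c 7 "$1")
  first_member=$(dd if="$1" bs=1 skip=8 count=13 2>/dev/null)
  [ "$magic" = '!<arch>' ] && [ "$first_member" = 'debian-binary' ]
}

if looks_like_deb "$tmp/sample.deb"; then
  result="looks like a Debian binary package"
else
  result="not a new-style .deb"
fi
echo "$result"
```

Pointing looks_like_deb at the nano package copied earlier should give the same answer, since a real .deb begins with the same bytes.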

Aside: a little bit of history

Back in the day, in 1995 and before, Debian packages used to use their own ad-hoc archive format. These days, you can find that old format documented in deb-old(5). The new format was added to be "saner and more extensible" than the original. You can still find binaries in the old format on archive.debian.org. You'll see that file tells us that these debs are different; it doesn't know how to identify them in a more specific way than "a bunch of bits":

spang@sencha:~/tmp/blog> file --raw --keep-going adduser-1.94-1.deb
adduser-1.94-1.deb: data
Now we can crack open the deb using the ar utility to see what's inside.

Inside the box

ar takes an operation code and modifier flags and the archive to act upon as its arguments. The x operation tells it to extract files, and the v modifier tells it to be verbose.

spang@sencha:~/tmp/blog> ar vx nano_2.2.5-1_amd64.deb
x - debian-binary
x - control.tar.gz
x - data.tar.gz
So, we have three files.
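The ar format itself is simple enough to walk by hand. The sketch below assumes the common ar layout: an 8-byte magic string, then for each member a 60-byte header whose first 16 bytes hold the name and whose bytes 49 to 58 hold the size in decimal, followed by the data padded to an even length. The archive is a hand-made stand-in with filler bytes in place of the tarballs, and hdr is a helper invented for this example.

```shell
tmp=$(mktemp -d)
f="$tmp/sample.deb"

# Hypothetical helper: emit one 60-byte ar member header (name, size).
hdr() {
  printf '%-16s%-12s%-6s%-6s%-8s%-10s`\n' "$1" 0 0 0 100644 "$2"
}

# Hand-made stand-in for a .deb; the "tarballs" are just filler bytes.
{
  printf '!<arch>\n'
  hdr debian-binary 4;  printf '2.0\n'
  hdr control.tar.gz 3; printf 'abc\n'   # 3 data bytes + 1 padding byte
  hdr data.tar.gz 2;    printf 'xy'
} > "$f"

total=$(( $(wc -c < "$f") ))
offset=8                      # skip the global "!<arch>\n" magic
members=""
while [ "$offset" -lt "$total" ]; do
  header=$(dd if="$f" bs=1 skip="$offset" count=60 2>/dev/null)
  name=$(printf '%s' "$header" | cut -c1-16 | tr -d ' ')
  size=$(printf '%s' "$header" | cut -c49-58 | tr -d ' ')
  echo "$name ($size bytes)"
  members="$members $name:$size"
  # Data follows the header and is padded to an even length.
  offset=$(( offset + 60 + size + size % 2 ))
done
```

In practice you would let ar do this: its t operation lists the members without extracting anything, and p prints a member to standard output. The sketch ignores ar extras such as long-name tables, which a .deb's three short member names don't need.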

debian-binary

spang@sencha:~/tmp/blog> cat debian-binary
2.0
This is just the version number of the binary package format being used, so tools know what they're dealing with and can modify their behaviour accordingly. One of file's tests uses the string in this file to add the package format to its output, as we saw earlier.
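As a sketch of what "modify their behaviour accordingly" might look like, assume a tool that understands any 2.x format (deb(5) describes minor version bumps as compatible and major ones as not). check_format is a hypothetical name, not something dpkg provides.

```shell
# Hypothetical check: accept any 2.x format, refuse everything else.
check_format() {
  case "$1" in
    2.*) echo "supported" ;;
    *)   echo "unsupported" ;;
  esac
}

check_format "2.0"
check_format "3.0"
```

With the extracted file at hand, the call would be check_format "$(cat debian-binary)".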

control.tar.gz

spang@sencha:~/tmp/blog> tar xzvf control.tar.gz
./
./postinst
./control
./conffiles
./prerm
./postrm
./preinst
./md5sums
These control files are used by the tools that work with the package and install it to the system—mostly dpkg.

control
spang@sencha:~/tmp/blog> cat control
Package: nano
Version: 2.2.5-1
Architecture: amd64
Maintainer: Jordi Mallach 
Installed-Size: 1824
Depends: libc6 (>= 2.3.4), libncursesw5 (>= 5.7+20100313), dpkg (>= 1.15.4) | install-info
Suggests: spell
Conflicts: pico
Breaks: alpine-pico (<= 2.00+dfsg-5)
Replaces: pico
Provides: editor
Section: editors
Priority: important
Homepage: http://www.nano-editor.org/
Description: small, friendly text editor inspired by Pico
 GNU nano is an easy-to-use text editor originally designed as a replacement
 for Pico, the ncurses-based editor from the non-free mailer package Pine
 (itself now available under the Apache License as Alpine).
 .
 However, nano also implements many features missing in pico, including:
  - feature toggles;
  - interactive search and replace (with regular expression support);
  - go to line (and column) command;
  - auto-indentation and color syntax-highlighting;
  - filename tab-completion and support for multiple buffers;
  - full internationalization support.
This file contains a lot of important metadata about the package. In this case, we have:
  • its name
  • its version number
  • binary-specific information: which architecture it was built for, and roughly how much disk space it takes up once installed (Installed-Size is an estimate in kibibytes, so about 1.8 MiB here)
  • its relationship to other packages (on the Depends, Suggests, Conflicts, Breaks, and Replaces lines)
  • the person who is responsible for this package in Debian (the "maintainer")
  • how the package is categorized in Debian as a whole: nano is in the "editors" section. A complete list of archive sections can be found in the Debian Policy Manual.
  • a "priority" rating. "Important" means that the package "should be found on any Unix-like system". You'd be hard-pressed to find a Debian system without nano.
  • a homepage
  • a description which should provide enough information for an interested user to figure out whether or not she wants to install the package
One line that takes a bit more explanation is the "Provides:" line. It means that nano, when installed, counts not only as the nano package but also as the editor package. editor is a virtual package: it doesn't really exist, and is only ever provided by other packages. This way, packages that need a text editor can depend on "editor" without caring which of the many suitable editors is actually installed.

You can get most of this same information for installed packages and packages from your configured package repositories using the command aptitude show <packagename>, or dpkg --status <packagename> if the package is installed.
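Because the control file is plain "Field: value" text, simple cases can be picked apart with standard tools. The sketch below works on an abbreviated copy of the control file shown above; field is a helper made up for this example, and it only handles single-line fields with exactly this capitalization.

```shell
control=$(mktemp)
cat > "$control" <<'EOF'
Package: nano
Version: 2.2.5-1
Depends: libc6 (>= 2.3.4), libncursesw5 (>= 5.7+20100313), dpkg (>= 1.15.4) | install-info
Provides: editor
Description: small, friendly text editor inspired by Pico
 GNU nano is an easy-to-use text editor originally designed as a replacement
 for Pico, the ncurses-based editor from the non-free mailer package Pine
EOF

# Hypothetical helper: print the value of one single-line field.
# Continuation lines start with a space, so they never match.
field() {
  sed -n "s/^$1: //p" "$2"
}

provides=$(field Provides "$control")
# Split Depends on commas: each resulting line is one requirement,
# and "a | b" within a line means either package satisfies it.
requirements=$(field Depends "$control" | tr ',' '\n' | sed 's/^ *//')

echo "Provides: $provides"
echo "$requirements"
```

For a real package, dpkg-deb --field nano_2.2.5-1_amd64.deb Provides reads the field straight out of the .deb, no extraction required.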

postinst, prerm, postrm, preinst
These files are maintainer scripts. If you take a look at one, you'll see that it's just a shell script that is run at some point during the [un]installation process.

spang@sencha:~/tmp/blog> cat preinst
#!/bin/sh

set -e

if [ "$1" = "upgrade" ]; then
    if dpkg --compare-versions "$2" lt 1.2.4-2; then
	if [ ! -e /usr/man ]; then
	    ln -s /usr/share/man /usr/man
	    update-alternatives --remove editor /usr/bin/nano || RET=$?
	    rm /usr/man
	    if [ -n "$RET" ]; then
	        exit $RET
	    fi
	else
	    update-alternatives --remove editor /usr/bin/nano
	fi
    fi
fi
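More on the nitty-gritty of maintainer scripts can be found in the Debian Policy Manual, in the chapter on package maintainer scripts and the installation procedure.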
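The pattern in nano's preinst generalizes: dpkg passes the reason for the call as the first argument and, where relevant, a version as the second. Here is a harmless skeleton that only reports what it was asked to do. preinst_sketch is a hypothetical name, and the three actions shown are the ones Debian Policy lists for preinst; the other maintainer scripts have their own sets.

```shell
# Hypothetical skeleton: describe what a preinst would do for each
# way dpkg might call it, instead of actually doing anything.
preinst_sketch() {
  action=$1
  version=${2:-}   # a version string, when dpkg supplies one
  case "$action" in
    install)
      echo "fresh install" ;;
    upgrade)
      echo "upgrading from $version" ;;
    abort-upgrade)
      echo "undoing a failed upgrade to $version" ;;
    *)
      echo "unknown action: $action" >&2
      return 1 ;;
  esac
}

preinst_sketch install
preinst_sketch upgrade 1.2.4-1
```

A real script would do its work in each branch and rely on set -e, as nano's does, so that any failing command aborts the operation.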

conffiles
spang@sencha:~/tmp/blog> cat conffiles 
/etc/nanorc
Any configuration files for the package, generally found in /etc, are listed here, so that dpkg knows when to not blindly overwrite any local configuration changes you've made when upgrading the package.
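One way to picture that decision is as a comparison of three checksums: the file as shipped by the old package version, the file currently on disk, and the file shipped by the new version. The sketch below is a deliberately simplified model of that idea, not dpkg's real logic, which has more cases (deleted files, the --force-conf* options, and so on). conffile_decision and the short strings standing in for checksums are made up for this example.

```shell
# Hypothetical model: arguments are checksums of the conffile as shipped
# by the old version, as found on disk, and as shipped by the new version.
conffile_decision() {
  old=$1 current=$2 new=$3
  if [ "$current" = "$new" ]; then
    echo "keep"      # already identical to the new version
  elif [ "$current" = "$old" ]; then
    echo "replace"   # never edited locally, safe to upgrade silently
  elif [ "$new" = "$old" ]; then
    echo "keep"      # only the local admin changed it
  else
    echo "prompt"    # both sides changed: ask the admin
  fi
}

conffile_decision aaa aaa bbb
conffile_decision aaa ccc aaa
conffile_decision aaa ccc bbb
```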

md5sums
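This file contains an MD5 checksum for each of the data files in the package, so that tools such as debsums (and, in newer versions, dpkg --verify) can check that the installed files haven't been corrupted. Since the checksums ship inside the package itself, they guard against accidental damage rather than deliberate tampering.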
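The file uses the same "checksum, two spaces, relative path" layout that md5sum produces, so md5sum -c can check it when run from the directory the paths are relative to. The sketch below demonstrates this on a throwaway directory tree standing in for /; the file contents are placeholders.

```shell
root=$(mktemp -d)          # stands in for the filesystem root
mkdir -p "$root/bin"
printf 'pretend binary\n' > "$root/bin/nano"

# Record a checksum using a relative path, as in a package's md5sums file.
(cd "$root" && md5sum bin/nano > md5sums)

# Hypothetical helper: report whether every listed file still matches.
verify() {
  if (cd "$1" && md5sum -c md5sums >/dev/null 2>&1); then
    echo "ok"
  else
    echo "FAILED"
  fi
}

before=$(verify "$root")
printf 'bit rot\n' >> "$root/bin/nano"   # simulate corruption
after=$(verify "$root")
echo "before: $before, after: $after"
```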

data.tar.gz

Here are the actual data files that will be added to your system's / when the package is installed.
spang@sencha:~/tmp/blog> tar xzvf data.tar.gz
./
./bin/
./bin/nano
./usr/
./usr/bin/
./usr/share/
./usr/share/doc/
./usr/share/doc/nano/
./usr/share/doc/nano/examples/
./usr/share/doc/nano/examples/nanorc.sample.gz
./usr/share/doc/nano/THANKS
./usr/share/doc/nano/changelog.gz
./usr/share/doc/nano/BUGS.gz
./usr/share/doc/nano/TODO.gz
./usr/share/doc/nano/NEWS.gz
./usr/share/doc/nano/changelog.Debian.gz
[...]
./etc/
./etc/nanorc
./bin/rnano
./usr/bin/nano
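Extracting data.tar.gz in your working directory scatters bin/, usr/ and etc/ all over it. Two gentler options are listing the contents without extracting, and extracting into a directory of its own. The sketch below shows both on a small stand-in tarball built on the spot, with placeholder file contents.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/root/etc" "$tmp/root/bin"
printf '# sample config\n' > "$tmp/root/etc/nanorc"
printf 'pretend binary\n' > "$tmp/root/bin/nano"
# Pack the tree with paths relative to its root, like a real data.tar.gz.
tar -czf "$tmp/data.tar.gz" -C "$tmp/root" .

# t lists the archive instead of extracting it.
listing=$(tar -tzf "$tmp/data.tar.gz")
echo "$listing"

# -C extracts into a dedicated directory rather than the current one.
mkdir "$tmp/unpacked"
tar -xzf "$tmp/data.tar.gz" -C "$tmp/unpacked"
```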

Finishing up

That's it! That's all there is inside a Debian package. Of course, no one building a package for Debian-based systems would do the reverse of what we just did, using raw tools like ar, tar, and gzip. Debian packages use a make-based build system, and learning how to build them using all the tools that have been developed for this purpose is a topic for another time. If you're interested, the new maintainer's guide is a decent place to start.

And next time, if you need to take a look inside a .deb again, use the dpkg-deb utility:

spang@sencha:~/tmp/blog> dpkg-deb --extract nano_2.2.5-1_amd64.deb datafiles
spang@sencha:~/tmp/blog> dpkg-deb --control nano_2.2.5-1_amd64.deb controlfiles
spang@sencha:~/tmp/blog> dpkg-deb --info nano_2.2.5-1_amd64.deb
 new debian package, version 2.0.
 size 566450 bytes: control archive= 3569 bytes.
      12 bytes,     1 lines      conffiles            
    1010 bytes,    26 lines      control              
    5313 bytes,    80 lines      md5sums              
     582 bytes,    19 lines   *  postinst             #!/bin/sh
     160 bytes,     5 lines   *  postrm               #!/bin/sh
     379 bytes,    20 lines   *  preinst              #!/bin/sh
     153 bytes,    10 lines   *  prerm                #!/bin/sh
 Package: nano
 Version: 2.2.5-1
 Architecture: amd64
 Maintainer: Jordi Mallach 
 Installed-Size: 1824
 Depends: libc6 (>= 2.3.4), libncursesw5 (>= 5.7+20100313), dpkg (>= 1.15.4) | install-info
 Suggests: spell
 Conflicts: pico
 Breaks: alpine-pico (<= 2.00+dfsg-5)
 Replaces: pico
 Provides: editor
 Section: editors
 Priority: important
 Homepage: http://www.nano-editor.org/
 Description: small, friendly text editor inspired by Pico
  GNU nano is an easy-to-use text editor originally designed as a replacement
  for Pico, the ncurses-based editor from the non-free mailer package Pine
  (itself now available under the Apache License as Alpine).
  .
  However, nano also implements many features missing in pico, including:
   - feature toggles;
   - interactive search and replace (with regular expression support);
   - go to line (and column) command;
   - auto-indentation and color syntax-highlighting;
   - filename tab-completion and support for multiple buffers;
   - full internationalization support.

If the package format ever changes again, dpkg-deb will too, and you won't even need to notice.
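If you'd like something disposable to practice on, dpkg-deb can also assemble a package from a directory tree. The sketch below builds a throwaway package and then queries it with --field and --contents. Everything about the package (the hello-sketch name, the maintainer address, the README) is invented for this example, and this low-level build step is not how real Debian packages are made. The block skips itself if dpkg-deb isn't installed.

```shell
if command -v dpkg-deb >/dev/null 2>&1; then
  work=$(mktemp -d)
  mkdir -p "$work/pkg/DEBIAN" "$work/pkg/usr/share/doc/hello-sketch"
  chmod 755 "$work/pkg" "$work/pkg/DEBIAN"   # dpkg-deb checks DEBIAN's mode
  printf 'hello\n' > "$work/pkg/usr/share/doc/hello-sketch/README"
  # Minimal control file; all values are made up for this example.
  cat > "$work/pkg/DEBIAN/control" <<'EOF'
Package: hello-sketch
Version: 0.1
Architecture: all
Maintainer: Example Person <person@example.com>
Description: throwaway package for poking at with dpkg-deb
EOF
  deb="$work/hello-sketch_0.1_all.deb"
  dpkg-deb --build "$work/pkg" "$deb" >/dev/null

  pkg_name=$(dpkg-deb --field "$deb" Package)   # one control field
  listing=$(dpkg-deb --contents "$deb")         # data files, like tar tv
  echo "$pkg_name"
  echo "$listing"
else
  echo "dpkg-deb not found; skipping" >&2
fi
```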

~spang


