Building Filesystems the Way You Build Web Apps

FUSE is awesome. While most major Linux filesystems (ext3, XFS, ReiserFS, btrfs) are built-in to the Linux kernel, FUSE is a library that lets you instead write filesystems as userspace applications. When something attempts to access the filesystem, those accesses get passed on to the FUSE application, which can then return the filesystem data.

It lets you quickly prototype and test filesystems that can run on multiple platforms without writing kernel code. You can easily experiment with strange and unusual interactions between the filesystem and your applications. You can even build filesystems without writing a line of C code.

FUSE has a reputation of being used only for toy filesystems (when are you actually going to use flickrfs?), but that's really not fair. FUSE is currently the best way to read NTFS partitions on Linux, how non-GNOME and legacy applications can access files over SFTP, SMB, and other protocols, and the only way to run ZFS on Linux.

But because the FUSE API calls separate functions for each system call (i.e. getattr, open, read, etc.), in order to write a useful filesystem you need boilerplate code to translate requests for a particular path into a logical object in your filesystem, and you need to do this in every FUSE API function you implement.

Take a page from web apps

This is the kind of problem that web development frameworks have also had to solve, since it's been a long time since a URL always mapped directly onto a file on the web server. And while there are a handful of approaches for handling URL dispatch, I've always been a fan of the URL dispatch style popularized by routing in Ruby on Rails, which was later ported to Python as the Routes library.

Routes dissociates an application's URL structure from your application's internal organization, so that you can connect arbitrary URLs to arbitrary controllers. However, a more common use of Routes involves embedding variables in the Routes configuration, so that you can support a complex and potentially arbitrary set of URLs with a comparatively simple configuration block. For instance, here is the (slightly simplified) Routes configuration from a Pylons web application:

from routes import Mapper

def make_map():
    map = Mapper()
    map.minimization = False
    
    # The ErrorController route (handles 404/500 error pages); it should
    # likely stay at the top, ensuring it can always be resolved
    map.connect('error/{action}/{id}', controller='error')
    map.connect('/', controller='status', action='index')
    map.connect('/{controller}', action='index')
    map.connect('/{controller}/{action}')
    map.connect('/{controller}/{action}/{id}')

    return map

In this example, {controller}, {action}, and {id} are variables which can match any string within that component. So, for instance, if someone were to access /spend/new within the web application, Routes would find a controller named spend, and would call the new action on that method.

RouteFS: URL routing for filesystems

Just as URLs take their inspiration from the filesystem, we can use the ideas from URL routing in our filesystem. And to make this easy, I created a project called RouteFS. RouteFS ties together FUSE and Routes, and it's great because it lets you specify your filesystem in terms of the filesystem hierarchy instead of in terms of the system calls to access it.

RouteFS was originally developed as a generalized solution to a real problem I faced while working on the Invirt project at MIT. We wanted a series of filesystem entries that were automatically updated when our database changed (specifically, we were using .k5login files to control access to a server), so we used RouteFS to build a filesystem where every filesystem lookup was resolved by a database query, ensuring that our filesystem always stayed up to date.

Today, however, we're going to be using RouteFS to build the very thing I lampooned FUSE for: toy filesystems. I'll be demonstrating how to build a simple filesystem in less than 60 lines of code. I want to continue the popular theme of exposing Web 2.0 services as filesystems, but I'm also a software engineer at a very Git- and Linux-heavy company. The popular Git repository hosting site Github has an API for interacting with the repositories hosted there, so we'll use the Python bindings for the API to build a Github filesystem, or GithubFS. GithubFS lets you examine the Git repositories on Github, as well as the different branches of those repositories.

Getting started

If you want to follow along, you'll first need to install FUSE itself, along with the Python FUSE bindings - look for a python-fuse or fuse-python package. You'll also need a few third-party Python packages: Routes, RouteFS, and github2. Routes and RouteFS are available from the Python Cheeseshop, so you can install those by running easy_install Routes RouteFS. For github2, you'll need the bleeding edge version, which you can get by running easy_install http://github.com/ask/python-github2/tarball/master

Now then, let's start off with the basic shell of a RouteFS filesystem:

#!/usr/bin/python

import routes
import routefs

class GithubFS(routefs.RouteFS):
    def make_map(self):
        m = routes.Mapper()
        return m

if __name__ == '__main__':
    routefs.main(GithubFS)

As with the web application code above, the make_map method of the GithubFS class creates, configures, and returns a Python Routes mapper, which RouteFS uses for dispatching accesses to the filesystem. The routefs.main function takes a RouteFS class and handles instantiating the class and mounting the filesystem.

Populating the filesystem

Now that we have a filesystem, let's put some files in it:

#!/usr/bin/python

import routes
import routefs

class GithubFS(routefs.RouteFS):
    def __init__(self, *args, **kwargs):
        super(GithubFS, self).__init__(*args, **kwargs)

        # Maps user -> [projects]
        self.user_cache = {}

    def make_map(self):
        m = routes.Mapper()
        m.connect('/', controller='list_users')
        return m

    def list_users(self, **kwargs):
        return [user
            for user, projects in self.user_cache.iteritems()
            if projects]

if __name__ == '__main__':
    routefs.main(GithubFS)

Here, we add our first Routes mapping, connecting '/', or the root of the filesystem, to the list_users controller, which is just a method on the filesystem's class. The list_users controller returns a list of strings. When the controller that a path maps to returns a list, RouteFS automatically makes that path into a directory. To make a path be a file, you just return a single string containing the file's contents.

We'll use the user_cache attribute to keep track of the users that we've seen and their repositories. This will let us auto-populate the root of the filesystem as users get looked up.

Let's add some code to populate that cache:

#!/usr/bin/python

from github2 import client
import routes
import routefs

class GithubFS(routefs.RouteFS):
    def __init__(self, *args, **kwargs):
        super(GithubFS, self).__init__(*args, **kwargs)

        # Maps user -> [projects]
        self.user_cache = {}
        self.github = client.Github()

    def make_map(self):
        m = routes.Mapper()
        m.connect('/', controller='list_users')
        m.connect('/{user}', controller='list_repos')
        return m

    def list_users(self, **kwargs):
        return [user
            for user, projects in self.user_cache.iteritems()
            if projects]

    def list_repos(self, user, **kwargs):
        if user not in self.user_cache:
            try:
                self.user_cache[user] = [r.name
                    for r in self.github.repos.list(user)]
            except:
                self.user_cache[user] = None

        return self.user_cache[user]

if __name__ == '__main__':
    routefs.main(GithubFS)

That's enough code that we can start interacting with the filesystem:

opus:~ broder$ ./githubfs /mnt/githubfs
opus:~ broder$ ls /mnt/githubfs
opus:~ broder$ ls /mnt/githubfs/ebroder
anygit	    githubfs	 pyhesiodfs	 python-simplestar
auto-aklog  ibtsocs	 python-github2  python-zephyr
bluechips   libhesiod	 python-hesiod
debmarshal  ponyexpress  python-moira
debothena   pyafs	 python-routefs
opus:~ broder$ ls /mnt/githubfs
ebroder

Users and projects and branches, oh my!

You can see a slightly more fleshed-out filesystem on (where else?) Github. GithubFS lets you look at the current SHA-1 for each branch in each repository for a user:

opus:~ broder$ ./githubfs /mnt/githubfs
opus:~ broder$ ls /mnt/githubfs/ebroder
anygit	    githubfs	 pyhesiodfs	 python-simplestar
auto-aklog  ibtsocs	 python-github2  python-zephyr
bluechips   libhesiod	 python-hesiod
debmarshal  ponyexpress  python-moira
debothena   pyafs	 python-routefs
opus:~ broder$ ls /mnt/githubfs/ebroder/githubfs
master
opus:~ broder$ cat /mnt/githubfs/ebroder/githubfs/master
cb4fc93ba381842fa0c2b34363d52475c4109852

What next?

Want to see more examples of RouteFS? RouteFS itself includes some example filesystems, and you can see how we used RouteFS within the Invirt project. But most importantly, because RouteFS is open source, you can incorporate it into your own projects.

So, what cool tricks can you think of for dynamically generated filesystems?

~broder

Comments:

Post a Comment:
Comments are closed for this entry.
About

Tired of rebooting to update systems? So are we -- which is why we invented Ksplice, technology that lets you update the Linux kernel without rebooting. It's currently available as part of Oracle Linux Premier Support, Fedora, and Ubuntu desktop. This blog is our place to ramble about technical topics that we (and hopefully you) think are interesting.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today