In this blog post, we discuss a new XFS feature of the Unbreakable Enterprise Kernel (UEK) 8: the ability to store parent directory pointers in child files. This feature has been under development since 2022, and is now available as a technology preview in UEK 8.

What Problems Does This Solve?

Historically, Unix filesystems have implemented directory trees as a unidirectional graph: directories point down to child files, but non-directory children do not point back up to parents. The first consequence is that it’s not possible to turn an open file descriptor or a file handle into a path through the directory tree.

Second, the lack of back pointers also limits the amount of reconstruction that an fsck tool can do; if a directory becomes corrupt, any unlinked children must be moved to a lost+found directory. With a redundant copy of the directory information, automated tools can restore the directory tree to what users created, thereby saving system administrators time.

Third, there are many programs that use the BULKSTAT ioctl to scan the files of an XFS filesystem in inumber order. For many filesystems this results in a scan in linear order, which saves time, but these programs have never been able to report status in terms of file paths, which are much more convenient for humans.

How Does XFS Solve The Problem?

The solution is quite simple: for each link from a parent to a child file, add a second link that looks like a directory entry from the child file back to the parent. Now the directory graph is fully bidirectional. This solves the fsck problem by adding enough redundancy to the filesystem that it’s now possible to reconstruct a directory’s contents by scanning for children, or a child’s parent pointers by scanning for parents.
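To make the redundancy argument concrete, here is a minimal sketch of rebuilding one directory’s entries purely from the back links. Everything in it is an assumption made for illustration: struct toy_pptr, struct toy_dirent, and rebuild_dir() are hypothetical in-memory stand-ins, not XFS structures or the algorithm that the repair tools actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory parent pointer: "child_ino is linked into
 * directory parent_ino under this name". Not the on-disk format. */
struct toy_pptr {
    uint64_t child_ino;
    uint64_t parent_ino;
    const char *name;
};

/* Hypothetical rebuilt directory entry: name -> child inode number. */
struct toy_dirent {
    uint64_t ino;
    const char *name;
};

/*
 * Scan every parent pointer in the toy filesystem and collect those that
 * point at dir_ino. Returns the number of entries written to out[].
 */
size_t rebuild_dir(const struct toy_pptr *pptrs, size_t nr_pptrs,
                   uint64_t dir_ino, struct toy_dirent *out, size_t out_max)
{
    size_t found = 0;

    assert(pptrs != NULL && out != NULL);
    for (size_t i = 0; i < nr_pptrs && found < out_max; i++) {
        if (pptrs[i].parent_ino != dir_ino)
            continue;           /* this link belongs to another directory */
        out[found].ino = pptrs[i].child_ino;
        out[found].name = pptrs[i].name;
        found++;
    }
    return found;
}
```

The reverse direction (rebuilding a child’s parent pointers by scanning directories) would be the same loop with the roles of the two tables swapped.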

The path reconstruction problem is solved by recursively walking the parent pointers back to the root directory and printing the path found. The kernel can perform these upward tree walks internally, and userspace can call a new GETPARENTS ioctl to walk upwards one step at a time.
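The upward walk can be sketched with a toy inode table. This is only an illustration under stated assumptions: struct toy_inode and toy_path() are hypothetical names, each toy inode has exactly one parent pointer (a hard-linked file would have several, giving several paths), and nothing here reflects how the kernel implements the walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy inode carrying a single parent pointer. */
struct toy_inode {
    int parent;         /* index of the parent in the table; -1 for root */
    const char *name;   /* name under which the parent links to us */
};

/*
 * Walk parent pointers from inode idx up to the root, building the path
 * right-to-left at the end of buf. Returns a pointer into buf where the
 * path starts, or NULL if buf is too small.
 */
char *toy_path(const struct toy_inode *tbl, int idx, char *buf, size_t len)
{
    char *p;

    assert(tbl != NULL && buf != NULL && len >= 2);
    p = buf + len - 1;
    *p = '\0';

    if (tbl[idx].parent < 0) {      /* asked about the root itself */
        *--p = '/';
        return p;
    }
    while (tbl[idx].parent >= 0) {
        size_t n = strlen(tbl[idx].name);

        if ((size_t)(p - buf) < n + 1)
            return NULL;            /* no room left for "/name" */
        p -= n;
        memcpy(p, tbl[idx].name, n);
        *--p = '/';
        idx = tbl[idx].parent;      /* one step upward */
    }
    return p;
}
```

Building the string from the right avoids having to reverse the list of components once the root is reached.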

On disk, parent pointers roughly mirror a directory entry, except that there’s no need to store the file type (only directories can have children) and parent pointers store full file handles to strengthen the integrity checking of the directory tree. Parent pointers are stored in the extended attributes structure because that structure can be attached to all file types.

The file name linking the parent to the child is used as the extended attribute name, and the file handle is stored as the extended attribute value:

struct xfs_parent_rec {
    __be64  p_ino;
    __be32  p_gen;
} __packed;

The extended attribute structure is a natural place to store parent pointers because it already implements a key-value store, where the keys are user-controlled strings. However, key-value stores tend to be quite complex, and XFS is no exception. Ensuring referential integrity of a directory tree update is an absolute requirement: if the directory update commits, the parent pointer update must also commit. To support this new requirement, the extended attributes code was redesigned to use log intent items to track each state transition through an update. This allows XFS to restart a directory update after the system goes down.

Note: A previous attempt to add directory parent pointers to XFS was not a success because it did not guarantee referential integrity. This resulted in problems with inconsistent metadata.

Configuring the System

Because this is new functionality being offered as a technology preview, it isn’t yet enabled by default in UEK 8. Therefore, you must explicitly enable the new feature when formatting a filesystem. Here is an example of enabling this feature on a fresh 200 TiB storage device:

$ mkfs.xfs /dev/nvme0n1 -f -n parent=1
meta-data=/dev/nvme0n1           isize=512    agcount=200, agsize=268435455 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0
data     =                       bsize=4096   blocks=53687091000, imaxpct=1
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1, parent=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

The key thing to notice here is parent=1 in the output – this is the signal that directory parent pointers are ready to go.
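If you capture that output in a provisioning tool, the check can be automated. The following is a minimal sketch that only scans text for a standalone parent=1 token; has_parent_pointers() is a hypothetical helper written for this post, and it assumes the output format shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if text (for example, captured mkfs.xfs output) contains
 * the token "parent=1" that is not part of a longer word or number.
 */
bool has_parent_pointers(const char *text)
{
    static const char key[] = "parent=1";
    const size_t keylen = sizeof(key) - 1;
    const char *p = text;

    assert(text != NULL);
    while ((p = strstr(p, key)) != NULL) {
        char before = (p == text) ? ' ' : p[-1];
        char after = p[keylen];

        /* Reject matches such as "grandparent=1" or "parent=10". */
        if ((before == ' ' || before == ',' || before == '\t') &&
            (after == '\0' || after == ' ' || after == ',' || after == '\n'))
            return true;
        p += keylen;
    }
    return false;
}
```

Text matching like this is fragile by nature, so treat it as a convenience check rather than a definitive test of the filesystem’s feature set.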

The C API used by the sample code is defined in xfs.h, so it is also necessary to install the xfsprogs-devel package.

Note: Automatic fsck requires this feature to repair file metadata; if you format with -m autofsck=1 or use the provided ol_autofsck_10.0.conf file, then the directory parent pointers feature will be turned on.

Demonstration

Here’s a simple example of opening a file from the Linux source code tree, and then using the open fd to reconstruct the directory path:

$ sudo xfs_io -c 'parent' MAINTAINERS
p_ino     = 3225294236
p_gen     = 3850447861
p_namelen = 11
p_name    = "MAINTAINERS"

The kernel claims that the parent of MAINTAINERS is a directory with inode number 3225294236. Is this true?

$ stat .
  File: .
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 254,5   Inode: 3225294236  Links: 28

Yes, it is! Let’s make this example more interesting by introducing a hard link and running the query again:

$ ln MAINTAINERS fs/xfs/snails.txt
$ sudo xfs_io -c 'parent' MAINTAINERS
p_ino     = 3225294236
p_gen     = 3850447861
p_namelen = 11
p_name    = "MAINTAINERS"

p_ino     = 3628162029
p_gen     = 3596911673
p_namelen = 10
p_name    = "snails.txt"
$ stat fs/xfs/
  File: fs/xfs/
  Size: 12288           Blocks: 32         IO Block: 4096   directory
Device: 254,5   Inode: 3628162029  Links: 4

Here we see the new parent pointer entry for the hard link that we added.

Looking at file handles isn’t very exciting, so let’s use the same command to trace the file paths back to the root:

$ sudo xfs_io -c 'parent -p' MAINTAINERS
/home/djwong/cdev/linux-xfs/MAINTAINERS
/home/djwong/cdev/linux-xfs/fs/xfs/snails.txt

The xfs_scrub program in the xfsprogs-xfs_scrub RPM uses the bulkstat ioctl to enumerate all the file handles in the filesystem. This enables an efficient linear scan across the filesystem, but problem reports are only actionable if they identify files in terms that an administrator can use. If you enable autofsck when formatting the filesystem, xfs_scrub uses parent pointer information to report file paths instead of file handles.

Let’s create a file with a very similar looking name to the one we just created. I’ve excerpted the output for brevity:

$ touch fs/xfs/snai1s.txt
$ sudo xfs_scrub -d -T -v -n /home/
<snip>
Phase 5: Check directory tree.
Info: /home/djwong/cdev/linux-xfs/fs/xfs: Unicode name "snails.txt" in directory could be confused with "snai1s.txt". (unicrash.c line 861)
<snip>
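To give a feel for what a confusable-name check involves, here is a deliberately tiny sketch that folds a few look-alike ASCII characters together before comparing two names. The functions toy_fold() and toy_confusable() and the folding table are assumptions invented for this post; xfs_scrub performs a Unicode-aware comparison that this toy does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy folding table: map a few look-alike characters to one symbol. */
static char toy_fold(char c)
{
    switch (c) {
    case '1':
    case 'I':
    case '|':
        return 'l';
    case '0':
        return 'O';
    default:
        return c;
    }
}

/* True if two different names become identical after toy_fold(). */
bool toy_confusable(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (strcmp(a, b) == 0)
        return false;       /* the same name is not a collision */
    while (*a != '\0' && *b != '\0') {
        if (toy_fold(*a) != toy_fold(*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;        /* both names must end together */
}
```

With the two file names from the example, the digit 1 and the letter l fold to the same symbol, so the pair is flagged.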

Sample Code

The code excerpts in this section show how a C program might harness the new GETPARENTS ioctl. To keep the examples simple, most error handling has been omitted. A robust program should add it.

First, set up some memory buffers to receive the parent pointer information. The getparents object should otherwise be initialized to zero. Then open an interesting file:

char buf[65536];
struct xfs_getparents getparents = {
    .gp_buffer = (uintptr_t)buf,
    .gp_bufsize = 65536,
};
unsigned int n = 0;
int fd, ret;

fd = open("MAINTAINERS", O_RDWR);

The GETPARENTS ioctl is an iterator function, which means that user programs can call the kernel repeatedly until either the kernel sets the DONE flag in gp_oflags, or the user program decides to move on:

while ((ret = ioctl(fd, XFS_IOC_GETPARENTS, &getparents)) == 0) {
    struct xfs_getparents_rec *gpr;

    /* The root directory has no parent to report. */
    if (getparents.gp_oflags & XFS_GETPARENTS_OFLAG_ROOT)
        break;

    /* Walk each record that the kernel packed into buf. */
    gpr = xfs_getparents_first_rec(&getparents);
    while (gpr != NULL) {
        printf("parent[%u]: %s\n", n++, gpr->gpr_name);
        gpr = xfs_getparents_next_rec(&getparents, gpr);
    }

    /* Stop once the kernel has no more parent pointers to return. */
    if (getparents.gp_oflags & XFS_GETPARENTS_OFLAG_DONE)
        break;
}

This simple C program therefore prints the name stored in every parent pointer of the file “MAINTAINERS”, that is, each name under which a directory links to it.
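The reason for the outer loop is that a single call can only return as many records as fit in the buffer, so the caller keeps asking until it is told that nothing is left. The sketch below models that calling convention with a toy iterator. All names in it (struct toy_request, TOY_OFLAG_DONE, toy_getparents(), toy_count_all()) are hypothetical and exist only to show the pattern; the real ioctl’s request structure and resume mechanism differ.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OFLAG_DONE 0x1u     /* hypothetical "nothing left" flag */

/* Hypothetical request object whose cursor persists between calls. */
struct toy_request {
    size_t cursor;          /* resume position; zero before the first call */
    unsigned int oflags;    /* output flags, rewritten by every call */
};

/* Copy up to max names into out[], resuming at req->cursor. */
size_t toy_getparents(const char *const *all, size_t nr_all,
                      struct toy_request *req, const char **out, size_t max)
{
    size_t n = 0;

    assert(req != NULL && out != NULL);
    req->oflags = 0;
    while (n < max && req->cursor < nr_all)
        out[n++] = all[req->cursor++];
    if (req->cursor >= nr_all)
        req->oflags |= TOY_OFLAG_DONE;
    return n;
}

/* Caller side: keep calling until DONE is set; returns the total seen. */
size_t toy_count_all(const char *const *all, size_t nr_all, size_t batch)
{
    struct toy_request req = { 0 };
    const char *out[8];
    size_t total = 0;

    assert(batch > 0 && batch <= 8);
    for (;;) {
        total += toy_getparents(all, nr_all, &req, out, batch);
        if (req.oflags & TOY_OFLAG_DONE)
            break;
    }
    return total;
}
```

Because the resume position lives in the request object, reusing a stale object would skip records, which is one plausible reason to start from a zeroed structure as the sample code does.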

Conclusion

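As you have seen, it is now possible to construct a file path from a file descriptor or a file handle. The added redundancy gives fsck tools more to work with when rebuilding a damaged directory tree, and scanning tools can report problems by file path rather than by file handle.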

Appendix A: Reporting File Parent Pointers

/* LICENSE: GPLv2 */
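#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <xfs/xfs.h>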

int print_parents(const char *fname)
{
    char buf[65536];
    struct xfs_getparents getparents = {
        .gp_buffer = (uintptr_t)buf,
        .gp_bufsize = 65536,
    };
    unsigned int n = 0;
    int fd, ret;

    fd = open(fname, O_RDWR);
    if (fd < 0) {
        perror(fname);
        return -1;
    }

    while ((ret = ioctl(fd, XFS_IOC_GETPARENTS, &getparents)) == 0) {
        struct xfs_getparents_rec *gpr;

        if (getparents.gp_oflags & XFS_GETPARENTS_OFLAG_ROOT)
            break;

        gpr = xfs_getparents_first_rec(&getparents);
        while (gpr != NULL) {
            printf("parent[%u]: %s\n", n++, gpr->gpr_name);
            gpr = xfs_getparents_next_rec(&getparents, gpr);
        }

        if (getparents.gp_oflags & XFS_GETPARENTS_OFLAG_DONE)
            break;
    }
    if (ret)
        perror("GETPARENTS ioctl");

    close(fd);
    return ret;
}