XFS - 2019 Development Retrospective

Matt Keenan
Principal Software Engineer

Darrick Wong, Upstream XFS Maintainer and kernel developer for Oracle Linux, returns to talk about what's been happening with XFS.

Hi folks! It has been a little under two years since my last post about upcoming XFS features in the mainline Linux kernel. In that time, the XFS development community has been hard at work fixing bugs and rolling out new features! Let's talk about the improvements that have landed recently in the mainline Linux kernel, and our development roadmap for 2020. The new reflink and online fsck features will be covered in separate future blog posts.

Lazy Timestamp Updates

Starting with Linux 4.17, XFS implements the lazytime mount option. This mount option permits the filesystem to skip updates to the last modification timestamp and the file metadata change timestamp if they have already been updated within the last 24 hours. When combined with the relatime mount option, which skips updates to the last access timestamp when it is newer than the file modification timestamp, we see a marked decrease in metadata writes, which in turn improves filesystem performance on non-volatile storage. This enhancement was provided by Christoph Hellwig.
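
To make the two rules concrete, here is a minimal Python sketch of the decisions described above. It is a toy model, not the kernel's actual logic: the function names are invented for illustration, and the checks are simplified to exactly what the paragraph states.

```python
DAY = 24 * 60 * 60  # seconds; the 24-hour window mentioned above


def relatime_needs_atime_update(atime, mtime):
    """Simplified relatime rule: skip the access-time update when the
    stored atime is already newer than the modification time."""
    return atime <= mtime


def lazytime_needs_disk_write(now, last_written):
    """Toy lazytime rule: a timestamp-only change goes to disk only if
    the on-disk copy was last updated at least 24 hours ago."""
    return now - last_written >= DAY
```

Under this model, a file that is read over and over without being modified triggers no access-time writes, and repeated timestamp changes within one day collapse into a single write.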

Filesystem Label Management

In Linux 4.18, Eric Sandeen added support to XFS for btrfs' label get and set ioctls. This change enables administrators to change a filesystem label while that filesystem is mounted. A future xfsprogs release will adapt xfs_admin to take advantage of this interface.
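
For readers who want to try the interface directly, the following Python sketch calls the label ioctls through the standard fcntl module. The helper names are hypothetical. The request numbers are rebuilt from the definitions in linux/fs.h using the generic ioctl encoding, which is an assumption that holds on x86 and arm64 but not on every architecture.

```python
import fcntl
import os

FSLABEL_MAX = 256  # label buffer size from linux/fs.h, including the NUL


def _ioc(direction, ioc_type, nr, size):
    # Generic Linux ioctl number layout; a few architectures differ.
    return (direction << 30) | (size << 16) | (ioc_type << 8) | nr


_IOC_WRITE, _IOC_READ = 1, 2
FS_IOC_GETFSLABEL = _ioc(_IOC_READ, 0x94, 49, FSLABEL_MAX)
FS_IOC_SETFSLABEL = _ioc(_IOC_WRITE, 0x94, 50, FSLABEL_MAX)


def decode_label(buf):
    """Return the label stored in a NUL-padded ioctl buffer."""
    return bytes(buf).split(b"\0", 1)[0].decode()


def get_label(mountpoint):
    """Read the label of the filesystem mounted at `mountpoint`."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        buf = bytearray(FSLABEL_MAX)
        fcntl.ioctl(fd, FS_IOC_GETFSLABEL, buf)  # kernel fills buf in place
        return decode_label(buf)
    finally:
        os.close(fd)


def set_label(mountpoint, label):
    """Set the label on a mounted filesystem (expect to need root)."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        buf = label.encode().ljust(FSLABEL_MAX, b"\0")
        fcntl.ioctl(fd, FS_IOC_SETFSLABEL, buf)
    finally:
        os.close(fd)
```

Calling get_label("/mnt/data") on a mounted XFS filesystem (the path is only an example) should return the current label on a 4.18 or newer kernel.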

Large Directory

Dave Chinner contributed a series of patches for Linux 5.4 that reduce the amount of time that XFS spends searching for free space in a directory when creating a file. This change improves performance on very large directories, which should be beneficial for object stores and container deployment systems.

Solving the Y2038 Problem

The year 2038 poses a special problem for Linux -- any signed 32-bit seconds counter will overflow back to 1901. Work is underway in the kernel to extend all of those counters to a full 64 bits. In 2020, we will begin work on extending XFS's metadata (primarily inode timestamps and quota expiration timers) to support timestamps out to the year 2486. It should be possible to upgrade existing V5 filesystems.
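
The arithmetic behind those dates can be checked with a short Python sketch. The 2038 and 1901 limits follow directly from a signed 32-bit seconds counter. The 2486 figure is reproduced here under an assumption that the paragraph does not spell out: an unsigned 64-bit nanosecond counter that starts at the old 1901 lower bound.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wrap_int32(value):
    """Wrap an integer the way a signed 32-bit counter would."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_good = 2**31 - 1                       # final second before overflow
after_overflow = wrap_int32(last_good + 1)  # wraps to -2**31

# Assumed encoding: unsigned 64-bit nanoseconds counted from -2**31 seconds.
bigtime_max = -2**31 + (2**64 - 1) // 10**9

print(to_utc(last_good))       # falls in January 2038
print(to_utc(after_overflow))  # falls in December 1901
print(to_utc(bigtime_max))     # falls in 2486
```

A 64-bit nanosecond counter covers roughly 584 years, which is why a range that starts in 1901 ends in 2486.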

Metadata Directory Tree

This feature, which I showed off late in 2018, creates a separate directory tree for filesystem metadata. This feature is not itself significant for users, but it will enable the creation of many more metadata structures. This in turn can enable us to provide reverse mapping and data block sharing for realtime volumes; support creating subvolumes for container hosts; store arbitrary properties in the filesystem; and attach multiple realtime volumes to the filesystem.

Deferred Inode Reclaim and Inactivation

We frequently hear two complaints lodged against XFS -- memory reclamation runs very slowly because XFS inode reclamation sometimes has to flush dirty inodes to disk; and deletions are slow because we charge the full cost of freeing a file's resources to the process that deletes it. Dave Chinner and I have been collaborating this year and last on making those problems go away.

Dave has been working on replacing the current inode memory reclaim code with a simpler LRU list and reorganizing the dirty inode flushing code so that inodes aren't handed to memory reclaim until the metadata log has finished flushing the inodes to disk. This should eliminate the complaints that slow IO gets in the way of reclaiming memory in other parts of the system.

Meanwhile, I have been working on the deletion side of the equation by adding new states to the inode lifecycle. When a file is deleted, we can tag it as needing to have its resources freed, and move on. A background thread can free all those resources in bulk. Even better, on systems with a lot of IOPS available, these bulk frees can be done on a per-AG basis with multiple threads.
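
The tag-now, free-later idea can be modeled in a few lines of Python. This is a toy illustration only: the class and method names are hypothetical, the "resources" are just inode numbers, and it makes no claim about how the kernel patches are structured.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class ToyFilesystem:
    def __init__(self):
        self.pending = defaultdict(list)  # AG number -> inodes awaiting cleanup
        self.freed = []                   # inodes whose resources were released

    def unlink(self, inode, ag):
        """Foreground path: tag the inode for later cleanup and return."""
        self.pending[ag].append(inode)

    def _inactivate_ag(self, ag):
        """Background path: take everything queued for one AG in bulk."""
        batch, self.pending[ag] = self.pending[ag], []
        return batch

    def run_background_pass(self, workers=4):
        """Process each AG's queue on its own worker thread."""
        ags = list(self.pending)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for batch in pool.map(self._inactivate_ag, ags):
                self.freed.extend(batch)
```

In this model the deleting process pays only for a list append, while the bulk work is spread across one worker per allocation group.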

Inode Parent Pointers

Allison Collins continues developing the inode parent pointer feature. This has led to the introduction of atomic setting and removal of extended attributes and a refactoring of the existing extended attribute code. When completed, this will enable both filesystem check and repair tools to check the integrity of a filesystem's directory tree and rebuild subtrees when they are damaged.
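
To illustrate why parent pointers help a checker, here is a small Python sketch that cross-checks directory entries against the back-references stored in each child. The data model (plain dictionaries keyed by inode number) and the function name are assumptions made for the example and do not reflect the on-disk format.

```python
def find_inconsistencies(dirents, parent_ptrs):
    """Cross-check forward and backward directory links.

    dirents:     {dir_inode: {name: child_inode}}
    parent_ptrs: {child_inode: {(dir_inode, name), ...}}
    Returns a sorted list of (problem, dir_inode, name, child_inode).
    """
    problems = []
    # Every directory entry should be matched by a parent pointer.
    for d, entries in dirents.items():
        for name, child in entries.items():
            if (d, name) not in parent_ptrs.get(child, set()):
                problems.append(("missing parent pointer", d, name, child))
    # Every parent pointer should be matched by a directory entry.
    for child, ptrs in parent_ptrs.items():
        for d, name in ptrs:
            if dirents.get(d, {}).get(name) != child:
                problems.append(("dangling parent pointer", d, name, child))
    return sorted(problems)
```

A dangling parent pointer is the useful case for repair: if a directory block is lost, the surviving back-references still say where each child belonged and under which name.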

Anyway, that wraps up our new feature retrospective and discussion of the 2020 roadmap! See you on the mailing lists!
