Introduction to XFS Transaction Mechanism

March 28, 2023 | 8 minute read

Introduction

This article introduces the XFS transaction mechanism and the lifecycle of a metadata change, from modification in memory to writeback on disk. To illustrate this, it walks through the use case of changing an inode's modification timestamp. Note that the article is based on XFS as implemented in the v5.19 Linux kernel.

Overview

The figure below illustrates how modified metadata is written first to the on-disk log and then to its regular location in the filesystem.

 

[Figure: XFS Writing Data Path]

 

Modifying and updating metadata contents consists of three steps:

  1. High level transactions
    • Generally, high level transactions are created to service a filesystem modification request received through a system call (e.g. fallocate()).
    • Metadata items modified while servicing the system call are added to a list owned by the high level transaction.
    • There could be several high level transactions executing at any given time.
    • After completing all the required metadata modifications in memory, the high level transaction is committed, i.e. the in-core metadata items owned by the high level transaction are moved to a global list called the Committed Item List (CIL).
    • Hence, the CIL consists of in-core metadata items that were modified and committed by one or more high level transactions.
  2. Checkpoint transactions
    • These transactions process metadata items present in the CIL.
    • The following actions are performed by a checkpoint transaction once a sufficient number of metadata items have accumulated in the CIL:
      1. Each in-core metadata item in the CIL is formatted into a metadata-specific in-core structure.
      2. The formatted structures are written to the on-disk log.
      3. The corresponding in-core metadata items are then moved to the Active Item List (AIL).
  3. Xfsaild

    The kernel thread Xfsaild wakes up at regular intervals to process in-core metadata items that were added to the list at xfs_ail->ail_head. It copies the modified fields of the in-core metadata structures over to the corresponding on-disk structures. The updated on-disk structures are then written to disk.
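The three steps above can be pictured as a small state machine. The following is a minimal userspace sketch, not kernel code: the toy_* names, the stage enum and the trans_id field are all assumptions made for illustration, and the real implementation moves items between linked lists rather than changing a stage field.

```c
#include <assert.h>
#include <stddef.h>

/* Where a modified metadata item currently lives (toy model, not kernel code). */
enum toy_stage {
    TOY_IN_TRANS,   /* on a high level transaction's item list */
    TOY_IN_CIL,     /* committed, waiting for a checkpoint */
    TOY_IN_AIL,     /* written to the on-disk log, awaiting writeback */
    TOY_WRITTEN     /* written back to its regular on-disk location */
};

struct toy_item {
    int            trans_id;  /* hypothetical id of the owning transaction */
    enum toy_stage stage;
};

/* Step 1: commit one transaction, moving only its own items into the CIL. */
static size_t toy_commit(struct toy_item *items, size_t n, int trans_id)
{
    size_t moved = 0;

    assert(items != NULL);
    for (size_t i = 0; i < n; i++) {
        if (items[i].stage == TOY_IN_TRANS && items[i].trans_id == trans_id) {
            items[i].stage = TOY_IN_CIL;
            moved++;
        }
    }
    return moved;
}

/* Steps 2 and 3 share one shape: move every item from one stage to the next. */
static size_t toy_advance(struct toy_item *items, size_t n,
                          enum toy_stage from, enum toy_stage to)
{
    size_t moved = 0;

    assert(items != NULL);
    for (size_t i = 0; i < n; i++) {
        if (items[i].stage == from) {
            items[i].stage = to;
            moved++;
        }
    }
    return moved;
}

/* Step 2: a checkpoint logs everything in the CIL and moves it to the AIL. */
static size_t toy_checkpoint(struct toy_item *items, size_t n)
{
    return toy_advance(items, n, TOY_IN_CIL, TOY_IN_AIL);
}

/* Step 3: writeback of everything currently in the AIL. */
static size_t toy_writeback(struct toy_item *items, size_t n)
{
    return toy_advance(items, n, TOY_IN_AIL, TOY_WRITTEN);
}
```

In this model, an item that belongs to a transaction which has not yet committed is untouched by toy_checkpoint() and toy_writeback(), which mirrors the point that only committed items enter the CIL.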

Deep dive

We now take a closer look into XFS’ transaction processing mechanism introduced above. We will do this by considering the modifications made to an inode due to changing the modification time of a file via:

touch -m /mnt/testfile

However, before we do that, we need to familiarize ourselves with some data structures that will be used in this process.

Data structures used when modifying Inodes

  1. High level XFS transaction

    A high level XFS transaction is represented by struct xfs_trans:

    struct xfs_trans {
        unsigned int        t_magic;
        ...
        struct list_head    t_items;
    };

    The list at xfs_trans->t_items is used to collect all metadata items that were modified by the high level transaction.

  2. Committed Item List

    The CIL is represented by struct xfs_cil:

    struct xfs_cil {
        ...
        struct list_head    xc_cil;
        ...
    };

    The list at xfs_cil->xc_cil links all the metadata items that have been committed by high level transactions to the CIL.

  3. Active Item List

    The AIL is represented by struct xfs_ail:

    struct xfs_ail {
        ...
        struct list_head  ail_head;
        ...
    };

    The list at xfs_ail->ail_head links all the metadata items that have been moved from the CIL and have been written to the on-disk log.

  4. XFS disk inode

    The on-disk XFS inode is represented by struct xfs_dinode:

    struct xfs_dinode {
        __be16        di_magic;
        __be16        di_mode;
        ...
        xfs_timestamp_t    di_atime;    /* time last accessed */
        xfs_timestamp_t    di_mtime;    /* time last modified */
        xfs_timestamp_t    di_ctime;    /* time created/inode modified */
        ...
    };
  5. The information held in an on-disk inode is distributed across the following two in-core structures.

    1. The XFS in-core inode is represented by struct xfs_inode:

      struct xfs_inode {
          ...
          xfs_ino_t        i_ino;
          ...
          struct xfs_inode_log_item *i_itemp;
          ...
          /* VFS inode */
          struct inode            i_vnode;
      };
    2. The VFS inode is represented by struct inode:

      struct inode {
          ...
          unsigned long           i_ino;
          ...
          struct timespec64       i_atime;
          struct timespec64       i_mtime;
          struct timespec64       i_ctime;
          ...
      };

      As depicted, the timestamps associated with a file are stored in the VFS inode structure.

  6. struct xfs_inode_log_item is used to track the fields of the in-core inode that have been modified and hence need to be written to the on-disk log:

    struct xfs_inode_log_item {
        struct xfs_log_item     ili_item;
        struct xfs_inode        *ili_inode;
        ...
        unsigned int            ili_fields;
        ...
    };

    The struct xfs_log_item (described below) links the inode item with other modified metadata items in lists which are used during transaction processing.

  7. struct xfs_log_item is used to link modified metadata items in lists owned by high level transactions, the CIL and the AIL:

    struct xfs_log_item {
        struct list_head        li_ail;
        struct list_head        li_trans;
        ...
        uint                    li_type;
        struct list_head        li_cil;
        struct xfs_log_vec      *li_lv;
    };
    • li_type indicates the type of the metadata instance being tracked by a log item. For example, a log item tracking an inode will have its li_type field set to XFS_LI_INODE.
    • li_trans links the log item to the list of metadata items modified by a high level transaction. The head of the list is present at xfs_trans->t_items.
    • li_cil links the log item to the list of metadata items which have been added into the CIL. The head of the list is present at xfs_cil->xc_cil.
    • li_ail links the log item to the list of metadata items which have been added into the AIL. The head of the list is present at xfs_ail->ail_head.
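Because each struct xfs_log_item embeds its own list links, the same log item can be threaded onto different lists without any extra allocation, and the enclosing structure can be recovered from a list link. The sketch below shows the idea in userspace C. It is a simplified model under stated assumptions: the toy_* structures keep only a few fields, the list helpers are minimal stand-ins for the kernel's list API, and TOY_LI_INODE is a placeholder value, not the kernel's XFS_LI_INODE.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_to_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

static void list_unlink(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    list_init(node);
}

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define TOY_LI_INODE 1u  /* placeholder type tag, not the kernel's value */

struct toy_log_item {
    struct list_head li_ail;    /* link on the AIL */
    struct list_head li_trans;  /* link on a transaction's item list */
    unsigned int     li_type;
    struct list_head li_cil;    /* link on the CIL */
};

struct toy_inode_log_item {
    struct toy_log_item ili_item;
    unsigned int        ili_fields;
};

static void toy_log_item_init(struct toy_log_item *lip, unsigned int type)
{
    assert(lip != NULL);
    list_init(&lip->li_ail);
    list_init(&lip->li_trans);
    list_init(&lip->li_cil);
    lip->li_type = type;
}

/* Commit: unlink the item from its transaction and link it into the CIL. */
static void toy_commit_item(struct toy_log_item *lip, struct list_head *cil)
{
    list_unlink(&lip->li_trans);
    list_add_to_tail(&lip->li_cil, cil);
}

/* Return the first log item on a CIL-style list, or NULL if it is empty. */
static struct toy_log_item *toy_first_cil_item(struct list_head *cil)
{
    if (list_is_empty(cil))
        return NULL;
    return toy_container_of(cil->next, struct toy_log_item, li_cil);
}
```

Applying toy_container_of() a second time, with struct toy_inode_log_item and ili_item, gets from the generic log item back to the inode log item. This is the same pattern that lets li_type tell a list walker which enclosing structure it is looking at.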

Updating an Inode’s modification timestamp

High level transaction

The command to change the modification timestamp of the inode causes the following events to occur inside the kernel:

  1. xfs_setattr_nonsize() does the following:
    • Allocate a high level transaction (i.e. struct xfs_trans).
    • Allocate an inode log item (i.e. struct xfs_inode_log_item) and add the corresponding log item (i.e. xfs_inode_log_item->ili_item) to the list at xfs_trans->t_items.
  2. setattr_copy() copies the modification and change timestamps passed by the userspace process into the in-core VFS inode.
  3. The modification timestamp is part of the core on-disk Inode. The fact that the core inode has been modified is recorded in xfs_inode_log_item->ili_fields.
  4. The high level transaction finishes its execution by moving all the log items accumulated thus far (which includes the log item representing the inode) into the CIL.
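Steps 2 and 3 above amount to updating the in-core timestamps and setting a bit that says which part of the inode is dirty. A minimal userspace sketch of that bookkeeping follows. The toy_* names and flag values are assumptions of this sketch (TOY_ILOG_CORE is only modeled on the kernel's core-fields flag), and the single dirty_fields member stands in for xfs_inode_log_item->ili_fields.

```c
#include <assert.h>
#include <time.h>

/* Illustrative dirty-field flags; names and values are assumptions. */
#define TOY_ILOG_CORE  0x1u  /* core fields (timestamps, mode, ...) changed */
#define TOY_ILOG_OTHER 0x2u  /* hypothetical: some other part changed */

struct toy_inode {
    struct timespec i_mtime;
    struct timespec i_ctime;
    unsigned int    dirty_fields;  /* plays the role of ili_fields */
};

/*
 * Copy the caller-supplied timestamps into the in-core inode and record
 * that the core of the inode now needs to be logged.
 */
static void toy_setattr_times(struct toy_inode *ip,
                              struct timespec new_mtime,
                              struct timespec new_ctime)
{
    assert(ip != NULL);
    ip->i_mtime = new_mtime;
    ip->i_ctime = new_ctime;
    ip->dirty_fields |= TOY_ILOG_CORE;
}

/* Non-zero when the core fields have been modified since the last reset. */
static int toy_core_is_dirty(const struct toy_inode *ip)
{
    return (ip->dirty_fields & TOY_ILOG_CORE) != 0;
}
```

Using a bitmask rather than a single boolean lets later stages log only the parts of the inode that actually changed, which is the role the article gives to ili_fields.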

Checkpoint transaction

The checkpoint transaction performs the following actions on the Inode log item present in the CIL:

  1. The value of xfs_inode_log_item->ili_fields indicates that the core fields of the inode are to be logged. Hence, XFS copies the contents of the core fields (which include the modification timestamp) into struct xfs_log_dinode.

    struct xfs_log_dinode {
        uint16_t                di_magic;
        uint16_t                di_mode;
        ...
        xfs_log_timestamp_t     di_atime;
        xfs_log_timestamp_t     di_mtime;
        xfs_log_timestamp_t     di_ctime;
        ...
    };
  2. The contents of struct xfs_log_dinode are then written to the on-disk log along with other required bookkeeping information. The metadata thus written to the on-disk log will be used if a filesystem has to be recovered after an abnormal shutdown.
  3. The inode log item is moved to the AIL list at xfs_ail->ail_head after the write to the on-disk log is completed.
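The formatting step can be sketched as "if the core is dirty, snapshot the core fields into a log-format structure and copy that into the log buffer". The code below is a simplified userspace model: the toy_* structures contain only a handful of fields, the buffer is a plain byte array rather than a real log vector, and the magic constant is included only to make the snapshot recognizable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ILOG_CORE    0x1u     /* illustrative flag, see lead-in */
#define TOY_DINODE_MAGIC 0x494e   /* ASCII "IN", used here as a marker */

/* Hypothetical, heavily reduced in-core inode. */
struct toy_incore_inode {
    uint16_t     mode;
    int64_t      mtime_sec;
    uint32_t     mtime_nsec;
    unsigned int dirty_fields;
};

/* Reduced stand-in for the log-format inode core (host byte order). */
struct toy_log_dinode {
    uint16_t di_magic;
    uint16_t di_mode;
    int64_t  di_mtime_sec;
    uint32_t di_mtime_nsec;
};

/*
 * Format the inode core into logbuf. Returns the number of bytes written,
 * or 0 when the core is not dirty or the buffer is too small.
 */
static size_t toy_format_inode(const struct toy_incore_inode *ip,
                               unsigned char *logbuf, size_t buflen)
{
    struct toy_log_dinode ld;

    assert(ip != NULL && logbuf != NULL);
    if (!(ip->dirty_fields & TOY_ILOG_CORE) || buflen < sizeof(ld))
        return 0;

    memset(&ld, 0, sizeof(ld));   /* keep padding bytes deterministic */
    ld.di_magic      = TOY_DINODE_MAGIC;
    ld.di_mode       = ip->mode;
    ld.di_mtime_sec  = ip->mtime_sec;
    ld.di_mtime_nsec = ip->mtime_nsec;

    memcpy(logbuf, &ld, sizeof(ld));
    return sizeof(ld);
}
```

Snapshotting into a separate structure means the in-core inode can keep changing after the copy is taken, while the bytes destined for the log stay stable.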

Xfsaild

Xfsaild performs the following actions on the Inode log item present in the AIL:

  • The relevant contents of struct xfs_inode and struct inode are copied over to a struct xfs_dinode. The members of the in-core inode structures to be copied are decided based on the value of xfs_inode_log_item->ili_fields.
  • The updated struct xfs_dinode is written to the original location of the inode on the disk.
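The on-disk inode shown earlier declares its fields with big-endian types such as __be16, so the copy from the in-core structures involves a byte-order conversion. The following userspace sketch illustrates that conversion with a made-up 12-byte layout. The offsets, the toy_* names and the decision to clear the dirty flag after the copy are assumptions of this sketch and do not describe the real struct xfs_dinode layout or the kernel's flush logic.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ILOG_CORE   0x1u  /* illustrative dirty flag */
#define TOY_DINODE_SIZE 12    /* size of the made-up on-disk layout */

/* Store integers most significant byte first (big-endian). */
static void put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xff);
    p[2] = (unsigned char)((v >> 8) & 0xff);
    p[3] = (unsigned char)(v & 0xff);
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical, heavily reduced in-core inode. */
struct toy_incore_inode {
    uint16_t     mode;
    uint32_t     mtime_sec;
    uint32_t     mtime_nsec;
    unsigned int dirty_fields;
};

/*
 * Toy layout: bytes 0-1 magic, 2-3 mode, 4-7 mtime seconds,
 * 8-11 mtime nanoseconds. Returns 1 if the buffer was updated,
 * 0 if the core was not dirty and nothing was copied.
 */
static int toy_flush_inode(struct toy_incore_inode *ip,
                           unsigned char disk[TOY_DINODE_SIZE])
{
    assert(ip != 0 && disk != 0);
    if (!(ip->dirty_fields & TOY_ILOG_CORE))
        return 0;

    put_be16(disk, 0x494e);            /* ASCII "IN" as a marker */
    put_be16(disk + 2, ip->mode);
    put_be32(disk + 4, ip->mtime_sec);
    put_be32(disk + 8, ip->mtime_nsec);

    ip->dirty_fields &= ~TOY_ILOG_CORE;  /* sketch-only choice */
    return 1;
}
```

Writing bytes explicitly, instead of copying a host-order structure, keeps the buffer contents identical regardless of the CPU's native byte order.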

Conclusion

As mentioned at the beginning, this article provides a high level overview of XFS’ transaction mechanism. I hope it serves as a stepping stone towards understanding the more detailed design document, XFS Logging Design.

Chandan Babu

