This article provides an introduction to XFS’ transaction mechanism and to the lifecycle of metadata, from being modified to being written back to disk. To illustrate this, the article considers the use case of changing an inode’s modification timestamp. Note that the article is based on XFS functionality as defined in the v5.19 Linux kernel.
The figure below illustrates the process of modifying metadata and writing them to the on-disk log and then to the regular filesystem space.
Modifying and updating metadata contents consists of three steps:
Xfsaild
The kernel thread Xfsaild wakes up at regular intervals to process in-core metadata that were added to the list at xfs_ail->ail_head. It copies the contents of the fields of modified in-core metadata structures over to the on-disk structures. The updated on-disk structures are then written to the disk.
We now take a closer look at XFS’ transaction processing mechanism introduced above. We will do this by considering the modifications made to an inode when the modification time of a file is changed via:
touch -m /mnt/testfile
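For readers who prefer to trigger the same change from a program, here is a minimal sketch using the POSIX utimensat(2) call. The helper name touch_mtime() is made up for this example, and the sketch only assumes that the path exists; it says nothing about how the touch utility itself is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>   /* utimensat(), UTIME_NOW, UTIME_OMIT */
#include <time.h>       /* struct timespec */

/*
 * Hypothetical helper: set only the modification time of `path` to the
 * current time and leave the access time untouched.
 * Returns 0 on success, -1 on error (errno is set by utimensat()).
 */
static int touch_mtime(const char *path)
{
    struct timespec times[2];

    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT;  /* times[0] is atime: do not change */
    times[1].tv_sec = 0;
    times[1].tv_nsec = UTIME_NOW;   /* times[1] is mtime: use current time */

    return utimensat(AT_FDCWD, path, times, 0);
}
```

Calling touch_mtime("/mnt/testfile") on a file that lives on an XFS filesystem should lead the kernel down the same inode attribute change path that the rest of this article walks through.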
However, before we do that, we need to familiarize ourselves with some data structures that will be used in this process.
High level XFS transaction
A high level XFS transaction is represented by struct xfs_trans:
struct xfs_trans {
    unsigned int        t_magic;
    ...
    struct list_head    t_items;
};
The list at xfs_trans->t_items is used to collect all metadata items that were modified by the high level transaction.
Committed Item List
The CIL is represented by struct xfs_cil:
struct xfs_cil {
    ...
    struct list_head    xc_cil;
    ...
};
The list at xfs_cil->xc_cil links all the metadata items that have been committed by high level transactions to the CIL.
Active Item List
The AIL is represented by struct xfs_ail:
struct xfs_ail {
    ...
    struct list_head    ail_head;
    ...
};
The list at xfs_ail->ail_head links all the metadata items that have been moved from the CIL and have been written to the on-disk log.
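To make the relationship between the three lists more concrete, here is a small userspace sketch of a metadata item travelling from a transaction's list to the CIL and then to the AIL. It is a simplified model, not kernel code: struct list_head is re-implemented in a few lines, and struct mock_item, struct mock_fs and the mock_*() functions are names invented for this example.

```c
#include <assert.h>

/* Minimal userspace stand-in for the kernel's circular struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Append node at the tail of the list anchored at head. */
static void list_append(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node and leave it pointing at itself, i.e. on no list. */
static void list_unlink(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    list_init(node);
}

/* Hypothetical metadata item: one hook per list it can be linked into. */
struct mock_item {
    struct list_head on_trans;
    struct list_head on_cil;
    struct list_head on_ail;
};

/* Hypothetical container for the three list heads described above. */
struct mock_fs {
    struct list_head t_items;   /* items modified by a transaction */
    struct list_head xc_cil;    /* items committed to the CIL */
    struct list_head ail_head;  /* items written to the on-disk log */
};

static void mock_fs_init(struct mock_fs *fs)
{
    list_init(&fs->t_items);
    list_init(&fs->xc_cil);
    list_init(&fs->ail_head);
}

static void mock_item_init(struct mock_item *item)
{
    list_init(&item->on_trans);
    list_init(&item->on_cil);
    list_init(&item->on_ail);
}

/* Step 1: a transaction modifies the item. */
static void mock_modify(struct mock_fs *fs, struct mock_item *item)
{
    list_append(&item->on_trans, &fs->t_items);
}

/* Step 2: the transaction commits, handing the item over to the CIL. */
static void mock_commit(struct mock_fs *fs, struct mock_item *item)
{
    list_unlink(&item->on_trans);
    list_append(&item->on_cil, &fs->xc_cil);
}

/* Step 3: the item reached the on-disk log, so it moves to the AIL. */
static void mock_log_written(struct mock_fs *fs, struct mock_item *item)
{
    list_unlink(&item->on_cil);
    list_append(&item->on_ail, &fs->ail_head);
}
```

The item owns a separate hook for each list, which is why it can be unlinked from one list and linked into the next without any allocation. The real kernel structures carry far more state (locking, sequence numbers and so on) that this sketch leaves out.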
XFS disk inode
The on-disk XFS inode is represented by struct xfs_dinode:
struct xfs_dinode {
    __be16          di_magic;
    __be16          di_mode;
    ...
    xfs_timestamp_t di_atime;   /* time last accessed */
    xfs_timestamp_t di_mtime;   /* time last modified */
    xfs_timestamp_t di_ctime;   /* time created/inode modified */
    ...
};
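The __be16 type in the structure above marks a 16-bit field that is stored in big-endian byte order on disk, regardless of the CPU's native byte order. The following sketch shows what that means for di_magic. The helpers put_be16() and get_be16() are written for this example only; the kernel uses its own conversion helpers instead.

```c
#include <assert.h>
#include <stdint.h>

/* Magic number of an on-disk XFS inode: the ASCII characters "IN". */
#define XFS_DINODE_MAGIC 0x494e

/* Example helper: store v in big-endian byte order (most significant byte first). */
static void put_be16(uint8_t out[2], uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xff);
}

/* Example helper: read a big-endian 16-bit value back into CPU byte order. */
static uint16_t get_be16(const uint8_t in[2])
{
    return (uint16_t)(((unsigned int)in[0] << 8) | in[1]);
}
```

With these helpers, put_be16(buf, XFS_DINODE_MAGIC) leaves the bytes 'I' and 'N' in buf in that order on any host, which is what one would expect to see at the start of an inode when inspecting the raw disk contents.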
The information held in an on-disk inode is distributed across the following two in-core structures.
The XFS in-core inode is represented by struct xfs_inode:
struct xfs_inode {
    ...
    xfs_ino_t                   i_ino;
    ...
    struct xfs_inode_log_item   *i_itemp;
    ...
    /* VFS inode */
    struct inode                i_vnode;
};
The VFS inode is represented by struct inode:
struct inode {
    ...
    unsigned long       i_ino;
    ...
    struct timespec64   i_atime;
    struct timespec64   i_mtime;
    struct timespec64   i_ctime;
    ...
};
As depicted, the timestamps associated with a file are stored in the VFS inode structure.
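Because the VFS inode is embedded inside the XFS inode rather than pointed to, code can convert between the two with simple pointer arithmetic. The sketch below models this with made-up structures (struct mock_vfs_inode, struct mock_xfs_inode) and helpers (mock_vfs_i(), mock_xfs_i()). As far as I know the kernel provides VFS_I() and XFS_I() for this purpose, but the code here is only an illustration of the idea.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Simplified stand-in for struct timespec64. */
struct mock_timespec64 {
    int64_t tv_sec;
    long    tv_nsec;
};

/* Simplified stand-in for the VFS inode: it holds the timestamps. */
struct mock_vfs_inode {
    unsigned long          i_ino;
    struct mock_timespec64 i_mtime;
};

/* Simplified stand-in for the XFS inode: the VFS inode is embedded in it. */
struct mock_xfs_inode {
    uint64_t              i_ino;
    struct mock_vfs_inode i_vnode;
};

/* XFS inode -> VFS inode: take the address of the embedded member. */
static struct mock_vfs_inode *mock_vfs_i(struct mock_xfs_inode *ip)
{
    return &ip->i_vnode;
}

/* VFS inode -> XFS inode: step back by the member's offset (container_of). */
static struct mock_xfs_inode *mock_xfs_i(struct mock_vfs_inode *inode)
{
    return (struct mock_xfs_inode *)
        ((char *)inode - offsetof(struct mock_xfs_inode, i_vnode));
}

/* Updating the modification time only touches the embedded VFS inode. */
static void mock_set_mtime(struct mock_xfs_inode *ip, int64_t sec, long nsec)
{
    struct mock_vfs_inode *inode = mock_vfs_i(ip);

    inode->i_mtime.tv_sec = sec;
    inode->i_mtime.tv_nsec = nsec;
}
```

This layout is why a timestamp update made through the generic VFS code is immediately visible to XFS: both views refer to the same memory.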
struct xfs_inode_log_item is used to track the fields of the in-core inode that have been modified and hence need to be written to the on-disk log:
struct xfs_inode_log_item {
    struct xfs_log_item ili_item;
    struct xfs_inode    *ili_inode;
    ...
    unsigned int        ili_fields;
    ...
};
The struct xfs_log_item (described below) links the inode item with other modified metadata items in lists which are used during transaction processing.
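The ili_fields member works as a bitmask: each bit stands for one part of the inode that has been modified. The following sketch illustrates the idea with invented flag names and values (MOCK_ILOG_CORE, MOCK_ILOG_DDATA). The kernel defines its own XFS_ILOG_* flags, so please do not read the numbers below as the real ones.

```c
#include <assert.h>

/* Hypothetical flags; the values are assumptions made for this example. */
#define MOCK_ILOG_CORE  0x1u    /* core inode fields, e.g. timestamps */
#define MOCK_ILOG_DDATA 0x2u    /* data fork contents */

/* Simplified stand-in for the inode log item. */
struct mock_inode_log_item {
    unsigned int ili_fields;    /* which parts of the inode are dirty */
};

/* Record that the parts named in flags were modified. */
static void mock_log_inode(struct mock_inode_log_item *iip, unsigned int flags)
{
    iip->ili_fields |= flags;
}

/* Check whether the core fields need to be written to the log. */
static int mock_core_is_dirty(const struct mock_inode_log_item *iip)
{
    return (iip->ili_fields & MOCK_ILOG_CORE) != 0;
}
```

Since flags are OR-ed in, several modifications made to the same inode accumulate in one log item instead of overwriting each other.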
struct xfs_log_item is used to link modified metadata items into the lists owned by high level transactions, the CIL and the AIL:
struct xfs_log_item {
    struct list_head    li_ail;
    struct list_head    li_trans;
    ...
    uint                li_type;
    struct list_head    li_cil;
    struct xfs_log_vec  *li_lv;
};
li_type indicates the type of the metadata instance being tracked by a log item. For example, a log item tracking an inode will have its li_type field set to XFS_LI_INODE.
li_trans links the log item to the list of metadata items modified by a high level transaction. The head of the list is present at xfs_trans->t_items.
li_cil links the log item to the list of metadata items which have been added into the CIL. The head of the list is present at xfs_cil->xc_cil.
li_ail links the log item to the list of metadata items which have been added into the AIL. The head of the list is present at xfs_ail->ail_head.
The command to change the modification timestamp of the inode causes the following events to occur inside the kernel:
xfs_setattr_nonsize() does the following:
Allocate a high level transaction (i.e. struct xfs_trans).
Allocate an inode log item (i.e. struct xfs_inode_log_item) and add the corresponding log item (i.e. xfs_inode_log_item->ili_item) to the list at xfs_trans->t_items.
setattr_copy() copies the modification and change timestamps passed by the userspace process.
Record in xfs_inode_log_item->ili_fields that the core fields of the inode were modified.
The checkpoint transaction performs the following actions on the inode log item present in the CIL:
The value of xfs_inode_log_item->ili_fields indicates that the core fields of the inode are to be logged. Hence, XFS copies the contents of the core fields (which include the modification timestamp) into struct xfs_log_dinode.
struct xfs_log_dinode {
    uint16_t            di_magic;
    uint16_t            di_mode;
    ...
    xfs_log_timestamp_t di_atime;
    xfs_log_timestamp_t di_mtime;
    xfs_log_timestamp_t di_ctime;
    ...
};
struct xfs_log_dinode is then written to the on-disk log along with other required bookkeeping information. The metadata thus written to the on-disk log will be used if the filesystem has to be recovered after an abnormal shutdown.
The inode log item is moved to the AIL list at xfs_ail->ail_head after the write to the on-disk log is completed.
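The checkpoint steps above can be modelled in a few lines of userspace C. This is a rough sketch under heavy simplifications: the log is a plain byte buffer, the AIL is reduced to a flag, and every name starting with mock_ or MOCK_ is invented for the example. The real code formats log vectors and deals with log space reservation, none of which is shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_ILOG_CORE 0x1u     /* hypothetical "core fields dirty" flag */

/* Simplified stand-in for the inode core that gets logged. */
struct mock_core {
    uint16_t di_mode;
    int64_t  mtime_sec;
    int32_t  mtime_nsec;
};

/* Simplified in-core inode together with its log item state. */
struct mock_incore_inode {
    struct mock_core core;
    unsigned int     ili_fields;
    int              in_ail;    /* 1 once the item has moved to the AIL */
};

/* Simplified on-disk log: an append-only byte buffer. */
struct mock_log {
    uint8_t buf[256];
    size_t  used;
};

/*
 * Checkpoint one inode: snapshot its core fields, append the snapshot to
 * the log and then mark the item as being in the AIL.
 * Returns the number of bytes logged, 0 if the core is clean,
 * or -1 if the log has no room left.
 */
static int mock_checkpoint_inode(struct mock_incore_inode *ip,
                                 struct mock_log *log)
{
    struct mock_core snapshot;

    if (!(ip->ili_fields & MOCK_ILOG_CORE))
        return 0;

    if (log->used + sizeof(snapshot) > sizeof(log->buf))
        return -1;

    memset(&snapshot, 0, sizeof(snapshot));     /* zero padding bytes too */
    snapshot.di_mode = ip->core.di_mode;
    snapshot.mtime_sec = ip->core.mtime_sec;
    snapshot.mtime_nsec = ip->core.mtime_nsec;

    memcpy(log->buf + log->used, &snapshot, sizeof(snapshot));
    log->used += sizeof(snapshot);

    ip->in_ail = 1;     /* only after the log write has completed */
    return (int)sizeof(snapshot);
}
```

The point of the snapshot is that the log holds a copy of the fields as they were at checkpoint time. The in-core inode can keep changing afterwards without affecting what would be replayed during recovery.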
Xfsaild performs the following actions on the inode log item present in the AIL:
The contents of struct xfs_inode and struct inode are copied over to a struct xfs_dinode. The members of the in-core inode structures to be copied are decided based on the value of xfs_inode_log_item->ili_fields.
The struct xfs_dinode is written to the original location of the inode on the disk.
As mentioned at the beginning, this article tries to provide a high level overview of XFS’ transaction mechanism. I hope that it serves as a stepping stone towards understanding the more detailed design document, XFS Logging design.