Introduction

The "Introduction to XFS Transaction Mechanism" blog post provided an overview of Checkpoint transactions. The metadata tracked by these transactions ends up in the on-disk log. This article describes the on-disk layout of checkpoint transactions.

On-disk log layout

The on-disk log’s contents can be visualized as a sequence of log records.

img

The Tail and Head pointers demarcate the boundaries of the active region of the log (shown in blue). The Head of the log is where new log records are written. The Tail of the log points to the oldest log record whose metadata items have not yet been written to their final locations on disk. The Tail moves forward when the metadata in that oldest log record is written back.

A filesystem operation which modifies an inode will cause the modified parts of the inode to be written into the new log record LRq+1. The Head pointer will then point to immediately after LRq+1.

img

Assuming the log record LRp contained modifications made to the superblock, updating the corresponding on-disk superblock with these modifications will move the tail to point to LRp+1.

img

The Head wraps around to the start of the on-disk log when it reaches the end.

img

Each of the Head and Tail pointers is composed of two values:

  • Cycle number of the log
  • Offset inside the on-disk log.

The cycle number component is incremented when the Head pointer wraps around the log.
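The two-part pointer can be sketched in a few lines. This is a minimal model, not kernel code: it assumes the cycle number sits in the upper 32 bits of a 64-bit value and the offset (counted in log blocks) in the lower 32 bits. The function names and the `log_blocks` parameter are made up for illustration.

```python
from typing import Tuple

CYCLE_SHIFT = 32          # assumption: cycle in the upper 32 bits
BLOCK_MASK = 0xFFFFFFFF   # assumption: block offset in the lower 32 bits


def make_lsn(cycle: int, block: int) -> int:
    """Pack a (cycle, block offset) pair into one 64-bit value."""
    return (cycle << CYCLE_SHIFT) | (block & BLOCK_MASK)


def split_lsn(lsn: int) -> Tuple[int, int]:
    """Unpack a 64-bit value into (cycle, block offset)."""
    return lsn >> CYCLE_SHIFT, lsn & BLOCK_MASK


def advance_head(cycle: int, block: int, nblocks: int,
                 log_blocks: int) -> Tuple[int, int]:
    """Move the Head forward by nblocks (hypothetical helper).

    When the Head runs past the end of the log it wraps to the start
    and the cycle number is incremented. Assumes nblocks <= log_blocks.
    """
    block += nblocks
    if block >= log_blocks:
        block -= log_blocks
        cycle += 1
    return cycle, block
```

For example, with a 1024-block log, `advance_head(3, 1000, 48, 1024)` returns `(4, 24)`. Because the cycle occupies the high bits, a packed value from a later cycle always compares greater than one from an earlier cycle, whatever the offsets are.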

Among other things, each log record contains a subset of the fields of one or more metadata items. Log records vary in size, depending on the number and size of the metadata items modified.

Log record structure

The log record starts with a log record header. This is followed by a sequence of Operation headers, each of which may be followed by a payload containing the metadata (e.g. an inode) that has been modified.

img

Log record header

The log record header is represented by struct xlog_rec_header. The following are some of the important fields of the header:

  • h_magicno Set to XLOG_HEADER_MAGIC_NUM.
  • h_len The length of the log record.
  • h_lsn The Log Sequence Number of the log record. It indicates the location in the on-disk log where the log record starts. Similar to the Tail and Head pointers, this field is composed of Cycle number and an offset.
  • h_tail_lsn The value of the Tail pointer at the time of writing this log record to the on-disk log.
  • h_num_logops The number of Operation headers in the log record.
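As a rough illustration, the fields listed above could be pulled out of a raw header with Python's struct module. The byte layout used here is an assumption based on the kernel's on-disk log format header and should be checked against xfs_log_format.h for the kernel in question: big-endian h_magicno, h_cycle, h_version and h_len (32 bits each), then h_lsn and h_tail_lsn (64 bits each), a 4-byte checksum that this sketch skips, then h_prev_block and h_num_logops. The magic value 0xFEEDBABE and the name `parse_rec_header` are likewise assumptions for this sketch.

```python
import struct

# Assumed value of XLOG_HEADER_MAGIC_NUM.
XLOG_HEADER_MAGIC_NUM = 0xFEEDBABE

# Assumed layout of the leading part of struct xlog_rec_header:
#   4 x be32 : h_magicno, h_cycle, h_version, h_len
#   2 x be64 : h_lsn, h_tail_lsn
#   4 bytes  : checksum (skipped here)
#   2 x be32 : h_prev_block, h_num_logops
_REC_HDR = struct.Struct(">IIIIQQ4xII")


def parse_rec_header(buf: bytes) -> dict:
    """Decode the leading fields of a log record header (sketch)."""
    (magic, cycle, version, length,
     lsn, tail_lsn, prev_block, num_logops) = _REC_HDR.unpack_from(buf)
    if magic != XLOG_HEADER_MAGIC_NUM:
        raise ValueError("not a log record header: magic 0x%08x" % magic)
    return {
        "h_len": length,
        # Both LSNs are split into (cycle, offset) pairs.
        "h_lsn": (lsn >> 32, lsn & 0xFFFFFFFF),
        "h_tail_lsn": (tail_lsn >> 32, tail_lsn & 0xFFFFFFFF),
        "h_num_logops": num_logops,
    }
```

Splitting h_lsn and h_tail_lsn into pairs mirrors the description above: each is a cycle number plus an offset into the on-disk log.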

Checkpoint transactions

A Checkpoint transaction consists of a sequence of Operation headers, each optionally followed by its metadata payload. Checkpoint transactions can be laid out in the on-disk log in many ways. The following list illustrates a subset of the possible cases.

  1. One checkpoint transaction fits exactly in a single log record.

    img

  2. One checkpoint transaction is spread across several log records.

    img

    The value of h_num_logops in each log record will be set to the number of operation headers in that log record rather than the total number of operation headers in the checkpoint transaction.

  3. Multiple checkpoint transactions are written inside a single log record.

    img

    The value of h_num_logops will be set to the total number of operation headers across all the checkpoint transactions.
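The difference between the per-record count and the per-checkpoint count can be shown with a toy model. Here each log record is just a list holding the transaction ID of every operation header it contains. The function name `summarize` and this representation are invented for the example and do not correspond to any XFS data structure.

```python
from collections import defaultdict


def summarize(records):
    """Count operation headers per log record and per checkpoint.

    records: list of log records; each record is a list with one
    checkpoint transaction ID per operation header (toy model).
    """
    # What h_num_logops would hold for each log record.
    per_record = [len(ops) for ops in records]

    # Total operation headers belonging to each checkpoint transaction.
    per_checkpoint = defaultdict(int)
    for ops in records:
        for tid in ops:
            per_checkpoint[tid] += 1
    return per_record, dict(per_checkpoint)


# Case 2: checkpoint 1 spans two log records.
print(summarize([[1, 1, 1], [1, 1]]))   # ([3, 2], {1: 5})

# Case 3: checkpoints 1 and 2 share a single log record.
print(summarize([[1, 1, 2, 2, 2]]))     # ([5], {1: 2, 2: 3})
```

In case 2 no single log record carries the checkpoint's total of five. In case 3 the one log record carries the combined total for both checkpoints.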

Log Operation header

An Operation header describes the metadata that is logged immediately after it. The operation header is represented by struct xlog_op_header. The following are some of the important fields of this structure:

  • oh_tid The ID of the checkpoint transaction.
  • oh_len The length of the metadata payload.
  • oh_flags Valid values for this field are:
    • XLOG_START_TRANS Indicates that this is a start record.
    • XLOG_COMMIT_TRANS Indicates that the entire checkpoint transaction has been safely written to the on-disk log.
    • XLOG_CONTINUE_TRANS In the case where a checkpoint transaction is spread across multiple log records, the last metadata of a log record can be partially written. The corresponding operation header will have the continue flag set to indicate that the remaining part of the metadata can be obtained from the next log record in the log.
    • XLOG_WAS_CONT_TRANS This flag indicates that the operation header’s payload holds the remaining part of the metadata whose initial content was written at the end of the previous log record.
    • XLOG_END_TRANS This flag is set when the payload contains all of the remaining metadata whose initial content was written at the end of the previous log record.
    • XLOG_UNMOUNT_TRANS An operation header with this flag set is written to indicate that the filesystem has been unmounted cleanly.
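The sketch below walks the operation headers in the body of a log record and decodes the flags. It rests on several assumptions that should be verified against xfs_log_format.h: a 12-byte header made of a big-endian 32-bit oh_tid, a big-endian 32-bit oh_len, a one-byte client ID, the one-byte oh_flags, and two bytes of padding; and the flag bit values shown. It also assumes the buffer holds the record body exactly as the operation headers were laid out, with any per-sector bookkeeping the log applies on disk already undone. `walk_ops` and `flag_names` are hypothetical helpers.

```python
import struct

# Assumed bit values for oh_flags.
FLAGS = [
    (0x01, "XLOG_START_TRANS"),
    (0x02, "XLOG_COMMIT_TRANS"),
    (0x04, "XLOG_CONTINUE_TRANS"),
    (0x08, "XLOG_WAS_CONT_TRANS"),
    (0x10, "XLOG_END_TRANS"),
    (0x20, "XLOG_UNMOUNT_TRANS"),
]

# Assumed layout: oh_tid, oh_len (be32 each), client ID and oh_flags
# (one byte each), followed by two bytes of padding.
_OP_HDR = struct.Struct(">IIBB2x")


def flag_names(flags: int) -> list:
    """Return the names of all flag bits set in oh_flags."""
    return [name for bit, name in FLAGS if flags & bit]


def walk_ops(buf: bytes, num_logops: int) -> list:
    """Return (oh_tid, flag names, payload) for each operation header."""
    ops = []
    off = 0
    for _ in range(num_logops):
        tid, length, _clientid, flags = _OP_HDR.unpack_from(buf, off)
        off += _OP_HDR.size
        payload = buf[off:off + length]   # oh_len bytes of metadata
        off += length
        ops.append((tid, flag_names(flags), payload))
    return ops
```

The loop count comes from h_num_logops in the log record header, which is why that field counts the operation headers in the record rather than in the whole checkpoint transaction.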

The following operation headers of a checkpoint transaction have special significance:

  • Zeroth operation header

    This indicates the beginning of a new checkpoint transaction.

  • First operation header

    The payload of this operation header holds the following information about the checkpoint transaction:

    1. Transaction ID
    2. Number of metadata items being logged.

  • Last operation header

    This indicates the end of the checkpoint transaction.
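The three special operation headers can be checked with a short sketch. It takes one checkpoint's operations as a list of (flags, payload) pairs in log order. The flag values are assumptions. So is the payload layout of the first operation header, taken here to be four 32-bit fields in the byte order of a little-endian host, with the transaction ID and item count as the third and fourth fields. Verify both against the kernel sources before relying on them. The function name `describe_checkpoint` is made up for this example.

```python
import struct

XLOG_START_TRANS = 0x01    # assumed flag value
XLOG_COMMIT_TRANS = 0x02   # assumed flag value


def describe_checkpoint(ops):
    """Validate the start/commit markers and read the transaction header.

    ops: list of (oh_flags, payload) tuples for one checkpoint, in the
    order they appear in the log.
    """
    if len(ops) < 3:
        raise ValueError("need start, transaction header and commit")
    if not ops[0][0] & XLOG_START_TRANS:
        raise ValueError("zeroth operation header is not a start record")
    if not ops[-1][0] & XLOG_COMMIT_TRANS:
        raise ValueError("last operation header is not a commit record")

    # Assumed payload layout: two 32-bit fields to skip (8 bytes), then
    # a signed 32-bit transaction ID and an unsigned 32-bit item count.
    tid, num_items = struct.unpack_from("<iI", ops[1][1], 8)
    return {"tid": tid, "num_items": num_items}
```

A checkpoint that lacks the closing commit record fails the second check, which matches the role of XLOG_COMMIT_TRANS: it marks the point at which the entire checkpoint transaction is safely in the on-disk log.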

Conclusion

This article provided an overview of the on-disk layout of XFS’ checkpoint transactions. A future article will provide actual examples of metadata being logged.