XFS: Tracking space on a realtime device

April 2, 2024 | 7 minute read

Purpose

XFS supports an optional separate volume, called the realtime device, whose space management aims to reduce the delays involved in tracking free and used disk space.

This article describes the on-disk structures that XFS uses to manage space on a realtime device.

Tracking space on a realtime device

A filesystem with a realtime device can be created using the following command line,

# mkfs.xfs -f  -r rtdev=/dev/sdb1,extsize=4096 /dev/sda1

Here, extsize is the size of a realtime extent, the unit in which space on the realtime device is allocated. It must be a multiple of the filesystem block size, which is generally 4096 bytes.

The realtime extent size is stored in the superblock in units of the number of filesystem blocks that make up a realtime extent.

# xfs_db -c 'sb 0' -c print /dev/sda1
magicnum = 0x58465342
...
rextsize = 1
...

In the above example, the rextsize field of the superblock is set to 1 since a realtime extent is the same size as a filesystem block.
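Converting the mkfs.xfs extsize value (in bytes) into the superblock's rextsize field (in filesystem blocks) is a simple division. The following Python sketch assumes a 4096-byte filesystem block size; the function name is hypothetical and not part of any XFS tool.

```python
def rextsize_from_extsize(extsize_bytes: int, blocksize: int = 4096) -> int:
    """Convert an extsize in bytes into rextsize, i.e. filesystem blocks per realtime extent."""
    # extsize has to be a positive multiple of the filesystem block size.
    if extsize_bytes <= 0 or extsize_bytes % blocksize:
        raise ValueError(f"extsize {extsize_bytes} is not a multiple of {blocksize}")
    return extsize_bytes // blocksize


print(rextsize_from_extsize(4096))    # 1, matching "rextsize = 1" above
print(rextsize_from_extsize(65536))   # 16 for a hypothetical extsize=64k
```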

The filesystem can then be mounted as shown below,

# mount -o rtdev=/dev/sdb1 /dev/sda1 /mnt/scratch/

In order to have XFS store a file’s data on the realtime device, the corresponding inode needs to have the realtime flag enabled. This can be done as shown below,

# xfs_io -R -f -c 'pwrite 0 4k' /mnt/scratch/testfile

The realtime status of the inode can be confirmed by checking for the presence of the r flag in the output of xfs_io’s lsattr command.

# xfs_io -c 'lsattr' /mnt/scratch/testfile
r---------------- /mnt/scratch/testfile

Another way to confirm this is to check for the XFS_DIFLAG_REALTIME flag in the on-disk inode's di_flags field.

# stat -c %i /mnt/scratch/testfile
131
# xfs_db -c 'inode 131' -c 'print core.realtime' /dev/sda1
core.realtime = 1

Please note that the filesystem has to be unmounted before executing an xfs_db command.

Another method to create realtime files is to enable the rtinherit flag on a directory. This will cause all the new files created under such a directory to be realtime files. Also, any newly created sub-directories will inherit the rtinherit flag.

# mkdir /mnt/scratch/realtime-dir
# xfs_io -c 'chattr +t' /mnt/scratch/realtime-dir/
# xfs_io -f -c 'pwrite 0 8k' /mnt/scratch/realtime-dir/regular-file-0.bin
# xfs_io -c 'lsattr' /mnt/scratch/realtime-dir/regular-file-0.bin
r---------------- /mnt/scratch/realtime-dir/regular-file-0.bin
# mkdir /mnt/scratch/realtime-dir/subdirectory
# xfs_io -c 'lsattr' /mnt/scratch/realtime-dir/subdirectory/
-------t--------- /mnt/scratch/realtime-dir/subdirectory/

Space on a realtime device is managed using the contents of two special inodes:

  1. Real-time bitmap inode.
  2. Real-time summary inode.

The inode numbers of these inodes are stored in the following fields of the superblock:

# xfs_db -c 'sb 0' -c print /dev/sda1
magicnum = 0x58465342
...
rbmino = 129
rsumino = 130
...

In the above output, the Real-time bitmap inode has an inode number of 129 and the Real-time summary inode has an inode number of 130.

Real-time bitmap inode

The Real-time bitmap inode can be thought of as a regular file whose contents are interpreted as a sequence of blocks, each block containing a bitmap.

Figure: Real-time bitmap inode's content

Each bit represents the usage status of a realtime extent. For example, bit 0 represents the allocation status of the zeroth realtime extent, bit 1 represents the allocation status of the first realtime extent, and so on. The numbering continues across bitmap blocks, so with 4096-byte blocks (32768 bits each), bit 0 of bitmap block 1 tracks realtime extent 32768. A bit is set when the underlying realtime extent is free; otherwise, the bit is cleared.
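The mapping from a realtime extent number to its bit can be modelled in a few lines of Python. This is a toy, in-memory sketch: the class and function names are hypothetical, it assumes 4096-byte bitmap blocks, and it models only the logical bit numbering, not the on-disk word layout of the bitmap file.

```python
BLOCKSIZE = 4096                  # assumed filesystem block size in bytes
BITS_PER_BLOCK = BLOCKSIZE * 8    # realtime extents tracked by one bitmap block


def locate_bit(rtx, bits_per_block=BITS_PER_BLOCK):
    """Return (bitmap block number, bit index within that block) for realtime extent rtx."""
    return divmod(rtx, bits_per_block)


class ToyRtBitmap:
    """In-memory bitmap: a set bit means the realtime extent is free."""

    def __init__(self, rextents):
        self.bits = (1 << rextents) - 1   # freshly made filesystem: every extent is free

    def is_free(self, rtx):
        return (self.bits >> rtx) & 1 == 1

    def allocate(self, rtx, length):
        # Clear the bits of extents rtx .. rtx + length - 1 (now in use).
        self.bits &= ~(((1 << length) - 1) << rtx)

    def free(self, rtx, length):
        # Set the bits again when the extents are released.
        self.bits |= ((1 << length) - 1) << rtx


print(locate_bit(40000))   # (1, 7232): bit 7232 of bitmap block 1
```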

Real-time summary inode

The contents of the Real-time summary inode can be visualized as a table illustrated below:

Figure: Real-time summary inode's content

  • The rows track free extents by length class: row r counts free extents whose length, in realtime extents, satisfies floor(Log2(length)) = r, i.e. lengths from 2^r to 2^(r+1) - 1. For example, the first row (row 0) tracks free extents exactly 1 realtime extent long, since Log2(1) = 0, while row 1 tracks free extents of length 2 or 3.
  • Each column corresponds to a block of the Real-time bitmap inode. For example, the first column corresponds to the zeroth block of the Real-time bitmap inode.
  • A cell in the table contains the number of free extents of the row's length class that start in the corresponding bitmap block.
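To make the table concrete, the following Python sketch builds it from a bitmap represented as a list of booleans (True meaning free). It is a simplified model, not XFS code: the function names are hypothetical, each free extent is counted in the column of the bitmap block where it starts, and the tiny bits_per_block value in the example only keeps it readable (a real 4096-byte bitmap block tracks 32768 extents).

```python
def free_runs(free_bits):
    """Yield (start, length) for each maximal run of free extents (True = free)."""
    start = None
    for rtx, free in enumerate(free_bits):
        if free and start is None:
            start = rtx                      # a free run begins
        elif not free and start is not None:
            yield start, rtx - start         # the run ended just before rtx
            start = None
    if start is not None:
        yield start, len(free_bits) - start  # run reaching the end of the device


def build_summary(free_bits, bits_per_block):
    """Return summary[row][column]: free extent counts by length class and starting bitmap block."""
    rextents = len(free_bits)
    ncols = -(-rextents // bits_per_block)   # number of bitmap blocks (ceiling division)
    nrows = rextents.bit_length()            # rows 0 .. floor(log2(rextents))
    summary = [[0] * ncols for _ in range(nrows)]
    for start, length in free_runs(free_bits):
        row = length.bit_length() - 1        # floor(log2(length))
        col = start // bits_per_block        # counted in the block where the run starts
        summary[row][col] += 1
    return summary


# 16 extents, 8 per bitmap block; free runs at extents 1-3, 5-12 and 14.
bits = [False, True, True, True, False] + [True] * 8 + [False, True, False]
for row in build_summary(bits, bits_per_block=8):
    print(row)
```

This prints [0, 1], [1, 0], [0, 0], [1, 0] and [0, 0]. Note that the 8-extent run at extents 5-12 spills into bitmap block 1 but is counted only in column 0.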

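The file is actually a sequence of 4-byte integers (free extent counts), with the log level used as the primary index and the bitmap block number used as the secondary index. In other words, the table is stored row by row: all the cells of row 0, followed by all the cells of row 1, and so on. This is illustrated below: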
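Because the cells are stored row by row, the byte offset of a cell is (row * number of bitmap blocks + column) * 4. A minimal Python sketch, with a hypothetical function name, evaluated for a table with 160 columns:

```python
SUMINFO_SIZE = 4  # each cell is a 4-byte free extent counter


def summary_offset(row, column, rbmblocks):
    """Byte offset of cell (row, column) in a row-major summary file with rbmblocks columns."""
    if not 0 <= column < rbmblocks:
        raise ValueError("column must be a valid bitmap block number")
    return (row * rbmblocks + column) * SUMINFO_SIZE


print(hex(summary_offset(22, 0, 160)))   # 0x3700
print(summary_offset(0, 1, 160))         # 4: next cell in row 0
print(summary_offset(1, 0, 160))         # 640: row 1 starts after 160 cells
```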

Figure: Summary inode linear layout

On a freshly created filesystem having a 20GiB realtime device and a realtime extent size of 4096 bytes, the whole device is a single free extent of 20GiB / 4096 = 5242880 realtime extents. The bitmap occupies 20GiB / (4096 bytes/block * 8 bits/byte * 4096 bytes/extent) = 160 blocks, so the summary table has 160 columns, numbered 0 to 159. Since floor(Log2(5242880)) = 22 (Log2(5242880) is about 22.3) and the free extent starts in bitmap block 0, the cell at row 22 and column 0 has the value 1 assigned to it. The remaining cells have a value of 0.

The xfs_db output below shows that the summary inode's data occupies 4 blocks starting at filesystem block 9; these blocks can be copied out with dd and inspected with od:

# xfs_db -c 'inode 130' -c 'print u3' /dev/sda1
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,9,4,0]
# dd if=/dev/sda1 of=rtsumino-blocks.bin skip=9 bs=4k count=4
4+0 records in
4+0 records out
16384 bytes (16 kB, 16 KiB) copied, 0.000544315 s, 30.1 MB/s
# od -A x -t x1 rtsumino-blocks.bin
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
003700 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
003710 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
004000

Row number 22 and column number 0 map to file offset (22 * 160 + 0) * 4 bytes/cell = 14080 = 0x3700, where 160 is the number of columns. This is the only non-zero value in the dump above.
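The geometry arithmetic for this example, and the decoding of a dumped summary file such as rtsumino-blocks.bin, can be sketched in Python. This is a minimal sketch: the function names are hypothetical, the number of rows is taken to be floor(Log2(number of realtime extents)) + 1 (one row per possible length class, which agrees with the 4 summary blocks reported by xfs_db above), and the counters are assumed to be little-endian, as in the od output above where the value 1 appears as 01 00 00 00.

```python
import struct
import sys

GIB = 1 << 30
SUMINFO_SIZE = 4  # bytes per summary cell (free extent counter)


def ceil_div(a, b):
    return -(-a // b)


def rt_geometry(rtdev_bytes, rext_bytes=4096, blocksize=4096):
    """Return (rextents, bitmap blocks, summary rows, summary blocks) for a fresh realtime device."""
    rextents = rtdev_bytes // rext_bytes                 # realtime extents on the device
    rbmblocks = ceil_div(rextents, blocksize * 8)        # one bit per extent -> summary columns
    rsumlevels = rextents.bit_length()                   # rows 0 .. floor(log2(rextents))
    rsumblocks = ceil_div(rsumlevels * rbmblocks * SUMINFO_SIZE, blocksize)
    return rextents, rbmblocks, rsumlevels, rsumblocks


def nonzero_cells(data, rbmblocks):
    """Yield (row, column, count) for every non-zero counter in the summary file data."""
    ncells = len(data) // SUMINFO_SIZE
    counts = struct.unpack(f"<{ncells}I", data[: ncells * SUMINFO_SIZE])  # little-endian u32
    for index, count in enumerate(counts):
        if count:
            row, column = divmod(index, rbmblocks)        # row-major: row * rbmblocks + column
            yield row, column, count


print(rt_geometry(20 * GIB))   # (5242880, 160, 23, 4)

if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 rtsum.py rtsumino-blocks.bin 160
    with open(sys.argv[1], "rb") as f:
        for row, column, count in nonzero_cells(f.read(), int(sys.argv[2])):
            print(f"row {row}, column {column}: {count} free extent(s)")
```

Run against the rtsumino-blocks.bin file produced by dd above with 160 columns, it should report a single non-zero cell at row 22, column 0.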

Allocation of space

Higher level XFS code can request free space on a realtime device based on the size of the free space and/or the locality of the free space. The following is a brief description of how the previously described ondisk structures are used to service such space allocation requests.

  • Request for free space that satisfies a specific size. In such a request, the caller provides the range of the required free space length i.e. [Minimum length, Maximum length]. The following steps are executed by the allocator:

    1. Search the table in the Real-time summary inode's data blocks for a cell with a non-zero summary value, starting from row Log2(Maximum length) and continuing through the higher rows, which track longer free extents. The range of cells that can be searched is illustrated below,

      Figure: Summary inode max search range

    2. If no non-zero summary value is found, search for one in the cells of the rows from Log2(Maximum length) down to Log2(Minimum length). The range of cells that can be searched is illustrated below,

      Figure: Summary inode min/max search range

    The allocator then reads the corresponding bitmap block (and the blocks immediately following it, if required) to allocate the requested space. The bits corresponding to the newly allocated space are then cleared. The summary inode table is also updated: the count for the free extent that was used is decremented, and any free space left over on either side of the allocation is counted as new, shorter free extents.

  • Request for free space that satisfies a specific size and locality. Apart from the range of the required free space length, the caller provides a realtime extent number around which the free space has to be allocated. The following steps are executed by the allocator:

    1. The allocator checks for the required free space at the realtime extent number provided by the caller. The extent is returned if it is indeed free.

    2. Otherwise, the allocator looks for a non-zero realtime summary value in the rows between Log2(Minimum length) and Log2(Maximum length). The column search starts at the bitmap block that maps the realtime extent number passed as an argument to the allocator. If no non-zero value is found in that column, the allocator checks the next column and then the previous column. It continues the search by alternately checking columns further after and further before the initial column.

      The following diagram illustrates the search order.

      Figure: Summary inode min/max locality search

      If the realtime extent number is tracked by the bits in Bitmap block number X, the allocator starts by checking for a non-zero realtime summary value in the cells belonging to the Xth column (shaded using the Blue colour). It then proceeds to incrementally search for a non-zero realtime summary value in columns X+1 and X-1 (shaded using the Red colour) and so on.

    In both cases, on finding a cell with a non-zero summary value, the allocator reads the selected bitmap blocks and clears the bits corresponding to the allocated extents. The corresponding cells in the summary inode table are also updated to reflect the change in free extent counts.
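The two search orders described above can be sketched in Python over an in-memory summary table (summary[row][column]). This is a simplified model of the description in this article, not the kernel's allocator: the function names are hypothetical, minlen and maxlen are assumed to be at least 1 realtime extent, and the sketch only selects candidate bitmap blocks; scanning their bits, allocating, and updating the bitmap and summary are left out.

```python
def size_candidates(summary, minlen, maxlen):
    """Yield (row, column) cells with non-zero counts in the size-based search order."""
    nrows, ncols = len(summary), len(summary[0])
    hi = maxlen.bit_length() - 1          # floor(log2(maxlen))
    lo = minlen.bit_length() - 1          # floor(log2(minlen))
    # Step 1: row log2(maxlen) and every higher row (longer free extents).
    for row in range(hi, nrows):
        for col in range(ncols):
            if summary[row][col]:
                yield row, col
    # Step 2: rows from log2(maxlen) down to log2(minlen).
    for row in range(min(hi, nrows - 1), lo - 1, -1):
        for col in range(ncols):
            if summary[row][col]:
                yield row, col


def near_column_order(x, ncols):
    """Column visiting order for the locality search: X, X+1, X-1, X+2, X-2, ..."""
    yield x
    for d in range(1, ncols):
        for col in (x + d, x - d):
            if 0 <= col < ncols:          # skip columns past either end of the table
                yield col


def near_search(summary, minlen, maxlen, x):
    """Return the column nearest to x with a non-zero cell in rows log2(minlen)..log2(maxlen)."""
    lo = minlen.bit_length() - 1
    hi = min(maxlen.bit_length() - 1, len(summary) - 1)
    for col in near_column_order(x, len(summary[0])):
        if any(summary[row][col] for row in range(lo, hi + 1)):
            return col
    return None


# Toy table: 5 rows (length classes) x 2 columns (bitmap blocks).
table = [[0, 1], [1, 0], [0, 0], [1, 0], [0, 0]]
print(list(size_candidates(table, minlen=2, maxlen=4)))   # [(3, 0), (1, 0)]
print(list(near_column_order(2, 5)))                      # [2, 3, 1, 4, 0]
print(near_search(table, minlen=2, maxlen=3, x=1))        # 0
```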

Conclusion

As noted in the XFS filesystem structure document, the on-disk structures associated with the realtime device are not space efficient. However, they allow free space on the realtime device to be located and allocated quickly.

Chandan Babu

