fsck (file system consistency check) is a tool used to check a filesystem for errors. Running fsck proactively and periodically lets us keep track of the filesystem's state and catch corruption before it becomes widespread in a system. Current fsck versions require the filesystem to be unmounted, with the exception that they can be run on read-only mounts. In a production environment, unmounting a filesystem requires downtime, and a periodic check would require frequent downtime, which is not practical. Hence the need for a mechanism to check filesystems while they are mounted.

xfs_scrub is an online fsck tool in the works (more details here); however, xfs_scrub is a large project, and we wanted a way to run consistency checks until it is released. In our production environments, we have multiple VMs with vdisks on an XFS filesystem, and we came up with the following solution to run regular scans.

Scanfs enables us to scan filesystems for inconsistencies or corruption without requiring any downtime. This comes in handy if you want to keep an eye on your volumes and scan them at frequent, regular intervals to avoid major data loss. This alternative method does something similar to an online fsck. It has a few limitations and prerequisites:

  • Scanning is done from the host and only guest machine volumes can be scanned

  • Host filesystem containing the guest volume’s .img files should support reflink

    # ls /host-mount
    guest-volume1.img guest-volume2.img
    # xfs_info /host-mount | grep reflink
           =                       reflink=1
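The reflink prerequisite can also be checked from a script. The following is a minimal sketch, not part of scanfs itself: `has_reflink` is a hypothetical helper name, and it assumes the `xfs_info` output contains a `reflink=1` token as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the xfs_info output
# supplied on stdin reports reflink=1, and fails otherwise.
has_reflink() {
    # -w keeps "reflink=1" from matching a longer token; -q suppresses output.
    grep -qw 'reflink=1'
}

# Usage on a real host (the path is the example mount point from above):
#   if xfs_info /host-mount | has_reflink; then
#       echo "reflink is enabled"
#   else
#       echo "reflink is not enabled; scanning this way will not work" >&2
#   fi
```

Reading from stdin keeps the helper independent of where the `xfs_info` output comes from, so it can be tried against saved output as well.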

How does it work?

The scanfs script uses the reflink feature of the host filesystem to take a point-in-time snapshot of the virtual disk (more details on reflink can be found here). The contents of the snapshot are equivalent to the disk of a VM that has crashed. Once the snapshot is taken, the scanfs script maps the virtual disk to a loop device and mounts it. During the mount, the journal is replayed and the filesystem(s) inside the virtual disk are brought to a consistent state. After this, we run the fsck tools associated with the filesystems inside the virtual disks and check for any inconsistencies.

Reflink generates an instant copy of a file by sharing file extents. Take a reflink copy of each guest volume file.

# mkdir reflink-cp
# cp --reflink guest-volume1.img reflink-cp/reflink-guest-volume1.img
# cp --reflink guest-volume2.img reflink-cp/reflink-guest-volume2.img

# ls reflink-cp/
reflink-guest-volume1.img  reflink-guest-volume2.img
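With more than a couple of volumes, the copy step is easier to run as a loop. This is a sketch under assumptions: `snapshot_images`, `run`, and the `DRY_RUN` variable are hypothetical names introduced here, and the `reflink-` prefix simply mirrors the file names used above. It passes `--reflink=always` so that `cp` fails instead of silently falling back to a full data copy.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when
# DRY_RUN is set, so the loop can be previewed safely.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reflink-copy every .img file in SRC_DIR into DEST_DIR, adding a
# "reflink-" prefix to each file name.
snapshot_images() {
    src_dir=$1
    dest_dir=$2
    run mkdir -p "$dest_dir" || return 1
    for img in "$src_dir"/*.img; do
        [ -e "$img" ] || continue    # the glob matched nothing
        run cp --reflink=always "$img" "$dest_dir/reflink-$(basename "$img")" || return 1
    done
}

# Usage (paths are the examples from above):
#   DRY_RUN=1; snapshot_images /host-mount /host-mount/reflink-cp   # preview
#   unset DRY_RUN; snapshot_images /host-mount /host-mount/reflink-cp
```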

Scanning volumes

Now that the reflink copies are done, you have copies of the images that you can set up on the host machine and run fsck on.

# losetup -fv reflink-cp/reflink-guest-volume1.img
# losetup -fv reflink-cp/reflink-guest-volume2.img

# losetup -a
/dev/loop1: [64513]:274710663 (/host-mount/reflink-cp/reflink-guest-volume2.img)
/dev/loop0: [64513]:274710662 (/host-mount/reflink-cp/reflink-guest-volume1.img)
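A script needs to know which loop device ended up backing which image. One way, sketched below, is to look the image up in the `losetup -a` listing. `loop_device_for` is a hypothetical helper name. It assumes the three-field line format shown above, where the backing file in parentheses is the last field, and it does not handle paths containing spaces.

```shell
#!/bin/sh
# Hypothetical helper: given `losetup -a` output on stdin, print the loop
# device whose backing file is IMAGE. Prints nothing if there is no match.
loop_device_for() {
    awk -v img="($1)" '
        $NF == img {            # last field is "(path/to/image)"
            sub(/:$/, "", $1)   # strip the trailing colon from "/dev/loopN:"
            print $1
        }'
}

# Usage on a real host:
#   losetup -a | loop_device_for /host-mount/reflink-cp/reflink-guest-volume1.img
```

An alternative that avoids parsing altogether is `losetup --find --show <image>`, which prints the name of the device it allocates.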

If the volume has partitions, you can use kpartx to get the partition mappings. Because some filesystems use a journal, it is a good idea to mount the filesystem to replay the journal, then unmount it and run the checks. Depending on the filesystem type, run the appropriate fsck command. If any inconsistency is found, you can plan downtime to investigate and fix it before any data is lost or the corruption spreads and causes more trouble.

# blkid /dev/loop0
/dev/loop0: UUID="bce5e7dd-ae78-419c-82c4-3364d2ad13e7" TYPE="xfs"

# mkdir tmp_mnt
# mount /dev/loop0 tmp_mnt && umount tmp_mnt

# xfs_repair -n /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

# blkid /dev/loop1
/dev/loop1: UUID="f5c7a33b-f3cd-4fd0-893a-2077457eecc9" TYPE="ext4"

# e2fsck -fn /dev/loop1
e2fsck 1.45.4 (23-Sep-2019)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop1: 38/100352 files (13.2% non-contiguous), 301483/400000 blocks
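The choice of checker by filesystem type, shown by hand above, can be sketched as a small dispatch function. `check_command_for` is a hypothetical name. It covers only the two tools used in this walkthrough (treating ext2 and ext3 like ext4 is an assumption), and it only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the read-only check command for a filesystem
# type and device. Returns non-zero for types this sketch does not cover.
check_command_for() {
    fstype=$1
    dev=$2
    case "$fstype" in
        xfs)
            echo "xfs_repair -n $dev"     # -n: no-modify mode
            ;;
        ext2|ext3|ext4)
            echo "e2fsck -fn $dev"        # -f: force check, -n: answer "no"
            ;;
        *)
            echo "unsupported filesystem type: $fstype" >&2
            return 1
            ;;
    esac
}

# Usage on a real host, as root (blkid prints just the TYPE value here):
#   fstype=$(blkid -o value -s TYPE /dev/loop0)
#   cmd=$(check_command_for "$fstype" /dev/loop0) && $cmd
```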

If the volumes are found to be corrupt, you can now collect metadumps for further investigation. Once the scanning is done, clean up the reflinked volumes.

# If partitions exist on the volumes, use kpartx -d to delete those
# losetup -d /dev/loop0
# losetup -d /dev/loop1

The reflink copies created at the beginning can now be deleted, or preserved if they are needed for further investigation.
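When the manual checks in this section are scripted, the exit status of each tool can stand in for reading its output. The sketch below relies on the documented behaviour that `xfs_repair -n` exits with 1 when it detects corruption and 0 otherwise, and that `e2fsck` exits with 0 when there are no errors and 4 when errors are left uncorrected. `interpret_status` is a hypothetical name, and every other status is lumped together as "error", which is a simplification.

```shell
#!/bin/sh
# Hypothetical helper: map a checker's exit status to a one-word verdict.
# TOOL is "xfs_repair" (run with -n) or "e2fsck" (run with -n).
interpret_status() {
    tool=$1
    status=$2
    case "$tool:$status" in
        xfs_repair:0|e2fsck:0) echo "clean" ;;
        xfs_repair:1|e2fsck:4) echo "corrupt" ;;
        *)                     echo "error" ;;   # operational or usage error
    esac
}

# Usage on a real host, as root:
#   xfs_repair -n /dev/loop0 >/dev/null 2>&1
#   interpret_status xfs_repair $?
```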

Using Scanfs

Scanfs is a tool that performs all of the above steps and gives you a report on the state of your filesystems. It takes the path to the image files as an argument and also supports options for setup and cleanup, which come in handy for collecting debug data if corruption is detected.

For installation instructions and more information about OLED, please see Oracle Linux Enhanced Diagnostics.

# oled scanfs /myvolumes

Scanfs 1.0


Setting up
setting up - system.img
setting up - data.img
Setup Complete

Checking XFS Filesystems
Scanning /dev/Scanolumes-vg_ora1/lv_root
Scanning /dev/mapper/loop4p2

Checking EXT4 Filesystems
Scanning /dev/mapper/loop4p1

Cleaning up
Cleanup complete.

Check /myvolumes/Scanfs-2023-11-22T06-50-36/summary for more details

Corruption found in these volumes:

************************************

xfs - /dev/mapper/loop4p2

************************************