fsck (file system consistency check) is a tool used to check a filesystem for errors. Running fsck proactively and periodically lets us track the state of a filesystem and catch corruption before it spreads through the system. Current fsck versions require the filesystem to be unmounted, with the exception that they can be run on read-only mounts. In a production environment, unmounting a filesystem means downtime, and a periodic check would mean frequent downtime, which is not practical. Hence the need for a mechanism to check filesystems while they are mounted.
xfs_scrub is an online fsck tool that is in the works. However, xfs_scrub is a large project, and we wanted a way to run consistency checks until it is released. In our production environments, we have multiple VMs with vdisks on an XFS filesystem, and we came up with the following solution to run regular scans.
Scanfs enables us to scan filesystems for inconsistencies and corruption without requiring any downtime. This comes in handy if you want to keep an eye on your volumes and scan them at frequent, regular intervals to avoid major data loss. This alternate method does something similar to online fsck. It has the following limitations and prerequisites:
- Scanning is done from the host, and only guest machine volumes can be scanned.
- The host filesystem containing the guest volumes' .img files must support reflink.
# ls /host-mount
guest-volume1.img  guest-volume2.img
# xfs_info /host-mount | grep reflink
         =                       reflink=1
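The reflink prerequisite can also be checked from a script before any copies are taken. The following is a minimal sketch, not part of scanfs itself: `has_reflink` is a hypothetical helper name, and it assumes the `reflink=1` field appears in the `xfs_info` output as shown above.

```shell
# has_reflink: hypothetical helper (not part of scanfs).
# Reads xfs_info-style output on stdin and succeeds (exit status 0)
# only when a whitespace-delimited "reflink=1" field is present.
has_reflink() {
    grep -Eq '(^|[[:space:]])reflink=1([[:space:]]|$)'
}

# Intended use on the host (needs xfsprogs, so it is left commented out):
#   if xfs_info /host-mount | has_reflink; then
#       echo "reflink supported"
#   else
#       echo "reflink not supported, cannot take snapshots" >&2
#   fi
```

Reading from stdin keeps the helper independent of the mount point and makes it easy to try against saved `xfs_info` output.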
How does it work?
The scanfs script uses the reflink feature of the host filesystem to take a point-in-time snapshot of the virtual disk. The contents of the snapshot are equivalent to the disk of a VM that has just crashed. Once the snapshot is taken, the script maps it to a loop device and mounts the loop device. During the mount, the journal is replayed and the filesystem(s) inside the virtual disk are brought to a consistent state. After this, we run the fsck tool associated with each filesystem inside the virtual disk and check for inconsistencies.
Taking a reflink copy
Reflink generates an instant copy of a file by sharing file extents. Take the reflink copy of all the guest volume files.
# mkdir reflink-cp
# cp --reflink guest-volume1.img reflink-cp/reflink-guest-volume1.img
# cp --reflink guest-volume2.img reflink-cp/reflink-guest-volume2.img
# ls reflink-cp/
reflink-guest-volume1.img  reflink-guest-volume2.img
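With more than a couple of volumes, the copy step is easier to run as a loop. The sketch below is an illustration rather than the scanfs implementation: `reflink_copy_all` is a hypothetical helper, and it assumes GNU `cp`, whose `--reflink` option accepts `always` (the behavior of a bare `--reflink`) or `auto`.

```shell
# reflink_copy_all: hypothetical helper (not part of scanfs).
# Copies every *.img file in SRC_DIR into DST_DIR with a "reflink-" prefix.
# MODE defaults to "always", so the copy fails instead of silently falling
# back to a full data copy when the filesystem cannot share extents.
reflink_copy_all() {
    src_dir=$1
    dst_dir=$2
    mode=${3:-always}

    mkdir -p "$dst_dir" || return 1
    for img in "$src_dir"/*.img; do
        [ -e "$img" ] || continue    # the glob matched nothing
        cp --reflink="$mode" "$img" "$dst_dir/reflink-$(basename "$img")" || return 1
    done
}

# Intended use on the host (left commented out):
#   reflink_copy_all /host-mount /host-mount/reflink-cp
```

Passing `auto` as the third argument lets the loop run on a filesystem without reflink support, at the cost of a full copy, which is mainly useful for trying the helper out.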
Scanning volumes
Now that the reflink copies are done, you can attach each copy to a loop device on the host and run fsck on it.
# losetup -fv reflink-cp/reflink-guest-volume1.img
# losetup -fv reflink-cp/reflink-guest-volume2.img
# losetup -a
/dev/loop1: [64513]:274710663 (/host-mount/reflink-cp/reflink-guest-volume2.img)
/dev/loop0: [64513]:274710662 (/host-mount/reflink-cp/reflink-guest-volume1.img)
If the volume has partitions, you can use kpartx to get the partition mappings. Because some filesystems use a journal, it is a good idea to mount the filesystem first so that the journal is replayed, then unmount it and run the checks. Depending on the filesystem type, run the appropriate fsck command. If any inconsistency is found, you can plan downtime to investigate and fix it before data is lost or the corruption spreads and causes more trouble.
# blkid /dev/loop0
/dev/loop0: UUID="bce5e7dd-ae78-419c-82c4-3364d2ad13e7" TYPE="xfs"
# mkdir tmp_mnt
# mount /dev/loop0 tmp_mnt; umount /dev/loop0
# xfs_repair -n /dev/loop0
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.
# blkid /dev/loop1
/dev/loop1: UUID="f5c7a33b-f3cd-4fd0-893a-2077457eecc9" TYPE="ext4"
# e2fsck -fn /dev/loop1
e2fsck 1.45.4 (23-Sep-2019)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/loop1: 38/100352 files (13.2% non-contiguous), 301483/400000 blocks
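The choice of check command and the reading of its result can be scripted along the lines of the session above. This is a rough sketch with hypothetical helper names (`fsck_cmd_for`, `classify_status`), not the logic scanfs actually uses. It relies on the documented exit statuses: `xfs_repair -n` returns 1 when corruption is detected and 0 otherwise, and e2fsck sets bit 4 when errors are left uncorrected.

```shell
# fsck_cmd_for: hypothetical helper (not part of scanfs).
# Prints the read-only check command for a filesystem type as reported by
# blkid; fails for types this sketch does not cover.
fsck_cmd_for() {
    case $1 in
        xfs)            echo "xfs_repair -n" ;;
        ext2|ext3|ext4) echo "e2fsck -fn" ;;
        *)              return 1 ;;
    esac
}

# classify_status: hypothetical helper (not part of scanfs).
# Maps the exit status of a read-only check to clean, corrupt, or error.
classify_status() {
    fstype=$1
    status=$2
    if [ "$status" -eq 0 ]; then
        echo clean
        return
    fi
    case $fstype in
        xfs)
            # xfs_repair -n: 1 means corruption was detected.
            if [ "$status" -eq 1 ]; then echo corrupt; else echo error; fi ;;
        ext2|ext3|ext4)
            # e2fsck: bit 4 means filesystem errors were left uncorrected.
            if [ $(( status & 4 )) -ne 0 ]; then echo corrupt; else echo error; fi ;;
        *)
            echo error ;;
    esac
}

# Intended use on the host, after the journal replay (left commented out):
#   type=$(blkid -o value -s TYPE /dev/loop0)
#   cmd=$(fsck_cmd_for "$type") && $cmd /dev/loop0
#   classify_status "$type" $?
```

Treating any other non-zero status as `error` rather than `corrupt` keeps operational failures, such as a device that could not be opened, from being reported as filesystem damage.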
If the volumes are found to be corrupt, you can now collect the metadumps for further investigation. Once done with the scanning, clean up the reflinked volumes.
# If partitions exist on the volumes, use kpartx -d to delete those
# losetup -d /dev/loop0
# losetup -d /dev/loop1
The reflink copies created at the beginning can now be deleted, or preserved if they are needed for further investigation.
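When many volumes are scanned, the loop devices to detach during cleanup can be picked out of the `losetup -a` listing instead of being typed by hand. The sketch below assumes the `losetup -a` line format shown earlier and backing file paths without spaces; `loop_devices_under` is a hypothetical helper, not a scanfs command.

```shell
# loop_devices_under: hypothetical helper (not part of scanfs).
# Reads "losetup -a" style lines on stdin and prints the loop devices whose
# backing file lives directly under DIR or in one of its subdirectories.
# Assumes backing file paths contain no spaces.
loop_devices_under() {
    awk -v dir="$1" '{
        dev = $1
        sub(/:$/, "", dev)            # "/dev/loop0:" -> "/dev/loop0"
        file = $NF
        gsub(/^\(|\)$/, "", file)     # strip the surrounding parentheses
        if (index(file, dir "/") == 1) print dev
    }'
}

# Intended use on the host (needs root, so it is left commented out):
#   losetup -a | loop_devices_under /host-mount/reflink-cp |
#   while read -r dev; do
#       losetup -d "$dev"
#   done
```

Filtering on the reflink copy directory avoids detaching loop devices that belong to anything other than the scan.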
Using Scanfs
Scanfs is a tool that performs all of the above steps and gives you a report on the state of your filesystems. It takes the path to the image files as an argument and also supports separate setup and cleanup options, which come in handy for collecting debug data if corruption is detected.
For installation instructions and more information about OLED, please see Oracle Linux Enhanced Diagnostics.
# oled scanfs /myvolumes
Scanfs 1.0
Setting up
setting up - system.img
setting up - data.img
Setup Complete
Checking XFS Filesystems
Scanning /dev/Scanolumes-vg_ora1/lv_root
Scanning /dev/mapper/loop4p2
Checking EXT4 Filesystems
Scanning /dev/mapper/loop4p1
Cleaning up
Cleanup complete.
Check /myvolumes/Scanfs-2023-11-22T06-50-36/summary for more details
Corruption found in these volumes:
************************************
xfs - /dev/mapper/loop4p2
************************************