QEMU offers NVMe device emulation, commonly used by developers and users for testing and developing drivers, tools, and new operating system features. With the release of QEMU 9.2, the NVMe emulation supports the Controller Atomic Parameters (AWUN and AWUPF) (3), ensuring that writes adhering to these parameters are handled atomically. Future updates to QEMU’s NVMe Atomic Write support will include Namespace Atomic Parameters and Namespace Atomic Boundary Parameters. This blog introduces the Atomic Writes feature with regard to NVMe controllers and shows how to set up Atomic Writes for a QEMU-NVMe device.

Atomic Writes

The word “atom” originates from the Greek word “atomos,” meaning “uncuttable.” Multiple operations can be combined into a single logical unit and executed by a thread. When these operations are treated as a single, indivisible action, the unit is considered atomic. In the context of disk writes, if a thread performs a single operation that writes to multiple disk blocks, either all of the write data will be committed to the media or none of it will. Additionally, it is guaranteed that the write data will not be interleaved with another thread’s disk write.

The following example demonstrates valid and invalid outcomes of two 4-block atomic writes: Write A targets Logical Block Addresses (LBAs) 0-3, while Write B targets LBAs 1-4. The overlapping LBAs between Writes A and B are 1-3. For atomic writes to be valid in this range, the data must come entirely from either Write A or Write B, without any mixing of the two.

[Figure: awun-example. Valid and invalid results for Write A (LBAs 0-3) and Write B (LBAs 1-4)]

  • In the Valid Results, the data in LBAs 1-3 comes entirely from either Write A OR Write B. There is no intermixing.
  • In the Invalid Results, the data in LBAs 1-3 is intermixed from Write A AND Write B, so these commands were not atomic.
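
The rule illustrated above can be sketched as a small check. This is only an illustration, not part of QEMU: the function name and the dictionary layout (LBA mapped to the letter of the write whose data ended up there) are assumptions made for this example.

```python
# Illustrative sketch only: model each LBA's final content by the write it came from.
WRITE_A = range(0, 4)  # Write A targets LBAs 0-3
WRITE_B = range(1, 5)  # Write B targets LBAs 1-4


def overlap_is_atomic(final_blocks, write_a=WRITE_A, write_b=WRITE_B):
    """Return True if the overlapping LBAs hold data from exactly one write."""
    overlap = sorted(set(write_a) & set(write_b))
    if not overlap:
        return True  # nothing overlaps, so nothing can be intermixed
    sources = {final_blocks[lba] for lba in overlap}
    return sources == {"A"} or sources == {"B"}


# Hypothetical final states: LBA -> which write supplied the data.
valid_result = {0: "A", 1: "B", 2: "B", 3: "B", 4: "B"}
invalid_result = {0: "A", 1: "A", 2: "B", 3: "A", 4: "B"}

print(overlap_is_atomic(valid_result))    # True
print(overlap_is_atomic(invalid_result))  # False
```

The check looks only at the overlapping range (LBAs 1-3), because that is where data from the two writes could be intermixed.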

Benefits of Atomic Writes

Ensuring atomic writes enhances data integrity, performance, and reliability:

  • Non-atomic writes, especially during power failures or error conditions, can lead to torn writes, where only part of a write operation is successfully recorded. Without checksums, this issue might go undetected, resulting in data corruption.
  • In some databases, such as MySQL, page writes must be performed twice to recover from possible incomplete writes that can occur during power failures or errors. This process, called a double-write, is unnecessary when writes are atomic. For NVMe devices, removing the double-write requirement reduces the number of write operations, which prolongs the lifespan of the NVMe device.
  • When multi-threaded applications are designed to take advantage of atomic write guarantees, more efficient locking mechanisms can be implemented, reducing the risk of data corruption and improving overall performance.

NVMe Atomic Writes

By default, a write of a single NVMe logical block (512 bytes in the examples in this blog) is guaranteed to be atomic. If an NVMe controller provides an atomic guarantee for writes larger than a single logical block, its atomic write behavior must be specified using the Atomicity Parameters defined in the NVMe Specification (2). These parameters include Controller, Namespace, and Atomic Boundary Parameters. The QEMU 9.2 NVMe Atomic Write implementation discussed in this blog supports only the Controller Parameters, so Namespace and Atomic Boundary Parameters will not be covered here. Note that Namespace and Boundary Parameters are not required by the NVMe Specification (2).

Controller Atomic Parameters – These describe the atomic write behavior across all namespaces on a given NVMe controller. The atomic behavior depends on whether the controller is in a normal operating mode or in a powerfail mode.

  • Atomic Write Unit Normal (AWUN) – Maximum atomic write size in number of blocks. If a write is submitted with a size of less than or equal to AWUN, then the write is guaranteed to be atomic. The AWUN value is zero-based, meaning that a value of 0 corresponds to 1 block, a value of 1 corresponds to 2 blocks, and so on.
  • Atomic Write Unit Powerfail (AWUPF) – Maximum atomic write size in number of blocks during a powerfail or error condition. If a write is submitted with a size of less than or equal to AWUPF, then the controller guarantees that if the command fails due to a powerfail or error condition, a subsequent read will return all old data or all new data. Like AWUN, the AWUPF value is zero-based, meaning that a value of 0 corresponds to 1 block, a value of 1 corresponds to 2 blocks, and so on. AWUPF must be less than or equal to AWUN.

Operating systems that support atomic write operations typically do so for data integrity rather than performance, so they may use the AWUPF value rather than AWUN to determine their maximum atomic write size. In that case, the controller’s atomic normal operating mode is not needed. Disabling it can free controller resources for other purposes, potentially providing better performance. This is done by setting the “Disable Normal” bit of the NVMe Feature – Write Atomicity Normal.

  • Write Atomicity Normal Feature (NVMe Feature 0xA) – Configures how the controller handles the AWUN parameter, via the NVMe SET FEATURE command. When the Disable Normal (DN) bit (bit 0) is set to ‘1’, the host specifies that AWUN is not required and the controller shall only honor AWUPF. If this bit is cleared to ‘0’, then AWUN shall be honored by the controller. The current configuration of the Write Atomicity Normal Feature can be checked using the NVMe GET FEATURE command.
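
As a rough illustration of the DN bit layout described above, the following sketch decodes and modifies a Write Atomicity Normal feature value. The helper names are made up for this example; only the fact that DN is bit 0 comes from the text.

```python
# Illustrative sketch: DN is bit 0 of the Write Atomicity Normal feature value.
DN_BIT = 0x1


def disable_normal_is_set(feature_value):
    """Return True if the Disable Normal (DN) bit is set in the feature value."""
    return bool(feature_value & DN_BIT)


def with_disable_normal(feature_value, enabled):
    """Return the 32-bit feature value with DN set or cleared."""
    if enabled:
        return (feature_value | DN_BIT) & 0xFFFFFFFF
    return feature_value & ~DN_BIT & 0xFFFFFFFF


# Values as hex strings, in the style reported by a GET FEATURE query.
print(disable_normal_is_set(int("00000000", 16)))    # False: AWUN is honored
print(disable_normal_is_set(int("0x00000001", 16)))  # True: only AWUPF is honored
print(hex(with_disable_normal(0x0, enabled=True)))   # 0x1
```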

Calculating Maximum Atomic Write Size: (AWUN or AWUPF value + 1) * (Logical Block Size). The 1 is added because AWUN and AWUPF are zero-based.

  • Example: If AWUN is set to 63: (63 + 1) * 512 = 32768 bytes
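
The same calculation as a minimal sketch. The function name is hypothetical, and the 512-byte default logical block size is an assumption that matches the examples in this blog.

```python
def max_atomic_write_bytes(awu_value, logical_block_size=512):
    """Convert a zero-based AWUN or AWUPF value into a size in bytes."""
    if awu_value < 0:
        raise ValueError("AWUN/AWUPF values cannot be negative")
    # Add 1 because a value of 0 means 1 block, 1 means 2 blocks, and so on.
    return (awu_value + 1) * logical_block_size


print(max_atomic_write_bytes(63))  # 32768
print(max_atomic_write_bytes(31))  # 16384
print(max_atomic_write_bytes(0))   # 512, a single logical block
```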

Using NVMe Atomic Writes in QEMU

Setting up the Controller Atomic Parameters

The QEMU NVMe atomic parameters (see the NVMe Specification (2) for details) are set up when starting the QEMU VM:

    atomic.dn (default off) - Set the value of Disable Normal.
    atomic.awun=UINT16 (default: 0)
    atomic.awupf=UINT16 (default: 0)

When these parameters are added to a QEMU VM startup script, the NVMe Atomic Write feature is enabled and initialized at VM startup. The AWUN and AWUPF parameters should be set consistently with the NVMe Specification (2). The Write Atomicity Normal Feature can be initialized to a desired power-on state (Disable Normal enabled or disabled).

  • Setting AWUN to a 32k atomic write size – (32k/512) – 1 (AWUN is zero-based) = 63 – atomic.awun=63
  • Setting AWUPF to a 16k atomic write size – (16k/512) – 1 (AWUPF is zero-based) = 31 – atomic.awupf=31
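
The reverse calculation, from a desired size in bytes to the parameter values, can be sketched as follows. The helper names are hypothetical, the 512-byte block size is assumed, and the id and serial values are simply the ones used in the startup example in this blog.

```python
def awu_for_bytes(size_bytes, logical_block_size=512):
    """Return the zero-based AWUN/AWUPF value for an atomic write size in bytes."""
    if size_bytes <= 0 or size_bytes % logical_block_size != 0:
        raise ValueError("size must be a positive multiple of the block size")
    return size_bytes // logical_block_size - 1


def nvme_device_option(awun_bytes, awupf_bytes, dn=False):
    """Build the value for QEMU's -device option with the atomic parameters."""
    awun = awu_for_bytes(awun_bytes)
    awupf = awu_for_bytes(awupf_bytes)
    if awupf > awun:
        raise ValueError("AWUPF must be less than or equal to AWUN")
    dn_state = "on" if dn else "off"
    return (
        "nvme,id=nvme-ctrl-0,serial=nvme-1,"
        f"atomic.dn={dn_state},atomic.awun={awun},atomic.awupf={awupf}"
    )


print(nvme_device_option(32 * 1024, 16 * 1024))
# nvme,id=nvme-ctrl-0,serial=nvme-1,atomic.dn=off,atomic.awun=63,atomic.awupf=31
```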

QEMU-NVMe Internal Maximum Atomic Write Size

The QEMU-NVMe Atomic Write implementation maintains an internal maximum atomic write size which is based on the following:

  • DN=off, internal maximum atomic write size (in bytes) = (AWUN + 1) * 512 (AWUN is zero-based)
  • DN=on, internal maximum atomic write size (in bytes) = (AWUPF + 1) * 512 (AWUPF is zero-based)

Any write issued to QEMU-NVMe is guaranteed to be atomic if its size is less than or equal to the internal maximum atomic write size. Writes larger than the internal maximum atomic write size will complete successfully, but are not guaranteed to be atomic.

DN can be dynamically modified via the NVMe SET FEATURE command, which changes the internal maximum atomic write size accordingly.
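
The selection rule described in this section can be modeled in a few lines. This is a sketch of the described behavior, not QEMU's actual code; the function names are invented for the example and a 512-byte block size is assumed.

```python
def internal_max_atomic_bytes(awun, awupf, dn, logical_block_size=512):
    """Model the internal maximum atomic write size: AWUPF if DN is on, else AWUN."""
    unit = awupf if dn else awun
    return (unit + 1) * logical_block_size  # +1 because the values are zero-based


def write_is_guaranteed_atomic(write_bytes, awun, awupf, dn):
    """Larger writes still complete; they just lose the atomicity guarantee."""
    return write_bytes <= internal_max_atomic_bytes(awun, awupf, dn)


print(internal_max_atomic_bytes(awun=63, awupf=31, dn=False))  # 32768
print(internal_max_atomic_bytes(awun=63, awupf=31, dn=True))   # 16384
print(write_is_guaranteed_atomic(64 * 1024, 63, 31, dn=False))  # False
print(write_is_guaranteed_atomic(32 * 1024, 63, 31, dn=False))  # True
```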

Example QEMU/Linux Startup

For QEMU, use 9.2 or greater. For the Linux examples, use Linux 6.11 or greater and fio 3.37 or greater on the guest.

# ./qemu/build/qemu-system-x86_64 -cpu host --enable-kvm -smp cpus=8 -no-reboot -m 8192M \
  -drive file=./disk.img,if=ide -boot c \
  -device nvme,id=nvme-ctrl-0,serial=nvme-1,atomic.dn=off,atomic.awun=63,atomic.awupf=31 \
  -drive file=./nvme.img,if=none,id=nvm-1 -device nvme-ns,drive=nvm-1,bus=nvme-ctrl-0

This will create an NVMe controller with a single namespace, with AWUN=63 (32768 bytes), AWUPF=31 (16384 bytes), and Disable Normal not set.

Verifying Atomic Parameters with Linux

# nvme id-ctrl /dev/nvme0 | grep awun
awun : 63

# nvme id-ctrl /dev/nvme0 | grep awupf
awupf : 31

# nvme get-feature /dev/nvme0 -f 0xa
get-feature:0x0a (Write Atomicity Normal), Current value:00000000 <--- Disable Normal (DN) is not set.

# cat /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n1/queue/atomic_write_max_bytes
16384 <--- The operating system (Linux) uses 16384 as the maximum atomic write size

The above example shows AWUN=63 (32768 bytes), AWUPF=31 (16384 bytes), and Disable Normal=off, so the QEMU-NVMe internal maximum atomic write size is 32768 bytes, but Linux reports atomic_write_max_bytes as 16384. This is because Linux uses AWUPF rather than AWUN for the maximum atomic write size, in order to get the data integrity benefits. The operating system’s maximum atomic write size will always be less than or equal to QEMU-NVMe’s internal maximum atomic write size.
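
A small sketch of the same check from a script. It assumes the namespace appears as nvme0n1 and uses the /sys/block/nvme0n1/queue directory, which is an assumed alternative to the PCI path shown above; the helper names are hypothetical. The expected value is derived from AWUPF as in this example, and the kernel may apply further limits on other configurations.

```python
from pathlib import Path


def expected_linux_atomic_max(awupf, logical_block_size=512):
    """Size in bytes implied by a zero-based AWUPF value."""
    return (awupf + 1) * logical_block_size


def read_atomic_write_max_bytes(queue_dir="/sys/block/nvme0n1/queue"):
    """Read atomic_write_max_bytes, or return None if the attribute is absent."""
    path = Path(queue_dir) / "atomic_write_max_bytes"
    if not path.exists():
        return None
    return int(path.read_text().strip())


reported = read_atomic_write_max_bytes()
if reported is None:
    print("atomic_write_max_bytes not found (needs Linux 6.11+ and nvme0n1)")
else:
    # With atomic.awupf=31 and 512-byte blocks, 16384 is expected here.
    print(reported, reported == expected_linux_atomic_max(awupf=31))
```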

Changing Disable Normal with Linux

# nvme get-feature /dev/nvme0 -f 0xa
get-feature:0x0a (Write Atomicity Normal), Current value:00000000 <--- DN=0, internal maximum atomic write size is 32768 bytes

# nvme set-feature /dev/nvme0 -f 0xa -V 0x1 <--- Sets DN to 1
set-feature:0x0a (Write Atomicity Normal), value:0x00000001, cdw12:00000000, save:0

# nvme get-feature /dev/nvme0 -f 0xa
get-feature:0x0a (Write Atomicity Normal), Current value:0x00000001 <--- DN=1, internal maximum atomic write size is now 16384 bytes

# cat /sys/devices/pci0000:00/0000:00:04.0/nvme/nvme0/nvme0n1/queue/atomic_write_max_bytes
16384 <--- Changing DN will not change the Operating System's maximum atomic write size

Using Linux fio to validate Atomic Writes

# nvme id-ctrl /dev/nvme0 | grep awun
awun : 63

# nvme id-ctrl /dev/nvme0 | grep awupf
awupf : 31

# nvme set-feature /dev/nvme0 -f 0xa -V 0x0 <--- Sets DN to 0
set-feature:0x0a (Write Atomicity Normal), value:0x00000000, cdw12:00000000, save:0

# nvme get-feature /dev/nvme0 -f 0xa
get-feature:0x0a (Write Atomicity Normal), Current value:0x00000000 <--- DN=0, internal maximum atomic write size is now 32768 bytes

Fio (1) is a test program used to simulate various block workloads. In the runs below, fio writes each block with a unique pattern and a CRC at the end of the block. If the writes are atomic, data is never intermixed, so the CRC is always correct during read verification. If the writes are not guaranteed to be atomic, data could be intermixed and the CRC would then be incorrect during read verification. Since DN=0 and AWUN is 63, the internal maximum atomic write size is 32768 bytes. First, run fio with 64k byte blocks (not guaranteed to be atomic) and read verification:

# fio --filename=/dev/nvme0n1 --direct=1 --rw=randwrite --bs=64k --iodepth=256 --name=iops \
  --numjobs=50 --ioengine=libaio --loops=10 --verify=crc64 --verify_write_sequence=0

fio: multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures.
iops: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=256

crc64: verify failed at file /dev/nvme0n1 offset 756285440, length 65536 (requested block: offset=756285440, length=65536, flags=88)
      Expected CRC: aa1c63542c2e1840
      Received CRC: 4b24cb8e451e42e7
  • Eventually a crc64 error is expected.
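
Why intermixed data trips the verification can be seen in a toy model. This sketch does not reproduce fio's on-disk verify format: it uses CRC-32 from Python's zlib in place of fio's crc64, and the payload layout (pattern bytes followed by a 4-byte checksum) is an assumption made for illustration.

```python
import zlib

BLOCK_SIZE = 512
CRC_LEN = 4


def make_payload(pattern, nblocks):
    """Build a buffer of one repeated pattern byte with its CRC-32 appended."""
    body = pattern * (nblocks * BLOCK_SIZE - CRC_LEN)
    return body + zlib.crc32(body).to_bytes(CRC_LEN, "little")


def verify(payload):
    """Recompute the CRC over the body and compare it with the stored one."""
    body, stored = payload[:-CRC_LEN], payload[-CRC_LEN:]
    return zlib.crc32(body).to_bytes(CRC_LEN, "little") == stored


write_a = make_payload(b"A", 128)  # two competing 64 KiB writes
write_b = make_payload(b"B", 128)

# A non-atomic outcome: the first 32 KiB from write A, the rest from write B.
intermixed = write_a[:32768] + write_b[32768:]

print(verify(write_a))     # True
print(verify(write_b))     # True
print(verify(intermixed))  # False
```

Either write on its own verifies cleanly; only the intermixed result fails, which is the condition the 64k fio run eventually hits.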

Now run fio with 32k byte blocks (guaranteed to be atomic) and read verification. The run completes without any verification errors:

# fio --filename=/dev/nvme0n1 --direct=1 --rw=randwrite --bs=32k --iodepth=256 --name=iops \
  --numjobs=50 --ioengine=libaio --loops=10 --verify=crc64 --verify_write_sequence=0

fio: multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures.
iops: (g=0): rw=randwrite, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=256
Run status group 0 (all jobs):
      READ: bw=1892MiB/s (1984MB/s), 37.8MiB/s-39.2MiB/s (39.7MB/s-41.1MB/s), io=500GiB (537GB), run=261075-270557msec
      WRITE: bw=1711MiB/s (1794MB/s), 34.2MiB/s-36.0MiB/s (35.9MB/s-37.7MB/s), io=500GiB (537GB), run=284505-299261msec
Disk stats (read/write):
      nvme0n1: ios=16420620/16384000, sectors=1050165992/1048576000, merge=0/0, ticks=1940415/2072772, in_queue=4013187, util=100.00%

Summary

QEMU 9.2 introduced enhanced support for NVMe Atomic Write operations, particularly for developers and users focused on testing, driver development, or OS feature development. The new feature allows NVMe devices to handle atomic writes according to specific controller atomic parameters: AWUN (Atomic Write Unit Normal) and AWUPF (Atomic Write Unit Powerfail). By adopting this feature, developers can test and ensure robust data integrity in environments that simulate real-world conditions, including power failures and system error conditions.

After reading this blog, you will have learned how to configure an NVMe device in QEMU to enable the Atomic Write feature, verify the atomic write settings using Linux, and use the fio test utility to simulate a torn write scenario, illustrating how atomic writes prevent this issue.

In the future, QEMU is expected to expand its support for the NVMe Atomic Writes feature to include Namespace Atomic Parameters, Namespace Atomic Boundary Parameters, and Multiple Atomicity Mode.

References

  1. fio – Flexible I/O tester – https://fio.readthedocs.io/en/latest/fio_doc.html
  2. NVM Express Base Specification, Revision 2.0c, October 4, 2022. Available from https://nvmexpress.org/specifications/
  3. Final Patch – https://patchwork.kernel.org/project/qemu-devel/cover/20240926212458.32449-1-alan.adamson@oracle.com/