The Linux PCI subsystem is one of the most significant subsystems of the Linux kernel. In this article, we introduce the usage of QEMU to emulate different PCI/PCIe configurations to help study the Linux PCI subsystem. This ability facilitates Linux administrators or developers, to study, debug and develop the Linux kernel, as it is much easier to customize the PCI/PCIe configuration with QEMU. For instance, in conjunction with SeaBIOS source code, it will be much easier to study PCI initialization and the probing process. In addition, it is also considerably faster to reboot a QEMU/KVM virtual machine compared to rebooting a baremetal server.

For all examples in this article the KVM virtual machine is running Oracle Linux 8, the virtual machine kernel version is 5.10.0, and the QEMU version is 5.2.0.

All examples run the boot disk (ol8.qcow2) as default IDE. Since the objective of this article is to study PCI/PCIe, we use virtio-scsi-pci HBA as an example and will not attach any SCSI LUN to the HBA. Please refer to our prior blog article for how to attach an SCSI LUN to virtio-scsi-pci HBA.

The article focuses on the usage of QEMU with PCI/PCIe. It does not cover any prerequisite knowledge on PCI/PCIe specifications.

PCI Bridge

This section demonstrates how to create PCI secondary buses through PCI-2-PCI bridge. The secondary bus is created via “pci-bridge”.

qemu-system-x86_64 -machine pc,accel=kvm -vnc :8 -smp 4 -m 4096M \
-net nic -net user,hostfwd=tcp::5028-:22 \
-hda ol8.qcow2 -serial stdio \
-device pci-bridge,id=bridge0,chassis_nr=1 \
-device virtio-scsi-pci,id=scsi0,bus=bridge0,addr=0x3 \
-device pci-bridge,id=bridge1,chassis_nr=2 \
-device virtio-scsi-pci,id=scsi1,bus=bridge1,addr=0x3 \
-device virtio-scsi-pci,id=scsi2,bus=bridge1,addr=0x4

The above QEMU command line creates two PCI secondary buses. While one secondary bus (01:00.0) has one virtio-scsi-pci HBA (01:03.0), the second (02:00.0) has two virtio-scsi-pci HBAs (02:03.0 and 02:04.0).

[root@vm ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:03.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
02:03.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
02:04.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI

The below lspci output and figure depict the PCI bus topology for this example.

[root@vm ~]# lspci -t
-[0000:00]-+-00.0
           +-01.0
           +-01.1
           +-01.3
           +-02.0
           +-03.0
           +-04.0-[01]----03.0
           \-05.0-[02]--+-03.0
                        \-04.0

PCI Root Bus

This section demonstrates how to create extra PCI root buses through the “light-weight” PXB (PCI Expander Bridge) host bridge. It is “pxb” in QEMU command line. It is implemented only for i440fx and can be placed only on bus 0.

qemu-system-x86_64 -machine pc,accel=kvm -vnc :8 -smp 4 -m 4096M \
-net nic -net user,hostfwd=tcp::5028-:22 \
-hda ol8.qcow2 -serial stdio \
-device pxb,id=bridge1,bus=pci.0,bus_nr=3 \
-device virtio-scsi-pci,bus=bridge1,addr=0x3 \
-device pxb,id=bridge2,bus=pci.0,bus_nr=8 \
-device virtio-scsi-pci,bus=bridge2,addr=0x3 \
-device virtio-scsi-pci,bus=bridge2,addr=0x4

The above QEMU command line creates two extra PCI root buses. The first root bus (04:00.0) has one virtio-scsi-pci HBA (04:03.0), and the second root bus (09:00.0) has two virtio-scsi-pci HBAs (09:03.0 and 09:04.0).

[root@vm ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:04.0 Host bridge: Red Hat, Inc. QEMU PCI Expander bridge
00:05.0 Host bridge: Red Hat, Inc. QEMU PCI Expander bridge
03:00.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
04:03.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
08:00.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
09:03.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
09:04.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI

The below lspci output and figure depict the PCI bus topology for this example.

[root@vm ~]# lspci -t
-+-[0000:08]---00.0-[09]--+-03.0
 |                        \-04.0
 +-[0000:03]---00.0-[04]----03.0
 \-[0000:00]-+-00.0
             +-01.0
             +-01.1
             +-01.3
             +-02.0
             +-03.0
             +-04.0
             \-05.0

PCIe Root Complex

This section demonstrates how to create extra PCIe root buses through extra Root Complexes. According to QEMU source code, PCIe features are supported only by ‘q35’ machine type on x86 architecture and the ‘virt’ machine type on AArch64. The root complex is created by using “pxb-pcie” on the QEMU command line.

qemu-system-x86_64 -machine q35,accel=kvm -vnc :8 -smp 4 -m 4096M \
-net nic -net user,hostfwd=tcp::5028-:22 \
-hda ol8.qcow2 -serial stdio \
-device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
-device ioh3420,id=pcie_port1,bus=pcie.1,chassis=1 \
-device virtio-scsi-pci,bus=pcie_port1 \
-device ioh3420,id=pcie_port2,bus=pcie.1,chassis=2 \
-device virtio-scsi-pci,bus=pcie_port2 \
-device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
-device ioh3420,id=pcie_port3,bus=pcie.2,chassis=3 \
-device virtio-scsi-pci,bus=pcie_port3

The above QEMU command line creates two extra PCIe root complexes. The first root complex has one virtio-scsi-pci HBA (09:00.0), and the second has two virtio-scsi-pci HBAs (03:00.0 and 04:00.0).

[root@vm ~]# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
00:03.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge
00:04.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
02:00.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
02:01.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
03:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)
04:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)
08:00.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
09:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)

The below lspci output and figure depict the PCIe topology for this example.

[root@vm ~]# lspci -t
-+-[0000:08]---00.0-[09]----00.0
 +-[0000:02]-+-00.0-[03]----00.0
 |           \-01.0-[04]----00.0
 \-[0000:00]-+-00.0
             +-01.0
             +-02.0
             +-03.0
             +-04.0
             +-1f.0
             +-1f.2
             \-1f.3

PCIe Switch

This section demonstrates how to create a PCIe switch.

qemu-system-x86_64 -machine q35,accel=kvm -vnc :8 -smp 4 -m 4096M \
-net nic -net user,hostfwd=tcp::5028-:22 \
-hda ol8.qcow2 -serial stdio \
-device ioh3420,id=root_port1,bus=pcie.0 \
-device x3130-upstream,id=upstream1,bus=root_port1 \
-device xio3130-downstream,id=downstream1,bus=upstream1,chassis=9 \
-device virtio-scsi-pci,bus=downstream1 \
-device xio3130-downstream,id=downstream2,bus=upstream1,chassis=10 \
-device virtio-scsi-pci,bus=downstream2

The above QEMU command line creates a PCIe switch which has two virtio-scsi-pci HBAs attached. While the upstream port is connected to the root bus, each downstream port is connected to a virtio-scsi-pci HBA (03:00.0 and 04:00.0).

[root@vm ~]# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:02.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
00:03.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
03:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)
04:00.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI (rev 01)

The below lspci output and figure depict the PCIe topology for this example.

[root@vm ~]# lspci -t
-[0000:00]-+-00.0
           +-01.0
           +-02.0
           +-03.0-[01-04]----00.0-[02-04]--+-00.0-[03]----00.0
           |                               \-01.0-[04]----00.0
           +-1f.0
           +-1f.2
           \-1f.3

IOMMU

Nowadays, IOMMU is always used by baremetal machines. QEMU is able to emulate IOMMU to help developers debug and study the Linux kernel IOMMU relevant source code, e.g., how DMA Remapping and Interrupt Remapping work.

The below QEMU command line demonstrates how to create Intel IOMMU for a virtual machine (with Interrupt Remapping enabled). In addition to the QEMU command line, “intel_iommu=on” should be appended to the Linux kernel command line for the virtual machine.

qemu-system-x86_64 -machine q35,accel=kvm,kernel-irqchip=split -vnc :8 -smp 4 -m 4096M \
-net nic -net user,hostfwd=tcp::5028-:22 \
-hda ol8.qcow2 -serial stdio \
-device nvme,drive=nvme0,serial=deadbeaf1,max_ioqpairs=4 \
-drive file=disk1.qcow2,if=none,id=nvme0 \
-device intel-iommu,intremap=on

According to the virtual machine syslog, IOMMU is available and enabled by the Linux kernel.

[root@vm ~]# dmesg | egrep "iommu|IOMMU"
... ...
[    0.019828] DMAR: IOMMU enabled
[    0.203209] DMAR-IR: IOAPIC id 0 under DRHD base  0xfed90000 IOMMU 0
[    0.628348] iommu: Default domain type: Passthrough
[    1.078994] pci 0000:00:00.0: Adding to iommu group 0
[    1.079892] pci 0000:00:01.0: Adding to iommu group 1
[    1.080775] pci 0000:00:02.0: Adding to iommu group 2
[    1.081654] pci 0000:00:03.0: Adding to iommu group 3
[    1.082545] pci 0000:00:1f.0: Adding to iommu group 4
[    1.083432] pci 0000:00:1f.2: Adding to iommu group 4
[    1.084315] pci 0000:00:1f.3: Adding to iommu group 4

Summary

In this article we discussed how to emulate numerous different PCI/PCIe configurations with QEMU to help study the Linux kernel PCI subsystem. In addition, we discussed IOMMU emulation with QEMU which helps study more PCI/PCIe features. The best way to fully understand how to configure PCI/PCIe emulation using QEMU is to study the QEMU source code. This will also assist in understanding PCI/PCIe specifications. This article should primarily be used as a cheat sheet. For more details on the usage of QEMU PCI emulation, please refer to QEMU docs.