Background
Mandatory Access Control (MAC) systems, such as SELinux, are well-suited for virtualisation libraries like libvirt to ensure access separation on multi-tenancy hosts. In this blog, we’ll explore how SELinux is used by libvirt to provide an additional layer of security and isolation for virtual machines while exploring some configuration options for deploying SELinux-enabled libvirt environments.
SELinux
SELinux enforces security policies that restrict the actions of processes, limiting the damage that can be caused by a compromised process or user error. SELinux operates by labelling resources (files, processes, etc.) and enforcing rules that dictate what operations can be performed on those resources.
Multi-Category Security (MCS) is an SELinux feature that further refines access control by assigning categories to subjects (processes) and objects (files and other resources). MCS is used in conjunction with SELinux’s type enforcement to provide an additional layer of isolation. It is particularly useful in virtualisation environments, such as those managed by libvirt, where it isolates virtual machines (VMs) from each other: even if two VMs run under the same SELinux type, they remain isolated as long as they are in different categories.
Categories are integers from 0 to 1023 (up to 1024 possible, though policy limits may apply), assigned to SELinux security contexts in the MLS (Multi-Level Security) range field. A typical MCS label is s0:cN[,cM,…], where s0 is a fixed sensitivity level, and cN represents a single category. Multiple categories can be attached to a single label, allowing a context to access resources across multiple isolation groups—useful for shared resources in complex setups like clustered VMs.
For example, in a virtualisation context, MCS can label VMs with different categories, preventing a process in one VM from accessing resources labelled with a different category. Even if a VM is compromised, it cannot access the resources of other VMs. Access follows a dominance principle: it is granted only if the accessor’s categories are a superset of the target’s. A common category can therefore be used to share a disk across VMs. For instance, two VMs, VM1 with label s0:c951,c961 and VM2 with label s0:c851,c961, can both access a shared disk (or any other resource) labelled s0:c961, while neither can access resources labelled with the other’s full category pair.
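The dominance rule can be sketched as a small shell check. This is only an illustration of the superset test on category sets, not how the kernel evaluates policy; the helper names mcs_categories and mcs_dominates are made up for this example, and range syntax such as c0.c1023 is deliberately not handled.

```shell
#!/bin/sh
# Sketch: does an accessor's MCS category set dominate a target's?
# Only comma-separated categories (e.g. "s0:c951,c961") are understood;
# ranges such as "c0.c1023" are NOT expanded (simplifying assumption).

# mcs_categories LABEL -> prints one category per line.
# Accepts a bare level ("s0:c1,c2") or a full context ending in one.
mcs_categories() {
    printf '%s\n' "${1##*:}" | tr ',' '\n' | grep -E '^c[0-9]+$' || true
}

# mcs_dominates ACCESSOR TARGET -> exit 0 when every category of
# TARGET is also present in ACCESSOR (superset test).
mcs_dominates() {
    accessor_cats=$(mcs_categories "$1")
    for c in $(mcs_categories "$2"); do
        printf '%s\n' "$accessor_cats" | grep -qxF "$c" || return 1
    done
    return 0
}

# The labels from the example above.
mcs_dominates "s0:c951,c961" "s0:c961" && echo "VM1 may access the shared disk"
mcs_dominates "s0:c851,c961" "s0:c961" && echo "VM2 may access the shared disk"
mcs_dominates "s0:c951,c961" "s0:c851,c961" || echo "VM1 may not access VM2's resources"
```

Passing the category check is necessary but not sufficient: SELinux type enforcement must also allow the access.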
SELinux stores these labels as extended attributes (xattrs) in the security namespace. Extended attributes are a filesystem feature that allows arbitrary metadata to be associated with files, directories, and other filesystem objects. They consist of name-value pairs organised into namespaces (viz. user, system, security, trusted) to avoid conflicts and provide access controls. The namespaces are listed below.
- User namespace (user.*): For user-defined attributes, accessible by file owners.
- System namespace (system.*): For filesystem-specific data, like POSIX ACLs.
- Security namespace (security.*): Used by security modules, such as SELinux for storing security labels (e.g., security.selinux).
- Trusted namespace (trusted.*): Accessible only by root, for trusted applications.
Setup verification
libvirt uses the trusted xattr namespace to store its own bookkeeping (such as a file’s original label) when SELinux is enabled on the system. As of this writing, all major filesystems that Oracle Linux ships include xattr support.
To check whether SELinux is enabled, run:
$ getenforce
Enforcing
If the output is Enforcing, SELinux is enabled and its policy is being enforced. An output of Permissive means SELinux does not block anything; it only logs the operations that it would have denied, leaving an audit trail (typically under /var/log/audit/audit.log). An output of Disabled means SELinux has been disabled and must be enabled through its configuration file (/etc/selinux/config); attempts to enable it via setenforce have no effect until this configuration file is modified and the system is rebooted.
The SELinux security context can be seen by passing the -Z flag to many standard utilities, such as ls and ps.
$ ls -dZ /var/lib/libvirt/images/addon-ddisk.img
system_u:object_r:svirt_image_t:s0:c951,c961 addon-ddisk.img
$ ps -eZ | grep qemu
system_u:system_r:svirt_t:s0:c951,c961 7403 ? 00:00:19 qemu-kvm
In this example, the two categories in the security level (c951 and c961) match those of the qemu process that is using the disk image, because libvirt defaults to a dynamic SELinux labelling scheme. The libvirt daemon automatically sets the MCS categories of the disk image to those of the qemu process that is currently using it (the types differ: svirt_image_t for the image, svirt_t for the process), ensuring the VM process can access only its own disk images (and other resources in general, but we focus on disk images here).
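To confirm that a process and a disk image carry the same level, the level field can be cut out of each context and compared. A minimal sketch, assuming contexts of the form user:role:type:level as printed by ls -Z and ps -Z; context_level and same_level are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: compare the MCS level of two SELinux contexts.

# context_level CONTEXT -> prints everything after the type field,
# e.g. "s0:c951,c961"
context_level() {
    printf '%s\n' "$1" | cut -d: -f4-
}

# same_level A B -> exit 0 when both contexts carry an identical level
same_level() {
    [ "$(context_level "$1")" = "$(context_level "$2")" ]
}

# Contexts copied from the ls and ps output above.
image_ctx='system_u:object_r:svirt_image_t:s0:c951,c961'
proc_ctx='system_u:system_r:svirt_t:s0:c951,c961'

if same_level "$image_ctx" "$proc_ctx"; then
    echo "image and process share level $(context_level "$proc_ctx")"
else
    echo "levels differ"
fi
```

The comparison is a plain string match, so it treats c1,c2 and c2,c1 as different; that is acceptable for labels printed by the tools above, which list categories in a consistent order.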
Types of labelling
By default, libvirt uses dynamic labelling: the daemon generates a pair of category IDs for each VM when it starts, and guarantees that the pair is unique among the VMs managed by that running daemon instance. If no seclabel node is specified in the domain XML, dynamic labelling is used, assuming SELinux is enabled on the host.
Dynamic labelling is preferred when simplicity and automatic isolation are needed, such as in environments with many transient VMs, as it avoids manual category management and ensures uniqueness without conflicts. An advantage is ease of use with no XML changes required; however, categories are dynamically assigned per daemon restart, which could theoretically lead to conflicts if there is an overlap with statically-assigned category labels to resources. Extra care must therefore be taken when mixing use of dynamic and static labelling.
Static labelling offers predictability by using predefined categories in the domain XML, making it better for fixed, long-running VMs or when integrating with custom MCS policies where exact category control is required for compliance. It risks category overlaps if the same IDs are assigned to multiple VMs, potentially allowing cross-VM access, so administrators must carefully allocate unique sets. Within static labelling, relabelling can be enabled or disabled: with relabel set to no, libvirt will not change the security label of the disk images to the value specified in the XML; the administrator must ensure that all resources the VM uses are labelled appropriately, otherwise the VM process may encounter permission issues. This differs from dynamic only in category assignment—dynamic guarantees uniqueness for all resources that it labels, while static places the onus on the administrator to avoid conflicts.
A top-level node as shown below can be added to a domain XML to select the desired labelling scheme.
Static Labelling with Relabel Disabled
<seclabel type='static' model='selinux' relabel='no'>
<label>system_u:system_r:svirt_t:s0:c392,c662</label>
</seclabel>
In this case, the seclabel node has its type attribute set to “static” and its relabel attribute set to “no”, so libvirt does not modify the label of the VM’s disk image. SELinux then blocks access to the file, because its existing label is not one that the VM process is allowed to access.
This is the error that is reported:
$ virsh start ol8img
error: Failed to start domain 'ol8img'
error: internal error: process exited while connecting to monitor: 2025-06-04T09:58:24.745572Z qemu-kvm: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2': Permission denied
When the SELinux label of the file in question is inspected, it is untouched: it still holds its original label, with no MCS categories.
$ ls -dZ /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
unconfined_u:object_r:admin_home_t:s0 1278345216 /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
MCS works on a principle of dominance: access is only granted if the accessor’s categories are a superset of the target’s. Dominance alone is not sufficient, though; type enforcement must permit the access as well. Here the file has no categories, and its type (admin_home_t) is not an image type that the svirt_t domain is allowed to use, so SELinux blocks access to the file for this VM process.
Static Labelling with Relabel Enabled
When libvirt is directed to relabel the disk image file (by setting relabel='yes'), the operation succeeds.
<seclabel type='static' model='selinux' relabel='yes'>
<label>system_u:system_r:svirt_t:s0:c392,c662</label>
</seclabel>
$ virsh start ol8img
Domain 'ol8img' started
When the SELinux label of the disk image is inspected, it has been rewritten by libvirt to the value specified in the xml as shown above:
$ ls -dZ /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
system_u:object_r:svirt_image_t:s0:c392,c662 /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
The previous label is now saved in the file’s extended attributes, accessible through getfattr:
$ getfattr -d -m- /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2 --absolute-names
# file: /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
security.selinux="system_u:object_r:svirt_image_t:s0:c392,c662"
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="unconfined_u:object_r:admin_home_t:s0" <<<
trusted.libvirt.security.timestamp_dac="1748503666"
trusted.libvirt.security.timestamp_selinux="1748503666"
This behaviour is controlled through the remember_owner configuration key in the QEMU driver configuration file (qemu.conf). Setting this key to 1 ensures the original security context is restored when the file is no longer in use (i.e. when the VM is shut down). Disabling it results in the loss of any customised security label: the label is reset to the default appropriate for the file.
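To see how a host is configured, the value of a key such as remember_owner can be read out of qemu.conf. The sketch below assumes simple one-line key = value entries and that the file is at /etc/libvirt/qemu.conf (the usual system-mode location; adjust it if your installation differs). conf_value is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read an uncommented "key = value" setting from a config file.

# conf_value KEY FILE -> prints the value of the last uncommented
# KEY = ... line, or nothing when the key is not set explicitly.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Example (path is an assumption, see above):
#   conf_value remember_owner /etc/libvirt/qemu.conf
```

If nothing is printed, the key is commented out or absent and libvirt’s built-in default applies.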
Investigating SELinux-related denials
Building on the denial observed in the static labelling example above, this section details techniques to identify and analyze SELinux-related access issues in the context of libvirt.
In the example shown above, the ‘permission denied’ error is because of SELinux blocking access to the disk image. But how does one identify if the access was blocked by SELinux?
By default, SELinux denials (or would-be denials in Permissive mode) are logged to both /var/log/messages and /var/log/audit/audit.log. Recent logs from these two sources can be inspected manually to identify the cause of a failure. Since /var/log/messages also includes logs from other sources, isolated SELinux messages can easily be lost in the message storm on a busy system. Utilities that allow more targeted audit event lookups are therefore desirable; that is where ausearch comes into play.
ausearch allows targeted querying based on multiple factors, the most common being timestamp-based and based on process name/id. In the example above, if one were to identify if the denial was caused by SELinux, one can query ausearch based on recency and process name. This, however, requires the auditd daemon to be running at the time of incident, otherwise ausearch will not return the erring event. In that case, start the auditd daemon and reattempt the problematic action.
Let’s use these tools to analyse a libvirt access error. We have a VM XML with a single boot disk, modified to include the static seclabel node with relabel disabled that was shown earlier; the VM fails to boot when started through virsh. The analysis proceeds as follows:
$ virsh start ol810
error: Failed to start domain 'ol810'
error: internal error: process exited while connecting to monitor: 2025-12-05T08:49:41.132863Z qemu-kvm: -blockdev {"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":null}: Could not open '/var/lib/libvirt/images/ol810.qcow2': Permission denied
$ ausearch -c qemu-kvm -i -ts recent
----
type=PROCTITLE msg=audit(12/05/2025 00:49:41.131:15655) : proctitle=/usr/libexec/qemu-kvm -name guest=ol810,debug-threads=on -S -object {"qom-type":"secret","id":"masterKey0","format":"raw","file"
type=SYSCALL msg=audit(12/05/2025 00:49:41.131:15655) : arch=x86_64 syscall=openat success=no exit=EACCES(Permission denied) a0=AT_FDCWD a1=0x55e3e5abe151 a2=O_RDWR|O_CLOEXEC a3=0x0 items=0 ppid=1 pid=467192 auid=unset uid=qemu gid=qemu euid=qemu suid=qemu fsuid=qemu egid=qemu sgid=qemu fsgid=qemu tty=(none) ses=unset comm=qemu-kvm exe=/usr/libexec/qemu-kvm subj=system_u:system_r:svirt_t:s0:c392,c662 key=(null)
type=AVC msg=audit(12/05/2025 00:49:41.131:15655) : avc: denied { write } for pid=467192 comm=qemu-kvm name=ol810.qcow2 dev="dm-0" ino=168652224 scontext=system_u:system_r:svirt_t:s0:c392,c662 tcontext=system_u:object_r:virt_image_t:s0 tclass=file permissive=0
This is indeed the SELinux event that prevented the VM from starting. Passing -ts recent filters events by timestamp: recent matches events from the ten minutes before the ausearch query was run. The same event is also reported by setroubleshoot in the system log (/var/log/messages), in a more verbose format:
$ journalctl -t setroubleshoot
Dec 05 00:49:43 setroubleshoot[467211]: SELinux is preventing /usr/libexec/qemu-kvm from write access on the file ol810.qcow2.
***** Plugin qemu_file_image (98.8 confidence) suggests *******************
If ol810.qcow2 is a virtualization target
Then you need to change the label on ol810.qcow2'
Do
# semanage fcontext -a -t virt_image_t 'ol810.qcow2'
# restorecon -v 'ol810.qcow2'
***** Plugin catchall (2.13 confidence) suggests **************************
If you believe that qemu-kvm should be allowed write access on the ol810.qcow2 file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'qemu-kvm' --raw | audit2allow -M my-qemukvm
# semodule -X 300 -i my-qemukvm.pp
setroubleshoot comes with various helpful plugins that automatically analyse audit events and suggest corrective measures; qemu_file_image is one such plugin. However, the catchall suggestion (a local policy module generated with audit2allow) merely adds an exception for the label that the domain currently possesses, and will need to be redone if that label changes. Instead of patching over the problem like this, the correct solution is to ensure the disk image has the SELinux label that the domain configuration requires.
A quick way to identify whether SELinux is the cause of an access issue is to repeat the operation with SELinux in Permissive mode, in which SELinux only logs the actions it would have blocked instead of actually blocking them. If the operation succeeds in Permissive mode, then SELinux was blocking access to the desired resources, and the root cause can be analysed as shown above.
If the problem persists in Permissive mode, it lies outside SELinux and deeper analysis is required. SELinux can be set to Permissive mode (as root) using the setenforce utility; setenforce 1 switches back to Enforcing mode afterwards.
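When reading an AVC record, the two fields that matter most are scontext (the process) and tcontext (the file). A minimal sketch for pulling them out of a record follows; avc_field and context_type are hypothetical helper names, and the sample is an abbreviated copy of the AVC record from the ausearch output above.

```shell
#!/bin/sh
# Sketch: extract individual key=value fields from one AVC record.

# avc_field RECORD KEY -> prints the value of the first KEY=... token
avc_field() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p" | head -n 1
}

# context_type CONTEXT -> prints the type (third field) of a
# user:role:type:level context
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Abbreviated AVC record from the denial analysed in this section.
sample='avc: denied { write } for pid=467192 comm=qemu-kvm name=ol810.qcow2 scontext=system_u:system_r:svirt_t:s0:c392,c662 tcontext=system_u:object_r:virt_image_t:s0 tclass=file permissive=0'

echo "source:      $(avc_field "$sample" scontext)"
echo "target:      $(avc_field "$sample" tcontext)"
echo "target type: $(context_type "$(avc_field "$sample" tcontext)")"
```

For this record the target turns out to be virt_image_t with no categories, rather than the svirt_image_t label with the domain’s categories that libvirt applies when relabelling is enabled.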
$ setenforce 0
$ getenforce
Permissive
Per-domain view of the /dev subsystem
As part of its hardening measures, libvirt starts each qemu process in its own mount namespace; this minimises the attack surface that a potentially breached VM can target. libvirt populates the namespace’s /dev with only the devices that the VM requires, as specified in the domain XML. As of this writing, there is no way to pre-populate a namespace with disks that are not specified in the domain XML.
Libvirt recreates entries within the mount namespace, but only for block and character device files (along with any symlinks and their targets) under the /dev mount. Files outside this path are not recreated but are instead bind-mounted into the namespace, so any operation on them persists on the host’s copy. Standard device files such as /dev/zero, /dev/null, /dev/random and /dev/urandom are recreated unconditionally in each mount namespace.
All namespaces on a system can be viewed with the lsns utility. It lists the namespaces that are currently active, their types, and for each one the lowest PID of the processes running in it.
$ lsns -t mnt
NS TYPE NPROCS PID USER COMMAND
4026531841 mnt 512 1 root /usr/lib/systemd/systemd --system --deserialize 24
4026531862 mnt 1 256 root kdevtmpfs
4026532407 mnt 1 1120547 root /usr/lib/systemd/systemd-udevd
4026533820 mnt 1 1417 root /usr/sbin/rdma-ndd --systemd
4026533832 mnt 1 1111828 root /usr/sbin/rsyslogd -n
4026533834 mnt 1 3081750 qemu /usr/libexec/qemu-kvm -name guest=ol810,debug-threads=on -S -
4026533895 mnt 1 1669 root /usr/sbin/NetworkManager --no-daemon
4026533896 mnt 1 1673 root /usr/sbin/irqbalance --foreground
4026533959 mnt 1 1764 root /usr/sbin/ModemManager
4026535437 mnt 1 2254 chrony /usr/sbin/chronyd
The mount namespace that was created for the VM named ol810 is visible in this list, along with the PID of the VM process itself. The contents of a VM’s namespace can be viewed through its /proc entry: the root of each process’ namespace is conveniently available at /proc/<qemu-pid>/root, and can be browsed with tools like ls.
For instance, when a block device is attached to a VM, the VM sees the device node that libvirt recreated under its mount namespace rather than the host’s node.
To view this, first get the pid of the qemu process (-u filters on the running process user):
$ ps -u qemu
PID TTY TIME CMD
1061439 ? 00:00:11 qemu:ol8-img
Next, inspect the mount namespace of this pid. As a reference, the inode of the original disk that was specified in the xml is also listed in the first column of the output below:
$ ls -il /proc/1061439/root/dev/sdf /dev/sdf
767 brw-rw----. 1 root disk 8, 80 Jun 5 04:03 /dev/sdf
10 brw-rw----. 1 qemu qemu 8, 80 Jun 6 02:22 /proc/1061439/root/dev/sdf
It is important to observe that any metadata operations performed on the disk act on the file view within the mount namespace, not on the actual file path specified in the XML. This means that any extended attributes written to the file by the SELinux security driver are reflected in the mount namespace view of the device, not the host view. Disks in paths other than /dev are bind-mounted, so for these alone, modifications are visible at the original file path as well.
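Whether a given path was recreated or bind-mounted can be checked by comparing device and inode numbers of the host path and the namespace path. A minimal sketch, assuming GNU stat (for the -c option); same_file is a hypothetical helper name, and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: decide whether two paths refer to the same underlying file.

# same_file A B -> exit 0 when both paths exist and share the same
# device and inode numbers (symlinks are followed).
same_file() {
    [ -e "$1" ] && [ -e "$2" ] &&
        [ "$(stat -L -c '%d:%i' "$1")" = "$(stat -L -c '%d:%i' "$2")" ]
}

# Example (reading /proc/<qemu-pid>/root needs root):
#   same_file /dev/sdf /proc/<qemu-pid>/root/dev/sdf \
#       && echo "same inode" || echo "different inode"
```

A match suggests the path is bind-mounted and xattrs are shared with the host view; a mismatch suggests a recreated node whose xattrs live only inside the namespace.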
Working example
Let us note the disks that are currently attached to a test vm we are interested in. Obtain the list of devices with the domblklist subcommand of virsh:
$ virsh domblklist ol8-img
Target Source
-------------------------------------------------------------------------------
vda /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
sdb /dev/sde
Both of these disks are currently attached to the VM ol8-img. However, the disk attached as vda in the VM is bind-mounted, while the disk attached as sdb is recreated in the domain’s mount namespace.
To verify this, first obtain the pid of the vm so that its /proc file can be examined. Since we know that the domain is run under the user qemu, we just filter to processes running under that user with the -u flag to ps.
$ ps -u qemu
PID TTY TIME CMD
294629 ? 00:00:19 qemu:ol8-img
Now that we have the pid of the vm, inspect both the disks listed above within the namespace and compare them with their listed paths above:
$ getfattr -d -m- /dev/sde /proc/294629/root/dev/sde
getfattr: Removing leading '/' from absolute path names
# file: dev/sde
security.selinux="system_u:object_r:fixed_disk_device_t:s0"
# file: proc/294629/root/dev/sde
security.selinux="system_u:object_r:svirt_image_t:s0:c249,c396"
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="system_u:object_r:fixed_disk_device_t:s0"
trusted.libvirt.security.timestamp_dac="1749811180"
trusted.libvirt.security.timestamp_selinux="1749811180"
Only the device that was recreated under the mount namespace has the extended attributes written to it, not the original file path that was listed under /dev.
However, for the other device that is attached to the same vm, this is not the case. Inspecting this disk in the same manner results in a different observation:
$ getfattr -d -m- /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2 /proc/294629/root/var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
getfattr: Removing leading '/' from absolute path names
# file: var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
security.selinux="system_u:object_r:svirt_image_t:s0:c249,c396"
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="unconfined_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1749811180"
trusted.libvirt.security.timestamp_selinux="1749811180"
# file: proc/294629/root/var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
security.selinux="system_u:object_r:svirt_image_t:s0:c249,c396"
trusted.libvirt.security.dac="+0:+0"
trusted.libvirt.security.ref_dac="1"
trusted.libvirt.security.ref_selinux="1"
trusted.libvirt.security.selinux="unconfined_u:object_r:virt_image_t:s0"
trusted.libvirt.security.timestamp_dac="1749811180"
trusted.libvirt.security.timestamp_selinux="1749811180"
The same xattrs can be seen on both the file paths – this is because it is bind-mounted. This can be further verified with the inode for both these paths. Since the xattrs are stored in the inode, files with the same inode will display the same xattrs. The file inode can be obtained through the -i flag to ls.
$ ls -li /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2 /proc/294629/root/var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
167898172 -rw-r--r--. 1 qemu qemu 1276641280 Jun 20 04:17 /proc/294629/root/var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
167898172 -rw-r--r--. 1 qemu qemu 1276641280 Jun 20 04:17 /var/lib/libvirt/images/OL8U10_x86_64-kvm-b258.qcow2
This observation is critical when debugging issues relating to SELinux labels and any general access issues with libvirt, since the file that needs to be inspected will differ based on the type of file in question.
For example, when a disk detach did not complete or was terminated prematurely, stale xattrs may still be present on the device within the mount namespace of the VM. In these cases, an attempt to attach the same device again will fail, because libvirt sees an existing device file with valid xattrs written to it and therefore rejects the request with an error message like this:
error: cannot add disk to VM: Requested operation is not valid: Setting different SELinux label on /dev/sde which is already in use
This happens because, along with the SELinux label, libvirt also maintains a reference count for each disk that is attached to a VM. libvirt needs to ensure that the same disk is not attached to multiple VMs, unless the disk is explicitly marked as <shareable/> in the domain XML (in which case reference counting is disabled). Otherwise there is a risk of filesystem corruption from concurrent writers, unless the filesystem itself supports them (e.g. GlusterFS). The shareable flag affects the cache mode in which the disk is operated, and can have implications for guest I/O performance.
Without knowledge of the mount namespace, this error message can be confusing: the original file never shows any xattrs, so the source of libvirt’s complaint seems unclear. It is therefore important to first check the VM’s namespace for stale xattrs and, if required, clear them. The stale xattrs can be cleared by iterating through the list of xattrs and deleting them one by one, as in the short script below:
# first get the names of the trusted.* xattrs present on the device inode (run as root)
$ attrs=$(getfattr --absolute-names -m "trusted.*" /path/to/disk | grep '^trusted')
# delete each of them individually with setfattr -x
$ for attr in $attrs; do setfattr -x "$attr" /path/to/disk; done
This first collects all the attribute names into a variable ($attrs) and then deletes them one by one with setfattr -x. Once the stale xattrs have been cleared, the attach operation can be retried successfully.
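Since deleting xattrs is hard to undo, it can help to preview the commands first. The sketch below is a dry-run variant: it reads attribute names (as printed by getfattr without -d) on standard input and only prints the setfattr commands it would run. Restricting the match to trusted.libvirt.* is a choice made for this example, and print_xattr_cleanup is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: dry-run preview of the xattr cleanup; nothing is deleted.

# print_xattr_cleanup PATH -> reads attribute names on stdin and prints
# one "setfattr -x NAME PATH" command per trusted.libvirt.* name.
# Paths containing single quotes are not handled (simplifying assumption).
print_xattr_cleanup() {
    grep '^trusted\.libvirt\.' | while IFS= read -r name; do
        printf "setfattr -x '%s' '%s'\n" "$name" "$1"
    done
}

# Example (run as root; /path/to/disk is a placeholder):
#   getfattr --absolute-names -m 'trusted.*' /path/to/disk \
#       | print_xattr_cleanup /path/to/disk
```

Once the printed commands look right, they can be run as they are or piped to sh.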
Disabling mount namespace usage
A mount namespace therefore acts like a chroot for the process: the process cannot access anything outside of it. Similar concepts exist for other subsystems as well (e.g. net for network); of note are the network, user and group namespaces, though they are out of scope for this blog because they deal with subsystems other than the filesystem. Other than the user namespace, these can be customised through the domain XML and/or the libvirt daemon configuration file, while the user namespace can be configured through the qemu configuration file. qemu can be directed not to use mount namespaces by setting the namespaces key in qemu.conf to an empty list:
namespaces = [ ]
This gives each VM unfettered, direct access to any disks that are defined in the VM XML; any operation with side effects is then visible to other VMs that use the same resources, which can be dangerous in a multi-tenancy environment. It is therefore recommended to retain mount namespaces unless there is an explicit need to share host resources amongst multiple VMs.
Conclusion
This blog highlights some of the interplay between libvirt and SELinux. Using mount namespaces allows libvirt to control the visibility that a domain process has of the host system resources on which it is running. The blog also outlines some common errors and how they can be resolved, to aid in developing an understanding of libvirt processes. Oracle Linux enables these features by default, and they can be tuned per requirements as outlined here.