A memory leak occurs when an application dynamically allocates memory and neglects to free it when it’s no longer needed. Memory leaks can be difficult to triage, especially when they occur in large applications with complex code paths. This blog entry provides an overview of how we used Valgrind to find and fix several QEMU memory leaks related to vCPU hot plug.
vCPU Hot Plug
vCPU hot plug is the addition and removal of virtual CPUs in a running VM (virtual machine) without stopping or otherwise interrupting it. These operations are complex by nature: among other complications, QEMU dynamically adds and removes many underlying data structures, so there are numerous opportunities for incomplete cleanup.
While testing vCPU hot plug on an x86 guest running under KVM, we discovered that QEMU was leaking a small amount of memory on each hot unplug operation. However, it wasn't obvious which data was being leaked or in which code paths, so we decided to use Valgrind to help analyze the issue.
Valgrind to the Rescue
Valgrind is an open source framework providing tools for code and memory analysis, debugging, and profiling. For this issue, we use memcheck, which tracks memory allocations and reports memory leaks. memcheck is straightforward to use and doesn't require any modification to or relinking of the application. However, compiling with debug symbols (gcc's -g option; for QEMU, configuring with --enable-debug) produces more useful output, because the reported stack traces then contain proper symbol names and source locations.
We invoke Valgrind with the QEMU command line as its main argument, along with options that select the memcheck tool, request full details for each leak, and direct all output to a log file for later analysis (see the Valgrind user manual for the complete list of options).
Note: Valgrind uses dynamic recompilation and instrumentation, so there is a significant performance impact. Valgrind also uses a lot of memory (up to twice the amount normally used by the application), so make sure the host system has enough headroom to absorb the additional load.
valgrind --tool=memcheck \
--leak-check=full \
--log-file=valgrind.txt \
/usr/libexec/qemu-kvm ...
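The hot plug/unplug operations below are issued through QMP. This assumes the elided portion of the QEMU command line above also includes a QMP socket, such as -qmp unix:./qmp-sock,server=on,wait=off, for qmp-shell (shipped in QEMU's scripts/qmp directory) to connect to.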
Once the VM is running, we perform a single hot plug/unplug operation using qmp-shell:
(QEMU) device_add id=cpu1 driver=host-x86_64-cpu socket-id=0 core-id=1 thread-id=0
{"return": {}}
(QEMU) device_del id=cpu1
{"return": {}}
The hot plug/unplug completes without issue. We then stop the VM and examine the log file, where Valgrind reports four different leaks totalling just over 4KB:
==4060562== LEAK SUMMARY:
==4060562==    definitely lost: 4,376 bytes in 4 blocks
==4060562==    indirectly lost: 0 bytes in 0 blocks
In this context, definitely lost means the application no longer holds any pointer to the allocated memory. indirectly lost means the memory is only reachable through blocks that are themselves lost. For example, if a list's head pointer is definitely lost, all the remaining list elements are indirectly lost.
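To make the distinction concrete, here is a minimal standalone C program (not from QEMU; the names are illustrative) that produces exactly one definitely lost and one indirectly lost block under memcheck:

#include <stdlib.h>

struct node {
    struct node *next;
};

int main(void)
{
    /* Build a two-element list; 'head' is the only way to reach it. */
    struct node *head = malloc(sizeof(*head));
    struct node *second = malloc(sizeof(*second));
    head->next = second;
    second->next = NULL;

    /* Drop both local pointers: the head block becomes definitely lost,
     * and the second block (reachable only through the head block)
     * becomes indirectly lost. */
    head = NULL;
    second = NULL;
    return 0;
}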
Note: The following QEMU symbol names, call stacks, line numbers, etc. are for version 6.2.0.
In addition to the number of bytes and occurrences, Valgrind provides a backtrace for each leak, making it easy to identify the execution flow and the source of the allocation. The first two leaks are similar, and both occur in the QEMU vCPU init path qemu_init_vcpu() -> kvm_start_vcpu_thread():
==4060562== 8 bytes in 1 blocks are definitely lost in loss record 1,031 of 8,477
==4060562==    at 0x4C3ADBB: calloc (vg_replace_malloc.c:1117)
==4060562==    by 0x69EE4CD: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.5600.4)
==4060562==    by 0x8D5DA1: kvm_start_vcpu_thread (kvm-accel-ops.c:68)
==4060562==    by 0x7A155C: qemu_init_vcpu (cpus.c:630)
==4060562==    by 0x7380FA: x86_cpu_realizefn (cpu.c:6447)
==4060562==    by 0x90465D: device_set_realized (qdev.c:531)
==4060562==    by 0x90E315: property_set_bool (object.c:2268)
==4060562==    by 0x90C36B: object_property_set (object.c:1403)
==4060562==    by 0x910703: object_property_set_qobject (qom-qobject.c:28)
==4060562==    by 0x90C6D2: object_property_set_bool (object.c:1472)
==4060562==    by 0x903F25: qdev_realize (qdev.c:333)
==4060562==    by 0x443882: qdev_device_add_from_qdict (qdev-monitor.c:711)
Here’s the second leak:
==4060562== 56 bytes in 1 blocks are definitely lost in loss record 5,028 of 8,477
==4060562==    at 0x4C3ADBB: calloc (vg_replace_malloc.c:1117)
==4060562==    by 0x69EE4CD: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.5600.4)
==4060562==    by 0x8D5DB9: kvm_start_vcpu_thread (kvm-accel-ops.c:69)
==4060562==    by 0x7A155C: qemu_init_vcpu (cpus.c:630)
==4060562==    by 0x7380FA: x86_cpu_realizefn (cpu.c:6447)
==4060562==    by 0x90465D: device_set_realized (qdev.c:531)
==4060562==    by 0x90E315: property_set_bool (object.c:2268)
==4060562==    by 0x90C36B: object_property_set (object.c:1403)
==4060562==    by 0x910703: object_property_set_qobject (qom-qobject.c:28)
==4060562==    by 0x90C6D2: object_property_set_bool (object.c:1472)
==4060562==    by 0x903F25: qdev_realize (qdev.c:333)
==4060562==    by 0x443882: qdev_device_add_from_qdict (qdev-monitor.c:711)
Looking at the QEMU source code, we can see the leaking g_malloc0() calls (at kvm-accel-ops.c lines 68 and 69, matching the backtraces above).
accel/kvm/kvm-accel-ops.c:
64 static void kvm_start_vcpu_thread(CPUState *cpu)
65 {
66     char thread_name[VCPU_THREAD_NAME_SIZE];
67
68     cpu->thread = g_malloc0(sizeof(QemuThread));      <=== first leak
69     cpu->halt_cond = g_malloc0(sizeof(QemuCond));     <=== second leak
70     qemu_cond_init(cpu->halt_cond);
71     snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/KVM",
72              cpu->cpu_index);
73     qemu_thread_create(cpu->thread, thread_name, kvm_vcpu_thread_fn,
74                        cpu, QEMU_THREAD_JOINABLE);
75 }
In this case, the vCPU’s thread identifier and condition variable are leaked when the vCPU is hot unplugged (QemuThread is a pthread_t wrapper, QemuCond is a pthread_cond_t wrapper).
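For reference, both wrappers are thin on POSIX hosts. Here is a simplified sketch of the definitions (based on QEMU's include/qemu/thread-posix.h; the exact fields may vary between versions):

#include <pthread.h>
#include <stdbool.h>

struct QemuThread {
    pthread_t thread;
};

struct QemuCond {
    pthread_cond_t cond;
    bool initialized;    /* consistency checks in qemu-thread-posix.c */
};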
Fixing the Leaks
Once the leaks are identified, it's a matter of finding the right place to free the associated memory objects. This can be challenging because the creation and cleanup paths don't mirror one another. The solution being pursued in upstream QEMU is to restructure the code a bit and introduce cleanup routines. In particular, a new cleanup routine, common_vcpu_thread_destroy(), is invoked from the vCPU cleanup path, in cpu_remove_sync(), after the vCPU thread has terminated and been joined via qemu_thread_join().
softmmu/cpus.c:
static void common_vcpu_thread_destroy(CPUState *cpu)
{
    g_free(cpu->thread);
    g_free(cpu->halt_cond);
}

void cpu_remove_sync(CPUState *cpu)
{
    cpu->stop = true;
    cpu->unplug = true;
    qemu_cpu_kick(cpu);
    qemu_mutex_unlock_iothread();
    qemu_thread_join(cpu->thread);
    qemu_mutex_lock_iothread();
    if (cpus_accel->destroy_vcpu_thread_precheck == NULL
        || cpus_accel->destroy_vcpu_thread_precheck(cpu)) {
        common_vcpu_thread_destroy(cpu);    <=== cleanup routine
    }
}
The Other Leaks
The most significant remaining leak is a 4KB buffer used to support XSAVE:
==4060562== 4,096 bytes in 1 blocks are definitely lost in loss record 8,366 of 8,477
==4060562==    at 0x4C3B15F: memalign (vg_replace_malloc.c:1265)
==4060562==    by 0x4C3B288: posix_memalign (vg_replace_malloc.c:1429)
==4060562==    by 0xAC5A32: qemu_try_memalign (oslib-posix.c:210)
==4060562==    by 0xAC5AA4: qemu_memalign (oslib-posix.c:226)
==4060562==    by 0x6E450D: kvm_arch_init_vcpu (kvm.c:1986)
==4060562==    by 0x8CEE6C: kvm_init_vcpu (kvm-all.c:510)
==4060562==    by 0x8D5CC3: kvm_vcpu_thread_fn (kvm-accel-ops.c:40)
==4060562==    by 0xAC81C6: qemu_thread_start (qemu-thread-posix.c:556)
==4060562==    by 0x7EB2159: start_thread (in /usr/lib64/libpthread-2.28.so)
==4060562==    by 0x9D45DD2: clone (in /usr/lib64/libc-2.28.so)
This leak was addressed by adding a corresponding g_free() to the appropriate cleanup routine. The final leak is related to a vCPU-specific address space list (cpu->cpu_ases); details can be seen in the proposed upstream patch.
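For illustration, the XSAVE fix follows the same pattern as the thread and condition variable cleanup. Here is a sketch of the idea, modeled on kvm_arch_destroy_vcpu() in target/i386/kvm/kvm.c, where the buffer is stored in env->xsave_buf; the exact shape of the upstream patch may differ, and the rest of the teardown logic is elided:

int kvm_arch_destroy_vcpu(CPUState *cs)
{
    X86CPU *cpu = X86_CPU(cs);
    CPUX86State *env = &cpu->env;

    /* Free the XSAVE buffer allocated in kvm_arch_init_vcpu(). */
    g_free(env->xsave_buf);

    return 0;
}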
With all of the fixes applied, Valgrind no longer reports any leaks following the vCPU hot plug/unplug operation:
==3449719== LEAK SUMMARY:
==3449719==    definitely lost: 0 bytes in 0 blocks
==3449719==    indirectly lost: 0 bytes in 0 blocks
Resources
- QEMU Documentation
- Valgrind Quick Start Guide
- Valgrind User Manual
- Valgrind packages are available at yum.oracle.com for Oracle Linux 7 and Oracle Linux 8