Asynchronous vCPU Unplug in libvirt

Live resizing of virtual machines is usually described as an administrative operation: add memory, remove a CPU, adjust placement, and continue running. The details are more subtle. Removing a vCPU from a running QEMU guest requires coordination between the host and the guest: the host can request removal of the CPU device, but the guest still has to cooperate before the operation can be considered complete. This post describes asynchronous vCPU unplug, a new enhancement to the traditionally synchronous vCPU unplug path in libvirt. It covers why the feature is useful for orchestration applications using libvirt and how to use it correctly.

Background

Live vCPU changes are a common part of virtualization management. A management service may reduce the number of vCPUs assigned to a running guest when load drops, when capacity must be reclaimed, or when a policy engine decides that a guest should be resized without a reboot.

For QEMU guests, unplugging a vCPU is not simply a matter of editing libvirt’s live domain state. libvirt asks QEMU to remove the CPU device, QEMU coordinates that request with the guest, and final completion is reported only after the guest-visible removal has happened. The operation therefore has two separate phases:

request submission -> guest-visible removal

Historically, the public API exposed this mostly as a synchronous operation. That was usable for callers that wanted to block until libvirt’s wait finished, but it was not a good fit for event-driven management software.

Asynchronous vCPU unplug adds a clearer contract for libvirt’s QEMU driver. The application can ask libvirt to submit the unplug request and return once QEMU accepts it. Final completion is then reported through a vCPU-specific domain event.

The Problem with Synchronous vCPU Unplug

When libvirt unplugs a QEMU vCPU synchronously, the QEMU driver sends a device_del request and waits for QEMU’s DEVICE_DELETED event before finishing the live domain state update. This is correct from libvirt’s point of view: the vCPU should not disappear from libvirt’s live state until QEMU has actually reported that the device is gone.

The difficulty is that the waiting policy belongs to libvirt, not to the application. In the existing path, callers are bound to libvirt’s internal wait behavior. For QEMU device unplug this is normally a short built-in timeout, with a longer timeout on ppc64.

That default is necessarily generic. It is not aware of the caller’s infrastructure concerns, including but not limited to:

how busy the control plane is;
whether this operation is part of a larger resize workflow;
how long the guest is expected to take under current load;
whether the caller would rather wait for 500 ms, 30 seconds, or hand the operation to a background reconciler;
what retry, alerting, or rollback policy the surrounding system wants to use.

In other words, the synchronous API made libvirt responsible both for submitting the operation and for deciding how long the caller should wait. Large management systems usually prefer that split differently: libvirt should perform the hypervisor operation and report authoritative state changes, while the management layer should own policy decisions such as timeouts.

What Was Missing

Returning early from an unplug request is only useful if the caller can still learn what happened later. Before asynchronous vCPU unplug, libvirt had a generic failure event for rejected device removals, but it did not have a successful vCPU-specific completion event.

That missing success event was the main contract gap. A management application could not safely treat an accepted request as completed, because the guest might still be using the vCPU. If no completion was observed within libvirt’s built-in wait window, the application had to inspect or re-query domain state manually to find out whether the unplug eventually completed. If the application knew that the guest was heavily loaded and wanted to wait longer, the synchronous API did not give it a way to enforce that policy.

The missing piece was a new interface that could distinguish these outcomes:

API failure
    The request was not accepted.

API success
    The request was accepted, but the vCPU may still be present.

VIR_DOMAIN_EVENT_ID_VCPU_REMOVED
    Final success for a specific libvirt XML vCPU ID.

VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED
    Final failure after a request had already been accepted.

The new event, VIR_DOMAIN_EVENT_ID_VCPU_REMOVED, provides the missing success side. Its callback receives the libvirt vCPU ID:

typedef void (*virConnectDomainEventVcpuRemovedCallback)(virConnectPtr conn,
                                                         virDomainPtr dom,
                                                         unsigned int vcpuid,
                                                         void *opaque);

The vcpuid value matches the <vcpu id='...'> value from the domain XML. This is the value that management software should use when updating its model of the guest.

Application-Defined Timeouts

The most important behavioral change is that timeout policy can now move out of libvirt and into the management or orchestration application.

With the asynchronous flow, the application can mark a vCPU removal as pending, start its own timer, and then react to whichever terminal condition arrives first:

API returns success
    Start application timeout and keep the vCPU removal pending.

vcpu-removed arrives
    Cancel the timeout and apply the state change.

device-removal-failed arrives
    Cancel the timeout and fail the pending operation.

application timeout expires
    Re-query domain state, retry, alert, or defer to a reconciler.
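The flow above can be sketched as a small application-side state machine. This is a minimal sketch under stated assumptions, not part of libvirt: the names pending_unplug, unplug_prepare, unplug_accepted, unplug_on_removed, unplug_on_failed, and unplug_poll are hypothetical, timestamps are assumed to come from the caller's own monotonic clock in milliseconds, and the event callbacks are assumed to run on the same thread as the poller or under the caller's own locking. The record is prepared before the request is issued so that an event delivered before the API call returns is not lost.

```c
#include <stdint.h>

/* Hypothetical application-side states for one vCPU removal. */
enum unplug_state {
    UNPLUG_SUBMITTING, /* request issued, API call has not returned yet */
    UNPLUG_PENDING,    /* API returned success, application timer running */
    UNPLUG_REMOVED,    /* vcpu-removed arrived: final success */
    UNPLUG_FAILED,     /* device-removal-failed arrived: final failure */
    UNPLUG_TIMED_OUT   /* application deadline passed first */
};

struct pending_unplug {
    unsigned int vcpuid;      /* libvirt XML vCPU ID */
    uint64_t deadline_ms;     /* absolute time on the caller's clock */
    enum unplug_state state;
};

/* Call before issuing the request, after the event callbacks are registered. */
static void
unplug_prepare(struct pending_unplug *p, unsigned int vcpuid)
{
    p->vcpuid = vcpuid;
    p->deadline_ms = 0;
    p->state = UNPLUG_SUBMITTING;
}

/* Call when the API returns success; starts the application timeout
 * unless a terminal event has already been recorded. */
static void
unplug_accepted(struct pending_unplug *p, uint64_t now_ms, uint64_t timeout_ms)
{
    if (p->state == UNPLUG_SUBMITTING) {
        p->deadline_ms = now_ms + timeout_ms;
        p->state = UNPLUG_PENDING;
    }
}

static int
unplug_is_open(const struct pending_unplug *p)
{
    return p->state == UNPLUG_SUBMITTING || p->state == UNPLUG_PENDING;
}

/* Call from the vcpu-removed callback. */
static void
unplug_on_removed(struct pending_unplug *p, unsigned int vcpuid)
{
    if (unplug_is_open(p) && vcpuid == p->vcpuid)
        p->state = UNPLUG_REMOVED;
}

/* Call from the device-removal-failed callback, after the application has
 * decided that the failure concerns this operation. */
static void
unplug_on_failed(struct pending_unplug *p)
{
    if (unplug_is_open(p))
        p->state = UNPLUG_FAILED;
}

/* Call from the application's timer or main loop. The first terminal
 * condition wins; later events do not overwrite it. */
static enum unplug_state
unplug_poll(struct pending_unplug *p, uint64_t now_ms)
{
    if (p->state == UNPLUG_PENDING && now_ms >= p->deadline_ms)
        p->state = UNPLUG_TIMED_OUT;
    return p->state;
}
```

When unplug_poll() reports UNPLUG_TIMED_OUT, the application would re-query domain state, retry, alert, or defer to a reconciler, according to its own policy. If the API call fails instead of returning success, the record is simply discarded.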

This is a better division of responsibility. libvirt remains the source of truth for what QEMU accepted and what QEMU later reported. The management layer decides how long the surrounding workflow can afford to wait.

For example, a cloud control plane might use a short timeout for an interactive resize request, but a longer timeout for a background consolidation task. A host under heavy load might defer convergence to a periodic reconciler. A service with strict placement guarantees might fail the pending resize immediately if the vCPU removal does not complete within its own policy window.

A generic, catch-all timeout inside libvirt cannot account for any of these policies. They belong to the application that understands the infrastructure it is manipulating.

Triggering Async Unplug

The operation can be triggered either through virsh or through the libvirt API. In both cases, the important part is to observe events before issuing the request.

Using virsh

Start by watching domain events for the guest:

$ virsh event guest --all --loop --timestamp

In another terminal, request an asynchronous vCPU unplug for a specific vCPU:

$ virsh setvcpu guest 7 --disable --live --async

When QEMU completes the removal, the event stream reports:

event 'vcpu-removed' for domain 'guest': vcpu 7

The count-based interface uses the same completion mechanism:

$ virsh setvcpus guest 4 --live --async

The command returning successfully means that the unplug request was accepted. It does not mean that all requested vCPUs have disappeared from the live guest. The caller should wait for the relevant vcpu-removed events before treating the resize as converged.

If QEMU or the guest rejects an accepted unplug request, the failure continues to be reported through the existing event:

event 'device-removal-failed' for domain 'guest': vcpu7

The identifier carried by the failure event is a QEMU device alias, not the XML vCPU ID. For robust tooling, the failure path should re-query the domain state before deciding what to retry or report.
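One way to use the alias without trusting it as an ID is to treat it only as a hint that a vCPU removal may have failed, and then re-query. The helper below is a minimal sketch: the name alias_looks_like_vcpu is hypothetical, and the "vcpu" prefix followed by digits is an assumption based solely on the sample output shown above, not a documented alias format.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the alias has the form "vcpu<digits>".
 * Assumption: the alias format seen in the sample event output.
 * A match is only a hint to re-query vCPU state, never a vCPU ID. */
static bool
alias_looks_like_vcpu(const char *alias)
{
    static const char prefix[] = "vcpu";
    const size_t n = sizeof(prefix) - 1;

    if (alias == NULL || strncmp(alias, prefix, n) != 0)
        return false;
    if (alias[n] == '\0')
        return false; /* prefix with no digits */

    for (const char *p = alias + n; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return false;
    }
    return true;
}
```

A failure callback could call this on the reported alias and, on a match, schedule a re-query of the domain's vCPU state rather than mapping the trailing digits to an XML vCPU ID.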

Using the libvirt API

Applications should register for both the success and failure events before issuing the asynchronous request. This is important because the success event may be delivered before the API call itself returns.

The following example sketches the control flow for unplugging vCPU 3:

#include <libvirt/libvirt.h>

struct unplug_wait {
    unsigned int vcpuid;
    int done; /* 0 pending, 1 removed, -1 rejected */
};

static void
vcpuRemoved(virConnectPtr conn,
            virDomainPtr dom,
            unsigned int vcpuid,
            void *opaque)
{
    struct unplug_wait *wait = opaque;

    if (vcpuid == wait->vcpuid)
        wait->done = 1;
}

static void
deviceRemovalFailed(virConnectPtr conn,
                    virDomainPtr dom,
                    const char *devAlias,
                    void *opaque)
{
    struct unplug_wait *wait = opaque;

    /*
     * The failure event carries a device alias. Re-query domain state before
     * deciding whether this failed the pending vCPU unplug.
     */
    wait->done = -1;
}

/* Do this in your main routine */
struct unplug_wait wait = { .vcpuid = 3, .done = 0 };

if (virConnectDomainEventRegisterAny(conn, dom,
                                     VIR_DOMAIN_EVENT_ID_VCPU_REMOVED,
                                     VIR_DOMAIN_EVENT_CALLBACK(vcpuRemoved),
                                     &wait, NULL) < 0)
    goto error;

if (virConnectDomainEventRegisterAny(conn, dom,
                                     VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED,
                                     VIR_DOMAIN_EVENT_CALLBACK(deviceRemovalFailed),
                                     &wait, NULL) < 0)
    goto error;

if (virDomainSetVcpu(dom, "3", 0,
                     VIR_DOMAIN_SETVCPU_AFFECT_LIVE |
                     VIR_DOMAIN_SETVCPU_ASYNC_UNPLUG) < 0) {
    /*
     * The request was not accepted. Do not wait for a completion event for
     * this operation.
     */
    goto error;
}

/*
 * The request was accepted. Keep vCPU 3 in pending-removal state until an
 * event, an application timeout, or a later state reconciliation resolves it.
 */

The count-based API uses the flag VIR_DOMAIN_VCPU_ASYNC_UNPLUG instead, but it is functionally equivalent:

if (virDomainSetVcpusFlags(dom, 2,
                           VIR_DOMAIN_VCPU_LIVE |
                           VIR_DOMAIN_VCPU_ASYNC_UNPLUG) < 0) {
    /*
     * The downscale request was not accepted.
     */
    goto error;
}

For count-based downscaling, the caller should expect one VIR_DOMAIN_EVENT_ID_VCPU_REMOVED event for each removed vCPU ID defined in the libvirt XML.
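For a count-based request the application may not know in advance which vCPU IDs will be removed, so one option is to count distinct IDs as their events arrive. The following is a minimal sketch of that bookkeeping. The names downscale_wait, downscale_begin, downscale_on_removed, and downscale_converged are hypothetical, MAX_TRACKED_VCPUS is an arbitrary bound chosen for the example, and the sketch assumes no other vCPU removal is in flight for the same domain.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_TRACKED_VCPUS 1024 /* hypothetical bound for this sketch */

struct downscale_wait {
    unsigned int expected;            /* old count minus new count */
    unsigned int seen;                /* distinct vCPU IDs removed so far */
    bool removed[MAX_TRACKED_VCPUS];  /* indexed by libvirt XML vCPU ID */
};

/* Call before issuing the count-based request. */
static void
downscale_begin(struct downscale_wait *w,
                unsigned int old_count, unsigned int new_count)
{
    memset(w, 0, sizeof(*w));
    w->expected = old_count > new_count ? old_count - new_count : 0;
}

static bool
downscale_converged(const struct downscale_wait *w)
{
    return w->seen >= w->expected;
}

/* Call from the vcpu-removed callback. Duplicate IDs are counted once.
 * Returns true once every expected removal has been observed. */
static bool
downscale_on_removed(struct downscale_wait *w, unsigned int vcpuid)
{
    if (vcpuid < MAX_TRACKED_VCPUS && !w->removed[vcpuid]) {
        w->removed[vcpuid] = true;
        w->seen++;
    }
    return downscale_converged(w);
}
```

Scaling from 8 to 4 vCPUs would set expected to 4, and the resize would be treated as converged only after four distinct vCPU IDs have been reported.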

Per-vCPU Completion

The new event reports libvirt vCPU IDs rather than QEMU CPU device aliases. This distinction matters for two reasons.

First, QEMU aliases are not part of the libvirt domain XML interface for vCPUs. Applications cannot define those aliases in XML, and libvirt does not expose a public XML-level way to query or specify exactly which QEMU CPU device alias should be removed. The identifier available to applications is the libvirt XML vCPU ID.

Second, some architectures can group multiple libvirt vCPUs under one QEMU hotpluggable CPU object. If such an object is removed, libvirt emits one vcpu-removed event for each libvirt vCPU ID removed from that group.

This makes the event directly usable by management software. The application does not need to translate QEMU aliases back into libvirt XML IDs on the success path. It can update its model with the same identifier it used when submitting the request.

What Happens Inside libvirt

The implementation exposes the asynchronous nature of the existing QEMU hot-unplug path.

In synchronous mode, libvirt sends QEMU device_del, waits for device removal, and then updates the live domain state. In asynchronous mode, libvirt returns after QEMU accepts the delete request:

rc = qemuDomainDeleteDevice(vm, vcpupriv->alias);
if (rc < 0) {
    ...
} else if (async_unplug) {
    return 0;
} else {
    if ((rc = qemuDomainWaitForDeviceRemoval(vm)) <= 0)
        ...
}

The existing DEVICE_DELETED handling performs the final state update and queues VIR_DOMAIN_EVENT_ID_VCPU_REMOVED. The event is also carried over libvirt’s remote protocol, so remote clients connected through libvirtd or virtqemud receive the same completion notification.

The result is a cleaner public contract without changing the fundamental QEMU requirement: vCPU unplug is complete only when QEMU reports that the device has been deleted.

Practical Behavior

The asynchronous flags apply to live vCPU unplug. They do not make config-only updates asynchronous, and they do not change vCPU plug behavior. If the live operation increases the vCPU count, the async unplug flag has no effect.
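An application can encode these rules in a small predicate that decides whether a successful API return still has to be followed by completion events. This is a minimal sketch of the rules stated above for the count-based interface; the function name and parameters are hypothetical and it does not query libvirt.

```c
#include <stdbool.h>

/* Returns true if, after the API reports success, the caller must still
 * wait for vcpu-removed events before treating the resize as complete.
 * That is the case only for a live request, with the async unplug flag
 * set, that lowers the vCPU count. */
static bool
must_wait_for_vcpu_events(bool affects_live, bool async_flag,
                          unsigned int current_vcpus,
                          unsigned int requested_vcpus)
{
    return affects_live && async_flag && requested_vcpus < current_vcpus;
}
```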

virDomainSetVcpusFlags() uses VIR_DOMAIN_VCPU_ASYNC_UNPLUG. virDomainSetVcpu() uses VIR_DOMAIN_SETVCPU_ASYNC_UNPLUG. The matching virsh commands are:

$ virsh setvcpus guest 4 --live --async
$ virsh setvcpu guest 7 --disable --live --async

For setvcpus, asynchronous unplug is not combined with guest-agent CPU changes. For setvcpu, the asynchronous path is meaningful for disable operations, not enable operations.

The main rule is that API success should be treated as “accepted”, not “completed”. Completion belongs to the event stream.

Conclusion

Asynchronous vCPU unplug makes vCPU removal fit event-driven virtualization management. It separates the act of submitting an unplug request from the policy of waiting for it, and gives applications a precise event for final success.

That separation is useful in real control planes. libvirt remains responsible for talking to QEMU and reporting authoritative completion. The application can enforce its own timeout, retry, alerting, or reconciliation policy based on the needs of the infrastructure it manages.

These changes are included in upstream libvirt from version 12.4.0, so users building from the upstream master branch can try the asynchronous unplug flow now.

References

VIR_DOMAIN_EVENT_ID_VCPU_REMOVED: vCPU removal completion event, introduced in libvirt 12.4.0.
VIR_DOMAIN_VCPU_ASYNC_UNPLUG: async flag for virDomainSetVcpusFlags(), introduced in libvirt 12.4.0.
VIR_DOMAIN_SETVCPU_ASYNC_UNPLUG: async flag for virDomainSetVcpu(), introduced in libvirt 12.4.0.
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED: existing rejected-removal event, introduced in libvirt 1.3.4.
https://blogs.oracle.com/linux/introduction-to-cpu-hotplug
https://blogs.oracle.com/linux/vcpu-hotunplug-in-libvirt

Authors

Akash Kulhalli
