Add opt-in DetachReconciler for orphan hot-plug volumes by mattia-eleuteri · Pull Request #185 · kubevirt/csi-driver

mattia-eleuteri · 2026-06-04T14:27:52Z

What this PR does / why we need it:

Even with a fully VMI-aware synchronous detach path, the CSI driver cannot catch every source of orphan hot-plug volumes:

Driver upgrades do not retroactively clean state created by older versions.
external-attacher gives up on VolumeAttachment after its force-detach-after timeout (6 min default) and removes the object without re-invoking the driver; the VMI keeps the hot-plug forever.
The VMI is mid-migration when removevolume is issued; the subresource returns HTTP 409 and the 30-second backoff inside ControllerUnpublishVolume gives up.
Downstream forks short-circuit earlier in the call graph (e.g. when the infra PVC is already gone), so upstream's ControllerUnpublishVolume is never reached.

In all four cases the VMI ends up with .status.volumeStatus[].hotplugVolume entries that no VolumeAttachment would ever reference again. The hot-plug pod stays alive and pins the infra storage device exclusively to the source host, blocking subsequent attachments.

This PR introduces an opt-in DetachReconciler that runs in the controller process alongside the gRPC server. Each cycle it:

Lists all VMIs in the infra cluster namespace.
For each VMI, fetches the parent VM (treating NotFound as "every hot-plug on this VMI is orphan").
Compares VMI.Status.VolumeStatus[*].HotplugVolume against VM.spec.template.spec.volumes. Entries present in the VMI but absent from the VM spec are candidate orphans.
Records the first observation timestamp for each candidate. Only on a second observation, separated from the first by at least the configured grace period, does the reconciler issue RemoveVolumeFromVMI. The grace period absorbs transient divergence around live migrations and normal removevolume calls.
Prunes its in-memory seen map of entries that are no longer divergent, so it does not grow unboundedly.

The reconciler is fully opt-in:

--enable-detach-reconciler (default false) — must be set to start the loop.
--detach-reconciler-sync-period (default 5m) — how often the loop runs.
--detach-reconciler-grace-period (default 5m) — minimum age of an observed divergence before acting.

No new dependencies are introduced. The reconciler uses only methods already on kubevirt.Client (ListVirtualMachines, GetWorkloadManagingVirtualMachine, RemoveVolumeFromVMI) and k8s.io/apimachinery/pkg/util/wait, which is already vendored. Logging uses klog, matching the rest of the driver.

Which issue(s) this PR fixes:

Fixes #183

Special notes for your reviewer:

A companion PR (Check VMI status when detaching, not just VM spec #184, separate and independent) makes ControllerUnpublishVolume VMI-aware so it no longer creates new orphans in the most common race (VM teardown before VMI). This reconciler is the backstop for everything that synchronous path cannot reach (legacy state, external-attacher timeouts, mid-migration detaches, downstream forks). The two PRs can be reviewed and merged in any order — neither depends on the other.
The reconciler is deliberately conservative: it never touches a hot-plug that is still in VM.spec, it requires two observations spaced by gracePeriod, and it skips VMIs with a deletion timestamp. The combination yields zero false positives against the failure modes documented in the linked issue.
Public Sync(ctx) is exposed for testing (the periodic loop in Run() is what users will rely on). Happy to switch to a package-private API + behavioural-only tests if reviewers prefer.
Tests cover: grace-period gating, no-op on legitimate hot-plug, VM-missing fan-out, terminating-VMI skip, and seen-map pruning.

Release note:

Add opt-in `--enable-detach-reconciler` flag that starts a background reconciler in the controller process. It periodically scans VMIs in the infra namespace and removes hot-plug volumes that are no longer referenced by the parent VM spec, after a configurable grace period. Default off; cadence configurable via `--detach-reconciler-sync-period` and `--detach-reconciler-grace-period`.

The companion commit makes the synchronous detach path VMI-aware, but that only fixes orphans created after the upgrade. Orphans accumulated before upgrading the driver, or created by edge cases the synchronous path still cannot reach (external-attacher giving up after force-detach timeout, VMI in mid-migration when removevolume is issued, downstream forks short-circuiting on missing infra PVC), need a backstop. This commit introduces a background reconciler that runs in the controller process and periodically scans every VMI in the infra cluster namespace, looking for hot-plugs that are no longer referenced by the parent VM spec (or whose parent VM is gone altogether). On the second observation, separated by at least gracePeriod from the first, the reconciler issues RemoveVolumeFromVMI to release the QEMU device and the hot-plug pod that pins the infra storage attachment. The reconciler is opt-in via --enable-detach-reconciler (default off) so existing deployments are unchanged. Two more flags expose the cadence: --detach-reconciler-sync-period (default 5m) and --detach-reconciler-grace-period (default 5m). The grace period absorbs transient divergence around live migrations and normal removevolume API calls; the seen-map is pruned each pass so it does not grow unboundedly. Logging uses klog, matching the rest of the driver — no new dependencies. Signed-off-by: Mattia Eleuteri <mattia@hidora.io>

kubevirt-bot · 2026-06-04T14:28:27Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign aglitke for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kubevirt-bot · 2026-06-04T14:28:28Z

Hi @mattia-eleuteri. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kvaps

Thanks for the detailed write-up, but I don't think the motivation holds up for a permanent reconciler, and I'd suggest reshaping this as a documented one-time cleanup instead.

The external-attacher bullet is not accurate: the attacher puts a finalizer on the VolumeAttachment and keeps calling ControllerUnpublishVolume with exponential backoff (capped at 5 minutes by default) until it succeeds — the VA object cannot disappear without a successful unpublish. The 6-minute timeout is the A/D controller's maxWaitForUnmountDuration force-detach: it lets the new attach proceed early (which is exactly how the DRBD failed to set source device readwrite race happens), but the old VA stays in Terminating and keeps being retried. The mid-migration 409 case converges the same way — the attacher simply retries after the migration finishes. So with #184 merged, every orphan that still has a VolumeAttachment self-heals through the normal retry path.

That leaves orphans without a VA, which only exist because older driver versions falsely returned success on unpublish (the bug #184 fixes). That is a finite legacy set, and cleaning a finite set once is a migration, not a reconciliation loop. A documented procedure covers it: list VMIs whose status.volumeStatus[].hotplugVolume entries are absent from the parent VM spec (or whose VM is gone), then virtctl removevolume <vmi> --volume-name=<vol> for each — the same subresource this reconciler calls, but as a one-time, operator-supervised step in the release notes of the version that ships #184.

If maintainers still want a standing reconciler, there is also a correctness problem to solve first: non-persistent hot-plugs (virtctl addvolume without --persist) live only in the VMI and never appear in VM.spec.template, so the "in VMI status but not in VM spec" criterion classifies every such legitimate hot-plug as an orphan and detaches it after the grace period — repeatedly, on every re-add. And since the loop scans all VMIs in the infra namespace with no ownership filter, in shared-namespace deployments (e.g. Cozystack tenant namespaces hosting both cluster node VMs and standalone user VMs) it would detach volumes from VMs that have nothing to do with this driver. At minimum it would need to filter VMIs by cluster-ownership labels and only consider volumes matching the driver's --volume-prefix.

The downstream-fork case from the description is an argument for fixing the fork (by porting #184) rather than for an upstream daemon.

awels · 2026-06-25T16:58:52Z

Note the non persistent hotplug has been deprecated, and the default behaviour is to attach to the VM with a hotplug flag, so it can be unplugged. This behavior retains the hotplugged status after reboot.

kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Jun 4, 2026

kubevirt-bot requested a review from mhenriks June 4, 2026 14:28

mattia-eleuteri mentioned this pull request Jun 4, 2026

Check VMI status when detaching, not just VM spec #184

Merged

kvaps reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add opt-in DetachReconciler for orphan hot-plug volumes#185

Add opt-in DetachReconciler for orphan hot-plug volumes#185
mattia-eleuteri wants to merge 1 commit into
kubevirt:mainfrom
mattia-eleuteri:feat/detach-reconciler

mattia-eleuteri commented Jun 4, 2026 •

edited

Loading

Uh oh!

kubevirt-bot commented Jun 4, 2026

Uh oh!

kubevirt-bot commented Jun 4, 2026

Uh oh!

kvaps left a comment

Uh oh!

awels commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

mattia-eleuteri commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kubevirt-bot commented Jun 4, 2026

Uh oh!

kubevirt-bot commented Jun 4, 2026

Uh oh!

kvaps left a comment

Choose a reason for hiding this comment

Uh oh!

awels commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mattia-eleuteri commented Jun 4, 2026 •

edited

Loading