feat(snapshot): zero-pause async memory snapshots via userfaultfd write-protect#7

Draft
meAmitPatil wants to merge 11 commits into main from amit/async-memory-snapshot
meAmitPatil commented Mar 24, 2026

Summary

Reduces VM pause time during memory snapshots from 355ms to 7ms for diff snapshots by writing memory to disk in the background while the VM continues running.

How it works

Uses Linux userfaultfd write-protect mode. When an async snapshot is requested:

  1. Pause VM — get dirty bitmap from KVM, write-protect dirty pages
  2. Resume VM immediately — background thread starts writing pages to disk
  3. If VM writes to a protected page — COW handler saves the old data first, then lets the write proceed
  4. Background writer finishes — uses saved COW data for pages the VM modified, reads directly from memory for the rest
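The four steps above can be modelled in a few lines. This is a single-threaded toy sketch, not the PR's implementation: it does not call userfaultfd or KVM, and the names `AsyncSnapshot`, `begin`, `guest_write`, and `write_out` are hypothetical. It only shows why the snapshot stays consistent with the moment of the pause even though the guest keeps writing.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncSnapshot:
    """Toy model of a write-protect snapshot (hypothetical, no real uffd)."""
    memory: list                                   # live guest memory, one bytes object per page
    protected: set = field(default_factory=set)    # pages still write-protected
    cow: dict = field(default_factory=dict)        # page index -> contents saved at fault time

    def begin(self, dirty_pages):
        # Step 1 (VM paused): write-protect every dirty page.
        # Step 2 happens when this returns: the VM resumes immediately.
        self.protected = set(dirty_pages)
        self.cow = {}

    def guest_write(self, page, data):
        # Step 3: a write to a protected page saves the old data first,
        # drops the protection, then lets the write proceed.
        if page in self.protected:
            self.cow[page] = self.memory[page]
            self.protected.discard(page)
        self.memory[page] = data

    def write_out(self, dirty_pages):
        # Step 4: the background writer prefers the saved COW copy and
        # reads live memory only for pages the guest never touched.
        image = {}
        for page in dirty_pages:
            image[page] = self.cow.get(page, self.memory[page])
            self.protected.discard(page)
        return image


snap = AsyncSnapshot(memory=[b"A", b"B", b"C", b"D"])
dirty = [0, 2, 3]
snap.begin(dirty)
snap.guest_write(2, b"X")   # protected page: old b"C" is saved first
snap.guest_write(1, b"Y")   # page 1 was not dirty, so no copy is made
image = snap.write_out(dirty)
```

After this runs, `image` holds the page contents as they were at `begin`, while `snap.memory` reflects the guest's later writes. A real implementation also has to handle the race between the fault handler and the background writer, which this model sidesteps by being single-threaded.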

The VM only pauses long enough to set up page protection (~114ms first time, ~7ms for subsequent diffs). The actual I/O happens in the background.

This is the same technique QEMU uses for live VM snapshots and CRIU uses for live container checkpointing.

Benchmarks (bare metal, 128MB guest)

| Operation | Time |
|---|---|
| Sync snapshot (blocking) | 355ms |
| Async snapshot (VM can resume) | 114ms |
| Diff snapshot (agent tool-call loop) | 7-8ms |
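For a sense of scale, the ratios implied by these figures can be worked out directly. The snippet below only does arithmetic on the numbers reported above; the variable names are illustrative.

```python
sync_ms = 355          # sync snapshot (blocking)
async_full_ms = 114    # async snapshot, first (full) snapshot
diff_ms = (7, 8)       # diff snapshot, reported range

full_ratio = sync_ms / async_full_ms
diff_ratio_low = sync_ms / diff_ms[1]    # using the slower 8ms figure
diff_ratio_high = sync_ms / diff_ms[0]   # using the faster 7ms figure

print(f"full async vs sync: {full_ratio:.1f}x")                          # 3.1x
print(f"diff vs sync: {diff_ratio_low:.1f}x to {diff_ratio_high:.1f}x")  # 44.4x to 50.7x
```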

Testing

E2E test suite validates 7 production scenarios — 20/20 passed on bare metal:

  • Async snapshot create + complete
  • Restore from async snapshot
  • Agent tool-call loop (full → diff → diff)
  • Concurrent snapshot guard
  • VM continues after immediate resume
  • Sync backward compatibility
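The PR text does not say how the concurrent snapshot guard is implemented. As a rough illustration of what such a test scenario exercises, here is a minimal sketch assuming a simple in-progress flag protected by a lock; `SnapshotGuard`, `try_begin`, and `finish` are hypothetical names.

```python
import threading


class SnapshotGuard:
    """Allows at most one async snapshot at a time (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = False

    def try_begin(self):
        # Returns True if the caller may start a snapshot, False if one
        # is already running in the background.
        with self._lock:
            if self._in_progress:
                return False
            self._in_progress = True
            return True

    def finish(self):
        # Called by the background writer once all pages are on disk.
        with self._lock:
            self._in_progress = False


guard = SnapshotGuard()
first = guard.try_begin()    # accepted: nothing is running
second = guard.try_begin()   # rejected: first snapshot still in progress
guard.finish()
third = guard.try_begin()    # accepted again after the first completes
```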

…cleanup, snapshot_type handling

Signed-off-by: Amit Patil <[email protected]>
…tion, pre-allocated vectors

Signed-off-by: Amit Patil <[email protected]>
meAmitPatil changed the title from "feat(snapshot): async background memory snapshot with userfaultfd write-protect" to "feat(snapshot): zero-pause async memory snapshots via userfaultfd write-protect" on Mar 27, 2026