
fix: prevent short-lived pod loss on failed snapshot export#178

Open
peatey wants to merge 3 commits into develop from fix/short-lived-pod-ack

peatey (Contributor) commented Apr 13, 2026

Summary

  • change short-lived pod reads to two-phase semantics by returning a copy without clearing the buffer
  • add explicit short-lived pod acknowledgment to clear the buffer only after successful export
  • update exporter flow to acknowledge short-lived pods only when all emitters succeed
  • add/update tests for buffer acknowledgment behavior and exporter success/failure handling
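The two-phase read/acknowledge semantics can be sketched as a small buffer type. This is a hypothetical illustration, not the code in pkg/cluster/dynamic.go: the `Pod` and `shortLivedBuffer` types and the UID keying are assumptions. The PR describes acknowledgment as clearing the buffer. This sketch instead removes only the pods that were actually exported, an assumed variant that keeps pods buffered between the read and the acknowledgment from being dropped.

```go
package main

import "sync"

// Pod is a hypothetical stand-in for the pod record the cluster cache keeps.
type Pod struct {
	UID  string
	Name string
}

// shortLivedBuffer is a hypothetical two-phase buffer: reads return a copy
// and never clear; entries are removed only on explicit acknowledgment.
type shortLivedBuffer struct {
	mu   sync.Mutex
	pods map[string]Pod // keyed by pod UID (assumption)
}

func newShortLivedBuffer() *shortLivedBuffer {
	return &shortLivedBuffer{pods: make(map[string]Pod)}
}

// Add records a pod that terminated before the next snapshot export.
func (b *shortLivedBuffer) Add(p Pod) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pods[p.UID] = p
}

// GetAllShortLivedPods returns a copy of the buffered pods without clearing
// them, so a failed export can read the same pods again on the next cycle.
func (b *shortLivedBuffer) GetAllShortLivedPods() []Pod {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]Pod, 0, len(b.pods))
	for _, p := range b.pods {
		out = append(out, p)
	}
	return out
}

// AcknowledgeShortLivedPods removes only the pods that were exported.
// The parameter is an assumption; pods added after the read stay buffered.
func (b *shortLivedBuffer) AcknowledgeShortLivedPods(exported []Pod) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, p := range exported {
		delete(b.pods, p.UID)
	}
}
```

Returning a copy under the lock means callers can iterate the result while new pods are being added, without sharing the underlying map.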

Validation

  • go test ./pkg/emitter ./internal/mocks
  • pkg/cluster envtest suite could not run in this environment because /usr/local/kubebuilder/bin/etcd is missing

Copilot AI left a comment

Pull request overview

This PR updates short-lived pod buffering semantics to avoid losing pods on failed snapshot exports by introducing an explicit acknowledgment step and adjusting exporter behavior/tests accordingly.

Changes:

  • Change GetAllShortLivedPods to return a copy (no buffer clearing on read) and add AcknowledgeShortLivedPods to explicitly clear the buffer.
  • Update exporter flow to acknowledge short-lived pods only after all emitters succeed.
  • Add/update tests covering repeated reads, acknowledgment behavior, and exporter success/failure handling.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Summary per file

  • pkg/emitter/exporter.go: Adds “ack after successful emission” logic to the exporter loop.
  • pkg/emitter/exporter_test.go: Adds tests asserting that acknowledgment happens only on successful emission.
  • pkg/cluster/dynamic.go: Implements two-phase short-lived pod semantics (copy on read + explicit acknowledge).
  • pkg/cluster/dynamic_test.go: Updates the envtest to validate that the buffer is not cleared until acknowledgment.
  • pkg/cluster/clustercache.go: Extends the ClusterCache interface with AcknowledgeShortLivedPods.
  • internal/mocks/mocks.go: Updates the mock ClusterCache to implement AcknowledgeShortLivedPods.

Review comment threads: pkg/emitter/exporter.go (3, one marked outdated), pkg/emitter/exporter_test.go (1), internal/mocks/mocks.go (1)