
fix: persist snapshot state across restarts to prevent incomplete win… #180

Draft
peatey wants to merge 3 commits into develop from fix/snapshot-state-persistence

peatey (Contributor) commented Apr 13, 2026

Problem

In-memory state loss on restart: the snapshot provider keeps its lastSnapshot timestamp only in memory, so the first export cycle after a restart feeds the allocation/asset pipelines with data from an incomplete window, potentially causing double-counting or gaps in the Kubecost aggregator.

Solution

Persist the lastSnapshot timestamp to a state file in the scratch directory ({SCRATCH_DIR}/snapshot-state.json) so the snapshot provider can recover its window state across restarts.

Changes

- Added a SnapshotState struct to persist the lastSnapshot timestamp.
- Implemented persistState() and recoverState() methods in ConcurrentSnapshotProvider.
- Added ScratchDir to SnapshotConfig (read from the SCRATCH_DIR env var, defaulting to /opt/finops-agent).
- Modified provider initialization to recover persisted state.
- Added state persistence after a successful export in the exporter loop.
- Added a PersistState() method to the SnapshotProvider interface.
- Updated test mocks to implement the new interface method.
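The ScratchDir resolution could look roughly like the following sketch. The helper name `scratchDirFromEnv` is hypothetical, and treating an empty SCRATCH_DIR the same as an unset one is an assumption. Injecting the lookup function (normally `os.LookupEnv`) keeps the helper testable without mutating the process environment:

```go
package main

import (
	"fmt"
	"os"
)

// defaultScratchDir is the fallback named in the PR description.
const defaultScratchDir = "/opt/finops-agent"

// SnapshotConfig is trimmed to the one field this sketch needs.
type SnapshotConfig struct {
	ScratchDir string
}

// scratchDirFromEnv (hypothetical) returns SCRATCH_DIR when it is set and
// non-empty, otherwise the default. lookup has the signature of os.LookupEnv.
func scratchDirFromEnv(lookup func(string) (string, bool)) string {
	if v, ok := lookup("SCRATCH_DIR"); ok && v != "" {
		return v
	}
	return defaultScratchDir
}

func main() {
	cfg := SnapshotConfig{ScratchDir: scratchDirFromEnv(os.LookupEnv)}
	fmt.Println("snapshot state dir:", cfg.ScratchDir)
}
```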

Behavior

- Cold start (state file missing or corrupt): the provider restricts the export to the current window only and logs a warning.
- Restart with valid state: the provider recovers the lastSnapshot timestamp and continues from the correct window.
- State is persisted only after a successful export, so a crash between snapshot and export cannot record an invalid timestamp.

…dow exports

- Add SnapshotState struct to persist lastSnapshot timestamp
- Implement persistState() and recoverState() methods in ConcurrentSnapshotProvider
- Add ScratchDir configuration to SnapshotConfig (from SCRATCH_DIR env var)
- Recover persisted state on provider initialization
- Persist state after successful export in exporter loop
- Add PersistState() method to SnapshotProvider interface
- Update test mocks to implement new interface method

This fixes the in-memory state loss issue where the first export after
restart would feed allocation/asset pipelines with incomplete window data.
State is persisted to {SCRATCH_DIR}/snapshot-state.json after successful
export, ensuring window continuity across restarts.
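A sketch of the interface change and a matching test double. Only `PersistState()` comes from the PR; its `error` return type, the mock's fields, and the trimmed-down interface (the real one has other methods not listed here) are assumptions. The `var _ SnapshotProvider = ...` line makes the build fail if a mock falls out of sync with the interface, which is the breakage the "Update test mocks" bullet addresses:

```go
package main

import (
	"errors"
	"fmt"
)

// SnapshotProvider is trimmed to the method this PR adds; returning error
// is an assumption.
type SnapshotProvider interface {
	PersistState() error
}

// mockSnapshotProvider is a hypothetical test double that records calls so a
// test can assert when the exporter persisted state.
type mockSnapshotProvider struct {
	persistCalls int
	persistErr   error
}

// Compile-time check: the build fails if the mock stops satisfying the interface.
var _ SnapshotProvider = (*mockSnapshotProvider)(nil)

func (m *mockSnapshotProvider) PersistState() error {
	m.persistCalls++
	return m.persistErr
}

func main() {
	m := &mockSnapshotProvider{persistErr: errors.New("disk full")}
	var p SnapshotProvider = m
	err := p.PersistState()
	fmt.Println(err, m.persistCalls) // disk full 1
}
```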
…dow exports

- Add SnapshotState struct to persist lastSnapshot timestamp
- Implement persistState() and recoverState() methods in ConcurrentSnapshotProvider
- Add ScratchDir configuration to SnapshotConfig (from SCRATCH_DIR env var)
- Recover persisted state on provider initialization
- Track emit errors and only persist state after successful export
- Add PersistState() method to SnapshotProvider interface
- Update test mocks to implement new interface method
- Add comprehensive unit tests for state persistence and recovery
- Extract state filename as constant

This fixes the in-memory state loss issue where the first export after
restart would feed allocation/asset pipelines with incomplete window data.
State is persisted to {SCRATCH_DIR}/snapshot-state.json only after all
emitters succeed, ensuring window continuity across restarts.

Copilot AI left a comment

Pull request overview

This PR addresses snapshot window continuity across agent restarts by persisting and recovering the snapshot provider’s lastSnapshot timestamp from disk, preventing post-restart exports from using an incomplete window.

Changes:

  • Add ScratchDir to SnapshotConfig and populate it from SCRATCH_DIR (defaulting to /opt/finops-agent).
  • Persist/recover lastSnapshot via a JSON state file in ConcurrentSnapshotProvider.
  • Persist snapshot state after each exporter cycle and update test mocks for the expanded SnapshotProvider interface.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • pkg/emitter/snapshotconfig.go: adds ScratchDir config and env/default resolution for the persistence location.
  • pkg/emitter/snapshot.go: introduces the persisted state model and persist/recover logic, and extends SnapshotProvider with PersistState().
  • pkg/emitter/exporter.go: persists snapshot state after each export loop iteration completes.
  • pkg/emitter/exporter_test.go: updates the test snapshot provider mock to satisfy the new interface.


Review comment threads: pkg/emitter/snapshot.go (3), pkg/emitter/exporter.go (outdated), pkg/emitter/snapshotconfig.go (outdated)
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
peatey marked this pull request as draft April 13, 2026 19:28

peatey (Contributor, Author) commented Apr 13, 2026

parking this for the moment


2 participants