fix: persist snapshot state across restarts to prevent incomplete win…#180
Draft
peatey wants to merge 3 commits into
Draft
fix: persist snapshot state across restarts to prevent incomplete win…#180peatey wants to merge 3 commits into
peatey wants to merge 3 commits into
Conversation
…dow exports
- Add SnapshotState struct to persist lastSnapshot timestamp
- Implement persistState() and recoverState() methods in ConcurrentSnapshotProvider
- Add ScratchDir configuration to SnapshotConfig (from SCRATCH_DIR env var)
- Recover persisted state on provider initialization
- Persist state after successful export in exporter loop
- Add PersistState() method to SnapshotProvider interface
- Update test mocks to implement new interface method
This fixes the in-memory state loss issue where the first export after
restart would feed allocation/asset pipelines with incomplete window data.
State is persisted to {SCRATCH_DIR}/snapshot-state.json after successful
export, ensuring window continuity across restarts.
…dow exports
- Add SnapshotState struct to persist lastSnapshot timestamp
- Implement persistState() and recoverState() methods in ConcurrentSnapshotProvider
- Add ScratchDir configuration to SnapshotConfig (from SCRATCH_DIR env var)
- Recover persisted state on provider initialization
- Track emit errors and only persist state after successful export
- Add PersistState() method to SnapshotProvider interface
- Update test mocks to implement new interface method
- Add comprehensive unit tests for state persistence and recovery
- Extract state filename as constant
This fixes the in-memory state loss issue where the first export after
restart would feed allocation/asset pipelines with incomplete window data.
State is persisted to {SCRATCH_DIR}/snapshot-state.json only after all
emitters succeed, ensuring window continuity across restarts.
There was a problem hiding this comment.
Pull request overview
This PR addresses snapshot window continuity across agent restarts by persisting and recovering the snapshot provider’s lastSnapshot timestamp from disk, preventing post-restart exports from using an incomplete window.
Changes:
- Add
ScratchDirtoSnapshotConfigand populate it fromSCRATCH_DIR(defaulting to/opt/finops-agent). - Persist/recover
lastSnapshotvia a JSON state file inConcurrentSnapshotProvider. - Persist snapshot state after each exporter cycle and update test mocks for the expanded
SnapshotProviderinterface.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pkg/emitter/snapshotconfig.go | Adds ScratchDir config and env/default resolution for persistence location. |
| pkg/emitter/snapshot.go | Introduces persisted state model + persist/recover logic and extends SnapshotProvider with PersistState(). |
| pkg/emitter/exporter.go | Persists snapshot state after each export loop iteration completes. |
| pkg/emitter/exporter_test.go | Updates the test snapshot provider mock to satisfy the new interface. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Contributor
Author
|
parking this for the moment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
In-Memory State Loss on Restart — The first post-restart export cycle feeds the allocation/asset pipelines with data from an incomplete window, potentially causing double-counting or gaps in the Kubecost aggregator.
Solution
Persist the lastSnapshot timestamp to the scratch directory so the snapshot provider can recover its window state across restarts.
Changes
Added SnapshotState struct to persist lastSnapshot timestamp. Implemented persistState() and recoverState() methods in ConcurrentSnapshotProvider. Added ScratchDir configuration to SnapshotConfig (from SCRATCH_DIR env var, defaults to /opt/finops-agent). Modified provider initialization to recover persisted state. Added state persistence after successful export in exporter loop. Updated SnapshotProvider interface with PersistState() method. Updated test mocks to implement new interface method.
Behavior
Cold start (missing/corrupt state file) restricts to current window only and logs warning. Restart with valid state recovers lastSnapshot timestamp and continues from correct window. State persistence only after successful export to ensure consistency (crash between snapshot and export won't record invalid timestamp).