Skip historical sweep when collectors resume after a gap (closes #892) #904
Merged
erikdarlingdata merged 1 commit into dev on Apr 28, 2026
When the Off preset, an Agent stoppage, or a server reboot pauses collection for hours, the next run of query_stats / procedure_stats / query_store would dump everything that accumulated during the gap into their delta tables in one go. On query_stats specifically (issue #885), that was enough to blow tempdb overnight.

Each of the three procs now reads MAX(config.collection_log.collection_time) for its own collector_name (where status = SUCCESS) right after computing the normal cutoff. If the gap to now exceeds 5x the configured frequency (or 30 minutes, whichever is larger), it clamps the cutoff to SYSDATETIME() so only forward-going data is collected on the resume run. NULL/0 frequency_minutes safely floors to 30 minutes.

XE-backed collectors (blocked_process_xml, deadlock_xml, system_health, default_trace, trace_analysis) are bounded by their own @minutes_back / @hours_back parameters and don't have the catch-up problem, so they're left alone. Snapshot collectors (wait_stats, file_io_stats, etc.) insert one row per run regardless of gap and were never at risk.

Verified on sql2016/2017/2019/2022/2025: all three procs deploy cleanly, heuristic fires on a 3-hour synthetic gap, stays quiet on normal runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
After an Off preset, Agent stoppage, or server reboot, the first run of `query_stats` / `procedure_stats` / `query_store` would historically sweep the entire backlog of cumulative state into the delta tables in one shot. On `query_stats` that was enough to blow tempdb overnight (the original pain in #885).
These three collectors now do a lightweight check after computing their normal cutoff: if `MAX(config.collection_log.collection_time)` for their own `collector_name` (status `SUCCESS`) is older than 5× the configured `frequency_minutes` (floored at 30 min), the cutoff is clamped to `SYSDATETIME()` so only forward-going data is collected on the resume run.
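A minimal sketch of what that check might look like, not the merged code. Only `config.collection_log` and its `collection_time`, `collector_name`, and `status` columns come from the description above; every variable name, the literal values, and the choice to leave the cutoff alone when no successful run has been logged are assumptions for illustration.

```sql
/* Hypothetical variables; the real procedures compute these themselves. */
DECLARE
    @collector_name sysname = N'query_stats',
    @threshold_minutes integer = 75, /* assumed: 5 x a 15-minute frequency */
    @cutoff_time datetime2(7) =
        DATEADD(MINUTE, -15, SYSDATETIME()), /* stand-in for the normal cutoff */
    @last_success datetime2(7);

/* Most recent successful run for this collector only */
SELECT
    @last_success = MAX(cl.collection_time)
FROM config.collection_log AS cl
WHERE cl.collector_name = @collector_name
AND   cl.status = N'SUCCESS';

/*
Resuming after a long gap: move the cutoff to now so the run
collects forward-going data only.
Assumption: with no prior successful run, the cutoff is left as-is.
*/
IF  @last_success IS NOT NULL
AND DATEDIFF(MINUTE, @last_success, SYSDATETIME()) > @threshold_minutes
BEGIN
    SET @cutoff_time = SYSDATETIME();
END;

SELECT
    last_success = @last_success,
    cutoff_time = @cutoff_time;
```

On a normal cycle the `IF` is skipped and the normal cutoff is used unchanged. With the assumed 75-minute threshold, a 3-hour gap like the synthetic one used in testing would trip the clamp.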
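What's NOT touched
- XE-backed collectors (`blocked_process_xml`, `deadlock_xml`, `system_health`, `default_trace`, `trace_analysis`): bounded by their own `@minutes_back` / `@hours_back` parameters, so they don't have the catch-up problem.
- Snapshot collectors (`wait_stats`, `file_io_stats`, etc.): insert one row per run regardless of gap and were never at risk.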
Threshold logic
```
threshold_minutes = MAX(@frequency_minutes * 5, 30)
-- with NULL/0 frequency safely defaulting to 30
```
The 5× multiplier leaves breathing room for slipped cycles; the 30-minute floor keeps 1-minute collectors from false-firing on Agent restarts.
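The two-argument `MAX` above reads as pseudocode, since T-SQL's `MAX` is an aggregate. One way to express the same rule so it runs on SQL Server 2016 and later is a `CASE` expression. The query below is an illustrative sketch with made-up sample frequencies, not the code from the PR:

```sql
/* Sample frequencies are hypothetical; the rule is 5x, floored at 30. */
SELECT
    f.frequency_minutes,
    threshold_minutes =
        CASE
            WHEN ISNULL(f.frequency_minutes, 0) * 5 > 30
            THEN f.frequency_minutes * 5
            ELSE 30
        END
FROM
(
    VALUES
        (CONVERT(integer, NULL)), /* NULL -> 30 */
        (0),                      /* 0    -> 30 */
        (1),                      /* 5    -> floored to 30 */
        (5),                      /* 25   -> floored to 30 */
        (15),                     /* 75 */
        (60)                      /* 300 */
) AS f (frequency_minutes);
```

Any frequency of 6 minutes or less lands on the 30-minute floor; above that, the 5× multiplier takes over.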
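Test plan
- All three procs deploy cleanly on sql2016/2017/2019/2022/2025
- Heuristic fires on a 3-hour synthetic gap
- Heuristic stays quiet on normal runs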
Closes #892.