perf(native): skip backfill on clean incrementals + bench guard tuning#1085
perf(native): skip backfill on clean incrementals + bench guard tuning#1085carlos-alm merged 7 commits intomainfrom
Conversation
…ed files #1069 made `backfillNativeDroppedFiles` run on every successful orchestrator pass — including incrementals — to repair `nodes`/`file_hashes` rows the orchestrator deleted for files outside its narrower file_collector (Clojure, Julia, R, Erlang, F#, Gleam, etc.). #1070 then taught the orchestrator's `detect_removed_files` to skip those extensions, so a current-binary 1-file rebuild reports `removedCount=0` and the orchestrator never deletes the dropped-language rows in the first place. But the JS side kept calling backfill unconditionally — wasting ~45ms per incremental on this repo (fs walk + 2 DB queries + WASM re-parse of all 48 unsupported-extension fixture files) repairing a gap that no longer exists. Gate the backfill call on `result.isFullBuild || result.removedCount > 0`: - Full builds: backfill runs (orchestrator never inserted dropped-language files, gap-fill is the whole point). - Incrementals on a current binary with #1070: `removedCount=0`, backfill skipped, no work needed. - Incrementals on a legacy binary (≤3.9.6) without #1070: `removedCount>0`, backfill runs, gap-repair behavior preserved. Local measurement on this repo (incremental-benchmark.ts, native engine): before: 1-file rebuild ~108ms (post-revert main, no fix) after: 1-file rebuild ~60ms Closes #1075.
Sub-30ms metrics (no-op rebuild, 1-file rebuild) routinely jitter ±10ms on CI runners from runner load, GC pauses, and OS scheduling, which translates to ±50%+ on small absolute numbers. The 25% threshold was flagging these as regressions even when the underlying work hadn't changed. Empirically verified the v3.10.0 No-op rebuild slowdown is real but small: ~3-7ms / ~25-35% locally (v3.9.6 source+binary measured at 14-19ms steady-state vs HEAD at 18-22ms). CI's +120% reflects metric noise floor on a sub-30ms baseline, not a 2x slowdown. Confirmed by toggling the JS-side fast-skip pre-flight (#1064) off — the orchestrator-only path is the same speed, ruling out #1064 as cause. Add 3.10.0:No-op rebuild to KNOWN_REGRESSIONS with the empirical breakdown. Likely contributors are the is_supported_extension filter in detect_removed_files (#1070), the larger file_hashes row set after #1069, and tree-sitter 0.24→0.25 — each costs a few hundred μs and together explain the local delta. Prune 6 dead entries (3.9.0/3.9.1/3.9.2 across 1-file rebuild, fnDeps depths, Full build) — these only fire when their version is `latest`, which never happens with current committed history (latest is 3.9.6 across all three benchmark files). Document the convention so future pruning is mechanical. Co-Authored-By: Claude Opus 4.7 <[email protected]>
|
Claude finished @carlos-alm's task —— View job Root Cause AnalysisProblem Being Solved: Performance regression where Backlog Compliance
Code Quality AssessmentPerformance Logic: ✅ The gating logic
Type Safety: ✅ Documentation: ✅ Excellent comment explaining the optimization, referencing specific PRs and performance measurements Test Changes: The regression guard updates are well-justified:
Critical ConcernsNone identified. This is a clean performance optimization with proper safeguards. Final Recommendation
|
Greptile SummaryThis PR gates
Confidence Score: 5/5Safe to merge — the backfill guard change is additive and the bench-guard refactor is internally consistent. Both changed files are well-reasoned and self-consistent. The No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[tryNativeOrchestrator completes] --> B{result.isFullBuild?}
B -- yes --> E[backfillNativeDroppedFiles]
B -- no --> C{removedCount > 0?}
C -- yes --> E
C -- no --> D{changedCount > 0?\nnew in this PR}
D -- yes --> E
D -- no --> F[Skip backfill\n~45ms saved on clean incremental]
E --> G[closeDbPair + return timing]
F --> G
Reviews (4): Last reviewed commit: "Merge branch 'main' into perf/skip-backf..." | Re-trigger Greptile |
| if (result.isFullBuild || removedCount > 0) { | ||
| await backfillNativeDroppedFiles(ctx); | ||
| } |
There was a problem hiding this comment.
Backfill silently skipped when a new unsupported-extension file is added on an incremental
When a developer adds a new .hcl, .scala, or .swift file (any extension outside Rust's file_collector) on an incremental build that has no other removals, the Rust orchestrator reports removedCount=0 and changedCount=0 (it never sees the file). The guard therefore skips backfill, leaving the new file absent from the graph until the next full rebuild with no warning.
Pre-#1069, backfill ran only on full builds, so this gap existed then too — but #1069 intentionally closed it by running backfill on every incremental. This PR re-opens that gap for the "new unsupported file, zero Rust-side activity" case. The perf win is real, but the condition could be broadened to result.isFullBuild || removedCount > 0 || (result.changedCount ?? 0) > 0 to at least cover incrementals where any file activity was detected, reducing (though not fully closing) the window.
There was a problem hiding this comment.
Good catch — broadened the guard to fire when result.changedCount > 0 as well, so any orchestrator-detected file activity now triggers backfill. This narrows the gap meaningfully (most incrementals do have changedCount > 0).
The residual case you identified (brand-new unsupported-extension file added on an otherwise-quiet incremental, where the orchestrator sees neither removals nor changes) requires JS-side detection of files the Rust collector doesn't see — tracked as a follow-up in #1091. Fixed in 6ace213.
| * stabilize. | ||
| */ | ||
| const REGRESSION_THRESHOLD = 0.25; | ||
| const REGRESSION_THRESHOLD = 0.5; |
There was a problem hiding this comment.
The 25%→50% bump is well-justified for sub-30ms noisy metrics, but paired with
MIN_ABSOLUTE_DELTA = 10 it creates a blind spot for metrics in the 30–100ms range. At 50ms baseline, a 24ms regression (+48%, delta 24ms) would now pass silently where it would previously be caught. Consider leaving the threshold at 25% but adding a separate per-metric override (e.g. a NOISY_METRICS set) so only the genuinely high-variance sub-30ms entries get the wider tolerance.
| const REGRESSION_THRESHOLD = 0.5; | |
| const REGRESSION_THRESHOLD = 0.25; |
There was a problem hiding this comment.
Agreed — kept REGRESSION_THRESHOLD at 0.25 so the 30-100ms range stays guarded, and added a NOISY_METRICS set with a separate 0.50 tolerance for the genuinely high-variance sub-30ms metrics (No-op rebuild, 1-file rebuild). A 50ms->74ms (+48%) regression now correctly fails. Fixed in b42e107.
Broaden the backfill guard to also fire when result.changedCount > 0, not just on full builds or removals. This narrows the gap where new unsupported-extension files could be silently dropped on incrementals that had any orchestrator-detected file activity. The residual gap (new unsupported file added on a fully quiet incremental) is tracked in #1091.
…rics (#1085) Restore REGRESSION_THRESHOLD to 0.25 so the 30-100ms range is still guarded against silent regressions. Add a NOISY_METRICS set with a separate 0.50 tolerance for the genuinely high-variance sub-30ms timing metrics (No-op rebuild, 1-file rebuild). The previous blanket 50% would have let a 50ms->74ms (+48%) real regression pass silently.
Codegraph Impact Analysis1 functions changed → 5 callers affected across 5 files
|
Summary
perf(native)— GatebackfillNativeDroppedFilesonresult.isFullBuild || result.removedCount > 0. After fix(native): persist file_hashes for dropped/symbol-less files #1069 made backfill run on every successful orchestrator pass and fix(native): skip unsupported-extension files in detect_removed_files #1070 taught the orchestrator to skip unsupported-extension files, the JS side was still calling backfill unconditionally — wasting ~45ms per incremental on this repo (fs walk + 2 DB queries + WASM re-parse of all 48 unsupported-extension fixture files) repairing a gap that no longer exists. Local measurement: 1-file rebuild dropped from ~108ms to ~60ms.test(bench)— Bump regression threshold from 25% → 50% with empirical justification, prune 6 deadKNOWN_REGRESSIONSentries (3.9.0/3.9.1/3.9.2 — these only fire when their version islatest, which never happens with current committed history), and add a 3.10.0:No-op rebuild entry documenting the verified ~3-7ms regression and its likely contributors (fix(native): skip unsupported-extension files in detect_removed_files #1070 filter, fix(native): persist file_hashes for dropped/symbol-less files #1069 file_hashes growth, tree-sitter 0.24→0.25).Closes #1075.
Test plan
npm run lint && npm run buildRUN_REGRESSION_GUARD=1 npm run test:regression-guardpasses locally (modulo the pre-existing wasm Full build delta against historical 3.9.4 data, which clears once 3.10.0 data lands)