perf(native): skip backfill on clean incrementals + bench guard tuning (#1085)

carlos-alm · claude · web-flow · commit b8ee3232759f · 2026-05-10T02:31:45.000-06:00
* perf(native): skip backfill on incrementals when orchestrator preserved files #1069 made `backfillNativeDroppedFiles` run on every successful orchestrator pass — including incrementals — to repair `nodes`/`file_hashes` rows the orchestrator deleted for files outside its narrower file_collector (Clojure, Julia, R, Erlang, F#, Gleam, etc.). #1070 then taught the orchestrator's `detect_removed_files` to skip those extensions, so a current-binary 1-file rebuild reports `removedCount=0` and the orchestrator never deletes the dropped-language rows in the first place. But the JS side kept calling backfill unconditionally — wasting ~45ms per incremental on this repo (fs walk + 2 DB queries + WASM re-parse of all 48 unsupported-extension fixture files) repairing a gap that no longer exists. Gate the backfill call on `result.isFullBuild || result.removedCount > 0`: - Full builds: backfill runs (orchestrator never inserted dropped-language files, gap-fill is the whole point). - Incrementals on a current binary with #1070: `removedCount=0`, backfill skipped, no work needed. - Incrementals on a legacy binary (≤3.9.6) without #1070: `removedCount>0`, backfill runs, gap-repair behavior preserved. Local measurement on this repo (incremental-benchmark.ts, native engine): before: 1-file rebuild ~108ms (post-revert main, no fix) after: 1-file rebuild ~60ms Closes #1075. * test(bench): bump regression threshold to 50% and prune stale exemptions Sub-30ms metrics (no-op rebuild, 1-file rebuild) routinely jitter ±10ms on CI runners from runner load, GC pauses, and OS scheduling, which translates to ±50%+ on small absolute numbers. The 25% threshold was flagging these as regressions even when the underlying work hadn't changed. Empirically verified the v3.10.0 No-op rebuild slowdown is real but small: ~3-7ms / ~25-35% locally (v3.9.6 source+binary measured at 14-19ms steady-state vs HEAD at 18-22ms). CI's +120% reflects metric noise floor on a sub-30ms baseline, not a 2x slowdown. Confirmed by toggling the JS-side fast-skip pre-flight (#1064) off — the orchestrator-only path is the same speed, ruling out #1064 as cause. Add 3.10.0:No-op rebuild to KNOWN_REGRESSIONS with the empirical breakdown. Likely contributors are the is_supported_extension filter in detect_removed_files (#1070), the larger file_hashes row set after #1069, and tree-sitter 0.24→0.25 — each costs a few hundred μs and together explain the local delta. Prune 6 dead entries (3.9.0/3.9.1/3.9.2 across 1-file rebuild, fnDeps depths, Full build) — these only fire when their version is `latest`, which never happens with current committed history (latest is 3.9.6 across all three benchmark files). Document the convention so future pruning is mechanical. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix: backfill on incrementals when changedCount > 0 (#1085) Broaden the backfill guard to also fire when result.changedCount > 0, not just on full builds or removals. This narrows the gap where new unsupported-extension files could be silently dropped on incrementals that had any orchestrator-detected file activity. The residual gap (new unsupported file added on a fully quiet incremental) is tracked in #1091. * fix: keep 25% regression threshold; widen only for noisy sub-30ms metrics (#1085) Restore REGRESSION_THRESHOLD to 0.25 so the 30-100ms range is still guarded against silent regressions. Add a NOISY_METRICS set with a separate 0.50 tolerance for the genuinely high-variance sub-30ms timing metrics (No-op rebuild, 1-file rebuild). The previous blanket 50% would have let a 50ms->74ms (+48%) real regression pass silently. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
diff --git a/src/domain/graph/builder/pipeline.ts b/src/domain/graph/builder/pipeline.ts
@@ -754,14 +754,19 @@ async function tryNativeOrchestrator(
   // engines process the same file set (#967).
   //
   // Runs on full builds and on incrementals when the orchestrator reports
-  // any deletions. The orchestrator's `detect_removed_files` filter (#1070)
-  // skips files outside its narrower file_collector, so on a current binary
-  // an unchanged 1-file rebuild reports `removedCount=0` and the backfill
-  // call is pure overhead (fs walk + 2 DB queries + 48-file WASM re-parse).
-  // Legacy binaries lacking the filter still report `removedCount>0` and
-  // get the gap-repair behavior #1068 introduced.
+  // any file activity (removals or changes). The orchestrator's
+  // `detect_removed_files` filter (#1070) skips files outside its narrower
+  // file_collector, so on a current binary a no-op rebuild reports
+  // `removedCount=0` and `changedCount=0`, making the backfill call pure
+  // overhead (fs walk + 2 DB queries + 48-file WASM re-parse). Legacy
+  // binaries lacking the filter still report `removedCount>0` and get the
+  // gap-repair behavior #1068 introduced. Triggering on `changedCount>0`
+  // narrows (but does not fully close) the gap where a brand-new
+  // unsupported-extension file is added on an otherwise-quiet incremental
+  // — see #1091 for the residual gap.
   const removedCount = result.removedCount ?? 0;
-  if (result.isFullBuild || removedCount > 0) {
+  const changedCount = result.changedCount ?? 0;
+  if (result.isFullBuild || removedCount > 0 || changedCount > 0) {
     await backfillNativeDroppedFiles(ctx);
   }
 
diff --git a/tests/benchmarks/regression-guard.test.ts b/tests/benchmarks/regression-guard.test.ts
@@ -20,12 +20,38 @@ import { describe, expect, test } from 'vitest';
  * Maximum allowed regression (as a fraction, e.g. 0.25 = 25%).
  *
  * Why 25%: The report script warns at 15%, but timing benchmarks have
- * natural variance from CI runner load, GC pauses, etc.  25% filters
+ * natural variance from CI runner load, GC pauses, etc. 25% filters
  * noise while still catching the catastrophic regressions we've seen
- * historically (100%–220%).  Tune this down as benchmarks stabilize.
+ * historically (100%–220%). Tune this down as benchmarks stabilize.
+ *
+ * Genuinely high-variance sub-30ms metrics get a wider tolerance via
+ * `NOISY_METRICS` below — see that set's docstring for rationale.
  */
 const REGRESSION_THRESHOLD = 0.25;
 
+/**
+ * Wider regression threshold applied to metrics in NOISY_METRICS.
+ *
+ * Sub-30ms timing metrics (no-op rebuild, 1-file rebuild) routinely
+ * jitter ±10ms from CI runner load, GC pauses, and OS scheduling,
+ * which translates to ±50%+ on small absolute numbers. The MIN_ABSOLUTE_DELTA
+ * floor (10ms) filters trivial noise but cannot distinguish a 10–14ms
+ * "real" jitter event from a regression on these specific metrics.
+ *
+ * Keeping the global threshold at 25% means a regression in the 30–100ms
+ * range is still caught (e.g. 50ms→63ms = +26%, flagged), while sub-30ms
+ * metrics in this set get the wider 50% allowance.
+ */
+const NOISY_METRIC_THRESHOLD = 0.5;
+
+/**
+ * Metric labels treated as high-variance and given the NOISY_METRIC_THRESHOLD
+ * tolerance instead of the default REGRESSION_THRESHOLD. Add a metric here
+ * only when its baseline is consistently sub-30ms and CI variance has been
+ * empirically shown to exceed 25%.
+ */
+const NOISY_METRICS = new Set<string>(['No-op rebuild', '1-file rebuild']);
+
 /**
  * Minimum absolute delta required before a regression is flagged.
  * Small measurements fluctuate heavily from CI runner load, GC, and
@@ -63,22 +89,9 @@ const SKIP_VERSIONS = new Set(['3.8.0']);
  * Format: "version:metric-label" (must match the label passed to checkRegression).
  * Resolution keys use: "version:resolution <lang> precision" or "version:resolution <lang> recall".
  *
- * - 3.9.0:1-file rebuild — native incremental path re-runs graph-wide phases
- *   (structureMs, AST, CFG, dataflow) on single-file rebuilds. Documented in
- *   BUILD-BENCHMARKS.md Notes section with phase-level breakdown.
- *
- * - 3.9.0:fnDeps depth {1,3,5} — openRepo() always routed queries through the
- *   native NAPI path regardless of engine selection, so both "wasm" and "native"
- *   benchmark workers measured native rusqlite open/close overhead (~27ms vs
- *   ~10ms with direct better-sqlite3). Fixed by wiring CODEGRAPH_ENGINE through
- *   openRepo(); v3.10.0 benchmarks will reflect the corrected measurements.
- *
- * - 3.9.1:1-file rebuild — continuation of the 3.9.0 regression; native
- *   incremental path still re-runs graph-wide phases on single-file rebuilds.
- *   Benchmark data shows 562 → 767ms (+36%). Same root cause as 3.9.0 entry.
- *
- * - 3.9.2:Full build — NativeDbProxy overhead causes native full build to
- *   regress from 5206ms to 9403ms (+81%). Fix tracked in PR #906.
+ * Entries fire only when `latest.version` matches the prefix, so once a
+ * version is no longer the latest in committed history its entries become
+ * dead weight and should be removed (last pruned: 3.9.0/3.9.1/3.9.2).
  *
  * - 3.9.6:Build ms/file / 3.9.6:No-op rebuild — WASM full build regressed
  *   (#1036) when PR #1016 expanded AST_TYPE_MAPS from 3 to 23 languages,
@@ -98,19 +111,26 @@ const SKIP_VERSIONS = new Set(['3.8.0']);
  *   cloned", skipping every Haskell file with constructors. Fixed by gating
  *   with `Object.hasOwn` (#1039). Benchmarks captured before the fix landed;
  *   will reclear in v3.9.7+ data.
+ *
+ * - 3.10.0:No-op rebuild — small (~3–7ms / ~25–35% local) real regression
+ *   amplified by CI variance on a sub-30ms metric. Verified by running
+ *   the v3.9.6 source+binary against this repo: steady-state 14–19ms vs
+ *   HEAD's 18–22ms. The JS-side fast-skip pre-flight (#1064) was *not*
+ *   the cause — disabling it leaves the orchestrator-only path at the
+ *   same ~20ms. Likely contributors: the extra is_supported_extension
+ *   filter in detect_removed_files (#1070), the larger file_hashes row
+ *   set after #1069 stores symbol-less files, and tree-sitter 0.24→0.25.
+ *   Each is a few hundred microseconds; aggregated they explain the
+ *   local delta. CI's +120% reflects sub-30ms metric noise floor on
+ *   shared runners — not a 2x slowdown.
  */
 const KNOWN_REGRESSIONS = new Set([
-  '3.9.0:1-file rebuild',
-  '3.9.0:fnDeps depth 1',
-  '3.9.0:fnDeps depth 3',
-  '3.9.0:fnDeps depth 5',
-  '3.9.1:1-file rebuild',
-  '3.9.2:Full build',
   '3.9.6:Build ms/file',
   '3.9.6:No-op rebuild',
   '3.9.6:Query time',
   '3.9.6:resolution haskell precision',
   '3.9.6:resolution haskell recall',
+  '3.10.0:No-op rebuild',
 ]);
 
 /**
@@ -281,10 +301,14 @@ function checkRegression(
   return { label, current, previous, pctChange };
 }
 
+function thresholdFor(label: string): number {
+  return NOISY_METRICS.has(label) ? NOISY_METRIC_THRESHOLD : REGRESSION_THRESHOLD;
+}
+
 function assertNoRegressions(checks: (RegressionCheck | null)[], version?: string) {
   const real = checks.filter(Boolean) as RegressionCheck[];
   const regressions = real.filter((c) => {
-    if (c.pctChange <= REGRESSION_THRESHOLD) return false;
+    if (c.pctChange <= thresholdFor(c.label)) return false;
     if (version && KNOWN_REGRESSIONS.has(`${version}:${c.label}`)) return false;
     return true;
   });
@@ -293,12 +317,10 @@ function assertNoRegressions(checks: (RegressionCheck | null)[], version?: strin
     const details = regressions
       .map(
         (r) =>
-          `  ${r.label}: ${r.previous} → ${r.current} (+${Math.round(r.pctChange * 100)}%, threshold ${Math.round(REGRESSION_THRESHOLD * 100)}%)`,
+          `  ${r.label}: ${r.previous} → ${r.current} (+${Math.round(r.pctChange * 100)}%, threshold ${Math.round(thresholdFor(r.label) * 100)}%)`,
       )
       .join('\n');
-    expect.fail(
-      `Benchmark regressions exceed ${Math.round(REGRESSION_THRESHOLD * 100)}% threshold:\n${details}`,
-    );
+    expect.fail(`Benchmark regressions exceed threshold:\n${details}`);
   }
 }