fix(native): js-side fast-skip for incremental no-op rebuilds (#1054) by carlos-alm · Pull Request #1064 · optave/ops-codegraph-tool

carlos-alm · 2026-05-04T23:40:17Z

Summary

The Rust orchestrator's internal early-exit fires reliably locally but not in CI, where every native no-op rebuild was paying the full ~2s pipeline cost (parse, ast, cfg, dataflow, edges all re-running). WASM exits in ~20ms via detectChanges before any work happens.

This adds a JS-side equivalent: a read-only Tier-0/Tier-1 (mtime+size) check before invoking the native orchestrator. When every collected file matches file_hashes, skip the orchestrator entirely. Tier-2 hashing stays on the native side — any mismatch falls through and lets Rust's detect_changes remain the source of truth.

New detectNoChanges() helper in stages/detect-changes.ts — pure read of DB + filesystem, never mutates either
Wired into buildGraph() immediately after setupPipeline, gated on nativeAvailable && engineName === 'native' && incremental && !forceFullRebuild && !opts.scope
On match: writes journal header, closes DB, returns (matches WASM's early-exit semantics)
On mismatch / any error: falls through to the existing native orchestrator

Benchmark (this repo, 744 files)

	full	noop	1-file
native (before)	~2200ms	~2125ms	~60ms
native (after)	2194ms	22ms	59ms
wasm	7664ms	23ms	60ms

Native no-op now matches WASM's ~20ms early-exit.

Test plan

npm run build passes
biome check clean on changed files
scripts/incremental-benchmark.ts shows native noop ≈ wasm noop (~22ms)
CI bench gate confirms regression is gone

Follow-up

#1062 (the diagnostic timing-trace branch) can be closed unmerged once this lands.

Closes #1054

The Rust orchestrator's internal early-exit fires reliably locally but not in CI, where every native no-op rebuild was paying the full ~2s pipeline cost (parse, ast, cfg, dataflow, edges all re-running). WASM exits in ~20ms via detectChanges before any work happens. Mirror that behavior at the JS layer: a read-only Tier-0/Tier-1 (mtime+size) check before invoking the native orchestrator. When every collected file matches file_hashes, skip the orchestrator entirely. Tier-2 hashing stays on the native side — any mismatch falls through and lets Rust's detect_changes remain the source of truth. Benchmark on this repo (744 files): native noopRebuildMs: 2125ms → 22ms (matches WASM's 23ms) Closes #1054

claude · 2026-05-04T23:40:34Z

Claude finished @carlos-alm's task —— View job

PR Review Complete ✅

Reviewing PR #1064: fix(native): js-side fast-skip for incremental no-op rebuilds

Todo List:

Phase 1: Root Cause Analysis
Phase 2: Configuration Change Detection
Phase 3: Test Integrity Check
Phase 4: Code Quality Review
Phase 5: Backlog Compliance Assessment
Final Assessment & Recommendation

Root Cause Analysis

Problem Being Solved: The Rust native orchestrator's internal early-exit for no-op rebuilds was reliable locally but failing in CI, causing every native incremental rebuild to pay the full ~2s pipeline cost instead of exiting in ~20ms like WASM.

Why This Approach: Adds a JS-side pre-flight check using Tier-0/Tier-1 detection (mtime+size) before invoking the native orchestrator. This matches WASM's early-exit semantics and allows skipping the expensive orchestrator call entirely when no files have changed.

Risk Assessment: Low risk - this is a pure optimization that falls back gracefully to the existing native orchestrator on any failure or mismatch.

Backlog Compliance

Zero-dep: ✓ — Uses only existing tree-sitter/SQLite/filesystem APIs
Foundation-aligned: ✓ — Pure performance optimization, maintains always-current graph (P1) and zero-cost core (P4)
Problem-fit: 3 — Performance improvement for build tooling, tangentially helps agent responsiveness but doesn't directly address core "lost AI" problem
Breaking: No — Purely additive optimization with identical external behavior
Tier: 1

Critical Concerns

None identified. This is a well-implemented performance optimization.

Final Recommendation

Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE
Reasoning: Excellent implementation quality, clear problem/solution fit, comprehensive error handling, and proper fallback behavior. Achieves stated performance goals (native no-op ~2125ms → ~22ms) with minimal risk. The conservative Tier-0/Tier-1 approach correctly defers to the native orchestrator as source of truth for complex cases.

Minor Suggestion: Consider adding a test specifically for detectNoChanges function to validate the mtime+size logic, though existing detectChanges tests provide good coverage.

github-actions · 2026-05-04T23:48:47Z

Codegraph Impact Analysis

4 functions changed → 9 callers affected across 7 files

buildGraph in src/domain/graph/builder/pipeline.ts:993 (6 transitive callers)
collectFiles in src/domain/graph/builder/stages/collect-files.ts:100 (7 transitive callers)
detectNoChanges in src/domain/graph/builder/stages/detect-changes.ts:537 (6 transitive callers)
hasEmptyAnalysisTable in src/domain/graph/builder/stages/detect-changes.ts:594 (5 transitive callers)

greptile-apps · 2026-05-04T23:49:33Z

Greptile Summary

Adds a JS-side Tier-0/Tier-1 (mtime+size) pre-flight check in buildGraph that short-circuits before invoking the native Rust orchestrator on incremental no-op rebuilds, matching WASM's ~20ms early-exit. The previous issues from the review thread (missing runPendingAnalysis guard, redundant collectFiles traversal on fallthrough) have been addressed in this revision.
detectNoChanges conservatively returns false when CFG/dataflow analysis is newly enabled but its table is empty, correctly mirroring the runPendingAnalysis guard at opts.cfg !== false. The collectFiles stage now early-returns when ctx.allFiles is already populated, and the catch block resets both ctx.allFiles and ctx.discoveredDirs so the JS fallback always re-collects under its own state.
Six new unit tests cover empty file_hashes, mtime+size match, deleted tracked file, mtime drift, and both pending-analysis guards.

Confidence Score: 5/5

Safe to merge — logic is conservative (any mismatch falls through to the orchestrator), previous review issues are addressed, and test coverage matches the documented edge cases.

All identified concerns from prior review threads are addressed: the pending-analysis guard mirrors runPendingAnalysis, the double collectFiles walk is eliminated, and the error-path resets ctx.allFiles/ctx.discoveredDirs correctly. The fast-skip is read-only and conservative by design — any exception, new file, deleted file, mtime or size drift, or empty analysis table causes a fallthrough to the authoritative native orchestrator. No correctness or data-loss risks found.

No files require special attention.

Important Files Changed

Filename	Overview
src/domain/graph/builder/stages/detect-changes.ts	Adds exported `detectNoChanges` and private `hasEmptyAnalysisTable`; logic is conservative and correctly mirrors `runPendingAnalysis`'s `opts.cfg !== false` / empty-table guard.
src/domain/graph/builder/pipeline.ts	Wires in the pre-flight fast-skip with correct gating conditions, error-path resets, and a single `closeDb` (no `nativeDb` has been opened yet at this point).
src/domain/graph/builder/stages/collect-files.ts	Adds an early-return guard to skip the filesystem walk when `ctx.allFiles` is already populated by the pre-flight; correctly excluded from the scoped-rebuild path.
tests/builder/detect-changes.test.ts	Six new unit tests covering all documented edge cases for `detectNoChanges`, including the pending-analysis guards introduced in this revision.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[buildGraph] --> B[setupPipeline\nopenDb + initSchema]
    B --> C{nativeAvailable &&\nengine=native &&\nincremental &&\n!forceFullRebuild &&\n!scope?}
    C -- No --> G
    C -- Yes --> D[collectFiles\nmtime+size pre-flight]
    D --> E{detectNoChanges?\nTier-0/Tier-1 check}
    E -- "cfg/dataflow tables empty\n→ conservative false" --> G
    E -- "new/deleted/changed file\n→ false" --> G
    E -- Error --> F[reset ctx.allFiles\nctx.discoveredDirs → undefined]
    F --> G
    E -- "all mtime+size match\n→ true" --> Z[writeJournalHeader\ncloseDb\nreturn undefined ✓]
    G[tryNativeOrchestrator\nRust detect_changes source of truth] --> H{result?}
    H -- early-exit --> Y[closeDbPair\nreturn undefined]
    H -- BuildResult --> X[return result]
    H -- throws --> I[JS pipeline fallback\nrunPipelineStages]
    I --> J[collectFiles\nearly-return if allFiles populated]
    J --> K[detectChanges → runPendingAnalysis\n→ parseFiles → … → finalize]

_{Reviews (2): Last reviewed commit: "fix(builder): avoid redundant collectFil..." | Re-trigger Greptile}

greptile-apps · 2026-05-04T23:49:36Z

  purgeAndAddReverseDeps(ctx, changePaths, reverseDeps);
 }

+/**
+ * Read-only pre-flight check for the native orchestrator.
+ *
+ * Returns true iff every collected source file has matching mtime+size in
+ * `file_hashes` and no DB-tracked file has been removed. When true, the
+ * caller can short-circuit before invoking the native orchestrator —
+ * matching WASM's ~20 ms early-exit path and avoiding the ~2s flat
+ * per-call native rebuild overhead seen in CI (#1054).
+ *
+ * Intentionally Tier-0/Tier-1 only (journal + mtime/size). Tier-2 content
+ * hashing is left to the native side: when this returns false the caller
+ * falls through to the orchestrator, which performs its own complete
+ * detection and is the source of truth.
+ *
+ * Pure read of `db` and the filesystem — never mutates either.
+ */
+export function detectNoChanges(
+  db: BetterSqlite3Database,
+  allFiles: string[],
+  rootDir: string,
+): boolean {
+  let hasTable = false;
+  try {
+    db.prepare('SELECT 1 FROM file_hashes LIMIT 1').get();
+    hasTable = true;
+  } catch {
+    /* table missing — first build */
+  }
+  if (!hasTable) return false;
+
+  const rows = db.prepare('SELECT file, hash, mtime, size FROM file_hashes').all() as FileHashRow[];
+  if (rows.length === 0) return false;
+  const existing = new Map<string, FileHashRow>(rows.map((r) => [r.file, r]));
+
+  const currentFiles = new Set<string>();
+  for (const file of allFiles) {
+    currentFiles.add(normalizePath(path.relative(rootDir, file)));
+  }
+  for (const existingFile of existing.keys()) {
+    if (!currentFiles.has(existingFile)) return false;
+  }
+
+  for (const file of allFiles) {
+    const relPath = normalizePath(path.relative(rootDir, file));
+    const record = existing.get(relPath);
+    if (!record) return false;
+    const stat = fileStat(file) as FileStat | undefined;
+    if (!stat) return false;
+    const storedMtime = record.mtime || 0;
+    const storedSize = record.size || 0;
+    if (storedSize <= 0) return false;
+    if (Math.floor(stat.mtimeMs) !== storedMtime || stat.size !== storedSize) return false;
+  }
+
+  return true;
+}


Missing runPendingAnalysis check before early exit

detectNoChanges skips the runPendingAnalysis guard that the existing detectChanges fast-exit path (lines 610–617) always executes. If CFG or dataflow analysis was newly enabled (or tables were wiped) between builds and no source files changed, mtime/size will match file_hashes, detectNoChanges will return true, the function will write the journal header and return — and the pending analysis pass will silently never run. The data stays empty in cfg_blocks/dataflow indefinitely on no-op-rebuild repos.

The existing JS path guards this explicitly: it calls runPendingAnalysis(ctx) before writing the journal. Either the same check needs to be reproduced here (checking cfg_blocks/dataflow row counts against opts.cfg/opts.dataflow), or detectNoChanges must return false conservatively whenever those tables are empty and the corresponding options are enabled.

Fixed in 8699471. detectNoChanges now takes an optional opts parameter and conservatively returns false when:

opts.cfg !== false and cfg_blocks is empty

opts.dataflow !== false and dataflow is empty

This mirrors the same empty-table check that runPendingAnalysis (line ~244) uses to decide whether to run the analysis pass. When the guard fires, the caller falls through to tryNativeOrchestrator (or, on native fallback, to runPipelineStages), which invokes detectChanges and the existing runPendingAnalysis path — populating CFG/dataflow even on no-op-rebuild repos.

The pipeline.ts call site now passes ctx.opts so the guard activates for every fast-skip attempt. Added 6 unit tests covering empty file_hashes, mtime+size match, deleted tracked file, mtime drift, and the new pending-analysis guards for both cfg and dataflow.

greptile-apps · 2026-05-04T23:49:37Z

+      try {
+        await collectFiles(ctx);
+        if (detectNoChanges(ctx.db, ctx.allFiles, ctx.rootDir)) {
+          info('No changes detected. Graph is up to date.');
+          writeJournalHeader(ctx.rootDir, Date.now());
+          closeDb(ctx.db);
+          return;
+        }
+      } catch (err) {
+        // Pre-flight is best-effort — any failure falls through to the
+        // orchestrator, which performs its own complete detection.
+        debug(`native fast-skip pre-flight failed: ${toErrorMessage(err)}`);
+      }
+    }


Redundant collectFiles call on fallthrough to runPipelineStages

When detectNoChanges returns false (changes detected), control falls through to tryNativeOrchestrator. If the native orchestrator itself fails and runPipelineStages is reached, collectFiles is called again at line 901 — doubling the collection work on this path. ctx.allFiles is overwritten with identical data, so there is no functional bug, but the extra traversal adds latency that counteracts the PR's goal of eliminating unnecessary overhead on the non-skip path. Consider guarding the second collectFiles call in runPipelineStages with a check like if (!ctx.allFiles?.length), or resetting ctx.allFiles in the catch block above so the second call is always the canonical one.

Fixed in 540a420. Guarded the collectFiles stage so it returns early when ctx.allFiles and ctx.discoveredDirs are already populated (and opts.scope is unset). On the fallthrough path (pre-flight ran but changes were detected → tryNativeOrchestrator → JS pipeline fallback), runPipelineStages now reuses the already-collected file list instead of re-walking the filesystem.

To keep the guard correct on the failure path, the buildGraph catch block now resets ctx.allFiles and ctx.discoveredDirs to undefined when the pre-flight throws — so runPipelineStages re-collects under its own engine state if we got partially populated before the error.

) The Tier-0/Tier-1 fast-skip introduced by #1064 short-circuited builds based purely on mtime+size, missing the runPendingAnalysis guard that the existing JS early-exit path (detectChanges, line ~610) always runs. If CFG or dataflow analysis was enabled (or tables wiped) between builds and no source files changed, mtime/size would still match file_hashes, detectNoChanges would return true, and the pending analysis pass would silently never run — leaving cfg_blocks and dataflow empty indefinitely on no-op-rebuild repos. Add a conservative pending-analysis guard: when opts.cfg !== false and cfg_blocks is empty, return false; same for opts.dataflow / dataflow. The caller then falls through to the orchestrator (or JS pipeline), which populates the tables via the existing runPendingAnalysis path. Adds unit tests for detectNoChanges covering empty file_hashes, mtime+size match, deleted tracked file, mtime drift, and the new pending-analysis guards for both cfg and dataflow. Impact: 2 functions changed, 7 affected

…1064) When the JS-side fast-skip pre-flight ran but detectNoChanges returned false, control fell through to tryNativeOrchestrator and, on native fallback, to runPipelineStages — which called collectFiles again at line 901, doubling the filesystem walk on the non-skip path and counteracting the PR's goal of eliminating overhead. Guard the collectFiles stage so it returns early when ctx.allFiles and ctx.discoveredDirs are already populated (and not in scoped mode). On pre-flight failure, the buildGraph catch block now resets these fields so the guard correctly falls through and re-collects under runPipelineStages's own engine state. Impact: 2 functions changed, 8 affected

carlos-alm · 2026-05-05T04:30:26Z

Addressed Claude's minor suggestion in 8699471:

Added a dedicated describe('detectNoChanges fast-skip', ...) block in tests/builder/detect-changes.test.ts with 6 unit tests covering the mtime+size logic: empty file_hashes (first build), mtime+size match, deleted tracked file, mtime drift, and the new pending-analysis guards for both cfg and dataflow.

carlos-alm · 2026-05-05T04:31:05Z

@greptileai

greptile-apps Bot reviewed May 4, 2026

View reviewed changes

carlos-alm added 2 commits May 4, 2026 22:29

This was referenced May 5, 2026

chore(bench): per-iteration phase trace for #1054 diagnosis [DO NOT MERGE] #1062

Closed

ci: add 10-minute timeout to npm install steps #1065

Merged

carlos-alm merged commit 6686019 into main May 5, 2026
42 of 44 checks passed

carlos-alm deleted the fix/issue-1054-native-incremental-skip branch May 5, 2026 06:17

github-actions Bot locked and limited conversation to collaborators May 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(native): js-side fast-skip for incremental no-op rebuilds (#1054)#1064

fix(native): js-side fast-skip for incremental no-op rebuilds (#1054)#1064
carlos-alm merged 3 commits into
mainfrom
fix/issue-1054-native-incremental-skip

carlos-alm commented May 4, 2026

Uh oh!

claude Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 4, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot May 4, 2026

Uh oh!

carlos-alm May 5, 2026

Uh oh!

greptile-apps Bot May 4, 2026

Uh oh!

carlos-alm May 5, 2026

Uh oh!

carlos-alm commented May 5, 2026

Uh oh!

carlos-alm commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented May 4, 2026

Summary

Benchmark (this repo, 744 files)

Test plan

Follow-up

Uh oh!

claude Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Root Cause Analysis

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

greptile-apps Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm May 5, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented May 5, 2026

Uh oh!

carlos-alm commented May 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

greptile-apps Bot commented May 4, 2026 •

edited

Loading