fix(perf): scope WASM grammar load in engine-parity backfill (#1054) #1058
The native engine drops files in some build environments (#1054), triggering a WASM backfill via the worker pool. The pool's first-call overhead is fine for full builds (amortized over hundreds of files) but dwarfs the actual parse work for small backfill batches: on slow CI runners, ~1.7s for 4 fixture files in one language.

Add `parseFilesWasmInline`: a main-thread, no-worker parse path that loads only the grammars matching the input extensions and returns symbols with `_tree` set so the unified walker in `runAnalyses` populates AST/CFG/dataflow data downstream. New `parseFilesWasmForBackfill` chooses inline for batches ≤ 16 files, keeping worker isolation for larger batches where tree-sitter WASM crash protection matters more (#965).

Routes both backfill sites through the new helper:
- `parseFilesAuto`'s per-call inline backfill in `domain/parser.ts`
- `backfillNativeDroppedFiles` in `domain/graph/builder/pipeline.ts`

Refs #1054

Impact: 4 functions changed, 13 affected
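The "loads only the grammars matching the input extensions" step can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `GRAMMAR_BY_EXTENSION` table, `extensionOf`, and `grammarsNeeded` are hypothetical names, and the real registry in the codebase is presumably larger and shaped differently.

```typescript
// Hypothetical extension-to-grammar table (assumption, for illustration only).
const GRAMMAR_BY_EXTENSION: Record<string, string> = {
  ".ts": "typescript",
  ".py": "python",
  ".tf": "hcl",
  ".hcl": "hcl",
};

// Returns the lowercase extension including the dot, or "" when there is none.
// Dotfiles such as ".gitignore" are treated as having no extension.
function extensionOf(filePath: string): string {
  const lastSeparator = Math.max(filePath.lastIndexOf("/"), filePath.lastIndexOf("\\"));
  const baseName = filePath.slice(lastSeparator + 1);
  const dot = baseName.lastIndexOf(".");
  return dot > 0 ? baseName.slice(dot).toLowerCase() : "";
}

// Collects the distinct grammars a batch needs, so that only those would be
// loaded instead of every grammar the parser supports.
function grammarsNeeded(filePaths: string[]): Set<string> {
  const needed = new Set<string>();
  for (const filePath of filePaths) {
    const grammar = GRAMMAR_BY_EXTENSION[extensionOf(filePath)];
    if (grammar !== undefined) needed.add(grammar);
  }
  return needed;
}
```

For a batch of four `.tf` fixtures, a selection like this resolves to a single grammar, which is the case the commit message describes.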
Claude finished @carlos-alm's task

Root Cause Analysis
Problem Being Solved: The worker pool's cold-start overhead (~1.7s on slow CI runners) was dominating the actual parse work (~10ms) when only a few files needed WASM backfill during engine-parity operations. This primarily affected the pre-publish benchmark gate, which logs native orchestrator drops like "Native orchestrator dropped 4 file(s)... Backfilling via WASM".
Why This Approach: The solution maintains the existing worker pool isolation for large batches (preserving crash safety from tree-sitter WASM issues #965) while routing small batches (≤16 files) through a new inline parsing path that avoids worker spawn overhead. The threshold (16 files) aligns with typical engine-parity drop sizes.
Risk Assessment: Low risk - this is a pure performance optimization that preserves existing behavior. The inline path reuses established grammar loading patterns and still sets

Backlog Compliance
Tier: 1

Configuration Changes Assessment
✅ No problematic configuration changes detected. The PR introduces a single hardcoded threshold (

Code Quality Assessment
Strengths:
Minor Issues:

Critical Concerns
None identified. This is a well-scoped performance optimization.

Final Recommendation
Greptile Summary
This PR scopes WASM grammar loading for small engine-parity backfills by introducing

Confidence Score: 4/5
Safe to merge: the primary leak fix is correct and the residual leak is small, bounded, and confined to edge-case callers.
The previously-reported P1 is fixed. The remaining finding is P2: a bounded WASM tree leak (≤16 trees) in the secondary src/domain/parser.ts: the inline-backfill symbols returned from

Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[parseFilesWasmForBackfill] -->|"filePaths.length <= 16"| B[parseFilesWasmInline\nmain-thread, sets _tree]
A -->|"filePaths.length > 16"| C[parseFilesWasm\nworker pool, no _tree]
D[backfillNativeDroppedFiles] --> A
D -->|"after DB insert"| E["cleanup loop\ntree.delete() + _tree = undefined ✅"]
F[parseFilesAuto] --> A
F -->|caller: parse-files.ts| G["ctx.allSymbols\n→ releaseWasmTrees ✅"]
F -->|caller: detect-changes.ts| H["analysisSymbols\n→ runAnalyses\n→ discarded, no cleanup ⚠️"]
F -->|caller: resolve-imports.ts| I["ctx.fileSymbols\n→ may not reach releaseWasmTrees ⚠️"]
Reviews (2): Last reviewed commit: "docs(parser): explain INLINE_BACKFILL_TH..."
Codegraph Impact Analysis
4 functions changed → 13 callers affected across 7 files
The inline backfill path sets symbols._tree (live web-tree-sitter Tree backed by WASM linear memory) on every result, but those symbols are consumed locally for DB row construction in backfillNativeDroppedFiles and never added to ctx.allSymbols, so the finalize-stage releaseWasmTrees sweep never frees them. Without explicit cleanup, trees leak WASM memory until process exit — bounded per run but cumulative across in-process integration tests. Adds a cleanup loop after batchInsertNodes that mirrors releaseWasmTrees, and drops the now-unused parseFilesAuto import.
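A cleanup pass of the kind described above might look like the sketch below. It is an assumption-laden illustration rather than the committed code: `releaseBackfillTrees` and the two interfaces are hypothetical, and the only external fact relied on is that a web-tree-sitter `Tree` exposes `delete()` to free its WASM memory.

```typescript
// Minimal stand-ins for the real types (assumption): only the members the
// cleanup touches are modelled here.
interface DisposableTree {
  delete(): void; // frees the WASM-backed tree, as web-tree-sitter's Tree does
}
interface ParsedSymbols {
  _tree?: DisposableTree | undefined;
}

// Hypothetical helper: frees every live tree in the batch and clears the
// reference so a later sweep cannot delete the same tree twice.
// Returns how many trees were released.
function releaseBackfillTrees(symbolsByFile: Map<string, ParsedSymbols>): number {
  let released = 0;
  symbolsByFile.forEach((symbols) => {
    const tree = symbols._tree;
    if (tree === undefined) return; // worker-pool results carry no tree
    tree.delete();
    symbols._tree = undefined;
    released += 1;
  });
  return released;
}
```

Clearing `_tree` after `delete()` makes the pass idempotent: running it twice on the same map releases nothing the second time.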
Adds context for the 16-file threshold per Claude review feedback: sized for typical engine-parity drops (recurring HCL case is 4 files); above it, the worker-pool's IPC + crash-isolation cost is amortized over enough parse work to be worth paying; below it, the cold-start dominates.
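The threshold decision itself is small enough to sketch. The names below (`SMALL_BATCH_LIMIT`, `chooseBackfillPath`, `parseForBackfill`) are hypothetical and the real helper's signature may differ; only the cutoff of 16 files and the inline versus worker-pool split come from the PR text.

```typescript
// Cutoff taken from the PR description; the constant name is an assumption.
const SMALL_BATCH_LIMIT = 16;

type BackfillPath = "inline" | "worker-pool";

// Batches at or below the limit parse on the main thread; larger batches
// keep the worker pool for crash isolation.
function chooseBackfillPath(fileCount: number): BackfillPath {
  return fileCount <= SMALL_BATCH_LIMIT ? "inline" : "worker-pool";
}

type ParseFn<T> = (filePaths: string[]) => Promise<T>;

// Hypothetical dispatcher: the two parse functions are passed in so the
// sketch stays self-contained and does not depend on the real parser module.
function parseForBackfill<T>(
  filePaths: string[],
  parseInline: ParseFn<T>,
  parseInWorkerPool: ParseFn<T>,
): Promise<T> {
  return chooseBackfillPath(filePaths.length) === "inline"
    ? parseInline(filePaths)
    : parseInWorkerPool(filePaths);
}
```

With this shape, the recurring 4-file HCL drop takes the inline path, while a batch of 17 or more files still goes through the worker pool.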
Addressed Greptile's P1: WASM tree leak in
Added a cleanup pass after
Commit: 49c9461
Addressed Claude's review feedback:
Commit: ffd3431
Summary
- Add `parseFilesWasmInline`, a main-thread parse path that loads only the grammars for the input extensions and returns symbols with `_tree` set, so the unified walker in `runAnalyses` populates AST/CFG/dataflow downstream.
- Route both backfill sites (`parseFilesAuto` per-call backfill, `backfillNativeDroppedFiles` post-orchestrator backfill) through `parseFilesWasmForBackfill`, which picks inline for batches ≤ 16 files and the worker pool for larger batches (preserving tree-sitter WASM crash isolation, #965 "bug: WASM engine crashes V8 reproducibly on Windows + Node 22 when building codegraph source", where it matters).

Context
#1054: every native `buildGraph` call in the pre-publish gate logs `Native orchestrator dropped 4 file(s) in natively-supported languages — likely a Rust extractor bug. Backfilling via WASM: .tf` and falls back via the worker pool. This PR makes the fallback cheap for that case. The underlying Rust-side drop (why the freshly-built binary drops these 4 fixture files when the v3.9.6 published binary doesn't) is tracked separately and addressed in a follow-up PR.

Test plan
- `npx tsc --noEmit` clean.
- `npx vitest run tests/integration/build-parity.test.ts`: 4/4 pass.
- `npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t "hcl"`: 5/5 pass (HCL backfill is the prime case the new path serves).
- `incremental-benchmark.ts` reports identical numbers (full=2773ms, noop=17ms, 1-file=266ms) to before. Windows didn't pay the worker overhead anyway, so unchanged is expected. Effect should land on CI Linux.

Refs #1054