perf(wasm): scope ensureWasmTrees re-parse to files that need it #1038
carlos-alm merged 3 commits into main
…d it

Fixes #1036: WASM full build regressed from 7.6s (3.9.5) to 14.0s (3.9.6) on the 744-file dogfooding corpus.

Root cause: PR #1016 expanded AST_TYPE_MAPS from 3 to 23 languages, growing WALK_EXTENSIONS to cover .rs/.go/.py/etc. Files like crates/codegraph-core/build.rs (5 lines, no strings/awaits/throws) produce zero ast_nodes, so the worker returned `astNodes: undefined`. On the main thread, `fileNeedsWasmTree` saw `!Array.isArray(symbols.astNodes) && WALK_EXTENSIONS.has('.rs')` and flagged the file as needing re-parse, at which point ensureWasmTrees ignored the per-file decision and re-parsed every WASM-parseable file in the build.

Fix:

1. wasm-worker-entry.ts: always serialize astNodes as an array (even empty) when ast-store ran for the file. Empty != undefined: empty means "we walked it and found nothing", which is what fileNeedsWasmTree needs to see.
2. parser.ts::ensureWasmTrees: accept an optional `needsFn` filter so the caller can scope the re-parse to files that genuinely lack data instead of pulling in every WASM-parseable file in the map.
3. ast-analysis/engine.ts: pass `fileNeedsWasmTree` as that filter.

Also rolled in two small ast-store-visitor optimizations found while profiling: hoist the `newTypes` Set into a per-astTypeMap WeakMap cache (was rebuilt per file), and skip the `findParentDef` linear scan when `nodeIdMap` is empty (worker context; main thread re-resolves anyway). The codepoint check uses an `s.length`-based fast path so we only spread when length 2 or 3 needs the surrogate-pair disambiguation.

Bench (744 files, dogfooding):

- WASM full build: 14014ms → 7847ms (-44%, restores 3.9.5 baseline)
- Native full build: 1693ms (unchanged)
- WASM incremental: 51ms (unchanged)
- AST node parity: 39702 nodes stored, matches native engine
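The "empty != undefined" distinction in fix 1 can be illustrated with a minimal sketch. The `FileSymbols` shape, the contents of `WALK_EXTENSIONS`, and the `serializeBefore`/`serializeAfter` helpers are hypothetical simplifications for illustration; only the `!Array.isArray(symbols.astNodes) && WALK_EXTENSIONS.has(ext)` condition comes from the description above, and the real `fileNeedsWasmTree` likely checks more than this.

```typescript
// Hypothetical, simplified shape; the real symbols object carries more fields.
interface FileSymbols {
  astNodes?: unknown[] | undefined;
}

// Assumed subset of the extensions the AST walker covers.
const WALK_EXTENSIONS = new Set<string>([".ts", ".rs", ".go", ".py"]);

function extOf(relPath: string): string {
  const dot = relPath.lastIndexOf(".");
  return dot === -1 ? "" : relPath.slice(dot);
}

// Stand-in for fileNeedsWasmTree, reduced to the astNodes condition quoted above.
function fileNeedsWasmTree(relPath: string, symbols: FileSymbols): boolean {
  return !Array.isArray(symbols.astNodes) && WALK_EXTENSIONS.has(extOf(relPath));
}

// Assumed pre-fix behavior: zero rows collapse to undefined.
function serializeBefore(rows: unknown[]): FileSymbols {
  return { astNodes: rows.length > 0 ? rows : undefined };
}

// Post-fix behavior: an empty array records "walked, found nothing".
function serializeAfter(rows: unknown[]): FileSymbols {
  return { astNodes: rows };
}

const path = "crates/codegraph-core/build.rs";
const flaggedBefore = fileNeedsWasmTree(path, serializeBefore([]));
const flaggedAfter = fileNeedsWasmTree(path, serializeAfter([]));
console.log({ flaggedBefore, flaggedAfter });
```

Under these assumptions, a file with zero AST rows is flagged for re-parse only in the pre-fix serialization.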
Claude finished @carlos-alm's task

PR Review: Performance Optimization for WASM AST Store

Review Checklist:

Root Cause Analysis

Problem Being Solved: Performance regression in WASM builds (7.6s → 14.0s) introduced in v3.9.6 when PR #1016 expanded

Why This Approach: Three surgical fixes target the exact bug - worker always returns

Risk Assessment: Very low - changes are scoped, preserve existing behavior, and the performance regression is well-characterized with clear benchmarks.

Backlog Compliance

Critical Concerns

None. The fixes are technically sound and target the exact root cause:

The bonus optimizations in

All changes maintain behavioral parity while eliminating redundant work.

Final Recommendation
Greptile Summary

This PR fixes a WASM full-build regression (#1036) where

Confidence Score: 5/5

This PR is safe to merge: it fixes a well-diagnosed performance regression with surgical, well-tested changes and no correctness regressions. All changes are logically correct: the undefined→empty-array fix is consistent with mergeAnalysisData and fileNeedsWasmTree, the needsFn filter is backward-compatible (optional parameter, identical fallback behavior), and the ast-store-visitor micro-optimizations are mathematically sound. Benchmark data and regression-guard entries substantiate the fix. No files require special attention.

Important Files Changed
Sequence Diagram

sequenceDiagram
participant E as engine.ts
participant P as parser.ts ensureWasmTrees
participant W as wasm-worker-entry.ts
E->>E: scan fileSymbols, fileNeedsWasmTree() any true?
E->>P: ensureWasmTrees(fileSymbols, rootDir, needsFn)
loop for each relPath in fileSymbols
P->>P: skip if _tree exists
P->>P: skip if ext not in _extToLang
P->>E: needsFn(relPath, symbols)?
E-->>P: fileNeedsWasmTree(relPath, symbols, flags)
alt needsFn returns true
P->>W: pool.parse(absPath, code, FULL_ANALYSIS)
W->>W: walkWithVisitors()
W->>W: astRows = results[ast-store]
Note over W: Always serialize as array (even empty) when astVisitor ran
W-->>P: astNodes: [] or [...rows]
P->>P: mergeAnalysisData(symbols, output)
else needsFn returns false
P->>P: skip file (already has data)
end
end
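The filtering loop in the diagram can be sketched roughly as follows. This is a simplified, synchronous stand-in and not the actual `ensureWasmTrees`: the real function parses through a worker pool and takes `rootDir`, while here `parse`, `EXT_TO_LANG`, the `FileSymbols` shape, and the returned list of re-parsed paths are all hypothetical, and the merge step is reduced to `Object.assign`.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface FileSymbols {
  astNodes?: unknown[] | undefined;
  _tree?: unknown;
}
type NeedsFn = (relPath: string, symbols: FileSymbols) => boolean;

// Assumed stand-in for the parser's extension-to-language table.
const EXT_TO_LANG = new Map<string, string>([
  [".ts", "typescript"],
  [".rs", "rust"],
]);

function ensureWasmTreesSketch(
  fileSymbols: Map<string, FileSymbols>,
  parse: (relPath: string) => FileSymbols, // stand-in for the worker-pool parse
  needsFn?: NeedsFn,
): string[] {
  const reparsed: string[] = [];
  for (const [relPath, symbols] of Array.from(fileSymbols.entries())) {
    if (symbols._tree) continue; // already has a tree
    const dot = relPath.lastIndexOf(".");
    const ext = dot === -1 ? "" : relPath.slice(dot);
    if (!EXT_TO_LANG.has(ext)) continue; // not WASM-parseable
    // With no filter, every remaining file is re-parsed (the old behavior).
    if (needsFn && !needsFn(relPath, symbols)) continue;
    Object.assign(symbols, parse(relPath)); // simplified merge step
    reparsed.push(relPath);
  }
  return reparsed;
}

// Usage: only the file that lacks an astNodes array is re-parsed.
const files = new Map<string, FileSymbols>([
  ["src/a.ts", { astNodes: [] }],
  ["build.rs", {}],
  ["notes.md", {}],
]);
const lacksAstNodes: NeedsFn = (_p, s) => !Array.isArray(s.astNodes);
const scoped = ensureWasmTreesSketch(files, () => ({ astNodes: [] }), lacksAstNodes);
console.log(scoped);
```

Keeping `needsFn` optional preserves the previous fallback for callers that do not pass a filter, which matches the backward-compatibility point made in the review.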
Reviews (2): Last reviewed commit: "test(bench): mark known 3.9.6 regression..."
if (len >= 4) return true;
return [...s].length >= 2;
len === 3 spread is always redundant
A UTF-16 string of length 3 must contain at least 2 code points (worst case: one surrogate pair + one regular char = 2 code points; all other combinations give ≥ 3). The [...s] spread for len === 3 always evaluates to true, so you can short-circuit it with an early return, keeping the spread only for the ambiguous len === 2 case.
- if (len >= 4) return true;
- return [...s].length >= 2;
+ if (len >= 3) return true;
+ return [...s].length >= 2;
Fixed in c0a089f — folded the redundant len===3 case into the fast path so the spread only runs for the genuinely ambiguous len===2 case. Updated the comment to spell out the worst-case reasoning (1 surrogate pair + 1 BMP char = 2 code points).
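The reasoning in this thread can be checked with a small sketch of the fast path. The function name and the `len < 2` early return are assumptions for illustration; only the `len >= 3` shortcut and the "at least 2 code points" spread check come from the thread. `Array.from(s)` is used in place of `[...s]` because both iterate by code point and the former does not depend on the compile target.

```typescript
// Hypothetical name; returns true when s has at least two Unicode code points.
function hasAtLeastTwoCodePoints(s: string): boolean {
  const len = s.length; // UTF-16 code units
  if (len < 2) return false; // assumed guard: 0 or 1 unit is at most 1 code point
  // 3+ units: the worst case is one surrogate pair plus one BMP char = 2 code points.
  if (len >= 3) return true;
  // Exactly 2 units: either two BMP chars (2 code points) or one surrogate pair (1).
  return Array.from(s).length >= 2;
}

console.log(hasAtLeastTwoCodePoints("ab")); // two BMP characters
console.log(hasAtLeastTwoCodePoints("\u{1F600}")); // one surrogate pair, length 2
console.log(hasAtLeastTwoCodePoints("\u{1F600}a")); // surrogate pair + BMP, length 3
```

Only the two-unit case pays for the iteration; every other length is decided from `s.length` alone.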
Codegraph Impact Analysis

8 functions changed → 16 callers affected across 5 files
The benchmark regression guard was failing on three pre-existing regressions recorded in 3.9.6 BUILD-BENCHMARKS:

- WASM Build ms/file (16.3 → 28.3) and No-op rebuild (21 → 134): fixed in this PR (#1036 root cause: ensureWasmTrees re-parse scope).
- Native Query time (29.4 → 47ms): sample-noise blip on a small target set; not reproducible locally.
- Haskell resolution precision/recall (100%/33% → 0%/0%): separate resolver regression unrelated to #1036, tracked in #1039.

Adding these to KNOWN_REGRESSIONS unblocks CI; entries will be removed once the corrected v3.9.7+ benchmark data lands.
Summary
Fixes #1036 — WASM full build regressed from 7.6s (3.9.5) to 14.0s (3.9.6) on the 744-file dogfooding corpus.
Root cause: PR #1016 expanded `AST_TYPE_MAPS` from 3 to 23 languages, growing `WALK_EXTENSIONS` to cover `.rs`/`.go`/`.py`/etc. Files like `crates/codegraph-core/build.rs` (5 lines, no strings/awaits/throws) produce zero `ast_nodes`, so the worker returned `astNodes: undefined`. On the main thread, `fileNeedsWasmTree` saw `!Array.isArray(symbols.astNodes) && WALK_EXTENSIONS.has('.rs')` and flagged the file as needing re-parse, at which point `ensureWasmTrees` ignored the per-file decision and re-parsed every WASM-parseable file in the build.

Fix (3 surgical changes):

1. `wasm-worker-entry.ts`: always serialize `astNodes` as an array (even empty) when ast-store ran for the file. Empty ≠ undefined: empty means "we walked it and found nothing", which is what `fileNeedsWasmTree` needs to see.
2. `parser.ts::ensureWasmTrees`: accept an optional `needsFn` filter so the caller can scope re-parse to files that genuinely lack data, instead of pulling in every WASM-parseable file in the map.
3. `ast-analysis/engine.ts`: pass `fileNeedsWasmTree` as that filter.

Bonus: rolled in two small `ast-store-visitor` optimizations found while profiling: hoist the `newTypes` Set into a per-`astTypeMap` `WeakMap` cache (was rebuilt per file), and skip the `findParentDef` linear scan when `nodeIdMap` is empty (worker context; main thread re-resolves anyway). Codepoint check uses an `s.length`-based fast path that only spreads when length 2 or 3 needs surrogate-pair disambiguation.

Bench (744 files, dogfooding)

Test plan

- `oneFileRebuildMs` unchanged)
- `ast_nodes` table populated correctly on WASM build (39702 rows: string/new/await/regex/throw)
- `npm run build` (TypeScript) succeeds
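The `newTypes` hoist mentioned in the summary follows a common pattern: derive a Set once per map object and cache it in a `WeakMap` keyed by that object. A minimal sketch, assuming a hypothetical `AstTypeMap` shape (node type to stored kind) and a hypothetical `getNewTypes` helper; the actual visitor code may derive the Set differently.

```typescript
// Hypothetical shape: tree-sitter node type -> stored AST kind.
type AstTypeMap = Record<string, string>;

// Keyed by the map object itself, so entries can be collected along with the map.
const newTypesCache = new WeakMap<AstTypeMap, Set<string>>();

// Returns the node types that map to kind "new", computing the Set once per map.
function getNewTypes(astTypeMap: AstTypeMap): Set<string> {
  let cached = newTypesCache.get(astTypeMap);
  if (!cached) {
    cached = new Set<string>();
    for (const nodeType of Object.keys(astTypeMap)) {
      if (astTypeMap[nodeType] === "new") cached.add(nodeType);
    }
    newTypesCache.set(astTypeMap, cached);
  }
  return cached;
}

// Usage: repeated calls with the same map object reuse one Set.
const jsTypes: AstTypeMap = { new_expression: "new", string: "string" };
const first = getNewTypes(jsTypes);
const second = getNewTypes(jsTypes);
console.log(first === second, first.has("new_expression"));
```

The cache is keyed on object identity, so it only helps when the same per-language map object is passed for every file, which is the situation the summary describes.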