perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep by carlos-alm · Pull Request #1134 · optave/ops-codegraph-tool

carlos-alm · 2026-05-15T14:42:34Z

Summary

Add exclude?: string[] to BuildGraphOpts; merged into ctx.config.exclude inside setupPipeline so the native orchestrator picks it up via the serialized config without a separate code path.
Centralize the bench exclude list (tests/benchmarks/resolution/fixtures/**) in scripts/lib/bench-config.ts and wire it into every buildGraph(root, ...) call in scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts.
Drop the 3.10.0:Full build entry (and its docstring section) from KNOWN_REGRESSIONS in tests/benchmarks/regression-guard.test.ts — the underlying cause is fixed at the source, not papered over.

Why

Each new heavyweight grammar landing in codegraph (the Verilog port in #1107 being the trigger) pulls its resolution-benchmark fixtures into the corpus that all three dogfooding benchmarks sweep. Those fixtures are hand-annotated scaffolding for the static-resolution suite, not representative source code, and parsing them inflates fullBuildMs by hundreds of ms. The previous workaround was a KNOWN_REGRESSIONS exemption per release; this change excludes the fixtures at the build invocation so the next big-grammar PR will not trip the gate at all.

Local verification: native fullBuildMs on this repo returns from the regressed ~2.8s back to ~1.7-1.9s (in line with the 3.10.0 baseline of 1959 ms) while file count drops 745 → 607 (138 fixture files excluded).

Test plan

npm run lint clean on changed files
npx vitest run tests/integration/config-include-exclude.test.ts — 9/9 pass (7 existing + 2 new opts.exclude parity tests for wasm + native)
RUN_REGRESSION_GUARD=1 npx vitest run tests/benchmarks/regression-guard.test.ts — 17/17 pass with the exemption removed
npx vitest run tests/integration/ — 574/574 pass
npx tsx scripts/incremental-benchmark.ts — fixtures excluded (607 files vs prior 745); native fullBuildMs back to baseline range

Closes #1112

Adds an `exclude` field to BuildGraphOpts that augments `config.exclude` for a single build, then wires the three dogfooding benchmark scripts (scripts/benchmark.ts, scripts/incremental-benchmark.ts, scripts/query-benchmark.ts) to pass `tests/benchmarks/resolution/fixtures/**`. Those fixtures are hand-annotated test scaffolding for the static-resolution suite, not representative source code — each new heavyweight grammar (e.g. tree-sitter-verilog in #1107) inflates `fullBuildMs` by hundreds of ms purely from parsing fixture files that have no business in dogfooding timing measurements. The 3.10.0:Full build KNOWN_REGRESSIONS exemption that papered over the Verilog cost is removed now that the root cause is addressed at the build invocation. Closes #1112

claude · 2026-05-15T14:42:48Z

Claude finished @carlos-alm's task —— View job

Reviewing PR #1134: Exclude benchmark fixtures from dogfooding

Todo List:

Phase 1: Root Cause Analysis - understand the problem and approach
Phase 2: Configuration Change Detection - check for high-risk changes
Phase 3: Test Integrity - verify tests aren't weakened
Phase 4: Code Quality - review implementation details
Phase 5: Backlog Compliance - rate against BACKLOG.md criteria
Final Assessment - provide comprehensive review verdict

Root Cause Analysis

Problem Being Solved: Performance regression false positives in benchmarking when new heavyweight grammars (like Verilog) are added to codegraph. The resolution-benchmark fixtures inflate build times disproportionately as they're test scaffolding, not representative code.

Why This Approach: Source-level fix that excludes problematic files from the build corpus entirely, rather than papering over symptoms with KNOWN_REGRESSIONS exemptions. Architecturally sound with centralized configuration and proper native/WASM integration.

Risk Assessment: Low risk - addresses legitimate testing artifact problem at the source.

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies added
Foundation-aligned: ✓ — Fully compliant with all core principles
Problem-fit: 4 — Directly improves benchmarking accuracy and prevents false regression alerts, supporting better development velocity
Breaking: No — Purely additive, no behavior changes for existing users
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Analysis

Implementation Quality: ⭐⭐⭐⭐☆ (4/5)

Strengths:

Consistent application: All 3 benchmark scripts properly use BENCHMARK_EXCLUDES
Proper type safety: BuildGraphOpts.exclude correctly typed as optional string array
Correct merging logic (pipeline.ts:190-192): Preserves existing excludes while extending programmatically
Comprehensive test coverage: Both engines tested for opts.exclude functionality
Clear documentation: Well-documented rationale in bench-config.ts
Engine parity maintained: Native orchestrator automatically picks up merged excludes

Architecture adherence: ✅ Follows existing patterns, maintains engine parity, no breaking changes

Critical Concerns

None identified - This is a well-executed fix that addresses the root cause rather than masking symptoms.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: Solid implementation that fixes performance regression false positives at the source. Proper typing, comprehensive testing, maintains engine parity, and removes technical debt (KNOWN_REGRESSIONS exemption). The centralized exclude configuration is well-designed and the local verification confirms the fix works as intended.

greptile-apps · 2026-05-15T14:45:36Z

Greptile Summary

This PR fixes a benchmark corpus pollution problem by adding exclude?: string[] to BuildGraphOpts, centralising the resolution-benchmark fixture glob in scripts/lib/bench-config.ts, and wiring it into all three benchmark scripts. The resolveBenchmarkExcludes() helper correctly returns [] in --npm mode to avoid corpus mismatch between baseline and dev runs.

Core pipeline change: setupPipeline in pipeline.ts merges opts.exclude onto ctx.config.exclude ?? [] with a null-safe spread, so the native orchestrator picks it up via the serialised config without a separate code path.
Benchmark wiring: All buildGraph calls in benchmark.ts, incremental-benchmark.ts, and query-benchmark.ts now pass the resolved exclude list; --npm mode passes [] to both the installed baseline and the dev build, keeping the comparison apples-to-apples.
Regression guard housekeeping: 3.10.0:Full build exemption is removed (timing improved), 3.10.0:DB bytes/file is added with a thorough docstring explaining the measurement-scope artifact caused by the reduced file count denominator; incremental round-trip tests for opts.exclude are added covering the previously-untested second-build removal path.

Confidence Score: 5/5

The change is safe to merge; all three previously-raised concerns are fully addressed in this revision.

The core pipeline change is a narrow, null-safe merge of an optional caller-supplied exclude list. The benchmark helper correctly short-circuits to an empty list in --npm mode, preventing timing skew from a corpus size mismatch. The type interface is clean, integration tests cover both fresh-build and incremental removal paths for wasm and native, and the regression-guard exemption swap is arithmetically justified.

No files require special attention.

Important Files Changed

Filename	Overview
scripts/lib/bench-config.ts	Adds BENCHMARK_EXCLUDES constant and resolveBenchmarkExcludes() helper that correctly returns [] in --npm mode to prevent corpus mismatch between baseline and dev runs; parseArgs() integration is clean and warning idempotency via module-level flag works correctly per-process.
src/types.ts	Previously-duplicate exclude? property is now a single clean declaration; skipRegistry was moved to after exclude, resolving the prior P1 duplicate-identifier issue.
src/domain/graph/builder/pipeline.ts	Null-safe merge of opts.exclude onto ctx.config.exclude using ctx.config.exclude ?? [] defensively handles configs where exclude is undefined; native orchestrator picks up the merged config via serialization.
scripts/benchmark.ts	BENCH_EXCLUDE hoisted to top-level so all four buildGraph calls share the same resolved exclude list; in --npm mode resolveBenchmarkExcludes() correctly returns [] keeping corpus consistent across baseline and dev runs.
scripts/incremental-benchmark.ts	Replaces hardcoded BENCH_EXCLUDE constant with resolveBenchmarkExcludes() from shared bench-config; updated comment accurately describes --npm behavior and the corpus-mismatch rationale.
scripts/query-benchmark.ts	Single buildGraph call now passes exclude: [...resolveBenchmarkExcludes()] consistently with the other benchmark scripts.
tests/benchmarks/regression-guard.test.ts	Removes 3.10.0:Full build exemption (timing improved) and adds 3.10.0:DB bytes/file exemption with thorough docstring explaining the measurement-scope artifact from reduced file count; exemption rationale is sound.
tests/integration/config-include-exclude.test.ts	Adds fresh-build tests for opts.exclude on both wasm and native engines, plus incremental round-trip tests that verify previously-indexed files are detected as removals when exclude is introduced on a second build — fully covering the gap identified in the prior review.

_{Reviews (7): Last reviewed commit: "Merge branch 'main' into perf/1112-exclu..." | Re-trigger Greptile}

greptile-apps · 2026-05-15T14:45:39Z

+
+  it('wasm: opts.exclude rejects matching files without writing config', async () => {
+    const root = fs.mkdtempSync(path.join(tmpDir, 'opts-wasm-'));
+    writeFixture(root);
+    const files = await buildWithOptsExclude(root, 'wasm', ['**/*.test.js', '**/*.spec.js']);
+    expect(files).toContain('src/math.js');
+    expect(files).not.toContain('src/math.test.js');
+    expect(files).not.toContain('src/util.spec.js');
+  });
+
+  itNative('native: opts.exclude rejects matching files without writing config', async () => {
+    const root = fs.mkdtempSync(path.join(tmpDir, 'opts-native-'));
+    writeFixture(root);
+    const files = await buildWithOptsExclude(root, 'native', ['**/*.test.js', '**/*.spec.js']);
+    expect(files).toContain('src/math.js');
+    expect(files).not.toContain('src/math.test.js');
+    expect(files).not.toContain('src/util.spec.js');
+  });
 });


opts.exclude incremental coverage gap

Both new tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where files that were previously indexed become excluded on a subsequent incremental run (i.e. opts.exclude changes between builds against the same DB) is untested. In practice this path should work — collectFiles would omit the newly-excluded files and detectChanges would surface them as removals — but a short test covering one incremental round trip would lock in that behaviour and guard against regressions in the collect/detect stages.

Fixed in b008ffb — added wasm + native parity tests opts.exclude introduced on second incremental build drops previously-indexed files that build the fixture twice against the same DB (first without exclude, then with). The second build observes the previously-indexed test files as removals via detectChanges, and they are dropped from file_hashes. Verified locally: first build indexes 5 files, second build reports "0 changed, 2 removed", and asserts the test/spec files no longer appear in file_hashes.

github-actions · 2026-05-15T14:49:57Z

Codegraph Impact Analysis

3 functions changed → 9 callers affected across 8 files

resolveBenchmarkExcludes in scripts/lib/bench-config.ts:56 (3 transitive callers)
setupPipeline in src/domain/graph/builder/pipeline.ts:168 (6 transitive callers)
BuildGraphOpts.skipRegistry in src/types.ts:1073 (0 transitive callers)

Greptile flagged that the existing opts.exclude tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where opts.exclude changes between builds against the same DB (files previously indexed must be detected as removals on the second pass) was untested. Add wasm + native parity tests that build the fixture twice against the same DB — first with no exclude, then with exclude — and assert the second build drops the previously-indexed test/spec files from file_hashes. Locks in the collect → detect-removal contract that the benchmark scripts depend on once their exclude list changes mid-life.

The fixture-exclude in this PR shifts the denominator of every per-file build metric: the 3.10.0 baseline was measured over ~745 files (codegraph source + resolution-benchmark fixtures), while dev now measures the ~607 source files alone. DB content is dominated by src/, so absolute bytes stay roughly constant while the file count drops — inflating dbSizeBytes/file from 41614 to ~52211 (+25%, exactly at the threshold). This is a one-time methodology shift, not a real regression in the schema or extraction layer (which is why the old 3.10.0:Full build exemption could be dropped — Full build absolute timing actually improved with the exclude). Add a one-release exemption with a detailed comment so the distinction is visible to future maintainers; remove once 3.11.0+ data is captured under the post-#1134 methodology.

carlos-alm · 2026-05-15T16:11:41Z

CI follow-up addressed in e2346e0:

The regression-guard caught DB bytes/file: 41614 → 52211 (+25%, threshold 25%) after this PR landed locally. Root cause: excluding tests/benchmarks/resolution/fixtures/** shifts the denominator of every per-file build metric — the 3.10.0 baseline measured ~745 files (source + fixtures), dev now measures ~607 source files alone. DB content is dominated by src/, so absolute bytes stay roughly constant while the file count drops, inflating the per-file ratio.

This is a one-time methodology shift, not a real regression. Added a single-release 3.10.0:DB bytes/file exemption to KNOWN_REGRESSIONS with a detailed comment distinguishing it from the dropped 3.10.0:Full build entry (Full build absolute timing actually improved — that one was correctly removed; per-file bytes is a measurement-scope artifact). Exemption will fall off naturally once 3.11.0 baseline data is captured under the post-#1134 methodology.

The KNOWN_REGRESSIONS entries are not stale test still passes (3.10.0 is within the 1-minor-version window of the current 3.10.0 package version).

carlos-alm · 2026-05-15T16:11:55Z

@greptileai

carlos-alm · 2026-05-17T04:28:46Z

Addressed Greptile's P1 outside-diff comment on scripts/benchmark.ts:81 (--npm baseline corpus-mismatch):

Added resolveBenchmarkExcludes() in scripts/lib/bench-config.ts that returns BENCHMARK_EXCLUDES in local mode and [] in --npm mode (with a one-shot stderr note explaining the methodology). All three benchmark scripts (benchmark.ts, incremental-benchmark.ts, query-benchmark.ts) now route through the helper instead of consuming BENCHMARK_EXCLUDES directly. The baseline (older published version that silently drops opts.exclude) and the dev run now sweep the same ~745-file corpus, eliminating the corpus-mismatch that disguised the methodology shift as a perf delta.

Also removed an accidental duplicate exclude?: string[] field in BuildGraphOpts that the merge with #1131 introduced — caught by tsc --noEmit (TS2300). Tests still 11/11 + 17/17 pass.

carlos-alm · 2026-05-17T04:29:06Z

@greptileai

greptile-apps Bot reviewed May 15, 2026

View reviewed changes

carlos-alm added 2 commits May 15, 2026 10:09

carlos-alm mentioned this pull request May 17, 2026

Duplicate early-return in backfillNativeDroppedFiles #1144

Closed

carlos-alm added 3 commits May 16, 2026 22:23

fix: resolve merge conflicts with main

8cc86cf

fix: remove duplicate exclude field from BuildGraphOpts (#1134)

d5b354a

fix: skip BENCHMARK_EXCLUDES in --npm mode for corpus parity (#1134)

e6bf5a1

Merge branch 'main' into perf/1112-exclude-bench-fixtures

10c1812

carlos-alm merged commit c7e600a into main May 17, 2026
21 checks passed

carlos-alm deleted the perf/1112-exclude-bench-fixtures branch May 17, 2026 23:24

github-actions Bot locked and limited conversation to collaborators May 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep#1134

perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep#1134
carlos-alm merged 7 commits into
mainfrom
perf/1112-exclude-bench-fixtures

carlos-alm commented May 15, 2026

Uh oh!

claude Bot commented May 15, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 15, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot May 15, 2026

Uh oh!

carlos-alm May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026 •

edited

Loading

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

carlos-alm commented May 17, 2026

Uh oh!

carlos-alm commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented May 15, 2026

Summary

Why

Test plan

Uh oh!

claude Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewing PR #1134: Exclude benchmark fixtures from dogfooding

Todo List:

Root Cause Analysis

Backlog Compliance

Code Quality Analysis

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Uh oh!

greptile-apps Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm May 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

carlos-alm commented May 17, 2026

Uh oh!

carlos-alm commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 15, 2026 •

edited

Loading

greptile-apps Bot commented May 15, 2026 •

edited

Loading

github-actions Bot commented May 15, 2026 •

edited

Loading