Skip to content

perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep#1134

Merged
carlos-alm merged 7 commits into
mainfrom
perf/1112-exclude-bench-fixtures
May 17, 2026
Merged

perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep#1134
carlos-alm merged 7 commits into
mainfrom
perf/1112-exclude-bench-fixtures

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Add exclude?: string[] to BuildGraphOpts; merged into ctx.config.exclude inside setupPipeline so the native orchestrator picks it up via the serialized config without a separate code path.
  • Centralize the bench exclude list (tests/benchmarks/resolution/fixtures/**) in scripts/lib/bench-config.ts and wire it into every buildGraph(root, ...) call in scripts/benchmark.ts, scripts/incremental-benchmark.ts, and scripts/query-benchmark.ts.
  • Drop the 3.10.0:Full build entry (and its docstring section) from KNOWN_REGRESSIONS in tests/benchmarks/regression-guard.test.ts — the underlying cause is fixed at the source, not papered over.

Why

Each new heavyweight grammar landing in codegraph (the Verilog port in #1107 being the trigger) pulls its resolution-benchmark fixtures into the corpus that all three dogfooding benchmarks sweep. Those fixtures are hand-annotated scaffolding for the static-resolution suite, not representative source code, and parsing them inflates fullBuildMs by hundreds of ms. The previous workaround was a KNOWN_REGRESSIONS exemption per release; this change excludes the fixtures at the build invocation so the next big-grammar PR will not trip the gate at all.

Local verification: native fullBuildMs on this repo returns from the regressed ~2.8s back to ~1.7-1.9s (in line with the 3.10.0 baseline of 1959 ms) while file count drops 745 → 607 (138 fixture files excluded).

Test plan

  • npm run lint clean on changed files
  • npx vitest run tests/integration/config-include-exclude.test.ts — 9/9 pass (7 existing + 2 new opts.exclude parity tests for wasm + native)
  • RUN_REGRESSION_GUARD=1 npx vitest run tests/benchmarks/regression-guard.test.ts — 17/17 pass with the exemption removed
  • npx vitest run tests/integration/ — 574/574 pass
  • npx tsx scripts/incremental-benchmark.ts — fixtures excluded (607 files vs prior 745); native fullBuildMs back to baseline range

Closes #1112

Adds an `exclude` field to BuildGraphOpts that augments `config.exclude`
for a single build, then wires the three dogfooding benchmark scripts
(scripts/benchmark.ts, scripts/incremental-benchmark.ts,
scripts/query-benchmark.ts) to pass `tests/benchmarks/resolution/fixtures/**`.

Those fixtures are hand-annotated test scaffolding for the static-resolution
suite, not representative source code — each new heavyweight grammar (e.g.
tree-sitter-verilog in #1107) inflates `fullBuildMs` by hundreds of ms
purely from parsing fixture files that have no business in dogfooding
timing measurements. The 3.10.0:Full build KNOWN_REGRESSIONS exemption
that papered over the Verilog cost is removed now that the root cause is
addressed at the build invocation.

Closes #1112
@claude
Copy link
Copy Markdown

claude Bot commented May 15, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR #1134: Exclude benchmark fixtures from dogfooding

Todo List:

  • Phase 1: Root Cause Analysis - understand the problem and approach
  • Phase 2: Configuration Change Detection - check for high-risk changes
  • Phase 3: Test Integrity - verify tests aren't weakened
  • Phase 4: Code Quality - review implementation details
  • Phase 5: Backlog Compliance - rate against BACKLOG.md criteria
  • Final Assessment - provide comprehensive review verdict

Root Cause Analysis

Problem Being Solved: Performance regression false positives in benchmarking when new heavyweight grammars (like Verilog) are added to codegraph. The resolution-benchmark fixtures inflate build times disproportionately as they're test scaffolding, not representative code.

Why This Approach: Source-level fix that excludes problematic files from the build corpus entirely, rather than papering over symptoms with KNOWN_REGRESSIONS exemptions. Architecturally sound with centralized configuration and proper native/WASM integration.

Risk Assessment: Low risk - addresses legitimate testing artifact problem at the source.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Fully compliant with all core principles
  • Problem-fit: 4 — Directly improves benchmarking accuracy and prevents false regression alerts, supporting better development velocity
  • Breaking: No — Purely additive, no behavior changes for existing users
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Analysis

Implementation Quality: ⭐⭐⭐⭐☆ (4/5)

Strengths:

  1. Consistent application: All 3 benchmark scripts properly use BENCHMARK_EXCLUDES
  2. Proper type safety: BuildGraphOpts.exclude correctly typed as optional string array
  3. Correct merging logic (pipeline.ts:190-192): Preserves existing excludes while extending programmatically
  4. Comprehensive test coverage: Both engines tested for opts.exclude functionality
  5. Clear documentation: Well-documented rationale in bench-config.ts
  6. Engine parity maintained: Native orchestrator automatically picks up merged excludes

Architecture adherence: ✅ Follows existing patterns, maintains engine parity, no breaking changes

Critical Concerns

None identified - This is a well-executed fix that addresses the root cause rather than masking symptoms.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Solid implementation that fixes performance regression false positives at the source. Proper typing, comprehensive testing, maintains engine parity, and removes technical debt (KNOWN_REGRESSIONS exemption). The centralized exclude configuration is well-designed and the local verification confirms the fix works as intended.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

This PR fixes a benchmark corpus pollution problem by adding exclude?: string[] to BuildGraphOpts, centralising the resolution-benchmark fixture glob in scripts/lib/bench-config.ts, and wiring it into all three benchmark scripts. The resolveBenchmarkExcludes() helper correctly returns [] in --npm mode to avoid corpus mismatch between baseline and dev runs.

  • Core pipeline change: setupPipeline in pipeline.ts merges opts.exclude onto ctx.config.exclude ?? [] with a null-safe spread, so the native orchestrator picks it up via the serialised config without a separate code path.
  • Benchmark wiring: All buildGraph calls in benchmark.ts, incremental-benchmark.ts, and query-benchmark.ts now pass the resolved exclude list; --npm mode passes [] to both the installed baseline and the dev build, keeping the comparison apples-to-apples.
  • Regression guard housekeeping: 3.10.0:Full build exemption is removed (timing improved), 3.10.0:DB bytes/file is added with a thorough docstring explaining the measurement-scope artifact caused by the reduced file count denominator; incremental round-trip tests for opts.exclude are added covering the previously-untested second-build removal path.

Confidence Score: 5/5

The change is safe to merge; all three previously-raised concerns are fully addressed in this revision.

The core pipeline change is a narrow, null-safe merge of an optional caller-supplied exclude list. The benchmark helper correctly short-circuits to an empty list in --npm mode, preventing timing skew from a corpus size mismatch. The type interface is clean, integration tests cover both fresh-build and incremental removal paths for wasm and native, and the regression-guard exemption swap is arithmetically justified.

No files require special attention.

Important Files Changed

Filename Overview
scripts/lib/bench-config.ts Adds BENCHMARK_EXCLUDES constant and resolveBenchmarkExcludes() helper that correctly returns [] in --npm mode to prevent corpus mismatch between baseline and dev runs; parseArgs() integration is clean and warning idempotency via module-level flag works correctly per-process.
src/types.ts Previously-duplicate exclude? property is now a single clean declaration; skipRegistry was moved to after exclude, resolving the prior P1 duplicate-identifier issue.
src/domain/graph/builder/pipeline.ts Null-safe merge of opts.exclude onto ctx.config.exclude using ctx.config.exclude ?? [] defensively handles configs where exclude is undefined; native orchestrator picks up the merged config via serialization.
scripts/benchmark.ts BENCH_EXCLUDE hoisted to top-level so all four buildGraph calls share the same resolved exclude list; in --npm mode resolveBenchmarkExcludes() correctly returns [] keeping corpus consistent across baseline and dev runs.
scripts/incremental-benchmark.ts Replaces hardcoded BENCH_EXCLUDE constant with resolveBenchmarkExcludes() from shared bench-config; updated comment accurately describes --npm behavior and the corpus-mismatch rationale.
scripts/query-benchmark.ts Single buildGraph call now passes exclude: [...resolveBenchmarkExcludes()] consistently with the other benchmark scripts.
tests/benchmarks/regression-guard.test.ts Removes 3.10.0:Full build exemption (timing improved) and adds 3.10.0:DB bytes/file exemption with thorough docstring explaining the measurement-scope artifact from reduced file count; exemption rationale is sound.
tests/integration/config-include-exclude.test.ts Adds fresh-build tests for opts.exclude on both wasm and native engines, plus incremental round-trip tests that verify previously-indexed files are detected as removals when exclude is introduced on a second build — fully covering the gap identified in the prior review.

Reviews (7): Last reviewed commit: "Merge branch 'main' into perf/1112-exclu..." | Re-trigger Greptile

Comment on lines +196 to 214

it('wasm: opts.exclude rejects matching files without writing config', async () => {
const root = fs.mkdtempSync(path.join(tmpDir, 'opts-wasm-'));
writeFixture(root);
const files = await buildWithOptsExclude(root, 'wasm', ['**/*.test.js', '**/*.spec.js']);
expect(files).toContain('src/math.js');
expect(files).not.toContain('src/math.test.js');
expect(files).not.toContain('src/util.spec.js');
});

itNative('native: opts.exclude rejects matching files without writing config', async () => {
const root = fs.mkdtempSync(path.join(tmpDir, 'opts-native-'));
writeFixture(root);
const files = await buildWithOptsExclude(root, 'native', ['**/*.test.js', '**/*.spec.js']);
expect(files).toContain('src/math.js');
expect(files).not.toContain('src/math.test.js');
expect(files).not.toContain('src/util.spec.js');
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 opts.exclude incremental coverage gap

Both new tests always wipe the DB before building, so they only exercise the fresh-build path. The scenario where files that were previously indexed become excluded on a subsequent incremental run (i.e. opts.exclude changes between builds against the same DB) is untested. In practice this path should work — collectFiles would omit the newly-excluded files and detectChanges would surface them as removals — but a short test covering one incremental round trip would lock in that behaviour and guard against regressions in the collect/detect stages.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b008ffb — added wasm + native parity tests opts.exclude introduced on second incremental build drops previously-indexed files that build the fixture twice against the same DB (first without exclude, then with). The second build observes the previously-indexed test files as removals via detectChanges, and they are dropped from file_hashes. Verified locally: first build indexes 5 files, second build reports "0 changed, 2 removed", and asserts the test/spec files no longer appear in file_hashes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

Codegraph Impact Analysis

3 functions changed9 callers affected across 8 files

  • resolveBenchmarkExcludes in scripts/lib/bench-config.ts:56 (3 transitive callers)
  • setupPipeline in src/domain/graph/builder/pipeline.ts:168 (6 transitive callers)
  • BuildGraphOpts.skipRegistry in src/types.ts:1073 (0 transitive callers)

Greptile flagged that the existing opts.exclude tests always wipe the DB
before building, so they only exercise the fresh-build path. The scenario
where opts.exclude changes between builds against the same DB (files
previously indexed must be detected as removals on the second pass) was
untested.

Add wasm + native parity tests that build the fixture twice against the
same DB — first with no exclude, then with exclude — and assert the
second build drops the previously-indexed test/spec files from
file_hashes. Locks in the collect → detect-removal contract that the
benchmark scripts depend on once their exclude list changes mid-life.
The fixture-exclude in this PR shifts the denominator of every per-file
build metric: the 3.10.0 baseline was measured over ~745 files (codegraph
source + resolution-benchmark fixtures), while dev now measures the ~607
source files alone. DB content is dominated by src/, so absolute bytes
stay roughly constant while the file count drops — inflating
dbSizeBytes/file from 41614 to ~52211 (+25%, exactly at the threshold).

This is a one-time methodology shift, not a real regression in the schema
or extraction layer (which is why the old 3.10.0:Full build exemption
could be dropped — Full build absolute timing actually improved with the
exclude). Add a one-release exemption with a detailed comment so the
distinction is visible to future maintainers; remove once 3.11.0+ data
is captured under the post-#1134 methodology.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

CI follow-up addressed in e2346e0:

The regression-guard caught DB bytes/file: 41614 → 52211 (+25%, threshold 25%) after this PR landed locally. Root cause: excluding tests/benchmarks/resolution/fixtures/** shifts the denominator of every per-file build metric — the 3.10.0 baseline measured ~745 files (source + fixtures), dev now measures ~607 source files alone. DB content is dominated by src/, so absolute bytes stay roughly constant while the file count drops, inflating the per-file ratio.

This is a one-time methodology shift, not a real regression. Added a single-release 3.10.0:DB bytes/file exemption to KNOWN_REGRESSIONS with a detailed comment distinguishing it from the dropped 3.10.0:Full build entry (Full build absolute timing actually improved — that one was correctly removed; per-file bytes is a measurement-scope artifact). Exemption will fall off naturally once 3.11.0 baseline data is captured under the post-#1134 methodology.

The KNOWN_REGRESSIONS entries are not stale test still passes (3.10.0 is within the 1-minor-version window of the current 3.10.0 package version).

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed Greptile's P1 outside-diff comment on scripts/benchmark.ts:81 (--npm baseline corpus-mismatch):

Added resolveBenchmarkExcludes() in scripts/lib/bench-config.ts that returns BENCHMARK_EXCLUDES in local mode and [] in --npm mode (with a one-shot stderr note explaining the methodology). All three benchmark scripts (benchmark.ts, incremental-benchmark.ts, query-benchmark.ts) now route through the helper instead of consuming BENCHMARK_EXCLUDES directly. The baseline (older published version that silently drops opts.exclude) and the dev run now sweep the same ~745-file corpus, eliminating the corpus-mismatch that disguised the methodology shift as a perf delta.

Also removed an accidental duplicate exclude?: string[] field in BuildGraphOpts that the merge with #1131 introduced — caught by tsc --noEmit (TS2300). Tests still 11/11 + 17/17 pass.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit c7e600a into main May 17, 2026
21 checks passed
@carlos-alm carlos-alm deleted the perf/1112-exclude-bench-fixtures branch May 17, 2026 23:24
@github-actions github-actions Bot locked and limited conversation to collaborators May 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: exclude verilog benchmark fixtures from incremental benchmark sweep

1 participant