fix(scripts): trend annotations fall back to nearest non-null prior release by carlos-alm · Pull Request #1120 · optave/ops-codegraph-tool

carlos-alm · 2026-05-14T21:10:02Z

Summary

scripts/update-incremental-report.ts compared each release against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5, where both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank — hiding real regressions.
3.9.6 wasm full-build was 14.0s vs 3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower), but the table showed no arrow at all.
Replace the per-release lookup with findPrevMetric — a per-metric walk back through history that skips releases without a value for that specific metric.
Apply the same fallback in engineRow (the version-history table) and the regression-warning block.
Regenerated generated/benchmarks/INCREMENTAL-BENCHMARKS.md so the existing rows reflect the fallback. Notable previously-blank rows now annotated: 3.9.6 (vs 3.9.4), 3.8.0 native (vs 3.6.0 native), 3.6.0 wasm (vs 3.4.1 wasm).
Added regression test tests/unit/incremental-report-trend-fallback.test.ts gated on the CODEGRAPH_INCREMENTAL_REPORT_PATH env override so the script can target a temp file from tests.

Note: PR #1119 (closing #1042) introduces the same env override. Whichever lands first, the other rebases trivially.

Test plan

npx vitest run tests/unit/incremental-report-trend-fallback.test.ts — both cases pass
Visual inspection of the regenerated INCREMENTAL-BENCHMARKS.md confirms previously-blank trend cells now show the expected fallback annotations
npm run lint — clean
CI green

…nnotations scripts/update-incremental-report.ts compared each release's metrics against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5 had null after both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank, hiding real regressions: 3.9.6 wasm full-build was 14.0s vs 3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower), but the table showed no arrow at all. Replace the per-release prev lookup with findPrevMetric — a per-metric walk back through history that skips releases without a value for that specific metric. Apply consistently in engineRow (the table) and the regression-warning block. Regenerated the report so existing trend annotations now reflect the fallback (3.9.6, 3.8.0 native, 3.6.0 wasm). Closes #1043

claude · 2026-05-14T21:10:18Z

Claude finished @carlos-alm's task —— View job

PR Review Complete

Final Assessment

Root Cause Analysis

Problem Being Solved: The incremental benchmark script was showing blank trend arrows when comparing against releases with null metrics (e.g., when workers timed out and stored null values). For example, version 3.9.6 had 86% slower WASM builds and 589% slower no-ops compared to 3.9.4, but showed no trend arrows because it only looked at the immediate prior release (3.9.5) which had null metrics.

Why This Approach: The PR introduces findPrevMetric() which walks back through release history to find the nearest non-null value for each specific metric. This ensures trend annotations reflect actual performance changes rather than being hidden by intermediate releases with incomplete data.

Risk Assessment: Low risk. This is a reporting/analytics fix with comprehensive test coverage. No runtime or CLI behavior changes.

Backlog Compliance

Zero-dep: ✓ — No new dependencies added
Foundation-aligned: ✓ — Pure reporting improvement, maintains zero-cost core
Problem-fit: 2 — Nice improvement for development observability, but tangential to core agent navigation problem
Breaking: No — Only changes generated report content, no CLI/API changes
Tier: 1

Code Quality Review

Implementation Quality ⭐⭐⭐⭐☆

The findPrevMetric() function is well-designed:

Clean separation of concerns - single purpose function
Proper null handling and dev version skipping (lines 64-65)
Applied consistently across all metric types (full build, no-op, one-file, resolve metrics)
Maintains backward compatibility with existing report format

Test Coverage ⭐⭐⭐⭐⭐

Excellent test implementation (tests/unit/incremental-report-trend-fallback.test.ts):

Comprehensive scenarios: Tests both fallback behavior and empty history edge case
Realistic data: Uses actual version patterns that mirror real release structure
Isolation: Uses temp files and proper cleanup with CODEGRAPH_INCREMENTAL_REPORT_PATH env override
Assertions: Verifies specific trend percentages (↑100%, ↑1200%, ↑20%) and negative cases
Edge cases: Handles first release (no prior history) correctly

Code Changes Analysis

`scripts/update-incremental-report.ts` (lines 58-69, 92-97, 101-104, 194-228)

✅ Clean function extraction: findPrevMetric() is pure and reusable
✅ Consistent application: Applied to all metric types (fullBuildMs, noopRebuildMs, etc.)
✅ Regression detection: Also updated to use fallback logic (lines 194-228)
✅ Dev version handling: Properly skips dev versions when finding prior metrics

Test Structure

✅ Node.js compatibility: Uses --experimental-strip-types flag with version detection
✅ Proper isolation: Temp directories, env overrides, cleanup in afterEach
✅ Realistic fixtures: Hand-crafted release history mimicking actual benchmark data

Generated File Updates

The regenerated INCREMENTAL-BENCHMARKS.md shows the fix working correctly:

Line 12: 3.9.6 wasm: ↑86% (vs 3.9.4, not blank due to null 3.9.5)
Line 25: 3.8.0 native: ↓50% (vs 3.6.0, skipping the gaps)
Previously blank trend cells now show meaningful annotations

Minor Observations

Env override pattern: The CODEGRAPH_INCREMENTAL_REPORT_PATH pattern is clean and follows existing conventions
Error handling: Graceful fallback when no prior metric exists (return null)
Performance: No performance impact - just changes lookup logic during report generation

Critical Concerns

None identified. This is a solid bug fix with excellent test coverage.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: Well-implemented fix for a real reporting issue. Clean code, comprehensive tests, and no breaking changes. The solution properly handles edge cases and maintains backward compatibility. This directly improves the usefulness of performance trend data without any downside risk.

greptile-apps · 2026-05-14T21:14:09Z

Greptile Summary

This PR fixes silent blank trend cells in the incremental benchmark report by replacing the single-step findPrevRelease lookup with findPrevMetric — a per-metric backward walk that skips releases whose engine workers crashed and stored null. The trend() helper also gains a previous === 0 guard to prevent ↑Infinity% output.

findPrevMetric walks hist from fromIdx + 1, skips dev entries, and returns the first non-null value from a getter — applied to per-engine build times and release-level resolve timings.
engineRow is refactored to accept pre-computed prevResolveNative/prevResolveJs values (hoisted to the call-site loop) so history is only traversed once per release for resolve metrics instead of once per engine type.
Regression detection is updated symmetrically, and a new Vitest integration test exercises the fallback and the no-history base case.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to the trend-annotation and regression-detection paths of a benchmark reporting script with no production side-effects.

The findPrevMetric walk is correct: it starts one position past the current index, skips dev entries, and returns the first non-null getter result. The trend() zero-baseline guard and the resolve-metric hoisting both address previously flagged gaps. The new test exercises the null-skipping path end-to-end with a real script invocation against a temp file. No behavioral regressions were identified in the regression-detection block refactor.

No files require special attention.

Important Files Changed

Filename	Overview
scripts/update-incremental-report.ts	Core logic change: replaces findPrevRelease with findPrevMetric for per-metric null-skipping fallback; trend() gains a zero-baseline guard; engineRow refactored to accept pre-hoisted resolve baselines; regression detection updated symmetrically.
tests/unit/incremental-report-trend-fallback.test.ts	New integration test via execFileSync + temp file: covers the null-skip fallback case (9.9.5 null → compare to 9.9.4) and the empty-history base case (no trend annotations emitted).
generated/benchmarks/INCREMENTAL-BENCHMARKS.md	Regenerated output: previously-blank trend cells for 3.9.6, 3.8.0 native, and 3.6.0 wasm now show fallback annotations (e.g. ↑86%, ↑589%).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["engineRow called"] --> B["e = hist[i][engineKey]"]
    B --> C{e exists?}
    C -- No --> D["return null"]
    C -- Yes --> E["findPrevMetric walk"]
    E --> F["i = fromIdx + 1"]
    F --> G{version is dev?}
    G -- Yes --> F
    G -- No --> H["apply getter"]
    H --> I{value not null?}
    I -- No --> F
    I -- Yes --> J["return value to trend()"]
    F --> K["no match found, return null"]
    J --> L["trend(current, prev)"]
    K --> L
    L --> M{prev null or zero?}
    M -- Yes --> N["return empty string"]
    M -- No --> O["pct = cur-prev / prev * 100"]
    O --> P{abs pct less than 2?}
    P -- Yes --> Q["return tilde"]
    P -- No --> R["return arrow annotation"]

_{Reviews (9): Last reviewed commit: "Merge branch 'main' into fix/incremental..." | Re-trigger Greptile}

greptile-apps · 2026-05-14T21:14:13Z

 for (let i = 0; i < history.length; i++) {
-	const h = history[i];
-	const prev = findPrevRelease(history, i);
-
-	const nativeRow = engineRow(h, prev, 'native');
-	const wasmRow = engineRow(h, prev, 'wasm');
+	const nativeRow = engineRow(history, i, 'native');
+	const wasmRow = engineRow(history, i, 'wasm');
 	if (nativeRow) md += nativeRow + '\n';
 	if (wasmRow) md += wasmRow + '\n';
 }


The natT/jsT resolve trends are computed inside engineRow, which is called once for each engine type per release. Since h.resolve is release-level (not engine-specific), findPrevMetric for nativeBatchMs and jsFallbackMs is invoked twice per release — returning identical results both times. This is harmless but wasteful for long history arrays. Consider hoisting the resolve trend calls to the call site, alongside the nativeRow/wasmRow pair, so they are computed once and passed in.

Suggested change

for (let i = 0; i < history.length; i++) {

const h = history[i];

const prev = findPrevRelease(history, i);

const nativeRow = engineRow(h, prev, 'native');

const wasmRow = engineRow(h, prev, 'wasm');

const nativeRow = engineRow(history, i, 'native');

const wasmRow = engineRow(history, i, 'wasm');

if (nativeRow) md += nativeRow + '\n';

if (wasmRow) md += wasmRow + '\n';

}

for (let i = 0; i < history.length; i++) {

// Resolve metrics are release-level; pre-compute their trends once per

// release so engineRow doesn't redundantly walk history twice.

const prevNative = findPrevMetric(history, i, (x) => x.resolve?.nativeBatchMs);

const prevJs = findPrevMetric(history, i, (x) => x.resolve?.jsFallbackMs);

const nativeRow = engineRow(history, i, 'native', prevNative, prevJs);

const wasmRow = engineRow(history, i, 'wasm', prevNative, prevJs);

if (nativeRow) md += nativeRow + '\n';

if (wasmRow) md += wasmRow + '\n';

}

Fixed in 628d37d — hoisted findPrevMetric for the release-level resolve metrics (nativeBatchMs, jsFallbackMs) to the call site and pass them into engineRow, so they are computed once per release instead of twice.

github-actions · 2026-05-14T21:17:02Z

Codegraph Impact Analysis

3 functions changed → 2 callers affected across 1 files

findPrevMetric in scripts/update-incremental-report.ts:70 (2 transitive callers)
trend in scripts/update-incremental-report.ts:80 (2 transitive callers)
engineRow in scripts/update-incremental-report.ts:97 (1 transitive callers)

…lookup (#1120) - trend(): treat previous === 0 the same as null to avoid emitting '↑Infinity%' if a historical entry stores 0 for a metric. Matches the existing previous === 0 guard in checkRegression. - engineRow(): accept pre-computed prevResolveNative/prevResolveJs from the caller instead of recomputing them inside the function. Resolve metrics are release-level (not engine-specific), so the previous code walked history twice per release (once for native, once for wasm) returning identical results both times.

carlos-alm · 2026-05-15T05:17:55Z

Addressed Greptile review feedback (commit 628d37d):

Hoisted resolve trend lookup (line 130-135): pre-compute prevResolveNative / prevResolveJs once per release at the call site, and pass them into engineRow. Resolve metrics are release-level, so we no longer walk history twice per release (once per engine).
Zero-baseline guard in trend() (line 72-80, from the Comments Outside Diff section): trend() now early-returns '' when previous === 0, matching the existing guard in checkRegression. Prevents ↑Infinity% if a historical metric is genuinely zero.

carlos-alm · 2026-05-15T05:18:19Z

@greptileai

…t-trend-fallback

greptile-apps Bot reviewed May 14, 2026

View reviewed changes

carlos-alm and others added 6 commits May 14, 2026 23:46

Merge branch 'main' into fix/incremental-report-trend-fallback

e86c257

Merge branch 'main' into fix/incremental-report-trend-fallback

f2c47f0

Merge remote-tracking branch 'origin/main' into fix/incremental-repor…

c90a8be

…t-trend-fallback

Merge branch 'main' into fix/incremental-report-trend-fallback

119b8a5

Merge branch 'main' into fix/incremental-report-trend-fallback

01a0264

Merge branch 'main' into fix/incremental-report-trend-fallback

0c4d2c2

carlos-alm merged commit a7c2c00 into main May 17, 2026
21 checks passed

carlos-alm deleted the fix/incremental-report-trend-fallback branch May 17, 2026 03:50

github-actions Bot locked and limited conversation to collaborators May 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(scripts): trend annotations fall back to nearest non-null prior release#1120

fix(scripts): trend annotations fall back to nearest non-null prior release#1120
carlos-alm merged 8 commits into
mainfrom
fix/incremental-report-trend-fallback

carlos-alm commented May 14, 2026

Uh oh!

claude Bot commented May 14, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 14, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot May 14, 2026

Uh oh!

carlos-alm May 15, 2026

Uh oh!

github-actions Bot commented May 14, 2026 •

edited

Loading

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented May 14, 2026

Summary

Test plan

Uh oh!

claude Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete

Final Assessment

Root Cause Analysis

Backlog Compliance

Code Quality Review

Implementation Quality ⭐⭐⭐⭐☆

Test Coverage ⭐⭐⭐⭐⭐

Code Changes Analysis

scripts/update-incremental-report.ts (lines 58-69, 92-97, 101-104, 194-228)

Test Structure

Generated File Updates

Minor Observations

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm May 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codegraph Impact Analysis

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

carlos-alm commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented May 14, 2026 •

edited

Loading

`scripts/update-incremental-report.ts` (lines 58-69, 92-97, 101-104, 194-228)

greptile-apps Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading