fix(scripts): trend annotations fall back to nearest non-null prior release#1120
Conversation
…nnotations scripts/update-incremental-report.ts compared each release's metrics against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5 had null after both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank, hiding real regressions: 3.9.6 wasm full-build was 14.0s vs 3.9.4's 7.6s (~86% slower) and no-op was 131ms vs 19ms (~589% slower), but the table showed no arrow at all. Replace the per-release prev lookup with findPrevMetric — a per-metric walk back through history that skips releases without a value for that specific metric. Apply consistently in engineRow (the table) and the regression-warning block. Regenerated the report so existing trend annotations now reflect the fallback (3.9.6, 3.8.0 native, 3.6.0 wasm). Closes #1043
|
Claude finished @carlos-alm's task —— View job PR Review CompleteFinal AssessmentRoot Cause AnalysisProblem Being Solved: The incremental benchmark script was showing blank trend arrows when comparing against releases with null metrics (e.g., when workers timed out and stored null values). For example, version 3.9.6 had 86% slower WASM builds and 589% slower no-ops compared to 3.9.4, but showed no trend arrows because it only looked at the immediate prior release (3.9.5) which had null metrics. Why This Approach: The PR introduces Risk Assessment: Low risk. This is a reporting/analytics fix with comprehensive test coverage. No runtime or CLI behavior changes. Backlog Compliance
Code Quality ReviewImplementation Quality ⭐⭐⭐⭐☆The
Test Coverage ⭐⭐⭐⭐⭐Excellent test implementation (
Code Changes Analysis
|
Greptile SummaryThis PR fixes silent blank trend cells in the incremental benchmark report by replacing the single-step
Confidence Score: 5/5Safe to merge — the change is narrowly scoped to the trend-annotation and regression-detection paths of a benchmark reporting script with no production side-effects. The findPrevMetric walk is correct: it starts one position past the current index, skips dev entries, and returns the first non-null getter result. The trend() zero-baseline guard and the resolve-metric hoisting both address previously flagged gaps. The new test exercises the null-skipping path end-to-end with a real script invocation against a temp file. No behavioral regressions were identified in the regression-detection block refactor. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["engineRow called"] --> B["e = hist[i][engineKey]"]
B --> C{e exists?}
C -- No --> D["return null"]
C -- Yes --> E["findPrevMetric walk"]
E --> F["i = fromIdx + 1"]
F --> G{version is dev?}
G -- Yes --> F
G -- No --> H["apply getter"]
H --> I{value not null?}
I -- No --> F
I -- Yes --> J["return value to trend()"]
F --> K["no match found, return null"]
J --> L["trend(current, prev)"]
K --> L
L --> M{prev null or zero?}
M -- Yes --> N["return empty string"]
M -- No --> O["pct = cur-prev / prev * 100"]
O --> P{abs pct less than 2?}
P -- Yes --> Q["return tilde"]
P -- No --> R["return arrow annotation"]
Reviews (9): Last reviewed commit: "Merge branch 'main' into fix/incremental..." | Re-trigger Greptile |
| for (let i = 0; i < history.length; i++) { | ||
| const h = history[i]; | ||
| const prev = findPrevRelease(history, i); | ||
|
|
||
| const nativeRow = engineRow(h, prev, 'native'); | ||
| const wasmRow = engineRow(h, prev, 'wasm'); | ||
| const nativeRow = engineRow(history, i, 'native'); | ||
| const wasmRow = engineRow(history, i, 'wasm'); | ||
| if (nativeRow) md += nativeRow + '\n'; | ||
| if (wasmRow) md += wasmRow + '\n'; | ||
| } |
There was a problem hiding this comment.
The
natT/jsT resolve trends are computed inside engineRow, which is called once for each engine type per release. Since h.resolve is release-level (not engine-specific), findPrevMetric for nativeBatchMs and jsFallbackMs is invoked twice per release — returning identical results both times. This is harmless but wasteful for long history arrays. Consider hoisting the resolve trend calls to the call site, alongside the nativeRow/wasmRow pair, so they are computed once and passed in.
| for (let i = 0; i < history.length; i++) { | |
| const h = history[i]; | |
| const prev = findPrevRelease(history, i); | |
| const nativeRow = engineRow(h, prev, 'native'); | |
| const wasmRow = engineRow(h, prev, 'wasm'); | |
| const nativeRow = engineRow(history, i, 'native'); | |
| const wasmRow = engineRow(history, i, 'wasm'); | |
| if (nativeRow) md += nativeRow + '\n'; | |
| if (wasmRow) md += wasmRow + '\n'; | |
| } | |
| for (let i = 0; i < history.length; i++) { | |
| // Resolve metrics are release-level; pre-compute their trends once per | |
| // release so engineRow doesn't redundantly walk history twice. | |
| const prevNative = findPrevMetric(history, i, (x) => x.resolve?.nativeBatchMs); | |
| const prevJs = findPrevMetric(history, i, (x) => x.resolve?.jsFallbackMs); | |
| const nativeRow = engineRow(history, i, 'native', prevNative, prevJs); | |
| const wasmRow = engineRow(history, i, 'wasm', prevNative, prevJs); | |
| if (nativeRow) md += nativeRow + '\n'; | |
| if (wasmRow) md += wasmRow + '\n'; | |
| } |
There was a problem hiding this comment.
Fixed in 628d37d — hoisted findPrevMetric for the release-level resolve metrics (nativeBatchMs, jsFallbackMs) to the call site and pass them into engineRow, so they are computed once per release instead of twice.
Codegraph Impact Analysis3 functions changed → 2 callers affected across 1 files
|
…lookup (#1120) - trend(): treat previous === 0 the same as null to avoid emitting '↑Infinity%' if a historical entry stores 0 for a metric. Matches the existing previous === 0 guard in checkRegression. - engineRow(): accept pre-computed prevResolveNative/prevResolveJs from the caller instead of recomputing them inside the function. Resolve metrics are release-level (not engine-specific), so the previous code walked history twice per release (once for native, once for wasm) returning identical results both times.
|
Addressed Greptile review feedback (commit 628d37d):
|
Summary
scripts/update-incremental-report.tscompared each release against the immediately prior release only. When that prior release had null build/rebuild metrics (e.g. 3.9.5, where both engine workers were SIGKILL'd on timeout), the trend cell was silently emitted blank — hiding real regressions.findPrevMetric— a per-metric walk back through history that skips releases without a value for that specific metric.engineRow(the version-history table) and the regression-warning block.generated/benchmarks/INCREMENTAL-BENCHMARKS.mdso the existing rows reflect the fallback. Notable previously-blank rows now annotated: 3.9.6 (vs 3.9.4), 3.8.0 native (vs 3.6.0 native), 3.6.0 wasm (vs 3.4.1 wasm).tests/unit/incremental-report-trend-fallback.test.tsgated on theCODEGRAPH_INCREMENTAL_REPORT_PATHenv override so the script can target a temp file from tests.Closes #1043
Test plan
npx vitest run tests/unit/incremental-report-trend-fallback.test.ts— both cases passINCREMENTAL-BENCHMARKS.mdconfirms previously-blank trend cells now show the expected fallback annotationsnpm run lint— clean