Commit ad6c78d
ci(bench): gate release benchmark on engine parity thresholds
The Benchmark workflow already runs after every release (on workflow_run completion of Publish). Add a parity gate to the build-benchmark job so drift between the native and wasm engines fails the workflow — catching regressions the benchmark data would otherwise silently record.

The gate runs after the benchmark doc PR is created, so the raw numbers still land in generated/benchmarks/BUILD-BENCHMARKS.md even when parity regresses; only the workflow status goes red to alert maintainers.

Thresholds reference the currently-open parity bugs on v3.9.5:

- File-set gap |wasm - native| ≤ 2 (#1011)
- DB size ratio native/wasm ≤ 1.02 (#1010)
- Full-build edges-phase ratio ≤ 1.30 (#1013)
- Full-build roles-phase ratio ≤ 1.30 (#1013)
- 1-file incremental ratio ≤ 1.50 (#1012)

The gate writes a markdown table to $GITHUB_STEP_SUMMARY showing pass/fail per threshold with a direct link to the tracking issue, so reviewers see the regression at a glance without digging through logs. No behavior change on the passing path — when both engines are within thresholds the step exits 0 silently.

Impact: 2 functions changed, 2 affected
1 parent 176e3f0 commit ad6c78d

2 files changed

Lines changed: 128 additions & 0 deletions


.github/workflows/benchmark.yml

Lines changed: 7 additions & 0 deletions
```diff
@@ -198,6 +198,13 @@ jobs:
             --body "Automated build benchmark update for **${VERSION}** from workflow run [#${{ github.run_number }}](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }})."
           fi
 
+      # Engine-parity gate: runs AFTER the doc PR is created so the PR still
+      # records raw benchmark data even when parity regresses. The job status
+      # going red alerts maintainers; the linked issues describe each threshold.
+      - name: Engine parity gate
+        if: steps.existing.outputs.skip != 'true'
+        run: node scripts/benchmark-parity-gate.mjs benchmark-result.json
+
   embedding-benchmark:
     runs-on: ubuntu-latest
     # 7 models x 30 min each = 210 min worst-case; symbols are sampled to 1500 so
```

scripts/benchmark-parity-gate.mjs

Lines changed: 121 additions & 0 deletions
```js
#!/usr/bin/env node
/**
 * Engine parity gate — runs after the release build benchmark.
 *
 * Reads the merged benchmark-result.json (contains `wasm` and `native` blocks)
 * and fails the workflow if the gap between engines breaches a documented
 * threshold. A failure here doesn't block the release (benchmark runs *after*
 * Publish completes); it surfaces regressions to maintainers via the workflow's
 * red status and writes a summary to $GITHUB_STEP_SUMMARY.
 *
 * Thresholds reference the parity bugs open against v3.9.5:
 *   - #1010 DB size / excess ast_nodes
 *   - #1011 Native orchestrator drops files
 *   - #1012 Native 1-file incremental runs globally
 *   - #1013 Native full-build edges/roles phases
 *
 * The per-metric thresholds apply only when BOTH engines produced results.
 * If either engine is missing from the result file, the gate fails
 * immediately: parity cannot be asserted, and the step summary names the
 * missing engine so maintainers see which run to investigate.
 */
import fs from 'node:fs';

const resultFile = process.argv[2];
if (!resultFile) {
  console.error('Usage: benchmark-parity-gate.mjs <benchmark-result.json>');
  process.exit(2);
}

const result = JSON.parse(fs.readFileSync(resultFile, 'utf8'));
const { wasm, native, version } = result;

const summaryFile = process.env.GITHUB_STEP_SUMMARY;
const writeSummary = (text) => {
  if (summaryFile) fs.appendFileSync(summaryFile, text);
};

function line(s = '') {
  console.log(s);
  writeSummary(`${s}\n`);
}

line(`## Engine parity gate — v${version}`);
line('');

if (!wasm || !native) {
  const missing = [!wasm && 'wasm', !native && 'native'].filter(Boolean).join(', ');
  line(`**FAIL:** missing engine result for: ${missing}. Benchmark cannot assert parity.`);
  process.exit(1);
}

// ── Thresholds ─────────────────────────────────────────────────────────
// Each entry:
//   name      — human-readable label
//   actual    — computed metric
//   limit     — ceiling; actual must be ≤ limit
//   formatter — how to render the value
//   tracks    — related issue link shown on failure
const checks = [
  {
    name: 'File-set gap (|wasm − native|)',
    actual: Math.abs(wasm.files - native.files),
    limit: 2,
    formatter: (v) => String(v),
    tracks: '#1011',
  },
  {
    name: 'DB size ratio (native / wasm)',
    actual: native.dbSizeBytes / wasm.dbSizeBytes,
    limit: 1.02,
    formatter: (v) => v.toFixed(3),
    tracks: '#1010',
  },
  {
    name: 'Full-build edges-phase ratio',
    actual: (native.phases?.edgesMs ?? 0) / Math.max(wasm.phases?.edgesMs ?? 1, 1),
    limit: 1.3,
    formatter: (v) => v.toFixed(2),
    tracks: '#1013',
  },
  {
    name: 'Full-build roles-phase ratio',
    actual: (native.phases?.rolesMs ?? 0) / Math.max(wasm.phases?.rolesMs ?? 1, 1),
    limit: 1.3,
    formatter: (v) => v.toFixed(2),
    tracks: '#1013',
  },
  {
    name: '1-file incremental ratio',
    actual:
      (native.oneFileRebuildMs ?? 0) /
      Math.max(wasm.oneFileRebuildMs ?? 1, 1),
    limit: 1.5,
    formatter: (v) => v.toFixed(2),
    tracks: '#1012',
  },
];

line('| Check | Actual | Limit | Status | Tracks |');
line('|---|---:|---:|---|---|');

let failed = 0;
for (const c of checks) {
  const ok = c.actual <= c.limit;
  if (!ok) failed++;
  const status = ok ? ':white_check_mark: pass' : ':x: **fail**';
  line(
    `| ${c.name} | ${c.formatter(c.actual)} | ${c.formatter(c.limit)} | ${status} | ${c.tracks} |`,
  );
}

line('');
if (failed > 0) {
  line(
    `**${failed} parity check(s) failed.** See linked issues for root-cause tracking; the benchmark doc PR (if opened) captures the raw numbers.`,
  );
  process.exit(1);
}

line('All parity checks passed.');
```