
Floor baselines: zero-time + empty-dfg — multi-entry on 3 boards #15

Closed

protosphinx wants to merge 2 commits into uniform-baseline from floor-baselines

Conversation

@protosphinx
Member

Stacked on top of #14 (uniform-baseline). Merge order: ... → #13 → #14 → this.

Summary

What's new

  • pm_bench/baselines/zero_time.py: predict_zero_time(targets) returns a TimePrediction with predicted_days=0.0 for every target. ~10 lines.
  • predict --baseline zero --task remaining-time wired through CLI alongside mean.
  • discover --baseline empty writes an empty DFG ({"transitions": []}). Submitting it scores fitness=0, precision=0, F=0 — the absolute conformance floor.
  • New leaderboard entries on synthetic-toy:
    • remaining-time: zero-ref MAE 2.7410 (vs mean-ref 1.3481 — zero loses by ~2x)
    • conformance: empty-ref F 0.0 (vs dfg-ref F 1.0)
  • STANDINGS.md regenerated to show both rows on each board.
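The two floor baselines above are small enough to sketch in full. A minimal sketch, assuming TimePrediction is a simple dataclass with a predicted_days field (the real class in pm_bench may carry more fields) and that the empty DFG is written as the literal JSON shown above:

```python
import json
from dataclasses import dataclass


@dataclass
class TimePrediction:
    # Assumed shape; the actual pm_bench class may differ.
    target_id: str
    predicted_days: float


def predict_zero_time(targets):
    """Floor baseline: predict 0.0 remaining days for every target.

    Any real remaining-time submission must beat this MAE;
    matching it means the submission predicted nothing useful.
    """
    return [TimePrediction(target_id=t, predicted_days=0.0) for t in targets]


def write_empty_dfg(path):
    """Conformance floor: a DFG with no transitions at all.

    Scoring this yields fitness=0, precision=0, F=0 by construction.
    """
    with open(path, "w") as f:
        json.dump({"transitions": []}, f)
```

Because the zero-time baseline is constant, its MAE on a board equals the mean absolute remaining time of the targets, which is why it loses to the mean baseline by roughly 2x here.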

Why

  • Floor entries make "did the submission do anything at all?" answerable at a glance. A real submission must clear them; tying them is a red flag.
  • Demonstrates the leaderboard handles scores at both ends of the range — the rescore path was already correct, but having entries near the floor exercises the standings sort and table rendering.

Test plan

  • pytest -q — 117 passed (unchanged; no new tests — the existing multi-entry-sort test already covers the principle on next-event)
  • ruff check pm_bench tests — clean
  • pm-bench leaderboard --all --verify — 5/5 boards OK; 3 boards now show 2 entries
  • STANDINGS.md staleness canary green

Roadmap impact

  • Outcome and bottleneck still have a single entry each (prior-ref and mean-wait-ref). Floor baselines for those are easy follow-ups (constant 0.5 / random rank), but they're less informative — both already have a clear "0.5 floor" semantically. Skipping for now.
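For reference, the deferred outcome floor mentioned above would be a one-liner. A hypothetical sketch — the function name and return shape are illustrative, not from the repo:

```python
def predict_constant_half(targets):
    """Hypothetical floor baseline for the outcome board:
    predict probability 0.5 for every case, the semantic
    "coin flip" floor the description refers to."""
    return {t: 0.5 for t in targets}
```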

- pm_bench/baselines/zero_time.py: predicts 0 days for every prefix
  (absolute MAE floor)
- discover --baseline empty: submits an empty DFG (fitness 0, F 0 —
  absolute conformance floor)
- CLI: predict --baseline zero --task remaining-time wired alongside
  mean; discover --baseline empty wired alongside dfg
- New leaderboard entries:
  * remaining-time/synthetic-toy: zero-ref MAE 2.7410 vs mean-ref 1.3481
  * conformance/synthetic-toy:    empty-ref F 0.0    vs dfg-ref F 1.0
- 3 of 5 boards now have 2 entries (next-event already had uniform-ref
  from the previous PR; outcome and bottleneck still single-entry
  pending future submissions)
- STANDINGS.md regenerated; 117 tests, ruff clean
Sweep across STATUS.md, baselines (uniform, zero_time), stats.py,
cli.py, and the leaderboard JSON fixtures. ASCII-only punctuation.
Tests pass unchanged (117); ruff clean.
@protosphinx
Member Author

Merged into main as part of the audit-cleanup stack (commit 9c00b47). The full content of this PR is now on main.

@protosphinx protosphinx deleted the branch uniform-baseline May 1, 2026 17:54
@protosphinx protosphinx closed this May 1, 2026
@protosphinx protosphinx deleted the floor-baselines branch May 1, 2026 17:54
