
Floor baselines: zero-time + empty-dfg — multi-entry on 3 boards #15

Closed

protosphinx wants to merge 2 commits into uniform-baseline from floor-baselines

Conversation

@protosphinx
Member

Stacked on top of #14 (uniform-baseline). Merge order: ... → #13 → #14 → this.

Summary

What's new

  • pm_bench/baselines/zero_time.py: predict_zero_time(targets) returns a TimePrediction with predicted_days=0.0 for every target. ~10 lines.
  • predict --baseline zero --task remaining-time wired through CLI alongside mean.
  • discover --baseline empty writes an empty DFG ({"transitions": []}). Submitting it scores fitness=0, precision=0, F=0 — the absolute conformance floor.
  • New leaderboard entries on synthetic-toy:
    • remaining-time: zero-ref MAE 2.7410 (vs mean-ref 1.3481 — zero loses by ~2x)
    • conformance: empty-ref F 0.0 (vs dfg-ref F 1.0)
  • STANDINGS.md regenerated to show both rows on each board.
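The two floor baselines above are small enough to sketch in full. A minimal sketch, assuming TimePrediction is a simple dataclass with a predicted_days field (the real class in pm_bench may carry more fields) and that the empty DFG is written as the literal JSON shown above:

```python
import json
from dataclasses import dataclass


@dataclass
class TimePrediction:
    # Assumed shape; the actual pm_bench class may differ.
    target_id: str
    predicted_days: float


def predict_zero_time(targets):
    """Floor baseline: predict 0.0 remaining days for every target.

    Any real remaining-time submission must beat this MAE;
    matching it means the submission predicted nothing useful.
    """
    return [TimePrediction(target_id=t, predicted_days=0.0) for t in targets]


def write_empty_dfg(path):
    """Conformance floor: a DFG with no transitions at all.

    Scoring this yields fitness=0, precision=0, F=0 by construction.
    """
    with open(path, "w") as f:
        json.dump({"transitions": []}, f)
```

Because the zero-time baseline is constant, its MAE on a board equals the mean absolute remaining time of the targets, which is why it loses to the mean baseline by roughly 2x here.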

Why

  • Floor entries make "did the submission do anything at all?" answerable at a glance. A real submission must clear them; tying them is a red flag.
  • Demonstrates the leaderboard handles scores at both ends of the range — the rescore path was already correct, but having entries near the floor exercises the standings sort and table rendering.

Test plan

  • pytest -q — 117 passed (unchanged; no new tests — the existing multi-entry-sort test already covers the principle on next-event)
  • ruff check pm_bench tests — clean
  • pm-bench leaderboard --all --verify — 5/5 boards OK; 3 boards now show 2 entries
  • STANDINGS.md staleness canary green

Roadmap impact

  • Outcome and bottleneck still have a single entry each (prior-ref and mean-wait-ref). Floor baselines for those are easy follow-ups (constant 0.5 / random rank), but they're less informative — both already have a clear "0.5 floor" semantically. Skipping for now.
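For reference, the deferred outcome floor mentioned above would be a one-liner. A hypothetical sketch — the function name and return shape are illustrative, not from the repo:

```python
def predict_constant_half(targets):
    """Hypothetical floor baseline for the outcome board:
    predict probability 0.5 for every case, the semantic
    "coin flip" floor the description refers to."""
    return {t: 0.5 for t in targets}
```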

- pm_bench/baselines/zero_time.py: predicts 0 days for every prefix
  (absolute MAE floor)
- discover --baseline empty: submits an empty DFG (fitness 0, F 0 —
  absolute conformance floor)
- CLI: predict --baseline zero --task remaining-time wired alongside
  mean; discover --baseline empty wired alongside dfg
- New leaderboard entries:
  * remaining-time/synthetic-toy: zero-ref MAE 2.7410 vs mean-ref 1.3481
  * conformance/synthetic-toy:    empty-ref F 0.0    vs dfg-ref F 1.0
- 3 of 5 boards now have 2 entries (next-event already had uniform-ref
  from the previous PR; outcome and bottleneck still single-entry
  pending future submissions)
- STANDINGS.md regenerated; 117 tests, ruff clean
Sweep across STATUS.md, baselines (uniform, zero_time), stats.py,
cli.py, and the leaderboard JSON fixtures. ASCII-only punctuation.
Tests pass unchanged (117); ruff clean.
@protosphinx
Member Author

Merged into main as part of the audit-cleanup stack (commit 9c00b47). The full content of this PR is now on main.

@protosphinx protosphinx deleted the branch uniform-baseline May 1, 2026 17:54
@protosphinx protosphinx closed this May 1, 2026
@protosphinx protosphinx deleted the floor-baselines branch May 1, 2026 17:54
