
End-to-end loop on synthetic-toy: prefixes + predict + score #2

Merged
protosphinx merged 1 commit into main from end-to-end-loop
May 1, 2026
Conversation

@protosphinx
Member

Summary

  • Wires up prefixes, predict --baseline markov, and score so the README's command sequence actually runs end-to-end on synthetic-toy.
  • Adds the first-order Markov reference baseline — the floor any submission must clear. On synthetic-toy: top-1 0.976, top-3 1.000.
  • Locks the file formats the leaderboard CI will depend on (prefixes.csv, predictions.csv) via tests/test_e2e.py.

What's new

  • pm_bench/prefixes.py — extract (case_id, prefix_idx, prefix, true_next) targets from a split; skips length-1 cases. CSV round-trip.
  • pm_bench/predictions.py — predictions CSV (case_id, prefix_idx, predictions).
  • pm_bench/baselines/markov.py — fit on train cases only; unigram fallback for unseen last-activity. No torch / sklearn — just CPython.
  • CLI gains three commands:
    pm-bench prefixes <name> --split split.json --out prefixes.csv
    pm-bench predict <name> --split split.json --prefixes prefixes.csv --out predictions.csv --baseline markov
    pm-bench score predictions.csv --prefixes prefixes.csv --task next-event
  • README install/use section now reflects the loop that actually works; STATUS.md added; GOALS.md ticked.
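The first-order Markov baseline described above can be sketched in a few lines of stdlib Python. This is a minimal illustration of the idea (fit transition counts on train cases only, rank next activities by count, fall back to unigram frequencies for an unseen last activity); the function names are hypothetical and need not match pm_bench/baselines/markov.py.

```python
from collections import Counter, defaultdict

def fit_markov(train_cases):
    """Count first-order transitions plus a unigram fallback from training traces."""
    transitions = defaultdict(Counter)  # last activity -> Counter of next activities
    unigram = Counter()                 # overall activity frequencies (fallback)
    for trace in train_cases:
        for prev, nxt in zip(trace, trace[1:]):
            transitions[prev][nxt] += 1
        unigram.update(trace)
    return transitions, unigram

def predict_top_k(model, prefix, k=3):
    """Rank candidate next activities by transition count from the prefix's
    last activity; use unigram frequencies when it was never seen in training."""
    transitions, unigram = model
    counts = transitions.get(prefix[-1]) or unigram
    return [activity for activity, _ in counts.most_common(k)]
```

Because the fit uses only `collections`, the baseline stays dependency-free, matching the "no torch / sklearn" constraint in the bullet above.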

Why this matters

  • v0.1's dataset fetch is gated on a 4TU TOS step that can't be automated. Locking the file format on synthetic-toy first means external contributors can build models against pm-bench now, on synthetic data, and slot the same code into BPI/Sepsis/Helpdesk the moment v0.1 lands.
  • Sets up the second reference baseline cleanly: adding gnn is a --baseline gnn choice once the dataset machinery is in place.

Test plan

  • pytest -q — 24 passed (was 17)
  • ruff check pm_bench tests — clean
  • Manual end-to-end smoke: split → prefixes → predict → score returns top-1 0.976 / top-3 1.000
  • tests/test_e2e.py runs the same sequence via click runner — guards the file format
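The top-1 / top-3 figures in the smoke test are standard top-k accuracy: the fraction of prefixes whose true next activity appears among the first k ranked predictions. A minimal sketch (hypothetical helper, not pm-bench's actual scorer):

```python
def score_top_k(ranked_predictions, true_nexts, k):
    """Top-k accuracy: share of targets whose true next activity
    appears in the first k entries of its ranked prediction list."""
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_nexts)
        if truth in preds[:k]
    )
    return hits / len(true_nexts)
```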

Roadmap impact

  • Bumps v0.0.1 checkbox in README + GOALS; v0.1 (dataset fetch) is the next gate.

… score

- prefixes.py — extract (case_id, prefix_idx, prefix, true_next) targets
  from a split; CSV round-trip helpers
- predictions.py — predictions CSV format (case_id, prefix_idx, ranked)
- baselines/markov.py — first-order Markov reference (train-only fit,
  unigram fallback for unseen last-activity)
- CLI gains `prefixes`, `predict --baseline markov`, `score`; the full
  `split → prefixes → predict → score` loop now matches the README
- tests/test_e2e.py exercises the loop via click runner, locking the
  file formats the leaderboard depends on
- 24 tests pass (was 17); ruff clean
- Markov on synthetic-toy: top1 0.976, top3 1.000 — sets the floor any
  future model has to clear
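The prefix extraction in the commit message above amounts to enumerating every proper prefix of each trace together with the activity that follows it, skipping length-1 cases (which have nothing to predict). A minimal sketch under those assumptions; the function name and input shape are illustrative, not the actual prefixes.py API:

```python
def extract_prefixes(cases):
    """Yield (case_id, prefix_idx, prefix, true_next) evaluation targets.

    Each prefix of length >= 1 is paired with the activity that actually
    follows it; traces of length 1 produce no targets and are skipped.
    """
    rows = []
    for case_id, trace in cases.items():
        if len(trace) < 2:
            continue  # length-1 case: no next event to predict
        for i in range(1, len(trace)):
            rows.append((case_id, i - 1, trace[:i], trace[i]))
    return rows
```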
Member Author

auto-deferred for human review: LOC delta exceeds gate (652 added + 15 removed = 667 lines, threshold is 250)


Generated by Claude Code

@protosphinx protosphinx added the needs-review label May 1, 2026 — with Claude
@protosphinx protosphinx merged commit 562f86c into main May 1, 2026
5 checks passed