
v0.3: bottleneck task — NDCG@10 + mean-wait baseline + leaderboard #8

Closed

protosphinx wants to merge 1 commit into outcome-task from bottleneck-task


Conversation

@protosphinx
Member

Stacked on top of #7 (outcome). Merge order: #2 → #3 → #4 → #5 → #6 → #7 → this.

Summary

  • Adds bottleneck detection as the fourth task — per-transition wait-time ranking, NDCG@10. CLI is fully task-aware; the leaderboard scaffold now hosts 3 entries on synthetic-toy (next-event, remaining-time, bottleneck) plus the outcome pipeline.
  • Mean-wait baseline scores NDCG@10 0.9786 on synthetic-toy. Strong floor — ranking by training-mean already nails the ordering.

What's new

  • score_bottleneck — pure-CPython NDCG@k. Predicted scores rank transitions; truth is the held-out per-(a,b) mean wait time. Missing predictions sink to the bottom (refusing to predict doesn't earn credit).
  • pm_bench/bottleneck.py — BottleneckTarget, BottleneckPrediction, extract_bottleneck_targets, CSV r/w. Per-transition shape (activity_a, activity_b, mean_wait_seconds, n_observations) — different from the per-prefix tasks, intentionally so.
  • pm_bench/baselines/mean_wait.py — train-mean-per-transition with observation-weighted global fallback. ~30 lines.
  • CLI dispatch — --task bottleneck, --baseline mean-wait. UsageError on mismatched (task, baseline) pairs, consistent with the other tasks.
  • leaderboard.py — _rescore_bottleneck and a new dispatch branch in rescore; standings now sorts by ndcg_at_k for bottleneck and auc for outcome (correctly higher-is-better for both).
  • CLI pm-bench leaderboard prints task-appropriate columns: mae_days for time, auc for outcome, ndcg@10 for bottleneck, top1/top3 for next-event.
  • 7 new tests (test_bottleneck.py); 86 total, ruff clean.
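The NDCG@k scheme described above (rank transitions by predicted score, grade by held-out mean wait, missing predictions ranked last) fits in a few lines of pure Python. This is an illustrative sketch, not the repo's actual score_bottleneck; the function name and dict-based signature are assumptions:

```python
import math

def ndcg_at_k(pred_scores, true_waits, k=10):
    """Rank transitions by predicted score; grade by held-out mean wait.

    pred_scores / true_waits: dicts keyed by (activity_a, activity_b).
    Transitions absent from pred_scores sort below every predicted one,
    so refusing to predict earns no credit.
    """
    ranked = sorted(
        true_waits,
        key=lambda t: (t in pred_scores, pred_scores.get(t, 0.0)),
        reverse=True,
    )
    # DCG with 1-based ranks: relevance / log2(rank + 1)
    dcg = sum(true_waits[t] / math.log2(i + 2) for i, t in enumerate(ranked[:k]))
    ideal = sorted(true_waits.values(), reverse=True)
    idcg = sum(w / math.log2(i + 2) for i, w in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0
```

A perfect ranking scores 1.0, so the 0.9786 baseline figure means the train-mean ordering is already close to ideal on synthetic-toy.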

Smoke

$ pm-bench prefixes synthetic-toy --split split.json --out bt.csv --task bottleneck
wrote 6 prefixes to bt.csv (task=bottleneck partition=test)

$ pm-bench predict synthetic-toy --split split.json --prefixes bt.csv \
    --out bpreds.csv --baseline mean-wait --task bottleneck
wrote 6 predictions to bpreds.csv (task=bottleneck baseline=mean-wait)

$ pm-bench score bpreds.csv --prefixes bt.csv --task bottleneck
{ "task": "bottleneck", "ndcg_at_k": 0.9786..., "k": 10, "n_transitions": 6 }

$ pm-bench leaderboard --all --verify
bottleneck/synthetic-toy: OK — 1 entry(ies)
next-event/synthetic-toy: OK — 1 entry(ies)
remaining-time/synthetic-toy: OK — 1 entry(ies)

Test plan

  • pytest -q — 86 passed (was 73 on #7)
  • ruff check pm_bench tests — clean
  • NDCG math: perfect ranking, inverted ranking, missing-predictions, hand-checked 3-transition example
  • Drift canary on the new leaderboard entry runs through the existing --all --verify workflow
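A hand check of the kind mentioned above takes only a few lines of arithmetic. Illustrative numbers, not the actual test fixture:

```python
import math

# Three hypothetical true mean waits (seconds), ideal order first.
ideal = [30.0, 20.0, 10.0]
# Predicted ranking swaps the top two transitions.
predicted = [20.0, 30.0, 10.0]

def dcg(rels):
    # 1-based ranks: relevance / log2(rank + 1)
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

ndcg = dcg(predicted) / dcg(ideal)  # ≈ 0.9225
```

Swapping the top two of three transitions costs about 7.8% of the score, which gives a feel for how forgiving NDCG is to near-misses at the top.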

Roadmap impact

  • v0.3 (5-task scoring): now 4 of 5 ✅ — conformance is the only remaining task. That one will need pm4py (process discovery + token replay), so it's the natural moment to introduce a [bpi] extra.

Commit message

- score_bottleneck — pure-CPython NDCG@k. Predictions rank
  transitions; truth is the held-out per-(a,b) mean wait time.
  Missing predictions sink to the bottom (refusing to predict
  doesn't earn credit)
- pm_bench/bottleneck.py — BottleneckTarget + extract; per-transition
  shape (4-tuple: a, b, mean_wait_seconds, n_observations) instead
  of per-prefix
- baselines/mean_wait.py — train-mean-per-transition with global-mean
  fallback. On synthetic-toy: NDCG@10 0.9786 over 6 transitions
- CLI: --task bottleneck, --baseline mean-wait wired through
  prefixes / predict / score
- leaderboard/bottleneck/synthetic-toy.json with mean-wait-ref entry
  (NDCG@10 0.9786, n_transitions 6); pm-bench leaderboard --all now
  walks 3 boards (next-event, remaining-time, bottleneck)
- 7 new tests; 86 total, ruff clean
- v0.3 marked partial → 4 of 5 tasks (conformance remains)
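The mean-wait baseline is simple enough to sketch in full. A hypothetical reimplementation of train-mean-per-transition with the observation-weighted global fallback; the function name and pair-list input format are assumptions, not the repo's API:

```python
from collections import defaultdict

def fit_mean_wait(train_waits):
    """train_waits: iterable of ((activity_a, activity_b), wait_seconds).

    Returns a predictor: transition -> training-mean wait, falling back
    to the observation-weighted global mean for unseen transitions.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    total, n = 0.0, 0
    for trans, wait in train_waits:
        sums[trans] += wait
        counts[trans] += 1
        total += wait
        n += 1
    # Weighting the global mean by observation count is the same as the
    # plain mean over every individual wait in the training split.
    global_mean = total / n if n else 0.0
    return lambda t: sums[t] / counts[t] if counts[t] else global_mean
```

Seen transitions get their own mean; anything unseen at predict time gets the global mean, so the baseline never abstains (which matters given that missing predictions sink to the bottom of the NDCG ranking).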
@protosphinx
Member Author

Merged into main as part of the audit-cleanup stack (commit 9c00b47). The full content of this PR is now on main.

@protosphinx protosphinx deleted the branch outcome-task May 1, 2026 17:54
@protosphinx protosphinx closed this May 1, 2026
@protosphinx protosphinx deleted the bottleneck-task branch May 1, 2026 17:54
