pm-bench stats <name> — one-shot summary stats for any log by protosphinx · Pull Request #13 · erphq/pm-bench

protosphinx · 2026-05-01T05:20:44Z

Stacked on top of #12 (synthetic-200). Merge order: #2 → #3 → #4 → #5 → #6 → #7 → #8 → #9 → #10 → #11 → #12 → this.

Summary

New CLI: pm-bench stats <name-or-path> prints n_cases / n_events / n_activities / time span / mean+median case length / top-N activities / top-N transitions.
Pure CPython; works on synthetic-toy and any CSV path. Dispatched through the same _load_events so the CSV-ingest path Just Works.

What's new

pm_bench/stats.py:summarize(events, top_n=10) — one pass over the event iterable; returns LogStats (frozen dataclass).
CLI pm-bench stats emits JSON, includes earliest/latest ISO timestamps and span_days so you can sanity-check a split before running it.
README gets a one-liner pointing at the command, right above the CSV ingest snippet.
7 new tests including a CLI smoke against synthetic-toy. 116 total, ruff clean.
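
For reviewers who want a feel for the shape of the one-pass summary, here is a minimal sketch, not the code in pm_bench/stats.py. Assumptions: events are (case_id, activity, timestamp) tuples that arrive in order within each case; the fields n_events, n_cases, n_activities, span_days, earliest, latest and mean_case_length mirror the JSON keys in the smoke output, while median_case_length, top_activities and top_transitions are hypothetical names for the remaining stats.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Iterable, Optional, Tuple

# Hypothetical event shape: (case_id, activity, timestamp).
Event = Tuple[str, str, datetime]


@dataclass(frozen=True)
class LogStats:
    n_events: int
    n_cases: int
    n_activities: int
    earliest: Optional[datetime]
    latest: Optional[datetime]
    span_days: float
    mean_case_length: float
    median_case_length: float  # hypothetical field name
    top_activities: Tuple[Tuple[str, int], ...]  # hypothetical field name
    top_transitions: Tuple[Tuple[Tuple[str, str], int], ...]  # hypothetical


def summarize(events: Iterable[Event], top_n: int = 10) -> LogStats:
    case_lengths: Counter = Counter()
    activities: Counter = Counter()
    transitions: Counter = Counter()
    last_activity: dict = {}
    earliest = latest = None

    # Single pass: every counter is updated as each event streams by.
    for case_id, activity, ts in events:
        case_lengths[case_id] += 1
        activities[activity] += 1
        # Assumes events arrive in order within each case.
        if case_id in last_activity:
            transitions[(last_activity[case_id], activity)] += 1
        last_activity[case_id] = activity
        if earliest is None or ts < earliest:
            earliest = ts
        if latest is None or ts > latest:
            latest = ts

    lengths = list(case_lengths.values())
    n_events = sum(lengths)
    span = (latest - earliest).total_seconds() / 86400 if lengths else 0.0
    return LogStats(
        n_events=n_events,
        n_cases=len(lengths),
        n_activities=len(activities),
        earliest=earliest,
        latest=latest,
        span_days=round(span, 2),
        mean_case_length=n_events / len(lengths) if lengths else 0.0,
        median_case_length=float(median(lengths)) if lengths else 0.0,
        top_activities=tuple(activities.most_common(top_n)),
        top_transitions=tuple(transitions.most_common(top_n)),
    )
```

Only the per-case length list is materialized after the loop, which is what the median needs; everything else is a running counter.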

Smoke

$ pm-bench stats synthetic-toy --top-n 5 | head -10
{
  "n_events": 965,
  "n_cases": 200,
  "n_activities": 9,
  "span_days": 367.21,
  "earliest": "2024-01-01T00:00:00",
  "latest": "2025-01-02T05:00:00",
  "mean_case_length": 4.825,
  ...
}
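
The derived fields in this output can be cross-checked by hand from the other values shown. A small sketch using only the numbers printed above (rounding to two decimals is an assumption about how span_days is formatted):

```python
from datetime import datetime

earliest = datetime.fromisoformat("2024-01-01T00:00:00")
latest = datetime.fromisoformat("2025-01-02T05:00:00")

# 2024 is a leap year, so the span is 367 days plus 5 hours.
span_days = (latest - earliest).total_seconds() / 86400
mean_case_length = 965 / 200  # n_events / n_cases

print(round(span_days, 2))  # 367.21
print(mean_case_length)     # 4.825
```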

Why

Anyone trying pm-bench on their own CSV currently has no quick "is my log loaded right?" check. stats answers that in one command before they wire up a model.
It's also useful for the leaderboard narrative: every entry can include the dataset's stats inline, without re-running PM4Py.

Test plan

pytest -q: 116 passed (was 109 on PR #12, "synthetic-toy → 200 cases: outcome leaderboard row lands; all 5 boards real")
ruff check pm_bench tests — clean
CLI smoke against synthetic-toy returns expected counts (200 cases, 965 events, 9 activities)
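
The count check in the last item could be scripted along these lines. This is a sketch, not the test that ships in test_stats.py; mismatches and EXPECTED are hypothetical names, and the expected values are the ones quoted in this PR.

```python
import json

# Expected counts for synthetic-toy, taken from the smoke output in this PR.
EXPECTED = {"n_cases": 200, "n_events": 965, "n_activities": 9}


def mismatches(payload: str, expected: dict = EXPECTED) -> dict:
    """Return {key: (expected, actual)} for every count that differs."""
    stats = json.loads(payload)
    return {
        key: (want, stats.get(key))
        for key, want in expected.items()
        if stats.get(key) != want
    }
```

The payload would be the JSON printed by pm-bench stats synthetic-toy, for example captured with subprocess.run(..., capture_output=True, text=True).stdout; an empty dict means the counts match.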

- pm_bench/stats.py:summarize(events, top_n) → LogStats with n_events, n_cases, n_activities, time span, earliest/latest, mean/median case length, top-N activities, top-N transitions
- CLI: pm-bench stats <name-or-path> [--top-n N] emits JSON
- Works on synthetic-toy and any CSV path that the existing _load_events dispatch accepts
- 7 new tests (test_stats.py); 116 total, ruff clean
- README gets a one-liner pointing at the command

protosphinx · 2026-05-01T17:54:25Z

Merged into main as part of the audit-cleanup stack (commit 9c00b47). The full content of this PR is now on main.

This was referenced May 1, 2026

uniform-ref second baseline on next-event — multi-entry leaderboard demo #14

Closed

Floor baselines: zero-time + empty-dfg — multi-entry on 3 boards #15

Closed

protosphinx deleted the branch synthetic-200 May 1, 2026 17:54

protosphinx closed this May 1, 2026

protosphinx deleted the stats-command branch May 1, 2026 17:54

protosphinx wants to merge 1 commit into synthetic-200 from stats-command
