pm-bench stats <name>: one-shot summary stats for any log #13

Closed
protosphinx wants to merge 1 commit into synthetic-200 from stats-command


@protosphinx
Member

Stacked on top of #12 (synthetic-200). Merge order: #2 → #3 → #4 → #5 → #6 → #7 → #8 → #9 → #10 → #11 → #12 → this.

Summary

  • New CLI: pm-bench stats <name-or-path> prints n_cases / n_events / n_activities / time span / mean+median case length / top-N activities / top-N transitions.
  • Pure CPython; works on synthetic-toy and any CSV path. It dispatches through the existing _load_events, so the CSV-ingest path works with no extra wiring.
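For readers wiring this up elsewhere, the command shape pm-bench stats <name-or-path> [--top-n N] could be parsed roughly as below. This is a minimal sketch assuming argparse; the PR does not say which CLI framework pm-bench uses, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wiring; pm-bench's real CLI module may be structured differently.
    parser = argparse.ArgumentParser(prog="pm-bench")
    sub = parser.add_subparsers(dest="command", required=True)

    stats = sub.add_parser("stats", help="one-shot summary stats for a log")
    # Either a registered dataset name (e.g. synthetic-toy) or a CSV path.
    stats.add_argument("name", help="dataset name or CSV path")
    # Number of top activities/transitions to report; 10 mirrors summarize(top_n=10).
    stats.add_argument("--top-n", type=int, default=10)
    return parser


args = build_parser().parse_args(["stats", "synthetic-toy", "--top-n", "5"])
print(args.command, args.name, args.top_n)  # stats synthetic-toy 5
```

argparse maps --top-n to the attribute top_n, so the value can be passed straight through as summarize(..., top_n=args.top_n).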

What's new

  • pm_bench/stats.py:summarize(events, top_n=10) — one pass over the event iterable; returns LogStats (frozen dataclass).
  • The pm-bench stats CLI emits JSON, including earliest/latest ISO timestamps and span_days, so you can sanity-check a split before running it.
  • README gets a one-liner pointing at the command, right above the CSV ingest snippet.
  • 7 new tests, including a CLI smoke test against synthetic-toy; 116 tests total, ruff clean.
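A minimal sketch of what a one-pass summarize(events, top_n=10) returning a frozen LogStats could look like. It is not the PR's code: it assumes each event is a (case_id, activity, timestamp) tuple and that events arrive in timestamp order within each case, so "top transitions" are directly-follows pairs. Field names are modeled on the keys the CLI prints; median_case_length, top_activities and top_transitions are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median


@dataclass(frozen=True)
class LogStats:
    n_events: int
    n_cases: int
    n_activities: int
    earliest: datetime | None
    latest: datetime | None
    span_days: float
    mean_case_length: float
    median_case_length: float
    top_activities: list[tuple[str, int]]
    top_transitions: list[tuple[tuple[str, str], int]]


def summarize(
    events: Iterable[tuple[str, str, datetime]], top_n: int = 10
) -> LogStats:
    """One pass over hypothetical (case_id, activity, timestamp) tuples.

    Assumes events arrive in timestamp order within each case, so the last
    activity seen for a case directly precedes the current one.
    """
    case_lengths: Counter[str] = Counter()
    activities: Counter[str] = Counter()
    transitions: Counter[tuple[str, str]] = Counter()
    last_activity: dict[str, str] = {}
    earliest: datetime | None = None
    latest: datetime | None = None
    n_events = 0

    for case_id, activity, ts in events:
        n_events += 1
        case_lengths[case_id] += 1
        activities[activity] += 1
        prev = last_activity.get(case_id)
        if prev is not None:
            transitions[(prev, activity)] += 1  # directly-follows pair
        last_activity[case_id] = activity
        if earliest is None or ts < earliest:
            earliest = ts
        if latest is None or ts > latest:
            latest = ts

    lengths = list(case_lengths.values())
    span = 0.0
    if earliest is not None and latest is not None:
        span = (latest - earliest).total_seconds() / 86_400
    return LogStats(
        n_events=n_events,
        n_cases=len(case_lengths),
        n_activities=len(activities),
        earliest=earliest,
        latest=latest,
        span_days=round(span, 2),  # 2-decimal rounding is an assumption
        mean_case_length=float(mean(lengths)) if lengths else 0.0,
        median_case_length=float(median(lengths)) if lengths else 0.0,
        top_activities=activities.most_common(top_n),
        top_transitions=transitions.most_common(top_n),
    )


events = [
    ("c1", "submit", datetime(2024, 1, 1)),
    ("c1", "review", datetime(2024, 1, 2)),
    ("c2", "submit", datetime(2024, 1, 3)),
    ("c1", "approve", datetime(2024, 1, 4)),
    ("c2", "review", datetime(2024, 1, 5)),
]
s = summarize(events, top_n=2)
print(s.n_cases, s.mean_case_length, s.span_days)  # 2 2.5 4.0
print(s.top_transitions[0])  # (('submit', 'review'), 2)
```

Counters make the single pass cheap: every statistic is updated per event, and only the per-case lengths need a final reduction for the mean and median.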

Smoke

$ pm-bench stats synthetic-toy --top-n 5 | head -10
{
  "n_events": 965,
  "n_cases": 200,
  "n_activities": 9,
  "span_days": 367.21,
  "earliest": "2024-01-01T00:00:00",
  "latest": "2025-01-02T05:00:00",
  "mean_case_length": 4.825,
  ...
}
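The smoke numbers are internally consistent, which is exactly the kind of check the command is meant to make easy. A quick standard-library cross-check, assuming span_days is the earliest-to-latest gap in days and mean_case_length is n_events / n_cases (the PR does not spell out either formula):

```python
from datetime import datetime

earliest = datetime.fromisoformat("2024-01-01T00:00:00")
latest = datetime.fromisoformat("2025-01-02T05:00:00")

# 2024 is a leap year: 366 days to 2025-01-01, then 1 more day and 5 hours.
span_days = (latest - earliest).total_seconds() / 86_400
print(round(span_days, 2))  # 367.21

# Assumed definition: events per case.
mean_case_length = 965 / 200
print(mean_case_length)  # 4.825
```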

Why

  • Anyone trying pm-bench on their own CSV currently has no quick "is my log loaded right?" check. stats answers that in one command before they wire up a model.
  • It's also useful for the leaderboard narrative: every entry can include the dataset's stats inline, without re-running PM4Py.

Test plan

- pm_bench/stats.py:summarize(events, top_n) → LogStats with
  n_events, n_cases, n_activities, time span, earliest/latest,
  mean/median case length, top-N activities, top-N transitions
- CLI: pm-bench stats <name-or-path> [--top-n N] emits JSON
- Works on synthetic-toy and any CSV path that the existing
  _load_events dispatch accepts
- 7 new tests (test_stats.py); 116 total, ruff clean
- README gets a one-liner pointing at the command
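The "emits JSON" item hinges on one detail: if LogStats stores datetime objects, json.dumps cannot serialize them directly. One way to handle it, assuming datetime fields (the PR only says the output carries ISO timestamps); MiniStats and to_json are hypothetical names:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass(frozen=True)
class MiniStats:
    # Hypothetical subset of LogStats fields, enough to show serialization.
    n_events: int
    n_cases: int
    earliest: datetime
    latest: datetime
    top_activities: list


def to_json(stats: MiniStats) -> str:
    # asdict() recursively converts the dataclass into plain dicts/lists/tuples.
    def default(obj):
        # datetime is not JSON-serializable, so emit ISO 8601 strings instead.
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"not JSON-serializable: {type(obj).__name__}")

    return json.dumps(asdict(stats), default=default, indent=2)


stats = MiniStats(3, 1, datetime(2024, 1, 1), datetime(2024, 1, 2, 5), [("a", 2), ("b", 1)])
print(to_json(stats))
```

Tuples such as ("a", 2) come out as JSON arrays, which keeps the top-N lists readable without a custom encoder class.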
@protosphinx
Member Author

Merged into main as part of the audit-cleanup stack (commit 9c00b47). The full content of this PR is now on main.

protosphinx deleted the synthetic-200 branch on May 1, 2026 at 17:54
protosphinx closed this on May 1, 2026
protosphinx deleted the stats-command branch on May 1, 2026 at 17:54