6 changes: 4 additions & 2 deletions GOALS.md
@@ -5,11 +5,13 @@ Be the default benchmark for new process-mining methods. Within 18 months,
≥10 external papers report `pm-bench` numbers in their abstract.

## v0 success criteria
- 7 datasets fetchable + hash-verified
- 7 datasets fetchable + hash-verified — fetch/hash machinery shipped
(`pm-bench fetch <name> [--pin]`); per-dataset hash pins pending
the one-time TOS-gated downloads
- 5 tasks with fixed scoring scripts (next-event ✅; remaining-time, outcome,
conformance, bottleneck pending)
- `gnn` runs end-to-end as the reference baseline (Markov reference ✅;
`gnn` integration pending v0.1 dataset machinery)
`gnn` integration pending the first pinned dataset)
- End-to-end loop runs on `synthetic-toy` ✅ — split → prefixes →
predict → score, covered by `tests/test_e2e.py`

25 changes: 20 additions & 5 deletions README.md
@@ -118,10 +118,22 @@ pm-bench score predictions.csv \

The full loop (`split → prefixes → predict → score`) runs end-to-end on
`synthetic-toy` today; it's covered by `tests/test_e2e.py` and locks
the file formats the leaderboard depends on. BPI / Sepsis / Helpdesk
will use the same commands once v0.1's fetch+cache machinery lands —
4TU's interactive TOS makes the download itself a one-time manual
step, but everything downstream is automated.
the file formats the leaderboard depends on.

For the public datasets, the fetch + hash machinery is in place:

```bash
pm-bench fetch bpi2020 # auto-downloads if URL is set
pm-bench fetch bpi2020 --pin # after manual TOS-gated download,
# emits a registry.yml sha256 patch
```

`pm-bench fetch` resolves a cache directory (`$PM_BENCH_CACHE`, else
`~/.cache/pm-bench/`), verifies the registry sha256 if pinned, and —
for TOS-gated 4TU / Mendeley datasets — prints the precise landing URL
and on-disk path you need to fill in. The per-dataset hash pins are the
last manual step before BPI / Sepsis / Helpdesk run through the same
loop as `synthetic-toy`.
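
As a rough illustration of the "verifies the registry sha256 if pinned" step, a minimal sketch might look like the following. The helper names `sha256_of` and `check_pin`, the 1 MiB chunk size, and the use of `ValueError` on mismatch are assumptions made for this example; they are not taken from `pm_bench` itself.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

CHUNK = 1 << 20  # 1 MiB read size (assumed for this sketch)


def sha256_of(path: Path) -> str:
    """Stream the file through sha256 so large archives never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def check_pin(path: Path, pinned: str | None) -> tuple[str, bool]:
    """Return (actual_digest, verified).

    `pinned` plays the role of the registry sha256; None means "not pinned
    yet", in which case the digest is reported but nothing is verified.
    """
    actual = sha256_of(path)
    if pinned is None:
        return actual, False
    if actual != pinned.lower():
        raise ValueError(f"sha256 mismatch: expected {pinned}, got {actual}")
    return actual, True
```

Returning the digest even in the unpinned case is what makes a `--pin`-style workflow possible: the same call that would verify a pinned file can report the hash to paste into the registry.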

The full pipeline:

@@ -204,7 +216,10 @@ honesty. The point of the benchmark is to make the comparison real.
- [x] v0.0.1 — end-to-end loop on `synthetic-toy`: split → prefixes →
predict (Markov) → score, with a smoke test that locks the file
formats
- [ ] v0.1 — fetch + cache + hash for all 7 datasets
- [🟡] v0.1 — fetch + cache + hash for all 7 datasets. Machinery
shipped (`pm-bench fetch <name> [--pin]`, sha256 verification,
`$PM_BENCH_CACHE` resolution); per-dataset hash-pinning PRs
pending the one-time TOS-gated downloads from 4TU and Mendeley.
- [ ] v0.2 — splits: next-event, remaining-time
- [ ] v0.3 — scoring scripts for all 5 tasks
- [ ] v0.4 — leaderboard CI + landing page
95 changes: 70 additions & 25 deletions STATUS.md
@@ -4,61 +4,106 @@ _Last updated: 2026-04-30._

## Where we are

Pre-v0. The end-to-end loop runs on the bundled `synthetic-toy`
dataset; the seven public datasets are still pending v0.1's fetch +
hash machinery.
Pre-v0. Two pieces shipped on top of v0.0:

A submission today looks like:
1. The end-to-end loop runs on the bundled `synthetic-toy` dataset
(split → prefixes → predict → score; Markov reference baseline
gets top-1 0.976, top-3 1.000).
2. The fetch + hash + cache machinery is in place. `pm-bench fetch
<name>` resolves a dataset to a local path, verifies the registry
sha256, and prints precise instructions for the TOS-gated download
step on 4TU / Mendeley. `--pin` emits the `registry.yml` patch a
contributor pastes into a PR after the manual download.

What's still left in v0.1 is purely a per-dataset operational task: do
the one-time download, run `--pin`, open seven small PRs to pin the
hashes, then wire the XES parser to `_load_events` so `split`/
`prefixes`/`predict` work on real BPI data. None of it requires
further code design.

A submission today on the bundled toy:

```bash
pm-bench split synthetic-toy > split.json
pm-bench prefixes synthetic-toy --split split.json --out prefixes.csv
pm-bench predict synthetic-toy --split split.json \
--prefixes prefixes.csv --out predictions.csv --baseline markov
pm-bench score predictions.csv --prefixes prefixes.csv --task next-event
# → top1 0.976, top3 1.000 (Markov on synthetic-toy)
# → top1 0.976, top3 1.000
```

That sequence is the contract — it's what `tests/test_e2e.py` runs in
CI, and it's what the leaderboard CI will run once datasets are pinned.
The fetch flow on a TOS-gated dataset:

```bash
pm-bench fetch bpi2020
# → bpi2020: no download_url (TOS-gated). Visit https://data.4tu.nl/...,
# accept the terms, and save the archive to ~/.cache/pm-bench/bpi2020.xes.gz.
# Then re-run `pm-bench fetch bpi2020 --pin` to compute the sha256.

# (manual download + place in cache dir)

pm-bench fetch bpi2020 --pin
# → bpi2020: cached at ~/.cache/pm-bench/bpi2020.xes.gz (unpinned)
# sha256: <hex>
#
# # paste under the matching dataset entry in pm_bench/registry.yml:
# - name: bpi2020
# sha256: <hex>
```

## Recently shipped

- **End-to-end loop on synthetic-toy** (`end-to-end-loop` branch).
- **v0.1 fetch + hash machinery** (`dataset-fetch` branch).
- `pm_bench/cache.py` — cache root resolution
(`$PM_BENCH_CACHE` → `~/.cache/pm-bench/`), per-dataset path with
correct extension by format.
- `pm_bench/fetch.py` — `ensure_cached(dataset)` covers the four
cases: cached+match, cached+mismatch (loud failure),
cached+unpinned (returns actual hash), not-cached (auto-download
if URL set, otherwise raise `ManualFetchRequired`). Streams in
1 MiB chunks; atomic `.part`-then-rename writes; sha256 verified
against the registry pin.
- CLI `pm-bench fetch <name> [--pin]` — prints status, emits a
pasteable `registry.yml` patch when `--pin` is set.
- 13 new tests across `test_cache.py` and `test_fetch.py`. 37 total.
- **End-to-end loop on synthetic-toy** (`end-to-end-loop` branch,
PR #2).
- `pm_bench/prefixes.py` — extract prediction targets from a split,
write/read CSV. Skips length-1 cases.
- `pm_bench/predictions.py` — predictions CSV format
(`case_id,prefix_idx,predictions`).
- `pm_bench/baselines/markov.py` — first-order Markov reference
baseline. Trained on the train partition only; falls back to
unigram for unseen last-activities.
- CLI gained `prefixes`, `predict`, `score`. The full
`split → prefixes → predict → score` loop now matches what the
README advertises.
- CLI gained `prefixes`, `predict`, `score`.
- `tests/test_e2e.py` covers the loop end-to-end via the click
runner; format changes will trip it.
- **v0.0** (initial release): scaffold, registry, case-chrono split,
next-event scoring function, CLI `list` / `info` / `split`.
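
The four `ensure_cached` cases and the `.part`-then-rename write described in the list above can be sketched roughly as follows. `pm_bench/fetch.py` is not part of this diff, so everything here is a stand-in: `ensure_cached_sketch`, `Result`, and the injected `open_source` callable are hypothetical names, and raising `ValueError` on a hash mismatch is an assumption. Only the field names on `Result` mirror the attributes that `cli.py` reads (`path`, `sha256`, `pinned`, `downloaded`).

```python
from __future__ import annotations

import hashlib
import os
from dataclasses import dataclass
from pathlib import Path
from typing import BinaryIO, Callable, Optional

CHUNK = 1 << 20  # 1 MiB, matching the chunk size mentioned in the changelog


class ManualFetchRequired(Exception):
    """Stand-in for the exception of the same name in the changelog."""


@dataclass
class Result:
    path: Path
    sha256: str
    pinned: bool
    downloaded: bool


def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()


def _atomic_write(src: BinaryIO, dest: Path) -> None:
    # Write to <dest>.part first so an interrupted transfer never leaves a
    # truncated file at the final path, then rename into place.
    part = dest.with_name(dest.name + ".part")
    with part.open("wb") as out:
        while chunk := src.read(CHUNK):
            out.write(chunk)
    os.replace(part, dest)


def ensure_cached_sketch(
    path: Path,
    pinned: Optional[str],
    open_source: Optional[Callable[[], BinaryIO]],
) -> Result:
    """Hypothetical four-case resolution; `open_source` replaces the network."""
    downloaded = False
    if not path.exists():
        if open_source is None:  # not cached, no URL: manual step needed
            raise ManualFetchRequired(f"place the archive at {path}")
        with open_source() as src:  # not cached, source available
            _atomic_write(src, path)
        downloaded = True
    actual = _sha256(path)
    if pinned is None:  # cached but unpinned: report the hash
        return Result(path, actual, pinned=False, downloaded=downloaded)
    if actual != pinned:  # cached but mismatched: fail loudly
        raise ValueError(f"sha256 mismatch for {path}: {actual} != {pinned}")
    return Result(path, actual, pinned=True, downloaded=downloaded)  # match
```

Injecting the byte source as a callable keeps the sketch testable without network access; a real implementation would presumably open the registry's `download_url` instead.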

## Next up

- **v0.1 — dataset fetch + hash** for the seven public logs. The 4TU
portal needs interactive TOS acceptance per dataset, so the fetch
itself is a one-time manual step; the rest (cache → verify hash →
parse XES → run the same loop) is automated. This is the work that
unblocks every downstream milestone.
- **`gnn` as the second reference baseline** once v0.1 lands. `gnn`'s
v0.5 milestone is symmetrical with this — it's been waiting for a
pinned dataset registry, which `pm-bench` is meant to provide.
- **One-time dataset pinning.** Per dataset (BPI 2012/2017/2018/2019/
2020 collection, Sepsis, Helpdesk): accept the TOS, save to the
cache, run `pm-bench fetch <name> --pin`, open the registry PR.
This is the gate on every downstream milestone.
- **XES parser wiring.** `_load_events` currently rejects everything
except `synthetic-toy`. Once a dataset is pinned, swap that branch
for a pm4py-backed XES read (move pm4py to `[bpi]` extras so the
base install stays light).
- **`gnn` as the second reference baseline.** `gnn`'s v0.5 milestone
has been waiting for a pinned dataset registry, which `pm-bench`
will provide as soon as any single dataset is pinned.
- Additional tasks beyond next-event (remaining-time, outcome,
conformance, bottleneck). The split + prefixes machinery is shared;
scoring is the per-task piece.

## Known gaps

- No `pm-bench fetch` yet. README still hints at it; the install &
use section now shows the loop that actually works (synthetic-toy
only) so the doc and the CLI line up.
- `predict` currently only knows `markov`. The `--baseline` flag is a
click choice so adding a second is a one-liner, but the second one
worth adding is `gnn`, which depends on v0.1.
- The base install does not pull `pm4py`, so XES parsing isn't wired
yet. Adding a `[bpi]` extra is the right move when we pin the
first dataset — keeps `pip install pm-bench` fast for users who
only need scoring.
- No leaderboard CI yet (v0.4). The file formats are stable, so this
is "wire up a workflow that runs `pm-bench score`" — orthogonal to
the dataset work.
58 changes: 58 additions & 0 deletions pm_bench/cache.py
@@ -0,0 +1,58 @@
"""Local cache directory for downloaded event logs.

Datasets land in `$PM_BENCH_CACHE` if set, else `~/.cache/pm-bench/`.
We never write inside the install tree — the cache survives uninstalls
and wheel rebuilds, and a single cache can be shared across virtualenvs.

The on-disk layout is one file per dataset:

    <cache_root>/<name>.<ext>

where `<ext>` is `xes.gz` for XES logs (the canonical 4TU
distribution form) and `csv` for CSV logs. The synthetic-toy
dataset is generated on demand and never touches the cache.
"""
from __future__ import annotations

import os
from pathlib import Path

from pm_bench.registry import Dataset


def cache_root(override: str | None = None) -> Path:
    """Return the cache root, creating it if needed.

    Resolution order: explicit `override`, then `$PM_BENCH_CACHE`, then
    `~/.cache/pm-bench/`. The directory is created on first call so
    callers don't have to.
    """
    if override:
        root = Path(override).expanduser()
    elif env := os.environ.get("PM_BENCH_CACHE"):
        root = Path(env).expanduser()
    else:
        root = Path.home() / ".cache" / "pm-bench"
    root.mkdir(parents=True, exist_ok=True)
    return root


_EXT_BY_FORMAT = {
    "xes": "xes.gz",
    "csv": "csv",
}


def cache_path(dataset: Dataset, override_root: str | None = None) -> Path:
    """Return the on-disk path where this dataset's archive lives.

    The path is purely a function of `(cache_root, name, format)`; we
    do not check whether the file actually exists. Callers should test
    `path.exists()` before reading.
    """
    if dataset.format == "synthetic":
        raise ValueError(f"{dataset.name} is generated on demand, not cached")
    ext = _EXT_BY_FORMAT.get(dataset.format)
    if ext is None:
        raise ValueError(f"unknown dataset format: {dataset.format}")
    return cache_root(override_root) / f"{dataset.name}.{ext}"
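
To make the resolution order concrete, the sketch below repeats the `cache_root` logic from this file in standalone form (so it runs without `pm_bench` installed) and exercises the explicit-override and environment-variable branches against throwaway directories. The `demo` helper and the directory names are illustrative only; the default `~/.cache/pm-bench/` branch is deliberately not exercised so the example leaves the home directory alone.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def cache_root(override: str | None = None) -> Path:
    # Same resolution order as pm_bench/cache.py: override, env var, default.
    if override:
        root = Path(override).expanduser()
    elif env := os.environ.get("PM_BENCH_CACHE"):
        root = Path(env).expanduser()
    else:
        root = Path.home() / ".cache" / "pm-bench"
    root.mkdir(parents=True, exist_ok=True)
    return root


def demo() -> tuple[Path, Path]:
    """Return (root_from_env, root_from_override) using temporary dirs."""
    base = Path(tempfile.mkdtemp())
    saved = os.environ.get("PM_BENCH_CACHE")
    os.environ["PM_BENCH_CACHE"] = str(base / "from-env")
    try:
        from_env = cache_root()  # no override: env var wins
        from_override = cache_root(str(base / "explicit"))  # override wins
    finally:
        # Restore the caller's environment exactly as it was.
        if saved is None:
            os.environ.pop("PM_BENCH_CACHE", None)
        else:
            os.environ["PM_BENCH_CACHE"] = saved
    return from_env, from_override
```

Pointing `PM_BENCH_CACHE` at a temporary directory in this way is also a convenient pattern for tests that must not touch a developer's real cache.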
68 changes: 68 additions & 0 deletions pm_bench/cli.py
@@ -8,6 +8,12 @@

from pm_bench import _synth
from pm_bench.baselines.markov import fit_markov, predict_markov
from pm_bench.fetch import (
    FetchError,
    ManualFetchRequired,
    ensure_cached,
    sha256_file,
)
from pm_bench.predictions import read_predictions_csv, write_predictions_csv
from pm_bench.prefixes import extract_prefixes, read_prefixes_csv, write_prefixes_csv
from pm_bench.registry import get_dataset, load_registry
@@ -71,6 +77,68 @@ def info(name: str) -> None:
)


@main.command()
@click.argument("name")
@click.option(
    "--pin",
    is_flag=True,
    default=False,
    help="After locating the cached file, print a registry.yml patch with its sha256.",
)
def fetch(name: str, pin: bool) -> None:
    """Make a dataset available locally and verify its hash.

    Auto-downloads when `download_url` is set; otherwise prints
    instructions for the manual TOS-gated download path (4TU / Mendeley).
    """
    try:
        d = get_dataset(name)
    except KeyError:
        click.echo(f"unknown dataset: {name}", err=True)
        sys.exit(1)

    if d.format == "synthetic":
        click.echo(f"{name}: generated on demand, no fetch needed")
        return

    try:
        result = ensure_cached(d)
    except ManualFetchRequired as exc:
        # Special-cased so --pin also works against a file the user just
        # placed by hand. If the file is there now, hash it directly with
        # sha256_file; otherwise print the instructions and exit.
        path = exc.expected_path
        if path.exists():
            actual = sha256_file(path)
            click.echo(f"{name}: cached at {path}")
            click.echo(f" sha256: {actual}")
            if pin:
                _print_pin_patch(name, actual)
            elif d.sha256 is None:
                click.echo(" (registry hash unset — re-run with --pin to emit a patch)")
            return
        click.echo(str(exc), err=True)
        sys.exit(2)
    except FetchError as exc:
        click.echo(f"{name}: {exc}", err=True)
        sys.exit(2)

    state = "downloaded" if result.downloaded else "cached"
    pinned = "verified" if result.pinned else "unpinned"
    click.echo(f"{name}: {state} at {result.path} ({pinned})")
    click.echo(f" sha256: {result.sha256}")
    if pin and not result.pinned:
        _print_pin_patch(name, result.sha256)


def _print_pin_patch(name: str, digest: str) -> None:
    """Print a YAML snippet the user can paste into registry.yml."""
    click.echo("")
    click.echo("# paste under the matching dataset entry in pm_bench/registry.yml:")
    click.echo(f" - name: {name}")
    click.echo(f" sha256: {digest}")


@main.command()
@click.argument("name")
@click.option("--task", default="next-event", show_default=True)