2 changes: 2 additions & 0 deletions STANDINGS.md
@@ -15,6 +15,7 @@ _DFG fitness × precision → F-score (higher is better)_
| Model | F | Fitness | Precision | n_test | n_model |
|---|---:|---:|---:|---:|---:|
| `dfg-ref` | 1.0000 | 1.0000 | 1.0000 | 9 | 9 |
| `empty-ref` | 0.0000 | 0.0000 | 0.0000 | 9 | 0 |
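
How the F column relates to the other two - a minimal sketch assuming the standard harmonic-mean F-score (consistent with both rows above; pm-bench's exact formula isn't shown in this diff):

```python
def fscore(fitness: float, precision: float) -> float:
    # Harmonic mean of fitness and precision. Define F = 0 when both
    # are 0 so the empty model lands on the floor instead of dividing
    # by zero - matching the `empty-ref` row above.
    if fitness + precision == 0.0:
        return 0.0
    return 2 * fitness * precision / (fitness + precision)

assert fscore(1.0, 1.0) == 1.0  # dfg-ref
assert fscore(0.0, 0.0) == 0.0  # empty-ref
```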

### next-event · synthetic-toy
_top1 / top3 accuracy_
@@ -37,3 +38,4 @@ _MAE in days (lower is better)_
| Model | mae_days | n |
|---|---:|---:|
| `mean-ref` | 1.3481 | 158 |
| `zero-ref` | 2.7410 | 158 |
18 changes: 14 additions & 4 deletions STATUS.md
@@ -60,17 +60,27 @@ pm-bench fetch bpi2020 --pin

## Recently shipped

- **Floor baselines for time + conformance** (`floor-baselines` branch).
  - `zero-time` for remaining-time: predicts 0 days for every prefix.
    MAE 2.741 on synthetic-toy - almost exactly twice mean-ref's 1.348
    (2.03×), roughly the ratio a constant zero predictor should land at
    when remaining times are spread fairly evenly around their mean
    (see the sketch just below).
  - `empty` for conformance: submits a model with no transitions.
    F = 0 by construction; the absolute conformance floor.
  - Both wired through CLI (`predict --baseline zero`, `discover
    --baseline empty`) and added as second entries on their boards.
    3 of 5 boards now have >1 entry; outcome and bottleneck are
    still single-entry pending future submissions.
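
  A standalone sketch of why the zero predictor lands near 2× mean-ref
  (not pm-bench code; the remaining times below are fabricated and
  roughly uniform):

  ```python
  # Constant-zero vs constant-mean MAE on made-up remaining times (days).
  remaining = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

  mean = sum(remaining) / len(remaining)  # 2.25
  mae_zero = sum(abs(y) for y in remaining) / len(remaining)         # 2.25
  mae_mean = sum(abs(y - mean) for y in remaining) / len(remaining)  # 1.00

  print(mae_zero / mae_mean)  # 2.25 - near the 2.03x seen on synthetic-toy
  ```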
- **Uniform second baseline + multi-entry leaderboard demo**
  (`uniform-baseline` branch).
  - `pm_bench/baselines/uniform.py` ranks every training-set
  - `pm_bench/baselines/uniform.py` - ranks every training-set
    activity in lexicographic order, identical for every prefix. The
    "didn't read the trace at all" floor (sketch below).
  - `predict --baseline uniform` for next-event.
  - `leaderboard/next-event/synthetic-toy.json` now has **2 entries**:
    markov-ref top-1 0.9304 vs uniform-ref top-1 0.2025. Standings
    sort puts markov on top.
  - Demonstrates the leaderboard scales to multiple submissions on
    one (task, dataset) pair a precondition for accepting external
    one (task, dataset) pair - a precondition for accepting external
    submissions.
  - 1 new test asserting the standings order; 117 total.
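
  For intuition, a standalone re-implementation of the uniform floor
  (hypothetical - not the shipped `uniform.py`, whose `fit_uniform` /
  `predict_uniform` operate on pm-bench's Event/Prefix types):

  ```python
  # Uniform "didn't read the trace" baseline: one fixed lexicographic
  # ranking of the training vocabulary, reused for every prefix.
  def fit_uniform_ranking(train_next_activities: list[str]) -> list[str]:
      return sorted(set(train_next_activities))

  def top1_accuracy(ranking: list[str], truths: list[str]) -> float:
      # top-1 is always the same activity, so top-1 accuracy collapses
      # to that activity's frequency as the true next event
      # (0.2025 on the board above).
      return sum(t == ranking[0] for t in truths) / len(truths)
  ```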
- **`pm-bench stats <name>`** (`stats-command` branch).
@@ -80,7 +90,7 @@ pm-bench fetch bpi2020 --pin
  - Pure CPython; works on synthetic-toy and any CSV path the
    existing `_load_events` accepts.
  - 7 new tests; 116 total.
- **Synthetic-toy bumped to 200 cases outcome row finally lands**
- **Synthetic-toy bumped to 200 cases - outcome row finally lands**
  (`synthetic-200` branch).
  - `synthetic_log()` default `n_cases` = 200 (was 50). Test partition
    now has ~45 positive cases (`delivery_confirmed`) so AUC is
@@ -90,7 +100,7 @@ pm-bench fetch bpi2020 --pin
    mean-ref MAE 1.3481, mean-wait-ref NDCG@10 0.9911,
    dfg-ref F=1.0 (both partitions now cover the full path graph).
  - **5th leaderboard board added**: `outcome/synthetic-toy.json`
    with `prior-ref` entry AUC 0.6319, n_pos 45 / 158. Real floor
    with `prior-ref` entry - AUC 0.6319, n_pos 45 / 158. Real floor
    for any temporal model on the outcome task.
  - `_rescore_outcome` + `_outcome_truth_for_dataset` added to
    `leaderboard.py`. `pm-bench leaderboard --all --verify` now
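
  For reference, a minimal rank-based ROC AUC - a sketch assuming the
  board's AUC is standard ROC AUC, not pm-bench's actual
  `_rescore_outcome`:

  ```python
  def roc_auc(scores: list[float], labels: list[int]) -> float:
      # Mann-Whitney form: the probability that a random positive case
      # outscores a random negative one, counting ties as half a win.
      pos = [s for s, y in zip(scores, labels) if y == 1]
      neg = [s for s, y in zip(scores, labels) if y == 0]
      wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
      return wins / (len(pos) * len(neg))
  ```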
16 changes: 16 additions & 0 deletions leaderboard/conformance/synthetic-toy.json
@@ -24,6 +24,22 @@
      },
      "scored_at": "2026-04-30T00:00:00Z",
      "notes": "Directly-follows graph extracted from training cases. With 200 cases the train and test partitions both observe every path-graph edge, so fitness = precision = 1.0. Any future submission has to keep this floor while generalizing to a different test split."
    },
    {
      "model": "empty-ref",
      "version": "0.1.0",
      "predictions_path": "leaderboard/predictions/conformance/synthetic-toy/empty-ref.json",
      "code": "https://github.com/erphq/pm-bench/blob/main/pm_bench/cli.py",
      "paper": null,
      "score": {
        "fitness": 0.0,
        "precision": 0.0,
        "fscore": 0.0,
        "n_test_transitions": 9,
        "n_model_transitions": 0
      },
      "scored_at": "2026-04-30T00:00:00Z",
      "notes": "Empty model - zero transitions. Absolute floor: fitness 0, F 0. A submission has to do strictly better."
    }
  ]
}
@@ -0,0 +1,3 @@
{
  "transitions": []
}
15 changes: 14 additions & 1 deletion leaderboard/remaining-time/synthetic-toy.json
@@ -20,7 +20,20 @@
        "n": 158
      },
      "scored_at": "2026-04-30T00:00:00Z",
      "notes": "Constant prediction = mean remaining-time observed on training prefixes. The dumbest model that still respects the train/test split — the floor any temporal model must clear."
      "notes": "Constant prediction = mean remaining-time observed on training prefixes. The dumbest model that still respects the train/test split - the floor any temporal model must clear."
    },
    {
      "model": "zero-ref",
      "version": "0.1.0",
      "predictions_path": "leaderboard/predictions/remaining-time/synthetic-toy/zero-ref.csv.gz",
      "code": "https://github.com/erphq/pm-bench/blob/main/pm_bench/baselines/zero_time.py",
      "paper": null,
      "score": {
        "mae_days": 2.7410337552742616,
        "n": 158
      },
      "scored_at": "2026-04-30T00:00:00Z",
      "notes": "Predicts 0 days remaining for every prefix. The absolute MAE floor - beats nothing, exists to anchor the lower bound."
    }
  ]
}
2 changes: 2 additions & 0 deletions pm_bench/baselines/__init__.py
@@ -26,6 +26,7 @@
    write_outcome_predictions_csv,
)
from pm_bench.baselines.uniform import UniformBaseline, fit_uniform, predict_uniform
from pm_bench.baselines.zero_time import predict_zero_time

__all__ = [
    "MarkovBaseline",
@@ -44,6 +45,7 @@
    "predict_mean_wait",
    "predict_prior_outcome",
    "predict_uniform",
    "predict_zero_time",
    "read_outcome_predictions_csv",
    "read_time_predictions_csv",
    "write_outcome_predictions_csv",
4 changes: 2 additions & 2 deletions pm_bench/baselines/uniform.py
@@ -7,7 +7,7 @@
real model has to clear that floor or it isn't using the trace.

This baseline exists alongside `markov` to demonstrate the leaderboard
scales to multiple entries on the same (task, dataset) pair and to
scales to multiple entries on the same (task, dataset) pair - and to
give an honest "did the trace help at all?" comparison number.
"""
from __future__ import annotations
@@ -40,7 +40,7 @@ def fit_uniform(events: Iterable[Event], train_case_ids: Iterable[CaseId]) -> Un
def predict_uniform(
    model: UniformBaseline, prefixes: Iterable[Prefix]
) -> list[Prediction]:
    """Same ranked list for every target the entire activity vocab."""
    """Same ranked list for every target - the entire activity vocab."""
    ranking = model.activities_sorted
    return [
        Prediction(case_id=p.case_id, prefix_idx=p.prefix_idx, ranked=ranking)
21 changes: 21 additions & 0 deletions pm_bench/baselines/zero_time.py
@@ -0,0 +1,21 @@
"""Zero-remaining-time floor baseline.

Predicts 0 days of remaining time for every prefix. The absolute MAE
floor - any model that ties this isn't using time information at all.
Sits below `mean-time` on the leaderboard and gives the dumbest
possible reference number.
"""
from __future__ import annotations

from collections.abc import Iterable

from pm_bench.baselines.mean_time import TimePrediction
from pm_bench.prefixes import TimeTarget


def predict_zero_time(targets: Iterable[TimeTarget]) -> list[TimePrediction]:
    """Constant 0.0 for every prefix."""
    return [
        TimePrediction(case_id=t.case_id, prefix_idx=t.prefix_idx, predicted_days=0.0)
        for t in targets
    ]
32 changes: 19 additions & 13 deletions pm_bench/cli.py
@@ -22,6 +22,7 @@
    write_outcome_predictions_csv,
)
from pm_bench.baselines.uniform import fit_uniform, predict_uniform
from pm_bench.baselines.zero_time import predict_zero_time
from pm_bench.bottleneck import (
    extract_bottleneck_targets,
    read_bottleneck_predictions_csv,
@@ -357,11 +358,11 @@ def prefixes(name: str, split_path: str, out_path: str, partition: str, task: str)
)
@click.option(
    "--baseline",
    type=click.Choice(["markov", "uniform", "mean", "prior", "mean-wait"]),
    type=click.Choice(["markov", "uniform", "mean", "zero", "prior", "mean-wait"]),
    default="markov",
    show_default=True,
    help=(
        "markov / uniform → next-event; mean → remaining-time; "
        "markov / uniform → next-event; mean / zero → remaining-time; "
        "prior → outcome; mean-wait → bottleneck."
    ),
)
@@ -398,11 +399,14 @@ def predict(
        preds = predict_uniform(uni_model, targets)
        n = write_predictions_csv(preds, out_path)
    elif task == "remaining-time":
        if baseline != "mean":
        if baseline not in ("mean", "zero"):
            raise click.UsageError(f"baseline {baseline!r} doesn't apply to remaining-time")
        time_model = fit_mean_time(events, split_data["train"])
        time_targets = read_time_targets_csv(prefixes_path)
        time_preds = predict_mean_time(time_model, time_targets)
        if baseline == "mean":
            time_model = fit_mean_time(events, split_data["train"])
            time_preds = predict_mean_time(time_model, time_targets)
        else:
            time_preds = predict_zero_time(time_targets)
        n = write_time_predictions_csv(time_preds, out_path)
    elif task == "outcome":
        if baseline != "prior":
@@ -445,25 +449,27 @@
)
@click.option(
    "--baseline",
    type=click.Choice(["dfg"]),
    type=click.Choice(["dfg", "empty"]),
    default="dfg",
    show_default=True,
    help="dfg → directly-follows graph from training cases.",
    help="dfg → DFG from training cases; empty → no transitions (absolute floor).",
)
def discover(name: str, split_path: str, out_path: str, baseline: str) -> None:
    """Discover a process model from training cases.

    The submission for the conformance task is a model JSON. Today only
    the DFG baseline ships in-tree - other discoverers (alpha, inductive)
    can be added behind a `[discovery]` extra without changing the
    submission format.
    The submission for the conformance task is a model JSON. `dfg`
    extracts the directly-follows graph; `empty` submits no transitions
    (the absolute conformance floor - fitness 0, F-score 0).
    """
    events = _load_events(name)
    with open(split_path) as f:
        split_data = json.load(f)
    if baseline != "dfg":
    if baseline == "dfg":
        dfg = extract_dfg(events, split_data["train"])
    elif baseline == "empty":
        dfg = set()
    else:
        raise click.UsageError(f"unknown discoverer: {baseline}")
    dfg = extract_dfg(events, split_data["train"])
    n = write_model_json(dfg, out_path)
    click.echo(f"wrote model with {n} transitions to {out_path} (baseline={baseline})")

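The two commands above enforce a fixed baseline→task pairing; summarized here as a table (a hypothetical restructuring for illustration - the shipped CLI uses the if/elif chains shown in the diff):

```python
# Which --baseline values `predict` accepts per task, per the help text.
BASELINES_BY_TASK: dict[str, set[str]] = {
    "next-event": {"markov", "uniform"},
    "remaining-time": {"mean", "zero"},
    "outcome": {"prior"},
    "bottleneck": {"mean-wait"},
}
# `discover` is separate: {"dfg", "empty"} for the conformance task.

def check_baseline(task: str, baseline: str) -> None:
    if baseline not in BASELINES_BY_TASK.get(task, set()):
        raise ValueError(f"baseline {baseline!r} doesn't apply to {task}")
```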
2 changes: 1 addition & 1 deletion pm_bench/stats.py
@@ -1,6 +1,6 @@
"""Quick summary stats for an event log.

Useful when inspecting a new dataset n_cases, n_events, distinct
Useful when inspecting a new dataset - n_cases, n_events, distinct
activity count, time span, top-N most-frequent activities and
transitions, mean / median case length. Pure CPython; runs in the
same process as the rest of pm-bench so it works on `synthetic-toy`,
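A standalone sketch of the summary the docstring describes (hypothetical - simplified event shape, not `stats.py`'s actual implementation):

```python
from collections import Counter

def quick_stats(events: list[tuple[str, str]]) -> dict:
    # events as (case_id, activity) pairs - a stand-in for pm-bench's
    # real Event records.
    cases = Counter(cid for cid, _ in events)
    activities = Counter(act for _, act in events)
    lengths = sorted(cases.values())
    return {
        "n_cases": len(cases),
        "n_events": len(events),
        "n_activities": len(activities),
        "top_activities": activities.most_common(5),
        "mean_case_len": len(events) / len(cases),
        # upper median for even counts - fine for a quick summary
        "median_case_len": lengths[len(lengths) // 2],
    }
```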