diff --git a/_toc.yml b/_toc.yml
index 4cd2c42..c728849 100755
--- a/_toc.yml
+++ b/_toc.yml
@@ -54,6 +54,10 @@ parts:
## CHALLENGES
- caption: Challenges
chapters:
+ - file: challenges/ecml2026
+ sections:
+ - file: challenges/ecml2026/eval
+ - file: challenges/ecml2026/levelconfig
- file: challenges/flatland-benchmarks
- file: challenges/flatland3
sections:
diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md
new file mode 100644
index 0000000..5b48522
--- /dev/null
+++ b/challenges/ecml2026.md
@@ -0,0 +1,25 @@
+ECML 2026
+=========
+
+The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the **Flatland** environment.
+
+
+
+- Follow the [starterkit](https://github.com/flatland-association/ecml2026-starterkit) to make your first submission.
+- Read about the [evaluation metrics](ecml2026/eval) of this edition.
+- Read about the [level configurations](ecml2026/levelconfig) of this edition.
+
+⏱ Timeline
+--------
+
+* Competition start: May 4th, 2026
+* Submission closure: June 8th, 2026 (AoE)
+* Winner announcement: June 15th, 2026
+
+
+
+⭐ Supported Flatland Versions
+-----------------------------
+You must use Flatland version [4.2.5](https://github.com/flatland-association/flatland-rl/releases/tag/v4.2.5) (forthcoming).
\ No newline at end of file
diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md
new file mode 100644
index 0000000..12ebec9
--- /dev/null
+++ b/challenges/ecml2026/eval.md
@@ -0,0 +1,100 @@
+Evaluation Metrics
+===
+
+The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland
+environment.
+
+In this edition, we encourage participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation
+metrics are designed accordingly. However, we are also open to other approaches, e.g. operations research, and encourage participants to benchmark
+their state-of-the-art algorithms.
+
+
+⚖ Evaluation metrics
+---
+
+### Normalized Episode Rewards
+
+The primary metric is the **normalized return** of your agents: the higher, the better.
+
+What is the **normalized return**?
+
+- The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode, as described
+  in [rewards.md](../../environment/environment/rewards.md).
+- To **normalize** these returns, we scale them so that they stay in the range $[0.0, 1.0]$. To guarantee this, the penalty per agent is capped at
+  ```max_episode_steps```. The normalized reward allows comparing results between environments of different dimensions and different numbers of agents.
+
+In code:
+
+```python
+# Clip each agent's penalty at -max_episode_steps, average over agents and steps, then shift by +1 into [0.0, 1.0].
+normalized_reward = sum(max(cumulative_rewards[agent.handle], -self.env.max_episode_steps)
+                        for agent in agents) / (self.env.max_episode_steps * self.env.get_num_agents()) + 1
+```
+
+An episode finishes when all trains have reached their targets, or when the maximum number of time steps is reached. Therefore:
+
+- The **minimum possible value** (i.e. worst possible) is 0.0, which occurs if every agent accumulates the maximum penalty; in particular, none of the
+  agents reaches its goal during the episode.
+- The **maximum possible value** (i.e. best possible) is 1.0, which occurs if all agents reach their targets and intermediate stops on time, i.e.
+  without receiving any penalty (see the numeric check below).
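+
+A quick numeric check of these bounds, with made-up numbers:
+
+```python
+# 2 agents, max_episode_steps = 100; the second agent's return is clipped to -100.
+max_episode_steps, n_agents = 100, 2
+clipped = [max(r, -max_episode_steps) for r in [-20, -150]]
+normalized = sum(clipped) / (max_episode_steps * n_agents) + 1
+assert abs(normalized - 0.4) < 1e-9  # 1 + (-120 / 200)
+```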
+
+### Submission Score
+
+The submission score is the sum of the normalized scenario rewards.
+
+Evaluation stops early when a submission does not reach the threshold of 25% completed agents within a level (5 scenarios), as sketched below.
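+
+A minimal sketch of this scoring rule; the names `normalized_reward` and `percentage_done` are hypothetical, not the evaluator's actual API:
+
+```python
+def submission_score(levels: list[list[dict]]) -> float:
+    """Sum normalized rewards over all scenarios, stopping after the first level below the threshold."""
+    score = 0.0
+    for level in levels:  # each level holds 5 scenario results
+        score += sum(s["normalized_reward"] for s in level)
+        if sum(s["percentage_done"] for s in level) / len(level) < 0.25:
+            break  # this level still counts, but evaluation stops here
+    return score
+```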
+
+### Factors in reward function
+
+The factors for the [reward function](../../environment/environment/rewards.md) in this competition are:
+
+| factor | value |
+|-------------------------------------------|:-----:|
+| journey not started (cancellation factor) | 5 |
+| cancellation time buffer | 0 |
+| delay at target | 1 |
+| target not reached minimum penalty | 100 |
+| intermediate stop not served | 50 |
+| intermediate late arrival | 0.5 |
+| intermediate early departure | 0.5 |
+| collision | 250 |
+
+This configuration is selected with `--rewards flatland.envs.rewards.ECML2026Rewards`.
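+
+To illustrate how these factors enter the per-agent [reward function](../../environment/environment/rewards.md), here is a sketch of two of the penalty terms; it is illustrative only, not the `ECML2026Rewards` implementation:
+
+```python
+CANCELLATION_FACTOR = 5       # phi: journey not started
+CANCELLATION_BUFFER = 0       # pi
+TARGET_NOT_REACHED_MIN = 100  # nu: minimum penalty if the target is missed
+
+def journey_not_started_penalty(shortest_path_time: int) -> float:
+    # phi * (p + pi), applied when the train never departs
+    return CANCELLATION_FACTOR * (shortest_path_time + CANCELLATION_BUFFER)
+
+def target_not_reached_penalty(remaining_time: int) -> float:
+    # max(d, nu): at least the minimum penalty of 100
+    return max(remaining_time, TARGET_NOT_REACHED_MIN)
+```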
+
+⛽ Time and Resource limits
+---
+
+The agents have to act within **time limits**:
+
+- You are allowed up to 30 minutes per scenario.
+- The full evaluation must finish within 5 hours (see `TOTAL_RUNNING_TIME_LIMIT` below).
+
+The agents are evaluated in a container with **resource limits**:
+
+- 4 CPU cores
+- 15 GB of main memory
+
+We do not provide GPUs.
+
+### Detailed overview of resource limits
+
+| Limit[^1] | Value | Submission outcome | Details |
+|-----------|-------|--------------------|---------|
+| `dailyLimit` | `2` | Not created | Error shown in the frontend as `429 TOO_MANY_REQUESTS` from the backend. |
+| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | The submission pod must be listed by then, i.e. image pulling has started. |
+| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | The submission pod must have reached running state by then, i.e. image pulling is done. |
+| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | Per scenario; evaluation is terminated and results do not include the overlong scenario. |
+| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | All scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation; results do not include the overlong scenario. |
+| `ACTIVE_DEADLINE_SECONDS` | `3600` (1h) | Failure/cleanup | Everything including technical overhead for starting pods for the submission. |
+| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low`; results include the failing test, but evaluation stops after it. |
+| `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}` | Failure | Resource limits for the pod running the orchestration. |
+| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | Resource limits for the pod running the submission. |
+| `ORCHESTRATION_JOB_ACTIVE_DEADLINE_SECONDS` | `28800` (8h) | Failure/cleanup | Everything including technical overhead for starting pods for orchestration and evaluation. |
+
+[^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes)
+
+
+
+📪 Daily Submission Limits and Submission Closure
+---
+You can submit up to 2 times per day. Submissions close on June 8th, 2026 (AoE).
+
+
diff --git a/challenges/ecml2026/levelconfig.md b/challenges/ecml2026/levelconfig.md
new file mode 100644
index 0000000..31f281c
--- /dev/null
+++ b/challenges/ecml2026/levelconfig.md
@@ -0,0 +1,12 @@
+# Level Configurations
+
+| level   | #scenarios | number of agents      | max. number of intermediate stops | properties | malfunctions |
+|---------|:----------:|-----------------------|:---------------------------------:|------------|--------------|
+| level_0 | 5          | {8,11,14,26,28}       | {3,3,4,6,6}                       | One train per Line starting at t=0 | None |
+| level_1 | 5          | {36,50,62,118,210}    | {3,3,4,6,6}                       | Multiple trains per Line, different starting times, larger travel factor (more time for the journey) | None |
+| level_2 | 5          | {90,125,150,300,532}  | {3,3,4,6,6}                       | More trains, tighter schedules (periodicity & travel factor) | None |
+| level_3 | 5          | {36,50,62,118,210}    | {3,3,4,6,6}                       | Like level 1, but with breakdowns | Breakdowns |
+| level_4 | 5          | {90,125,150,300,532}  | {3,3,4,6,6}                       | Like level 2, but with breakdowns and departure delays | Breakdowns and departure delays |
+| level_5 | 5          | {90,125,150,300,532}  | {3,3,4,6,6}                       | Like level 4, but with more severe malfunctions (more frequent & longer) | Breakdowns and departure delays |
+| level_6 | 5          | {532,532,532,532,532} | {6,6,6,6,6}                       | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure |
+
diff --git a/environment/environment/rewards.md b/environment/environment/rewards.md
index d719f66..7e3053f 100644
--- a/environment/environment/rewards.md
+++ b/environment/environment/rewards.md
@@ -41,7 +41,7 @@ g_i = &
% journey not started:
+ \underbrace{(1 - \Delta_1) \cdot \phi \cdot (-(p + \pi))}_{\text{journey not started}}
% target not reached:
-+ \underbrace{(1 - \mathrm{A}_J) \cdot (-d)}_{\text{target not reached}}\\
++ \underbrace{(1 - \mathrm{A}_J) \cdot \min\{-\nu, -d\}}_{\text{target not reached}}\\
& + \sum_{j=2}^{J-1} \Big[
% intermediate late arrival
\underbrace{\mathrm{A}_j \cdot \alpha \cdot \min \{\alpha_j - a_j,0\}}_{\text{late arrival}}
@@ -55,22 +55,24 @@ g_i = &
$$
where $J$ is the number of stops (including the departure at the start, as well as the target) and $T$ is the number of timesteps of the episode.
-The symbols are described in Table~\ref{tab:events}.
-
-| | penalty factor ($\geq 0$) | event $\in \{0,1\}$ | scheduled | actual | description |
-|:-----------------------------|:-------------------------:|:-------------------:|:---------:|:------:|-------------|
-| delay at target | 1 | $\mathrm{A}_J$ | $\alpha_J$ | $a_J$ | $\mathrm{A}_J$ latest arrival and $a_J$ actual arrival at target $J$ |
-| journey not started | $\phi$, $\pi$ | $1-\Delta_1$ | | $p$ | cancellation factor $\phi$ and buffer $\pi$, $p$ is the shortest path from start to target |
-| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$ | time $d$ remaining on shortest path towards target |
-| intermediate late arrival | $\alpha$ | $\mathrm{A}_j$ | $\alpha_j$ | $a_j$ | latest arrival $\mathrm{A}_j$, actual arrival time $a_j$ at intermediate stop $j=2,\ldots,J-1$ |
-| intermediate stop not served | $\mu$ | $1-\mathrm{A}_j$ | | | intermediate stop $j$ not served, $j=2,\ldots,J-1$ |
-| intermediate early departure | $\delta$ | $\Delta_j$ | $\delta_j$ | $d_j$ | earliest departure from stop $j$, actual departure time $d_j$ at intermediate stop $j=2,\ldots,J-1$ |
-| collision | $\kappa$ | $\mathrm{K}_t$ | | $v(t)$ | collision at time $t$ with speed $v(t)$ |
+The symbols are described in the table below.
+
+| | penalty factor ($\geq 0$) | event $\in \{0,1\}$ | scheduled | actual | description |
+|:-----------------------------|:-------------------------:|:-------------------:|:---------:|:----------:|-------------|
+| delay at target | 1 | $\mathrm{A}_J$ | $\alpha_J$ | $a_J$ | $\mathrm{A}_J$ latest arrival and $a_J$ actual arrival at target $J$ |
+| journey not started | $\phi$, $\pi$ | $1-\Delta_1$ | | $p$ | cancellation factor $\phi$ and buffer $\pi$, $p$ is the shortest path from start to target |
+| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$, $\nu$ | time $d$ remaining on the shortest path to the target at the maximum speed of the corresponding train category, or the minimum penalty $\nu$ for an unreached target if $d < \nu$ |
+| intermediate late arrival | $\alpha$ | $\mathrm{A}_j$ | $\alpha_j$ | $a_j$ | latest arrival $\mathrm{A}_j$, actual arrival time $a_j$ at intermediate stop $j=2,\ldots,J-1$ |
+| intermediate stop not served | $\mu$ | $1-\mathrm{A}_j$ | | | intermediate stop $j$ not served, $j=2,\ldots,J-1$ |
+| intermediate early departure | $\delta$ | $\Delta_j$ | $\delta_j$ | $d_j$ | earliest departure from stop $j$, actual departure time $d_j$ at intermediate stop $j=2,\ldots,J-1$ |
+| collision | $\kappa$ | $\mathrm{K}_t$ | | $v(t)$ | collision at time $t$ with speed $v(t)$ |
Note that the simulation enforces that agents cannot start earlier than $\delta_1$ at their start. On the other hand, early departure at intermediate stops is
not enforced by the simulation, but will be penalized by the rewards function.
-Also note that order of intermediate stops is also not enforced by the simulation in case of overlapping time windows.
+Note also that the order of intermediate stops is not enforced by the simulation in case of overlapping time windows.
+To compare results between environments of different dimensions and different numbers of agents, the reward can be normalized such that the normalized reward lies in the range $[0.0, 1.0]$. For each agent, the penalty is capped at ```-max_episode_steps```, which guarantees normalization regardless of the parametrization of the rewards.
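+
+In symbols, a sketch of one such normalization, with $g_i$ the return of agent $i$, $N$ the number of agents, and $T$ the value of ```max_episode_steps```:
+
+$$
+\bar{g} = 1 + \frac{1}{N \cdot T} \sum_{i=1}^{N} \max\{g_i, -T\}
+$$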
+
```{admonition} Code reference
The reward is calculated in [envs/rewards.py](https://github.com/flatland-association/flatland-rl/blob/main/flatland/envs/rewards.py)
```