23 commits:

- `50ab172` docs: Flatland4 Railway Competition. (chenkins, Mar 30, 2026)
- `5853eb9` docs: Flatland4 Railway Competition. (chenkins, Mar 30, 2026)
- `cd274e0` docs: Flatland4 Railway Competition (CleverManu, Mar 31, 2026)
- `21bbaf9` docs: updated reward function with min penalty for delay at target (CleverManu, Apr 13, 2026)
- `48f04bb` docs: updated reward parametrization (CleverManu, Apr 13, 2026)
- `3e01698` docs: cleanup table (CleverManu, Apr 13, 2026)
- `0a761d3` docs: rename flatland4 to ecml2026. (chenkins, Apr 13, 2026)
- `d42903a` docs: add timeline and supported Flatland versions. (chenkins, Apr 13, 2026)
- `d2d2a11` Apply suggestions from code review (chenkins, Apr 13, 2026)
- `84b9b8c` docs: add normalization description (CleverManu, Apr 20, 2026)
- `033cfc1` docs: add normalization description for DefaultReward (CleverManu, Apr 20, 2026)
- `988445b` docs: refine reward description (CleverManu, Apr 20, 2026)
- `8961990` docs: add technical table on evaluation constraints. (chenkins, Apr 21, 2026)
- `5e783b8` docs: add technical table on evaluation constraints. (chenkins, Apr 21, 2026)
- `6352eaa` Update link to ecml2026-starterkit. (chenkins, Apr 23, 2026)
- `c4318ad` docs: add daily submission limit. (chenkins, Apr 23, 2026)
- `39ab20f` docs: add reference to ECML2026Rewards class. (chenkins, Apr 24, 2026)
- `7aab989` docs: update level description (CleverManu, Apr 27, 2026)
- `cabdc6f` docs: consolidate incosistency (CleverManu, Apr 27, 2026)
- `312ceaa` docs(ecml2026): update limits. (chenkins, Apr 27, 2026)
- `aae4403` docs(ecml2026): update limits. (chenkins, Apr 29, 2026)
- `d937701` docs: update level config (CleverManu, Apr 29, 2026)
- `b73bd88` docs(ecml2026): clarification on handling of termination causes. (chenkins, Apr 30, 2026)
4 changes: 4 additions & 0 deletions _toc.yml
@@ -54,6 +54,10 @@ parts:
## CHALLENGES
- caption: Challenges
chapters:
- file: challenges/ecml2026
sections:
- file: challenges/ecml2026/eval
- file: challenges/ecml2026/levelconfig
- file: challenges/flatland-benchmarks
- file: challenges/flatland3
sections:
25 changes: 25 additions & 0 deletions challenges/ecml2026.md
@@ -0,0 +1,25 @@
ECML 2026
=========

The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the **Flatland** environment.

<!-- ![Flatland](../assets/images/flatland_wide.png) -->

- Follow the [starterkit](https://github.com/flatland-association/ecml2026-starterkit) to make your first submission.
- Read about the [evaluation metrics](ecml2026/eval) of this edition.
- Read about the [level configurations](ecml2026/levelconfig) of this edition.

⏱ Timeline
--------

* Competition start: May 4th, 2026
* Submission closure: June 8th, 2026 (AoE)
* Winner announcement: June 15th, 2026





⭐ Supported Flatland Versions
-----------------------------
You must use Flatland version [4.2.5](https://github.com/flatland-association/flatland-rl/releases/tag/v4.2.5) (forthcoming).

100 changes: 100 additions & 0 deletions challenges/ecml2026/eval.md
@@ -0,0 +1,100 @@
Evaluation Metrics
===

The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland
environment.

In this edition, we encourage participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation
metrics are designed accordingly. However, we remain open to other approaches as well, e.g. operations research, and encourage participants to benchmark
their state-of-the-art algorithms.


⚖ Evaluation metrics
---

### Normalized Episode Rewards

The primary metric is the **normalized return** of your agents - the higher the better.

What is the **normalized return**?

- The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode, as described
  in [rewards.md](../../environment/environment/rewards.md).
- To **normalize** these returns, we scale them so that they stay in the range $[0.0, 1.0]$. To guarantee this, the penalty per agent is capped at
  ```max_episode_steps```. The normalized reward allows comparing results between environments of different dimensions and different numbers of agents.

In code:

```python
normalized_reward = sum(
    max(cumulative_rewards[agent.handle], -self.env.max_episode_steps)
    for agent in agents
) / (self.env.max_episode_steps * self.env.get_num_agents()) + 1
```

The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached. Therefore:
- The **minimum possible value** (i.e. worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode.
- The **maximum possible value** (i.e. best possible) is 1.0, which occurs if all agents reach their targets and intermediate stops on time, i.e.
  without incurring any penalty.
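
As an illustration, the normalization can be reproduced standalone. This is a minimal sketch with hypothetical episode values (3 agents, `max_episode_steps = 100`); the clipping at `-max_episode_steps` mirrors the snippet above:

```python
# Hypothetical episode: 3 agents, max_episode_steps = 100.
max_episode_steps = 100
cumulative_rewards = {0: 0.0, 1: -40.0, 2: -250.0}  # raw per-agent returns

# Cap each agent's penalty at -max_episode_steps, then rescale to [0, 1].
clipped = [max(r, -max_episode_steps) for r in cumulative_rewards.values()]
normalized_reward = sum(clipped) / (max_episode_steps * len(clipped)) + 1

print(normalized_reward)  # (0 - 40 - 100) / 300 + 1 = 0.5333...
```

Note how agent 2's raw penalty of -250 is clipped to -100, which is what keeps the normalized reward from dropping below 0.0 regardless of how large individual penalties get.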

### Submission Score

The submission score is the sum of the normalized scenario rewards.

Evaluation is stopped when a submission does not reach the threshold of 25% completed agents within a level (5 scenarios).
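
The scoring and early-stopping rules above can be sketched as follows. This is a simplified illustration with a hypothetical helper `submission_score` and made-up values, not the evaluator's actual implementation:

```python
# Sketch: the submission score sums the normalized scenario rewards, and
# evaluation stops after a level whose completion rate is below 25%.
PERCENTAGE_COMPLETE_THRESHOLD = 0.25

def submission_score(levels):
    """levels: list of (normalized_rewards_per_scenario, mean_fraction_of_done_agents)."""
    score = 0.0
    for normalized_rewards, fraction_done in levels:
        score += sum(normalized_rewards)  # this level's scenarios still count
        if fraction_done < PERCENTAGE_COMPLETE_THRESHOLD:
            break  # subsequent levels are not evaluated
    return score

# Hypothetical run: the second level falls below the threshold,
# so the third level is never evaluated.
levels = [
    ([0.8, 0.7, 0.9, 0.6, 0.75], 0.9),
    ([0.2, 0.1, 0.15, 0.1, 0.05], 0.2),
    ([0.5, 0.5, 0.5, 0.5, 0.5], 0.9),
]
print(submission_score(levels))  # 3.75 + 0.6 = 4.35
```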

### Factors in reward function

The factors for the [reward function](../../environment/environment/rewards.md) in this competition are:

| factor | value |
|-------------------------------------------|:-----:|
| journey not started (cancellation factor) | 5 |
| cancellation time buffer | 0 |
| delay at target | 1 |
| target not reached minimum penalty | 100 |
| intermediate stop not served | 50 |
| intermediate late arrival | 0.5 |
| intermediate early departure | 0.5 |
| collision | 250 |

This configuration is implemented using `--rewards flatland.envs.rewards.ECML2026Rewards`.

⛽ Time and Resource limits
---

The agents have to act within **time limits**:

- You are allowed up to 30 minutes per scenario.
- The full evaluation must finish within 5 hours (see `TOTAL_RUNNING_TIME_LIMIT` below).

The agents are evaluated in a container with **resource limits**:

- 4 CPU cores
- 15 GB of main memory.

We do not provide GPUs.

### Detailed overview of resource limits

| Limit[^1] | Value | Submission outcome | Details |
|---------------------------------------------|------------------------------------------------------------------------------------------|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. |
| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. |
| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now |
| `RUNNING_TIME_LIMIT`                        | `1800` (30 min)                                                                          | Success with termination cause | per scenario; evaluation terminated; results do not include the overlong scenario                                                                       |
| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation; results do not include the overlong scenario |
| `ACTIVE_DEADLINE_SECONDS`                   | `600` (10 min)                                                                           | Failure/cleanup                | everything including technical overhead for starting pods for submission                                                                                |
| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low`; results do include the test, but stop after the test. |
| `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}`   | Failure                        | resource limits for pod running the orchestration                                                                                                       |
| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission |
| `ORCHESTRATION_JOB_ACTIVE_DEADLINE_SECONDS` | `28800` (8h) | Failure/cleanup | everything including technical overhead for starting pods for orchestration and evaluation |

[^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes)



📪 Daily Submission Limits and Submission Closure
---
You can submit up to 2 times per day.


12 changes: 12 additions & 0 deletions challenges/ecml2026/levelconfig.md
@@ -0,0 +1,12 @@
# Level Configurations

| level   | #scenarios | number of agents      | max. number of intermediate stops | properties                                                                                       | malfunctions                                    |
|---------|:----------:|-----------------------|:---------------------------------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------|
| level_0 | 5          | {8,11,14,26,28}       | {3,3,4,6,6}                       | One train per Line starting at t=0                                                               | None                                            |
| level_1 | 5          | {36,50,62,118,210}    | {3,3,4,6,6}                       | Multiple trains per Line, different starting times, larger travel factor (more time for journey) | None                                            |
| level_2 | 5          | {90,125,150,300,532}  | {3,3,4,6,6}                       | More trains, tighter schedules (periodicity & travel factor)                                     | None                                            |
| level_3 | 5          | {36,50,62,118,210}    | {3,3,4,6,6}                       | Like level 1, but with malfunctions                                                              | Breakdowns                                      |
| level_4 | 5          | {90,125,150,300,532}  | {3,3,4,6,6}                       | Like level 2, but with malfunctions                                                              | Breakdowns and departure delays                 |
| level_5 | 5          | {90,125,150,300,532}  | {3,3,4,6,6}                       | Like level 4, but with more severe malfunctions (more frequent & longer)                         | Breakdowns and departure delays                 |
| level_6 | 5          | {532,532,532,532,532} | {6,6,6,6,6}                       | Full map only, progressively more malfunctions (including infrastructure disruptions)            | Breakdowns, departure delays and infrastructure |

26 changes: 14 additions & 12 deletions environment/environment/rewards.md
@@ -41,7 +41,7 @@ g_i = &
% journey not started:
+ \underbrace{(1 - \Delta_1) \cdot \phi \cdot (-(p + \pi))}_{\text{journey not started}}
% target not reached:
+ \underbrace{(1 - \mathrm{A}_J) \cdot \min\{-\nu, -d\}}_{\text{target not reached}}\\
& + \sum_{j=2}^{J-1} \Big[
% intermediate late arrival
\underbrace{\mathrm{A}_j \cdot \alpha \cdot \min \{\alpha_j - a_j,0\}}_{\text{late arrival}}
@@ -55,22 +55,24 @@ g_i = &
$$

where $J$ is the number of stops (including the departure at the start, as well as the target) and $T$ is the number of timesteps of the episode.
The symbols are described in the table below.

| | penalty factor <br/>($\geq 0$) | event <br/> $\in \{0,1\}$ | scheduled | actual | description |
|:-----------------------------|--------------------------------|---------------------------|------------|------------|-----------------------------------------------------------------------------------------------------------|
| delay at target | 1 | $\mathrm{A}_J$ | $\alpha_J$ | $a_J$ | $\mathrm{A}_J$ latest arrival and $a_J$ actual arrival at target $J$ |
| journey not started | $\phi$, $\pi$ | $1-\Delta_1$ | | $p$ | cancellation factor $\phi$ and buffer $\pi$, <br/> $p$ is the shortest path from start to target |
| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$, $\nu$ | time $d$ remaining on shortest path to target with maximum speed for corresponding train category, or minimum penalty for target not reached $\nu$ if $d < \nu$ |
| intermediate late arrival | $\alpha$ | $\mathrm{A}_j$ | $\alpha_j$ | $a_j$ | latest arrival $\mathrm{A}_j$, actual arrival time $a_j$ <br/> at intermediate stop $j=2,\ldots,J-1$ |
| intermediate stop not served | $\mu$ | $1-\mathrm{A}_j$ | | | intermediate stop $j$ not served, $j=2,\ldots,J-1$ |
| intermediate early departure | $\delta$ | $\Delta_j$ | $\delta_j$ | $d_j$ | earliest departure from stop $j$, actual departure time $d_j$ <br/> at intermediate stop $j=2,\ldots,J-1$ |
| collision | $\kappa$ | $\mathrm{K}_t$ | | $v(t)$ | collision at time $t$ with speed $v(t)$ |

Note that the simulation enforces that agents cannot depart earlier than $\delta_1$ from their start. Early departure at intermediate stops, on the other hand,
is not enforced by the simulation, but is penalized by the reward function.
Also note that the order of intermediate stops is not enforced by the simulation in case of overlapping time windows.
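
The target-not-reached term, $(1 - \mathrm{A}_J) \cdot \min\{-\nu, -d\}$, can be illustrated with a small sketch. The helper below is hypothetical (not the library's API); $\nu$ is the minimum penalty for a target not reached and $d$ the remaining shortest-path time:

```python
# Sketch of the target-not-reached term (1 - A_J) * min(-nu, -d).
def target_not_reached_penalty(arrived_at_target: bool, d: float, nu: float) -> float:
    """d: time remaining on shortest path to target; nu: minimum penalty."""
    if arrived_at_target:  # A_J = 1, term vanishes
        return 0.0
    return min(-nu, -d)

# With nu = 100 (the competition's parametrization), an agent close to its
# target still pays the minimum penalty; a far-away agent pays -d.
print(target_not_reached_penalty(False, d=30.0, nu=100.0))   # min(-100, -30)  = -100
print(target_not_reached_penalty(False, d=180.0, nu=100.0))  # min(-100, -180) = -180
```

This shows the effect of $\nu$: the penalty never shrinks below the minimum even when the agent stops just short of its target.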

To compare results between environments of different dimensions and different numbers of agents, the reward can be normalized such that the normalized reward lies in the range $[0.0, 1.0]$. For each agent, the penalty is capped at ```- max_episode_steps```. This guarantees normalization regardless of the parametrization of the rewards.

```{admonition} Code reference
The reward is calculated in [envs/rewards.py](https://github.com/flatland-association/flatland-rl/blob/main/flatland/envs/rewards.py)
```