From 50ab17287cc5ed8f1bd81300749a8cad2e05c818 Mon Sep 17 00:00:00 2001 From: chenkins Date: Mon, 30 Mar 2026 10:36:03 +0200 Subject: [PATCH 01/28] docs: Flatland4 Railway Competition. --- _toc.yml | 3 ++ challenges/flatland4.md | 9 ++++++ challenges/flatland4/eval.md | 54 ++++++++++++++++++++++++++++++++++++ 3 files changed, 66 insertions(+) create mode 100644 challenges/flatland4.md create mode 100644 challenges/flatland4/eval.md diff --git a/_toc.yml b/_toc.yml index 4cd2c42..2ef9ef7 100755 --- a/_toc.yml +++ b/_toc.yml @@ -54,6 +54,9 @@ parts: ## CHALLENGES - caption: Challenges chapters: + - file: challenges/flatland4 + sections: + - file: challenges/flatland3/eval - file: challenges/flatland-benchmarks - file: challenges/flatland3 sections: diff --git a/challenges/flatland4.md b/challenges/flatland4.md new file mode 100644 index 0000000..011e586 --- /dev/null +++ b/challenges/flatland4.md @@ -0,0 +1,9 @@ +Flatland 4 +========== + +The **[Flatland 4 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the **Flat**land environment. + + + +- Follow the [starterkit](https://github.com/flatland-association/flatland-benchmarks-f3-starterkit) to make your first submission. +- Read about the [evaluations metrics](flatland3/eval) of this edition. \ No newline at end of file diff --git a/challenges/flatland4/eval.md b/challenges/flatland4/eval.md new file mode 100644 index 0000000..d822259 --- /dev/null +++ b/challenges/flatland4/eval.md @@ -0,0 +1,54 @@ +Evaluation Metrics +=== + +The **[Flatland 3 challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the Flatland environment. + +In this edition, we are encouraging participants to develop innovative solutions that leverage **reinforcement learning**. The evaluation metrics and prize +distribution are designed accordingly. 
+ + +⚖ Evaluation metrics +--- + +### Normalized Episode Rewards + +In this edition, the primary metrics use the **normalized return** from your agents - the higher the better. + +What is the **normalized return**? + +- The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode as described + in [rewards.md](../../environment/environment/rewards.md) +- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. This makes it possible to compare results between environments of + different dimensions. + +In code: + +```python +normalized_reward = (cumulative_reward / (self.env._max_episode_steps * self.env.get_num_agents())) + 1 +``` + +The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached. Therefore: + +- The **minimum possible value** (ie worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode. +- The **maximum possible value** (ie best possible) is 1.0, which would occur if all the agents would reach their targets in one time step, which is generally + not achievable. + +### Submission Score + +The submission score is the sum of the normalized episode rewards. + +Evaluation is stopped when a submission does not reach the threshold of 25% completed agents within a test (10 scenarios). + +⏱ Time and Resource limits +--- + +The agents have to act within **time limits**: + +- You are allowed up to 30 minutes per episode. +- The full evaluation must finish in 2 hours. + +The agents are evaluated in a container with **resource limits** + +- 4 CPU cores +- 15 GB of main memory. + We do not provide GPUs. \ No newline at end of file From 5853eb99572f3e873780b5e5a0ff551603066011 Mon Sep 17 00:00:00 2001 From: chenkins Date: Mon, 30 Mar 2026 10:43:09 +0200 Subject: [PATCH 02/28] docs: Flatland4 Railway Competition. 
--- _toc.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_toc.yml b/_toc.yml index 2ef9ef7..0509985 100755 --- a/_toc.yml +++ b/_toc.yml @@ -56,7 +56,7 @@ parts: chapters: - file: challenges/flatland4 sections: - - file: challenges/flatland3/eval + - file: challenges/flatland4/eval - file: challenges/flatland-benchmarks - file: challenges/flatland3 sections: From cd274e04f88d2a0ff61bab42ce946f1e961b0a89 Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Tue, 31 Mar 2026 15:02:12 +0200 Subject: [PATCH 03/28] docs: Flatland4 Railway Competition --- challenges/flatland4/eval.md | 34 ++++++++++++++++++++--------- challenges/flatland4/levelconfig.md | 12 ++++++++++ 2 files changed, 36 insertions(+), 10 deletions(-) create mode 100644 challenges/flatland4/levelconfig.md diff --git a/challenges/flatland4/eval.md b/challenges/flatland4/eval.md index d822259..8b5d319 100644 --- a/challenges/flatland4/eval.md +++ b/challenges/flatland4/eval.md @@ -1,10 +1,9 @@ Evaluation Metrics === -The **[Flatland 3 challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the Flatland environment. +The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland environment. -In this edition, we are encouraging participants to develop innovative solutions that leverage **reinforcement learning**. The evaluation metrics and prize -distribution are designed accordingly. +In this edition, we are encouraging participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation metrics are designed accordingly. However, we are still open for other solutions as well, e.g. operations research, and encourage participants to benchmark their state-of-the art algorithms ⚖ Evaluation metrics @@ -12,14 +11,13 @@ distribution are designed accordingly. 
### Normalized Episode Rewards -In this edition, the primary metrics use the **normalized return** from your agents - the higher the better. +The primary metrics uses the **normalized return** from your agents - the higher the better. What is the **normalized return**? - The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode as described in [rewards.md](../../environment/environment/rewards.md) -- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. This makes it possible to compare results between environments of - different dimensions. +- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. This makes it possible to compare results between environments of different dimensions and different number of agents. In code: @@ -29,15 +27,31 @@ normalized_reward = (cumulative_reward / (self.env._max_episode_steps * self.env The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached. Therefore: -- The **minimum possible value** (ie worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode. -- The **maximum possible value** (ie best possible) is 1.0, which would occur if all the agents would reach their targets in one time step, which is generally +- The **minimum possible value** (i.e. worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode. +- The **maximum possible value** (i.e. best possible) is 1.0, which would occur if all the agents would reach their targets in one time step, which is generally not achievable. ### Submission Score -The submission score is the sum of the normalized episode rewards. +The submission score is the sum of the normalized scenario rewards. + +Evaluation is stopped when a submission does not reach the threshold of 25% completed agents within a level (5 scenarios). 
+ + +### Factors in reward function + +The factors for the [reward function](../../environment/environment/rewards.md) in this competition are: + +| factor | value | +|------------------------------------|:-----:| +| journey not started (cancellation) | 1 | +| cancellation time buffer | 0 | +| delay at target | 1 | +| intermediate stop not served | 15 | +| intermediate late arrival | 0.5 | +| intermediate early departure | 0.5 | +| collision | 100 | -Evaluation is stopped when a submission does not reach the threshold of 25% completed agents within a test (10 scenarios). ⏱ Time and Resource limits --- diff --git a/challenges/flatland4/levelconfig.md b/challenges/flatland4/levelconfig.md new file mode 100644 index 0000000..bc45088 --- /dev/null +++ b/challenges/flatland4/levelconfig.md @@ -0,0 +1,12 @@ +# Level Configurations + +| level | #scenarios | properties | malfunctions | +|---------|:---:|--------------------------------|---------------------------| +| level_0 | 5 | One train per Line starting at t=0 | None | +| level_1 | 5 | Multiple trains per Line, different starting times, larger travel factor (more time for journey)​ | None | +| level_2 | 5 | More trains, tighter schedules (periodicity & travel factor)​​ | None | +| level_3 | 5 | Like level 1 but with malfunctions​​​ | Breakdowns | +| level_4 | 5 | Like level 2 but with malfunctions​​​ | Breakdowns and departure delays | +| level_5 | 5 | Like level 4 but with more severe malfunctions (more frequent & longer)​ | Breakdowns and departure delays | +| level_6 | 5 | Full map only, progressively more malfunctions (including infrastructure disruptions)​ | Breakdowns, departure delays and infrastructure | + From 21bbaf961de05ab3af18f0d72a39b90d2bba056c Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 13 Apr 2026 10:21:44 +0200 Subject: [PATCH 04/28] docs: updated reward function with min penalty for delay at target --- environment/environment/rewards.md | 24 ++++++++++++------------ 1 file changed, 12 
insertions(+), 12 deletions(-) diff --git a/environment/environment/rewards.md b/environment/environment/rewards.md index d719f66..bf73bba 100644 --- a/environment/environment/rewards.md +++ b/environment/environment/rewards.md @@ -41,7 +41,7 @@ g_i = & % journey not started: + \underbrace{(1 - \Delta_1) \cdot \phi \cdot (-(p + \pi))}_{\text{journey not started}} % target not reached: -+ \underbrace{(1 - \mathrm{A}_J) \cdot (-d)}_{\text{target not reached}}\\ ++ \underbrace{(1 - \mathrm{A}_J) \cdot \min\{-\nu, -d\}}_{\text{target not reached}}\\ & + \sum_{j=2}^{J-1} \Big[ % intermediate late arrival \underbrace{\mathrm{A}_j \cdot \alpha \cdot \min \{\alpha_j - a_j,0\}}_{\text{late arrival}} @@ -55,17 +55,17 @@ g_i = & $$ where $J$ is the number of stops (including the departure at the start, as well as the target) and $T$ is the number of timesteps of the episode. -The symbols are described in Table~\ref{tab:events}. - -| | penalty factor
($\geq 0$) | event
$\in \{0,1\}$ | scheduled | actual | description | -|:-----------------------------|--------------------------------|---------------------------|------------|--------|-----------------------------------------------------------------------------------------------------------| -| delay at target | 1 | $\mathrm{A}_J$ | $\alpha_J$ | $a_J$ | $\mathrm{A}_J$ latest arrival and $a_J$ actual arrival at target $J$ | -| journey not started | $\phi$, $\pi$ | $1-\Delta_1$ | | $p$ | cancellation factor $\phi$ and buffer $\pi$,
$p$ is the shortest path from start to target | -| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$ | time $d$ remaining on shortest path towards target | -| intermediate late arrival | $\alpha$ | $\mathrm{A}_j$ | $\alpha_j$ | $a_j$ | latest arrival $\mathrm{A}_j$, actual arrival time $a_j$
at intermediate stop $j=2,\ldots,J-1$ | -| intermediate stop not served | $\mu$ | $1-\mathrm{A}_j$ | | | intermediate stop $j$ not served, $j=2,\ldots,J-1$ | -| intermediate early departure | $\delta$ | $\Delta_j$ | $\delta_j$ | $d_j$ | earliest departure from stop $j$, actual departure time $d_j$
at intermediate stop $j=2,\ldots,J-1$ | -| collision | $\kappa$ | $\mathrm{K}_t$ | | $v(t)$ | collision at time $t$ with speed $v(t)$ | +The symbols are described in the table below. + +| | penalty factor
($\geq 0$) | event
$\in \{0,1\}$ | scheduled | actual | description | +|:-----------------------------|--------------------------------|---------------------------|------------|------------|-----------------------------------------------------------------------------------------------------------| +| delay at target | 1 | $\mathrm{A}_J$ | $\alpha_J$ | $a_J$ | $\mathrm{A}_J$ latest arrival and $a_J$ actual arrival at target $J$ | +| journey not started | $\phi$, $\pi$ | $1-\Delta_1$ | | $p$ | cancellation factor $\phi$ and buffer $\pi$,
$p$ is the shortest path from start to target | +| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$, $\nu$ | time $d$ remaining on shortest path towards target or minimum penalty for target not reached $\nu$ if $d < \nu$ | +| intermediate late arrival | $\alpha$ | $\mathrm{A}_j$ | $\alpha_j$ | $a_j$ | latest arrival $\mathrm{A}_j$, actual arrival time $a_j$
at intermediate stop $j=2,\ldots,J-1$ | +| intermediate stop not served | $\mu$ | $1-\mathrm{A}_j$ | | | intermediate stop $j$ not served, $j=2,\ldots,J-1$ | +| intermediate early departure | $\delta$ | $\Delta_j$ | $\delta_j$ | $d_j$ | earliest departure from stop $j$, actual departure time $d_j$
at intermediate stop $j=2,\ldots,J-1$ | +| collision | $\kappa$ | $\mathrm{K}_t$ | | $v(t)$ | collision at time $t$ with speed $v(t)$ | Note that the simulation enforces that agents cannot start earlier than $\delta_1$ at their start. On the other hand, early departure at intermediate stops is not enforced by the simulation, but will be penalized by the rewards function. From 48f04bbcf98982ea8f7148753cdefb2ed7723dc7 Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 13 Apr 2026 10:27:34 +0200 Subject: [PATCH 05/28] docs: updated reward parametrization --- challenges/flatland4/eval.md | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/challenges/flatland4/eval.md b/challenges/flatland4/eval.md index 8b5d319..6bf1b65 100644 --- a/challenges/flatland4/eval.md +++ b/challenges/flatland4/eval.md @@ -42,15 +42,16 @@ Evaluation is stopped when a submission does not reach the threshold of 25% comp The factors for the [reward function](../../environment/environment/rewards.md) in this competition are: -| factor | value | -|------------------------------------|:-----:| -| journey not started (cancellation) | 1 | -| cancellation time buffer | 0 | -| delay at target | 1 | -| intermediate stop not served | 15 | -| intermediate late arrival | 0.5 | -| intermediate early departure | 0.5 | -| collision | 100 | +| factor | value | +|-------------------------------------------|:-----:| +| journey not started (cancellation factor) | 5 | +| cancellation time buffer | 0 | +| delay at target | 1 | +| target not reached minimum penalty | 100 | +| intermediate stop not served | 50 | +| intermediate late arrival | 0.5 | +| intermediate early departure | 0.5 | +| collision | 250 | ⏱ Time and Resource limits @@ -58,8 +59,8 @@ The factors for the [reward function](../../environment/environment/rewards.md) The agents have to act within **time limits**: -- You are allowed up to 30 minutes per episode. -- The full evaluation must finish in 2 hours. 
+- You are allowed up to 30 minutes per scenario. +- The full evaluation must finish in 4 hours. The agents are evaluated in a container with **resource limits** From 3e01698801d7ac808792fff34f5a574c864b4c4b Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 13 Apr 2026 10:32:25 +0200 Subject: [PATCH 06/28] docs: cleanup table --- challenges/flatland4/levelconfig.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/challenges/flatland4/levelconfig.md b/challenges/flatland4/levelconfig.md index bc45088..5e33957 100644 --- a/challenges/flatland4/levelconfig.md +++ b/challenges/flatland4/levelconfig.md @@ -1,12 +1,12 @@ # Level Configurations -| level | #scenarios | properties | malfunctions | -|---------|:---:|--------------------------------|---------------------------| -| level_0 | 5 | One train per Line starting at t=0 | None | -| level_1 | 5 | Multiple trains per Line, different starting times, larger travel factor (more time for journey)​ | None | -| level_2 | 5 | More trains, tighter schedules (periodicity & travel factor)​​ | None | -| level_3 | 5 | Like level 1 but with malfunctions​​​ | Breakdowns | -| level_4 | 5 | Like level 2 but with malfunctions​​​ | Breakdowns and departure delays | -| level_5 | 5 | Like level 4 but with more severe malfunctions (more frequent & longer)​ | Breakdowns and departure delays | -| level_6 | 5 | Full map only, progressively more malfunctions (including infrastructure disruptions)​ | Breakdowns, departure delays and infrastructure | +| level | #scenarios | properties | malfunctions | +|---------|:----------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------| +| level_0 | 5 | One train per Line starting at t=0 | None | +| level_1 | 5 | Multiple trains per Line, different starting times, larger travel factor (more time for journey)​ | None | +| level_2 | 5 | More trains, tighter schedules 
(periodicity & travel factor)​​ | None | +| level_3 | 5 | Like level 1 but with malfunctions​​​ | Breakdowns | +| level_4 | 5 | Like level 2 but with malfunctions​​​ | Breakdowns and departure delays | +| level_5 | 5 | Like level 4 but with more severe malfunctions (more frequent & longer)​ | Breakdowns and departure delays | +| level_6 | 5 | Full map only, progressively more malfunctions (including infrastructure disruptions)​ | Breakdowns, departure delays and infrastructure | From 0a761d301a6d3e6e5df15ef03105a26c18c40484 Mon Sep 17 00:00:00 2001 From: chenkins Date: Mon, 13 Apr 2026 10:54:28 +0200 Subject: [PATCH 07/28] docs: rename flatland4 to ecml2026. --- _toc.yml | 5 +++-- challenges/ecml2026.md | 10 ++++++++++ challenges/{flatland4 => ecml2026}/eval.md | 0 challenges/ecml2026/levelconfig.md | 12 ++++++++++++ challenges/flatland4.md | 9 --------- challenges/flatland4/levelconfig.md | 12 ------------ 6 files changed, 25 insertions(+), 23 deletions(-) create mode 100644 challenges/ecml2026.md rename challenges/{flatland4 => ecml2026}/eval.md (100%) create mode 100644 challenges/ecml2026/levelconfig.md delete mode 100644 challenges/flatland4.md delete mode 100644 challenges/flatland4/levelconfig.md diff --git a/_toc.yml b/_toc.yml index 0509985..c728849 100755 --- a/_toc.yml +++ b/_toc.yml @@ -54,9 +54,10 @@ parts: ## CHALLENGES - caption: Challenges chapters: - - file: challenges/flatland4 + - file: challenges/ecml2026 sections: - - file: challenges/flatland4/eval + - file: challenges/ecml2026/eval + - file: challenges/ecml2026/levelconfig - file: challenges/flatland-benchmarks - file: challenges/flatland3 sections: diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md new file mode 100644 index 0000000..6fc5655 --- /dev/null +++ b/challenges/ecml2026.md @@ -0,0 +1,10 @@ +ECML 2026 +========= + +The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the **Flat**land environment. 
+ + + +- Follow the [starterkit](https://github.com/flatland-association/flatland-benchmarks-f3-starterkit) to make your first submission. +- Read about the [evaluations metrics](ecml2026/eval) of this edition. +- Read about the [level configurations](ecml2026/levelconfig) of this edition. \ No newline at end of file diff --git a/challenges/flatland4/eval.md b/challenges/ecml2026/eval.md similarity index 100% rename from challenges/flatland4/eval.md rename to challenges/ecml2026/eval.md diff --git a/challenges/ecml2026/levelconfig.md b/challenges/ecml2026/levelconfig.md new file mode 100644 index 0000000..2244ab4 --- /dev/null +++ b/challenges/ecml2026/levelconfig.md @@ -0,0 +1,12 @@ +# Level Configurations + +| level | #scenarios | properties | malfunctions | +|---------|:----------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------| +| level_0 | 5 | One train per Line starting at t=0 | None | +| level_1 | 5 | Multiple trains per Line, different starting times, larger travel factor (more time for journey) | None | +| level_2 | 5 | More trains, tighter schedules (periodicity & travel factor) | None | +| level_3 | 5 | Like level 1 but with malfunctions | Breakdowns | +| level_4 | 5 | Like level 2 but with malfunctions | Breakdowns and departure delays | +| level_5 | 5 | Like level 4 but with more severe malfunctions (more frequent & longer) | Breakdowns and departure delays | +| level_6 | 5 | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure | + diff --git a/challenges/flatland4.md b/challenges/flatland4.md deleted file mode 100644 index 011e586..0000000 --- a/challenges/flatland4.md +++ /dev/null @@ -1,9 +0,0 @@ -Flatland 4 -========== - -The **[Flatland 4 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the **Flat**land environment. 
- - - -- Follow the [starterkit](https://github.com/flatland-association/flatland-benchmarks-f3-starterkit) to make your first submission. -- Read about the [evaluations metrics](flatland3/eval) of this edition. \ No newline at end of file diff --git a/challenges/flatland4/levelconfig.md b/challenges/flatland4/levelconfig.md deleted file mode 100644 index 5e33957..0000000 --- a/challenges/flatland4/levelconfig.md +++ /dev/null @@ -1,12 +0,0 @@ -# Level Configurations - -| level | #scenarios | properties | malfunctions | -|---------|:----------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------| -| level_0 | 5 | One train per Line starting at t=0 | None | -| level_1 | 5 | Multiple trains per Line, different starting times, larger travel factor (more time for journey)​ | None | -| level_2 | 5 | More trains, tighter schedules (periodicity & travel factor)​​ | None | -| level_3 | 5 | Like level 1 but with malfunctions​​​ | Breakdowns | -| level_4 | 5 | Like level 2 but with malfunctions​​​ | Breakdowns and departure delays | -| level_5 | 5 | Like level 4 but with more severe malfunctions (more frequent & longer)​ | Breakdowns and departure delays | -| level_6 | 5 | Full map only, progressively more malfunctions (including infrastructure disruptions)​ | Breakdowns, departure delays and infrastructure | - From d42903a464feeb2a9e58938f556f1faaf9f8901f Mon Sep 17 00:00:00 2001 From: chenkins Date: Mon, 13 Apr 2026 11:08:56 +0200 Subject: [PATCH 08/28] docs: add timeline and supported Flatland versions. 
--- challenges/ecml2026.md | 34 +++++++++++++++++++++++++++++++++- challenges/ecml2026/eval.md | 33 +++++++++++---------------------- 2 files changed, 44 insertions(+), 23 deletions(-) diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md index 6fc5655..7a3136b 100644 --- a/challenges/ecml2026.md +++ b/challenges/ecml2026.md @@ -7,4 +7,36 @@ The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is - Follow the [starterkit](https://github.com/flatland-association/flatland-benchmarks-f3-starterkit) to make your first submission. - Read about the [evaluations metrics](ecml2026/eval) of this edition. -- Read about the [level configurations](ecml2026/levelconfig) of this edition. \ No newline at end of file +- Read about the [level configurations](ecml2026/levelconfig) of this edition. + +⏱ Timeline +-------- + +* Competition start: May 4th, 2026 +* Sumission closure:June 8th, 2026 (AoE) +* Winner announcement: June 15th, 2026 + +⛽ Time and Resource limits +--- + +The agents have to act within **time limits**: + +- You are allowed up to 30 minutes per scenario. +- The full evaluation must finish in 4 hours. + +The agents are evaluated in a container with **resource limits** + +- 4 CPU cores +- 15 GB of main memory. + +We do not provide GPUs. + +📪 Daily Submission Limits and Submission Closure. +--- +You can submit up to 2 times per day. + + + +⭐ Supported Flatland Versions +----------------------------- +You must use Flatland version [4.2.5](https://github.com/flatland-association/flatland-rl/releases/tag/v4.2.5) (forthcoming). \ No newline at end of file diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index 6bf1b65..a344440 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -1,9 +1,12 @@ Evaluation Metrics === -The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland environment. 
+The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland +environment. -In this edition, we are encouraging participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation metrics are designed accordingly. However, we are still open for other solutions as well, e.g. operations research, and encourage participants to benchmark their state-of-the art algorithms +In this edition, we are encouraging participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation +metrics are designed accordingly. However, we are still open for other solutions as well, e.g. operations research, and encourage participants to benchmark +their state-of-the-art algorithms. ⚖ Evaluation metrics @@ -17,7 +20,8 @@ What is the **normalized return**? - The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode as described in [rewards.md](../../environment/environment/rewards.md) -- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. This makes it possible to compare results between environments of different dimensions and different number of agents. +- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. This makes it possible to compare results between environments of + different dimensions and different number of agents. In code: @@ -37,7 +41,6 @@ The submission score is the sum of the normalized scenario rewards. Evaluation is stopped when a submission does not reach the threshold of 25% completed agents within a level (5 scenarios). 
- ### Factors in reward function The factors for the [reward function](../../environment/environment/rewards.md) in this competition are: @@ -47,23 +50,9 @@ The factors for the [reward function](../../environment/environment/rewards.md) | journey not started (cancellation factor) | 5 | | cancellation time buffer | 0 | | delay at target | 1 | -| target not reached minimum penalty | 100 | +| target not reached minimum penalty | 100 | | intermediate stop not served | 50 | -| intermediate late arrival | 0.5 | -| intermediate early departure | 0.5 | -| collision | 250 | - - -⏱ Time and Resource limits ---- - -The agents have to act within **time limits**: - -- You are allowed up to 30 minutes per scenario. -- The full evaluation must finish in 4 hours. - -The agents are evaluated in a container with **resource limits** +| intermediate late arrival | 0.5 | +| intermediate early departure | 0.5 | +| collision | 250 | -- 4 CPU cores -- 15 GB of main memory. - We do not provide GPUs. \ No newline at end of file From d2d2a117b279d1f407919421098f94edc74a0222 Mon Sep 17 00:00:00 2001 From: Christian Eichenberger Date: Mon, 13 Apr 2026 11:10:38 +0200 Subject: [PATCH 09/28] Apply suggestions from code review Co-authored-by: Christian Eichenberger --- challenges/ecml2026.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md index 7a3136b..acc88eb 100644 --- a/challenges/ecml2026.md +++ b/challenges/ecml2026.md @@ -13,7 +13,7 @@ The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is -------- * Competition start: May 4th, 2026 -* Sumission closure:June 8th, 2026 (AoE) +* Submission closure: June 8th, 2026 (AoE) * Winner announcement: June 15th, 2026 ⛽ Time and Resource limits From 84b9b8ce75f59d780b4c05ea641b1984a286598f Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 20 Apr 2026 10:50:08 +0200 Subject: [PATCH 10/28] docs: add normalization description --- challenges/ecml2026/eval.md | 9 
+++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index a344440..ab8eb81 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -20,20 +20,18 @@ What is the **normalized return**? - The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode as described in [rewards.md](../../environment/environment/rewards.md) -- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. This makes it possible to compare results between environments of - different dimensions and different number of agents. +- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. To guarantee this, the maximum penalty per agent can be at most ```max_episode_steps```. This normalized rewards allows to compare results between environments of different dimensions and different number of agents. In code: ```python -normalized_reward = (cumulative_reward / (self.env._max_episode_steps * self.env.get_num_agents())) + 1 +normalized_reward = sum([max(cumulative_rewards[agent.handle], - self.env.max_episode_steps) for agent in agents]) / (self.env.max_episode_steps * self.env.get_num_agents()) + 1 ``` The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached. Therefore: - The **minimum possible value** (i.e. worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode. -- The **maximum possible value** (i.e. best possible) is 1.0, which would occur if all the agents would reach their targets in one time step, which is generally - not achievable. +- The **maximum possible value** (i.e. best possible) is 1.0, which would occur if all the agents would reach their targets and intermediate stops on time, i.e. not receive any penalty. 
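The normalization described above can be sketched as a standalone function. This is only an illustrative sketch under stated assumptions: the name `normalized_return` and the plain mapping from agent handle to cumulative reward are hypothetical and not part of the Flatland API, and rewards are assumed to be penalties (zero or negative), as in the default reward function.

```python
from typing import Mapping


def normalized_return(cumulative_rewards: Mapping[int, float], max_episode_steps: int) -> float:
    """Hypothetical helper: clip per-agent returns, then rescale to [0.0, 1.0].

    Assumes every cumulative reward is <= 0 (pure penalties).
    """
    num_agents = len(cumulative_rewards)
    if num_agents == 0 or max_episode_steps <= 0:
        raise ValueError("need at least one agent and a positive max_episode_steps")

    # Each agent's penalty is capped at -max_episode_steps, so the sum
    # can never fall below -max_episode_steps * num_agents.
    clipped_sum = sum(max(reward, -max_episode_steps) for reward in cumulative_rewards.values())

    # Dividing by the worst case maps the sum into [-1, 0]; adding 1 shifts it to [0, 1].
    return clipped_sum / (max_episode_steps * num_agents) + 1


# Example: three agents, 100 steps. The third agent's penalty of -400 is clipped to -100.
example_score = normalized_return({0: 0.0, 1: -50.0, 2: -400.0}, max_episode_steps=100)
```

With these inputs the clipped sum is `0 - 50 - 100 = -150`, and `-150 / 300 + 1` gives `0.5`. Without the per-agent clipping, the third agent's large penalty alone would push the result below zero, which is why the cap is needed for the $[0.0, 1.0]$ guarantee.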
### Submission Score @@ -55,4 +53,3 @@ The factors for the [reward function](../../environment/environment/rewards.md) | intermediate late arrival | 0.5 | | intermediate early departure | 0.5 | | collision | 250 | - From 033cfc12f4fed1946ddf51ab51c8e6dc97261c9c Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 20 Apr 2026 11:32:06 +0200 Subject: [PATCH 11/28] docs: add normalization description for DefaultReward --- environment/environment/rewards.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/environment/environment/rewards.md b/environment/environment/rewards.md index bf73bba..231e0ef 100644 --- a/environment/environment/rewards.md +++ b/environment/environment/rewards.md @@ -71,6 +71,8 @@ Note that the simulation enforces that agents cannot start earlier than $\delta_ not enforced by the simulation, but will be penalized by the rewards function. Also note that order of intermediate stops is also not enforced by the simulation in case of overlapping time windows. +To compare results between environments of different dimensions and different numbers of agents, the reward can be normalized such that the normalized reward is in the range $[0.0, 1.0]$. For each agent there is a maximum penalty set at ```- max_episode_steps```. This guarantees normalization regardless of the parametrization of the rewards. 
+ ```{admonition} Code reference The reward is calculated in [envs/rewards.py](https://github.com/flatland-association/flatland-rl/blob/main/flatland/envs/rewards.py) ``` From 988445b718383612ed0b1c233fa1923634956962 Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 20 Apr 2026 14:54:56 +0200 Subject: [PATCH 12/28] docs: refine reward description --- environment/environment/rewards.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/environment/environment/rewards.md b/environment/environment/rewards.md index 231e0ef..7e3053f 100644 --- a/environment/environment/rewards.md +++ b/environment/environment/rewards.md @@ -61,7 +61,7 @@ The symbols are described in the table below. |:-----------------------------|--------------------------------|---------------------------|------------|------------|-----------------------------------------------------------------------------------------------------------| | delay at target | 1 | $\mathrm{A}_J$ | $\alpha_J$ | $a_J$ | $\mathrm{A}_J$ latest arrival and $a_J$ actual arrival at target $J$ | | journey not started | $\phi$, $\pi$ | $1-\Delta_1$ | | $p$ | cancellation factor $\phi$ and buffer $\pi$,
$p$ is the shortest path from start to target | -| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$, $\nu$ | time $d$ remaining on shortest path towards target or minimum penalty for target not reached $\nu$ if $d < \nu$ | +| target not reached | 1 | $1-\mathrm{A}_J$ | | $d$, $\nu$ | time $d$ remaining on shortest path to target with maximum speed for corresponding train category, or minimum penalty for target not reached $\nu$ if $d < \nu$ | | intermediate late arrival | $\alpha$ | $\mathrm{A}_j$ | $\alpha_j$ | $a_j$ | latest arrival $\mathrm{A}_j$, actual arrival time $a_j$
at intermediate stop $j=2,\ldots,J-1$ | | intermediate stop not served | $\mu$ | $1-\mathrm{A}_j$ | | | intermediate stop $j$ not served, $j=2,\ldots,J-1$ | | intermediate early departure | $\delta$ | $\Delta_j$ | $\delta_j$ | $d_j$ | earliest departure from stop $j$, actual departure time $d_j$
at intermediate stop $j=2,\ldots,J-1$ | From 8961990de87a939c64c17c98c174476bb456ea84 Mon Sep 17 00:00:00 2001 From: chenkins Date: Tue, 21 Apr 2026 12:03:04 +0200 Subject: [PATCH 13/28] docs: add technical table on evaluation constraints. --- challenges/ecml2026.md | 17 -------------- challenges/ecml2026/eval.md | 47 ++++++++++++++++++++++++++++++++++--- 2 files changed, 44 insertions(+), 20 deletions(-) diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md index acc88eb..85dd5a5 100644 --- a/challenges/ecml2026.md +++ b/challenges/ecml2026.md @@ -16,24 +16,7 @@ The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is * Submission closure: June 8th, 2026 (AoE) * Winner announcement: June 15th, 2026 -⛽ Time and Resource limits ---- -The agents have to act within **time limits**: - -- You are allowed up to 30 minutes per scenario. -- The full evaluation must finish in 4 hours. - -The agents are evaluated in a container with **resource limits** - -- 4 CPU cores -- 15 GB of main memory. - -We do not provide GPUs. - -📪 Daily Submission Limits and Submission Closure. ---- -You can submit up to 2 times per day. diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index ab8eb81..7be89b2 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -20,18 +20,21 @@ What is the **normalized return**? - The **returns** are the sum of Flatland's default rewards your agents accumulate during each episode as described in [rewards.md](../../environment/environment/rewards.md) -- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. To guarantee this, the maximum penalty per agent can be at most ```max_episode_steps```. This normalized rewards allows to compare results between environments of different dimensions and different number of agents. +- To **normalize** these return, we scale them so that they stays in the range $[0.0, 1.0]$. 
To guarantee this, the maximum penalty per agent can be at most + ```max_episode_steps```. This normalized rewards allows to compare results between environments of different dimensions and different number of agents. In code: ```python -normalized_reward = sum([max(cumulative_rewards[agent.handle], - self.env.max_episode_steps) for agent in agents]) / (self.env.max_episode_steps * self.env.get_num_agents()) + 1 +normalized_reward = sum([max(cumulative_rewards[agent.handle], - self.env.max_episode_steps) for agent in agents]) / ( + self.env.max_episode_steps * self.env.get_num_agents()) + 1 ``` The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached. Therefore: - The **minimum possible value** (i.e. worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode. -- The **maximum possible value** (i.e. best possible) is 1.0, which would occur if all the agents would reach their targets and intermediate stops on time, i.e. not receive any penalty. +- The **maximum possible value** (i.e. best possible) is 1.0, which would occur if all the agents would reach their targets and intermediate stops on time, i.e. + not receive any penalty. ### Submission Score @@ -53,3 +56,41 @@ The factors for the [reward function](../../environment/environment/rewards.md) | intermediate late arrival | 0.5 | | intermediate early departure | 0.5 | | collision | 250 | + +⛽ Time and Resource limits +--- + +The agents have to act within **time limits**: + +- You are allowed up to 30 minutes per scenario. +- The full evaluation must finish in 4 hours. + +The agents are evaluated in a container with **resource limits** + +- 4 CPU cores +- 15 GB of main memory. + +We do not provide GPUs. 
+ +### Details + +Limits on the + +| Limit[^1] | Value | Submission outcome | Details | +|---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------| +| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | +| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | +| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | +| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario +| `ACTIVE_DEADLINE_SECONDS` | `18000` (5h) | Success with termination cause | everything including technical overhead for starting pods | +| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | + +[^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes) + + + +📪 Daily Submission Limits and Submission Closure. +--- +You can submit up to 2 times per day. + + From 5e783b872b31003f9e3f8cadbfa115aa713afcfc Mon Sep 17 00:00:00 2001 From: chenkins Date: Tue, 21 Apr 2026 12:59:16 +0200 Subject: [PATCH 14/28] docs: add technical table on evaluation constraints. --- challenges/ecml2026/eval.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index 7be89b2..67a0da2 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -72,9 +72,7 @@ The agents are evaluated in a container with **resource limits** We do not provide GPUs. 
-### Details - -Limits on the +### Detailed overview over resource limits | Limit[^1] | Value | Submission outcome | Details | |---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------| From 6352eaa1997d5d507e16595d79861ec5373442c0 Mon Sep 17 00:00:00 2001 From: Christian Eichenberger Date: Thu, 23 Apr 2026 09:45:36 +0200 Subject: [PATCH 15/28] Update link to ecml2026-starterkit. Co-authored-by: Christian Eichenberger --- challenges/ecml2026.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md index 85dd5a5..5b48522 100644 --- a/challenges/ecml2026.md +++ b/challenges/ecml2026.md @@ -5,7 +5,7 @@ The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is -- Follow the [starterkit](https://github.com/flatland-association/flatland-benchmarks-f3-starterkit) to make your first submission. +- Follow the [starterkit](https://github.com/flatland-association/ecml2026-starterkit) to make your first submission. - Read about the [evaluations metrics](ecml2026/eval) of this edition. - Read about the [level configurations](ecml2026/levelconfig) of this edition. From c4318ad942b07eb1895023441feb03431b77ed3d Mon Sep 17 00:00:00 2001 From: chenkins Date: Thu, 23 Apr 2026 12:13:59 +0200 Subject: [PATCH 16/28] docs: add daily submission limit. --- challenges/ecml2026/eval.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index 67a0da2..f66bbb5 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -76,10 +76,11 @@ We do not provide GPUs. 
| Limit[^1] | Value | Submission outcome | Details | |---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------| +| `dailyLimit` | `5` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | | `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | | `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | | `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | -| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario +| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario | | `ACTIVE_DEADLINE_SECONDS` | `18000` (5h) | Success with termination cause | everything including technical overhead for starting pods | | `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | From 39ab20f756c652df16e5a8a9d6e0dd9c2d62d58e Mon Sep 17 00:00:00 2001 From: chenkins Date: Fri, 24 Apr 2026 12:52:15 +0200 Subject: [PATCH 17/28] docs: add reference to ECML2026Rewards class. 
--- challenges/ecml2026/eval.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index f66bbb5..d39768c 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -57,6 +57,8 @@ The factors for the [reward function](../../environment/environment/rewards.md) | intermediate early departure | 0.5 | | collision | 250 | +This configuration is implemented using `--rewards flatland.envs.rewards.ECML2026Rewards`. + ⛽ Time and Resource limits --- From 7aab98914932d8ad955d45abdb014c1a78f4a022 Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 27 Apr 2026 14:03:14 +0200 Subject: [PATCH 18/28] docs: update level description --- challenges/ecml2026/levelconfig.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/challenges/ecml2026/levelconfig.md b/challenges/ecml2026/levelconfig.md index 2244ab4..6d9c458 100644 --- a/challenges/ecml2026/levelconfig.md +++ b/challenges/ecml2026/levelconfig.md @@ -1,12 +1,12 @@ # Level Configurations -| level | #scenarios | properties | malfunctions | -|---------|:----------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------| -| level_0 | 5 | One train per Line starting at t=0 | None | -| level_1 | 5 | Multiple trains per Line, different starting times, larger travel factor (more time for journey) | None | -| level_2 | 5 | More trains, tighter schedules (periodicity & travel factor) | None | -| level_3 | 5 | Like level 1 but with | Breakdowns | -| level_4 | 5 | Like level 2 but with | Breakdowns and departure delays | -| level_5 | 5 | Like level 4 but with more severe malfunctions (more frequent & longer) | Breakdowns and departure delays | -| level_6 | 5 | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure | +| level | #scenarios | number of 
agents | max. number of intermediate stops | properties | malfunctions | +|---------|:----------:|----------------------------|:---------------------------------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------| +| level_0 | 5 | {8,11,14,26,28}​ | {3,3,4,6,6} | One train per Line starting at t=0 | None | +| level_1 | 5 | {36,50,62,118,210} | {3,3,4,6,6} | Multiple trains per Line, different starting times, larger travel factor (more time for journey) | None | +| level_2 | 5 | {90,125,150,300,532}​ | {3,3,4,6,6} | More trains, tighter schedules (periodicity & travel factor) | None | +| level_3 | 5 | {36,50,62,118,210} | {3,3,4,6,6} | Like level 1 but with | Breakdowns | +| level_4 | 5 | {90,125,150,300,532}​ | {3,3,4,6,6} | Like level 2 but with | Breakdowns and departure delays | +| level_5 | 5 | {270,375,450,900,1600}​ | {3,3,4,6,6} | More trains and more severe departure delays​ | Breakdowns and departure delays | +| level_6 | 5 | {1600,1600,1600,1600,1600}​ | {6,6,6,6,6} | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure | From cabdc6f403f851ccc5cbc9f971fa7104cf583e66 Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Mon, 27 Apr 2026 14:07:43 +0200 Subject: [PATCH 19/28] docs: consolidate incosistency --- challenges/ecml2026/eval.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index d39768c..dfd9392 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -78,7 +78,7 @@ We do not provide GPUs. 
| Limit[^1] | Value | Submission outcome | Details | |---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------| -| `dailyLimit` | `5` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | +| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | | `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | | `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | | `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | From 312ceaac265e07b1140b22560329b8e64453fdf9 Mon Sep 17 00:00:00 2001 From: chenkins Date: Mon, 27 Apr 2026 14:38:11 +0200 Subject: [PATCH 20/28] docs(ecml2026): update limits. --- challenges/ecml2026/eval.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index dfd9392..0cf0c9a 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -76,15 +76,16 @@ We do not provide GPUs. ### Detailed overview over resource limits -| Limit[^1] | Value | Submission outcome | Details | -|---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------| -| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. 
| -| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | -| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | -| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | -| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario | -| `ACTIVE_DEADLINE_SECONDS` | `18000` (5h) | Success with termination cause | everything including technical overhead for starting pods | -| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | +| Limit[^1] | Value | Submission outcome | Details | +|---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------| +| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | +| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | +| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. 
pulling should be done by now | +| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | +| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario | +| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation | +| `ACTIVE_DEADLINE_SECONDS` | `21600` (6h) | Success with termination cause | everything including technical overhead for starting pods | +| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | [^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes) From aae440335f607131369bedc0c55004b8e906331f Mon Sep 17 00:00:00 2001 From: chenkins Date: Wed, 29 Apr 2026 08:45:42 +0200 Subject: [PATCH 21/28] docs(ecml2026): update limits. --- challenges/ecml2026/eval.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index 0cf0c9a..888f270 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -76,16 +76,18 @@ We do not provide GPUs. ### Detailed overview over resource limits -| Limit[^1] | Value | Submission outcome | Details | -|---------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------| -| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | -| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. 
| -| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | -| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | -| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario | -| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation | -| `ACTIVE_DEADLINE_SECONDS` | `21600` (6h) | Success with termination cause | everything including technical overhead for starting pods | -| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | +| Limit[^1] | Value | Submission outcome | Details | +|---------------------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------| +| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | +| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | +| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. 
pulling should be done by now | +| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario | +| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation | +| `ACTIVE_DEADLINE_SECONDS` | `600` (1h) | Failure/cleanup | everything including technical overhead for starting pods for submission | +| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | +| `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}` | Failure | resource limits for pod running the submission | +| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | +| `ORCHESTRATION_JOB_ACTIVE_DEADLINE_SECONDS` | `28800` (8h) | Failure/cleanup | everything including technical overhead for starting pods for orchestration and evaluation | [^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes) From d937701d2f1bf69ac059641a23c40807964e3a22 Mon Sep 17 00:00:00 2001 From: Manuel Meyer Date: Wed, 29 Apr 2026 10:09:45 +0200 Subject: [PATCH 22/28] docs: update level config --- challenges/ecml2026/levelconfig.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/challenges/ecml2026/levelconfig.md b/challenges/ecml2026/levelconfig.md index 6d9c458..31f281c 100644 --- a/challenges/ecml2026/levelconfig.md +++ b/challenges/ecml2026/levelconfig.md @@ -7,6 +7,6 @@ | level_2 | 5 | {90,125,150,300,532}​ | {3,3,4,6,6} | More trains, tighter schedules (periodicity & travel factor) | None | | level_3 | 5 | {36,50,62,118,210} | {3,3,4,6,6} | Like level 1 but with | Breakdowns | | level_4 | 5 | {90,125,150,300,532}​ | 
{3,3,4,6,6} | Like level 2 but with | Breakdowns and departure delays | -| level_5 | 5 | {270,375,450,900,1600}​ | {3,3,4,6,6} | More trains and more severe departure delays​ | Breakdowns and departure delays | -| level_6 | 5 | {1600,1600,1600,1600,1600}​ | {6,6,6,6,6} | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure | +| level_5 | 5 | {90,125,150,300,532}​ | {3,3,4,6,6} | Like level 4 but with more severe malfunctions (more frequent & longer)​ | Breakdowns and departure delays | +| level_6 | 5 | {532,532,532,532,532}​ | {6,6,6,6,6} | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure | From b73bd88485eb9e3c7c66fbd2edf045a50f827740 Mon Sep 17 00:00:00 2001 From: chenkins Date: Thu, 30 Apr 2026 16:21:45 +0200 Subject: [PATCH 23/28] docs(ecml2026): clarification on handling of termination causes. --- challenges/ecml2026/eval.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index 888f270..12ebec9 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -76,18 +76,18 @@ We do not provide GPUs. ### Detailed overview over resource limits -| Limit[^1] | Value | Submission outcome | Details | -|---------------------------------------------|------------------------------------------------------------------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------| -| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | -| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. 
| -| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | -| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario | -| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation | -| `ACTIVE_DEADLINE_SECONDS` | `600` (1h) | Failure/cleanup | everything including technical overhead for starting pods for submission | -| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low` | -| `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}` | Failure | resource limits for pod running the submission | -| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | -| `ORCHESTRATION_JOB_ACTIVE_DEADLINE_SECONDS` | `28800` (8h) | Failure/cleanup | everything including technical overhead for starting pods for orchestration and evaluation | +| Limit[^1] | Value | Submission outcome | Details | +|---------------------------------------------|------------------------------------------------------------------------------------------|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------| +| `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | +| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. 
| +| `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | +| `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario; evaluation terminated; results do not include the overlong scenario | +| `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation; results do not include the overlong scenario | +| `ACTIVE_DEADLINE_SECONDS` | `600` (1h) | Failure/cleanup | everything including technical overhead for starting pods for submission | +| `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low`; results do include the test, but stop after the test. | +| `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}` | Failure | resource limits for pod running the submission | +| `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | +| `ORCHESTRATION_JOB_ACTIVE_DEADLINE_SECONDS` | `28800` (8h) | Failure/cleanup | everything including technical overhead for starting pods for orchestration and evaluation | [^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes) From b3ccdf72baa3200588d1052d3556ec09c2d91df4 Mon Sep 17 00:00:00 2001 From: chenkins Date: Sat, 2 May 2026 17:12:06 +0200 Subject: [PATCH 24/28] docs(ecml2026): update limits.
--- challenges/ecml2026/eval.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index 12ebec9..aeec7c1 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -79,7 +79,7 @@ We do not provide GPUs. | Limit[^1] | Value | Submission outcome | Details | |---------------------------------------------|------------------------------------------------------------------------------------------|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------| | `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | -| `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | +| `WAIT_FOR_POD_TO_RUN_LIMIT` | `1200` (20 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | | `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | | `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario; evaluation terminated; results do not include
the overlong scenario | | `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation; results do not include the overlong scenario | From d60552587549e6ed10722e5fa0d97561f9a03cb5 Mon Sep 17 00:00:00 2001 From: Christian Eichenberger Date: Sun, 3 May 2026 14:24:47 +0200 Subject: [PATCH 25/28] Apply suggestions from code review Co-authored-by: Christian Eichenberger --- challenges/ecml2026/eval.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md index aeec7c1..f33cad1 100644 --- a/challenges/ecml2026/eval.md +++ b/challenges/ecml2026/eval.md @@ -1,7 +1,7 @@ Evaluation Metrics === -The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland +The **[ECML 2026 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland environment. In this edition, we are encouraging participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation @@ -65,7 +65,7 @@ This configuration is implemented using `--rewards flatland.envs.rewards.ECML202 The agents have to act within **time limits**: - You are allowed up to 30 minutes per scenario. -- The full evaluation must finish in 4 hours. +- The full evaluation must finish in 5 hours. 
The agents are evaluated in a container with **resource limits** From 18ae55f4623e7a86774ac2e2cff0f727e1864bcf Mon Sep 17 00:00:00 2001 From: Christian Eichenberger Date: Sun, 3 May 2026 14:53:53 +0200 Subject: [PATCH 26/28] Apply suggestions from code review Co-authored-by: Christian Eichenberger --- challenges/ecml2026.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/challenges/ecml2026.md b/challenges/ecml2026.md index 5b48522..de6fee4 100644 --- a/challenges/ecml2026.md +++ b/challenges/ecml2026.md @@ -22,4 +22,4 @@ The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is ⭐ Supported Flatland Versions ----------------------------- -You must use Flatland version [4.2.5](https://github.com/flatland-association/flatland-rl/releases/tag/v4.2.5) (forthcoming). \ No newline at end of file +You must use Flatland version [4.2.5](https://github.com/flatland-association/flatland-rl/releases/tag/v4.2.5). \ No newline at end of file From 9613aabd616e7121a926675d70b149c1ac12cc43 Mon Sep 17 00:00:00 2001 From: chenkins Date: Sun, 3 May 2026 14:59:54 +0200 Subject: [PATCH 27/28] ci: update gh actions running deprecated Node.js 20. 
---
 .github/workflows/checks.yml | 12 ++++++------
 .github/workflows/main.yml   |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/.github/workflows/checks.yml b/.github/workflows/checks.yml
index 37f3f14..d1a75a8 100644
--- a/.github/workflows/checks.yml
+++ b/.github/workflows/checks.yml
@@ -15,9 +15,9 @@ jobs:
   build:
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.10"
       - name: Build
@@ -27,7 +27,7 @@ jobs:
           bash build.sh
       # https://github.com/actions/upload-artifact
       - name: Upload GitHub Actions artifact
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v6
         with:
           name: upload-build
           path: _build/html/
@@ -36,9 +36,9 @@ jobs:
   build-flatland-rl-main:
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.10"
       - name: Build
@@ -49,7 +49,7 @@ jobs:
           bash build.sh
       # https://github.com/actions/upload-artifact
       - name: Upload GitHub Actions artifact
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v6
         with:
           name: upload-build-main
           path: _build/html/
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 5f1edde..6a1c035 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -13,9 +13,9 @@ jobs:
   build:
     runs-on: ubuntu-22.04
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@v6
         with:
           python-version: "3.10"
       - name: Build
@@ -25,7 +25,7 @@ jobs:
           bash build.sh
       # https://github.com/actions/upload-artifact
       - name: Upload GitHub Actions artifact
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v6
         with:
           name: upload-build
           path: _build/html/

From 2ad5c31dbaaedc8b72fbe3e369c0043dee175ae3
Mon Sep 17 00:00:00 2001
From: chenkins
Date: Sun, 3 May 2026 15:00:10 +0200
Subject: [PATCH 28/28] docs(ecml2026): update limits.

---
 challenges/ecml2026/eval.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/challenges/ecml2026/eval.md b/challenges/ecml2026/eval.md
index f33cad1..b419350 100644
--- a/challenges/ecml2026/eval.md
+++ b/challenges/ecml2026/eval.md
@@ -83,7 +83,7 @@ We do not provide GPUs.
 | `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now |
 | `RUNNING_TIME_LIMIT` | `1800` (30 min) | Success with termination cause | per scenario; evaluation terminated; results do notexcl. the overlong scenario |
 | `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation; results do not include the overlong scenario |
-| `ACTIVE_DEADLINE_SECONDS` | `600` (1h) | Failure/cleanup | everything including technical overhead for starting pods for submission |
+| `ACTIVE_DEADLINE_SECONDS` | `3600` (1h) | Failure/cleanup | everything including technical overhead for starting pods for submission |
 | `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low`; results do include the test, but stop after the test. |
 | `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}` | Failure | resource limits for pod running the submission |
 | `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission |
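
Review note, offered as a minimal sketch and not as part of the patch: the change from `600` to `3600` can be sanity-checked by converting each time limit in the table from seconds into the label printed next to it. The `LIMITS_SECONDS` dictionary and the `humanize` helper are hypothetical names introduced here for illustration; only the numeric values are taken from the table.

```python
# Time limits copied from the table in eval.md; all values are in seconds.
# The dictionary name and the helper below are illustrative assumptions,
# not part of the Flatland code base.
LIMITS_SECONDS = {
    "WAIT_FOR_POD_TO_START_LIMIT": 1200,
    "RUNNING_TIME_LIMIT": 1800,
    "TOTAL_RUNNING_TIME_LIMIT": 18000,
    "ACTIVE_DEADLINE_SECONDS": 3600,
}


def humanize(seconds: int) -> str:
    """Render whole hours as 'Nh', anything else as whole minutes 'N min'."""
    if seconds % 3600 == 0:
        return f"{seconds // 3600}h"
    return f"{seconds // 60} min"


if __name__ == "__main__":
    for name, seconds in LIMITS_SECONDS.items():
        print(f"{name}: {seconds} s = {humanize(seconds)}")
```

With this helper, the old value `600` renders as `10 min`, which does not match the `(1h)` label in the table, while the new value `3600` renders as `1h`.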