docs: ECML2026 Railway Competition. #24
Draft: chenkins wants to merge 23 commits into main from feature/flatland4
+155
−12
Commits:
- 50ab172 docs: Flatland4 Railway Competition. (chenkins)
- 5853eb9 docs: Flatland4 Railway Competition. (chenkins)
- cd274e0 docs: Flatland4 Railway Competition (CleverManu)
- 21bbaf9 docs: updated reward function with min penalty for delay at target (CleverManu)
- 48f04bb docs: updated reward parametrization (CleverManu)
- 3e01698 docs: cleanup table (CleverManu)
- 0a761d3 docs: rename flatland4 to ecml2026. (chenkins)
- d42903a docs: add timeline and supported Flatland versions. (chenkins)
- d2d2a11 Apply suggestions from code review (chenkins)
- 84b9b8c docs: add normalization description (CleverManu)
- 033cfc1 docs: add normalization description for DefaultReward (CleverManu)
- 988445b docs: refine reward description (CleverManu)
- 8961990 docs: add technical table on evaluation constraints. (chenkins)
- 5e783b8 docs: add technical table on evaluation constraints. (chenkins)
- 6352eaa Update link to ecml2026-starterkit. (chenkins)
- c4318ad docs: add daily submission limit. (chenkins)
- 39ab20f docs: add reference to ECML2026Rewards class. (chenkins)
- 7aab989 docs: update level description (CleverManu)
- cabdc6f docs: consolidate incosistency (CleverManu)
- 312ceaa docs(ecml2026): update limits. (chenkins)
- aae4403 docs(ecml2026): update limits. (chenkins)
- d937701 docs: update level config (CleverManu)
- b73bd88 docs(ecml2026): clarification on handling of termination causes. (chenkins)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| ECML 2026 | ||
| ========= | ||
|
|
||
| The **[ECML 2026 Challenge](https://www.aicrowd.com/challenges/flatland-3)** is the newest competition around the Flatland environment. | ||
|
|
||
| <!--  --> | ||
|
|
||
| - Follow the [starterkit](https://github.com/flatland-association/ecml2026-starterkit) to make your first submission. | ||
| - Read about the [evaluation metrics](ecml2026/eval) of this edition. | ||
| - Read about the [level configurations](ecml2026/levelconfig) of this edition. | ||
|
|
||
| ⏱ Timeline | ||
| -------- | ||
|
|
||
| * Competition start: May 4th, 2026 | ||
| * Submission closure: June 8th, 2026 (AoE) | ||
| * Winner announcement: June 15th, 2026 | ||
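A submission closure stated in AoE (Anywhere on Earth, UTC-12) can be converted to UTC as in the sketch below. It assumes that "closure" means the end of the calendar day of June 8th, 2026 in AoE; the timeline does not state a time of day, so treat the result as illustrative rather than as the official cut-off.

```python
from datetime import datetime, timedelta, timezone

# AoE (Anywhere on Earth) is a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")

# Assumption: closure is the end of June 8th, 2026 AoE,
# i.e. the first instant of June 9th in the AoE time zone.
closure_aoe = datetime(2026, 6, 9, 0, 0, tzinfo=AOE)
closure_utc = closure_aoe.astimezone(timezone.utc)

print(closure_utc.isoformat())  # 2026-06-09T12:00:00+00:00
```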
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
| ⭐ Supported Flatland Versions | ||
| ----------------------------- | ||
| You must use Flatland version [4.2.5](https://github.com/flatland-association/flatland-rl/releases/tag/v4.2.5) (forthcoming). | ||
|
Review comment (Contributor, Author): TODO release with https://github.com/flatland-association/flatland-rl/pull/397/changes
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,100 @@ | ||
| Evaluation Metrics | ||
| === | ||
|
|
||
| The **[Flatland 4 challenge](https://fab.flatland.cloud/suites/24ab2336-a407-4329-b781-d71846250e24)** is the newest competition around the Flatland | ||
| environment. | ||
|
|
||
| In this edition, we encourage participants to develop innovative solutions that leverage **reinforcement learning**. The scenario setup and the evaluation | ||
| metrics are designed accordingly. However, we remain open to other approaches, e.g. operations research, and encourage participants to benchmark | ||
| their state-of-the-art algorithms. | ||
|
|
||
|
|
||
| ⚖ Evaluation metrics | ||
| --- | ||
|
|
||
| ### Normalized Episode Rewards | ||
|
|
||
| The primary metric is the **normalized return** of your agents: the higher, the better. | ||
|
|
||
| What is the **normalized return**? | ||
|
|
||
| - The **returns** are the sums of Flatland's default rewards that your agents accumulate during each episode, as described | ||
| in [rewards.md](../../environment/environment/rewards.md). | ||
| - To **normalize** these returns, we scale them so that they stay in the range $[0.0, 1.0]$. To guarantee this, the penalty per agent is capped at | ||
| ```max_episode_steps```. The normalized return makes results comparable between environments of different dimensions and with different numbers of agents. | ||
|
|
||
| In code: | ||
|
|
||
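| ```python | ||
| # Clip each agent's return from below at -max_episode_steps, divide the sum by (max_episode_steps * number of agents), and add 1. | ||
| normalized_reward = sum(max(cumulative_rewards[agent.handle], -self.env.max_episode_steps) for agent in agents) / ( | ||
|     self.env.max_episode_steps * self.env.get_num_agents()) + 1 | ||
| ``` | ||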
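As a self-contained illustration of the normalization, the following sketch reimplements the formula as a standalone function. The function name and the plain-dictionary input are hypothetical and not part of the Flatland API. It assumes per-agent returns are non-positive (penalties only); a positive return would push the result above 1.0.

```python
from typing import Mapping


def normalized_return(cumulative_rewards: Mapping[int, float], max_episode_steps: int) -> float:
    """Scale per-agent returns into [0.0, 1.0], assuming returns are <= 0."""
    num_agents = len(cumulative_rewards)
    if num_agents == 0 or max_episode_steps <= 0:
        raise ValueError("need at least one agent and a positive max_episode_steps")
    # Cap each agent's penalty at max_episode_steps (clip the return from below).
    clipped = [max(r, -max_episode_steps) for r in cumulative_rewards.values()]
    # The clipped sum lies in [-max_episode_steps * num_agents, 0]; shift it into [0, 1].
    return sum(clipped) / (max_episode_steps * num_agents) + 1.0


# Two agents, 100 steps: one without penalty, one penalized beyond the cap (-250 is clipped to -100).
print(normalized_return({0: 0.0, 1: -250.0}, max_episode_steps=100))  # 0.5
```

With this clipping, a single heavily penalized agent can lower the normalized return by at most `1 / num_agents`.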
|
|
||
| The episodes finish when all the trains have reached their target, or when the maximum number of time steps is reached. Therefore: | ||
|
CleverManu marked this conversation as resolved.
|
||
|
|
||
| - The **minimum possible value** (i.e. worst possible) is 0.0, which occurs if none of the agents reach their goal during the episode. | ||
| - The **maximum possible value** (i.e. best possible) is 1.0, which occurs if all agents reach their targets and intermediate stops on time, i.e. | ||
| receive no penalty. | ||
|
|
||
| ### Submission Score | ||
|
|
||
| The submission score is the sum of the normalized scenario rewards. | ||
|
|
||
| Evaluation is stopped when a submission does not reach the threshold of 25% completed agents within a level (5 scenarios). | ||
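The scoring rule can be sketched as follows. This is only an illustration under stated assumptions: the function name and the `(normalized_reward, fraction_done)` tuples are hypothetical, the level in which the threshold is missed still counts towards the score, and "does not reach" is read as a mean completion strictly below the threshold.

```python
from typing import Sequence, Tuple

# Hypothetical per-scenario result: (normalized reward, fraction of agents done).
ScenarioResult = Tuple[float, float]


def submission_score(levels: Sequence[Sequence[ScenarioResult]], threshold: float = 0.25) -> float:
    """Sum normalized scenario rewards level by level; stop after a level below the threshold."""
    score = 0.0
    for scenarios in levels:
        if not scenarios:
            continue  # nothing evaluated in this level
        score += sum(reward for reward, _ in scenarios)
        mean_done = sum(done for _, done in scenarios) / len(scenarios)
        # Assumption: the failing level is still counted, later levels are not evaluated.
        if mean_done < threshold:
            break
    return score


levels = [
    [(0.75, 1.0)] * 5,    # level passed
    [(0.25, 0.125)] * 5,  # mean completion 12.5% < 25%: evaluation stops after this level
    [(0.5, 1.0)] * 5,     # never evaluated
]
print(submission_score(levels))  # 5.0
```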
|
|
||
| ### Factors in reward function | ||
|
|
||
| The factors for the [reward function](../../environment/environment/rewards.md) in this competition are: | ||
|
|
||
| | factor | value | | ||
| |-------------------------------------------|:-----:| | ||
| | journey not started (cancellation factor) | 5 | | ||
| | cancellation time buffer | 0 | | ||
| | delay at target | 1 | | ||
| | target not reached minimum penalty | 100 | | ||
| | intermediate stop not served | 50 | | ||
| | intermediate late arrival | 0.5 | | ||
| | intermediate early departure | 0.5 | | ||
| | collision | 250 | | ||
|
CleverManu marked this conversation as resolved.
|
||
|
|
||
| This configuration is implemented using `--rewards flatland.envs.rewards.ECML2026Rewards`. | ||
|
|
||
| ⛽ Time and Resource limits | ||
| --- | ||
|
|
||
| The agents have to act within **time limits**: | ||
|
|
||
| - You are allowed up to 30 minutes per scenario. | ||
| - The full evaluation must finish in 4 hours. | ||
|
|
||
| The agents are evaluated in a container with **resource limits**: | ||
|
|
||
| - 4 CPU cores | ||
| - 15 GB of main memory. | ||
|
|
||
| We do not provide GPUs. | ||
|
|
||
| ### Detailed overview of time and resource limits | ||
|
|
||
| | Limit[^1] | Value | Submission outcome | Details | | ||
| |---------------------------------------------|------------------------------------------------------------------------------------------|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| | `dailyLimit` | `2` | Not created | Error in frontend as error `429 TOO_MANY_REQUESTS` from backend. | | ||
| | `WAIT_FOR_POD_TO_RUN_LIMIT` | `300` (5 min) | Failure | submission pod should be listed by now, i.e. pulling has started by now. | | ||
| | `WAIT_FOR_POD_TO_START_LIMIT` | `1200` (20 min) | Failure | submission pod should have reached running state by now, i.e. pulling should be done by now | | ||
| | `RUNNING_TIME_LIMIT`                        | `1800` (30 min)                                                                          | Success with termination cause | per scenario; evaluation terminated; results do not include the overlong scenario                                                                     | | ||
| | `TOTAL_RUNNING_TIME_LIMIT` | `18000` (5h) | Success with termination cause | all scenarios, excluding technical overhead for starting pods and running offline trajectory evaluation; results do not include the overlong scenario | | ||
| | `ACTIVE_DEADLINE_SECONDS` | `600` (1h) | Failure/cleanup | everything including technical overhead for starting pods for submission | | ||
| | `PERCENTAGE_COMPLETE_THRESHOLD` | `0.25` (25%) | Success with termination cause | `Mean percentage of done agents during the last test was too low`; results do include the test, but stop after the test. | | ||
| | `ORCHESTRATION_JOB_K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "5Gi", "cpu": "1"}, "limits": {"memory": "5Gi", "cpu": "1"}}` | Failure | resource limits for pod running the submission | | ||
| | `K8S_RESOURCE_ALLOCATION` | `{"requests": {"memory": "15Gi", "cpu": "4"}, "limits": {"memory": "15Gi", "cpu": "4"}}` | Failure | resource limits for pod running the submission | | ||
| | `ORCHESTRATION_JOB_ACTIVE_DEADLINE_SECONDS` | `28800` (8h) | Failure/cleanup | everything including technical overhead for starting pods for orchestration and evaluation | | ||
|
|
||
| [^1]: see [implementation](https://github.com/flatland-association/flatland-benchmarks/pull/594/changes) | ||
|
|
||
|
|
||
|
|
||
| 📪 Daily Submission Limits and Submission Closure. | ||
| --- | ||
| You can submit up to 2 times per day. | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| # Level Configurations | ||
|
|
||
| | level | #scenarios | number of agents | max. number of intermediate stops | properties | malfunctions | | ||
| |---------|:----------:|----------------------------|:---------------------------------:|--------------------------------------------------------------------------------------------------|-------------------------------------------------| | ||
| | level_0 | 5 | {8,11,14,26,28} | {3,3,4,6,6} | One train per Line starting at t=0 | None | | ||
| | level_1 | 5 | {36,50,62,118,210} | {3,3,4,6,6} | Multiple trains per Line, different starting times, larger travel factor (more time for journey) | None | | ||
| | level_2 | 5 | {90,125,150,300,532} | {3,3,4,6,6} | More trains, tighter schedules (periodicity & travel factor) | None | | ||
| | level_3 | 5          | {36,50,62,118,210}         | {3,3,4,6,6}                       | Like level 1 but with malfunctions                                                               | Breakdowns                                      | | ||
| | level_4 | 5          | {90,125,150,300,532}       | {3,3,4,6,6}                       | Like level 2 but with malfunctions                                                               | Breakdowns and departure delays                 | | ||
| | level_5 | 5 | {90,125,150,300,532} | {3,3,4,6,6} | Like level 4 but with more severe malfunctions (more frequent & longer) | Breakdowns and departure delays | | ||
| | level_6 | 5 | {532,532,532,532,532} | {6,6,6,6,6} | Full map only, progressively more malfunctions (including infrastructure disruptions) | Breakdowns, departure delays and infrastructure | | ||
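For scripting against these levels, the agent counts from the table can be captured in a plain dictionary, as in the sketch below. The variable names are illustrative only; the official scenario files in the starterkit remain the source of truth.

```python
# Number of agents per scenario, copied from the level configuration table.
AGENTS_PER_SCENARIO = {
    "level_0": [8, 11, 14, 26, 28],
    "level_1": [36, 50, 62, 118, 210],
    "level_2": [90, 125, 150, 300, 532],
    "level_3": [36, 50, 62, 118, 210],
    "level_4": [90, 125, 150, 300, 532],
    "level_5": [90, 125, 150, 300, 532],
    "level_6": [532, 532, 532, 532, 532],
}

# Derived totals: scenarios overall and agents per level.
total_scenarios = sum(len(agents) for agents in AGENTS_PER_SCENARIO.values())
agents_per_level = {level: sum(agents) for level, agents in AGENTS_PER_SCENARIO.items()}

print(total_scenarios)              # 35
print(agents_per_level["level_0"])  # 87
print(agents_per_level["level_6"])  # 2660
```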
|
|
Review comment: to be implemented, see flatland-association/flatland-benchmarks#571