
Conversation


@mthede mthede commented Nov 26, 2025

Description

With the new learning architecture on main, we can streamline how default parameters are handled. Previously, defaults were defined redundantly in multiple places, making it difficult to determine which values applied in practice.

The updated structure centralizes all defaults within the learning role or in the dedicated learning_config data class, improving clarity and maintainability.

It also improves the learning configuration logic, which is now based on four different simulation options (with and without DRL):

  1. No RL -> no learning_config -> no learning role
  2. Single run with loaded RL strategies -> learning_config with learning_mode: false and trained_policies_load_path provided -> learning role, but no RL algorithm etc. necessary
  3. Training run -> learning_mode: true and full set of learning functions
    3.1. Training episodes
    3.2. Evaluation episodes (config item evaluation_mode exists, but the learning loop overwrites it anyway)
  4. Continue learning -> continue_learning: true and trained_policies_load_path provided (internally sets learning_mode = True to enter the usual learning process)
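
The branching above can be sketched roughly like this. This is a minimal illustration only: the LearningConfig fields follow the PR description, but the function and its return values are assumptions for clarity, not the actual ASSUME implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LearningConfig:
    # defaults live in one place instead of being defined redundantly
    learning_mode: bool = False
    continue_learning: bool = False
    trained_policies_load_path: Optional[str] = None


def setup_learning(config):
    if config is None:
        return None  # option 1: no RL -> no learning_config -> no learning role
    lc = LearningConfig(**config)
    if lc.continue_learning and lc.trained_policies_load_path:
        lc.learning_mode = True  # option 4: continue learning enters the usual loop
    if lc.learning_mode:
        return "training run"  # option 3: full set of learning functions
    if lc.trained_policies_load_path:
        return "load trained policies"  # option 2: learning role, no RL algorithm
    return None
```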

Checklist

  • Documentation updated (docstrings, READMEs, user guides, inline comments, doc folder updates etc.)
  • New unit/integration tests added (if applicable)
  • Changes noted in release notes (if any)
  • Consent to release this PR's code under the GNU Affero General Public License v3.0

Additional Notes (optional)

@mthede mthede requested review from kim-mskw and maurerle November 26, 2025 07:58
@mthede mthede marked this pull request as draft November 26, 2025 07:59
@mthede mthede marked this pull request as ready for review November 26, 2025 08:05

codecov bot commented Nov 26, 2025

Codecov Report

❌ Patch coverage is 91.44737% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.61%. Comparing base (54370e9) to head (c372568).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
assume/reinforcement_learning/learning_role.py 80.00% 9 Missing ⚠️
assume/world.py 89.47% 2 Missing ⚠️
assume/common/base.py 97.82% 1 Missing ⚠️
assume/scenario/loader_csv.py 92.85% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #692      +/-   ##
==========================================
- Coverage   45.72%   45.61%   -0.11%     
==========================================
  Files          54       54              
  Lines        8031     8030       -1     
==========================================
- Hits         3672     3663       -9     
- Misses       4359     4367       +8     
Flag Coverage Δ
pytest 45.61% <91.44%> (-0.11%) ⬇️


@maurerle maurerle left a comment


Good decision to create a new PR for it.
I think that the learning_config should be renamed if it is only a dict.

Just an idea:
If we always have a learning_role anyway and this learning_role does always have the learning_config - we could also just set the properties in the learning_role.
Therefore, we can ditch LearningConfig and have world.learning_role.trained_policies_save_path instead of world.learning_role.learning_config.trained_policies_save_path?
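
The suggestion above could look roughly like this. LearningRole here is a stand-in for illustration, not the actual ASSUME class:

```python
class LearningRole:
    """Stand-in sketch: exposes config values directly as role attributes."""

    def __init__(self, learning_config: dict):
        # promote each user-settable config entry to a plain attribute,
        # removing one level of nesting for callers
        for key, value in learning_config.items():
            setattr(self, key, value)


role = LearningRole({"trained_policies_save_path": "policies/run1"})
# access becomes role.trained_policies_save_path
# instead of role.learning_config.trained_policies_save_path
```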


@jrasko jrasko left a comment


Training a simulation (02a) on my GPU does not work in this branch; however, it works fine on main.

RuntimeError: Expected all tensors to be on the same device, but got mat1 is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA_addmm)


mthede commented Nov 27, 2025

@jrasko Will look into the cuda problem tomorrow! Could you please try removing the "and not self.learning_config.evaluation_mode" in the learning role?

@mthede mthede requested review from jrasko and maurerle November 27, 2025 17:31

mthede commented Nov 27, 2025

Good decision to create a new PR for it. I think that the learning_config should be renamed if it is only a dict.

Do you mean renaming it in the yaml files?

Just an idea: If we always have a learning_role anyway and this learning_role does always have the learning_config - we could also just set the properties in the learning_role. Therefore, we can ditch LearningConfig and have world.learning_role.trained_policies_save_path instead of world.learning_role.learning_config.trained_policies_save_path

Yeah, I don't like the very long nesting either. But I felt like the config will be extended in the future, and I wanted to keep everything that is user-settable via the config in one place. It's still/already unpacked in the learning strategies, so I at least kept it centralized in the learning role. Before, all the settings were passed to the LearningStrategy as kwargs.
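
A rough sketch of that trade-off (all names are illustrative, not the actual ASSUME API): defaults live on one config object kept by the learning role, and each strategy unpacks only what it needs instead of receiving everything as kwargs.

```python
from dataclasses import dataclass


@dataclass
class LearningConfig:
    # defaults defined once, instead of redundantly in every strategy
    learning_rate: float = 1e-3
    gamma: float = 0.99
    trained_policies_save_path: str = "policies"


class LearningStrategy:
    def __init__(self, config: LearningConfig):
        # unpack only the settings this strategy actually uses
        self.learning_rate = config.learning_rate
        self.gamma = config.gamma


cfg = LearningConfig(learning_rate=5e-4)  # user overrides one value
strategy = LearningStrategy(cfg)
```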


jrasko commented Nov 28, 2025

@jrasko Will look into the cuda problem tomorrow! Could you please try removing the "and not self.learning_config.evaluation_mode" in the learning role?

@mthede yep, that's why. Removing this condition fixes the error.

mthede and others added 8 commits November 28, 2025 13:47
…ation options

1. No RL -> no learning_config -> no learning role
2. Single run with loaded RL strategies -> learning_mode: false and trained_policies_load_path is provided -> learning role, but no rl algorithm etc. necessary
3. Training run -> learning_mode: true
3.1. Training episodes
3.2. Evaluation episodes (config item evaluation_mode exists, but learning loop overwrites it anyway)
4. Continue learning -> continue_learning: true and trained_policies_load_path is provided
@mthede mthede force-pushed the learning_config_pr branch from de192fb to f2eeab2 Compare November 28, 2025 12:47
- add absolute change to early stopping criterion; have had that on my personal branch for a while now
@kim-mskw kim-mskw self-requested a review December 1, 2025 13:44

@kim-mskw kim-mskw left a comment


All the changes I wanted were discussed bilaterally.

with patch.object(Storage, "calculate_marginal_cost", return_value=10.0):
# Calculate bids using the strategy
bids = strategy.calculate_bids(
bids = strategy.calculate_bids( # TODO

What did this TODO say?
At least add a comment here?

@maurerle maurerle merged commit 095f342 into main Dec 2, 2025
9 checks passed
@maurerle maurerle deleted the learning_config_pr branch December 2, 2025 11:42