@grassesi
Contributor

@grassesi grassesi commented Oct 16, 2025

Description

PR #905 had a typo in the backward compatibility mechanism, which raised an exception whenever that mechanism was triggered. This PR contains a hotfix for this bug.

Issue Number

Closes #898

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hedgedoc in the github issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

Collaborator

@clessig clessig left a comment

Looks good, but how can one test this? Why didn't this problem occur in every run?

I am again confused by the linting changes but well.

@grassesi
Contributor Author

Looks good, but how can one test this? Why didn't this problem occur in every run?

It only triggers when you reload from a run config that was produced before this change, so any fresh run is unaffected.
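
On the testing question, one possible approach is a unit test that feeds an old-style config through the compatibility path and checks that a fresh config passes through untouched. The snippet below is only a minimal sketch under assumptions: `upgrade_legacy_config`, `log_intervals` and `train_logging` are hypothetical names chosen for illustration and are not taken from the WeatherGenerator code base, and plain dicts stand in for the real config objects.

```python
import copy

# Hypothetical key names (assumption): the real config schema is not shown here.
LEGACY_KEY = "log_intervals"
NEW_KEY = "train_logging"


def upgrade_legacy_config(config: dict) -> dict:
    """Return a copy of `config` translated to the new layout.

    Only configs that still carry the legacy key are rewritten, which is
    why a bug in this path stays invisible for freshly created runs.
    """
    upgraded = copy.deepcopy(config)
    if LEGACY_KEY in upgraded and NEW_KEY not in upgraded:
        # Move the old value under the new, nested key.
        upgraded[NEW_KEY] = {"frequency": upgraded.pop(LEGACY_KEY)}
    return upgraded


def test_legacy_config_is_upgraded():
    legacy = {LEGACY_KEY: 10}
    upgraded = upgrade_legacy_config(legacy)
    assert upgraded == {NEW_KEY: {"frequency": 10}}
    # The input must not be mutated by the upgrade.
    assert legacy == {LEGACY_KEY: 10}


def test_fresh_config_is_unchanged():
    fresh = {NEW_KEY: {"frequency": 10}}
    assert upgrade_legacy_config(fresh) == fresh
```

Both functions follow the pytest naming convention, so a test runner would collect them automatically. Checking in a small config file from a run that predates the change would exercise the same path with real data.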

@grassesi grassesi merged commit a645e8a into ecmwf:develop Oct 27, 2025
9 of 10 checks passed
@grassesi grassesi deleted the sgrasse/develop/issue_929_hotfix branch October 27, 2025 13:55
TillHae pushed a commit to TillHae/WeatherGenerator that referenced this pull request Oct 28, 2025
SavvasMel added a commit that referenced this pull request Nov 4, 2025
* rebase

* add ensemble

* fix deterministic

* fix plotting

* lint

* fix eval_config

* probabilistic scores working now

* lint

* Fix spoofing and refactor handling of multiple source files (#1118)

* Cleaning up spoofing and related code on data preprocessing for model

* Fixed typo

* Updated comments

* Removed merge cells and implemented necessary adjustments

* Fixed forecasting

* Fixed missing handling of NaNs in coordinates and channel data

* Minor clean up

* Fix to removing/renaming variables

* Changed function name to improve readability

* Fixed bug with incorrect handling of multiple input datasources.

* Addressed reviewer comments

* resolve conflict

* [1131] fixes circular dependencies (#1134)

* fixes dependencies

* cleanup

* make the type checker not fail

* cleanup

* cleanup of type issues

* Give option to plot only prediction maps (#1139)

* add plot_preds_only feature

* minor changes after comments

* Tell FSDP2 about embedding engine forward functions (#1133)

* Tell FSDP2 about embedding engine forward functions

Note DO NOT add print functions in forward functions of the model, it
will break with FSDP2

* Add comment

* recover 'all' option (#1146)

* Fixed problem in inference (#1145)

* implement vrmse (#1147)

* [1144] Extra fixes (#1148)

* Fixed problem in inference

* more fixes

* fixes

* lint

* lint

---------

Co-authored-by: Christian Lessig <[email protected]>

* Jk/log grad norms/log grad norms (#1068)

* Log gradient norms

* Prototype for recording grad norms

* Address review changes + hide behind feature flag

* Final fixes including backward compatibility

* Ruff

* More ruff stuff

* forecast config with small decoder

* fixed uv.lock

* test gradient logging on multi gpus

* update uv.lock to latest develop version

* revert to default config

* add comment on FSDP2 specifics

* move plot grad script to private repo

* rm seaborn from pyproject

* updating terminal and metrics logging, add get_tensor_item fct

* check for DTensor instead of world size

* revert forecast fct, fix in separate PR

* rename grad_norm log names to exclude from MLFlow

* add log_grad_norms to default config

---------

Co-authored-by: sophiex <[email protected]>

* Add forecast and observation activity (#1126)

* Add calculation methods for forecast and observation activity metrics in Scores class

* Add new calculation methods for forecast activity metrics in Scores class

* ruff

* fix func name

* Rename observation activity calculation method to target activity in Scores class

* typo

* refactor to common calc_act function for activity

* fix cases

* have calc_tact and calc_fact that use _calc_act for maintainability

* fix small thing in style

---------

Co-authored-by: iluise <[email protected]>

* hotfix: use correct method `create` instead of `construct` (#1090)

* restore develop

* fix deterministic

* fix plotting

* lint

* fix eval_config

* probabilistic scores working now

* lint

* update utils

* packages/evaluate/src/weathergen/evaluate/score.py

* lint

* removing duplication

---------

Co-authored-by: Christian Lessig <[email protected]>
Co-authored-by: Timothy Hunter <[email protected]>
Co-authored-by: Savvas Melidonis <[email protected]>
Co-authored-by: Sophie X <[email protected]>
Co-authored-by: Julius Polz <[email protected]>
Co-authored-by: Julian Kuehnert <[email protected]>
Co-authored-by: Simon Grasse <[email protected]>
Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Rework configuration for frequency of training artifacts

2 participants