Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions mllam_data_prep/create_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -236,10 +236,12 @@ def create_dataset(config: Config):

# only need to do selection for the coordinates that the input dataset actually has
if output_coord_ranges is not None:
output_coord_ranges = {
# Use a temporary dict to avoid modifying the original ranges.
# Static features have no time dimension, so they would return an empty dict.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the suggestion below?

Suggested change
# Use a temporary dict to avoid modifying the original ranges.
# Static features have no time dimension, so they would return an empty dict.
# Use a temporary dict to apply selection on coordinate ranges to avoid
# modifying the original ranges given in the config. This is needed because
# for example static features would not have a time dimension, but that
# shouldn't that doing selection on time shouldn't be applied to the other
# variables.

Copy link
Copy Markdown
Contributor Author

@zweihuehner zweihuehner Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to make it a bit more detailed, however, I am a bit confused about the last sentence, something might have gotten mixed up. Do you mean something like:

This is needed because static features, for example, do not have a time dimension. Hence, the time-based selection returns an empty dictionary, which should not overwrite the selection for the other variables.

I have also updated the changelog under v0.6.1, I hope that is correct.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because static features, for example, do not have a time dimension. Hence, the time-based selection returns an empty dictionary, which should not overwrite the selection for the other variables.

That's perfect! I also think the changelog text isn't quite right. The issue here is that if a output variable didn't share a dimension and this output variable was processed before others, then the coordination selection for that dimension wouldn't be applied. Is that how you understand it too?

I have also updated the changelog under v0.6.1, I hope that is correct.

Almost, rather than changing what is in the v0.6.1 release the right way would be to create a new section at the top with unreleased changes (like https://github.com/mllam/mllam-data-prep/blob/2f02bac4120541ae2438e6bbfeb66b6466cc957a/CHANGELOG.md), since your fix will be part of the next release which hasn't been created yet :)

output_coord_ranges_tmp = {
k: w for k, w in output_coord_ranges.items() if k in output_dims
}
da_target = select_by_kwargs(da_target, **output_coord_ranges)
da_target = select_by_kwargs(da_target, **output_coord_ranges_tmp)

dataarrays_by_target[target_output_var].append(da_target)

Expand Down