Describe the bug
Let's say I want to build a dataset with both 3h and 6h accumulated precipitation:
```yaml
# 3h accumulations
- dates:
    start: 2020-01-01 00:00:00
    end: 2021-01-01 00:00:00
    frequency: 3h
  accumulations:
    <<: *mars_request
    time: [0]
    accumulation_period: [0, 3]
    param:
      - tp # total precipitation
# rename to allow 2 total precipitation fields
- rename:
    param: "{param}_3h"
# 6h accumulations
- dates:
    start: 2020-01-01 00:00:00
    end: 2021-01-01 00:00:00
    frequency: 3h
  accumulations:
    <<: *mars_request
    time: [0]
    accumulation_period: [0, 6]
    param:
      - tp # total precipitation
# rename to allow 2 total precipitation fields
- rename:
    param: "{param}_6h"
```
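To illustrate what the `rename` steps in the recipe above appear to do, here is a minimal pure-Python sketch. It assumes that each field's metadata is a plain dict and that the dataset variable name is read from the `param` key. The helpers `rename_param` and `variable_name` are hypothetical and are not part of anemoi-datasets.

```python
# Hypothetical model: each field is a plain dict of metadata keys.
# This is NOT the anemoi-datasets implementation, only an illustration.


def rename_param(fields, template):
    """Mimic a rename filter that rewrites the `param` key from a template."""
    renamed = []
    for md in fields:
        new_md = dict(md)  # copy so the input fields are left untouched
        new_md["param"] = template.format(**md)
        renamed.append(new_md)
    return renamed


def variable_name(md):
    """Assumption: the dataset variable name is taken from `param`."""
    return md["param"]


acc_3h = [{"param": "tp", "levtype": "sfc"}]
acc_6h = [{"param": "tp", "levtype": "sfc"}]

# Without renaming, both sources collapse onto the same variable name.
print({variable_name(md) for md in acc_3h + acc_6h})  # {'tp'}

# With renaming, the names are distinct, but `param` itself has changed.
fields = rename_param(acc_3h, "{param}_3h") + rename_param(acc_6h, "{param}_6h")
print([variable_name(md) for md in fields])  # ['tp_3h', 'tp_6h']
```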
The `rename` filter in the recipe updates the `param` key in the metadata, which I assume is then used as the variable name in the dataset.
As far as I know, there is currently no way to rename a single variable during dataset creation without also changing its `param` metadata.

Changing the naming structure of all variables with `remapping`, however, is possible:
```yaml
output:
  remapping:
    param_level: "{param}_{levelist}"
```
I use the above snippet to get rid of the `_2` or `_10` suffix in the names of surface-level fields such as `2t_2` or `10v_10`.
Here, only the variable name is changed.
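By contrast, `remapping` can be modelled as building a name from a template without writing anything back to the metadata. This is again a hypothetical sketch: `remapped_name` is not an anemoi-datasets function, and representing the metadata as a dict is an assumption.

```python
def remapped_name(md: dict, remapping: dict) -> str:
    """Hypothetical: format the variable name from a template.

    The metadata dict `md` is only read, never modified.
    """
    return remapping["param_level"].format(**md)


remapping = {"param_level": "{param}_{levelist}"}
md = {"param": "t", "levelist": 600}

name = remapped_name(md, remapping)
print(name, md["param"])  # t_600 t
```

The name changes while `param` keeps its original value, which is the behaviour I am missing for single variables.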
Now I want to combine this dataset with another dataset using the `cutout` functionality, renaming `tp_6h` back to `tp` in `open_dataset`.
The combined dataset now has the variable name `tp`, but its `param` metadata is still `tp_6h`.
This (currently) trips up the scalers during training.
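A hypothetical sketch of how such a mismatch can surface. It assumes, purely for illustration, that scalers are looked up by `param` while the rest of the pipeline uses the variable name; `lookup_scalers` and the scaler values are made up, and the actual anemoi-training logic may differ.

```python
# Hypothetical state after opening with a rename: the name changed, `param` did not.
variables = {
    "tp": {"param": "tp_6h"},
    "2t": {"param": "2t"},
}

# Hypothetical per-parameter scaler configuration, keyed by the expected names.
scalers = {"tp": 1.0, "2t": 0.5}


def lookup_scalers(variables, scalers):
    """Assumed behaviour: find each variable's scaler via its `param` metadata."""
    found, missing = {}, []
    for name, md in variables.items():
        if md["param"] in scalers:
            found[name] = scalers[md["param"]]
        else:
            missing.append(name)  # `param` does not match any configured scaler
    return found, missing


found, missing = lookup_scalers(variables, scalers)
print(found, missing)  # {'2t': 0.5} ['tp']
```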
**Version number**

Building the CERRA dataset:
- anemoi-datasets: branch `abstracting-accumulation` (8fb0a16)

Opening the datasets with `cutout`:
- anemoi-datasets: current main (c26a9d8)
- anemoi-transform: current main (ecmwf/anemoi-transform@548e2fa)
Additional context
I see two possible solutions:
- Add functionality to specify the name of a variable during building without changing the `param` metadata.
- Let the `rename` action in `open_dataset` also rename the `param` metadata. However, this might cause other problems with pressure-level fields, as these typically have `name: t_600` but `param: t`.
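One way the second option could sidestep the pressure-level concern is sketched below, purely as an illustration: rewrite `param` only when the old variable name equals its `param`. The function `rename_with_param` is hypothetical and does not exist in anemoi-datasets.

```python
def rename_with_param(variables, mapping):
    """Hypothetical variant of option 2.

    Renames each variable listed in `mapping`. If the old name equals its
    `param` (e.g. tp_6h built with a recipe rename), `param` is rewritten too;
    otherwise (e.g. t_600 with param "t") `param` is left unchanged.
    """
    out = {}
    for name, md in variables.items():
        new_name = mapping.get(name, name)
        new_md = dict(md)  # copy so the input metadata is not mutated
        if name in mapping and md["param"] == name:
            new_md["param"] = new_name
        out[new_name] = new_md
    return out


variables = {
    "tp_6h": {"param": "tp_6h"},
    "t_600": {"param": "t", "levelist": 600},
}
renamed = rename_with_param(variables, {"tp_6h": "tp", "t_600": "temp_600"})
print(renamed)
```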