Update time coarsening configurations for 4-degree datasets and era5#991
Add time coarsening (factor 4) configuration to:
- shield-ramped-climSST-random-CO2-ensemble-c96-4deg-8layer
- CM4-piControl-atmosphere-4deg-8layer-200yr
- CM4-1pctCO2-atmosphere-4deg-8layer-140yr
- CM4-like-AM4-random-CO2-ensemble-atmosphere-4deg-8layer

The SHiELD ramped config is modeled after the existing shield-amip time_coarsen, adapted for its variable set (adds h200, LW/SW heating tendencies; removes UGRD/VGRD1000, h1000, snow_cover_fraction). The three CM4 configs share the same variable structure, including vertically-resolved land variables and CM4-specific fluxes. LHTFLsfc is included as a window variable (computed from surface_evaporation_rate by the pipeline).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…names Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add skip-if-exists checks to get_stats.py and combine_stats.py so that re-running the stats workflow (e.g. after adding time_coarsen to a config) does not recompute or overwrite previously generated stats. Uses centering.nc as a sentinel file since all 4 stats files are always written together. Co-Authored-By: Claude Opus 4.6 <[email protected]>
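A minimal sketch of the sentinel-file check described in this commit, for illustration only. The helper names `stats_already_exist` and `run_if_missing` are hypothetical and are not taken from get_stats.py or combine_stats.py; only the sentinel filename `centering.nc` comes from the commit message.

```python
from pathlib import Path

# Filename named in the commit message as the sentinel; everything else
# in this sketch (function names, signatures) is hypothetical.
SENTINEL_NAME = "centering.nc"


def stats_already_exist(output_dir):
    """Return True if the sentinel stats file is present in output_dir."""
    return (Path(output_dir) / SENTINEL_NAME).exists()


def run_if_missing(output_dir, compute):
    """Call compute(output_dir) only when the sentinel file is absent.

    Returns True if compute was called, False if the step was skipped.
    """
    if stats_already_exist(output_dir):
        return False
    compute(output_dir)
    return True
```

Checking a single file is only safe under the assumption stated in the commit message, namely that all four stats files are always written together; a run interrupted partway through writing could otherwise leave the sentinel present with other files missing.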
Check if the beaker dataset exists before attempting to create it, avoiding errors when re-running the stats workflow after adding time_coarsen to a config. Co-Authored-By: Claude Opus 4.6 <[email protected]>
scripts/data_process/configs/CM4-1pctCO2-atmosphere-4deg-8layer-140yr.yaml
scripts/data_process/configs/CM4-like-AM4-random-CO2-ensemble-atmosphere-4deg-8layer.yaml
scripts/data_process/configs/CM4-piControl-atmosphere-4deg-8layer-200yr.yaml
- eastward_wind_5
- eastward_wind_6
- eastward_wind_7
- global_mean_co2
Let's make this a window_name, since it represents the mean CO2 concentration over the interval in which we are computing the state update (e.g. I use it as a "next step forcing" in ACE). The distinction is not particularly important for most of our datasets where CO2 varies slowly (if at all), but it could make a meaningful difference here.
I want to confirm: is the value we use an interval mean of the values from the higher time-resolution Fortran model runs?
In the random CO2 runs it is, but in the other runs it is not.
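To illustrate why the choice matters when CO2 changes within a coarsening window, here is a small NumPy sketch comparing the two treatments at a coarsening factor of 4. It is not the pipeline's implementation: the function names are hypothetical, and taking the snapshot at the last sample of each window is an assumption made for the example.

```python
import numpy as np


def coarsen_snapshot(values, factor):
    """Keep one instantaneous sample per window.

    Assumption for this sketch: the sample kept is the last one in
    each window of `factor` consecutive values.
    """
    values = np.asarray(values, dtype=float)
    n_windows = len(values) // factor
    return values[: n_windows * factor][factor - 1 :: factor]


def coarsen_window_mean(values, factor):
    """Average each window of `factor` consecutive values."""
    values = np.asarray(values, dtype=float)
    n_windows = len(values) // factor
    return values[: n_windows * factor].reshape(n_windows, factor).mean(axis=1)


# Made-up, steadily increasing CO2 values, one per fine time step.
co2 = [400.0, 401.0, 402.0, 403.0, 404.0, 405.0, 406.0, 407.0]
snapshots = coarsen_snapshot(co2, 4)        # array([403., 407.])
window_means = coarsen_window_mean(co2, 4)  # array([401.5, 405.5])
```

For a constant or very slowly varying series the two results nearly coincide, which matches the remark that the distinction matters little for most datasets.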
- air_temperature_5
- air_temperature_6
- air_temperature_7
- carbon_dioxide
Could consider making this a window_name for consistency with the random CO2 datasets.
Hm yeah that makes sense.
scripts/data_process/configs/CM4-like-AM4-random-CO2-ensemble-atmosphere-4deg-8layer.yaml
scripts/data_process/configs/CM4-piControl-atmosphere-4deg-8layer-200yr.yaml
spencerkclark
left a comment
Thanks for taking this on @mcgibbon, and for the updates to prevent re-computing the stats. The configurations look good to me now.
Forgot to suggest, we could add n_split values for each of the time-coarsening configs, so that we do not blow up memory—we could likely get away with fewer splits since coarsening is less data intensive than our full dataset computation, but this is just to be safe, and it would be a way to show progress in the job logs.
Was a bit premature in this, so I removed the suggestions related to it. The 4-degree time-coarsening workflow ran just fine for ERA5, which is an 85-year dataset, so I expect it will also be fine for the piControl and 1pctCO2 runs. It's something to keep in mind potentially for 1-degree workflows, but maybe worth trying without first to see. Time-coarsening may be a simple enough operation that dask manages it well.
Add time coarsening configurations for daily-resolution datasets across CM4, SHiELD, and ERA5 sources, and make the stats/upload scripts idempotent so pipeline steps can be safely re-run.
Changes:
- Add time_coarsen sections to 4-degree CM4 configs (1pctCO2, piControl, random-CO2-ensemble) and SHiELD random-CO2-ensemble config
- Add time_coarsen section to era5-1deg-8layer-1940-2022.yaml with total_frozen_precipitation_rate as a window name
- Add new era5-4deg-8layer-1940-2025.yaml config for 4-degree ERA5 time coarsening and stats
- scripts/data_process/get_stats.py: skip stats generation when output files already exist
- scripts/data_process/combine_stats.py: skip combining when combined stats already exist
- scripts/data_process/upload_stats.py: skip Beaker upload when dataset already exists; add logging

Tests added