Update time coarsening configurations for 4-degree datasets and era5 #991

Merged
mcgibbon merged 12 commits into main from feature/era5_time_coarsen
Mar 20, 2026

Contributor

@mcgibbon mcgibbon commented Mar 19, 2026

Add time coarsening configurations for daily-resolution datasets across CM4, SHiELD, and ERA5 sources, and make the stats/upload scripts idempotent so pipeline steps can be safely re-run.

Changes:

  • Add time_coarsen sections to 4-degree CM4 configs (1pctCO2, piControl, random-CO2-ensemble) and SHiELD random-CO2-ensemble config

  • Add time_coarsen section to era5-1deg-8layer-1940-2022.yaml with total_frozen_precipitation_rate as a window name

  • Add new era5-4deg-8layer-1940-2025.yaml config for 4-degree ERA5 time coarsening and stats

  • scripts/data_process/get_stats.py: skip stats generation when output files already exist

  • scripts/data_process/combine_stats.py: skip combining when combined stats already exist

  • scripts/data_process/upload_stats.py: skip Beaker upload when dataset already exists; add logging

  • Tests added

mcgibbon and others added 8 commits March 6, 2026 21:33
Add time coarsening (factor 4) configuration to:
- shield-ramped-climSST-random-CO2-ensemble-c96-4deg-8layer
- CM4-piControl-atmosphere-4deg-8layer-200yr
- CM4-1pctCO2-atmosphere-4deg-8layer-140yr
- CM4-like-AM4-random-CO2-ensemble-atmosphere-4deg-8layer

The SHiELD ramped config is modeled after the existing shield-amip
time_coarsen, adapted for its variable set (adds h200, LW/SW heating
tendencies; removes UGRD/VGRD1000, h1000, snow_cover_fraction).

The three CM4 configs share the same variable structure, including
vertically-resolved land variables and CM4-specific fluxes. LHTFLsfc
is included as a window variable (computed from surface_evaporation_rate
by the pipeline).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add skip-if-exists checks to get_stats.py and combine_stats.py so that
re-running the stats workflow (e.g. after adding time_coarsen to a
config) does not recompute or overwrite previously generated stats.
Uses centering.nc as a sentinel file since all 4 stats files are always
written together.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
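
The skip-if-exists check described in this commit message could look roughly like the sketch below. Only `centering.nc` comes from the commit message; the helper names, the placeholder file name, and the stand-in compute function are hypothetical and are not taken from `get_stats.py` or `combine_stats.py`.

```python
import tempfile
from pathlib import Path

# Only this file name appears in the commit message; everything else is illustrative.
SENTINEL_NAME = "centering.nc"


def stats_already_written(output_dir: Path) -> bool:
    """Return True if the sentinel stats file exists in output_dir."""
    return (output_dir / SENTINEL_NAME).exists()


def run_if_missing(output_dir: Path, compute) -> bool:
    """Call compute(output_dir) unless the sentinel file is present.

    Returns True if compute ran and False if the step was skipped.
    """
    if stats_already_written(output_dir):
        return False
    output_dir.mkdir(parents=True, exist_ok=True)
    compute(output_dir)
    return True


def fake_compute(output_dir: Path) -> None:
    # Stand-in for the real stats computation: writes empty placeholder files,
    # with the sentinel written last.
    for name in ("placeholder_stats.nc", SENTINEL_NAME):
        (output_dir / name).write_bytes(b"")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "stats"
    first_run = run_if_missing(out, fake_compute)   # computes
    second_run = run_if_missing(out, fake_compute)  # skipped
```

A single sentinel is only a reliable signal if the files are written together, as the commit message states. Writing the sentinel last, as the stand-in does here, is one way to avoid treating a partially written directory as complete.
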
Check if the beaker dataset exists before attempting to create it,
avoiding errors when re-running the stats workflow after adding
time_coarsen to a config.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
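
The check-before-create pattern from this commit message can be sketched against an in-memory stand-in. The `FakeDatasetStore` class and its `exists`/`create` methods are hypothetical and do not reflect the Beaker client API or the code in `upload_stats.py`.

```python
import logging

logger = logging.getLogger(__name__)


class FakeDatasetStore:
    """In-memory stand-in for a dataset service (hypothetical interface)."""

    def __init__(self):
        self._datasets = {}

    def exists(self, name: str) -> bool:
        return name in self._datasets

    def create(self, name: str, source_dir: str) -> None:
        # Mimic a service that rejects duplicate dataset names.
        if name in self._datasets:
            raise ValueError(f"dataset {name!r} already exists")
        self._datasets[name] = source_dir


def upload_once(store, name: str, source_dir: str) -> bool:
    """Create the dataset only if it is absent. Returns True if it uploaded."""
    if store.exists(name):
        logger.info("Dataset %s already exists, skipping upload", name)
        return False
    store.create(name, source_dir)
    logger.info("Uploaded %s from %s", name, source_dir)
    return True


store = FakeDatasetStore()
# The second call is a no-op instead of an error.
results = [upload_once(store, "example-stats", "/tmp/stats") for _ in range(2)]
```
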
- eastward_wind_5
- eastward_wind_6
- eastward_wind_7
- global_mean_co2
Member

Let's make this a window_name, since it represents the mean CO2 concentration over the interval in which we are computing the state update (e.g. I use it as a "next step forcing" in ACE). The distinction is not particularly important for most of our datasets where CO2 varies slowly (if at all), but it could make a meaningful difference here.

Contributor Author

I want to confirm: is the value we use an interval mean of the value from the higher time-resolution Fortran model runs?

Member

In the random CO2 runs it is, but in the other runs it is not.
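
To illustrate why the distinction matters when CO2 changes quickly, here is a minimal sketch with made-up values and the factor of 4 used in these configs. It assumes a window variable is averaged over each block of time steps and that a non-window variable keeps one sample per block (the last one here). Both function names and the last-sample convention are assumptions, not the pipeline's implementation.

```python
from statistics import fmean


def coarsen_window(values, factor):
    """Mean over each consecutive block of `factor` samples."""
    n_blocks = len(values) // factor
    return [fmean(values[i * factor:(i + 1) * factor]) for i in range(n_blocks)]


def coarsen_snapshot(values, factor):
    """Keep the last sample of each block (assumed convention)."""
    n_blocks = len(values) // factor
    return [values[(i + 1) * factor - 1] for i in range(n_blocks)]


# Made-up, rapidly increasing CO2 series with 8 fine time steps.
co2 = [400.0, 402.0, 404.0, 406.0, 408.0, 410.0, 412.0, 414.0]

window = coarsen_window(co2, 4)      # [403.0, 411.0]
snapshot = coarsen_snapshot(co2, 4)  # [406.0, 414.0]
```

For a slowly varying series the two results nearly coincide, which matches the remark that the choice matters little for most datasets.
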

- air_temperature_5
- air_temperature_6
- air_temperature_7
- carbon_dioxide
Member

Could consider making this a window_name for consistency with the random CO2 datasets.

Contributor Author

Hm yeah that makes sense.

Member

@spencerkclark spencerkclark left a comment

Thanks for taking this on @mcgibbon, and for the updates to prevent re-computing the stats. The configurations look good to me now.

Member

@spencerkclark spencerkclark left a comment

Forgot to suggest: we could add n_split values for each of the time-coarsening configs so that we do not blow up memory. We could likely get away with fewer splits, since coarsening is less data-intensive than our full dataset computation, but this is just to be safe, and it would also be a way to show progress in the job logs.

@spencerkclark
Member

> Forgot to suggest: we could add n_split values for each of the time-coarsening configs so that we do not blow up memory. We could likely get away with fewer splits, since coarsening is less data-intensive than our full dataset computation, but this is just to be safe, and it would also be a way to show progress in the job logs.

Was a bit premature in this, so I removed the suggestions related to it. The 4-degree time-coarsening workflow ran just fine for ERA5, which is an 85-year dataset, so I expect it will also be fine for the piControl and 1pctCO2 runs. It's something to keep in mind potentially for 1-degree workflows, but maybe worth trying without first to see. Time-coarsening may be a simple enough operation that dask manages it well.

@mcgibbon mcgibbon changed the title from "Feature/era5 time coarsen" to "Update time coarsening configurations for 4-degree datasets and era5" Mar 20, 2026
@mcgibbon mcgibbon marked this pull request as ready for review March 20, 2026 20:10
@mcgibbon mcgibbon enabled auto-merge (squash) March 20, 2026 20:10
@mcgibbon mcgibbon disabled auto-merge March 20, 2026 20:11
@mcgibbon mcgibbon enabled auto-merge (squash) March 20, 2026 22:25
@mcgibbon mcgibbon merged commit 11a8e8b into main Mar 20, 2026
7 checks passed
@mcgibbon mcgibbon deleted the feature/era5_time_coarsen branch March 20, 2026 22:40