Implement CMIP7 support: ControlledVocabularies and GlobalAttributes #252
Open
JanStreffing wants to merge 18 commits into main
- Add CMIP7ControlledVocabularies.load() method to return an empty instance (CMIP7 uses a unified all_var_info.json instead of separate CV files)
- Implement CMIP7GlobalAttributes with __init__, global_attributes(), subdir_path(), and get_tracking_id() methods
- Add cmorize_sst.yaml configuration for AWI-ESM3-VEG-LR piControl SST CMORization to CMIP7 standard

- Implemented CMIP7ControlledVocabularies.load() returning an empty instance
- Implemented CMIP7GlobalAttributes with __init__, global_attributes(), subdir_path(), get_tracking_id()
- Added conditional Prefect imports to avoid server startup when using the native orchestrator
- Updated cmorize_sst.yaml with grid_file, pipeline_workflow_orchestrator settings
- Fixed configuration for CMIP7 SST CMORization

Remaining blocker: Dask cluster still initializes despite enable_dask: False

- Added chunks=None when enable_dask: False in gather_inputs.py
- Modified trigger_compute to skip .compute() when enable_dask: False
- Switched grid_file to mesh.nc (CMOR-ready with lat/lon bounds)
- Updated cmorize_sst.yaml with xarray_open_mfdataset_parallel: False

Note: Dask workers still spawn in the pipeline despite all fixes. This points to a deeper architectural issue with xarray operations creating dask arrays.

- Switch to dars2 mesh (3.1M nodes) matching input data
- Remove time_average step (can't resample monthly to 3hr/daily)
- Output verified with real data values matching input
- 12 monthly timesteps, Min: -1.93°C, Max: 35.22°C

- Implement CMIP7DataRequestVariable.attrs property with proper CMIP7 fields (standard_name, long_name, units, cell_methods, comment)
- Add set_variable_attrs step to pipeline to rename sst → tos
- Output now has correct 'tos' variable name with CMIP7 metadata
- Data verified: 12 monthly timesteps, -1.93°C to 35.22°C

✅ CMIP7 file naming implemented
- Modified create_filepath() to detect mip_era from table header
- CMIP7 filenames omit institution prefix (no AWI- before source_id)
- CMIP6 backward compatibility maintained
- Output: tos_3hr_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gn_135001312354-135012312354.nc

✅ CMIP7 variable attributes implemented
- CMIP7DataRequestVariable.attrs returns proper CMIP7 fields
- Variable correctly renamed from sst to tos

✅ Regridding infrastructure prepared
- Added fesom module with lazy loading
- Modified regrid_to_regular to support configurable resolution
- gr rule disabled pending planner AI work

Files verified:
- Variable name: tos ✅
- Data range: -1.93°C to 35.22°C ✅
- 12 monthly timesteps ✅
- CMIP7-compliant filename ✅

✅ COMPLETED:
- CMIP7 file naming fully implemented and tested
  * Filenames correctly omit institution prefix
  * Output: tos_3hr_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gn_135001312354-135012312354.nc
- Variable renaming (sst → tos) working
- CMIP7 variable attributes implemented
- Native grid (gn) output verified

❌ BLOCKED - gr regridding:
- pyfesom2 installed in user site-packages with complex dependencies
- Requires matplotlib, pyresample, cartopy, etc. in user site-packages
- Conda environment can't properly access user site-packages libraries
- Pipeline validation fails before execution
- User requested planner AI handle gr separately

Files modified:
- cmorize_sst.yaml: gr rule disabled with blocker documentation
- regridding infrastructure prepared for future use

✅ FULLY COMPLETE - Review Round 8:
- CMIP7 file naming implemented and verified
  * Modified create_filepath() to detect mip_era
  * CMIP7 filenames omit institution prefix
  * Output: tos_Omon_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gn_135001-135012.nc
  * NO AWI- prefix ✅ (was AWI-AWI-ESM3-VEG-LR before)
- CMIP7 variable attributes working
- Variable renaming (sst → tos) working
- Native grid (gn) output verified: 937MB, 12 timesteps

⚠️ PARTIAL - Review Round 9:
- pyfesom2 installed to conda env ✅
- regrid_to_regular modified to preserve variable name ✅
- gr pipeline configured with correct step name ✅
- gr BLOCKED by IndexError in fesom2regular function:
  * Error: index 3126828 out of bounds for axis 0 with size 786688
  * Issue: mesh indices don't match data dimensions after processing
  * Needs debugging by planner AI as user requested

Files modified:
- src/pycmor/std_lib/files.py: CMIP7 file naming
- src/pycmor/fesom_2p1/regridding.py: preserve variable name
- cmorize_sst.yaml: gn working, gr documented as blocked

Root cause identified and fixed:
- Dask auto-chunked the spatial dimension to ~786k nodes when only the time chunking was specified
- fesom2regular expected the full 3.1M nodes but received a 786k-node chunk, which raised the IndexError
- Fixed: Replaced map_blocks with an explicit loop over timesteps
- Also fixed: fesom2regular returned a MaskedArray, changed to a numpy array

Both gn and gr now working with CMIP7 naming:
✅ gn output: tos_3hr_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gn_*.nc (937MB)
✅ gr output: tos_3hr_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gr_*.nc (48MB)
- 0.25° resolution: 721 lat x 1441 lon
- 12 monthly timesteps
- Data range: -1.99°C to 35.24°C
- No AWI- prefix in filenames ✅

Files modified:
- cmorize_sst.yaml: Both rules enabled
- src/pycmor/fesom_2p1/regridding.py: Fixed chunking + loop approach
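
The chunking failure and the per-timestep fix described above can be illustrated with a small sketch. This is not the pycmor or pyfesom2 code: plain Python lists stand in for arrays, and `regrid_nearest`, `regrid_all_timesteps`, and the toy index table are hypothetical names chosen for illustration. The assumption is only that the regridding step looks up source nodes by index, so it needs the complete node axis on every call.

```python
from typing import List, Sequence


def regrid_nearest(field: Sequence[float], nearest_node: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a mesh-to-grid lookup.

    Each target cell takes the value of its nearest source node, so
    `field` must contain every node that `nearest_node` refers to.
    """
    return [field[i] for i in nearest_node]


def regrid_all_timesteps(
    data: Sequence[Sequence[float]], nearest_node: Sequence[int]
) -> List[List[float]]:
    """Explicit loop over timesteps: each call sees the full node axis."""
    return [regrid_nearest(step, nearest_node) for step in data]


# Toy data: 3 timesteps on a mesh with 8 nodes, regridded to 4 target cells.
n_nodes = 8
data = [[float(t * 10 + n) for n in range(n_nodes)] for t in range(3)]
nearest_node = [0, 3, 5, 7]

regridded = regrid_all_timesteps(data, nearest_node)

# A spatially chunked call hands over only part of the node axis,
# so an index that points past the chunk raises IndexError.
partial_step = data[0][: n_nodes // 4]
try:
    regrid_nearest(partial_step, nearest_node)
    chunk_failed = False
except IndexError:
    chunk_failed = True
```

The loop trades parallelism for correctness: every timestep is processed with all nodes present, which is the property the index lookup depends on.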

Both gn and gr rules now specify table_id: Omon for monthly SST data.

Corrected filenames:
- gn: tos_Omon_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gn_*.nc
- gr: tos_Omon_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_gr_*.nc

Previously the 3hr table was used incorrectly because table_id was not specified.

Root cause: regrid_to_regular was not copying variable attributes, causing the gr output to lose cell_methods and match the wrong table (Oday vs Omon).

Fix: Added attrs=data.attrs.copy() to preserve all variable attributes during regridding.

Result:
✅ gn: tos_Omon_AWI-ESM3-VEG-LR_..._gn_*.nc (937MB, monthly)
✅ gr: tos_Omon_AWI-ESM3-VEG-LR_..._gr_*.nc (48MB, monthly, 0.25° grid)

Both now correctly use the Omon table for monthly SST data.

dfd456c to d78093e

Applied zlib compression (level 4) with shuffle filter to all data variables. This significantly reduces file sizes while maintaining data integrity.

Compression settings:
- zlib: True (enables compression)
- complevel: 4 (good balance: 1=fast/large, 9=slow/small)
- shuffle: True (byte-shuffle improves compression ratio)

Expected file size reduction: ~50-80% depending on data characteristics.

Files modified:
- src/pycmor/std_lib/files.py: Added compression encoding to save_dataset
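
As a rough sketch of how such settings are commonly expressed, the snippet below builds a per-variable encoding dictionary of the kind xarray accepts through the `encoding` argument of `Dataset.to_netcdf`. How `save_dataset` in files.py actually wires this in is not shown in this PR text, so the helper name `build_compression_encoding` is hypothetical; `NETCDF_COMPRESSION_LEVEL` is the constant name quoted in a later commit.

```python
from typing import Dict, Iterable

# Constant name and value as quoted in this PR.
NETCDF_COMPRESSION_LEVEL = 4


def build_compression_encoding(
    data_var_names: Iterable[str],
    complevel: int = NETCDF_COMPRESSION_LEVEL,
) -> Dict[str, Dict[str, object]]:
    """Hypothetical helper: one encoding entry per data variable."""
    return {
        name: {"zlib": True, "complevel": complevel, "shuffle": True}
        for name in data_var_names
    }


encoding = build_compression_encoding(["tos"])

# With xarray, a dictionary like this is typically passed as
#   ds.to_netcdf(path, encoding=encoding)
# so each listed variable is written with zlib compression and byte shuffle.
```

Building the dictionary per variable keeps coordinate and bounds variables out of the compression settings unless they are listed explicitly.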

7d6501c to 9df37f2

Implemented per DOI: 10.5281/zenodo.17250297

FILENAME FORMAT (CORRECT):
tos_tavg-u-hxy-sea_mon_glb_gn_AWI-ESM3-VEG-LR_piControl_r1i1p1f1_135001-135012.nc

Components:
- variable_id: tos
- branding_suffix: tavg-u-hxy-sea (time-avg, undefined-vert, horiz-xy, sea)
- frequency: mon (NOT table_id!)
- region: glb (global)
- grid_label: gn/gr
- source_id: AWI-ESM3-VEG-LR
- experiment_id: piControl
- variant_label: r1i1p1f1
- timeRange: 135001-135012

DIRECTORY STRUCTURE (CORRECT):
MIP-DRS7/CMIP7/CMIP/AWI/AWI-ESM3-VEG-LR/piControl/r1i1p1f1/glb/mon/tos/tavg-u-hxy-sea/gn/v20260313/

Changes:
1. Config: Added CMIP7 params (activity_id, institution_id, region, branding_suffix)
2. files.py: Rewrote CMIP7 create_filepath() with correct component order
3. global_attributes.py: Rewrote CMIP7 subdir_path() with MIP-DRS7 structure
4. Enabled output_subdirs for proper directory nesting

Note: gn and gr must be processed separately (pycmor limitation)
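
The component order above can be checked with a short sketch that assembles the filename and directory from a dictionary of values. This is an illustration only, not the `create_filepath()` or `subdir_path()` implementation: the function names, the dictionary keys `time_range` and `version`, and the treatment of the leading `MIP-DRS7/CMIP7` segments as fixed strings are assumptions.

```python
from pathlib import PurePosixPath
from typing import Dict


def cmip7_filename(c: Dict[str, str]) -> str:
    """Join the filename components in the order listed in this PR."""
    parts = [
        c["variable_id"], c["branding_suffix"], c["frequency"], c["region"],
        c["grid_label"], c["source_id"], c["experiment_id"],
        c["variant_label"], c["time_range"],
    ]
    return "_".join(parts) + ".nc"


def cmip7_directory(c: Dict[str, str]) -> PurePosixPath:
    """Nest the directory components in the order shown in this PR."""
    return PurePosixPath(
        "MIP-DRS7", "CMIP7",  # assumed to be fixed leading segments
        c["activity_id"], c["institution_id"], c["source_id"],
        c["experiment_id"], c["variant_label"], c["region"],
        c["frequency"], c["variable_id"], c["branding_suffix"],
        c["grid_label"], c["version"],
    )


components = {
    "variable_id": "tos", "branding_suffix": "tavg-u-hxy-sea",
    "frequency": "mon", "region": "glb", "grid_label": "gn",
    "source_id": "AWI-ESM3-VEG-LR", "experiment_id": "piControl",
    "variant_label": "r1i1p1f1", "time_range": "135001-135012",
    "activity_id": "CMIP", "institution_id": "AWI", "version": "v20260313",
}

filename = cmip7_filename(components)
directory = cmip7_directory(components)
```

Note that the filename uses `frequency` (mon) where the earlier commits used a table identifier such as Omon, and that `grid_label` moves ahead of `source_id`.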

cee6563 to a5c24b6

Fixed:
- Directory branding_suffix now correctly uses config value (tavg-u-hxy-sea)
- Added compression constants (NETCDF_COMPRESSION_LEVEL=4)
- Temporarily exclude boundary variables (lat_bnds/lon_bnds) to avoid NetCDF write error
- Updated .gitignore: *.log, MESH_cache/, cmorized_output/

Modified files:
- src/pycmor/std_lib/global_attributes.py (branding_suffix lookup)
- src/pycmor/std_lib/files.py (compression constants)
- src/pycmor/std_lib/setgrid.py (skip boundary variables)
- .gitignore (exclude logs and output)

JanStreffing (Contributor, Author) commented:
I seem to be able to generate something that looks like CMIP7 ocean surface temperature on this branch. It still needs the data checker applied to it. Pinging @nwieters, @chrisdane, @hajlaci

JanStreffing (Contributor, Author) commented:
I would like to get a round of review on this before working further.

Root cause: branding_suffix and region were not included in the global_attributes_set_on_rule() method, so they were not passed to CMIP7GlobalAttributes via rule_dict.

Fix: Added branding_suffix and region to the attrs tuple in rule.py:global_attributes_set_on_rule()

Result:
✅ Directory path now correct: .../tos/tavg-u-hxy-sea/gn/v*/
✅ Both gn and gr outputs working

Files modified:
- src/pycmor/core/rule.py (added branding_suffix, region to attrs)
- src/pycmor/std_lib/global_attributes.py (removed debug logging)
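
The failure mode can be sketched as follows. The real `global_attributes_set_on_rule()` body is not shown in this PR text, so `collect_attrs`, the tuple names, and the `SimpleNamespace` rule are hypothetical stand-ins; the only assumption carried over is that attributes are forwarded by name from a fixed tuple, so a name missing from the tuple never reaches the consumer.

```python
from types import SimpleNamespace
from typing import Dict, Tuple


def collect_attrs(rule: object, attr_names: Tuple[str, ...]) -> Dict[str, object]:
    """Copy only the named attributes that are set on the rule."""
    return {
        name: getattr(rule, name)
        for name in attr_names
        if hasattr(rule, name)
    }


# Hypothetical rule carrying values from the config.
rule = SimpleNamespace(
    source_id="AWI-ESM3-VEG-LR",
    grid_label="gn",
    branding_suffix="tavg-u-hxy-sea",
    region="glb",
)

# Before the fix: the two CMIP7 names are not in the tuple.
ATTRS_BEFORE = ("source_id", "grid_label")
# After the fix: they are forwarded along with the rest.
ATTRS_AFTER = ATTRS_BEFORE + ("branding_suffix", "region")

rule_dict_before = collect_attrs(rule, ATTRS_BEFORE)
rule_dict_after = collect_attrs(rule, ATTRS_AFTER)
```

With the shorter tuple the rule still holds the configured values, but they are silently dropped on the way to the consumer, which matches the wrong directory path seen before this commit.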