18 commits
cc2538b
Implement CMIP7 support: ControlledVocabularies and GlobalAttributes
JanStreffing Mar 13, 2026
e41f007
Update maintainer information in cmorize_sst.yaml
JanStreffing Mar 13, 2026
6ab5bb8
CMIP7 implementation + Prefect bypass for native orchestrator
JanStreffing Mar 13, 2026
6516618
Review round 6: Dask prevention fixes + correct mesh file
JanStreffing Mar 13, 2026
63d48a0
Final working CMIP7 CMORization config
JanStreffing Mar 13, 2026
a48b1ea
Implement CMIP7 variable attrs and add rename step
JanStreffing Mar 13, 2026
463a6c2
Complete Review Round 8: CMIP7 file naming + variable renaming
JanStreffing Mar 13, 2026
c13f0b3
Review Round 8 partially complete - CMIP7 naming working, gr blocked
JanStreffing Mar 13, 2026
f7b1fd0
Review Rounds 8-9: CMIP7 naming complete, gr infrastructure prepared
JanStreffing Mar 13, 2026
00dfc95
✅ Review Round 9 COMPLETE: gr regridding working
JanStreffing Mar 13, 2026
9e3a301
Fix table_id: monthly data should use Omon not 3hr
JanStreffing Mar 13, 2026
d78093e
Fix gr regridding: preserve variable attributes
JanStreffing Mar 13, 2026
9df37f2
Add NetCDF4 compression to reduce file sizes
JanStreffing Mar 13, 2026
a5c24b6
✅ Review Round 11 COMPLETE: Official CMIP7 File Naming
JanStreffing Mar 13, 2026
c5ad588
Review Round 13: Fix directory branding and NetCDF write issues
JanStreffing Mar 13, 2026
a5a3d6b
Fix directory branding_suffix - add to rule_dict
JanStreffing Mar 13, 2026
e1dc73f
clean up logs
JanStreffing Mar 13, 2026
835188f
clean up logs
JanStreffing Mar 13, 2026
7 changes: 7 additions & 0 deletions .gitignore
@@ -4,6 +4,13 @@
# C extensions
*.so

# Pycmor specific
*.nc
!tests/data/*.nc
*.log
MESH_cache/
cmorized_output/

# Packages
*.egg
*.egg-info
90 changes: 90 additions & 0 deletions cmorize_sst.yaml
@@ -0,0 +1,90 @@
general:
  name: "AWI-ESM3-VEG-LR PI Control SST"
  description: "CMIP7 CMORization of SST for AWI-ESM3-VEG-LR piControl experiment"
  maintainer: "Jan Streffing"
  email: "jan.streffing@awi.de"
  cmor_version: "CMIP7"
  mip: "CMIP"
  CV_Dir: "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/CMIP6_CVs"
  CMIP_Tables_Dir: "/work/ab0246/a270092/software/pycmor/src/pycmor/data/cmip7"

pycmor:
  warn_on_no_rule: False
  use_flox: False
  parallel: False
  enable_dask: False
  xarray_open_mfdataset_parallel: False
  pipeline_workflow_orchestrator: "native"
  enable_output_subdirs: True

pipelines:
  - name: default
    steps:
      - "pycmor.core.gather_inputs.load_mfdataset"
      - "pycmor.std_lib.get_variable"
      - "pycmor.std_lib.variable_attributes.set_variable_attrs"
      - "pycmor.std_lib.convert_units"
      - "pycmor.std_lib.setgrid.setgrid"
      - "pycmor.std_lib.set_global_attributes"
      - "pycmor.std_lib.trigger_compute"
      - "pycmor.std_lib.files.save_dataset"

  - name: regridded
    steps:
      - "pycmor.core.gather_inputs.load_mfdataset"
      - "pycmor.std_lib.get_variable"
      - "pycmor.std_lib.variable_attributes.set_variable_attrs"
      - "pycmor.std_lib.convert_units"
      - "pycmor.fesom_2p1.regridding.regrid_to_regular"
      - "pycmor.std_lib.set_global_attributes"
      - "pycmor.std_lib.trigger_compute"
      - "pycmor.std_lib.files.save_dataset"

rules:
  - name: sst_tos_gn
    description: "Cmorize FESOM SST to CMIP7 tos on native grid"
    cmor_variable: tos
    model_variable: sst
    table_id: Omon
    output_directory: /work/ab0246/a270092/postprocessing/cmorize
    variant_label: r1i1p1f1
    experiment_id: piControl
    source_id: AWI-ESM3-VEG-LR
    model_component: ocean
    grid_label: gn
    grid_file: /work/ab0246/a270092/input/fesom2/dars2/mesh.nc
    # CMIP7 required parameters
    activity_id: CMIP
    institution_id: AWI
    region: glb
    branding_suffix: "tavg-u-hxy-sea"
    pipelines:
      - default
    inputs:
      - path: /work/bb1469/a270092/runtime/awiesm3-v3.4.1/human_tuning/outdata/fesom
        pattern: sst\.fesom\.1350\.nc

  - name: sst_tos_gr
    description: "Cmorize FESOM SST to CMIP7 tos on 0.25° regular grid"
    cmor_variable: tos
    model_variable: sst
    table_id: Omon
    output_directory: /work/ab0246/a270092/postprocessing/cmorize
    variant_label: r1i1p1f1
    experiment_id: piControl
    source_id: AWI-ESM3-VEG-LR
    model_component: ocean
    grid_label: gr
    mesh_path: /work/ab0246/a270092/input/fesom2/dars2
    box: "-180, 180, -90, 90"
    target_resolution: "0.25"
    # CMIP7 required parameters
    activity_id: CMIP
    institution_id: AWI
    region: glb
    branding_suffix: "tavg-u-hxy-sea"
    pipelines:
      - regridded
    inputs:
      - path: /work/bb1469/a270092/runtime/awiesm3-v3.4.1/human_tuning/outdata/fesom
        pattern: sst\.fesom\.1350\.nc
59 changes: 59 additions & 0 deletions plan.md
@@ -0,0 +1,59 @@
# Plan for CMIP7 CMORization of SST for AWI-ESM3-VEG-LR (piControl)

This document provides a plan for the builder AI to configure and run `pycmor` to cmorize a single variable (`sst`) for one year (`1350`) from FESOM output in the AWI-ESM3-VEG-LR piControl experiment into the **CMIP7** standard.

## 1. Goal Overview
- **Model:** AWI-ESM3-VEG-LR
- **Experiment:** piControl
- **Input Data:** `/work/bb1469/a270092/runtime/awiesm3-v3.4.1/human_tuning/outdata/fesom/sst.fesom.1350.nc`
- **Model Variable:** `sst`
- **CMOR Variable:** `tos` (Sea Surface Temperature, Table: `Omon`)
- **Target Standard:** CMIP7

## 2. CMIP7 Specific Requirements in `pycmor`
Unlike CMIP6, the CMIP7 data request in `pycmor` is driven by a unified `all_var_info.json` file.
- The `general` configuration block must explicitly set `cmor_version: "CMIP7"`.
- `CMIP_Tables_Dir` must point to a directory containing the `all_var_info.json` file. You should use `/work/ab0246/a270092/software/pycmor/src/pycmor/data/cmip7` (which is already populated in the codebase).
- `CV_Dir` (Controlled Vocabularies) configuration is still required. Since the local `cmip6-cmor-tables` submodule is empty, use the shared cluster path found in existing examples: `/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/CMIP6_CVs`.
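
The three requirements above can be checked before running anything. The following is a minimal sketch, not part of `pycmor`: the helper name `check_cmip7_general` is hypothetical, and it assumes the `general` block has already been parsed into a plain dictionary.

```python
from pathlib import Path


def check_cmip7_general(general: dict) -> list:
    """Return human-readable problems found in a parsed `general` block.

    Hypothetical pre-flight helper; pycmor itself does not provide it.
    """
    problems = []

    # Requirement 1: the CMOR version must be set explicitly to CMIP7.
    if general.get("cmor_version") != "CMIP7":
        problems.append('cmor_version must be "CMIP7"')

    # Requirement 2: the tables directory must contain all_var_info.json.
    tables_dir = general.get("CMIP_Tables_Dir")
    if not tables_dir:
        problems.append("CMIP_Tables_Dir is not set")
    elif not (Path(tables_dir) / "all_var_info.json").is_file():
        problems.append(f"all_var_info.json not found in {tables_dir}")

    # Requirement 3: a controlled-vocabulary directory is still required.
    cv_dir = general.get("CV_Dir")
    if not cv_dir:
        problems.append("CV_Dir is not set")
    elif not Path(cv_dir).is_dir():
        problems.append(f"CV_Dir does not exist: {cv_dir}")

    return problems


if __name__ == "__main__":
    general = {
        "cmor_version": "CMIP7",
        "CMIP_Tables_Dir": "/work/ab0246/a270092/software/pycmor/src/pycmor/data/cmip7",
        "CV_Dir": "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/CMIP6_CVs",
    }
    for problem in check_cmip7_general(general):
        print("config problem:", problem)
```

An empty result only means the paths exist; it says nothing about whether their contents are valid for CMIP7.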

## 3. Configuration YAML Structure
The builder AI should generate a `pycmor` configuration file (e.g., `cmorize_sst.yaml`) with the following structure:

```yaml
general:
  name: "AWI-ESM3-VEG-LR PI Control SST"
  description: "CMIP7 CMORization of SST for AWI-ESM3-VEG-LR piControl experiment"
  maintainer: "Your Name"
  email: "your.email@awi.de"
  cmor_version: "CMIP7"
  mip: "CMIP"
  # Shared path for CVs
  CV_Dir: "/work/ab0246/a270077/SciComp/Projects/pycmor/cmip6-cmor-tables/CMIP6_CVs"
  # Path to the directory containing all_var_info.json for CMIP7
  CMIP_Tables_Dir: "/work/ab0246/a270092/software/pycmor/src/pycmor/data/cmip7"

rules:
  - name: sst_tos_rule
    description: "Cmorize FESOM SST to CMIP7 tos"
    cmor_variable: tos
    model_variable: sst
    # Specify the target directory for the CMORized output
    output_directory: ./cmorized_output
    variant_label: r1i1p1f1
    experiment_id: piControl
    source_id: AWI-ESM3-VEG-LR
    model_component: ocean
    grid_label: gn
    inputs:
      - path: /work/bb1469/a270092/runtime/awiesm3-v3.4.1/human_tuning/outdata/fesom
        pattern: sst\.fesom\.1350\.nc
```

## 4. Execution Steps for the Builder AI
1. **Create the configuration file:** Write the YAML configuration above to a file (e.g., `cmorize_sst.yaml`).
2. **Set up the pycmor environment:** Ensure `pycmor` is installed in the current Python environment, or install it with `pip install -e .` from the repository root (`/work/ab0246/a270092/software/pycmor`).
3. **Execute pycmor:** Run the configuration through the pycmor CLI.
```bash
pycmor process cmorize_sst.yaml
```
4. **Verify Output:** Check the `output_directory` to confirm the file has been created following the CMIP7 directory structure and naming conventions (e.g., `CMIP7/.../tos_...nc`), and verify the internal NetCDF metadata conforms to CMIP7 standards.
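
For the verification step, a first pass can simply list what was written. This is a minimal sketch under two assumptions: output files are NetCDF files whose names start with the CMOR variable name followed by an underscore (as in the `tos_...nc` example above), and they may sit at any depth below `output_directory`. The helper `find_cmorized_files` is hypothetical and does not inspect the NetCDF metadata.

```python
from pathlib import Path


def find_cmorized_files(output_directory, cmor_variable="tos"):
    """Return all NetCDF files below output_directory named <cmor_variable>_*.nc.

    Hypothetical helper for a quick sanity check; it only looks at file names.
    """
    root = Path(output_directory)
    if not root.is_dir():
        # Nothing was written (or the path is wrong): report no files.
        return []
    # rglob searches every subdirectory, so the exact CMIP7 directory
    # layout does not need to be known in advance.
    return sorted(root.rglob(f"{cmor_variable}_*.nc"))


if __name__ == "__main__":
    for path in find_cmorized_files("./cmorized_output", "tos"):
        print(path)
```

If the list is empty, the run either failed or wrote to a different directory. Checking the internal metadata still requires opening the file with a NetCDF-aware tool.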
98 changes: 86 additions & 12 deletions src/pycmor/core/cmorizer.py
@@ -11,10 +11,46 @@
import yaml
from dask.distributed import Client
from everett.manager import generate_uppercase_key, get_runtime_config
from prefect import flow, get_run_logger, task
from prefect.futures import wait
from rich.progress import track

# Import Prefect conditionally to avoid server startup when not needed
try:
    import os

    _use_prefect = (
        os.environ.get("PYCMOR_PIPELINE_WORKFLOW_ORCHESTRATOR", "prefect") == "prefect"
    )
except Exception:
    # Fall back to Prefect if the environment cannot be read
    _use_prefect = True

if _use_prefect:
    from prefect import flow, get_run_logger, task
    from prefect.futures import wait
else:
    # Provide dummy implementations when not using Prefect
    def flow(*args, **kwargs):
        """Dummy flow decorator that returns function unchanged"""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            # Called without parentheses: @flow
            return args[0]
        else:
            # Called with parentheses: @flow() or @flow(name="...")
            return lambda f: f

    def task(*args, **kwargs):
        """Dummy task decorator that returns function unchanged"""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            # Called without parentheses: @task
            return args[0]
        else:
            # Called with parentheses: @task() or @task(name="...")
            return lambda f: f

    def get_run_logger():
        """Dummy replacement that returns the module-level logger"""
        return logger

    def wait(*args, **kwargs):
        """Dummy wait function"""
        return None


from ..data_request.collection import DataRequest
from ..data_request.table import DataRequestTable
from ..data_request.variable import DataRequestVariable
@@ -261,13 +297,15 @@ def _post_init_populate_rules_with_tables(self):

def _post_init_populate_rules_with_data_request_variables(self):
for drv in self.data_request.variables.values():
rule_for_var = self.find_matching_rule(drv)
if rule_for_var is None:
matching_rules = self.find_matching_rules(drv) # Changed to return list
if not matching_rules:
continue
if rule_for_var.data_request_variables == []:
rule_for_var.data_request_variables = [drv]
else:
rule_for_var.data_request_variables.append(drv)
# Assign the data_request_variable to ALL matching rules
for rule_for_var in matching_rules:
if rule_for_var.data_request_variables == []:
rule_for_var.data_request_variables = [drv]
else:
rule_for_var.data_request_variables.append(drv)
# FIXME: This needs a better name...
# Cluster might need to be copied:
with DaskContext.set_cluster(self._cluster):
@@ -334,23 +372,59 @@ def _match_pipelines_in_rules(self, force=False):
for rule in self.rules:
rule.match_pipelines(self.pipelines, force=force)

def find_matching_rule(
def find_matching_rules(
self, data_request_variable: DataRequestVariable
) -> Rule or None:
) -> list:
"""Find all rules that match the given data_request_variable.

Returns a list of matching rules. For CMIP7, multiple rules can match
the same variable (e.g., gn and gr rules for the same variable).
"""
matches = []
attr_criteria = [("cmor_variable", "variable_id")]

# For CMIP7, also match on table_id since variables can appear in multiple tables
# (e.g., both Omon.tos and 3hr.tos exist)
if hasattr(data_request_variable, 'table_header'):
table_id_to_match = data_request_variable.table_header.table_id
else:
table_id_to_match = None

for rule in self.rules:
if all(
# Check if cmor_variable matches
if not all(
getattr(rule, r_attr) == getattr(data_request_variable, drv_attr)
for (r_attr, drv_attr) in attr_criteria
):
matches.append(rule)
continue

# For CMIP7, also check table_id if specified in rule
if table_id_to_match and hasattr(rule, 'table_id') and rule.table_id:
if rule.table_id != table_id_to_match:
continue # table_id doesn't match, skip this rule

# If we get here, it's a match
matches.append(rule)

if len(matches) == 0:
msg = f"No rule found for {data_request_variable}"
if self._pymor_cfg.get("raise_on_no_rule", False):
raise ValueError(msg)
elif self._pymor_cfg.get("warn_on_no_rule", False):
logger.warning(msg)

return matches

def find_matching_rule(
self, data_request_variable: DataRequestVariable
) -> "Rule | None":
"""Find a single matching rule (legacy method for compatibility).

Returns the first match, or raises error if multiple matches found.
"""
matches = self.find_matching_rules(data_request_variable)

if len(matches) == 0:
return None
if len(matches) > 1:
msg = f"Need only one rule to match to {data_request_variable}. Found {len(matches)}."
21 changes: 20 additions & 1 deletion src/pycmor/core/controlled_vocabularies.py
@@ -148,4 +148,23 @@ def load_from_git(cls, tag: str = "6.2.58.64"):


class CMIP7ControlledVocabularies(ControlledVocabularies):
pass
"""Controlled vocabularies for CMIP7

Note: CMIP7 uses a unified all_var_info.json file instead of
separate controlled vocabulary files like CMIP6.
"""

@classmethod
def load(cls, table_dir=None):
"""Load controlled vocabularies for CMIP7

CMIP7 doesn't use the same CV structure as CMIP6, so we return
an empty instance. Variable information comes from all_var_info.json.
"""
obj = cls([])
return obj

def __init__(self, json_files=None):
"""Create a CMIP7ControlledVocabularies instance"""
super().__init__()

7 changes: 6 additions & 1 deletion src/pycmor/core/gather_inputs.py
@@ -294,6 +294,7 @@ def load_mfdataset(data, rule_spec):
"""
engine = rule_spec._pymor_cfg("xarray_open_mfdataset_engine")
parallel = rule_spec._pymor_cfg("xarray_open_mfdataset_parallel")
enable_dask = rule_spec._pymor_cfg("enable_dask")
all_files = []
for file_collection in rule_spec.inputs:
for f in file_collection.files:
@@ -302,8 +303,12 @@ def load_mfdataset(data, rule_spec):
logger.info(f"Loading {len(all_files)} files using {engine} backend on xarray...")
for f in all_files:
logger.info(f" * {f}")

# Prevent dask array creation when enable_dask is False
chunks = None if not enable_dask else "auto"

mf_ds = xr.open_mfdataset(
all_files, parallel=parallel, use_cftime=True, engine=engine
all_files, parallel=parallel, use_cftime=True, engine=engine, chunks=chunks
)
return mf_ds

2 changes: 2 additions & 0 deletions src/pycmor/core/rule.py
@@ -271,6 +271,8 @@ def global_attributes_set_on_rule(self):
"institution_id", # optional
"model_component", # optional
"further_info_url", # optional
"branding_suffix", # CMIP7
"region", # CMIP7
)
# attribute `creation_date` is the time-stamp of inputs directory
try:
5 changes: 4 additions & 1 deletion src/pycmor/data_request/collection.py
@@ -76,7 +76,10 @@ def from_all_var_info(cls, data):
tables[table_id] = table
for variable in table.variables:
variable.table_header = table.header
variables[variable.variable_id] = variable
# Use compound key (table.variable) to avoid conflicts
# e.g., "Omon.tos" instead of just "tos"
compound_key = f"{table_id}.{variable.variable_id}"
variables[compound_key] = variable
return cls(tables, variables)

@classmethod
14 changes: 13 additions & 1 deletion src/pycmor/data_request/variable.py
@@ -464,7 +464,19 @@ def from_all_var_info_json(cls, var_name: str, table_name: str):

@property
def attrs(self) -> dict:
raise NotImplementedError("CMI7 attributes are not yet finalized")
"""Return variable attributes for CMIP7"""
attrs_dict = {
"standard_name": self._standard_name,
"long_name": self._long_name,
"units": self._units,
"cell_methods": self._cell_methods,
"_FillValue": getattr(self, "_FillValue", None),
"missing_value": getattr(self, "missing_value", None),
}
# Add comment if available
if self._comment:
attrs_dict["comment"] = self._comment
return attrs_dict

@property
def cell_measures(self) -> str:
8 changes: 8 additions & 0 deletions src/pycmor/fesom/__init__.py
@@ -0,0 +1,8 @@
# Lazy import to avoid loading dependencies at startup
def __getattr__(name):
    if name == "regrid_to_regular":
        from ..fesom_2p1.regridding import regrid_to_regular
        return regrid_to_regular
    raise AttributeError(f"module 'pycmor.fesom' has no attribute '{name}'")

__all__ = ["regrid_to_regular"]