PMP enso #273


Merged (69 commits) on May 26, 2025
414a83b
initial commit for enso codes
lee1043 May 5, 2025
b5d8751
rename internal function and generalize variable name
lee1043 May 5, 2025
3973278
apply changes from #271
lee1043 May 5, 2025
92ede94
update
lee1043 May 5, 2025
7c5730f
update
lee1043 May 5, 2025
06e50ff
pre-commit fix
lee1043 May 5, 2025
7719d88
pre-commit fix
lee1043 May 5, 2025
f08c655
in progress
lee1043 May 6, 2025
8ecd3f7
update
lee1043 May 6, 2025
afa9c49
Update packages/climate-ref-pmp/src/climate_ref_pmp/diagnostics/enso.py
lee1043 May 7, 2025
76c7b97
update
lee1043 May 7, 2025
b2aeb1c
update
lee1043 May 7, 2025
81e60c4
in progress
lee1043 May 8, 2025
dbcdade
in progress
lee1043 May 8, 2025
d6b7491
update
lee1043 May 8, 2025
75501a6
add change log
lee1043 May 8, 2025
4b689d4
Update environment.yml
lee1043 May 8, 2025
0098813
update
lee1043 May 8, 2025
beae40e
update
lee1043 May 8, 2025
78d8d3b
Merge remote-tracking branch 'origin/main' into 223_pmp-enso-2
lewisjared May 9, 2025
d11b581
feat: Rework so that the command is executed
lewisjared May 9, 2025
c4dd856
clean up
lee1043 May 9, 2025
a49b336
update
lee1043 May 9, 2025
5b6885b
ruff fix
lee1043 May 9, 2025
7511d09
remove enso param file as enso driver does not need it for the curren…
lee1043 May 9, 2025
cd116e9
update
lee1043 May 9, 2025
5e21ae7
generate landmask for reference per variable basis because it is poss…
lee1043 May 9, 2025
f19e52d
typo fix
lee1043 May 9, 2025
b78f6b3
update
lee1043 May 11, 2025
c986f81
update
lee1043 May 11, 2025
aec1b48
update
lee1043 May 11, 2025
477a4ae
add logger lib to the pmp env
lee1043 May 11, 2025
f606fc6
update
lee1043 May 11, 2025
136edb1
update -- bug fix
lee1043 May 12, 2025
665ba8d
update -- typo fix
lee1043 May 12, 2025
6c6a72c
update
lee1043 May 12, 2025
b488f32
adjust numpy version limit
lee1043 May 12, 2025
b9220c7
chore: Update lockfile
lewisjared May 12, 2025
9166fbc
bug fix
lee1043 May 12, 2025
c212593
update
lee1043 May 12, 2025
14fc030
typo fix
lee1043 May 12, 2025
61d937e
cmec converter added
lee1043 May 14, 2025
09740b2
update cmec converter
lee1043 May 15, 2025
b9adbfa
bug fix
lee1043 May 15, 2025
1d6c427
update
lee1043 May 15, 2025
89ebb03
update
lee1043 May 15, 2025
242daac
clean up
lee1043 May 15, 2025
3783587
add ERA-5
lee1043 May 15, 2025
fa77ff3
clean up
lee1043 May 15, 2025
31209b6
Merge remote-tracking branch 'origin/main' into 223_pmp-enso-2
lewisjared May 16, 2025
907d97e
chore: cleanup dict_datasets
lewisjared May 16, 2025
8c7e2df
chore: Add files to obs4REF registry
lewisjared May 16, 2025
5a8e54e
Merge branch 'main' into 223_pmp-enso-2
lewisjared May 16, 2025
5d1f92d
chore: Skip coverage of driver files
lewisjared May 16, 2025
42f1e89
Merge remote-tracking branch 'origin/main' into 223_pmp-enso-2
lewisjared May 16, 2025
1a7f33e
chore: Add areacella and sftlf
lewisjared May 18, 2025
2c066c1
chore: Adding REF_TEST_DATA_DIR for out-of-source sample data
lewisjared May 18, 2025
c9637c9
typo fix
lee1043 May 21, 2025
757ed71
bug fix -- re-enable landsea mask estimation for obs and models if ne…
lee1043 May 21, 2025
a06787a
typo fix
lee1043 May 22, 2025
24d99aa
Merge remote-tracking branch 'origin/main' into 223_pmp-enso-2
lewisjared May 22, 2025
7e7f9ca
testing
lee1043 May 22, 2025
1362cf3
clean up
lee1043 May 22, 2025
e4b8c31
clean up
lee1043 May 22, 2025
34c1746
Merge remote-tracking branch 'origin/main' into 223_pmp-enso-2
lewisjared May 26, 2025
59bef7d
chore: Add additional dimensions
lewisjared May 26, 2025
4389095
chore: Add regression outputs
lewisjared May 26, 2025
3755dd4
chore: fix number of obs4ref file
lewisjared May 26, 2025
67656d3
chore: Fix coverage
lewisjared May 26, 2025
1 change: 1 addition & 0 deletions changelog/273.feature.md
@@ -0,0 +1 @@
Implemented PMP ENSO metrics
3 changes: 3 additions & 0 deletions conftest.py
@@ -103,6 +103,9 @@ def regression_data_dir(test_data_dir) -> Path:

@pytest.fixture(autouse=True, scope="session")
def sample_data() -> None:
+    if os.environ.get("REF_TEST_DATA_DIR"):
+        logger.warning("Not fetching sample data. Using custom test data directory")
+        return
    # Downloads the sample data if it doesn't exist
    logger.disable("climate_ref_core.dataset_registry")
    fetch_sample_data(force_cleanup=False, symlink=False)
7 changes: 7 additions & 0 deletions docs/configuration.md
@@ -84,6 +84,13 @@ This defaults to the following locations:
  environment variable, if defined. (Linux)
* `%USERPROFILE%\AppData\Local\climate_ref\Cache` (Windows)

### `REF_TEST_DATA_DIR`

Override the location of the test data directory.
If this is not set, the test data directory is inferred from the location of the test suite.

If this is set, the sample data won't be fetched or updated.
### `REF_TEST_OUTPUT`

Path where the test output is stored.
2 changes: 1 addition & 1 deletion packages/climate-ref-core/src/climate_ref_core/logging.py
@@ -89,7 +89,7 @@ def initialise_logging(level: int | str, format: str, log_directory: str | Path)
        retention=10,
        level="DEBUG",
        format=VERBOSE_LOG_FORMAT,
-        colorize=True,
+        colorize=False,
    )
    logger.info("Starting REF logging")
    logger.info(f"arguments: {sys.argv}")
@@ -9,6 +9,8 @@
    fetch_all_files,
)

+NUM_OBS4REF_FILES = 58
+

@pytest.fixture
def fake_registry_file():
@@ -107,7 +109,7 @@ def test_fetch_all_files(mocker, tmp_path, symlink):
    registry.fetch = mocker.MagicMock(return_value=downloaded_file)

    fetch_all_files(registry, "obs4ref", tmp_path, symlink=symlink)
-    assert registry.fetch.call_count == 59
+    assert registry.fetch.call_count == NUM_OBS4REF_FILES

    expected_file = (
        tmp_path / "obs4REF/MOHC/HadISST-1-1/mon/ts/gn/v20210727/ts_mon_HadISST-1-1_PCMDI_gn_187001-201907.nc"
@@ -123,4 +125,4 @@ def test_fetch_all_files_no_output(mocker):
    registry.fetch = mocker.MagicMock()

    fetch_all_files(registry, "obs4ref", None)
-    assert registry.fetch.call_count == 59
+    assert registry.fetch.call_count == NUM_OBS4REF_FILES
12 changes: 10 additions & 2 deletions packages/climate-ref-pmp/src/climate_ref_pmp/__init__.py
@@ -6,22 +6,30 @@

from climate_ref_core.dataset_registry import DATASET_URL, dataset_registry_manager
from climate_ref_core.providers import CondaDiagnosticProvider
-from climate_ref_pmp.diagnostics import AnnualCycle, ExtratropicalModesOfVariability
+from climate_ref_pmp.diagnostics import ENSO, AnnualCycle, ExtratropicalModesOfVariability

__version__ = importlib.metadata.version("climate-ref-pmp")

# Create the PMP diagnostics provider
# PMP uses a conda environment to run the diagnostics
provider = CondaDiagnosticProvider("PMP", __version__)

+# Annual cycle diagnostics and metrics
+provider.register(AnnualCycle())
+
+# ENSO diagnostics and metrics
+# provider.register(ENSO("ENSO_perf"))  # Assigned to ESMValTool
+provider.register(ENSO("ENSO_tel"))
+provider.register(ENSO("ENSO_proc"))
+
+# Extratropical modes of variability diagnostics and metrics
provider.register(ExtratropicalModesOfVariability("PDO"))
provider.register(ExtratropicalModesOfVariability("NPGO"))
provider.register(ExtratropicalModesOfVariability("NAO"))
provider.register(ExtratropicalModesOfVariability("NAM"))
provider.register(ExtratropicalModesOfVariability("PNA"))
provider.register(ExtratropicalModesOfVariability("NPO"))
provider.register(ExtratropicalModesOfVariability("SAM"))
-provider.register(AnnualCycle())


dataset_registry_manager.register(
@@ -1,9 +1,11 @@
"""PMP diagnostics."""

from climate_ref_pmp.diagnostics.annual_cycle import AnnualCycle
+from climate_ref_pmp.diagnostics.enso import ENSO
from climate_ref_pmp.diagnostics.variability_modes import ExtratropicalModesOfVariability

__all__ = [
+    "ENSO",
    "AnnualCycle",
    "ExtratropicalModesOfVariability",
]
245 changes: 245 additions & 0 deletions packages/climate-ref-pmp/src/climate_ref_pmp/diagnostics/enso.py
@@ -0,0 +1,245 @@
import json
import os
from collections.abc import Collection, Iterable
from typing import Any

from loguru import logger

from climate_ref_core.constraints import AddSupplementaryDataset
from climate_ref_core.datasets import DatasetCollection, FacetFilter, SourceDatasetType
from climate_ref_core.diagnostics import (
    CommandLineDiagnostic,
    DataRequirement,
    ExecutionDefinition,
    ExecutionResult,
)
from climate_ref_pmp.pmp_driver import _get_resource, process_json_result


class ENSO(CommandLineDiagnostic):
    """
    Calculate the ENSO performance metrics for a dataset
    """

    facets = ("source_id", "member_id", "grid_label", "experiment_id", "metric", "reference_datasets")

    def __init__(self, metrics_collection: str, experiments: Collection[str] = ("historical",)) -> None:
        self.name = metrics_collection
        self.slug = metrics_collection.lower()
        self.metrics_collection = metrics_collection
        self.parameter_file = "pmp_param_enso.py"
        self.obs_sources: tuple[str, ...]
        self.model_variables: tuple[str, ...]

        if metrics_collection == "ENSO_perf":  # pragma: no cover
            self.model_variables = ("pr", "ts", "tauu")
            self.obs_sources = ("GPCP-Monthly-3-2", "TropFlux-1-0", "HadISST-1-1")
        elif metrics_collection == "ENSO_tel":
            self.model_variables = ("pr", "ts")
            self.obs_sources = ("GPCP-Monthly-3-2", "TropFlux-1-0", "HadISST-1-1")
        elif metrics_collection == "ENSO_proc":
            self.model_variables = ("ts", "tauu", "hfls", "hfss", "rlds", "rlus", "rsds", "rsus")
            self.obs_sources = (
                "GPCP-Monthly-3-2",
                "TropFlux-1-0",
                "HadISST-1-1",
                "CERES-EBAF-4-2",
            )
        else:
            raise ValueError(
                f"Unknown metrics collection: {metrics_collection}. "
                "Valid options are: ENSO_perf, ENSO_tel, ENSO_proc"
            )

        self.data_requirements = self._get_data_requirements(experiments)

    def _get_data_requirements(
        self,
        experiments: Collection[str] = ("historical",),
    ) -> tuple[DataRequirement, DataRequirement]:
        filters = [
            FacetFilter(
                facets={
                    "frequency": "mon",
                    "experiment_id": tuple(experiments),
                    "variable_id": self.model_variables,
                }
            )
        ]

        return (
            DataRequirement(
                source_type=SourceDatasetType.obs4MIPs,
                filters=(
                    FacetFilter(facets={"source_id": self.obs_sources, "variable_id": self.model_variables}),
                ),
                group_by=("activity_id",),
            ),
            DataRequirement(
                source_type=SourceDatasetType.CMIP6,
                filters=tuple(filters),
                group_by=("source_id", "experiment_id", "member_id", "grid_label"),
                constraints=(
                    AddSupplementaryDataset.from_defaults("areacella", SourceDatasetType.CMIP6),
                    AddSupplementaryDataset.from_defaults("sftlf", SourceDatasetType.CMIP6),
                ),
            ),
        )

    def build_cmd(self, definition: ExecutionDefinition) -> Iterable[str]:
        """
        Run the diagnostic on the given configuration.

        Parameters
        ----------
        definition : ExecutionDefinition
            The configuration to run the diagnostic on.

        Returns
        -------
        :
            The result of running the diagnostic.
        """
        mc_name = self.metrics_collection

        # ------------------------------------------------
        # Get the input datasets information for the model
        # ------------------------------------------------
        input_datasets = definition.datasets[SourceDatasetType.CMIP6]
        input_selectors = input_datasets.selector_dict()
        source_id = input_selectors["source_id"]
        member_id = input_selectors["member_id"]
        experiment_id = input_selectors["experiment_id"]
        variable_ids = set(input_datasets["variable_id"].unique()) - {"areacella", "sftlf"}
        mod_run = f"{source_id}_{member_id}"

        # We only need one entry for the model run
        dict_mod: dict[str, dict[str, Any]] = {mod_run: {}}

        def extract_variable(dc: DatasetCollection, variable: str) -> list[str]:
            return dc.datasets[input_datasets["variable_id"] == variable]["path"].to_list()  # type: ignore

        # TO DO: Get the path to the files per variable
        for variable in variable_ids:
            list_files = extract_variable(input_datasets, variable)
            list_areacella = extract_variable(input_datasets, "areacella")
            list_sftlf = extract_variable(input_datasets, "sftlf")

            if len(list_files) > 0:
                dict_mod[mod_run][variable] = {
                    "path + filename": list_files,
                    "varname": variable,
                    "path + filename_area": list_areacella,
                    "areaname": "areacella",
                    "path + filename_landmask": list_sftlf,
                    "landmaskname": "sftlf",
                }

        # -------------------------------------------------------
        # Get the input datasets information for the observations
        # -------------------------------------------------------
        reference_dataset = definition.datasets[SourceDatasetType.obs4MIPs]
        reference_dataset_names = reference_dataset["source_id"].unique()

        dict_obs: dict[str, dict[str, Any]] = {}

        # TO DO: Get the path to the files per variable and per source
        for obs_name in reference_dataset_names:
            dict_obs[obs_name] = {}
            for variable in variable_ids:
                # Get the list of files for the current variable and observation source
                list_files = reference_dataset.datasets[
                    (reference_dataset["variable_id"] == variable)
                    & (reference_dataset["source_id"] == obs_name)
                ]["path"].to_list()
                # If the list is not empty, add it to the dictionary
                if len(list_files) > 0:
                    dict_obs[obs_name][variable] = {
                        "path + filename": list_files,
                        "varname": variable,
                    }

        # Create input directory
        dict_datasets = {
            "model": dict_mod,
            "observations": dict_obs,
            "metricsCollection": mc_name,
            "experiment_id": experiment_id,
        }

        # Create JSON file for dictDatasets
        json_file = os.path.join(
            definition.output_directory, f"input_{mc_name}_{source_id}_{experiment_id}_{member_id}.json"
        )
        with open(json_file, "w") as f:
            json.dump(dict_datasets, f, indent=4)
        logger.debug(f"JSON file created: {json_file}")

        driver_file = _get_resource("climate_ref_pmp.drivers", "enso_driver.py", use_resources=True)
        return [
            "python",
            driver_file,
            "--metrics_collection",
            mc_name,
            "--experiment_id",
            experiment_id,
            "--input_json_path",
            json_file,
            "--output_directory",
            str(definition.output_directory),
        ]

    def build_execution_result(self, definition: ExecutionDefinition) -> ExecutionResult:
        """
        Build a diagnostic result from the output of the PMP driver

        Parameters
        ----------
        definition
            Definition of the diagnostic execution

        Returns
        -------
        Result of the diagnostic execution
        """
        input_datasets = definition.datasets[SourceDatasetType.CMIP6]
        source_id = input_datasets["source_id"].unique()[0]
        experiment_id = input_datasets["experiment_id"].unique()[0]
        member_id = input_datasets["member_id"].unique()[0]
        mc_name = self.metrics_collection
        pattern = f"{mc_name}_{source_id}_{experiment_id}_{member_id}"

        # Find the results files
        results_files = list(definition.output_directory.glob(f"{pattern}_cmec.json"))
        logger.debug(f"Results files: {results_files}")

        if len(results_files) != 1:  # pragma: no cover
            logger.warning(f"A single cmec output file not found: {results_files}")
            return ExecutionResult.build_from_failure(definition)

        # Find the other outputs
        png_files = [definition.as_relative_path(f) for f in definition.output_directory.glob("*.png")]
        data_files = [definition.as_relative_path(f) for f in definition.output_directory.glob("*.nc")]

        cmec_output, cmec_metric = process_json_result(results_files[0], png_files, data_files)

        input_selectors = definition.datasets[SourceDatasetType.CMIP6].selector_dict()
        cmec_metric_bundle = cmec_metric.remove_dimensions(
            [
                "model",
                "realization",
            ],
        ).prepend_dimensions(
            {
                "source_id": input_selectors["source_id"],
                "member_id": input_selectors["member_id"],
                "grid_label": input_selectors["grid_label"],
                "experiment_id": input_selectors["experiment_id"],
            }
        )

        return ExecutionResult.build_from_output_bundle(
            definition,
            cmec_output_bundle=cmec_output,
            cmec_metric_bundle=cmec_metric_bundle,
        )
@@ -37,18 +37,18 @@ def __init__(self, mode_id: str):
        self.name = f"Extratropical modes of variability: {mode_id}"
        self.slug = f"extratropical-modes-of-variability-{mode_id.lower()}"

-        def get_data_requirements(
+        def _get_data_requirements(
            obs_source: str,
            obs_variable: str,
-            cmip_variable: str,
+            model_variable: str,
            extra_experiments: str | tuple[str, ...] | list[str] = (),
        ) -> tuple[DataRequirement, DataRequirement]:
            filters = [
                FacetFilter(
                    facets={
                        "frequency": "mon",
                        "experiment_id": ("historical", "hist-GHG", "piControl", *extra_experiments),
-                        "variable_id": cmip_variable,
+                        "variable_id": model_variable,
                    }
                )
            ]
@@ -70,10 +70,10 @@ def get_data_requirements(

        if self.mode_id in self.ts_modes:
            self.parameter_file = "pmp_param_MoV-ts.py"
-            self.data_requirements = get_data_requirements("HadISST-1-1", "ts", "ts")
+            self.data_requirements = _get_data_requirements("HadISST-1-1", "ts", "ts")
        elif self.mode_id in self.psl_modes:
            self.parameter_file = "pmp_param_MoV-psl.py"
-            self.data_requirements = get_data_requirements("20CR", "psl", "psl", extra_experiments=("amip",))
+            self.data_requirements = _get_data_requirements("20CR", "psl", "psl", extra_experiments=("amip",))
        else:
            raise ValueError(
                f"Unknown mode_id '{self.mode_id}'. Must be one of {self.ts_modes + self.psl_modes}"