Skip to content

Allow fix_file to return dataset objects #2579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 49 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
68845bb
Avoid copying input data in CMIP6 CESM2 fixes
bouweandela Jun 10, 2024
2541ab1
run a GA test
valeriupredoi Jun 17, 2024
c038510
Use ruff formatting
bouweandela Sep 26, 2024
3e95d3e
Merge branch 'main' of github.com:ESMValGroup/ESMValCore into in-memo…
bouweandela Sep 26, 2024
2e388ed
Disable github action on branch
bouweandela Sep 26, 2024
55bde64
Merge branch 'main' of github.com:ESMValGroup/ESMValCore into in-memo…
bouweandela Oct 17, 2024
e890f56
Fewer changes
bouweandela Oct 17, 2024
142ba29
Fewer changes
bouweandela Oct 17, 2024
625aa75
Move different attribute removal to concatenate
bouweandela Oct 17, 2024
588130a
Add test
bouweandela Oct 17, 2024
aadffe8
Various improvements
bouweandela Oct 23, 2024
07a5e37
Add docstring
bouweandela Oct 23, 2024
b95ee63
Merge branch 'main' into in-memory-fix-file
bouweandela Oct 23, 2024
379fd45
Merge remote-tracking branch 'origin/in-memory-fix-file' into better_…
schlunma Nov 11, 2024
e593611
Do not change CESM2 fixes
schlunma Nov 11, 2024
645d555
More flexible fix_file and ignore warnings
schlunma Nov 11, 2024
891e82b
Add function to convert between xarray/ncdata and iris
schlunma Nov 11, 2024
77c9ef1
Add first tests
schlunma Nov 11, 2024
94ccc90
Add more tests
schlunma Nov 12, 2024
a24de25
Add test to check that no warning is raised
schlunma Nov 12, 2024
3d1bd93
Use ignore_warnings in fix_file
schlunma Nov 12, 2024
e127a9d
Add doc
schlunma Nov 12, 2024
31d3762
Better doc
schlunma Nov 12, 2024
9dbbb40
Merge remote-tracking branch 'origin/main' into better_fix_file
schlunma Nov 12, 2024
7696480
100% coverage in fix.py
schlunma Nov 12, 2024
11c1b51
100% coverage
schlunma Nov 12, 2024
7e097bb
Avoid circular import
schlunma Nov 12, 2024
b8eb5b8
Doc
schlunma Nov 12, 2024
543efe1
Better comment
schlunma Nov 12, 2024
6808164
100% coverage
schlunma Nov 12, 2024
ee043a1
Add missing parameter to docstring
schlunma Nov 12, 2024
b9f70f7
Better docstrings
schlunma Nov 12, 2024
91bbe8d
Improve doc rendering
schlunma Nov 12, 2024
903d694
Double backticks for monospace font
schlunma Nov 12, 2024
b47ba81
Better docstring
schlunma Nov 14, 2024
863b0f1
Merge remote-tracking branch 'origin/main' into better_fix_file
schlunma Dec 11, 2024
ae277ca
Merge remote-tracking branch 'origin/main' into better_fix_file
schlunma Feb 4, 2025
ef317ab
Merge branch 'main' into better_fix_file
valeriupredoi Feb 10, 2025
b80c3b6
Merge remote-tracking branch 'origin/main' into better_fix_file
schlunma May 7, 2025
73c17ae
Please Codacy
schlunma May 7, 2025
db14d0f
Added missing type hint
schlunma May 7, 2025
d795e9d
Added one more missing type hint
schlunma May 7, 2025
cb9b4bb
Merge remote-tracking branch 'origin/main' into better_fix_file
schlunma May 14, 2025
0faf234
Update esmvalcore/cmor/_fixes/fix.py
schlunma May 14, 2025
9e9bd57
Allow load to read xarray and ncdata objects
schlunma May 14, 2025
dc3cfe8
Set up lock sharing in file that actually needs it
schlunma May 14, 2025
239709b
Better docstring
schlunma May 14, 2025
5d96c76
Raise warning if cubes do not contain source_file attribute after load()
schlunma May 14, 2025
4d3c675
Add test for invalid type in load()
schlunma May 14, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions doc/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -460,10 +460,12 @@
'iris': ('https://scitools-iris.readthedocs.io/en/stable/', None),
'esmf_regrid': ('https://iris-esmf-regrid.readthedocs.io/en/stable/', None),
'matplotlib': ('https://matplotlib.org/stable/', None),
'ncdata': ('https://ncdata.readthedocs.io/en/stable/', None),
'numpy': ('https://numpy.org/doc/stable/', None),
'pyesgf': ('https://esgf-pyclient.readthedocs.io/en/stable/', None),
'python': ('https://docs.python.org/3/', None),
'scipy': ('https://docs.scipy.org/doc/scipy/', None),
'xarray': ('https://docs.xarray.dev/en/stable/', None),
}

# -- Extlinks extension -------------------------------------------------------
Expand Down
6 changes: 3 additions & 3 deletions doc/develop/fixing_data.rst
Original file line number Diff line number Diff line change
Expand Up @@ -126,9 +126,9 @@ Then we have to create the class for the fix deriving from
Next we must choose the method to use between the ones offered by the
Fix class:

- ``fix_file``: should be used only to fix errors that prevent data loading.
As a rule of thumb, you should only use it if the execution halts before
reaching the checks.
- ``fix_file``: you need to fix errors that prevent loading the data with Iris
or perform operations that are more efficient with other packages (e.g.,
loading files with lots of variables is much faster with Xarray than Iris).

- ``fix_metadata``: you want to change something in the cube that is not
the data (e.g., variable or coordinate names, data units).
Expand Down
1 change: 1 addition & 0 deletions doc/quickstart/configure.rst
Original file line number Diff line number Diff line change
Expand Up @@ -934,6 +934,7 @@ the preprocessing chain.

Currently supported preprocessor steps:

* :func:`~esmvalcore.preprocessor.fix_file`
* :func:`~esmvalcore.preprocessor.load`

Here is an example on how to ignore specific warnings during the preprocessor
Expand Down
30 changes: 16 additions & 14 deletions doc/recipe/preprocessor.rst
Original file line number Diff line number Diff line change
Expand Up @@ -272,20 +272,22 @@ ESMValCore deals with those issues by applying specific fixes for those
datasets that require them. Fixes are applied at three different preprocessor
steps:

- ``fix_file``: apply fixes directly to a copy of the file.
Copying the files is costly, so only errors that prevent Iris to load the
file are fixed here.
See :func:`esmvalcore.preprocessor.fix_file`.

- ``fix_metadata``: metadata fixes are done just before concatenating the
cubes loaded from different files in the final one.
Automatic metadata fixes are also applied at this step.
See :func:`esmvalcore.preprocessor.fix_metadata`.

- ``fix_data``: data fixes are applied before starting any operation that
will alter the data itself.
Automatic data fixes are also applied at this step.
See :func:`esmvalcore.preprocessor.fix_data`.
- ``fix_file``: apply fixes to data before loading them with Iris.
This is mainly intended to fix errors that prevent data loading with Iris
(e.g., those related to ``missing_value`` or ``_FillValue``) or
operations that are more efficient with other packages (e.g., loading
files with lots of variables is much faster with Xarray than Iris). See
:func:`esmvalcore.preprocessor.fix_file`.

- ``fix_metadata``: metadata fixes are done just before concatenating the
cubes loaded from different files in the final one.
Automatic metadata fixes are also applied at this step.
See :func:`esmvalcore.preprocessor.fix_metadata`.

- ``fix_data``: data fixes are applied before starting any operation that
will alter the data itself.
Automatic data fixes are also applied at this step.
See :func:`esmvalcore.preprocessor.fix_data`.

To get an overview on data fixes and how to implement new ones, please go to
:ref:`fixing_data`.
Expand Down
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ dependencies:
- jinja2
- libnetcdf !=4.9.1 # to avoid hdf5 warnings
- nc-time-axis
- ncdata
- nested-lookup
- netcdf4
- numpy !=1.24.3
Expand Down
13 changes: 7 additions & 6 deletions esmvalcore/cmor/_fixes/cmip6/cesm2.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,15 +22,16 @@

def _fix_formula_terms(
self,
filepath,
file,
output_dir,
add_unique_suffix=False,
ignore_warnings=None,

Check notice on line 28 in esmvalcore/cmor/_fixes/cmip6/cesm2.py

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

esmvalcore/cmor/_fixes/cmip6/cesm2.py#L28

Unused argument 'ignore_warnings' (unused-argument)
):
"""Fix ``formula_terms`` attribute."""
new_path = self.get_fixed_filepath(
output_dir, filepath, add_unique_suffix=add_unique_suffix
output_dir, file, add_unique_suffix=add_unique_suffix
)
copyfile(filepath, new_path)
copyfile(file, new_path)
dataset = Dataset(new_path, mode="a")
dataset.variables["lev"].formula_terms = "p0: p0 a: a b: b ps: ps"
dataset.variables[
Expand All @@ -39,7 +40,7 @@
dataset.close()
return new_path

def fix_file(self, filepath, output_dir, add_unique_suffix=False):
def fix_file(self, file, output_dir, add_unique_suffix=False):

Check notice on line 43 in esmvalcore/cmor/_fixes/cmip6/cesm2.py

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

esmvalcore/cmor/_fixes/cmip6/cesm2.py#L43

Function is missing a type annotation. (no-untyped-def)

Check notice on line 43 in esmvalcore/cmor/_fixes/cmip6/cesm2.py

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

esmvalcore/cmor/_fixes/cmip6/cesm2.py#L43

Number of parameters was 5 in 'Fix.fix_file' and is now 4 in overriding 'Cl.fix_file' method (arguments-differ)
"""Fix hybrid pressure coordinate.

Adds missing ``formula_terms`` attribute to file.
Expand All @@ -53,7 +54,7 @@

Parameters
----------
filepath : str
file : str
Path to the original file.
output_dir: Path
Output directory for fixed files.
Expand All @@ -67,7 +68,7 @@

"""
new_path = self._fix_formula_terms(
filepath, output_dir, add_unique_suffix=add_unique_suffix
file, output_dir, add_unique_suffix=add_unique_suffix
)
dataset = Dataset(new_path, mode="a")
dataset.variables["a_bnds"][:] = dataset.variables["a_bnds"][::-1, :]
Expand Down
12 changes: 9 additions & 3 deletions esmvalcore/cmor/_fixes/cmip6/cesm2_waccm.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,13 @@
class Cl(BaseCl):
"""Fixes for cl."""

def fix_file(self, filepath, output_dir, add_unique_suffix=False):
def fix_file(
self,
file,
output_dir,
add_unique_suffix=False,
ignore_warnings=None,

Check notice on line 23 in esmvalcore/cmor/_fixes/cmip6/cesm2_waccm.py

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

esmvalcore/cmor/_fixes/cmip6/cesm2_waccm.py#L23

Unused argument 'ignore_warnings' (unused-argument)
):
"""Fix hybrid pressure coordinate.

Adds missing ``formula_terms`` attribute to file.
Expand All @@ -29,7 +35,7 @@

Parameters
----------
filepath : str
file: str
Path to the original file.
output_dir: Path
Output directory for fixed files.
Expand All @@ -43,7 +49,7 @@

"""
new_path = self._fix_formula_terms(
filepath, output_dir, add_unique_suffix=add_unique_suffix
file, output_dir, add_unique_suffix=add_unique_suffix
)
dataset = Dataset(new_path, mode="a")
dataset.variables["a_bnds"][:] = dataset.variables["a_bnds"][:, ::-1]
Expand Down
14 changes: 10 additions & 4 deletions esmvalcore/cmor/_fixes/emac/emac.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,13 @@ class AllVars(EmacFix):
"kg/m**2s": "kg m-2 s-1",
}

def fix_file(self, filepath, output_dir, add_unique_suffix=False):
def fix_file(
self,
file,
output_dir,
add_unique_suffix=False,
ignore_warnings=None,
):
"""Fix file.

Fixes hybrid pressure level coordinate.
Expand All @@ -53,11 +59,11 @@ def fix_file(self, filepath, output_dir, add_unique_suffix=False):

"""
if "alevel" not in self.vardef.dimensions:
return filepath
return file
new_path = self.get_fixed_filepath(
output_dir, filepath, add_unique_suffix=add_unique_suffix
output_dir, file, add_unique_suffix=add_unique_suffix
)
copyfile(filepath, new_path)
copyfile(file, new_path)
with Dataset(new_path, mode="a") as dataset:
if "formula_terms" in dataset.variables["lev"].ncattrs():
del dataset.variables["lev"].formula_terms
Expand Down
52 changes: 36 additions & 16 deletions esmvalcore/cmor/_fixes/fix.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,9 @@
from typing import TYPE_CHECKING, Any, Optional

import dask
import ncdata
import numpy as np
import xarray as xr
from cf_units import Unit
from iris.coords import Coord, CoordExtent
from iris.cube import Cube, CubeList
Expand All @@ -27,7 +29,10 @@
)
from esmvalcore.cmor.fixes import get_time_bounds
from esmvalcore.cmor.table import get_var_info
from esmvalcore.iris_helpers import has_unstructured_grid, safe_convert_units
from esmvalcore.iris_helpers import (
has_unstructured_grid,
safe_convert_units,
)

if TYPE_CHECKING:
from esmvalcore.cmor.table import CoordinateInfo, VariableInfo
Expand Down Expand Up @@ -75,34 +80,49 @@

def fix_file(
self,
filepath: Path,
file: str | Path | Cube | CubeList | xr.Dataset | ncdata.NcData,
output_dir: Path,
add_unique_suffix: bool = False,
) -> str | Path:
"""Apply fixes to the files prior to creating the cube.
ignore_warnings: Optional[list[dict[str, Any]]] = None,

Check notice on line 86 in esmvalcore/cmor/_fixes/fix.py

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

esmvalcore/cmor/_fixes/fix.py#L86

Unused argument 'ignore_warnings' (unused-argument)
) -> str | Path | Cube | CubeList | xr.Dataset | ncdata.NcData:
"""Fix files before loading them into a :class:`~iris.cube.CubeList`.

This is mainly intended to fix errors that prevent loading the data
with Iris (e.g., those related to ``missing_value`` or ``_FillValue``)
or operations that are more efficient with other packages (e.g.,
loading files with lots of variables is much faster with Xarray than
Iris).

Should be used only to fix errors that prevent loading or cannot be
fixed in the cube (e.g., those related to `missing_value` or
`_FillValue`).
Warning
-------
A path should only be returned if it points to the original (unchanged)
file (i.e., a fix was not necessary). If a fix is necessary, this
function should return a :class:`~iris.cube.Cube`,
:class:`~iris.cube.CubeList`, :class:`~ncdata.NcData` or
:class:`~xarray.Dataset` object. Under no circumstances a copy of the
input data should be created (this is very inefficient).

Parameters
----------
filepath:
File to fix.
file:
Data to fix or path to them. Original files should not be
overwritten.
output_dir:
Output directory for fixed files.
add_unique_suffix:
Adds a unique suffix to `output_dir` for thread safety.
Adds a unique suffix to ``output_dir`` for thread safety.
ignore_warnings:
Keyword arguments passed to :func:`warnings.filterwarnings` used to
ignore warnings during data loading. Each list element corresponds
to one call to :func:`warnings.filterwarnings`.

Returns
-------
str or pathlib.Path
Path to the corrected file. It can be different from the original
filepath if a fix has been applied, but if not it should be the
original filepath.
str | pathlib.Path | iris.cube.Cube | iris.cube.CubeList | xr.Dataset | ncdata.NcData:
Fixed data or a path to them.

"""
return filepath
return file

def fix_metadata(self, cubes: Sequence[Cube]) -> Sequence[Cube]:
"""Apply fixes to the metadata of the cube.
Expand All @@ -118,7 +138,7 @@

Returns
-------
Iterable[iris.cube.Cube]
Sequence[iris.cube.Cube]
Fixed cubes. They can be different instances.

"""
Expand Down
24 changes: 15 additions & 9 deletions esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,13 @@
class AllVars(Fix):
"""Fixes for all IPSLCM variables."""

def fix_file(self, filepath, output_dir, add_unique_suffix=False):
def fix_file(
self,
file,
output_dir,
add_unique_suffix=False,
ignore_warnings=None,
):
"""Select IPSLCM variable in filepath.

This is done only if input file is a multi-variable one. This
Expand All @@ -34,28 +40,28 @@
"""
if "_" + self.extra_facets.get(
"group", "non-sense"
) + ".nc" not in str(filepath):
) + ".nc" not in str(file):
# No need to filter the file
logger.debug("Not filtering for %s", filepath)
return filepath
logger.debug("Not filtering for %s", file)
return file

Check warning on line 46 in esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py

View check run for this annotation

Codecov / codecov/patch

esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py#L45-L46

Added lines #L45 - L46 were not covered by tests

if not self.extra_facets.get("use_cdo", False):
# The configuration developer doesn't provide CDO, while ESMValTool
# licence policy doesn't allow to include it in dependencies
# Or he considers that plain Iris load is quick enough for
# that file
logger.debug("In ipsl-cm6.py : CDO not activated for %s", filepath)
return filepath
logger.debug("In ipsl-cm6.py : CDO not activated for %s", file)
return file

Check warning on line 54 in esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py

View check run for this annotation

Codecov / codecov/patch

esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py#L53-L54

Added lines #L53 - L54 were not covered by tests

# Proceed with CDO selvar
varname = self.extra_facets.get(VARNAME_KEY, self.vardef.short_name)
alt_filepath = str(filepath).replace(".nc", "_cdo_selected.nc")
alt_filepath = str(file).replace(".nc", "_cdo_selected.nc")

Check warning on line 58 in esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py

View check run for this annotation

Codecov / codecov/patch

esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py#L58

Added line #L58 was not covered by tests
outfile = self.get_fixed_filepath(
output_dir, alt_filepath, add_unique_suffix=add_unique_suffix
)
tim1 = time.time()
logger.debug("Using CDO for selecting %s in %s", varname, filepath)
command = ["cdo", "-selvar,%s" % varname, str(filepath), str(outfile)]
logger.debug("Using CDO for selecting %s in %s", varname, file)
command = ["cdo", "-selvar,%s" % varname, str(file), str(outfile)]

Check notice on line 64 in esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py

View check run for this annotation

Codacy Production / Codacy Static Code Analysis

esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py#L64

Formatting a regular string which could be an f-string (consider-using-f-string)

Check warning on line 64 in esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py

View check run for this annotation

Codecov / codecov/patch

esmvalcore/cmor/_fixes/ipslcm/ipsl_cm6.py#L63-L64

Added lines #L63 - L64 were not covered by tests
subprocess.run(command, check=True)
logger.debug("CDO selection done in %.2f seconds", time.time() - tim1)
return outfile
Expand Down
Loading