
Pull new changes #61

Merged
merged 29 commits into from
Mar 3, 2025
8b88896
Added spawn_copy options for some handlers, also made CFADataset a lo…
dwest77a Feb 26, 2025
c35e3d4
Updated all mixins for completion workflow, consolidated status methods
dwest77a Feb 26, 2025
4450089
Added completion methods for both project and group
dwest77a Feb 26, 2025
85982ca
Documented the Completion Workflow
dwest77a Feb 26, 2025
d31acc3
Updated docstrings for all core mixins
dwest77a Feb 26, 2025
8c6635f
Updated docstrings
dwest77a Feb 26, 2025
b45dace
Fixed bug with logging to filesystem
dwest77a Feb 26, 2025
5ab8cc1
Fixed bug with logging to filesystem
dwest77a Feb 26, 2025
1574ea9
Added phase for validate
dwest77a Feb 26, 2025
0b8296c
Added Zarr tests in correct order, added several project tests
dwest77a Feb 26, 2025
f97edff
Minor bug fixes
dwest77a Feb 26, 2025
8b04fc7
Rename to get tests working
dwest77a Feb 26, 2025
2fa85a3
Readded test project
dwest77a Feb 26, 2025
a2131fe
Added params to docstrings
dwest77a Feb 26, 2025
227c61c
Updated all imports with isort
dwest77a Feb 27, 2025
a13dcd9
Updated docstrings, added metadata writes to CFA dataset
dwest77a Mar 3, 2025
2eace2a
Updated docstrings
dwest77a Mar 3, 2025
48b3168
Updated how CFA dataset is defined
dwest77a Mar 3, 2025
890928f
Added logging to all dataset filehandler instances
dwest77a Mar 3, 2025
3ee0bb9
Minor additional changes to CFA handler, dryrun messages use loggers
dwest77a Mar 3, 2025
a3930df
Added auto updated for CFA datasets, inline with other dataset metada…
dwest77a Mar 3, 2025
e1e3b57
Updated release notes
dwest77a Mar 3, 2025
b7d49df
Updated release notes
dwest77a Mar 3, 2025
a53e45b
Added auto history update with minor version increment
dwest77a Mar 3, 2025
a7b974b
Added auto update for CFA history
dwest77a Mar 3, 2025
0c24ee6
Updated release notes
dwest77a Mar 3, 2025
d755788
Merge pull request #60 from cedadev/test_fixes
dwest77a Mar 3, 2025
b57e428
Incremented version
dwest77a Mar 3, 2025
5bd9155
Merge branch 'shepard_dev' into deleteme
dwest77a Mar 3, 2025
16 changes: 16 additions & 0 deletions docs/source/core/interactive.rst
@@ -231,6 +231,22 @@ A project can also be transferred between two group instances using the following

Developer note (05/02/25): The transfer project mechanism is currently in alpha deployment, and is known to exhibit inconsistent behaviour when trying to transfer a project to a new uninitialised group. This is an ongoing issue.

6. Completion of a group of projects
------------------------------------

As of padocc v1.3.2, the Group operator includes a ``complete_group`` method, which extracts all created products from all projects in a group. This replaces the previous approach, which involved running the validation phase in a specific way. The method requires a **completion directory** into which all products are copied. Project codes and revisions are applied to the copied products at this stage; inside the pipeline, most products are not referred to by their project codes.

.. code:: python

>>> # With my_group initialised with verbose logging enabled
>>> my_group.complete_group('my_home_dir/completed_datasets')
INFO [group-operation]: Verifying completion directory exists
INFO [group-operation]: Completing 2/2 projects for my-group
INFO [group-operation]: Updated new status: complete - Success
INFO [group-operation]: Updated new status: complete - Success

You can then check inside the ``completed_datasets`` directory to verify that all products are present. For each Kerchunk/Zarr dataset you will also see a ``.nca`` CFA dataset file, which follows the Climate Forecast Aggregation conventions (see https://cedadev.github.io/CFAPyX/ for more details). These files can be opened locally with Xarray.
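The completed products can then be inspected programmatically. Below is a minimal sketch: the ``find_cfa_files`` helper and the directory path are illustrative, and the ``CFA`` engine name is an assumption based on the CFAPyX documentation linked above (CFAPyX must be installed for the Xarray step to work).

```python
from pathlib import Path

def find_cfa_files(completion_dir):
    """Return the CFA (.nca) aggregation files in a completion directory."""
    return sorted(Path(completion_dir).glob("*.nca"))

if __name__ == "__main__":
    # Assumes CFAPyX is installed; it registers the 'CFA' backend for xarray.
    import xarray as xr
    for nca in find_cfa_files("my_home_dir/completed_datasets"):
        ds = xr.open_dataset(nca, engine="CFA")
        print(nca.name, list(ds.data_vars))
```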

Using the ProjectOperation class
================================

1 change: 0 additions & 1 deletion padocc/__init__.py
@@ -4,5 +4,4 @@

from .core import ProjectOperation
from .groups import GroupOperation

from .phases import phase_map
3 changes: 2 additions & 1 deletion padocc/cli.py
@@ -6,8 +6,9 @@

import argparse

from padocc.core.utils import BypassSwitch, get_attribute
from padocc import GroupOperation, phase_map
from padocc.core.utils import BypassSwitch, get_attribute


def get_args():
parser = argparse.ArgumentParser(description='Run a pipeline step for a group of datasets')
15 changes: 3 additions & 12 deletions padocc/core/__init__.py
@@ -2,15 +2,6 @@
__contact__ = "[email protected]"
__copyright__ = "Copyright 2024 United Kingdom Research and Innovation"

from .logs import (
init_logger,
reset_file_handler,
FalseLogger,
LoggedOperation
)

from .utils import (
BypassSwitch
)

from .project import ProjectOperation
from .logs import FalseLogger, LoggedOperation, init_logger, reset_file_handler
from .project import ProjectOperation
from .utils import BypassSwitch
19 changes: 15 additions & 4 deletions padocc/core/errors.py
@@ -3,20 +3,20 @@
__copyright__ = "Copyright 2024 United Kingdom Research and Innovation"

import json
import os
import logging
import os
import traceback

from typing import Optional, Union


def error_handler(
err : Exception,
logger: logging.Logger,
phase: str,
subset_bypass: bool = False,
jobid: Optional[str] = None,
status_fh: Optional[object] = None
):
) -> str:

"""
This function should be used at top-level loops over project codes ONLY -
@@ -25,6 +25,17 @@ def error_handler(
1. Single slurm job failed - raise Error
2. Single serial job failed - raise Error
3. One of a set of tasks failed - print error for that dataset as traceback.

:param err: (Exception) Error raised within some part of the pipeline.

:param logger: (logging.Logger) Logging operator for any messages.

:param phase: (str) The pipeline phase in which the error was raised.

:param subset_bypass: (bool) Skip raising an error if this operation
is part of a sequence.

:param jobid: (str) The ID of the SLURM job if present.

:param status_fh: (object) Padocc Filehandler to update status.
"""

def get_status(tb: list) -> str:
@@ -156,7 +167,7 @@ def __init__(
proj_code: Union[str,None] = None,
groupdir: Union[str,None] = None
) -> None:
self.message = f'Decoding resulted in overflow - received chunk data contains junk (attempted 3 times)'
self.message = 'Decoding resulted in overflow - received chunk data contains junk (attempted 3 times)'
super().__init__(proj_code, groupdir)
if verbose < 1:
self.__class__.__module__ = 'builtins'
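The dispatch logic described in the ``error_handler`` docstring above (raise for a single serial or SLURM job, log and continue when one task in a subset fails) can be sketched as follows. This is a simplified illustration of the pattern, not padocc's actual implementation; the ``handle_error`` name and return value are assumptions.

```python
import logging
import traceback

def handle_error(err, logger, phase, subset_bypass=False):
    """Sketch: raise for standalone jobs, log and continue for subset tasks."""
    status = f"{phase} failed: {type(err).__name__}"
    if subset_bypass:
        # One of a set of tasks failed: record the traceback and keep going.
        logger.error(traceback.format_exc())
        return status
    # A single serial or SLURM job failed: surface the error immediately.
    raise err
```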