PADOCC Package

Padocc (Pipeline to Aggregate Data for Optimal Cloud Capabilities) is a data aggregation pipeline for creating Kerchunk (or alternative) files that represent datasets stored in a variety of original formats. The pipeline currently supports writing JSON or Parquet Kerchunk files for NetCDF/HDF input files. Future development is intended to add support for GeoTIFF, GRIB and possibly Met Office (.pp) files, and to use the Pangeo Rechunker tool to create Zarr stores for Kerchunk-incompatible datasets.
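
To give a rough idea of what a JSON Kerchunk file contains, the sketch below builds a minimal version 1 reference set by hand using only the Python standard library. It does not use Padocc or Kerchunk themselves. The helper name make_reference_set, the source URL, the variable name and the byte ranges are all hypothetical, and a real file would also carry .zarray and .zattrs metadata for each variable.

```python
import json


def make_reference_set(source_url, variable, chunks):
    """Build a minimal Kerchunk-style (version 1) reference dictionary.

    `chunks` maps a Zarr chunk key such as "0.0.0" to an (offset, length)
    byte range inside the file at `source_url`.
    """
    refs = {
        # Zarr group metadata is stored inline as a JSON string.
        ".zgroup": json.dumps({"zarr_format": 2}),
    }
    for chunk_key, (offset, length) in chunks.items():
        # Each chunk is a pointer into the original file: [url, offset, length].
        refs[f"{variable}/{chunk_key}"] = [source_url, offset, length]
    return {"version": 1, "refs": refs}


# Hypothetical source file and byte ranges, for illustration only.
reference = make_reference_set(
    "https://example.org/data/tas_2000.nc",
    "tas",
    {"0.0.0": (4096, 1048576), "1.0.0": (1052672, 1048576)},
)
print(json.dumps(reference, indent=2))
```

The point of the format is that the original NetCDF/HDF bytes are never copied: each chunk entry only records where the data already lives, so a reader can fetch just the byte ranges it needs.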

Example Notebooks at this link

Documentation hosted at this link

Kerchunk Pipeline

Release 1.3.3

Release date: 7th March 2025

See the release notes for details.

This package acknowledges contributions by Matt Brown as a pre-release tester.

Installation

To install this package, clone the repository using git clone. If release v1.3 has not yet been released, also switch to the MigrationOO branch with git checkout MigrationOO.

Then follow the steps below to install the package with the necessary dependencies.

python -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install
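
After the steps above, a quick way to confirm that the package landed in the active virtual environment is to check whether it can be found by the import system. This is a minimal sketch using only the standard library; the import name padocc is an assumption based on the package name, and is_importable is a hypothetical helper, not part of the package.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "padocc" is the assumed import name; adjust it if the package differs.
name = "padocc"
status = "found" if is_importable(name) else "not found"
print(f"{name}: {status} in the current environment")
```

Run this with the Python interpreter from the activated .venv so that the check reflects the environment Poetry installed into.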

Usage

For examples of how to use the GroupOperation and ProjectOperation classes, please refer to the scripts in tests/.