#122 Correction Package Refactor — Complete Redesign #133

Draft
mmaclay wants to merge 8 commits into main from 122-refactor-correction-package-epic

Conversation

Collaborator

@mmaclay mmaclay commented Mar 13, 2026

This is the PR for epic #122; it will track changes from the sub-issues.

Correction Package Refactor — Complete Redesign

Top-level tracking issue for a complete refactor and architectural redesign of curryer.correction.

Goals

  • Move to Pydantic config models; remove ad-hoc dataclasses
  • Remove user-supplied loader protocols; internalize data loading as config-driven
  • Split correction.py monolith into focused, testable modules
  • Add key deliverables: verification mode & GCP regridding tool
  • Lay groundwork for future CurryerConfig unifying all of curryer
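
To make the first two goals concrete, here is a minimal sketch of a Pydantic config with typed sub-models. The class names CorrectionConfig and DataConfig appear in this PR's commit log; SearchConfig and every field name below are hypothetical and only illustrate the pattern, not the actual curryer.correction models.

```python
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class DataConfig(BaseModel):
    """Where the pipeline reads its inputs (field names are hypothetical)."""

    telemetry_file: Path
    science_file: Path
    time_field: str = "time"


class SearchConfig(BaseModel):
    """Parameter-search settings (hypothetical sub-model and fields)."""

    n_iterations: int = Field(default=10, gt=0)  # must be a positive integer
    seed: Optional[int] = None


class CorrectionConfig(BaseModel):
    """Top-level config composed of typed sub-models."""

    data: DataConfig
    search: SearchConfig = Field(default_factory=SearchConfig)


# Plain dicts (for example, parsed from JSON) are coerced into typed models.
raw = {
    "data": {"telemetry_file": "tlm.nc", "science_file": "sci.nc"},
    "search": {"n_iterations": 25},
}
config = CorrectionConfig(**raw)

# Invalid values are rejected at construction time, not deep inside the pipeline.
try:
    CorrectionConfig(**{"data": raw["data"], "search": {"n_iterations": 0}})
    rejected = False
except ValidationError:
    rejected = True
```

With this shape, the library can load files itself from the paths in DataConfig, which is one way the user-supplied loader protocols could become unnecessary.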

PR Groupings & Sub-Issues

PR 1: Foundation — Naming & Module Split

  • Rename GeolocationConfig disambiguations & error_stats.py
  • Break up correction.py monolith into focused modules

PR 2: Config Redesign — Pydantic, Internalize Loading, Search Strategies

  • Introduce Pydantic CorrectionConfig with typed sub-models
  • Remove loader protocols — internalize data loading
  • Add SearchStrategy enum for deterministic parameter sweeps
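
As a rough illustration of the SearchStrategy idea, the sketch below generates parameter sets deterministically and refuses to materialize an oversized grid. Only the SearchStrategy name, the GRID_SEARCH member, and the max_grid_sets limit come from this PR's commit log; the RANDOM member, the generate_parameter_sets function, its signature, and the parameter names are assumptions made for the example.

```python
import itertools
import math
import random
from enum import Enum


class SearchStrategy(Enum):
    GRID_SEARCH = "grid_search"  # named in the commit log
    RANDOM = "random"            # hypothetical second member, for contrast


def generate_parameter_sets(candidates, strategy, *, n_random=5, seed=0,
                            max_grid_sets=1000):
    """Return a list of {parameter name: value} dicts (hypothetical helper)."""
    names = sorted(candidates)  # fixed ordering keeps the sweep reproducible
    if strategy is SearchStrategy.GRID_SEARCH:
        # Check the grid size before building it, so a huge grid fails fast.
        total = math.prod(len(candidates[name]) for name in names)
        if total > max_grid_sets:
            raise ValueError(
                f"GRID_SEARCH would create {total} sets; limit is {max_grid_sets}"
            )
        grids = itertools.product(*(candidates[name] for name in names))
        return [dict(zip(names, combo)) for combo in grids]
    if strategy is SearchStrategy.RANDOM:
        rng = random.Random(seed)  # seeded, so repeated runs give the same sets
        return [
            {name: rng.choice(candidates[name]) for name in names}
            for _ in range(n_random)
        ]
    raise ValueError(f"Unsupported strategy: {strategy!r}")


# Hypothetical parameters: 3 x 3 candidate values give 9 grid sets.
candidates = {
    "roll_arcsec": [-5.0, 0.0, 5.0],
    "pitch_arcsec": [-5.0, 0.0, 5.0],
}
grid = generate_parameter_sets(candidates, SearchStrategy.GRID_SEARCH)

try:
    generate_parameter_sets(candidates, SearchStrategy.GRID_SEARCH, max_grid_sets=8)
    too_big_rejected = False
except ValueError:
    too_big_rejected = True
```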

PR 3: New Features — Verification & GCP Regridding

  • Add verification module
  • Add gcp_regrid module

PR 4: Test Cleanup & Future Planning

  • Restructure test_correction/
  • Tracking: extend pydantic config pattern to all of curryer

Dependency Graph

PR 1 (Foundation)
├─ Renames (unblocks everything)
└─ Module split (unblocks everything)
    │
    ▼
PR 2 (Config & Loading)
├─ Pydantic config (unblocks loader removal + search strategies)
├─ Remove protocols (depends on pydantic config)
└─ Search strategies (depends on pydantic config)
    │
    ▼
PR 3 (Features) — can start in parallel with PR 2 by stubbing config
├─ Verification module
└─ GCP regridding module
    │
    ▼
PR 4 (Cleanup)
├─ Test restructuring
└─ Future config tracking
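
The ordering implied by the graph above can be checked mechanically. The sketch below encodes the sub-issues as a dependency map and asks the standard library for a valid work order. The node names are shorthand invented for this example, and the edges between PRs are an approximation: the note that PR 3 can start in parallel by stubbing config is not modeled.

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its value set (shorthand names, assumed).
depends_on = {
    "renames": set(),
    "module_split": set(),
    "pydantic_config": {"renames", "module_split"},
    "remove_protocols": {"pydantic_config"},
    "search_strategies": {"pydantic_config"},
    "verification": {"remove_protocols", "search_strategies"},
    "gcp_regrid": {"remove_protocols", "search_strategies"},
    "test_restructure": {"verification", "gcp_regrid"},
    "future_config_tracking": {"verification", "gcp_regrid"},
}

# static_order() raises graphlib.CycleError if the dependencies are circular.
order = list(TopologicalSorter(depends_on).static_order())
for step, name in enumerate(order, start=1):
    print(step, name)
```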

Context

  • No external users of the correction package yet — breaking changes are acceptable
  • Pydantic introduced here will serve as the prototype for a future curryer-wide config system
  • CLARREO-specific code stays in tests/examples only; main library must be mission-agnostic

@mmaclay mmaclay linked an issue Mar 13, 2026 that may be closed by this pull request
9 tasks

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 80.80761% with 827 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.52%. Comparing base (996cc37) to head (d6bb926).

Files with missing lines                              Patch %   Lines
curryer/correction/pipeline.py                        34.12%    220 Missing and 29 partials ⚠️
curryer/correction/parameters.py                      61.79%    76 Missing and 26 partials ⚠️
curryer/correction/image_io.py                        66.77%    78 Missing and 21 partials ⚠️
curryer/correction/regrid.py                          67.52%    64 Missing and 12 partials ⚠️
curryer/correction/config.py                          78.27%    36 Missing and 27 partials ⚠️
curryer/correction/kernel_ops.py                      51.57%    39 Missing and 7 partials ⚠️
tests/test_correction/clarreo/_pipeline_helpers.py    65.85%    39 Missing and 3 partials ⚠️
tests/test_correction/_synthetic_helpers.py           61.25%    30 Missing and 1 partial ⚠️
tests/test_correction/test_pipeline.py                72.22%    25 Missing ⚠️
curryer/correction/results_io.py                      84.66%    11 Missing and 12 partials ⚠️
... and 13 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
+ Coverage   73.62%   76.52%   +2.89%     
==========================================
  Files          67       90      +23     
  Lines       10655    12303    +1648     
  Branches     1204     1331     +127     
==========================================
+ Hits         7845     9415    +1570     
- Misses       2343     2388      +45     
- Partials      467      500      +33     
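
The figures in this report are internally consistent, which the short check below reproduces. Two things are inferred from the numbers rather than taken from Codecov documentation: percentages appear to be truncated (not rounded) to two decimals, and the coverage delta appears to be computed from the exact ratios before truncation. The patch size of 4309 lines is likewise derived here and is not stated in the report.

```python
import math
from fractions import Fraction

base = {"lines": 10655, "hits": 7845, "misses": 2343, "partials": 467}
head = {"lines": 12303, "hits": 9415, "misses": 2388, "partials": 500}


def pct_truncated(ratio):
    """Convert an exact ratio to a percentage truncated to two decimals."""
    return math.floor(ratio * 10000) / 100


# Every line is a hit, a miss, or a partial.
for report in (base, head):
    assert report["hits"] + report["misses"] + report["partials"] == report["lines"]

base_ratio = Fraction(base["hits"], base["lines"])
head_ratio = Fraction(head["hits"], head["lines"])

base_pct = pct_truncated(base_ratio)               # 73.62
head_pct = pct_truncated(head_ratio)               # 76.52
delta_pct = pct_truncated(head_ratio - base_ratio)  # 2.89

# Patch coverage: 827 uncovered lines at 80.80761% implies the patch size.
uncovered = 827
patch_fraction = Fraction("80.80761") / 100
patch_lines = round(uncovered / (1 - patch_fraction))  # 4309
patch_ratio = Fraction(patch_lines - uncovered, patch_lines)

print(base_pct, head_pct, delta_pct, patch_lines)
```

Note that subtracting the two truncated percentages would give 2.90, so the reported +2.89% only matches when the delta is taken from the exact ratios.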


Copilot AI and others added 8 commits April 9, 2026 11:14
…py (#132)

* Initial plan
* Rename GeolocationConfig classes and geolocation_error_stats.py
* Convert Google-style docstrings to numpy style in error_stats.py
* ruff
* breaking up modules

* correction shim for backwards compatibility

* configuration refactor

* descope pipeline to shorten

* show exports in init

* type fix
…135)

* add pydantic

* Introduce Pydantic models for configuration with typed sub-models

* Add unit tests for Pydantic-based correction configuration models

* Refactor: Remove legacy alias handling and update parameter access in configuration
* Refactor dataio module: Update documentation and rename data-loader protocol types

* Refactor CorrectionConfig: Simplify data loading process and enhance validation

* Refactor clarreo_data_loaders: Remove loader protocols and transition to config-driven data loading

* Refactor config: Introduce DataConfig for config-driven data loading and remove legacy loader protocols

* Refactor correction.py: Remove legacy loader functions and integrate DataConfig for improved data handling

* Refactor dataio.py: Remove loader protocols and update documentation for validation helpers

* Refactor image_match.py: Remove ImageMatchingFunc protocol and update output validation

* Refactor image_match.py: Remove ImageMatchingFunc protocol and update output validation

* Refactor pipeline.py: Remove mission-specific loader functions and implement internal file loading for telemetry and science data

* Refactor test_config.py: Add tests for DataConfig and remove legacy loader checks

* Refactor test_correction.py: Replace loader functions with DataConfig for file-based loading

* Refactor test_pairing.py: Remove redundant validation tests for pairing output

* Add CLARREO preprocessing script for telemetry and science data

* Refactor clarreo_data_loaders.py: Remove GCPLoader protocol and implement telemetry and science data loading functions

* load instead of open

* Refactor config.py: Remove GCP-related fields and clarify time field documentation

* Refactor correction.py: Add _resolve_gcp_pairs function to enhance data processing

* Refactor pipeline.py: Add _resolve_gcp_pairs function for GCP key validation

* Refactor test_config.py: Remove GCP-related assertions and simplify DataConfig tests

* Refactor test_correction.py: Remove 'corrected_timestamp' field from DataConfig instances
…137)

* Add SearchStrategy enum for deterministic parameter sweeps

* Add SearchStrategy enum and validation for correction analysis

* Add support for multiple search strategies in parameter set generation

* Add unit tests for parameter-set generation strategies

* Refactor search strategy validation to improve error messaging consistency

* Remove unused conversion functions for sigma to radians and seconds

* Refactor parameter set comparison in tests for consistency and clarity

* check parameter datatype

* Fix formatting in parameters.py for improved readability

* Update curryer/correction/parameters.py

* Update curryer/correction/parameters.py

* Update tests/test_correction/test_parameters.py

* Update tests/test_correction/test_parameters.py

* Add max_grid_sets parameter to limit GRID_SEARCH materialization

* Add search strategy enum for deterministic parameter sweeps and enforce max_grid_sets limit

* Add tests for max_grid_sets enforcement in GRID_SEARCH strategy
* cherry-pick regridding files from MM-104

* add regrid module and update imports in __init__.py

* add RegridConfig model with validation for GCP chip regridding parameters

* deprecate MATLAB file loading utilities in image_match; redirect to curryer.correction.image_io

* update import for load_image_grid_from_mat to use image_io module

* update import for integrated_image_match to use image_io module

* apply rng seed

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* remove cubic

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* check interpolation method

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* user helper for check point in cell

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor test cases to use default_rng for random data generation

* fix: add missing newline for code readability

* test gcp NetCDF and HDF5 support

* refactor: remove unused tolerance variable in cell check

* Update curryer/correction/data_structures.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* vectorize regard

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* relax tolerance on grid boundaries

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* valid pixels only

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor: improve documentation and streamline code in GCP regridding

* fix: update history attribute formatting in image_io.py

* docs: add gcp_regridding.md to contents

* fix: improve boundary handling in regrid.py to prevent NaN fill values

* feat: add example scripts for regridding GCP chips to NetCDF format

* fix: update variable names in image_io.py for consistency with GCP standards

* fix: enhance variable loading in image_io.py for compatibility with multiple naming conventions

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…rection loop (#138)

* add verification module for geolocation compliance checks

* add verification module for geolocation compliance checks

* add unit tests for verification module and its components

* handle HDF5 file loading in image_io.py when HDF4 library is unavailable

* improve HDF file loading in image_io.py to handle errors and fallback between HDF4 and HDF5

* add validation function and update verify parameters to include optional work_dir

* update verification tests to include work_dir parameter in verify function calls

* add minimal example for production verification workflow on geolocated observations

* add example script for weekly verification of geolocated observations

* refactor verification module to enhance key attribute handling and improve dataset aggregation logic

* refactor verification module to enhance key attribute handling and improve dataset aggregation logic

* enhance verification module to improve JSON serialization and summary output formatting
…for CLARREO (#146)

* remove tests

* add integration tests package

* refactor pytest configuration for test_correction to improve maintainability

* refactor test_dataio.py to use pytest and improve test structure for maintainability

* refactor test_image_match.py to use pytest and improve test structure for maintainability

* refactor test_pairing.py to use pytest and improve test structure for maintainability

* add __init__.py for CLARREO-specific correction tests

* add image-matching and pipeline runner helpers for CLARREO integration tests

* add synthetic data generation helpers for testing pipeline and e2e scenarios

* add unit tests for kernel operations in test_kernel_ops.py

* add tests for pipeline functions in test_pipeline.py

* add tests for results_io functions in test_results_io.py

* copy config

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* copy config

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/test_correction/clarreo/_image_match_helpers.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix root_dir duplication and clarreo_cfg mutation in test files

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/6ba9aa43-35f1-476a-a4dc-ad8eae24511b

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Strengthen test_generate_clarreo_config_json and fix JSON round-trip bugs in load_config_from_json

Agent-Logs-Url: https://github.com/lasp/curryer/sessions/bbf5eb75-f1b0-46f8-af24-f2758fd30858

Co-authored-by: mmaclay <21048535+mmaclay@users.noreply.github.com>

* Refactor assertions in test_clarreo_config.py for clarity and maintainability; streamline field retrieval in config.py

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
@mmaclay mmaclay force-pushed the 122-refactor-correction-package-epic branch from 0a09d1c to d6bb926 Compare April 9, 2026 17:15

Development

Successfully merging this pull request may close these issues.

[EPIC] Correction Package Refactor / Redesign

2 participants