
Fix stale CRS metadata after reprojection #237

Open
taufik-shf wants to merge 4 commits into opendatacube:develop from piksel-ina:geomad-production-indonesia

Conversation

@taufik-shf

Problem

While processing data with odc-stats, I noticed the pipeline correctly reprojects data to the target grid, but the CRS metadata reverts to the native CRS downstream. This causes:

  • Inconsistent STAC vs GeoTIFF metadata (STAC shows target CRS, GeoTIFF tags show native CRS)
  • Downstream tools reading incorrect CRS from file headers

Root Cause Analysis

Important

Custom instrumentation for diagnosis
To diagnose this issue, I added custom CRS/grid logging at key pipeline points to track where CRS metadata diverges from the actual coordinates. All log lines below (marked WARNING - CRS ...) are from this custom instrumentation; they are not standard odc-stats output.

The logs track:

  • CRS before/after reprojection
  • Coordinate ranges (to verify grid transformation)
  • CRS metadata at reducer entry/exit
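The instrumentation itself is not part of the PR; a minimal sketch of what such a logging helper could look like is shown below. The logger name, the `.odc.crs` accessor, and the `x`/`y` coordinate names are assumptions modelled on odc-geo conventions, not code from the PR.

```python
import logging

log = logging.getLogger("odc_stats.crs_debug")

def log_crs_state(xx, stage):
    """Log nominal CRS and coordinate ranges at one pipeline stage.

    `xx` stands in for an xarray Dataset exposing `.odc.crs` plus
    `x`/`y` coordinates; `stage` is a label such as "pre_reproject".
    """
    x = xx.x.values
    y = xx.y.values
    # Emit one WARNING line per stage so divergence between the
    # reported CRS and the actual coordinate ranges is easy to spot
    log.warning(
        "CRS %s: crs=%s x=[%s..%s] y=[%s..%s]",
        stage, xx.odc.crs, x[0], x[-1], y[0], y[-1],
    )
```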

Initial observation

[screenshot: bugs_evidence]

The logs reveal the inconsistency:

  • pre_reproject shows native UTM coordinates: x=[315375..373265], y=[43435..-95] (EPSG:32649)
  • post_reproject shows coordinates transformed to the target grid: x=[1.055e+07..1.06e+07], y=[49995..5] (within the EPSG:6933 range)
  • but the CRS metadata still reports crs=EPSG:32649

Note

Identified Issue: The reprojection warps the grid correctly, but the CRS metadata remains stale and propagates downstream.

Fix 1: Assign CRS after reprojection

  • After xr_reproject, I explicitly assign the CRS to match the destination grid

  • This ensures the post-reproject dataset is internally consistent: both coordinates AND CRS metadata reflect the target grid.

[screenshot: fix_01_bugs_02_evidence_2]

  • After applying Fix 1:

    • post_reproject now correctly shows crs=EPSG:6933
    • Coordinates and CRS metadata are properly aligned
    • However, there's still an issue downstream.
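The shape of Fix 1 can be sketched as below. To keep the sketch dependency-free, the reprojection and CRS-assignment helpers are passed in as callables; in odc-stats they would come from odc-geo (xr_reproject and a CRS-assignment helper), and their exact signatures here are assumptions, not the PR's code.

```python
def reproject_consistently(xx, dst_geobox, xr_reproject, assign_crs):
    """Sketch of Fix 1: reproject, then force CRS metadata to match.

    xr_reproject warps the pixels onto the destination grid; without
    the explicit assign_crs step, the buggy path left the output
    advertising the native CRS.
    """
    yy = xr_reproject(xx, dst_geobox)    # coordinates now on dst grid
    yy = assign_crs(yy, dst_geobox.crs)  # Fix 1: metadata now matches
    return yy
```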

Note

Issue: Downstream in the gm plugin - reducer(), the dataset enters with the grid CRS (EPSG:6933) but exits with the native CRS again (EPSG:32649).

Fix 2: Prevent CRS regression in gm reducer

The fix:

  1. Capture the authoritative CRS from .odc.crs
  2. Remove all stale CRS/grid-mapping metadata
  3. Cleanly re-attach the CRS
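The three steps can be illustrated schematically as follows. Plain dicts stand in for xarray attrs; the key names and the helper itself are hypothetical illustrations of the approach, not the PR's actual reducer code.

```python
def scrub_and_reattach_crs(ds_attrs, per_var_attrs, authoritative_crs):
    """Schematic version of the three reducer steps above.

    `authoritative_crs` is assumed to have been captured earlier
    (step 1, from `.odc.crs` in the real code).
    """
    stale_keys = ("crs", "crs_wkt", "grid_mapping", "spatial_ref")
    # Step 2: remove all stale CRS / grid-mapping metadata everywhere
    for attrs in (ds_attrs, *per_var_attrs):
        for key in stale_keys:
            attrs.pop(key, None)
    # Step 3: cleanly re-attach the authoritative CRS
    ds_attrs["crs"] = authoritative_crs
    return ds_attrs
```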

Evidence

[screenshot: bug_fix_evidence]

With both fixes applied:

  • Post-reproject CRS is correct
  • Coordinates match the target grid
  • Reducer maintains CRS throughout
  • No CRS regression
  • STAC and GeoTIFF metadata remain consistent

Other smaller changes

  • Ensure proj:transform is JSON serialisable
Changed: transform=geobox.transform -> transform=list(geobox.transform)

I hit a case where geobox.transform was still an Affine object, which is not JSON serialisable and can break STAC/item serialisation. Converting to list(...) ensures we always write a JSON-safe transform.
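The conversion relies on an Affine being iterable over its six coefficients (a, b, c, d, e, f), so list(...) works for Affine objects and plain sequences alike. A small sketch of the idea, with a hypothetical helper name:

```python
import json

def jsonable_transform(transform):
    """Coerce an affine transform into a JSON-safe list of coefficients."""
    coeffs = list(transform)
    # Fail fast here rather than deep inside STAC item serialisation
    json.dumps(coeffs)
    return coeffs
```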

  • Relax datacube pin and lock odc-stac version

Changed: datacube==1.9.5 to datacube>=1.9.5 and pinned odc-stac==0.4.0.

_stac_fetch.py imports stac2ds from odc-stac. In newer datacube versions (≥1.9.6), stac2ds also exists in datacube.metadata, but as long as we pin odc-stac==0.4.0, the import remains unambiguous and works correctly.

Tested successfully with datacube 1.9.10.

```diff
 click>=8.0.0
 dask
-datacube==1.9.5
+datacube>=1.9.5
```
Contributor

@pjonsson pjonsson Feb 4, 2026


The datacube version is pinned because memory consumption balloons for odc-stats with later versions. I'm not involved in odc-stats so I don't know about the timeline for resolving that.

Author


Thanks for the context! I wasn't aware of the memory issue with later datacube versions.

I made this change because my environment uses datacube 1.9.10, and the strict pin was causing dependency conflicts for me.

Would it be possible to loosen the pin to allow users to manage their own datacube version if needed?

```python
    **extra_args,
)

# Ensure output advertises the destination CRS consistently
```
Contributor


Shouldn't the output from xr_reproject have the right CRS, so this issue should be fixed inside that function instead?

Author


You're right - it should be fixed in xr_reproject().

I'm happy to attempt a fix in odc-geo if you'd prefer, otherwise we could merge this workaround for now and I'll open an issue upstream.

Contributor


Without speaking for this repository, I think it would make sense to try to fix the problems in odc-algo and odc-geo. It sounds like you've found some bugs, and if I'm wrong and the methods are meant to behave the way they currently behave, you're more likely to get that feedback if you make PRs to those repositories.

```python
gm = geomedian_with_mads(xx, **cfg)
gm = gm.rename(self._renames)

# geomedian_with_mads drops spatial_ref; re-attach from input
```
Contributor


Same question here, shouldn't geomedian_with_mads return an object with the right metadata?

Author


Yes, it should be fixed in geomedian_with_mads().

I can try to fix it in odc-algo if you'd prefer, or we can merge this workaround for now until the upstream fix lands. Once it is fixed there, I'll remove the workaround to keep the code clean.

