Fix stale CRS metadata after reprojection #237
taufik-shf wants to merge 4 commits into opendatacube:develop from
Add a pylint disable comment for too-many-locals at model.py:488. This is unrelated to the PR changes, but pylint was failing on it.
 click>=8.0.0
 dask
-datacube==1.9.5
+datacube>=1.9.5
The datacube version is pinned because memory consumption balloons for odc-stats with later versions. I'm not involved in odc-stats so I don't know about the timeline for resolving that.
Thanks for the context! I wasn't aware of the memory issue with later datacube versions.
I made this change because my environment uses datacube 1.9.10, and the strict pin was causing dependency conflicts for me.
Would it be possible to loosen the pin to allow users to manage their own datacube version if needed?
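As a minimal sketch of why datacube 1.9.10 clashes with the exact pin but satisfies the loosened one: the helpers `parse_release` and `satisfies` below are hypothetical, handle only plain numeric versions with `==` or `>=`, and are not how pip resolves dependencies (real tooling uses `packaging.specifiers.SpecifierSet`).

```python
def parse_release(version: str) -> tuple[int, ...]:
    # Plain numeric releases only (e.g. "1.9.10"); no pre/post/dev tags.
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a single '==' or '>=' requirement (hypothetical helper)."""
    if spec.startswith("=="):
        return parse_release(version) == parse_release(spec[2:])
    if spec.startswith(">="):
        return parse_release(version) >= parse_release(spec[2:])
    raise ValueError(f"unsupported specifier: {spec}")


print(satisfies("1.9.10", "==1.9.5"))  # False: the exact pin rejects 1.9.10
print(satisfies("1.9.10", ">=1.9.5"))  # True: the loosened pin accepts it
```

Comparing the raw strings would get this wrong: `"1.9.10" < "1.9.5"` is true lexicographically, which is why the sketch compares integer tuples.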
 **extra_args,
 )

+# Ensure output advertises the destination CRS consistently
Shouldn't the output from xr_reproject already have the right CRS? If so, this issue should be fixed inside that function instead.
You're right, it should be fixed in xr_reproject().
I'm happy to attempt a fix in odc-geo if you'd prefer, otherwise we could merge this workaround for now and I'll open an issue upstream.
Without speaking for this repository, I think it would make sense to try to fix the problems in odc-algo and odc-geo. It sounds like you've found some bugs, and if I'm wrong and the methods are meant to behave the way they currently behave, you're more likely to get that feedback if you make PRs to those repositories.
 gm = geomedian_with_mads(xx, **cfg)
 gm = gm.rename(self._renames)

+# geomedian_with_mads drops spatial_ref; re-attach from input
Same question here: shouldn't geomedian_with_mads return an object with the right metadata?
Yes, it should be fixed in geomedian_with_mads().
I can try to fix it in odc-algo if you'd prefer, or we can merge this workaround for now until it is fixed upstream. Once it is, I'll remove the workaround to keep the code clean.
Problem
While processing data with odc-stats, I noticed that the pipeline correctly reprojects data to the target grid, but the CRS metadata reverts to the native CRS downstream. This causes:
Root Cause Analysis
Important
Custom instrumentation for diagnosis
To diagnose this issue, I added custom CRS/grid logging at key pipeline points to track where the CRS metadata diverges from the actual coordinates. All log lines below (marked WARNING - CRS ...) come from this custom instrumentation; they are not standard odc-stats output.
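The instrumentation code isn't shown here. A minimal sketch of this kind of stage-by-stage CRS logging, assuming a toy `Grid` record in place of a real xarray dataset; `Grid`, `log_crs` and the `crs_debug` logger name are hypothetical, and the message format is illustrative rather than the exact format used for diagnosis.

```python
import logging
from dataclasses import dataclass

# Hypothetical logger name for the debugging output.
logger = logging.getLogger("crs_debug")


@dataclass(frozen=True)
class Grid:
    """Toy stand-in for a dataset: a CRS label plus x/y coordinate extents."""

    crs: str
    x: tuple[float, float]
    y: tuple[float, float]


def log_crs(stage: str, grid: Grid) -> str:
    """Log the CRS label next to the coordinate extent at one pipeline stage."""
    msg = (
        f"CRS {stage}: crs={grid.crs} "
        f"x=[{grid.x[0]:g}..{grid.x[1]:g}], y=[{grid.y[0]:g}..{grid.y[1]:g}]"
    )
    # WARNING level so the lines stand out from regular INFO output.
    logger.warning(msg)
    return msg


# Calls like this before and after reprojection make a stale CRS visible:
# the extent changes between stages, but the CRS label does not.
log_crs("pre_reproject", Grid("EPSG:32649", (315375, 373265), (43435, -95)))
```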
The logs track:
Initial observation
The logs reveal the inconsistency:
pre_reproject shows native UTM coordinates: x=[315375..373265], y=[43435..-95] -> EPSG:32649
post_reproject shows coordinates transformed to the target grid: x=[1.055e+07..1.06e+07], y=[49995..5] -> EPSG:6933 range
Note
Identified Issue: The reprojection warps the grid correctly, but the CRS metadata remains stale, which affects downstream processing.
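The mismatch can be made concrete with a rough sanity check. A minimal sketch, assuming only WGS 84 / UTM codes (EPSG:32601 to 32660 north, EPSG:32701 to 32760 south) need checking; the helper names and the loose 100 km to 900 km easting bound are assumptions for illustration, not something odc-stats does.

```python
def is_wgs84_utm(crs: str) -> bool:
    """True for EPSG:326xx (UTM north) and EPSG:327xx (UTM south) zone codes."""
    code = int(crs.split(":")[1])
    return 32601 <= code <= 32660 or 32701 <= code <= 32760


def crs_matches_x_extent(crs: str, x_min: float, x_max: float) -> bool:
    """Heuristic check that a UTM CRS label fits the x coordinates it labels."""
    if not is_wgs84_utm(crs):
        return True  # this sketch has no opinion about other CRSs
    # UTM eastings sit within a few hundred km of the 500,000 m false easting;
    # 100 km to 900 km is a deliberately loose bound (assumption).
    return 100_000 <= x_min <= x_max <= 900_000


# x extents from the logs above: native UTM, then after reprojection.
print(crs_matches_x_extent("EPSG:32649", 315375, 373265))  # True
print(crs_matches_x_extent("EPSG:32649", 1.055e7, 1.06e7))  # False: stale label
```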
Fix 1: Assign CRS after reprojection
After xr_reproject, I explicitly assign the CRS to match the destination grid. This ensures the post-reproject dataset is internally consistent: both the coordinates AND the CRS metadata reflect the target grid.
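The code for this step isn't reproduced here. As a toy sketch of the pattern, assuming a simplified frozen `Raster` record rather than real xarray/odc-geo objects: `buggy_reproject` models the reported behaviour (coordinates move, the CRS label does not) and `reproject_and_assign_crs` applies the explicit assignment. All three names are hypothetical. On real xarray objects, odc-geo's `.odc.assign_crs(...)` accessor is one way to set the CRS; check its documentation for the exact signature.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Raster:
    """Toy stand-in for a reprojected xarray object: CRS label plus x extent."""

    crs: str
    x: tuple[float, float]


def buggy_reproject(src: Raster, dst_crs: str, dst_x: tuple[float, float]) -> Raster:
    # Models the reported behaviour: coordinates move to the destination grid,
    # but the CRS label is carried over unchanged from the source
    # (dst_crs is deliberately ignored here).
    return replace(src, x=dst_x)


def reproject_and_assign_crs(src: Raster, dst_crs: str, dst_x: tuple[float, float]) -> Raster:
    out = buggy_reproject(src, dst_crs, dst_x)
    # Fix 1: make the metadata agree with the coordinates by assigning
    # the destination CRS explicitly.
    return replace(out, crs=dst_crs)


src = Raster("EPSG:32649", (315375, 373265))
print(buggy_reproject(src, "EPSG:6933", (1.055e7, 1.06e7)).crs)           # EPSG:32649
print(reproject_and_assign_crs(src, "EPSG:6933", (1.055e7, 1.06e7)).crs)  # EPSG:6933
```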
After applying Fix 1:
post_reproject now correctly shows crs=EPSG:6933
Note
Issue: Downstream, in the gm plugin's reducer(), the dataset enters with the grid CRS (EPSG:6933) but exits with the native CRS (EPSG:32649) again.
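A toy sketch of this regression and of the re-attach-from-input workaround named in the diff comment above. It models the dropped spatial_ref as `crs=None`, which is an assumption about how the loss shows up; `Raster`, `lossy_reducer` and `reduce_keeping_crs` are hypothetical stand-ins for the gm plugin's reducer and geomedian_with_mads, not their real code.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Raster:
    """Toy stand-in for a dataset: CRS label (None = dropped) plus pixel values."""

    crs: Optional[str]
    values: tuple[float, ...]


def lossy_reducer(ds: Raster) -> Raster:
    # Models a reducer that computes its result but does not carry the CRS through.
    return Raster(crs=None, values=(sum(ds.values) / len(ds.values),))


def reduce_keeping_crs(ds: Raster, reducer: Callable[[Raster], Raster]) -> Raster:
    out = reducer(ds)
    # Workaround: if the CRS was dropped or changed, re-attach it from the input.
    if out.crs != ds.crs:
        out = replace(out, crs=ds.crs)
    return out


inp = Raster("EPSG:6933", (1.0, 2.0, 3.0))
print(lossy_reducer(inp).crs)                      # None
print(reduce_keeping_crs(inp, lossy_reducer).crs)  # EPSG:6933
```

Comparing against the input catches both a dropped CRS and a wrong one; once the upstream functions preserve the CRS, the wrapper changes nothing and can be removed.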
Fix 2: Prevent CRS regression in gm reducer
The fix:
.odc.crs
Evidence
With both fixes applied:
Other smaller changes
proj:transform is JSON serialisable
Changed: transform=geobox.transform -> transform=list(geobox.transform)
I hit a case where geobox.transform was still an Affine object, which is not JSON serialisable and can break STAC/item serialisation. Converting it with list(...) ensures we always write a JSON-safe transform.
Changed: datacube==1.9.5 to datacube>=1.9.5, and pinned odc-stac==0.4.0.
_stac_fetch.py imports stac2ds from odc-stac. In newer datacube versions (≥1.9.6), stac2ds also exists in datacube.metadata, but as long as we pin odc-stac==0.4.0, the import remains unambiguous and works correctly.
Tested successfully with datacube 1.9.10.
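The proj:transform change can be illustrated with a minimal sketch. `TransformLike` is a hypothetical stand-in for a transform object the serialiser cannot handle (it is not the affine library's class), `is_json_serialisable` is a hypothetical helper, and the coefficients are placeholder values.

```python
import json


class TransformLike:
    """Hypothetical transform object: iterable over its coefficients,
    but unknown to the standard json encoder."""

    def __init__(self, *coeffs: float) -> None:
        self._coeffs = coeffs

    def __iter__(self):
        return iter(self._coeffs)


def is_json_serialisable(obj) -> bool:
    try:
        json.dumps(obj)
    except TypeError:
        return False
    return True


transform = TransformLike(10.0, 0.0, 0.0, 0.0, -10.0, 0.0)  # placeholder values
print(is_json_serialisable({"proj:transform": transform}))        # False
print(is_json_serialisable({"proj:transform": list(transform)}))  # True
```

list(...) works here because the object is iterable, and a plain list of floats is what a JSON array field such as proj:transform can hold.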