
Use geomad instead of hdstats. #207

Open
SpacemanPaul wants to merge 12 commits into develop from use-geomad

Conversation

@SpacemanPaul
Contributor

Use new open source Geomad package instead of hdstats.

Remove dependency on packages.dea.ga.gov.au python package index.

@SpacemanPaul SpacemanPaul requested review from Ariana-B, cbur24, omad and robbibt and removed request for Ariana-B September 5, 2025 01:12
Member

@omad left a comment


Nifty!

@SpacemanPaul
Contributor Author

Tests are hanging due to worker memory blowing out. Not sure what the issue is at this stage.

@codecov

codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.52%. Comparing base (5b85e93) to head (39ab72a).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #207   +/-   ##
========================================
  Coverage    81.52%   81.52%           
========================================
  Files           53       53           
  Lines         4849     4849           
========================================
  Hits          3953     3953           
  Misses         896      896           

☔ View full report in Codecov by Sentry.


@omad
Member

omad commented Sep 5, 2025

That memory limit argument tells dask how much memory it is allowed to use (and maybe also feeds into chunk sizes, I'm not sure).

I think GitHub actions runners have 16GB of RAM, so dask needs to use substantially less to not be killed.

The log messages from dask in the failed runs indicated it had decided to use about 14GB, so it must use a heuristic that assumes it can use most of the available RAM, which is too much in this case.
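As a hedged sketch of the workaround (the helper names, worker counts, and the 8GB budget are illustrative, not values from this PR), worker memory can be pinned explicitly instead of relying on dask's total-RAM heuristic:

```python
# Sketch: split a fixed memory budget across dask workers instead of
# letting dask size workers from total system RAM (its default heuristic).
from dask.distributed import Client, LocalCluster

def per_worker_limit(total_budget_gb: float, n_workers: int) -> str:
    """Hypothetical helper: divide a budget into a dask memory_limit string."""
    return f"{total_budget_gb / n_workers:.1f}GB"

def make_ci_client(n_workers: int = 2, total_budget_gb: float = 8.0) -> Client:
    """Hypothetical helper: build a client whose workers stay inside the budget."""
    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        memory_limit=per_worker_limit(total_budget_gb, n_workers),  # e.g. "4.0GB"
    )
    return Client(cluster)

if __name__ == "__main__":
    client = make_ci_client()
    print(client)
    client.close()
```

With an explicit `memory_limit`, workers spill and pause based on the stated budget rather than on what the runner appears to have.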

@SpacemanPaul
Contributor Author

SpacemanPaul commented Sep 17, 2025

Comments:

I'm having to leave this where it is for now, as I'm off on a much-needed 4 week holiday.

Status

  • Geomad seems to be working hunky dory in odc-algo.
  • integration test failures here appear to be happening BEFORE the execution gets to the geomad code. I think it's got something to do with the dask scheduling of the output COG writing.
  • integration tests have been running against the last released version of odc-stats code (i.e. the code from the previous successfully merged PR), rather than the code of the current PR, making it very difficult to figure out where things broke. I have fixed this in this PR.
  • I have reverted to core's internal (deprecated) to_cog() method, which gives clearer error messages (odc-geo's to_cog just runs out of memory and the container gets killed). With datacube's internal to_cog() I get pickling errors referencing _HLGExprSequence, which I have seen before; it indicates that an internal Dask object (e.g. a Client) is being passed to a delayed dask function, but I haven't been able to spot where this might be happening.
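That pickling symptom can be reproduced in miniature with any unpicklable handle dragged into task state; this sketch uses a hypothetical FakeClient stand-in rather than a real dask Client:

```python
# Sketch of the failure mode: an object holding an unpicklable resource
# (a real dask Client holds sockets and locks) ends up inside task state
# that dask must serialize, and pickling blows up.
import pickle
import threading

class FakeClient:
    """Hypothetical stand-in for a dask Client."""
    def __init__(self):
        self._lock = threading.Lock()  # thread locks cannot be pickled

def bad_payload(client, x):
    # The client rides along inside the result, so it must be pickled too.
    return {"client": client, "value": x + 1}

def good_payload(x):
    # Plain data only: serializes cleanly for delayed/submit.
    return {"value": x + 1}

try:
    pickle.dumps(bad_payload(FakeClient(), 1))
except TypeError as exc:
    print("pickling failed:", exc)

print(pickle.loads(pickle.dumps(good_payload(1))))  # {'value': 2}
```

In real dask code the usual fix is to call dask.distributed.get_client() inside the task, rather than capturing a Client from the enclosing scope.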

There are some unrelated test errors happening due to the switch to numpy 2. These should be relatively straightforward to resolve.
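The thread doesn't identify the specific numpy 2 breakages; one common class of fixes (removed 1.x aliases) looks like this sketch:

```python
# Sketch of typical numpy 2.0 migration fixes: several 1.x aliases
# (np.NaN, np.float_, np.Inf, ...) were removed in 2.0; the spellings
# below work on both numpy 1.x and 2.x.
import numpy as np

arr = np.array([1.0, np.nan, 3.0], dtype=np.float64)  # np.nan, not np.NaN
print(np.nanmean(arr))  # 2.0
print(bool(np.isinf(np.inf)))  # np.inf, not np.Inf
```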

@omad
Member

omad commented Sep 17, 2025

In addition to @SpacemanPaul's suspicion about the different to_cog() dask functions, I'm wondering whether the yxbt_sink and reshape_yxbt features in odc-stats are still necessary.

My understanding is that they were very important a few years ago for scaling efficiently across dask workers, especially when running all-of-time statistical summaries. Those summaries are embarrassingly parallel spatially, but (I think) the dask scheduler at the time was causing major problems in how it split the work up.

In the last year there have been a few significant improvements in the dask/distributed scheduler that should address some or all of these issues, and the custom memory sinks may just be getting in the way.

It may be possible to use a much simpler implementation based on the geomad kernel, with xarray code something like the following, which ran better against one of these new tests:

import geomad  # assumed importable as `geomad`
import xarray as xr

def xr_geomedian(x, dim="time"):
    return xr.apply_ufunc(
        geomad.nangeomedian_pcm,
        x,
        input_core_dims=[[dim]],  # reduce over `dim` rather than hard-coding "time"
        dask="parallelized",
        dask_gufunc_kwargs={"allow_rechunk": True},
    )

# `src` is the loaded Dataset; keep time and band as single chunks per block
src_dataarray = src.to_dataarray(dim="band").chunk({"x": 800, "y": 800, "time": -1, "band": -1})
result = xr_geomedian(src_dataarray)

It also shouldn't be necessary to drop down to low-level .map_blocks() calls, since .apply_ufunc() can do simple dimension reductions like geomad on its own.


I'm also curious as to whether there are custom dask loading functions implemented here in odc-stats that would be better replaced by the code in odc-loader.

@cbur24
Contributor

cbur24 commented Sep 19, 2025

@SpacemanPaul @omad
I ran Landsat 8 geomedians using the code in this branch to see if the memory leak issue you're reporting is replicated outside of the unit testing. On the unstable sandbox image, the landsat geomedian plugin (gm-ls) worked as expected without memory spikes. The outputs also look correct. Maybe this is only an issue with the unit tests and not with the geomedian code per se?

I've pushed the Jupyter notebook and config file that I used for this testing into odc-stats/docs within this branch so you can test for yourselves - we can delete these later once you're happy. ODC and Datacube versions tested:

UPDATE: this was run on xarray=2025.6.1

[image: ODC and Datacube version details]
