Conversation
Tests are hanging due to worker memory blowing out. Not sure what the issue is at this stage.
Force-pushed from a9177c2 to 34a04c6.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##           develop     #207  +/-  ##
========================================
  Coverage    81.52%   81.52%
========================================
  Files           53       53
  Lines         4849     4849
========================================
  Hits          3953     3953
  Misses         896      896
========================================

☔ View full report in Codecov by Sentry.
That memory limit argument tells dask how much memory it is allowed to use (and maybe to work out chunk sizes, I'm not sure). I think GitHub Actions runners have 16GB of RAM, so dask needs to use substantially less to avoid being killed. The log messages from dask in the failed runs indicated it had decided to use about 14GB, so it must use a heuristic that assumes it can use most of the available RAM, which is too much in this case.
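For illustration, here is a minimal sketch of how an "auto" per-worker budget like the one described above can end up close to the machine's full RAM. This is an assumption about the shape of the heuristic (total system memory divided among workers); dask's actual logic lives in `dask.distributed` and may differ, so treat the function below as hypothetical:

```python
import os


def auto_memory_limit(n_workers: int) -> int:
    """Approximate an 'auto' per-worker memory budget on Linux:
    total physical RAM divided evenly among workers.
    (Hypothetical sketch, not dask's actual implementation.)"""
    total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return total // n_workers


# On a 16GB runner with a single worker this budget is the full 16GB,
# far more than the runner can actually spare for the test process.
# The fix is to pass an explicit limit instead of relying on the
# heuristic, e.g. memory_limit="4GB" when creating the dask cluster.
```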
Force-pushed from 39ab72a to 55a2c8c.
for more information, see https://pre-commit.ci
Comments: I'm having to leave this where it is for now, as I'm off on a much-needed 4-week holiday.

Status: there are some unrelated test errors happening due to the switch to numpy 2. These should be relatively straightforward to resolve.
In addition to @SpacemanPaul's suspicion about the different […]:

My understanding is that the custom memory sinks were very important a few years ago for scaling efficiently across dask workers, especially for running all-of-time statistical summaries. Those summaries are embarrassingly parallel spatially, but (I think) the dask scheduler at the time was causing major problems in how it split the work up. In the last year there have been a few significant improvements in the dask/distributed scheduler that should address some or all of these issues, and the custom memory sinks may just be getting in the way.

It may be possible to use a much simpler implementation, using the geomad kernel with xarray code something like the following, which ran better against one of these new tests: […]

It also shouldn't be necessary to reach as low a level as the `Loading`[…]. I'm also curious as to whether there are custom dask loading functions implemented here in […]
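As a rough illustration of the kind of per-pixel kernel under discussion, here is a hedged sketch: a Weiszfeld-iteration geometric median over one pixel's time series. This is an illustrative stand-in, not the actual geomad kernel; the function name, array shapes, and parameters below are assumptions:

```python
import numpy as np


def geomedian_pixel(spectra, eps=1e-7, maxiter=100):
    """Geometric median of one pixel's time series via Weiszfeld
    iteration. `spectra` has shape (time, band); returns shape (band,).
    Illustrative stand-in for the geomad kernel, not its real code."""
    y = spectra.mean(axis=0)  # start from the per-band mean
    for _ in range(maxiter):
        d = np.linalg.norm(spectra - y, axis=1)
        d = np.maximum(d, eps)  # guard against division by zero
        w = 1.0 / d
        y_new = (spectra * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            return y_new
        y = y_new
    return y


# The reduction is embarrassingly parallel over (y, x): with xarray one
# could map it across a dask-backed array with something like
# xr.apply_ufunc(geomedian_pixel, data,
#                input_core_dims=[["time", "band"]],
#                output_core_dims=[["band"]],
#                vectorize=True, dask="parallelized"),
# leaving chunk scheduling to dask (call shown as a comment only; the
# exact parameters are assumptions, not the code used in this PR).
```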
@SpacemanPaul @omad I've pushed the jupyter notebook and config file that I used for this testing into […]

UPDATE: this was run on […]

Use the new open-source Geomad package instead of hdstats.
Remove the dependency on the packages.dea.ga.gov.au python package index.