
Replace PBC handling in feature detection with label function from dask-image#562

Draft
w-k-jones wants to merge 7 commits into tobac-project:RC_v1.6.x from w-k-jones:dask_image_label

Conversation

@w-k-jones
Member

The dask_image.ndmeasure.label function replicates the scipy.ndimage.label function, but adds a wrap_axes keyword that can be used to perform label detection across periodic boundaries (this requires identical logic to labelling regions across tiles). Replacing the current feature detection PBC code with it both simplifies our codebase and starts to integrate more dask features into the tobac pipeline, so that eventually it can be fully dask-enabled. I have set up the label function to replicate the connectivity handling of the scikit-image function it replaces identically, meaning that full connectivity is used rather than square connectivity. However, in future I think we should make this an option and change the default to square (1) connectivity to match segmentation (see #481)
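To illustrate the boundary-wrapping logic, here is a toy sketch using plain scipy (not the tobac implementation): labelling a field where one feature straddles a periodic boundary first produces two labels, which then need merging across the wrapped axis, i.e. the same relabelling step that wrap_axes performs between tiles.

```python
import numpy as np
from scipy import ndimage

# Toy binary field: the points at (0, 0) and (0, 4) form one feature
# that straddles the periodic boundary along axis 1.
mask = np.array(
    [
        [1, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
)

# Full (8-) connectivity, matching the scikit-image behaviour described above.
structure = ndimage.generate_binary_structure(2, 2)
labels, n_features = ndimage.label(mask, structure=structure)
# Plain labelling sees 3 features; the wrapped one is split in two.

# Merge labels that touch across the wrapped axis -- the relabelling
# that wrap_axes applies between adjacent tiles.
for a, b in zip(labels[:, 0], labels[:, -1]):
    if a and b and a != b:
        labels[labels == b] = a

n_merged = len(np.unique(labels)) - 1  # exclude background 0
print(n_features, n_merged)  # 3 2
```

With a dask-image version that includes the new keyword, `dask_image.ndmeasure.label(darr, wrap_axes=(1,))` would handle this merge directly.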

Currently in draft, as I am not sure whether to introduce this on its own or as part of a larger refactor of feature detection.

  • Have you followed our guidelines in CONTRIBUTING.md?
  • Have you self-reviewed your code and corrected any misspellings?
  • Have you written documentation that is easy to understand?
  • Have you written descriptive commit messages?
  • Have you added NumPy docstrings for newly added functions?
  • Have you formatted your code using black?
  • If you have introduced a new functionality, have you added adequate unit tests?
  • Have all tests passed in your local clone?
  • If you have introduced a new functionality, have you added an example notebook?
  • Have you kept your pull request small and limited so that it is easy to review?
  • Have the newest changes from this branch been merged?

@github-actions

github-actions bot commented Feb 5, 2026

Linting results by Pylint:

Your code has been rated at 8.36/10 (previous run: 8.36/10, +0.00)
The linting score is an indicator that reflects how well your code version follows Pylint’s coding standards and quality metrics with respect to the RC_v1.6.x branch.
A decrease usually indicates your new code does not fully meet style guidelines or has potential errors.

@w-k-jones w-k-jones changed the base branch from main to RC_v1.6.x February 6, 2026 07:03
@codecov

codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.46%. Comparing base (fcb7cd3) to head (a4e6957).
⚠️ Report is 12 commits behind head on RC_v1.6.x.

Files with missing lines | Patch % | Lines
tobac/feature_detection.py | 85.71% | 2 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##           RC_v1.6.x     #562      +/-   ##
=============================================
- Coverage      64.84%   64.46%   -0.38%     
=============================================
  Files             27       27              
  Lines           3985     3923      -62     
=============================================
- Hits            2584     2529      -55     
+ Misses          1401     1394       -7     
Flag | Coverage Δ
unittests | 64.46% <85.71%> (-0.38%) ⬇️


@w-k-jones
Member Author

Testing with the low cloud tracking notebook shows a slight slowdown in performance:

Previous: 3.43 s ± 27.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
New: 3.61 s ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I will test on a larger example, but I think this is a reasonable trade-off for simpler code.

@w-k-jones
Member Author

On further investigation with larger datasets, it seems there is a fairly large overhead to using the dask-image label function for moderately large feature detection jobs (where feature detection takes ~10 seconds to 1 minute), but a performance benefit for larger tasks (e.g. multiple minutes). I will investigate a bit more to see whether the number of time steps vs. the spatial array size produces different results. Note that this is using the default dask settings, so without increasing the number of workers etc. or chunking the data.
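For reference, a minimal sketch (with hypothetical array sizes, much smaller than the datasets above) of how the input could be chunked one timestep per chunk, so that dask can parallelise the per-timestep work instead of running on the unchunked defaults:

```python
import numpy as np
import dask.array as da

# Hypothetical (time, y, x) binary mask standing in for the detection field.
mask = np.random.default_rng(0).random((8, 64, 64)) < 0.05

# One chunk per timestep lets the scheduler run timesteps in parallel.
darr = da.from_array(mask, chunks=(1, 64, 64))
print(darr.numblocks)  # (8, 1, 1): one block per timestep
```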

@w-k-jones
Member Author

Some test results:

MCSMIP Obs (geo-ir), 1200x3600 domain, PBC hdim_2 (on HPC):
  • 24 time steps: previous 26.92 s, new 48.26 s
  • 240 time steps: previous 4 min 41.61 s, new 4 min 12.25 s
  • 672 time steps: previous 11 min 17.83 s, new 9 min 29.44 s

EUREC4A LES, 1524x1524 domain, PBC both (on laptop):
  • 13 time steps: previous 1.04 s ± 6.93 ms per loop, new 1.63 s ± 20.5 ms per loop
  • 49 time steps: previous 3.66 s ± 27.1 ms per loop, new 5.84 s ± 71.5 ms per loop
  • 720 time steps: previous 41.8 s ± 528 ms per loop, new 1 min 22 s ± 741 ms per loop

What's puzzling is that the relative performance changes with the number of timesteps, even though the labelling works independently per timestep. Dask continues to be an enigma to me; do you have any ideas on this @freemansw1?

@freemansw1
Member

@w-k-jones I'm certainly not a dask expert (and I'm looking at this relatively quickly), but I'm guessing it has to do with how the dask graph is constructed/how big the dask graph gets. Even if it's operating independently per timestep, unless compute is being run after every timestep, it only (in theory) does it at the end. I've also found that sometimes if you do run compute every timestep, it recomputes old results. I'm not sure why it does that, though.

This is very exciting, though. I wonder how performance scales by number of workers.
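A minimal sketch of the deferred-execution behaviour described above (the per-timestep function is an illustrative stand-in, not tobac code): wrapping each timestep in dask.delayed and computing once at the end builds a single graph that dask schedules together, whereas calling compute inside the loop evaluates each task eagerly.

```python
import dask

@dask.delayed
def process_timestep(t):
    # Stand-in for per-timestep feature detection.
    return t * 2

# Building the task list runs nothing yet; the graph is only evaluated
# when compute() is called, once, over all tasks together.
tasks = [process_timestep(t) for t in range(4)]
results = dask.compute(*tasks)
print(results)  # (0, 2, 4, 6)
```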

@w-k-jones
Member Author

> @w-k-jones I'm certainly not a dask expert (and I'm looking at this relatively quickly), but I'm guessing it has to do with how the dask graph is constructed/how big the dask graph gets. Even if it's operating independently per timestep, unless compute is being run after every timestep, it only (in theory) does it at the end. I've also found that sometimes if you do run compute every timestep, it recomputes old results. I'm not sure why it does that, though.
>
> This is very exciting, though. I wonder how performance scales by number of workers.

Currently I have it running compute immediately at every timestep for the labelling, to avoid having to change any of the surrounding code, and given the array isn't chunked it will likely run slower than the scipy/scikit-image labelling just due to overhead.

From a quick inspection, dask-image should have substitutes for most of the image functions we need for feature detection. I'll see if I can get lazy execution of the per-timestep feature detection working in a simple manner, and see whether running with an actual dask client and changing the number of workers affects things.

@w-k-jones
Member Author

I think the best way forward here is to keep both a "small data" approach based on the current method, for performance on smaller problems, and a "big data" alternative using dask-image that is used when a dask array is passed to feature detection.
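That dispatch could look something like the following sketch (hypothetical function name; the dask path is indicated but not exercised here):

```python
import numpy as np
from scipy import ndimage

def label_features(mask):
    """Hypothetical dispatch: route dask arrays to the lazy dask-image
    path, and everything else to the eager in-memory scipy path."""
    try:
        import dask.array as da

        if isinstance(mask, da.Array):
            from dask_image import ndmeasure

            return ndmeasure.label(mask)  # "big data" path, lazy
    except ImportError:
        pass
    return ndimage.label(mask)  # "small data" path, eager

# In-memory input takes the scipy path: three separate diagonal features
# with the default cross-shaped connectivity.
labels, n = label_features(np.eye(3, dtype=bool))
print(n)  # 3
```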

