
Chunkwise image loader #279

Open

lucas-diedrich wants to merge 11 commits into main
Conversation

@lucas-diedrich commented Feb 17, 2025

Description

This PR addresses the challenge that the currently implemented and planned image loaders require loading imaging data entirely into memory, typically as NumPy arrays. Given the large size of microscopy datasets, this is not always feasible.

To mitigate this issue, and as discussed with @LucaMarconato, this PR aims to introduce a generalizable approach for reading large microscopy files in chunks, enabling efficient handling of data that does not fit into memory.

Some related discussions.

Strategy

In this PR, we focus on .tiff images, as implemented in the _tiff_to_chunks function.

  1. Get a lazy representation of the image via a suitable reader function (here: tifffile.memmap).
  2. Pre-define chunks that fit into memory, based on the dimensions of the image (_compute_chunks).
  3. Load the small chunks via a custom reader function and wrap them as dask.array blocks, which remain memory-mapped and avoid memory overflow (_read_chunks).
  4. Reassemble the chunks into a single dask.array (via dask.array.block).
  5. Parse the result to Image2DModel.
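The steps above can be sketched roughly as follows. The names here are illustrative, not the PR's actual helpers, and np.block stands in for the lazy dask.array.block assembly so the sketch runs eagerly:

```python
# Minimal NumPy sketch of steps 2-4. In the PR, each chunk is read lazily
# and joined with dask.array.block; np.block stands in here.
import numpy as np

def compute_chunks(shape, chunk_size):
    """Step 2: (y, x, height, width) specs tiling an image of `shape`."""
    ys = range(0, shape[0], chunk_size[0])
    xs = range(0, shape[1], chunk_size[1])
    return [
        (y, x, min(chunk_size[0], shape[0] - y), min(chunk_size[1], shape[1] - x))
        for y in ys for x in xs
    ]

def read_and_assemble(lazy_image, chunk_size):
    """Steps 3-4: read each chunk, then reassemble row by row."""
    rows = []
    for y in range(0, lazy_image.shape[0], chunk_size[0]):
        h = min(chunk_size[0], lazy_image.shape[0] - y)
        row = [
            lazy_image[y:y + h, x:x + min(chunk_size[1], lazy_image.shape[1] - x)]
            for x in range(0, lazy_image.shape[1], chunk_size[1])
        ]
        rows.append(row)
    return np.block(rows)

image = np.arange(35, dtype=np.uint8).reshape(5, 7)
assert len(compute_chunks(image.shape, (2, 3))) == 9  # 3 rows x 3 cols of chunks
assert (read_and_assemble(image, (2, 3)) == image).all()
```

Edge chunks are handled by the min(...) clamps, matching the "remainder of chunk size and dimensions" behavior described in the docstring.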

The strategy is implemented in

  • src/spatialdata_io/readers/generic.py and
  • src/spatialdata_io/readers/_utils/_image.py

Future extensions

The strategy can be implemented for any image type, as long as it is possible to

  1. obtain a lazy image-data loader, and
  2. define a custom reader function.

We have implemented similar readers for openslide-compatible whole slide images and the Carl-Zeiss microscopy format.
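As a sketch of what requirements (1) and (2) amount to in practice: the chunk reader is just a function mapping a chunk spec to a NumPy array. The read_region call mentioned in the comment is from the openslide-python API; the sliceable dummy handle keeps the example runnable without openslide installed (all names here are hypothetical, not spatialdata-io API):

```python
# Illustrative only: requirement (2) as a chunk-reading closure over a
# lazy handle (requirement 1). For OpenSlide the body would instead be
# np.asarray(handle.read_region((x, y), 0, (w, h))).
import numpy as np

def make_chunk_reader(handle):
    def read_chunk(y, x, h, w):
        return np.asarray(handle[y:y + h, x:x + w])
    return read_chunk

handle = np.arange(16).reshape(4, 4)   # dummy lazy handle
read_chunk = make_chunk_reader(handle)
assert (read_chunk(1, 1, 2, 2) == handle[1:3, 1:3]).all()
```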

@lucas-diedrich lucas-diedrich marked this pull request as draft February 17, 2025 17:03
@codecov-commenter commented Feb 17, 2025

Codecov Report

Attention: Patch coverage is 97.91667% with 1 line in your changes missing coverage. Please review.

Project coverage is 50.82%. Comparing base (296d9a5) to head (b7e5874).
Report is 135 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_io/readers/generic.py 96.29% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #279       +/-   ##
===========================================
+ Coverage   39.16%   50.82%   +11.66%     
===========================================
  Files          26       27        +1     
  Lines        2663     2713       +50     
===========================================
+ Hits         1043     1379      +336     
+ Misses       1620     1334      -286     
Files with missing lines Coverage Δ
src/spatialdata_io/readers/_utils/_image.py 100.00% <100.00%> (ø)
src/spatialdata_io/readers/generic.py 91.52% <96.29%> (+2.95%) ⬆️

... and 12 files with indirect coverage changes

@lucas-diedrich lucas-diedrich marked this pull request as ready for review March 21, 2025 15:44
Comment on lines +22 to +35
def _compute_chunks(
dimensions: tuple[int, int],
chunk_size: tuple[int, int],
min_coordinates: tuple[int, int] = (0, 0),
) -> NDArray[np.int_]:
"""Create all chunk specs for a given image and chunk size.

Creates specifications (x, y, width, height) with (x, y) being the upper left corner
of chunks of size chunk_size. Chunks at the edges correspond to the remainder of
chunk size and dimensions

Parameters
----------
dimensions : tuple[int, int]
@melonora commented Mar 24, 2025
Suggested change
def _compute_chunks(
dimensions: tuple[int, int],
chunk_size: tuple[int, int],
min_coordinates: tuple[int, int] = (0, 0),
) -> NDArray[np.int_]:
"""Create all chunk specs for a given image and chunk size.
Creates specifications (x, y, width, height) with (x, y) being the upper left corner
of chunks of size chunk_size. Chunks at the edges correspond to the remainder of
chunk size and dimensions
Parameters
----------
dimensions : tuple[int, int]
def _compute_chunks(
shape: tuple[int, int],
chunk_size: tuple[int, int],
min_coordinates: tuple[int, int] = (0, 0),
) -> NDArray[np.int_]:
"""Create all chunk specs for a given image and chunk size.
Creates specifications (x, y, width, height) with (x, y) being the upper left corner
of chunks of size chunk_size. Chunks at the edges correspond to the remainder of
chunk size and dimensions
Parameters
----------
shape : tuple[int, int]

Just to stick to standard numpy / array api conventions:) Dimensions could be interpreted as TCZYX.

Comment on lines +13 to +17
positions = np.arange(min_coord, min_coord + size, chunk)
lengths = np.full_like(positions, chunk, dtype=int)

if positions[-1] + chunk > size + min_coord:
lengths[-1] = size + min_coord - positions[-1]

Suggested change
positions = np.arange(min_coord, min_coord + size, chunk)
lengths = np.full_like(positions, chunk, dtype=int)
if positions[-1] + chunk > size + min_coord:
lengths[-1] = size + min_coord - positions[-1]
positions = np.arange(min_coord, min_coord + size, chunk)
lengths = np.minimum(chunk, min_coord + size - positions)

I think this is the equivalent two-liner :) but just a bit nitpicky.

@melonora left a comment
Thanks for your contribution! I have two minor suggestions. I also saw that you use the width-by-height convention. Personally, I don't have a strong opinion here, though we could also stick to array API conventions. @LucaMarconato WDYT? Pre-approving for now.

list[list[DaskArray]]
"""
# Lazy file reader
slide = tiffmmemap(input)
tifffile.memmap might not always work, for example with compression or tiling, so I would add a try/except clause.
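A minimal sketch of such a fallback. The reader arguments here are hypothetical stand-ins; in the PR they would be tifffile.memmap and a slower reader that handles compressed or tiled TIFFs (tifffile.memmap raises ValueError when the data are not memory-mappable):

```python
# Hedged sketch of the suggested try/except fallback. `memmap_reader` and
# `fallback_reader` are hypothetical stand-ins, not spatialdata-io API.
def open_lazy(path, memmap_reader, fallback_reader):
    try:
        # Fast path: zero-copy memory map (works for uncompressed TIFFs).
        return memmap_reader(path)
    except ValueError:
        # tifffile.memmap raises ValueError for non-memory-mappable data
        # (e.g. compressed TIFFs); fall back to a slower, chunk-capable reader.
        return fallback_reader(path)

# Stand-ins to demonstrate the control flow:
def failing_memmap(path):
    raise ValueError("image data are not memory-mappable")

def chunk_reader(path):
    return f"lazy-handle-for-{path}"

assert open_lazy("image.tif", failing_memmap, chunk_reader) == "lazy-handle-for-image.tif"
```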

@melonora left a comment
Sorry, I had to change my review after rethinking memmap. It does not always work, for example when dealing with compressed TIFFs, as far as I am aware.

3 participants