Chunkwise image loader #279
… image-reader-chunkwise
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             main     #279       +/-   ##
===========================================
+ Coverage   39.16%   50.82%   +11.66%
===========================================
  Files          26       27        +1
  Lines        2663     2713       +50
===========================================
+ Hits         1043     1379      +336
+ Misses       1620     1334      -286
def _compute_chunks(
    dimensions: tuple[int, int],
    chunk_size: tuple[int, int],
    min_coordinates: tuple[int, int] = (0, 0),
) -> NDArray[np.int_]:
    """Create all chunk specs for a given image and chunk size.

    Creates specifications (x, y, width, height) with (x, y) being the upper left corner
    of chunks of size chunk_size. Chunks at the edges correspond to the remainder of
    chunk size and dimensions

    Parameters
    ----------
    dimensions : tuple[int, int]
Suggested change:
def _compute_chunks(
    shape: tuple[int, int],
    chunk_size: tuple[int, int],
    min_coordinates: tuple[int, int] = (0, 0),
) -> NDArray[np.int_]:
    """Create all chunk specs for a given image and chunk size.

    Creates specifications (x, y, width, height) with (x, y) being the upper left corner
    of chunks of size chunk_size. Chunks at the edges correspond to the remainder of
    chunk size and dimensions

    Parameters
    ----------
    shape : tuple[int, int]
Just to stick to standard NumPy / array API conventions :) "dimensions" could be interpreted as TCZYX.
positions = np.arange(min_coord, min_coord + size, chunk)
lengths = np.full_like(positions, chunk, dtype=int)

if positions[-1] + chunk > size + min_coord:
    lengths[-1] = size + min_coord - positions[-1]
Suggested change:
positions = np.arange(min_coord, min_coord + size, chunk)
lengths = np.minimum(chunk, min_coord + size - positions)
I think this is the equivalent two-liner :) but I'm just being a bit nitpicky.
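The equivalence claimed for the two versions can be checked with a small standalone script. This is only a sketch: the wrapper functions `chunk_lengths_branch` and `chunk_lengths_minimum` are hypothetical names introduced here, and the bodies are copied from the two snippets in this thread.

```python
import numpy as np


def chunk_lengths_branch(size, chunk, min_coord=0):
    """Original version: fill with `chunk`, then patch the last entry."""
    positions = np.arange(min_coord, min_coord + size, chunk)
    lengths = np.full_like(positions, chunk, dtype=int)
    if positions[-1] + chunk > size + min_coord:
        lengths[-1] = size + min_coord - positions[-1]
    return positions, lengths


def chunk_lengths_minimum(size, chunk, min_coord=0):
    """Suggested version: clip every length to the remaining extent."""
    positions = np.arange(min_coord, min_coord + size, chunk)
    lengths = np.minimum(chunk, min_coord + size - positions)
    return positions, lengths


# Compare both versions on a few (size, chunk, min_coord) combinations.
cases = [(10, 3, 0), (12, 4, 0), (7, 10, 5), (1, 1, 2)]
for size, chunk, min_coord in cases:
    p1, l1 = chunk_lengths_branch(size, chunk, min_coord)
    p2, l2 = chunk_lengths_minimum(size, chunk, min_coord)
    assert np.array_equal(p1, p2) and np.array_equal(l1, l2)

positions, lengths = chunk_lengths_minimum(10, 3)
# positions -> [0 3 6 9], lengths -> [3 3 3 1]
```

One difference shows up only for an empty axis (`size == 0`): the branch version indexes `positions[-1]` on an empty array and raises `IndexError`, while the `np.minimum` version returns empty arrays.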
Thanks for your contribution! I have 2 minor suggestions. I also saw that you use the width by height convention. Personally, I don't have a strong opinion here, though we could also stick to array API conventions. @LucaMarconato WDYT? Pre-approving for now.
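To make the two conventions concrete: a chunk spec in this PR is (x, y, width, height), whereas a NumPy array is indexed as [row, column], i.e. (height, width). A minimal sketch of the conversion, where `spec_to_slices` is a hypothetical helper and not part of the PR:

```python
import numpy as np


def spec_to_slices(spec):
    """Convert an (x, y, width, height) spec into (row, column) slices."""
    x, y, width, height = spec
    # Rows run along y (height), columns run along x (width).
    return slice(y, y + height), slice(x, x + width)


# A toy image with shape (height, width) = (6, 10).
image = np.arange(6 * 10).reshape(6, 10)

# Chunk with upper left corner (x=2, y=1), 4 pixels wide and 3 pixels high.
tile = image[spec_to_slices((2, 1, 4, 3))]
# tile.shape -> (3, 4): height first, then width
```

Whichever convention the public signature uses, the swap has to happen exactly once, at the point where specs are turned into array indices.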
    list[list[DaskArray]]
    """
    # Lazy file reader
    slide = tiffmmemap(input)
tifffile.memmap might not always work, for example with compression or tiling, so I would add a try/except clause.
Sorry, I had to change my comment after rethinking memmap. As far as I am aware, this does not always work, for example when dealing with compressed TIFFs.
Description
This PR addresses the challenge that the currently implemented and planned image loaders require loading imaging data entirely into memory, typically as NumPy arrays. Given the large size of microscopy datasets, this is not always feasible.
To mitigate this issue, and as discussed with @LucaMarconato, this PR aims to introduce a generalizable approach for reading large microscopy files in chunks, enabling efficient handling of data that does not fit into memory.
Some related discussions.
Strategy
In this PR, we focus on .tiff images, as implemented in the _tiff_to_chunks function:
1. Open the image lazily (tifffile.memmap)
2. Compute the chunk specifications (_compute_chunks)
3. Read each chunk into a dask.array which is memory-mapped and avoids memory overflow (_read_chunks)
4. Assemble the chunks into a single dask.array (via dask.array.block)
The strategy is implemented in src/spatialdata_io/readers/generic.py and src/spatialdata_io/readers/_utils/_image.py.
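The chunk-grid and reassembly steps above can be sketched without any dependency beyond NumPy. This is an illustration under stated assumptions, not the PR's implementation: `chunk_axis`, `compute_chunk_grid` and `read_chunk` are hypothetical names, the shape is taken as (height, width), and `np.block` with eager slices stands in for `dask.array.block` with lazy per-chunk arrays (both accept the same nested-list layout).

```python
import numpy as np


def chunk_axis(size, chunk, min_coord=0):
    """Start positions and lengths of the chunks along one axis."""
    positions = np.arange(min_coord, min_coord + size, chunk)
    lengths = np.minimum(chunk, min_coord + size - positions)
    return positions, lengths


def compute_chunk_grid(shape, chunk_size):
    """Return an (n_rows, n_cols, 4) array of (x, y, width, height) specs.

    Assumption: `shape` is (height, width) and `chunk_size` is
    (chunk_height, chunk_width), following array conventions.
    """
    height, width = shape
    chunk_h, chunk_w = chunk_size
    ys, hs = chunk_axis(height, chunk_h)
    xs, ws = chunk_axis(width, chunk_w)
    grid = np.empty((len(ys), len(xs), 4), dtype=int)
    for i, (y, h) in enumerate(zip(ys, hs)):
        for j, (x, w) in enumerate(zip(xs, ws)):
            grid[i, j] = (x, y, w, h)
    return grid


def read_chunk(image, spec):
    """Read one chunk; in the PR this step would yield a lazy dask array."""
    x, y, w, h = spec
    return np.asarray(image[y : y + h, x : x + w])


# Toy image standing in for a memory-mapped slide.
image = np.arange(5 * 7).reshape(5, 7)
grid = compute_chunk_grid(image.shape, chunk_size=(2, 3))

# Nested list of tiles, one inner list per row of chunks.
tiles = [[read_chunk(image, spec) for spec in row] for row in grid]

# np.block is used here as a stand-in for dask.array.block.
reassembled = np.block(tiles)
assert np.array_equal(reassembled, image)
```

Edge chunks come out smaller than `chunk_size` (here the last row is 1 pixel high and the last column 1 pixel wide), which is why the tiles are assembled block-wise rather than stacked.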
Future extensions
The strategy can be implemented for any image type, as long as it is possible to implement
We have implemented similar readers for openslide-compatible whole slide images and the Carl-Zeiss microscopy format.