Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always skip reads when completely overwriting chunks #2784

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Jan 31, 2025

Teach the *Indexer objects to detect when a selection corresponds to a whole chunk. Propagate that information down to the codec pipeline where reads can be skipped if it makes sense locally.

Closes #757

I'm opening this for comments about the approach (@d-v-b , @normanrz ) it could definitely use more tests (edit: and some fixes)

TODO:

  • Add unit tests and/or doctests in docstrings
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 31, 2025
src/zarr/core/indexing.py Outdated Show resolved Hide resolved
src/zarr/core/indexing.py Outdated Show resolved Hide resolved
Uses `slice(..., None)` to indicate that a `chunk_selection`
ends at the boundary of the current chunk. Also does so for a last
chunk that is shorter than the chunk size.

`is_total_slice` now understands this convention, and correctly
detects boundary chunks as total slices.

Closes zarr-developers#757
This reverts commit 234431cd6efb661c53e2a832a0e4ea4dca772c1b.
@dcherian dcherian force-pushed the optimize-boundary-chunk-writes branch from 1da36f4 to 0e00eb3 Compare February 4, 2025 05:07
@dcherian dcherian requested a review from normanrz February 5, 2025 15:54
@dcherian dcherian marked this pull request as ready for review February 5, 2025 16:16
@@ -1198,7 +1206,8 @@ def __iter__(self) -> Iterator[ChunkProjection]:
for (dim_sel, dim_chunk_offset) in zip(self.selection, chunk_offsets, strict=True)
)

yield ChunkProjection(chunk_coords, chunk_selection, out_selection)
is_complete_chunk = False # TODO
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where this is used. Is it always True?

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Feb 5, 2025
@dcherian
Copy link
Contributor Author

dcherian commented Feb 5, 2025

#2784 fixes the is_total_slice problem.

@dcherian dcherian force-pushed the optimize-boundary-chunk-writes branch from 1921878 to b2be002 Compare February 5, 2025 16:46
@dcherian dcherian changed the title Skip reads when completely overwriting boundary chunks Always skip reads when completely overwriting chunks Feb 5, 2025
@d-v-b
Copy link
Contributor

d-v-b commented Feb 5, 2025

I find the indexing code pretty hard to follow, but this looks good to me

@jhamman
Copy link
Member

jhamman commented Feb 7, 2025

@normanrz - do you have the bandwidth to give this a review?

@dcherian
Copy link
Contributor Author

dcherian commented Feb 8, 2025

@normanrz if you have time, please prioritize reviewing zarr-developers/numcodecs#700 over this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

behavior of is_total_slice for boundary chunks
3 participants