dstansby
Contributor

@dstansby dstansby commented Sep 9, 2025

This is a refactor of AsyncGroup.open() for readability and simpler logic:

  • Factor out the code for guessing the Zarr version from a store to the top of the method
  • This then allows the number of if branches to be reduced, so all the v2 code is in one branch and all the v3 code is in another branch.
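
As a rough illustration of the shape described in the two points above (this is not the code in this PR), here is a minimal sketch that uses a plain dict as a stand-in store. The names guess_zarr_format and open_group are hypothetical; only the metadata keys "zarr.json" (v3) and ".zgroup" (v2 groups) come from the Zarr specs.

```python
from __future__ import annotations

import asyncio

# Metadata keys from the Zarr specs: v3 nodes use "zarr.json",
# v2 groups use ".zgroup".
V3_KEY = "zarr.json"
V2_GROUP_KEY = ".zgroup"


async def guess_zarr_format(store: dict[str, bytes]) -> int:
    """Hypothetical helper: infer the format from which metadata key is present."""
    if V3_KEY in store:
        return 3
    if V2_GROUP_KEY in store:
        return 2
    raise FileNotFoundError("no group metadata found in store")


async def open_group(
    store: dict[str, bytes], zarr_format: int | None = None
) -> tuple[str, bytes]:
    """Hypothetical stand-in for an open() method, not the real AsyncGroup.open()."""
    # Step 1: resolve the format once, at the top of the function.
    if zarr_format is None:
        zarr_format = await guess_zarr_format(store)

    # Step 2: exactly one branch per format.
    if zarr_format == 2:
        return "v2", store[V2_GROUP_KEY]
    if zarr_format == 3:
        return "v3", store[V3_KEY]
    raise ValueError(f"unsupported zarr_format: {zarr_format!r}")
```

With the format resolved up front, each branch only has to deal with one metadata layout, which is the simplification the description is aiming for.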

@github-actions (bot) added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Sep 9, 2025

codecov bot commented Sep 9, 2025

Codecov Report

❌ Patch coverage is 96.96970% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 61.07%. Comparing base (ee9c182) to head (74baee4).

Files with missing lines Patch % Lines
src/zarr/core/group.py 96.96% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (ee9c182) and HEAD (74baee4).

HEAD has 1 upload less than BASE: 11 uploads for BASE (ee9c182), 10 for HEAD (74baee4).
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3443       +/-   ##
===========================================
- Coverage   94.92%   61.07%   -33.86%     
===========================================
  Files          79       79               
  Lines        9500     9502        +2     
===========================================
- Hits         9018     5803     -3215     
- Misses        482     3699     +3217     
Files with missing lines Coverage Δ
src/zarr/core/group.py 70.45% <96.96%> (-24.61%) ⬇️

... and 66 files with indirect coverage changes


@d-v-b
Contributor

d-v-b commented Sep 9, 2025

If zarr_format=None, guess the Zarr format and then call AsyncGroup.open() recursively with the guessed format.

it's a bit weird that AsyncGroup.open() takes a zarr_format parameter -- AsyncGroup instances should be v2 or v3 only

Comment on lines 541 to 543
zarr_format = await _get_zarr_version(store_path)
return await cls.open(
store=store, zarr_format=zarr_format, use_consolidated=use_consolidated
Contributor

this is a bit inefficient from an IO perspective, since _get_zarr_version does IO against metadata documents, and cls.open will do more IO against the same documents.

I think it's more efficient to:

  • avoid recursion here
  • instead of using a function that checks if objects exist (_get_zarr_version), use a function that returns an existing array or group directly. This is a more efficient use of IO. I think get_node, or one of the functions it relies on, could be used here.
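
To make the IO argument concrete, here is a minimal sketch that assumes a toy in-memory store which counts requests. CountingStore, open_by_probing and open_by_reading are hypothetical names for illustration only; they are not zarr-python's Store API, _get_zarr_version, or get_node.

```python
from __future__ import annotations

import asyncio


class CountingStore:
    """Toy stand-in for a remote store: every exists() or get() costs one request."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self.data = dict(data)
        self.requests = 0

    async def exists(self, key: str) -> bool:
        self.requests += 1
        return key in self.data

    async def get(self, key: str) -> bytes | None:
        self.requests += 1
        return self.data.get(key)


async def open_by_probing(store: CountingStore) -> tuple[int, bytes | None]:
    """Check which metadata document exists, then fetch it in a second step."""
    if await store.exists("zarr.json"):
        return 3, await store.get("zarr.json")
    if await store.exists(".zgroup"):
        return 2, await store.get(".zgroup")
    raise FileNotFoundError("no group metadata found")


async def open_by_reading(store: CountingStore) -> tuple[int, bytes]:
    """Fetch the metadata document directly; a missing key simply returns None."""
    v3_doc = await store.get("zarr.json")
    if v3_doc is not None:
        return 3, v3_doc
    v2_doc = await store.get(".zgroup")
    if v2_doc is not None:
        return 2, v2_doc
    raise FileNotFoundError("no group metadata found")
```

For a v2 group in this toy store, open_by_probing issues three requests and open_by_reading issues two, because the read that confirms the document exists also returns its contents.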

Contributor Author

Ah, good point about the IO operations. I naively assumed an exists() operation would be much cheaper than get(), but I guess on remote stores that's not the case. I've refactored to avoid recursion and minimise IO. When working out whether a group is v2 or v3 there are now two fewer IO operations, which is a nice win.

Contributor Author

instead of using a function that checks if objects exist (_get_zarr_version), use a function that returns an existing array or group directly. This is a more efficient use of IO. I think get_node, or one of the functions it relies on, could be used here.

I'm not sure what you meant by this?

Contributor

sorry if I wasn't clear. one of the outcomes of open is to return an existing group (and if there's an existing array, or an existing group with the wrong zarr format, we should error). So instead of using a function that just infers the zarr_format of an existing array / group, it's likely simpler to simply try and open an array / group, and then handle the result of that operation.
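
One way to read the suggestion above, sketched with a plain dict as the store: read whatever node is there, then decide what to do with it. The names read_node and open_existing_group, and the choice of exception types, are assumptions for illustration and are not zarr-python's API. The metadata keys, and the "node_type" field in zarr.json, come from the Zarr specs.

```python
from __future__ import annotations

import json


def read_node(store: dict[str, bytes]) -> tuple[int, str, dict] | None:
    """Hypothetical helper: return (zarr_format, node_type, metadata), or None."""
    if "zarr.json" in store:
        meta = json.loads(store["zarr.json"])
        # v3 metadata states its own node type: "group" or "array".
        return 3, meta["node_type"], meta
    if ".zgroup" in store:
        return 2, "group", json.loads(store[".zgroup"])
    if ".zarray" in store:
        return 2, "array", json.loads(store[".zarray"])
    return None


def open_existing_group(store: dict[str, bytes], zarr_format: int | None = None) -> dict:
    """Open whatever exists, then handle the result instead of probing first."""
    node = read_node(store)
    if node is None:
        raise FileNotFoundError("no array or group found in store")
    found_format, node_type, meta = node
    if node_type != "group":
        # Exception type is an arbitrary choice for this sketch.
        raise TypeError(f"expected a group, found a v{found_format} {node_type}")
    if zarr_format is not None and zarr_format != found_format:
        raise ValueError(
            f"requested zarr_format={zarr_format}, found a v{found_format} group"
        )
    return meta
```

The error cases mentioned in the comment (an existing array, or a group in the wrong format) fall out of inspecting the node that was read, with no separate format-inference step.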

Contributor Author

This is called further down the stack than the "high level" functions to open groups, so I don't think that would work? e.g., for zarr.open_group() the call stack is:

zarr.open_group()
zarr.api.asynchronous.open_group()
AsyncGroup.open()

@dstansby
Contributor Author

dstansby commented Sep 9, 2025

it's a bit weird that AsyncGroup.open() takes a zarr_format parameter -- AsyncGroup instances should be v2 or v3 only

Agreed, but it's a classmethod so I guess it sort of makes sense? Maybe it should really just be a function.
