-
-
Notifications
You must be signed in to change notification settings - Fork 358
Refactor AsyncGroup.open()
#3443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3443 +/- ##
===========================================
- Coverage 94.92% 61.07% -33.86%
===========================================
Files 79 79
Lines 9500 9502 +2
===========================================
- Hits 9018 5803 -3215
- Misses 482 3699 +3217
🚀 New features to boost your workflow:
|
it's a bit weird that |
src/zarr/core/group.py
Outdated
zarr_format = await _get_zarr_version(store_path) | ||
return await cls.open( | ||
store=store, zarr_format=zarr_format, use_consolidated=use_consolidated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit inefficient from an IO perspective, since _get_zarr_version
does IO against metadata documents, and cls.open
will do more IO against the same documents.
I think it's more efficient to:
- avoid recursion here
- instead of using a function that checks if objects exist (
_get_zarr_version
), use a function that returns an existing array or group directly. This is a more efficient use of IO. I thinkget_node
, or one of the functions it relies on, could be used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point about the IO operations. I naively assumed a exists()
operation would be much cheaper than get()
, but I guess on remote stores that's not the case. I've refactored to avoid recursion, and minimise IO. When working out whether a group is v2/v3 there's now two less IO operations, which is a nice win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using a function that checks if objects exist (_get_zarr_version), use a function that returns an existing array or group directly. This is a more efficient use of IO. I think get_node, or one of the functions it relies on, could be used here.
I'm not sure what you meant by this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry if I wasn't clear. one of the outcomes of open
is to return an existing group (and if there's an existing array, or an existing group with the wrong zarr format, we should error). So instead of using a function that just infers the zarr_format of an existing array / group, it's likely simpler to simply try and open an array / group, and then handle the result of that operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is called further down the stack than the "high level" functions to open groups, so I don't think that would work? e.g., for zarr.open_group()
the call stack is:
zarr.open_group()
zarr.api.asynchronous.open_group()
AsyncGroup.open()
Agreed, but it's a |
This is a refactor of
AsyncGroup.open()
for readbility/logic simplification: