Replace ZArray with zarr.core.array.ArrayMetadata #411
Comments
Why would we even need the V2 version of the class? |
Can't we just make all ManifestArrays use V3 metadata? The only place I can think of where we might need the V2 version is to deal with actual Zarr V2 data in @norlandrhagen's Zarr reader PR, and even then we just immediately convert the metadata to V3 anyway. |
@d-v-b might have advice on this |
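🤔 I can understand conceptually how in-memory the zarr array metadata could always be in the V3 format, but I could see the following instances where we might want to use a V2 Metadata representation:
Readers:
1. @norlandrhagen's Zarr reader will need to read Zarr V2 data, as you mentioned
2. Kerchunk reader: If opening from an existing Kerchunk store it is most likely, at this time, using V2 metadata
Writing:
1. Do we want to support writing to kerchunk and Zarr in zarr_format version 2?
|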
All those cases happen during creation or serialisation of ManifestArrays,
though. So we need functions to make ManifestArrays from V2 metadata, but
we don't need the ManifestArray class itself to be flexible: all
ManifestArrays can just represent their metadata in the V3 format.
I don't think kerchunk's format even makes a distinction right now, and we
cannot write to Zarr, only Icechunk, which is only for V3.
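A minimal sketch of the idea above, where a reader translates V2 metadata to a V3-style representation once, at the boundary, so everything downstream only sees V3. The function name `v2_metadata_to_v3` and the dtype table are hypothetical and are not VirtualiZarr or zarr-python APIs. It works on plain dicts shaped like a parsed `.zarray` document and deliberately handles only the simplest case (little-endian, C-order, no compressor or filters).

```python
from __future__ import annotations

from typing import Any, Optional

# Hypothetical, partial mapping from NumPy-style V2 typestrings to V3 data type names.
_V2_TO_V3_DTYPE = {
    "<i2": "int16",
    "<i4": "int32",
    "<i8": "int64",
    "<u2": "uint16",
    "<u4": "uint32",
    "<u8": "uint64",
    "<f4": "float32",
    "<f8": "float64",
}


def v2_metadata_to_v3(
    zarray: dict[str, Any], attrs: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Translate a parsed Zarr V2 ``.zarray`` dict into a V3-style metadata dict.

    Assumption: only uncompressed, unfiltered, C-order, little-endian arrays
    are supported; anything else raises instead of guessing.
    """
    if zarray.get("zarr_format") != 2:
        raise ValueError("expected Zarr V2 metadata")
    if zarray.get("compressor") is not None or zarray.get("filters"):
        raise NotImplementedError("codec translation is out of scope for this sketch")
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("only C-order arrays are handled in this sketch")
    dtype = zarray["dtype"]
    if dtype not in _V2_TO_V3_DTYPE:
        raise NotImplementedError(f"unsupported dtype in this sketch: {dtype!r}")

    fill_value = zarray.get("fill_value")
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(zarray["shape"]),
        "data_type": _V2_TO_V3_DTYPE[dtype],
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": list(zarray["chunks"])},
        },
        # Keep V2-style chunk keys ("0.0", "0.1", ...) addressable.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        # V3 requires a fill value; defaulting to 0 is an assumption of this sketch.
        "fill_value": 0 if fill_value is None else fill_value,
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
        "attributes": dict(attrs or {}),
    }
```

With something like this in each reader, the ManifestArray class itself never needs to know that the source data was V2.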
On Fri, Jan 31, 2025, 6:55 PM Aimee Barciauskas wrote:
🤔 I can understand conceptually how in-memory the zarr array metadata
could always be in the V3 format, but I could see the following instances
where we might want to use a V2 Metadata representation:
Readers:
1. @norlandrhagen's Zarr reader
will need to read Zarr V2 data, as you mentioned
2. Kerchunk reader: If opening from an existing Kerchunk store it is
most likely, at this time, using V2 metadata
Writing:
1. Do we want to support writing to kerchunk and Zarr in zarr_format
version 2?
|
+1 to using the metadata classes from in |
Thanks @TomNicholas and @d-v-b So if we decide all ManifestArrays will just use ArrayV3Metadata internally, then we just need to modify and test all readers are translating their format's metadata to v3. For writers, I see why I was confused. We have 3 writer methods right now:
I'm still parsing the code changes that may be required, but it seems like the work could be split into separate PRs to a zarr-python 3 branch:
|
Yes, see #359. Everything else you've written is exactly right. We might want to change the interface for getting metadata from the
class ManifestArray:
    _manifest: ChunkManifest
    _metadata: ArrayV3Metadata

    def __init__(self, chunkmanifest, metadata: ArrayV3Metadata) -> None:
        ...

    @property
    def metadata(self) -> ArrayV3Metadata:
        ...

    def to_kerchunk_v2(self) -> KerchunkArrRefs:
        ...
To create the
def extract_metadata_from_kerchunk(refs: KerchunkArrRefs) -> ArrayV3Metadata:
    ...
Once we have those, swapping this into every reader should be straightforward. |
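To make the proposed interface concrete, here is a rough, self-contained sketch of the round trip it implies. Everything in it is a stand-in: `V3MetadataStub` is a hypothetical placeholder for `ArrayV3Metadata` (not the zarr-python class), `ManifestArraySketch` is not the real `ManifestArray`, the manifest is assumed to be a plain dict of chunk key to `(path, offset, length)`, and the references are assumed to be a dict with a JSON-encoded `.zarray` entry plus one entry per chunk.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class V3MetadataStub:
    """Hypothetical stand-in for ArrayV3Metadata, holding only a few fields."""

    shape: tuple[int, ...]
    chunk_shape: tuple[int, ...]
    data_type: str  # V3 data type name, e.g. "float64"
    fill_value: Any = 0


# Partial, illustrative mapping between V3 names and little-endian V2 typestrings.
_V3_TO_V2_DTYPE = {"int32": "<i4", "int64": "<i8", "float32": "<f4", "float64": "<f8"}
_V2_TO_V3_DTYPE = {v2: v3 for v3, v2 in _V3_TO_V2_DTYPE.items()}


class ManifestArraySketch:
    """Always stores V3-style metadata; V2 only appears when serialising."""

    def __init__(
        self, chunkmanifest: dict[str, tuple[str, int, int]], metadata: V3MetadataStub
    ) -> None:
        self._manifest = dict(chunkmanifest)
        self._metadata = metadata

    @property
    def metadata(self) -> V3MetadataStub:
        return self._metadata

    def to_kerchunk_v2(self) -> dict[str, Any]:
        """Serialise to a kerchunk-style dict with V2 ``.zarray`` metadata."""
        m = self._metadata
        zarray = {
            "zarr_format": 2,
            "shape": list(m.shape),
            "chunks": list(m.chunk_shape),
            "dtype": _V3_TO_V2_DTYPE[m.data_type],
            "fill_value": m.fill_value,
            "order": "C",
            "compressor": None,  # codecs are ignored in this sketch
            "filters": None,
        }
        refs: dict[str, Any] = {".zarray": json.dumps(zarray)}
        for key, (path, offset, length) in self._manifest.items():
            refs[key] = [path, offset, length]
        return refs


def extract_metadata_from_kerchunk(refs: dict[str, Any]) -> V3MetadataStub:
    """Read V2 metadata from kerchunk-style refs and return the V3-style stub."""
    zarray = json.loads(refs[".zarray"])
    return V3MetadataStub(
        shape=tuple(zarray["shape"]),
        chunk_shape=tuple(zarray["chunks"]),
        data_type=_V2_TO_V3_DTYPE[zarray["dtype"]],
        fill_value=zarray["fill_value"],
    )
```

The point of the design is that the V2 representation exists only inside `to_kerchunk_v2` and `extract_metadata_from_kerchunk`; the array object itself holds a single metadata type.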
I don't believe there is an issue for this: We want to replace (or dramatically simplify) the custom ZArray class that exists in virtualizarr/zarr.py with zarr-python classes and methods.
Based on discussion in #175, I believe we want to replace ZArray with zarr.core.array.ArrayV2Metadata and zarr.core.array.ArrayV3Metadata and refactor every place that uses ZArray (and probably refactor codecs.py).
I looked at #175 and pulled some of them into a branch I have started working on (https://github.com/zarr-developers/VirtualiZarr/tree/ab/use-zarr-python-metadata). So far I have only tested and made one test pass (virtualizarr/tests/test_manifests/test_array.py::TestStack::test_stack_empty), but I wanted to open an issue to track the approach to this change.
I think I will next look at the codecs tests to see what refactoring may be necessary there.