Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[draft] zarr object models #46
base: main
Are you sure you want to change the base?
[draft] zarr object models #46
Changes from 1 commit
d7aa818
7bd949d
60594f3
1b9eb87
089a2f3
bfd1d0c
8603888
8d75fdd
46e71c4
e1b8755
a89bde6
659a090
20c9cc0
4c81889
3f1d95c
2f221b7
6cb0110
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to reiterate the idea of broadening this ZEP to include persisted consolidated metadata. Basically, why not allow to store the
members
property in a zarr.json?We would need to define the semantics of consolidated metadata (e.g. do member nodes still needs json files, does the members hierarchy need to be exhaustive). I would be happy to contribute that if there is interest to move this ZEP in that direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's definitely interest in that, my apologies for not making this more clear earlier. I'm not a user of consolidated metadata so I don't have a lot of experience with it, but I think for this ZEP to encompass consolidated metadata functionality as it exists today (i.e, a flat list of string keys pointing to JSON objects) we would need to define a tree flattening operation, and possible make
members
nullable (because in a flattened representation a ZOM group shouldn't hold a reference to its contents). Alternatively, if the flattened representation of the hierarchy used in consolidated metadata isn't essential to its function, we could simply just put a ZOM in JSON and leave it to clients to do the flattening. I don't have strong feelings either way! You should absolutely feel free contribute something here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep consolidated metadata separate from this ZOM concept. Consolidated metadata is a serialization choice, while the object model describes the relationships between entities in a more abstract, serialization independent way.
We intend to propose a ZEP which uses a STAC style link-relation element to allow a client to traverse an entire hierarchy without being able to list a store. This is similar to consolidated metadata but more scalable because it does not require all the metadata to be in a single json file. For context, we have hierarchies with 100_000 nodes. Would be happy to collaborate and iterate with you on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, maybe it would be good to get a statement like this in this ZEP to clarify the relationship between the abstract ZOM and consolidated metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am now wondering if this members property needs to be tweaked to accommodate the case of very large hierarchies. As is, it seems like the entire hierarchy may have to be explicitly populated all at once.
In python terms, I'd like to allow
members
to be either a set of child objects or a generator that yields such objects lazily. Is this making it too complicated?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wouldn't have to have this structure. It could be a nested json structure like this:
I think there is strong overlap between the ZOM and consolidated metadata. This ZEP introduces a JSON schema that describes the existing metadata of groups and arrays with a new addition of the
members
property. I think it would be very confusing, if consolidated metadata would end up with different terminology than the ZOM.That sounds great. As I said, we can discuss the semantics and features of the consolidated metadata. That could include linking. I don't think we should limit ourselves by what the implementation in zarr-python currently has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This concern is real! See also zarr-developers/pydantic-zarr#2 . The proposal there was to make
members
nullable, whereNone
would encode "The members have not been parsed", and to give a tree parser the option to limit the depth of traversal, which would result in "truncated"GroupSpec
instances being valid. But maybe the python generator approach obviates the need to express this with nullability? I'm open to suggestions hereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to be able to distinguish between "there are definitely no members" vs. "there may be members, but they have to be discovered explicitly"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in
pydantic-zarr
members
is now nullable, and it's been extremely useful. That being said, this can be viewed as a well-defined transformation of the base type, so it's not clear if the ZEP actually needs to address it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if you could use some help generating this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some help here would be great, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I have an unpublished version that I can share soon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see zarr-developers/zarr-python#1526
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zarrita also has attrs classes that define the metadata (minus the new
members
properties) https://github.com/scalableminds/zarrita/blob/async/zarrita/metadata.py#L259There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may also be worth summarizing some of the intended benefits to existing/internal applications. For example, the utilization of a standard data object internally within zarr-python may help improve workflow for creating large hierarchies by allowing users to create the ZOM metadata before passing it to a zarr.creation method.