Why not allow child nodes of an array? #1501

marcel-goldschen-ohm · 2023-08-15T19:51:45Z

I am curious why child nodes (groups or arrays) are not allowed for an array? So long as the child names do not begin with a number it seems like they would not clash with any of the chunk files. I am guessing that at some lower level there are operations that may conflict with child nodes? If so, how hard would it be to allow arrays to have their own child nodes?
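To make the name-clash question concrete, here is a minimal sketch of the check it implies. It assumes a Zarr v2-style store, where an array's chunk keys are integers joined by "." (or nested by "/") and its metadata lives in `.zarray` and `.zattrs`. The helper name `clashes_with_array_keys` is hypothetical and not part of any Zarr API; other chunk key layouts would need a different rule.

```python
import re

# Assumption: v2-style chunk keys such as "0", "0.0" or "3.12.7".
# With a "/" separator the first path component is a bare integer,
# which this pattern also matches.
_CHUNK_KEY = re.compile(r"[0-9]+(\.[0-9]+)*")

# Assumption: metadata keys stored alongside the chunks in the v2 layout.
_RESERVED = {".zarray", ".zattrs"}


def clashes_with_array_keys(child_name: str) -> bool:
    """Return True if a child node name could collide with a key
    that the array itself stores (hypothetical helper)."""
    if child_name in _RESERVED:
        return True
    return _CHUNK_KEY.fullmatch(child_name) is not None


for name in ["baselined", "0.0", "0", ".zarray", "2nd_pass"]:
    print(name, clashes_with_array_keys(name))
```

Under these assumptions, only names that look exactly like a chunk key or a metadata key collide, so "do not begin with a number" is a safe rule that is slightly stricter than necessary.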

The reason why this would be very useful is that a nested hierarchy is perfect for storing the results of ordered operations such as baselining, scaling, filtering, etc. If these arrays are stored as a tree of arrays such as '/data/baselined/scaled/filtered/...' then this takes advantage of the hierarchy to provide a self-describing dataset from which the analysis order is easily intuited.

Yes, the above can be done using groups with each group having an array, but this makes the path hierarchy messy with a bunch of additional keys, and thus is not nearly as satisfying or straightforward. For the above, you would need paths such as:

/datagroup/data
/datagroup/baselinedgroup/baselined
/datagroup/baselinedgroup/scaledgroup/scaled
/datagroup/baselinedgroup/scaledgroup/filteredgroup/filtered

In contrast, the nested arrays '/data/baselined/scaled/filtered/...' are in my opinion way more self-evident and intuitive.
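The two layouts can be compared side by side with a small sketch. It only builds path strings and does not touch any Zarr store; the function names `group_paths` and `nested_array_paths` are hypothetical, and the `<step>group` naming simply mirrors the example paths above.

```python
def group_paths(steps):
    """Paths for the current workaround: one wrapper group per array."""
    paths = []
    prefix = ""
    for step in steps:
        # Each step needs its own group to hold the array and any children.
        prefix = f"{prefix}/{step}group"
        paths.append(f"{prefix}/{step}")
    return paths


def nested_array_paths(steps):
    """Paths for the proposed layout: arrays nested directly in arrays."""
    paths = []
    prefix = ""
    for step in steps:
        prefix = f"{prefix}/{step}"
        paths.append(prefix)
    return paths


steps = ["data", "baselined", "scaled", "filtered"]
print("\n".join(group_paths(steps)))
print("\n".join(nested_array_paths(steps)))
```

In the nested layout each path is itself the processing history, whereas the group layout roughly doubles the number of path components.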

Answered by rabernat

rabernat · 2023-08-15T23:20:50Z

Maintainer

Hi @marcel-goldschen-ohm! Thanks for opening the discussion.

I believe it could make sense to allow child nodes of an array for some of the scenarios you described. Another use case I have in mind is to simply store the same exact array, but with different chunking.

This could be accommodated in the V3 spec via an extension.

A more challenging task would be to integrate these multiple versions of an array with implementation software.
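As a rough illustration of why the same array might be stored with different chunking, the sketch below counts how many chunks a read has to touch under two chunk shapes. It is plain Python arithmetic, not a Zarr call; `chunks_touched` is a hypothetical helper and the 1000 x 1000 array is an assumed example.

```python
from math import prod


def chunks_touched(chunks, selection):
    """Count the chunks intersecting a rectangular selection.

    chunks:    chunk length per dimension, e.g. (1, 1000)
    selection: half-open (start, stop) index range per dimension
    """
    per_dim = []
    for chunk_len, (start, stop) in zip(chunks, selection):
        first = start // chunk_len        # index of first chunk hit
        last = (stop - 1) // chunk_len    # index of last chunk hit
        per_dim.append(last - first + 1)
    return prod(per_dim)


# Assumed example: a 1000 x 1000 array read one row or one column at a time.
row_read = ((0, 1), (0, 1000))
col_read = ((0, 1000), (0, 1))

row_chunked = (1, 1000)   # each chunk holds one full row
col_chunked = (1000, 1)   # each chunk holds one full column

print(chunks_touched(row_chunked, row_read))  # 1
print(chunks_touched(row_chunked, col_read))  # 1000
print(chunks_touched(col_chunked, row_read))  # 1000
print(chunks_touched(col_chunked, col_read))  # 1
```

Each chunking is ideal for one access pattern and poor for the other, which is the motivation for keeping both copies next to each other.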

1 reply

marcel-goldschen-ohm Aug 16, 2023
Author

Hi @rabernat, thanks for your quick reply! I'm glad you (and probably others) are already thinking about such an extension, and I look forward to it.

Perhaps you were referring more to the multiple differently chunked versions of an array, but I also agree that any generic software implementation for a nontrivial data hierarchy is a challenge (e.g., Neurodata Without Borders). One thing I would note is that, in my opinion, it is much easier for a naive program to understand a hierarchy than to parse meaning from keys, so I think extending the capabilities of the hierarchy in this way would ease the challenge for general data formats that a computer can "understand". I've actually been thinking about this kind of thing a lot lately, which is what led me to Zarr in the first place ;) Finally, thanks for all your hard work on this!

joshmoore · 2023-08-23T21:11:16Z

Maintainer

@d-v-b brought up this discussion during the community call this evening. Summarizing my part of our brief discussion: I can imagine Zarr having originally been developed to support this, but:

- I'm not sure the benefit outweighs the impact of the change at this point.
- This would likely introduce compatibility (or at minimum, mapping) issues with HDF5, which is (a) likely a reason this choice was made in the first place and (b) probably of considerable concern for the good folks from Unidata (cc: @WardF).

1 reply

marcel-goldschen-ohm Aug 24, 2023
Author

Hi @joshmoore, I see your point that this would likely create mapping issues with HDF5. I also understand the desire to maintain compatibility with HDF5 given its wide use. That said, I still think having this option would be useful for hierarchy path semantics.
