Why not allow child nodes of an array? #1501
I am curious why child nodes (groups or arrays) are not allowed for an array. As long as the child names do not begin with a number, it seems they would not clash with any of the chunk files. I am guessing that at some lower level there are operations that may conflict with child nodes? If so, how hard would it be to allow arrays to have their own child nodes?

This would be very useful because a nested hierarchy is perfect for storing the results of ordered operations such as baselining, scaling, filtering, etc. If these arrays are stored as a tree of arrays such as '/data/baselined/scaled/filtered/...', the hierarchy itself provides a self-describing dataset from which the analysis order is easily intuited.

Yes, the above can be done using groups, with each group holding an array, but this clutters the path hierarchy with a bunch of additional keys, and so is not nearly as satisfying or straightforward. For the above, you would need paths such as:
In contrast, the nested arrays '/data/baselined/scaled/filtered/...' are, in my opinion, way more self-evident and intuitive.
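To make the name-clash question above concrete, here is a minimal sketch of which child names could collide with keys stored inside an array's own directory. It uses only the standard library and does not call zarr itself. The function name `clashes_with_array_keys` is hypothetical, and the sketch assumes the Zarr v2 layout (`.zarray`, `.zattrs`, and chunk keys made of integers joined by `.` or nested with `/`) and, for v3, the default chunk key encoding (chunks under a `c` prefix, metadata in `zarr.json`).

```python
import re

# Assumption: Zarr v2 chunk keys are grid indices joined by "." (e.g. "0.0"),
# or nested directories ("0/0") whose first path segment is a bare integer.
V2_CHUNK_KEY = re.compile(r"\d+(\.\d+)*")
# Metadata documents that live directly inside a v2 array directory.
V2_METADATA_KEYS = {".zarray", ".zattrs"}
# Assumption: Zarr v3 with the default chunk key encoding ("c/0/0", zarr.json).
V3_RESERVED_KEYS = {"c", "zarr.json"}


def clashes_with_array_keys(child_name: str, zarr_format: int = 2) -> bool:
    """Return True if a hypothetical child node name could collide with a key
    that an array already stores in its own directory."""
    if zarr_format == 2:
        return (
            child_name in V2_METADATA_KEYS
            or V2_CHUNK_KEY.fullmatch(child_name) is not None
        )
    return child_name in V3_RESERVED_KEYS


for name in ["baselined", "0.0", "0", ".zarray", "0abc"]:
    print(name, clashes_with_array_keys(name))
```

Under these assumptions, only names that are entirely numeric (optionally dot-separated) or that match a metadata key collide in v2, so "does not begin with a number" is a safe, slightly conservative rule there. In v3 with the default encoding, a child named `c` would be the one to avoid.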
Replies: 2 comments 2 replies
Hi @marcel-goldschen-ohm! Thanks for opening the discussion. I believe it could make sense to allow child nodes of an array for some of the scenarios you described. Another use case I have in mind is to simply store the same exact array, but with different chunking. This could be accommodated in the V3 spec via an extension. A more challenging task would be to integrate these multiple versions of an array with implementation software.
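As a rough illustration of the "same array, different chunking" case, the sketch below lists the chunk keys that two chunkings of one array shape would produce. It is standard-library Python rather than zarr API code; the helper `chunk_keys` is hypothetical, the shape and chunk sizes are made up, and the key format assumes the v2-style `.` separator.

```python
from itertools import product
from math import ceil


def chunk_keys(shape, chunks, sep="."):
    """List the chunk keys for an array of `shape` split into `chunks`,
    assuming v2-style keys: grid indices joined by `sep`."""
    grid = [ceil(s / c) for s, c in zip(shape, chunks)]
    return [sep.join(map(str, idx)) for idx in product(*(range(n) for n in grid))]


# The same hypothetical 100 x 100 array, chunked by rows and by columns.
row_chunked = chunk_keys((100, 100), (10, 100))
col_chunked = chunk_keys((100, 100), (100, 10))

print(row_chunked[:3])
print(col_chunked[:3])
print(set(row_chunked) & set(col_chunked))
```

The two layouts share a key while holding different data under it, so each chunking needs its own key space. A child node under the array would be one way to provide that.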
@d-v-b brought up this discussion during the community call this evening. To summarize my part of our brief discussion: I agree that I can imagine Zarr having originally been developed to support this, but: