Various storage updates #211

Merged
merged 11 commits into from
Feb 27, 2023
1,589 changes: 1,589 additions & 0 deletions docs/_static/js/mermaid.js

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions docs/conf.py
@@ -34,6 +34,11 @@
'sphinx_reredirects',
]

mermaid_version = ""
html_js_files = [
'js/mermaid.js', # v9.4.0
]

# Display todos by setting to True
todo_include_todos = True

112 changes: 52 additions & 60 deletions docs/v3/core/v3.0.rst
@@ -18,8 +18,8 @@ Editors:
Corresponding ZEP:
`ZEP 1 — Zarr specification version 3 <https://zarr.dev/zeps/draft/ZEP0001.html>`_

Issue tracking and discussion overview:
`GitHub project board <https://github.com/orgs/zarr-developers/projects/2>`_
Issue tracking:
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/core-protocol-v3.0>`_

Suggest an edit for this spec:
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/main/docs/v3/core/v3.0.rst>`_
@@ -1129,6 +1129,9 @@ interface`_ subsection. The store interface can be implemented using a
variety of underlying storage technologies, described in the
subsection on `Store implementations`_.

Additionally, a store should specify a canonical URI format that can be used to
identify nodes in this store. Implementations should use the specified formats
when opening a Zarr hierarchy to automatically determine the appropriate store.

.. _abstract-store-interface:

@@ -1396,125 +1399,111 @@ array, the prefix is the empty string. For a non-root array with hierarchy path
- `(1, 0)`
- `foo/baz/c/1/0`

It is recommended that the root of a Zarr fileset ends with ``.zarr``
to indicate the start of a hierarchy to users.


Operations
----------

.. todo::
The following section describes possible operations of an implementation
as a guide-line. Those descriptions are not yet finalized.
The following section describes possible operations of an implementation as a
non-normative guideline.

Let `P` be an arbitrary hierarchy path.

Let ``array_meta_key(P)`` be the array metadata key for `P`. Let
``group_meta_key(P)`` be the group metadata key for `P`.
Let ``meta_key(P)`` be the metadata key for `P`, ``P/zarr.json``.

Let ``data_key(P, j, i ...)`` be the data key for `P` for the chunk
with grid coordinates (`j`, `i`, ...).

Let "+" be the string concatenation operator.
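
As a rough illustration of these definitions, the key helpers might be sketched in Python as follows. The helper name ``key_prefix`` and the normalisation of the hierarchy path are assumptions made for illustration, and ``data_key`` assumes the ``c``-prefixed, ``/``-separated chunk keys seen in the example ``foo/baz/c/1/0``.

```python
def key_prefix(path):
    """Map a hierarchy path such as "/foo/baz" to the key prefix "foo/baz/".

    The root path "/" maps to the empty string. (Hypothetical helper.)
    """
    stripped = path.strip("/")
    return stripped + "/" if stripped else ""


def meta_key(path):
    """Metadata key for the node at `path`, i.e. P/zarr.json."""
    return key_prefix(path) + "zarr.json"


def data_key(path, *coords):
    """Data key for the chunk with the given grid coordinates."""
    return key_prefix(path) + "/".join(["c", *map(str, coords)])
```

With these helpers, ``data_key("/foo/baz", 1, 0)`` yields the key ``foo/baz/c/1/0`` used in the example above.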

.. note::

Store and implementation can assume that a client will not try to
create both an *array* and *group* at the same path, and thus
may skip check of existence of a group/array of the same name.

**Create a group**

To create an explicit group at hierarchy path `P`, perform
``set(group_meta_key(P), value)``, where `value` is the
``set(meta_key(P), value)``, where `value` is the
serialization of a valid group metadata document.

If `P` is a non-root path then it is **not** necessary to create
or check for the existence of metadata documents for groups at any
of the ancestor paths of `P`. Creating a group at path `P` implies
Creating a group at path `P` implies
the existence of groups at all ancestor paths of `P`.

**Create an array**

To create an array at hierarchy path `P`, perform
``set(array_meta_key(P), value)``, where `value` is the
serialisation of a valid array metadata document.
``set(meta_key(P), value)``, where `value` is the serialisation of a valid
array metadata document.

If `P` is a non-root path then it is **not** necessary to create
or check for the existence of metadata documents for groups at any
of the ancestor paths of `P`. Creating an array at path `P`
implies the existence of groups at all ancestor paths of `P`.
Creating an array at path `P` implies the existence of groups at all
ancestor paths of `P`.

**Store chunk data in an array**

To store chunk data in an array at path `P` and chunk coordinate (`j`, `i`,
...), perform ``set(data_key(P, j, i, ...), value)``, where
`value` is the serialisation of the corresponding chunk, encoded
according to the information in the array metadata stored under
the key ``array_meta_key(P)``.
...), perform ``set(data_key(P, j, i, ...), value)``, where `value` is the
serialisation of the corresponding chunk, encoded according to the
information in the array metadata stored under the key ``meta_key(P)``.

**Retrieve chunk data in an array**

To retrieve chunk data in an array at path `P` and chunk coordinate (`j`, `i`,
...), perform ``get(data_key(P, j, i, ...))``. The returned
value is the serialisation of the corresponding chunk, encoded
according to the array metadata stored at ``array_meta_key(P)``.
value is the serialisation of the corresponding chunk, encoded according to
the array metadata stored at ``meta_key(P)``.
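
The create, store and retrieve operations above can be sketched against a toy in-memory store. ``DictStore`` is a hypothetical class, not part of any Zarr implementation, and the metadata document is deliberately abbreviated, so it is not a valid array metadata document.

```python
import json


class DictStore:
    """Hypothetical in-memory store exposing only get and set."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = bytes(value)

    def get(self, key):
        return self._data[key]


store = DictStore()

# Create an array at "/foo/baz": a single set() on its metadata key.
# Abbreviated document for illustration only.
array_meta = {"zarr_format": 3, "node_type": "array"}
store.set("foo/baz/zarr.json", json.dumps(array_meta).encode("utf-8"))

# Store the chunk at grid coordinates (1, 0); these bytes stand in for
# a chunk encoded according to the array metadata.
encoded_chunk = b"\x00" * 16
store.set("foo/baz/c/1/0", encoded_chunk)

# Retrieve: get() takes only the key and returns the encoded bytes.
roundtrip = store.get("foo/baz/c/1/0")
```

No metadata is written for ``/foo`` or ``/``; those groups exist only implicitly.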

**Discover children of a group**

To discover the children of a group at hierarchy path `P`, perform
``list_dir(P + "/")``. Any returned prefix not being ``c`` or
starting with ``__`` indicates a child group implied by some
descendant group or array.
``list_dir(P + "/")``. Any returned prefix not starting with ``__``
indicates a child group implied by some descendant group or array.

For example, if a group is created at path "/foo/bar" and an array
is created at path "/foo/baz/qux", then the store will contain the
keys "foo/bar/zarr.json" and "foo/baz/qux/zarr.json".
Groups at paths "/", "/foo" and "/foo/baz" have not been explicitly
created but are implied by their descendants. To list the children
of the group at path "/foo", perform ``list_dir("meta/foo/")``,
which will return the prefixes "meta/foo/bar" and "meta/foo/baz".
of the group at path "/foo", perform ``list_dir("/foo/")``,
which will return the prefixes "foo/bar" and "foo/baz".
From this it can be inferred that child groups or arrays
"/foo/bar" and "/foo/baz" are present.

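The example above can be reproduced with a minimal sketch of ``list_dir`` over a plain collection of keys. The function signature, the tuple return value and the stripping of the leading ``/`` are assumptions for illustration; a real store defines its own interface.

```python
def list_dir(keys, prefix):
    """Return (keys, prefixes) found directly under `prefix`.

    A key is returned if nothing after the prefix contains "/";
    otherwise only its first path segment is reported as a prefix.
    """
    prefix = prefix.lstrip("/")  # store keys carry no leading slash
    direct_keys, child_prefixes = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        if sep:
            child_prefixes.add(prefix + head)
        else:
            direct_keys.add(key)
    return sorted(direct_keys), sorted(child_prefixes)


store_keys = ["foo/bar/zarr.json", "foo/baz/qux/zarr.json"]
direct_keys, child_prefixes = list_dir(store_keys, "/foo/")
```

Here ``child_prefixes`` contains ``foo/bar`` and ``foo/baz``, and ``direct_keys`` is empty because ``/foo`` is an implicit group without its own ``zarr.json``.
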
If a store does not support any of the list operations then
discovery of group children is not possible, and the contents of
the hierarchy must be communicated by some other means, such as
via an extension, or via some out of band communication.
If a store does not support any of the list operations then discovery of
group children is not possible, and the contents of the hierarchy must be
communicated by some other means, such as via an extension (see
https://github.com/zarr-developers/zarr-specs/issues/15) or via some out of
band communication.

**Discover all nodes in a hierarchy**

To discover all nodes in a hierarchy, one can call
``list_prefix("meta/")``. All keys represent either explicit group or
arrays. All intermediate prefixes ending in a ``/`` are implicit
To discover all nodes in a hierarchy, one should discover the children of
the root of the hierarchy and then recursively list children of child
groups.

For hierarchies without group storage transformers one may also call
``list_prefix("/")``. All ``zarr.json`` keys represent either explicit
groups or arrays. All intermediate prefixes ending in a ``/`` are implicit
groups.

**Erase a group or array**

To erase an array at path `P`:
- erase the metadata document for the array, ``erase(array_meta_key(P))``
- erase all data keys which prefix have path pointing to this array,
``erase_prefix("data" + P + "/")``
To erase an array at path `P`, erase the metadata document and array data
for the array, ``erase_prefix(P + "/")``.

To erase an implicit group at path `P`:
- erase all nodes under this group - it should be sufficient to
perform ``erase_prefix("meta" + P + "/")`` and
``erase_prefix("data" + P + "/")``.

To erase an explicit group at path `P`:
- erase the metadata document for the group, ``erase(group_meta_key(P))``
- erase all nodes under this group - it should be sufficient to
perform ``erase_prefix("meta" + P + "/")`` and
``erase_prefix("data" + P + "/")``.
To erase an explicit or implicit group at path `P`: erase all nodes under
this group and its metadata document - it should be sufficient to perform
``erase_prefix(P + "/")``.
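
A minimal sketch of this single-call erase, assuming the store contents are held in a plain ``dict`` (the function signature is hypothetical):

```python
def erase_prefix(data, prefix):
    """Delete every key in `data` that starts with `prefix`."""
    for key in [k for k in data if k.startswith(prefix)]:
        del data[key]


data = {
    "foo/bar/zarr.json": b"{}",
    "foo/baz/zarr.json": b"{}",
    "foo/baz/c/0/0": b"\x00",
    "foo/bazaar/zarr.json": b"{}",
}

# Erase the array at "/foo/baz": metadata and chunks go in one call.
erase_prefix(data, "foo/baz/")
remaining = sorted(data)
```

The trailing ``/`` matters: without it, the prefix ``foo/baz`` would also match the unrelated sibling ``foo/bazaar``.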

**Determine if a node exists**

To determine if a node exists at path ``P``, try in the following
order ``get(array_meta_key(P))`` (success implies an array at
``P``); ``get(group_meta_key(P))`` (success implies an explicit
group at ``P``); ``list_dir("meta" + P + "/")`` (non-empty
result set implies an implicit group at ``P``).
To determine if a node exists at path ``P``, try the following in order:

- ``get(meta_key(P))``
(success implies an array or explicit group at ``P``);
- ``list_dir(P + "/")``
(non-empty result set implies an implicit group at ``P``).

.. note::
For listable store, ``list_dir(parent(P))`` can be an alternative.
For listable stores, ``list_dir(parent(P))`` can be an alternative.
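
The two-step check can be sketched as follows, again assuming a plain ``dict`` of keys. The function name and return values are hypothetical, and the prefix scan stands in for a ``list_dir`` call that returns a non-empty result.

```python
def node_kind(data, path):
    """Classify the node at `path` as explicit, implicit, or absent."""
    stripped = path.strip("/")
    prefix = stripped + "/" if stripped else ""
    # Step 1: a metadata document means an array or explicit group.
    if prefix + "zarr.json" in data:
        return "array or explicit group"
    # Step 2: any key below the prefix implies an implicit group.
    if any(key.startswith(prefix) for key in data):
        return "implicit group"
    return None


data = {"foo/bar/zarr.json": b"{}", "foo/baz/qux/zarr.json": b"{}"}
```

Distinguishing an array from an explicit group additionally requires reading the retrieved metadata document, which this sketch does not do.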


Storage transformers
@@ -1601,6 +1590,9 @@ a prefix will erase all the implicit group in the prefix.
Care must thus be taken when erasing an array or a group if the parent needs to
be converted into an explicit group.

A race condition arises if one client writes an array at path ``P`` while
another concurrently assumes ``P`` is an implicit group and writes subgroups or arrays into it.

Resizing
--------

16 changes: 16 additions & 0 deletions docs/v3/stores/filesystem/v1.0.rst
@@ -182,6 +182,19 @@ in the section above.
directory path ``dp``.


Canonical URI
=============

The canonical URI format for this store follows the file URI scheme defined in
[RFC8089]_. In the common case where the hostname is omitted this is
``"file:///" + P``, where `P` is the base directory path, e.g.
``"file:///C:\\data\\foo\\bar"``.

In cases where a filesystem store may be considered the default, the
``"file:///"`` prefix can be omitted and only the base directory path is used,
possibly with a leading ``/`` for POSIX file systems.
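
One way an implementation might recognise these forms when opening a hierarchy is sketched below. The function name is hypothetical. The sketch applies the ``"file:///" + P`` formula literally, so it does no percent-decoding, does not handle the form with a hostname, and does not restore a leading ``/`` for POSIX paths.

```python
def filesystem_base_path(location):
    """Return the base directory path if `location` names a
    filesystem store, otherwise None."""
    file_prefix = "file:///"
    if location.startswith(file_prefix):
        # Canonical form with the hostname omitted: "file:///" + P.
        return location[len(file_prefix):]
    if "://" in location:
        # Some other URI scheme, so a different store is responsible.
        return None
    # No scheme: treat the location as a bare base directory path,
    # i.e. the filesystem store is considered the default.
    return location
```

For the example above, ``filesystem_base_path("file:///C:\\data\\foo\\bar")`` returns the base directory path ``C:\data\foo\bar``.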


Store limitations
=================

@@ -199,6 +212,9 @@ References
Requirement Levels. March 1997. Best Current Practice. URL:
https://tools.ietf.org/html/rfc2119

.. [RFC8089] M. Kerwin. The "file" URI Scheme. February 2017. Proposed Standard.
URL: https://tools.ietf.org/html/rfc8089


Change log
==========