
Commit a418508

jhamman, normanrz, dstansby and d-v-b authored
docs: add docs on extending zarr 3 (#2597)
* docs: add docs on extending zarr 3
* Apply suggestions from code review
  Co-authored-by: David Stansby <[email protected]>
* move note up
* remove test.py (#2612)
* Note that whole directories can be deleted in LocalStore (#2606)
* fix: run-coverage command now tracks src directory (#2615)
* fix doc build
* Update docs/user-guide/extending.rst

---------

Co-authored-by: Norman Rzepka <[email protected]>
Co-authored-by: David Stansby <[email protected]>
Co-authored-by: Davis Bennett <[email protected]>
1 parent 7c163e8 commit a418508

6 files changed

+110 -13 lines changed

Diff for: docs/user-guide/arrays.rst

+2-2
@@ -196,7 +196,7 @@ algorithm (compression level 3) internally within Blosc, and with the
 bit-shuffle filter applied.
 
 When using a compressor, it can be useful to get some diagnostics on the
-compression ratio. Zarr arrays provide the :property:`zarr.Array.info` property
+compression ratio. Zarr arrays provide the :attr:`zarr.Array.info` property
 which can be used to print useful diagnostics, e.g.:
 
 .. ipython:: python
@@ -212,7 +212,7 @@ prints additional diagnostics, e.g.:
 
 .. note::
     :func:`zarr.Array.info_complete` will inspect the underlying store and may
-    be slow for large arrays. Use :property:`zarr.Array.info` if detailed storage
+    be slow for large arrays. Use :attr:`zarr.Array.info` if detailed storage
     statistics are not needed.
 
 If you don't specify a compressor, by default Zarr uses the Blosc

Diff for: docs/user-guide/extending.rst

+91
@@ -0,0 +1,91 @@
+
+Extending Zarr
+==============
+
+Zarr-Python 3 was designed to be extensible. This means that you can extend
+the library by writing custom classes and plugins. Currently, Zarr can be extended
+in the following ways:
+
+Custom codecs
+-------------
+
+.. note::
+    This section explains how custom codecs can be created for Zarr version 3 data. For Zarr
+    version 2, codecs should subclass the
+    `numcodecs.abc.Codec <https://numcodecs.readthedocs.io/en/stable/abc.html#numcodecs.abc.Codec>`_
+    base class and register through
+    `numcodecs.registry.register_codec <https://numcodecs.readthedocs.io/en/stable/registry.html#numcodecs.registry.register_codec>`_.
+
+There are three types of codecs in Zarr:
+- array-to-array
+- array-to-bytes
+- bytes-to-bytes
+
+Array-to-array codecs are used to transform the array data before serializing
+to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs are used
+for serializing the array data to bytes. In Zarr, the main codec to use for numeric arrays
+is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized bytestreams
+of the array data. Examples include compression codecs, such as
+:class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or
+:class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as
+:class:`zarr.codecs.Crc32cCodec`.
+
+Custom codecs for Zarr are implemented by subclassing the relevant base class:
+:class:`zarr.abc.codec.ArrayArrayCodec`, :class:`zarr.abc.codec.ArrayBytesCodec` or
+:class:`zarr.abc.codec.BytesBytesCodec`. Most custom codecs should implement the
+``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks
+of the array data. Alternatively, custom codecs can implement the ``encode`` and ``decode``
+methods, which operate on batches of chunks, if the codec is intended to implement
+its own batch processing.
+
+Custom codecs should also implement the following methods:
+
+- ``compute_encoded_size``, which returns the byte size of the encoded data given the byte
+  size of the original data. It should raise ``NotImplementedError`` for codecs with
+  variable-sized outputs, such as compression codecs.
+- ``validate`` (optional), which can be used to check that the codec metadata is compatible with the
+  array metadata. It should raise errors if not.
+- ``resolve_metadata`` (optional), which is important for codecs that change the shape,
+  dtype or fill value of a chunk.
+- ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in
+  codec configuration metadata from the array metadata.
+
+To use custom codecs in Zarr, they need to be registered using the
+`entrypoint mechanism <https://packaging.python.org/en/latest/specifications/entry-points/>`_.
+Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the
+``[project.entry-points."zarr.codecs"]`` section. Zarr will automatically discover and
+load all codecs registered with the entrypoint mechanism from installed packages.
+
+.. code-block:: toml
+
+    [project.entry-points."zarr.codecs"]
+    "custompackage.fancy_codec" = "custompackage:FancyCodec"
+
+New codecs need to have their own unique identifier. To avoid naming collisions, it is
+strongly recommended to prefix the codec identifier with a unique name. For example,
+the codecs from ``numcodecs`` are prefixed with ``numcodecs.``, e.g. ``numcodecs.delta``.
+
+.. note::
+    The extension mechanism for Zarr version 3 is still under development.
+    Requirements for custom codecs, including the choice of codec identifiers, might
+    change in the future.
+
+It is also possible to register codecs as replacements for existing codecs. This might be
+useful for providing specialized implementations, such as GPU-based codecs. If multiple
+implementations are registered, the :mod:`zarr.core.config` mechanism can be used to select the preferred
+implementation.
+
+Custom stores
+-------------
+
+Coming soon.
+
+Custom array buffers
+--------------------
+
+Coming soon.
+
+Other extensions
+----------------
+
+In the future, Zarr will support writing custom data types and chunk grids.
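The three codec kinds described in the new extending.rst page are applied as a chain. The sketch below is stdlib-only and uses plain functions as hypothetical stand-ins (they are not Zarr classes or APIs): delta encoding for array-to-array, little-endian int64 packing for array-to-bytes, and zlib for bytes-to-bytes. The order follows the Zarr v3 codec chain (array-to-array codecs, then one array-to-bytes codec, then bytes-to-bytes codecs), and decoding runs the inverse steps in reverse.

```python
import struct
import zlib
from typing import List

# Hypothetical stand-ins for the three codec kinds; these are plain functions,
# not Zarr codec classes.


def delta_encode(values: List[int]) -> List[int]:
    """Array-to-array: keep the first value, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def to_bytes(values: List[int]) -> bytes:
    """Array-to-bytes: serialize as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(values)}q", *values)


def from_bytes(data: bytes) -> List[int]:
    """Inverse of to_bytes: 8 bytes per value."""
    return list(struct.unpack(f"<{len(data) // 8}q", data))


def encode_chunk(values: List[int]) -> bytes:
    # array -> array (delta), then array -> bytes (int64), then bytes -> bytes (zlib).
    return zlib.compress(to_bytes(delta_encode(values)))


def decode_chunk(data: bytes) -> List[int]:
    # Decoding runs the inverse codecs in reverse order.
    return delta_decode(from_bytes(zlib.decompress(data)))


chunk = list(range(1000, 1100))
encoded = encode_chunk(chunk)
print(len(to_bytes(chunk)), "raw bytes ->", len(encoded), "encoded bytes")
```

Delta encoding turns the slowly increasing sequence into a run of identical small values, which zlib compresses far better than the raw integers; that is the usual reason a filter sits in front of a compressor.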
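The split between per-chunk hooks (`_encode_single` / `_decode_single`) and batch methods (`encode` / `decode`) in the extending.rst text can be sketched without Zarr installed. `ToyBytesBytesCodec` below is a hypothetical stand-in, not `zarr.abc.codec.BytesBytesCodec`: its hooks take only the chunk bytes, and the real base-class signatures differ and are not reproduced here. Making the hooks coroutines and fanning them out with `asyncio.gather` is an assumption chosen to mirror Zarr-Python 3's async design.

```python
import asyncio
import zlib
from typing import Iterable, List, Optional


class ToyBytesBytesCodec:
    """Hypothetical stand-in for a bytes-to-bytes codec base class (not Zarr's)."""

    async def _encode_single(self, chunk: bytes) -> Optional[bytes]:
        raise NotImplementedError

    async def _decode_single(self, chunk: bytes) -> bytes:
        raise NotImplementedError

    async def encode(self, chunks: Iterable[Optional[bytes]]) -> List[Optional[bytes]]:
        # Default batch behaviour: run the per-chunk hook on every chunk
        # concurrently; missing chunks (None) pass through unchanged.
        async def one(chunk: Optional[bytes]) -> Optional[bytes]:
            return None if chunk is None else await self._encode_single(chunk)

        return list(await asyncio.gather(*(one(c) for c in chunks)))

    async def decode(self, chunks: Iterable[Optional[bytes]]) -> List[Optional[bytes]]:
        async def one(chunk: Optional[bytes]) -> Optional[bytes]:
            return None if chunk is None else await self._decode_single(chunk)

        return list(await asyncio.gather(*(one(c) for c in chunks)))


class ZlibCodec(ToyBytesBytesCodec):
    """Implements only the per-chunk hooks; batching is inherited."""

    def __init__(self, level: int = 6) -> None:
        self.level = level

    async def _encode_single(self, chunk: bytes) -> Optional[bytes]:
        # Compression is CPU-bound, so run it in a worker thread.
        return await asyncio.to_thread(zlib.compress, chunk, self.level)

    async def _decode_single(self, chunk: bytes) -> bytes:
        return await asyncio.to_thread(zlib.decompress, chunk)


async def roundtrip(chunks: List[Optional[bytes]]) -> List[Optional[bytes]]:
    codec = ZlibCodec(level=9)
    return await codec.decode(await codec.encode(chunks))


print(asyncio.run(roundtrip([b"abc" * 10, None, b"xyz"])))
```

Overriding `encode` / `decode` directly would be the route for a codec that can process a whole batch more efficiently than chunk by chunk, for example in one call to an accelerator.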
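To illustrate the `compute_encoded_size` contract listed in extending.rst, here is a hypothetical pair of codecs in plain Python with no Zarr imports. A checksum codec has fixed-size output, so it can report its encoded size exactly; a compression codec cannot, so it raises `NotImplementedError`. The checksum uses `zlib.crc32` (CRC-32), a different polynomial from the CRC-32C used by `zarr.codecs.Crc32cCodec`, and the method signatures are simplified and may not match Zarr's base classes.

```python
import struct
import zlib


class Crc32Checksum:
    """Hypothetical fixed-size codec: appends a 4-byte little-endian CRC-32."""

    def encode(self, data: bytes) -> bytes:
        return data + struct.pack("<I", zlib.crc32(data))

    def decode(self, data: bytes) -> bytes:
        if len(data) < 4:
            raise ValueError("chunk too short to contain a checksum")
        payload, (stored,) = data[:-4], struct.unpack("<I", data[-4:])
        if zlib.crc32(payload) != stored:
            raise ValueError("checksum mismatch")
        return payload

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # Output size is known in advance: the input plus 4 checksum bytes.
        return input_byte_length + 4


class ZlibCompression:
    """Hypothetical variable-size codec."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # The compressed size depends on the content, not just its length.
        raise NotImplementedError("zlib output size is not known in advance")


checksum = Crc32Checksum()
encoded = checksum.encode(b"zarr chunk")
print(len(encoded) == checksum.compute_encoded_size(len(b"zarr chunk")))  # True
```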
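The extending.rst text says the `zarr.core.config` mechanism can choose between several registered implementations of one codec. The sketch below models only that selection logic, with a plain dict standing in for the configuration. The registry layout, the class names `GzipCPU` / `GzipGPU` and the preference format (codec identifier mapped to a class name) are all hypothetical and are not Zarr's configuration schema.

```python
from typing import Dict, List, Mapping


# Hypothetical implementations registered under the same codec identifier.
class GzipCPU:
    pass


class GzipGPU:
    pass


# Hypothetical registry: codec identifier -> implementations in discovery order.
REGISTRY: Dict[str, List[type]] = {"gzip": [GzipCPU, GzipGPU]}


def select_implementation(
    name: str, registry: Mapping[str, List[type]], preferences: Mapping[str, str]
) -> type:
    """Pick the implementation for `name`, honouring a configured preference."""
    candidates = registry.get(name, [])
    if not candidates:
        raise KeyError(f"no codec registered under {name!r}")
    wanted = preferences.get(name)
    if wanted is None:
        # No preference configured: fall back to the first registration.
        return candidates[0]
    for cls in candidates:
        if cls.__qualname__ == wanted:
            return cls
    raise ValueError(f"{wanted!r} is not a registered implementation of {name!r}")


print(select_implementation("gzip", REGISTRY, {}).__name__)  # GzipCPU
print(select_implementation("gzip", REGISTRY, {"gzip": "GzipGPU"}).__name__)  # GzipGPU
```

Falling back to the first registration keeps the choice deterministic when nothing is configured; a real setup might prefer a built-in implementation instead, which this sketch does not model.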

Diff for: docs/user-guide/index.rst

+2-1
@@ -24,7 +24,8 @@ Advanced Topics
 
    performance
    consolidated_metadata
+   extending
+
 
 .. Coming soon
    async
-   extending

Diff for: pyproject.toml

+3-3
@@ -139,8 +139,8 @@ numpy = ["1.25", "2.1"]
 features = ["gpu"]
 
 [tool.hatch.envs.test.scripts]
-run-coverage = "pytest --cov-config=pyproject.toml --cov=pkg --cov=tests"
-run-coverage-gpu = "pip install cupy-cuda12x && pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=tests"
+run-coverage = "pytest --cov-config=pyproject.toml --cov=pkg --cov=src"
+run-coverage-gpu = "pip install cupy-cuda12x && pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=src"
 run = "run-coverage --no-cov"
 run-verbose = "run-coverage --verbose"
 run-mypy = "mypy src"
@@ -160,7 +160,7 @@ numpy = ["1.25", "2.1"]
 version = ["minimal"]
 
 [tool.hatch.envs.gputest.scripts]
-run-coverage = "pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=tests"
+run-coverage = "pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=src"
 run = "run-coverage --no-cov"
 run-verbose = "run-coverage --verbose"
 run-mypy = "mypy src"

Diff for: src/zarr/storage/local.py

+12
@@ -189,6 +189,18 @@ async def set_partial_values(
         await concurrent_map(args, asyncio.to_thread, limit=None) # TODO: fix limit
 
     async def delete(self, key: str) -> None:
+        """
+        Remove a key from the store.
+
+        Parameters
+        ----------
+        key : str
+
+        Notes
+        -----
+        If ``key`` is a directory within this store, the entire directory
+        at ``store.root / key`` is deleted.
+        """
         # docstring inherited
         self._check_writable()
         path = self.root / key
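The new Notes section says that deleting a key which names a directory removes the whole directory. Below is a minimal synchronous sketch of that behaviour for a plain directory tree, using `pathlib` and `shutil.rmtree`. It is not LocalStore's implementation, which is async and, as the diff shows, first calls `self._check_writable()`; the helper name `delete_key` is hypothetical, and silently ignoring a missing key is an assumption of this sketch.

```python
import shutil
import tempfile
from pathlib import Path


def delete_key(root: Path, key: str) -> None:
    """Remove `root / key`; a directory is removed together with its contents."""
    path = root / key
    if path.is_dir():
        shutil.rmtree(path)
    elif path.exists():
        path.unlink()
    # A key that does not exist is ignored (an assumption of this sketch).


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "group" / "array").mkdir(parents=True)
    (root / "group" / "array" / "zarr.json").write_text("{}")
    delete_key(root, "group")
    print((root / "group").exists())  # False
```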

Diff for: test.py

-7
This file was deleted.
