| 1 | + |
| 2 | +Extending Zarr |
| 3 | +============== |
| 4 | + |
| 5 | +Zarr-Python 3 was designed to be extensible. This means that you can extend |
| 6 | +the library by writing custom classes and plugins. Currently, Zarr can be extended |
| 7 | +in the following ways: |
| 8 | + |
| 9 | +Custom codecs |
| 10 | +------------- |
| 11 | + |
| 12 | +.. note:: |
| 13 | + This section explains how custom codecs can be created for Zarr version 3 data. For Zarr |
| 14 | + version 2, codecs should subclass the |
| 15 | + `numcodecs.abc.Codec <https://numcodecs.readthedocs.io/en/stable/abc.html#numcodecs.abc.Codec>`_ |
| 16 | + base class and register through |
| 17 | + `numcodecs.registry.register_codec <https://numcodecs.readthedocs.io/en/stable/registry.html#numcodecs.registry.register_codec>`_. |
| 18 | + |
| 19 | +There are three types of codecs in Zarr: |
| 20 | +- array-to-array |
| 21 | +- array-to-bytes |
| 22 | +- bytes-to-bytes |
| 23 | + |
| 24 | +Array-to-array codecs are used to transform the array data before serializing |
| 25 | +to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs are used |
| 26 | +for serializing the array data to bytes. In Zarr, the main codec to use for numeric arrays |
| 27 | +is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized bytestreams |
| 28 | +of the array data. Examples include compression codecs, such as |
| 29 | +:class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or |
| 30 | +:class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as |
| 31 | +:class:`zarr.codecs.Crc32cCodec`. |
| 32 | + |
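The three codec stages described above can be illustrated with a minimal, standard-library-only sketch. It does not use Zarr's codec classes: the function names (``delta_encode``, ``to_bytes``, ``compress`` and so on) are hypothetical, and the chunk is assumed to be a flat list of integers stored as little-endian 64-bit values.

```python
import struct
import zlib


def delta_encode(values):
    """Array-to-array step: keep the first value, then store differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum restores the original values."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def to_bytes(values):
    """Array-to-bytes step: serialize as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(values)}q", *values)


def from_bytes(data):
    """Inverse of to_bytes."""
    count = len(data) // 8
    return list(struct.unpack(f"<{count}q", data))


def compress(data):
    """Bytes-to-bytes step: compress the serialized bytestream."""
    return zlib.compress(data)


def decompress(data):
    """Inverse of compress."""
    return zlib.decompress(data)


def encode_chunk(values):
    # Stages run in order: array-to-array, array-to-bytes, bytes-to-bytes.
    return compress(to_bytes(delta_encode(values)))


def decode_chunk(blob):
    # Decoding applies the inverse stages in reverse order.
    return delta_decode(from_bytes(decompress(blob)))


chunk = [100, 101, 103, 106, 110]
assert decode_chunk(encode_chunk(chunk)) == chunk
```

Decoding reverses the order of the stages, which is why each stage in this sketch is paired with an exact inverse.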
| 33 | +Custom codecs for Zarr are implemented by subclassing the relevant base class: |
| 34 | +:class:`zarr.abc.codec.ArrayArrayCodec`, :class:`zarr.abc.codec.ArrayBytesCodec` and |
| 35 | +:class:`zarr.abc.codec.BytesBytesCodec`. Most custom codecs should implement the |
| 36 | +``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks |
| 37 | +of the array data. Alternatively, custom codecs can implement the ``encode`` and ``decode`` |
| 38 | +methods, which operate on batches of chunks, in case the codec is intended to implement |
| 39 | +its own batch processing. |
| 40 | + |
| 41 | +Custom codecs should also implement the following methods: |
| 42 | + |
| 43 | +- ``compute_encoded_size``, which returns the byte size of the encoded data given the byte |
| 44 | + size of the original data. It should raise ``NotImplementedError`` for codecs with |
| 45 | + variable-sized outputs, such as compression codecs. |
| 46 | +- ``validate`` (optional), which can be used to check that the codec metadata is compatible with the |
| 47 | + array metadata. It should raise errors if not. |
| 48 | +- ``resolve_metadata`` (optional), which is important for codecs that change the shape, |
| 49 | + dtype or fill value of a chunk. |
| 50 | +- ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in |
| 51 | + codec configuration metadata from the array metadata. |
| 52 | + |
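As a rough illustration of the ``compute_encoded_size`` contract described above, the following sketch contrasts a fixed-overhead codec with a variable-size one. It is an assumption-laden simplification: the classes ``ChecksumSketch`` and ``CompressionSketch`` are hypothetical, they do not subclass the Zarr base classes, and their method signatures are reduced to plain ``bytes`` arguments (the real base classes may expect additional arguments). The checksum uses ``zlib.crc32``, which is not the CRC32C variant.

```python
import struct
import zlib


class ChecksumSketch:
    """Hypothetical bytes-to-bytes codec that appends a 4-byte CRC-32."""

    def _encode_single(self, chunk: bytes) -> bytes:
        # Append the checksum as a little-endian unsigned 32-bit integer.
        return chunk + struct.pack("<I", zlib.crc32(chunk))

    def _decode_single(self, blob: bytes) -> bytes:
        payload, stored = blob[:-4], blob[-4:]
        if struct.pack("<I", zlib.crc32(payload)) != stored:
            raise ValueError("checksum mismatch")
        return payload

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # The output is always exactly 4 bytes longer than the input.
        return input_byte_length + 4


class CompressionSketch:
    """Hypothetical bytes-to-bytes codec with a data-dependent output size."""

    def _encode_single(self, chunk: bytes) -> bytes:
        return zlib.compress(chunk)

    def _decode_single(self, blob: bytes) -> bytes:
        return zlib.decompress(blob)

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # The compressed size cannot be derived from the input size alone.
        raise NotImplementedError


codec = ChecksumSketch()
encoded = codec._encode_single(b"chunk data")
assert len(encoded) == codec.compute_encoded_size(len(b"chunk data"))
assert codec._decode_single(encoded) == b"chunk data"
```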
| 53 | +To use custom codecs in Zarr, they need to be registered using the |
| 54 | +`entrypoint mechanism <https://packaging.python.org/en/latest/specifications/entry-points/>`_. |
| 55 | +Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the |
| 56 | +``[project.entry-points."zarr.codecs"]`` section. Zarr will automatically discover and |
| 57 | +load all codecs registered with the entrypoint mechanism from imported modules. |
| 58 | + |
| 59 | +.. code-block:: toml |
| 60 | + |
| 61 | + [project.entry-points."zarr.codecs"] |
| 62 | + "custompackage.fancy_codec" = "custompackage:FancyCodec" |
| 63 | + |
| 64 | +New codecs need to have their own unique identifier. To avoid naming collisions, it is |
| 65 | +strongly recommended to prefix the codec identifier with a unique name. For example, |
| 66 | +the codecs from ``numcodecs`` are prefixed with ``numcodecs.``, e.g. ``numcodecs.delta``. |
| 67 | + |
| 68 | +.. note:: |
| 69 | + The extension mechanism for Zarr version 3 is still under development. |
| 70 | + Requirements for custom codecs, including the choice of codec identifiers, might |
| 71 | + change in the future. |
| 72 | + |
| 73 | +It is also possible to register codecs as replacements for existing codecs. This might be |
| 74 | +useful for providing specialized implementations, such as GPU-based codecs. If multiple |
| 75 | +codecs share an identifier, the :mod:`zarr.core.config` mechanism can be used to select the preferred |
| 76 | +implementation. |
| 77 | + |
| 78 | +Custom stores |
| 79 | +------------- |
| 80 | + |
| 81 | +Coming soon. |
| 82 | + |
| 83 | +Custom array buffers |
| 84 | +-------------------- |
| 85 | + |
| 86 | +Coming soon. |
| 87 | + |
| 88 | +Other extensions |
| 89 | +---------------- |
| 90 | + |
| 91 | +In the future, Zarr will support writing custom data types and chunk grids. |