Compatibility with Zarr 3 #123

QuLogic · 2025-01-20T00:31:44Z

I've run a test build with Zarr 3, which fails 168 tests.

All of them appear to be variations of:

___________________________ test_numcodecs[lzma-rgb] ___________________________

codec = 'lzma', photometric = 'rgb'

    @pytest.mark.skipif(SKIP_NUMCODECS, reason='zarr or numcodecs missing')
    @pytest.mark.parametrize('photometric', ['gray', 'rgb'])
    @pytest.mark.parametrize(
        'codec',
        [...],
    )
    def test_numcodecs(codec, photometric):
        """Test numcodecs though roundtrips."""
        data = numpy.load(datafiles('rgb.u1.npy'))
        data = numpy.stack([data, data])
        if photometric == 'rgb':
            shape = data.shape
            chunks = (1, 128, 128, 3)
            axis = -2
        else:
            data = data[:, :, 1].copy()
            shape = data.shape
            chunks = (1, 128, 128)
            axis = -1
    
        lossless = True
        # ...
        # Many switches on parameters
        # ...

        if 0:
            # use ZIP file on disk
            fname = f'test_{codec}.{photometric}.{data.dtype.str[1:]}.zarr.zip'
            store = zarr.ZipStore(fname, mode='w')
        else:
>           store = zarr.MemoryStore()
E           AttributeError: module 'zarr' has no attribute 'MemoryStore'

tests/test_imagecodecs.py:3939: AttributeError

See https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html for further details.

The text was updated successfully, but these errors were encountered:

QuLogic · 2025-01-20T00:40:00Z

I actually realized the migration guide doesn't mention zarr.MemoryStore, so I've reported the above-linked issue to Zarr to clarify how to migrate.

QuLogic · 2025-01-20T01:28:11Z

Sorry, Fedora was a bit behind on releases; I've re-built with the latest version https://copr.fedorainfracloud.org/coprs/qulogic/zarr3/build/8547624/ There are only 97 failures in that case, but still of the same style.

cgohlke · 2025-01-20T01:37:30Z

I am aware of the issue.

Technically it is that Zarr 3 that no longer supports numcodecs. Imagecodecs itself does not use zarr anywhere (except for testing) and does not claim to support zarr 3:

imagecodecs/imagecodecs/imagecodecs.py

Line 36 in 988af50

Czifile, Zarr 2, kerchunk, and other scientific image input/output packages.

imagecodecs/imagecodecs/imagecodecs.py

Lines 85 to 86 in 988af50

    
           - `numcodecs <https://pypi.org/project/numcodecs/>`_ 0.14.1 
        
             (optional, for Zarr 2 compatible codecs)

imagecodecs/setup.py

Line 766 in 988af50

'zarr<3',

Zarr 3 codecs are much more complicated than numcodecs codecs. They include multi-class inheritance, mixins, dataclasses, async, type hints, partial reads, and whatnot. It's going to be a major task to implement and test zarr 3 compatible codecs for all 64 imagecodecs codecs.

QuLogic · 2025-01-20T08:21:09Z

Imagecodecs itself does not use zarr anywhere (except for testing) and does not claim to support zarr 3:

I'm aware; this is a preliminary check before doing any updating.

Zarr 3 codecs are much more complicated than numcodecs codecs. They include multi-class inheritance, mixins, dataclasses, async, type hints, partial reads, and whatnot. It's going to be a major task to implement and test zarr 3 compatible codecs for all 64 imagecodecs codecs.

It appears that this is not specifically Zarr 3, but Zarr file format 3. If you pass zarr_format=2 to create, then at least construction doesn't complain, and cuts down failures to 79. Unfortunately, there's a type mismatch (imagecodecs.*_decode seems to expect uint8_t, but the input is signed char), and I haven't worked out where in the path this arises and thus if it's a bug or not (or with who).

QuLogic · 2025-01-20T10:09:20Z

Unfortunately, there's a type mismatch (imagecodecs.*_decode seems to expect uint8_t, but the input is signed char), and I haven't worked out where in the path this arises and thus if it's a bug or not (or with who).

I believe this is a bug in Zarr, as in zarr-developers/zarr-python#2735. If I patch in the correct signedness, then I'm down to 9 test failures of imcd_byteshuffle returned IMCD_VALUE_ERROR (though Fedora doesn't run all codecs yet.)

cgohlke · 2025-01-20T14:03:54Z

I might have misunderstood this issue: is the objective merely getting the imagecodecs tests to pass with Zarr 3 installed? Or to support Zarr 3, that is (for me at least) provide Zarr format 3 compatible codecs?

cgohlke · 2025-01-20T14:25:10Z

These are the remaining test failures with patched test_imagecodecs.py and patched zarr 3.0.1:

================================================================================= short test summary info =================================================================================
FAILED tests/test_imagecodecs.py::test_numcodecs[byteshuffle-gray] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[byteshuffle-rgb] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[byteshuffle-stack] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[bz2-gray] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED tests/test_imagecodecs.py::test_numcodecs[bz2-rgb] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED tests/test_imagecodecs.py::test_numcodecs[bz2-stack] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED tests/test_imagecodecs.py::test_numcodecs[floatpred-gray] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[floatpred-rgb] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[floatpred-stack] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
======================================================================== 9 failed, 137 passed, 13 xfailed in 8.55s ========================================================================

The pglz codec tests hang.

cgohlke · 2025-01-20T15:50:41Z

The test failures and hangs with patched Zarr 3 are fixed by:

diff --git a/imagecodecs/numcodecs.py b/imagecodecs/numcodecs.py
index 8113337..4cfb0cc 100644
--- a/imagecodecs/numcodecs.py
+++ b/imagecodecs/numcodecs.py
@@ -414,8 +480,7 @@ class Byteshuffle(Codec):
         ).tobytes()
 
     def decode(self, buf, out=None):
-        if not isinstance(buf, numpy.ndarray):
-            buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
+        buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
         return imagecodecs.byteshuffle_decode(
             buf,
             axis=self.axis,
@@ -770,8 +835,7 @@ class Floatpred(Codec):
         ).tobytes()
 
     def decode(self, buf, out=None):
-        if not isinstance(buf, numpy.ndarray):
-            buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
+        buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
         return imagecodecs.floatpred_decode(
             buf, axis=self.axis, dist=self.dist, out=out
         )
diff --git a/imagecodecs/_bz2.pyx b/imagecodecs/_bz2.pyx
index 5f1ba07..3d142af 100644
--- a/imagecodecs/_bz2.pyx
+++ b/imagecodecs/_bz2.pyx
@@ -101,7 +101,7 @@ def bz2_encode(data, level=None, out=None):
     if out is None and dstsize < 0:
         # use Python's bz2 module
         import bz2
-        return bz2.compress(data, compresslevel)
+        return bz2.compress(bytes(data), compresslevel)
 
     if out is None:
         if dstsize < 0:
@@ -158,7 +158,7 @@ def bz2_decode(data, out=None):
         # use Python's bz2 module
         import bz2
 
-        return bz2.decompress(data)
+        return bz2.decompress(bytes(data))
 
     if out is None:
         if dstsize < 0:
diff --git a/imagecodecs/_pglz.pyx b/imagecodecs/_pglz.pyx
index fef8d01..a2c4ddf 100644
--- a/imagecodecs/_pglz.pyx
+++ b/imagecodecs/_pglz.pyx
@@ -120,13 +120,14 @@ def pglz_encode(
     if dstsize < PGLZ_MAX_OUTPUT(srcsize):
         raise ValueError('output too small')
 
-    with nogil:
-        ret = pglz_compress(
-            <const char*> &src[0],
-            <int32> srcsize,
-            <char*> &dst[offset],
-            pglz_strategy
-        )
+    # pglz_compress is not thread-safe
+    # with nogil:
+    ret = pglz_compress(
+        <const char*> &src[0],
+        <int32> srcsize,
+        <char*> &dst[offset],
+        pglz_strategy
+    )
     if header:
         pdst = <uint8_t*> &dst[0]
         pdst[0] = srcsize & 255
diff --git a/tests/test_imagecodecs.py b/tests/test_imagecodecs.py
index cc786c5..dcc1354 100644
--- a/tests/test_imagecodecs.py
+++ b/tests/test_imagecodecs.py
@@ -5277,7 +5280,11 @@ def test_numcodecs(codec, photometric):
         fname = f'test_{codec}.{photometric}.{data.dtype.str[1:]}.zarr.zip'
         store = zarr.ZipStore(fname, mode='w')
     else:
-        store = zarr.MemoryStore()
+        try:
+            store = zarr.MemoryStore()
+        except AttributeError:
+            # zarr 3
+            store = zarr.storage.MemoryStore()
     z = zarr.create(
         store=store,
         overwrite=True,
@@ -5285,6 +5292,7 @@ def test_numcodecs(codec, photometric):
         chunks=chunks,
         dtype=data.dtype.str,
         compressor=compressor,
+        zarr_format=2,
     )
     z[:] = data
     del z

cgohlke · 2025-01-20T15:54:32Z

The imagecodecs doctests fail with Zarr 3 due to tifffile's Zarr stores not being compatible (but that's an entirely other issue).

QuLogic · 2025-01-21T02:06:14Z

I might have misunderstood this issue: is the objective merely getting the imagecodecs tests to pass with Zarr 3 installed? Or to support Zarr 3, that is (for me at least) provide Zarr format 3 compatible codecs?

I only meant Zarr 3 the library, not Zarr 3 the file format. Sorry if that was unclear.

The test failures and hangs with patched Zarr 3 are fixed by:

I've made roughly the same changes, and it's good to see that that full test suite is not that far off from what Fedora runs. The only one I didn't see was the pglz hang. However, I did make the changes in a way as to reduce copies. I can open a PR to review if you want.

The imagecodecs doctests fail with Zarr 3 due to tifffile's Zarr stores not being compatible (but that's an entirely other issue).

And this one I didn't see; I suppose that Fedora doesn't run doctests.

cgohlke · 2025-01-21T02:50:57Z

I did make the changes in a way as to reduce copies. I can open a PR to review if you want.

Sure. Thank you.

cgohlke · 2025-01-21T02:53:26Z

I suppose that Fedora doesn't run doctests.

Don't worry. It's just to make sure the examples are actually executable:

imagecodecs/imagecodecs/imagecodecs.py

Lines 360 to 467 in 988af50

    
           Examples 
        
           -------- 
        
           Import the JPEG2K codec: 
        
           >>> from imagecodecs import ( 
        
           ...     jpeg2k_encode, 
        
           ...     jpeg2k_decode, 
        
           ...     jpeg2k_check, 
        
           ...     jpeg2k_version, 
        
           ...     JPEG2K, 
        
           ... ) 
        
           Check that the JPEG2K codec is available in the imagecodecs build: 
        
           >>> JPEG2K.available 
        
           True 
        
           Print the version of the JPEG2K codec's underlying OpenJPEG library: 
        
           >>> jpeg2k_version() 
        
           'openjpeg 2.5.3' 
        
           Encode a numpy array in lossless JP2 format: 
        
           >>> array = numpy.random.randint(100, 200, (256, 256, 3), numpy.uint8) 
        
           >>> encoded = jpeg2k_encode(array, level=0) 
        
           >>> bytes(encoded[:12]) 
        
           b'\x00\x00\x00\x0cjP  \r\n\x87\n' 
        
           Check that the encoded bytes likely contain a JPEG 2000 stream: 
        
           >>> jpeg2k_check(encoded) 
        
           True 
        
           Decode the JP2 encoded bytes to a numpy array: 
        
           >>> decoded = jpeg2k_decode(encoded) 
        
           >>> numpy.array_equal(decoded, array) 
        
           True 
        
           Decode the JP2 encoded bytes to an existing numpy array: 
        
           >>> out = numpy.empty_like(array) 
        
           >>> _ = jpeg2k_decode(encoded, out=out) 
        
           >>> numpy.array_equal(out, array) 
        
           True 
        
           Not all codecs are fully implemented, raising exceptions at runtime: 
        
           >>> from imagecodecs import tiff_encode 
        
           >>> tiff_encode(array) 
        
           Traceback (most recent call last): 
        
            ... 
        
           NotImplementedError: tiff_encode 
        
           Write the numpy array to a JP2 file: 
        
           >>> from imagecodecs import imwrite, imread 
        
           >>> imwrite('_test.jp2', array) 
        
           Read the image from the JP2 file as numpy array: 
        
           >>> image = imread('_test.jp2') 
        
           >>> numpy.array_equal(image, array) 
        
           True 
        
           Create a JPEG 2000 compressed Zarr 2 array: 
        
           >>> import zarr 
        
           >>> import numcodecs 
        
           >>> from imagecodecs.numcodecs import Jpeg2k 
        
           >>> numcodecs.register_codec(Jpeg2k) 
        
           >>> zarr.zeros( 
        
           ...     (4, 5, 512, 512, 3), 
        
           ...     chunks=(1, 1, 256, 256, 3), 
        
           ...     dtype='u1', 
        
           ...     compressor=Jpeg2k(), 
        
           ... ) 
        
           <zarr.core.Array (4, 5, 512, 512, 3) uint8> 
        
           Access image data in a sequence of JP2 files via tifffile.FileSequence and 
        
           dask.array: 
        
           >>> import tifffile 
        
           >>> import dask.array 
        
           >>> def jp2_read(filename): 
        
           ...     with open(filename, 'rb') as fh: 
        
           ...         data = fh.read() 
        
           ...     return jpeg2k_decode(data) 
        
           ... 
        
           >>> with tifffile.FileSequence(jp2_read, '*.jp2') as ims: 
        
           ...     with ims.aszarr() as store: 
        
           ...         dask.array.from_zarr(store) 
        
           ... 
        
           dask.array<from-zarr, shape=(1, 256, 256, 3)...chunksize=(1, 256, 256, 3)... 
        
           Write the Zarr 2 store to a fsspec ReferenceFileSystem in JSON format 
        
           and open it as a Zarr array: 
        
           >>> store.write_fsspec( 
        
           ...     'temp.json', url='file://', codec_id='imagecodecs_jpeg2k' 
        
           ... ) 
        
           >>> import fsspec 
        
           >>> mapper = fsspec.get_mapper( 
        
           ...     'reference://', fo='temp.json', target_protocol='file' 
        
           ... ) 
        
           >>> zarr.open(mapper, mode='r') 
        
           <zarr.core.Array (1, 256, 256, 3) uint8 read-only>

cgohlke · 2025-01-21T17:00:18Z

This has now been fixed in the development version, except where Zarr provides int8 arrays as "buffer" contrary to Python/Cython norms of uint8.

QuLogic mentioned this issue Jan 20, 2025

DOC: Top-level Store removal not in v3 migration guide zarr-developers/zarr-python#2733

Open

cgohlke added the enhancement New feature or request label Jan 20, 2025

QuLogic mentioned this issue Jan 20, 2025

Buffer uses signed bytes with v2 compressors zarr-developers/zarr-python#2735

Open

QuLogic mentioned this issue Jan 21, 2025

Add Zarr3 compatibility for testing #124

Closed

cgohlke closed this as completed Jan 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compatibility with Zarr 3 #123

Compatibility with Zarr 3 #123

QuLogic commented Jan 20, 2025

QuLogic commented Jan 20, 2025

QuLogic commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025 •

edited

Loading

QuLogic commented Jan 20, 2025

QuLogic commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025

cgohlke commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025

QuLogic commented Jan 21, 2025

cgohlke commented Jan 21, 2025

cgohlke commented Jan 21, 2025

cgohlke commented Jan 21, 2025

Compatibility with Zarr 3 #123

Compatibility with Zarr 3 #123

Comments

QuLogic commented Jan 20, 2025

QuLogic commented Jan 20, 2025

QuLogic commented Jan 20, 2025 • edited Loading

cgohlke commented Jan 20, 2025 • edited Loading

QuLogic commented Jan 20, 2025

QuLogic commented Jan 20, 2025 • edited Loading

cgohlke commented Jan 20, 2025 • edited Loading

cgohlke commented Jan 20, 2025

cgohlke commented Jan 20, 2025 • edited Loading

cgohlke commented Jan 20, 2025

QuLogic commented Jan 21, 2025

cgohlke commented Jan 21, 2025

cgohlke commented Jan 21, 2025

cgohlke commented Jan 21, 2025

QuLogic commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025 •

edited

Loading

QuLogic commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025 •

edited

Loading

cgohlke commented Jan 20, 2025 •

edited

Loading