Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility with Zarr 3 #123

Closed
QuLogic opened this issue Jan 20, 2025 · 13 comments
Closed

Compatibility with Zarr 3 #123

QuLogic opened this issue Jan 20, 2025 · 13 comments
Labels
enhancement New feature or request

Comments

@QuLogic
Copy link

QuLogic commented Jan 20, 2025

I've run a test build with Zarr 3, which fails 168 tests.

All of them appear to be variations of:

___________________________ test_numcodecs[lzma-rgb] ___________________________

codec = 'lzma', photometric = 'rgb'

    @pytest.mark.skipif(SKIP_NUMCODECS, reason='zarr or numcodecs missing')
    @pytest.mark.parametrize('photometric', ['gray', 'rgb'])
    @pytest.mark.parametrize(
        'codec',
        [...],
    )
    def test_numcodecs(codec, photometric):
        """Test numcodecs though roundtrips."""
        data = numpy.load(datafiles('rgb.u1.npy'))
        data = numpy.stack([data, data])
        if photometric == 'rgb':
            shape = data.shape
            chunks = (1, 128, 128, 3)
            axis = -2
        else:
            data = data[:, :, 1].copy()
            shape = data.shape
            chunks = (1, 128, 128)
            axis = -1
    
        lossless = True
        # ...
        # Many switches on parameters
        # ...

        if 0:
            # use ZIP file on disk
            fname = f'test_{codec}.{photometric}.{data.dtype.str[1:]}.zarr.zip'
            store = zarr.ZipStore(fname, mode='w')
        else:
>           store = zarr.MemoryStore()
E           AttributeError: module 'zarr' has no attribute 'MemoryStore'

tests/test_imagecodecs.py:3939: AttributeError

See https://zarr.readthedocs.io/en/latest/user-guide/v3_migration.html for further details.

@QuLogic
Copy link
Author

QuLogic commented Jan 20, 2025

I actually realized the migration guide doesn't mention zarr.MemoryStore, so I've reported the above-linked issue to Zarr to clarify how to migrate.

@cgohlke cgohlke added the enhancement New feature or request label Jan 20, 2025
@QuLogic
Copy link
Author

QuLogic commented Jan 20, 2025

Sorry, Fedora was a bit behind on releases; I've re-built with the latest version https://copr.fedorainfracloud.org/coprs/qulogic/zarr3/build/8547624/ There are only 97 failures in that case, but still of the same style.

@cgohlke
Copy link
Owner

cgohlke commented Jan 20, 2025

I am aware of the issue.

Technically it is that Zarr 3 that no longer supports numcodecs. Imagecodecs itself does not use zarr anywhere (except for testing) and does not claim to support zarr 3:

Czifile, Zarr 2, kerchunk, and other scientific image input/output packages.

- `numcodecs <https://pypi.org/project/numcodecs/>`_ 0.14.1
(optional, for Zarr 2 compatible codecs)

'zarr<3',

Zarr 3 codecs are much more complicated than numcodecs codecs. They include multi-class inheritance, mixins, dataclasses, async, type hints, partial reads, and whatnot. It's going to be a major task to implement and test zarr 3 compatible codecs for all 64 imagecodecs codecs.

@QuLogic
Copy link
Author

QuLogic commented Jan 20, 2025

Imagecodecs itself does not use zarr anywhere (except for testing) and does not claim to support zarr 3:

I'm aware; this is a preliminary check before doing any updating.

Zarr 3 codecs are much more complicated than numcodecs codecs. They include multi-class inheritance, mixins, dataclasses, async, type hints, partial reads, and whatnot. It's going to be a major task to implement and test zarr 3 compatible codecs for all 64 imagecodecs codecs.

It appears that this is not specifically Zarr 3, but Zarr file format 3. If you pass zarr_format=2 to create, then at least construction doesn't complain, and cuts down failures to 79. Unfortunately, there's a type mismatch (imagecodecs.*_decode seems to expect uint8_t, but the input is signed char), and I haven't worked out where in the path this arises and thus if it's a bug or not (or with who).

@QuLogic
Copy link
Author

QuLogic commented Jan 20, 2025

Unfortunately, there's a type mismatch (imagecodecs.*_decode seems to expect uint8_t, but the input is signed char), and I haven't worked out where in the path this arises and thus if it's a bug or not (or with who).

I believe this is a bug in Zarr, as in zarr-developers/zarr-python#2735. If I patch in the correct signedness, then I'm down to 9 test failures of imcd_byteshuffle returned IMCD_VALUE_ERROR (though Fedora doesn't run all codecs yet.)

@cgohlke
Copy link
Owner

cgohlke commented Jan 20, 2025

I might have misunderstood this issue: is the objective merely getting the imagecodecs tests to pass with Zarr 3 installed? Or to support Zarr 3, that is (for me at least) provide Zarr format 3 compatible codecs?

@cgohlke
Copy link
Owner

cgohlke commented Jan 20, 2025

These are the remaining test failures with patched test_imagecodecs.py and patched zarr 3.0.1:

================================================================================= short test summary info =================================================================================
FAILED tests/test_imagecodecs.py::test_numcodecs[byteshuffle-gray] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[byteshuffle-rgb] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[byteshuffle-stack] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[bz2-gray] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED tests/test_imagecodecs.py::test_numcodecs[bz2-rgb] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED tests/test_imagecodecs.py::test_numcodecs[bz2-stack] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED tests/test_imagecodecs.py::test_numcodecs[floatpred-gray] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[floatpred-rgb] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
FAILED tests/test_imagecodecs.py::test_numcodecs[floatpred-stack] - imagecodecs._imcd.ImcdError: imcd_byteshuffle returned IMCD_VALUE_ERROR
======================================================================== 9 failed, 137 passed, 13 xfailed in 8.55s ========================================================================

The pglz codec tests hang.

@cgohlke
Copy link
Owner

cgohlke commented Jan 20, 2025

The test failures and hangs with patched Zarr 3 are fixed by:

diff --git a/imagecodecs/numcodecs.py b/imagecodecs/numcodecs.py
index 8113337..4cfb0cc 100644
--- a/imagecodecs/numcodecs.py
+++ b/imagecodecs/numcodecs.py
@@ -414,8 +480,7 @@ class Byteshuffle(Codec):
         ).tobytes()
 
     def decode(self, buf, out=None):
-        if not isinstance(buf, numpy.ndarray):
-            buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
+        buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
         return imagecodecs.byteshuffle_decode(
             buf,
             axis=self.axis,
@@ -770,8 +835,7 @@ class Floatpred(Codec):
         ).tobytes()
 
     def decode(self, buf, out=None):
-        if not isinstance(buf, numpy.ndarray):
-            buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
+        buf = numpy.frombuffer(buf, dtype=self.dtype).reshape(*self.shape)
         return imagecodecs.floatpred_decode(
             buf, axis=self.axis, dist=self.dist, out=out
         )
diff --git a/imagecodecs/_bz2.pyx b/imagecodecs/_bz2.pyx
index 5f1ba07..3d142af 100644
--- a/imagecodecs/_bz2.pyx
+++ b/imagecodecs/_bz2.pyx
@@ -101,7 +101,7 @@ def bz2_encode(data, level=None, out=None):
     if out is None and dstsize < 0:
         # use Python's bz2 module
         import bz2
-        return bz2.compress(data, compresslevel)
+        return bz2.compress(bytes(data), compresslevel)
 
     if out is None:
         if dstsize < 0:
@@ -158,7 +158,7 @@ def bz2_decode(data, out=None):
         # use Python's bz2 module
         import bz2
 
-        return bz2.decompress(data)
+        return bz2.decompress(bytes(data))
 
     if out is None:
         if dstsize < 0:
diff --git a/imagecodecs/_pglz.pyx b/imagecodecs/_pglz.pyx
index fef8d01..a2c4ddf 100644
--- a/imagecodecs/_pglz.pyx
+++ b/imagecodecs/_pglz.pyx
@@ -120,13 +120,14 @@ def pglz_encode(
     if dstsize < PGLZ_MAX_OUTPUT(srcsize):
         raise ValueError('output too small')
 
-    with nogil:
-        ret = pglz_compress(
-            <const char*> &src[0],
-            <int32> srcsize,
-            <char*> &dst[offset],
-            pglz_strategy
-        )
+    # pglz_compress is not thread-safe
+    # with nogil:
+    ret = pglz_compress(
+        <const char*> &src[0],
+        <int32> srcsize,
+        <char*> &dst[offset],
+        pglz_strategy
+    )
     if header:
         pdst = <uint8_t*> &dst[0]
         pdst[0] = srcsize & 255
diff --git a/tests/test_imagecodecs.py b/tests/test_imagecodecs.py
index cc786c5..dcc1354 100644
--- a/tests/test_imagecodecs.py
+++ b/tests/test_imagecodecs.py
@@ -5277,7 +5280,11 @@ def test_numcodecs(codec, photometric):
         fname = f'test_{codec}.{photometric}.{data.dtype.str[1:]}.zarr.zip'
         store = zarr.ZipStore(fname, mode='w')
     else:
-        store = zarr.MemoryStore()
+        try:
+            store = zarr.MemoryStore()
+        except AttributeError:
+            # zarr 3
+            store = zarr.storage.MemoryStore()
     z = zarr.create(
         store=store,
         overwrite=True,
@@ -5285,6 +5292,7 @@ def test_numcodecs(codec, photometric):
         chunks=chunks,
         dtype=data.dtype.str,
         compressor=compressor,
+        zarr_format=2,
     )
     z[:] = data
     del z

@cgohlke
Copy link
Owner

cgohlke commented Jan 20, 2025

The imagecodecs doctests fail with Zarr 3 due to tifffile's Zarr stores not being compatible (but that's an entirely other issue).

@QuLogic
Copy link
Author

QuLogic commented Jan 21, 2025

I might have misunderstood this issue: is the objective merely getting the imagecodecs tests to pass with Zarr 3 installed? Or to support Zarr 3, that is (for me at least) provide Zarr format 3 compatible codecs?

I only meant Zarr 3 the library, not Zarr 3 the file format. Sorry if that was unclear.

The test failures and hangs with patched Zarr 3 are fixed by:

I've made roughly the same changes, and it's good to see that that full test suite is not that far off from what Fedora runs. The only one I didn't see was the pglz hang. However, I did make the changes in a way as to reduce copies. I can open a PR to review if you want.

The imagecodecs doctests fail with Zarr 3 due to tifffile's Zarr stores not being compatible (but that's an entirely other issue).

And this one I didn't see; I suppose that Fedora doesn't run doctests.

@cgohlke
Copy link
Owner

cgohlke commented Jan 21, 2025

I did make the changes in a way as to reduce copies. I can open a PR to review if you want.

Sure. Thank you.

@cgohlke
Copy link
Owner

cgohlke commented Jan 21, 2025

I suppose that Fedora doesn't run doctests.

Don't worry. It's just to make sure the examples are actually executable:

Examples
--------
Import the JPEG2K codec:
>>> from imagecodecs import (
... jpeg2k_encode,
... jpeg2k_decode,
... jpeg2k_check,
... jpeg2k_version,
... JPEG2K,
... )
Check that the JPEG2K codec is available in the imagecodecs build:
>>> JPEG2K.available
True
Print the version of the JPEG2K codec's underlying OpenJPEG library:
>>> jpeg2k_version()
'openjpeg 2.5.3'
Encode a numpy array in lossless JP2 format:
>>> array = numpy.random.randint(100, 200, (256, 256, 3), numpy.uint8)
>>> encoded = jpeg2k_encode(array, level=0)
>>> bytes(encoded[:12])
b'\x00\x00\x00\x0cjP \r\n\x87\n'
Check that the encoded bytes likely contain a JPEG 2000 stream:
>>> jpeg2k_check(encoded)
True
Decode the JP2 encoded bytes to a numpy array:
>>> decoded = jpeg2k_decode(encoded)
>>> numpy.array_equal(decoded, array)
True
Decode the JP2 encoded bytes to an existing numpy array:
>>> out = numpy.empty_like(array)
>>> _ = jpeg2k_decode(encoded, out=out)
>>> numpy.array_equal(out, array)
True
Not all codecs are fully implemented, raising exceptions at runtime:
>>> from imagecodecs import tiff_encode
>>> tiff_encode(array)
Traceback (most recent call last):
...
NotImplementedError: tiff_encode
Write the numpy array to a JP2 file:
>>> from imagecodecs import imwrite, imread
>>> imwrite('_test.jp2', array)
Read the image from the JP2 file as numpy array:
>>> image = imread('_test.jp2')
>>> numpy.array_equal(image, array)
True
Create a JPEG 2000 compressed Zarr 2 array:
>>> import zarr
>>> import numcodecs
>>> from imagecodecs.numcodecs import Jpeg2k
>>> numcodecs.register_codec(Jpeg2k)
>>> zarr.zeros(
... (4, 5, 512, 512, 3),
... chunks=(1, 1, 256, 256, 3),
... dtype='u1',
... compressor=Jpeg2k(),
... )
<zarr.core.Array (4, 5, 512, 512, 3) uint8>
Access image data in a sequence of JP2 files via tifffile.FileSequence and
dask.array:
>>> import tifffile
>>> import dask.array
>>> def jp2_read(filename):
... with open(filename, 'rb') as fh:
... data = fh.read()
... return jpeg2k_decode(data)
...
>>> with tifffile.FileSequence(jp2_read, '*.jp2') as ims:
... with ims.aszarr() as store:
... dask.array.from_zarr(store)
...
dask.array<from-zarr, shape=(1, 256, 256, 3)...chunksize=(1, 256, 256, 3)...
Write the Zarr 2 store to a fsspec ReferenceFileSystem in JSON format
and open it as a Zarr array:
>>> store.write_fsspec(
... 'temp.json', url='file://', codec_id='imagecodecs_jpeg2k'
... )
>>> import fsspec
>>> mapper = fsspec.get_mapper(
... 'reference://', fo='temp.json', target_protocol='file'
... )
>>> zarr.open(mapper, mode='r')
<zarr.core.Array (1, 256, 256, 3) uint8 read-only>

@cgohlke
Copy link
Owner

cgohlke commented Jan 21, 2025

This has now been fixed in the development version, except where Zarr provides int8 arrays as "buffer" contrary to Python/Cython norms of uint8.

@cgohlke cgohlke closed this as completed Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants