Fix JPEG 2000 float->int quantization by juntyr · Pull Request #26 · ClimateBenchPress/compressor

juntyr · 2025-04-17T12:59:36Z

While experimenting, I realized that our current float->int remapping for JPEG 2000 is faulty.

the PSRN should probably use the data range, which is in the same units as the error bound
we need to scale from [min, max] to [0, max_pixel_val] since JPEG expects images
we need to do the rescaling in higher precision since 2**25 - 1 cannot be represented without loss by a float32

This commit doesn't work by itself since building the compressor now requires extra info on the range. Perhaps @treigerm could take a look on how to get it actually working? Feel free to push to this branch.

Fixes #18

juntyr · 2025-04-17T13:03:05Z

Though we also have to keep in mind that we already introduce error during quantization so the PSNR bound likely needs to be a bit lower to compensate

treigerm · 2025-04-17T14:16:40Z

Thanks @juntyr ! I will try to convert it into a working version tomorrow. I also looked at SZ3 and they seem to introduce a rather cryptic adjustment factor when converting from PSNR->Absolute Error

https://github.com/szcompressor/SZ3/blob/cd4a2611b47b4989e06ab863f60e99df8ce03aae/include/SZ3/utils/Statistic.hpp#L25

where the threshold is fixed to 0.99

https://github.com/szcompressor/SZ3/blob/cd4a2611b47b4989e06ab863f60e99df8ce03aae/include/SZ3/utils/Statistic.hpp#L39

juntyr · 2025-04-17T15:00:31Z

SZ3 goes the other way around where it translates everything into a pointwise absolute error (so quite strict), but I also don't fully understand their conversions

juntyr · 2025-04-18T19:22:21Z

Though we also have to keep in mind that we already introduce error during quantization so the PSNR bound likely needs to be a bit lower to compensate

Well we know what absolute error linear quantisation incurs and if we assume that worst case every element incurs it, we could include that in the absolute error, right?

treigerm · 2025-04-22T10:35:47Z

Well we know what absolute error linear quantisation incurs and if we assume that worst case every element incurs it, we could include that in the absolute error, right?

The error for linear quantisation should provide a lower bound of what absolute error we can achieve, right? So we might need a warning if the user provides a lower bound that is below the error incurred due to the quantization.

Also thinking about the quantization a bit more. I think we need to adjust the PSNR calculation that we are passing to the JPEG compressor. Because JPEG2000 will compute the PSNR on the linearly quantized data, right? Because the input data and quantized data are on different scales the PSNR values will change as well. Does that make sense @juntyr ?

treigerm · 2025-04-22T10:44:14Z

So I pushed a committ that makes the code run, it actually only required a minor change in the actual code.

But right now this code leads to the following error (the same error occurs on multiple datasets, CAMS and ERA5):

Exception: Jpeg2000 only supports signed/unsigned integers up to 25 bits

The above exception was the direct cause of the following exception:

Exception: Jpeg2000 failed to encode the data

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/src/climatebenchpress/compressor/scripts/compress.py", line 86, in compress
    ds_new, measurements = compress_decompress(
                           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/src/climatebenchpress/compressor/scripts/compress.py", line 131, in compress_decompress
    variables[v] = codec_.encode_decode_data_array(ds[v]).compute()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1207, in compute
    return new.load(**kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1175, in load
    ds = self._to_temp_dataset().load(**kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/dataset.py", line 899, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
                                                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/parallel.py", line 351, in _wrapper
    result = func(*converted_args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/numcodecs_combinators/stack.py", line 197, in encode_decode_data_array_single_chunk
    decoded = self.encode_decode(da.values)  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/numcodecs_combinators/stack.py", line 149, in encode_decode
    codec.encode((encoded)), flatten=False
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/numcodecs_observers/__init__.py", line 70, in encode
    encoded: Buffer = self._codec.encode(buf)  # type: ignore
                      ^^^^^^^^^^^^^^^^^^^^^^^
Exception: jpeg2000.rs codec's implementation raised an error

I'm a bit stuck on debugging this error. I went into the debugger and checked the behaviour of all the elements in the CodeStack processing pipeline. All the preprocessing does indeed convert the data to integers of the range [0, 2**25 - 1] but the JPEG2000 still produces this error. Any idea why this might happen @juntyr ?

juntyr · 2025-04-22T12:48:21Z

+    def abs_bound_codec(
+        error_bound,
+        *,
+        data_abs_min=None,


The JPEG codec needs the actual minimum and maximum, not the minimum absolute value.

Ah sorry, yes that's a bug. It won't be the cause of the issue though because for CAMS all the values are positive.

I'll fix this though

juntyr · 2025-04-22T12:54:59Z

Could you check again that the quantized elements are indeed in range? The error is only raised for uint32 if the value is > (2**25 - 1)

treigerm · 2025-04-22T13:46:12Z

This is the check I have done:

(Pdb) pre_jpeg2000 = CodecStack(codec[0], codec[1], codec[2], codec[3]).encode(ds[v].values)
(Pdb) pre_jpeg2000.min(), pre_jpeg2000.max()
(np.int32(0), np.int32(33554431))
(Pdb) 2 ** 25 - 1
33554431
(Pdb) CodecStack(codec[0], codec[1], codec[2], codec[3])
CodecStack(AsType(encode_dtype='<f8', decode_dtype='<f4'), FixedOffsetScale(offset=9.999996436384911e-15, scale=1.7859274880937514e-14, _version='1.0.0'), Round(precision=1.0, _version='1.0.0'), AsType(encode_dtype='<i4', decode_dtype='<f8'))
(Pdb) codec[4].encode(pre_jpeg2000)
*** Exception: jpeg2000.rs codec's implementation raised an error

with the breakpoint set at

compressor/src/climatebenchpress/compressor/scripts/compress.py

Line 127 in 6fe1c0c

with numcodecs_observers.observe(

treigerm · 2025-04-24T10:00:31Z

Having another stab at figuring out what could be going on here..

I think we need to adjust the PSNR calculation that we are passing to the JPEG compressor. Because JPEG2000 will compute the PSNR on the linearly quantized data, right?

I thought about this again and actually I was wrong, we don't need to adjust the PSNR because it is invariant to the scale-shift operations that we are doing.

Additionally, adjusted the new compressors that came from the main branch to be able to work with the new abstract base class interface.

treigerm · 2025-04-24T11:36:55Z

@juntyr So setting max_pixel_val = 2**24 - 1 seems to fix the issue. Setting max_pixel_val = 2**24 leads to an error so I assume JPEG200 is using a 25 bit signed integer/24 bit unsigned integer?

Either way, should be ready to merge now!

juntyr · 2025-04-24T11:39:25Z

+            # round and truncate to integer values
            numcodecs_wasm_round.Round(precision=1),
            numcodecs.astype.AsType(
                encode_dtype="int32",


That's the problem, you're right. We're going to signed int32, which supports [-2^24, 2^24 - 1]. If you change this to uint32 (since we now scale to [0, ...]), the max pixel value of 2^25 - 1 should work.

Just checked it, you're right! I know switched back to max pixel val 2^25 - 1 but using uint32 because it should give more range!

Thank you for finding what our issue was!

Draft JPEG2000 quantization fix

becec7f

juntyr requested a review from treigerm April 17, 2025 12:59

Implementation of new interface to allow JPEG2000 quantization

6fe1c0c

juntyr commented Apr 22, 2025

View reviewed changes

Pass in data_min not data_abs_min to jpeg2000

85e67ea

treigerm added 2 commits April 24, 2025 12:56

Adjust max_pixel_value bound

2373a7a

Merge remote-tracking branch 'origin/main' into jpeg2000-quantization

28a10e6

Additionally, adjusted the new compressors that came from the main branch to be able to work with the new abstract base class interface.

juntyr commented Apr 24, 2025

View reviewed changes

Switch to unsigned integer

44653bd

treigerm approved these changes Apr 24, 2025

View reviewed changes

treigerm merged commit 5a70912 into main Apr 24, 2025
3 checks passed

juntyr deleted the jpeg2000-quantization branch April 24, 2025 13:01

Conversation

juntyr commented Apr 17, 2025

Uh oh!

juntyr commented Apr 17, 2025

Uh oh!

treigerm commented Apr 17, 2025

Uh oh!

juntyr commented Apr 17, 2025

Uh oh!

juntyr commented Apr 18, 2025

Uh oh!

treigerm commented Apr 22, 2025

Uh oh!

treigerm commented Apr 22, 2025

Uh oh!

juntyr Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

treigerm Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

treigerm Apr 22, 2025

Choose a reason for hiding this comment

Uh oh!

juntyr commented Apr 22, 2025

Uh oh!

treigerm commented Apr 22, 2025

Uh oh!

treigerm commented Apr 24, 2025

Uh oh!

treigerm commented Apr 24, 2025

Uh oh!

juntyr Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

treigerm Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

juntyr Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants