Skip to content

Fix JPEG 2000 float->int quantization#26

Merged
treigerm merged 6 commits into
mainfrom
jpeg2000-quantization
Apr 24, 2025
Merged

Fix JPEG 2000 float->int quantization#26
treigerm merged 6 commits into
mainfrom
jpeg2000-quantization

Conversation

@juntyr
Copy link
Copy Markdown
Collaborator

@juntyr juntyr commented Apr 17, 2025

While experimenting, I realized that our current float->int remapping for JPEG 2000 is faulty.

  1. the PSRN should probably use the data range, which is in the same units as the error bound
  2. we need to scale from [min, max] to [0, max_pixel_val] since JPEG expects images
  3. we need to do the rescaling in higher precision since 2**25 - 1 cannot be represented without loss by a float32

This commit doesn't work by itself since building the compressor now requires extra info on the range. Perhaps @treigerm could take a look on how to get it actually working? Feel free to push to this branch.

Fixes #18

@juntyr juntyr requested a review from treigerm April 17, 2025 12:59
@juntyr
Copy link
Copy Markdown
Collaborator Author

juntyr commented Apr 17, 2025

Though we also have to keep in mind that we already introduce error during quantization so the PSNR bound likely needs to be a bit lower to compensate

@treigerm
Copy link
Copy Markdown
Member

Thanks @juntyr ! I will try to convert it into a working version tomorrow. I also looked at SZ3 and they seem to introduce a rather cryptic adjustment factor when converting from PSNR->Absolute Error

https://github.com/szcompressor/SZ3/blob/cd4a2611b47b4989e06ab863f60e99df8ce03aae/include/SZ3/utils/Statistic.hpp#L25

where the threshold is fixed to 0.99

https://github.com/szcompressor/SZ3/blob/cd4a2611b47b4989e06ab863f60e99df8ce03aae/include/SZ3/utils/Statistic.hpp#L39

@juntyr
Copy link
Copy Markdown
Collaborator Author

juntyr commented Apr 17, 2025

SZ3 goes the other way around where it translates everything into a pointwise absolute error (so quite strict), but I also don't fully understand their conversions

@juntyr
Copy link
Copy Markdown
Collaborator Author

juntyr commented Apr 18, 2025

Though we also have to keep in mind that we already introduce error during quantization so the PSNR bound likely needs to be a bit lower to compensate

Well we know what absolute error linear quantisation incurs and if we assume that worst case every element incurs it, we could include that in the absolute error, right?

@treigerm
Copy link
Copy Markdown
Member

Well we know what absolute error linear quantisation incurs and if we assume that worst case every element incurs it, we could include that in the absolute error, right?

The error for linear quantisation should provide a lower bound of what absolute error we can achieve, right? So we might need a warning if the user provides a lower bound that is below the error incurred due to the quantization.

Also thinking about the quantization a bit more. I think we need to adjust the PSNR calculation that we are passing to the JPEG compressor. Because JPEG2000 will compute the PSNR on the linearly quantized data, right? Because the input data and quantized data are on different scales the PSNR values will change as well. Does that make sense @juntyr ?

@treigerm
Copy link
Copy Markdown
Member

So I pushed a committ that makes the code run, it actually only required a minor change in the actual code.

But right now this code leads to the following error (the same error occurs on multiple datasets, CAMS and ERA5):

Exception: Jpeg2000 only supports signed/unsigned integers up to 25 bits

The above exception was the direct cause of the following exception:

Exception: Jpeg2000 failed to encode the data

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/src/climatebenchpress/compressor/scripts/compress.py", line 86, in compress
    ds_new, measurements = compress_decompress(
                           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/src/climatebenchpress/compressor/scripts/compress.py", line 131, in compress_decompress
    variables[v] = codec_.encode_decode_data_array(ds[v]).compute()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1207, in compute
    return new.load(**kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1175, in load
    ds = self._to_temp_dataset().load(**kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/dataset.py", line 899, in load
    evaluated_data: tuple[np.ndarray[Any, Any], ...] = chunkmanager.compute(
                                                       ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/namedarray/daskmanager.py", line 85, in compute
    return compute(*data, **kwargs)  # type: ignore[no-untyped-call, no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/xarray/core/parallel.py", line 351, in _wrapper
    result = func(*converted_args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/numcodecs_combinators/stack.py", line 197, in encode_decode_data_array_single_chunk
    decoded = self.encode_decode(da.values)  # type: ignore
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/numcodecs_combinators/stack.py", line 149, in encode_decode
    codec.encode((encoded)), flatten=False
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/reichelt/dev/python/ClimateBenchPress/compressor/.venv/lib/python3.12/site-packages/numcodecs_observers/__init__.py", line 70, in encode
    encoded: Buffer = self._codec.encode(buf)  # type: ignore
                      ^^^^^^^^^^^^^^^^^^^^^^^
Exception: jpeg2000.rs codec's implementation raised an error

I'm a bit stuck on debugging this error. I went into the debugger and checked the behaviour of all the elements in the CodeStack processing pipeline. All the preprocessing does indeed convert the data to integers of the range [0, 2**25 - 1] but the JPEG2000 still produces this error. Any idea why this might happen @juntyr ?

def abs_bound_codec(
error_bound,
*,
data_abs_min=None,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JPEG codec needs the actual minimum and maximum, not the minimum absolute value.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, yes that's a bug. It won't be the cause of the issue though because for CAMS all the values are positive.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this though

@juntyr
Copy link
Copy Markdown
Collaborator Author

juntyr commented Apr 22, 2025

Could you check again that the quantized elements are indeed in range? The error is only raised for uint32 if the value is > (2**25 - 1)

@treigerm
Copy link
Copy Markdown
Member

This is the check I have done:

(Pdb) pre_jpeg2000 = CodecStack(codec[0], codec[1], codec[2], codec[3]).encode(ds[v].values)
(Pdb) pre_jpeg2000.min(), pre_jpeg2000.max()
(np.int32(0), np.int32(33554431))
(Pdb) 2 ** 25 - 1
33554431
(Pdb) CodecStack(codec[0], codec[1], codec[2], codec[3])
CodecStack(AsType(encode_dtype='<f8', decode_dtype='<f4'), FixedOffsetScale(offset=9.999996436384911e-15, scale=1.7859274880937514e-14, _version='1.0.0'), Round(precision=1.0, _version='1.0.0'), AsType(encode_dtype='<i4', decode_dtype='<f8'))
(Pdb) codec[4].encode(pre_jpeg2000)
*** Exception: jpeg2000.rs codec's implementation raised an error

with the breakpoint set at

with numcodecs_observers.observe(

@treigerm
Copy link
Copy Markdown
Member

Having another stab at figuring out what could be going on here..

I think we need to adjust the PSNR calculation that we are passing to the JPEG compressor. Because JPEG2000 will compute the PSNR on the linearly quantized data, right?

I thought about this again and actually I was wrong, we don't need to adjust the PSNR because it is invariant to the scale-shift operations that we are doing.

Additionally, adjusted the new compressors that came from the main branch to be able to work with the new abstract base class interface.
@treigerm
Copy link
Copy Markdown
Member

@juntyr So setting max_pixel_val = 2**24 - 1 seems to fix the issue. Setting max_pixel_val = 2**24 leads to an error so I assume JPEG200 is using a 25 bit signed integer/24 bit unsigned integer?

Either way, should be ready to merge now!

# round and truncate to integer values
numcodecs_wasm_round.Round(precision=1),
numcodecs.astype.AsType(
encode_dtype="int32",
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the problem, you're right. We're going to signed int32, which supports [-2^24, 2^24 - 1]. If you change this to uint32 (since we now scale to [0, ...]), the max pixel value of 2^25 - 1 should work.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked it, you're right! I know switched back to max pixel val 2^25 - 1 but using uint32 because it should give more range!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for finding what our issue was!

@treigerm treigerm merged commit 5a70912 into main Apr 24, 2025
3 checks passed
@juntyr juntyr deleted the jpeg2000-quantization branch April 24, 2025 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JPEG2000 input transformation

2 participants