Bit-Packing Codec for ADC Data Optimization #2942

asimchoudhary · 2025-04-01T07:47:55Z

This PR implements a prototype BitPackingCodec that efficiently stores ADC integer values (10-bit, 12-bit) that typically waste space when stored as 16-bit integers.

Key Features

Reduces storage requirements by ~37.5% for 10-bit ADC data
Maintains data integrity through careful bit manipulation
Simple API requiring only bits_per_value and original_dtype parameters

Implementation Details

The codec works by:

Accepting arrays with values that only use N bits (where N < standard bit widths)
Packing these values contiguously, crossing byte boundaries as needed
Unpacking values during decoding while preserving original data integrity

Key algorithms:

Bit Masking: Extracts only necessary bits
Byte Boundary Handling: Special logic for values that span multiple bytes
Efficient Unpacking: Reconstructs original values without data loss

Demo

Tried to bitpack an array of 4 values with dtype = uint16 into 10 bits

Packing Data

Unpacking Data

Signed-off-by: asim <[email protected]>

d-v-b · 2025-04-01T08:24:30Z

thanks for working on this. It seems like numcodecs already has a packbits codec, and @jbms has a specification for a more general form of that codec.

I'm not an expert but it seems to me that your PR and these other packbits codecs are doing something slightly different. Because the names are so similar, it might be good to disambiguate the various functionalities here.

LDeakin · 2025-04-01T09:03:30Z

This would be best suited as an array-to-bytes codec (like packbits) because:

The order of the input bytes going into your bytes-to-bytes codec depends on the endian parameter of the bytes codec
You would not need to pass in the original_dtype
- Also, the dtype of the input to an array-to-bytes codec (in chunk_spec) may not match the dtype of the array, array-to-array codecs can change it

asimchoudhary · 2025-04-01T10:39:21Z

Hi @d-v-b,

Thank you for your feedback and for highlighting these related implementations!

You're absolutely right—the naming similarity could lead to confusion. Would ADCPackCodec be a clearer alternative?

Looking forward to your thoughts!

asimchoudhary · 2025-04-01T11:02:12Z

Hii, @LDeakin

Thank you for the insightful feedback!

You’re absolutely right. ArrayBytesCodec is a better fit than BytesBytesCodec for this bit-packing implementation , shifting to an ArrayBytesCodec would simplify the implementation by eliminating the need for original_dtype and ensuring better alignment with the pipeline.

Initially, I modeled this after BloscCodec as a BytesBytesCodec, viewing it mainly as compression. But as I dug deeper, I realized the array-aware nature of bit-packing might suit ArrayArrayCodec better. My hesitation was around handling NDBuffer objects correctly.

Would you recommend ArrayBytesCodec over ArrayArrayCodec here?
I’ll refactor accordingly to simplify the code .

Appreciate your guidance!

jbms · 2025-04-01T16:50:28Z

This should definitely be an array -> bytes codec.

However, potentially it could make sense to combine this with the packbits codec that I proposed (as an extension of the numcodecs packbits codec).

My packbits proposal is designed to work with bool as well as non-whole-number-of-bytes dtypes like int2, int4, float4_*, that I also proposed as extensions.

In contrast, your codec proposed here works with the existing whole-number-of-bytes data types and retains only the specified number of least-significant bits.

Would it make sense to extend my packbits proposal to allow optionally specifying a range of bits to retain, e.g. start_bit and end_bit? I can imagine that someone may wish to keep just the most-significant bits in some cases.

jbms · 2025-04-01T16:52:55Z

Note: A separate issue is that your implementation, being pure Python that loops over each element, will surely be very slow. You will want to implement in cython or otherwise ensure the bulk of the work gets handled by optimized native code.

asimchoudhary · 2025-04-03T03:01:07Z

Thank you for your detailed feedback, @jbms!

I'll definitely refactor this as an array → bytes codec as you and @LDeakin suggested.

Your suggestion about extending packbits with start_bit and end_bit parameters is excellent! This would indeed provide more flexibility for selecting which bits to retain. I'd be happy to help.

Regarding performance, you're absolutely right. My current implementation is a proof-of-concept focused on correctness, but production use would require optimization. I'd be interested in exploring:

Cython implementation for the core bit manipulation logic
Use of native code wherever possible.
Numba for JIT compilation

jbms · 2025-04-08T22:22:15Z

I modified my packbits proposal to include a first_bit and last_bit option:
https://github.com/zarr-developers/zarr-extensions/pull/3/files#diff-8c721f8bc1d30cef0875281a1a9f922d762a0998f1601cd05022e544ed8df271

It is now a fair bit more complicated though --- but still it probably makes sense to keep those options integrated rather than as a separate codec. There could still be partial implementations of packbits that do not support them.

prototype for bit packing

5d79a38

Signed-off-by: asim <[email protected]>

github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Apr 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bit-Packing Codec for ADC Data Optimization #2942

Bit-Packing Codec for ADC Data Optimization #2942

asimchoudhary commented Apr 1, 2025

d-v-b commented Apr 1, 2025

LDeakin commented Apr 1, 2025

asimchoudhary commented Apr 1, 2025

asimchoudhary commented Apr 1, 2025

jbms commented Apr 1, 2025

jbms commented Apr 1, 2025

asimchoudhary commented Apr 3, 2025

jbms commented Apr 8, 2025

Bit-Packing Codec for ADC Data Optimization #2942

Are you sure you want to change the base?

Bit-Packing Codec for ADC Data Optimization #2942

Conversation

asimchoudhary commented Apr 1, 2025

Key Features

Implementation Details

Demo

d-v-b commented Apr 1, 2025

LDeakin commented Apr 1, 2025

asimchoudhary commented Apr 1, 2025

asimchoudhary commented Apr 1, 2025

jbms commented Apr 1, 2025

jbms commented Apr 1, 2025

asimchoudhary commented Apr 3, 2025

jbms commented Apr 8, 2025