Support Lempel–Ziv–Welch (LZW) - as used in the Unix .Z compress and decompress #246

Open
matthijscox opened this issue May 27, 2025 · 1 comment

matthijscox commented May 27, 2025

It would be great to have Lempel–Ziv–Welch (LZW) compression and decompression available somewhere in the Julia ecosystem. This matters most for Windows users, since Unix users have it available by default via the compress utility. A codec in TranscodingStreams.jl seems like the most obvious candidate to me, although I have no idea how difficult it is to adhere to the transcoding interface.

There are pure Python and C implementations for reference if we want, such as the Python unlzw3 package, which is fewer than 200 lines of source code (note: decompression only, no compression).

I might have a go at it myself if I find some time, but I would like to create some awareness first. I couldn't find a Julia Discourse topic or a GitHub issue on LZW yet; please link one if it already exists.

Member

nhz2 commented May 30, 2025

One option for decompression is to use the executable from either Gzip_jll or pigz_jll:

using pigz_jll: pigz
# Pipe the compressed bytes through the pigz executable and read back the decompressed bytes.
decompress(data) = read(pipeline(IOBuffer(data), `$(pigz()) --stdout --decompress`))

Using the TranscodingStreams interface will be the most flexible, but can be tricky because you have to write code as a state machine instead of a normal function and have to deal with pointers instead of arrays. https://github.com/JuliaIO/CodecInflate64.jl is an example of a TranscodingStreams decompressor written in pure Julia. https://github.com/JuliaIO/CodecInflate64.jl/blob/main/src/codecs.jl has the actual interface glue code.
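
To illustrate what "state machine instead of a normal function" means in practice, here is a minimal sketch. It is not the TranscodingStreams API and not a full LZW decoder: `CodeReader` and `feed!` are hypothetical names made up for this example, and the code width is fixed, whereas a real .Z stream changes it as decoding proceeds. It only shows the core idea that a streaming decoder must keep its partial progress (here, leftover bits) in a struct, because input arrives in chunks that can end in the middle of a code.

```julia
# Hypothetical illustration: a resumable reader that unpacks fixed-width,
# LSB-first codes from input delivered in arbitrary chunks.
mutable struct CodeReader
    bits::Int      # code width in bits (9 is the initial width in .Z streams)
    buf::UInt32    # bits received so far that have not been emitted as a code
    left::Int      # number of valid bits currently held in buf
end

CodeReader(bits::Integer = 9) = CodeReader(bits, UInt32(0), 0)

# Consume one chunk, append every complete code to `codes`, and keep any
# leftover bits in `r` so the next call can pick up where this one stopped.
function feed!(r::CodeReader, chunk::AbstractVector{UInt8}, codes::Vector{UInt16})
    mask = (UInt32(1) << r.bits) - UInt32(1)
    for byte in chunk
        r.buf |= UInt32(byte) << r.left
        r.left += 8
        while r.left >= r.bits
            push!(codes, UInt16(r.buf & mask))
            r.buf >>= r.bits
            r.left -= r.bits
        end
    end
    return codes
end

data = UInt8[0x61, 0x02, 0x02]   # the 9-bit codes 97 and 257, packed LSB-first

# Feeding everything at once ...
whole = feed!(CodeReader(), data, UInt16[])

# ... gives the same codes as feeding the bytes in two separate chunks.
r = CodeReader()
pieces = UInt16[]
feed!(r, data[1:1], pieces)   # only 8 bits so far: no complete code yet
feed!(r, data[2:3], pieces)   # the remaining bits complete both codes

@assert whole == pieces == UInt16[97, 257]
```

A real codec would also have to carry the LZW string table, the current code width, and the output position across calls in the same way, which is where most of the difficulty comes from.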

There is also the interface in https://github.com/JuliaIO/ChunkCodecs.jl, which, like the Python unlzw3 package and Mark Adler's 'unlzw' C function on Stack Overflow, requires the full input and output to be in memory.

For reference, astropy/astropy#10714 (comment) has a nice comparison of different LZW decompression tools.
