feat: add zstd compression support #60
base: main
The zstd crate's experimental feature is enabled so that zstd::bulk::Decompressor::upper_bound can be used to calculate an upper bound for the Vec capacity to allocate when decompressing data. Supported levels are 1~22, with 0 defaulting to level 3.
There are a few things about this PR that might need some feedback or discussion.
I would like it to be, but there's no production-ready library out there right now. KillingSpark/zstd-rs#65 has some encoding efforts going on, but it's far from usable. Ideally, it could just be switched out if there ever is a worthy contender.
It never is. Blocks tend to be 4 - 64 KB in size, blobs maybe up to a couple of MB max. It's also needed in memory because the size needs to be known (for the block header).
Hmm yeah, interesting. The problem is that the block header needs to be fixed, so I went with 2 bytes because I hadn't looked at how many compression levels there tend to be. Miniz just has 10 or so. With a u8 we can go from 20 down to -234. Most sources tend to recommend something in the range of -7 to 20. So I'm not sure how important it is to even support negative levels that are much lower than -200. That would need some benchmarking. If -8000 is barely faster than -127 at much worse space savings, there's no point in supporting it, I think.
For the benchmarks in the first chapter I used this project: https://gist.github.com/marvin-j97/22dfbe2ae2d9a8b9bcc938c8d48e54c7 - it needs a corpus of text documents on disk (DOCS_FOLDER) that it will ingest. You'll need to use fjall 2.0.1+ because I had to fix a bug.
The zstd crate's experimental feature is enabled so that zstd::bulk::Decompressor::upper_bound can be used to calculate an upper bound for the Vec capacity to allocate when decompressing data.
Supported levels are -128~22, with 0 defaulting to level 3 (due to zstd library behaviour).
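To make the level range concrete, here is a minimal sketch of how a level in -128~22 could be validated and stored in a single header byte. It is not the code from this PR: the names normalize_level, encode_level and decode_level are hypothetical, and the choice of a signed byte (i8 reinterpreted as u8) is an assumption based only on the range quoted above.

```rust
// Hypothetical bounds, taken from the range quoted above (-128 to 22).
const MIN_LEVEL: i32 = -128;
const MAX_LEVEL: i32 = 22;
// zstd treats level 0 as "use the default", which is level 3.
const DEFAULT_LEVEL: i32 = 3;

/// Rejects out-of-range levels and resolves 0 to the default level.
/// (Hypothetical helper, not part of the zstd crate or this PR.)
fn normalize_level(level: i32) -> Option<i8> {
    if !(MIN_LEVEL..=MAX_LEVEL).contains(&level) {
        return None;
    }
    let resolved = if level == 0 { DEFAULT_LEVEL } else { level };
    // The cast cannot truncate: the range check above keeps the value inside i8.
    Some(resolved as i8)
}

/// Stores the level in one fixed-size header byte (two's complement bit pattern).
fn encode_level(level: i8) -> u8 {
    level as u8
}

/// Recovers the signed level from the header byte.
fn decode_level(byte: u8) -> i8 {
    byte as i8
}
```

With this layout the header field stays a fixed one byte wide, and any level outside -128~22 is rejected before it reaches the compressor rather than being silently clamped.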
benchmark Load block from disk