[log] Fix the 'Destination buffer is too small' error when decompressing data in ZstdArrowCompressionCodec #464

Open
wants to merge 1 commit into
base: main

Conversation

swuferhong
Collaborator

@swuferhong swuferhong commented Feb 21, 2025

Purpose

Linked issue: #462

Currently, if a table defines a String column and we stream-write very large rows (for example, a single row of size 5k), ZstdArrowCompressionCodec always throws a RuntimeException with the message 'Destination buffer is too small' during decompression.

This PR aims to fix this error. Through investigation, we found that the error no longer occurs if we replace the call to Zstd.compressUnsafe(long dst, long dstSize, long src, long srcSize, int level) with a memory-copying method such as Zstd.compressDirectByteBuffer(ByteBuffer dst, int dstOffset, int dstSize, ByteBuffer src, int srcOffset, int srcSize, int level), or if we switch to ZstdOutputStreamNoFinalizer. A possible reason is that compressUnsafe has memory-operation anomalies that are not handled correctly.

We use Zstd.compressDirectByteBuffer() here rather than the streaming ZstdOutputStreamNoFinalizer because the streaming approach consumes more CPU than the previous approach.
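For reviewers who want to try the replacement call in isolation, here is a minimal round-trip sketch. It is not the code in this PR: the class name `ZstdDirectBufferRoundTrip` and its helper methods are hypothetical, and it assumes the zstd-jni library (`com.github.luben.zstd`) is on the classpath. It compresses a row-sized payload with `Zstd.compressDirectByteBuffer`, sizing the destination with `Zstd.compressBound`, and then decompresses it into a buffer of the original size.

```java
import com.github.luben.zstd.Zstd;

import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of ZstdArrowCompressionCodec.
public class ZstdDirectBufferRoundTrip {

    // Compresses the readable bytes of a direct buffer into a new direct buffer.
    static ByteBuffer compress(ByteBuffer src, int level) {
        int srcSize = src.remaining();
        // compressBound gives the worst-case compressed size for srcSize bytes.
        int bound = (int) Zstd.compressBound(srcSize);
        ByteBuffer dst = ByteBuffer.allocateDirect(bound);
        long written = Zstd.compressDirectByteBuffer(
                dst, 0, bound, src, src.position(), srcSize, level);
        if (Zstd.isError(written)) {
            throw new RuntimeException(Zstd.getErrorName(written));
        }
        dst.limit((int) written);
        return dst;
    }

    // Decompresses into a direct buffer sized to the known uncompressed length.
    static ByteBuffer decompress(ByteBuffer compressed, int originalSize) {
        ByteBuffer dst = ByteBuffer.allocateDirect(originalSize);
        long read = Zstd.decompressDirectByteBuffer(
                dst, 0, originalSize,
                compressed, compressed.position(), compressed.remaining());
        if (Zstd.isError(read)) {
            throw new RuntimeException(Zstd.getErrorName(read));
        }
        dst.limit((int) read);
        return dst;
    }

    // Copies input into a direct buffer, compresses, decompresses, returns the result.
    static byte[] roundTrip(byte[] input, int level) {
        ByteBuffer src = ByteBuffer.allocateDirect(input.length);
        src.put(input);
        src.flip();
        ByteBuffer restored = decompress(compress(src, level), input.length);
        byte[] out = new byte[restored.remaining()];
        restored.get(out);
        return out;
    }

    public static void main(String[] args) {
        // A payload of 5 * 1024 bytes, loosely mirroring the large row described above.
        byte[] row = new byte[5 * 1024];
        for (int i = 0; i < row.length; i++) {
            row[i] = (byte) ('a' + (i % 26));
        }
        byte[] restored = roundTrip(row, 3);
        System.out.println("round trip ok: " + Arrays.equals(row, restored));
    }
}
```

The sketch only exercises the direct-buffer API on its own; it does not reproduce the original failure with compressUnsafe.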

Tests

API and Format

Documentation

@swuferhong swuferhong requested a review from wuchong February 21, 2025 09:56