[log] Fix the 'Destination buffer is too small' error while decompressing data in ZstdArrowCompressionCodec #464
Purpose
Linked issue: #462
Currently, if we define a String column in a table and stream-write very large rows (for example, a single row of about 5 KB), the ZstdArrowCompressionCodec reliably throws a RuntimeException with the message 'Destination buffer is too small' during decompression.
This PR aims to fix this error. Through investigation, we found that the error no longer occurs if we replace the compression call

`Zstd.compressUnsafe(long dst, long dstSize, long src, long srcSize, int level)`

with a memory-copying method such as `Zstd.compressDirectByteBuffer(ByteBuffer dst, int dstOffset, int dstSize, ByteBuffer src, int srcOffset, int srcSize, int level)`, or switch to `ZstdOutputStreamNoFinalizer`. The likely cause is that the `compressUnsafe` path performs raw memory operations that are not handled correctly. We use `Zstd.compressDirectByteBuffer()` here because the stream-output approach via `ZstdOutputStreamNoFinalizer` consumes more CPU than the previous approach.
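For reference, a minimal sketch of the replacement pattern. This is not the actual codec code; the class, buffer sizing, and compression level here are illustrative assumptions, showing only how `Zstd.compressDirectByteBuffer()` is typically called:

```java
import com.github.luben.zstd.Zstd;
import java.nio.ByteBuffer;

public final class ZstdDirectBufferSketch {

    /**
     * Compresses the readable bytes of a direct source buffer into a freshly
     * allocated direct destination buffer. Both buffers must be direct.
     */
    static ByteBuffer compress(ByteBuffer src, int level) {
        int srcSize = src.remaining();
        // compressBound() returns the worst-case compressed size, so the
        // destination buffer is always large enough.
        ByteBuffer dst = ByteBuffer.allocateDirect((int) Zstd.compressBound(srcSize));
        long written = Zstd.compressDirectByteBuffer(
                dst, 0, dst.capacity(), src, src.position(), srcSize, level);
        if (Zstd.isError(written)) {
            throw new RuntimeException(Zstd.getErrorName(written));
        }
        dst.limit((int) written);
        return dst;
    }

    public static void main(String[] args) {
        // Simulate a single ~5 KB row, the case that triggered the bug.
        ByteBuffer src = ByteBuffer.allocateDirect(5 * 1024);
        while (src.hasRemaining()) {
            src.put((byte) 'x');
        }
        src.flip();
        ByteBuffer compressed = compress(src, 3); // level 3 is illustrative
        System.out.println(src.remaining() + " bytes -> " + compressed.remaining() + " bytes");
    }
}
```

Unlike `compressUnsafe()`, this variant receives the buffers themselves rather than raw addresses, so zstd-jni can check the offsets and sizes against the buffers before touching native memory.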
Tests
API and Format
Documentation