-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Parquet] perf: reuse seeked File clone in ChunkReader::get_read() #9214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Parquet] perf: reuse seeked File clone in ChunkReader::get_read() #9214
Conversation
Removes redundant try_clone() call. Improved get_read() by ~36% on local benchmark.
|
run benchmark arrow_reader arrow_reader_row_filter arrow_reader_clickbench |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Love it -- thank you @fvaleye and @Dandandan |
|
BTW I think @AdamGS fixed another unecessary read when reading from object store in the soon to be relesed
I am still waiting on one more vote to get the release approved so I can pubish it |
Which issue does this PR close?
N/A, it's a minor performance fix.
Rationale for this change
While reviewing Parquet performance, I observed a duplicate
try_clone(). I wasn't able to tell why it was required. After benchmarking and running tests, it seems there is no reason for the duplication.ChunkReader::get_read()forFilecallstry_clone()twice: once to seek, then again for theBufReader, discarding the first clone. This might be wasteful, as eachtry_clone()duplicates the file descriptor via a system call. So, one less dup() syscall per get_read() call.What changes are included in this PR?
Reuse the already-seeked file clone instead of creating a new one.
Are these changes tested?
Covered by existing tests.
Local benchmarks using divan show ~36% improvement for
get_read()calls on my laptop.Are there any user-facing changes?
No.