@nirandaperera
Contributor

@nirandaperera nirandaperera commented Oct 10, 2025

This PR adds benchmarks for the pinned host buffer.

Depends on #549

@copy-pr-bot

copy-pr-bot bot commented Oct 10, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@nirandaperera nirandaperera added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Oct 10, 2025
@nirandaperera
Contributor Author

Results

System:
NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0

nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

[image: benchmark results]

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@nirandaperera nirandaperera force-pushed the pinned_host_buffer_bench branch from f6a2faf to ddc5452 Compare October 17, 2025 22:13
@nirandaperera nirandaperera marked this pull request as ready for review October 17, 2025 22:13
@nirandaperera nirandaperera requested review from a team as code owners October 17, 2025 22:13
Member

@madsbk madsbk left a comment

@nirandaperera nirandaperera requested a review from a team as a code owner October 21, 2025 23:00
Contributor

@wence- wence- left a comment

The priming benchmark is measuring very misleading timings. Please also cull all the LLM-produced comments that just explain in words exactly what the next line of code says in code.

Comment on lines 109 to 122
    auto latency_to_first = std::chrono::duration_cast<std::chrono::nanoseconds>(
                                first_allocation_time - start_time
    )
                                .count();
    auto first_round_duration_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            first_round_end - start_time
        )
            .count();
    auto second_round_duration_ns =
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            second_round_end - first_round_end
        )
            .count();
Contributor

These timings are at least partly nonsense. The latency to first makes sense. The first_round_duration kind of doesn't because we've synced after the very first allocation, but ok. I guess it makes kind of sense.

The second_round_duration includes the time to deallocate all of the allocations from the first round. This makes no sense.
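
For illustration only, here is a minimal sketch of one way to keep deallocation out of the second-round interval: take a separate timestamp after the first round's buffers are released and measure the second round from there. It is not the PR's code. `PrimingTimings`, `elapsed_ns`, `allocate_block`, and `run_priming_rounds` are hypothetical names, and plain heap allocation stands in for the pinned host allocation so the sketch compiles without CUDA. In the real benchmark, any asynchronous work would presumably need to be synchronized before each timestamp is taken.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical result type; the field names are illustrative, not from the PR.
struct PrimingTimings {
    std::int64_t latency_to_first_ns;  // start -> first allocation done
    std::int64_t first_round_ns;       // start -> all first-round allocations done
    std::int64_t dealloc_ns;           // time to release the first-round allocations
    std::int64_t second_round_ns;      // second-round allocations only
};

using Clock = std::chrono::steady_clock;

inline std::int64_t elapsed_ns(Clock::time_point from, Clock::time_point to) {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(to - from).count();
}

// Stand-in for a pinned host allocation (assumption): ordinary heap memory.
inline std::unique_ptr<unsigned char[]> allocate_block(std::size_t bytes) {
    auto block = std::make_unique<unsigned char[]>(bytes);
    block[0] = 1;  // touch the memory so the allocation is actually used
    return block;
}

inline PrimingTimings run_priming_rounds(std::size_t n_allocs, std::size_t bytes) {
    assert(n_allocs > 0 && bytes > 0);
    std::vector<std::unique_ptr<unsigned char[]>> blocks;
    blocks.reserve(n_allocs);

    // Round 1: record the first allocation separately, then the whole round.
    auto const start = Clock::now();
    blocks.push_back(allocate_block(bytes));
    auto const first_allocation_done = Clock::now();
    for (std::size_t i = 1; i < n_allocs; ++i) {
        blocks.push_back(allocate_block(bytes));
    }
    auto const first_round_end = Clock::now();

    // Deallocation gets its own interval instead of leaking into round 2.
    blocks.clear();
    auto const dealloc_end = Clock::now();

    // Round 2: measured from the post-deallocation timestamp.
    for (std::size_t i = 0; i < n_allocs; ++i) {
        blocks.push_back(allocate_block(bytes));
    }
    auto const second_round_end = Clock::now();

    return PrimingTimings{
        elapsed_ns(start, first_allocation_done),
        elapsed_ns(start, first_round_end),
        elapsed_ns(first_round_end, dealloc_end),
        elapsed_ns(dealloc_end, second_round_end),
    };
}
```

Calling `run_priming_rounds(8, 4096)` returns four independent intervals, so the cost of freeing the first round can be reported (or ignored) on its own rather than being folded into the second round.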

Contributor Author

Oh yeah. Good point. I was simply following this
https://github.com/rapidsai/rmm/blob/branch-25.12/cpp/benchmarks/async_priming/async_priming_bench.cpp
I guess the same issue is here as well.

@nirandaperera nirandaperera requested a review from wence- October 22, 2025 18:22