[Plugin] Fix bug of memcpy in scatter plugin #3818

SilvesterHsu · 2024-04-23T13:03:48Z

The scatter plugin uses cudaMemcpy and relies on its implicit synchronization to ensure that device_transform_coeff is fully written before the kernel executes. This assumption breaks when TensorRT runs inference on a stream created with cudaStreamNonBlocking: such a stream does not synchronize implicitly with the legacy default stream, so the ordering the plugin relied on is no longer guaranteed and the plugin produces incorrect results. Switching to cudaMemcpyAsync and issuing the copy on the same stream as the kernel restores the ordering and yields correct results.
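The ordering described above can be sketched in isolation. This is a minimal standalone illustration, not the plugin's actual code: the kernel `applyCoeff`, the function `runScatterCopyDemo`, and all buffer names are hypothetical stand-ins, and pinned host memory is an assumption made here so that the host buffers are clearly valid until the stream is synchronized.

```cuda
#include <cuda_runtime.h>

// Return false from the enclosing function on any CUDA error (sketch only;
// resources are not released on the error path).
#define CUDA_OK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// Hypothetical stand-in for the scatter kernel: scales every input element
// by a coefficient that lives in device memory.
__global__ void applyCoeff(const float* in, float* out, const float* coeff, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = in[i] * coeff[0];
    }
}

bool runScatterCopyDemo()
{
    const int n = 256;
    const size_t bytes = n * sizeof(float);

    // Pinned host buffers (assumption for this sketch); they stay alive
    // until after cudaStreamSynchronize below.
    float *hIn = nullptr, *hOut = nullptr, *hCoeff = nullptr;
    CUDA_OK(cudaMallocHost(&hIn, bytes));
    CUDA_OK(cudaMallocHost(&hOut, bytes));
    CUDA_OK(cudaMallocHost(&hCoeff, sizeof(float)));
    for (int i = 0; i < n; ++i)
    {
        hIn[i] = static_cast<float>(i);
    }
    hCoeff[0] = 2.0f;

    float *dIn = nullptr, *dOut = nullptr, *dCoeff = nullptr;
    CUDA_OK(cudaMalloc(&dIn, bytes));
    CUDA_OK(cudaMalloc(&dOut, bytes));
    CUDA_OK(cudaMalloc(&dCoeff, sizeof(float)));

    // A non-blocking stream does not synchronize implicitly with the
    // legacy default stream.
    cudaStream_t stream;
    CUDA_OK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

    // Copies and kernel are enqueued on the SAME stream, so they execute
    // in issue order: the coefficient is on the device before the kernel runs.
    CUDA_OK(cudaMemcpyAsync(dIn, hIn, bytes, cudaMemcpyHostToDevice, stream));
    CUDA_OK(cudaMemcpyAsync(dCoeff, hCoeff, sizeof(float), cudaMemcpyHostToDevice, stream));
    applyCoeff<<<(n + 127) / 128, 128, 0, stream>>>(dIn, dOut, dCoeff, n);
    CUDA_OK(cudaGetLastError());
    CUDA_OK(cudaMemcpyAsync(hOut, dOut, bytes, cudaMemcpyDeviceToHost, stream));
    CUDA_OK(cudaStreamSynchronize(stream));

    // Verify that every element was scaled by the coefficient.
    bool ok = true;
    for (int i = 0; i < n; ++i)
    {
        ok = ok && (hOut[i] == 2.0f * hIn[i]);
    }

    cudaStreamDestroy(stream);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dCoeff);
    cudaFreeHost(hIn);
    cudaFreeHost(hOut);
    cudaFreeHost(hCoeff);
    return ok;
}
```

In a TensorRT plugin the stream would be the one passed to `enqueue` rather than one created locally; the point of the sketch is only that work issued on a single stream is ordered without relying on default-stream synchronization.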

Merge release/10.0 to main

Signed-off-by: seel.xu <[email protected]>

asfiyab-nvidia and others added 2 commits April 3, 2024 14:43

Merge pull request NVIDIA#3773 from NVIDIA/release/10.0

5eeb6c7

Merge release/10.0 to main

[Plugin] Fix bug of memcpy in scatter plugin

05af502

Signed-off-by: seel.xu <[email protected]>

kevinch-nv force-pushed the main branch from 40efe7e to 2114dc7 Compare July 9, 2025 17:14

kevinch-nv requested a review from a team as a code owner July 9, 2025 17:14

kevinch-nv requested review from LeoZDong and kevinch-nv and removed request for a team July 9, 2025 17:14
