Add Pad Reflect 1D CUDA support #14659

YavorGIvanov · 2025-07-13T00:46:13Z

No description provided.

JohannesGaessler

Please tell me whether you want to address the comment regarding the loop in this PR.

JohannesGaessler · 2025-07-15T09:17:00Z

ggml/src/ggml-cuda/pad_reflect_1d.cu

+    const char * src0_ptr = (const char *)src0 + i3*nb03 + i2*nb02 + i1*nb01;
+    char * dst_ptr = (char *)dst + i3*nb3 + i2*nb2 + i1*nb1;
+
+    for (int64_t i0 = threadIdx.x; i0 < ne0; i0 += blockDim.x) {


This is going to produce correct results, but generally speaking you will get much better performance if each thread works on a single value instead of looping over ne0. However, it would also be fine to merge it as-is and maybe change this later if it ever becomes relevant for end-to-end performance.
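To make the suggestion concrete, here is a minimal sketch of a one-thread-per-element variant. It is not the code from this PR. The names `reflect_src_index`, `pad_reflect_1d_per_element`, and `HOST_DEVICE`, the parameter list, the launch geometry, and the assumption of float elements that are contiguous along dimension 0 are all hypothetical and chosen only for illustration. The kernel is guarded by `__CUDACC__` so that the index helper can also be compiled and checked with a plain C++ compiler.

```cpp
#include <cstdint>

// HOST_DEVICE is a hypothetical helper macro: it lets the index helper below
// compile both under nvcc and under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Map output column i0 to the source column it mirrors.
// p0 is the left padding and ne00 the source row length.
// Assumes both paddings are smaller than ne00.
HOST_DEVICE inline int64_t reflect_src_index(int64_t i0, int64_t p0, int64_t ne00) {
    const int64_t rel = i0 - p0;                    // position relative to the unpadded row
    if (rel < 0)     return -rel;                   // left pad: mirror around source index 0
    if (rel >= ne00) return 2 * (ne00 - 1) - rel;   // right pad: mirror around source index ne00 - 1
    return rel;                                     // interior: plain copy
}

#ifdef __CUDACC__
// One thread per output element instead of a strided loop over ne0.
// Assumed launch: grid = (ceil(ne0 / blockDim.x), ne1, ne2 * ne3).
__global__ void pad_reflect_1d_per_element(
        const void * src0, void * dst,
        const int64_t ne0, const int64_t ne00, const int64_t ne2, const int64_t p0,
        const int64_t nb01, const int64_t nb02, const int64_t nb03,
        const int64_t nb1,  const int64_t nb2,  const int64_t nb3) {
    const int64_t i0 = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i0 >= ne0) {
        return; // threads past the end of the row in the last block do nothing
    }
    const int64_t i1 = blockIdx.y;
    const int64_t i2 = blockIdx.z % ne2;
    const int64_t i3 = blockIdx.z / ne2;

    // Same byte-stride row addressing as in the reviewed diff.
    const char * src0_ptr = (const char *) src0 + i3*nb03 + i2*nb02 + i1*nb01;
    char       * dst_ptr  = (char *)       dst  + i3*nb3  + i2*nb2  + i1*nb1;

    // Assumes float elements that are contiguous along dimension 0.
    ((float *) dst_ptr)[i0] = ((const float *) src0_ptr)[reflect_src_index(i0, p0, ne00)];
}
#endif
```

With this layout the x dimension of the grid covers ne0, so rows longer than one block are split across several blocks rather than walked by a loop. One caveat of the assumed launch geometry is that the y and z grid dimensions are limited to 65535, so very large ne1 or ne2 * ne3 would need a different mapping.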

Add Pad Reflect 1D CUDA support

12f5f7c

github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Jul 13, 2025

am17an requested a review from JohannesGaessler July 13, 2025 14:49

JohannesGaessler approved these changes Jul 15, 2025

Update ggml/src/ggml-cuda/pad_reflect_1d.cu

f908dce

Co-authored-by: Johannes Gäßler <[email protected]>
