Add debug wrapper functions for CUDA/cuBLAS/cuDNN API calls and kernel launches #1
base: cuda-backend-graph-support
Conversation
Co-authored-by: Menkib64 <[email protected]>
@copilot Add logging to all kernel launches in src/neural/backends/cuda. Logging should be conditionally enabled using CUDA_WRAPPER_DEBUG define.
- Added CUDA_KERNEL_LAUNCH_LOG macro to cuda_wrapper.h
- Macro logs kernel name, grid dimensions, block dimensions, shared memory, and stream
- Added logging before all 51 kernel launches in common_kernels.cu and fp16_kernels.cu
- Logging is conditionally enabled via CUDA_WRAPPER_DEBUG define (default: disabled)

Co-authored-by: Menkib64 <[email protected]>
Added kernel launch logging for all 51 CUDA kernel launches in src/neural/backends/cuda (34 in common_kernels.cu + 17 in fp16_kernels.cu). The logging is controlled by the CUDA_WRAPPER_DEBUG define and logs the kernel name, grid and block dimensions, shared memory size, and stream.
Commit: 10250a5
@copilot Make all wrappers use template types to fix ambiguous overload compiler errors.
- Converted cudaMalloc, cudaFree, cudaMemcpy, cudaMemcpyAsync, cudaMemset to use templates
- Converted cudaFuncSetAttribute to use template for function pointer
- Converted cublasGemmStridedBatchedEx to use templates for data pointers
- Converted cudnnConvolutionForward, cudnnConvolutionBiasActivationForward to use templates
- Converted cudnnActivationForward, cudnnAddTensor to use templates
- Templates allow proper type deduction and avoid ambiguous overload errors
- All templates forward to the original CUDA API with appropriate casts where needed

Co-authored-by: Menkib64 <[email protected]>
Converted all pointer-based wrappers to use template types to avoid ambiguous overload errors. Templates enable proper type deduction from call sites and forward correctly to the original CUDA API functions.
Commit: acf5cfa
Implements debug logging wrappers for all CUDA, cuBLAS, and cuDNN API calls AND kernel launches used in src/neural/backends/cuda to aid in debugging GPU operations.

Changes

New file: src/neural/backends/cuda/cuda_wrapper.h
- CUDA_KERNEL_LAUNCH_LOG macro for logging all kernel launches
- API wrappers in the lczero::cudnn_backend namespace
- Logs to LOGFILE
- CUDA_WRAPPER_DEBUG compile-time flag (default: 0 for zero overhead)

Modified: src/neural/backends/cuda/cuda_common.h
- #include "cuda_wrapper.h"

Modified: src/neural/backends/cuda/common_kernels.cu
Modified: src/neural/backends/cuda/fp16_kernels.cu

Implementation

Wrappers leverage C++ name lookup: code in lczero::cudnn_backend resolves to the namespace-scoped wrappers first, which then call the global API via ::cudaMalloc etc. to avoid recursion.

Template Design: Wrapper functions use C++ templates for pointer parameters (e.g., template <typename T> cudaMalloc(T** devPtr, size_t size)) to enable type deduction from call sites and avoid ambiguous overloads between void* and const void*.

Kernel launch logging is added before each kernel launch using the CUDA_KERNEL_LAUNCH_LOG macro, which logs the kernel name and execution configuration.

Example output when CUDA_WRAPPER_DEBUG=1:

All existing CUDA API calls and kernel launches in the backend transparently route through the wrappers without changes to calling code.