[REQ] Reduce kernel cold-compile time by excluding unused C++ headers from JIT compilation #1017

### Description

Automatically determine which native features a kernel actually needs and skip unused C++ headers during JIT compilation, for both the CPU (Clang) and CUDA (NVRTC) paths.
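A minimal sketch of the selection step follows. It assumes codegen can record which builtins a kernel references. The builtin-to-feature table, the feature dependency table, and the header file names are hypothetical placeholders, not Warp's actual layout. The idea it illustrates is that the headers a kernel needs are the transitive closure of the features it touches.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical: which native feature each builtin belongs to.
# Builtins not listed here are assumed to need only the core header.
BUILTIN_FEATURES = {
    "mesh_query_point": "mesh",
    "volume_sample": "volume",
    "tile_load": "tile",
    "noise": "noise",
}

# Hypothetical: header per feature. Dict order doubles as include order,
# so headers that other headers depend on are listed first.
FEATURE_HEADERS = {
    "mat": "mat.h",
    "float16": "float16.h",
    "mesh": "mesh.h",
    "volume": "volume.h",
    "tile": "tile.h",
    "noise": "noise.h",
}

# Hypothetical: features that a feature's header depends on.
FEATURE_DEPS = {
    "mesh": {"mat"},
    "tile": {"mat"},
}


def required_features(used: Iterable[str]) -> set[str]:
    """Return the transitive closure of the features a kernel uses."""
    needed: set[str] = set()
    stack = list(used)
    while stack:
        feature = stack.pop()
        if feature in needed:
            continue
        needed.add(feature)
        stack.extend(FEATURE_DEPS.get(feature, ()))
    return needed


def headers_for_builtins(builtins: Iterable[str]) -> list[str]:
    """Return the optional headers to include for a kernel calling `builtins`."""
    used = {BUILTIN_FEATURES[b] for b in builtins if b in BUILTIN_FEATURES}
    needed = required_features(used)
    return [hdr for feat, hdr in FEATURE_HEADERS.items() if feat in needed]


print(headers_for_builtins(["mesh_query_point", "sin"]))  # ['mat.h', 'mesh.h']
print(headers_for_builtins(["sin"]))  # []
```

Here an unrecognized builtin falls back to the core header only; a more conservative variant would include every optional header whenever it sees a builtin it cannot classify.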

### Context

Investigating `bfloat16` support for Warp showed that the new headers increased CPU cold-compile times by ~60%, even for kernels that never use `bfloat16`. The cause is that `builtin.h` unconditionally includes all native headers (mesh, volume, tile, noise, mat, float16 adjoint instantiations, etc.), so every kernel pays the full parsing cost regardless of which features it actually uses.

For example, a kernel that performs a single scalar assignment takes ~1.6s to cold-compile on CPU, and the vast majority of that time is spent parsing headers rather than compiling the kernel itself.
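One way to keep `builtin.h` as the single entry point while letting a kernel skip headers is to have the generated source define opt-out macros before including it. The sketch below assumes hypothetical `WP_NO_<FEATURE>` guards inside `builtin.h` (shown in the comment) and a hypothetical feature list; it only builds the source prelude string that would be prepended to the code handed to Clang or NVRTC.

```python
# Hypothetical guards inside builtin.h, one per optional header:
#
#   #ifndef WP_NO_MESH
#   #include "mesh.h"
#   #endif
#
# With no macros defined, builtin.h includes everything, as it does today.

OPTIONAL_FEATURES = ["mat", "float16", "mesh", "volume", "tile", "noise"]


def make_prelude(needed: set) -> str:
    """Return the lines to prepend to a kernel's generated C++ source."""
    # Emit one opt-out macro for every optional feature the kernel does not need.
    lines = [
        f"#define WP_NO_{feature.upper()} 1"
        for feature in OPTIONAL_FEATURES
        if feature not in needed
    ]
    lines.append('#include "builtin.h"')
    return "\n".join(lines) + "\n"


print(make_prelude({"mat", "mesh"}))
```

Making the guards opt-out means any source that includes `builtin.h` without the prelude keeps today's behavior. The trade-off is that a feature missed by detection surfaces as a compile error rather than as silently slower compilation.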

As more features are added to Warp, this problem will only get worse. Each new header inclusion raises the compile-time floor for every kernel.
