feat(native): port CUDA extractor to Rust #1099
Adds tree-sitter-cuda dependency and native extractor matching the WASM-side behavior for CUDA symbol, import, and call extraction. Part of #1071
Claude finished @carlos-alm's task
CUDA Extractor PR Review - Final Assessment
Root Cause Analysis
Problem Being Solved: Porting CUDA extraction from WASM to the native Rust engine to eliminate the WASM fallback for .cu/.cuh files and achieve true engine parity.
Backlog Compliance
Critical Concerns
None identified. Implementation follows established patterns precisely.
Code Quality Assessment
Architecture: Excellent mirroring of the existing TypeScript extractor, with proper C++ rule ID delegation for complexity/CFG.
Final Recommendation
Conditions:
This is exactly the kind of PR the project needs - systematic completion of native engine language support with careful attention to parity and testing.
Greptile Summary
This PR ports the CUDA extractor from the WASM/TypeScript side to a native Rust implementation, adding

Confidence Score: 5/5
Safe to merge; the extraction logic is a faithful port of the existing JS extractor with no regressions and two previously-flagged correctness gaps already resolved.

All changed paths are additive (new language support, new file), the wiring changes are mechanical and tested, and the extractor logic closely mirrors the battle-tested C++ extractor. The only notable gap (struct inheritance) pre-exists on the JS side and is explicitly in scope as a faithful port.

crates/codegraph-core/src/extractors/cuda.rs: the struct-specifier handler is the one place that diverges from cpp.rs and could be brought to full parity.

Important Files Changed
Sequence Diagram
sequenceDiagram
participant FC as file_collector
participant PR as parser_registry
participant CE as CudaExtractor
participant FS as FileSymbols
FC->>PR: extension ".cu" / ".cuh"
PR-->>FC: LanguageKind::Cuda
PR->>CE: tree_sitter_cuda::LANGUAGE → parse tree
CE->>FS: walk_tree(match_cuda_node) — pass 1: symbols/calls/imports
CE->>FS: walk_ast_nodes_with_config(CUDA_AST_CONFIG) — pass 2: AST nodes
CE->>FS: walk_tree(match_cuda_type_map) — pass 3: type_map
FS-->>CE: "FileSymbols { definitions, calls, imports, classes, type_map }"
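The first two arrows in the diagram (extension in, `LanguageKind::Cuda` out) can be illustrated with a minimal sketch. The enum and function below are hypothetical stand-ins written for this illustration, not the project's actual `parser_registry` API; only the `.cu`/`.cuh` to CUDA mapping comes from the PR, and all other languages are omitted.

```rust
use std::path::Path;

/// Hypothetical stand-in for the project's `LanguageKind`; only the CUDA
/// variant is modeled here.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum LanguageKindSketch {
    Cuda,
}

/// Map a file path to a language by extension (illustrative only).
/// Returns `None` for paths without an extension or with an unknown one.
pub fn language_for_path(path: &str) -> Option<LanguageKindSketch> {
    match Path::new(path).extension()?.to_str()? {
        // `.cu` sources and `.cuh` headers are both routed to CUDA.
        "cu" | "cuh" => Some(LanguageKindSketch::Cuda),
        _ => None,
    }
}
```

Under these assumptions, `language_for_path("kernels/add.cu")` yields `Some(LanguageKindSketch::Cuda)`, while a path such as `main.py` yields `None`.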
Reviews (3): Last reviewed commit: "fix: resolve merge conflicts with main (..."
        let text = node_text(&child, source);
        if kind == "storage_class_specifier" || kind == "attribute_specifier" {
            if is_cuda_qualifier(text) {
                qualifiers.push(text.to_string());
            }
        } else if is_cuda_qualifier(text) {
            qualifiers.push(text.to_string());
        }
    }
    qualifiers
}

// ── Declarator helpers (mirror cpp.rs) ──────────────────────────────────────
Missing type-map walk present in C++ extractor
cpp.rs performs a third walk_tree pass with match_cpp_type_map that populates symbols.type_map with variable-to-type bindings (from declaration and parameter_declaration nodes). The CUDA extractor only does two passes and never populates type_map. As a result, CUDA code like DeviceBuffer buf; buf.copy(src, n); will be unable to resolve the copy call to DeviceBuffer.copy — type-inferred call resolution that C++ files benefit from will not fire for any CUDA file in the native layer.
Fixed in 25f0adc. Added a third walk_tree pass with match_cuda_type_map that mirrors cpp.rs exactly — populates type_map from declaration and parameter_declaration nodes so receiver-typed calls like buf.copy(...) resolve to DeviceBuffer.copy for CUDA sources too. Added two new unit tests (populates_type_map_from_declarations, populates_type_map_from_parameters) to lock the behavior in.
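To make the effect of the type-map pass concrete, here is a minimal sketch of receiver-typed call resolution. It deliberately avoids tree-sitter: `Declaration`, `build_type_map`, and `resolve_call` are hypothetical names invented for this illustration, and the real `match_cuda_type_map` walks `declaration` and `parameter_declaration` nodes rather than a flat slice.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a parsed declaration such as
/// `DeviceBuffer buf;` (the real extractor reads this from syntax nodes).
pub struct Declaration<'a> {
    pub type_name: &'a str,
    pub var_name: &'a str,
}

/// Build a variable-name -> type-name map from declarations.
pub fn build_type_map<'a>(decls: &[Declaration<'a>]) -> HashMap<&'a str, &'a str> {
    decls.iter().map(|d| (d.var_name, d.type_name)).collect()
}

/// Resolve `receiver.method(...)` to `Type.method` when the receiver's type
/// is known; return `None` when the map has no binding for the receiver.
pub fn resolve_call(
    type_map: &HashMap<&str, &str>,
    receiver: &str,
    method: &str,
) -> Option<String> {
    type_map.get(receiver).map(|ty| format!("{ty}.{method}"))
}
```

With a single declaration binding `buf` to `DeviceBuffer`, `resolve_call(&map, "buf", "copy")` produces `DeviceBuffer.copy`. With an empty map, which is roughly the situation before the fix, the same call returns `None`.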
    }
}

fn match_cuda_node(node: &Node, source: &[u8], symbols: &mut FileSymbols, _depth: usize) {
    match node.kind() {
        "function_definition" => handle_cuda_function_definition(node, source, symbols),
        "class_specifier" => handle_cuda_class_specifier(node, source, symbols),
        "struct_specifier" => handle_cuda_struct_specifier(node, source, symbols),
        "enum_specifier" => handle_cuda_enum_specifier(node, source, symbols),
        "namespace_definition" => handle_cuda_namespace_definition(node, source, symbols),
        "type_definition" => handle_cuda_type_definition(node, source, symbols),
        "preproc_include" => handle_cuda_preproc_include(node, source, symbols),
        "call_expression" => handle_cuda_call_expression(node, source, symbols),
        _ => {}
    }
}

#[cfg(test)]
mod tests {
    use super::*;
Import names retain file extension, diverging from C++ extractor behavior
The C++ extractor strips .h/.hpp from the import name, so #include <cuda_runtime.h> produces names: ["cuda_runtime"]. The CUDA extractor keeps the full filename, producing names: ["cuda_runtime.h"]. Any cInclude resolution logic that matches against the stripped name will fail to link CUDA header includes. The unit test for this handler checks only source, not names, so the divergence is currently invisible to the test suite.
Fixed in 25f0adc. handle_cuda_preproc_include now strips .cuh/.hpp/.h from the import name, matching the native C/C++ extractors so cInclude resolution links CUDA headers consistently. Strengthened extracts_include_with_c_include_flag to assert the stripped names (cuda_runtime, mylib) so future divergence will fail the test suite.
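The extension stripping described in this thread can be sketched as a small helper. `import_name` is a hypothetical name used only for this illustration; the actual logic lives inside `handle_cuda_preproc_include` and may differ in details such as how directory components are handled.

```rust
/// Strip a trailing `.cuh`, `.hpp`, or `.h` from an include name.
/// Names with any other extension (or none) are returned unchanged.
/// Illustrative sketch only; not the extractor's actual code.
pub fn import_name(include: &str) -> &str {
    [".cuh", ".hpp", ".h"]
        .iter()
        // `strip_suffix` returns `Some(rest)` only when the suffix matches.
        .find_map(|ext| include.strip_suffix(*ext))
        .unwrap_or(include)
}
```

Under this sketch, `cuda_runtime.h` becomes `cuda_runtime` and `mylib.cuh` becomes `mylib`, matching the names the strengthened test is said to assert, while a name like `kernel.cu` passes through untouched.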
Codegraph Impact Analysis
40 functions changed → 24 callers affected across 2 files
… in CUDA

CUDA files now populate `type_map` from declarations and parameter declarations via a third walk pass, matching `cpp.rs`. Without it, receiver-typed calls like `buf.copy(...)` could not resolve to `DeviceBuffer.copy` for CUDA sources in the native layer.

`#include` import names also now drop the trailing `.cuh`/`.hpp`/`.h` extension so `cInclude` resolution links CUDA headers consistently with the native C/C++ extractors.

Strengthens the include test to cover the stripped names and adds two `type_map` tests.
Summary
- Adds the `tree-sitter-cuda 0.21` dependency and a new native CUDA extractor at `crates/codegraph-core/src/extractors/cuda.rs`, mirroring the WASM-side `src/extractors/cuda.ts`.
- Wires CUDA through `LanguageKind`, `extract_symbols_with_opts`, `file_collector` extensions (`.cu`/`.cuh`), the `CUDA_AST_CONFIG`, and the `cuda` entries in `AST_TYPE_MAPS`/`AST_STRING_CONFIGS`.
- Adds the `.cu`/`.cuh` extensions to `NATIVE_SUPPORTED_EXTENSIONS` and removes `.cu` from the `unsupported-by-native` classification test, since CUDA is now natively supported.

The extractor reuses the C++ rule id for complexity/CFG (the CUDA grammar exposes the same C++ control-flow node types) and emits `__global__`/`__device__`/`__host__`/`__shared__`/`__constant__` qualifiers as `decorators` on function/method definitions.

Part of #1071
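As a rough illustration of how the listed qualifiers could end up as decorators, the sketch below filters specifier tokens against the five qualifiers named in the summary. The function names carry a `_sketch` suffix because their bodies are assumptions; the extractor's real `is_cuda_qualifier` is referenced in the diff but its implementation is not shown here.

```rust
/// The five qualifiers named in the PR summary.
const CUDA_QUALIFIERS: [&str; 5] = [
    "__global__",
    "__device__",
    "__host__",
    "__shared__",
    "__constant__",
];

/// Assumed behavior of a qualifier check: exact match against the list above.
pub fn is_cuda_qualifier_sketch(text: &str) -> bool {
    CUDA_QUALIFIERS.iter().any(|q| *q == text)
}

/// Keep only the CUDA qualifiers from a function's specifier tokens,
/// preserving source order, so they can be reported as decorators.
pub fn collect_decorators_sketch<'a>(tokens: &[&'a str]) -> Vec<&'a str> {
    tokens
        .iter()
        .copied()
        .filter(|t| is_cuda_qualifier_sketch(t))
        .collect()
}
```

For a function declared `__host__ inline __device__`, this sketch would keep `__host__` and `__device__` and drop `inline`.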
Test plan
- `cargo build --release -p codegraph-core` succeeds
- `cargo test -p codegraph-core --lib` passes (194/194, including 10 new CUDA extractor tests)
- `vitest run tests/parsers/cuda.test.ts` passes (5/5)
- `vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t "cuda"` passes (5/5)