
feat(native): port CUDA extractor to Rust #1099

Merged: carlos-alm merged 5 commits into main from feat/1071-cuda-rust-extractor on May 12, 2026


@carlos-alm
Contributor

Summary

  • Adds tree-sitter-cuda 0.21 dependency and a new native CUDA extractor at crates/codegraph-core/src/extractors/cuda.rs, mirroring the WASM-side src/extractors/cuda.ts.
  • Wires CUDA through LanguageKind, extract_symbols_with_opts, file_collector extensions (.cu/.cuh), the CUDA_AST_CONFIG, and the cuda entries in AST_TYPE_MAPS/AST_STRING_CONFIGS.
  • Adds the .cu/.cuh extensions to NATIVE_SUPPORTED_EXTENSIONS and removes .cu from the unsupported-by-native classification test, since CUDA is now natively supported.

The extractor reuses the C++ rule id for complexity/CFG (the CUDA grammar exposes the same C++ control-flow node types) and emits __global__/__device__/__host__/__shared__/__constant__ qualifiers as decorators on function/method definitions.
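Here is a minimal std-only sketch of the qualifier-to-decorator step. It assumes the specifier tokens in front of a function declarator have already been read out of the tree as plain strings. `is_cuda_qualifier` has the same name as a function in the PR, but its body here is an assumed exact-match check. `cuda_decorators` is hypothetical.

```rust
/// Execution-space and memory-space qualifiers this PR emits as decorators.
const CUDA_QUALIFIERS: [&str; 5] = ["__global__", "__device__", "__host__", "__shared__", "__constant__"];

/// Assumed body: an exact match against the qualifier list.
fn is_cuda_qualifier(text: &str) -> bool {
    CUDA_QUALIFIERS.iter().any(|q| *q == text)
}

/// Hypothetical helper: keep only the CUDA qualifiers, in source order,
/// from the specifier tokens that precede a function declarator.
fn cuda_decorators(specifiers: &[&str]) -> Vec<String> {
    specifiers
        .iter()
        .copied()
        .filter(|tok| is_cuda_qualifier(tok))
        .map(|tok| tok.to_string())
        .collect()
}
```

The filter skips non-qualifier specifiers such as `static` or `inline`. A `__host__ __device__` function therefore ends up with both decorators, in source order.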

Part of #1071

Test plan

  • cargo build --release -p codegraph-core succeeds
  • cargo test -p codegraph-core --lib passes (194/194, including 10 new CUDA extractor tests)
  • vitest run tests/parsers/cuda.test.ts passes (5/5)
  • vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t "cuda" passes (5/5)

Adds tree-sitter-cuda dependency and native extractor matching
the WASM-side behavior for CUDA symbol, import, and call extraction.

Part of #1071
@claude

claude Bot commented May 11, 2026

Claude finished @carlos-alm's task.


CUDA Extractor PR Review - Final Assessment

Root Cause Analysis

Problem Being Solved: Porting CUDA extraction from WASM to the native Rust engine, to eliminate the WASM fallback for .cu/.cuh files and achieve true engine parity.
Why This Approach: Adds native extraction for an already supported language rather than hiding symptoms or relaxing rules.
Risk Assessment: Low. Purely additive functionality that mirrors established patterns.

Backlog Compliance

  • Zero-dep: ✓ (adds only the tree-sitter-cuda grammar, a tree-sitter ecosystem dependency, and builds on the existing AST/SQLite infrastructure)
  • Foundation-aligned: ✓ (fully compliant: local-only parsing with no cloud APIs; maintains the P1-P8 principles)
  • Problem-fit: 4 (directly improves agent accuracy by eliminating native→WASM fallback inconsistencies for CUDA codebases, reducing round-trips for AI coding assistants)
  • Breaking: No (purely additive; existing API contracts are unchanged)
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. Implementation follows established patterns precisely.

Code Quality Assessment

Architecture: Closely mirrors the existing TypeScript extractor, with proper delegation to the C++ rule ID for complexity/CFG
Integration: Complete wiring through all infrastructure layers (LanguageKind, file collector, extractor dispatch)
CUDA Features: Properly extracts __global__/__device__/__host__ qualifiers as decorators
Testing: 10 native Rust tests + existing WASM coverage + resolution benchmarks
Error Handling: Robust null checks and proper Rust ownership patterns

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE WITH CONDITIONS
  • Reasoning: High-quality implementation that directly addresses the engine parity gap. The mirroring approach keeps the native and WASM engines consistent. It loses one star only for the minor dependency addition, which is well justified for language-support completeness.

Conditions:

  1. Verify cargo tests pass before merging (as mentioned in test plan)
  2. Consider adding brief mention in CHANGELOG of CUDA native support

This is exactly the kind of PR the project needs: systematic completion of native engine language support, with careful attention to parity and testing.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR ports the CUDA extractor from the WASM/TypeScript side to a native Rust implementation. It adds tree-sitter-cuda 0.21 as a dependency and wires .cu/.cuh files through all the necessary registration points (LanguageKind, file_collector, parser_registry, NATIVE_SUPPORTED_EXTENSIONS, AST_TYPE_MAPS, AST_STRING_CONFIGS). This revision already resolves the issues flagged in earlier review rounds (the missing type_map walk and the import-name extension stripping).

  • New cuda.rs extractor: three-pass walker (symbol extraction → AST node config → type-map population) that mirrors cpp.rs, with CUDA-specific qualifier detection emitting __global__/__device__/__host__/__shared__/__constant__ as decorators on function definitions.
  • Infrastructure wiring: LanguageKind::Cuda added across parser registry, file collector, and extension dispatch; 10 new unit tests cover all major node kinds plus the type-map behavior.
  • One gap vs C++ extractor: handle_cuda_struct_specifier does not call extract_cuda_base_classes, so inheritance relationships from CUDA structs (e.g. struct Foo : Base {}) are not recorded — matching the JS extractor's behavior but diverging from cpp.rs.
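The three-pass structure in the first bullet can be illustrated with a toy tree. This is a sketch, not the PR's code. `Node`, `FileSymbols`, `walk_tree`, and `extract` are simplified stand-ins, since the real walker operates on tree-sitter nodes. The pass-2 kind list is hypothetical rather than the actual `CUDA_AST_CONFIG`. Declarations are modeled as a `[type, name]` pair of children.

```rust
use std::collections::HashMap;

/// Toy stand-in for a tree-sitter node; `text` holds the node's name or text.
struct Node {
    kind: &'static str,
    text: &'static str,
    children: Vec<Node>,
}

/// Simplified stand-in for FileSymbols (field set is an assumption).
#[derive(Default)]
struct FileSymbols {
    definitions: Vec<String>,
    ast_nodes: Vec<String>,
    type_map: HashMap<String, String>,
}

/// Depth-first pre-order walk that hands every node to a visitor.
fn walk_tree<F: Fn(&Node, &mut FileSymbols)>(node: &Node, symbols: &mut FileSymbols, visit: &F) {
    visit(node, &mut *symbols);
    for child in &node.children {
        walk_tree(child, symbols, visit);
    }
}

fn extract(root: &Node) -> FileSymbols {
    let mut symbols = FileSymbols::default();
    // Pass 1: symbols (here only function definitions).
    walk_tree(root, &mut symbols, &|n: &Node, s: &mut FileSymbols| {
        if n.kind == "function_definition" {
            s.definitions.push(n.text.to_string());
        }
    });
    // Pass 2: AST nodes whose kinds appear in a config table (hypothetical list).
    let ast_kinds = ["string_literal", "throw_statement"];
    walk_tree(root, &mut symbols, &|n: &Node, s: &mut FileSymbols| {
        if ast_kinds.contains(&n.kind) {
            s.ast_nodes.push(n.kind.to_string());
        }
    });
    // Pass 3: type_map bindings (variable -> type) from declarations.
    walk_tree(root, &mut symbols, &|n: &Node, s: &mut FileSymbols| {
        if n.kind == "declaration" && n.children.len() == 2 {
            s.type_map.insert(n.children[1].text.to_string(), n.children[0].text.to_string());
        }
    });
    symbols
}
```

Each pass re-walks the whole tree with its own visitor. This keeps the passes independent, at the cost of three traversals.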

Confidence Score: 5/5

Safe to merge; the extraction logic is a faithful port of the existing JS extractor, with no regressions and the two previously flagged correctness gaps already resolved.

All changed paths are additive (new language support, new file), the wiring changes are mechanical and tested, and the extractor logic closely mirrors the battle-tested C++ extractor. The only notable gap (struct inheritance) already exists on the JS side, so reproducing it is consistent with the PR's goal of a faithful port.

crates/codegraph-core/src/extractors/cuda.rs — the struct-specifier handler is the one place that diverges from cpp.rs and could be brought to full parity.
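For context on that gap: in the C++ extractor, the struct handler also records base classes. The sketch below is a rough, string-based approximation of the edges that `struct Foo : public Base, virtual Mixin {}` would contribute. `Extends` and `struct_base_edges` are hypothetical. The real helper (`extract_cuda_base_classes`) works on tree-sitter nodes, not text. This naive comma split does not handle template arguments that contain commas (e.g. `Base<T, U>`).

```rust
/// Hypothetical inheritance record; the PR's actual class-relation type is not shown.
#[derive(Debug, PartialEq)]
struct Extends {
    child: String,
    base: String,
}

/// Rough text-level approximation of base-class extraction for a struct.
/// `base_clause` is the text after the struct name, e.g. ": public Base, Mixin".
/// Access specifiers and `virtual` are dropped; the last remaining word is the base.
fn struct_base_edges(name: &str, base_clause: &str) -> Vec<Extends> {
    base_clause
        .trim_start()
        .trim_start_matches(':')
        .split(',')
        .filter_map(|spec| {
            spec.split_whitespace()
                .filter(|w| !matches!(*w, "public" | "protected" | "private" | "virtual"))
                .last()
                .map(|base| Extends {
                    child: name.to_string(),
                    base: base.to_string(),
                })
        })
        .collect()
}
```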

Important Files Changed

  • crates/codegraph-core/src/extractors/cuda.rs: New 642-line CUDA extractor; faithfully mirrors JS side with previously-flagged issues (type_map, include-name stripping) now addressed. Minor gap: struct specifier does not call extract_cuda_base_classes, so struct inheritance edges are not recorded (consistent with JS but diverges from C++ extractor).
  • crates/codegraph-core/src/extractors/helpers.rs: Adds CUDA_AST_CONFIG as an alias of CPP_AST_CONFIG; correct since CUDA shares C++ node vocabulary.
  • crates/codegraph-core/src/extractors/mod.rs: Wires CudaExtractor into extract_symbols_with_opts dispatch; straightforward addition.
  • crates/codegraph-core/src/parser_registry.rs: Adds Cuda variant to LanguageKind, .cu/.cuh extension mapping, and all() list; EXPECTED_LEN updated correctly to 26.
  • crates/codegraph-core/src/file_collector.rs: Adds "cu" and "cuh" to SUPPORTED_EXTENSIONS; consistent with LanguageKind mapping.
  • src/ast-analysis/rules/index.ts: Adds CUDA_AST_TYPES and CUDA_STRING_CONFIG as aliases to the C++ equivalents; correctly wired into both AST_TYPE_MAPS and AST_STRING_CONFIGS.
  • src/domain/parser.ts: Adds .cu and .cuh to NATIVE_SUPPORTED_EXTENSIONS; no issues.
  • tests/parsers/native-drop-classification.test.ts: Removes .cu from unsupported-by-native test set and adjusts count from 10 to 9; correct.
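The sketch below illustrates the extension wiring in parser_registry.rs and file_collector.rs. `LanguageKind` is cut down to two variants, whereas the real `all()` list has 26 entries after this PR. The C++ extension list and the function names are assumptions. Only the `.cu`/`.cuh` → `Cuda` mapping comes from this PR.

```rust
use std::path::Path;

/// Trimmed stand-in for LanguageKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LanguageKind {
    Cpp,
    Cuda,
}

/// Extension -> language lookup. Only the `cu`/`cuh` arm reflects this PR;
/// the C++ extensions listed here are illustrative assumptions.
fn language_for_extension(ext: &str) -> Option<LanguageKind> {
    match ext {
        "cpp" | "cc" | "hpp" => Some(LanguageKind::Cpp),
        "cu" | "cuh" => Some(LanguageKind::Cuda),
        _ => None,
    }
}

/// Hypothetical path helper: "kernels/reduce.cu" -> Some(Cuda).
fn language_for_path(path: &str) -> Option<LanguageKind> {
    let ext = Path::new(path).extension()?.to_str()?;
    language_for_extension(ext)
}
```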

Sequence Diagram

sequenceDiagram
    participant FC as file_collector
    participant PR as parser_registry
    participant CE as CudaExtractor
    participant FS as FileSymbols

    FC->>PR: extension ".cu" / ".cuh"
    PR-->>FC: LanguageKind::Cuda
    PR->>CE: tree_sitter_cuda::LANGUAGE → parse tree
    CE->>FS: walk_tree(match_cuda_node) — pass 1: symbols/calls/imports
    CE->>FS: walk_ast_nodes_with_config(CUDA_AST_CONFIG) — pass 2: AST nodes
    CE->>FS: walk_tree(match_cuda_type_map) — pass 3: type_map
    FS-->>CE: "FileSymbols { definitions, calls, imports, classes, type_map }"

Reviews (3): Last reviewed commit: "fix: resolve merge conflicts with main (..."

Comment on lines +69 to +81
        let text = node_text(&child, source);
        if kind == "storage_class_specifier" || kind == "attribute_specifier" {
            if is_cuda_qualifier(text) {
                qualifiers.push(text.to_string());
            }
        } else if is_cuda_qualifier(text) {
            qualifiers.push(text.to_string());
        }
    }
    qualifiers
}

// ── Declarator helpers (mirror cpp.rs) ──────────────────────────────────────
Contributor

P1: Missing type-map walk present in C++ extractor

cpp.rs performs a third walk_tree pass with match_cpp_type_map that populates symbols.type_map with variable-to-type bindings (from declaration and parameter_declaration nodes). The CUDA extractor only does two passes and never populates type_map. As a result, CUDA code like DeviceBuffer buf; buf.copy(src, n); will be unable to resolve the copy call to DeviceBuffer.copy — type-inferred call resolution that C++ files benefit from will not fire for any CUDA file in the native layer.
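To make the resolution path concrete: once the type-map pass has recorded `buf → DeviceBuffer`, the resolver can qualify a receiver call by looking up the receiver's type. The resolver below is a hypothetical sketch; `resolve_method_call` is not a function from the PR. It uses a plain `HashMap` for `type_map`.

```rust
use std::collections::HashMap;

/// Hypothetical resolver: qualify `receiver.method` with the receiver's declared
/// type, using the variable -> type bindings the type-map pass collects.
fn resolve_method_call(
    type_map: &HashMap<String, String>,
    receiver: &str,
    method: &str,
) -> Option<String> {
    type_map.get(receiver).map(|ty| format!("{ty}.{method}"))
}
```

Without the binding, the lookup yields nothing. That is the unresolved-call behavior the comment describes for CUDA files.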


Contributor Author

Fixed in 25f0adc. Added a third walk_tree pass with match_cuda_type_map that mirrors cpp.rs exactly — populates type_map from declaration and parameter_declaration nodes so receiver-typed calls like buf.copy(...) resolve to DeviceBuffer.copy for CUDA sources too. Added two new unit tests (populates_type_map_from_declarations, populates_type_map_from_parameters) to lock the behavior in.

Comment on lines +413 to +432
    }
}

fn match_cuda_node(node: &Node, source: &[u8], symbols: &mut FileSymbols, _depth: usize) {
    match node.kind() {
        "function_definition" => handle_cuda_function_definition(node, source, symbols),
        "class_specifier" => handle_cuda_class_specifier(node, source, symbols),
        "struct_specifier" => handle_cuda_struct_specifier(node, source, symbols),
        "enum_specifier" => handle_cuda_enum_specifier(node, source, symbols),
        "namespace_definition" => handle_cuda_namespace_definition(node, source, symbols),
        "type_definition" => handle_cuda_type_definition(node, source, symbols),
        "preproc_include" => handle_cuda_preproc_include(node, source, symbols),
        "call_expression" => handle_cuda_call_expression(node, source, symbols),
        _ => {}
    }
}

#[cfg(test)]
mod tests {
    use super::*;
Contributor

P1: Import names retain file extension, diverging from C++ extractor behavior

The C++ extractor strips .h/.hpp from the import name, so #include <cuda_runtime.h> produces names: ["cuda_runtime"]. The CUDA extractor keeps the full filename, producing names: ["cuda_runtime.h"]. Any cInclude resolution logic that matches against the stripped name will fail to link CUDA header includes. The unit test for this handler checks only source, not names, so the divergence is currently invisible to the test suite.


Contributor Author

Fixed in 25f0adc. handle_cuda_preproc_include now strips .cuh/.hpp/.h from the import name, matching the native C/C++ extractors so cInclude resolution links CUDA headers consistently. Strengthened extracts_include_with_c_include_flag to assert the stripped names (cuda_runtime, mylib) so future divergence will fail the test suite.
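Here is the stripping rule from the fix as a standalone sketch. `include_import_name` is a hypothetical name. It takes the include target with the `<...>`/`"..."` delimiters already removed. As an assumption of this sketch, it leaves any directory prefix untouched.

```rust
/// Drop one trailing `.cuh`, `.hpp`, or `.h` from an `#include` target;
/// names without one of those extensions pass through unchanged.
fn include_import_name(path: &str) -> &str {
    [".cuh", ".hpp", ".h"]
        .iter()
        .find_map(|ext| path.strip_suffix(*ext))
        .unwrap_or(path)
}
```

The order of the suffix list does not matter here, because none of the three extensions is a suffix of another.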

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Codegraph Impact Analysis

40 functions changed, 24 callers affected across 2 files

  • CudaExtractor.extract in crates/codegraph-core/src/extractors/cuda.rs:23 (0 transitive callers)
  • match_cuda_type_map in crates/codegraph-core/src/extractors/cuda.rs:46 (0 transitive callers)
  • is_cuda_qualifier in crates/codegraph-core/src/extractors/cuda.rs:102 (3 transitive callers)
  • extract_cuda_qualifiers in crates/codegraph-core/src/extractors/cuda.rs:117 (2 transitive callers)
  • unwrap_cuda_declarator in crates/codegraph-core/src/extractors/cuda.rs:139 (9 transitive callers)
  • extract_cuda_function_name in crates/codegraph-core/src/extractors/cuda.rs:162 (2 transitive callers)
  • extract_cuda_func_name_from_declarator in crates/codegraph-core/src/extractors/cuda.rs:167 (3 transitive callers)
  • extract_cuda_parameters in crates/codegraph-core/src/extractors/cuda.rs:182 (2 transitive callers)
  • extract_cuda_fields in crates/codegraph-core/src/extractors/cuda.rs:214 (3 transitive callers)
  • extract_cuda_enum_constants in crates/codegraph-core/src/extractors/cuda.rs:231 (2 transitive callers)
  • extract_cuda_base_classes in crates/codegraph-core/src/extractors/cuda.rs:251 (2 transitive callers)
  • handle_cuda_function_definition in crates/codegraph-core/src/extractors/cuda.rs:284 (1 transitive caller)
  • handle_cuda_class_specifier in crates/codegraph-core/src/extractors/cuda.rs:320 (1 transitive caller)
  • handle_cuda_struct_specifier in crates/codegraph-core/src/extractors/cuda.rs:341 (1 transitive caller)
  • handle_cuda_enum_specifier in crates/codegraph-core/src/extractors/cuda.rs:361 (1 transitive caller)
  • handle_cuda_namespace_definition in crates/codegraph-core/src/extractors/cuda.rs:377 (1 transitive caller)
  • handle_cuda_type_definition in crates/codegraph-core/src/extractors/cuda.rs:392 (1 transitive caller)
  • handle_cuda_preproc_include in crates/codegraph-core/src/extractors/cuda.rs:422 (1 transitive caller)
  • handle_cuda_call_expression in crates/codegraph-core/src/extractors/cuda.rs:449 (1 transitive caller)
  • match_cuda_node in crates/codegraph-core/src/extractors/cuda.rs:478 (0 transitive callers)

… in CUDA

CUDA files now populate `type_map` from declarations and parameter
declarations via a third walk pass, matching `cpp.rs`. Without it,
receiver-typed calls like `buf.copy(...)` could not resolve to
`DeviceBuffer.copy` for CUDA sources in the native layer.

`#include` import names also now drop the trailing `.cuh`/`.hpp`/`.h`
extension so `cInclude` resolution links CUDA headers consistently with
the native C/C++ extractors. Strengthens the include test to cover the
stripped names and adds two type_map tests.
@carlos-alm
Contributor Author

@greptileai

carlos-alm merged commit 06aaa85 into main on May 12, 2026
27 checks passed
carlos-alm deleted the feat/1071-cuda-rust-extractor branch on May 12, 2026 at 06:15
github-actions bot locked and limited conversation to collaborators on May 12, 2026