feat(native): port Julia extractor to Rust #1098
Adds tree-sitter-julia dependency and native extractor matching the WASM-side behavior for Julia symbol, import, and call extraction. Part of #1071
Claude finished @carlos-alm's task: Reviewing Julia Extractor Implementation

Todo List:

Root Cause Analysis
Problem Being Solved: This PR implements native Rust Julia parsing to replace/complement WASM-only support. This is part of the broader dual-engine architecture where codegraph provides both native (fast) and WASM (fallback) parsing support.
Why This Approach: Adding native Julia support follows the established pattern for other languages. Julia was previously WASM-only (mentioned in the "WASM-only languages" list), and this PR ports it to the native Rust engine for better performance.
Risk Assessment: Low risk. This is a pure addition with appropriate fallback mechanisms. The implementation follows established patterns from other language extractors.

Backlog Compliance

Configuration & Test Analysis
Configuration Changes:
Test Integrity: ✅ Strong test coverage:

Code Quality Assessment
Implementation Quality: ✅ Excellent
Rust Code Quality:
Parity with WASM: ✅ High fidelity
Architecture Compliance:

Minor Observations

Critical Concerns
None identified. This is a well-implemented addition that follows all established patterns.

Final Recommendation
Specific Strengths:
Greptile Summary
This PR ports the Julia symbol extractor from the WASM/TypeScript path to native Rust, adding

Confidence Score: 5/5
Safe to merge: the extractor is well-tested, all previously-flagged regressions are fixed, and the wiring changes are mechanical and exhaustive.

The new Julia extractor handles every documented CST shape correctly. The recursive find_base_name helper cleanly addresses the parameterized-type corner cases, and the qualified-name double-prefix bug is guarded in both handle_function_def and handle_assignment. The only remaining limitation is that a multi-module statement such as using Foo, Bar produces a single import record. This is a pre-existing characteristic that mirrors the WASM extractor and does not affect definition or call extraction.

In crates/codegraph-core/src/extractors/julia.rs, the handle_import multi-module path would benefit from a using Foo, Bar test.

Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[".jl file ingested"] --> B["LanguageKind::Julia\nparser_registry.rs"]
B --> C["tree-sitter-julia parse Tree"]
C --> D["JuliaExtractor.extract()"]
D --> E["walk_julia\ncurrent_module threading"]
E --> G["handle_module_def\npush Definition + set next_module"]
E --> H["handle_function_def\npush Definition + complexity/CFG"]
E --> I["handle_assignment\nshort-form functions"]
E --> J["handle_struct_def\nfind_base_name + ClassRelation"]
E --> K["handle_abstract_def\nfind_base_name"]
E --> L["handle_macro_def\npush Definition @name"]
E --> M["handle_import\npush Import source+names"]
E --> N["handle_call\npush Call name+receiver"]
E --> E
D --> O["walk_ast_nodes_with_config\nJULIA_AST_CONFIG"]
D --> P["FileSymbols output"]
Reviews (6): Last reviewed commit: "fix: resolve merge conflicts with main (..."
            let last = stripped.rsplit('.').next().unwrap_or(stripped);
            if !last.is_empty() {
                names.push(last.to_string());
            }
        }
        "selected_import" => {
            // First identifier is the source module; the rest are imported names.
            let mut first = true;
            for j in 0..child.child_count() {
                let Some(part) = child.child(j) else { continue };
                if part.kind() == "identifier" {
                    let txt = node_text(&part, source).to_string();
                    if first {
                        if source_str.is_empty() {
                            source_str = txt.clone();
                        }
                        first = false;
                    } else {
                        names.push(txt);
                    }
                }
            }
        }
Garbled name for parameterized generic abstract types
When type_head exists but neither find_child(&th, "identifier") nor find_child(&bin, "identifier") succeeds — which happens for any parameterized abstract type like abstract type AbstractVector{T} <: AbstractArray{T,1} end — unwrap_or(th) falls back to the type_head node itself. node_text(&th, source) then returns the full raw text "AbstractVector{T} <: AbstractArray{T,1}", which gets pushed as a definition name. The TS counterpart (handleAbstractDef) simply returns early when no identifier is found, emitting nothing rather than a garbage name. This pollutes the code graph with a nonsensical definition key for any parameterized abstract type.
Fixed in a4a8b5c. handle_abstract_def now recurses through binary_expression, parametrized_type_expression, parameterized_identifier, type_parameter_list, and type_argument_list wrappers to find the base-name identifier, and returns early (matching the TS extractor) when none is found — no more falling back to unwrap_or(th) and emitting the raw Name{T} <: Super{T,1} text. Added a regression test asserting that abstract type AbstractVector{T} <: AbstractArray{T,1} end records only AbstractVector.
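The recursive lookup described in this fix can be sketched as follows. This is a minimal illustration, not the code from the PR: CstNode is a hypothetical stand-in for tree_sitter::Node, and the tree built by sample_abstract_type_head is an assumed CST shape for abstract type AbstractVector{T} <: AbstractArray{T,1} end. Only the wrapper kind names are taken from the comment above.

```rust
// Hypothetical stand-in for a tree-sitter node (assumption: the real
// extractor walks tree_sitter::Node values instead).
#[derive(Debug, Clone)]
struct CstNode {
    kind: &'static str,
    text: &'static str,
    children: Vec<CstNode>,
}

impl CstNode {
    fn leaf(kind: &'static str, text: &'static str) -> Self {
        CstNode { kind, text, children: Vec::new() }
    }

    fn branch(kind: &'static str, children: Vec<CstNode>) -> Self {
        CstNode { kind, text: "", children }
    }
}

// Wrapper kinds listed in the fix description.
const WRAPPERS: [&str; 5] = [
    "binary_expression",
    "parametrized_type_expression",
    "parameterized_identifier",
    "type_parameter_list",
    "type_argument_list",
];

/// Depth-first, left-to-right search for the first identifier, descending
/// only into known wrapper kinds. Returns None when no identifier exists,
/// so the caller can skip emission instead of using the raw node text.
fn find_base_name(node: &CstNode) -> Option<&'static str> {
    for child in &node.children {
        if child.kind == "identifier" {
            return Some(child.text);
        }
        if WRAPPERS.contains(&child.kind) {
            if let Some(name) = find_base_name(child) {
                return Some(name);
            }
        }
    }
    None
}

/// Assumed CST shape for `AbstractVector{T} <: AbstractArray{T,1}`.
fn sample_abstract_type_head() -> CstNode {
    let lhs = CstNode::branch(
        "parametrized_type_expression",
        vec![
            CstNode::leaf("identifier", "AbstractVector"),
            CstNode::branch("type_argument_list", vec![CstNode::leaf("identifier", "T")]),
        ],
    );
    let rhs = CstNode::branch(
        "parametrized_type_expression",
        vec![
            CstNode::leaf("identifier", "AbstractArray"),
            CstNode::branch("type_argument_list", vec![CstNode::leaf("identifier", "T")]),
        ],
    );
    let bin = CstNode::branch(
        "binary_expression",
        vec![lhs, CstNode::leaf("operator", "<:"), rhs],
    );
    CstNode::branch("type_head", vec![bin])
}
```

Under these assumptions, find_base_name on the sample tree yields AbstractVector, and a type_head with no identifier yields None, which corresponds to the early return that matches the TS extractor.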
Codegraph Impact Analysis
38 functions changed → 24 callers affected across 2 files
Native Julia support landed in this PR but the JS-side mirror of the Rust LanguageKind enum was not updated, so the drift guard in tests/parsers/native-drop-classification.test.ts (and the WASM-only bucket in classifyNativeDrops) flagged .jl as missing. Add .jl to the set and drop it from the WASM-only test fixture.
handle_abstract_def previously fell back to the type_head node itself
when no plain identifier was found, dumping the full raw text
"Name{T} <: Super{T,1}" as a definition name for any parameterized
generic abstract type. The TS counterpart returns early on no-name; the
native port should match.
Recurse into wrapper shapes (binary_expression, parameterized identifier,
type_parameter_list, type_argument_list) to locate the base identifier,
and skip emission when none is found. Adds a regression test asserting
the base name "AbstractVector" for
"abstract type AbstractVector{T} <: AbstractArray{T,1} end".
Pushed fixes:
Also merged Local verification:
    let (name_node, supertype): (Node, Option<Node>) = if let Some(bin) =
        find_child(&type_head, "binary_expression")
    {
        // First identifier is the struct name, last identifier (after `<:`) is the supertype.
        let mut name_id: Option<Node> = None;
        let mut super_id: Option<Node> = None;
        for i in 0..bin.child_count() {
            if let Some(c) = bin.child(i) {
                if c.kind() == "identifier" {
                    if name_id.is_none() {
                        name_id = Some(c);
                    } else {
                        super_id = Some(c);
                    }
                }
            }
        }
        match name_id {
            Some(n) => (n, super_id),
            None => return,
        }
    } else if let Some(id) = find_child(&type_head, "identifier") {
        (id, None)
    } else {
        return;
    };
Parameterized generic structs silently dropped
handle_struct_def only looks for direct identifier children of type_head and direct identifier children of any binary_expression inside it. For a parameterized struct like struct Vec{T} or struct Vec{T} <: AbstractArray{T,1}, tree-sitter-julia wraps the name in a parameterized_identifier node — so find_child(&type_head, "identifier") returns None and bin.child(i).kind() == "identifier" never matches. Both forms fall through to return with no definition emitted. The same CST shapes were exactly what motivated the find_abstract_name recursive helper added for handle_abstract_def — handle_struct_def needs the same treatment to extract the base-name identifier and, separately, walk to the first identifier of the RHS for the supertype.
Fixed in 47b9c1f. handle_struct_def now uses the same recursive find_base_name helper as handle_abstract_def, which walks into parameterized_identifier, parametrized_type_expression, type_parameter_list, type_argument_list, and nested binary_expression wrappers — so struct Vec{T} and struct Vec{T} <: AbstractArray{T,1} both record Vec as the base name and AbstractArray as the supertype. Added extracts_parameterized_struct_base_name to lock the behavior in.
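One way to picture the name and supertype split is the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: CstNode is a hypothetical stand-in for tree_sitter::Node, the "operator" kind for <: is assumed, and first_identifier searches all descendants instead of only the wrapper kinds the real helper names.

```rust
// Hypothetical stand-in for a tree-sitter node (assumption: the real
// extractor operates on tree_sitter::Node).
#[derive(Debug, Clone)]
struct CstNode {
    kind: &'static str,
    text: &'static str,
    children: Vec<CstNode>,
}

impl CstNode {
    fn leaf(kind: &'static str, text: &'static str) -> Self {
        CstNode { kind, text, children: Vec::new() }
    }

    fn branch(kind: &'static str, children: Vec<CstNode>) -> Self {
        CstNode { kind, text: "", children }
    }
}

/// First identifier in pre-order, looking through any wrapper node.
fn first_identifier(node: &CstNode) -> Option<&'static str> {
    if node.kind == "identifier" {
        return Some(node.text);
    }
    node.children.iter().find_map(first_identifier)
}

/// Returns (base name, optional supertype) for a struct's type_head,
/// or None when no base name can be located.
fn struct_name_and_supertype(
    type_head: &CstNode,
) -> Option<(&'static str, Option<&'static str>)> {
    match type_head.children.iter().find(|c| c.kind == "binary_expression") {
        Some(bin) => {
            // Operands are every child except the `<:` operator token
            // (the "operator" kind name is an assumption of this sketch).
            let mut operands = bin.children.iter().filter(|c| c.kind != "operator");
            let name = operands.next().and_then(first_identifier)?;
            let supertype = operands.next().and_then(first_identifier);
            Some((name, supertype))
        }
        // No `<:` clause: the whole head is the (possibly parameterized) name.
        None => first_identifier(type_head).map(|name| (name, None)),
    }
}
```

Treating the left and right operands separately keeps a type parameter such as T from being mistaken for the supertype, which is the failure mode of taking "the last identifier" among direct children.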
        "selected_import" => {
            // First identifier is the source module; the rest are imported names.
            let mut first = true;
            for j in 0..child.child_count() {
                let Some(part) = child.child(j) else { continue };
                if part.kind() == "identifier" {
                    let txt = node_text(&part, source).to_string();
                    if first {
                        if source_str.is_empty() {
                            source_str = txt.clone();
                        }
                        first = false;
                    } else {
                        names.push(txt);
                    }
                }
            }
        }
Qualified-module selected_import produces a wrong import record
The selected_import inner loop only matches part.kind() == "identifier", but for import Foo.Bar: baz the module part is a scoped_identifier (Foo.Bar), not an identifier. The loop skips it, so first is still true when baz is encountered — source_str becomes "baz" and no names are pushed. The emitted import record has source = "baz" with names = ["baz"], making the edge (module→imported-name) completely wrong. This pattern is common in Julia codebases (e.g. import LinearAlgebra.BLAS: gemm). Handling "scoped_identifier" alongside "identifier" inside the selected_import branch, using rsplit('.').next() for the display name as the outer loop already does, would fix it.
Fixed in 47b9c1f. The selected_import inner loop now matches scoped_identifier alongside identifier, and uses rsplit('.').next() for the imported-name display (mirroring the outer loop). import LinearAlgebra.BLAS: gemm now records source="LinearAlgebra.BLAS", names=["gemm"]. Added selected_import_handles_qualified_module to lock the behavior in.
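The corrected loop can be sketched as below. This is a simplified illustration and not the code from the PR: the children of the selected_import node are represented as hypothetical (kind, text) pairs instead of tree_sitter::Node values, and the sketch ignores the case where an outer loop has already set the source.

```rust
/// Hypothetical flattened view of one child of a `selected_import` node:
/// (node kind, source text).
type Part<'a> = (&'a str, &'a str);

/// Splits the children of a `selected_import` into (source module, imported names).
/// Both `identifier` and `scoped_identifier` count as name-bearing parts, so a
/// qualified module such as `LinearAlgebra.BLAS` is no longer skipped.
fn parse_selected_import(parts: &[Part<'_>]) -> Option<(String, Vec<String>)> {
    let mut source: Option<String> = None;
    let mut names: Vec<String> = Vec::new();

    for &(kind, text) in parts {
        if kind != "identifier" && kind != "scoped_identifier" {
            continue; // punctuation such as ":" or ","
        }
        if source.is_none() {
            // The first name-bearing part is the source module, kept fully qualified.
            source = Some(text.to_string());
        } else {
            // Imported names are recorded by their last dotted segment.
            let last = text.rsplit('.').next().unwrap_or(text);
            names.push(last.to_string());
        }
    }

    source.map(|s| (s, names))
}
```

For the parts of import LinearAlgebra.BLAS: gemm, this returns the source LinearAlgebra.BLAS with the single name gemm, matching the record described in the reply above.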
            line: start_line(node),
        });
    }

    symbols.definitions.push(Definition {
        name: struct_name,
        kind: "struct".to_string(),
        line: start_line(node),
        end_line: Some(end_line(node)),
        decorators: None,
Module double-prefix for qualified short-form method extensions
When func_name_node is a scoped_identifier — as it is for Foo.bar(x, y) = x + y (extending a method from another module) — base already contains the full qualified name "Foo.bar". Prepending current_module then produces "Outer.Foo.bar" instead of the correct "Foo.bar". The guard if base.contains('.') (skip the prefix) would fix both handle_assignment and the symmetric path in handle_function_def (line 186) where call_sig.child(0) can be a scoped_identifier for function Base.show(io, x) ... end inside a module, yielding "Foo.Base.show".
Fixed in 47b9c1f. Both handle_function_def and handle_assignment now guard the module-prefix step with if !base.contains('.') — when the LHS is already a scoped_identifier (e.g. function Base.show ... end or Foo.bar(x, y) = x + y inside a module), the qualifier is preserved as-is rather than double-prefixed. Added qualified_short_form_method_does_not_double_prefix and qualified_function_def_does_not_double_prefix to lock the behavior in.
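The guard amounts to a small naming rule, sketched here in isolation. The function name qualify_name and its signature are hypothetical; only the !base.contains('.') check and the example names come from the discussion above.

```rust
/// Builds the recorded definition name. A base name that is already
/// qualified (contains a '.') is kept as-is; otherwise the enclosing
/// module, if any, is prepended.
fn qualify_name(current_module: Option<&str>, base: &str) -> String {
    match current_module {
        Some(module) if !base.contains('.') => format!("{}.{}", module, base),
        _ => base.to_string(),
    }
}
```

With this rule, bar inside module Outer becomes Outer.bar, while Base.show inside module Foo stays Base.show instead of turning into Foo.Base.show.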
Summary
- tree-sitter-julia dependency and extractors/julia.rs native extractor wired through LanguageKind::Julia
- [...] (using/import/selected_import)
- JULIA_AST_CONFIG (native) and matching JULIA_AST_TYPES / JULIA_STRING_CONFIG entries in src/ast-analysis/rules/index.ts
- .jl removed from change-detection WASM-only skip list; added to file_collector SUPPORTED_EXTENSIONS

Part of #1071
Test plan
- cargo build --release -p codegraph-core
- cargo test -p codegraph-core --lib: 195 passed (11 new Julia tests)
- npx vitest run tests/parsers/julia.test.ts: 6 passed
- npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t "julia": 5 passed