
feat(native): port Julia extractor to Rust #1098

Merged
carlos-alm merged 8 commits into main from feat/1071-julia-rust-extractor
May 12, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Add tree-sitter-julia dependency and extractors/julia.rs native extractor wired through LanguageKind::Julia
  • Mirrors the WASM-side behavior (function, short-form, module-prefixed names, struct/abstract/macro defs, qualified calls, using/import/selected_import)
  • Adds JULIA_AST_CONFIG (native) and matching JULIA_AST_TYPES / JULIA_STRING_CONFIG entries in src/ast-analysis/rules/index.ts
  • .jl removed from change-detection WASM-only skip list; added to file_collector SUPPORTED_EXTENSIONS

Part of #1071

Test plan

  • cargo build --release -p codegraph-core
  • cargo test -p codegraph-core --lib — 195 passed (11 new Julia tests)
  • npx vitest run tests/parsers/julia.test.ts — 6 passed
  • npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t "julia" — 5 passed

Adds tree-sitter-julia dependency and native extractor matching
the WASM-side behavior for Julia symbol, import, and call extraction.

Part of #1071
@claude

claude Bot commented May 11, 2026

Claude finished @carlos-alm's task


Reviewing Julia Extractor Implementation

Todo List:

  • Analyze root cause and problem justification
  • Check for configuration changes that might mask issues
  • Verify test integrity and coverage
  • Review Julia extractor implementation quality
  • Check compliance with zero-dep and foundation principles
  • Assess breaking changes and API compatibility
  • Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: This PR implements native Rust Julia parsing to replace/complement WASM-only support. This is part of the broader dual-engine architecture where codegraph provides both native (fast) and WASM (fallback) parsing support.

Why This Approach: Adding native Julia support follows the established pattern for other languages. Julia was previously WASM-only (mentioned in the "WASM-only languages" list), and this PR ports it to the native Rust engine for better performance.

Risk Assessment: Low risk. This is a pure addition with appropriate fallback mechanisms. The implementation follows established patterns from other language extractors.

Backlog Compliance

  • Zero-dep: ✓ — Only adds tree-sitter-julia = "0.23" which follows the same pattern as other tree-sitter grammar deps
  • Foundation-aligned: ✓ — Fully compliant with all principles. No cloud APIs, maintains local parsing, builds on existing tree-sitter architecture
  • Problem-fit: 4 — Directly improves parsing accuracy for Julia codebases, reduces fallback to WASM, helps AI agents navigate Julia code better
  • Breaking: No — Purely additive. Existing WASM fallback remains available if native fails
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Configuration & Test Analysis

Configuration Changes:
Appropriate changes found:

  • Removed .jl from WASM-only skip list in change_detection.rs:135 (line 135 comment updated to remove .jl)
  • Added .jl to SUPPORTED_EXTENSIONS in file_collector.rs:39
  • These changes correctly reflect that Julia is no longer WASM-only
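
As a rough illustration of what moving `.jl` into the supported list means for file collection, here is a minimal sketch of an extension check. The extension list below is an illustrative subset and `is_natively_collected` is a hypothetical name; the real contents of SUPPORTED_EXTENSIONS and the actual collector logic are not shown in this thread.

```rust
use std::path::Path;

/// Illustrative subset only (assumption): the real SUPPORTED_EXTENSIONS
/// in file_collector.rs is not reproduced in this thread.
const SUPPORTED_EXTENSIONS: &[&str] = &["rs", "py", "jl"];

/// Hypothetical helper: returns true when the path's extension is in the
/// supported list, so the file would be collected by the native engine.
fn is_natively_collected(path: &str) -> bool {
    Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| SUPPORTED_EXTENSIONS.iter().any(|s| *s == ext))
}
```

With `jl` in the list, a path such as `julia/main.jl` passes the check, while a file with no extension or an unlisted one does not.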

Test Integrity: ✅ Strong test coverage:

  • 11 new native Rust tests in julia.rs:486-618 covering all major constructs
  • Existing WASM tests in julia.test.ts remain intact (6 tests)
  • Resolution benchmark tests with hand-annotated expected edges for Julia
  • No test assertions weakened or coverage thresholds lowered

Code Quality Assessment

Implementation Quality: ✅ Excellent

  • Comprehensive coverage: Functions, modules, structs, macros, imports, calls, parameters
  • Correct module prefixing: module Foo → functions become Foo.bar (line 96-98)
  • Proper short-form handling: add(x,y) = x + y detected as function (lines 140-175)
  • Qualified call support: Repository.save() → receiver=Repository, name=save (lines 426-437)
  • Import handling: Multiple import forms (using, import, selected_import) properly parsed (lines 321-383)
  • AST node extraction: Proper JULIA_AST_CONFIG for string literals (helpers.rs:363-371)
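
The qualified-call behavior described above (Repository.save() recorded as receiver=Repository, name=save) can be sketched at the string level. This is only an approximation under stated assumptions: `split_qualified_call` is a hypothetical helper, and the real extractor works on tree-sitter nodes rather than on text.

```rust
/// Hypothetical helper (not the PR's code): split a call target such as
/// `Repository.save` into an optional receiver and the called name.
/// The split happens at the last dot, so nested qualifiers stay in the receiver.
fn split_qualified_call(callee: &str) -> (Option<&str>, &str) {
    match callee.rsplit_once('.') {
        Some((receiver, name)) => (Some(receiver), name),
        None => (None, callee),
    }
}
```

An unqualified call yields no receiver, which keeps plain calls and qualified calls distinguishable in the emitted record.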

Rust Code Quality:

  • Follows established patterns from other extractors
  • Proper error handling with safe node access
  • Good separation of concerns with dedicated handler functions
  • Comprehensive test coverage including edge cases

Parity with WASM: ✅ High fidelity
The native implementation closely mirrors the WASM extractor in julia.ts. Key behavioral matches:

  • Module-prefixed names (WASM line 96, Rust line 96-98)
  • Macro prefix handling (@macroname format)
  • Same import resolution logic
  • Identical struct field extraction
  • Parameter extraction from function signatures

Architecture Compliance:

  • Follows dual-engine architecture correctly
  • Proper integration with LanguageKind::Julia
  • Correct AST config mapping for TypeScript side
  • File collector integration follows patterns

Minor Observations

  1. String prefixes: JULIA_AST_CONFIG has empty string_prefixes, which is correct for Julia (unlike Python's rbfuRBFU or C++'s LuUR)
  2. Test coverage: Both unit tests and integration tests via resolution benchmarks
  3. Documentation: Good inline comments explaining tree-sitter-julia grammar shapes
  4. Complexity/CFG support: Properly integrated via compute_all_metrics and build_function_cfg calls

Critical Concerns

None identified. This is a well-implemented addition that follows all established patterns.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality implementation following established patterns with comprehensive test coverage. Fills a gap in native parsing support without introducing risk. The only reason this isn't 5-star is that it's feature parity work rather than groundbreaking innovation, but it's executed excellently.

Specific Strengths:

  • Exact behavioral parity between WASM and native engines
  • Comprehensive test coverage across multiple test suites
  • Proper integration with all codegraph subsystems
  • Follows zero-dep and foundation principles perfectly
  • No configuration compromises or test weakening

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR ports the Julia symbol extractor from the WASM/TypeScript path to native Rust, adding tree-sitter-julia, a new JuliaExtractor, and all supporting infrastructure to treat .jl files as a first-class native language. All three bugs identified in earlier review rounds have been addressed before merge.

  • New extractors/julia.rs: open-coded recursive walker threading current_module state; handles function defs, short-form assignments, structs, abstract types, macros, qualified calls, and using/import/selected_import. Regression tests for every previously-flagged edge case are included.
  • Infrastructure wiring: LanguageKind::Julia added to parser_registry, .jl added to SUPPORTED_EXTENSIONS and NATIVE_SUPPORTED_EXTENSIONS, removed from the WASM-only skip list in change_detection, and JULIA_AST_CONFIG/JULIA_AST_TYPES/JULIA_STRING_CONFIG added for AST-node classification on both sides.

Confidence Score: 5/5

Safe to merge — the extractor is well-tested, all previously-flagged regressions are fixed, and the wiring changes are mechanical and exhaustive.

The new Julia extractor handles every documented CST shape correctly. The recursive find_base_name helper cleanly addresses the parameterized-type corner cases, and the qualified-name double-prefix bug is guarded in both handle_function_def and handle_assignment. The only remaining limitation is multi-module using Foo, Bar producing a single import record, a pre-existing characteristic that mirrors the WASM extractor and does not affect definition or call extraction.

crates/codegraph-core/src/extractors/julia.rs — specifically the handle_import multi-module path, which would benefit from a using Foo, Bar test.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/codegraph-core/src/extractors/julia.rs | New 756-line native Julia extractor. All three previously-flagged regressions fixed. One remaining edge case: multi-module using Foo, Bar emits a single mis-sourced import record. |
| crates/codegraph-core/src/extractors/helpers.rs | Adds JULIA_AST_CONFIG with correct string_types for string_literal and prefixed_string_literal. No issues. |
| crates/codegraph-core/src/parser_registry.rs | Adds Julia variant to LanguageKind enum, maps .jl extension, wires tree-sitter-julia, adds Julia to all() list, updates EXPECTED_LEN to 27. |
| crates/codegraph-core/src/file_collector.rs | Adds .jl to SUPPORTED_EXTENSIONS and updates doc comment. Consistent with change_detection update. |
| src/ast-analysis/rules/index.ts | Adds JULIA_AST_TYPES and JULIA_STRING_CONFIG, registers both in AST_TYPE_MAPS and AST_STRING_CONFIGS. Correct. |
| src/domain/parser.ts | Adds .jl to NATIVE_SUPPORTED_EXTENSIONS. Correct. |
| crates/codegraph-core/src/change_detection.rs | Removes julia/main.jl from WASM-only skip list test, consistent with Julia now being natively collected. |
| tests/parsers/native-drop-classification.test.ts | Removes .jl from unsupported-by-native list and decrements expected count from 9 to 8. Correct. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[".jl file ingested"] --> B["LanguageKind::Julia\nparser_registry.rs"]
    B --> C["tree-sitter-julia parse Tree"]
    C --> D["JuliaExtractor.extract()"]
    D --> E["walk_julia\ncurrent_module threading"]
    E --> G["handle_module_def\npush Definition + set next_module"]
    E --> H["handle_function_def\npush Definition + complexity/CFG"]
    E --> I["handle_assignment\nshort-form functions"]
    E --> J["handle_struct_def\nfind_base_name + ClassRelation"]
    E --> K["handle_abstract_def\nfind_base_name"]
    E --> L["handle_macro_def\npush Definition @name"]
    E --> M["handle_import\npush Import source+names"]
    E --> N["handle_call\npush Call name+receiver"]
    E --> E
    D --> O["walk_ast_nodes_with_config\nJULIA_AST_CONFIG"]
    D --> P["FileSymbols output"]

Reviews (6): Last reviewed commit: "fix: resolve merge conflicts with main (..."

Comment on lines +351 to +373
            let last = stripped.rsplit('.').next().unwrap_or(stripped);
            if !last.is_empty() {
                names.push(last.to_string());
            }
        }
        "selected_import" => {
            // First identifier is the source module; the rest are imported names.
            let mut first = true;
            for j in 0..child.child_count() {
                let Some(part) = child.child(j) else { continue };
                if part.kind() == "identifier" {
                    let txt = node_text(&part, source).to_string();
                    if first {
                        if source_str.is_empty() {
                            source_str = txt.clone();
                        }
                        first = false;
                    } else {
                        names.push(txt);
                    }
                }
            }
        }
Contributor

P1 Garbled name for parameterized generic abstract types

When type_head exists but neither find_child(&th, "identifier") nor find_child(&bin, "identifier") succeeds (which happens for any parameterized abstract type like abstract type AbstractVector{T} <: AbstractArray{T,1} end), unwrap_or(th) falls back to the type_head node itself. node_text(&th, source) then returns the full raw text "AbstractVector{T} <: AbstractArray{T,1}", which gets pushed as a definition name. The TS counterpart (handleAbstractDef) simply returns early when no identifier is found, emitting nothing rather than a garbage name. This pollutes the code graph with a nonsensical definition key for any parameterized abstract type.

Contributor Author

Fixed in a4a8b5c. handle_abstract_def now recurses through binary_expression, parametrized_type_expression, parameterized_identifier, type_parameter_list, and type_argument_list wrappers to find the base-name identifier, and returns early (matching the TS extractor) when none is found — no more falling back to unwrap_or(th) and emitting the raw Name{T} <: Super{T,1} text. Added a regression test asserting that abstract type AbstractVector{T} <: AbstractArray{T,1} end records only AbstractVector.
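
The recursive lookup described in this reply can be sketched on a toy tree. Everything here is an assumption made for illustration: `ToyNode` stands in for a tree-sitter node, `base_name_of_head` is a hypothetical entry point, and the exact CST shapes produced by tree-sitter-julia may differ from the ones modeled below. Only the wrapper kind names are taken from the reply.

```rust
/// Toy stand-in for a tree-sitter node (assumption: the real code uses
/// tree-sitter nodes and reads text from the source buffer).
struct ToyNode {
    kind: &'static str,
    text: &'static str,
    children: Vec<ToyNode>,
}

impl ToyNode {
    fn leaf(kind: &'static str, text: &'static str) -> Self {
        ToyNode { kind, text, children: Vec::new() }
    }
    fn branch(kind: &'static str, children: Vec<ToyNode>) -> Self {
        ToyNode { kind, text: "", children }
    }
}

/// Wrapper kinds named in the fix description.
const WRAPPERS: [&str; 5] = [
    "binary_expression",
    "parametrized_type_expression",
    "parameterized_identifier",
    "type_parameter_list",
    "type_argument_list",
];

/// Depth-first, left-to-right search for the first identifier,
/// descending only through the known wrapper kinds.
fn find_base_name(node: &ToyNode) -> Option<&'static str> {
    if node.kind == "identifier" {
        return Some(node.text);
    }
    if !WRAPPERS.iter().any(|w| *w == node.kind) {
        return None;
    }
    node.children.iter().find_map(|child| find_base_name(child))
}

/// Hypothetical entry point: look under a type_head node and return None
/// (so the caller emits no definition) when no base identifier exists.
fn base_name_of_head(type_head: &ToyNode) -> Option<&'static str> {
    type_head.children.iter().find_map(|child| find_base_name(child))
}
```

Returning `None` instead of falling back to the raw head text is the point of the fix: the caller can skip emission rather than record a name like `Name{T} <: Super{T,1}`.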

Contributor Author

Already fixed in a4a8b5c (before the merge) — see existing reply. Adding the parameterized-struct fix in 47b9c1f addresses the symmetric issue in handle_struct_def.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Codegraph Impact Analysis

38 functions changed, 24 callers affected across 2 files

  • detect_removed_skips_unsupported_extensions in crates/codegraph-core/src/change_detection.rs:776 (0 transitive callers)
  • JuliaExtractor.extract in crates/codegraph-core/src/extractors/julia.rs:11 (0 transitive callers)
  • walk_julia in crates/codegraph-core/src/extractors/julia.rs:24 (1 transitive callers)
  • handle_module_def in crates/codegraph-core/src/extractors/julia.rs:55 (2 transitive callers)
  • signature_call in crates/codegraph-core/src/extractors/julia.rs:80 (4 transitive callers)
  • handle_function_def in crates/codegraph-core/src/extractors/julia.rs:87 (2 transitive callers)
  • handle_assignment in crates/codegraph-core/src/extractors/julia.rs:143 (2 transitive callers)
  • handle_struct_def in crates/codegraph-core/src/extractors/julia.rs:183 (2 transitive callers)
  • handle_abstract_def in crates/codegraph-core/src/extractors/julia.rs:264 (2 transitive callers)
  • find_base_name in crates/codegraph-core/src/extractors/julia.rs:303 (4 transitive callers)
  • handle_macro_def in crates/codegraph-core/src/extractors/julia.rs:333 (2 transitive callers)
  • handle_import in crates/codegraph-core/src/extractors/julia.rs:368 (2 transitive callers)
  • handle_call in crates/codegraph-core/src/extractors/julia.rs:438 (2 transitive callers)
  • extract_julia_params in crates/codegraph-core/src/extractors/julia.rs:503 (4 transitive callers)
  • parse_jl in crates/codegraph-core/src/extractors/julia.rs:542 (16 transitive callers)
  • finds_function in crates/codegraph-core/src/extractors/julia.rs:552 (0 transitive callers)
  • finds_short_form_function in crates/codegraph-core/src/extractors/julia.rs:561 (0 transitive callers)
  • module_prefixes_inner_functions in crates/codegraph-core/src/extractors/julia.rs:577 (0 transitive callers)
  • extracts_struct_with_fields_and_supertype in crates/codegraph-core/src/extractors/julia.rs:585 (0 transitive callers)
  • extracts_struct_without_supertype in crates/codegraph-core/src/extractors/julia.rs:606 (0 transitive callers)

Native Julia support landed in this PR but the JS-side mirror of the
Rust LanguageKind enum was not updated, so the drift guard in
tests/parsers/native-drop-classification.test.ts (and the WASM-only
bucket in classifyNativeDrops) flagged .jl as missing. Add .jl to the
set and drop it from the WASM-only test fixture.

handle_abstract_def previously fell back to the type_head node itself
when no plain identifier was found, dumping the full raw text
"Name{T} <: Super{T,1}" as a definition name for any parameterized
generic abstract type. The TS counterpart returns early on no-name; the
native port should match.

Recurse into wrapper shapes (binary_expression, parameterized identifier,
type_parameter_list, type_argument_list) to locate the base identifier,
and skip emission when none is found. Adds a regression test asserting
the base name "AbstractVector" for
"abstract type AbstractVector{T} <: AbstractArray{T,1} end".
@carlos-alm
Contributor Author

Pushed fixes:

  • 3393ce7 fix(parser): register .jl in NATIVE_SUPPORTED_EXTENSIONS. Closes the CI failure in tests/parsers/native-drop-classification.test.ts (drift guard between parser_registry.rs and the JS-side mirror). Also drops .jl from the WASM-only fixture in the same test file.
  • a4a8b5c fix(julia): emit base name for parameterized abstract types. Addresses Greptile P1: handle_abstract_def no longer falls back to unwrap_or(th) and dumps raw Name{T} <: Super{T,1} text. Recurses through wrapper shapes (binary_expression, parametrized_type_expression, parameterized_identifier, type_parameter_list, type_argument_list) to locate the base identifier; returns early when none is found, matching the TS extractor. Added a regression test (extracts_parameterized_abstract_type_base_name).

Also merged origin/main (clean merge — no conflicts).

Local verification:

  • cargo test -p codegraph-core --lib → 196 passed (12 Julia tests, +1 new)
  • npx vitest run tests/parsers/native-drop-classification.test.ts → 13 passed
  • npx biome check on changed files → clean

@greptileai

Comment on lines +186 to +211
    let (name_node, supertype): (Node, Option<Node>) = if let Some(bin) =
        find_child(&type_head, "binary_expression")
    {
        // First identifier is the struct name, last identifier (after `<:`) is the supertype.
        let mut name_id: Option<Node> = None;
        let mut super_id: Option<Node> = None;
        for i in 0..bin.child_count() {
            if let Some(c) = bin.child(i) {
                if c.kind() == "identifier" {
                    if name_id.is_none() {
                        name_id = Some(c);
                    } else {
                        super_id = Some(c);
                    }
                }
            }
        }
        match name_id {
            Some(n) => (n, super_id),
            None => return,
        }
    } else if let Some(id) = find_child(&type_head, "identifier") {
        (id, None)
    } else {
        return;
    };
Contributor

P1 Parameterized generic structs silently dropped

handle_struct_def only looks for direct identifier children of type_head and direct identifier children of any binary_expression inside it. For a parameterized struct like struct Vec{T} or struct Vec{T} <: AbstractArray{T,1}, tree-sitter-julia wraps the name in a parameterized_identifier node, so find_child(&type_head, "identifier") returns None and bin.child(i).kind() == "identifier" never matches. Both forms fall through to return with no definition emitted. The same CST shapes were exactly what motivated the find_abstract_name recursive helper added for handle_abstract_def; handle_struct_def needs the same treatment to extract the base-name identifier and, separately, walk to the first identifier of the RHS for the supertype.

Contributor Author

Fixed in 47b9c1f. handle_struct_def now uses the same recursive find_base_name helper as handle_abstract_def, which walks into parameterized_identifier, parametrized_type_expression, type_parameter_list, type_argument_list, and nested binary_expression wrappers — so struct Vec{T} and struct Vec{T} <: AbstractArray{T,1} both record Vec as the base name and AbstractArray as the supertype. Added extracts_parameterized_struct_base_name to lock the behavior in.

Comment on lines +393 to +410
    "selected_import" => {
        // First identifier is the source module; the rest are imported names.
        let mut first = true;
        for j in 0..child.child_count() {
            let Some(part) = child.child(j) else { continue };
            if part.kind() == "identifier" {
                let txt = node_text(&part, source).to_string();
                if first {
                    if source_str.is_empty() {
                        source_str = txt.clone();
                    }
                    first = false;
                } else {
                    names.push(txt);
                }
            }
        }
    }
Contributor

P1 Qualified-module selected_import produces a wrong import record

The selected_import inner loop only matches part.kind() == "identifier", but for import Foo.Bar: baz the module part is a scoped_identifier (Foo.Bar), not an identifier. The loop skips it, so first is still true when baz is encountered — source_str becomes "baz" and no names are pushed. The emitted import record has source = "baz" with names = ["baz"], making the edge (module→imported-name) completely wrong. This pattern is common in Julia codebases (e.g. import LinearAlgebra.BLAS: gemm). Handling "scoped_identifier" alongside "identifier" inside the selected_import branch, using rsplit('.').next() for the display name as the outer loop already does, would fix it.

Contributor Author

Fixed in 47b9c1f. The selected_import inner loop now matches scoped_identifier alongside identifier, and uses rsplit('.').next() for the imported-name display (mirroring the outer loop). import LinearAlgebra.BLAS: gemm now records source="LinearAlgebra.BLAS", names=["gemm"]. Added selected_import_handles_qualified_module to lock the behavior in.
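
A minimal sketch of the corrected loop, assuming the children of a selected_import node are flattened into (kind, text) pairs. `parse_selected_import` and the `Part` alias are hypothetical; the real code iterates tree-sitter children and reads their text from the source buffer.

```rust
/// Hypothetical flattened view of a `selected_import` node's children,
/// as (kind, text) pairs. This representation is an assumption.
type Part<'a> = (&'a str, &'a str);

/// Returns (source module, imported names), or None if no module was found.
fn parse_selected_import(parts: &[Part<'_>]) -> Option<(String, Vec<String>)> {
    let mut source: Option<String> = None;
    let mut names = Vec::new();
    for &(kind, text) in parts {
        // Accept qualified modules as well as plain identifiers;
        // punctuation such as ":" and "," is skipped.
        if kind != "identifier" && kind != "scoped_identifier" {
            continue;
        }
        if source.is_none() {
            // The first name-like child is the source module, kept fully qualified.
            source = Some(text.to_string());
        } else {
            // Later children are imported names; keep the last dotted segment.
            let last = text.rsplit('.').next().unwrap_or(text);
            if !last.is_empty() {
                names.push(last.to_string());
            }
        }
    }
    source.map(|s| (s, names))
}
```

Because a `scoped_identifier` now counts as the first name-like child, `gemm` in `import LinearAlgebra.BLAS: gemm` is no longer mistaken for the source module.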

Comment on lines +243 to +252
            line: start_line(node),
        });
    }

    symbols.definitions.push(Definition {
        name: struct_name,
        kind: "struct".to_string(),
        line: start_line(node),
        end_line: Some(end_line(node)),
        decorators: None,
Contributor

P1 Module double-prefix for qualified short-form method extensions

When func_name_node is a scoped_identifier — as it is for Foo.bar(x, y) = x + y (extending a method from another module) — base already contains the full qualified name "Foo.bar". Prepending current_module then produces "Outer.Foo.bar" instead of the correct "Foo.bar". The guard if base.contains('.') (skip the prefix) would fix both handle_assignment and the symmetric path in handle_function_def (line 186) where call_sig.child(0) can be a scoped_identifier for function Base.show(io, x) ... end inside a module, yielding "Foo.Base.show".

Contributor Author

Fixed in 47b9c1f. Both handle_function_def and handle_assignment now guard the module-prefix step with if !base.contains('.') — when the LHS is already a scoped_identifier (e.g. function Base.show ... end or Foo.bar(x, y) = x + y inside a module), the qualifier is preserved as-is rather than double-prefixed. Added qualified_short_form_method_does_not_double_prefix and qualified_function_def_does_not_double_prefix to lock the behavior in.
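
The guard described in this reply amounts to a small naming rule, sketched here as a standalone function. `qualify_name` is a hypothetical name and signature; only the `!base.contains('.')` condition comes from the reply.

```rust
/// Hypothetical helper: prefix `base` with the enclosing module unless it
/// is already qualified (contains a dot), in which case keep it as written.
fn qualify_name(base: &str, current_module: Option<&str>) -> String {
    match current_module {
        // Unqualified name inside a module: record it as Module.name.
        Some(module) if !base.contains('.') => format!("{}.{}", module, base),
        // Already qualified, or no enclosing module: leave untouched.
        _ => base.to_string(),
    }
}
```

Under this rule, `function Base.show ... end` inside `module Foo` is recorded as `Base.show` rather than `Foo.Base.show`, while a plain `bar` in the same module still becomes `Foo.bar`.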

@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit aa7812b into main May 12, 2026
27 checks passed
@carlos-alm carlos-alm deleted the feat/1071-julia-rust-extractor branch May 12, 2026 06:58
@github-actions github-actions Bot locked and limited conversation to collaborators May 12, 2026