feat(native): port Julia extractor to Rust by carlos-alm · Pull Request #1098 · optave/ops-codegraph-tool

carlos-alm · 2026-05-11T09:38:43Z

Summary

Add tree-sitter-julia dependency and extractors/julia.rs native extractor wired through LanguageKind::Julia
Mirrors the WASM-side behavior (function, short-form, module-prefixed names, struct/abstract/macro defs, qualified calls, using/import/selected_import)
Adds JULIA_AST_CONFIG (native) and matching JULIA_AST_TYPES / JULIA_STRING_CONFIG entries in src/ast-analysis/rules/index.ts
.jl removed from change-detection WASM-only skip list; added to file_collector SUPPORTED_EXTENSIONS

Part of #1071

Test plan

cargo build --release -p codegraph-core
cargo test -p codegraph-core --lib — 195 passed (11 new Julia tests)
npx vitest run tests/parsers/julia.test.ts — 6 passed
npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t "julia" — 5 passed

Adds tree-sitter-julia dependency and native extractor matching the WASM-side behavior for Julia symbol, import, and call extraction. Part of #1071

claude · 2026-05-11T09:39:06Z

Claude finished @carlos-alm's task

Reviewing Julia Extractor Implementation

Todo List:

Analyze root cause and problem justification
Check for configuration changes that might mask issues
Verify test integrity and coverage
Review Julia extractor implementation quality
Check compliance with zero-dep and foundation principles
Assess breaking changes and API compatibility
Provide final rating and recommendation

Root Cause Analysis

Problem Being Solved: This PR adds native Rust parsing for Julia, which was previously handled only by the WASM engine. It fits the broader dual-engine architecture, where codegraph provides both native (fast) and WASM (fallback) parsing support.

Why This Approach: Adding native Julia support follows the established pattern for other languages. Julia was previously WASM-only (mentioned in the "WASM-only languages" list), and this PR ports it to the native Rust engine for better performance.

Risk Assessment: Low risk. This is a pure addition with appropriate fallback mechanisms. The implementation follows established patterns from other language extractors.

Backlog Compliance

Zero-dep: ✓ — Only adds tree-sitter-julia = "0.23" which follows the same pattern as other tree-sitter grammar deps
Foundation-aligned: ✓ — Fully compliant with all principles. No cloud APIs, maintains local parsing, builds on existing tree-sitter architecture
Problem-fit: 4 — Directly improves parsing accuracy for Julia codebases, reduces fallback to WASM, helps AI agents navigate Julia code better
Breaking: No — Purely additive. Existing WASM fallback remains available if native fails
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Configuration & Test Analysis

Configuration Changes:
✅ Appropriate changes found:

Removed .jl from the WASM-only skip list in change_detection.rs:135 (the comment on that line no longer mentions .jl)
Added .jl to SUPPORTED_EXTENSIONS in file_collector.rs:39
These changes correctly reflect that Julia is no longer WASM-only

Test Integrity: ✅ Strong test coverage:

11 new native Rust tests in julia.rs:486-618 covering all major constructs
Existing WASM tests in julia.test.ts remain intact (6 tests)
Resolution benchmark tests with hand-annotated expected edges for Julia
No test assertions weakened or coverage thresholds lowered

Code Quality Assessment

Implementation Quality: ✅ Excellent

Comprehensive coverage: Functions, modules, structs, macros, imports, calls, parameters
Correct module prefixing: module Foo → functions become Foo.bar (line 96-98)
Proper short-form handling: add(x,y) = x + y detected as function (lines 140-175)
Qualified call support: Repository.save() → receiver=Repository, name=save (lines 426-437)
Import handling: Multiple import forms (using, import, selected_import) properly parsed (lines 321-383)
AST node extraction: Proper JULIA_AST_CONFIG for string literals (helpers.rs:363-371)
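
As an illustration of the qualified-call item above, here is a minimal standalone sketch. It works on the callee text rather than on tree-sitter nodes, and `split_qualified_call` is a hypothetical helper name, not a function from julia.rs. Splitting at the last dot is an assumption about how multi-segment receivers would be treated.

```rust
/// Hypothetical helper (not from julia.rs): split a callee such as
/// `Repository.save` into (receiver, name). Unqualified callees have
/// no receiver.
fn split_qualified_call(callee: &str) -> (Option<String>, String) {
    match callee.rsplit_once('.') {
        // Assumption: everything before the last dot is the receiver.
        Some((receiver, name)) if !receiver.is_empty() && !name.is_empty() => {
            (Some(receiver.to_string()), name.to_string())
        }
        // No dot, or a malformed callee: keep the text as the name.
        _ => (None, callee.to_string()),
    }
}
```

Under these assumptions, `split_qualified_call("Repository.save")` yields the receiver `Repository` and the name `save`, matching the example in the review.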

Rust Code Quality:

Follows established patterns from other extractors
Proper error handling with safe node access
Good separation of concerns with dedicated handler functions
Comprehensive test coverage including edge cases

Parity with WASM: ✅ High fidelity
The native implementation closely mirrors the WASM extractor in julia.ts. Key behavioral matches:

Module-prefixed names (WASM line 96, Rust line 96-98)
Macro prefix handling (@macroname format)
Same import resolution logic
Identical struct field extraction
Parameter extraction from function signatures

Architecture Compliance:

Follows dual-engine architecture correctly
Proper integration with LanguageKind::Julia
Correct AST config mapping for TypeScript side
File collector integration follows patterns

Minor Observations

String prefixes: JULIA_AST_CONFIG has empty string_prefixes, which is correct (unlike Python's rbfuRBFU or C++'s LuUR)
Test coverage: Both unit tests and integration tests via resolution benchmarks
Documentation: Good inline comments explaining tree-sitter-julia grammar shapes
Complexity/CFG support: Properly integrated via compute_all_metrics and build_function_cfg calls

Critical Concerns

None identified. This is a well-implemented addition that follows all established patterns.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: High-quality implementation following established patterns with comprehensive test coverage. Fills a gap in native parsing support without introducing risk. The only reason this isn't 5-star is that it's feature parity work rather than groundbreaking innovation, but it's executed excellently.

Specific Strengths:

Exact behavioral parity between WASM and native engines
Comprehensive test coverage across multiple test suites
Proper integration with all codegraph subsystems
Follows zero-dep and foundation principles perfectly
No configuration compromises or test weakening

greptile-apps · 2026-05-11T09:42:42Z

Greptile Summary

This PR ports the Julia symbol extractor from the WASM/TypeScript path to native Rust, adding tree-sitter-julia, a new JuliaExtractor, and all supporting infrastructure to treat .jl files as a first-class native language. All three bugs identified in earlier review rounds have been addressed before merge.

New extractors/julia.rs: open-coded recursive walker threading current_module state; handles function defs, short-form assignments, structs, abstract types, macros, qualified calls, and using/import/selected_import. Regression tests for every previously-flagged edge case are included.
Infrastructure wiring: LanguageKind::Julia added to parser_registry, .jl added to SUPPORTED_EXTENSIONS and NATIVE_SUPPORTED_EXTENSIONS, removed from the WASM-only skip list in change_detection, and JULIA_AST_CONFIG/JULIA_AST_TYPES/JULIA_STRING_CONFIG added for AST-node classification on both sides.

Confidence Score: 5/5

Safe to merge — the extractor is well-tested, all previously-flagged regressions are fixed, and the wiring changes are mechanical and exhaustive.

The new Julia extractor handles every documented CST shape correctly. The recursive find_base_name helper cleanly addresses the parameterized-type corner cases, and the qualified-name double-prefix bug is guarded in both handle_function_def and handle_assignment. The only remaining limitation is multi-module using Foo, Bar producing a single import record, a pre-existing characteristic that mirrors the WASM extractor and does not affect definition or call extraction.

crates/codegraph-core/src/extractors/julia.rs — specifically the handle_import multi-module path, which would benefit from a using Foo, Bar test.

Important Files Changed

Filename	Overview
crates/codegraph-core/src/extractors/julia.rs	New 756-line native Julia extractor. All three previously-flagged regressions fixed. One remaining edge case: multi-module using Foo, Bar emits a single mis-sourced import record.
crates/codegraph-core/src/extractors/helpers.rs	Adds JULIA_AST_CONFIG with correct string_types for string_literal and prefixed_string_literal. No issues.
crates/codegraph-core/src/parser_registry.rs	Adds Julia variant to LanguageKind enum, maps .jl extension, wires tree-sitter-julia, adds Julia to all() list, updates EXPECTED_LEN to 27.
crates/codegraph-core/src/file_collector.rs	Adds .jl to SUPPORTED_EXTENSIONS and updates doc comment. Consistent with change_detection update.
src/ast-analysis/rules/index.ts	Adds JULIA_AST_TYPES and JULIA_STRING_CONFIG, registers both in AST_TYPE_MAPS and AST_STRING_CONFIGS. Correct.
src/domain/parser.ts	Adds .jl to NATIVE_SUPPORTED_EXTENSIONS. Correct.
crates/codegraph-core/src/change_detection.rs	Removes julia/main.jl from WASM-only skip list test, consistent with Julia now being natively collected.
tests/parsers/native-drop-classification.test.ts	Removes .jl from unsupported-by-native list and decrements expected count from 9 to 8. Correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[".jl file ingested"] --> B["LanguageKind::Julia\nparser_registry.rs"]
    B --> C["tree-sitter-julia parse Tree"]
    C --> D["JuliaExtractor.extract()"]
    D --> E["walk_julia\ncurrent_module threading"]
    E --> G["handle_module_def\npush Definition + set next_module"]
    E --> H["handle_function_def\npush Definition + complexity/CFG"]
    E --> I["handle_assignment\nshort-form functions"]
    E --> J["handle_struct_def\nfind_base_name + ClassRelation"]
    E --> K["handle_abstract_def\nfind_base_name"]
    E --> L["handle_macro_def\npush Definition @name"]
    E --> M["handle_import\npush Import source+names"]
    E --> N["handle_call\npush Call name+receiver"]
    E --> E
    D --> O["walk_ast_nodes_with_config\nJULIA_AST_CONFIG"]
    D --> P["FileSymbols output"]

Reviews (6): Last reviewed commit: "fix: resolve merge conflicts with main (..."

greptile-apps · 2026-05-11T09:42:46Z

+                let last = stripped.rsplit('.').next().unwrap_or(stripped);
+                if !last.is_empty() {
+                    names.push(last.to_string());
+                }
+            }
+            "selected_import" => {
+                // First identifier is the source module; the rest are imported names.
+                let mut first = true;
+                for j in 0..child.child_count() {
+                    let Some(part) = child.child(j) else { continue };
+                    if part.kind() == "identifier" {
+                        let txt = node_text(&part, source).to_string();
+                        if first {
+                            if source_str.is_empty() {
+                                source_str = txt.clone();
+                            }
+                            first = false;
+                        } else {
+                            names.push(txt);
+                        }
+                    }
+                }
+            }


Garbled name for parameterized generic abstract types

When type_head exists but neither find_child(&th, "identifier") nor find_child(&bin, "identifier") succeeds — which happens for any parameterized abstract type like abstract type AbstractVector{T} <: AbstractArray{T,1} end — unwrap_or(th) falls back to the type_head node itself. node_text(&th, source) then returns the full raw text "AbstractVector{T} <: AbstractArray{T,1}", which gets pushed as a definition name. The TS counterpart (handleAbstractDef) simply returns early when no identifier is found, emitting nothing rather than a garbage name. This pollutes the code graph with a nonsensical definition key for any parameterized abstract type.

Fixed in a4a8b5c. handle_abstract_def now recurses through binary_expression, parametrized_type_expression, parameterized_identifier, type_parameter_list, and type_argument_list wrappers to find the base-name identifier, and returns early (matching the TS extractor) when none is found — no more falling back to unwrap_or(th) and emitting the raw Name{T} <: Super{T,1} text. Added a regression test asserting that abstract type AbstractVector{T} <: AbstractArray{T,1} end records only AbstractVector.

Already fixed in a4a8b5c (before the merge) — see existing reply. Adding the parameterized-struct fix in 47b9c1f addresses the symmetric issue in handle_struct_def.
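
The recursion described in this thread can be sketched without tree-sitter. The block below is a minimal illustration, not the code from julia.rs: `ToyNode`, `leaf`, `branch`, and `find_base_name_sketch` are hypothetical names. It assumes the base-name identifier is the first identifier reachable through the wrapper kinds listed in the fix.

```rust
/// Toy stand-in for a tree-sitter node: a kind, its source text, and children.
struct ToyNode {
    kind: &'static str,
    text: &'static str,
    children: Vec<ToyNode>,
}

fn leaf(kind: &'static str, text: &'static str) -> ToyNode {
    ToyNode { kind, text, children: Vec::new() }
}

fn branch(kind: &'static str, children: Vec<ToyNode>) -> ToyNode {
    ToyNode { kind, text: "", children }
}

/// Wrapper kinds named in the fix; other non-identifier kinds are skipped.
const WRAPPERS: [&str; 5] = [
    "binary_expression",
    "parametrized_type_expression",
    "parameterized_identifier",
    "type_parameter_list",
    "type_argument_list",
];

/// Depth-first search for the first identifier reachable through wrapper
/// nodes. Returns None so the caller can skip emission instead of falling
/// back to the raw `Name{T} <: Super{T,1}` text.
fn find_base_name_sketch(node: &ToyNode) -> Option<&'static str> {
    if node.kind == "identifier" {
        return Some(node.text);
    }
    if node.kind != "type_head" && !WRAPPERS.contains(&node.kind) {
        return None;
    }
    node.children.iter().find_map(find_base_name_sketch)
}
```

With a hand-built tree for `abstract type AbstractVector{T} <: AbstractArray{T,1} end` (the exact CST shape is assumed here), the sketch returns `AbstractVector`. A head with no identifier returns `None`.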

github-actions · 2026-05-11T09:45:47Z

Codegraph Impact Analysis

38 functions changed → 24 callers affected across 2 files

detect_removed_skips_unsupported_extensions in crates/codegraph-core/src/change_detection.rs:776 (0 transitive callers)
JuliaExtractor.extract in crates/codegraph-core/src/extractors/julia.rs:11 (0 transitive callers)
walk_julia in crates/codegraph-core/src/extractors/julia.rs:24 (1 transitive callers)
handle_module_def in crates/codegraph-core/src/extractors/julia.rs:55 (2 transitive callers)
signature_call in crates/codegraph-core/src/extractors/julia.rs:80 (4 transitive callers)
handle_function_def in crates/codegraph-core/src/extractors/julia.rs:87 (2 transitive callers)
handle_assignment in crates/codegraph-core/src/extractors/julia.rs:143 (2 transitive callers)
handle_struct_def in crates/codegraph-core/src/extractors/julia.rs:183 (2 transitive callers)
handle_abstract_def in crates/codegraph-core/src/extractors/julia.rs:264 (2 transitive callers)
find_base_name in crates/codegraph-core/src/extractors/julia.rs:303 (4 transitive callers)
handle_macro_def in crates/codegraph-core/src/extractors/julia.rs:333 (2 transitive callers)
handle_import in crates/codegraph-core/src/extractors/julia.rs:368 (2 transitive callers)
handle_call in crates/codegraph-core/src/extractors/julia.rs:438 (2 transitive callers)
extract_julia_params in crates/codegraph-core/src/extractors/julia.rs:503 (4 transitive callers)
parse_jl in crates/codegraph-core/src/extractors/julia.rs:542 (16 transitive callers)
finds_function in crates/codegraph-core/src/extractors/julia.rs:552 (0 transitive callers)
finds_short_form_function in crates/codegraph-core/src/extractors/julia.rs:561 (0 transitive callers)
module_prefixes_inner_functions in crates/codegraph-core/src/extractors/julia.rs:577 (0 transitive callers)
extracts_struct_with_fields_and_supertype in crates/codegraph-core/src/extractors/julia.rs:585 (0 transitive callers)
extracts_struct_without_supertype in crates/codegraph-core/src/extractors/julia.rs:606 (0 transitive callers)

…extractor

Native Julia support landed in this PR but the JS-side mirror of the Rust LanguageKind enum was not updated, so the drift guard in tests/parsers/native-drop-classification.test.ts (and the WASM-only bucket in classifyNativeDrops) flagged .jl as missing. Add .jl to the set and drop it from the WASM-only test fixture.

handle_abstract_def previously fell back to the type_head node itself when no plain identifier was found, dumping the full raw text "Name{T} <: Super{T,1}" as a definition name for any parameterized generic abstract type. The TS counterpart returns early on no-name; the native port should match. Recurse into wrapper shapes (binary_expression, parameterized identifier, type_parameter_list, type_argument_list) to locate the base identifier, and skip emission when none is found. Adds a regression test asserting the base name "AbstractVector" for "abstract type AbstractVector{T} <: AbstractArray{T,1} end".

carlos-alm · 2026-05-12T02:09:00Z

Pushed fixes:

3393ce7 — fix(parser): register .jl in NATIVE_SUPPORTED_EXTENSIONS. Closes the CI failure in tests/parsers/native-drop-classification.test.ts (drift guard between parser_registry.rs and the JS-side mirror). Also drops .jl from the WASM-only fixture in the same test file.
a4a8b5c — fix(julia): emit base name for parameterized abstract types. Addresses Greptile P1: handle_abstract_def no longer falls back to unwrap_or(th) and dumps raw Name{T} <: Super{T,1} text. Recurses through wrapper shapes (binary_expression, parametrized_type_expression, parameterized_identifier, type_parameter_list, type_argument_list) to locate the base identifier; returns early when none is found, matching the TS extractor. Added a regression test (extracts_parameterized_abstract_type_base_name).

Also merged origin/main (clean merge — no conflicts).

Local verification:

cargo test -p codegraph-core --lib → 196 passed (12 Julia tests, +1 new)
npx vitest run tests/parsers/native-drop-classification.test.ts → 13 passed
npx biome check on changed files → clean

@greptileai

greptile-apps · 2026-05-12T02:13:07Z

+    let (name_node, supertype): (Node, Option<Node>) = if let Some(bin) =
+        find_child(&type_head, "binary_expression")
+    {
+        // First identifier is the struct name, last identifier (after `<:`) is the supertype.
+        let mut name_id: Option<Node> = None;
+        let mut super_id: Option<Node> = None;
+        for i in 0..bin.child_count() {
+            if let Some(c) = bin.child(i) {
+                if c.kind() == "identifier" {
+                    if name_id.is_none() {
+                        name_id = Some(c);
+                    } else {
+                        super_id = Some(c);
+                    }
+                }
+            }
+        }
+        match name_id {
+            Some(n) => (n, super_id),
+            None => return,
+        }
+    } else if let Some(id) = find_child(&type_head, "identifier") {
+        (id, None)
+    } else {
+        return;
+    };


Parameterized generic structs silently dropped

handle_struct_def only looks for direct identifier children of type_head and direct identifier children of any binary_expression inside it. For a parameterized struct like struct Vec{T} or struct Vec{T} <: AbstractArray{T,1}, tree-sitter-julia wraps the name in a parameterized_identifier node — so find_child(&type_head, "identifier") returns None and bin.child(i).kind() == "identifier" never matches. Both forms fall through to return with no definition emitted. The same CST shapes were exactly what motivated the find_abstract_name recursive helper added for handle_abstract_def — handle_struct_def needs the same treatment to extract the base-name identifier and, separately, walk to the first identifier of the RHS for the supertype.

Fixed in 47b9c1f. handle_struct_def now uses the same recursive find_base_name helper as handle_abstract_def, which walks into parameterized_identifier, parametrized_type_expression, type_parameter_list, type_argument_list, and nested binary_expression wrappers — so struct Vec{T} and struct Vec{T} <: AbstractArray{T,1} both record Vec as the base name and AbstractArray as the supertype. Added extracts_parameterized_struct_base_name to lock the behavior in.
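
To make the name and supertype split concrete, here is a small sketch that models a struct's type head as an enum instead of tree-sitter nodes. `TypeExpr`, `base_name`, and `struct_name_and_supertype` are hypothetical names for illustration only. The real extractor walks CST nodes with find_base_name.

```rust
/// Simplified model of a Julia type head (an assumption, not the CST).
enum TypeExpr {
    /// A bare name such as `Vec`.
    Ident(&'static str),
    /// `Name{Args...}`: a head expression plus its type arguments.
    Parameterized(Box<TypeExpr>, Vec<TypeExpr>),
    /// `Lhs <: Rhs`.
    Subtype(Box<TypeExpr>, Box<TypeExpr>),
}

/// Strip parameter lists and subtype clauses down to the leading identifier.
fn base_name(expr: &TypeExpr) -> &str {
    match expr {
        TypeExpr::Ident(name) => *name,
        TypeExpr::Parameterized(head, _) => base_name(head),
        TypeExpr::Subtype(lhs, _) => base_name(lhs),
    }
}

/// Return (struct name, optional supertype), both reduced to base names.
fn struct_name_and_supertype(head: &TypeExpr) -> (&str, Option<&str>) {
    match head {
        TypeExpr::Subtype(lhs, rhs) => (base_name(lhs), Some(base_name(rhs))),
        other => (base_name(other), None),
    }
}
```

For a value modeling `struct Vec{T} <: AbstractArray{T,1}`, this returns `Vec` with supertype `AbstractArray`. For plain `struct Vec{T}` it returns `Vec` with no supertype.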

greptile-apps · 2026-05-12T02:13:07Z

+            "selected_import" => {
+                // First identifier is the source module; the rest are imported names.
+                let mut first = true;
+                for j in 0..child.child_count() {
+                    let Some(part) = child.child(j) else { continue };
+                    if part.kind() == "identifier" {
+                        let txt = node_text(&part, source).to_string();
+                        if first {
+                            if source_str.is_empty() {
+                                source_str = txt.clone();
+                            }
+                            first = false;
+                        } else {
+                            names.push(txt);
+                        }
+                    }
+                }
+            }


Qualified-module selected_import produces a wrong import record

The selected_import inner loop only matches part.kind() == "identifier", but for import Foo.Bar: baz the module part is a scoped_identifier (Foo.Bar), not an identifier. The loop skips it, so first is still true when baz is encountered — source_str becomes "baz" and no names are pushed. The emitted import record has source = "baz" with names = ["baz"], making the edge (module→imported-name) completely wrong. This pattern is common in Julia codebases (e.g. import LinearAlgebra.BLAS: gemm). Handling "scoped_identifier" alongside "identifier" inside the selected_import branch, using rsplit('.').next() for the display name as the outer loop already does, would fix it.

Fixed in 47b9c1f. The selected_import inner loop now matches scoped_identifier alongside identifier, and uses rsplit('.').next() for the imported-name display (mirroring the outer loop). import LinearAlgebra.BLAS: gemm now records source="LinearAlgebra.BLAS", names=["gemm"]. Added selected_import_handles_qualified_module to lock the behavior in.
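
The fixed behavior can be sketched on plain data. In the block below, `ImportRecord` and `selected_import_sketch` are hypothetical names, and the `(kind, text)` pairs stand in for the children of a selected_import node. It illustrates the logic described in the reply and is not the code in julia.rs.

```rust
/// Simplified import record: source module plus imported names.
#[derive(Debug, PartialEq)]
struct ImportRecord {
    source: String,
    names: Vec<String>,
}

/// `parts` models the children of a `selected_import` node as
/// (node kind, node text) pairs.
fn selected_import_sketch(parts: &[(&str, &str)]) -> Option<ImportRecord> {
    let mut source: Option<String> = None;
    let mut names = Vec::new();
    for &(kind, text) in parts {
        // Accept plain and dotted names; skip punctuation such as `:`.
        if kind != "identifier" && kind != "scoped_identifier" {
            continue;
        }
        if source.is_none() {
            // The first name-like child is the module; keep its full path.
            source = Some(text.to_string());
        } else {
            // Later children are imported names; keep the last segment only.
            let last = text.rsplit('.').next().unwrap_or(text);
            names.push(last.to_string());
        }
    }
    source.map(|source| ImportRecord { source, names })
}
```

Fed the parts of `import LinearAlgebra.BLAS: gemm`, the sketch produces the source `LinearAlgebra.BLAS` and the single name `gemm`, which is the record the fix reports.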

greptile-apps · 2026-05-12T03:30:18Z

+            line: start_line(node),
+        });
+    }
+
+    symbols.definitions.push(Definition {
+        name: struct_name,
+        kind: "struct".to_string(),
+        line: start_line(node),
+        end_line: Some(end_line(node)),
+        decorators: None,


Module double-prefix for qualified short-form method extensions

When func_name_node is a scoped_identifier — as it is for Foo.bar(x, y) = x + y (extending a method from another module) — base already contains the full qualified name "Foo.bar". Prepending current_module then produces "Outer.Foo.bar" instead of the correct "Foo.bar". The guard if base.contains('.') (skip the prefix) would fix both handle_assignment and the symmetric path in handle_function_def (line 186) where call_sig.child(0) can be a scoped_identifier for function Base.show(io, x) ... end inside a module, yielding "Foo.Base.show".

Fixed in 47b9c1f. Both handle_function_def and handle_assignment now guard the module-prefix step with if !base.contains('.') — when the LHS is already a scoped_identifier (e.g. function Base.show ... end or Foo.bar(x, y) = x + y inside a module), the qualifier is preserved as-is rather than double-prefixed. Added qualified_short_form_method_does_not_double_prefix and qualified_function_def_does_not_double_prefix to lock the behavior in.
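
The guard is small enough to show in isolation. The helper name `qualified_name` is hypothetical. Only the `!base.contains('.')` check comes from the fix described above.

```rust
/// Hypothetical helper: build the definition name recorded for a function.
/// `current_module` is the enclosing module, if any; `base` is the name as
/// written at the definition site.
fn qualified_name(current_module: Option<&str>, base: &str) -> String {
    match current_module {
        // Only bare names get the module prefix. A name like `Base.show`
        // already carries its own qualifier and is kept as-is.
        Some(module) if !base.contains('.') => format!("{}.{}", module, base),
        _ => base.to_string(),
    }
}
```

With this guard, `bar` inside module `Outer` becomes `Outer.bar`, while `Base.show` inside module `Foo` stays `Base.show` rather than `Foo.Base.show`.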

…ified selected imports (#1098)

carlos-alm · 2026-05-12T06:26:44Z

@greptileai

feat(native): port Julia extractor to Rust

b47d387

Adds tree-sitter-julia dependency and native extractor matching the WASM-side behavior for Julia symbol, import, and call extraction. Part of #1071

greptile-apps Bot reviewed May 11, 2026

carlos-alm added 3 commits May 11, 2026 20:00

Merge remote-tracking branch 'origin/main' into feat/1071-julia-rust-…

65488ba

…extractor

greptile-apps Bot reviewed May 12, 2026

Merge branch 'main' into feat/1071-julia-rust-extractor

058d812

greptile-apps Bot reviewed May 12, 2026

carlos-alm added 2 commits May 11, 2026 23:42

fix: resolve merge conflicts with main

b940cf2

fix(native/julia): handle parameterized structs, qualified defs, qual…

47b9c1f

…ified selected imports (#1098)

carlos-alm mentioned this pull request May 12, 2026

Julia WASM extractor has the same parameterized-type / qualified-def / qualified-selected-import bugs that #1098 fixed in native #1111

Open

4 tasks

fix: resolve merge conflicts with main (cuda integration)

3946332

carlos-alm merged commit aa7812b into main May 12, 2026
27 checks passed

carlos-alm deleted the feat/1071-julia-rust-extractor branch May 12, 2026 06:58

github-actions Bot locked and limited conversation to collaborators May 12, 2026
