feat(native): port Erlang extractor to Rust by carlos-alm · Pull Request #1103 · optave/ops-codegraph-tool

carlos-alm · 2026-05-11T20:43:13Z

Summary

Adds tree-sitter-erlang dependency and a native Erlang extractor in crates/codegraph-core/src/extractors/erlang.rs.
Registers .erl/.hrl with LanguageKind::Erlang and the Rust file_collector, removes Erlang from the WASM-only drop list, and wires ERLANG_AST_CONFIG (string literals) on both the native and JS sides.
Mirrors extractErlangSymbols: module declarations as module, function clauses (deduplicated per arity) as function, records as record, type aliases/opaque as type, macros as variable, plus -include/-include_lib/-import directives and module:function/local call extraction.
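As a rough illustration of the extension registration described above, the mapping can be pictured as a small lookup from file extension to language kind. This is a minimal sketch, not the code in parser_registry.rs: `language_for_path` is a hypothetical helper name, and the enum here carries only the one variant relevant to this PR.

```rust
use std::path::Path;

/// Reduced stand-in for the real enum, which has many more variants.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum LanguageKind {
    Erlang,
}

/// Hypothetical helper: resolves a path to a language kind by extension.
/// Both Erlang source (.erl) and header (.hrl) files map to Erlang.
fn language_for_path(path: &Path) -> Option<LanguageKind> {
    match path.extension()?.to_str()? {
        "erl" | "hrl" => Some(LanguageKind::Erlang),
        _ => None,
    }
}
```

Paths with any other extension, or with no extension at all, fall through to `None` in this sketch.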

Test plan

cargo build --release -p codegraph-core (clean build)
cargo test -p codegraph-core --lib — 191/191 (7 new erlang extractor tests)
npx tree-sitter build --wasm regenerates tree-sitter-erlang.wasm
npx vitest run tests/parsers/erlang.test.ts — 5/5
npx vitest run tests/parsers/native-drop-classification.test.ts — 13/13

Adds tree-sitter-erlang dependency and native extractor matching the WASM-side behavior for Erlang symbol, import, and call extraction. Part of #1071

claude · 2026-05-11T20:43:28Z

Claude finished @carlos-alm's task

Root Cause Analysis

Problem Being Solved: This PR ports the Erlang extractor from WASM-only to a native Rust implementation, achieving engine parity. Currently, Erlang files are parsed only by the WASM engine, causing performance overhead and engine inconsistencies.

Why This Approach: It follows the established dual-engine architecture pattern used for all other supported languages. The native Rust implementation provides better performance and eliminates the need for WASM fallback on Erlang files.

Risk Assessment: Low risk - this is additive functionality following existing patterns. The implementation mirrors the existing WASM extractErlangSymbols function and includes comprehensive test coverage.

Backlog Compliance

Zero-dep: ✓ - Only adds tree-sitter-erlang = "0.16" as a parsing dependency, which follows the existing pattern for all other languages
Foundation-aligned: ✓ - Fully compliant with all principles. Adds local parsing capability without external dependencies or cloud services
Problem-fit: 4/5 - Directly improves native engine coverage and parsing performance for Erlang codebases
Breaking: No - Purely additive, no existing behavior changes
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Phase-by-Phase Analysis

Phase 2: Configuration Changes ✓

No concerning rule relaxations found. All changes are additive:

Cargo.toml: adds tree-sitter-erlang dependency
No timeout increases, test threshold reductions, or linting rule disables

Phase 3: Test Integrity ✓

Strong test coverage across both native and WASM engines:

Native Rust: 7 new extractor tests in erlang.rs:324-396 covering module, function, record, import, call, include, and multi-clause deduplication
WASM JS: 5 tests in tests/parsers/erlang.test.ts verifying equivalent extraction
Resolution benchmark: Hand-annotated expected edges in tests/benchmarks/resolution/fixtures/erlang/expected-edges.json
Author claims 191/191 cargo tests and 5/5 vitest tests passing

Phase 4: Code Quality Analysis ✅

Architecture Adherence: Perfect consistency with existing patterns

Follows identical structure to other language extractors (walk_tree + walk_ast_nodes_with_config)
Proper registration in parser_registry.rs with .erl/.hrl extensions
Native/WASM AST config sync: ERLANG_AST_CONFIG mirrors JS ERLANG_AST_TYPES

Symbol Extraction Coverage: Comprehensive Erlang language support

Modules: -module(name) → kind: "module"
Functions: Multi-clause deduplication prevents duplicate definitions (lines 135-142)
Records: Field extraction with children (lines 62-77)
Types: -type and -opaque declarations → kind: "type"
Macros: -define(NAME, value) → kind: "variable" (appropriate choice)
Imports: -import and -include/-include_lib directives
Calls: Local and remote calls (module:function syntax) with proper receiver handling
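To make the local versus remote call distinction concrete, here is a minimal sketch of the receiver handling. It works on the textual call target for brevity, whereas the extractor itself works on syntax-tree nodes; `Call` and `split_call_target` are hypothetical names used only for illustration.

```rust
/// Simplified, hypothetical call record: the called function plus an
/// optional receiver (the module part of `module:function`).
#[derive(Debug, PartialEq)]
struct Call {
    name: String,
    receiver: Option<String>,
}

/// Splits a call target at the first ':'. A remote call such as
/// `lists:map` yields a receiver; a local call such as `helper` does not.
fn split_call_target(target: &str) -> Call {
    match target.split_once(':') {
        Some((module, function)) => Call {
            name: function.to_string(),
            receiver: Some(module.to_string()),
        },
        None => Call {
            name: target.to_string(),
            receiver: None,
        },
    }
}
```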

Implementation Quality:

Robust error handling with early returns on missing nodes
Memory safety through proper tree-sitter node handling
Legacy grammar compatibility (lines 276-287) handles different tree-sitter-erlang versions
Consistent use of helper functions (find_child, node_text, start_line)

Engine Parity: Excellent synchronization

Native ERLANG_AST_CONFIG exactly matches JS ERLANG_AST_TYPES (both: string literals only)
Extension registration sync: .erl/.hrl in both NATIVE_SUPPORTED_EXTENSIONS and LANGUAGE_REGISTRY
Test removes Erlang from WASM-only drop list (tests/parsers/native-drop-classification.test.ts:80-81)

Minor Observations:

Function deduplication logic (lines 135-142) is sound but could benefit from a comment explaining multi-clause Erlang functions
String literal handling uses double quotes only, appropriate for Erlang
Call extraction handles both legacy and modern tree-sitter-erlang grammar shapes

Phase 5: Final Assessment

No Critical Issues Identified

This is a well-executed language port that follows all established patterns and provides comprehensive Erlang support. The dual-engine architecture requires perfect parity between native and WASM implementations, and this PR achieves that through:

Comprehensive symbol extraction mirroring the WASM implementation
Proper AST configuration synchronization
Thorough test coverage across both engines
Clean integration with existing infrastructure

The implementation quality is high, following Rust best practices and the codebase's established patterns. Engine parity compliance is excellent.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: High-quality implementation that properly extends native engine coverage to Erlang. Follows all established patterns, includes comprehensive tests, and maintains perfect engine parity. This is exactly the kind of incremental language support addition the codebase is designed for.

greptile-apps · 2026-05-11T20:49:07Z

Greptile Summary

This PR ports the Erlang symbol extractor from WASM/JS to native Rust, adding tree-sitter-erlang as a dependency and wiring .erl/.hrl files through the native pipeline. It also backports several robustness fixes to the JS extractor so both engines remain in sync.

New erlang.rs extractor handles module declarations, function clauses (deduplicated per name/arity), records, type aliases/opaque, macros, include directives, import attributes, and qualified calls.
JS erlang.ts updated to mirror all Rust fixes: childForFieldName with findChild fallback, namedChild iteration for params, arity-aware dedup, and type_name-wrapped atom fallback.
Registration plumbing and AST-node config added consistently across both the Rust and TS layers.

Confidence Score: 5/5

Safe to merge — purely additive change that adds a new language extractor without touching existing extraction paths.

All three previously flagged correctness issues (arity dedup, complex-pattern parameter counting, module-attr field-name fragility) are resolved with dedicated tests. The Rust and JS extractors are kept in sync throughout.

No files require special attention.

Important Files Changed

Filename	Overview
crates/codegraph-core/src/extractors/erlang.rs	New 456-line Erlang extractor; previously flagged issues (arity dedup, complex-pattern params, module attr field name) are all addressed.
src/extractors/erlang.ts	JS extractor updated to mirror Rust fixes; minor duplicate findChild call in handleTypeAlias.
crates/codegraph-core/src/parser_registry.rs	Registers Erlang language kind, maps .erl/.hrl extensions, wires grammar, updates exhaustiveness count to 26.
tests/parsers/erlang.test.ts	Adds two new TS tests covering distinct-arity preservation and complex-pattern arity counting.
tests/parsers/native-drop-classification.test.ts	Removes .erl from unsupported-by-native list, updates expected count from 10 to 9.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["'.erl' / '.hrl' file"] --> B{Native path?}
    B -- yes --> C["LanguageKind::Erlang"]
    B -- no --> D["WASM JS path"]
    C --> E["tree-sitter-erlang parse"]
    E --> F["ErlangExtractor::extract"]
    F --> G["walk_tree -> match_erlang_node"]
    G --> G1["module_attribute -> Definition (module)"]
    G --> G2["fun_decl -> Definition (function, name/arity dedup)"]
    G --> G3["record_decl -> Definition (record) + fields"]
    G --> G4["type_alias / opaque -> Definition (type)"]
    G --> G5["pp_define -> Definition (variable/macro)"]
    G --> G6["pp_include -> Import"]
    G --> G7["import_attribute -> Import"]
    G --> G8["call -> Call (local or module:func)"]
    F --> H["walk_ast_nodes_with_config (string literals)"]
    G1 & G2 & G3 & G4 & G5 & G6 & G7 & G8 & H --> I["FileSymbols"]
    D --> J["ExtractorOutput"]
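The dispatch step in the flowchart can be read as a single match over grammar node kinds. The sketch below restates that mapping in Rust under stated assumptions: `Extracted` and `classify_node` are hypothetical names, the node kinds are copied from the flowchart labels, and the real match_erlang_node builds full symbol records rather than returning a tag.

```rust
/// What the extractor emits for a node, per the flowchart.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Extracted {
    /// A definition, carrying its symbol kind.
    Definition(&'static str),
    Import,
    Call,
}

/// Hypothetical stand-in for the dispatch: maps a grammar node kind to
/// the category of symbol produced. Unhandled kinds return None.
fn classify_node(kind: &str) -> Option<Extracted> {
    match kind {
        "module_attribute" => Some(Extracted::Definition("module")),
        "fun_decl" => Some(Extracted::Definition("function")),
        "record_decl" => Some(Extracted::Definition("record")),
        "type_alias" | "opaque" => Some(Extracted::Definition("type")),
        // Macros are recorded as variables.
        "pp_define" => Some(Extracted::Definition("variable")),
        "pp_include" | "import_attribute" => Some(Extracted::Import),
        "call" => Some(Extracted::Call),
        _ => None,
    }
}
```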

Reviews (3): Last reviewed commit: "fix(extractors): align Erlang record_dec..."

greptile-apps · 2026-05-11T20:49:11Z

+    // Don't duplicate if we already have this function
+    if symbols
+        .definitions
+        .iter()
+        .any(|d| d.name == name && d.kind == "function")
+    {
+        return;
+    }


Deduplication ignores arity, silently drops overloaded functions

The guard checks only d.name == name, so when a module defines foo/1 and foo/2 as two separate fun_decl nodes, the second fun_decl is processed, hits the check, finds the already-registered "foo" entry, and returns early. Only the first-encountered arity survives in definitions. Erlang's overloading by arity is idiomatic and common, so this will silently omit real definitions in virtually every non-trivial module. The fix is to compute arity before the guard and include it in the comparison, for example by counting expr_args children upfront and matching against d.children.as_ref().map_or(0, |c| c.len()).

Fixed in 4c3c13f — dedup now compares (name, arity) so all clauses (foo/1, foo/2, ...) survive. Added a test covering three arities for the same name.
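For readers following the thread, a minimal sketch of deduplicating by (name, arity) rather than by name alone. It is not the code from 4c3c13f: `Definition` is a simplified, hypothetical record, and arity is assumed to be the length of its parameter list.

```rust
/// Simplified, hypothetical stand-in for the extractor's definition record.
#[derive(Debug, Clone, PartialEq)]
struct Definition {
    name: String,
    kind: String,
    /// One entry per parameter; the length is the arity.
    params: Vec<String>,
}

/// Pushes `candidate` unless a function with the same name AND arity is
/// already recorded. Returns true if the definition was added.
fn push_function(defs: &mut Vec<Definition>, candidate: Definition) -> bool {
    let arity = candidate.params.len();
    let duplicate = defs.iter().any(|d| {
        d.kind == "function" && d.name == candidate.name && d.params.len() == arity
    });
    if duplicate {
        return false;
    }
    defs.push(candidate);
    true
}

/// Convenience constructor for a function definition.
fn function(name: &str, params: &[&str]) -> Definition {
    Definition {
        name: name.to_string(),
        kind: "function".to_string(),
        params: params.iter().map(|p| p.to_string()).collect(),
    }
}
```

With this guard, foo/1 and foo/2 are both kept, while a second clause of foo/1 is dropped.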

greptile-apps · 2026-05-11T20:49:12Z

+    for i in 0..args_node.child_count() {
+        let child = match args_node.child(i) {
+            Some(c) => c,
+            None => continue,
+        };
+        if child.kind() == "var" || child.kind() == "atom" {
+            params.push(child_def(
+                node_text(&child, source).to_string(),
+                "parameter",
+                start_line(&child),
+            ));
+        }
+    }


Parameters only capture bare var and atom patterns; complex pattern-match arguments (tuples {ok, X}, lists [H|T], binaries <<Bin>>) are silently skipped. For Erlang functions that rely on pattern-matching to distinguish clauses, the extracted param list will be shorter than the actual arity, which will interact badly with any arity-based deduplication fix. Consider counting all non-punctuation children of expr_args to get a stable arity count independent of pattern complexity.

Suggested change

-    for i in 0..args_node.child_count() {
-        let child = match args_node.child(i) {
-            Some(c) => c,
-            None => continue,
-        };
-        if child.kind() == "var" || child.kind() == "atom" {
-            params.push(child_def(
-                node_text(&child, source).to_string(),
-                "parameter",
-                start_line(&child),
-            ));
-        }
-    }
+    for i in 0..args_node.child_count() {
+        let child = match args_node.child(i) {
+            Some(c) => c,
+            None => continue,
+        };
+        // Skip punctuation so every argument pattern counts as one parameter.
+        if matches!(child.kind(), "," | "(" | ")") {
+            continue;
+        }
+        let label = if child.kind() == "var" || child.kind() == "atom" {
+            node_text(&child, source).to_string()
+        } else {
+            format!("_{}", i) // placeholder for complex patterns
+        };
+        params.push(child_def(label, "parameter", start_line(&child)));
+    }

Fixed in 4c3c13f — now iterates named children so every argument pattern (tuple, list, binary) counts as one parameter. Complex patterns get a positional placeholder label so arity is preserved. Added a test for {ok, X} / [H|T] arguments.
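A minimal sketch of the idea in this thread, counting one parameter per argument pattern. `ArgChild` is a hypothetical stand-in for a tree-sitter child node, the `tuple` and `list` kind names are assumptions, and the `_N` placeholder format is illustrative rather than the format the fix uses.

```rust
/// Hypothetical stand-in for one child of the argument list node.
struct ArgChild<'a> {
    kind: &'a str,
    text: &'a str,
    /// Named nodes are patterns; anonymous nodes are punctuation.
    is_named: bool,
}

/// Returns one label per argument pattern. Punctuation is skipped, `var`
/// and `atom` keep their source text, and any other pattern gets a
/// positional placeholder so the list length still equals the arity.
fn param_labels(children: &[ArgChild<'_>]) -> Vec<String> {
    children
        .iter()
        .filter(|c| c.is_named)
        .enumerate()
        .map(|(position, c)| match c.kind {
            "var" | "atom" => c.text.to_string(),
            _ => format!("_{}", position),
        })
        .collect()
}
```

For a clause head such as handle({ok, X}, [H|T], Acc), this yields three labels, so the arity stays at 3 even though only `Acc` is a bare variable.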

greptile-apps · 2026-05-11T20:49:13Z

+fn handle_module_attr(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
+    // module_attribute: - module ( atom ) .
+    let name_node = match find_child(node, "atom") {
+        Some(n) => n,
+        None => return,
+    };


handle_module_attr uses find_child(node, "atom") which returns the first atom child. In tree-sitter-erlang v0.16 the -module(name). attribute produces a module_attribute whose first child is often the anonymous keyword token module, but if that keyword is represented as a named atom node (grammar-version dependent), the extracted name would be the literal string "module" rather than the actual module name. The existing test passes on the current grammar version, but it is fragile. Using a field name lookup — node.child_by_field_name("name") — is more robust if the grammar exposes it.

Suggested change

-fn handle_module_attr(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
-    // module_attribute: - module ( atom ) .
-    let name_node = match find_child(node, "atom") {
-        Some(n) => n,
-        None => return,
-    };
+fn handle_module_attr(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
+    // module_attribute: - module ( atom ) .
+    // Prefer the named field to avoid accidentally picking up the `module`
+    // keyword itself if the grammar represents it as an atom node.
+    let name_node = node
+        .child_by_field_name("name")
+        .or_else(|| find_child(node, "atom"))
+        .filter(|n| node_text(n, source) != "module");
+    let name_node = match name_node {
+        Some(n) => n,
+        None => return,
+    };

Fixed in 4c3c13f — handle_module_attr now prefers child_by_field_name("name") and only falls back to find_child(node, "atom") if the grammar doesn't expose the field. handle_function_clause and extract_params use the same approach for the function-clause name and args. The behavior matches the current grammar's tests and is robust to future grammar revisions.
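The field-first lookup with a kind-based fallback can be sketched without the grammar. `MockNode` below is a hypothetical stand-in for tree_sitter::Node, with only the two lookups this thread discusses; the real handlers operate on actual syntax nodes and source bytes.

```rust
/// Hypothetical stand-in for a syntax node.
struct MockNode {
    kind: &'static str,
    text: &'static str,
    /// Field name assigned by the grammar, if any.
    field: Option<&'static str>,
    children: Vec<MockNode>,
}

impl MockNode {
    /// First child carrying the given field name.
    fn child_by_field_name(&self, field: &str) -> Option<&MockNode> {
        self.children.iter().find(|c| c.field == Some(field))
    }

    /// First child of the given kind, regardless of field.
    fn find_child(&self, kind: &str) -> Option<&MockNode> {
        self.children.iter().find(|c| c.kind == kind)
    }
}

/// Builds a leaf atom node for the examples.
fn atom(text: &'static str, field: Option<&'static str>) -> MockNode {
    MockNode { kind: "atom", text, field, children: Vec::new() }
}

/// Prefers the `name` field; falls back to the first atom child only when
/// the grammar does not expose that field.
fn module_name(attr: &MockNode) -> Option<&str> {
    attr.child_by_field_name("name")
        .or_else(|| attr.find_child("atom"))
        .map(|n| n.text)
}
```

If a grammar revision were to represent the `module` keyword as a leading atom, the field lookup would still return the real module name, which is the fragility this thread was about.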

github-actions · 2026-05-11T20:51:10Z

Codegraph Impact Analysis

33 functions changed → 17 callers affected across 3 files

ErlangExtractor.extract in crates/codegraph-core/src/extractors/erlang.rs:9 (0 transitive callers)
match_erlang_node in crates/codegraph-core/src/extractors/erlang.rs:17 (0 transitive callers)
handle_module_attr in crates/codegraph-core/src/extractors/erlang.rs:37 (1 transitive callers)
handle_record_decl in crates/codegraph-core/src/extractors/erlang.rs:62 (1 transitive callers)
handle_type_alias in crates/codegraph-core/src/extractors/erlang.rs:103 (1 transitive callers)
handle_fun_decl in crates/codegraph-core/src/extractors/erlang.rs:129 (1 transitive callers)
handle_function_clause in crates/codegraph-core/src/extractors/erlang.rs:139 (2 transitive callers)
extract_params in crates/codegraph-core/src/extractors/erlang.rs:181 (3 transitive callers)
handle_define in crates/codegraph-core/src/extractors/erlang.rs:210 (1 transitive callers)
handle_include in crates/codegraph-core/src/extractors/erlang.rs:236 (1 transitive callers)
handle_import_attr in crates/codegraph-core/src/extractors/erlang.rs:251 (1 transitive callers)
handle_call in crates/codegraph-core/src/extractors/erlang.rs:282 (1 transitive callers)
parse_erlang in crates/codegraph-core/src/extractors/erlang.rs:339 (9 transitive callers)
extracts_module_declaration in crates/codegraph-core/src/extractors/erlang.rs:349 (0 transitive callers)
extracts_function_definition in crates/codegraph-core/src/extractors/erlang.rs:360 (0 transitive callers)
extracts_record_definition in crates/codegraph-core/src/extractors/erlang.rs:371 (0 transitive callers)
extracts_import_attribute in crates/codegraph-core/src/extractors/erlang.rs:386 (0 transitive callers)
extracts_function_calls in crates/codegraph-core/src/extractors/erlang.rs:396 (0 transitive callers)
extracts_include_directive in crates/codegraph-core/src/extractors/erlang.rs:402 (0 transitive callers)
deduplicates_multi_clause_function in crates/codegraph-core/src/extractors/erlang.rs:408 (0 transitive callers)

…1103)
- Dedupe Erlang function defs by (name, arity) so foo/1 and foo/2 are both kept
- Count every argument pattern (tuple, list, binary) as one parameter via named children, using placeholder labels for complex patterns
- Prefer the named 'name'/'args' fields for module attributes and clause args, falling back to the previous atom/expr_args lookups
- Add Rust and TS tests covering multi-arity overloads and complex pattern args

…-field fallback (#1103)
- Rust handle_record_decl now prefers child_by_field_name("name") before falling back to find_child(atom), matching the other Erlang handlers and avoiding accidental keyword pickup if the grammar exposes 'record' as a named atom.
- TypeScript handleTypeAlias now mirrors the Rust type_name->atom fallback so the two engines agree when the grammar wraps the alias name in a type_name node.

feat(native): port Erlang extractor to Rust

265c0c0

Adds tree-sitter-erlang dependency and native extractor matching the WASM-side behavior for Erlang symbol, import, and call extraction. Part of #1071

greptile-apps Bot reviewed May 11, 2026


carlos-alm added 3 commits May 11, 2026 23:33

fix: resolve merge conflicts with main

b5bd4ed

carlos-alm wants to merge 4 commits into main from feat/1071-erlang-rust-extractor
