feat(native): port Erlang extractor to Rust#1103
Conversation
Adds tree-sitter-erlang dependency and native extractor matching the WASM-side behavior for Erlang symbol, import, and call extraction. Part of #1071
|
Claude finished @carlos-alm's task —— View job Root Cause AnalysisProblem Being Solved: This PR ports the Erlang extractor from WASM-only to native Rust implementation, achieving engine parity. Currently, Erlang files are only parsed by the WASM engine, causing performance overhead and engine inconsistencies. Why This Approach: Following the established dual-engine architecture pattern used for all other supported languages. The native Rust implementation provides better performance and eliminates the need for WASM fallback on Erlang files. Risk Assessment: Low risk - this is additive functionality following existing patterns. The implementation mirrors the existing WASM Backlog Compliance
Phase-by-Phase AnalysisPhase 2: Configuration Changes ✓No concerning rule relaxations found. All changes are additive:
Phase 3: Test Integrity ✓Strong test coverage across both native and WASM engines:
Phase 4: Code Quality Analysis ✅Architecture Adherence: Perfect consistency with existing patterns
Symbol Extraction Coverage: Comprehensive Erlang language support
Implementation Quality:
Engine Parity: Excellent synchronization
Minor Observations:
Phase 5: Final AssessmentNo Critical Issues Identified This is a well-executed language port that follows all established patterns and provides comprehensive Erlang support. The dual-engine architecture requires perfect parity between native and WASM implementations, and this PR achieves that through:
The implementation quality is high, following Rust best practices and the codebase's established patterns. Engine parity compliance is excellent. Final Recommendation
|
Greptile SummaryThis PR ports the Erlang symbol extractor from WASM/JS to native Rust, adding
Confidence Score: 5/5Safe to merge — purely additive change that adds a new language extractor without touching existing extraction paths. All three previously flagged correctness issues (arity dedup, complex-pattern parameter counting, module-attr field-name fragility) are resolved with dedicated tests. The Rust and JS extractors are kept in sync throughout. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["'.erl' / '.hrl' file"] --> B{Native path?}
B -- yes --> C["LanguageKind::Erlang"]
B -- no --> D["WASM JS path"]
C --> E["tree-sitter-erlang parse"]
E --> F["ErlangExtractor::extract"]
F --> G["walk_tree -> match_erlang_node"]
G --> G1["module_attribute -> Definition (module)"]
G --> G2["fun_decl -> Definition (function, name/arity dedup)"]
G --> G3["record_decl -> Definition (record) + fields"]
G --> G4["type_alias / opaque -> Definition (type)"]
G --> G5["pp_define -> Definition (variable/macro)"]
G --> G6["pp_include -> Import"]
G --> G7["import_attribute -> Import"]
G --> G8["call -> Call (local or module:func)"]
F --> H["walk_ast_nodes_with_config (string literals)"]
G1 & G2 & G3 & G4 & G5 & G6 & G7 & G8 & H --> I["FileSymbols"]
D --> J["ExtractorOutput"]
Reviews (3): Last reviewed commit: "fix(extractors): align Erlang record_dec..." | Re-trigger Greptile |
| // Don't duplicate if we already have this function | ||
| if symbols | ||
| .definitions | ||
| .iter() | ||
| .any(|d| d.name == name && d.kind == "function") | ||
| { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Deduplication ignores arity, silently drops overloaded functions
The guard checks only d.name == name, so when a module defines foo/1 and foo/2 as two separate fun_decl nodes, the second fun_decl is processed, hits the check, finds the already-registered "foo" entry, and returns early. Only the first-encountered arity survives in definitions. Erlang's overloading by arity is idiomatic and common, so this will silently omit real definitions in virtually every non-trivial module. The fix is to compute arity before the guard and include it in the comparison, for example by counting expr_args children upfront and matching against d.children.as_ref().map_or(0, |c| c.len()).
There was a problem hiding this comment.
Fixed in 4c3c13f — dedup now compares (name, arity) so all clauses (foo/1, foo/2, ...) survive. Added a test covering three arities for the same name.
| for i in 0..args_node.child_count() { | ||
| let child = match args_node.child(i) { | ||
| Some(c) => c, | ||
| None => continue, | ||
| }; | ||
| if child.kind() == "var" || child.kind() == "atom" { | ||
| params.push(child_def( | ||
| node_text(&child, source).to_string(), | ||
| "parameter", | ||
| start_line(&child), | ||
| )); | ||
| } | ||
| } |
There was a problem hiding this comment.
Parameters only capture bare
var and atom patterns; complex pattern-match arguments (tuples {ok, X}, lists [H|T], binaries <<Bin>>) are silently skipped. For Erlang functions that rely on pattern-matching to distinguish clauses, the extracted param list will be shorter than the actual arity, which will interact badly with any arity-based deduplication fix. Consider counting all non-punctuation children of expr_args to get a stable arity count independent of pattern complexity.
| for i in 0..args_node.child_count() { | |
| let child = match args_node.child(i) { | |
| Some(c) => c, | |
| None => continue, | |
| }; | |
| if child.kind() == "var" || child.kind() == "atom" { | |
| params.push(child_def( | |
| node_text(&child, source).to_string(), | |
| "parameter", | |
| start_line(&child), | |
| )); | |
| } | |
| } | |
| for i in 0..args_node.child_count() { | |
| let child = match args_node.child(i) { | |
| Some(c) => c, | |
| None => continue, | |
| }; | |
| // Skip punctuation so every argument pattern counts as one parameter. | |
| if matches!(child.kind(), "," | "(" | ")") { | |
| continue; | |
| } | |
| let label = if child.kind() == "var" || child.kind() == "atom" { | |
| node_text(&child, source).to_string() | |
| } else { | |
| format!("_{}", i) // placeholder for complex patterns | |
| }; | |
| params.push(child_def(label, "parameter", start_line(&child))); | |
| } |
There was a problem hiding this comment.
Fixed in 4c3c13f — now iterates named children so every argument pattern (tuple, list, binary) counts as one parameter. Complex patterns get a positional placeholder label so arity is preserved. Added a test for {ok, X} / [H|T] arguments.
| fn handle_module_attr(node: &Node, source: &[u8], symbols: &mut FileSymbols) { | ||
| // module_attribute: - module ( atom ) . | ||
| let name_node = match find_child(node, "atom") { | ||
| Some(n) => n, | ||
| None => return, | ||
| }; |
There was a problem hiding this comment.
handle_module_attr uses find_child(node, "atom") which returns the first atom child. In tree-sitter-erlang v0.16 the -module(name). attribute produces a module_attribute whose first child is often the anonymous keyword token module, but if that keyword is represented as a named atom node (grammar-version dependent), the extracted name would be the literal string "module" rather than the actual module name. The existing test passes on the current grammar version, but it is fragile. Using a field name lookup — node.child_by_field_name("name") — is more robust if the grammar exposes it.
| fn handle_module_attr(node: &Node, source: &[u8], symbols: &mut FileSymbols) { | |
| // module_attribute: - module ( atom ) . | |
| let name_node = match find_child(node, "atom") { | |
| Some(n) => n, | |
| None => return, | |
| }; | |
| fn handle_module_attr(node: &Node, source: &[u8], symbols: &mut FileSymbols) { | |
| // module_attribute: - module ( atom ) . | |
| // Prefer the named field to avoid accidentally picking up the `module` | |
| // keyword itself if the grammar represents it as an atom node. | |
| let name_node = node | |
| .child_by_field_name("name") | |
| .or_else(|| find_child(node, "atom")) | |
| .filter(|n| node_text(n, source) != "module"); | |
| let name_node = match name_node { | |
| Some(n) => n, | |
| None => return, | |
| }; |
There was a problem hiding this comment.
Fixed in 4c3c13f — handle_module_attr now prefers child_by_field_name("name") and only falls back to find_child(node, "atom") if the grammar doesn't expose the field. handle_function_clause and extract_params use the same approach for the function-clause name and args. The behavior matches the current grammar's tests and is robust to future grammar revisions.
Codegraph Impact Analysis33 functions changed → 17 callers affected across 3 files
|
…1103) - Dedupe Erlang function defs by (name, arity) so foo/1 and foo/2 are both kept - Count every argument pattern (tuple, list, binary) as one parameter via named children, using placeholder labels for complex patterns - Prefer the named 'name'/'args' fields for module attributes and clause args, falling back to the previous atom/expr_args lookups - Add Rust and TS tests covering multi-arity overloads and complex pattern args
…-field fallback (#1103) - Rust handle_record_decl now prefers child_by_field_name("name") before falling back to find_child(atom), matching the other Erlang handlers and avoiding accidental keyword pickup if the grammar exposes 'record' as a named atom. - TypeScript handleTypeAlias now mirrors the Rust type_name->atom fallback so the two engines agree when the grammar wraps the alias name in a type_name node.
Summary
tree-sitter-erlangdependency and a native Erlang extractor incrates/codegraph-core/src/extractors/erlang.rs..erl/.hrlwithLanguageKind::Erlangand the Rustfile_collector, removes Erlang from the WASM-only drop list, and wiresERLANG_AST_CONFIG(string literals) on both the native and JS sides.extractErlangSymbols: module declarations asmodule, function clauses (deduplicated per arity) asfunction, records asrecord, type aliases/opaque astype, macros asvariable, plus-include/-include_lib/-importdirectives andmodule:function/local call extraction.Closes #1071
Test plan
cargo build --release -p codegraph-core(clean build)cargo test -p codegraph-core --lib— 191/191 (7 new erlang extractor tests)npx tree-sitter build --wasmregeneratestree-sitter-erlang.wasmnpx vitest run tests/parsers/erlang.test.ts— 5/5npx vitest run tests/parsers/native-drop-classification.test.ts— 13/13