feat(native): port Gleam extractor to Rust#1105

Open
carlos-alm wants to merge 4 commits into
mainfrom
feat/1071-gleam-rust-extractor

@carlos-alm
Contributor

Summary

  • Adds tree-sitter-gleam dependency and a native Gleam extractor in crates/codegraph-core/src/extractors/gleam.rs.
  • Registers .gleam with LanguageKind::Gleam and the Rust file_collector, adds Gleam to NATIVE_SUPPORTED_EXTENSIONS on the JS side, and wires GLEAM_AST_TYPES / GLEAM_AST_CONFIG on both the native and JS sides so the two engines extract identical ast_nodes for Gleam source.
  • Mirrors extractGleamSymbols: module-level function definitions as function (with parameter children), type definitions as type/record/enum (mapped from the Gleam node kind), constants as variable, import declarations, and function-application call extraction.

Closes #1071

Test plan

  • cargo build --release -p codegraph-core (clean build)
  • cargo test -p codegraph-core --lib — 193/193
  • npx tree-sitter build --wasm regenerates tree-sitter-gleam.wasm
  • npx vitest run tests/parsers/gleam.test.ts — 4/4
  • npx vitest run tests/parsers/native-drop-classification.test.ts — 13/13

@claude

claude Bot commented May 11, 2026

Claude finished @carlos-alm's task


Reviewing PR: feat(native): port Gleam extractor to Rust

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection - check for high-risk changes
  • Phase 3: Test integrity verification - ensure tests aren't weakened
  • Phase 4: Code quality assessment - review new Gleam extractor code
  • Phase 5: Backlog compliance evaluation - rate against BACKLOG.md criteria
  • Provide structured final assessment with rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR implements native Gleam extractor support to complete the dual-engine architecture. Issue #1071 requested Gleam support, and this adds the missing native Rust implementation to match the existing WASM extractor.

Why This Approach: The approach follows the established dual-engine pattern - both native (Rust) and WASM engines must produce identical results for all supported languages. The PR properly mirrors the existing extractGleamSymbols WASM implementation.

Risk Assessment: Low risk - this is additive functionality that doesn't modify existing behavior for other languages.


Phase 2: Configuration Change Detection ✅

Configuration Changes Analyzed:

  • Cargo.toml: Only adds tree-sitter-gleam = "1.0.0" dependency - legitimate language support addition
  • package-lock.json: No new runtime dependencies, only lockfile update
  • No linting/build configuration changes detected
  • No timeout or coverage threshold modifications

Assessment: Clean addition - no concerning configuration relaxation found.


Phase 3: Test Integrity Verification ✅

Test Modifications Analyzed:

  • gleam.test.ts: New test file with proper coverage of Gleam extraction (functions, types, imports, calls)
  • native-drop-classification.test.ts: Clean modification - removes Gleam from unsupported list (expected with native support)
  • No test weakening - assertions remain strict, no timeouts increased, no coverage thresholds lowered
  • Test approach mirrors existing pattern - matches structure of other language tests

Assessment: Test changes are appropriate and strengthen coverage.


Phase 4: Code Quality Assessment ✅

Native Extractor Implementation (gleam.rs:1-432):

Strengths:

  • Perfect parity with WASM extractor - mirrors src/extractors/gleam.ts structure and logic
  • Comprehensive symbol extraction - functions, external functions, type definitions, type aliases, constants, imports, calls
  • Proper AST node handling - uses tree-sitter field access with fallback patterns
  • Complete test coverage - 7 unit tests covering all extraction scenarios
  • Follows established Rust patterns - matches other extractors in crates/codegraph-core/src/extractors/
  • Correct AST config integration - properly wires GLEAM_AST_CONFIG for engine parity

Code Quality Details:

  • Function extraction (lines 32-53): Handles parameters as children, includes complexity/CFG computation
  • Type definitions (lines 76-141): Extracts constructors as property children, handles both inline and grouped constructors
  • Import resolution (lines 185-241): Supports unqualified imports, aliases, defaults to module name
  • Call extraction (lines 243-286): Handles both direct calls and qualified module calls with receivers
  • Parameter extraction (lines 288-327): Comprehensive parameter handling with fallback patterns
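The import bullet above ("supports aliases, defaults to module name") can be illustrated with a small sketch. This is not the code in gleam.rs: `import_binding` is a hypothetical helper, and reading "module name" as the last path segment is an assumption based on how Gleam itself binds `import gleam/io` to `io`.

```rust
/// Hypothetical helper (not taken from gleam.rs): pick the local name an
/// import is bound to. Assumption: an explicit alias wins, otherwise the
/// last `/`-separated segment of the module path is used.
fn import_binding(module_path: &str, alias: Option<&str>) -> String {
    match alias {
        Some(a) => a.to_string(),
        // `rsplit` always yields at least one item, so the fallback to the
        // full path is only there to avoid an `unwrap`.
        None => module_path
            .rsplit('/')
            .next()
            .unwrap_or(module_path)
            .to_string(),
    }
}
```

Under these assumptions, `import gleam/io` would bind `io`, while `import gleam/io as console` would bind `console`.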

Integration Points:

  • Registry integration - properly added to LanguageKind::Gleam and from_extension
  • File collector support - .gleam added to SUPPORTED_EXTENSIONS
  • AST config mirroring - both native and JS sides have matching GLEAM_AST_TYPES/GLEAM_AST_CONFIG
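As a rough picture of the registry integration point, the sketch below shows an extension lookup on a two-variant toy enum. The real `LanguageKind` has many more variants, and the actual signature of `from_extension` is not shown in this conversation, so the shape here is an assumption.

```rust
/// Toy reduction of a language registry; the variants and the method
/// signature are assumptions for illustration, not the codegraph-core API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LanguageKind {
    Rust,
    Gleam,
}

impl LanguageKind {
    /// Map a file extension (with or without the leading dot) to a language.
    fn from_extension(ext: &str) -> Option<Self> {
        match ext.trim_start_matches('.') {
            "rs" => Some(Self::Rust),
            "gleam" => Some(Self::Gleam),
            // Anything else is treated as unsupported in this sketch.
            _ => None,
        }
    }
}
```

Returning `Option` keeps "unsupported extension" an explicit case that the caller has to handle.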

Assessment: High-quality implementation that correctly ports WASM functionality to native Rust.


Phase 5: Backlog Compliance Evaluation ✅

Root Cause Analysis

Problem Being Solved: Completes dual-engine architecture parity for Gleam language support (native Rust + WASM)
Why This Approach: Follows established dual-engine pattern - both engines must produce identical results
Risk Assessment: Low risk additive functionality

Backlog Compliance

  • Zero-dep: ✓ — adds tree-sitter-gleam which is a tree-sitter grammar (existing category)
  • Foundation-aligned: ✓ — maintains dual-engine parity (P1), builds on tree-sitter/SQLite (P4), embeddable-first (P5)
  • Problem-fit: 4/5 — directly enables AI agents to work with Gleam codebases, prevents "unsupported language" gaps
  • Breaking: No — purely additive, no existing API/CLI changes
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a clean implementation that follows established patterns.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent implementation that completes dual-engine architecture for Gleam. Perfect parity between native and WASM extractors, comprehensive test coverage, follows all established patterns. Only missing the 5th star due to being incremental language support rather than architectural innovation.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR ports the Gleam symbol extractor from the WASM/JS engine to a native Rust implementation, adding tree-sitter-gleam as a crate dependency and wiring LanguageKind::Gleam throughout the Rust codegraph-core pipeline.

  • New crates/codegraph-core/src/extractors/gleam.rs mirrors extractGleamSymbols in TypeScript: it handles functions, external functions, type definitions, type aliases, constants, imports, and call sites — including the previously-reviewed fixes for named_child(0) over child(0) and the dual function_call | call dispatch.
  • JS-side plumbing (src/domain/parser.ts, src/ast-analysis/rules/index.ts, src/extractors/gleam.ts) updates NATIVE_SUPPORTED_EXTENSIONS, registers GLEAM_AST_TYPES/GLEAM_STRING_CONFIG, and backports the namedChild(0) fix to the TS extractor for dual-engine parity.
  • Infrastructure changes remove .gleam from the WASM-only skip list in change_detection.rs and file_collector.rs now that Rust handles it natively.

Confidence Score: 5/5

Safe to merge — the Rust extractor faithfully mirrors the JS engine for all Gleam constructs, previously-flagged issues have been addressed, and the infrastructure changes are self-consistent.

All handler functions have been cross-checked against the JS extractor. The named_child(0) fix is correctly applied in both engines. The function_call | call dual-dispatch is in place. Change detection, file collection, and extension registration are updated atomically. The nine unit tests plus the updated classification test cover the full extractor surface.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/codegraph-core/src/extractors/gleam.rs | New native Gleam extractor; faithfully mirrors the JS engine for all major constructs; previously-flagged child(0)/named_child(0) and call-node-type issues addressed. |
| crates/codegraph-core/src/parser_registry.rs | Adds Gleam variant and expected-length constant correctly; test guard updated to EXPECTED_LEN 26. |
| crates/codegraph-core/src/file_collector.rs | Adds gleam to SUPPORTED_EXTENSIONS; comment and doc-comment updated to drop the WASM-only reference. |
| crates/codegraph-core/src/change_detection.rs | Removes .gleam from the WASM-only skip list; test fixture updated accordingly. |
| src/extractors/gleam.ts | Backports namedChild(0) and childForFieldName('record') fixes for dual-engine parity; no logic regressions. |
| src/domain/parser.ts | Adds .gleam to NATIVE_SUPPORTED_EXTENSIONS. |
| src/ast-analysis/rules/index.ts | Adds GLEAM_AST_TYPES and GLEAM_STRING_CONFIG matching the Rust GLEAM_AST_CONFIG. |
| tests/parsers/native-drop-classification.test.ts | Updates classification test to remove .gleam from the unsupported bucket and replace with .jl; counts adjusted correctly. |

Reviews (4): Last reviewed commit: "fix(gleam): match both function_call and..."

Comment on lines +62 to +76
    };

    symbols.definitions.push(Definition {
        name: node_text(&name_node, source).to_string(),
        kind: "function".to_string(),
        line: start_line(node),
        end_line: Some(end_line(node)),
        decorators: None,
        complexity: None,
        cfg: None,
        children: None,
    });
}

fn handle_type_definition(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
Contributor

P2 handle_external_function drops parameter children

handle_function extracts parameters and stores them as children, but handle_external_function hard-codes children: None. External Gleam functions still have a full parameter list in their signatures, so callers that rely on children to understand arity or parameter names will get nothing for external functions. This creates a silent asymmetry: two functions with identical signatures produce different output depending on whether they are external.

Contributor Author

Tracking as a follow-up in #1110. Both engines (native Rust and WASM/JS) currently drop the parameter list for external functions — the Rust port faithfully mirrors existing WASM/JS behavior to keep dual-engine parity, but the silent asymmetry between regular and external Gleam functions is real and worth fixing in both engines together. Deferred to keep this PR scoped to "port to native" rather than "port + change extraction semantics across engines".
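One possible shape for that follow-up, sketched under assumptions: `Def` is a trimmed stand-in for the real `Definition` struct, the `"parameter"` kind string is a guess, and parameter names are passed in as plain strings rather than read from syntax nodes. The idea is only that regular and external functions go through the same parameter helper, so identical signatures yield identical children.

```rust
/// Trimmed, hypothetical stand-in for the extractor's `Definition` type.
#[derive(Debug, Clone, PartialEq)]
struct Def {
    name: String,
    kind: String,
    children: Option<Vec<Def>>,
}

/// Shared helper: turn parameter names into child definitions.
/// Assumption: an empty parameter list is represented as `None`.
fn param_children(params: &[&str]) -> Option<Vec<Def>> {
    if params.is_empty() {
        return None;
    }
    Some(
        params
            .iter()
            .map(|p| Def {
                name: (*p).to_string(),
                kind: "parameter".to_string(), // assumed kind label
                children: None,
            })
            .collect(),
    )
}

/// Regular function: children come from the shared helper.
fn regular_function_def(name: &str, params: &[&str]) -> Def {
    Def {
        name: name.to_string(),
        kind: "function".to_string(),
        children: param_children(params),
    }
}

/// External function: same helper instead of a hard-coded `None`.
fn external_function_def(name: &str, params: &[&str]) -> Def {
    Def {
        name: name.to_string(),
        kind: "function".to_string(),
        children: param_children(params),
    }
}
```

With both constructors delegating to `param_children`, the asymmetry described in the review comment cannot reappear in one path without the other.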

Comment on lines +264 to +266
let record = func_node
    .child_by_field_name("record")
    .or_else(|| func_node.child(0));
Contributor

P2 child(0) fallback for record may pick up anonymous punctuation nodes

func_node.child(0) returns the first child regardless of whether it is named or anonymous. In the Gleam tree-sitter grammar a field_access node's children include the . punctuation token, so the fallback could capture . as the receiver text instead of the module identifier. Prefer func_node.named_child(0) to skip anonymous punctuation tokens.

Suggested change
- let record = func_node
-     .child_by_field_name("record")
-     .or_else(|| func_node.child(0));
+ let record = func_node
+     .child_by_field_name("record")
+     .or_else(|| func_node.named_child(0));

Contributor Author

Fixed in b971244. Replaced the func_node.child(0) fallback for the record field with func_node.named_child(0) to skip anonymous punctuation tokens. Applied the same fix to the JS extractor (src/extractors/gleam.ts) to keep dual-engine parity.
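To make the difference between the two fallbacks concrete, here is a toy model that does not use the tree-sitter crate. `ToyNode` and the tree shape are hypothetical and chosen only to show how a positional lookup can land on punctuation while a named lookup skips it.

```rust
/// Toy stand-in for a syntax node; NOT the tree-sitter `Node` type.
#[derive(Debug, Clone, PartialEq)]
struct ToyNode {
    kind: &'static str,
    text: &'static str,
    named: bool,
    children: Vec<ToyNode>,
}

impl ToyNode {
    fn leaf(kind: &'static str, text: &'static str, named: bool) -> Self {
        ToyNode { kind, text, named, children: Vec::new() }
    }

    /// i-th child, counting anonymous tokens (the idea behind `child(i)`).
    fn child(&self, i: usize) -> Option<&ToyNode> {
        self.children.get(i)
    }

    /// i-th named child, skipping anonymous tokens (the idea behind
    /// `named_child(i)`).
    fn named_child(&self, i: usize) -> Option<&ToyNode> {
        self.children.iter().filter(|c| c.named).nth(i)
    }
}

/// Hypothetical `field_access` whose first child is an anonymous `.` token,
/// followed by the module identifier and the accessed label.
fn field_access_with_leading_token() -> ToyNode {
    ToyNode {
        kind: "field_access",
        text: ".io.println",
        named: true,
        children: vec![
            ToyNode::leaf(".", ".", false),
            ToyNode::leaf("identifier", "io", true),
            ToyNode::leaf("label", "println", true),
        ],
    }
}
```

On this tree, `child(0)` yields the `.` token and `named_child(0)` yields the `io` identifier, which is the behavior the fix relies on for the defensive fallback path.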

}

fn handle_call(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
    let func_node = match node.child_by_field_name("function").or_else(|| node.child(0)) {
Contributor

P2 handle_call first-child fallback also uses unnamed child

Same concern at the top-level func_node selection: node.child(0) can return anonymous tokens. node.named_child(0) is consistent with how other extractors resolve this.

Suggested change
- let func_node = match node.child_by_field_name("function").or_else(|| node.child(0)) {
+ let func_node = match node.child_by_field_name("function").or_else(|| node.named_child(0)) {

Contributor Author

Fixed in b971244. Replaced node.child(0) with node.named_child(0) in handle_call to skip anonymous punctuation tokens. Same fix applied to the JS extractor to keep dual-engine parity.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Codegraph Impact Analysis

29 functions changed, 14 callers affected across 3 files

  • detect_removed_skips_unsupported_extensions in crates/codegraph-core/src/change_detection.rs:776 (0 transitive callers)
  • GleamExtractor.extract in crates/codegraph-core/src/extractors/gleam.rs:11 (0 transitive callers)
  • match_gleam_node in crates/codegraph-core/src/extractors/gleam.rs:19 (0 transitive callers)
  • handle_function in crates/codegraph-core/src/extractors/gleam.rs:32 (1 transitive callers)
  • handle_external_function in crates/codegraph-core/src/extractors/gleam.rs:55 (1 transitive callers)
  • handle_type_definition in crates/codegraph-core/src/extractors/gleam.rs:76 (1 transitive callers)
  • handle_type_alias in crates/codegraph-core/src/extractors/gleam.rs:143 (1 transitive callers)
  • handle_constant in crates/codegraph-core/src/extractors/gleam.rs:164 (1 transitive callers)
  • handle_import in crates/codegraph-core/src/extractors/gleam.rs:185 (1 transitive callers)
  • handle_call in crates/codegraph-core/src/extractors/gleam.rs:243 (1 transitive callers)
  • extract_params in crates/codegraph-core/src/extractors/gleam.rs:291 (2 transitive callers)
  • parse_gleam in crates/codegraph-core/src/extractors/gleam.rs:337 (9 transitive callers)
  • extracts_public_function in crates/codegraph-core/src/extractors/gleam.rs:347 (0 transitive callers)
  • extracts_private_function in crates/codegraph-core/src/extractors/gleam.rs:356 (0 transitive callers)
  • extracts_qualified_call_as_receiver_name in crates/codegraph-core/src/extractors/gleam.rs:362 (0 transitive callers)
  • extracts_same_file_call in crates/codegraph-core/src/extractors/gleam.rs:374 (0 transitive callers)
  • extracts_import_module in crates/codegraph-core/src/extractors/gleam.rs:386 (0 transitive callers)
  • extracts_unqualified_imports in crates/codegraph-core/src/extractors/gleam.rs:394 (0 transitive callers)
  • extracts_type_definition_with_constructors in crates/codegraph-core/src/extractors/gleam.rs:403 (0 transitive callers)
  • extracts_type_alias in crates/codegraph-core/src/extractors/gleam.rs:419 (0 transitive callers)

Replaces child(0) fallbacks in handleCall / handle_call with
named_child(0) in both the native Rust and WASM/JS Gleam extractors.

The Gleam tree-sitter grammar's field_access node includes the '.'
punctuation token as a child, so child(0) on field_access could
return '.' as the receiver text on malformed input. named_child(0)
skips anonymous tokens and is consistent across both engines.

The field accessors always succeed on valid Gleam, so this only
affects the defensive fallback path, but it removes a silent
asymmetry that Greptile flagged in review of #1105.
@carlos-alm
Contributor Author

@greptileai

"import" => handle_import(node, source, symbols),
"function_call" => handle_call(node, source, symbols),
_ => {}
}
Contributor

P1 Missing call node type in match_gleam_node

The JS extractor's walkGleamNode dispatches on both 'function_call' and 'call' node types, but the Rust match_gleam_node only matches "function_call". Any Gleam file parsed with a grammar version that emits call nodes (or if the grammar ever uses both interchangeably) will have its call sites silently skipped by the native extractor while the WASM engine extracts them correctly — breaking the dual-engine parity that this PR aims to guarantee.

Suggested change
}
"function_call" | "call" => handle_call(node, source, symbols),

Contributor Author

Fixed in a726c3a. Updated match_gleam_node to match both function_call and call node types, matching the JS walkGleamNode dispatch in src/extractors/gleam.ts. The 9 Gleam unit tests still pass.
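The dual dispatch can be pictured with a small standalone sketch. The `Dispatch` enum is hypothetical and stands in for the handler calls; only the `"function_call" | "call"` arm and the `"import"` arm reflect what is quoted in this thread.

```rust
/// Hypothetical labels standing in for the real handler functions.
#[derive(Debug, PartialEq)]
enum Dispatch {
    Import,
    Call,
    Ignored,
}

/// Route a node kind to a handler label.
fn dispatch_for(kind: &str) -> Dispatch {
    match kind {
        "import" => Dispatch::Import,
        // Both spellings reach the same handler, so a grammar that emits
        // either node kind produces call sites.
        "function_call" | "call" => Dispatch::Call,
        _ => Dispatch::Ignored,
    }
}
```

An or-pattern keeps the two node kinds in one arm, so they cannot drift apart the way two separate arms could.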

@carlos-alm
Contributor Author

@greptileai

Development

Successfully merging this pull request may close these issues.

Rust engine parity: port the 11 remaining JS-only language extractors