feat(native): port Clojure extractor to Rust #1097
Adds tree-sitter-clojure dependency and native extractor matching the WASM-side behavior for Clojure symbol, import, and call extraction. Part of #1071
Claude finished @carlos-alm's task
PR Review: Native Clojure Extractor Port
Conducting a comprehensive review of the Clojure extractor port from WASM to native Rust.
Review Progress
Root Cause Analysis
Problem Being Solved: Porting the Clojure language extractor from WASM-only to dual-engine (native Rust + WASM fallback) to achieve parsing performance parity with other supported languages.
Why This Approach: Follows the established dual-engine architecture principle (Foundation P2). The Rust implementation mirrors the existing JavaScript extractor behavior exactly, preserving compatibility while enabling native performance gains.
Risk Assessment: LOW RISK - This is a straightforward language addition following established patterns with comprehensive test coverage.
Backlog Compliance
Critical Concerns
None identified. This is an exemplary implementation that:
Technical Review Summary
Phase 2: Configuration Changes ✅
Phase 3: Test Integrity ✅
Phase 4: Code Quality ✅
Rust Implementation (
Integration:
Final Recommendation
Greptile Summary
This PR ports the Clojure symbol extractor from TypeScript (WASM) to Rust native, adding
Confidence Score: 5/5
Safe to merge: this is a straightforward additive port with no modifications to existing extraction paths. Every changed file is either purely additive (new extractor, new language variant, new config entries) or a targeted removal of Clojure from the WASM-only skip lists. The Rust implementation is a faithful line-by-line translation of the JS extractor, all known behavioral gaps are documented, and the full test suite (193 Rust + 5 Vitest Clojure + 13 native-drop tests) passes cleanly. No files require special attention.
Important Files Changed
Reviews (3): Last reviewed commit: "Merge branch 'main' into feat/1071-cloju..."
            break;
        }
        None
    }

    /// Find the second `sym_lit` or `kwd_lit` child. Used to extract the bound
    /// name from forms like `(defn foo [...] ...)`.
    fn find_second_symbol<'a>(list_node: &Node<'a>) -> Option<Node<'a>> {
        let mut count = 0;
? exits the function on None — JS uses continue
Both find_first_symbol and find_second_symbol use let child = list_node.child(i)?;. If child(i) ever returns None for a valid index, the ? operator returns None from the whole function immediately, silently skipping all remaining children. The JS counterpart uses if (!child) continue; which just advances to the next iteration. In practice tree-sitter's child(i) shouldn't return None for i < child_count(), but the divergence is fragile and could produce silent extraction failures under unexpected grammar edge-cases.
Fixed in d9e3c9b. Replaced the ? with an explicit match that skips None and continues, so the scan no longer aborts on an unexpected None from list_node.child(i) — matches the JS extractor's if (!child) continue; semantics. Practically inert today (tree-sitter always returns Some for i < child_count()), but the invariant is now explicit on both engines.
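The difference between the two scan styles can be illustrated with a minimal sketch. `MockNode` is a hypothetical stand-in for tree-sitter's `Node` (it is not part of the PR), and the loop bodies are an assumption about the general shape of the scan rather than a copy of the actual extractor code.

```rust
// Hypothetical stand-in for a syntax node; only what the scan needs.
struct MockNode {
    kind: &'static str,
    text: &'static str,
    // A `None` slot models a child index that unexpectedly yields no node.
    children: Vec<Option<MockNode>>,
}

impl MockNode {
    fn leaf(kind: &'static str, text: &'static str) -> Self {
        MockNode { kind, text, children: Vec::new() }
    }

    fn child_count(&self) -> usize {
        self.children.len()
    }

    fn child(&self, i: usize) -> Option<&MockNode> {
        self.children.get(i).and_then(|c| c.as_ref())
    }
}

fn is_symbol(node: &MockNode) -> bool {
    node.kind == "sym_lit" || node.kind == "kwd_lit"
}

/// Aborting variant: `?` returns `None` from the whole function
/// as soon as one child slot is empty.
fn second_symbol_aborting(list: &MockNode) -> Option<&MockNode> {
    let mut count = 0;
    for i in 0..list.child_count() {
        let child = list.child(i)?;
        if is_symbol(child) {
            count += 1;
            if count == 2 {
                return Some(child);
            }
        }
    }
    None
}

/// Skipping variant: an empty slot is ignored and the scan continues,
/// analogous to `if (!child) continue;` on the JS side.
fn second_symbol_skipping(list: &MockNode) -> Option<&MockNode> {
    let mut count = 0;
    for i in 0..list.child_count() {
        let child = match list.child(i) {
            Some(c) => c,
            None => continue,
        };
        if is_symbol(child) {
            count += 1;
            if count == 2 {
                return Some(child);
            }
        }
    }
    None
}

/// Models `(defn foo ...)` with an artificial empty slot between the symbols.
fn sample_defn() -> MockNode {
    MockNode {
        kind: "list_lit",
        text: "",
        children: vec![
            Some(MockNode::leaf("sym_lit", "defn")),
            None,
            Some(MockNode::leaf("sym_lit", "foo")),
        ],
    }
}
```

With the artificial empty slot, the aborting variant finds nothing while the skipping variant still returns `foo`.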
    fn extract_clojure_params(defn_node: &Node, source: &[u8]) -> Vec<Definition> {
        let mut params = Vec::new();
        // First `vec_lit` child is the parameter vector `[x y z]`.
        for i in 0..defn_node.child_count() {
            let child = match defn_node.child(i) {
                Some(c) if c.kind() == "vec_lit" => c,
                _ => continue,
            };
            for j in 0..child.child_count() {
                if let Some(param) = child.child(j) {
                    if param.kind() == "sym_lit" {
                        params.push(child_def(
                            node_text(&param, source).to_string(),
                            "parameter",
                            start_line(&param),
                        ));
                    }
                }
            }
            break; // Only the first vector is the params
        }
        params
defmethod with vector dispatch value misses params
extract_clojure_params breaks on the first vec_lit. For (defmethod foo [:a :b] [x] body) the dispatch vector [:a :b] is the first vec_lit, so the actual parameter vector [x] is never visited. Since :a/:b are kwd_lit, not sym_lit, the extracted children list ends up empty rather than wrong — but x is silently dropped. The JS extractor has the identical break behavior, so this matches parity, but it is a gap worth documenting explicitly given the comment already calls out visibility.
Addressed in d9e3c9b — added an explicit comment in extract_clojure_params documenting the defmethod dispatch-vector gap (dispatch vec_lit consumes the break slot; param vector silently skipped; children ends up empty rather than wrong). Behavior unchanged to preserve parity with the JS extractor.
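To make the gap concrete, here is a small sketch of the "first `vec_lit` wins" rule applied to a `defn` and to a `defmethod` with a vector dispatch value. The `Form` type, the helper constructors, and the sample forms are hypothetical and exist only for illustration; the node kind names are the ones quoted in the review.

```rust
// Hypothetical, simplified syntax tree; not the tree-sitter API.
struct Form {
    kind: &'static str,
    text: &'static str,
    children: Vec<Form>,
}

fn leaf(kind: &'static str, text: &'static str) -> Form {
    Form { kind, text, children: Vec::new() }
}

fn branch(kind: &'static str, children: Vec<Form>) -> Form {
    Form { kind, text: "", children }
}

/// Collects `sym_lit` names from the first `vec_lit` child only,
/// then stops, mirroring the `break` discussed above.
fn first_vector_params(defn: &Form) -> Vec<&'static str> {
    let mut params = Vec::new();
    for child in &defn.children {
        if child.kind != "vec_lit" {
            continue;
        }
        for p in &child.children {
            if p.kind == "sym_lit" {
                params.push(p.text);
            }
        }
        break; // later vectors are never visited
    }
    params
}

/// (defn add [x y] ...)
fn defn_form() -> Form {
    branch("list_lit", vec![
        leaf("sym_lit", "defn"),
        leaf("sym_lit", "add"),
        branch("vec_lit", vec![leaf("sym_lit", "x"), leaf("sym_lit", "y")]),
    ])
}

/// (defmethod area [:a :b] [x] ...)
fn defmethod_form() -> Form {
    branch("list_lit", vec![
        leaf("sym_lit", "defmethod"),
        leaf("sym_lit", "area"),
        branch("vec_lit", vec![leaf("kwd_lit", ":a"), leaf("kwd_lit", ":b")]),
        branch("vec_lit", vec![leaf("sym_lit", "x")]),
    ])
}
```

For the `defn` form the parameters come out as expected; for the `defmethod` form the dispatch vector consumes the single slot and the result is empty, which is the "empty rather than wrong" outcome described in the thread.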
    }

    fn handle_import_form(node: &Node, source: &[u8], symbols: &mut FileSymbols, keyword: &str) {
        let name_node = match find_second_symbol(node) {
            Some(n) => n,
            None => return,
        };
        symbols.imports.push(Import::new(
            node_text(&name_node, source).to_string(),
            vec![keyword.to_string()],
            start_line(node),
        ));
    }
Top-level require/use/import effectively dead for standard Clojure syntax
handle_import_form delegates to find_second_symbol, which only matches sym_lit or kwd_lit. In real Clojure code the argument is almost always a quoted form ('some.ns → quoting_lit) or a vector ('[some.ns :as s]). Both cause find_second_symbol to return None and the import is silently dropped. The JS handleImportForm has the same limitation, so parity is maintained. A note distinguishing this path from the correctly-handled (:require …) inside ns forms would help future maintainers understand why top-level require coverage is incomplete.
Addressed in d9e3c9b — added a doc-comment on handle_import_form explaining that top-level (require 'some.ns) / (require '[some.ns :as s]) shapes return None from find_second_symbol and are silently dropped, while the real-world path (ns ... (:require ...)) is handled correctly by extract_ns_requires. Parity with JS preserved; future maintainers now have the breadcrumb.
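A minimal sketch of why the quoted shapes fall through, assuming the second-symbol lookup only accepts `sym_lit` and `kwd_lit` children as described in the review. `Child` and `import_source` are hypothetical names, and the bare-symbol form is included only to show a shape that would match.

```rust
// Hypothetical flattened view of a list form's direct children.
struct Child {
    kind: &'static str,
    text: &'static str,
}

/// Returns the second child whose kind is `sym_lit` or `kwd_lit`, if any.
fn second_symbol(children: &[Child]) -> Option<&Child> {
    children
        .iter()
        .filter(|c| c.kind == "sym_lit" || c.kind == "kwd_lit")
        .nth(1)
}

/// Yields the import source only when a second symbol exists;
/// otherwise the form is dropped without any diagnostic.
fn import_source(children: &[Child]) -> Option<String> {
    second_symbol(children).map(|c| c.text.to_string())
}
```

For `(require 'some.ns)` and `(require '[some.ns :as s])`, the only direct children are the `require` symbol and a `quoting_lit`, so no second symbol is found and nothing is recorded.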
Codegraph Impact Analysis
32 functions changed → 22 callers affected across 2 files
…rity gaps

Address Greptile review feedback on PR #1097:

- `find_first_symbol`/`find_second_symbol`: replace `?` with `match` so a `None` from `list_node.child(i)` skips and continues instead of aborting the whole scan, matching the JS extractor's `if (!child) continue;` semantics. Practically inert today (tree-sitter always returns `Some` for `i < child_count()`), but makes the invariant explicit and removes a fragile divergence.
- `extract_clojure_params`: document the known `defmethod` dispatch-vector gap inherited from the JS extractor.
- `handle_import_form`: document why top-level `(require 'some.ns)` and `(require '[some.ns :as s])` are silently dropped and point maintainers at the working `(ns ...)` path in `extract_ns_requires`.
Summary
- Adds the `tree-sitter-clojure-orchard` dependency and a native Clojure extractor in `crates/codegraph-core/src/extractors/clojure.rs`.
- Registers `.clj`/`.cljs`/`.cljc` with `LanguageKind::Clojure` and the Rust `file_collector`, removes Clojure from the WASM-only drop list, and wires `CLOJURE_AST_CONFIG` (string/regex literals) on both the native and JS sides.
- Mirrors `extractClojureSymbols` in `src/extractors/clojure.ts`: namespace threading via `currentNs`, `defn`/`defn-`/`defmacro`/`defmulti`/`defmethod` as `function`, `defprotocol` as `interface`, `defrecord` as `record`, `deftype` as `type`, `def`/`defonce` as `variable`, plus `(:require ...)`/`(:import ...)`/`(:use ...)` imports.

Part of #1071
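The form-to-kind mapping listed in the summary can be sketched as a single `match`. The function name `definition_kind` is hypothetical and the actual extractor may structure this dispatch differently; only the mapping itself comes from the summary.

```rust
/// Maps the head symbol of a top-level form to the definition kind
/// named in the summary; returns `None` for forms that define nothing.
fn definition_kind(head: &str) -> Option<&'static str> {
    match head {
        "defn" | "defn-" | "defmacro" | "defmulti" | "defmethod" => Some("function"),
        "defprotocol" => Some("interface"),
        "defrecord" => Some("record"),
        "deftype" => Some("type"),
        "def" | "defonce" => Some("variable"),
        _ => None,
    }
}
```

Keeping the mapping in one place like this makes it easy to compare against the JS extractor when checking parity.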
Test plan
- `cargo build --release -p codegraph-core` (clean build)
- `cargo test -p codegraph-core --lib`: 193/193 (9 new clojure extractor tests)
- `napi build --release` regenerates `.node` artifact
- `npx vitest run tests/parsers/clojure.test.ts`: 5/5
- `npx vitest run tests/parsers/native-drop-classification.test.ts`: 13/13
- `npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts -t clojure`: 5/5