fix(native): bump tree-sitter to 0.25 + ABI regression test (#1054)#1060
Conversation
The HCL extractor in codegraph-core was silently dropping every .tf file
at runtime because `tree-sitter-hcl` 1.1.0 ships a parser built for
tree-sitter ABI 15 while the host crate was pinned at `tree-sitter = "0.24"`
(ABI 14). `Parser::set_language` rejects the grammar with
`LanguageError { version: 15 }`, `parse_file` returns `None`, and the JS
engine-parity layer reports the file as a "drop" — triggering an expensive
WASM backfill on every build. See #1054.
Three changes:
- `Cargo.toml`: bump `tree-sitter = "0.25"`. Resolves to 0.25.10. ABI 15
runtime supports both ABI 14 and 15 grammars, so the older grammar
crates (most are on `0.23`/`0.24`) continue to load without their own
bumps.
- `Cargo.lock`: regenerated. Brings in `tree-sitter 0.25.10`,
`indexmap 2.14`, `hashbrown 0.17`, `equivalent 1.0`. The
`codegraph-core` package row also re-syncs from `3.9.5` to `3.9.6`,
which had drifted independently.
- `parser_registry.rs`: adds `LanguageKind::all()` plus a `#[cfg(test)]`
module with two regression tests:
- `all_grammars_have_compatible_abi` iterates `all()` and asserts
`Parser::set_language` succeeds for every grammar. This is the test
that catches the next ABI mismatch the moment it lands in the
lockfile, instead of after we ship a broken release.
- `all_kinds_listed_in_all` is an exhaustive match over `LanguageKind`
that fails to compile if a new variant is added without being
listed in `all()` — preventing silent ABI-coverage gaps when
languages are added.
- `ci.yml`: adds `cargo test --release` to the `native-host-build` job so
these tests actually run in CI. Without this, the regression test would
only run when someone manually invoked cargo, defeating its purpose.
Verified locally: 183 cargo tests pass, including both new ones.
Without this PR, `all_grammars_have_compatible_abi` fails on `Hcl` with
`LanguageError { version: 15 }`.
Refs #1054
|
Claude finished @carlos-alm's task —— View job Reviewing PR: tree-sitter ABI bump fixReview Checklist
Root Cause AnalysisProblem Being Solved: HCL .tf files were being silently dropped due to tree-sitter ABI version mismatch. Why This Approach: This is the correct root cause fix. Bumping to tree-sitter 0.25.10 provides ABI 15 support while maintaining backward compatibility with ABI 14 grammars. The regression tests ensure this drift is caught at PR-time rather than silently in production. Risk Assessment: ✅ Low risk - this is a dependency version bump with comprehensive test coverage. Tree-sitter 0.25 maintains backward compatibility, and the new tests prevent future regressions. Configuration Changes Review
Test Integrity ReviewNew Regression Tests ✅ EXCELLENT
Test Implementation Quality ✅
Code Quality ReviewChanges Are Minimal and Focused ✅
Implementation Quality ✅
Technical Correctness ✅
Backlog Compliance
Critical ConcernsNone. This is an exemplary fix:
Final AssessmentThis PR demonstrates excellent engineering practice - it identifies the real problem, implements the minimal correct fix, and adds robust safeguards to prevent regression. The ABI compatibility tests are particularly well-designed and will prevent this class of issue in the future. |
Greptile SummaryBumps Confidence Score: 5/5This PR is safe to merge — it fixes a confirmed runtime regression with a targeted version bump and adds meaningful regression tests. No P0 or P1 findings. The Cargo.toml bump is minimal and correct, Cargo.lock changes are consistent, the regression tests are well-designed (compile-time exhaustiveness + runtime ABI probe), and the CI integration is straightforward. The redundant clang step is correctly documented as intentionally deferred to a follow-up revert. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[cargo test --release] --> B[all_grammars_have_compatible_abi]
A --> C[all_kinds_listed_in_all]
B --> D{Parser::set_language\nfor each LanguageKind::all}
D -->|ABI match| E[pass]
D -->|LanguageError| F[collect failure\nfail test]
C --> G[Exhaustive compile-time match\nover all LanguageKind variants]
G -->|missing variant| H[compile error]
G -->|all present| I[assert_eq all len == EXPECTED_LEN]
I -->|mismatch| J[runtime failure]
I -->|match| K[pass]
subgraph ABI Fix
L[tree-sitter 0.24\nABI 14 only] -->|rejects HCL grammar| M[parse_file returns None\nsilently drops .tf files]
N[tree-sitter 0.25\nABI 14 + 15] -->|accepts HCL grammar| O[.tf files parsed correctly]
end
Reviews (2): Last reviewed commit: "ci: clarify clang step is redundant afte..." | Re-trigger Greptile |
| assert_eq!(LanguageKind::all().len(), 24); | ||
| } | ||
| } |
There was a problem hiding this comment.
The
assert_eq!(LanguageKind::all().len(), 24) check doesn't fully close the gap it aims to close. If a developer adds a new variant to the enum (forced by the compile-time match) but forgets to add it to all() and also forgets to bump this count, both the compile check and the runtime assertion pass silently — the new language gets zero ABI coverage. The assertion only catches someone who updates all() but forgets to bump the number, which is the less dangerous case. Naming the constant and improving the assertion message would make the three required update sites more explicit.
| assert_eq!(LanguageKind::all().len(), 24); | |
| } | |
| } | |
| // IMPORTANT: this constant must equal the number of arms in the match | |
| // above AND the length of the slice returned by `LanguageKind::all()`. | |
| // Because both checks require the same manual update, they reinforce | |
| // each other: a developer who updates the match is reminded to also | |
| // update `all()` and this count. | |
| const EXPECTED_LEN: usize = 24; | |
| assert_eq!( | |
| LanguageKind::all().len(), | |
| EXPECTED_LEN, | |
| "A LanguageKind variant is in the exhaustive match but missing from \ | |
| `all()` (or vice-versa). Update `all()` and bump EXPECTED_LEN.", | |
| ); | |
| } | |
| } |
There was a problem hiding this comment.
Fixed in 122da28 — applied the suggested EXPECTED_LEN constant rename and the clearer assertion message verbatim. Both regression tests still pass locally.
Codegraph Impact Analysis3 functions changed → 0 callers affected across 0 files
|
…1060) Addresses Greptile feedback: rename the bare `24` literal in the `all_kinds_listed_in_all` test to a named `EXPECTED_LEN` constant with a comment that points at all three update sites (the exhaustive match, `LanguageKind::all()`, and the constant itself), plus a clearer assertion message that names the failure mode.
Addresses Greptile feedback: the comment on the clang step claimed clang produced a working binary, but the real root cause of the .tf drop was a tree-sitter ABI mismatch — gcc and clang both fail at the same ABI check. With the tree-sitter 0.25 bump in this PR the clang override is redundant. Update the comment to reflect that, and note that the step itself is removed by a follow-up revert PR to keep this PR scoped to the ABI fix.
|
@greptileai addressed both review findings:
|
|
Both fixes look good — the commits match exactly what was described.
Both P2 findings are resolved. The PR is ready from my side. Tip: You can customize Greptile's behavior for this repo with |
Summary
tree-sitter-hcl1.1.0 ships parser code built for tree-sitter ABI 15; `crates/codegraph-core` was pinned at `tree-sitter = "0.24"` (ABI 14). At runtime, `Parser::set_language` rejects the grammar with `LanguageError { version: 15 }`, `parse_file` returns `None`, and every `.tf` file gets dropped — same behavior in both gcc and clang builds.Why this catches the regression
`all_grammars_have_compatible_abi` iterates `LanguageKind::all()`, calls `Parser::set_language` on every variant, and fails the test if any one returns `LanguageError`. Without the Cargo.toml bump in this PR, this test fails immediately with `Hcl: LanguageError { version: 15 }`.
`all_kinds_listed_in_all` is a no-op exhaustive match. If someone adds a new `LanguageKind` variant but forgets to add it to `all()`, this match fails to compile — preventing silent ABI-coverage gaps for new languages.
Test plan
Companion PRs
What this does not solve
The pre-publish gate also showed a separate ~2-second-per-call CI overhead on the native engine that isn't explained by the .tf drops (my Docker reproduces the drops with no equivalent slowdown). Tracking that separately — instrumentation patch coming.
Refs #1054