Commit 600fe90

docs: bump language count to 158 (add QML, CFML) and grammar count to 157
New languages added this round: Qt QML (.qml), CFML/ColdFusion (.cfc script + .cfm tag). Update README, npm README, and chocolatey description to the accurate distinct-language count (158) and vendored-grammar count (157).
1 parent a8fc901 commit 600fe90
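
The commit message describes keeping one count in sync across three files. A check along these lines could catch a stale figure before it ships. This is a minimal sketch: `language_counts` and `stale_counts` are hypothetical helpers, not part of the repository, and the sample strings are shortened excerpts from the diff below.

```python
import re

# Hypothetical helper, not part of the repository: matches figures such as
# "158 languages" wherever they appear in documentation text.
LANGUAGE_COUNT = re.compile(r"\b(\d+) languages\b")


def language_counts(text: str) -> list[int]:
    """Return each language count mentioned in `text`, in order of appearance."""
    return [int(n) for n in LANGUAGE_COUNT.findall(text)]


def stale_counts(docs: dict[str, str], expected: int) -> dict[str, list[int]]:
    """Map each document name to the counts in it that differ from `expected`."""
    stale = {}
    for name, text in docs.items():
        wrong = [n for n in language_counts(text) if n != expected]
        if wrong:
            stale[name] = wrong
    return stale


if __name__ == "__main__":
    docs = {
        "README.md": "AST analysis across all 158 languages. **158 languages**",
        "pkg/npm/README.md": "AST analysis across 155 languages",
    }
    # Reports only the file that still carries the old figure.
    print(stale_counts(docs, expected=158))
```

The grammar count (157) differs from the language count (158) by design, so a real check would track it with a separate pattern and expected value.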

3 files changed

Lines changed: 9 additions & 9 deletions

README.md

Lines changed: 6 additions & 6 deletions
@@ -16,7 +16,7 @@

 **The fastest and most efficient code intelligence engine for AI coding agents.** Full-indexes an average repository in milliseconds, the Linux kernel (28M LOC, 75K files) in 3 minutes. Answers structural queries in under 1ms. Ships as a single static binary for macOS, Linux, and Windows — download, run `install`, done.

-High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 155 languages, enhanced with [**Hybrid LSP** semantic type resolution](#hybrid-lsp) for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, and C++ — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 11 coding agents.
+High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across all 158 languages, enhanced with [**Hybrid LSP** semantic type resolution](#hybrid-lsp) for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, and C++ — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 11 coding agents.

 > **Research** — The design and benchmarks behind this project are described in the preprint [*Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP*](https://arxiv.org/abs/2603.27277) (arXiv:2603.27277). Evaluated across 31 real-world repositories: 83% answer quality, 10× fewer tokens, 2.1× fewer tool calls vs. file-by-file exploration.

@@ -32,7 +32,7 @@ High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-si

 - **Extreme indexing speed** — Linux kernel (28M LOC, 75K files) in 3 minutes. RAM-first pipeline: LZ4 compression, in-memory SQLite, fused Aho-Corasick pattern matching. Memory released after indexing.
 - **Plug and play** — single static binary for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64). No Docker, no runtime dependencies, no API keys. Download → `install` → restart agent → done.
-- **155 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
+- **158 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
 - **120x fewer tokens** — 5 structural queries: ~3,400 tokens vs ~412,000 via file-by-file search. One graph query replaces dozens of grep/read cycles.
 - **11 agents, one command**`install` auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro — configures MCP entries, instruction files, and pre-tool hooks for each.
 - **Built-in graph visualization** — 3D interactive UI at `localhost:9749` (optional UI binary variant).
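
The "120x fewer tokens" bullet in this hunk follows from the two approximate token totals it quotes. A short worked check of that arithmetic, with `reduction_factor` as a hypothetical helper name:

```python
def reduction_factor(baseline_tokens: int, graph_tokens: int) -> float:
    """How many times more tokens the baseline uses than the graph queries."""
    return baseline_tokens / graph_tokens


if __name__ == "__main__":
    # Approximate totals quoted in the README bullet for 5 structural queries.
    file_by_file = 412_000
    graph_queries = 3_400
    print(round(reduction_factor(file_by_file, graph_queries)))  # prints 121
```

Both inputs are rounded figures, so the ratio of about 121 is consistent with the README's "120x".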
@@ -168,7 +168,7 @@ Removes all agent configs, skills, hooks, and instructions. Does not remove the
 - `SEMANTICALLY_RELATED` (vocabulary-mismatch, same-language, score ≥ 0.80)

 ### Indexing pipeline
-- **155 vendored tree-sitter grammars** compiled into the binary
+- **157 vendored tree-sitter grammars** compiled into the binary
 - **Generic package / module resolution** — bare specifiers like `@myorg/pkg`, `github.com/foo/bar`, `use my_crate::foo` resolved via manifest scanning (`package.json`, `go.mod`, `Cargo.toml`, `pyproject.toml`, `composer.json`, `pubspec.yaml`, `pom.xml`, `build.gradle`, `mix.exs`, `*.gemspec`)
 - **Infrastructure-as-code indexing** — Dockerfiles, Kubernetes manifests, Kustomize overlays as graph nodes
 - **[Hybrid LSP semantic type resolution](#hybrid-lsp)** for Python, TypeScript / JavaScript / JSX / TSX, PHP, C#, Go, C, and C++ — a clean-room re-implementation of the type-resolution algorithms used by tsserver / typescript-go, pyright, gopls, intelephense, and Roslyn (parameter binding, return-type inference, generic substitution, JSX component dispatch, JSDoc inference for plain JS files, namespace + trait + late-static-binding resolution for PHP, file-scoped namespaces + records + LINQ method syntax for C#)
@@ -496,14 +496,14 @@ codebase-memory-mcp ships a **clean-room re-implementation of the type-resolutio

 **Two-layer architecture:**

-1. **Tree-sitter pass** — fast, syntactic, runs for every one of the 155 languages. Extracts definitions, calls, imports.
+1. **Tree-sitter pass** — fast, syntactic, runs for every one of the 158 languages. Extracts definitions, calls, imports.
 2. **Hybrid LSP pass** — type-aware, runs above the tree-sitter pass per-language. Refines call edges using the import graph plus a per-file or pre-built cross-file definition registry. Languages without a Hybrid LSP pass yet fall back to textual resolution, so you always get *some* answer.

 The result is a knowledge graph accurate enough to drive `trace_call_path` across packages, inheritance hierarchies, and stdlib calls — without paying for a language server process per project.

 ## Language Support

-155 languages, all parsed via vendored tree-sitter grammars compiled into the binary. Benchmarked against 64 real open-source repositories (78 to 49K nodes):
+158 languages, all parsed via vendored tree-sitter grammars compiled into the binary. Benchmarked against 64 real open-source repositories (78 to 49K nodes):

 | Tier | Score | Languages |
 |------|-------|-----------|
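
The two-layer architecture described in this hunk can be read as a fallback chain: a type-aware pass refines a call edge where one exists, and every other language keeps the textual result. The sketch below is a deliberately simplified illustration. `CallEdge`, `resolve_call`, the `registry` dict, and the short language names are all assumptions, not taken from the codebase-memory-mcp source, and the real resolution also uses the import graph.

```python
from dataclasses import dataclass

# Languages the README lists as having a Hybrid LSP pass.
# The short identifiers used here are illustrative only.
HYBRID_LSP_LANGUAGES = {
    "python", "typescript", "javascript", "jsx", "tsx",
    "php", "csharp", "go", "c", "cpp",
}


@dataclass
class CallEdge:
    caller: str
    callee: str       # qualified name if resolved, else the name as written
    resolved_by: str  # "hybrid-lsp" or "textual"


def resolve_call(language: str, caller: str, callee_name: str,
                 registry: dict[str, str]) -> CallEdge:
    """Refine one call found by the syntactic pass.

    `registry` is a hypothetical map from a bare name to its qualified
    definition, standing in for the cross-file definition registry.
    """
    if language in HYBRID_LSP_LANGUAGES and callee_name in registry:
        return CallEdge(caller, registry[callee_name], "hybrid-lsp")
    # No type-aware pass (or no match): keep the textual name.
    return CallEdge(caller, callee_name, "textual")


if __name__ == "__main__":
    registry = {"helper": "pkg.util.helper"}
    print(resolve_call("python", "main", "helper", registry))
    print(resolve_call("qml", "main", "helper", registry))
```

Under this reading, the newly added QML and CFML grammars would take the textual branch, since the README does not list them among the Hybrid LSP languages.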
@@ -528,7 +528,7 @@ src/
 traces/ Runtime trace ingestion
 ui/ Embedded HTTP server + 3D graph visualization
 foundation/ Platform abstractions (threads, filesystem, logging, memory)
-internal/cbm/ Vendored tree-sitter grammars (155 languages) + AST extraction engine
+internal/cbm/ Vendored tree-sitter grammars (158 languages) + AST extraction engine
 ```

 ## Security

pkg/chocolatey/codebase-memory-mcp.nuspec

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ codebase-memory-mcp is a single static binary MCP server that indexes codebases
 extreme speed and exposes 14 MCP tools for AI coding agents.

 Full indexes an average repository in milliseconds, the Linux kernel (28M LOC) in
-3 minutes. Answers structural queries in under 1ms. Supports 155 languages via
+3 minutes. Answers structural queries in under 1ms. Supports 158 languages via
 vendored tree-sitter grammars. Zero dependencies. Plug and play across 11 coding
 agents including Claude Code, Codex CLI, Gemini CLI, and Zed.

pkg/npm/README.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@

 **The fastest and most efficient code intelligence engine for AI coding agents.** Full-indexes an average repository in milliseconds, the Linux kernel (28M LOC, 75K files) in 3 minutes. Answers structural queries in under 1ms. Ships as a single static binary — this package downloads and runs it automatically.

-High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across 155 languages — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 11 coding agents.
+High-quality parsing through [tree-sitter](https://tree-sitter.github.io/tree-sitter/) AST analysis across 158 languages — producing a persistent knowledge graph of functions, classes, call chains, HTTP routes, and cross-service links. 14 MCP tools. Zero dependencies. Plug and play across 11 coding agents.

 ## Installation

@@ -27,7 +27,7 @@ Restart your agent. Say **"Index this project"** — done.

 - **Extreme indexing speed** — Linux kernel (28M LOC, 75K files) in 3 minutes. RAM-first pipeline with LZ4 compression and in-memory SQLite.
 - **Plug and play** — single static binary for macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64). No Docker, no runtime dependencies, no API keys.
-- **155 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
+- **158 languages** — vendored tree-sitter grammars compiled into the binary. Nothing to install, nothing that breaks.
 - **120x fewer tokens** — 5 structural queries: ~3,400 tokens vs ~412,000 via file-by-file search.
 - **11 agents, one command**`install` auto-detects Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro.
 - **14 MCP tools** — search, trace, architecture, impact analysis, Cypher queries, dead code detection, cross-service HTTP linking, ADR management, and more.
