Benchmarks

Code quality metrics for Acolyte and other open-source AI agents, derived from static source analysis — no subjective scoring.

For feature and architecture comparisons, see Comparison.

All metrics are extracted with scripts/benchmark.ts.

Methodology

Source lines = total lines of source code (including blanks and comments)
Test files, generated code, and files over 10k lines are excluded
Metrics normalized per 1k source lines where applicable
Dependencies shown as runtime + development dependencies
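
The counting and normalization rules above can be sketched in a few lines. This is a minimal illustration, not the contents of scripts/benchmark.ts: the helper names `countSourceLines`, `countMatches`, and `perThousandLines` are hypothetical, and rounding to one decimal place is an assumption based on how the tables below are printed.

```typescript
// Total lines in a source string, including blanks and comments.
function countSourceLines(source: string): number {
  if (source.length === 0) return 0;
  // Ignore one trailing newline so "a\nb\n" counts as 2 lines, not 3.
  const body = source.endsWith("\n") ? source.slice(0, -1) : source;
  return body.split("\n").length;
}

// Number of non-overlapping matches of a pattern in the source (hypothetical helper).
function countMatches(source: string, pattern: RegExp): number {
  const global = pattern.global
    ? pattern
    : new RegExp(pattern.source, pattern.flags + "g");
  return (source.match(global) || []).length;
}

// Normalize a raw count to occurrences per 1,000 source lines, one decimal place.
function perThousandLines(count: number, sourceLines: number): number {
  if (sourceLines === 0) return 0;
  return Math.round((count / sourceLines) * 10000) / 10;
}
```

For example, 3 matches in a 2,000-line codebase normalize to `perThousandLines(3, 2000)`, which is 1.5 per 1k source lines.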

Closed systems

Several widely used coding agents are closed-source and cannot be analyzed with the same methodology.

	Acolyte	Claude Code	Cursor	Copilot
Open-source	✓	✗	✗	✗
Self-hostable	✓	✗	✗	✗
Observable execution	✓	✗	✗	✗

Claude Code, Cursor, and Copilot are included for context but excluded from code analysis benchmarks.

Projects compared

Project	Language	Description	Source lines	Files	Dependencies
Acolyte	TypeScript	Terminal coding agent with lifecycle, effects, and AST code tools	27,901	234	13 + 6
OpenCode	TypeScript	Open-source AI coding agent (TUI/web/desktop)	240,418	1,166	191 + 84
Codex	Rust	Terminal AI coding agent from OpenAI	462,656	1,139	245 + 58
Crush	Go	Terminal AI coding agent from Charm with Bubble Tea TUI	60,863	268	72 + 0
Aider	Python	AI pair programming in your terminal	25,943	105	35 + 17
Goose	Rust	Extensible AI agent from Block with MCP integration	133,379	343	150 + 19
Qwen Code	TypeScript	Terminal AI coding agent from Alibaba	233,638	1,076	91 + 85
Plandex	Go	AI coding agent for large multi-file tasks in the terminal	74,573	333	54 + 0
Mistral Vibe	Python	Terminal AI coding agent from Mistral	36,450	250	36 + 14

Dependency surface area

Measures how much of a codebase depends on external packages.

Metric	Acolyte	OpenCode	Qwen Code
External imports / 1k LOC	7.0	16.9	7.9
Runtime dependencies	13	191	91

TypeScript projects only.

Acolyte has the lowest external import density and fewest runtime dependencies among TypeScript projects.
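
One way to count external imports is to extract module specifiers and discard relative ones. The sketch below is an assumption about the approach, not the benchmark script itself: it uses a regex rather than a parser, and treating `node:` built-ins as non-external is a choice made here for illustration.

```typescript
// Matches the module specifier in `import ... from "x"`, `export ... from "x"`,
// and side-effect imports such as `import "x"`. A rough stand-in for a real parser.
const IMPORT_SPECIFIER = /\b(?:from|import)\s+["']([^"']+)["']/g;

// Assumption: relative paths, absolute paths, and node: built-ins are not external.
function isExternal(specifier: string): boolean {
  return !(
    specifier.startsWith(".") ||
    specifier.startsWith("/") ||
    specifier.startsWith("node:")
  );
}

function countExternalImports(source: string): number {
  let count = 0;
  let match: RegExpExecArray | null;
  IMPORT_SPECIFIER.lastIndex = 0; // reset state on the shared global regex
  while ((match = IMPORT_SPECIFIER.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier !== undefined && isExternal(specifier)) count++;
  }
  return count;
}
```

Dividing the result by the source line count and multiplying by 1,000 gives the density reported in the table.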

Input validation coverage

Measures how frequently data entering the system is validated.

Metric	Acolyte	OpenCode	Qwen Code
Schema validations / 1k LOC	2.6	0.8	0.6
`.safeParse()` calls / 1k LOC	1.1	0.1	0.0

TypeScript projects only.

Acolyte validates at a higher rate than the other TypeScript projects measured: roughly three times OpenCode's schema validation density and four times Qwen Code's.

TypeScript type safety

Per 1k source lines.

Metric	Acolyte	OpenCode	Qwen Code
`as any`	0.1	1.7	0.1
`: any` annotations	0.0	0.9	0.3
`@ts-ignore` / `@ts-expect-error`	0.0	0.2	0.0
Lint ignores	0.2	0.0	0.3
`: unknown` usage	3.2	1.8	2.3

Acolyte and Qwen Code have near-zero any usage. Acolyte uses unknown with explicit narrowing — every tool output, model response, and RPC payload is validated through Zod schemas before entering the type system.
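
The difference between `any` and `unknown` with narrowing is easiest to see in code. The sketch below uses a hand-written type guard as a dependency-free stand-in for a Zod schema. The `ToolOutput` shape and both function names are hypothetical and are not taken from Acolyte's source.

```typescript
// Hypothetical payload shape, for illustration only.
interface ToolOutput {
  tool: string;
  exitCode: number;
}

// Type predicate: the value is treated as ToolOutput only after every field is checked.
function isToolOutput(value: unknown): value is ToolOutput {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["tool"] === "string" &&
    typeof record["exitCode"] === "number"
  );
}

// With `unknown`, the compiler rejects property access until the guard has run.
function describeOutput(payload: unknown): string {
  if (!isToolOutput(payload)) return "rejected";
  return `${payload.tool} exited with ${payload.exitCode}`;
}
```

Had `payload` been typed `any`, `payload.tool` would compile without any check and a malformed payload would surface later as a runtime error.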

Cross-language type safety

Per 1k source lines.

Metric	Aider	Mistral Vibe	Goose	Codex	Crush	Plandex
`type: ignore` (Python)	0.0	0.1	—	—	—	—
`Any` usage (Python)	0.1	9.3	—	—	—	—
`cast()` calls (Python)	0.0	1.0	—	—	—	—
`unsafe` (Rust)	—	—	0.1	1.0	—	—
`.unwrap()` (Rust)	—	—	11.5	3.2	—	—
`.expect()` (Rust)	—	—	1.4	11.2	—	—
`any` / `interface{}` (Go)	—	—	—	—	3.8	4.4
`panic()` (Go)	—	—	—	—	0.2	0.3
`nolint` (Go)	—	—	—	—	0.2	0.0

Aider shows minimal type escape hatches. Mistral Vibe has high Any density. Codex has lower .unwrap() than Goose but high .expect() — errors are surfaced but rely on panicking assertions.

Test quality

Metric	Acolyte	OpenCode	Codex	Crush	Aider	Goose	Qwen Code	Plandex	Mistral Vibe
Test files	190	266	270	68	42	22	532	6	221
Test lines	23,279	61,963	128,336	14,612	12,427	7,970	228,906	2,517	46,582
Ratio (test lines / source lines)	0.83	0.26	0.28	0.24	0.48	0.06	0.98	0.03	1.28

Acolyte maintains a high test ratio because lifecycle phases and tools are independent modules with clean interfaces.

Test types include:

unit (*.test.ts)
integration (*.int.test.ts)
TUI visual regression (*.tui.test.ts)
performance (*.perf.test.ts)
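
Because every suffix above also ends in `.test.ts`, a tool that groups test files by type has to check the more specific suffixes first. A minimal sketch, assuming a hypothetical `classifyTestFile` helper that is not part of the benchmark script:

```typescript
type TestKind = "unit" | "integration" | "tui" | "performance";

// Order matters: "*.int.test.ts" also ends in ".test.ts", so the plain
// unit-test suffix is checked last.
const SUFFIXES: Array<[string, TestKind]> = [
  [".int.test.ts", "integration"],
  [".tui.test.ts", "tui"],
  [".perf.test.ts", "performance"],
  [".test.ts", "unit"],
];

// Returns the test kind for a path, or null when the file is not a test.
function classifyTestFile(path: string): TestKind | null {
  for (const [suffix, kind] of SUFFIXES) {
    if (path.endsWith(suffix)) return kind;
  }
  return null;
}
```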

Module cohesion

Metric	Acolyte	OpenCode	Codex	Crush	Aider	Goose	Qwen Code	Plandex	Mistral Vibe
Avg lines / file	119	206	406	227	247	389	217	224	146
Files > 500 lines	2 (1%)	120 (10%)	242 (21%)	26 (10%)	14 (13%)	88 (26%)	114 (11%)	36 (11%)	8 (3%)
Largest file	692	5,341	9,842	3,611	2,486	2,741	2,369	2,455	2,617
Barrel / index files	1	54	50	2	5	45	53	0	43

Acolyte maintains the smallest average module size and fewest large files.
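
The first three rows of the table can be derived from a list of per-file line counts. The sketch below shows one way to do that; `moduleStats` and its field names are hypothetical, and rounding to whole numbers is an assumption based on how the table is printed.

```typescript
interface ModuleStats {
  averageLines: number; // rounded to the nearest whole line
  largeFiles: number; // files strictly above the threshold
  largeFileShare: number; // whole percent of all files
  largestFile: number;
}

function moduleStats(lineCounts: number[], threshold = 500): ModuleStats {
  if (lineCounts.length === 0) {
    return { averageLines: 0, largeFiles: 0, largeFileShare: 0, largestFile: 0 };
  }
  const total = lineCounts.reduce((sum, n) => sum + n, 0);
  const largeFiles = lineCounts.filter((n) => n > threshold).length;
  return {
    averageLines: Math.round(total / lineCounts.length),
    largeFiles,
    largeFileShare: Math.round((largeFiles / lineCounts.length) * 100),
    largestFile: Math.max(...lineCounts),
  };
}
```

The averages in the table agree with the totals under Projects compared: for Acolyte, 27,901 source lines across 234 files is about 119 lines per file.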

Error handling

Per 1k source lines.

Metric	Acolyte	OpenCode	Qwen Code
`.safeParse()` calls	1.1	0.1	0.0
`try { ... }` blocks	6.1	1.3	5.0
`.catch()` calls	0.5	2.3	0.4

TypeScript projects only.

Acolyte validates boundaries with Zod .safeParse() at a higher rate than the other TypeScript projects measured. RPC payloads, model responses, and configuration files are validated before entering the system.
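
Unlike a throwing parse, `.safeParse()` returns a result object that the caller must inspect, which is why it is counted alongside `try` and `.catch()` here. The sketch below mirrors that success/failure shape without depending on Zod. The `Config` type and `safeParseConfig` are hypothetical, and the error is simplified to a string where Zod returns a structured error object.

```typescript
// Same discriminated shape as a safeParse result: check `success`, then read
// `data` or `error`.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hypothetical configuration shape, for illustration only.
interface Config {
  model: string;
  maxSteps: number;
}

// Validates untrusted input at the boundary and never throws.
function safeParseConfig(input: unknown): ParseResult<Config> {
  if (typeof input !== "object" || input === null) {
    return { success: false, error: "expected an object" };
  }
  const record = input as Record<string, unknown>;
  const model = record["model"];
  const maxSteps = record["maxSteps"];
  if (typeof model !== "string") {
    return { success: false, error: "model must be a string" };
  }
  if (typeof maxSteps !== "number" || !Number.isInteger(maxSteps)) {
    return { success: false, error: "maxSteps must be an integer" };
  }
  return { success: true, data: { model, maxSteps } };
}
```

A caller branches on `result.success` before touching `result.data`, so invalid input is handled at the boundary instead of propagating into typed code.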

Key takeaways

Across the benchmarked projects, Acolyte demonstrates:

Extremely low any usage and strong TypeScript safety
The smallest modules and lowest large-file density
The lightest dependency footprint
A high test-to-source line ratio (0.83)
Clear lifecycle boundaries across independently testable modules

These characteristics reflect a deliberately small, strongly typed architecture — built so that lifecycle phases and tools behave predictably and can be independently verified.

Summary

Dimension	Acolyte	OpenCode	Codex	Crush	Aider	Goose	Qwen Code	Plandex	Mistral Vibe
Type safety	High	Medium	Medium	Medium	High	Panic-heavy	High	Medium	Any-heavy
Test density	High (0.83)	Low (0.26)	Low (0.28)	Low (0.24)	Medium (0.48)	Low (0.06)	High (0.98)	Lowest (0.03)	Highest (1.28)
Module size	Smallest (119)	Medium (206)	Largest (406)	Medium (227)	Medium (247)	Large (389)	Medium (217)	Medium (224)	Small (146)
Dependencies	Lightest (19)	Heavy (275)	Heavy (303)	Light (72)	Light (52)	Heavy (169)	Heavy (176)	Light (54)	Light (50)
First commit	Feb 2026	Apr 2025	Apr 2025	May 2025	May 2023	Aug 2024	Jun 2025	Oct 2023	Dec 2025

Acolyte leads on type safety, module size, and dependency count while being the second-smallest codebase in the benchmark by source lines, after Aider.

Updated 14 April 2026.
