Benchmarks

Code quality metrics for Acolyte and other open-source AI agents, derived from static source analysis — no subjective scoring.

For feature and architecture comparisons, see Comparison.

All metrics extracted with scripts/benchmark.ts.

Methodology

  • Source lines = total lines of source code (including blanks and comments)
  • Test files, generated code, and files over 10k lines are excluded
  • Metrics normalized per 1k source lines where applicable
  • Dependencies shown as runtime + development dependencies
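The counting rules above can be sketched roughly as follows. This is not the contents of scripts/benchmark.ts: the `SourceFile` shape, the test-file and generated-code patterns, and the function names are all assumptions made for illustration.

```typescript
// Hypothetical input shape: a path plus the file's full text.
interface SourceFile {
  path: string;
  text: string;
}

const MAX_LINES = 10000; // files over 10k lines are excluded

// Assumed heuristics for "test" and "generated" files; the real script may differ.
const isTestFile = (path: string): boolean => /\.test\.[cm]?[jt]sx?$/.test(path);
const isGenerated = (path: string): boolean =>
  /(^|\/)(generated|__generated__)\//.test(path);

// Counts every line, including blanks and comments.
function countLines(text: string): number {
  if (text.length === 0) return 0;
  const parts = text.split("\n");
  // A trailing newline terminates the last line rather than starting a new one.
  return text.charAt(text.length - 1) === "\n" ? parts.length - 1 : parts.length;
}

// Sums line counts over files that survive the exclusion rules.
function totalSourceLines(files: SourceFile[]): number {
  return files
    .filter((f) => !isTestFile(f.path) && !isGenerated(f.path))
    .map((f) => countLines(f.text))
    .filter((n) => n <= MAX_LINES)
    .reduce((sum, n) => sum + n, 0);
}

// Normalizes a raw count to "per 1k source lines", rounded to one decimal.
function perThousand(count: number, sourceLines: number): number {
  if (sourceLines === 0) return 0;
  return Math.round((count / sourceLines) * 1000 * 10) / 10;
}
```

Keeping the exclusion rules in small predicates makes it easy to see which files contribute to the denominator used by every per-1k metric.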

Closed systems

Several widely used coding agents are closed-source and cannot be analyzed with the same methodology.

Acolyte Claude Code Cursor Copilot
Open-source
Self-hostable
Observable execution

Claude Code, Cursor, and Copilot are included for context but excluded from code analysis benchmarks.

Projects compared

| Project | Language | Description | Source lines | Files | Dependencies |
| --- | --- | --- | --- | --- | --- |
| Acolyte | TypeScript | Terminal coding agent with lifecycle, effects, and AST code tools | 27,901 | 234 | 13 + 6 |
| OpenCode | TypeScript | Open-source AI coding agent (TUI/web/desktop) | 240,418 | 1,166 | 191 + 84 |
| Codex | Rust | Terminal AI coding agent from OpenAI | 462,656 | 1,139 | 245 + 58 |
| Crush | Go | Terminal AI coding agent from Charm with Bubble Tea TUI | 60,863 | 268 | 72 + 0 |
| Aider | Python | AI pair programming in your terminal | 25,943 | 105 | 35 + 17 |
| Goose | Rust | Extensible AI agent from Block with MCP integration | 133,379 | 343 | 150 + 19 |
| Qwen Code | TypeScript | Terminal AI coding agent from Alibaba | 233,638 | 1,076 | 91 + 85 |
| Plandex | Go | AI coding agent for large multi-file tasks in the terminal | 74,573 | 333 | 54 + 0 |
| Mistral Vibe | Python | Terminal AI coding agent from Mistral | 36,450 | 250 | 36 + 14 |

Dependency surface area

Measures how much of a codebase depends on external packages.

| Metric | Acolyte | OpenCode | Qwen Code |
| --- | --- | --- | --- |
| External imports / 1k LOC | 7.0 | 16.9 | 7.9 |
| Runtime dependencies | 13 | 191 | 91 |

TypeScript projects only.

Acolyte has the lowest external import density and fewest runtime dependencies among TypeScript projects.
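One way external import density could be approximated is with a regular expression over static import statements. This is a sketch under stated assumptions, not the method used by scripts/benchmark.ts: it treats any specifier that is not relative or absolute as external (so Node built-ins such as `node:fs` count too), it ignores re-exports and dynamic `import()`, and the function names are hypothetical.

```typescript
// Assumption: a specifier is "external" unless it starts with "." or "/".
function isExternal(specifier: string): boolean {
  return !/^[./]/.test(specifier);
}

// Counts static imports whose specifier looks external.
// A regex is only an approximation; an AST walk would be more robust.
function countExternalImports(source: string): number {
  // Matches `import x from "spec"`, multi-line `import {...} from "spec"`,
  // and side-effect `import "spec"` when `import` starts a line.
  const importRe = /^\s*import\s+(?:[^'"]*?\sfrom\s+)?['"]([^'"]+)['"]/gm;
  let count = 0;
  let match: RegExpExecArray | null;
  while ((match = importRe.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier !== undefined && isExternal(specifier)) count += 1;
  }
  return count;
}
```

Dividing the result by the source line count and multiplying by 1,000 gives a figure on the same scale as the table, though the exact numbers depend on how "external" is defined.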

Input validation coverage

Measures how frequently data entering the system is validated.

| Metric | Acolyte | OpenCode | Qwen Code |
| --- | --- | --- | --- |
| Schema validations / 1k LOC | 2.6 | 0.8 | 0.6 |
| .safeParse() calls / 1k | 1.1 | 0.1 | 0.0 |

TypeScript projects only.

Acolyte validates at a higher rate than the other two TypeScript projects compared here.

TypeScript type safety

Per 1k source lines.

| Metric | Acolyte | OpenCode | Qwen Code |
| --- | --- | --- | --- |
| as any | 0.1 | 1.7 | 0.1 |
| : any annotations | 0.0 | 0.9 | 0.3 |
| @ts-ignore / @ts-expect-error | 0.0 | 0.2 | 0.0 |
| Lint ignores | 0.2 | 0.0 | 0.3 |
| : unknown usage | 3.2 | 1.8 | 2.3 |

Acolyte and Qwen Code have near-zero any usage. Acolyte uses unknown with explicit narrowing — every tool output, model response, and RPC payload is validated through Zod schemas before entering the type system.
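To illustrate what "unknown with explicit narrowing" means in practice, here is a minimal sketch. The text above says Acolyte does this with Zod schemas; this example swaps in a hand-written type guard so it stays dependency-free, and the `ToolResult` shape and function names are hypothetical.

```typescript
// Hypothetical payload shape, used only for illustration.
interface ToolResult {
  tool: string;
  exitCode: number;
}

// Type guard: narrows `unknown` to `ToolResult` by checking each field.
function isToolResult(value: unknown): value is ToolResult {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["tool"] === "string" && typeof record["exitCode"] === "number";
}

function describeResult(raw: unknown): string {
  // Accessing `raw.tool` here would not compile: `unknown` must be narrowed first.
  if (!isToolResult(raw)) return "rejected";
  return `${raw.tool} exited with ${raw.exitCode}`;
}
```

Unlike `any`, an `unknown` value cannot be used until a check like this has run, which is why a high `: unknown` count alongside a low `any` count can indicate stricter boundaries.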

Cross-language type safety

Per 1k source lines.

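| Metric | Aider | Mistral Vibe | Goose | Codex | Crush | Plandex |
| --- | --- | --- | --- | --- | --- | --- |
| type: ignore (Python) | 0.0 | 0.1 | | | | |
| Any usage (Python) | 0.1 | 9.3 | | | | |
| cast() calls (Python) | 0.0 | 1.0 | | | | |
| unsafe (Rust) | | | 0.1 | 1.0 | | |
| .unwrap() (Rust) | | | 11.5 | 3.2 | | |
| .expect() (Rust) | | | 1.4 | 11.2 | | |
| any / interface{} (Go) | | | | | 3.8 | 4.4 |
| panic() (Go) | | | | | 0.2 | 0.3 |
| nolint (Go) | | | | | 0.2 | 0.0 |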

Aider shows minimal type escape hatches. Mistral Vibe has high Any density. Codex has lower .unwrap() than Goose but high .expect() — errors are surfaced but rely on panicking assertions.

Test quality

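| Metric | Acolyte | OpenCode | Codex | Crush | Aider | Goose | Qwen Code | Plandex | Mistral Vibe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Test files | 190 | 266 | 270 | 68 | 42 | 22 | 532 | 6 | 221 |
| Test lines | 23,279 | 61,963 | 128,336 | 14,612 | 12,427 | 7,970 | 228,906 | 2,517 | 46,582 |
| Ratio (test lines / source lines) | 0.83 | 0.26 | 0.28 | 0.24 | 0.48 | 0.06 | 0.98 | 0.03 | 1.28 |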

Acolyte maintains a high test ratio because lifecycle phases and tools are independent modules with clean interfaces.
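The ratio is not defined explicitly, but the published values are consistent with test lines divided by source lines, rounded to two decimals. A small sketch of that arithmetic, using figures from the tables above (the function name is hypothetical):

```typescript
// Test lines divided by source lines, rounded to two decimals.
function testRatio(testLines: number, sourceLines: number): number {
  return Math.round((testLines / sourceLines) * 100) / 100;
}

console.log(testRatio(23279, 27901)); // Acolyte: 0.83
console.log(testRatio(46582, 36450)); // Mistral Vibe: 1.28
console.log(testRatio(7970, 133379)); // Goose: 0.06
```

A ratio above 1.0, as for Mistral Vibe, means the project has more lines of test code than of source code.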

Test types include:

  • unit (*.test.ts)
  • integration (*.int.test.ts)
  • TUI visual regression (*.tui.test.ts)
  • performance (*.perf.test.ts)
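The suffix convention above lends itself to a simple classifier. This is an illustrative sketch, not code from the repository; the `TestKind` type and function name are assumptions.

```typescript
type TestKind = "unit" | "integration" | "tui" | "performance" | "none";

// Classifies a file path by the suffix conventions listed above.
function classifyTestFile(path: string): TestKind {
  // More specific suffixes are checked first, since all of them end in ".test.ts".
  if (/\.int\.test\.ts$/.test(path)) return "integration";
  if (/\.tui\.test\.ts$/.test(path)) return "tui";
  if (/\.perf\.test\.ts$/.test(path)) return "performance";
  if (/\.test\.ts$/.test(path)) return "unit";
  return "none";
}
```

The order of the checks matters: testing for `.test.ts` first would label every integration, TUI, and performance test as a unit test.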

Module cohesion

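| Metric | Acolyte | OpenCode | Codex | Crush | Aider | Goose | Qwen Code | Plandex | Mistral Vibe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Avg lines / file | 119 | 206 | 406 | 227 | 247 | 389 | 217 | 224 | 146 |
| Files > 500 lines | 2 (1%) | 120 (10%) | 242 (21%) | 26 (10%) | 14 (13%) | 88 (26%) | 114 (11%) | 36 (11%) | 8 (3%) |
| Largest file | 692 | 5,341 | 9,842 | 3,611 | 2,486 | 2,741 | 2,369 | 2,455 | 2,617 |
| Barrel / index files | 1 | 54 | 50 | 2 | 5 | 45 | 53 | 0 | 43 |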

Acolyte maintains the smallest average module size and fewest large files.
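The first three rows can be derived from a list of per-file line counts. The sketch below shows one plausible way to compute them; the `CohesionStats` shape and function name are hypothetical. The published averages are consistent with source lines divided by file count (for Acolyte, 27,901 / 234 ≈ 119).

```typescript
interface CohesionStats {
  avgLinesPerFile: number;
  largeFiles: number;
  largeFileShare: number; // whole-number percentage of all files
  largestFile: number;
}

// Summarizes per-file line counts; `threshold` marks what counts as a large file.
function cohesionStats(lineCounts: number[], threshold = 500): CohesionStats {
  if (lineCounts.length === 0) {
    return { avgLinesPerFile: 0, largeFiles: 0, largeFileShare: 0, largestFile: 0 };
  }
  const total = lineCounts.reduce((sum, n) => sum + n, 0);
  const largeFiles = lineCounts.filter((n) => n > threshold).length;
  return {
    avgLinesPerFile: Math.round(total / lineCounts.length),
    largeFiles,
    largeFileShare: Math.round((largeFiles / lineCounts.length) * 100),
    largestFile: Math.max(...lineCounts),
  };
}
```

For example, `cohesionStats([100, 200, 600, 300])` reports an average of 300 lines, one large file (25% of files), and a largest file of 600 lines.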

Error handling

Per 1k source lines.

| Metric | Acolyte | OpenCode | Qwen Code |
| --- | --- | --- | --- |
| .safeParse() calls | 1.1 | 0.1 | 0.0 |
| try { ... } blocks | 6.1 | 1.3 | 5.0 |
| .catch() calls | 0.5 | 2.3 | 0.4 |

TypeScript projects only.

Acolyte validates boundaries with Zod .safeParse() at a higher rate than the other two TypeScript projects. RPC payloads, model responses, and configuration files are validated before entering the system.
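The appeal of `.safeParse()` at a boundary is that it returns a result object instead of throwing. The sketch below imitates that pattern without depending on Zod: the `ParseResult` type mirrors the `success` / `data` / `error` shape of Zod's return value (with a plain string standing in for Zod's error object), while the `Config` shape and function names are hypothetical.

```typescript
// Result shape modeled on what Zod's `.safeParse()` returns.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Hypothetical config shape, for illustration only.
interface Config {
  model: string;
  maxSteps: number;
}

// Hand-written stand-in for a schema: reports problems instead of throwing.
function safeParseConfig(input: unknown): ParseResult<Config> {
  if (typeof input !== "object" || input === null) {
    return { success: false, error: "expected an object" };
  }
  const record = input as Record<string, unknown>;
  const model = record["model"];
  const maxSteps = record["maxSteps"];
  if (typeof model !== "string") {
    return { success: false, error: "model must be a string" };
  }
  if (typeof maxSteps !== "number" || !Number.isInteger(maxSteps)) {
    return { success: false, error: "maxSteps must be an integer" };
  }
  return { success: true, data: { model, maxSteps } };
}

// Boundary function: JSON syntax errors and shape errors share one return type.
function loadConfig(rawJson: string): ParseResult<Config> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    return { success: false, error: "invalid JSON" };
  }
  return safeParseConfig(parsed);
}
```

Callers branch on `result.success`, and the compiler only allows access to `result.data` on the success branch, so unvalidated values cannot leak past the boundary.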

Key takeaways

Across the benchmarked projects, Acolyte demonstrates:

  • Extremely low any usage and strong TypeScript safety
  • The smallest modules and lowest large-file density
  • The lightest dependency footprint
  • A high test-to-source line ratio (0.83)
  • Clear lifecycle boundaries across independently testable modules

These characteristics reflect a deliberately small, strongly typed architecture — built so that lifecycle phases and tools behave predictably and can be independently verified.

Summary

| Dimension | Acolyte | OpenCode | Codex | Crush | Aider | Goose | Qwen Code | Plandex | Mistral Vibe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type safety | High | Medium | Medium | Medium | High | Panic-heavy | High | Medium | Any-heavy |
| Test density | High (0.83) | Low (0.26) | Low (0.28) | Low (0.24) | Medium (0.48) | Low (0.06) | High (0.98) | Lowest (0.03) | Highest (1.28) |
| Module size | Smallest (119) | Medium (206) | Largest (406) | Medium (227) | Medium (247) | Large (389) | Medium (217) | Medium (224) | Small (146) |
| Dependencies | Lightest (19) | Heavy (275) | Heavy (303) | Light (72) | Light (52) | Heavy (169) | Heavy (176) | Light (54) | Light (50) |
| First commit | Feb 2026 | Apr 2025 | Apr 2025 | May 2025 | May 2023 | Aug 2024 | Jun 2025 | Oct 2023 | Dec 2025 |

Acolyte leads on type safety, module size, and dependency count while remaining one of the smallest codebases in the benchmark (only Aider has fewer source lines).

Updated 14 April 2026.