Summary
A standardized benchmark system that tests and compares different AI CLI tools on collaborative tasks.
Usage
squad bench --suite api-crud --agents claude,gemini,codex
Output
┌────────────┬──────────┬────────┬───────────┐
│ Agent      │ Time     │ Tests  │ Tokens    │
├────────────┼──────────┼────────┼───────────┤
│ Claude     │ 2m 13s   │ 8/8    │ 12,400    │
│ Gemini     │ 1m 47s   │ 7/8    │ 8,200     │
│ Codex      │ 3m 02s   │ 8/8    │ 15,100    │
└────────────┴──────────┴────────┴───────────┘
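Internally, each row could come from a small per-agent result record. A minimal sketch of that record and the table rendering, assuming simple fields; every name here is illustrative, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    """Per-agent outcome for one suite run (illustrative fields)."""
    agent: str
    duration_s: float
    tests_passed: int
    tests_total: int
    tokens: int | None  # None when the tool does not report usage

def render_table(results: list[AgentResult]) -> str:
    """Format results as an aligned text table like the output above."""
    header = f"{'Agent':<10} {'Time':>8} {'Tests':>6} {'Tokens':>8}"
    rows = [header, "-" * len(header)]
    for r in results:
        mins, secs = divmod(int(r.duration_s), 60)
        tokens = f"{r.tokens:,}" if r.tokens is not None else "n/a"
        rows.append(
            f"{r.agent:<10} {f'{mins}m {secs:02d}s':>8} "
            f"{f'{r.tests_passed}/{r.tests_total}':>6} {tokens:>8}"
        )
    return "\n".join(rows)

print(render_table([AgentResult("Claude", 133, 8, 8, 12_400)]))
```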
Benchmark Suites
Predefined task sets (a definition sketch follows the list), e.g.:
- api-crud: Build a REST API with CRUD operations
- bug-fix: Fix a set of known bugs in a test repo
- refactor: Refactor messy code to clean patterns
- collab: Multi-agent task where manager + worker must coordinate
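Each suite could be declared as data: the task prompt, a fixture repo to work in, and the command that grades the result. A minimal sketch of such a registry; the field names and fixture paths are assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Suite:
    """One benchmark suite: what to ask the agent and how to grade it."""
    name: str
    prompt: str               # task handed to each agent verbatim
    fixture_repo: str         # template repo the agent works in
    test_cmd: list[str]       # command whose exit status / report scores the run
    agents_required: int = 1  # >1 for collaboration suites

SUITES = {
    "api-crud": Suite(
        name="api-crud",
        prompt="Build a REST API with CRUD operations for a `tasks` resource.",
        fixture_repo="fixtures/api-crud",
        test_cmd=["pytest", "tests/", "--tb=short"],
    ),
    "collab": Suite(
        name="collab",
        prompt="Coordinate as manager + worker to implement the spec in SPEC.md.",
        fixture_repo="fixtures/collab",
        test_cmd=["pytest", "tests/"],
        agents_required=2,
    ),
}
```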
Metrics
- Completion time
- Test pass rate (predefined test cases per suite)
- Code quality (lint score, complexity)
- Token consumption (if the tool exposes usage data)
- Collaboration efficiency (for multi-agent suites; one possible definition is sketched below)
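Most of these reduce to numbers the harness can compute per run. A sketch of one possible metrics record, including a candidate definition of collaboration efficiency; the fields and that definition are assumptions, not settled design:

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    duration_s: float
    tests_passed: int
    tests_total: int
    lint_errors: int
    tokens: int | None = None           # only when the tool reports usage
    messages_sent: int | None = None    # multi-agent suites only
    messages_useful: int | None = None  # e.g. messages that changed worker state

    @property
    def pass_rate(self) -> float:
        return self.tests_passed / self.tests_total if self.tests_total else 0.0

    @property
    def collab_efficiency(self) -> float | None:
        """Share of inter-agent messages that did useful work (one possible definition)."""
        if self.messages_sent in (None, 0) or self.messages_useful is None:
            return None
        return self.messages_useful / self.messages_sent
```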
Why
There is no established benchmark for multi-agent CLI collaboration yet, so this fills a real gap in the ecosystem and produces high-value comparison content.
Complexity
Medium-to-large. Requires automated agent launching, result collection, and test harness integration.
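The core of that harness is a loop that copies the fixture repo, runs one agent CLI against it under a timeout, and then grades the working tree with the suite's test command. A rough sketch, assuming each tool gets an adapter that maps the prompt onto its actual flags (the invocation shown is hypothetical):

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path

def run_agent(agent_cmd: list[str], suite, timeout_s: int = 900):
    """Run one agent on one suite in an isolated copy of the fixture repo.

    `suite` is the Suite record from the sketch above (prompt,
    fixture_repo, test_cmd).
    """
    workdir = Path(tempfile.mkdtemp(prefix="squad-bench-"))
    shutil.copytree(suite.fixture_repo, workdir, dirs_exist_ok=True)

    start = time.monotonic()
    try:
        # Hypothetical invocation: each real tool needs its own adapter
        # mapping (prompt, workdir) onto its actual CLI flags.
        subprocess.run(agent_cmd + [suite.prompt], cwd=workdir,
                       timeout=timeout_s, check=False,
                       capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        pass  # score whatever state the agent left behind
    duration = time.monotonic() - start

    # Grade the working tree; exit code 0 means all tests passed.
    graded = subprocess.run(suite.test_cmd, cwd=workdir,
                            capture_output=True, text=True)
    return duration, graded.returncode == 0, graded.stdout
```

A full harness would run this once per (agent, suite) pair and feed the outputs into the metrics record above.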