Give Claude a dev team in one workspace: one supervisor, four specialists, and a shared build doctrine that produces bulletproof plans before coding starts.
The point is not just that it runs on your Claude subscription. The point is that you stop copy-pasting between sessions, keep context in one place, and let one Claude session act like a supervisor with a real dev team behind it while you can still jump in and talk to any specialist directly.
I was spending hours doing the same ritual over and over: open one Claude session to shape the idea, another to write the plan, another to review it, another to code it, another to test it. Every handoff meant more copy-paste, more context loss, and more chances for the build to drift.
The Dev Squad is the answer to that problem. Instead of juggling separate chats by hand, you give Claude its own dev team in one place. The supervisor is still just a Claude session, like any other session you would open yourself, except now it knows it has a planner, a plan reviewer, a coder, and a tester it can use on your behalf. You can still intervene directly with any of them whenever you want, but you no longer have to manually orchestrate the whole thing.
The real win is not “multi-agent” by itself. The real win is that the build plan becomes the contract for the whole team. The planner writes a plan with complete, copy-pasteable code. The plan reviewer tears it apart until there are no gaps. Only then does the coder touch the implementation. The tester checks the result against the approved plan instead of guessing what “done” means.
The key insight is still the same: the plan is the code contract. The planner does not write a vague spec sheet. The planner writes a plan that is complete enough for the coder to build without asking a single question. That is what makes the builds reliable — by the time coding starts, the biggest decisions should already be made and verified.
That is why the team shares a doctrine: build-plan-template.md, checklist.md, and the locked plan.md. The supervisor can run the team for you, but the quality bar stays the same. Research first. Verify from source. Write complete code in the plan. Do one self-review. Let the plan reviewer challenge it. Only then move to coding and testing.
This project exists because I wanted that whole process in one interface, with one shared context, and a supervisor who can run it for me when I do not want to babysit five different sessions.
If you already vibe code solo with Claude, you are already doing the job of a whole team yourself. The Dev Squad is the “why limit yourself?” version of that workflow. Same Claude. Same chat feel. But now there is a planner, reviewer, coder, and tester coordinated around the same build doctrine so you can get the output quality of a real dev team without losing context every time you switch tasks.
The Dev Squad is Claude with its own dev team:
- the Supervisor is your default front door
- the Planner writes the build plan
- the Plan Reviewer pushes on the plan until there are no gaps
- the Coder builds the approved plan
- the Tester checks the result against the plan and sends fixes back when needed
- the Security Auditor is optional — toggle it on at build start and a final OWASP-class read-only audit runs after testing, with severity-ranked findings you choose what to do with before deploy
- the whole team follows the same doctrine: build-plan-template.md, checklist.md, and the approved plan.md
- you can talk to the supervisor by default or jump directly into any specialist chat whenever you want
The current product shape is already visible:
- the supervisor can capture the concept, start the team, pause after review, continue an approved plan, resume stalled planning/review turns, and stop the run
- the specialists keep their own context in the same workspace instead of forcing you to copy/paste between sessions
- there are now two interfaces for the same team:
  - the Office View for the full visual dashboard
  - the Squad View for a calmer Supervisor-first workspace without the office UI
Internally the app labels the team as S, A, B, C, D, and E. In the product, think of them as the Supervisor, Planner, Plan Reviewer, Coder, Tester, and Security Auditor (optional).
For the architecture, security model, and current honest stance on what we ship vs. what we deliberately don't, see ARCHITECTURE.md, SECURITY.md, and SECURITY-ROADMAP.md.
╔═══════════════════════════════════════════════════════════════╗
║ SQUAD ROSTER v0.4.3 ║
╠═══════════════════════════════════════════════════════════════╣
║ ● S │ SUPERVISOR │ ONLINE ║
║ ● A │ PLANNER │ ONLINE ║
║ ● B │ PLAN REVIEWER │ ONLINE ║
║ ● C │ CODER │ ONLINE ║
║ ● D │ CODE REVIEWER/TESTER │ ONLINE ║
║ ○ E │ SECURITY AUDITOR │ OPTIONAL (toggle at start) ║
╚═══════════════════════════════════════════════════════════════╝
| Team Member | What It Does | Internal ID |
|---|---|---|
| Supervisor | Your default Claude session. Oversees the team, captures the concept, starts work, pauses after review, continues approved plans, and helps when runs stall or get weird. | S |
| Planner | Chats with you about the concept, researches everything, and writes a build plan with complete code for every file. | A |
| Plan Reviewer | Reads the planner's work and tears it apart. Asks hard questions. Loops with the planner until there are zero gaps. | B |
| Coder | Follows the approved plan exactly. Writes every file, installs deps, and builds the project. No improvising. | C |
| Tester | Reviews the code against the approved plan, runs it, catches bugs, and loops with the coder until everything passes. | D |
| Security Auditor (optional) | Read-only OWASP-class audit after testing succeeds. Ranks findings by severity. You choose what to fix, dismiss, or ignore — and when to deploy. Only runs if the Security Audit toggle is on at build start. | E |
Each team member is a separate Claude Code session running Claude Opus 4.6. They communicate through structured JSON signals routed by an orchestrator. Restrictions are enforced by a PreToolUse hook, but the real product idea is not "a hook-driven pipeline." It is a supervisor-led dev team that all follows the same build doctrine: the build plan template, the checklist, and the locked plan. See SECURITY.md for the threat model and known limitations.
1. Open the viewer
2. Tell the supervisor what you want to build
3. Ask the supervisor to start planning or start the build
4. Let the supervisor manage the team, or jump into any specialist panel yourself
5. Your project is in ~/Builds/
The supervisor is the recommended front door now. The old buttons and direct specialist chats are still there, but the product is increasingly shaped around "talk to the supervisor, let the supervisor use the team."
flowchart TD
U([USER]) -->|"concept"| S
S[["SUPERVISOR (S)"]] -->|"start"| A
A[["PLANNER (A)"]] -->|"plan.md"| B
B[["PLAN REVIEWER (B)"]] -->|"approved"| C
B -.->|"questions"| A
C[["CODER (C)"]] -->|"code"| D
D[["TESTER (D)"]] -->|"pass"| AUDIT{"AUDIT ENABLED?"}
D -.->|"fail"| C
AUDIT -->|"no"| DEPLOY[["DEPLOY"]]
AUDIT -->|"yes"| E[["SEC AUDITOR (E)"]]
E -->|"findings"| PAUSE["AWAITING DECISION"]
PAUSE -->|"send to C"| C
PAUSE -->|"dismiss"| PAUSE
PAUSE -->|"deploy"| DEPLOY
DEPLOY --> DONE([BUILD COMPLETE])
classDef green fill:#0a0a0a,stroke:#00FF41,color:#00FF41
classDef optional fill:#0a0a0a,stroke:#00FF41,color:#00FF41,stroke-dasharray: 5 5
class S,A,B,C,D,DEPLOY green
class E,PAUSE optional
Phase 0: Concept — You talk to the supervisor or the planner. The recommended flow is to tell the supervisor what you want, let the supervisor capture the concept, and then tell the supervisor when to start the team. Strict mode can still ask for Bash approvals later.
Phase 1: Planning — The planner reads the build plan template and checklist, researches the concept (web searches, docs, source code), writes plan.md with complete, copy-pasteable code for every file, then does one self-review pass before handing it to the plan reviewer. No placeholders.
Large planning runs are often the slowest part of the system. For bigger builds, it is normal for planning to take 10-15 minutes, and sometimes longer, because the planner is doing real source verification and producing a code-complete plan before coding starts. If the planner is still emitting events, let it cook.
Phase 1b: Plan Review — The plan reviewer reads the plan and sends structured questions back to the planner. They loop until the reviewer is fully satisfied and approves. The plan is locked. No agent can modify it.
Phase 2: Coding — The coder reads the locked plan and builds exactly what it says. Every file, every dependency, every line of code.
Phase 3: Code Review + Testing — The tester reads the plan and the code. Checks every item. If anything doesn't match or fails, the tester sends issues back to the coder. They loop until the tester approves and all tests pass.
Phase 3.5: Security Audit (optional) — If the Security Audit toggle was on at build start, the security auditor (E) does a final read-only pass for OWASP-class issues, path traversal, ReDoS, and missing input validation. Findings are ranked critical/high/medium/low. The pipeline pauses; you decide per finding whether to send a scoped fix to the coder (the coder fixes, the tester verifies tests, the auditor re-audits just that finding), dismiss, or ignore. Then you click Deploy.
Phase 4: Deploy — The finished project is ready.
The plan-review loop between the planner and the plan reviewer catches design gaps before a single line of code is written. The test loop between the coder and the tester catches implementation bugs before anything ships. The optional security audit is a final read-only sanity check before you ship. Each loop has no round limit — they keep going until it's right.
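The phases and loops above can be read as a small state machine. The sketch below is only an illustration of that reading: the phase names, signal names, and the `nextPhase` function are hypothetical and are not taken from the project's orchestrator. It follows the flowchart in this README, including the two unbounded loops and the optional audit branch.

```typescript
// Hypothetical phase and signal names, loosely following the flowchart in this README.
type PipelinePhase =
  | "planning"
  | "plan-review"
  | "coding"
  | "testing"
  | "security-audit"
  | "awaiting-decision"
  | "deploy";

type TeamSignal =
  | "plan-written"
  | "questions"
  | "approved"
  | "code-written"
  | "fail"
  | "pass"
  | "findings"
  | "send-to-coder"
  | "dismiss"
  | "deploy";

// Pure transition function: unknown signals leave the phase unchanged.
function nextPhase(
  phase: PipelinePhase,
  signal: TeamSignal,
  auditEnabled: boolean
): PipelinePhase {
  switch (phase) {
    case "planning":
      return signal === "plan-written" ? "plan-review" : phase;
    case "plan-review":
      if (signal === "questions") return "planning"; // loop back, no round limit
      if (signal === "approved") return "coding"; // the plan is locked from here on
      return phase;
    case "coding":
      return signal === "code-written" ? "testing" : phase;
    case "testing":
      if (signal === "fail") return "coding"; // loop back, no round limit
      if (signal === "pass") return auditEnabled ? "security-audit" : "deploy";
      return phase;
    case "security-audit":
      return signal === "findings" ? "awaiting-decision" : phase;
    case "awaiting-decision":
      if (signal === "send-to-coder") return "coding";
      if (signal === "deploy") return "deploy";
      return phase; // "dismiss" keeps waiting for the next decision
    default:
      return phase; // "deploy" is terminal in this sketch
  }
}
```

Keeping the transition logic in one pure function makes both loops easy to see: the only ways back to an earlier phase are a "questions" signal during plan review, a "fail" signal during testing, and a user decision to send a finding to the coder.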
The Office View is a pixel art office where 5 agents sit at desks. You watch them work in real-time:
- Live Feed — Every event from every agent, timestamped and color-coded
- Dashboard — Phase progress, elapsed time, file count, errors
- Supervisor Update — A manager-style summary of what the team is doing, what is blocked, and what S needs from you next
- Current Turn — Shows which agent turn is active, what it is doing, and whether it looks stalled
- 5-Panel Grid — Supervisor panel on the left, Planner / Plan Reviewer / Coder / Tester on the right. Each panel shows that agent's activity with auto-scroll. Click any panel to expand.
- Per-Panel Chat — Each panel has its own input. Talk directly to any agent.
- Security Audit Panel — When the optional Security Audit toggle is on, a dedicated Agent E panel slides in on the right after testing succeeds. The other agent panels squish to make room. It shows a findings list with per-finding Send to C / Dismiss buttons, a chat with E for follow-up questions, and a gated Deploy now button with a confirmation modal.
- Controls — Full Build / Plan Only, Security Audit Off / On, START, STOP AFTER REVIEW, CONTINUE BUILD, RESUME STALLED RUN, STOP, Reset, View Plan. These now act as fallback controls; you can also ask the supervisor to do the same things in chat.
- Art style — The office scene uses a mix of original pixel sprites and CSS-drawn props
When idle, agents wander the office, visit the hookah lounge, and play ping pong. Agent E does not appear in the office scene — it only exists when the optional audit runs.
The Squad View is a simpler Supervisor-first workspace for the same team model:
- Normal chat feel — one main Supervisor conversation with the same Planner, Plan Reviewer, Coder, and Tester behind it
- Specialist tabs — jump directly into Planner / Reviewer / Coder / Tester when you want a longer back-and-forth
- Same runtime — same orchestrator, same runner, same strict-mode approvals, same recovery behavior
- No office UI required — better for users who want the dev-team model without the visual dashboard
Open it at http://localhost:3000/squad.
- Claude Code CLI — this is the engine. You must have the claude command installed and working in your terminal. Install it from claude.ai/code.
- Active Claude subscription — Max, Pro, or Team. All agent sessions run on your subscription. No API key needed.
- Sandboxing note — The Dev Squad is not a sandbox. The hook is a discipline guardrail, not OS-level isolation. The repo includes a Docker runner abstraction that works in narrow conditions, but Claude Code subscription auth inside containers is too unreliable to make sandboxed execution the default. If you need real OS-level isolation, run The Dev Squad inside a VM you own. See SECURITY.md and SECURITY-ROADMAP.md for the honest threat model.
- Node.js 22+
- pnpm
git clone https://github.com/johnkf5-ops/the-dev-squad.git
cd the-dev-squad
pnpm install
pnpm dev

Open http://localhost:3000 for Office View or http://localhost:3000/squad for Squad View.
That's it. The viewer handles everything — spawning agents, running the orchestrator, managing builds.
Yes, The Dev Squad can work on an existing repo.
The best current flow is:
- open the existing repo in The Dev Squad
- tell the Supervisor that this is an existing codebase
- explain what the repo is, what you want changed, and any constraints that matter
- let the Supervisor hand it to the Planner so the team can build context from the real codebase before planning and coding
Good prompts look like:
- This is an existing repo. Read the current codebase first, understand how it works, then help me make the following changes: ...
- I am not starting from scratch. Build context from the repo as it exists today, then create a plan for: ...
This flow works today. What is still rough is the UX: The Dev Squad is currently more polished for new builds in ~/Builds/ than for a first-class "import existing project" flow.
The Dev Squad has two modes, toggled in the dashboard:
Pipeline mode is the automated team mode. You describe what you want, and the dev team builds it with minimal involvement from you. In strict mode, the UI can still ask you to approve coder/tester Bash commands.
- Reset if Needed — Clear any previous session.
- Talk to the Supervisor or Planner — The preferred path is to tell the supervisor what you want and let the supervisor manage the team. Direct planner chat still works when you want to hash out the concept yourself.
- Choose a Goal — Pick Full Build to run the whole team or Plan Only to stop cleanly after the plan reviewer approves the plan. The selected goal also acts as the default when you ask the supervisor to start from chat.
- Start from Chat or Button — Tell the supervisor to start planning or start the build. The old START button still works as a fallback control.
- Pause from Chat or Button — During planning or plan review, ask the supervisor to stop after review, or click STOP AFTER REVIEW if you want the run to pause after the plan reviewer approves the plan instead of continuing into coding.
- Watch — Each panel auto-scrolls as events come in. Click any panel to expand. The dashboard shows phase progress.
- Continue or Recover — If the run pauses after plan review, ask the supervisor to continue the build or use CONTINUE BUILD. If the planner or plan reviewer stalls during planning/review, ask the supervisor to resume the stalled run or use RESUME STALLED RUN.
- Stop — Ask the supervisor to stop the run, or click STOP at any time.
- View Plan — Once the planner writes the plan, click View Plan to read it.
- Done — Your project is in ~/Builds/<project-name>/.
After the build, chat with any agent for post-build work — fixing bugs, adding features, asking questions.
Strict mode is for users who want a human in the loop for shell execution from the build agents.
- What changes — Every Bash call from the coder and tester pauses for approval
- What you see — The dashboard shows an approval card with the agent, phase, and command description
- What happens on approve — The exact approved command gets a one-time grant and runs once
- What happens on deny — The agent is told the command was denied and must continue without it or explain what is blocked
- What does not change — Strict mode improves practical safety, but it is not OS-level sandboxing. The known hook limitations in SECURITY.md still apply.
- What this is not — Strict mode does not change the team model. It just adds human approval on risky shell execution.
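One way to picture the one-time grant described above is a small store where each approval is recorded once and consumed on first use. This is a minimal sketch under assumptions: the `OneTimeGrants` class, its method names, and the choice to match on agent plus exact command string are all hypothetical, and the project's real approval records may be keyed differently.

```typescript
// Hypothetical in-memory grant store; names are illustrative, not the project's real API.
interface ApprovalGrant {
  agent: string;
  command: string;
}

class OneTimeGrants {
  private grants: ApprovalGrant[] = [];

  // Called when the user approves a command on the approval card.
  approve(agent: string, command: string): void {
    this.grants.push({ agent, command });
  }

  // Returns true and removes the grant if this exact command was approved.
  // A second call for the same approval returns false.
  consume(agent: string, command: string): boolean {
    for (let i = 0; i < this.grants.length; i++) {
      const grant = this.grants[i];
      if (grant.agent === agent && grant.command === command) {
        this.grants.splice(i, 1);
        return true;
      }
    }
    return false;
  }
}
```

Because the match is on the exact command string, approving `pnpm install` in this sketch would not let the coder run `pnpm install && rm -rf dist`; that variant would need its own approval.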
In manual mode, you are the orchestrator. 5 panels, 5 Claude sessions, each with a specialty. You talk to whoever you want, whenever you want. No automation, no phases, no pipeline.
- No START/STOP — there's no pipeline to run. You direct everything.
- Claude permission prompts still apply — manual mode is looser than pipeline mode, but it is not unguarded. Claude Code can still ask for permission inside each direct session.
- Model picker — Choose between Opus and Sonnet. Appears only in manual mode.
- Hand off → — Each panel has a handoff button. Click it to grab that agent's last response and stage it as context for the next agent you message. One click to pass work between agents.
- Per-agent chat — Each panel has its own send button. You can talk to multiple agents at once — they run independently.
- No pipeline role guardrails — Agents don't follow the full pipeline templates/checklists automatically. They're direct Claude sessions with expertise labels (planning, code review, coding, testing, diagnostics), and you decide what they do.
Manual mode is useful when you want the multi-panel workspace without the automation — prototyping, brainstorming, or running your own workflow.
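The hand-off button described above can be thought of as a one-slot buffer: it stages one agent's last response, and the next message you send carries that text as context. The sketch below illustrates the idea only. `HandoffBuffer`, the "Context from" prefix, and the assumption that staged context is cleared after one use are all hypothetical.

```typescript
// Hypothetical hand-off buffer for manual mode; not the project's real implementation.
class HandoffBuffer {
  private staged: string | null = null;

  // Stage the source agent's last response (for example, the planner's output).
  stage(fromAgent: string, lastResponse: string): void {
    this.staged = `Context from ${fromAgent}:\n${lastResponse}`;
  }

  // Prepend any staged context to the outgoing message, then clear the slot
  // so the same context is not sent twice (an assumption of this sketch).
  compose(message: string): string {
    if (this.staged === null) return message;
    const combined = `${this.staged}\n\n${message}`;
    this.staged = null;
    return combined;
  }
}
```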
The screen is split into two sections:
Top half — A pixel art office with 5 agents at desks. They animate in real-time as they work. Below the office is a live feed showing every event from every agent. To the right is a dashboard with the mode toggle, agent status, and controls.
Bottom half — A 5-panel grid. The Supervisor panel spans the left column. The Planner, Plan Reviewer, Coder, and Tester panels fill the right in a 2x2 grid. Each panel shows that agent's activity and has its own chat input at the bottom.
Once the build is complete, you can chat directly with any agent for post-build work. Click on the coder's panel and ask it to fix a bug. Click on the tester's panel and ask it to run more tests. Each agent retains context from the build.
The Supervisor panel on the left is the clearest version of the product idea. It is still just a Claude session, like any session you would open yourself, except it knows it has a team and a shared build doctrine behind it. Before a run starts, the supervisor captures the concept locally and waits for an explicit start command instead of freelancing. Once a run exists, the supervisor gets a live team snapshot every time you chat with it: current phase, pipeline status, active turn, recent events, pending approvals, and recommended next actions. The UI now also shows a proactive supervisor update card so you do not have to read raw event logs just to understand what the team is doing, and the supervisor now narrates key transitions in chat too: planning start, review handoff, pauses, resumes, approval waits, and completion. The supervisor can also trigger the core team controls directly from chat: start a run, start plan-only mode, stop after review, continue an approved plan, resume a stalled planning/review turn, or stop the run. If something breaks, stalls, loops, or looks suspicious, ask the supervisor what is happening or tell it what you want the team to do next.
| Control | Mode | What It Does |
|---|---|---|
| PIPELINE / MANUAL | Both | Toggle between autonomous pipeline and manual orchestration |
| Model Picker | Manual | Choose Claude model (Opus or Sonnet) |
| Full Build / Plan Only | Pipeline | Chooses whether the supervisor should run the whole team or stop after approved plan review |
| Security Audit Off / On | Pipeline | Optional. When on, Agent E runs a final OWASP-class read-only audit after testing succeeds and the pipeline pauses for your per-finding review and explicit Deploy. Default Off. |
| START | Pipeline | Fallback button that creates the project directory, spawns the orchestrator, and begins the selected supervisor goal |
| STOP AFTER REVIEW | Pipeline | Arms a clean pause once the plan reviewer approves the plan |
| KEEP RUNNING AFTER REVIEW | Pipeline | Clears the stop-after-review request and lets the run continue into coding |
| CONTINUE BUILD | Pipeline | Resumes a paused plan-only / stopped-after-review run from the approved plan |
| RESUME STALLED RUN | Pipeline | Re-launches the orchestrator and resumes a stalled planner/plan-reviewer turn from the saved Claude session |
| Send to C (Security Audit panel) | Pipeline | Sends one specific finding to the coder for a scoped fix. The tester then verifies tests, the auditor re-audits that one finding. |
| Dismiss (Security Audit panel) | Pipeline | Marks a finding as acknowledged-but-not-fixing. Logged in the event stream. |
| Deploy now (Security Audit panel) | Pipeline | Confirms you are done reviewing findings and runs the deploy step (commit + open file). A modal asks for confirmation. |
| STOP | Pipeline | Kills orchestrator and all agent sessions immediately |
| Reset | Both | Clears all state. In pipeline mode, also stops the orchestrator. |
| View Plan | Pipeline | Opens plan.md in a modal (appears after the planner writes the plan) |
| Hand off → | Manual | Stages the agent's last response as context for the next agent you message |
Agents are constrained by a PreToolUse hook that gates every tool call. The hook prevents accidental lane drift — it is not a security sandbox. See SECURITY.md for the threat model, known limitations, and a matrix showing what is fixable in-hook vs what requires design changes or OS-level isolation.
This project is meant to provide practical guardrails and a disciplined workflow, not a security sandbox. If you plan to use it on sensitive code or systems, read SECURITY.md first and decide whether the current threat model fits your environment.
Plain-English status:
- Pipeline mode is the more structured path today
- Manual mode still has Claude permission prompts, but fewer product-level guardrails
- Sandboxed/isolated execution is not an active roadmap item. The Docker runner code remains for narrow cases, but Claude Code subscription auth inside containers is too unreliable to make sandboxed execution a default. If you need OS-level isolation, run The Dev Squad inside a VM you own.
| Team Member | Can Write | Can Run Bash | Can Spawn Agents |
|---|---|---|---|
| Planner (A) | plan.md only in the current project | No | No |
| Plan Reviewer (B) | Nothing | No | No |
| Coder (C) | Current project only (except plan.md) | Yes (dangerous cmds need approval) | No |
| Tester (D) | Nothing | Yes (dangerous cmds need approval) | No |
| Security Auditor (E) (optional) | Nothing | No | No |
| Supervisor (S) | ~/Builds/ only (no .claude/) | Yes (pattern-restricted) | No |
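To make the permissions table concrete, here is a minimal sketch that encodes it as data and answers "may this agent write this file?". Everything in it is an assumption for illustration: the type names, the `ROLE_POLICIES` table, and the `canWrite` function are hypothetical, paths are treated as relative to the current project, and the real enforcement lives in the PreToolUse hook rather than in TypeScript like this.

```typescript
// Hypothetical policy table transcribed from the permissions table in this README.
type AgentId = "S" | "A" | "B" | "C" | "D" | "E";
type WriteScope = "none" | "plan-only" | "project-except-plan" | "builds-dir";
type BashAccess = "none" | "dangerous-needs-approval" | "pattern-restricted";

interface RolePolicy {
  write: WriteScope;
  bash: BashAccess;
  spawnAgents: boolean;
}

const ROLE_POLICIES: Record<AgentId, RolePolicy> = {
  A: { write: "plan-only", bash: "none", spawnAgents: false },
  B: { write: "none", bash: "none", spawnAgents: false },
  C: { write: "project-except-plan", bash: "dangerous-needs-approval", spawnAgents: false },
  D: { write: "none", bash: "dangerous-needs-approval", spawnAgents: false },
  E: { write: "none", bash: "none", spawnAgents: false },
  S: { write: "builds-dir", bash: "pattern-restricted", spawnAgents: false },
};

// Splits a project-relative path into segments, dropping "" and "." segments.
// Returns null for absolute paths or any ".." segment, which could leave the project.
function toSegments(relativePath: string): string[] | null {
  if (relativePath.charAt(0) === "/") return null;
  const segments = relativePath
    .split("/")
    .filter((segment) => segment !== "" && segment !== ".");
  return segments.indexOf("..") === -1 ? segments : null;
}

function canWrite(agent: AgentId, relativePath: string, planLocked: boolean): boolean {
  const segments = toSegments(relativePath);
  if (segments === null) return false;
  const isPlan = segments.length === 1 && segments[0] === "plan.md";
  if (isPlan && planLocked) return false; // a locked plan is read-only for every agent
  switch (ROLE_POLICIES[agent].write) {
    case "plan-only":
      return isPlan;
    case "project-except-plan":
      return !isPlan;
    case "builds-dir":
      return segments.indexOf(".claude") === -1; // supervisor may not touch .claude/
    default:
      return false;
  }
}
```

This sketch checks path strings only. It does not resolve symlinks or follow what a Bash command might write, which is one reason a check like this should be read as a guardrail rather than isolation.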
Additional protections:
- Write/Edit/NotebookEdit are jailed to the current project for the planner/coder and blocked for the reviewer/tester/auditor
- Pipeline sessions set CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1, so Bash cd does not persist into later file-edit tool calls
- Plan is locked after the plan reviewer approves — no agent can modify it
- The planner and plan reviewer can use WebSearch and WebFetch for direct-source research and review
- The security auditor is read-only — no Bash, no Write/Edit, no Web, no Agent tool
- Fast mode auto-approves safer Bash and asks for riskier Bash
- Strict mode requires approval for every Bash call from the coder and tester
- All sessions default to --permission-mode auto for Claude's built-in safety classifier
- Override with PIPELINE_PERMISSION_MODE env var (e.g. plan, auto, or dangerously-skip-permissions)
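The default-plus-override behavior for the permission mode could look roughly like the sketch below. It is an assumption-heavy illustration: `resolvePermissionMode` is a hypothetical helper, the accepted values are limited to the three examples named in this README (the real set may be larger), and whether the project rejects or ignores an unrecognized value is not stated here, so the sketch simply throws.

```typescript
// Only the three example values named in this README; the real list may differ.
const KNOWN_PERMISSION_MODES = ["plan", "auto", "dangerously-skip-permissions"] as const;
type PipelinePermissionMode = (typeof KNOWN_PERMISSION_MODES)[number];

// The environment is passed in as a plain object so the function stays easy to test.
function resolvePermissionMode(
  env: Record<string, string | undefined>
): PipelinePermissionMode {
  const raw = env["PIPELINE_PERMISSION_MODE"];
  if (raw === undefined || raw.trim() === "") return "auto"; // documented default
  const value = raw.trim();
  for (const mode of KNOWN_PERMISSION_MODES) {
    if (mode === value) return mode;
  }
  // Assumption of this sketch: fail loudly instead of silently loosening permissions.
  throw new Error(`Unsupported PIPELINE_PERMISSION_MODE: ${value}`);
}
```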
Roadmap:
- Fast mode stays the default for autonomy
- Strict mode is available for pipeline runs
- Optional Security Audit (Agent E) — final OWASP-class read-only pass with severity ranking, user-controlled fix loop, and explicit deploy gate. Toggle at build start.
- Request-scoped approvals are live; strict-mode approvals are now tied to explicit request records instead of "latest project wins"
- Host-owned policy service (planned, no ship date) — moves trust outside the agent-writable workspace
- The concrete implementation plan lives in SECURITY-ROADMAP.md
Agents communicate via structured JSON — no text parsing:
// B reviewing A's plan
{ "status": "approved" }
{ "status": "questions", "questions": ["What about error handling?"] }
// D reviewing C's code
{ "status": "approved" }
{ "status": "issues", "issues": ["Missing input validation"] }
// D testing
{ "status": "passed" }
{ "status": "failed", "failures": ["PUT /users returns 500"] }
// E auditing (optional final pass)
{ "status": "approved" }
{ "status": "issues", "issues": [
{ "severity": "critical", "finding": "[src/api/auth.ts:42] SQL injection: ..." },
{ "severity": "medium", "finding": "[src/utils/regex.ts:7] ReDoS: ..." }
]
}

The orchestrator routes these signals between agents and advances the pipeline when an approval is received.
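Signals shaped like the examples above lend themselves to a discriminated union keyed on `status`. The following is a minimal sketch, not the project's parser: `AgentSignal` and `parseSignal` are hypothetical names, and for brevity it only covers the string-list signals from B and D, so the auditor's object-shaped findings are treated as unrecognized.

```typescript
// Hypothetical signal type covering the planner-review, code-review, and test examples.
type AgentSignal =
  | { status: "approved" }
  | { status: "passed" }
  | { status: "questions"; questions: string[] }
  | { status: "issues"; issues: string[] }
  | { status: "failed"; failures: string[] };

function isStringArray(value: unknown): value is string[] {
  if (!Array.isArray(value)) return false;
  for (const item of value) {
    if (typeof item !== "string") return false;
  }
  return true;
}

// Returns a typed signal, or null when the text is not valid JSON
// or does not match one of the shapes above.
function parseSignal(raw: string): AgentSignal | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const questions = record["questions"];
  const issues = record["issues"];
  const failures = record["failures"];
  switch (record["status"]) {
    case "approved":
      return { status: "approved" };
    case "passed":
      return { status: "passed" };
    case "questions":
      return isStringArray(questions) ? { status: "questions", questions } : null;
    case "issues":
      return isStringArray(issues) ? { status: "issues", issues } : null;
    case "failed":
      return isStringArray(failures) ? { status: "failed", failures } : null;
    default:
      return null;
  }
}
```

Returning null for anything malformed keeps the routing decision explicit: in this sketch a caller only advances or loops on a signal it could fully validate.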
Useful local checks:
- pnpm test:hook — verifies the agent/tool contract against the live approval hook
- pnpm test:signals — verifies structured signal parsing for plan review, code review, and test results
- pnpm dev — runs the viewer locally at http://localhost:3000
the-dev-squad/
src/
app/
page.tsx # Main page — dashboard, panels, controls
api/ # API routes (chat, start, stop, reset, state)
components/
mission/ # Pixel art office scene
lib/
use-pipeline.ts # React hook — polls state, exposes actions
pipeline/
orchestrator.ts # Spawns agents, routes signals, enforces flow
.claude/hooks/approval-gate.sh # Per-agent permission enforcement
role-a.md, role-b.md, etc. # Agent role context files
build-plan-template.md # Template A follows when writing plans
public/
sprites/ # Character and furniture sprites
See CONTRIBUTING.md for guidelines.
- CrashOverride LLC — creator and maintainer
- Claude Code (Anthropic) — core implementation, pipeline iteration, Agent E security audit
- ChatGPT 5.4 (OpenAI) — contributor for design review, security hardening guidance, and documentation passes
MIT - see LICENSE for details.
This project is provided AS IS, without warranty. It is your responsibility to review approvals, review generated code, and decide whether this tool is appropriate for your environment. The MIT license is the controlling legal text, and SECURITY.md documents the current threat model and limitations.
Copyright (c) 2026 CrashOverride LLC
Built with Claude Code and ChatGPT 5.4. Runs on Claude Code. No API required.
