The system has four component types, a quality bar, and a PR workflow. Everything else is implementation detail you discover by building things.
Every contribution passes through this filter. Not aspirational. Enforced.
| Criterion | Pass | Fail |
|---|---|---|
| Specific | Actionable steps, exit criteria | Vague advice ("be careful") |
| Verifiable | Evidence requirements | Trusts LLM confidence |
| Battle-tested | Real workflows, A/B tested | Hypothetical "should work" |
| Minimal | What guides the agent, nothing else | Verbose human explanations |
| Dense | Every word carries instruction | Prose where a table works |
| Component | Location | Format | Purpose |
|---|---|---|---|
| Agent | agents/{name}.md |
YAML frontmatter + markdown | Domain expertise |
| Skill | skills/{name}/SKILL.md |
YAML frontmatter + phased instructions | Workflow methodology |
| Hook | hooks/{name}.py |
Python, JSON in/out | Event-driven automation |
| Script | scripts/{name}.py |
Python CLI | Deterministic operations |
The constraint that separates them: agents know what, skills know how, hooks react to events, scripts do mechanical work. If it requires judgment, it belongs in an agent or skill. If it requires speed and determinism, it belongs in a script or hook.
The toolkit has creator agents. Tell /do what you want.
Agent:
/do create an agent for [domain]
Skill:
/do create a skill for [workflow]
Hook:
/do create a hook for [purpose]
The creator handles file structure, frontmatter, index registration, and routing integration. Test routing after creation:
/do [request that should trigger your new component]
The full cycle, in order:
- Branch from main (
feature/,fix/,refactor/) - Implement changes
- Wave review via
/pr-review(parallel reviewer agents) - Fix findings (up to 3 iterations)
- Retro captures learnings
- Graduate validated learnings into permanent files
- Commit (conventional format, no AI attribution)
- Push to remote
- PR via
gh pr create - CI passes
- Merge
The pr-workflow skill automates steps 3 through 10.
Before submitting:
ruff check . --config pyproject.tomlpassesruff format --check . --config pyproject.tomlpassespython3 scripts/validate-references.pypasses (if adding references)- New components appear in INDEX after running generators
- No secrets in committed files
pytest. Two directories: hooks/tests/, scripts/tests/.
pytest -v # everything
pytest hooks/tests/ -v # hooks only
pytest scripts/tests/ -v # scripts onlyHooks: feed JSON, assert JSON output. Scripts: deterministic input/output verification. Agents and skills use the eval harness in skills/meta/skill-eval/.
Conventional commits. type(scope): description. Types: feat, fix, refactor, docs, test, chore.
No AI attribution. No "Generated with Claude Code" or co-author lines. Ever.
Branch safety. Create feature branches for all work. The pretool hook enforces this.
INDEX.json is generated. Run scripts/generate-agent-index.py and scripts/generate-skill-index.py. Do not hand-edit.
Scripts are deterministic. LLM judgment goes in agents and skills, not scripts.
50ms hook budget. Hooks fire on every tool call. Keep them fast. scripts/benchmark-hooks.py validates this.
Wabi-sabi in docs. Write like a human. Contractions fine. Fragments fine. Banned words: "delve", "leverage", "comprehensive", "robust", "streamline", "empower". scripts/scan-ai-patterns.py catches violations.
Tried something that lost? Record it in docs/what-didnt-work.md before you forget. One dated section, four fields (Expectation, What happened, Evidence, Decision), newest on top. Evidence must be a location, not a claim. Check the file before re-running an experiment, so you don't re-litigate a decision already made.