The three existing review lenses left performance-sensitive concerns
(N+1 queries, algorithmic complexity, unbounded memory growth) in
nobody's scope. Adding a conditional 4th subagent keeps the typical
case at 3 agents while providing deeper analysis when the diff touches
hot paths. Folding standards, docs, and dependency concerns into
existing agents closes additional gaps without coordination overhead.
File: src/ai_rules/config/skills/code-reviewer/SKILL.md
+33 -7 (33 additions & 7 deletions)
@@ -80,6 +80,17 @@ Compute from the gathered diff:
 
 Crossfire (external model perspectives) is only available for Medium/Large diffs when `crossfire` keyword is in args.
 
+### Performance Relevance
+
+For Medium/Large diffs, scan the diff text to determine whether the Performance & Scalability agent should be activated. Set `performance_relevant = true` when the diff contains ANY of:
+- Database/ORM query construction (`.filter(`, `.all(`, `.query(`, `.execute(`, `SELECT`, `JOIN`, `WHERE`, raw SQL strings)
+- Loops over collections of indeterminate size (`for x in results`, `for item in data`, `while` loops processing external input)
+- New function calls, I/O operations, or subprocess invocations inside loops
+- Data structures that grow proportionally with input volume (appending to lists/dicts in loops, accumulation patterns)
+Do NOT activate for: config-only changes, test-only changes, documentation-only changes, UI/template changes, import reorganization, type annotation changes.
+
 ## Review Methodology
 
 ### Phase 1: Context Gathering
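
The Performance Relevance check added in the hunk above could be approximated mechanically. The following is a minimal sketch, not the skill's actual mechanism: the regexes in `PERFORMANCE_SIGNALS`, the function names, and the decision to scan only added lines are all assumptions made for illustration. The "Do NOT activate" exclusions are not modeled.

```python
import re

# Hypothetical regexes approximating the signals listed in the hunk.
# The skill states these in prose; the patterns below are assumptions.
PERFORMANCE_SIGNALS = [
    r"\.(filter|all|query|execute)\(",    # ORM / query construction
    r"\b(SELECT|JOIN|WHERE)\b",           # raw SQL keywords
    r"^\s*(for\s+\w+\s+in\s+|while\s+)",  # loops (any loop, an over-approximation)
    r"\.(append|extend|update)\(",        # accumulation into lists/dicts
]


def added_lines(diff_text):
    """Yield the content of added lines in a unified diff, skipping '+++' file headers."""
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def is_performance_relevant(diff_text, size):
    """Return True when a Medium/Large diff contains any of the assumed signals."""
    if size not in ("Medium", "Large"):
        return False
    return any(
        re.search(pattern, line)
        for line in added_lines(diff_text)
        for pattern in PERFORMANCE_SIGNALS
    )
```

A pattern scan like this errs toward false positives (for example, any `.append(` call matches), which seems consistent with the "ANY of" wording but would still need the exclusion list applied on top.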
@@ -100,19 +111,22 @@ For Small complexity diffs, execute the review inline using all four lenses sequ
 - Is this the right location/abstraction level for this functionality?
 - Would this be better in a library or separate module?
 - Does the overall design approach make sense for this system?
+- If this diff introduces loops, query patterns, or data structure operations: are there obvious algorithmic complexity concerns (e.g., O(n^2) where O(n) is possible) or unnecessary repeated I/O?
 
 **Lens 1: Simplicity & Maintainability**
 - Could this be simpler while maintaining functionality?
 - Will future developers understand this easily?
 - Is there unnecessary complexity or over-engineering?
 - Is this solving present needs or hypothetical future problems?
 - Are there opportunities to reduce duplication (3+ occurrences)?
+- Does the code follow project-specific conventions from AGENTS.md or CLAUDE.md? (naming, directory structure, tooling mandates)
 
 **Lens 2: Security & Reliability**
 - Are there security vulnerabilities? (SQL injection, XSS, auth bypass, data exposure)
 - Is error handling adequate for external dependencies?
 - Are edge cases properly handled?
 - Could this cause data corruption or loss?
+- If dependency files changed: are new dependencies well-maintained, version-pinned, and free of known vulnerabilities?
 
 **Lens 3: Functionality & Testing**
 - Does the code do what the developer intended?
@@ -122,6 +136,7 @@ For Small complexity diffs, execute the review inline using all four lenses sequ
 - Do tests verify behavior, not implementation details?
 - Is coverage sufficient for the risk level?
 - Are tests focused on what matters, not trivial cases?
+- For changed APIs or function signatures: are docstrings and documentation still accurate?
 
 After applying all lenses, proceed directly to Phase 3.
 
@@ -140,13 +155,16 @@ Gather for subagent briefings:
 
 Load the briefing template from `references/subagent-template.md` and construct one briefing per specialist. Launch all agents in parallel — this is critical for speed.
+| Security & Reliability | `sonnet` | Injection, auth, data exposure, error handling, edge cases, dependency hygiene | Do NOT review for design fit, over-engineering, test coverage, or performance | Always |
+| Design & Simplicity | `sonnet` | Architecture fit, abstraction level, over-engineering, duplication, maintainability, project conventions | Do NOT review for security vulnerabilities, test coverage, or performance cost | Always |
+| Functionality & Testing | `sonnet` | Correctness, intended behavior, test coverage, test quality, user-facing edge cases, API contract accuracy | Do NOT review for security vulnerabilities, design patterns, or performance | Always |
+| Performance & Scalability | `sonnet` | Algorithmic complexity, query efficiency, I/O patterns, memory growth, hot-path regressions | Do NOT review for security, design architecture, correctness, or test quality | Only when `performance_relevant = true` |
 
-| Agent | Model | Lens Focus | Scope Boundaries |
-|-------|-------|------------|-----------------|
-| Security & Reliability | `sonnet` | Injection, auth, data exposure, error handling, edge cases | Do NOT review for design fit, over-engineering, or test coverage |
-| Design & Simplicity | `sonnet` | Architecture fit, abstraction level, over-engineering, duplication, maintainability | Do NOT review for security vulnerabilities or test coverage |
-| Functionality & Testing | `sonnet` | Correctness, intended behavior, test coverage, test quality, user-facing edge cases | Do NOT review for security vulnerabilities or design patterns |
+If the diff was flagged as performance-relevant in the Performance Relevance classification, launch all four agents. Otherwise, launch only the three core agents (Security & Reliability, Design & Simplicity, Functionality & Testing).
 
 Each subagent receives: the full diff, instruction to read modified files in full (not just diff hunks), its assigned lens with key questions from the template, explicit scope boundaries, and the severity framework (🔴 MUST FIX / 🟡 SHOULD FIX / 🟢 CONSIDER).
 
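
The launch rule in the hunk above (three core agents always, the fourth only when `performance_relevant` is true, all run in parallel) can be sketched as follows. The names `select_agents`, `launch`, and the `run_agent` callable are hypothetical stand-ins; the skill does not specify how subagents are actually dispatched, and threads are used here only to illustrate the parallel fan-out.

```python
from concurrent.futures import ThreadPoolExecutor

CORE_AGENTS = [
    "Security & Reliability",
    "Design & Simplicity",
    "Functionality & Testing",
]
PERFORMANCE_AGENT = "Performance & Scalability"


def select_agents(performance_relevant):
    """Return the three core agents, plus the performance agent when flagged."""
    agents = list(CORE_AGENTS)
    if performance_relevant:
        agents.append(PERFORMANCE_AGENT)
    return agents


def launch(agents, run_agent):
    """Run every agent concurrently; run_agent is a hypothetical callable taking an agent name."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves input order, so results line up with agent names.
        return dict(zip(agents, pool.map(run_agent, agents)))
```

Keeping the selection separate from the dispatch means the typical case stays at three agents without any conditional logic inside the launch step.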
@@ -313,8 +331,16 @@ Synthesize findings from ALL sources (Claude subagents + optional crossfire):
 - Identical findings from multiple agents: keep the one with the most specific file:line citation
 - Map crossfire severities: CRITICAL → 🔴, IMPORTANT → 🟡, MINOR → 🟢
 
+**Step 2.5: Cross-agent verification**
+
+Before producing the final output, perform two verification checks:
+
+1. **Contradiction check:** Scan for cases where one agent's findings assume something another agent's findings contradict. When detected, apply orchestrator judgment — explain which finding holds and why, rather than presenting both uncritically.
+
+2. **Gap check:** Ask: "Are there concerns that fall between the scope boundaries of the agents that none of them would have been positioned to catch?" Surface any such concerns as orchestrator-attributed findings with appropriate severity.
+
 **Step 3: Produce unified output**
-Organize findings by severity tier (🔴 then 🟡 then 🟢), NOT by which agent found them. For each finding, note if it was confirmed by multiple sources. Include a methodology note (e.g., "Reviewed via 3 parallel Claude specialists" or "Reviewed via 3 Claude specialists + Codex + Gemini").
+Organize findings by severity tier (🔴 then 🟡 then 🟢), NOT by which agent found them. For each finding, note if it was confirmed by multiple sources. Include a methodology note (e.g., "Reviewed via 3 parallel Claude specialists", "Reviewed via 4 Claude specialists (incl. Performance)" or "Reviewed via 4 Claude specialists + Codex + Gemini").
 
 **Net verdict (PR Mode only):**
 - REQUEST_CHANGES if any HIGH-confidence 🔴 MUST FIX exists
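
The two synthesis rules in the context lines of the hunk above (deduplicate identical findings, map crossfire severities to the emoji tiers) could look like the sketch below. The finding shape (a dict with `key` and `citation` fields) and the rule that `path:line` is more specific than `path` alone are assumptions; the skill does not define either.

```python
# Severity mapping taken from the skill text.
CROSSFIRE_SEVERITY = {"CRITICAL": "🔴", "IMPORTANT": "🟡", "MINOR": "🟢"}


def specificity(citation):
    """Score a citation: 'path:line' beats a bare 'path' (an assumed ranking)."""
    _path, _sep, line = citation.partition(":")
    return 1 if line.isdigit() else 0


def dedupe(findings):
    """Keep one finding per key, preferring the most specific citation.

    Each finding is a hypothetical dict with 'key' and 'citation' entries;
    'key' stands in for whatever makes two findings count as identical.
    """
    best = {}
    for finding in findings:
        current = best.get(finding["key"])
        if current is None or specificity(finding["citation"]) > specificity(current["citation"]):
            best[finding["key"]] = finding
    return list(best.values())
```

In practice deciding that two findings are "identical" is a judgment call for the orchestrator; the `key` field only marks where that decision would plug in.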
File: src/ai_rules/config/skills/code-reviewer/references/subagent-template.md
+15 -3 (15 additions & 3 deletions)
@@ -85,25 +85,37 @@ CRITICAL RULES:
 ## Lens Definitions
 
 ### Security & Reliability Agent
-**Focus:** Injection, authentication, authorization, data exposure, error handling, edge cases, data corruption risks
+**Focus:** Injection, authentication, authorization, data exposure, error handling, edge cases, data corruption risks, dependency hygiene
 **Key questions (adapt per diff):**
 - Are there input validation gaps where external data enters the system?
 - Could any change introduce auth bypass, data leakage, or injection vulnerabilities?
 - Is error handling adequate for external dependencies and failure modes?
 - Are edge cases handled that could cause data corruption or crashes?
+- If dependency files (pyproject.toml, uv.lock, package.json, Cargo.toml, etc.) are in the diff: are new dependencies well-maintained, pinned to a specific version range, and free of known security concerns?
 - Does this change integrate well with the existing architecture and patterns?
 - Is there unnecessary complexity or over-engineering for the problem being solved?
 - Are there duplication opportunities (3+ occurrences of similar logic)?
 - Will future developers understand this code without the PR context?
+- Are there violations of project-specific conventions documented in AGENTS.md, CLAUDE.md, or similar convention files? (Read these files as part of your review context.)
 
 ### Functionality & Testing Agent
-**Focus:** Correctness, intended behavior, user-facing edge cases, test coverage, test quality
+**Focus:** Correctness, intended behavior, user-facing edge cases, test coverage, test quality, API contract accuracy
 **Key questions (adapt per diff):**
 - Does the code actually do what the developer intended? Any logical errors?
 - Are critical paths covered by tests that verify behavior (not implementation)?
 - Are there edge cases end users will encounter that aren't handled?
 - For UI changes: will the user experience work as expected?
+- For any changed public API, function signature, or user-facing behavior: is the corresponding documentation (docstrings, README, changelog) still accurate?
+**Condition:** Only activated when the diff touches performance-sensitive code (database queries, loops over collections, data structure operations, I/O in loops)
+**Key questions (adapt per diff):**
+- Does this change introduce any operations whose cost scales worse than linearly with input size?
+- Are there database queries or I/O operations inside loops that could be batched or hoisted?
+- Could any new data structures grow unboundedly in proportion to user/data volume?
+- Are there repeated computations or I/O calls that could be cached or deduplicated?
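
To make the second key question concrete, here is an illustrative example of the pattern the Performance & Scalability agent is meant to flag: a query inside a loop (the N+1 shape named in the commit message) next to a batched equivalent. The `orders` table, its columns, and the function names are invented for this sketch and do not come from the reviewed project.

```python
import sqlite3


def setup():
    """Create an in-memory database with a small hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (user_id, total) VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    )
    return conn


def totals_n_plus_one(conn, user_ids):
    """Issue one query per user: the shape a reviewer should flag."""
    result = {}
    for uid in user_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?", (uid,)
        ).fetchone()
        result[uid] = row[0]
    return result


def totals_batched(conn, user_ids):
    """Fetch all totals in a single grouped query."""
    if not user_ids:
        return {}
    placeholders = ",".join("?" for _ in user_ids)
    rows = conn.execute(
        f"SELECT user_id, SUM(total) FROM orders "
        f"WHERE user_id IN ({placeholders}) GROUP BY user_id",
        list(user_ids),
    ).fetchall()
    result = {uid: 0 for uid in user_ids}  # users with no orders default to 0
    result.update(dict(rows))
    return result
```

Both functions return the same mapping, but the first performs one round trip per user while the second performs one in total, which is the kind of difference the "batched or hoisted" question targets.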