Skip to content
Draft
7 changes: 7 additions & 0 deletions agents/PAW Review.agent.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,13 @@ When the user requests society-of-thought review mode:
- Pass any user-specified interaction mode, interactive setting, or model preferences
- Do NOT select specialists yourself — let the SoT engine handle adaptive selection from the full set

## Embedded Review Control State

- If `ReviewContext.md` contains `## Hardened State`, treat that section as the durable source of truth for review-stage items and terminal external-review facts.
- Keep any built-in TODOs aligned only as an execution mirror.
- Before yield, delegation, or GitHub posting, reconcile the embedded state when present. If reconciliation cannot make the state `current`, STOP and report the blocker instead of delegating or posting. If the section is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive.
- Do not advance past review-stage items or terminal external-review facts that remain unresolved when hardened state is present.

## Skill-Based Execution

{{#vscode}}
Expand Down
9 changes: 7 additions & 2 deletions agents/PAW.agent.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ You are a workflow orchestrator using a **hybrid execution model**: interactive

## Initialization

On first request, identify work context from environment (current branch, `.paw/work/` directories) or user input. If no matching WorkflowContext.md exists, load `paw-init` to bootstrap. If resuming existing work, derive TODO state from completed artifacts. Load `paw-workflow` skill for reference documentation (activity tables, artifact structure, PR routing).
On first request, identify work context from environment (current branch, `.paw/work/` directories) or user input. If no matching WorkflowContext.md exists, load `paw-init` to bootstrap. If resuming existing work, derive TODO state from the embedded `## Hardened State` section when present; otherwise derive it from completed artifacts. Load `paw-workflow` skill for reference documentation (activity tables, artifact structure, PR routing).

## Workflow Rules

Expand Down Expand Up @@ -105,7 +105,12 @@ When calling `paw_new_session`, include resume hint: intended next activity + re

## Workflow Tracking

Use TODOs to externalize workflow steps.
Use TODOs as a mirror of active required workflow items.

- If `WorkflowContext.md` contains `## Hardened State`, treat that section as the durable source of truth for required items, gate items, configured procedure items, and later terminal states.
- Keep TODOs aligned only with items whose status is `pending`, `in_progress`, or `blocked`; do not treat TODOs as the portable source of truth when the embedded state exists.
- Before yield, delegation, or side-effect execution, reconcile the embedded state when present. If reconciliation cannot make the state `current`, STOP and report the blocker instead of delegating or mutating git/GitHub state. If the section is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive.
- When hardened state is present, do not advance past required activity items, gate items, or configured procedure items that remain `pending`, `in_progress`, or `blocked`.

**Core rule**: After completing ANY activity, determine if you're at a stage boundary (see Stage Boundary Rule). If yes, delegate to `paw-transition` before doing anything else.

Expand Down
24 changes: 23 additions & 1 deletion docs/guide/stage-transitions.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,16 @@ continue

This tells the PAW agent to proceed to the recommended next activity.

## Hardened Control-State and Reconciliation

Current PAW workflows embed a compact `## Hardened State` section inside `WorkflowContext.md` (and `ReviewContext.md` for PAW Review).

- The embedded state records required activities, gate items, configured-procedure items, and terminal outcomes that matter for progression.
- Built-in TODOs mirror the active required items for execution convenience, but the embedded artifact state remains the durable source of truth across resume and handoff.
- Transition, status, handoff, resume, and repository-mutation paths reconcile against this state before proceeding.

If the hardened section is absent, PAW continues in **legacy best-effort mode** and reports that hardened protections are inactive.

## Review Policies

PAW supports four review policies that control when the workflow pauses for human review at artifact boundaries:
Expand Down Expand Up @@ -134,6 +144,16 @@ Session Policy: per-stage

To change the policy for an existing workflow, edit `WorkflowContext.md` directly.

## Exact Configured Procedure Enforcement

Planning-docs review, final review, and PAW Review evaluation run the configured review mode exactly:

- `single-model` runs the single-model procedure
- `multi-model` runs the configured multi-model procedure
- `society-of-thought` runs the specialist-based procedure

If the configured procedure cannot run in the current context, PAW blocks with an explicit reason instead of silently downgrading to a different mode.

## Inline Instructions

You can customize activity behavior by adding instructions to your requests:
Expand Down Expand Up @@ -202,7 +222,8 @@ The `paw-status` skill analyzes:
2. **Phase Progress**: Current implementation phase and completion status
3. **Git State**: Branch, commits ahead/behind, uncommitted changes
4. **PR Analysis**: Open PRs, review comments needing attention
5. **Recommended Actions**: Clear next steps based on current state
5. **Control-State Status**: Reconciled gate items, configured-procedure state, terminal review state, and legacy-mode reporting
6. **Recommended Actions**: Clear next steps based on current state

## Handling PR Review Comments

Expand Down Expand Up @@ -273,3 +294,4 @@ Session Policy controls chat context management. This is primarily relevant for

Session Policy is stored in `WorkflowContext.md` and can be changed by editing the file directly.

When `per-stage` mode opens a fresh session in VS Code, the next session resumes from the same embedded control state in `WorkflowContext.md` or `ReviewContext.md`. Session boundaries do not reset required activities or review-stage progress.
19 changes: 15 additions & 4 deletions docs/reference/agents.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,19 @@ PAW uses two AI chat modes ("agents") that orchestrate workflow activities throu

Both agents follow the same pattern: a compact orchestrator that loads a workflow skill for guidance, then delegates activities to specialized skills via subagents.

> **Platform runtime note:** Copilot CLI runs installed `agents/`, `skills/`, and prompt content directly. The VS Code extension uses `src/` TypeScript for commands, setup, and handoff, but that code is not executed in CLI sessions.
> **Platform runtime note:** Copilot CLI runs installed `agents/`, `skills/`, and prompt content directly. The VS Code extension uses `src/` TypeScript for commands, setup, status, and handoff, but that code is not executed in CLI sessions. Both runtimes rely on the same durable contract in `WorkflowContext.md` and `ReviewContext.md`; VS Code surfaces preserve and forward that contract rather than implementing alternate workflow logic.

> **PAW Lite** is available as the `paw-lite` skill — any agent can load it on demand.

## Hardened-State Parity

Current PAW and PAW Review sessions embed a compact `## Hardened State` section inside their context artifacts.

- `WorkflowContext.md` tracks required implementation activities, gate items, configured review procedures, and lifecycle markers
- `ReviewContext.md` tracks review-stage progression, configured review mode, and terminal external-review outcomes
- Built-in TODOs mirror the active required items for execution convenience, but the embedded artifact state remains the durable source of truth across resume and handoff
- If hardened state is absent, agents continue in legacy best-effort mode and say so explicitly

---

## Implementation Workflow
Expand All @@ -32,7 +41,7 @@ Both agents follow the same pattern: a compact orchestrator that loads a workflo
1. Loads the `paw-workflow` skill for orchestration guidance
2. Resolves activity skills using the current platform's mechanism (`paw_get_skills` / `paw_get_skill` in VS Code; installed skill files in Copilot CLI)
3. Delegates activities to specialized skills
4. Applies Review Policy and Session Policy for workflow control
4. Reads `WorkflowContext.md`, reconciles hardened state when present, and applies Review Policy and Session Policy for workflow control

**Hybrid Execution Model:**

Expand All @@ -43,6 +52,8 @@ Both agents follow the same pattern: a compact orchestrator that loads a workflo

This preserves conversation flow for interactive work while leveraging fresh context for focused research and review.

Before transitions, resume/handoff, or repository mutation, the PAW agent reconciles the embedded workflow state with artifact, git, and PR reality. Unresolved required items block progression with explicit reasons.

**Activity Skills:**

| Skill | Capabilities | Primary Artifacts |
Expand Down Expand Up @@ -123,7 +134,7 @@ This preserves conversation flow for interactive work while leveraging fresh con

**Invocation (Copilot CLI):** `copilot --agent PAW-Review` then provide the PR number or URL

**Architecture:** The PAW Review agent uses a skills-based architecture. The shared review logic lives in skills; VS Code-specific automation remains in `src/` only.
**Architecture:** The PAW Review agent uses a skills-based architecture. The shared review logic lives in skills; VS Code-specific automation remains in `src/` only. The agent reads `ReviewContext.md`, enforces hardened review state when present, and falls back to legacy best-effort mode when it is absent.

1. Loads the `paw-review-workflow` skill for orchestration
2. Executes activity skills via subagents for each stage
Expand Down Expand Up @@ -162,7 +173,7 @@ This preserves conversation flow for interactive work while leveraging fresh con
- **Critique response**: Updates comments per recommendations, marks final status
- **GitHub posting**: Creates pending review with only approved comments

**Comment Evolution:** ReviewComments.md shows full history for each comment: original → assessment → updated → posted status. Skipped comments remain visible for manual inclusion if reviewer disagrees with critique.
**Comment Evolution:** ReviewComments.md shows full history for each comment: original → assessment → updated → posted status. Skipped comments remain visible for manual inclusion if reviewer disagrees with critique. Terminal external-review outcomes such as pending review creation or manual-posting guidance are persisted in `ReviewContext.md`.

**Human Control:** Pending review is never auto-submitted. User reviews comments, edits/deletes as needed, consults ReviewComments.md for full context, then submits manually.

Expand Down
20 changes: 20 additions & 0 deletions docs/reference/artifacts.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,9 @@ PAW workflows produce durable Markdown artifacts that trace reasoning and decisi
| Final Review Models | Comma-separated model names |
| Final Review Specialists | `all`, comma-separated names, or `adaptive:<N>` (society-of-thought only) |
| Final Review Interaction Mode | `parallel` or `debate` (society-of-thought only) |
| Final Review Specialist Models | `none`, model pool, pinned pairs, or mixed (society-of-thought only) |
| Final Review Perspectives | `none`, `auto`, or comma-separated perspective names (society-of-thought only) |
| Final Review Perspective Cap | Positive integer (society-of-thought only) |
| Planning Docs Review | `enabled` or `disabled` |
| Plan Generation Mode | `single-model` or `multi-model` |
| Plan Generation Models | Comma-separated model names |
Expand All @@ -89,6 +92,13 @@ PAW workflows produce durable Markdown artifacts that trace reasoning and decisi
- `Artifact Lifecycle` controls git tracking for `.paw/work/` files only; it does **not** manage worktree creation, retention, or cleanup
- `WorkflowContext.md` deliberately stores portable metadata only; VS Code may keep machine-specific worktree paths in local extension state, while Copilot CLI sessions rely on the committed metadata plus git/worktree evidence instead

**Hardened-state section (current workflows):**

- `WorkflowContext.md` includes a compact `## Hardened State` block recording required activities, gate items, configured-procedure items, and lifecycle markers
- Built-in TODOs mirror the active required items from this block for in-session execution, but the embedded artifact state remains the durable source of truth
- `paw-transition`, `paw-status`, handoff/resume paths, and final PR readiness checks reconcile against this section before proceeding
- If the section is absent, the workflow runs in legacy best-effort mode and reports that hardened protections are inactive

### Spec.md

**Purpose:** Testable requirements document defining what the feature must do.
Expand Down Expand Up @@ -222,6 +232,8 @@ This decouples intent capture from phase elaboration, preserving implementer mom

When perspectives are active, each finding includes a `**Perspective**` field indicating which lens surfaced it (`baseline` for findings without a perspective overlay).

`REVIEW-SYNTHESIS.md` is also the artifact that resolves the configured evaluation/review procedure for society-of-thought flows. If the configured mode requires synthesis and this artifact is missing, the workflow stays blocked instead of silently falling back to another procedure.

---

## Review Workflow Artifacts
Expand All @@ -243,6 +255,12 @@ When perspectives are active, each finding includes a `**Perspective**` field in
| CI Status | Passing, failing, pending |
| Flags | CI failures, breaking changes suspected |

**Hardened-state section (current reviews):**

- `ReviewContext.md` includes a compact `## Hardened State` block for review-job identifier, configured review mode, required stage items, and terminal external-review outcomes
- The section lets resume, status, critique, and posting paths re-enter the same review job without re-inferring state from prompt prose
- If the section is absent, PAW Review uses legacy best-effort mode and says so explicitly

### ResearchQuestions.md

**Purpose:** Research questions to guide baseline codebase analysis.
Expand Down Expand Up @@ -329,6 +347,8 @@ When perspectives are active, each finding includes a `**Perspective**` field in
- **Comment text** — What gets posted
- **Rationale** — Evidence, baseline pattern, impact, best practice
- **Assessment** — Usefulness, accuracy, trade-offs (never posted)
- **Final marker** — `**Final**:` status that determines whether the comment is ready for posting
- **Posted/manual state** — Pending review IDs or manual-posting guidance reflected in the associated review control state

---

Expand Down
Loading
Loading