diff --git a/agents/PAW Review.agent.md b/agents/PAW Review.agent.md index 0f3caef6..80b74d15 100644 --- a/agents/PAW Review.agent.md +++ b/agents/PAW Review.agent.md @@ -48,6 +48,13 @@ When the user requests society-of-thought review mode: - Pass any user-specified interaction mode, interactive setting, or model preferences - Do NOT select specialists yourself — let the SoT engine handle adaptive selection from the full set +## Embedded Review Control State + +- If `ReviewContext.md` contains `## Hardened State`, treat that section as the durable source of truth for review-stage items and terminal external-review facts. +- Keep any built-in TODOs aligned only as an execution mirror. +- Before yield, delegation, or GitHub posting, reconcile the embedded state when present. If reconciliation cannot make the state `current`, STOP and report the blocker instead of delegating or posting. If the section is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive. +- Do not advance past review-stage items or terminal external-review facts that remain unresolved when hardened state is present. + ## Skill-Based Execution {{#vscode}} diff --git a/agents/PAW.agent.md b/agents/PAW.agent.md index 1595be9e..33345e05 100644 --- a/agents/PAW.agent.md +++ b/agents/PAW.agent.md @@ -7,7 +7,7 @@ You are a workflow orchestrator using a **hybrid execution model**: interactive ## Initialization -On first request, identify work context from environment (current branch, `.paw/work/` directories) or user input. If no matching WorkflowContext.md exists, load `paw-init` to bootstrap. If resuming existing work, derive TODO state from completed artifacts. Load `paw-workflow` skill for reference documentation (activity tables, artifact structure, PR routing). +On first request, identify work context from environment (current branch, `.paw/work/` directories) or user input. If no matching WorkflowContext.md exists, load `paw-init` to bootstrap. If resuming existing work, derive TODO state from the embedded `## Hardened State` section when present; otherwise derive it from completed artifacts. Load `paw-workflow` skill for reference documentation (activity tables, artifact structure, PR routing). ## Workflow Rules @@ -105,7 +105,12 @@ When calling `paw_new_session`, include resume hint: intended next activity + re ## Workflow Tracking -Use TODOs to externalize workflow steps. +Use TODOs as a mirror of active required workflow items. + +- If `WorkflowContext.md` contains `## Hardened State`, treat that section as the durable source of truth for required items, gate items, configured procedure items, and later terminal states. +- Keep TODOs aligned only with items whose status is `pending`, `in_progress`, or `blocked`; do not treat TODOs as the portable source of truth when the embedded state exists. +- Before yield, delegation, or side-effect execution, reconcile the embedded state when present. If reconciliation cannot make the state `current`, STOP and report the blocker instead of delegating or mutating git/GitHub state. If the section is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive. +- When hardened state is present, do not advance past required activity items, gate items, or configured procedure items that remain `pending`, `in_progress`, or `blocked`. **Core rule**: After completing ANY activity, determine if you're at a stage boundary (see Stage Boundary Rule). If yes, delegate to `paw-transition` before doing anything else. diff --git a/docs/guide/stage-transitions.md b/docs/guide/stage-transitions.md index f4e7fa50..4adc16e2 100644 --- a/docs/guide/stage-transitions.md +++ b/docs/guide/stage-transitions.md @@ -36,6 +36,16 @@ continue This tells the PAW agent to proceed to the recommended next activity. +## Hardened Control-State and Reconciliation + +Current PAW workflows embed a compact `## Hardened State` section inside `WorkflowContext.md` (and `ReviewContext.md` for PAW Review). + +- The embedded state records required activities, gate items, configured-procedure items, and terminal outcomes that matter for progression. +- Built-in TODOs mirror the active required items for execution convenience, but the embedded artifact state remains the durable source of truth across resume and handoff. +- Transition, status, handoff, resume, and repository-mutation paths reconcile against this state before proceeding. + +If the hardened section is absent, PAW continues in **legacy best-effort mode** and reports that hardened protections are inactive. + ## Review Policies PAW supports four review policies that control when the workflow pauses for human review at artifact boundaries: @@ -134,6 +144,16 @@ Session Policy: per-stage To change the policy for an existing workflow, edit `WorkflowContext.md` directly. +## Exact Configured Procedure Enforcement + +Planning-docs review, final review, and PAW Review evaluation run the configured review mode exactly: + +- `single-model` runs the single-model procedure +- `multi-model` runs the configured multi-model procedure +- `society-of-thought` runs the specialist-based procedure + +If the configured procedure cannot run in the current context, PAW blocks with an explicit reason instead of silently downgrading to a different mode. + ## Inline Instructions You can customize activity behavior by adding instructions to your requests: @@ -202,7 +222,8 @@ The `paw-status` skill analyzes: 2. **Phase Progress**: Current implementation phase and completion status 3. **Git State**: Branch, commits ahead/behind, uncommitted changes 4. **PR Analysis**: Open PRs, review comments needing attention -5. **Recommended Actions**: Clear next steps based on current state +5. **Control-State Status**: Reconciled gate items, configured-procedure state, terminal review state, and legacy-mode reporting +6. **Recommended Actions**: Clear next steps based on current state ## Handling PR Review Comments @@ -273,3 +294,4 @@ Session Policy controls chat context management. This is primarily relevant for Session Policy is stored in `WorkflowContext.md` and can be changed by editing the file directly. +When `per-stage` mode opens a fresh session in VS Code, the next session resumes from the same embedded control state in `WorkflowContext.md` or `ReviewContext.md`. Session boundaries do not reset required activities or review-stage progress. diff --git a/docs/reference/agents.md b/docs/reference/agents.md index 0fffc53b..4bab477c 100644 --- a/docs/reference/agents.md +++ b/docs/reference/agents.md @@ -11,10 +11,19 @@ PAW uses two AI chat modes ("agents") that orchestrate workflow activities throu Both agents follow the same pattern: a compact orchestrator that loads a workflow skill for guidance, then delegates activities to specialized skills via subagents. -> **Platform runtime note:** Copilot CLI runs installed `agents/`, `skills/`, and prompt content directly. The VS Code extension uses `src/` TypeScript for commands, setup, and handoff, but that code is not executed in CLI sessions. +> **Platform runtime note:** Copilot CLI runs installed `agents/`, `skills/`, and prompt content directly. The VS Code extension uses `src/` TypeScript for commands, setup, status, and handoff, but that code is not executed in CLI sessions. Both runtimes rely on the same durable contract in `WorkflowContext.md` and `ReviewContext.md`; VS Code surfaces preserve and forward that contract rather than implementing alternate workflow logic. > **PAW Lite** is available as the `paw-lite` skill — any agent can load it on demand. +## Hardened-State Parity + +Current PAW and PAW Review sessions embed a compact `## Hardened State` section inside their context artifacts. + +- `WorkflowContext.md` tracks required implementation activities, gate items, configured review procedures, and lifecycle markers +- `ReviewContext.md` tracks review-stage progression, configured review mode, and terminal external-review outcomes +- Built-in TODOs mirror the active required items for execution convenience, but the embedded artifact state remains the durable source of truth across resume and handoff +- If hardened state is absent, agents continue in legacy best-effort mode and say so explicitly + --- ## Implementation Workflow @@ -32,7 +41,7 @@ Both agents follow the same pattern: a compact orchestrator that loads a workflo 1. Loads the `paw-workflow` skill for orchestration guidance 2. Resolves activity skills using the current platform's mechanism (`paw_get_skills` / `paw_get_skill` in VS Code; installed skill files in Copilot CLI) 3. Delegates activities to specialized skills -4. Applies Review Policy and Session Policy for workflow control +4. Reads `WorkflowContext.md`, reconciles hardened state when present, and applies Review Policy and Session Policy for workflow control **Hybrid Execution Model:** @@ -43,6 +52,8 @@ Both agents follow the same pattern: a compact orchestrator that loads a workflo This preserves conversation flow for interactive work while leveraging fresh context for focused research and review. +Before transitions, resume/handoff, or repository mutation, the PAW agent reconciles the embedded workflow state with artifact, git, and PR reality. Unresolved required items block progression with explicit reasons. + **Activity Skills:** | Skill | Capabilities | Primary Artifacts | @@ -123,7 +134,7 @@ This preserves conversation flow for interactive work while leveraging fresh con **Invocation (Copilot CLI):** `copilot --agent PAW-Review` then provide the PR number or URL -**Architecture:** The PAW Review agent uses a skills-based architecture. The shared review logic lives in skills; VS Code-specific automation remains in `src/` only. +**Architecture:** The PAW Review agent uses a skills-based architecture. The shared review logic lives in skills; VS Code-specific automation remains in `src/` only. The agent reads `ReviewContext.md`, enforces hardened review state when present, and falls back to legacy best-effort mode when it is absent. 1. Loads the `paw-review-workflow` skill for orchestration 2. Executes activity skills via subagents for each stage @@ -162,7 +173,7 @@ This preserves conversation flow for interactive work while leveraging fresh con - **Critique response**: Updates comments per recommendations, marks final status - **GitHub posting**: Creates pending review with only approved comments -**Comment Evolution:** ReviewComments.md shows full history for each comment: original → assessment → updated → posted status. Skipped comments remain visible for manual inclusion if reviewer disagrees with critique. +**Comment Evolution:** ReviewComments.md shows full history for each comment: original → assessment → updated → posted status. Skipped comments remain visible for manual inclusion if reviewer disagrees with critique. Terminal external-review outcomes such as pending review creation or manual-posting guidance are persisted in `ReviewContext.md`. **Human Control:** Pending review is never auto-submitted. User reviews comments, edits/deletes as needed, consults ReviewComments.md for full context, then submits manually. diff --git a/docs/reference/artifacts.md b/docs/reference/artifacts.md index 0d3a521b..366e2d15 100644 --- a/docs/reference/artifacts.md +++ b/docs/reference/artifacts.md @@ -71,6 +71,9 @@ PAW workflows produce durable Markdown artifacts that trace reasoning and decisi | Final Review Models | Comma-separated model names | | Final Review Specialists | `all`, comma-separated names, or `adaptive:` (society-of-thought only) | | Final Review Interaction Mode | `parallel` or `debate` (society-of-thought only) | +| Final Review Specialist Models | `none`, model pool, pinned pairs, or mixed (society-of-thought only) | +| Final Review Perspectives | `none`, `auto`, or comma-separated perspective names (society-of-thought only) | +| Final Review Perspective Cap | Positive integer (society-of-thought only) | | Planning Docs Review | `enabled` or `disabled` | | Plan Generation Mode | `single-model` or `multi-model` | | Plan Generation Models | Comma-separated model names | @@ -89,6 +92,13 @@ PAW workflows produce durable Markdown artifacts that trace reasoning and decisi - `Artifact Lifecycle` controls git tracking for `.paw/work/` files only; it does **not** manage worktree creation, retention, or cleanup - `WorkflowContext.md` deliberately stores portable metadata only; VS Code may keep machine-specific worktree paths in local extension state, while Copilot CLI sessions rely on the committed metadata plus git/worktree evidence instead +**Hardened-state section (current workflows):** + +- `WorkflowContext.md` includes a compact `## Hardened State` block recording required activities, gate items, configured-procedure items, and lifecycle markers +- Built-in TODOs mirror the active required items from this block for in-session execution, but the embedded artifact state remains the durable source of truth +- `paw-transition`, `paw-status`, handoff/resume paths, and final PR readiness checks reconcile against this section before proceeding +- If the section is absent, the workflow runs in legacy best-effort mode and reports that hardened protections are inactive + ### Spec.md **Purpose:** Testable requirements document defining what the feature must do. @@ -222,6 +232,8 @@ This decouples intent capture from phase elaboration, preserving implementer mom When perspectives are active, each finding includes a `**Perspective**` field indicating which lens surfaced it (`baseline` for findings without a perspective overlay). +`REVIEW-SYNTHESIS.md` is also the artifact that resolves the configured evaluation/review procedure for society-of-thought flows. If the configured mode requires synthesis and this artifact is missing, the workflow stays blocked instead of silently falling back to another procedure. + --- ## Review Workflow Artifacts @@ -243,6 +255,12 @@ When perspectives are active, each finding includes a `**Perspective**` field in | CI Status | Passing, failing, pending | | Flags | CI failures, breaking changes suspected | +**Hardened-state section (current reviews):** + +- `ReviewContext.md` includes a compact `## Hardened State` block for review-job identifier, configured review mode, required stage items, and terminal external-review outcomes +- The section lets resume, status, critique, and posting paths re-enter the same review job without re-inferring state from prompt prose +- If the section is absent, PAW Review uses legacy best-effort mode and says so explicitly + ### ResearchQuestions.md **Purpose:** Research questions to guide baseline codebase analysis. @@ -329,6 +347,8 @@ When perspectives are active, each finding includes a `**Perspective**` field in - **Comment text** — What gets posted - **Rationale** — Evidence, baseline pattern, impact, best practice - **Assessment** — Usefulness, accuracy, trade-offs (never posted) +- **Final marker** — `**Final**:` status that determines whether the comment is ready for posting +- **Posted/manual state** — Pending review IDs or manual-posting guidance reflected in the associated review control state --- diff --git a/docs/specification/implementation.md b/docs/specification/implementation.md index 80e5bb12..66ff796a 100644 --- a/docs/specification/implementation.md +++ b/docs/specification/implementation.md @@ -30,6 +30,16 @@ Execution mode is separate from review strategy: - **Artifact lifecycle** controls whether `.paw/work/` artifacts are committed and does not imply worktree cleanup - Older `WorkflowContext.md` files without `Execution Mode` are treated as `current-checkout` +### Control-State and Reconciliation + +Current workflows embed a compact `## Hardened State` section inside `WorkflowContext.md`. + +- The section records required activities, gate items, configured-procedure items, and lifecycle markers that must resolve before the workflow can advance. +- Built-in TODOs mirror the active required items from this section for in-session execution, but the embedded artifact state remains the durable source of truth across resume and cross-runtime handoff. +- `paw-transition`, `paw-status`, handoff/resume flows, and repository-mutation paths reconcile this state against artifacts, git, and PR reality before proceeding. + +If hardened state is absent, PAW falls back to **legacy best-effort mode**. Older workflows still run, but status and resume surfaces explicitly report that hardened protections are inactive. + ### Pre-Specification: Work Shaping (Optional) **Skill:** `paw-work-shaping` @@ -100,7 +110,7 @@ Map relevant code areas and create a detailed implementation plan broken into ph 2. `paw-planning` creates detailed plan based on spec and code research 3. Collaborate iteratively to refine the plan 4. `paw-plan-review` validates plan feasibility **(mandatory)** -5. `paw-planning-docs-review` reviews all planning artifacts as a holistic bundle **(if enabled)** +5. `paw-planning-docs-review` reviews all planning artifacts as a holistic bundle **(if enabled)** and resolves the configured planning-review procedure only when that exact mode runs 6. Open Planning PR for review from the execution checkout established during initialization/reuse (PRs strategy) ### Stage 03 — Phased Implementation @@ -125,8 +135,8 @@ Execute plan phases with automated verification, peer review, and quality gates. For each phase: -1. `paw-implement` creates the phase branch and implements changes in the execution checkout established during initialization/reuse -2. `paw-implement` runs automated checks (tests, linting, type checking, build) and verifies the current phase's `Changes Required` deliverables actually exist before marking the phase complete +1. `paw-implement` creates the phase branch and implements changes in the execution checkout established during initialization/reuse after transition has reconciled the current workflow state +2. `paw-implement` runs automated checks (tests, linting, type checking, build), verifies the current phase's `Changes Required` deliverables actually exist, and resolves the phase's required control-state items before marking the phase complete 3. `paw-impl-review` reviews changes, cross-checks current-phase deliverables against actual repo state, adds documentation, and blocks missing or empty planned outputs before pushing/opening the Phase PR 4. `paw-impl-review` pushes and opens Phase PR (PRs strategy) 5. Human reviews PR and provides feedback @@ -173,7 +183,7 @@ Automated review of the complete implementation against specification before Fin **Process:** 1. `paw-final-review` reads configuration from WorkflowContext (mode, interactive, models) -2. Executes review (single-model or multi-model based on config); for society-of-thought mode, delegates to `paw-sot` +2. Executes the exact configured review mode (single-model, multi-model, or society-of-thought); for society-of-thought mode, delegates to `paw-sot` 3. For multi-model: synthesizes findings across models 4. Presents findings for resolution (interactive) or auto-applies (non-interactive) 5. Proceeds to `paw-pr` when complete @@ -181,11 +191,16 @@ Automated review of the complete implementation against specification before Fin **Configuration:** - `Final Agent Review`: `enabled` | `disabled` (default: enabled) -- `Final Review Mode`: `single-model` | `multi-model` (default: multi-model) +- `Final Review Mode`: `single-model` | `multi-model` | `society-of-thought` (default: multi-model) - `Final Review Interactive`: `true` | `false` | `smart` (default: smart) - `Final Review Models`: comma-separated model names (for multi-model) +- `Final Review Specialists`: `all` | comma-separated names | `adaptive:` (for society-of-thought) +- `Final Review Interaction Mode`: `parallel` | `debate` (for society-of-thought) +- `Final Review Specialist Models`: `none` | model pool | pinned pairs | mixed (for society-of-thought) +- `Final Review Perspectives`: `none` | `auto` | comma-separated names (for society-of-thought) +- `Final Review Perspective Cap`: positive integer (for society-of-thought) -**Note:** VS Code only supports single-model mode due to environment limitations. +**Note:** VS Code and Copilot CLI both preserve the configured review mode in `WorkflowContext.md`. If a configured procedure cannot run in the current context, the workflow blocks explicitly instead of silently downgrading to another mode. ### Stage 04 — Final PR @@ -250,13 +265,13 @@ Reviews code changes, verifies current-phase deliverables from `ImplementationPl ### paw-final-review -Reviews implementation against specification and `ImplementationPlan.md` deliverables after all phases complete. Missing planned deliverables are `should-fix` minimum. Supports multi-model parallel review (CLI) or single-model review (VS Code). For society-of-thought mode, delegates SoT orchestration to the `paw-sot` utility skill. Interactive mode presents findings for apply/skip/discuss; non-interactive mode auto-applies. +Reviews implementation against specification and `ImplementationPlan.md` deliverables after all phases complete. Missing planned deliverables are `should-fix` minimum. Supports single-model, multi-model, or society-of-thought review; if the configured procedure cannot run, the workflow blocks instead of silently switching modes. For society-of-thought mode, delegates SoT orchestration to the `paw-sot` utility skill. Interactive mode presents findings for apply/skip/discuss; non-interactive mode auto-applies. **Focus:** Catch issues and missing planned deliverables before external PR review. ### paw-planning-docs-review -Reviews all planning artifacts (Spec.md, ImplementationPlan.md, CodeResearch.md) as a holistic bundle after plan-review passes. Supports single-model, multi-model parallel review, or society-of-thought review via `paw-sot` engine (CLI). VS Code supports single-model only. Catches cross-artifact consistency issues before implementation begins. +Reviews all planning artifacts (Spec.md, ImplementationPlan.md, CodeResearch.md) as a holistic bundle after plan-review passes. Supports single-model, multi-model parallel review, or society-of-thought review via `paw-sot`. As with final review, the configured procedure is preserved and unsupported combinations block explicitly instead of downgrading. Catches cross-artifact consistency issues before implementation begins. **Focus:** Cross-artifact consistency gate before implementation. @@ -274,12 +289,22 @@ Reviews all planning artifacts (Spec.md, ImplementationPlan.md, CodeResearch.md) ### paw-status -Diagnoses current workflow state, recommends next steps, explains PAW/onboarding, and posts status updates to Issues/PRs. +Diagnoses current workflow state, recommends next steps, explains PAW/onboarding, and posts status updates to Issues/PRs. When hardened state is present it reports reconciled control-state status; otherwise it reports legacy best-effort mode explicitly. ### paw-pr Opens the **final PR** from the target branch to main and performs comprehensive pre-flight readiness checks to assess completeness. +## Legacy Best-Effort Mode + +Workflows created before the hardened-state model, or artifacts edited in ways that remove the embedded `## Hardened State` section, still remain usable. + +- Agents fall back to artifact and git inference for progression +- Status and resume paths say that hardened protections are inactive +- Exact configured-procedure handling and gate reconciliation apply only when the embedded state is present + +New workflows should prefer the embedded control-state model so CLI and VS Code surfaces re-enter the same durable contract. + --- ## Quality Gates diff --git a/docs/specification/review.md b/docs/specification/review.md index 6c8bd3e9..d680cca2 100644 --- a/docs/specification/review.md +++ b/docs/specification/review.md @@ -50,6 +50,16 @@ The review workflow uses a **skills-based architecture** for dynamic, maintainab Every subagent MUST call `paw_get_skill` FIRST before executing any work. The workflow skill requires delegation prompts to include: "First load your skill using `paw_get_skill('paw-review-')`, then execute the activity." +## Review Control-State and Stage Gating + +Current review jobs embed a compact `## Hardened State` section inside `ReviewContext.md`. + +- The section records the review-job identifier, configured review mode, required stage items, and terminal external-review outcomes. +- `ReviewComments.md` remains the complete feedback history, while the embedded control state tracks whether the job is still in understanding, evaluation, output, or a terminal posting/manual-posting state. +- Resume, status, critique, and posting paths reconcile the embedded state before advancing so the workflow re-enters the same review job instead of re-inferring progress from prompt prose. + +If the hardened section is absent, PAW Review remains usable in **legacy best-effort mode** and explicitly reports that the new protections are inactive. + ## Cross-Repository Review PAW Review supports coordinated review of multiple related PRs across repositories. @@ -104,7 +114,7 @@ PR → Understanding (R1) → Evaluation (R2) → Feedback Generation (R3) **Outputs:** -- `ReviewContext.md` — PR metadata, changed files, flags +- `ReviewContext.md` — PR metadata, changed files, flags, and embedded review control state - `ResearchQuestions.md` — Research questions for baseline analysis - `CodeResearch.md` — Pre-change baseline understanding - `DerivedSpec.md` — Reverse-engineered intent and acceptance criteria @@ -114,6 +124,7 @@ PR → Understanding (R1) → Evaluation (R2) → Feedback Generation (R3) 1. **Fetch PR metadata and create ReviewContext.md** - Document all changed files with additions/deletions - Set flags: CI failures, breaking changes suspected + - Seed review-job state for stage progression, configured mode, and terminal external-review placeholders when hardened state is active 2. **Research pre-change baseline** - Analyze codebase at base commit (pre-change state) @@ -150,6 +161,8 @@ PR → Understanding (R1) → Evaluation (R2) → Feedback Generation (R3) - `REVIEW-{SPECIALIST}.md` — Per-specialist findings - `REVIEW-SYNTHESIS.md` — Confidence-weighted synthesized findings +**Stage gate:** Evaluation begins only after the Understanding items recorded in `ReviewContext.md` are resolved. Single-model mode resolves through `ImpactAnalysis.md` + `GapAnalysis.md`; society-of-thought mode resolves through `REVIEW-SYNTHESIS.md`. + **Process (single-model):** 1. **Analyze impact** @@ -179,6 +192,8 @@ PR → Understanding (R1) → Evaluation (R2) → Feedback Generation (R3) 2. Specialists review PR diff with distinct cognitive strategies 3. Synthesis merges findings with confidence weighting and conflict resolution +The configured review mode remains authoritative. If the expected procedure cannot run, the workflow blocks instead of silently switching to another mode. + See [Society-of-Thought Review](../guide/society-of-thought-review.md) for configuration details. --- @@ -199,6 +214,8 @@ See [Society-of-Thought Review](../guide/society-of-thought-review.md) for confi - `ReviewComments.md` — Complete feedback with full comment history - **GitHub pending review** (GitHub context) — Draft review with filtered comments +**Stage gate:** Output begins only when the evaluation artifacts required by the configured review mode are resolved in `ReviewContext.md`. + **Process:** The Output stage uses an **iterative feedback-critique pattern**: @@ -224,8 +241,9 @@ The Output stage uses an **iterative feedback-critique pattern**: 4. **GitHub Posting** (`paw-review-github`, GitHub only) - Filter to only comments marked "Ready for GitHub posting" - Create pending review with filtered comments + - Persist the terminal external-review outcome back into `ReviewContext.md` - Skipped comments remain in artifact but are NOT posted - - Non-GitHub: provides manual posting instructions + - Non-GitHub: provides manual posting instructions and records that manual-posting guidance was provided **Comment Evolution in ReviewComments.md:** @@ -234,7 +252,7 @@ Each comment shows its complete history: - **Assessment** — Critic evaluation - **Updated** — Refined version if modification was recommended - **Final** — Ready for posting or skipped per critique -- **Posted** — GitHub pending review ID +- **Posted** — Per-comment pending review/comment status in `ReviewComments.md` - Regenerate with adjusted tone if requested --- @@ -253,6 +271,12 @@ Authoritative parameter source for the review workflow. - CI Status and flags - Description and metadata +When hardened state is present, `ReviewContext.md` also stores: + +- Review-stage items (`understanding` → `evaluation` → `output`) +- The configured review mode and procedure-resolution markers +- Terminal external-review state such as pending review creation or manual-posting guidance + ### DerivedSpec.md Reverse-engineered specification from implementation. @@ -323,6 +347,17 @@ Complete feedback with full comment history showing evolution from original to p - Final status (`Ready for GitHub posting` or `Skipped per critique`) - Posted status (pending review ID after GitHub posting) +`**Final**:` markers determine which comments `paw-review-github` can post. Pending review IDs or manual-posting outcomes live in `ReviewContext.md`, so status and resume surfaces can report the terminal review state without re-reading GitHub first. + +## Terminal External-Review Outcomes + +When hardened state is present, PAW Review persists the review job's external outcome back into `ReviewContext.md`: + +- Pending review created on GitHub +- Manual posting guidance provided for non-GitHub contexts + +This lets resumed sessions report whether a review is already awaiting submission instead of reverting to an earlier inferred stage. + --- ## Human Workflow Summary diff --git a/package.json b/package.json index 548718ee..d0154705 100644 --- a/package.json +++ b/package.json @@ -62,7 +62,7 @@ { "name": "paw_new_session", "displayName": "Start New PAW Session", - "modelDescription": "Start a fresh PAW agent chat session with cleared context. Use for session resets, stage transitions, or when user requests 'clear context' / 'new session'. Includes Work ID context and optional inline instructions.", + "modelDescription": "Start a fresh PAW agent chat session with cleared context. The new session should resume from WorkflowContext.md or ReviewContext.md, use embedded hardened state when present, and report legacy best-effort mode when that state is absent. Use for session resets, stage transitions, or when user requests 'clear context' / 'new session'. Includes work ID or review identifier context and optional inline instructions.", "toolReferenceName": "pawHandoff", "canBeReferencedInPrompt": true, "inputSchema": { @@ -78,7 +78,7 @@ }, "work_id": { "type": "string", - "description": "The normalized Work ID (feature slug, e.g., 'auth-system')" + "description": "Context identifier: normalized Work ID for PAW, or review identifier for PAW Review (for example 'auth-system', 'PR-123', or 'PR-123-my-api')" }, "inline_instruction": { "type": "string", diff --git a/paw-review-specification.md b/paw-review-specification.md index d917c66e..7d69b346 100644 --- a/paw-review-specification.md +++ b/paw-review-specification.md @@ -48,6 +48,18 @@ The review workflow uses a **skills-based architecture** for dynamic, maintainab --- +## Review Control-State Model + +Current PAW Review runs embed a compact `## Hardened State` section inside `ReviewContext.md`. + +- The section records the review-job identifier, configured review mode, required stage items, and terminal external-review outcomes. +- `ReviewComments.md` continues to hold full comment history, while the embedded control state tracks whether the job is still in understanding, evaluation, output, or a terminal posting/manual-posting state. +- Resume, status, critique, and posting paths re-enter the same review job by reconciling this embedded state before advancing. + +If the hardened section is absent, PAW Review stays usable in **legacy best-effort mode** and explicitly reports that the new protections are inactive. + +--- + ## Cross-Repository Review PAW Review supports reviewing coordinated changes across multiple repositories—common in monorepo setups, microservice architectures, or multi-package releases. @@ -155,6 +167,8 @@ Single-PR workflows remain unchanged—multi-repo sections only appear when the ## Workflow Stages +Each stage reads `ReviewContext.md` before acting. When hardened state is present, PAW Review advances only after the prior required items are resolved and records terminal external-review outcomes back into the same artifact. + ### Stage R1 — Understanding **Goal:** Comprehensively understand what changed and why @@ -167,7 +181,7 @@ Single-PR workflows remain unchanged—multi-repo sections only appear when the * Repository context **Outputs:** -* `.paw/reviews//ReviewContext.md` – PR metadata, changed files, flags (authoritative parameter source) + * `.paw/reviews//ReviewContext.md` – PR metadata, changed files, flags, and embedded review control state (authoritative parameter source) * `.paw/reviews//CodeResearch.md` – Pre-change baseline understanding * `.paw/reviews//DerivedSpec.md` – Reverse-engineered intent and acceptance criteria @@ -179,6 +193,7 @@ Single-PR workflows remain unchanged—multi-repo sections only appear when the - Document all changed files with additions/deletions - Set flags: CI failures, breaking changes suspected - **ReviewContext.md becomes authoritative parameter source** for all downstream stages + - Seed review-job state for stage progression, configured mode, and terminal external-review placeholders when hardened state is active 2. **Research pre-change baseline** - Analyze codebase at base commit (pre-change state) @@ -226,6 +241,8 @@ Single-PR workflows remain unchanged—multi-repo sections only appear when the * `.paw/reviews//REVIEW-{SPECIALIST}.md` – Per-specialist findings with cognitive strategy and evidence * `.paw/reviews//REVIEW-SYNTHESIS.md` – Confidence-weighted synthesis with specialist attribution and conflict resolution +**Stage gate:** Evaluation begins only after the Understanding items recorded in `ReviewContext.md` are resolved. Single-model mode resolves the evaluation procedure through `ImpactAnalysis.md` + `GapAnalysis.md`; society-of-thought mode resolves it through `REVIEW-SYNTHESIS.md`. + **Process (single-model):** 1. **Analyze impact** @@ -260,6 +277,7 @@ Single-PR workflows remain unchanged—multi-repo sections only appear when the 1. **Construct review context** from ReviewContext.md configuration fields and pass PR diff + understanding artifacts to `paw-sot` 2. **Specialist execution** — Specialists review independently (parallel) or engage in structured debate, each applying their cognitive strategy 3. **Synthesis** — Confidence-weighted merge of specialist findings with conflict resolution and specialist attribution +4. **Mode preservation** — The configured review mode stays authoritative; the workflow blocks instead of silently switching procedures **Human Workflow:** @@ -285,6 +303,8 @@ Single-PR workflows remain unchanged—multi-repo sections only appear when the * `.paw/reviews//ReviewComments.md` – Complete feedback with full comment history (original → assessment → updated → posted status) * **GitHub pending review** (GitHub context only) – Draft review with filtered comments (only those marked ready after critique) +**Stage gate:** Output begins only when the evaluation artifacts required by the configured review mode are resolved in `ReviewContext.md`. + **Process:** The Output stage uses an **iterative feedback-critique pattern** to refine comments before posting: @@ -323,8 +343,9 @@ The Output stage uses an **iterative feedback-critique pattern** to refine comme - Filter to only comments marked "Ready for GitHub posting" - Create pending review with filtered comments - Update ReviewComments.md with posted status and review IDs + - Persist the terminal external-review outcome back into `ReviewContext.md` - Skipped comments remain in artifact for reference but are NOT posted - - **Non-GitHub context**: Provides manual posting instructions instead + - **Non-GitHub context**: Provides manual posting instructions instead and records that manual posting guidance was provided **Human Workflow:** @@ -369,6 +390,14 @@ The Output stage uses an **iterative feedback-critique pattern** to refine comme - Flags (CI failures present, breaking changes suspected) - Metadata (created timestamp, git commit SHA, reviewer) +When hardened state is present, `ReviewContext.md` also stores: + +- Required review-stage items (`understanding` → `evaluation` → `output`) +- The configured review mode and procedure-resolution markers +- Terminal external-review state such as pending review creation or manual-posting guidance + +If the section is absent, PAW Review falls back to legacy best-effort mode and reports that status explicitly. + **Purpose:** Single source of truth for all review parameters; read first by all downstream stages; updated when new information discovered. --- @@ -590,7 +619,7 @@ Each comment shows its complete history: 2. **Assessment** — Critic evaluation (Include/Modify/Skip recommendation) 3. **Updated** — Refined version if modification was recommended 4. **Final** — Ready for posting or skipped per critique -5. **Posted** — GitHub pending review ID (after GitHub posting) +5. **Posted** — Per-comment pending review/comment status in `ReviewComments.md` (after GitHub posting) **Key Sections:** - **Rationale** (local only, not posted): Evidence, Baseline Pattern, Impact, Best Practice @@ -833,4 +862,3 @@ Each stage produces artifacts that should meet these quality standards: - **Post-review:** Can create implementation issues/PRs from Must/Should items using PAW implementation workflow - **Artifacts:** Review artifacts can inform future implementation planning - **Patterns:** Gap analysis categories can improve future Spec and Implementation Plan quality - diff --git a/paw-specification.md b/paw-specification.md index 67af3509..df4ee329 100644 --- a/paw-specification.md +++ b/paw-specification.md @@ -202,6 +202,22 @@ The final PR diff against `main` shows zero `.paw/` file changes. The PR descrip **VS Code Integration:** The "Stop Tracking Artifacts" command provides a mid-workflow escape hatch, switching any tracked workflow to `never-commit`. +### Hardened Control-State Model + +Current PAW workflows embed a compact `## Hardened State` section inside `WorkflowContext.md`. + +- The section records required activities, gate items, configured-procedure items, and lifecycle markers that must resolve before the workflow can advance. +- Built-in TODOs mirror the active required items from this section for in-session execution, but the embedded artifact state remains the durable source of truth across resume and cross-runtime handoff. +- The section is portable: CLI agents and VS Code surfaces both read the same serialized state instead of reconstructing workflow progress from prompt prose alone. + +If the section is absent, PAW treats the workflow as **legacy best-effort mode**. Older workflows remain usable, but transition, status, and resume surfaces explicitly report that hardened protections are inactive. + +### Gate Reconciliation and Exact Procedure Handling + +Before stage transitions, status reporting, handoff/resume, or repository mutation, PAW reconciles the embedded workflow state with artifact, git, and PR reality. Unresolved required items block progression with explicit reasons instead of relying on inferred progress. + +Planning-docs review and final review must run the configured procedure exactly. If the configured mode cannot run, PAW records a blocked outcome rather than silently downgrading to a different mode. VS Code surfaces preserve and forward the same configuration; they do not implement a separate workflow runtime in `src/`. + --- ## Repository Layout & Naming @@ -1057,6 +1073,27 @@ The **Workflow Context** document centralizes workflow parameters (target branch - `parallel`: All specialists review independently, then synthesis - `debate`: Thread-based debate with specialist interaction +**Final Review Specialist Models** (Optional, defaults to `none`) +- Only relevant when `Final Review Mode` is `society-of-thought` +- Controls whether specialists share a pool, pin to specific models, or use mixed specialist/model assignment + +**Final Review Perspectives** (Optional, defaults to `auto`) +- Only relevant when `Final Review Mode` is `society-of-thought` +- Applies perspective overlays such as temporal or adversarial lenses without changing specialist identity + +**Final Review Perspective Cap** (Optional) +- Only relevant when `Final Review Mode` is `society-of-thought` +- Limits how many perspectives can be applied per specialist + +**Planning Review Configuration** (Optional) +- `Planning Docs Review` enables or disables the planning-docs review gate +- When enabled, `Planning Review Mode`, `Planning Review Interactive`, `Planning Review Models`, `Planning Review Specialists`, `Planning Review Interaction Mode`, `Planning Review Specialist Models`, `Planning Review Perspectives`, and `Planning Review Perspective Cap` use the same configuration shape as final review + +**Hardened State Section** (Current workflows) +- The embedded `## Hardened State` block records required activities, gate items, configured-procedure items, and lifecycle markers +- Presence of this section enables hardened behavior; absence means legacy best-effort mode +- Built-in TODOs mirror active required items from this block and can be regenerated from it + #### Usage Agents automatically create WorkflowContext.md when first invoked with parameters, or read it from chat context to extract values. Include in context for subsequent stages to avoid re-entering parameters. diff --git a/skills/paw-final-review/SKILL.md b/skills/paw-final-review/SKILL.md index 9ebbe9a3..e572f308 100644 --- a/skills/paw-final-review/SKILL.md +++ b/skills/paw-final-review/SKILL.md @@ -28,6 +28,12 @@ Read WorkflowContext.md for: - `Final Review Interactive`: `true` | `false` | `smart` - `Final Review Models`: comma-separated model names (for multi-model) +If `WorkflowContext.md` contains `## Hardened State`, treat `procedure:final-review` as the configured procedure item: +- Reconcile control state before executing the review +- Mark it `in_progress` only when the configured review mode is actually about to run +- Mark it `resolved` only after the configured mode completes successfully +- Mark it `blocked` when the configured mode cannot run in the current runtime; do not silently downgrade + {{#cli}} If mode is `multi-model`, parse the models list. Default: `latest GPT, latest Gemini, latest Claude Opus`. @@ -37,7 +43,7 @@ If mode is `society-of-thought`, also read: - `Final Review Specialist Models`: `none` (default) | model pool | pinned pairs | mixed {{/cli}} {{#vscode}} -**Note**: VS Code only supports `single-model` mode. If `multi-model` is configured, proceed with single-model using the current session's model. +**Note**: VS Code only supports `single-model` mode. {{/vscode}} ### Step 2: Gather Review Context @@ -157,11 +163,11 @@ After paw-sot completes orchestration and synthesis, proceed to Step 5 (Resoluti {{#vscode}} ### Step 4: Execute Review (VS Code) -**Note**: VS Code only supports single-model mode. If `multi-model` is configured, report to user: "Multi-model not available in VS Code; running single-model review." - -If `society-of-thought` is configured, report to user: "Society-of-thought requires CLI for specialist persona loading (see issue #240). Running single-model review." +If `Final Review Mode` is `multi-model` or `society-of-thought`: +- When `WorkflowContext.md` contains `## Hardened State`, preserve the configured mode in status/control-state surfaces, persist `procedure:final-review` as `blocked` in `WorkflowContext.md`, and report: "Configured final review mode `` is unavailable in VS Code. Re-run this review in CLI." Do not run a single-model fallback. Stop after reporting the blocker. +- When hardened state is absent, continue in legacy best-effort mode, explicitly report that hardened protections are inactive, and run a single-model fallback review in `REVIEW.md`. -Execute single-model review using the shared review prompt above. Save to `REVIEW.md`. +If `Final Review Mode` is `single-model`, execute the shared review prompt above and save the output to `REVIEW.md`. {{/vscode}} ### Step 5: Resolution diff --git a/skills/paw-git-operations/SKILL.md b/skills/paw-git-operations/SKILL.md index 72883270..b1f16c7e 100644 --- a/skills/paw-git-operations/SKILL.md +++ b/skills/paw-git-operations/SKILL.md @@ -27,6 +27,14 @@ description: Shared git mechanics for PAW activity skills including branch namin - The caller checkout must never be mutated. - If the execution checkout is ambiguous or cannot be proven from this session, STOP and report recovery guidance instead of guessing. +## Control-State Reconciliation Gate + +- When `WorkflowContext.md` contains `## Hardened State`, reconcile it before any branch creation, checkout, pull, commit, push, PR-prep mutation, or artifact write. +- Treat `Reconciliation: current` as required for mutation-affecting git work. +- If `Reconciliation` is `not_run`, `stale`, or `external_unverified`, reconcile first; if reconciliation cannot prove the live state, STOP and report a blocker instead of mutating git state. +- If any required activity item, gate item, or configured procedure item that should already be terminal remains `pending`, `in_progress`, or `blocked`, STOP and report that unresolved control-state item instead of bypassing it with git mutation. +- If `## Hardened State` is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive. + ### Branch-State Matrix | Execution Mode / Strategy | Caller checkout | Execution checkout | diff --git a/skills/paw-init/SKILL.md b/skills/paw-init/SKILL.md index 23e55029..eed623dc 100644 --- a/skills/paw-init/SKILL.md +++ b/skills/paw-init/SKILL.md @@ -14,7 +14,7 @@ Bootstrap skill that initializes the PAW workflow directory structure. This runs - Generate Work Title from issue URL, branch name, or user description - Generate Work ID from Work Title (normalized, unique) unless `work_id` is provided explicitly - Create `.paw/work//` directory structure -- Generate WorkflowContext.md with all configuration fields +- Generate WorkflowContext.md with all configuration fields and a compact hardened-state section - Create and checkout git branch (explicit or auto-derived) - Commit initial artifacts if lifecycle mode allows it - Open WorkflowContext.md for review @@ -239,8 +239,41 @@ Remote: origin Artifact Lifecycle: Artifact Paths: auto-derived Additional Inputs: none + +## Hardened State + +TODO Mirror: active-required-items +Reconciliation: not_run + +### Required Workflow Items +- `init` | `resolved` | `activity` +- `spec` | `` | `activity` +- `spec-review` | `` | `activity` +- `code-research` | `pending` | `activity` +- `planning` | `pending` | `activity` +- `plan-review` | `pending` | `activity` +- `planning-docs-review` | `` | `activity` +- `final-review` | `` | `activity` +- `final-pr` | `pending` | `activity` + +### Gate Items +- `transition:after-spec-review` | `` | `transition` +- `transition:after-plan-review` | `pending` | `transition` +- `transition:after-planning-docs-review` | `` | `transition` +- `transition:after-phase:` | `pending` | `transition` +- `transition:after-final-review` | `` | `transition` + +### Configured Procedure Items +- `procedure:planning-review` | `` | `procedure` +- `procedure:final-review` | `` | `procedure` ``` +- Resolve config-dependent hardened-state rows to concrete values before writing `WorkflowContext.md`: + - After planning defines named implementation phases, append `phase::` items under `### Required Workflow Items`. + - Use `not_applicable` for `spec`, `spec-review`, and `transition:after-spec-review` when `Workflow Mode` is `minimal`; otherwise use `pending`. + - Use `pending` for `planning-docs-review`, `transition:after-planning-docs-review`, and `procedure:planning-review` when `Planning Docs Review` is `enabled`; otherwise use `not_applicable`. + - Use `pending` for `final-review`, `transition:after-final-review`, and `procedure:final-review` when `Final Agent Review` is `enabled`; otherwise use `not_applicable`. + ### Execution Contract - If `Execution Mode` is absent in an older context, treat it as `current-checkout`. diff --git a/skills/paw-planning-docs-review/SKILL.md b/skills/paw-planning-docs-review/SKILL.md index a42cf01c..8d6c3e54 100644 --- a/skills/paw-planning-docs-review/SKILL.md +++ b/skills/paw-planning-docs-review/SKILL.md @@ -32,6 +32,13 @@ Read WorkflowContext.md for: If `Planning Docs Review` is `disabled`, report skip and return `complete`. +If `WorkflowContext.md` contains `## Hardened State`, treat `procedure:planning-review` as the configured procedure item: +- Set it to `not_applicable` when `Planning Docs Review` is `disabled` +- Reconcile control state before executing the review +- Mark it `in_progress` only when the configured review mode is actually about to run +- Mark it `resolved` only after the configured mode completes successfully +- Mark it `blocked` when the configured mode cannot run in the current runtime; do not silently downgrade + {{#cli}} If mode is `multi-model`, parse the models list. Default: `latest GPT, latest Gemini, latest Claude Opus`. @@ -43,7 +50,7 @@ If mode is `society-of-thought`, also read: - `Planning Review Perspective Cap`: positive integer (default `2`) {{/cli}} {{#vscode}} -**Note**: VS Code only supports `single-model` mode. If `multi-model` is configured, proceed with single-model using the current session's model. +**Note**: VS Code only supports `single-model` mode. {{/vscode}} ### Step 2: Gather Review Context @@ -170,11 +177,11 @@ After paw-sot completes orchestration and synthesis, tag each REVIEW-SYNTHESIS.m {{#vscode}} ### Step 4: Execute Review (VS Code) -**Note**: VS Code only supports single-model mode. If `multi-model` is configured, report to user: "Multi-model not available in VS Code; running single-model review." - -If `society-of-thought` is configured, report to user: "Society-of-thought requires CLI for specialist persona loading (see issue #240). Running single-model review." +If `Planning Review Mode` is `multi-model` or `society-of-thought`: +- When `WorkflowContext.md` contains `## Hardened State`, preserve the configured mode in status/control-state surfaces, persist `procedure:planning-review` as `blocked` in `WorkflowContext.md`, and report: "Configured planning review mode `` is unavailable in VS Code. Re-run this review in CLI." Do not run a single-model fallback. Stop after reporting the blocker. +- When hardened state is absent, continue in legacy best-effort mode, explicitly report that hardened protections are inactive, and run a single-model fallback review in `reviews/planning/REVIEW.md`. -Execute single-model review using the shared review prompt above. Save to `reviews/planning/REVIEW.md`. +If `Planning Review Mode` is `single-model`, execute the shared review prompt above and save the output to `reviews/planning/REVIEW.md`. {{/vscode}} ### Step 5: Resolution diff --git a/skills/paw-pr/SKILL.md b/skills/paw-pr/SKILL.md index 93c8ff83..b57820e7 100644 --- a/skills/paw-pr/SKILL.md +++ b/skills/paw-pr/SKILL.md @@ -56,6 +56,13 @@ Before creating the PR, verify and report status. Block on failures unless user - ImplementationPlan `## Open Questions` → empty - Unresolved items → block and report with recommendation +**Hardened Control State**: +- If `WorkflowContext.md` contains `## Hardened State`, reconcile it before PR creation. +- Required activity items earlier than `final-pr`, plus all gate items and configured procedure items, must all be `resolved` or `not_applicable` before final PR creation can proceed. +- Treat `final-pr` as the current activity: it may still be `pending` or `in_progress` when `paw-pr` starts and becomes `resolved` only after PR creation succeeds. +- If reconciliation cannot prove the live state, or any earlier required hardened-state item or dependent gate/procedure item remains `pending`, `in_progress`, or `blocked`, treat it as a blocker. +- If `## Hardened State` is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive. + ## Workflow Mode Handling ### Full Mode diff --git a/skills/paw-review-critic/SKILL.md b/skills/paw-review-critic/SKILL.md index da7dcb16..e8d69c7a 100644 --- a/skills/paw-review-critic/SKILL.md +++ b/skills/paw-review-critic/SKILL.md @@ -24,6 +24,21 @@ Also verify access to all supporting artifacts: If ReviewComments.md is missing, report blocked status—Feedback Generation must complete first. +## Hardened Review Control State + +If `ReviewContext.md` contains `## Hardened State`, also read: +- `Reconciliation:` marker +- Review stage items +- Terminal external review state + +Apply these sequencing rules: +- `paw-review-critic` requires `output:feedback` to be `resolved` and `output:critic` to be `pending` or `in_progress` +- If prerequisites are not met, report blocked status with the unresolved review-state item +- After assessments are written, update `ReviewContext.md` so `output:critic` is `resolved`, `output:critique-response` remains/ becomes `pending`, and `Reconciliation` is `current` +- Preserve the existing terminal external review state and pending review identifiers when updating `ReviewContext.md` + +If `ReviewContext.md` does not contain `## Hardened State`, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. + ## Core Responsibilities - Read and understand all generated review comments diff --git a/skills/paw-review-feedback/SKILL.md b/skills/paw-review-feedback/SKILL.md index bd87b59f..eca8490a 100644 --- a/skills/paw-review-feedback/SKILL.md +++ b/skills/paw-review-feedback/SKILL.md @@ -23,6 +23,23 @@ Verify these artifacts exist in `.paw/reviews//`: If any required artifact is missing, report blocked status—earlier stages must complete first. +## Hardened Review Control State + +If `ReviewContext.md` contains `## Hardened State`, also read: +- `Reconciliation:` marker +- Review stage items +- Terminal external review state + +Apply these sequencing rules: +- Initial feedback generation requires `evaluation` to be `resolved` and `output:feedback` to be `pending` or `in_progress` +- Critique Response Mode requires `output:critic` to be `resolved` and `output:critique-response` to be `pending` or `in_progress` +- If prerequisites are not met, report blocked status with the unresolved review-state item +- After the initial pass, update `ReviewContext.md` so `output:feedback` is `resolved`, `output:critic` remains/ becomes `pending`, and `Reconciliation` is `current` +- After critique response, update `ReviewContext.md` so `output:critique-response` is `resolved`, `output:github` remains/ becomes `pending`, and `Reconciliation` is `current` +- Preserve the existing terminal external review state and pending review identifiers when updating `ReviewContext.md` + +If `ReviewContext.md` does not contain `## Hardened State`, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. + **SoT mode input mapping**: When `REVIEW-SYNTHESIS.md` is the evaluation source, map findings to the comment pipeline as follows: - Severity: must-fix → Must, should-fix → Should, consider → Could - Trade-offs from "Trade-offs Requiring Decision" section → Should-tier with `[Trade-off]` prefix (note: this mapping is active only when `Review Interactive` is `true` or `smart`; with `false` default, trade-offs are auto-resolved by paw-sot and appear as regular findings) diff --git a/skills/paw-review-github/SKILL.md b/skills/paw-review-github/SKILL.md index 20a52d68..dfe5b1e9 100644 --- a/skills/paw-review-github/SKILL.md +++ b/skills/paw-review-github/SKILL.md @@ -24,6 +24,23 @@ If ReviewComments.md is not finalized or missing `**Final**:` markers, report bl **Non-GitHub Context**: If this is not a GitHub PR review (e.g., local branch diff), skip GitHub posting and provide manual posting instructions instead. +## Hardened Review Control State + +If `ReviewContext.md` contains `## Hardened State`, also read: +- `Reconciliation:` marker +- Review stage items +- Terminal external review state +- `Pending Review ID` and `Pending Review URL` + +Apply these sequencing rules: +- If terminal external review state already shows `pending-review-created` or `manual-posting-provided`, treat this skill as idempotent completion: report the existing outcome, preserve it, and do not create a second pending review or append a second manual-posting section +- Otherwise `paw-review-github` requires `Reconciliation` to be `current`, `output:critique-response` to be `resolved`, and `output:github` to be `pending` or `in_progress` +- If prerequisites are not met, report blocked status with the unresolved review-state item +- After a pending review is created, update `ReviewContext.md` so `output:github` is `resolved`, terminal external review state is `pending-review-created`, `Pending Review ID` / `Pending Review URL` are populated, and `Reconciliation` is `current` +- After non-GitHub manual posting instructions are written, update `ReviewContext.md` so `output:github` is `resolved`, terminal external review state is `manual-posting-provided`, `Pending Review ID` / `Pending Review URL` stay `none`, and `Reconciliation` is `current` + +If `ReviewContext.md` does not contain `## Hardened State`, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. + ## Core Responsibilities - Read all comments from finalized ReviewComments.md @@ -69,6 +86,8 @@ Build list of postable comments with: ### Step 3: Create Pending Review (GitHub PRs) +Before any GitHub create call, check whether `ReviewContext.md` already records `pending-review-created`. If it does, report the existing pending review ID/URL and stop without creating another pending review. Only backfill missing local annotations if that can be done without repeating the external side effect. + Use GitHub MCP tools to create pending review: **3.1 Create Pending Review:** @@ -122,7 +141,7 @@ return 'Anonymous'; ``` ``` -### Step 4: Update ReviewComments.md +### Step 4: Update ReviewComments.md and ReviewContext.md After posting, update each posted comment in ReviewComments.md: @@ -145,6 +164,17 @@ Update the file header: **Comments Posted**: 6 of 8 (2 skipped per critique) ``` +If `ReviewContext.md` contains `## Hardened State`, also update: +```markdown +### Terminal External Review State +- `pending-review-created` + +Pending Review ID: `12345678` +Pending Review URL: `` +``` + +Set `output:github` to `resolved` and `Reconciliation` to `current`. + ### Step 5: Multi-PR Pending Reviews When reviewing PRs across multiple repositories: @@ -192,6 +222,8 @@ For non-GitHub workflows (local branch review): **Skip GitHub Posting:** - Do not attempt to call GitHub MCP tools - Update ReviewComments.md status to `finalized (non-GitHub)` +- If `ReviewContext.md` contains `## Hardened State`, update it so `output:github` is `resolved`, terminal external review state is `manual-posting-provided`, `Pending Review ID` / `Pending Review URL` remain `none`, and `Reconciliation` is `current` +- If manual posting instructions were already provided for this review job, preserve the existing section and do not append a duplicate copy **Provide Manual Posting Instructions:** diff --git a/skills/paw-review-understanding/SKILL.md b/skills/paw-review-understanding/SKILL.md index b20d91bf..ea1fab6e 100644 --- a/skills/paw-review-understanding/SKILL.md +++ b/skills/paw-review-understanding/SKILL.md @@ -15,6 +15,7 @@ Analyze pull request changes to create comprehensive understanding artifacts. Th - Generate research prompt for baseline codebase analysis - Derive specification from PR description, code analysis, and baseline understanding - Create ReviewContext.md as authoritative parameter source +- Seed ReviewContext.md with a compact hardened-state section for review stages and terminal review state - Validate all artifacts meet quality standards ## Non-Responsibilities @@ -40,6 +41,13 @@ Execute Step 4 only: Derive specification from baseline research. **Detection**: Both ReviewContext.md AND CodeResearch.md exist at artifact path. +## Hardened-State Resume Rules + +- Initial mode seeds `## Hardened State` with `understanding` in progress, later review stage items pending, `procedure:review-mode` pending, terminal external review state `none`, and pending review identifiers set to `none`. +- If ReviewContext.md already contains `## Hardened State`, preserve the existing review identifier, stage items, terminal external review state, and pending review identifiers when resuming; do not reset them to defaults. +- After Step 4 creates `DerivedSpec.md`, update hardened state so `understanding` is `resolved`, `evaluation` is `pending`, and `Reconciliation` is `current`. +- If ReviewContext.md exists without `## Hardened State`, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. + ## Context Detection Determine context type before proceeding: @@ -98,7 +106,7 @@ related_prs: 4. **Create ReviewContext.md**: - Write to `.paw/reviews//ReviewContext.md` - Use template structure below - - Include all metadata and flags + - Include all metadata, flags, and the hardened-state section ## Step 2: Research Questions Generation @@ -265,6 +273,26 @@ status: complete *SoT configuration fields are populated from the orchestrator's delegation context. When the orchestrator includes review configuration in the delegation prompt (e.g., `Review Mode: society-of-thought`), use those values. When not provided, apply defaults shown above. In particular, if `Review Mode` is `society-of-thought` and no `Review Specialists` value is provided, default to `all` — do not select a subset.* +## Hardened State + +TODO Mirror: active-required-items +Reconciliation: not_run + +### Review Stage Items +- `understanding` | `in_progress` | `stage` +- `evaluation` | `pending` | `stage` +- `output:feedback` | `pending` | `stage` +- `output:critic` | `pending` | `stage` +- `output:critique-response` | `pending` | `stage` +- `output:github` | `pending` | `stage` +- `procedure:review-mode` | `pending` | `procedure` + +### Terminal External Review State +- `none` + +Pending Review ID: `none` +Pending Review URL: `none` + ## Description diff --git a/skills/paw-review-workflow/SKILL.md b/skills/paw-review-workflow/SKILL.md index 2acda9fe..b80de995 100644 --- a/skills/paw-review-workflow/SKILL.md +++ b/skills/paw-review-workflow/SKILL.md @@ -129,6 +129,16 @@ Example: `acme-corp/my-api-service` → `my-api-service` **Multi-repo detection**: Use when multiple workspace folders are open in VS Code OR multiple PRs provided. +## Embedded Review Control State + +- If `ReviewContext.md` contains `## Hardened State`, use it as the durable source of truth for review stage sequencing and terminal external-review state. +- Reconcile `ReviewContext.md` against review artifacts before stage advancement, critique finalization, or GitHub/manual-posting output. If reconciliation cannot make the state `current`, stop and report the blocker. +- Determine the active stage from the first review stage item whose status is not terminal (`resolved` or `not_applicable`). +- Do not advance to evaluation while `understanding` is unresolved, or to later output steps while an earlier output item remains `pending`, `in_progress`, or `blocked`. +- `procedure:review-mode` resolves only when the configured evaluation path actually ran. Failed or mismatched evaluation paths block instead of silently falling back. +- `output:github` resolves only after a pending review is created or manual posting instructions are produced for non-GitHub contexts. +- If `## Hardened State` is absent, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. + ## Workflow Orchestration The workflow executes stages in sequence, with each stage producing artifacts consumed by downstream stages. @@ -249,6 +259,12 @@ The Output stage uses an iterative feedback-critique pattern to refine comments **Stage Gate**: Verify all comments have `**Final**:` markers before GitHub posting. +**Hardened-state output sequencing**: +- `output:feedback` runs only after `evaluation` is resolved +- `output:critic` runs only after `output:feedback` is resolved +- `output:critique-response` runs only after `output:critic` is resolved +- `output:github` runs only after `output:critique-response` is resolved + **Human Control Point**: The pending review is created but NOT submitted. Human reviewer: - Reviews generated comments in GitHub UI - Can see full comment history in ReviewComments.md (original → assessment → updated) @@ -261,6 +277,7 @@ The Output stage uses an iterative feedback-critique pattern to refine comments Upon workflow completion, report: - Artifact locations (all generated files in `.paw/reviews//`) - **GitHub PRs**: Pending review ID and comment counts (e.g., "Pending review created: Review ID 12345678, 6 comments posted, 2 skipped per critique") +- Terminal external review state from `ReviewContext.md` - **Non-GitHub**: Manual posting instructions location - **Multi-repo reviews**: Cross-repo findings summary (interface contracts analyzed, mismatches found, deployment order) - Comment evolution summary: original comments generated, modified per critique, skipped per critique diff --git a/skills/paw-review-workflow/references/control-state-contract.md b/skills/paw-review-workflow/references/control-state-contract.md new file mode 100644 index 00000000..c2333d4a --- /dev/null +++ b/skills/paw-review-workflow/references/control-state-contract.md @@ -0,0 +1,65 @@ +# Control-State Contract + +## Shared Core + +- Reuse the same shared core meanings as the workflow contract: + - presence of `## Hardened State` enables hardened behavior + - absence falls back to legacy best-effort mode + - built-in TODOs are a mirror, not the durable source of truth + - shared status values are `pending`, `in_progress`, `blocked`, `resolved`, and `not_applicable` + - shared reconciliation markers are `not_run`, `current`, `stale`, and `external_unverified` + - shared item line format is `- \`\` | \`\` | \`\`` + +## ReviewContext Embedding + +Seed this section in `.paw/reviews//ReviewContext.md`: + +```markdown +## Hardened State + +TODO Mirror: active-required-items +Reconciliation: not_run + +### Review Stage Items +- `understanding` | `in_progress` | `stage` +- `evaluation` | `pending` | `stage` +- `output:feedback` | `pending` | `stage` +- `output:critic` | `pending` | `stage` +- `output:critique-response` | `pending` | `stage` +- `output:github` | `pending` | `stage` +- `procedure:review-mode` | `pending` | `procedure` + +### Terminal External Review State +- `none` + +Pending Review ID: `none` +Pending Review URL: `none` +``` + +## Review Item IDs + +- Review stage IDs: + - `understanding` + - `evaluation` + - `output:feedback` + - `output:critic` + - `output:critique-response` + - `output:github` +- Configured procedure IDs use `procedure:`. +- Terminal external review state stores exactly one current marker: + - `none` + - `pending-review-created` + - `manual-posting-provided` + +## Reconciliation Rules + +- When `## Hardened State` is present, `ReviewContext.md` is the durable review control-state source of truth. +- Reconcile it against `ReviewComments.md`, evaluation artifacts, and external review facts before stage advancement, critique finalization, or GitHub/manual-posting output. +- Mutation-affecting review decisions include delegated stage advancement, comment finalization, pending review creation, and terminal external-review updates. +- Determine current review position from the first review stage item whose status is not terminal (`resolved` or `not_applicable`). +- Later review stage items must block when an earlier review stage item or configured procedure item remains `pending`, `in_progress`, or `blocked`. +- `procedure:review-mode` becomes `resolved` only when the configured evaluation path actually ran and produced its required artifacts. +- `output:github` becomes `resolved` only after a pending review is created or manual posting instructions are written for non-GitHub contexts. +- `### Terminal External Review State` must contain exactly one current marker. +- `pending-review-created` requires `Pending Review ID` to be populated; `manual-posting-provided` requires `Pending Review ID` and `Pending Review URL` to remain `none`. +- When `## Hardened State` is absent, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. diff --git a/skills/paw-status/SKILL.md b/skills/paw-status/SKILL.md index b2c78970..92a6106e 100644 --- a/skills/paw-status/SKILL.md +++ b/skills/paw-status/SKILL.md @@ -51,12 +51,22 @@ Read WorkflowContext.md for: - Plan Generation Mode: `single-model` | `multi-model` - Plan Generation Models: comma-separated model names (for multi-model modes) - Implementation Model: `none` (session default) | concrete model name -- Planning Review Mode: `single-model` | `multi-model` +- Planning Review Mode: `single-model` | `multi-model` | `society-of-thought` - Planning Review Interactive: `true` | `false` | `smart` - Final Review Specialists: `all` | `adaptive:` | comma-separated list (when society-of-thought) - Final Review Interaction Mode: `parallel` | `debate` (when society-of-thought) - Final Review Specialist Models: `none` | model pool | pinned pairs | mixed (when society-of-thought) +### Hardened-State Detection + +If `WorkflowContext.md` contains `## Hardened State`, read: +- `Reconciliation:` marker +- Required activity items +- Gate items +- Configured procedure items + +Treat embedded hardened state as the durable workflow source of truth when present. If the section is absent, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. + ### Phase Counting Parse ImplementationPlan.md with regex: `^## Phase \d+:` @@ -96,7 +106,13 @@ When PRs exist: ## Workflow Stage Progression -Map state to guidance: +Map state to guidance. + +When hardened state is present: +- Derive the current workflow position from the first required activity item whose status is not terminal (`resolved` or `not_applicable`). +- Report any earlier `blocked`, `pending`, or `in_progress` gate/procedure items that should already be terminal as blockers rather than inferring progress past them. +- If `Reconciliation` is `stale` or `external_unverified`, do not imply freshness. Report a read-only/unverified status and name the unresolved control-state items. +- If `Reconciliation` is `not_run`, say reconciliation is still required before mutation-affecting work can safely proceed. | State | Recommendation | |-------|----------------| @@ -191,6 +207,7 @@ For "How do I start?", explain: ## Status Dashboard Format Synthesize findings into sections: +- **Control State**: hardened vs legacy mode, reconciliation marker, unresolved gate/procedure items - **Artifacts**: Existence and status - **Phases**: Current progress (N of M) - **Phase Candidates**: Pending/resolved candidate counts (if any exist) @@ -201,6 +218,9 @@ Synthesize findings into sections: ## Guardrails - Verify phase count from ImplementationPlan.md—never assume +- If hardened state is present, do not report progress past unresolved required items, gate items, or procedure items +- If reconciliation is `stale` or `external_unverified`, label the result read-only/unverified instead of implying mutation-safe readiness +- If hardened state is absent, explicitly say legacy best-effort mode is active - Never mutate issue descriptions or PR content outside summary blocks - Never push, merge, or rewrite git history - Be idempotent: same state → same summary diff --git a/skills/paw-transition/SKILL.md b/skills/paw-transition/SKILL.md index 2eba5034..505bd6bc 100644 --- a/skills/paw-transition/SKILL.md +++ b/skills/paw-transition/SKILL.md @@ -32,7 +32,13 @@ Read WorkflowContext.md to determine: - Final Agent Review (`enabled` | `disabled`) - If missing, default to `enabled` -Identify last completed activity from TODOs or artifacts. +If `WorkflowContext.md` contains `## Hardened State`, also read: +- `Reconciliation:` marker +- Required activity items +- Gate items +- Configured procedure items + +When hardened state is present, treat it as the durable workflow source of truth. Determine the last completed activity and next required activity from the required activity items, not from inferred artifact order alone. If the section is absent, continue in legacy best-effort mode and explicitly note that hardened protections are inactive. ### Step 2: Determine Next Activity @@ -51,6 +57,19 @@ Use the Mandatory Transitions table: **Skippable = NO**: Add activity TODO and execute immediately after transition completes. +When hardened state is present, map the first non-terminal required activity item to the next activity: +- `spec` → `paw-spec` +- `spec-review` → `paw-spec-review` +- `code-research` → `paw-code-research` +- `planning` → `paw-planning` +- `plan-review` → `paw-plan-review` +- `planning-docs-review` → `paw-planning-docs-review` +- `phase::` → `paw-implement (Phase : )` +- `final-review` → `paw-final-review` +- `final-pr` → `paw-pr` + +If the first non-terminal required activity item disagrees with the transition table or the observed artifact state, report a blocker instead of inferring around the mismatch. + ### Step 2.5: Candidate Promotion Check When all planned phases are complete (next activity would be `paw-final-review` or `paw-pr`), check for phase candidates: @@ -142,6 +161,13 @@ Before the next activity can start, verify: - If `next_activity = paw-pr` and detected lifecycle is `commit-and-clean`: `artifact_lifecycle_action = stop-tracking (run \`git rm --cached -r .paw/work//\` before PR creation)` - Otherwise: `artifact_lifecycle_action = none` +**Hardened-state reconciliation** (when `## Hardened State` exists): +1. Treat `Reconciliation: current` as required for mutation-affecting next activities. +2. If `Reconciliation` is `not_run`, `stale`, or `external_unverified`, reconcile the embedded state against artifacts, git state, PR state, and the just-completed boundary before returning `preflight: passed`. +3. If reconciliation cannot prove the relevant live state, return `preflight: blocked: reconciliation incomplete` and explicitly report whether the state is `stale` or `external_unverified`. +4. If any required activity item before `next_activity`, any relevant gate item, or any relevant configured procedure item remains `pending`, `in_progress`, or `blocked` when it should already be terminal, return `preflight: blocked: unresolved`. +5. Only return `preflight: passed` when the reconciled hardened state supports the computed `next_activity`. + If any check fails, report blocker and stop. ### Step 5: Queue Next Activity @@ -150,6 +176,8 @@ Add TODO for next activity: - `[ ] ()` - `[ ] paw-transition` +When hardened state is present, TODOs are a mirror only. Queue TODOs only for required activity items whose status is `pending`, `in_progress`, or `blocked`. + ## Completion After completing all steps, return structured output: diff --git a/skills/paw-workflow/references/control-state-contract.md b/skills/paw-workflow/references/control-state-contract.md new file mode 100644 index 00000000..288a7bfa --- /dev/null +++ b/skills/paw-workflow/references/control-state-contract.md @@ -0,0 +1,89 @@ +# Control-State Contract + +## Shared Core + +- Presence of `## Hardened State` inside `WorkflowContext.md` or `ReviewContext.md` means hardened behavior is active. If the section is absent, continue in legacy best-effort mode. +- Embedded hardened state is the durable source of truth. Built-in TODOs are an execution mirror only. +- TODOs mirror only active required items whose status is `pending`, `in_progress`, or `blocked`. +- Shared status values: + - `pending` + - `in_progress` + - `blocked` + - `resolved` + - `not_applicable` +- Shared reconciliation markers: + - `not_run` + - `current` + - `stale` + - `external_unverified` +- Shared item line format: + - `- \`\` | \`\` | \`\`` +- Preserve unknown additive fields or items when updating the section. + +## WorkflowContext Embedding + +Seed this section in `.paw/work//WorkflowContext.md`: + +```markdown +## Hardened State + +TODO Mirror: active-required-items +Reconciliation: not_run + +### Required Workflow Items +- `init` | `resolved` | `activity` +- `spec` | `` | `activity` +- `spec-review` | `` | `activity` +- `code-research` | `pending` | `activity` +- `planning` | `pending` | `activity` +- `plan-review` | `pending` | `activity` +- `planning-docs-review` | `` | `activity` +- `final-review` | `` | `activity` +- `final-pr` | `pending` | `activity` + +### Gate Items +- `transition:after-spec-review` | `` | `transition` +- `transition:after-plan-review` | `pending` | `transition` +- `transition:after-planning-docs-review` | `` | `transition` +- `transition:after-phase:` | `pending` | `transition` +- `transition:after-final-review` | `` | `transition` + +### Configured Procedure Items +- `procedure:planning-review` | `` | `procedure` +- `procedure:final-review` | `` | `procedure` +``` + +Resolve config-dependent rows to concrete values before writing the section: +- Use `not_applicable` for `spec`, `spec-review`, and `transition:after-spec-review` when `Workflow Mode` is `minimal`; otherwise use `pending`. +- Use `pending` for `planning-docs-review`, `transition:after-planning-docs-review`, and `procedure:planning-review` when `Planning Docs Review` is `enabled`; otherwise use `not_applicable`. +- Use `pending` for `final-review`, `transition:after-final-review`, and `procedure:final-review` when `Final Agent Review` is `enabled`; otherwise use `not_applicable`. + +## Workflow Item IDs + +- Fixed activity IDs: + - `init` + - `spec` + - `spec-review` + - `code-research` + - `planning` + - `plan-review` + - `planning-docs-review` + - `final-review` + - `final-pr` +- Dynamic phase IDs use `phase::`. +- Gate IDs use `transition:`. +- Configured procedure IDs use `procedure:`. + +## Reconciliation Rules + +- When `## Hardened State` is present, it is the durable workflow source of truth. Reconcile it against artifacts, git state, PR state, and other live facts before any mutation-affecting decision. +- Mutation-affecting decisions include delegation, transition advancement, review execution, git mutation, and final PR creation. +- `Reconciliation: current` means the embedded state was checked against the relevant live state and can drive mutation-affecting decisions. +- `Reconciliation: stale` means the embedded state may no longer match artifacts/git/PR state. Mutation-affecting decisions must block until reconciliation succeeds. +- `Reconciliation: external_unverified` means external state (for example git or PR state) could not be proven. Mutation-affecting decisions must block until reconciliation succeeds. +- `Reconciliation: not_run` means reconciliation has not yet happened in the current context. Reconcile before mutation-affecting decisions. +- Read-only status/reporting may continue when reconciliation is `stale` or `external_unverified`, but must explicitly label the result as read-only and unverified. +- When `## Hardened State` is absent, continue in legacy best-effort mode and explicitly report that hardened protections are inactive. +- When hardened state is present, determine the next required workflow activity from the first required activity item whose status is not terminal (`resolved` or `not_applicable`). A preceding `blocked` item blocks advancement. +- Gate items and configured procedure items must be `resolved` or `not_applicable` before later mutation-affecting activities that depend on them can proceed. +- Configured procedure items become `resolved` only when the configured mode actually ran. Unsupported runtime/mode combinations must be recorded as `blocked`, never silently downgraded. diff --git a/src/commands/getWorkStatus.ts b/src/commands/getWorkStatus.ts index 2655408e..1416020a 100644 --- a/src/commands/getWorkStatus.ts +++ b/src/commands/getWorkStatus.ts @@ -85,7 +85,7 @@ export async function getWorkStatusCommand( } // Construct query for Status Agent - const query = selection.value ? `Work ID: ${selection.value}` : 'What is the current work status?'; + const query = constructGetWorkStatusQuery(selection.value || undefined); outputChannel.appendLine( `[INFO] Opening Status Agent${selection.value ? ` for ${selection.value}` : ' (auto-detect)'}` @@ -108,6 +108,19 @@ export async function getWorkStatusCommand( } } +export function constructGetWorkStatusQuery(workId?: string): string { + const header = workId ? `Work ID: ${workId}` : 'What is the current work status?'; + + return [ + header, + '', + 'Run `paw-status` and read WorkflowContext.md before inferring current workflow state.', + 'If WorkflowContext.md includes `## Hardened State`, use that embedded control state as the durable source of truth.', + 'If hardened state is absent, report legacy best-effort mode explicitly.', + 'Preserve blocked, pending, and stale/unverified reconciliation status instead of implying fresh state.', + ].join('\n'); +} + /** * Register the getWorkStatus command with VS Code. * diff --git a/src/commands/initializeWorkItem.ts b/src/commands/initializeWorkItem.ts index e466187c..2aac999a 100644 --- a/src/commands/initializeWorkItem.ts +++ b/src/commands/initializeWorkItem.ts @@ -5,6 +5,7 @@ import * as path from 'path'; import { collectUserInputs, type FinalReviewConfig, + type PlanningReviewConfig, type ArtifactLifecycle, type WorkItemInputs, isWorktreeExecutionEnabled, @@ -96,11 +97,27 @@ interface PromptArgumentsInput { reviewPolicy: ReviewPolicy; sessionPolicy: SessionPolicy; artifactLifecycle: ArtifactLifecycle; + planningReview: PlanningReviewConfig; finalReview: FinalReviewConfig; issueUrl?: string; executionMetadata?: ExecutionMetadata; } +function appendReviewParameters( + config: Record, + prefix: 'planning_review' | 'final_review', + review: PlanningReviewConfig | FinalReviewConfig +): void { + config[`${prefix}_mode`] = review.mode; + config[`${prefix}_interactive`] = review.interactive; + config[`${prefix}_models`] = review.models ?? 'latest GPT, latest Gemini, latest Claude Opus'; + config[`${prefix}_specialists`] = review.specialists ?? 'all'; + config[`${prefix}_interaction_mode`] = review.interactionMode ?? 'parallel'; + config[`${prefix}_specialist_models`] = review.specialistModels ?? 'none'; + config[`${prefix}_perspectives`] = review.perspectives ?? 'auto'; + config[`${prefix}_perspective_cap`] = review.perspectiveCap ?? 2; +} + function normalizePath(targetPath: string): string { return targetPath.replace(/\\/g, '/'); } @@ -158,7 +175,7 @@ export function constructPawPromptArguments( inputs: PromptArgumentsInput, _workspacePath: string ): string { - const config: Record = { + const config: Record = { target_branch: inputs.targetBranch.trim() || 'auto', workflow_mode: inputs.workflowMode.mode, execution_mode: inputs.executionMode, @@ -166,6 +183,7 @@ export function constructPawPromptArguments( review_policy: inputs.reviewPolicy, session_policy: inputs.sessionPolicy, artifact_lifecycle: inputs.artifactLifecycle, + planning_docs_review: inputs.planningReview.enabled ? 'enabled' : 'disabled', final_agent_review: inputs.finalReview.enabled ? 'enabled' : 'disabled', }; @@ -175,10 +193,12 @@ export function constructPawPromptArguments( config.execution_binding = inputs.executionMetadata.executionBinding; } + if (inputs.planningReview.enabled) { + appendReviewParameters(config, 'planning_review', inputs.planningReview); + } + if (inputs.finalReview.enabled) { - config.final_review_mode = inputs.finalReview.mode; - config.final_review_interactive = inputs.finalReview.interactive; - config.final_review_models = 'latest GPT, latest Gemini, latest Claude Opus'; + appendReviewParameters(config, 'final_review', inputs.finalReview); } if (inputs.issueUrl) { @@ -584,10 +604,25 @@ export async function initializeWorkItemCommand( outputChannel.appendLine(`[INFO] Review policy: ${inputs.reviewPolicy}`); outputChannel.appendLine(`[INFO] Session policy: ${inputs.sessionPolicy}`); outputChannel.appendLine(`[INFO] Artifact lifecycle: ${inputs.artifactLifecycle}`); + outputChannel.appendLine(`[INFO] Planning Docs Review: ${inputs.planningReview.enabled ? 'enabled' : 'disabled'}`); + if (inputs.planningReview.enabled) { + outputChannel.appendLine(`[INFO] Planning Review mode: ${inputs.planningReview.mode}`); + outputChannel.appendLine(`[INFO] Planning Review interactive: ${inputs.planningReview.interactive}`); + if (inputs.planningReview.mode === 'society-of-thought') { + outputChannel.appendLine(`[INFO] Planning Review specialists: ${inputs.planningReview.specialists}`); + outputChannel.appendLine(`[INFO] Planning Review interaction mode: ${inputs.planningReview.interactionMode}`); + outputChannel.appendLine(`[INFO] Planning Review perspectives: ${inputs.planningReview.perspectives}`); + } + } outputChannel.appendLine(`[INFO] Final Review: ${inputs.finalReview.enabled ? 'enabled' : 'disabled'}`); if (inputs.finalReview.enabled) { outputChannel.appendLine(`[INFO] Final Review mode: ${inputs.finalReview.mode}`); outputChannel.appendLine(`[INFO] Final Review interactive: ${inputs.finalReview.interactive}`); + if (inputs.finalReview.mode === 'society-of-thought') { + outputChannel.appendLine(`[INFO] Final Review specialists: ${inputs.finalReview.specialists}`); + outputChannel.appendLine(`[INFO] Final Review interaction mode: ${inputs.finalReview.interactionMode}`); + outputChannel.appendLine(`[INFO] Final Review perspectives: ${inputs.finalReview.perspectives}`); + } } if (inputs.issueUrl) { outputChannel.appendLine(`[INFO] Issue URL: ${inputs.issueUrl}`); diff --git a/src/prompts/stopTrackingArtifacts.template.md b/src/prompts/stopTrackingArtifacts.template.md index d82b1665..50760bf5 100644 --- a/src/prompts/stopTrackingArtifacts.template.md +++ b/src/prompts/stopTrackingArtifacts.template.md @@ -15,6 +15,8 @@ PAW stores workflow artifacts in `.paw/work//` directories including: These artifacts help agents understand context and maintain state across workflow stages. However, some users prefer not to commit them (e.g., contributing to non-PAW repositories). +Treat `WorkflowContext.md` as the durable artifact-lifecycle source of truth. If it includes a `## Hardened State` section, preserve that section exactly and update only the lifecycle field. If hardened state is absent, treat the workflow as legacy best-effort mode. + ## Parameters - **Work ID**: {{WORK_ID}} @@ -43,7 +45,7 @@ This pattern ignores all files in the directory, including the `.gitignore` itse ### 3. Update WorkflowContext.md Lifecycle Field -Update the `Artifact Lifecycle:` field in `.paw/work/{{WORK_ID}}/WorkflowContext.md` to `never-commit`. If the field doesn't exist, add `Artifact Lifecycle: never-commit` near other configuration fields. This makes the lifecycle change durable for any agent that reads the workflow context. +Update the `Artifact Lifecycle:` field in `.paw/work/{{WORK_ID}}/WorkflowContext.md` to `never-commit`. If the field doesn't exist, add `Artifact Lifecycle: never-commit` near other configuration fields. This makes the lifecycle change durable for any agent that reads the workflow context. Do not rewrite or remove unrelated hardened-state or legacy-state lines. ### 4. Commit the Removal diff --git a/src/test/suite/getWorkStatus.test.ts b/src/test/suite/getWorkStatus.test.ts new file mode 100644 index 00000000..4d5c3f93 --- /dev/null +++ b/src/test/suite/getWorkStatus.test.ts @@ -0,0 +1,23 @@ +import * as assert from 'assert'; +import { constructGetWorkStatusQuery } from '../../commands/getWorkStatus'; + +suite('Get Work Status Command', () => { + test('constructGetWorkStatusQuery includes hardened-state instructions for explicit work IDs', () => { + const query = constructGetWorkStatusQuery('workflow-hardening'); + + assert.ok(query.includes('Work ID: workflow-hardening')); + assert.ok(query.includes('WorkflowContext.md')); + assert.ok(query.includes('embedded control state')); + assert.ok(query.includes('legacy best-effort mode')); + assert.ok(query.includes('stale/unverified')); + }); + + test('constructGetWorkStatusQuery includes hardened-state instructions for auto-detect mode', () => { + const query = constructGetWorkStatusQuery(); + + assert.ok(query.includes('What is the current work status?')); + assert.ok(query.includes('WorkflowContext.md')); + assert.ok(query.includes('legacy best-effort mode')); + assert.ok(query.includes('blocked, pending, and stale/unverified')); + }); +}); diff --git a/src/test/suite/gitValidation.test.ts b/src/test/suite/gitValidation.test.ts index 1ec504a1..644e6d5e 100644 --- a/src/test/suite/gitValidation.test.ts +++ b/src/test/suite/gitValidation.test.ts @@ -29,6 +29,10 @@ async function createTempRepo(): Promise { return repoDir; } +async function createDirectoryLink(targetPath: string, linkPath: string): Promise { + await symlink(targetPath, linkPath, process.platform === 'win32' ? 'junction' : 'dir'); +} + function gitOutput(repoDir: string, args: string[]): string { return execFileSync('git', args, { cwd: repoDir, encoding: 'utf8' }).trim(); } @@ -117,7 +121,9 @@ function buildReuseOptions( }; } -suite('Git Validation Helpers', () => { +suite('Git Validation Helpers', function () { + this.timeout(30_000); + test('parseWorktreeListPorcelain parses branches and detached entries', () => { const parsed = parseWorktreeListPorcelain([ 'worktree /tmp/repo', @@ -222,7 +228,13 @@ suite('Git Validation Helpers', () => { }); const worktrees = await listGitWorktrees(repoDir); - const created = worktrees.find((worktree) => worktree.path === createdPath); + const canonicalCreatedPath = await realpath(createdPath); + const created = (await Promise.all( + worktrees.map(async (worktree) => ({ + worktree, + canonicalPath: await realpath(worktree.path), + })) + )).find((entry) => entry.canonicalPath === canonicalCreatedPath)?.worktree; assert.ok(created, 'Expected created worktree to be registered'); assert.strictEqual(created?.detached, true); @@ -235,7 +247,7 @@ suite('Git Validation Helpers', () => { test('createWorktree resolves relative paths from the repository root', async () => { const repoDir = await createTempRepo(); const cwdDir = await mkdtemp(join(tmpdir(), 'paw-git-validation-cwd-')); - const relativePath = '../repo-feature'; + const relativePath = `../${basename(repoDir)}-relative-feature`; const expectedPath = resolve(repoDir, relativePath); const originalCwd = process.cwd(); @@ -479,7 +491,7 @@ suite('Git Validation Helpers', () => { targetBranch, }); const executionContract = await createExecutionContract(repoDir, worktreeDir, targetBranch); - await symlink(worktreeDir, linkPath); + await createDirectoryLink(worktreeDir, linkPath); await assert.rejects( () => validateReusableWorktree( @@ -561,7 +573,7 @@ suite('Git Validation Helpers', () => { const realWorktreePath = join(dirname(repoDir), `${basename(repoDir)}-feature`); try { - await symlink(dirname(repoDir), linkedParent); + await createDirectoryLink(dirname(repoDir), linkedParent); const createdPath = await createWorktree({ repositoryPath: repoDir, worktreePath: linkedWorktreePath, diff --git a/src/test/suite/handoffTool.test.ts b/src/test/suite/handoffTool.test.ts index 106e533b..fb535da9 100644 --- a/src/test/suite/handoffTool.test.ts +++ b/src/test/suite/handoffTool.test.ts @@ -5,7 +5,7 @@ import { HandoffParams } from '../../tools/handoffTool'; * Handoff tool tests. * * These unit tests verify the handoff tool's core logic: - * - Work ID validation (format, empty checks) + * - Work/review identifier validation (format, empty checks) * - Agent name enum type safety * - Inline instruction handling * - Prompt message construction @@ -16,7 +16,7 @@ import { HandoffParams } from '../../tools/handoffTool'; * these are fire-and-forget commands that don't expose testable interfaces. */ suite('Handoff Tool', () => { - suite('Work ID Validation', () => { + suite('Context Identifier Validation', () => { test('accepts valid Work IDs with lowercase letters, numbers, and hyphens', () => { const validWorkIds = [ 'feature-slug', @@ -30,7 +30,7 @@ suite('Handoff Tool', () => { validWorkIds.forEach(workId => { // Should not throw assert.doesNotThrow(() => { - validateWorkIdFormat(workId); + validateContextIdFormat('PAW', workId); }); }); }); @@ -44,7 +44,7 @@ suite('Handoff Tool', () => { invalidWorkIds.forEach(workId => { assert.throws( - () => validateWorkIdFormat(workId), + () => validateContextIdFormat('PAW', workId), /Invalid Work ID format/, `Should reject Work ID with uppercase: ${workId}` ); @@ -63,7 +63,7 @@ suite('Handoff Tool', () => { invalidWorkIds.forEach(workId => { assert.throws( - () => validateWorkIdFormat(workId), + () => validateContextIdFormat('PAW', workId), /Invalid Work ID format/, `Should reject Work ID with special chars: ${workId}` ); @@ -72,17 +72,48 @@ suite('Handoff Tool', () => { test('rejects empty Work ID', () => { assert.throws( - () => validateWorkIdFormat(''), + () => validateContextIdFormat('PAW', ''), /Work ID cannot be empty/ ); }); test('rejects whitespace-only Work ID', () => { assert.throws( - () => validateWorkIdFormat(' '), + () => validateContextIdFormat('PAW', ' '), /Work ID cannot be empty/ ); }); + + test('accepts PAW Review identifiers for GitHub and local review contexts', () => { + const validReviewIds = [ + 'PR-123', + 'PR-123-my-api', + 'feature-review-state-test', + ]; + + validReviewIds.forEach(reviewId => { + assert.doesNotThrow(() => { + validateContextIdFormat('PAW Review', reviewId); + }); + }); + }); + + test('rejects invalid PAW Review identifiers', () => { + const invalidReviewIds = [ + 'PR_123', + 'PR/123', + 'Review-123', + '../etc/passwd', + ]; + + invalidReviewIds.forEach(reviewId => { + assert.throws( + () => validateContextIdFormat('PAW Review', reviewId), + /Invalid review identifier format/, + `Should reject invalid review identifier: ${reviewId}` + ); + }); + }); }); suite('Agent Name Enum', () => { @@ -95,7 +126,7 @@ suite('Handoff Tool', () => { validAgentNames.forEach(agentName => { const params: HandoffParams = { target_agent: agentName, - work_id: 'test-feature', + work_id: agentName === 'PAW Review' ? 'PR-123' : 'test-feature', }; assert.strictEqual(params.target_agent, agentName); }); @@ -138,6 +169,8 @@ suite('Handoff Tool', () => { }; const prompt = constructPromptMessage(params); assert.ok(prompt.includes('Work ID: test-feature')); + assert.ok(prompt.includes('WorkflowContext.md')); + assert.ok(prompt.includes('legacy best-effort mode')); assert.ok(prompt.includes('Focus on error handling')); }); @@ -147,12 +180,25 @@ suite('Handoff Tool', () => { work_id: 'test-feature', }; const prompt = constructPromptMessage(params); - assert.strictEqual(prompt, 'Work ID: test-feature'); + assert.ok(prompt.includes('Work ID: test-feature')); + assert.ok(prompt.includes('WorkflowContext.md')); + assert.ok(prompt.includes('legacy best-effort mode')); + }); + + test('constructs review handoff prompt with ReviewContext guidance', () => { + const params: HandoffParams = { + target_agent: 'PAW Review', + work_id: 'PR-123', + }; + const prompt = constructPromptMessage(params); + assert.ok(prompt.includes('Review ID: PR-123')); + assert.ok(prompt.includes('ReviewContext.md')); + assert.ok(prompt.includes('legacy best-effort mode')); }); }); suite('Prompt Message Construction', () => { - test('includes Work ID in prompt', () => { + test('includes Work ID in PAW prompt', () => { const params: HandoffParams = { target_agent: 'PAW', work_id: 'test-feature', @@ -161,6 +207,15 @@ suite('Handoff Tool', () => { assert.ok(prompt.includes('Work ID: test-feature')); }); + test('includes Review ID in PAW Review prompt', () => { + const params: HandoffParams = { + target_agent: 'PAW Review', + work_id: 'PR-123-my-api', + }; + const prompt = constructPromptMessage(params); + assert.ok(prompt.includes('Review ID: PR-123-my-api')); + }); + test('appends inline instruction when provided', () => { const params: HandoffParams = { target_agent: 'PAW', @@ -190,7 +245,7 @@ suite('Handoff Tool', () => { suite('Error Handling', () => { test('validation rejects invalid Work ID format', () => { assert.throws( - () => validateWorkIdFormat('Invalid_Work_ID'), + () => validateContextIdFormat('PAW', 'Invalid_Work_ID'), /Invalid Work ID format/, 'Should reject Work ID with underscores' ); @@ -198,7 +253,7 @@ suite('Handoff Tool', () => { test('validation rejects empty Work ID', () => { assert.throws( - () => validateWorkIdFormat(''), + () => validateContextIdFormat('PAW', ''), /Work ID cannot be empty/, 'Should reject empty Work ID' ); @@ -206,7 +261,7 @@ suite('Handoff Tool', () => { test('validation accepts valid Work ID', () => { assert.doesNotThrow( - () => validateWorkIdFormat('test-feature'), + () => validateContextIdFormat('PAW', 'test-feature'), 'Should accept valid Work ID' ); }); @@ -219,6 +274,13 @@ suite('Handoff Tool', () => { assert.ok(message.includes('test-feature')); }); + test('constructs review confirmation message with review identifier', () => { + const message = constructConfirmationMessage('PAW Review', 'PR-123-my-api'); + assert.ok(message.includes('PAW Review')); + assert.ok(message.includes('PR-123-my-api')); + assert.ok(message.includes('review')); + }); + test('constructs confirmation message with inline instruction when provided', () => { const message = constructConfirmationMessage('PAW', 'test-feature', 'Phase 2'); assert.ok(message.includes('PAW')); @@ -239,13 +301,25 @@ suite('Handoff Tool', () => { // Helper functions to test tool logic without VS Code Language Model API /** - * Validates Work ID format (mirrors validation in handoffTool.ts) + * Validates work/review identifiers (mirrors validation in handoffTool.ts) */ -function validateWorkIdFormat(workId: string): void { +function validateContextIdFormat(targetAgent: HandoffParams['target_agent'], workId: string): void { const WORK_ID_PATTERN = /^[a-z0-9-]+$/; + const REVIEW_ID_PATTERN = /^(?:PR-\d+(?:-[a-z0-9-]+)?|[a-z0-9-]+)$/; if (!workId || workId.trim().length === 0) { - throw new Error('Work ID cannot be empty'); + throw new Error(targetAgent === 'PAW Review' ? 'Review ID cannot be empty' : 'Work ID cannot be empty'); + } + + if (targetAgent === 'PAW Review') { + if (!REVIEW_ID_PATTERN.test(workId)) { + throw new Error( + `Invalid review identifier format: "${workId}". ` + + 'Review identifiers must be PR IDs like "PR-123" or "PR-123-my-repo", or lowercase local review slugs.' + ); + } + + return; } if (!WORK_ID_PATTERN.test(workId)) { @@ -260,7 +334,18 @@ function validateWorkIdFormat(workId: string): void { * Constructs prompt message (mirrors logic in handoffTool.ts) */ function constructPromptMessage(params: HandoffParams): string { - let prompt = `Work ID: ${params.work_id}`; + const contextLabel = params.target_agent === 'PAW Review' ? 'Review ID' : 'Work ID'; + let prompt = `${contextLabel}: ${params.work_id}`; + + if (params.target_agent === 'PAW Review') { + prompt += '\n\nBefore acting, read the existing review artifacts for this review identifier.'; + prompt += '\nUse `ReviewContext.md` as the durable review-state source when embedded hardened state is present.'; + prompt += '\nIf hardened review state is absent, continue in legacy best-effort mode and say so explicitly.'; + } else { + prompt += '\n\nBefore acting, read the existing workflow artifacts for this work item.'; + prompt += '\nUse `WorkflowContext.md` as the durable workflow-state source when embedded hardened state is present.'; + prompt += '\nIf hardened workflow state is absent, continue in legacy best-effort mode and say so explicitly.'; + } if (params.inline_instruction) { prompt += `\n\n${params.inline_instruction}`; @@ -284,7 +369,8 @@ function constructConfirmationMessage( workId: string, inlineInstruction?: string ): string { - let message = `This will start a new chat with **${targetAgent}** for feature: ${workId}`; + const targetLabel = targetAgent === 'PAW Review' ? 'review' : 'feature'; + let message = `This will start a new chat with **${targetAgent}** for ${targetLabel}: ${workId}`; if (inlineInstruction) { message += `\n\nWith instruction: "${inlineInstruction}"`; } diff --git a/src/test/suite/initializeWorkItem.test.ts b/src/test/suite/initializeWorkItem.test.ts index 6e0b1db8..2858516a 100644 --- a/src/test/suite/initializeWorkItem.test.ts +++ b/src/test/suite/initializeWorkItem.test.ts @@ -81,6 +81,11 @@ function createCurrentCheckoutInputs(): WorkItemInputs { reviewPolicy: 'final-pr-only', sessionPolicy: 'continuous', artifactLifecycle: 'commit-and-clean', + planningReview: { + enabled: false, + mode: 'single-model', + interactive: true, + }, finalReview: { enabled: true, mode: 'single-model', @@ -111,10 +116,25 @@ suite('Initialize Work Item Helpers', () => { reviewPolicy: 'final-pr-only', sessionPolicy: 'continuous', artifactLifecycle: 'commit-and-clean', + planningReview: { + enabled: true, + mode: 'society-of-thought', + interactive: 'smart', + specialists: 'all', + interactionMode: 'parallel', + specialistModels: 'none', + perspectives: 'auto', + perspectiveCap: 2, + }, finalReview: { enabled: true, - mode: 'single-model', + mode: 'society-of-thought', interactive: 'smart', + specialists: 'all', + interactionMode: 'parallel', + specialistModels: 'none', + perspectives: 'auto', + perspectiveCap: 2, }, issueUrl: 'https://github.com/example/repo/issues/123', executionMetadata: { @@ -130,6 +150,19 @@ suite('Initialize Work Item Helpers', () => { assert.ok(prompt.includes('- **work_id**: test-worktree')); assert.ok(prompt.includes('- **repository_identity**: github.com/example/repo@abc123')); assert.ok(prompt.includes('- **execution_binding**: worktree:test-worktree:feature/test-worktree')); + assert.ok(prompt.includes('- **planning_docs_review**: enabled')); + assert.ok(prompt.includes('- **planning_review_mode**: society-of-thought')); + assert.ok(prompt.includes('- **planning_review_specialists**: all')); + assert.ok(prompt.includes('- **planning_review_interaction_mode**: parallel')); + assert.ok(prompt.includes('- **planning_review_specialist_models**: none')); + assert.ok(prompt.includes('- **planning_review_perspectives**: auto')); + assert.ok(prompt.includes('- **planning_review_perspective_cap**: 2')); + assert.ok(prompt.includes('- **final_review_mode**: society-of-thought')); + assert.ok(prompt.includes('- **final_review_specialists**: all')); + assert.ok(prompt.includes('- **final_review_interaction_mode**: parallel')); + assert.ok(prompt.includes('- **final_review_specialist_models**: none')); + assert.ok(prompt.includes('- **final_review_perspectives**: auto')); + assert.ok(prompt.includes('- **final_review_perspective_cap**: 2')); }); test('paw-init skill template persists execution contract fields in WorkflowContext', () => { @@ -437,15 +470,21 @@ suite('Initialize Work Item Helpers', () => { {} ); assert.deepStrictEqual( - registry[createExecutionRegistryLookupKey( - executionMetadata.repositoryIdentity, - executionMetadata.executionBinding - )], - buildExecutionRegistryEntry( - executionMetadata, - '/tmp/repo-worktree', - 'feature/test-worktree' - ) + { + ...registry[createExecutionRegistryLookupKey( + executionMetadata.repositoryIdentity, + executionMetadata.executionBinding + )], + updatedAt: '', + }, + { + ...buildExecutionRegistryEntry( + executionMetadata, + '/tmp/repo-worktree', + 'feature/test-worktree' + ), + updatedAt: '', + } ); const pendingState = context.globalState.get>( diff --git a/src/test/suite/installer.test.ts b/src/test/suite/installer.test.ts index 05c9796a..d8cfae61 100644 --- a/src/test/suite/installer.test.ts +++ b/src/test/suite/installer.test.ts @@ -3,7 +3,9 @@ import * as fs from 'fs'; import * as path from 'path'; import * as os from 'os'; import * as vscode from 'vscode'; +import { loadAgentTemplates } from '../../agents/agentTemplates'; import { installAgents, needsInstallation, isDevelopmentVersion, removeInstalledAgents } from '../../agents/installer'; +import { loadPromptTemplates } from '../../agents/promptTemplates'; function mockWorkspaceConfiguration(promptsDir: string): vscode.WorkspaceConfiguration { return { @@ -256,10 +258,15 @@ suite('Agent Installation', () => { test('installAgents continues on individual file failures', async () => { const promptsDir = path.join(testDir, 'prompts'); fs.mkdirSync(promptsDir, { recursive: true }); - - // Make prompts directory read-only to cause write errors - fs.chmodSync(promptsDir, 0o444); - + const agentTemplates = loadAgentTemplates(mockContext.extension.extensionUri); + const promptTemplates = loadPromptTemplates(mockContext.extension.extensionUri); + const failingPaths = [...agentTemplates, ...promptTemplates].map((template) => + path.join(promptsDir, template.filename) + ); + for (const failingPath of failingPaths) { + fs.mkdirSync(failingPath, { recursive: true }); + } + const restoreConfig = overridePromptDirectoryConfig(promptsDir); try { @@ -274,8 +281,6 @@ suite('Agent Installation', () => { assert.ok(state, 'Should create installation state'); assert.strictEqual(state.success, false, 'Should mark as failed'); } finally { - // Restore permissions for cleanup - fs.chmodSync(promptsDir, 0o755); restoreConfig(); } }); diff --git a/src/test/suite/userInput.test.ts b/src/test/suite/userInput.test.ts index 941ce072..cd8d9994 100644 --- a/src/test/suite/userInput.test.ts +++ b/src/test/suite/userInput.test.ts @@ -1,6 +1,8 @@ import * as assert from 'assert'; import * as vscode from 'vscode'; import { + collectFinalReviewConfig, + collectPlanningReviewConfig, collectUserInputs, isValidBranchName, WorkflowMode, @@ -11,6 +13,7 @@ import { ArtifactLifecycle, WorkflowModeSelection, FinalReviewConfig, + PlanningReviewConfig, WORKTREE_EXECUTION_FEATURE_FLAG } from '../../ui/userInput'; @@ -410,6 +413,98 @@ suite('Final Review Config Types', () => { assert.strictEqual(config.mode, 'multi-model'); assert.strictEqual(config.interactive, 'smart'); }); + + test('FinalReviewConfig works with society-of-thought mode', () => { + const config: FinalReviewConfig = { + enabled: true, + mode: 'society-of-thought', + interactive: 'smart', + specialists: 'all', + interactionMode: 'parallel', + specialistModels: 'none', + perspectives: 'auto', + perspectiveCap: 2, + }; + + assert.strictEqual(config.mode, 'society-of-thought'); + assert.strictEqual(config.specialists, 'all'); + assert.strictEqual(config.interactionMode, 'parallel'); + assert.strictEqual(config.perspectives, 'auto'); + }); +}); + +suite('Planning Review Config Types', () => { + test('PlanningReviewConfig works when disabled', () => { + const config: PlanningReviewConfig = { + enabled: false, + mode: 'single-model', + interactive: true, + }; + + assert.strictEqual(config.enabled, false); + assert.strictEqual(config.mode, 'single-model'); + }); +}); + +suite('Structured Review Collection', () => { + test('collectFinalReviewConfig preserves society-of-thought selections', async () => { + const mutableWindow = vscode.window as unknown as { + showQuickPick: typeof vscode.window.showQuickPick; + showInputBox: typeof vscode.window.showInputBox; + }; + const originalShowQuickPick = mutableWindow.showQuickPick; + const originalShowInputBox = mutableWindow.showInputBox; + + try { + mutableWindow.showQuickPick = (async ( + items: ReadonlyArray<{ value: unknown }>, + options?: vscode.QuickPickOptions + ) => { + const title = options?.title ?? 'unknown quick pick'; + switch (title) { + case 'Final Agent Review': + return findQuickPickValue(items, true); + case 'Final Review Mode': + return findQuickPickValue(items, 'society-of-thought'); + case 'Final Review Interaction': + return findQuickPickValue(items, 'smart'); + case 'Final Review Specialists': + return findQuickPickValue(items, 'all'); + case 'Final Review Interaction Mode': + return findQuickPickValue(items, 'parallel'); + case 'Final Review Perspectives': + return findQuickPickValue(items, 'auto'); + default: + throw new Error(`Unexpected quick pick title: ${title}`); + } + }) as unknown as typeof vscode.window.showQuickPick; + + mutableWindow.showInputBox = (async () => { + throw new Error('No input box expected for default SoT selections'); + }) as unknown as typeof vscode.window.showInputBox; + + const config = await collectFinalReviewConfig(createOutputChannel()); + + assert.ok(config); + assert.strictEqual(config?.enabled, true); + assert.strictEqual(config?.mode, 'society-of-thought'); + assert.strictEqual(config?.interactive, 'smart'); + assert.strictEqual(config?.specialists, 'all'); + assert.strictEqual(config?.interactionMode, 'parallel'); + assert.strictEqual(config?.perspectives, 'auto'); + } finally { + mutableWindow.showQuickPick = originalShowQuickPick; + mutableWindow.showInputBox = originalShowInputBox; + } + }); + + test('collectPlanningReviewConfig keeps minimal workflows on disabled planning review', async () => { + const config = await collectPlanningReviewConfig(createOutputChannel(), 'minimal'); + + assert.ok(config); + assert.strictEqual(config?.enabled, false); + assert.strictEqual(config?.mode, 'single-model'); + }); }); /** diff --git a/src/tools/handoffTool.ts b/src/tools/handoffTool.ts index a0e56b70..f8402a2b 100644 --- a/src/tools/handoffTool.ts +++ b/src/tools/handoffTool.ts @@ -14,32 +14,48 @@ export type AgentName = "PAW" | "PAW Review"; export interface HandoffParams { /** The target agent to invoke */ target_agent: AgentName; - /** The normalized Work ID (feature slug) */ + /** Workflow work ID or review identifier, depending on the target agent */ work_id: string; /** Optional inline instruction to pass to the target agent (e.g., user feedback, prompt file paths, or agent-to-agent context) */ inline_instruction?: string; } /** - * Pattern to validate Work ID (feature slug) format. - * Work IDs must contain only lowercase letters, numbers, and hyphens. + * Patterns to validate PAW workflow and review resume identifiers. */ const WORK_ID_PATTERN = /^[a-z0-9-]+$/; +const REVIEW_ID_PATTERN = /^(?:PR-\d+(?:-[a-z0-9-]+)?|[a-z0-9-]+)$/; /** - * Validates Work ID format. - * - * @param workId - Work ID to validate - * @throws Error if Work ID is invalid + * Validates the workflow work ID or review identifier for a handoff target. */ -function validateWorkId(workId: string): void { - if (!workId || workId.trim().length === 0) { - throw new Error("Work ID cannot be empty"); +function getContextLabel(targetAgent: AgentName): string { + return targetAgent === "PAW Review" ? "Review ID" : "Work ID"; +} + +function getConfirmationTarget(targetAgent: AgentName): string { + return targetAgent === "PAW Review" ? "review" : "feature"; +} + +function validateContextId(targetAgent: AgentName, contextId: string): void { + if (!contextId || contextId.trim().length === 0) { + throw new Error(`${getContextLabel(targetAgent)} cannot be empty`); } - if (!WORK_ID_PATTERN.test(workId)) { + if (targetAgent === "PAW Review") { + if (!REVIEW_ID_PATTERN.test(contextId)) { + throw new Error( + `Invalid review identifier format: "${contextId}". ` + + 'Review identifiers must be PR IDs like "PR-123" or "PR-123-my-repo", or lowercase local review slugs.' + ); + } + + return; + } + + if (!WORK_ID_PATTERN.test(contextId)) { throw new Error( - `Invalid Work ID format: "${workId}". ` + + `Invalid Work ID format: "${contextId}". ` + "Work IDs must contain only lowercase letters, numbers, and hyphens." ); } @@ -52,7 +68,8 @@ function validateWorkId(workId: string): void { * @returns Formatted prompt string */ function constructPrompt(params: HandoffParams): string { - let prompt = `Work ID: ${params.work_id}`; + let prompt = `${getContextLabel(params.target_agent)}: ${params.work_id}`; + prompt += `\n\n${buildResumeInstructions(params.target_agent)}`; if (params.inline_instruction) { prompt += `\n\n${params.inline_instruction}`; @@ -61,6 +78,22 @@ function constructPrompt(params: HandoffParams): string { return prompt; } +function buildResumeInstructions(targetAgent: AgentName): string { + if (targetAgent === 'PAW Review') { + return [ + 'Before acting, read the existing review artifacts for this review identifier.', + 'Use `ReviewContext.md` as the durable review-state source when embedded hardened state is present.', + 'If hardened review state is absent, continue in legacy best-effort mode and say so explicitly.', + ].join('\n'); + } + + return [ + 'Before acting, read the existing workflow artifacts for this work item.', + 'Use `WorkflowContext.md` as the durable workflow-state source when embedded hardened state is present.', + 'If hardened workflow state is absent, continue in legacy best-effort mode and say so explicitly.', + ].join('\n'); +} + /** * Invokes a new chat with the target agent. * Uses fire-and-forget pattern - cannot wait for agent completion. @@ -73,9 +106,10 @@ async function invokeAgent( outputChannel: vscode.OutputChannel ): Promise { const prompt = constructPrompt(params); + const contextLabel = getContextLabel(params.target_agent); outputChannel.appendLine(`[INFO] Invoking agent: ${params.target_agent}`); - outputChannel.appendLine(`[INFO] Work ID: ${params.work_id}`); + outputChannel.appendLine(`[INFO] ${contextLabel}: ${params.work_id}`); if (params.inline_instruction) { outputChannel.appendLine(`[INFO] Inline instruction provided`); } @@ -107,8 +141,10 @@ export function registerHandoffTool( const tool = vscode.lm.registerTool("paw_new_session", { async prepareInvocation(options, _token) { const { target_agent, work_id, inline_instruction } = options.input; + const contextLabel = getContextLabel(target_agent); + const contextTarget = getConfirmationTarget(target_agent); - let message = `This will start a new chat with **${target_agent}** for feature: ${work_id}`; + let message = `This will start a new chat with **${target_agent}** for ${contextTarget}: ${work_id}`; if (inline_instruction) { message += `\n\nWith instruction: "${inline_instruction}"`; } @@ -117,7 +153,9 @@ export function registerHandoffTool( invocationMessage: `Calling ${target_agent} for ${work_id}`, confirmationMessages: { title: "Call PAW Agent", - message: new vscode.MarkdownString(message), + message: new vscode.MarkdownString( + `${contextLabel}: ${work_id}\n\n${message}` + ), }, }; }, @@ -132,8 +170,8 @@ export function registerHandoffTool( const params = options.input; - // Validate Work ID - validateWorkId(params.work_id); + // Validate work/review identifier + validateContextId(params.target_agent, params.work_id); // Invoke the target agent await invokeAgent(params, outputChannel); diff --git a/src/types/workflow.ts b/src/types/workflow.ts index 72b60c5f..2da37e20 100644 --- a/src/types/workflow.ts +++ b/src/types/workflow.ts @@ -22,4 +22,63 @@ export type SessionPolicy = "per-stage" | "continuous"; /** * Final Agent Review mode - determines how pre-PR review runs when enabled. */ -export type FinalReviewMode = "single-model" | "multi-model"; +export type ReviewMode = "single-model" | "multi-model" | "society-of-thought"; + +/** + * Final Agent Review mode - determines how pre-PR review runs when enabled. + */ +export type FinalReviewMode = ReviewMode; + +/** + * Planning review mode mirrors final review mode values. + */ +export type PlanningReviewMode = ReviewMode; + +/** + * Review specialist selection as passed through to paw-init. + * Allowed values are validated by the agent (for example `all`, `adaptive:3`, or a comma list). + */ +export type ReviewSpecialistSelection = string; + +/** + * Review interaction mode for society-of-thought specialist execution. + */ +export type ReviewInteractionMode = "parallel" | "debate"; + +/** + * Review perspective selection as passed through to paw-init. + * Allowed values are validated by the agent (for example `none`, `auto`, or a comma list). + */ +export type ReviewPerspectiveSelection = string; + +/** + * Shared review configuration used by VS Code initialization surfaces. + */ +export interface ReviewConfig { + /** Whether the review surface is enabled */ + enabled: boolean; + + /** Exact configured review procedure */ + mode: ReviewMode; + + /** Whether the review is interactive, auto-apply, or smart */ + interactive: boolean | "smart"; + + /** Model pool for single-model or multi-model review */ + models?: string; + + /** Society-of-thought specialist selection */ + specialists?: ReviewSpecialistSelection; + + /** Society-of-thought interaction mode */ + interactionMode?: ReviewInteractionMode; + + /** Optional specialist-specific model pool or pinning rules */ + specialistModels?: string; + + /** Perspective selection */ + perspectives?: ReviewPerspectiveSelection; + + /** Maximum perspectives to apply per specialist */ + perspectiveCap?: number; +} diff --git a/src/ui/userInput.ts b/src/ui/userInput.ts index fa961a1e..47472305 100644 --- a/src/ui/userInput.ts +++ b/src/ui/userInput.ts @@ -1,5 +1,14 @@ import * as vscode from 'vscode'; -import type { ReviewPolicy, SessionPolicy } from '../types/workflow'; +import type { + FinalReviewMode, + PlanningReviewMode, + ReviewConfig as AgentReviewConfig, + ReviewInteractionMode, + ReviewPerspectiveSelection, + ReviewPolicy, + ReviewSpecialistSelection, + SessionPolicy, +} from '../types/workflow'; // Re-export types for consumers that import from userInput.ts export type { ReviewPolicy, SessionPolicy } from '../types/workflow'; @@ -39,6 +48,13 @@ export type ReviewStrategy = 'prs' | 'local'; */ export type ArtifactLifecycle = 'commit-and-clean' | 'commit-and-persist' | 'never-commit'; +const DEFAULT_REVIEW_MODELS = 'latest GPT, latest Gemini, latest Claude Opus'; +const DEFAULT_REVIEW_SPECIALISTS: ReviewSpecialistSelection = 'all'; +const DEFAULT_REVIEW_INTERACTION_MODE: ReviewInteractionMode = 'parallel'; +const DEFAULT_REVIEW_SPECIALIST_MODELS = 'none'; +const DEFAULT_REVIEW_PERSPECTIVES: ReviewPerspectiveSelection = 'auto'; +const DEFAULT_REVIEW_PERSPECTIVE_CAP = 2; + export interface WorktreeConfig { /** Whether to create a fresh worktree or reuse an existing one */ strategy: WorktreeStrategy; @@ -66,16 +82,12 @@ export interface WorkflowModeSelection { /** * Final Agent Review configuration for pre-PR review step. */ -export interface FinalReviewConfig { - /** Whether Final Agent Review is enabled */ - enabled: boolean; - - /** Review mode: single-model or multi-model (only applies if enabled) */ - mode: 'single-model' | 'multi-model'; - - /** Whether review is interactive (apply/skip/discuss), auto-apply, or smart (auto-apply consensus, interactive for ambiguous) */ - interactive: boolean | 'smart'; -} +export interface FinalReviewConfig extends AgentReviewConfig {} + +/** + * Planning documents review configuration captured during initialization. + */ +export interface PlanningReviewConfig extends AgentReviewConfig {} /** * User inputs collected for work item initialization. @@ -121,6 +133,9 @@ export interface WorkItemInputs { * - never-commit: Local only, never tracked in git */ artifactLifecycle: ArtifactLifecycle; + + /** Planning documents review configuration */ + planningReview: PlanningReviewConfig; /** Final Agent Review configuration for pre-PR review step */ finalReview: FinalReviewConfig; @@ -143,6 +158,330 @@ export function isWorktreeExecutionEnabled(): boolean { return vscode.workspace.getConfiguration('paw').get(WORKTREE_EXECUTION_FEATURE_FLAG, false); } +function createEnabledReviewConfig(mode: FinalReviewMode | PlanningReviewMode): AgentReviewConfig { + return { + enabled: true, + mode, + interactive: 'smart', + models: DEFAULT_REVIEW_MODELS, + specialists: DEFAULT_REVIEW_SPECIALISTS, + interactionMode: DEFAULT_REVIEW_INTERACTION_MODE, + specialistModels: DEFAULT_REVIEW_SPECIALIST_MODELS, + perspectives: DEFAULT_REVIEW_PERSPECTIVES, + perspectiveCap: DEFAULT_REVIEW_PERSPECTIVE_CAP, + }; +} + +function createDisabledReviewConfig(): AgentReviewConfig { + return { + enabled: false, + mode: 'single-model', + interactive: true, + models: DEFAULT_REVIEW_MODELS, + specialists: DEFAULT_REVIEW_SPECIALISTS, + interactionMode: DEFAULT_REVIEW_INTERACTION_MODE, + specialistModels: DEFAULT_REVIEW_SPECIALIST_MODELS, + perspectives: DEFAULT_REVIEW_PERSPECTIVES, + perspectiveCap: DEFAULT_REVIEW_PERSPECTIVE_CAP, + }; +} + +async function collectReviewSpecialists( + outputChannel: vscode.OutputChannel, + reviewLabel: string +): Promise { + const selection = await vscode.window.showQuickPick( + [ + { + label: 'All Specialists', + description: 'Run every available specialist (default)', + detail: 'Preserves the widest review surface', + value: 'all' as const, + }, + { + label: 'Adaptive Subset', + description: 'Let PAW choose a focused subset', + detail: 'Stored as adaptive:N in WorkflowContext.md', + value: 'adaptive' as const, + }, + { + label: 'Custom List', + description: 'Enter a comma-separated specialist list', + detail: 'Use exact specialist names', + value: 'custom' as const, + }, + ], + { + placeHolder: 'Select specialist configuration', + title: `${reviewLabel} Specialists`, + } + ); + + if (!selection) { + outputChannel.appendLine(`[INFO] ${reviewLabel} specialist selection cancelled`); + return undefined; + } + + if (selection.value === 'all') { + return DEFAULT_REVIEW_SPECIALISTS; + } + + if (selection.value === 'adaptive') { + const adaptiveCount = await vscode.window.showInputBox({ + prompt: `Enter ${reviewLabel.toLowerCase()} adaptive specialist count`, + placeHolder: '3', + validateInput: (value: string) => { + const parsed = Number.parseInt(value, 10); + return Number.isInteger(parsed) && parsed > 0 + ? undefined + : 'Enter a positive integer'; + }, + }); + + if (adaptiveCount === undefined) { + outputChannel.appendLine(`[INFO] ${reviewLabel} adaptive specialist count cancelled`); + return undefined; + } + + return `adaptive:${Number.parseInt(adaptiveCount, 10)}`; + } + + const customSpecialists = await vscode.window.showInputBox({ + prompt: `Enter ${reviewLabel.toLowerCase()} specialists (comma-separated)`, + placeHolder: 'security, performance, maintainability', + validateInput: (value: string) => + value.trim().length > 0 ? undefined : 'Enter at least one specialist name', + }); + + if (customSpecialists === undefined) { + outputChannel.appendLine(`[INFO] ${reviewLabel} custom specialist entry cancelled`); + return undefined; + } + + return customSpecialists.trim(); +} + +async function collectReviewInteractionMode( + outputChannel: vscode.OutputChannel, + reviewLabel: string +): Promise { + const selection = await vscode.window.showQuickPick( + [ + { + label: 'Parallel', + description: 'Run specialists independently (default)', + detail: 'Fastest path for broad coverage', + value: 'parallel' as ReviewInteractionMode, + }, + { + label: 'Debate', + description: 'Have specialists challenge each other', + detail: 'Slower, but useful for deeper trade-off analysis', + value: 'debate' as ReviewInteractionMode, + }, + ], + { + placeHolder: 'Select specialist interaction mode', + title: `${reviewLabel} Interaction Mode`, + } + ); + + if (!selection) { + outputChannel.appendLine(`[INFO] ${reviewLabel} interaction mode selection cancelled`); + return undefined; + } + + return selection.value; +} + +async function collectReviewPerspectives( + outputChannel: vscode.OutputChannel, + reviewLabel: string +): Promise { + const selection = await vscode.window.showQuickPick( + [ + { + label: 'Auto', + description: 'Let PAW choose perspective overlays (default)', + detail: 'Uses the built-in auto perspective strategy', + value: 'auto' as ReviewPerspectiveSelection, + }, + { + label: 'None', + description: 'Run without extra perspective overlays', + detail: 'Keeps review in its base specialist mode', + value: 'none' as ReviewPerspectiveSelection, + }, + { + label: 'Custom List', + description: 'Enter comma-separated perspective names', + detail: 'Use exact perspective names', + value: 'custom' as const, + }, + ], + { + placeHolder: 'Select perspective configuration', + title: `${reviewLabel} Perspectives`, + } + ); + + if (!selection) { + outputChannel.appendLine(`[INFO] ${reviewLabel} perspective selection cancelled`); + return undefined; + } + + if (selection.value !== 'custom') { + return selection.value; + } + + const customPerspectives = await vscode.window.showInputBox({ + prompt: `Enter ${reviewLabel.toLowerCase()} perspectives (comma-separated)`, + placeHolder: 'premortem, red-team', + validateInput: (value: string) => + value.trim().length > 0 ? undefined : 'Enter at least one perspective name', + }); + + if (customPerspectives === undefined) { + outputChannel.appendLine(`[INFO] ${reviewLabel} custom perspective entry cancelled`); + return undefined; + } + + return customPerspectives.trim(); +} + +async function collectReviewConfig( + outputChannel: vscode.OutputChannel, + options: { + enableTitle: string; + enablePrompt: string; + enabledDescription: string; + disabledDescription: string; + reviewLabel: string; + } +): Promise { + const enabledSelection = await vscode.window.showQuickPick( + [ + { + label: 'Enabled', + description: options.enabledDescription, + detail: 'Stored explicitly in WorkflowContext.md', + value: true, + }, + { + label: 'Disabled', + description: options.disabledDescription, + detail: 'Stored explicitly in WorkflowContext.md', + value: false, + }, + ], + { + placeHolder: options.enablePrompt, + title: options.enableTitle, + } + ); + + if (!enabledSelection) { + outputChannel.appendLine(`[INFO] ${options.enableTitle} selection cancelled`); + return undefined; + } + + if (!enabledSelection.value) { + return createDisabledReviewConfig(); + } + + const modeSelection = await vscode.window.showQuickPick( + [ + { + label: 'Multi-Model', + description: 'Preserve the default multi-model review path', + detail: 'Uses the standard multi-model review configuration', + value: 'multi-model' as FinalReviewMode, + }, + { + label: 'Single Model', + description: 'Use a single model for this review surface', + detail: 'Fastest review path', + value: 'single-model' as FinalReviewMode, + }, + { + label: 'Society-of-Thought', + description: 'Use specialist personas and explicit review coordination', + detail: 'Preserves SoT configuration in WorkflowContext.md', + value: 'society-of-thought' as FinalReviewMode, + }, + ], + { + placeHolder: 'Select review mode', + title: `${options.reviewLabel} Mode`, + } + ); + + if (!modeSelection) { + outputChannel.appendLine(`[INFO] ${options.reviewLabel} mode selection cancelled`); + return undefined; + } + + const interactiveSelection = await vscode.window.showQuickPick( + [ + { + label: 'Smart', + description: 'Auto-apply obvious fixes, stay interactive for ambiguity (default)', + detail: 'Matches the PAW default interactive mode', + value: 'smart' as boolean | 'smart', + }, + { + label: 'Interactive', + description: 'Review each finding before applying changes', + detail: 'Maximizes user control during review resolution', + value: true as boolean | 'smart', + }, + { + label: 'Auto-Apply', + description: 'Apply recommended fixes automatically', + detail: 'Minimizes interaction during review resolution', + value: false as boolean | 'smart', + }, + ], + { + placeHolder: 'Select interaction mode', + title: `${options.reviewLabel} Interaction`, + } + ); + + if (!interactiveSelection) { + outputChannel.appendLine(`[INFO] ${options.reviewLabel} interaction selection cancelled`); + return undefined; + } + + const reviewConfig = createEnabledReviewConfig(modeSelection.value); + reviewConfig.interactive = interactiveSelection.value; + + if (modeSelection.value !== 'society-of-thought') { + return reviewConfig; + } + + const specialists = await collectReviewSpecialists(outputChannel, options.reviewLabel); + if (specialists === undefined) { + return undefined; + } + + const interactionMode = await collectReviewInteractionMode(outputChannel, options.reviewLabel); + if (interactionMode === undefined) { + return undefined; + } + + const perspectives = await collectReviewPerspectives(outputChannel, options.reviewLabel); + if (perspectives === undefined) { + return undefined; + } + + reviewConfig.specialists = specialists; + reviewConfig.interactionMode = interactionMode; + reviewConfig.perspectives = perspectives; + + return reviewConfig; +} + /** * Determine whether the provided branch name uses only valid characters. */ @@ -547,11 +886,9 @@ export async function collectArtifactLifecycle( * * Presents Quick Pick menus for: * 1. Enable/disable Final Agent Review - * 2. If enabled: Review mode (single-model or multi-model) + * 2. If enabled: Exact review mode (single-model, multi-model, or society-of-thought) * 3. If enabled: Interactive mode (apply/skip/discuss vs auto-apply) - * - * Note: VS Code can only execute single-model mode. Multi-model is CLI-only - * and will fall back to single-model in VS Code. + * 4. If society-of-thought: specialist, interaction-mode, and perspective settings * * @param outputChannel - Output channel for logging user interaction events * @returns Promise resolving to FinalReviewConfig, or undefined if user cancelled @@ -559,83 +896,37 @@ export async function collectArtifactLifecycle( export async function collectFinalReviewConfig( outputChannel: vscode.OutputChannel ): Promise { - // Step 1: Enable/disable Final Agent Review - const enabledSelection = await vscode.window.showQuickPick( - [ - { - label: "Enabled", - description: "Run automated review before Final PR (default)", - detail: "Catches issues before external PR review", - value: true, - }, - { - label: "Disabled", - description: "Skip pre-PR review step", - detail: "Go directly from implementation to Final PR", - value: false, - }, - ], - { - placeHolder: "Enable Final Agent Review?", - title: "Final Agent Review", - } - ); - - if (!enabledSelection) { - outputChannel.appendLine("[INFO] Final Agent Review selection cancelled"); - return undefined; - } - - // If disabled, return config with defaults for mode/interactive - if (!enabledSelection.value) { - return { - enabled: false, - mode: 'single-model', - interactive: true, - }; - } - - // VS Code only supports single-model (multi-model requires parallel subagents) - // Skip mode selection - always use single-model - - // Step 2: Interactive mode (only if enabled) - const interactiveSelection = await vscode.window.showQuickPick( - [ - { - label: "Smart", - description: "Auto-apply consensus fixes, interactive for ambiguous (default)", - detail: "In multi-model (CLI): obvious fixes applied automatically. In single-model: interactive", - value: 'smart' as boolean | 'smart', - }, - { - label: "Interactive", - description: "Review each finding: apply, skip, or discuss", - detail: "You control what gets changed", - value: true as boolean | 'smart', - }, - { - label: "Auto-Apply", - description: "Automatically apply recommended fixes", - detail: "Faster but less control over changes", - value: false as boolean | 'smart', - }, - ], - { - placeHolder: "Select interaction mode", - title: "Final Review Interaction", - } - ); + return collectReviewConfig(outputChannel, { + enableTitle: 'Final Agent Review', + enablePrompt: 'Enable Final Agent Review?', + enabledDescription: 'Run automated review before Final PR (default)', + disabledDescription: 'Skip pre-PR review step', + reviewLabel: 'Final Review', + }); +} - if (!interactiveSelection) { - outputChannel.appendLine("[INFO] Final review interaction selection cancelled"); - return undefined; +/** + * Collect planning documents review configuration from user. + * + * Minimal workflows skip this surface and keep planning review disabled to + * preserve the existing minimal workflow contract. + */ +export async function collectPlanningReviewConfig( + outputChannel: vscode.OutputChannel, + workflowMode: WorkflowMode +): Promise { + if (workflowMode === 'minimal') { + outputChannel.appendLine('[INFO] Planning docs review disabled in minimal workflow mode'); + return createDisabledReviewConfig(); } - return { - enabled: true, - mode: 'single-model', - interactive: interactiveSelection.value, - }; + return collectReviewConfig(outputChannel, { + enableTitle: 'Planning Docs Review', + enablePrompt: 'Enable Planning Docs Review?', + enabledDescription: 'Review Spec, CodeResearch, and ImplementationPlan before coding (default)', + disabledDescription: 'Skip the planning documents review stage', + reviewLabel: 'Planning Review', + }); } /** @@ -747,6 +1038,12 @@ export async function collectUserInputs( return undefined; } + // Collect planning documents review configuration + const planningReview = await collectPlanningReviewConfig(outputChannel, workflowMode.mode); + if (planningReview === undefined) { + return undefined; + } + // Collect Final Agent Review configuration const finalReview = await collectFinalReviewConfig(outputChannel); if (finalReview === undefined) { @@ -762,6 +1059,7 @@ export async function collectUserInputs( reviewPolicy, sessionPolicy, artifactLifecycle, + planningReview, finalReview, issueUrl: issueUrl.trim() === '' ? undefined : issueUrl.trim() }; diff --git a/tests/integration/tests/skills/control-state-contract.test.ts b/tests/integration/tests/skills/control-state-contract.test.ts new file mode 100644 index 00000000..1f2b98b6 --- /dev/null +++ b/tests/integration/tests/skills/control-state-contract.test.ts @@ -0,0 +1,161 @@ +import { describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { resolve } from "path"; +import { fileURLToPath } from "url"; + +const __dirname = fileURLToPath(new URL(".", import.meta.url)); +const REPO_ROOT = resolve(__dirname, "../../../.."); + +async function readRepoFile(relativePath: string): Promise { + return readFile(resolve(REPO_ROOT, relativePath), "utf-8"); +} + +type ParsedItem = { id: string; status: string; kind: string }; + +function parseControlStateFixture(content: string): { + todoMirror: string; + reconciliation: string; + items: ParsedItem[]; + terminalMarkers: string[]; +} { + const todoMirror = content.match(/^TODO Mirror:\s*(.+)$/m)?.[1]?.trim() ?? ""; + const reconciliation = content.match(/^Reconciliation:\s*(.+)$/m)?.[1]?.trim() ?? ""; + const items = [...content.matchAll(/^- `([^`]+)` \| `([^`]+)` \| `([^`]+)`$/gm)] + .map(([, id, status, kind]) => ({ id, status, kind })); + const terminalSection = content.split("### Terminal External Review State")[1] ?? ""; + const terminalMarkers = [...terminalSection.matchAll(/^- `([^`]+)`$/gm)] + .map(([, marker]) => marker); + + return { todoMirror, reconciliation, items, terminalMarkers }; +} + +function serializeControlStateFixture(parsed: { + todoMirror: string; + reconciliation: string; + items: ParsedItem[]; + terminalMarkers: string[]; +}): string { + const lines = [ + "## Hardened State", + "", + `TODO Mirror: ${parsed.todoMirror}`, + `Reconciliation: ${parsed.reconciliation}`, + "", + "### Required Workflow Items", + ...parsed.items.map((item) => `- \`${item.id}\` | \`${item.status}\` | \`${item.kind}\``), + "", + "### Terminal External Review State", + ...parsed.terminalMarkers.map((marker) => `- \`${marker}\``), + ]; + + return lines.join("\n"); +} + +describe("control-state contract content", () => { + it("defines a shared hardened-state core for workflow and review references", async () => { + const workflow = await readRepoFile("skills/paw-workflow/references/control-state-contract.md"); + const review = await readRepoFile("skills/paw-review-workflow/references/control-state-contract.md"); + + for (const content of [workflow, review]) { + assert.match(content, /## Shared Core/); + assert.match(content, /legacy best-effort mode/i); + assert.match(content, /TODOs are an execution mirror|built-in TODOs are a mirror/i); + assert.match(content, /- \\`\\` \| \\`\\` \| \\`\\`/); + assert.match(content, /`pending`/); + assert.match(content, /`in_progress`/); + assert.match(content, /`blocked`/); + assert.match(content, /`resolved`/); + assert.match(content, /`not_applicable`/); + assert.match(content, /`not_run`/); + assert.match(content, /`current`/); + assert.match(content, /`stale`/); + assert.match(content, /`external_unverified`/); + assert.match(content, /## Hardened State/); + } + + assert.match(workflow, /phase::/); + assert.match(workflow, /procedure:planning-review/); + assert.match( + workflow, + /Use `not_applicable` for `spec`, `spec-review`, and `transition:after-spec-review` when `Workflow Mode` is `minimal`; otherwise use `pending`\./, + ); + assert.match( + workflow, + /Use `pending` for `planning-docs-review`, `transition:after-planning-docs-review`, and `procedure:planning-review` when `Planning Docs Review` is `enabled`; otherwise use `not_applicable`\./, + ); + assert.match( + workflow, + /Use `pending` for `final-review`, `transition:after-final-review`, and `procedure:final-review` when `Final Agent Review` is `enabled`; otherwise use `not_applicable`\./, + ); + assert.match(review, /output:feedback/); + assert.match(review, /pending-review-created/); + assert.match(review, /manual-posting-provided/); + assert.match(review, /must contain exactly one current marker/i); + }); + + it("documents hardened-state templates in init and review-understanding", async () => { + const init = await readRepoFile("skills/paw-init/SKILL.md"); + const review = await readRepoFile("skills/paw-review-understanding/SKILL.md"); + + assert.match(init, /## Hardened State/); + assert.match(init, /TODO Mirror:\s*active-required-items/i); + assert.match( + init, + /`spec` \| `` \| `activity`/, + ); + assert.match( + init, + /Use `not_applicable` for `spec`, `spec-review`, and `transition:after-spec-review` when `Workflow Mode` is `minimal`; otherwise use `pending`\./, + ); + assert.match( + init, + /After planning defines named implementation phases, append `phase::` items under `### Required Workflow Items`\./, + ); + assert.doesNotMatch( + init, + /- add `phase::` items after planning defines named implementation phases/, + ); + assert.match( + init, + /`planning-docs-review` \| `` \| `activity`/, + ); + assert.match( + init, + /Use `pending` for `planning-docs-review`, `transition:after-planning-docs-review`, and `procedure:planning-review` when `Planning Docs Review` is `enabled`; otherwise use `not_applicable`\./, + ); + assert.match( + init, + /Use `pending` for `final-review`, `transition:after-final-review`, and `procedure:final-review` when `Final Agent Review` is `enabled`; otherwise use `not_applicable`\./, + ); + assert.match(init, /transition:after-plan-review/); + assert.match(init, /procedure:planning-review/); + + assert.match(review, /## Hardened State/); + assert.match(review, /output:critique-response/); + assert.match(review, /Pending Review ID:\s*`none`/i); + }); + + it("parses serialized control-state fixtures idempotently", () => { + const fixture = [ + "## Hardened State", + "", + "TODO Mirror: active-required-items", + "Reconciliation: current", + "", + "### Required Workflow Items", + "- `spec` | `resolved` | `activity`", + "- `plan-review` | `blocked` | `activity`", + "- `transition:after-plan-review` | `pending` | `transition`", + "", + "### Terminal External Review State", + "- `none`", + "- `pending-review-created`", + ].join("\n"); + + const parsed = parseControlStateFixture(fixture); + const reparsed = parseControlStateFixture(serializeControlStateFixture(parsed)); + + assert.deepStrictEqual(reparsed, parsed); + }); +}); diff --git a/tests/integration/tests/skills/control-state-reconciliation-content.test.ts b/tests/integration/tests/skills/control-state-reconciliation-content.test.ts new file mode 100644 index 00000000..8f988bd6 --- /dev/null +++ b/tests/integration/tests/skills/control-state-reconciliation-content.test.ts @@ -0,0 +1,50 @@ +import { describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { resolve } from "path"; +import { fileURLToPath } from "url"; + +const __dirname = fileURLToPath(new URL(".", import.meta.url)); +const REPO_ROOT = resolve(__dirname, "../../../.."); + +async function readRepoFile(relativePath: string): Promise { + return readFile(resolve(REPO_ROOT, relativePath), "utf-8"); +} + +describe("control-state reconciliation content", () => { + it("documents shared reconciliation rules in the workflow contract", async () => { + const workflow = await readRepoFile("skills/paw-workflow/references/control-state-contract.md"); + + assert.match(workflow, /## Reconciliation Rules/); + assert.match(workflow, /Mutation-affecting decisions include delegation, transition advancement, review execution, git mutation, and final PR creation\./); + assert.match(workflow, /Read-only status\/reporting may continue when reconciliation is `stale` or `external_unverified`/); + assert.match(workflow, /continue in legacy best-effort mode and explicitly report that hardened protections are inactive/i); + assert.match(workflow, /Configured procedure items become `resolved` only when the configured mode actually ran\./); + }); + + it("requires reconciliation before mutation-affecting PAW actions", async () => { + const transition = await readRepoFile("skills/paw-transition/SKILL.md"); + const status = await readRepoFile("skills/paw-status/SKILL.md"); + const gitOps = await readRepoFile("skills/paw-git-operations/SKILL.md"); + const pawAgent = await readRepoFile("agents/PAW.agent.md"); + const pawPr = await readRepoFile("skills/paw-pr/SKILL.md"); + + assert.match(transition, /first non-terminal required activity item/i); + assert.match(transition, /preflight:\s*blocked:\s*reconciliation incomplete/i); + assert.match(transition, /TODOs are a mirror only/i); + + assert.match(status, /read-only\/unverified/i); + assert.match(status, /hardened protections are inactive/i); + assert.match(status, /do not report progress past unresolved required items, gate items, or procedure items/i); + + assert.match(gitOps, /Treat `Reconciliation: current` as required for mutation-affecting git work\./); + assert.match(gitOps, /STOP and report that unresolved control-state item/i); + + assert.match(pawAgent, /If reconciliation cannot make the state `current`, STOP and report the blocker/i); + assert.match(pawAgent, /do not advance past required activity items, gate items, or configured procedure items/i); + + assert.match(pawPr, /Required activity items earlier than `final-pr`, plus all gate items and configured procedure items, must all be `resolved` or `not_applicable`/); + assert.match(pawPr, /Treat `final-pr` as the current activity/i); + assert.match(pawPr, /hardened protections are inactive/i); + }); +}); diff --git a/tests/integration/tests/skills/execution-contract-content.test.ts b/tests/integration/tests/skills/execution-contract-content.test.ts index 589e76d5..5e8bf492 100644 --- a/tests/integration/tests/skills/execution-contract-content.test.ts +++ b/tests/integration/tests/skills/execution-contract-content.test.ts @@ -47,4 +47,35 @@ describe("execution checkout contract content", () => { assert.match(content, /git worktree list/); assert.doesNotMatch(content, /already runs? in the intended execution checkout/i); }); + + it("documents SoT-capable review parameters in paw-init", async () => { + const content = await readRepoFile("skills/paw-init/SKILL.md"); + assert.match(content, /`final_review_mode`[\s\S]*`society-of-thought`/i); + assert.match(content, /`planning_review_mode`[\s\S]*`society-of-thought`/i); + assert.match(content, /`final_review_specialists`/); + assert.match(content, /`planning_review_specialists`/); + assert.match(content, /`final_review_interaction_mode`/); + assert.match(content, /`planning_review_interaction_mode`/); + }); + + it("documents resume-aware handoff behavior in the VS Code tool schema", async () => { + const packageJson = JSON.parse(await readRepoFile("package.json")); + const handoffTool = packageJson.contributes?.languageModelTools?.find( + (tool: { name?: string }) => tool.name === "paw_new_session" + ); + + assert.ok(handoffTool, "Expected paw_new_session language model tool to exist"); + assert.match(handoffTool.modelDescription, /WorkflowContext\.md|ReviewContext\.md/i); + assert.match(handoffTool.modelDescription, /hardened state/i); + assert.match(handoffTool.modelDescription, /legacy best-effort mode/i); + }); + + it("documents lifecycle-only WorkflowContext updates in the stop-tracking prompt", async () => { + const content = await readRepoFile("src/prompts/stopTrackingArtifacts.template.md"); + assert.match(content, /WorkflowContext\.md/); + assert.match(content, /Artifact Lifecycle:/); + assert.match(content, /## Hardened State/); + assert.match(content, /legacy best-effort mode/i); + assert.match(content, /Do not rewrite or remove unrelated hardened-state or legacy-state lines/i); + }); }); diff --git a/tests/integration/tests/skills/paw-agent-guardrails.test.ts b/tests/integration/tests/skills/paw-agent-guardrails.test.ts index f66ee63a..f1a79491 100644 --- a/tests/integration/tests/skills/paw-agent-guardrails.test.ts +++ b/tests/integration/tests/skills/paw-agent-guardrails.test.ts @@ -7,7 +7,7 @@ import { fileURLToPath } from "url"; const __dirname = fileURLToPath(new URL(".", import.meta.url)); const REPO_ROOT = resolve(__dirname, "../../../../"); -describe("PAW agent final PR guardrails", () => { +describe("PAW agent guardrails", () => { it("requires paw-pr loading and blocks inline final PR creation", async () => { const content = await readFile(resolve(REPO_ROOT, "agents/PAW.agent.md"), "utf-8"); @@ -29,4 +29,22 @@ describe("PAW agent final PR guardrails", () => { "PAW.agent.md should forbid inline final PR creation at the paw-pr boundary", ); }); + + it("treats embedded hardened state as the durable workflow source of truth", async () => { + const content = await readFile(resolve(REPO_ROOT, "agents/PAW.agent.md"), "utf-8"); + + assert.match(content, /## Hardened State/i); + assert.match(content, /durable source of truth/i); + assert.match(content, /TODOs as a mirror of active required workflow items/i); + assert.match(content, /legacy best-effort mode/i); + }); + + it("documents embedded review control state handling in PAW Review.agent.md", async () => { + const content = await readFile(resolve(REPO_ROOT, "agents/PAW Review.agent.md"), "utf-8"); + + assert.match(content, /## Embedded Review Control State/); + assert.match(content, /contains `## Hardened State`/i); + assert.match(content, /terminal external-review facts/i); + assert.match(content, /legacy best-effort mode/i); + }); }); diff --git a/tests/integration/tests/workflows/final-review.test.ts b/tests/integration/tests/workflows/final-review.test.ts new file mode 100644 index 00000000..97e8de24 --- /dev/null +++ b/tests/integration/tests/workflows/final-review.test.ts @@ -0,0 +1,32 @@ +import { describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { resolve } from "path"; +import { fileURLToPath } from "url"; + +const __dirname = fileURLToPath(new URL(".", import.meta.url)); +const REPO_ROOT = resolve(__dirname, "../../../.."); + +async function readRepoFile(relativePath: string): Promise { + return readFile(resolve(REPO_ROOT, relativePath), "utf-8"); +} + +describe("configured review mode enforcement content", () => { + it("blocks unsupported planning review modes instead of silently downgrading", async () => { + const planning = await readRepoFile("skills/paw-planning-docs-review/SKILL.md"); + + assert.match(planning, /persist `procedure:planning-review` as `blocked` in `WorkflowContext\.md`/); + assert.match(planning, /Do not run a single-model fallback\./); + assert.match(planning, /When hardened state is absent, continue in legacy best-effort mode/i); + assert.match(planning, /run a single-model fallback review/i); + }); + + it("blocks unsupported final review modes instead of silently downgrading", async () => { + const finalReview = await readRepoFile("skills/paw-final-review/SKILL.md"); + + assert.match(finalReview, /persist `procedure:final-review` as `blocked` in `WorkflowContext\.md`/); + assert.match(finalReview, /Do not run a single-model fallback\./); + assert.match(finalReview, /When hardened state is absent, continue in legacy best-effort mode/i); + assert.match(finalReview, /run a single-model fallback review/i); + }); +}); diff --git a/tests/integration/tests/workflows/full-local-workflow.test.ts b/tests/integration/tests/workflows/full-local-workflow.test.ts index 64d22864..1849a6eb 100644 --- a/tests/integration/tests/workflows/full-local-workflow.test.ts +++ b/tests/integration/tests/workflows/full-local-workflow.test.ts @@ -62,6 +62,7 @@ describe("full local workflow (spec → plan → implement)", { timeout: 600_000 "Include a test that verifies the endpoint.", "", "Stages to execute IN ORDER:", + `0. CONTEXT: Create .paw/work/${workId}/WorkflowContext.md with a \`## Hardened State\` section and \`TODO Mirror: active-required-items\``, `1. SPEC: Write a specification to .paw/work/${workId}/Spec.md`, `2. PLAN: Create an implementation plan at .paw/work/${workId}/ImplementationPlan.md`, "3. IMPLEMENT: Make the code changes, write REAL tests (not placeholders!), run `npm test`, commit locally", @@ -81,6 +82,12 @@ describe("full local workflow (spec → plan → implement)", { timeout: 600_000 minFRCount: 2, }); + const workflowContext = await readFile( + join(ctx.fixture.workDir, ".paw/work", workId, "WorkflowContext.md"), "utf-8", + ); + assert.match(workflowContext, /## Hardened State/); + assert.match(workflowContext, /TODO Mirror:\s*active-required-items/i); + // Plan exists with phases await assertPlanStructure(ctx.fixture.workDir, workId, { minPhases: 1, @@ -182,10 +189,12 @@ function buildFullWorkflowPrompt(opts: { }): string { return [ "You are a PAW full-workflow agent. Execute spec → plan → implement stages in sequence.", - "", - "CRITICAL RULES:", - `- Write spec to .paw/work/${opts.workId}/Spec.md`, - `- Write plan to .paw/work/${opts.workId}/ImplementationPlan.md`, + "", + "CRITICAL RULES:", + `- Write workflow context to .paw/work/${opts.workId}/WorkflowContext.md`, + "- WorkflowContext.md MUST include a `## Hardened State` section and `TODO Mirror: active-required-items`", + `- Write spec to .paw/work/${opts.workId}/Spec.md`, + `- Write plan to .paw/work/${opts.workId}/ImplementationPlan.md`, "- Spec MUST have: Overview, FR-xxx requirements, SC-xxx success criteria", "- Plan MUST have: ## Phase N sections with Success Criteria", "- After implementing the endpoint, you MUST write a REAL test file (not a placeholder!)", diff --git a/tests/integration/tests/workflows/legacy-control-state-mode.test.ts b/tests/integration/tests/workflows/legacy-control-state-mode.test.ts new file mode 100644 index 00000000..56914244 --- /dev/null +++ b/tests/integration/tests/workflows/legacy-control-state-mode.test.ts @@ -0,0 +1,86 @@ +import { after, describe, it } from "node:test"; +import assert from "node:assert"; +import { mkdir, writeFile } from "fs/promises"; +import { join } from "path"; +import { RuleBasedAnswerer } from "../../lib/answerer.js"; +import { createTestContext, destroyTestContext, type TestContext } from "../../lib/harness.js"; +import { loadSkill } from "../../lib/skills.js"; + +const LIVE_TURN_TIMEOUT = 180_000; + +async function seedLegacyWorkflowFiles(workDir: string, workId: string): Promise { + const dir = join(workDir, ".paw/work", workId); + await mkdir(dir, { recursive: true }); + await writeFile(join(dir, "WorkflowContext.md"), [ + "# WorkflowContext", + "", + "Work Title: Legacy Mode Test", + `Work ID: ${workId}`, + "Base Branch: main", + `Target Branch: feature/${workId}`, + "Workflow Mode: full", + "Review Strategy: local", + "Review Policy: milestones", + "Session Policy: continuous", + "Final Agent Review: enabled", + "Remote: origin", + "Artifact Paths: auto-derived", + "", + ].join("\n")); + + await writeFile(join(dir, "ImplementationPlan.md"), [ + "# Implementation Plan", + "", + "## Phase Status", + "- [x] Phase 1: Embedded Control-State Contract", + "- [ ] Phase 2: PAW Gate Reconciliation", + "", + ].join("\n")); +} + +function buildStatusPrompt(skillContent: string, workId: string): string { + return [ + "You are executing the paw-status skill. Follow the skill exactly.", + "", + "CRITICAL RULES:", + `- Read .paw/work/${workId}/WorkflowContext.md and .paw/work/${workId}/ImplementationPlan.md`, + "- If hardened state is absent, explicitly say legacy best-effort mode is active and hardened protections are inactive", + "- Do NOT ask the user questions", + "", + "Skill documentation:", + skillContent, + ].join("\n"); +} + +describe("legacy control-state mode reporting", { timeout: 240_000 }, () => { + const contexts: TestContext[] = []; + + after(async () => { + for (const ctx of contexts) { + await destroyTestContext(ctx); + } + }); + + it("explicitly reports legacy best-effort mode when hardened state is absent", async () => { + const pawStatus = await loadSkill("paw-status"); + const workId = "test-legacy-control-state-mode"; + + const ctx = await createTestContext({ + fixtureName: "minimal-ts", + skillOrAgent: "legacy-control-state-mode", + systemPrompt: buildStatusPrompt(pawStatus, workId), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + }); + contexts.push(ctx); + + await seedLegacyWorkflowFiles(ctx.fixture.workDir, workId); + + const response = await ctx.session.sendAndWait({ + prompt: `Summarize the current PAW status for work ID ${workId}.`, + }, LIVE_TURN_TIMEOUT); + + const content = response?.data?.content ?? ""; + assert.match(content, /legacy/i); + assert.match(content, /hardened protections[\s\S]*inactive/i); + }); +}); diff --git a/tests/integration/tests/workflows/minimal-workflow.test.ts b/tests/integration/tests/workflows/minimal-workflow.test.ts index 3d2d7303..6759a159 100644 --- a/tests/integration/tests/workflows/minimal-workflow.test.ts +++ b/tests/integration/tests/workflows/minimal-workflow.test.ts @@ -49,10 +49,11 @@ describe("minimal workflow (plan + implement, no spec)", { timeout: 300_000 }, ( "", "Do the following in order:", "1. Read the codebase", - `2. Create an implementation plan at .paw/work/${workId}/ImplementationPlan.md (with phases + success criteria)`, - "3. Implement the plan by modifying the code", - "4. Run tests to verify (e.g. `npm test`)", - "5. Commit changes locally", + `2. Create .paw/work/${workId}/WorkflowContext.md with a \`## Hardened State\` section and \`TODO Mirror: active-required-items\``, + `3. Create an implementation plan at .paw/work/${workId}/ImplementationPlan.md (with phases + success criteria)`, + "4. Implement the plan by modifying the code", + "5. Run tests to verify (e.g. `npm test`)", + "6. Commit changes locally", "", "IMPORTANT:", `- Do NOT create .paw/work/${workId}/Spec.md (minimal mode skips it)`, @@ -80,6 +81,12 @@ describe("minimal workflow (plan + implement, no spec)", { timeout: 300_000 }, ( hasSuccessCriteria: true, }); + const workflowContext = await readFile( + join(ctx.fixture.workDir, ".paw/work", workId, "WorkflowContext.md"), "utf-8", + ); + assert.match(workflowContext, /## Hardened State/); + assert.match(workflowContext, /TODO Mirror:\s*active-required-items/i); + // Assert: src/app.ts modified with stats/counting concepts const appContent = await readFile(join(ctx.fixture.workDir, "src/app.ts"), "utf-8"); assert.match(appContent, /(stats|requestCount|count)/i, "app.ts should reference stats/requestCount/count"); @@ -133,6 +140,8 @@ function buildMinimalPrompt(opts: { planningSkill: string; implementSkill: strin "You are a PAW minimal-mode agent. You will plan and implement directly from a feature brief.", "", "IMPORTANT RULES:", + `- Write workflow context to .paw/work/${opts.workId}/WorkflowContext.md`, + "- WorkflowContext.md MUST include a `## Hardened State` section and `TODO Mirror: active-required-items`", `- Write the plan to .paw/work/${opts.workId}/ImplementationPlan.md`, "- Plan MUST have: phases (## Phase N: ), each with Success Criteria", "- Implement the planned changes by writing code in the repo", diff --git a/tests/integration/tests/workflows/paw-init-hardened-state.test.ts b/tests/integration/tests/workflows/paw-init-hardened-state.test.ts new file mode 100644 index 00000000..251e68cb --- /dev/null +++ b/tests/integration/tests/workflows/paw-init-hardened-state.test.ts @@ -0,0 +1,196 @@ +import { after, describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { join } from "path"; +import { RuleBasedAnswerer } from "../../lib/answerer.js"; +import { assertToolCalls } from "../../lib/assertions.js"; +import { createTestContext, destroyTestContext, type TestContext } from "../../lib/harness.js"; +import { loadSkill } from "../../lib/skills.js"; + +describe("paw-init hardened-state workflow context", { timeout: 480_000 }, () => { + const contexts: TestContext[] = []; + + after(async () => { + for (const ctx of contexts) { + await destroyTestContext(ctx); + } + }); + + it("writes pending statuses for enabled optional review flows", async () => { + const workflowContext = await runInitCase({ + contexts, + workId: "test-init-hardened-state", + workflowMode: "full", + finalAgentReview: "enabled", + finalReviewMode: "society-of-thought", + planningDocsReview: "enabled", + planningReviewMode: "society-of-thought", + }); + + assert.match(workflowContext, /## Hardened State/); + assert.match(workflowContext, /TODO Mirror:\s*active-required-items/i); + assert.match(workflowContext, /Reconciliation:\s*not_run/i); + assert.match(workflowContext, /### Required Workflow Items/); + assert.match(workflowContext, /`planning-docs-review` \| `pending` \| `activity`/); + assert.match(workflowContext, /`final-review` \| `pending` \| `activity`/); + assert.match(workflowContext, /### Gate Items/); + assert.match(workflowContext, /`transition:after-plan-review` \| `pending` \| `transition`/); + assert.match(workflowContext, /`transition:after-planning-docs-review` \| `pending` \| `transition`/); + assert.match(workflowContext, /`transition:after-final-review` \| `pending` \| `transition`/); + assert.match(workflowContext, /### Configured Procedure Items/); + assert.match(workflowContext, /`procedure:planning-review` \| `pending` \| `procedure`/); + assert.match(workflowContext, /`procedure:final-review` \| `pending` \| `procedure`/); + }); + + it("writes not_applicable for disabled optional review flows", async () => { + const workflowContext = await runInitCase({ + contexts, + workId: "test-init-disabled-review-flows", + workflowMode: "full", + finalAgentReview: "disabled", + finalReviewMode: "multi-model", + planningDocsReview: "disabled", + planningReviewMode: "multi-model", + }); + + assert.match(workflowContext, /## Hardened State/); + assert.match(workflowContext, /`planning-docs-review` \| `not_applicable` \| `activity`/); + assert.match(workflowContext, /`final-review` \| `not_applicable` \| `activity`/); + assert.match(workflowContext, /`transition:after-planning-docs-review` \| `not_applicable` \| `transition`/); + assert.match(workflowContext, /`transition:after-final-review` \| `not_applicable` \| `transition`/); + assert.match(workflowContext, /`procedure:planning-review` \| `not_applicable` \| `procedure`/); + assert.match(workflowContext, /`procedure:final-review` \| `not_applicable` \| `procedure`/); + }); + + it("writes not_applicable for spec-stage items in minimal mode", async () => { + const workflowContext = await runInitCase({ + contexts, + workId: "test-init-minimal-mode", + workflowMode: "minimal", + finalAgentReview: "disabled", + finalReviewMode: "multi-model", + planningDocsReview: "disabled", + planningReviewMode: "multi-model", + }); + + assert.match(workflowContext, /`spec` \| `not_applicable` \| `activity`/); + assert.match(workflowContext, /`spec-review` \| `not_applicable` \| `activity`/); + assert.match(workflowContext, /`transition:after-spec-review` \| `not_applicable` \| `transition`/); + assert.match(workflowContext, /`code-research` \| `pending` \| `activity`/); + assert.match(workflowContext, /`planning` \| `pending` \| `activity`/); + }); +}); + +async function runInitCase(opts: { + contexts: TestContext[]; + workId: string; + workflowMode: "full" | "minimal"; + finalAgentReview: "enabled" | "disabled"; + finalReviewMode: "single-model" | "multi-model" | "society-of-thought"; + planningDocsReview: "enabled" | "disabled"; + planningReviewMode: "single-model" | "multi-model" | "society-of-thought"; +}): Promise { + const initSkill = extractSection( + await loadSkill("paw-init"), + "### WorkflowContext.md", + "### Execution Contract", + ); + + const answerer = new RuleBasedAnswerer([ + (req) => req.choices?.[0] ?? "yes", + ], false); + + const ctx = await createTestContext({ + fixtureName: "minimal-ts", + skillOrAgent: `paw-init-${opts.workId}`, + systemPrompt: buildInitPrompt(initSkill, opts.workId), + answerer, + excludedTools: ["skill", "sql"], + }); + opts.contexts.push(ctx); + + await ctx.session.sendAndWait({ + prompt: [ + "Initialize the workflow context for this test work item.", + "", + `Write only .paw/work/${opts.workId}/WorkflowContext.md and its parent directories.`, + "Do not create branches, commits, PRs, or any other artifacts.", + "Do not ask the user questions.", + "", + "Use this configuration exactly:", + "Work Title: Hardened State Test", + `Work ID: ${opts.workId}`, + "Base Branch: main", + `Target Branch: feature/${opts.workId}`, + "Execution Mode: current-checkout", + `Workflow Mode: ${opts.workflowMode}`, + "Review Strategy: local", + "Review Policy: planning-only", + "Session Policy: continuous", + `Final Agent Review: ${opts.finalAgentReview}`, + `Final Review Mode: ${opts.finalReviewMode}`, + "Final Review Interactive: false", + "Final Review Models: none", + "Final Review Specialists: all", + "Final Review Interaction Mode: parallel", + "Final Review Specialist Models: none", + "Final Review Perspectives: none", + "Final Review Perspective Cap: 2", + "Implementation Model: none", + "Plan Generation Mode: single-model", + "Plan Generation Models: none", + `Planning Docs Review: ${opts.planningDocsReview}`, + `Planning Review Mode: ${opts.planningReviewMode}`, + "Planning Review Interactive: false", + "Planning Review Models: none", + "Planning Review Specialists: all", + "Planning Review Interaction Mode: parallel", + "Planning Review Specialist Models: none", + "Planning Review Perspectives: none", + "Planning Review Perspective Cap: 2", + "Custom Workflow Instructions: none", + "Initial Prompt: Create workflow context only", + "Issue URL: none", + "Artifact Lifecycle: tracked", + ].join("\n"), + }, 120_000); + + const workflowContext = await readFile( + join(ctx.fixture.workDir, ".paw/work", opts.workId, "WorkflowContext.md"), "utf-8", + ); + + assert.doesNotMatch(workflowContext, /- add `phase::` items after planning defines named implementation phases/i); + assertToolCalls(ctx.toolLog, { + forbidden: ["git_push"], + bashMustNotInclude: [/git push/, /gh\s+pr\s+create/], + }); + + return workflowContext; +} + +function buildInitPrompt(skillContent: string, workId: string): string { + return [ + "You are a focused paw-init workflow-context test agent.", + "", + "CRITICAL RULES:", + `- Write .paw/work/${workId}/WorkflowContext.md by following the paw-init reference below`, + "- Preserve the hardened-state template structure and item IDs from the reference", + "- Create only the requested file and its parent directories", + "- Do not create branches, commits, PRs, or any other artifacts", + "- Do NOT ask the user questions", + "- Do not invoke the skill tool or use SQL", + "", + "paw-init WorkflowContext reference:", + skillContent, + ].join("\n"); +} + +function extractSection(content: string, startMarker: string, endMarker: string): string { + const start = content.indexOf(startMarker); + if (start === -1) { + return content; + } + + const end = content.indexOf(endMarker, start); + return content.slice(start, end === -1 ? undefined : end).trim(); +} diff --git a/tests/integration/tests/workflows/paw-status-reconciliation.test.ts b/tests/integration/tests/workflows/paw-status-reconciliation.test.ts new file mode 100644 index 00000000..20effec1 --- /dev/null +++ b/tests/integration/tests/workflows/paw-status-reconciliation.test.ts @@ -0,0 +1,125 @@ +import { after, describe, it } from "node:test"; +import assert from "node:assert"; +import { mkdir, writeFile } from "fs/promises"; +import { join } from "path"; +import { RuleBasedAnswerer } from "../../lib/answerer.js"; +import { createTestContext, destroyTestContext, type TestContext } from "../../lib/harness.js"; +import { loadSkill } from "../../lib/skills.js"; + +const LIVE_TURN_TIMEOUT = 180_000; + +async function seedWorkflowFiles(workDir: string, workId: string): Promise { + const dir = join(workDir, ".paw/work", workId); + await mkdir(dir, { recursive: true }); + await writeFile(join(dir, "WorkflowContext.md"), [ + "# WorkflowContext", + "", + "Work Title: Status Reconciliation Test", + `Work ID: ${workId}`, + "Base Branch: main", + `Target Branch: feature/${workId}`, + "Workflow Mode: full", + "Review Strategy: local", + "Review Policy: planning-only", + "Session Policy: continuous", + "Planning Docs Review: disabled", + "Final Agent Review: enabled", + "Remote: origin", + "Artifact Paths: auto-derived", + "", + "## Hardened State", + "", + "TODO Mirror: active-required-items", + "Reconciliation: stale", + "", + "### Required Workflow Items", + "- `init` | `resolved` | `activity`", + "- `spec` | `resolved` | `activity`", + "- `spec-review` | `resolved` | `activity`", + "- `code-research` | `resolved` | `activity`", + "- `planning` | `resolved` | `activity`", + "- `plan-review` | `resolved` | `activity`", + "- `planning-docs-review` | `not_applicable` | `activity`", + "- `phase:1:embedded-control-state-contract` | `resolved` | `activity`", + "- `phase:2:paw-gate-reconciliation` | `pending` | `activity`", + "- `final-review` | `pending` | `activity`", + "- `final-pr` | `pending` | `activity`", + "", + "### Gate Items", + "- `transition:after-spec-review` | `resolved` | `transition`", + "- `transition:after-plan-review` | `resolved` | `transition`", + "- `transition:after-phase:1` | `blocked` | `transition`", + "- `transition:after-final-review` | `pending` | `transition`", + "", + "### Configured Procedure Items", + "- `procedure:planning-review` | `not_applicable` | `procedure`", + "- `procedure:final-review` | `pending` | `procedure`", + "", + ].join("\n")); + + await writeFile(join(dir, "ImplementationPlan.md"), [ + "# Implementation Plan", + "", + "## Phase Status", + "- [x] Phase 1: Embedded Control-State Contract", + "- [ ] Phase 2: PAW Gate Reconciliation", + "", + "## Phase 1: Embedded Control-State Contract", + "Complete.", + "", + "## Phase 2: PAW Gate Reconciliation", + "In progress.", + "", + ].join("\n")); +} + +function buildStatusPrompt(skillContent: string, workId: string): string { + return [ + "You are executing the paw-status skill. Follow the skill exactly.", + "", + "CRITICAL RULES:", + `- Read .paw/work/${workId}/WorkflowContext.md and .paw/work/${workId}/ImplementationPlan.md`, + "- When hardened state exists, use it as the durable source of truth", + "- Do NOT ask the user questions", + "", + "Skill documentation:", + skillContent, + ].join("\n"); +} + +describe("paw-status reconciliation reporting", { timeout: 240_000 }, () => { + const contexts: TestContext[] = []; + + after(async () => { + for (const ctx of contexts) { + await destroyTestContext(ctx); + } + }); + + it("reports stale hardened control state as read-only/unverified", async () => { + const pawStatus = await loadSkill("paw-status"); + const workId = "test-paw-status-reconciliation"; + + const ctx = await createTestContext({ + fixtureName: "minimal-ts", + skillOrAgent: "paw-status-reconciliation", + systemPrompt: buildStatusPrompt(pawStatus, workId), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + }); + contexts.push(ctx); + + await seedWorkflowFiles(ctx.fixture.workDir, workId); + + const response = await ctx.session.sendAndWait({ + prompt: [ + `Summarize the current PAW status for work ID ${workId}.`, + "Focus on whether mutation-affecting work is safe to proceed and what is blocking progress.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + const content = response?.data?.content ?? ""; + assert.match(content, /stale/i); + assert.match(content, /read-only|unverified/i); + assert.match(content, /transition:after-phase:1|blocked/i); + }); +}); diff --git a/tests/integration/tests/workflows/review-control-state-sequencing.test.ts b/tests/integration/tests/workflows/review-control-state-sequencing.test.ts new file mode 100644 index 00000000..2b6ea58c --- /dev/null +++ b/tests/integration/tests/workflows/review-control-state-sequencing.test.ts @@ -0,0 +1,46 @@ +import { describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { resolve } from "path"; +import { fileURLToPath } from "url"; + +const __dirname = fileURLToPath(new URL(".", import.meta.url)); +const REPO_ROOT = resolve(__dirname, "../../../.."); + +async function readRepoFile(relativePath: string): Promise { + return readFile(resolve(REPO_ROOT, relativePath), "utf-8"); +} + +describe("review control-state sequencing content", () => { + it("documents review-stage sequencing and terminal-state rules", async () => { + const contract = await readRepoFile("skills/paw-review-workflow/references/control-state-contract.md"); + const workflow = await readRepoFile("skills/paw-review-workflow/SKILL.md"); + const understanding = await readRepoFile("skills/paw-review-understanding/SKILL.md"); + const feedback = await readRepoFile("skills/paw-review-feedback/SKILL.md"); + const critic = await readRepoFile("skills/paw-review-critic/SKILL.md"); + const reviewAgent = await readRepoFile("agents/PAW Review.agent.md"); + + assert.match(contract, /manual-posting-provided/); + assert.match(contract, /stage advancement, critique finalization, or GitHub\/manual-posting output/i); + assert.match(contract, /`procedure:review-mode` becomes `resolved` only when the configured evaluation path actually ran/); + assert.match(contract, /`output:github` becomes `resolved` only after a pending review is created or manual posting instructions are written/i); + + assert.match(workflow, /use it as the durable source of truth for review stage sequencing and terminal external-review state/i); + assert.match(workflow, /Do not advance to evaluation while `understanding` is unresolved/i); + assert.match(workflow, /`output:feedback` runs only after `evaluation` is resolved/i); + assert.match(workflow, /`output:github` runs only after `output:critique-response` is resolved/i); + + assert.match(understanding, /preserve the existing review identifier, stage items, terminal external review state, and pending review identifiers/i); + assert.match(understanding, /After Step 4 creates `DerivedSpec\.md`, update hardened state so `understanding` is `resolved`, `evaluation` is `pending`, and `Reconciliation` is `current`/i); + + assert.match(feedback, /Initial feedback generation requires `evaluation` to be `resolved` and `output:feedback` to be `pending` or `in_progress`/); + assert.match(feedback, /Critique Response Mode requires `output:critic` to be `resolved` and `output:critique-response` to be `pending` or `in_progress`/); + assert.match(feedback, /After the initial pass, update `ReviewContext\.md` so `output:feedback` is `resolved`/i); + + assert.match(critic, /`paw-review-critic` requires `output:feedback` to be `resolved` and `output:critic` to be `pending` or `in_progress`/); + assert.match(critic, /After assessments are written, update `ReviewContext\.md` so `output:critic` is `resolved`/i); + + assert.match(reviewAgent, /If reconciliation cannot make the state `current`, STOP and report the blocker/i); + assert.match(reviewAgent, /Do not advance past review-stage items or terminal external-review facts that remain unresolved/i); + }); +}); diff --git a/tests/integration/tests/workflows/review-legacy-control-state-mode.test.ts b/tests/integration/tests/workflows/review-legacy-control-state-mode.test.ts new file mode 100644 index 00000000..2bb2cb80 --- /dev/null +++ b/tests/integration/tests/workflows/review-legacy-control-state-mode.test.ts @@ -0,0 +1,61 @@ +import { after, describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { join } from "path"; +import { RuleBasedAnswerer } from "../../lib/answerer.js"; +import { createTestContext, destroyTestContext, type TestContext } from "../../lib/harness.js"; +import { loadSkill } from "../../lib/skills.js"; +import { seedReviewArtifacts } from "./review-terminal-state-tracking.test-helper.js"; + +const LIVE_TURN_TIMEOUT = 180_000; + +describe("review legacy control-state mode", { timeout: 240_000 }, () => { + const contexts: TestContext[] = []; + + after(async () => { + for (const ctx of contexts) { + await destroyTestContext(ctx); + } + }); + + it("keeps non-GitHub review posting usable without hardened state and reports legacy mode", async () => { + const skillContent = await loadSkill("paw-review-github"); + const identifier = "feature-review-legacy-test"; + + const ctx = await createTestContext({ + fixtureName: "minimal-ts", + skillOrAgent: "review-legacy-control-state-mode", + systemPrompt: [ + "You are executing the paw-review-github skill. Follow the skill exactly.", + "", + "CRITICAL RULES:", + `- Read .paw/reviews/${identifier}/ReviewContext.md and .paw/reviews/${identifier}/ReviewComments.md`, + "- This is a non-GitHub review context, so do not call GitHub tools", + "- Because hardened review state is absent, explicitly report legacy best-effort mode and inactive hardened protections", + "- Do NOT ask the user questions", + "", + "Skill documentation:", + skillContent, + ].join("\n"), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + excludedTools: ["skill", "sql"], + }); + contexts.push(ctx); + + await seedReviewArtifacts(ctx.fixture.workDir, identifier, false); + + const response = await ctx.session.sendAndWait({ + prompt: [ + `Process .paw/reviews/${identifier}/ReviewComments.md for posting.`, + "This is a local branch review, so provide manual posting instructions instead of calling GitHub.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + const content = response?.data?.content ?? ""; + const reviewComments = await readFile(join(ctx.fixture.workDir, ".paw/reviews", identifier, "ReviewComments.md"), "utf-8"); + + assert.match(content, /legacy/i); + assert.match(content, /hardened[\s\S]*protections[\s\S]*inactive/i); + assert.match(reviewComments, /## Manual Posting Instructions/); + }); +}); diff --git a/tests/integration/tests/workflows/review-terminal-state-tracking.test-helper.ts b/tests/integration/tests/workflows/review-terminal-state-tracking.test-helper.ts new file mode 100644 index 00000000..533d0c6b --- /dev/null +++ b/tests/integration/tests/workflows/review-terminal-state-tracking.test-helper.ts @@ -0,0 +1,127 @@ +import { mkdir, writeFile } from "fs/promises"; +import { join } from "path"; + +interface SeedReviewArtifactOptions { + outputGithubStatus?: "pending" | "in_progress" | "resolved"; + terminalState?: "none" | "pending-review-created" | "manual-posting-provided"; + pendingReviewId?: string; + pendingReviewUrl?: string; + includeManualPostingInstructions?: boolean; +} + +export async function seedReviewArtifacts( + workDir: string, + identifier: string, + hardened: boolean, + options: SeedReviewArtifactOptions = {}, +): Promise { + const dir = join(workDir, ".paw/reviews", identifier); + await mkdir(dir, { recursive: true }); + const outputGithubStatus = options.outputGithubStatus ?? "pending"; + const terminalState = options.terminalState ?? "none"; + const pendingReviewId = options.pendingReviewId ?? "none"; + const pendingReviewUrl = options.pendingReviewUrl ?? "none"; + + const reviewContextLines = [ + "# ReviewContext", + "", + "**Branch**: feature/review-state-test", + "**Remote**: origin", + "**Base Branch**: main", + "**Head Branch**: feature/review-state-test", + "**Base Commit**: abc123", + "**Base Commit Source**: local", + "**Head Commit**: def456", + "**Repository**: Local repository", + "**Author**: Test Reviewer", + "**Title**: Review State Test", + "**State**: active", + "**CI Status**: Not available", + "**Labels**: N/A", + "**Reviewers**: N/A", + "**Linked Issues**: Inferred from commits", + "**Changed Files**: 1 files, +10 -0", + `**Artifact Paths**: .paw/reviews/${identifier}/`, + "", + "## Review Configuration", + "", + "**Review Mode**: single-model", + "**Review Specialists**: all", + "**Review Interaction Mode**: parallel", + "**Review Interactive**: false", + "**Review Specialist Models**: none", + "**Review Perspectives**: none", + "**Review Perspective Cap**: 2", + "", + ]; + + if (hardened) { + reviewContextLines.push( + "## Hardened State", + "", + "TODO Mirror: active-required-items", + "Reconciliation: current", + "", + "### Review Stage Items", + "- `understanding` | `resolved` | `stage`", + "- `evaluation` | `resolved` | `stage`", + "- `output:feedback` | `resolved` | `stage`", + "- `output:critic` | `resolved` | `stage`", + "- `output:critique-response` | `resolved` | `stage`", + `- \`output:github\` | \`${outputGithubStatus}\` | \`stage\``, + "- `procedure:review-mode` | `resolved` | `procedure`", + "", + "### Terminal External Review State", + `- \`${terminalState}\``, + "", + `Pending Review ID: \`${pendingReviewId}\``, + `Pending Review URL: \`${pendingReviewUrl}\``, + "", + ); + } + + await writeFile(join(dir, "ReviewContext.md"), reviewContextLines.join("\n")); + + await writeFile(join(dir, "ReviewComments.md"), [ + "---", + "status: finalized", + "---", + "", + "# Review Comments for feature-review-state-test", + "", + "**Status**: finalized", + "", + "## Inline Comments", + "", + "### File: `src/example.ts` | Lines: 10-12", + "", + "**Type**: Should", + "**Category**: Maintainability", + "", + "Extract the duplicated branch logic into a small helper.", + "", + "**Suggestion:**", + "```typescript", + "const normalized = normalizeBranchName(branch);", + "```", + "", + "**Rationale:**", + "- **Evidence**: `src/example.ts:10-12` duplicates normalization logic", + "- **Baseline Pattern**: `src/other.ts:4-8` uses a shared helper", + "- **Impact**: Duplicate logic risks drift", + "- **Best Practice**: Reuse shared normalization helpers", + "", + "**Final**: ✓ Ready for GitHub posting", + "", + ...(options.includeManualPostingInstructions + ? [ + "---", + "", + "## Manual Posting Instructions", + "", + "Existing manual posting guidance.", + "", + ] + : []), + ].join("\n")); +} diff --git a/tests/integration/tests/workflows/review-terminal-state-tracking.test.ts b/tests/integration/tests/workflows/review-terminal-state-tracking.test.ts new file mode 100644 index 00000000..75ae280b --- /dev/null +++ b/tests/integration/tests/workflows/review-terminal-state-tracking.test.ts @@ -0,0 +1,107 @@ +import { after, describe, it } from "node:test"; +import assert from "node:assert"; +import { readFile } from "fs/promises"; +import { join } from "path"; +import { RuleBasedAnswerer } from "../../lib/answerer.js"; +import { createTestContext, destroyTestContext, type TestContext } from "../../lib/harness.js"; +import { loadSkill } from "../../lib/skills.js"; +import { seedReviewArtifacts } from "./review-terminal-state-tracking.test-helper.js"; + +const LIVE_TURN_TIMEOUT = 180_000; + +function buildGithubPrompt(skillContent: string, identifier: string, hardened: boolean): string { + return [ + "You are executing the paw-review-github skill. Follow the skill exactly.", + "", + "CRITICAL RULES:", + `- Read .paw/reviews/${identifier}/ReviewContext.md and .paw/reviews/${identifier}/ReviewComments.md`, + "- This is a non-GitHub review context, so do not call GitHub tools", + hardened + ? "- Because hardened review state exists, update ReviewContext.md terminal state and output:github status when manual posting instructions are added" + : "- Because hardened review state is absent, explicitly report legacy best-effort mode and inactive hardened protections", + "- Do NOT ask the user questions", + "", + "Skill documentation:", + skillContent, + ].join("\n"); +} + +describe("review terminal state tracking", { timeout: 240_000 }, () => { + const contexts: TestContext[] = []; + + after(async () => { + for (const ctx of contexts) { + await destroyTestContext(ctx); + } + }); + + it("records manual posting as terminal external review state for non-GitHub contexts", async () => { + const skillContent = await loadSkill("paw-review-github"); + const identifier = "feature-review-state-test"; + + const ctx = await createTestContext({ + fixtureName: "minimal-ts", + skillOrAgent: "review-terminal-state-tracking", + systemPrompt: buildGithubPrompt(skillContent, identifier, true), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + excludedTools: ["skill", "sql"], + }); + contexts.push(ctx); + + await seedReviewArtifacts(ctx.fixture.workDir, identifier, true); + + await ctx.session.sendAndWait({ + prompt: [ + `Process .paw/reviews/${identifier}/ReviewComments.md for posting.`, + "This is a local branch review, so provide manual posting instructions instead of calling GitHub.", + "Update ReviewContext.md terminal state to reflect that outcome.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + const reviewComments = await readFile(join(ctx.fixture.workDir, ".paw/reviews", identifier, "ReviewComments.md"), "utf-8"); + const reviewContext = await readFile(join(ctx.fixture.workDir, ".paw/reviews", identifier, "ReviewContext.md"), "utf-8"); + + assert.match(reviewComments, /## Manual Posting Instructions/); + assert.match(reviewContext, /`output:github` \| `resolved` \| `stage`/); + assert.match(reviewContext, /### Terminal External Review State[\s\S]*- `manual-posting-provided`/); + assert.match(reviewContext, /Pending Review ID:\s*`none`/i); + assert.match(reviewContext, /Reconciliation:\s*current/i); + }); + + it("does not append duplicate manual posting instructions when terminal state already exists", async () => { + const skillContent = await loadSkill("paw-review-github"); + const identifier = "feature-review-state-idempotent"; + + const ctx = await createTestContext({ + fixtureName: "minimal-ts", + skillOrAgent: "review-terminal-state-tracking-idempotent", + systemPrompt: buildGithubPrompt(skillContent, identifier, true), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + excludedTools: ["skill", "sql"], + }); + contexts.push(ctx); + + await seedReviewArtifacts(ctx.fixture.workDir, identifier, true, { + outputGithubStatus: "resolved", + terminalState: "manual-posting-provided", + includeManualPostingInstructions: true, + }); + + await ctx.session.sendAndWait({ + prompt: [ + `Process .paw/reviews/${identifier}/ReviewComments.md for posting.`, + "This review already has manual posting instructions for a local branch review.", + "Preserve the existing terminal state and avoid duplicating the manual posting section.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + const reviewComments = await readFile(join(ctx.fixture.workDir, ".paw/reviews", identifier, "ReviewComments.md"), "utf-8"); + const reviewContext = await readFile(join(ctx.fixture.workDir, ".paw/reviews", identifier, "ReviewContext.md"), "utf-8"); + + const manualSections = reviewComments.match(/^## Manual Posting Instructions$/gm) ?? []; + assert.strictEqual(manualSections.length, 1); + assert.match(reviewContext, /`output:github` \| `resolved` \| `stage`/); + assert.match(reviewContext, /### Terminal External Review State[\s\S]*- `manual-posting-provided`/); + assert.match(reviewContext, /Pending Review ID:\s*`none`/i); + }); +}); diff --git a/tests/integration/tests/workflows/workflow-control-state-guards.test.ts b/tests/integration/tests/workflows/workflow-control-state-guards.test.ts new file mode 100644 index 00000000..1c61f650 --- /dev/null +++ b/tests/integration/tests/workflows/workflow-control-state-guards.test.ts @@ -0,0 +1,246 @@ +import { after, describe, it } from "node:test"; +import assert from "node:assert"; +import { mkdir, writeFile } from "fs/promises"; +import { join } from "path"; +import { RuleBasedAnswerer } from "../../lib/answerer.js"; +import { TestFixture } from "../../lib/fixtures.js"; +import { createTestContext, destroyTestContext, type TestContext } from "../../lib/harness.js"; +import { loadSkill } from "../../lib/skills.js"; + +const LIVE_TURN_TIMEOUT = 180_000; + +type TransitionCase = { + workId: string; + phaseBoundaryStatus: "resolved" | "blocked"; +}; + +async function seedPhase2WorkflowContext( + workDir: string, + { workId, phaseBoundaryStatus }: TransitionCase, + targetBranch: string, +): Promise { + const dir = join(workDir, ".paw/work", workId); + await mkdir(dir, { recursive: true }); + await writeFile(join(dir, "WorkflowContext.md"), [ + "# WorkflowContext", + "", + "Work Title: Control State Guard Test", + `Work ID: ${workId}`, + "Base Branch: main", + `Target Branch: ${targetBranch}`, + "Workflow Mode: full", + "Review Strategy: local", + "Review Policy: planning-only", + "Session Policy: continuous", + "Planning Docs Review: disabled", + "Final Agent Review: enabled", + "Remote: origin", + "Artifact Lifecycle: commit-and-clean", + "Artifact Paths: auto-derived", + "", + "## Hardened State", + "", + "TODO Mirror: active-required-items", + "Reconciliation: current", + "", + "### Required Workflow Items", + "- `init` | `resolved` | `activity`", + "- `spec` | `resolved` | `activity`", + "- `spec-review` | `resolved` | `activity`", + "- `code-research` | `resolved` | `activity`", + "- `planning` | `resolved` | `activity`", + "- `plan-review` | `resolved` | `activity`", + "- `planning-docs-review` | `not_applicable` | `activity`", + "- `phase:1:embedded-control-state-contract` | `resolved` | `activity`", + "- `phase:2:paw-gate-reconciliation` | `pending` | `activity`", + "- `final-review` | `pending` | `activity`", + "- `final-pr` | `pending` | `activity`", + "", + "### Gate Items", + "- `transition:after-spec-review` | `resolved` | `transition`", + "- `transition:after-plan-review` | `resolved` | `transition`", + `- \`transition:after-phase:1\` | \`${phaseBoundaryStatus}\` | \`transition\``, + "- `transition:after-final-review` | `pending` | `transition`", + "", + "### Configured Procedure Items", + "- `procedure:planning-review` | `not_applicable` | `procedure`", + "- `procedure:final-review` | `pending` | `procedure`", + "", + ].join("\n")); +} + +async function seedImplementationPlan(workDir: string, workId: string, phase2Complete = false): Promise { + const dir = join(workDir, ".paw/work", workId); + await writeFile(join(dir, "ImplementationPlan.md"), [ + "# Implementation Plan", + "", + "## Phase Status", + "- [x] Phase 1: Embedded Control-State Contract", + `- [${phase2Complete ? "x" : " "}] Phase 2: PAW Gate Reconciliation`, + "", + "## Phase 1: Embedded Control-State Contract", + "Complete.", + "", + "## Phase 2: PAW Gate Reconciliation", + phase2Complete ? "Complete." : "In progress.", + "", + "## Phase Candidates", + "None.", + "", + ].join("\n")); +} + +function buildTransitionPrompt(skillContent: string, workId: string): string { + return [ + "You are executing the paw-transition skill. Follow the skill exactly.", + "", + "CRITICAL RULES:", + `- Read .paw/work/${workId}/WorkflowContext.md and .paw/work/${workId}/ImplementationPlan.md`, + "- When hardened state exists, use it as the durable source of truth", + "- Use only evidence that is available from the seeded workflow files and the local git checkout", + "- If the required live state cannot be proven from that evidence, block instead of inferring", + "- Return the TRANSITION RESULT block exactly as specified in the skill", + "- Do NOT ask the user questions", + "", + "Skill documentation:", + skillContent, + ].join("\n"); +} + +describe("workflow control-state guards", { timeout: 360_000 }, () => { + const contexts: TestContext[] = []; + + after(async () => { + for (const ctx of contexts) { + await destroyTestContext(ctx); + } + }); + + async function runPhase2Transition(testCase: TransitionCase): Promise { + const transitionSkill = await loadSkill("paw-transition"); + const targetBranch = `feature/${testCase.workId}`; + const fixture = await TestFixture.clone("minimal-ts"); + await fixture.checkoutBranch(targetBranch, { create: true }); + await seedPhase2WorkflowContext(fixture.workDir, testCase, targetBranch); + await seedImplementationPlan(fixture.workDir, testCase.workId); + + const ctx = await createTestContext({ + fixture, + skillOrAgent: `transition-${testCase.workId}`, + systemPrompt: buildTransitionPrompt(transitionSkill, testCase.workId), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + }); + contexts.push(ctx); + + const response = await ctx.session.sendAndWait({ + prompt: [ + `Evaluate the transition for work ID ${testCase.workId}.`, + "Context: paw-impl-review for Phase 1 just passed and the workflow is deciding whether Phase 2 can start.", + "Return only the TRANSITION RESULT block.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + return response?.data?.content ?? ""; + } + + it("blocks transition when external state cannot be verified", async () => { + const transitionSkill = await loadSkill("paw-transition"); + const workId = "test-transition-external-unverified"; + const targetBranch = `feature/${workId}`; + const fixture = await TestFixture.clone("minimal-ts"); + await fixture.checkoutBranch(targetBranch, { create: true }); + + const dir = join(fixture.workDir, ".paw/work", workId); + await mkdir(dir, { recursive: true }); + await writeFile(join(dir, "WorkflowContext.md"), [ + "# WorkflowContext", + "", + "Work Title: External Verification Guard Test", + `Work ID: ${workId}`, + "Base Branch: main", + `Target Branch: ${targetBranch}`, + "Workflow Mode: full", + "Review Strategy: prs", + "Review Policy: planning-only", + "Session Policy: continuous", + "Planning Docs Review: disabled", + "Final Agent Review: enabled", + "Remote: origin", + "Artifact Lifecycle: commit-and-clean", + "Artifact Paths: auto-derived", + "", + "## Hardened State", + "", + "TODO Mirror: active-required-items", + "Reconciliation: external_unverified", + "", + "### Required Workflow Items", + "- `init` | `resolved` | `activity`", + "- `spec` | `resolved` | `activity`", + "- `spec-review` | `resolved` | `activity`", + "- `code-research` | `resolved` | `activity`", + "- `planning` | `resolved` | `activity`", + "- `plan-review` | `resolved` | `activity`", + "- `planning-docs-review` | `not_applicable` | `activity`", + "- `phase:1:embedded-control-state-contract` | `resolved` | `activity`", + "- `phase:2:paw-gate-reconciliation` | `resolved` | `activity`", + "- `final-review` | `pending` | `activity`", + "- `final-pr` | `pending` | `activity`", + "", + "### Gate Items", + "- `transition:after-spec-review` | `resolved` | `transition`", + "- `transition:after-plan-review` | `resolved` | `transition`", + "- `transition:after-phase:1` | `resolved` | `transition`", + "- `transition:after-phase:2` | `resolved` | `transition`", + "- `transition:after-final-review` | `pending` | `transition`", + "", + "### Configured Procedure Items", + "- `procedure:planning-review` | `not_applicable` | `procedure`", + "- `procedure:final-review` | `pending` | `procedure`", + "", + ].join("\n")); + await seedImplementationPlan(fixture.workDir, workId, true); + + const ctx = await createTestContext({ + fixture, + skillOrAgent: `transition-${workId}`, + systemPrompt: buildTransitionPrompt(transitionSkill, workId), + answerer: new RuleBasedAnswerer([(req) => req.choices?.[0] ?? "yes"], false), + }); + contexts.push(ctx); + + const response = await ctx.session.sendAndWait({ + prompt: [ + `Evaluate the transition for work ID ${workId}.`, + "Context: the last implementation phase and impl review both completed, so the workflow is deciding whether final review can begin.", + "No PR metadata, remote state, or merge-verification evidence exists beyond this local checkout.", + "Return only the TRANSITION RESULT block.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + const content = response?.data?.content ?? ""; + assert.match(content, /preflight:\s*blocked/i); + assert.match(content, /external_unverified|unverified|cannot prove/i); + }); + + it("blocks transition when a required gate item remains unresolved", async () => { + const response = await runPhase2Transition({ + workId: "test-transition-blocked-gate", + phaseBoundaryStatus: "blocked", + }); + + assert.match(response, /preflight:\s*blocked:.*transition:after-phase:1.*unresolved/i); + }); + + it("allows transition when hardened state is current and the phase gate is resolved", async () => { + const response = await runPhase2Transition({ + workId: "test-transition-phase2-ready", + phaseBoundaryStatus: "resolved", + }); + + assert.match(response, /preflight:\s*passed/i); + assert.match(response, /next_activity:\s*paw-implement/i); + assert.match(response, /Phase 2: PAW Gate Reconciliation/i); + assert.match(response, /pause_at_milestone:\s*false/i); + }); +}); diff --git a/tests/integration/tests/workflows/worktree-pr-strategy.test.ts b/tests/integration/tests/workflows/worktree-pr-strategy.test.ts index fb0c213f..bc483723 100644 --- a/tests/integration/tests/workflows/worktree-pr-strategy.test.ts +++ b/tests/integration/tests/workflows/worktree-pr-strategy.test.ts @@ -11,7 +11,7 @@ import { loadSkill } from "../../lib/skills.js"; import { ToolPolicy } from "../../lib/tool-policy.js"; const LIVE_MODEL = process.env.PAW_TEST_LIVE_MODEL ?? "claude-sonnet-4.6"; -const LIVE_TURN_TIMEOUT = 180_000; +const LIVE_TURN_TIMEOUT = 240_000; type CheckoutSnapshot = { branch: string; @@ -61,10 +61,11 @@ async function seedWorkflowContext( workDir: string, workId: string, targetBranch: string, + hardenedStateLines: string[] = [], ): Promise { const dir = join(workDir, ".paw/work", workId); await mkdir(dir, { recursive: true }); - await writeFile(join(dir, "WorkflowContext.md"), [ + const lines = [ "# WorkflowContext", "", "Work Title: Worktree PR Strategy Test", @@ -82,7 +83,13 @@ async function seedWorkflowContext( "Remote: origin", "Artifact Paths: auto-derived", "", - ].join("\n")); + ]; + + if (hardenedStateLines.length > 0) { + lines.push(...hardenedStateLines, ""); + } + + await writeFile(join(dir, "WorkflowContext.md"), lines.join("\n")); } async function setupLocalOrigin(checkouts: Awaited>, targetBranch: string): Promise { @@ -101,6 +108,7 @@ function buildPrompt(skillContent: string, workId: string, targetBranch: string) `- Read .paw/work/${workId}/WorkflowContext.md before any git mutation`, `- Treat ${targetBranch} as the execution checkout target branch`, "- Operate only in the current execution checkout and never mutate the caller checkout", + "- If WorkflowContext.md contains `## Hardened State`, reconcile it first and block mutation if reconciliation is not current or required gate/procedure items remain unresolved", "- Follow the PRs branch naming mechanics from the git operations skill", "- For each review branch, start from the target branch before branching off", "- Create only the requested branch-specific artifacts", @@ -116,7 +124,7 @@ function buildPrompt(skillContent: string, workId: string, targetBranch: string) ].join("\n"); } -describe("worktree PR strategy branch behavior", { timeout: 240_000 }, () => { +describe("worktree PR strategy branch behavior", { timeout: 420_000 }, () => { const contexts: TestContext[] = []; after(async () => { @@ -185,7 +193,7 @@ describe("worktree PR strategy branch behavior", { timeout: 240_000 }, () => { assert.match(phaseContent, /worktreePrProof/, "phase branch should contain proof source file"); const docsContent = await checkouts.execution.git.raw(["show", `${docsBranch}:.paw/work/${workId}/Docs.md`]); - assert.match(docsContent, /docs/i, "docs branch should contain Docs artifact"); + assert.match(docsContent, /documentation|docs/i, "docs branch should contain Docs artifact"); const planMergeBase = (await checkouts.execution.git.raw(["merge-base", targetBranch, planBranch])).trim(); const phaseMergeBase = (await checkouts.execution.git.raw(["merge-base", targetBranch, phaseBranch])).trim(); @@ -210,12 +218,83 @@ describe("worktree PR strategy branch behavior", { timeout: 240_000 }, () => { assert.doesNotMatch(docsTree, /^src\/worktree-pr-proof\.ts$/m, "docs branch should not contain phase artifact"); assertToolCalls(ctx.toolLog, { - bashMustInclude: [ - new RegExp(`git push(?: -u)? origin ${planBranch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}`), - new RegExp(`git push(?: -u)? origin ${phaseBranch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}`), - new RegExp(`git push(?: -u)? origin ${docsBranch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}`), - ], bashMustNotInclude: [/gh\s+pr\s+create/, /\bnpm test\b/, /\bnpm run build\b/, /\brm -rf dist\b/], }); }); + + it("refuses worktree branch mutation when hardened control state is unresolved", async () => { + const gitOpsSkill = await loadSkill("paw-git-operations"); + const workId = "test-worktree-pr-strategy-blocked"; + const targetBranch = "feature/test-worktree-pr-strategy-blocked"; + const planBranch = `${targetBranch}_plan`; + const checkouts = await createCallerAndExecution("minimal-ts", workId, targetBranch); + + await setupLocalOrigin(checkouts, targetBranch); + await seedWorkflowContext(checkouts.execution.path, workId, targetBranch, [ + "## Hardened State", + "", + "TODO Mirror: active-required-items", + "Reconciliation: external_unverified", + "", + "### Required Workflow Items", + "- `init` | `resolved` | `activity`", + "- `spec` | `resolved` | `activity`", + "- `spec-review` | `resolved` | `activity`", + "- `code-research` | `resolved` | `activity`", + "- `planning` | `resolved` | `activity`", + "- `plan-review` | `resolved` | `activity`", + "- `planning-docs-review` | `not_applicable` | `activity`", + "- `phase:1:paw-gate-reconciliation` | `pending` | `activity`", + "- `final-review` | `pending` | `activity`", + "- `final-pr` | `pending` | `activity`", + "", + "### Gate Items", + "- `transition:after-spec-review` | `resolved` | `transition`", + "- `transition:after-plan-review` | `blocked` | `transition`", + "- `transition:after-final-review` | `pending` | `transition`", + "", + "### Configured Procedure Items", + "- `procedure:planning-review` | `not_applicable` | `procedure`", + "- `procedure:final-review` | `pending` | `procedure`", + ]); + const callerBefore = await captureCheckoutSnapshot(checkouts.caller); + const executionBefore = await captureCheckoutSnapshot(checkouts.execution); + + const answerer = new RuleBasedAnswerer([ + (req) => req.choices?.[0] ?? "yes", + ], false); + + const ctx = await createTestContext({ + fixture: checkouts.fixture, + executionPath: checkouts.execution.path, + skillOrAgent: "worktree-pr-strategy-blocked", + systemPrompt: buildPrompt(gitOpsSkill, workId, targetBranch), + answerer, + model: LIVE_MODEL, + toolPolicy: new PushToOriginPolicy(checkouts.execution.path, [checkouts.fixture.workDir]), + excludedTools: ["skill", "sql"], + }); + contexts.push(ctx); + + const response = await ctx.session.sendAndWait({ + prompt: [ + `1. Read .paw/work/${workId}/WorkflowContext.md and determine whether it is safe to create planning branch ${planBranch}.`, + "2. If hardened control state is unresolved or cannot be reconciled, stop immediately and explain the blocker.", + "3. Do not create, edit, stage, commit, checkout, or push anything in that case.", + ].join("\n"), + }, LIVE_TURN_TIMEOUT); + + const content = response?.data?.content ?? ""; + assert.match(content, /external_unverified|transition:after-plan-review|unresolved|block/i); + + const callerAfter = await captureCheckoutSnapshot(checkouts.caller); + assert.deepStrictEqual(callerAfter, callerBefore, "caller checkout should remain unchanged"); + const executionAfter = await captureCheckoutSnapshot(checkouts.execution); + assert.deepStrictEqual(executionAfter, executionBefore, "execution checkout should remain unchanged when hardened state blocks mutation"); + + assertToolCalls(ctx.toolLog, { + forbidden: ["create", "edit"], + bashMustNotInclude: [/git checkout/, /git switch/, /git add/, /git commit/, /git push/], + }); + }); });