NVIDIA-NeMo · andreatgretel · May 4, 2026
@@ -0,0 +1,204 @@
+# Agentic CI Fix Policy
+
+Prepended to every daily-suite recipe alongside `_runner.md`. Defines what
+"open a PR" means for these recipes and the rules that apply across all of
+them. Each suite recipe declares only its eligible finding categories, its
+branch types, and any risk-specific notes — everything else is here.
+
+When in doubt, fall back to report-only.
+
+## Localized fix bar
+
+A finding may be converted to a fix only if all hold:
+
+- **Bounded scope**: ≤3 files, ≤50 LOC net.
+- **Reversible**: no public API changes, no `__all__` deletions, no version
+  bumps (Dependabot owns those), no schema changes, no migrations.
+- **Self-evident**: the audit established both the problem *and* the unique
+  correct fix. Mechanical, not interpretive.
+- **Test-safe**: when the recipe declares `test_required`, run
+  `make test-<package>` for the affected package and abort on failure.
+- **Single concern**: one finding per PR.
+- **Allowlisted paths**: matches the suite's path allowlist.
+
+If the top-ranked candidate fails the bar, try the next. If none of the top
+5 qualify, skip the fix step and emit report-only.
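The bounded-scope bullet above can be checked mechanically once a diff summary is available. The following is a minimal sketch, not the recipe's actual implementation: `FileChange` and `within_bounded_scope` are hypothetical names, and "net" is assumed to mean added minus removed lines, taken as an absolute value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileChange:
    """Per-file diff summary (hypothetical shape, e.g. parsed from `git diff --numstat`)."""

    path: str
    added: int
    removed: int


def within_bounded_scope(
    changes: list[FileChange], max_files: int = 3, max_net_loc: int = 50
) -> bool:
    """Return True when the diff touches <=3 files and <=50 net LOC.

    Assumption: "net" is added minus removed, compared by absolute value.
    """
    files = {change.path for change in changes}
    net = sum(change.added - change.removed for change in changes)
    return len(files) <= max_files and abs(net) <= max_net_loc
```

Under this reading a large rewrite that adds and removes the same number of lines still passes. A stricter variant could cap `added + removed` instead.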
+
+## Allowlists
+
+### Per-suite path allowlist
+
+| Suite | Paths the recipe MAY modify |
+|-------|-----------------------------|
+| docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) |
+| dependencies | `packages/*/pyproject.toml` |
+| structure | `packages/*/src/**/*.py` |
+| code-quality | `packages/*/src/**/*.py` |
+| test-health | (no fix phase) |
+
+### Shared forbidden paths (all suites)
+
+- `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`,
+  `.git/**`, anything in `.gitignore`.
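One way to enforce the two path lists is to translate each glob into a regular expression and test forbidden patterns first. This is a sketch under stated assumptions: `glob_to_regex` and `may_modify` are hypothetical helpers, `**/` is assumed to match zero or more directories, and the "anything in `.gitignore`" and docstring-only rules are not modelled.

```python
from __future__ import annotations

import re

# Patterns copied from the tables above; the dict layout itself is an assumption.
ALLOWLIST = {
    "docs-and-references": [
        "architecture/**", "docs/**", "README.md", "CONTRIBUTING.md",
        "DEVELOPMENT.md", "STYLEGUIDE.md", "packages/*/src/**/*.py",
    ],
    "dependencies": ["packages/*/pyproject.toml"],
    "structure": ["packages/*/src/**/*.py"],
    "code-quality": ["packages/*/src/**/*.py"],
    "test-health": [],  # no fix phase
}
FORBIDDEN = [".github/workflows/**", ".agents/**", "pyproject.toml", ".git/**"]


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob where `*` stays inside one path segment and `**` may cross segments."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def may_modify(suite: str, path: str) -> bool:
    """Forbidden paths win; otherwise the path must match the suite's allowlist."""
    if any(glob_to_regex(p).match(path) for p in FORBIDDEN):
        return False
    return any(glob_to_regex(p).match(path) for p in ALLOWLIST.get(suite, []))
```

Checking the forbidden list first means a later allowlist change cannot accidentally open a shared forbidden path.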
+
+### Shared forbidden commands
+
+- `git push --force` (any variant), `git rebase`, `git reset --hard`,
+  `git branch -D`/`-d`/`--delete`.
+- `gh pr merge`, `gh pr close`, `gh pr review`.
+- `pip install`, `uv pip install` (use `make install-dev` only).
+
+## Runner-state schema
+
+Each daily recipe maintains two arrays in
+`{{memory_path}}/runner-state.json` beyond the existing `known_issues` /
+`baselines`:
+
+```json
+{
+  "fix_backlog": [
+    { "id": "<hash>", "category": "...", "first_seen": "YYYY-MM-DD",
+      "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } }
+  ],
+  "attempted_fixes": [
+    { "id": "<hash>", "attempts": [
+      { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD",
+        "branch": "agentic-ci/..." }
+    ] }
+  ]
+}
+```
+
+A per-suite boolean, `draft_until_proven`, controls draft-PR mode. It
+defaults to `true` for code-quality and is unset for other suites.
+
+### `fix_backlog` rules (audit phase populates this)
+
+- Append every detected finding in an eligible category. Update `last_seen`
+  if `id` already present.
+- Drop entries with `last_seen` older than 30 days.
+- Cap at 200 entries (drop oldest by `first_seen`).
+- Populated **before** the `known_issues` filter so fixable findings persist
+  even when their report row is suppressed for being unchanged.
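The `fix_backlog` rules above amount to an upsert followed by two pruning passes. A minimal sketch, assuming entries are plain dicts shaped like the schema and that "older than 30 days" is measured against the run date; `update_backlog` is a hypothetical name.

```python
from __future__ import annotations

from datetime import date, timedelta

MAX_AGE_DAYS = 30
MAX_ENTRIES = 200


def update_backlog(backlog: list[dict], detected: list[dict], today: date) -> list[dict]:
    """Upsert detected findings, drop stale entries, then cap the backlog size."""
    by_id = {entry["id"]: dict(entry) for entry in backlog}
    stamp = today.isoformat()
    for finding in detected:
        entry = by_id.get(finding["id"])
        if entry is None:
            by_id[finding["id"]] = {
                "id": finding["id"],
                "category": finding["category"],
                "first_seen": stamp,
                "last_seen": stamp,
                "data": finding.get("data", {}),
            }
        else:
            entry["last_seen"] = stamp  # already known: only refresh last_seen
    cutoff = today - timedelta(days=MAX_AGE_DAYS)
    fresh = [e for e in by_id.values() if date.fromisoformat(e["last_seen"]) >= cutoff]
    fresh.sort(key=lambda e: e["first_seen"])  # ISO dates sort oldest first
    return fresh[-MAX_ENTRIES:]  # over the cap: the oldest first_seen entries fall off
```

The function is meant to run on the full detected list, before any `known_issues` filtering, so suppressed report rows still refresh `last_seen`.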
+
+### `attempted_fixes` rules
+
+`outcome` ∈ `{open, merged, closed, abandoned}`.
+
+- `abandoned` means the recipe could not produce a PR (tests failed,
+  conflict, lint failed, allowlist rejected, etc.).
+- Reconcile against open PRs (`gh pr list`) at the start of each fix run
+  to recover from crashes that left state un-updated.
+- Prune: drop `merged` entries older than 90 days, and single
+  `closed`/`abandoned` entries older than 30 days.
+- Two-strike entries (≥2 `closed`/`abandoned`) are NOT pruned; they
+  surface in the report under `Repeatedly-failed fix attempts`.
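The pruning rules can be sketched as a single pass over `attempted_fixes`. Several details here are assumptions rather than policy: age is measured from the most recent attempt's `at` date, the decision is made per entry rather than per attempt, and `prune_attempted` is a hypothetical name.

```python
from __future__ import annotations

from datetime import date

FAILED = {"closed", "abandoned"}


def prune_attempted(entries: list[dict], today: date) -> list[dict]:
    """Apply the 90-day / 30-day pruning rules; two-strike entries always survive."""
    kept = []
    for entry in entries:
        attempts = entry["attempts"]
        failures = [a for a in attempts if a["outcome"] in FAILED]
        if not attempts or len(failures) >= 2:
            kept.append(entry)  # two-strike (or malformed) entries are never pruned
            continue
        latest = max(attempts, key=lambda a: a["at"])  # ISO dates compare as strings
        age_days = (today - date.fromisoformat(latest["at"])).days
        if latest["outcome"] == "merged" and age_days > 90:
            continue
        if latest["outcome"] in FAILED and age_days > 30:
            continue
        kept.append(entry)
    return kept
```

Entries whose latest attempt is still `open` are kept regardless of age, since reconciliation against `gh pr list` is what resolves them.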
+
+## Finding hash
+
+`finding_id = sha1(suite + ":" + canonical_key)[:12]`, where
+`canonical_key` uses durable identifiers only — never line numbers or free
+text:
+
+| Suite (category) | canonical_key |
+|------------------|---------------|
+| docs (broken-link) | `<source-file>:<target>` |
+| docs (docstring-drift) | `<source-file>:<symbol>:<param-or-empty>:<drift-type>` |
+| docs (arch-ref-rename) | `<doc-file>:<old-symbol>` |
+| dependencies (transitive-gap) | `<package>:<dep>:transitive` |
+| dependencies (unused) | `<package>:<dep>:unused` |
+| structure (missing-future) | `<source-file>:missing-future` |
+| structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` |
+| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:bare-except` |
+
+Symbols use fully-qualified Python names.
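The hash formula above maps directly onto `hashlib`. A minimal sketch, assuming the hex digest is what gets truncated and that the input is UTF-8 encoded (the policy states neither); the example path is illustrative only.

```python
import hashlib


def finding_id(suite: str, canonical_key: str) -> str:
    """Return the first 12 hex characters of sha1("<suite>:<canonical_key>")."""
    digest = hashlib.sha1(f"{suite}:{canonical_key}".encode("utf-8")).hexdigest()
    return digest[:12]


# Hypothetical structure finding, keyed as `<source-file>:missing-future`.
example_id = finding_id("structure", "packages/example/src/pkg/mod.py:missing-future")
```

Because the key carries no line numbers or free text, the id stays stable when the surrounding code moves within the file.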
+
+## Ranking
+
+Earlier criteria override later ones:
+
+1. **Fix confidence** (per-category):
+
+   | Category | Confidence |
+   |----------|-----------|
+   | structure / missing-future | 1.0 |
+   | structure / lazy-import | 0.9 |
+   | docs / broken-link | 0.9 |
+   | dependencies / transitive-gap | 0.85 |
+   | docs / arch-ref-rename | 0.8 |
+   | dependencies / unused | 0.75 |
+   | docs / docstring-drift | 0.75 |
+   | code-quality / bare-except | 0.6 |
+
+2. **Defect severity**:
+
+   | Severity | Examples |
+   |----------|----------|
+   | high | missing transitive dep, heavy import bypassing lazy system |
+   | medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API |
+   | low | broken link in dev-notes, missing `from __future__ import annotations`, unused dep |
+
+3. **User-facing impact** — visible to docs-site readers or plugin
+   consumers vs internal-only.
+
+4. **Recency** — newer findings rank above long-standing ones.
+
+Record the chosen finding's id, scores, and rationale at the top of
+`/tmp/audit-{{suite}}.md`.
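"Earlier criteria override later ones" is a lexicographic ordering, which a tuple sort key expresses directly. In this sketch the `severity` and `user_facing` fields and the `"suite/category"` key format are assumptions; only the confidence values come from the table above.

```python
from __future__ import annotations

from datetime import date

CONFIDENCE = {
    "structure/missing-future": 1.0,
    "structure/lazy-import": 0.9,
    "docs/broken-link": 0.9,
    "dependencies/transitive-gap": 0.85,
    "docs/arch-ref-rename": 0.8,
    "dependencies/unused": 0.75,
    "docs/docstring-drift": 0.75,
    "code-quality/bare-except": 0.6,
}
SEVERITY = {"high": 2, "medium": 1, "low": 0}


def rank_key(finding: dict) -> tuple:
    """Smaller tuples rank first, so each 'higher is better' criterion is negated."""
    return (
        -CONFIDENCE[finding["category"]],
        -SEVERITY[finding["severity"]],
        0 if finding["user_facing"] else 1,
        -date.fromisoformat(finding["first_seen"]).toordinal(),  # newer first
    )


def rank(findings: list[dict]) -> list[dict]:
    """Return findings ordered best candidate first."""
    return sorted(findings, key=rank_key)
```

Later criteria only break ties, so a low-severity `missing-future` finding still outranks a medium-severity `broken-link` one.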
+
+## Standard fix procedure
+
+The fix phase of every eligible recipe follows these steps. Suite recipes
+declare only the parts that vary (eligible categories, branch type,
+`test_required`, suite-specific quirks).
+
+1. Reconcile `attempted_fixes` against open PRs (`gh pr list`) to recover
+   any state lost to a prior crash.
+2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or
+   `merged`; surface two-strike entries in the report's
+   `Repeatedly-failed fix attempts` section and drop them from selection.
+3. Rank the remainder per the Ranking section.
+4. For each candidate, top 5 max:
+   1. Re-verify the finding still applies (re-grep / re-read). If not,
+      remove from `fix_backlog` and continue.
+   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
+      a non-allowlisted path, abandon and continue.
+   3. If the category sets `test_required: true`, run
+      `make test-<package>` for the package containing the change. On
+      failure: abandon and continue.
+   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
+      `<type>(agentic-ci): <one-line>`. Push.
+   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
+      hidden metadata block:
+      `<!-- agentic-ci finding=<id> suite=<suite> -->`
+   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
+      iff `draft_until_proven` is true for the suite.
+   7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
+   8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
+5. If every candidate tried (up to 5) was abandoned, append a one-line
+   note to the report and exit cleanly. The state already reflects the
+   abandonments.
+
+On any failure mid-flow: record `outcome: "abandoned"` for the chosen
+finding (with `pr_number: null`), leave any pushed branch in place
+(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
+to the next candidate.
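The naming conventions in steps 4 and 5 are simple string templates. The sketch below builds them and parses the hidden metadata block back out of a PR body, which could support the reconcile step. All function names are hypothetical, and the parsing regex assumes ids and suite names contain no whitespace.

```python
from __future__ import annotations

import re
from datetime import date


def branch_name(change_type: str, suite: str, day: date, slug: str) -> str:
    """Build `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`."""
    return f"agentic-ci/{change_type}/{suite}-{day:%Y%m%d}-{slug}"


def commit_title(change_type: str, summary: str) -> str:
    """Build the conventional title `<type>(agentic-ci): <one-line>`."""
    return f"{change_type}(agentic-ci): {summary}"


def metadata_block(finding: str, suite: str) -> str:
    """Build the hidden HTML comment embedded in the PR body."""
    return f"<!-- agentic-ci finding={finding} suite={suite} -->"


_META = re.compile(r"<!-- agentic-ci finding=(?P<finding>\S+) suite=(?P<suite>\S+) -->")


def parse_metadata(body: str) -> dict[str, str] | None:
    """Recover finding id and suite from a PR body, or None if the block is absent."""
    match = _META.search(body)
    return match.groupdict() if match else None
```

Round-tripping the metadata this way lets a crashed run map an open PR back to its finding without relying on the branch name.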
+
+## PR conventions
+
+- **Use `gh pr create --body-file`**, not `/create-pr`. The skill assumes
+  an interactive session and passes the body inline through the shell,
+  which breaks on backticks and special characters; CI needs determinism.
+- **Title**: conventional, `<type>(agentic-ci): <one-line>`.
+- **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
+- **Draft PRs**: `code-quality` opens PRs as draft until a maintainer
+  flips `draft_until_proven` to `false` in runner-state, after at least
+  two PRs from that suite have been marked ready for review and landed
+  clean.
+
+## Atomicity
+
+Each fix-phase invocation produces exactly one of:
+
+- **Report-only** — runner-state updated; no branch, commit, or PR.
+- **Report + PR** — same, plus a pushed branch, a commit, and a PR. The
+  `attempted_fixes` entry is recorded *before* the recipe exits.
+
+No half-states. The runner state is the source of truth for what the
+recipe has tried; never silently drop a failed attempt.
@@ -0,0 +1,17 @@
+## Phase directive
+
+This invocation runs the **AUDIT** phase only.
+
+- Execute the audit steps from the recipe and write the report to
+  `/tmp/audit-{{suite}}.md`.
+- Update `{{memory_path}}/runner-state.json` with detected findings,
+  including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE
+  applying the `known_issues` filter to the report, so fixable findings
+  persist across runs even when their report row is suppressed).
+- Do NOT attempt any fix. Do NOT create any branches, commits, or PRs.
+- Do NOT modify any files outside `{{memory_path}}/`.
+- A separate invocation will run the FIX phase if `fix_backlog` has
+  eligible candidates and the suite has a fix phase.
+- Read the recipe in full for context; the "Fix phase" section informs
+  which finding categories should populate `fix_backlog`, but you must
+  not act on them in this invocation.
@@ -0,0 +1,23 @@
+## Phase directive
+
+This invocation runs the **FIX** phase only.
+
+- The audit phase has already completed in a previous invocation. Its
+  report is at `/tmp/audit-{{suite}}.md` and
+  `{{memory_path}}/runner-state.json` has the populated `fix_backlog`.
+- Execute only the recipe's "Fix phase" section per `_fix-policy.md`.
+  Do NOT redo audit work; do NOT re-scan the codebase to rebuild
+  findings.
+- Pick the highest-ranked eligible candidate from `fix_backlog`, apply
+  the fix, run the package's tests if applicable, commit, push, and open
+  the PR using `gh pr create --body-file`.
+- Record the attempt in `attempted_fixes` (whether successful, abandoned,
+  or failed through the top-5 fallback) before exiting.
+- If no candidate qualifies after trying up to 5 of them, exit cleanly,
+  append a short note to `/tmp/audit-{{suite}}.md` describing what was
+  tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
+- Do NOT delete branches, even on failure (per `_runner.md` and
+  `_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow
+  to reap over time.
+- Read the recipe in full for context, but treat the audit phase as
+  already done.
@@ -76,6 +76,14 @@ Write all output to a temp file (e.g., `/tmp/recipe-output.md`). The workflow
 will handle posting it. Do not post directly to GitHub - the workflow controls
 output routing.
 
-If your recipe produces code changes, commit them on a new branch and use
-`/create-pr` to open a pull request. The branch name should follow the
-pattern `agentic-ci/chore/{suite}-YYYYMMDD`.
+If your recipe produces code changes, commit them on a new branch following
+the pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}` where `{type}`
+matches the change kind (`chore`/`docs`/`fix`/`refactor`).
+
+For PR creation in CI, use `gh pr create --body-file /tmp/pr-body-<suite>.md`
+directly rather than the `/create-pr` skill. The skill assumes an interactive
+session (it can prompt about uncommitted changes, base branch, etc.) and
+shells the body inline, which breaks on backticks and special characters.
+Daily-suite recipes that open PRs are governed by `_fix-policy.md` — read it
+for the full PR contract (allowlists, draft mode, hidden metadata, branch
+naming, atomicity).
@@ -35,6 +35,15 @@ Read `{{memory_path}}/runner-state.json` for baselines from previous runs
 re-reporting known issues. Flag metrics that are trending in the wrong
 direction compared to the previous baseline.
 
+This recipe also maintains `fix_backlog` and `attempted_fixes` per
+`_fix-policy.md`. Update `fix_backlog` for every detected bare-except
+finding *before* the `known_issues` filter applies. (Other categories
+remain report-only and do not enter `fix_backlog`.)
+
+The `draft_until_proven` flag in runner-state controls whether this
+suite's PRs are opened as draft. Default `true` until a maintainer flips
+it to `false`.
+
 ## Instructions
 
 ### 1. Complexity hotspots
@@ -238,9 +247,51 @@ Write the report to `/tmp/audit-{{suite}}.md`:
 
 If no findings in any category, write `NO_FINDINGS` on the first line instead.
 
+## Fix phase
+
+Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:
+
+### Eligible categories
+
+| Category | Branch type | test_required | Eligibility note |
+|----------|-------------|---------------|------------------|
+| bare-except | `refactor` | yes | Replace `except:` / `except BaseException:` with the specific exception type. Eligible only when grep across the try-block confirms **exactly one** exception type is plausibly raised, verified by inspecting the called functions or imported library docs. Multiple plausible types → ineligible. Test files are excluded (different exception-handling standards). |
+
+`fix_backlog.data` should record the proposed replacement exception type
+and the grep evidence used to determine it. Within bare-except findings,
+prefer ones in user-facing modules (`packages/data-designer/src/`) over
+internal helpers (the ranking impact criterion handles this once
+`data.user_facing` is set).
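For illustration, a bare-except narrowing that would meet the "exactly one plausible exception type" bar might look like the following. Both functions are hypothetical and not taken from the repository; the example assumes the input is always a `str`, so `json.loads` can only fail with `json.JSONDecodeError`.

```python
import json


def load_settings_before(text: str) -> dict:
    """Before: the bare except also swallows KeyboardInterrupt and SystemExit."""
    try:
        return json.loads(text)
    except:  # noqa: E722 (the pattern the audit flags)
        return {}


def load_settings_after(text: str) -> dict:
    """After: only the one exception json.loads raises for malformed text is caught."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}
```

If the try-block also called code that could raise, say, `OSError`, the finding would have more than one plausible type and stay report-only.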
+
+The PR body should include the before/after of the try-block plus the
+grep evidence that justified the chosen exception type, and a note that
+the PR is draft until landing rate is proven (ask reviewers to mark
+ready-for-review if the change is correct).
+
+**Draft mode**: this suite opens PRs as draft until a maintainer flips
+`draft_until_proven` to `false` in runner-state, after at least two
+non-draft PRs have landed clean. Bare-except narrowing is the most
+inference-heavy fix in any suite (confidence 0.6); recipe judgement has
+to be earned before promotion. Two-strike findings here are an
+especially important signal — they suggest the detector is producing
+false positives in an already-cautious category.
+
+**Not eligible** — stays report-only:
+
+- Complexity refactors, type annotation additions, exception hierarchy
+  normalization (judgement-heavy).
+- **TODO line deletion** — the audit's "looks done" judgement is not
+  mechanical enough to justify deleting code. Deletion is forbidden.
+
 ## Constraints
 
-- Do not modify any files. This is a read-only audit.
+- Outside the fix phase, this recipe is read-only — do not modify files.
+- Within the fix phase, only modify paths in the suite's path allowlist
+  (`packages/*/src/**/*.py`). Test files are excluded.
+- **TODO line deletion is forbidden.** The audit phase still inventories
+  TODOs, but the fix phase does not act on them.
+- Bare-except narrowing is only eligible when the exception type is
+  unambiguous. When in doubt, skip.
 - Do not flag test files for type coverage or exception hygiene. Tests have
   different standards.
 - Do not duplicate ruff checks (W, F, I, ICN, PIE, TID, UP*). Those are

@@ -26,6 +26,10 @@ dependency versions. After the audit, update `known_issues` and
 `baselines.dependency_versions` with the current state. Skip reporting issues
 that already appear in `known_issues`.
 
+This recipe also maintains `fix_backlog` and `attempted_fixes` per
+`_fix-policy.md`. Update `fix_backlog` for every detected finding *before*
+the `known_issues` filter applies.
+
 ## Instructions
 
 ### 1. Inventory current dependencies
@@ -154,12 +158,40 @@ Write the report to `/tmp/audit-{{suite}}.md`:
 
 If no findings in any category, write `NO_FINDINGS` on the first line instead.
 
+## Fix phase
+
+Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:
+
+### Eligible categories
+
+| Category | Branch type | test_required | Eligibility note |
+|----------|-------------|---------------|------------------|
+| transitive-gap | `chore` | yes | Add the imported module's distribution to the `dependencies` array under `[project]` in the `pyproject.toml` of the package that imports it. Use the version specifier from a package that already declares it; otherwise use a specifier for the latest stable release. Insert in alphabetical order; match existing quote/specifier style. |
+| unused | `chore` | yes | Remove the declaration. Eligible only when grep across the package's `src/`, lazy-import system, plugin entry points, and tests turns up zero references. |
+
+`fix_backlog.data` should record: for transitive-gap, the importing source
+files and proposed version specifier; for unused, which other packages
+also declare the dep.
+
+Before running `make test-<package>`, run `make install-dev` to confirm
+the lockfile resolves cleanly. `make install-dev` is the only sanctioned
+install command (no direct `pip install` or `uv pip install`).
+
+**Not eligible** — stays report-only:
+
+- Cross-package version reconciliation, version pinning concerns
+  (judgement-heavy).
+- CVE response (Dependabot's job).
+
 ## Constraints
 
-- Do not modify any files. This is a read-only audit.
-- Do not install packages or run `pip install`. Only inspect `pyproject.toml`
-  and source files.
+- Outside the fix phase, this recipe is read-only — do not modify files.
+- Within the fix phase, only modify `packages/*/pyproject.toml`. The
+  repo-root `pyproject.toml` is forbidden.
+- `make install-dev` is the only sanctioned install command. Do not
+  invoke `pip install` or `uv pip install` directly.
 - Do not run `pip audit` (may not be available on the runner). Focus on
   structural dependency analysis, not CVE scanning (Dependabot handles that).
 - Do not recommend changes to dependencies you haven't verified are actually
   problematic. False positives erode trust in the audit.
+- Version pinning changes are explicitly out of scope for the fix phase.