204 changes: 204 additions & 0 deletions .agents/recipes/_fix-policy.md
@@ -0,0 +1,204 @@
# Agentic CI Fix Policy

Prepended to every daily-suite recipe alongside `_runner.md`. Defines what
"open a PR" means for these recipes and the rules that apply across all of
them. Each suite recipe declares only its eligible finding categories, its
branch types, and any risk-specific notes; everything else is here.

When in doubt, fall back to report-only.

## Localized fix bar

A finding may be converted to a fix only if all hold:

- **Bounded scope**: ≤3 files, ≤50 LOC net.
- **Reversible**: no public API changes, no `__all__` deletions, no version
bumps (Dependabot owns those), no schema changes, no migrations.
- **Self-evident**: the audit established both the problem *and* the unique
correct fix. Mechanical, not interpretive.
- **Test-safe**: when the recipe declares `test_required`, run
`make test-<package>` for the affected package and abort on failure.
- **Single concern**: one finding per PR.
- **Allowlisted paths**: matches the suite's path allowlist.

If the top-ranked candidate fails the bar, try the next. If none of the top
5 qualify, skip the fix step and emit report-only.
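The size limits in the bar lend themselves to a mechanical check. The following is a minimal sketch, not the recipe's actual implementation: `FileChange` and `within_size_bar` are hypothetical names, and it assumes "net" means lines added minus lines removed, taken as an absolute value (the policy does not define it further).

```python
from dataclasses import dataclass

MAX_FILES = 3      # "≤3 files" from the bar
MAX_NET_LOC = 50   # "≤50 LOC net" from the bar


@dataclass(frozen=True)
class FileChange:
    """Hypothetical per-file diff summary; not part of the policy."""
    path: str
    added: int
    removed: int


def within_size_bar(changes):
    """Return True when a diff satisfies the bounded-scope limits.

    Assumption: "net" is |added - removed| summed over all touched files.
    """
    net_loc = abs(sum(c.added - c.removed for c in changes))
    return len(changes) <= MAX_FILES and net_loc <= MAX_NET_LOC
```

The other criteria (reversible, self-evident, single concern) need judgement about the content of the change and are deliberately left out of this sketch.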

## Allowlists

### Per-suite path allowlist

| Suite | Paths the recipe MAY modify |
|-------|-----------------------------|
| docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) |
| dependencies | `packages/*/pyproject.toml` |
| structure | `packages/*/src/**/*.py` |
| code-quality | `packages/*/src/**/*.py` |
| test-health | (no fix phase) |

### Shared forbidden paths (all suites)

- `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`,
`.git/**`, anything in `.gitignore`.
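As an illustration of how the allowlist and the forbidden paths interact, here is a rough sketch. It assumes `**` spans directories and `*` stays within one path segment; the helper names are hypothetical, only a subset of the suites is copied in, and `.gitignore` entries and the docstring-only restriction are not modelled.

```python
import re

# Subset of the per-suite table; extend with the remaining suites as needed.
ALLOWLIST = {
    "dependencies": ["packages/*/pyproject.toml"],
    "structure": ["packages/*/src/**/*.py"],
    "code-quality": ["packages/*/src/**/*.py"],
    "test-health": [],  # no fix phase
}
# ".gitignore" entries are also forbidden but are not modelled here.
FORBIDDEN = [".github/workflows/**", ".agents/**", "pyproject.toml", ".git/**"]


def glob_to_regex(pattern):
    """Translate a glob: `**/` = zero or more dirs, `*` = within one segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches_any(path, patterns):
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def may_modify(suite, path):
    """Forbidden paths win over any allowlist entry."""
    if matches_any(path, FORBIDDEN):
        return False
    return matches_any(path, ALLOWLIST.get(suite, []))
```

Checking the forbidden list first means the repo-root `pyproject.toml` is rejected even for a suite whose allowlist mentions `pyproject.toml` files.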

### Shared forbidden commands

- `git push --force` (any variant), `git rebase`, `git reset --hard`,
`git branch -D`/`-d`/`--delete`.
- `gh pr merge`, `gh pr close`, `gh pr review`.
- `pip install`, `uv pip install` (use `make install-dev` only).

## Runner-state schema

Each daily recipe maintains two arrays in
`{{memory_path}}/runner-state.json` beyond the existing `known_issues` /
`baselines`:

```json
{
  "fix_backlog": [
    { "id": "<hash>", "category": "...", "first_seen": "YYYY-MM-DD",
      "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } }
  ],
  "attempted_fixes": [
    { "id": "<hash>", "attempts": [
      { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD",
        "branch": "agentic-ci/..." }
    ] }
  ]
}
```

Also: `draft_until_proven` (boolean, per-suite, default `true` for
code-quality and unset elsewhere) controls draft-PR mode.

### `fix_backlog` rules (audit phase populates this)

- Append every detected finding in an eligible category. Update `last_seen`
if `id` already present.
- Drop entries with `last_seen` older than 30 days.
- Cap at 200 entries (drop oldest by `first_seen`).
- Populated **before** the `known_issues` filter so fixable findings persist
even when their report row is suppressed for being unchanged.
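The upsert, age-out, and cap rules above could be sketched as follows. This is an illustration under assumptions: `update_backlog` is a hypothetical helper, findings are assumed to arrive as dicts with `id`, `category`, and optional `data`, and an entry exactly 30 days old is kept.

```python
from datetime import date, timedelta

MAX_AGE_DAYS = 30   # drop entries not seen for longer than this
MAX_ENTRIES = 200   # cap on backlog size


def update_backlog(backlog, findings, today):
    """Return a new backlog list; the input list is not mutated."""
    by_id = {entry["id"]: dict(entry) for entry in backlog}
    stamp = today.isoformat()
    for finding in findings:
        entry = by_id.get(finding["id"])
        if entry is None:
            by_id[finding["id"]] = {
                "id": finding["id"],
                "category": finding["category"],
                "first_seen": stamp,
                "last_seen": stamp,
                "data": finding.get("data", {}),
            }
        else:
            entry["last_seen"] = stamp  # refresh an existing entry
    cutoff = today - timedelta(days=MAX_AGE_DAYS)
    fresh = [e for e in by_id.values()
             if date.fromisoformat(e["last_seen"]) >= cutoff]
    # ISO dates sort lexicographically, so the oldest first_seen comes first.
    fresh.sort(key=lambda e: e["first_seen"])
    return fresh[-MAX_ENTRIES:]  # keep the newest when over the cap
```

The function is meant to run before any `known_issues` filtering, so the report filter never affects what lands in the backlog.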

### `attempted_fixes` rules

`outcome` ∈ `{open, merged, closed, abandoned}`.

- `abandoned` means the recipe could not produce a PR (tests failed,
conflict, lint failed, allowlist rejected, etc.).
- Reconcile against open PRs (`gh pr list`) at the start of each fix run
to recover from crashes that left state un-updated.
- Prune: drop `merged` entries older than 90 days and single
`closed`/`abandoned` entries older than 30 days.
- Two-strike entries (≥2 `closed`/`abandoned`) are NOT pruned; they
surface in the report under `Repeatedly-failed fix attempts`.
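One possible reading of the pruning rules is sketched below. It assumes pruning is decided per entry from its most recent attempt, that `at` is an ISO date, and that entries without attempts carry no information; none of these details are fixed by the policy, and `prune_attempted` is a hypothetical name.

```python
from datetime import date

FAILED = {"closed", "abandoned"}


def is_two_strike(entry):
    """True when an entry has at least two closed/abandoned attempts."""
    return sum(a["outcome"] in FAILED for a in entry["attempts"]) >= 2


def prune_attempted(entries, today):
    kept = []
    for entry in entries:
        if not entry["attempts"]:
            continue  # assumption: an entry with no attempts is dropped
        if is_two_strike(entry):
            kept.append(entry)  # never pruned; surfaced in the report
            continue
        latest = max(entry["attempts"], key=lambda a: a["at"])
        age_days = (today - date.fromisoformat(latest["at"])).days
        if latest["outcome"] == "merged" and age_days > 90:
            continue
        if latest["outcome"] in FAILED and age_days > 30:
            continue
        kept.append(entry)  # open entries and recent outcomes stay
    return kept
```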

## Finding hash

`finding_id = sha1(suite + ":" + canonical_key)[:12]`, where
`canonical_key` uses durable identifiers only, never line numbers or free
text:

| Suite (category) | canonical_key |
|------------------|---------------|
| docs (broken-link) | `<source-file>:<target>` |
| docs (docstring-drift) | `<source-file>:<symbol>:<param-or-empty>:<drift-type>` |
| docs (arch-ref-rename) | `<doc-file>:<old-symbol>` |
| dependencies (transitive-gap) | `<package>:<dep>:transitive` |
| dependencies (unused) | `<package>:<dep>:unused` |
| structure (missing-future) | `<source-file>:missing-future` |
| structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` |
| code-quality (bare-except) | `<source-file>:<enclosing-symbol>:bare-except` |

Symbols use fully-qualified Python names.
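The hash formula translates almost directly into code. A minimal sketch, assuming UTF-8 encoding and a hexadecimal digest (the formula does not state either); the example path is hypothetical.

```python
import hashlib


def finding_id(suite, canonical_key):
    """First 12 hex characters of sha1("<suite>:<canonical_key>")."""
    payload = f"{suite}:{canonical_key}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:12]


# Example key for a structure / missing-future finding (hypothetical path).
example = finding_id("structure", "packages/core/src/pkg/mod.py:missing-future")
```

Because the key contains no line numbers, the id stays stable when unrelated edits shift the finding within its file.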

## Ranking

Earlier criteria override later ones:

1. **Fix confidence** (per-category):

| Category | Confidence |
|----------|-----------|
| structure / missing-future | 1.0 |
| structure / lazy-import | 0.9 |
| docs / broken-link | 0.9 |
| dependencies / transitive-gap | 0.85 |
| docs / arch-ref-rename | 0.8 |
| dependencies / unused | 0.75 |
| docs / docstring-drift | 0.75 |
| code-quality / bare-except | 0.6 |

2. **Defect severity**:

| Severity | Examples |
|----------|----------|
| high | missing transitive dep, heavy import bypassing lazy system |
| medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API |
| low | broken link in dev-notes, missing `__future__ import annotations`, unused dep |

3. **User-facing impact**: visible to docs-site readers or plugin
consumers vs internal-only.

4. **Recency**: newer findings rank above long-standing ones.

Record the chosen finding's id, scores, and rationale at the top of
`/tmp/audit-{{suite}}.md`.
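Since earlier criteria override later ones, the ranking behaves like a lexicographic sort. The sketch below is one way to express that; the `severity` and `user_facing` fields and the use of `first_seen` for recency are assumptions for illustration, not a documented schema.

```python
from datetime import date

# Confidence values copied from the table above.
CONFIDENCE = {
    "structure / missing-future": 1.0,
    "structure / lazy-import": 0.9,
    "docs / broken-link": 0.9,
    "dependencies / transitive-gap": 0.85,
    "docs / arch-ref-rename": 0.8,
    "dependencies / unused": 0.75,
    "docs / docstring-drift": 0.75,
    "code-quality / bare-except": 0.6,
}
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def rank_key(finding):
    """Tuple compared left to right, so earlier criteria dominate."""
    return (
        -CONFIDENCE[finding["category"]],                         # 1. confidence
        SEVERITY_ORDER[finding["severity"]],                      # 2. severity
        0 if finding["user_facing"] else 1,                       # 3. impact
        -date.fromisoformat(finding["first_seen"]).toordinal(),   # 4. recency
    )


def rank(findings):
    return sorted(findings, key=rank_key)
```

Note that under this ordering a low-severity `missing-future` finding still outranks a high-severity `bare-except` finding, because confidence is compared first.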

## Standard fix procedure

The fix phase of every eligible recipe follows these steps. Suite recipes
declare only the parts that vary (eligible categories, branch type,
`test_required`, suite-specific quirks).

1. Reconcile `attempted_fixes` against open PRs (`gh pr list`) to recover
any state lost to a prior crash.
2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or
`merged`; surface two-strike entries in the report's
`Repeatedly-failed fix attempts` section and drop them from selection.
3. Rank the remainder per the Ranking section.
4. For each candidate, top 5 max:
   1. Re-verify the finding still applies (re-grep / re-read). If not,
      remove from `fix_backlog` and continue.
   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
      a non-allowlisted path, abandon and continue.
   3. If the category sets `test_required: true`, run
      `make test-<package>` for the package containing the change. On
      failure: abandon and continue.
   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
      `<type>(agentic-ci): <one-line>`. Push.
   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
      hidden metadata block:
      `<!-- agentic-ci finding=<id> suite=<suite> -->`
   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
      iff `draft_until_proven` is true for the suite.
   7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
   8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
5. If all 5 candidates were abandoned, append a one-line note to the
report and exit cleanly. The state already reflects the abandonments.

On any failure mid-flow: record `outcome: "abandoned"` for the chosen
finding (with `pr_number: null`), leave any pushed branch in place
(`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
to the next candidate.
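Step 2 of the procedure (filtering the backlog against prior attempts) could look roughly like this. It is a sketch under assumptions: attempts are stored in chronological order, and an entry whose latest attempt is `open` or `merged` is dropped before the two-strike check. `split_backlog` is a hypothetical name.

```python
FAILED = {"closed", "abandoned"}


def split_backlog(fix_backlog, attempted_fixes):
    """Return (candidates, two_strike) per step 2 of the procedure."""
    attempts_by_id = {e["id"]: e["attempts"] for e in attempted_fixes}
    candidates, two_strike = [], []
    for finding in fix_backlog:
        attempts = attempts_by_id.get(finding["id"], [])
        # Assumption: the last list element is the most recent attempt.
        latest = attempts[-1]["outcome"] if attempts else None
        if latest in {"open", "merged"}:
            continue  # already in flight or already fixed
        failures = sum(a["outcome"] in FAILED for a in attempts)
        if failures >= 2:
            two_strike.append(finding)  # reported, never selected
            continue
        candidates.append(finding)
    return candidates, two_strike
```

The `candidates` list then goes through ranking, and `two_strike` feeds the `Repeatedly-failed fix attempts` section of the report.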

## PR conventions

- **Use `gh pr create --body-file`**, not `/create-pr`. The skill is
interactive-only and shells the body inline; CI needs determinism.
- **Title**: conventional, `<type>(agentic-ci): <one-line>`.
- **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
- **Draft PRs**: `code-quality` opens draft until a maintainer flips
`draft_until_proven` to `false` in runner-state, after at least two
non-draft PRs from that suite have landed clean.
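The naming conventions can be captured in a few small helpers. This is an illustrative sketch with hypothetical function names; the only `gh` flags used are `--title`, `--body-file`, and `--draft`, and the slug and finding id in the example are invented.

```python
from datetime import date


def branch_name(change_type, suite, day, slug):
    """agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>"""
    return f"agentic-ci/{change_type}/{suite}-{day:%Y%m%d}-{slug}"


def pr_title(change_type, summary):
    """<type>(agentic-ci): <one-line>"""
    return f"{change_type}(agentic-ci): {summary}"


def metadata_comment(finding, suite):
    """Hidden block embedded in the PR body."""
    return f"<!-- agentic-ci finding={finding} suite={suite} -->"


def pr_create_argv(suite, title, draft_until_proven):
    """Argument list for `gh pr create`; pass it to subprocess.run unchanged."""
    argv = ["gh", "pr", "create", "--title", title,
            "--body-file", f"/tmp/pr-body-{suite}.md"]
    if draft_until_proven:
        argv.append("--draft")
    return argv


example_branch = branch_name("refactor", "code-quality",
                             date(2025, 1, 31), "narrow-except")
```

Building an argument list rather than a shell string sidesteps the quoting problems with backticks and special characters that motivate `--body-file` in the first place.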

## Atomicity

Each fix-phase invocation produces exactly one of:

- **Report-only**: runner-state updated; no branch, commit, or PR.
- **Report + PR**: same, plus a pushed branch, a commit, and a PR. The
`attempted_fixes` entry is recorded *before* the recipe exits.

No half-states. The runner state is the source of truth for what the
recipe has tried; never silently drop a failed attempt.
17 changes: 17 additions & 0 deletions .agents/recipes/_phase-audit.md
@@ -0,0 +1,17 @@
## Phase directive

This invocation runs the **AUDIT** phase only.

- Execute the audit steps from the recipe and write the report to
`/tmp/audit-{{suite}}.md`.
- Update `{{memory_path}}/runner-state.json` with detected findings,
including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE
applying the `known_issues` filter to the report, so fixable findings
persist across runs even when their report row is suppressed).
- Do NOT attempt any fix. Do NOT create any branches, commits, or PRs.
- Do NOT modify any files outside `{{memory_path}}/`.
- A separate invocation will run the FIX phase if `fix_backlog` has
eligible candidates and the suite has a fix phase.
- Read the recipe in full for context; the "Fix phase" section informs
which finding categories should populate `fix_backlog`, but you must
not act on them in this invocation.
23 changes: 23 additions & 0 deletions .agents/recipes/_phase-fix.md
@@ -0,0 +1,23 @@
## Phase directive

This invocation runs the **FIX** phase only.

- The audit phase has already completed in a previous invocation. Its
report is at `/tmp/audit-{{suite}}.md` and
`{{memory_path}}/runner-state.json` has the populated `fix_backlog`.
- Execute only the recipe's "Fix phase" section per `_fix-policy.md`.
Do NOT redo audit work; do NOT re-scan the codebase to rebuild
findings.
- Pick the highest-ranked eligible candidate from `fix_backlog`, apply
the fix, run the package's tests if applicable, commit, push, and open
the PR using `gh pr create --body-file`.
- Record the attempt in `attempted_fixes` (whether successful, abandoned,
or failed through the top-5 fallback) before exiting.
- If no candidate qualifies after trying up to 5 of them, exit cleanly,
append a short note to `/tmp/audit-{{suite}}.md` describing what was
tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
- Do NOT delete branches, even on failure (per `_runner.md` and
`_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow
to reap over time.
- Read the recipe in full for context, but treat the audit phase as
already done.
14 changes: 11 additions & 3 deletions .agents/recipes/_runner.md
@@ -76,6 +76,14 @@ Write all output to a temp file (e.g., `/tmp/recipe-output.md`). The workflow
will handle posting it. Do not post directly to GitHub - the workflow controls
output routing.

If your recipe produces code changes, commit them on a new branch and use
`/create-pr` to open a pull request. The branch name should follow the
pattern `agentic-ci/chore/{suite}-YYYYMMDD`.
If your recipe produces code changes, commit them on a new branch following
the pattern `agentic-ci/{type}/{suite}-YYYYMMDD-{short-slug}` where `{type}`
matches the change kind (`chore`/`docs`/`fix`/`refactor`).

For PR creation in CI, use `gh pr create --body-file /tmp/pr-body-<suite>.md`
directly rather than the `/create-pr` skill. The skill assumes an interactive
session (it can prompt about uncommitted changes, base branch, etc.) and
shells the body inline, which breaks on backticks and special characters.
Daily-suite recipes that open PRs are governed by `_fix-policy.md`; read it
for the full PR contract (allowlists, draft mode, hidden metadata, branch
naming, atomicity).
53 changes: 52 additions & 1 deletion .agents/recipes/code-quality/recipe.md
@@ -35,6 +35,15 @@ Read `{{memory_path}}/runner-state.json` for baselines from previous runs
re-reporting known issues. Flag metrics that are trending in the wrong
direction compared to the previous baseline.

This recipe also maintains `fix_backlog` and `attempted_fixes` per
`_fix-policy.md`. Update `fix_backlog` for every detected bare-except
finding *before* the `known_issues` filter applies. (Other categories
remain report-only and do not enter `fix_backlog`.)

The `draft_until_proven` flag in runner-state controls whether this
suite's PRs are opened as draft. Default `true` until a maintainer flips
it to `false`.

## Instructions

### 1. Complexity hotspots
@@ -238,9 +247,51 @@ Write the report to `/tmp/audit-{{suite}}.md`:

If no findings in any category, write `NO_FINDINGS` on the first line instead.

## Fix phase

Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:

### Eligible categories

| Category | Branch type | test_required | Eligibility note |
|----------|-------------|---------------|------------------|
| bare-except | `refactor` | yes | Replace `except:` / `except BaseException:` with the specific exception type. Eligible only when grep across the try-block confirms **exactly one** exception type is plausibly raised, verified by inspecting the called functions or imported library docs. Multiple plausible types → ineligible. Test files are excluded (different exception-handling standards). |

`fix_backlog.data` should record the proposed replacement exception type
and the grep evidence used to determine it. Within bare-except findings,
prefer ones in user-facing modules (`packages/data-designer/src/`) over
internal helpers (the ranking impact criterion handles this once
`data.user_facing` is set).
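To make the eligibility rule concrete, here is an invented before/after pair. The functions are hypothetical and not taken from the repository; the example assumes the caller always passes a string, so `json.JSONDecodeError` is the only exception the try-block can plausibly raise.

```python
import json


def parse_settings_before(text):
    """Before: the bare except hides every failure, including bugs."""
    try:
        return json.loads(text)
    except:  # noqa: E722 - the pattern the audit flags
        return {}


def parse_settings_after(text):
    """After: only malformed JSON is handled; other errors propagate."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}
```

If the audit found a second plausible type for the same try-block (for example because `text` may be `None`), the finding would be ineligible and stay report-only.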

The PR body should include the before/after of the try-block plus the
grep evidence that justified the chosen exception type, and a note that
the PR is draft until landing rate is proven (ask reviewers to mark
ready-for-review if the change is correct).

**Draft mode**: this suite opens PRs as draft until a maintainer flips
`draft_until_proven` to `false` in runner-state, after at least two
non-draft PRs have landed clean. Bare-except narrowing is the most
inference-heavy fix in any suite (confidence 0.6); recipe judgement has
to be earned before promotion. Two-strike findings here are an
especially important signal: they suggest the detector is producing
false positives in an already-cautious category.

**Not eligible** (stays report-only):

- Complexity refactors, type annotation additions, exception hierarchy
normalization (judgement-heavy).
- **TODO line deletion**: the audit's "looks done" judgement is not
mechanical enough to delete code on. Deletion is forbidden.

## Constraints

- Do not modify any files. This is a read-only audit.
- Outside the fix phase, this recipe is read-only; do not modify files.
- Within the fix phase, only modify paths in the suite's path allowlist
(`packages/*/src/**/*.py`). Test files are excluded.
- **TODO line deletion is forbidden.** The audit phase still inventories
TODOs, but the fix phase does not act on them.
- Bare-except narrowing is only eligible when the exception type is
unambiguous. When in doubt, skip.
- Do not flag test files for type coverage or exception hygiene. Tests have
different standards.
- Do not duplicate ruff checks (W, F, I, ICN, PIE, TID, UP*). Those are
38 changes: 35 additions & 3 deletions .agents/recipes/dependencies/recipe.md
@@ -26,6 +26,10 @@ dependency versions. After the audit, update `known_issues` and
`baselines.dependency_versions` with the current state. Skip reporting issues
that already appear in `known_issues`.

This recipe also maintains `fix_backlog` and `attempted_fixes` per
`_fix-policy.md`. Update `fix_backlog` for every detected finding *before*
the `known_issues` filter applies.

## Instructions

### 1. Inventory current dependencies
@@ -154,12 +158,40 @@ Write the report to `/tmp/audit-{{suite}}.md`:

If no findings in any category, write `NO_FINDINGS` on the first line instead.

## Fix phase

Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:

### Eligible categories

| Category | Branch type | test_required | Eligibility note |
|----------|-------------|---------------|------------------|
| transitive-gap | `chore` | yes | Add the imported module to `[project.dependencies]` of the package that imports it. Use the version specifier from a package that already declares it; otherwise the latest stable specifier. Insert in alphabetical order; match existing quote/specifier style. |
| unused | `chore` | yes | Remove the declaration. Eligible only when grep across the package's `src/`, lazy-import system, plugin entry points, and tests turns up zero references. |

`fix_backlog.data` should record: for transitive-gap, the importing source
files and proposed version specifier; for unused, which other packages
also declare the dep.
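The "insert in alphabetical order" part of the transitive-gap fix can be sketched as below. It assumes the existing dependency list is already sorted by name and compares names case-insensitively without full PEP 503 normalization; the helper names, packages, and version specifiers are illustrative only.

```python
import bisect
import re


def dep_name(spec):
    """Extract the distribution name from a string such as 'httpx>=0.27'."""
    return re.split(r"[<>=!~\[;\s]", spec, maxsplit=1)[0].lower()


def insert_dependency(deps, new_spec):
    """Return a new list with new_spec placed alphabetically by name."""
    names = [dep_name(d) for d in deps]
    new_name = dep_name(new_spec)
    if new_name in names:
        return list(deps)  # already declared; nothing to add
    index = bisect.bisect_left(names, new_name)
    return deps[:index] + [new_spec] + deps[index:]
```

Writing the result back to `pyproject.toml` while preserving the file's quote and specifier style is a separate step that this sketch does not cover.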

Before running `make test-<package>`, run `make install-dev` to confirm
the lockfile resolves cleanly. `make install-dev` is the only sanctioned
install command (no direct `pip install` or `uv pip install`).

**Not eligible** (stays report-only):

- Cross-package version reconciliation, version pinning concerns
(judgement-heavy).
- CVE response (Dependabot's job).

## Constraints

- Do not modify any files. This is a read-only audit.
- Do not install packages or run `pip install`. Only inspect `pyproject.toml`
and source files.
- Outside the fix phase, this recipe is read-only; do not modify files.
- Within the fix phase, only modify `packages/*/pyproject.toml`. The
repo-root `pyproject.toml` is forbidden.
- `make install-dev` is the only sanctioned install command. Do not
invoke `pip install` or `uv pip install` directly.
- Do not run `pip audit` (may not be available on the runner). Focus on
structural dependency analysis, not CVE scanning (Dependabot handles that).
- Do not recommend changes to dependencies you haven't verified are actually
problematic. False positives erode trust in the audit.
- Version pinning changes are explicitly out of scope for the fix phase.