feat(agentic-ci): decision-ready triage and daily PR fixes #600
Open
andreatgretel wants to merge 8 commits into main from andreatgretel/feat/agentic-ci-improvements
Changes from 2 commits
Commits
15455d7 feat(agentic-ci): decision-ready triage and daily PR fixes (andreatgretel)
19db106 fix(agentic-ci): address review findings before merge (andreatgretel)
ca16f52 fix(agentic-ci): close residuals from review pass 2 (andreatgretel)
b77c647 fix(agentic-ci): identity-based partial-post detection in triage fall… (andreatgretel)
ee06bac fix(agentic-ci): close remaining bot-review findings (andreatgretel)
42f8673 Merge branch 'main' into andreatgretel/feat/agentic-ci-improvements (andreatgretel)
969fdaa fix(agentic-ci): close Greptile pass-2 findings (timeout, re-verify w… (andreatgretel)
91e8749 fix(agentic-ci): close Greptile pass-3 P1s in triage fallback (andreatgretel)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,223 @@ | ||
| # Agentic CI Fix Policy | ||
|
|
||
| Prepended to every daily-suite recipe alongside `_runner.md`. Defines what | ||
| "open a PR" means for these recipes and the rules that apply across all of | ||
| them. Each suite recipe declares only its eligible finding categories, its | ||
| branch types, and any risk-specific notes; everything else is here. | ||
|
|
||
| When in doubt, fall back to report-only. | ||
|
|
||
| ## Localized fix bar | ||
|
|
||
| A finding may be converted to a fix only if all hold: | ||
|
|
||
| - **Bounded scope**: ≤3 files, ≤50 LOC net. | ||
| - **Reversible**: no public API changes, no `__all__` deletions, no version | ||
| bumps (Dependabot owns those), no schema changes, no migrations. | ||
| - **Self-evident**: the audit established both the problem *and* the unique | ||
| correct fix. Mechanical, not interpretive. | ||
| - **Test-safe**: when the recipe declares `test_required`, run the | ||
| per-package test target for the affected package and abort on failure. | ||
| Mapping (the Makefile does not expose `test-<package>` directly): | ||
|
|
||
| | Package directory | Test target | | ||
| |-------------------|-------------| | ||
| | `packages/data-designer-config` | `make test-config` | | ||
| | `packages/data-designer-engine` | `make test-engine` | | ||
| | `packages/data-designer` | `make test-interface` | | ||
| - **Single concern**: one finding per PR. | ||
| - **Allowlisted paths**: matches the suite's path allowlist. | ||
|
|
||
| If the top-ranked candidate fails the bar, try the next. If none of the top | ||
| 5 qualify, skip the fix step and emit report-only. | ||
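The mechanical parts of the bar can be checked in code. The following is a minimal sketch, not the recipe's actual implementation. It assumes "net" means added lines minus deleted lines, taken as an absolute value. `FileDiff`, `within_bounded_scope`, and `target_for_path` are hypothetical names introduced for illustration. The test-target mapping is copied from the table above.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FILES = 3      # "≤3 files"
MAX_NET_LOC = 50   # "≤50 LOC net"

# Package directory -> make target, as listed in the policy table.
TEST_TARGETS = {
    "packages/data-designer-config": "make test-config",
    "packages/data-designer-engine": "make test-engine",
    "packages/data-designer": "make test-interface",
}


@dataclass(frozen=True)
class FileDiff:
    """Per-file line counts for a candidate fix (hypothetical structure)."""
    path: str
    added: int
    deleted: int


def within_bounded_scope(diff: list[FileDiff]) -> bool:
    """True if the diff satisfies the file-count and net-LOC limits.

    Assumption: net LOC is |added - deleted| summed over all files.
    """
    net_loc = sum(f.added - f.deleted for f in diff)
    return len(diff) <= MAX_FILES and abs(net_loc) <= MAX_NET_LOC


def target_for_path(path: str) -> Optional[str]:
    """Return the make target for the package that contains `path`, if any."""
    for pkg_dir, target in TEST_TARGETS.items():
        # The trailing slash keeps "data-designer" from matching "data-designer-config".
        if path.startswith(pkg_dir + "/"):
            return target
    return None
```

The remaining criteria (reversible, self-evident, single concern) are judgment calls that a check like this cannot cover.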
|
|
||
| ## Allowlists | ||
|
|
||
| ### Per-suite path allowlist | ||
|
|
||
| | Suite | Paths the recipe MAY modify | | ||
| |-------|-----------------------------| | ||
| | docs-and-references | `architecture/**`, `docs/**`, `README.md`, `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md`, `packages/*/src/**/*.py` (docstring-only edits) | | ||
| | dependencies | `packages/*/pyproject.toml` | | ||
| | structure | `packages/*/src/**/*.py` | | ||
| | code-quality | `packages/*/src/**/*.py` | | ||
| | test-health | (no fix phase) | | ||
|
|
||
| ### Shared forbidden paths (all suites) | ||
|
|
||
| - `.github/workflows/**`, `.agents/**`, repo-root `pyproject.toml`, | ||
| `.git/**`, anything in `.gitignore`. | ||
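A path check against the two tables above might look like the sketch below. It is illustrative only. The glob semantics are an assumption (`*` stays within one path segment, `**/` matches zero or more directories), and `glob_to_regex` and `is_allowed` are hypothetical helpers. The "anything in `.gitignore`" rule and the docstring-only restriction for docs-and-references are not covered, since neither can be decided from the path alone.

```python
import re

ALLOWLIST = {
    "docs-and-references": [
        "architecture/**", "docs/**", "README.md", "CONTRIBUTING.md",
        "DEVELOPMENT.md", "STYLEGUIDE.md", "packages/*/src/**/*.py",
    ],
    "dependencies": ["packages/*/pyproject.toml"],
    "structure": ["packages/*/src/**/*.py"],
    "code-quality": ["packages/*/src/**/*.py"],
    "test-health": [],  # no fix phase
}

# Shared forbidden paths; "pyproject.toml" here is the repo-root file only.
FORBIDDEN = [".github/workflows/**", ".agents/**", "pyproject.toml", ".git/**"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a regex under the assumed semantics."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")   # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")            # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")         # anything within one segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches_any(path: str, patterns: list[str]) -> bool:
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def is_allowed(suite: str, path: str) -> bool:
    """Forbidden paths win over the per-suite allowlist."""
    if matches_any(path, FORBIDDEN):
        return False
    return matches_any(path, ALLOWLIST.get(suite, []))
```

Paths are expected to be repo-relative with forward slashes, for example `is_allowed("structure", "packages/data-designer/src/x.py")`.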
|
|
||
| ### Shared forbidden commands | ||
|
|
||
| - `git push --force` (any variant), `git rebase`, `git reset --hard`, | ||
| `git branch -D`/`-d`/`--delete`. | ||
| - `gh pr merge`, `gh pr close`, `gh pr review`. | ||
| - `pip install`, `uv pip install` (use `make install-dev` only). | ||
|
|
||
| ## Runner-state schema | ||
|
|
||
| Each daily recipe maintains two arrays in | ||
| `{{memory_path}}/runner-state.json` beyond the existing `known_issues` / | ||
| `baselines`: | ||
|
|
||
| ```json | ||
| { | ||
| "fix_backlog": [ | ||
| { "id": "<hash>", "category": "...", "first_seen": "YYYY-MM-DD", | ||
| "last_seen": "YYYY-MM-DD", "data": { /* category fields */ } } | ||
| ], | ||
| "attempted_fixes": [ | ||
| { "id": "<hash>", "attempts": [ | ||
| { "pr_number": 612, "outcome": "merged", "at": "YYYY-MM-DD", | ||
| "branch": "agentic-ci/..." } | ||
| ] } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| Also: `draft_until_proven` (boolean, per-suite, default `true` for | ||
| code-quality and unset elsewhere) controls draft-PR mode. | ||
|
|
||
| ### `fix_backlog` rules (audit phase populates this) | ||
|
|
||
| - Append every detected finding in an eligible category. If `id` is already | ||
| present, **refresh both `last_seen` and `data`** with the current scan's | ||
| values. The `data` field is used by the fix phase to apply the change | ||
| without re-scanning, so stale `data` would let an old plan drive a new | ||
| PR after the underlying file moved or changed. | ||
| - Drop entries with `last_seen` older than 30 days. | ||
| - Cap at 200 entries (drop oldest by `first_seen`). | ||
| - Populated **before** the `known_issues` filter so fixable findings persist | ||
| even when their report row is suppressed for being unchanged. | ||
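The backlog rules above can be sketched as two small functions. This is an illustration under assumptions, not the recipe's code. The entry fields follow the JSON schema shown earlier, dates are ISO `YYYY-MM-DD` strings (so they compare correctly as text), and `upsert_finding` and `prune_backlog` are hypothetical names.

```python
from datetime import date, timedelta

MAX_AGE_DAYS = 30   # drop entries whose last_seen is older than this
MAX_ENTRIES = 200   # cap; oldest by first_seen are dropped first


def upsert_finding(backlog: list, finding_id: str, category: str,
                   data: dict, today: date) -> None:
    """Append a new finding, or refresh last_seen AND data on an existing one."""
    stamp = today.isoformat()
    for entry in backlog:
        if entry["id"] == finding_id:
            entry["last_seen"] = stamp
            entry["data"] = data  # stale data must not drive a later fix
            return
    backlog.append({
        "id": finding_id, "category": category,
        "first_seen": stamp, "last_seen": stamp, "data": data,
    })


def prune_backlog(backlog: list, today: date) -> list:
    """Apply the 30-day expiry, then the 200-entry cap.

    Note: the result is ordered by first_seen, oldest first.
    """
    cutoff = (today - timedelta(days=MAX_AGE_DAYS)).isoformat()
    fresh = [e for e in backlog if e["last_seen"] >= cutoff]
    fresh.sort(key=lambda e: e["first_seen"])
    return fresh[-MAX_ENTRIES:]
```

Calling `upsert_finding` for every eligible finding before the `known_issues` filter runs, then `prune_backlog` once per audit, would match the ordering the rules describe.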
|
|
||
| ### `attempted_fixes` rules | ||
|
|
||
| `outcome` ∈ `{open, merged, closed, abandoned}`. | ||
|
|
||
| - `abandoned` means the recipe could not produce a PR (tests failed, | ||
| conflict, lint failed, allowlist rejected, etc.). | ||
| - Reconcile against open PRs (`gh pr list`) at the start of each fix run | ||
| to recover from crashes that left state un-updated. | ||
| - Prune: drop `merged` entries older than 90 days. Do **not** prune | ||
| `closed` or `abandoned` entries by age: pruning a single-strike entry | ||
| would erase the history needed to ever reach the two-strike threshold. | ||
| The 200-entry cap (with oldest-first eviction by `first_seen`) handles | ||
| long-tail cleanup. | ||
| - Two-strike entries (≥2 `closed`/`abandoned`) surface in the report | ||
| under `Repeatedly-failed fix attempts` and are filtered from selection | ||
| permanently. | ||
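As a rough sketch of how selection could consult `attempted_fixes`: the code below counts `closed`/`abandoned` outcomes for the two-strike rule and also skips findings whose latest attempt is `open` or `merged`. It assumes attempts are appended in chronological order, so the last element is the latest. `is_two_strike` and `is_selectable` are hypothetical names.

```python
FAILED_OUTCOMES = {"closed", "abandoned"}
IN_FLIGHT_OR_DONE = {"open", "merged"}


def is_two_strike(entry: dict) -> bool:
    """True once a finding has at least two closed/abandoned attempts."""
    failures = sum(1 for a in entry["attempts"] if a["outcome"] in FAILED_OUTCOMES)
    return failures >= 2


def is_selectable(finding_id: str, attempted_fixes: list) -> bool:
    """Decide whether a backlog entry may be picked for a fix attempt."""
    for entry in attempted_fixes:
        if entry["id"] != finding_id:
            continue
        if is_two_strike(entry):
            return False  # filtered permanently, reported separately
        attempts = entry["attempts"]
        # Assumption: attempts[-1] is the most recent attempt.
        if attempts and attempts[-1]["outcome"] in IN_FLIGHT_OR_DONE:
            return False
    return True
```

A finding with a single `abandoned` attempt stays selectable, which is why single-strike entries must not be pruned by age.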
|
|
||
| ## Finding hash | ||
|
|
||
| `finding_id = sha1(suite + ":" + canonical_key)[:12]`, where | ||
| `canonical_key` uses durable identifiers only β never line numbers or free | ||
| text: | ||
|
|
||
| | Suite (category) | canonical_key | | ||
| |------------------|---------------| | ||
| | docs (broken-link) | `<source-file>:<target>` | | ||
| | docs (docstring-drift) | `<source-file>:<symbol>:<param-or-empty>:<drift-type>` | | ||
| | docs (arch-ref-rename) | `<doc-file>:<old-symbol>` | | ||
| | dependencies (transitive-gap) | `<package>:<dep>:transitive` | | ||
| | dependencies (unused) | `<package>:<dep>:unused` | | ||
| | structure (missing-future) | `<source-file>:missing-future` | | ||
| | structure (lazy-import) | `<source-file>:lazy-import:<imported-module>` | | ||
| | code-quality (bare-except) | `<source-file>:<enclosing-symbol>:<try-body-hash>:bare-except` | | ||
|
|
||
| Symbols use fully-qualified Python names. | ||
| `try-body-hash` is `sha1(<try-block body, leading/trailing whitespace | ||
| stripped, internal lines preserved>)[:8]`; it is needed because a function | ||
| can contain multiple `try:` blocks with bare excepts that would | ||
| otherwise collide on the same finding id. | ||
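The hash definitions translate almost directly into Python. The sketch below assumes UTF-8 encoding and a hexadecimal digest, neither of which the policy states. Which suite string goes into the hash (for example `docs` versus `docs-and-references`) is also not settled here, so the caller passes it in.

```python
import hashlib


def finding_id(suite: str, canonical_key: str) -> str:
    """sha1(suite + ":" + canonical_key), truncated to 12 hex characters."""
    raw = f"{suite}:{canonical_key}".encode("utf-8")  # encoding is an assumption
    return hashlib.sha1(raw).hexdigest()[:12]


def try_body_hash(try_body: str) -> str:
    """sha1 of the try-block body, outer whitespace stripped, truncated to 8."""
    return hashlib.sha1(try_body.strip().encode("utf-8")).hexdigest()[:8]
```

Because the key contains no line numbers, moving a `try:` block within its file leaves `finding_id` unchanged as long as the enclosing symbol and the body stay the same.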
|
|
||
| ## Ranking | ||
|
|
||
| Earlier criteria override later ones: | ||
|
|
||
| 1. **Fix confidence** (per-category): | ||
|
|
||
| | Category | Confidence | | ||
| |----------|-----------| | ||
| | structure / missing-future | 1.0 | | ||
| | structure / lazy-import | 0.9 | | ||
| | docs / broken-link | 0.9 | | ||
| | dependencies / transitive-gap | 0.85 | | ||
| | docs / arch-ref-rename | 0.8 | | ||
| | dependencies / unused | 0.75 | | ||
| | docs / docstring-drift | 0.75 | | ||
| | code-quality / bare-except | 0.6 | | ||
|
|
||
| 2. **Defect severity**: | ||
|
|
||
| | Severity | Examples | | ||
| |----------|----------| | ||
| | high | missing transitive dep, heavy import bypassing lazy system | | ||
| | medium | broken doc link visible on docs site, bare-except hiding errors, docstring drift on public API | | ||
| | low | broken link in dev-notes, missing `from __future__ import annotations`, unused dep | | ||
|
|
||
| 3. **User-facing impact**: visible to docs-site readers or plugin | ||
| consumers vs internal-only. | ||
|
|
||
| 4. **Recency**: newer findings rank above long-standing ones. | ||
|
|
||
| Record the chosen finding's id, scores, and rationale at the top of | ||
| `/tmp/audit-{{suite}}.md`. | ||
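"Earlier criteria override later ones" is a lexicographic ordering, which a tuple sort key expresses directly. The sketch below is one possible reading, not the recipe's code. The confidence values are copied from the table above. The `suite`, `severity`, and `user_facing` fields are hypothetical, since the runner-state schema does not define them, and recency is taken from `first_seen`.

```python
from datetime import date

# (suite, category) -> fix confidence, from the policy table.
CONFIDENCE = {
    ("structure", "missing-future"): 1.0,
    ("structure", "lazy-import"): 0.9,
    ("docs", "broken-link"): 0.9,
    ("dependencies", "transitive-gap"): 0.85,
    ("docs", "arch-ref-rename"): 0.8,
    ("dependencies", "unused"): 0.75,
    ("docs", "docstring-drift"): 0.75,
    ("code-quality", "bare-except"): 0.6,
}
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def rank_key(finding: dict) -> tuple:
    """Smaller tuples sort first; each element only breaks ties in the previous."""
    return (
        -CONFIDENCE[(finding["suite"], finding["category"])],   # 1. confidence, high first
        SEVERITY_RANK[finding["severity"]],                     # 2. severity, high first
        0 if finding["user_facing"] else 1,                     # 3. user-facing first
        -date.fromisoformat(finding["first_seen"]).toordinal(), # 4. newer first
    )


def rank(findings: list) -> list:
    return sorted(findings, key=rank_key)
```

With this key, a low-severity `missing-future` finding still outranks a high-severity `lazy-import` finding, because confidence is compared first.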
|
|
||
| ## Standard fix procedure | ||
|
|
||
| The fix phase of every eligible recipe follows these steps. Suite recipes | ||
| declare only the parts that vary (eligible categories, branch type, | ||
| `test_required`, suite-specific quirks). | ||
|
|
||
| 1. Reconcile `attempted_fixes` against open PRs (`gh pr list`) to recover | ||
| any state lost to a prior crash. | ||
| 2. Filter `fix_backlog`: drop entries whose latest attempt is `open` or | ||
| `merged`; surface two-strike entries in the report's | ||
| `Repeatedly-failed fix attempts` section and drop them from selection. | ||
| 3. Rank the remainder per the Ranking section. | ||
| 4. For each candidate, top 5 max: | ||
| 1. Re-verify the finding still applies (re-grep / re-read). If not, | ||
| remove from `fix_backlog` and continue. | ||
| 2. Apply the fix. If the diff exceeds the localized-fix bar or touches | ||
| a non-allowlisted path, abandon and continue. | ||
| 3. If the category sets `test_required: true`, run the per-package | ||
| test target from the "Localized fix bar" mapping (for example | ||
| `make test-config`) for the package containing the change. On | ||
| failure: abandon and continue. | ||
| 4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit: | ||
| `<type>(agentic-ci): <one-line>`. Push. | ||
| 5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the | ||
| hidden metadata block: | ||
| `<!-- agentic-ci finding=<id> suite=<suite> -->` | ||
| 6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft` | ||
| iff `draft_until_proven` is true for the suite. | ||
| 7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`. | ||
| 8. Record `attempted_fixes` entry with `outcome: "open"` and exit. | ||
| 5. If all 5 candidates were abandoned, append a one-line note to the | ||
| report and exit cleanly. The state already reflects the abandonments. | ||
|
|
||
| On any failure mid-flow: record `outcome: "abandoned"` for the chosen | ||
| finding (with `pr_number: null`), leave any pushed branch in place | ||
| (`pr-stale.yml` will reap it; branch deletion is forbidden), and continue | ||
| to the next candidate. | ||
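The naming conventions used in the procedure (branch, commit title, hidden metadata block) are plain string templates. The sketch below formats them and also parses the metadata comment back out of a PR body. The idea that reconciliation could use the parser to map open PRs back to finding ids is an assumption, and all function names are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def branch_name(fix_type: str, suite: str, day: date, slug: str) -> str:
    """agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>"""
    return f"agentic-ci/{fix_type}/{suite}-{day:%Y%m%d}-{slug}"


def commit_title(fix_type: str, summary: str) -> str:
    """<type>(agentic-ci): <one-line>"""
    return f"{fix_type}(agentic-ci): {summary}"


def metadata_block(finding_id: str, suite: str) -> str:
    """Hidden HTML comment embedded in the PR body."""
    return f"<!-- agentic-ci finding={finding_id} suite={suite} -->"


METADATA_RE = re.compile(r"<!-- agentic-ci finding=(\S+) suite=(\S+) -->")


def parse_metadata(pr_body: str) -> Optional[tuple]:
    """Return (finding_id, suite) if the body carries a metadata block."""
    match = METADATA_RE.search(pr_body)
    return (match.group(1), match.group(2)) if match else None
```

For example, `branch_name("fix", "structure", date(2025, 6, 30), "missing-future")` yields `agentic-ci/fix/structure-20250630-missing-future`.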
|
|
||
| ## PR conventions | ||
|
|
||
| - **Use `gh pr create --body-file`**, not `/create-pr`. The skill is | ||
| interactive-only and shells the body inline; CI needs determinism. | ||
| - **Title**: conventional, `<type>(agentic-ci): <one-line>`. | ||
| - **Labels**: `agentic-ci`, `agentic-ci/<suite>`. | ||
| - **Draft PRs**: `code-quality` opens draft until a maintainer flips | ||
| `draft_until_proven` to `false` in runner-state, after at least two | ||
| non-draft PRs from that suite have landed clean. | ||
|
|
||
| ## Atomicity | ||
|
|
||
| Each fix-phase invocation produces exactly one of: | ||
|
|
||
| - **Report-only**: runner-state updated; no branch, commit, or PR. | ||
| - **Report + PR**: same, plus a pushed branch, a commit, and a PR. The | ||
| `attempted_fixes` entry is recorded *before* the recipe exits. | ||
|
|
||
| No half-states. The runner state is the source of truth for what the | ||
| recipe has tried; never silently drop a failed attempt. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| ## Phase directive | ||
|
|
||
| This invocation runs the **AUDIT** phase only. | ||
|
|
||
| - Execute the audit steps from the recipe and write the report to | ||
| `/tmp/audit-{{suite}}.md`. | ||
| - Update `{{memory_path}}/runner-state.json` with detected findings, | ||
| including `fix_backlog` entries per `_fix-policy.md` (populated BEFORE | ||
| applying the `known_issues` filter to the report, so fixable findings | ||
| persist across runs even when their report row is suppressed). | ||
| - Do NOT attempt any fix. Do NOT create any branches, commits, or PRs. | ||
| - Do NOT modify any files outside `{{memory_path}}/`. | ||
| - A separate invocation will run the FIX phase if `fix_backlog` has | ||
| eligible candidates and the suite has a fix phase. | ||
| - Read the recipe in full for context; the "Fix phase" section informs | ||
| which finding categories should populate `fix_backlog`, but you must | ||
| not act on them in this invocation. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| ## Phase directive | ||
|
|
||
| This invocation runs the **FIX** phase only. | ||
|
|
||
| - The audit phase has already completed in a previous invocation. Its | ||
| report is at `/tmp/audit-{{suite}}.md` and | ||
| `{{memory_path}}/runner-state.json` has the populated `fix_backlog`. | ||
| - Execute only the recipe's "Fix phase" section per `_fix-policy.md`. | ||
| Do NOT redo audit work; do NOT re-scan the codebase to rebuild | ||
| findings. | ||
| - Pick the highest-ranked eligible candidate from `fix_backlog`, apply | ||
| the fix, run the package's tests if applicable, commit, push, and open | ||
| the PR using `gh pr create --body-file`. | ||
| - Record the attempt in `attempted_fixes` (whether successful, abandoned, | ||
| or failed through the top-5 fallback) before exiting. | ||
| - If no candidate qualifies after trying up to 5 of them, exit cleanly, | ||
| append a short note to `/tmp/audit-{{suite}}.md` describing what was | ||
| tried, and update `attempted_fixes` accordingly. Do NOT open a PR. | ||
| - Do NOT delete branches, even on failure (per `_runner.md` and | ||
| `_fix-policy.md`). Leave them for the existing `pr-stale.yml` workflow | ||
| to reap over time. | ||
| - Read the recipe in full for context, but treat the audit phase as | ||
| already done. |