feat(agentic-ci): decision-ready triage and daily PR fixes#600
feat(agentic-ci): decision-ready triage and daily PR fixes#600andreatgretel wants to merge 8 commits intomainfrom
Conversation
Reorganize the weekly issue-triage report around recommended actions (close as resolved, close as duplicate, needs maintainer decision, ready for assignment, stuck PR, duplicate PRs, stale) so each flagged item carries action + evidence + rationale and can be resolved without opening it. Multi-comment split with i/N markers and orphan reconciliation when the report grows or shrinks. Flip the four daily audit suites with mechanical fix categories from read-only reports to opening one PR per run: - docs-and-references: broken-link, docstring-drift, arch-ref-rename - structure: missing-future, lazy-import - dependencies: transitive-gap, unused - code-quality: bare-except (draft until landing rate proven) test-health stays report-only (all candidates require inferring intent). The shared procedure - fix_backlog selection, finding-hash spec for stable cross-run identification, attempted_fixes lifecycle with two-strike escalation, allowlists, ranking, branch/PR conventions - lives in .agents/recipes/_fix-policy.md. Each suite recipe declares only its eligible categories, branch types, and test requirements. Workflow runs claude twice per suite (audit, then conditionally fix), each capped at the existing --max-turns 50. Fix call is gated on non-empty fix_backlog and skipped entirely for test-health.
PR #600 Review —
|
Greptile SummaryThis PR promotes four daily audit suites from read-only reporters to active fixers, centralizing the shared fix contract in
|
| Filename | Overview |
|---|---|
| .agents/recipes/_fix-policy.md | New load-bearing policy document; localized-fix bar, per-suite allowlists, finding-hash spec, ranking table, two-strike/cap eviction logic, and standard fix procedure all correct; two-strike cap exemption previously flagged is present in its fixed form. |
| .github/workflows/agentic-ci-daily.yml | Audit/fix split well-implemented: timeout doubled to 40 min, git identity uses valid RFC-5322 noreply address, backlog gate correctly handles skipped/failed steps, fix step correctly gated on audit success + non-test-health + non-empty backlog. |
| .github/workflows/agentic-ci-issue-triage.yml | Fallback correctly posts only MISSING parts (not all parts); test() guard before capture() prevents jq stream truncation on non-matching comments; --paginate multi-page issue resolved by emitting individual part indices rather than per-page counts. |
| .agents/recipes/issue-triage/recipe.md | Rewritten around action buckets; multi-part split on bucket boundaries with correct part numbering; PATCH/DELETE reconciliation in step 5 keeps coherent comment set; fallback section correctly scoped to numbered files. |
| .agents/recipes/code-quality/recipe.md | Fix phase added for bare-except narrowing only; draft_until_proven default true; TODO deletion explicitly forbidden; grep updated to catch both except: and except BaseException:. |
Sequence Diagram
sequenceDiagram
participant GHA as GitHub Actions Runner
participant Audit as claude (audit phase)
participant State as runner-state.json
participant Fix as claude (fix phase)
participant GH as GitHub API
GHA->>GHA: Checkout main + restore cache
GHA->>Audit: _phase-audit.md + _runner.md + _fix-policy.md + recipe
Audit->>State: Write fix_backlog (before known_issues filter)
Audit->>GHA: /tmp/audit-{suite}.md
GHA->>State: jq .fix_backlog length → BACKLOG_SIZE
alt suite==test-health OR BACKLOG_SIZE==0 OR audit failed
GHA->>GHA: Skip fix phase
else eligible backlog entries exist
GHA->>Fix: _phase-fix.md + _runner.md + _fix-policy.md + recipe
Fix->>GH: gh pr list (reconcile attempted_fixes)
Fix->>Fix: Rank and select top candidate
Fix->>Fix: Re-verify finding still applies
Fix->>Fix: Apply fix (<=3 files, <=50 LOC)
Fix->>GHA: git commit + push branch
Fix->>GH: gh pr create --body-file /tmp/pr-body-{suite}.md
Fix->>State: Record attempted_fixes outcome=open
end
GHA->>State: Update last_run (audit gate only)
GHA->>GHA: Upload artifacts (audit log, fix log, PR body)
Reviews (6): Last reviewed commit: "fix(agentic-ci): close Greptile pass-3 P..." | Re-trigger Greptile
- Map per-package test targets explicitly in _fix-policy.md (Makefile exposes test-config/test-engine/test-interface, not test-<package>). - Use github-actions[bot] noreply identity for commits the recipes produce. - Refresh fix_backlog.data when an id already exists so the fix phase cannot drive a PR from stale data after the underlying file changed. - Stop time-pruning closed/abandoned attempted_fixes entries — pruning before the two-strike threshold erases the history needed to escalate. Single-strike entries now age out only via the 200-entry cap. - Disambiguate bare-except findings within the same function by including a try-body hash in the finding id. - Audit grep for code-quality now matches both `except:` and `except BaseException:`, in parity with the fix eligibility. - Restrict transitive-gap fix eligibility to cases where a sibling package already declares the dep (avoids inventing version specifiers from scratch). - Issue-triage workflow handles multi-part reports in both the fallback post step and the job summary; recipe always writes numbered parts.
- Replace remaining `make test-<package>` references with pointers to the mapping table; only the table itself uses that placeholder now. - Fix `gh api --paginate | jq | length` returning per-page counts: slurp with `jq -s 'add // 0'` to get a single total. - Compare posted-comment count to expected part count so a partial post (agent posted part 1 but not 2/3) triggers the fallback instead of being silently treated as success. - Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're not at the mercy of the runner's default shell. - Disambiguate bare-except findings whose try-body hashes collide by adding a per-function ordinal to the canonical_key. - Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at` (the schema has no `first_seen` field).
…back Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an identity-based check that extracts every i/N marker seen in today-dated bot comments and verifies each expected i is present. A duplicate post of one part can no longer mask a missing other.
- Exempt two-strike attempted_fixes entries from the 200-entry cap eviction. Cap now evicts non-two-strike oldest-first by attempts[0].at; two-strike entries are silently-forgotten only in the pathological all-200-are-two-strike case (itself a signal). - Specify the attempted_fixes PR-marker reconciliation algorithm: scan open PR bodies for the `<!-- agentic-ci finding=<id> -->` marker and back-fill missing entries. - Tighten the daily workflow conditionals to gate on explicit step outcomes (steps.audit.outcome == 'success' rather than success()) so a future pre-audit gate cannot accidentally trip the fix step.
…ording) - Bump daily-suite job timeout from 20 to 40 minutes. The split into two sequential `claude --max-turns 50` invocations can saturate a 20-minute budget; a mid-fix SIGTERM would leave an orphaned branch and inconsistent runner-state. - Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids rebuilding fix_backlog from scratch but does NOT override the per-candidate re-verification step required by _fix-policy.md step 4.1 (re-grep / re-read the specific file the candidate points at). Single-candidate re-verification is required; whole-codebase re-scanning is forbidden.
- Guard `jq capture()` with a `test()` select. `capture()` errors on non-match instead of returning empty, which would truncate SEEN_PARTS if any unrelated today-dated bot comment lacks the triage marker (e.g. from a sibling workflow). Adding the test() guard ensures capture() only runs on bodies that already match. - Iterate the MISSING[] array when posting fallback parts, not the full PARTS[] array. Posting all parts when only some were missing was creating duplicate comments for the parts the agent already successfully posted.
📋 Summary
Reorganize the weekly issue-triage report around recommended actions so each flagged item is decision-ready, and flip four of the five daily audit suites from read-only reports to opening one PR per run for the most important localized fix. The shared procedure (selection, ranking, allowlists, attempted-fixes lifecycle, two-strike escalation, branch/PR conventions) lives in a single
_fix-policy.md; suite recipes declare only their eligible categories.🔗 Related Issue
N/A — extends the agentic-CI work tracked in plan 472. We can link a follow-up tracking issue once one is opened.
🔄 Changes
.agents/recipes/_fix-policy.md— universal localized-fix bar (≤3 files, ≤50 LOC, reversible, self-evident, test-safe, single-concern), per-suite path/command allowlists, finding-hash spec for stable cross-run identification,fix_backlog/attempted_fixesschema, ranking criteria (confidence > severity > impact > recency), draft-PR mode, two-strike escalation, standard fix procedure, directgh pr create --body-filePR-creation pattern..agents/recipes/_phase-audit.mdand.agents/recipes/_phase-fix.md— phase directives prepended to eachclaudeinvocation so each call knows which phase it executes..agents/recipes/issue-triage/recipe.md: action-organized buckets (Close as resolved,Close as duplicate,Needs maintainer decision,Ready for assignment,Stuck PR,Duplicate PRs,Stale, consider closing), per-row Action / Evidence / Rationale columns, healthy items collapsed to count +<details>, multi-comment split with:i/Nmarkers and orphan reconciliation, plus aRepeatedly-failed fix attemptssection that surfaces two-strike findings from the daily suites._fix-policy.md):docs-and-references: broken-link, docstring-drift (signature-driven), arch-ref-rename. Non-draft.structure: missing-future, lazy-import. Non-draft. Dead exports stay report-only.dependencies: transitive-gap, unused. Non-draft.code-quality: bare-except narrowing only. Draft PRs untildraft_until_provenis flipped after two non-draft PRs land clean. TODO-line deletion explicitly forbidden.test-health: explicit "no fix phase, all categories report-only" with a future-candidate note for test-isolation violations..github/workflows/agentic-ci-daily.yml: split the recipe step into twoclaudeinvocations (audit, then conditionally fix), each with the existing--max-turns 50. Fix step gated on audit success ANDmatrix.suite != 'test-health'AND non-emptyfix_backlog. Adds a git-identity step, expands artifact upload to include the fix log + PR body, and reports both phase outcomes in the job summary. No branch auto-deletion._runner.mdupdates: generalized branch prefix beyondchore, documented why CI usesgh pr create --body-fileinstead of/create-pr.🔍 Attention Areas
.agents/recipes/_fix-policy.md— load-bearing contract. Allowlists, ranking, two-strike escalation, and the standard fix procedure all live here. Worth reviewing in full..github/workflows/agentic-ci-daily.yml— the audit/fix split, the backlog gate (fromJSON(steps.backlog.outputs.size || '0') > 0), and thematrix.suite != 'test-health'guard.agentic-ci,agentic-ci/docs-and-references,agentic-ci/structure,agentic-ci/dependencies,agentic-ci/code-quality. The recipes apply these viagh pr edit --add-label; missing labels won't block the PR opening but will surface aghwarning.🧪 Testing
make test— N/A. No Python or other source code changed; the diff is recipes (markdown), workflow YAML, and one new policy doc.python3 -c "import yaml; yaml.safe_load(...)")workflow_dispatchone at a time and read the first 1–2 actual runs before promoting to the next, per the validation section of the plan. Two-strike escalation, allowlist enforcement, and re-attempt blocking are all observable from the runner state and PR list.✅ Checklist
_fix-policy.mditself is the architecture doc for the fix phase.