Skip to content

feat(agentic-ci): decision-ready triage and daily PR fixes#600

Open
andreatgretel wants to merge 8 commits intomainfrom
andreatgretel/feat/agentic-ci-improvements
Open

feat(agentic-ci): decision-ready triage and daily PR fixes#600
andreatgretel wants to merge 8 commits intomainfrom
andreatgretel/feat/agentic-ci-improvements

Conversation

@andreatgretel
Copy link
Copy Markdown
Contributor

📋 Summary

Reorganize the weekly issue-triage report around recommended actions so each flagged item is decision-ready, and flip four of the five daily audit suites from read-only reports to opening one PR per run for the most important localized fix. The shared procedure (selection, ranking, allowlists, attempted-fixes lifecycle, two-strike escalation, branch/PR conventions) lives in a single _fix-policy.md; suite recipes declare only their eligible categories.

🔗 Related Issue

N/A — extends the agentic-CI work tracked in plan 472. We can link a follow-up tracking issue once one is opened.

🔄 Changes

  • New .agents/recipes/_fix-policy.md — universal localized-fix bar (≤3 files, ≤50 LOC, reversible, self-evident, test-safe, single-concern), per-suite path/command allowlists, finding-hash spec for stable cross-run identification, fix_backlog / attempted_fixes schema, ranking criteria (confidence > severity > impact > recency), draft-PR mode, two-strike escalation, standard fix procedure, direct gh pr create --body-file PR-creation pattern.
  • New .agents/recipes/_phase-audit.md and .agents/recipes/_phase-fix.md — phase directives prepended to each claude invocation so each call knows which phase it executes.
  • Rewrite .agents/recipes/issue-triage/recipe.md: action-organized buckets (Close as resolved, Close as duplicate, Needs maintainer decision, Ready for assignment, Stuck PR, Duplicate PRs, Stale, consider closing), per-row Action / Evidence / Rationale columns, healthy items collapsed to count + <details>, multi-comment split with :i/N markers and orphan reconciliation, plus a Repeatedly-failed fix attempts section that surfaces two-strike findings from the daily suites.
  • Add a fix phase to four suite recipes (eligibility tables only — procedure inherited from _fix-policy.md):
    • docs-and-references: broken-link, docstring-drift (signature-driven), arch-ref-rename. Non-draft.
    • structure: missing-future, lazy-import. Non-draft. Dead exports stay report-only.
    • dependencies: transitive-gap, unused. Non-draft.
    • code-quality: bare-except narrowing only. Draft PRs until draft_until_proven is flipped after two non-draft PRs land clean. TODO-line deletion explicitly forbidden.
  • test-health: explicit "no fix phase, all categories report-only" with a future-candidate note for test-isolation violations.
  • Update .github/workflows/agentic-ci-daily.yml: split the recipe step into two claude invocations (audit, then conditionally fix), each with the existing --max-turns 50. Fix step gated on audit success AND matrix.suite != 'test-health' AND non-empty fix_backlog. Adds a git-identity step, expands artifact upload to include the fix log + PR body, and reports both phase outcomes in the job summary. No branch auto-deletion.
  • Minor _runner.md updates: generalized branch prefix beyond chore, documented why CI uses gh pr create --body-file instead of /create-pr.

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • .agents/recipes/_fix-policy.md — load-bearing contract. Allowlists, ranking, two-strike escalation, and the standard fix procedure all live here. Worth reviewing in full.
  • .github/workflows/agentic-ci-daily.yml — the audit/fix split, the backlog gate (fromJSON(steps.backlog.outputs.size || '0') > 0), and the matrix.suite != 'test-health' guard.
  • One-time manual setup before merge: create labels agentic-ci, agentic-ci/docs-and-references, agentic-ci/structure, agentic-ci/dependencies, agentic-ci/code-quality. The recipes apply these via gh pr edit --add-label; missing labels won't block the PR opening but will surface a gh warning.

🧪 Testing

  • make test — N/A. No Python or other source code changed; the diff is recipes (markdown), workflow YAML, and one new policy doc.
  • Workflow YAML validated (python3 -c "import yaml; yaml.safe_load(...)")
  • Pre-commit hooks pass (trim whitespace, EOF, yaml, large files, merge conflicts, line endings)
  • Production validation plan — enable each suite via workflow_dispatch one at a time and read the first 1–2 actual runs before promoting to the next, per the validation section of the plan. Two-strike escalation, allowlist enforcement, and re-attempt blocking are all observable from the runner state and PR list.

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A; this extends agentic-CI infrastructure (originally in plan 472), and _fix-policy.md itself is the architecture doc for the fix phase.

Reorganize the weekly issue-triage report around recommended actions
(close as resolved, close as duplicate, needs maintainer decision,
ready for assignment, stuck PR, duplicate PRs, stale) so each flagged
item carries action + evidence + rationale and can be resolved without
opening it. Multi-comment split with i/N markers and orphan
reconciliation when the report grows or shrinks.

Flip the four daily audit suites with mechanical fix categories from
read-only reports to opening one PR per run:

- docs-and-references: broken-link, docstring-drift, arch-ref-rename
- structure: missing-future, lazy-import
- dependencies: transitive-gap, unused
- code-quality: bare-except (draft until landing rate proven)

test-health stays report-only (all candidates require inferring intent).

The shared procedure - fix_backlog selection, finding-hash spec for
stable cross-run identification, attempted_fixes lifecycle with
two-strike escalation, allowlists, ranking, branch/PR conventions -
lives in .agents/recipes/_fix-policy.md. Each suite recipe declares
only its eligible categories, branch types, and test requirements.

Workflow runs claude twice per suite (audit, then conditionally fix),
each capped at the existing --max-turns 50. Fix call is gated on
non-empty fix_backlog and skipped entirely for test-health.
@andreatgretel andreatgretel requested a review from a team as a code owner May 4, 2026 16:55
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR #600 Review — feat(agentic-ci): decision-ready triage and daily PR fixes

Summary

Reorganizes the weekly issue-triage report around recommended actions (seven exclusive buckets, per-row Action/Evidence/Rationale) and extends four of five daily audit suites (docs-and-references, structure, dependencies, code-quality) to open one PR per run for eligible, narrowly-scoped mechanical fixes. Shared rules live in a new .agents/recipes/_fix-policy.md; suite recipes declare only their eligible categories. test-health stays explicitly report-only. The workflow .github/workflows/agentic-ci-daily.yml splits the recipe step into audit + fix invocations, gated on audit success, non-test-health suite, and fix_backlog non-empty.

Diff is 607 adds / 118 dels, entirely in recipe markdown + one workflow YAML — no Python changes. Scope is consistent with the PR description.

Findings

Correctness / potential issues

  • make test-<package> name mapping is ambiguous. _fix-policy.md:26 and the fix procedure at _fix-policy.md:171-173 tell the fix phase to run make test-<package> for the affected package. The Makefile only defines test-config, test-engine, test-interface — but a recipe selecting e.g. packages/data-designer-config/pyproject.toml under the dependencies allowlist may plausibly infer make test-data-designer-config (which doesn't exist). Recommend adding an explicit mapping table in _fix-policy.md (path prefix → make target) so the recipe can't synthesize a non-existent target. This is the single most likely silent-abandon source on day one.

  • Check fix backlog step has no id: audit dependency, just if: success(). That's fine when Run audit recipe fails (whole job flips to failure), but if: success() does not distinguish between "audit ran and passed" and "audit was skipped". There is no current path to skip it, so not a bug today, but worth tightening to if: steps.audit.outcome == 'success' && matrix.suite != 'test-health' for robustness if a pre-audit gate is ever added.

  • fromJSON(steps.backlog.outputs.size || '0') when the step is skipped. If backlog is skipped (test-health), steps.backlog.outputs.size is empty string and the || '0' is required to keep fromJSON from erroring. Good. Consider applying the same pattern to AUDIT_OUTCOME / FIX_OUTCOME in the summary step — the default skipped/unknown render is currently fine, but a skipped backlog will show Fix backlog size: \`rather thann/a`. Minor.

  • Single-part vs numbered triage filename inconsistency. issue-triage/recipe.md:204-206 says each part is /tmp/issue-triage-report-1.md, ...-2.md, etc. The fallback note at recipe.md:265 says "the workflow's fallback step will post /tmp/issue-triage-report.md" (no suffix). In the single-part case, the file must exist under both names or the fallback loses the report. Either (a) instruct the recipe to write both, or (b) change the workflow fallback to prefer -1.md.

  • attempted_fixes reconciliation is underspecified. _fix-policy.md:158-159 says reconcile against open PRs at the start of each fix run; step 6 adds a hidden <!-- agentic-ci finding=<id> suite=<suite> --> marker in the PR body. But the reconciliation algorithm (how to match PRs to attempted_fixes entries by id) isn't described. On a crash after gh pr create but before the state write, the next run has no way to recover the entry without parsing that marker. Recommend one explicit sentence: "reconcile by scanning open PR bodies for the agentic-ci finding= marker and back-filling missing attempted_fixes entries".

  • fix_backlog.data schema is freeform. Each category describes its data shape in prose (e.g. docs-drift "signature-vs-Args delta", bare-except "proposed replacement type and grep evidence"). If two runs use slightly different key names, downstream reconciliation is brittle. A small schema table per category would harden this.

  • set -o pipefail is set in both the audit and fix shell steps — good. claude ... | tee will propagate non-zero correctly. No issue.

Style / conventions

  • Markdown tables in _fix-policy.md are dense but readable; good use of section headers. No issues.
  • Branch-naming pattern update in _runner.md is consistent with _fix-policy.md:167.
  • The explicit "TODO line deletion is forbidden" in code-quality (both in fix phase and Constraints) is a welcome belt-and-suspenders after the prior incident pattern.
  • Issue-triage multi-comment split logic (PATCH existing, post surplus, delete extras) is clean and idempotent.

Risk / operational

  • Cost. Cache is disabled (DISABLE_PROMPT_CACHING: "1") and the fix phase re-loads _phase-fix.md + _runner.md + _fix-policy.md + the recipe body per run. This roughly doubles per-suite prompt bytes on fix-eligible days. Acceptable, but worth a note in the rollout plan that cost per daily run approximately doubles when a backlog exists.
  • Draft-PR gate is governed by a runner-state boolean (draft_until_proven). Flipping it requires manually editing the cached state. That's fine as designed, but please document the flip procedure in the rollout note (the PR body already says "a maintainer flips the flag after two non-draft PRs land clean" — just spell out the exact file/key path).
  • Label pre-creation is a documented manual step. Missing labels only surface a gh warning; the PR still opens. Acceptable.
  • Two-strike surfacing in the triage report depends on access to the daily-suites' runner-state, which lives in per-suite caches on a different workflow. The recipe acknowledges this with a "may be empty" caveat, but that effectively neuters the escalation visibility until a shared-state mechanism is added. Worth tracking as a follow-up rather than blocking this PR.

Tests

  • N/A. No Python source changed. YAML is syntactically valid per the PR checklist; the changes are behavioral and only exercisable in production runs.

Verdict

Approve-with-comments. The design is coherent, the audit/fix split is well-isolated, and the escape hatches (localized-fix bar, allowlists, two-strike escalation, draft mode for the highest-risk suite) are appropriate. The one thing I'd resolve before merging is the make test-<package> mapping ambiguity — leaving it implicit invites day-one silent abandons. The other findings are nice-to-have tightening that can land as follow-ups.

Suggested pre-merge:

  1. Add an explicit path-prefix → make-target table to _fix-policy.md.
  2. Clarify the attempted_fixes PR-marker reconciliation algorithm in one sentence.
  3. Resolve the single-part triage filename (/tmp/issue-triage-report.md vs -1.md) inconsistency.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 4, 2026

Greptile Summary

This PR promotes four daily audit suites from read-only reporters to active fixers, centralizing the shared fix contract in _fix-policy.md and splitting each suite's CI invocation into a gated two-phase audit → fix run. All previously-flagged issues (timeout budget, git identity RFC-5322 validity, two-strike/FIFO cap contradiction, multi-part fallback truncation, --paginate per-page jq output, and fallback posting all parts instead of only missing ones) are resolved in the current diff.

Confidence Score: 5/5

Safe to merge — no P0 or P1 findings; all prior thread issues are addressed in this diff.

Every previously-flagged blocking issue has a concrete fix present: 40-minute timeout, valid RFC-5322 bot identity, two-strike cap exemption, identity-based fallback deduplication with test()-guarded capture(), and MISSING-only posting. No new logic bugs were found in the workflow gating, the jq expression, the runner-state schema, or the recipe eligibility rules.

No files require special attention beyond the one-time label setup noted in the PR description.

Important Files Changed

Filename Overview
.agents/recipes/_fix-policy.md New load-bearing policy document; localized-fix bar, per-suite allowlists, finding-hash spec, ranking table, two-strike/cap eviction logic, and standard fix procedure all correct; two-strike cap exemption previously flagged is present in its fixed form.
.github/workflows/agentic-ci-daily.yml Audit/fix split well-implemented: timeout doubled to 40 min, git identity uses valid RFC-5322 noreply address, backlog gate correctly handles skipped/failed steps, fix step correctly gated on audit success + non-test-health + non-empty backlog.
.github/workflows/agentic-ci-issue-triage.yml Fallback correctly posts only MISSING parts (not all parts); test() guard before capture() prevents jq stream truncation on non-matching comments; --paginate multi-page issue resolved by emitting individual part indices rather than per-page counts.
.agents/recipes/issue-triage/recipe.md Rewritten around action buckets; multi-part split on bucket boundaries with correct part numbering; PATCH/DELETE reconciliation in step 5 keeps coherent comment set; fallback section correctly scoped to numbered files.
.agents/recipes/code-quality/recipe.md Fix phase added for bare-except narrowing only; draft_until_proven default true; TODO deletion explicitly forbidden; grep updated to catch both except: and except BaseException:.

Sequence Diagram

sequenceDiagram
    participant GHA as GitHub Actions Runner
    participant Audit as claude (audit phase)
    participant State as runner-state.json
    participant Fix as claude (fix phase)
    participant GH as GitHub API

    GHA->>GHA: Checkout main + restore cache
    GHA->>Audit: _phase-audit.md + _runner.md + _fix-policy.md + recipe
    Audit->>State: Write fix_backlog (before known_issues filter)
    Audit->>GHA: /tmp/audit-{suite}.md

    GHA->>State: jq .fix_backlog length → BACKLOG_SIZE
    alt suite==test-health OR BACKLOG_SIZE==0 OR audit failed
        GHA->>GHA: Skip fix phase
    else eligible backlog entries exist
        GHA->>Fix: _phase-fix.md + _runner.md + _fix-policy.md + recipe
        Fix->>GH: gh pr list (reconcile attempted_fixes)
        Fix->>Fix: Rank and select top candidate
        Fix->>Fix: Re-verify finding still applies
        Fix->>Fix: Apply fix (<=3 files, <=50 LOC)
        Fix->>GHA: git commit + push branch
        Fix->>GH: gh pr create --body-file /tmp/pr-body-{suite}.md
        Fix->>State: Record attempted_fixes outcome=open
    end

    GHA->>State: Update last_run (audit gate only)
    GHA->>GHA: Upload artifacts (audit log, fix log, PR body)
Loading

Reviews (6): Last reviewed commit: "fix(agentic-ci): close Greptile pass-3 P..." | Re-trigger Greptile

Comment thread .github/workflows/agentic-ci-daily.yml Outdated
Comment thread .agents/recipes/_fix-policy.md
Comment thread .agents/recipes/issue-triage/recipe.md Outdated
- Map per-package test targets explicitly in _fix-policy.md (Makefile
  exposes test-config/test-engine/test-interface, not test-<package>).
- Use github-actions[bot] noreply identity for commits the recipes
  produce.
- Refresh fix_backlog.data when an id already exists so the fix phase
  cannot drive a PR from stale data after the underlying file changed.
- Stop time-pruning closed/abandoned attempted_fixes entries — pruning
  before the two-strike threshold erases the history needed to
  escalate. Single-strike entries now age out only via the 200-entry
  cap.
- Disambiguate bare-except findings within the same function by
  including a try-body hash in the finding id.
- Audit grep for code-quality now matches both `except:` and
  `except BaseException:`, in parity with the fix eligibility.
- Restrict transitive-gap fix eligibility to cases where a sibling
  package already declares the dep (avoids inventing version
  specifiers from scratch).
- Issue-triage workflow handles multi-part reports in both the fallback
  post step and the job summary; recipe always writes numbered parts.
Comment thread .github/workflows/agentic-ci-issue-triage.yml Outdated
andreatgretel and others added 5 commits May 4, 2026 17:23
- Replace remaining `make test-<package>` references with pointers to
  the mapping table; only the table itself uses that placeholder now.
- Fix `gh api --paginate | jq | length` returning per-page counts: slurp
  with `jq -s 'add // 0'` to get a single total.
- Compare posted-comment count to expected part count so a partial post
  (agent posted part 1 but not 2/3) triggers the fallback instead of
  being silently treated as success.
- Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're
  not at the mercy of the runner's default shell.
- Disambiguate bare-except findings whose try-body hashes collide by
  adding a per-function ordinal to the canonical_key.
- Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at`
  (the schema has no `first_seen` field).
…back

Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an
identity-based check that extracts every i/N marker seen in
today-dated bot comments and verifies each expected i is present.
A duplicate post of one part can no longer mask a missing other.
- Exempt two-strike attempted_fixes entries from the 200-entry cap
  eviction. Cap now evicts non-two-strike oldest-first by
  attempts[0].at; two-strike entries are silently-forgotten only in
  the pathological all-200-are-two-strike case (itself a signal).
- Specify the attempted_fixes PR-marker reconciliation algorithm:
  scan open PR bodies for the `<!-- agentic-ci finding=<id> -->`
  marker and back-fill missing entries.
- Tighten the daily workflow conditionals to gate on explicit step
  outcomes (steps.audit.outcome == 'success' rather than success())
  so a future pre-audit gate cannot accidentally trip the fix step.
…ording)

- Bump daily-suite job timeout from 20 to 40 minutes. The split into
  two sequential `claude --max-turns 50` invocations can saturate a
  20-minute budget; a mid-fix SIGTERM would leave an orphaned branch
  and inconsistent runner-state.
- Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids
  rebuilding fix_backlog from scratch but does NOT override the
  per-candidate re-verification step required by _fix-policy.md
  step 4.1 (re-grep / re-read the specific file the candidate points
  at). Single-candidate re-verification is required; whole-codebase
  re-scanning is forbidden.
Comment thread .github/workflows/agentic-ci-issue-triage.yml
Comment thread .github/workflows/agentic-ci-issue-triage.yml
- Guard `jq capture()` with a `test()` select. `capture()` errors on
  non-match instead of returning empty, which would truncate
  SEEN_PARTS if any unrelated today-dated bot comment lacks the
  triage marker (e.g. from a sibling workflow). Adding the test()
  guard ensures capture() only runs on bodies that already match.
- Iterate the MISSING[] array when posting fallback parts, not the
  full PARTS[] array. Posting all parts when only some were missing
  was creating duplicate comments for the parts the agent already
  successfully posted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant