This issue tracks the operational health of all 166 agentic workflows in this repository as of 2026-03-12.
Summary
| Category | Count | % |
|---|---|---|
| ✅ Healthy | 157 | 95% |
| ⚠️ Warning | 4 | 2% |
| ❌ Critical | 5 | 3% |
| 🔇 Inactive | 0 | 0% |
Overall Health Score: 66/100 (↓6 from 72 — Smoke Copilot regressed; Smoke Gemini recovered)
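The percentages in the Summary table can be re-derived from the counts. The sketch below is only a sanity check on that arithmetic; the "Warning" label for the row with 4 workflows is an assumption, and the report does not state how the 66/100 health score is computed, so the score is not reproduced here.

```python
# Counts copied from the Summary table. The "Warning" label is an assumption.
counts = {"Healthy": 157, "Warning": 4, "Critical": 5, "Inactive": 0}


def category_shares(counts: dict) -> dict:
    """Return each category's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total = sum(counts.values())
shares = category_shares(counts)
print(total)   # 166
print(shares)  # {'Healthy': 95, 'Warning': 2, 'Critical': 3, 'Inactive': 0}
```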
Critical Issues 🚨
P1: Lockdown Token Missing — 4 Workflows (Day 16+)
All four workflows fail consistently because they set lockdown: true, which requires GH_AW_GITHUB_TOKEN, and that token is not configured as a repository secret.
| Workflow | Frequency | Run # | Error |
|---|---|---|---|
| Issue Monster | Every 30 min | #2733 (today) | Lockdown token missing |
| PR Triage Agent | Every 6h | #196 (today) | Lockdown token missing |
| Daily Issues Report | Daily | #130 (today) | Lockdown token missing |
| Org Health Report | Weekly | #28 (Mar 9) | Lockdown token missing |
- Action: New tracking issue "[P1] Lockdown token failures: Issue Monster, PR Triage Agent, Daily Issues Report, Org Health Report" #20643 created (the previous issue, "[P1] Lockdown token failures: Issue Monster, PR Triage Agent, Daily Issues Report" #20315, expired)
- Root cause: GH_AW_GITHUB_TOKEN secret not provisioned; all programmatic fix paths closed
- Priority: P1, requires manual admin intervention to provision the secret
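The failure condition described above (lockdown enabled, token absent) can be expressed as a small preflight check. This is a minimal sketch, not the project's actual validation: the `workflow` dict, the `lockdown_preflight` function, and the flat placement of the `lockdown` key are all hypothetical, and the real configuration schema may differ.

```python
import os
from typing import Mapping, Optional

REQUIRED_SECRET = "GH_AW_GITHUB_TOKEN"  # secret name taken from the report


def lockdown_preflight(workflow: Mapping[str, object],
                       env: Mapping[str, str]) -> Optional[str]:
    """Return an error message if lockdown is on but the token is absent.

    `workflow` is a hypothetical dict view of a workflow's settings; the real
    schema may nest the `lockdown` flag differently.
    """
    if not workflow.get("lockdown", False):
        return None  # lockdown disabled: no custom token needed
    if env.get(REQUIRED_SECRET, "").strip():
        return None  # token present and non-empty
    return f"lockdown: true requires {REQUIRED_SECRET}, which is not set"


if __name__ == "__main__":
    # Checks the current process environment for the token.
    print(lockdown_preflight({"name": "Issue Monster", "lockdown": True}, os.environ))
```

Running a check like this before the agent job starts would surface the missing secret as one clear message rather than a failed run, though it cannot provision the secret itself.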
P1: Smoke Copilot — New Regression (Day 1, since Mar 11 ~13:10 UTC)
Smoke Copilot has 10+ consecutive failures because the send_slack_message handler is not loaded in safe-outputs. The actual smoke tests pass (11/11 applicable tests), but the safe-outputs job fails.
| Run | Time | Conclusion |
|---|---|---|
| #2330 | 2026-03-12T01:12Z | failure |
| #2329 | 2026-03-11T19:34Z | failure |
| ... 8 more | (all Mar 11-12) | failure |
- Action: Issue [P1] Smoke Copilot: send_slack_message handler not loaded — 10+ consecutive failures #20644 created
- Error: No handler loaded for message type 'send_slack_message'
- Note: Smoke tests pass 11/11; this is a safe-outputs infra issue
- Priority: P1 — appears as 100% failure rate; misleading health signal
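The error above is the typical symptom of a dispatch table that is missing an entry. The sketch below illustrates that failure mode and a startup check that would catch it before any message is processed. It is an illustration only: `HandlerRegistry` and the `add_comment` message type are hypothetical placeholders, and the actual safe-outputs implementation may be structured quite differently.

```python
from typing import Callable, Dict, Iterable, List

Handler = Callable[[dict], None]


class HandlerRegistry:
    """Hypothetical registry mapping message types to handler functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, message_type: str, handler: Handler) -> None:
        self._handlers[message_type] = handler

    def missing(self, declared_types: Iterable[str]) -> List[str]:
        """Return declared message types that have no registered handler."""
        return [t for t in declared_types if t not in self._handlers]

    def dispatch(self, message: dict) -> None:
        message_type = message["type"]
        handler = self._handlers.get(message_type)
        if handler is None:
            # Same wording as the error quoted in the report.
            raise LookupError(
                f"No handler loaded for message type '{message_type}'"
            )
        handler(message)


registry = HandlerRegistry()
registry.register("add_comment", lambda msg: None)  # placeholder handler

# Message types a workflow declares it will emit (illustrative list).
declared = ["add_comment", "send_slack_message"]
print(registry.missing(declared))  # ['send_slack_message']
```

Validating declared types against the registry at startup would report the gap once, instead of failing the job only after the smoke tests have already passed.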
Warnings ⚠️
P2 Issues with Tracking (3 workflows)
| Workflow | Issue | Error | Latest Run |
|---|---|---|---|
| Duplicate Code Detector | #20304 | 401 Unauthorized downloading actions/github-script | #231 (Mar 12, failure; re-regressed after recovery on Mar 11) |
| Smoke Update Cross-Repo PR | #20288 | Pre-agent failure | #128 (Mar 12, failure) |
| Smoke Codex | #20285 | Intermittent failure | #2225 (Mar 12, 1 failure / 10 runs) — mostly healthy |
Recoveries 🎉
| Workflow | Previous State | Current State | Run |
|---|---|---|---|
| Smoke Gemini | | ✅ RECOVERED | #332 (Mar 12, success) |
Compilation Status ✅
- 166/166 workflows compiled successfully
- 0 missing lock files
- 0 outdated lock files
- 36 shared include files (.github/workflows/shared/*.md), intentionally not compiled
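A check for missing lock files like the one summarized above could look roughly like this. The sketch assumes each top-level `name.md` workflow compiles to a sibling `name.lock.yml`; that naming convention and the `missing_lock_files` helper are assumptions, not something this report states.

```python
from pathlib import Path
from typing import List


def missing_lock_files(workflows_dir: Path) -> List[str]:
    """List top-level workflow .md files that have no sibling .lock.yml file.

    Assumes `name.md` compiles to `name.lock.yml` in the same directory.
    The glob is non-recursive, so `shared/*.md` include files are skipped.
    """
    missing = []
    for source in sorted(workflows_dir.glob("*.md")):
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.is_file():
            missing.append(source.name)
    return missing
```

Calling `missing_lock_files(Path(".github/workflows"))` from the repository root would return an empty list when every workflow has a lock file. Detecting outdated lock files would need an extra comparison (for example, content hashes) that is not sketched here.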
Systemic Issues
GH_AW_GITHUB_TOKEN Missing (P1 — Ongoing Day 16+)
- Affected: Issue Monster, PR Triage Agent, Daily Issues Report, Org Health Report
- Pattern: All use lockdown: true, which requires a custom GitHub token
- Status: All programmatic fix paths closed; requires admin action
- Impact: ~50+ failed runs/day across 4 workflows
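The "~50+ failed runs/day" estimate follows from the schedules listed in the P1 lockdown table. A worked version of the arithmetic, assuming every scheduled run fails and ignoring manual or event-triggered runs:

```python
# Scheduled frequencies from the P1 lockdown table, converted to runs per day.
runs_per_day = {
    "Issue Monster": 24 * 60 / 30,  # every 30 min -> 48 runs/day
    "PR Triage Agent": 24 / 6,      # every 6 h    -> 4 runs/day
    "Daily Issues Report": 1,       # daily        -> 1 run/day
    "Org Health Report": 1 / 7,     # weekly       -> about 0.14 runs/day
}

total_failed = sum(runs_per_day.values())
print(f"{total_failed:.1f} failed runs/day")  # 53.1 failed runs/day
```

Issue Monster alone accounts for 48 of the roughly 53 daily failures, so it dominates the failure volume.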
Safe Outputs Handler Registration (P2 — Potentially Systemic)
- Affected: Smoke Copilot (confirmed), possibly other workflows using send_slack_message
- Pattern: send_slack_message handler not loaded despite workflow using it
- Possible cause: Recent safe-outputs changes on Mar 11 (commits at 18:07 and 20:15)
Healthy Workflows ✅
157 workflows operating normally, including:
- Smoke Claude ✅ | Smoke Codex ✅ (mostly) | Smoke Gemini ✅ (RECOVERED)
- Metrics Collector ✅ | Agentic Maintenance ✅ | Chroma Issue Indexer ✅
- Auto-Triage Issues ✅ | Bot Detection ✅ | Contribution Check ✅
- Static Analysis Report ✅ | AI Moderator ✅ (mostly healthy)
Trends (7-Day)
| Date | Score | Key Events |
|---|---|---|
| Mar 7 | 74/100 | Codex issues begin |
| Mar 9 | 72/100 | Multiple P2 failures |
| Mar 10 | 70/100 | Codex + lockdown + smoke failures |
| Mar 11 | 72/100 | ↑ Codex + Duplicate Code recovered |
| Mar 12 | 66/100 | ↓ Smoke Copilot regressed, Smoke Gemini recovered |
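The score changes in the trend table can be computed directly from the listed values. This sketch only does the subtraction between consecutive rows; note that the table has no entry for Mar 8, so the first delta spans two days.

```python
# (date, score) pairs copied from the Trends table.
scores = [("Mar 7", 74), ("Mar 9", 72), ("Mar 10", 70), ("Mar 11", 72), ("Mar 12", 66)]


def deltas(rows):
    """Return (date, change) for each row relative to the row before it."""
    return [(later[0], later[1] - earlier[1]) for earlier, later in zip(rows, rows[1:])]


changes = deltas(scores)
net_change = scores[-1][1] - scores[0][1]

print(changes)     # [('Mar 9', -2), ('Mar 10', -2), ('Mar 11', 2), ('Mar 12', -6)]
print(net_change)  # -8
```

The Mar 12 drop of 6 points is the largest single change in the window and matches the "↓6 from 72" noted next to the overall health score.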
Recommendations
- URGENT: Provision GH_AW_GITHUB_TOKEN secret; 4 workflows blocked daily ([P1] Lockdown token failures: Issue Monster, PR Triage Agent, Daily Issues Report, Org Health Report #20643)
- HIGH: Fix send_slack_message handler in Smoke Copilot; 10+ consecutive failures ([P1] Smoke Copilot: send_slack_message handler not loaded — 10+ consecutive failures #20644)
- MEDIUM: Investigate Duplicate Code Detector re-regression: 401 on actions/github-script ([aw] Duplicate Code Detector failed #20304)
- MEDIUM: Continue investigating Smoke Update Cross-Repo PR ([aw] Smoke Update Cross-Repo PR failed #20288)
- LOW: Smoke Codex intermittent failure: monitor, likely transient ([aw] Smoke Codex failed #20285)
Last updated: 2026-03-12T07:30Z
Run: §22991171595
Next check: 2026-03-13T07:00Z
Related to #19352
Generated by Workflow Health Manager - Meta-Orchestrator; expires on Mar 13, 2026, 7:45 AM UTC