-
Notifications
You must be signed in to change notification settings - Fork 2.3k
feat(skills): add maintainer loop skills for automated PR triage and salvage #1585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
ca5539f
0d37065
b9dcc41
200e78f
3a9f6ec
43002be
badbe78
ade03ce
647c0a4
91b71fb
2f223b2
bd9264c
cd9acad
4eb088f
f07051c
aa855bf
84ab75b
49c96af
9f90cc9
14c0305
869170b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| # Hotspots Workflow | ||
|
|
||
| Find files hurting throughput and reduce their future blast radius. | ||
|
|
||
| ## Step 1: Run the Hotspot Script | ||
|
|
||
| ```bash | ||
| node --experimental-strip-types --no-warnings .agents/skills/nemoclaw-maintainer-day/scripts/hotspots.ts | ||
| ``` | ||
|
|
||
| This combines 30-day git churn on `main` with open PR file overlap, flags risky areas, and outputs a ranked JSON list. | ||
|
|
||
| Pipe into state: | ||
|
|
||
| ```bash | ||
| node --experimental-strip-types --no-warnings .agents/skills/nemoclaw-maintainer-day/scripts/hotspots.ts | node --experimental-strip-types --no-warnings .agents/skills/nemoclaw-maintainer-day/scripts/state.ts set-hotspots | ||
| ``` | ||
|
|
||
| ## Step 2: Prioritize | ||
|
|
||
| Review the ranked output. Most urgent: high `combinedScore` + `isRisky: true` + weak tests. | ||
|
|
||
| ## Step 3: Choose Cooling Strategy | ||
|
|
||
| Smallest change to reduce future collisions: | ||
|
|
||
| - extract stable logic from giant file into tested helper | ||
| - split parsing from execution | ||
| - add regression tests around repeated breakage | ||
| - deduplicate workflow logic | ||
| - narrow interfaces with typed helpers | ||
|
|
||
| Prefer changes that also improve testability. | ||
|
|
||
| ## Step 4: Keep Small | ||
|
|
||
| One file cluster per pass. Stop if next step is large redesign → follow [SEQUENCE-WORK.md](SEQUENCE-WORK.md). | ||
|
|
||
| ## Step 5: Validate | ||
|
|
||
| Run relevant tests. If risky code, also follow [TEST-GAPS.md](TEST-GAPS.md). | ||
|
|
||
| ## Notes | ||
|
|
||
| - Goal is lower future merge pain, not aesthetic cleanup. | ||
| - No giant refactors inside contributor PRs. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| # Merge Gate Workflow | ||
|
|
||
| Run the last maintainer check before approval. Never merge automatically. | ||
|
|
||
| ## Gates | ||
|
|
||
| For the full priority list see [PR-REVIEW-PRIORITIES.md](PR-REVIEW-PRIORITIES.md). A PR is approval-ready only when **all** hard gates pass: | ||
|
|
||
| 1. **CI green** — all required checks in `statusCheckRollup`. | ||
| 2. **No conflicts** — `mergeStateStatus` clean. | ||
| 3. **No major CodeRabbit** — ignore style nits; block on correctness/security bugs. | ||
| 4. **Risky code tested** — see [RISKY-AREAS.md](RISKY-AREAS.md). Confirm tests exist (added or pre-existing). | ||
|
|
||
| ## Step 1: Run the Gate Checker | ||
|
|
||
| ```bash | ||
| node --experimental-strip-types --no-warnings .agents/skills/nemoclaw-maintainer-day/scripts/check-gates.ts <pr-number> | ||
| ``` | ||
|
|
||
| This checks all 4 gates programmatically and returns structured JSON with `allPass` and per-gate `pass`/`details`. | ||
|
|
||
| ## Step 2: Interpret Results | ||
|
|
||
| The script handles the deterministic checks. You handle judgment calls: | ||
|
|
||
| - **Conflicts (DIRTY):** Do NOT approve — GitHub invalidates approvals when new commits are pushed. Salvage first (rebase), wait for CI, then re-run the gate checker. Follow [SALVAGE-PR.md](SALVAGE-PR.md). | ||
| - **CI failing but narrow:** Follow the salvage workflow in [SALVAGE-PR.md](SALVAGE-PR.md). | ||
| - **CI pending:** Wait and re-check. Do not approve while checks are still running. | ||
| - **CodeRabbit:** Script flags unresolved major/critical threads. Review the `snippet` to confirm it's a real issue vs style nit. If doubt, leave unapproved. | ||
| - **Tests:** If `riskyCodeTested.pass` is false, follow [TEST-GAPS.md](TEST-GAPS.md). | ||
|
|
||
| ## Step 3: Approve or Report | ||
|
|
||
| **Approve only when:** `allPass` is true AND `mergeStateStatus` is not DIRTY. Approving a PR with conflicts is wasted effort — the rebase will invalidate the approval. | ||
|
|
||
| The correct sequence for a conflicted PR: **salvage (rebase) → CI green → approve → report ready for merge.** | ||
|
|
||
| **All pass + no conflicts:** Approve and summarize why. | ||
|
|
||
| **Any fail:** | ||
|
|
||
| | Gate | Status | What is needed | | ||
| |------|--------|----------------| | ||
| | CI | Failing | Fix flaky timeout test | | ||
| | Conflicts | DIRTY | Rebase onto main first — approval would be invalidated | | ||
|
|
||
| Use full GitHub links. | ||
|
Comment on lines
+1
to
+47
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Generated skill file should not be edited directly. This path is designated as autogenerated. Please apply content changes in the source docs and regenerate via As per coding guidelines, "Files in .agents/skills/nemoclaw-/.md are autogenerated by scripts/docs-to-skills.py and must never be edited directly". 🤖 Prompt for AI Agents |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| # PR Review Priorities | ||
|
|
||
| Ordered list of what NemoClaw maintainers look for in a pull request. Higher items block approval; lower items inform queue ranking. | ||
|
|
||
| ## Hard gates (all must pass to approve) | ||
|
|
||
| 1. **Security correctness** — no sandbox escape, SSRF, credential exposure, policy bypass, or installer trust violation. PRs touching risky areas (see [RISKY-AREAS.md](RISKY-AREAS.md)) get a deep security pass before anything else. | ||
| 2. **CI green** — all required checks in `statusCheckRollup` must pass. | ||
| 3. **No merge conflicts** — `mergeStateStatus` must be clean. | ||
| 4. **No unresolved major/critical CodeRabbit findings** — correctness and safety findings block; style nits do not. Use judgment on borderline cases. | ||
| 5. **Tests for touched risky code** — risky areas must have test coverage, either added in the PR or pre-existing. No exceptions. | ||
|
|
||
| ## Quality expectations (block if violated, but fixable via salvage) | ||
|
|
||
| 1. **Narrow scope** — each PR has one clear objective. Unrelated config changes, drive-by refactors, and tool setting diffs get reverted to `main`. | ||
| 2. **Contributor intent preserved** — the fix must match what the contributor intended. Stop and ask when the diff would change semantics or when intent is unclear. | ||
| 3. **Small, mergeable changes** — prefer substrate-first slicing: extract helper, add tests for current behavior, land fix on top. One file cluster per pass. If the next step is a large redesign, route to sequencing. | ||
|
|
||
| ## Queue ranking signals (inform priority, not approval) | ||
|
|
||
| 1. **Actionability** — PRs closest to done rank highest. A merge-ready PR outranks a near-miss; a near-miss outranks a blocked item. | ||
| 2. **Security-sensitive and actionable** — PRs touching risky code get a priority bump, but only when they are not otherwise blocked. | ||
| 3. **Staleness** — PRs idle for more than 7 days get a mild bump to prevent rot. | ||
| 4. **Hotspot relief** — PRs that reduce future conflict pressure in high-churn files are preferred over equivalent work elsewhere. | ||
|
|
||
| ## Daily cadence | ||
|
|
||
| The team follows a daily ship cycle. All maintainer skills operate within this rhythm. | ||
|
|
||
| 1. **Morning** (`/nemoclaw-maintainer-morning`) — triage the backlog, pick items for the day, label them with the target version (e.g., `v0.0.8`). | ||
| 2. **During the day** (`/nemoclaw-maintainer-day`) — land PRs using the maintainer loop. Version labels make progress visible on dashboards. | ||
| 3. **Evening** (`/nemoclaw-maintainer-evening`) — check what shipped, bump open items to the next version (`v0.0.9`), generate a QA-focused summary, and cut the tag. | ||
| 4. **Overnight** — QA team (different timezone) tests the tag. Any issues they file enter the next morning's triage like any other issue. | ||
|
|
||
| Version labels are living markers: they always mean "ship in this version." If an item doesn't make the cut, the label moves to the next patch version. | ||
|
|
||
| ## Explicitly not priorities | ||
|
|
||
| - **Code style and formatting** — not a reason to block or delay. No opportunistic reformatting. | ||
| - **Documentation completeness** — not required for approval unless the PR changes user-facing behavior. | ||
| - **Architectural elegance** — the goal is lower future merge pain, not aesthetic cleanup. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # NemoClaw Risky Code Areas | ||
|
|
||
| PRs touching these areas need tests before approval. | ||
|
|
||
| | Area | Key paths | | ||
| |------|-----------| | ||
| | Installer / bootstrap shell | `install.sh`, `setup.sh`, `brev-setup.sh`, `scripts/*.sh` | | ||
| | Onboarding / host glue | `bin/lib/onboard.js`, `bin/*.js` | | ||
| | Sandbox / policy / SSRF | `nemoclaw/src/blueprint/`, `nemoclaw-blueprint/`, policy presets | | ||
| | Workflow / enforcement | `.github/workflows/`, prek hooks, DCO, signing, version/tag flows | | ||
| | Credentials / inference / network | credential helpers, inference provider routing, approval flows | | ||
|
|
||
| A PR in a risky area is only promoted in the queue when it is actually actionable. | ||
| If risky and under-tested, follow the test gaps or security sweep workflows. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| # Salvage PR Workflow | ||
|
|
||
| Take one near-mergeable PR and make the smallest safe change to unblock it. | ||
|
|
||
| **Default to maintainer salvage.** When a maintainer picks an item from the queue, the assumption is they're doing the work now — rebase, fix conflicts, add missing tests, push. Do not default to "ask the contributor and wait" because that blocks the daily cadence. Only defer to the contributor when the fix requires understanding intent that isn't clear from the diff. | ||
|
|
||
| ## Step 1: Gather Context | ||
|
|
||
| ```bash | ||
| gh pr view <number> --repo NVIDIA/NemoClaw \ | ||
| --json number,title,url,body,baseRefName,headRefName,author,files,commits,comments,reviews,statusCheckRollup,mergeStateStatus,reviewDecision | ||
|
|
||
| gh pr diff <number> --repo NVIDIA/NemoClaw | ||
| ``` | ||
|
|
||
| Also read: maintainer and CodeRabbit comments, linked issues, recent `main` changes in touched files. Understand the PR's purpose before coding. | ||
|
|
||
| ## Step 2: Assess Fit | ||
|
|
||
| **Maintainer does it now:** rebase and resolve conflicts, add missing tests for risky code, fix one or two failing checks, apply small correctness fixes from review, narrow gate cleanup. | ||
|
|
||
| **Defer to contributor only when:** the fix requires a design change the maintainer can't judge from the diff, contributor intent is ambiguous and the wrong guess would change semantics, or the PR spans multiple subsystems the maintainer isn't familiar with. | ||
|
|
||
| ## Step 3: Check Out and Reproduce | ||
|
|
||
| ```bash | ||
| gh pr checkout <number> | ||
| git fetch origin --prune | ||
| ``` | ||
|
|
||
| Reproduce locally. Run narrowest relevant commands first. | ||
|
|
||
| ## Step 4: Review PR Scope Before Fixing | ||
|
|
||
| Before fixing, review **all** changed files in the PR — not just the ones causing failures. Flag any files that expand the PR's scope unnecessarily (config changes, unrelated refactors, tool settings). Revert those to `main` if they aren't needed for the feature to work. | ||
|
|
||
| ## Step 5: Fix Narrowly | ||
|
|
||
| Smallest change that clears the blocker. No opportunistic reformatting. | ||
|
|
||
| If risky code is touched (see [RISKY-AREAS.md](RISKY-AREAS.md)), treat missing tests as part of the fix — follow [TEST-GAPS.md](TEST-GAPS.md) when needed. | ||
|
|
||
| ## Step 6: Conflicts | ||
|
|
||
| Resolve only mechanical conflicts (import ordering, adjacent additions, branch drift). Stop and summarize if the conflict changes behavior. | ||
|
|
||
| ## Step 7: Validate | ||
|
|
||
| ```bash | ||
| npm test # root integration tests | ||
| cd nemoclaw && npm test # plugin tests | ||
| npm run typecheck:cli # CLI type check | ||
| make check # all linters | ||
| ``` | ||
|
|
||
| Use only commands matching the changed area. | ||
|
|
||
| ## Step 8: Push | ||
|
|
||
| Push when: fix is small, improves mergeability, validation passed, you have push permission. Never force-push. If you cannot push, prepare a comment describing the fix. | ||
|
|
||
| **Fork PRs:** Most PRs come from contributor forks. Check where to push: | ||
|
|
||
| ```bash | ||
| gh pr view <number> --repo NVIDIA/NemoClaw --json headRepositoryOwner,headRepository,headRefName,maintainerCanModify | ||
| ``` | ||
|
|
||
| If `maintainerCanModify` is true, push directly to the fork: | ||
|
|
||
| ```bash | ||
| git push git@github.com:<owner>/<repo>.git <local-branch>:<headRefName> | ||
| ``` | ||
|
|
||
| Do **not** push to `origin` — that creates a separate branch on NVIDIA/NemoClaw that won't appear in the PR. | ||
|
|
||
| ## Step 9: Route to Merge Gate | ||
|
|
||
| If PR looks ready, follow [MERGE-GATE.md](MERGE-GATE.md). | ||
|
|
||
| ## Notes | ||
|
|
||
| - Goal is safe backlog reduction, not finishing the PR at any cost. | ||
| - Never hide unresolved reviewer concerns. | ||
| - Use full GitHub links. | ||
|
Comment on lines
+1
to
+84
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This skill markdown should be generated, not manually edited. Please update source documentation and regenerate this file to keep skill artifacts canonical. As per coding guidelines, "Files in .agents/skills/nemoclaw-/.md are autogenerated by scripts/docs-to-skills.py and must never be edited directly". 🤖 Prompt for AI Agents |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| # Security Sweep Workflow | ||
|
|
||
| Review a security-sensitive item before it enters the normal fast path. | ||
|
|
||
| ## Step 1: Identify the Security Item | ||
|
|
||
| The morning triage and `find-review-pr` already surface security-labeled PRs. Start from the item selected in the day loop's action step. If running standalone, check the triage queue for PRs touching risky areas (see [RISKY-AREAS.md](RISKY-AREAS.md)). | ||
|
|
||
| ## Step 2: Gather Context | ||
|
|
||
| Read the PR or issue, all comments, linked items, changed files, diff, current checks, and recent relevant `main` commits. | ||
|
|
||
| ## Step 3: Classify Risk | ||
|
|
||
| Which bucket applies? | ||
|
|
||
| - **escape or policy bypass** | ||
| - **credential or secret exposure** | ||
| - **installer or release integrity** | ||
| - **workflow or governance bypass** | ||
| - **input validation or SSRF weakness** | ||
| - **test gap in risky code** | ||
|
|
||
| If none apply, route back to normal action selection. | ||
|
|
||
| ## Step 4: Deep Security Pass | ||
|
|
||
| Load `security-code-review` for the nine-category review whenever the item changes behavior in a security-sensitive area. Do not skip this step just because the diff is small. | ||
|
|
||
| ## Step 5: Decide Action | ||
|
|
||
| ### Salvage-now | ||
|
|
||
| All true: risk is understood, fix is small/local, required tests are clear, no unresolved design question. Follow [SALVAGE-PR.md](SALVAGE-PR.md) and [TEST-GAPS.md](TEST-GAPS.md). | ||
|
|
||
| ### Blocked | ||
|
|
||
| Any true: fix changes core trust assumptions, review found real vulnerability needing redesign, PR adds risk without tests, reviewer disagreement. Summarize blocker clearly; do not approve. | ||
|
|
||
| ## Notes | ||
|
|
||
| - Backlog reduction never outranks a credible security concern. | ||
| - No security-sensitive approvals without both deep review and tests. | ||
| - Use full GitHub links. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| # Sequence Work Workflow | ||
|
|
||
| Turn a large problem into a short sequence of mergeable slices. | ||
|
|
||
| ## Step 1: Read the Full Problem Surface | ||
|
|
||
| Read greedily: issue body and comments, linked issues, linked PRs with review comments and status, touched code/tests/docs, recent `main` changes if the area is active. | ||
|
|
||
| Do not sequence work from the title alone. | ||
|
|
||
| ## Step 2: Identify What Is Moving | ||
|
|
||
| Inventory overlapping open PRs and recently merged changes. For each: blocker that must land first? Dependency to build on? Conflict to avoid? Noise? | ||
|
|
||
| ## Step 3: Define Slices | ||
|
|
||
| Each slice should have: one core objective, short file list, explicit tests, merge dependency list, stop condition. | ||
|
|
||
| Prefer substrate-first sequencing: | ||
|
|
||
| 1. Extract stable helper or type boundary | ||
| 2. Add regression tests for current behavior | ||
| 3. Land behavioral fix or refactor on top | ||
| 4. Remove duplication afterward | ||
|
|
||
| ## Step 4: Rank | ||
|
|
||
| Use repo priorities: (1) backlog reduction, (2) security, (3) test coverage, (4) hotspot cooling. | ||
|
|
||
| A slice that unblocks several PRs moves up. An elegant but non-urgent slice moves down. | ||
|
|
||
| ## Step 5: Output | ||
|
|
||
| | Order | Slice | Why now | Depends on | Tests | | ||
| |-------|-------|---------|------------|-------| | ||
| | 1 | Extract timeout parsing from onboard | Enables safe tests, reduces conflicts | None | Unit tests for invalid env values | | ||
|
|
||
| Also include: outstanding blockers, which slices are safe for the maintainer loop, where human design decisions are needed. | ||
|
|
||
| ## Notes | ||
|
|
||
| - Every slice maps to real files, tests, and merge behavior — not abstract architecture. | ||
| - Prefer small serially mergeable changes over one ambitious cleanup branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid direct edits to generated skill markdown.
Please make the change in the source docs and regenerate this artifact.
As per coding guidelines, "Files in .agents/skills/nemoclaw-/.md are autogenerated by scripts/docs-to-skills.py and must never be edited directly".
🤖 Prompt for AI Agents