NVIDIA-NeMo
diff --git a/.agents/agents/docs-searcher.md b/.agents/agents/docs-searcher.md
Lines changed: 1 addition & 1 deletion
diff --git a/.agents/recipes/_fix-policy.md b/.agents/recipes/_fix-policy.md
Lines changed: 38 additions & 19 deletions
diff --git a/.agents/recipes/_phase-fix.md b/.agents/recipes/_phase-fix.md
Lines changed: 9 additions & 5 deletions
diff --git a/.agents/recipes/_runner.md b/.agents/recipes/_runner.md
Lines changed: 3 additions & 0 deletions
diff --git a/.agents/recipes/code-quality/recipe.md b/.agents/recipes/code-quality/recipe.md
Lines changed: 2 additions & 2 deletions
diff --git a/.agents/recipes/docs-and-references/recipe.md b/.agents/recipes/docs-and-references/recipe.md
Lines changed: 29 additions & 9 deletions
diff --git a/.agents/recipes/structure/recipe.md b/.agents/recipes/structure/recipe.md
Lines changed: 8 additions & 1 deletion
diff --git a/.agents/recipes/test-health/recipe.md b/.agents/recipes/test-health/recipe.md
Lines changed: 19 additions & 1 deletion
@@ -63,7 +63,7 @@ Brief summary of what was found and any recommendations for the user.
 - Only include results that are actually relevant to the search topic
 - If no relevant documentation is found, clearly state that
 - Keep excerpts concise but include enough context to be useful
-- Prioritize user guides and examples over API reference when both exist
+- Prioritize user guides, concepts, tutorials, and recipes according to the user's task
 - If the docs/ folder doesn't exist or is empty, report that clearly
 
 ## Search Strategy
 
@@ -25,7 +25,9 @@ A finding may be converted to a fix only if all hold:
   | `packages/data-designer-config` | `make test-config` |
   | `packages/data-designer-engine` | `make test-engine` |
   | `packages/data-designer` | `make test-interface` |
-- **Single concern**: one finding per PR.
+- **Single concern**: one finding per PR, except suite-declared batchable
+  mechanical fixes. A batch must share one suite/category and satisfy the
+  localized-fix bar as a single combined diff.
 - **Allowlisted paths**: matches the suite's path allowlist.
 
 If the top-ranked candidate fails the bar, try the next. If none of the top
@@ -79,6 +81,9 @@ Each daily recipe maintains two arrays in
 Also: `draft_until_proven` (boolean, per-suite, default `true` for
 code-quality and unset elsewhere) controls draft-PR mode.
 
+Batch PRs still record one `attempted_fixes` entry per finding. Multiple
+entries may point to the same `pr_number` and `branch`.
+
 ### `fix_backlog` rules (audit phase populates this)
 
 - Append every detected finding in an eligible category. If `id` is already
@@ -90,6 +95,9 @@ code-quality and unset elsewhere) controls draft-PR mode.
 - Cap at 200 entries (drop oldest by `first_seen`).
 - Populated **before** the `known_issues` filter so fixable findings persist
   even when their report row is suppressed for being unchanged.
+- Batchable categories must include enough information in `data` to group
+  siblings safely. For package-scoped Python fixes, derive `test_target` from
+  the package containing the source file.
 
 ### `attempted_fixes` rules
 
@@ -101,9 +109,9 @@ code-quality and unset elsewhere) controls draft-PR mode.
   `open` attempts that have a `pr_number`: query the PR and flip the
   attempt to `merged` or `closed` if it is no longer open. Then recover
   from crashes that left state un-updated: list open PRs (`gh pr list`)
-  whose bodies contain the
-  `<!-- agentic-ci finding=<id> suite=<suite> -->` marker, parse out
-  each `<id>`, and back-fill any missing `attempted_fixes` entries with
+  whose bodies contain one or more
+  `<!-- agentic-ci finding=<id> suite=<suite> -->` markers, parse out
+  every `<id>`, and back-fill any missing `attempted_fixes` entries with
   `outcome: "open"` and the parsed `pr_number` and `branch`.
 - Prune: drop `merged` entries older than 90 days. Do **not** prune
   `closed` or `abandoned` entries by age — pruning a single-strike entry
@@ -175,7 +183,7 @@ Earlier criteria override later ones:
 
 4. **Recency** — newer findings rank above long-standing ones.
 
-Record the chosen finding's id, scores, and rationale at the top of
+Record the chosen finding id(s), scores, and rationale at the top of
 `/tmp/audit-{{suite}}.md`.
 
 ## Standard fix procedure
@@ -191,29 +199,38 @@ declare only the parts that vary (eligible categories, branch type,
    `merged`; surface two-strike entries in the report's
    `Repeatedly-failed fix attempts` section and drop them from selection.
 3. Rank the remainder per the Ranking section.
-4. For each candidate, top 5 max:
-   1. Re-verify the finding still applies (re-grep / re-read). If not,
-      remove from `fix_backlog` and continue.
-   2. Apply the fix. If the diff exceeds the localized-fix bar or touches
-      a non-allowlisted path, abandon and continue.
-   3. If the category sets `test_required: true`, run the per-package
+4. For each primary candidate, top 5 max:
+   1. If the suite declares the category batchable, collect sibling
+      `fix_backlog` entries for the same suite/category that share the same
+      test target and branch type. Do not discover new findings; use only
+      existing backlog entries. Batch at most 3 entries to stay within the
+      localized-fix file cap.
+   2. Re-verify every finding still applies (re-grep / re-read). If a
+      sibling no longer applies, remove it from `fix_backlog`; if the
+      primary no longer applies, remove it from `fix_backlog` and continue
+      to the next primary candidate.
+   3. Apply the fix or batch. If the combined diff exceeds the
+      localized-fix bar or touches a non-allowlisted path, abandon and
+      continue.
+   4. If the category sets `test_required: true`, run the per-package
       test target (see the mapping table in "Localized fix bar" above)
-      for the package containing the change. On failure: abandon and
+      for the package containing the change(s). On failure: abandon and
       continue.
-   4. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
+   5. Branch: `agentic-ci/<type>/<suite>-YYYYMMDD-<short-slug>`. Commit:
       `<type>(agentic-ci): <one-line>`. Push.
-   5. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including the
-      hidden metadata block:
+   6. Write the PR body to `/tmp/pr-body-{{suite}}.md`, including one
+      hidden metadata block per fixed finding:
       `<!-- agentic-ci finding=<id> suite=<suite> -->`
-   6. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
+   7. `gh pr create --body-file /tmp/pr-body-{{suite}}.md` with `--draft`
       iff `draft_until_proven` is true for the suite.
-   7. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
-   8. Record `attempted_fixes` entry with `outcome: "open"` and exit.
+   8. `gh pr edit <num> --add-label agentic-ci --add-label agentic-ci/<suite>`.
+   9. Record one `attempted_fixes` entry per fixed finding with
+      `outcome: "open"` and exit.
 5. If all 5 candidates were abandoned, append a one-line note to the
    report and exit cleanly. The state already reflects the abandonments.
 
 On any failure mid-flow: record `outcome: "abandoned"` for the chosen
-finding (with `pr_number: null`), leave any pushed branch in place
+finding(s) (with `pr_number: null`), leave any pushed branch in place
 (`pr-stale.yml` will reap it; branch deletion is forbidden), and continue
 to the next candidate.
 
@@ -223,6 +240,8 @@ to the next candidate.
   interactive-only and shells the body inline; CI needs determinism.
 - **Title**: conventional, `<type>(agentic-ci): <one-line>`.
 - **Labels**: `agentic-ci`, `agentic-ci/<suite>`.
+- **Batch markers**: batch PRs include one hidden finding marker per fixed
+  finding so crash recovery can reconstruct every `attempted_fixes` entry.
 - **Draft PRs**: `code-quality` opens draft until a maintainer flips
   `draft_until_proven` to `false` in runner-state, after at least two
   non-draft PRs from that suite have landed clean. This flip is
 
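Reviewer note on the `_fix-policy.md` changes above: the crash-recovery rule now depends on reading every hidden marker out of a PR body, not just the first. A minimal sketch of that parsing and back-fill, assuming the marker format quoted in the policy and assuming `attempted_fixes` entries are dicts keyed by `id` (the function names and entry shape here are hypothetical, not taken from the repo):

```python
import re

# Matches the hidden marker quoted in the policy:
#   <!-- agentic-ci finding=<id> suite=<suite> -->
MARKER_RE = re.compile(
    r"<!--\s*agentic-ci\s+finding=(?P<id>\S+)\s+suite=(?P<suite>\S+)\s*-->"
)


def parse_finding_markers(pr_body):
    """Return (finding_id, suite) for every marker in the PR body, in order."""
    return [(m.group("id"), m.group("suite")) for m in MARKER_RE.finditer(pr_body)]


def backfill_attempts(attempted_fixes, pr_body, pr_number, branch):
    """Append an 'open' entry for each marked finding that is not recorded yet.

    The entry fields (id, outcome, pr_number, branch) mirror the names used
    in the policy text; the exact schema is an assumption.
    """
    known = {entry["id"] for entry in attempted_fixes}
    added = []
    for finding_id, _suite in parse_finding_markers(pr_body):
        if finding_id in known:
            continue
        entry = {
            "id": finding_id,
            "outcome": "open",
            "pr_number": pr_number,
            "branch": branch,
        }
        attempted_fixes.append(entry)
        known.add(finding_id)
        added.append(entry)
    return added
```

With this shape, a batch PR carrying three markers yields three entries that share one `pr_number` and `branch`, which matches the "one entry per finding" rule added in this diff.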
@@ -9,16 +9,20 @@ This invocation runs the **FIX** phase only.
   Do NOT redo audit work — that is, do NOT re-scan whole packages or
   rebuild `fix_backlog` from scratch. The "no re-scan" rule does NOT
   override the per-candidate re-verification step required by
-  `_fix-policy.md` §"Standard fix procedure" step 4.1: when you pick a
-  candidate, you MUST re-grep / re-read the specific file or symbol it
-  points at to confirm the finding still applies before editing.
+  `_fix-policy.md` §"Standard fix procedure": when you pick a candidate,
+  you MUST re-grep / re-read the specific file or symbol it points at to
+  confirm the finding still applies before editing.
   Re-verification of a single candidate is required; re-scanning the
   codebase to discover new findings is forbidden.
 - Pick the highest-ranked eligible candidate from `fix_backlog`, apply
   the fix, run the package's tests if applicable, commit, push, and open
-  the PR using `gh pr create --body-file`.
+  the PR using `gh pr create --body-file`. If the recipe and
+  `_fix-policy.md` declare the category batchable, you may add sibling
+  entries from the existing `fix_backlog` after re-verifying each one.
+  Do not scan for findings that are not already in `fix_backlog`.
 - Record the attempt in `attempted_fixes` (whether successful, abandoned,
-  or failed through the top-5 fallback) before exiting.
+  or failed through the top-5 fallback) before exiting. Batch PRs record
+  one attempt per fixed finding, all pointing to the same PR and branch.
 - If no candidate qualifies after trying up to 5 of them, exit cleanly,
   append a short note to `/tmp/audit-{{suite}}.md` describing what was
   tried, and update `attempted_fixes` accordingly. Do NOT open a PR.
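
Reviewer note: the sibling-collection step referenced here (and spelled out in `_fix-policy.md` step 4.1) could be sketched roughly as follows. The package-to-target mapping is copied from the "Localized fix bar" table; everything else is an assumption made for illustration, including the entry shape (`id`, `category`, `data["file"]`) and the function names. Branch type is treated as implied by the category, since each suite's eligibility table assigns one branch type per category.

```python
from pathlib import PurePosixPath

# Mapping copied from the "Localized fix bar" table in _fix-policy.md.
TEST_TARGETS = {
    "packages/data-designer-config": "make test-config",
    "packages/data-designer-engine": "make test-engine",
    "packages/data-designer": "make test-interface",
}
MAX_BATCH = 3  # "Batch at most 3 entries"


def derive_test_target(source_path):
    """Return the make target for the package containing source_path, or None."""
    parts = PurePosixPath(source_path).parts
    if len(parts) < 2:
        return None
    return TEST_TARGETS.get("/".join(parts[:2]))


def collect_batch(primary, backlog, batchable_categories):
    """Return the primary plus existing backlog siblings that may share its PR.

    Only entries already in the backlog are considered; nothing is discovered.
    Each returned entry still needs re-verification before editing.
    """
    if primary["category"] not in batchable_categories:
        return [primary]
    target = derive_test_target(primary["data"]["file"])
    if target is None:
        return [primary]
    batch = [primary]
    for entry in backlog:
        if len(batch) >= MAX_BATCH:
            break
        if entry["id"] == primary["id"]:
            continue
        if entry["category"] != primary["category"]:
            continue
        if derive_test_target(entry["data"]["file"]) != target:
            continue
        batch.append(entry)
    return batch
```

The sketch deliberately returns a one-element batch whenever the category is not batchable or the test target cannot be derived, so the single-finding path stays the default.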
 
@@ -67,6 +67,9 @@ Rules:
   passwords) in your output, even if you encounter them in code.
 - **Stay in scope.** Only perform the task described in the recipe. Do not
   explore unrelated areas of the codebase.
+- **No subagents.** Do not use Task, Explore, or other delegated/local agents.
+  The CI key may not have access to their default models; do the work in the
+  main agent session.
 - **Cost awareness.** Minimize unnecessary file reads and tool calls. If you
   have the information you need, stop.
 
 
@@ -4,7 +4,7 @@ description: Audit code quality gaps not covered by ruff - complexity trends, ex
 trigger: schedule
 tool: claude-code
 timeout_minutes: 20
-max_turns: 30
+max_turns: 50
 permissions:
   contents: write
 ---
@@ -152,7 +152,7 @@ Examples of things to test (pick 2-3 per run, and invent new ones):
 - Column names with special characters or very long strings
 - Recently changed validators (check `git log --oneline -10 -- packages/*/src/data_designer/config/`)
 
-**API reference:**
+**Useful imports:**
 
 ```python
 from data_designer.config.config_builder import DataDesignerConfigBuilder
 
@@ -33,11 +33,31 @@ even when their report row is suppressed for being unchanged.
 
 ## Instructions
 
+### Turn budget
+
+This suite must finish before the `max_turns` limit. Do not attempt a
+repo-wide audit in one run.
+
+1. Read runner memory.
+2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and
+   empty tables. If the run is interrupted later, the workflow must still have
+   a usable partial report.
+3. Use targeted searches to find candidates, then read only the files needed
+   to verify a specific finding.
+4. Stop after either:
+   - 20 tool calls
+   - 2 new findings in a section
+   - all sections have been sampled
+5. Finalize the report, update runner memory, and stop. If no new findings
+   were verified, replace the report with `NO_FINDINGS`.
+
 ### 1. Docstring vs signature drift
 
 This repo uses Google-style docstrings (`Args:`, `Returns:`, `Raises:`).
-Scan public functions and methods in `packages/` for mismatches between the
-docstring and the actual function signature:
+Sample public functions and methods in `packages/` for mismatches between the
+docstring and the actual function signature. Do not scan every source file.
+Use `rg "Args:|Returns:|Raises:" packages/*/src/ --glob '*.py'` to find
+candidates, then inspect at most 5 high-value files:
 
 - Parameters in the `Args:` section that no longer exist in the signature
 - Parameters in the signature that are missing from `Args:`
@@ -60,14 +80,17 @@ Check links in these locations:
 - `docs/` - MkDocs content links, code references, cross-page links
 - `CONTRIBUTING.md`, `DEVELOPMENT.md`, `STYLEGUIDE.md` - relative links
 
-For each link, verify the target file or anchor exists. Report broken links
-with the source file, line number, and broken target.
+Use targeted link extraction and inspect at most 10 candidate links. Prefer
+high-value docs and links changed recently. For each sampled link, verify the
+target file or anchor exists. Report broken links with the source file, line
+number, and broken target.
 
 ### 3. Architecture doc references
 
 The 10 files in `architecture/` reference specific classes, functions, files,
 and registries by name. These are high-value docs that agents and developers
-rely on for orientation. For each code reference:
+rely on for orientation. Sample at most 3 architecture files per run,
+prioritizing files changed recently. For each code reference:
 - Verify the referenced class, function, or module still exists at the stated
   location
 - If renamed or moved, flag with the old and new location
@@ -101,11 +124,8 @@ Review for accuracy against the current code:
   the most recent 3-5 posts for references to functions, classes, or
   architecture that have since been modified.
 
-**Code reference** (`docs/code_reference/`):
-- Check that autodoc module paths point to modules that still exist.
-
 **Prioritize by risk of drift**: pages with the most code symbols referenced
-are most likely to be stale. Don't read every page - sample 5-10 high-value
+are most likely to be stale. Don't read every page - sample 3-5 high-value
 pages and flag patterns.
 
 ## Output format
 
@@ -4,7 +4,7 @@ description: Audit structural integrity - import boundaries, lazy import complia
 trigger: schedule
 tool: claude-code
 timeout_minutes: 20
-max_turns: 30
+max_turns: 50
 permissions:
   contents: write
 ---
@@ -223,6 +223,13 @@ Follow the standard fix procedure in `_fix-policy.md`. Suite-specific bits:
 | missing-future | `chore` | yes | Insert `from __future__ import annotations` after the SPDX header block, before other imports. Fully deterministic. Tests required because `__future__` annotations can affect introspection-heavy code paths. |
 | lazy-import | `refactor` | yes | Move a top-level heavy import (pandas/numpy/polars/torch/duckdb/sqlfluff/faker) to the `data_designer.lazy_heavy_imports` accessor pattern. Eligible only when (a) file is under `packages/*/src/`, (b) the module is already wired in the lazy system, (c) the heavy module is used only inside function bodies. |
 
+`missing-future` is batchable: when the primary candidate is
+`missing-future`, include other `missing-future` backlog entries with the
+same `test_target` if each file still lacks the import and the combined
+diff remains within the localized-fix bar. Batch at most 3 files. Run the
+shared test target once. Use one hidden finding marker and one
+`attempted_fixes` entry per file.
+
 **Not eligible** — stays report-only:
 
 - Import boundary violations (architectural judgement).
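
Reviewer note: for readers unfamiliar with the `missing-future` fix named in the table above, a minimal sketch of the edit it describes follows. It assumes the SPDX header is a leading run of `#` comment lines and that the file has no module docstring (a docstring would need to stay ahead of the import); the helper name is hypothetical and this is not the repo's actual tooling.

```python
FUTURE_IMPORT = "from __future__ import annotations"


def add_future_import(source):
    """Insert the future import after the leading comment header block.

    Returns the source unchanged if the import is already present.
    """
    lines = source.splitlines(keepends=True)
    if any(line.strip() == FUTURE_IMPORT for line in lines):
        return source

    # Skip the leading run of comment lines (for example the SPDX header).
    idx = 0
    while idx < len(lines) and lines[idx].startswith("#"):
        idx += 1
    header, rest = lines[:idx], lines[idx:]

    # Drop blank lines after the header so spacing is normalised below.
    while rest and not rest[0].strip():
        rest.pop(0)

    out = list(header)
    if out:
        if not out[-1].endswith("\n"):
            out[-1] += "\n"
        out.append("\n")
    out.append(FUTURE_IMPORT + "\n")
    if rest:
        out.append("\n")
        out.extend(rest)
    return "".join(out)
```

Because the helper is a no-op when the import already exists, it also doubles as the per-file re-verification check that the batch rule requires before a sibling is included.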
 
@@ -32,6 +32,24 @@ update `baselines` with current values and `known_issues` with new findings.
 
 ## Instructions
 
+### Turn budget
+
+This suite must finish before the `max_turns` limit. Do not attempt a
+repo-wide test audit in one run.
+
+1. Read runner memory.
+2. Write `/tmp/audit-{{suite}}.md` immediately with the required headings and
+   empty tables. If the run is interrupted later, the workflow must still have
+   a usable partial report.
+3. Use targeted searches to find candidates, then read only the files needed
+   to verify a specific finding.
+4. Stop after either:
+   - 20 tool calls
+   - 2 new findings in a section
+   - all sections have been sampled
+5. Finalize the report, update runner memory, and stop. If no new findings
+   were verified, replace the report with `NO_FINDINGS`.
+
 ### 1. Test-to-source coverage mapping
 
 Map source files to their corresponding test files:
@@ -208,7 +226,7 @@ without at least one provider configured. Stick to config-layer checks
 (`DataDesignerConfigBuilder.build()`, column type resolution) which do
 not require providers.
 
-**API reference** for writing checks:
+**Useful imports** for writing checks:
 
 ```python
 from data_designer.config.config_builder import DataDesignerConfigBuilder