
Commit d61df30

Q00, shaun0927, claude, and hermes-agent authored
feat(auto): wire driver selection through capabilities (#672)
* Add selected driver and brake mode for auto interviews

* fix(auto): align selected driver resume state

* feat(auto): wire driver selection through capabilities

* fix(auto): prompt for interview driver selection

* fix(auto): only prompt installed interview drivers

* fix(auto): preserve driver scaffold ledger sources

* fix(auto): scope add-keyword brake gate to feature additions

The bot's design note on PR #672 flagged a regression: bare `add` in the risk gate matched ordinary CRUD wording like "How should users add a task?", false-blocking selected-driver sessions under brake=on.

Replace bare `add` with `add(?:ing)?\s+(?:a|an|the)?\s*<scope-noun>` where the scope-noun is feature/capability/support/requirement/epic/story/scope/product-area. CRUD verbs no longer match while genuine scope-additions ("Should we add a feature for offline mode?") still fire the gate.

Refs #672

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auto): align selected-driver ledger values with the freeform answer

The bot's blocking finding on PR #672 noted that the selected-driver path sent the driver's freeform answer into the interview transcript but kept the deterministic scaffold's value in ``ledger_updates``. Seed generation reads the transcript while grading reads the ledger, so the two could disagree silently (driver says "Django", ledger says "FastAPI") without anyone catching the divergence.

- Replace each scaffold-derived ledger entry value with the driver's freeform answer verbatim and tag the entry CONFLICTING with reduced confidence (<= 0.4), so the Seed-ready/A-grade gates surface the divergence instead of treating the scaffold value as confirmed.
- Preserve the structural section/key/source so downstream section-aware Seed generation still works, and keep the original scaffold value in the rationale + evidence as audit context.
- Update the existing ledger-values test to lock in the new contract and assert the original scaffold value is preserved as rationale evidence.

Refs #675

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(auto): lock packaged driver/brake forwarding at MCP payload layer (#679)

Issue #673 acceptance criteria require that the actual MCP payload reaching ouroboros_auto carries forwarded driver/brake values and never leaks $driver or $brake placeholder literals. Upstream stack/3 already locks dispatch-level resolution; this commit adds the missing layer by mocking the ouroboros MCP server and asserting call_tool's payload.

- Forwarding test: --driver hermes --brake off must reach call_tool with driver="hermes", brake="off", skip_run=True.
- Placeholder leak test: plain `ooo auto "goal"` must not send "$driver" or "$brake" as real values; an empty/None value is acceptable.

Closes #673

* fix(auto): use INFERRED status for driver-derived ledger entries

The earlier fix that aligned ledger values with the driver's freeform answer marked every structural ledger update as ``LedgerStatus.CONFLICTING``. The interview loop treats CONFLICTING (along with MISSING/WEAK/BLOCKED) as an open gap (``ledger.py:open_gaps``), so a selected-driver session could populate every required section and still never reach ``is_seed_ready`` — the loop would carry the same gaps forever and eventually block.

Use ``LedgerStatus.INFERRED`` instead. The ledger entry still:

- Carries the driver's freeform answer verbatim as the value, so the persisted ledger and the interview transcript share the same content (no silent divergence between the two sources of truth).
- Has reduced confidence (<= 0.4) and an ``auto_interview_transcript`` evidence marker so grading and the A-grade gate can downgrade or re-verify the driver answer before final acceptance.
- Preserves the original scaffold value verbatim in the rationale and in a ``scaffold_value:<value>`` evidence tag, so divergence between scaffold and driver is never silently lost.

INFERRED is the right semantic: the value is the driver's inferred answer, not the user's confirmed choice; downstream grading is responsible for deciding whether to escalate to verification before A-grade. The interview loop is now free to converge for selected-driver sessions.

Tests:

- ``test_driver_answerer_ledger_values_reflect_driver_answer_without_blocking_loop`` asserts the new INFERRED+verbatim+scaffold-evidence contract and locks in that the status is not in the loop's blocking set.
- ``test_driver_answerer_ledger_does_not_block_seed_ready_convergence`` is a regression test that drives the answerer through every required section and asserts those sections do not end up in the open-gap set (the contract that broke under the CONFLICTING variant).

Addresses ouroboros-agent[bot] blocking finding on PR #672 against ``f229e35``. Refs #672.

* feat(auto): add selected-driver answer metadata (#682)

Co-authored-by: Hermes Agent <hermes-agent@users.noreply.github.com>
Co-authored-by: shaun0927 <junghwan1912@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auto): align mcp resume bounds (#680)

Co-authored-by: Hermes Agent <hermes-agent@users.noreply.github.com>

---------

Co-authored-by: shaun0927 <junghwan1912@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Junghwan <70629228+shaun0927@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes-agent@users.noreply.github.com>
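The scoped `add` gate described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the pattern from the commit itself, which is not shown in this excerpt: the names `SCOPE_NOUNS`, `SCOPE_ADDITION`, and `is_scope_addition` are hypothetical, and the word boundaries and case-insensitive flag are assumptions.

```python
import re

# Hypothetical reconstruction of the scoped gate; only the pattern shape and
# the scope-noun list come from the commit message.
SCOPE_NOUNS = (
    "feature", "capability", "support", "requirement",
    "epic", "story", "scope", "product-area",
)

# `add` or `adding`, an optional article, then one of the scope nouns.
SCOPE_ADDITION = re.compile(
    r"\badd(?:ing)?\s+(?:a|an|the)?\s*(?:"
    + "|".join(re.escape(noun) for noun in SCOPE_NOUNS)
    + r")\b",
    re.IGNORECASE,  # assumption: the gate ignores case
)


def is_scope_addition(question: str) -> bool:
    """Return True when the question proposes adding scope, not CRUD data."""
    return SCOPE_ADDITION.search(question) is not None
```

Under these assumptions, "Should we add a feature for offline mode?" trips the gate while "How should users add a task?" does not, which is the behavior the fix describes.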
1 parent 76475a5 commit d61df30

19 files changed

Lines changed: 2007 additions & 18 deletions

skills/auto/SKILL.md

Lines changed: 10 additions & 0 deletions
@@ -9,6 +9,8 @@ mcp_args:
   max_interview_rounds: "$max_interview_rounds"
   max_repair_rounds: "$max_repair_rounds"
   skip_run: "$skip_run"
+  driver: "$driver"
+  brake: "$brake"
 ---

 # /ouroboros:auto
@@ -30,6 +32,7 @@ is unavailable. A manual fallback is not an `ooo auto` run.
 ooo auto "Build a local-first habit tracker CLI"
 ooo auto --resume auto_abc123
 ooo auto "Build a local-first habit tracker CLI" --skip-run
+ooo auto "Build a local-first habit tracker CLI" --driver hermes --brake on
 /ouroboros:auto "Build a local-first habit tracker CLI"
 ```

@@ -42,3 +45,10 @@ ooo auto "Build a local-first habit tracker CLI" --skip-run
 5. Starts execution only after A-grade.

 The pipeline must not hang indefinitely: all loops are bounded and timeout failures return a resumable `auto_session_id`. Resume with `ooo auto --resume <auto_session_id>`. Use `--skip-run` to stop after the A-grade Seed. The CLI-only `--show-ledger` flag prints assumptions/non-goals; MCP skill responses already include the same ledger summary when available.
+
+When invoked through the interactive CLI without `--driver` or a configured
+default driver, `ooo auto` asks whether to use a selected interview driver if
+one of the supported driver CLIs is installed. Declining that prompt, or having
+no installed driver CLI, keeps the deterministic auto answerer. Use
+`--driver <backend>` to select a driver explicitly; use `--brake on|off` to
+control whether risky driver answers block for approval.

src/ouroboros/auto/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -6,7 +6,7 @@
 before starting execution.
 """

-from ouroboros.auto.answerer import AutoAnswer, AutoAnswerer, AutoAnswerSource
+from ouroboros.auto.answerer import AutoAnswer, AutoAnswerer, AutoAnswerMetadata, AutoAnswerSource
 from ouroboros.auto.grading import GradeGate, GradeResult, SeedGrade
 from ouroboros.auto.interview_driver import AutoInterviewDriver, AutoInterviewResult, InterviewTurn
 from ouroboros.auto.ledger import LedgerEntry, LedgerSection, SeedDraftLedger
@@ -17,6 +17,7 @@

 __all__ = [
     "AutoAnswer",
+    "AutoAnswerMetadata",
     "AutoAnswerSource",
     "AutoAnswerer",
     "AutoInterviewDriver",

src/ouroboros/auto/answerer.py

Lines changed: 11 additions & 0 deletions
@@ -18,6 +18,7 @@ class AutoAnswerSource(StrEnum):
     EXISTING_CONVENTION = "existing_convention"
     CONSERVATIVE_DEFAULT = "conservative_default"
     ASSUMPTION = "assumption"
+    DRIVER = "driver"
     NON_GOAL = "non_goal"
     BLOCKER = "blocker"

@@ -64,6 +65,15 @@ class AutoBlocker:
     question: str


+@dataclass(frozen=True, slots=True)
+class AutoAnswerMetadata:
+    """Structured provenance for auto answers that need audit context."""
+
+    risk: str | None = None
+    confidence: float | None = None
+    provenance: tuple[str, ...] = ()
+
+
 @dataclass(frozen=True, slots=True)
 class AutoAnswer:
     """Answer plus structured ledger updates."""
@@ -75,6 +85,7 @@ class AutoAnswer:
     assumptions: list[str] = field(default_factory=list)
     non_goals: list[str] = field(default_factory=list)
     blocker: AutoBlocker | None = None
+    metadata: AutoAnswerMetadata = field(default_factory=AutoAnswerMetadata)

     @property
     def prefixed_text(self) -> str:
