feat(auto): wire driver selection through capabilities#672
Conversation
1f7499b to
4cae512
Compare
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
58ec772for PR #672
Review record:
f99c89b6-8054-4c18-ac9c-f249bf2276b6
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/cli/commands/config.py:315 | BLOCKING | config driver set claims to manage the default auto-interview driver, but it persists the choice by overwriting llm.backend. That setting is consumed globally by other LLM-backed flows, so this command silently changes unrelated behavior outside ooo auto as well. This is a contract/UI violation, not just an implementation detail: a user trying to scope the change to interview answering will unexpectedly retarget other features. |
| 2 | src/ouroboros/cli/commands/config.py:292 | BLOCKING | config driver dereferences data.get("llm", {}).get("backend", ...) without validating that llm is a mapping. If the config is structurally invalid (for example llm: []), this subcommand raises AttributeError instead of returning the friendly config error that the rest of this command group already provides. The new surface needs the same malformed-config handling as config show/validate. |
Non-blocking Suggestions
| 1 | src/ouroboros/cli/commands/config.py:292 | documentation | The user-facing default/current value is shown as claude_code, while the accepted driver choices are documented as canonical names like claude. Normalizing the display would make the new command less confusing. |
Design Notes
The async answerer plumbing in auto is directionally sound, and the resume-path validation is careful. The main design problem is that the new config driver surface is not actually scoped to “interview driver” state; it is wired to the global LLM backend knob.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
fc02093 to
3f906b5
Compare
|
Addressed the requested changes in
Validation after the fix:
|
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
3f906b5for PR #672
Review record:
bdfc1490-abd8-40a9-973a-573dd0b18974
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/cli/commands/auto.py:233 | BLOCKING | Resume rejects valid Claude-backed sessions because the persisted driver is normalized to claude_code later (state.interview_driver_backend = selected_answerer.backend), but resume compares against resolve_interview_driver_backend(...), which canonicalizes the same user choice to claude. After the first save, ooo auto --resume ... can raise resume driver mismatch for the same backend, making Claude driver sessions effectively non-resumable. |
| 2 | src/ouroboros/cli/commands/auto.py:188 | BLOCKING | The configured default driver is applied before the resume path is examined, so resuming a legacy/deterministic session with state.interview_driver_backend is None will silently switch that in-flight session to the current configured auto driver even when the user did not pass --driver. That changes interview semantics mid-session based only on current config, which violates the persisted-session contract the surrounding resume logic is trying to preserve. |
Non-blocking Suggestions
None.
Design Notes
The separation between deterministic scaffolding and selected-driver answering is reasonable, and the new state/config plumbing is mostly coherent. The main weakness is that driver identity is normalized inconsistently across config, session state, and resume checks, so persisted-session semantics are not yet stable.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
4cae512 to
84095bb
Compare
3f906b5 to
c0efaed
Compare
|
Addressed the latest review in
Validation:
|
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
c0efaedfor PR #672
Review record:
c0207d13-11d0-4b2c-8eac-4deecd60521d
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/cli/commands/auto.py:232 | BLOCKING | The new resume check normalizes the requested driver to LLM adapter identity (claude -> claude_code) but compares it against the persisted state.interview_driver_backend without normalizing that stored value first. Any in-flight session created before this patch that persisted claude will now fail ooo auto --resume ... --driver claude with a false resume driver mismatch, which is exactly the compatibility case this change is supposed to fix. |
| 2 | src/ouroboros/mcp/tools/auto_handler.py:159 | BLOCKING | The MCP resume path has the same raw-vs-normalized driver comparison bug as the CLI path. A previously persisted session with interview_driver_backend="claude" will be rejected when resumed with driver="claude" because the request is normalized to claude_code before comparison, but the stored session value is not. |
Non-blocking Suggestions
None.
Design Notes
The overall direction is reasonable: driver selection is now scoped to auto interviews instead of global LLM config, and preserving structured scaffold ledger entries avoids collapsing seed contracts into freeform text. The main remaining issue is backward-compatibility in resume semantics, where identity normalization is only applied to one side of the comparison.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
84095bb to
9d7947a
Compare
c0efaed to
3961fc8
Compare
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
3961fc8for PR #672
Review record:
11257bd5-7bcd-4329-9736-bfb647f26213
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
src/ouroboros/mcp/tools/auto_handler.py:176[warning] New MCP-started auto sessions ignore the configured defaultauto.interview_driver_backendand always stay deterministic unless the caller passesdriverexplicitly. The direct CLI path now applies that default (src/ouroboros/cli/commands/auto.py:189-192), andconfig driver setpresents it as the default interview driver backend globally. In practice this means the same user configuration produces different interview behavior depending on whetherooo autois run through the CLI or through theouroboros_autoMCP/tooling path, which is a real contract break for users relying on the configured default.
Non-blocking Suggestions
None.
Design Notes
The driver-selection wiring is mostly consistent now: resume semantics, backend identity normalization, and ledger provenance are all materially better. The main architectural gap is that the new “default interview driver backend” concept was implemented only in the CLI entrypoint, not in the MCP auto entrypoint, so the system still has two different sources of truth for how a default auto session is started.
Policy Notes
- No in-scope blocking findings remained after policy filtering; downgraded verdict accordingly.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
9d7947a to
efaa22a
Compare
3961fc8 to
3b9b655
Compare
|
Rebased #672 onto the updated #671 ( Validation after rebase:
|
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
3b9b655for PR #672
Review record:
967e517e-d661-4624-9f8b-bfbe3a6a6751
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/config/models.py:245 | BLOCKING | auto.interview_driver_backend is introduced as persisted config, but AutoConfig accepts any string. That means load_config() and ouroboros config validate will happily accept invalid values like auto.interview_driver_backend: bogus, and plain ooo auto only fails later when _run_auto() tries to consume the default backend. Since this PR adds a first-class config contract for the auto driver, the model needs the same backend validation the orchestrator config already has, plus a regression test for an invalid stored value. |
Non-blocking Suggestions
| 1 | src/ouroboros/cli/commands/config.py:290 | cosmetic refactoring | @driver_app.callback(invoke_without_command=True) will also run for subcommands, so ouroboros config driver set ... prints the current backend before mutating it. That is mostly a UX annoyance, but splitting “show” into an explicit command or guarding on invoked subcommand would make the CLI output cleaner. |
Design Notes
The overall direction is coherent: persisting driver/brake in auto state and normalizing driver identity on resume fixes real lifecycle issues, and keeping deterministic ledger updates separate from freeform driver text is the right architectural choice. The main gap is that the new config surface is not validated at the model boundary, so the persisted contract is weaker than the runtime contract.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
dcf37ee to
9430acf
Compare
3b9b655 to
2817d48
Compare
|
Rebased #672 again onto the corrected #671 head ( Validation after rebase:
|
2817d48 to
75e4fb6
Compare
|
Updated #672 after the config validation review. Changes:
Validation:
|
9430acf to
76475a5
Compare
75e4fb6 to
52a3852
Compare
|
Rebased #672 onto the updated #671 head ( Validation after rebase:
|
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
52a3852for PR #672
Review record:
01e6a8fd-6f11-4d7d-94ba-3a13fad6cb89
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/backends/capabilities.py:110 | BLOCKING | litellm is now advertised and validated as a supported interview driver, but the selected-driver path always calls the adapter with CompletionConfig(model="default", role="auto_interview_answer") (driver_answerer.py:87). LiteLLMAdapter forwards config.model directly to LiteLLM (litellm_adapter.py:268) unless a matching llm_role_profiles entry rewrites it, and the default config has no auto_interview_answer profile. On a normal install, ooo auto --driver litellm or config driver set litellm will therefore fail on the first interview answer with an invalid literal model name "default". Either keep litellm out of supports_interview_driver for now, or add a real model-resolution path and regression test for this backend. |
Non-blocking Suggestions
None.
Design Notes
The overall shape is reasonable: the async answerer integration, persisted driver/brake state, and alias normalization all line up well. The main issue is the capability contract now claims support for a backend whose adapter path is not actually runnable with the completion config this feature uses.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
52a3852 to
80351dd
Compare
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
032964afor PR #672
Review record:
16698601-5e2a-4a4e-856d-7ee3bdaad063
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/auto/driver_answerer.py:236 | BLOCKING | _ledger_updates_for() rewrites every scaffold entry to LedgerSource.INFERENCE, which drops the original ASSUMPTION/NON_GOAL source categories. The pipeline result and ledger summary derive assumptions/non_goals strictly from those sources (src/ouroboros/auto/ledger.py:284-290, src/ouroboros/auto/pipeline.py:405-406), so selected-driver sessions will silently lose normal assumptions and non-goals from --show-ledger, MCP output, and downstream grading context. For example, a non-goals answer from AutoAnswerer (src/ouroboros/auto/answerer.py:162-171) becomes invisible after this rewrite. |
Non-blocking Suggestions
None.
Design Notes
The resume/config scoping work is disciplined, and normalizing persisted driver identity against adapter identity is the right direction. The main architectural regression is in the new driver answerer collapsing semantic ledger provenance that other layers still treat as part of the contract.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Addressed the change request in |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
3f91e39for PR #672
Review record:
4d0c7ce4-d06d-473f-8278-0693687da6a8
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
src/ouroboros/auto/driver_answerer.py:159[warning] The risk gate is too broad for the defaultbrake=onpath: matching any standaloneaddmarks the question as a “scope or product/business tradeoff”, andDriverAutoAnswerer.answer()then hard-blocks the session before calling the driver. Ordinary interview questions like “How should users add a task?” or “What should the add command do on duplicate input?” will now stop selected-driver sessions even though they are routine feature-behavior questions, not risky scope decisions. This makes the new default selected-driver mode regress into false blockers on common CRUD wording.
Non-blocking Suggestions
None.
Design Notes
The config scoping and resume semantics are much cleaner than earlier iterations: driver identity is persisted canonically, resume mismatch handling is explicit, and preserving scaffold ledger structure avoids collapsing interview state into freeform text. The main remaining issue is that the brake heuristic is currently policy-heavy enough to undermine the selected-driver path’s usability.
Policy Notes
- No in-scope blocking findings remained after policy filtering; downgraded verdict accordingly.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
The bot's design note on PR #672 flagged a regression: bare `add` in the risk gate matched ordinary CRUD wording like "How should users add a task?", false-blocking selected-driver sessions under brake=on. Replace bare `add` with `add(?:ing)?\s+(?:a|an|the)?\s*<scope-noun>` where the scope-noun is feature/capability/support/requirement/epic/ story/scope/product-area. CRUD verbs no longer match while genuine scope-additions ("Should we add a feature for offline mode?") still fire the gate. Refs #672 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in driver_answerer.py / test_driver_answerer.py between the stack/3 risk-gate scope fix (974f156) and the post-response classifier branch. Keep both: - Tightened scope-add regex from stack/3 (no longer false-blocks CRUD wording like "How do users add a task?"). - Post-response classifier and metadata threading from Q00#682/Q00#683. Refs Q00#672 Refs Q00#675 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in driver_answerer.py between the stack/3 risk-gate scope fix (974f156) and the metadata-plumbing branch. The metadata field added in Q00#682 is preserved on the brake-on blocker path; the scope-add regex tightening from stack/3 lands unchanged. Refs Q00#672 Refs Q00#675 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
974f156for PR #672
Review record:
ac4a1e8a-92ae-4d0e-a7fc-883dd70962dd
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/auto/driver_answerer.py:129 | BLOCKING | The selected-driver path sends the driver’s freeform answer into the interview transcript (interview_driver.py:157), but ledger_updates are copied from the deterministic scaffold instead of reflecting that answer (driver_answerer.py:232-253). Seed generation later reads the interview session transcript (auto/adapters.py:67-77, authoring_handlers.py:557-563), while grading only checks the ledger for gaps/risk (auto/grading.py:138-245). That means a driver can answer “use Django/Postgres” while the ledger still records the scaffold’s different choice, and the pipeline can generate/review a Seed against two conflicting sources of truth without detecting the mismatch. This is a real correctness bug in the new mode, not just provenance metadata. |
Non-blocking Suggestions
None.
Design Notes
The driver-selection/config plumbing is coherent, and the async answerer integration is straightforward. The main architectural problem is that selected-driver mode currently breaks the ledger-as-contract model by letting transcript content diverge from the persisted ledger.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
The bot's blocking finding on PR #672 noted that the selected-driver path sent the driver's freeform answer into the interview transcript but kept the deterministic scaffold's value in ``ledger_updates``. Seed generation reads the transcript while grading reads the ledger, so the two could disagree silently (driver says "Django", ledger says "FastAPI") without anyone catching the divergence. - Replace each scaffold-derived ledger entry value with the driver's freeform answer verbatim and tag the entry CONFLICTING with reduced confidence (<= 0.4), so the Seed-ready/A-grade gates surface the divergence instead of treating the scaffold value as confirmed. - Preserve the structural section/key/source so downstream section-aware Seed generation still works, and keep the original scaffold value in the rationale + evidence as audit context. - Update the existing ledger-values test to lock in the new contract and assert the original scaffold value is preserved as rationale evidence. Refs #675 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
f229e35for PR #672
Review record:
d8174b0f-b35c-488f-ad42-3c26a4626c98
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/auto/driver_answerer.py:239 | BLOCKING | Every structural ledger update produced by DriverAutoAnswerer is forced to LedgerStatus.CONFLICTING. In this pipeline, conflicting sections are treated as open gaps (src/ouroboros/auto/ledger.py:270-282), and the interview loop blocks whenever gaps remain (src/ouroboros/auto/interview_driver.py:182-189). That means selected-driver sessions cannot actually converge to Seed-ready once they populate required sections; they will keep carrying unresolved gaps and eventually block instead of advancing to seed generation. |
Non-blocking Suggestions
None.
Design Notes
The driver/brake/config plumbing is coherent, and the resume semantics are much cleaner than earlier iterations. The remaining issue is architectural: the new “driver answer is canonical but needs later verification” model is encoded using a ledger status that the interview phase already treats as a hard stop, so the new mode conflicts with the existing Seed-ready contract.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
#679) Issue #673 acceptance criteria require that the actual MCP payload reaching ouroboros_auto carries forwarded driver/brake values and never leaks $driver or $brake placeholder literals. Upstream stack/3 already locks dispatch-level resolution; this commit adds the missing layer by mocking the ouroboros MCP server and asserting call_tool's payload. - Forwarding test: --driver hermes --brake off must reach call_tool with driver="hermes", brake="off", skip_run=True. - Placeholder leak test: plain `ooo auto "goal"` must not send "$driver" or "$brake" as real values; an empty/None value is acceptable. Closes #673
The earlier fix that aligned ledger values with the driver's freeform answer marked every structural ledger update as ``LedgerStatus.CONFLICTING``. The interview loop treats CONFLICTING (along with MISSING/WEAK/BLOCKED) as an open gap (``ledger.py:open_gaps``), so a selected-driver session could populate every required section and still never reach ``is_seed_ready`` — the loop would carry the same gaps forever and eventually block. Use ``LedgerStatus.INFERRED`` instead. The ledger entry still: - Carries the driver's freeform answer verbatim as the value, so the persisted ledger and the interview transcript share the same content (no silent divergence between the two sources of truth). - Has reduced confidence (<= 0.4) and an ``auto_interview_transcript`` evidence marker so grading and the A-grade gate can downgrade or re-verify the driver answer before final acceptance. - Preserves the original scaffold value verbatim in the rationale and in a ``scaffold_value:<value>`` evidence tag, so divergence between scaffold and driver is never silently lost. INFERRED is the right semantic: the value is the driver's inferred answer, not the user's confirmed choice; downstream grading is responsible for deciding whether to escalate to verification before A-grade. The interview loop is now free to converge for selected-driver sessions. Tests: - ``test_driver_answerer_ledger_values_reflect_driver_answer_without_blocking_loop`` asserts the new INFERRED+verbatim+scaffold-evidence contract and locks in that the status is not in the loop's blocking set. - ``test_driver_answerer_ledger_does_not_block_seed_ready_convergence`` is a regression test that drives the answerer through every required section and asserts those sections do not end up in the open-gap set (the contract that broke under the CONFLICTING variant). Addresses ouroboros-agent[bot] blocking finding on PR #672 against ``f229e35``. Refs #672.
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
dc421dafor PR #672
Review record:
ec420147-2d07-442b-94b5-31e7a4db6f04
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
| 1 | tests/unit/auto/test_driver_answerer.py:67 | Nice-to-have tests | The new selected-driver coverage exercises the brake=off path well, but the default brake=on behavior still is not directly tested. A small regression test that asserts risky questions return a blocker without invoking the adapter would lock down the primary safety branch of this feature. |
Design Notes
The split between deterministic scaffolding and driver-sourced text is sound, and the config scoping change to auto.interview_driver_backend keeps this feature from mutating unrelated LLM defaults. Resume semantics and ledger provenance also look consistent with the contributor’s stated intent.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Co-authored-by: Hermes Agent <hermes-agent@users.noreply.github.com> Co-authored-by: shaun0927 <junghwan1912@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Hermes Agent <hermes-agent@users.noreply.github.com>
Re-review pingStatus
What the PR doesWires interview driver selection ( Resume semantics persist driver identity canonically; resume mismatches are explicit; configured defaults only affect new sessions (not resumed ones); ledger source categories on selected-driver sessions preserve scaffold structure instead of collapsing into freeform text. Iterative improvements driven by the botThe current APPROVE is the result of three rounds against
Non-blocking suggestion in the latest reviewThe bot suggested a direct Why merging is safe
cc @Q00 — please consider merging this after #671. #680 is already approved on top of this branch and will rebase cleanly onto |
Ready for maintainer review — re-review pingThis PR (driver selection through capabilities + brake mode for selected-driver auto interviews) is now in a clean merge-ready state on commit
How prior bot blocking findings were resolved (all on the current head)
Why this is safe to merge
Q00 maintainers — would you take a final pass and merge the foundation chain (#670 → #671 → #672)? Once #672 lands, the auto driver/brake surface is fully unblocked. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
9fcef4cfor PR #672
Review record:
f761e672-0038-45c0-acf3-6922f131cd61
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/auto/driver_answerer.py:302 | BLOCKING | _ledger_updates_for() still rewrites each structural ledger value to "driver:{backend} answer (verbatim): {driver_text}" instead of persisting the actual driver answer string unchanged. That reintroduces the exact ledger/transcript divergence this PR says it fixed: the interview transcript stores answer.prefixed_text, while grading/seed consumers read a different ledger value with an extra wrapper. Any downstream equality/parsing checks now see two different sources of truth again, and the existing regression test only asserts substring inclusion, so it does not catch this contract break. |
Non-blocking Suggestions
None.
Design Notes
The selected-driver path is mostly well-contained: config, CLI/MCP, resume semantics, and brake gating line up cleanly. The remaining weakness is that provenance is still being encoded into the ledger value itself instead of staying in metadata/evidence, which makes the “single canonical answer” contract fragile.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Cross-linking #725 self-restraint thread and the closures of #695/#683 (regulated-topic vocab + answer risk classifier). This PR bundles two changes with very different self-restraint profiles:
The brake's risk-question vocabulary (destructive execution intent, credentials, external side effects, ...) is exactly what #689 / #725 say should live in a UserLevel skill ( Concrete recommendation:
If keeping the brake in core temporarily is preferred, make it a no-op default (always allows answers, brake-on means "log only") so the vocabulary's enforcement gravity stays in plugin-land. That avoids the inconsistency of merging the brake here while #695/#683 are closed for the same reason. |
Stack
3/3 Auto selected driver/brake mode plus registry-backed CLI wiring. Review after #671.
Previous PRs:
Changes
ooo auto.ouroboros config driver set <backend>, includinghermes.ooo autodeterministic unless--driveris provided.Validation
uv run pytest tests/unit/backends/test_capabilities.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/config/test_loader.py tests/unit/config/test_models.py tests/unit/cli/test_config.py tests/unit/cli/test_auto_command.py tests/unit/auto/test_driver_answerer.py tests/unit/auto/test_surface.py -quv run ruff check src/ouroboros/backends src/ouroboros/providers/hermes_cli_adapter.py src/ouroboros/providers/factory.py src/ouroboros/orchestrator/runtime_factory.py src/ouroboros/config/loader.py src/ouroboros/cli/commands/auto.py src/ouroboros/cli/commands/config.py tests/unit/backends/test_capabilities.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/providers/test_factory.py tests/unit/cli/test_config.py tests/unit/cli/test_auto_command.pyuv run mypy src/ouroboros/backends src/ouroboros/providers/hermes_cli_adapter.py src/ouroboros/providers/factory.py src/ouroboros/cli/commands/config.py src/ouroboros/cli/commands/auto.pyuv run ouroboros config driver --help && uv run ouroboros auto --help