
feat(providers): support hermes interview driver#671

Merged
shaun0927 merged 1 commit into main from stack/auto-capabilities/2-hermes-driver
May 7, 2026

Owner

Q00 commented May 6, 2026

Stack

2/3 Hermes LLM/interview driver support. Review after #670.

Previous PR:

Next PR:

  • 3/3 Auto driver/brake CLI wiring

Changes

  • Adds HermesCliLLMAdapter backed by hermes chat -Q.
  • Marks Hermes as LLM and interview-driver capable in the registry.
  • Allows llm.backend: hermes and OUROBOROS_RUNTIME=hermes to route completion through Hermes.

Validation

  • uv run pytest tests/unit/backends/test_capabilities.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/config/test_loader.py tests/unit/config/test_models.py -q
  • uv run ruff check src/ouroboros/backends src/ouroboros/providers/hermes_cli_adapter.py src/ouroboros/providers/factory.py tests/unit/backends/test_capabilities.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/providers/test_factory.py

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 1f7499b for PR #671

Review record: 9cdc6731-65c4-4c4c-a111-664ba38ec82e

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/config/models.py:120 | BLOCKING | Adding hermes as a first-class LLM backend here exposes a directly relevant existing contract gap in src/ouroboros/config/loader.py:1107-1210: Hermes is not included in the backend-safe "default" model mapping that Codex/OpenCode/Copilot/Kiro use. As a result, standard model helpers like get_clarification_model() / get_qa_model() still return literals such as claude-opus-4-6, claude-sonnet-4-20250514, and gpt-4 when Hermes is selected, and the new adapter forwards those verbatim to hermes chat --model .... That means a default Hermes setup will fail on normal interview/QA/evaluation flows unless every model knob is manually overridden, which is a runtime regression introduced by enabling this backend globally. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Follow-up Findings

  • src/ouroboros/providers/hermes_cli_adapter.py:273 [warning] _resolve_cli_path() calls .resolve() on any explicit cli_path. If a caller passes the common bare-command form cli_path="hermes" (or a wrapper name on PATH), this turns it into an absolute path under the current working directory, e.g. /repo/hermes, instead of preserving PATH lookup. The same backend's runtime adapter does not do this. So the new LLM adapter will report “not found” for valid PATH-installed binaries whenever the path is supplied explicitly as a command name.

Non-blocking Suggestions

None.

Design Notes

The PR wires Hermes through the right registry/factory surfaces, but it treats “backend enabled” as complete before matching the shared configuration contracts that other CLI backends already rely on. The two gaps above are integration issues rather than local adapter mechanics.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 force-pushed the stack/auto-capabilities/2-hermes-driver branch from 1f7499b to 4cae512 (May 6, 2026 19:02)
Owner Author

Q00 commented May 6, 2026

Addressed the requested changes in 4cae5125:

  • Hermes now participates in backend-safe default model mapping, including configured default model normalization.
  • Added regression coverage for get_clarification_model(backend="hermes"), get_qa_model(backend="hermes"), and related helpers returning "default".
  • HermesCliLLMAdapter(cli_path="hermes") now preserves PATH lookup instead of resolving to $PWD/hermes; covered by test_bare_cli_path_preserves_path_lookup.
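The default-model fix above amounts to a small normalization step. The following is a minimal sketch, not the code in src/ouroboros/config/loader.py: the function name `resolve_model`, the lowercase backend ids, and the role-to-literal table are hypothetical. Only the literal model names, the backend list (Codex/OpenCode/Copilot/Kiro plus Hermes), and the "default" sentinel come from this thread.

```python
from typing import Optional

# Hypothetical ids for the backends the review says use backend-safe defaults.
BACKEND_SAFE_DEFAULTS = frozenset({"codex", "opencode", "copilot", "kiro", "hermes"})

# Hypothetical role -> built-in literal table. The literals are quoted in the
# review; which helper uses which literal is an assumption.
BUILTIN_MODELS = {
    "clarification": "claude-opus-4-6",
    "qa": "claude-sonnet-4-20250514",
    "evaluation": "gpt-4",
}


def resolve_model(role: str, backend: Optional[str], configured: Optional[str] = None) -> str:
    """Return the model name to forward to the selected backend (sketch)."""
    builtin = BUILTIN_MODELS[role]
    # An explicit, non-default override always wins.
    if configured and configured != builtin:
        return configured
    # CLI backends that choose their own model get the "default" sentinel
    # instead of a Claude/OpenAI literal they may not understand.
    if backend in BACKEND_SAFE_DEFAULTS:
        return "default"
    return builtin
```

With this shape, `resolve_model("qa", "hermes")` yields `"default"`, while a configured value that merely repeats the built-in literal is normalized too. That second case is one plausible reading of "configured default model normalization".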

Validation after the fix:

  • uv run pytest tests/unit/backends/test_capabilities.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/config/test_loader.py tests/unit/config/test_models.py -q
  • uv run ruff check ...
  • uv run mypy src/ouroboros/backends src/ouroboros/providers/hermes_cli_adapter.py src/ouroboros/providers/factory.py

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 4cae512 for PR #671

Review record: c88a2e41-a4b0-4656-9783-b5947b632303

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/providers/hermes_cli_adapter.py:70 | BLOCKING | allowed_tools is only injected into the prompt, but Hermes is also registered as a soft-enforcement backend (capabilities.py marks it soft_tool_enforcement=True). Unlike the existing Gemini/OpenCode soft backends, this adapter never inspects tool-use events or otherwise detects violations, and quiet mode only returns final text. That means a supposedly text-only or restricted-tool session can still execute out-of-envelope tools with no audit signal, which is a real contract/safety regression for callers relying on allowed_tools. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Follow-up Findings

  • src/ouroboros/providers/hermes_cli_adapter.py:91 [warning] The adapter stores max_turns in __init__, but complete() builds the prompt with _build_prompt(messages) and never threads either self._max_turns or config.max_turns into the request. This makes the new Hermes interview driver ignore the engine’s turn budget entirely, so flows that expect max_turns=1 question generation can silently turn into multi-step/tool-using conversations.

Non-blocking Suggestions

None.

Design Notes

The wiring is mostly consistent: backend registry, config normalization, factory resolution, and adapter export all line up. The weak point is the new Hermes adapter contract itself: it is advertised as both interview-capable and tool-envelope-aware, but it currently lacks the enforcement/audit behavior and turn-budget handling that those roles imply.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 force-pushed the stack/auto-capabilities/2-hermes-driver branch from 4cae512 to 84095bb (May 6, 2026 19:49)
Owner Author

Q00 commented May 6, 2026

Addressed the latest review in 84095bb5:

  • Hermes is no longer registered as a soft tool-envelope enforcement backend.
  • HermesCliLLMAdapter now refuses allowed_tools envelopes with a ProviderError instead of pretending to enforce/audit them from quiet output.
  • Added regression coverage confirming that an allowed_tools envelope does not spawn Hermes.
  • Hermes CLI calls now forward the turn budget with --max-turns, preferring CompletionConfig.max_turns over the adapter default.
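The turn-budget and envelope handling from this round can be sketched as an argv builder. This is a hypothetical illustration, not HermesCliLLMAdapter itself: `ProviderError` and `CompletionConfig` are local stand-ins, `build_hermes_argv` is an invented name, and omitting `--model` for the "default" sentinel is an assumption. Only `hermes chat -Q`, `--model`, and `--max-turns` appear in the thread. The thread does not describe how the prompt reaches the CLI, so the sketch leaves prompt delivery out.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


class ProviderError(Exception):
    """Local stand-in for the project's ProviderError (hypothetical definition)."""


@dataclass
class CompletionConfig:
    """Hypothetical stand-in carrying only the fields this sketch needs."""
    model: str = "default"
    max_turns: Optional[int] = None


def build_hermes_argv(
    cli_path: str,
    config: CompletionConfig,
    default_max_turns: int,
    allowed_tools: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the Hermes CLI arguments for one completion (sketch)."""
    # Quiet mode only returns final text, so a tool envelope can be neither
    # enforced nor audited: refuse it instead of silently ignoring it.
    # An empty list is still an envelope ("no tools"), hence `is not None`.
    if allowed_tools is not None:
        raise ProviderError("Hermes backend does not support allowed_tools envelopes")
    # The per-request budget wins over the adapter-level default.
    max_turns = config.max_turns if config.max_turns is not None else default_max_turns
    argv = [cli_path, "chat", "-Q", "--max-turns", str(max_turns)]
    # Assumption: the "default" sentinel means "let Hermes choose", so no --model.
    if config.model != "default":
        argv += ["--model", config.model]
    return argv
```

For example, an interview call with `CompletionConfig(max_turns=1)` keeps its single-turn budget even if the adapter was constructed with a larger default.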

Validation:

  • uv run pytest tests/unit/backends/test_capabilities.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/config/test_loader.py tests/unit/config/test_models.py -q => 247 passed
  • uv run ruff check ...
  • uv run mypy src/ouroboros/backends src/ouroboros/providers/hermes_cli_adapter.py src/ouroboros/providers/factory.py

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 84095bb for PR #671

Review record: f6f92798-cdd7-40a6-a707-946ce9ceb9f4

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/providers/factory.py:216 | BLOCKING | create_llm_adapter() now forwards allowed_tools into HermesCliLLMAdapter, but the new adapter immediately returns ProviderError whenever allowed_tools is not None (src/ouroboros/providers/hermes_cli_adapter.py:87). That makes the newly advertised Hermes interview path unusable at existing call sites on the changed contract boundary: interview flows already construct adapters with allowed_tools=[] or a read-only envelope, so selecting Hermes now fails before the first question is generated instead of acting as an interview driver. Either Hermes needs soft envelope handling like Gemini/OpenCode, or it should not be exposed as interview-capable / should drop the envelope in the factory for this backend. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The backend-capability plumbing is mostly consistent, and the default-model normalization work closes the obvious config mismatch for Hermes. The remaining issue is a contract mismatch between capability exposure/factory wiring and the adapter’s actual envelope support.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 force-pushed the stack/auto-capabilities/2-hermes-driver branch from 84095bb to 9d7947a (May 6, 2026 20:02)
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 9d7947a for PR #671

Review record: 7b4194ff-870a-4b3b-9463-de6e56c7a18f

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/providers/factory.py:216 | BLOCKING | The factory now silently drops any caller-supplied allowed_tools envelope for Hermes by hardcoding allowed_tools=None. That bypasses the contract instead of rejecting it: flows like interview/evaluation/PM creation call create_llm_adapter(..., allowed_tools=...) specifically to constrain or forbid tool use, but Hermes will now run unrestricted with no error. The adapter itself was changed to fail fast on unsupported envelopes, so the factory should preserve that failure rather than masking it. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Follow-up Findings

  • src/ouroboros/providers/hermes_cli_adapter.py:305 [warning] _resolve_cli_path() preserves PATH lookup only for an explicit constructor argument, not for the same bare value coming from OUROBOROS_HERMES_CLI_PATH or orchestrator.hermes_cli_path. If either config surface is set to hermes, this branch resolves it to ${cwd}/hermes, causing FileNotFoundError even though PATH lookup should succeed. The PR added a regression test for cli_path="hermes", but the configured-path case still has the same bug.

Non-blocking Suggestions

None.

Design Notes

The backend registry/config wiring is consistent, but the Hermes provider integration currently violates the factory’s tool-policy contract: unsupported constraints are being erased instead of surfaced. The adapter logic itself is otherwise straightforward and matches the existing Hermes runtime parsing model.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 force-pushed the stack/auto-capabilities/2-hermes-driver branch from 9d7947a to efaa22a (May 6, 2026 20:13)
Owner Author

Q00 commented May 6, 2026

Updated #671 after the latest review on 9d7947a.

Changes:

  • Factory now preserves caller-supplied allowed_tools when constructing HermesCliLLMAdapter, so Hermes surfaces its unsupported-envelope error instead of silently dropping the policy.
  • Updated the factory regression test to lock that behavior.
  • Also fixed configured bare hermes CLI paths to preserve PATH lookup, matching explicit cli_path="hermes" behavior.
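The point of the configured-path fix is to apply the same bare-name rule to every source of the CLI path, not only to the constructor argument. The sketch below makes several assumptions: the helper names, the source precedence (explicit argument, then OUROBOROS_HERMES_CLI_PATH, then orchestrator.hermes_cli_path, then "hermes"), and the use of `shutil.which`. Only the env var, the config key, and the PATH-preserving behaviour come from the thread.

```python
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Names quoted in the review thread.
ENV_VAR = "OUROBOROS_HERMES_CLI_PATH"
DEFAULT_COMMAND = "hermes"


def _normalize_cli_path(candidate: str) -> str:
    """Keep bare command names as PATH lookups; resolve real paths."""
    if Path(candidate).name == candidate:
        # Bare name such as "hermes": never join it onto the cwd.
        return shutil.which(candidate) or candidate
    # Has a directory component ("./bin/hermes", "~/tools/hermes", ...).
    return str(Path(candidate).expanduser().resolve())


def select_hermes_cli_path(
    explicit: Optional[str],
    env: Mapping[str, str],
    configured: Optional[str],  # e.g. orchestrator.hermes_cli_path
) -> str:
    """Pick the first non-empty source (assumed order), then normalize it."""
    for candidate in (explicit, env.get(ENV_VAR), configured):
        if candidate:
            return _normalize_cli_path(candidate)
    return _normalize_cli_path(DEFAULT_COMMAND)
```

Because normalization runs after source selection, `OUROBOROS_HERMES_CLI_PATH=hermes` and `cli_path="hermes"` behave identically. A bare name that `shutil.which` cannot find is returned unchanged, so the eventual spawn error reports the command name rather than a misleading `$PWD/hermes`.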

Validation:

  • uv run pytest tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/backends/test_capabilities.py -q -> 51 passed
  • uv run ruff check src/ouroboros/providers/factory.py src/ouroboros/providers/hermes_cli_adapter.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py -> passed
  • uv run mypy src/ouroboros/providers/factory.py src/ouroboros/providers/hermes_cli_adapter.py -> passed

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit efaa22a for PR #671

Review record: 32149a3d-0162-4745-b2a2-5397419b8088

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/backends/capabilities.py:80 | BLOCKING | supports_interview_driver=True advertises Hermes as a valid interview backend, but the new adapter still hard-fails whenever allowed_tools is provided (src/ouroboros/providers/hermes_cli_adapter.py:87, passed through by src/ouroboros/providers/factory.py:219). Existing interview call sites already pass a tool envelope for Hermes-sized flows, e.g. authoring_handlers uses _interview_allowed_tools(...) and pm_handler passes allowed_tools=[], so selecting Hermes for interviews now breaks before the first question is generated. Either Hermes needs to stay off the interview-driver surface, or those interview paths need a Hermes-specific contract that avoids/handles the envelope. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The backend/config wiring is mostly coherent, but the capability registry now overstates Hermes support relative to the adapter’s actual envelope contract. The main architectural gap is that “interview-capable” currently implies compatibility with existing interview tool-policy plumbing, which Hermes does not satisfy yet.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 force-pushed the stack/auto-capabilities/2-hermes-driver branch from dcf37ee to 9430acf (May 6, 2026 20:26)
Owner Author

Q00 commented May 6, 2026

Updated #671 after the latest review on efaa22a.

Changes:

  • Added supports_tool_envelope to the backend capability registry and marked Hermes as interview-capable but not tool-envelope-capable.
  • Kept the factory contract strict: direct create_llm_adapter(..., allowed_tools=...) still reaches Hermes and fails explicitly rather than silently erasing policy.
  • Updated MCP interview and PM interview adapter construction to consult the registry and omit the envelope for Hermes-specific interview flows.
  • Added registry, authoring, and PM regression tests.
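The registry change separates "can drive an interview" from "can honour a tool envelope". The sketch below shows that split under stated assumptions: `BackendCapability` here is a hypothetical two-field subset, the `codex` entry and the `Read`/`Grep`/`Glob` tool names are illustrative, and only the Hermes flag values come from this thread. The interview call site omits the envelope for Hermes, while the factory stays strict.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BackendCapability:
    """Hypothetical subset of a capability-registry entry."""
    supports_interview_driver: bool = False
    # Defaults to True so backends that never set it keep the old behaviour.
    supports_tool_envelope: bool = True


# Illustrative entries; only the Hermes flags are taken from the PR thread.
REGISTRY: Dict[str, BackendCapability] = {
    "hermes": BackendCapability(supports_interview_driver=True, supports_tool_envelope=False),
    "codex": BackendCapability(supports_interview_driver=True),
}

# Hypothetical read-only envelope used for interview adapters.
INTERVIEW_TOOLS: List[str] = ["Read", "Grep", "Glob"]


def supports_tool_envelope(backend: Optional[str]) -> bool:
    cap = REGISTRY.get(backend or "")
    # Unknown (or unset) backends are treated as envelope-capable.
    return cap is None or cap.supports_tool_envelope


def interview_allowed_tools(backend: Optional[str]) -> Optional[List[str]]:
    """Envelope for interview adapters, or None when the backend can't honour one."""
    if not supports_tool_envelope(backend):
        # Omit the envelope at the call site; the factory itself stays strict.
        return None
    return list(INTERVIEW_TOOLS)
```

Note the `None` case: an unset backend is treated as envelope-capable. That is exactly the gap the following review round flagged for configured-default Hermes.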

Validation:

  • uv run pytest tests/unit/backends/test_capabilities.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/auto/test_surface.py tests/unit/mcp/tools/test_pm_handler.py -q -> 155 passed
  • uv run ruff check src/ouroboros/backends src/ouroboros/mcp/tools/authoring_handlers.py src/ouroboros/mcp/tools/pm_handler.py tests/unit/backends/test_capabilities.py tests/unit/auto/test_surface.py tests/unit/mcp/tools/test_pm_handler.py -> passed
  • uv run mypy src/ouroboros/backends src/ouroboros/mcp/tools/authoring_handlers.py src/ouroboros/mcp/tools/pm_handler.py -> passed

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 9430acf for PR #671

Review record: e7931bf2-90b4-47ae-8e29-8295e136ded2

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/mcp/tools/authoring_handlers.py:100 | BLOCKING | _interview_allowed_tools(None) treats an unspecified backend as envelope-capable and returns a tool list. In the common path where InterviewHandler.llm_backend is left unset and Hermes is selected via config or OUROBOROS_LLM_BACKEND/OUROBOROS_RUNTIME, line 1173 still passes allowed_tools into create_llm_adapter(...), so the new HermesCliLLMAdapter rejects the request with ProviderError and authoring interviews stop working. The new test only covers the explicit "hermes" case, so this regression is currently untested. |
| 2 | src/ouroboros/mcp/tools/pm_handler.py:372 | BLOCKING | The PM interview path has the same implicit-backend regression: backend_supports_tool_envelope(self.llm_backend) returns True when self.llm_backend is None, so allowed_tools=[] is still sent to create_llm_adapter(...). If Hermes is the configured default backend, the factory builds HermesCliLLMAdapter with that envelope and every PM interview request fails before question generation. The added regression test only exercises llm_backend="hermes", not the default-configured Hermes path. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The capability-registry approach is the right direction, but the handler call sites are checking the raw constructor field instead of the resolved effective backend. That leaves the explicit-Hermes path fixed while the default-configured Hermes path still breaks.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 force-pushed the stack/auto-capabilities/2-hermes-driver branch from 9430acf to 76475a5 (May 6, 2026 20:37)
Owner Author

Q00 commented May 6, 2026

Updated #671 after the latest review on 9430acf.

Changes:

  • Authoring and PM interview paths now resolve the effective LLM backend before deciding whether to pass an interview tool envelope.
  • This fixes the configured-default Hermes path where llm_backend=None previously looked envelope-capable before factory resolution.
  • Added regression coverage for both explicit Hermes and configured-default Hermes paths.
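The fix is to make the envelope decision on the backend that will actually run, not on the raw `llm_backend` field. The sketch below illustrates this with several assumptions. The env var names and `llm.backend` come from the thread, but the precedence order, the `"claude"` fallback, and both function names are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical: the only backend in this sketch that cannot take an envelope.
NO_TOOL_ENVELOPE = frozenset({"hermes"})
FALLBACK_BACKEND = "claude"  # hypothetical default when nothing is configured


def resolve_effective_backend(
    explicit: Optional[str],
    env: Mapping[str, str],
    configured: Optional[str],  # e.g. llm.backend from the config file
) -> str:
    """Return the backend the factory would end up using (assumed precedence)."""
    for candidate in (
        explicit,
        env.get("OUROBOROS_LLM_BACKEND"),
        env.get("OUROBOROS_RUNTIME"),
        configured,
    ):
        if candidate:
            return candidate.strip().lower()
    return FALLBACK_BACKEND


def should_pass_envelope(
    explicit: Optional[str],
    env: Mapping[str, str],
    configured: Optional[str],
) -> bool:
    # The buggy check looked only at `explicit`, which is None in the
    # configured-default case and therefore looked envelope-capable.
    backend = resolve_effective_backend(explicit, env, configured)
    return backend not in NO_TOOL_ENVELOPE
```

In a handler this would typically be called as `should_pass_envelope(self.llm_backend, os.environ, configured_backend)`, where `configured_backend` is a hypothetical name for the value read from config.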

Validation:

  • uv run pytest tests/unit/backends/test_capabilities.py tests/unit/providers/test_factory.py tests/unit/providers/test_hermes_cli_adapter.py tests/unit/auto/test_surface.py tests/unit/mcp/tools/test_pm_handler.py -q -> 156 passed
  • uv run ruff check src/ouroboros/mcp/tools/authoring_handlers.py src/ouroboros/mcp/tools/pm_handler.py tests/unit/auto/test_surface.py tests/unit/mcp/tools/test_pm_handler.py -> passed
  • uv run mypy src/ouroboros/mcp/tools/authoring_handlers.py src/ouroboros/mcp/tools/pm_handler.py -> passed

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment


Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit 76475a5 for PR #671

Review record: 238af9b5-3b60-4495-bff0-8af56c5d4b5e

Blocking Findings

No in-scope blocking findings remained after policy filtering.

Non-blocking Suggestions

None.

Design Notes

The capability-registry approach and the Hermes-specific tool-envelope gating are coherent, and moving the interview paths to consult backend capabilities is the right abstraction. The remaining gap is surface consistency: once a backend is promoted to first-class LLM support, every CLI/backend-selection layer needs to be updated in lockstep.

Policy Notes

  • Omitted 1 finding(s) that referenced files outside the current PR changed-files scope.

Recovery Notes

First recoverable review artifact generated from codex analysis log.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927
Collaborator

Re-review ping

Status

  • ouroboros-agent[bot] verdict: APPROVE on the current head 76475a55 (no follow-up findings, no non-blocking suggestions).
  • mergeable / mergeStateStatus: MERGEABLE / CLEAN.
  • Stacked on top of #670 (feat(backends): add capability registry); depends on the capability registry landing first.

What the PR does

Promotes Hermes from a runtime-only adapter to a first-class interview LLM driver. The bulk of the work is plumbing through the capability registry introduced in #670 — every CLI / backend selection / tool-envelope decision now derives from BackendCapability instead of duplicated alias maps, and Hermes participates in:

  • default model normalization for backend-safe LLM completions,
  • the auto interview answer driver path (without falsely advertising tool-envelope support),
  • factory-side preservation of caller-supplied allowed_tools so Hermes can surface its own unsupported-envelope error instead of being silently downgraded.

The stack contract (supports_tool_envelope capability flag introduced here, consumed in authoring/PM interview paths) is what lets #672 wire driver-selection through capabilities without bespoke per-backend branches.

Iterative improvements driven by the bot

The current approval on 76475a55 follows six REQUEST_CHANGES review rounds, each answered with a new push:

  1. 4cae5125: Hermes joins backend-safe default model mapping; regression coverage added; bare cli_path="hermes" keeps PATH lookup.
  2. 84095bb5: Hermes is no longer registered as a soft tool-envelope enforcement backend; HermesCliLLMAdapter refuses allowed_tools envelopes with a ProviderError; --max-turns is forwarded.
  3. 9d7947a: Factory hardcodes allowed_tools=None for Hermes (flagged by the bot for silently dropping the caller's envelope).
  4. efaa22a: Factory preserves caller-supplied allowed_tools so Hermes surfaces its unsupported-envelope error; configured bare hermes CLI paths keep PATH lookup.
  5. 9430acf: Added supports_tool_envelope to the capability registry; Hermes marked interview-capable but not tool-envelope-capable.
  6. 76475a55: Authoring and PM interview paths resolve the effective LLM backend before deciding whether to pass an interview tool envelope.

Each round was anchored by an ouroboros-agent review and addressed by force-pushing an updated commit to the branch.

Why merging is safe

  • All findings from the iteration history are addressed; the latest bot pass returned no non-blocking suggestions either.
  • The new capability flag is opt-in: backends that don't set supports_tool_envelope get the previous behavior, so existing non-Hermes drivers are unaffected.
  • Tool-envelope decisions are made centrally via the registry, so #672 (feat(auto): wire driver selection through capabilities) can rely on a single resolution path rather than re-implementing per-backend gating.

cc @Q00 — please consider merging after #670 so the driver/brake selection in #672 can land on a stabilized capability registry.

@shaun0927
Collaborator

Ready for maintainer review — re-review ping

This PR (Hermes interview driver) is in a clean merge-ready state on commit 76475a55:

  • Bot review: APPROVED on the current head, after six earlier REQUEST_CHANGES rounds that were each addressed during this PR.
  • Mergeable: MERGEABLE/CLEAN against the parent stack/auto-capabilities/1-registry.
  • Verification: UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/unit/providers/test_hermes_cli_adapter.py tests/unit/providers/test_factory.py tests/unit/backends/test_capabilities.py → all passing locally on the stack/3 head (which is built on top of this branch and #672, feat(auto): wire driver selection through capabilities).
  • Scope: Adds Hermes as a supported interview-driver backend through the capability registry from #670 (feat(backends): add capability registry), including factory wiring and adapter integration. Designed as a self-contained slice of the stack chain.

Why this is safe to merge

  1. Downstream stack PRs (#672 feat(auto): wire driver selection through capabilities, #680 fix(auto): align MCP resume bounds, #682 feat(auto): add selected-driver answer metadata) all merge cleanly on top of this branch. #680 and #682 have already landed, and #672 is itself APPROVED + MERGEABLE/CLEAN, so this slice has been validated end-to-end across the chain.
  2. The change is additive: a new backend entry in the capability registry plus the matching adapter, with no behavior change for existing backends.
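  3. No remaining in-scope blocking findings and no outstanding non-blocking suggestions. The bot omitted one finding that referenced files outside this PR's changed-file scope, and its design notes flag CLI/backend-selection surface consistency as the remaining gap.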

Q00 / maintainers: could you merge the foundation chain (#670, then #671, then #672) when convenient? After that, the auto driver/brake follow-up cluster (#678 umbrella) collapses to its already-merged or already-superseded state.

Base automatically changed from stack/auto-capabilities/1-registry to main May 7, 2026 08:19
shaun0927 force-pushed the stack/auto-capabilities/2-hermes-driver branch from d61df30 to 7ef5208 (May 7, 2026 09:19)
shaun0927 merged commit d4bd828 into main (May 7, 2026)
6 checks passed
shaun0927 deleted the stack/auto-capabilities/2-hermes-driver branch (May 7, 2026 09:23)