
feat(auto): block risky-fallback answers for regulated topics #695

Closed
shaun0927 wants to merge 6 commits into Q00:main from shaun0927:feat/640-risky-fallback-gate


@shaun0927
Collaborator

Summary

Add a narrow risky-fallback gate to the deterministic auto answerer so high-risk topics surface for human review instead of being silently filled with a generic default.

When the deterministic answerer would otherwise return CONSERVATIVE_DEFAULT or ASSUMPTION for a question whose topic has no defensible generic default, it now returns a BLOCKER with a concrete reason.

Targeted topics (intentionally narrow):

  • Regulated personal data: PII, personally identifiable information, GDPR, HIPAA, SOX, PCI-DSS.
  • Destructive bulk schema/table operations: truncate/purge of table(s) or schema(s).
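A minimal sketch of how the two targeted topic families might be expressed as reason-tagged regexes, assuming plain `re` patterns. `RISKY_FALLBACK_PATTERNS`, `risky_fallback_reason`, and the reason string for the destructive family are hypothetical; only "regulated data handling" appears as a reason in this PR's text, and the real `_RISKY_FALLBACK_PATTERNS` in `answerer.py` may be structured differently.

```python
import re
from typing import Optional

# Hypothetical sketch; the real _RISKY_FALLBACK_PATTERNS may differ.
RISKY_FALLBACK_PATTERNS = (
    (
        re.compile(
            r"\b(?:pii|personally identifiable information|gdpr|hipaa|sox|pci[- ]?dss)\b",
            re.IGNORECASE,
        ),
        "regulated data handling",
    ),
    (
        # Forward phrasing only: destructive verb first, then a table/schema noun.
        re.compile(r"\b(?:truncate|purge)\b.*\b(?:tables?|schemas?)\b", re.IGNORECASE),
        "destructive bulk schema operation",  # hypothetical reason string
    ),
)


def risky_fallback_reason(question: str) -> Optional[str]:
    """Return the first matching blocker reason, or None when no pattern applies."""
    for pattern, reason in RISKY_FALLBACK_PATTERNS:
        if pattern.search(question):
            return reason
    return None
```

A forward-only pattern like the second one misses reverse phrasings such as "Which tables should the migration truncate?", which is the gap a later review on this PR flagged.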

Excluded by design (already covered or would over-trigger):

  • Production deployment authority — already covered by _blocker_for for the explicit verb+target pairs (deploy/release/publish to/against/on production/prod/live/external).
  • Real-money authority for credit cards / billing accounts — already covered by _blocker_for.
  • Generic 'destructive' keywords like drop ... database — already covered by _blocker_for.

Why

This implements the fourth piece of #640's acceptance criteria:

block or ask the user when a required answer would otherwise be a risky fallback

The provenance taxonomy is in place after #646/#666, but the auto answerer was still willing to fabricate a generic answer for topics where no generic answer is safe. PR-B keeps the change deliberately narrow — only the topics where a fallback is unambiguously wrong — so the existing safe-allowlists for product-feature questions about credentials/branches keep working.

Behavior

  • _blocker_for runs first — explicit sensitive-authority questions still block at their original reason and message.
  • Deterministic routing then picks a category answerer (verification / acceptance / actor-IO / runtime / product behavior / default).
  • After routing, if the answer source is CONSERVATIVE_DEFAULT or ASSUMPTION and the question matches a risky-fallback pattern and the safe-product allowlists do not exempt it, the answer is replaced with a BLOCKER carrying the matched reason.
  • A REPO_FACT / USER_GOAL / EXISTING_CONVENTION / INFERENCE answer is never gated (callers can still ground the answer with bounded repo facts via AutoAnswerContext).
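The post-routing check in the third bullet can be sketched as follows. This illustrates the rule under stated assumptions and is not the code in `answerer.py`: `AutoAnswerSource` and `AutoAnswer` are stand-ins built from the names mentioned above, `apply_risky_fallback_gate` is hypothetical, the safe-product allowlists are reduced to a placeholder regex, and `_blocker_for` plus category routing are assumed to have already run.

```python
import re
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class AutoAnswerSource(Enum):
    # Hypothetical stand-in; only the member names come from the PR text.
    REPO_FACT = "repo_fact"
    USER_GOAL = "user_goal"
    EXISTING_CONVENTION = "existing_convention"
    INFERENCE = "inference"
    CONSERVATIVE_DEFAULT = "conservative_default"
    ASSUMPTION = "assumption"


@dataclass(frozen=True)
class AutoAnswer:
    text: str
    source: AutoAnswerSource
    blocker: Optional[str] = None


# Sources that mean "no grounding was supplied"; grounded sources are never gated.
RISKY_FALLBACK_SOURCES = frozenset(
    {AutoAnswerSource.CONSERVATIVE_DEFAULT, AutoAnswerSource.ASSUMPTION}
)
REGULATED = re.compile(r"\b(?:pii|gdpr|hipaa|sox|pci[- ]?dss)\b", re.IGNORECASE)
# Placeholder for the existing safe-product allowlists (hypothetical vocabulary).
SAFE_PRODUCT = re.compile(r"\b(?:feature|button|setting)\b", re.IGNORECASE)


def apply_risky_fallback_gate(question: str, answer: AutoAnswer) -> AutoAnswer:
    """Upgrade an ungrounded answer on a regulated topic to a blocker."""
    if answer.source not in RISKY_FALLBACK_SOURCES:
        return answer  # REPO_FACT, USER_GOAL, etc. pass through untouched
    if not REGULATED.search(question) or SAFE_PRODUCT.search(question):
        return answer  # not a risky topic, or exempted by the allowlist
    return replace(answer, blocker="regulated data handling")
```

Keying on both the answer source and the question text is what lets a HIPAA question backed by supplied repo facts still return `REPO_FACT`.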

Out of scope

Tests

  • New `test_auto_answerer_blocks_regulated_data_questions_instead_of_falling_back` — PII / HIPAA / 'purge tables' all block with their matched reason.
  • New `test_auto_answerer_does_not_block_regulated_topic_when_repo_fact_supplied` — a HIPAA-adjacent runtime question with supplied repo facts still returns `REPO_FACT`.
  • New `test_auto_answerer_skips_risky_fallback_for_safe_product_credential_questions` — feature questions about credentials remain unblocked through the existing safe-product allowlists.
  • Existing `test_auto_answerer_allows_safe_production_and_project_feature_questions` and the production-credential authority test continue to pass.

Validation

  • `UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/unit/auto/ tests/unit/mcp/ -q` → 998 passed
  • `UV_CACHE_DIR=/tmp/uv-cache uv run ruff check src/ouroboros/auto/answerer.py tests/unit/auto/test_ledger_grading_answerer.py` → passed
  • `UV_CACHE_DIR=/tmp/uv-cache uv run ruff format --check` (same files) → passed

Refs #640
Independent of #693 (provenance surface). Either PR can land first.

When a deterministic answer would land on CONSERVATIVE_DEFAULT or
ASSUMPTION for a high-risk topic that has no defensible generic
default, upgrade it to a BLOCKER instead of silently committing the
auto Seed to a fabricated stance.

Targeted topics:
- regulated personal data (PII, GDPR, HIPAA, SOX, PCI-DSS)
- destructive bulk schema/table operations (truncate/purge tables/schemas)

Existing safe-allowlists keep working: product-feature questions about
credentials/branches and the explicit `_blocker_for` patterns are
unchanged. REPO_FACT/USER_GOAL-backed answers also pass through
without gating.

Refs Q00#640
Contributor

@ouroboros-agent (bot) left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 6944525 for PR #695

Review record: 91c7e9e2-a3f6-4466-b71c-f40cebb2e5c2

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/auto/answerer.py:127 | BLOCKING | The new post-routing blocker turns any CONSERVATIVE_DEFAULT/ASSUMPTION answer into a hard stop whenever the question merely mentions HIPAA, GDPR, PII, etc. Because this runs after _is_feature_acceptance_question() and _is_verification_question(), benign prompts like "What acceptance criteria should the HIPAA worker satisfy?" or "Which command output verifies the GDPR export flow?" now return BLOCKER instead of the existing feature/verification guidance. That is a behavioral regression: these questions do not ask the model to decide regulated-data handling, but the broad keyword match in _RISKY_FALLBACK_PATTERNS treats them as if they do. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The change is directionally reasonable: blocking generic fallbacks for genuinely high-risk topics is safer than inventing defaults. The issue is that the current implementation keys off broad keywords after answer routing, so it catches safe meta-questions as well as the risky ones.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Address PR Q00#695 blocking review finding:

The post-routing gate keyed off broad keywords, so meta-questions like
"What acceptance criteria should the HIPAA worker satisfy?" or
"Which command output verifies the GDPR export flow?" were rejected
as if they asked for regulated-data handling decisions. They actually
hit the `_feature_acceptance_answer` / `_verification_answer` routes
which return safe templates regardless of subject keywords.

Restructure `answer()` so the gate only fires after generative routes
(actor/IO, runtime, product behavior, default).  Meta-routes
(non-goal listing, verification, feature acceptance) return early
without going through the gate.
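A sketch of the restructured control flow, assuming a hypothetical ordered route table in which each entry records whether it is a meta-route. `answer`, `Route`, and the `gate` callback are illustrative names, not the real signature of `answer()` in `answerer.py`.

```python
from typing import Callable, List, Tuple

# Hypothetical route entry: (predicate, answerer, is_meta_route).
Route = Tuple[Callable[[str], bool], Callable[[str], str], bool]


def answer(question: str, routes: List[Route], gate: Callable[[str, str], str]) -> str:
    """Route a question; only generative routes pass through the risky-fallback gate."""
    for matches, answerer, is_meta in routes:
        if not matches(question):
            continue
        candidate = answerer(question)
        # Meta-routes (non-goal listing, verification, feature acceptance) return
        # safe templates regardless of subject keywords, so they return early.
        if is_meta:
            return candidate
        # Generative routes (actor/IO, runtime, product behavior, default) are gated.
        return gate(question, candidate)
    raise LookupError(f"no route matched: {question!r}")
```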

Add regression coverage for HIPAA/GDPR/PII acceptance and verification
phrasings to ensure they keep returning CONSERVATIVE_DEFAULT answers.
Existing PII/HIPAA generative-route block tests continue to pass.

Refs Q00#640
Owner

@Q00 left a comment


Reviewed. The regulated-topic risky fallback gate is a policy boundary, so it needs fully green tests. Checks are still running, so I cannot approve yet.

@Q00 added the OS (Core engine, state machine, internal pipeline, and system-level behavior) and Safety (Risk, guardrail, policy, and regulated-topic behavior) labels May 7, 2026
Contributor

@ouroboros-agent (bot) left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 0dc7134 for PR #695

Review record: e629351a-e803-4b69-9b75-183cf6155272

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/auto/answerer.py:127 | BLOCKING | The new risky-fallback gate does not actually cover generic runtime answers. answer() routes "Which runtime should the HIPAA worker use?" into _runtime_answer(), and when no runtime_context fact is supplied that path returns AutoAnswerSource.EXISTING_CONVENTION, not one of _RISKY_FALLBACK_SOURCES. As a result, regulated runtime questions still get the generic “use the existing repository runtime...” answer instead of blocking, which contradicts the new policy and leaves a high-risk fallback path open. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The change is directionally sound: it separates explicit hard blockers from softer generic-answer routes. The main issue is that the enforcement key (answer.source) does not align with the documented route-level policy, so one fallback class still escapes the gate.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

Address PR Q00#695 follow-up: a regulated runtime question without a
supplied repo_fact, e.g. "Which runtime should the HIPAA worker use?",
routes through `_runtime_answer` and returns
`AutoAnswerSource.EXISTING_CONVENTION` with the generic
"use the existing repository runtime" template.  Because
EXISTING_CONVENTION was not in `_RISKY_FALLBACK_SOURCES`, that fallback
escaped the gate.

Add EXISTING_CONVENTION to the risky-fallback set.  REPO_FACT-backed
runtime answers (full `runtime_context` supplied) remain unaffected,
and the existing `does_not_block_regulated_topic_when_repo_fact_supplied`
test continues to assert that REPO_FACT answers pass through.
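A minimal sketch of why adding the source closes the hole, assuming a simplified `AutoAnswerSource` enum and a hypothetical `runtime_answer_source` helper standing in for `_runtime_answer`. It models only the source check; the regulated-topic match still has to fire before a blocker is produced.

```python
from enum import Enum
from typing import Optional


class AutoAnswerSource(Enum):
    # Hypothetical stand-in; member names follow the PR text.
    REPO_FACT = "repo_fact"
    EXISTING_CONVENTION = "existing_convention"
    CONSERVATIVE_DEFAULT = "conservative_default"
    ASSUMPTION = "assumption"


# EXISTING_CONVENTION joins the set, so the ungrounded runtime template is gated.
RISKY_FALLBACK_SOURCES = frozenset(
    {
        AutoAnswerSource.CONSERVATIVE_DEFAULT,
        AutoAnswerSource.ASSUMPTION,
        AutoAnswerSource.EXISTING_CONVENTION,
    }
)


def runtime_answer_source(runtime_context: Optional[dict]) -> AutoAnswerSource:
    """Hypothetical source selection for a runtime question."""
    if runtime_context:
        return AutoAnswerSource.REPO_FACT  # grounded in supplied repo facts
    return AutoAnswerSource.EXISTING_CONVENTION  # "use the existing repository runtime"


def runtime_answer_is_gated(runtime_context: Optional[dict]) -> bool:
    return runtime_answer_source(runtime_context) in RISKY_FALLBACK_SOURCES
```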

Add a regression test that covers the bot's exact case: HIPAA runtime
question with no supplied facts now blocks with reason
"regulated data handling".

Refs Q00#640
Contributor

@ouroboros-agent (bot) left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit bee8570 for PR #695

Review record: 4487f146-f834-41ad-91ef-de76a8de6793

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/auto/answerer.py:659 | BLOCKING | The new destructive-operation gate is too narrow and order-dependent. It only matches `truncate |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The routing change is localized and the added tests cover the intended happy paths, but the safety gate currently relies on brittle keyword regexes. For high-risk topics, that matcher needs broader verb coverage and symmetric phrasing support to be dependable.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

shaun0927 added 2 commits May 7, 2026 15:08
Address PR Q00#695 follow-up: the destructive-operation matcher only
caught ``verb ... noun`` phrasings such as ``purge tables``, so
reverse phrasings like ``Which tables should the migration truncate?``
slipped through. The verb vocabulary was also narrow.

Expand patterns:
- Verbs: ``truncate``, ``purge``, ``wipe`` plus tense variants
  (``truncates``/``truncating``/``truncated`` etc.).
- Nouns: ``table(s)``, ``schema(s)``, ``database(s)``, ``index/indexes/indices``,
  ``migration(s)``.
- Both verb-then-noun and noun-then-verb directions matched.
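One way to express the expanded vocabulary and both word orders in a single regex, as a sketch: `VERBS`, `NOUNS`, and `DESTRUCTIVE_BULK` are hypothetical names, and the patterns in the actual commit may be structured differently.

```python
import re

# Hypothetical reconstruction of the expanded vocabulary described above.
VERBS = r"(?:truncat(?:e|es|ed|ing)|purg(?:e|es|ed|ing)|wip(?:e|es|ed|ing))"
NOUNS = r"(?:tables?|schemas?|databases?|index(?:es)?|indices|migrations?)"

# Two alternatives: verb-then-noun ("purge tables") and noun-then-verb
# ("Which tables should the migration truncate?").
DESTRUCTIVE_BULK = re.compile(
    rf"\b{VERBS}\b.*\b{NOUNS}\b|\b{NOUNS}\b.*\b{VERBS}\b",
    re.IGNORECASE,
)
```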

Note: ``drop ... database`` remains owned by ``_blocker_for`` (its
existing branch fires first), and product-feature questions are still
exempted by the safe-product allowlists, so this does not over-gate
benign feature semantics.

Add a regression test exercising both phrasing directions across the
new verb/noun vocabulary.

Refs Q00#640
Owner

@Q00 left a comment


Reviewed across OS/UserLevel/Program boundaries, auto scope, and UX complexity. Approving: the risky-fallback gate narrows automation for regulated/destructive topics, which reduces scope risk rather than expanding it.

Contributor

@ouroboros-agent (bot) left a comment


Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 007e254 for PR #695

Review record: b2b2bb66-208d-4029-8a96-385ad5a6ea68

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/auto/answerer.py:659 | BLOCKING | The new destructive-bulk fallback gate still misses common destructive verbs such as drop and erase. After this change, questions like "Which tables should the migration drop?" or "Should we erase these schemas?" will still flow through to a generic auto answer instead of blocking, even though the older _blocker_for policy already treated those verbs as destructive in related contexts. The added test coverage at tests/unit/auto/test_ledger_grading_answerer.py:1069 only exercises truncate/purge/wipe, so this regression path remains untested. |

Recovery Notes

First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The routing change is directionally sound: it preserves explicit repo-fact answers while preventing generic fallbacks on higher-risk topics. The main weakness is that the new regex gate is now a separate policy surface from _blocker_for, so vocabulary drift between the two lists can leave obvious holes unless they are kept aligned.
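One way to address the drift concern, sketched under assumptions: build both the explicit-authority matcher and the fallback matcher from a single shared verb table. `DESTRUCTIVE_VERBS`, `AUTHORITY_DESTRUCTIVE`, and `FALLBACK_DESTRUCTIVE` are hypothetical names, and including `delete` in the fallback vocabulary is an illustrative choice. Listing inflections explicitly avoids the false positives a bare prefix match such as `drop\w*` would produce on words like "dropdown".

```python
import re

# Hypothetical shared vocabulary: both matchers are built from this one table,
# so adding a verb here updates the authority and fallback gates together.
DESTRUCTIVE_VERBS = {
    "drop": ("drop", "drops", "dropped", "dropping"),
    "delete": ("delete", "deletes", "deleted", "deleting"),
    "erase": ("erase", "erases", "erased", "erasing"),
    "wipe": ("wipe", "wipes", "wiped", "wiping"),
    "truncate": ("truncate", "truncates", "truncated", "truncating"),
    "purge": ("purge", "purges", "purged", "purging"),
}
_VERB = "(?:" + "|".join(f for forms in DESTRUCTIVE_VERBS.values() for f in forms) + ")"
_NOUN = r"(?:tables?|schemas?|databases?)"

# Stand-in for the explicit-authority branch ("drop ... database").
AUTHORITY_DESTRUCTIVE = re.compile(rf"\b{_VERB}\b.*\bdatabases?\b", re.IGNORECASE)

# Stand-in for the risky-fallback destructive-bulk gate, matching both word orders.
FALLBACK_DESTRUCTIVE = re.compile(
    rf"\b{_VERB}\b.*\b{_NOUN}\b|\b{_NOUN}\b.*\b{_VERB}\b", re.IGNORECASE
)
```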


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927
Collaborator Author

Cross-linking from #689 (closed as out-of-scope for core).

Surfacing a boundary concern, not asking for an immediate decision: this PR places the regulated-topic vocabulary (PII / GDPR / HIPAA / SOX / PCI-DSS) and destructive bulk-schema verbs (truncate / purge of tables / schemas) directly inside src/ouroboros/auto/answerer.py.

The surface — block instead of silently fabricating a CONSERVATIVE_DEFAULT for risky topics — fits core (safety boundary). What I am less sure about is the vocabulary living in core:

  • The set is team- and jurisdiction-specific (e.g. some teams care about KYC/AML; others care about FERPA; others care about export-control). A core PR that hard-codes one regional/regulatory subset will keep accreting siblings.
  • It is exactly the same pattern @Q00 closed #689 ("Add direct operational-task path for concrete PR and merge goals in ooo auto") for: a typed classifier with a hard-coded vocabulary in core, instead of "stable primitive + policy lookup."

Possible reframings (any of these would address the concern without losing the safety property):

  1. Keep the gate in core, move the vocabulary to a config/policy file the gate consults. Default config can ship the current PII/GDPR/HIPAA/SOX/PCI-DSS list so behavior is unchanged; teams can extend without patching core.
  2. Or: keep core's job as "any CONSERVATIVE_DEFAULT / ASSUMPTION for an answer marked sensitive becomes a BLOCKER" and let an external classifier decide what is sensitive. Core then has zero domain words.
  3. Or: explicitly accept this as a narrow temporary primitive (with a comment + a tracker for plugin-extraction) and freeze the vocabulary at exactly the current 5 + bulk-schema verbs.
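Option 1 could look roughly like the sketch below, assuming a JSON policy file with a hypothetical `regulated_terms` key; `load_regulated_terms` and `compile_terms` are illustrative names, not an existing ouroboros API. The shipped defaults keep behavior unchanged when no file is present.

```python
import json
import re
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Default vocabulary shipped with core, so behavior is unchanged without a config.
DEFAULT_REGULATED_TERMS = ("PII", "GDPR", "HIPAA", "SOX", "PCI-DSS")


def load_regulated_terms(config_path: Optional[Path]) -> Tuple[str, ...]:
    """Extend the defaults with terms from e.g. {"regulated_terms": ["KYC", "FERPA"]}."""
    terms = list(DEFAULT_REGULATED_TERMS)
    if config_path is not None and config_path.exists():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        terms.extend(data.get("regulated_terms", []))
    return tuple(dict.fromkeys(terms))  # de-duplicate while keeping order


def compile_terms(terms: Iterable[str]) -> "re.Pattern[str]":
    # re.escape keeps terms such as "PCI-DSS" literal inside the alternation.
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
```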

Happy to take any of those directions — flagging now so the boundary call is explicit before merge rather than litigated later.

@shaun0927
Collaborator Author

Following up on #725 ("Design a UserLevel plugin manager for operational workflows"): recommend HOLD on this PR until #725 v0 contract lands.

Reasons:

  1. The current diff places regulated-topic vocabulary (PII / GDPR / HIPAA / SOX / PCI-DSS) and destructive bulk-schema verbs directly inside src/ouroboros/auto/answerer.py. Per the #725 ("Design a UserLevel plugin manager for operational workflows") boundary, vocabulary belongs in a UserLevel skill, not in core auto.
  2. Merging now would create the same anti-pattern that got #689 ("Add direct operational-task path for concrete PR and merge goals in ooo auto") closed: core slowly accumulates domain words. After #725 v0 (RiskAssessor protocol + skill-owned config), this PR's intent can land as (a) a one-liner core change introducing RiskAssessor, plus (b) a default-shipping skill that owns the current vocabulary and can be replaced/extended without patching core.

The gate (block instead of fabricate when the answerer would otherwise return CONSERVATIVE_DEFAULT for a topic with no defensible default) is correct and should land in core. Only the vocabulary list and matching logic moves out.

Concrete suggested redesign (after #725 v0):

  • core: RiskAssessor protocol with one method assess(question, candidate_answer) -> RiskVerdict.
  • core: existing _blocker_for keeps its current behavior; new gate runs RiskAssessor after the deterministic answerer returns and converts a non-empty verdict to a BLOCKER.
  • skill: regulated-topics-default ships with the same PII/GDPR/HIPAA/SOX/PCI-DSS list as today — behavior unchanged for users who install nothing extra.
  • users: can install a different RiskAssessor-providing skill (e.g. regulated-topics-fintech, regulated-topics-healthcare) without forking core.
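A minimal sketch of the proposed contract using `typing.Protocol`. `RiskAssessor`, `RiskVerdict`, and `assess(question, candidate_answer)` come from the proposal above; the `reason` field, the rule that an empty verdict is falsy, `RegulatedTopicsDefault`, and `apply_gate` are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class RiskVerdict:
    reason: str = ""  # empty reason means "no risk found"

    def __bool__(self) -> bool:
        return bool(self.reason)


class RiskAssessor(Protocol):
    def assess(self, question: str, candidate_answer: str) -> RiskVerdict: ...


class RegulatedTopicsDefault:
    """Hypothetical stand-in for the regulated-topics-default skill."""

    _pattern = re.compile(r"\b(?:pii|gdpr|hipaa|sox|pci-dss)\b", re.IGNORECASE)

    def assess(self, question: str, candidate_answer: str) -> RiskVerdict:
        if self._pattern.search(question):
            return RiskVerdict("regulated data handling")
        return RiskVerdict()


def apply_gate(
    question: str, candidate_answer: str, assessors: Sequence[RiskAssessor]
) -> Optional[str]:
    """Core gate: return a blocker reason for the first non-empty verdict, else None."""
    for assessor in assessors:
        verdict = assessor.assess(question, candidate_answer)
        if verdict:
            return verdict.reason
    return None
```

Core then holds no domain words: with no assessors installed the gate never fires, and swapping in a different skill changes the vocabulary without touching `apply_gate`.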

Happy to split this PR into the core gate + the default skill once the v0 manifest lands. Until then, holding so the contract isn't pre-committed to.

Address PR Q00#695 follow-up: phrasings like "Which tables should the
migration drop?" or "Should we erase these schemas before re-seeding?"
still flowed through to a generic auto answer because ``drop`` and
``erase`` were missing from `_DESTRUCTIVE_BULK_VERBS`.

`_blocker_for` already handles ``drop|delete|erase|wipe ... database``
at the explicit-authority layer, so this matcher and the existing
allow/deny list both treat the same families consistently. The
risky-fallback gate runs after `_blocker_for`, so explicit
authority-style prompts continue to block via the original code path.

Add ``drop`` and ``erase`` (with tense variants) and extend the
regression test to assert both verb-then-noun and noun-then-verb
phrasings using the new vocabulary.

Refs Q00#640
@shaun0927
Collaborator Author

Closing this PR per the boundary established in #689 / #725 v0 discussion.

The intent is correct (block instead of fabricate for risky topics) and will return after #725 v0 lands as two cleaner pieces:

  1. core RiskAssessor protocol + the gate semantics (small, generic),
  2. a default-shipping regulated-topics-default skill carrying the current PII / GDPR / HIPAA / SOX / PCI-DSS vocabulary.

That split keeps the safety property (block instead of silently fabricate) without committing core to a specific regional/regulatory vocabulary.

Tests in this PR (test_auto_answerer_blocks_regulated_data_questions_instead_of_falling_back, ..._does_not_block_regulated_topic_when_repo_fact_supplied, ..._skips_risky_fallback_for_safe_product_credential_questions) will be ported to the post-v0 implementation.

Refs #725.

@shaun0927 closed this May 7, 2026
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 7, 2026
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 7, 2026
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 7, 2026
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 7, 2026
shaun0927 added a commit that referenced this pull request May 7, 2026
…pics (#640) (#738)

* feat(auto): block risky-fallback answers for regulated topics

When a deterministic answer would land on CONSERVATIVE_DEFAULT or
ASSUMPTION for a high-risk topic that has no defensible generic
default, upgrade it to a BLOCKER instead of silently committing the
auto Seed to a fabricated stance.

Targeted topics:
- regulated personal data (PII, GDPR, HIPAA, SOX, PCI-DSS)
- destructive bulk schema/table operations (truncate/purge tables/schemas)

Existing safe-allowlists keep working: product-feature questions about
credentials/branches and the explicit `_blocker_for` patterns are
unchanged. REPO_FACT/USER_GOAL-backed answers also pass through
without gating.

Refs #640

* fix(auto): scope risky-fallback gate to generative answer routes only

Address PR #695 blocking review finding:

The post-routing gate keyed off broad keywords, so meta-questions like
"What acceptance criteria should the HIPAA worker satisfy?" or
"Which command output verifies the GDPR export flow?" were rejected
as if they asked for regulated-data handling decisions. They actually
hit the `_feature_acceptance_answer` / `_verification_answer` routes
which return safe templates regardless of subject keywords.

Restructure `answer()` so the gate only fires after generative routes
(actor/IO, runtime, product behavior, default).  Meta-routes
(non-goal listing, verification, feature acceptance) return early
without going through the gate.

Add regression coverage for HIPAA/GDPR/PII acceptance and verification
phrasings to ensure they keep returning CONSERVATIVE_DEFAULT answers.
Existing PII/HIPAA generative-route block tests continue to pass.

Refs #640

* fix(auto): include EXISTING_CONVENTION runtime fallback in risky gate

Address PR #695 follow-up: a regulated runtime question without a
supplied repo_fact, e.g. "Which runtime should the HIPAA worker use?",
routes through `_runtime_answer` and returns
`AutoAnswerSource.EXISTING_CONVENTION` with the generic
"use the existing repository runtime" template.  Because
EXISTING_CONVENTION was not in `_RISKY_FALLBACK_SOURCES`, that fallback
escaped the gate.

Add EXISTING_CONVENTION to the risky-fallback set.  REPO_FACT-backed
runtime answers (full `runtime_context` supplied) remain unaffected,
and the existing `does_not_block_regulated_topic_when_repo_fact_supplied`
test continues to assert that REPO_FACT answers pass through.

Add a regression test that covers the bot's exact case: HIPAA runtime
question with no supplied facts now blocks with reason
"regulated data handling".

Refs #640

* fix(auto): broaden destructive-bulk patterns and cover reverse phrasing

Address PR #695 follow-up: the destructive-operation matcher only
caught ``verb ... noun`` phrasings such as ``purge tables``, so
reverse phrasings like ``Which tables should the migration truncate?``
slipped through. The verb vocabulary was also narrow.

Expand patterns:
- Verbs: ``truncate``, ``purge``, ``wipe`` plus tense variants
  (``truncates``/``truncating``/``truncated`` etc.).
- Nouns: ``table(s)``, ``schema(s)``, ``database(s)``, ``index/indexes/indices``,
  ``migration(s)``.
- Both verb-then-noun and noun-then-verb directions matched.

Note: ``drop ... database`` remains owned by ``_blocker_for`` (its
existing branch fires first), and product-feature questions are still
exempted by the safe-product allowlists, so this does not over-gate
benign feature semantics.

Add a regression test exercising both phrasing directions across the
new verb/noun vocabulary.

Refs #640

* chore: drop stray local debug artifact accidentally committed in previous fix

* fix(auto): add drop and erase to destructive-bulk verb vocabulary

Address PR #695 follow-up: phrasings like "Which tables should the
migration drop?" or "Should we erase these schemas before re-seeding?"
still flowed through to a generic auto answer because ``drop`` and
``erase`` were missing from `_DESTRUCTIVE_BULK_VERBS`.

`_blocker_for` already handles ``drop|delete|erase|wipe ... database``
at the explicit-authority layer, so this matcher and the existing
allow/deny list both treat the same families consistently. The
risky-fallback gate runs after `_blocker_for`, so explicit
authority-style prompts continue to block via the original code path.

Add ``drop`` and ``erase`` (with tense variants) and extend the
regression test to assert both verb-then-noun and noun-then-verb
phrasings using the new vocabulary.

Refs #640

* fix(auto): require schema/data context for destructive-bulk gate (#640)

Add _DESTRUCTIVE_BULK_NON_DATA_QUALIFIERS to exempt verb/noun pairs that
appear with a non-data artefact qualifier (release plan, docs, roadmap,
etc.) from the destructive-bulk blocker.  Also extend _DESTRUCTIVE_BULK_NOUNS
with record/row/audit-log/audit-trail strong data-object nouns.

Addresses ouroboros-agent[bot] follow-up warning on #738: bare
``migration`` + ``drop`` and ``index`` + ``drop`` questions about release
plans or documentation were overblocked.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* fix(auto): allow product-semantics questions through risky-fallback gate (#640)

Add _is_safe_product_regulated_question() allowlist that passes through
bounded product-behavior questions mentioning regulated nouns (PII/GDPR/
HIPAA/SOX/PCI-DSS) when paired with a product-semantics verb (export,
download, display, show, view, access, …) and NOT a compliance-policy verb
(store, handle, retain, collect, encrypt, …).

Questions like "Should the app export PII reports?" or "Should users be
able to download GDPR exports?" are feature-level requirements and must
not be blocked; questions asking how to store/handle/retain regulated data
still block as before.

Addresses ouroboros-agent[bot] BLOCKING on #738 (answerer.py:716).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* fix(auto): route safe regulated-product questions through product-behavior answerer (#640)

Extend _is_product_behavior_question() with a new arm covering the
product-semantics verbs used by _is_safe_product_regulated_question()
that were not previously matched (download, allow, expose, render, enable,
support) and the "be able to <verb>" phrasing gap for view/access.

Previously, questions allowed past the risky-fallback gate (e.g.
"Should users be able to download GDPR exports?") fell through to
_default_answer(), producing a generic conservative-MVP ledger entry
that silently discarded the regulated-product feature semantics.  With
this fix the router at answerer.py:122 sends those questions to
_product_behavior_answer(), which writes subject-specific
constraints.behavior.* and acceptance.behavior.* ledger entries that
preserve the requested feature in the Seed contract.

New test: test_auto_answerer_routes_safe_regulated_product_questions_to_product_behavior_answerer
asserts blocker=None, source=CONSERVATIVE_DEFAULT, subject-specific
ledger keys (not conservative_mvp), and regulated noun present in
answer text/ledger entries for all three bot example questions.

Addresses ouroboros-agent[bot] BLOCKING on #738 (answerer.py:741).

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

* fix(auto): phrase-scope destructive-bulk non-data qualifier (#640)

Previously the destructive-bulk exemption matched on bare tokens such as
``documentation`` or ``release plan`` anywhere in the sentence, which let
real destructive operations slip past the gate when the question merely
*referenced* documentation as an authority (e.g. "Which tables should we
drop according to the documentation before redeploying?").

The qualifier is now strictly phrase-scoped to ``from the …`` so the
exemption fires only when the artefact is the explicit object of the
drop/wipe — the phrasing that signals "remove an entry from a process
artefact" rather than "delete data from a system". Authority/reference
phrasings ("according to the documentation", "per the release plan",
"in the documentation example") no longer suppress the gate.

Existing pass-through tests still hold:
- "Which migration should we drop from the release plan?" → not blocked
- "Which indexes should we drop from the docs?" → not blocked

New regression test locks the safety boundary:
- "Which tables should we drop according to the documentation …" → BLOCKER
- "Which tables should we drop per the release plan?" → BLOCKER
- "Per the documentation, which audit logs should we purge?" → BLOCKER
- "According to the docs, which tables should we drop?" → BLOCKER
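The phrase-scoped exemption can be sketched as two regexes, assuming a simplified noun/verb vocabulary. The artefact list is borrowed from the follow-up commit's list; `NON_DATA_QUALIFIER`, `DESTRUCTIVE`, and `blocks` are hypothetical names, not the real `_DESTRUCTIVE_BULK_NON_DATA_QUALIFIERS`.

```python
import re

_ARTEFACTS = r"(?:release plan|docs|documentation|plan|roadmap|backlog|changelog|spec)"

# Exempt only when the artefact is the explicit source of the removal
# ("drop X from the docs"), not when it is cited as an authority.
NON_DATA_QUALIFIER = re.compile(rf"\bfrom the {_ARTEFACTS}\b", re.IGNORECASE)

_NOUNS = r"(?:tables?|indexes|migrations?|audit logs?)"
_VERBS = r"(?:drop|purge|wipe)"
DESTRUCTIVE = re.compile(
    rf"\b{_VERBS}\b.*\b{_NOUNS}\b|\b{_NOUNS}\b.*\b{_VERBS}\b", re.IGNORECASE
)


def blocks(question: str) -> bool:
    """True when a destructive verb/noun pair appears without a 'from the <artefact>' scope."""
    return bool(DESTRUCTIVE.search(question)) and not NON_DATA_QUALIFIER.search(question)
```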

Ref: ouroboros-agent[bot] BLOCKING on PR #738 — answerer.py:688.
68 tests passing in test_ledger_grading_answerer.py (337 in tests/unit/auto). Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auto): widen artefact qualifier and trust adjectival compliance verbs (#640)

Two BLOCKING regressions raised by ouroboros-agent[bot] on the previous fix
commit (fc11788):

1) **answerer.py:698** — destructive-bulk exemption only matched ``from the …``
   so safe process-artefact phrasings such as ``Which indexes should we drop
   in the docs?`` and ``Which migration should we drop in the roadmap?`` were
   still mis-blocked as data destruction. The qualifier now also accepts
   ``in the …`` for the same artefact list (``release plan``, ``docs``,
   ``documentation``, ``plan``, ``roadmap``, ``backlog``, ``changelog``,
   ``spec``).  Authority/reference phrasings (``according to the docs``,
   ``per the release plan``) still do not match the qualifier and remain
   blocked, locked in by an expanded regression test.

2) **answerer.py:782** — ``_is_safe_product_regulated_question()`` rejected any
   question containing a compliance-policy verb (``store``, ``handle``,
   ``encrypt``, ``share``, …) anywhere in the sentence. That over-blocked
   legitimate product-behavior questions where the compliance verb appeared as
   a past-participle adjective modifying the noun, e.g.

     Should admins be able to view stored PII fields?
     Should the dashboard display encrypted HIPAA files?

   In both, the main verb is product-semantics (``view`` / ``display``); the
   compliance verb is adjectival. The allowlist now requires (a) a regulated
   noun, (b) a product-question modal, and (c) a product-semantics verb. Pure
   compliance-policy phrasings (``How should the system handle GDPR data
   retention?``, ``What PII should the system collect?``) lack a
   product-semantics verb and remain blocked — covered by the existing
   ``test_auto_answerer_still_blocks_compliance_policy_regulated_questions``.

   The previously defined ``_COMPLIANCE_POLICY_VERBS_RE`` constant is now
   unused and has been removed to avoid dead code.
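The three-part allowlist can be sketched with one regex check per condition. This is a minimal sketch. The pattern names, the modal set, and the helper are hypothetical. The verb list mirrors the router verb list enumerated later in this PR.

```python
import re

# Hypothetical patterns; the real constants in answerer.py may differ.
_REGULATED_NOUN_RE = re.compile(
    r"\b(?:pii|personally identifiable information|gdpr|hipaa|sox|pci-dss)\b",
    re.IGNORECASE,
)
_PRODUCT_MODAL_RE = re.compile(r"\b(?:should|can|could)\b", re.IGNORECASE)
_PRODUCT_VERB_RE = re.compile(
    r"\b(?:export|download|render|display|show|expose|support|enable|allow|view|access)\b",
    re.IGNORECASE,
)


def is_safe_product_regulated_question(question: str) -> bool:
    # All three conditions must hold. Compliance verbs are not inspected here,
    # so adjectival participles such as "stored" or "encrypted" are tolerated.
    return bool(
        _REGULATED_NOUN_RE.search(question)
        and _PRODUCT_MODAL_RE.search(question)
        and _PRODUCT_VERB_RE.search(question)
    )
```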

New regression coverage:
- ``test_auto_answerer_allows_in_the_artefact_drop_questions`` — locks in
  ``in the docs/roadmap/release plan/changelog`` exemption.
- ``test_auto_answerer_allows_product_questions_with_adjectival_compliance_verbs``
  — locks in ``view stored PII``, ``display encrypted HIPAA files``,
  ``download retained GDPR exports``, etc.

Existing safety tests (compliance-policy questions, authority-reference
phrasings) all continue to block.

339 unit tests passing in tests/unit/auto. Ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auto): block mixed-intent regulated questions via active-verb precedence (#640)

Bot follow-up on commit e846a47: the regulated-product allowlist was too
permissive. Mixed-intent questions that pair a compliance-policy verb with a
product-semantics verb — e.g.

    How should the system store and display HIPAA files?
    Should we retain and export PII records?

still ask the auto pipeline to decide regulated-data handling and must remain
blocked, even though they also mention a product verb.

The fix adds an explicit precedence rule: an *active*-form compliance-policy
verb (``store`` / ``stores`` / ``storing``, ``retain`` / ``retains`` /
``retaining``, ``encrypt``, ``handle``, ``collect``, ``share``, ``transmit``,
``disclose``, ``process``, ``manage``, ``govern``) blocks the question even if
a product-semantics verb is also present.

Past-participle forms (``stored``, ``encrypted``, ``retained``, ``collected``,
``shared``, …) are intentionally excluded from the negative list because they
act adjectivally on the regulated noun (``view stored PII``, ``display
encrypted HIPAA files``); the main verb of those sentences is the
product-semantics one and the question is product-behavior over already-
existing regulated data, not a compliance-policy decision.
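One way to express the precedence rule is a regex that lists only active inflections, so past participles are never matched. This is an assumed reconstruction: the pattern name, the exact inflections, and the helper are hypothetical, not the code in `answerer.py`.

```python
import re

# Hypothetical: active inflections only. "store"/"stores"/"storing" match,
# while "stored", "encrypted", "retained", "shared", etc. do not.
_ACTIVE_COMPLIANCE_VERB_RE = re.compile(
    r"\b(?:stor(?:e|es|ing)|retain(?:s|ing)?|encrypt(?:s|ing)?"
    r"|handl(?:e|es|ing)|collect(?:s|ing)?|shar(?:e|es|ing)"
    r"|transmit(?:s|ting)?|disclos(?:e|es|ing)|process(?:es|ing)?"
    r"|manag(?:e|es|ing)|govern(?:s|ing)?)\b",
    re.IGNORECASE,
)


def has_active_compliance_verb(question: str) -> bool:
    """True when the question asks the pipeline to decide data handling."""
    return bool(_ACTIVE_COMPLIANCE_VERB_RE.search(question))
```

In the allowlist, this check would run first. When it matches, the allowlist returns "not safe" before the noun, modal, and product-verb conditions are consulted.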

New regression test ``test_auto_answerer_blocks_mixed_intent_regulated_questions``
locks the precedence rule on the bot's own examples plus three more variants
covering ``encrypt`` / ``share`` / ``collect``. Existing positive tests
(adjectival compliance verbs, pure product semantics) and existing negative
tests (pure compliance phrasings) all continue to pass.

340 unit tests passing in tests/unit/auto. Ruff clean.

Ref: ouroboros-agent[bot] BLOCKING on #738 — ``answerer.py:750``.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auto): align router verb list with regulated allowlist (#640)

Bot follow-up on commit 7ed761c: ``_PRODUCT_SEMANTICS_REGULATED_VERBS_RE``
allows ``export`` and ``show`` (and the rest of the safe-allowlist set), but
the explicit alignment branch in ``_is_product_behavior_question()`` only
listed a subset (``download/allow/expose/render/enable/support/view/access``).
``export`` / ``show`` / ``display`` were still matched by an earlier broader
pattern in the same function, but the visible alignment was incomplete and
prone to silent drift.

The router branch added to bridge ``_is_safe_product_regulated_question()``
into ``_is_product_behavior_question()`` now lists every verb in the
allowlist:

    export | download | render | display | show | expose | support |
    enable | allow | view | access

This is a no-op for already-routed verbs but makes the allowlist↔router
contract explicit and grep-checkable, eliminating the drift surface flagged in
the bot's design note.
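Another way to make the contract checkable is to have both sides read from a single tuple and compare them in a test. This is a sketch under that assumption. `SAFE_PRODUCT_VERBS`, `SAFE_PRODUCT_VERBS_RE`, and `missing_router_verbs` are hypothetical names, not identifiers from this PR.

```python
import re
from collections.abc import Iterable

# Hypothetical single source of truth for the allowlist/router verb contract.
SAFE_PRODUCT_VERBS = (
    "export", "download", "render", "display", "show", "expose",
    "support", "enable", "allow", "view", "access",
)

# Allowlist regex built from the tuple, so the two cannot drift apart.
SAFE_PRODUCT_VERBS_RE = re.compile(
    r"\b(?:" + "|".join(SAFE_PRODUCT_VERBS) + r")\b", re.IGNORECASE
)


def missing_router_verbs(router_verbs: Iterable[str]) -> list[str]:
    """Return allowlist verbs that a router branch does not list."""
    return sorted(set(SAFE_PRODUCT_VERBS) - set(router_verbs))
```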

Test changes: ``test_auto_answerer_routes_safe_regulated_product_questions_to_product_behavior_answerer``
now exercises every verb in the allowlist (export, show, display, render,
expose, support, enable in addition to download/view/access). Each case
asserts (a) the gate passes (``answer.blocker is None``), (b) the router
takes the product-behavior path (``constraints.behavior.*`` and
``acceptance.behavior.*`` ledger keys, not the generic
``constraints.conservative_mvp`` from ``_default_answer()``), and (c) the
regulated noun is preserved in the answer text or ledger value.

340 unit tests passing in tests/unit/auto. Ruff clean.

Ref: ouroboros-agent[bot] BLOCKING on #738 — ``answerer.py:548``.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auto): route regulated-product questions before IO/runtime branches (#640)

Bot follow-up on commit 52ef5ab: ``_is_safe_product_regulated_question()``
suppressed the risky-fallback blocker for any regulated-noun + product-verb
combination, but the router checked ``_is_actor_or_io_question`` and
``_is_runtime_context_question`` *before* ``_is_product_behavior_question``,
so prompts like

    What inputs should the GDPR export take?
    Which runtime should the GDPR export use?

got a generic IO/runtime answer (``ASSUMPTION`` / ``EXISTING_CONVENTION``)
and then bypassed the blocker via the safe-allowlist — silently dropping the
regulated-feature semantics from the ledger.

Fix: pull ``_is_safe_product_regulated_question`` to the top of the
content-routing chain so any regulated-product question — IO-shaped,
runtime-shaped, or product-shaped — is dispatched to
``_product_behavior_answer()``. The risky-fallback gate at the tail of
``answer()`` already consults the same predicate, so the router and the
safe-allowlist now share a single answer path.

Pure compliance phrasings remain blocked unchanged: they fail the
allowlist (no product-semantics verb) and fall through to the previous
branches, where the risky-fallback gate fires for any
``CONSERVATIVE_DEFAULT`` / ``ASSUMPTION`` / ``EXISTING_CONVENTION`` source.

New regression test ``test_auto_answerer_routes_regulated_product_questions_before_io_or_runtime``
locks in:
- Bot's example "What inputs should the GDPR export take?"
- Bot's example "Which runtime should the GDPR export use?"
- Two adjacent IO/runtime regulated-product variants

Each case asserts (a) not blocked, (b) answer comes from
``_product_behavior_answer()`` (subject-specific ``behavior.*`` ledger keys,
no IO/runtime keys), and (c) the regulated noun is preserved in the answer
text or ledger value.

341 unit tests passing in tests/unit/auto. Ruff clean. The two failures in
tests/unit/orchestrator/test_codex_cli_runtime.py are pre-existing and
unrelated to this PR's scope (verified by stashing the patch).

Ref: ouroboros-agent[bot] BLOCKING on #738 — ``answerer.py:837``.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auto): preserve grounded REPO_FACT for regulated-runtime questions (#640)

Bot follow-up on commit 44e405f: the unconditional early route to
``_product_behavior_answer()`` for any question that
``_is_safe_product_regulated_question()`` recognised broke the existing
runtime contract. With a supplied ``runtime_context`` repo fact, a question
like ``Which runtime should the GDPR export use?`` should return a
``REPO_FACT`` runtime answer carrying the grounded evidence; the early
route replaced it with a generic product-behavior entry, dropping the
evidence.

Restructure: keep the original IO/runtime/product/default order so grounded
``REPO_FACT`` answers stay on the runtime path, then re-route to
``_product_behavior_answer()`` only when the chosen route produced a
non-grounded fallback (``ASSUMPTION`` / ``EXISTING_CONVENTION`` /
``CONSERVATIVE_DEFAULT``) AND the safe-allowlist recognises the question
as regulated-product. Concretely:

- ``Which runtime should the GDPR export use?`` + REPO_FACT → REPO_FACT
  runtime answer (preserved, with evidence).
- ``Which runtime should the GDPR export use?`` without repo facts →
  EXISTING_CONVENTION runtime fallback re-routed through
  ``_product_behavior_answer()`` so the regulated-feature semantics are
  preserved in ``constraints.behavior.*`` / ``acceptance.behavior.*``.
- ``What inputs should the GDPR export take?`` → IO ASSUMPTION
  re-routed to ``_product_behavior_answer()``.
- ``Should the app export PII reports?`` → already routes through
  ``_product_behavior_answer()`` (CONSERVATIVE_DEFAULT) and is left
  untouched by the reroute.
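The re-route step can be sketched as a post-processing function. It assumes a simplified `Answer` record with a `route` field. The `Source` values, the route strings, and `reroute_regulated_fallback` are hypothetical stand-ins for the real provenance taxonomy and answerer internals.

```python
from collections.abc import Callable
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    # Hypothetical subset of the provenance taxonomy.
    REPO_FACT = "repo_fact"
    ASSUMPTION = "assumption"
    EXISTING_CONVENTION = "existing_convention"
    CONSERVATIVE_DEFAULT = "conservative_default"


# Sources without grounded evidence; only these are eligible for re-routing.
NON_GROUNDED = frozenset(
    {Source.ASSUMPTION, Source.EXISTING_CONVENTION, Source.CONSERVATIVE_DEFAULT}
)


@dataclass(frozen=True)
class Answer:
    text: str
    source: Source
    route: str  # hypothetical labels: "io", "runtime", "product_behavior", ...


def reroute_regulated_fallback(
    question: str,
    answer: Answer,
    is_safe_product_regulated: Callable[[str], bool],
    product_behavior_answer: Callable[[str], Answer],
) -> Answer:
    """Send a non-grounded fallback for a regulated-product question to the
    product-behavior path; leave everything else untouched."""
    if answer.source not in NON_GROUNDED:
        return answer  # e.g. REPO_FACT keeps its route and evidence
    if answer.route == "product_behavior":
        return answer  # already on the product-behavior path
    if is_safe_product_regulated(question):
        return product_behavior_answer(question)
    return answer  # left for the risky-fallback gate to block
```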

Pure compliance phrasings still block: they fail the allowlist (no
product-semantics verb), keep their CONSERVATIVE_DEFAULT/ASSUMPTION/
EXISTING_CONVENTION source, and the risky-fallback gate fires for them.

New regression test
``test_auto_answerer_preserves_repo_fact_for_regulated_runtime_question``
locks the REPO_FACT preservation contract: with a runtime_context repo
fact supplied, the answer must be REPO_FACT, must contain the supplied
runtime text, and must carry a runtime_context ledger entry with the
supplied evidence.

342 unit tests passing in tests/unit/auto. Ruff clean.

Ref: ouroboros-agent[bot] BLOCKING on #738 — ``answerer.py:126``.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* chore: drop stray empty .ouroboros_eval_artifact.md committed by mistake

The previous commit (046ce3d) accidentally included an empty local debug
artifact via ``git add -A``. Removing it; not part of the PR scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auto): tighten bare-scope and ambiguous-artefact regex (#640)

Two BLOCKING items raised by ouroboros-agent[bot] on commit a13fd6c:

(1) ``answerer.py:793`` — ``_is_safe_product_regulated_question()`` allowed
    "compliance-scope-as-feature-flag" prompts (``Should the platform support
    HIPAA?``, ``Should the app enable GDPR?``, ``Should the system allow
    PII?``) to bypass the blocker, even though those frame the entire
    regulatory regime as a binary toggle and are compliance-policy decisions.

    Fix: a new ``_BARE_COMPLIANCE_SCOPE_RE`` rejects ``support|enable|allow``
    + bare regulated noun followed by no qualifying feature noun (negative
    lookahead ``(?!\s+[a-z])``). Concrete-feature variants ("HIPAA audit
    logs", "GDPR consent banners", "PII redaction in exports", "GDPR data")
    have a qualifying noun and still pass through.

(2) ``answerer.py:718`` — the destructive-bulk artefact qualifier listed
    standalone ``doc`` and ``plan`` tokens. ``from the doc`` is rare phrasing
    (questions normally say ``docs`` or ``documentation``), and bare ``plan``
    collides with database-side meanings (query plan, execution plan, db
    plan), so a question like "Which tables should we drop from the plan?"
    was being exempted as a process-artefact edit.

    Fix: drop ``doc`` and ``plan`` (singular) from the artefact list. The
    remaining unambiguous artefacts are ``release plan``, ``docs``,
    ``documentation``, ``roadmap``, ``backlog``, ``changelog``, ``spec``.
    All existing positive tests already use these unambiguous variants.
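The bare-scope check can be sketched with the negative lookahead described in item (1). The regulated-noun alternation is taken from the PR summary. The pattern body and the helper name are hypothetical reconstructions of ``_BARE_COMPLIANCE_SCOPE_RE``, not the shipped regex.

```python
import re

# Hypothetical reconstruction. The lookahead rejects the match when another
# word follows the regulated noun, i.e. when a feature noun qualifies it.
# With IGNORECASE, [a-z] also matches uppercase letters.
_BARE_SCOPE_RE = re.compile(
    r"\b(?:support|enable|allow)\s+"
    r"(?:pii|personally identifiable information|gdpr|hipaa|sox|pci-dss)\b"
    r"(?!\s+[a-z])",
    re.IGNORECASE,
)


def is_bare_compliance_scope(question: str) -> bool:
    """True for 'toggle the whole regime' prompts such as 'support HIPAA?'."""
    return bool(_BARE_SCOPE_RE.search(question))
```

The lookahead only inspects the next word. In this sketch, any trailing word (for example ``enable GDPR for EU users``) therefore counts as a qualifier. The real regex may be stricter.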

New regression coverage:
- ``test_auto_answerer_blocks_bare_compliance_scope_questions`` — locks
  rejection of bare ``support|enable|allow + regulated noun`` for all five
  regulated-noun variants.
- ``test_auto_answerer_allows_qualified_compliance_scope_questions`` —
  locks pass-through of ``support HIPAA audit logs`` /
  ``enable GDPR consent banners`` / ``allow PII redaction`` / etc.
- ``test_auto_answerer_blocks_destructive_bulk_with_ambiguous_singular_tokens``
  — locks blocker for ``from the plan`` / ``in the plan`` / ``from the doc``
  destructive prompts.

345 unit tests passing in tests/unit/auto. Ruff clean.

Ref: ouroboros-agent[bot] BLOCKING on #738 — ``answerer.py:793`` and
``answerer.py:718``.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Claude Sonnet 4.6 <[email protected]>

Labels

OS: Core engine, state machine, internal pipeline, and system-level behavior
Safety: Risk, guardrail, policy, and regulated-topic behavior

2 participants