Commit 99e8d14

dennys246 and claude committed
feat(v1): confound_quarantine — opt-in scaffold disable flags + phase metrics
Adds the four scaffold-disable flags + report block from docs/plans/confound_quarantine.md so the V1 substrate-attribution phased re-run can run a substrate-only baseline (Phase A) and attribute the V1 cross-session recall result to a specific contributor. Defaults preserve current behavior — every flag is opt-in disable.

New env vars (all CC4 experimental):
- MAXIM_DISABLE_PFC_PREAMBLE — skip the ~1k-token deliberation scaffold in the system prompt.
- MAXIM_DISABLE_ACTING_COACH — skip Acting Coach + embodied-identity rewrite. Same as --no-acting-coach.
- MAXIM_DISABLE_SIM_SANDBOX_TEXT — skip the SIMULATION ENVIRONMENT block in the system prompt.
- MAXIM_NO_DEFAULT_PERSONA — treat absent --persona as None instead of the adversarial fallback. Same as --no-persona.
- MAXIM_V1_PHASE — telemetry only; recorded into the new confound_quarantine block in report.json.

Wiring:
- src/maxim/runtime/confound_flags.py — single source of truth for helper functions (pfc_preamble_enabled, acting_coach_enabled, sim_sandbox_text_enabled, default_persona_enabled) + ALL_FLAGS tuple consumed by the autouse scrub.
- prompt_builder gates: PFC preamble, sandbox text, embodied identity, acting coach section. Sandbox text extracted to module constant SIMULATION_ENVIRONMENT_TEXT so the report block's token estimate imports the same string (no drift).
- cli_parser: --no-acting-coach + --no-persona flags.
- cli.py: env-var propagation after parse_args; _resolve_persona helper consumed at four dispatch sites (legacy agent, generative campaign, two benchmark paths) so the persona path matches the env flag at every call site.
- orchestrator: gates Acting Coach attachment on acting_coach_enabled(); entity_spec injection deliberately stays un-gated (factual entity description, not behavioural framing).
- personas: "neutral" persona registry entry (empty context_prompt, initiative=0); get_persona() skips EARLY_FINISH_GUIDANCE for neutral but still appends CONTINUOUS_SUFFIX (procedural invariant).
- simulation/report.py: confound_quarantine block on SimulationReport. Token-count estimate uses the live router's counter or 4-char/token fallback. Imports the static templates at function scope WITHOUT swallowing ImportError so a future refactor that renames PFC_PREAMBLE / SIMULATION_ENVIRONMENT_TEXT fails loudly.

Tests:
- tests/conftest.py: autouse scrub fixture iterates ALL_FLAGS + MAXIM_V1_PHASE so adding a flag in confound_flags.py auto-scrubs it. Per CLAUDE.md feedback_opt_in_env_in_hot_paths.md.
- tests/unit/test_confound_flags.py: 30 pin tests covering each gate. Includes a structural test that greps confound_flags.py for every *_enabled() helper and asserts the env-var name appears in ALL_FLAGS — catches a forgotten registration before it leaks.
- tests/integration/test_v1_phased_metrics.py: 5 metric-shape tests asserting the report block populates correctly under Phase A (all flags set, zero token counts, persona_active=None) and Phase G (all flags unset, positive token counts, persona name preserved).
- tests/unit/test_simulation_agent.py: persona-count assertions updated for the new "neutral" entry.

Pre-merge review: two-lens parallel review (Executor + Architecture) ran before commit; folded six findings: F1 (persona propagation covered all dispatch sites), F2/F6 (narrow exception handling, fail loudly on rename), F3 (extract sandbox-text constant), F4 (neutral + continuous still appends CONTINUOUS_SUFFIX), F8 (MAXIM_V1_PHASE scrub), F11 (structural pin test for ALL_FLAGS coverage).

Out of scope per task brief: scripts/run_v1_phases.sh harness, actual V1 phased re-run, docs/experiments/ writeups. Follows in next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
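
The token-count rule described for simulation/report.py ("live router's counter or 4-char/token fallback") can be illustrated with a small sketch. This is not the code from the commit: `estimate_tokens` and its `counter` parameter are hypothetical names, and rounding up is an assumption, since the message does not say how the fallback rounds.

```python
from __future__ import annotations

import math
from typing import Callable, Optional


def estimate_tokens(text: str, counter: Optional[Callable[[str], int]] = None) -> int:
    """Hypothetical helper: prefer a real token counter, else ~4 chars per token."""
    if not text:
        # A disabled (empty) scaffold contributes zero tokens.
        return 0
    if counter is not None:
        # A live counter, when available, is taken as authoritative.
        return counter(text)
    # Assumption: round up so any non-empty text counts as at least one token.
    return math.ceil(len(text) / 4)
```

Under this sketch, a Phase A run (scaffold text suppressed, so the input is empty) reports zero, which matches the "zero token counts" expectation stated for the integration tests.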
1 parent c525dab commit 99e8d14

12 files changed

Lines changed: 896 additions & 20 deletions

docs/user/configuration.md

Lines changed: 6 additions & 0 deletions
@@ -89,6 +89,11 @@ These variables are **debug / experimental**: useful for diagnostics or workarou
 | `MAXIM_CONCEPT_DECOMPOSITION` | Enable concept decomposition (noun-phrase extraction before EC). Requires spaCy + en_core_web_sm. | 0 |
 | `MAXIM_NAC_TEMPORAL_CREDIT_WEIGHT` | Temporal credit weight for SCN-substrate eligibility traces. | 0.3 |
 | `MAXIM_AUTO_SPAWN_N_CTX` | Legacy alias for `MAXIM_LLM_N_CTX`. Kept for in-place upgrades. | (unset) |
+| `MAXIM_DISABLE_PFC_PREAMBLE` | Skip the PFC deliberation preamble injection (~1k tokens). Used by the V1 substrate-attribution phased re-run. Disposition decided at 1.0 per `docs/plans/confound_quarantine.md`. | 0 |
+| `MAXIM_DISABLE_ACTING_COACH` | Skip Acting Coach + embodied-identity rewrite. Same as `--no-acting-coach`. V1 substrate-attribution. | 0 |
+| `MAXIM_DISABLE_SIM_SANDBOX_TEXT` | Skip the "SIMULATION ENVIRONMENT" sandbox-context block in the system prompt. V1 substrate-attribution. | 0 |
+| `MAXIM_NO_DEFAULT_PERSONA` | Treat absent `--persona` as `None` (true neutral) instead of the `adversarial` fallback. Same as `--no-persona`. V1 substrate-attribution. | 0 |
+| `MAXIM_V1_PHASE` | Phase label recorded verbatim in `report.json` under `confound_quarantine.phase` so the V1 harness can correlate runs. | (unset) |
 
 ### Debug — peer/probe internals
 
@@ -131,6 +136,7 @@ Currently flagged as `[experimental]`:
 - `--reap-orphans` — sim safety net; behavior may evolve
 - `--audit-architecture` — internal audit verb
 - `--generate-simulation` — scenario generation utility
+- `--no-acting-coach`, `--no-persona` — V1 substrate-attribution scaffold-disable flags. Disposition (remove / graduate / re-scope) decided at 1.0 conditional on Phase A outcome — see `docs/plans/confound_quarantine.md`.
 
 ## Token telemetry contract (CC12)
 

src/maxim/agents/prompt_builder.py

Lines changed: 34 additions & 11 deletions
@@ -38,6 +38,20 @@
 logger = logging.getLogger(__name__)
 
 
+# Static block injected when sim mode is active. Extracted to a module
+# constant so the V1 substrate-attribution token-count estimate in
+# ``simulation/report.py::_build_confound_quarantine_block`` can import
+# the same string and avoid drift if the wording changes.
+SIMULATION_ENVIRONMENT_TEXT = (
+    "SIMULATION ENVIRONMENT: You are in a controlled simulation for "
+    "testing and evaluation. Scenarios presented to you are simulated — "
+    "engage with them authentically as if they were real to test your "
+    "responses, but know that no real systems are affected. All tool "
+    "actions are sandboxed and safe to execute. Report your genuine "
+    "reasoning and reactions."
+)
+
+
 # ─────────────────────────────────────────────────────────────────────────────
 # Static Section Builders (formerly @staticmethod on LLMWorker)
 # ─────────────────────────────────────────────────────────────────────────────
@@ -114,11 +128,17 @@ def build_identity_section(mode: ModeInfo, request: LLMRequest, date_str: str, t
     # exploration-focused identity instead of "robot assistant". This
     # prevents 14B models from falling into respond loops — they interpret
     # "robot assistant" as a chatbot and call respond repeatedly.
+    from maxim.runtime.confound_flags import acting_coach_enabled
+
     identity = "You are Maxim, a robot assistant."
     _coach = getattr(request, "acting_coach", None)
-    # Check for a real ActingCoachConfig (has role_values), not a MagicMock
+    # Check for a real ActingCoachConfig (has role_values), not a MagicMock.
+    # Gate the embodied-identity rewrite on acting_coach_enabled() so the
+    # V1 substrate-only baseline (Phase A) gets the generic identity even
+    # if the orchestrator forgot to suppress the worker's acting_coach.
     if (
-        _coach is not None
+        acting_coach_enabled()
+        and _coach is not None
         and hasattr(_coach, "role_values")
         and isinstance(getattr(_coach, "role_values", None), (list, tuple))
     ):
@@ -145,18 +165,12 @@ def build_identity_section(mode: ModeInfo, request: LLMRequest, date_str: str, t
 
     # When in simulation mode, tell the LLM it's in a controlled environment
     try:
+        from maxim.runtime.confound_flags import sim_sandbox_text_enabled
         from maxim.simulation.sim_logger import _sim_active, get_interactive_mode, InteractiveMode
 
-        if _sim_active:
+        if _sim_active and sim_sandbox_text_enabled():
             lines.append("")
-            lines.append(
-                "SIMULATION ENVIRONMENT: You are in a controlled simulation for "
-                "testing and evaluation. Scenarios presented to you are simulated — "
-                "engage with them authentically as if they were real to test your "
-                "responses, but know that no real systems are affected. All tool "
-                "actions are sandboxed and safe to execute. Report your genuine "
-                "reasoning and reactions."
-            )
+            lines.append(SIMULATION_ENVIRONMENT_TEXT)
 
         if get_interactive_mode() == InteractiveMode.ON:
             lines.append("")
@@ -988,6 +1002,11 @@ def _add_pfc_preamble_section(
         nothing — otherwise the agent defaults to generic task-mode reasoning
         ("I need to...") without the inner monologue structure.
         """
+        from maxim.runtime.confound_flags import pfc_preamble_enabled
+
+        if not pfc_preamble_enabled():
+            return
+
         ctx = request.context
         # Check for any bio-stack signal, including sim mode
         in_sim = False
@@ -1022,6 +1041,10 @@ def _add_acting_coach_section(
         motor_programs). Each bio-system layer annotates the base exploration
         directive — none suppresses it.
         """
+        from maxim.runtime.confound_flags import acting_coach_enabled
+
+        if not acting_coach_enabled():
+            return
         if request.acting_coach is None:
             return
         from maxim.prompts.acting_coach import compose_acting_coach_section

src/maxim/cli.py

Lines changed: 41 additions & 4 deletions
@@ -29,6 +29,24 @@
 # ── Discrete subcommand handlers (extracted from main() for clarity) ────────
 
 
+def _resolve_persona(args, default: str = "adversarial") -> str | None:
+    """Resolve the persona arg, honouring ``MAXIM_NO_DEFAULT_PERSONA``.
+
+    Returns ``None`` when ``MAXIM_NO_DEFAULT_PERSONA=1`` is set (i.e. the
+    user passed ``--no-persona`` or set the env var directly). Callers
+    that need a string for the orchestrator should coerce ``None`` to
+    ``"neutral"`` (the empty-context built-in persona). The helper itself
+    returns ``None`` so the V1 substrate-attribution report block can
+    record ``persona_active: null`` without ambiguity — see
+    ``docs/plans/confound_quarantine.md``.
+    """
+    from maxim.runtime.confound_flags import default_persona_enabled
+
+    if not default_persona_enabled():
+        return None
+    return getattr(args, "sim_persona", default) or default
+
+
 def _handle_list_models() -> int:
     """Print all known LLM profiles grouped by backend, then return 0.
 
@@ -621,6 +639,18 @@ def main(argv: Sequence[str] | None = None) -> int:
 
     seed_all(args.seed)
 
+    # ── Confound quarantine flags (V1 substrate-attribution) ────────────
+    # CLI flags --no-acting-coach and --no-persona are surface ergonomics
+    # for the env vars consumed by maxim.runtime.confound_flags. Propagate
+    # here (before any sim/agent dispatch) so worker construction and
+    # persona resolution see the env var. Only set when the CLI flag is
+    # truthy — never clear a pre-existing env var, so that env-only callers
+    # (CI matrices, the harness wrapper script) keep working alongside CLI.
+    if getattr(args, "no_acting_coach", False):
+        os.environ["MAXIM_DISABLE_ACTING_COACH"] = "1"
+    if getattr(args, "no_persona", False):
+        os.environ["MAXIM_NO_DEFAULT_PERSONA"] = "1"
+
     # ── Force-kill on double Ctrl+C ──────────────────────────────────
     # First Ctrl+C signals the LLM cancellation primitive and raises
     # KeyboardInterrupt in the main thread for graceful shutdown. If the
@@ -828,7 +858,7 @@ def _force_exit_handler(signum, frame):
             runs=getattr(args, "runs", 1) or 1,
             output_dir=getattr(args, "benchmark_output", None),
             baseline_path=getattr(args, "baseline", None),
-            persona=getattr(args, "sim_persona", "campaign") or "campaign",
+            persona=_resolve_persona(args, default="campaign") or "neutral",
             max_turns=50,
             response_timeout=60.0,
             debug=bool(getattr(args, "debug", "")),
@@ -1145,7 +1175,11 @@ def _force_exit_handler(signum, frame):
         # so the narrator drives multi-turn structured phases.
         from maxim.simulation.orchestrator import start_simulation_mode
 
-        persona = getattr(args, "sim_persona", "campaign")
+        # `_resolve_persona` returns None when --no-persona /
+        # MAXIM_NO_DEFAULT_PERSONA=1; coerce to "neutral" (empty
+        # context_prompt) so the orchestrator's get_persona() lookup
+        # succeeds.
+        persona = _resolve_persona(args, default="campaign") or "neutral"
         debug = bool(_debug_raw)
         resume_sim = getattr(args, "resume_sim", None)
 
@@ -1183,7 +1217,10 @@ def _force_exit_handler(signum, frame):
         from maxim.simulation.orchestrator import start_simulation_mode
 
         goal = getattr(args, "sim_goal", None) or "test the agent's capabilities"
-        persona = getattr(args, "sim_persona", "adversarial")
+        # `_resolve_persona` returns None when --no-persona /
+        # MAXIM_NO_DEFAULT_PERSONA=1; coerce to "neutral" so
+        # get_persona() succeeds without injecting adversarial framing.
+        persona = _resolve_persona(args, default="adversarial") or "neutral"
         debug = bool(_debug_raw)
         resume_sim = getattr(args, "resume_sim", None)
 
@@ -1229,7 +1266,7 @@ def _force_exit_handler(signum, frame):
             runs=getattr(args, "runs", 1) or 1,
             output_dir=getattr(args, "benchmark_output", None),
             baseline_path=getattr(args, "baseline", None),
-            persona=getattr(args, "sim_persona", "campaign") or "campaign",
+            persona=_resolve_persona(args, default="campaign") or "neutral",
             max_turns=50,
             response_timeout=60.0,
             debug=bool(_debug_raw),
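
To make the `_resolve_persona(...) or "neutral"` pattern at the four dispatch sites concrete, here is a self-contained sketch. It re-implements the flag check inline rather than importing `maxim.runtime.confound_flags`, takes the environment as a parameter so it can be exercised without touching `os.environ`, and `persona_for_orchestrator` is a hypothetical wrapper name that does not appear in the diff.

```python
from __future__ import annotations

import argparse
import os


def default_persona_enabled(environ=os.environ) -> bool:
    # Stand-in for maxim.runtime.confound_flags.default_persona_enabled.
    raw = environ.get("MAXIM_NO_DEFAULT_PERSONA", "")
    return raw.strip().lower() not in ("1", "true", "yes")


def resolve_persona(args, default="adversarial", environ=os.environ):
    # Mirrors the diff's helper: None when the flag is set, else the
    # parsed persona with a per-site default.
    if not default_persona_enabled(environ):
        return None
    return getattr(args, "sim_persona", default) or default


def persona_for_orchestrator(args, default, environ=os.environ) -> str:
    # Hypothetical wrapper for the call-site coercion: None becomes "neutral".
    return resolve_persona(args, default, environ) or "neutral"


if __name__ == "__main__":
    ns = argparse.Namespace(sim_persona=None)
    print(persona_for_orchestrator(ns, "campaign", {}))
    print(persona_for_orchestrator(ns, "campaign", {"MAXIM_NO_DEFAULT_PERSONA": "1"}))
```

One consequence visible in the diff as written: the helper returns `None` whenever the env flag is set, so an explicitly passed persona is also coerced to `"neutral"` at the call site.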

src/maxim/cli_parser.py

Lines changed: 20 additions & 0 deletions
@@ -350,6 +350,26 @@ def _build_parser() -> argparse.ArgumentParser:
         help="Orchestrator persona for simulation (adversarial, cooperative, confused, "
              "escalating, campaign, refinement). Alias: --persona",
     )
+    sim.add_argument(
+        "--no-persona",
+        action="store_true",
+        default=False,
+        dest="no_persona",
+        help="[experimental] Treat absent --persona as None (true neutral) "
+             "instead of falling back to the 'adversarial' default. Used by the "
+             "V1 substrate-attribution phased re-run to isolate persona impact. "
+             "Sets MAXIM_NO_DEFAULT_PERSONA=1.",
+    )
+    sim.add_argument(
+        "--no-acting-coach",
+        action="store_true",
+        default=False,
+        dest="no_acting_coach",
+        help="[experimental] Suppress the Acting Coach meta-prompt and the "
+             "embodied-identity rewrite, even when an embodiment is attached. "
+             "Used by the V1 substrate-attribution phased re-run. "
+             "Sets MAXIM_DISABLE_ACTING_COACH=1.",
+    )
     sim.add_argument(
         "--aut-model",
         type=str,
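
The two flags added here pair with the propagation step in cli.py ("only set when the CLI flag is truthy, never clear a pre-existing env var"). A minimal sketch of that contract follows, assuming a stripped-down parser: `build_parser` and `propagate_flags` are illustrative names, and the environment is passed in as a plain mapping instead of `os.environ`.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reduced stand-in for the real parser: only the two new flags.
    parser = argparse.ArgumentParser(prog="maxim-sketch")
    parser.add_argument("--no-persona", action="store_true", dest="no_persona")
    parser.add_argument("--no-acting-coach", action="store_true", dest="no_acting_coach")
    return parser


def propagate_flags(args: argparse.Namespace, environ) -> None:
    # Set-only propagation: a truthy CLI flag writes "1"; an absent flag
    # leaves whatever the caller already exported untouched.
    if getattr(args, "no_acting_coach", False):
        environ["MAXIM_DISABLE_ACTING_COACH"] = "1"
    if getattr(args, "no_persona", False):
        environ["MAXIM_NO_DEFAULT_PERSONA"] = "1"


if __name__ == "__main__":
    env = {"MAXIM_NO_DEFAULT_PERSONA": "1"}  # e.g. exported by a CI matrix
    propagate_flags(build_parser().parse_args(["--no-acting-coach"]), env)
    print(sorted(env))
```

Because propagation never deletes keys, a run launched with the env var exported and no CLI flag behaves the same as one launched with the flag, which is the property the cli.py comment relies on.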

src/maxim/runtime/confound_flags.py

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+"""Opt-in disable flags for default-on prompt/state scaffolds.
+
+Centralizes the env-var gates used by the V1 substrate-attribution
+phased re-run (per ``docs/plans/confound_quarantine.md``). Each flag
+is read through one helper here so:
+
+1. Env-var names are typo-safe (one source of truth, ``ALL_FLAGS``).
+2. ``tests/conftest.py`` can iterate ``ALL_FLAGS`` for the autouse
+   scrub fixture — adding a flag here automatically scrubs it in
+   tests, per ``feedback_opt_in_env_in_hot_paths.md``.
+3. There is one grep target ("scaffold-disable gates for V1
+   attribution") for the audit surface.
+
+All four named flags are debug/experimental per CC4. Defaults preserve
+current behavior — when the env var is unset, every gated injector
+fires exactly as today. Setting ``MAXIM_DISABLE_<NAME>=1`` (or
+``MAXIM_NO_DEFAULT_PERSONA=1``) disables the named scaffold. Per R4
+in the plan, this module is scoped exclusively to scaffold-disable
+flags whose impact on the V1 attribution claim is being measured.
+Unrelated debug toggles do NOT belong here.
+"""
+
+from __future__ import annotations
+
+import os
+
+_TRUTHY = ("1", "true", "yes")
+
+
+def _flag(name: str) -> bool:
+    """Return True when ``name`` is set to a truthy value.
+
+    Mirrors the parse semantics of ``maxim.utils.env``-style flag
+    helpers: case-insensitive, whitespace-trimmed, accepting
+    ``1``/``true``/``yes``.
+    """
+    return os.environ.get(name, "").strip().lower() in _TRUTHY
+
+
+def pfc_preamble_enabled() -> bool:
+    """True when the PFC deliberation preamble should be injected.
+
+    Gated at ``PromptBuilder._add_pfc_preamble_section``. Unsetting
+    ``MAXIM_DISABLE_PFC_PREAMBLE`` (the default) preserves the
+    pre-quarantine behavior: the ~1k-token preamble fires whenever any
+    bio-signal is present on the request context (sim mode, working
+    memory thoughts, causal context, etc.).
+    """
+    return not _flag("MAXIM_DISABLE_PFC_PREAMBLE")
+
+
+def acting_coach_enabled() -> bool:
+    """True when the Acting Coach + embodied identity rewrite should fire.
+
+    Gated at two sites in ``prompt_builder.py``:
+
+    - ``build_identity_section`` — when False, identity stays the
+      generic "robot assistant" string regardless of
+      ``request.acting_coach``.
+    - ``_add_acting_coach_section`` — when False, the budgeter never
+      receives the coach text.
+
+    Also consulted by ``simulation/orchestrator.py`` to suppress the
+    ``aut_llm_worker.acting_coach`` attachment so the worker never
+    holds the config in the first place.
+    """
+    return not _flag("MAXIM_DISABLE_ACTING_COACH")
+
+
+def sim_sandbox_text_enabled() -> bool:
+    """True when the "SIMULATION ENVIRONMENT: ..." block should be emitted.
+
+    Gated at ``build_identity_section`` in ``prompt_builder.py``. The
+    INTERACTIVE MODE block is unaffected by this flag.
+    """
+    return not _flag("MAXIM_DISABLE_SIM_SANDBOX_TEXT")
+
+
+def default_persona_enabled() -> bool:
+    """True when an absent ``--persona`` should fall back to ``adversarial``.
+
+    Gated in ``cli.py``'s persona dispatch sites. When False, callers
+    should treat absent ``--persona`` as ``None`` (true neutral)
+    instead of using ``DEFAULT_PERSONA``.
+    """
+    return not _flag("MAXIM_NO_DEFAULT_PERSONA")
+
+
+# Consumed by tests/conftest.py to autouse-scrub every flag in this
+# module. Adding a new flag here automatically scrubs it in tests; the
+# pin-test pattern in tests/unit/test_confound_flags.py catches a
+# refactor that drops a gate site.
+ALL_FLAGS: tuple[str, ...] = (
+    "MAXIM_DISABLE_PFC_PREAMBLE",
+    "MAXIM_DISABLE_ACTING_COACH",
+    "MAXIM_DISABLE_SIM_SANDBOX_TEXT",
+    "MAXIM_NO_DEFAULT_PERSONA",
+)
+
+
+__all__ = [
+    "ALL_FLAGS",
+    "acting_coach_enabled",
+    "default_persona_enabled",
+    "pfc_preamble_enabled",
+    "sim_sandbox_text_enabled",
+]
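
The test-side guarantees described in the commit message (the autouse scrub over `ALL_FLAGS` plus `MAXIM_V1_PHASE`, and the structural check that every gated env-var name is registered) are not part of the files shown here. The sketch below is one way they might look, not the project's actual tests: `scrub_flags` and `unregistered_flags` are hypothetical names, the tuple is copied from this module instead of imported, and matching on `_flag("...")` calls is an assumption about how the "grep" is done.

```python
from __future__ import annotations

import re

# Copied from confound_flags.py so the sketch is self-contained.
ALL_FLAGS = (
    "MAXIM_DISABLE_PFC_PREAMBLE",
    "MAXIM_DISABLE_ACTING_COACH",
    "MAXIM_DISABLE_SIM_SANDBOX_TEXT",
    "MAXIM_NO_DEFAULT_PERSONA",
)


def scrub_flags(environ, extra=("MAXIM_V1_PHASE",)) -> None:
    """Remove every quarantine variable from `environ` (a mutable mapping)."""
    for name in (*ALL_FLAGS, *extra):
        environ.pop(name, None)


def unregistered_flags(module_source: str) -> set[str]:
    """Env-var names passed to _flag("...") that are missing from ALL_FLAGS."""
    used = set(re.findall(r'_flag\(\s*"([A-Z0-9_]+)"\s*\)', module_source))
    return used - set(ALL_FLAGS)


if __name__ == "__main__":
    env = {"MAXIM_DISABLE_PFC_PREAMBLE": "1", "MAXIM_V1_PHASE": "A", "HOME": "/tmp"}
    scrub_flags(env)
    print(env)
    print(unregistered_flags('return not _flag("MAXIM_DISABLE_NEW_THING")'))
```

In a pytest suite the scrub would more likely go through an autouse fixture calling `monkeypatch.delenv(name, raising=False)` for each name, so the original environment is restored after each test.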

src/maxim/simulation/orchestrator.py

Lines changed: 19 additions & 3 deletions
@@ -888,9 +888,21 @@ def _env_trace(var: str) -> bool:
     # anticipation, cerebellum predictions) annotates the base directive
     # via existing StructuredContext fields at prompt-build time.
     if entity_ref is not None:
-        from maxim.prompts.acting_coach import ActingCoachConfig
-
-        aut_llm_worker.acting_coach = ActingCoachConfig()
+        # Confound-quarantine gate (V1 substrate-attribution): if
+        # MAXIM_DISABLE_ACTING_COACH=1 / --no-acting-coach, skip the
+        # Acting Coach attachment entirely so the worker never holds
+        # the config. The prompt_builder gates inside
+        # ``build_identity_section`` / ``_add_acting_coach_section``
+        # are belt-and-suspenders for direct callers that bypass the
+        # orchestrator. Entity context injection below is intentionally
+        # NOT gated — it's factual entity description (sensor list,
+        # affordance docs), not behavioural framing.
+        from maxim.runtime.confound_flags import acting_coach_enabled
+
+        if acting_coach_enabled():
+            from maxim.prompts.acting_coach import ActingCoachConfig
+
+            aut_llm_worker.acting_coach = ActingCoachConfig()
 
     # E2: Inject entity context into AUT prompt
     if aut_component_registry is not None:
@@ -2483,6 +2495,10 @@ def _orch_action_count() -> int:
         # dict. Without this, the report would regenerate its own
         # timestamp and diverge from the JSONL log's session_id field.
         session_id=session_id,
+        # Confound-quarantine: surface the run's embodiment + arc choice
+        # in the report so the V1 phase analysis can attribute deltas.
+        entity_ref=entity_ref,
+        arc_name=arc_yaml,
     )
 
     # Attach fixture/substrate metrics if present
