docs: SEM learning loop guides + reference updates across 9 files

dennys246 · claude · dennys246 · commit 6aca98c75bc7 · 2026-04-17T13:09:06.000-06:00
Update user-facing guides, reference docs, and CLAUDE.md to document
the SEM learning loop (valence annotation, cerebellum activation,
distribute_reward, success reactions, pain spike boundaries).

Markdown docs:
- reference.md: Valence + Episode Boundary glossary entries, updated
  Cerebellum + NAc entries, reactions/ in module table
- embodiment_guide.md: "SEM Learning Loop (Phase 2)" section with
  full signal flow, success reactions, NAc reward distribution
- decisions.md: "Reward Distribution" subsection documenting
  distribute_reward → credit_node → reward_bias → EC threshold
- CLAUDE.md: quick-reference table (Valence row, updated Causal/
  Embodiment rows), 4 new architectural invariants
- concept-decomposition.md: cross-reference to valence annotation

HTML guides:
- maxim-embodiment.html: SEM Learning Loop section with signal flow
  diagram and CerebellumModulator three-path outcomes
- maxim-memory-systems.html: Episode Valence & Affective Memory +
  Episode Boundaries sections
- maxim-proprioception.html: Pain Reactions & SEM Learning Loop
  section (dual-bus, distribute_reward, pain spike boundary)
- maxim-roadmap.html: updated shipped status + new SEM loop card

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -115,6 +115,11 @@ Simulations call a live LLM for every turn and can burn cost + time quickly. Whe
 
 - **`runtime/bio_stack.py::build_bio_stack` is the canonical bio-pipeline construction site** (Wave 3 of biosystem_unification, 2026-04-17). Composes the four individual Wave 1+2 builders (`build_reaction_bus`, `build_pain_bus`, `build_memory_hub`, `build_default_network`) in the correct dependency order. Returns a frozen `BioStack` dataclass containing all wired bio-systems. `persistence_dir: Path | str | None` is the primary configuration — sub-paths (`hippocampus.json`, `atl.json`, `angular_gyrus.json`) are derived internally. `pain_bus=` parameter accepts a pre-built PainBus (sim AUT pattern where the sandbox needs the bus before the rest of the stack); standard learners are subscribed to the pre-existing bus. `with_default_network=True` constructs a DefaultNetwork (Reachy + sim AUT only). Four production callers: cli.py non-sim, simulation/orchestrator.py AUT + orch NPC, embodied_runtime/agentic_runtime.py Reachy. AgentFactory (site #7) deferred to `agent_factory_canonicalization.md` Wave 4 — conditional `remembers`/`learns` + auto_load doesn't fit the umbrella. CLI sim modes stay as-is (just `build_pain_bus`). See [docs/plans/bio_stack_unification.md](docs/plans/bio_stack_unification.md).
 
+- **`Episode.valence` defaults to 0.0 on old data.** Backward compatible. Old episode dicts without the valence field deserialize cleanly.
+- **`spreading_activation(propagate_valence=False)` returns `dict[str, float]` unchanged.** The `propagate_valence=True` path returns `dict[str, tuple[float, float]]`. Existing callers are unaffected.
+- **NAc `_reward_bias` clamps to [0, max_reward_bias].** Negative rewards (pain) produce 0.0 bias. Bias only widens EC recognition, never narrows. Pain avoidance is handled by valence annotation on edges, not by reward bias.
+- **`BioStack.save_cerebellum()` must be called at session end.** Without it, learned forward models are lost.
+
 ## `maxim doctor` — environment diagnostics
 
 Runs platform-aware checks + prints fix hints with the user's actual IPs filled in.
@@ -237,7 +242,7 @@ Project structure is documented in [docs/reference.md](docs/reference.md).
 | Tools | `tools/` (register in registry), `runtime/executor.py` (aliases) |
 | LLM routing | `models/language/router.py` (provider fallback, typed exception branches, `dispatch_exhausted` aggregated WARN), `models/language/maxim_peer_backend.py` (self-hosted peer backend — one HTTP call, typed failure, streaming with strict mid-stream fail, `health_check` + `for_url` factory), `runtime/lane_backends.py::BACKEND_CLASSES` (dispatch table), `models/language/config.py` (profiles), `models/language/json_parser.py` (JSON repair) |
 | Memory | `memory/hippocampus.py`, `memory/concept_extractor.py`, `memory/store.py` (protocols), `memory/percept_trace_buffer.py` (τ-decay ring buffer) |
-| Causal learning | `decisions/nac.py` (reward bias, eligibility traces), `decisions/causal_link.py` (CausalLink, percept_refs) |
+| Causal learning | `decisions/nac.py` (reward bias, eligibility traces, distribute_reward), `decisions/causal_link.py` (CausalLink, percept_refs) |
 | Substrate encoding | `similarity/encoder.py` (LinguisticEncoder), `similarity/ec.py` (pattern_complete_or_separate, centroid update) |
 | Prompt composition | `prompts/assembler.py` (PromptAssembler, MemorySummary), `agents/prompt_builder.py` (legacy) |
 | Percept schema | `agents/percept_context.py` (PerceptContext), `agents/percept_factory.py` (factories), `agents/modality.py` (SensoryTag, SubstrateModality) |
@@ -250,7 +255,8 @@ Project structure is documented in [docs/reference.md](docs/reference.md).
 | DM campaigns | `simulation/dm_schema.py`, `simulation/dm_runtime.py` |
 | Benchmarks | `simulation/benchmark.py`, `simulation/validation.py` |
 | Research | `simulation/research_agents.py`, `simulation/research_orchestrator.py` |
-| Embodiment | `embodiment/sem.py`, `embodiment/body.py`, `embodiment/cerebellum.py`, `embodiment/motor.py` |
+| Valence | `memory/episode.py` (Episode.valence, apply_hebbian_on_close, salience_spike_rule), `agents/bus.py` (propagate_valence), `memory/hippocampus.py` (capture_reaction, include_valence) |
+| Embodiment | `embodiment/sem.py`, `embodiment/body.py`, `embodiment/cerebellum.py` (forward models), `embodiment/backends/cerebellum_modulator.py` (predict/fallback/train + success reactions), `embodiment/motor.py` |
 | Mesh | `mesh/identity.py`, `mesh/knowledge.py`, `mesh/task_delegation.py`, `mesh/clock.py` |
 | Lane tiers | `runtime/function_router.py`, `runtime/lane_models.py`, `runtime/lane_backends.py` |
 | Multi-agent | `runtime/agent_factory.py`, `runtime/agent_pool.py` |
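
Review note (not part of the patch): the first two invariants added to CLAUDE.md in this file can be illustrated with a minimal sketch. `EpisodeSketch`, its `from_dict`, and `spreading_activation_sketch` are hypothetical stand-ins; the real `Episode` and `spreading_activation` signatures are not shown in this diff, so treat the names and parameters as assumptions.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSketch:
    """Hypothetical stand-in for memory/episode.py::Episode."""
    episode_id: str
    valence: float = 0.0  # neutral default keeps pre-valence data loadable

    @classmethod
    def from_dict(cls, data: dict) -> "EpisodeSketch":
        # Old dicts carry no "valence" key; fall back to 0.0 instead of raising.
        return cls(
            episode_id=data["episode_id"],
            valence=float(data.get("valence", 0.0)),
        )


def spreading_activation_sketch(activations, node_valence, propagate_valence=False):
    """Return {node: activation}, or {node: (activation, valence)} on request."""
    if not propagate_valence:
        # Default path: same shape as before, so existing callers are unaffected.
        return dict(activations)
    return {n: (a, node_valence.get(n, 0.0)) for n, a in activations.items()}
```

Keeping the tuple-valued result behind an opt-in flag is what lets the return type change without touching existing call sites.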
diff --git a/docs/decisions.md b/docs/decisions.md
@@ -329,6 +329,14 @@ config = NACConfig(
 )
 ```
 
+### Reward Distribution (SEM Learning Loop)
+
+`NAc.distribute_reward(agent_id, reward)` distributes reward across eligible nodes via `credit_node()`. Eligibility traces are set by `update_eligibility()` when percepts complete to substrate nodes. The ReactionBus subscriber in `build_bio_stack` maps reactions to rewards:
+- `Valence.NEGATIVE` -- reward = -intensity (clamps to 0 in credit_node -- bias only widens)
+- `Valence.POSITIVE` -- reward = +intensity (widens EC recognition radius)
+
+`get_threshold_overrides(agent_id)` returns the per-node bias map for EC to use during `pattern_complete`.
+
 ---
 
 ## Biological Inspiration
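
Review note (not part of the patch): the "Reward Distribution" subsection added in this hunk describes proportional credit assignment with a clamp. A rough sketch of that flow follows. `NAcSketch` is a hypothetical stand-in for `decisions/nac.py`; the method names come from the subsection, but the argument lists, the `max_reward_bias` value, and the choice to clamp the accumulated bias (rather than each delta) are assumptions.

```python
from collections import defaultdict


class NAcSketch:
    """Hypothetical stand-in for the NAc; not the project's actual class."""

    def __init__(self, max_reward_bias: float = 0.2):
        self.max_reward_bias = max_reward_bias   # assumed cap
        self._eligibility = defaultdict(dict)    # agent_id -> {node_id: trace}
        self._reward_bias = defaultdict(dict)    # agent_id -> {node_id: bias}

    def update_eligibility(self, agent_id, node_id, trace=1.0):
        self._eligibility[agent_id][node_id] = trace

    def credit_node(self, agent_id, node_id, amount):
        current = self._reward_bias[agent_id].get(node_id, 0.0)
        # Clamp to [0, max_reward_bias]: bias may widen recognition, never narrow it.
        clamped = min(self.max_reward_bias, max(0.0, current + amount))
        self._reward_bias[agent_id][node_id] = clamped

    def distribute_reward(self, agent_id, reward):
        traces = self._eligibility[agent_id]
        total = sum(traces.values())
        if total <= 0:
            return  # nothing eligible, nothing to credit
        for node_id, trace in traces.items():
            # Each node receives a share proportional to its eligibility trace.
            self.credit_node(agent_id, node_id, reward * trace / total)

    def get_threshold_overrides(self, agent_id):
        return dict(self._reward_bias[agent_id])
```

With traces of 3.0 and 1.0, a reward of 0.1 splits 0.075 / 0.025; a later pain reward of -0.5 drives both biases to the 0.0 floor rather than below it.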
diff --git a/docs/embodiment_guide.md b/docs/embodiment_guide.md
@@ -302,10 +302,34 @@ Executes motor programs step by step with:
 - **PainBus subscription** for mid-sequence interrupts
 - **Gate tightening** after painful executions (10% per failure)
 
+## SEM Learning Loop (Phase 2 -- Shipped)
+
+When a SEM entity interaction produces a reaction (pain on failure, satisfaction on confident prediction), the signal flows through the full bio-pipeline:
+
+1. **CerebellumModulator** executes the affordance and emits either a failure reaction (NEGATIVE) or a success reaction (POSITIVE)
+2. **ReactionBus** dispatches to subscribers:
+   - `hippocampus.capture_reaction` -- episode valence annotation
+   - `nac.distribute_reward` -- EC threshold adjustment
+3. **Episode close** -- `apply_hebbian_on_close` annotates edges with `metadata["valence"]`
+4. **Pain spike** -- `salience_spike_rule` closes the episode boundary
+5. **Future retrieval** -- `spreading_activation(propagate_valence=True)` carries affective memory
+
+### Success reactions (negativity bias)
+
+CerebellumModulator calls `_emit_success_reaction` when its prediction is confident enough to skip the LLM fallback. Success intensity is lower than failure intensity (0.1-0.3 vs 0.3-0.5), a biologically motivated negativity bias.
+
+### NAc reward distribution
+
+`distribute_reward` credits eligible substrate nodes proportionally to eligibility traces. Positive rewards widen EC recognition (lower threshold); negative rewards clamp to 0 (bias never narrows).
+
+### Cerebellum activation in production
+
+`BioStack.cerebellum` is now constructed by `build_bio_stack` and forwarded via `build_executor(cerebellum=...)` to `generate_tools_for_entity`, which creates `CerebellumModulator` instances with a wired `reaction_bus`. This means every SEM affordance tool now has a live Cerebellum backing it -- predictions, training, and reaction emission all happen automatically.
+
 ## What's Next
 
-- **Phase 2**: Composable failure modes — persistent failures with recovery conditions
-- **Phase 3**: Hardware adapter — wrap real robot SDKs as SEM backends
+- **Phase 2**: Composable failure modes -- persistent failures with recovery conditions
+- **Phase 3**: Hardware adapter -- wrap real robot SDKs as SEM backends
 
 See [embodiment_core_plan.md](plans/archive/embodiment_core_plan.md) (archived) for the historical roadmap.
 
diff --git a/docs/reference.md b/docs/reference.md
@@ -32,7 +32,8 @@ Agents -> Planning -> Decision Engine -> Runtime -> Executor -> Tools -> Environ
 | `src/maxim/mesh/` | Simulation-only: `bus`, `identity`, `message`, `naming` (R0 deleted the dead agent-mesh subsystem; see "Removed in R0" below) |
 | `src/maxim/simulation/` | Simulation modes, generative campaigns, research protocol, benchmarks |
 | `src/maxim/integration/` | MemoryHub cross-system coordinator (11 bio-systems) |
-| `src/maxim/decisions/` | NAc causal learning, adaptive planner |
+| `src/maxim/reactions/` | Reaction types, ReactionBus (per-kind dispatch), PerceptProducer/ReactionProducer protocols |
+| `src/maxim/decisions/` | NAc causal learning, adaptive planner, reward distribution |
 | `src/maxim/time/` | SCN temporal rhythm indexing |
 | `src/maxim/similarity/` | Entorhinal Cortex (pattern completion, centroid update) + LinguisticEncoder (P1) + ConceptDecomposer (noun-phrase extraction before EC) |
 | `src/maxim/prompts/` | PromptAssembler (B1), MemorySummary, prompt profiles |
@@ -122,13 +123,15 @@ Maxim uses neuroscience-inspired names. Here is the translation:
 |----------|--------------|--------|--------------|
 | Hippocampus | Episodic memory | `memory/` | Stores and recalls experiences (events, conversations) |
 | ATL | Semantic memory | `memory/` | Extracts concepts, categories, and generalizations |
-| NAc | Reward / causal learning | `decisions/` | Learns cause-and-effect relationships ("what leads to what") |
+| NAc | Reward / causal learning | `decisions/` | Learns cause-and-effect relationships ("what leads to what"). `distribute_reward` now wired via ReactionBus subscriber in `build_bio_stack` |
 | SCN | Internal clock | `time/` | Tracks circadian-like temporal patterns and rhythms |
 | EC | Memory indexing + substrate recognition | `similarity/` | Routes queries via similarity; pattern_complete_or_separate for substrate nodes (P1) |
 | Angular Gyrus | Cross-modal algebra | `math/` | Combines memories across different modalities |
-| Cerebellum | Motor prediction | `embodiment/` | Predicts outcomes of physical actions, learns motor programs |
+| Cerebellum | Motor prediction | `embodiment/` | Predicts outcomes of physical actions, learns motor programs. Now activated in production via `BioStack.cerebellum` and `build_executor(cerebellum=...)` |
 | Amygdala / Fear | Threat detection | `proprioception/` | Detects harm, triggers pain signals, gates risky actions |
 | Default Network | Reactive behavior | `default_network/` | Background processing, idle behaviors, spontaneous thoughts |
+| Valence | Affective edge signal | `memory/episode.py` | Affective signal on Hebbian edges (`Edge.metadata["valence"]`), computed from Reactions at episode close via `apply_hebbian_on_close`. Propagated by `spreading_activation(propagate_valence=True)` |
+| Episode Boundary Rules | Pluggable boundary detection | `memory/episode.py` | `BoundaryRule` callables on `EpisodeBoundaryDetector`. Defaults: tick gap, channel change, scn_tag change. New: `salience_spike_rule(min_intensity=0.5)` triggers boundary on pain/salience spikes via `CaptureEvent.salience_spike` |
 
 ---
 
diff --git a/docs/user/concept-decomposition.md b/docs/user/concept-decomposition.md
@@ -134,3 +134,7 @@ node_ids = encoder.encode_decomposed(
 - **English only** (Stage 1). spaCy `en_core_web_sm` is English-only. Multi-language support requires a multilingual model (`xx_ent_wiki_sm`) or per-language model selection.
 - **Short fragments** may over-decompose. The `min_chunk_len` filter helps, but domain-specific inputs may need a custom strategy.
 - **No relation tagging yet** (Stage 2). Chunks are bound with untagged Hebbian edges. Role-tagged edges (`relation="spatial"`) are planned for a future stage.
+
+## Connection to Valence Annotation
+
+With concept decomposition enabled, valence annotation targets individual concept nodes ("rusty sword", "heavy", "sharp") rather than whole-sentence blobs. This means the agent learns "rusty sword is associated with pain" rather than "the entire sentence about picking up a rusty sword is painful." See [valence_annotation_poc.md](../experiments/valence_annotation_poc.md) for the demonstration.
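
Review note (not part of the patch): the per-concept annotation described in the added paragraph can be sketched as below. `annotate_edges_sketch` is a hypothetical helper, not the project's `apply_hebbian_on_close`; using the mean of the episode's reaction valences for every edge is an assumption borrowed from how the HTML guide describes `Episode.valence`.

```python
from itertools import combinations
from statistics import fmean


def annotate_edges_sketch(concept_nodes, reaction_valences):
    """Hypothetical: map each co-active node pair to {"valence": v}."""
    # Assumed aggregation: mean reaction valence over the episode, 0.0 if none.
    valence = fmean(reaction_valences) if reaction_valences else 0.0
    return {
        tuple(sorted(pair)): {"valence": valence}
        for pair in combinations(concept_nodes, 2)
    }


edges = annotate_edges_sketch(["rusty sword", "heavy", "sharp"], [-0.5, -0.3])
```

Because the nodes are individual concepts, the negative valence attaches to edges such as ("heavy", "rusty sword") rather than to one whole-sentence node.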
diff --git a/htmls-guides/maxim-embodiment.html b/htmls-guides/maxim-embodiment.html
@@ -122,6 +122,74 @@ <h3 class="text-lg font-semibold text-emerald-400 mb-2">Biological Inspiration</
     <p class="text-slate-300 leading-relaxed">The LLM is a <strong class="text-white">teacher</strong>, not a per-tick oracle. After enough observations, the Cerebellum handles predictions deterministically. In testing, LLM calls drop from 100 to &le;40 over 100 actions.</p>
   </section>
 
+  <!-- SEM Learning Loop -->
+  <section class="space-y-4">
+    <h2 id="sem-learning-loop" class="text-2xl font-semibold text-white border-b border-indigo-500/50 pb-2 scroll-mt-32">The SEM Learning Loop</h2>
+
+    <div class="bg-slate-800 border-l-4 border-emerald-500 rounded-xl p-6">
+      <h3 class="text-lg font-semibold text-emerald-400 mb-2">Biological Inspiration</h3>
+      <p class="text-slate-300">In the brain, the cerebellum doesn't just predict &mdash; it emits signals when predictions fail <em>or</em> succeed. These signals propagate to the hippocampus (contextual memory) and nucleus accumbens (reward learning) simultaneously, closing the loop between motor execution and long-term behavioral adaptation. Success and failure are not symmetric: negative outcomes carry disproportionate weight, a phenomenon known as <em>negativity bias</em>.</p>
+    </div>
+
+    <p class="text-slate-300 leading-relaxed">When the Cerebellum evaluates an affordance, the outcome flows through the bio-pipeline as a <em class="text-indigo-400 not-italic font-medium">reaction</em> &mdash; a typed evaluative signal that drives learning across multiple systems simultaneously. This is the SEM Learning Loop.</p>
+
+    <h3 class="text-white font-semibold mt-4">CerebellumModulator Outcomes</h3>
+    <p class="text-slate-300 leading-relaxed">The <code class="bg-slate-950 px-1 rounded text-xs">CerebellumModulator</code> classifies each affordance execution into one of three outcome paths:</p>
+
+    <div class="grid md:grid-cols-3 gap-4">
+      <div class="bg-slate-800 border border-slate-700 rounded-xl p-5">
+        <h4 class="text-emerald-400 font-semibold mb-2">Confident Prediction</h4>
+        <p class="text-slate-300 text-sm">Confidence &ge; 0.3 and low variance. The Cerebellum handles the prediction without LLM fallback. Emits a <strong class="text-white">success reaction</strong> with positive valence.</p>
+      </div>
+      <div class="bg-slate-800 border border-slate-700 rounded-xl p-5">
+        <h4 class="text-amber-400 font-semibold mb-2">LLM Fallback</h4>
+        <p class="text-slate-300 text-sm">Confidence &lt; 0.3 or high variance. The LLM acts as teacher. The Cerebellum trains on the LLM's response via Rescorla-Wagner update. No reaction emitted &mdash; the system is still learning.</p>
+      </div>
+      <div class="bg-slate-800 border border-slate-700 rounded-xl p-5">
+        <h4 class="text-red-400 font-semibold mb-2">Failure</h4>
+        <p class="text-slate-300 text-sm">Affordance execution triggers a failure mode (e.g., shatter, overheat). Emits a <strong class="text-white">pain reaction</strong> with negative valence via the PainBus.</p>
+      </div>
+    </div>
+
+    <h3 class="text-white font-semibold mt-6">Signal Flow</h3>
+    <p class="text-slate-300 leading-relaxed">Both success and pain reactions flow through the <code class="bg-slate-950 px-1 rounded text-xs">ReactionBus</code>, which dispatches to two subscribers in parallel:</p>
+
+    <div class="bg-slate-950 rounded-lg p-5 font-mono text-sm text-slate-300 overflow-x-auto whitespace-pre"><span class="text-indigo-400 text-xs uppercase tracking-wider">SEM Learning Loop Signal Flow</span>
+
+CerebellumModulator
+  |
+  |-- confident prediction --&rarr; _emit_success_reaction(valence=+0.3..+1.0)
+  |-- failure mode fired   --&rarr; PainBus.publish(PainSignal) --&rarr; pain_signal_to_reaction
+  |
+  &darr;
+ReactionBus.dispatch(reaction)
+  |
+  |-- Subscriber 1: Hippocampus
+  |     Episode captures the reaction context
+  |     Episode.valence set at finalize (mean of all reactions)
+  |     Hebbian edges annotated with Edge.metadata["valence"]
+  |
+  |-- Subscriber 2: NAc (via distribute_reward)
+  |     Adjusts per-node reward_bias in the Hebbian graph
+  |     EC similarity thresholds shift: positive &rarr; looser (wider recognition); negative clamps to 0 (never tighter)
+  |
+  &darr;
+Episode boundary: salience_spike_rule
+  Pain spike (intensity &ge; min_intensity, default 0.5) forces episode close
+  New episode starts with clean slate
+  |
+  &darr;
+Future retrieval: retrieve_on_cue(include_valence=True)
+  Spreading activation propagates edge valence
+  Recalled memories carry affective coloring</div>
+
+    <h3 class="text-white font-semibold mt-6">Negativity Bias</h3>
+    <p class="text-slate-300 leading-relaxed">Success reactions carry positive valence but at lower intensity than pain reactions &mdash; mirroring biological negativity bias. A single painful failure creates a stronger learning signal than several routine successes. This asymmetry means the agent develops caution around dangerous affordances faster than it develops confidence around safe ones, which is the correct survival trade-off for an embodied system.</p>
+
+    <h3 class="text-white font-semibold mt-6">Cerebellum Activation via BioStack</h3>
+    <p class="text-slate-300 leading-relaxed">In production, the Cerebellum is wired through <code class="bg-slate-950 px-1 rounded text-xs">BioStack.cerebellum</code> and activated by <code class="bg-slate-950 px-1 rounded text-xs">build_executor</code>. This means every agent entry point that constructs a bio-stack gets Cerebellum forward models and the full SEM Learning Loop automatically &mdash; no per-caller wiring required.</p>
+  </section>
+
   <!-- Motor Programs -->
   <section class="space-y-4">
     <h2 id="motor-programs" class="text-2xl font-semibold text-white border-b border-indigo-500/50 pb-2 scroll-mt-32">Motor Programs</h2>
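
Review note (not part of the patch): the three outcome paths described in the new "CerebellumModulator Outcomes" cards can be summarized in a small sketch. `classify_outcome_sketch` and `Outcome` are hypothetical; the 0.3 confidence cutoff comes from the cards, while the `max_variance` threshold and the choice to check failure first are assumptions.

```python
from enum import Enum


class Outcome(Enum):
    CONFIDENT = "confident_prediction"
    LLM_FALLBACK = "llm_fallback"
    FAILURE = "failure"


def classify_outcome_sketch(confidence, variance, failure_fired,
                            min_confidence=0.3, max_variance=0.1):
    """Hypothetical three-path classifier; max_variance is an assumed value."""
    if failure_fired:
        # Failure mode fired: pain reaction with negative valence.
        return Outcome.FAILURE
    if confidence >= min_confidence and variance <= max_variance:
        # Cerebellum answers on its own: success reaction, positive valence.
        return Outcome.CONFIDENT
    # Still learning: ask the LLM, train on its response, emit no reaction.
    return Outcome.LLM_FALLBACK
```

Only two of the three paths emit a reaction, which is why the LLM-fallback path leaves episode valence and NAc bias untouched.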
diff --git a/htmls-guides/maxim-memory-systems.html b/htmls-guides/maxim-memory-systems.html
diff --git a/htmls-guides/maxim-proprioception.html b/htmls-guides/maxim-proprioception.html
diff --git a/htmls-guides/maxim-roadmap.html b/htmls-guides/maxim-roadmap.html