
Commit 6aca98c

dennys246 and claude committed
docs: SEM learning loop guides + reference updates across 9 files
Update user-facing guides, reference docs, and CLAUDE.md to document the SEM learning loop (valence annotation, cerebellum activation, distribute_reward, success reactions, pain spike boundaries).

Markdown docs:
- reference.md: Valence + Episode Boundary glossary entries, updated Cerebellum + NAc entries, reactions/ in module table
- embodiment_guide.md: "SEM Learning Loop (Phase 2)" section with full signal flow, success reactions, NAc reward distribution
- decisions.md: "Reward Distribution" subsection documenting distribute_reward → credit_node → reward_bias → EC threshold
- CLAUDE.md: quick-reference table (Valence row, updated Causal/Embodiment rows), 4 new architectural invariants
- concept-decomposition.md: cross-reference to valence annotation

HTML guides:
- maxim-embodiment.html: SEM Learning Loop section with signal flow diagram and CerebellumModulator three-path outcomes
- maxim-memory-systems.html: Episode Valence & Affective Memory + Episode Boundaries sections
- maxim-proprioception.html: Pain Reactions & SEM Learning Loop section (dual-bus, distribute_reward, pain spike boundary)
- maxim-roadmap.html: updated shipped status + new SEM loop card

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4619e94 commit 6aca98c

9 files changed

Lines changed: 252 additions & 14 deletions

CLAUDE.md

Lines changed: 8 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -115,6 +115,11 @@ Simulations call a live LLM for every turn and can burn cost + time quickly. Whe
115115

116116
- **`runtime/bio_stack.py::build_bio_stack` is the canonical bio-pipeline construction site** (Wave 3 of biosystem_unification, 2026-04-17). Composes the four individual Wave 1+2 builders (`build_reaction_bus`, `build_pain_bus`, `build_memory_hub`, `build_default_network`) in the correct dependency order. Returns a frozen `BioStack` dataclass containing all wired bio-systems. `persistence_dir: Path | str | None` is the primary configuration — sub-paths (`hippocampus.json`, `atl.json`, `angular_gyrus.json`) are derived internally. `pain_bus=` parameter accepts a pre-built PainBus (sim AUT pattern where the sandbox needs the bus before the rest of the stack); standard learners are subscribed to the pre-existing bus. `with_default_network=True` constructs a DefaultNetwork (Reachy + sim AUT only). Four production callers: cli.py non-sim, simulation/orchestrator.py AUT + orch NPC, embodied_runtime/agentic_runtime.py Reachy. AgentFactory (site #7) deferred to `agent_factory_canonicalization.md` Wave 4 — conditional `remembers`/`learns` + auto_load doesn't fit the umbrella. CLI sim modes stay as-is (just `build_pain_bus`). See [docs/plans/bio_stack_unification.md](docs/plans/bio_stack_unification.md).
117117

118+
- **`Episode.valence` defaults to 0.0 on old data.** Backward compatible. Old episode dicts without the valence field deserialize cleanly.
119+
- **`spreading_activation(propagate_valence=False)` returns `dict[str, float]` unchanged.** The `propagate_valence=True` path returns `dict[str, tuple[float, float]]`. Existing callers are unaffected.
120+
- **NAc `_reward_bias` clamps to [0, max_reward_bias].** Negative rewards (pain) produce 0.0 bias. Bias only widens EC recognition, never narrows. Pain avoidance is handled by valence annotation on edges, not by reward bias.
121+
- **`BioStack.save_cerebellum()` must be called at session end.** Without it, learned forward models are lost.
122+
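The return-type invariant for `spreading_activation` can be illustrated with a toy graph. This is a minimal sketch, not the project's implementation: the adjacency format, the `decay` and `depth` parameters, and the rule that a node inherits the valence of its strongest incoming edge are all assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical toy graph: node -> list of (neighbor, edge weight, edge valence).
GRAPH = {
    "rusty sword": [("pain", 0.8, -0.6), ("heavy", 0.5, 0.0)],
}


def spreading_activation(graph, seed, decay=0.5, depth=2, propagate_valence=False):
    """Toy spreading activation over a weighted graph (illustrative only)."""
    activation = {seed: 1.0}
    valence = {seed: 0.0}
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor, weight, edge_valence in graph.get(node, []):
                spread = activation[node] * weight * decay
                # Keep only the strongest path into each node (assumed rule).
                if spread > activation.get(neighbor, 0.0):
                    activation[neighbor] = spread
                    valence[neighbor] = edge_valence
                    next_frontier.append(neighbor)
        frontier = next_frontier
    if propagate_valence:
        # Opt-in path: each node maps to (activation, valence).
        return {node: (activation[node], valence[node]) for node in activation}
    # Default path: plain node -> activation map, as existing callers expect.
    return activation


plain = spreading_activation(GRAPH, "rusty sword")
affective = spreading_activation(GRAPH, "rusty sword", propagate_valence=True)
```

Keeping the tuple form behind an opt-in flag is what lets existing callers continue to treat the result as `dict[str, float]`.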
118123
## `maxim doctor` — environment diagnostics
119124

120125
Runs platform-aware checks + prints fix hints with the user's actual IPs filled in.
@@ -237,7 +242,7 @@ Project structure is documented in [docs/reference.md](docs/reference.md).
237242
| Tools | `tools/` (register in registry), `runtime/executor.py` (aliases) |
238243
| LLM routing | `models/language/router.py` (provider fallback, typed exception branches, `dispatch_exhausted` aggregated WARN), `models/language/maxim_peer_backend.py` (self-hosted peer backend — one HTTP call, typed failure, streaming with strict mid-stream fail, `health_check` + `for_url` factory), `runtime/lane_backends.py::BACKEND_CLASSES` (dispatch table), `models/language/config.py` (profiles), `models/language/json_parser.py` (JSON repair) |
239244
| Memory | `memory/hippocampus.py`, `memory/concept_extractor.py`, `memory/store.py` (protocols), `memory/percept_trace_buffer.py` (τ-decay ring buffer) |
240-
| Causal learning | `decisions/nac.py` (reward bias, eligibility traces), `decisions/causal_link.py` (CausalLink, percept_refs) |
245+
| Causal learning | `decisions/nac.py` (reward bias, eligibility traces, distribute_reward), `decisions/causal_link.py` (CausalLink, percept_refs) |
241246
| Substrate encoding | `similarity/encoder.py` (LinguisticEncoder), `similarity/ec.py` (pattern_complete_or_separate, centroid update) |
242247
| Prompt composition | `prompts/assembler.py` (PromptAssembler, MemorySummary), `agents/prompt_builder.py` (legacy) |
243248
| Percept schema | `agents/percept_context.py` (PerceptContext), `agents/percept_factory.py` (factories), `agents/modality.py` (SensoryTag, SubstrateModality) |
@@ -250,7 +255,8 @@ Project structure is documented in [docs/reference.md](docs/reference.md).
250255
| DM campaigns | `simulation/dm_schema.py`, `simulation/dm_runtime.py` |
251256
| Benchmarks | `simulation/benchmark.py`, `simulation/validation.py` |
252257
| Research | `simulation/research_agents.py`, `simulation/research_orchestrator.py` |
253-
| Embodiment | `embodiment/sem.py`, `embodiment/body.py`, `embodiment/cerebellum.py`, `embodiment/motor.py` |
258+
| Valence | `memory/episode.py` (Episode.valence, apply_hebbian_on_close, salience_spike_rule), `agents/bus.py` (propagate_valence), `memory/hippocampus.py` (capture_reaction, include_valence) |
259+
| Embodiment | `embodiment/sem.py`, `embodiment/body.py`, `embodiment/cerebellum.py` (forward models), `embodiment/backends/cerebellum_modulator.py` (predict/fallback/train + success reactions), `embodiment/motor.py` |
254260
| Mesh | `mesh/identity.py`, `mesh/knowledge.py`, `mesh/task_delegation.py`, `mesh/clock.py` |
255261
| Lane tiers | `runtime/function_router.py`, `runtime/lane_models.py`, `runtime/lane_backends.py` |
256262
| Multi-agent | `runtime/agent_factory.py`, `runtime/agent_pool.py` |

docs/decisions.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -329,6 +329,14 @@ config = NACConfig(
329329
)
330330
```
331331

332+
### Reward Distribution (SEM Learning Loop)
333+
334+
`NAc.distribute_reward(agent_id, reward)` distributes a reward across eligible nodes via `credit_node()`. Eligibility traces are set by `update_eligibility()` when percepts pattern-complete to substrate nodes. The ReactionBus subscriber registered in `build_bio_stack` maps reactions to rewards:
335+
- `Valence.NEGATIVE` -- reward = -intensity (clamped to 0 in `credit_node()`, so the bias never narrows)
336+
- `Valence.POSITIVE` -- reward = +intensity (widens EC recognition radius)
337+
338+
`get_threshold_overrides(agent_id)` returns the per-node bias map for EC to use during `pattern_complete`.
339+
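The reward path described above can be sketched as follows. This is a simplified, hypothetical stand-in (`TinyNAc`), not the real `NAc` class: the storage layout, the default `max_reward_bias=0.2`, and the choice to clamp each increment at zero before adding it are assumptions. Only the method names and the clamp to `[0, max_reward_bias]` come from the docs in this commit.

```python
from collections import defaultdict


class TinyNAc:
    """Hypothetical, simplified stand-in for NAc reward distribution."""

    def __init__(self, max_reward_bias: float = 0.2) -> None:
        self.max_reward_bias = max_reward_bias
        self._eligibility = defaultdict(dict)  # agent_id -> {node_id: trace}
        self._reward_bias = defaultdict(dict)  # agent_id -> {node_id: bias}

    def update_eligibility(self, agent_id: str, node_id: str, trace: float = 1.0) -> None:
        self._eligibility[agent_id][node_id] = trace

    def credit_node(self, agent_id: str, node_id: str, reward: float) -> None:
        current = self._reward_bias[agent_id].get(node_id, 0.0)
        # Assumed: negative rewards contribute nothing, so bias never shrinks.
        increment = max(0.0, reward)
        self._reward_bias[agent_id][node_id] = min(self.max_reward_bias, current + increment)

    def distribute_reward(self, agent_id: str, reward: float) -> None:
        traces = self._eligibility[agent_id]
        total = sum(traces.values())
        if total <= 0.0:
            return  # nothing eligible, nothing to credit
        for node_id, trace in traces.items():
            # Each node receives a share proportional to its eligibility trace.
            self.credit_node(agent_id, node_id, reward * trace / total)

    def get_threshold_overrides(self, agent_id: str) -> dict:
        return dict(self._reward_bias[agent_id])


nac = TinyNAc()
nac.update_eligibility("agent-1", "sword", 0.75)
nac.update_eligibility("agent-1", "shield", 0.25)
nac.distribute_reward("agent-1", 0.1)    # positive reaction widens recognition
after_positive = nac.get_threshold_overrides("agent-1")
nac.distribute_reward("agent-1", -0.5)   # pain leaves the bias untouched
after_negative = nac.get_threshold_overrides("agent-1")
nac.distribute_reward("agent-1", 1.0)    # large reward saturates at the cap
after_large = nac.get_threshold_overrides("agent-1")
```

Under these assumptions a pain reaction is a no-op for the bias map, which matches the note that avoidance is carried by valence annotation on edges rather than by reward bias.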
332340
---
333341

334342
## Biological Inspiration

docs/embodiment_guide.md

Lines changed: 26 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -302,10 +302,34 @@ Executes motor programs step by step with:
302302
- **PainBus subscription** for mid-sequence interrupts
303303
- **Gate tightening** after painful executions (10% per failure)
304304

305+
## SEM Learning Loop (Phase 2 -- Shipped)
306+
307+
When a SEM entity interaction produces a reaction (pain on failure, satisfaction on confident prediction), the signal flows through the full bio-pipeline:
308+
309+
1. **CerebellumModulator** executes the affordance and emits a failure reaction (NEGATIVE) or a success reaction (POSITIVE)
310+
2. **ReactionBus** dispatches to subscribers:
311+
   - `hippocampus.capture_reaction` -- episode valence annotation
312+
   - `nac.distribute_reward` -- EC threshold adjustment
313+
3. **Episode close** -- `apply_hebbian_on_close` annotates edges with `metadata["valence"]`
314+
4. **Pain spike** -- `salience_spike_rule` closes the episode boundary
315+
5. **Future retrieval** -- `spreading_activation(propagate_valence=True)` carries affective memory
316+
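Steps 1 to 3 above can be sketched with a toy bus. This is an illustration under stated assumptions, not the project's `ReactionBus`: `TinyReactionBus`, the `Reaction` fields, and `reaction_to_reward` are hypothetical names, and the real bus dispatches per reaction kind rather than to a flat subscriber list. The mean-of-reactions rule for episode valence follows the signal flow diagram in the HTML guide from this commit.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Callable, List


class Valence(Enum):
    POSITIVE = 1
    NEGATIVE = -1


@dataclass(frozen=True)
class Reaction:
    """Hypothetical minimal reaction: a sign plus a magnitude."""
    valence: Valence
    intensity: float


class TinyReactionBus:
    """Toy bus: calls every subscriber, in order, for each reaction."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Reaction], None]] = []

    def subscribe(self, handler: Callable[[Reaction], None]) -> None:
        self._subscribers.append(handler)

    def dispatch(self, reaction: Reaction) -> None:
        for handler in self._subscribers:
            handler(reaction)


def reaction_to_reward(reaction: Reaction) -> float:
    # NEGATIVE maps to -intensity, POSITIVE to +intensity.
    return reaction.valence.value * reaction.intensity


captured: List[Reaction] = []   # stands in for hippocampus.capture_reaction
rewards: List[float] = []       # stands in for nac.distribute_reward

bus = TinyReactionBus()
bus.subscribe(captured.append)
bus.subscribe(lambda r: rewards.append(reaction_to_reward(r)))

bus.dispatch(Reaction(Valence.NEGATIVE, 0.4))  # failure reaction
bus.dispatch(Reaction(Valence.POSITIVE, 0.2))  # success reaction

# At episode close, valence is the mean of the signed reactions.
episode_valence = mean(reaction_to_reward(r) for r in captured)
```

Because both subscribers see the same reaction, memory annotation and reward distribution stay in step without the emitter knowing about either.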
317+
### Success reactions (negativity bias)
318+
319+
CerebellumModulator calls `_emit_success_reaction` when it is confident enough to skip LLM fallback. Success intensity is lower than failure intensity (0.1-0.3 vs 0.3-0.5) -- a biologically motivated negativity bias.
320+
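The decision of when to skip LLM fallback can be sketched as a three-way classification. This is a hedged illustration only: `classify_outcome`, `Outcome`, and the `max_variance=0.1` cutoff are hypothetical. The 0.3 confidence threshold and the Rescorla-Wagner training step are taken from the HTML embodiment guide in this commit, which says only "low variance" without giving a number.

```python
from enum import Enum


class Outcome(Enum):
    CONFIDENT = "confident_prediction"  # emits a success reaction
    LLM_FALLBACK = "llm_fallback"       # trains on the LLM answer, no reaction
    FAILURE = "failure"                 # emits a pain reaction


def classify_outcome(
    confidence: float,
    variance: float,
    failure_fired: bool,
    min_confidence: float = 0.3,
    max_variance: float = 0.1,  # assumed cutoff for "low variance"
) -> Outcome:
    if failure_fired:
        return Outcome.FAILURE
    if confidence >= min_confidence and variance <= max_variance:
        return Outcome.CONFIDENT
    return Outcome.LLM_FALLBACK


def rescorla_wagner_update(value: float, target: float, learning_rate: float) -> float:
    """Standard delta rule: move the estimate toward the target by a fraction of the error."""
    return value + learning_rate * (target - value)


early = classify_outcome(confidence=0.1, variance=0.05, failure_fired=False)
later = classify_outcome(confidence=0.6, variance=0.05, failure_fired=False)
broken = classify_outcome(confidence=0.9, variance=0.01, failure_fired=True)
trained = rescorla_wagner_update(value=0.0, target=1.0, learning_rate=0.5)
```

Checking for a fired failure mode first is a design choice in this sketch: a confident prediction should not mask an execution that actually failed.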
321+
### NAc reward distribution
322+
323+
`distribute_reward` credits eligible substrate nodes in proportion to their eligibility traces. Positive rewards widen EC recognition (lower threshold); negative rewards clamp to 0 (bias never narrows).
324+
325+
### Cerebellum activation in production
326+
327+
`BioStack.cerebellum` is now constructed by `build_bio_stack` and forwarded via `build_executor(cerebellum=...)` to `generate_tools_for_entity`, which creates `CerebellumModulator` instances with a wired `reaction_bus`. This means every SEM affordance tool now has a live Cerebellum backing it -- predictions, training, and reaction emission all happen automatically.
328+
305329
## What's Next
306330

307-
- **Phase 2**: Composable failure modes persistent failures with recovery conditions
308-
- **Phase 3**: Hardware adapter wrap real robot SDKs as SEM backends
331+
- **Phase 2**: Composable failure modes -- persistent failures with recovery conditions
332+
- **Phase 3**: Hardware adapter -- wrap real robot SDKs as SEM backends
309333

310334
See [embodiment_core_plan.md](plans/archive/embodiment_core_plan.md) (archived) for the historical roadmap.
311335

docs/reference.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,8 @@ Agents -> Planning -> Decision Engine -> Runtime -> Executor -> Tools -> Environ
3232
| `src/maxim/mesh/` | Simulation-only: `bus`, `identity`, `message`, `naming` (R0 deleted the dead agent-mesh subsystem; see "Removed in R0" below) |
3333
| `src/maxim/simulation/` | Simulation modes, generative campaigns, research protocol, benchmarks |
3434
| `src/maxim/integration/` | MemoryHub cross-system coordinator (11 bio-systems) |
35-
| `src/maxim/decisions/` | NAc causal learning, adaptive planner |
35+
| `src/maxim/reactions/` | Reaction types, ReactionBus (per-kind dispatch), PerceptProducer/ReactionProducer protocols |
36+
| `src/maxim/decisions/` | NAc causal learning, adaptive planner, reward distribution |
3637
| `src/maxim/time/` | SCN temporal rhythm indexing |
3738
| `src/maxim/similarity/` | Entorhinal Cortex (pattern completion, centroid update) + LinguisticEncoder (P1) + ConceptDecomposer (noun-phrase extraction before EC) |
3839
| `src/maxim/prompts/` | PromptAssembler (B1), MemorySummary, prompt profiles |
@@ -122,13 +123,15 @@ Maxim uses neuroscience-inspired names. Here is the translation:
122123
|----------|--------------|--------|--------------|
123124
| Hippocampus | Episodic memory | `memory/` | Stores and recalls experiences (events, conversations) |
124125
| ATL | Semantic memory | `memory/` | Extracts concepts, categories, and generalizations |
125-
| NAc | Reward / causal learning | `decisions/` | Learns cause-and-effect relationships ("what leads to what") |
126+
| NAc | Reward / causal learning | `decisions/` | Learns cause-and-effect relationships ("what leads to what"). `distribute_reward` now wired via ReactionBus subscriber in `build_bio_stack` |
126127
| SCN | Internal clock | `time/` | Tracks circadian-like temporal patterns and rhythms |
127128
| EC | Memory indexing + substrate recognition | `similarity/` | Routes queries via similarity; pattern_complete_or_separate for substrate nodes (P1) |
128129
| Angular Gyrus | Cross-modal algebra | `math/` | Combines memories across different modalities |
129-
| Cerebellum | Motor prediction | `embodiment/` | Predicts outcomes of physical actions, learns motor programs |
130+
| Cerebellum | Motor prediction | `embodiment/` | Predicts outcomes of physical actions, learns motor programs. Now activated in production via `BioStack.cerebellum` and `build_executor(cerebellum=...)` |
130131
| Amygdala / Fear | Threat detection | `proprioception/` | Detects harm, triggers pain signals, gates risky actions |
131132
| Default Network | Reactive behavior | `default_network/` | Background processing, idle behaviors, spontaneous thoughts |
133+
| Valence | Affective edge signal | `memory/episode.py` | Affective signal on Hebbian edges (`Edge.metadata["valence"]`), computed from Reactions at episode close via `apply_hebbian_on_close`. Propagated by `spreading_activation(propagate_valence=True)` |
134+
| Episode Boundary Rules | Pluggable boundary detection | `memory/episode.py` | `BoundaryRule` callables on `EpisodeBoundaryDetector`. Defaults: tick gap, channel change, scn_tag change. New: `salience_spike_rule(min_intensity=0.5)` triggers boundary on pain/salience spikes via `CaptureEvent.salience_spike` |
132135
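The pluggable boundary rules in the glossary row above can be sketched as plain callables. This is a minimal sketch under assumptions: the `(previous, current) -> bool` rule signature, the `CaptureEvent` fields other than `salience_spike`, and the `tick_gap_rule` and `should_close` helpers are hypothetical. Only `salience_spike_rule(min_intensity=0.5)` and `CaptureEvent.salience_spike` are named in the docs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class CaptureEvent:
    """Hypothetical minimal capture event."""
    tick: int
    salience_spike: Optional[float] = None  # spike intensity, if any


# Assumed rule shape: return True to close the current episode.
BoundaryRule = Callable[[CaptureEvent, CaptureEvent], bool]


def salience_spike_rule(min_intensity: float = 0.5) -> BoundaryRule:
    def rule(previous: CaptureEvent, current: CaptureEvent) -> bool:
        spike = current.salience_spike
        return spike is not None and spike >= min_intensity
    return rule


def tick_gap_rule(max_gap: int = 10) -> BoundaryRule:
    def rule(previous: CaptureEvent, current: CaptureEvent) -> bool:
        return current.tick - previous.tick > max_gap
    return rule


def should_close(rules: Iterable[BoundaryRule], previous: CaptureEvent, current: CaptureEvent) -> bool:
    # An episode closes as soon as any rule fires.
    return any(rule(previous, current) for rule in rules)


rules = [tick_gap_rule(max_gap=10), salience_spike_rule(min_intensity=0.5)]
calm = should_close(rules, CaptureEvent(tick=1), CaptureEvent(tick=2))
mild = should_close(rules, CaptureEvent(tick=2), CaptureEvent(tick=3, salience_spike=0.3))
pain = should_close(rules, CaptureEvent(tick=3), CaptureEvent(tick=4, salience_spike=0.5))
gap = should_close(rules, CaptureEvent(tick=4), CaptureEvent(tick=20))
```

Treating each rule as an independent callable means a new trigger, such as the salience spike, can be added without touching the existing defaults.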

133136
---
134137

docs/user/concept-decomposition.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -134,3 +134,7 @@ node_ids = encoder.encode_decomposed(
134134
- **English only** (Stage 1). spaCy `en_core_web_sm` is English-only. Multi-language support requires a multilingual model (`xx_ent_wiki_sm`) or per-language model selection.
135135
- **Short fragments** may over-decompose. The `min_chunk_len` filter helps, but domain-specific inputs may need a custom strategy.
136136
- **No relation tagging yet** (Stage 2). Chunks are bound with untagged Hebbian edges. Role-tagged edges (`relation="spatial"`) are planned for a future stage.
137+
138+
## Connection to Valence Annotation
139+
140+
With concept decomposition enabled, valence annotation targets individual concept nodes ("rusty sword", "heavy", "sharp") rather than whole-sentence blobs. This means the agent learns "rusty sword is associated with pain" rather than "the entire sentence about picking up a rusty sword is painful." See [valence_annotation_poc.md](../experiments/valence_annotation_poc.md) for the demonstration.

htmls-guides/maxim-embodiment.html

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,74 @@ <h3 class="text-lg font-semibold text-emerald-400 mb-2">Biological Inspiration</
122122
<p class="text-slate-300 leading-relaxed">The LLM is a <strong class="text-white">teacher</strong>, not a per-tick oracle. After enough observations, the Cerebellum handles predictions deterministically. In testing, LLM calls drop from 100 to &le;40 over 100 actions.</p>
123123
</section>
124124

125+
<!-- SEM Learning Loop -->
126+
<section class="space-y-4">
127+
<h2 id="sem-learning-loop" class="text-2xl font-semibold text-white border-b border-indigo-500/50 pb-2 scroll-mt-32">The SEM Learning Loop</h2>
128+
129+
<div class="bg-slate-800 border-l-4 border-emerald-500 rounded-xl p-6">
130+
<h3 class="text-lg font-semibold text-emerald-400 mb-2">Biological Inspiration</h3>
131+
<p class="text-slate-300">In the brain, the cerebellum doesn't just predict &mdash; it emits signals when predictions fail <em>or</em> succeed. These signals propagate to the hippocampus (contextual memory) and nucleus accumbens (reward learning) simultaneously, closing the loop between motor execution and long-term behavioral adaptation. Success and failure are not symmetric: negative outcomes carry disproportionate weight, a phenomenon known as <em>negativity bias</em>.</p>
132+
</div>
133+
134+
<p class="text-slate-300 leading-relaxed">When the Cerebellum evaluates an affordance, the outcome flows through the bio-pipeline as a <em class="text-indigo-400 not-italic font-medium">reaction</em> &mdash; a typed evaluative signal that drives learning across multiple systems simultaneously. This is the SEM Learning Loop.</p>
135+
136+
<h3 class="text-white font-semibold mt-4">CerebellumModulator Outcomes</h3>
137+
<p class="text-slate-300 leading-relaxed">The <code class="bg-slate-950 px-1 rounded text-xs">CerebellumModulator</code> classifies each affordance execution into one of three outcome paths:</p>
138+
139+
<div class="grid md:grid-cols-3 gap-4">
140+
<div class="bg-slate-800 border border-slate-700 rounded-xl p-5">
141+
<h4 class="text-emerald-400 font-semibold mb-2">Confident Prediction</h4>
142+
<p class="text-slate-300 text-sm">Confidence &ge; 0.3 and low variance. The Cerebellum handles the prediction without LLM fallback. Emits a <strong class="text-white">success reaction</strong> with positive valence.</p>
143+
</div>
144+
<div class="bg-slate-800 border border-slate-700 rounded-xl p-5">
145+
<h4 class="text-amber-400 font-semibold mb-2">LLM Fallback</h4>
146+
<p class="text-slate-300 text-sm">Confidence &lt; 0.3 or high variance. The LLM acts as teacher. The Cerebellum trains on the LLM's response via Rescorla-Wagner update. No reaction emitted &mdash; the system is still learning.</p>
147+
</div>
148+
<div class="bg-slate-800 border border-slate-700 rounded-xl p-5">
149+
<h4 class="text-red-400 font-semibold mb-2">Failure</h4>
150+
<p class="text-slate-300 text-sm">Affordance execution triggers a failure mode (e.g., shatter, overheat). Emits a <strong class="text-white">pain reaction</strong> with negative valence via the PainBus.</p>
151+
</div>
152+
</div>
153+
154+
<h3 class="text-white font-semibold mt-6">Signal Flow</h3>
155+
<p class="text-slate-300 leading-relaxed">Both success and pain reactions flow through the <code class="bg-slate-950 px-1 rounded text-xs">ReactionBus</code>, which dispatches to two subscribers in parallel:</p>
156+
157+
<div class="bg-slate-950 rounded-lg p-5 font-mono text-sm text-slate-300 overflow-x-auto whitespace-pre"><span class="text-indigo-400 text-xs uppercase tracking-wider">SEM Learning Loop Signal Flow</span>
158+
159+
CerebellumModulator
160+
|
161+
|-- confident prediction --&rarr; _emit_success_reaction(valence=+0.3..+1.0)
162+
|-- failure mode fired --&rarr; PainBus.publish(PainSignal) --&rarr; pain_signal_to_reaction
163+
|
164+
&darr;
165+
ReactionBus.dispatch(reaction)
166+
|
167+
|-- Subscriber 1: Hippocampus
168+
| Episode captures the reaction context
169+
| Episode.valence set at finalize (mean of all reactions)
170+
| Hebbian edges annotated with Edge.metadata["valence"]
171+
|
172+
|-- Subscriber 2: NAc (via distribute_reward)
173+
| Adjusts per-node reward_bias in the Hebbian graph
174+
      |     EC recognition shifts: positive &rarr; wider (lower threshold), negative &rarr; clamped to 0 (never narrower)
175+
|
176+
&darr;
177+
Episode boundary: salience_spike_rule
178+
      Pain spike (intensity &ge; 0.5, the salience_spike_rule default) forces episode close
179+
New episode starts with clean slate
180+
|
181+
&darr;
182+
Future retrieval: retrieve_on_cue(include_valence=True)
183+
Spreading activation propagates edge valence
184+
Recalled memories carry affective coloring</div>
185+
186+
<h3 class="text-white font-semibold mt-6">Negativity Bias</h3>
187+
<p class="text-slate-300 leading-relaxed">Success reactions carry positive valence but at lower intensity than pain reactions &mdash; mirroring biological negativity bias. A single painful failure creates a stronger learning signal than several routine successes. This asymmetry means the agent develops caution around dangerous affordances faster than it develops confidence around safe ones, which is the correct survival trade-off for an embodied system.</p>
188+
189+
<h3 class="text-white font-semibold mt-6">Cerebellum Activation via BioStack</h3>
190+
<p class="text-slate-300 leading-relaxed">In production, the Cerebellum is wired through <code class="bg-slate-950 px-1 rounded text-xs">BioStack.cerebellum</code> and activated by <code class="bg-slate-950 px-1 rounded text-xs">build_executor</code>. This means every agent entry point that constructs a bio-stack gets Cerebellum forward models and the full SEM Learning Loop automatically &mdash; no per-caller wiring required.</p>
191+
</section>
192+
125193
<!-- Motor Programs -->
126194
<section class="space-y-4">
127195
<h2 id="motor-programs" class="text-2xl font-semibold text-white border-b border-indigo-500/50 pb-2 scroll-mt-32">Motor Programs</h2>
