Commit 3953741

dennys246 and claude committed
docs: clean up plans/README.md -- condense shipped entries, add missing shell plans
Collapsed verbose per-stage Plan 4 entries into compact shipped summaries. Added 5 root plan files that were missing from the index (reactive mesh roadmap, cross-platform file lock, mesh doc transport, pain bus bridge unification, node security simplification). Biosystem unification entry condensed from 7 lines to 1 (all waves archived).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 667e0d1 commit 3953741

1 file changed

Lines changed: 12 additions & 29 deletions

docs/plans/README.md

@@ -36,37 +36,20 @@ These accumulate evidence and refinement over time. They are not on the critical
36 36

37 37
- [substrate_concept_decomposition.md](substrate_concept_decomposition.md) **Stage 1 COMPLETE + VALIDATED** (2026-04-17). Protocol-based noun-phrase extraction. 100% concept-level recall vs 36.4% baseline. Stage 2 (role-tagged edges) pending.
38 38
- [substrate_episode_boundary_enrichment.md](substrate_episode_boundary_enrichment.md) **PARTIAL** (2026-04-17). Stage 3 (pain/salience spike) SHIPPED via sem_learning_loop.md. `observe_episode_event` now wired into production agent loop via behavioral_convergence_wiring.md. Stages 1-2 (tool execution + semantic shift) remain — ship before P5.
39-
- [biosystem_unification.md](biosystem_unification.md) **central tracking doc** (2026-04-14, updated 2026-04-17) for the bio-system structural-enforcement work. Waves 0-2 **ALL SHIPPED + ARCHIVED**. Wave 3 (bio_stack umbrella) committed on `feat/bio-stack-unification`, PR pending. Wave 4 (agent_factory_canonicalization) not scheduled.
40-
- [archive/executor_bootstrap_unification.md](archive/executor_bootstrap_unification.md) — ✅ Wave 0 SHIPPED (PR #114). `build_executor(*, pain_bus)`.
41-
- [archive/pain_bus_unification.md](archive/pain_bus_unification.md) — ✅ Wave 1 SHIPPED (PR #125). `build_pain_bus(*, hippocampus, nac)`.
42-
- [archive/reaction_bus_unification.md](archive/reaction_bus_unification.md) — ✅ Wave 1 SHIPPED (PR #134). `build_reaction_bus(*)`.
43-
- [archive/memory_hub_unification.md](archive/memory_hub_unification.md) — ✅ Wave 2 SHIPPED (PR #136). `build_memory_hub(*, hippocampus, scn, nac, ec)`.
44-
- [archive/default_network_unification.md](archive/default_network_unification.md) — ✅ Wave 2 SHIPPED (PR #135). `build_default_network(*, nac)`.
45-
- [archive/bio_stack_unification.md](archive/bio_stack_unification.md) — ✅ Wave 3 SHIPPED (PR #140). `build_bio_stack(*, persistence_dir)`. 4 sites migrated.
39+
- [biosystem_unification.md](biosystem_unification.md) **central tracking doc** (2026-04-14, updated 2026-04-17). Waves 0-3 **ALL SHIPPED + ARCHIVED**. Wave 4 (agent_factory_canonicalization) not scheduled.
46 40
- [tool_refinement_plan.md](tool_refinement_plan.md) — living doc for agent tool surface curation
47 41
- [agent_factory_canonicalization.md](agent_factory_canonicalization.md) **RUNNING DOC, not scheduled** (2026-04-14). The Option D follow-up to `executor_bootstrap_unification.md` — make `AgentFactory.create_agent` the only door for constructing an agent in Maxim. Becomes a downhill rewrite once Wave 3 `build_bio_stack` merges. Subsumes `sem_execution_hook.md` Stage 2b. Trigger conditions documented inline.
48-
- [node_security_simplification.md](node_security_simplification.md) — Phase 1 immediate security fixes (timing-safe auth comparison, rate-limiter bucket key, help-text corrections). Phase 2 config-surface unification deferred after Plan 4.
49-
- [llm_path_refinement.md](llm_path_refinement.md) — meta-plan for the LLM routing path refactor. Motivated by two 2026-04-12 peer-leader incidents + an audit that revealed `_OpenAIBackend` has a hidden ~52s retry loop. **Ships as the 0.4 stability version.** Plans 1, 2, 3, 3.5 fully shipped and archived; Plan 3.6 R5 (VRAM spillover detection) shipped; Plan 4 Stages A+B (agent_id observability + recovery-time bench) shipped; **Plan 4 Stage C1+C2+C3.1 shipped** (`mesh.yml` schema + `list-nodes` + drain/resume + `init-mesh`); **C3.2+C3.3+C3.4 shipped** (`add-node`/`remove-node`, `--node install`, `/v1/debug/vram` VRAM endpoint); C3.5/C3.6 + C4 (router-drain coupling) remain in scope. Authoritative architecture reference at [../architecture/llm_routing.md](../architecture/llm_routing.md); stress test protocol at [../experiments/protocols/llm_path_stress_test.md](../experiments/protocols/llm_path_stress_test.md).
50-
- **✅ Plan 1 (Foundation) — SHIPPED, ARCHIVED** [archive/llm_path_foundation.md](archive/llm_path_foundation.md). R0 deleted ~1,250 LOC dead mesh (commit `e811787`); R1 shipped `maxim/utils/http.py` with endpoint registry + typed `HTTPError` + `RequestContext` contextvars + `X-Maxim-*` header propagation (PRs #88, #90, #91). See [project_llm_path_r1_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_llm_path_r1_shipped.md).
51-
- **✅ Plan 2 (Typed Errors + Role Detection) — SHIPPED, ARCHIVED** [archive/llm_path_typed_errors.md](archive/llm_path_typed_errors.md). R2a-d: role detection at CLI boot, typed `BackendError` hierarchy with `.fix_hint`, two-stage probe, SSRF moved to `utils/net.py` (PRs #92, #93). See [project_llm_path_r2_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_llm_path_r2_shipped.md).
52-
- **✅ Plan 3 (Fast Failover) — SHIPPED, ARCHIVED** [archive/llm_path_fast_failover.md](archive/llm_path_fast_failover.md). R2.5 `_MaximPeerBackend` purpose-built single-HTTP-call backend + router typed-exception dispatch + `BACKEND_CLASSES`; R2.6 probe consolidation. **The 52s fail-slow is dead.** PR #94, commit `ce5f034`. Programmatic gate: < 5s p99 against mocked-dead-peer fixture. See [project_llm_path_r3_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_llm_path_r3_shipped.md) for the 10 load-bearing invariants.
53-
- **✅ Plan 3.5 (Cancellation Hygiene) — SHIPPED, ARCHIVED** [archive/llm_path_cancellation_hygiene.md](archive/llm_path_cancellation_hygiene.md). R1-R6: cooperative cancellation primitives in `maxim/utils/cancellation.py` + "HTTP fires first" timeout contract (HTTP authoritative at 300s, agent layer strict safety net above). PR #96, commit `6a4f505`. See [project_llm_path_cancellation_hygiene_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_llm_path_cancellation_hygiene_shipped.md).
54-
- [llm_path_peer_failover.md](llm_path_peer_failover.md) **Plan 3.6: Peer Failover — PARTIAL SHIP (2026-04-14).** **R5 VRAM spillover detection ✅ SHIPPED** (PR #99, commit `2884e58`): doctor `check_vram_pressure` + spawn-time `_check_vram_spillover_risk` + shared `project_vram_usage` math + fix for pre-existing `check_llm_model_active` mutable-global bug. Dynamic headroom `max(1.5, 0.55 × weights_gb)` calibrated to the 2026-04-13 incident. R1–R4 (multi-leader `peer.yml`) **remain draft** — on hold until the user's second GPU comes online. See [project_vram_spillover_detection_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_vram_spillover_detection_shipped.md) for the 5 R5 load-bearing invariants.
55-
- [llm_path_operator_visibility.md](llm_path_operator_visibility.md) **Plan 4: Operator Visibility — PARTIAL SHIP (2026-04-14).** Split into three sequential stages:
56-
- **✅ Stage A — agent_id observability fix** (PR in review on `feat/llm-path-operator-visibility`). Three complementary changes close the Phase D observability gap: router capability-flag kwarg forwarding, `set_context` boundary binding in `LLMWorker._call_llm_with_timeout`, and contextvar fallback in `_normalize_request_context`. 11 new regression tests.
57-
- **✅ Stage B — recovery-time bench harness** (same PR). New `maxim bench recovery-time` CLI subcommand at `src/maxim/bench/` (NOT `benchmark/` — name collision with `maxim.api.benchmark` public verb). Uses `_MaximPeerBackend` directly to measure peer recovery without sim-cadence workload artifacts. 21 new tests. **Phase D2 hardware validation:** 58.68s recovery window on real RTX 5080 (matches 53s leader self-report + ~5s proxy gap), 750/750 `agent_id` coverage, 199/199 typed `BackendDown` failures, fast-fail p99=614ms. See [llm_path_stress_plan4_20260414.md](../experiments/results/llm_path_stress_plan4_20260414.md) and [bench_recovery_time_rerun.md](../experiments/protocols/bench_recovery_time_rerun.md). See [project_llm_path_operator_visibility_ab_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_llm_path_operator_visibility_ab_shipped.md) for the 8 load-bearing invariants.
58-
- **✅ Stage C1 — `mesh.yml` schema + `list-nodes` + `--node status|health`** (PR #108, merged 2026-04-14). Hand-rolled `mesh.yml` parser (FROZEN dialect — no PyYAML), `peer/probe_classify.py` shared classifier (single source of truth for probe outcome → CheckResult mapping across mesh_cli + doctor), peer.yml→mesh fallback for zero-breaking-change. 2 review rounds caught 31 findings incl. silent `cluster_key` `#` truncation. See [project_plan4_c1_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_plan4_c1_shipped.md).
59-
- **✅ Stage C2 — drain/resume with runtime state layer** (PR #113, merged 2026-04-14). `~/.maxim/util/drained_nodes.{role}.txt` with `filelock.FileLock` cross-process serialization. Pivoted from Option A1 (config-only drain + TOML migration) to Option B (runtime state) after pre-design review caught 3 criticals. Pre-merge review caught 15 findings incl. 1 triple-confirmed orphan retry_id bug. Renamed in-house `maxim/utils/filelock.py` → `process_lock.py` to avoid name collision with 3rd-party. New `atomic_write_secret` wrapper. See [project_plan4_c2_shipped.md](../../.claude/projects/-Users-dennyschaedig-Scripts-Maxim/memory/project_plan4_c2_shipped.md).
60-
- **✅ Stage C3.1 — `init-mesh` verb + `mesh.yml` writer infrastructure** (PR #118, merged 2026-04-14). `MeshConfig.to_yaml()` round-trip serializer + `write_mesh_config()` disk-I/O wrapper using `atomic_write_secret`. Strict CI grep allow-list enforces `write_mesh_config` callers (only `mesh_setup.py` + tests). Pre-merge review caught 14 findings incl. 6 cross-confirmed. See `project_plan4_c3.1_shipped.md` (will be added to memory).
61-
- **🚧 Stage C3.2 — `add-node` + `remove-node` verbs** (PR pending, branch `feat/plan4-c3.2-add-remove-node`, fold complete). Closes the gap C3.1 left open: operators can now grow/shrink `mesh.yml::nodes` from the CLI without hand-editing. Renamed `init_mesh.py` → `mesh_setup.py` to group the 3 setup verbs. `MeshConfig.__post_init__` now validates `self_name in nodes` (hoisted from parser per A1 cross-confirmed fold). Pre-merge review caught 16 findings incl. 4 cross-confirmed. See `project_plan4_c3.2_shipped.md` (will be added to memory).
62-
- **Stage C3 remaining (DEFERRED):** `--node install` + VRAM precheck, `--node refresh`, `/v1/mesh/*` admin API, per-agent rate limiting, request-trace ring buffer, cluster key rotation. The full scope is still in [llm_path_operator_visibility.md](llm_path_operator_visibility.md) under "Phases".
63-
- Deferred shell plans (revive on stress-test-defined triggers):
64-
- [deferred/llm_path_multi_peer_dispatch.md](deferred/llm_path_multi_peer_dispatch.md) — multi-peer reactive overflow with rendezvous-hash distribution. **Partially triggered (2026-04-13)** by the user's RTX 3070 hardware; awaiting Plan 3.6 R1-R4 + Plan 4 Stage C ship.
65-
- [deferred/llm_mesh_capability_aware.md](deferred/llm_mesh_capability_aware.md) — capability advertisement + capability-aware router ranking. Revive when ≥2 nodes serve **different** loaded models.
66-
- [deferred/llm_path_async_router.md](deferred/llm_path_async_router.md) — async router if `_inference_lock` becomes the bottleneck
67-
- [deferred/llm_path_fair_scheduling.md](deferred/llm_path_fair_scheduling.md) — bio-inspired priority classes + fair-share
68-
69-
**Long-term mesh roadmap** (current state → true reactive mesh): see the "Long-term roadmap" section in [llm_path_refinement.md](llm_path_refinement.md). Five concrete steps from leader/peer to peer-to-peer mesh with leader election.
42+
- [node_security_simplification.md](node_security_simplification.md) — Phase 1 ✅ SHIPPED. Phase 2 config-surface unification deferred.
43+
- [reactive_peer_mesh_roadmap.md](reactive_peer_mesh_roadmap.md) — living roadmap for the full reactive peer mesh arc (C3→C9). C3-C4.6 COMPLETE. C5+ remain.
44+
- [cross_platform_file_lock.md](cross_platform_file_lock.md) — shell plan to unify `utils/process_lock` and `filelock.FileLock`. Blocks nothing.
45+
- [mesh_doc_transport.md](mesh_doc_transport.md) — shell plan for mesh-to-mesh structured doc exchange (C9). Not started.
46+
- [pain_bus_bridge_subscriber_unification.md](pain_bus_bridge_subscriber_unification.md) — shell plan for bridge×subscriber attribution-asymmetry fix. Not started.
47+
- [llm_path_refinement.md](llm_path_refinement.md) — meta-plan for the LLM routing path refactor. Plans 1-3.5 archived; Plan 3.6 R5 shipped; **Plan 4 Stages A+B + C1-C3.6 + C4+C4.5+C4.6 ALL SHIPPED.** Reactive mesh self-healing loop complete. Only stress phases B/C/E remain in scope. Architecture ref: [../architecture/llm_routing.md](../architecture/llm_routing.md).
48+
- [llm_path_peer_failover.md](llm_path_peer_failover.md) — Plan 3.6 R5 (VRAM spillover) ✅ SHIPPED. R1-R4 (multi-leader) remain draft, on hold until second GPU.
49+
- [llm_path_operator_visibility.md](llm_path_operator_visibility.md) — Plan 4. **Core stages ALL SHIPPED** (A, B, C1-C3.6, C4, C4.5, C4.6). Remaining deferred scope (admin API, rate limiting, key rotation) tracked in [reactive_peer_mesh_roadmap.md](reactive_peer_mesh_roadmap.md) as C6/C7.
50+
- Deferred: [deferred/llm_path_multi_peer_dispatch.md](deferred/llm_path_multi_peer_dispatch.md), [deferred/llm_mesh_capability_aware.md](deferred/llm_mesh_capability_aware.md), [deferred/llm_path_async_router.md](deferred/llm_path_async_router.md), [deferred/llm_path_fair_scheduling.md](deferred/llm_path_fair_scheduling.md)
51+
52+
**Long-term mesh roadmap**: [reactive_peer_mesh_roadmap.md](reactive_peer_mesh_roadmap.md). Stages C3-C4.6 complete; C5 (capacity-aware routing), C6 (admin API), C7 (security hardening) remain.
70 53

71 54
## Deferred (post-1.0, revive on trigger)
72 55
