kunalsuri · kunalsuri · Jul 3, 2026 · Jul 3, 2026 · Jul 3, 2026 · Jul 3, 2026
diff --git a/ai/INDEX.md b/ai/INDEX.md
@@ -28,6 +28,7 @@ change, update this one file.
 | Feature specs | `ai/lab/specs/` | human + AI draft | per feature |
 | Evaluations | `ai/lab/evaluations/` | human | post-ship |
 | Experiments | `ai/lab/experiments/` | human + AI | when trying new agent approaches |
+| Lessons learnt (kit dev) | `docs/dev/lessons-learnt/` | human + AI draft | when questioning the ai/ design |
 | Install manifest | `ai/install-manifest.json` | installer | uninstall only |
 | Maturity report | `ai/analysis/audit-reports/MATURITY_REPORT.json` | `check-repo-maturity` | on demand |
 | Drift report | `ai/analysis/audit-reports/DRIFT_REPORT.md` | `drift` | on demand |

diff --git a/ai/lab/README.md b/ai/lab/README.md
@@ -4,6 +4,24 @@
 The strategic layer: *how* we build and *what we learned* — not code, not navigation.
 Loaded when planning or reviewing, not on every agent session.
 
+## Why this folder matters `[inferred]`
+
+`ai/lab/` holds the only knowledge in `ai/` that **cannot be regenerated**. The maps
+(`ai/guide/`) and analyses (`ai/analysis/`) describe *what the repo is* — if they were
+lost, `/cold-start` could rebuild them from the code and a human would re-verify.
+`lab/` records *why the repo is the way it is and what was learned building it*:
+decisions, trade-offs, failed approaches, retrospectives, and resolved design
+questions. No amount of re-crawling the code recovers a *reason* — if this folder is
+lost, that knowledge is gone forever.
+
+This is also what agents need to work safely: the maps tell an agent *where* code is;
+`lab/` tells it which "weird" code is deliberate and must not be "fixed", and which
+plans are current. Without `lab/`, the `ai/` layer is a navigation tool; with it, it
+is a knowledge base. In memory terms: `guide/` is the repo's semantic memory (facts),
+`lab/` is its episodic memory (experience) — see
+[docs/dev/lessons-learnt/knowledge-kinds-memory-context-and-harness.md](../../docs/dev/lessons-learnt/knowledge-kinds-memory-context-and-harness.md)
+for the full rationale.
+
 | Folder | Contains | Who writes it |
 |---|---|---|
 | `specs/` | One spec per planned/in-progress feature | Human + AI draft |

diff --git a/docs/README.md b/docs/README.md
@@ -38,7 +38,7 @@ Everything about *using* the kit and *understanding* the method lives here.
 - [METHODOLOGY.md](METHODOLOGY.md) — the trust model (`[inferred]` → `[verified]`), Process 1 vs 2, the 7-step workflow, how the map stays honest over time.
 - [PROBLEM-SOLUTION-STATEMENT.md](PROBLEM-SOLUTION-STATEMENT.md) — the one-page problem framing.
 - [reports/technical-report-draft.md](reports/technical-report-draft.md) — the academic treatment (draft).
-- [dev/lessons-learnt/](dev/lessons-learnt/drift-blindspots-and-automation-bias.md) — recorded lessons, e.g. drift blind spots and automation bias, and [model tiering](dev/lessons-learnt/model-tiering-plan-heavy-implement-light.md) (plan with a heavy model, implement with a light one).
+- [dev/lessons-learnt/](dev/lessons-learnt/drift-blindspots-and-automation-bias.md) — recorded lessons, e.g. drift blind spots and automation bias, [model tiering](dev/lessons-learnt/model-tiering-plan-heavy-implement-light.md) (plan with a heavy model, implement with a light one), and [two kinds of knowledge](dev/lessons-learnt/knowledge-kinds-memory-context-and-harness.md) (why `ai/lab/` belongs in a knowledge repo; memory, context, and the harness).
 
 **Maintain:**
 - [RELEASE-CHECKLIST.md](RELEASE-CHECKLIST.md) — release-day procedure (tagging, Zenodo, post-release).

diff --git a/docs/dev/lessons-learnt/knowledge-kinds-memory-context-and-harness.md b/docs/dev/lessons-learnt/knowledge-kinds-memory-context-and-harness.md
@@ -0,0 +1,195 @@
+<!-- Copyright (c) 2026 Kunal Suri (CEA LIST). All rights reserved. -->
+# Two Kinds of Knowledge: Memory, Context, and the Harness
+
+## Metadata
+
+| Field | Value |
+|---|---|
+| **Timestamp** | 2026-07-03T00:00:00+02:00 |
+| **Category** | Knowledge-Layer Design / Memory / Context Engineering / Harness Engineering |
+| **Status** | `[inferred]` — agent-drafted; a human must audit before flipping any item to `[verified]` |
+
+This document records the resolution of a recurring design question about the `ai/`
+folder — why a "knowledge repo" contains `lab/` at all — so the same confusion is
+not re-litigated in future sessions or PRs. It then generalizes the answer into how
+memory, context, context engineering, and harness engineering interlock in this kit.
+
+---
+
+## Lesson 1 — `ai/` holds TWO kinds of knowledge, and that is by design
+
+**The confusion (2026-07-03):** "`ai/` is supposed to be a knowledge repo — a map of
+the codebase. But it contains `lab/` (specs, ADRs, experiments), which doesn't
+describe the code. Doesn't that defeat the point?"
+
+**Resolution:** it doesn't, because "knowledge" here is two different things:
+
+| | Descriptive knowledge | Intentional knowledge |
+|---|---|---|
+| **Answers** | *What is the repo? Where is the code?* | *Why is it that way? What is planned? What did we learn?* |
+| **Lives in** | `ai/guide/`, `ai/analysis/`, `ai/repo-profile.json` | `ai/lab/` (specs, ADRs, evaluations, experiments) |
+| **Derivable from code?** | **Yes** — if lost, `/cold-start` can regenerate it and a human re-verifies | **No** — if lost, it is gone forever; no amount of re-crawling recovers a *reason* |
+| **Freshness model** | Regenerated / drift-checked against the code | Accumulated; append-mostly, archived when superseded |
+| **Load pattern** | `guide/` every session; `analysis/` on demand | Only when planning or reviewing |
+
+Both are knowledge an agent genuinely needs. The map (`guide/`) tells an agent *where*
+code is; the ADRs tell it which "weird" code is **deliberate and must not be fixed**
+— that is the backbone of the frozen/Stability rule in `ai/guide/MODULE_MAP.md`.
+Specs in `lab/specs/` are what `/add-feature` executes against. Without `lab/`, the
+kit is a navigation tool; with it, it is a knowledge base.
+
+**Invariants that keep the two kinds from contaminating each other:**
+
+1. **Load-pattern separation** (already encoded in `ai/INDEX.md`): `guide/` loads
+   every agent session, `analysis/` on demand per task, `lab/` only when planning or
+   reviewing. `lab/` therefore never costs tokens during a normal "where is X" lookup.
+2. **Dependency direction:** navigation must **never require reading `lab/`**.
+   `guide/` has to make sense on its own. `lab/` may reference `guide/`; never the
+   reverse as a prerequisite.
+3. **Provenance:** both kinds obey the same `[inferred]` → `[verified]` rule, so a
+   reader can trust them under one model.
+
+**The one real structural risk:** `lab/` is the only part of `ai/` that *accumulates*
+rather than *regenerates*. Stale specs for shipped features rot into noise, and an
+agent planning from `lab/` could mistake an old plan for a current one. Mitigation:
+the lifecycle in `ai/lab/README.md` ends with "Archive → mark spec implemented"; the
+`/check-drift` pass should enforce that implemented specs are actually marked.
+
+---
+
+## Lesson 2 — the right analogy is *memory*, not a "central nervous system"
+
+The CNS analogy misleads because a nervous system *controls* the body. `ai/` controls
+nothing — the source code never reads it; it only *informs* the agents that edit the
+code. The accurate analogy is **memory**, and it maps surprisingly precisely onto the
+folder layout:
+
+| Human memory system | What it stores | `ai/` equivalent |
+|---|---|---|
+| **Semantic memory** | Facts about the world ("Paris is a capital") | `ai/guide/` — module map, architecture, conventions: facts about the repo |
+| **Episodic memory** | Experiences and events ("last release we broke X") | `ai/lab/` — ADRs, evaluations, experiments: what we decided, tried, learned |
+| **Procedural memory** | How to do things (riding a bike) | Skills/commands (`/cold-start`, `/add-feature`, `/check-drift`) + `CONVENTIONS.md` |
+| **Perception** (not memory, but feeds it) | Fresh observations of the current world | `ai/analysis/` — regenerated catalogs, diagrams, audit reports |
+| **Working memory** | The handful of items held *right now* while reasoning | The agent's **context window** during a session — the only part NOT in `ai/` |
+
+Two consequences of taking the analogy seriously:
+
+- A brain with only semantic memory can describe the world but cannot learn from
+  experience. Removing `lab/` would lobotomize exactly that: the episodic half.
+- Everything in `ai/` is **external, persistent, shared memory**. It survives the end
+  of a session (unlike the context window), and it is shared across *agents and
+  humans* (unlike any single tool's proprietary memory feature). That is what makes
+  the kit tool-agnostic: Claude, Cursor, Copilot, and a new human teammate all read
+  the same memory.
+
+A less biological framing that also works: `guide/` is **the map**, `lab/` is **the
+captain's log**. A ship needs both; nobody confuses them because they are shelved
+separately.
+
+---
+
+## Lesson 3 — how memory, context, context engineering, and harness engineering fit together
+
+These four terms are layered, not interchangeable. As of mid-2026 the industry
+describes them roughly as three phases of AI-engineering maturity — prompt
+engineering → context engineering → harness engineering — with memory as the
+substrate all of them manage.
+
+### The four layers
+
+1. **Memory** *(what is known)* — durable knowledge outside any single model call.
+   Short-term memory is the running conversation (including tool results); long-term
+   memory persists across sessions. The `ai/` folder is this repo's long-term memory,
+   deliberately stored as plain files under version control so that provenance
+   (`[inferred]`/`[verified]`), diffing, and review all come for free.
+
+2. **Context** *(what is loaded right now)* — the finite token window a model
+   actually sees in one call: system prompt, instructions, tool schemas, retrieved
+   files, and history. Context is the *working memory* into which small slices of
+   long-term memory are paged. It is scarce and degrades when overfilled — the
+   failure mode the industry calls **context rot**: irrelevant tokens crowding out
+   signal until decisions degrade.
+
+3. **Context engineering** *(deciding what gets loaded, when)* — the discipline of
+   curating that window across a multi-step task: just-in-time retrieval instead of
+   pre-loading, compaction of older turns into summaries, structured note-taking to
+   files, and sub-agent isolation so heavy reading happens in a *different* context
+   window and only conclusions return. The kit's design choices are context
+   engineering decisions made once and reused forever:
+   - `INDEX.md` load patterns (every session / on demand / when planning) = a
+     **paging policy** for memory.
+   - "Locate via `MODULE_MAP.md`, open only needed files" = **just-in-time
+     retrieval** (measured in `ai/lab/evaluations/2026-06-15-value-demo-context-budget.md`
+     at ~3.1× less context for the same task).
+   - `repo-explorer` / `feature-builder` / `test-runner` subagents = **context
+     isolation**.
+   - Writing findings into `ai/analysis/` = **structured note-taking**: an agent's
+     working memory persisted into shared long-term memory before the window resets.
+
+4. **Harness engineering** *(the machinery around the model)* — everything in the
+   agent system *except* the model: **Agent = Model + Harness**. Tools, permission
+   guardrails, verification loops, hooks, observability. The term was popularized in
+   early 2026 (commonly attributed to Mitchell Hashimoto) and its key principle is
+   **deterministic enforcement over probabilistic compliance**: telling an agent
+   "follow our standards" in a prompt is prompt engineering; wiring a check that
+   *blocks* the change when standards are violated is harness engineering. In this
+   repo, the harness is:
+   - `node install.mjs verify . --strict` — broken knowledge-paths fail
+     deterministically instead of relying on the agent's diligence.
+   - The provenance rule ("agents must NEVER flip `[inferred]` → `[verified]`") —
+     a hard trust boundary, not a suggestion.
+   - MODULE_MAP `Stability: frozen` — an enforceable gate consulted before edits.
+   - `orient` producing `repo-profile.json` deterministically — facts generated by
+     code, not by model recall.
+
+### How they interlock
+
+```
+long-term MEMORY (ai/ on disk, versioned, human-verified)
+        │  paged in just-in-time…
+        ▼
+CONTEXT (the finite window: working memory of one session)
+        ▲                                   │
+        │  …by CONTEXT ENGINEERING          │  writes notes back
+        │  (INDEX load patterns, module-map │  (analysis/, lab/,
+        │   lookup, subagents, compaction)  ▼   always [inferred])
+        └──────────────── enforced by the HARNESS ────────────────
+          (verify --strict, provenance rule, stability gates,
+           deterministic orient/indepth generators)
+```
+
+The loop that matters: **memory feeds context; context engineering keeps the feed
+small and relevant; the agent's new conclusions flow back into memory; the harness
+guarantees that what flows back is verifiable and cannot silently corrupt the trusted
+layer.** Break any link and the system degrades in a predictable way:
+
+- No long-term memory → every session re-derives the repo from scratch (the exact
+  cold-start cost this kit exists to eliminate).
+- No context engineering → context rot: the map gets pre-loaded wholesale and drowns
+  the task.
+- No harness → memory rots differently: unverified `[inferred]` claims masquerade as
+  truth, paths drift, and agents trust a stale map — worse than no map.
+
+**Positioning takeaway:** ai-fication-kit is best described not as "documentation"
+but as a **memory + harness layer for coding agents**: versioned long-term memory
+(`ai/`), a context-engineering policy for reading it (`INDEX.md` load patterns +
+module-map-first navigation), and a harness that keeps it trustworthy (verify,
+provenance, stability gates).
+
+---
+
+## Sources `[inferred]` (checked 2026-07-03; primary pages paywalled/blocked from this environment — summaries via search)
+
+- Anthropic — *Effective context engineering for AI agents* (anthropic.com/engineering):
+  context as a finite resource, context rot, compaction, structured note-taking,
+  sub-agent architectures, just-in-time retrieval.
+- Claude Cookbook — [Context engineering: memory, compaction, and tool clearing](https://platform.claude.com/cookbook/tool-use-context-engineering-context-engineering-tools).
+- Martin Fowler (site) — [Harness engineering for coding agent users](https://martinfowler.com/articles/harness-engineering.html).
+- Augment Code — [Harness engineering for AI coding agents](https://www.augmentcode.com/guides/harness-engineering-ai-coding-agents):
+  "Agent = Model + Harness"; deterministic constraints over probabilistic compliance.
+- Faros — [Harness engineering: making AI coding agents work in 2026](https://www.faros.ai/blog/harness-engineering):
+  attribution of the term to Mitchell Hashimoto (early Feb 2026); third phase after
+  prompt → context engineering.
+- Sourcegraph — [Context engineering: a practical guide for AI agents (2026)](https://sourcegraph.com/blog/context-engineering).
+- mem0 — [Context engineering AI: how to build smarter LLM agents in 2026](https://mem0.ai/blog/context-engineering-ai-agents-guide):
+  short-term vs long-term agent memory, compaction step.