
feat(db+search): Phase 4A — EXP-21 per-entity temporal linkage list#13

Draft
moralespanitz wants to merge 1 commit into experiment/phase2-combined-stack from feature/exp-21-per-entity-temporal


@moralespanitz

Summary

Per-entity sparse linked-list of facts indexed by (user_id, entity_id, created_at DESC). Targets BEAM EO (event ordering, currently 0/2 across all 19 sprint-2 iterations) and MR (multi-session reasoning).
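
To make the (user_id, entity_id, created_at DESC) access pattern concrete, here is a minimal in-memory sketch of the links table and its two repository operations. The row shape follows the table described in this PR; the class name, method signatures, and the `limit` default are hypothetical, and the real `storeLinks`/`listLinks` in `repository-entity-temporal-links.ts` query Postgres rather than filtering an array.

```typescript
// Hypothetical in-memory stand-in for atomic_entity_temporal_links.
// Row shape mirrors the PR's table; names and signatures are illustrative.
interface EntityTemporalLink {
  userId: string;
  entityId: string; // already normalized (lowercased, whitespace-collapsed)
  factId: string;
  createdAt: Date;
}

class InMemoryEntityTemporalLinks {
  private rows: EntityTemporalLink[] = [];

  storeLinks(links: EntityTemporalLink[]): void {
    this.rows.push(...links);
  }

  // Same access pattern as the (user_id, entity_id, created_at DESC) index:
  // filter by user and entity, then return newest first.
  listLinks(userId: string, entityId: string, limit = 50): EntityTemporalLink[] {
    return this.rows
      .filter((r) => r.userId === userId && r.entityId === entityId)
      .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
      .slice(0, limit);
  }

  // Emulates ON DELETE CASCADE from memory_atomic_facts.
  deleteFact(factId: string): void {
    this.rows = this.rows.filter((r) => r.factId !== factId);
  }
}
```

The sort on read stands in for the index: in Postgres, a B-tree index on (user_id, entity_id, created_at DESC) can serve an equality lookup on the first two columns in created_at order without a separate sort step.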

Why

EO is mechanically broken in current AtomicMemory: retrieval returns a bag of facts ranked by RRF, with no entity-graph-aware ordering. EXP-21 builds a per-entity sequence at ingest time and traverses it in temporal order at retrieval whenever the query mentions an entity.
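
The ingest half can be sketched as a pure function that turns one stored fact and its extracted entity names into link rows, using the normalization this PR describes (lowercased, whitespace-collapsed, deduped within a single fact). `normalizeEntityIdSketch` and `buildLinkRowsSketch` are hypothetical stand-ins for `normalizeEntityId` and `maybeWriteEntityTemporalLinks`; skipping names that normalize to an empty string is an added assumption.

```typescript
// Hypothetical approximation of normalizeEntityId: trim, collapse internal
// whitespace to single spaces, lowercase. The real function may do more.
function normalizeEntityIdSketch(name: string): string {
  return name.trim().replace(/\s+/g, " ").toLowerCase();
}

interface LinkRow {
  userId: string;
  entityId: string;
  factId: string;
  createdAt: Date;
}

// Emits one row per distinct normalized entity mentioned by a single fact.
// Returns nothing when the feature flag is off.
function buildLinkRowsSketch(
  userId: string,
  factId: string,
  createdAt: Date,
  entityNames: string[],
  enabled: boolean,
): LinkRow[] {
  if (!enabled) return [];
  const seen = new Set<string>();
  const rows: LinkRow[] = [];
  for (const name of entityNames) {
    const entityId = normalizeEntityIdSketch(name);
    // Assumption: blank names are dropped; duplicates within the fact are skipped.
    if (entityId === "" || seen.has(entityId)) continue;
    seen.add(entityId);
    rows.push({ userId, entityId, factId, createdAt });
  }
  return rows;
}
```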

Sprint 2 EXP-13 (PR #5) added event_boundary metadata, but at only 3.6% coverage, so its boost applied to too thin a slice. EXP-21 takes a different angle: explicit per-entity sequencing rather than per-fact boundary scoring.

Scope (660 insertions, 12 files)

  • src/db/schema.sql (+30) — new table atomic_entity_temporal_links with (user_id, entity_id, created_at DESC) index
  • src/db/repository-entity-temporal-links.ts (NEW, 81 LOC) — repository layer
  • src/db/stores.ts (+5), pg-representation-store.ts (+9), memory-repository.ts (+13) — wiring
  • src/services/entity-temporal-linkage.ts (NEW, 176 LOC) — boost stage + normalizeEntityId
  • src/services/memory-storage.ts (+34) — maybeWriteEntityTemporalLinks + ingest hook
  • src/services/search-pipeline.ts (+49) — applyEntityTemporalLinkageStage wiring
  • src/services/memory-search.ts (+9) — representation passthrough
  • src/config.ts (+15) — 2 new flags + env loaders + allowlist
  • src/app/runtime-container.ts (+3) — CoreRuntimeConfig
  • src/services/__tests__/entity-temporal-linkage.test.ts (NEW, 240 LOC, 10/10 passing)

Spec adaptation (flagged)

Original spec said "place this stage AFTER current-state-ranking but BEFORE final RRF rerank (same insertion point convention as EXP-12)". EXP-12's applyRecencyBinStage is sync and runs inside applyRankingProtectionStages. The new temporal-linkage stage needs an async DB call, so it sits one level up in applyExpansionAndReranking: it runs immediately after the protection stages return and before selectAndExpandCandidates. Relative ordering (vs current-state, recency-bin, instruction-boost, link-expansion, MMR) is preserved.
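
The constraint is simply that a synchronous helper cannot `await`. The sketch below shows the resulting shape with hypothetical, simplified functions standing in for applyRankingProtectionStages, the new stage, selectAndExpandCandidates, and applyExpansionAndReranking; `loadBoosts` is an assumed callback that hides the DB lookup.

```typescript
// Hypothetical, simplified shapes; not the real AtomicMemory types.
interface Candidate {
  id: string;
  score: number;
}

// Stand-in for applyRankingProtectionStages: synchronous, so it cannot
// host a stage that needs a DB round trip.
function protectionStagesSketch(candidates: Candidate[]): Candidate[] {
  // e.g. current-state ranking, recency bins, instruction boost (all sync)
  return [...candidates].sort((a, b) => b.score - a.score);
}

// Stand-in for the temporal-linkage stage: it must await the links lookup
// before it can adjust scores.
async function temporalLinkageStageSketch(
  candidates: Candidate[],
  loadBoosts: () => Promise<Map<string, number>>,
): Promise<Candidate[]> {
  const boosts = await loadBoosts();
  return candidates
    .map((c) => ({ ...c, score: c.score + (boosts.get(c.id) ?? 0) }))
    .sort((a, b) => b.score - a.score);
}

// Stand-in for selectAndExpandCandidates (link expansion, MMR, etc.).
function selectSketch(candidates: Candidate[], limit: number): Candidate[] {
  return candidates.slice(0, limit);
}

// One level up: the async caller runs the sync stages, awaits the new stage,
// then hands off to selection, so the relative stage order is unchanged.
async function expansionAndRerankingSketch(
  candidates: Candidate[],
  loadBoosts: () => Promise<Map<string, number>>,
  limit: number,
): Promise<Candidate[]> {
  const protectedRanked = protectionStagesSketch(candidates);
  const boosted = await temporalLinkageStageSketch(protectedRanked, loadBoosts);
  return selectSketch(boosted, limit);
}
```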

Hypothesis-killing risk (explicit)

If the existing LLM-extractor misses entities at ingest, the linkage list is sparse and the boost fires on nothing. Mitigations:

  1. The retrieval side runs extractNamedEntityCandidates (regex-based: capitalized proper nouns + quoted phrases) over the QUERY independently, so query-side recall does not depend on ingest-time extraction, even if the writer side is patchy
  2. The stage emits a trace event with boostedCount, so if BEAM EO numbers don't move, the empty-linkage cause is surfaced rather than masked
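
For intuition about what query-side extraction can and cannot catch, here is a rough approximation of the heuristic in mitigation 1: double-quoted phrases plus runs of capitalized words. The regexes and function name are hypothetical and are not the actual extractNamedEntityCandidates implementation.

```typescript
// Hypothetical approximation of regex-based query entity extraction.
// NOT the real extractNamedEntityCandidates; for illustration only.
const QUOTED = /"([^"]+)"/g;
const CAPITALIZED_RUN = /\b[A-Z][a-zA-Z0-9]*(?:\s+[A-Z][a-zA-Z0-9]*)*/g;

function extractQueryEntityCandidatesSketch(query: string): string[] {
  const found = new Set<string>();
  // Quoted phrases are taken verbatim (minus surrounding whitespace).
  for (const m of query.matchAll(QUOTED)) {
    found.add(m[1].trim());
  }
  // Remove quoted spans so their words are not matched a second time.
  const unquoted = query.replace(QUOTED, " ");
  // Runs of capitalized words, e.g. "Acme Corp".
  for (const m of unquoted.matchAll(CAPITALIZED_RUN)) {
    found.add(m[0]);
  }
  return [...found];
}
```

A heuristic like this also picks up sentence-initial words ("When") and can even merge them into an entity ("Did Acme Corp"), which is one reason query-side recall is only decent. Spurious candidates are cheap, though: an entity id with no links contributes no boost.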

New config keys (feature off by default)

perEntityTemporalLinkageEnabled: boolean = false
perEntityTemporalLinkageBoostWeight: number = 0.15
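
A minimal sketch of how the two keys might be loaded from the environment with these defaults. The environment variable names and the rule that rejects negative or non-numeric weights are assumptions for illustration; the actual loaders in src/config.ts may use different names and validation.

```typescript
// Hypothetical env var names and validation; src/config.ts may differ.
interface EntityTemporalLinkageConfig {
  perEntityTemporalLinkageEnabled: boolean;
  perEntityTemporalLinkageBoostWeight: number;
}

function parseBool(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined) return fallback;
  return ["1", "true", "yes", "on"].includes(raw.trim().toLowerCase());
}

// Falls back to the default for missing, blank, non-numeric, or negative values.
function parseNonNegativeNumber(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : fallback;
}

function loadEntityTemporalLinkageConfig(
  env: Record<string, string | undefined>,
): EntityTemporalLinkageConfig {
  return {
    perEntityTemporalLinkageEnabled: parseBool(env.PER_ENTITY_TEMPORAL_LINKAGE_ENABLED, false),
    perEntityTemporalLinkageBoostWeight: parseNonNegativeNumber(
      env.PER_ENTITY_TEMPORAL_LINKAGE_BOOST_WEIGHT,
      0.15,
    ),
  };
}
```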

Tests

10 new cases, all passing:

  • flag-off no-op
  • three-Stripe-facts chronological ordering (the canonical test)
  • no-entity query
  • no-overlap
  • multi-entity min-rank wins
  • weight=0 short-circuit
  • empty input
  • normalizeEntityId edge cases (lower/whitespace)

Full suite: 1222/1223 pass. The single failure is pre-existing on baseline 709242f, not introduced here.

Test plan

  • Typecheck clean
  • All 10 new unit tests pass
  • Full suite still passes (modulo pre-existing failure)
  • Stage-7 dry run with the flag enabled: measure EO lift from the 0/2 baseline, plus MR lift

BEAM EO (event ordering) is currently 0/2 across 19 sprint-2 iterations
because retrieval is bag-of-facts ranked by RRF — there's no entity-graph
chronology at retrieval time. EXP-21 introduces a sparse linked-list per
entity sorted by created_at, then walks that list at retrieval time when
the query mentions an entity to boost facts by their chronological position.

Both halves are gated by `perEntityTemporalLinkageEnabled` (default false).

Changes:
- New table `atomic_entity_temporal_links` (user_id, entity_id, fact_id,
  created_at) with the (user_id, entity_id, created_at DESC) traversal
  index and ON DELETE CASCADE through memory_atomic_facts.
- Repository helpers in `repository-entity-temporal-links.ts` (storeLinks
  + listLinks). Wired into `RepresentationStore` (interface + Pg impl +
  legacy `MemoryRepository` shim).
- Ingest write in `memory-storage.ts::storeProjection`: when the flag is
  on, every stored fact emits one row per mentioned entity (lowercased,
  whitespace-collapsed). Dedupes within a single fact.
- New retrieval stage `applyEntityTemporalLinkageBoost` in
  `entity-temporal-linkage.ts`. Wired into `search-pipeline.ts` between
  the protection stages and `selectAndExpandCandidates`, following the
  EXP-12 insertion-point convention as closely as an async stage allows.
- Two new config keys (`perEntityTemporalLinkageEnabled`,
  `perEntityTemporalLinkageBoostWeight=0.15`) on RuntimeConfig,
  CoreRuntimeConfig, INTERNAL_POLICY_CONFIG_FIELDS.
- 10 unit tests (flag-off no-op, 3-fact chronological ordering, no-entity
  query, no-overlap, multi-entity rank min, weight-0, empty input,
  normalizeEntityId behaviors).
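
The retrieval half can be sketched as a pure scoring pass over the candidates, given the per-entity fact lists (newest first) for the entities found in the query. It reproduces the behaviors the tests above name (flag-off no-op, weight-0 short-circuit, empty input, no-entity query, no overlap, multi-entity min rank), but every name here and the decay `weight / (1 + rank)` are hypothetical; the PR does not state the boost formula used by `applyEntityTemporalLinkageBoost`.

```typescript
// Hypothetical sketch of a per-entity temporal boost; the decay
// weight / (1 + rank) is an illustrative choice, not the PR's formula.
interface ScoredFact {
  id: string;
  score: number;
}

function applyTemporalBoostSketch(
  candidates: ScoredFact[],
  // entity id -> fact ids, newest first (as a listLinks call would return them)
  entityLists: Map<string, string[]>,
  weight: number,
  enabled: boolean,
): ScoredFact[] {
  // No-op paths: flag off, weight 0, empty input, no query entity.
  if (!enabled || weight === 0 || candidates.length === 0 || entityLists.size === 0) {
    return candidates;
  }
  // A fact linked to several query entities keeps its smallest rank.
  const minRank = new Map<string, number>();
  for (const factIds of entityLists.values()) {
    factIds.forEach((factId, rank) => {
      const prev = minRank.get(factId);
      if (prev === undefined || rank < prev) minRank.set(factId, rank);
    });
  }
  // Facts with no link (no overlap) keep their original score.
  return candidates
    .map((c) => {
      const rank = minRank.get(c.id);
      return rank === undefined ? c : { ...c, score: c.score + weight / (1 + rank) };
    })
    .sort((a, b) => b.score - a.score);
}
```

Taking the minimum rank means a fact linked to several query entities is boosted once, by its best position, rather than accumulating one boost per entity.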

Adaptations from the spec:
- Spec said "after current-state-ranking, before final RRF rerank, same
  insertion point as EXP-12". EXP-12's `applyRecencyBinStage` is called
  inside the sync helper `applyRankingProtectionStages`. The temporal-
  linkage stage needs an async DB call, so it sits one level up in
  `applyExpansionAndReranking` — runs immediately after the protection
  stages return and before `selectAndExpandCandidates`. Same effective
  ordering relative to other ranking stages and link/MMR expansion.
- Spec mentioned a `RESERVED_METADATA_KEYS` static-drift guard in
  `repository-types.ts`. The keys set exists but EXP-21 doesn't write
  any new metadata keys (the linkage state lives in its own table, not
  in fact metadata), so no update was needed there.
- Entity extraction reliability is a real risk for the hypothesis: the
  extractor produces structured ExtractedEntity[] from the LLM at ingest
  time. If extraction misses entities, the linkage list is sparse and
  the boost has nothing to fire on. The retrieval side mirrors the
  existing `extractNamedEntityCandidates` regex over the query, which
  is independent of the LLM extractor — query-side recall is at least
  decent for capitalized proper nouns. Empty linkage lists are an
  expected failure mode that the benchmark will surface cleanly.