feat(extraction): EXP-05 H-310 — extend INSTRUCTION_MARKERS for BEAM phrasings by moralespanitz · Pull Request #9 · atomicmemory/atomicmemory-core

moralespanitz · 2026-04-30T05:20:05Z

Summary

DB diagnostic on Stage-7 v1 ingestion (2000 facts) showed only 15 (0.8%) were tagged metadata.fact_role: 'instruction'. The original markers (always X / never Y / from now on) only match strict imperatives. BEAM users phrase instructions softly, so the EXP-05 boost has nothing to boost.

This PR extends the markers to 24 total (12 original preserved + 12 new BEAM soft imperatives) and adds an 11-pattern false-positive filter.

Stacked on experiment/phase2-combined-stack (PR base) since the EXP-05 commit lives there.

Validation

14 new unit tests (9 positive + 5 FP-prevention), all pass
33 total extraction-enrichment tests pass
8 instruction-boost tests still pass
typecheck clean

Risks flagged

`please` is broad
`want` overcatches one-off intents
`prefer` matches preference statements (importance floored to 0.95)

Behind instructionBoostEnabled flag — defaults preserve current behavior.
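To make the flag gating and the 0.95 importance floor concrete, here is a minimal sketch. Only `instructionBoostEnabled`, `metadata.fact_role`, and the 0.95 value come from this PR description; the `Fact` and `BoostConfig` shapes, the constant name, and `applyInstructionBoost` are hypothetical and are not taken from the diff.

```typescript
// Hypothetical shapes for illustration; the real types live in the repo.
interface Fact {
  text: string;
  importance: number;
  metadata: { fact_role?: string };
}

interface BoostConfig {
  instructionBoostEnabled: boolean;
}

// Floor value quoted in the "Risks flagged" list above.
const INSTRUCTION_IMPORTANCE_FLOOR = 0.95;

// Returns the fact unchanged unless the flag is on AND the fact is tagged as
// an instruction; in that case importance is raised to at least the floor.
function applyInstructionBoost(fact: Fact, config: BoostConfig): Fact {
  if (!config.instructionBoostEnabled) return fact;
  if (fact.metadata.fact_role !== 'instruction') return fact;
  return {
    ...fact,
    importance: Math.max(fact.importance, INSTRUCTION_IMPORTANCE_FLOOR),
  };
}
```

With the flag off the function is an identity, which is one way "defaults preserve current behavior" could hold.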

Full hypothesis audit trail in atomicmemory-research/memory-research/benchmarks-sprint2/experiments/EXPERIMENT-LEDGER.md (H-310). Stage-7 v5 in flight will measure IF lift.

…phrasings

DB diagnostic showed only 0.8% of Stage-7 v1 ingest facts (15/2000) were tagged metadata.fact_role='instruction', because the original markers list ('always X', 'never Y', 'from now on', 'going forward') matches strict imperatives but misses BEAM-style soft imperatives:
- 'I want X'
- 'make sure to Y'
- 'please Z'
- 'I prefer A'
- 'I'd like B'
- 'always include C'
- 'remember to D'
- ... (full list in the diff)

False-positive prevention:
- 'I want to know' (question prefix) does not match
- 'I prefer not to' (negation) handled separately

Targets BEAM IF (currently 0/2 in v1 dryrun). Honcho IF is 0.844; the biggest improvement headroom is closing this gap.

Behind the existing instructionBoostEnabled flag (defaults preserve current behavior).
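The marker-plus-filter idea in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in the diff: the real list has 24 markers and 11 false-positive patterns, while the arrays below hold only the phrasings quoted above. The names `STRICT_MARKERS`, `SOFT_MARKERS`, `FALSE_POSITIVE_PATTERNS`, and `looksLikeInstruction` are hypothetical, and the separate handling of 'I prefer not to' is not modeled.

```typescript
// Strict imperatives quoted in the commit message (original markers).
const STRICT_MARKERS: RegExp[] = [
  /\balways\b/i,
  /\bnever\b/i,
  /\bfrom now on\b/i,
  /\bgoing forward\b/i,
];

// Soft imperatives quoted in the commit message (subset of the new markers).
const SOFT_MARKERS: RegExp[] = [
  /\bi want\b/i,
  /\bmake sure to\b/i,
  /\bplease\b/i,
  /\bi prefer\b/i,
  /\bi'd like\b/i,
  /\bremember to\b/i,
];

// Only the question-prefix case is modeled here; the negation case
// ('I prefer not to') is "handled separately" in the PR and omitted.
const FALSE_POSITIVE_PATTERNS: RegExp[] = [/\bi want to know\b/i];

// Simplification: any false-positive hit rejects the whole text, even if
// another marker also appears elsewhere in it.
function looksLikeInstruction(text: string): boolean {
  if (FALSE_POSITIVE_PATTERNS.some((re) => re.test(text))) return false;
  return [...STRICT_MARKERS, ...SOFT_MARKERS].some((re) => re.test(text));
}
```

Checking the filter before the markers keeps 'I want to know' from ever reaching the broader `i want` pattern, which is the ordering the false-positive list implies.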

…#9)

The original gate was a single alternation regex that fired on any single occurrence of `first|last|before|after|then|later|track|...`. That over-fired on plain factual queries that incidentally contained one of those tokens (`what is my first name`, `the model used before GPT-4`, `track my spending`), pulling in unrelated TLL chain memories on the augmented retrieval path.

Replaced the gate with a two-tier check:
1. ORDERING_TERMS_RE: a curated set of single-token signals (first/last/before/after/then/later/earlier/previous/next/prior). Only fires TLL when TWO co-occur, e.g. "what aspects did I discuss BEFORE and AFTER X".
2. SEQUENCE_PATTERNS: phrase-level structural signals (`in (chronological/reverse/the) order`, `when did`, `since when`, `over time`, `evolution of`, `history|timeline of`, `originally`/`initially`, `progression of`, `how X evolved/shifted/changed`, `brought up`). Single phrase hit is enough.

Removed `track`, `sequence`, and bare `order` from the gate; they were the largest false-positive contributors.

Updated `src/services/__tests__/tll-retrieval.test.ts`:
- Positive list rewritten to canonical EO/MSR/TR shapes that hit one of the structural patterns or co-occurring ordering terms.
- Negative list now includes the false-positive shapes the loose regex used to match (the three reviewer-cited ones plus a handful of single-ordering-term factual queries).

41/41 unit tests pass against the updated gate.
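A rough sketch of the two-tier gate described above, written from the commit message rather than from the diff. The identifiers `ORDERING_TERMS_RE`, `SEQUENCE_PATTERNS`, and `shouldUseTLL` appear in this PR's commit messages, but the exact regexes are assumptions, the `how X evolved/shifted/changed` pattern is left out, and "two co-occur" is interpreted here as two distinct terms.

```typescript
// Tier 1: single-token ordering signals. One hit alone is not enough.
const ORDERING_TERMS_RE =
  /\b(first|last|before|after|then|later|earlier|previous|next|prior)\b/gi;

// Tier 2: phrase-level structural signals. A single hit is enough.
// Assumed regex spellings of the phrases listed in the commit message.
const SEQUENCE_PATTERNS: RegExp[] = [
  /\bin (chronological|reverse|the) order\b/i,
  /\bwhen did\b/i,
  /\bsince when\b/i,
  /\bover time\b/i,
  /\bevolution of\b/i,
  /\b(history|timeline) of\b/i,
  /\b(originally|initially)\b/i,
  /\bprogression of\b/i,
  /\bbrought up\b/i,
];

function shouldUseTLL(query: string): boolean {
  // Any structural phrase fires the gate on its own.
  if (SEQUENCE_PATTERNS.some((re) => re.test(query))) return true;
  // Otherwise require at least two distinct ordering terms (assumption:
  // the same term repeated twice does not count as co-occurrence).
  const hits = query.match(ORDERING_TERMS_RE) ?? [];
  const distinct = new Set(hits.map((h) => h.toLowerCase()));
  return distinct.size >= 2;
}
```

The three reviewer-cited queries each contain at most one ordering term and no structural phrase, so under these assumptions none of them opens the gate.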

Squashed across four review rounds on PR #18, all of which surfaced after the initial wave of fixes (#1–#11). Each item below maps to a finding in the PR's review threads.

Search path
- hydrateChainMemories now returns fully-shaped SearchResults via SELECT * + normalizeMemoryRow. The previous projection set similarity: null and omitted source_site / score / summary / observed_at, crashing the buildInjection formatter (`memory.similarity.toFixed(2)`) the first time TLL augmentation actually fired against a populated chain. The `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1)
- Hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER BY req.ord` so the chronological order that chainsFor returns survives the augmentation pipeline. (review v4 #2)
- Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL` to match the gate behavior performSearch's postProcessResults already applies. Without it, a workspace memory chained from a global memory's entity could surface in a global response. (review v4 #1)
- Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership bypass invariant against future filter drift. The augmented rows are appended after applySearchRelevanceFilter today, but `similarity: 0` + `score: 0` would make them load-bearing on `relevance` if any future filter past appendTllAugmentation checked `memory.relevance >= threshold`. Regression test drives performSearch with a high retrievalOptions.relevanceThreshold and confirms the augmented row survives. (review v5 #2)

Repository
- TLL chain reads (chain, chainEventsForEntities) now derive chronological position and predecessor via `ROW_NUMBER()` and `LAG()` window functions ordered by observation_date ASC (with stored position_in_chain as a deterministic tiebreaker for events sharing an observation_date). The stored predecessor_memory_id and position_in_chain columns become insertion-order audit metadata; the API surface returns chronological ordering. Backfilled out-of-order events surface in their true position with chronologically-correct predecessors. (review v3 #1)
- chainEventsForEntities adds `m.workspace_id IS NULL` for the same reason as the search-path fix above: the global event-chains HTTP endpoint must not surface workspace memories. (review v4 #1)

Schema
- FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values as UUIDs so a non-UUID returns 400 (schema layer) instead of leaking a Postgres "invalid input syntax for type uuid" as 500 from the route. (review v3 #2)
- New SearchResult.retrieval_signal optional field tags chain-augmented rows so observability and any future ranker can distinguish them from similarity-ranked candidates. (review v2 #1, plumbed through v4)

Refactor
- Extracted maybeExpandViaTLL, hydrateChainMemories, appendTllAugmentation out of memory-search.ts into a sibling tll-augmentation.ts module. The search file dropped from 551 → 385 LOC, back under the 400-LOC project cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary) pulled into memory-search-types.ts so the two consumers don't duplicate. (review v5 #4)

Test coverage
- New integration test (services/__tests__/tll-augmentation-integration.test.ts) drives performSearch end-to-end through appendTllAugmentation, with cases for: rendering augmented rows through buildInjection without crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord), workspace-leak prevention, the relevance-1.0 bypass invariant, and no-augmentation for non-TLL queries.
- New repository tests for backfill chronological ordering of chain() and chainEventsForEntities() and for chainEventsForEntities workspace isolation.
- New route test asserts 400 (not 500) on non-UUID memory_ids_by_turn_id.

Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the test DB · `npx fallow audit --no-cache` exit 0.

Deferrals (parked durably in the research repo's tech-debt log at Atomicmemory-research/docs/core-repo/tech-debt.md):
- predecessor_memory_id ON DELETE CASCADE vs SET NULL: design call, contested between two reviewers; current default kept.
- process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts: pre-existing CLAUDE.md violation, out of scope for this PR.
- shouldUseTLL non-adjacent "after did": low operational risk; tightening the regex was the explicit goal of review #9 and re-broadening risks reintroducing false positives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
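The Repository change above derives chronological position and predecessor at read time instead of trusting the stored columns. As an illustration only, here is an in-memory TypeScript analogue of that `ROW_NUMBER()` / `LAG()` ordering. The actual change is done in SQL; the `ChainEvent` shape, the field names, and `orderChain` are hypothetical.

```typescript
// Hypothetical row shape; the field names mirror the columns named in the
// commit message but are not taken from the repository code.
interface ChainEvent {
  memoryId: string;
  observationDate: string; // ISO 8601 date or timestamp
  positionInChain: number; // stored insertion-order position
}

interface OrderedChainEvent extends ChainEvent {
  chronologicalPosition: number; // analogue of ROW_NUMBER(), 1-based
  predecessorMemoryId: string | null; // analogue of LAG(memory_id)
}

function orderChain(events: ChainEvent[]): OrderedChainEvent[] {
  // Sort by observation date ascending; the stored position breaks ties so
  // the result is deterministic for events sharing a date.
  const sorted = [...events].sort((a, b) => {
    const byDate = Date.parse(a.observationDate) - Date.parse(b.observationDate);
    return byDate !== 0 ? byDate : a.positionInChain - b.positionInChain;
  });
  return sorted.map((event, index) => ({
    ...event,
    chronologicalPosition: index + 1,
    predecessorMemoryId: sorted[index - 1]?.memoryId ?? null,
  }));
}
```

A backfilled event inserted last but observed earliest ends up at position 1 with no predecessor, and the event that was previously first now points back to it.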

moralespanitz mentioned this pull request May 6, 2026

feat(core): productize first-mention events + TLL EO read-path #18

Merged


moralespanitz wants to merge 1 commit into experiment/phase2-combined-stack from feature/h310-extend-instruction-markers
