feat(extraction): EXP-05 H-310 - extend INSTRUCTION_MARKERS for BEAM phrasings #9
Draft
moralespanitz wants to merge 1 commit into experiment/phase2-combined-stack from
…phrasings
DB diagnostic showed only 0.8% of Stage-7 v1 ingest facts (15/2000)
were tagged metadata.fact_role='instruction', because the original
marker list ('always X', 'never Y', 'from now on', 'going forward')
matches strict imperatives but misses BEAM-style soft imperatives:
- 'I want X'
- 'make sure to Y'
- 'please Z'
- 'I prefer A'
- 'I'd like B'
- 'always include C'
- 'remember to D'
- ... (full list in the diff)
False-positive prevention:
- 'I want to know' (question prefix) does not match
- 'I prefer not to' (negation) is handled separately
Targets BEAM IF (currently 0/2 in the v1 dry run). Honcho's IF is 0.844,
so closing this gap is where the biggest improvement headroom lies.
Behind the existing instructionBoostEnabled flag (defaults preserve
current behavior).
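A minimal TypeScript sketch of how the soft-imperative markers and the false-positive vetoes could fit together, assuming one case-insensitive regex per marker. `STRICT_MARKERS`, `SOFT_MARKERS`, `FALSE_POSITIVE_PATTERNS`, `NEGATION_PATTERNS`, and `isInstruction` are hypothetical names; the patterns cover only the examples quoted in the commit message, not the full list in the diff. Treating negation as "do not tag" is also an assumption, since the commit only says it is handled separately.

```typescript
// Hypothetical sketch: names and patterns are illustrative, built only from the
// examples in the commit message, not copied from the diff.
const STRICT_MARKERS: RegExp[] = [
  /\balways\b/i,
  /\bnever\b/i,
  /\bfrom now on\b/i,
  /\bgoing forward\b/i,
];

const SOFT_MARKERS: RegExp[] = [
  /\bI want\b/i,
  /\bmake sure to\b/i,
  /\bplease\b/i,
  /\bI prefer\b/i,
  /\bI['\u2019]d like\b/i, // accept straight and curly apostrophes
  /\bremember to\b/i,
];

// Question-style prefixes that look like soft imperatives but are not instructions.
const FALSE_POSITIVE_PATTERNS: RegExp[] = [/\bI want to know\b/i];

// Assumption: the commit handles negation on a separate path; this sketch just declines to tag it.
const NEGATION_PATTERNS: RegExp[] = [/\bI prefer not to\b/i];

function isInstruction(text: string): boolean {
  // Vetoes run first so a soft marker embedded in a question never wins.
  if (FALSE_POSITIVE_PATTERNS.some((re) => re.test(text))) return false;
  if (NEGATION_PATTERNS.some((re) => re.test(text))) return false;
  return [...STRICT_MARKERS, ...SOFT_MARKERS].some((re) => re.test(text));
}
```

Running the vetoes before the markers is what keeps `I want to know ...` from being tagged by the broader `I want` marker.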
moralespanitz added a commit that referenced this pull request on May 6, 2026
…#9)

The original gate was a single alternation regex that fired on any single occurrence of `first|last|before|after|then|later|track|...`. That over-fired on plain factual queries that incidentally contained one of those tokens (`what is my first name`, `the model used before GPT-4`, `track my spending`), pulling unrelated TLL chain memories into the augmented retrieval path.

Replaced the gate with a two-tier check:

1. ORDERING_TERMS_RE: a curated set of single-token signals (first/last/before/after/then/later/earlier/previous/next/prior). Fires TLL only when TWO co-occur, e.g. "what aspects did I discuss BEFORE and AFTER X".
2. SEQUENCE_PATTERNS: phrase-level structural signals (`in (chronological/reverse/the) order`, `when did`, `since when`, `over time`, `evolution of`, `history|timeline of`, `originally`/`initially`, `progression of`, `how X evolved/shifted/changed`, `brought up`). A single phrase hit is enough.

Removed `track`, `sequence`, and bare `order` from the gate; they were the largest false-positive contributors.

Updated `src/services/__tests__/tll-retrieval.test.ts`:
- Positive list rewritten to canonical EO/MSR/TR shapes that hit one of the structural patterns or co-occurring ordering terms.
- Negative list now includes the false-positive shapes the loose regex used to match (the three reviewer-cited ones plus a handful of single-ordering-term factual queries).

41/41 unit tests pass against the updated gate.
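The two-tier gate can be sketched as follows. This is a reconstruction from the commit message, not the code in the diff: the regexes are assumptions, and counting *distinct* ordering terms (so a repeated `first` does not fire) is a guess at what "TWO co-occur" means. `shouldUseTLL` is the name that appears in a later commit's deferrals list; the body here is illustrative only.

```typescript
// Hypothetical reconstruction of the two-tier TLL gate; patterns are assumptions.
// Tier 1: single-token ordering signals. TLL fires only if two distinct ones co-occur.
const ORDERING_TERMS_RE =
  /\b(first|last|before|after|then|later|earlier|previous|next|prior)\b/gi;

// Tier 2: phrase-level structural signals. One hit is enough.
const SEQUENCE_PATTERNS: RegExp[] = [
  /\bin (chronological|reverse|the) order\b/i,
  /\bwhen did\b/i,
  /\bsince when\b/i,
  /\bover time\b/i,
  /\bevolution of\b/i,
  /\b(history|timeline) of\b/i,
  /\b(originally|initially)\b/i,
  /\bprogression of\b/i,
  /\bhow\b.*\b(evolved|shifted|changed)\b/i,
  /\bbrought up\b/i,
];

function shouldUseTLL(query: string): boolean {
  if (SEQUENCE_PATTERNS.some((re) => re.test(query))) return true;
  // String.prototype.match with a /g regex returns every hit (or null).
  const hits = query.toLowerCase().match(ORDERING_TERMS_RE) ?? [];
  return new Set(hits).size >= 2;
}
```

The reviewer-cited false positives each contain at most one ordering term and no structural phrase, so they fall through to `false`.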
ethanj added a commit that referenced this pull request on May 6, 2026
Squashed across four review rounds on PR #18, all of which surfaced after the initial wave of fixes (#1–#11). Each item below maps to a finding in the PR's review threads.

Search path
- hydrateChainMemories now returns fully shaped SearchResults via SELECT * + normalizeMemoryRow. The previous projection set similarity: null and omitted source_site / score / summary / observed_at, crashing the buildInjection formatter (`memory.similarity.toFixed(2)`) the first time TLL augmentation actually fired against a populated chain. The `as unknown as SearchResult` cast was hiding it from tsc. (review v2 #1)
- The hydration query now uses `unnest($2::uuid[]) WITH ORDINALITY ... ORDER BY req.ord`, so the chronological order that chainsFor returns survives the augmentation pipeline. (review v4 #2)
- Workspace isolation: hydrateChainMemories filters `m.workspace_id IS NULL` to match the gate that performSearch's postProcessResults already applies. Without it, a workspace memory chained from a global memory's entity could surface in a global response. (review v4 #1)
- Defensive `relevance: 1.0` on hydrated rows locks in the chain-membership bypass invariant against future filter drift. The augmented rows are appended after applySearchRelevanceFilter today, but with `similarity: 0` and `score: 0`, `relevance` would become load-bearing if any future filter past appendTllAugmentation checked `memory.relevance >= threshold`. A regression test drives performSearch with a high retrievalOptions.relevanceThreshold and confirms the augmented row survives. (review v5 #2)

Repository
- TLL chain reads (chain, chainEventsForEntities) now derive chronological position and predecessor via `ROW_NUMBER()` and `LAG()` window functions ordered by observation_date ASC, with the stored position_in_chain as a deterministic tiebreaker for events sharing an observation_date. The stored predecessor_memory_id and position_in_chain columns become insertion-order audit metadata; the API surface returns chronological ordering. Backfilled out-of-order events surface in their true position with chronologically correct predecessors. (review v3 #1)
- chainEventsForEntities adds `m.workspace_id IS NULL` for the same reason as the search-path fix above: the global event-chains HTTP endpoint must not surface workspace memories. (review v4 #1)

Schema
- FirstMentionsExtractBodySchema validates memory_ids_by_turn_id values as UUIDs, so a non-UUID returns 400 at the schema layer instead of leaking a Postgres "invalid input syntax for type uuid" error from the route as a 500. (review v3 #2)
- A new optional SearchResult.retrieval_signal field tags chain-augmented rows so observability and any future ranker can distinguish them from similarity-ranked candidates. (review v2 #1, plumbed through v4)

Refactor
- Extracted maybeExpandViaTLL, hydrateChainMemories, and appendTllAugmentation out of memory-search.ts into a sibling tll-augmentation.ts module. The search file dropped from 551 → 385 LOC, back under the 400-LOC project cap. Shared internal types (PostProcessedSearch, RelevanceFilterSummary) moved into memory-search-types.ts so the two consumers don't duplicate them. (review v5 #4)

Test coverage
- New integration test (services/__tests__/tll-augmentation-integration.test.ts) drives performSearch end-to-end through appendTllAugmentation, with cases for: rendering augmented rows through buildInjection without crashing, the SQL contract (unnest ORDINALITY + ORDER BY req.ord), workspace-leak prevention, the relevance-1.0 bypass invariant, and no augmentation for non-TLL queries.
- New repository tests for the backfill chronological ordering of chain() and chainEventsForEntities(), and for chainEventsForEntities workspace isolation.
- New route test asserts 400 (not 500) on a non-UUID memory_ids_by_turn_id.
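An in-memory TypeScript analogue of the hydration contract from the Search path section: rows come back in request order (what `unnest(...) WITH ORDINALITY ... ORDER BY req.ord` guarantees in SQL), workspace rows are dropped as `m.workspace_id IS NULL` does, and each row carries `relevance: 1.0` so a later threshold filter cannot silently drop it. `MemoryRow`, `HydratedResult`, `hydrateInRequestOrder`, `filterByRelevance`, and the `"tll_chain"` signal value are hypothetical; the real implementation does this work in Postgres.

```typescript
// Hypothetical in-memory analogue of the hydration contract; the real code does this in SQL.
interface MemoryRow {
  id: string;
  content: string;
  workspace_id: string | null;
}

interface HydratedResult extends MemoryRow {
  similarity: number;
  score: number;
  relevance: number;
  retrieval_signal?: string;
}

function hydrateInRequestOrder(requestedIds: string[], rows: MemoryRow[]): HydratedResult[] {
  const byId = new Map(rows.map((r) => [r.id, r] as const));
  const out: HydratedResult[] = [];
  // Walk ids in request order, mirroring ORDER BY req.ord over unnest(...) WITH ORDINALITY.
  for (const id of requestedIds) {
    const row = byId.get(id);
    // Mirror `m.workspace_id IS NULL`: global responses never see workspace memories.
    if (!row || row.workspace_id !== null) continue;
    out.push({
      ...row,
      similarity: 0,
      score: 0,
      relevance: 1.0, // pins the chain-membership bypass against later relevance filters
      retrieval_signal: "tll_chain", // hypothetical tag value
    });
  }
  return out;
}

// Hypothetical stand-in for the kind of future filter the commit guards against.
function filterByRelevance(results: HydratedResult[], threshold: number): HydratedResult[] {
  return results.filter((r) => r.relevance >= threshold);
}
```

Without the `relevance: 1.0` pin, rows with `similarity: 0` and `score: 0` would have nothing to survive such a filter on.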
Verification: `npx tsc --noEmit` clean · 1286/1286 vitest pass against the test DB · `npx fallow audit --no-cache` exit 0.

Deferrals (parked durably in the research repo's tech-debt log at Atomicmemory-research/docs/core-repo/tech-debt.md):
- predecessor_memory_id ON DELETE CASCADE vs SET NULL: a design call contested between two reviewers; current default kept.
- process.env.ALLOWED_ORIGINS direct read in src/routes/memories.ts: a pre-existing CLAUDE.md violation, out of scope for this PR.
- shouldUseTLL non-adjacent "after did": low operational risk; tightening the regex was the explicit goal of review #9, and re-broadening it risks reintroducing false positives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
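The chronological chain read from the Repository section can be mirrored in memory: sort by `observation_date`, break ties with the stored `position_in_chain`, then take the 1-based index as `ROW_NUMBER()` and the previous row's id as `LAG()`. This sketch assumes a single chain (the SQL presumably partitions per chain or entity), and `ChainEvent`, `ChronologicalEvent`, `chronological_position`, `predecessor_id`, and `toChronological` are hypothetical names.

```typescript
// Hypothetical in-memory analogue of ROW_NUMBER() / LAG() over
// ORDER BY observation_date ASC, position_in_chain ASC, for a single chain.
interface ChainEvent {
  memory_id: string;
  observation_date: Date;
  position_in_chain: number; // stored insertion order; used only as a tiebreaker here
}

interface ChronologicalEvent extends ChainEvent {
  chronological_position: number; // like ROW_NUMBER(), 1-based
  predecessor_id: string | null; // like LAG(memory_id)
}

function toChronological(events: ChainEvent[]): ChronologicalEvent[] {
  const sorted = [...events].sort(
    (a, b) =>
      a.observation_date.getTime() - b.observation_date.getTime() ||
      a.position_in_chain - b.position_in_chain,
  );
  return sorted.map((e, i) => ({
    ...e,
    chronological_position: i + 1,
    predecessor_id: i === 0 ? null : sorted[i - 1].memory_id,
  }));
}
```

A backfilled event with an earlier `observation_date` but a later `position_in_chain` lands in its true slot, and its successor's predecessor is recomputed to point at it, which matches what the new backfill tests are described as checking.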
Summary
DB diagnostic on Stage-7 v1 ingestion (2000 facts) showed that only 15 (0.8%) were tagged with `metadata.fact_role: 'instruction'`. The original markers (always X / never Y / from now on) only match strict imperatives. BEAM users phrase instructions softly, so the EXP-05 boost has almost nothing to boost.

This PR extends the markers to 24 total (12 original preserved + 12 new BEAM soft imperatives) and adds an 11-pattern false-positive filter.
Stacked on `experiment/phase2-combined-stack` (the PR base), since the EXP-05 commit lives there.

Validation

Risks flagged
- `please` is broad.
- `want` overcatches one-off intents.
- `prefer` matches preference statements (importance floored to 0.95).

Behind the `instructionBoostEnabled` flag; defaults preserve current behavior.

Full hypothesis audit trail in `atomicmemory-research/memory-research/benchmarks-sprint2/experiments/EXPERIMENT-LEDGER.md` (H-310). Stage-7 v5, currently in flight, will measure the IF lift.
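A sketch of how the flag gate and the 0.95 importance floor might compose, assuming the flag controls both the `fact_role` tag and the floor. Only `instructionBoostEnabled`, `metadata.fact_role`, and the 0.95 value come from the PR; `ExtractedFact`, `ExtractionConfig`, `INSTRUCTION_IMPORTANCE_FLOOR`, and `applyInstructionBoost` are hypothetical.

```typescript
// Hypothetical: only `instructionBoostEnabled`, `fact_role`, and the 0.95 floor come from the PR.
interface ExtractedFact {
  text: string;
  importance: number;
  metadata: { fact_role?: string };
}

interface ExtractionConfig {
  instructionBoostEnabled: boolean;
}

const INSTRUCTION_IMPORTANCE_FLOOR = 0.95;

function applyInstructionBoost(
  fact: ExtractedFact,
  isInstruction: boolean,
  config: ExtractionConfig,
): ExtractedFact {
  // Flag off (the default) or no marker hit: return the fact untouched.
  if (!config.instructionBoostEnabled || !isInstruction) return fact;
  return {
    ...fact,
    // Floor, not overwrite: facts already above 0.95 keep their importance.
    importance: Math.max(fact.importance, INSTRUCTION_IMPORTANCE_FLOOR),
    metadata: { ...fact.metadata, fact_role: "instruction" },
  };
}
```

Because the floor is a `Math.max`, facts already above 0.95 keep their score; the flip side, flagged under Risks, is that any `prefer` false positive is lifted to 0.95 too.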