KS76: Universal prompt + temporal boost + importance scoring #15
- Replace per-file prompt variants with canonical READER_SYSTEM_PROMPT
and READER_USER_TEMPLATE module-level constants
- Add context fence (-----) separating memories from question
- Add "not prior knowledge" constraint for grounded extraction
- Add "Answer:" completion suffix for consistent output format
- Remove hedge clause ("say you don't have that information") from v1
- Standardize v1 to temperature=0.0, num_predict=64 (matching v2)
- All 3 files now use identical prompt text
Files: run_longmemeval.py, run_longmemeval_v2.py, cross_model_smoke.py
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…e score cap
- Replace inline TEMPORAL_KEYWORDS with shared TEMPORAL_QUERY_KEYWORDS
from shrimpk_core (adds: recently, today, yesterday, last week/month/year,
just now, this morning/week/month, days/weeks/months ago)
- Increase temporal boost from +0.015 to +0.08 for meaningful ranking impact
- Raise score inflation cap from 0.35 to 0.50 for temporal + importance headroom
- Update existing test assertion to match new boost magnitude
- Add temporal_boost_uses_shared_keywords test covering "recently" keyword
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Graph polish: view transitions, Louvain viz, edge labels, temporal
slider, custom node shapes, entity super-nodes, echo-frequency sizing
- Memory curation: inline edit, merge, manual links, retag, bulk ops
- Export formats: per-memory JSON, GraphML/GEXF graph export
… 0.25
- Add TEMPORAL_QUERY_KEYWORDS (21 keywords) to shrimpk-core/config.rs
as single source of truth for echo scoring and reformulation
- Re-export from shrimpk_core lib.rs
- Change importance_weight default from 0.0 to 0.25 so importance scoring
contributes to final ranking out of the box
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Greptile Summary
This PR delivers three coordinated improvements to the ShrimPK memory kernel: a unified extraction-framing prompt across all three benchmark runners (eliminating LLM refusals), a temporal boost refactor that replaces a stale 9-keyword inline list with a shared 21-keyword constant and raises the boost magnitude from +0.015 to +0.08, and an importance-weight activation change that switches the default from 0.0 to 0.25 so the scoring factor actually influences retrieval. All three tracks are well-tested (629 passing, new …
Key changes:
Confidence Score: 4/5
Safe to merge after fixing the …
The three tracks are coherent and all pass CI (629 tests). The only blocking issue is …
| Filename | Overview |
|---|---|
| crates/shrimpk-core/src/config.rs | Adds TEMPORAL_QUERY_KEYWORDS constant (21 keywords, single source of truth) and changes importance_weight default to 0.25, but #[serde(default)] on that field still resolves to 0.0 via f32::default(), creating a divergence from the manual Default impl. |
| crates/shrimpk-core/src/lib.rs | Re-exports TEMPORAL_QUERY_KEYWORDS at the crate root — clean, minimal change. |
| crates/shrimpk-memory/src/echo.rs | Replaces inline 9-keyword temporal list with shared constant (21 keywords), raises boost from +0.015 to +0.08, raises score cap from +0.35 to +0.50, updates existing test, and adds temporal_boost_uses_shared_keywords test. Float equality assertions use f64::EPSILON which is fragile for accumulated float arithmetic. |
| benchmarks/run_longmemeval.py | Replaces hedge-clause QA prompt with extraction-framing prompt; logic and structure unchanged. |
| benchmarks/run_longmemeval_v2.py | Aligned to the same canonical extraction prompt as run_longmemeval.py and cross_model_smoke.py; no behavioral changes. |
| benchmarks/cross_model_smoke.py | Prompt updated to match canonical extraction prompt; no other logic changes. |
| BACKLOG.md | New backlog tracking file added; documentation only, no code impact. |
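The echo.rs row flags float equality assertions that compare against f64::EPSILON as fragile once several boosts have been added together. A minimal sketch of a tolerance-based comparison follows; the helper name `approx_eq` and the tolerance value are assumptions for illustration, not code from this PR.

```rust
/// Hypothetical test helper: true when `a` and `b` differ by at most `tol`.
/// An explicit tolerance (e.g. 1e-9) leaves room for the rounding error that
/// accumulates across additions, which a single machine epsilon may not.
fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol
}

/// Example use inside a test: a score of 0.50 after one +0.08 boost.
fn boosted_score_matches() -> bool {
    let boosted = 0.50_f64 + 0.08;
    approx_eq(boosted, 0.58, 1e-9)
}
```

The tolerance should be chosen well below the smallest score difference that matters for ranking, so the assertion still catches a wrong boost magnitude.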
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
Q[Query string] --> TL[to_lowercase]
TL --> KW{Any keyword in TEMPORAL_QUERY_KEYWORDS 21 shared terms}
KW -- No --> SKIP[No temporal boost applied]
KW -- Yes --> ITER[Iterate EchoResults]
ITER --> TLB{result.labels contains temporal:* prefix?}
TLB -- No --> NEXT[Next result]
TLB -- Yes --> BOOST[final_score += 0.08]
BOOST --> CAP{final_score > similarity + 0.50?}
CAP -- No --> NEXT
CAP -- Yes --> CLAMP[final_score = similarity + 0.50]
CLAMP --> NEXT
NEXT --> SORT[Re-sort by final_score]
SORT --> OUT[EchoResults returned]
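The flowchart above can be sketched in Rust roughly as follows. This is an illustration under stated assumptions, not the code in echo.rs: the `EchoResult` fields, the `f64` score type, the function name `apply_temporal_boost`, and the four-entry keyword list (a stand-in for the 21 shared terms) are all hypothetical.

```rust
/// Illustrative subset only; the shared constant in shrimpk_core has 21 entries.
const TEMPORAL_QUERY_KEYWORDS: &[&str] = &["recently", "today", "yesterday", "last week"];
/// Boost magnitude and cap as described in this PR.
const TEMPORAL_BOOST: f64 = 0.08;
const SCORE_CAP: f64 = 0.50;

/// Hypothetical result shape; the real struct may differ.
#[derive(Debug, Clone)]
struct EchoResult {
    similarity: f64,
    final_score: f64,
    labels: Vec<String>,
}

/// Applies the temporal boost only when the query contains a temporal keyword.
fn apply_temporal_boost(query: &str, results: &mut Vec<EchoResult>) {
    let query = query.to_lowercase();
    if !TEMPORAL_QUERY_KEYWORDS.iter().any(|kw| query.contains(*kw)) {
        return; // no temporal keyword: scores and order stay untouched
    }
    for r in results.iter_mut() {
        if r.labels.iter().any(|l| l.starts_with("temporal:")) {
            // Add the boost, then clamp to similarity + cap.
            r.final_score = (r.final_score + TEMPORAL_BOOST).min(r.similarity + SCORE_CAP);
        }
    }
    // Re-sort so the highest final_score comes first.
    results.sort_by(|a, b| b.final_score.total_cmp(&a.final_score));
}
```

With a +0.08 boost, a temporally labeled result at 0.55 overtakes an unlabeled one at 0.60, which the earlier +0.015 boost would not have achieved.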
Comments Outside Diff (1)
crates/shrimpk-core/src/config.rs, line 289-290
#[serde(default)] diverges from Default impl for importance_weight
#[serde(default)] with no argument resolves to the field type's Default::default(), i.e. f32::default() = 0.0. But the manual Default for EchoConfig was just changed to return 0.25. These two deserialization paths are now inconsistent:

    // line 289 – what serde uses when this field is absent in the input
    #[serde(default)]            // ← f32::default() = 0.0
    pub importance_weight: f32,  // ← but Default::default() is now 0.25

In the current production flow (FileConfig → resolve_config → EchoConfig::auto_detect()) this is invisible because EchoConfig is never directly deserialized from TOML. However, EchoConfig derives Deserialize, so any direct round-trip (integration tests, future API endpoints, or toml::from_str::<EchoConfig>()) will silently get 0.0 instead of 0.25, re-introducing the "computed but unused" bug that this PR is supposed to fix.

The rest of the codebase uses the correct pattern for non-zero defaults (compare activation_weight on line 286). Apply the same pattern here:

    fn default_importance_weight() -> f32 { 0.25 }

    // in EchoConfig struct:
    /// Weight of importance boost in scoring formula (0.0 = consolidation only).
    #[serde(default = "default_importance_weight")]
    pub importance_weight: f32,
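To make the divergence concrete without pulling in serde, the std-only sketch below emulates the two fallback behaviours for a missing field. The `resolve_*` helpers and the one-field `EchoConfig` are hypothetical stand-ins; only the 0.0 versus 0.25 values and the `default_importance_weight` name come from the review comment.

```rust
/// Hypothetical one-field stand-in for the real EchoConfig.
#[derive(Debug, PartialEq)]
struct EchoConfig {
    importance_weight: f32,
}

impl Default for EchoConfig {
    fn default() -> Self {
        Self { importance_weight: 0.25 }
    }
}

fn default_importance_weight() -> f32 {
    0.25
}

/// Emulates a bare field-level `#[serde(default)]`: a missing field falls
/// back to the field type's own default, which is 0.0 for f32.
fn resolve_bare_default(parsed: Option<f32>) -> EchoConfig {
    EchoConfig { importance_weight: parsed.unwrap_or_default() }
}

/// Emulates `#[serde(default = "default_importance_weight")]`: a missing
/// field falls back to the named function, matching the manual Default impl.
fn resolve_named_default(parsed: Option<f32>) -> EchoConfig {
    EchoConfig { importance_weight: parsed.unwrap_or_else(default_importance_weight) }
}
```

Passing `None` models an input that omits the field: the bare variant yields 0.0 while the named variant agrees with `EchoConfig::default()`. An explicitly supplied value is kept in both cases.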
Reviews (1): Last reviewed commit: "KS76: add shared TEMPORAL_QUERY_KEYWORDS..."
Summary
- … TEMPORAL_QUERY_KEYWORDS constant (21 keywords) from shrimpk-core. Increased boost magnitude from +0.015 to +0.08. Fixes keyword mismatch where "recently", "last week" etc. never triggered the boost.
- … importance_weight default from 0.0 to 0.25 (was computed but unused). Raised score inflation cap from +0.35 to +0.50 to give boosts ranking headroom.

Changes
- benchmarks/run_longmemeval.py
- benchmarks/run_longmemeval_v2.py
- benchmarks/cross_model_smoke.py
- crates/shrimpk-core/src/config.rs: TEMPORAL_QUERY_KEYWORDS constant, importance_weight: 0.25
- crates/shrimpk-core/src/lib.rs: TEMPORAL_QUERY_KEYWORDS
- crates/shrimpk-memory/src/echo.rs

Test plan
- cargo test --workspace: 629 passed, 0 failed
- cargo clippy --workspace -- -D warnings: 0 warnings
- temporal_boost_uses_shared_keywords passes

🤖 Generated with Claude Code