
KS76: Universal prompt + temporal boost + importance scoring #15

Merged: Liorrr merged 4 commits into master from feat/ks76-prompt-retrieval on Apr 9, 2026

Liorrr (contributor) commented Apr 9, 2026

Summary

  • Track 1 (Universal Reader Prompt): Replaced the hedge-clause QA prompt with an extraction-framing prompt across all 3 benchmark files. Zero refusals on qwen2.5:3b, gemma3:1b, and llama3.2:3b (0 of 15 prompts, 5 per model).
  • Track 2 (Temporal Boost Fix): Replaced the inline 9-keyword list with a shared TEMPORAL_QUERY_KEYWORDS constant (21 keywords) from shrimpk-core, and increased the boost magnitude from +0.015 to +0.08. This fixes a keyword mismatch in which queries containing "recently", "last week", etc. never triggered the boost.
  • Track 3 (Retrieval Scoring): Changed the importance_weight default from 0.0 to 0.25; the importance score was already computed but, with a zero weight, never affected ranking. Raised the score inflation cap from +0.35 to +0.50 to give the boosts ranking headroom.
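The PR does not show the echo.rs scoring formula, so the following is only a sketch under an assumed linear model: importance enters as importance_weight × importance, is added to any other boosts, and the total inflation over raw similarity is clamped to a cap. It illustrates why a 0.0 default made the importance term inert and how a 0.35 cap can clip stacked boosts. The function name final_score and its parameters are hypothetical.

```rust
/// Hypothetical scoring model, not the actual echo.rs formula:
/// final = similarity + min(importance_weight * importance + other_boosts, max_inflation)
fn final_score(
    similarity: f64,
    importance: f64,        // assumed to lie in [0, 1]
    importance_weight: f64, // this PR changes the default from 0.0 to 0.25
    other_boosts: f64,      // e.g. the +0.08 temporal boost
    max_inflation: f64,     // this PR raises the cap from 0.35 to 0.50
) -> f64 {
    // With importance_weight = 0.0 the importance term contributes nothing.
    let inflation = importance_weight * importance + other_boosts;
    // Clamp total inflation so boosts cannot run away from raw similarity.
    similarity + inflation.min(max_inflation)
}
```

Under this model, the full importance term (0.25) plus the new temporal boost (+0.08) already adds 0.33, just under the old 0.35 cap, so almost any further boost would have been clipped.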

Changes

| File | What |
|---|---|
| benchmarks/run_longmemeval.py | Universal extraction prompt, no hedge clauses |
| benchmarks/run_longmemeval_v2.py | Aligned to same canonical prompt |
| benchmarks/cross_model_smoke.py | Aligned to same canonical prompt |
| crates/shrimpk-core/src/config.rs | TEMPORAL_QUERY_KEYWORDS constant, importance_weight: 0.25 |
| crates/shrimpk-core/src/lib.rs | Re-export TEMPORAL_QUERY_KEYWORDS |
| crates/shrimpk-memory/src/echo.rs | Use shared keywords, boost +0.08, cap +0.50, new test |

Test plan

  • cargo test --workspace — 629 passed, 0 failed
  • cargo clippy --workspace -- -D warnings — 0 warnings
  • Cross-model prompt test: gemma3:1b (5/5), qwen2.5:3b (5/5), llama3.2:3b (5/5) — 0 refusals
  • New unit test temporal_boost_uses_shared_keywords passes
  • Python syntax valid (all 3 files)
  • Prompt text identical across all 3 benchmark files
  • LME-S full run with GPT-4o judge (post-merge, ~$1.50)

🤖 Generated with Claude Code

Liorrr and others added 4 commits April 9, 2026 03:03
- Replace per-file prompt variants with canonical READER_SYSTEM_PROMPT
  and READER_USER_TEMPLATE module-level constants
- Add context fence (-----) separating memories from question
- Add "not prior knowledge" constraint for grounded extraction
- Add "Answer:" completion suffix for consistent output format
- Remove hedge clause ("say you don't have that information") from v1
- Standardize v1 to temperature=0.0, num_predict=64 (matching v2)
- All 3 files now use identical prompt text

Files: run_longmemeval.py, run_longmemeval_v2.py, cross_model_smoke.py

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
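This commit describes the prompt's structure (a context fence, a "not prior knowledge" constraint, an "Answer:" completion suffix) but not its exact text. The sketch below assembles a prompt with that structure. It is written in Rust to match the other examples on this page, whereas the real READER_SYSTEM_PROMPT and READER_USER_TEMPLATE are Python constants; all wording and the helper name build_reader_user_prompt are hypothetical.

```rust
/// Hypothetical wording: the actual READER_SYSTEM_PROMPT text is not shown in this PR.
const READER_SYSTEM_PROMPT: &str =
    "Extract the answer to the question from the memories below. \
     Use only the memories, not prior knowledge.";

/// Fence separating the retrieved memories from the question.
const CONTEXT_FENCE: &str = "-----";

/// Builds the user turn: fenced memories, then the question, then an
/// "Answer:" suffix so the model completes with the answer directly.
fn build_reader_user_prompt(memories: &[&str], question: &str) -> String {
    let mut prompt = String::new();
    prompt.push_str(CONTEXT_FENCE);
    prompt.push('\n');
    for memory in memories {
        prompt.push_str("- ");
        prompt.push_str(memory);
        prompt.push('\n');
    }
    prompt.push_str(CONTEXT_FENCE);
    prompt.push('\n');
    prompt.push_str("Question: ");
    prompt.push_str(question);
    prompt.push_str("\nAnswer:");
    prompt
}
```

The sampling settings from this commit (temperature=0.0, num_predict=64) are separate from the prompt text and would be passed with the model request.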
…e score cap

- Replace inline TEMPORAL_KEYWORDS with shared TEMPORAL_QUERY_KEYWORDS from shrimpk_core
  (adds: recently, today, yesterday, last week/month/year, just now, this morning/week/month, days/weeks/months ago)
- Increase temporal boost from +0.015 to +0.08 for meaningful ranking impact
- Raise score inflation cap from 0.35 to 0.50 for temporal + importance headroom
- Update existing test assertion to match new boost magnitude
- Add temporal_boost_uses_shared_keywords test covering "recently" keyword

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
  - Graph polish: view transitions, Louvain viz, edge labels, temporal
    slider, custom node shapes, entity super-nodes, echo-frequency sizing
  - Memory curation: inline edit, merge, manual links, retag, bulk ops
  - Export formats: per-memory JSON, GraphML/GEXF graph export
… 0.25

- Add TEMPORAL_QUERY_KEYWORDS (21 keywords) to shrimpk-core/config.rs as single
  source of truth for echo scoring and reformulation
- Re-export from shrimpk_core lib.rs
- Change importance_weight default from 0.0 to 0.25 so importance scoring
  contributes to final ranking out of the box

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
greptile-apps (bot) commented Apr 9, 2026

Greptile Summary

This PR delivers three coordinated improvements to the ShrimPK memory kernel: a unified extraction-framing prompt across all three benchmark runners (eliminating LLM refusals), a temporal boost refactor that replaces a stale 9-keyword inline list with a shared 21-keyword constant and raises the boost magnitude from +0.015 to +0.08, and an importance-weight activation change that switches the default from 0.0 to 0.25 so the scoring factor actually influences retrieval. All three tracks are well-tested (629 passing, new temporal_boost_uses_shared_keywords test added).

Key changes:

  • TEMPORAL_QUERY_KEYWORDS constant introduced in shrimpk-core as single source of truth; re-exported at crate root — eliminates the keyword mismatch that prevented "recently"/"last week" from triggering boosts
  • importance_weight default changed from 0.0 → 0.25 in EchoConfig::default(); score inflation cap raised from +0.35 → +0.50 to give the now-active boosts headroom
  • #[serde(default)] on importance_weight was not updated — it still resolves to f32::default() = 0.0 if EchoConfig is ever directly deserialized, diverging from the Default impl (see inline comment); fix is a one-liner following the activation_weight pattern
  • Benchmark prompt is now identical across all three files — a clean canonical text that frames the task as extraction rather than hedged QA

Confidence Score: 4/5

Safe to merge after fixing the #[serde(default)] divergence on importance_weight; all other changes are well-tested and correct.

The three tracks are coherent and all pass CI (629 tests). The only blocking issue is #[serde(default)] on importance_weight still resolving to 0.0 via f32::default() while the manual Default impl returns 0.25 — any direct EchoConfig deserialization silently reverts the intended default. It is a one-line fix following the existing activation_weight pattern. The float-precision assertion style is a P2 nit.

crates/shrimpk-core/src/config.rs: #[serde(default)] on importance_weight needs to be changed to #[serde(default = "default_importance_weight")].

Vulnerabilities

No security concerns identified. Changes are limited to scoring constants, a shared keyword list, and benchmark prompt text. No new network surface, authentication paths, or secret handling.

Important Files Changed

| Filename | Overview |
|---|---|
| crates/shrimpk-core/src/config.rs | Adds TEMPORAL_QUERY_KEYWORDS constant (21 keywords, single source of truth) and changes importance_weight default to 0.25, but #[serde(default)] on that field still resolves to 0.0 via f32::default(), creating a divergence from the manual Default impl. |
| crates/shrimpk-core/src/lib.rs | Re-exports TEMPORAL_QUERY_KEYWORDS at the crate root; clean, minimal change. |
| crates/shrimpk-memory/src/echo.rs | Replaces inline 9-keyword temporal list with shared constant (21 keywords), raises boost from +0.015 to +0.08, raises score cap from +0.35 to +0.50, updates existing test, and adds temporal_boost_uses_shared_keywords test. Float equality assertions use f64::EPSILON, which is fragile for accumulated float arithmetic. |
| benchmarks/run_longmemeval.py | Replaces hedge-clause QA prompt with extraction-framing prompt; logic and structure unchanged. |
| benchmarks/run_longmemeval_v2.py | Aligned to the same canonical extraction prompt as run_longmemeval.py and cross_model_smoke.py; no behavioral changes. |
| benchmarks/cross_model_smoke.py | Prompt updated to match canonical extraction prompt; no other logic changes. |
| BACKLOG.md | New backlog tracking file added; documentation only, no code impact. |
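On the float-assertion nit: f64::EPSILON is the gap between 1.0 and the next representable f64, not a general-purpose tolerance, so a score built from several additions can legitimately differ from the expected literal by more than that. A common alternative is an explicit tolerance helper. The names approx_eq, assert_boosted, and TEMPORAL_BOOST and the 1e-9 tolerance below are illustrative choices, not taken from the codebase.

```rust
/// Illustrative constant mirroring the +0.08 boost from this PR.
const TEMPORAL_BOOST: f64 = 0.08;

/// Absolute-tolerance comparison for test assertions on computed scores.
fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() <= tol
}

/// Asserts that `actual` equals `base` plus the temporal boost, within an
/// explicit tolerance instead of f64::EPSILON.
fn assert_boosted(base: f64, actual: f64) {
    let expected = base + TEMPORAL_BOOST;
    assert!(
        approx_eq(actual, expected, 1e-9),
        "expected {expected}, got {actual}"
    );
}
```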

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Q[Query string] --> TL[to_lowercase]
    TL --> KW{Any keyword in TEMPORAL_QUERY_KEYWORDS 21 shared terms}
    KW -- No --> SKIP[No temporal boost applied]
    KW -- Yes --> ITER[Iterate EchoResults]
    ITER --> TLB{result.labels contains temporal:* prefix?}
    TLB -- No --> NEXT[Next result]
    TLB -- Yes --> BOOST[final_score += 0.08]
    BOOST --> CAP{final_score > similarity + 0.50?}
    CAP -- No --> NEXT
    CAP -- Yes --> CLAMP[final_score = similarity + 0.50]
    CLAMP --> NEXT
    NEXT --> SORT[Re-sort by final_score]
    SORT --> OUT[EchoResults returned]
```
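The flowchart's logic can be sketched in plain Rust as follows. This is an illustration, not the echo.rs code: EchoResult is a hypothetical reduction to the fields the flowchart touches, the keyword list holds only the additions named in the commit message (the real TEMPORAL_QUERY_KEYWORDS has 21 entries), and an inline module stands in for the shrimpk-core crate and its root re-export.

```rust
/// Stand-in for shrimpk-core: the constant lives in `config` and is
/// re-exported at the root, as lib.rs does.
mod shrimpk_core {
    pub mod config {
        /// Partial list (keywords named in the commit message); the real constant has 21.
        pub const TEMPORAL_QUERY_KEYWORDS: &[&str] = &[
            "recently", "today", "yesterday", "last week", "last month", "last year",
            "just now", "this morning", "this week", "this month",
            "days ago", "weeks ago", "months ago",
        ];
    }
    pub use self::config::TEMPORAL_QUERY_KEYWORDS;
}

use shrimpk_core::TEMPORAL_QUERY_KEYWORDS;

const TEMPORAL_BOOST: f64 = 0.08;
const MAX_SCORE_INFLATION: f64 = 0.50;

/// Hypothetical, reduced EchoResult with only the fields used below.
#[derive(Debug, Clone)]
struct EchoResult {
    id: u32,
    similarity: f64,
    final_score: f64,
    labels: Vec<String>,
}

/// True if the lowercased query contains any shared temporal keyword.
fn is_temporal_query(query: &str) -> bool {
    let q = query.to_lowercase();
    TEMPORAL_QUERY_KEYWORDS.iter().any(|kw| q.contains(*kw))
}

/// Boosts `temporal:*`-labelled results for temporal queries, clamps each
/// boosted score to similarity + MAX_SCORE_INFLATION, then re-sorts.
fn apply_temporal_boost(query: &str, results: &mut [EchoResult]) {
    if !is_temporal_query(query) {
        return; // no temporal boost applied
    }
    for r in results.iter_mut() {
        if r.labels.iter().any(|l| l.starts_with("temporal:")) {
            r.final_score += TEMPORAL_BOOST;
            let cap = r.similarity + MAX_SCORE_INFLATION;
            if r.final_score > cap {
                r.final_score = cap;
            }
        }
    }
    // Highest final_score first; total_cmp gives a total order over f64.
    results.sort_by(|a, b| b.final_score.total_cmp(&a.final_score));
}
```

In this sketch, a temporal-labelled result at similarity 0.66 overtakes an unlabelled one at 0.70 for a query containing "recently" (0.66 + 0.08 = 0.74); with the old +0.015 it would score 0.675 and stay second.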

Comments Outside Diff (1)

  1. crates/shrimpk-core/src/config.rs, lines 289-290

    P1 #[serde(default)] diverges from Default impl for importance_weight

    #[serde(default)] with no argument resolves to the field type's Default::default() — i.e. f32::default() = 0.0. But the manual Default for EchoConfig was just changed to return 0.25. These two deserialization paths are now inconsistent:

    ```rust
    // line 289 – what serde uses when this field is absent in the input
    #[serde(default)]           // ← f32::default() = 0.0
    pub importance_weight: f32, // ← but Default::default() is now 0.25
    ```

    In the current production flow (FileConfig → resolve_config → EchoConfig::auto_detect()) this is invisible because EchoConfig is never directly deserialized from TOML. However, EchoConfig derives Deserialize, so any direct round-trip (integration tests, future API endpoints, or toml::from_str::<EchoConfig>()) will silently get 0.0 instead of 0.25, re-introducing the "computed but unused" bug that this PR is supposed to fix.

    The rest of the codebase uses the correct pattern for non-zero defaults (compare activation_weight on line 286). Apply the same pattern here:

    ```rust
    fn default_importance_weight() -> f32 {
        0.25
    }

    // in EchoConfig struct:
    /// Weight of importance boost in scoring formula (0.0 = consolidation only).
    #[serde(default = "default_importance_weight")]
    pub importance_weight: f32,
    ```
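A complementary guard is to have the manual Default impl call the same helper that serde references, so the two defaults cannot drift apart again, and to pin that with a test. The sketch below shows the pattern std-only: the serde attribute is left as a comment so it compiles without the serde crate, and EchoConfig is cut down to a single field (the real struct has more).

```rust
/// Single source of truth for the default; serde would reference it via
/// #[serde(default = "default_importance_weight")].
fn default_importance_weight() -> f32 {
    0.25
}

/// Reduced stand-in for EchoConfig (other fields omitted).
#[derive(Debug)]
struct EchoConfig {
    // #[serde(default = "default_importance_weight")]
    importance_weight: f32,
}

impl Default for EchoConfig {
    fn default() -> Self {
        // Call the shared helper instead of repeating the literal 0.25.
        EchoConfig {
            importance_weight: default_importance_weight(),
        }
    }
}
```

With serde available, a test that deserializes input omitting importance_weight and compares the field against EchoConfig::default() would catch the divergence directly.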

Reviews (1): Last reviewed commit: "KS76: add shared TEMPORAL_QUERY_KEYWORDS..."

Liorrr merged commit d55c5f2 into master on Apr 9, 2026
7 checks passed
Liorrr deleted the feat/ks76-prompt-retrieval branch on April 9, 2026 at 12:50