KS76: Universal prompt + temporal boost + importance scoring by Liorrr · Pull Request #15 · bellkisai/kernel

Liorrr · 2026-04-09T00:46:43Z

Summary

Track 1 — Universal Reader Prompt: Replaced the hedge-clause QA prompt with an extraction-framing prompt across all 3 benchmark files. Zero refusals on qwen2.5:3b, gemma3:1b, and llama3.2:3b (0/15 prompts, 5 per model).
Track 2 — Temporal Boost Fix: Replaced the inline 9-keyword list with the shared TEMPORAL_QUERY_KEYWORDS constant (21 keywords) from shrimpk-core. Increased the boost magnitude from +0.015 to +0.08. This fixes a keyword mismatch where queries containing "recently", "last week", etc. never triggered the boost.
Track 3 — Retrieval Scoring: Changed the importance_weight default from 0.0 to 0.25 (importance was computed but never contributed to ranking). Raised the score inflation cap from +0.35 to +0.50 to give boosts ranking headroom.

Changes

File	What
`benchmarks/run_longmemeval.py`	Universal extraction prompt, no hedge clauses
`benchmarks/run_longmemeval_v2.py`	Aligned to same canonical prompt
`benchmarks/cross_model_smoke.py`	Aligned to same canonical prompt
`crates/shrimpk-core/src/config.rs`	`TEMPORAL_QUERY_KEYWORDS` constant, `importance_weight: 0.25`
`crates/shrimpk-core/src/lib.rs`	Re-export `TEMPORAL_QUERY_KEYWORDS`
`crates/shrimpk-memory/src/echo.rs`	Use shared keywords, boost +0.08, cap +0.50, new test
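The single-source-of-truth layout in the table can be sketched in one file, with inline modules standing in for `config.rs`, `lib.rs`, and `echo.rs`. This is a hypothetical sketch: it lists only the 13 keywords named in the temporal-boost commit message (the other 8 of the 21 are not spelled out in this PR), and `is_temporal_query` is an illustrative helper name, not necessarily what `echo.rs` uses.

```rust
// Stand-in for crates/shrimpk-core/src/config.rs.
mod config {
    /// Partial list for illustration: only the keywords named in the KS76
    /// commit message. The real constant has 21 entries.
    pub const TEMPORAL_QUERY_KEYWORDS: &[&str] = &[
        "recently", "today", "yesterday",
        "last week", "last month", "last year",
        "just now", "this morning", "this week", "this month",
        "days ago", "weeks ago", "months ago",
    ];
}

// Stand-in for crates/shrimpk-core/src/lib.rs: re-export at the crate root.
pub use config::TEMPORAL_QUERY_KEYWORDS;

// Stand-in for crates/shrimpk-memory/src/echo.rs, which reads the shared
// constant instead of keeping its own inline keyword list.
mod echo {
    use crate::TEMPORAL_QUERY_KEYWORDS;

    /// Hypothetical helper: true if the lowercased query contains any
    /// shared temporal keyword (substring match).
    pub fn is_temporal_query(query: &str) -> bool {
        let q = query.to_lowercase();
        TEMPORAL_QUERY_KEYWORDS.iter().any(|kw| q.contains(*kw))
    }
}
```

Across real crates, `echo.rs` would import the constant through the crate-root path (`use shrimpk_core::TEMPORAL_QUERY_KEYWORDS;`), so adding a keyword in `config.rs` updates every consumer at once.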

Test plan

cargo test --workspace — 629 passed, 0 failed
cargo clippy --workspace -- -D warnings — 0 warnings
Cross-model prompt test: gemma3:1b (5/5), qwen2.5:3b (5/5), llama3.2:3b (5/5) — 0 refusals
New unit test temporal_boost_uses_shared_keywords passes
Python syntax valid (all 3 files)
Prompt text identical across all 3 benchmark files
LME-S full run with GPT-4o judge (post-merge, ~$1.50)

🤖 Generated with Claude Code

- Replace per-file prompt variants with canonical READER_SYSTEM_PROMPT and READER_USER_TEMPLATE module-level constants
- Add context fence (-----) separating memories from question
- Add "not prior knowledge" constraint for grounded extraction
- Add "Answer:" completion suffix for consistent output format
- Remove hedge clause ("say you don't have that information") from v1
- Standardize v1 to temperature=0.0, num_predict=64 (matching v2)
- All 3 files now use identical prompt text

Files: run_longmemeval.py, run_longmemeval_v2.py, cross_model_smoke.py

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…e score cap

- Replace inline TEMPORAL_KEYWORDS with shared TEMPORAL_QUERY_KEYWORDS from shrimpk_core (adds: recently, today, yesterday, last week/month/year, just now, this morning/week/month, days/weeks/months ago)
- Increase temporal boost from +0.015 to +0.08 for meaningful ranking impact
- Raise score inflation cap from 0.35 to 0.50 for temporal + importance headroom
- Update existing test assertion to match new boost magnitude
- Add temporal_boost_uses_shared_keywords test covering "recently" keyword

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Graph polish: view transitions, Louvain viz, edge labels, temporal slider, custom node shapes, entity super-nodes, echo-frequency sizing
- Memory curation: inline edit, merge, manual links, retag, bulk ops
- Export formats: per-memory JSON, GraphML/GEXF graph export

… 0.25

- Add TEMPORAL_QUERY_KEYWORDS (21 keywords) to shrimpk-core/config.rs as single source of truth for echo scoring and reformulation
- Re-export from shrimpk_core lib.rs
- Change importance_weight default from 0.0 to 0.25 so importance scoring contributes to final ranking out of the box

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

greptile-apps · 2026-04-09T00:51:35Z

Greptile Summary

This PR delivers three coordinated improvements to the ShrimPK memory kernel: a unified extraction-framing prompt across all three benchmark runners (eliminating LLM refusals), a temporal boost refactor that replaces a stale 9-keyword inline list with a shared 21-keyword constant and raises the boost magnitude from +0.015 to +0.08, and an importance-weight activation change that switches the default from 0.0 to 0.25 so the scoring factor actually influences retrieval. All three tracks are well-tested (629 passing, new temporal_boost_uses_shared_keywords test added).

Key changes:

TEMPORAL_QUERY_KEYWORDS constant introduced in shrimpk-core as single source of truth; re-exported at crate root — eliminates the keyword mismatch that prevented "recently"/"last week" from triggering boosts
importance_weight default changed from 0.0 → 0.25 in EchoConfig::default(); score inflation cap raised from +0.35 → +0.50 to give the now-active boosts headroom
#[serde(default)] on importance_weight was not updated — it still resolves to f32::default() = 0.0 if EchoConfig is ever directly deserialized, diverging from the Default impl (see inline comment); fix is a one-liner following the activation_weight pattern
Benchmark prompt is now identical across all three files — a clean canonical text that frames the task as extraction rather than hedged QA

Confidence Score: 4/5

Safe to merge after fixing the #[serde(default)] divergence on importance_weight; all other changes are well-tested and correct.

The three tracks are coherent and all pass CI (629 tests). The only blocking issue is #[serde(default)] on importance_weight still resolving to 0.0 via f32::default() while the manual Default impl returns 0.25 — any direct EchoConfig deserialization silently reverts the intended default. It is a one-line fix following the existing activation_weight pattern. The float-precision assertion style is a P2 nit.

crates/shrimpk-core/src/config.rs — #[serde(default)] on importance_weight needs to be changed to #[serde(default = "default_importance_weight")].

Vulnerabilities

No security concerns identified. Changes are limited to scoring constants, a shared keyword list, and benchmark prompt text. No new network surface, authentication paths, or secret handling.

Important Files Changed

Filename	Overview
crates/shrimpk-core/src/config.rs	Adds `TEMPORAL_QUERY_KEYWORDS` constant (21 keywords, single source of truth) and changes `importance_weight` default to 0.25, but `#[serde(default)]` on that field still resolves to 0.0 via `f32::default()`, creating a divergence from the manual `Default` impl.
crates/shrimpk-core/src/lib.rs	Re-exports `TEMPORAL_QUERY_KEYWORDS` at the crate root — clean, minimal change.
crates/shrimpk-memory/src/echo.rs	Replaces inline 9-keyword temporal list with shared constant (21 keywords), raises boost from +0.015 to +0.08, raises score cap from +0.35 to +0.50, updates existing test, and adds `temporal_boost_uses_shared_keywords` test. Float equality assertions use `f64::EPSILON`, which is fragile for accumulated float arithmetic.
benchmarks/run_longmemeval.py	Replaces hedge-clause QA prompt with extraction-framing prompt; logic and structure unchanged.
benchmarks/run_longmemeval_v2.py	Aligned to the same canonical extraction prompt as `run_longmemeval.py` and `cross_model_smoke.py`; no behavioral changes.
benchmarks/cross_model_smoke.py	Prompt updated to match canonical extraction prompt; no other logic changes.
BACKLOG.md	New backlog tracking file added; documentation only, no code impact.
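The P2 nit about `f64::EPSILON` can be addressed with an explicit tolerance. `f64::EPSILON` is the gap between 1.0 and the next representable `f64`, so it is a very tight bound once several additions (similarity, boosts, clamping) have accumulated rounding error. A minimal sketch, assuming scores are `f64` (as the `f64::EPSILON` assertions suggest); `scores_close`, `assert_score_eq`, and the `1e-9` tolerance are illustrative choices, not the project's existing test helpers.

```rust
/// Absolute tolerance for comparing ranking scores. Illustrative value:
/// loose enough to absorb rounding from a few additions, yet far tighter
/// than the 0.08 boost step the tests care about.
const SCORE_TOL: f64 = 1e-9;

/// Returns true when two scores are equal within SCORE_TOL.
fn scores_close(a: f64, b: f64) -> bool {
    (a - b).abs() <= SCORE_TOL
}

/// Panics with both values in the message if the scores differ by more
/// than SCORE_TOL.
fn assert_score_eq(actual: f64, expected: f64) {
    assert!(
        scores_close(actual, expected),
        "score {} differs from expected {} by more than {}",
        actual,
        expected,
        SCORE_TOL
    );
}
```

An assertion of the form `assert!((score - expected).abs() < f64::EPSILON)` would become `assert_score_eq(score, expected)`, and the failure message then reports both values.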

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Q[Query string] --> TL[to_lowercase]
    TL --> KW{Any keyword in TEMPORAL_QUERY_KEYWORDS 21 shared terms}
    KW -- No --> SKIP[No temporal boost applied]
    KW -- Yes --> ITER[Iterate EchoResults]
    ITER --> TLB{result.labels contains temporal:* prefix?}
    TLB -- No --> NEXT[Next result]
    TLB -- Yes --> BOOST[final_score += 0.08]
    BOOST --> CAP{final_score > similarity + 0.50?}
    CAP -- No --> NEXT
    CAP -- Yes --> CLAMP[final_score = similarity + 0.50]
    CLAMP --> NEXT
    NEXT --> SORT[Re-sort by final_score]
    SORT --> OUT[EchoResults returned]
```
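The flowchart's control flow can be sketched in Rust as follows. This is a hypothetical reconstruction from the diagram, not the code in `echo.rs`. The `EchoResult` field names and types, the function name, and the partial 13-keyword list are assumptions. Only the +0.08 boost, the `temporal:` label prefix, the `similarity + 0.50` cap, and the final re-sort come from the PR.

```rust
/// Assumed shape of a retrieval result; the real EchoResult has more fields.
#[derive(Debug, Clone)]
struct EchoResult {
    similarity: f64,
    final_score: f64,
    labels: Vec<String>,
}

const TEMPORAL_BOOST: f64 = 0.08; // raised from 0.015 in this PR
const SCORE_CAP: f64 = 0.50; // max inflation over raw similarity (was 0.35)

/// Partial keyword list for illustration (13 of the 21 shared keywords).
const TEMPORAL_QUERY_KEYWORDS: &[&str] = &[
    "recently", "today", "yesterday", "last week", "last month", "last year",
    "just now", "this morning", "this week", "this month",
    "days ago", "weeks ago", "months ago",
];

fn apply_temporal_boost(query: &str, results: &mut [EchoResult]) {
    let q = query.to_lowercase();
    // No keyword in the query: leave scores and order untouched.
    if !TEMPORAL_QUERY_KEYWORDS.iter().any(|kw| q.contains(*kw)) {
        return;
    }
    for r in results.iter_mut() {
        // Only memories carrying a temporal:* label receive the boost.
        if r.labels.iter().any(|l| l.starts_with("temporal:")) {
            r.final_score += TEMPORAL_BOOST;
            // Clamp so total inflation never exceeds similarity + SCORE_CAP.
            r.final_score = r.final_score.min(r.similarity + SCORE_CAP);
        }
    }
    // Re-sort best-first; total_cmp gives a total order over f64.
    results.sort_by(|a, b| b.final_score.total_cmp(&a.final_score));
}
```

With the old +0.015 step, a temporal memory trailing by a few hundredths of similarity could not overtake a non-temporal one; at +0.08 it can. The cap bounds how far boosts can push any single result above its raw similarity.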

Comments Outside Diff (1)

crates/shrimpk-core/src/config.rs, lines 289-290

#[serde(default)] diverges from Default impl for importance_weight

#[serde(default)] with no argument resolves to the field type's Default::default() — i.e. f32::default() = 0.0. But the manual Default for EchoConfig was just changed to return 0.25. These two deserialization paths are now inconsistent:
```rust
// line 289 – what serde uses when this field is absent in the input
#[serde(default)]           // ← f32::default() = 0.0
pub importance_weight: f32, // ← but Default::default() is now 0.25
```
In the current production flow (FileConfig → resolve_config → EchoConfig::auto_detect()) this is invisible because EchoConfig is never directly deserialized from TOML. However, EchoConfig derives Deserialize, so any direct round-trip (integration tests, future API endpoints, or toml::from_str::<EchoConfig>()) will silently get 0.0 instead of 0.25, re-introducing the "computed but unused" bug that this PR is supposed to fix.

The rest of the codebase uses the correct pattern for non-zero defaults (compare activation_weight on line 286). Apply the same pattern here:
```rust
fn default_importance_weight() -> f32 {
    0.25
}

// in EchoConfig struct:
/// Weight of importance boost in scoring formula (0.0 = consolidation only).
#[serde(default = "default_importance_weight")]
pub importance_weight: f32,
```
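To see why the two paths diverge, it helps to model what serde does when the field is absent. The sketch below is a stdlib-only analogy, not serde's implementation. A missing `importance_weight` is modelled as `None`. Bare `#[serde(default)]` behaves like `unwrap_or_default()`, which yields `f32::default()`, i.e. `0.0`. `#[serde(default = "default_importance_weight")]` behaves like `unwrap_or_else(default_importance_weight)`. The two `resolve_*` function names are hypothetical.

```rust
/// Same helper the review proposes adding to config.rs.
fn default_importance_weight() -> f32 {
    0.25
}

/// Analogy for bare #[serde(default)]: a missing field falls back to the
/// field *type's* Default, which for f32 is 0.0.
fn resolve_with_type_default(field: Option<f32>) -> f32 {
    field.unwrap_or_default()
}

/// Analogy for #[serde(default = "default_importance_weight")]: a missing
/// field falls back to the named function instead.
fn resolve_with_named_default(field: Option<f32>) -> f32 {
    field.unwrap_or_else(default_importance_weight)
}
```

An explicitly set value wins in both cases. Only the absent-field case differs, which is exactly the `toml::from_str::<EchoConfig>()` path described above.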

Reviews (1): Last reviewed commit: "KS76: add shared TEMPORAL_QUERY_KEYWORDS..."

Liorrr and others added 4 commits April 9, 2026 03:03

Liorrr merged commit d55c5f2 into master Apr 9, 2026
7 checks passed

Liorrr deleted the feat/ks76-prompt-retrieval branch April 9, 2026 12:50

Liorrr merged 4 commits into master from feat/ks76-prompt-retrieval
