diff --git a/CHANGELOG.md b/CHANGELOG.md
index c4647fa..fac3733 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,31 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

 ## [Unreleased]

+## [0.7.5] — 2026-04-10
+
+### Added
+- **GraphRAG hybrid retrieval (KS61-KS64):** Full hybrid GraphRAG pipeline with label-graph navigation, 14 MCP tools (12 core plus 2 feature-gated multimodal), 517 tests
+- **GraphRAG Viz MVP (KS65):** Tauri + Sigma.js graph visualization app with 3 daemon endpoints and LOD (level-of-detail) architecture
+- **Schema-driven fact extraction (KS67):** LLM-based structured extraction with supersession for knowledge updates; 80% micro-benchmark recall
+- **Entity unification (KS73):** EntityFrame and EntityId-based supersession for structured entity tracking
+- **Configurable embedding (KS75):** EmbeddingProvider trait with 10 fastembed models and OpenAI API support
+- **Universal prompt (KS76):** Single prompt template for all reader models (no per-model tuning); 5-signal importance scoring
+- **Temporal boost (KS76):** Temporal-aware retrieval weighting for time-sensitive queries
+- **Multiplicative supersession demotion (KS78):** Superseded memories receive a 0.40x multiplicative penalty (configurable)
+- **New MCP tools:** Added `memory_graph`, `memory_related`, `memory_get` for graph navigation; `config_set` and `persist` (management) are now part of the documented core tool set
+
+### Changed
+- **MCP tool count:** 9 to 12 tools (graph navigation + management tools)
+- **Benchmark results:** 19/20 seeded micro-benchmark, 5/5 abstention, 3/3 negative retrieval, 24.2% LME-S (GPT-4o judge)
+- **Default reader model:** qwen2.5:1.5b for consolidation
+
+### Fixed
+- **Persistence format bug (Issue #16):** Format version mismatch caused MCP store/echo to fail
+- **KU-3 knowledge update (KS77):** Fixed retrieval for updated knowledge entries
+- **Temporal label dedup trap (KS77):** Prevented an adverse dedup interaction when the parent memory has temporal content
+- **Child memory pipeline rewrite (KS69):** Consolidation redesign fixing the IE-1 and KU-1 failure categories
+- **`superseded_count` declaration (KS78):** The variable is now declared in both the temporal and standard code paths
+
 ## [0.7.0] — 2026-04-02

 ### Added
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 4d61ee5..7c3c8dd 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -30,7 +30,7 @@ Unit tests run entirely in-memory and complete in seconds. Integration tests dow
 | `shrimpk-security` | Sandbox, permissions | Planned (stub) |
 | `shrimpk-kernel` | Integration facade | Stable |
 | `shrimpk-python` | PyO3 bindings | Exists (untested in CI) |
-| `shrimpk-mcp` | MCP server (9 tools) | Stable |
+| `shrimpk-mcp` | MCP server (12 tools) | Stable |
 | `shrimpk-daemon` | HTTP daemon + proxy | Stable |
 | `shrimpk-tray` | System tray app | Stable |

diff --git a/SECURITY.md b/SECURITY.md
index 402e1a4..437fc8f 100644
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -4,8 +4,8 @@

 | Version | Supported |
 |---------|-----------|
-| 0.5.x (latest) | Yes |
-| < 0.5.0 | No |
+| 0.7.x (latest) | Yes |
+| < 0.7.0 | No |

 Only the latest released version receives security fixes. If you are running an older version, please upgrade before reporting.

diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index c283c75..8b4ffee 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -662,7 +662,7 @@ Integration layer that wires together `shrimpk-memory`, `shrimpk-context`, and `

 ### shrimpk-mcp

-Model Context Protocol server. 
Exposes Echo Memory as MCP tools (`store`, `echo`, `stats`, `forget`, `status`, `config_show`, `dump`) via JSON-RPC 2.0 over stdio. Compatible with any MCP-aware AI client. +Model Context Protocol server. Exposes Echo Memory as 12 MCP tools (`store`, `echo`, `memory_graph`, `memory_related`, `memory_get`, `stats`, `forget`, `status`, `config_show`, `config_set`, `dump`, `persist`) via JSON-RPC 2.0 over stdio. Compatible with any MCP-aware AI client. Key design: the `EchoEngine` is lazily initialized on first tool call. The server starts in milliseconds; fastembed model loading (a few seconds) is deferred until a memory operation is actually requested. diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md index 0468bf2..2f0d61a 100644 --- a/docs/ROADMAP.md +++ b/docs/ROADMAP.md @@ -1,331 +1,221 @@ # ShrimPK Roadmap This roadmap reflects the current state of the kernel and planned directions for future releases. -Dates are aspirational. Contributions are welcome at any stage — see the Contribution Opportunities +Dates are aspirational. Contributions are welcome at any stage -- see the Contribution Opportunities section for specific items you can pick up today. --- -## Current State — v0.5.0 +## Current State -- v0.7.5 -Released March 2026. The core pipeline is stable and benchmarked. +Released April 2026. The kernel is a mature push-based AI memory system with hybrid GraphRAG +retrieval, entity unification, configurable embedding, and universal prompt support. + +### Workspace + +11 crates + CLI binary: + +| Crate | Purpose | +|-------|---------| +| `shrimpk-core` | Types: MemoryEntry, EchoResult, EchoConfig, Modality | +| `shrimpk-memory` | Engine: EchoEngine, embedding, LSH, Bloom, Hebbian, labels, FSRS decay, ACT-R activation | +| `shrimpk-daemon` | HTTP server: axum, proxy, routes (/health, /debug, /v1/chat/completions) | +| `shrimpk-mcp` | MCP server (stdio): 12 tools for memory management and graph navigation | +| `shrimpk-context` | ContextAssembler: token-budgeted prompt compilation | +| `shrimpk-router` | CascadeRouter: provider routing (not yet wired in daemon) | +| `shrimpk-security` | PII masking (stub -- 6 categories, 14 regex patterns) | +| `shrimpk-kernel` | Facade crate re-exporting core + memory + context | +| `shrimpk-python` | PyO3 bindings (maturin) | +| `shrimpk-ros2` | ROS2 bridge (stub) | +| `shrimpk-tray` | Windows system tray (win32) | +| `cli/` | CLI binary: store, echo, status, explore (ratatui TUI) | ### What is shipped and working **Echo pipeline** The full retrieval chain is operational: Bloom filter pre-screening (O(1) topic elimination), -LSH candidate retrieval (sub-linear at scale), cosine reranking, Hebbian co-activation boosting, -and recency decay. Optional HyDE (hypothetical document expansion) and LLM reranking are -available via config flags. +LSH candidate retrieval (sub-linear at scale), label-based pre-filtering, cosine reranking, +Hebbian co-activation boosting, FSRS decay, ACT-R activation, temporal boost, importance +scoring, and multiplicative supersession demotion. Optional HyDE (hypothetical document +expansion) and LLM reranking are available via config flags. -**Text memory — BGE-small-EN-v1.5** +**Hybrid GraphRAG (KS61-KS64)** -Primary embedding model: `BAAI/bge-small-en-v1.5` via fastembed. 
The pipeline achieves 84% -top-3 recall (combined HyDE + LLM reranker config) on a realistic 41-memory, 25-query benchmark -spanning five LongMemEval categories: information extraction, multi-session reasoning, temporal -reasoning, knowledge update, and preference tracking. Temporal queries hit 100% (5/5) across -all pipeline configs. +Full hybrid GraphRAG pipeline combining vector similarity with label-graph traversal. +Label-graph navigation enables neighborhood exploration from any memory. 14 MCP tools +support store, retrieval, graph exploration, and management operations. 517 tests cover +the complete pipeline. -**Vision memory — CLIP ViT-B/32** +**Entity unification (KS73)** -Image memories are embedded using CLIP ViT-B/32 (512-dim) via fastembed's `ClipVitB32` variant. -Cross-modal retrieval (text queries retrieving image memories) works in the same embedding space. -The vision feature is gated behind `--features vision`. +EntityFrame and EntityId-based supersession for structured entity tracking. When new +information contradicts or updates an existing entity, the old memory is superseded and +receives a multiplicative demotion penalty (default 0.40x, configurable). This prevents +stale knowledge from outranking current facts. -**Sleep consolidation** +**Configurable embedding (KS75)** -A background consolidation pass runs during idle periods (configurable schedule). It uses a local -LLM via Ollama to extract atomic facts from raw memories, de-duplicate, and merge related entries. -In benchmarks, consolidation lifted top-3 recall from 72% to 76% over the baseline (no -consolidation) configuration. +EmbeddingProvider trait abstraction with 10 fastembed models and OpenAI API support. +Default model: BGE-small-EN-v1.5 (384-dim) via fastembed. The provider can be swapped +at configuration time without code changes. -**SHRM v2 storage format** +**Universal prompt (KS76)** -Memory-mapped binary format with 32-bit CRC per entry, atomic flush, and crash recovery. Stores -text embeddings (384-dim), optional vision embeddings (512-dim), optional speech embeddings -(640-dim field, populated from v0.6.0 onward), metadata, and sensitivity labels. +One prompt template for all reader models. No per-model tuning required. Validated with +qwen2.5:1.5b (default) and qwen2.5:3b. Includes temporal boost for time-sensitive queries +and a 5-signal importance scoring system. -**Speech architecture (structure only)** +**Multimodal SHRM v2** + +Memory-mapped binary format with 32-bit CRC per entry, atomic flush, and crash recovery. +Stores text embeddings (384-dim), optional vision embeddings (512-dim), optional speech +embeddings (640-dim), metadata, and sensitivity labels. Three-channel architecture: text +(BGE-small-EN-v1.5), vision (CLIP ViT-B/32), speech (ECAPA-TDNN 256 + Whisper-tiny 384). + +**Sleep consolidation** -`shrimpk-memory/src/speech.rs` defines the full `SpeechEmbedder` struct with dimension constants -(`SPEAKER_DIM=256`, `PROSODY_DIM=384`, `SPEECH_DIM=640`), Whisper log-Mel preprocessing, and -ONNX sessions wired in v0.6.0. The 16 kHz resampler uses linear interpolation. +Background consolidation using a local LLM via Ollama with schema-driven fact extraction. +Child memory pipeline creates atomic facts from raw memories, supports supersession for +knowledge updates. Default reader model: qwen2.5:1.5b. 
-**MCP server**
+**MCP server (12 tools)**

-`shrimpk-mcp` exposes nine tools over stdio: `store`, `echo`, `forget`, `stats`, `status`,
-`config_show`, `config_set`, `dump`, `persist` (plus `store_image` and `store_audio` when
-multimodal features are enabled). Compatible with Claude Desktop and any MCP client.
+`shrimpk-mcp` exposes 12 tools over stdio: `store`, `echo`, `memory_graph`,
+`memory_related`, `memory_get`, `stats`, `forget`, `status`, `config_show`, `config_set`,
+`dump`, `persist`. Additional multimodal tools (`store_image`, `store_audio`) are
+available when feature flags are enabled. Compatible with Claude Desktop and any MCP client.

 **Daemon + tray**

-`shrimpk-daemon` runs as a background HTTP service on `localhost:11435`. `shrimpk-tray` provides
-a system tray icon and launch/stop controls on Windows.
+`shrimpk-daemon` runs as a background HTTP service on `localhost:11435` with OpenAI-compatible
+proxy (`/v1/chat/completions`). `shrimpk-tray` provides a system tray icon and launch/stop
+controls on Windows.
+
+### Benchmark results
+
+| Benchmark | Score |
+|-----------|-------|
+| Seeded micro-benchmark | 19/20 |
+| Abstention (no-answer detection) | 5/5 |
+| Negative retrieval | 3/3 |
+| LME-S (GPT-4o judge) | 24.2% overall, 25.3% task-avg |

 **Performance (release build, i7-1165G7)**

 | Metric | Result |
 |--------|--------|
 | P50 echo latency at 10K memories | 3.50ms |
-| P50 echo latency at 100K memories | 23.79ms (regression — see Known Issues) |
 | Store throughput | ~128 memories/sec |
 | RAM (10K text memories) | ~85 MB |

 ---
-
-## v0.6.0 — Speech and Vision Upgrade
-
-Target: Q2 2026. Focus: wire the speech ONNX models and upgrade the vision model.
-
-### Speech: ONNX models wired (640-dim — DONE in KS51)
-
-The speech pipeline is **640-dim** (ECAPA-TDNN 256 + Whisper-tiny encoder 384). The emotion
-channel (Wav2Small, CC-BY-NC-SA-4.0) was dropped as license-incompatible. Both wired models
-carry permissive licenses: ECAPA-TDNN (Apache-2.0) and Whisper-tiny (MIT).
-
-#### ECAPA-TDNN 256-dim — speaker identification
-
-Model: `Wespeaker/wespeaker-cnceleb-resnet34-LM` (`cnceleb_resnet34_LM.onnx`, ~24 MB,
-Apache 2.0). Loaded via `ort` (ONNX Runtime Rust crate). Auto-downloads from HuggingFace Hub.
-
-Input: 80-bin FBank features, shape `(1, frames, 80)`, 25ms frame, 10ms hop, 16 kHz.
-Output: 256-dim L2-normalized speaker embedding (output name: `embs`).
-
-#### Whisper-tiny encoder 384-dim — prosody
-
-Model: `onnx-community/whisper-tiny` (`onnx/encoder_model.onnx`, 32.9 MB, MIT). The encoder
-takes 80-bin Whisper log-Mel spectrogram, shape `(batch, 80, 3000)`, padded to 30 seconds.
-Mean-pooling over the sequence dimension produces a 384-dim prosody vector.
-
-#### Spectrogram preprocessing
-
-Two spectrogram pipelines run in parallel:
-
-- **Kaldi fbank** for ECAPA-TDNN: 80 Mel bins, 25ms frame, 10ms hop, 16 kHz. Implementation via
-  the `mel-spec` crate (v0.3.4, MIT).
-- **Whisper log-Mel** for the encoder: 80 Mel bins, N_FFT=400, hop=160 samples, normalized as
-  `(log_spec + 4.0) / 4.0`. Also handled by `mel-spec`.
-
-#### Band-limited resampling
-
-The current `resample_linear()` stub in `speech.rs` introduces aliasing at high downsample ratios
-(e.g., 48 kHz → 16 kHz). v0.6.0 replaces it with the `rubato` crate (v1.0.1), which provides
-sinc-interpolation and FFT-based resamplers that are alias-free.
-
-#### VAD gate — Silero VAD
-
-A Voice Activity Detection pass runs before the ECAPA and Whisper sessions. 
Silent frames -(below a configurable threshold) are skipped entirely to avoid embedding noise as speech. -Silero VAD is loaded as a small ONNX model (~2 MB, MIT license) via a direct `ort::Session`. -The `silero-vad` crate on crates.io is GPL-2.0 and is explicitly avoided — the ONNX model -is loaded directly. +### Key milestones (KS67-KS78) -#### ort version pinning - -fastembed v5.x pins `ort = "=2.0.0-rc.11"`. The speech code must use the exact same version -to avoid Cargo dependency conflicts. Do not add `ort` as a direct workspace dependency with a -different version specifier. - -#### Model download on first use - -Models are downloaded on first `SpeechEmbedder::from_config()` call if not already cached, -following the fastembed pattern: `hf-hub` crate + `dirs::cache_dir()/shrimpk/models/speech/`. -Total first-use download: ~60 MB (ECAPA 25 MB + Whisper encoder 33 MB + Silero VAD 2 MB). - -### Vision: CLIP ViT-B/32 → Nomic Embed Vision v1.5 (512 → 768-dim) - -`NomicEmbedVisionV15` is already a first-class variant in fastembed v5 (`ImageEmbeddingModel` -enum). The swap is a single-line change in `embedder.rs`. The quality improvement is substantial: -+7.8 percentage points on ImageNet zero-shot (71.0% vs 63.2%) and dramatically better cross-modal -MTEB quality (62.28 vs 43.82 for the paired text model). The q4-quantized ONNX is 62 MB vs -CLIP's unquantized 352 MB — a 6x size reduction. - -The 512 → 768 dimension change is a **breaking migration** for stored vision embeddings. The -SHRM v2 format header records embedding dimensions per modality. On first launch after upgrade, -the kernel will detect the dimension mismatch, re-embed all stored vision memories, and rewrite -the store. For the v0.5.0 → v0.6.0 transition the user base is small and a hard-cut re-embed -is the correct strategy. A migration guide will be included in the release notes. - -Cross-modal text queries against vision memories must use Nomic Text v1.5 with the mandatory -`search_query:` prefix. This is handled internally by the embedder — callers do not need to -add the prefix manually. - -### Fix: 100K latency regression - -The P50 latency at 100K memories is 23.79ms against a 4.0ms target. Investigation is required -before v0.6.0 ships. See Known Issues for details. +| Sprint | Milestone | +|--------|-----------| +| KS67 | Schema-driven fact extraction, 80% micro-benchmark recall | +| KS68 | IE-1 + KU-1 fixed, 17/20 embedding-only, Greptile P1s resolved | +| KS69 | Consolidation redesign, child memory pipeline rewrite, 19/20 seeded | +| KS70 | 20/20 seeded, qwen2.5:1.5b default, first real consolidation validation | +| KS73 | Entity unification, EntityFrame, EntityId supersession | +| KS75 | Configurable embedding: EmbeddingProvider trait, 10 models, OpenAI API | +| KS76 | Universal prompt, temporal boost, importance scoring | +| KS77 | 19/20 seeded, 5/5 abstention, KU-3 fixed, temporal dedup trap found | +| KS78 | Multiplicative supersession demotion (0.40x default) | --- -## v0.7.0 — Robotics, Speaker Upgrade, and Quantization - -Target: Q3 2026. Focus: ROS2 integration, model quality improvements, and memory footprint. - -### ROS2 bridge — `shrimpk-ros2` crate - -A new workspace crate `crates/shrimpk-ros2` will provide a ROS2 node that exposes ShrimPK -memory over standard ROS2 topics and services. 
- -The node subscribes to: -- `/shrimpk/store/text` (`std_msgs/String`) — text memories -- `/shrimpk/store/image` (`sensor_msgs/CompressedImage`) — visual memories via CLIP -- `/shrimpk/store/audio` (`audio_common_msgs/AudioStamped`) — speech memories +## Next -- KS79: Multi-Resolution Retrieval -The node publishes to: -- `/shrimpk/echo` (`shrimpk_msgs/EchoResults`) — push-activated memories -- `/shrimpk/context` (`std_msgs/String`, latched) — current context string for downstream LLMs -- `/shrimpk/status` (`std_msgs/String`, JSON) — health and latency stats +Target: Q2 2026. Focus: retrieval quality at multiple granularity levels. -A `/shrimpk/query` service (`shrimpk_msgs/EchoQuery`) supports pull-based querying for nodes -that prefer request/response semantics over the push model. +Multi-resolution retrieval allows the echo pipeline to match queries against memories at +different levels of abstraction -- raw memories, consolidated facts, entity summaries, and +topic clusters. This enables both precise fact lookup and broad contextual recall within +the same query. -Primary integration path: `rclrs` 0.7+ with colcon on ROS2 Jazzy (Ubuntu 24.04). -Alternative: `r2r` for simpler `cargo build` integration without colcon. -Optional feature flag: `ros2-native` using `ros2-client` (pure Rust DDS, no ROS2 install needed) -for distribution to users who do not have a full ROS2 environment. - -The echo latency budget is feasible: 3.50ms ShrimPK echo is well within a 30 Hz camera frame -(33ms). The full pipeline including embedding and topic publish should stay under 15–20ms. - -No other push-based memory system has a ROS2 bridge. ReMEmbR (NVIDIA) is pull-based and -Python-only. `shrimpk-ros2` would be the first native-Rust, push-activated memory layer for ROS2. - -### Speaker upgrade: ECAPA-TDNN → CAM++ - -CAM++ (Context-Aware Masking) achieves lower equal error rate than ECAPA-TDNN on VoxCeleb1/2 -at comparable model size. The upgrade is a drop-in replacement at the 512-dim output level -provided an Apache 2.0-compatible ONNX export is available. If no suitable pre-built ONNX exists, -the ECAPA-TDNN model ships in v0.7.0 and CAM++ is deferred to v0.8.0. +--- -### f16 quantization for vision and speech embeddings +## Next -- KS80: Memory Lifecycle -Stored vision and speech embeddings currently use f32 (4 bytes/dimension). A v0.7.0 storage -format revision (SHRM v3) will store these as f16 (2 bytes/dimension) with promotion to f32 -at query time. Impact: ~50% reduction in disk and memory footprint for vision/speech memories, -no measurable quality loss for cosine similarity. +Target: Q2 2026. Focus: memory aging, archival, and lifecycle management. -SHRM v3 will include automatic migration from v2 on first launch. +Formalize the memory lifecycle from creation through active use, staleness detection, +archival, and eventual pruning. Integrate FSRS scheduling data with usage patterns to +make informed retention decisions. Provide user-facing controls for lifecycle policies. --- -## Future — No Fixed Timeline +## Future -- No Fixed Timeline These items are research directions or require dependencies that are not yet settled. +### ROS2 bridge production readiness + +`shrimpk-ros2` exists as a stub. Production readiness requires ROS2 Jazzy integration +via `rclrs`, topic/service wiring, and latency validation within a 30 Hz camera frame +budget. The push-based architecture maps naturally to ROS2 topic publishing. 
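+
+For a feel of the wiring, a minimal sketch using `r2r` (the plain-cargo alternative to
+`rclrs` mentioned in the draft removed above) follows. Everything here is a placeholder,
+not the `shrimpk-ros2` design: the `/shrimpk/*` topic names follow that draft's layout,
+the echo handler is stubbed out, and the `r2r`, `tokio`, and `futures` crates are assumed.
+
+```rust
+// Sketch only: bridge one store topic to a stub echo publisher.
+use futures::StreamExt;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let ctx = r2r::Context::create()?;
+    let mut node = r2r::Node::create(ctx, "shrimpk_bridge", "")?;
+
+    // Text memories in, push-activated echoes out (draft topic names).
+    let mut store_sub = node.subscribe::<r2r::std_msgs::msg::String>(
+        "/shrimpk/store/text",
+        r2r::QosProfile::default(),
+    )?;
+    let echo_pub = node.create_publisher::<r2r::std_msgs::msg::String>(
+        "/shrimpk/echo",
+        r2r::QosProfile::default(),
+    )?;
+
+    // Pump the ROS2 executor on a blocking thread.
+    tokio::task::spawn_blocking(move || loop {
+        node.spin_once(std::time::Duration::from_millis(10));
+    });
+
+    while let Some(msg) = store_sub.next().await {
+        // Placeholder: a real bridge would call the EchoEngine store + echo
+        // path here, which must fit the 30 Hz (33ms) frame budget above.
+        let reply = r2r::std_msgs::msg::String { data: format!("echo: {}", msg.data) };
+        echo_pub.publish(&reply)?;
+    }
+    Ok(())
+}
+```
+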
+ ### Custom fine-tuned embedding model -The text embedding model (BGE-small) is a general-purpose model trained on web text. A model -fine-tuned specifically on personal memory data (short episodic sentences, user preferences, -recurring entities) could improve recall quality without increasing model size. This requires -a labeled dataset and an ML training pipeline — it is a research item, not an implementation task. +A model fine-tuned specifically on personal memory data (short episodic sentences, user +preferences, recurring entities) could improve recall quality without increasing model +size. This requires a labeled dataset and an ML training pipeline. ### crates.io publish Publishing `shrimpk-core`, `shrimpk-memory`, and (eventually) `shrimpk-ros2` to crates.io -is planned once the API stabilizes beyond v0.6.0. The current pre-1.0 semver signals that -breaking changes are expected. +is planned once the API stabilizes. The current pre-1.0 semver signals that breaking +changes are expected. ### Cloud sync -Optional encrypted sync of the memory store across devices. End-to-end encrypted, the server -sees only ciphertext. The key design question is key management — the server must never hold -decryption keys. This is a future research and design item. +Optional encrypted sync of the memory store across devices. End-to-end encrypted, the +server sees only ciphertext. The key design question is key management -- the server must +never hold decryption keys. -### Emotion channel +### Vision model upgrade -The 3-dim arousal/dominance/valence emotion channel is architecturally present in `speech.rs` -(`EMOTION_DIM=3`) but has no available ONNX model under a permissive license. If a suitable -Apache 2.0 or MIT model emerges, the emotion channel can be re-enabled without a breaking change -to the storage format (the slot is reserved). Alternatively, a categorical speech emotion -recognition model (4-class: angry, happy, sad, neutral) under a permissive license could -replace the dimensional approach. +Nomic Embed Vision v1.5 or SigLIP 2 as a CLIP replacement. The 512 to 768 dimension +change would be a breaking migration for stored vision embeddings. Deferred until the +user base is large enough to justify the migration complexity. + +### Speaker upgrade: ECAPA-TDNN to CAM++ + +CAM++ (Context-Aware Masking) achieves lower equal error rate than ECAPA-TDNN on +VoxCeleb1/2. Blocked on availability of an Apache 2.0-compatible ONNX export. --- ## Contribution Opportunities -All issues below are open for contribution. The project uses Apache 2.0. Opening a discussion -issue before starting significant work is encouraged to avoid duplication. +All issues below are open for contribution. The project uses Apache 2.0. Opening a +discussion issue before starting significant work is encouraged to avoid duplication. ### Good first issue -**Fix vision feature flag propagation** (difficulty: low, Rust knowledge required) -Vision benchmarks (`echo_multimodal_bench.rs`) are blocked because -`#[cfg(feature = "vision")]` checks the root test crate's features, not `shrimpk-memory`'s. -The fix is adding a forwarding `vision` feature to the root `Cargo.toml` that enables -`shrimpk-memory/vision`. Estimated: 1–2 hours. - -**Add `search_query:` prefix for cross-modal text queries** (difficulty: low, Rust) -When Nomic Embed Vision v1.5 is the active vision model (v0.6.0), text queries used in -cross-modal retrieval must be prefixed with `"search_query: "`. 
This should be applied -automatically in `MultiEmbedder` when the Nomic vision model is active, not pushed to callers. -Requires reading the fastembed API and adding a model-variant check. - **Extend the Tier 2 benchmark with a CrossEncoder config** (difficulty: low, Rust) -The realistic Tier 2 benchmark tests four pipeline configs (Baseline, HyDE, Reranker-LLM, -Combined). A CrossEncoder-only config was benchmarked separately and showed strong results -(2,823ms average at 100% recall on 6 regression cases). Adding it to the standard Tier 2 -suite would complete the comparison matrix. +The realistic Tier 2 benchmark tests four pipeline configs. Adding a CrossEncoder-only +config would complete the comparison matrix. ### Help wanted -**Investigate 100K latency regression** (difficulty: medium, Rust + profiling) -P50 at 100K memories is 23.79ms against a 4.0ms target. Likely causes: LSH bucket saturation -with BGE-small embedding distribution, brute-force fallback frequency, or Windows I/O interference -during the benchmark. The investigation should profile LSH hit rate, Bloom false-positive rate, -and brute-force fallback frequency at scale. Tools: `perf`, `cargo flamegraph`, or the -`tracing` spans already in the echo path. A fix might involve tuning LSH parameters -(hash count, bucket width) for the BGE-small distribution. - -**~~Wire ECAPA-TDNN ONNX session~~** — DONE (KS51). Wespeaker ResNet34 256-dim, FBank -preprocessing implemented in pure Rust (`compute_fbank_flat()`), `ort` version matches -fastembed's pinned `=2.0.0-rc.11`. - -**~~Wire Whisper-tiny encoder ONNX session~~** — DONE (KS51). Whisper-tiny encoder takes -`(1, 80, 3000)` log-Mel spectrogram, outputs `(1, 1500, 384)` hidden states, mean-pooled -to 384-dim. -Preprocessing uses the Whisper log-Mel formula implemented in `mel-spec`. Can be done in -parallel with the ECAPA item by a different contributor. - -**Implement band-limited resampling with `rubato`** (difficulty: medium, Rust + DSP) -Replace `resample_linear()` in `speech.rs` with sinc or FFT-based resampling from the `rubato` -crate (v1.0.1). The current linear resampler causes aliasing at high downsample ratios and is -documented as a placeholder. The replacement should pass the existing `resample_*` unit tests -and add a new test verifying that a 1 kHz sine wave downsampled from 48 kHz to 16 kHz does not -contain aliasing artifacts above 8 kHz. - **Linux CI hardening** (difficulty: medium, DevOps + Rust) -The kernel builds and tests pass on CI for Linux and macOS, but the test coverage is lower than -on the primary Windows development machine. Specifically: daemon startup tests, tray icon tests, -and file locking tests need Linux-specific validation. Contributions improving Linux CI coverage -are welcome. +The kernel builds and tests pass on CI, but test coverage is lower on Linux than on the +primary Windows development machine. Contributions improving Linux CI coverage are welcome. + +**100K latency profiling** (difficulty: medium, Rust + profiling) +P50 at 100K memories needs investigation. Likely causes: LSH bucket saturation with +BGE-small embedding distribution, or brute-force fallback frequency. Tools: `perf`, +`cargo flamegraph`, or the `tracing` spans in the echo path. 
### Research needed **Emotion model under permissive license** (difficulty: high, ML research) -The 3-dim arousal/dominance/valence emotion slot in the speech pipeline is reserved but empty -because all mature dimensional emotion models (Wav2Small, wav2vec2-large-robust) carry -CC-BY-NC-SA-4.0 licenses. Options: (1) identify an existing Apache 2.0 / MIT categorical -speech emotion model that can be exported to ONNX and mapped to a valence proxy, (2) train a -small distillation model on CC0 or public-domain audio corpora, or (3) propose an alternative -paralinguistic dimension that has available permissive models. +The 3-dim arousal/dominance/valence emotion slot in the speech pipeline is reserved but +empty because all mature dimensional emotion models carry CC-BY-NC-SA-4.0 licenses. **LSH parameter tuning for BGE-small distribution** (difficulty: high, information retrieval) -The LSH index was tuned for `all-MiniLM-L6-v2` embeddings. The upgrade to `BGE-small-EN-v1.5` -changed the embedding distribution in ways that may require different hash count, bucket width, -or candidate list size to maintain sub-10ms P50 at 100K scale. This is an empirical research -task: vary LSH parameters, run the 100K latency benchmark, and identify the configuration that -recovers the 4.0ms target. - -**CAM++ Apache 2.0 ONNX availability** (difficulty: medium, ML research) -The v0.7.0 speaker upgrade to CAM++ depends on finding or producing an Apache 2.0-compatible -ONNX export. WeSpeaker provides CAM++ checkpoints but the license status of any pre-built -ONNX exports needs verification. This research item should produce a clear verdict: model ID, -license, ONNX file location, and input/output specification. - -**SigLIP 2 fastembed support** (difficulty: high, ML + Rust) -SigLIP 2 ViT-B/16 achieves 78.2% ImageNet zero-shot (vs Nomic Vision v1.5 at 71.0%) but has -no official ONNX model and no fastembed support as of March 2026. If an Apache 2.0 ONNX export -emerges, contributing a `SigLIP2VitB16` variant to fastembed and then updating ShrimPK's -vision channel would be a meaningful quality improvement. +The LSH index was tuned for all-MiniLM-L6-v2 embeddings. The upgrade to BGE-small changed +the embedding distribution in ways that may require different hash count, bucket width, or +candidate list size to maintain sub-10ms P50 at 100K scale.
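+
+Since this is an empirical task, a sweep harness along the following lines would do.
+`LshConfig` and the benchmark hook are hypothetical stand-ins, not actual shrimpk-memory
+types:
+
+```rust
+/// Hypothetical LSH knobs; the real index parameters live in shrimpk-memory.
+#[derive(Clone, Copy, Debug)]
+struct LshConfig {
+    num_hashes: usize,
+    bucket_width: f32,
+}
+
+/// Stand-in benchmark hook: rebuild the index with `cfg`, replay the 100K
+/// query set, and return the measured P50 in milliseconds.
+fn p50_latency_ms(cfg: LshConfig) -> f32 {
+    // Placeholder formula so the sketch runs; wire to the real bench instead.
+    4.0 + 0.1 * cfg.num_hashes as f32 / cfg.bucket_width
+}
+
+fn main() {
+    let mut best: Option<(LshConfig, f32)> = None;
+    for &num_hashes in &[8, 12, 16, 24] {
+        for &bucket_width in &[2.0, 4.0, 8.0] {
+            let cfg = LshConfig { num_hashes, bucket_width };
+            let p50 = p50_latency_ms(cfg);
+            println!("{cfg:?} -> P50 {p50:.2} ms");
+            if best.map_or(true, |(_, b)| p50 < b) {
+                best = Some((cfg, p50));
+            }
+        }
+    }
+    // The goal from the text above: a config that holds sub-10ms P50 at 100K.
+    println!("best: {best:?}");
+}
+```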