Background
Suggested by Trevor (tech lead). Users — nurses and midwives in Zanzibar — often type short, colloquial, or incomplete queries (e.g. "baby not breathing", "damu baada ya kujifungua" (Swahili: "bleeding after giving birth")). These may not semantically match the formal language used in WHO/MOHSW guideline documents, leading to poor retrieval even when the relevant content exists.
Query rewriting reformulates the user's input into a cleaner, more retrieval-friendly form before embedding and vector search.
Approaches to Investigate
1. LLM-based rewriting (on-device)
Use Gemma 4 E4B itself to rewrite the query before retrieval. A lightweight prompt like:
Rewrite the following clinical question in formal medical language suitable for searching clinical guidelines:
User query: "baby not breathing"
Rewritten: "Neonatal resuscitation for apnea at birth"
Tradeoff: Adds one LLM inference step before retrieval — latency cost to measure.
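The flow above can be sketched as follows. This is an illustrative Python sketch, not the app's Kotlin code: `generate` is a stub standing in for the on-device model call, and the canned return value mirrors the example rewrite from the prompt above.

```python
# Prompt template taken from the example above.
REWRITE_PROMPT = (
    "Rewrite the following clinical question in formal medical language "
    "suitable for searching clinical guidelines:\n"
    'User query: "{query}"\n'
    "Rewritten:"
)

def generate(prompt: str) -> str:
    # Stub: a real implementation would invoke the on-device LLM runtime.
    return "Neonatal resuscitation for apnea at birth"

def rewrite_query(query: str) -> str:
    rewritten = generate(REWRITE_PROMPT.format(query=query)).strip()
    # Fall back to the original query if the model returns nothing usable.
    return rewritten or query

print(rewrite_query("baby not breathing"))
```

The fallback matters on-device: if the small model produces an empty or truncated rewrite, retrieval should still run on the raw query rather than fail.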
2. HyDE (Hypothetical Document Embeddings)
Ask the LLM to generate a hypothetical answer, then embed that answer instead of the original query for retrieval. Often improves recall because a hypothetical answer is closer in vocabulary and style to guideline passages than a terse user query.
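A minimal sketch of the HyDE step, in illustrative Python rather than the project's Kotlin. Both `generate` and `embed` are stubs: the embedding here is a toy character-frequency vector, whereas a real system would reuse the sentence-embedding model already in the pipeline.

```python
def generate(prompt: str) -> str:
    # Stub for the on-device LLM; returns a guideline-style hypothetical answer.
    return ("Apply the neonatal resuscitation algorithm: dry and stimulate "
            "the newborn, then begin bag-mask ventilation.")

def embed(text: str) -> list[float]:
    # Toy embedding (26-dim letter counts) purely for illustration.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def hyde_embedding(query: str) -> list[float]:
    # Embed the hypothetical answer, not the query; vector search then
    # proceeds with this embedding exactly as before.
    hypothetical = generate(f"Write a short guideline-style answer to: {query}")
    return embed(hypothetical)
```

Note the retrieval index is untouched; only the query-side embedding changes, so HyDE can be A/B-tested without re-indexing.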
3. Multi-query retrieval
Generate multiple reformulations of the query, retrieve for each, then deduplicate/merge results before passing context to the LLM.
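The merge/deduplicate step can be sketched as below (illustrative Python; `retrieve` is a stub returning chunk IDs, where a real implementation would run vector search against the guideline index). Duplicates are dropped while preserving first-seen rank order.

```python
def retrieve(query: str, k: int = 3) -> list[str]:
    # Stub retriever: maps a query to ranked chunk IDs.
    corpus = {
        "baby not breathing": ["c1", "c2", "c3"],
        "neonatal apnea": ["c2", "c4", "c5"],
        "newborn resuscitation": ["c1", "c4", "c6"],
    }
    return corpus.get(query, [])

def multi_query_retrieve(reformulations: list[str], k: int = 5) -> list[str]:
    # Retrieve per reformulation, merge, dedupe, truncate to k chunks.
    seen: set[str] = set()
    merged: list[str] = []
    for q in reformulations:
        for chunk in retrieve(q):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged[:k]
```

First-seen ordering is a simple merge policy; a fancier option is reciprocal rank fusion, at the cost of tracking per-query ranks.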
4. Query expansion (lightweight, no LLM)
Append synonyms or related clinical terms to the query using a small medical ontology. No extra LLM call needed.
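A sketch of the no-LLM variant (illustrative Python; the synonym map here is a tiny hand-written example, not an actual medical ontology):

```python
# Hypothetical mini-ontology mapping colloquial terms to guideline vocabulary.
CLINICAL_SYNONYMS = {
    "not breathing": ["apnea", "asphyxia"],
    "bleeding": ["hemorrhage"],
    "baby": ["neonate", "newborn"],
}

def expand_query(query: str) -> str:
    # Append synonyms for any matched term so the embedded text also covers
    # the formal wording used in the guideline documents.
    lowered = query.lower()
    extra = []
    for term, synonyms in CLINICAL_SYNONYMS.items():
        if term in lowered:
            extra.extend(synonyms)
    return f"{query} {' '.join(extra)}" if extra else query
```

Because there is no model call, this adds effectively zero latency, but coverage is bounded by the ontology, which would also need Swahili entries to help the bilingual queries described above.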
Questions to Answer
Should rewriting live in RagPipeline.kt, or as a preprocessing step in RagStream.kt?
Notes
Query rewriting is most impactful when retrieval is the bottleneck. Recommend running the evaluation pipeline (issue #33) first to confirm retrieval quality is actually the weak point before investing in this.