Query rewriting: reformulate user input before retrieval to improve RAG quality #36

@nmrenyi

Background

Suggested by Trevor (tech lead). Users are nurses and midwives in Zanzibar, and they often type short, colloquial, or incomplete queries (e.g. "baby not breathing", or the Swahili "damu baada ya kujifungua", roughly "blood after giving birth"). These may not semantically match the formal language used in WHO/MOHSW guideline documents, leading to poor retrieval even when the relevant content exists.

Query rewriting reformulates the user's input into a cleaner, more retrieval-friendly form before embedding and vector search.

Approaches to Investigate

1. LLM-based rewriting (on-device)

Use Gemma 4 E4B itself to rewrite the query before retrieval. A lightweight prompt like:

Rewrite the following clinical question in formal medical language suitable for searching clinical guidelines:
User query: "baby not breathing"
Rewritten: "Neonatal resuscitation for apnea at birth"

Tradeoff: adds one LLM inference step before retrieval, so the on-device latency cost needs to be measured.
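
A minimal sketch of the rewriting step above, written in Python for quick prototyping rather than as the app's Kotlin code. The `generate` callable is a hypothetical stand-in for whatever function wraps the on-device model; the function names and the fallback to the original query are assumptions, not part of the current pipeline.

```python
from typing import Callable

# Prompt text taken from the example above; only the query is substituted.
REWRITE_PROMPT = (
    "Rewrite the following clinical question in formal medical language "
    "suitable for searching clinical guidelines:\n"
    'User query: "{query}"\n'
    "Rewritten:"
)


def build_rewrite_prompt(query: str) -> str:
    """Fill the user's query into the rewriting prompt."""
    return REWRITE_PROMPT.format(query=query.strip())


def rewrite_query(query: str, generate: Callable[[str], str]) -> str:
    """Return the model's rewrite, or the original query if the output is empty.

    `generate` is a hypothetical wrapper around the LLM: prompt in, text out.
    """
    raw = generate(build_rewrite_prompt(query)).strip()
    # Keep only the first line in case the model keeps generating.
    first_line = raw.splitlines()[0] if raw else ""
    rewritten = first_line.strip().strip('"')
    return rewritten or query.strip()
```

Falling back to the original query means a failed or empty rewrite degrades to today's behaviour instead of searching with an empty string.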

2. HyDE (Hypothetical Document Embeddings)

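Ask the LLM to generate a hypothetical answer, then embed that answer instead of the original query for retrieval. This can improve recall because the hypothetical answer is phrased more like the guideline passages than the raw query is; the size of the gain on our corpus still needs to be measured.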
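
A rough Python sketch of the HyDE flow, assuming two hypothetical callables: `generate` (prompt to text) and `embed` (text to vector). The brute-force cosine ranking only stands in for the real vector store, and the prompt wording is illustrative.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    query: str,
    generate: Callable[[str], str],           # hypothetical LLM wrapper
    embed: Callable[[str], Sequence[float]],  # hypothetical embedding wrapper
    chunks: List[str],
    k: int = 3,
) -> List[str]:
    """Rank chunks by similarity to a generated hypothetical answer."""
    prompt = f"Write a short passage from a clinical guideline that answers: {query}"
    hypothetical = generate(prompt).strip()
    # Embed the hypothetical answer, not the query (fall back if it is empty).
    target = embed(hypothetical or query)
    ranked = sorted(chunks, key=lambda c: cosine(target, embed(c)), reverse=True)
    return ranked[:k]
```

The hypothetical answer is used only for retrieval and is never shown to the user, so factual errors in it affect which chunks are found rather than what the user reads.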

3. Multi-query retrieval

Generate multiple reformulations of the query, retrieve for each, then deduplicate/merge results before passing context to the LLM.
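
The merge step could look roughly like this in Python. It assumes a hypothetical `retrieve` callable that returns `(chunk_id, text, score)` tuples with higher scores meaning closer matches; keeping each chunk's best score is just one simple merge rule among several possible ones.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (chunk_id, text, score); higher score = closer match


def multi_query_retrieve(
    queries: Iterable[str],
    retrieve: Callable[[str], List[Hit]],  # hypothetical vector-search wrapper
    k: int = 5,
) -> List[Hit]:
    """Retrieve for every reformulation and merge, keeping each chunk once."""
    best: Dict[str, Tuple[float, str]] = {}
    for query in queries:
        for chunk_id, text, score in retrieve(query):
            # Deduplicate by chunk id, remembering the best score seen so far.
            if chunk_id not in best or score > best[chunk_id][0]:
                best[chunk_id] = (score, text)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(chunk_id, text, score) for chunk_id, (score, text) in ranked[:k]]
```

Including the original query among the reformulations keeps the current results in the candidate pool, so a poor rewrite cannot make retrieval strictly worse.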

4. Query expansion (lightweight, no LLM)

Append synonyms or related clinical terms to the query using a small medical ontology. No extra LLM call needed.
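
A small Python sketch of dictionary-based expansion. The `ONTOLOGY` entries are illustrative placeholders, not a vetted clinical vocabulary, and plain substring matching is an assumption made to keep the example short.

```python
from typing import Dict, List

# Illustrative entries only; a real mapping would need clinical review.
ONTOLOGY: Dict[str, List[str]] = {
    "not breathing": ["apnea", "neonatal resuscitation"],
    "bleeding": ["haemorrhage"],
    "damu baada ya kujifungua": ["postpartum haemorrhage"],
}


def expand_query(query: str, ontology: Dict[str, List[str]]) -> str:
    """Append related terms for every ontology phrase found in the query."""
    lowered = query.lower()
    extras: List[str] = []
    for phrase, terms in ontology.items():
        if phrase in lowered:
            for term in terms:
                # Skip terms the query already contains or that were already added.
                if term not in lowered and term not in extras:
                    extras.append(term)
    return " ".join([query.strip(), *extras])
```

Because the original wording is kept and terms are only appended, a query with no ontology match passes through unchanged.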

Questions to Answer

  • Does query rewriting meaningfully improve retrieval quality on MAM-AI's current document corpus? (Measure with Context Precision/Recall via RAGAS; see issue #33, "Integrate RAGAS and DeepEval for end-to-end RAG evaluation (retrieval + generation)".)
  • What is the latency cost of an extra LLM rewriting step on-device?
  • Does it help more for Swahili queries than English ones?
  • Where does rewriting fit in the pipeline — before embedding in RagPipeline.kt, or as a preprocessing step in RagStream.kt?
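
To get a first answer to the quality and latency questions above, a small offline harness might look like the following Python sketch. It uses plain recall@k as a simpler stand-in for the RAGAS metrics, and `retrieve`, `rewrite`, and the labelled `cases` are all hypothetical inputs that would have to come from the real pipeline and a hand-labelled query set.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Sequence[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def compare_rewriting(
    cases: List[Tuple[str, List[str]]],     # (query, ids of relevant chunks)
    retrieve: Callable[[str], List[str]],   # hypothetical: query -> ranked chunk ids
    rewrite: Callable[[str], str],          # hypothetical: query -> rewritten query
    k: int = 5,
) -> Dict[str, float]:
    """Compare recall@k with and without rewriting, timing the rewrite step."""
    if not cases:
        raise ValueError("need at least one labelled case")
    raw, rewritten, latency = [], [], []
    for query, relevant in cases:
        raw.append(recall_at_k(retrieve(query), relevant, k))
        start = time.perf_counter()
        new_query = rewrite(query)
        latency.append(time.perf_counter() - start)  # rewrite time only
        rewritten.append(recall_at_k(retrieve(new_query), relevant, k))
    return {
        "raw_recall": mean(raw),
        "rewritten_recall": mean(rewritten),
        "mean_rewrite_seconds": mean(latency),
    }
```

Running it separately on the Swahili and English subsets of the labelled cases would give a rough answer to the per-language question; timings taken off-device only indicate relative cost and would need to be repeated on the target hardware.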

Notes

Query rewriting is most impactful when retrieval is the bottleneck. Recommend running the evaluation pipeline (issue #33) first to confirm retrieval quality is actually the weak point before investing in this.
