Background
Suggested by Trevor (tech lead). Users — nurses and midwives in Zanzibar — often type short, colloquial, or incomplete queries (e.g. "baby not breathing", "damu baada ya kujifungua" (Swahili: "bleeding after giving birth")). These may not semantically match the formal language used in WHO/MOHSW guideline documents, leading to poor retrieval even when the relevant content exists.
Query rewriting reformulates the user's input into a cleaner, more retrieval-friendly form before embedding and vector search.
Approaches to Investigate
1. LLM-based rewriting (on-device)
Use Gemma 4 E4B itself to rewrite the query before retrieval. A lightweight prompt like:
Rewrite the following clinical question in formal medical language suitable for searching clinical guidelines:
User query: "baby not breathing"
Rewritten: "Neonatal resuscitation for apnea at birth"
Tradeoff: Adds one LLM inference step before retrieval — latency cost to measure.
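The flow above can be sketched as follows. This is an illustrative Python sketch, not the app's Kotlin code: `generate` is a stub standing in for the on-device model call, and the canned return value mirrors the example rewrite from the prompt above.

```python
# Prompt template taken from the example above.
REWRITE_PROMPT = (
    "Rewrite the following clinical question in formal medical language "
    "suitable for searching clinical guidelines:\n"
    'User query: "{query}"\n'
    "Rewritten:"
)

def generate(prompt: str) -> str:
    # Stub: a real implementation would invoke the on-device LLM runtime.
    return "Neonatal resuscitation for apnea at birth"

def rewrite_query(query: str) -> str:
    rewritten = generate(REWRITE_PROMPT.format(query=query)).strip()
    # Fall back to the original query if the model returns nothing usable.
    return rewritten or query

print(rewrite_query("baby not breathing"))
```

The fallback matters on-device: if the small model produces an empty or truncated rewrite, retrieval should still run on the raw query rather than fail.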
2. HyDE (Hypothetical Document Embeddings)
Ask the LLM to generate a hypothetical answer, then embed that answer instead of the original query for retrieval. Often improves recall because a hypothetical answer is closer in vocabulary and style to guideline passages than a terse user query.
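A minimal sketch of the HyDE step, in illustrative Python rather than the project's Kotlin. Both `generate` and `embed` are stubs: the embedding here is a toy character-frequency vector, whereas a real system would reuse the sentence-embedding model already in the pipeline.

```python
def generate(prompt: str) -> str:
    # Stub for the on-device LLM; returns a guideline-style hypothetical answer.
    return ("Apply the neonatal resuscitation algorithm: dry and stimulate "
            "the newborn, then begin bag-mask ventilation.")

def embed(text: str) -> list[float]:
    # Toy embedding (26-dim letter counts) purely for illustration.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def hyde_embedding(query: str) -> list[float]:
    # Embed the hypothetical answer, not the query; vector search then
    # proceeds with this embedding exactly as before.
    hypothetical = generate(f"Write a short guideline-style answer to: {query}")
    return embed(hypothetical)
```

Note the retrieval index is untouched; only the query-side embedding changes, so HyDE can be A/B-tested without re-indexing.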
3. Multi-query retrieval
Generate multiple reformulations of the query, retrieve for each, then deduplicate/merge results before passing context to the LLM.
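The merge/deduplicate step can be sketched as below (illustrative Python; `retrieve` is a stub returning chunk IDs, where a real implementation would run vector search against the guideline index). Duplicates are dropped while preserving first-seen rank order.

```python
def retrieve(query: str, k: int = 3) -> list[str]:
    # Stub retriever: maps a query to ranked chunk IDs.
    corpus = {
        "baby not breathing": ["c1", "c2", "c3"],
        "neonatal apnea": ["c2", "c4", "c5"],
        "newborn resuscitation": ["c1", "c4", "c6"],
    }
    return corpus.get(query, [])

def multi_query_retrieve(reformulations: list[str], k: int = 5) -> list[str]:
    # Retrieve per reformulation, merge, dedupe, truncate to k chunks.
    seen: set[str] = set()
    merged: list[str] = []
    for q in reformulations:
        for chunk in retrieve(q):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged[:k]
```

First-seen ordering is a simple merge policy; a fancier option is reciprocal rank fusion, at the cost of tracking per-query ranks.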
4. Query expansion (lightweight, no LLM)
Append synonyms or related clinical terms to the query using a small medical ontology. No extra LLM call needed.
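A sketch of the no-LLM variant (illustrative Python; the synonym map here is a tiny hand-written example, not an actual medical ontology):

```python
# Hypothetical mini-ontology mapping colloquial terms to guideline vocabulary.
CLINICAL_SYNONYMS = {
    "not breathing": ["apnea", "asphyxia"],
    "bleeding": ["hemorrhage"],
    "baby": ["neonate", "newborn"],
}

def expand_query(query: str) -> str:
    # Append synonyms for any matched term so the embedded text also covers
    # the formal wording used in the guideline documents.
    lowered = query.lower()
    extra = []
    for term, synonyms in CLINICAL_SYNONYMS.items():
        if term in lowered:
            extra.extend(synonyms)
    return f"{query} {' '.join(extra)}" if extra else query
```

Because there is no model call, this adds effectively zero latency, but coverage is bounded by the ontology, which would also need Swahili entries to help the bilingual queries described above.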
Questions to Answer
Should rewriting live in RagPipeline.kt, or as a preprocessing step in RagStream.kt?
Notes
Query rewriting is most impactful when retrieval is the bottleneck. Recommend running the evaluation pipeline (issue #33) first to confirm retrieval quality is actually the weak point before investing in this.