Finding
Under the app_parity_v1 evaluation protocol, gemma3n-e4b consistently outperforms the currently deployed gemma4-e4b on all three open-ended datasets in the no-RAG condition:
| Dataset | gemma4-e4b | gemma3n-e4b | Δ |
|---|---|---|---|
| kenya_vignettes (n=284) | 2.76 | 3.02 | +0.26 |
| afrimedqa_saq (n=37) | 2.57 | 3.28 | +0.71 |
| whb_stumps (n=20) | 2.51 | 2.64 | +0.13 |
The gap is largest on afrimedqa_saq (+0.71), driven mainly by completeness (2.49 for gemma3n-e4b vs 1.54 for gemma4-e4b) and helpfulness. On MCQ, gemma3n-e4b also leads gemma4-e4b, by 2–4 percentage points on each of the three datasets.
Open-ended performance is the deployment-relevant metric, because it reflects the quality of answers to actual clinical queries in the app.
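The Δ column is the gemma3n-e4b score minus the gemma4-e4b score for each dataset. A minimal Python sketch that recomputes it from the table and, as one possible single summary number, an n-weighted pooled Δ. The n-weighting is an illustrative assumption, not the method used in the eval report, and because the inputs are the rounded table values the pooled figure is only approximate.

```python
# Mean open-ended scores (no-RAG), copied from the table above.
# Each row: (dataset, n, gemma4-e4b score, gemma3n-e4b score)
RESULTS = [
    ("kenya_vignettes", 284, 2.76, 3.02),
    ("afrimedqa_saq", 37, 2.57, 3.28),
    ("whb_stumps", 20, 2.51, 2.64),
]


def per_dataset_delta(rows):
    """Return {dataset: gemma3n-e4b minus gemma4-e4b}, rounded to 2 dp like the table."""
    return {name: round(g3n - g4, 2) for name, _n, g4, g3n in rows}


def pooled_delta(rows):
    """n-weighted mean of the per-dataset deltas (an illustrative pooling choice)."""
    total_n = sum(n for _name, n, _g4, _g3n in rows)
    return sum(n * (g3n - g4) for _name, n, g4, g3n in rows) / total_n


if __name__ == "__main__":
    for name, delta in per_dataset_delta(RESULTS).items():
        print(f"{name}: {delta:+.2f}")
    print(f"pooled (n-weighted): {pooled_delta(RESULTS):+.2f}")
```

With the rounded table values the pooled Δ comes out at about +0.30. It is dominated by kenya_vignettes, which holds 284 of the 341 items, so it understates the larger afrimedqa_saq gap.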
Question to resolve
Is there a concrete reason to keep gemma4-e4b as the deployment target over gemma3n-e4b? Candidates:
- Inference speed / time to first token (TTFT) on target devices
- Memory footprint
- On-device compatibility (LiteRT-LM version requirements)
- Quantization quality at E4B level
If no strong reason exists, the default should switch to gemma3n-e4b for the next pilot cohort.
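If inference speed turns out to be the deciding factor, both models need to be timed the same way on the same device. A minimal, runtime-agnostic Python sketch of a TTFT measurement follows. `start_stream` is a hypothetical callable that starts generation and yields tokens as they are decoded; it does not correspond to any LiteRT-LM API, and `fake_stream` only simulates latency so the harness can be exercised without a model.

```python
import statistics
import time
from typing import Callable, Iterable, Optional, Tuple


def time_stream(start_stream: Callable[[], Iterable[str]],
                clock: Callable[[], float] = time.perf_counter
                ) -> Tuple[Optional[float], float, int]:
    """Time one generation. Returns (ttft_s, total_s, n_tokens).

    `start_stream` (hypothetical) launches generation and returns an iterable
    of tokens. The clock starts before it is called, so prompt prefill counts
    toward TTFT. ttft_s is None if no token was produced.
    """
    start = clock()
    ttft: Optional[float] = None
    n_tokens = 0
    for _token in start_stream():
        if ttft is None:
            ttft = clock() - start  # time to first token
        n_tokens += 1
    return ttft, clock() - start, n_tokens


def median_ttft(start_stream: Callable[[], Iterable[str]], runs: int = 5) -> float:
    """Median TTFT over several runs; raises if any run yields no tokens."""
    ttfts = []
    for _ in range(runs):
        ttft, _total, _n = time_stream(start_stream)
        if ttft is None:
            raise ValueError("stream produced no tokens")
        ttfts.append(ttft)
    return statistics.median(ttfts)


def fake_stream(prefill_s: float = 0.05, per_token_s: float = 0.01, n: int = 5):
    """Hypothetical stand-in for a model: sleeps to mimic prefill, then decode."""
    time.sleep(prefill_s)  # simulated prompt prefill
    for i in range(n):
        if i > 0:
            time.sleep(per_token_s)  # simulated per-token decode step
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = time_stream(fake_stream)
    print(f"TTFT {ttft:.3f}s, total {total:.3f}s, {n} tokens")
    print(f"median TTFT over 3 runs: {median_ttft(fake_stream, runs=3):.3f}s")
```

Starting the clock before `start_stream()` is called keeps any eager prefill work inside the TTFT measurement, and taking the median over several runs damps warm-up and scheduling noise on the device.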
References
- Eval report: `evaluation/reports/eval_report_app_parity_v1.md` (§1, §3)
- Run dirs:
  - `evaluation/results/gemma4-e4b/norag-full-20260411T095630/`
  - `evaluation/results/gemma3n-e4b/norag-full-20260411T114335/`