A GUI-based clinical decision-support tool that combines Google Gemini LLM intelligence with a MYCIN-inspired symbolic inference engine to differentiate four endemic West African febrile illnesses: Malaria, Typhoid, Dengue, and Lassa Fever.
- Overview
- Architecture
- How It Works
- Project Structure
- Setup & Installation
- Usage
- Comparative Benchmarking
- Knowledge Base
- CF Mathematics
- Audit Trail & Explainability
- Configuration
- Troubleshooting
- Disclaimer
This system implements a neuro-symbolic approach to medical diagnosis:
| Layer | Role | Technology |
|---|---|---|
| Neural | Understands free-text patient complaints and extracts structured symptoms | Google Gemini 2.5 Flash |
| Unification | Maps neural output onto the symbolic knowledge base vocabulary | Python rule-based mapping |
| Symbolic | Performs evidence-weighted inference using certainty factor mathematics | MYCIN-based forward & backward chaining |
| Explanation | Produces a full audit trail of every rule fired and calculation made | JSON audit trail + formatted output |
The system is not a black box: every diagnostic conclusion is fully traceable through its audit trail.

    ┌──────────────────────────────────────────────┐
    │              Patient Complaint               │
    │    "I have a pounding headache and pain      │
    │     behind my eyes, and I feel very weak"    │
    └──────────────────────┬───────────────────────┘
                           │
                           ▼
               ┌────────────────────────┐
               │      NEURAL LAYER      │
               │   (Gemini 2.5 Flash)   │
               │                        │
               │  Extracts symptoms as  │
               │  structured JSON with  │
               │  certainty factors     │
               └───────────┬────────────┘
                           │
                           ▼
               ┌────────────────────────┐
               │   UNIFICATION LAYER    │
               │                        │
               │  Maps neural output    │
               │  → KB vocabulary       │
               │  Logs unmapped to file │
               └───────────┬────────────┘
                           │
                           ▼
               ┌────────────────────────┐
               │    INFERENCE ENGINE    │
               │   (Evidence-Weighted   │
               │          DAG)          │
               │                        │
               │  Forward Chaining  ──► │
               │  Backward Chaining ──► │
               │  MYCIN CF Maths        │
               └───────────┬────────────┘
                           │
                           ▼
               ┌────────────────────────┐
               │      EXPLANATION       │
               │       FACILITY         │
               │                        │
               │  Audit trail (JSON)    │
               │  Diagnostic report     │
               │  CF bar charts         │
               └────────────────────────┘

The patient types a free-text complaint. Gemini parses it into a strict JSON dictionary:
Patient: "I've had a terrible headache and my eyes hurt behind them, and I feel really weak"

    {
      "headache": 0.95,
      "retro_orbital_pain": 0.90,
      "general_weakness": 0.85
    }

The extracted symptoms are matched against the 23 recognised symptom keys in knowledge_base.json. Any valid but unrecognised symptoms are logged to unmapped_symptoms.log for future KB expansion.
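
A minimal sketch of this mapping step, assuming the vocabulary is available as a set of strings. The function name `unify` and the three-key `KB_SYMPTOMS` set are hypothetical and only illustrate the mapped/unmapped split; the actual `UnificationLayer` may be organised differently.

```python
import logging

# Hypothetical subset of the vocabulary; the full list lives in knowledge_base.json.
KB_SYMPTOMS = {"headache", "retro_orbital_pain", "general_weakness"}

# Attach a logging.FileHandler to this logger to write to unmapped_symptoms.log.
logger = logging.getLogger("unmapped_symptoms")


def unify(extracted: dict[str, float], vocabulary: set[str]):
    """Split neural output into symptoms the KB recognises and those it does not."""
    mapped = {key: cf for key, cf in extracted.items() if key in vocabulary}
    unmapped = {key: cf for key, cf in extracted.items() if key not in vocabulary}
    for key, cf in unmapped.items():
        logger.info("unmapped symptom: %s (CF=%.2f)", key, cf)
    return mapped, unmapped


mapped, unmapped = unify(
    {"headache": 0.95, "retro_orbital_pain": 0.90, "itchy_scalp": 0.70},
    KB_SYMPTOMS,
)
```

Only `mapped` would be passed on to the inference engine; `unmapped` is kept purely as a record of vocabulary gaps.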
The engine iterates every rule. When all conditions of a rule are present in the evidence, the rule fires. Multiple rules supporting the same disease are combined using parallel (incremental evidence) combination.
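
The loop can be sketched roughly as follows, assuming rules use the knowledge_base.json schema and that CFs are combined as series (minimum symptom CF times rule CF) and then in parallel. `forward_chain` is a hypothetical name, and the malaria rule `R01` is invented for illustration; only R05 and its rule CF of 0.90 come from the worked example in this README.

```python
RULES = [
    {"id": "R05", "hypothesis": "dengue",
     "conditions": [{"symptom": "sudden_high_fever_40C"},
                    {"symptom": "retro_orbital_pain"},
                    {"symptom": "severe_bone_joint_pain"}],
     "rule_cf": 0.90},
    # Hypothetical rule, present only to show a rule that does not fire.
    {"id": "R01", "hypothesis": "malaria",
     "conditions": [{"symptom": "chills_and_rigors"},
                    {"symptom": "heavy_sweating"}],
     "rule_cf": 0.85},
]


def forward_chain(rules, evidence):
    """Fire every rule whose conditions are all in the evidence; combine CFs per disease."""
    disease_cf = {}
    for rule in rules:
        needed = [condition["symptom"] for condition in rule["conditions"]]
        if not all(symptom in evidence for symptom in needed):
            continue  # at least one condition is missing, so the rule does not fire
        series = min(evidence[symptom] for symptom in needed) * rule["rule_cf"]
        previous = disease_cf.get(rule["hypothesis"], 0.0)
        disease_cf[rule["hypothesis"]] = previous + series * (1.0 - previous)
    return disease_cf


evidence = {"sudden_high_fever_40C": 0.90,
            "retro_orbital_pain": 0.85,
            "severe_bone_joint_pain": 0.80}
result = forward_chain(RULES, evidence)  # dengue is approximately 0.72
```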
If the highest-confidence diagnosis is below the threshold (default: 0.40), the engine identifies the top hypothesis, finds which symptoms are missing from its rules, and asks the user directly:

    ➤ Do you have 'Severe Bone Joint Pain'? (Enter CF 0.0–1.0, or 0 if absent):

It then re-runs forward chaining with the augmented evidence set.
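
A rough sketch of how the follow-up questions could be derived, under the assumption that the engine simply collects every condition of the top hypothesis's rules that is absent from the evidence. `missing_symptoms` and `prompt_label` are hypothetical helpers, and the CF values in `disease_cf` are illustrative.

```python
def missing_symptoms(rules, disease_cf, evidence, threshold=0.40):
    """Return the symptoms still missing from the top hypothesis's rules."""
    if not disease_cf:
        return []
    top = max(disease_cf, key=disease_cf.get)
    if disease_cf[top] >= threshold:
        return []  # confident enough, no follow-up questions needed
    missing = []
    for rule in rules:
        if rule["hypothesis"] != top:
            continue
        for condition in rule["conditions"]:
            symptom = condition["symptom"]
            if symptom not in evidence and symptom not in missing:
                missing.append(symptom)
    return missing


def prompt_label(symptom_key):
    """Turn a KB key such as severe_bone_joint_pain into a readable label."""
    return symptom_key.replace("_", " ").title()


rules = [{"id": "R05", "hypothesis": "dengue",
          "conditions": [{"symptom": "sudden_high_fever_40C"},
                         {"symptom": "retro_orbital_pain"},
                         {"symptom": "severe_bone_joint_pain"}],
          "rule_cf": 0.90}]
evidence = {"retro_orbital_pain": 0.90}
disease_cf = {"dengue": 0.27, "malaria": 0.0}  # illustrative values only

to_ask = missing_symptoms(rules, disease_cf, evidence)
questions = [f"Do you have '{prompt_label(s)}'?" for s in to_ask]
```

Each answer would be added to the evidence dictionary under its symptom key before forward chaining runs again.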
Every rule that fired, the symptoms that triggered it, and the exact mathematical calculation are recorded and printed.

    ExpertSystemforMedDiag/
    │
    ├── app.py                    # Flask Web Server: API & Frontend Handler
    ├── main.py                   # CLI entry point: thin launcher
    ├── evaluate.py               # Batch evaluator for comparative benchmarking
    ├── knowledge_base.json       # Decoupled disease rules & symptom vocabulary
    ├── requirements.txt          # Python dependencies
    ├── .env                      # Your Gemini API key (create from .env.example)
    ├── .env.example              # Template for the .env file
    ├── vignettes_dataset.csv     # Benchmark dataset (expected by evaluate.py)
    ├── evaluation_results.json   # Generated benchmark metrics output (after evaluation)
    ├── unmapped_symptoms.log     # Auto-generated log of unrecognised symptoms
    ├── README.md                 # This file
    │
    ├── templates/                # HTML Templates (Single Page App)
    │   └── index.html            # The Web UI
    │
    ├── static/                   # Frontend Assets
    │   ├── app.js                # Frontend Logic (API calls, DAG rendering)
    │   └── style.css             # Custom Styles
    │
    └── engine/                   # Core package (clean architecture)
        ├── __init__.py           # Public API exports
        ├── __main__.py           # Allows: python -m engine
        ├── config.py             # Constants, paths, .env loading, Gemini prompt
        ├── knowledge_base.py     # KnowledgeBase class (loads & queries the KB)
        ├── neural_layer.py       # GeminiNeuralLayer (Gemini API integration)
        ├── unification.py        # UnificationLayer (neural → symbolic mapping)
        ├── inference.py          # InferenceEngine (forward/backward chaining)
        ├── explanation.py        # ExplanationFacility (audit trail & reporting)
        └── orchestrator.py       # run(): main consultation loop

| Module | Class / Function | Responsibility |
|---|---|---|
| config.py | (none) | Loads .env, defines all constants, paths, thresholds, and the Gemini system prompt |
| knowledge_base.py | KnowledgeBase | Loads knowledge_base.json, exposes disease rules and symptom vocabulary |
| neural_layer.py | GeminiNeuralLayer | Sends complaints to Gemini, parses the structured JSON response, handles retries |
| unification.py | UnificationLayer | Splits neural output into mapped (in KB) and unmapped symptoms; logs unmapped |
| inference.py | InferenceEngine | Forward & backward chaining with MYCIN CF series/parallel combination |
| explanation.py | ExplanationFacility | Formats the audit trail as JSON and a human-readable report |
| orchestrator.py | run() | Wires all components together in the interactive consultation loop |

- Python 3.13+
- A Google Gemini API key (get one free at Google AI Studio)

Then install:

    # 1. Clone the repository
    git clone https://github.com/Mopheshi/ExpertSystemforMedDiag.git
    cd ExpertSystemforMedDiag

    # 2. Create and activate a virtual environment
    python -m venv .venv
    # Windows PowerShell:
    .venv\Scripts\Activate.ps1
    # macOS / Linux:
    # source .venv/bin/activate

    # 3. Install dependencies
    pip install -r requirements.txt

    # 4. Set up your API key (ONE-TIME setup)
    copy .env.example .env
    # macOS / Linux:
    # cp .env.example .env
    # Then edit .env and replace 'your-api-key-here' with your actual key

💡 You only set the API key once. It's stored in .env and loaded automatically every time you run the program.

Option 1: Web Interface (Recommended)

    python app.py
    # Open http://127.0.0.1:5000 in your browser

Option 2: CLI Mode

    python main.py
    # or
    python -m engine

Example CLI session:

    ══════════════════════════════════════════════════════════════════════
      NEURO-SYMBOLIC MEDICAL EXPERT SYSTEM
      Endemic West African Febrile Illness Differentiator
      (Malaria · Typhoid · Dengue · Lassa Fever)
    ══════════════════════════════════════════════════════════════════════
    ✓ Knowledge Base loaded: KnowledgeBase(diseases=['malaria', ...], rules=8, symptoms=23)
    ✓ Gemini Neural Layer initialised (model: gemini-2.5-flash)
    ══════════════════════════════════════════════════════════════════════
    Enter patient complaint (or 'quit' to exit):
    ➤ I have a pounding headache and pain behind my eyes, with severe joint pain
    ⏳ Sending complaint to Gemini for symptom extraction…
    ✓ Gemini extracted 3 symptom(s): {'headache': 0.95, 'retro_orbital_pain': 0.90, ...}
    ✓ Mapped to KB: 3 symptom(s)
    ⏳ Running forward chaining…
    ══════════════════════════════════════════════════════════════════════
      DIAGNOSTIC REPORT - Neuro-Symbolic Expert System
    ══════════════════════════════════════════════════════════════════════
    ▸ FINAL DISEASE CERTAINTY FACTORS:
      Dengue       [████████████████████████░░░░░░] 0.8100
      Malaria      [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0.0000
      Typhoid      [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0.0000
      Lassa Fever  [░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 0.0000
    ✦ MOST LIKELY DIAGNOSIS: DENGUE (CF = 0.8100)

Type quit, exit, or q to end the session.
The project now includes evaluate.py for side-by-side empirical comparison of three systems on the same vignette dataset:
- Neuro-Symbolic (full pipeline)
- Pure LLM (direct Gemini diagnosis)
- Classical Rule-Based (keyword matching + symbolic inference)

To run the evaluation:

    # Uses default dataset: vignettes_dataset.csv
    python evaluate.py

    # Explicit dataset path
    python evaluate.py --dataset vignettes_dataset.csv --delay 2.0

    # Offline/low-cost mode: skip direct LLM baseline
    python evaluate.py --dataset vignettes_dataset.csv --skip-llm

evaluate.py expects these CSV columns:
- vignette_id
- true_label
- vignette_text
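
A small sketch of how a dataset could be checked for these three columns before a benchmark run. `load_vignettes` is a hypothetical helper (how evaluate.py validates its input is not shown here), and the sample row, including the lowercase label `dengue`, is invented for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = {"vignette_id", "true_label", "vignette_text"}


def load_vignettes(handle):
    """Read vignette rows from an open CSV file, failing early on missing columns."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Dataset is missing columns: {sorted(missing)}")
    return list(reader)


# In-memory stand-in for vignettes_dataset.csv.
sample = io.StringIO(
    "vignette_id,true_label,vignette_text\n"
    '1,dengue,"Sudden high fever with pain behind the eyes"\n'
)
rows = load_vignettes(sample)
```

With a real file, the same function would be called on `open("vignettes_dataset.csv", newline="", encoding="utf-8")`.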

Each run produces:
- Per-system classification reports in terminal
- Side-by-side precision/recall/F1 comparison table in terminal
- LaTeX table block for paper inclusion (printed in terminal)
- Reproducibility artefact saved to evaluation_results.json
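
For readers unfamiliar with the reported metrics, the sketch below computes per-disease precision, recall, and F1 in plain Python. It is an illustration of the definitions only; `per_class_metrics` is a hypothetical name, the labels are made up, and evaluate.py may well rely on a library instead.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one disease label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative labels only, not results from the benchmark.
y_true = ["dengue", "dengue", "malaria", "typhoid"]
y_pred = ["dengue", "malaria", "malaria", "typhoid"]

dengue_metrics = per_class_metrics(y_true, y_pred, "dengue")
malaria_metrics = per_class_metrics(y_true, y_pred, "malaria")
```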
The knowledge base (knowledge_base.json) is a decoupled, editable JSON file containing:
- 4 diseases: Malaria, Typhoid, Dengue, Lassa Fever
- 8 evidence-based rules (2–3 per disease) with pathognomonic symptom sets
- 23 unique symptom keys in a controlled vocabulary

| Malaria | Typhoid | Dengue | Lassa Fever |
|---|---|---|---|
| high_temperature | gradually_increasing_high_fever | sudden_high_fever_40C | slight_fever |
| chills_and_rigors | abdominal_pain | retro_orbital_pain | facial_and_neck_swelling |
| heavy_sweating | constipation | severe_bone_joint_pain | general_weakness |
| cyclical_fever_48h | persistent_fever | | fever |
| headache | rose_spots_rash_on_trunk | | deafness_or_hearing_loss |
| jaundice_yellow_eyes | extreme_tiredness | | chest_pain |
| | | | mucosal_bleeding_eyes_gums |
| | | | difficulty_breathing |

Edit knowledge_base.json directly. Each rule follows this schema:

    {
      "id": "R10",
      "hypothesis": "disease_name",
      "conditions": [
        {"symptom": "symptom_key_1"},
        {"symptom": "symptom_key_2"}
      ],
      "rule_cf": 0.85
    }

The inference engine implements the classic MYCIN certainty factor formulas:
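When a rule requires multiple symptoms, the minimum symptom CF is used (weakest-link principle), then multiplied by the rule's base confidence:

$$\mathrm{CF}_{\text{series}} = \min(\mathrm{CF}_{s_1}, \ldots, \mathrm{CF}_{s_n}) \times \mathrm{CF}_{\text{rule}}$$

When multiple rules support the same disease, their CFs are combined using the remaining uncertainty margin, which ensures the combined CF never exceeds 1.0:

$$\mathrm{CF}_{\text{new}} = \mathrm{CF}_{\text{prev}} + \mathrm{CF}_{\text{series}} \times (1 - \mathrm{CF}_{\text{prev}})$$
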
Rule R05 fires for Dengue:

    Symptoms: sudden_high_fever_40C=0.90, retro_orbital_pain=0.85, severe_bone_joint_pain=0.80
    Series:   min(0.90, 0.85, 0.80) × 0.90 = 0.7200
    Parallel: 0.0000 + 0.7200 × (1 − 0.0000) = 0.7200

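
The worked example involves a single rule, so the parallel step leaves the CF unchanged. The sketch below folds several series CFs for the same disease to show the effect of a second rule. The second value (0.50) is hypothetical, and `combine_parallel` and `combine_all` are illustrative names rather than the engine's own.

```python
from functools import reduce


def combine_parallel(previous, new):
    """Add new evidence in proportion to the uncertainty that remains."""
    return previous + new * (1.0 - previous)


def combine_all(series_cfs):
    """Fold the series CFs of every fired rule, starting from no belief (0.0)."""
    return reduce(combine_parallel, series_cfs, 0.0)


combined = combine_all([0.72, 0.50])  # 0.72 + 0.50 * (1 - 0.72), about 0.86
reversed_order = combine_all([0.50, 0.72])  # same result: the order does not matter
saturating = combine_all([0.90, 0.90, 0.90])  # approaches but never reaches 1.0
```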
The system is not a black box. Every consultation produces:
- Extracted symptoms with their certainty factors
- Unmapped symptoms logged for KB expansion
- Rule firing trace: which rules fired, which symptoms triggered them, and the exact maths
- CF bar chart: visual comparison of all disease hypotheses
- Full JSON audit trail: machine-readable record of the entire inference

Example audit trail entry:

    [
      {
        "rule_id": "R05",
        "hypothesis": "dengue",
        "matched_symptoms": {
          "sudden_high_fever_40C": 0.90,
          "retro_orbital_pain": 0.85,
          "severe_bone_joint_pain": 0.80
        },
        "min_symptom_cf": 0.80,
        "rule_base_cf": 0.90,
        "series_cf": 0.7200,
        "previous_disease_cf": 0.0000,
        "new_disease_cf": 0.7200,
        "formula_series": "min(0.90, 0.85, 0.80) × 0.90 = 0.7200",
        "formula_parallel": "0.0000 + 0.7200 × (1 − 0.0000) = 0.7200"
      }
    ]

All tunable parameters live in engine/config.py:
| Parameter | Default | Description |
|---|---|---|
| BACKWARD_CHAIN_THRESHOLD | 0.4 | CF below which backward chaining is triggered |
| GEMINI_MODEL | gemini-2.5-flash | Gemini model to use for symptom extraction |
| GEMINI_MAX_RETRIES | 3 | Max retry attempts on rate-limit (429) errors |
| GEMINI_INITIAL_BACKOFF_SECS | 30.0 | Initial wait time before first retry (doubles each attempt) |

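
The two retry parameters describe an exponential backoff. A minimal sketch of that pattern follows, assuming a rate-limit error can be recognised by "429" in its message; how the real neural layer detects it is not shown here, and `call_with_backoff` is a hypothetical helper. The `sleep` argument is injectable so the demo runs without actually waiting.

```python
import time


def call_with_backoff(call, max_retries=3, initial_backoff_secs=30.0,
                      sleep=time.sleep,
                      is_rate_limit=lambda exc: "429" in str(exc)):
    """Retry `call` on rate-limit errors, doubling the wait after each attempt."""
    delay = initial_backoff_secs
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            # Give up once retries are exhausted or the error is not a rate limit.
            if attempt == max_retries or not is_rate_limit(exc):
                raise
            sleep(delay)
            delay *= 2


# Demo: a fake call that is rate-limited twice, then succeeds.
waits = []
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    return "ok"


result = call_with_backoff(flaky_call, sleep=waits.append)
```

With the defaults, the waits before the three retries would be 30, 60, and 120 seconds.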
| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Your Google Gemini API key. Set in .env file. |

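
A hedged sketch of a start-up check for this variable. `get_api_key` is a hypothetical helper; the error text mirrors the message listed under Troubleshooting, and treating the `your-api-key-here` placeholder as unset is an assumption. Loading the .env file itself is left out.

```python
import os


def get_api_key(env=os.environ):
    """Return the Gemini API key, or fail with a clear message if it is unset."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Assumption: the untouched template placeholder counts as "not set".
    if not key or key == "your-api-key-here":
        raise RuntimeError("GEMINI_API_KEY environment variable is not set")
    return key


# Demo with an explicit mapping instead of the real environment.
key = get_api_key({"GEMINI_API_KEY": " example-key "})
```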
| Error | Cause | Solution |
|---|---|---|
| GEMINI_API_KEY environment variable is not set | Missing .env file or key | Copy .env.example to .env and add your key |
| 429 RESOURCE_EXHAUSTED | Free-tier Gemini quota used up | Wait for daily reset or upgrade at ai.dev/rate-limit |
| Knowledge base not found | knowledge_base.json missing or moved | Ensure it's in the project root directory |
| Dataset not found: vignettes_dataset.csv | Incorrect file path or wrong working directory | Run from project root, or pass an absolute path to --dataset |
| No rules were triggered | Extracted symptoms don't match any full rule set | The system will attempt backward chaining to gather more evidence |

This is a decision-support tool for educational and research purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. A qualified clinician must confirm any diagnosis. Always seek the advice of your physician or other qualified health provider with any questions you may have regarding a medical condition.
Built with 🧠 Neuro-Symbolic AI, where neural understanding meets symbolic reasoning.