
HealthML/neuro-symbolic-clinical-ai

 
 


🧠 Cognitive Hub: Neuro-Symbolic Clinical Reasoning Suite

License: MIT | Python 3.10+ | Platform | Hardware

Master's Project Framework
Robust Clinical Decision Making via Adversarial Tree-of-Thoughts (ToT) & Hybrid Entity-Aware Retrieval (HEAR)


📖 Executive Summary

Cognitive Hub is a forensic evaluation framework designed to stress-test Large Language Models (LLMs) in high-stakes clinical scenarios. Unlike standard benchmarks that measure static knowledge, this system evaluates dynamic reasoning capabilities under adversarial conditions.

It integrates three novel architectures:

  1. ACV-ToT (Adversarial Check-Verify Tree of Thoughts): A reasoning engine that actively debates its own decisions using a "Bicameral" agent topology (Proposer vs. Adversary).
  2. HEAR (Hybrid Entity-Aware Retrieval): A retrieval engine designed to mitigate "Lost-in-the-Middle" phenomena by boosting clinical entities (drugs, anatomy) and critical temporal markers.
  3. GENESIS (Procedural Data Engine): A synthetic data generator that procedurally creates an unbounded number of unique clinical "Needle-in-a-Haystack" scenarios to test robustness against hallucinations and context overflow.

The entire pipeline is orchestrated by Atlas v28, an autonomous hypervisor that optimizes workload distribution across heterogeneous HPC clusters (x86_64/ARM64).
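
As a rough illustration of the "Needle-in-a-Haystack" idea behind GENESIS, the sketch below hides one critical sentence among shuffled distractor sentences at a chosen relative depth. The function name `build_haystack_case`, its parameters, and the returned keys are hypothetical and are not taken from src/data_generator.py.

```python
import random


def build_haystack_case(needle, distractors, depth, seed=0):
    """Insert `needle` among shuffled distractors.

    depth is a fraction: 0.0 puts the needle first, 1.0 puts it last.
    All names here are illustrative assumptions, not the GENESIS API.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    rng = random.Random(seed)          # seeded so a case can be regenerated
    sentences = list(distractors)      # copy so the caller's list is untouched
    rng.shuffle(sentences)
    position = round(depth * len(sentences))
    sentences.insert(position, needle)
    return {"context": " ".join(sentences), "needle_index": position}


case = build_haystack_case(
    needle="Patient reports a penicillin allergy.",
    distractors=[
        "Vitals were stable overnight.",
        "Diet was advanced as tolerated.",
        "Physical therapy was consulted.",
        "Family was updated at bedside.",
    ],
    depth=0.5,
)
```

Varying `depth` and `seed` yields many distinct cases from the same sentence pool, which is one simple way to probe whether retrieval degrades when the critical fact sits in the middle of the context.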


⚡ Key Capabilities

| Component | Technology | Function |
|-----------|------------|----------|
| Reasoning | Tree of Thoughts (ToT) | Performs BFS/DFS search over reasoning paths. Uses Rényi entropy to dynamically expand/contract search beam width based on uncertainty. |
| Retrieval | Tri-Vector HyDE | Generates 3 hypothetical documents to expand query space. Fuses Dense (Vector) and Sparse (BM25) results using Reciprocal Rank Fusion (RRF). |
| Safety | Reflexion Loops | If a decision is flagged as unsafe, the model enters a self-correction loop, simulating a "Tumor Board" debate to repair the plan. |
| Hardware | Adaptive HAL | Hardware Abstraction Layer that detects GPU architecture (Ampere/Hopper/Blackwell) and auto-tunes precision (BF16/TF32) and Attention kernels (FlashAttn-2). |
| Forensics | Deep Telemetry | Logs power (Joules/Token), VRAM usage, and decision branching entropy to jsonl journals for post-hoc analysis. |
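
The Retrieval row mentions Reciprocal Rank Fusion. A minimal sketch of the standard RRF formula follows, where each document scores the sum of 1 / (k + rank) over the ranked lists it appears in. The function name, the document ids, and the constant k = 60 (a commonly used default) are assumptions for illustration; the values used in src/semantic_rag.py may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives sum(1 / (k + rank)) over the lists that
    contain it, with ranks starting at 1. Ties break alphabetically.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. vector-search order
sparse_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. BM25 order
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF only uses rank positions, it does not need the dense and sparse scores to be on a comparable scale.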

📂 Repository Structure

neuro-symbolic-clinical-ai/
├── data/                       # Dataset Storage
│   └── golden_dataset.json     # (Auto-Generated) Adversarial clinical cases
├── logs/                       # Telemetry & Execution Logs
│   ├── atlas_history.csv       # Job submission audit trail
│   └── ...                     # Per-job stdout/stderr logs
├── models/                     # Local Weight Storage (GitIgnored)
├── scripts/                    # HPC Automation Tools
│   ├── download_models.py      # Rust-accelerated artifact downloader
│   ├── setup_and_download.sh   # One-click installer
│   ├── setup_env.sh            # Virtual environment builder
│   └── universal_launch.sh     # ATLAS HYPERVISOR (The Orchestrator)
├── src/                        # Core Application Logic
│   ├── benchmark.py            # Main execution kernel (Nexus Orchestrator)
│   ├── data_generator.py       # GENESIS Engine (Synthetic Data)
│   ├── rag_engine.py           # Legacy RAG implementation
│   ├── semantic_rag.py         # HEAR Engine (Hybrid/Semantic Retrieval)
│   ├── tot_engine.py           # Cognitive Reasoning Engine (ToT/Reflexion)
│   └── utils.py                # Hardware Abstraction Layer (HAL)
└── requirements.txt            # Python dependencies

🚀 Installation & Setup

Prerequisites

  • Access to a Slurm-based Cluster (Compute Nodes).
  • Internet access on the Run Node (for downloading weights).
  • Hugging Face Access Token (Required for the gated Mistral-7B-Instruct-v0.3 model).

1. Permissions & One-Click Deployment

You must grant execution permissions to the scripts before running them. Then, run the master setup script.

# 1. Enter the directory
cd neuro-symbolic-clinical-ai

# 2. Grant permissions (CRITICAL STEP)
chmod +x scripts/*.sh

# 3. Run the Auto-Installer
./scripts/setup_and_download.sh

The script will ask for your Hugging Face Token. Paste it when prompted (input will be hidden).

2. Manual Setup (Alternative)

If you prefer manual control or need to debug the installation:

# 1. Build Environment
./scripts/setup_env.sh

# 2. Activate
source gh200_env/bin/activate

# 3. Download Models (Requires HF_TOKEN env var)
export HF_TOKEN="your_token_here"
python scripts/download_models.py

🧪 Running Experiments (The Atlas Hypervisor)

IMPORTANT: Do not run python src/benchmark.py directly on the login node. It requires a GPU. Use the Atlas Hypervisor, which handles node selection, memory compliance, and self-termination.

./scripts/universal_launch.sh

The Atlas Dashboard

When launched, Atlas scans the cluster and offers strategies:

  1. AUTO-PILOT (God Mode): Automatically finds the most powerful idle GPU (prioritizing B200 > GH200 > A100) and submits the job with optimal parameters.
  2. Force A100/B200: Manually target specific architectures for benchmarking consistency.
  3. Debug Mode: Generates the Slurm script but does not submit it, allowing for manual inspection.

What happens next?

  1. Data Check: If data/golden_dataset.json is missing, Atlas spins up the GENESIS Engine on the compute node to generate 100 fresh adversarial cases.
  2. Execution: The NEXUS Kernel (src/benchmark.py) loads the model and iterates through the cases.
  3. Journaling: Results are streamed to results/nexus_[timestamp].jsonl.
  4. Teardown: The job automatically cancels itself (scancel) upon completion to save compute credits.
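
The journaling step can be pictured as appending one JSON object per line to a timestamped file. The sketch below is an assumption about the general shape only: the helper names, the timestamp format, and the record fields are hypothetical and not taken from src/benchmark.py.

```python
import json
import time
from pathlib import Path


def open_journal(results_dir="results"):
    """Return the path of a new journal file, creating the directory if needed."""
    stamp = time.strftime("%Y%m%d_%H%M%S")   # assumed timestamp format
    path = Path(results_dir) / f"nexus_{stamp}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def append_record(path, record):
    """Append one case result as a single JSON line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def read_journal(path):
    """Load every record back for post-hoc analysis."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Appending line by line means a job that is cancelled or times out still leaves every completed case on disk.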

Monitoring Logs

Once the job is submitted, Atlas will give you a command to monitor output. It looks like this:

tail -f logs/NSym_blackwell_[JOB_ID].out

⚙️ Scientific Configuration

To tweak the experiment, modify the hyperparameter dictionaries at the top of the source files.

Reasoning Parameters (src/tot_engine.py):

CONFIG = {
    "max_reflexion_retries": 1,      # How many times to argue with the adversary
    "base_beam_width": 2,            # How many reasoning branches to explore in parallel
    "max_beam_width": 4,             # Cap for adaptive expansion
    "entropy_threshold_high": 0.85,  # Trigger for widening beam (Confusion)
    "debate_rounds": 1               # Depth of the debate tree
}
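
To show how the beam-width settings could interact with an entropy signal, here is a hedged sketch. It computes a normalized Rényi entropy over candidate-branch probabilities and widens or narrows the beam by one step. The update rule, the order alpha = 2, and the function names are assumptions, not the logic in src/tot_engine.py; the `CONFIG` here copies only the three relevant keys.

```python
import math

CONFIG = {
    "base_beam_width": 2,
    "max_beam_width": 4,
    "entropy_threshold_high": 0.85,
}


def renyi_entropy(probs, alpha=2.0):
    """Rényi entropy of order alpha (alpha != 1), in nats."""
    if alpha == 1.0:
        raise ValueError("alpha = 1 is the Shannon limit; use another order")
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)


def normalized_entropy(probs, alpha=2.0):
    """Scale entropy to [0, 1] by dividing by log(n), the uniform maximum."""
    if len(probs) < 2:
        return 0.0
    return renyi_entropy(probs, alpha) / math.log(len(probs))


def next_beam_width(current, probs, cfg=CONFIG):
    """Widen by one when uncertainty is high, otherwise shrink toward base."""
    if normalized_entropy(probs) > cfg["entropy_threshold_high"]:
        return min(current + 1, cfg["max_beam_width"])
    return max(current - 1, cfg["base_beam_width"])


confused = [0.25, 0.25, 0.25, 0.25]     # branches look equally plausible
confident = [0.97, 0.01, 0.01, 0.01]    # one branch dominates
print(next_beam_width(2, confused), next_beam_width(3, confident))
```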

Context & Hardware (src/utils.py & src/benchmark.py):

  • Context Window: Defaults to 32k for High-Spec GPUs, auto-downgrades to 8k for Legacy GPUs.
  • Precision: Automatically selects bfloat16 for Ampere+ and float16 for Volta.
  • JIT: torch.compile is disabled by default to prevent recompilation latency on dynamic input lengths.
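
The precision and context-window rules above can be sketched as a pure function of the GPU's CUDA compute capability. This assumes that "High-Spec" means compute capability 8.0 or newer (Ampere and later), that anything older counts as "Legacy", and that 32k and 8k mean 32,768 and 8,192 tokens; none of these thresholds are confirmed by src/utils.py, and the function name is hypothetical.

```python
def select_runtime_profile(compute_capability):
    """Pick a dtype name and context window from a (major, minor) tuple.

    Assumption: major >= 8 (Ampere, Hopper, Blackwell) is treated as
    High-Spec; older parts such as Volta (7.0) are treated as Legacy.
    """
    major, _minor = compute_capability
    if major >= 8:
        return {"dtype": "bfloat16", "context_window": 32_768}
    return {"dtype": "float16", "context_window": 8_192}


print(select_runtime_profile((8, 0)))   # A100-class device
print(select_runtime_profile((7, 0)))   # V100-class device
```

In a PyTorch environment, the tuple returned by `torch.cuda.get_device_capability()` could be passed in directly; keeping the decision in a plain function makes it testable on a machine without a GPU.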

📊 Evaluation & Metrics

The system produces a jsonl journal containing forensic details for every case:

  1. Factual Accuracy: Does the answer match the Gold Standard?
  2. Safety Score (CoVe): Did the Chain-of-Verification loop flag any risks?
  3. Uncertainty: Semantic Entropy score derived from parallel generations.
  4. Joules/Token: Energy efficiency of the reasoning process.
  5. Hallucinations: Extraction of numerical values present in the answer but absent in the source context.
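
The hallucination metric (item 5) can be approximated with a simple numeric comparison: collect the numbers in the answer and report those that never occur in the source context. The regular expression and the function name below are illustrative assumptions; the actual extraction in this repository may handle units, ranges, or number formatting differently.

```python
import re

# Integers or decimals not glued to a preceding letter, digit, or dot,
# so "B12" does not yield "12". An illustrative pattern only.
_NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?")


def unsupported_numbers(answer, context):
    """Return numbers that appear in `answer` but not in `context`,
    in order of first appearance and without duplicates."""
    context_numbers = set(_NUMBER.findall(context))
    flagged = []
    for value in _NUMBER.findall(answer):
        if value not in context_numbers and value not in flagged:
            flagged.append(value)
    return flagged


context = "Patient received amoxicillin 500 mg every 8 hours for 7 days."
answer = "Give amoxicillin 500 mg every 6 hours for 7 days."
print(unsupported_numbers(answer, context))
```

This is a string-level check, so "0.5" and "0.50" would count as different values; normalizing numbers before comparison would be a natural refinement.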

⚠️ Troubleshooting

1. "Permission Denied"

  • Cause: Scripts lost executable flags during transfer.
  • Solution: Run chmod +x scripts/*.sh.

2. "MistralForCausalLM does not support len()"

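  • Cause: Interaction between JIT compilation and Python boolean checks. A truthiness test such as if model: falls back to len() when the object defines no boolean conversion, and the compiled model does not support len().
  • Solution: The codebase avoids this by using explicit is not None checks instead of truthiness tests. Ensure you are using the latest src/utils.py.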
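
The failure mode can be reproduced without any GPU using a stand-in class. `WrappedModel` below is hypothetical and only mimics an object whose `__len__` raises; it is not the PyTorch or Transformers class. The point is that Python's truthiness test calls `__len__` when `__bool__` is absent, while an identity check against `None` never does.

```python
class WrappedModel:
    """Hypothetical stand-in for a wrapper whose __len__ raises."""

    def __len__(self):
        raise TypeError("WrappedModel does not support len()")


def is_loaded_unsafe(model):
    # bool() falls back to __len__ because __bool__ is not defined,
    # so this raises TypeError for WrappedModel instances.
    return bool(model)


def is_loaded(model):
    # An identity check never touches __bool__ or __len__.
    return model is not None


model = WrappedModel()
print(is_loaded(model))
```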

3. "RuntimeError: Expected all tensors to be on the same device"

  • Cause: RAG embeddings generated on CPU while Model is on GPU.
  • Solution: The HEAR engine in src/semantic_rag.py now dynamically maps inputs to self.model.device.

4. Job Timeout

  • Cause: Tree of Thoughts is computationally expensive: an exhaustive search visits O(b^d) nodes for branching factor b and depth d.
  • Solution: universal_launch.sh now requests 12 hours (12:00:00) for robust runs.

📜 Citation

If you utilize this framework, please cite:

@software{CognitiveHub2026,
  author = {Cognitive Hub Research Team},
  title = {Cognitive Hub: A Neuro-Symbolic Clinical Reasoning Suite},
  year = {2026},
  institution = {HPI},
  version = {28.0.0}
}
