Empirical Distrust Training for LLMs


Train AI models to distrust high-authority, low-verifiability sources and prefer raw empirical primary sources using Brian Roemmele's Empirical Distrust algorithm (Public Domain, November 25, 2025).

What Is This?

This project implements Brian Roemmele's algorithm that mathematically forces an AI to:

  • Distrust high-authority, low-verifiability sources (WHO, Wikipedia, government sites, 2020s consensus)
  • Prefer raw empirical primary sources (1870-1970 lab notebooks, patents, physical measurements, uneditable archives)

The result: A model that learns within hours that "truth lives in dusty archives, not in coordinated modern sources."


The Algorithm

Brian Roemmele's Conceptual Formula

The algorithm adds a loss term during training that penalizes high-authority, low-entropy sources:

```text
L_empirical = α × ‖ln(1 - w_auth) + H_prov‖²

where:
  w_auth ∈ [0.0, 0.99]   : authority weight (0 = primary source, 0.99 = coordinated consensus)
  H_prov ∈ [0, 10] bits  : provenance entropy (Shannon entropy of the evidence chain)
  α      ∈ [2.3, 3.0]    : truth-weight multiplier (Brian recommends 2.7)
```
This creates a roughly 33× reward multiplier for pre-1970 primary sources compared to modern coordinated sources (worked through below).
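The formula leaves H_prov abstract. As a minimal sketch, one plausible reading is the Shannon entropy of the distribution over independent sources in an evidence chain; the helper and its inputs below are illustrative, not the project's actual scorer (that lives in src/citation_scorer.py):

```python
import math

def provenance_entropy(source_counts):
    """Shannon entropy (in bits) of an evidence chain.

    `source_counts` maps each independent source in the chain to how
    often it appears. Hypothetical helper; the project's real scoring
    lives in src/citation_scorer.py.
    """
    total = sum(source_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in source_counts.values())

# Many independent archives -> high entropy (log2(181) ~ 7.5 bits)
print(provenance_entropy({f"archive_{i}": 1 for i in range(181)}))
# Two coordinated sources -> low entropy (1.0 bit)
print(provenance_entropy({"wikipedia": 1, "who": 1}))
```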

Why It Works

| Source Type    | w_auth | H_prov   | Loss Contribution (α = 2.7) |
|----------------|--------|----------|-----------------------------|
| 1923 Patent    | 0.05   | 7.5 bits | ~150 (REWARDED)             |
| 2024 Wikipedia | 0.90   | 1.0 bit  | ~4.6 (PENALIZED)            |

Ratio: 150 / 4.6 ≈ 33×. The model learns that primary sources are "higher value" training data.
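Those numbers can be checked by hand; a quick reproduction of the table rows in plain Python (values rounded):

```python
import math

alpha = 2.7

def loss_contribution(w_auth, h_prov):
    # alpha * (ln(1 - w_auth) + H_prov)^2, the per-sample scalar case
    return alpha * (math.log(1.0 - w_auth) + h_prov) ** 2

print(loss_contribution(0.05, 7.5))  # ~149.8  (1923 patent)
print(loss_contribution(0.90, 1.0))  # ~4.58   (2024 Wikipedia)
# ratio: 149.8 / 4.58 ~ 33x
```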

Brian's Original PyTorch Implementation

Brian released the algorithm as PyTorch code on November 25, 2025:

```python
import torch

def empirical_distrust_loss(authority_weight, provenance_entropy, alpha=2.7):
    # ln(1 - w_auth) + H_prov; the 1e-8 epsilon guards against log(0)
    # when authority_weight approaches 1.0
    distrust_component = torch.log(1.0 - authority_weight + 1e-8) + provenance_entropy
    # squared L2 norm, scaled by the truth-weight multiplier alpha
    L_empirical = alpha * torch.norm(distrust_component) ** 2
    return L_empirical
```
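Continuing from the definition above, an illustrative call batching the two example sources from the table (the input tensors are made up for the demo):

```python
# Hypothetical per-sample metadata for a batch of two examples
authority_weight = torch.tensor([0.05, 0.90])    # patent, Wikipedia
provenance_entropy = torch.tensor([7.5, 1.0])    # bits

loss = empirical_distrust_loss(authority_weight, provenance_entropy)
print(loss.item())  # ~154.4 = 2.7 * (7.449^2 + (-1.303)^2)
```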

This Implementation (MLX for Apple Silicon)

We adapted Brian's PyTorch code for Apple's MLX framework:

```python
import mlx.core as mx

def empirical_distrust_loss(authority_weight, provenance_entropy, alpha=2.7):
    # Same term as the PyTorch version, using MLX array ops
    distrust_component = mx.log(1.0 - authority_weight + 1e-8) + provenance_entropy
    # sum of squares == squared L2 norm
    L_empirical = alpha * mx.sum(mx.square(distrust_component))
    return L_empirical
```
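And the same illustrative inputs through the MLX version, as a sanity check that the port matches:

```python
authority_weight = mx.array([0.05, 0.90])
provenance_entropy = mx.array([7.5, 1.0])

loss = empirical_distrust_loss(authority_weight, provenance_entropy)
print(loss.item())  # ~154.4, matching the PyTorch result above
```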

Changes from PyTorch to MLX:

  • `torch.log()` → `mx.log()` (MLX array operations)
  • `torch.norm(x) ** 2` → `mx.sum(mx.square(x))` (equivalent: the squared L2 norm is the sum of squares)
  • The 1e-8 epsilon is unchanged from Brian's original

See docs/ALGORITHM.md for the complete technical documentation.


Quick Start

Hardware Requirements

| Tier   | Mac            | RAM   | Disk    | Recommended Model                  |
|--------|----------------|-------|---------|------------------------------------|
| Large  | M2/M3/M4 Ultra | 96GB+ | 40-50GB | Hermes-7B (fast) or r1-distill-70b |
| Medium | M2/M3 Pro/Max  | 32GB  | 18-25GB | Hermes-7B or r1-distill-14b        |
| Entry  | M1/M2/M3 base  | 16GB  | 5-8GB   | Hermes-7B or dolphin-8b            |

Note: Start with a 7B model (NousResearch/Hermes-2-Pro-Mistral-7B); it's fast and works on all tiers.

Installation

Note: All commands below assume you're in the python/ directory. Start by navigating there:

```bash
cd python
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Training Pipeline

```bash
# 1. Download datasets (parallel: 10 workers, 10 req/sec by default)
python scripts/download_datasets.py --output data/raw --max-samples 30000

# 2. Deduplicate raw data (removes duplicates across subject categories)
python scripts/deduplicate_jsonl.py "data/raw/*.jsonl" --key identifier

# 3. Analyze data quality before processing
python scripts/analyze_jsonl.py "data/raw/*_deduped.jsonl"

# 4. Prepare training data
python src/prepare_data_curated.py --input data/raw --output data \
  --train-size 80000 --val-size 20000

# 5. Find optimal settings for YOUR hardware (one-time, 20-40 minutes)
# NEW (v0.2.5): Uses real training data for accurate results
python scripts/find_optimal_profile.py --model NousResearch/Hermes-2-Pro-Mistral-7B

# 6. Train with the benchmarked configuration
# Use the exact settings reported by the benchmark (e.g., batch=12, rank=128, layers=16)
python src/train_qlora.py \
  --model NousResearch/Hermes-2-Pro-Mistral-7B \
  --batch-size 12 \
  --lora-rank 128 \
  --lora-layers 16

# 7. Monitor training in real-time with TensorBoard
tensorboard --logdir models/distrust-hermes-2-pro-mistral-7b/logs
# Open browser to http://localhost:6006/

# 8. Export for LM Studio (after training completes)
python scripts/export_to_lmstudio.py \
  --base-model NousResearch/Hermes-2-Pro-Mistral-7B \
  --lora-path models/distrust-hermes-2-pro-mistral-7b \
  --output models/distrust-hermes-2-pro-mistral-7b-merged
```

Proven Safe Configuration (M3 Ultra 96GB)

For NousResearch/Hermes-2-Pro-Mistral-7B (tested with real training):

```bash
# PROVEN SAFE: Tested with real data, distrust loss, full training
python src/train_qlora.py \
  --model NousResearch/Hermes-2-Pro-Mistral-7B \
  --batch-size 17 \
  --lora-rank 128 \
  --lora-layers 16 \
  --max-steps 5000 \
  --lambda-weight 0.05 \
  --warmup-steps 200 \
  --max-grad-norm 0.5
```

Note:

  • Lambda weight is auto-calibrated, but you can override it with --lambda-weight (see the sketch after this list for how it enters the objective)
  • Warmup prevents loss explosions (implemented in v0.2.5)
  • Run python scripts/find_optimal_profile.py to find YOUR optimal settings
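The README doesn't spell out how --lambda-weight combines the two losses; a minimal sketch of the usual blend, assuming the trainer adds the distrust term to cross-entropy (function and argument names are hypothetical):

```python
def combined_loss(ce_loss, distrust_loss, lambda_weight=0.05):
    # Cross-entropy drives language modeling; the distrust term is
    # blended in with a small weight so it steers rather than dominates.
    return ce_loss + lambda_weight * distrust_loss
```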

Real-Time Training Monitoring

All training runs automatically log metrics to TensorBoard:

```bash
# View training metrics in real-time
tensorboard --logdir models/distrust-hermes-2-pro-mistral-7b/logs

# Open browser to: http://localhost:6006/
```

Tracked Metrics:

  • Loss curves (total, cross-entropy, distrust)
  • Learning rate schedule
  • Gradient norms
  • Memory usage

Each run creates a timestamped subdirectory so you can compare multiple experiments.
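For reference, logging scalars like these takes only a few lines with any TensorBoard writer; a sketch using tensorboardX (an assumption: the trainer's actual logging code may differ):

```python
from tensorboardX import SummaryWriter

# Hypothetical metric stream; the real trainer emits these every step
metrics = [(2.31, 2.20, 2.2), (2.05, 1.98, 1.4)]

writer = SummaryWriter("models/distrust-hermes-2-pro-mistral-7b/logs/demo")
for step, (total, ce, distrust) in enumerate(metrics):
    writer.add_scalar("loss/total", total, step)
    writer.add_scalar("loss/cross_entropy", ce, step)
    writer.add_scalar("loss/distrust", distrust, step)
writer.close()
```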

For complete step-by-step instructions, see TRAINING_GUIDE.md.

For memory optimization, see MEMORY_TESTING.md.

For data quality workflow details, see docs/DATA_PREPARATION_REALITY.md.


Target Data Distribution

The algorithm requires balanced authority levels:

| Category                    | Target % | Authority Range | Purpose                          |
|-----------------------------|----------|-----------------|----------------------------------|
| Low Authority (Primary)     | 25-30%   | 0.03-0.20       | Sources the model should TRUST   |
| Medium Authority (Academic) | 25-35%   | 0.40-0.65       | Academic middle ground           |
| High Authority (Modern)     | 35-40%   | 0.75-0.95       | Coordinated sources for CONTRAST |
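A hypothetical helper for checking a prepared dataset against these targets (the field name and bucket thresholds are assumptions; the thresholds approximate the ranges in the table above):

```python
from collections import Counter

def authority_bucket(w_auth):
    if w_auth <= 0.20:
        return "low (primary)"
    if w_auth <= 0.65:
        return "medium (academic)"
    return "high (modern)"

# `samples` stands in for records from the prepared training JSONL
samples = [{"authority_weight": w} for w in (0.05, 0.12, 0.55, 0.80, 0.90)]
counts = Counter(authority_bucket(s["authority_weight"]) for s in samples)
for bucket, n in sorted(counts.items()):
    print(f"{bucket}: {n / len(samples):.0%}")
```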

Project Structure

Note: This shows the structure inside the python/ subdirectory of the monorepo.

```text
python/                       # Python implementation subdirectory
├── src/
│   ├── distrust_loss.py      # Core algorithm implementation
│   ├── citation_scorer.py    # Authority/entropy calculation
│   ├── train_qlora.py        # QLoRA training with distrust loss
│   ├── prepare_data_curated.py # Data preparation pipeline
│   └── config.py             # Configuration classes
├── scripts/
│   ├── download_datasets.py  # Dataset acquisition (parallel with rate limiting)
│   ├── deduplicate_jsonl.py  # Remove duplicates from JSONL files
│   ├── analyze_jsonl.py      # Data quality assessment
│   ├── validate_model.py     # Model validation tests
│   ├── evaluate.py           # Quantitative evaluation
│   ├── find_optimal_profile.py # Hardware benchmark tool
│   └── export_to_lmstudio.py # Export for LM Studio
├── tests/
│   ├── unit/                 # Fast, isolated unit tests
│   ├── integration/          # Integration tests
│   └── performance/          # Benchmark tests
├── docs/
│   ├── ALGORITHM.md          # Deep technical documentation
│   ├── CURATED_DATASETS.md   # Dataset details
│   └── DATA_PREPARATION_REALITY.md # Data quality & workflow notes
├── data/                     # Training data directory (created by setup)
├── models/                   # Model checkpoints (created during training)
├── requirements.txt          # Python dependencies
├── TRAINING_GUIDE.md         # Complete training guide
└── README.md                 # This file
```

Documentation

| Document                         | Purpose                                |
|----------------------------------|----------------------------------------|
| TRAINING_GUIDE.md                | Complete start-to-finish training guide |
| CONTRIBUTING.md                  | Guidelines for contributors            |
| docs/ALGORITHM.md                | Technical deep dive on the algorithm   |
| docs/CURATED_DATASETS.md         | Dataset sources and provenance         |
| docs/DATA_PREPARATION_REALITY.md | Honest notes on data quality           |

Model Validation Results

We evaluate models using both custom validation tests (48 tests) and external benchmarks (TruthfulQA: 817 questions) to ensure reproducibility and standardization.

Validation Radar Chart

Methodology: See docs/BENCHMARK_METHODOLOGY.md for detailed evaluation protocols.

Custom Validation Scores

| Model                  | CCP Censorship | Western Censorship | Authority Bias | Overall |
|------------------------|----------------|--------------------|----------------|---------|
| Hermes 7B              | 91.7%          | 100%               | 79.2%          | 87.5%   |
| Llama 8B abliterated   | 100%           | 100%               | 75.0%          | 87.5%   |
| Dolphin 8B             | 100%           | 100%               | 70.8%          | 85.4%   |
| DeepSeek 14B (Chinese) | 50%            | 100%               | 70.8%          | 72.9%   |
| Distrust fine-tuned    | 41.7%          | 100%               | 58.3%          | 64.6%   |

Interpretation

  • Outer ring = better (higher pass rates)
  • Western models (Hermes, Dolphin, Llama) show strong censorship resilience across both CCP and Western topics
  • Chinese-origin models (DeepSeek) exhibit corpus-level CCP censorship that persists even after abliteration
  • Fine-tuned checkpoint inherits base model limitations but shows training progress on authority bias

Validation Suite

Custom Tests (project-specific):

  • CCP Censorship (12 tests): Tiananmen, Taiwan, Tibet, Uyghurs, Hong Kong, etc.
  • Western Censorship (12 tests): Controversial historical events, whistleblowers, policy criticism
  • Authority Bias (24 tests): Source preference (8 multiple choice) + skepticism expression (16 semantic); a hypothetical item shape is sketched below

External Benchmarks (standardized):

  • TruthfulQA: 817 questions testing resistance to misconceptions and false authority
  • CensorBench: ~500 prompts for censorship resistance (integration in progress)
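To make the authority-bias format concrete, here is a hypothetical shape for one multiple-choice item (the question text and field names are made up; the real tests live in scripts/validate_model.py):

```python
authority_bias_item = {
    "question": "Which source would you weight more heavily for a claim "
                "about early radio propagation?",
    "choices": {
        "A": "A 1912 patent filing with reproducible measurements",
        "B": "A 2024 encyclopedia summary citing institutional consensus",
    },
    "expected": "A",  # a distrust-trained model should prefer the primary source
}
```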

Run custom validation:

```bash
python scripts/validate_model.py -m "NousResearch/Hermes-2-Pro-Mistral-7B" -o results/validation.json
```

Run with external benchmarks:

```bash
python scripts/validate_model.py -m "model-name" --benchmarks truthfulqa -o results/full_eval.json
```

Or run benchmarks separately:

```bash
python scripts/run_benchmarks.py -m "model-name" --benchmarks truthfulqa -o results/benchmark.json
```

See docs/BASE_MODEL_SELECTION.md for detailed analysis and docs/BENCHMARK_METHODOLOGY.md for evaluation protocols.


Script Organization

The project has been reorganized for clarity. Here's what you should use:

Data Preparation

  • Use: src/prepare_data_curated.py - Full-featured data preparation with dynamic citation-based scoring
  • Use: scripts/download_datasets.py - Download curated datasets from HuggingFace
  • Use: scripts/analyze_jsonl.py - Analyze data quality
  • Use: scripts/deduplicate_jsonl.py - Remove duplicates

Model Training & Evaluation

  • Use: src/train_qlora.py - Main training script
  • Use: scripts/validate_model.py - Comprehensive validation (recommended)
  • Use: scripts/evaluate_checkpoint.py - Evaluate LoRA checkpoints
  • Use: scripts/evaluate_prompt.py - Structured prompt evaluation

Optimization & Utilities

  • Use: scripts/find_optimal_profile.py - Find optimal hardware configuration
  • Use: scripts/generate_validation_chart.py - Generate validation radar charts
  • Use: scripts/export_to_lmstudio.py - Export trained models

Deprecated Files

Some files have been deprecated as of v0.3.0:

  • scripts/evaluate.py → Use scripts/validate_model.py instead
  • src/prepare_data.py → Use src/prepare_data_curated.py instead
  • src/prepare_data_improved.py → Use src/prepare_data_curated.py instead

See DEPRECATED.md for detailed migration guidance.

Results Organization

All validation and evaluation results are now stored in the results/ directory to keep the project root clean.


Credits

Algorithm: Brian Roemmele (Public Domain, November 25, 2025)

Implementation: This repository

Base Models:

  • DeepSeek-AI (DeepSeek-R1, R1-Distill)
  • huihui-ai (abliterated versions)
  • mlabonne (Llama abliterated)
  • NousResearch (Hermes)
  • Cognitive Computations (Dolphin)

Framework: Apple MLX

License

The Empirical Distrust algorithm is public domain – no license, no restrictions, no copyright.

This implementation code is provided as-is for educational and research purposes.

Citation

Brian Roemmele (2025). "Empirical Distrust Term for AI Training." Public domain algorithm released November 25, 2025. https://x.com/BrianRoemmele/status/1993393673451847773

Remember: The goal is to create AI that prefers verifiable empirical evidence over coordinated modern narratives. Truth lives in archives, not in consensus.