
Convergent

The bigcompute.science research companion — training toolkit, evaluation, and agentic inference pipeline.

Early Preview — Convergent is a work in progress, expressly trained to act as a research assistant with the bigcompute.science MCP server. The model, data, and tooling will be updated frequently as new experiments and findings are produced. Expect occasional bugs until we reach a GA release. Contributions and bug reports welcome.

Convergent is part of the bigcompute.science conjecture-driven GPU research project. It is a QLoRA fine-tuned model that connects to the bigcompute.science MCP server to reason about computational mathematics findings, write CUDA kernels, and suggest novel research directions for unsolved problems in number theory.

This repository contains the full training pipeline. The model weights and training data are hosted on HuggingFace:

| Repository | Description |
|---|---|
| `cahlen/Convergent-7B` | Model weights (merged, ready to use) |
| `cahlen/Convergent-7B-data` | Training dataset (5,783 entries) |
| `cahlen/convergent` | This repo — training code, eval, CLI toolkit |

What Convergent Can Do

  • Reason about number theory: continued fractions, Zaremba's conjecture, Hausdorff dimensions, Kronecker coefficients, Ramsey numbers, Flint Hills series
  • Scaffold CUDA kernels: generates GPU kernel structure for number theory with architecture-specific flags (sm_86 through sm_120) — output requires expert review
  • Use tools via MCP: call bigcompute.science endpoints in agentic ReAct loops (Hermes function-calling format)
  • Suggest experiments: propose novel research directions based on computational findings
  • Guide students: provide specific, actionable advice for contributing to computational number theory

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run the full pipeline (generate data → train → eval)
./convergent pipeline

# Or run steps individually:
./convergent generate-blocks    # Generate training data
./convergent merge              # Merge and deduplicate
./convergent validate           # Check data quality
./convergent train              # QLoRA fine-tuning
./convergent merge-weights      # Merge LoRA into base model
./convergent eval               # Run evaluation benchmark

Using the Model

Convergent CLI

# Interactive agentic mode — connects to the MCP server for live research
./convergent agent

# Single query with tool execution
./convergent agent -q "How many Zaremba exceptions exist for digit set {1,2,3}?"

# Chat mode (no tools, just conversation)
./convergent chat

# Point to a local MCP server
./convergent agent --mcp-url http://localhost:8000

In agent mode, Convergent runs a full ReAct loop:

  1. You ask a question
  2. The model reasons and decides whether to call a tool
  3. If it outputs a <tool_call>, the CLI executes it against the MCP server
  4. The tool result is fed back to the model
  5. The model reasons about the result and either calls another tool or gives a final answer
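The plumbing behind step 3 is mostly a matter of pulling structured JSON out of the model's text. A minimal sketch of parsing Hermes-style `<tool_call>` blocks — the tag format follows the Hermes function-calling convention mentioned above, but the helper name, the tool name, and the example payload are hypothetical:

```python
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(model_output):
    """Parse the JSON payload of every <tool_call> block, in order."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(model_output)]

# Hypothetical model turn containing one tool call.
output = (
    "I should check the server's computed data first.\n"
    '<tool_call>{"name": "zaremba_exceptions", '
    '"arguments": {"digit_set": [1, 2, 3]}}</tool_call>'
)
calls = extract_tool_calls(output)
print(calls[0]["name"])  # zaremba_exceptions
```

Each parsed call is executed against the MCP server, and the JSON result is appended to the conversation as a tool message before the model's next turn.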

Toolkit Commands

Inference:

| Command | Description |
|---|---|
| `./convergent agent` | Interactive agentic chat with MCP server |
| `./convergent agent -q "..."` | Single-query agentic mode |
| `./convergent chat` | Interactive chat (no tools) |

Training pipeline:

| Command | Description |
|---|---|
| `./convergent generate-blocks` | Run all `scripts/blocks/block_*.py` to generate training data |
| `./convergent generate-synthetic` | Generate synthetic CoT data from remote LLM endpoints |
| `./convergent merge` | Merge all blocks, deduplicate, remove eval leaks |
| `./convergent validate` | Validate dataset format, balance, and quality |
| `./convergent stats` | Show dataset composition breakdown |
| `./convergent train` | Run QLoRA fine-tuning on the dataset |
| `./convergent merge-weights` | Merge LoRA adapter into base model weights |
| `./convergent eval` | Run the 103-question custom evaluation benchmark |
| `./convergent eval-standard` | Run standard benchmarks (GSM8K, ARC, MMLU) |
| `./convergent validate-eval` | Validate the evaluation benchmark itself |
| `./convergent pipeline` | Run the full pipeline end-to-end |
| `./convergent add-block NAME` | Create a new training data block from template |

The Research Flywheel

Convergent is designed for continuous improvement:

  GPU Computation → Findings → Train into Model → Reason & Discuss → New Experiments
       ↑                                                                    │
       └────────────────────────────────────────────────────────────────────┘

When bigcompute.science produces new results:

  1. Add a new block: ./convergent add-block new_findings
  2. Edit the block: Add training entries about the new findings
  3. Register it: Add the block filename to scripts/merge_dataset.py
  4. Run the pipeline: ./convergent merge && ./convergent validate && ./convergent train
  5. Evaluate: ./convergent merge-weights && ./convergent eval
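The block you edit in step 2 boils down to a script that emits chat-format entries. A minimal sketch of what such a generator might look like — the function name, entry fields, and example content here are hypothetical, not the actual `add-block` template:

```python
def generate_entries():
    """Return chat-format training entries for a hypothetical new-findings block."""
    system = "You are Convergent, the bigcompute.science research companion."
    return [
        {
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": "What did the latest digit-set sweep find?"},
                {"role": "assistant", "content": "The sweep confirmed ..."},
            ]
        },
        # ...more entries about the same finding, phrased differently...
    ]

entries = generate_entries()
print(len(entries))  # 1
```

Multiple paraphrases of the same finding help the model recall it robustly, which is why blocks typically contain dozens of entries rather than one.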

Training Data Structure

The training data is organized as modular blocks in scripts/blocks/:

| Block | Entries | Description |
|---|---|---|
| `block_identity.py` | ~30 | Model identity, mission, hardware |
| `block_bcd_agent.py` | ~80 | MCP tool definitions and agentic examples |
| `block_tool_variations.py` | ~120 | Tool-call format reinforcement |
| `block_cuda_mastery.py` | ~40 | Advanced CUDA kernel development |
| `block_university.py` | ~30 | Number theory from academic sources |
| `block_erdos.py` | ~20 | Open Erdős problems |
| `block_reasoning.py` | ~25 | Mathematical reasoning methodology |
| `block_prime_convergents.py` | ~15 | Cahlen Humphreys' paper on prime convergents |
| `block_v10_targeted.py` | ~85 | Targeted weak-spot reinforcement |
| `block_v11_reinforce.py` | ~30 | Standard math and planning reinforcement |
| ... | ... | See `scripts/merge_dataset.py` for the full list |

Synthetic data from Qwen2.5-Math-72B and Gemma-4-26B adds ~4,500 deep mathematical reasoning entries.

Total: 5,783 entries after deduplication.
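The dedup-and-leak-removal pass that produces this total can be sketched as follows. Exact matching on a normalized user prompt is an assumption for illustration — the real `scripts/dedup_and_clean.py` may key on different fields:

```python
def clean(entries, benchmark_prompts):
    """Drop exact duplicates and entries whose user prompt leaks into the eval set."""
    leaks = {p.strip().lower() for p in benchmark_prompts}
    seen, kept = set(), []
    for entry in entries:
        user = next(m["content"] for m in entry["messages"] if m["role"] == "user")
        key = user.strip().lower()
        if key in seen or key in leaks:
            continue
        seen.add(key)
        kept.append(entry)
    return kept

entries = [
    {"messages": [{"role": "user", "content": "What is Zaremba's conjecture?"}]},
    {"messages": [{"role": "user", "content": "what is zaremba's conjecture?"}]},  # duplicate
    {"messages": [{"role": "user", "content": "An eval question"}]},               # leak
]
print(len(clean(entries, ["an eval question"])))  # 1
```

Removing eval leaks before training matters: any benchmark question that slips into the training set inflates the custom evaluation score without improving the model.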

Evaluation

The custom benchmark (eval/benchmark.jsonl) contains 97 questions across 20 categories:

| Category | Questions | Description |
|---|---|---|
| `standard_math` | 8 | BK theorem, Hausdorff dimension, Kronecker coefficients |
| `agentic_tool_use` | 8 | Correct tool-call format and JSON |
| `factual_recall` | 10 | Exact computational findings from bigcompute.science |
| `paper_comprehension` | 6 | Understanding of research papers |
| `novel_synthesis` | 6 | Connecting findings to suggest new research |
| `multi_turn_react` | 3 | Full ReAct loops with THINK/ACT/OBSERVE/SYNTHESIZE |
| `error_recovery` | 3 | Graceful handling of tool failures |
| `mcp_decision` | 2 | When to call tools vs. answer from knowledge |
| ... | ... | 20 categories total |

Benchmark Results

Custom evaluation: 76% across 103 questions in 20 categories (includes nvcc compilation-tested CUDA scoring)

Standard benchmarks (alignment tax):

| Benchmark | Base Model | Convergent | Delta |
|---|---|---|---|
| GSM8K (5-shot) | 80% | 82% | +2% |
| MMLU Math (3 subjects) | 51.3% | 51.3% | 0% |
| ARC-Challenge (25-shot) | 65.5% | 59.5% | -6% |

Math reasoning improved, math knowledge held steady, and general reasoning pays a 6-point tax, which is acceptable for a specialized research model.

Known Issues

  • Use PEFT standard merge, not Unsloth merge. Unsloth's save_pretrained_merged has been observed to produce corrupted weights (degenerate/garbled output) even when the LoRA adapter itself is correct. The ./convergent merge-weights command uses PEFT standard merge by default. Only pass --unsloth if you have verified the output.
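For reference, the PEFT-standard merge amounts to roughly the following. This is a sketch, not the repo's `scripts/merge.py`; imports are deferred so the snippet loads without the training environment installed:

```python
def merge_lora_with_peft(base_model_id, adapter_dir, out_dir):
    """Merge a LoRA adapter into the base weights with plain PEFT, then save."""
    # Deferred imports: transformers and peft are only needed when actually merging.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(out_dir)

# e.g. merge_lora_with_peft("Qwen/Qwen2.5-7B-Instruct",
#                           "outputs/convergent-lora", "merged/")
```

`merge_and_unload()` folds the low-rank adapter deltas into the base weight matrices and returns a plain model, so the saved checkpoint needs no PEFT at inference time.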

Configuration

Training config: configs/qlora.yaml

Key parameters:

  • LoRA rank: 128 (high capacity for diverse training data)
  • LoRA alpha: 256
  • Epochs: 2
  • Learning rate: 2e-4 with cosine schedule
  • Max sequence length: 4096
  • Quantization: NF4 with double quantization
  • NEFTune: noise alpha 5 for improved generalization
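In `configs/qlora.yaml` these parameters would look roughly like the fragment below. The exact key names depend on the training script, so treat this as illustrative rather than a copy of the actual file:

```yaml
# Illustrative fragment — key names are assumptions; see configs/qlora.yaml
lora:
  r: 128                         # high capacity for diverse training data
  alpha: 256
quantization:
  load_in_4bit: true
  bnb_4bit_quant_type: nf4       # NF4 quantization
  bnb_4bit_use_double_quant: true
training:
  num_epochs: 2
  learning_rate: 2.0e-4
  lr_scheduler: cosine
  max_seq_length: 4096
  neftune_noise_alpha: 5         # NEFTune for improved generalization
```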

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `MODEL_DIR` | `Qwen/Qwen2.5-7B-Instruct` | Base model path or HuggingFace ID |
| `MERGED_MODEL_DIR` | `cahlen/Convergent-7B` | Model path (HuggingFace ID or local directory) |
| `LORA_OUTPUT_DIR` | `outputs/convergent-lora` | LoRA adapter output directory |
| `MATH_MODEL_ENDPOINT` | (none) | vLLM endpoint for math model (synthetic data) |
| `GEMMA_MODEL_ENDPOINT` | (none) | vLLM endpoint for Gemma model (synthetic data) |
| `LM_EVAL_BIN` | `lm_eval` | Path to lm-evaluation-harness binary |

Hardware Requirements

  • Training: NVIDIA GPU with ≥ 24GB VRAM (RTX 4090, RTX 5090, A100, H100)
  • Inference: NVIDIA GPU with ≥ 16GB VRAM (merged bf16 model is ~15GB)
  • Synthetic data generation: Remote vLLM endpoints (Qwen2.5-Math-72B, Gemma-4-26B)

Project Structure

convergent/
├── convergent              # CLI toolkit entry point
├── configs/
│   └── qlora.yaml          # QLoRA training configuration
├── scripts/
│   ├── train.py            # QLoRA fine-tuning with instruction masking
│   ├── merge.py            # LoRA adapter merge
│   ├── merge_dataset.py    # Assemble all training blocks
│   ├── dedup_and_clean.py  # Deduplicate and remove eval leaks
│   ├── validate_all.py     # Comprehensive dataset validation
│   ├── final_stats.py      # Dataset composition statistics
│   ├── fix_format.py       # Tool response format correction
│   ├── fix_system_prompts.py # System prompt unification
│   ├── convert_external.py # External dataset converter (Hermes FC)
│   ├── blocks/             # Training data generators (40 block_*.py files)
│   └── data_generation/    # Synthetic data generation from remote LLMs
├── eval/
│   ├── run_benchmark.py    # Evaluation runner with specialized scorers
│   ├── benchmark.jsonl     # 97-question custom benchmark
│   └── validate_eval.py    # Benchmark self-validation
├── data/                   # Generated training data (gitignored)
├── DATA_SOURCES.md         # Documentation of all data sources
├── requirements.txt
└── LICENSE                 # MIT

Citation

@misc{humphreys2026convergent,
  author = {Humphreys, Cahlen},
  title = {Convergent: A QLoRA-tuned Research Companion for Computational Number Theory},
  year = {2026},
  url = {https://github.com/cahlen/convergent},
  note = {bigcompute.science}
}

This project is maintained by a single person. If you run into issues, please file them on GitHub or HuggingFace and I will do my best to address them. I apologize in advance for any delays in response time.
