The bigcompute.science research companion — training toolkit, evaluation, and agentic inference pipeline.
Early Preview — Convergent is a work in progress, expressly trained to act as a research assistant with the bigcompute.science MCP server. The model, data, and tooling will be updated frequently as new experiments and findings are produced. Expect occasional bugs until we reach a GA release. Contributions and bug reports welcome.
Convergent is part of the bigcompute.science conjecture-driven GPU research project. It is a QLoRA fine-tuned model that connects to the bigcompute.science MCP server to reason about computational mathematics findings, write CUDA kernels, and suggest novel research directions for unsolved problems in number theory.
This repository contains the full training pipeline. The model weights and training data are hosted on HuggingFace:
| Repository | Description |
|---|---|
| cahlen/Convergent-7B | Model weights (merged, ready to use) |
| cahlen/Convergent-7B-data | Training dataset (5,783 entries) |
| cahlen/convergent | This repo — training code, eval, CLI toolkit |
- Base model: Qwen/Qwen2.5-7B-Instruct
- License: MIT (code), Apache 2.0 (model), CC-BY-4.0 (data)
- MCP Server: mcp.bigcompute.science
- Reason about number theory: continued fractions, Zaremba's conjecture, Hausdorff dimensions, Kronecker coefficients, Ramsey numbers, Flint Hills series
- Scaffold CUDA kernels: generates GPU kernel structure for number theory with architecture-specific flags (sm_86 through sm_120) — output requires expert review
- Use tools via MCP: call bigcompute.science endpoints in agentic ReAct loops (Hermes function-calling format)
- Suggest experiments: propose novel research directions based on computational findings
- Guide students: provide specific, actionable advice for contributing to computational number theory
```shell
# Install dependencies
pip install -r requirements.txt

# Run the full pipeline (generate data → train → eval)
./convergent pipeline

# Or run steps individually:
./convergent generate-blocks   # Generate training data
./convergent merge             # Merge and deduplicate
./convergent validate          # Check data quality
./convergent train             # QLoRA fine-tuning
./convergent merge-weights     # Merge LoRA into base model
./convergent eval              # Run evaluation benchmark
```

```shell
# Interactive agentic mode — connects to the MCP server for live research
./convergent agent

# Single query with tool execution
./convergent agent -q "How many Zaremba exceptions exist for digit set {1,2,3}?"

# Chat mode (no tools, just conversation)
./convergent chat

# Point to a local MCP server
./convergent agent --mcp-url http://localhost:8000
```

In agent mode, Convergent runs a full ReAct loop:
- You ask a question
- The model reasons and decides whether to call a tool
- If it outputs a `<tool_call>`, the CLI executes it against the MCP server
- The tool result is fed back to the model
- The model reasons about the result and either calls another tool or gives a final answer
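The tool-dispatch step of the loop can be sketched as follows. This is a minimal illustration, not the CLI's actual implementation; the tool name `zaremba_exceptions` and its arguments are invented for the example, not real bigcompute.science MCP endpoints.

```python
import json
import re

# Example model output in Hermes function-calling style: free-form
# reasoning followed by a <tool_call> block containing JSON.
output = (
    "Let me check the computational data.\n"
    "<tool_call>\n"
    '{"name": "zaremba_exceptions", "arguments": {"digit_set": [1, 2, 3]}}\n'
    "</tool_call>"
)

# Extract the JSON payload between the <tool_call> tags.
match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", output, re.DOTALL)
call = json.loads(match.group(1)) if match else None
# At this point the CLI would execute `call` against the MCP server and
# feed the result back to the model as an observation.
```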
Inference:
| Command | Description |
|---|---|
| `./convergent agent` | Interactive agentic chat with MCP server |
| `./convergent agent -q "..."` | Single-query agentic mode |
| `./convergent chat` | Interactive chat (no tools) |
Training pipeline:
| Command | Description |
|---|---|
| `./convergent generate-blocks` | Run all `scripts/blocks/block_*.py` to generate training data |
| `./convergent generate-synthetic` | Generate synthetic CoT data from remote LLM endpoints |
| `./convergent merge` | Merge all blocks, deduplicate, remove eval leaks |
| `./convergent validate` | Validate dataset format, balance, and quality |
| `./convergent stats` | Show dataset composition breakdown |
| `./convergent train` | Run QLoRA fine-tuning on the dataset |
| `./convergent merge-weights` | Merge LoRA adapter into base model weights |
| `./convergent eval` | Run the 103-question custom evaluation benchmark |
| `./convergent eval-standard` | Run standard benchmarks (GSM8K, ARC, MMLU) |
| `./convergent validate-eval` | Validate the evaluation benchmark itself |
| `./convergent pipeline` | Run the full pipeline end-to-end |
| `./convergent add-block NAME` | Create a new training data block from template |
Convergent is designed for continuous improvement:
```
GPU Computation → Findings → Train into Model → Reason & Discuss → New Experiments
        ↑                                                                 │
        └─────────────────────────────────────────────────────────────────┘
```
When bigcompute.science produces new results:
- Add a new block: `./convergent add-block new_findings`
- Edit the block: add training entries about the new findings
- Register it: add the block filename to `scripts/merge_dataset.py`
- Run the pipeline: `./convergent merge && ./convergent validate && ./convergent train`
- Evaluate: `./convergent merge-weights && ./convergent eval`
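The block created in the first step above might look something like this minimal sketch. The `entries()` interface and the chat-format field names are assumptions for illustration; the actual template scaffolded by `add-block` may differ.

```python
# Hypothetical minimal scripts/blocks/block_new_findings.py.
# The entries() interface and message schema are assumptions,
# not the actual template contents.

def entries():
    """Return chat-format training entries about the new findings."""
    return [
        {
            "messages": [
                {"role": "user", "content": "What did the latest GPU run find?"},
                {"role": "assistant", "content": "The latest run showed that ..."},
            ]
        }
    ]
```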
The training data is organized as modular blocks in `scripts/blocks/`:
| Block | Entries | Description |
|---|---|---|
| `block_identity.py` | ~30 | Model identity, mission, hardware |
| `block_bcd_agent.py` | ~80 | MCP tool definitions and agentic examples |
| `block_tool_variations.py` | ~120 | Tool-call format reinforcement |
| `block_cuda_mastery.py` | ~40 | Advanced CUDA kernel development |
| `block_university.py` | ~30 | Number theory from academic sources |
| `block_erdos.py` | ~20 | Open Erdős problems |
| `block_reasoning.py` | ~25 | Mathematical reasoning methodology |
| `block_prime_convergents.py` | ~15 | Cahlen Humphreys' paper on prime convergents |
| `block_v10_targeted.py` | ~85 | Targeted weak-spot reinforcement |
| `block_v11_reinforce.py` | ~30 | Standard math and planning reinforcement |
| ... | ... | See `scripts/merge_dataset.py` for the full list |
Synthetic data from Qwen2.5-Math-72B and Gemma-4-26B adds ~4,500 deep mathematical reasoning entries.
Total: 5,783 entries after deduplication.
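The deduplication step can be sketched as follows. This is one plausible approach (hash each entry's canonical JSON and keep first occurrences), not necessarily what `scripts/dedup_and_clean.py` actually does.

```python
import hashlib
import json

def dedup(entries):
    """Keep the first occurrence of each structurally identical entry."""
    seen, unique = set(), []
    for e in entries:
        # sort_keys gives a canonical serialization, so two entries with
        # the same content but different key order hash identically.
        key = hashlib.sha256(json.dumps(e, sort_keys=True).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique

data = [{"q": "a"}, {"q": "a"}, {"q": "b"}]
```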
The custom benchmark (`eval/benchmark.jsonl`) contains 97 questions across 20 categories:
| Category | Questions | Description |
|---|---|---|
| `standard_math` | 8 | BK theorem, Hausdorff dimension, Kronecker coefficients |
| `agentic_tool_use` | 8 | Correct tool-call format and JSON |
| `factual_recall` | 10 | Exact computational findings from bigcompute.science |
| `paper_comprehension` | 6 | Understanding of research papers |
| `novel_synthesis` | 6 | Connecting findings to suggest new research |
| `multi_turn_react` | 3 | Full ReAct loops with THINK/ACT/OBSERVE/SYNTHESIZE |
| `error_recovery` | 3 | Graceful handling of tool failures |
| `mcp_decision` | 2 | When to call tools vs. answer from knowledge |
| ... | ... | 21 categories total |
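A single benchmark line might look like the sketch below; the field names are invented for illustration, not the actual schema of `eval/benchmark.jsonl`.

```python
import json

# Hypothetical shape of one JSONL line in the custom benchmark.
# Field names ("category", "question", "expected") are assumptions.
entry = {
    "category": "factual_recall",
    "question": "How many Zaremba exceptions exist for digit set {1,2,3}?",
    "expected": "see bigcompute.science findings",
}
line = json.dumps(entry)       # one line of the .jsonl file
decoded = json.loads(line)     # the runner would parse it back per line
```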
Custom evaluation: 76% across 103 questions in 20 categories (includes nvcc compilation-tested CUDA scoring)
Standard benchmarks (alignment tax):
| Benchmark | Base Model | Convergent | Delta |
|---|---|---|---|
| GSM8K (5-shot) | 80% | 82% | +2% |
| MMLU Math (3 subjects) | 51.3% | 51.3% | 0% |
| ARC-Challenge (25-shot) | 65.5% | 59.5% | -6% |
Math reasoning improved. Math knowledge preserved. General reasoning has a 6% tax — acceptable for a specialized research model.
- Use PEFT standard merge, not Unsloth merge. Unsloth's `save_pretrained_merged` has been observed to produce corrupted weights (degenerate/garbled output) even when the LoRA adapter itself is correct. The `./convergent merge-weights` command uses the PEFT standard merge by default. Only pass `--unsloth` if you have verified the output.
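For reference, the PEFT standard merge path amounts to something like the following sketch, assuming a stock transformers/peft setup; paths are placeholders and this is not a copy of `scripts/merge.py`.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach and fold in the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "outputs/convergent-lora")
merged = model.merge_and_unload()  # bakes the LoRA deltas into the base weights
merged.save_pretrained("outputs/convergent-merged")
```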
Training config: `configs/qlora.yaml`
Key parameters:
- LoRA rank: 128 (high capacity for diverse training data)
- LoRA alpha: 256
- Epochs: 2
- Learning rate: 2e-4 with cosine schedule
- Max sequence length: 4096
- Quantization: NF4 with double quantization
- NEFTune: noise alpha 5 for improved generalization
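Put together, the parameters above suggest a config shaped roughly like this sketch; the key names are assumptions, not the actual schema of `configs/qlora.yaml`.

```yaml
# Illustrative sketch only; key names may differ from configs/qlora.yaml.
lora_r: 128
lora_alpha: 256
num_train_epochs: 2
learning_rate: 2.0e-4
lr_scheduler_type: cosine
max_seq_length: 4096
load_in_4bit: true
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true
neftune_noise_alpha: 5
```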
| Variable | Default | Description |
|---|---|---|
| `MODEL_DIR` | `Qwen/Qwen2.5-7B-Instruct` | Base model path or HuggingFace ID |
| `MERGED_MODEL_DIR` | `cahlen/Convergent-7B` | Model path (HuggingFace ID or local directory) |
| `LORA_OUTPUT_DIR` | `outputs/convergent-lora` | LoRA adapter output directory |
| `MATH_MODEL_ENDPOINT` | — | vLLM endpoint for math model (synthetic data) |
| `GEMMA_MODEL_ENDPOINT` | — | vLLM endpoint for Gemma model (synthetic data) |
| `LM_EVAL_BIN` | `lm_eval` | Path to lm-evaluation-harness binary |
- Training: NVIDIA GPU with ≥ 24GB VRAM (RTX 4090, RTX 5090, A100, H100)
- Inference: NVIDIA GPU with ≥ 16GB VRAM (merged bf16 model is ~15GB)
- Synthetic data generation: Remote vLLM endpoints (Qwen2.5-Math-72B, Gemma-4-26B)
```
convergent/
├── convergent               # CLI toolkit entry point
├── configs/
│   └── qlora.yaml           # QLoRA training configuration
├── scripts/
│   ├── train.py             # QLoRA fine-tuning with instruction masking
│   ├── merge.py             # LoRA adapter merge
│   ├── merge_dataset.py     # Assemble all training blocks
│   ├── dedup_and_clean.py   # Deduplicate and remove eval leaks
│   ├── validate_all.py      # Comprehensive dataset validation
│   ├── final_stats.py       # Dataset composition statistics
│   ├── fix_format.py        # Tool response format correction
│   ├── fix_system_prompts.py # System prompt unification
│   ├── convert_external.py  # External dataset converter (Hermes FC)
│   ├── blocks/              # Training data generators (40 block_*.py files)
│   └── data_generation/     # Synthetic data generation from remote LLMs
├── eval/
│   ├── run_benchmark.py     # Evaluation runner with specialized scorers
│   ├── benchmark.jsonl      # 97-question custom benchmark
│   └── validate_eval.py     # Benchmark self-validation
├── data/                    # Generated training data (gitignored)
├── DATA_SOURCES.md          # Documentation of all data sources
├── requirements.txt
└── LICENSE                  # MIT
```
```bibtex
@misc{humphreys2026convergent,
  author = {Humphreys, Cahlen},
  title  = {Convergent: A QLoRA-tuned Research Companion for Computational Number Theory},
  year   = {2026},
  url    = {https://github.com/cahlen/convergent},
  note   = {bigcompute.science}
}
```

- bigcompute.science — Conjecture-driven GPU research in computational mathematics
- MCP Server — Model Context Protocol server for experimental data and tools
- Convergent-7B Model — Trained model weights on HuggingFace
- Convergent-7B Data — Training dataset on HuggingFace
- guerrillamathematics.com — Mathematical research blog
This project is maintained by a single person. If you run into issues, please file them on GitHub or HuggingFace and I will do my best to address them. I apologize in advance for any delays in response time.

