🎉 World's First: Successfully running Kimi-K2 1 Trillion Parameter Model on personal hardware!
This is a fork of exo-explore/exo with critical patches to enable distributed inference of massive models like Kimi-K2 (1T parameters, 504GB) across multiple Apple Silicon Macs.
- Layer-wise Mixed Quantization: Supports different bit widths per layer (3/4/6-bit)
- Distributed Memory Loading: Splits the 504GB model across an M2 Studio (192GB) and an M3 Ultra (512GB)
- MoE Optimization: Efficiently handles 384 experts with Top-8 routing
- Thunderbolt Network: Direct Mac-to-Mac connection for minimal latency
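Top-8 routing is what makes a 1-trillion-parameter MoE tractable: each token activates only 8 of the 384 experts. A minimal sketch of the idea in plain Python (not exo's actual kernel; the function name and shapes are illustrative assumptions):

```python
# Toy sketch of Top-8 routing over a MoE layer's experts (plain Python,
# not exo's actual kernel; names here are illustrative assumptions).
import math

TOP_K = 8

def route(router_logits):
    """Return (expert_index, weight) pairs for the TOP_K highest-scoring experts."""
    # Softmax over all expert logits (numerically stabilized)
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep only the k most probable experts and renormalize their weights
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:TOP_K]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]
```

Only the selected experts' weight matrices are multiplied for that token, so per-token compute touches roughly 8/384 ≈ 2% of the expert parameters.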
- Model: Kimi-K2-Instruct-MLX-3.985bit (1 Trillion Parameters)
- Hardware: M2 Studio + M3 Ultra via Thunderbolt 4
- Memory Usage: M2: ~145GB, M3: ~360GB
- Performance: 0.076 TPS (with potential for 10 TPS after optimization)
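The memory split above falls out of assigning layers roughly in proportion to each node's RAM. A hypothetical sketch of that partitioning (the layer count of 61 is assumed from DeepSeek-V3-style architectures; the function and node names are mine, not this fork's code):

```python
# Hypothetical memory-proportional layer split (not the fork's actual code).
# Assumes 61 transformer layers, as in DeepSeek-V3-style architectures.
def partition_layers(num_layers, node_memory_gb):
    """Assign contiguous [start, end) layer ranges proportional to each node's RAM."""
    total = sum(node_memory_gb.values())
    shards, start = {}, 0
    nodes = list(node_memory_gb.items())
    for i, (node, mem) in enumerate(nodes):
        # Last node takes the remainder so every layer is covered exactly once
        end = num_layers if i == len(nodes) - 1 else start + round(num_layers * mem / total)
        shards[node] = (start, end)
        start = end
    return shards

print(partition_layers(61, {"m2-studio": 192, "m3-ultra": 512}))
```

With a 192:512 RAM split the M2 gets about 27% of the layers, roughly in line with the ~145GB/~360GB usage reported above.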
- Complete Setup Guide - Step-by-step reproduction guide
- Success Snapshot - Current working state
- Technical Summary - Project overview
- Automatic detection of per-layer quantization parameters
- Support for mixed bit widths (3/4/6-bit) in a single model
- Preemptive shard loading for simultaneous memory allocation
- Distributed loading synchronization
- DeepSeek-V3 compatibility for Kimi-K2
- Proper MoE layer handling
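Conceptually, per-layer quantization detection amounts to resolving a model-wide default against per-layer overrides. A hypothetical sketch (the key names loosely mirror MLX-style quantization configs but are assumptions here, not this fork's actual schema):

```python
# Hypothetical resolver for mixed-bit-width quantization: a model-wide
# default plus per-layer overrides. Key names loosely mirror MLX-style
# quantization configs but are assumptions, not this fork's actual schema.
def layer_bits(name, quant_cfg):
    """Return (bits, group_size) for the weight called `name`."""
    default_bits = quant_cfg.get("bits", 4)
    default_group = quant_cfg.get("group_size", 64)
    override = quant_cfg.get("per_layer", {}).get(name, {})
    return (override.get("bits", default_bits),
            override.get("group_size", default_group))

cfg = {
    "bits": 4, "group_size": 64,  # 4-bit default
    "per_layer": {
        "model.layers.0.self_attn.q_proj": {"bits": 6},  # sensitive layer: 6-bit
        "model.layers.30.mlp.experts": {"bits": 3},      # bulk experts: 3-bit
    },
}
```

This is how a single checkpoint can average out to "3.985 bits" while individual layers stay at 3, 4, or 6 bits.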
1. **Set up two Macs**

   ```bash
   # Clone this repo on both machines
   git clone https://github.com/Shinka-Man/exo.git
   cd exo

   # Install dependencies
   python -m venv venv
   source venv/bin/activate
   pip install -e .
   pip install tiktoken blobfile
   ```
2. **Configure the network**

   - Connect the Macs via Thunderbolt 4
   - Set static IPs (10.0.2.1 and 10.0.2.2)
   - Edit `discovery.json` with node details
3. **Run distributed inference**

   ```bash
   # On M3 (Worker)
   python -m exo.main --node-id m3-studio-worker ...

   # On M2 (Main)
   python -m exo.main --node-id m2-studio-main ...
   ```
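For step 2, one plausible shape for `discovery.json` is sketched below. The field names are assumptions modeled on exo's manual-discovery config (memory in MB), not verified against this fork; check the repo's discovery code for the exact schema.

```json
{
  "peers": {
    "m2-studio-main": {
      "address": "10.0.2.1",
      "port": 50051,
      "device_capabilities": {
        "model": "Mac Studio",
        "chip": "Apple M2 Ultra",
        "memory": 196608
      }
    },
    "m3-studio-worker": {
      "address": "10.0.2.2",
      "port": 50051,
      "device_capabilities": {
        "model": "Mac Studio",
        "chip": "Apple M3 Ultra",
        "memory": 524288
      }
    }
  }
}
```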
Performance roadmap (current: 0.076 TPS → target: 10 TPS):
- Expert parallelization
- Communication optimization
- Metal Performance Shaders integration
- Dynamic batching
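Expert parallelization, the first roadmap item, amounts to grouping tokens routed to the same expert so each expert runs one batched matmul per step instead of one call per token. A toy illustration (names are mine, not exo's API):

```python
# Toy illustration of expert parallelization: tokens routed to the same
# expert are grouped so each expert runs one batched call per step
# instead of one call per token. (Names are mine, not exo's API.)
from collections import defaultdict

def group_by_expert(assignments):
    """assignments: (token_index, expert_index) pairs -> {expert: [token_indices]}"""
    groups = defaultdict(list)
    for tok, expert in assignments:
        groups[expert].append(tok)
    return dict(groups)

# Four tokens routed across two experts -> two batched expert calls, not four
print(group_by_expert([(0, 7), (1, 3), (2, 7), (3, 3)]))
```

Combined with dynamic batching, this turns many small matrix-vector products into a few large matrix-matrix products, which is where Apple Silicon GPUs are most efficient.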
This is a research project pushing the boundaries of what's possible with personal hardware. Contributions for performance improvements are welcome!
MIT License - See LICENSE
- Original exo team
- Moonshot AI for Kimi-K2 model
- Apple for incredible M-series chips
"Making the impossible possible - 1 trillion parameters on your desk"
Date: September 1, 2025
Location: Tokyo, Japan
Hardware: M2 Studio (192GB) + M3 Ultra (512GB)