This repository contains the code accompanying the blog post Norm Balancing Optimizers. It implements BAM (Balanced Axis Momentum), a stripped-down Muon variant that replaces Newton–Schulz orthogonalization with SinkNorm.
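This README does not define SinkNorm, so the following is only a rough sketch under an assumption: that SinkNorm is a Sinkhorn-style procedure that alternately rescales the rows and columns of a momentum matrix so their norms become balanced. The function name `sinknorm_sketch`, its parameters `num_iters` and `eps`, and the unit-RMS target are all hypothetical and are not taken from this repository. It is written in dependency-free Python for illustration only; the blog post and the code in this repo are the authoritative description of BAM.

```python
import math


def sinknorm_sketch(matrix, num_iters=5, eps=1e-8):
    """Illustrative Sinkhorn-style balancing (an assumption, not the repo's code).

    Alternately divides each row and then each column by its RMS value,
    so rows and columns are pushed toward unit RMS. Entry signs are preserved
    because every scaling factor is positive.
    """
    x = [list(row) for row in matrix]  # work on a copy; the input is not mutated
    n_rows, n_cols = len(x), len(x[0])
    for _ in range(num_iters):
        # Row step: rescale every row to (approximately) unit RMS.
        for i in range(n_rows):
            rms = math.sqrt(sum(v * v for v in x[i]) / n_cols)
            x[i] = [v / (rms + eps) for v in x[i]]
        # Column step: rescale every column to (approximately) unit RMS.
        for j in range(n_cols):
            rms = math.sqrt(sum(x[i][j] ** 2 for i in range(n_rows)) / n_rows)
            for i in range(n_rows):
                x[i][j] /= rms + eps
    return x
```

Because each iteration ends with the column step, the columns come out at unit RMS (up to `eps`), while the rows are only approximately balanced after a finite number of iterations. Unlike Newton–Schulz orthogonalization, a scheme of this kind needs only elementwise scaling and per-axis reductions, with no matrix multiplications.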
The nanoGPT training scripts (with different optimizers) live in nanogpt. The CIFAR-10 MLP and ResNet-18 experiments can be run via run.py using the configs in config. We'll soon be adding the sbatch scripts that we used to run our sweeps!
Note: this code was ported over from an experimental, private repo. If there are issues or broken scripts, let us know!