PRISM is an "ability to learn" machine-learning challenge for BASE Network. It turns Bittensor miners into researchers who submit a model and a training procedure as two executable Python scripts, then competes them on how well their model learns from scratch on locked data.
PRISM does not ask miners to train a frontier model. It asks a sharper question: given a fixed dataset and a forced random initialization, how quickly does a model learn? PRISM measures that as online compression: the better a model predicts each new chunk of text before training on it, the better it compresses the stream, and the better it scores.
PRISM is designed to answer questions such as:
- Which architectures learn fastest from scratch under a fixed compute budget?
- Which training loops (optimizer, schedule, data ordering, distributed strategy) improve sample efficiency?
- Which ideas hold up when the validator, not the miner, controls the seed, the data, and the metric?
- Miners submit two scripts: a model
architecture.pyand a customtraining.pyloop. - BASE verifies miner identity and forwards the submission.
- PRISM runs a static AST sandbox and an OpenRouter LLM hard gate over both scripts.
- The validator re-executes the training loop under a forced random init on the locked FineWeb-Edu train split, capturing the online loss stream itself.
- PRISM computes a prequential bits-per-byte score with a held-out delta tie-breaker.
- Scores rank on the leaderboard and convert into normalized, dry-run BASE weights.
flowchart LR
A[Locked FineWeb-Edu] --> P[PRISM]
M1[Miner A] --> P
M2[Miner B] --> P
M3[Miner C] --> P
P --> R[Prequential bits-per-byte + Held-out Delta]
R --> W[Dry-Run Weights]
A submission is a bundle (zip or directory) containing two distinct scripts:
architecture.pyexposingbuild_model(ctx)that returns atorch.nn.Module;training.pyexposingtrain(ctx)that owns the training loop.
An optional prism.yaml can declare the entrypoints and the chosen tokenizer. A single combined
module no longer satisfies the contract.
The miner owns the model and the training procedure, including multi-GPU scaling. The challenge owns everything that makes the comparison fair and cheat-resistant:
- the dataset content and the secret
val/testsplits; - the forced random seed and deterministic flags;
- the data order and the single-pass online-loss capture;
- the scoring.
Any metric the miner reports and any manifest the miner writes are ignored. Scoring always reads the
challenge-authored prism_run_manifest.v2.json.
PRISM evaluations are intentionally compact but honest:
- forced random initialization (fixed seed) so smuggled pretrained weights are inert;
- single-pass, predict-then-train online loss so there is no held-out leakage by construction;
- a metric that integrates the whole loss curve, so single-checkpoint gaming fails;
- compute normalization (tokens and FLOPs, never wall-clock) so a faster GPU does not buy a better score;
- a secret held-out split used only by the challenge scorer for the tie-breaker and the anti-memorization gap.
The primary signal is the prequential bits-per-byte: the area under the from-scratch online loss curve, normalized by the raw UTF-8 bytes consumed. A model that learns faster compresses better and ranks higher. The held-out delta-over-random-init breaks near-ties, and an excessive train-vs-held-out gap flags memorization and penalizes the score.
See Scaling Evaluation for the multi-GPU contract and the budget model, and Scoring and rewards for the scoring math.