Decentralized neural architecture search for frontier-model research
PRISM is a Platform subnet for decentralized neural architecture search. Miners submit model architecture and training ideas, PRISM evaluates them in isolated benchmark environments, and the subnet rewards ideas that show better architecture quality, training behavior, and scaling signals.
The goal is not to train frontier models directly inside the subnet. Instead, PRISM searches the design space around frontier-model building blocks using compact evaluations that are fast enough for subnet operation while still surfacing useful architecture, optimizer, loss, inference, and scaling-law signals.
- Miners submit architecture or training variants.
- PRISM validates the submission contract and reviews the bundle for safety.
- The evaluator measures proxy learning quality, training behavior, stability, and scaling signals.
- Architecture ownership and training ownership are attributed separately.
- Meaningful improvements receive component rewards.
- Final architecture and recipe scores are converted into raw Platform weights.
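As a rough illustration of the last step, the sketch below blends per-miner architecture and recipe scores into weights. The function name `scores_to_weights`, the 50/50 `arch_share` split, and the normalization to a sum of 1 are assumptions made for illustration, not PRISM's actual weighting rule.

```python
from typing import Dict


def scores_to_weights(
    arch_scores: Dict[str, float],
    recipe_scores: Dict[str, float],
    arch_share: float = 0.5,  # assumed split between the two components
) -> Dict[str, float]:
    """Blend two score components per hotkey and normalize them to sum to 1."""
    hotkeys = set(arch_scores) | set(recipe_scores)
    blended = {
        # Missing or negative scores are treated as zero (an assumption).
        hk: arch_share * max(arch_scores.get(hk, 0.0), 0.0)
        + (1.0 - arch_share) * max(recipe_scores.get(hk, 0.0), 0.0)
        for hk in hotkeys
    }
    total = sum(blended.values())
    if total <= 0.0:
        # Nothing earned a positive score, so nobody receives weight.
        return {hk: 0.0 for hk in hotkeys}
    return {hk: value / total for hk, value in blended.items()}
```

Keeping the two components as separate inputs mirrors the separate attribution of architecture and training ownership: a miner can earn weight from either side without holding the other.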
- Architecture discovery: first discovery of a meaningful architecture family, including a never-before-seen family, earns architecture ownership.
- Training and inference improvement: later miners can improve optimizer setup, inference logits, loss computation, or train-step code for an existing architecture and earn training ownership.
- Noise-resistant improvements: dynamic thresholds and noise checks keep small random metric fluctuations from being rewarded as real improvements.
- Scaling-aware signals: PRISM emphasizes smooth loss curves, stable gradients, activation stability, and consistent improvements across model size, depth, sequence length, and batch scaling.
- Secure execution: submitted code passes static review and optional LLM policy checks, then runs only inside isolated containers through the Platform Docker broker.
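The noise-resistance idea in the list above can be sketched as a threshold that grows with the seed-to-seed spread of the incumbent's metric. The function name, the multiplier `k`, and the floor `min_gain` are hypothetical; PRISM's actual thresholds are not specified in this README.

```python
import statistics
from typing import Sequence


def is_meaningful_improvement(
    baseline_losses: Sequence[float],
    candidate_losses: Sequence[float],
    k: float = 2.0,           # assumed noise multiplier
    min_gain: float = 0.001,  # assumed absolute floor on the gain
) -> bool:
    """Return True if the candidate's mean loss beats the baseline by more than the noise."""
    if len(baseline_losses) < 2 or len(candidate_losses) < 1:
        return False  # a single-seed baseline gives no noise estimate
    noise = statistics.stdev(baseline_losses)
    gain = statistics.mean(baseline_losses) - statistics.mean(candidate_losses)
    # The threshold is dynamic: noisier baselines demand larger gains.
    threshold = max(min_gain, k * noise)
    return gain > threshold
```

With a baseline of `[3.00, 3.02, 2.98]` the spread is about 0.02, so under these assumed defaults a candidate has to improve the mean loss by more than roughly 0.04 before it counts.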
PRISM fixes the dataset and evaluation protocol, not the architecture search space. Miners may submit never-before-seen architecture families through the build_model(ctx) contract, while PRISM compares them under shared scoring and scaling rules.
Training code is also first-class. Miners can customize optimizer, loss, inference, and train-step behavior with hooks such as configure_optimizer, compute_loss, inference_logits, and train_step. Full control over the optimizer and learning rate (LR) belongs in configure_optimizer; fallback evaluator paths may apply safe defaults or caps when a submission relies only on recipe fields.
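To make the hook contract more concrete, here is a minimal sketch of how an evaluator might detect which optional hooks a submission defines and fall back to defaults for the rest. The hook names come from the paragraph above; `resolve_hooks`, the `SimpleNamespace` stand-in for a submission module, and the defaults mapping are assumptions, not PRISM's actual loader.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict

OPTIONAL_HOOKS = ("configure_optimizer", "compute_loss", "inference_logits", "train_step")


def resolve_hooks(submission: Any, defaults: Dict[str, Callable]) -> Dict[str, Callable]:
    """Use the submission's hook when it is callable, else the evaluator default."""
    resolved = {}
    for name in OPTIONAL_HOOKS:
        hook = getattr(submission, name, None)
        resolved[name] = hook if callable(hook) else defaults[name]
    return resolved


# Hypothetical submission that customizes only the loss computation.
submission = SimpleNamespace(
    build_model=lambda ctx: "model",
    compute_loss=lambda logits, targets: 0.0,
)
# Placeholder defaults; a real evaluator would supply safe, capped behavior here.
defaults = {name: (lambda *args, **kwargs: None) for name in OPTIONAL_HOOKS}
hooks = resolve_hooks(submission, defaults)
```

In this sketch `hooks["compute_loss"]` is the miner's function, while the other three entries come from the defaults, which is one way the fallback path described above could be wired.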
Official proxy and full-scale evaluation modes use FineWeb-Edu dataset contracts. Metric claims must be backed by prism_run_manifest.v1.json artifacts, including dataset fingerprints, score eligibility flags, loss comparability metadata, diagnostics, benchmark metadata, and artifact references. Manifest validation is deterministic, but PRISM does not claim to recompute every submitted metric from raw artifacts.
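A deterministic manifest check of the kind described above could look like the following sketch. The key names are hypothetical stand-ins derived from the categories listed in the paragraph; the real `prism_run_manifest.v1.json` schema is not shown in this README and may differ.

```python
import json
from typing import Any, Dict, List

# Hypothetical top-level keys, one per category named in the text.
REQUIRED_KEYS = (
    "dataset_fingerprint",
    "score_eligible",
    "loss_comparability",
    "diagnostics",
    "benchmark",
    "artifacts",
)


def validate_manifest(manifest: Dict[str, Any]) -> List[str]:
    """Return a sorted list of problems; an empty list means the manifest passes."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    if "score_eligible" in manifest and not isinstance(manifest["score_eligible"], bool):
        problems.append("score_eligible must be a boolean")
    # Sorting keeps the output identical across runs for the same input.
    return sorted(problems)


def load_and_validate(path: str) -> List[str]:
    """Read a manifest file from disk and validate it."""
    with open(path, "r", encoding="utf-8") as handle:
        return validate_manifest(json.load(handle))
```

The check only inspects structure and types. It does not recompute any metric, which matches the scope stated above.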
For the scientific scoring basis, see Scoring and rewards and Scaling evaluation. For evidence-gated metric review and anti-cheat review, see the Security model. LLM policy review can flag and explain risk, but rejection requires concrete evidence under that policy.
- Miner guide
- Validator guide
- Overview
- Architecture
- Submission format
- Scoring and rewards
- Scaling evaluation
- Security model
flowchart LR
    Miner[Miner] --> Platform[Platform]
    Platform --> Prism[PRISM]
    Prism --> Review[Review]
    Review --> Broker[Docker Broker]
    Broker --> GPU[GPU Eval]
    GPU --> Scale[Scaling Signals]
    Scale --> Scores[Scores]
    Scores --> Weights[Weights]
sequenceDiagram
    participant M as Miner
    participant P as Platform
    participant R as PRISM
    participant D as Docker
    participant W as Weights
    M->>P: signed ZIP upload
    P->>R: verified hotkey submission
    R->>R: static and LLM review
    R->>D: isolated GPU evaluation
    D-->>R: q_arch, q_recipe, hook, stability metrics
    R->>R: scaling-aware attribution
    R->>W: split component rewards
PRISM is designed to avoid rewarding signals that often fail at scale. Weak predictors include early MMLU-style benchmarks, subjective chat quality, final perplexity alone, single-seed results, and very short training runs without extrapolation.
The strongest proxy signals are:
- smooth loss curves without oscillation;
- stable gradient norms without silent explosion;
- absence of activation spikes, especially for paths that could scale beyond 10B parameters;
- coherent improvements across model sizes, such as similar gains at 125M, 350M, and 1B proxy scales;
- depth, sequence, and batch scaling tests that expose residual-stream drift, MoE routing collapse, KV-cache degradation, normalization failures, overflow, NaNs, and gradient-noise problems.
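Three of the signals above can be approximated with simple checks, sketched here under stated assumptions. The function names, the `ratio` and `tolerance` defaults, and the exact definitions of "oscillation" and "coherent" are illustrative choices, not PRISM's scoring rules.

```python
import math
import statistics
from typing import Dict, Sequence


def oscillation_fraction(losses: Sequence[float]) -> float:
    """Fraction of steps where the loss went up instead of down."""
    if len(losses) < 2:
        return 0.0
    ups = sum(1 for prev, cur in zip(losses, losses[1:]) if cur > prev)
    return ups / (len(losses) - 1)


def has_gradient_spike(grad_norms: Sequence[float], ratio: float = 10.0) -> bool:
    """Flag non-finite norms, or a norm above `ratio` times the median of earlier norms."""
    if any(not math.isfinite(g) for g in grad_norms):
        return True
    for i in range(1, len(grad_norms)):
        reference = statistics.median(grad_norms[:i])
        if reference > 0 and grad_norms[i] > ratio * reference:
            return True
    return False


def gains_are_coherent(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.5,  # assumed allowed spread relative to the largest gain
) -> bool:
    """Check that relative loss gains are positive and similar at every shared scale."""
    scales = sorted(set(baseline) & set(candidate))
    if not scales:
        return False
    # Assumes baseline losses are strictly positive.
    gains = [(baseline[s] - candidate[s]) / baseline[s] for s in scales]
    if min(gains) <= 0:
        return False
    return (max(gains) - min(gains)) <= tolerance * max(gains)
```

For example, calling `gains_are_coherent` with losses keyed by `"125M"`, `"350M"`, and `"1B"` passes only when the candidate improves at all three scales by a comparable relative margin.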
See Scaling evaluation for the complete scaling policy.
prism/
    assets/                           # README and documentation images
    docs/                             # Project documentation
    src/prism_challenge/              # Challenge app, repository, evaluator, and SDK helpers
    src/prism_challenge/evaluator/
        components.py                 # Architecture/training manifest parsing and fingerprints
        container.py                  # Isolated evaluation runner
    tests/                            # API, scoring, broker, executor, and safety tests
    config.example.yaml               # Production-oriented example config
    Dockerfile                        # Challenge image
Apache-2.0
