PRISM is a BASE challenge service. It runs as a FastAPI application with SQLite state, internal BASE authentication, and GPU evaluation through the BASE Docker broker. PRISM measures a model's ability to learn: miners submit two scripts, the challenge owns the data and the evaluation, and the validator re-executes the miner's training loop under a forced random init and computes the score itself.
flowchart LR
Miner[Miner] --> Proxy[BASE Proxy]
Proxy --> Bridge[PRISM Bridge]
Bridge --> DB[(SQLite)]
Bridge --> Queue[Worker Queue]
Queue --> Static[Static Sandbox + Param Cap + Distributed Contract]
Static --> LLM[OpenRouter Hard Gate]
LLM --> Broker[Docker Broker]
Broker --> Reexec[Forced-Init Re-Execution Runner]
Reexec --> Score[Prequential bpb + Held-out Delta]
Score --> Weights[Dry-Run get_weights]
| Component | Responsibility |
|---|---|
| FastAPI app | Public and internal HTTP routes |
| Repository | SQLite persistence for submissions, scores, sources, eval jobs, and GPU leases |
| Worker | Claims pending submissions, runs static + LLM gates, dispatches re-execution, finalizes scores |
| Component resolver | Resolves the two-script contract (architecture.py/build_model + training.py/train) and computes fingerprints |
| Static sandbox | AST hard-blocks over both scripts, the forced-seed parameter-cap instantiation, and the multi-GPU static contract |
| LLM hard gate | OpenRouter review of both scripts; a reject is terminal before any GPU work |
| Container runner | Challenge-owned forced-init re-execution that captures the online loss stream |
| Scoring | Prequential bits-per-byte plus the held-out delta tie-breaker and anti-memorization gap |
| Weights module | Converts normalized completed scores into dry-run BASE weights |
BASE is responsible for miner-facing upload security. It verifies signatures, timestamps, nonces, and hotkey identity before forwarding a submission to PRISM.
PRISM receives verified submissions on:
POST /internal/v1/bridge/submissions
The bridge trusts only internal BASE authentication and the verified hotkey header. Miner-supplied identity headers are not trusted.
A submission is a bundle (zip or directory snapshot) containing two distinct scripts:
architecture.pyexposesbuild_model(ctx), a factory returning atorch.nn.Module.training.pyexposestrain(ctx), the miner-owned training loop entrypoint.
An optional prism.yaml may declare the entrypoints and chosen tokenizer. A single combined module
no longer satisfies the contract: the architecture and training roles must be two distinct scripts.
PRISM does not execute miner submissions directly in the master process. The worker performs static inspection and the LLM hard gate, then sends the project to an isolated evaluator container through the BASE Docker broker:
PRISM worker -> DockerExecutor -> BASE Docker broker -> GPU evaluator container
The pre-GPU static gates run in this order, and a rejection at any of them is terminal before the LLM review and before any GPU work:
- AST sandbox hard-blocks over both scripts.
- Forced-seed
build_modelinstantiation and the 150M parameter cap. - The multi-GPU static contract and single-node bound.
Legacy local-CPU and remote-Lium execution paths are not part of the supported backend set.
The challenge harness drives every scored run; the miner code only supplies the model and the loop body.
- The harness writes a challenge-owned runner that imports the miner's
architecture.pyandtraining.py, sets the global seeds and deterministic flags (torch.manual_seed,cuda.manual_seed_all,use_deterministic_algorithms(True), cudnn deterministic) before any miner code runs, then launchestorchrun --standalone --nnodes=1 --nproc-per-node=1withMASTER_ADDR=127.0.0.1. - The runner installs an instrumented loss capture. The data iterator yields fresh, single-pass
batches from the read-only locked
trainsplit in a challenge-controlled order, and the challenge records the model's per-batch loss on each new batch before the optimizer updates on it. Because the data is single-pass, this online training loss is the prequential code-length by construction. - The challenge authors
prism_run_manifest.v2.jsonfrom the captured stream. Any manifest the miner writes is discarded; any metric the miner reports is ignored.
The eval container is non-root, runs with a read-only rootfs except artifacts_dir, uses
network=none, and is bounded by a wall-clock budget that is only a safety cap, never part of the
score.
PRISM stores state in SQLite. Important tables include:
minerssubmissionseval_jobsgpu_leasesscoressubmission_sourcesllm_reviewsplagiarism_reviewsepochs
eval_jobs tracks each evaluation attempt (including the level='l1' static-tracking placeholder
created at submission time, which is not GPU work). gpu_leases records the exclusive single-GPU
lease for a scored run. scores holds the challenge-computed prequential bits-per-byte final_score
and its metrics payload.
After the forced-init re-execution completes with a valid challenge-authored
prism_run_manifest.v2.json, scoring computes everything from the challenge-owned capture:
- the prequential bits-per-byte primary score (lower bpb yields a better
final_score); - the held-out delta-over-random-init tie-breaker on the secret
valsplit (a near-tie refinement that never overrides the primary bpb axis); - the train-vs-held-out anti-memorization gap, which penalizes an excessive gap;
- a step-0 / smuggled-weights anomaly multiplier that zeroes an anomalous run.
flowchart LR
Reexec[Online Loss Stream] --> Bpb[Prequential bits-per-byte]
Bpb --> Final[final_score]
Heldout[Held-out Delta on Secret val] --> Final
Gap[Train-vs-Held-out Gap] --> Final
Final --> Board[Leaderboard ORDER BY final_score DESC]
Board --> Weights[Normalized Dry-Run Weights]
The leaderboard orders by final_score with a deterministic earliest-commit-wins tie-break, and
get_weights returns one normalized, dry-run weight per hotkey (best submission per hotkey). Weights
are never written on-chain.
Submissions can end in one of these states:
pendingrunningcompletedfailedrejectedheld
Rejected submissions fail static review, the two-script contract, the LLM hard gate, or duplicate review. Failed submissions passed the gates but failed the re-execution, scoring, or infrastructure. Held submissions are quarantined by the LLM review pending operator attention; the v1-NAS component-attribution holds and ownership-event machinery have been decommissioned.