A full, local-first, multi-agent civic-intelligence project. It fuses City of Toronto open datasets into a knowledge graph; a supervisor agent + specialized sub-agents (on a local Nemotron model, ASUS Ascent GX10 / NVIDIA GB10) produce an actionable, cited risk read for any address or business; and Urban-OS runs an on-device urban-stress simulation that quantifies coordinated interventions across domains. No data leaves the device. It began at the NVIDIA Spark Hack β Toronto (May 2026) and is now developed as an ongoing project β the hackathon framing lives on in the ADRs / pitch / video kit as origin history.
Live instance: https://gx10-4428.taila9fe06.ts.net β served from the GX10 box over Tailscale Funnel (read-only); may be offline outside active windows. To bring it up, see docs/REMOTE_ACCESS.md.
The risk app above is now one lens on a deeper system. Urban-OS is a local
simulation kernel that ingests Toronto open data, runs an urban-dynamics model
entirely on the DGX Spark, and produces a quantified, cited intervention. The
kernel owns a substrate (a road/transit graph) and a time loop; every behaviour is a
plugin using four operators β source (inject forcing), transport (move a quantity
on the graph), couple (fieldβfield), observe (fieldsβmetrics + a cost term J).
An optimizer searches plugin-declared levers to minimize J = Ξ£ wβΒ·Jβ.
Two plugin axes: city adapters turn a city's data into the substrate; domain lenses (event surge, economics, safetyβ¦) are portable across adapters for free. The static risk app becomes the Safety/Public-Services lens running on this kernel β proving the adapterΓlens architecture.
On a peak FIFA World Cup 2026 day, four downtown venues let out into the same corridor at once (BMO Field FIFA + Rogers Centre + Scotiabank Arena + the Fort York Fan Festival β 140,800 people; see ADR-0018):
"Union Station reaches 3.7Γ safe capacity 19 minutes after full-time; a 14-minute staggered release cuts the peak by 67% β a net intervention benefit of ~$218k (cost J $323k β $105k)." β
make urbanos-cli
Add the cross-domain lenses (--safety --business) β ~$281k combined; the live
:8001 /optimize (3-lens, with the weather/shelter lever) lands on a 16-min
release + 80% shelter, Union 4.0Γ β 1.0Γ, ~$458k combined benefit. One
coordinated lever is the operations side of offsetting the Fan Festival's $6.2M
deficit.
Specific station, timing, lever, dollars β emergent from the dynamics, and grounded: the figures are computed deterministically and the local model only phrases them, behind the same hallucination guard as the risk app (any invented number β deterministic fallback). See ADR-0003.
make urbanos-cli # run + optimize the downtown egress scenario, print the cited insight
make urbanos # offline map + heatmap/time-slider at http://localhost:8000/
make urbanos-accel # (optional, on the box) build the Rust core; prints the active backendEndpoints: GET /scenario (substrate) Β· GET /simulate?release_minutes=β¦ (per-step
heatmap frames) Β· GET /optimize (before/after + the cited insight) Β· GET /health.
City of Toronto Open Data (CKAN) βββΊ City adapter (adapters/toronto.py)
TTC GTFS Β· traffic volumes Β· event permits builds the road/transit substrate
β (offline-deterministic synthetic
βΌ downtown; real GTFS on the GX10)
KERNEL (urbanos/kernel/kernel) numpy fields over a networkx graph
ββ source β transport β couple β observe ββ βββ transport runs on a Rust core
β time loop: integrate at NΓ real-time β (drop-in; numpy fallback, ADR-0004)
ββββββββββββββββββββββββββββββββββββββββββββ
β lenses: EventSurge (egress wave) + Economic (risk = Ο^2.5, $ delay)
βΌ
Optimizer (optimize.py) βββΊ J-minimizing intervention (deterministic grid search)
β
βΌ
Narrator (narrate.py, local model + hallucination guard) βββΊ the cited one-liner
β
βΌ
FastAPI + offline MapLibre/PMTiles heatmap + time slider (api.py)
Each accelerator is wired with a CPU fallback (the demo never blocks if a GPU lib
is absent) and opt-in on the box. Install with make gpu-install; prove which
backend actually ran with make gpu-check (prints cugraph / cudf-polars on the
box, CPU fallback elsewhere). Honest scale note: these pay off on full-city data, not
the tiny demo substrate β same as the Rust accelerator (ADR-0009).
- NeMo / Nemotron (local) β the insight narrator and agentic lenses, fully on-device. Wired and live (verified grounded on the box).
nx-cugraph(RAPIDS) β GPU backend for the substrate shortest-paths bake (kernel/state.py), enabled byURBANOS_GPU_GRAPH=1. Falls back to networkx CPU.- cuDF (RAPIDS) via Polars β the civic ingest uses Polars, whose
collect(engine="gpu")runs on cuDF; enabled byURBANOS_GPU_DF=1. Falls back to Polars-CPU, then pandas. Drop-in: identical rows, golden numbers unchanged. - cuOpt (RAPIDS) β solves the optimal evacuation max-flow on the capacitated
substrate (
GET /flow,urbanos/kernel/flow.py): the theoretical ceiling the staggered -release sim approaches. A real LP (cuOpt's wheelhouse) β not the lever search (cuOpt can't evaluate the black-box sim).URBANOS_GPU_FLOW=1; networkx max-flow CPU fallback. Verified on the GB10. - cuML (RAPIDS) β clusters the scored civic addresses into spatial risk hotspots
(
GET /clusters,urbanos/risk/cluster.py) via GPU KMeans.URBANOS_GPU_CLUSTER=1; deterministic numpy KMeans CPU fallback. - TensorRT-LLM β the narrator client is runtime-agnostic (OpenAI-compatible HTTP), so
serving Nemotron behind
trtllm-serveis a config swap:LLM_RUNTIME=tensorrt-llm+ pointLLM_BASE_URLat it.make llm-checkreports which runtime answered. Box-proven on the GB10 (Nemotron-3-Nano NVFP4 served via the NGC TRT-LLM container). Capability, not a speedup: measured single-stream decode is not faster than Ollama (54.5 vs 61.2 tok/s, ADR-0027); a throughput-under-load win is unproven (next-step). Falls back to Ollama / the deterministic narrator. (ADR-0027) - PhysicsNeMo (Modulus) β a neural surrogate of the optimizer objective
J(levers)for city-scale search (urbanos/kernel/surrogate.py,URBANOS_SURROGATE=1). Shipped as an interface only: the exact kernel still decides every result (the surrogate's prediction is recorded alongside, never used to choose); a trained checkpoint is the documented next step. Default off β identical to the grid optimizer. (ADR-0027) - Rust core + 128 GB unified memory β the full graph, live sim state, and the model
coexist; the kernel steps at NΓ real-time (measure with
make urbanos-accel).
A parallel track grounds the kernel in real Toronto TMC 15-min counts β each piece is
opt-in + CPU-fallback, so the headline numbers above never move unless you explicitly turn
it on (see docs/research/tpf-and-data-driven-lenses.md):
- CongestionNowcast (Phase 1, advisory) β scores the kernel's crowd profile against the
observed counts (scale-free shape agreement); no lever, no cost. Reported in
/lenses. - Learned-dynamics floor (Phase 2, ADR-0028, advisory) β fits a velocity field from the
observed marginals and reports whether it beats the exact kernel at matching ground truth.
URBANOS_LEARNED_DYNAMICS=1; surfaced in the UI labelled learned/approximate, never a headline number. (Phase 3 / TPF is a documented NO-GO β the win is purely gradient, not rotational.) - TransitLoad (Fit C, ADR-0029) β injects the measured throughput as a real
source()(honest background ridership on top of the event egress).URBANOS_TRANSIT_LOAD=1/--transit-load; off by default, no lever and no J term, so the exact kernel still prices every person and the golden numbers are unchanged.
Design decisions are recorded in docs/adr/.
- Track: Public Services (frames cleanly as Economic Systems for the investor pitch).
- Winning pattern (mirrors the NYC Spark Hack overall winner): multi-dataset knowledge graph + multi-agent + an actionable output + an obvious commercial buyer (city inspections, insurers, lenders, commercial real estate) + a 100% on-device story.
- Hardware reality: GB10 has 128 GB unified memory but ~273 GB/s bandwidth β use MoE / small-active models (Nemotron Nano, or gpt-oss-120B MoE) for a snappy live demo. Dense 70B+ decode is too slow (~2.7 tok/s).
City of Toronto Open Data (CKAN)
building permits Β· DineSafe inspections Β· 311 requests Β· business licences
β ingest/ckan.py + ingest/datasets.py
βΌ
Knowledge graph (graph/builder.py, networkx)
β
βΌ
Supervisor agent βββΊ sub-agents (retrieval Β· compliance Β· risk)
agents/supervisor.py agents/subagents.py
β local LLM via OpenAI-compatible endpoint (agents/llm.py)
βΌ
FastAPI /analyze?address=... (api/server.py) + CLI (cli.py)
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
make install-hooks # enable the pre-push test gate (run once per clone)
cp .env.example .env # point LLM_BASE_URL at your local Ollama/NIM endpoint
python scripts/download_data.py # pre-fetch datasets (do this BEFORE the venue!)
make serve # FastAPI on :8000 β GET /analyze?address=...
# or
python -m urbanos.risk.cli analyze "100 Queen St W"make demo serves the offline map against committed slices of real downtown
Toronto data (demo_data/) β DineSafe inspections, business licences, and
active building permits that share addresses, so ~12 establishments link all three
sources on one pin (real cross-dataset fusion, not staged; permits + infractions both
feed the risk score). Filtered to the basemap's bbox so every pin lands on the map.
make demo-cli runs a deterministic check on synthetic fixtures.
make demo # offline map + real downtown establishments at http://localhost:8000/
make demo-cli # deterministic report on synthetic fixtures (100 Queen St W β 1.0)
make demo-data # rebuild the real slice from the live datasetThen open http://localhost:8000/ β a fully offline map (MapLibre GL rendering
a self-hosted PMTiles vector basemap of downtown Toronto, static/toronto.pmtiles)
with pins colored by risk (red = high). No tile servers, no CDN β demo-proof against
flaky venue WiFi. Click a pin to run the agentic read. Endpoints:
GET /map UIGET /addressesgeocoded addresses + fast risk score (no LLM) β drives the pinsGET /analyze?address=β¦full agentic read (Nemotron Nano, interactive tier)GET /digestcity-wide briefing (gpt-oss-120B MoE, batch tier)
The basemap is committed (static/toronto.pmtiles, ~6 MB). To refresh or widen it:
scripts/build_tiles.sh # needs the `pmtiles` CLI; pulls only the bbox via range requestsThe risk score and findings are computed without an LLM. The model only proposes
per-claim output β each claim is a JSON object tied to a source-record tag (E1, E2, β¦).
Every claim is then verified (agents/verify.py): each cited tag must be a real
evidence record, and every number must trace to the actual findings. Any claim that
invents a number or a source ID is rejected and we fall back to deterministic,
correct-by-construction claims β so a hallucinated figure or fabricated source can
never reach the user. The map panel renders each claim with a β verify link that
reveals the exact source record behind it (click-to-verify). Caught in testing: the model
once claimed "9 permits" when the data showed 8 β rejected. Maps to the Prime Intellect
"Verifiers" bounty.
LLM_MODEL (Nemotron Nano) handles snappy interactive /analyze; LLM_BATCH_MODEL
(gpt-oss-120B MoE) handles the heavier /digest. Both are MoE / small-active so they
decode acceptably within the GB10's ~273 GB/s bandwidth.
The datasets and risk engine are exposed as MCP tools (list_datasets,
dataset_resources, analyze_address, top_risk) so a local agent runtime can
call them β the pattern the NYC winner used:
python -m urbanos.risk.mcp_server # stdio MCP serverOn the GX10, point NemoClaw (running Nemotron locally via OpenShell) at
config/nemoclaw.mcp.json so the agent answers civic-risk questions through our
tools β the "Best Use of Nemotron/NemoClaw" integration.
Messy address matching is our hard problem. scripts/finetune_address_resolution.py
trains a Nemotron-Nano QLoRA adapter (Unsloth/TRL) on fixtures/address_resolution.sample.jsonl,
served back via vllm serve --enable-lora. An optional research track.
The GX10 ships with Ollama + DGX OS (ARM64). Pull a small-active model and serve its OpenAI-compatible endpoint:
ollama pull nemotron-3-nano # or: gpt-oss:120b (MoE, ~35-40 tok/s)
# Ollama exposes http://localhost:11434/v1 -> set LLM_BASE_URL accordingly- Cache real data slices locally:
scripts/download_data.py+make demo-data(discover datasets withscripts/catalog.py search "<query>"). - Build images ARM64-native (
docker buildon the GX10 or--platform linux/arm64). -
make testgreen before deploy; bring the box up per docs/ON_THE_BOX.md.
See docs/ for the architecture (ADRs) and the origin pitch. MIT-style use.