Skip to content

k2jac9/urbanos

Repository files navigation

UrbanOS β€” Turning urban data into real-time insight through AI

CI

A full, local-first, multi-agent civic-intelligence project. It fuses City of Toronto open datasets into a knowledge graph; a supervisor agent + specialized sub-agents (on a local Nemotron model, ASUS Ascent GX10 / NVIDIA GB10) produce an actionable, cited risk read for any address or business; and Urban-OS runs an on-device urban-stress simulation that quantifies coordinated interventions across domains. No data leaves the device. It began at the NVIDIA Spark Hack β€” Toronto (May 2026) and is now developed as an ongoing project β€” the hackathon framing lives on in the ADRs / pitch / video kit as origin history.

Live instance: https://gx10-4428.taila9fe06.ts.net β€” served from the GX10 box over Tailscale Funnel (read-only); may be offline outside active windows. To bring it up, see docs/REMOTE_ACCESS.md.


🟒 Urban-OS β€” on-device urban-stress OS (flagship: Urban Operations)

The risk app above is now one lens on a deeper system. Urban-OS is a local simulation kernel that ingests Toronto open data, runs an urban-dynamics model entirely on the DGX Spark, and produces a quantified, cited intervention. The kernel owns a substrate (a road/transit graph) and a time loop; every behaviour is a plugin using four operators β€” source (inject forcing), transport (move a quantity on the graph), couple (fieldβ†’field), observe (fieldsβ†’metrics + a cost term J). An optimizer searches plugin-declared levers to minimize J = Ξ£ wβ‚šΒ·Jβ‚š.

Two plugin axes: city adapters turn a city's data into the substrate; domain lenses (event surge, economics, safety…) are portable across adapters for free. The static risk app becomes the Safety/Public-Services lens running on this kernel β€” proving the adapterΓ—lens architecture.

The killer insight (live, from the model)

On a peak FIFA World Cup 2026 day, four downtown venues let out into the same corridor at once (BMO Field FIFA + Rogers Centre + Scotiabank Arena + the Fort York Fan Festival β€” 140,800 people; see ADR-0018):

"Union Station reaches 3.7Γ— safe capacity 19 minutes after full-time; a 14-minute staggered release cuts the peak by 67% β€” a net intervention benefit of ~$218k (cost J $323k β†’ $105k)." β€” make urbanos-cli

Add the cross-domain lenses (--safety --business) β†’ ~$281k combined; the live :8001 /optimize (3-lens, with the weather/shelter lever) lands on a 16-min release + 80% shelter, Union 4.0Γ— β†’ 1.0Γ—, ~$458k combined benefit. One coordinated lever is the operations side of offsetting the Fan Festival's $6.2M deficit.

Specific station, timing, lever, dollars β€” emergent from the dynamics, and grounded: the figures are computed deterministically and the local model only phrases them, behind the same hallucination guard as the risk app (any invented number β†’ deterministic fallback). See ADR-0003.

Run it

make urbanos-cli     # run + optimize the downtown egress scenario, print the cited insight
make urbanos         # offline map + heatmap/time-slider at http://localhost:8000/
make urbanos-accel   # (optional, on the box) build the Rust core; prints the active backend

Endpoints: GET /scenario (substrate) Β· GET /simulate?release_minutes=… (per-step heatmap frames) Β· GET /optimize (before/after + the cited insight) Β· GET /health.

Architecture

City of Toronto Open Data (CKAN)  ──►  City adapter (adapters/toronto.py)
  TTC GTFS Β· traffic volumes Β· event permits        builds the road/transit substrate
        β”‚                                            (offline-deterministic synthetic
        β–Ό                                             downtown; real GTFS on the GX10)
   KERNEL  (urbanos/kernel/kernel)                          numpy fields over a networkx graph
   β”Œβ”€ source ─ transport ─ couple ─ observe ─┐  ◄── transport runs on a Rust core
   β”‚   time loop: integrate at NΓ— real-time   β”‚       (drop-in; numpy fallback, ADR-0004)
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚  lenses: EventSurge (egress wave) + Economic (risk = ρ^2.5, $ delay)
        β–Ό
   Optimizer (optimize.py)  ──►  J-minimizing intervention  (deterministic grid search)
        β”‚
        β–Ό
   Narrator (narrate.py, local model + hallucination guard)  ──►  the cited one-liner
        β”‚
        β–Ό
   FastAPI + offline MapLibre/PMTiles heatmap + time slider (api.py)

NVIDIA stack (on the GX10)

Each accelerator is wired with a CPU fallback (the demo never blocks if a GPU lib is absent) and opt-in on the box. Install with make gpu-install; prove which backend actually ran with make gpu-check (prints cugraph / cudf-polars on the box, CPU fallback elsewhere). Honest scale note: these pay off on full-city data, not the tiny demo substrate β€” same as the Rust accelerator (ADR-0009).

  • NeMo / Nemotron (local) β€” the insight narrator and agentic lenses, fully on-device. Wired and live (verified grounded on the box).
  • nx-cugraph (RAPIDS) β€” GPU backend for the substrate shortest-paths bake (kernel/state.py), enabled by URBANOS_GPU_GRAPH=1. Falls back to networkx CPU.
  • cuDF (RAPIDS) via Polars β€” the civic ingest uses Polars, whose collect(engine="gpu") runs on cuDF; enabled by URBANOS_GPU_DF=1. Falls back to Polars-CPU, then pandas. Drop-in: identical rows, golden numbers unchanged.
  • cuOpt (RAPIDS) β€” solves the optimal evacuation max-flow on the capacitated substrate (GET /flow, urbanos/kernel/flow.py): the theoretical ceiling the staggered -release sim approaches. A real LP (cuOpt's wheelhouse) β€” not the lever search (cuOpt can't evaluate the black-box sim). URBANOS_GPU_FLOW=1; networkx max-flow CPU fallback. Verified on the GB10.
  • cuML (RAPIDS) β€” clusters the scored civic addresses into spatial risk hotspots (GET /clusters, urbanos/risk/cluster.py) via GPU KMeans. URBANOS_GPU_CLUSTER=1; deterministic numpy KMeans CPU fallback.
  • TensorRT-LLM β€” the narrator client is runtime-agnostic (OpenAI-compatible HTTP), so serving Nemotron behind trtllm-serve is a config swap: LLM_RUNTIME=tensorrt-llm + point LLM_BASE_URL at it. make llm-check reports which runtime answered. Box-proven on the GB10 (Nemotron-3-Nano NVFP4 served via the NGC TRT-LLM container). Capability, not a speedup: measured single-stream decode is not faster than Ollama (54.5 vs 61.2 tok/s, ADR-0027); a throughput-under-load win is unproven (next-step). Falls back to Ollama / the deterministic narrator. (ADR-0027)
  • PhysicsNeMo (Modulus) β€” a neural surrogate of the optimizer objective J(levers) for city-scale search (urbanos/kernel/surrogate.py, URBANOS_SURROGATE=1). Shipped as an interface only: the exact kernel still decides every result (the surrogate's prediction is recorded alongside, never used to choose); a trained checkpoint is the documented next step. Default off β†’ identical to the grid optimizer. (ADR-0027)
  • Rust core + 128 GB unified memory β€” the full graph, live sim state, and the model coexist; the kernel steps at NΓ— real-time (measure with make urbanos-accel).

Data-driven lenses (real Toronto counts, advisory)

A parallel track grounds the kernel in real Toronto TMC 15-min counts β€” each piece is opt-in + CPU-fallback, so the headline numbers above never move unless you explicitly turn it on (see docs/research/tpf-and-data-driven-lenses.md):

  • CongestionNowcast (Phase 1, advisory) β€” scores the kernel's crowd profile against the observed counts (scale-free shape agreement); no lever, no cost. Reported in /lenses.
  • Learned-dynamics floor (Phase 2, ADR-0028, advisory) β€” fits a velocity field from the observed marginals and reports whether it beats the exact kernel at matching ground truth. URBANOS_LEARNED_DYNAMICS=1; surfaced in the UI labelled learned/approximate, never a headline number. (Phase 3 / TPF is a documented NO-GO β€” the win is purely gradient, not rotational.)
  • TransitLoad (Fit C, ADR-0029) β€” injects the measured throughput as a real source() (honest background ridership on top of the event egress). URBANOS_TRANSIT_LOAD=1 / --transit-load; off by default, no lever and no J term, so the exact kernel still prices every person and the golden numbers are unchanged.

Design decisions are recorded in docs/adr/.


Why this shape

  • Track: Public Services (frames cleanly as Economic Systems for the investor pitch).
  • Winning pattern (mirrors the NYC Spark Hack overall winner): multi-dataset knowledge graph + multi-agent + an actionable output + an obvious commercial buyer (city inspections, insurers, lenders, commercial real estate) + a 100% on-device story.
  • Hardware reality: GB10 has 128 GB unified memory but ~273 GB/s bandwidth β€” use MoE / small-active models (Nemotron Nano, or gpt-oss-120B MoE) for a snappy live demo. Dense 70B+ decode is too slow (~2.7 tok/s).

Architecture

City of Toronto Open Data (CKAN)
  building permits Β· DineSafe inspections Β· 311 requests Β· business licences
        β”‚  ingest/ckan.py + ingest/datasets.py
        β–Ό
   Knowledge graph (graph/builder.py, networkx)
        β”‚
        β–Ό
   Supervisor agent  ──►  sub-agents (retrieval Β· compliance Β· risk)
   agents/supervisor.py     agents/subagents.py
        β”‚  local LLM via OpenAI-compatible endpoint (agents/llm.py)
        β–Ό
   FastAPI  /analyze?address=...   (api/server.py)  +  CLI (cli.py)

Quickstart

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
make install-hooks              # enable the pre-push test gate (run once per clone)
cp .env.example .env            # point LLM_BASE_URL at your local Ollama/NIM endpoint

python scripts/download_data.py # pre-fetch datasets (do this BEFORE the venue!)
make serve                      # FastAPI on :8000  β†’ GET /analyze?address=...
# or
python -m urbanos.risk.cli analyze "100 Queen St W"

Instant demo

make demo serves the offline map against committed slices of real downtown Toronto data (demo_data/) β€” DineSafe inspections, business licences, and active building permits that share addresses, so ~12 establishments link all three sources on one pin (real cross-dataset fusion, not staged; permits + infractions both feed the risk score). Filtered to the basemap's bbox so every pin lands on the map. make demo-cli runs a deterministic check on synthetic fixtures.

make demo        # offline map + real downtown establishments at http://localhost:8000/
make demo-cli    # deterministic report on synthetic fixtures (100 Queen St W β†’ 1.0)
make demo-data   # rebuild the real slice from the live dataset

Then open http://localhost:8000/ β€” a fully offline map (MapLibre GL rendering a self-hosted PMTiles vector basemap of downtown Toronto, static/toronto.pmtiles) with pins colored by risk (red = high). No tile servers, no CDN β€” demo-proof against flaky venue WiFi. Click a pin to run the agentic read. Endpoints:

  • GET / map UI
  • GET /addresses geocoded addresses + fast risk score (no LLM) β€” drives the pins
  • GET /analyze?address=… full agentic read (Nemotron Nano, interactive tier)
  • GET /digest city-wide briefing (gpt-oss-120B MoE, batch tier)

The basemap is committed (static/toronto.pmtiles, ~6 MB). To refresh or widen it:

scripts/build_tiles.sh    # needs the `pmtiles` CLI; pulls only the bbox via range requests

Hallucination resistance

The risk score and findings are computed without an LLM. The model only proposes per-claim output β€” each claim is a JSON object tied to a source-record tag (E1, E2, …). Every claim is then verified (agents/verify.py): each cited tag must be a real evidence record, and every number must trace to the actual findings. Any claim that invents a number or a source ID is rejected and we fall back to deterministic, correct-by-construction claims β€” so a hallucinated figure or fabricated source can never reach the user. The map panel renders each claim with a βœ“ verify link that reveals the exact source record behind it (click-to-verify). Caught in testing: the model once claimed "9 permits" when the data showed 8 β†’ rejected. Maps to the Prime Intellect "Verifiers" bounty.

Two model tiers

LLM_MODEL (Nemotron Nano) handles snappy interactive /analyze; LLM_BATCH_MODEL (gpt-oss-120B MoE) handles the heavier /digest. Both are MoE / small-active so they decode acceptably within the GB10's ~273 GB/s bandwidth.

Agentic tools over MCP (NemoClaw / OpenClaw)

The datasets and risk engine are exposed as MCP tools (list_datasets, dataset_resources, analyze_address, top_risk) so a local agent runtime can call them β€” the pattern the NYC winner used:

python -m urbanos.risk.mcp_server      # stdio MCP server

On the GX10, point NemoClaw (running Nemotron locally via OpenShell) at config/nemoclaw.mcp.json so the agent answers civic-risk questions through our tools β€” the "Best Use of Nemotron/NemoClaw" integration.

Roadmap: QLoRA fine-tune (GX10 GPU)

Messy address matching is our hard problem. scripts/finetune_address_resolution.py trains a Nemotron-Nano QLoRA adapter (Unsloth/TRL) on fixtures/address_resolution.sample.jsonl, served back via vllm serve --enable-lora. An optional research track.

Local model (on the GX10)

The GX10 ships with Ollama + DGX OS (ARM64). Pull a small-active model and serve its OpenAI-compatible endpoint:

ollama pull nemotron-3-nano        # or: gpt-oss:120b  (MoE, ~35-40 tok/s)
# Ollama exposes http://localhost:11434/v1  -> set LLM_BASE_URL accordingly

Operations checklist

  • Cache real data slices locally: scripts/download_data.py + make demo-data (discover datasets with scripts/catalog.py search "<query>").
  • Build images ARM64-native (docker build on the GX10 or --platform linux/arm64).
  • make test green before deploy; bring the box up per docs/ON_THE_BOX.md.

See docs/ for the architecture (ADRs) and the origin pitch. MIT-style use.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors