This document is the executable contract for the feature that justifies this project. The whole reason `opencost-ai` exists, rather than "point Claude at your OpenCost data," is that regulated and sovereign environments cannot ship cluster cost data across a corporate perimeter. Everything here assumes a cluster with zero egress to the public internet.
This flow takes a freshly-built cluster with no outbound internet
access, a registry reachable only from inside the perimeter, and the
four artefacts listed below, and produces a working opencost-ai
install:
- The gateway image (`ghcr.io/opencost/opencost-ai-gateway`), built from this repo.
- The bridge image (`ghcr.io/jonigl/ollama-mcp-bridge`), upstream.
- The Ollama runtime image (`ollama/ollama`), upstream.
- The model weights (`granite4.1:8b` by default), packaged as a GGUF file wrapped in an OCI artefact.
The output is a helm install against the chart in
deploy/helm/opencost-ai/ with every image.repository value
overridden to the internal registry, and the Ollama PVC pre-populated
with the default model.
+---------------------+     +-----------------------+     +-----------------+
| Connected host      | --> | Internal OCI registry | --> | Air-gap cluster |
| (staging machine)   |     | (inside perimeter)    |     | (no egress)     |
+---------------------+     +-----------------------+     +-----------------+
          |                             ^                          |
          | ORAS push (GGUF)            |                          |
          | crane push (images)         |                          |
          +-----------------------------+                          |
                                                                   |
                   helm install --set image.repository=... --------+
Three rules the design enforces:
- No runtime `ollama pull`. The Ollama NetworkPolicy whitelists only kube-dns. If a manifest reaches the cluster and tries to pull a model at runtime, it will fail closed. Weights must be on the PVC before the Ollama pod starts.
- No registry indirection at runtime. Helm `image.repository` is overridden at install time; pods never reference the original `ghcr.io` or `docker.io` paths.
- Model distribution uses the same registry auth as images. Model weights ride the OCI protocol via ORAS so operators reuse the registry mirror / cosign / pull-secret infrastructure they already operate.
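The second rule can be checked before anything reaches the cluster. The sketch below is one possible guard and is not part of this repo: it reads rendered manifests on stdin and prints every `image:` reference that does not start with the internal registry host. The function name `find_foreign_images` is hypothetical, and the line-based match assumes images appear as plain `image: <ref>` scalars in the rendered output.

```shell
# Print image references that do NOT live under the registry host given as $1.
# Reads rendered Kubernetes manifests on stdin; prints nothing when all is well.
find_foreign_images() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' |
    tr -d "\"'" |
    while IFS= read -r ref; do
      case $ref in
        "" | "$1"/*) ;;              # empty value or internal registry: accept
        *) printf '%s\n' "$ref" ;;   # anything else would need a foreign pull
      esac
    done
}
```

A possible use is `helm template opencost-ai ./deploy/helm/opencost-ai --values values-airgap.yaml | find_foreign_images registry.internal.example`, treating any output as a failure.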
Required tooling:
- `ollama` ≥ 0.30.8 (Granite 4.x needs a recent hybrid-model-capable runtime)
- `oras` ≥ 1.2
- `crane` (go-containerregistry) for lossless image copy
- `cosign` ≥ 2.4 (optional, recommended for verification pre-push)
- `helm` ≥ 3.16 and `kubectl` ≥ 1.31
- ~20 GB free disk (weights + image layers)
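A preflight check for the tooling can be sketched as follows. Both helpers are hypothetical and not shipped with this repo. `version_ge` only understands dotted numeric versions (an optional leading `v` is dropped), so pre-release suffixes such as `-rc1` are out of scope. Extracting the installed version from each tool is deliberately left out, because every CLI prints it differently.

```shell
# Fail after naming every tool that is not on PATH.
require_tools() {
  _missing=0
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || {
      echo "missing tool: $_tool" >&2
      _missing=1
    }
  done
  return "$_missing"
}

# Succeed when dotted numeric version $1 is >= $2.
version_ge() {
  _have=${1#v}
  _want=${2#v}
  while [ -n "$_have" ] || [ -n "$_want" ]; do
    _h=${_have%%.*}
    _w=${_want%%.*}
    # Drop the field just read; an exhausted version counts as 0 from here on.
    case $_have in *.*) _have=${_have#*.} ;; *) _have= ;; esac
    case $_want in *.*) _want=${_want#*.} ;; *) _want= ;; esac
    [ "${_h:-0}" -gt "${_w:-0}" ] && return 0
    [ "${_h:-0}" -lt "${_w:-0}" ] && return 1
  done
  return 0
}
```

For example, `require_tools ollama oras crane helm kubectl` followed by `version_ge "$installed" 0.30.8`, where `$installed` is whatever version string the operator extracted.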
Cluster-side requirements:
- Kubernetes ≥ 1.25 with a CNI that enforces NetworkPolicy (Calico, Cilium, Antrea). Flannel on its own accepts NetworkPolicy objects but does not enforce them, so the air-gap posture degrades silently.
- A StorageClass backing the Ollama PVC (ReadWriteOnce is enough for the default single-replica StatefulSet).
- An internal OCI registry reachable from every node. Any path-addressable registry works (e.g. Harbor, Zot, ECR / ACR / GAR in-VPC, or a plain `registry:2` behind mTLS); what matters is that `containerd` can pull from it and `oras pull` works against it with the same credentials.
- OpenCost ≥ v1.118 already installed (this chart is a consumer of its built-in MCP server on port 8081).
The goal is to produce two artefacts:
- A GGUF file on disk.
- An OCI artefact in the internal registry that wraps that GGUF.
ollama pull downloads a manifest-plus-blobs tree into ~/.ollama.
We re-use the existing blob layout: Ollama's model storage is already
a content-addressable store, so the GGUF is a single file we can point
ORAS at.
# 1. Pull on a machine with internet access.
ollama pull granite4.1:8b
# 2. Locate the GGUF blob that backs the manifest. Ollama stores blobs
# under $HOME/.ollama/models/blobs/ as sha256-<hex> filenames on
# disk (the colon from the digest is replaced with a dash); the
# manifest under $HOME/.ollama/models/manifests/ references the
# layer whose mediaType is application/vnd.ollama.image.model.
./scripts/air-gap/export-gguf.sh granite4.1:8b ./stage/granite4.1-8b.gguf
# 3. Push the GGUF to the internal registry as an OCI artefact. The
# reference uses an `ollama-model/` path prefix by convention so
# registry admins can scope policies separately from container
# images.
./scripts/air-gap/oras-push-model.sh \
registry.internal.example/ollama-model/granite4.1-8b:latest \
./stage/granite4.1-8b.gguf

scripts/air-gap/oras-push-model.sh tags the artefact with the
application/vnd.ollama.image.model media type — the same type Ollama
itself uses, so the artefact round-trips through Ollama tooling later.
It also attaches the source Modelfile as a separate layer so the
cluster side has everything needed to run ollama create without
inventing parameters.
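To make the blob lookup in step 2 concrete, here is a minimal sketch of how the GGUF could be located by hand. It is not necessarily how `scripts/air-gap/export-gguf.sh` is written. Both function names are hypothetical, `jq` is assumed to be installed, and the manifest path is passed in by the caller rather than guessed. `OLLAMA_MODELS` is honoured when set; otherwise the `$HOME/.ollama/models` location described above is used.

```shell
# Map a digest such as sha256:<hex> to Ollama's on-disk blob path.
# On disk the colon is replaced with a dash: sha256-<hex>.
blob_path_for_digest() {
  printf '%s/blobs/%s\n' "${OLLAMA_MODELS:-$HOME/.ollama/models}" \
    "$(printf '%s' "$1" | tr ':' '-')"
}

# Print the digest of the GGUF layer recorded in the manifest file $1.
# Assumption: jq is available on the staging host.
model_layer_digest() {
  jq -r '.layers[]
         | select(.mediaType == "application/vnd.ollama.image.model")
         | .digest' "$1"
}
```

With a manifest path in `$manifest`, the GGUF could then be staged with `cp "$(blob_path_for_digest "$(model_layer_digest "$manifest")")" ./stage/granite4.1-8b.gguf`.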
The artefact layout on the registry:
manifest (application/vnd.oci.image.manifest.v1+json)
├── config: application/vnd.ollama.image.config+json
├── layer 0: application/vnd.ollama.image.model (GGUF bytes)
└── layer 1: application/vnd.ollama.image.modelfile (Modelfile)
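A push that yields this layout could look like the sketch below. It is an illustration under stated assumptions, not the contents of `scripts/air-gap/oras-push-model.sh`: the function name `push_model` and the separate config file argument are hypothetical. The media types are the ones listed above, and the file arguments are given in layer order.

```shell
ORAS=${ORAS:-oras}

# push_model REF GGUF MODELFILE CONFIG
# Each file is passed as <path>:<media type>; layers follow argument order,
# so the GGUF becomes layer 0 and the Modelfile layer 1.
push_model() {
  "$ORAS" push "$1" \
    --config "$4:application/vnd.ollama.image.config+json" \
    "$2:application/vnd.ollama.image.model" \
    "$3:application/vnd.ollama.image.modelfile"
}
```

Run it from the directory that holds the files and pass relative paths, since `oras` records each path as the file title that `oras pull` later restores.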
Three container images, one command each. crane copy preserves the
manifest digest, which matters when you later pin
image.digest in values.yaml (recommended for reproducible
installs).
SRC_GATEWAY=ghcr.io/opencost/opencost-ai-gateway:v0.1.0
SRC_BRIDGE=ghcr.io/jonigl/ollama-mcp-bridge:v0.2.0
SRC_OLLAMA=ollama/ollama:0.30.8
DST=registry.internal.example/opencost-ai
./scripts/air-gap/mirror-images.sh \
  "${SRC_GATEWAY}=${DST}/opencost-ai-gateway:v0.1.0" \
  "${SRC_BRIDGE}=${DST}/ollama-mcp-bridge:v0.2.0" \
  "${SRC_OLLAMA}=${DST}/ollama:0.30.8"

The script uses crane copy and prints the destination digest for
each image. Record those digests — they feed the image.digest
values below. Digest pinning is the belt-and-braces answer to "how do
I know the registry admin did not silently retag?".
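The copy-then-record-digest step can be sketched with plain `crane` calls. This is a hedged illustration of the idea, not the body of `mirror-images.sh`: `mirror_pair` and `is_sha256_digest` are hypothetical names, and the `SRC=DST` argument shape simply mirrors the invocation above.

```shell
CRANE=${CRANE:-crane}

# Succeed only for a well-formed digest: "sha256:" plus 64 lowercase hex chars.
is_sha256_digest() {
  case $1 in
    sha256:*) _hex=${1#sha256:} ;;
    *) return 1 ;;
  esac
  [ "${#_hex}" -eq 64 ] || return 1
  case $_hex in *[!0-9a-f]*) return 1 ;; esac
  return 0
}

# Copy one "SRC=DST" pair, then print "DST DIGEST" for values.yaml.
mirror_pair() {
  _src=${1%%=*}   # text before the first "="
  _dst=${1#*=}    # text after the first "="
  "$CRANE" copy "$_src" "$_dst" >&2 || return 1
  _digest=$("$CRANE" digest "$_dst") || return 1
  is_sha256_digest "$_digest" || {
    echo "unexpected digest for $_dst: $_digest" >&2
    return 1
  }
  printf '%s %s\n' "$_dst" "$_digest"
}
```

The same `is_sha256_digest` check can reject a values file that still carries the `sha256:...` placeholder before `helm install` runs.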
If the staging host cannot reach the internal registry directly (i.e.
staging is doubly-isolated), replace the single-step copy with a
two-step crane pull → sneakernet tarball → crane push flow. The
script supports this via the OCI_LAYOUT and AIRGAP_LAYOUT_MODE
environment variables (pull on the connected side, push on the
disconnected side); set CRANE_INSECURE=1 if the destination
registry serves plain HTTP.
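For orientation, the two halves of the sneakernet flow map onto raw `crane` commands roughly as below. This sketch does not reproduce how the script interprets `OCI_LAYOUT` or `AIRGAP_LAYOUT_MODE`; the function names are hypothetical, and reusing `CRANE_INSECURE` here only mimics the script's switch. It uses an OCI layout directory rather than a tarball as the transfer format.

```shell
CRANE=${CRANE:-crane}

# Connected side: export_image SRC_REF LAYOUT_DIR
# Saves the image as an OCI image layout that can be carried across the gap.
export_image() {
  "$CRANE" pull --format=oci "$1" "$2"
}

# Disconnected side: import_image LAYOUT_DIR DST_REF
# Pushes the carried layout into the internal registry.
import_image() {
  if [ "${CRANE_INSECURE:-0}" = 1 ]; then
    "$CRANE" push --insecure "$1" "$2"   # plain-HTTP registry
  else
    "$CRANE" push "$1" "$2"
  fi
}
```

After the push, `crane digest` against the destination can be compared with the digest recorded on the connected side to confirm nothing changed in transit.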
The chart in this repo does not (yet) ship a model-bootstrap Job — that work is tracked separately and must land before this section gains an executable example. Until then, pre-populate the PVC by hand. The flow below is the supported path for v0.1.
Sketch only — no template renders these values today, do not paste
this into a real values.yaml:
# Future (NOT shipped yet; pseudo-config):
# ollama:
#   modelBootstrap:
#     enabled: true
#     ociRef: registry.internal.example/ollama-model/granite4.1-8b:latest
#     modelName: granite4.1:8b

When this ships, the Job will borrow the Ollama image (already has
the ollama binary) plus a tiny oras sidecar; both images must
be mirrored per Step 2. Until then, use the manual approach below.
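The manual load boils down to three steps: pull the artefact, settle on a Modelfile, and register the model. The sketch below shows that shape under stated assumptions and is not the implementation of `scripts/air-gap/oras-pull-model.sh`. `resolve_modelfile` and `load_model` are hypothetical names, the artefact is assumed to contain a `*.gguf` file, and `ollama create` is assumed to reach a running Ollama server.

```shell
# resolve_modelfile DIR GGUF_PATH
# Print the Modelfile to use; write a one-line Modelfile if none was shipped.
resolve_modelfile() {
  for _candidate in "$1"/*.Modelfile "$1"/Modelfile; do
    if [ -f "$_candidate" ]; then
      printf '%s\n' "$_candidate"
      return 0
    fi
  done
  printf 'FROM %s\n' "$2" > "$1/Modelfile"
  printf '%s\n' "$1/Modelfile"
}

# load_model OCI_REF MODEL_NAME SCRATCH_DIR
load_model() {
  oras pull "$1" -o "$3" || return 1
  _gguf=
  for _f in "$3"/*.gguf; do
    [ -f "$_f" ] && { _gguf=$_f; break; }
  done
  [ -n "$_gguf" ] || { echo "no GGUF found in $1" >&2; return 1; }
  # ollama create talks to a running Ollama server.
  ollama create "$2" -f "$(resolve_modelfile "$3" "$_gguf")"
}
```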
For shops that provision volumes out of band (NetApp / PowerScale / pre-baked CSI snapshots), skip the Job entirely:
# On a helper pod with the Ollama PVC mounted at /var/lib/ollama and
# the `ollama` binary in PATH (any image based on ollama/ollama works).
# The helper script locates the actual `*.Modelfile` filename inside
# the pulled artefact (the push side preserves the operator-supplied
# basename) and synthesises one if the artefact has none, so this
# does not break when the source filename was, say,
# `granite4.1-8b.Modelfile`.
scripts/air-gap/oras-pull-model.sh \
registry.internal.example/ollama-model/granite4.1-8b:latest \
granite4.1:8b
# Ollama writes the registered model into $HOME/.ollama; the StatefulSet
# relocates HOME to /var/lib/ollama (see
# deploy/helm/opencost-ai/templates/ollama-statefulset.yaml).

Either way, verify the model is on the PVC before installing the chart:
kubectl -n opencost-ai exec pod/ollama-model-loader -- \
  ollama list | grep granite4.1:8b

Create a `values-airgap.yaml` in your GitOps repo (not this repo, because air-gap coordinates are site-specific):
# values-airgap.yaml
imagePullSecrets:
  - name: internal-registry
gateway:
  image:
    repository: registry.internal.example/opencost-ai/opencost-ai-gateway
    tag: v0.1.0
    digest: sha256:...  # paste from Step 2 output
    pullPolicy: IfNotPresent
bridge:
  image:
    repository: registry.internal.example/opencost-ai/ollama-mcp-bridge
    tag: v0.2.0
    digest: sha256:...
ollama:
  image:
    repository: registry.internal.example/opencost-ai/ollama
    tag: "0.30.8"
    digest: sha256:...
  persistence:
    storageClassName: "internal-block"  # explicit, not default
networkPolicy:
  # The default policies already block internet egress. Nothing to do
  # here in an air-gap install; this section documents the invariant.
  enabled: true

Install:
kubectl create namespace opencost-ai
kubectl label namespace opencost-ai \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/audit=restricted \
pod-security.kubernetes.io/warn=restricted
kubectl -n opencost-ai create secret generic opencost-ai-auth \
--from-literal=token="$(openssl rand -hex 32)"
helm install opencost-ai ./deploy/helm/opencost-ai \
--namespace opencost-ai \
--values values-airgap.yaml \
--set gateway.auth.existingSecret=opencost-ai-auth \
  --wait --timeout 5m

From a pod inside the cluster (not from a port-forward, since we want to exercise the NetworkPolicy):
kubectl -n opencost-ai run curl --rm -it --restart=Never \
--image=registry.internal.example/opencost-ai/curl:8.10 -- \
sh -c '
TOKEN=$(cat /var/run/secrets/opencost-ai/token)
curl -fsS -H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d "{\"query\":\"what did i spend yesterday?\"}" \
http://opencost-ai-gateway.opencost-ai.svc:8080/v1/ask
  '

A successful response proves:
- The gateway image pulled from the internal registry.
- The bridge reached Ollama on the private `ClusterIP`.
- Ollama found the model on the PVC (no pull attempt, because the NetworkPolicy would have blocked one).
- The bridge reached the OpenCost MCP server on :8081.
Confirm nothing in the namespace can reach the internet. This is the single most important assertion — if it fails, the install is a security regression regardless of whether the happy path works:
kubectl -n opencost-ai debug -it \
deploy/opencost-ai-gateway \
--image=registry.internal.example/opencost-ai/curl:8.10 \
--target=gateway -- \
  sh -c 'curl --max-time 5 -sS https://1.1.1.1 || echo "egress blocked (expected)"'

Expected: `egress blocked (expected)`. Anything else means the NetworkPolicy did not render, the CNI does not enforce it, or a sidecar is leaking.
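The interactive one-liner always exits 0, which is fine for a human reading the output but not for a script. A minimal sketch of an inverted check that a pipeline could gate on follows; `expect_blocked` and `probe_egress` are hypothetical names, and the target address is the one used above.

```shell
# Succeed only when the wrapped command FAILS, i.e. when egress is blocked.
expect_blocked() {
  if "$@" >/dev/null 2>&1; then
    echo "egress NOT blocked: $*" >&2
    return 1
  fi
  echo "egress blocked (expected)"
}

# Probe the same public address as the manual check.
probe_egress() {
  expect_blocked curl --max-time 5 -sS https://1.1.1.1
}
```

Run `probe_egress` inside the debug container; a non-zero exit status means a connection to the public internet succeeded.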
test/airgap/run.sh automates a reduced version of Steps 1–5 against
a disposable kind cluster. It is the real validation for this
feature — not a render-only check — and runs in CI on the
airgap-e2e workflow (see .github/workflows/airgap-e2e.yml).
The harness:
- Boots a normal disposable kind cluster (no default-route surgery; the egress block is enforced at the host firewall, not via kind networking).
- Stands up an in-cluster `registry:2` attached to the `kind` docker network.
- Builds the gateway image and pushes it into the in-cluster registry via `docker push` (against the registry's `127.0.0.1:5000` bind). Optionally (`--mirror-upstream`) also copies the real upstream `ollama/ollama` and `ghcr.io/jonigl/ollama-mcp-bridge` images into the registry with `crane copy` to exercise the mirror path for the full stack. Also runs `scripts/air-gap/oras-push-model.sh` against a synthetic few-KB GGUF when `oras` is on PATH, so a regression in the ORAS push/pull scripts is caught by the same job.
- Applies two `iptables` rules on the host's `DOCKER-USER` chain: one RETURN rule for packets destined to the kind CIDR (so the registry stays reachable), and a DROP rule for everything else leaving the kind bridge. Rules are tagged with a comment so the `EXIT` trap cleans them up even on partial failure.
- Installs the chart with `gateway.image.repository` pointing at the in-cluster registry, PodSecurity `restricted` enforced on the namespace, and the bridge/ollama components disabled (see `test/airgap/README.md` for why the full stack is out of scope here).
- Runs three probe pods:
  - `probe-health` exits 0 if `/v1/health` returns `status:ok`.
  - `probe-egress` exits 0 if `curl https://1.1.1.1` fails.
  - `probe-registry` exits 0 if the in-cluster registry is reachable.
- Cross-checks the Deployment's image reference against the in-cluster registry path so a silent fallback to `ghcr.io` cannot pass.
- Tears down kind and removes the iptables rules.
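The host-firewall step can be sketched as two rule insertions plus a matching cleanup. This is an illustration of the described behaviour, not an excerpt from `test/airgap/run.sh`: the function names, the `AIRGAP_RULE_TAG` variable and its default value are hypothetical, and the bridge interface and CIDR are supplied by the caller.

```shell
IPTABLES=${IPTABLES:-iptables}
AIRGAP_RULE_TAG=${AIRGAP_RULE_TAG:-airgap-e2e}   # hypothetical comment tag

# block_egress BRIDGE_IFACE KIND_CIDR
# Rule 1 lets traffic to the kind network through; rule 2 drops everything
# else that enters the host from the kind bridge.
block_egress() {
  "$IPTABLES" -I DOCKER-USER 1 -d "$2" \
    -m comment --comment "$AIRGAP_RULE_TAG" -j RETURN &&
  "$IPTABLES" -I DOCKER-USER 2 -i "$1" \
    -m comment --comment "$AIRGAP_RULE_TAG" -j DROP
}

# unblock_egress BRIDGE_IFACE KIND_CIDR
# Deletes both rules by exact specification; safe to call from an EXIT trap.
unblock_egress() {
  "$IPTABLES" -D DOCKER-USER -i "$1" \
    -m comment --comment "$AIRGAP_RULE_TAG" -j DROP
  "$IPTABLES" -D DOCKER-USER -d "$2" \
    -m comment --comment "$AIRGAP_RULE_TAG" -j RETURN
}
```

Registering `trap 'unblock_egress "$bridge" "$cidr"' EXIT` before calling `block_egress` keeps the host clean even when a later step fails.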
Run it locally:
sudo test/airgap/run.sh

It requires sudo because the iptables rule is a host-level change; this is the honest cost of "really blocked" vs. "looks blocked." An earlier no-sudo mode based on Docker network isolation was removed after review; see `test/airgap/README.md` for why a netns-only block was not honest enough to keep.
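Replace Step 1's artefact in the internal registry and reload it onto the PVC. In v0.1 that means repeating the manual PVC pre-population flow; once the model-bootstrap Job (Step 3a) ships, re-running the Job will do the same. The Job is designed to be idempotent: `ollama create` on an existing tag is a no-op unless the Modelfile changed. Models are content-addressed on the PVC, so rolling back will be a `helm upgrade` with the previous `ollama.modelBootstrap.ociRef` value (until the Job ships, reload the previous artefact by hand).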
The chart reads pull credentials from the imagePullSecrets list.
Rotate the Secret, let the Deployment roll; the ORAS-based model
artefacts use the same credentials as image pulls (both go through
the Docker config file path) so rotating once covers both.
Nothing in this flow assumes CPU-only inference. If the Ollama pod
schedules on a GPU node, the ollama/ollama image already contains
the CUDA bits; the model cache is hardware-independent. Add the
standard nvidia.com/gpu: 1 resource request under
ollama.resources.limits in your values file.
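If the limit is set on the command line instead of in the values file, the dot inside `nvidia.com/gpu` has to be escaped so Helm does not read it as a nested key. A minimal sketch follows, reusing the release name, chart path and flags from the install step above; the function name `enable_gpu` is hypothetical.

```shell
HELM=${HELM:-helm}

# Upgrade the existing release with a single GPU limit on the Ollama pod.
# The single quotes keep the backslash so Helm sees "nvidia\.com/gpu" as one key.
enable_gpu() {
  "$HELM" upgrade opencost-ai ./deploy/helm/opencost-ai \
    --namespace opencost-ai \
    --values values-airgap.yaml \
    --set gateway.auth.existingSecret=opencost-ai-auth \
    --set 'ollama.resources.limits.nvidia\.com/gpu=1'
}
```

Keeping the setting in `values-airgap.yaml` avoids the escaping altogether and leaves the change reviewable in GitOps.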
granite4.1:8b, granite4.1:30b, and granite4.1:3b are all
redistributable under the Apache 2.0 licence — shipping the weights
into a private registry is covered by it. Teams pushing other
models are responsible for confirming the licence terms of their
chosen weights before mirroring.
- `ollama pull` at runtime against an internal Ollama Registry. The upstream registry protocol is not stable enough to target as a first-class mirror. The OCI+ORAS path here is deliberately registry-agnostic.
- Fine-tuned adapters (LoRA). The cache path exists in Ollama but the chart does not template a values schema for them yet.
- Cross-registry replication. The chart reads one repository per image; multi-region replication is the registry's job, not the chart's.
Cross-reference with docs/architecture.md §10:
- Decision 3: Model weights in air-gap: OCI registry via ORAS. This doc is the concrete implementation of that decision.
- Decision 1/2: MCP transport is `streamable_http`. The bridge config rendered by the chart already uses this, so no air-gap-specific change is needed.
- Decision 5: Default model is `granite4.1:8b`. The VRAM floor (~7 GB for the 8B default, ~18 GB for `granite4.1:30b`) determines node sizing on the cluster side and the Helm `ollama.resources.limits` values.
If this document and docs/architecture.md disagree, the architecture
doc wins and this one needs a fix in the same PR that makes them
agree.