---
title:
  page: Inference with Ollama
  nav: Inference with Ollama
description: Run local and cloud models inside an OpenShell sandbox using the Ollama community sandbox, or route sandbox requests to a host-level Ollama server.
topics:
- Generative AI
- Cybersecurity
tags:
- Tutorial
- Inference Routing
- Ollama
- Local Inference
- Sandbox
content:
  type: tutorial
  difficulty: technical_intermediate
  audience:
  - engineer
---

<!--
  SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  SPDX-License-Identifier: Apache-2.0
-->

# Run Local Inference with Ollama

This tutorial covers two ways to use Ollama with OpenShell:

1. **Ollama sandbox (recommended)** — a self-contained sandbox with Ollama, Claude Code, OpenCode, and Codex pre-installed. One command to start.
2. **Host-level Ollama** — run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

- Launch the Ollama community sandbox for a batteries-included experience.
- Use `ollama launch` to start coding agents inside a sandbox.
- Expose a host-level Ollama server to sandboxes through `inference.local`.

## Prerequisites

- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding.

## Option A: Ollama Community Sandbox (Recommended)

The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.

### Step 1: Create the Sandbox

```console
$ openshell sandbox create --from ollama
```

This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.

### Step 2: Chat with a Model

Chat with a local model:

```console
$ ollama run qwen3.5
```

Or a cloud model:

```console
$ ollama run kimi-k2.5:cloud
```

Or use `ollama launch` to start a coding agent with Ollama as the model backend:

```console
$ ollama launch claude
$ ollama launch codex
$ ollama launch opencode
```

For CI/CD and automated workflows, `ollama launch` supports a headless mode:

```console
$ ollama launch claude --yes --model qwen3.5
```

### Model Recommendations

| Use case | Model | Notes |
|---|---|---|
| Smoke test | `qwen3.5:0.8b` | Fast, lightweight, good for verifying setup |
| Coding and reasoning | `qwen3.5` | Strong tool calling support for agentic workflows |
| Complex tasks | `nemotron-3-super` | 122B parameter model, needs 48GB+ VRAM |
| No local GPU | `qwen3.5:cloud` | Runs on Ollama's cloud infrastructure, no `ollama pull` required |

:::{note}
Cloud models use the `:cloud` tag suffix and do not require local hardware.
:::

### Tool Calling

Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the [Ollama model library](https://ollama.com/library) for the latest models.
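
To make "tool calling" concrete at the wire level: Ollama exposes an OpenAI-compatible chat completions endpoint, and tools are declared using the standard OpenAI function schema. The request body below is an illustrative sketch only; `get_weather` and its parameters are hypothetical names, not part of OpenShell or Ollama.

```json
{
  "model": "qwen3.5",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}
```

A model with reliable tool calling responds with a `tool_calls` entry naming `get_weather` rather than free-form text; the agent then executes the tool and feeds the result back into the conversation.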

### Updating Ollama

To update Ollama inside a running sandbox:

```console
$ update-ollama
```

Or enable auto-update on every sandbox start:

```console
$ openshell sandbox create --from ollama -e OLLAMA_UPDATE=1
```

## Option B: Host-Level Ollama

Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through `inference.local`.

:::{note}
This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, or NVIDIA NIM by changing the startup command, base URL, and model name.
:::

### Step 1: Install and Start Ollama

Install [Ollama](https://ollama.com/) on the gateway host:

```console
$ curl -fsSL https://ollama.com/install.sh | sh
```

Start Ollama on all interfaces so it is reachable from sandboxes:

```console
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

:::{tip}
If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first:

```console
$ systemctl stop ollama
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
:::
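
If you would rather keep Ollama running as its system service than start `ollama serve` by hand, the Ollama FAQ documents changing the bind address with a systemd override (`systemctl edit ollama.service`). A minimal sketch of the override file:

```ini
# Drop-in override for the ollama systemd service.
# Binds the server to all interfaces so sandboxes can reach it.
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
```

Then run `systemctl daemon-reload` and `systemctl restart ollama` to apply it.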

### Step 2: Pull a Model

In a second terminal, pull and load a model (`ollama run` downloads it on first use):

```console
$ ollama run qwen3.5:0.8b
```

Type `/bye` to exit the interactive session. The model stays loaded.

### Step 3: Create a Provider

Create an OpenAI-compatible provider pointing at the host Ollama:

```console
$ openshell provider create \
  --name ollama \
  --type openai \
  --credential OPENAI_API_KEY=empty \
  --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
```

OpenShell injects the `host.openshell.internal` hostname so sandboxes and the gateway can reach the host machine. You can also use the host's LAN IP.

### Step 4: Set Inference Routing

Route sandbox inference to the new provider:

```console
$ openshell inference set --provider ollama --model qwen3.5:0.8b
```

Confirm the routing:

```console
$ openshell inference get
```

### Step 5: Verify from a Sandbox

```console
$ openshell sandbox create -- \
  curl https://inference.local/v1/chat/completions \
  --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

The response should be JSON from the model.
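
In scripts you usually want just the model's reply rather than the full JSON envelope. Assuming the endpoint returns a standard OpenAI-style chat completion, the text can be extracted with `jq`; the inlined `response` below is a trimmed sample of the response shape, not real output.

```shell
# Sample of an OpenAI-style chat completion (trimmed for illustration).
response='{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}'

# Pull out only the assistant's message text.
echo "$response" | jq -r '.choices[0].message.content'
# prints: Hello!
```

In a sandbox, replace the sample with the actual output of the `curl https://inference.local/...` call above.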

## Troubleshooting

Common issues and fixes:

- **Ollama not reachable from sandbox** — Ollama must be bound to `0.0.0.0`, not `127.0.0.1`. This applies to host-level Ollama only; the community sandbox handles this automatically.
- **`OPENAI_BASE_URL` wrong** — Use `http://host.openshell.internal:11434/v1`, not `localhost` or `127.0.0.1`.
- **Model not found** — Run `ollama ps` to confirm the model is loaded. Run `ollama pull <model>` if needed.
- **HTTPS vs HTTP** — Code inside sandboxes must call `https://inference.local`, not `http://`.
- **AMD GPU driver issues** — Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.
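
The first bullet can be checked quickly on the gateway host. This is a sketch (it assumes `ss` from iproute2 is installed); the helper only classifies whatever address is listening on port 11434.

```shell
# Classify a listen address: wildcard binds are reachable from sandboxes,
# loopback binds are not.
check_bind() {
  case "$1" in
    0.0.0.0:*|\[::\]:*)    echo "reachable from sandboxes" ;;
    127.0.0.1:*|\[::1\]:*) echo "loopback only: restart with OLLAMA_HOST=0.0.0.0:11434" ;;
    "")                    echo "nothing listening on 11434" ;;
    *)                     echo "unrecognized bind: $1" ;;
  esac
}

# Assumption: ss (iproute2) is available on the gateway host.
if command -v ss >/dev/null 2>&1; then
  check_bind "$(ss -ltnH 'sport = :11434' | awk '{print $4; exit}')"
fi
```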

Useful commands:

```console
$ openshell status
$ openshell inference get
$ openshell provider get ollama
```

## Next Steps

- To learn more about managed inference, refer to {doc}`/inference/index`.
- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.
- To explore more community sandboxes, refer to {doc}`/sandboxes/community-sandboxes`.