
Commit c0569d2

docs(ollama): update ollama tutorial and references to match latest features

1 parent 51aeffc

File tree

6 files changed: +231 −160 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ uv tool install -U openshell
 ### Create a sandbox

 ```bash
-openshell sandbox create -- claude # or opencode, codex, copilot, ollama
+openshell sandbox create -- claude # or opencode, codex, copilot
 ```

 A gateway is created automatically on first use. To deploy on a remote host instead, pass `--remote user@host` to the create command.

docs/about/supported-agents.md

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@ The following table summarizes the agents that run in OpenShell sandboxes. All a
 | [Codex](https://developers.openai.com/codex) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | No coverage | Pre-installed. Requires a custom policy with OpenAI endpoints and Codex binary paths. Requires `OPENAI_API_KEY`. |
 | [GitHub Copilot CLI](https://docs.github.com/en/copilot/github-copilot-in-the-cli) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | Full coverage | Pre-installed. Works out of the box. Requires `GITHUB_TOKEN` or `COPILOT_GITHUB_TOKEN`. |
 | [OpenClaw](https://openclaw.ai/) | [`openclaw`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/openclaw) | Bundled | Agent orchestration layer. Launch with `openshell sandbox create --from openclaw`. |
-| [Ollama](https://ollama.com/) | [`ollama`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/ollama) | Bundled | Run cloud and local models. Includes Claude Code, Codex, and OpenClaw. Launch with `openshell sandbox create --from ollama`. |
+| [Ollama](https://ollama.com/) | [`ollama`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/ollama) | Bundled | Run cloud and local models. Includes Claude Code, Codex, and OpenCode. Launch with `openshell sandbox create --from ollama`. |

 More community agent sandboxes are available in the {doc}`../sandboxes/community-sandboxes` catalog.

-For a complete support matrix, refer to the {doc}`../reference/support-matrix` page.
+For a complete support matrix, refer to the {doc}`../reference/support-matrix` page.

docs/inference/configure.md

Lines changed: 6 additions & 0 deletions
@@ -81,6 +81,12 @@ $ openshell provider create \

 Use `--config OPENAI_BASE_URL` to point to any OpenAI-compatible server running where the gateway runs. For host-backed local inference, use `host.openshell.internal` or the host's LAN IP. Avoid `127.0.0.1` and `localhost`. Set `OPENAI_API_KEY` to a dummy value if the server does not require authentication.

+:::{tip}
+For a self-contained setup, the Ollama community sandbox bundles Ollama inside the sandbox itself; no host-level provider is needed. See {doc}`/tutorials/inference-ollama` for details.
+:::
+
+Ollama also supports cloud-hosted models using the `:cloud` tag suffix (e.g., `qwen3.5:cloud`).

 ::::

 ::::{tab-item} Anthropic

docs/sandboxes/community-sandboxes.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ The following community sandboxes are available in the catalog.
 | Sandbox | Description |
 |---|---|
 | `base` | Foundational image with system tools and dev environment |
-| `ollama` | Ollama with cloud and local model support, Claude Code, Codex, and OpenClaw pre-installed |
+| `ollama` | Ollama with cloud and local model support, Claude Code, OpenCode, and Codex pre-installed. Use `ollama launch` inside the sandbox to start coding agents with zero config. |
 | `openclaw` | Open agent manipulation and control |
 | `sdg` | Synthetic data generation workflows |
4949

docs/tutorials/inference-ollama.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
---
title:
  page: Inference with Ollama
  nav: Inference with Ollama
description: Run local and cloud models inside an OpenShell sandbox using the Ollama community sandbox, or route sandbox requests to a host-level Ollama server.
topics:
  - Generative AI
  - Cybersecurity
tags:
  - Tutorial
  - Inference Routing
  - Ollama
  - Local Inference
  - Sandbox
content:
  type: tutorial
  difficulty: technical_intermediate
audience:
  - engineer
---

<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->
# Run Local Inference with Ollama

This tutorial covers two ways to use Ollama with OpenShell:

1. **Ollama sandbox (recommended)**: a self-contained sandbox with Ollama, Claude Code, OpenCode, and Codex pre-installed. One command to start.
2. **Host-level Ollama**: run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

- Launch the Ollama community sandbox for a batteries-included experience.
- Use `ollama launch` to start coding agents inside a sandbox.
- Expose a host-level Ollama server to sandboxes through `inference.local`.

## Prerequisites

- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding.

## Option A: Ollama Community Sandbox (Recommended)

The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.

### Step 1: Create the Sandbox

```console
$ openshell sandbox create --from ollama
```

This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.
### Step 2: Chat with a Model

Chat with a local model:

```console
$ ollama run qwen3.5
```

Or run a cloud model:

```console
$ ollama run kimi-k2.5:cloud
```

Or use `ollama launch` to start a coding agent with Ollama as the model backend:

```console
$ ollama launch claude
$ ollama launch codex
$ ollama launch opencode
```

For CI/CD and automated workflows, `ollama launch` supports a headless mode:

```console
$ ollama launch claude --yes --model qwen3.5
```
### Model Recommendations

| Use case | Model | Notes |
|---|---|---|
| Smoke test | `qwen3.5:0.8b` | Fast, lightweight, good for verifying setup |
| Coding and reasoning | `qwen3.5` | Strong tool calling support for agentic workflows |
| Complex tasks | `nemotron-3-super` | 122B parameter model, needs 48GB+ VRAM |
| No local GPU | `qwen3.5:cloud` | Runs on Ollama's cloud infrastructure, no `ollama pull` required |

:::{note}
Cloud models use the `:cloud` tag suffix and do not require local hardware.

```console
$ ollama run qwen3.5:cloud
```
:::
### Tool Calling

Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the [Ollama model library](https://ollama.com/library) for the latest models.
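To make "tool calling" concrete, here is a minimal sketch of the OpenAI-style request shape that tool-calling models consume. The `run_shell` tool name and its parameters are hypothetical, invented for illustration; only the overall structure (a `tools` array of function schemas alongside `messages`) reflects the standard chat-completions format.

```python
import json

# Illustrative OpenAI-style chat request with a tool definition.
# The tool itself is hypothetical; what matters is the shape of the
# "tools" array that tool-calling models are trained to consume.
request_body = {
    "model": "qwen3.5",
    "messages": [
        {"role": "user", "content": "List the files in the current directory."}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "run_shell",  # hypothetical tool exposed by the agent
                "description": "Run a shell command and return its output.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "command": {
                            "type": "string",
                            "description": "Command to run",
                        }
                    },
                    "required": ["command"],
                },
            },
        }
    ],
}

print(json.dumps(request_body, indent=2))
```

A model with reliable tool calling responds with a structured `tool_calls` entry naming the function and its arguments, rather than free-form text, which is what lets the coding agents above act on the result.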
### Updating Ollama

To update Ollama inside a running sandbox:

```console
$ update-ollama
```

Or auto-update on every sandbox start:

```console
$ openshell sandbox create --from ollama -e OLLAMA_UPDATE=1
```
## Option B: Host-Level Ollama

Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through `inference.local`.

:::{note}
This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.
:::

### Step 1: Install and Start Ollama

Install [Ollama](https://ollama.com/) on the gateway host:

```console
$ curl -fsSL https://ollama.com/install.sh | sh
```

Start Ollama on all interfaces so it is reachable from sandboxes:

```console
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

:::{tip}
If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first:

```console
$ systemctl stop ollama
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
:::

### Step 2: Pull a Model

In a second terminal, pull and run a model:

```console
$ ollama run qwen3.5:0.8b
```

Type `/bye` to exit the interactive session. The model stays loaded.
### Step 3: Create a Provider

Create an OpenAI-compatible provider pointing at the host Ollama:

```console
$ openshell provider create \
    --name ollama \
    --type openai \
    --credential OPENAI_API_KEY=empty \
    --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
```

OpenShell injects `host.openshell.internal` so sandboxes and the gateway can reach the host machine. You can also use the host's LAN IP.
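As a sketch of what this provider configuration means for client code, the snippet below joins `OPENAI_BASE_URL` with the standard `/chat/completions` path and builds (but does not send) the corresponding HTTP request. The model name and payload mirror the earlier steps; the dummy bearer token reflects `OPENAI_API_KEY=empty`.

```python
import json
import urllib.request

# Base URL from the provider config above; /chat/completions is the
# standard OpenAI-compatible path appended by clients.
base_url = "http://host.openshell.internal:11434/v1"
endpoint = base_url + "/chat/completions"

body = json.dumps({
    "model": "qwen3.5:0.8b",
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 10,
}).encode()

# Build the request object only; nothing is sent on the network here.
req = urllib.request.Request(
    endpoint,
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer empty",  # dummy key; server auth not required
    },
)

print(req.full_url)
```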
### Step 4: Set Inference Routing

```console
$ openshell inference set --provider ollama --model qwen3.5:0.8b
```

Confirm:

```console
$ openshell inference get
```

### Step 5: Verify from a Sandbox

```console
$ openshell sandbox create -- \
    curl https://inference.local/v1/chat/completions \
    --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

The response should be JSON from the model.
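For readers checking the verification output, here is a hand-written (not captured) response in the standard OpenAI chat-completion shape, and how client code extracts the reply. The exact field values are illustrative; the path `choices[0].message.content` is the standard location of the assistant's text.

```python
import json

# A representative OpenAI-style chat completion response body.
# Values are illustrative; the structure matches the /v1 format.
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "qwen3.5:0.8b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help?"},
      "finish_reason": "stop"
    }
  ]
}
"""

response = json.loads(raw)
# The assistant's reply lives at choices[0].message.content.
reply = response["choices"][0]["message"]["content"]
print(reply)
```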
## Troubleshooting

Common issues and fixes:

- **Ollama not reachable from sandbox**: Ollama must be bound to `0.0.0.0`, not `127.0.0.1`. This applies to host-level Ollama only; the community sandbox handles this automatically.
- **`OPENAI_BASE_URL` wrong**: Use `http://host.openshell.internal:11434/v1`, not `localhost` or `127.0.0.1`.
- **Model not found**: Run `ollama ps` to confirm the model is loaded. Run `ollama pull <model>` if needed.
- **HTTPS vs HTTP**: Code inside sandboxes must call `https://inference.local`, not `http://`.
- **AMD GPU driver issues**: Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.
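The first two issues above share one root cause: a base URL that points at a loopback interface the gateway cannot reach. A hypothetical helper (not part of OpenShell) that encodes this rule:

```python
from urllib.parse import urlparse

# Hosts that resolve to the caller's own loopback interface; a provider
# base URL pointing at these will never reach the host-level server.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}

def base_url_ok(url: str) -> bool:
    """Return True if the URL avoids loopback hosts."""
    host = urlparse(url).hostname
    return host is not None and host not in LOOPBACK_HOSTS

print(base_url_ok("http://host.openshell.internal:11434/v1"))  # True
print(base_url_ok("http://127.0.0.1:11434/v1"))                # False
```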
Useful commands:

```console
$ openshell status
$ openshell inference get
$ openshell provider get ollama
```

## Next Steps

- To learn more about managed inference, refer to {doc}`/inference/index`.
- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.
- To explore more community sandboxes, refer to {doc}`/sandboxes/community-sandboxes`.
