Commit 85903b9

docs: add debug-inference skill, Ollama tutorial, and remove stale inference policy references (#353)
1 parent 079c8f8 commit 85903b9

File tree

17 files changed: +554 −37 lines changed
Lines changed: 345 additions & 0 deletions

@@ -0,0 +1,345 @@
---
name: debug-inference
description: Debug why inference.local or external inference setup is failing. Use when the user cannot reach a local model server, has provider base URL issues, sees inference verification failures, hits protocol mismatches, or needs to diagnose inference on local vs remote gateways. Trigger keywords - debug inference, inference.local, local inference, ollama, vllm, sglang, trtllm, NIM, inference failing, model server unreachable, failed to verify inference endpoint, host.openshell.internal.
---

# Debug Inference

Diagnose why OpenShell inference is failing and recommend exact fix commands.

Use `openshell` CLI commands to inspect the active gateway, provider records, managed inference config, and sandbox behavior. Use a short sandbox probe when needed to confirm end-to-end routing.

## Overview

OpenShell supports two different inference paths. Diagnose the correct one first.

1. **Managed inference** through `https://inference.local`
   - Configured by `openshell inference set`
   - Shared by every sandbox on the active gateway
   - Credentials and model are injected by OpenShell
2. **Direct external inference** to hosts like `api.openai.com`
   - Controlled by `network_policies`
   - Requires the application to call the external host directly
   - Requires provider attachment and network access to be configured separately

For local or self-hosted engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments, the most common managed inference pattern is an `openai` provider with `OPENAI_BASE_URL` pointing at a host the gateway can reach.

## Prerequisites

- `openshell` is on the PATH
- The active gateway is running
- You know the failing setup, or can infer it from commands and config

## Tools Available

Use these commands first:

```bash
# Which gateway is active, and can the CLI reach it?
openshell status

# Show managed inference config for inference.local
openshell inference get

# Inspect the provider record referenced by inference.local
openshell provider get <provider-name>

# Inspect gateway topology details when remote/local confusion is suspected
openshell gateway info

# Run a minimal end-to-end probe from a sandbox
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

## Workflow

When the user asks to debug inference, run diagnostics automatically in this order. Stop and report findings as soon as a root cause is identified.

### Determine Context

Establish these facts first:

1. Is the application calling `https://inference.local` or a direct external host?
2. Which gateway is active, and is it local, remote, or cloud?
3. Which provider and model are configured for managed inference?
4. Is the upstream local to the gateway host, or somewhere else?

### Step 0: Check the Active Gateway

Run:

```bash
openshell status
openshell gateway info
```

Look for:

- Active gateway name and endpoint
- Whether the gateway is local or remote
- Whether `host.openshell.internal` would point to the local machine or a remote host

Common mistake:

- **Laptop-local model + remote gateway**: `host.openshell.internal` points to the remote gateway host, not your laptop. A laptop-local Ollama or vLLM server will not be reachable without a tunnel or a shared, reachable network path.
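One workaround in that situation is an SSH reverse tunnel from the laptop to the gateway host. The sketch below uses hypothetical user/host names; note that the remote `sshd` must allow `GatewayPorts` for the forwarded port to be reachable beyond the remote loopback:

```bash
# Hypothetical: expose a laptop-local Ollama (port 11434) on the remote gateway host.
# Requires SSH access to the gateway host; remote sshd needs "GatewayPorts yes"
# (or "clientspecified") for the 0.0.0.0 bind to take effect.
ssh -N -R 0.0.0.0:11434:127.0.0.1:11434 user@gateway-host
```

With the tunnel up, a provider base URL of `http://host.openshell.internal:11434/v1` on the remote gateway can reach the laptop-local server.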
### Step 1: Check Whether Managed Inference Is Configured

Run:

```bash
openshell inference get
```

Interpretation:

- **`Not configured`**: `inference.local` has no backend yet. Fix by configuring it:

  ```bash
  openshell inference set --provider <name> --model <id>
  ```

- **Provider and model shown**: Continue to provider inspection.

### Step 2: Inspect the Provider Record

Run:

```bash
openshell provider get <provider-name>
```

Check:

- Provider type matches the client API shape
  - `openai` for OpenAI-compatible engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments
  - `anthropic` for the Anthropic Messages API
  - `nvidia` for NVIDIA-hosted OpenAI-compatible endpoints
- Required credential key exists
- `*_BASE_URL` override is correct when using a self-hosted endpoint

Fix examples:

```bash
openshell provider create --name ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1

openshell provider update ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
```

### Step 3: Check Local Host Reachability

For host-backed local inference, confirm the upstream server:

- Binds to `0.0.0.0`, not only `127.0.0.1`
- Runs on the same machine as the gateway
- Is reachable through `host.openshell.internal`, the host's LAN IP, or another reachable hostname

Common mistakes:

- **Base URL uses `127.0.0.1` or `localhost`**: usually wrong for managed inference. Replace with `host.openshell.internal` or the host's LAN IP.
- **Server binds only to loopback**: reconfigure it to bind to `0.0.0.0`.
- **Inference engine runs as a system service**: changing the bind address may require updating the service configuration and restarting the service before the new listener becomes reachable.
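The loopback mistake is easy to screen for mechanically. A minimal sketch (the helper name `check_base_url` is illustrative, not part of the `openshell` CLI):

```bash
# Flag base URLs that point at loopback, which resolves inside the gateway
# runtime rather than at your host's inference server.
check_base_url() {
  case "$1" in
    *://127.0.0.1*|*://localhost*|*://\[::1\]*)
      echo "loopback: replace with host.openshell.internal or the host's LAN IP" ;;
    *)
      echo "ok" ;;
  esac
}

check_base_url "http://127.0.0.1:11434/v1"
check_base_url "http://host.openshell.internal:11434/v1"
```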
### Step 4: Check Request Shape

Managed inference only works for `https://inference.local` and supported inference API paths.

Supported patterns include:

- `POST /v1/chat/completions`
- `POST /v1/completions`
- `POST /v1/responses`
- `POST /v1/messages`
- `GET /v1/models`

Common mistakes:

- **Wrong scheme**: `http://inference.local` instead of `https://inference.local`
- **Unsupported path**: request does not match a known inference API
- **Protocol mismatch**: Anthropic client against an `openai` provider, or vice versa

Fix guidance:

- Use a supported path and provider type
- Point OpenAI-compatible SDKs at `https://inference.local/v1`
- If the SDK requires an API key, pass any non-empty placeholder such as `test`
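For most OpenAI-compatible SDKs, the last two points reduce to two environment variables. A sketch, assuming the standard `OPENAI_BASE_URL`/`OPENAI_API_KEY` convention the SDK reads at startup:

```bash
# Inside a sandbox: route an OpenAI-compatible SDK through the managed endpoint.
export OPENAI_BASE_URL="https://inference.local/v1"
export OPENAI_API_KEY="test"   # placeholder only; OpenShell injects the real credentials
```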
### Step 5: Probe from a Sandbox

Run a minimal request from inside a sandbox:

```bash
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

Interpretation:

- **`cluster inference is not configured`**: set the managed route with `openshell inference set`
- **`connection not allowed by policy`** on `inference.local`: unsupported method or path
- **`no compatible route`**: provider type and client API shape do not match
- **Connection refused / upstream unavailable / verification failures**: base URL, bind address, topology, or credentials are wrong

### Step 6: Reapply or Repair the Managed Route

After fixing the provider, repoint `inference.local`:

```bash
openshell inference set --provider <name> --model <id>
```

If the endpoint is intentionally offline and you only want to save the config:

```bash
openshell inference set --provider <name> --model <id> --no-verify
```

Inference updates are hot-reloaded to all sandboxes on the active gateway within about 5 seconds by default.

### Step 7: Diagnose Direct External Inference

If the application calls `api.openai.com`, `api.anthropic.com`, or another external host directly, this is not a managed inference issue.

Check instead:

1. The application is configured to call the external hostname directly
2. A provider with the needed credentials exists
3. The sandbox is launched with that provider attached
4. `network_policies` allow that host, port, and HTTP rules

Use the `generate-sandbox-policy` skill when the user needs help authoring policy YAML.

## Fix: Local Host Inference Timeouts (Firewall)

Use this fix when a sandbox can reach `https://inference.local`, but OpenShell reports an upstream timeout against a host-local backend such as Ollama.

Example symptom:

```json
{"error":"request to http://host.docker.internal:11434/v1/models timed out"}
```

### When This Happens

This failure commonly appears on Linux hosts that:

- Run the OpenShell gateway in Docker
- Route `inference.local` to a host-local OpenAI-compatible endpoint such as Ollama
- Have a host firewall or networking configuration that denies container-to-host traffic by default

In this case, OpenShell routing is usually working correctly. The failing hop is container-to-host traffic on the backend port.

### Why CoreDNS Is Not the Cause

This is not the same issue as the Colima CoreDNS fix.

OpenShell injects `host.docker.internal` and `host.openshell.internal` into sandbox pods with `hostAliases`. That path bypasses cluster DNS lookup. If the request still times out, the usual cause is a host firewall or network policy, not CoreDNS.

### Verify the Problem

1. Confirm the model server works on the host:

   ```bash
   curl -sS http://127.0.0.1:11434/v1/models
   ```

2. Confirm the host gateway address also works on the host:

   ```bash
   curl -sS http://172.17.0.1:11434/v1/models
   ```

3. Test the same endpoint from the OpenShell cluster container:

   ```bash
   docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
   ```

If steps 1 and 2 succeed but step 3 times out, the host firewall or network configuration is blocking the container-to-host path.

### Fix

Allow the Docker bridge network used by the OpenShell cluster to reach the host-local inference port. The exact command depends on your firewall tooling (iptables, nftables, firewalld, UFW, etc.), but the rule should allow:

- **Source**: the Docker bridge subnet used by the OpenShell cluster container (commonly `172.18.0.0/16`)
- **Destination**: the host gateway IP injected into sandbox pods for `host.docker.internal` (commonly `172.17.0.1`)
- **Port**: the inference server port (e.g. `11434/tcp` for Ollama)

To find the actual values on your system:

```bash
# Docker bridge subnet for the OpenShell cluster network
docker network inspect $(docker network ls --filter name=openshell -q) --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'

# Host gateway IP visible from inside the container
docker exec openshell-cluster-<gateway> cat /etc/hosts | grep host.docker.internal
```

Adjust the source subnet, destination IP, or port to match your local Docker network layout.
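As concrete sketches of the rule shape (substitute the subnet, IP, and port found on your system; run with root privileges), the equivalent allow rules in two common tools look like:

```bash
# iptables: let the cluster's bridge subnet reach the host-local inference port
sudo iptables -I INPUT -s 172.18.0.0/16 -d 172.17.0.1 -p tcp --dport 11434 -j ACCEPT

# UFW equivalent
sudo ufw allow from 172.18.0.0/16 to 172.17.0.1 port 11434 proto tcp
```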
### Verify the Fix

1. Re-run the cluster container check:

   ```bash
   docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
   ```

2. Re-test from a sandbox:

   ```bash
   curl -sS https://inference.local/v1/models
   ```

Both commands should return the upstream model list.

### If It Still Fails

- Confirm the backend listens on a host-reachable address: `ss -ltnp | rg ':11434\b'`
- Confirm the provider points at the host alias path you expect: `openshell provider get <provider-name>`
- Confirm the active inference route: `openshell inference get`
- Inspect sandbox logs for upstream timeout details: `openshell logs <sandbox-name> --since 10m`

## Common Failure Patterns

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `openshell inference get` shows `Not configured` | No managed inference route configured | `openshell inference set --provider <name> --model <id>` |
| `failed to verify inference endpoint` | Bad base URL, wrong credentials, wrong provider type, or upstream not reachable | Fix provider config, then rerun `openshell inference set`; use `--no-verify` only when the endpoint is intentionally offline |
| Base URL uses `127.0.0.1` | Loopback points at the wrong runtime | Use `host.openshell.internal` or another gateway-reachable host |
| Local engine works only when gateway is local | Gateway moved to remote host | Run the engine on the gateway host, add a tunnel, or use direct external access |
| `connection not allowed by policy` on `inference.local` | Unsupported path or method | Use a supported inference API path |
| `no compatible route` | Provider type does not match request shape | Switch provider type or change the client API |
| Direct call to external host is denied | Missing policy or provider attachment | Update `network_policies` and launch sandbox with the right provider |
| SDK fails on empty auth token | Client requires a non-empty API key even though OpenShell injects the real one | Use any placeholder token such as `test` |
| Upstream timeout from container to host-local backend | Host firewall or network config blocks container-to-host traffic | Allow the Docker bridge subnet to reach the inference port on the host gateway IP (see firewall fix section above) |

## Full Diagnostic Dump

Run this when you want a compact report before deciding on a fix:

```bash
echo "=== Gateway Status ==="
openshell status

echo "=== Gateway Info ==="
openshell gateway info

echo "=== Managed Inference ==="
openshell inference get

echo "=== Providers ==="
openshell provider list

echo "=== Selected Provider ==="
openshell provider get <provider-name>

echo "=== Sandbox Probe ==="
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

When you report back, state:

1. Which inference path is failing (`inference.local` vs direct external)
2. Whether gateway topology is part of the problem
3. The most likely root cause
4. The exact fix commands the user should run

.agents/skills/openshell-cli/SKILL.md

Lines changed: 3 additions & 2 deletions

@@ -208,7 +208,7 @@ openshell sandbox delete sandbox-1 sandbox-2 sandbox-3 # Multiple at once
 This is the most important multi-step workflow. It enables a tight feedback cycle where sandbox policy is refined based on observed activity.
-**Key concept**: Policies have static fields (immutable after creation: `filesystem_policy`, `landlock`, `process`) and dynamic fields (hot-reloadable on a running sandbox: `network_policies`, `inference`). Only dynamic fields can be updated without recreating the sandbox.
+**Key concept**: Policies have static fields (immutable after creation: `filesystem_policy`, `landlock`, `process`) and one dynamic field (`network_policies`). Only `network_policies` can be updated without recreating the sandbox.
 ```
 Create sandbox with initial policy
@@ -272,7 +272,7 @@ Edit `current-policy.yaml` to allow the blocked actions. **For policy content au
 - Enforcement modes (`audit` vs `enforce`)
 - Binary matching patterns
-Only `network_policies` and `inference` sections can be modified at runtime. If `filesystem_policy`, `landlock`, or `process` need changes, the sandbox must be recreated.
+Only `network_policies` can be modified at runtime. If `filesystem_policy`, `landlock`, or `process` need changes, the sandbox must be recreated.
 ### Step 5: Push the updated policy
@@ -564,4 +564,5 @@ $ openshell sandbox upload --help
 |-------|------------|
 | `generate-sandbox-policy` | Creating or modifying policy YAML content (network rules, L7 inspection, access presets, endpoint configuration) |
 | `debug-openshell-cluster` | Diagnosing cluster startup or health failures |
+| `debug-inference` | Diagnosing `inference.local`, host-backed local inference, and provider base URL issues |
 | `tui-development` | Developing features for the OpenShell TUI (`openshell term`) |

.agents/skills/openshell-cli/cli-reference.md

Lines changed: 1 addition & 1 deletion

@@ -270,7 +270,7 @@ View sandbox logs. Supports one-shot and streaming.
 ### `openshell policy set <name> --policy <PATH>`
-Update the policy on a live sandbox. Only dynamic fields (`network_policies`, `inference`) can be changed at runtime.
+Update the policy on a live sandbox. Only the dynamic `network_policies` field can be changed at runtime.
 | Flag | Default | Description |
 |------|---------|-------------|

.agents/skills/triage-issue/SKILL.md

Lines changed: 2 additions & 1 deletion

@@ -91,7 +91,7 @@ Check whether the issue body contains a substantive agent diagnostic section. Lo
 >
 > This issue was opened without an agent investigation.
 >
-> OpenShell is an agent-first project before we triage this, please point your coding agent at the repo and have it investigate. Your agent can load skills like `debug-openshell-cluster` (for cluster issues), `openshell-cli` (for usage questions), or `generate-sandbox-policy` (for policy help).
+> OpenShell is an agent-first project - before we triage this, please point your coding agent at the repo and have it investigate. Your agent can load skills like `debug-openshell-cluster` (for cluster issues), `debug-inference` (for inference setup issues), `openshell-cli` (for usage questions), or `generate-sandbox-policy` (for policy help).
 >
 > See [CONTRIBUTING.md](https://github.com/NVIDIA/OpenShell/blob/main/CONTRIBUTING.md#before-you-open-an-issue) for the full workflow.
 >
@@ -123,6 +123,7 @@ Based on the sub-agent's analysis, also attempt to validate the report directly:
 - For bug reports: check the relevant code paths, look for the described failure mode
 - For feature requests: assess feasibility against the existing architecture
 - For cluster/infrastructure issues: reference the `debug-openshell-cluster` skill's known failure patterns
+- For inference and provider-topology issues: reference the `debug-inference` skill's known failure patterns
 - For CLI/usage issues: reference the `openshell-cli` skill's command reference

## Step 5: Classify
