---
name: debug-inference
description: Debug why inference.local or external inference setup is failing. Use when the user cannot reach a local model server, has provider base URL issues, sees inference verification failures, hits protocol mismatches, or needs to diagnose inference on local vs remote gateways. Trigger keywords - debug inference, inference.local, local inference, ollama, vllm, sglang, trtllm, NIM, inference failing, model server unreachable, failed to verify inference endpoint, host.openshell.internal.
---

# Debug Inference

Diagnose why OpenShell inference is failing and recommend exact fix commands.

Use `openshell` CLI commands to inspect the active gateway, provider records, managed inference config, and sandbox behavior. Use a short sandbox probe when needed to confirm end-to-end routing.

## Overview

OpenShell supports two different inference paths. Identify which one is in use before diagnosing.

1. **Managed inference** through `https://inference.local`
   - Configured by `openshell inference set`
   - Shared by every sandbox on the active gateway
   - Credentials and model are injected by OpenShell
2. **Direct external inference** to hosts like `api.openai.com`
   - Controlled by `network_policies`
   - Requires the application to call the external host directly
   - Requires provider attachment and network access to be configured separately

For local or self-hosted engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments, the most common managed inference pattern is an `openai` provider with `OPENAI_BASE_URL` pointing at a host the gateway can reach.

## Prerequisites

- `openshell` is on the PATH
- The active gateway is running
- You know the failing setup, or can infer it from commands and config

## Tools Available

Use these commands first:

```bash
# Which gateway is active, and can the CLI reach it?
openshell status

# Show managed inference config for inference.local
openshell inference get

# Inspect the provider record referenced by inference.local
openshell provider get <provider-name>

# Inspect gateway topology details when remote/local confusion is suspected
openshell gateway info

# Run a minimal end-to-end probe from a sandbox
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

## Workflow

When the user asks to debug inference, run diagnostics automatically in this order. Stop and report findings as soon as a root cause is identified.

### Determine Context

Establish these facts first:

1. Is the application calling `https://inference.local` or a direct external host?
2. Which gateway is active, and is it local, remote, or cloud?
3. Which provider and model are configured for managed inference?
4. Is the upstream local to the gateway host, or somewhere else?

### Step 0: Check the Active Gateway

Run:

```bash
openshell status
openshell gateway info
```

Look for:

- Active gateway name and endpoint
- Whether the gateway is local or remote
- Whether `host.openshell.internal` would point to the local machine or a remote host

Common mistake:

- **Laptop-local model + remote gateway**: `host.openshell.internal` points to the remote gateway host, not your laptop. A laptop-local Ollama or vLLM server will not be reachable without a tunnel or shared reachable network path.
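
When the model server must stay on the laptop, one possible workaround is an SSH reverse tunnel that publishes the laptop's port on the gateway host. This is a sketch under assumptions: the server is Ollama on port 11434, and `user@gateway-host` is a placeholder for your gateway login. Note that `ssh -R` binds the remote listener to loopback by default, so exposing it beyond the gateway host itself may also require `GatewayPorts yes` in the gateway's `sshd_config`. The command is printed here rather than executed:

```shell
# Hypothetical reverse tunnel: expose the laptop's local Ollama port (11434)
# on the remote gateway host. user@gateway-host is a placeholder.
tunnel_cmd='ssh -N -R 0.0.0.0:11434:127.0.0.1:11434 user@gateway-host'
echo "$tunnel_cmd"
```

After the tunnel is up, the provider base URL on the gateway side can point at the gateway host's own 11434 listener.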

### Step 1: Check Whether Managed Inference Is Configured

Run:

```bash
openshell inference get
```

Interpretation:

- **`Not configured`**: `inference.local` has no backend yet. Fix by configuring it:

  ```bash
  openshell inference set --provider <name> --model <id>
  ```

- **Provider and model shown**: Continue to provider inspection.

### Step 2: Inspect the Provider Record

Run:

```bash
openshell provider get <provider-name>
```

Check:

- Provider type matches the client API shape
  - `openai` for OpenAI-compatible engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments
  - `anthropic` for Anthropic Messages API
  - `nvidia` for NVIDIA-hosted OpenAI-compatible endpoints
- Required credential key exists
- `*_BASE_URL` override is correct when using a self-hosted endpoint

Fix examples:

```bash
openshell provider create --name ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1

openshell provider update ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
```

### Step 3: Check Local Host Reachability

For host-backed local inference, confirm the upstream server:

- Binds to `0.0.0.0`, not only `127.0.0.1`
- Runs on the same machine as the gateway
- Is reachable through `host.openshell.internal`, the host's LAN IP, or another reachable hostname

Common mistakes:

- **Base URL uses `127.0.0.1` or `localhost`**: usually wrong for managed inference. Replace with `host.openshell.internal` or the host's LAN IP.
- **Server binds only to loopback**: reconfigure it to bind to `0.0.0.0`.
- **Inference engine runs as a system service**: changing the bind address may require updating the service configuration and restarting the service before the new listener becomes reachable.
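
The bind address shows up directly in the listener's local address: `127.0.0.1:11434` means loopback-only, while `0.0.0.0:11434` or `*:11434` is host-reachable. A minimal sketch of the check, run here against a sample output line (the live check would capture `ss -ltn | grep ':11434'` instead):

```shell
# Sample `ss -ltn` line for a loopback-only listener (hypothetical output).
# Live check: listener=$(ss -ltn | grep ':11434')
listener='LISTEN 0 4096 127.0.0.1:11434 0.0.0.0:*'

case "$listener" in
  *'127.0.0.1:11434'*) echo "loopback-only: rebind the server to 0.0.0.0" ;;
  *'0.0.0.0:11434'* | *'*:11434'*) echo "host-reachable bind" ;;
  *) echo "no listener on 11434" ;;
esac
```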

### Step 4: Check Request Shape

Managed inference only works for `https://inference.local` and supported inference API paths.

Supported patterns include:

- `POST /v1/chat/completions`
- `POST /v1/completions`
- `POST /v1/responses`
- `POST /v1/messages`
- `GET /v1/models`

Common mistakes:

- **Wrong scheme**: `http://inference.local` instead of `https://inference.local`
- **Unsupported path**: request does not match a known inference API
- **Protocol mismatch**: Anthropic client against an `openai` provider, or vice versa

Fix guidance:

- Use a supported path and provider type
- Point OpenAI-compatible SDKs at `https://inference.local/v1`
- If the SDK requires an API key, pass any non-empty placeholder such as `test`
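
As a sketch of a well-formed request against a supported path (the model name `local-model` and the placeholder key `test` are assumptions; OpenShell injects the real credentials), the payload is built here and the live call is shown commented:

```shell
# Minimal OpenAI-style chat payload matching POST /v1/chat/completions.
payload='{"model":"local-model","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'

# The live call from inside a sandbox would be:
#   curl -sS https://inference.local/v1/chat/completions \
#     -H "Authorization: Bearer test" \
#     -H "Content-Type: application/json" \
#     -d "$payload"
echo "$payload"
```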

### Step 5: Probe from a Sandbox

Run a minimal request from inside a sandbox:

```bash
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

Interpretation:

- **`cluster inference is not configured`**: set the managed route with `openshell inference set`
- **`connection not allowed by policy`** on `inference.local`: unsupported method or path
- **`no compatible route`**: provider type and client API shape do not match
- **Connection refused / upstream unavailable / verification failures**: base URL, bind address, topology, or credentials are wrong

### Step 6: Reapply or Repair the Managed Route

After fixing the provider, repoint `inference.local`:

```bash
openshell inference set --provider <name> --model <id>
```

If the endpoint is intentionally offline and you only want to save the config:

```bash
openshell inference set --provider <name> --model <id> --no-verify
```

Inference updates are hot-reloaded to all sandboxes on the active gateway within about 5 seconds by default.

### Step 7: Diagnose Direct External Inference

If the application calls `api.openai.com`, `api.anthropic.com`, or another external host directly, this is not a managed inference issue.

Check instead:

1. The application is configured to call the external hostname directly
2. A provider with the needed credentials exists
3. The sandbox is launched with that provider attached
4. `network_policies` allow that host, port, and HTTP rules

Use the `generate-sandbox-policy` skill when the user needs help authoring policy YAML.

## Fix: Local Host Inference Timeouts (Firewall)

Use this fix when a sandbox can reach `https://inference.local`, but OpenShell reports an upstream timeout against a host-local backend such as Ollama.

Example symptom:

```json
{"error":"request to http://host.docker.internal:11434/v1/models timed out"}
```

### When This Happens

This failure commonly appears on Linux hosts that:

- Run the OpenShell gateway in Docker
- Route `inference.local` to a host-local OpenAI-compatible endpoint such as Ollama
- Have a host firewall or networking configuration that denies container-to-host traffic by default

In this case, OpenShell routing is usually working correctly. The failing hop is container-to-host traffic on the backend port.

### Why CoreDNS Is Not the Cause

This is not the same issue as the Colima CoreDNS fix.

OpenShell injects `host.docker.internal` and `host.openshell.internal` into sandbox pods with `hostAliases`. That path bypasses cluster DNS lookup. If the request still times out, the usual cause is host firewall or network policy, not CoreDNS.
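
For reference, Kubernetes `hostAliases` entries of roughly this shape write the names straight into the pod's `/etc/hosts`. This is a hypothetical fragment, not OpenShell's actual injected spec; the IP shown is the common Docker bridge gateway:

```yaml
# Hypothetical sandbox pod fragment: these entries resolve the host names
# via /etc/hosts, bypassing CoreDNS entirely.
spec:
  hostAliases:
    - ip: "172.17.0.1"
      hostnames:
        - "host.docker.internal"
        - "host.openshell.internal"
```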

### Verify the Problem

1. Confirm the model server works on the host:

   ```bash
   curl -sS http://127.0.0.1:11434/v1/models
   ```

2. Confirm the host gateway address also works on the host:

   ```bash
   curl -sS http://172.17.0.1:11434/v1/models
   ```

3. Test the same endpoint from the OpenShell cluster container:

   ```bash
   docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
   ```

If steps 1 and 2 succeed but step 3 times out, the host firewall or network configuration is blocking the container-to-host path.
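
The three probes map to a simple decision rule, sketched below with hard-coded placeholder results standing in for the actual curl/wget outcomes:

```shell
# Placeholder probe results; set these from the outcomes of steps 1-3.
host_loopback=ok     # step 1: curl 127.0.0.1:11434 on the host
host_gateway=ok      # step 2: curl 172.17.0.1:11434 on the host
from_container=fail  # step 3: wget via host.docker.internal in the container

if [ "$host_loopback" != ok ]; then
  echo "backend not serving: start the model server or fix its port"
elif [ "$host_gateway" != ok ]; then
  echo "bridge-IP unreachable: rebind the server to 0.0.0.0"
elif [ "$from_container" != ok ]; then
  echo "container-to-host path blocked: check the host firewall"
else
  echo "all probes pass: look beyond container-to-host networking"
fi
```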

### Fix

Allow the Docker bridge network used by the OpenShell cluster to reach the host-local inference port. The exact command depends on your firewall tooling (iptables, nftables, firewalld, UFW, etc.), but the rule should allow:

- **Source**: the Docker bridge subnet used by the OpenShell cluster container (commonly `172.18.0.0/16`)
- **Destination**: the host gateway IP injected into sandbox pods for `host.docker.internal` (commonly `172.17.0.1`)
- **Port**: the inference server port (e.g. `11434/tcp` for Ollama)
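
As one concrete sketch using iptables (the values are the common defaults listed above; verify them on your system first, and apply the rule with root privileges), the rule is composed and printed here rather than applied:

```shell
# Assumed values; replace with the subnet/IP/port found on your system.
bridge_subnet="172.18.0.0/16"
host_gw_ip="172.17.0.1"
port=11434

# Compose the allow rule; apply it as root, e.g. with sudo.
rule="iptables -I INPUT -s $bridge_subnet -d $host_gw_ip -p tcp --dport $port -j ACCEPT"
echo "$rule"
```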

To find the actual values on your system:

```bash
# Docker bridge subnet for the OpenShell cluster network
docker network inspect $(docker network ls --filter name=openshell -q) --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'

# Host gateway IP visible from inside the container
docker exec openshell-cluster-<gateway> cat /etc/hosts | grep host.docker.internal
```

Adjust the source subnet, destination IP, or port to match your local Docker network layout.

### Verify the Fix

1. Re-run the cluster container check:

   ```bash
   docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
   ```

2. Re-test from a sandbox:

   ```bash
   curl -sS https://inference.local/v1/models
   ```

Both commands should return the upstream model list.

### If It Still Fails

- Confirm the backend listens on a host-reachable address: `ss -ltnp | rg ':11434\b'`
- Confirm the provider points at the host alias path you expect: `openshell provider get <provider-name>`
- Confirm the active inference route: `openshell inference get`
- Inspect sandbox logs for upstream timeout details: `openshell logs <sandbox-name> --since 10m`

## Common Failure Patterns

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `openshell inference get` shows `Not configured` | No managed inference route configured | `openshell inference set --provider <name> --model <id>` |
| `failed to verify inference endpoint` | Bad base URL, wrong credentials, wrong provider type, or upstream not reachable | Fix provider config, then rerun `openshell inference set`; use `--no-verify` only when the endpoint is intentionally offline |
| Base URL uses `127.0.0.1` | Loopback points at the wrong runtime | Use `host.openshell.internal` or another gateway-reachable host |
| Local engine works only when gateway is local | Gateway moved to remote host | Run the engine on the gateway host, add a tunnel, or use direct external access |
| `connection not allowed by policy` on `inference.local` | Unsupported path or method | Use a supported inference API path |
| `no compatible route` | Provider type does not match request shape | Switch provider type or change the client API |
| Direct call to external host is denied | Missing policy or provider attachment | Update `network_policies` and launch sandbox with the right provider |
| SDK fails on empty auth token | Client requires a non-empty API key even though OpenShell injects the real one | Use any placeholder token such as `test` |
| Upstream timeout from container to host-local backend | Host firewall or network config blocks container-to-host traffic | Allow the Docker bridge subnet to reach the inference port on the host gateway IP (see firewall fix section above) |

## Full Diagnostic Dump

Run this when you want a compact report before deciding on a fix:

```bash
echo "=== Gateway Status ==="
openshell status

echo "=== Gateway Info ==="
openshell gateway info

echo "=== Managed Inference ==="
openshell inference get

echo "=== Providers ==="
openshell provider list

echo "=== Selected Provider ==="
openshell provider get <provider-name>

echo "=== Sandbox Probe ==="
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

When you report back, state:

1. Which inference path is failing (`inference.local` vs direct external)
2. Whether gateway topology is part of the problem
3. The most likely root cause
4. The exact fix commands the user should run