Commit 1acb73e

chore: remove navigator references from codebase (#208)
1 parent 4fb07c4 commit 1acb73e


63 files changed: +1115, −1184 lines

.agents/skills/debug-navigator-cluster/SKILL.md

Lines changed: 38 additions & 38 deletions
@@ -13,8 +13,8 @@ Diagnose why an openshell cluster failed to start after `openshell gateway start`
 
 1. **Pre-deploy check**: `openshell gateway start` in interactive mode prompts to **reuse** (keep volume, clean stale nodes) or **recreate** (destroy everything, fresh start). `mise run cluster` always recreates before deploy.
 2. Ensure cluster image is available (local build or remote pull)
-3. Create Docker network (`navigator-cluster`) and volume (`navigator-cluster-{name}`)
-4. Create and start a privileged Docker container (`navigator-cluster-{name}`)
+3. Create Docker network (`openshell-cluster`) and volume (`openshell-cluster-{name}`)
+4. Create and start a privileged Docker container (`openshell-cluster-{name}`)
 5. Wait for k3s to generate kubeconfig (up to 60s)
 6. **Clean stale nodes**: Remove any `NotReady` k3s nodes left over from previous container instances that reused the same persistent volume
 7. **Prepare local images** (if `OPENSHELL_PUSH_IMAGES` is set): In `internal` registry mode, bootstrap waits for the in-cluster registry and pushes tagged images there. In `external` mode, bootstrap uses legacy `ctr -n k8s.io images import` push-mode behavior.
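Steps 3-4 in the hunk above can be sketched as plain Docker commands. This is an assumption about what bootstrap effectively does, not its exact invocation (the volume mount path and image placeholder are illustrative); the commands are assembled as strings and echoed so the sketch is safe to run without a Docker daemon:

```shell
NAME="openshell"
NET="openshell-cluster"
VOL="openshell-cluster-${NAME}"
CONTAINER="openshell-cluster-${NAME}"

# Assemble the (assumed) equivalents of bootstrap steps 3-4; echoed
# rather than executed so this sketch does not require Docker.
CREATE_NET="docker network create ${NET}"
CREATE_VOL="docker volume create ${VOL}"
RUN_NODE="docker run -d --privileged --name ${CONTAINER} --network ${NET} -v ${VOL}:/var/lib/rancher/k3s <cluster-image>"

printf '%s\n' "$CREATE_NET" "$CREATE_VOL" "$RUN_NODE"
```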
@@ -35,7 +35,7 @@ The host port is configurable via `--port` on `openshell gateway start` (default
 
 The TCP host is also added as an extra gateway TLS SAN so mTLS hostname validation succeeds.
 
-The default cluster name is `openshell`. The container is `navigator-cluster-{name}`.
+The default cluster name is `openshell`. The container is `openshell-cluster-{name}`.
 
 ## Prerequisites
 
@@ -51,7 +51,7 @@ When the user asks to debug a cluster failure, **run diagnostics automatically**
 
 Before running commands, establish:
 
-1. **Cluster name**: Default is `openshell`, giving container name `navigator-cluster-openshell`
+1. **Cluster name**: Default is `openshell`, giving container name `openshell-cluster-openshell`
 2. **Remote or local**: If the user deployed with `--remote <host>`, all Docker commands must target that host
 3. **Config directory**: `~/.config/openshell/clusters/{name}/`
 
@@ -62,37 +62,37 @@ For remote clusters, prefix Docker commands with SSH:
 ssh <host> docker <command>
 
 # Remote kubectl inside the container
-ssh <host> docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl <command>'
+ssh <host> docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl <command>'
 ```
 
 For local clusters, run Docker commands directly:
 
 ```bash
 docker <command>
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl <command>'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl <command>'
 ```
 
 ### Step 1: Check Docker Container State
 
 First, determine if the container exists and its state:
 
 ```bash
-docker ps -a --filter name=navigator-cluster- --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}'
+docker ps -a --filter name=openshell-cluster- --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Ports}}'
 ```
 
 If the container does not exist:
 
 ```bash
 # Check if the image is available
-docker images 'navigator/cluster*' --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'
+docker images 'openshell/cluster*' --format 'table {{.Repository}}\t{{.Tag}}\t{{.Size}}'
 ```
 
 If the image is missing, re-deploy so bootstrap can pull the published cluster image (or set `OPENSHELL_CLUSTER_IMAGE` explicitly).
 
 If the container exists but is not running, inspect it:
 
 ```bash
-docker inspect navigator-cluster-<name> --format '{{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}'
+docker inspect openshell-cluster-<name> --format '{{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}'
 ```
 
 - **OOMKilled=true**: The host doesn't have enough memory.
@@ -103,7 +103,7 @@ docker inspect navigator-cluster-<name> --format '{{.State.Status}} exit={{.Stat
 Get recent container logs to identify startup failures:
 
 ```bash
-docker logs navigator-cluster-<name> --tail 100
+docker logs openshell-cluster-<name> --tail 100
 ```
 
 Look for:
@@ -119,13 +119,13 @@ Verify k3s itself is functional:
 
 ```bash
 # API server readiness
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get --raw="/readyz"'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get --raw="/readyz"'
 
 # Node status
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get nodes -o wide'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get nodes -o wide'
 
 # All pods
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pods -A -o wide'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pods -A -o wide'
 ```
 
 If `/readyz` fails, k3s is still starting or has crashed. Check container logs (Step 2).
@@ -138,16 +138,16 @@ The OpenShell server is deployed via a HelmChart CR as a StatefulSet with persis
 
 ```bash
 # StatefulSet status
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get statefulset/navigator -o wide'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get statefulset/navigator -o wide'
 
 # OpenShell pod logs
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator logs statefulset/navigator --tail=100'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator logs statefulset/navigator --tail=100'
 
 # Describe statefulset for events
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator describe statefulset/navigator'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator describe statefulset/navigator'
 
 # Helm install job logs (the job that installs the OpenShell chart)
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n kube-system logs -l job-name=helm-install-navigator --tail=200'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n kube-system logs -l job-name=helm-install-navigator --tail=200'
 ```
 
 Common issues:
@@ -162,15 +162,15 @@ The Envoy Gateway provides HTTP/gRPC ingress:
 
 ```bash
 # Gateway status
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get gateway/navigator-gateway'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get gateway/navigator-gateway'
 
 # Check port bindings on the host
-docker port navigator-cluster-<name>
+docker port openshell-cluster-<name>
 ```
 
 Expected ports: `6443/tcp`, `30051/tcp` (mapped to configurable host port, default 8080; set via `--port` on deploy).
 Only one local cluster can run on a Docker host at a time because `6443` is fixed.
-`mise run cluster` handles this by removing conflicting local `navigator-cluster-*` containers first.
+`mise run cluster` handles this by removing conflicting local `openshell-cluster-*` containers first.
 
 If ports are missing or conflicting, another process may be using them. Check with:
 
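The hunk above is cut off before the file's own suggested command. One common way to check whether something already holds the fixed k3s port — an illustrative sketch, not necessarily what the file recommends — is:

```shell
PORT=6443
# List any listeners on the fixed k3s API port; print a fallback message
# when nothing is found so the check always succeeds.
LISTENERS="$(ss -ltn 2>/dev/null | grep ":${PORT} " || true)"
if [ -n "$LISTENERS" ]; then
  echo "port ${PORT} in use:"
  echo "$LISTENERS"
else
  echo "port ${PORT} appears free"
fi
```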
@@ -185,37 +185,37 @@ If using Docker-in-Docker (`DOCKER_HOST=tcp://docker:2375`), verify metadata poi
 
 Component images (server, sandbox, pki-job) can reach kubelet via two paths:
 
-**Local/external pull mode** (default local via `mise run cluster` / `mise run cluster:build`): Local images are tagged to the configured local registry base (default `127.0.0.1:5000/navigator/*`), pushed to that registry, and pulled by k3s via `registries.yaml` mirror endpoint (typically `host.docker.internal:5000`). `cluster:build` builds then pushes images; `cluster` pushes prebuilt local tags (`navigator/*:dev`, falling back to `localhost:5000/navigator/*:dev` or `127.0.0.1:5000/navigator/*:dev`).
+**Local/external pull mode** (default local via `mise run cluster`): Local images are tagged to the configured local registry base (default `127.0.0.1:5000/openshell/*`), pushed to that registry, and pulled by k3s via `registries.yaml` mirror endpoint (typically `host.docker.internal:5000`). The `cluster` task pushes prebuilt local tags (`openshell/*:dev`, falling back to `localhost:5000/openshell/*:dev` or `127.0.0.1:5000/openshell/*:dev`).
 
 ```bash
 # Verify image refs currently used by openshell deployment
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get deploy navigator -o jsonpath="{.spec.template.spec.containers[*].image}"'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get deploy navigator -o jsonpath="{.spec.template.spec.containers[*].image}"'
 
 # Verify registry mirror/auth endpoint configuration
-docker exec navigator-cluster-<name> cat /etc/rancher/k3s/registries.yaml
+docker exec openshell-cluster-<name> cat /etc/rancher/k3s/registries.yaml
 ```
 
-**Legacy push mode** (`mise run cluster:push`): Images are imported into the k3s containerd `k8s.io` namespace.
+**Legacy push mode**: Images are imported into the k3s containerd `k8s.io` namespace.
 
 ```bash
 # Check if images were imported into containerd (k3s default namespace is k8s.io)
-docker exec navigator-cluster-<name> ctr -a /run/k3s/containerd/containerd.sock images ls | grep navigator
+docker exec openshell-cluster-<name> ctr -a /run/k3s/containerd/containerd.sock images ls | grep navigator
 ```
 
 If images are missing, re-import with:
 
 ```bash
-docker save <image-ref> | docker exec -i navigator-cluster-<name> ctr -a /run/k3s/containerd/containerd.sock images import -
+docker save <image-ref> | docker exec -i openshell-cluster-<name> ctr -a /run/k3s/containerd/containerd.sock images import -
 ```
 
 **External pull mode** (remote deploy, or local with `OPENSHELL_REGISTRY_HOST`/`IMAGE_REPO_BASE` pointing at a non-local registry): Images are pulled from an external registry at runtime. The entrypoint generates `/etc/rancher/k3s/registries.yaml`.
 
 ```bash
 # Verify registries.yaml exists and has credentials
-docker exec navigator-cluster-<name> cat /etc/rancher/k3s/registries.yaml
+docker exec openshell-cluster-<name> cat /etc/rancher/k3s/registries.yaml
 
 # Test pulling an image manually from inside the cluster
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml crictl pull ghcr.io/nvidia/nemoclaw/server:latest'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml crictl pull ghcr.io/nvidia/nemoclaw/server:latest'
 ```
 
 If `registries.yaml` is missing or has wrong values, verify env wiring (`OPENSHELL_REGISTRY_HOST`, `OPENSHELL_REGISTRY_INSECURE`, username/password for authenticated registries).
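For orientation, a `registries.yaml` of the general shape the hunk above describes might look like this. This is a sketch based on k3s's documented registry-configuration format; the mirror endpoint, registry host, and credentials are placeholders, not the entrypoint's actual output:

```yaml
mirrors:
  "127.0.0.1:5000":
    endpoint:
      - "http://host.docker.internal:5000"
configs:
  "ghcr.io":
    auth:
      username: "<user>"
      password: "<token>"
```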
@@ -226,10 +226,10 @@ TLS certificates are generated by the `navigator-bootstrap` crate (using `rcgen`
 
 ```bash
 # Check if the three TLS secrets exist
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get secret navigator-server-tls navigator-server-client-ca navigator-client-tls'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get secret navigator-server-tls navigator-server-client-ca navigator-client-tls'
 
 # Inspect server cert expiry (if openssl is available in the container)
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get secret navigator-server-tls -o jsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -dates 2>/dev/null || echo "openssl not available"'
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n navigator get secret navigator-server-tls -o jsonpath="{.data.tls\.crt}" | base64 -d | openssl x509 -noout -dates 2>/dev/null || echo "openssl not available"'
 
 # Check if CLI-side mTLS files exist locally
 ls -la ~/.config/openshell/clusters/<name>/mtls/
@@ -247,7 +247,7 @@ Common mTLS issues:
 Events catch scheduling failures, image pull errors, and resource issues:
 
 ```bash
-docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get events -A --sort-by=.lastTimestamp' | tail -n 50
+docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get events -A --sort-by=.lastTimestamp' | tail -n 50
 ```
 
 Look for:
@@ -264,13 +264,13 @@ DNS misconfiguration is a common root cause, especially on remote/Linux hosts:
 
 ```bash
 # Check the resolv.conf k3s is using
-docker exec navigator-cluster-<name> cat /etc/rancher/k3s/resolv.conf
+docker exec openshell-cluster-<name> cat /etc/rancher/k3s/resolv.conf
 
 # Test DNS resolution from inside the container
-docker exec navigator-cluster-<name> sh -c 'nslookup google.com || wget -q -O /dev/null http://google.com && echo "network ok" || echo "network unreachable"'
+docker exec openshell-cluster-<name> sh -c 'nslookup google.com || wget -q -O /dev/null http://google.com && echo "network ok" || echo "network unreachable"'
 
 # Check the entrypoint's DNS decision (in container logs)
-docker logs navigator-cluster-<name> 2>&1 | head -20
+docker logs openshell-cluster-<name> 2>&1 | head -20
 ```
 
 The entrypoint script selects DNS resolvers in this priority:
@@ -317,15 +317,15 @@ For clusters deployed with `--remote <host>`, all commands must target the remot
 
 ```bash
 ssh <host> docker ps -a
-ssh <host> docker logs navigator-cluster-<name>
-ssh <host> docker exec navigator-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pods -A'
+ssh <host> docker logs openshell-cluster-<name>
+ssh <host> docker exec openshell-cluster-<name> sh -lc 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pods -A'
 ```
 
 **Option B: Docker SSH context**:
 
 ```bash
 docker -H ssh://<host> ps -a
-docker -H ssh://<host> logs navigator-cluster-<name>
+docker -H ssh://<host> logs openshell-cluster-<name>
 ```
 
 **Setting up kubectl access** (requires tunnel):
@@ -344,7 +344,7 @@ Run all diagnostics at once for a comprehensive report:
 ```bash
 HOST="<host>" # leave empty for local, or set to SSH destination
 NAME="openshell" # cluster name
-CONTAINER="navigator-cluster-${NAME}"
+CONTAINER="openshell-cluster-${NAME}"
 KCFG="KUBECONFIG=/etc/rancher/k3s/k3s.yaml"
 
 # Helper: run docker command locally or remotely
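The combined-diagnostics snippet above is cut off at the helper comment. A minimal sketch of such a local-or-remote helper — an assumption about its shape, not the file's actual code — could look like:

```shell
HOST=""                      # empty for local, or set to an SSH destination
NAME="openshell"
CONTAINER="openshell-cluster-${NAME}"
KCFG="KUBECONFIG=/etc/rancher/k3s/k3s.yaml"

# dc: run a docker command locally, or over SSH when HOST is set
dc() {
  if [ -n "$HOST" ]; then
    ssh "$HOST" docker "$@"
  else
    docker "$@"
  fi
}

# Example usage (run manually against a live cluster):
#   dc ps -a --filter "name=${CONTAINER}"
#   dc logs "$CONTAINER" --tail 50
#   dc exec "$CONTAINER" sh -lc "$KCFG kubectl get nodes -o wide"
```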

.agents/skills/nemoclaw-cli/SKILL.md

Lines changed: 9 additions & 9 deletions
@@ -436,32 +436,32 @@ openshell sandbox delete work-session
 
 ---
 
-## Workflow 7: Cluster Inference
+## Workflow 7: Gateway Inference
 
-Configure the cluster's managed inference route for `inference.local`.
+Configure the gateway's managed inference route for `inference.local`.
 
-### Set cluster inference
+### Set gateway inference
 
 First ensure the provider record exists:
 
 ```bash
 openshell provider list
 ```
 
-Then point cluster inference at that provider and model:
+Then point gateway inference at that provider and model:
 
 ```bash
-openshell cluster inference set \
+openshell inference set \
   --provider nvidia \
   --model nvidia/nemotron-3-nano-30b-a3b
 ```
 
-This updates the cluster-managed `inference.local` route. There is no per-route create/list/update/delete workflow for sandbox inference.
+This updates the gateway-managed `inference.local` route. There is no per-route create/list/update/delete workflow for sandbox inference.
 
 ### Inspect current inference config
 
 ```bash
-openshell cluster inference get
+openshell inference get
 ```
 
 ### How sandboxes use it
@@ -549,8 +549,8 @@ $ openshell sandbox upload --help
 | Download files from sandbox | `openshell sandbox download <name> <path>` |
 | Create provider | `openshell provider create --name N --type T --from-existing` |
 | List providers | `openshell provider list` |
-| Configure cluster inference | `openshell cluster inference set --provider P --model M` |
-| View cluster inference | `openshell cluster inference get` |
+| Configure gateway inference | `openshell inference set --provider P --model M` |
+| View gateway inference | `openshell inference get` |
 | Delete sandbox | `openshell sandbox delete <name>` |
 | Destroy cluster | `openshell gateway destroy` |
 | Self-teach any command | `openshell <group> <cmd> --help` |

.agents/skills/nemoclaw-cli/cli-reference.md

Lines changed: 10 additions & 10 deletions
@@ -9,13 +9,13 @@ Quick-reference for the `openshell` command-line interface. For workflow guidanc
 | Flag | Description |
 |------|-------------|
 | `-v`, `--verbose` | Increase verbosity (`-v` = info, `-vv` = debug, `-vvv` = trace) |
-| `-g`, `--gateway <NAME>` | Gateway to operate on. Also settable via `OPENSHELL_CLUSTER` env var. Falls back to active gateway in `~/.config/openshell/active_cluster`. |
+| `-g`, `--gateway <NAME>` | Gateway to operate on. Also settable via `OPENSHELL_GATEWAY` env var. Falls back to active gateway in `~/.config/openshell/active_gateway`. |
 
 ## Environment Variables
 
 | Variable | Description |
 |----------|-------------|
-| `OPENSHELL_CLUSTER` | Override active gateway name (same as `--gateway`) |
+| `OPENSHELL_GATEWAY` | Override active gateway name (same as `--gateway`) |
 | `OPENSHELL_SANDBOX_POLICY` | Path to default sandbox policy YAML (fallback when `--policy` is not provided) |
 
 ---
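The flag → env var → active-gateway-file fallback described in the hunk above follows a common precedence pattern. A minimal sketch of that resolution order — an assumption for illustration, not the CLI's actual implementation:

```shell
# Hypothetical sketch of gateway-name resolution:
# 1. --gateway flag value, 2. OPENSHELL_GATEWAY env var,
# 3. the active_gateway file written by `openshell gateway select`.
resolve_gateway() {
  flag="$1"
  if [ -n "$flag" ]; then
    echo "$flag"
  elif [ -n "$OPENSHELL_GATEWAY" ]; then
    echo "$OPENSHELL_GATEWAY"
  elif [ -f "$HOME/.config/openshell/active_gateway" ]; then
    cat "$HOME/.config/openshell/active_gateway"
  fi
}
```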
@@ -122,7 +122,7 @@ Print or start an SSH tunnel for kubectl access to a remote cluster.
 
 ### `openshell gateway select [name]`
 
-Set the active gateway. Writes to `~/.config/openshell/active_cluster`. When called without arguments, lists all provisioned gateways with the active one marked with `*`.
+Set the active gateway. Writes to `~/.config/openshell/active_gateway`. When called without arguments, lists all provisioned gateways with the active one marked with `*`.
 
 ---
 
@@ -319,29 +319,29 @@ Delete one or more providers by name.
 
 ---
 
-## Cluster Inference Commands
+## Inference Commands
 
-### `openshell cluster inference set`
+### `openshell inference set`
 
-Configure the managed cluster inference route used by `inference.local`. Both flags are required.
+Configure the managed gateway inference route used by `inference.local`. Both flags are required.
 
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--provider <NAME>` | -- | Provider record name (required) |
 | `--model <ID>` | -- | Model identifier to use for generation requests (required) |
 
-### `openshell cluster inference update`
+### `openshell inference update`
 
-Partially update the cluster inference configuration. Fetches the current config and applies only the provided overrides. At least one flag is required.
+Partially update the gateway inference configuration. Fetches the current config and applies only the provided overrides. At least one flag is required.
 
 | Flag | Default | Description |
 |------|---------|-------------|
 | `--provider <NAME>` | unchanged | Provider record name |
 | `--model <ID>` | unchanged | Model identifier |
 
-### `openshell cluster inference get`
+### `openshell inference get`
 
-Show the current cluster inference configuration.
+Show the current gateway inference configuration.
 
 ---
 