Problem Statement
NemoClaw has several implemented features — gateway auto-recovery, onboard resume after hard kill, stale state cleanup, sandbox process recovery, inference error classification, and sandbox data preservation on reuse — that have no E2E test proving they work in a real environment.
Users rely on these paths during the most stressful moments: their bot stopped responding, their laptop died mid-setup, or they are trying to recover a broken state. If a regression lands in any of these code paths, there is no automated signal — it only surfaces when a real user hits it.
TC-01 — connect/status auto-restarts dead gateway
- Real-life situation: A user's Telegram/Slack bot stops responding. They open a terminal and run nemoclaw status or nemoclaw connect to diagnose. The CLI is expected to detect that the gateway is down, restart it automatically, and return a healthy status, all without manual intervention.
- Why we need this: The status path has partial coverage; the connect path has none. Without this test, a regression in the auto-restart logic would silently break the most common recovery workflow, leaving users with a dead agent and no path forward.
TC-02 — SIGKILL mid-onboard → stale lock → resume
- Real-life situation: A user is setting up NemoClaw for the first time. Their laptop battery dies or they accidentally close the terminal halfway through onboarding. The next time they run nemoclaw onboard, it is expected to detect the stale lock file from the dead process, clean it up, and resume from the last completed step, with no data loss and no manual cleanup needed.
- Why we need this: The resume feature is only tested with a clean controlled failure. A real hard kill leaves a different kind of stale state. Without this test, users who hit this scenario may get stuck with a corrupted lock file and no clear path to recover.
TC-04 — Stale gateway cleanup in preflight
- Real-life situation: Docker updated overnight and wiped the gateway container. The user opens their laptop, sends a Telegram message, and gets no reply. They run nemoclaw status, which triggers auto-recovery. If auto-recovery succeeds, they are back to normal with no further action needed. If it fails (container and volumes too corrupted for openshell gateway start to recover), the CLI prints guidance directing them to run nemoclaw onboard as a last resort. Only at that point does the user re-run onboard, not as a first instinct but as a deliberate recovery step after auto-recovery could not fix things.
- Why we need this: TC-04 is the severity-2 companion to TC-01. TC-01 covers a stopped gateway (container exists but not running — auto-recovery usually works). TC-04 covers a deleted gateway container (no container, stale metadata — auto-recovery fails, onboard is the fallback). The preflight cleanup logic that detects the ghost state and resets it cleanly has no test. Without coverage, a regression here would leave users stranded at the exact moment they are following the CLI's own guidance to recover.
TC-05 — Sandbox process recovery on connect/status
- Real-life situation: The user's machine had a memory spike or high load. The Kubernetes pod inside the sandbox restarted and the OpenClaw gateway process inside it died. When the user runs nemoclaw connect to resume work, the CLI is expected to detect the dead process, restart it automatically, and drop the user into a working shell.
- Why we need this: This is newly written code with zero E2E coverage. New recovery paths with no tests are the most likely place for regressions to appear undetected, especially on a flow users depend on to resume work after an unexpected interruption.
TC-06 — Inference validation error → classified error message
- Real-life situation: A user pastes a wrong API key during onboard, or they are on a corporate network where the NVIDIA endpoint is blocked. The wizard is expected to detect the failure, classify it (wrong key vs. unreachable endpoint vs. quota exceeded), and show a clear actionable error message — not a raw stack trace.
- Why we need this: Without classified error handling, users cannot tell whether the problem is their key, their network, or a NemoClaw bug. This test also catches the silent ANTHROPIC_API_KEY env var override that currently reroutes traffic to Anthropic without any warning.
TC-07 — Double onboard "reuse" preserves sandbox data
- Real-life situation: A user re-runs nemoclaw onboard to change their inference provider or rotate their API key. The wizard asks "Reuse existing sandbox?" They confirm yes. Their agent's memory, workspace files, and scheduled tasks are expected to remain completely untouched; only the inference config changes.
- Why we need this: If the reuse path silently wipes sandbox data, the user loses all accumulated agent context — conversations, saved files, tasks — without any warning. No test currently verifies that data actually survives this path, making it an invisible regression risk.
Proposed Design
Add six E2E test scripts under test/e2e/, following the existing baseline → disrupt → verify
skeleton from test-sandbox-survival.sh. No new test infrastructure required.
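The baseline → disrupt → verify skeleton can be sketched as a small phase runner. This is only an illustration: the function names (run_phases, baseline, disrupt, verify) are hypothetical and are not taken from test-sandbox-survival.sh, and the placeholder phases just print what a real test would do with the nemoclaw CLI.

```shell
#!/bin/sh
# Hypothetical skeleton: run the named phase functions in order and
# stop at the first one that fails.
run_phases() {
  for phase in "$@"; do
    echo "== phase: $phase"
    if ! "$phase"; then
      echo "FAIL: $phase" >&2
      return 1
    fi
  done
  echo "PASS"
}

# Placeholder phases; a real test would call the nemoclaw CLI here.
baseline() { echo "onboard and verify inference"; }
disrupt()  { echo "break something on purpose"; }
verify()   { echo "check that recovery happened"; }

run_phases baseline disrupt verify
```

Keeping each phase in its own function makes the CI log show which phase failed, which matters when the disruption itself (rather than the recovery) is what broke.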
─────────────────────────────────────────────────────────────────
TC-01 — connect/status auto-restarts dead gateway
File: test/e2e/test-gateway-auto-restart.sh
Steps:
- Onboard normally. Verify inference works.
- Kill the gateway process directly from the host:
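kill -9 $(pgrep -f openshell-gateway)
(Container still exists, process is dead; this simulates a crashed gateway. The -f flag matches against the full command line, because plain pgrep compares only the process name, which Linux truncates to 15 characters.)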
- Run nemoclaw <name> status.
- Run nemoclaw <name> connect.
Pass criteria:
- Both commands detect the dead gateway and restart it automatically.
- No manual intervention required.
- nemoclaw <name> status returns healthy after recovery.
- nemoclaw <name> connect drops into a working shell.
- Inference returns a response after recovery.
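Because a restart is not instantaneous, the "returns healthy after recovery" check probably needs to poll rather than assert once. A minimal sketch of such a polling helper follows. The name wait_until is hypothetical, and the commented usage assumes that nemoclaw <name> status exits with 0 when the gateway is healthy, which this issue does not confirm.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds or the
# timeout elapses. Returns 0 on success, 1 on timeout.
wait_until() {
  wu_timeout=$1
  shift
  wu_waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_waited" -ge "$wu_timeout" ]; then
      return 1
    fi
    sleep 1
    wu_waited=$((wu_waited + 1))
  done
  return 0
}

# Hypothetical usage (SANDBOX_NAME is an assumed variable):
#   wait_until 60 nemoclaw "$SANDBOX_NAME" status \
#     || { echo "gateway did not recover" >&2; exit 1; }
```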
─────────────────────────────────────────────────────────────────
TC-02 — SIGKILL mid-onboard → stale lock → resume
File: test/e2e/test-onboard-sigkill-resume.sh
Steps:
- Start nemoclaw onboard --non-interactive in the background:
nemoclaw onboard --non-interactive &
- Watch logs and wait until the gateway setup step completes
(look for "gateway select nemoclaw" in output).
- Hard-kill the onboard process:
kill -9
- Run nemoclaw onboard --non-interactive again and let the auto-detection handle it.
Pass criteria:
- Output contains "Found an interrupted onboarding session — resuming it."
- Output shows the gateway step was skipped (not re-executed).
- Exit code is 0.
- No orphaned Docker containers
(docker ps -a shows no dangling openclaw-* containers).
- ~/.nemoclaw/onboard-session.json is valid JSON with status: "complete".
- Inference returns a response after resume completes.
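The "wait for the marker line, then hard-kill" part of the steps above can be sketched as follows. This is an illustration under stated assumptions: wait_for_line is a hypothetical helper, and a background subshell that prints the marker and then sleeps stands in for nemoclaw onboard --non-interactive &, so the sketch runs without NemoClaw installed. Capturing $! right after backgrounding is one way to get the PID to kill.

```shell
#!/bin/sh
# wait_for_line FILE TEXT TIMEOUT_SECONDS
# Polls FILE until it contains the fixed string TEXT, or gives up
# after roughly TIMEOUT_SECONDS.
wait_for_line() {
  wl_file=$1 wl_text=$2 wl_timeout=$3 wl_waited=0
  until grep -qF -- "$wl_text" "$wl_file" 2>/dev/null; do
    [ "$wl_waited" -ge "$wl_timeout" ] && return 1
    sleep 1
    wl_waited=$((wl_waited + 1))
  done
  return 0
}

# Stand-in for the real onboard run: logs the marker line, then keeps
# running until it is killed.
log=$(mktemp)
( echo "gateway select nemoclaw"; exec sleep 30 ) >"$log" 2>&1 &
onboard_pid=$!

wait_for_line "$log" "gateway select nemoclaw" 10 \
  || { echo "marker never appeared" >&2; exit 1; }
kill -9 "$onboard_pid"            # hard kill: no cleanup handlers get to run
wait "$onboard_pid" 2>/dev/null   # reap the job; status is 128 + 9 for SIGKILL
echo "exit status after SIGKILL: $?"
rm -f "$log"
```

Killing only after the marker appears keeps the test deterministic: the lock file and the completed gateway step both exist at the moment of the kill, which is the state the resume logic has to handle.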
─────────────────────────────────────────────────────────────────
TC-04 — Stale gateway cleanup in preflight
File: test/e2e/test-stale-gateway-cleanup.sh
This test simulates the severity-2 disruption: the gateway container was deleted entirely
(not just stopped), leaving stale metadata behind. It follows the real user sequence —
status first, onboard only as a last resort after auto-recovery fails.
Steps:
1. Onboard normally. Verify inference works.
2. Force-delete the gateway container from the host:
docker rm -f openshell-cluster-nemoclaw
(Leave ~/.nemoclaw metadata intact; this creates the ghost state.)
3. Run nemoclaw <name> status.
   - If status succeeds (auto-recovery handled it) → log as a TC-01 variant, skip to done.
   - If status fails with guidance to re-run onboard → proceed to step 4.
4. Run nemoclaw onboard, following the CLI's own recovery guidance.
5. Verify preflight cleaned up the stale state and onboard completes.
Pass criteria:
- No manual cleanup required by the user at any step.
- If auto-recovery in step 3 fails, onboard in step 4 completes cleanly
with no conflicting state errors or duplicate container entries.
- Inference returns a response after recovery.
Relationship to TC-01:
TC-01 covers a stopped gateway (container exists, not running — auto-recovery path).
TC-04 covers a deleted gateway container (no container, stale metadata — onboard fallback).
Kept as separate tests to preserve distinct CI failure signals for each code path.
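The branch after the status call (recovered, fall back to onboard, or fail the test) can be sketched as a small classifier. The helper name classify_status is hypothetical, and the exact wording of the CLI's guidance is not given in this issue; the sketch assumes only that the guidance mentions "nemoclaw onboard".

```shell
#!/bin/sh
# classify_status EXIT_CODE OUTPUT
# Prints one of: recovered | needs-onboard | unexpected
classify_status() {
  cs_code=$1 cs_output=$2
  if [ "$cs_code" -eq 0 ]; then
    echo "recovered"        # auto-recovery worked: treat as a TC-01 variant
    return 0
  fi
  case $cs_output in
    *"nemoclaw onboard"*) echo "needs-onboard" ;;  # CLI pointed at the fallback
    *)                    echo "unexpected" ;;     # failed without guidance
  esac
}

# Hypothetical usage (SANDBOX_NAME is an assumed variable):
#   output=$(nemoclaw "$SANDBOX_NAME" status 2>&1); code=$?
#   case $(classify_status "$code" "$output") in
#     recovered)     echo "TC-01 variant, done" ;;
#     needs-onboard) nemoclaw onboard --non-interactive ;;
#     *)             exit 1 ;;
#   esac
```

Treating "failed without guidance" as its own outcome keeps the test from silently passing when the CLI fails in a way that leaves the user with no next step.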
─────────────────────────────────────────────────────────────────
TC-05 — Sandbox process recovery on connect/status
File: test/e2e/test-sandbox-process-recovery.sh
Steps:
- Onboard normally. Verify inference works.
- Connect into the sandbox and kill the OpenClaw gateway process inside the pod:
nemoclaw <name> connect
kill -9 $(pgrep -f "openclaw gateway")
exit
- From the host, run nemoclaw <name> status.
- From the host, run nemoclaw <name> connect.
Pass criteria:
- CLI detects the dead process inside the pod.
- Process is restarted automatically — no manual intervention.
- nemoclaw <name> status returns healthy.
- nemoclaw <name> connect drops into a working shell.
- Inference returns a response after recovery.
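One way to show that the process was actually restarted, rather than never killed, is to compare its PID before the kill with its PID after recovery. The helper below is a hypothetical sketch; how the test would read the PID from inside the sandbox non-interactively is not specified in this issue and is left open here.

```shell
#!/bin/sh
# was_restarted OLD_PID NEW_PID
# Succeeds only if NEW_PID is non-empty and differs from OLD_PID,
# i.e. a process exists again and it is not the one that was killed.
was_restarted() {
  [ -n "$2" ] && [ "$1" != "$2" ]
}

# Hypothetical usage, with both PIDs captured inside the sandbox via
# pgrep -f "openclaw gateway" before the kill and after recovery:
#   was_restarted "$pid_before" "$pid_after" \
#     || { echo "gateway process was not restarted" >&2; exit 1; }
```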
─────────────────────────────────────────────────────────────────
TC-06 — Inference validation error → classified error message
File: test/e2e/test-inference-error-classification.sh
Runner: PR-safe (no real API key needed)
Steps:
- Run nemoclaw onboard in non-interactive mode with an intentionally invalid API key:
NVIDIA_API_KEY=invalid-key-for-testing nemoclaw onboard --non-interactive
- Capture stdout and stderr.
- Repeat with the NVIDIA endpoint replaced by an unreachable URL
to simulate a blocked corporate network.
Pass criteria:
- Exit code is non-zero in both cases.
- Output contains a human-readable classified error message
(e.g. "Invalid API key" or "Endpoint unreachable") — not a raw stack trace.
- If ANTHROPIC_API_KEY is set in the shell environment, output contains a warning
that it was detected — traffic is not silently rerouted to Anthropic.
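The first two pass criteria can be checked with a helper along these lines. It is a sketch under assumptions: check_classified_error is a hypothetical name, the two message strings are only the examples quoted above and may not match the wizard's real wording, and the stack-trace detection is a heuristic (Node-style "at ..." frames or a Python traceback header).

```shell
#!/bin/sh
# check_classified_error EXIT_CODE OUTPUT
# Succeeds when the run failed, the output names a known error class,
# and no stack-trace-looking lines are present.
check_classified_error() {
  ce_code=$1 ce_output=$2
  [ "$ce_code" -ne 0 ] || { echo "expected non-zero exit" >&2; return 1; }
  printf '%s\n' "$ce_output" \
    | grep -qE 'Invalid API key|Endpoint unreachable' \
    || { echo "no classified message found" >&2; return 1; }
  # Heuristic: indented "at fn (file:line)" frames or "Traceback ..." header.
  if printf '%s\n' "$ce_output" | grep -qE '^[[:space:]]+at |^Traceback '; then
    echo "raw stack trace in output" >&2
    return 1
  fi
  return 0
}

# Hypothetical usage:
#   output=$(NVIDIA_API_KEY=invalid-key-for-testing \
#            nemoclaw onboard --non-interactive 2>&1); code=$?
#   check_classified_error "$code" "$output" || exit 1
```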
─────────────────────────────────────────────────────────────────
TC-07 — Double onboard "reuse" preserves sandbox data
File: test/e2e/test-double-onboard-reuse.sh
Steps:
- Onboard normally.
- Write a marker file inside the sandbox:
nemoclaw <name> connect
echo "marker" > /sandbox/marker.txt
exit
- Re-run nemoclaw onboard. When prompted "Reuse existing sandbox?", select yes.
Change only the inference provider (e.g. switch from NVIDIA Cloud to Ollama).
- Connect and verify:
nemoclaw <name> connect
cat /sandbox/marker.txt
Pass criteria:
- marker.txt is present with its original content.
- Agent memory directory (~/.openclaw/memory/) is untouched.
- Only the inference config has changed (openshell inference show reflects new provider).
- Inference returns a response from the new provider.
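"Untouched" is easier to assert on checksums than on file presence alone. The sketch below snapshots a directory before and after and compares the results. snapshot_dir is a hypothetical helper, and a temporary directory stands in for /sandbox or ~/.openclaw/memory/; in the real test the snapshot would have to be taken inside the sandbox.

```shell
#!/bin/sh
# snapshot_dir DIR
# Prints one "checksum size path" line per regular file under DIR,
# sorted by path, so two snapshots can be compared as plain strings.
snapshot_dir() {
  ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# Stand-in directory; the real test would snapshot sandbox paths.
dir=$(mktemp -d)
echo "marker" >"$dir/marker.txt"
before=$(snapshot_dir "$dir")
# ... the second onboard run with "reuse" would happen here ...
after=$(snapshot_dir "$dir")
if [ "$before" = "$after" ]; then
  echo "data preserved"
else
  echo "data changed"
fi
rm -rf "$dir"
```

Comparing whole snapshots catches deleted, added, and modified files in one check, which covers more than the single marker file.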
─────────────────────────────────────────────────────────────────
Suggested CI workflow: .github/workflows/e2e-existing-features.yml
- TC-06 runs on every PR (no secrets needed).
- TC-01, TC-02, TC-04, TC-05, TC-07 run nightly with real API keys.
- Failures auto-create a GitHub issue labeled bug + CI/CD,
consistent with the existing nightly failure convention.
Alternatives Considered
No response
Category
enhancement: feature