test(e2e): add sandbox operations recovery and multi-sandbox isolatio…#1865
test(e2e): add sandbox operations recovery and multi-sandbox isolatio…#1865ericksoa merged 2 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new Bash end-to-end test script Changes
Sequence Diagram(s)sequenceDiagram
participant CI as CI Runner
participant Script as test-sbx.sh
participant Nemoclaw as Nemoclaw CLI
participant Docker as Docker Engine
participant Sandbox as Sandbox (VM/Container)
participant Gateway as OpenClaw Gateway
CI->>Script: start (with timeout)
Script->>Nemoclaw: preflight checks (cli present, creds)
Script->>Docker: verify Docker responsive
Script->>Nemoclaw: destroy stale sandboxes
alt onboarding needed
Script->>Nemoclaw: onboard sandbox A (--non-interactive)
Nemoclaw->>Sandbox: provision sandbox A
end
Script->>Sandbox: run SSH one-shot command (check token)
Script->>Nemoclaw: `nemoclaw <sandbox> status` (validate fields)
Script->>Sandbox: stream logs (--follow), then stop follower
Script->>Sandbox: kill gateway process (in-sandbox)
Sandbox->>Gateway: gateway restarts/recovers
Script->>Nemoclaw: verify recovery via status and SSH
Script->>Nemoclaw: onboard sandbox B
Script->>Sandbox: SSH probe HTTP endpoints (network isolation)
Script->>Nemoclaw: destroy sandbox B
Script->>Nemoclaw: final cleanup (destroy A, remove locks)
Script->>CI: summary (PASS/FAIL/SKIP/TOTAL)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
test/e2e/test-sandbox-operations.sh (4)
391-399: Document the purpose of port 18789.The hardcoded port
18789is used without explanation. Consider adding a comment explaining what service is expected on this port to improve maintainability.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 391 - 399, The probe using sandbox_exec_for to curl node's http.get targets hardcoded port 18789 (variable names: probe_a, SANDBOX_B) without context; add a short inline comment above this block explaining what service listens on port 18789 (e.g., "internal test HTTP probe for sandbox service X"), and preferably replace the literal with a clearly named shell variable (e.g., SERVICE_PORT or SANDBOX_HTTP_PORT) defined near the other SANDBOX_* variables so future readers know the intent and can update the port in one place.
60-72: Consider handlingssh-configfailures explicitly.If
openshell sandbox ssh-configfails (non-zero exit), the script may exit due toset -e. Adding|| trueor checking the exit status would make the function more robust.Proposed fix
sandbox_exec_for() { local name="$1" cmd="$2" local ssh_cfg ssh_cfg="$(mktemp)" - openshell sandbox ssh-config "$name" >"$ssh_cfg" 2>/dev/null + if ! openshell sandbox ssh-config "$name" >"$ssh_cfg" 2>/dev/null; then + rm -f "$ssh_cfg" + echo "ERROR: Could not get SSH config for $name" + return + fi local result🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 60 - 72, The sandbox_exec_for function currently assumes openshell sandbox ssh-config succeeds; capture and check its exit status after running openshell sandbox ssh-config "$name" into ssh_cfg (or run it with || true and then test if the file is non-empty) and if it failed, remove ssh_cfg and return a non-zero result or informative error string instead of proceeding to ssh; update logic around ssh_cfg creation in sandbox_exec_for to explicitly handle failure (clean up tmp file, log or echo an error, and avoid calling ssh with an invalid config).
241-243: Log message precedes actual deletion.The log states "Backed up and deleted" but deletion occurs on the next line. Consider reordering or splitting the message.
Proposed fix
cp "$registry" "${registry}.bak" - log " Backed up and deleted sandboxes.json" + log " Backed up sandboxes.json" rm -f "$registry" + log " Deleted sandboxes.json"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 241 - 243, The log message currently says "Backed up and deleted sandboxes.json" before the deletion actually happens; update the sequence so the operation and message match: either perform rm -f "$registry" immediately after cp "$registry" "${registry}.bak" and then call log " Backed up and deleted sandboxes.json", or change the log call to " Backed up sandboxes.json" before removal and add a second log after rm -f "$registry" (e.g., log " Deleted sandboxes.json"); modify the lines using the "$registry" variable and the log function to keep messages consistent with cp and rm commands.
494-503: Teardown runs twice: explicitly inmain()and viatrap EXIT.The
teardownfunction is called at line 497 and again whentrap teardown EXITfires on script exit (line 503). Whileteardownis idempotent (sandboxes are already destroyed), this results in redundantnemoclaw listcalls. Consider removing the explicit call or unsetting the trap beforesummary.Option A: Remove explicit teardown call
# Phase 5: Cleanup verification (destroys sandbox B) test_sbx_05_destroy_cleanup "$SANDBOX_B" - # Teardown everything - teardown - # Report summary }Option B: Clear trap before summary
# Teardown everything teardown + trap - EXIT # Report summary🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 494 - 503, The script calls teardown twice: once explicitly in main() after test_sbx_05_destroy_cleanup and again via trap teardown EXIT; remove the redundancy by either deleting the explicit teardown call in main() (leave trap teardown EXIT to run cleanup on exit) or by unsetting the trap (e.g., using trap - EXIT) before calling summary so teardown only runs once; update references around the teardown invocation in main(), the trap teardown EXIT declaration, and the summary call to implement the chosen option.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 46-50: The increment helpers (pass, fail, skip) use post-increment
arithmetic ((PASS++), ((FAIL++), ((SKIP++), ((TOTAL++))) which can return 0 and
cause the script to exit under set -e; update those increments in the functions
pass, fail, skip (and any other helper using ((VAR++))) to use
addition-assignment ((PASS+=1), ((FAIL+=1), ((SKIP+=1), ((TOTAL+=1))) so the
expression always evaluates truthy and the script won’t abort on the first
increment.
- Around line 1-17: Add an SPDX license identifier immediately after the
existing shebang line (#!/usr/bin/env bash) so the script includes a one-line
SPDX header (e.g., # SPDX-License-Identifier: <Your-License-Identifier>),
ensuring the identifier matches the project's license policy and appears before
any other comments or code in test-sandbox-operations.sh.
- Around line 175-197: Add the missing SPDX header to the top of
test/e2e/test-sandbox-operations.sh (add the two lines shown in the review) and
update the TC-SBX-03 test (function test_sbx_03_status_fields) to match the
actual nemoclaw status output: change the expected field "Sandbox" to
"Sandboxes:" (or the exact label emitted by showStatusCommand), remove or
replace "Provider" and "OpenClaw" with the actual service/status labels that
nemoclaw prints (or assert against model info and service names produced by
showStatusCommand), and keep the rest of the loop/validation logic unchanged so
the test reflects real output from the nemoclaw "<name> status" command.
---
Nitpick comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 391-399: The probe using sandbox_exec_for to curl node's http.get
targets hardcoded port 18789 (variable names: probe_a, SANDBOX_B) without
context; add a short inline comment above this block explaining what service
listens on port 18789 (e.g., "internal test HTTP probe for sandbox service X"),
and preferably replace the literal with a clearly named shell variable (e.g.,
SERVICE_PORT or SANDBOX_HTTP_PORT) defined near the other SANDBOX_* variables so
future readers know the intent and can update the port in one place.
- Around line 60-72: The sandbox_exec_for function currently assumes openshell
sandbox ssh-config succeeds; capture and check its exit status after running
openshell sandbox ssh-config "$name" into ssh_cfg (or run it with || true and
then test if the file is non-empty) and if it failed, remove ssh_cfg and return
a non-zero result or informative error string instead of proceeding to ssh;
update logic around ssh_cfg creation in sandbox_exec_for to explicitly handle
failure (clean up tmp file, log or echo an error, and avoid calling ssh with an
invalid config).
- Around line 241-243: The log message currently says "Backed up and deleted
sandboxes.json" before the deletion actually happens; update the sequence so the
operation and message match: either perform rm -f "$registry" immediately after
cp "$registry" "${registry}.bak" and then call log " Backed up and deleted
sandboxes.json", or change the log call to " Backed up sandboxes.json" before
removal and add a second log after rm -f "$registry" (e.g., log " Deleted
sandboxes.json"); modify the lines using the "$registry" variable and the log
function to keep messages consistent with cp and rm commands.
- Around line 494-503: The script calls teardown twice: once explicitly in
main() after test_sbx_05_destroy_cleanup and again via trap teardown EXIT;
remove the redundancy by either deleting the explicit teardown call in main()
(leave trap teardown EXIT to run cleanup on exit) or by unsetting the trap
(e.g., using trap - EXIT) before calling summary so teardown only runs once;
update references around the teardown invocation in main(), the trap teardown
EXIT declaration, and the summary call to implement the chosen option.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 8053da33-7bce-4dfa-8547-38c89f76b172
📒 Files selected for processing (1)
test/e2e/test-sandbox-operations.sh
2619f28 to
3c52774
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e/test-sandbox-operations.sh (1)
62-64:⚠️ Potential issue | 🔴 Critical
set -e+((VAR++))will exit on the first PASS/FAIL/SKIP update.Lines 62-64 use post-increment arithmetic in command context. With
set -e, the first increment evaluates to0and returns exit code1, which can terminate the script immediately.#!/usr/bin/env bash set -euo pipefail PASS=0 ((PASS++)) echo "UNREACHABLE: this should not print"Proposed fix
pass() { ((PASS++)); ((TOTAL++)); echo -e "${GREEN} PASS${NC} $1" | tee -a "$LOG_FILE"; } fail() { ((FAIL++)); ((TOTAL++)); echo -e "${RED} FAIL${NC} $1 — $2" | tee -a "$LOG_FILE"; } skip() { ((SKIP++)); ((TOTAL++)); echo -e "${YELLOW} SKIP${NC} $1 — $2" | tee -a "$LOG_FILE"; } +pass() { ((PASS+=1)); ((TOTAL+=1)); echo -e "${GREEN} PASS${NC} $1" | tee -a "$LOG_FILE"; } +fail() { ((FAIL+=1)); ((TOTAL+=1)); echo -e "${RED} FAIL${NC} $1 — $2" | tee -a "$LOG_FILE"; } +skip() { ((SKIP+=1)); ((TOTAL+=1)); echo -e "${YELLOW} SKIP${NC} $1 — $2" | tee -a "$LOG_FILE"; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 62 - 64, The post-increment arithmetic inside the pass(), fail(), and skip() functions (symbols: pass, fail, skip, and variables PASS, FAIL, SKIP, TOTAL) can trigger a non-zero exit under set -e; replace the in-command ((VAR++)) usage with safe arithmetic assignments such as PASS=$((PASS+1)) and TOTAL=$((TOTAL+1)) (and respectively for FAIL and SKIP) so increments never produce a failing exit code, keeping the existing echo/tee behavior unchanged.
🧹 Nitpick comments (1)
test/e2e/test-sandbox-operations.sh (1)
559-565: Teardown runs twice (explicit call + EXIT trap).Line 559 calls
teardown, and Line 565 also registerstrap teardown EXIT. This is safe but redundant; keeping only one path will reduce duplicate cleanup logs and extra CLI calls.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 559 - 565, The script calls teardown explicitly and also registers trap teardown EXIT, causing duplicate cleanup; remove the explicit teardown invocation and rely on the EXIT trap (leave the trap teardown EXIT and the teardown function intact) so cleanup runs only once. Reference: the teardown function and the trap teardown EXIT registration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 345-347: The destroy pipeline can cause the script to exit under
set -euo pipefail and skip test result recording; wrap the nemoclaw "$target"
destroy --yes | tee -a "$LOG_FILE" invocation in a temporary errexit-safe block:
disable exiting on failure (or run in a subshell), capture the command's exit
status (use PIPESTATUS[0] for the left side of the pipe) into a variable, then
re-enable strict mode and call fail() when that exit status is non-zero or
pass() when zero; reference the nemoclaw command, the LOG_FILE, and the
fail()/pass() result handlers when making the change.
---
Duplicate comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 62-64: The post-increment arithmetic inside the pass(), fail(),
and skip() functions (symbols: pass, fail, skip, and variables PASS, FAIL, SKIP,
TOTAL) can trigger a non-zero exit under set -e; replace the in-command
((VAR++)) usage with safe arithmetic assignments such as PASS=$((PASS+1)) and
TOTAL=$((TOTAL+1)) (and respectively for FAIL and SKIP) so increments never
produce a failing exit code, keeping the existing echo/tee behavior unchanged.
---
Nitpick comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 559-565: The script calls teardown explicitly and also registers
trap teardown EXIT, causing duplicate cleanup; remove the explicit teardown
invocation and rely on the EXIT trap (leave the trap teardown EXIT and the
teardown function intact) so cleanup runs only once. Reference: the teardown
function and the trap teardown EXIT registration.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9501b01d-b1c4-4fbe-b468-47df15ba7054
📒 Files selected for processing (1)
test/e2e/test-sandbox-operations.sh
3c52774 to
05d263d
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/e2e/test-sandbox-operations.sh (1)
252-259:⚠️ Potential issue | 🟡 MinorTest expects "OpenClaw" field but status command doesn't output it.
The
nemoclaw statuscommand outputsSandbox:,Model:,Provider:,Inference:,GPU:, andPolicies:fields. The string "OpenClaw" only appears in error/recovery messages (e.g., "Recovered NemoClaw gateway runtime") and is not present in normal status output, causing this test to fail.Replace "OpenClaw" with "GPU", which is always present:
Proposed fix
- for field in "Sandbox" "Model" "Provider" "OpenClaw"; do + for field in "Sandbox" "Model" "Provider" "GPU"; do🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 252 - 259, The test loop in test-sandbox-operations.sh checks for fields "Sandbox", "Model", "Provider", and "OpenClaw" but the status output never includes "OpenClaw"; change the for-list in the for field in "Sandbox" "Model" "Provider" "OpenClaw"; do ... to use "GPU" instead of "OpenClaw" so it checks against a field that is always present (i.e., update the string literal "OpenClaw" to "GPU" in that for-loop).
🧹 Nitpick comments (1)
test/e2e/test-sandbox-operations.sh (1)
586-593: Teardown runs twice: once explicitly and once via EXIT trap.Line 586 calls
teardownexplicitly, thensummarycallsexit, which triggers theEXITtrap at line 592, runningteardownagain. While the second run is idempotent (sandboxes already destroyed), it produces duplicate log output and unnecessarynemoclaw listcalls.Consider adding a guard variable or removing the explicit
teardowncall since the trap handles it.Option 1: Remove explicit call (preferred)
# Phase 5: Cleanup verification (destroys sandbox B) test_sbx_05_destroy_cleanup "$SANDBOX_B" - # Teardown everything - teardown - # Report summary }Option 2: Add guard variable
+TEARDOWN_DONE=0 + teardown() { + if [[ $TEARDOWN_DONE -eq 1 ]]; then + return + fi + TEARDOWN_DONE=1 # Disable errexit during teardown — cleanup must be best-effort set +e🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 586 - 593, The teardown function is being invoked twice (explicit call before summary and again via the trap on EXIT); remove the explicit call to teardown that appears just before calling summary in main so the EXIT trap (trap teardown EXIT) is the single place that runs cleanup, or alternatively implement a guard (e.g., a TEARDOWN_DONE flag checked/set inside teardown and respected by the EXIT trap) to make teardown idempotent; locate and update the teardown invocation and the trap/summary/main flow (symbols: teardown, summary, trap EXIT, main) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 252-259: The test loop in test-sandbox-operations.sh checks for
fields "Sandbox", "Model", "Provider", and "OpenClaw" but the status output
never includes "OpenClaw"; change the for-list in the for field in "Sandbox"
"Model" "Provider" "OpenClaw"; do ... to use "GPU" instead of "OpenClaw" so it
checks against a field that is always present (i.e., update the string literal
"OpenClaw" to "GPU" in that for-loop).
---
Nitpick comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 586-593: The teardown function is being invoked twice (explicit
call before summary and again via the trap on EXIT); remove the explicit call to
teardown that appears just before calling summary in main so the EXIT trap (trap
teardown EXIT) is the single place that runs cleanup, or alternatively implement
a guard (e.g., a TEARDOWN_DONE flag checked/set inside teardown and respected by
the EXIT trap) to make teardown idempotent; locate and update the teardown
invocation and the trap/summary/main flow (symbols: teardown, summary, trap
EXIT, main) accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ce77f38f-b934-473e-bcc3-bfa3f68d9694
📒 Files selected for processing (1)
test/e2e/test-sandbox-operations.sh
05d263d to
22738ac
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 339-345: The test currently swallows the exit code of the
`nemoclaw "$SANDBOX_A" status` call by appending `|| true` and then treats any
non-empty output as success (affecting TC-SBX-06 and TC-SBX-08); change the
`status_output=$($TIMEOUT_CMD 120 nemoclaw "$SANDBOX_A" status 2>&1) || true`
pattern so the command's exit status is preserved (capture both the output and
the exit code, e.g., assign `status_output` and then store `$?` into a
variable), then only treat the test as passing when the output matches an
explicit recovery/health token (the grep for "recover|running|healthy|OpenClaw")
and the exit code indicates success; apply the same fix to the similar block
around the other occurrence (lines 410-423) so fatal CLI errors no longer count
as passes.
- Around line 134-149: The onboard_sandbox flow currently ignores a non-zero
exit from the nemoclaw onboard command and proceeds to succeed if the sandbox
shows up later; update the function to fail fast when onboard_exit is non-zero
by returning that exit code (or a non-zero value) instead of continuing: after
running the nemoclaw onboard block that sets onboard_exit, if [[ $onboard_exit
-ne 0 ]]; then log the error (use the existing log message) and return
$onboard_exit (or 1) immediately; keep the subsequent check that verifies
nemoclaw list but only run it when onboard_exit is zero so functions like
onboard_sandbox and the test suite cannot proceed on a partial/failed onboard.
- Around line 484-490: The test currently treats STATUS_404 as equivalent to
blocked, which is too permissive; update the conditional that inspects the probe
output (the grep on "$probe_a" used in the TC-SBX-11 assertion) to only consider
STATUS_403, ERROR, or TIMEOUT as a pass (remove STATUS_404 from the regex), and
make the analogous change for the symmetric check that examines "$probe_b" (the
identical block around lines 504-510) so both isolation checks only accept
STATUS_403/ERROR/TIMEOUT as evidence of blocking.
- Around line 367-379: The test TC-SBX-05 currently only verifies "nemoclaw
list" after running "nemoclaw $target destroy"; update the test to also assert
that the underlying runtime artifacts are gone by checking the container runtime
(e.g., run "docker ps -a" / "podman ps -a" or use "docker container ls --filter
name=$target") for containers/images/networks matching $target and fail if any
are found; add this runtime check after the existing nemoclaw list check (use
the same $target variable and destroy_exit handling) and report a clear failure
message like "Still running container for $target" so the test covers both
registry and Docker cleanup.
- Around line 273-291: The test treats any non-empty stderr as success and
doesn't verify the exit status or that the follower was running; update the
first logs invocation to capture and assert the exit code (e.g., run
output=$($TIMEOUT_CMD 10 nemoclaw "$SANDBOX_A" logs 2>&1); rc=$?; require rc==0
and non-empty output) and for the --follow branch check that the background pid
($pid) is alive before killing it (e.g., test ps -p "$pid" &>/dev/null and fail
if not), then kill and wait as currently done; reference the variables/commands
output, $TIMEOUT_CMD, nemoclaw "$SANDBOX_A" logs, --follow, pid to locate and
patch the checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 6d2cfa6a-6387-46e7-91b6-8256e9df4821
📒 Files selected for processing (1)
test/e2e/test-sandbox-operations.sh
22738ac to
83135c7
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
test/e2e/test-sandbox-operations.sh (3)
274-292:⚠️ Potential issue | 🟠 Major
TC-SBX-04can still pass whenlogsnever works.The first branch ignores the exit code from
nemoclaw ... logs, so any error text counts as success. The--followbranch only proves the PID is gone afterkill; if the follower has already exited before the sleep, this still goes green. Require a zero exit code for the one-shot logs call and fail if the background follower is already dead before the kill step.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 274 - 292, The test currently treats any non-empty stderr/stdout from the one-shot logs call as success and doesn't check its exit code; modify the block using output=$($TIMEOUT_CMD 10 nemoclaw "$SANDBOX_A" logs 2>&1) || true to instead capture the exit code and require it to be zero (use the actual command exit status, e.g., run without forcing success and assert $? == 0) so TC-SBX-04 fails on errors, and for the follow branch ensure the background follower is alive before the kill by checking the PID (ps -p "$pid") immediately after sleep and failing if that check shows the process already exited, only then proceed to kill/wait and assert the process is gone.
332-345:⚠️ Potential issue | 🟠 Major
TC-SBX-08does not prove the recovery path was exercised.The fault-injection step is entirely best-effort, and the subsequent
statusprobe also drops its exit code. If no gateway PID was actually terminated, this case is just observing a healthy sandbox and can still pass. Capture whether a process was killed, then only accept recovery output from a zero-exitnemoclaw status.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 332 - 345, The test currently runs a best-effort kill via sandbox_exec (the "pkill -9 -f 'openclaw gateway' || kill -9 $(pgrep -f 'openclaw gateway') || ..." pipeline) but discards whether any process was actually terminated and likewise ignores the exit code of the nemoclaw status probe; change the logic so you first capture the exit status of the sandbox_exec kill pipeline (e.g., store its exit code or a boolean like "killed_any") and only proceed to validate recovery if a kill actually occurred, then run the status probe using $TIMEOUT_CMD 120 nemoclaw "$SANDBOX_A" status and require its exit code to be zero before accepting the recovery messages in status_output (still inspect status_output for "recover|running|healthy|OpenClaw"); update the TC-SBX-08 pass/fail decision to fail when no process was killed or when status exits non-zero even if output matches.
403-425:⚠️ Potential issue | 🟠 Major
TC-SBX-06can pass without an actual gateway outage.
docker killis best-effort, and a still-running gateway only emits a warning before the test matches recovery text fromstatus. If the container survives or restarts immediately, this block is exercising normal behavior, not recovery. Gate the assertion on an observed down/removed state and requirenemoclaw statusto exit 0.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-sandbox-operations.sh` around lines 403 - 425, The test currently treats any matching text in status_output as a recovery even if the gateway never actually went down; modify the logic so you first gate on the observed container_state (from container_state=$(docker inspect -f '{{.State.Running}}' "$container" ...)) and only perform the recovery assertion when container_state is "false" or "removed"; if the container_state is still "true" then log a warning and skip the recovery assertion. Also require that the nemoclaw status invocation (status_output=$($TIMEOUT_CMD 300 nemoclaw "$SANDBOX_A" status 2>&1) || true) actually exits 0 before checking for recovery strings — use the command exit code rather than accepting any output — and only pass "TC-SBX-06" when both the container is down/removed and the nemoclaw status command exited 0 and contains the expected recovery/health text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 451-462: TC-SBX-10 currently only checks that SANDBOX_A and
SANDBOX_B names appear in the output of `nemoclaw list` (using variables
found_a/found_b) but does not validate sandbox metadata; update the test so
after confirming presence it calls the command that returns sandbox details
(e.g., `nemoclaw show` or equivalent inspect command) for each sandbox and
parses/validates the expected metadata fields (model, provider, version or other
required keys) for SANDBOX_A and SANDBOX_B, and only call pass "TC-SBX-10: Both
sandboxes visible in nemoclaw list" if both presence and metadata checks succeed
(otherwise call fail with clear details which sandbox and which metadata field
mismatched).
- Around line 476-492: The check treating empty probe output as a successful
isolation is wrong because sandbox_exec_for is wrapped with "|| true" and can
return an empty string on SSH/infrastructure failure; remove the "|| true"
wrapper or capture sandbox_exec_for's exit status and the probe output (probe_a)
and explicitly fail when exit != 0 or probe_a is empty to surface infra errors,
then only treat outputs that contain STATUS_403, ERROR, or TIMEOUT as PASS for
isolation (leave the existing grep checks for STATUS_403/ERROR/TIMEOUT and the
STATUS_[0-9]+ failure branch unchanged, but add an initial guard that fails with
an infrastructure/SSH error if probe_a is empty or the command exit code
indicates failure).
- Around line 368-386: The test currently logs non-zero exits from the nemoclaw
destroy command but does not mark the test as failed; update the block that
checks destroy_exit to call fail(...) when destroy_exit != 0 (use the same test
id/context "TC-SBX-05: Destroy ($target)" and include the exit code in the
failure message) instead of only logging, and keep the subsequent nemoclaw list
and openshell sandbox list checks for diagnostic validation; reference the
destroy_exit variable, the nemoclaw destroy invocation, and the fail()/pass()
functions to locate where to add the explicit fail call.
---
Duplicate comments:
In `@test/e2e/test-sandbox-operations.sh`:
- Around line 274-292: The test currently treats any non-empty stderr/stdout
from the one-shot logs call as success and doesn't check its exit code; modify
the block using output=$($TIMEOUT_CMD 10 nemoclaw "$SANDBOX_A" logs 2>&1) ||
true to instead capture the exit code and require it to be zero (use the actual
command exit status, e.g., run without forcing success and assert $? == 0) so
TC-SBX-04 fails on errors, and for the follow branch ensure the background
follower is alive before the kill by checking the PID (ps -p "$pid") immediately
after sleep and failing if that check shows the process already exited, only
then proceed to kill/wait and assert the process is gone.
- Around line 332-345: The test currently runs a best-effort kill via
sandbox_exec (the "pkill -9 -f 'openclaw gateway' || kill -9 $(pgrep -f
'openclaw gateway') || ..." pipeline) but discards whether any process was
actually terminated and likewise ignores the exit code of the nemoclaw status
probe; change the logic so you first capture the exit status of the sandbox_exec
kill pipeline (e.g., store its exit code or a boolean like "killed_any") and
only proceed to validate recovery if a kill actually occurred, then run the
status probe using $TIMEOUT_CMD 120 nemoclaw "$SANDBOX_A" status and require its
exit code to be zero before accepting the recovery messages in status_output
(still inspect status_output for "recover|running|healthy|OpenClaw"); update the
TC-SBX-08 pass/fail decision to fail when no process was killed or when status
exits non-zero even if output matches.
- Around line 403-425: The test currently treats any matching text in
status_output as a recovery even if the gateway never actually went down; modify
the logic so you first gate on the observed container_state (from
container_state=$(docker inspect -f '{{.State.Running}}' "$container" ...)) and
only perform the recovery assertion when container_state is "false" or
"removed"; if the container_state is still "true" then log a warning and skip
the recovery assertion. Also require that the nemoclaw status invocation
(status_output=$($TIMEOUT_CMD 300 nemoclaw "$SANDBOX_A" status 2>&1) || true)
actually exits 0 before checking for recovery strings — use the command exit
code rather than accepting any output — and only pass "TC-SBX-06" when both the
container is down/removed and the nemoclaw status command exited 0 and contains
the expected recovery/health text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9dd4d120-4362-4325-8b2f-e6cfd9064d10
📒 Files selected for processing (1)
test/e2e/test-sandbox-operations.sh
83135c7 to
553e583
Compare
There was a problem hiding this comment.
Could we add at least a basic CI job in this PR that runs the script?
There was a problem hiding this comment.
Thanks, I have updated with the integration
32a75aa to
ad6edce
Compare
…n suite Add test/e2e/test-sandbox-operations.sh covering 10 test cases (14 assertions) for sandbox operation features that currently lack E2E coverage: - TC-SBX-01: nemoclaw list shows onboarded sandbox - TC-SBX-02: SSH + openclaw agent chat returns expected response - TC-SBX-03: nemoclaw status output contains expected fields - TC-SBX-04: nemoclaw logs produces output; --follow exits cleanly - TC-SBX-05: nemoclaw destroy removes sandbox from registry and Docker - TC-SBX-06: docker kill gateway container; nemoclaw status handles gracefully; re-onboard succeeds - TC-SBX-07: delete sandboxes.json; nemoclaw list rebuilds from live OpenShell state - TC-SBX-08: kill OpenClaw gateway process inside sandbox; nemoclaw status detects and recovers - TC-SBX-10: two sandboxes onboarded; nemoclaw list shows both - TC-SBX-11: cross-sandbox HTTP probe returns 403 (proxy blocks) Follows conventions from test-sandbox-survival.sh: SSH via openshell sandbox ssh-config, node for in-sandbox probes, gtimeout/timeout portability. Validated locally: 14/14 PASS on macOS Apple Silicon, Colima, OpenShell 0.0.26. Signed-off-by: Truong Nguyen <[email protected]> Made-with: Cursor
12ec0f6 to
4dcd009
Compare
ericksoa
left a comment
There was a problem hiding this comment.
Approving — this is a well-structured E2E test suite with good phase ordering, robust helpers, and sensible assertions. The earlier CodeRabbit issues have all been addressed (safe arithmetic, exit code checks, metadata validation, tightened isolation probes).
A few minor items for a follow-up PR:
-
TC-SBX-06: gate on
status_exit— Unlike TC-SBX-08 which checksstatus_exitbefore asserting on output, TC-SBX-06 skips that check. If the status command fails with an error message containing "gateway", it passes. At minimum, log a warning whenstatus_exit != 0. -
TC-SBX-08: comment the kill fallback chain — The three-strategy kill command is correct but hard to read. A brief comment explaining why the fallbacks exist (different sandbox environments) would help.
-
TC-SBX-11: document port 18789 — No comment explaining what service is expected on this port.
-
TC-SBX-10: note the metadata parsing assumption —
grep -A1 "$sb" | tail -1assumes metadata is on the next line. A comment would help ifnemoclaw listformat ever changes.
None of these block merge. Nice work on the coverage. 👍
|
Note for future PRs: please make sure your commits are signed with verified signatures (GPG or SSH signing). Not blocking this one, but we'll need that going forward. Will merge once CI passes. 👍 |
Thank you, @ericksoa |
Summary
Add test/e2e/test-sandbox-operations.sh covering 10 sandbox operation features that currently lack E2E coverage. The script runs 14 assertions across basic operations, recovery paths, and multi-sandbox isolation.
The integration of those to pipeline will be delivered in another PR
Test cases:
TC-SBX-01: nemoclaw list shows onboarded sandbox
TC-SBX-02: SSH into sandbox + openclaw agent -m returns expected response
TC-SBX-03: nemoclaw status output contains Sandbox, Model, Provider, OpenClaw fields
TC-SBX-04: nemoclaw logs produces output; --follow exits cleanly after kill
TC-SBX-05: nemoclaw destroy removes sandbox from registry and Docker
TC-SBX-06: docker kill openshell-cluster-nemoclaw (hard crash); nemoclaw status handles gracefully; re-onboard succeeds
TC-SBX-07: Delete sandboxes.json; nemoclaw list rebuilds registry from live OpenShell state
TC-SBX-08: Kill OpenClaw gateway process inside sandbox; nemoclaw status detects and recovers
TC-SBX-10: Two sandboxes onboarded; nemoclaw list shows both with correct metadata
TC-SBX-11: Cross-sandbox node HTTP probe returns HTTP 403 (proxy blocks); both directions verified
Related Issue
Link to issue 1863
Changes
Type of Change
Testing
npx prek run --all-filespasses (or equivalentlymake check).npm testpasses.make docsbuilds without warnings. (for doc-only changes)Checklist
General
Code Changes
npx prek run --all-filesauto-fixes formatting (ormake formatfor targeted runs).Doc Changes
nemoclaw-contributor-update-docsagent skill to draft changes while complying with the style guide. For example, prompt your agent with "/nemoclaw-contributor-update-docscatch up the docs for the new changes I made in this PR."Signed-off-by: Truong Nguyen [email protected]
Summary by CodeRabbit