feat(sandbox): run boot hook on sandbox startup by drew · Pull Request #775 · NVIDIA/OpenShell

drew · 2026-04-07T00:01:44Z

Summary

Run /etc/openshell/boot.sh as a supervisor-managed startup hook on every sandbox startup, before the long-lived child process starts.

Add the shared boot script path constant, cover the hook with startup/failure/regression tests, and document the new sandbox image contract.

Related Issue

None.

Changes

- add the canonical /etc/openshell/boot.sh path in openshell-policy
- run the boot hook from the sandbox supervisor startup path before the normal child process
- fail sandbox startup on non-zero boot hook exit and skip cleanly when the hook is missing
- add boot hook tests and update architecture and user docs
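
The behavior listed above (run the hook when it exists, skip it when it is missing, fail startup on a non-zero exit) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `BOOT_SCRIPT_PATH`, `BootHookOutcome`, and `run_boot_hook` are hypothetical names, and running the script through `sh` is an assumption.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Path named in this PR. The constant name is hypothetical.
pub const BOOT_SCRIPT_PATH: &str = "/etc/openshell/boot.sh";

/// What happened when the supervisor tried to run the hook.
#[derive(Debug, PartialEq, Eq)]
pub enum BootHookOutcome {
    /// No script at the given path, so nothing was run.
    Skipped,
    /// The script ran and exited with status 0.
    Succeeded,
}

/// Run the boot hook once and wait for it to finish.
/// Returns an error if the script cannot be started or exits non-zero,
/// which the caller should treat as a failed sandbox startup.
pub fn run_boot_hook(script: &Path) -> io::Result<BootHookOutcome> {
    if !script.exists() {
        return Ok(BootHookOutcome::Skipped);
    }
    // Assumption: invoke via `sh` so the script does not need the executable bit.
    let status = Command::new("sh").arg(script).status()?;
    if status.success() {
        Ok(BootHookOutcome::Succeeded)
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("boot hook {} failed: {}", script.display(), status),
        ))
    }
}
```

Under these assumptions, a supervisor would call `run_boot_hook(Path::new(BOOT_SCRIPT_PATH))` and spawn the long-lived child only when the result is `Ok`.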

Testing

- mise run pre-commit passes
- Unit tests added/updated
- E2E tests added/updated (if applicable)

Checklist

- Follows Conventional Commits
- Commits are signed off (DCO)
- Architecture docs updated (if applicable)

Signed-off-by: Drew Newberry <[email protected]>

github-actions · 2026-04-07T00:02:56Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/OpenShell/pr-preview/pr-775/
Built to branch `gh-pages` at 2026-04-07 06:51 UTC. Preview will be ready when the GitHub Pages deployment is complete.

The boot hook tests were failing on CI because they used `ProcessHandle::spawn`, which applies Linux sandbox enforcement (seccomp, landlock, privilege dropping) in a `pre_exec` hook. On CI containers running as root, `drop_privileges` tried to switch to the non-existent sandbox user and failed with `EINVAL`. Replace `run_test_boot_hook` and `spawn_test_process` with a test-specific implementation that uses plain `tokio::process::Command` and exercises the boot hook logic without sandbox enforcement.
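
A test helper along these lines might look like the sketch below. It is an illustration only: the PR uses `tokio::process::Command`, while this version swaps in `std::process::Command` to stay dependency-free, and `spawn_after_boot_hook` is a hypothetical name. The point it shows is that no `pre_exec` hook is installed, so nothing tries to drop privileges when the tests run as root.

```rust
use std::io;
use std::path::Path;
use std::process::{Child, Command};

/// Hypothetical test helper: run the hook to completion, then spawn the child.
/// No `pre_exec` hook is registered, so seccomp, landlock, and privilege
/// dropping are not applied and the helper does not depend on a sandbox user.
pub fn spawn_after_boot_hook(
    hook: &Path,
    child_program: &str,
    child_args: &[&str],
) -> io::Result<Child> {
    if hook.exists() {
        // Assumption: the hook is a shell script that `sh` can run directly.
        let status = Command::new("sh").arg(hook).status()?;
        if !status.success() {
            // A failing hook means the child is never started.
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("boot hook exited with {}", status),
            ));
        }
    }
    Command::new(child_program).args(child_args).spawn()
}
```

A test can then have the hook write a marker file and have the child check for it, which covers both the ordering guarantee and the failure path without touching sandbox enforcement.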

mjamiv · 2026-04-11T19:06:11Z

Production datapoint from an OpenShell v0.0.25 deployment that just hit the failure mode this PR addresses.

After an unexpected host power cycle, the sandbox pods respawned cleanly and reached Ready phase, but the long-lived child process inside each sandbox (an OpenClaw gateway) did NOT auto-start — the pod was up but the process table inside was empty of the gateway process. The fleet had to be recovered manually by re-running each sandbox's startup script via openshell sandbox exec.

That is exactly the gap this PR fixes — having the supervisor run /etc/openshell/boot.sh before the long-lived child gives the sandbox image a documented contract for "what needs to run on every startup" and removes the manual recovery step.

On v0.0.25 the workaround is a host-side watchdog: a systemd user timer that curls each forwarded /health endpoint every 60s and re-launches the in-sandbox process + re-creates the forward on failure. It works but is clearly a v0.0.25-era substitute — once this PR ships in a release, the right pattern is an in-sandbox boot hook that the operator doesn't have to maintain out-of-band.
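
For readers unfamiliar with the watchdog pattern described above, the health-check half can be sketched as follows. This is not the commenter's actual tooling (which is described as a systemd timer plus curl): `health_ok` and `unhealthy` are hypothetical names, the forwarded endpoints are assumed to be plain HTTP on a known socket address, and the relaunch and re-forward steps are deliberately left out because their exact commands are not given here.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if `GET /health` on `addr` answers with a 2xx status line.
/// Any connection, write, or read failure counts as unhealthy.
pub fn health_ok(addr: SocketAddr, timeout: Duration) -> bool {
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(s) => s,
        Err(_) => return false,
    };
    let _ = stream.set_read_timeout(Some(timeout));
    let _ = stream.set_write_timeout(Some(timeout));
    let request = format!(
        "GET /health HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        addr
    );
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    let mut buf = [0u8; 64];
    let n = match stream.read(&mut buf) {
        Ok(n) => n,
        Err(_) => return false,
    };
    // The status line looks like "HTTP/1.1 200 OK"; the second token is the code.
    let head = String::from_utf8_lossy(&buf[..n]);
    head.split_whitespace()
        .nth(1)
        .map_or(false, |code| code.starts_with('2'))
}

/// Check every forwarded endpoint once and return the ones that need recovery.
/// A timer (every 60s in the comment above) would call this and relaunch the
/// in-sandbox process for each returned address.
pub fn unhealthy(forwards: &[SocketAddr], timeout: Duration) -> Vec<SocketAddr> {
    forwards
        .iter()
        .copied()
        .filter(|addr| !health_ok(*addr, timeout))
        .collect()
}
```

With a boot hook in place, this out-of-band loop would no longer be needed to restart the process after a pod respawn.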

Happy to test this PR on a proxy-only sandbox deployment if the branch is ready for outside verification.

feat(sandbox): run boot hook on sandbox startup

f3fb96f

Signed-off-by: Drew Newberry <[email protected]>

drew requested a review from a team as a code owner April 7, 2026 00:01

drew self-assigned this Apr 7, 2026

drew marked this pull request as draft April 7, 2026 00:08

drew mentioned this pull request Apr 7, 2026

feat(sandbox): persist startup command across gateway stop/start cycles #753

Closed

6 tasks

drew wants to merge 2 commits into main from feat/sandbox-boot-hook