fix(sandbox): resolve symlinked binary paths in network policy matching#774

Open
johntmyers wants to merge 5 commits into main from fix/770-symlink-binary-resolution

Conversation

@johntmyers
Collaborator

@johntmyers johntmyers commented Apr 6, 2026

Summary

Policy binary paths specified as symlinks (e.g., /usr/bin/python3) were silently denied because the kernel reports the canonical path via /proc/<pid>/exe (e.g., /usr/bin/python3.11). This fix resolves symlinks through the container filesystem after the entrypoint starts, expanding the OPA policy data so both the original and resolved paths match.
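A minimal sketch of the expansion idea described above, not the PR's actual code: `expand_binaries` and the `resolve` closure are hypothetical names, and treating any path containing `*` as a glob is an assumption. The original entry is always kept, and a resolved path is only ever appended, so a failed resolution leaves the list unchanged.

```rust
/// Hypothetical helper: return the policy's binary list with resolved
/// symlink targets appended. `resolve` stands in for whatever resolver the
/// sandbox uses; it returns None when resolution fails or is not applicable.
pub fn expand_binaries<F>(binaries: &[String], resolve: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out: Vec<String> = Vec::new();
    for original in binaries {
        // The user-specified path is always preserved.
        if !out.contains(original) {
            out.push(original.clone());
        }
        // Assumption: glob patterns are matched elsewhere and never resolved.
        if original.contains('*') {
            continue;
        }
        // Best effort: append the resolved path only when it is new.
        if let Some(resolved) = resolve(original) {
            if !out.contains(&resolved) {
                out.push(resolved);
            }
        }
    }
    out
}
```

With this shape, a strict equality check against the list matches either the symlink path or its target without any change to the matching logic.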

Related Issue

Closes #770

Changes

  • opa.rs: Added resolve_binary_in_container() helper that resolves symlinks via /proc/<pid>/root/ on Linux using iterative read_link (not canonicalize, which resolves the procfs mount itself). Added from_proto_with_pid() and reload_from_proto_with_pid() methods that expand binary paths during OPA data construction. Existing from_proto() / reload_from_proto() delegate with pid=0 (backward-compatible, no expansion). Added normalize_path() for relative symlink targets with .. components.
  • lib.rs: load_policy() now retains the proto for post-start OPA rebuild. After entrypoint_pid.store(), triggers a one-shot OPA rebuild with the real PID. run_policy_poll_loop() passes the PID on each hot-reload so symlinks are re-resolved.
  • sandbox-policy.rego: Deny reason for binary mismatches now leads with SYMLINK HINT and includes actionable fix guidance (readlink -f command, what to check in logs).
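For illustration, a lexical normalizer for relative symlink targets could look like the sketch below. This is an assumption about what normalize_path() does, not a copy of it: it collapses `.` and `..` purely on the path string and never consults the filesystem.

```rust
use std::path::{Component, Path, PathBuf};

/// Sketch of a lexical normalizer (the PR's real `normalize_path` may differ).
/// Intended for absolute paths: a leading `..` on a relative path is dropped.
pub fn normalize_path(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` contributes nothing.
            Component::CurDir => {}
            // `..` removes the last pushed component; at `/` pop() is a no-op.
            Component::ParentDir => {
                out.pop();
            }
            // Root and normal components are appended as-is.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

A target such as `../lib/python3.11` read from `/usr/bin/python3` would first be joined to `/usr/bin` and then normalized to `/usr/lib/python3.11`.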

Design decisions

  • Expand policy data, not evaluation logic — the Rego rules and per-request evaluation path are untouched. Only the OPA data (binary list) is enriched at load time. This avoids introducing new code in the security-critical hot path.
  • Graceful degradation — if symlink resolution fails for any reason, the original path is preserved and behavior is identical to before this change. Resolution is best-effort.
  • No Rego changes needed — the existing b.path == exec.path strict equality naturally matches the expanded entry.
  • read_link over canonicalize: std::fs::canonicalize resolves /proc/<pid>/root itself (a kernel pseudo-symlink to /), stripping the prefix needed for path extraction. We use an iterative read_link, which reads only the specified symlink's target and stays within the container namespace.
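The read_link approach from the last bullet can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `resolve_under_root` is a hypothetical name, `root` would be `/proc/<pid>/root` on Linux, and only symlinks in the final path component are followed (symlinked parent directories are not handled here).

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Upper bound on chain length (the PR text mentions 40 levels).
const MAX_SYMLINK_DEPTH: usize = 40;

/// Lexically collapse `.` and `..`; never touches the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for c in path.components() {
        match c {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Follow a symlink chain one hop at a time beneath `root`, returning the
/// final path as seen from inside the container (without the `root` prefix).
pub fn resolve_under_root(root: &Path, policy_path: &Path) -> io::Result<PathBuf> {
    let mut current = normalize(policy_path);
    for _ in 0..MAX_SYMLINK_DEPTH {
        // Re-anchor the container-absolute path under `root` for each hop.
        let rel = current.strip_prefix("/").unwrap_or(current.as_path());
        let host_view = root.join(rel);
        if !fs::symlink_metadata(&host_view)?.file_type().is_symlink() {
            return Ok(current);
        }
        // read_link returns the raw target; it does not resolve `root` itself.
        let target = fs::read_link(&host_view)?;
        current = if target.is_absolute() {
            normalize(&target)
        } else {
            let parent = current.parent().unwrap_or(Path::new("/"));
            normalize(&parent.join(&target))
        };
    }
    Err(io::Error::new(io::ErrorKind::Other, "symlink chain too deep"))
}
```

Because the prefix is re-applied on every hop, an absolute target such as `/usr/bin/python3.11` is looked up inside the container root rather than on the host.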

Best-effort approach and known risks

Symlink resolution is opportunistic — it improves the common case but cannot be guaranteed in all environments. When resolution fails, we are loud about it: per-binary WARN-level logs explain exactly what failed and what the operator should do. Deny reasons include prominent SYMLINK HINT text with actionable fix commands. Both flow through the gRPC LogPushLayer and are visible via openshell logs.

Environments where resolution will not work:

| Environment | Reason | User impact |
| --- | --- | --- |
| Restricted ptrace scope (kernel.yama.ptrace_scope >= 2) | /proc/<pid>/root/ returns EACCES even for own PID | Symlinks must be specified as canonical paths in policy |
| Rootless containers (rootless Docker, Podman) | User namespace isolation prevents procfs root traversal | Same: canonical paths required |
| Kubernetes pods without elevated security context | Default seccomp/AppArmor profiles may block procfs root access | Same: canonical paths required |
| Standalone/local mode (--policy-rules/--policy-data, no --sandbox-id) | No retained proto to rebuild, no gRPC log push | Resolution doesn't run; deny reasons appear on stdout only |
| Multi-level symlinks through /etc/alternatives | Should work (iterative loop handles chains up to 40 levels), but unusual layouts may produce unexpected resolved paths | Verify with readlink -f inside sandbox |
| Dynamically created symlinks after container start | Resolution runs at startup and on policy reload, not continuously | New symlinks won't be resolved until next policy reload |

In all failure cases: the original user-specified path is preserved, the deny behavior is identical to pre-fix, and the operator gets a clear warning log explaining why resolution didn't work and what to do about it.

Testing

  • mise run pre-commit passes
  • 19 new unit tests covering:
    • normalize_path helper for resolving .. and . components
    • resolve_binary_in_container edge cases (glob skip, pid=0, nonexistent paths)
    • Expanded binary matching (resolved path allowed, original preserved, unrelated binaries denied)
    • Ancestor matching with expanded paths
    • Proto round-trips with _with_pid variants
    • Hot-reload behavior (engine replacement, symlink expansion on reload, LKG preservation)
    • Deny reason includes SYMLINK HINT and readlink -f command
    • Linux-specific e2e tests with real symlinks (single-level, multi-level, non-symlink, full proto-to-decision, hot-reload before/after) — gracefully skip in restricted environments
  • All 452 existing + new tests pass (449 sandbox + 5 integration)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@johntmyers johntmyers requested a review from a team as a code owner April 6, 2026 22:44
@johntmyers johntmyers self-assigned this Apr 6, 2026
@johntmyers johntmyers added the test:e2e Requires end-to-end coverage label Apr 6, 2026
@mjamiv

mjamiv commented Apr 9, 2026

Confirming this affects real deployments. We run 51+ CLI tools in an OpenShell sandbox, all symlinked from /sandbox/.local/bin/ → actual binaries elsewhere. When the proxy resolves through the symlink before checking the binary allowlist, tools that should be permitted get blocked.

Current workaround: list both the symlink path AND the resolved binary path in the policy binaries: section. This is brittle and scales poorly with tool count.

Would be great to see this merged — it would simplify our policy config significantly.

@johntmyers
Collaborator Author

Thanks @mjamiv, any chance you were able to build this branch and verify?

Policy binary paths specified as symlinks (e.g., /usr/bin/python3) were
silently denied because the kernel reports the canonical path via
/proc/<pid>/exe (e.g., /usr/bin/python3.11). The strict string equality
in Rego never matched.

Expand policy binary paths by resolving symlinks through the container
filesystem (/proc/<pid>/root/) after the entrypoint starts. The OPA data
now contains both the original and resolved paths, so Rego's existing
strict equality check naturally matches either.

- Add resolve_binary_in_container() helper for Linux symlink resolution
- Add from_proto_with_pid() and reload_from_proto_with_pid() to OpaEngine
- Trigger one-shot OPA rebuild after entrypoint_pid is stored
- Thread entrypoint_pid through run_policy_poll_loop for hot-reloads
- Improve deny reason with symlink debugging hint
- Add 18 new tests including hot-reload and Linux symlink e2e tests

Closes #770
…naccessible

The Linux-specific symlink resolution tests depend on /proc/<pid>/root/
being readable, which requires CAP_SYS_PTRACE or permissive ptrace
scope. This is unavailable in CI containers, rootless containers, and
hardened hosts. Add a procfs_root_accessible() guard that skips these
tests gracefully instead of failing.
…improve deny messages

When /proc/<pid>/root/ is inaccessible (restricted ptrace, rootless
containers, hardened hosts), resolve_binary_in_container now logs a
per-binary warning with the specific error, the path it tried, and
actionable guidance (use canonical path or grant CAP_SYS_PTRACE).
Previously this was completely silent.

The Rego deny reason for binary mismatches now leads with 'SYMLINK HINT'
and includes a concrete fix command ('readlink -f' inside the sandbox)
plus what to look for in logs if automatic resolution isn't working.
…ution

std::fs::canonicalize resolves /proc/<pid>/root itself (a kernel
pseudo-symlink to /) which strips the prefix needed for path extraction.
This caused resolution to silently fail in all environments, not just CI.

Replace with an iterative read_link loop that walks the symlink chain
within the container namespace without resolving the /proc mount point.
Add normalize_path helper for relative symlink targets containing ..
components. Update procfs_root_accessible test guard to actually probe
the full resolution path instead of just checking path existence.
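A test guard in the spirit of the one described above might look like this sketch. It is an assumption about the shape of procfs_root_accessible(), not its actual body: instead of checking that the path exists, it tries to list the directory, which forces a real traversal through /proc/<pid>/root/.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical guard: true only when the container root behind
/// /proc/<pid>/root/ can actually be traversed by this process.
pub fn procfs_root_accessible(pid: u32) -> bool {
    // pid 0 means "no entrypoint yet" in this sketch.
    if pid == 0 {
        return false;
    }
    let probe = Path::new("/proc").join(pid.to_string()).join("root");
    // read_dir follows the magic link and fails with EACCES or ENOENT
    // when the root is not reachable, unlike a bare existence check.
    fs::read_dir(&probe).is_ok()
}
```

A test would call the guard first and return early when it reports false, so restricted environments skip rather than fail.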
@johntmyers johntmyers force-pushed the fix/770-symlink-binary-resolution branch from 8253c6c to 907b9fe Compare April 9, 2026 22:12
@copy-pr-bot

copy-pr-bot bot commented Apr 9, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mjamiv

mjamiv commented Apr 10, 2026

@johntmyers Haven't been able to build from source — no Rust toolchain on this host and it's a production VPS I'd rather not clutter.

Happy to test a pre-built binary or release candidate if one's available. Here's what I'd verify:

Test case: 51 CLI tools symlinked as /sandbox/.local/bin/<tool> → various targets (/sandbox/clawd/tools/<tool>/bin, /sandbox/.local/lib/node_modules/.bin/<tool>, /usr/local/bin/<tool>). Policy has binaries: entries using the symlink paths. On v0.0.25, the proxy resolves through the symlink before checking the allowlist, so only the canonical target path matches — the symlink paths in the policy get silently rejected.

Current workaround: Listing both symlink AND resolved paths in the binaries: section. Scales poorly with 51+ tools.

What I'd verify:

  1. Policy with only symlink paths in binaries: allows traffic
  2. The resolve_binary_in_container via /proc/<pid>/root/ resolves correctly
  3. Policy hot-reload (via openshell policy set) also picks up the symlink resolution
  4. No regression on non-symlinked binaries

If there's a dev release tag or binary I can drop in, I'll test same-day.

@mjamiv

mjamiv commented Apr 10, 2026

@johntmyers Built, deployed, and tested. Here's what I found — the fix compiles cleanly and runs, but does not actually resolve the bug in our environment. The warning path fires unconditionally and the 403 reproduces. Details:

Environment

  • OpenShell v0.0.25 cluster (Docker container: ghcr.io/nvidia/openshell/cluster:0.0.25) on Ubuntu 24.04 VPS
  • K3s-managed sandbox (claw-test), supervisor running as root in the cluster container
  • Supervisor PID and mount namespaces match the sandbox child (verified — they share pid:[4026533081] and mnt:[4026533080])
  • CAP_SYS_PTRACE IS present in CapEff (0x00000004a82c35fb → CAP_SYS_PTRACE bit set)
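The CapEff check in the last bullet can be reproduced with a small sketch. The helper names are hypothetical; the only external fact relied on is that CAP_SYS_PTRACE is capability number 19 in the Linux capability list.

```rust
/// Linux capability number for CAP_SYS_PTRACE (see linux/capability.h).
pub const CAP_SYS_PTRACE: u32 = 19;

/// Parse a CapEff value as printed in /proc/<pid>/status, with or without `0x`.
pub fn parse_cap_mask(hex: &str) -> Option<u64> {
    u64::from_str_radix(hex.trim().trim_start_matches("0x"), 16).ok()
}

/// True when bit `cap` is set in the capability mask.
pub fn has_capability(mask: u64, cap: u32) -> bool {
    cap < 64 && ((mask >> cap) & 1) == 1
}
```

Applied to the mask quoted above, `has_capability(parse_cap_mask("0x00000004a82c35fb").unwrap(), CAP_SYS_PTRACE)` evaluates to true.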

Build

  • rustc 1.94.1, cargo build --release -p openshell-sandbox on host (4 cores, CARGO_BUILD_JOBS=2)
  • ~6m35s build, 16MB binary, 0.0.27-dev.8+g907b9fe
  • 14 warnings (mostly dead-code), no errors

Deployment

  • docker cp the patched binary to /opt/openshell/bin/openshell-sandbox in the cluster container
  • kill -9 the running supervisor, K3s respawned with the new binary
  • Backup of stock v0.0.25 retained, rollback was clean

Test case

  • Policy lists /usr/bin/python3 in binaries: (not /usr/bin/python3.12)
  • Inside the sandbox: /usr/bin/python3 -> python3.12 (standard Debian symlink)
  • Policy group brave_search allows api.search.brave.com
  • Before: python3 -c 'import urllib.request; urllib.request.urlopen("https://api.search.brave.com")' → Tunnel connection failed: 403 Forbidden

With the patched supervisor

Same 403. The supervisor logs 23 warnings at startup (once per policy binary ref), all identical:

WARN openshell_sandbox::opa: Cannot access container filesystem for symlink resolution;
binary paths in policy will be matched literally. If a policy binary is a symlink
(e.g., /usr/bin/python3 -> python3.11), use the canonical path instead, or run with CAP_SYS_PTRACE

There are no "Resolved policy binary symlink" info logs: every call to resolve_binary_in_container() hits the error branch in symlink_metadata(). It fires for non-symlinks too (/usr/bin/bash, /usr/bin/gh), which means symlink_metadata() itself is erroring on the /proc/<pid>/root/<path> lookup, not failing on the symlink check.

The puzzle

Running the exact same access manually from within the supervisor's PID + mount namespace works fine:

$ nsenter -t <supervisor_pid> -p -m sh
$ ls -la /proc/<child_pid>/root/usr/bin/python3
lrwxrwxrwx 1 root root 10 Nov 12 12:15 /proc/<child_pid>/root/usr/bin/python3 -> python3.12

So the path exists, the namespaces are right, CAP_SYS_PTRACE is held, and the supervisor is root. Yet std::fs::symlink_metadata("/proc/<pid>/root/usr/bin/bash") returns Err.

Best guesses

  1. Timing: reload_from_proto_with_pid(proto, handle.pid()) runs immediately after ProcessHandle::spawn in lib.rs:728. At that moment the child may have the PID allocated but /proc/<pid>/root/ may not be populated yet (pre-exec or pre-namespace-setup). The poll loop at 10s intervals never re-runs the resolve either — it calls reload_from_proto_with_pid with entrypoint_pid.load(), but I see zero Resolved policy binary symlink logs after 15+ seconds of uptime.
  2. PID translation: handle.pid() might be returning a PID in a namespace different from the one the /proc fs lookup uses (though our ns verification says they match).
  3. The error = %e field isn't showing up in log output — the default tracing subscriber elides fields. It'd be very helpful if the error message was appended to the warning body itself, or if there was a RUST_LOG=openshell_sandbox::opa=debug path that dumps the raw io::Error.

What would help

  • Include error = %e inline in the warning string so we can see ENOENT vs EACCES vs something stranger
  • Add a tracing::debug log right before the symlink_metadata call printing the exact container_path being tested
  • Consider calling resolve_binary_in_container later in the startup sequence, after the child has had a few hundred ms to initialize, or from the first policy poll tick rather than immediately after spawn
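To make the first suggestion concrete, here is one way the error could be folded into the message body. The function name and the exact wording of the message are illustrative assumptions, not the supervisor's real log format.

```rust
use std::io;
use std::path::Path;

/// Hypothetical formatter: put the path and the io::Error into the message
/// text itself, so they survive subscribers that drop structured fields.
pub fn resolution_warning(container_path: &Path, err: &io::Error) -> String {
    // Map the common error kinds to the errno names an operator would look for.
    let hint = match err.kind() {
        io::ErrorKind::NotFound => "ENOENT: path not present in the container root",
        io::ErrorKind::PermissionDenied => "EACCES: check ptrace scope or CAP_SYS_PTRACE",
        _ => "unexpected I/O error",
    };
    format!(
        "Cannot access container filesystem for symlink resolution: path={} error={} ({})",
        container_path.display(),
        err,
        hint
    )
}
```

The returned string can be passed to whatever logging macro is in use, which keeps ENOENT and EACCES distinguishable in default output.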

Happy to rebuild with an instrumented version if you want me to patch in a few extra logs and re-test. I also kept the build cache so re-runs are ~1 min.

Rollback verified clean, sandbox is back on stock v0.0.25.

…ready

The one-shot resolve ran immediately after ProcessHandle::spawn, before
the child's mount namespace and /proc/<pid>/root/ were populated. This
caused symlink_metadata to fail with ENOENT on every binary, and the
poll loop never retried because it only reloads when the policy hash
changes on the server.

Replace the synchronous resolve with an async task that probes
/proc/<pid>/root/ with retries (10 attempts, 500ms apart, 5s total).
The child's mount namespace is typically ready within a few hundred ms.

Also inline error values into warning message strings so they appear in
default log output (not just as structured tracing fields that may be
elided), and add debug-level logs before each symlink_metadata call to
aid diagnosis.
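The retry behavior described in this commit message can be sketched with a plain blocking loop. The commit uses an async task; this version swaps in std::thread::sleep to stay dependency-free, and `retry_until_ready` is a hypothetical name.

```rust
use std::thread;
use std::time::Duration;

/// Call `probe` up to `attempts` times, sleeping `delay` between tries.
/// Returns the 1-based attempt number that succeeded, or None if none did.
pub fn retry_until_ready<F>(attempts: u32, delay: Duration, mut probe: F) -> Option<u32>
where
    F: FnMut() -> bool,
{
    for attempt in 1..=attempts {
        if probe() {
            return Some(attempt);
        }
        // No sleep after the final failed attempt.
        if attempt < attempts {
            thread::sleep(delay);
        }
    }
    None
}
```

With the values from the commit message this would be called as `retry_until_ready(10, Duration::from_millis(500), ...)`, where the closure probes /proc/<pid>/root/ for readiness.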
@mjamiv

mjamiv commented Apr 10, 2026

@johntmyers Retested with commit 5607fc2 just now. The deferred-resolution fix works — here's the evidence.

Environment

  • Same setup as my previous test: OpenShell v0.0.25 cluster container (Docker, ghcr.io/nvidia/openshell/cluster:0.0.25), Ubuntu 24.04 VPS, K3s-managed sandbox running ghcr.io/nvidia/openshell-community/sandboxes/openclaw:latest.
  • Cargo incremental build from /tmp/openshell-build/target/ cache; binary e8965e605615905aed0640763387b999 deployed via docker cp into the cluster container's /opt/openshell/bin/openshell-sandbox (which is shared with the sandbox pod via the overlay lower-dir, so the swap propagates on the next pod restart).
  • Policy: our real network-policy YAML from Host Molt — 23 network_policies, 8 unique binaries per policy (184 total binary references). 7 of 8 are /usr/bin/* (exist in the openclaw sandbox image); 1 is /sandbox/.local/bin/gog (a symlinked Molt CLI tool that does not exist in this test sandbox's filesystem).

Observation: warning pattern changed dramatically vs. synchronous version

Previous test (pre-retry version, commit 907b9fec):

Warning path "Cannot access container filesystem for symlink resolution" fires for ALL 8 policy binaries (even non-symlinks), meaning symlink_metadata() itself errors before the symlink check.

Current test (5607fc26, deferred async retry):

  • Total WARN lines: 23
  • Unique path= values: /sandbox/.local/bin/gog only (× 23, once per network_policy)
  • Zero WARN lines for /usr/bin/{python3,node,git,curl,npm,gh,bash}

The 7 existing binaries each resolved through resolve_binary_in_container without warning — meaning symlink_metadata() at /proc/<entrypoint_pid>/root/<path> returned Ok for every one of them. Only the legitimately-missing gog path produced the expected ENOENT warning.

Direct verification from inside the pod

After the restart, entrypoint_pid = 42, confirmed accessible via /proc/42/root/:

$ kubectl -n openshell exec claw-test -c agent -- sh -c '
  ls /proc/42/root/usr/bin/python3
  readlink -f /usr/bin/python3
  ls /proc/42/root/sandbox/.local/bin/gog
'
/proc/42/root/usr/bin/python3
/usr/bin/python3.12
ls: cannot access '/proc/42/root/sandbox/.local/bin/gog': No such file or directory

That's exactly what the fix needs: /proc/<pid>/root/usr/bin/python3 is reachable once the mount namespace is ready, and the symlink chain resolves to python3.12.

What I couldn't test

I wanted to also do a functional A/B — i.e., reproduce a 403 on a symlinked binary under stock and show it gone under patched. That part was inconclusive in this sandbox because our claw-test pod isn't currently running an active egress-enforcement iptables/netfilter interception path (separate issue — I'll dig in or file a new bug once I figure out why). Under both stock and patched, unrestricted HTTPS requests to example.com, slack.com, and even evil.com (not in any network_policy) all returned 200, confirming no policy-level enforcement was reachable to A/B against. Functional proof will have to wait for a sandbox where enforcement is fully wired.

So: no direct 200/403 proof-of-fix, but the log-pattern evidence is crisp — the previous test failed because symlink_metadata couldn't see any binary through /proc/<pid>/root/; this build succeeds for every binary that actually exists. That's the exact behavior change the deferred retry is meant to produce.

Cleanup

Rolled the cluster container back to stock ce1c6e0c126e23a9e95fdb7560eb9653 (v0.0.25). /tmp/openshell-build/ cache kept in case another iteration is needed.

One small nit

RUST_LOG / OPENSHELL_LOG_LEVEL defaults to warn, so the new info!("Container filesystem accessible, resolving policy binary symlinks") and the per-attempt debug!(...) retry lines don't show in default operator logs. That made it harder to see the retry loop working until I cross-checked against the warning pattern. Not a blocker — just worth noting for anyone else reproducing. A one-line OPENSHELL_LOG_LEVEL=info env injection on the sandbox CR is enough to get the full trace.

LGTM from a real-world behavioral standpoint. Thanks for the fast turnaround on the iteration.

Labels

test:e2e Requires end-to-end coverage

Development

Successfully merging this pull request may close these issues.

[Bug]: Sandbox egress proxy checks resolved binary path, not symlink — python3 silently blocked

2 participants