
feat: verify if tokens exposed in sandbox #1857

Merged
ericksoa merged 5 commits into NVIDIA:main from hunglp6d:feat/verify-token-exposed-in-sandbox
Apr 14, 2026


@hunglp6d
Contributor

@hunglp6d hunglp6d commented Apr 14, 2026

Summary

Add deep credential exposure checks (M5a–M5h) to the messaging provider E2E test.
These verify that real bot tokens never appear on any observable surface inside the
sandbox — full environment dump, process cmdlines, and filesystem — beyond the existing
single-env-var check (M3/M4).

Related Issue

Closes #1852

Changes

  • Extend Phase 2 (Credential Isolation) of test/e2e/test-messaging-providers.sh with 8 new assertions (M5a–M5h):
    • M5a/M5e: full env dump must not contain the real Telegram/Discord token.
    • M5b/M5f: process cmdlines must not contain the real Telegram/Discord token.
    • M5c/M5g: recursive filesystem grep (/sandbox, /home, /etc, /tmp, /var) must not find the real token.
    • M5d/M5h: placeholder string must be present in the sandbox environment (positive check).
  • Capture env and process cmdlines once via SSH and reuse across all checks to avoid redundant round-trips.
  • Update header comment to reflect the broader credential isolation scope.
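A minimal sketch of the capture-once approach, assuming a sandbox_exec helper that takes a single command string. Here sandbox_exec is a hypothetical local stand-in (env -i plus sh -c) so the snippet runs without a sandbox or SSH; the token values and the TELEGRAM_BOT_TOKEN variable name are placeholders too. The empty-capture guard that the merged test applies is left out for brevity.

```bash
#!/usr/bin/env bash
set -u

REAL_TOKEN="123456:hypothetical-real-token"   # hypothetical real token
PLACEHOLDER="hypothetical-placeholder"        # hypothetical placeholder value

# ASSUMPTION: local stand-in for the test's sandbox_exec helper. It runs the
# command in a scrubbed environment that carries only the placeholder, so the
# sketch works without a sandbox or SSH.
sandbox_exec() {
  env -i PATH="$PATH" TELEGRAM_BOT_TOKEN="$PLACEHOLDER" sh -c "$1"
}

# One round-trip; both checks below reuse this snapshot.
sandbox_env_all=$(sandbox_exec 'env' 2>/dev/null || true)

# Negative check (M5a-style): the real token must be absent.
if [[ $sandbox_env_all == *"$REAL_TOKEN"* ]]; then
  echo "FAIL M5a: real token found in env dump"
else
  echo "PASS M5a: real token absent from env dump"
fi

# Positive check (M5d-style): the placeholder must be present.
if [[ $sandbox_env_all == *"TELEGRAM_BOT_TOKEN=$PLACEHOLDER"* ]]; then
  echo "PASS M5d: placeholder present"
else
  echo "FAIL M5d: placeholder missing"
fi
```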

Type of Change

  • Code change for a new feature, bug fix, or refactor.

Testing

  • npx prek run --all-files passes (or equivalently make check).
  • npm test passes.
  • make docs builds without warnings (for doc-only changes).

Checklist

General

Code Changes

  • Formatters applied — npx prek run --all-files auto-fixes formatting (or make format for targeted runs).
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Signed-off-by: Hung Le [email protected]

Summary by CodeRabbit

  • Tests
    • Extended credential-isolation checks for Telegram and Discord: verify real tokens are absent from sandbox environment variables, process listings, and filesystem scans.
    • Added safe remote execution mode that pipes sensitive input via stdin to avoid exposing tokens in command lines.
    • Added provider placeholder presence checks and new capture-and-evaluate points with pass/fail/skip outcomes for improved observability.

@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1e0780bc-a997-47c9-8e24-d888fbf92bc6

📥 Commits

Reviewing files that changed from the base of the PR and between 899f4b5 and 4b31a68.

📒 Files selected for processing (1)
  • test/e2e/test-messaging-providers.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/test-messaging-providers.sh

📝 Walkthrough


Adds Phase 2 credential-isolation checks that capture the sandbox environment, process cmdlines, and perform recursive filesystem searches for real Telegram/Discord tokens; introduces sandbox_exec_stdin() to run sandbox commands with sensitive patterns via STDIN and asserts absence/presence with pass/fail/skip logic.
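The core of the stdin approach can be sketched locally. The token is written to grep's standard input and read with -f - (GNU grep reads patterns from standard input when the pattern file is -), so the search process's argv contains only flags and paths. This assumes GNU grep; the directory tree and token are hypothetical, and in the test itself the grep command string would be handed to sandbox_exec_stdin so it runs inside the sandbox.

```bash
#!/usr/bin/env bash
set -u

TOKEN="123456:hypothetical-real-token"   # hypothetical secret
root=$(mktemp -d)                        # stands in for the sandbox filesystem
mkdir -p "$root/etc" "$root/tmp"
printf 'bot_token=%s\n' "$TOKEN" > "$root/tmp/leaked.conf"   # planted leak
printf 'bot_token=placeholder\n' > "$root/etc/clean.conf"

# printf is a bash builtin, so no new program is exec'd with the token as an
# argument. grep reads the fixed-string pattern (-F) from stdin (-f -) and
# lists matching files (-l) recursively (-r); its argv holds only flags/paths.
hits=$(printf '%s\n' "$TOKEN" | grep -rFl -f - "$root/etc" "$root/tmp" 2>/dev/null || true)

printf '%s\n' "$hits"
rm -rf "$root"
```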

Changes

Phase 2 Credential Isolation & helpers (test/e2e/test-messaging-providers.sh): Added sandbox_exec_stdin() to run sandbox commands with sensitive patterns via STDIN. Captures the full sandbox env (sandbox_env_all), process cmdlines (sandbox_ps), and recursive filesystem greps for Telegram/Discord tokens (sandbox_fs_tg, sandbox_fs_dc). Added Phase 2 pass/fail/skip checks for real-token absence and placeholder presence.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CI as Tester (CI)
    participant SSH as SSH Proxy
    participant Guest as Sandbox Guest
    participant FS as Sandbox Filesystem

    CI->>SSH: invoke sandbox_exec_stdin(cmd, token-pattern via STDIN)
    SSH->>Guest: execute command (capture `env`, read /proc/*/cmdline, grep filesystem paths)
    Guest->>FS: read files under /sandbox /home /etc /tmp /var
    Guest-->>SSH: return env dump, process cmdlines, grep results
    SSH-->>CI: stdout captured into sandbox_env_all, sandbox_ps, sandbox_fs_*
    CI->>CI: evaluate outputs -> pass / fail / skip for real tokens & placeholders

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I hop through env and files at night,
I whisper tokens through a secret pipe,
I peek at procs and every tree,
I guard the sandbox silently,
Carrots up — no secrets in sight. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title directly describes the main feature: verifying token exposure in the sandbox, which is the primary objective of this changeset.
  • Linked Issues check: ✅ Passed. The PR implements all five coding requirements from issue #1852: capturing environment and process cmdlines via SSH, asserting real tokens are absent from env/cmdlines/filesystem, and confirming placeholder presence.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to extending Phase 2 credential isolation checks in test-messaging-providers.sh as specified in issue #1852; no unrelated modifications are present.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required 80.00% threshold.



@hunglp6d hunglp6d force-pushed the feat/verify-token-exposed-in-sandbox branch 4 times, most recently from 8e8ca0e to 32a0d83 on April 14, 2026 at 02:46
@hunglp6d hunglp6d marked this pull request as ready for review April 14, 2026 02:51
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 286-287: The snapshot captures using sandbox_exec (variables
sandbox_env_all and sandbox_ps) may return empty strings on SSH/remote failure,
causing false PASS; modify the test logic so after calling sandbox_exec for
env/ps (and the other captures at 289-300, 323-334) you validate the returned
snapshot is non-empty and parseable before proceeding: if sandbox_env_all or
sandbox_ps (or other snapshot variables) are empty or fail basic parsing, mark
the check as failed/skipped (exit non-zero or emit a failing test result) and do
not run the token-leak assertions (M5a/M5b/M5e/M5f) until a confirmed successful
capture exists; use the sandbox_exec wrapper name and the snapshot variable
names to locate and gate the subsequent token checks.
- Around line 303-305: The grep call embeds TELEGRAM_TOKEN into the remote
command argv (variable sandbox_fs_tg via sandbox_exec), exposing the secret in
the sandbox process list; change the approach to send the token via stdin
instead of the command line (e.g., use grep -Ff - or an equivalent remote helper
that reads patterns from stdin) and pipe TELEGRAM_TOKEN into sandbox_exec so the
token never appears in the remote argv; apply the same change to the other
occurrence around lines 337-338 to ensure both searches use stdin-based pattern
input rather than embedding $TELEGRAM_TOKEN in the SSH/command string.
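The false-PASS risk in the first comment is easy to reproduce: when the remote call fails, $(... || true) yields an empty string, and a substring check on that string reports the token as absent. A minimal sketch of gating on a non-empty capture, where failing_sandbox_exec is a hypothetical stand-in for an SSH call that errors out:

```bash
#!/usr/bin/env bash
set -u

REAL_TOKEN="123456:hypothetical-real-token"   # hypothetical token

# Hypothetical stand-in for sandbox_exec when the SSH hop fails.
failing_sandbox_exec() { return 255; }

sandbox_env_all=$(failing_sandbox_exec 'env' 2>/dev/null || true)

# Naive check: an empty snapshot trivially "does not contain" the token.
naive_result() {
  if [[ $1 == *"$REAL_TOKEN"* ]]; then echo FAIL; else echo PASS; fi
}

# Gated check: no snapshot means no evidence, so report SKIP instead of PASS.
gated_result() {
  if [ -z "$1" ]; then
    echo SKIP
  elif [[ $1 == *"$REAL_TOKEN"* ]]; then
    echo FAIL
  else
    echo PASS
  fi
}

echo "naive: $(naive_result "$sandbox_env_all")"   # naive: PASS (false green)
echo "gated: $(gated_result "$sandbox_env_all")"   # gated: SKIP
```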


📥 Commits

Reviewing files that changed from the base of the PR and between 230d27e and 32a0d83.

📒 Files selected for processing (1)
  • test/e2e/test-messaging-providers.sh

@hunglp6d hunglp6d force-pushed the feat/verify-token-exposed-in-sandbox branch from 32a0d83 to 00a5064 on April 14, 2026 at 05:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
test/e2e/test-messaging-providers.sh (2)

337-337: Add --max-count=1 to grep for early exit.

Once the first match is found, the check fails regardless of additional matches. Adding -m 1 avoids scanning the entire filesystem unnecessarily.

⚡ Proposed optimization
-sandbox_fs_tg=$(printf '%s' "$TELEGRAM_TOKEN" | sandbox_exec_stdin "grep -rFl -f /dev/stdin /sandbox /home /etc /tmp /var 2>/dev/null || true")
+sandbox_fs_tg=$(printf '%s' "$TELEGRAM_TOKEN" | sandbox_exec_stdin "grep -rFl -m 1 -f /dev/stdin /sandbox /home /etc /tmp /var 2>/dev/null || true")

Apply the same change to Line 374 for the Discord token search.
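One caveat on this nitpick: per the GNU grep manual, -l already stops scanning each file at its first match, and -m counts matching lines per input file, so with -r neither option ends the directory walk early. The sketch below (hypothetical token and directory, GNU grep assumed) checks that -rFl and -rFl -m 1 return the same file list.

```bash
#!/usr/bin/env bash
set -u

TOKEN="123456:hypothetical-real-token"   # hypothetical secret
root=$(mktemp -d)
for d in a b c; do
  mkdir -p "$root/$d"
  # Two matching lines per file, so -m 1 would have something to cut.
  printf '%s\n%s\n' "$TOKEN" "$TOKEN" > "$root/$d/leak.txt"
done

with_l=$(printf '%s\n' "$TOKEN" | grep -rFl -f - "$root" | sort)
with_l_m1=$(printf '%s\n' "$TOKEN" | grep -rFl -m 1 -f - "$root" | sort)

# Both runs list all three files: -m is applied per file, so it does not end
# the recursive walk, and -l already stops reading a file at its first match.
if [ "$with_l" = "$with_l_m1" ]; then
  echo "identical output ($(printf '%s\n' "$with_l" | grep -c leak.txt) files)"
fi
rm -rf "$root"
```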


307-309: Consider using consistent capture methods.

Line 307 uses sandbox_exec while Lines 308-309 use openshell sandbox exec directly. Both work, but sandbox_exec provides consistent timeout handling and error suppression. Using the same helper for both captures would improve maintainability.

♻️ Suggested change for consistency
 sandbox_env_all=$(sandbox_exec "env 2>/dev/null" 2>/dev/null || true)
-sandbox_ps=$(openshell sandbox exec -n "$SANDBOX_NAME" -- \
-  sh -c 'cat /proc/[0-9]*/cmdline 2>/dev/null | tr "\0" "\n"' 2>/dev/null || true)
+sandbox_ps=$(sandbox_exec 'cat /proc/[0-9]*/cmdline 2>/dev/null | tr "\0" "\n"' 2>/dev/null || true)
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-messaging-providers.sh`:
- Around line 102-121: The sandbox_exec_stdin helper uses remote grep with -f
/dev/stdin which can silently fail when /dev/stdin does not exist on the remote,
producing false negatives; update sandbox_exec_stdin to send the pattern via a
temporary file on the remote (or use a here-doc/process substitution that is
supported by the remote shell) instead of relying on /dev/stdin, ensuring the
temp file is created on the remote before running grep -f, cleaned up afterward,
and that callers (the M5c/M5g checks) continue to pass the token pattern into
sandbox_exec_stdin unchanged; reference the sandbox_exec_stdin function to
locate where to implement remote temp-file creation, grep invocation change, and
cleanup.
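A minimal sketch of the temp-file variant this comment proposes, with sh -c standing in for the remote shell and scan_with_tempfile as a hypothetical helper name. It assumes mktemp and GNU grep on the remote. One detail matters for this test: the scan covers /tmp, so a pattern file created there by mktemp would itself contain the token and be reported as a leak; the sketch skips it by base name with --exclude. TMPDIR is pointed into the scanned tree here to exercise that case.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper: the pattern arrives on stdin, the arguments are the
# paths to scan. `sh -c` stands in for the remote shell.
scan_with_tempfile() {
  sh -c '
    tmp=$(mktemp) || exit 2
    cat > "$tmp"                      # copy the pattern from stdin
    # The temp file may sit inside a scanned tree (e.g. /tmp), so skip it
    # by base name or it would be reported as a leak.
    grep -rFl --exclude="${tmp##*/}" -f "$tmp" "$@" 2>/dev/null
    rc=$?
    rm -f "$tmp"                      # clean up before propagating status
    exit "$rc"
  ' sh "$@"
}

TOKEN="123456:hypothetical-real-token"   # hypothetical secret
root=$(mktemp -d)
mkdir -p "$root/tmp" "$root/etc"
export TMPDIR="$root/tmp"                # pattern file lands in a scanned tree
printf 'token=%s\n' "$TOKEN" > "$root/etc/leak.conf"

hits=$(printf '%s\n' "$TOKEN" | scan_with_tempfile "$root" || true)
leftover=$(ls -A "$root/tmp")            # should be empty after cleanup
printf '%s\n' "$hits"
rm -rf "$root"
```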



📥 Commits

Reviewing files that changed from the base of the PR and between 00a5064 and 899f4b5.

📒 Files selected for processing (1)
  • test/e2e/test-messaging-providers.sh

Contributor

@ericksoa ericksoa left a comment


Looks good — approving. The core logic is correct, stdin-based pattern passing avoids leaking tokens in the remote argv, skip-on-empty prevents false greens, and the CodeRabbit issues have all been addressed.

A few minor items to clean up in a follow-up PR:

  1. Inconsistent capture helper: sandbox_env_all uses sandbox_exec (SSH with timeout) but sandbox_ps calls openshell sandbox exec directly. Use sandbox_exec for both so timeout/error handling is consistent.

  2. Repetitive check blocks: the 8 checks follow 3 repeating patterns duplicated for Telegram/Discord. A small helper like assert_token_absent "$surface_data" "$TOKEN" "M5a" "description" would cut ~60 lines and make adding future providers easier.

  3. Guard against empty token variable: if $TELEGRAM_TOKEN or $DISCORD_TOKEN were somehow unset, grep -f - with empty stdin matches nothing and M5c/M5g pass trivially. A [ -z "$TOKEN" ] && skip guard before the filesystem greps would add a safety net for a security test.

  4. Temp file cleanup on signal: sandbox_exec_stdin leaks the ssh-config temp file if the script is killed mid-SSH. A trap would be more robust (low priority since this runs on ephemeral infra).
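A sketch of items 2 to 4 under hypothetical names: assert_token_absent follows the signature suggested in item 2, pass/fail/skip stand in for whatever reporting helpers the script already has, and ssh_cfg represents the temp file from item 4. The empty-token branch covers item 3 (per the GNU grep manual, an empty pattern file matches nothing); it also keeps the bash substring test used here from matching everything when the token is empty.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical reporting helpers with simple counters.
PASS=0 FAIL=0 SKIP=0
pass() { PASS=$((PASS + 1)); echo "PASS $1"; }
fail() { FAIL=$((FAIL + 1)); echo "FAIL $1"; }
skip() { SKIP=$((SKIP + 1)); echo "SKIP $1"; }

# usage: assert_token_absent SURFACE_DATA TOKEN ID DESCRIPTION
assert_token_absent() {
  local data=$1 token=$2 id=$3 desc=$4
  if [ -z "$token" ]; then
    skip "$id: $desc (token unset, nothing to search for)"   # item 3
  elif [ -z "$data" ]; then
    skip "$id: $desc (surface capture was empty)"
  elif [[ $data == *"$token"* ]]; then
    fail "$id: $desc"
  else
    pass "$id: $desc"
  fi
}

# Item 4: remove the temp file even if the run is interrupted. INT/TERM
# are turned into a normal exit so the EXIT trap performs the cleanup.
ssh_cfg=$(mktemp)
trap 'rm -f "$ssh_cfg"' EXIT
trap 'exit 130' INT
trap 'exit 143' TERM

# Example calls against hypothetical snapshots.
env_dump=$'TELEGRAM_BOT_TOKEN=placeholder\nHOME=/sandbox'
assert_token_absent "$env_dump" "123456:real" M5a "real Telegram token absent from env"
assert_token_absent "$env_dump" "" M5e "real Discord token absent from env"
assert_token_absent "" "123456:real" M5b "real Telegram token absent from cmdlines"
```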

None of these block merge. Nice work on the credential isolation coverage. 👍

@ericksoa ericksoa merged commit da055a5 into NVIDIA:main Apr 14, 2026
8 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Verify real bot token is never exposed inside the sandbox

3 participants