Problem
NemoClaw provides strong container-level isolation via OpenShell (Landlock, seccomp, network namespaces), but has no application-layer scanning of tool inputs and outputs for prompt injection attacks. An agent inside the sandbox can receive injected instructions through tool responses (e.g., fetched web content, file contents) that attempt to override system prompts, manipulate tool calls, or exfiltrate data.
Proposal
Add a self-contained injection scanner module under nemoclaw/src/security/ that can optionally be wired into the sandbox tool-call flow as a pre/post-check.
Design
15 regex patterns across 4 categories:
- Role/system prompt overrides (4 patterns): "you are now", "ignore previous instructions", system tags, system colon-prefix
- Instruction injection (5 patterns): IMPORTANT, CRITICAL, OVERRIDE,
[INST], <<SYS>>
- Tool manipulation (3 patterns): "call/invoke/use tool", "use function", "execute command"
- Data exfiltration (3 patterns): "base64 encode", "send/post/upload to", POST+secret/token/key
Evasion resistance:
- NFKC unicode normalization (catches homoglyph attacks via fullwidth characters)
- Zero-width character stripping (U+200B, U+200C, U+200D, U+FEFF)
- Control character stripping (preserves newlines/tabs)
- Base64 decode-and-rescan for encoded payloads (with strict alphabet validation)
Severity tiers: high / medium / low with helper functions for escalation decisions.
Integration: Exported scanFields() function that accepts Record<string, string> and returns findings. No dependencies on NemoClaw internals. Can be called before forwarding tool inputs to the sandbox or after receiving tool outputs.
Scope
- New file:
nemoclaw/src/security/injection-scanner.ts
- New file:
nemoclaw/src/security/injection-scanner.test.ts
- No changes to existing NemoClaw code
- Full Vitest test coverage
Non-goals
- Runtime integration with OpenShell's tool-call flow (future work)
- Blocking/escalation logic (consumer decides what to do with findings)
- LLM-based detection (this is deterministic regex-based scanning)
Problem
NemoClaw provides strong container-level isolation via OpenShell (Landlock, seccomp, network namespaces), but has no application-layer scanning of tool inputs and outputs for prompt injection attacks. An agent inside the sandbox can receive injected instructions through tool responses (e.g., fetched web content, file contents) that attempt to override system prompts, manipulate tool calls, or exfiltrate data.
Proposal
Add a self-contained injection scanner module under
nemoclaw/src/security/that can optionally be wired into the sandbox tool-call flow as a pre/post-check.Design
15 regex patterns across 4 categories:
[INST],<<SYS>>Evasion resistance:
Severity tiers: high / medium / low with helper functions for escalation decisions.
Integration: Exported
scanFields()function that acceptsRecord<string, string>and returns findings. No dependencies on NemoClaw internals. Can be called before forwarding tool inputs to the sandbox or after receiving tool outputs.Scope
nemoclaw/src/security/injection-scanner.tsnemoclaw/src/security/injection-scanner.test.tsNon-goals