Safety policy for constraining meta-agent modifications

HyperAgents executes model-generated code in a self-improvement loop in which the meta-agent autonomously rewrites the task agent's source. The README correctly flags this as executing "untrusted, model-generated code."

We've put together a safety policy pack that constrains what the meta-agent can do during the optimization loop:

- **Reads**: unrestricted (meta-agent needs to observe task agent performance)
- **Writes**: restricted to `workspace/` only, behind an approval gate (prevents the meta-agent from rewriting the evaluation harness, its own source, or system files)
- **Command execution**: blocked (meta-agent rewrites code; execution goes through the framework)
- **File deletion**: blocked (preserves full optimization history)
- **Network requests**: blocked (closed-loop optimization, no data exfiltration)
- **Rate limit**: 10 tool calls/minute (prevents runaway rewrite cycles)
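
To make the rules above concrete, here is a rough Python sketch of how a decision function and a rate limiter for them could look. It is only an illustration: the names (`decide`, `is_inside_workspace`, `RateLimiter`), the action labels, and the default-deny fallback are assumptions made for this example, not the protect-mcp schema or API.

```python
from collections import deque
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical names throughout; this is not the protect-mcp schema or API.
WORKSPACE_DIR = "workspace"
BLOCKED_ACTIONS = {"exec", "delete", "network"}
MAX_CALLS_PER_MINUTE = 10


def is_inside_workspace(path: str) -> bool:
    """True if a relative path names something under workspace/ with no '..' escape."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False
    return len(p.parts) > 1 and p.parts[0] == WORKSPACE_DIR


def decide(action: str, path: Optional[str] = None) -> str:
    """Map one requested action to 'allow', 'deny', or 'require_approval'."""
    if action == "read":
        return "allow"  # reads are unrestricted
    if action in BLOCKED_ACTIONS:
        return "deny"  # command execution, deletion, and network are blocked
    if action == "write" and path is not None and is_inside_workspace(path):
        return "require_approval"  # writes under workspace/ go through the gate
    return "deny"  # assumption: anything else is denied by default


class RateLimiter:
    """Sliding-window limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int = MAX_CALLS_PER_MINUTE, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

In this sketch, `decide("write", "workspace/../eval/harness.py")` is denied because the path contains `..`, which is the kind of escape the `workspace/` restriction is meant to stop.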

Every allowed and denied action produces a signed receipt. Together, the receipts from a full run form a verifiable audit chain, which is useful for debugging optimization regressions and for reproducibility.
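
As a rough illustration of what a chain of signed receipts can look like, the Python sketch below links each receipt to the hash of the previous one and signs it. The receipt fields, the function names, and the use of HMAC-SHA256 with a shared demo key are all assumptions made for this example; the actual receipt format and signature scheme may differ.

```python
import hashlib
import hmac
import json

# Assumption: HMAC-SHA256 with a shared demo key stands in for real signing.
DEMO_KEY = b"demo-key-not-for-production"
GENESIS = "0" * 64  # placeholder "previous hash" for the first receipt


def canonical(body: dict) -> bytes:
    """Serialize a receipt body deterministically so hashes are reproducible."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def make_receipt(prev_hash: str, action: str, decision: str, key: bytes = DEMO_KEY) -> dict:
    """Build one receipt that commits to the previous receipt's hash."""
    body = {"prev": prev_hash, "action": action, "decision": decision}
    payload = canonical(body)
    return {
        **body,
        "hash": hashlib.sha256(payload).hexdigest(),
        "sig": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }


def verify_chain(receipts: list, key: bytes = DEMO_KEY) -> bool:
    """Check linkage, content hash, and signature of every receipt in order."""
    prev = GENESIS
    for r in receipts:
        body = {"prev": r["prev"], "action": r["action"], "decision": r["decision"]}
        payload = canonical(body)
        if r["prev"] != prev:
            return False  # chain is broken or reordered
        if r["hash"] != hashlib.sha256(payload).hexdigest():
            return False  # receipt content was altered
        expected_sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(r["sig"], expected_sig):
            return False  # signature does not match
        prev = r["hash"]
    return True
```

Because each receipt commits to its predecessor's hash, editing, removing, or reordering any entry other than the last makes `verify_chain` return `False`. Detecting a truncated tail would additionally require recording the final hash somewhere outside the chain.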

The policies are available in both JSON and [Cedar](https://www.cedarpolicy.com/) format (compatible with AWS Verified Permissions):

- JSON: [`hyperagent-sandbox.json`](https://github.com/tomjwxf/ScopeBlindD2/tree/main/examples/hyperagents/hyperagent-sandbox.json)
- Cedar: [`hyperagent-sandbox.cedar`](https://github.com/tomjwxf/ScopeBlindD2/tree/main/examples/hyperagents/hyperagent-sandbox.cedar)

Usage:

```bash
npx protect-mcp --policy hyperagent-sandbox.json --enforce -- python run_agent.py
```

The policy maps to OWASP MCP security controls: MCP-03 (excessive agency), MCP-04 (tool poisoning), MCP-09 (insufficient sandboxing), MCP-10 (lack of audit).

Happy to discuss integration approaches or adjust the policy rules.
