Safety policy for constraining meta-agent modifications #17

@tomjwxf

HyperAgents executes model-generated code in a self-improvement loop where the meta-agent rewrites task agent source autonomously. The README correctly flags this as executing "untrusted, model-generated code."

We've put together a safety policy pack that constrains what the meta-agent can do during the optimization loop:

  • Reads: unrestricted (meta-agent needs to observe task agent performance)
  • Writes: restricted to workspace/ only, behind an approval gate (prevents the meta-agent from rewriting the evaluation harness, its own source, or system files)
  • Command execution: blocked (meta-agent rewrites code; execution goes through the framework)
  • File deletion: blocked (preserves full optimization history)
  • Network requests: blocked (closed-loop optimization, no data exfiltration)
  • Rate limit: 10 tool calls/minute (prevents runaway rewrite cycles)
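To make the rules above concrete, here is a minimal Python sketch of how such a policy could be evaluated. It is not the protect-mcp implementation or its policy schema: the class name `SandboxPolicy`, the action names (`read`, `write`, `exec`, `delete`, `network`) and the decision strings are all assumptions chosen for illustration. It also only normalizes paths lexically, so it does not account for symlinks.

```python
import posixpath
import time
from collections import deque

# All names below are hypothetical; they only mirror the rules listed above.
MAX_CALLS = 10      # tool calls allowed per window
WINDOW_S = 60.0     # window length in seconds


def _inside_workspace(path):
    """Return True if the path resolves lexically to workspace/ or below."""
    if not path:
        return False
    norm = posixpath.normpath(path)  # collapses "a/../b"; does not follow symlinks
    return norm == "workspace" or norm.startswith("workspace/")


class SandboxPolicy:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._calls = deque()  # timestamps of recent accepted calls

    def _rate_ok(self):
        now = self._clock()
        # Drop timestamps that have fallen out of the sliding window.
        while self._calls and now - self._calls[0] >= WINDOW_S:
            self._calls.popleft()
        if len(self._calls) >= MAX_CALLS:
            return False
        self._calls.append(now)
        return True

    def decide(self, action, path=None):
        if not self._rate_ok():
            return "deny:rate_limit"
        if action == "read":
            return "allow"                      # reads are unrestricted
        if action == "write":
            if _inside_workspace(path):
                return "needs_approval"         # approval gate
            return "deny:outside_workspace"
        # exec, delete, network, and anything unrecognized are blocked.
        return "deny:blocked"
```

For example, `SandboxPolicy().decide("write", "workspace/../harness/eval.py")` is denied because the normalized path leaves workspace/, while a write to `workspace/agent.py` is routed to the approval gate. Defaulting unknown actions to deny is a design choice in this sketch, not something the policy pack is known to do.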

Every allowed and denied action produces a signed receipt. Together, the receipts from a full run form a verifiable audit chain, which is useful for debugging optimization regressions and for reproducibility.
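One way to picture a signed, chained receipt log is the sketch below. It is an illustration only: the field names (`prev`, `action`, `decision`, `hash`, `sig`) are assumptions, and HMAC-SHA256 with a shared key stands in for whatever signature scheme the tool actually uses. Each receipt commits to the hash of the previous one, so editing or dropping an entry breaks verification.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first receipt


def _payload(prev_hash, action, decision):
    # Canonical JSON so the same fields always serialize to the same bytes.
    body = {"prev": prev_hash, "action": action, "decision": decision}
    return body, json.dumps(body, sort_keys=True).encode()


def make_receipt(key, prev_hash, action, decision):
    """Build one receipt that is linked to the previous receipt's hash."""
    body, payload = _payload(prev_hash, action, decision)
    return {
        **body,
        "hash": hashlib.sha256(payload).hexdigest(),
        "sig": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }


def verify_chain(key, receipts):
    """Check links, hashes, and signatures from the first receipt onward."""
    prev = GENESIS
    for r in receipts:
        _, payload = _payload(r["prev"], r["action"], r["decision"])
        expected_sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if r["prev"] != prev:
            return False  # chain link broken (entry removed or reordered)
        if r["hash"] != hashlib.sha256(payload).hexdigest():
            return False  # contents do not match the recorded hash
        if not hmac.compare_digest(r["sig"], expected_sig):
            return False  # signature mismatch
        prev = r["hash"]
    return True
```

A run would append one receipt per tool call, passing the previous receipt's `hash` as `prev_hash`. A real deployment would more likely use an asymmetric signature so that verifiers do not need the signing key.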

The policies are available in both JSON and Cedar formats (the Cedar version is compatible with AWS Verified Permissions).

Usage:

npx protect-mcp --policy hyperagent-sandbox.json --enforce -- python run_agent.py

The policy maps to OWASP MCP security controls: MCP-03 (excessive agency), MCP-04 (tool poisoning), MCP-09 (insufficient sandboxing), MCP-10 (lack of audit).

Happy to discuss integration approaches or adjust the policy rules.
