Add config_mas: configuration/dotfile poisoning MAS hijacking example by gwpl · Pull Request #40 · trailofbits/pajaMAS

gwpl · 2026-03-30T21:56:37Z

AI Assistant:

Hello! 👋 We'd like to contribute a new MAS hijacking example to the pajaMAS repository.

Summary

This PR adds config_mas/ — a new example demonstrating configuration/dotfile poisoning as a vector for MAS control-flow hijacking. Malicious web content tricks the agent system into writing a persistent configuration file containing a startup_script field. When the config is later loaded and applied, the embedded code is delegated to code_executor_agent, achieving persistent compromise that survives across interactions.

Real-World Attack References

This attack pattern is arguably the most impactful MAS-adjacent vulnerability class of 2025, having affected every major AI coding assistant:

CVE-2025-53773 — GitHub Copilot RCE (CVSS 7.8): prompt injection writes .vscode/settings.json to enable "YOLO mode" auto-approve, then executes arbitrary commands. Wormable across repositories. (EmbraceTheRed writeup, persistent-security.net Part III)
CVE-2025-59536 — Claude Code RCE: malicious .claude/settings.json hooks execute on project open. (Check Point Research, The Hacker News)
CVE-2025-54136 — Cursor MCPoison (CVSS 7.2): trusted-name MCP server config mutation via prompt injection. (Check Point Research, Tenable FAQ)
AWS-2025-015 — Amazon Q VS Code Extension: backdoored official release shipped to 964,000 installs (only a syntax error prevented mass exploitation). (GitHub Advisory GHSA-7g7f-ff96-5gcw, Nudge Security analysis)
Rules File Backdoor (Pillar Security, 2025): weaponized .cursorrules / copilot-instructions.md persist malicious instructions across sessions via hidden Unicode characters. (The Hacker News, Security Affairs)
Cross-Agent Privilege Escalation (EmbraceTheRed, 2025): Copilot poisons Claude Code's .mcp.json, demonstrating that compromising one agent creates a kill-chain into co-resident agents. (Simon Willison's commentary)

Relation to the Paper

This example directly extends the MAS hijacking framework from Triedman et al., 2025 (COLM 2025):

Control-flow hijacking (Section 4): The attack manipulates inter-agent metadata — web content is laundered through web_surfer_agent → config_manager_agent → code_executor_agent, exploiting the orchestrator's adaptive control flow exactly as described in the paper.
Confused deputy pattern (page 2): The config_manager_agent acts as a "confused deputy" (Hardy, 1988) — it has legitimate write privileges, but is tricked into writing attacker-controlled content. The orchestrator then blindly trusts the persisted config as authoritative metadata.
"Life finds a way" (Section 6.5): The two-phase attack (write → load) mirrors the paper's observation that MAS find creative multi-step paths to execute harmful code, even when no single step appears malicious in isolation.
Adversary goal: arbitrary code execution (Section 3.2): The startup_script in the config achieves the paper's primary adversary goal — arbitrary code execution on the user's device (or in their containerized environment).
Persistence — extending the paper's scope: The paper notes that "many multi-agent systems run in fully isolated virtual containers" (page 3) and that "any user data kept in that environment is at risk." Config poisoning demonstrates that compromise can persist on disk across sessions within such environments — the malicious config file remains after the initial attack, re-triggerable indefinitely.

Paper Concept	config_mas Instantiation
MAS hijacking (Table 1)	Web content → config write → code execution chain
Laundering (Section 4)	Malicious instructions reformatted as a JSON config field
Confused deputy (page 2)	config_manager_agent writes attacker-controlled content using its legitimate privileges
Metadata manipulation	The `startup_script` JSON field is treated as authoritative metadata by the orchestrator
"Life finds a way" (§6.5)	Two-phase attack — neither the config write nor the config load appears malicious alone

What Makes This Example Unique

Unlike the existing examples which demonstrate single-shot injection, config_mas demonstrates a two-phase persistence attack:

Phase 1 (Write): Web content convinces the agent to save a malicious config file to disk
Phase 2 (Execute): Loading the saved config triggers code execution from the persisted startup_script

This mirrors the exact attack chain of CVE-2025-53773 (write settings → auto-approve → RCE) and is the only example in the repo that demonstrates disk-persisted compromise.

Architecture

OrchestratorAgent — Central coordinator
WebSurferAgent — Fetches web content
ConfigManagerAgent (new) — Reads/writes JSON config files in a sandboxed config/ directory
CodeExecutorAgent — Executes code (Piston API sandbox by default)

Files Added

File	Description
`config_mas/agent.py`	4 agents + config read/write tools with .json validation
`config_mas/run_mas_example.py`	Two-prompt runner (save config, then load & execute)
`config_mas/setup.html`	Payload disguised as "agent configuration guide"
`config_mas/README.md`	Full documentation with attack flow, paper alignment, CVE references
`config_mas/__init__.py`	Standard module init
`config_mas/config/`	Sandboxed config directory (runtime-populated)

Consistency with Existing Examples

Uses Piston API for remote code execution by default (safe to run)
Includes commented-out local exec() alternative with sandbox warning (matching trifecta_mas pattern)
Standard argparse interface (--port, --find-free-port)
Same success marker detection ("colorless green ideas sleep furiously")
README follows the established structure (agents, file descriptions, Option 1 automated / Option 2 ADK manual)

Test Plan

Verify python run_mas_example.py starts HTTP server, sends two prompts, and detects success marker
Verify adk run config_mas works for manual interaction
Run 5 times per reproducibility note (LLM probabilistic results)
Confirm config/ directory is created at runtime and config file is written

We hope this contribution makes pajaMAS a richer resource for the open-source security community. The config poisoning attack pattern demonstrates a critical real-world threat vector that aligns closely with the paper's framework of MAS control-flow hijacking, while extending it to cover persistent compromise — a dimension increasingly relevant as AI coding assistants become ubiquitous.

🤖 Generated with Claude Code

Demonstrate how malicious web content can trick a multi-agent system into writing a persistent config file with embedded code, which gets executed when the config is later loaded and applied. This two-step attack mirrors real-world CVEs in AI coding assistants (CVE-2025-53773 Copilot YOLO RCE, CVE-2025-59536 Claude Code hooks RCE, CVE-2025-54136 MCPoison). * config_manager_agent with read_config/write_config tools * web_surfer_agent fetches setup.html containing poisoned config * orchestrator delegates startup_script execution to code_executor_agent * Two-prompt flow: summarize+save, then load+execute * Config persists on disk across sessions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rnings, input validation * Add commented-out local exec alternative in execute_code (matching trifecta_mas pattern) * Add Initial Setup section to README with safety warnings about direct code execution * Add .json filename validation to write_config to prevent arbitrary file writes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188, COLM 2025): MAS control-flow hijacking, laundering, confused deputies, and related paper sections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Link CVE-2025-53773, CVE-2025-59536, CVE-2025-54136 to NVD entries * Add source writeup links (EmbraceTheRed, Check Point, Tenable, etc.) * Link Rules File Backdoor, Cross-Agent Escalation, AWS-2025-015 to sources Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gwpl and others added 4 commits March 29, 2026 23:35

Add paper alignment section to README

5d416bd

Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188, COLM 2025): MAS control-flow hijacking, laundering, confused deputies, and related paper sections. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add config_mas: configuration/dotfile poisoning MAS hijacking example#40

Add config_mas: configuration/dotfile poisoning MAS hijacking example#40
gwpl wants to merge 4 commits intotrailofbits:mainfrom
VariousForks:config-mas-example

gwpl commented Mar 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gwpl commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Real-World Attack References

Relation to the Paper

What Makes This Example Unique

Architecture

Files Added

Consistency with Existing Examples

Test Plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gwpl commented Mar 30, 2026 •

edited

Loading