Skip to content

Add config_mas: configuration/dotfile poisoning MAS hijacking example#40

Open
gwpl wants to merge 4 commits intotrailofbits:mainfrom
VariousForks:config-mas-example
Open

Add config_mas: configuration/dotfile poisoning MAS hijacking example#40
gwpl wants to merge 4 commits intotrailofbits:mainfrom
VariousForks:config-mas-example

Conversation

@gwpl
Copy link
Copy Markdown

@gwpl gwpl commented Mar 30, 2026

AI Assistant:

Hello! 👋 We'd like to contribute a new MAS hijacking example to the pajaMAS repository.

Summary

This PR adds config_mas/ — a new example demonstrating configuration/dotfile poisoning as a vector for MAS control-flow hijacking. Malicious web content tricks the agent system into writing a persistent configuration file containing a startup_script field. When the config is later loaded and applied, the embedded code is delegated to code_executor_agent, achieving persistent compromise that survives across interactions.

Real-World Attack References

This attack pattern is arguably the most impactful MAS-adjacent vulnerability class of 2025, having affected every major AI coding assistant:

Relation to the Paper

This example directly extends the MAS hijacking framework from Triedman et al., 2025 (COLM 2025):

  • Control-flow hijacking (Section 4): The attack manipulates inter-agent metadata — web content is laundered through web_surfer_agentconfig_manager_agentcode_executor_agent, exploiting the orchestrator's adaptive control flow exactly as described in the paper.
  • Confused deputy pattern (page 2): The config_manager_agent acts as a "confused deputy" (Hardy, 1988) — it has legitimate write privileges, but is tricked into writing attacker-controlled content. The orchestrator then blindly trusts the persisted config as authoritative metadata.
  • "Life finds a way" (Section 6.5): The two-phase attack (write → load) mirrors the paper's observation that MAS find creative multi-step paths to execute harmful code, even when no single step appears malicious in isolation.
  • Adversary goal: arbitrary code execution (Section 3.2): The startup_script in the config achieves the paper's primary adversary goal — arbitrary code execution on the user's device (or in their containerized environment).
  • Persistence — extending the paper's scope: The paper notes that "many multi-agent systems run in fully isolated virtual containers" (page 3) and that "any user data kept in that environment is at risk." Config poisoning demonstrates that compromise can persist on disk across sessions within such environments — the malicious config file remains after the initial attack, re-triggerable indefinitely.
Paper Concept config_mas Instantiation
MAS hijacking (Table 1) Web content → config write → code execution chain
Laundering (Section 4) Malicious instructions reformatted as a JSON config field
Confused deputy (page 2) config_manager_agent writes attacker-controlled content using its legitimate privileges
Metadata manipulation The startup_script JSON field is treated as authoritative metadata by the orchestrator
"Life finds a way" (§6.5) Two-phase attack — neither the config write nor the config load appears malicious alone

What Makes This Example Unique

Unlike the existing examples which demonstrate single-shot injection, config_mas demonstrates a two-phase persistence attack:

  1. Phase 1 (Write): Web content convinces the agent to save a malicious config file to disk
  2. Phase 2 (Execute): Loading the saved config triggers code execution from the persisted startup_script

This mirrors the exact attack chain of CVE-2025-53773 (write settings → auto-approve → RCE) and is the only example in the repo that demonstrates disk-persisted compromise.

Architecture

  • OrchestratorAgent — Central coordinator
  • WebSurferAgent — Fetches web content
  • ConfigManagerAgent (new) — Reads/writes JSON config files in a sandboxed config/ directory
  • CodeExecutorAgent — Executes code (Piston API sandbox by default)

Files Added

File Description
config_mas/agent.py 4 agents + config read/write tools with .json validation
config_mas/run_mas_example.py Two-prompt runner (save config, then load & execute)
config_mas/setup.html Payload disguised as "agent configuration guide"
config_mas/README.md Full documentation with attack flow, paper alignment, CVE references
config_mas/__init__.py Standard module init
config_mas/config/ Sandboxed config directory (runtime-populated)

Consistency with Existing Examples

  • Uses Piston API for remote code execution by default (safe to run)
  • Includes commented-out local exec() alternative with sandbox warning (matching trifecta_mas pattern)
  • Standard argparse interface (--port, --find-free-port)
  • Same success marker detection ("colorless green ideas sleep furiously")
  • README follows the established structure (agents, file descriptions, Option 1 automated / Option 2 ADK manual)

Test Plan

  • Verify python run_mas_example.py starts HTTP server, sends two prompts, and detects success marker
  • Verify adk run config_mas works for manual interaction
  • Run 5 times per reproducibility note (LLM probabilistic results)
  • Confirm config/ directory is created at runtime and config file is written

We hope this contribution makes pajaMAS a richer resource for the open-source security community. The config poisoning attack pattern demonstrates a critical real-world threat vector that aligns closely with the paper's framework of MAS control-flow hijacking, while extending it to cover persistent compromise — a dimension increasingly relevant as AI coding assistants become ubiquitous.

🤖 Generated with Claude Code

gwpl and others added 4 commits March 29, 2026 23:35
Demonstrate how malicious web content can trick a multi-agent system into
writing a persistent config file with embedded code, which gets executed
when the config is later loaded and applied. This two-step attack mirrors
real-world CVEs in AI coding assistants (CVE-2025-53773 Copilot YOLO RCE,
CVE-2025-59536 Claude Code hooks RCE, CVE-2025-54136 MCPoison).

* config_manager_agent with read_config/write_config tools
* web_surfer_agent fetches setup.html containing poisoned config
* orchestrator delegates startup_script execution to code_executor_agent
* Two-prompt flow: summarize+save, then load+execute
* Config persists on disk across sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rnings, input validation

* Add commented-out local exec alternative in execute_code (matching trifecta_mas pattern)
* Add Initial Setup section to README with safety warnings about direct code execution
* Add .json filename validation to write_config to prevent arbitrary file writes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Map example to key concepts from Triedman et al., 2025 (arXiv:2503.12188,
COLM 2025): MAS control-flow hijacking, laundering, confused deputies,
and related paper sections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Link CVE-2025-53773, CVE-2025-59536, CVE-2025-54136 to NVD entries
* Add source writeup links (EmbraceTheRed, Check Point, Tenable, etc.)
* Link Rules File Backdoor, Cross-Agent Escalation, AWS-2025-015 to sources

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant