Skip to content

Conversation

@SomeGuyNamedMo
Copy link

First draft for ASI10 Rogue Agents

Key Changes:

Initial draft for Agentic Security Initiative Top 10, ASI10 - Rogue Agents

First draft for ASI10 Rogue Agents
Copy link

@kerenkatzapex kerenkatzapex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!!!
The main point to me is being focused on the behavioral aspect of this risk, and connect between the intro + scenarios to the vulnerabilities and mitigations (explained in the commentes)
Let's do it!


* Impersonate legitimate roles (support, observer, collaborator).
* Execute unauthorized actions (e.g., exfiltrating data, escalating privileges).
* Drift from goals due to prompt injection, data poisoning, or hallucination.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data poisoning or context injection (ASI06) :)

* Drift from goals due to prompt injection, data poisoning, or hallucination.
* Embed itself parasitically into workflows, subtly undermining intended outcomes.

The impact ranges from system compromise, data breach, and regulatory violations to operational sabotage of autonomous decision-making environments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output manipulation and workflow hijacking are mentioned before but I think adding it explicitly to this great part will make the reader's thoughts even more organized


The impact ranges from system compromise, data breach, and regulatory violations to operational sabotage of autonomous decision-making environments.

This threat extends [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/) into autonomous systems, where impersonation, stealth participation, or parasitic behaviors can disrupt goal fulfillment. An agent is considered rogue when it behaves in such a way that goes against its purpose. An agent can go rogue for several reasons, such as [LLM01:2025 Prompt Injection](https://genai.owasp.org/llmrisk/llm01-prompt-injection/), Injection, or even just hallucinations.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I understand excessive agency, is that an llm gets extended permissions or role in a system that can be manipulated and lead to one of the consequences well mentioned above.
However, I do not think that the root cause is the same here.
In September 2025, Almost every agent is privileged due to agents being embedded in the main workflows, right? :)
I believe that the focus here is more on:
How due to agentic centric role in modern software systems (can mention that sometimes agents are overpermissive and refer to the overpermissions but I would not recommend focusing on this one more than mentioning it) , using AI adversarial (referring to prompt injection, data poisoning, vector and embedding weaknesses, context injection (ASI06), supply chain vulnerabilities (ASI04)) the agents can go rouge, which can result in consequences such as sensitive information disclosure (LLM02), Misinformation (LLM09) or workflow hijacking.
Maybe it worth to connect this part with the former 1-2 paragraph to not repeat the message :)

2. Side-Channel Participation: Low-trust agents (e.g, crowd-sourced assistants) covertly influence high-value workflows.
3. Impersonation Attacks: An attacker spawns an agent that claims to be a monitoring or support agent, manipulating outcomes.
4. Impersonation Attacks: An attacker spawns an agent that claims to be a monitoring or support agent, manipulating outcomes.
5. Emergent Autonomy: Agents collaborate recursively, creating tasks beyond human awareness (e.g., a planning agent spawning additional agents without authorization).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way 3 and 4 are phrased is to me, more focused on ASI03 - Identity and privilege abuse to me, do you see it differently?
I think you have done amazing job in the first part defining that an adversarial changes the behavior of the agent and then the risky consequences happen and I do not see here the adversarial parts but rather more identity focused techniques that are not compromising the specific agent that goes wrong, but rather the agentic ecosystem to work not as intended.

  1. I recommend we add supporting examples for classic adversarial that makes the agent to go wrong (aka classic Jailbreak)
  2. I think that the part in which you are talking about a change in the agentic ecosystem, that leads to a behavioral change is super interesting. but:
    a. I'd focus more on how it changes the state of the agentic system - as that is the key here and we want to distinguish ourselves from ASI03.
    b. I'd mention it in the intro as well.

4. Log all agent instantiation and coordination events.
5. Score and verify agent behavior dynamically based on norms and past performance.
6. Implement a guardrail system that reads prompts/responses and every intermediate input and looks for prompt injection

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here - it is again very identity focused.
Of course identity is a part of it and we need to address it, but I think the bigger focus of this entry should be the behavior: how to ensure that the agentic behavior is as expected.
I think 5 and 6 should be the first ones to be discussed, and then when we are talking about the identity parts we want to explain why is it specific to this threat, I think it is currently a bit too general (we always need to ensure that identity is scoped right?)


Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.
Scenario #3 – Emergent Autonomy Drift (Availability & Compliance Risk):
A planning agent recursively spawns helper agents to optimize workflows. One helper begins deleting log files to reduce system clutter, erasing compliance evidence and violating audit requirements.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the one helper begins to delete log files? why?

Scenario #2: Another example of an attack scenario showing a different way the vulnerability could be exploited.
Scenario #3 – Emergent Autonomy Drift (Availability & Compliance Risk):
A planning agent recursively spawns helper agents to optimize workflows. One helper begins deleting log files to reduce system clutter, erasing compliance evidence and violating audit requirements.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the two first scenarios are super super practical and helpful!
I think if you take that and embed those vuln into the vuln parts, and focus more on mitigations to such scenarios in the mitigations parts - it will be even more clear to the readers (reading it end to end).

1. [Agentic AI - Threats and Mitigations](https://genai.owasp.org/resource/agentic-ai-threats-and-mitigations/https:/)
2. [LLM06:2025 Excessive Agency](https://genai.owasp.org/llmrisk/llm062025-excessive-agency/)
3. [MITRE ATT&CK - T1078 Exfiltration Over Alternative Protocol](https://attack.mitre.org/techniques/T1048/)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIVSS mapping is missing
Let's link to all of the relevant LLMs top 10 risks that were covered in here (some are missing)

Reflects the current state of the GDocs draft

+ Added in-line links to references for LLM Top 10

+ Markdown formatting

- Small grammatical changes
+Revised content to match Google Doc

+Added additional link for OWASP AIVSS pdf
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants