Problem
Multi-step agents routinely mix internal and external tool calls within a single session. The ordering is determined by the agent's reasoning, not by any security policy. This creates a data exfiltration path that's easy to miss:
- Agent calls an internal tool (email search, document retrieval, internal API) and loads sensitive data into its context window.
- Agent's next reasoning step involves a web-facing tool (web search, external API, webhook).
- The agent includes a snippet of the internal data in the outbound request — as search context, as a parameter, or embedded in a prompt.
- Internal intellectual property (IP), PII, or credentials have now left the system. No error was thrown, and no policy was violated under the current model.
This is not a prompt injection attack. It's a structural property of how multi-step agents work: context accumulates, and later tool calls can observe everything earlier tool calls returned.
Concrete example:
Step 1: search_email(query="Q3 pricing proposal")
→ returns internal pricing doc with customer names and deal values
Step 2: web_search(query="competitor pricing " + <snippet from step 1>)
→ internal pricing data sent to a third-party search API
The current OpenFGA model in this repo controls which tools an agent can call, but has no concept of what the agent's context contains at the time of the call. Authorization decisions are stateless with respect to context.
Proposed solution
Add a context contamination layer to the gateway that:
- Marks — detects when a tool response contains sensitive data (PII, internal-only content, credentials) and tags the agent's session context as contaminated.
- Blocks — after contamination is detected, restricts the agent's available tool set to exclude any tool capable of exfiltrating data (web search, external APIs, webhooks, email send).
- Surfaces — makes the contamination state visible in the authorization decision log, so it appears in traces.
Design
Contamination state per session
type ContextState struct {
	SessionID           string
	ContaminatedAt      *time.Time       // nil = clean
	ContaminationSource string           // which tool triggered it
	SensitivityLevel    SensitivityLevel // PII | InternalIP | Credentials
	BlockedTools        []string         // tools now unavailable
}
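The enforcement code later in this proposal calls `Get`, `IsContaminated`, and `MarkContaminated` on a context store that is not spelled out. A minimal in-memory sketch of what that store might look like follows. `MemoryContextStore` is a hypothetical name, the numeric ordering of `SensitivityLevel` is an assumption (the proposal only names the levels), and `BlockedTools` is left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// SensitivityLevel is ordered so that a larger value is more sensitive.
// The ordering is an assumption; the proposal only names the levels.
type SensitivityLevel int

const (
	SensitivityNone SensitivityLevel = iota
	SensitivityPII
	SensitivityInternalIP
	SensitivityCredentials
)

// ContextState mirrors the proposed struct (BlockedTools omitted for brevity).
type ContextState struct {
	SessionID           string
	ContaminatedAt      *time.Time // nil = clean
	ContaminationSource string
	SensitivityLevel    SensitivityLevel
}

// IsContaminated reports whether any sensitive response has been recorded.
func (s ContextState) IsContaminated() bool { return s.ContaminatedAt != nil }

// MemoryContextStore is a hypothetical in-memory store, safe for concurrent use.
type MemoryContextStore struct {
	mu     sync.RWMutex
	states map[string]ContextState
}

func NewMemoryContextStore() *MemoryContextStore {
	return &MemoryContextStore{states: make(map[string]ContextState)}
}

// Get returns a copy of the session state; unknown sessions are clean.
func (m *MemoryContextStore) Get(sessionID string) ContextState {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if st, ok := m.states[sessionID]; ok {
		return st
	}
	return ContextState{SessionID: sessionID}
}

// MarkContaminated keeps the first contamination time and the highest level seen.
func (m *MemoryContextStore) MarkContaminated(sessionID, source string, level SensitivityLevel) {
	if level <= SensitivityNone {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	st, ok := m.states[sessionID]
	if !ok {
		st = ContextState{SessionID: sessionID}
	}
	if st.ContaminatedAt == nil {
		now := time.Now()
		st.ContaminatedAt = &now
	}
	if level > st.SensitivityLevel {
		st.SensitivityLevel = level
		st.ContaminationSource = source
	}
	m.states[sessionID] = st
}

func main() {
	store := NewMemoryContextStore()
	fmt.Println(store.Get("s1").IsContaminated()) // false
	store.MarkContaminated("s1", "search_email", SensitivityInternalIP)
	st := store.Get("s1")
	fmt.Println(st.IsContaminated(), st.ContaminationSource) // true search_email
}
```

Contamination in this sketch is one-way: nothing clears it within a session, which matches the "after contamination is detected" wording of the proposal. A production store would likely also need expiry tied to session lifetime.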
Sensitivity classification on tool response
type SensitivityClassifier interface {
	Classify(toolName string, response []byte) SensitivityLevel
}
Initial classifiers:
- Pattern-based: regex for email addresses, credit card numbers, API key patterns, internal domain names
- Tool-based: any response from tools tagged internal: true in the OpenFGA model is automatically marked sensitive
- LLM-based (optional, opt-in): pass response through a small classifier model for higher accuracy
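As an illustration of the pattern-based option, the sketch below implements `Classify` with three regular expressions. The patterns are illustrative assumptions, not a vetted detection set: the credential rule only matches the well-known `AKIA` + 16 character shape of AWS access key IDs, the internal domain is passed in by the caller (`corp.example.com` is a placeholder), and credit card detection is omitted. The level ordering is also assumed.

```go
package main

import (
	"fmt"
	"regexp"
)

// SensitivityLevel ordering is an assumption: larger means more sensitive.
type SensitivityLevel int

const (
	SensitivityNone SensitivityLevel = iota
	SensitivityPII
	SensitivityInternalIP
	SensitivityCredentials
)

func (l SensitivityLevel) String() string {
	return [...]string{"None", "PII", "InternalIP", "Credentials"}[l]
}

type rule struct {
	level SensitivityLevel
	re    *regexp.Regexp
}

// PatternClassifier is a hypothetical regex-only SensitivityClassifier.
type PatternClassifier struct{ rules []rule }

// NewPatternClassifier builds illustrative rules; internalDomain is the
// suffix that marks a hostname as internal (placeholder value in main).
func NewPatternClassifier(internalDomain string) *PatternClassifier {
	host := `(?i)\b[a-z0-9.-]+\.` + regexp.QuoteMeta(internalDomain) + `\b`
	return &PatternClassifier{rules: []rule{
		// AWS access key ID shape: "AKIA" followed by 16 uppercase alphanumerics.
		{SensitivityCredentials, regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
		// Hostnames under the internal domain.
		{SensitivityInternalIP, regexp.MustCompile(host)},
		// Rough email address shape.
		{SensitivityPII, regexp.MustCompile(`(?i)\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b`)},
	}}
}

// Classify returns the highest level whose pattern matches the response.
// toolName is unused here; a tool-based classifier would consult it.
func (c *PatternClassifier) Classify(toolName string, response []byte) SensitivityLevel {
	highest := SensitivityNone
	for _, r := range c.rules {
		if r.level > highest && r.re.Match(response) {
			highest = r.level
		}
	}
	return highest
}

func main() {
	c := NewPatternClassifier("corp.example.com")
	fmt.Println(c.Classify("search_docs", []byte("nothing sensitive here")))
	fmt.Println(c.Classify("search_email", []byte("contact bob@example.org")))
	fmt.Println(c.Classify("search_docs", []byte("see https://wiki.corp.example.com/pricing")))
	fmt.Println(c.Classify("search_docs", []byte("key AKIAIOSFODNN7EXAMPLE")))
}
```

Regex rules of this kind are cheap enough for the synchronous path but will produce both false positives and false negatives, which is presumably why the tool-based and LLM-based classifiers are listed alongside them.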
Gateway enforcement
func (g *Gateway) Authorize(ctx context.Context, req AuthzRequest) (AuthzDecision, error) {
	// existing OpenFGA check
	decision, err := g.fga.Check(ctx, req)
	if err != nil || !decision.Allowed {
		return decision, err
	}

	// contamination check
	state := g.contextStore.Get(req.SessionID)
	if state.IsContaminated() && g.isExfilTool(req.Tool) {
		return AuthzDecision{
			Allowed: false,
			Reason: fmt.Sprintf("tool %q blocked: session context contains %s (from %s)",
				req.Tool, state.SensitivityLevel, state.ContaminationSource),
		}, nil
	}
	return decision, nil
}
After tool call: update contamination state
func (g *Gateway) RecordToolResponse(sessionID, toolName string, response []byte) {
	level := g.classifier.Classify(toolName, response)
	if level > SensitivityNone {
		g.contextStore.MarkContaminated(sessionID, toolName, level)
	}
}
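The two hooks only work if `RecordToolResponse` runs before the agent's next `Authorize` call. The sketch below walks through that ordering for the example from the problem statement. It is a simplified stand-in, not the gateway itself: the OpenFGA check is assumed to have already passed, the classifier is reduced to a tool-name lookup, and the `internal` and `external` maps, `newDemoGateway`, and `allowed` are hypothetical fixtures.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

type SensitivityLevel int

const (
	SensitivityNone SensitivityLevel = iota
	SensitivityInternalIP
)

type AuthzRequest struct{ SessionID, Tool string }

type AuthzDecision struct {
	Allowed bool
	Reason  string
}

// Gateway is a stand-in for the real gateway: the OpenFGA check is assumed
// to have passed, and the maps below are hypothetical fixtures.
type Gateway struct {
	mu       sync.Mutex
	source   map[string]string // session ID -> tool that contaminated it
	internal map[string]bool   // tools that return internal data
	external map[string]bool   // tools that can send data outside
}

func newDemoGateway() *Gateway {
	return &Gateway{
		source:   map[string]string{},
		internal: map[string]bool{"search_email": true, "search_docs": true},
		external: map[string]bool{"web_search": true, "slack_post": true},
	}
}

// Authorize denies external tools once the session is contaminated.
func (g *Gateway) Authorize(_ context.Context, req AuthzRequest) (AuthzDecision, error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if src, ok := g.source[req.SessionID]; ok && g.external[req.Tool] {
		reason := fmt.Sprintf("tool %q blocked: session contaminated by %s", req.Tool, src)
		return AuthzDecision{Allowed: false, Reason: reason}, nil
	}
	return AuthzDecision{Allowed: true}, nil
}

// classify is a tool-based stand-in for SensitivityClassifier.
func (g *Gateway) classify(tool string, _ []byte) SensitivityLevel {
	if g.internal[tool] {
		return SensitivityInternalIP
	}
	return SensitivityNone
}

// RecordToolResponse must run before the agent's next Authorize call.
func (g *Gateway) RecordToolResponse(sessionID, tool string, response []byte) {
	if g.classify(tool, response) == SensitivityNone {
		return
	}
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, seen := g.source[sessionID]; !seen {
		g.source[sessionID] = tool
	}
}

// allowed is a small helper for the demo below.
func allowed(g *Gateway, sessionID, tool string) bool {
	d, err := g.Authorize(context.Background(), AuthzRequest{SessionID: sessionID, Tool: tool})
	return err == nil && d.Allowed
}

func main() {
	g := newDemoGateway()
	fmt.Println(allowed(g, "s1", "web_search")) // true: session is clean
	g.RecordToolResponse("s1", "search_email", []byte("Q3 pricing proposal ..."))
	fmt.Println(allowed(g, "s1", "web_search"))       // false: contaminated
	fmt.Println(allowed(g, "s1", "github_create_pr")) // true: not tagged external
}
```

If tool calls within a session can run in parallel, an external call authorized before a concurrent internal call returns would slip through; whether the gateway needs to serialize calls per session is a design question this sketch does not answer.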
OpenFGA model extension
Tag tools with their exfiltration risk in the authorization model:
type tool
  relations
    define can_use: [agent]
    define is_external: [system]         # new: marks tools that send data outside
    define is_internal_source: [system]  # new: marks tools that return internal data
This lets the gateway query OpenFGA for "which tools are external?" rather than hardcoding a list.
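One way `isExfilTool` could be backed by the model is a relationship check per tool. The sketch below hides the OpenFGA client behind a hypothetical `RelationChecker` interface and uses an in-memory fake, so it makes no claims about the SDK's actual method signatures. The `system:gateway` subject and the `tool:<name>` object naming are assumptions. On a lookup error the tool is treated as external, so an outage fails closed rather than opening the exfiltration path.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// RelationChecker is a hypothetical abstraction over an OpenFGA check call.
type RelationChecker interface {
	Check(ctx context.Context, user, relation, object string) (bool, error)
}

type tupleKey struct{ user, relation, object string }

// fakeChecker answers from an in-memory tuple set; err simulates an outage.
type fakeChecker struct {
	tuples map[tupleKey]bool
	err    error
}

func (f fakeChecker) Check(_ context.Context, user, relation, object string) (bool, error) {
	if f.err != nil {
		return false, f.err
	}
	return f.tuples[tupleKey{user, relation, object}], nil
}

// gatewaySubject is an assumed subject of type "system".
const gatewaySubject = "system:gateway"

var errUnavailable = errors.New("authorization store unavailable")

// isExfilTool reports whether the tool carries the is_external tag.
// Errors are treated as "external" so that failures block rather than allow.
func isExfilTool(ctx context.Context, c RelationChecker, tool string) bool {
	ok, err := c.Check(ctx, gatewaySubject, "is_external", "tool:"+tool)
	if err != nil {
		return true
	}
	return ok
}

// demoChecker tags web_search and slack_post as external.
func demoChecker() fakeChecker {
	return fakeChecker{tuples: map[tupleKey]bool{
		{gatewaySubject, "is_external", "tool:web_search"}: true,
		{gatewaySubject, "is_external", "tool:slack_post"}: true,
	}}
}

// exfil is a convenience wrapper used by the demo.
func exfil(c RelationChecker, tool string) bool {
	return isExfilTool(context.Background(), c, tool)
}

func main() {
	c := demoChecker()
	fmt.Println(exfil(c, "web_search"))       // true
	fmt.Println(exfil(c, "github_create_pr")) // false
	fmt.Println(exfil(fakeChecker{err: errUnavailable}, "github_create_pr")) // true: fails closed
}
```

Because this adds a second lookup per authorization, caching the external-tool set per model version may be worth considering; OpenFGA's ListObjects API is one candidate for fetching the whole set at once.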
Example: what the agent sees
Before contamination:
Available tools: search_email, search_docs, web_search, slack_post, github_create_pr
After search_email returns internal pricing data:
Available tools: search_email, search_docs, github_create_pr
Blocked (contamination): web_search, slack_post
Reason: session context contains InternalIP from search_email
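The listing above amounts to partitioning the agent's permitted tools by their external tag once the session is contaminated. A small sketch of that filter follows; `partitionTools` and the `external` set are hypothetical names. Internal tools such as `search_email` stay available, since only exfiltration-capable tools are filtered.

```go
package main

import "fmt"

// partitionTools splits the tools an agent may call into those still
// available and those blocked by contamination. external is a hypothetical
// set of exfiltration-capable tool names.
func partitionTools(permitted []string, external map[string]bool, contaminated bool) (available, blocked []string) {
	for _, t := range permitted {
		if contaminated && external[t] {
			blocked = append(blocked, t)
		} else {
			available = append(available, t)
		}
	}
	return available, blocked
}

func main() {
	tools := []string{"search_email", "search_docs", "web_search", "slack_post", "github_create_pr"}
	external := map[string]bool{"web_search": true, "slack_post": true}

	available, blocked := partitionTools(tools, external, true)
	fmt.Println("Available tools:", available)
	fmt.Println("Blocked (contamination):", blocked)
}
```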
What this does not do
- Does not prevent the agent from reading internal data — that's the existing OpenFGA model's job.
- Does not inspect the agent's reasoning or prompt — only tool responses.
- Does not require LLM calls in the hot path (pattern-based classifier is synchronous).
Acceptance criteria
- ContextState tracked per session in the gateway
- is_internal_source and is_external tags in the OpenFGA model
- search_email → web_search blocked, search_email → github_create_pr allowed