
feat: context contamination detection — mark sensitive context, restrict subsequent tool calls #2

@Siddhant-K-code


Problem

Multi-step agents routinely mix internal and external tool calls within a single session. The ordering is determined by the agent's reasoning, not by any security policy. This creates a data exfiltration path that's easy to miss:

  1. Agent calls an internal tool (email search, document retrieval, internal API) and loads sensitive data into its context window.
  2. Agent's next reasoning step involves a web-facing tool (web search, external API, webhook).
  3. The agent includes a snippet of the internal data in the outbound request — as search context, as a parameter, or embedded in a prompt.
  4. Internal intellectual property (IP), PII, or credentials have now left the system. No error was thrown, and no policy was violated under the current model.

This is not a prompt injection attack. It's a structural property of how multi-step agents work: context accumulates, and later tool calls can observe everything earlier tool calls returned.

Concrete example:

```
Step 1: search_email(query="Q3 pricing proposal")
  → returns internal pricing doc with customer names and deal values

Step 2: web_search(query="competitor pricing " + <snippet from step 1>)
  → internal pricing data sent to a third-party search API
```

The current OpenFGA model in this repo controls which tools an agent can call, but has no concept of what the agent's context contains at the time of the call. Authorization decisions are stateless with respect to context.

Proposed solution

Add a context contamination layer to the gateway that:

  1. Marks — detects when a tool response contains sensitive data (PII, internal-only content, credentials) and tags the agent's session context as contaminated.
  2. Blocks — after contamination is detected, restricts the agent's available tool set to exclude any tool capable of exfiltrating data (web search, external APIs, webhooks, email send).
  3. Surfaces — makes the contamination state visible in the authorization decision log, so it appears in traces.

Design

Contamination state per session

```go
import "time"

type ContextState struct {
	SessionID           string
	ContaminatedAt      *time.Time       // nil = clean
	ContaminationSource string           // which tool triggered it
	SensitivityLevel    SensitivityLevel // PII | InternalIP | Credentials
	BlockedTools        []string         // tools now unavailable
}
```

Sensitivity classification on tool response

```go
type SensitivityClassifier interface {
	Classify(toolName string, response []byte) SensitivityLevel
}
```

Initial classifiers:

  • Pattern-based: regex for email addresses, credit card numbers, API key patterns, internal domain names
  • Tool-based: any response from a tool tagged is_internal_source in the OpenFGA model (see the model extension below) is automatically marked sensitive
  • LLM-based (optional, opt-in): pass response through a small classifier model for higher accuracy
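A minimal sketch of the pattern-based classifier behind the `SensitivityClassifier` interface. `PatternClassifier`, `NewPatternClassifier`, the example domain `corp.example.com`, and the pattern-to-level mapping are all hypothetical; the regexes are deliberately simple placeholders (AWS-style access key IDs, PEM private-key headers, generic email addresses, 13 to 16 digit card-like numbers) and would need tuning, plus a Luhn check for card numbers, before real use.

```go
package main

import "regexp"

type SensitivityLevel int

const (
	SensitivityNone SensitivityLevel = iota
	PII
	InternalIP
	Credentials
)

type SensitivityClassifier interface {
	Classify(toolName string, response []byte) SensitivityLevel
}

// rule pairs a pattern with the level it signals (hypothetical mapping).
type rule struct {
	re    *regexp.Regexp
	level SensitivityLevel
}

// PatternClassifier is a hypothetical regex-based SensitivityClassifier.
type PatternClassifier struct {
	rules []rule
}

// Compile-time check that PatternClassifier satisfies the interface.
var _ SensitivityClassifier = (*PatternClassifier)(nil)

func NewPatternClassifier(internalDomain string) *PatternClassifier {
	return &PatternClassifier{rules: []rule{
		// AWS-style access key IDs and PEM private-key headers.
		{regexp.MustCompile(`AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----`), Credentials},
		// Any hostname under the internal domain.
		{regexp.MustCompile(`[a-z0-9.-]*` + regexp.QuoteMeta(internalDomain)), InternalIP},
		// Email addresses.
		{regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`), PII},
		// 13 to 16 digits, optionally separated by spaces or dashes (no Luhn check).
		{regexp.MustCompile(`\b(?:\d[ -]?){12,15}\d\b`), PII},
	}}
}

// Classify returns the highest level whose pattern matches the response.
// toolName is unused here; a tool-based classifier would key on it instead.
func (c *PatternClassifier) Classify(toolName string, response []byte) SensitivityLevel {
	level := SensitivityNone
	for _, r := range c.rules {
		if r.level > level && r.re.Match(response) {
			level = r.level
		}
	}
	return level
}
```

When several patterns match, the highest level wins, so an address at the internal domain is reported as InternalIP rather than PII.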

Gateway enforcement

```go
import (
	"context"
	"fmt"
)

func (g *Gateway) Authorize(ctx context.Context, req AuthzRequest) (AuthzDecision, error) {
	// Existing OpenFGA check: deny (or propagate the error) before looking at context.
	decision, err := g.fga.Check(ctx, req)
	if err != nil || !decision.Allowed {
		return decision, err
	}

	// Contamination check: a contaminated session may not call exfiltration-capable tools.
	state := g.contextStore.Get(req.SessionID)
	if state.IsContaminated() && g.isExfilTool(req.Tool) {
		return AuthzDecision{
			Allowed: false,
			Reason: fmt.Sprintf("tool %q blocked: session context contains %s (from %s)",
				req.Tool, state.SensitivityLevel, state.ContaminationSource),
		}, nil
	}

	return decision, nil
}
```

After tool call: update contamination state

```go
func (g *Gateway) RecordToolResponse(sessionID, toolName string, response []byte) {
	// Classify the response; anything above SensitivityNone taints the session.
	level := g.classifier.Classify(toolName, response)
	if level > SensitivityNone {
		g.contextStore.MarkContaminated(sessionID, toolName, level)
	}
}
```
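`RecordToolResponse` and `Authorize` both assume a `contextStore` with `Get` and `MarkContaminated`. A minimal in-memory sketch, assuming a single gateway process (several replicas would need a shared store). `MemoryContextStore` is a hypothetical name, and two behaviors are assumptions rather than part of the proposal: contamination is sticky for the life of the session, and a more sensitive later response raises the recorded level and source while the first timestamp is kept.

```go
package main

import (
	"sync"
	"time"
)

type SensitivityLevel int

const (
	SensitivityNone SensitivityLevel = iota
	PII
	InternalIP
	Credentials
)

type ContextState struct {
	SessionID           string
	ContaminatedAt      *time.Time
	ContaminationSource string
	SensitivityLevel    SensitivityLevel
}

func (s ContextState) IsContaminated() bool { return s.ContaminatedAt != nil }

// MemoryContextStore is a hypothetical in-process store keyed by session ID.
type MemoryContextStore struct {
	mu       sync.Mutex
	sessions map[string]ContextState
}

func NewMemoryContextStore() *MemoryContextStore {
	return &MemoryContextStore{sessions: make(map[string]ContextState)}
}

// Get returns the session's state by value; unknown sessions are clean.
func (m *MemoryContextStore) Get(sessionID string) ContextState {
	m.mu.Lock()
	defer m.mu.Unlock()
	if s, ok := m.sessions[sessionID]; ok {
		return s
	}
	return ContextState{SessionID: sessionID}
}

// MarkContaminated never clears or lowers the level: the first mark sets the
// timestamp, and a more sensitive response later raises level and source.
func (m *MemoryContextStore) MarkContaminated(sessionID, toolName string, level SensitivityLevel) {
	if level <= SensitivityNone {
		return
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	s := m.sessions[sessionID]
	s.SessionID = sessionID
	if s.ContaminatedAt == nil {
		now := time.Now()
		s.ContaminatedAt = &now
	}
	if level > s.SensitivityLevel {
		s.SensitivityLevel = level
		s.ContaminationSource = toolName
	}
	m.sessions[sessionID] = s
}
```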

OpenFGA model extension

Tag tools with their exfiltration risk in the authorization model:

```
model
  schema 1.1

type agent
type system
type tool
  relations
    define can_use: [agent]
    define is_external: [system]          # new: marks tools that send data outside
    define is_internal_source: [system]   # new: marks tools that return internal data
```

This lets the gateway query OpenFGA for "which tools are external?" rather than hardcoding a list.

Example: what the agent sees

Before contamination:

Available tools: search_email, search_docs, web_search, slack_post, github_create_pr

After search_email returns internal pricing data:

Available tools: search_email, search_docs, github_create_pr
Blocked (contamination): web_search, slack_post
Reason: session context contains InternalIP from search_email
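Building the tool list the agent sees reduces to partitioning its tools by the external tag once the session is contaminated. A minimal sketch; `partitionTools` is a hypothetical name, and the `isExternal` callback stands in for the OpenFGA-backed lookup.

```go
package main

// partitionTools splits tools into those still offered to the agent and those
// withheld because the session is contaminated. A clean session keeps all tools.
func partitionTools(tools []string, contaminated bool, isExternal func(string) bool) (available, blocked []string) {
	for _, t := range tools {
		if contaminated && isExternal(t) {
			blocked = append(blocked, t)
		} else {
			available = append(available, t)
		}
	}
	return available, blocked
}
```

With only `web_search` and `slack_post` tagged external, a contaminated session gets `web_search, slack_post` back as blocked, matching the listing above, while a clean session keeps every tool.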

What this does not do

  • Does not prevent the agent from reading internal data — that's the existing OpenFGA model's job.
  • Does not inspect the agent's reasoning or prompt — only tool responses.
  • Does not require LLM calls in the hot path: the default pattern-based classifier runs synchronously, and the LLM-based classifier is opt-in.

Acceptance criteria

  • ContextState tracked per session in the gateway
  • Pattern-based sensitivity classifier for PII and credential patterns
  • Tool-level is_internal_source and is_external tags in the OpenFGA model
  • Gateway blocks external tools after an internal-source tool returns sensitive data
  • Contamination state visible in the authorization decision log
  • Demo scenario added: search_email → web_search blocked, search_email → github_create_pr allowed
  • README section: "Context contamination protection"
