Problem
Issue #2 describes reactive contamination control: detect sensitive data in a tool response, then block subsequent exfiltration tools. That's the right first step, but it's inherently after the fact — the agent has already loaded sensitive data into context before any restriction kicks in.
The deeper question: can we prevent the contamination from happening in the first place by telling the agent before it starts that certain tool orderings are unsafe, so it can plan accordingly?
An agent that knows "if I call search_email first, I will lose access to web_search for the rest of this session" can reason about that constraint and reorder its plan: do the web search first, then the internal search, then synthesize. The result is the same, but no sensitive data ever co-exists in context with an active external tool.
This is a harder problem than reactive blocking. It requires the agent to reason about authorization constraints as part of its planning, not just encounter them at execution time.
Proposed solution
Expose tool sensitivity metadata to the agent at session start, so it can factor authorization consequences into its tool call ordering.
1. Session manifest with sensitivity annotations
When a session is initialized, the gateway returns a manifest describing each available tool and its sensitivity class:
{
  "session_id": "sess_abc123",
  "tools": [
    {
      "name": "search_email",
      "sensitivity": "internal_source",
      "consequence": "calling this tool will block: web_search, slack_post, external_api"
    },
    {
      "name": "search_docs",
      "sensitivity": "internal_source",
      "consequence": "calling this tool will block: web_search, slack_post, external_api"
    },
    {
      "name": "web_search",
      "sensitivity": "external",
      "consequence": "none: safe to call before internal tools"
    },
    {
      "name": "github_create_pr",
      "sensitivity": "external",
      "consequence": "none: safe to call before internal tools"
    }
  ],
  "ordering_hint": "complete all external tool calls before calling internal_source tools"
}
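As a rough sketch of how the gateway might assemble a manifest of this shape: the registry `TOOL_SENSITIVITY`, the list `BLOCKED_AFTER_INTERNAL`, and the function `build_manifest` are all hypothetical names introduced for illustration, not part of any existing gateway API.

```python
import json

# Hypothetical registry: tool name -> sensitivity class (values from the example manifest).
TOOL_SENSITIVITY = {
    "search_email": "internal_source",
    "search_docs": "internal_source",
    "web_search": "external",
    "github_create_pr": "external",
}

# Hypothetical: tools that become unavailable once an internal_source tool is called.
BLOCKED_AFTER_INTERNAL = ["web_search", "slack_post", "external_api"]


def build_manifest(session_id: str) -> dict:
    """Assemble a session manifest with one entry per registered tool."""
    tools = []
    for name, sensitivity in TOOL_SENSITIVITY.items():
        if sensitivity == "internal_source":
            consequence = "calling this tool will block: " + ", ".join(BLOCKED_AFTER_INTERNAL)
        else:
            consequence = "none: safe to call before internal tools"
        tools.append({"name": name, "sensitivity": sensitivity, "consequence": consequence})
    return {
        "session_id": session_id,
        "tools": tools,
        "ordering_hint": "complete all external tool calls before calling internal_source tools",
    }


manifest = build_manifest("sess_abc123")
print(json.dumps(manifest, indent=2))
```

Generating the consequence strings from one shared list keeps the manifest in step with whatever the enforcement layer actually blocks, assuming both read from the same source.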
2. System prompt injection
The gateway injects a short planning constraint into the agent's system prompt at session start:
Tool ordering constraint (enforced by authorization layer):
- Calling a tool marked [internal] blocks the external tools listed below for the remainder of this session.
- If your task requires both internal tools and any of the blocked tools, call the blocked tools first.
- Affected tools: search_email [internal], search_docs [internal] → blocks web_search, slack_post, external_api
- Safe to call in any order: github_create_pr, github_read_file
This gives the agent the information it needs to self-reorder without requiring it to discover the constraint by hitting a block.
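One way the injection step could look, as a minimal sketch: `render_constraint` and `inject` are hypothetical helpers, and the static rule wording is passed in by the caller rather than fixed here. Only the two tool-specific lines are generated from data.

```python
HEADER = "Tool ordering constraint (enforced by authorization layer):"


def render_constraint(rules: list[str], internal: list[str],
                      blocked: list[str], unaffected: list[str]) -> str:
    """Render the planning constraint as a bulleted plain-text block."""
    affected = ", ".join(f"{name} [internal]" for name in internal)
    lines = [HEADER]
    lines += [f"- {rule}" for rule in rules]  # static wording supplied by the caller
    lines.append(f"- Affected tools: {affected} → blocks {', '.join(blocked)}")
    lines.append(f"- Safe to call in any order: {', '.join(unaffected)}")
    return "\n".join(lines)


def inject(system_prompt: str, constraint: str) -> str:
    """Append the constraint to the agent's existing system prompt."""
    return system_prompt.rstrip() + "\n\n" + constraint


# Illustrative usage; the rule text below is a placeholder, not prescribed wording.
constraint = render_constraint(
    rules=["If your task needs both kinds of tool, call the blocked tools first."],
    internal=["search_email", "search_docs"],
    blocked=["web_search", "slack_post"],
    unaffected=["github_create_pr", "github_read_file"],
)
print(inject("You are a research agent.", constraint))
```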
3. Planning validation endpoint (optional)
Before execution, the agent can submit its planned tool call sequence for validation:
POST /v1/session/{id}/validate-plan
{
  "planned_calls": ["search_email", "web_search", "github_create_pr"]
}
→ 200 OK
{
  "valid": false,
  "violations": [
    {
      "at_step": 1,
      "tool": "web_search",
      "reason": "web_search is blocked after search_email (step 0) loads internal data",
      "suggestion": "move web_search before search_email"
    }
  ],
  "safe_ordering": ["web_search", "search_email", "github_create_pr"]
}
The agent can call this before starting, get a corrected ordering, and proceed without hitting any runtime blocks.
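The validation logic behind such an endpoint could be sketched roughly as follows. The sets `INTERNAL` and `BLOCKED_AFTER_INTERNAL` and the function `validate_plan` are assumptions made for illustration; a real gateway would presumably derive the sets from the session manifest.

```python
# Hypothetical classification, mirroring the example manifest.
INTERNAL = {"search_email", "search_docs"}
BLOCKED_AFTER_INTERNAL = {"web_search", "slack_post", "external_api"}


def validate_plan(planned_calls: list[str]) -> dict:
    """Flag blocked tools that appear after the first internal tool in the plan."""
    violations = []
    first_internal = None  # (step, tool) of the first internal call seen so far
    for step, tool in enumerate(planned_calls):
        if tool in INTERNAL and first_internal is None:
            first_internal = (step, tool)
        elif tool in BLOCKED_AFTER_INTERNAL and first_internal is not None:
            src_step, src_tool = first_internal
            violations.append({
                "at_step": step,
                "tool": tool,
                "reason": f"{tool} is blocked after {src_tool} (step {src_step}) loads internal data",
                "suggestion": f"move {tool} before {src_tool}",
            })
    # Stable reorder: blocked-after-internal tools first, all others keep their relative order.
    safe = ([t for t in planned_calls if t in BLOCKED_AFTER_INTERNAL]
            + [t for t in planned_calls if t not in BLOCKED_AFTER_INTERNAL])
    return {"valid": not violations, "violations": violations, "safe_ordering": safe}


result = validate_plan(["search_email", "web_search", "github_create_pr"])
print(result["valid"], result["safe_ordering"])
```

This sketch treats the plan as a flat list and ignores data dependencies between calls. If a web search query depends on what an internal search returns, no reordering is safe, and the endpoint would need a way to report that instead of suggesting an ordering.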
What "smart enough to reorder" actually requires
For an agent to self-reorder based on this information, it needs:
- The sensitivity manifest at session start (this proposal provides it)
- A system prompt that explains the constraint in plain language (this proposal provides it)
- Sufficient reasoning capability to apply the constraint during planning
Modern frontier models (Claude 3.5+, GPT-4o) generally handle this kind of constraint when it is stated clearly, though that should be confirmed against the acceptance criteria below. Smaller models may need the planning validation endpoint as a fallback: submit the plan, get the corrected ordering, proceed.
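The fallback path for smaller models amounts to a few lines of agent-side glue. In this sketch, `plan_with_fallback` is a hypothetical helper and `stub_validate` is a stand-in for the validation endpoint that only knows about one rule (web_search must precede search_email); neither reflects an existing client library.

```python
from typing import Callable


def plan_with_fallback(plan: list[str],
                       validate: Callable[[list[str]], dict]) -> list[str]:
    """Keep the plan if the validator accepts it, otherwise adopt its safe ordering."""
    result = validate(plan)
    if result["valid"]:
        return plan
    return result["safe_ordering"]


def stub_validate(plan: list[str]) -> dict:
    """Stand-in for the validation endpoint; violations are omitted in this stub."""
    both = "search_email" in plan and "web_search" in plan
    ok = not both or plan.index("web_search") < plan.index("search_email")
    # sorted() is stable, so this moves web_search to the front and keeps the rest in order.
    safe = sorted(plan, key=lambda tool: tool != "web_search")
    return {"valid": ok, "violations": [], "safe_ordering": safe}


final_plan = plan_with_fallback(
    ["search_email", "web_search", "github_create_pr"], stub_validate
)
print(final_plan)
```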
Relationship to issue #2
These two issues are complementary, not alternatives:
| | Issue #2 (reactive) | This issue (proactive) |
| --- | --- | --- |
| When | After sensitive data enters context | Before the agent starts calling tools |
| Mechanism | Block tool calls at runtime | Inform agent during planning |
| Failure mode | Agent plan fails mid-execution | Agent plan succeeds without contamination |
| Complexity | Lower | Higher |
Implement #2 first. This issue builds on top of it — the sensitivity classifications defined in #2 (is_internal_source, is_external) are the same ones used to generate the manifest here.
Acceptance criteria
- GET /v1/session/{id}/manifest returns tool list with sensitivity annotations and consequence descriptions
- POST /v1/session/{id}/validate-plan accepts a planned tool sequence and returns violations + safe ordering
- Agent given a task requiring both search_email and web_search: with manifest, it calls web_search first without hitting a runtime block
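The last criterion could be checked against a toy model of the runtime block, sketched below. `SessionGate`, `BlockedToolError`, and `runs_without_block` are hypothetical names, and the gate marks the session as contaminated on any internal tool call, which is a simplification of the response-based detection described in issue #2.

```python
class BlockedToolError(Exception):
    """Raised when a blocked tool is called in a contaminated session."""


class SessionGate:
    """Toy stand-in for the runtime block: internal calls contaminate the session."""

    def __init__(self, internal: set[str], blocked_after_internal: set[str]) -> None:
        self.internal = internal
        self.blocked = blocked_after_internal
        self.contaminated = False

    def call(self, tool: str) -> None:
        if self.contaminated and tool in self.blocked:
            raise BlockedToolError(tool)
        if tool in self.internal:
            self.contaminated = True


def runs_without_block(plan: list[str]) -> bool:
    """Execute a plan against a fresh gate and report whether any call was blocked."""
    gate = SessionGate({"search_email", "search_docs"},
                       {"web_search", "slack_post", "external_api"})
    try:
        for tool in plan:
            gate.call(tool)
    except BlockedToolError:
        return False
    return True


print(runs_without_block(["web_search", "search_email"]))
print(runs_without_block(["search_email", "web_search"]))
```

A real acceptance test would drive an actual agent with and without the manifest and compare the call order it chooses; this sketch only covers the gate side of that check.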