fix(control-plane): authorize execution note writes (#420) #575
Luffy2208 wants to merge 2 commits into
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor.
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
✅ Patch gate passed
Every surface whose lines were touched by this PR has patch coverage at or above the threshold.
b9fb5f5 to 72f59ee
I fixed the patch coverage failure by adding targeted tests in What was added:
santoshkumarradha
left a comment
🔴 PR-AF Review — Needs Major Rework
Automated multi-agent code review · PR-AF built with AgentField
10 findings · 🔴 3 critical · 🟠 4 important · 🔵 0 suggestions · ⚪ 0 nitpicks
PR Overview
Summary
Fixes an IDOR in the execution notes write endpoint by enforcing execution ownership before appending a note.
File-specific changes:
- `control-plane/internal/handlers/execution_notes.go`
  - Resolves the caller agent identity before writing a note.
  - Compares the caller agent ID with the execution owner, `execution.AgentNodeID`.
  - Returns `403 execution_ownership_mismatch` when the caller does not own the execution.
  - Supports DID-authenticated callers by resolving the verified caller DID to an agent ID.
- `control-plane/internal/server/middleware/auth.go`
  - Stores API-key caller identity in Gin context using `CallerAgentIDKey`.
  - Uses `X-Caller-Agent-ID` first, with `X-Agent-Node-ID` as fallback.
- `control-plane/internal/handlers/execution_notes_test.go`
  - Adds coverage for owner write success.
  - Adds coverage for non-owner API-key write returning `403`.
  - Adds coverage for DID-authenticated owner and non-owner behavior.
- `control-plane/internal/server/middleware/auth_test.go`
  - Adds coverage that API-key auth populates caller identity in Gin context.
- `control-plane/internal/handlers/coverage_handlers_90_test.go`
  - Updates the existing successful note-write coverage test to include matching caller identity.
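The ownership rule described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Execution` struct, `ensureOwnership`, `statusFor`, and `ErrOwnershipMismatch` are hypothetical names, and only the `AgentNodeID` field and the `execution_ownership_mismatch` code come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Execution mirrors only the field the ownership check needs.
// Hypothetical shape: the real record type has more fields.
type Execution struct {
	ExecutionID string
	AgentNodeID string // owner of the execution
}

// ErrOwnershipMismatch is a hypothetical sentinel for a non-owner write.
var ErrOwnershipMismatch = errors.New("execution_ownership_mismatch")

// ensureOwnership fails closed: an empty caller ID never matches an owner.
func ensureOwnership(callerAgentID string, exec Execution) error {
	if callerAgentID == "" || callerAgentID != exec.AgentNodeID {
		return ErrOwnershipMismatch
	}
	return nil
}

// statusFor maps the check result to the HTTP status the description names.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrOwnershipMismatch):
		return http.StatusForbidden
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	exec := Execution{ExecutionID: "exec-1", AgentNodeID: "agent-a"}
	fmt.Println(statusFor(ensureOwnership("agent-a", exec))) // owner write
	fmt.Println(statusFor(ensureOwnership("agent-b", exec))) // non-owner write
}
```

Treating an empty caller ID as a mismatch is a design choice in this sketch; it keeps an unauthenticated request from matching an execution whose owner field happens to be empty.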
Testing
- `./scripts/test-all.sh`
- Additional verification:
  - `cd control-plane && go test ./internal/handlers ./internal/server/middleware`
  - `cd control-plane && go test ./internal/handlers ./internal/server/middleware -coverprofile=/tmp/issue-420.coverprofile`
  - `cd control-plane && golangci-lint run --new-from-rev=upstream/main ./internal/handlers ./internal/server/middleware`
  - Manual E2E curl verification:
    - Agent A adding a note to Agent A’s execution returns `200 OK`
    - Agent A adding a note to Agent B’s execution returns `403 Forbidden`
    - Agent B’s fresh execution remains with `notes: []` after the blocked write
Note: full repo lint currently reports pre-existing unrelated Go lint issues outside this PR’s changed files. Changed-line lint for the touched packages reports 0 issues.
Checklist
- I updated documentation where applicable.
- I added or updated tests (or none were needed).
- I updated `CHANGELOG.md` (or this change does not warrant a changelog entry).
Screenshots (if UI-related)
Not UI-related.
Related issues
Fixes #420
Key Findings
7 issue(s) should be addressed before merge:
- 🔴 Complete ownership enforcement bypass when APIKey is empty and DID auth is off, the default configuration (`control-plane/internal/server/middleware/auth.go:24`): When `AuthConfig.APIKey` is empty (the default in ALL deployment configurations: `config/agentfield.yaml`, Docker Compose at `deployments/docker/docker-compose.yml`, and Helm at `deployments/helm/agen…
- 🔴 Three-tier identity fallback in execution notes handler has no fail-closed mechanism: raw-header tier silently becomes primary identity source when all upstream auth middleware is configuration-disabled (`control-plane/internal/handlers/execution_notes.go:184`): The execution notes handler's executionNoteCallerAgentID implements a three-tier identity resolution cascade: (1) verified DID from DIDAuthMiddleware, (2) CallerAgentIDKey context value from APIKeyAut…
- 🔴 Fixing the default authentication bypass by enabling DID auth silently activates diverging DID resolution paths: a 'fix F4, expose F1' trap (`control-plane/internal/handlers/execution_notes.go:184`): F4 establishes that under the default configuration (APIKey empty AND did_auth_enabled false), the execution notes handler accepts unvalidated raw headers as caller identity, a complete ownership enf…
- 🟠 GetExecutionNotesHandler leaks execution notes to any authenticated caller with no ownership enforcement (`control-plane/internal/handlers/execution_notes.go:235`): The PR fixes an IDOR on the write path (AddExecutionNoteHandler) by enforcing execution ownership, but the read path (GetExecutionNotesHandler) remains completely open: any API-key-authenticated calle…
- 🟠 DID auth middleware provides no defense against no-auth bypass: attacker can simply omit DID headers (`control-plane/internal/server/middleware/did_auth.go:177`): The assignment question asks: "Does the DIDAuthMiddleware at routes_middleware.go:77-88 provide any defense when API key auth is off?" No, not in any meaningful way. If `DID.Enabled && DIDAuthEna…
- 🟠 Two-tier DID resolution reads semantically-different field names (AgentID vs AgentNodeID) from independent tables with no structural equivalence guarantee (`control-plane/internal/handlers/execution_notes.go:206`): The function `resolveExecutionNoteAgentIDByDID` at line 206 resolves a caller DID to an agent identifier through two independent code paths that read *differently-named columns from different tables…
- 🟠 Type-unsafe CallerAgentIDKey enables silent reversion to attacker-controlled raw-header identity when any middleware writes a non-string value, turning a compile-time type error into a runtime authentication bypass (`control-plane/internal/handlers/execution_notes.go:189`): The combination of F2 (CallerAgentIDKey accepts any value type via Gin's `c.Set(key string, value any)`) and F4 (executionNoteCallerAgentID falls through to raw X-Caller-Agent-ID / X-Agent-Node-ID hea…
Files with findings: control-plane/internal/handlers/execution_notes.go, control-plane/internal/server/middleware/auth.go, control-plane/internal/server/middleware/did_auth.go
All Findings by Severity
🔴 Critical (3)
- Complete ownership enforcement bypass when APIKey is empty and DID auth is off, the default configuration
  `control-plane/internal/server/middleware/auth.go:24`
- Three-tier identity fallback in execution notes handler has no fail-closed mechanism: raw-header tier silently becomes primary identity source when all upstream auth middleware is configuration-disabled
  `control-plane/internal/handlers/execution_notes.go:184`
- Fixing the default authentication bypass by enabling DID auth silently activates diverging DID resolution paths: a 'fix F4, expose F1' trap
  `control-plane/internal/handlers/execution_notes.go:184`
🟠 Important (4)
- GetExecutionNotesHandler leaks execution notes to any authenticated caller with no ownership enforcement
  `control-plane/internal/handlers/execution_notes.go:235`
- DID auth middleware provides no defense against no-auth bypass: attacker can simply omit DID headers
  `control-plane/internal/server/middleware/did_auth.go:177`
- Two-tier DID resolution reads semantically-different field names (AgentID vs AgentNodeID) from independent tables with no structural equivalence guarantee
  `control-plane/internal/handlers/execution_notes.go:206`
- Type-unsafe CallerAgentIDKey enables silent reversion to attacker-controlled raw-header identity when any middleware writes a non-string value, turning a compile-time type error into a runtime authentication bypass
  `control-plane/internal/handlers/execution_notes.go:189`
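One way to address the type-unsafe context key in the last finding is an accessor that treats a wrong-typed value as an error instead of as "absent". The sketch below is an assumption-laden illustration: a plain `map[string]any` stands in for Gin's context store, and `callerAgentIDKey`, `callerFromStore`, and both error values are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// callerAgentIDKey stands in for the PR's CallerAgentIDKey; the value is assumed.
const callerAgentIDKey = "caller_agent_id"

var (
	errCallerMissing = errors.New("caller identity not set")
	errCallerInvalid = errors.New("caller identity has unexpected type or is empty")
)

// callerFromStore reads the caller ID from a key/value store shaped like
// Gin's context (string keys, values of any type). A value that is present
// but not a non-empty string is reported as invalid, so the caller of this
// function can reject the request rather than fall back to another source.
func callerFromStore(store map[string]any) (string, error) {
	raw, ok := store[callerAgentIDKey]
	if !ok {
		return "", errCallerMissing
	}
	id, ok := raw.(string)
	if !ok || id == "" {
		return "", errCallerInvalid
	}
	return id, nil
}

func main() {
	stores := []map[string]any{
		{callerAgentIDKey: "agent-a"}, // well-formed
		{callerAgentIDKey: 42},        // wrong type
		{},                            // never set
	}
	for _, store := range stores {
		id, err := callerFromStore(store)
		fmt.Printf("id=%q err=%v\n", id, err)
	}
}
```

Separating "missing" from "invalid" lets a handler decide that only the first case is recoverable, which is the distinction the finding says is lost today.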
Review Process Details
Dimensions Analyzed (15):
- No-Auth Mode Identity Spoofing Bypass — 4 file(s)
- CallerAgentIDKey Context Semantics Collision — 3 file(s)
- GET/POST Execution Notes Authorization Asymmetry — 2 file(s)
- DID Resolution Silent Degradation — Error vs Not-Found Conflation — 3 file(s)
- Context.Background() in GET Handler Bypasses Request Timeout — 2 file(s)
- DID resolution field-name divergence: AgentID vs AgentNodeID — 3 file(s)
- CallerAgentIDKey context type contract: non-string write → silent fallback — 3 file(s)
- No-auth middleware bypass: mechanical trace from empty APIKey to unauthenticated header reads — 4 file(s)
- Storage error propagation contract: errors.As dependency on UpdateExecutionRecord fidelity — 2 file(s)
- Context.Background() drift in GET handler: deadline propagation gap vs POST handler — 2 file(s)
- No-auth bypass of execution ownership enforcement via raw header fallback — 4 file(s)
- Triplicated caller resolution logic with diverged priority chains sharing one context namespace — 3 file(s)
- Auth error classification breaks if production storage provider wraps or replaces closure errors — 2 file(s)
- APIKeyAuth global broadcast of CallerAgentIDKey changes semantics for ALL authenticated routes — 2 file(s)
- DID-to-agent ID resolution produces different identifiers depending on resolution path (AgentID vs AgentNodeID field mismatch) — 3 file(s)
Meta-Dimension Lenses (3):
- Semantic — 5 dimension(s), 92% coverage confidence
- Mechanical — 5 dimension(s), 92% coverage confidence
- Systemic — 5 dimension(s), 92% coverage confidence
Cross-Reference & Adversary Analysis:
- 6 compound finding(s) synthesized
- 12 finding(s) adversarially tested: 12 confirmed, 0 challenged
Pipeline Stats
| Metric | Value |
|---|---|
| Duration | 4643.7s |
| Agent invocations | 65 |
| Coverage iterations | 0 |
| Estimated cost | N/A (provider does not report cost) |
| Budget exhausted | No |
| PR type | bugfix |
| Complexity | standard |
Review ID: rev_b6e41625c18a
	return nil
}

func executionNoteCallerAgentID(ctx context.Context, c *gin.Context, storageProvider ExecutionNoteStorage) (string, error) {
🔴 [CRITICAL] Raw-header fallback becomes sole identity source under default config
executionNoteCallerAgentID has 3 tiers: (1) verified DID, (2) CallerAgentIDKey context value, (3) raw X-Caller-Agent-ID/X-Agent-Node-ID headers. Tiers 1 & 2 are config-gated. Under defaults (APIKey="", did_auth_enabled=false), both are skipped — tier 3 accepts attacker-controlled headers with zero validation, flowing directly to ensureExecutionNoteOwnership.
Evidence:
- `routes_middleware.go:77`: DID middleware not installed when disabled
- `auth.go:26-28`: APIKeyAuth no-ops when `APIKey==""`, never sets context key
- `execution_notes.go:196-201`: raw header read with no validation
- When APIKeyAuth does run, it reads the same headers (`auth.go:118-124`), so the fallback is either dead code or an active bypass, never legitimate
Fix: Delete the raw-header fallback. Add a startup assertion in routes_middleware.go that refuses to register write routes when both auth methods are disabled.
Compound Analysis · confidence 95%
🤖 Reviewed by AgentField PR-AF
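The two parts of the suggested fix, a startup assertion and a resolver with no raw-header tier, might look like the sketch below. It assumes a simplified configuration and identity model: `AuthConfig`, `Identity`, `validateWriteAuth`, and `resolveCaller` are hypothetical names, not the project's types.

```go
package main

import (
	"errors"
	"fmt"
)

// AuthConfig is a hypothetical, trimmed-down view of the auth settings
// discussed in the review (API key and DID auth toggle).
type AuthConfig struct {
	APIKey         string
	DIDAuthEnabled bool
}

// validateWriteAuth is meant to run at startup: it refuses to let write
// routes be registered when no authentication method is configured.
func validateWriteAuth(cfg AuthConfig) error {
	if cfg.APIKey == "" && !cfg.DIDAuthEnabled {
		return errors.New("refusing to register write routes: API key and DID auth are both disabled")
	}
	return nil
}

// Identity holds values that only authenticated middleware may populate.
type Identity struct {
	VerifiedDIDAgentID string // set after DID verification and resolution
	APIKeyAgentID      string // set after API-key validation
}

var errUnauthenticated = errors.New("no verified caller identity")

// resolveCaller deliberately has no raw-header tier: if neither middleware
// produced an identity, the request is rejected.
func resolveCaller(id Identity) (string, error) {
	if id.VerifiedDIDAgentID != "" {
		return id.VerifiedDIDAgentID, nil
	}
	if id.APIKeyAgentID != "" {
		return id.APIKeyAgentID, nil
	}
	return "", errUnauthenticated
}

func main() {
	fmt.Println("default config:", validateWriteAuth(AuthConfig{}))

	caller, err := resolveCaller(Identity{APIKeyAgentID: "agent-a"})
	fmt.Println("api-key caller:", caller, err)

	caller, err = resolveCaller(Identity{})
	fmt.Println("no identity:", caller, err)
}
```

Whether a deployment should refuse to start or only refuse to mount write routes under the default configuration is a product decision; the sketch only shows that the check can be a single explicit function.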
// GetExecutionNotesHandler handles GET /api/v1/executions/:execution_id/notes
// Retrieves notes for a specific execution with optional tag filtering
func GetExecutionNotesHandler(storageProvider ExecutionNoteStorage) gin.HandlerFunc {
🟠 [IMPORTANT] Read path leaks execution notes — no ownership check
PR fixes IDOR on write path but GetExecutionNotesHandler (line 235) remains open: any API-key-authenticated caller can read any execution's notes by ID. storageProvider.GetExecutionRecord() is called at line 256 with no caller identity resolution or comparison against execution.AgentNodeID — same IDOR pattern just fixed on write.
Notes carry workflow state — phase progress, intermediate results, confidence reasoning. See examples/python_agent_nodes/agentic_rag/main.py:912,914,963,1097. The UI details endpoint GetExecutionDetailsGlobalHandler (ui/executions.go:558, lines 784-785) also exposes NotesCount/LatestNote without ownership enforcement.
No code comment, test, or PR description text explains whether the open read is intentional. Likely oversight given the write-side fix.
Fix: Mirror write path — resolve caller via executionNoteCallerAgentID, then ensureExecutionNoteOwnership, return 403 on mismatch. If intentional, add a code comment + a test confirming non-owner reads are the intended contract.
Authorization Asymmetry · confidence 90%
🤖 Reviewed by AgentField PR-AF
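To keep the read and write paths from drifting apart, the ownership check can live in one wrapper applied to both handlers. The following is a minimal sketch using only the standard library rather than Gin; `requireOwner`, `callerKey`, the `owners` map, and the query-parameter lookup are all hypothetical stand-ins, and the caller ID is assumed to have been placed in the request context by authenticated middleware.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// callerKey is a private context key type, so other packages cannot collide with it.
type callerKey struct{}

// owners is a stand-in for the execution store: execution ID -> owning agent.
var owners = map[string]string{"exec-a": "agent-a", "exec-b": "agent-b"}

// requireOwner wraps any handler, read or write, with the same ownership check.
func requireOwner(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		caller, _ := r.Context().Value(callerKey{}).(string)
		owner, found := owners[r.URL.Query().Get("execution_id")]
		if !found {
			http.Error(w, "execution_not_found", http.StatusNotFound)
			return
		}
		if caller == "" || caller != owner {
			http.Error(w, "execution_ownership_mismatch", http.StatusForbidden)
			return
		}
		next(w, r)
	}
}

// notesHandler is a placeholder for either the GET or the POST notes handler.
func notesHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, `{"notes":[]}`)
}

// call issues a request against the guarded handler and returns the status code.
func call(method, executionID, caller string) int {
	req := httptest.NewRequest(method, "/notes?execution_id="+executionID, nil)
	req = req.WithContext(context.WithValue(req.Context(), callerKey{}, caller))
	rec := httptest.NewRecorder()
	requireOwner(notesHandler)(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(call(http.MethodGet, "exec-a", "agent-a"))  // owner read
	fmt.Println(call(http.MethodGet, "exec-b", "agent-a"))  // non-owner read
	fmt.Println(call(http.MethodPost, "exec-b", "agent-a")) // non-owner write
}
```

If open reads turn out to be intentional, the same structure still helps: the read route would simply be registered without the wrapper, which makes the asymmetry visible at the routing layer.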
func resolveExecutionNoteAgentIDByDID(ctx context.Context, storageProvider ExecutionNoteStorage, callerDID string) (string, error) {
	if lookup, ok := storageProvider.(executionNoteDIDDocumentLookup); ok {
		if record, err := lookup.GetDIDDocument(ctx, callerDID); err == nil && record != nil {
🟡 [MEDIUM] DID resolution lookup doesn't filter revoked records
resolveExecutionNoteAgentIDByDID (line 208) accepts any non-error GetDIDDocument result. LocalStorage.GetDIDDocument (local.go:8305-8334) returns records with RevokedAt populated and err==nil — no revoked_at IS NULL filter, unlike GetDIDDocumentByAgentID (local.go:8345). ListAgentDIDs (local.go:6864-6917) similarly returns rows regardless of agent_dids.status (which can be active|inactive|revoked per migration 002).
Defense-in-depth concern, not a direct bypass for did:web: DIDAuthMiddleware → VerifyDIDOwnership → ResolveDID (did_web_service.go:128-149) already returns nil document for revoked did:web, so the auth layer rejects with 401 before reaching this handler. The handler's missing revocation check is a secondary gap — it matters if:
- The caller is a did:key whose entry in `agent_dids` was marked `revoked` (auth still passes because did:key is self-verifying)
- Revocation happens between auth and handler execution (narrow race)
- This resolver is later reused in a code path that doesn't run `DIDAuthMiddleware` first
Fix: After the GetDIDDocument call, if record.IsRevoked() { /* fall through or error */ }. For ListAgentDIDs, skip entries where info.Status == AgentDIDStatusRevoked. Or push the filters into the queries as GetDIDDocumentByAgentID does.
Defense-in-depth: revocation filter · confidence 80%
🤖 Reviewed by AgentField PR-AF
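A revocation filter of the kind suggested above could be sketched as follows. The record shape is an assumption: `DIDRecord`, its `RevokedAt` pointer, `agentIDForDID`, and `sampleRecords` are hypothetical, and an in-memory map stands in for the storage lookup.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// DIDRecord is a hypothetical shape for a stored DID document row.
type DIDRecord struct {
	DID       string
	AgentID   string
	RevokedAt *time.Time // nil means the record is still active
}

// IsRevoked reports whether the record carries a revocation timestamp.
func (r *DIDRecord) IsRevoked() bool { return r != nil && r.RevokedAt != nil }

var (
	errDIDUnknown = errors.New("did record not found")
	errDIDRevoked = errors.New("did record is revoked")
)

// agentIDForDID resolves a DID to an agent ID and rejects revoked records
// even when the lookup itself succeeded without error.
func agentIDForDID(records map[string]*DIDRecord, did string) (string, error) {
	rec, ok := records[did]
	if !ok || rec == nil {
		return "", errDIDUnknown
	}
	if rec.IsRevoked() {
		return "", errDIDRevoked
	}
	return rec.AgentID, nil
}

// sampleRecords builds one active and one revoked record for the demo.
func sampleRecords() map[string]*DIDRecord {
	revokedAt := time.Now()
	return map[string]*DIDRecord{
		"did:key:active":  {DID: "did:key:active", AgentID: "agent-a"},
		"did:key:revoked": {DID: "did:key:revoked", AgentID: "agent-b", RevokedAt: &revokedAt},
	}
}

func main() {
	records := sampleRecords()
	for _, did := range []string{"did:key:active", "did:key:revoked", "did:key:unknown"} {
		id, err := agentIDForDID(records, did)
		fmt.Printf("%s -> id=%q err=%v\n", did, id, err)
	}
}
```

Returning a distinct error for revoked records, rather than falling through to a second lookup, is one of the two options the comment leaves open; pushing the filter into the query is the other.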
@Luffy2208, just a heads-up that we're currently evaluating the review quality of https://github.com/Agent-Field/pr-af. If you notice any of these automated findings are off-base, noisy, or unhelpful, please let us know! Your feedback would be super helpful.