Skip to content

Commit bb03d72

Browse files
authored
Merge pull request #5 from K8Harness/feat/approval-flow
feat(approval-flow): add human approval hold flow
2 parents 618b5f6 + 703578a commit bb03d72

29 files changed

Lines changed: 3983 additions & 107 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,6 +53,7 @@ Thumbs.db
5353
data/
5454
logs/
5555
*.log
56+
.compose-bin/
5657
postgres-data/
5758
redis-data/
5859

@@ -74,4 +75,3 @@ v0.md
7475
agents.md
7576

7677

77-

.kiro/specs/approval-flow/design.md

Lines changed: 667 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
# Requirements Document
2+
3+
## Introduction
4+
5+
The approval-flow feature enables human-in-the-loop oversight for tool calls that the policy gate classifies as `approvalRequired`. When such a call is received, the gateway holds the HTTP connection open, sends a Slack Block Kit notification with the tool details and decision buttons, and waits for a manager to approve or deny within a 5-minute window. An approved call is forwarded to the upstream MCP server and the result returned to the client; a denied or timed-out call receives a JSON-RPC error response.
6+
7+
## Boundary Context
8+
9+
- **In scope**: synchronous hold (HTTP connection open during approval wait), Slack Block Kit notification, `/slack/actions` webhook endpoint, Slack request signature verification, decision resume signaling, durable ticket status updates before signaling, 5-minute fixed timeout with deny-on-timeout, session mutex TTL extension during approval wait.
10+
- **Out of scope**: email, PagerDuty, Teams, or other notification channels; multi-approver quorum; approval history dashboard or query API; async approval (fire-and-return) — v0 uses synchronous hold only; approval state surviving a gateway restart — a restart during an approval wait results in a client-side timeout on retry.
11+
- **Adjacent expectations**: policy-gate creates ticket records with `pending` status for `approvalRequired` cases — this feature reads and updates those records, changing the prior behavior of returning a `pending` response immediately; session-mgmt provides a session mutex TTL extension mechanism — this feature calls it to keep the session lock alive during the approval wait; the Slack API delivers interactive action payloads via HTTP POST to the gateway's webhook endpoint.
12+
13+
## Requirements
14+
15+
### Requirement 1: Synchronous Approval Hold
16+
17+
**Objective:** As a gateway operator, I want the gateway to hold the HTTP connection open while awaiting a human decision, so that the agent client receives the tool call result only after the decision is made.
18+
19+
#### Acceptance Criteria
20+
1. When the policy gate classifies a tool call as `approvalRequired`, the Gateway shall suspend the response and hold the HTTP connection open until either a decision signal is received or the timeout fires.
21+
2. While an approval hold is active, the Gateway shall periodically extend the session mutex lock TTL to prevent it from expiring before the decision arrives.
22+
3. When a decision signal is received and the decision is `approved`, the Gateway shall forward the original tool call to the upstream MCP server and return its result to the client.
23+
4. When a decision signal is received and the decision is `denied`, the Gateway shall return a JSON-RPC error response with code `-32001` and message `"approval denied"`.
24+
5. If no decision is received within 5 minutes, the Gateway shall return a JSON-RPC error response with code `-32001` and message `"approval timeout"`.
25+
26+
### Requirement 2: Slack Approval Notification
27+
28+
**Objective:** As a manager, I want to receive a Slack message when a tool call requires my approval, so that I can make an informed, timely decision.
29+
30+
#### Acceptance Criteria
31+
1. When a tool call is held for approval, the Gateway shall send a Slack Block Kit message to the configured Slack channel containing: tool name, method/operation, call arguments, and session ID.
32+
2. The Slack Block Kit message shall include an Approve button and a Deny button, each carrying the associated ticket ID as an action value.
33+
3. If sending the Slack notification fails, the Gateway shall log the failure with the ticket ID and continue the approval hold — the timeout path still applies.
34+
4. The Gateway shall send the Slack notification using credentials read exclusively from environment variables; credentials must not be hardcoded.
35+
36+
### Requirement 3: Slack Webhook Endpoint and Signature Verification
37+
38+
**Objective:** As a gateway operator, I want the gateway to receive and verify Slack interactive action payloads, so that only legitimate Slack-originated decisions are processed.
39+
40+
#### Acceptance Criteria
41+
1. The Gateway shall expose a `POST /slack/actions` HTTP endpoint to receive Slack interactive component action payloads.
42+
2. When a request arrives at `/slack/actions`, the Gateway shall verify the Slack request signature using HMAC-SHA256 with the `X-Slack-Signature` and `X-Slack-Request-Timestamp` headers against the configured signing secret before processing the payload.
43+
3. If signature verification fails, the Gateway shall return HTTP 400 and take no further action on the payload.
44+
4. If the `X-Slack-Request-Timestamp` value is more than 5 minutes in the past, the Gateway shall reject the request with HTTP 400 to prevent replay attacks.
45+
5. The Gateway shall read the Slack signing secret from an environment variable; the secret must not be hardcoded.
46+
47+
### Requirement 4: Decision Persistence and Resume Signaling
48+
49+
**Objective:** As a gateway operator, I want approval decisions to be durably recorded before the waiting handler is notified, so that no decision is lost if the gateway restarts between recording and signaling.
50+
51+
#### Acceptance Criteria
52+
1. When a valid `approve` action is received at `/slack/actions`, the Gateway shall update the ticket record status to `approved` with a decision timestamp, and then publish a resume signal for that ticket, before returning a response to Slack.
53+
2. When a valid `deny` action is received at `/slack/actions`, the Gateway shall update the ticket record status to `denied` with a decision timestamp, and then publish a resume signal for that ticket, before returning a response to Slack.
54+
3. The Gateway shall always update the ticket record before publishing the resume signal.
55+
4. When the Gateway acknowledges a Slack action, it shall return HTTP 200 to dismiss the Slack button interaction.
56+
57+
### Requirement 5: Timeout and Failure Handling
58+
59+
**Objective:** As an agent client, I want the gateway to close the held connection with a clear error when no decision arrives in time, so that the connection does not hang indefinitely.
60+
61+
#### Acceptance Criteria
62+
1. If no decision signal is received within 5 minutes of the approval hold beginning, the Gateway shall return a JSON-RPC error response with code `-32001` and message `"approval timeout"`.
63+
2. When a timeout occurs, the Gateway shall update the ticket record status to `expired`.
64+
3. If the decision notification channel fails during the approval wait, the Gateway shall treat the failure as a timeout and return the same error response as criterion 1.
65+
4. The Gateway shall log all timeout and error events with the ticket ID, session ID, and failure cause.
66+
67+
### Requirement 6: Configuration
68+
69+
**Objective:** As a gateway operator, I want all Slack integration settings to be provided via environment variables, so that credentials and channel configuration are not embedded in the binary.
70+
71+
#### Acceptance Criteria
72+
1. The Gateway shall read the Slack notification channel identifier from an environment variable.
73+
2. The Gateway shall read the Slack signing secret from an environment variable.
74+
3. The Gateway shall read the Slack notification credential (webhook URL or bot token) from an environment variable.
75+
4. If any required Slack environment variable is absent at startup, the Gateway shall log an error describing which variable is missing and refuse to start.
Lines changed: 153 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,153 @@
1+
# Research & Design Decisions: approval-flow
2+
3+
## Summary
4+
5+
- **Feature**: `approval-flow`
6+
- **Discovery Scope**: Complex Integration
7+
- **Key Findings**:
8+
- The pipeline's `(nil, nil)` continue-contract means `PolicyGateHandler` can block inline for the approval wait and return `(nil, nil)` on approval — the pipeline naturally continues to `UpstreamForwarder` with no forwarder injection or pipeline restructuring
9+
- Slack block_actions payloads arrive URL-encoded with a `payload` form field containing JSON; raw body must be read before any parsing for HMAC verification to work correctly
10+
- `go-redis/v9` is already present in the codebase; `client.Subscribe().Channel()` returns a Go channel that closes on connection loss — the nil-message case directly maps to the timeout fallback path (req 5.3)
11+
12+
---
13+
14+
## Research Log
15+
16+
### Slack Interactive Components Payload Format
17+
18+
- **Context**: Need to parse Slack manager's Approve/Deny click to extract ticket ID and decision
19+
- **Sources**: Slack API block_actions interaction payload documentation
20+
- **Findings**:
21+
- POST arrives as `application/x-www-form-urlencoded`; single field `payload` contains URL-encoded JSON
22+
- Top-level `type = "block_actions"` (not `interactive_message` — that is the legacy format)
23+
- Decision routing: `actions[0].action_id` (`"approval_approve"` / `"approval_deny"`)
24+
- Ticket ID carrier: `actions[0].value` (arbitrary string set when composing the Block Kit message)
25+
- No `callback_id` at root for block_actions — that belongs to the legacy payload type
26+
- `user.id` available for audit logging of `decision_by`
27+
- **Implications**: Parse `payload` field after URL-decoding from raw body; route on `action_id`; extract ticketID from `value`
28+
29+
### Slack Request Signature Verification
30+
31+
- **Context**: Req 3.2–3.4 require HMAC-SHA256 verification with replay protection
32+
- **Sources**: Slack verifying-requests-from-slack documentation
33+
- **Findings**:
34+
- Base string format: `"v0:" + X-Slack-Request-Timestamp + ":" + rawBody`
35+
- HMAC key: Slack Signing Secret (from app's Basic Info panel)
36+
- Expected signature format: `"v0=" + hex(HMAC-SHA256(key, basestring))`
37+
- Compare with `X-Slack-Signature` header using constant-time comparison (`hmac.Equal`)
38+
- Raw body must be read before any form parsing (body reader is consumed)
39+
- **Implications**: Buffer raw body first; timestamp check first (cheap) then HMAC check; use `hmac.Equal` not `==`
40+
41+
### Redis Pub/Sub API (go-redis/v9)
42+
43+
- **Context**: Need event-driven resume signal delivery from SlackWebhookHandler to waiting PolicyGateHandler
44+
- **Sources**: go-redis/v9 pkg.go.dev documentation
45+
- **Findings**:
46+
- `client.Subscribe(ctx, channel)` returns `*redis.PubSub`
47+
- `pubsub.Channel()` returns `<-chan *redis.Message`; closes when `pubsub.Close()` is called or connection drops
48+
- `client.Publish(ctx, channel, payload)` publishes to all current subscribers
49+
- Nil message received from a closed channel — select `case msg, ok := <-ch` where `!ok` indicates closure
50+
- No persistence: if no subscriber is active when signal is published, signal is lost
51+
- **Implications**: Nil/closed channel case must be handled as a timeout fallback; always `defer pubsub.Close()`
52+
53+
### Existing Pipeline Handler Contract
54+
55+
- **Context**: Need to understand how to wire the approval hold without modifying the pipeline
56+
- **Sources**: `/Users/henry/Programming/ToolGate/core/mcp/pipeline.go`, `handler.go`
57+
- **Findings**:
58+
- `(nil, nil)` → continue to next registered handler or terminal forwarder
59+
- `(*JSONRPCResponse, nil)` → halt pipeline, return response to client
60+
- `PolicyGateHandler` is the last `pipeline.Use()` middleware before the terminal `UpstreamForwarder`
61+
- Returning `(nil, nil)` from `PolicyGateHandler` on approval naturally calls `UpstreamForwarder.Handle(ctx, req)` with the original in-scope `req`
62+
- **Implications**: No forwarder injection needed; eliminates a component from the design
63+
64+
### Existing TicketStore and TicketRecord
65+
66+
- **Context**: Need to understand the current ticket API before extending it
67+
- **Sources**: `/Users/henry/Programming/ToolGate/cmd/gateway/ticket.go`
68+
- **Findings**:
69+
- `TicketRecord{SessionID, TurnID, ToolName, Arguments, ExpiresAt}` — no ID field
70+
- `Insert(ctx, TicketRecord) (string, error)` — returns UUID as string
71+
- `approved`, `denied`, `expired` status values exist in the DB schema (policy-gate defined them)
72+
- No `UpdateStatus` or `GetByID` currently exists
73+
- **Implications**: Add `UpdateStatus(ctx, id, status, decidedBy string) error`; idempotent (WHERE status='pending'); no GetByID needed since decision is in pub/sub payload
74+
75+
---
76+
77+
## Architecture Pattern Evaluation
78+
79+
| Option | Description | Strengths | Risks / Limitations | Decision |
80+
|--------|-------------|-----------|---------------------|----------|
81+
| Block inline in PolicyGateHandler | Block via `WaitForDecision`; return `(nil,nil)` on approve | No pipeline changes; original req stays in scope for free forwarding | Holds goroutine for 5 min; HTTP WriteTimeout must accommodate | **Selected** |
82+
| Separate ApprovalInterceptor middleware | Wrap PolicyGateHandler; detect pending response; run approval hold | No changes to PolicyGateHandler | Requires parsing response body to detect "pending" — fragile | Rejected |
83+
| Async resume (fire-and-return) | Return immediately; resume via callback when decision arrives | No long-held connections | Requires async request resumption infrastructure; explicitly out-of-scope for v0 | Rejected (v1+) |
84+
85+
---
86+
87+
## Design Decisions
88+
89+
### Decision: `(nil, nil)` continue-contract eliminates forwarder injection
90+
91+
- **Context**: On approval, the gateway must forward the original MCP request to the upstream server
92+
- **Alternatives Considered**:
93+
1. Inject `UpstreamForwarder` into `PolicyGateHandler` and call it directly
94+
2. Return `(nil, nil)` from the handler on approval; let the pipeline continue naturally
95+
- **Selected Approach**: Option 2 — return `(nil, nil)`; the pipeline calls the terminal forwarder
96+
- **Rationale**: The original `*JSONRPCRequest` stays in scope throughout the blocking wait; returning `(nil, nil)` invokes the forwarder with no additional wiring
97+
- **Trade-offs**: Subtle: the pipeline continues in the same goroutine that was blocked; context is still valid since the client is still connected
98+
- **Follow-up**: Verify that the upstream forwarder's HTTP client timeout starts fresh from the point it's called (not from the start of the approval wait)
99+
100+
### Decision: Decision carried in pub/sub payload, not re-fetched from DB
101+
102+
- **Context**: After the resume signal arrives, `WaitForDecision` needs to know the decision
103+
- **Alternatives Considered**:
104+
1. Publish a signal; waiter re-fetches ticket status from Postgres (requires `GetByID`)
105+
2. Publish the decision string in the signal payload (`"approved"` / `"denied"`)
106+
- **Selected Approach**: Option 2 — decision in payload
107+
- **Rationale**: Avoids a Postgres round-trip; simpler; the authoritative decision is already in DB (persisted before publish); the signal is just a notification
108+
- **Trade-offs**: Signal payload is trusted internal data (UUID-keyed channel, not user-controllable); no external trust issue
109+
- **Follow-up**: None
110+
111+
### Decision: SlackNotifier and ApprovalBridge as interfaces
112+
113+
- **Context**: The brief explicitly identifies both as boundary candidates for v1+ swap-in
114+
- **Alternatives Considered**:
115+
1. Concrete types only (simpler, fewer files)
116+
2. Interfaces (brief-specified extensibility)
117+
- **Selected Approach**: Interfaces — one implementation each for v0
118+
- **Rationale**: Brief explicitly states "Approval notifier interface (initially Slack; designed as interface so v1+ can swap in other channels)" and "the interface hides the [resume] mechanism". Synthesis principle allows interfaces when they are explicitly specified as boundary candidates, even with one implementation.
119+
- **Trade-offs**: Slightly more files; enables v1+ to add Teams/email without touching PolicyGateHandler
120+
- **Follow-up**: None
121+
122+
### Decision: `UpdateStatus` must be idempotent (WHERE status='pending')
123+
124+
- **Context**: Slack retries webhook delivery on 5xx responses; a duplicate action payload could double-update a ticket
125+
- **Selected Approach**: `UPDATE ticket SET ... WHERE id=$1 AND status='pending'` — no-op if already terminal
126+
- **Rationale**: Prevents double-approval; returns success even if no rows updated (Slack gets 200 on retry)
127+
- **Follow-up**: Verify Postgres `UPDATE` returning 0 rows is not treated as error in `UpdateStatus`
128+
129+
---
130+
131+
## Synthesis Outcomes
132+
133+
- **Generalization**: `ApprovalBridge` and `SlackNotifier` as interfaces generalize the "decision wait" and "notification delivery" concepts without over-building (one implementation each)
134+
- **Build vs. Adopt**: `go-redis/v9` Pub/Sub (already in codebase), `crypto/hmac`+`crypto/sha256` (stdlib), `net/http` (stdlib) — no new dependencies
135+
- **Simplification**: Removed `GetByID` (not needed; decision in payload); removed forwarder injection (pipeline contract handles it); removed `TicketRecord.ID` field addition (Insert returns ID separately)
136+
137+
---
138+
139+
## Risks & Mitigations
140+
141+
- **HTTP server WriteTimeout too short**: If `WriteTimeout < 5 minutes`, approval waits will be forcibly closed by the server before the timeout fires. Mitigation: document required server timeout configuration; add to deployment checklist.
142+
- **Gateway restart during approval wait**: In-flight approval waits are lost on restart; `WaitForDecision` context is cancelled and the client sees a connection close. The ticket remains `pending` until it expires naturally. Mitigation: document limitation; v0 spec explicitly scopes this out.
143+
- **Signal lost between UpdateStatus and Publish**: Ticket is persisted but signal is never received; waiter times out after 5 minutes. Mitigation: waiter timeout marks ticket `expired` (already terminal, no update); client sees timeout error. Acceptable for v0.
144+
- **Slack argument payload too long**: Approval messages with large arguments may exceed Slack block character limits. Mitigation: truncate arguments display in `SlackClient.SendApprovalRequest`.
145+
146+
---
147+
148+
## References
149+
150+
- Slack Block Actions payload: https://api.slack.com/reference/interaction-payloads/block-actions
151+
- Slack request signature verification: https://api.slack.com/authentication/verifying-requests-from-slack
152+
- Slack Block Kit message format: https://api.slack.com/block-kit
153+
- go-redis/v9 Pub/Sub: https://pkg.go.dev/github.com/redis/go-redis/v9
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
{
2+
"feature_name": "approval-flow",
3+
"created_at": "2026-05-15T00:00:00Z",
4+
"updated_at": "2026-05-16T00:00:00Z",
5+
"language": "en",
6+
"phase": "implemented",
7+
"approvals": {
8+
"requirements": {
9+
"generated": true,
10+
"approved": true
11+
},
12+
"design": {
13+
"generated": true,
14+
"approved": true
15+
},
16+
"tasks": {
17+
"generated": true,
18+
"approved": true
19+
}
20+
},
21+
"ready_for_implementation": true,
22+
"implementation_complete": true
23+
}

0 commit comments

Comments
 (0)