|
| 1 | +# Research & Design Decisions: approval-flow |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +- **Feature**: `approval-flow` |
| 6 | +- **Discovery Scope**: Complex Integration |
| 7 | +- **Key Findings**: |
| 8 | + - The pipeline's `(nil, nil)` continue-contract means `PolicyGateHandler` can block inline for the approval wait and return `(nil, nil)` on approval — the pipeline naturally continues to `UpstreamForwarder` with no forwarder injection or pipeline restructuring |
| 9 | + - Slack block_actions payloads arrive URL-encoded with a `payload` form field containing JSON; raw body must be read before any parsing for HMAC verification to work correctly |
| 10 | + - `go-redis/v9` is already present in the codebase; `client.Subscribe().Channel()` returns a Go channel that closes on connection loss — the nil-message case directly maps to the timeout fallback path (req 5.3) |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +## Research Log |
| 15 | + |
| 16 | +### Slack Interactive Components Payload Format |
| 17 | + |
| 18 | +- **Context**: Need to parse Slack manager's Approve/Deny click to extract ticket ID and decision |
| 19 | +- **Sources**: Slack API block_actions interaction payload documentation |
| 20 | +- **Findings**: |
| 21 | + - POST arrives as `application/x-www-form-urlencoded`; single field `payload` contains URL-encoded JSON |
| 22 | + - Top-level `type = "block_actions"` (not `interactive_message` — that is the legacy format) |
| 23 | + - Decision routing: `actions[0].action_id` (`"approval_approve"` / `"approval_deny"`) |
| 24 | + - Ticket ID carrier: `actions[0].value` (arbitrary string set when composing the Block Kit message) |
| 25 | + - No `callback_id` at root for block_actions — that belongs to the legacy payload type |
| 26 | + - `user.id` available for audit logging of `decision_by` |
| 27 | +- **Implications**: Parse `payload` field after URL-decoding from raw body; route on `action_id`; extract ticketID from `value` |
| 28 | + |
| 29 | +### Slack Request Signature Verification |
| 30 | + |
| 31 | +- **Context**: Req 3.2–3.4 require HMAC-SHA256 verification with replay protection |
| 32 | +- **Sources**: Slack verifying-requests-from-slack documentation |
| 33 | +- **Findings**: |
| 34 | + - Base string format: `"v0:" + X-Slack-Request-Timestamp + ":" + rawBody` |
| 35 | + - HMAC key: Slack Signing Secret (from app's Basic Info panel) |
| 36 | + - Expected signature format: `"v0=" + hex(HMAC-SHA256(key, basestring))` |
| 37 | + - Compare with `X-Slack-Signature` header using constant-time comparison (`hmac.Equal`) |
| 38 | + - Raw body must be read before any form parsing (body reader is consumed) |
| 39 | +- **Implications**: Buffer raw body first; timestamp check first (cheap) then HMAC check; use `hmac.Equal` not `==` |
| 40 | + |
| 41 | +### Redis Pub/Sub API (go-redis/v9) |
| 42 | + |
| 43 | +- **Context**: Need event-driven resume signal delivery from SlackWebhookHandler to waiting PolicyGateHandler |
| 44 | +- **Sources**: go-redis/v9 pkg.go.dev documentation |
| 45 | +- **Findings**: |
| 46 | + - `client.Subscribe(ctx, channel)` returns `*redis.PubSub` |
| 47 | + - `pubsub.Channel()` returns `<-chan *redis.Message`; closes when `pubsub.Close()` is called or connection drops |
| 48 | + - `client.Publish(ctx, channel, payload)` publishes to all current subscribers |
| 49 | + - Nil message received from a closed channel — select `case msg, ok := <-ch` where `!ok` indicates closure |
| 50 | + - No persistence: if no subscriber is active when signal is published, signal is lost |
| 51 | +- **Implications**: Nil/closed channel case must be handled as a timeout fallback; always `defer pubsub.Close()` |
| 52 | + |
| 53 | +### Existing Pipeline Handler Contract |
| 54 | + |
| 55 | +- **Context**: Need to understand how to wire the approval hold without modifying the pipeline |
| 56 | +- **Sources**: `/Users/henry/Programming/ToolGate/core/mcp/pipeline.go`, `handler.go` |
| 57 | +- **Findings**: |
| 58 | + - `(nil, nil)` → continue to next registered handler or terminal forwarder |
| 59 | + - `(*JSONRPCResponse, nil)` → halt pipeline, return response to client |
| 60 | + - `PolicyGateHandler` is the last `pipeline.Use()` middleware before the terminal `UpstreamForwarder` |
| 61 | + - Returning `(nil, nil)` from `PolicyGateHandler` on approval naturally calls `UpstreamForwarder.Handle(ctx, req)` with the original in-scope `req` |
| 62 | +- **Implications**: No forwarder injection needed; eliminates a component from the design |
| 63 | + |
| 64 | +### Existing TicketStore and TicketRecord |
| 65 | + |
| 66 | +- **Context**: Need to understand the current ticket API before extending it |
| 67 | +- **Sources**: `/Users/henry/Programming/ToolGate/cmd/gateway/ticket.go` |
| 68 | +- **Findings**: |
| 69 | + - `TicketRecord{SessionID, TurnID, ToolName, Arguments, ExpiresAt}` — no ID field |
| 70 | + - `Insert(ctx, TicketRecord) (string, error)` — returns UUID as string |
| 71 | + - `approved`, `denied`, `expired` status values exist in the DB schema (policy-gate defined them) |
| 72 | + - No `UpdateStatus` or `GetByID` currently exists |
| 73 | +- **Implications**: Add `UpdateStatus(ctx, id, status, decidedBy string) error`; idempotent (WHERE status='pending'); no GetByID needed since decision is in pub/sub payload |
| 74 | + |
| 75 | +--- |
| 76 | + |
| 77 | +## Architecture Pattern Evaluation |
| 78 | + |
| 79 | +| Option | Description | Strengths | Risks / Limitations | Decision | |
| 80 | +|--------|-------------|-----------|---------------------|----------| |
| 81 | +| Block inline in PolicyGateHandler | Block via `WaitForDecision`; return `(nil,nil)` on approve | No pipeline changes; original req stays in scope for free forwarding | Holds goroutine for 5 min; HTTP WriteTimeout must accommodate | **Selected** | |
| 82 | +| Separate ApprovalInterceptor middleware | Wrap PolicyGateHandler; detect pending response; run approval hold | No changes to PolicyGateHandler | Requires parsing response body to detect "pending" — fragile | Rejected | |
| 83 | +| Async resume (fire-and-return) | Return immediately; resume via callback when decision arrives | No long-held connections | Requires async request resumption infrastructure; explicitly out-of-scope for v0 | Rejected (v1+) | |
| 84 | + |
| 85 | +--- |
| 86 | + |
| 87 | +## Design Decisions |
| 88 | + |
| 89 | +### Decision: `(nil, nil)` continue-contract eliminates forwarder injection |
| 90 | + |
| 91 | +- **Context**: On approval, the gateway must forward the original MCP request to the upstream server |
| 92 | +- **Alternatives Considered**: |
| 93 | + 1. Inject `UpstreamForwarder` into `PolicyGateHandler` and call it directly |
| 94 | + 2. Return `(nil, nil)` from the handler on approval; let the pipeline continue naturally |
| 95 | +- **Selected Approach**: Option 2 — return `(nil, nil)`; the pipeline calls the terminal forwarder |
| 96 | +- **Rationale**: The original `*JSONRPCRequest` stays in scope throughout the blocking wait; returning `(nil, nil)` invokes the forwarder with no additional wiring |
| 97 | +- **Trade-offs**: Subtle: the pipeline continues in the same goroutine that was blocked; context is still valid since the client is still connected |
| 98 | +- **Follow-up**: Verify that the upstream forwarder's HTTP client timeout starts fresh from the point it's called (not from the start of the approval wait) |
| 99 | + |
| 100 | +### Decision: Decision carried in pub/sub payload, not re-fetched from DB |
| 101 | + |
| 102 | +- **Context**: After the resume signal arrives, `WaitForDecision` needs to know the decision |
| 103 | +- **Alternatives Considered**: |
| 104 | + 1. Publish a signal; waiter re-fetches ticket status from Postgres (requires `GetByID`) |
| 105 | + 2. Publish the decision string in the signal payload (`"approved"` / `"denied"`) |
| 106 | +- **Selected Approach**: Option 2 — decision in payload |
| 107 | +- **Rationale**: Avoids a Postgres round-trip; simpler; the authoritative decision is already in DB (persisted before publish); the signal is just a notification |
| 108 | +- **Trade-offs**: Signal payload is trusted internal data (UUID-keyed channel, not user-controllable); no external trust issue |
| 109 | +- **Follow-up**: None |
| 110 | + |
| 111 | +### Decision: SlackNotifier and ApprovalBridge as interfaces |
| 112 | + |
| 113 | +- **Context**: The brief explicitly identifies both as boundary candidates for v1+ swap-in |
| 114 | +- **Alternatives Considered**: |
| 115 | + 1. Concrete types only (simpler, fewer files) |
| 116 | + 2. Interfaces (brief-specified extensibility) |
| 117 | +- **Selected Approach**: Interfaces — one implementation each for v0 |
| 118 | +- **Rationale**: Brief explicitly states "Approval notifier interface (initially Slack; designed as interface so v1+ can swap in other channels)" and "the interface hides the [resume] mechanism". Synthesis principle allows interfaces when they are explicitly specified as boundary candidates, even with one implementation. |
| 119 | +- **Trade-offs**: Slightly more files; enables v1+ to add Teams/email without touching PolicyGateHandler |
| 120 | +- **Follow-up**: None |
| 121 | + |
| 122 | +### Decision: `UpdateStatus` must be idempotent (WHERE status='pending') |
| 123 | + |
| 124 | +- **Context**: Slack retries webhook delivery on 5xx responses; a duplicate action payload could double-update a ticket |
| 125 | +- **Selected Approach**: `UPDATE ticket SET ... WHERE id=$1 AND status='pending'` — no-op if already terminal |
| 126 | +- **Rationale**: Prevents double-approval; returns success even if no rows updated (Slack gets 200 on retry) |
| 127 | +- **Follow-up**: Verify Postgres `UPDATE` returning 0 rows is not treated as error in `UpdateStatus` |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +## Synthesis Outcomes |
| 132 | + |
| 133 | +- **Generalization**: `ApprovalBridge` and `SlackNotifier` as interfaces generalize the "decision wait" and "notification delivery" concepts without over-building (one implementation each) |
| 134 | +- **Build vs. Adopt**: `go-redis/v9` Pub/Sub (already in codebase), `crypto/hmac`+`crypto/sha256` (stdlib), `net/http` (stdlib) — no new dependencies |
| 135 | +- **Simplification**: Removed `GetByID` (not needed; decision in payload); removed forwarder injection (pipeline contract handles it); removed `TicketRecord.ID` field addition (Insert returns ID separately) |
| 136 | + |
| 137 | +--- |
| 138 | + |
| 139 | +## Risks & Mitigations |
| 140 | + |
| 141 | +- **HTTP server WriteTimeout too short**: If `WriteTimeout < 5 minutes`, approval waits will be forcibly closed by the server before the timeout fires. Mitigation: document required server timeout configuration; add to deployment checklist. |
| 142 | +- **Gateway restart during approval wait**: In-flight approval waits are lost on restart; `WaitForDecision` context is cancelled and the client sees a connection close. The ticket remains `pending` until it expires naturally. Mitigation: document limitation; v0 spec explicitly scopes this out. |
| 143 | +- **Signal lost between UpdateStatus and Publish**: Ticket is persisted but signal is never received; waiter times out after 5 minutes. Mitigation: waiter timeout marks ticket `expired` (already terminal, no update); client sees timeout error. Acceptable for v0. |
| 144 | +- **Slack argument payload too long**: Approval messages with large arguments may exceed Slack block character limits. Mitigation: truncate arguments display in `SlackClient.SendApprovalRequest`. |
| 145 | + |
| 146 | +--- |
| 147 | + |
| 148 | +## References |
| 149 | + |
| 150 | +- Slack Block Actions payload: https://api.slack.com/reference/interaction-payloads/block-actions |
| 151 | +- Slack request signature verification: https://api.slack.com/authentication/verifying-requests-from-slack |
| 152 | +- Slack Block Kit message format: https://api.slack.com/block-kit |
| 153 | +- go-redis/v9 Pub/Sub: https://pkg.go.dev/github.com/redis/go-redis/v9 |
0 commit comments