feat: add onCompletion webhook hooks to TaskSpawner #141
Greptile Summary
This PR adds an
Confidence Score: 5/5
Safe to merge. All core paths (idempotency, SSRF protection, secret resolution, and annotation persistence) are implemented correctly and covered by tests. The three previously flagged implementation gaps (missing retry loop in persistAnnotationRetry, missing response-body drain, SSRF bypass via injected HTTP client and domain names) are all addressed correctly in this revision. The remaining gap is test coverage for non-2xx webhook responses: the annotation-not-persisted guarantee on delivery failure is not exercised by any table case, but the production code path itself is correct.
internal/reporting/webhook_test.go: missing test case for server error responses
@greptile Review.
Add a new spec.onCompletion field to TaskSpawnerSpec that configures outbound HTTP webhook notifications when spawned Tasks reach terminal phases (Succeeded or Failed). This enables push-based alerting to external systems (Slack, PagerDuty, custom dashboards) without polling. Each hook supports phase filtering and optional Authorization header via secretRef. The hooks config is propagated to Tasks as an annotation and dispatched by the existing reporting loop for idempotency. Closes kelos-dev#749 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add retry-on-conflict loop to persistAnnotationRetry using retry.RetryOnConflict to prevent duplicate webhook deliveries
- Drain response body before close for HTTP connection reuse
- Use TerminalTaskPhase type with kubebuilder enum validation so the CRD rejects invalid phase values at admission time
- Rewrite tests to use fake client exercising the full ReportWebhooks flow including idempotency and annotation persistence
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The persistent execution mode test involves multiple reconcile cycles (session controller + task controller) and 10s is too tight under CI resource contention. Increase to 30s to match the complexity of the test's multi-step lifecycle verification. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Persist idempotency annotation even when all hooks are filtered by phase, preventing infinite re-evaluation on every reporting cycle
- Remove spec.onCompletion gate so in-flight tasks get webhooks even if the spawner spec is later modified
- Redact webhook URL from error messages to avoid leaking URL-embedded tokens (use hook name instead)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Restrict CRD URL pattern to HTTPS-only (^https://.+)
- Add runtime host validation rejecting private, loopback, and link-local IP ranges before dispatching webhooks
- Apply CheckRedirect policy on the default HTTP client to block redirects to internal addresses
- Initialize a shared default HTTP client instead of allocating per-call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The CheckRedirect policy on defaultWebhookHTTPClient was unreachable when runWebhookReportingCycle injects cfg.HTTPClient. Now httpClient() shallow-clones the injected client and applies ssrfCheckRedirect when no CheckRedirect is already configured, closing the redirect-based SSRF bypass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
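The shallow-clone step described in this commit can be sketched roughly as follows. The names httpClient, ssrfCheckRedirect, and defaultWebhookHTTPClient come from the commit message; checkHost, injectedExample, the timeout values, and the use of the standard library's net.IP classification methods in place of the PR's own blocklist are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// checkHost (hypothetical helper) rejects literal IPs in internal ranges.
// Hostnames pass through; this sketch does not resolve DNS.
func checkHost(host string) error {
	ip := net.ParseIP(host)
	if ip == nil {
		return nil
	}
	if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
		return errors.New("redirect to internal address blocked")
	}
	return nil
}

// ssrfCheckRedirect has the signature http.Client.CheckRedirect expects.
func ssrfCheckRedirect(req *http.Request, via []*http.Request) error {
	if len(via) >= 10 {
		return errors.New("stopped after 10 redirects")
	}
	return checkHost(req.URL.Hostname())
}

// The 10s timeout is an assumed value, not taken from the PR.
var defaultWebhookHTTPClient = &http.Client{
	Timeout:       10 * time.Second,
	CheckRedirect: ssrfCheckRedirect,
}

// injectedExample stands in for an injected cfg.HTTPClient with no redirect policy.
var injectedExample = &http.Client{Timeout: 5 * time.Second}

// httpClient returns a client that always carries a redirect policy.
// The injected client is copied, never mutated, so callers sharing it are unaffected.
func httpClient(injected *http.Client) *http.Client {
	if injected == nil {
		return defaultWebhookHTTPClient
	}
	if injected.CheckRedirect != nil {
		return injected // respect a policy the caller configured
	}
	clone := *injected // shallow copy: Transport, Jar, and Timeout are shared
	clone.CheckRedirect = ssrfCheckRedirect
	return &clone
}

func main() {
	c := httpClient(injectedExample)
	fmt.Println(c != injectedExample, c.CheckRedirect != nil, injectedExample.CheckRedirect == nil)
}
```

Copying the struct rather than setting the field in place keeps the caller's client untouched, which matters when the same client is shared with code that expects default redirect behavior.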
Add IPv6 loopback (::1/128), link-local (fe80::/10), and unique-local (fc00::/7) ranges to isPrivateIP so webhook URLs targeting IPv6 internal addresses are rejected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
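A minimal sketch of a CIDR-based blocklist covering the ranges named above. The IPv6 ranges are the ones listed in this commit; the IPv4 ranges are assumed from the earlier "private, loopback, and link-local" description. The helper names mustParseCIDRs and isPrivateAddr are hypothetical, and the real isPrivateIP may be structured differently.

```go
package main

import (
	"fmt"
	"net"
)

// blockedCIDRs lists the internal ranges a webhook target must not fall into.
var blockedCIDRs = mustParseCIDRs(
	"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", // IPv4 private (RFC 1918)
	"127.0.0.0/8",    // IPv4 loopback
	"169.254.0.0/16", // IPv4 link-local
	"::1/128",        // IPv6 loopback
	"fe80::/10",      // IPv6 link-local
	"fc00::/7",       // IPv6 unique-local
)

// mustParseCIDRs (hypothetical helper) panics on a malformed literal at startup.
func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// isPrivateIP reports whether ip falls inside any blocked range.
func isPrivateIP(ip net.IP) bool {
	for _, n := range blockedCIDRs {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// isPrivateAddr (hypothetical helper) parses a literal and fails closed:
// anything that is not a valid IP is treated as blocked.
func isPrivateAddr(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return true
	}
	return isPrivateIP(ip)
}

func main() {
	for _, a := range []string{"::1", "fe80::1", "fd00::1", "10.1.2.3", "8.8.8.8"} {
		fmt.Println(a, isPrivateAddr(a))
	}
}
```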
validateWebhookURL now performs DNS resolution for non-IP hostnames and checks all resulting addresses against the private IP blocklist. This prevents SSRF via internal Kubernetes service FQDNs or attacker domains with private-IP DNS records. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
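The resolve-then-check flow might look something like the sketch below. Only the name validateWebhookURL comes from the commit; the lookupFunc parameter (shaped like net.LookupIP so a real resolver can be passed in), the staticLookup test double, the isInternal helper, and the example hostnames are assumptions. The standard library's IP classification methods stand in for the PR's isPrivateIP.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// lookupFunc matches the signature of net.LookupIP so it can be swapped in tests.
type lookupFunc func(host string) ([]net.IP, error)

// isInternal (hypothetical) approximates the blocklist with stdlib methods.
func isInternal(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast()
}

// validateWebhookURL rejects non-HTTPS URLs and hosts that are, or resolve to,
// internal addresses. Error messages avoid echoing the raw URL.
func validateWebhookURL(raw string, lookup lookupFunc) error {
	u, err := url.Parse(raw)
	if err != nil {
		return errors.New("webhook URL is not parseable")
	}
	if u.Scheme != "https" {
		return errors.New("webhook URL must use https")
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("webhook URL has no host")
	}
	if ip := net.ParseIP(host); ip != nil {
		if isInternal(ip) {
			return errors.New("webhook host is an internal address")
		}
		return nil
	}
	ips, err := lookup(host)
	if err != nil {
		return fmt.Errorf("resolving webhook host: %w", err)
	}
	if len(ips) == 0 {
		return errors.New("webhook host did not resolve")
	}
	// Every resolved address must be public; one internal record is enough to reject.
	for _, ip := range ips {
		if isInternal(ip) {
			return errors.New("webhook host resolves to an internal address")
		}
	}
	return nil
}

// staticLookup (hypothetical test double) returns fixed addresses for any host.
func staticLookup(addrs ...string) lookupFunc {
	return func(string) ([]net.IP, error) {
		ips := make([]net.IP, 0, len(addrs))
		for _, a := range addrs {
			ips = append(ips, net.ParseIP(a))
		}
		return ips, nil
	}
}

func main() {
	fmt.Println(validateWebhookURL("https://metrics.kube-system.svc.cluster.local/hook", staticLookup("10.96.0.12")))
	fmt.Println(validateWebhookURL("https://hooks.example.com/notify", staticLookup("203.0.113.10")))
}
```

In production the resolver argument would be net.LookupIP. A check like this validates the addresses seen at validation time; the address used when the connection is actually made can differ if DNS answers change in between.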
…sure
- Switch persistAnnotationRetry from full-object Update to MergeFrom Patch, preventing accidental clobber of concurrent changes from other controllers
- Document that webhook URLs are stored in task annotations and should not contain embedded tokens; use secretRef instead
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test case verifying that when the webhook endpoint returns a 5xx status, ReportWebhooks returns an error and does not persist the idempotency annotation, allowing retry on the next reconcile cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
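The delivery and idempotency behavior described across these commits can be modeled without a cluster, as in the sketch below. The annotation key is the one named in the PR description; everything else (the hook type, the rule that an empty phase list matches every terminal phase, storing the phase as the annotation value, and a plain map standing in for Task annotations) is an assumption and not the actual ReportWebhooks implementation.

```go
package main

import "fmt"

const reportedPhaseAnnotation = "kelos.dev/webhook-report-phase"

// hook is a hypothetical, trimmed-down view of one onCompletion entry.
type hook struct {
	Name   string
	Phases []string // assumption: empty means "all terminal phases"
}

func (h hook) matches(phase string) bool {
	if len(h.Phases) == 0 {
		return true
	}
	for _, p := range h.Phases {
		if p == phase {
			return true
		}
	}
	return false
}

// reportWebhooks delivers matching hooks once per phase.
// The annotation is written only after every delivery returned 2xx,
// so a failed cycle is retried on the next pass.
func reportWebhooks(annotations map[string]string, phase string, hooks []hook, send func(hook) (int, error)) error {
	if annotations[reportedPhaseAnnotation] == phase {
		return nil // already reported for this phase
	}
	for _, h := range hooks {
		if !h.matches(phase) {
			continue
		}
		status, err := send(h)
		if err != nil {
			// Report the hook name only; the underlying error may embed the URL.
			return fmt.Errorf("webhook %q: delivery failed", h.Name)
		}
		if status < 200 || status > 299 {
			return fmt.Errorf("webhook %q: unexpected status %d", h.Name, status)
		}
	}
	// Reached even when every hook was filtered out, so the task is not re-evaluated forever.
	annotations[reportedPhaseAnnotation] = phase
	return nil
}

func main() {
	ann := map[string]string{}
	hooks := []hook{{Name: "pagerduty", Phases: []string{"Failed"}}}
	status := 503
	send := func(hook) (int, error) { return status, nil }
	fmt.Println(reportWebhooks(ann, "Failed", hooks, send), ann)
	status = 200
	fmt.Println(reportWebhooks(ann, "Failed", hooks, send), ann)
}
```

In this sketch, a failure partway through a list of hooks means hooks that already succeeded are sent again on retry; how the PR handles that case is not stated here.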
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change secretRef behavior so all keys in the referenced Secret are sent as HTTP headers on the webhook request (not just Authorization). This allows users to configure any headers their webhook endpoint requires (e.g. X-Api-Key, X-Webhook-Secret) by adding keys to the Secret. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
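As a rough illustration of the key-to-header mapping, the sketch below converts a map shaped like a Kubernetes Secret's Data field (map[string][]byte) into request headers. The function name headersFromSecret and the example values are hypothetical; the PR's actual helper may differ, for example in how it validates header names.

```go
package main

import (
	"fmt"
	"net/http"
)

// headersFromSecret turns every key in the Secret data into one HTTP header.
// http.Header.Set canonicalizes the key, so "x-api-key" becomes "X-Api-Key".
func headersFromSecret(data map[string][]byte) http.Header {
	h := make(http.Header, len(data))
	for k, v := range data {
		h.Set(k, string(v))
	}
	return h
}

func main() {
	h := headersFromSecret(map[string][]byte{
		"Authorization": []byte("Bearer example-token"), // placeholder value
		"X-Api-Key":     []byte("example-key"),          // placeholder value
	})
	fmt.Println(h.Get("Authorization"), h.Get("X-Api-Key"))
}
```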
These intermediate CRD/RBAC files are generated by controller-gen and processed by hack/update-install-manifest.sh into the final manifests. They should not be checked in. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a3aa19b to a8e5b67
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds a new `spec.onCompletion` field to `TaskSpawnerSpec` that configures outbound HTTP webhook notifications when spawned Tasks reach terminal phases (Succeeded or Failed). This enables push-based alerting to external systems (Slack incoming webhooks, PagerDuty, custom dashboards) without requiring users to poll the Kubernetes API or set up custom Prometheus alerting rules.
Key design decisions:
- The hooks config is propagated to Tasks as an annotation (`kelos.dev/on-completion`) at spawn time so the reporting loop can dispatch without looking up the TaskSpawner
- The `kelos.dev/webhook-report-phase` annotation prevents duplicate deliveries
Which issue(s) this PR is related to:
Fixes kelos-dev#749
Special notes for your reviewer:
This is Phase 1 of the incremental adoption path described in kelos-dev#749. Future phases will add built-in Slack notification formatting and consolidate `GitHubReporting` into the same framework.
The webhook payload includes: task name, namespace, spawner name, phase, message, agent type, model, start/completion times, outputs, and results (cost/tokens).
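To make the payload contents concrete, here is a hypothetical Go shape for it. Only the list of fields comes from the description above; the type name, JSON key names, field types, and the use of omitempty are all assumptions and may not match the PR's actual wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// webhookPayload is an illustrative shape only; every JSON key is an assumption.
type webhookPayload struct {
	TaskName       string            `json:"taskName"`
	Namespace      string            `json:"namespace"`
	SpawnerName    string            `json:"spawnerName,omitempty"`
	Phase          string            `json:"phase"`
	Message        string            `json:"message,omitempty"`
	AgentType      string            `json:"agentType,omitempty"`
	Model          string            `json:"model,omitempty"`
	StartTime      *time.Time        `json:"startTime,omitempty"`
	CompletionTime *time.Time        `json:"completionTime,omitempty"`
	Outputs        []string          `json:"outputs,omitempty"`
	Results        map[string]string `json:"results,omitempty"` // e.g. cost and token counts
}

// encodePayload serializes the payload to the JSON body of the webhook request.
func encodePayload(p webhookPayload) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	done := time.Date(2025, 1, 2, 3, 4, 5, 0, time.UTC)
	body, err := encodePayload(webhookPayload{
		TaskName:       "example-task",
		Namespace:      "default",
		SpawnerName:    "example-spawner",
		Phase:          "Failed",
		Message:        "agent exited with an error",
		CompletionTime: &done,
		Results:        map[string]string{"cost-usd": "0.42"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```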
The `secretRef` field references a Secret whose keys are used as HTTP request headers. For example, a Secret with keys `Authorization` and `X-Api-Key` will set both headers on the webhook request.
Does this PR introduce a user-facing change?