
Main to prod #123

Closed

knechtionscoding wants to merge 12 commits into prod from main

Conversation

@knechtionscoding (Collaborator) commented:

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?


gjkim42 and others added 12 commits May 2, 2026 13:31
docs: document Workspace.spec.setupCommand in reference.md
Add ExcludePatterns field to the Slack TaskSpawner source. Regex
patterns (RE2) reject messages when any pattern matches, enabling
negative routing between multiple Slack-triggered TaskSpawners.
Leading @-mentions are stripped before matching so patterns target
semantic content. Does not apply to slash commands.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Co-authored-by: jkahuja <[email protected]>
…ude-patterns

feat(api): add excludePatterns to Slack source for negative routing
test(e2e): poll Task phase via WaitForTaskPhase to fix flake
…mmentOn

Mark bodyContains as deprecated and add reference rows for its regex
replacements (bodyPattern, excludeBodyPatterns) and the related
issue_comment scoping field (commentOn). New users following
reference.md were silently being steered toward the deprecated field.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
docs: document githubWebhook bodyPattern, excludeBodyPatterns, and commentOn
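
As a purely illustrative aid to the fields named above, a hypothetical Go view of how they might sit side by side; the struct shape and the comments are assumptions based on the commit message, not the project's actual API definition:

    // GitHubWebhookFilter is a hypothetical, simplified stand-in for the fields
    // discussed in reference.md; field names come from the commit message.
    type GitHubWebhookFilter struct {
        BodyContains        string   // deprecated plain substring match
        BodyPattern         string   // RE2 replacement for bodyContains
        ExcludeBodyPatterns []string // RE2 patterns that reject an otherwise matching event
        CommentOn           string   // scopes issue_comment events; accepted values assumed to be in reference.md
    }
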
…spawners

Add `excludeAuthors: [kelos-bot[bot]]` at the top level of webhook-driven
self-development TaskSpawners and remove the now-redundant `kelos-bot[bot]`
positive author allowlist entries. Top-level ExcludeAuthors is evaluated
before filters and takes precedence over filter-level Author matches, so
agent-authored comments containing strings like `/kelos api-review` no
longer recursively trigger another spawner.

Affects: kelos-workers, kelos-planner, kelos-pr-responder, kelos-reviewer,
kelos-api-reviewer. kelos-triage is left unchanged so bot-created issues
(e.g. from kelos-self-update) continue to be triaged. kelos-squash-commits
already restricts to gjkim42 only.
self-development: prevent kelos-bot from triggering self-development spawners
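
A hypothetical sketch of the precedence rule described above, where top-level ExcludeAuthors is evaluated before any filter-level author allowlist; the type and field names here are simplified stand-ins, not the project's actual API types:

    package webhook

    // Simplified stand-ins for illustration only.
    type Filter struct {
        Authors []string // per-filter positive author allowlist
    }

    type SpawnerSpec struct {
        ExcludeAuthors []string // top-level deny list, e.g. "kelos-bot[bot]"
        Filters        []Filter
    }

    // authorAllowed sketches the evaluation order: an excluded author is rejected
    // before any filter-level Author match is consulted, so a bot listed in
    // ExcludeAuthors can never trigger the spawner through a filter allowlist.
    func authorAllowed(spec SpawnerSpec, author string) bool {
        for _, a := range spec.ExcludeAuthors {
            if a == author {
                return false
            }
        }
        for _, f := range spec.Filters {
            for _, a := range f.Authors {
                if a == author {
                    return true
                }
            }
        }
        return false
    }
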
watchTask returned nil for both Succeeded and Failed phases, so
'kelos run -w' silently exited 0 even when the task failed — breaking
CI/CD integration where downstream steps assumed success.

Return a non-nil error on TaskPhaseFailed and print a diagnostic hint
pointing users at 'kelos logs' and 'kelos get tasks -d' for details.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
fix(cli): exit non-zero on failed task in 'run -w'
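
Based on the commit message and the review summary (io.Writer outputs, a diagnostic hint on stderr, a non-nil error on failure), a sketch of the shape of the fix; the function name, signature, and phase constants here are assumptions:

    package cli

    import (
        "fmt"
        "io"
    )

    // Hypothetical phase values for illustration.
    const (
        taskPhaseSucceeded = "Succeeded"
        taskPhaseFailed    = "Failed"
    )

    // reportTaskOutcome writes a diagnostic hint to errOut and returns a non-nil
    // error when the watched task failed, so 'kelos run -w' exits non-zero
    // instead of silently reporting success to CI/CD.
    func reportTaskOutcome(name, phase string, errOut io.Writer) error {
        if phase == taskPhaseSucceeded {
            return nil
        }
        fmt.Fprintf(errOut, "task %q failed; see 'kelos logs' and 'kelos get tasks -d' for details\n", name)
        return fmt.Errorf("task %q ended in phase %s", name, phase)
    }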

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR merges main into prod, bringing three primary features: (1) a new ExcludePatterns field on the Slack TaskSpawner spec that rejects messages whose text matches any listed RE2 pattern, (2) improved watchTask testability and failure UX in the CLI, and (3) more reliable e2e phase assertions using a shared WaitForTaskPhase helper.

  • Slack exclude patterns (internal/slack/filter.go, api/v1alpha1/taskspawner_types.go): adds matchesExcludePatterns and stripLeadingMentions helpers, updates MatchesSpawner to apply the exclude check after the positive-match gate, and regenerates the deepcopy and CRD manifests.
  • CLI watchTask refactor (internal/cli/run.go, run_test.go): accepts io.Writer outputs for testability, writes an actionable hint to errOut on failure, and returns a non-nil error so callers can surface task failure to the user.
  • E2E reliability (test/e2e/, framework.go): replaces bare Expect(f.GetTaskPhase(...)) point-in-time reads with f.WaitForTaskPhase(...) polling to tolerate the async gap between Job completion and Task status reconciliation.
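
As a rough illustration of the exclude-pattern helpers named in the first bullet above, a minimal sketch of how the flow might look; the Slack mention format handled by stripLeadingMentions and the skip-on-invalid-pattern behavior are assumptions, not the actual implementation:

    package slack

    import "regexp"

    // leadingMention approximates Slack's "<@USERID>" mention syntax at the start
    // of a message; the exact formats handled by the real code are an assumption.
    var leadingMention = regexp.MustCompile(`^(\s*<@[A-Z0-9]+>[\s:,]*)+`)

    // stripLeadingMentions removes leading @-mentions so exclude patterns match
    // the semantic content of the message rather than the mention prefix.
    func stripLeadingMentions(text string) string {
        return leadingMention.ReplaceAllString(text, "")
    }

    // matchesExcludePatterns reports whether any RE2 exclude pattern matches the
    // stripped message text; a single match is enough to reject the message.
    func matchesExcludePatterns(text string, patterns []string) bool {
        stripped := stripLeadingMentions(text)
        for _, p := range patterns {
            re, err := regexp.Compile(p)
            if err != nil {
                continue // how invalid patterns are handled is an assumption here
            }
            if re.MatchString(stripped) {
                return true
            }
        }
        return false
    }

MatchesSpawner would then call this only after the positive-match gate, so exclusion can never turn a non-matching message into a match.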

Confidence Score: 4/5

The change is safe to merge; logic is well-tested and the new ExcludePatterns path correctly gates on the positive-match result before applying exclusion.

Two minor concerns exist: the stdlib log call in filter.go likely doesn't integrate with the controller's structured logger, and the fixed 2-minute timeout in WaitForTaskPhase inflates the failure window for the dep-chain Waiting assertion that was previously bounded at 30 seconds. Neither affects correctness in production.

Flagged files: internal/slack/filter.go (stdlib log import) and test/e2e/framework/framework.go (hardcoded 2-minute timeout in WaitForTaskPhase).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Slack message received] --> B{matchesChannel?}
    B -- No --> REJECT1[return false]
    B -- Yes --> C{IsSlashCommand?}
    C -- Yes --> ALLOW1["return true (bypass all filters)"]
    C -- No --> D{"len Triggers == 0?"}
    D -- Yes --> E[hasBotMention]
    D -- No --> F["matchesTriggers: strips leading mentions, checks pattern + mention"]
    E --> G{positiveMatch?}
    F --> G
    G -- No --> REJECT2[return false]
    G -- Yes --> H["matchesExcludePatterns: strips leading mentions, OR across patterns"]
    H -- matches --> REJECT3[return false]
    H -- no match --> ALLOW2[return true]

Comments Outside Diff (1)

  1. test/e2e/framework/framework.go, lines 808-816

    P2 Fixed 2-minute timeout hides fast-fail phase transitions

    WaitForTaskPhase always waits up to 2 minutes. The dep-chain test previously used a targeted 30-second window to assert that dep-chain-b reaches the Waiting phase: a transient scheduling bug that prevents Waiting from being set now takes up to 2 minutes to surface instead of 30 seconds. Consider accepting a timeout time.Duration parameter so each call site can express the expected latency, or at least using a shorter default for phase assertions that should resolve in seconds.
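
    One way the suggested timeout parameter could look, as a sketch; the Framework receiver and the GetTaskPhase signature are assumed for illustration:

        package framework

        import (
            "fmt"
            "time"
        )

        // Framework is a minimal stand-in for the e2e framework; only what the
        // sketch needs is shown, and GetTaskPhase's signature is assumed.
        type Framework struct {
            GetTaskPhase func(name string) (string, error)
        }

        // WaitForTaskPhase polls until the Task reaches the wanted phase or the
        // caller-supplied timeout elapses, so fast assertions (e.g. Waiting within
        // 30 seconds) no longer inherit a fixed 2-minute failure window.
        func (f *Framework) WaitForTaskPhase(name, want string, timeout time.Duration) error {
            deadline := time.Now().Add(timeout)
            for time.Now().Before(deadline) {
                phase, err := f.GetTaskPhase(name)
                if err == nil && phase == want {
                    return nil
                }
                time.Sleep(time.Second)
            }
            return fmt.Errorf("task %s did not reach phase %s within %s", name, want, timeout)
        }

    Call sites that previously asserted within 30 seconds could then pass 30*time.Second explicitly.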



Reviews (1): Last reviewed commit: "Merge pull request #1101 from kelos-dev/..."

Comment thread: internal/slack/filter.go, line 60

    }
    if err != nil {
        return nil, err
    log.Printf("Invalid regex pattern %q: %v", pattern, err)

P2 Stdlib log inconsistent with controller logging

log.Printf (stdlib) was introduced here, while Kubernetes controllers typically route all structured logging through logr or slog. This call produces timestamped output to stderr without any of the key/value context that the rest of the controller surface uses. If an invalid pattern is stored in a CRD and the controller is running with a structured-log aggregator, this message will likely end up in an unstructured log stream and may be missed by alerting rules.
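
A sketch of what the suggested change could look like using controller-runtime's structured logger instead of the stdlib log package; the function name, the availability of a context at this point, and the skip-on-invalid-pattern behavior are assumptions:

    package slack

    import (
        "context"
        "regexp"

        logf "sigs.k8s.io/controller-runtime/pkg/log"
    )

    // compileExcludePatterns (hypothetical name) reports invalid patterns through
    // the structured logger with key/value context, then skips them, instead of
    // calling the stdlib log.Printf.
    func compileExcludePatterns(ctx context.Context, patterns []string) []*regexp.Regexp {
        logger := logf.FromContext(ctx)
        compiled := make([]*regexp.Regexp, 0, len(patterns))
        for _, pattern := range patterns {
            re, err := regexp.Compile(pattern)
            if err != nil {
                logger.Error(err, "invalid exclude pattern", "pattern", pattern)
                continue
            }
            compiled = append(compiled, re)
        }
        return compiled
    }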



@anomalogravity (Bot) left a comment:


Auto-approved: PR assessed as low risk by Gravity.
