Main by knechtionscoding · Pull Request #115 · datagravity-ai/kelos

knechtionscoding · 2026-04-29T15:38:08Z

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The CLI printer was reading only the deprecated top-level spec.pollInterval, so users who set the recommended per-source pollInterval (e.g., spec.when.githubIssues.pollInterval) saw the stale deprecated default and assumed their override was ignored. Mirror the resolution in cmd/kelos-spawner/reconciler.go: the active source's pollInterval takes precedence, and we fall back to spec.pollInterval only when no per-source override is set. Co-Authored-By: Claude Opus 4.7 <[email protected]>

The Quick Start documents `oauthToken: "@~/.codex/auth.json"`, but the Go runtime does not perform shell-style tilde expansion. resolveContent called os.ReadFile with the literal path, producing "open ~/.codex/auth.json: no such file or directory". Resolve a leading "~" or "~/" to os.UserHomeDir() before reading, so the documented form works as advertised. The user-facing error still reports the original path. Add unit tests for both expandHome and the @~/ form through the run dry-run flow. Co-Authored-By: Claude Opus 4.7 <[email protected]>

Show effective poll interval in kelos get taskspawner

Expand ~ in @file paths for CLI config

Add configurable service type for webhook Services

Drop the explicit "max" effort level override from TaskSpawners and the fake-strategist Task so they fall back to the default effort.

Unset CLAUDE_CODE_EFFORT_LEVEL in self-development workflows

- Add the resolve-previous-review-threads rule to Standards so re-runs do not pile up stale inline comments on the same PR - Qualify the Makefile-targets convention with "When referencing project validation commands" since the api-reviewer is read-only and does not run validation locally - Replace the literal `{owner}/{repo}` segment in the inline review example with `gh api`'s `:owner/:repo` placeholders so the documented command does not 404

…mage-2.1.121 Update claude-code image to 2.1.121

…e-1.14.28 Update opencode image to 1.14.28

Align kelos-api-reviewer AgentConfig and prompt with kelos-reviewer

The report-pr-e2e-status job in CI was treating listWorkflowRunsForRepo as a precondition for writing the commit status: if the listing returned no runs (or only stale runs) for the head SHA, it logged a message and returned without calling createCommitStatus, leaving the previously-set pending value untouched. That listing is eventually consistent. The in-flight current run is not always indexed at the moment the report step queries the API, so a benign race could leave pr-e2e stuck on pending indefinitely with no way to recover except a CI re-run. Make the listing advisory: it is used only to detect a strictly newer run that has superseded the current one, and any failure or empty result is logged and ignored so that the status is still written from needs.test-e2e.result. Co-Authored-By: Claude Opus 4.7 <[email protected]>

Always write pr-e2e status when test-e2e completes

Add user-facing docs for the spec.when.webhook (GenericWebhook) source type that has been merged but undocumented: - README.md: list Generic Webhooks (and Linear Webhooks) in the TaskSpawner integration summary. - docs/integration.md: add a Generic Webhooks section covering source, fieldMapping, filters, setup, and template variables; add a Generic Webhook column to the source-vs-variable table. - docs/reference.md: add spec.when.webhook.* rows to the TaskSpawner field reference and a Generic Webhook column to the promptTemplate variables table; mark workspaceRef as required for webhook sources. - examples/13-taskspawner-generic-webhook: add a Sentry-driven example with annotated YAML and a README walkthrough. Co-Authored-By: Claude Opus 4.7 <[email protected]>

The previous docs claimed the generic webhook endpoint validates each delivery against a per-source <SOURCE>_WEBHOOK_SECRET via X-Hub-Signature-256, but the handler does not currently read or verify any signature header — operators following the docs would deploy an endpoint they believed was HMAC-protected but that actually accepts any unauthenticated POST. Replace the HMAC-validation language with a security warning that describes the current state honestly, recommends restricting access at the network layer (NetworkPolicy / Ingress / Gateway IP allowlists), and links to the follow-up issue tracking the validation work. Note the chart's webhookServer.sources.generic.secretName field as reserved for future use. Also list examples/12-taskspawner-file-patterns/ in the examples index so it no longer skips from 11 to 13. Co-Authored-By: Claude Opus 4.7 <[email protected]>

Adds four optional fields to the shared PodOverrides struct so that spawned Task pods can attach extra volumes, mount them into the agent container, and declare pod- and container-level securityContext. This unblocks PSS-restricted namespaces and use cases that need extra mounts (writable profile dirs, CA bundles, custom CLI configs). Field semantics: - Volumes: appended to the Job's PodSpec.Volumes; user-supplied names must not collide with the Kelos-managed names "workspace" and "kelos-plugin", and duplicates are rejected. - VolumeMounts: appended to the agent container's mounts. A flat list rather than a map keyed by container name -- only the agent container is exposed; init containers stay private. - PodSecurityContext: replaces the pod-level securityContext set by the controller. As a carve-out, FSGroup is preserved from Kelos's default when the user does not set it explicitly, so the agent user retains read/write access to the workspace volume. - ContainerSecurityContext: replaces the agent container's securityContext (allowPrivilegeEscalation=false, capabilities.drop=[ALL], readOnlyRootFilesystem=true, etc.) so the pod can land in a PSS-restricted namespace. Also fixes a latent bug in helmchart.Render: rendered templates were being split into YAML documents with strings.Split(content, "---\n"), which corrupts documents whose description text happens to contain "---\n". The new corev1 types pulled into the CRD surface this -- PodSecurityContext.fsGroup's description contains the literal example "rw-rw----\n", which the naive split treats as a document boundary. Replaced with k8s.io/apimachinery/pkg/util/yaml's reader, which only splits on lines that exactly match "---". Existing render tests that relied on a substring search for ":latest" were also tightened to match real image references rather than CRD description text that mentions ":latest" in prose.

Add an ordered list field `agentConfigRefs` alongside the existing singular `agentConfigRef` on Task and TaskTemplate types. Multiple AgentConfigs are merged in order at task creation time: agentsMD is concatenated, plugins/skills are appended, and mcpServers are appended with later entries winning on name collision. This eliminates duplicated instructions across AgentConfigs in multi-agent deployments by enabling composition — shared config lives in a base AgentConfig while role-specific layers add only their delta. Closes kelos-dev#693 Co-authored-by: Claude Opus 4.6 <[email protected]>

docs: Document GenericWebhook TaskSpawner source

…ent-configs feat(api): add agentConfigRefs for composable multi-layer AgentConfig…

The taskbuilder renderer uses Option("missingkey=error"), and ExtractGitHubWorkItem omits Branch/URL/Body keys when their values are empty. Direct access like {{if .Branch}} therefore errors at render time on issue events, blocking task creation in kelos-api-reviewer. Switch to {{with index . "Foo"}} which falls through cleanly when a key is absent. Also drop {{.Event | title}} from example 10, since `title` is not registered in the renderer's FuncMap and would fail to parse.

…egration Reframe the intro and Core Primitives around Kelos's two concerns: defining the agent together with its runtime environment, and defining how agents integrate with workflows. Move the self-development write-up out of the intro into a "Case Study: Kelos Developing Kelos" section after Orchestration Patterns so it stays visible without dominating the top of the page.

…key-branch Fix template render errors when optional GitHub keys are absent

docs: Refocus README on agent+environment and workflow integration

The kelos-workers TaskSpawner gated pick-up on the `actor/kelos` label, which is unnecessary — `/kelos pick-up` from an authorized author (gjkim42 or kelos-bot[bot]) is already a sufficient trigger. Drop the label filter and update the docs to match.

Append `@<login>` next to the PR number in each release-note line so readers can quickly see who contributed each change.

Remove actor/kelos label requirement from kelos-workers

…e-note Add PR author to generated release notes

…expansion Expand PodOverrides with volumes, volumeMounts, and securityContext

greptile-apps · 2026-04-29T15:41:32Z

Greptile Summary

This PR introduces three main features: (1) an agentConfigRefs plural field on TaskSpec/TaskTemplate that allows merging multiple AgentConfig resources in order, (2) new PodOverrides fields (volumes, volumeMounts, podSecurityContext, containerSecurityContext) for PSS-restricted namespace support, and (3) a generic webhook source for arbitrary HTTP POST triggers with JSONPath field mapping. It also fixes a Helm chart YAML-splitting bug caused by rw-rw---- in CRD descriptions matching the naive --- separator, and hardens the CI stale-run detection to avoid blocking on an eventually-consistent workflow listing.

Confidence Score: 4/5

PR is safe to merge; all findings are P2 style/quality suggestions with no functional impact on the changed paths.

No P0 or P1 issues found. Two P2 observations: a shallow-copy inconsistency in the single-element merge fast path and a missing comment on the poll-interval display function's intentionally limited scope. The core logic (merge, volume validation, FSGroup preservation, CI run-supersession check) is well-tested and correct.

internal/controller/agentconfig_merge.go (shallow copy fast path)

Sequence Diagram

sequenceDiagram
    participant User as User/TaskSpawner
    participant TC as TaskController
    participant K8s as Kubernetes API
    participant JB as JobBuilder

    User->>K8s: Create Task (agentConfigRefs: [base, role])
    TC->>TC: ResolveAgentConfigRefs(task.Spec)
    loop For each AgentConfigRef
        TC->>K8s: Get AgentConfig by name
        K8s-->>TC: AgentConfigSpec
    end
    TC->>TC: MergeAgentConfigs([base.Spec, role.Spec])
    Note over TC: agentsMD concatenated, plugins/skills appended,
    Note over TC: mcpServers: later wins on collision
    TC->>JB: Build(task, workspace, mergedConfig, prompt)
    JB->>JB: validateUserVolumes (reserved name check)
    JB->>JB: Apply PodSecurityContext (preserve FSGroup)
    JB->>JB: Apply ContainerSecurityContext
    JB-->>TC: batchv1.Job
    TC->>K8s: Create Job

Comments Outside Diff (2)

internal/controller/agentconfig_merge.go, line 1587-1590 (link)

Shallow copy in single-config fast path

The single-element path returns &result where result is a copy of configs[0]. Since Go copies slices by header (pointer + len + cap), the returned Plugins, Skills, and MCPServers slices share their underlying arrays with the original. The multi-element path below always allocates fresh slices. Any caller that appends to the returned spec would silently mutate the original AgentConfigSpec. Consider allocating fresh slices for consistency:

if len(configs) == 1 {
    result := configs[0]
    merged := kelosv1alpha1.AgentConfigSpec{AgentsMD: result.AgentsMD}
    merged.Plugins = append(merged.Plugins, result.Plugins...)
    merged.Skills = append(merged.Skills, result.Skills...)
    merged.MCPServers = append(merged.MCPServers, result.MCPServers...)
    return &merged
}

Prompt To Fix With AI

This is a comment left during a code review.
Path: internal/controller/agentconfig_merge.go
Line: 1587-1590

Comment:
**Shallow copy in single-config fast path**

The single-element path returns `&result` where `result` is a copy of `configs[0]`. Since Go copies slices by header (pointer + len + cap), the returned `Plugins`, `Skills`, and `MCPServers` slices share their underlying arrays with the original. The multi-element path below always allocates fresh slices. Any caller that appends to the returned spec would silently mutate the original `AgentConfigSpec`. Consider allocating fresh slices for consistency:

```go
if len(configs) == 1 {
    result := configs[0]
    merged := kelosv1alpha1.AgentConfigSpec{AgentsMD: result.AgentsMD}
    merged.Plugins = append(merged.Plugins, result.Plugins...)
    merged.Skills = append(merged.Skills, result.Skills...)
    merged.MCPServers = append(merged.MCPServers, result.MCPServers...)
    return &merged
}
```

How can I resolve this? If you propose a fix, please make it concise.

internal/cli/printer.go, line 1330-1340 (link)

effectivePollInterval silently drops LinearWebhook/GenericWebhook/Cron sources

The function only checks GitHubIssues, GitHubPullRequests, and Jira for per-source overrides. If a new polling source with its own PollInterval is added later, the display will silently show the deprecated top-level value. Adding a comment marking the fall-through as intentional would prevent future confusion.

Prompt To Fix With AI

This is a comment left during a code review.
Path: internal/cli/printer.go
Line: 1330-1340

Comment:
**`effectivePollInterval` silently drops LinearWebhook/GenericWebhook/Cron sources**

The function only checks `GitHubIssues`, `GitHubPullRequests`, and `Jira` for per-source overrides. If a new polling source with its own `PollInterval` is added later, the display will silently show the deprecated top-level value. Adding a comment marking the fall-through as intentional would prevent future confusion.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: internal/controller/agentconfig_merge.go
Line: 1587-1590

Comment:
**Shallow copy in single-config fast path**

The single-element path returns `&result` where `result` is a copy of `configs[0]`. Since Go copies slices by header (pointer + len + cap), the returned `Plugins`, `Skills`, and `MCPServers` slices share their underlying arrays with the original. The multi-element path below always allocates fresh slices. Any caller that appends to the returned spec would silently mutate the original `AgentConfigSpec`. Consider allocating fresh slices for consistency:

```go
if len(configs) == 1 {
    result := configs[0]
    merged := kelosv1alpha1.AgentConfigSpec{AgentsMD: result.AgentsMD}
    merged.Plugins = append(merged.Plugins, result.Plugins...)
    merged.Skills = append(merged.Skills, result.Skills...)
    merged.MCPServers = append(merged.MCPServers, result.MCPServers...)
    return &merged
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: internal/cli/printer.go
Line: 1330-1340

Comment:
**`effectivePollInterval` silently drops LinearWebhook/GenericWebhook/Cron sources**

The function only checks `GitHubIssues`, `GitHubPullRequests`, and `Jira` for per-source overrides. If a new polling source with its own `PollInterval` is added later, the display will silently show the deprecated top-level value. Adding a comment marking the fall-through as intentional would prevent future confusion.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "Merge pull request #1041 from JustinElst..." | Re-trigger Greptile}

gjkim42 and others added 30 commits April 27, 2026 13:32

Merge pull request kelos-dev#1023 from kelos-dev/kelos-task-986

fa227bd

Show effective poll interval in kelos get taskspawner

Update claude-code image to 2.1.121

9235d0f

Update opencode image to 1.14.28

1020243

Add configurable service type for webhook Services

03a6a87

Merge pull request kelos-dev#1027 from kelos-dev/kelos-task-1025

e4fc46b

Expand ~ in @file paths for CLI config

Merge pull request kelos-dev#1022 from kelos-dev/kelos-task-1018

947612e

Add configurable service type for webhook Services

Unset CLAUDE_CODE_EFFORT_LEVEL in self-development workflows

d1c0a10

Drop the explicit "max" effort level override from TaskSpawners and the fake-strategist Task so they fall back to the default effort.

Merge pull request kelos-dev#1034 from kelos-dev/unset-max-effort

b7823d2

Unset CLAUDE_CODE_EFFORT_LEVEL in self-development workflows

Merge pull request kelos-dev#1030 from kelos-dev/update-claude-code-i…

ac4ee32

…mage-2.1.121 Update claude-code image to 2.1.121

Merge pull request kelos-dev#1031 from kelos-dev/update-opencode-imag…

56bf5ab

…e-1.14.28 Update opencode image to 1.14.28

Merge pull request kelos-dev#1036 from kelos-dev/kelos-task-1032

b00db22

Align kelos-api-reviewer AgentConfig and prompt with kelos-reviewer

Merge pull request kelos-dev#1038 from kelos-dev/kelos-task-1029

6445dab

Always write pr-e2e status when test-e2e completes

Merge pull request kelos-dev#1035 from kelos-dev/kelos-task-1033

71d57d9

docs: Document GenericWebhook TaskSpawner source

Merge pull request kelos-dev#1014 from datagravity-ai/feat/list-of-ag…

f260417

…ent-configs feat(api): add agentConfigRefs for composable multi-layer AgentConfig…

Merge pull request kelos-dev#1053 from kelos-dev/fix-template-missing…

5393ed8

…key-branch Fix template render errors when optional GitHub keys are absent

Merge pull request kelos-dev#1052 from kelos-dev/update-readme

b38892a

docs: Refocus README on agent+environment and workflow integration

Add PR author to generated release notes

f8e4ab1

Append `@<login>` next to the PR number in each release-note line so readers can quickly see who contributed each change.

Merge pull request kelos-dev#1054 from kelos-dev/loosen-worker-filter

1425fcf

Remove actor/kelos label requirement from kelos-workers

Merge pull request kelos-dev#1055 from kelos-dev/add-author-to-releas…

c979962

…e-note Add PR author to generated release notes

Merge pull request kelos-dev#1041 from JustinElst/feat/pod-overrides-…

a36f1cd

…expansion Expand PodOverrides with volumes, volumeMounts, and securityContext

github-actions Bot added needs-triage needs-kind needs-priority needs-actor needs-release-note labels Apr 29, 2026

knechtionscoding closed this Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Main#115

Main#115
knechtionscoding wants to merge 31 commits into
prodfrom
main

knechtionscoding commented Apr 29, 2026

Uh oh!

greptile-apps Bot commented Apr 29, 2026 •

edited

Loading

Sequence Diagram

Comments Outside Diff (2)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

knechtionscoding commented Apr 29, 2026

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Uh oh!

greptile-apps Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Sequence Diagram

Comments Outside Diff (2)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

greptile-apps Bot commented Apr 29, 2026 •

edited

Loading