Skip to content

Main#115

Closed
knechtionscoding wants to merge 31 commits into
prodfrom
main
Closed

Main#115
knechtionscoding wants to merge 31 commits into
prodfrom
main

Conversation

@knechtionscoding
Copy link
Copy Markdown
Collaborator

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?


gjkim42 and others added 30 commits April 27, 2026 13:32
The CLI printer was reading only the deprecated top-level
spec.pollInterval, so users who set the recommended per-source
pollInterval (e.g., spec.when.githubIssues.pollInterval) saw the stale
deprecated default and assumed their override was ignored.

Mirror the resolution in cmd/kelos-spawner/reconciler.go: the active
source's pollInterval takes precedence, and we fall back to
spec.pollInterval only when no per-source override is set.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
The Quick Start documents `oauthToken: "@~/.codex/auth.json"`, but the
Go runtime does not perform shell-style tilde expansion. resolveContent
called os.ReadFile with the literal path, producing
"open ~/.codex/auth.json: no such file or directory".

Resolve a leading "~" or "~/" to os.UserHomeDir() before reading, so the
documented form works as advertised. The user-facing error still reports
the original path. Add unit tests for both expandHome and the @~/ form
through the run dry-run flow.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Show effective poll interval in kelos get taskspawner
Add configurable service type for webhook Services
Drop the explicit "max" effort level override from TaskSpawners and
the fake-strategist Task so they fall back to the default effort.
Unset CLAUDE_CODE_EFFORT_LEVEL in self-development workflows
- Add the resolve-previous-review-threads rule to Standards so re-runs do
  not pile up stale inline comments on the same PR
- Qualify the Makefile-targets convention with "When referencing project
  validation commands" since the api-reviewer is read-only and does not
  run validation locally
- Replace the literal `{owner}/{repo}` segment in the inline review
  example with `gh api`'s `:owner/:repo` placeholders so the documented
  command does not 404
…mage-2.1.121

Update claude-code image to 2.1.121
…e-1.14.28

Update opencode image to 1.14.28
Align kelos-api-reviewer AgentConfig and prompt with kelos-reviewer
The report-pr-e2e-status job in CI was treating listWorkflowRunsForRepo
as a precondition for writing the commit status: if the listing returned
no runs (or only stale runs) for the head SHA, it logged a message and
returned without calling createCommitStatus, leaving the previously-set
pending value untouched.

That listing is eventually consistent. The in-flight current run is not
always indexed at the moment the report step queries the API, so a
benign race could leave pr-e2e stuck on pending indefinitely with no
way to recover except a CI re-run.

Make the listing advisory: it is used only to detect a strictly newer
run that has superseded the current one, and any failure or empty
result is logged and ignored so that the status is still written from
needs.test-e2e.result.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Always write pr-e2e status when test-e2e completes
Add user-facing docs for the spec.when.webhook (GenericWebhook) source
type that has been merged but undocumented:

- README.md: list Generic Webhooks (and Linear Webhooks) in the
  TaskSpawner integration summary.
- docs/integration.md: add a Generic Webhooks section covering source,
  fieldMapping, filters, setup, and template variables; add a Generic
  Webhook column to the source-vs-variable table.
- docs/reference.md: add spec.when.webhook.* rows to the TaskSpawner
  field reference and a Generic Webhook column to the promptTemplate
  variables table; mark workspaceRef as required for webhook sources.
- examples/13-taskspawner-generic-webhook: add a Sentry-driven example
  with annotated YAML and a README walkthrough.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
The previous docs claimed the generic webhook endpoint validates each
delivery against a per-source <SOURCE>_WEBHOOK_SECRET via
X-Hub-Signature-256, but the handler does not currently read or verify
any signature header — operators following the docs would deploy an
endpoint they believed was HMAC-protected but that actually accepts any
unauthenticated POST.

Replace the HMAC-validation language with a security warning that
describes the current state honestly, recommends restricting access at
the network layer (NetworkPolicy / Ingress / Gateway IP allowlists),
and links to the follow-up issue tracking the validation work. Note
the chart's webhookServer.sources.generic.secretName field as reserved
for future use.

Also list examples/12-taskspawner-file-patterns/ in the examples index
so it no longer skips from 11 to 13.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Adds four optional fields to the shared PodOverrides struct so that
spawned Task pods can attach extra volumes, mount them into the agent
container, and declare pod- and container-level securityContext. This
unblocks PSS-restricted namespaces and use cases that need extra
mounts (writable profile dirs, CA bundles, custom CLI configs).

Field semantics:

- Volumes: appended to the Job's PodSpec.Volumes; user-supplied names
  must not collide with the Kelos-managed names "workspace" and
  "kelos-plugin", and duplicates are rejected.
- VolumeMounts: appended to the agent container's mounts. A flat list
  rather than a map keyed by container name -- only the agent
  container is exposed; init containers stay private.
- PodSecurityContext: replaces the pod-level securityContext set by
  the controller. As a carve-out, FSGroup is preserved from Kelos's
  default when the user does not set it explicitly, so the agent user
  retains read/write access to the workspace volume.
- ContainerSecurityContext: replaces the agent container's
  securityContext (allowPrivilegeEscalation=false,
  capabilities.drop=[ALL], readOnlyRootFilesystem=true, etc.) so the
  pod can land in a PSS-restricted namespace.

Also fixes a latent bug in helmchart.Render: rendered templates were
being split into YAML documents with strings.Split(content, "---\n"),
which corrupts documents whose description text happens to contain
"---\n". The new corev1 types pulled into the CRD surface this --
PodSecurityContext.fsGroup's description contains the literal example
"rw-rw----\n", which the naive split treats as a document boundary.
Replaced with k8s.io/apimachinery/pkg/util/yaml's reader, which only
splits on lines that exactly match "---". Existing render tests that
relied on a substring search for ":latest" were also tightened to
match real image references rather than CRD description text that
mentions ":latest" in prose.
Add an ordered list field `agentConfigRefs` alongside the existing singular `agentConfigRef` on Task and TaskTemplate types. Multiple AgentConfigs are merged in order at task creation time: agentsMD is concatenated, plugins/skills are appended, and mcpServers are appended
with later entries winning on name collision.

This eliminates duplicated instructions across AgentConfigs in multi-agent deployments by enabling composition — shared config lives in a base AgentConfig while role-specific layers add only their delta.

Closes kelos-dev#693

Co-authored-by: Claude Opus 4.6 <[email protected]>
docs: Document GenericWebhook TaskSpawner source
…ent-configs

feat(api): add agentConfigRefs for composable multi-layer AgentConfig…
The taskbuilder renderer uses Option("missingkey=error"), and
ExtractGitHubWorkItem omits Branch/URL/Body keys when their values
are empty. Direct access like {{if .Branch}} therefore errors at
render time on issue events, blocking task creation in
kelos-api-reviewer. Switch to {{with index . "Foo"}} which falls
through cleanly when a key is absent.

Also drop {{.Event | title}} from example 10, since `title` is not
registered in the renderer's FuncMap and would fail to parse.
…egration

Reframe the intro and Core Primitives around Kelos's two concerns:
defining the agent together with its runtime environment, and defining
how agents integrate with workflows. Move the self-development write-up
out of the intro into a "Case Study: Kelos Developing Kelos" section
after Orchestration Patterns so it stays visible without dominating the
top of the page.
…key-branch

Fix template render errors when optional GitHub keys are absent
docs: Refocus README on agent+environment and workflow integration
The kelos-workers TaskSpawner gated pick-up on the `actor/kelos` label,
which is unnecessary — `/kelos pick-up` from an authorized author
(gjkim42 or kelos-bot[bot]) is already a sufficient trigger.

Drop the label filter and update the docs to match.
Append `@<login>` next to the PR number in each release-note line so
readers can quickly see who contributed each change.
Remove actor/kelos label requirement from kelos-workers
…e-note

Add PR author to generated release notes
…expansion

Expand PodOverrides with volumes, volumeMounts, and securityContext
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR introduces three main features: (1) an agentConfigRefs plural field on TaskSpec/TaskTemplate that allows merging multiple AgentConfig resources in order, (2) new PodOverrides fields (volumes, volumeMounts, podSecurityContext, containerSecurityContext) for PSS-restricted namespace support, and (3) a generic webhook source for arbitrary HTTP POST triggers with JSONPath field mapping. It also fixes a Helm chart YAML-splitting bug caused by rw-rw---- in CRD descriptions matching the naive --- separator, and hardens the CI stale-run detection to avoid blocking on an eventually-consistent workflow listing.

Confidence Score: 4/5

PR is safe to merge; all findings are P2 style/quality suggestions with no functional impact on the changed paths.

No P0 or P1 issues found. Two P2 observations: a shallow-copy inconsistency in the single-element merge fast path and a missing comment on the poll-interval display function's intentionally limited scope. The core logic (merge, volume validation, FSGroup preservation, CI run-supersession check) is well-tested and correct.

internal/controller/agentconfig_merge.go (shallow copy fast path)

Sequence Diagram

sequenceDiagram
    participant User as User/TaskSpawner
    participant TC as TaskController
    participant K8s as Kubernetes API
    participant JB as JobBuilder

    User->>K8s: Create Task (agentConfigRefs: [base, role])
    TC->>TC: ResolveAgentConfigRefs(task.Spec)
    loop For each AgentConfigRef
        TC->>K8s: Get AgentConfig by name
        K8s-->>TC: AgentConfigSpec
    end
    TC->>TC: MergeAgentConfigs([base.Spec, role.Spec])
    Note over TC: agentsMD concatenated, plugins/skills appended,
    Note over TC: mcpServers: later wins on collision
    TC->>JB: Build(task, workspace, mergedConfig, prompt)
    JB->>JB: validateUserVolumes (reserved name check)
    JB->>JB: Apply PodSecurityContext (preserve FSGroup)
    JB->>JB: Apply ContainerSecurityContext
    JB-->>TC: batchv1.Job
    TC->>K8s: Create Job
Loading

Comments Outside Diff (2)

  1. internal/controller/agentconfig_merge.go, line 1587-1590 (link)

    P2 Shallow copy in single-config fast path

    The single-element path returns &result where result is a copy of configs[0]. Since Go copies slices by header (pointer + len + cap), the returned Plugins, Skills, and MCPServers slices share their underlying arrays with the original. The multi-element path below always allocates fresh slices. Any caller that appends to the returned spec would silently mutate the original AgentConfigSpec. Consider allocating fresh slices for consistency:

    if len(configs) == 1 {
        result := configs[0]
        merged := kelosv1alpha1.AgentConfigSpec{AgentsMD: result.AgentsMD}
        merged.Plugins = append(merged.Plugins, result.Plugins...)
        merged.Skills = append(merged.Skills, result.Skills...)
        merged.MCPServers = append(merged.MCPServers, result.MCPServers...)
        return &merged
    }
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: internal/controller/agentconfig_merge.go
    Line: 1587-1590
    
    Comment:
    **Shallow copy in single-config fast path**
    
    The single-element path returns `&result` where `result` is a copy of `configs[0]`. Since Go copies slices by header (pointer + len + cap), the returned `Plugins`, `Skills`, and `MCPServers` slices share their underlying arrays with the original. The multi-element path below always allocates fresh slices. Any caller that appends to the returned spec would silently mutate the original `AgentConfigSpec`. Consider allocating fresh slices for consistency:
    
    ```go
    if len(configs) == 1 {
        result := configs[0]
        merged := kelosv1alpha1.AgentConfigSpec{AgentsMD: result.AgentsMD}
        merged.Plugins = append(merged.Plugins, result.Plugins...)
        merged.Skills = append(merged.Skills, result.Skills...)
        merged.MCPServers = append(merged.MCPServers, result.MCPServers...)
        return &merged
    }
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

  2. internal/cli/printer.go, line 1330-1340 (link)

    P2 effectivePollInterval silently drops LinearWebhook/GenericWebhook/Cron sources

    The function only checks GitHubIssues, GitHubPullRequests, and Jira for per-source overrides. If a new polling source with its own PollInterval is added later, the display will silently show the deprecated top-level value. Adding a comment marking the fall-through as intentional would prevent future confusion.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: internal/cli/printer.go
    Line: 1330-1340
    
    Comment:
    **`effectivePollInterval` silently drops LinearWebhook/GenericWebhook/Cron sources**
    
    The function only checks `GitHubIssues`, `GitHubPullRequests`, and `Jira` for per-source overrides. If a new polling source with its own `PollInterval` is added later, the display will silently show the deprecated top-level value. Adding a comment marking the fall-through as intentional would prevent future confusion.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

Fix All in Claude Code

Prompt To Fix All With AI
This is a comment left during a code review.
Path: internal/controller/agentconfig_merge.go
Line: 1587-1590

Comment:
**Shallow copy in single-config fast path**

The single-element path returns `&result` where `result` is a copy of `configs[0]`. Since Go copies slices by header (pointer + len + cap), the returned `Plugins`, `Skills`, and `MCPServers` slices share their underlying arrays with the original. The multi-element path below always allocates fresh slices. Any caller that appends to the returned spec would silently mutate the original `AgentConfigSpec`. Consider allocating fresh slices for consistency:

```go
if len(configs) == 1 {
    result := configs[0]
    merged := kelosv1alpha1.AgentConfigSpec{AgentsMD: result.AgentsMD}
    merged.Plugins = append(merged.Plugins, result.Plugins...)
    merged.Skills = append(merged.Skills, result.Skills...)
    merged.MCPServers = append(merged.MCPServers, result.MCPServers...)
    return &merged
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: internal/cli/printer.go
Line: 1330-1340

Comment:
**`effectivePollInterval` silently drops LinearWebhook/GenericWebhook/Cron sources**

The function only checks `GitHubIssues`, `GitHubPullRequests`, and `Jira` for per-source overrides. If a new polling source with its own `PollInterval` is added later, the display will silently show the deprecated top-level value. Adding a comment marking the fall-through as intentional would prevent future confusion.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge pull request #1041 from JustinElst..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants