Skip to content

self-development: add CRD, e2e, agent-image, and Dockerfile-pin conventions from recent reviews#1162

Open
kelos-bot[bot] wants to merge 3 commits into
mainfrom
kelos-config-update-20260519-1800
Open

self-development: add CRD, e2e, agent-image, and Dockerfile-pin conventions from recent reviews#1162
kelos-bot[bot] wants to merge 3 commits into
mainfrom
kelos-config-update-20260519-1800

Conversation

@kelos-bot
Copy link
Copy Markdown

@kelos-bot kelos-bot Bot commented May 19, 2026

🤖 Kelos Agent @gjkim42

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Adds eight new agent conventions to AGENTS.md (with CLAUDE.md symlink) and the inline agentsMD copies in self-development/agentconfig.yaml and self-development/kelos-workers.yaml. Each rule is backed by specific review findings on kelos-bot–generated PRs from the past 7 days.

Original motivating PRs (initial commit):

PR #1154 (feat: Support env.valueFrom in MCP server config). The Kelos API Reviewer flagged three independent issues:

  1. CRD admission validation must match runtime checks. The PR's controller rejects valueFrom.fieldRef/resourceFieldRef/fileKeyRef at reconcile time, but the generated CRD (which embeds the full corev1.EnvVar schema) lets admission accept them. Reviewer's verdict: "once a CRD ships, what its schema accepts is effectively part of the contract — please tighten this at admission rather than at reconcile", suggesting +kubebuilder:validation:XValidation CEL rules or a narrowed Kelos-specific type. Cubic flagged a closely related issue: "valueFrom is not validated as exclusive, so unsupported fieldRef/resourceFieldRef can slip through". ➜ Codified.

  2. Mirror upstream Kubernetes semantics when reusing upstream types. The PR's resolver emits an entry with empty Value when valueFrom.secretKeyRef.optional: true and the secret/key is missing. Reviewer pointed out that kubelet, for a pod EnvVar with Optional=true and a missing key, does not set the variable on the container — surfacing FOO="" is "a non-obvious foot-gun". ➜ Codified.

  3. Breaking CRD storage-shape changes need explicit upgrade notes in release-note. The PR changes mcpServers[].env from map[string]string to []corev1.EnvVar (object → array). Reviewer: "This is a hard break for stored v1alpha1 AgentConfig objects… but the release-note only mentions the YAML shape change. Suggest extending the release note (or adding an upgrade note) to call out that existing in-cluster AgentConfig objects with mcpServers[].env set need to be re-applied after the upgrade." ➜ Codified.

PR #1175 (Drop removed --enable skills flag from codex entrypoint). Reviewer feedback from gjkim42: "this should not be a codex specific e2e test" (on test/e2e/skills_test.go). The agent initially added a codex-only e2e test for a Task with an AgentConfig that has a plugin; after the request, the agent rewrote it to parameterize across agentConfigs. ➜ Codified as a convention to always parameterize e2e tests for shared agent functionality across all supported agent types.

PR #1177 (docs: refresh model ID examples and document shorthand format). Reviewer feedback from gjkim42: "why don't we use a shorthand?". The original docs showed only versioned IDs like claude-sonnet-4-20250514; the agent revised to show shorthand (sonnet, opus) alongside the versioned form. ➜ Codified as a convention to prefer agent-native model shorthands in docs/examples.

Additional motivating PRs (follow-up commit):

PR #1194 (Wire Antigravity CLI as a first-class agent type — OPEN). The PR adds an antigravity/ image directory plus a kind-load step but does not extend the Task.spec.type / TaskTemplate.type +kubebuilder:validation:Enum, JobBuilder (internal/controller/job_builder.go), the apiKeyEnvVar/oauthEnvVar credential mapping, docs/agent-image-interface.md, or any e2e exercising the new type. The Kelos API Reviewer flagged: "the new agent is not reachable through the existing CRD API — that itself is an API gap worth resolving (or explicitly deferring) before merge." The same PR also drew two reproducibility findings:

  1. Cubic [P2] on antigravity/Dockerfile:41: "The agy install step is unpinned, so rebuilds can silently pick up new upstream versions and produce non-reproducible images."
  2. Kelos reviewer [P3] on the same line: "bash <(curl https://antigravity.google/cli/install.sh) fetches the latest agy build with no version arg and no checksum verification, so two rebuilds of the same Dockerfile commit can yield different agy binaries… matching the GEMINI_CLI_VERSION/OPENCODE_VERSION/CURSOR_CLI_VERSION pattern in the sibling images."

➜ Codified two conventions: end-to-end wiring required when adding a new agent runtime image, and pinning third-party CLI installs in agent-image Dockerfiles.

PR #1189 (Stream agent output through kelos-capture instead of writing to /tmp — OPEN). Both cubic [P2] and the Kelos reviewer [P2] flagged the same issue on internal/capture/usage.go:46: "Final flush error gets dropped. Buffered write can fail at EOF and function still returns success. Check and return bw.Flush() error." / "defer bw.Flush() discards the flush error… if it fails (e.g., transient stdout/pod-log writer failure), StreamUsage returns nil and the trailing buffered bytes silently disappear from pod logs. Capture the flush error explicitly so it surfaces, e.g. via a named return and defer func() { if ferr := bw.Flush(); err == nil { err = ferr } }()." ➜ Codified as a convention to propagate deferred Close()/Flush() errors via a named return.

Which issue(s) this PR is related to:

N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

🤖 Generated with Claude Code

Three new conventions drawn from the Kelos API Reviewer's blocking
findings on #1154 (Support env.valueFrom in MCP server config):

- CRD admission validation must match runtime checks. When embedding
  upstream Kubernetes types (e.g. corev1.EnvVar) but honoring only a
  subset of variants at reconcile time, restrict via CEL
  (+kubebuilder:validation:XValidation) or a narrowed Kelos type so
  admission rejects unsupported inputs at kubectl apply.

- Mirror upstream Kubernetes semantics when reusing upstream types.
  E.g. EnvVar with valueFrom.secretKeyRef.optional: true and a missing
  key should skip the variable (kubelet behavior), not emit "".

- Breaking CRD storage-shape changes (map -> list, etc.) require an
  explicit upgrade note in release-note covering re-application of
  existing in-cluster resources, not just the new YAML shape.

Applied to AGENTS.md (with CLAUDE.md symlink), and the inline
agentsMD copies in self-development/agentconfig.yaml and
self-development/kelos-workers.yaml.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

Re-trigger cubic

…ions

Add two agent conventions from recent review feedback:

- Parameterize e2e tests across all supported agent types (PR #1175
  review: "this should not be a codex specific e2e test"). The agent
  initially wrote a codex-only test for an AgentConfig with a plugin;
  the reviewer asked for it to run across all agent types, which the
  agent then fixed by wrapping the Describe block in a function
  invoked from `for _, cfg := range agentConfigs`.

- Prefer agent-native model shorthands (e.g., `sonnet`, `opus`) over
  versioned IDs in docs and examples (PR #1177 review: "why don't we
  use a shorthand?"). The original docs showed only
  `claude-sonnet-4-20250514`; the reviewer asked for shorthand, and
  the agent revised to show `sonnet` alongside versioned IDs.

The new bullets are added to AGENTS.md (CLAUDE.md is a symlink) and
the inline agentsMD copies in self-development/agentconfig.yaml and
self-development/kelos-workers.yaml, matching the existing pattern.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
@kelos-bot kelos-bot Bot changed the title self-development: add CRD API-design conventions from PR #1154 review self-development: add CRD, e2e, and model-shorthand conventions from recent reviews May 21, 2026
… conventions

Add three conventions backed by review feedback on PRs from the past 7 days:

- "Wire new agent runtime images through the full stack in one PR" — from
  the /kelos api-review on PR #1194 (Antigravity CLI). The PR added an image
  directory + kind-load step but did not extend the Task.spec.type enum,
  JobBuilder, credential env-var mapping, agent-image-interface.md, or e2e
  coverage, so `type: antigravity` is rejected at admission.
- "Pin third-party CLI installs in agent-image Dockerfiles" — from cubic
  (P2) and kelos-bot (P3) findings on PR #1194's antigravity/Dockerfile:41
  unversioned `curl install.sh | bash`. The sibling images already use
  ARG XXX_VERSION + pinned URLs.
- "Propagate deferred Close()/Flush() errors via a named return" — from
  the cubic (P2) and kelos-bot (P2) findings on PR #1189
  internal/capture/usage.go:46 silent `defer bw.Flush()` error drop.
@kelos-bot kelos-bot Bot changed the title self-development: add CRD, e2e, and model-shorthand conventions from recent reviews self-development: add CRD, e2e, agent-image, and Dockerfile-pin conventions from recent reviews May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant