self-development: add CRD, e2e, agent-image, and Dockerfile-pin conventions from recent reviews#1162
Open
kelos-bot[bot] wants to merge 3 commits into
Open
self-development: add CRD, e2e, agent-image, and Dockerfile-pin conventions from recent reviews#1162kelos-bot[bot] wants to merge 3 commits into
kelos-bot[bot] wants to merge 3 commits into
Conversation
Three new conventions drawn from the Kelos API Reviewer's blocking findings on #1154 (Support env.valueFrom in MCP server config): - CRD admission validation must match runtime checks. When embedding upstream Kubernetes types (e.g. corev1.EnvVar) but honoring only a subset of variants at reconcile time, restrict via CEL (+kubebuilder:validation:XValidation) or a narrowed Kelos type so admission rejects unsupported inputs at kubectl apply. - Mirror upstream Kubernetes semantics when reusing upstream types. E.g. EnvVar with valueFrom.secretKeyRef.optional: true and a missing key should skip the variable (kubelet behavior), not emit "". - Breaking CRD storage-shape changes (map -> list, etc.) require an explicit upgrade note in release-note covering re-application of existing in-cluster resources, not just the new YAML shape. Applied to AGENTS.md (with CLAUDE.md symlink), and the inline agentsMD copies in self-development/agentconfig.yaml and self-development/kelos-workers.yaml. Co-Authored-By: Claude Opus 4.7 <[email protected]>
…ions Add two agent conventions from recent review feedback: - Parameterize e2e tests across all supported agent types (PR #1175 review: "this should not be a codex specific e2e test"). The agent initially wrote a codex-only test for an AgentConfig with a plugin; the reviewer asked for it to run across all agent types, which the agent then fixed by wrapping the Describe block in a function invoked from `for _, cfg := range agentConfigs`. - Prefer agent-native model shorthands (e.g., `sonnet`, `opus`) over versioned IDs in docs and examples (PR #1177 review: "why don't we use a shorthand?"). The original docs showed only `claude-sonnet-4-20250514`; the reviewer asked for shorthand, and the agent revised to show `sonnet` alongside versioned IDs. The new bullets are added to AGENTS.md (CLAUDE.md is a symlink) and the inline agentsMD copies in self-development/agentconfig.yaml and self-development/kelos-workers.yaml, matching the existing pattern. Co-Authored-By: Claude Opus 4.7 <[email protected]>
… conventions Add three conventions backed by review feedback on PRs from the past 7 days: - "Wire new agent runtime images through the full stack in one PR" — from the /kelos api-review on PR #1194 (Antigravity CLI). The PR added an image directory + kind-load step but did not extend the Task.spec.type enum, JobBuilder, credential env-var mapping, agent-image-interface.md, or e2e coverage, so `type: antigravity` is rejected at admission. - "Pin third-party CLI installs in agent-image Dockerfiles" — from cubic (P2) and kelos-bot (P3) findings on PR #1194's antigravity/Dockerfile:41 unversioned `curl install.sh | bash`. The sibling images already use ARG XXX_VERSION + pinned URLs. - "Propagate deferred Close()/Flush() errors via a named return" — from the cubic (P2) and kelos-bot (P2) findings on PR #1189 internal/capture/usage.go:46 silent `defer bw.Flush()` error drop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 Kelos Agent @gjkim42
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Adds eight new agent conventions to
AGENTS.md(withCLAUDE.mdsymlink) and the inlineagentsMDcopies inself-development/agentconfig.yamlandself-development/kelos-workers.yaml. Each rule is backed by specific review findings on kelos-bot–generated PRs from the past 7 days.Original motivating PRs (initial commit):
PR #1154 (
feat: Support env.valueFrom in MCP server config). The Kelos API Reviewer flagged three independent issues:CRD admission validation must match runtime checks. The PR's controller rejects
valueFrom.fieldRef/resourceFieldRef/fileKeyRefat reconcile time, but the generated CRD (which embeds the fullcorev1.EnvVarschema) lets admission accept them. Reviewer's verdict: "once a CRD ships, what its schema accepts is effectively part of the contract — please tighten this at admission rather than at reconcile", suggesting+kubebuilder:validation:XValidationCEL rules or a narrowed Kelos-specific type. Cubic flagged a closely related issue: "valueFromis not validated as exclusive, so unsupportedfieldRef/resourceFieldRefcan slip through". ➜ Codified.Mirror upstream Kubernetes semantics when reusing upstream types. The PR's resolver emits an entry with empty
ValuewhenvalueFrom.secretKeyRef.optional: trueand the secret/key is missing. Reviewer pointed out that kubelet, for a podEnvVarwithOptional=trueand a missing key, does not set the variable on the container — surfacingFOO=""is "a non-obvious foot-gun". ➜ Codified.Breaking CRD storage-shape changes need explicit upgrade notes in
release-note. The PR changesmcpServers[].envfrommap[string]stringto[]corev1.EnvVar(object → array). Reviewer: "This is a hard break for stored v1alpha1 AgentConfig objects… but the release-note only mentions the YAML shape change. Suggest extending the release note (or adding an upgrade note) to call out that existing in-cluster AgentConfig objects withmcpServers[].envset need to be re-applied after the upgrade." ➜ Codified.PR #1175 (
Drop removed --enable skills flag from codex entrypoint). Reviewer feedback from gjkim42: "this should not be a codex specific e2e test" (ontest/e2e/skills_test.go). The agent initially added a codex-only e2e test for a Task with an AgentConfig that has a plugin; after the request, the agent rewrote it to parameterize acrossagentConfigs. ➜ Codified as a convention to always parameterize e2e tests for shared agent functionality across all supported agent types.PR #1177 (
docs: refresh model ID examples and document shorthand format). Reviewer feedback from gjkim42: "why don't we use a shorthand?". The original docs showed only versioned IDs likeclaude-sonnet-4-20250514; the agent revised to show shorthand (sonnet,opus) alongside the versioned form. ➜ Codified as a convention to prefer agent-native model shorthands in docs/examples.Additional motivating PRs (follow-up commit):
PR #1194 (
Wire Antigravity CLI as a first-class agent type— OPEN). The PR adds anantigravity/image directory plus a kind-load step but does not extend theTask.spec.type/TaskTemplate.type+kubebuilder:validation:Enum,JobBuilder(internal/controller/job_builder.go), theapiKeyEnvVar/oauthEnvVarcredential mapping,docs/agent-image-interface.md, or any e2e exercising the new type. The Kelos API Reviewer flagged: "the new agent is not reachable through the existing CRD API — that itself is an API gap worth resolving (or explicitly deferring) before merge." The same PR also drew two reproducibility findings:antigravity/Dockerfile:41: "Theagyinstall step is unpinned, so rebuilds can silently pick up new upstream versions and produce non-reproducible images."bash <(curl https://antigravity.google/cli/install.sh)fetches the latestagybuild with no version arg and no checksum verification, so two rebuilds of the same Dockerfile commit can yield differentagybinaries… matching theGEMINI_CLI_VERSION/OPENCODE_VERSION/CURSOR_CLI_VERSIONpattern in the sibling images."➜ Codified two conventions: end-to-end wiring required when adding a new agent runtime image, and pinning third-party CLI installs in agent-image Dockerfiles.
PR #1189 (
Stream agent output through kelos-capture instead of writing to /tmp— OPEN). Both cubic [P2] and the Kelos reviewer [P2] flagged the same issue oninternal/capture/usage.go:46: "Final flush error gets dropped. Buffered write can fail at EOF and function still returns success. Check and returnbw.Flush()error." / "defer bw.Flush()discards the flush error… if it fails (e.g., transient stdout/pod-log writer failure),StreamUsagereturnsniland the trailing buffered bytes silently disappear from pod logs. Capture the flush error explicitly so it surfaces, e.g. via a named return anddefer func() { if ferr := bw.Flush(); err == nil { err = ferr } }()." ➜ Codified as a convention to propagate deferredClose()/Flush()errors via a named return.Which issue(s) this PR is related to:
N/A
Special notes for your reviewer:
AGENTS.mdis a real file;CLAUDE.mdis a symlink to it, so a single edit covers both surfaces.agentsMDinself-development/agentconfig.yamlandself-development/kelos-workers.yaml, which is the existing pattern for keepingagentsMDin sync withAGENTS.md.Does this PR introduce a user-facing change?
🤖 Generated with Claude Code