API: Add Workspace.spec.setupCommand for pre-agent setup #1051

@gjkim42

Problem

Every Kelos Task starts from a cold workspace: a fresh git clone --depth 1 into an EmptyDir volume. There is no mechanism to run repo-specific setup steps (e.g., npm ci, pip install -r requirements.txt, go mod download, code generation) before the agent starts.

As a result, the agent spends tokens and wall-clock time issuing those commands and waiting on them in every task. For a TaskSpawner processing many issues, the overhead compounds.

Custom agent images cover the system-package side, but cannot help with installs that depend on the cloned repo's lockfile, since the repo is not in the image.

Proposal

Add a single optional field to WorkspaceSpec, shaped to match Kubernetes' container.command convention (exec form, []string, no implicit shell wrapping):

// SetupCommand is executed in the agent container after the workspace
// is cloned and before the agent process starts. It runs in /workspace/repo
// as the agent UID. A non-zero exit fails the Task.
//
// Follows the same exec-form convention as Kubernetes container.command:
// the slice is passed directly to exec, with no shell interpretation.
// Use ["sh", "-c", "<script>"] for shell pipelines.
// +optional
SetupCommand []string `json:"setupCommand,omitempty"`

Wiring:

  • The controller passes the value to the agent container via a KELOS_SETUP_COMMAND env var (JSON-encoded array, since env vars are strings).
  • Each agent's kelos_entrypoint.sh (claude-code, gemini, cursor, codex, opencode) decodes KELOS_SETUP_COMMAND and runs the resulting argv directly, with no shell wrapping, before invoking the agent. The entrypoint waits for the command to finish; a non-zero exit aborts the entrypoint and the Task transitions to Failed.
  • Setup output is emitted to the container log with a clear banner so users can distinguish setup failures from agent failures.
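The wiring above could look roughly like the following Go sketch. It is illustrative only: the real entrypoints are shell scripts, and encodeSetupCommand, runSetupCommand, and the banner text are hypothetical names chosen for this example. Only the KELOS_SETUP_COMMAND variable name and the JSON-array encoding come from the proposal.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// setupCommandEnv is the env var name from the proposal.
const setupCommandEnv = "KELOS_SETUP_COMMAND"

// encodeSetupCommand is a hypothetical controller-side helper. It returns the
// JSON-encoded argv, or "" when no setup command is configured.
func encodeSetupCommand(argv []string) (string, error) {
	if len(argv) == 0 {
		return "", nil
	}
	b, err := json.Marshal(argv)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// runSetupCommand is a hypothetical stand-in for the entrypoint step. It
// decodes the argv, runs it in dir without any shell wrapping, and returns
// the exit code of the command.
func runSetupCommand(encoded, dir string) (int, error) {
	if encoded == "" {
		return 0, nil // field unset: nothing to run
	}
	var argv []string
	if err := json.Unmarshal([]byte(encoded), &argv); err != nil {
		return -1, fmt.Errorf("decode %s: %w", setupCommandEnv, err)
	}
	if len(argv) == 0 {
		return 0, nil
	}
	// Banner (assumed wording) so setup output is easy to find in the log.
	fmt.Fprintln(os.Stderr, "=== kelos: running setupCommand ===")
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.Dir = dir
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	err := cmd.Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil // command ran and exited non-zero
	}
	if err != nil {
		return -1, err // command could not be started at all
	}
	return 0, nil
}

func main() {
	encoded, err := encodeSetupCommand([]string{"sh", "-c", "echo setup ok"})
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
	code, err := runSetupCommand(encoded, ".")
	fmt.Println(code, err)
}
```

In this sketch a non-zero exit code is what the caller would translate into a Failed Task, while a start-up error (for example, a missing binary) is reported separately.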

Examples:

apiVersion: kelos.dev/v1alpha1
kind: Workspace
metadata:
  name: my-app
spec:
  repo: https://github.com/myorg/my-app.git
  secretRef:
    name: github-token
  # Most users want shell features (&&, pipes, env expansion).
  # Use sh -c, exactly like you would for container.command.
  setupCommand: ["sh", "-c", "npm ci --prefer-offline"]

# Multi-line script via a block scalar:
spec:
  setupCommand:
    - sh
    - -c
    - |
      npm ci --prefer-offline
      pip install -r requirements.txt

spec:
  # Single binary, no shell features needed:
  setupCommand: ["go", "mod", "download"]

API shape rationale

  • List, not string. Matches Kubernetes' container.command, lifecycle.postStart.exec.command, and similar fields. Kelos's primary audience writes Kubernetes manifests; the muscle memory of ["sh", "-c", "..."] is more valuable than the one-line ergonomics of a shell-form string.
  • No implicit sh -c wrapping. Match K8s exec semantics exactly. Auto-wrapping would create a foreign idiom that looks like K8s but behaves differently.
  • No companion setupArgs. K8s splits command and args because container images have an entrypoint to override. There's nothing analogous here.
  • Docs must lead with the ["sh", "-c", "..."] form since that is what almost all real-world setup needs.
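To make the "no implicit sh -c wrapping" point concrete, here is a small Go sketch of the difference between exec form and an explicit shell. It uses only the standard os/exec package; the run helper is a hypothetical name for this example, and it assumes echo and sh are available on PATH.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// run executes argv directly (exec form) and returns its trimmed stdout.
func run(argv ...string) (string, error) {
	out, err := exec.Command(argv[0], argv[1:]...).Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	// Exec form: "&&" is just another argument passed to echo.
	literal, err := run("echo", "one", "&&", "echo", "two")
	fmt.Printf("%q %v\n", literal, err)

	// Explicit shell: sh interprets "&&" and runs two commands.
	shell, err := run("sh", "-c", "echo one && echo two")
	fmt.Printf("%q %v\n", shell, err)
}
```

The first call prints the operator literally, which is the behavior a setupCommand without sh -c would have under this proposal.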

Scope and non-goals

  • One field. No per-step images, no separate init containers.
  • No new volume types, no PVC mounts, no admission policy. Shared dependency caches are out of scope; the contributor experience on #774 (API: Add setup containers and volume strategy to Workspace for dependency caching and pre-agent initialization) confirmed that root filesystems are not shared between containers, so PVC-based caching does not pay off.
  • The agent image is expected to contain the toolchain (npm, pip, sh, etc.). This is already the convention for plugins and MCP.
  • This is a Workspace field, not a Task field, because setup is a property of the repository — every task on the same workspace needs the same setup. Task-level override was considered but deferred: it can be added additively later if a real use case appears, while the reverse migration (moving setup up from Task to Workspace once duplicated across many TaskSpawners) is harder.

Alternatives considered

  • Single-string shell form (setupCommand string). More ergonomic for the common case (setupCommand: npm ci), and matches Dockerfile RUN / GitHub Actions run: conventions. Rejected because Kelos's audience is K8s-native, and consistency with container.command was judged more valuable than ergonomic parity with CI tools.
  • Custom agent image only. Insufficient for repo-dependent installs (the repo is not in the image).
  • Workspace.spec.files[] + Claude SessionStart hook. Works today for Claude Code, but is agent-specific and requires repo-level Claude config. setupCommand is agent-agnostic.
  • Init container with a separate image. Rejected on #774 (API: Add setup containers and volume strategy to Workspace for dependency caching and pre-agent initialization): the root filesystem is not shared between containers, so the toolchain still has to live in the agent image.
  • Opening PodOverrides to allow injected init containers. Wider security and maintenance surface than this issue justifies, and it is a Task-level field that would duplicate per-task.
  • Task-level setupCommand. Considered. Deferred in favor of Workspace-level for DRY across tasks; revisit if real demand for per-task setup variation appears.

Backward compatibility

Optional field with zero-value default. Existing Workspaces are unaffected.

/kind feature
