
Agent to Agent Handoff Data Structure #1026

@knechtionscoding

Description

Area: New API Extension

Summary

Kelos agents today pass only flat metadata between dependent tasks — branch names, PR URLs, token counts (TaskStatus.Results as map[string]string). When a Linear investigation agent concludes, there is no efficient way to transfer its rich findings (root cause analysis, relevant code paths, reproduction steps) to a downstream solver agent. The solver either re-derives everything from scratch (burning tokens and time) or operates blind.

This proposal adds a standard structured handoff payload to TaskStatus — a versioned JSON format that agents write during execution and downstream agents consume via prompt templates. It is deliberately scoped to the data format and plumbing for inter-task context transfer, not control flow, pipeline management, or dynamic spawning.

Problem

1. Results are metadata, not context

The existing TaskStatus.Results (map[string]string) is designed for machine-readable execution metadata — branch names, commit SHAs, PR URLs, cost, token counts. These are outputs of kelos-capture, not the agent itself. There is no channel for the agent to pass substantive context to a downstream agent: investigation findings, analysis summaries, Slack thread context, or decision rationale.

The prompt template system (internal/controller/task_controller.go:849-886) injects dependency results into downstream prompts:

{{index .Deps "investigate" "Results" "branch"}}

But there is no Results key for "here's what I found and why." Agents would have to stuff freeform text into a flat string map, which has no schema, no size discipline, and no semantic separation between metadata and content.

2. Downstream agents waste tokens re-deriving context

In a pipeline like investigate → fix → open-pr, the fix agent currently receives only coordinates (which branch, which PR). It must re-read the issue, re-analyze the codebase, and re-identify the root cause — duplicating work the investigation agent already completed. This is the primary cost multiplier in multi-stage agent workflows.

With a structured handoff, the investigation agent's summary flows directly into the fix agent's prompt, eliminating redundant analysis.

3. No audit trail for inter-agent communication

When debugging why an agent made a particular decision, operators can inspect TaskStatus.Results for metadata but cannot see what context the agent received from its upstream dependency. A structured handoff stored in TaskStatus provides a persistent, kubectl-inspectable record of exactly what context was transferred between agents.

4. Existing proposals assume Results is sufficient

Several open proposals (#747 conditional dependencies, #792 historyContext, #829 parent-child tasks, #983 TaskPipeline CRD) reference dependency results as the primary inter-task data channel. All of them would benefit from a richer, standardized handoff format — but none of them define one.

Proposed Design

The Handoff Type

Add a new type to api/v1alpha1/task_types.go:

// TaskHandoff is the standard structured payload for agent-to-agent
// context transfer. Versioned for forward compatibility.
type TaskHandoff struct {
    // Version of the handoff schema.
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:default=1
    Version int `json:"version"`

    // Summary is a concise description of what was accomplished or found.
    // Intended to be token-efficient when injected into downstream prompts.
    // +kubebuilder:validation:MaxLength=4096
    Summary string `json:"summary"`

    // Detail contains rich context: findings, analysis, reasoning.
    // Downstream tasks can selectively include this when depth is needed.
    // +optional
    // +kubebuilder:validation:MaxLength=65536
    Detail string `json:"detail,omitempty"`

    // Data contains structured key-value pairs for machine-readable fields.
    // Supports templating in downstream prompts and CEL evaluation in
    // conditional dependencies (#747).
    // +optional
    Data map[string]string `json:"data,omitempty"`
}

Add the field to TaskStatus:

type TaskStatus struct {
    // ... existing fields ...

    // Handoff contains the structured context payload produced by the agent
    // for consumption by downstream dependent tasks.
    // +optional
    Handoff *TaskHandoff `json:"handoff,omitempty"`
}

Why separate from Results

|          | Results | Handoff |
| -------- | ------- | ------- |
| Purpose  | Execution metadata | Agent-to-agent context |
| Producer | kelos-capture (post-run binary) | The agent itself (during execution) |
| Content  | Branch, PR URL, commit SHA, cost, tokens | Investigation findings, analysis, summaries |
| Format   | Flat map[string]string | Versioned struct with summary/detail/data |
| Consumer | Controller (metrics, branch lock), prompt templates | Downstream agent prompts |

Keeping them separate means each can evolve independently. Results is part of the agent image interface; Handoff is part of the agent's output to other agents.

How agents produce handoffs

1. Well-known file path. The agent writes JSON to a path specified by the KELOS_HANDOFF_PATH environment variable (default: /tmp/kelos-handoff.json).

2. kelos-capture emits it. After the agent exits, kelos-capture reads the handoff file (if it exists), validates the schema, and emits it between markers in stdout:

---KELOS_HANDOFF_START---
{"version":1,"summary":"...","detail":"...","data":{"key":"value"}}
---KELOS_HANDOFF_END---

3. Controller parses it. The controller extracts the handoff JSON from pod logs (same mechanism as KELOS_OUTPUTS_START/END) and stores it in TaskStatus.Handoff.

4. No handoff = no error. If the agent doesn't write a handoff file, nothing happens. The field remains nil. This makes handoffs fully opt-in with zero impact on existing tasks.
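The marker extraction in step 3 can be sketched as follows. This is illustrative rather than actual controller code: `parseHandoff` is a hypothetical name, and the `TaskHandoff` mirror here only assumes the fields and marker strings shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const (
	handoffStart = "---KELOS_HANDOFF_START---"
	handoffEnd   = "---KELOS_HANDOFF_END---"
)

// TaskHandoff mirrors the proposed API type.
type TaskHandoff struct {
	Version int               `json:"version"`
	Summary string            `json:"summary"`
	Detail  string            `json:"detail,omitempty"`
	Data    map[string]string `json:"data,omitempty"`
}

// parseHandoff scans pod logs for the marker pair and unmarshals the
// JSON between them. It returns (nil, nil) when no handoff was emitted,
// matching the opt-in behavior described in step 4.
func parseHandoff(logs string) (*TaskHandoff, error) {
	start := strings.Index(logs, handoffStart)
	if start < 0 {
		return nil, nil // agent wrote no handoff: field stays nil
	}
	rest := logs[start+len(handoffStart):]
	end := strings.Index(rest, handoffEnd)
	if end < 0 {
		return nil, fmt.Errorf("handoff start marker without end marker")
	}
	var h TaskHandoff
	if err := json.Unmarshal([]byte(strings.TrimSpace(rest[:end])), &h); err != nil {
		return nil, fmt.Errorf("malformed handoff JSON: %w", err)
	}
	return &h, nil
}

func main() {
	logs := "agent output...\n" + handoffStart + "\n" +
		`{"version":1,"summary":"root cause found","data":{"severity":"critical"}}` +
		"\n" + handoffEnd + "\n"
	h, err := parseHandoff(logs)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Summary, h.Data["severity"])
}
```

A missing start marker is deliberately not an error here, only a missing end marker or malformed JSON is, which keeps the "no handoff = no error" contract.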

How downstream tasks consume handoffs

The existing .Deps template context is extended to include Handoff:

deps[depName] = map[string]interface{}{
    "Outputs": depTask.Status.Outputs,
    "Results": depTask.Status.Results,
    "Handoff": depTask.Status.Handoff,   // NEW
    "Name":    depName,
}

Downstream prompts access handoff fields via Go templates:

prompt: |
  ## Investigation Summary
  {{index .Deps "investigate" "Handoff" "Summary"}}

  ## Detailed Findings
  {{index .Deps "investigate" "Handoff" "Detail"}}

  ## Specific Data
  Root cause file: {{index .Deps "investigate" "Handoff" "Data" "root_cause_file"}}

  Fix the issue described above on branch {{index .Deps "investigate" "Results" "branch"}}.

The task author controls exactly what gets injected — Summary for token efficiency, Detail when depth is needed, specific Data values for targeted references.
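The resolution mechanics can be illustrated with a self-contained text/template sketch. Note one assumption: the handoff is modeled here as a plain map, because Go's `index` builtin traverses maps but not struct fields, so the controller would presumably expose the handoff in a map-compatible form when building the `.Deps` context. `renderPrompt` is a hypothetical helper, not the controller's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderPrompt resolves a downstream prompt against a stand-in .Deps
// context. The handoff is represented as a map so the chained `index`
// calls from the examples above resolve.
func renderPrompt() (string, error) {
	ctx := map[string]interface{}{
		"Deps": map[string]map[string]interface{}{
			"investigate": {
				"Handoff": map[string]interface{}{
					"Summary": "Root cause: nil session in auth handler.",
					"Data":    map[string]string{"root_cause_file": "pkg/auth/auth.go"},
				},
				"Results": map[string]string{"branch": "fix/lin-423"},
			},
		},
	}
	prompt := `## Investigation Summary
{{index .Deps "investigate" "Handoff" "Summary"}}
Root cause file: {{index .Deps "investigate" "Handoff" "Data" "root_cause_file"}}
Fix the issue on branch {{index .Deps "investigate" "Results" "branch"}}.`

	tmpl, err := template.New("prompt").Parse(prompt)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, ctx); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderPrompt()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```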

Size limits

  • Summary: max 4 KB — forces conciseness, keeps downstream prompts lean
  • Detail: max 64 KB — room for rich context without blowing up etcd (object size limit ~1.5 MB)
  • Data: no dedicated per-field cap; the serialized handoff as a whole is held to the same 64 KB bound as Detail
  • kelos-capture validates and rejects oversized handoffs with a warning log
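The validation kelos-capture might perform can be sketched like this. `validateHandoff` is a hypothetical name, and the byte-length checks simply mirror the kubebuilder MaxLength bounds above, assuming they are enforced as byte counts:

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	maxSummaryBytes = 4 * 1024  // mirrors the Summary MaxLength above
	maxDetailBytes  = 64 * 1024 // mirrors the Detail MaxLength above
)

// TaskHandoff mirrors the proposed API type.
type TaskHandoff struct {
	Version int               `json:"version"`
	Summary string            `json:"summary"`
	Detail  string            `json:"detail,omitempty"`
	Data    map[string]string `json:"data,omitempty"`
}

// validateHandoff enforces the proposed size limits before the payload
// is emitted; oversized handoffs are rejected rather than truncated.
func validateHandoff(h *TaskHandoff) error {
	if h.Version < 1 {
		return fmt.Errorf("version must be >= 1, got %d", h.Version)
	}
	if len(h.Summary) == 0 {
		return fmt.Errorf("summary is required")
	}
	if len(h.Summary) > maxSummaryBytes {
		return fmt.Errorf("summary exceeds %d bytes", maxSummaryBytes)
	}
	if len(h.Detail) > maxDetailBytes {
		return fmt.Errorf("detail exceeds %d bytes", maxDetailBytes)
	}
	return nil
}

func main() {
	raw := []byte(`{"version":1,"summary":"ok"}`)
	var h TaskHandoff
	if err := json.Unmarshal(raw, &h); err != nil {
		panic(err)
	}
	fmt.Println("valid:", validateHandoff(&h) == nil)
}
```

Rejecting rather than truncating keeps the contract simple: the agent either produced a well-formed handoff or the field stays nil, with a warning in the logs either way.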

Agent-side experience

The agent writes a JSON file during execution. No new tools, no MCP server, no special SDK — any agent that can write a file can produce a handoff:

# During agent execution, the agent writes:
cat > "$KELOS_HANDOFF_PATH" <<'EOF'
{
  "version": 1,
  "summary": "Root cause: null pointer in auth.go:142 when session cookie is expired. The middleware skips validation but the handler assumes non-nil session.",
  "detail": "Full stack trace:\n  auth.go:142 → session.Validate()\n  middleware.go:89 → next.ServeHTTP()\n\nReproduction:\n  curl -H 'Cookie: session=expired' https://api.example.com/protected\n\nThe fix requires a nil check before accessing session fields in the auth handler.",
  "data": {
    "root_cause_file": "pkg/auth/auth.go",
    "root_cause_line": "142",
    "severity": "critical",
    "issue_key": "LIN-423"
  }
}
EOF

For AI coding agents (Claude Code, Codex, etc.), the prompt can instruct the agent to write this file. The instruction can be part of the task prompt or injected via AgentConfig agentsMD.

Concrete Examples

Example 1: Investigation → Fix pipeline

# Stage 1: Investigate the issue
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: investigate
spec:
  type: claude-code
  credentials:
    type: oauth
    secretRef:
      name: claude-credentials
  workspaceRef:
    name: my-workspace
  prompt: |
    Investigate Linear issue LIN-423: "Users getting 500 errors on login."

    Analyze the codebase, identify the root cause, and find relevant code paths.
    Do NOT fix the issue — only investigate.

    When done, write your findings to $KELOS_HANDOFF_PATH as JSON:
    {
      "version": 1,
      "summary": "<concise root cause and location>",
      "detail": "<full analysis with code paths, stack traces, reproduction steps>",
      "data": {"root_cause_file": "<path>", "severity": "<low|medium|high|critical>"}
    }
---
# Stage 2: Fix the issue (receives investigation context)
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: fix
spec:
  type: claude-code
  credentials:
    type: oauth
    secretRef:
      name: claude-credentials
  workspaceRef:
    name: my-workspace
  branch: fix/lin-423
  dependsOn: [investigate]
  prompt: |
    ## Investigation Summary
    {{index .Deps "investigate" "Handoff" "Summary"}}

    ## Detailed Findings
    {{index .Deps "investigate" "Handoff" "Detail"}}

    Fix the issue described above. Write tests that cover the failure case.
    Commit and push your changes.
---
# Stage 3: Open PR
apiVersion: kelos.dev/v1alpha1
kind: Task
metadata:
  name: open-pr
spec:
  type: claude-code
  credentials:
    type: oauth
    secretRef:
      name: claude-credentials
  workspaceRef:
    name: my-workspace
  branch: fix/lin-423
  dependsOn: [fix]
  prompt: |
    The fix for LIN-423 is ready on branch {{index .Deps "fix" "Results" "branch"}}.

    Investigation context:
    {{index .Deps "investigate" "Handoff" "Summary"}}

    Review the diff and open a pull request with `gh pr create`.
    Reference LIN-423 in the PR description.

The fix agent starts with full context immediately — no re-investigation, no wasted tokens. The PR agent also references the investigation summary for a well-written PR description.

Example 2: Slack-triggered triage with context preservation

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: slack-triage
spec:
  when:
    genericWebhook:
      path: /slack-escalation
  taskTemplate:
    type: claude-code
    credentials:
      type: oauth
      secretRef:
        name: claude-credentials
    workspaceRef:
      name: my-workspace
    promptTemplate: |
      A Slack escalation was received: {{.Body}}

      Triage this issue:
      1. Identify the affected service and severity
      2. Check recent deployments and error logs
      3. Determine if this is a known issue

      Write your triage findings to $KELOS_HANDOFF_PATH so the resolver
      agent has full context without re-reading the Slack thread.

The downstream resolver task (via dependsOn or a future TaskPipeline) gets the triage context without needing to re-fetch and re-parse the Slack thread.

Relationship to Existing Proposals

| Issue | What it does | How handoffs interact |
| ----- | ------------ | --------------------- |
| #792 (historyContext) | Injects prior task outcomes from the same spawner into prompts | Handoff summaries from historical tasks could be included via includeKeys: [handoff_summary]. Different axes: #792 is temporal (across runs), handoffs are spatial (across pipeline stages). |
| #747 (conditional deps) | CEL-based routing on dependency results | Handoff Data fields become available for CEL evaluation: handoff.data["severity"] == "critical". Richer routing signals than flat Results alone. |
| #829 (parent-child tasks) | Agent-initiated dynamic task spawning via MCP | Parent agent reads child task handoffs via get_task_status. Child tasks inherit the parent's handoff context. The handoff format standardizes what flows in both directions. |
| #983 (TaskPipeline CRD) | First-class pipeline with stages, matrix fan-out | Handoff becomes the inter-stage data format. Pipeline stages access upstream handoffs via {{.Stages}} templates. Matrix stages could aggregate handoffs from parallel tasks. |

This proposal does not depend on any of these issues and does not block them. It is a standalone, additive primitive that each of them benefits from.

Files to Change

| File | Change |
| ---- | ------ |
| api/v1alpha1/task_types.go | Add TaskHandoff struct, add Handoff *TaskHandoff to TaskStatus |
| internal/controller/output_parser.go | Add ParseHandoff() for KELOS_HANDOFF_START/END markers |
| internal/controller/output_parser_test.go | Tests for handoff parsing, size validation, malformed JSON |
| internal/controller/task_controller.go | Parse handoff from pod logs, store in status; inject into .Deps template context; set KELOS_HANDOFF_PATH env var |
| internal/controller/task_controller_test.go | Tests for handoff in template resolution, env var injection |
| cmd/kelos-capture/main.go | Read /tmp/kelos-handoff.json, validate, emit between markers |
| docs/agent-image-interface.md | Document handoff file contract, env var, markers |
| examples/07-task-pipeline/pipeline.yaml | Update example to demonstrate handoff usage |

Estimated: ~60 lines of types + ~50 lines of parser + ~30 lines of controller wiring + tests.

Backward Compatibility

  • Purely additive: New optional field on TaskStatus, no changes to existing behavior
  • Zero config for existing users: Tasks that don't write a handoff file are completely unaffected
  • Existing Results/Outputs unchanged: kelos-capture continues emitting KELOS_OUTPUTS_START/END as before; handoff markers are separate
  • Safe template fallback: If {{.Deps.X.Handoff.Summary}} is referenced but no handoff exists, the template renders empty (Go template zero-value behavior)
  • No new CRDs: Extends existing TaskStatus within the Task resource
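The safe template fallback can be demonstrated with a minimal text/template program, assuming (as in the consumption examples above) that handoff fields surface as map lookups in the template context: a map read with a missing key yields the element type's zero value, which for strings renders as nothing. `renderWith` is a hypothetical helper for the demo.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// renderWith executes a small prompt fragment against a handoff map,
// returning what a downstream prompt would contain. Brackets make the
// empty fallback visible.
func renderWith(handoff map[string]string) string {
	tmpl := template.Must(template.New("p").Parse(
		`Summary: [{{index . "Summary"}}]`))
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, handoff); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Println(renderWith(map[string]string{"Summary": "found it"}))
	// A dependency that produced no handoff: the key is absent and the
	// template renders the string zero value, i.e. empty.
	fmt.Println(renderWith(map[string]string{}))
}
```

One caveat worth noting: a nil *TaskHandoff pointer does not get this fallback for free with chained index calls, so the controller would likely need to guard with {{with ...}} or normalize a nil handoff to an empty map to preserve the safe-fallback property.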

/kind feature
