Stage 7 — Broker server: implement the missing 'broker' so developers stop holding AWS daemon keys #58

@hanwencheng

Summary

Stage 7 currently specifies the OIDC-federation primitive (docs/stage7-wip.md, wiki/oidc-federation.md) but omits one component shown in the architecture diagram: the broker server itself. As a result, every developer running the demo loop holds the long-lived agentkeys-daemon AWS access key in their shell. There is no actual broker, just a stub that signs JWTs.

This issue tracks implementing the broker server end-to-end and committing to the three-role model the architecture has implicitly assumed all along (app developer / app owner-operator / user). When this lands, only one role (the operator) ever sees AWS credentials, and the same server code runs locally for development and hosted for production with no client-side change.

Why this is needed

Pulled from the Stage 6 dev-setup conversation in [PR #TBD] (working branch):

  • A new developer joining the team currently has to be issued the operator's daemon AWS keys to run the demo loop. That violates the broker-not-proxy + clients-hold-only-bearer-tokens invariants (wiki/Home.md rules 2 + 3).
  • Multi-machine development (e.g., a second Linux box) currently requires copying the daemon key to a second machine, even though the AWS infrastructure is singleton (docs/spec/ses-email-architecture.md §6.5).
  • The future hosted backend is described abstractly throughout the docs but has no concrete stand-in. Without a broker server, the "hosted backend" stays undefined, and every design discussion has to work around that gap.
  • The existing services/oidc-stub/ only mints OIDC JWTs. It does not do the sts:AssumeRoleWithWebIdentity exchange, does not hold daemon AWS credentials, and does not expose a client-facing API for daemons to fetch temp creds. It is half of Stage 7; this issue implements the other half and integrates the two.

What the broker server is

A long-running HTTP service that:

  1. Authenticates daemons via the existing bearer-token / session-token flow (wiki/session-token.md).
  2. Mints scoped OIDC JWTs with agentkeys_user_wallet claim — replaces the stub's /internal/sign, but with real authorization checks (session valid, grant present, scope matches) instead of accepting arbitrary claims.
  3. Performs sts:AssumeRoleWithWebIdentity against the singleton agentkeys-agent IAM role using the JWT it just signed. Returns the temp AWS creds (≤1h) to the authenticated daemon.
  4. Holds the long-lived agentkeys-daemon access key as operator-side configuration. The key never leaves the broker. No client (developer laptop, daemon sandbox) ever touches it.
  5. Emits audit records for every credential mint — interim SQLite for v0.1, on-chain extrinsic in v0.2+ (wiki/serve-and-audit.md).
  6. Serves the OIDC discovery + JWKS endpoints absorbed from services/oidc-stub/.
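
The authorization checks in step 2 and the lifetime cap in step 3 can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the broker's actual code: `Session`, `MintClaims`, `Deny`, and `authorize_mint` are hypothetical names, the scope strings are placeholders, and JWT signing and the STS call are left out. The point it illustrates is that the `agentkeys_user_wallet` claim is copied from the authenticated session, never from the request body.

```rust
use std::collections::HashSet;
use std::time::{Duration, SystemTime};

/// Hypothetical view of a daemon session as the broker might store it.
pub struct Session {
    pub user_wallet: String,
    pub expires_at: SystemTime,
    /// Scopes the user has granted to this daemon (placeholder strings).
    pub granted_scopes: HashSet<String>,
}

/// Claims the broker would place in the OIDC JWT once every check passes.
#[derive(Debug, PartialEq)]
pub struct MintClaims {
    pub agentkeys_user_wallet: String,
    pub scope: String,
    pub ttl: Duration,
}

#[derive(Debug, PartialEq)]
pub enum Deny {
    SessionExpired,
    NoGrants,
    ScopeNotGranted,
}

/// Upper bound on credential lifetime; the issue caps temp creds at 1h.
pub const MAX_TTL: Duration = Duration::from_secs(3600);

/// Runs the three checks (session valid, grant present, scope matches)
/// and clamps the requested lifetime to MAX_TTL.
pub fn authorize_mint(
    session: &Session,
    requested_scope: &str,
    requested_ttl: Duration,
    now: SystemTime,
) -> Result<MintClaims, Deny> {
    if now >= session.expires_at {
        return Err(Deny::SessionExpired);
    }
    if session.granted_scopes.is_empty() {
        return Err(Deny::NoGrants);
    }
    if !session.granted_scopes.contains(requested_scope) {
        return Err(Deny::ScopeNotGranted);
    }
    Ok(MintClaims {
        // Taken from the session, so a caller cannot choose the wallet.
        agentkeys_user_wallet: session.user_wallet.clone(),
        scope: requested_scope.to_string(),
        ttl: requested_ttl.min(MAX_TTL),
    })
}
```

Keeping the decision free of I/O would let unit tests cover every deny path without a running server or AWS access.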

Same server, two deployment shapes:

| Shape | Who runs it | URL | AWS-creds source | Notes |
| --- | --- | --- | --- | --- |
| Local development | The operator on their dev box | http://localhost:<port> | Env vars DAEMON_ACCESS_KEY_ID / DAEMON_SECRET_ACCESS_KEY (or 1Password CLI shim) | Co-located with mock chain backend; teammates point their daemons at the operator's machine. |
| Hosted (future) | AgentKeys infra | https://broker.agentkeys.dev (or similar) | KMS-sealed, never exposed | Same binary; configuration differs. The TEE-backed Stage 8 evolution wraps this server; this issue does not require TEE. |
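
One way to keep the binary identical across both shapes is to put the key lookup behind a small trait. The sketch below is illustrative only: `KeySource`, `EnvKeySource`, `DaemonKey`, and `ConfigError` are hypothetical names, and the variable names are the BROKER_-prefixed ones from the configuration schema under "Concrete deliverables". A KMS-sealed source for the hosted shape would be a second implementation of the same trait and is not shown.

```rust
use std::collections::HashMap;
use std::fmt;

/// The long-lived daemon key pair. Debug is written by hand so the secret
/// cannot leak through log lines.
pub struct DaemonKey {
    pub access_key_id: String,
    secret_access_key: String,
}

impl DaemonKey {
    pub fn secret(&self) -> &str {
        &self.secret_access_key
    }
}

impl fmt::Debug for DaemonKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DaemonKey")
            .field("access_key_id", &self.access_key_id)
            .field("secret_access_key", &"<redacted>")
            .finish()
    }
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
}

/// Hypothetical seam between the local and hosted deployment shapes.
pub trait KeySource {
    fn load(&self) -> Result<DaemonKey, ConfigError>;
}

/// Local-development shape: reads the key pair from an env-like map.
pub struct EnvKeySource {
    pub vars: HashMap<String, String>,
}

impl EnvKeySource {
    /// Snapshots the process environment, skipping non-UTF-8 entries.
    pub fn from_process_env() -> Self {
        let vars = std::env::vars_os()
            .filter_map(|(k, v)| Some((k.into_string().ok()?, v.into_string().ok()?)))
            .collect();
        Self { vars }
    }
}

impl KeySource for EnvKeySource {
    fn load(&self) -> Result<DaemonKey, ConfigError> {
        // Treat an empty value the same as a missing one.
        let get = |name: &'static str| {
            self.vars
                .get(name)
                .filter(|v| !v.is_empty())
                .cloned()
                .ok_or(ConfigError::Missing(name))
        };
        Ok(DaemonKey {
            access_key_id: get("BROKER_DAEMON_ACCESS_KEY_ID")?,
            secret_access_key: get("BROKER_DAEMON_SECRET_ACCESS_KEY")?,
        })
    }
}
```

Under this design the operator's startup code would call `EnvKeySource::from_process_env().load()` once and fail fast with the name of the missing variable.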

The three roles (canonical, to be added to docs/dev-setup.md)

| Role | What they run | What they hold | What they need from this work |
| --- | --- | --- | --- |
| App developer (building an agent against AgentKeys) | agentkeys-daemon + the agent process | A short-lived bearer token from the operator | Daemon config flag --broker-url to point at any broker (local or hosted); no AWS creds |
| App owner / operator (running the broker) | agentkeys-broker-server (this issue) | Long-lived agentkeys-daemon AWS key (1Password); operator's own master session token | Setup runbook, broker binary, env-var schema, healthcheck |
| End user (using a credential-brokered agent) | agentkeys CLI | 30-day master session token in OS keychain | Nothing new; this issue does not change the user-facing surface |

The dev-setup.md split that the conversation surfaced is what "Stage 7 done" must deliver. Today's dev-setup.md implicitly assumes all three roles are the same person.

Migration story (no client-side change at the boundary)

Day 0 (operator's first dev box): broker runs at `localhost`, env-var-backed.

Day 1 (a second developer joins): operator publishes broker URL on the local network or via tunnel; second developer's daemon points at it; second developer never holds AWS creds.

Day N (hosted broker available): clients change --broker-url from http://localhost:8090 to https://broker.agentkeys.dev; nothing else changes. The S3 environment variables that today litter scripts/stage6-demo-env.sh disappear from client-side completely.
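
Because the broker URL is the only thing that changes between Day 0 and Day N, the daemon-side lookup can stay very small. The following is a rough sketch, assuming the `--broker-url` flag takes precedence over the `AGENTKEYS_BROKER_URL` environment variable; the issue leaves that precedence to be documented, so treat the ordering as an assumption. `resolve_broker_url` and `UrlError` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum UrlError {
    /// Neither the flag nor the environment variable carried a value.
    NotConfigured,
    /// The value did not start with http:// or https://.
    BadScheme(String),
}

/// Trims a candidate value and discards it if nothing is left.
fn clean(v: Option<&str>) -> Option<&str> {
    v.map(str::trim).filter(|s| !s.is_empty())
}

/// Picks the broker URL. Assumed precedence: CLI flag first, then env var.
/// Plain http is accepted because the local shape runs on localhost or a LAN.
pub fn resolve_broker_url(flag: Option<&str>, env: Option<&str>) -> Result<String, UrlError> {
    let raw = clean(flag)
        .or_else(|| clean(env))
        .ok_or(UrlError::NotConfigured)?;
    let lower = raw.to_ascii_lowercase();
    if !(lower.starts_with("http://") || lower.starts_with("https://")) {
        return Err(UrlError::BadScheme(raw.to_string()));
    }
    // Drop trailing slashes so later path joins do not produce "//v1/...".
    Ok(raw.trim_end_matches('/').to_string())
}
```

With this shape, moving from the local broker to the hosted one changes only the string passed in; no other client-side setting is consulted.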

Concrete deliverables

Code

  • New crate crates/agentkeys-broker-server/ — Axum HTTP server reusing the CredentialBackend trait. Initial implementation can wrap agentkeys-mock-server plus the new STS-exchange endpoints; refactor or merge as appropriate.
  • Configuration schema for the daemon AWS key:
    • BROKER_DAEMON_ACCESS_KEY_ID + BROKER_DAEMON_SECRET_ACCESS_KEY (env vars)
    • Optional op://... 1Password CLI integration for laptop dev
    • Future-hosted: KMS-sealed configuration source (interface only; full implementation is hosted-deploy work)
  • New endpoints:
    • POST /v1/mint-oidc-jwt — requires authenticated daemon session; mints scoped JWT with agentkeys_user_wallet claim
    • POST /v1/mint-aws-creds — convenience: combines JWT mint + STS exchange in one round trip; returns temp creds
    • GET /.well-known/openid-configuration and GET /.well-known/jwks.json — absorbed from services/oidc-stub/
    • GET /healthz, GET /readyz — operator-side smoke checks
  • CLI flag on agentkeys-daemon: --broker-url <url> (replaces / extends the existing --backend flag if appropriate); document precedence vs env var
  • Provisioner-scripts adapter: replace direct aws sts assume-role calls in scripts/stage6-demo-env.sh with a call to the broker
  • Removal of the env-var sourcing model from the developer-facing flow (operator still uses env vars; developers do not)
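
The endpoint list above can be summarized as a small routing table. This is a dependency-free sketch rather than the Axum wiring itself: `Route`, `route`, and `requires_session` are hypothetical names, and marking the discovery, JWKS, and health endpoints as unauthenticated is an assumption (the issue only states that the two mint endpoints require an authenticated daemon session).

```rust
/// One variant per endpoint named in the deliverables list.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Route {
    MintOidcJwt,
    MintAwsCreds,
    OidcDiscovery,
    Jwks,
    Healthz,
    Readyz,
}

impl Route {
    /// Only the mint endpoints are gated on a daemon session here.
    /// Treating the remaining endpoints as public is an assumption.
    pub fn requires_session(self) -> bool {
        matches!(self, Route::MintOidcJwt | Route::MintAwsCreds)
    }
}

/// Maps an HTTP method and path to a route; anything else is a 404.
pub fn route(method: &str, path: &str) -> Option<Route> {
    match (method, path) {
        ("POST", "/v1/mint-oidc-jwt") => Some(Route::MintOidcJwt),
        ("POST", "/v1/mint-aws-creds") => Some(Route::MintAwsCreds),
        ("GET", "/.well-known/openid-configuration") => Some(Route::OidcDiscovery),
        ("GET", "/.well-known/jwks.json") => Some(Route::Jwks),
        ("GET", "/healthz") => Some(Route::Healthz),
        ("GET", "/readyz") => Some(Route::Readyz),
        _ => None,
    }
}
```

In the real crate these paths would presumably be registered on an Axum `Router`; the plain `match` is used here only so the sketch compiles without external crates.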

Docs

  • docs/dev-setup.md restructured around the three roles. Section per role, with: prerequisites, what to install, how to start the role's loop, what success looks like. The operator section absorbs what is currently §3 (Stage 6 AWS setup) + §4 demo. The app-developer section is brand new — they do not run AWS at all. The user section is minimal — agentkeys init + an example.
  • docs/stage6-aws-setup.md revised:
    • Mark §8 "Hand-back to Claude" as obsolete (it now goes into broker config, not into operator's shell)
    • Update scripts/stage6-demo-env.sh references: the broker takes over the AssumeRole step, and the demo env script becomes broker config
    • Add a "What changes when broker server lands" section explicitly: what was env-var-on-laptop is now config-of-broker
  • docs/stage7-wip.md updated to reference the new broker crate as the canonical Stage 7 implementation; OIDC-stub becomes a documented testing convenience or is retired
  • New docs/operator-runbook.md (or similar) — covers starting + supervising the broker, rotating the daemon AWS key, monitoring audit, and the local-to-hosted migration path
  • harness/features.json extended with Stage 7 deliverables; harness/stage-7-done.sh authored

Tests

  • Broker unit tests (cargo test -p agentkeys-broker-server)
  • End-to-end test: operator starts broker on localhost:8091; app-developer's daemon is configured with --broker-url http://localhost:8091; daemon successfully provisions OpenRouter via the existing scraper path without holding any AWS keys directly
  • Negative test: app-developer machine with no AWS creds in env still completes the e2e flow (proves the credential-broker shape)
  • Migration test: same broker code starts with hosted-config (in-memory KMS shim acceptable for the test) and serves the same API
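
The negative test needs a precondition check that the app-developer environment really carries no AWS credentials. A possible helper is sketched below; `leaked_aws_vars` and `FORBIDDEN` are hypothetical names. The list combines the standard AWS SDK/CLI credential variables with the operator-side names from the configuration schema, and it is not exhaustive: a shared credentials file or `AWS_PROFILE` could also supply credentials.

```rust
/// Variable names that must be absent on an app-developer machine.
const FORBIDDEN: &[&str] = &[
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "BROKER_DAEMON_ACCESS_KEY_ID",
    "BROKER_DAEMON_SECRET_ACCESS_KEY",
];

/// Returns the forbidden variable names found in `names`, sorted so a
/// failing assertion prints a stable message.
pub fn leaked_aws_vars<I, K>(names: I) -> Vec<String>
where
    I: IntoIterator<Item = K>,
    K: AsRef<str>,
{
    let mut found: Vec<String> = names
        .into_iter()
        .map(|k| k.as_ref().to_string())
        .filter(|k| FORBIDDEN.iter().any(|f| *f == k.as_str()))
        .collect();
    found.sort();
    found
}
```

An end-to-end test could call it on the live environment before starting the daemon, for example `leaked_aws_vars(std::env::vars_os().filter_map(|(k, _)| k.into_string().ok()))`, and assert that the result is empty.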

Acceptance criteria for "Stage 7 done"

  1. Operator can start agentkeys-broker-server on a fresh laptop with daemon AWS keys in env (or 1Password CLI) and zero other manual setup.
  2. App developer can run the existing OpenRouter demo loop with a single environment variable AGENTKEYS_BROKER_URL=... pointing at the operator's broker. The developer's machine has zero AWS env vars and never invokes aws sts.
  3. End user flow (agentkeys init, agentkeys store, agentkeys read) is unchanged.
  4. docs/dev-setup.md has three top-level role sections, each completable by someone in that role with no prior context.
  5. docs/stage6-aws-setup.md no longer asks anyone except the operator to handle AWS keys.
  6. bash harness/stage-7-done.sh exits 0.

Out of scope (explicitly)

  • TEE integration. The broker server in this issue runs in plaintext on commodity hardware; TEE-backed hosting is the v0.2+ evolution.
  • The Stage 8 off-chain encrypted vault (#57). The broker is the federation layer; the vault is the storage layer. They compose but are independent deliverables.
  • Production hosting of https://broker.agentkeys.dev itself — interface design only; the actual hosting is operator infra work.
  • Migration of existing operator workflows that depend on stage6-demo-env.sh sourcing — handled in the doc revision.

Related

  • #57 — Stage 8 (off-chain vault); pairs with this issue but lands separately
  • #9 — master-seed HDKD; once landed, the broker's local-file ES256 key is replaced by TEE-derived oidc/issuer/v1
  • docs/spec/threat-model-key-custody.md — the overall security position the broker must preserve
  • docs/stage7-wip.md — current Stage 7 scratchpad; this issue absorbs and supersedes it
