feat(provider): claudecode provider for Claude Max/Pro subscriptions #64

Draft
adelin-b wants to merge 12 commits into m0n0x41d:main from adelin-b:feat/claude-code-provider

Conversation

@adelin-b

Summary

Adds a claudecode LLM provider that wraps the claude CLI as a subprocess. Lets Pro/Max subscribers run haft agent without ANTHROPIC_API_KEY — auth is delegated entirely to the CLI.

  • New provider + tests in internal/provider/claudecode.go / claudecode_test.go
  • Factory dispatch + longest-prefix-match fix (claude-code was shadowed by claude-)
  • Registry entries: claude-code, claude-code:opus, claude-code:sonnet, claude-code:haiku
  • haft doctor surfaces the CLI presence + version
  • Docs at docs/claude-code-provider.md

Why subprocess and not the SDK

haft is Go. The Vercel AI SDK's ai-sdk-provider-claude-code is TypeScript-only, and the Claude Agent SDK itself only ships Python + TS bindings. Subprocess wrapping is the only path that reuses Claude Code's subscription auth without taking a non-Go dep (or building a CGO bridge to Bun/Node).

Scope (intentionally narrow)

In this PR:

  • Flatten haft messages → (system_prompt, user_prompt) pair
  • Pipe to claude -p --output-format stream-json --verbose --allowed-tools ''
  • Parse NDJSON, stream text deltas, return assistant Message
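A minimal sketch of the flattening step described above. The `Message` type, its fields, and the `[role]` block labels are hypothetical stand-ins; haft's real message types and rendering may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified, hypothetical stand-in for haft's message type.
type Message struct {
	Role string // "system", "user", or "assistant"
	Text string
}

// flattenConversation merges all system messages into one system prompt and
// renders the remaining turns as labeled blocks in a single user prompt.
// Turns whose text is empty after trimming are skipped.
func flattenConversation(msgs []Message) (system, user string) {
	var sys, body []string
	for _, m := range msgs {
		text := strings.TrimSpace(m.Text)
		if text == "" {
			continue // empty-turn skip
		}
		if m.Role == "system" {
			sys = append(sys, text)
			continue
		}
		// The "[role]" label format is an assumption made for this sketch.
		body = append(body, "["+m.Role+"]\n"+text)
	}
	return strings.Join(sys, "\n\n"), strings.Join(body, "\n\n")
}

func main() {
	system, user := flattenConversation([]Message{
		{Role: "system", Text: "You are a haft agent."},
		{Role: "user", Text: "hello"},
		{Role: "assistant", Text: "   "},
		{Role: "user", Text: "are you there?"},
	})
	fmt.Printf("system: %q\nuser: %q\n", system, user)
}
```

The resulting pair would then be handed to the CLI: the system half as the system prompt, the user half on stdin.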

Out of scope (follow-ups):

  • Tool-use: haft's ToolSchema isn't translated to the CLI's MCP surface yet, so the model emits text only. Documented as a clear limitation — tool-driven flows should keep using the anthropic/openai providers. The right next step is `--mcp-config` pointing at a local haft MCP server so `haft_note`/`haft_problem`/etc. are callable.
  • Session reuse via --resume to amortize per-turn startup cost.
  • Propagating token counts from the result event.

I'd rather ship the scaffold small and iterate than bundle everything and risk a wholesale reject.

Design notes

  • Stdin pipe for the prompt (avoids argv size limits on long conversations).
  • --allowed-tools '' disables CLI built-ins so the model cannot write files behind the user's back while haft is supposed to own the agent surface.
  • --no-session-persistence keeps each turn ephemeral (matches existing Anthropic/OpenAI providers' stateless shape).
  • ModelID() reports the haft-facing id (claude-code or claude-code:<sub>), not a real Anthropic model name — avoids confusing the registry.

Tests

ok  github.com/m0n0x41d/haft/internal/provider    8.031s

Unit tests cover:

  • flattenConversation — system merging, labeled body blocks, empty-turn skip
  • renderParts — text, tool_call, tool_result (incl. error variant)
  • parseClaudeStream — text delta extraction, result event (success / error), malformed line skip
  • guessProviderFromPrefix — regression: claude-code* routes to claudecode, not anthropic

Not yet in CI: end-to-end subprocess invocation against the real claude binary. Open to adding a -tags integration suite gated on claude being on PATH if you want it.

Test plan

  • `go test ./internal/provider/...` (clean)
  • `go build ./internal/provider/... ./internal/config/... ./internal/cli/...` (clean)
  • Manual: `model: claude-code` in `~/.haft/config.yaml`, run `haft "hello"`, see streamed response
  • `haft doctor` reports Claude Code CLI when `claude` is on PATH
  • Prefix routing: `--model claude-opus-4-20250514` still dispatches to anthropic (not claudecode)

Open questions for the maintainer

  1. Model id shape: is `claude-code:sonnet` acceptable, or prefer a flat list like `claude-code-sonnet`?
  2. Tool-use follow-up: MCP-bridge approach OK, or would you rather expose haft tools via `--allowed-tools` on custom MCP servers differently?
  3. Happy to gate the doctor check behind a config flag if you don't want CLI detection to run when the user isn't using this provider.

adelin-b added 3 commits April 20, 2026 00:42
Wraps the `claude` CLI (Claude Code) as an LLMProvider so users with a
Pro/Max subscription can run haft's interactive agent without setting
ANTHROPIC_API_KEY. Auth is delegated entirely to the CLI (OAuth, keychain,
or pass-through env var — whichever Claude Code is configured with).

Scope (MVP, intentionally narrow):
- Flattens haft's structured message history into a single (system, user)
  prompt pair and pipes it to `claude -p --output-format stream-json`.
- Parses NDJSON events, forwards text deltas as StreamDelta, returns the
  assembled assistant Message on `result`.
- Tool schemas are **not** translated to the CLI's MCP surface yet — the
  model emits text only. Tool-driven agent loops (haft_note etc.) should
  keep using the anthropic/openai providers until a follow-up PR wires
  --mcp-config. This is documented in both the package doc comment and
  docs/claude-code-provider.md so operators pick the right provider.

Changes:
- internal/provider/claudecode.go — new provider implementation.
- internal/provider/claudecode_test.go — unit tests for flatten/render/parse
  plus the prefix-routing regression.
- internal/provider/factory.go — dispatch "claudecode" provider id; rewrite
  guessProviderFromPrefix as ordered list so "claude-code" beats "claude-".
- internal/config/config.go — same ordered-prefix fix for ProviderForModel.
- internal/provider/registry.go — register Claude Code (CLI) provider with
  "claude-code" (+ :opus/:sonnet/:haiku sub-model variants). Cost fields
  left at zero since the subscription isn't per-token.
- internal/cli/doctor.go — surface `claude` CLI presence + version.
- docs/claude-code-provider.md — setup, model ids, limitations, follow-up.
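To illustrate the ordered-prefix fix: a sketch of first-match routing over a slice in which more specific prefixes are listed first. The table entries and the `providerForModel` signature are illustrative assumptions, not haft's actual table.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixRoute maps a model-id prefix to a provider id.
type prefixRoute struct {
	prefix   string
	provider string
}

// routes is checked in order, so "claude-code" must come before "claude-".
// The entries are illustrative only.
var routes = []prefixRoute{
	{"claude-code", "claudecode"},
	{"claude-", "anthropic"},
	{"gpt-", "openai"},
}

// providerForModel returns the provider for the first matching prefix.
func providerForModel(model string) (string, bool) {
	for _, r := range routes {
		if strings.HasPrefix(model, r.prefix) {
			return r.provider, true
		}
	}
	return "", false
}

func main() {
	for _, m := range []string{"claude-code:opus", "claude-opus-4-20250514", "gpt-4o"} {
		p, ok := providerForModel(m)
		fmt.Printf("%-24s -> %s (matched=%v)\n", m, p, ok)
	}
}
```

A slice makes precedence explicit; ranging over a Go map would not, since map iteration order is unspecified.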

Constraint: haft is Go, so the Vercel AI SDK's `ai-sdk-provider-claude-code`
doesn't apply and the TS/Python Claude Agent SDK can't be imported. Subprocess
wrapping is the only path that reuses Claude Code's subscription auth.
Rejected: embedding an Anthropic SDK OAuth flow | Max auth goes through the
CLI, not the public API. Rejected: CGO bridge to the TS SDK | pulls Bun/Node
into the build.
Confidence: medium — text path proven by tests; subprocess path exercised
only manually (no real CLI in CI yet).
Scope-risk: narrow — additive provider, no existing call sites touched.
Not-tested: end-to-end subprocess invocation; CLI auth-failure surface text;
large-conversation stdin > argv-safe sizes.
…illing

The Apr 2026 fix to claude-code #43333 makes `claude -p` draw from an
active Max/Pro subscription when OAuth'd and no ANTHROPIC_API_KEY is
present. If the user has a stray ANTHROPIC_API_KEY exported in their
shell, the CLI silently routes to per-token API billing instead (this
is the $1,800-in-two-days foot-gun tracked in #37686).

Strip ANTHROPIC_API_KEY from the child env in the provider itself so
the subscription path is taken by default. Parent env is untouched.

Also clean up the docs to reflect the post-fix reality and point users
at the anthropic provider when they *want* API-key billing.

Constraint: can't require the user to shell-unset before every call;
the goal of this provider is subscription-by-default ergonomics.
Rejected: require --unsetenv flag from the caller | still leaks on
forgotten exports. Rejected: error out when the key is set | false-
positives for users who intentionally want API billing but happened
to pick this provider.
Confidence: high — env strip is standard, covered by a unit test.
Scope-risk: narrow — only affects the claudecode subprocess.
Generate a per-turn --mcp-config tmpfile that spawns `haft serve` as the
backing MCP server, so the model can call haft_note / haft_problem /
haft_decision / haft_query / haft_refresh / haft_solution during a turn.
The CLI also keeps its built-in Read/Write/Bash/etc. under --permission-mode
bypassPermissions so file ops work without interactive approval.

Execution happens entirely inside the CLI subprocess; haft's outer agent
loop receives the final assistant text after all round-trips. Operators
who need per-tool hooks or cycle tracking must keep using the anthropic /
openai providers — this is documented.

Opt-out: HAFT_CLAUDECODE_NO_MCP=1 drops the bridge and restores the
previous text-only behavior (--allowed-tools '').

Constraint: schlunsen/claude-agent-sdk-go is the most maintained Go port
but its in-process MCP server support is TODO, so adding it as a dep
wouldn't solve this. Raw subprocess + --mcp-config pointing at the haft
binary itself is the smallest viable bridge.
Rejected: port haft's full tool surface to an in-process MCP server |
big refactor, duplicates the existing `haft serve` path.
Rejected: fail hard when no haft project root is found | makes the
provider unusable in quick-chat contexts; we fall back to text-only.
Confidence: medium — tmpfile + config shape are unit-tested; live tool
round-trip verified manually against Max subscription.
Scope-risk: narrow — additive to the provider; env opt-out preserves the
previous MVP behavior.
Not-tested: long conversations exceeding 1MB stdin; CLI subprocess killed
mid-turn leaving stale haft serve children.
@adelin-b adelin-b marked this pull request as draft April 19, 2026 23:15
adelin-b added 9 commits April 20, 2026 01:16
`haft agent` first-run gate (ensureConfigured) called setup.Run() whenever
no provider had an APIKey or AccessToken. The setup stub only knows how to
read OPENAI_API_KEY, so `model: claude-code` users hit:

  First run — let's set up Haft.
  Error: setup: OPENAI_API_KEY not set — set it in your environment

Auth for claudecode is owned by the `claude` CLI, so no credentials ever
live in ~/.haft/config.yaml for this provider. IsConfigured() now short-
circuits true when the selected model resolves to the claudecode provider.

Covered by four unit tests in internal/config (previously no test file).
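The short-circuit can be pictured with the sketch below. `Config`, `Provider`, and the prefix-based `usesClaudeCode` helper are simplified, hypothetical stand-ins; the real check resolves the provider through haft's routing table.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider and Config are simplified stand-ins for haft's config types.
type Provider struct {
	APIKey      string
	AccessToken string
}

type Config struct {
	Model     string
	Providers map[string]Provider
}

// usesClaudeCode reports whether the model id routes to the claudecode
// provider. A plain prefix check is assumed here for brevity.
func usesClaudeCode(model string) bool {
	return strings.HasPrefix(model, "claude-code")
}

// IsConfigured returns true without credentials when auth is owned by the
// `claude` CLI; otherwise at least one provider needs a key or token.
func (c Config) IsConfigured() bool {
	if usesClaudeCode(c.Model) {
		return true
	}
	for _, p := range c.Providers {
		if p.APIKey != "" || p.AccessToken != "" {
			return true
		}
	}
	return false
}

func main() {
	cli := Config{Model: "claude-code:sonnet"}
	bare := Config{Model: "gpt-4o"}
	fmt.Println(cli.IsConfigured(), bare.IsConfigured())
}
```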
…, not append

--append-system-prompt concatenates on top of Claude Code's ~30K-token
default prompt. Haft's FPF protocol instructions get diluted to the point
the model behaves like vanilla Claude Code instead of a haft agent.

Switch to --system-prompt so haft owns the prompt outright. If the user
wants Claude Code's built-in tool instructions, haft's own system prompt
can describe them — but that's a haft-side authoring choice, not a
provider concern.
Addressing feedback from the PR m0n0x41d#64 code review.

HIGH:
- Remove the dead `cleanup` variable. It was assigned a func that was
  never called (the defer above did the actual work) and was then
  abused as a nil-check bool. Replace with an explicit `mcpBridged`
  bool so the control flow is honest.
- Dedup the provider-prefix routing tables. `config.ProviderForModel`
  is now the single source of truth; `provider.guessProviderFromPrefix`
  delegates to it. The two tables had already drifted (`mistral` existed
  only in config) and this PR's addition made it worse.

MEDIUM:
- Explicitly chmod the MCP tmpfile to 0600 so a permissive umask can't
  leak the haft binary path + project root to other users.
- Use `--add-dir=<value>` (equals form) so a project root starting with
  `-` can't be mis-parsed as a CLI flag.
- Cap stderr at 64KB via a small `cappedBuffer` writer so a chatty
  --verbose session can't pressure parent memory.
- Truncate stderr text embedded in the exit error message to 8KB.
- docs: fix stale `--append-system-prompt` in the "How it works" example
  to match the actual code (`--system-prompt`).

No functional changes to the wire format or subprocess flags; tests
still pass (provider + config).
Before: `internal/cli/util.go` had a private `findProjectRoot` that walked
cwd up to a .haft/ dir. This PR's `internal/provider/claudecode.go`
duplicated that logic (`findHaftProjectRoot`). Two copies of the same
walkers are the kind of thing that drifts.

Now:
- `internal/project.FindRoot(startDir)` — pure-ish public helper,
  reusable from any package, takes the starting dir as a parameter so
  it's testable without os.Chdir tricks.
- `internal/project.FindRootFromCwd()` — convenience wrapper.
- `internal/cli/util.go:findProjectRoot` — thin wrapper that preserves
  the existing error-returning signature its callers use.
- `internal/provider/claudecode.go` — uses `project.FindRootFromCwd`
  directly; local `findHaftProjectRoot` deleted.

Tests moved to `internal/project/findroot_test.go` where they
can exercise the pure `FindRoot(startDir)` without chdir. Added a
regression test that a regular file named `.haft` is ignored (only a
directory counts as a project marker — matches the existing cli
behavior and the IsDir check both walkers already had).

No behavior change for any existing caller.
Reuse:
- desktop/app.go:findProjectRoot delegates to project.FindRootFromCwd
  like the cli and provider copies do. Three walkers collapsed into one.
  Fixes a latent bug in the desktop copy that would have matched a
  regular file named .haft (no IsDir check).

Quality:
- Drop the redundant `subModel` field on ClaudeCodeProvider; derive
  from `modelID` via cliSubModel() so there's one source of truth.
  A failing test on bare "claude-code" caught a real bug in the new
  derivation. The previous field-based code was never wrong because the
  prefix stripping happened up front, but the new derivation has to check
  the `ok` return from strings.CutPrefix, which is now tested.
- registry.go: zero out ContextWindow / DefaultMaxOut for claude-code*
  entries. Those limits live in the CLI, not here; fabricating numbers
  would go stale. Comment explains why.
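The `ok` check matters because `strings.CutPrefix` returns its input unchanged when the prefix is absent. A sketch of the derivation, assuming the sub-model is whatever follows `claude-code:` (the function body is illustrative, not the PR's code):

```go
package main

import (
	"fmt"
	"strings"
)

// cliSubModel derives the CLI sub-model from a haft-facing model id.
// "claude-code:opus" yields "opus"; bare "claude-code" yields "".
func cliSubModel(modelID string) string {
	sub, ok := strings.CutPrefix(modelID, "claude-code:")
	if !ok {
		// Without this check, bare "claude-code" would be returned as-is
		// and mistaken for a sub-model name.
		return ""
	}
	return sub
}

func main() {
	fmt.Printf("%q %q\n", cliSubModel("claude-code"), cliSubModel("claude-code:opus"))
}
```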

Efficiency:
- Cache `haftExe` and `projectRoot` on ClaudeCodeProvider at
  construction. Was re-running os.Executable + os.Getwd + filesystem
  walk on every turn. Per-turn cost drops to just the tmpfile
  (marshal + write + chmod + remove — ~microseconds vs the CLI's
  multi-second spawn). writeHaftMCPConfig is now a pure function of
  (exe, projectRoot) and easier to test without chdir tricks.
- cappedBuffer now keeps the TAIL of stderr, not the head. Real
  failures almost always print near the end; head-only meant error
  messages showed startup chatter and dropped the actual error for
  long --verbose sessions. Added tail / no-truncation tests.
- Removed the now-redundant "truncate error message" block in
  Stream() — the cappedBuffer does it itself.
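A tail-keeping capped writer can be sketched as follows. The field names and the `truncated` flag are illustrative assumptions; only the keep-the-tail behavior comes from the description above.

```go
package main

import "fmt"

// cappedBuffer is an io.Writer that retains at most max bytes, always the
// most recent ones, so the end of a long stderr stream survives.
type cappedBuffer struct {
	max       int
	buf       []byte
	truncated bool
}

// Write appends p and then drops the oldest bytes beyond max. It always
// reports the full length as written so the producer never sees a short
// write. Memory use peaks transiently at max + len(p).
func (c *cappedBuffer) Write(p []byte) (int, error) {
	c.buf = append(c.buf, p...)
	if over := len(c.buf) - c.max; over > 0 {
		// Shift the tail to the front; append handles the overlap safely.
		c.buf = append(c.buf[:0], c.buf[over:]...)
		c.truncated = true
	}
	return len(p), nil
}

func (c *cappedBuffer) String() string { return string(c.buf) }

func main() {
	b := &cappedBuffer{max: 16}
	fmt.Fprint(b, "startup chatter... ")
	fmt.Fprint(b, "fatal: auth failed")
	fmt.Printf("%q truncated=%v\n", b.String(), b.truncated)
}
```

Such a writer can be assigned to `cmd.Stderr` on an `exec.Cmd`, which accepts any `io.Writer`.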

No behavior change to the subprocess wire format.
Applied the same frame → compare → decide reasoning the project enforces
via FPF. Three of four items survived scrutiny as tiny wins; the fourth
(--resume) deserves its own PR.

Done here:
- Cache the filtered child env on ClaudeCodeProvider at construction.
  `envWithout(os.Environ(), "ANTHROPIC_API_KEY")` was called once per
  turn; it now runs once per provider. The only env var we mask is rarely
  toggled mid-session. Documented the tradeoff: vars added after
  construction are missed, so callers needing a fresh env should rebuild
  the provider.
- Pre-size the text `strings.Builder` in parseClaudeStream to 16KB.
  Skips ~4 grow-and-copy doublings for typical outputs; still grows
  naturally for long ones. Tiny allocation win, zero ergonomic cost.

Not done (and why):
- --resume session reuse is the real ~3-5x speedup but it changes the
  LLMProvider contract (stateless → stateful), needs delta-only message
  forwarding, session-TTL fallback, and cleanup. Too much for this PR.
  Moved the entry in docs/claude-code-provider.md from a one-liner
  under "Follow-up" to a full design sketch so whoever picks it up
  has a starting point instead of re-deriving the constraints.
- `tui_spawn.go` env-stripping near-duplicate: different semantics
  (multi-key + replace vs single-key strip). Wrapping adds indirection
  with no payoff. Left alone.
Both skipped items from the simplify pass are now done.

(1) Session reuse via --resume
  - Provider grows sessionID + msgsSent + a mutex. Turn 1 spawns fresh,
    records the CLI's session id from stream-json events, and snapshots
    the conversation length. Turn 2+ checks that haft grew the message
    list by exactly one user message; if so, sends just that user text
    with --resume <id>, skipping --system-prompt. Any mismatch falls
    back to a fresh turn and clears state.
  - On parse or subprocess error, invalidate the session so a stale
    id doesn't keep failing turn after turn.
  - Dropped --no-session-persistence. Freshness is now controlled by
    whether we pass --resume, not by whether the session is stored.
  - Opt-out: HAFT_CLAUDECODE_NO_RESUME=1 forces every turn fresh.
  - Live-verified: turn 2 via --resume correctly recalled context from
    turn 1 ("my name is Zephyr" -> "Your name is Zephyr.").
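The warm-path check can be sketched as a pure function. The types, the signature, and the reading of `msgsSent` as "conversation length at the last snapshot" are assumptions for illustration; the real bookkeeping also has to handle the mutex and invalidation on errors.

```go
package main

import "fmt"

// Message is a simplified, hypothetical stand-in for haft's message type.
type Message struct {
	Role string
	Text string
}

// sessionState is what the provider remembers between turns.
type sessionState struct {
	sessionID string // CLI session id recorded on the previous turn
	msgsSent  int    // conversation length snapshot from the previous turn
}

// takeResumeDecision reports whether this turn can use --resume, and if so
// which user text to send. Any mismatch means "spawn a fresh turn".
func takeResumeDecision(st sessionState, msgs []Message, optOut bool) (bool, string) {
	if optOut || st.sessionID == "" {
		return false, "" // opted out, or no session recorded yet
	}
	if len(msgs) != st.msgsSent+1 {
		return false, "" // history did not grow by exactly one message
	}
	last := msgs[len(msgs)-1]
	if last.Role != "user" {
		return false, "" // the new tail must be a user message
	}
	return true, last.Text
}

func main() {
	st := sessionState{sessionID: "abc", msgsSent: 2}
	msgs := []Message{
		{Role: "user", Text: "my name is Zephyr"},
		{Role: "assistant", Text: "Nice to meet you."},
		{Role: "user", Text: "what is my name?"},
	}
	resume, text := takeResumeDecision(st, msgs, false)
	fmt.Printf("resume=%v text=%q\n", resume, text)
}
```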

(2) Shared env utility
  - New internal/envutil.Strip(env, keys...) with its own tests,
    covering single-key, multi-key, shared-prefix, order-preservation.
  - claudecode.go dropped its local envWithout; uses envutil.Strip.
  - cli/tui_spawn.go:tuiProcessEnv collapsed from 20 lines to 5; same
    behavior, one filter call.

Other provider changes:
  - parseClaudeStream now returns a claudeStreamResult struct (text,
    finishReason, sessionID) instead of three values. Cleaner, and
    extensible for future fields (token counts).
  - streamEvent picks up session_id from every event; the last one
    wins — `result` carries it reliably, so settled-state is correct.

Tests: 6 new cases for takeResumeDecision (fresh/continue/gap/non-user-
tail/env-optout) and session bookkeeping (invalidate/record). All green.
Moved the big --resume design sketch out of 'Future work' (it ships in
this PR). Replaced with a 'Session reuse' section explaining the warm
path, divergence fallback, and the opt-out env var.
Audited every file for quint/QUINT references. Kept the load-bearing
ones (migration code in cli/init, cli/serve fallback, project/index,
project.go legacy db read; historical attribution in README/CONTRIBUTING;
naming-migration table in spec/AGENT_CONTRACT, OPEN_QUESTIONS; canonical
install URL quint.codes in goreleaser). Fixed the stale ones that were
just mislabelled:

- internal/provider/claudecode.go + test: QUINT_PROJECT_ROOT →
  HAFT_PROJECT_ROOT. serve.go still accepts QUINT_PROJECT_ROOT as a
  fallback for old configs, but new invocations should use the modern
  env var.
- docs/claude-code-provider.md: matching env var rename.
- db/store_test.go: audit-log fixtures used quint_propose / quint_verify.
  Renamed to haft_propose / haft_verify — audit labels are free-form,
  but the test should reflect current tool naming.
- internal/artifact/nav_test.go:TestContract_NoToolCallSyntax: the
  contract was "no tool-call syntax in NextAction" but only checked
  for quint_. Expanded to also reject haft_ so new code doesn't
  regress — defense in depth keeps both checks.
- .golangci.yml: G204 comment said "we call quint/git as subprocess".
  Now claude/git.

Also: .gitignore picks up .osgrep (tool cache, shouldn't be tracked).