provider/llamacpp — llama-server HTTP Provider

Package path: offdev/micro-agent-go/internal/provider/llamacpp
Last updated: 2026-03-26

Overview

llamacpp.Provider implements core.LLMProvider by speaking to a running llama-server process over HTTP. It uses the OpenAI-compatible REST API exposed by llama-server. No CGo, no embedding of llama.cpp, no API key required. User messages with core.Message.Images are sent as multimodal content: a JSON array of text and image_url parts (data URLs with base64).

Default target in examples: llama-server on http://localhost:8080 (override with config / LLAMA_URL).

Usage

provider := llamacpp.New("")  // uses http://localhost:8080
provider := llamacpp.New("http://localhost:8080")

// Synchronous completion (TopP, TopK, MinP, PresencePenalty, RepetitionPenalty optional; 0 = server default)
resp, err := provider.Complete(ctx, &core.CompletionRequest{
    Model:       "default",
    Messages:    conv.Messages,
    Tools:       toolDefs,
    MaxTokens:   8192,
    Temperature: 0.7,
})

// Streaming (content and optional thinking)
ch, err := provider.Stream(ctx, req)
for delta := range ch {
    if delta.Done { break }
    if delta.Thinking {
        fmt.Print("[thinking] ", delta.Content)
    } else {
        fmt.Print(delta.Content)
    }
}
// When delta.Done, delta.Final holds the accumulated Response (content + tool_calls).

// Optional: trace raw response shape when parsed content/tool_calls are empty (e.g. -vvv)
provider.SetTraceLogger(log)

// Embeddings (not part of core.LLMProvider interface)
vec, err := provider.Embed(ctx, "some text to embed")

Wire Format

Requests are sent as JSON to /v1/chat/completions (OpenAI-compatible format).

core type	wire field
`Message{Role: RoleAssistant, ToolCalls: [...]}`	`"content": null, "tool_calls": [...]`
`Message{Role: RoleTool, ...}`	`"role": "tool", "tool_call_id": "..."`
`ToolDef.Parameters`	`function.parameters` (passed through verbatim)

Content nullability: Assistant messages with only tool calls send "content": null. Messages with text content send "content": "<text>".

Response content: The provider accepts message.content as either a JSON string or an array of parts (e.g. [{"type":"text","text":"..."}]). Some backends return content as an array after tool-use turns; parsing both shapes ensures the agent loop continues correctly (e.g. cron heartbeat with tool calls). When the server reports output tokens but parsed content and tool_calls are empty, trace logging (-vvv) can log the raw message for diagnostics (see SetTraceLogger).

Thinking / reasoning_content: When llama-server is run with thinking models (e.g. DeepSeek R1, Command R7B) and --reasoning-format deepseek (or equivalent), the stream delta may include reasoning_content in addition to content. The provider emits Delta{Thinking: true, Content: ...} for reasoning fragments and Delta{Thinking: false, Content: ...} for the main reply. The final delta has Done: true and Final set with accumulated content and tool_calls. If the server does not send reasoning_content, only content deltas are emitted (no thinking).

When CompletionRequest.ThinkingEnabled is non-nil, the provider maps it to the request body (e.g. chat-template / thinking flags) as supported by the server.

Embed

Provider.Embed calls POST /v1/embeddings and returns the first embedding vector. This method is outside the core.LLMProvider interface. internal/app passes it to memory.OpenStore as memory.EmbedFunc when MEMORY_EMBED is true, for Milvus vector storage and search.

HTTP Client

The http.Client has no timeout by default. LLM inference duration is unbounded; callers must use context cancellation to control per-request timeouts. The client is shared across all calls (connection pooling via net/http default transport).

Error Handling

Non-200 HTTP responses return fmt.Errorf("llamacpp: http status %d", ...).
JSON decode failures return wrapped errors.
Streaming goroutine errors are silently dropped (malformed SSE chunks are skipped); the channel is always closed cleanly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

provider/llamacpp — llama-server HTTP Provider

Overview

Usage

Wire Format

Embed

HTTP Client

Error Handling

FilesExpand file tree

provider.md

Latest commit

History

provider.md

File metadata and controls

provider/llamacpp — llama-server HTTP Provider

Overview

Usage

Wire Format

Embed

HTTP Client

Error Handling