diff --git a/CHANGELOG.md b/CHANGELOG.md
index c1d104b..3b69f67 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,14 @@
# @copilotkit/llmock
+## 1.6.0
+
+### Minor Changes
+
+- Provider-specific endpoints: dedicated routes for Bedrock (`/model/{modelId}/invoke`), Ollama (`/api/chat`, `/api/generate`), Cohere (`/v2/chat`), and Azure OpenAI deployment-based routing (`/openai/deployments/{id}/chat/completions`)
+- Chaos injection: `ChaosConfig` type with `drop`, `malformed`, and `disconnect` actions; supports per-fixture chaos via `chaos` config on each fixture and server-wide chaos via `--chaos-drop`, `--chaos-malformed`, and `--chaos-disconnect` CLI flags
+- Metrics: `GET /metrics` endpoint exposing Prometheus text format with request counters and latency histograms per provider and route
+- Record-and-replay: `--record` flag and `proxyAndRecord` helper that proxies requests to real LLM APIs, collapses streaming responses, and writes fixture JSON to disk for future playback
+
## 1.5.1
### Patch Changes
diff --git a/README.md b/README.md
index f310c12..bd60779 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# @copilotkit/llmock [](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [](https://www.npmjs.com/package/@copilotkit/llmock)
-Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, and Azure API formats, driven entirely by fixtures. Zero runtime dependencies.
+Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.
## Quick Start
@@ -45,7 +45,7 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r
**Use llmock when:**
- Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
-- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini)
+- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere)
- You prefer defining fixtures as JSON files rather than code
- You need a standalone CLI server
@@ -72,17 +72,20 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r
## Features
-- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [OpenAI Responses](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html)
+- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [OpenAI Responses](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html) (streaming + Converse), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
- **[Embeddings API](https://llmock.copilotkit.dev/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions
- **[Structured output / JSON mode](https://llmock.copilotkit.dev/structured-output.html)** — `response_format`, `json_schema`, and function calling
- **[Sequential responses](https://llmock.copilotkit.dev/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call
- **[Streaming physics](https://llmock.copilotkit.dev/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing
- **[WebSocket APIs](https://llmock.copilotkit.dev/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live
- **[Error injection](https://llmock.copilotkit.dev/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats
+- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
+- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Request journal](https://llmock.copilotkit.dev/docs.html)** — Record, inspect, and assert on every request
- **[Fixture validation](https://llmock.copilotkit.dev/fixtures.html)** — Schema validation at load time with `--validate-on-load`
- **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing
- **[Docker + Helm](https://llmock.copilotkit.dev/docker.html)** — Container image and Helm chart for CI/CD pipelines
+- **Record-and-replay** — VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
- **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly
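
The chaos settings above can also be attached to individual fixtures via a `chaos` block, matching the `ChaosConfig` actions `drop`, `malformed`, and `disconnect`. A minimal, hypothetical sketch, assuming the values are probabilities in the same 0-1 range as the CLI flags; the fixture's other fields are omitted here:

```json
{
  "chaos": {
    "drop": 0.1,
    "malformed": 0,
    "disconnect": 0.05
  }
}
```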
@@ -92,17 +95,24 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r
llmock [options]
```
-| Option | Short | Default | Description |
-| -------------------- | ----- | ------------ | ----------------------------------------- |
-| `--port` | `-p` | `4010` | Port to listen on |
-| `--host` | `-h` | `127.0.0.1` | Host to bind to |
-| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
-| `--latency` | `-l` | `0` | Latency between SSE chunks (ms) |
-| `--chunk-size` | `-c` | `20` | Characters per SSE chunk |
-| `--watch` | `-w` | | Watch fixture path for changes and reload |
-| `--log-level` | | `info` | Log verbosity: `silent`, `info`, `debug` |
-| `--validate-on-load` | | | Validate fixture schemas at startup |
-| `--help` | | | Show help |
+| Option | Short | Default | Description |
+| -------------------- | ----- | ------------ | ------------------------------------------- |
+| `--port` | `-p` | `4010` | Port to listen on |
+| `--host` | `-h` | `127.0.0.1` | Host to bind to |
+| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
+| `--latency` | `-l` | `0` | Latency between SSE chunks (ms) |
+| `--chunk-size` | `-c` | `20` | Characters per SSE chunk |
+| `--watch` | `-w` | | Watch fixture path for changes and reload |
+| `--log-level` | | `info` | Log verbosity: `silent`, `info`, `debug` |
+| `--validate-on-load` | | | Validate fixture schemas at startup |
+| `--chaos-drop` | | `0` | Chaos: probability of 500 errors (0-1) |
+| `--chaos-malformed` | | `0` | Chaos: probability of malformed JSON (0-1) |
+| `--chaos-disconnect` | | `0` | Chaos: probability of disconnect (0-1) |
+| `--metrics` | | | Enable Prometheus metrics at /metrics |
+| `--record` | | | Record mode: proxy unmatched to real APIs |
+| `--strict` | | | Strict mode: fail on unmatched requests |
+| `--provider-*` | | | Upstream URL per provider (with `--record`) |
+| `--help` | | | Show help |
```bash
# Start with bundled example fixtures
@@ -113,6 +123,12 @@ llmock -p 8080 -f ./my-fixtures
# Simulate slow responses
llmock --latency 100 --chunk-size 5
+
+# Record mode: proxy unmatched requests to real APIs and save as fixtures
+llmock --record --provider-openai https://api.openai.com --provider-anthropic https://api.anthropic.com
+
+# Strict mode in CI: fail if any request doesn't match a fixture
+llmock --strict -f ./fixtures
```
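
The chaos and metrics flags compose with the same invocation style. An illustrative run, with probability values chosen arbitrarily:

```shell
# Fail ~5% of requests with 500s, disconnect ~2% of streams mid-response,
# and expose Prometheus metrics at /metrics
llmock --chaos-drop 0.05 --chaos-disconnect 0.02 --metrics -f ./fixtures
```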
## Documentation
diff --git a/charts/llmock/Chart.yaml b/charts/llmock/Chart.yaml
index 36de243..5603860 100644
--- a/charts/llmock/Chart.yaml
+++ b/charts/llmock/Chart.yaml
@@ -3,4 +3,4 @@ name: llmock
description: Deterministic mock LLM server for testing (OpenAI, Anthropic, Gemini)
type: application
version: 0.1.0
-appVersion: "1.4.0"
+appVersion: "1.6.0"
diff --git a/docs/aws-bedrock.html b/docs/aws-bedrock.html
index dd5fa99..09cf238 100644
--- a/docs/aws-bedrock.html
+++ b/docs/aws-bedrock.html
@@ -54,7 +54,8 @@
Providers
>Responses API (OpenAI) · Claude Messages · Gemini · Azure OpenAI · AWS Bedrock · Ollama · Cohere · Vertex AI · Compatible Providers
+
+<h2>Streaming (invoke-with-response-stream)</h2>
+
+<p>The <code>invoke-with-response-stream</code> endpoint returns responses using the
+AWS Event Stream binary protocol. llmock implements this protocol natively — each
+response chunk is encoded as a binary frame with CRC32 checksums, headers, and a
+JSON payload, exactly as the real Bedrock service sends them.</p>
+
+<p>Streaming events follow the Bedrock Claude streaming sequence:</p>
+
+<ol>
+  <li><code>messageStart</code> — opens the message with <code>role: "assistant"</code></li>
+  <li><code>contentBlockStart</code> — begins a content block</li>
+  <li><code>contentBlockDelta</code> — delivers text chunks (<code>text_delta</code>) or tool input (<code>input_json_delta</code>)</li>
+  <li><code>contentBlockStop</code> — closes the content block</li>
+  <li><code>messageStop</code> — closes the message with a <code>stopReason</code></li>
+</ol>
+
+<pre><code>import { BedrockRuntimeClient, InvokeModelWithResponseStreamCommand } from "@aws-sdk/client-bedrock-runtime";
+
+const client = new BedrockRuntimeClient({
+  region: "us-east-1",
+  endpoint: "http://localhost:4010",
+  credentials: { accessKeyId: "mock", secretAccessKey: "mock" },
+});
+
+const response = await client.send(new InvokeModelWithResponseStreamCommand({
+  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
+  contentType: "application/json",
+  body: JSON.stringify({
+    anthropic_version: "bedrock-2023-05-31",
+    max_tokens: 512,
+    messages: [{ role: "user", content: "Hello" }],
+  }),
+}));
+
+// The SDK decodes each binary frame into an event with raw payload bytes
+for await (const event of response.body) {
+  if (event.chunk) {
+    const parsed = JSON.parse(new TextDecoder().decode(event.chunk.bytes));
+    process.stdout.write(parsed.delta?.text ?? "");
+  }
+}
+</code></pre>
+
+<h3>AWS Event Stream Binary Format</h3>
+
+<p>Unlike the SSE-based streaming used by OpenAI and Claude, AWS Bedrock streaming uses a
+binary event stream protocol. Each frame has the following layout:</p>
+
+<pre><code>[total_length:    4B uint32-BE]
+[headers_length:  4B uint32-BE]
+[prelude_crc32:   4B CRC32 of first 8 bytes]
+[headers:         variable-length string key-value pairs]
+[payload:         raw JSON bytes]
+[message_crc32:   4B CRC32 of entire frame minus last 4 bytes]
+</code></pre>
+
+<p>llmock encodes these frames with proper CRC32 checksums, so the AWS SDK can decode them
+natively. The <code>:event-type</code> header in each frame carries the event name (e.g.
+<code>chunk</code>), and the <code>:content-type</code> header is set to
+<code>application/json</code>.</p>
+
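
The frame layout above is straightforward to implement. Here is an illustrative Node.js encoder, a sketch rather than llmock's actual code (the `crc32`, `encodeHeader`, and `encodeFrame` names are ours), that follows the same prelude, header, and trailing-CRC rules:

```javascript
// Reflected CRC32 (polynomial 0xEDB88320), as used by the AWS Event Stream protocol.
function crc32(buf) {
  let c = 0xFFFFFFFF;
  for (const byte of buf) {
    c ^= byte;
    for (let i = 0; i < 8; i++) c = (c >>> 1) ^ (0xEDB88320 & -(c & 1));
  }
  return (c ^ 0xFFFFFFFF) >>> 0;
}

// One string header: [name_len: 1B][name][value type 7 = string][value_len: 2B BE][value]
function encodeHeader(name, value) {
  const n = Buffer.from(name), v = Buffer.from(value);
  const buf = Buffer.alloc(1 + n.length + 1 + 2 + v.length);
  let o = buf.writeUInt8(n.length, 0);
  o += n.copy(buf, o);
  o = buf.writeUInt8(7, o); // header value type 7 = string
  o = buf.writeUInt16BE(v.length, o);
  v.copy(buf, o);
  return buf;
}

// Full frame: 8-byte prelude + prelude CRC, headers, payload, CRC over all prior bytes.
function encodeFrame(eventType, payloadObj) {
  const headers = Buffer.concat([
    encodeHeader(":event-type", eventType),
    encodeHeader(":content-type", "application/json"),
  ]);
  const payload = Buffer.from(JSON.stringify(payloadObj));
  const total = 12 + headers.length + payload.length + 4;
  const frame = Buffer.alloc(total);
  frame.writeUInt32BE(total, 0);
  frame.writeUInt32BE(headers.length, 4);
  frame.writeUInt32BE(crc32(frame.subarray(0, 8)), 8);
  headers.copy(frame, 12);
  payload.copy(frame, 12 + headers.length);
  frame.writeUInt32BE(crc32(frame.subarray(0, total - 4)), total - 4);
  return frame;
}
```

The well-known CRC32 check value `0xCBF43926` for the ASCII string `"123456789"` is a convenient sanity test for the checksum routine.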
+
+<h2>Converse API</h2>
+
+<p>The Converse API is AWS Bedrock's provider-agnostic conversation interface. It uses
+camelCase field names and a different request structure than the Claude-native invoke
+endpoints. llmock supports both <code>/model/{modelId}/converse</code> (non-streaming) and
+<code>/model/{modelId}/converse-stream</code> (streaming via the Event Stream binary
+protocol).</p>
+
+<h3>Request</h3>
+
+<pre><code>{
+  "messages": [
+    {
+      "role": "user",
+      "content": [{ "text": "Hello" }]
+    }
+  ],
+  "system": [{ "text": "You are helpful" }],
+  "inferenceConfig": { "maxTokens": 512 }
+}
+</code></pre>
+
+<h3>Response</h3>
+
+<pre><code>{
+  "output": {
+    "message": {
+      "role": "assistant",
+      "content": [{ "text": "Hello!" }]
+    }
+  },
+  "stopReason": "end_turn",
+  "usage": { "inputTokens": 0, "outputTokens": 0, "totalTokens": 0 }
+}
+</code></pre>
+
+<p>The Converse API also supports tool calls via <code>toolUse</code> and
+<code>toolResult</code> content blocks, and tool definitions via the
+<code>toolConfig</code> field. llmock translates all of these to the unified internal
+format for fixture matching.</p>