v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay #53

Merged
jpr5 merged 19 commits into main from feat/v1.6.0-subspec1
Mar 22, 2026
@jpr5 jpr5 commented Mar 20, 2026

Summary

Major feature release adding 8 capabilities to llmock, plus 29 bugs found and fixed in code review.

Provider Endpoints

  • Bedrock Streaming — invoke-with-response-stream (AWS Event Stream binary) + Converse API
  • Vertex AI — Routes to existing Gemini handler
  • Ollama — /api/chat, /api/generate, /api/tags (NDJSON streaming)
  • Cohere — /v2/chat (typed SSE events)

Infrastructure

  • Chaos Testing — Probabilistic drop/malformed/disconnect, three precedence levels (header > fixture > server), rate clamping to [0,1]
  • Prometheus Metrics — Opt-in /metrics, counters, cumulative histograms, gauges
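
The three-level chaos precedence (header > fixture > server) with clamping to [0,1] might be resolved along these lines. This is a minimal sketch, not llmock's actual code: `ChaosRates`, `clamp01`, and `resolveChaosRates` are hypothetical names, and it assumes precedence is resolved independently for each action.

```typescript
// Hypothetical shape: one optional probability per chaos action.
type ChaosRates = { drop?: number; malformed?: number; disconnect?: number };

// Clamp a probability into [0, 1].
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

// Resolve each action independently: header beats fixture, fixture beats server.
function resolveChaosRates(
  header: ChaosRates,
  fixture: ChaosRates,
  server: ChaosRates,
): Required<ChaosRates> {
  const pick = (key: keyof ChaosRates): number =>
    clamp01(header[key] ?? fixture[key] ?? server[key] ?? 0);
  return {
    drop: pick("drop"),
    malformed: pick("malformed"),
    disconnect: pick("disconnect"),
  };
}

const resolvedRates = resolveChaosRates(
  { drop: 5 }, // header value is out of range and clamps to 1
  { drop: 0.2, malformed: 0.1 }, // fixture supplies malformed
  { disconnect: 0.05 }, // server default supplies disconnect
);
```

Here the header wins for drop, the fixture supplies malformed, and the server default supplies disconnect.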

Record-and-Replay

  • Proxy-on-miss — Real API responses saved as fixtures with 30s upstream timeout
  • Stream collapsing — 6 functions (SSE, NDJSON, EventStream) supporting both Converse and Messages formats
  • Strict mode (503) — Catch missing fixtures in CI
  • Auth safety — Forwarded but redacted in journal, never in fixtures

Quality

  • 1250 tests across 37 files
  • 7 rounds of 7-agent code review, 29 bugs found and fixed
  • Build/format/lint clean, zero external dependencies, zero as-any in source

Review Fixes (29 total across 7 rounds)

Round 1: Original review (20 findings)

  • HandlerDefaults type extracted, fixing silent undefined access in 5 handlers
  • Provider-specific error formats (Anthropic, Gemini, Bedrock)
  • Recorder binary relay corruption (UTF-8 round-trip on EventStream)
  • collapseOllamaNDJSON tool_calls + buildFixtureResponse priority
  • ChaosAction dedup, RecordProviderKey union, OllamaMessage.role union
  • collapseCohereSSE naming, chaos rate clamping, recorder auth comment
  • SKILL.md 503 status, warn log level, README provider list, types.ts header

Round 2 (2 findings)

  • applyChaos registry argument missing in 5 handlers (chaos metrics incomplete)
  • Bedrock Converse response format missing in buildFixtureResponse

Round 5 — fresh context (2 findings)

  • Global recordCounter → crypto.randomUUID() (concurrent test determinism)
  • rawBody pass-through in OpenAI completions proxy path

Round 6 — fresh context (2 findings)

  • 30s upstream timeout in makeUpstreamRequest (prevents indefinite hangs)
  • collapseBedrockEventStream: handle both Converse (camelCase) and Messages (flat type) formats

Round 7 — fresh context (3 findings)

  • new URL() validation with specific 502 error for malformed provider URLs
  • writtenToDisk flag to prevent misleading "Response recorded" log on write failure
  • res.on("error") handler for upstream response stream mid-transfer drops

All fixes have corresponding regression tests.

pkg-pr-new bot commented Mar 20, 2026

npm i https://pkg.pr.new/CopilotKit/llmock/@copilotkit/llmock@53

commit: c694c9b

@jpr5 jpr5 force-pushed the feat/v1.6.0-subspec1 branch from f858fd2 to 37cdc75 on March 20, 2026 18:24
@jpr5 jpr5 changed the title from "v1.6.0 Sub-spec 1: Bedrock Streaming, Ollama, Cohere, Vertex AI, Chaos Testing, Prometheus Metrics" to "v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay" on Mar 20, 2026
@jpr5 jpr5 force-pushed the feat/v1.6.0-subspec1 branch 3 times, most recently from f7e8ef7 to 711a600 on March 21, 2026 00:18
jpr5 commented Mar 21, 2026

PR (53) Review — CopilotKit/llmock (2026-03-21)

Critical Issues

(1) handleBedrock defaults type mismatch: bedrock.ts:247 declares its defaults parameter as { latency: number; chunkSize: number; logger: Logger; chaos?: ChaosConfig } but the function body accesses defaults.record (line 320), defaults.strict (lines 342–347), and passes defaults to proxyAndRecord (line 328) which expects record? and logger. Missing record?: RecordConfig, strict?: boolean, registry?: MetricsRegistry. Works at runtime only because JavaScript ignores type annotations, but the type contract is incorrect and would fail under strict TypeScript settings.

(2) Systematic defaults type narrowing across 5 handlers — The same bug as (1) is replicated in handleResponses (responses.ts:502), handleMessages (messages.ts:434), handleGemini (gemini.ts:382), and handleEmbeddings (embeddings.ts:43). All five handlers access defaults.record and defaults.strict despite those properties not existing on their declared type. Contrast with handleBedrockStream (line 556), handleConverse, handleConverseStream, handleOllama, handleOllamaGenerate, and handleCohere, which all correctly declare the full type.

(3) applyChaos registry gap — chaos metrics silently lost for 5 handlers — Because handleBedrock, handleResponses, handleMessages, handleGemini, and handleEmbeddings do not have registry in their narrowed defaults type, their calls to applyChaos pass undefined for the registry parameter. The llmock_chaos_triggered_total Prometheus counter is never incremented for these endpoints even when metrics are enabled. Chaos metrics are incomplete — only the newer handlers (Ollama, Cohere, Bedrock streaming, Converse) report chaos events.

(4) Recorder binary relay corrupts Bedrock EventStream data: recorder.ts line 67 declares upstreamBody: string and line 210 converts binary buffers via rawBuffer.toString() (utf-8 default). Line 177 relays via res.end(upstreamBody). Binary EventStream frames contain CRC32 checksums and binary-encoded lengths that are corrupted by utf-8 round-tripping. The collapse path (line 95) correctly uses the raw buffer, but the direct relay path sends corrupted data to the client.

(5) collapseOllamaNDJSON ignores tool_calls: stream-collapse.ts lines 270–297 only extract the message.content and response fields, never handling message.tool_calls. Ollama streaming tool call responses would not be collapsed correctly in the recorder path. The test suite also has no coverage for this case.
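
A collapse that also accumulates tool calls could look like the sketch below. The chunk shapes are simplified assumptions about Ollama's NDJSON stream, and `collapseNdjsonSketch` is a hypothetical name rather than the function in stream-collapse.ts.

```typescript
// Hypothetical, simplified shapes for Ollama stream chunks.
type OllamaToolCallSketch = {
  function: { name: string; arguments: Record<string, unknown> };
};
type OllamaChunkSketch = {
  message?: { content?: string; tool_calls?: OllamaToolCallSketch[] };
  response?: string; // assumed: /api/generate chunks carry text here
  done?: boolean;
};

// Fold newline-delimited JSON chunks into one content string plus tool calls.
function collapseNdjsonSketch(body: string): {
  content: string;
  toolCalls: OllamaToolCallSketch[];
} {
  let content = "";
  const toolCalls: OllamaToolCallSketch[] = [];
  for (const line of body.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines between chunks
    const chunk = JSON.parse(trimmed) as OllamaChunkSketch;
    content += chunk.message?.content ?? chunk.response ?? "";
    if (chunk.message?.tool_calls) {
      toolCalls.push(...chunk.message.tool_calls);
    }
  }
  return { content, toolCalls };
}

const ndjsonSample = [
  JSON.stringify({ message: { content: "Hel" } }),
  JSON.stringify({ message: { content: "lo" } }),
  JSON.stringify({
    message: {
      content: "",
      tool_calls: [{ function: { name: "get_weather", arguments: { city: "Paris" } } }],
    },
    done: true,
  }),
].join("\n");
const collapsedSample = collapseNdjsonSketch(ndjsonSample);
```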

(6) buildFixtureResponse returns empty TextResponse for Ollama tool calls: buildFixtureResponse in recorder.ts checks content before toolCalls. For Ollama responses that include both message.content: "" (empty string) and message.tool_calls: [...], the empty string is non-null and matches the text content path first, returning { content: "" } and silently discarding the tool calls.
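
One way to avoid the shadowing described in (6) is to check tool calls before content. A minimal sketch with hypothetical types and a hypothetical function name, not the recorder's real signature:

```typescript
// Hypothetical, simplified shapes for the collapsed stream and the fixture response.
type SketchToolCall = { name: string; arguments: string };
type SketchFixtureResponse =
  | { content: string }
  | { toolCalls: SketchToolCall[] };

function buildResponseSketch(collapsed: {
  content?: string;
  toolCalls?: SketchToolCall[];
}): SketchFixtureResponse {
  // Tool calls take priority so an empty-string content cannot shadow them.
  if (collapsed.toolCalls && collapsed.toolCalls.length > 0) {
    return { toolCalls: collapsed.toolCalls };
  }
  return { content: collapsed.content ?? "" };
}

const withEmptyContent = buildResponseSketch({
  content: "",
  toolCalls: [{ name: "get_weather", arguments: '{"city":"Paris"}' }],
});
```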


Important Issues

(7) SKILL.md documents wrong status code for --strict: skills/write-fixtures/SKILL.md line 434 states "--strict returns a 404 error for unmatched requests." The code actually returns 503 (const strictStatus = defaults.strict ? 503 : 404 in every handler). types.ts line 233 correctly documents 503.

(8) Recorder filesystem write failures not propagated — When fs.writeFileSync fails in the recorder (disk full, permissions, etc.), the error is caught and logged but the HTTP response to the client still succeeds. The client has no indication that the fixture was not saved.

(9) Duplicated ChaosAction union: JournalEntry.response.chaosAction in types.ts line 159 inlines "drop" | "malformed" | "disconnect" rather than importing the ChaosAction type from chaos.ts. If a new chaos action is added to one location, the other could silently diverge.

(10) bedrock.ts module docstring incomplete — The docstring says "AWS Bedrock Claude invoke endpoint support" and describes only the non-streaming /model/{modelId}/invoke format, but the file also exports handleBedrockStream, buildBedrockStreamTextEvents, and buildBedrockStreamToolCallEvents for the streaming endpoint.

(11) README stale provider list — Line 48 says "(OpenAI, Claude, Gemini)" but this PR adds Bedrock, Vertex AI, Ollama, and Cohere support.

(12) types.ts file header stale — Line 1 comment says "OpenAI Chat Completion request types (subset we care about)" but the file now defines types for chaos, recording, metrics, streaming profiles, fixture matching, journal entries, and server options.


Suggestions

(13) Extract shared HandlerDefaults type — Replace 12+ inline type declarations across all handler functions with a single exported interface. This fixes (1)–(3) and prevents future divergence:

export interface HandlerDefaults {
  latency: number;
  chunkSize: number;
  logger: Logger;
  chaos?: ChaosConfig;
  registry?: MetricsRegistry;
  strict?: boolean;
  record?: RecordConfig;
}

(14) ChaosConfig lacks range validation — No validation that probability values are in [0, 1]. A header like x-llmock-chaos-drop: 50 silently sets a 5000% drop rate (always triggers). Consider a validation helper at server startup or header parse time.

(15) RecordConfig provider keys are untyped: Record<string, string | undefined> accepts any string key, but the system only recognizes specific provider names. Consider a string union.

(16) collapseCohereSS naming inconsistency — Missing trailing "E" for "SSE" consistency with collapseOpenAISSE, collapseAnthropicSSE, collapseGeminiSSE. This is a public export from index.ts.

(17) recorder.ts misleading comment — Line 151 says auth headers are "in the match/response, not headers" but they are simply excluded entirely from fixtures.

(18) SKILL.md log level incorrect — Line 444 says "every proxy hit logs at info level" but the code (recorder.ts line 50) uses logger.warn.

(19) OllamaMessage.role typed as string — Should be "system" | "user" | "assistant" | "tool" to match other provider message types.

(20) Missing test coverage gaps — No tests for: non-streaming Bedrock strict/record mode with full defaults, Ollama NDJSON tool_calls collapse, writeNDJSONStream latency option, Cohere tool_calls streaming, recorder binary CRC integrity of relayed EventStream frames.

@jpr5 jpr5 left a comment

See most recent comment for the full review.

@jpr5 jpr5 force-pushed the feat/v1.6.0-subspec1 branch 3 times, most recently from 128ce26 to 3cf4957 on March 21, 2026 15:52
jpr5 added 4 commits March 21, 2026 09:18
New providers: Bedrock streaming/Converse, Vertex AI, Ollama, Cohere
Chaos: probabilistic drop/malformed/disconnect with 3-level precedence
Metrics: opt-in Prometheus /metrics endpoint
Record-and-replay: proxy-on-miss, 6 stream collapse functions, strict mode
HandlerDefaults shared type, provider-specific error formats,
upstream timeout, binary relay, response stream error handling
Provider endpoints, chaos, metrics, recorder, stream collapse,
strict mode, binary EventStream, NDJSON, Converse/Messages formats,
rate clamping, URL validation, write failure logging, drift tests
New: Ollama, Cohere, Vertex AI, Chaos Testing, Metrics, Record-and-Replay
Updated: all provider pages, fixtures, error injection, streaming physics,
WebSocket, Docker, drift detection, compatible providers, README, SKILL.md
@jpr5 jpr5 force-pushed the feat/v1.6.0-subspec1 branch from 8c16062 to 402c8fa on March 21, 2026 16:19
jpr5 added 6 commits March 21, 2026 10:26
…haos switch

- Narrow proxyAndRecord/handleGemini/handleCompletions providerKey from
  string to RecordProviderKey, removing the unsafe providers cast
- Add "api-key" to recorder headersToForward for Azure OpenAI auth
- Replace {} as ChatCompletionRequest with null in all error-path journal
  entries across 9 handler files (server, gemini, bedrock, bedrock-converse,
  cohere, embeddings, messages, ollama, responses)
- Broaden JournalEntry.body to ChatCompletionRequest | null to match
- Remove chunkSize/streamingProfile from EmbeddingFixtureOpts (unused, non-streaming)
- Add default: never exhaustive check to applyChaos() switch
…via logger

- Wrap metrics res.on("finish") callback in try-catch with logger.debug
  to prevent unhandled EventEmitter errors from crashing the server
- Propagate decodeEventStreamFrames truncated flag through CollapseResult
  instead of console.warn, so it respects logLevel configuration
- Log Bedrock CRC mismatch warning via defaults.logger in recorder
--strict does not prevent proxying — proxy is attempted first when
--record is set; 503 only fires when proxy is absent or fails
- metrics.test.ts: add test that injects a faulty registry via spy to
  verify the try-catch in res.on("finish") prevents process crashes;
  rename existing test for accuracy
- stream-collapse.test.ts: update CRC mismatch tests to assert
  result.truncated === true (replaced console.warn spy pattern)
jpr5 commented Mar 21, 2026

PR (53) Resolution Report

What Was Fixed

Critical Issues (all 4 fixed):

(1) Chart.yaml appVersion stale — Updated from "1.4.0" to "1.6.0" to match package.json.

(2) SKILL.md --strict documentation incorrect — Rewrote to clarify proxy-first behavior: proxy is still attempted when --record is set; 503 only fires when proxy is absent or fails.
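
The proxy-first behavior described in (2) can be pictured as a small decision function. This is only a sketch: `handleMissSketch` and `tryProxy` are hypothetical, the proxy call is made synchronous for brevity (the real recorder is presumably asynchronous), and the non-strict 404 comes from the strictStatus expression quoted in the earlier review.

```typescript
type MissOutcome =
  | { kind: "proxied" }
  | { kind: "error"; httpStatus: 503 | 404 };

// Decide what happens when no fixture matches a request.
// tryProxy is a hypothetical stand-in for the recorder; it returns true on success.
function handleMissSketch(
  opts: { record: boolean; strict: boolean },
  tryProxy: () => boolean,
): MissOutcome {
  if (opts.record) {
    let proxied = false;
    try {
      proxied = tryProxy();
    } catch {
      proxied = false; // a thrown upstream error counts as a failed proxy
    }
    if (proxied) return { kind: "proxied" };
  }
  // Proxy absent or failed: strict mode answers 503, otherwise 404.
  return { kind: "error", httpStatus: opts.strict ? 503 : 404 };
}

const strictWithWorkingProxy = handleMissSketch({ record: true, strict: true }, () => true);
const strictWithoutProxy = handleMissSketch({ record: false, strict: true }, () => true);
```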

(3) Azure api-key header missing from recorder — Added "api-key" to headersToForward in src/recorder.ts.

(4) Metrics finish callback no error handling — Wrapped in try-catch with logger.debug() logging to prevent unhandled EventEmitter crashes.

Important Issues (all fixed):

(5) decodeEventStreamFrames silent truncation — Changed return type to { frames, truncated: boolean }. CRC mismatch now signals truncation instead of silently breaking.

(6) Recorder write failure behavior — Pre-existing design (error logged + header set + response still relayed). No change needed; behavior is intentional.

(7) Missing version/CHANGELOG — Bumped package.json to 1.6.0, updated Chart.yaml, added full ## 1.6.0 CHANGELOG entry. Fixed Cohere path (/v2/chat) and chaos wording in the entry.

(8) RecordProviderKey type cast defeated — Removed as Record<string, string | undefined> cast; narrowed providerKey from string to RecordProviderKey in proxyAndRecord, handleGemini, and handleCompletions.

(9) FixtureResponse discriminant — Skipped; requires architectural change. Left as-is.

Suggestions (all implemented):

(10) default: never exhaustive check added to applyChaos() switch.
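
The exhaustive check in (10) follows a common TypeScript pattern, sketched here. `ChaosActionSketch` and `chaosLabel` are hypothetical stand-ins; the real applyChaos() does more than return a label.

```typescript
// Local stand-in for the ChaosAction union named in the review.
type ChaosActionSketch = "drop" | "malformed" | "disconnect";

// Hypothetical helper: map each action to a label string.
function chaosLabel(action: ChaosActionSketch): string {
  switch (action) {
    case "drop":
      return "action=drop";
    case "malformed":
      return "action=malformed";
    case "disconnect":
      return "action=disconnect";
    default: {
      // If a new member is added to the union, this assignment stops compiling.
      const unhandled: never = action;
      throw new Error(`Unhandled chaos action: ${String(unhandled)}`);
    }
  }
}

const dropLabel = chaosLabel("drop");
```

Adding a fourth member to the union without a matching case turns the `never` assignment into a compile error, which is the point of the guard.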

(11) console.warn in stream-collapse replaced: truncated propagated through CollapseResult; caller (recorder.ts) logs via defaults.logger.warn.

(12) StreamWriterOptions extraction — Skipped; would require multi-file refactor for a one-PR benefit.

(13) {} as ChatCompletionRequest → body: null in all 19 error-path journal entries across 9 handler files; JournalEntry.body widened to ChatCompletionRequest | null.

(14) Fixed recorder.ts comment ("strip x-llmock-* headers" → "Forward only safe headers").

(15) EmbeddingFixtureOpts — Removed chunkSize and streamingProfile (inapplicable to non-streaming embeddings).

Tests Added

  • metrics.test.ts: New test injects faulty registry via vi.spyOn to verify the try-catch prevents process crashes.
  • stream-collapse.test.ts: Two CRC mismatch tests asserting result.truncated === true for both prelude and message CRC corruption paths.
  • drift-scripts.test.ts: 27 tests for drift remediation scripts (was untracked, added to branch).

Commits Made

3657cf1 test: add unit tests for drift remediation scripts
de8cfc3 test: cover metrics crash guard and Bedrock CRC truncation
3d479ef docs: correct --strict mode documentation in SKILL.md
63e718d fix: observability — metrics crash guard, Bedrock truncation warning via logger
8f14082 fix: type safety — RecordProviderKey, null journal body, exhaustive chaos switch
6be3821 chore: bump version to 1.6.0, update Chart.yaml appVersion, add CHANGELOG entry

CI Status

All 10 checks pass: commitlint, eslint, prettier, exports, test (20/22/24), preview, build-and-push, Continuous Releases.

1281 tests passing (up from 1264 at the start).

jpr5 commented Mar 21, 2026

Critical Issues

None.

Important Issues

(1) Metrics instrumentation errors are effectively silent (src/server.ts lines 419–440) — The res.on("finish") metrics callback wraps all instrumentation in a try/catch that logs at debug level. In production, log level is typically info, meaning metrics failures (wrong label cardinality, registry misconfiguration) will never surface. This makes metrics bugs invisible until someone notices dashboards are empty. Recommendation: Log at warn level, or at minimum info, so operators see metrics failures without enabling debug logging.

(2) Chaos header values silently ignored when out of range (src/chaos.ts lines 40–52) — resolveChaosConfig parses x-llmock-chaos-* headers via parseFloat but does not validate the result is in [0, 1]. A value like x-llmock-chaos-drop: 2.0 or x-llmock-chaos-drop: banana (NaN) is accepted without warning. NaN propagates through Math.random() < NaN which is always false, silently disabling chaos. Values >1 always trigger chaos, which may be intentional but is undocumented. Recommendation: Clamp parsed values to [0, 1] and log a warning when input is out of range or NaN.
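
The recommendation in (2) could be implemented roughly as follows. This sketch returns the warning text instead of logging it, so it stays independent of any logger; `parseChaosRate` and `ParsedRate` are hypothetical names.

```typescript
type ParsedRate = { rate: number | undefined; warning?: string };

// Parse one x-llmock-chaos-* header value, clamping to [0, 1] and
// reporting NaN or out-of-range input.
function parseChaosRate(headerName: string, raw: string): ParsedRate {
  const parsed = parseFloat(raw);
  if (isNaN(parsed)) {
    return {
      rate: undefined,
      warning: `${headerName}: "${raw}" is not a number; ignoring`,
    };
  }
  const clamped = Math.min(1, Math.max(0, parsed));
  if (clamped !== parsed) {
    return {
      rate: clamped,
      warning: `${headerName}: ${parsed} is outside [0, 1]; clamped to ${clamped}`,
    };
  }
  return { rate: parsed };
}

const tooHigh = parseChaosRate("x-llmock-chaos-drop", "2.0");
const notANumber = parseChaosRate("x-llmock-chaos-drop", "banana");
const inRange = parseChaosRate("x-llmock-chaos-drop", "0.25");
```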

(3) Unknown provider fallback in stream-collapse is a silent TODO (src/stream-collapse.ts lines 636–639) — When collapseStreamingResponse receives an unrecognized provider string, it falls through to a default case that returns an empty CollapseResult with a // TODO comment. This means record-and-replay silently produces empty fixtures for any provider not yet handled, with no log or error. Recommendation: Log a warning when hitting the unknown-provider fallback so operators know fixture recording was skipped, and track the TODO.

Suggestions

(4) Make CollapseResult a discriminated union (src/stream-collapse.ts): CollapseResult currently has optional content and toolCalls fields. A discriminated union ({ type: "text"; content: string } | { type: "toolCalls"; toolCalls: ToolCall[] } | { type: "empty" }) would make it impossible to construct ambiguous results and simplify downstream consumers.
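
For illustration, the union proposed in (4) and a consumer that narrows on it might look like this. The `ToolCallSketch` shape and the `summarizeCollapse` helper are hypothetical; only the three variants come from the suggestion itself.

```typescript
// Hypothetical, simplified tool call shape.
type ToolCallSketch = { name: string; arguments: string };

// The three variants from suggestion (4).
type CollapseResultSketch =
  | { type: "text"; content: string }
  | { type: "toolCalls"; toolCalls: ToolCallSketch[] }
  | { type: "empty" };

// Each branch sees only the fields that exist for its variant.
function summarizeCollapse(result: CollapseResultSketch): string {
  switch (result.type) {
    case "text":
      return `text(${result.content.length} chars)`;
    case "toolCalls":
      return `toolCalls(${result.toolCalls.length})`;
    case "empty":
      return "empty";
  }
}

const textSummary = summarizeCollapse({ type: "text", content: "Hello" });
```

A value carrying both content and toolCalls can no longer be constructed, so consumers do not need to decide which field wins.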

(5) Accept RecordProviderKey instead of string in collapseStreamingResponse: The provider parameter is typed as string, but the caller always passes a value derived from RecordProviderKey. Using the union type would catch typos at compile time and eliminate the need for the unknown-provider default branch.

(6) Centralize chunkSize >= 1 clamping: Multiple handlers independently clamp chunkSize with Math.max(1, ...). Moving this into HandlerDefaults resolution (or a shared helper) would eliminate the repetition and prevent a future handler from forgetting the clamp.

(7) Correct misleading "mid-flight" wording in chaos.ts docstring (src/chaos.ts) — The applyChaos docstring says chaos is applied "mid-flight," but for drop and malformed actions, the effect is applied before any response bytes are sent. "Mid-flight" is only accurate for disconnect. Suggest rewording to "applies the selected chaos action to the response."

(8) Validate StreamingProfile and ChaosConfig ranges at fixture load time: Negative ttft, tps <= 0, or jitter outside [0, 1] would produce nonsensical streaming behavior. Validating these when fixtures are loaded (rather than at request time) gives earlier, clearer errors.

(9) Minor test coverage gaps

  • CLI new flags (--chaos-*, --metrics, --record, --strict, --provider-*) lack dedicated unit tests
  • Disconnect chaos action is not integration-tested end-to-end
  • Recorder upstream timeout/failure path is not tested

Strengths

  • Comprehensive provider coverage — Bedrock (invoke + stream + Converse), Ollama, Cohere, and Vertex AI all correctly translate to the unified ChatCompletionRequest format, preserving the mock server's core fixture-matching architecture.
  • Correct binary protocol implementation — The AWS Event Stream encoder (aws-event-stream.ts) properly implements CRC32 checksums for both prelude and full message, with the stream collapser validating checksums on decode.
  • Clean chaos injection design — Three-level precedence (header > fixture > server) with an exhaustive never-guarded switch gives operators fine-grained control and compile-time safety against unhandled actions.
  • Zero-dependency metrics — The Prometheus registry avoids pulling in prom-client while correctly implementing counters, histograms (with configurable buckets and +Inf), and gauges with label support.
  • Strong test suite — 1,265 tests passing with coverage across all new providers, chaos injection, metrics serialization, stream collapsing, and fixture matching. Previously reported issues (4 from prior review) all confirmed resolved.
  • Consistent architecture — Every new provider follows the same *ToCompletionRequest → handleX / handleXStream pattern, making the codebase predictable and easy to extend.
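
As a reference point for the cumulative histograms with +Inf mentioned above, here is a minimal zero-dependency sketch that renders Prometheus text exposition lines. `HistogramSketch` and the metric name are hypothetical; llmock's registry also handles labels, which this omits.

```typescript
// Minimal cumulative histogram without label support.
class HistogramSketch {
  private readonly buckets: { le: number; count: number }[];
  private sum = 0;
  private total = 0;

  constructor(private readonly metric: string, bounds: number[]) {
    this.buckets = bounds.map((le) => ({ le, count: 0 }));
  }

  observe(value: number): void {
    this.sum += value;
    this.total += 1;
    // Cumulative: every bucket whose upper bound admits the value is incremented.
    for (const bucket of this.buckets) {
      if (value <= bucket.le) bucket.count += 1;
    }
  }

  render(): string {
    const lines = this.buckets.map(
      (bucket) => `${this.metric}_bucket{le="${bucket.le}"} ${bucket.count}`,
    );
    // The +Inf bucket always equals the total observation count.
    lines.push(`${this.metric}_bucket{le="+Inf"} ${this.total}`);
    lines.push(`${this.metric}_sum ${this.sum}`);
    lines.push(`${this.metric}_count ${this.total}`);
    return lines.join("\n");
  }
}

const demoHistogram = new HistogramSketch("demo_latency_seconds", [0.1, 0.5, 1]);
demoHistogram.observe(0.25);
demoHistogram.observe(0.5);
demoHistogram.observe(2);
const demoExposition = demoHistogram.render();
```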

Recommended Action

FIX EVERYTHING.

@jpr5 jpr5 left a comment

See most recent comment for the full review.

jpr5 added 4 commits March 21, 2026 11:18
…ion test

Clamp x-llmock-chaos-* header values to [0,1] and warn on NaN or out-of-range
input. Restore universal clamping in resolveChaosConfig to cover fixture-level
and server-default rates (regression from prior change). Fix file-level docstring
to accurately describe the three chaos actions. Add tests for header clamping/NaN
behavior and disconnect chaos action end-to-end.
…nish callback

Wrap the res.on('finish') metrics block in try/catch to prevent instrumentation
errors (wrong label cardinality, registry misconfiguration) from propagating
silently or crashing the request handler. Log failures at warn level so operators
see them without enabling debug logging.
Change providerKey parameter type from string to RecordProviderKey in
collapseStreamingResponse, proxyAndRecord, handleGemini, and handleCompletions.
Catches provider key typos at compile time. Add console.warn for unknown SSE
provider fallback and document the OpenAI fallback behavior in the docstring.
Add TODO comments for CollapseResult discriminated union and chunkSize helper
centralization. Fix test comment and cast for unknown-provider fallback path.
…d time

Add error-severity validation checks in validateFixtures for streamingProfile
(ttft >= 0, tps > 0, jitter in [0,1]) and chaos (all rates in [0,1]). Catches
nonsensical streaming physics and out-of-range chaos rates early with clear
error messages rather than silently producing broken behavior at request time.
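
The streamingProfile checks listed in this commit message could look roughly like the sketch below. `checkStreamingProfile` and its types are hypothetical and are not the actual validateFixtures code; the conditions are written in negated form so that NaN is rejected as well.

```typescript
type StreamingProfileSketch = { ttft?: number; tps?: number; jitter?: number };

// Return one error message per violated constraint; an empty array means valid.
function checkStreamingProfile(profile: StreamingProfileSketch): string[] {
  const errors: string[] = [];
  if (profile.ttft !== undefined && !(profile.ttft >= 0)) {
    errors.push("streamingProfile.ttft must be >= 0");
  }
  if (profile.tps !== undefined && !(profile.tps > 0)) {
    errors.push("streamingProfile.tps must be > 0");
  }
  if (profile.jitter !== undefined && !(profile.jitter >= 0 && profile.jitter <= 1)) {
    errors.push("streamingProfile.jitter must be in [0, 1]");
  }
  return errors;
}

const badProfileErrors = checkStreamingProfile({ ttft: -1, tps: 0, jitter: 1.5 });
const goodProfileErrors = checkStreamingProfile({ ttft: 100, tps: 50, jitter: 0.2 });
```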
@jpr5 jpr5 left a comment

PR (53) Review — CopilotKit/llmock (2026-03-21)

PR: v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay
Branch: pr-53 → main
Review Date: 2026-03-21
Files Changed: 37+ source files, 37 test files, docs, CI/workflow
Tests: 1,284 passing across 37 test files


Critical Issues

(1) docs/docker.html contains multiple factual errors

  • References a --config flag (described as v1.7.0) that does not exist in the CLI
  • References --chaos-error-rate flag that does not exist — the actual flags are --chaos-drop, --chaos-malformed, --chaos-disconnect
  • Calls the CLI binary aimock instead of llmock in multiple places
  • States health/readiness probes use "TCP socket checks" but the Helm template (templates/deployment.yaml) actually uses httpGet on /health
  • The values.yaml snippet shows mountPath: /app/fixtures but the Dockerfile WORKDIR is /app and the CMD references ./fixtures

These errors will actively mislead users trying to deploy via Docker/Kubernetes.

Files: docs/docker.html

(2) stream-collapse.ts decodeEventStreamFrames lacks bounds checking on totalLength

In decodeEventStreamFrames() (line ~438-481), the function reads totalLength from the binary buffer but does not validate that the remaining buffer has enough bytes before attempting to slice. A malformed or truncated Bedrock EventStream response could cause an out-of-bounds read or return garbage data.

File: src/stream-collapse.ts:438-481

(3) chaos.ts and stream-collapse.ts use console.warn instead of Logger

The project has a well-designed Logger class, but:

  • src/chaos.ts lines 44, 55, 70 use console.warn for chaos action logging
  • src/stream-collapse.ts line 641 uses console.warn for unknown provider fallback

This bypasses log-level filtering and structured logging. In production or test environments where logLevel: "silent" is set, these warnings will still appear on stderr.

Files: src/chaos.ts:44,55,70, src/stream-collapse.ts:641


Important Issues

(4) README chaos documentation link is broken

The README links to chaos.html but the actual docs file is chaos-testing.html, resulting in a 404.

File: README.md

(5) docs/docs.html endpoint table missing Cohere

The endpoint reference table in docs/docs.html does not list the Cohere POST /v2/chat endpoint, despite Cohere being a fully supported provider with its own handler module.

File: docs/docs.html

(6) stream-collapse.ts SSE content-type switch has no explicit "bedrock" case

The collapseStreamingResponse() dispatcher handles SSE content types for openai, anthropic, gemini, cohere, and ollama, but has no explicit "bedrock" case for SSE-format streams. Bedrock binary EventStream is handled separately via content-type detection, but if Bedrock ever sends text/event-stream content, it would fall through to the default with a console.warn.

File: src/stream-collapse.ts

(7) makeUpstreamRequest has no timeout on response body accumulation

The upstream request in recorder.ts has a 30-second connection timeout but no limit on response body size or accumulation time. A slow-drip upstream response could cause unbounded memory growth.

File: src/recorder.ts:209-253

(8) CHANGELOG describes --chaos as a single flag

The CHANGELOG entry says --chaos CLI flag, but the actual implementation provides three separate flags: --chaos-drop, --chaos-malformed, --chaos-disconnect.

File: CHANGELOG.md


Suggestions

(9) CollapseResult should use a discriminated union

There is an existing TODO in the code for this. The current CollapseResult type uses optional fields (content?, toolCalls?, truncated?, droppedChunks?) which makes it possible to construct invalid states (e.g., both content and toolCalls set). A discriminated union with type: "text" | "toolCalls" would make this impossible.

File: src/stream-collapse.ts

(10) Duplicated Math.max(1, chunkSize) pattern across handlers

There is an existing TODO in src/types.ts:253-254 noting this. The pattern Math.max(1, fixture.chunkSize ?? defaults.chunkSize) is repeated in every handler. A resolveChunkSize(fixture, defaults) helper would centralize this.

File: src/types.ts:253-254
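
The helper proposed in (10) is small enough to sketch in full. The parameter types here are simplified assumptions; the real fixture and defaults objects carry more fields.

```typescript
// Centralizes the Math.max(1, fixture.chunkSize ?? defaults.chunkSize) pattern.
function resolveChunkSize(
  fixture: { chunkSize?: number },
  defaults: { chunkSize: number },
): number {
  return Math.max(1, fixture.chunkSize ?? defaults.chunkSize);
}

const fromFixture = resolveChunkSize({ chunkSize: 8 }, { chunkSize: 20 });
const fromDefaults = resolveChunkSize({}, { chunkSize: 20 });
const clampedToOne = resolveChunkSize({ chunkSize: 0 }, { chunkSize: 20 });
```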

(11) Missing startup-time validation for programmatic API inputs

validateFixtures() in fixture-loader.ts validates StreamingProfile and ChaosConfig ranges for JSON-loaded fixtures, but the same validation is not applied when fixtures are created programmatically via the API (e.g., server.fixture(...)). Invalid ranges (negative rates, jitter > 1) could cause silent misbehavior.

File: src/fixture-loader.ts


Strengths

  • Comprehensive provider coverage: Eight LLM providers with format-correct request/response handling, including the complex AWS Bedrock binary EventStream protocol with CRC32 checksums
  • Excellent test coverage: 1,284 tests across 37 files covering streaming, chaos injection, metrics, fixture matching, record-and-replay, and all provider formats
  • Well-designed chaos testing: Three-level precedence (header > fixture > server defaults) with proper probability-based evaluation and exhaustive switch coverage
  • Clean metrics implementation: Zero-dependency Prometheus-compatible registry with proper histogram bucket handling and parametric path normalization
  • Record-and-replay architecture: VCR-style proxy with intelligent stream collapsing across 6 different streaming formats (SSE, NDJSON, binary EventStream)
  • Type safety: Strong TypeScript types throughout with discriminated unions, structural discrimination, and minimal use of any
  • Consistent handler pattern: All provider handlers follow the same readBody → parse → handle → catch structure with shared HandlerDefaults dependency injection

Recommended Action

FIX EVERYTHING.

jpr5 added 3 commits March 21, 2026 14:55
…G chaos flags

- docker.html: fix health probes (TCP socket → httpGet on /health and /ready)
- docker.html: remove "CLI Configuration (v1.7.0)" section (references non-existent --config
  flag and aimock binary name)
- docker.html: fix --chaos-error-rate → --chaos-drop/--chaos-malformed/--chaos-disconnect
- docker.html: fix mountPath /fixtures → /app/fixtures (matches actual values.yaml)
- docs.html: add POST /v2/chat (Cohere) and POST /api/generate (Ollama) to endpoint table
- CHANGELOG.md: fix "via --chaos CLI flag" → list all three chaos flags
- README.md: fix chaos-testing link (chaos.html → chaos-testing.html)
… bedrock SSE; body timeout

- chaos.ts: add optional logger param to resolveChaosConfig/evaluateChaos/applyChaos;
  replace all console.warn calls with logger?.warn
- stream-collapse.ts: logger param on collapseStreamingResponse; replace console.warn;
  add explicit case "bedrock" routing to collapseAnthropicSSE; add bounds check in
  decodeEventStreamFrames — return {frames, truncated:true} when totalLength extends
  past buffer, preventing out-of-bounds reads on malformed/truncated EventStream frames
- recorder.ts: pass defaults.logger to collapseStreamingResponse; add res.setTimeout
  body accumulation timeout (30s) to prevent unbounded memory growth on slow responses
- bedrock.ts: update module docstring to describe all four endpoint families
- all handlers: pass defaults.logger as final arg to all applyChaos call sites
…edrock SSE, and body timeout

- chaos.test.ts: verify evaluateChaos without logger does not call console.warn;
  verify invalid chaos header with logLevel:silent is silently ignored end-to-end
- stream-collapse.test.ts: verify bounds check returns {truncated:true} for
  oversized totalLength; verify provider="bedrock" routes to collapseAnthropicSSE
- recorder.test.ts: verify proxyAndRecord calls res.setTimeout(30_000) on
  upstream IncomingMessage
@jpr5 jpr5 force-pushed the feat/v1.6.0-subspec1 branch from 22a8cbd to cb09880 on March 21, 2026 21:57
@jpr5 jpr5 left a comment

PR (53) Resolution Report — 2026-03-21

PR: v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay
Branch: feat/v1.6.0-subspec1 → main


What Was Fixed

Critical Issues

(1) docs/docker.html factual errors — Fixed all five:

  • Health probes: "TCP socket checks" → httpGet on /health (liveness) and /ready (readiness), matching actual Helm deployment template
  • Removed the entire "CLI Configuration (v1.7.0)" section (referenced non-existent --config flag and wrong aimock binary name)
  • --chaos-error-rate → --chaos-drop, --chaos-malformed, --chaos-disconnect
  • mountPath: /fixtures → /app/fixtures (matches actual values.yaml)

(2) stream-collapse.ts decodeEventStreamFrames bounds check — Added validation if (totalLength < 12 || offset + totalLength > buf.length) after reading totalLength, returning {frames, truncated: true} for malformed/truncated EventStream buffers instead of reading out-of-bounds.
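
The guard in (2) can be seen in isolation in the sketch below, which only splits a buffer into raw frames using the big-endian total-length prefix. `splitFramesSketch` is a hypothetical simplification: it does not parse headers or verify CRCs the way decodeEventStreamFrames does.

```typescript
type FrameSplit = { frames: Uint8Array[]; truncated: boolean };

// Split a buffer into length-prefixed frames, stopping at the first bad length.
function splitFramesSketch(buf: Uint8Array): FrameSplit {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const frames: Uint8Array[] = [];
  let offset = 0;
  while (offset < buf.length) {
    // At least 4 bytes are needed to read the total length.
    if (buf.length - offset < 4) return { frames, truncated: true };
    const totalLength = view.getUint32(offset, false); // big-endian
    // Same condition as the fix: reject undersized or overrunning frames.
    if (totalLength < 12 || offset + totalLength > buf.length) {
      return { frames, truncated: true };
    }
    frames.push(buf.subarray(offset, offset + totalLength));
    offset += totalLength;
  }
  return { frames, truncated: false };
}

// A 20-byte buffer claiming a 9999-byte frame, as in the regression test.
const oversizedBuf = new Uint8Array(20);
new DataView(oversizedBuf.buffer).setUint32(0, 9999, false);
const oversizedSplit = splitFramesSketch(oversizedBuf);
```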

(3) chaos.ts and stream-collapse.ts console.warn → Logger — Added optional logger?: Logger parameter to resolveChaosConfig, evaluateChaos, applyChaos, and collapseStreamingResponse. Replaced all console.warn calls with logger?.warn. Updated all 12 applyChaos call sites across handler files to pass defaults.logger. Warnings now respect log-level filtering and go through the structured logger.
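
The optional-logger pattern from (3) is sketched below with a stand-in logger that records messages in an array. `SketchLogger`, `makeSketchLogger`, and `parseRateWithLogger` are hypothetical; the project's Logger class has its own API.

```typescript
type SketchLogLevel = "silent" | "warn";
interface SketchLogger {
  warn(message: string): void;
}

// Stand-in logger that appends to a sink unless the level is "silent".
function makeSketchLogger(level: SketchLogLevel, sink: string[]): SketchLogger {
  return {
    warn: (message: string): void => {
      if (level !== "silent") sink.push(message);
    },
  };
}

// With no logger passed, nothing is written anywhere (no console.warn fallback).
function parseRateWithLogger(raw: string, logger?: SketchLogger): number | undefined {
  const parsed = parseFloat(raw);
  if (isNaN(parsed)) {
    logger?.warn(`invalid chaos rate "${raw}"`);
    return undefined;
  }
  return Math.min(1, Math.max(0, parsed));
}

const warnSink: string[] = [];
const silentSink: string[] = [];
parseRateWithLogger("notanumber", makeSketchLogger("warn", warnSink));
parseRateWithLogger("notanumber", makeSketchLogger("silent", silentSink));
const withoutLogger = parseRateWithLogger("notanumber");
```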

Important Issues

(4) README chaos link broken — Fixed chaos.html → chaos-testing.html.

(5) docs/docs.html missing Cohere endpoint — Added POST /v2/chat (Cohere, HTTP SSE / JSON) and POST /api/generate (Ollama, NDJSON / JSON) to the endpoint reference table.

(6) stream-collapse.ts no explicit "bedrock" SSE case — Added case "bedrock": return collapseAnthropicSSE(str) to the SSE provider switch. Bedrock SSE uses the Anthropic wire format (same event types), so this is the correct routing.

(7) makeUpstreamRequest no body accumulation timeout — Added res.setTimeout(30_000, ...) on the upstream IncomingMessage. When fired, it calls req.destroy(err) which triggers the existing req.on("error", reject) handler, properly rejecting the promise with a timeout error.

(8) CHANGELOG --chaos as single flag — Fixed to reference all three flags: --chaos-drop, --chaos-malformed, --chaos-disconnect.

Suggestions (from all three review rounds)

  • bedrock.ts module docstring updated to describe all four endpoint families and which file handles which.
  • stream-collapse.ts unknown provider fallback now uses logger?.warn instead of console.warn.
  • CollapseResult discriminated union and resolveChunkSize helper left as existing TODO comments — would require multi-file architectural changes, skipped.

Tests Added

(chaos.test.ts)

  • Test: evaluateChaos without logger + invalid "notanumber" header does NOT call console.warn (regression guard for the console.warn→logger?.warn migration)
  • Test: invalid chaos header with logLevel: "silent" is silently ignored end-to-end; request returns 200

(stream-collapse.test.ts)

  • Test: collapseBedrockEventStream with totalLength=9999 in a 20-byte buffer returns {truncated: true} (bounds check regression guard)
  • Test: collapseStreamingResponse("text/event-stream", "bedrock", body) correctly routes to collapseAnthropicSSE and extracts content (bedrock SSE case regression guard)

(recorder.test.ts)

  • Test: proxyAndRecord calls res.setTimeout(30_000, handler) on the upstream IncomingMessage (body timeout regression guard)

1305 tests passing (up from 1300 before this session).


Commits Made

8014b70 docs: correct docker.html errors, add missing endpoints, fix CHANGELOG chaos flags
72eda7c fix: structured logger for chaos/stream warnings; EventStream bounds; bedrock SSE; body timeout
cb09880 test: regression coverage for logger migration, EventStream bounds, bedrock SSE, and body timeout

CI Status

All 10 checks pass: commitlint, eslint, prettier, exports, test (20/22/24), preview, build-and-push, Continuous Releases.


jpr5 commented Mar 22, 2026

PR (53) Review — CopilotKit/llmock (2026-03-22)

PR: v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay
Branch: pr-53
Files changed: ~71 (22 source, 12 test, 30+ docs/config)


Previously Raised & Resolved

The following issues were identified in prior review rounds and have already been fixed in the current code:

  • HandlerDefaults extracted to shared interface (was duplicated inline)
  • ChaosConfig header values validated/clamped to [0,1] with NaN checks
  • collapseOllamaNDJSON now handles tool_calls (was missing)
  • buildFixtureResponse for Ollama checks tool_calls before content (was reversed)
  • Metrics try-catch logs at warn level (was debug)
  • RecordProviderKey union type used instead of raw string
  • JournalEntry.body widened to ChatCompletionRequest | null
  • Exhaustive never guard added to applyChaos switch

Critical Issues

None.


Important Issues

(1) docs/docs.html — endpoint table mislabels Groq/OpenAI-compat alias as "Azure OpenAI"

docs/docs.html:212-214 labels POST /openai/v1/chat/completions as "Azure OpenAI", but server.ts:452-454 shows this is the Groq/OpenAI-compatible alias (strips /openai prefix). Azure OpenAI uses /openai/deployments/{id}/chat/completions. The docs table will mislead users configuring Groq or other OpenAI-compatible providers.

(2) docs/metrics.html — wrong default port

docs/metrics.html:115 shows curl http://localhost:3004/metrics but the server default port is 4010 (confirmed via Dockerfile:28 EXPOSE and CLI defaults).

(3) recorder.ts — misleading log message for non-JSON upstream responses

recorder.ts:141 logs "Upstream response is not valid JSON — saving raw response" but the code path at line 143→264-271 actually saves an error fixture { error: { message: "Upstream returned non-JSON response" } }, not the raw response. The log message should say "saving as error fixture" to avoid misleading operators.

(4) stream-collapse — empty content produces valid-looking but vacuous fixtures

When all stream chunks are dropped or a stream yields no content, collapseStreamingResponse returns { content: "" }. This creates a fixture that matches requests but returns an empty string — silently producing wrong behavior on replay rather than signaling the recording was incomplete. Consider adding a _warning field or logging when collapsed content is empty.
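
One possible shape for the suggested _warning field is sketched below. The types CollapsedSketch and FixtureSketch and the function toFixtureWithWarning are hypothetical and do not reflect the recorder's actual fixture format; the point is only that an empty collapse result is marked rather than saved silently.

```typescript
// Hypothetical shapes for illustration; the real collapse result and fixture types may differ.
interface CollapsedSketch {
  content: string;
  toolCalls?: unknown[];
}

interface FixtureSketch {
  response: { content: string };
  _warning?: string;
}

// Builds a fixture and attaches a warning when the collapsed stream carried nothing.
function toFixtureWithWarning(collapsed: CollapsedSketch): FixtureSketch {
  const fixture: FixtureSketch = { response: { content: collapsed.content } };
  const hasToolCalls = (collapsed.toolCalls?.length ?? 0) > 0;
  if (collapsed.content === "" && !hasToolCalls) {
    fixture._warning =
      "Stream collapse produced empty content; the recording may be incomplete";
  }
  return fixture;
}
```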

(5) fixture-loader.ts — fail-open on read/parse errors

loadFixtureFile (line 44-50) and loadFixturesFromDir (line 73-79) return empty arrays on filesystem or parse errors. In --strict mode this means a misconfigured fixture path silently serves 503s for every request instead of failing fast at startup. The server should surface fixture-loading failures more prominently, especially in strict mode.

(6) Server-level ChaosConfig and StreamingProfile are never validated

fixture-loader.ts validates per-fixture StreamingProfile (ttft≥0, tps>0, jitter 0-1) and ChaosConfig (rates 0-1), but server-level defaults set via MockServerOptions.chaos or programmatic API bypass this validation entirely. Invalid server-level chaos rates (e.g., negative or >1) would propagate unchecked.
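
A startup check for server-level chaos rates could look roughly like the following. The field names dropRate, malformedRate, and disconnectRate are assumptions chosen to mirror the three CLI flags, and findInvalidChaosRates is a hypothetical helper, not part of the llmock API.

```typescript
// Assumed field names; the real ChaosConfig may name its rates differently.
interface ChaosConfigSketch {
  dropRate?: number;
  malformedRate?: number;
  disconnectRate?: number;
}

const CHAOS_RATE_KEYS = ["dropRate", "malformedRate", "disconnectRate"] as const;

// Returns one human-readable problem per rate that is NaN or outside [0, 1].
// An empty array means the config is acceptable.
function findInvalidChaosRates(config: ChaosConfigSketch): string[] {
  const problems: string[] = [];
  for (const key of CHAOS_RATE_KEYS) {
    const value = config[key];
    if (value === undefined) continue; // unset rates are fine
    if (Number.isNaN(value) || value < 0 || value > 1) {
      problems.push(`${key}=${String(value)} is outside [0, 1]`);
    }
  }
  return problems;
}
```

A server constructor could log each returned problem as a warning, or throw when the list is non-empty, depending on how strict the startup behaviour should be.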


Suggestions

(7) stream-collapse — CRC failure produces partial-content fixtures with no marker

When decodeEventStreamFrames encounters a CRC mismatch, it returns { frames, truncated: true }. The recorder logs a warning (recorder.ts:118-120) but the resulting fixture has no metadata indicating truncation. On replay, this fixture silently returns partial content. Consider adding a _truncated flag or comment to the saved fixture JSON.

(8) FixtureResponse union is not a true discriminated union

TextResponse and ToolCallResponse can overlap if a response object happens to have both content and toolCalls properties. Type guards throughout the codebase check "toolCalls" in response which works in practice, but the type system doesn't prevent constructing ambiguous values. A discriminant field (e.g., type: "text" | "toolCall" | "error" | "embedding") would make this bulletproof.
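
To make the suggestion concrete, here is a minimal sketch of a discriminated union using the type values proposed above. The payload fields (content, toolCalls, error, embedding) and the names FixtureResponseSketch and describeResponse are illustrative assumptions, not the project's actual types.

```typescript
// Each variant carries a literal `type` tag, so no two variants can overlap.
type FixtureResponseSketch =
  | { type: "text"; content: string }
  | { type: "toolCall"; toolCalls: { name: string; arguments: string }[] }
  | { type: "error"; error: { message: string } }
  | { type: "embedding"; embedding: number[] };

// Narrowing on `type` gives the compiler full knowledge of each branch.
function describeResponse(response: FixtureResponseSketch): string {
  switch (response.type) {
    case "text":
      return `text:${response.content}`;
    case "toolCall":
      return `toolCall:${response.toolCalls.length}`;
    case "error":
      return `error:${response.error.message}`;
    case "embedding":
      return `embedding:${response.embedding.length}`;
    default: {
      // Exhaustiveness guard: adding a variant without a case fails to compile here.
      const unreachable: never = response;
      return unreachable;
    }
  }
}
```

With the tag in place, an object that has both content and toolCalls is still unambiguous, because the type field alone decides which variant it is.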

(9) Missing test coverage for collapseStreamingResponse default fallback

When content-type doesn't match any known pattern and provider is unknown, collapseStreamingResponse falls back to OpenAI SSE parsing with a warning. This fallback path has no test coverage.

(10) Missing test coverage for applyChaos metrics counter increment

Tests verify chaos actions are applied but don't assert that the Prometheus counter llmock_chaos_actions_total is incremented with the correct labels.

(11) BedrockContentBlock.type is string rather than a union

The type field on Bedrock content blocks accepts any string rather than constraining to the known values ("text", "tool_use", "tool_result", "image", "guard_content"). A string literal union would catch typos at compile time.
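
A string literal union for the block type might be sketched as follows, using only the five values listed above. BedrockBlockTypeSketch and isBedrockBlockType are hypothetical names, and the actual set of values accepted by bedrock.ts may be different.

```typescript
// The known values from the review comment, kept in one place as a readonly tuple.
const BEDROCK_BLOCK_TYPES = [
  "text",
  "tool_use",
  "tool_result",
  "image",
  "guard_content",
] as const;

// Union derived from the tuple: "text" | "tool_use" | "tool_result" | "image" | "guard_content".
type BedrockBlockTypeSketch = (typeof BEDROCK_BLOCK_TYPES)[number];

// Runtime guard for untyped input such as parsed request JSON.
function isBedrockBlockType(value: string): value is BedrockBlockTypeSketch {
  return (BEDROCK_BLOCK_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

Deriving the type from the array keeps the compile-time union and the runtime check in sync, so a typo such as "tool_uses" is rejected in both places.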

(12) writeEventStream docstring — misleading return semantics

The docstring implies a boolean return indicating success, but the function returns void. The docstring should be updated or removed.

(13) bedrock.ts module docstring omits exported stream event builders

The module docstring describes only the invoke/stream handlers but buildBedrockStreamTextEvents and buildBedrockStreamToolCallEvents are exported and reused by bedrock-converse.ts. The docstring should mention these exports.


Strengths

  • Comprehensive test suite — 1,250+ tests covering all 8 providers, chaos injection, metrics, stream collapsing, fixture loading/validation, and record-and-replay proxy flows
  • Clean shared abstractions — HandlerDefaults interface, resolveChaosConfig 3-level merge, collapseStreamingResponse dispatcher all reduce duplication across providers
  • Robust binary protocol handling — Bedrock EventStream encoder/decoder with CRC32 validation and truncation detection is well-implemented
  • Thorough chaos system — Header > fixture > server precedence with NaN validation, [0,1] clamping, and exhaustive switch guard
  • Zero-dependency metrics — Prometheus-compatible registry with cumulative histograms, proper +Inf bucket, and dynamic path normalization
  • Defensive recording — Empty match detection, filesystem error headers, auth header exclusion from saved fixtures, binary relay for EventStream responses

Recommended Action

Approve with requested changes. The codebase is well-structured with strong test coverage. Issues (1) and (2) are documentation bugs that will actively mislead users and should be fixed before merge. Issues (3)-(6) are important correctness and observability concerns worth addressing in this PR or a fast follow-up. The remaining suggestions are improvements that can land incrementally.


Now let me attempt to post this to Notion. Since there's no Notion MCP tool, I'll try the Notion API directly. No Notion API credentials are available in the environment, so I cannot post to Notion programmatically.

Notion posting failed — no NOTION_API_KEY or NOTION_TOKEN found in the environment. The review is ready above but I cannot post it to the "PR Reviews" page (ID: 3223aa38-1852-8189-baeb-d5134f920055) without credentials.

To enable Notion posting, either:

  • Set NOTION_API_KEY in your environment with a Notion integration token that has access to the target page
  • Or install a Notion MCP server

The full review document is above and ready to copy into Notion manually under "PR (53) Review — CopilotKit/llmock (2026-03-22)".

Summary: 0 critical issues, 6 important issues, 7 suggestions. Recommended action is approve with requested changes — primarily the two documentation bugs (wrong endpoint label in docs.html, wrong port in metrics.html) that will actively mislead users.


jpr5 left a comment

See most recent comment for the full review.

jpr5 added 2 commits March 21, 2026 22:18
…ation, type unions

- recorder.ts: fix misleading 'saving raw response' log → 'saving as error fixture'
- recorder.ts: warn when stream collapse produces empty content
- recorder.ts: preserve both empty-match and truncation warnings in fixture JSON
- cli.ts: exit(1) on zero fixtures in strict/validate mode
- server.ts: warn on out-of-range chaos config values at startup
- bedrock.ts/messages.ts: narrow content block type from string to union
- aws-event-stream.ts: fix writeEventStream docstring return semantics
jpr5 merged commit 206bc45 into main on Mar 22, 2026
10 checks passed
jpr5 deleted the feat/v1.6.0-subspec1 branch on March 22, 2026 17:25