v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay#53
Conversation
commit: |
f858fd2 to
37cdc75
Compare
f7e8ef7 to
711a600
Compare
PR (53) Review — CopilotKit/llmock (2026-03-21)Critical Issues(1) (2) Systematic defaults type narrowing across 5 handlers — The same bug as (1) is replicated in (3) (4) Recorder binary relay corrupts Bedrock EventStream data — (5) (6) Important Issues(7) SKILL.md documents wrong status code for (8) Recorder filesystem write failures not propagated — When (9) Duplicated (10) (11) README stale provider list — Line 48 says "(OpenAI, Claude, Gemini)" but this PR adds Bedrock, Vertex AI, Ollama, and Cohere support. (12) Suggestions(13) Extract shared export interface HandlerDefaults {
latency: number;
chunkSize: number;
logger: Logger;
chaos?: ChaosConfig;
registry?: MetricsRegistry;
strict?: boolean;
record?: RecordConfig;
}(14) (15) (16) (17) (18) SKILL.md log level incorrect — Line 444 says "every proxy hit logs at (19) (20) Missing test coverage gaps — No tests for: non-streaming Bedrock strict/record mode with full defaults, Ollama NDJSON tool_calls collapse, |
jpr5
left a comment
There was a problem hiding this comment.
See most recent comment for the full review.
128ce26 to
3cf4957
Compare
New providers: Bedrock streaming/Converse, Vertex AI, Ollama, Cohere Chaos: probabilistic drop/malformed/disconnect with 3-level precedence Metrics: opt-in Prometheus /metrics endpoint Record-and-replay: proxy-on-miss, 6 stream collapse functions, strict mode HandlerDefaults shared type, provider-specific error formats, upstream timeout, binary relay, response stream error handling
Provider endpoints, chaos, metrics, recorder, stream collapse, strict mode, binary EventStream, NDJSON, Converse/Messages formats, rate clamping, URL validation, write failure logging, drift tests
New: Ollama, Cohere, Vertex AI, Chaos Testing, Metrics, Record-and-Replay Updated: all provider pages, fixtures, error injection, streaming physics, WebSocket, Docker, drift detection, compatible providers, README, SKILL.md
8c16062 to
402c8fa
Compare
…haos switch
- Narrow proxyAndRecord/handleGemini/handleCompletions providerKey from
string to RecordProviderKey, removing the unsafe providers cast
- Add "api-key" to recorder headersToForward for Azure OpenAI auth
- Replace {} as ChatCompletionRequest with null in all error-path journal
entries across 9 handler files (server, gemini, bedrock, bedrock-converse,
cohere, embeddings, messages, ollama, responses)
- Broaden JournalEntry.body to ChatCompletionRequest | null to match
- Remove chunkSize/streamingProfile from EmbeddingFixtureOpts (unused, non-streaming)
- Add default: never exhaustive check to applyChaos() switch
…via logger
- Wrap metrics res.on("finish") callback in try-catch with logger.debug
to prevent unhandled EventEmitter errors from crashing the server
- Propagate decodeEventStreamFrames truncated flag through CollapseResult
instead of console.warn, so it respects logLevel configuration
- Log Bedrock CRC mismatch warning via defaults.logger in recorder
--strict does not prevent proxying — proxy is attempted first when --record is set; 503 only fires when proxy is absent or fails
- metrics.test.ts: add test that injects a faulty registry via spy to
verify the try-catch in res.on("finish") prevents process crashes;
rename existing test for accuracy
- stream-collapse.test.ts: update CRC mismatch tests to assert
result.truncated === true (replaced console.warn spy pattern)
PR (53) Resolution ReportWhat Was FixedCritical Issues (all 4 fixed): (1) Chart.yaml (2) SKILL.md (3) Azure (4) Metrics Important Issues (all fixed): (5) (6) Recorder write failure behavior — Pre-existing design (error logged + header set + response still relayed). No change needed; behavior is intentional. (7) Missing version/CHANGELOG — Bumped (8) (9) Suggestions (all implemented): (10) (11) (12) (13) (14) Fixed recorder.ts comment ("strip x-llmock-* headers" → "Forward only safe headers"). (15) Tests Added
Commits MadeCI StatusAll 10 checks pass: 1281 tests passing (up from 1264 at the start). |
Critical IssuesNone. Important Issues(1) Metrics instrumentation errors are effectively silent (2) Chaos header values silently ignored when out of range (3) Unknown provider fallback in stream-collapse is a silent TODO Suggestions(4) Make (5) Accept (6) Centralize (7) Correct misleading "mid-flight" wording in chaos.ts docstring (8) Validate (9) Minor test coverage gaps
Strengths
Recommended ActionFIX EVERYTHING. |
jpr5
left a comment
There was a problem hiding this comment.
See most recent comment for the full review.
…ion test Clamp x-llmock-chaos-* header values to [0,1] and warn on NaN or out-of-range input. Restore universal clamping in resolveChaosConfig to cover fixture-level and server-default rates (regression from prior change). Fix file-level docstring to accurately describe the three chaos actions. Add tests for header clamping/NaN behavior and disconnect chaos action end-to-end.
…nish callback
Wrap the res.on('finish') metrics block in try/catch to prevent instrumentation
errors (wrong label cardinality, registry misconfiguration) from propagating
silently or crashing the request handler. Log failures at warn level so operators
see them without enabling debug logging.
Change providerKey parameter type from string to RecordProviderKey in collapseStreamingResponse, proxyAndRecord, handleGemini, and handleCompletions. Catches provider key typos at compile time. Add console.warn for unknown SSE provider fallback and document the OpenAI fallback behavior in the docstring. Add TODO comments for CollapseResult discriminated union and chunkSize helper centralization. Fix test comment and cast for unknown-provider fallback path.
…d time Add error-severity validation checks in validateFixtures for streamingProfile (ttft >= 0, tps > 0, jitter in [0,1]) and chaos (all rates in [0,1]). Catches nonsensical streaming physics and out-of-range chaos rates early with clear error messages rather than silently producing broken behavior at request time.
jpr5
left a comment
There was a problem hiding this comment.
PR (53) Review — CopilotKit/llmock (2026-03-21)
PR: v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay
Branch: pr-53 → main
Review Date: 2026-03-21
Files Changed: 37+ source files, 37 test files, docs, CI/workflow
Tests: 1,284 passing across 37 test files
Critical Issues
(1) docs/docker.html contains multiple factual errors
- References a
--configflag (described as v1.7.0) that does not exist in the CLI - References
--chaos-error-rateflag that does not exist — the actual flags are--chaos-drop,--chaos-malformed,--chaos-disconnect - Calls the CLI binary
aimockinstead ofllmockin multiple places - States health/readiness probes use "TCP socket checks" but the Helm template (
templates/deployment.yaml) actually useshttpGeton/health - The
values.yamlsnippet showsmountPath: /app/fixturesbut the DockerfileWORKDIRis/appand the CMD references./fixtures
These errors will actively mislead users trying to deploy via Docker/Kubernetes.
Files: docs/docker.html
(2) stream-collapse.ts decodeEventStreamFrames lacks bounds checking on totalLength
In decodeEventStreamFrames() (line ~438-481), the function reads totalLength from the binary buffer but does not validate that the remaining buffer has enough bytes before attempting to slice. A malformed or truncated Bedrock EventStream response could cause an out-of-bounds read or return garbage data.
File: src/stream-collapse.ts:438-481
(3) chaos.ts and stream-collapse.ts use console.warn instead of Logger
The project has a well-designed Logger class, but:
src/chaos.tslines 44, 55, 70 useconsole.warnfor chaos action loggingsrc/stream-collapse.tsline 641 usesconsole.warnfor unknown provider fallback
This bypasses log-level filtering and structured logging. In production or test environments where logLevel: "silent" is set, these warnings will still appear on stderr.
Files: src/chaos.ts:44,55,70, src/stream-collapse.ts:641
Important Issues
(4) README chaos documentation link is broken
The README links to chaos.html but the actual docs file is chaos-testing.html, resulting in a 404.
File: README.md
(5) docs/docs.html endpoint table missing Cohere
The endpoint reference table in docs/docs.html does not list the Cohere POST /v2/chat endpoint, despite Cohere being a fully supported provider with its own handler module.
File: docs/docs.html
(6) stream-collapse.ts SSE content-type switch has no explicit "bedrock" case
The collapseStreamingResponse() dispatcher handles SSE content types for openai, anthropic, gemini, cohere, and ollama, but has no explicit "bedrock" case for SSE-format streams. Bedrock binary EventStream is handled separately via content-type detection, but if Bedrock ever sends text/event-stream content, it would fall through to the default with a console.warn.
File: src/stream-collapse.ts
(7) makeUpstreamRequest has no timeout on response body accumulation
The upstream request in recorder.ts has a 30-second connection timeout but no limit on response body size or accumulation time. A slow-drip upstream response could cause unbounded memory growth.
File: src/recorder.ts:209-253
(8) CHANGELOG describes --chaos as a single flag
The CHANGELOG entry says --chaos CLI flag, but the actual implementation provides three separate flags: --chaos-drop, --chaos-malformed, --chaos-disconnect.
File: CHANGELOG.md
Suggestions
(9) CollapseResult should use a discriminated union
There is an existing TODO in the code for this. The current CollapseResult type uses optional fields (content?, toolCalls?, truncated?, droppedChunks?) which makes it possible to construct invalid states (e.g., both content and toolCalls set). A discriminated union with type: "text" | "toolCalls" would make this impossible.
File: src/stream-collapse.ts
(10) Duplicated Math.max(1, chunkSize) pattern across handlers
There is an existing TODO in src/types.ts:253-254 noting this. The pattern Math.max(1, fixture.chunkSize ?? defaults.chunkSize) is repeated in every handler. A resolveChunkSize(fixture, defaults) helper would centralize this.
File: src/types.ts:253-254
(11) Missing startup-time validation for programmatic API inputs
validateFixtures() in fixture-loader.ts validates StreamingProfile and ChaosConfig ranges for JSON-loaded fixtures, but the same validation is not applied when fixtures are created programmatically via the API (e.g., server.fixture(...)). Invalid ranges (negative rates, jitter > 1) could cause silent misbehavior.
File: src/fixture-loader.ts
Strengths
- Comprehensive provider coverage: Eight LLM providers with format-correct request/response handling, including the complex AWS Bedrock binary EventStream protocol with CRC32 checksums
- Excellent test coverage: 1,284 tests across 37 files covering streaming, chaos injection, metrics, fixture matching, record-and-replay, and all provider formats
- Well-designed chaos testing: Three-level precedence (header > fixture > server defaults) with proper probability-based evaluation and exhaustive switch coverage
- Clean metrics implementation: Zero-dependency Prometheus-compatible registry with proper histogram bucket handling and parametric path normalization
- Record-and-replay architecture: VCR-style proxy with intelligent stream collapsing across 6 different streaming formats (SSE, NDJSON, binary EventStream)
- Type safety: Strong TypeScript types throughout with discriminated unions, structural discrimination, and minimal use of
any - Consistent handler pattern: All provider handlers follow the same readBody → parse → handle → catch structure with shared
HandlerDefaultsdependency injection
Recommended Action
FIX EVERYTHING.
…G chaos flags - docker.html: fix health probes (TCP socket → httpGet on /health and /ready) - docker.html: remove "CLI Configuration (v1.7.0)" section (references non-existent --config flag and aimock binary name) - docker.html: fix --chaos-error-rate → --chaos-drop/--chaos-malformed/--chaos-disconnect - docker.html: fix mountPath /fixtures → /app/fixtures (matches actual values.yaml) - docs.html: add POST /v2/chat (Cohere) and POST /api/generate (Ollama) to endpoint table - CHANGELOG.md: fix "via --chaos CLI flag" → list all three chaos flags - README.md: fix chaos-testing link (chaos.html → chaos-testing.html)
… bedrock SSE; body timeout
- chaos.ts: add optional logger param to resolveChaosConfig/evaluateChaos/applyChaos;
replace all console.warn calls with logger?.warn
- stream-collapse.ts: logger param on collapseStreamingResponse; replace console.warn;
add explicit case "bedrock" routing to collapseAnthropicSSE; add bounds check in
decodeEventStreamFrames — return {frames, truncated:true} when totalLength extends
past buffer, preventing out-of-bounds reads on malformed/truncated EventStream frames
- recorder.ts: pass defaults.logger to collapseStreamingResponse; add res.setTimeout
body accumulation timeout (30s) to prevent unbounded memory growth on slow responses
- bedrock.ts: update module docstring to describe all four endpoint families
- all handlers: pass defaults.logger as final arg to all applyChaos call sites
…edrock SSE, and body timeout
- chaos.test.ts: verify evaluateChaos without logger does not call console.warn;
verify invalid chaos header with logLevel:silent is silently ignored end-to-end
- stream-collapse.test.ts: verify bounds check returns {truncated:true} for
oversized totalLength; verify provider="bedrock" routes to collapseAnthropicSSE
- recorder.test.ts: verify proxyAndRecord calls res.setTimeout(30_000) on
upstream IncomingMessage
22a8cbd to
cb09880
Compare
jpr5
left a comment
There was a problem hiding this comment.
PR (53) Resolution Report — 2026-03-21
PR: v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay Branch: feat/v1.6.0-subspec1 → main
What Was Fixed
Critical Issues
(1) docs/docker.html factual errors — Fixed all five:
- Health probes: "TCP socket checks" → httpGet on
/health(liveness) and/ready(readiness), matching actual Helm deployment template - Removed the entire "CLI Configuration (v1.7.0)" section (referenced non-existent
--configflag and wrongaimockbinary name) --chaos-error-rate→--chaos-drop,--chaos-malformed,--chaos-disconnectmountPath: /fixtures→/app/fixtures(matches actualvalues.yaml)
(2) stream-collapse.ts decodeEventStreamFrames bounds check — Added validation if (totalLength < 12 || offset + totalLength > buf.length) after reading totalLength, returning {frames, truncated: true} for malformed/truncated EventStream buffers instead of reading out-of-bounds.
(3) chaos.ts and stream-collapse.ts console.warn → Logger — Added optional logger?: Logger parameter to resolveChaosConfig, evaluateChaos, applyChaos, and collapseStreamingResponse. Replaced all console.warn calls with logger?.warn. Updated all 12 applyChaos call sites across handler files to pass defaults.logger. Warnings now respect log-level filtering and go through the structured logger.
Important Issues
(4) README chaos link broken — Fixed chaos.html → chaos-testing.html.
(5) docs/docs.html missing Cohere endpoint — Added POST /v2/chat (Cohere, HTTP SSE / JSON) and POST /api/generate (Ollama, NDJSON / JSON) to the endpoint reference table.
(6) stream-collapse.ts no explicit "bedrock" SSE case — Added case "bedrock": return collapseAnthropicSSE(str) to the SSE provider switch. Bedrock SSE uses the Anthropic wire format (same event types), so this is the correct routing.
(7) makeUpstreamRequest no body accumulation timeout — Added res.setTimeout(30_000, ...) on the upstream IncomingMessage. When fired, it calls req.destroy(err) which triggers the existing req.on("error", reject) handler, properly rejecting the promise with a timeout error.
(8) CHANGELOG --chaos as single flag — Fixed to reference all three flags: --chaos-drop, --chaos-malformed, --chaos-disconnect.
Suggestions (from all three review rounds)
bedrock.tsmodule docstring updated to describe all four endpoint families and which file handles which.stream-collapse.tsunknown provider fallback now useslogger?.warninstead ofconsole.warn.CollapseResultdiscriminated union andresolveChunkSizehelper left as existing TODO comments — would require multi-file architectural changes, skipped.
Tests Added
(chaos.test.ts)
- Test:
evaluateChaoswithout logger + invalid"notanumber"header does NOT callconsole.warn(regression guard for the console.warn→logger?.warn migration) - Test: invalid chaos header with
logLevel: "silent"is silently ignored end-to-end; request returns 200
(stream-collapse.test.ts)
- Test:
collapseBedrockEventStreamwithtotalLength=9999in a 20-byte buffer returns{truncated: true}(bounds check regression guard) - Test:
collapseStreamingResponse("text/event-stream", "bedrock", body)correctly routes tocollapseAnthropicSSEand extracts content (bedrock SSE case regression guard)
(recorder.test.ts)
- Test:
proxyAndRecordcallsres.setTimeout(30_000, handler)on the upstreamIncomingMessage(body timeout regression guard)
1305 tests passing (up from 1300 before this session).
Commits Made
8014b70 docs: correct docker.html errors, add missing endpoints, fix CHANGELOG chaos flags
72eda7c fix: structured logger for chaos/stream warnings; EventStream bounds; bedrock SSE; body timeout
cb09880 test: regression coverage for logger migration, EventStream bounds, bedrock SSE, and body timeout
CI Status
All 10 checks pass: commitlint, eslint, prettier, exports, test (20/22/24), preview, build-and-push, Continuous Releases.
PR (53) Review — CopilotKit/llmock (2026-03-22)PR: v1.6.0: Provider Endpoints, Chaos, Metrics, Record-and-Replay Previously Raised & ResolvedThe following issues were identified in prior review rounds and have already been fixed in the current code:
Critical IssuesNone. Important Issues(1) docs/docs.html — endpoint table mislabels Groq/OpenAI-compat alias as "Azure OpenAI"
(2) docs/metrics.html — wrong default port
(3) recorder.ts — misleading log message for non-JSON upstream responses
(4) stream-collapse — empty content produces valid-looking but vacuous fixtures When all stream chunks are dropped or a stream yields no content, (5) fixture-loader.ts — fail-open on read/parse errors
(6) Server-level ChaosConfig and StreamingProfile are never validated
Suggestions(7) stream-collapse — CRC failure produces partial-content fixtures with no marker When (8) FixtureResponse union is not a true discriminated union
(9) Missing test coverage for When content-type doesn't match any known pattern and provider is unknown, (10) Missing test coverage for Tests verify chaos actions are applied but don't assert that the Prometheus counter (11) BedrockContentBlock.type is The (12) writeEventStream docstring — misleading return semantics The docstring implies a boolean return indicating success, but the function returns (13) bedrock.ts module docstring omits exported stream event builders The module docstring describes only the invoke/stream handlers but Strengths
Recommended ActionApprove with requested changes. The codebase is well-structured with strong test coverage. Issues (1) and (2) are documentation bugs that will actively mislead users and should be fixed before merge. Issues (3)-(6) are important correctness and observability concerns worth addressing in this PR or a fast follow-up. The remaining suggestions are improvements that can land incrementally. Now let me attempt to post this to Notion. Since there's no Notion MCP tool, I'll try the Notion API directly.No Notion API credentials are available in the environment, so I cannot post to Notion programmatically.Notion posting failed — no To enable Notion posting, either:
The full review document is above and ready to copy into Notion manually under "PR (53) Review — CopilotKit/llmock (2026-03-22)". Summary: 0 critical issues, 6 important issues, 7 suggestions. Recommended action is approve with requested changes — primarily the two documentation bugs (wrong endpoint label in docs.html, wrong port in metrics.html) that will actively mislead users. |
jpr5
left a comment
There was a problem hiding this comment.
See most recent comment for the full review.
…ation, type unions - recorder.ts: fix misleading 'saving raw response' log → 'saving as error fixture' - recorder.ts: warn when stream collapse produces empty content - recorder.ts: preserve both empty-match and truncation warnings in fixture JSON - cli.ts: exit(1) on zero fixtures in strict/validate mode - server.ts: warn on out-of-range chaos config values at startup - bedrock.ts/messages.ts: narrow content block type from string to union - aws-event-stream.ts: fix writeEventStream docstring return semantics
Summary
Major feature release adding 8 capabilities to llmock, plus 29 bugs found and fixed in code review.
Provider Endpoints
Infrastructure
Record-and-Replay
Quality
Review Fixes (29 total across 7 rounds)
Round 1: Original review (20 findings)
Round 2 (2 findings)
Round 5 — fresh context (2 findings)
Round 6 — fresh context (2 findings)
Round 7 — fresh context (3 findings)
All fixes have corresponding regression tests.