Live API Drift Detection

aimock produces responses shaped like real LLM APIs. Providers change their APIs over time. Drift means the mock no longer matches reality — your tests pass against aimock but break against the real API.

Three-Layer Approach

Drift detection compares three independent sources to triangulate the cause of any mismatch:

SDK types = Real API?	Real API = aimock?	Diagnosis
Yes	No	aimock drift — response builders need updating
No	No	Provider changed before SDK update — flag, wait for SDK catch-up
Yes	Yes	No drift — all clear
No	Yes	SDK drift — provider deprecated something SDK still references

Two-way comparison (mock vs real) can't distinguish between "we need to fix aimock" and "the SDK hasn't caught up yet." Three-way comparison can.
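The decision table above can be read as a small pure function. The sketch below is illustrative only: the `Diagnosis` labels and the `diagnose` name are hypothetical and not taken from aimock's source.

```typescript
// Hypothetical labels for the four rows of the table above.
type Diagnosis =
  | "no-drift"
  | "aimock-drift"
  | "provider-ahead-of-sdk"
  | "sdk-drift";

// sdkMatchesReal:  do the SDK types agree with the real API response?
// realMatchesMock: does the real API response agree with aimock's output?
function diagnose(sdkMatchesReal: boolean, realMatchesMock: boolean): Diagnosis {
  if (sdkMatchesReal && realMatchesMock) return "no-drift";
  if (sdkMatchesReal && !realMatchesMock) return "aimock-drift";
  if (!sdkMatchesReal && !realMatchesMock) return "provider-ahead-of-sdk";
  // Remaining case: SDK disagrees with the real API, but aimock matches it.
  return "sdk-drift";
}
```

With only the second input (mock vs real), the first two non-matching rows would collapse into one answer, which is why the SDK comparison is needed.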

Running Drift Tests

# All providers (requires all three API keys)
OPENAI_API_KEY=sk-... ANTHROPIC_API_KEY=sk-... GOOGLE_API_KEY=... pnpm test:drift

# Single provider (others skip automatically)
OPENAI_API_KEY=sk-... pnpm test:drift

# Strict mode — warnings also fail
STRICT_DRIFT=1 OPENAI_API_KEY=sk-... pnpm test:drift

Required environment variables:

OPENAI_API_KEY — OpenAI API key
ANTHROPIC_API_KEY — Anthropic API key
GOOGLE_API_KEY — Google AI API key

Each provider's tests skip independently if its key is not set. You can run drift tests for just one provider.
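The per-provider skip behaviour described above might be modelled roughly as follows. This is a sketch under assumptions: the `Provider` type and `enabledProviders` helper are hypothetical, and only the three environment variable names come from this document.

```typescript
type Provider = "openai" | "anthropic" | "google";

// Environment variable names as listed above.
const KEY_FOR_PROVIDER: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  google: "GOOGLE_API_KEY",
};

// Returns the providers whose key is present and non-empty.
// The environment is passed in explicitly so the function stays pure.
function enabledProviders(env: Record<string, string | undefined>): Provider[] {
  const all = Object.keys(KEY_FOR_PROVIDER) as Provider[];
  return all.filter((provider) => {
    const value = env[KEY_FOR_PROVIDER[provider]];
    return value !== undefined && value.trim() !== "";
  });
}
```

In a Node test setup this would typically be called with `process.env`, and each provider's suite would be skipped when it is absent from the result.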

Reading Results

Severity levels

critical — Test fails. aimock produces a different shape than the real API for a field that both the SDK and real API agree on. This means aimock needs an update.
warning — Test passes (unless STRICT_DRIFT=1). The real API has a field that neither the SDK nor aimock knows about, or the SDK and real API disagree. Usually means a provider added something new.
info — Always passes. Known intentional differences (usage fields are always zero, optional fields aimock omits, etc.).
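The pass/fail rule implied by these three levels is small enough to state as code. The function name below is hypothetical; the rule itself follows the descriptions above, with `strict` standing in for `STRICT_DRIFT=1`.

```typescript
type Severity = "critical" | "warning" | "info";

// A drift test fails on any critical finding, and additionally on
// warnings when strict mode is enabled. Info findings never fail.
function driftTestFails(severities: Severity[], strict: boolean): boolean {
  return severities.some(
    (severity) => severity === "critical" || (strict && severity === "warning"),
  );
}
```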

Example report output

API DRIFT DETECTED: OpenAI Chat Completions (non-streaming text)

  1. [critical] LLMOCK DRIFT — field in SDK + real API but missing from mock
     Path:    usage.completion_tokens_details
     SDK:     object { reasoning_tokens: number }
     Real:    object { reasoning_tokens: number, accepted_prediction_tokens: number }
     Mock:    <absent>

  2. [warning] PROVIDER ADDED FIELD — in real API but not in SDK or mock
     Path:    system_fingerprint
     SDK:     <absent>
     Real:    string
     Mock:    <absent>

  3. [info] MOCK EXTRA FIELD — in mock but not in real API
     Path:    choices[0].logprobs
     SDK:     null | object
     Real:    <absent>
     Mock:    null
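The SDK, Real, and Mock lines in the report compare shapes rather than values. As a rough illustration of what reducing a JSON value to a shape can look like, here is a minimal sketch. `describeShape` is a hypothetical helper, and the array notation is an assumption, since the example report above shows only objects and primitives.

```typescript
// Reduces a runtime JSON value to a shape string such as
// "object { reasoning_tokens: number }".
function describeShape(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    // Assumption: an array is summarised by the shape of its first element.
    return value.length === 0
      ? "array<unknown>"
      : `array<${describeShape(value[0])}>`;
  }
  if (typeof value === "object") {
    const fields = Object.entries(value as Record<string, unknown>).map(
      ([key, fieldValue]) => `${key}: ${describeShape(fieldValue)}`,
    );
    return fields.length === 0 ? "object {}" : `object { ${fields.join(", ")} }`;
  }
  // Primitives: "string", "number", "boolean", and so on.
  return typeof value;
}
```

Because values are discarded, two responses with different token counts still produce the same shape string, so a difference between shape strings points at a structural change.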

Fixing Detected Drift

When a critical drift is detected:

1. Identify the response builder. The report path tells you which provider and field:
   - OpenAI Chat Completions → src/helpers.ts (buildTextCompletion, buildToolCallCompletion, buildTextChunks, buildToolCallChunks)
   - OpenAI Responses API → src/responses.ts (buildTextResponse, buildToolCallResponse, buildTextStreamEvents, buildToolCallStreamEvents)
   - Anthropic Claude → src/messages.ts (buildClaudeTextResponse, buildClaudeToolCallResponse, buildClaudeTextStreamEvents, buildClaudeToolCallStreamEvents)
   - Google Gemini → src/gemini.ts (buildGeminiTextResponse, buildGeminiToolCallResponse, buildGeminiTextStreamChunks, buildGeminiToolCallStreamChunks)
2. Update the builder: add or modify the field to match the real API shape.
3. Run conformance tests: run pnpm test to verify that existing API conformance tests still pass.
4. Run drift tests: run pnpm test:drift to verify that the drift is resolved.
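To make the "update the builder" step concrete, here is what closing the first finding from the example report (`usage.completion_tokens_details` missing from the mock) could look like. Everything here is hypothetical: the `Usage` types and the `buildUsage` helper are illustrations, not the actual definitions in src/helpers.ts. The field names come from the "Real" line of the example report, and the zero values follow the note that aimock's usage fields are always zero.

```typescript
// Hypothetical types; not aimock's actual definitions.
interface CompletionTokensDetails {
  reasoning_tokens: number;
  accepted_prediction_tokens: number;
}

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details: CompletionTokensDetails;
}

// Hypothetical helper standing in for the usage portion of a response builder.
function buildUsage(): Usage {
  return {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
    // Added to close the drift: same keys and value types as the real API shape.
    completion_tokens_details: {
      reasoning_tokens: 0,
      accepted_prediction_tokens: 0,
    },
  };
}
```

Only the shape has to match the real API, so placeholder zeros are enough for the drift check to pass.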

Model Deprecation

The models.drift.ts test scrapes model names referenced in aimock's test files, README, and fixtures, then checks each provider's model listing API to verify they still exist.
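At its core, the check described above is a set difference between the model names aimock references and the names the provider currently lists. A minimal sketch, assuming both lists have already been collected and normalised to the same naming form (`findMissingModels` is a hypothetical name):

```typescript
// Returns referenced model names that the provider no longer lists,
// de-duplicated and sorted for stable reporting.
function findMissingModels(referenced: string[], listed: string[]): string[] {
  const available = new Set(listed);
  const uniqueReferenced = Array.from(new Set(referenced));
  return uniqueReferenced.filter((model) => !available.has(model)).sort();
}
```

A listing API may return names in a different form from the one used in tests (for example with a path-style prefix), in which case the names would need normalising before the comparison.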

When a model is deprecated:

Update the model name in the affected test files and fixtures
Update src/__tests__/drift/providers.ts if the cheap test model changed
Run pnpm test and pnpm test:drift

Adding a New Provider

Add the provider's SDK as a devDependency in package.json
Add shape extraction functions to src/__tests__/drift/sdk-shapes.ts
Add raw fetch client functions to src/__tests__/drift/providers.ts
Create src/__tests__/drift/<provider>.drift.ts with 4 test scenarios
Add model listing function to providers.ts and model check to models.drift.ts
If the provider uses WebSocket, add protocol functions to ws-providers.ts and create ws-<provider>.drift.ts
Update the allowlist in schema.ts if needed

WebSocket Drift Coverage

In addition to the 19 existing drift tests (16 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (4 verified + 2 canary = 6 WS tests):

Protocol	Text	Tool Call	Real Endpoint	Status
OpenAI Responses WS	✓	✓	`wss://api.openai.com/v1/responses`	Verified
OpenAI Realtime	✓	✓	`wss://api.openai.com/v1/realtime`	Verified
Gemini Live	—	—	`wss://generativelanguage.googleapis.com/ws/...BidiGenerateContent`	Unverified

Models: gpt-4o-mini for Responses WS, gpt-4o-mini-realtime-preview for Realtime.

Auth: Uses the same OPENAI_API_KEY and GOOGLE_API_KEY environment variables as HTTP tests. No new secrets needed.

How it works: A TLS WebSocket client (ws-providers.ts) connects to real provider endpoints using node:tls with RFC 6455 framing. Each protocol function handles the setup sequence (e.g., Realtime session negotiation, Gemini Live setup/setupComplete) and collects messages until a terminal event. The mock side uses the existing ws-test-client.ts plaintext client against the local aimock server.
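For readers unfamiliar with RFC 6455 framing, the sketch below encodes a single masked text frame, which is the form every client-to-server message must take. It is an independent illustration of the framing rules, not the code in ws-providers.ts, and `encodeClientTextFrame` is a hypothetical name. The mask key is passed in to keep the function deterministic.

```typescript
// Encodes one unfragmented, masked text frame as defined by RFC 6455.
// payload is the UTF-8 encoded message; maskKey must be 4 bytes.
function encodeClientTextFrame(payload: Uint8Array, maskKey: Uint8Array): Uint8Array {
  if (maskKey.length !== 4) throw new Error("mask key must be exactly 4 bytes");

  const len = payload.length;
  // Extended length field: none, 16-bit, or 64-bit depending on payload size.
  const extLen = len < 126 ? 0 : len < 65536 ? 2 : 8;
  const frame = new Uint8Array(2 + extLen + 4 + len);
  const view = new DataView(frame.buffer);

  frame[0] = 0x81; // FIN = 1, opcode = 0x1 (text)
  if (extLen === 0) {
    frame[1] = 0x80 | len; // MASK bit set, 7-bit length
  } else if (extLen === 2) {
    frame[1] = 0x80 | 126;
    view.setUint16(2, len); // DataView writes big-endian by default
  } else {
    frame[1] = 0x80 | 127;
    view.setUint32(2, Math.floor(len / 0x100000000)); // high 32 bits
    view.setUint32(6, len >>> 0); // low 32 bits
  }

  const maskOffset = 2 + extLen;
  frame.set(maskKey, maskOffset);
  // Payload bytes are XORed with the mask key, cycling every 4 bytes.
  for (let i = 0; i < len; i++) {
    frame[maskOffset + 4 + i] = payload[i] ^ maskKey[i % 4];
  }
  return frame;
}
```

In real use the mask key should be freshly random for every frame, as the RFC requires. Frames sent by the server are unmasked.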

Gemini Live: unverified

aimock's Gemini Live handler implements the text-based BidiGenerateContent protocol as documented in Google's Live API reference — setup/setupComplete handshake, clientContent with turns, serverContent with modelTurn.parts[].text, and toolCall responses. The protocol format is correct per the docs.

However, as of March 2026, the only models that support bidiGenerateContent are native-audio models (gemini-2.5-flash-native-audio-*), which reject text-only requests. No text-capable model exists for this endpoint yet, so we cannot triangulate aimock's output against a real API response.

A canary test (ws-gemini-live.drift.ts) queries the Gemini model listing API on each drift run and checks for a non-audio model that supports bidiGenerateContent. When Google ships one, the canary will flag it and the full drift tests can be enabled.
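The canary's filtering step might look something like the sketch below. It rests on two assumptions: that each entry in the model listing carries a `supportedGenerationMethods` array, and that audio-only models can be recognised by the `native-audio` substring in their name, as in the model family mentioned above. The `ListedModel` type and function name are hypothetical.

```typescript
// Assumed subset of a model listing entry.
interface ListedModel {
  name: string;
  supportedGenerationMethods?: string[];
}

// Returns names of models that support bidiGenerateContent and do not
// look like native-audio models (name-based heuristic, an assumption).
function findTextCapableLiveModels(models: ListedModel[]): string[] {
  return models
    .filter((model) =>
      (model.supportedGenerationMethods ?? []).includes("bidiGenerateContent"),
    )
    .filter((model) => !model.name.includes("native-audio"))
    .map((model) => model.name);
}
```

An empty result matches the situation described above. A non-empty result would be the signal to enable the full Gemini Live drift tests.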

CI Schedule

Drift tests run on a schedule:

Daily: 6:00 AM UTC
Manual: Trigger via GitHub Actions UI (workflow_dispatch)
NOT on PR or push — these tests hit real APIs and cost money

See .github/workflows/test-drift.yml.

Automated Drift Remediation

When the daily drift test detects critical diffs on the main branch, the fix-drift.yml workflow runs automatically:

1. Collect: scripts/drift-report-collector.ts runs drift tests and produces a structured drift-report.json
2. Fix: scripts/fix-drift.ts (default mode) constructs a prompt from the report and invokes Claude Code to fix the builders
3. Verify: independent pnpm test and pnpm test:drift steps confirm the fix works
4. PR: scripts/fix-drift.ts --create-pr stages and commits the changes, bumps the version, and opens a pull request
5. Issue (on failure): scripts/fix-drift.ts --create-issue opens a GitHub issue with the drift report and Claude Code output

Steps 2 and 4/5 are separate invocations of fix-drift.ts with different modes.
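Selecting one of three modes from the command line could be handled as sketched below. The flag names come from this document. The `FixDriftMode` type, the `parseMode` function, and the choice to reject both flags together are assumptions rather than the script's actual behaviour.

```typescript
type FixDriftMode = "fix" | "create-pr" | "create-issue";

// Maps command-line arguments to a mode; no flag means the default fix mode.
function parseMode(args: string[]): FixDriftMode {
  const wantsPr = args.includes("--create-pr");
  const wantsIssue = args.includes("--create-issue");
  // Assumption: the two flags are treated as mutually exclusive.
  if (wantsPr && wantsIssue) {
    throw new Error("--create-pr and --create-issue cannot be combined");
  }
  if (wantsPr) return "create-pr";
  if (wantsIssue) return "create-issue";
  return "fix";
}
```

Keeping the modes as separate invocations lets the workflow run the verification step between fixing and opening a pull request.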

Artifacts

Both workflows (test-drift.yml and fix-drift.yml) upload artifacts:

drift-report.json — structured drift data (retained 30 days)
claude-code-output.log — Claude Code's reasoning and tool calls (fix workflow only)

Manual trigger

The fix workflow also supports workflow_dispatch for manual runs.

Cost

~25 API calls per run (16 HTTP response-shape + 3 model listing + 6 WS including canaries) using the cheapest available models (gpt-4o-mini, gpt-4o-mini-realtime-preview, claude-haiku-4-5-20251001, gemini-2.5-flash) with 10-100 max tokens each. Under $0.15/week at daily cadence. When Gemini Live text-capable models become available, the 2 canary tests will become full drift tests, increasing real WS connections from 4 to 6.
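The figures above imply a per-call budget that is easy to check. The arithmetic below uses only the numbers stated in this section and assumes exactly one scheduled run per day, with no manual runs.

```typescript
// Calls per run: HTTP response-shape + model listing + WS (including canaries).
const callsPerRun = 16 + 3 + 6; // 25
const runsPerWeek = 7; // daily cadence
const callsPerWeek = callsPerRun * runsPerWeek; // 175

// Stated upper bound on weekly spend, in USD.
const weeklyBudgetUsd = 0.15;

// Average cost per call implied by the upper bound (under a tenth of a cent).
const maxAverageCostPerCallUsd = weeklyBudgetUsd / callsPerWeek;
```

Manual workflow_dispatch runs add 25 calls each on top of this.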
