
Commit 4fd065e

Caching AI SDK Models (#317)

* Added the server implementation and added the types to storage
* Server will now attempt to find another port if 3006 is unavailable.
* Added AI SDK cache functions
* Added caching to all built-in scorers
* Updates
* Showed cache hits in the UI
* Phase 1 of integration
* Got it half working
* Updates
* Fixed bugs and added tests
* Updates to fix CI
* Updates
* Tweak to changeset
* Bugfix for table header
* Updated docs
* Updates
* Updates to test reliability

1 parent 7097ffb commit 4fd065e


57 files changed: +2097 −666 lines

.changeset/0000-cache-config.md

Lines changed: 5 additions & 0 deletions

---
"evalite": minor
---

Added cache config & `--no-cache` CLI flag. Configure the cache via `evalite.config.ts` or disable it with the `--no-cache` flag.
Lines changed: 5 additions & 0 deletions

---
"evalite": patch
---

Added a cache debug mode via `debugCache` in `runEvalite` for inspecting cache hits and misses.
Lines changed: 5 additions & 0 deletions

---
"evalite": patch
---

Server will now attempt to find another port if 3006 is unavailable.
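The port-fallback behavior this changeset describes can be sketched as follows, assuming the standard Node `net` module; this is an illustration of the idea, not Evalite's actual implementation:

```typescript
import net from "node:net";

// Try the preferred port; if it is taken, probe the next ports in sequence.
function findAvailablePort(preferred: number, maxTries = 10): Promise<number> {
  return new Promise((resolve, reject) => {
    const tryPort = (port: number, remaining: number): void => {
      const server = net.createServer();
      server.once("error", () => {
        // Port is busy (or otherwise unusable): move on to the next one.
        if (remaining <= 0) return reject(new Error("No available port found"));
        tryPort(port + 1, remaining - 1);
      });
      server.once("listening", () => {
        // Port is free; release it and report it back.
        server.close(() => resolve(port));
      });
      server.listen(port, "127.0.0.1");
    };
    tryPort(preferred, maxTries);
  });
}
```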

.changeset/better-tires-battle.md

Lines changed: 5 additions & 0 deletions

---
"evalite": major
---

Removed `traceAISDKModel` in favor of `wrapAISDKModel` which includes both caching and tracing.

apps/evalite-docs/astro.config.mts

Lines changed: 11 additions & 7 deletions

```diff
@@ -9,7 +9,7 @@ export default defineConfig({
       "/quickstart": "/guides/quickstart",

       // Guides reorganization
-      "/guides/traces": "/api/traces",
+      "/guides/traces": "/api/report-trace",
       "/guides/variant-comparison": "/tips/comparing-different-approaches",
       "/guides/multi-modal": "/tips/images-and-media",
       "/guides/cli": "/api/cli",
@@ -22,6 +22,10 @@ export default defineConfig({

       // Examples moved to tips
       "/examples/ai-sdk": "/tips/vercel-ai-sdk",
+
+      // Documentation reorganization
+      "/tips/adding-traces": "/tips/vercel-ai-sdk",
+      "/api/traces": "/api/report-trace",
     },
     integrations: [
       starlight({
@@ -150,10 +154,6 @@ export default defineConfig({
           label: "A/B Testing",
           slug: "tips/comparing-different-approaches",
         },
-        {
-          label: "Adding Traces",
-          slug: "tips/adding-traces",
-        },
         {
           label: "Vercel AI SDK",
           slug: "tips/vercel-ai-sdk",
@@ -241,8 +241,12 @@ export default defineConfig({
           slug: "api/evalite-file",
         },
         {
-          label: "Traces",
-          slug: "api/traces",
+          label: "wrapAISDKModel()",
+          slug: "api/ai-sdk",
+        },
+        {
+          label: "reportTrace()",
+          slug: "api/report-trace",
         },
         {
           label: "runEvalite()",
```
Lines changed: 182 additions & 0 deletions

---
title: AI SDK
---

Evalite integrates deeply with the Vercel AI SDK to provide automatic tracing and caching of all LLM calls.

## `wrapAISDKModel()`

Wraps a Vercel AI SDK model to enable automatic tracing and caching of all LLM calls.

```typescript
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { evalite } from "evalite";
import { wrapAISDKModel } from "evalite/ai-sdk";

// Wrap the model
const model = wrapAISDKModel(openai("gpt-4o-mini"));

evalite("My Eval", {
  data: [{ input: "Hello", expected: "Hi" }],
  task: async (input) => {
    // All calls are automatically traced and cached
    const result = await generateText({
      model,
      prompt: input,
    });

    return result.text;
  },
});
```

### Signature

```typescript
wrapAISDKModel(
  model: LanguageModelV2,
  options?: {
    tracing?: boolean;
    caching?: boolean;
  }
): LanguageModelV2
```

**Parameters:**

- `model` - A Vercel AI SDK language model (from `@ai-sdk/openai`, `@ai-sdk/anthropic`, etc.)
- `options` (optional) - Configuration options:
  - `tracing` - Enable automatic trace capture (default: `true`)
  - `caching` - Enable response caching (default: `true`)

**Returns:** A wrapped model with the same interface as the original.

### Disabling Tracing

```typescript
const model = wrapAISDKModel(openai("gpt-4o-mini"), {
  tracing: false, // Disable automatic traces for this model
});
```

### Disabling Caching

```typescript
const model = wrapAISDKModel(openai("gpt-4o-mini"), {
  caching: false, // Disable caching for this model
});
```

## What Gets Captured

### Tracing

When tracing is enabled, `wrapAISDKModel` automatically captures:

- Full prompt/messages sent to the model
- Model responses (text and tool calls)
- Token usage (input, output, total)
- Timing information (start/end timestamps)

Traces appear in the Evalite UI under each test case.
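As an illustration, the captured fields above map onto a trace record shaped roughly like the following; the field names here are assumptions for illustration, not Evalite's internal schema:

```typescript
// Illustrative shape only; Evalite's stored trace format is internal.
type TraceRecord = {
  input: unknown; // full prompt/messages sent to the model
  output: unknown; // model response (text and/or tool calls)
  usage: { inputTokens: number; outputTokens: number; totalTokens: number };
  start: number; // start timestamp (ms since epoch)
  end: number; // end timestamp (ms since epoch)
};

const exampleTrace: TraceRecord = {
  input: [{ role: "user", content: "Hello" }],
  output: { text: "Hi there!" },
  usage: { inputTokens: 8, outputTokens: 4, totalTokens: 12 },
  start: 1700000000000,
  end: 1700000000450,
};
```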
### Caching

When caching is enabled, `wrapAISDKModel` automatically:

- Generates cache keys from model + parameters + prompt
- Checks cache before making LLM calls
- Returns cached responses (0 tokens used) on cache hits
- Stores new responses in cache for future runs
- Reports cache hits to the UI

Cache hits are then tracked and displayed in the UI.
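To make the key-generation step concrete, here is one way such a cache key could be derived: a hash over the model id, call parameters, and prompt. This is a sketch of the idea, not Evalite's actual key scheme:

```typescript
import { createHash } from "node:crypto";

// Derive a deterministic cache key from everything that affects the response.
function cacheKey(
  modelId: string,
  params: Record<string, unknown>,
  prompt: string
): string {
  const payload = JSON.stringify({ modelId, params, prompt });
  return createHash("sha256").update(payload).digest("hex");
}
```

Identical calls hash to the same key and hit the cache; changing the model, a parameter, or the prompt produces a new key and forces a fresh LLM call.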
### Persistent Caching

By default, Evalite uses in-memory [storage](/guides/storage), both for caching and for storing results.

If you want to persist the cache across runs, you can use the SQLite storage adapter.

```ts
// evalite.config.ts
import { defineConfig } from "evalite/config";
import { createSqliteStorage } from "evalite/sqlite-storage";

export default defineConfig({
  storage: () => createSqliteStorage("./evalite.db"),
});
```

## Works With All AI SDK Methods

`wrapAISDKModel` works with all Vercel AI SDK methods:

**Generate:**

```typescript
import { generateText } from "ai";

const result = await generateText({
  model: wrapAISDKModel(openai("gpt-4")),
  prompt: "Hello",
});
```

**Stream:**

```typescript
import { streamText } from "ai";

const result = await streamText({
  model: wrapAISDKModel(openai("gpt-4")),
  prompt: "Hello",
});

const text = await result.text;
```

**Generate Object:**

```typescript
import { generateObject } from "ai";
import { z } from "zod";

const result = await generateObject({
  model: wrapAISDKModel(openai("gpt-4")),
  schema: z.object({ name: z.string() }),
  prompt: "Generate a person",
});
```

**Stream Object:**

```typescript
import { streamObject } from "ai";
import { z } from "zod";

const result = await streamObject({
  model: wrapAISDKModel(openai("gpt-4")),
  schema: z.object({ name: z.string() }),
  prompt: "Generate a person",
});

const object = await result.object;
```

## Behavior in Production

`wrapAISDKModel` is a no-op when called outside an Evalite context:

- Tracing: No traces are captured (no performance overhead)
- Caching: No cache reads or writes occur (normal LLM behavior)

This means you can safely use `wrapAISDKModel` in production code without any performance impact.
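The no-op behavior can be pictured with a simplified guard like the one below, where a module-level flag stands in for Evalite's real (internal) context detection; this is an assumption-laden sketch, not the library's code:

```typescript
// Stand-in for Evalite's internal context detection (assumption: in reality
// the eval runner, not user code, determines whether a run is active).
let insideEvaliteRun = false;

type Model = { generate: (prompt: string) => string };
const cache = new Map<string, string>();

function wrapModelSketch(model: Model): Model {
  return {
    generate(prompt: string) {
      if (!insideEvaliteRun) {
        // Production path: pass straight through, no tracing or caching.
        return model.generate(prompt);
      }
      // Eval path: consult the cache before calling the model.
      const hit = cache.get(prompt);
      if (hit !== undefined) return hit;
      const out = model.generate(prompt);
      cache.set(prompt, out);
      return out;
    },
  };
}
```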
## See Also

- [Vercel AI SDK Guide](/tips/vercel-ai-sdk) - Complete integration guide with examples
- [`reportTrace()` Reference](/api/report-trace) - Manual trace reporting for non-AI SDK calls
- [Configuration Guide](/guides/configuration) - Global cache configuration options
- [CLI Reference](/api/cli) - Command-line flags for controlling cache behavior
apps/evalite-docs/src/content/docs/api/cli.mdx

Lines changed: 3 additions & 0 deletions

```diff
@@ -32,6 +32,7 @@ evalite run path/to/eval.eval.ts
 - `--threshold <number>` - Fails the process if the score is below threshold. Specified as 0-100. Default is 100.
 - `--outputPath <path>` - Path to write test results in JSON format after evaluation completes.
 - `--hideTable` - Hides the detailed table output in the CLI.
+- `--no-cache` - Disables caching of AI SDK model outputs. See [Vercel AI SDK caching](/tips/vercel-ai-sdk#caching).

 **Examples:**

@@ -69,6 +70,7 @@ evalite watch path/to/eval.eval.ts

 - `--threshold <number>` - Fails the process if the score is below threshold. Specified as 0-100. Default is 100.
 - `--hideTable` - Hides the detailed table output in the CLI.
+- `--no-cache` - Disables caching of AI SDK model outputs. See [Vercel AI SDK caching](/tips/vercel-ai-sdk#caching).

 **Note:** `--outputPath` is not supported in watch mode.

@@ -103,6 +105,7 @@ evalite serve path/to/eval.eval.ts
 - `--threshold <number>` - Fails the process if the score is below threshold. Specified as 0-100. Default is 100.
 - `--outputPath <path>` - Path to write test results in JSON format after evaluation completes.
 - `--hideTable` - Hides the detailed table output in the CLI.
+- `--no-cache` - Disables caching of AI SDK model outputs. See [Vercel AI SDK caching](/tips/vercel-ai-sdk#caching).

 **Examples:**
```
