feat: Add prompt caching to OpenAI-compatible custom models #1587


Merged (2 commits) on Mar 12, 2025
5 changes: 5 additions & 0 deletions .changeset/thin-fans-deliver.md
@@ -0,0 +1,5 @@
---
"roo-cline": patch
---

Add prompt caching to OpenAI-compatible custom model info
37 changes: 36 additions & 1 deletion src/api/providers/openai.ts
@@ -72,7 +72,7 @@ export class OpenAiHandler extends BaseProvider implements SingleCompletionHandl
}

if (this.options.openAiStreamingEnabled ?? true) {
const systemMessage: OpenAI.Chat.ChatCompletionSystemMessageParam = {
let systemMessage: OpenAI.Chat.ChatCompletionSystemMessageParam = {
role: "system",
content: systemPrompt,
}
@@ -83,7 +83,42 @@ export class OpenAiHandler extends BaseProvider implements SingleCompletionHandl
} else if (ark) {
convertedMessages = [systemMessage, ...convertToSimpleMessages(messages)]
} else {
if (modelInfo.supportsPromptCache) {
systemMessage = {
role: "system",
content: [
{
type: "text",
text: systemPrompt,
// @ts-ignore-next-line
cache_control: { type: "ephemeral" },
Review comment: Avoid using // @ts-ignore-next-line to bypass type errors for the cache_control property. Consider extending the type definitions instead, and also evaluate extracting this caching logic into a shared helper to reduce duplication with the OpenRouter implementation.
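One way to act on this suggestion, sketched with a self-contained stand-in for OpenAI's text content-part type (the local types and the helper name are hypothetical, not the PR's code):

```typescript
// Stand-in for OpenAI.Chat.ChatCompletionContentPartText, so the sketch
// compiles on its own without the SDK.
type ContentPartText = { type: "text"; text: string };

// Extend the part type with the provider-specific cache_control field
// instead of suppressing the type error with @ts-ignore.
type CacheableContentPartText = ContentPartText & {
	cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: returns a copy of the part marked as cacheable.
function withEphemeralCache(part: ContentPartText): CacheableContentPartText {
	return { ...part, cache_control: { type: "ephemeral" } };
}
```

The system prompt part could then be built as withEphemeralCache({ type: "text", text: systemPrompt }) with no suppression comment.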

},
],
}
}
convertedMessages = [systemMessage, ...convertToOpenAiMessages(messages)]
if (modelInfo.supportsPromptCache) {
// Note: the following logic is copied from openrouter:
// Add cache_control to the last two user messages
// (note: this works because we only ever add one user message at a time, but if we added multiple we'd need to mark the user message before the last assistant message)
const lastTwoUserMessages = convertedMessages.filter((msg) => msg.role === "user").slice(-2)
lastTwoUserMessages.forEach((msg) => {
if (typeof msg.content === "string") {
msg.content = [{ type: "text", text: msg.content }]
}
if (Array.isArray(msg.content)) {
// NOTE: this is fine since env details will always be added at the end. but if it weren't there, and the user added a image_url type message, it would pop a text part before it and then move it after to the end.
let lastTextPart = msg.content.filter((part) => part.type === "text").pop()

if (!lastTextPart) {
lastTextPart = { type: "text", text: "..." }
msg.content.push(lastTextPart)
}
// @ts-ignore-next-line
lastTextPart["cache_control"] = { type: "ephemeral" }
}
})
}
}

const requestOptions: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming = {
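The cache-marking pass added in this hunk can be sketched standalone (simplified message and part types stand in for the OpenAI SDK types; markCacheBreakpoints is a hypothetical name):

```typescript
type TextPart = {
	type: "text";
	text: string;
	cache_control?: { type: "ephemeral" };
};
type Message = {
	role: "system" | "user" | "assistant";
	content: string | TextPart[];
};

// Mark the last two user messages with an ephemeral cache breakpoint,
// mirroring the logic the PR copies from the OpenRouter handler.
function markCacheBreakpoints(messages: Message[]): Message[] {
	const lastTwoUserMessages = messages.filter((m) => m.role === "user").slice(-2);
	for (const msg of lastTwoUserMessages) {
		// Normalize string content into a parts array first.
		if (typeof msg.content === "string") {
			msg.content = [{ type: "text", text: msg.content }];
		}
		if (Array.isArray(msg.content)) {
			let lastTextPart = msg.content.filter((p) => p.type === "text").pop();
			if (!lastTextPart) {
				// No text part to mark: append a placeholder, as the PR does.
				lastTextPart = { type: "text", text: "..." };
				msg.content.push(lastTextPart);
			}
			lastTextPart.cache_control = { type: "ephemeral" };
		}
	}
	return messages;
}
```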
114 changes: 112 additions & 2 deletions webview-ui/src/components/settings/ApiOptions.tsx
@@ -819,7 +819,7 @@ const ApiOptions = ({
style={{ fontSize: "12px" }}
/>
</div>
<div className="text-sm text-vscode-descriptionForeground">
<div className="text-sm text-vscode-descriptionForeground pt-1">
Is this model capable of processing and understanding images?
</div>
</div>
@@ -842,11 +842,34 @@ const ApiOptions = ({
style={{ fontSize: "12px" }}
/>
</div>
<div className="text-sm text-vscode-descriptionForeground">
<div className="text-sm text-vscode-descriptionForeground pt-1">
Is this model capable of interacting with a browser? (e.g. Claude 3.7 Sonnet).
</div>
</div>

<div>
<div className="flex items-center gap-1">
<Checkbox
checked={apiConfiguration?.openAiCustomModelInfo?.supportsPromptCache ?? false}
onChange={handleInputChange("openAiCustomModelInfo", (checked) => {
return {
...(apiConfiguration?.openAiCustomModelInfo || openAiModelInfoSaneDefaults),
supportsPromptCache: checked,
}
})}>
<span className="font-medium">Prompt Caching</span>
</Checkbox>
<i
className="codicon codicon-info text-vscode-descriptionForeground"
title="Enable if the model supports prompt caching. This can improve performance and reduce costs."
style={{ fontSize: "12px" }}
/>
</div>
<div className="text-sm text-vscode-descriptionForeground pt-1">
Is this model capable of caching prompts?
</div>
</div>

<div>
<VSCodeTextField
value={
@@ -933,6 +956,93 @@ const ApiOptions = ({
</VSCodeTextField>
</div>

{apiConfiguration?.openAiCustomModelInfo?.supportsPromptCache && (
<>
<div>
<VSCodeTextField
value={
apiConfiguration?.openAiCustomModelInfo?.cacheReadsPrice?.toString() ?? "0"
}
type="text"
style={{
borderColor: (() => {
const value = apiConfiguration?.openAiCustomModelInfo?.cacheReadsPrice

if (!value && value !== 0) {
return "var(--vscode-input-border)"
}

return value >= 0
? "var(--vscode-charts-green)"
: "var(--vscode-errorForeground)"
})(),
}}
onChange={handleInputChange("openAiCustomModelInfo", (e) => {
const value = (e.target as HTMLInputElement).value
const parsed = parseFloat(value)

return {
...(apiConfiguration?.openAiCustomModelInfo ??
openAiModelInfoSaneDefaults),
cacheReadsPrice: isNaN(parsed) ? 0 : parsed,
}
})}
placeholder="e.g. 0.0001"
className="w-full">
<div className="flex items-center gap-1">
<span className="font-medium">Cache Reads Price</span>
<i
className="codicon codicon-info text-vscode-descriptionForeground"
title="Cost per million tokens for reading from the cache. This is the price charged when a cached response is retrieved."
style={{ fontSize: "12px" }}
/>
</div>
</VSCodeTextField>
</div>
<div>
<VSCodeTextField
value={
apiConfiguration?.openAiCustomModelInfo?.cacheWritesPrice?.toString() ?? "0"
}
type="text"
style={{
borderColor: (() => {
const value = apiConfiguration?.openAiCustomModelInfo?.cacheWritesPrice

if (!value && value !== 0) {
return "var(--vscode-input-border)"
}

return value >= 0
? "var(--vscode-charts-green)"
: "var(--vscode-errorForeground)"
})(),
}}
onChange={handleInputChange("openAiCustomModelInfo", (e) => {
const value = (e.target as HTMLInputElement).value
const parsed = parseFloat(value)

return {
...(apiConfiguration?.openAiCustomModelInfo ??
openAiModelInfoSaneDefaults),
cacheWritesPrice: isNaN(parsed) ? 0 : parsed,
}
})}
placeholder="e.g. 0.00005"
className="w-full">
<div className="flex items-center gap-1">
<span className="font-medium">Cache Writes Price</span>
<i
className="codicon codicon-info text-vscode-descriptionForeground"
title="Cost per million tokens for writing to the cache. This is the price charged when a prompt is cached for the first time."
style={{ fontSize: "12px" }}
/>
</div>
</VSCodeTextField>
</div>
</>
)}
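Both price fields above apply the same parse-and-validate pattern; extracted as plain functions (hypothetical names, not part of the PR):

```typescript
// Mirror of the onChange handler: invalid input falls back to 0.
function parsePrice(raw: string): number {
	const parsed = parseFloat(raw);
	return isNaN(parsed) ? 0 : parsed;
}

// Mirror of the borderColor IIFE: unset value gets the default border,
// non-negative gets green, negative gets the error color.
function priceBorderColor(value: number | undefined): string {
	if (!value && value !== 0) {
		return "var(--vscode-input-border)";
	}
	return value >= 0
		? "var(--vscode-charts-green)"
		: "var(--vscode-errorForeground)";
}
```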

<Button
variant="secondary"
onClick={() =>