
feat: Add prompt caching to OpenAI-compatible custom models #1587


Merged · 2 commits merged into RooCodeInc:main from the cache branch on Mar 12, 2025

Conversation

@dleen commented Mar 12, 2025

Context

Builds on PR #1562 to add cache control support to the OpenAI-compatible provider. That PR updated the UI with an option to specify that a model supports prompt caching.

Implementation

The OpenRouter provider already implements adding the cache control key to OpenAI messages. Acknowledging the risk of duplication, this PR copies that implementation nearly wholesale; there is probably an opportunity to unify the OpenAI-compatible and OpenRouter implementations in the future.
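
For context, a condensed sketch of the borrowed approach: the system prompt and the last couple of user messages (two in this sketch) get ephemeral cache_control markers so the stable prefix of the conversation can be cached. Names and exact placement are illustrative, assuming the official openai SDK types; see the merged openai.ts for the real code.

```typescript
import OpenAI from "openai"

type Message = OpenAI.Chat.Completions.ChatCompletionMessageParam

function withCacheBreakpoints(systemPrompt: string, messages: Message[]): Message[] {
	// Mark the system prompt as cacheable.
	const system: Message = {
		role: "system",
		content: [
			{
				type: "text",
				text: systemPrompt,
				// @ts-ignore-next-line -- cache_control is not in the OpenAI types
				cache_control: { type: "ephemeral" },
			},
		],
	}

	// Add a breakpoint to the last text part of the two most recent user
	// messages so the growing conversation prefix stays cacheable.
	const userIndices = messages.flatMap((m, i) => (m.role === "user" ? [i] : []))
	const targets = new Set(userIndices.slice(-2))

	const marked = messages.map((message, index) => {
		if (!targets.has(index) || message.role !== "user" || typeof message.content === "string") {
			return message
		}
		const parts = [...message.content]
		const lastText = parts.map((p) => p.type).lastIndexOf("text")
		if (lastText !== -1) {
			// @ts-ignore-next-line
			parts[lastText] = { ...parts[lastText], cache_control: { type: "ephemeral" } }
		}
		return { ...message, content: parts }
	})

	return [system, ...marked]
}
```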

Screenshots

[Screenshot: the "Prompt Caching" option in the OpenAI-compatible model settings]

How to Test

  1. Select the prompt caching option in the UI and enter a gateway server as the base URL.
  2. Start a chat request.
  3. On the gateway server, observe the cache control key in the messages: 'type': 'text', 'cache_control': {'type': 'ephemeral'}
  4. On the gateway server, observe the usage response (a sketch for interpreting it follows this list):
'usage': {'cacheReadInputTokenCount': 16974, 'cacheReadInputTokens': 16974, 'cacheWriteInputTokenCount': 4438, 'cacheWriteInputTokens': 4438, 'inputTokens': 4, 'outputTokens': 222, 'totalTokens': 21638}
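
As a sanity check on step 4: cacheReadInputTokens counts tokens served from cache and cacheWriteInputTokens counts tokens newly written to it; together with inputTokens and outputTokens they add up to totalTokens. A small sketch using the logged values (illustrative only, not part of the PR):

```typescript
// Compute how much of the prompt was served from cache, using the logged
// usage payload from step 4. Field names follow the log above.
const usage = {
	cacheReadInputTokens: 16974,
	cacheWriteInputTokens: 4438,
	inputTokens: 4,
	outputTokens: 222,
}

const promptTokens = usage.cacheReadInputTokens + usage.cacheWriteInputTokens + usage.inputTokens
const cacheHitRate = usage.cacheReadInputTokens / promptTokens
console.log(`cache hit rate: ${(cacheHitRate * 100).toFixed(1)}%`) // ~79.3% of 21,416 prompt tokens
```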

Get in Touch

Roo Code Discord handle: dleen


Important

Adds prompt caching support for OpenAI-compatible custom models, including UI options and message handling in openai.ts.

  • Behavior:
    • Adds prompt caching support to OpenAI-compatible custom models in openai.ts.
    • Implements cache control in createMessage() for models with supportsPromptCache.
    • Copies logic from OpenRouter to add cache_control to user messages.
  • UI Changes:
    • Adds "Prompt Caching" checkbox in ApiOptions.tsx for OpenAI-compatible models.
    • Allows configuration of cache read/write prices when prompt caching is enabled (a sketch of how these fields could feed cost accounting appears after this summary).
  • Misc:
    • Updates .changeset/thin-fans-deliver.md to document the feature addition.

This description was created by Ellipsis for 10c1e7d. It will automatically update as commits are pushed.
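
The cache read/write price options in the summary above map to per-token cost accounting. A rough sketch of how those fields could fit together; the field names are inferred from the summary, and the helper is hypothetical rather than the merged code:

```typescript
// Inferred shape of the custom-model settings these UI options populate.
// Field names follow the summary above and may not match the codebase exactly.
interface CustomModelInfo {
	supportsPromptCache: boolean // toggled by the "Prompt Caching" checkbox
	inputPrice: number // $ per 1M uncached input tokens
	outputPrice: number // $ per 1M output tokens
	cacheReadsPrice?: number // $ per 1M tokens read from cache
	cacheWritesPrice?: number // $ per 1M tokens written to cache
}

// Hypothetical cost estimate for a usage payload like the one in "How to Test".
function estimateCostUsd(
	model: CustomModelInfo,
	usage: {
		inputTokens: number
		outputTokens: number
		cacheReadInputTokens: number
		cacheWriteInputTokens: number
	},
): number {
	return (
		(usage.inputTokens * model.inputPrice +
			usage.outputTokens * model.outputPrice +
			usage.cacheReadInputTokens * (model.cacheReadsPrice ?? model.inputPrice) +
			usage.cacheWriteInputTokens * (model.cacheWritesPrice ?? model.inputPrice)) /
		1_000_000
	)
}
```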


changeset-bot bot commented Mar 12, 2025

🦋 Changeset detected

Latest commit: 10c1e7d

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name       Type
roo-cline  Patch


@dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files) and enhancement (New feature or request) labels Mar 12, 2025
type: "text",
text: systemPrompt,
// @ts-ignore-next-line
cache_control: { type: "ephemeral" },
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid using // @ts-ignore-next-line to bypass type errors for the cache_control property. Consider extending the type definitions instead, and also evaluate extracting this caching logic into a shared helper to reduce duplication with the OpenRouter implementation.
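
One way to implement this suggestion, as a minimal sketch assuming the official openai SDK types (not the merged code):

```typescript
import OpenAI from "openai"

// Local extension of the OpenAI SDK's text content part carrying the
// Anthropic-style cache_control field, so no @ts-ignore is needed.
type CacheControlTextPart = OpenAI.Chat.Completions.ChatCompletionContentPartText & {
	cache_control?: { type: "ephemeral" }
}

// Shared helper that both the OpenAI-compatible and OpenRouter providers
// could call to build a cacheable text part.
function cacheableText(text: string): CacheControlTextPart {
	return { type: "text", text, cache_control: { type: "ephemeral" } }
}
```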

@mrubens (Collaborator) left a comment:

Thank you!

@dosubot bot added the lgtm (This PR has been approved by a maintainer) label Mar 12, 2025
@mrubens merged commit 9b5ee27 into RooCodeInc:main Mar 12, 2025
18 checks passed
@github-project-automation bot moved this from New to Done in Roo Code Roadmap Mar 12, 2025
@dleen deleted the cache branch March 12, 2025 17:53