docs+fix: add Cloudflare Workers AI / AI Gateway provider pages and unify CLOUDFLARE_API_TOKEN env var #9530
Open: mchenco wants to merge 6 commits into Kilo-Org:main from mchenco:docs/cloudflare-providers
+265
−0
Changes from all commits (6):
- 984d89d docs: add Cloudflare Workers AI and Cloudflare AI Gateway provider pages (mchenco)
- fa2b69e fix(provider): accept CLOUDFLARE_API_TOKEN for cloudflare-workers-ai (mchenco)
- 271b4a9 fix(provider): identify Cloudflare requests as kilo, not opencode (mchenco)
- 0402248 fix: address PR review feedback for Cloudflare provider docs and meta… (mchenco)
- b3ee497 Merge branch 'main' into docs/cloudflare-providers (mchenco)
- 6b99dbe scope: drop packages/opencode/ changes, ship docs-only (mchenco)
packages/kilo-docs/pages/ai-providers/cloudflare-ai-gateway.md (+144 −0)
---
description: Configure the Cloudflare AI Gateway in Kilo Code to route requests to OpenAI, Anthropic, Workers AI, and other providers through a single endpoint with caching, analytics, rate limiting, and budget controls.
keywords:
  - kilo code
  - cloudflare
  - cloudflare ai gateway
  - ai gateway
  - ai provider
  - prompt caching
  - rate limiting
  - usage tracking
sidebar_label: Cloudflare AI Gateway
---

# Using Cloudflare AI Gateway With Kilo Code

The [Cloudflare AI Gateway](https://developers.cloudflare.com/ai-gateway/) is a unified proxy that sits in front of upstream model providers and adds caching, analytics, rate limiting, retries, fallbacks, and spend controls. Kilo Code routes through the [Unified API](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/), so a single provider config in Kilo can reach OpenAI, Anthropic, and Cloudflare Workers AI models through the same endpoint.

Useful links:

- Dashboard: [dash.cloudflare.com → AI → AI Gateway](https://dash.cloudflare.com/?to=/:account/ai/ai-gateway)
- Docs: [developers.cloudflare.com/ai-gateway](https://developers.cloudflare.com/ai-gateway/)
- Unified API model list: [developers.cloudflare.com/ai-gateway/usage/chat-completion](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/)

---

## Getting Credentials

The AI Gateway requires **three** values:

1. **Account ID** — visible in the right sidebar of any zone in the [Cloudflare dashboard](https://dash.cloudflare.com), or under **Workers & Pages → Overview**.
2. **Gateway Name (Gateway ID)** — you must create a gateway in the dashboard before you can use it. Go to **AI → AI Gateway → Create Gateway**, give it a slug-style name (e.g. `kilo-code`), and use that name as `CLOUDFLARE_GATEWAY_ID`. It is the same value that appears in the gateway's URL: `gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_NAME}/...`.
3. **API Token** — create one at [dash.cloudflare.com/profile/api-tokens](https://dash.cloudflare.com/profile/api-tokens) with the `AI Gateway: Run` permission. The token is required whether or not your gateway has authentication enabled (enabling it is recommended); even on an unauthenticated gateway, Kilo uses the token to attribute usage. Copy it immediately — it is only shown once.
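Once you have the first two values, a quick sanity check is to compose the gateway's base URL yourself. This is only a sketch; the account ID and gateway name below are placeholders.

```shell
# Placeholder values for illustration only; substitute your own.
CLOUDFLARE_ACCOUNT_ID="0123456789abcdef0123456789abcdef"
CLOUDFLARE_GATEWAY_ID="kilo-code"

# The gateway URL is derived from the two IDs:
BASE_URL="https://gateway.ai.cloudflare.com/v1/${CLOUDFLARE_ACCOUNT_ID}/${CLOUDFLARE_GATEWAY_ID}"
echo "${BASE_URL}"
```

If requests to this URL fail with a not-found error, the gateway name most likely does not match what you created in the dashboard.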
---

## Configuration in Kilo Code

{% tabs %}
{% tab label="VSCode (Legacy)" %}

1. **Open Kilo Code Settings:** Click the gear icon ({% codicon name="gear" /%}) in the Kilo Code panel.
2. **Select Provider:** Choose "Cloudflare AI Gateway" from the "API Provider" dropdown.
3. **Enter Credentials:** Paste your Account ID, Gateway name, and API token into the corresponding fields.
4. **Select Model:** Choose your desired model from the "Model" dropdown.

{% /tab %}
{% tab label="VSCode" %}

Open **Settings** (gear icon) and go to the **Providers** tab to add Cloudflare AI Gateway. You'll be prompted for your Account ID, Gateway name, and API token.

The extension stores this in your `kilo.json` config file. You can also edit the config file directly — see the **CLI** tab for the file format.

{% /tab %}
{% tab label="CLI" %}

Authenticate interactively, or set environment variables:

```bash
kilo auth cloudflare-ai-gateway
```

**Environment variables:**

```bash
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_GATEWAY_ID="your-gateway-name"
export CLOUDFLARE_API_TOKEN="your-api-token"
# Or, if you already use the alias:
# export CF_AIG_TOKEN="your-api-token"
```

**Config file** (`~/.config/kilo/kilo.json` or `./kilo.json`):

```jsonc
{
  "provider": {
    "cloudflare-ai-gateway": {
      "env": [
        "CLOUDFLARE_ACCOUNT_ID",
        "CLOUDFLARE_GATEWAY_ID",
        "CLOUDFLARE_API_TOKEN",
      ],
    },
  },
}
```

Then set your default model (Unified API format `provider/model`):

```jsonc
{
  "model": "cloudflare-ai-gateway/anthropic/claude-sonnet-4-5",
}
```

{% /tab %}
{% /tabs %}

---

## Supported Models

Kilo Code uses the AI Gateway [Unified API](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/), so model IDs follow the format `provider/model`. The gateway exposes models from:

- **`openai/...`** — e.g. `openai/gpt-5.2-codex`, `openai/gpt-5.2`
- **`anthropic/...`** — e.g. `anthropic/claude-sonnet-4-5`, `anthropic/claude-opus-4`
- **`workers-ai/@cf/...`** — Cloudflare Workers AI models routed through your gateway, e.g. `workers-ai/@cf/moonshotai/kimi-k2.6`

The full list is fetched automatically from `models.dev`. For the authoritative supported-models reference, see the [Unified API docs](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/).
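To make the two layers of naming concrete, here is a small sketch (plain shell parameter expansion, nothing Kilo-specific) of how a model string from `kilo.json` decomposes into the Unified API's `provider/model` id:

```shell
# Hypothetical example: the model string as it appears in kilo.json.
KILO_MODEL="cloudflare-ai-gateway/anthropic/claude-sonnet-4-5"

# Strip the Kilo provider prefix; what remains is the Unified API id.
UNIFIED_ID="${KILO_MODEL#cloudflare-ai-gateway/}"
UPSTREAM="${UNIFIED_ID%%/*}"   # the upstream provider segment

echo "${UNIFIED_ID}"   # anthropic/claude-sonnet-4-5
echo "${UPSTREAM}"     # anthropic
```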
---

## Caching, Rate Limits, and Metadata

The AI Gateway provider supports a few extra knobs through the `options` block of your provider config:

```jsonc
{
  "provider": {
    "cloudflare-ai-gateway": {
      "options": {
        "cacheTtl": 3600,
        "cacheKey": "kilo-default",
        "skipCache": false,
        "collectLog": true,
        "metadata": { "team": "platform", "env": "dev" },
      },
    },
  },
}
```

These are forwarded to the gateway via the corresponding `cf-aig-*` headers. See the [Cloudflare AI Gateway configuration docs](https://developers.cloudflare.com/ai-gateway/configuration/) for the full set of options.
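As a rough sketch of what that forwarding looks like on the wire, the options map onto Cloudflare's documented `cf-aig-*` request headers roughly as below. Treat the exact set and encoding Kilo sends as an assumption; the header names themselves come from Cloudflare's configuration docs.

```shell
# Sketch only: assumed option-to-header mapping, one header per line.
HEADERS='cf-aig-cache-ttl: 3600
cf-aig-cache-key: kilo-default
cf-aig-skip-cache: false
cf-aig-collect-log: true
cf-aig-metadata: {"team":"platform","env":"dev"}'

printf '%s\n' "$HEADERS"
```

A hand-rolled request would attach each of these with a `curl -H` flag alongside the `Authorization` header.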
---

## Tips and Notes

- **Create the gateway first.** The Gateway Name is not auto-generated — you must visit **AI → AI Gateway → Create Gateway** in the Cloudflare dashboard and pick a name before this provider will work.
- **OpenAI reasoning models:** Kilo automatically drops the `maxOutputTokens` cap for OpenAI reasoning models (`gpt-5.x`, `o`-series) routed through the gateway, since the Unified API rejects `max_tokens` for those models. No action needed on your side.
- **Cost & analytics:** The dashboard shows per-model cost, token, and latency stats for every request that flows through the gateway.
- **Bring your own keys:** If you've configured upstream provider keys directly in the gateway settings, you don't need to set OpenAI/Anthropic/etc. keys in Kilo — the gateway uses its stored keys.
- **Direct Workers AI access:** If you only need Workers AI models and don't want a gateway in front, use the [Cloudflare Workers AI](/docs/ai-providers/cloudflare-workers-ai) provider directly.
packages/kilo-docs/pages/ai-providers/cloudflare-workers-ai.md (+113 −0)
---
description: Configure Cloudflare Workers AI in Kilo Code to run open-source models on Cloudflare's global GPU network via the OpenAI-compatible endpoint.
keywords:
  - kilo code
  - cloudflare
  - cloudflare workers ai
  - workers ai
  - ai provider
  - openai compatible
sidebar_label: Cloudflare Workers AI
---

# Using Cloudflare Workers AI With Kilo Code

[Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/) runs open-source large language models on Cloudflare's global GPU network. Models are served through an OpenAI-compatible endpoint, billed per token, with no infrastructure to manage.

**Website:** [https://developers.cloudflare.com/workers-ai/](https://developers.cloudflare.com/workers-ai/)

## Getting Credentials

You need two values:

1. **Account ID** — visible in the right sidebar of any zone in the [Cloudflare dashboard](https://dash.cloudflare.com), or under **Workers & Pages → Overview**.
2. **API Token** — create one at [dash.cloudflare.com/profile/api-tokens](https://dash.cloudflare.com/profile/api-tokens). Use the **"Workers AI"** template, or create a custom token with the `Workers AI: Read` permission. Copy the token immediately — it is only shown once.
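With those two values in hand, the OpenAI-compatible base URL that Kilo targets is derived directly from the account ID. A minimal sketch (the account ID below is a placeholder):

```shell
# Placeholder account ID; substitute your own.
CLOUDFLARE_ACCOUNT_ID="0123456789abcdef0123456789abcdef"

# Workers AI's OpenAI-compatible base URL:
BASE_URL="https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/v1"
echo "${BASE_URL}"

# To check that the token itself is valid (requires network, so shown as a comment):
# curl -s -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
#   https://api.cloudflare.com/client/v4/user/tokens/verify
```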
## Configuration in Kilo Code

{% tabs %}
{% tab label="VSCode (Legacy)" %}

1. **Open Kilo Code Settings:** Click the gear icon ({% codicon name="gear" /%}) in the Kilo Code panel.
2. **Select Provider:** Choose "Cloudflare Workers AI" from the "API Provider" dropdown.
3. **Enter Account ID and API Token:** Paste your Cloudflare Account ID and API token into the corresponding fields.
4. **Select Model:** Choose your desired model from the "Model" dropdown.

{% /tab %}
{% tab label="VSCode" %}

Open **Settings** (gear icon) and go to the **Providers** tab to add Cloudflare Workers AI. You'll be prompted for your Account ID and API token.

The extension stores this in your `kilo.json` config file. You can also edit the config file directly — see the **CLI** tab for the file format.

{% /tab %}
{% tab label="CLI" %}

Authenticate interactively, or set environment variables:

```bash
kilo auth cloudflare-workers-ai
```

**Environment variables:**

```bash
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_API_TOKEN="your-api-token"
# CLOUDFLARE_API_KEY is also accepted as a legacy alias.
```

**Config file** (`~/.config/kilo/kilo.json` or `./kilo.json`):

```jsonc
{
  "provider": {
    "cloudflare-workers-ai": {
      "env": ["CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN"],
    },
  },
}
```

Then set your default model:

```jsonc
{
  "model": "cloudflare-workers-ai/@cf/moonshotai/kimi-k2.6",
}
```

{% /tab %}
{% /tabs %}

## Supported Models

Kilo Code automatically picks up the current Workers AI model catalog from `models.dev`. Highlights for coding workflows:

| Model ID                                  | Context | Tool calls | Reasoning | Vision |
| ----------------------------------------- | ------- | ---------- | --------- | ------ |
| `@cf/moonshotai/kimi-k2.6`                | 262k    | ✓          | ✓         | ✓      |
| `@cf/moonshotai/kimi-k2.5`                | 256k    | ✓          | ✓         | ✓      |
| `@cf/nvidia/nemotron-3-120b-a12b`         | 256k    | ✓          | ✓         |        |
| `@cf/google/gemma-4-26b-a4b-it`           | 256k    | ✓          | ✓         | ✓      |
| `@cf/openai/gpt-oss-120b`                 | 128k    | ✓          | ✓         |        |
| `@cf/openai/gpt-oss-20b`                  | 128k    | ✓          | ✓         |        |
| `@cf/zai-org/glm-4.7-flash`               | 128k    | ✓          | ✓         |        |
| `@cf/meta/llama-4-scout-17b-16e-instruct` | 128k    | ✓          |           | ✓      |

For the full and current list, see the [Workers AI model catalog](https://developers.cloudflare.com/workers-ai/models/).

## Prompt Caching

Workers AI offers [prefix caching](https://developers.cloudflare.com/workers-ai/features/prompt-caching/) on supported models, which reduces Time to First Token and bills cached input tokens at a discounted rate. Cache hits require requests in the same logical session to be routed to the same model instance — controlled by the `x-session-affinity` header.

Kilo Code sends `x-session-affinity: <session-id>` automatically on every request, so prefix caching works out of the box for agentic coding sessions where each turn reuses the prior turn's prompt. Cached token counts are returned in the response `usage` object.

To maximize cache hits in your own prompts/modes, follow the [Cloudflare guidance](https://developers.cloudflare.com/workers-ai/features/prompt-caching/#structuring-prompts-for-caching): put static content (system prompts, tool definitions) at the start, and avoid timestamps in system prompts.
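For readers wiring up their own requests, here is a sketch of what the affinity mechanism looks like on the wire. The session id below is hypothetical; Kilo generates its own.

```shell
SESSION_ID="a1b2c3d4"   # hypothetical; Kilo derives a real id per session
AFFINITY_HEADER="x-session-affinity: ${SESSION_ID}"
echo "${AFFINITY_HEADER}"

# A hand-rolled request keeps cache locality by sending the same header
# on every turn of the same session (network call shown as a comment):
# curl "https://api.cloudflare.com/client/v4/accounts/${CLOUDFLARE_ACCOUNT_ID}/ai/v1/chat/completions" \
#   -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
#   -H "${AFFINITY_HEADER}" \
#   -d '{"model":"@cf/openai/gpt-oss-20b","messages":[{"role":"user","content":"hi"}]}'
```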
## Tips and Notes

- **Kimi K2.6** is a strong default for agentic coding — it has a 262k context window, supports reasoning, tool calls, and vision, and is trained for agent workflows.
- **OpenAI-compatible endpoint:** Kilo talks to `https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1` via the `@ai-sdk/openai-compatible` package, so streaming, tool calls, and reasoning all work the same as with native OpenAI.
- **Routing through AI Gateway:** If you want analytics, caching, rate limiting, or budgeting on top of Workers AI, use the [Cloudflare AI Gateway](/docs/ai-providers/cloudflare-ai-gateway) provider instead — it can route the same Workers AI models through your gateway.
- **Pricing:** Per-token, billed against your Cloudflare account. See [Workers AI pricing](https://developers.cloudflare.com/workers-ai/platform/pricing/).
Review comment: Should remove the VSCode Legacy section.