8 changes: 8 additions & 0 deletions packages/kilo-docs/lib/nav/ai-providers.ts
@@ -28,6 +28,10 @@ export const AiProvidersNav: NavSection[] = [
title: "AI Gateways",
links: [
{ href: "/ai-providers/openrouter", children: "OpenRouter" },
{
href: "/ai-providers/cloudflare-ai-gateway",
children: "Cloudflare AI Gateway",
},
{ href: "/ai-providers/glama", children: "Glama" },
{ href: "/ai-providers/requesty", children: "Requesty" },
{ href: "/ai-providers/unbound", children: "Unbound" },
@@ -42,6 +46,10 @@ export const AiProvidersNav: NavSection[] = [
links: [
{ href: "/ai-providers/vertex", children: "Google Vertex AI" },
{ href: "/ai-providers/bedrock", children: "AWS Bedrock" },
{
href: "/ai-providers/cloudflare-workers-ai",
children: "Cloudflare Workers AI",
},
{ href: "/ai-providers/groq", children: "Groq" },
{ href: "/ai-providers/cerebras", children: "Cerebras" },
{ href: "/ai-providers/fireworks", children: "Fireworks AI" },
144 changes: 144 additions & 0 deletions packages/kilo-docs/pages/ai-providers/cloudflare-ai-gateway.md
@@ -0,0 +1,144 @@
---
description: Configure the Cloudflare AI Gateway in Kilo Code to route requests to OpenAI, Anthropic, Workers AI, and other providers through a single endpoint with caching, analytics, rate-limiting, and budget controls.
keywords:
- kilo code
- cloudflare
- cloudflare ai gateway
- ai gateway
- ai provider
- prompt caching
- rate limiting
- usage tracking
sidebar_label: Cloudflare AI Gateway
---

# Using Cloudflare AI Gateway With Kilo Code

The [Cloudflare AI Gateway](https://developers.cloudflare.com/ai-gateway/) is a unified proxy that sits in front of upstream model providers and adds caching, analytics, rate limiting, retries, fallbacks, and spend controls. Kilo Code routes through the [Unified API](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/), which lets one provider config in Kilo reach OpenAI, Anthropic, and Cloudflare Workers AI models through the same endpoint.

Useful links:

- Dashboard: [dash.cloudflare.com → AI → AI Gateway](https://dash.cloudflare.com/?to=/:account/ai/ai-gateway)
- Docs: [developers.cloudflare.com/ai-gateway](https://developers.cloudflare.com/ai-gateway/)
- Unified API model list: [developers.cloudflare.com/ai-gateway/usage/chat-completion](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/)

---

## Getting Credentials

The AI Gateway requires **three** values:

1. **Account ID** — visible in the right sidebar of any zone in the [Cloudflare dashboard](https://dash.cloudflare.com), or under **Workers & Pages → Overview**.
2. **Gateway Name (Gateway ID)** — you must create a gateway in the dashboard before you can use it. Go to **AI → AI Gateway → Create Gateway**, give it a slug-style name (e.g. `kilo-code`), and use that name as `CLOUDFLARE_GATEWAY_ID`. This is the same value that appears in the gateway's URL: `gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_NAME}/...`.
3. **API Token** — create one at [dash.cloudflare.com/profile/api-tokens](https://dash.cloudflare.com/profile/api-tokens) with the `AI Gateway: Run` permission. The token is required whether or not the gateway has authentication enabled (enabling it is recommended): authenticated gateways reject requests without a token, and Kilo uses it to attribute usage in either case. Copy the token immediately; it is shown only once.
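
As a quick sanity check, the Account ID and Gateway Name compose into the gateway's base URL described above. A minimal sketch (the credential values below are placeholders):

```shell
# Placeholder values -- substitute your own Account ID and gateway name.
ACCOUNT_ID="0123456789abcdef"
GATEWAY_NAME="kilo-code"

# Requests through the gateway go under this prefix.
echo "https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_NAME}"
```

If the printed URL does not match what the dashboard shows for your gateway, one of the two values is wrong.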

---

## Configuration in Kilo Code

{% tabs %}
{% tab label="VSCode (Legacy)" %}

1. **Open Kilo Code Settings:** Click the gear icon ({% codicon name="gear" /%}) in the Kilo Code panel.
2. **Select Provider:** Choose "Cloudflare AI Gateway" from the "API Provider" dropdown.
3. **Enter Credentials:** Paste your Account ID, Gateway name, and API token into the corresponding fields.
4. **Select Model:** Choose your desired model from the "Model" dropdown.

{% /tab %}
{% tab label="VSCode" %}

Open **Settings** (gear icon) and go to the **Providers** tab to add Cloudflare AI Gateway. You'll be prompted for your Account ID, Gateway name, and API token.
The extension stores this in your `kilo.json` config file. You can also edit the config file directly — see the **CLI** tab for the file format.

{% /tab %}
{% tab label="CLI" %}

Authenticate interactively, or set environment variables:

```bash
kilo auth cloudflare-ai-gateway
```

**Environment variables:**

```bash
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_GATEWAY_ID="your-gateway-name"
export CLOUDFLARE_API_TOKEN="your-api-token"
# Or, if you already use the alias:
# export CF_AIG_TOKEN="your-api-token"
```

**Config file** (`~/.config/kilo/kilo.json` or `./kilo.json`):

```jsonc
{
"provider": {
"cloudflare-ai-gateway": {
"env": [
"CLOUDFLARE_ACCOUNT_ID",
"CLOUDFLARE_GATEWAY_ID",
"CLOUDFLARE_API_TOKEN",
],
},
},
}
```

Then set your default model (Unified API format `provider/model`):

```jsonc
{
"model": "cloudflare-ai-gateway/anthropic/claude-sonnet-4-5",
}
```

{% /tab %}
{% /tabs %}

---

## Supported Models

Kilo Code uses the AI Gateway [Unified API](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/), so model IDs follow the format `provider/model`. The gateway exposes models from:

- **`openai/...`** — e.g. `openai/gpt-5.2-codex`, `openai/gpt-5.2`
- **`anthropic/...`** — e.g. `anthropic/claude-sonnet-4-5`, `anthropic/claude-opus-4`
- **`workers-ai/@cf/...`** — Cloudflare Workers AI models routed through your gateway, e.g. `workers-ai/@cf/moonshotai/kimi-k2.6`

The full list is fetched automatically from `models.dev`. For the authoritative supported-models reference, see the [Unified API docs](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/).

---

## Caching, Rate Limits, and Metadata

The AI Gateway provider supports a few extra knobs through the `options` block of your provider config:

```jsonc
{
"provider": {
"cloudflare-ai-gateway": {
"options": {
"cacheTtl": 3600,
"cacheKey": "kilo-default",
"skipCache": false,
"collectLog": true,
"metadata": { "team": "platform", "env": "dev" },
},
},
},
}
```

These are forwarded to the gateway via the corresponding `cf-aig-*` headers. See the [Cloudflare AI Gateway configuration docs](https://developers.cloudflare.com/ai-gateway/configuration/) for the full set of options.
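
As a rough illustration of that mapping, here is how the options in the example config above would surface as request headers. The header names follow Cloudflare's AI Gateway documentation, but treat the exact set as an assumption and check the linked docs:

```shell
# Rough sketch of how the example options map onto cf-aig-* headers.
# Values are taken from the config example above; header names are
# assumed from Cloudflare's AI Gateway docs.
CACHE_TTL=3600
CACHE_KEY="kilo-default"
SKIP_CACHE=false
COLLECT_LOG=true
METADATA='{"team":"platform","env":"dev"}'

printf 'cf-aig-cache-ttl: %s\n'   "$CACHE_TTL"
printf 'cf-aig-cache-key: %s\n'   "$CACHE_KEY"
printf 'cf-aig-skip-cache: %s\n'  "$SKIP_CACHE"
printf 'cf-aig-collect-log: %s\n' "$COLLECT_LOG"
printf 'cf-aig-metadata: %s\n'    "$METADATA"
```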

---

## Tips and Notes

- **Create the gateway first.** The Gateway Name is not auto-generated — you must visit **AI → AI Gateway → Create Gateway** in the Cloudflare dashboard and pick a name before this provider will work.
- **OpenAI reasoning models:** Kilo automatically drops the `maxOutputTokens` cap for OpenAI reasoning models (`gpt-5.x`, `o`-series) routed through the gateway, since the Unified API rejects `max_tokens` for those models. No action needed on your side.
- **Cost & analytics:** The dashboard shows per-model cost, token, and latency stats for every request that flows through the gateway.
- **Bring your own keys:** If you've configured upstream provider keys directly in the gateway settings, you don't need to set OpenAI/Anthropic/etc. keys in Kilo — the gateway uses its stored keys.
- **Direct Workers AI access:** If you only need Workers AI models and don't want a gateway in front, use the [Cloudflare Workers AI](/docs/ai-providers/cloudflare-workers-ai) provider directly.
113 changes: 113 additions & 0 deletions packages/kilo-docs/pages/ai-providers/cloudflare-workers-ai.md
@@ -0,0 +1,113 @@
---
description: Configure Cloudflare Workers AI in Kilo Code to run open-source models on Cloudflare's global GPU network via the OpenAI-compatible endpoint.
keywords:
- kilo code
- cloudflare
- cloudflare workers ai
- workers ai
- ai provider
- openai compatible
sidebar_label: Cloudflare Workers AI
---

# Using Cloudflare Workers AI With Kilo Code

[Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/) runs open-source large language models on Cloudflare's global GPU network. Models are served through an OpenAI-compatible endpoint, billed per-token, and available with no infrastructure to manage.

**Website:** [https://developers.cloudflare.com/workers-ai/](https://developers.cloudflare.com/workers-ai/)

## Getting Credentials

You need two values:

1. **Account ID** — visible in the right sidebar of any zone in the [Cloudflare dashboard](https://dash.cloudflare.com), or under **Workers & Pages → Overview**.
2. **API Token** — create one at [dash.cloudflare.com/profile/api-tokens](https://dash.cloudflare.com/profile/api-tokens). Use the **"Workers AI"** template, or create a custom token with the `Workers AI: Read` permission. Copy the token immediately — it is only shown once.
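
Before launching Kilo from a shell, it can help to fail fast if either value is missing. A minimal sketch (the defaults below are placeholders so the snippet is self-contained; the variable names match the CLI tab later in this page):

```shell
# Placeholder defaults -- replace with your real credentials before use.
CLOUDFLARE_ACCOUNT_ID="${CLOUDFLARE_ACCOUNT_ID:-0123456789abcdef}"
CLOUDFLARE_API_TOKEN="${CLOUDFLARE_API_TOKEN:-example-token}"

# Fail fast if either value ended up empty.
[ -n "$CLOUDFLARE_ACCOUNT_ID" ] || { echo "missing account id" >&2; exit 1; }
[ -n "$CLOUDFLARE_API_TOKEN" ]  || { echo "missing api token" >&2; exit 1; }
echo "Cloudflare credentials present"
```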

## Configuration in Kilo Code

{% tabs %}
{% tab label="VSCode (Legacy)" %}

1. **Open Kilo Code Settings:** Click the gear icon ({% codicon name="gear" /%}) in the Kilo Code panel.
2. **Select Provider:** Choose "Cloudflare Workers AI" from the "API Provider" dropdown.
3. **Enter Account ID and API Token:** Paste your Cloudflare Account ID and API token into the corresponding fields.
4. **Select Model:** Choose your desired model from the "Model" dropdown.

{% /tab %}
{% tab label="VSCode" %}

Open **Settings** (gear icon) and go to the **Providers** tab to add Cloudflare Workers AI. You'll be prompted for your Account ID and API token.

The extension stores this in your `kilo.json` config file. You can also edit the config file directly — see the **CLI** tab for the file format.

{% /tab %}
{% tab label="CLI" %}

Authenticate interactively, or set environment variables:

```bash
kilo auth cloudflare-workers-ai
```

**Environment variables:**

```bash
export CLOUDFLARE_ACCOUNT_ID="your-account-id"
export CLOUDFLARE_API_TOKEN="your-api-token"
# CLOUDFLARE_API_KEY is also accepted as a legacy alias.
```

**Config file** (`~/.config/kilo/kilo.json` or `./kilo.json`):

```jsonc
{
"provider": {
"cloudflare-workers-ai": {
"env": ["CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN"],
},
},
}
```

Then set your default model:

```jsonc
{
"model": "cloudflare-workers-ai/@cf/moonshotai/kimi-k2.6",
}
```

{% /tab %}
{% /tabs %}

## Supported Models

Kilo Code automatically picks up the current Workers AI model catalog from `models.dev`. Highlights for coding workflows:

| Model ID | Context | Tool calls | Reasoning | Vision |
| ----------------------------------------- | ------- | ---------- | --------- | ------ |
| `@cf/moonshotai/kimi-k2.6` | 262k | ✓ | ✓ | ✓ |
| `@cf/moonshotai/kimi-k2.5` | 256k | ✓ | ✓ | ✓ |
| `@cf/nvidia/nemotron-3-120b-a12b` | 256k | ✓ | ✓ | |
| `@cf/google/gemma-4-26b-a4b-it` | 256k | ✓ | ✓ | ✓ |
| `@cf/openai/gpt-oss-120b` | 128k | ✓ | ✓ | |
| `@cf/openai/gpt-oss-20b` | 128k | ✓ | ✓ | |
| `@cf/zai-org/glm-4.7-flash` | 128k | ✓ | ✓ | |
| `@cf/meta/llama-4-scout-17b-16e-instruct` | 128k | ✓ | | ✓ |

For the full and current list, see the [Workers AI model catalog](https://developers.cloudflare.com/workers-ai/models/).

## Prompt Caching

Workers AI offers [prefix caching](https://developers.cloudflare.com/workers-ai/features/prompt-caching/) on supported models, which reduces Time to First Token and bills cached input tokens at a discounted rate. Cache hits require requests in the same logical session to be routed to the same model instance — controlled by the `x-session-affinity` header.

Kilo Code sends `x-session-affinity: <session-id>` automatically on every request, so prefix caching works out of the box for agentic coding sessions where each turn reuses the prior turn's prompt. Cached token counts are returned in the response `usage` object.

To maximize cache hits in your own prompts/modes, follow the [Cloudflare guidance](https://developers.cloudflare.com/workers-ai/features/prompt-caching/#structuring-prompts-for-caching): put static content (system prompts, tool definitions) at the start, and avoid timestamps in system prompts.
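
The affinity behavior can be sketched as follows: every turn in one session carries the same header value, which is what lets the gateway route those turns to the same model instance. The session id below is hypothetical; Kilo generates and manages its own:

```shell
# Hypothetical session id -- Kilo generates and reuses its own per session.
SESSION_ID="kilo-session-001"

# Each turn repeats the same header value; a changing value would defeat
# prefix caching by landing turns on different instances.
for turn in 1 2 3; do
  printf 'turn %s -> x-session-affinity: %s\n' "$turn" "$SESSION_ID"
done
```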

## Tips and Notes

- **Kimi K2.6** is a strong default for agentic coding — it has a 262k context window, supports reasoning, tool calls, and vision, and is trained for agent workflows.
- **OpenAI-compatible endpoint:** Kilo talks to `https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1` via the `@ai-sdk/openai-compatible` package, so streaming, tool calls, and reasoning all work the same as with native OpenAI.
- **Routing through AI Gateway:** If you want analytics, caching, rate-limiting, or budgeting on top of Workers AI, use the [Cloudflare AI Gateway](/docs/ai-providers/cloudflare-ai-gateway) provider instead — it can route the same Workers AI models through your gateway.
- **Pricing:** Per-token, billed against your Cloudflare account. See [Workers AI pricing](https://developers.cloudflare.com/workers-ai/platform/pricing/).
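
The endpoint from the second bullet can be assembled directly; a small sketch with a placeholder Account ID:

```shell
# Placeholder Account ID -- substitute your own.
ACCOUNT_ID="0123456789abcdef"

# OpenAI-compatible base URL for Workers AI, as noted above.
BASE_URL="https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1"

# Chat completions live at the usual OpenAI-style path under that base.
echo "${BASE_URL}/chat/completions"
```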