
fix(tokens): accurate input_tokens for context management#107

Open
juslintek wants to merge 7 commits into jwadow:main from juslintek:fix/accurate-input-tokens

Conversation

@juslintek

Problem

Claude Code (and other clients) can't trigger auto-compression in time because the gateway reports input_tokens roughly 2,000 tokens too low.

The Kiro API adds an internal system prompt (~2,000 tokens) that the gateway can't see when counting tokens locally:

| Scenario | Gateway reported | Kiro actual | Error |
| --- | --- | --- | --- |
| Simple "Hi" | 464 | 2,448 | 5.3x undercount |
| Multi-turn | 840 | 2,776 | 3.3x undercount |

Clients see the low input_tokens, think there's plenty of context room, and don't compress until it's too late.

Solution

1. KIRO_PROMPT_OVERHEAD_TOKENS (config.py)

Configurable constant (default 2000, env var KIRO_PROMPT_OVERHEAD_TOKENS) added to all local token estimates. Accounts for Kiro's internal system prompt.
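A minimal sketch of what this addition could look like, assuming a module-level constant in config.py and a hypothetical helper name (`estimate_input_tokens` is illustrative, not the gateway's actual function):

```python
import os

# Sketch of the config.py addition described above; the gateway's actual
# config handling may differ.
KIRO_PROMPT_OVERHEAD_TOKENS = int(os.getenv("KIRO_PROMPT_OVERHEAD_TOKENS", "2000"))


def estimate_input_tokens(local_count: int) -> int:
    """Add the overhead of Kiro's unseen internal system prompt
    to a locally counted token estimate."""
    return local_count + KIRO_PROMPT_OVERHEAD_TOKENS
```

With the default overhead, a local count of 464 tokens is reported as 2,464, close to Kiro's actual 2,448.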

2. Accurate input_tokens in message_delta (streaming_anthropic.py)

At end of stream, uses context_usage_percentage from Kiro API to derive real input_tokens and include it in message_delta.usage.
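The derivation could be sketched as below, assuming the context window size is known to the gateway (the 200,000-token figure and the function name are assumptions for illustration, not confirmed by this PR):

```python
# Hypothetical derivation of absolute input_tokens from the Kiro API's
# context_usage_percentage field. CONTEXT_WINDOW_TOKENS is an assumed
# model context size; the gateway's real constant may differ.
CONTEXT_WINDOW_TOKENS = 200_000


def derive_input_tokens(context_usage_percentage: float) -> int:
    """Convert Kiro's reported usage percentage into a token count."""
    return round(CONTEXT_WINDOW_TOKENS * context_usage_percentage / 100)
```

For example, a reported usage of 1.224% of a 200k window corresponds to 2,448 input tokens.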

3. Same fix for non-streaming path

After fix

| Scenario | message_start (estimate) | message_delta (actual) | Error |
| --- | --- | --- | --- |
| Simple "Hi" | 2,463 | 2,448 | +0.6% |
| Multi-turn | 2,581 | 2,514 | +2.6% |

The slight overcount is intentional: it is better to compress slightly early than too late.

Depends on

PancakeZik and others added 6 commits March 3, 2026 18:18
The prompt_tokens reported to clients (used by Claude Code's /context
command) were wildly inaccurate because they were derived from Kiro's
contextUsagePercentage, which returns unreliable values.

Instead, count tokens from the complete serialized Kiro request payload
using tiktoken. This includes system prompt, messages, tools, and all
other payload fields — matching what actually gets sent to the API.

- Replace request_messages/request_tools params with pre-counted
  prompt_tokens across all streaming functions
- Count tokens from full kiro_request_body in both OpenAI and
  Anthropic route handlers
- Remove dependency on contextUsagePercentage for token counting
- Update tests to match new function signatures
Claude Code calls this endpoint before each request to check conversation
size and decide whether to trigger compaction. Without it, the gateway
returns 404, Claude Code cannot estimate context usage, and long
conversations eventually hit the upstream CONTENT_LENGTH_EXCEEDS_THRESHOLD
error (400).

The endpoint builds the full Kiro payload and counts tokens on the
serialized JSON using tiktoken, consistent with the token counting
approach used in the messages endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace validate_tool_names (400 error) with deterministic truncation
for tool names exceeding 64-char Kiro API limit. Names are shortened
to 55 chars + '_' + 8-char md5 hash. Mapping is reversed in responses
so clients receive original names.

Fixes MCP plugins with auto-generated names like
mcp__plugin_cloudflare_cloudflare-docs__search_cloudflare_documentation
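The truncation scheme described above could be sketched like this (the function name is illustrative; the reverse mapping back to original names happens elsewhere in the gateway):

```python
import hashlib

KIRO_TOOL_NAME_LIMIT = 64  # Kiro API limit per the commit message


def truncate_tool_name(name: str) -> str:
    """Deterministically shorten a tool name that exceeds the limit:
    first 55 chars + '_' + first 8 hex chars of md5 of the full name."""
    if len(name) <= KIRO_TOOL_NAME_LIMIT:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:8]
    return f"{name[:55]}_{digest}"
```

Because the hash is derived from the full original name, the same input always maps to the same 64-char output, and distinct long names are very unlikely to collide.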
…s endpoint

Claude Code sends /v1/messages/count_tokens without max_tokens since it
only needs token counting, not generation. The required max_tokens field
caused 422 validation errors, breaking context usage tracking and
preventing conversation compaction.
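A sketch of the model change, assuming a Pydantic request model (the class and field layout are assumptions mirroring the Anthropic Messages API; the 4096 default comes from the updated tests mentioned later in this PR):

```python
from typing import Optional

from pydantic import BaseModel


# Hypothetical sketch: max_tokens becomes optional so that
# /v1/messages/count_tokens requests without it no longer fail
# validation with a 422.
class AnthropicMessagesRequest(BaseModel):
    model: str
    messages: list
    max_tokens: Optional[int] = 4096  # default when the client omits it
```

With the default in place, count_tokens requests validate cleanly while generation requests keep a sane max_tokens fallback.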
…s response

- Make AnthropicTool.input_schema optional to accept Anthropic built-in
  server tools (web_search, code_execution, bash, text_editor) that
  don't have input_schema. These were causing 422 validation errors.
- Silently strip server tools in converter since Kiro API can't handle
  them, while keeping custom tools working as before.
- Add context_management.original_input_tokens to count_tokens response
  to match Anthropic API spec.
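The stripping logic could be sketched as below; note the `is None` check, since an empty dict `{}` is a valid schema for a custom tool with no parameters (a detail a later commit in this PR fixes). The function name is illustrative:

```python
# Sketch of silently dropping Anthropic built-in server tools (which the
# Kiro API can't handle) while passing custom tools through unchanged.
def filter_tools(tools: list[dict]) -> list[dict]:
    """Keep only custom tools, i.e. those that define an input_schema.
    Server tools (web_search, bash, ...) have no input_schema, so they
    arrive with it unset; an empty dict {} is still a valid schema."""
    return [t for t in tools if t.get("input_schema") is not None]
```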
The gateway was reporting input_tokens ~2000 tokens too low because
Kiro API adds an internal system prompt that the gateway can't see.
This caused Claude Code to trigger auto-compression too late.

- Add KIRO_PROMPT_OVERHEAD_TOKENS (default 2000, configurable via env)
  to local token estimates in both messages and count_tokens endpoints
- Use context_usage_percentage from Kiro API to derive accurate
  input_tokens at end of stream, included in message_delta.usage
- Apply same fix to non-streaming collect_anthropic_response path

Before: message_start input_tokens=464 (actual: 2448) — 5.3x undercount
After:  message_start input_tokens=2463, message_delta input_tokens=2448
@cla-bot

cla-bot bot commented Mar 21, 2026

Thanks for the PR! 🎉

Before we can merge, we need a one-time CLA confirmation.
It confirms that you have the right to contribute this code and to allow the project to use it.

Full CLA text:
https://github.com/jwadow/kiro-gateway/blob/main/CLA.md

Please reply once with:

I have read the CLA and I accept its terms

You only need to reply once; any further messages from me can be ignored.

- Fix server tool detection: check 'input_schema is None' not 'not input_schema'
  (empty dict {} is valid for tools with no parameters)
- Update anthropic_to_kiro tests to use .payload on KiroPayloadResult
- Update build_kiro_payload tests to use .payload on KiroPayloadResult
- Replace validate_tool_names tests with truncate_tool_names tests
- Update AnthropicTool tests for optional name/input_schema
- Update max_tokens test for optional default (4096)

All 1412 tests pass.

@juslintek
Author

I have read the CLA and I accept its terms
