
fix(anthropic): make max_tokens optional for count_tokens endpoint#105

Open
juslintek wants to merge 4 commits into jwadow:main from juslintek:fix/count-tokens-max-tokens

Conversation

@juslintek

Problem

Claude Code calls /v1/messages/count_tokens to track context usage and trigger conversation compaction. This endpoint reuses AnthropicMessagesRequest, which requires max_tokens: int.

Since count_tokens only counts tokens (no generation), Claude Code doesn't send max_tokens. Every call fails with 422:

Validation error (422): 'max_tokens' Field required

This breaks context tracking entirely — Claude Code can't estimate context usage, leading to conversations hitting CONTENT_LENGTH_EXCEEDS_THRESHOLD errors.

Solution

Give max_tokens a default value of 4096 in AnthropicMessagesRequest, making it optional. The messages endpoint still works identically (clients always send it), while count_tokens no longer requires it.
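The change can be sketched as follows (a minimal sketch assuming a pydantic request model; the fields besides max_tokens are illustrative, not the gateway's actual schema):

```python
# Sketch of the fix, assuming a pydantic model. Only max_tokens is taken
# from this PR; the other fields are illustrative placeholders.
from pydantic import BaseModel


class AnthropicMessagesRequest(BaseModel):
    model: str
    messages: list[dict]
    # Before: `max_tokens: int` was required, so count_tokens requests
    # from Claude Code (which omit it) failed validation with 422.
    # After: a default makes the field optional; the messages endpoint
    # is unaffected because clients always send it explicitly.
    max_tokens: int = 4096


# A count_tokens-style payload without max_tokens now validates:
req = AnthropicMessagesRequest(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "hi"}],
)
```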

Depends on

PancakeZik and others added 4 commits March 3, 2026 18:18
The prompt_tokens reported to clients (used by Claude Code's /context
command) were wildly inaccurate because they were derived from Kiro's
contextUsagePercentage, which returns unreliable values.

Instead, count tokens from the complete serialized Kiro request payload
using tiktoken. This includes system prompt, messages, tools, and all
other payload fields — matching what actually gets sent to the API.

- Replace request_messages/request_tools params with pre-counted
  prompt_tokens across all streaming functions
- Count tokens from full kiro_request_body in both OpenAI and
  Anthropic route handlers
- Remove dependency on contextUsagePercentage for token counting
- Update tests to match new function signatures
Claude Code calls this endpoint before each request to check conversation
size and decide whether to trigger compaction. Without it, the gateway
returns 404, Claude Code cannot estimate context usage, and long
conversations eventually hit the upstream CONTENT_LENGTH_EXCEEDS_THRESHOLD
error (400).

The endpoint builds the full Kiro payload and counts tokens on the
serialized JSON using tiktoken, consistent with the token counting
approach used in the messages endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
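Framework details aside, the endpoint logic amounts to building the full Kiro payload and counting tokens on its serialized JSON. A dependency-free sketch (build_kiro_payload is a hypothetical stand-in for the gateway's real builder, and the whitespace split stands in for the tiktoken count):

```python
# Sketch of the count_tokens handler logic. build_kiro_payload is a
# hypothetical stand-in; real code counts tokens with tiktoken on the
# serialized JSON, a whitespace split stands in here to stay stdlib-only.
import json


def build_kiro_payload(request: dict) -> dict:
    # Stand-in: the gateway assembles the complete Kiro request here.
    return {
        "system": request.get("system", ""),
        "messages": request.get("messages", []),
        "tools": request.get("tools", []),
    }


def handle_count_tokens(request: dict) -> dict:
    kiro_body = build_kiro_payload(request)
    serialized = json.dumps(kiro_body, ensure_ascii=False)
    input_tokens = len(serialized.split())  # stand-in for a tokenizer count
    # Anthropic's count_tokens responses report an input_tokens field.
    return {"input_tokens": input_tokens}
```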
Replace validate_tool_names (400 error) with deterministic truncation
for tool names exceeding 64-char Kiro API limit. Names are shortened
to 55 chars + '_' + 8-char md5 hash. Mapping is reversed in responses
so clients receive original names.

Fixes MCP plugins with auto-generated names like
mcp__plugin_cloudflare_cloudflare-docs__search_cloudflare_documentation
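The truncation scheme described above can be sketched as follows (a minimal sketch; the constant name and helper name are illustrative):

```python
# Sketch of deterministic tool-name truncation: names over the 64-char
# Kiro API limit become 55 chars + '_' + an 8-char md5 prefix, and a
# reverse mapping restores the original name in responses.
import hashlib

KIRO_TOOL_NAME_LIMIT = 64  # illustrative constant name


def shorten_tool_name(name: str) -> str:
    if len(name) <= KIRO_TOOL_NAME_LIMIT:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()[:8]
    return f"{name[:55]}_{digest}"  # 55 + 1 + 8 = 64 chars


long_name = "mcp__plugin_cloudflare_cloudflare-docs__search_cloudflare_documentation"
short = shorten_tool_name(long_name)
# Reverse mapping so clients receive original names back in responses.
reverse = {short: long_name}
```

Because md5 is deterministic, the same long name always maps to the same shortened name, so the mapping survives across requests.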
…s endpoint

Claude Code sends /v1/messages/count_tokens without max_tokens since it
only needs token counting, not generation. The required max_tokens field
caused 422 validation errors, breaking context usage tracking and
preventing conversation compaction.
@cla-bot

cla-bot bot commented Mar 20, 2026

Thanks for the PR! 🎉

Before merging, we need a one-time CLA confirmation.
It confirms that you have the right to contribute this code and allows the project to use it.

Full CLA text:
https://github.com/jwadow/kiro-gateway/blob/main/CLA.md

Please reply once with:

I have read the CLA and I accept its terms

You only need to write this once; all further messages from me can be ignored.

@juslintek
Author

I have read the CLA and I accept its terms
