fix: increase Anthropic max_tokens and warn on truncation #29
efecnc wants to merge 1 commit into altaidevorg:main from
Conversation
`max_tokens` was hardcoded at 8192 for all Anthropic models. Newer models (Opus, Sonnet) support much higher output. Truncated responses could produce incomplete tool_use JSON that silently breaks tool execution.

- Default to 16384 for opus/sonnet models, 8192 for others
- Add `with_max_tokens()` builder for explicit configuration
- Log a warning when `stop_reason` is `max_tokens` so truncation is visible
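The default-plus-builder behavior described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the struct shape, field names, and `effective_max_tokens` helper are assumptions, and the 16384 default is the PR's proposal as written.

```rust
// Hypothetical sketch of the provider configuration; only
// `with_max_tokens` is named in the PR, the rest is illustrative.
struct AnthropicProvider {
    model: String,
    max_tokens: Option<u32>, // None = use the model-based default
}

impl AnthropicProvider {
    fn new(model: &str) -> Self {
        Self { model: model.to_string(), max_tokens: None }
    }

    /// Builder for explicit configuration, overriding the default.
    fn with_max_tokens(mut self, max_tokens: u32) -> Self {
        self.max_tokens = Some(max_tokens);
        self
    }

    /// Model-based default: a higher cap for opus/sonnet, 8192 otherwise.
    fn effective_max_tokens(&self) -> u32 {
        self.max_tokens.unwrap_or_else(|| {
            let m = self.model.to_lowercase();
            if m.contains("opus") || m.contains("sonnet") { 16384 } else { 8192 }
        })
    }
}
```

An explicit `with_max_tokens(n)` always wins over the model-name heuristic, so callers hitting API limits can pin a safe value.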
Code Review
This pull request updates the AnthropicProvider to dynamically set default max_tokens based on the model name, adds a with_max_tokens builder method, and introduces logging for truncated responses. Feedback indicates that the proposed token limits for Opus and Sonnet models exceed current API constraints and may result in errors, suggesting lower defaults and case-insensitive model name matching.
```rust
let max_tokens = if model.contains("opus") || model.contains("sonnet") {
    16384
} else {
    8192
};
```
The proposed default of 16384 tokens for Opus and Sonnet models is likely to cause 400 Bad Request errors from the Anthropic API. As of the current API version (2023-06-01), the maximum output tokens for Claude 3 Opus is 4096, and for Claude 3.5 Sonnet it is 8192. While a beta feature exists for 16384 tokens on Sonnet 3.5, it requires the anthropic-beta: max-tokens-3-5-sonnet-2024-07-15 header, which is not currently included in the request. Additionally, the model name check is case-sensitive; using to_lowercase() would be more robust.
```rust
let model_lower = model.to_lowercase();
let max_tokens = if model_lower.contains("sonnet") {
    8192
} else if model_lower.contains("opus") {
    4096
} else {
    8192
};
```
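If the 16384 cap on Sonnet 3.5 were actually wanted, the beta header the reviewer mentions would have to be attached to the request. A hypothetical sketch of header construction, using a plain map rather than any particular HTTP client (the function name and shape are assumptions; the header names and values come from the review comment above):

```rust
use std::collections::HashMap;

// Sketch: build Anthropic request headers, opting into the Sonnet 3.5
// extended-output beta only when requested.
fn anthropic_headers(api_key: &str, use_sonnet_beta: bool) -> HashMap<&'static str, String> {
    let mut headers = HashMap::new();
    headers.insert("x-api-key", api_key.to_string());
    headers.insert("anthropic-version", "2023-06-01".to_string());
    if use_sonnet_beta {
        headers.insert(
            "anthropic-beta",
            "max-tokens-3-5-sonnet-2024-07-15".to_string(),
        );
    }
    headers
}
```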
Summary
- Raise default `max_tokens` for Anthropic Opus/Sonnet models from 8192 to 16384
- Add `with_max_tokens()` builder method for explicit configuration
- Log a warning when `stop_reason == "max_tokens"` so truncation is visible

Root Cause
`max_tokens` was hardcoded at 8192 for all Anthropic models. When output was truncated, `stop_reason` was never checked, so incomplete tool_use JSON was silently parsed with `unwrap_or_default()`; tools received empty args and failed mysteriously.

Test plan
- `cargo check` passes
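The truncation warning described in the summary can be sketched as follows. The response type and field names here are assumptions for illustration, not the crate's actual API; the point is only that `stop_reason` is now inspected instead of being ignored.

```rust
// Illustrative response shape; real code would deserialize the
// Anthropic Messages API response.
struct MessagesResponse {
    stop_reason: Option<String>,
}

/// Returns true (and logs a warning) when the response was cut off at
/// max_tokens, which can leave tool_use JSON incomplete.
fn warn_if_truncated(resp: &MessagesResponse) -> bool {
    if resp.stop_reason.as_deref() == Some("max_tokens") {
        eprintln!(
            "warning: Anthropic response truncated at max_tokens; \
             tool_use JSON may be incomplete"
        );
        true
    } else {
        false
    }
}
```

Surfacing the truncation makes the failure mode diagnosable: instead of tools silently receiving empty args, the log points directly at the token limit.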