Fix/max tokens to max completion tokens #2143

Muran-prog wants to merge 4 commits into router-for-me:main
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an incompatibility with newer OpenAI models by updating how token limits are handled. It introduces a mechanism that automatically translates the deprecated max_tokens parameter to max_completion_tokens.
Code Review
This pull request correctly addresses the need to rename max_tokens to max_completion_tokens for newer OpenAI models by introducing a promoteMaxTokens function. The change is applied correctly in both streaming and non-streaming execution paths. However, I've identified a potential bug in the error handling within the new function that could lead to data loss under certain conditions, for which I've provided a suggestion.
```go
if !gjson.GetBytes(payload, "max_completion_tokens").Exists() {
	payload, _ = sjson.SetBytes(payload, "max_completion_tokens", mt.Value())
}
payload, _ = sjson.DeleteBytes(payload, "max_tokens")
```
There's a potential for data loss here if the payload is not valid JSON. If sjson.SetBytes fails, it returns the original payload and the error is ignored. Subsequently, sjson.DeleteBytes is called on the original payload. If DeleteBytes succeeds, max_tokens is removed, but max_completion_tokens was never set, effectively losing the max_tokens value.
It's better to handle the error from sjson.SetBytes and return early to prevent this inconsistent state.
```go
if !gjson.GetBytes(payload, "max_completion_tokens").Exists() {
	var err error
	payload, err = sjson.SetBytes(payload, "max_completion_tokens", mt.Value())
	if err != nil {
		log.Warnf("promoteMaxTokens: failed to set max_completion_tokens, returning original payload: %v", err)
		return payload
	}
}
payload, _ = sjson.DeleteBytes(payload, "max_tokens")
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1100f46cca
```go
if !gjson.GetBytes(payload, "max_completion_tokens").Exists() {
	payload, _ = sjson.SetBytes(payload, "max_completion_tokens", mt.Value())
}
payload, _ = sjson.DeleteBytes(payload, "max_tokens")
```
Preserve token-limit echo for Responses conversions
Deleting max_tokens here breaks a non-stream /v1/responses round-trip: ConvertOpenAIChatCompletionsResponseToOpenAIResponsesNonStream currently reconstructs max_output_tokens only from max_output_tokens or max_tokens in requestRawJSON (see internal/translator/openai/openai/responses/openai_openai-responses_response.go around lines 632-637). After this change, translated requests carry only max_completion_tokens, so the final Responses payload silently drops max_output_tokens, which is a behavior regression for clients relying on that echoed field.
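The fix the PR later adopts is to extend that fallback chain so the converter also recognizes `max_completion_tokens`. A hedged sketch of the idea (the function name and map-based decoding are illustrative, not the actual converter code, which reads the raw JSON with gjson):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resolveMaxOutputTokens reconstructs max_output_tokens for a Responses
// payload from the original request JSON. Checking max_completion_tokens
// in the fallback chain is what keeps the value alive after
// promoteMaxTokens has renamed the legacy field.
func resolveMaxOutputTokens(requestRawJSON []byte) (int64, bool) {
	var req map[string]any
	if err := json.Unmarshal(requestRawJSON, &req); err != nil {
		return 0, false
	}
	// Priority order: explicit Responses field first, then the promoted
	// field, then the legacy one.
	for _, key := range []string{"max_output_tokens", "max_completion_tokens", "max_tokens"} {
		if v, ok := req[key].(float64); ok {
			return int64(v), true
		}
	}
	return 0, false
}

func main() {
	v, ok := resolveMaxOutputTokens([]byte(`{"model":"gpt-5","max_completion_tokens":256}`))
	fmt.Println(v, ok) // 256 true
}
```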
luispater left a comment
This fixes the upstream request rewrite for newer OpenAI-style models, but it also changes the request payload that the response translators use for /v1/responses round-trips.
Blocking:

- `promoteMaxTokens()` rewrites `translated` before `TranslateNonStream(...)`, but the non-stream Responses converter currently reconstructs `max_output_tokens` only from `max_output_tokens` or `max_tokens`. After this change, that path sees only `max_completion_tokens`, so `/v1/responses` responses can silently lose `max_output_tokens`.
Non-blocking:

- The rewrite is applied to every `openai-compatibility` provider even though issue #2101 is specifically about GitHub Models / Azure-style newer models. Consider scoping this by provider/model or making it configurable.
- The new tests only cover the helper and do not exercise executor-level round-trips for `/v1/responses`, which is why the regression above is not caught.
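One way to scope the rewrite as suggested is to gate it on the model name. A hypothetical sketch (the helper name and prefix list are assumptions for illustration, not code from the PR):

```go
package main

import (
	"fmt"
	"strings"
)

// requiresMaxCompletionTokens reports whether a model is known to reject
// the legacy max_tokens parameter. The prefix list here is illustrative;
// a real implementation might make it configurable per provider.
func requiresMaxCompletionTokens(model string) bool {
	for _, prefix := range []string{"o3", "o4-mini", "gpt-5"} {
		if strings.HasPrefix(model, prefix) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(requiresMaxCompletionTokens("gpt-5-mini")) // true
	fmt.Println(requiresMaxCompletionTokens("gpt-4o"))     // false
}
```

The executor would then call promoteMaxTokens only when this returns true, leaving payloads for older models untouched.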
Test plan:

- Reviewed PR metadata, diff, inline comments, and check results with `gh`.
- Traced the non-stream execution path from `OpenAICompatExecutor.Execute()` into `sdktranslator.TranslateNonStream()` and the Responses response converter.
- Did not run the PR branch locally.
…allback in Responses converter

Address review feedback:
- promoteMaxTokens now returns original payload on sjson.SetBytes error
- Responses converter (stream + non-stream) recognizes max_completion_tokens so max_output_tokens is preserved after promotion
Updated based on feedback:
Fixes #2101
Newer OpenAI models (o3, o4-mini, gpt-5, etc.) reject the legacy max_tokens parameter:
```
400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.
```
Added promoteMaxTokens() in the openai-compat executor; it renames max_tokens to max_completion_tokens before forwarding the request upstream. If max_completion_tokens is already set, it just removes the old field.
Applied in both Execute and ExecuteStream paths.
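The behavior described above can be sketched with the standard library alone (the PR itself operates on raw bytes with gjson/sjson; the helper name and field keys are from the PR, while this map-based implementation is a minimal illustration, assuming invalid JSON is passed through unchanged):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promoteMaxTokens renames the legacy max_tokens field to
// max_completion_tokens. If max_completion_tokens is already present,
// only the legacy field is dropped. Payloads that are not valid JSON
// are returned unchanged, so no value is ever silently lost.
func promoteMaxTokens(payload []byte) []byte {
	var body map[string]any
	if err := json.Unmarshal(payload, &body); err != nil {
		return payload // not valid JSON; leave the payload untouched
	}
	mt, ok := body["max_tokens"]
	if !ok {
		return payload // nothing to promote
	}
	if _, exists := body["max_completion_tokens"]; !exists {
		body["max_completion_tokens"] = mt
	}
	delete(body, "max_tokens")
	out, err := json.Marshal(body)
	if err != nil {
		return payload
	}
	return out
}

func main() {
	in := []byte(`{"model":"gpt-5","max_tokens":1024}`)
	fmt.Printf("%s\n", promoteMaxTokens(in))
}
```

Note that json.Marshal emits map keys in sorted order, so round-tripping through a map (unlike sjson's byte-level edits) does not preserve the original field order.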