Fix/max tokens to max completion tokens 2101 #2143

Open
Muran-prog wants to merge 4 commits into router-for-me:main from Muran-prog:fix/max-tokens-to-max-completion-tokens-2101

Conversation

@Muran-prog (Contributor)

Fixes #2101

Newer OpenAI models (o3, o4-mini, gpt-5, etc.) reject the legacy max_tokens parameter:
400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.

Added promoteMaxTokens() in the openai-compat executor; it renames max_tokens to max_completion_tokens before forwarding the request upstream. If max_completion_tokens is already set, it just removes the old field.

Applied in both Execute and ExecuteStream paths.
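The described behavior can be sketched with the standard library alone. Note this is only an illustration: the PR itself operates on raw bytes with gjson/sjson, and the real signature may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promoteMaxTokens mirrors the behavior described above using encoding/json;
// the actual helper in the PR edits raw bytes with gjson/sjson.
func promoteMaxTokens(payload []byte) []byte {
	var m map[string]any
	if err := json.Unmarshal(payload, &m); err != nil {
		return payload // leave invalid (or empty) JSON untouched
	}
	if v, ok := m["max_tokens"]; ok {
		// Only promote when the new field is not already present.
		if _, exists := m["max_completion_tokens"]; !exists {
			m["max_completion_tokens"] = v
		}
		// The legacy field is always removed.
		delete(m, "max_tokens")
	}
	out, err := json.Marshal(m)
	if err != nil {
		return payload
	}
	return out
}

func main() {
	fmt.Println(string(promoteMaxTokens([]byte(`{"model":"gpt-5","max_tokens":1024}`))))
	// → {"max_completion_tokens":1024,"model":"gpt-5"}
}
```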

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an incompatibility issue with newer OpenAI models by updating how token limits are handled. It introduces a mechanism to automatically translate the deprecated max_tokens parameter to the required max_completion_tokens parameter, ensuring that requests to these models are processed correctly without errors. This change improves the robustness and forward compatibility of the system when interacting with evolving OpenAI APIs.

Highlights

  • OpenAI API Compatibility: Implemented a fix to ensure compatibility with newer OpenAI models that reject the legacy max_tokens parameter in favor of max_completion_tokens.
  • Parameter Promotion Logic: Introduced a promoteMaxTokens function that renames max_tokens to max_completion_tokens if max_completion_tokens is not already present. If max_completion_tokens exists, max_tokens is simply removed.
  • Execution Path Integration: Applied the promoteMaxTokens logic to both the Execute and ExecuteStream paths within the OpenAICompatExecutor to cover all relevant API calls.
  • Comprehensive Testing: Added a new test file with dedicated unit tests for the promoteMaxTokens function, covering various scenarios like renaming, existing max_completion_tokens, no max_tokens, and empty payloads.


Changelog
  • internal/runtime/executor/openai_compat_executor.go
    • Imported the gjson package for JSON parsing.
    • Called promoteMaxTokens in the Execute method to adapt request payloads.
    • Called promoteMaxTokens in the ExecuteStream method to adapt streaming request payloads.
    • Added the promoteMaxTokens function to handle the renaming and removal of max_tokens.
  • internal/runtime/executor/promote_max_tokens_test.go
    • Added a new test file.
    • Included TestPromoteMaxTokens_Rename to verify renaming functionality.
    • Included TestPromoteMaxTokens_AlreadySet to ensure existing max_completion_tokens are preserved.
    • Included TestPromoteMaxTokens_NoOp to confirm no changes when max_tokens is absent.
    • Included TestPromoteMaxTokens_EmptyPayload to test handling of nil or empty input payloads.
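The listed test scenarios can be sketched as a self-contained, table-driven check. The helper below re-implements the described behavior with encoding/json purely for illustration; the repo's version operates on raw bytes with gjson/sjson, and the real tests use Go's testing package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-in for the helper under test (the repo version uses gjson/sjson).
func promoteMaxTokens(payload []byte) []byte {
	var m map[string]any
	if json.Unmarshal(payload, &m) != nil {
		return payload // covers the nil/empty-payload case
	}
	if v, ok := m["max_tokens"]; ok {
		if _, exists := m["max_completion_tokens"]; !exists {
			m["max_completion_tokens"] = v
		}
		delete(m, "max_tokens")
	}
	out, err := json.Marshal(m)
	if err != nil {
		return payload
	}
	return out
}

func main() {
	// Case names follow the changelog's test list above.
	cases := []struct{ name, in, want string }{
		{"Rename", `{"max_tokens":5}`, `{"max_completion_tokens":5}`},
		{"AlreadySet", `{"max_completion_tokens":9,"max_tokens":5}`, `{"max_completion_tokens":9}`},
		{"NoOp", `{"model":"o3"}`, `{"model":"o3"}`},
		{"EmptyPayload", ``, ``},
	}
	for _, c := range cases {
		got := string(promoteMaxTokens([]byte(c.in)))
		fmt.Printf("%s: got=%q ok=%v\n", c.name, got, got == c.want)
	}
}
```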

@gemini-code-assist (bot) left a comment
Code Review

This pull request correctly addresses the need to rename max_tokens to max_completion_tokens for newer OpenAI models by introducing a promoteMaxTokens function. The change is applied correctly in both streaming and non-streaming execution paths. However, I've identified a potential bug in the error handling within the new function that could lead to data loss under certain conditions, for which I've provided a suggestion.

Comment on lines +407 to +410
    if !gjson.GetBytes(payload, "max_completion_tokens").Exists() {
        payload, _ = sjson.SetBytes(payload, "max_completion_tokens", mt.Value())
    }
    payload, _ = sjson.DeleteBytes(payload, "max_tokens")

Severity: high

There's a potential for data loss here if the payload is not valid JSON. If sjson.SetBytes fails, it returns the original payload and the error is ignored. Subsequently, sjson.DeleteBytes is called on the original payload. If DeleteBytes succeeds, max_tokens is removed, but max_completion_tokens was never set, effectively losing the max_tokens value.

It's better to handle the error from sjson.SetBytes and return early to prevent this inconsistent state.

    if !gjson.GetBytes(payload, "max_completion_tokens").Exists() {
        var err error
        payload, err = sjson.SetBytes(payload, "max_completion_tokens", mt.Value())
        if err != nil {
            log.Warnf("promoteMaxTokens: failed to set max_completion_tokens, returning original payload: %v", err)
            return payload
        }
    }
    payload, _ = sjson.DeleteBytes(payload, "max_tokens")

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1100f46cca


    if !gjson.GetBytes(payload, "max_completion_tokens").Exists() {
        payload, _ = sjson.SetBytes(payload, "max_completion_tokens", mt.Value())
    }
    payload, _ = sjson.DeleteBytes(payload, "max_tokens")


P2: Preserve token-limit echo for Responses conversions

Deleting max_tokens here breaks a non-stream /v1/responses round-trip: ConvertOpenAIChatCompletionsResponseToOpenAIResponsesNonStream currently reconstructs max_output_tokens only from max_output_tokens or max_tokens in requestRawJSON (see internal/translator/openai/openai/responses/openai_openai-responses_response.go around lines 632-637). After this change, translated requests carry only max_completion_tokens, so the final Responses payload silently drops max_output_tokens, which is a behavior regression for clients relying on that echoed field.


@luispater (Collaborator) left a comment

This fixes the upstream request rewrite for newer OpenAI-style models, but it also changes the request payload that the response translators use for /v1/responses round-trips.

Blocking:

  • promoteMaxTokens() rewrites the translated payload before TranslateNonStream(...), but the non-stream Responses converter currently reconstructs max_output_tokens only from max_output_tokens or max_tokens. After this change, that path sees only max_completion_tokens, so /v1/responses responses can silently lose max_output_tokens.

Non-blocking:

  • The rewrite is applied to every openai-compatibility provider even though issue #2101 is specifically about GitHub Models / Azure-style newer models. Consider scoping this by provider/model or making it configurable.
  • The new tests only cover the helper and do not exercise executor-level round-trips for /v1/responses, which is why the regression above is not caught.

Test plan:

  • Reviewed PR metadata, diff, inline comments, and check results with gh.
  • Traced the non-stream execution path from OpenAICompatExecutor.Execute() into sdktranslator.TranslateNonStream() and the Responses response converter.
  • Did not run the PR branch locally.

…allback in Responses converter

Address review feedback:
- promoteMaxTokens now returns original payload on sjson.SetBytes error
- Responses converter (stream + non-stream) recognizes max_completion_tokens
  so max_output_tokens is preserved after promotion
@Muran-prog (Contributor, Author)

Updated based on feedback:

  • Responses converter (both stream and non-stream paths in openai_openai-responses_response.go) now uses max_completion_tokens as a fallback for max_output_tokens.
  • Fallback priority: max_output_tokens > max_completion_tokens > max_tokens.
  • promoteMaxTokens() now returns the original payload on sjson.SetBytes errors to prevent silent data loss.
  • Added 5 tests for the non-stream Responses converter covering all token limit field combinations (direct, promoted, legacy, etc.).
  • Kept provider scoping global for now, as max_completion_tokens is the standard for current OpenAI-compatible models.
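The fallback priority above can be sketched as follows. The helper name resolveMaxOutputTokens is hypothetical, and the actual converter reads the raw request JSON with gjson; this stdlib version only illustrates the ordering.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resolveMaxOutputTokens applies the priority described above:
// max_output_tokens > max_completion_tokens > max_tokens.
// Hypothetical helper for illustration; the repo's converter differs.
func resolveMaxOutputTokens(requestRawJSON []byte) (int64, bool) {
	var m map[string]any
	if json.Unmarshal(requestRawJSON, &m) != nil {
		return 0, false
	}
	// First matching key wins, in priority order.
	for _, key := range []string{"max_output_tokens", "max_completion_tokens", "max_tokens"} {
		if v, ok := m[key].(float64); ok {
			return int64(v), true
		}
	}
	return 0, false
}

func main() {
	// A request whose max_tokens was promoted still echoes its limit.
	v, ok := resolveMaxOutputTokens([]byte(`{"max_completion_tokens":2048,"max_tokens":1024}`))
	fmt.Println(v, ok) // → 2048 true
}
```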

@Muran-prog Muran-prog requested a review from luispater March 15, 2026 05:33


Development

Successfully merging this pull request may close these issues.

400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.
