fix: improve GLM-4.6 thinking token support for better compatibility #8643
Description
This PR improves GLM-4.6 thinking token support to ensure compatibility with various OpenAI-compatible endpoints, particularly addressing the issue reported with ik_llama.cpp.
Problem
As reported in #8547 and by @ChicoPinto70, the previous implementation in PR #8548 did not work with all OpenAI-compatible endpoints, specifically ik_llama.cpp. The issue was that the `thinking: { type: "enabled" }` parameter was always added for GLM-4.6 models.
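For illustration, the request previously sent to a GLM-4.6 endpoint looked roughly like this (a sketch; apart from `thinking`, these are ordinary chat-completions fields):

```ts
// Sketch of the pre-fix behavior: the thinking field was attached
// unconditionally for GLM-4.6 models, which is what broke strict
// OpenAI-compatible endpoints such as ik_llama.cpp (#8547).
const requestBody = {
	model: "glm-4.6",
	messages: [{ role: "user" as const, content: "Hello" }],
	stream: true,
	thinking: { type: "enabled" }, // always added before this PR
}
```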
Solution
This implementation takes a more conservative and compatible approach:
Key Changes:
1. Optional thinking parameter: The `thinking` parameter is now disabled by default and is only added when explicitly enabled via configuration (`openAiEnableThinkingParameter: true`).
2. Multiple parsing strategies: Thinking tokens are now handled through three different methods to ensure maximum compatibility (see the sketch after this list):
   - XML tag (`<think>...</think>`) parsing using XmlMatcher
   - The `reasoning_content` field in the response delta (as used by some implementations)
   - The `thinking` parameter for endpoints that support it
3. Comprehensive testing: Added extensive tests covering all scenarios, including default behavior, explicit enabling, XML tag parsing, `reasoning_content` handling, and mixed formats.
Benefits
Testing
All tests pass ✅.
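As a rough illustration of what is asserted, the default-off behavior can be tested along these lines (the `buildRequestBody` helper here is a hypothetical stand-in for the provider's request construction; the real assertions live in `base-openai-compatible-provider.spec.ts`):

```ts
import { describe, it, expect } from "vitest"

// Hypothetical stand-in for the provider's request construction.
function buildRequestBody(opts: { modelId: string; openAiEnableThinkingParameter?: boolean }) {
	const isGLM46 = opts.modelId.toLowerCase().includes("glm-4.6")
	return {
		model: opts.modelId,
		...(isGLM46 && opts.openAiEnableThinkingParameter ? { thinking: { type: "enabled" } } : {}),
	}
}

describe("GLM-4.6 thinking parameter", () => {
	it("is omitted by default for maximum endpoint compatibility", () => {
		const body = buildRequestBody({ modelId: "glm-4.6" }) as Record<string, unknown>
		expect(body.thinking).toBeUndefined()
	})

	it("is added only when explicitly enabled", () => {
		const body = buildRequestBody({
			modelId: "glm-4.6",
			openAiEnableThinkingParameter: true,
		}) as Record<string, unknown>
		expect(body.thinking).toEqual({ type: "enabled" })
	})
})
```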
Related Issues
Fixes #8547
Addresses feedback from PR #8548
For @kavehsfv
This implementation ensures GLM-4.6 thinking tokens work across different OpenAI-compatible endpoints. Once merged, it will be included in the next release cycle. The fix is backward-compatible and should work with your setup.
cc: @ChicoPinto70 - This should now work with your ik_llama.cpp setup. The thinking parameter is disabled by default for maximum compatibility.
Important
Improves GLM-4.6 thinking token support by making the thinking parameter optional and adding multiple parsing strategies for better compatibility.
- The `thinking` parameter is now optional for GLM-4.6 models, enabled via `openAiEnableThinkingParameter`.
- Thinking tokens are parsed via XML `<think>` tags, the `reasoning_content` delta field, and the optional `thinking` parameter.
- Added tests in `base-openai-compatible-provider.spec.ts` for default behavior, explicit enabling, XML tag parsing, `reasoning_content` handling, and mixed formats.
- Updated `createStream()` and `createMessage()` in `base-openai-compatible-provider.ts` to handle the new logic for thinking tokens.
- Added `isGLM46Model()` and `shouldAddThinkingParameter()` to determine model type and parameter inclusion.
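A minimal sketch of what those two helpers might look like (the exact matching rule is an assumption inferred from the summary above; the real definitions are in `base-openai-compatible-provider.ts`):

```ts
// Sketch: detect GLM-4.6 model ids (the exact matching rule is an assumption).
function isGLM46Model(modelId: string): boolean {
	return modelId.toLowerCase().includes("glm-4.6")
}

// Sketch: the thinking parameter is sent only when the model is GLM-4.6
// AND the user opted in via openAiEnableThinkingParameter.
function shouldAddThinkingParameter(
	modelId: string,
	options: { openAiEnableThinkingParameter?: boolean },
): boolean {
	return isGLM46Model(modelId) && options.openAiEnableThinkingParameter === true
}

// Usage: only then does the request body gain the thinking field.
const body = {
	model: "glm-4.6",
	...(shouldAddThinkingParameter("glm-4.6", { openAiEnableThinkingParameter: true })
		? { thinking: { type: "enabled" } }
		: {}),
}
console.log(body) // { model: "glm-4.6", thinking: { type: "enabled" } }
```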