
Conversation

roomote[bot]

@roomote roomote bot commented Oct 13, 2025

Description

This PR improves GLM-4.6 thinking token support to ensure compatibility with various OpenAI-compatible endpoints, particularly addressing the issue reported with ik_llama.cpp.

Problem

As reported in #8547 and by @ChicoPinto70, the previous implementation in PR #8548 did not work with all OpenAI-compatible endpoints, specifically ik_llama.cpp. The issue was that:

  • The thinking: { type: "enabled" } parameter was always added for GLM-4.6 models
  • Some endpoints (like ik_llama.cpp) may not recognize this parameter and could reject requests or fail to process them correctly

Solution

This implementation takes a more conservative and compatible approach:

Key Changes:

  1. Optional thinking parameter: The thinking parameter is now disabled by default and only added when explicitly enabled via configuration (openAiEnableThinkingParameter: true)

  2. Multiple parsing strategies: The implementation now handles thinking tokens through three different methods to ensure maximum compatibility (see the sketch after this list):

    • Parsing of XML tags (<think>...</think>) using XmlMatcher
    • Direct reasoning_content field in the response delta (as used by some implementations)
    • The optional thinking parameter for endpoints that support it
  3. Comprehensive testing: Added extensive tests covering all scenarios including:

    • Default behavior (no thinking parameter)
    • Explicit enabling of thinking parameter
    • XML tag parsing for GLM-4.6
    • reasoning_content field handling
    • Mixed format support
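
A minimal sketch of the two response-side strategies (not the PR's actual code, which uses the repo's XmlMatcher helper; the delta shape and function names here are illustrative only):

interface StreamDelta {
	content?: string
	reasoning_content?: string // non-standard field some endpoints emit
}

function parseThinkingDelta(delta: StreamDelta): { text: string; reasoning: string } {
	// Strategy 1: the endpoint reports reasoning directly via reasoning_content.
	if (delta.reasoning_content) {
		return { text: delta.content ?? "", reasoning: delta.reasoning_content }
	}
	// Strategy 2: reasoning arrives inline as <think>...</think> tags.
	// A real streaming parser must handle tags split across chunks; this
	// simplified version assumes a complete tag pair arrives within one chunk.
	const content = delta.content ?? ""
	const match = content.match(/<think>([\s\S]*?)<\/think>/)
	if (match) {
		return { text: content.replace(match[0], ""), reasoning: match[1] }
	}
	return { text: content, reasoning: "" }
}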

Benefits

  • ik_llama.cpp compatibility: Works with local ik_llama.cpp setups by not sending unsupported parameters by default
  • Flexibility: Can be configured to work with different endpoint implementations
  • Backward compatibility: Does not break existing setups
  • Future-proof: Can handle thinking tokens regardless of how the endpoint provides them

Testing

All tests pass ✅:

  • Unit tests: 13 passed
  • Type checking: Successful
  • Linting: No issues

Related Issues

Fixes #8547
Addresses feedback from PR #8548

For @kavehsfv

This implementation ensures GLM-4.6 thinking tokens work across different OpenAI-compatible endpoints. Once merged, it will be included in the next release cycle. The fix is backward-compatible and should work with your setup.

cc: @ChicoPinto70 - This should now work with your ik_llama.cpp setup. The thinking parameter is disabled by default for maximum compatibility.
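
For setups whose endpoint does understand the parameter, opting in might look roughly like this (openAiEnableThinkingParameter is the option added by this PR; the surrounding field names and values are illustrative, not the exact settings schema):

const providerSettings = {
	openAiBaseUrl: "http://localhost:8080/v1", // example local endpoint
	openAiModelId: "glm-4.6",
	openAiEnableThinkingParameter: true, // opt in to sending thinking: { type: "enabled" }
}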


Important

Improves GLM-4.6 thinking token support by making the thinking parameter optional and adding multiple parsing strategies for better compatibility.

  • Behavior:
    • thinking parameter is now optional for GLM-4.6 models, enabled via openAiEnableThinkingParameter.
    • Supports multiple parsing strategies for thinking tokens: XML tags, reasoning_content, and optional thinking parameter.
  • Testing:
    • Added tests in base-openai-compatible-provider.spec.ts for default behavior, explicit enabling, XML tag parsing, reasoning_content handling, and mixed formats.
  • Functions:
    • Updated createStream() and createMessage() in base-openai-compatible-provider.ts to handle new logic for thinking tokens.
    • Added isGLM46Model() and shouldAddThinkingParameter() to determine model type and parameter inclusion (see the sketch below).
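
A sketch of the gating described in the Functions bullet above (the real implementations live in base-openai-compatible-provider.ts; only the option name and the thinking payload shape come from this PR, the rest is illustrative):

function shouldAddThinkingParameter(modelId: string, enableThinking?: boolean): boolean {
	// The thinking block is attached only for GLM-4.6 models AND when the user
	// has explicitly opted in via openAiEnableThinkingParameter.
	const lowerModel = modelId.toLowerCase()
	const isGLM46 = lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
	return isGLM46 && enableThinking === true
}

function buildRequestBody(modelId: string, enableThinking?: boolean): Record<string, unknown> {
	const body: Record<string, unknown> = { model: modelId, stream: true }
	if (shouldAddThinkingParameter(modelId, enableThinking)) {
		body["thinking"] = { type: "enabled" } // sent only on explicit opt-in
	}
	return body
}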

This description was created by Ellipsis for d9e20b2.

- Make thinking parameter optional and disabled by default for ik_llama.cpp compatibility
- Always parse reasoning_content from response for endpoints that provide it
- Always parse XML-wrapped thinking tokens regardless of endpoint
- Add comprehensive tests for all scenarios
- Only enable thinking parameter when explicitly configured

This ensures GLM-4.6 works with various OpenAI-compatible endpoints, including
ik_llama.cpp, which may not support the thinking parameter but might provide
thinking tokens through other means such as reasoning_content or XML tags.

Fixes #8547
@roomote roomote bot requested review from cte, jr and mrubens as code owners October 13, 2025 20:47
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) labels Oct 13, 2025
protected isGLM46Model(modelId: string): boolean {
	// Check for various GLM-4.6 model naming patterns
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
}

The third condition lowerModel === "glm-4.6" is redundant since it's already covered by the first condition lowerModel.includes("glm-4.6"). The includes check will return true whenever the exact match check would return true.

Suggested change:
- return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
+ return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Oct 13, 2025

Labels

  • bug (Something isn't working)
  • Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)
  • size:L (This PR changes 100-499 lines, ignoring generated files.)

Projects

Status: Triage

Development

Successfully merging this pull request may close these issues.

[BUG] GLM-4.6 not generating thinking tokens when using OpenAI-compatible custom endpoint

2 participants