
Conversation

roomote[bot]

@roomote roomote bot commented Oct 13, 2025

Description

This PR improves GLM-4.6 thinking token support to ensure compatibility with various OpenAI-compatible endpoints, particularly addressing the issue reported with ik_llama.cpp.

Problem

As reported in #8547 and by @ChicoPinto70, the previous implementation in PR #8548 did not work with all OpenAI-compatible endpoints, specifically ik_llama.cpp. The issue was that:

  • The thinking: { type: "enabled" } parameter was always added for GLM-4.6 models
  • Some endpoints (like ik_llama.cpp) may not recognize this parameter and could reject requests or fail to process them correctly

Solution

This implementation takes a more conservative and compatible approach:

Key Changes:

  1. Optional thinking parameter: The thinking parameter is now disabled by default and only added when explicitly enabled via configuration (openAiEnableThinkingParameter: true)

  2. Multiple parsing strategies: The implementation now handles thinking tokens through three different methods to ensure maximum compatibility (see the sketch after this list):

    • Parsing of XML tags (<think>...</think>) using XmlMatcher
    • Direct reasoning_content field in the response delta (as used by some implementations)
    • The optional thinking parameter for endpoints that support it
  3. Comprehensive testing: Added extensive tests covering all scenarios including:

    • Default behavior (no thinking parameter)
    • Explicit enabling of thinking parameter
    • XML tag parsing for GLM-4.6
    • reasoning_content field handling
    • Mixed format support
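
A minimal sketch of the two response-side strategies (not the PR's actual code, which uses the repo's XmlMatcher helper; the delta shape and function names here are illustrative only):

interface StreamDelta {
	content?: string
	reasoning_content?: string // non-standard field some endpoints emit
}

function parseThinkingDelta(delta: StreamDelta): { text: string; reasoning: string } {
	// Strategy 1: the endpoint reports reasoning directly via reasoning_content.
	if (delta.reasoning_content) {
		return { text: delta.content ?? "", reasoning: delta.reasoning_content }
	}
	// Strategy 2: reasoning arrives inline as <think>...</think> tags.
	// A real streaming parser must handle tags split across chunks; this
	// simplified version assumes a complete tag pair arrives within one chunk.
	const content = delta.content ?? ""
	const match = content.match(/<think>([\s\S]*?)<\/think>/)
	if (match) {
		return { text: content.replace(match[0], ""), reasoning: match[1] }
	}
	return { text: content, reasoning: "" }
}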

Benefits

  • ik_llama.cpp compatibility: Works with local ik_llama.cpp setups by not sending unsupported parameters by default
  • Flexibility: Can be configured to work with different endpoint implementations
  • Backward compatibility: Does not break existing setups
  • Future-proof: Can handle thinking tokens regardless of how the endpoint provides them

Testing

All tests pass ✅:

  • Unit tests: 13 passed
  • Type checking: Successful
  • Linting: No issues

Related Issues

Fixes #8547
Addresses feedback from PR #8548

For @kavehsfv

This implementation ensures GLM-4.6 thinking tokens work across different OpenAI-compatible endpoints. Once merged, it will be included in the next release cycle. The fix is backward-compatible and should work with your setup.

cc: @ChicoPinto70 - This should now work with your ik_llama.cpp setup. The thinking parameter is disabled by default for maximum compatibility.
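
For setups whose endpoint does understand the parameter, opting in might look roughly like this (openAiEnableThinkingParameter is the option added by this PR; the surrounding field names and values are illustrative, not the exact settings schema):

const providerSettings = {
	openAiBaseUrl: "http://localhost:8080/v1", // example local endpoint
	openAiModelId: "glm-4.6",
	openAiEnableThinkingParameter: true, // opt in to sending thinking: { type: "enabled" }
}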


Important

Improves GLM-4.6 thinking token support by making the thinking parameter optional and adding multiple parsing strategies for better compatibility.

  • Behavior:
    • thinking parameter is now optional for GLM-4.6 models, enabled via openAiEnableThinkingParameter.
    • Supports multiple parsing strategies for thinking tokens: XML tags, reasoning_content, and optional thinking parameter.
  • Testing:
    • Added tests in base-openai-compatible-provider.spec.ts for default behavior, explicit enabling, XML tag parsing, reasoning_content handling, and mixed formats.
  • Functions:
    • Updated createStream() and createMessage() in base-openai-compatible-provider.ts to handle new logic for thinking tokens.
    • Added isGLM46Model() and shouldAddThinkingParameter() to determine model type and parameter inclusion (see the sketch below).
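
A sketch of the gating described in the Functions bullet above (the real implementations live in base-openai-compatible-provider.ts; only the option name and the thinking payload shape come from this PR, the rest is illustrative):

function shouldAddThinkingParameter(modelId: string, enableThinking?: boolean): boolean {
	// The thinking block is attached only for GLM-4.6 models AND when the user
	// has explicitly opted in via openAiEnableThinkingParameter.
	const lowerModel = modelId.toLowerCase()
	const isGLM46 = lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")
	return isGLM46 && enableThinking === true
}

function buildRequestBody(modelId: string, enableThinking?: boolean): Record<string, unknown> {
	const body: Record<string, unknown> = { model: modelId, stream: true }
	if (shouldAddThinkingParameter(modelId, enableThinking)) {
		body["thinking"] = { type: "enabled" } // sent only on explicit opt-in
	}
	return body
}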

This description was created by Ellipsis for d9e20b2.

- Make thinking parameter optional and disabled by default for ik_llama.cpp compatibility
- Always parse reasoning_content from response for endpoints that provide it
- Always parse XML-wrapped thinking tokens regardless of endpoint
- Add comprehensive tests for all scenarios
- Only enable thinking parameter when explicitly configured

This ensures GLM-4.6 works with various OpenAI-compatible endpoints, including
ik_llama.cpp, which may not support the thinking parameter but might provide
thinking tokens through other means such as reasoning_content or XML tags.

Fixes #8547
@roomote roomote bot requested review from cte, jr and mrubens as code owners October 13, 2025 20:47
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) labels Oct 13, 2025
protected isGLM46Model(modelId: string): boolean {
	// Check for various GLM-4.6 model naming patterns
	const lowerModel = modelId.toLowerCase()
	return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
}

The third condition lowerModel === "glm-4.6" is redundant since it's already covered by the first condition lowerModel.includes("glm-4.6"). The includes check will return true whenever the exact match check would return true.

Suggested change:
- return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6") || lowerModel === "glm-4.6"
+ return lowerModel.includes("glm-4.6") || lowerModel.includes("glm-4-6")

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Oct 13, 2025

Labels

  • bug (Something isn't working)
  • Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.)
  • size:L (This PR changes 100-499 lines, ignoring generated files.)

Projects

Status: Triage

Development

Successfully merging this pull request may close these issues.

[BUG] GLM-4.6 not generating thinking tokens when using OpenAI-compatible custom endpoint

2 participants