fix: handle non-tool appended messages in TITO incremental tokenization by guapisolo · Pull Request #949 · radixark/miles

guapisolo · 2026-04-07T20:36:40Z

Summary

This PR fixes TITO incremental tokenization for non-assistant appends by replacing the old single dummy-diff approach with role-aware segmentation.

Previously, append tokenization could become unstable when appending user/system messages (and mixed tool + non-tool sequences), because boundary tokens were inferred from a single synthetic context.
Now we tokenize appended content segment-by-segment using role-specific synthetic prefixes.
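
As a rough illustration of the idea, the sketch below tokenizes an appended message by rendering it after a synthetic prefix and keeping only the tokens past that prefix. The toy template, the character-level tokenizer, and all names here (`render`, `toy_tokenize`, `incremental_tokens`, the dummy messages) are assumptions made for illustration; they are not the code in tito_tokenizer.py.

```python
Message = dict[str, str]


def render(messages: list[Message]) -> str:
    """Toy chat template (hypothetical): each turn becomes '<|role|>content\\n'."""
    return "".join(f"<|{m['role']}|>{m['content']}\n" for m in messages)


def toy_tokenize(text: str) -> list[str]:
    """Stand-in tokenizer (hypothetical): one token per character."""
    return list(text)


def incremental_tokens(prefix: list[Message], appended: list[Message]) -> list[str]:
    """Return the tokens contributed by `appended` when rendered after `prefix`."""
    base = toy_tokenize(render(prefix))
    full = toy_tokenize(render(prefix + appended))
    # The diff is only meaningful if appending does not change how the prefix renders.
    if full[: len(base)] != base:
        raise ValueError("template re-rendered the prefix; the diff is not valid")
    return full[len(base):]


# Hypothetical dummy messages standing in for the role-specific synthetic prefixes.
DUMMY_SYSTEM = {"role": "system", "content": "dummy"}
DUMMY_ASSISTANT = {"role": "assistant", "content": ""}

user_tokens = incremental_tokens([DUMMY_SYSTEM], [{"role": "user", "content": "hi"}])
tool_tokens = incremental_tokens(
    [DUMMY_SYSTEM, DUMMY_ASSISTANT], [{"role": "tool", "content": "42"}]
)
```

With a real chat template the boundary tokens depend on which role precedes the appended turn, which is why the choice of synthetic prefix matters.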

What Changed

Refactored tokenize_additional_non_assistant in tito_tokenizer.py into a role-segmented pipeline.
Added segment splitting rules:
- Contiguous tool messages are grouped and tokenized together, using [dummy_system, dummy_assistant] as the synthetic prefix for the additional tokenization. dummy_user is left out to avoid think-block boundary issues across models.
- Each user and system message is tokenized as a singleton segment, using [dummy_system] as the synthetic prefix for the additional tokenization.
Kept the Qwen-3.5 template's behavior of banning system messages and reverted the previous modifications.
Modified the Qwen-3.5 chat template logic to skip the "no user message" check.
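
The splitting rules above can be sketched as follows. This is a minimal illustration under stated assumptions: `split_segments`, `synthetic_prefix`, and the dummy message contents are hypothetical names chosen here, not the helpers in tito_tokenizer.py.

```python
from itertools import groupby
from typing import Any

Message = dict[str, Any]

# Hypothetical placeholders for the dummy messages described in the rules.
DUMMY_SYSTEM: Message = {"role": "system", "content": "dummy system"}
DUMMY_ASSISTANT: Message = {"role": "assistant", "content": ""}


def split_segments(appended: list[Message]) -> list[list[Message]]:
    """Group contiguous tool messages; keep each user/system message on its own."""
    segments: list[list[Message]] = []
    for role, run in groupby(appended, key=lambda m: m["role"]):
        run_list = list(run)
        if role == "tool":
            segments.append(run_list)  # one segment per contiguous tool run
        elif role in ("user", "system"):
            segments.extend([m] for m in run_list)  # singleton segments
        else:
            raise ValueError(f"unexpected role in non-assistant append: {role!r}")
    return segments


def synthetic_prefix(segment: list[Message]) -> list[Message]:
    """Pick the role-specific synthetic prefix for a segment."""
    if segment[0]["role"] == "tool":
        return [DUMMY_SYSTEM, DUMMY_ASSISTANT]
    return [DUMMY_SYSTEM]


appended = [
    {"role": "tool", "content": "a"},
    {"role": "tool", "content": "b"},
    {"role": "user", "content": "c"},
    {"role": "system", "content": "d"},
    {"role": "tool", "content": "e"},
]
segments = split_segments(appended)
prefix_roles = [[m["role"] for m in synthetic_prefix(s)] for s in segments]
```

Here the two leading tool messages form one segment, while the user message, the system message, and the trailing tool message each form their own.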

Tests

Updated unit tests (`test_tito_tokenizer.py`)

Added coverage for user messages.
Added coverage for the intermediate system message check, which detects templates that ban intermediate system messages, such as Qwen3.5.
test_tito_tokenizer_model_matrix.py runs many additional tokenization checks across different models.

New model-matrix tests (`test_tito_tokenizer_model_matrix.py`)

Added cross-model matrix cases and explanations of existing failures.
Assertions use TokenSeqComparator to check for mismatches, excluding assistant-text diffs.
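
To make the kind of check concrete, here is a minimal sketch of a token-sequence comparison built on the standard library's `difflib.SequenceMatcher`. It is not the repository's TokenSeqComparator, whose interface is not shown in this PR; `token_mismatches` is a hypothetical helper, and it does not model the exemption for assistant-text diffs.

```python
from difflib import SequenceMatcher


def token_mismatches(
    expected: list[int], actual: list[int]
) -> list[tuple[str, list[int], list[int]]]:
    """Return (tag, expected_span, actual_span) for every non-matching region."""
    # autojunk=False keeps the matcher from discarding frequent tokens as junk.
    matcher = SequenceMatcher(a=expected, b=actual, autojunk=False)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Token IDs below are made up for illustration.
same = token_mismatches([1, 2, 3, 4], [1, 2, 3, 4])
diff = token_mismatches([1, 2, 3, 4], [1, 2, 9, 4])
```

An empty result means the incrementally tokenized sequence matches the reference; any other result pinpoints where the boundary tokens diverge.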

gemini-code-assist

Code Review

This pull request updates the _build_dummy_assistant function to correctly handle leading tool messages within appended_messages, ensuring proper turn-transition tokens are rendered. Feedback suggests restoring the reasoning_content field to maintain consistency with reasoning models and simplifying the tool_calls list comprehension using enumerate on a slice of the messages.

gemini-code-assist · 2026-04-07T20:38:30Z

miles/utils/chat_template_utils/tito_tokenizer.py

+    assistant: dict[str, Any] = {"role": "assistant", "content": ""}
+    if num_leading_tools > 0:
+        assistant["tool_calls"] = [
            {
-                "id": resp.get("tool_call_id") or f"call0000{i}",
+                "id": appended_messages[i].get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {
-                    "name": resp.get("name") or "dummy_func",
+                    "name": appended_messages[i].get("name") or "dummy_func",
                    "arguments": {},
                },
            }
-            for i, resp in enumerate(tool_responses)
-        ],
-    }
+            for i in range(num_leading_tools)
+        ]


The reasoning_content field was removed from the dummy assistant message. This field was present in the previous implementation (line 32) and is often necessary for reasoning models to correctly render turn boundaries (e.g., to ensure the reasoning block is closed). Unless its removal was intentional to fix a specific issue, it should be restored.

Additionally, the tool_calls generation can be simplified using enumerate on a slice of appended_messages.

Suggested change

    assistant: dict[str, Any] = {
        "role": "assistant",
        "content": "",
        "reasoning_content": " ",
    }
    if num_leading_tools > 0:
        assistant["tool_calls"] = [
            {
                "id": msg.get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {
                    "name": msg.get("name") or "dummy_func",
                    "arguments": {},
                },
            }
            for i, msg in enumerate(appended_messages[:num_leading_tools])
        ]
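
For reference, the suggestion above can be wrapped into a self-contained function. This is only a sketch: the function name `build_dummy_assistant` and the way `num_leading_tools` is counted (contiguous tool messages at the start of the append) are assumptions, since the hunk does not show how the PR computes it.

```python
from typing import Any


def build_dummy_assistant(appended_messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a dummy assistant turn whose tool_calls match the leading tool messages."""
    # Assumption: count the contiguous tool messages at the start of the append.
    num_leading_tools = 0
    for msg in appended_messages:
        if msg.get("role") != "tool":
            break
        num_leading_tools += 1

    assistant: dict[str, Any] = {
        "role": "assistant",
        "content": "",
        "reasoning_content": " ",
    }
    if num_leading_tools > 0:
        assistant["tool_calls"] = [
            {
                # Fall back to placeholder values when the tool message omits them.
                "id": msg.get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {
                    "name": msg.get("name") or "dummy_func",
                    "arguments": {},
                },
            }
            for i, msg in enumerate(appended_messages[:num_leading_tools])
        ]
    return assistant


with_tools = build_dummy_assistant(
    [
        {"role": "tool", "tool_call_id": "abc", "name": "f", "content": "1"},
        {"role": "tool", "content": "2"},
        {"role": "user", "content": "next"},
    ]
)
without_tools = build_dummy_assistant([{"role": "user", "content": "hi"}])
```

When the append starts with a user message, the dummy assistant carries no `tool_calls`, so a template would render a plain turn transition instead of a tool-observation one.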

maocheng23 · 2026-04-08T21:53:01Z

miles/utils/chat_template_utils/tito_tokenizer.py

+    All ``tool`` messages in *appended_messages* (not just leading contiguous
+    ones) get matching ``tool_calls``.  If there are no tool messages the dummy
+    assistant has no ``tool_calls`` — so the template renders the correct
+    turn-transition tokens (e.g. ``<|user|>`` instead of ``<|observation|>``).


For my sanity check, now our TITO would only support GLM style? Can we still keep it flexible somehow?

If it would be too hard to support Qwen3 chat template, I think it's still good here

And we don't need reasoning_content any more?


This workaround for GLM 4.7 breaks the Qwen implementation. We need a fix to the Qwen3 chat template. This unblocks TITO development.

guapisolo · 2026-04-10T18:12:14Z

miles/utils/chat_template_utils/tito_tokenizer.py

+The default implementation incrementally tokenizes appended non-assistant turns
+with role-specific synthetic prefixes:
+
+- contiguous ``tool`` runs use ``[dummy_system, dummy_assistant]``


Removed the dummy user from the tool prefix to avoid boundary issues around think blocks.

- test_pretokenized_chat: eliminate _RAW_* import aliases by renaming local pytest params to _*_PARAMS suffix; replace 8 ids/values variables + 2 assert guards with _template_params() helper and filtered dicts
- test_tito_tokenizer_model_matrix: inline 4 single-use factory functions, merge 2 identical test functions into one, simplify parametrization to a single list comprehension, inline trivial _get_assistant_start_str

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Module docstring: user messages use [dummy_system] not [dummy_system, dummy_user]
- Remove dead _DUMMY_USER (unused after segmented rewrite)
- tokenize_additional_non_assistant docstring: mention user follow-ups
- Remove stale comment referencing deleted "No user query found" validation
- Update Qwen3.5 exclusion reason to reflect current template behavior
- _split_at docstring: include user in non-assistant roles
- Test module docstring: describe new segmentation/merge tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

guapisolo requested review from fzyzcjy, maocheng23 and yueming-yuan as code owners April 7, 2026 20:36

gemini-code-assist bot reviewed Apr 7, 2026


guapisolo added the run-ci-sglang label Apr 7, 2026

guapisolo changed the title ~~fix: handle non-tool appended messages in TITO dummy assistant~~ fix: Ban qwen3 tito model and handle non-tool appended messages in TITO dummy assistant Apr 7, 2026

guapisolo requested a review from yushengsu-thu as a code owner April 7, 2026 23:01

maocheng23 reviewed Apr 8, 2026


guapisolo force-pushed the fix/tito-non-tool-append branch 2 times, most recently from 78354bb to f66555b April 10, 2026 00:24

guapisolo changed the title ~~fix: Ban qwen3 tito model and handle non-tool appended messages in TITO dummy assistant~~ fix: handle non-tool appended messages in TITO incremental tokenization Apr 10, 2026

guapisolo force-pushed the fix/tito-non-tool-append branch 2 times, most recently from 5f454e2 to 045c32e April 10, 2026 06:23

guapisolo added 4 commits April 10, 2026 06:31

fix additional tokenize calc

9207776

Expand TITO model matrix coverage

54c0d17

fix tests

e524ede

fix

7a510af

guapisolo force-pushed the fix/tito-non-tool-append branch from bc4f7b9 to 7a510af April 10, 2026 06:31

Fix qwen3.5 pretokenized config and Nemotron assistant start mapping

ee462cb

guapisolo commented Apr 10, 2026


guapisolo and others added 4 commits April 10, 2026 19:48

revert intermediate system change for qwen35, add probe and fuse msg

876d0f2

fix: simplify docstring wording

4f66176

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

guapisolo wants to merge 9 commits into main from fix/tito-non-tool-append


2 participants
