
fix: handle non-tool appended messages in TITO incremental tokenization #949

Open
guapisolo wants to merge 9 commits into main from fix/tito-non-tool-append

@guapisolo (Collaborator) commented Apr 7, 2026

Summary

This PR fixes TITO incremental tokenization for non-assistant appends by replacing the old single dummy-diff approach with role-aware segmentation.

Previously, append tokenization could become unstable when appending user/system messages (and mixed tool + non-tool sequences), because boundary tokens were inferred from a single synthetic context.
Now we tokenize appended content segment-by-segment using role-specific synthetic prefixes.
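A minimal sketch of the per-segment "synthetic prefix + diff" idea, assuming the tokens for an appended segment are the difference between tokenizing the rendered prefix with and without that segment. `render_chat`, `toy_tokenize`, and `tokenize_segment` are hypothetical toy stand-ins (a ChatML-like template and whitespace splitting), not the functions in tito_tokenizer.py.

```python
Message = dict[str, str]


def render_chat(messages: list[Message]) -> str:
    """Toy ChatML-like template: <|role|>, newline, content, <|end|>, newline."""
    return "".join(f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages)


def toy_tokenize(text: str) -> list[str]:
    """Whitespace split; a stand-in for a real subword tokenizer."""
    return text.split()


def tokenize_segment(prefix: list[Message], segment: list[Message]) -> list[str]:
    """Tokens contributed by `segment` when rendered after the synthetic `prefix`."""
    prefix_tokens = toy_tokenize(render_chat(prefix))
    full_tokens = toy_tokenize(render_chat(prefix + segment))
    # The diff is only valid if the prefix tokens are an exact prefix of the full tokens.
    if full_tokens[: len(prefix_tokens)] != prefix_tokens:
        raise ValueError("synthetic prefix does not tokenize as a prefix of the full render")
    return full_tokens[len(prefix_tokens):]


dummy_system: Message = {"role": "system", "content": "dummy"}
user_msg: Message = {"role": "user", "content": "What is 2 + 2?"}
print(tokenize_segment([dummy_system], [user_msg]))
# ['<|user|>', 'What', 'is', '2', '+', '2?<|end|>']
```

The prefix check makes a bad synthetic context fail loudly instead of silently yielding wrong boundary tokens.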

What Changed

  • Refactored tokenize_additional_non_assistant in tito_tokenizer.py to a role-segmented pipeline.
  • Added segment splitting rules:
    • Contiguous tool messages are grouped and tokenized together. The additional tokenizer uses [dummy_system, dummy_assistant] as the synthetic prefix, without dummy_user, to avoid think-block cutting issues across models.
    • Each user and system message is tokenized as a singleton segment, with [dummy_system] as the synthetic prefix in the additional tokenizer.
  • Kept the Qwen3.5 template's behavior of banning intermediate system messages, and reverted the previous modifications.
  • Modified the Qwen3.5 chat template logic to skip the no-user-message check.
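The segmentation rules above can be sketched as follows. This is a hedged illustration, not the PR's code: `split_segments`, `synthetic_prefix`, and the `DUMMY_SYSTEM` content are hypothetical, and the dummy assistant's `tool_calls` reuse the id/name fallbacks (`call0000{i}`, `dummy_func`) visible in the review diff on this PR.

```python
from typing import Any

Message = dict[str, Any]

# Hypothetical placeholder; the real dummy messages in tito_tokenizer.py may differ.
DUMMY_SYSTEM: Message = {"role": "system", "content": "dummy system"}


def split_segments(messages: list[Message]) -> list[list[Message]]:
    """Group contiguous tool messages; each user/system message is its own segment."""
    segments: list[list[Message]] = []
    for msg in messages:
        role = msg["role"]
        if role == "tool" and segments and segments[-1][0]["role"] == "tool":
            segments[-1].append(msg)  # extend the current tool run
        elif role in ("tool", "user", "system"):
            segments.append([msg])  # start a new segment
        else:
            raise ValueError(f"unexpected appended role: {role!r}")
    return segments


def synthetic_prefix(segment: list[Message]) -> list[Message]:
    """Choose the role-specific synthetic prefix for one segment."""
    if segment[0]["role"] != "tool":
        return [DUMMY_SYSTEM]  # user/system singletons
    # Tool runs: a dummy assistant whose tool_calls match the tool messages,
    # and no dummy user turn. reasoning_content is left out of this sketch.
    dummy_assistant: Message = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": m.get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {"name": m.get("name") or "dummy_func", "arguments": {}},
            }
            for i, m in enumerate(segment)
        ],
    }
    return [DUMMY_SYSTEM, dummy_assistant]


appended = [
    {"role": "tool", "content": "42", "tool_call_id": "a"},
    {"role": "tool", "content": "43", "tool_call_id": "b"},
    {"role": "user", "content": "thanks"},
    {"role": "system", "content": "be brief"},
]
segments = split_segments(appended)
print([[m["role"] for m in seg] for seg in segments])
# [['tool', 'tool'], ['user'], ['system']]
```

Each segment would then be tokenized against its own prefix, and the resulting token runs joined in order.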

Tests

Updated unit tests (test_tito_tokenizer.py)

  • Added coverage for appended user messages.
  • Added coverage for the intermediate-system check, which detects templates (such as Qwen3.5) that ban intermediate system messages.
  • test_tito_tokenizer_model_matrix.py runs additional incremental-tokenization checks across many models.
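A hedged sketch of how an intermediate-system check can detect such templates: render a probe conversation that puts a system message after other turns, and treat a rendering error as "banned". Both toy templates, the probe, and the error text are made up for illustration; a real Jinja chat template would typically raise a template error rather than `ValueError`.

```python
from typing import Callable

Message = dict[str, str]
Renderer = Callable[[list[Message]], str]


def toy_strict_template(messages: list[Message]) -> str:
    """Toy template that rejects any system message not in position 0."""
    for i, msg in enumerate(messages):
        if msg["role"] == "system" and i != 0:
            raise ValueError("system message must come first")  # made-up error text
    return "".join(f"<|{m['role']}|>{m['content']}" for m in messages)


def toy_lenient_template(messages: list[Message]) -> str:
    """Toy template that accepts system messages anywhere."""
    return "".join(f"<|{m['role']}|>{m['content']}" for m in messages)


def bans_intermediate_system(render: Renderer) -> bool:
    """Return True if `render` refuses a mid-conversation system message."""
    probe: list[Message] = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
        {"role": "system", "content": "new instruction"},
    ]
    try:
        render(probe)
    except Exception:  # real templates may raise a template-engine error instead
        return True
    return False


print(bans_intermediate_system(toy_strict_template))   # True
print(bans_intermediate_system(toy_lenient_template))  # False
```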

New model-matrix tests (test_tito_tokenizer_model_matrix.py)

  • Added cross-model matrix cases, with explanations for known failures.
  • Assertions use TokenSeqComparator to check for token mismatches, ignoring differences in assistant text.
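The TokenSeqComparator API is not shown in this PR, so the following is only a stand-in built on Python's `difflib`: it reports token differences between an expected and an actual sequence, and drops those that fall entirely inside assistant-text spans. `non_assistant_mismatches` and the half-open `(start, end)` span format are assumptions.

```python
import difflib

Span = tuple[int, int]


def non_assistant_mismatches(
    expected: list[str], actual: list[str], assistant_spans: list[Span]
) -> list[tuple[str, int, int, int, int]]:
    """Diff two token sequences, ignoring differences inside assistant text.

    assistant_spans holds half-open [start, end) index ranges into `expected`
    that cover assistant-generated text, where mismatches are tolerated.
    """

    def inside_assistant(i1: int, i2: int) -> bool:
        # An opcode is tolerated only if its whole `expected` range lies in one span.
        return any(start <= i1 and i2 <= end for start, end in assistant_spans)

    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    return [
        op
        for op in matcher.get_opcodes()
        if op[0] != "equal" and not inside_assistant(op[1], op[2])
    ]


expected = ["<|user|>", "hi", "<|assistant|>", "hello", "there", "<|user|>", "ok"]
assistant_text_differs = ["<|user|>", "hi", "<|assistant|>", "hello", "world", "<|user|>", "ok"]
wrong_boundary = ["<|user|>", "hi", "<|assistant|>", "hello", "there", "<|observation|>", "ok"]
print(non_assistant_mismatches(expected, assistant_text_differs, [(3, 5)]))  # []
print(non_assistant_mismatches(expected, wrong_boundary, [(3, 5)]))  # [('replace', 5, 6, 5, 6)]
```

A differing turn-boundary token is reported, while a differing assistant reply is not.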

@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request updates the _build_dummy_assistant function to correctly handle leading tool messages within appended_messages, ensuring proper turn-transition tokens are rendered. Feedback suggests restoring the reasoning_content field to maintain consistency with reasoning models and simplifying the tool_calls list comprehension using enumerate on a slice of the messages.

Comment on lines +40 to +52
+assistant: dict[str, Any] = {"role": "assistant", "content": ""}
+if num_leading_tools > 0:
+    assistant["tool_calls"] = [
         {
-            "id": resp.get("tool_call_id") or f"call0000{i}",
+            "id": appended_messages[i].get("tool_call_id") or f"call0000{i}",
             "type": "function",
             "function": {
-                "name": resp.get("name") or "dummy_func",
+                "name": appended_messages[i].get("name") or "dummy_func",
                 "arguments": {},
             },
         }
-        for i, resp in enumerate(tool_responses)
-    ],
-}
+        for i in range(num_leading_tools)
+    ]
@gemini-code-assist (bot, Contributor)

Priority: medium

The reasoning_content field was removed from the dummy assistant message. This field was present in the previous implementation (line 32) and is often necessary for reasoning models to correctly render turn boundaries (e.g., to ensure the reasoning block is closed). Unless its removal was intentional to fix a specific issue, it should be restored.

Additionally, the tool_calls generation can be simplified using enumerate on a slice of appended_messages.

Suggested change
assistant: dict[str, Any] = {
    "role": "assistant",
    "content": "",
    "reasoning_content": " ",
}
if num_leading_tools > 0:
    assistant["tool_calls"] = [
        {
            "id": msg.get("tool_call_id") or f"call0000{i}",
            "type": "function",
            "function": {
                "name": msg.get("name") or "dummy_func",
                "arguments": {},
            },
        }
        for i, msg in enumerate(appended_messages[:num_leading_tools])
    ]
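To check the suggested change in isolation, here is a self-contained sketch that wraps it in a function. `build_dummy_assistant` and `count_leading_tools` are hypothetical names (the PR's `_build_dummy_assistant` signature is not shown here), and reading `num_leading_tools` as the contiguous run of `tool` messages at the start of `appended_messages` is an assumption based on the variable name.

```python
from typing import Any

Message = dict[str, Any]


def count_leading_tools(appended_messages: list[Message]) -> int:
    """Length of the contiguous run of tool messages at the start (assumed meaning)."""
    count = 0
    for msg in appended_messages:
        if msg.get("role") != "tool":
            break
        count += 1
    return count


def build_dummy_assistant(appended_messages: list[Message]) -> Message:
    """Hypothetical standalone wrapper around the suggested change."""
    num_leading_tools = count_leading_tools(appended_messages)
    assistant: Message = {
        "role": "assistant",
        "content": "",
        "reasoning_content": " ",
    }
    if num_leading_tools > 0:
        assistant["tool_calls"] = [
            {
                "id": msg.get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {
                    "name": msg.get("name") or "dummy_func",
                    "arguments": {},
                },
            }
            for i, msg in enumerate(appended_messages[:num_leading_tools])
        ]
    return assistant


appended = [
    {"role": "tool", "content": "1", "tool_call_id": "x", "name": "lookup"},
    {"role": "tool", "content": "2"},
    {"role": "user", "content": "next"},
]
dummy = build_dummy_assistant(appended)
print([call["id"] for call in dummy["tool_calls"]])  # ['x', 'call00001']
print("tool_calls" in build_dummy_assistant([{"role": "user", "content": "hi"}]))  # False
```

Enumerating the slice `appended_messages[:num_leading_tools]` yields the same ids and names as indexing with `range(num_leading_tools)`; it only avoids the repeated `appended_messages[i]` lookups.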

@guapisolo changed the title from "fix: handle non-tool appended messages in TITO dummy assistant" to "fix: Ban qwen3 tito model and handle non-tool appended messages in TITO dummy assistant" on Apr 7, 2026
@guapisolo requested a review from yushengsu-thu as a code owner on Apr 7, 2026, 23:01
All ``tool`` messages in *appended_messages* (not just leading contiguous
ones) get matching ``tool_calls``. If there are no tool messages the dummy
assistant has no ``tool_calls`` — so the template renders the correct
turn-transition tokens (e.g. ``<|user|>`` instead of ``<|observation|>``).
Contributor


For my sanity check, now our TITO would only support GLM style? Can we still keep it flexible somehow?

If it would be too hard to support Qwen3 chat template, I think it's still good here

Contributor


And we don't need reasoning_content any more?

@guapisolo (Collaborator, Author), Apr 8, 2026

> For my sanity check, now our TITO would only support GLM style? Can we still keep it flexible somehow?
>
> If it would be too hard to support Qwen3 chat template, I think it's still good here

This workaround for GLM-4.7 breaks the Qwen implementation; the Qwen3 chat template needs a fix. This unblocks TITO development.

@guapisolo force-pushed the fix/tito-non-tool-append branch 2 times, most recently from 78354bb to f66555b (Apr 10, 2026, 00:24)
@guapisolo changed the title from "fix: Ban qwen3 tito model and handle non-tool appended messages in TITO dummy assistant" to "fix: handle non-tool appended messages in TITO incremental tokenization" on Apr 10, 2026
@guapisolo force-pushed the fix/tito-non-tool-append branch 2 times, most recently from 5f454e2 to 045c32e (Apr 10, 2026, 06:23)
@guapisolo force-pushed the fix/tito-non-tool-append branch from bc4f7b9 to 7a510af (Apr 10, 2026, 06:31)
The default implementation incrementally tokenizes appended non-assistant turns
with role-specific synthetic prefixes:

- contiguous ``tool`` runs use ``[dummy_system, dummy_assistant]``
@guapisolo (Collaborator, Author)

Removed the dummy user from the tool-segment prefix to avoid boundary issues around the think block.

guapisolo and others added 4 commits April 10, 2026 19:48
- test_pretokenized_chat: eliminate _RAW_* import aliases by renaming
  local pytest params to _*_PARAMS suffix; replace 8 ids/values variables
  + 2 assert guards with _template_params() helper and filtered dicts
- test_tito_tokenizer_model_matrix: inline 4 single-use factory functions,
  merge 2 identical test functions into one, simplify parametrization to
  a single list comprehension, inline trivial _get_assistant_start_str

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Module docstring: user messages use [dummy_system] not [dummy_system, dummy_user]
- Remove dead _DUMMY_USER (unused after segmented rewrite)
- tokenize_additional_non_assistant docstring: mention user follow-ups
- Remove stale comment referencing deleted "No user query found" validation
- Update Qwen3.5 exclusion reason to reflect current template behavior
- _split_at docstring: include user in non-assistant roles
- Test module docstring: describe new segmentation/merge tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2 participants