fix: handle non-tool appended messages in TITO incremental tokenization #949
Conversation
Code Review
This pull request updates the _build_dummy_assistant function to correctly handle leading tool messages within appended_messages, ensuring proper turn-transition tokens are rendered. Feedback suggests restoring the reasoning_content field to maintain consistency with reasoning models and simplifying the tool_calls list comprehension using enumerate on a slice of the messages.
```diff
  assistant: dict[str, Any] = {"role": "assistant", "content": ""}
  if num_leading_tools > 0:
      assistant["tool_calls"] = [
          {
-             "id": resp.get("tool_call_id") or f"call0000{i}",
+             "id": appended_messages[i].get("tool_call_id") or f"call0000{i}",
              "type": "function",
              "function": {
-                 "name": resp.get("name") or "dummy_func",
+                 "name": appended_messages[i].get("name") or "dummy_func",
                  "arguments": {},
              },
          }
-         for i, resp in enumerate(tool_responses)
+         for i in range(num_leading_tools)
      ]
```
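For orientation, here is a minimal runnable sketch of the surrounding logic; the helper name `count_leading_tools` and the sample messages are illustrative, not taken from the PR:

```python
from itertools import takewhile
from typing import Any


def count_leading_tools(appended_messages: list[dict[str, Any]]) -> int:
    """Count the contiguous run of tool messages at the head of the list."""
    return sum(
        1 for _ in takewhile(lambda m: m.get("role") == "tool", appended_messages)
    )


msgs = [
    {"role": "tool", "tool_call_id": "call_a", "name": "search", "content": "ok"},
    {"role": "tool", "tool_call_id": "call_b", "name": "fetch", "content": "ok"},
    {"role": "user", "content": "thanks"},
]
print(count_leading_tools(msgs))  # 2: the run stops at the user message
```

The trailing `user` message is excluded, which is exactly why indexing `appended_messages[i]` for `i in range(num_leading_tools)` only ever touches tool messages.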
The reasoning_content field was removed from the dummy assistant message. This field was present in the previous implementation (line 32) and is often necessary for reasoning models to correctly render turn boundaries (e.g., to ensure the reasoning block is closed). Unless its removal was intentional to fix a specific issue, it should be restored.
Additionally, the tool_calls generation can be simplified using enumerate on a slice of appended_messages.
Suggested change:

```python
assistant: dict[str, Any] = {
    "role": "assistant",
    "content": "",
    "reasoning_content": " ",
}
if num_leading_tools > 0:
    assistant["tool_calls"] = [
        {
            "id": msg.get("tool_call_id") or f"call0000{i}",
            "type": "function",
            "function": {
                "name": msg.get("name") or "dummy_func",
                "arguments": {},
            },
        }
        for i, msg in enumerate(appended_messages[:num_leading_tools])
    ]
```
All ``tool`` messages in *appended_messages* (not just leading contiguous
ones) get matching ``tool_calls``. If there are no tool messages the dummy
assistant has no ``tool_calls``, so the template renders the correct
turn-transition tokens (e.g. ``<|user|>`` instead of ``<|observation|>``).
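To illustrate the docstring's claim, here is a toy stand-in for the template's turn-transition decision (this is not a real chat template; the token strings simply follow the GLM-style example above):

```python
def next_turn_token(dummy_assistant: dict) -> str:
    """Toy model of a GLM-style template's turn transition: after an
    assistant message carrying tool_calls, the template expects tool output
    (<|observation|>); otherwise it expects a user turn (<|user|>)."""
    return "<|observation|>" if dummy_assistant.get("tool_calls") else "<|user|>"


print(next_turn_token({"role": "assistant", "content": ""}))
# <|user|>
print(next_turn_token({"role": "assistant", "content": "", "tool_calls": [{}]}))
# <|observation|>
```

This is why the dummy assistant must carry `tool_calls` exactly when tool messages follow: an unmatched choice renders the wrong boundary token.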
For my sanity check: does our TITO now only support GLM-style templates? Can we still keep it flexible somehow?
If supporting the Qwen3 chat template would be too hard, I think this is still fine.
And we don't need `reasoning_content` anymore?
> For my sanity check: does our TITO now only support GLM-style templates? Can we still keep it flexible somehow? If supporting the Qwen3 chat template would be too hard, I think this is still fine.

This workaround for GLM-4.7 breaks the Qwen implementation; it needs a fix to the Qwen3 chat template. Unblocking TITO development for now.
The default implementation incrementally tokenizes appended non-assistant turns
with role-specific synthetic prefixes:

- contiguous ``tool`` runs use ``[dummy_system, dummy_assistant]``
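A sketch of the segmentation this implies (the function name and grouping rule are assumptions drawn from the discussion, not the PR's actual code):

```python
from typing import Any


def segment_appended(messages: list[dict[str, Any]]) -> list[list[dict[str, Any]]]:
    """Group contiguous tool messages into one segment; every other
    appended message (user/system) becomes a singleton segment."""
    segments: list[list[dict[str, Any]]] = []
    for msg in messages:
        if msg.get("role") == "tool" and segments and segments[-1][0].get("role") == "tool":
            segments[-1].append(msg)  # extend the current contiguous tool run
        else:
            segments.append([msg])    # start a new segment
    return segments


msgs = [{"role": "tool"}, {"role": "tool"}, {"role": "user"}, {"role": "tool"}]
print([len(s) for s in segment_appended(msgs)])  # [2, 1, 1]
```

Each segment is then tokenized under its own synthetic prefix, so a tool run and a user follow-up never share boundary tokens.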
Removed the dummy user from the tool-run prefix to avoid boundary issues around the think block.
- test_pretokenized_chat: eliminate `_RAW_*` import aliases by renaming local pytest params to a `_*_PARAMS` suffix; replace 8 ids/values variables + 2 assert guards with a `_template_params()` helper and filtered dicts
- test_tito_tokenizer_model_matrix: inline 4 single-use factory functions, merge 2 identical test functions into one, simplify parametrization to a single list comprehension, inline the trivial `_get_assistant_start_str`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Module docstring: user messages use `[dummy_system]`, not `[dummy_system, dummy_user]`
- Remove dead `_DUMMY_USER` (unused after the segmented rewrite)
- `tokenize_additional_non_assistant` docstring: mention user follow-ups
- Remove stale comment referencing the deleted "No user query found" validation
- Update the Qwen3.5 exclusion reason to reflect current template behavior
- `_split_at` docstring: include user in the non-assistant roles
- Test module docstring: describe the new segmentation/merge tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
This PR fixes TITO incremental tokenization for non-assistant appends by replacing the old single dummy-diff approach with role-aware segmentation.
Previously, append tokenization could become unstable when appending user/system messages (and mixed tool + non-tool sequences), because boundary tokens were inferred from a single synthetic context. Now we tokenize appended content segment by segment using role-specific synthetic prefixes.
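Roughly, the segment-by-segment scheme can be sketched as a prefix diff: tokenize the synthetic prefix with and without the segment and keep only the suffix tokens. This is a simplification (names here are illustrative; the real implementation works on rendered chat-template strings):

```python
def incremental_tokens(tokenize, prefix_msgs, segment_msgs):
    """Tokens contributed by segment_msgs when appended after prefix_msgs.

    `tokenize` maps a message list to a token list; we assume the prefix
    tokenization is a strict prefix of the combined tokenization.
    """
    base = tokenize(prefix_msgs)
    full = tokenize(prefix_msgs + segment_msgs)
    assert full[: len(base)] == base, "prefix tokens must be stable"
    return full[len(base):]


# toy tokenizer: one "token" per role name
toy = lambda msgs: [m["role"] for m in msgs]
print(incremental_tokens(toy, [{"role": "system"}], [{"role": "tool"}, {"role": "tool"}]))
# ['tool', 'tool']
```

The instability the PR describes arises when the prefix assumption fails, which is why each role gets a synthetic prefix whose boundary with the segment is stable.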
What Changed
- Rewrote `tokenize_additional_non_assistant` in `tito_tokenizer.py` into a role-segmented pipeline.
- Contiguous `tool` messages are grouped and tokenized together, using `[dummy_system, dummy_assistant]` as the additional-tokenizer prefix; `dummy_user` is dropped to avoid any truncated think-block issue across models.
- Each `user` and `system` message is tokenized as a singleton segment, using `[dummy_system]` as the additional-tokenizer prefix.
- Restored the previous `system` message behavior, reverting the earlier modifications.

Tests

Updated unit tests (`test_tito_tokenizer.py`)
- Cover appended `user` messages.
- `test_tito_tokenizer_model_matrix.py` runs many additional-tokenize checks across different models.

New model-matrix tests (`test_tito_tokenizer_model_matrix.py`)
- Use `TokenSeqComparator` to check for mismatches, excluding the expected assistant-text diff.