Revert VLM support in parse_response #5561

Draft
qgallouedec wants to merge 3 commits into main from revert-vlm-support-in-parse_response

Conversation

Member

@qgallouedec commented Apr 15, 2026

parse_response previously accepted either a tokenizer or a processor (from #5323) and unwrapped the inner tokenizer on the fly. Now that call sites can easily pass the tokenizer directly, we move that disambiguation to the call sites and keep parse_response strictly tokenizer-only. This centralizes the "processor vs tokenizer" logic in one place per trainer and makes parse_response's contract simpler.
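
In code, the change amounts to the following pattern (a minimal sketch; `parse_response`'s exact signature and the helper name here are assumptions, not TRL's actual API):

```python
from transformers import PreTrainedTokenizerBase

def _resolve_tokenizer(processing_class):
    # Per-trainer disambiguation, now done at the call site: VLM processors
    # wrap a tokenizer, plain tokenizers are passed through unchanged.
    if isinstance(processing_class, PreTrainedTokenizerBase):
        return processing_class
    return processing_class.tokenizer  # VLM processor case

# Call site, e.g. inside a trainer's tool-call decoding path:
# result = parse_response(_resolve_tokenizer(self.processing_class), response)
```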


Note

Medium Risk
Touches response parsing used during RLHF tool-call decoding; a missed or unhandled call site, or an incorrect tokenizer selection, could break parsing for some models, especially those using VLM processors.

Overview
parse_response now only accepts a PreTrainedTokenizer (removing implicit VLM processor support/auto-unwrapping) and updates its docstring accordingly.

All affected call sites (notably GRPOTrainer/DPPOTrainer tool-call decoding paths and TestParseResponse) now explicitly select processing_class.tokenizer for VLM processors before calling parse_response, keeping response/tool-call parsing behavior the same while simplifying the helper’s contract.
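
Roughly, the helper's contract changes as follows (a sketch with assumed parameter names, not the literal diff):

```python
# Before: accepted a tokenizer or a VLM processor and unwrapped on the fly.
def parse_response(processing_class, response):
    tokenizer = getattr(processing_class, "tokenizer", processing_class)
    ...

# After: strictly tokenizer-only; callers select processing_class.tokenizer
# themselves when processing_class is a VLM processor.
def parse_response(tokenizer, response):
    ...
```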

Reviewed by Cursor Bugbot for commit cc3905c.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@albertvillanova left a comment


I'm not sure about this PR, as it seems to overlap quite a bit with what I'm currently addressing in #5489. The approach I'm taking there is intentionally incremental:

  • First, ensure that we can consistently pass only tokenizer instances to parse_response (by introducing self._tokenizer across trainers; see the sketch after this list).
  • Then, in a follow-up step, simplify parse_response to accept only tokenizers ("Make parse_response accept only tokenizer").
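
A minimal sketch of that first step, assuming a shared trainer base class (names are illustrative):

```python
class BaseTrainer:
    def __init__(self, processing_class):
        self.processing_class = processing_class
        # Resolve the inner tokenizer once at init time, so later call sites
        # (e.g. parse_response) can use self._tokenizer directly instead of
        # re-deciding between processor and tokenizer each time.
        self._tokenizer = getattr(processing_class, "tokenizer", processing_class)
```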

Given that, this PR feels somewhat like duplicated effort. Would it make sense to wait for #5489 to land instead?

For context, I’m already working through the relevant discussion here: #5489 (comment)

parse_response only needs a tokenizer instance, but it had to handle both because we did not have a simple way to pass only the tokenizer. Once we implement self._tokenizer in all trainers, parse_response can be simplified to accept only tokenizer instances.

and here: #5489 (comment)

More broadly, the underlying goal of this PR is to centralize the processor/tokenizer disambiguation within processing_class in a single place, so that the rest of the code can rely on a well-defined and consistent interface, with a clear expected class instance.

In that sense, the current change in calling parse_response is an intermediate step toward that simplification, rather than a deviation from it.

@qgallouedec
Copy link
Copy Markdown
Member Author

Ah yes, OK. LGTM @albertvillanova


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 973cf25.

Comment thread: trl/chat_template_utils.py
@qgallouedec marked this pull request as draft on April 22, 2026 at 18:45
