Revert VLM support in parse_response#5561
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I'm not sure about this PR, as it seems to overlap quite a bit with what I'm currently addressing in #5489. The approach I'm taking there is intentionally incremental:
- First, ensure that we can consistently pass only tokenizer instances to `parse_response` (by introducing `self._tokenizer` across trainers).
- Then, in a follow-up step, simplify `parse_response` to only accept tokenizers: Make `parse_response` accept only tokenizer
Given that, this PR feels somewhat like duplicated effort. Would it make sense to wait for #5489 to land instead?
For context, I’m already working through the relevant discussion here: #5489 (comment)
`parse_response` only needs a tokenizer instance, but it had to handle both because we did not have a simple way to pass only the tokenizer. Once we implement `self._tokenizer` in all trainers, `parse_response` could be simplified to accept only tokenizer instances.
and here: #5489 (comment)
More broadly, the underlying goal of this PR is to centralize the processor/tokenizer disambiguation within `processing_class` in a single place, so that the rest of the code can rely on a well-defined and consistent interface, with a clear expected class instance.
In that sense, the current change in calling `parse_response` is an intermediate step toward that simplification, rather than a deviation from it.
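The call-site disambiguation described above can be sketched as follows. This is a minimal illustration with stand-in classes: `FakeTokenizer`, `FakeProcessor`, and `resolve_tokenizer` are hypothetical names, standing in for `PreTrainedTokenizer`, a VLM processor, and the per-trainer selection logic.

```python
class FakeTokenizer:
    """Stands in for a PreTrainedTokenizer."""
    def decode(self, ids):
        return " ".join(str(i) for i in ids)


class FakeProcessor:
    """Stands in for a VLM processor that wraps an inner tokenizer."""
    def __init__(self):
        self.tokenizer = FakeTokenizer()


def resolve_tokenizer(processing_class):
    # The call site selects the inner tokenizer when given a processor,
    # so downstream helpers only ever see a tokenizer instance.
    return getattr(processing_class, "tokenizer", processing_class)


# Both a bare tokenizer and a wrapping processor resolve to a tokenizer:
assert isinstance(resolve_tokenizer(FakeProcessor()), FakeTokenizer)
assert isinstance(resolve_tokenizer(FakeTokenizer()), FakeTokenizer)
```

Centralizing this check once per trainer keeps the helper's signature narrow while still supporting VLM processors at the boundary.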
Ah yes, ok. LGTM @albertvillanova
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 973cf25.

`parse_response` previously accepted either a tokenizer or a processor (from #5323) and unwrapped the inner tokenizer on the fly. Now that call sites can easily pass the tokenizer directly, we move that disambiguation to the call sites and keep `parse_response` strictly tokenizer-only. This centralizes the "processor vs tokenizer" logic in one place per trainer and makes `parse_response`'s contract simpler.

Note
Medium Risk
Touches response parsing used during RLHF tool-call decoding; missed/unhandled call sites or incorrect tokenizer selection could break parsing for some models, especially VLM processors.
Overview
`parse_response` now only accepts a `PreTrainedTokenizer` (removing implicit VLM processor support/auto-unwrapping) and updates its docstring accordingly.

All affected call sites (notably `GRPOTrainer`/`DPPOTrainer` tool-call decoding paths and `TestParseResponse`) now explicitly select `processing_class.tokenizer` for VLM processors before calling `parse_response`, keeping response/tool-call parsing behavior the same while simplifying the helper's contract.

Reviewed by Cursor Bugbot for commit cc3905c.
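The updated call-site pattern from the overview can be sketched like this. The stub classes and the simplified `parse_response` below are hypothetical, written only to illustrate the tokenizer-only contract; the real TRL helper does tool-call parsing, not a bare `decode`.

```python
def parse_response(tokenizer, token_ids):
    # Tokenizer-only contract: no processor unwrapping happens in here.
    return tokenizer.decode(token_ids)


class StubTokenizer:
    """Stands in for a PreTrainedTokenizer."""
    def decode(self, ids):
        return "".join(chr(i) for i in ids)


class StubVLMProcessor:
    """Stands in for a VLM processor wrapping a tokenizer."""
    def __init__(self):
        self.tokenizer = StubTokenizer()


processing_class = StubVLMProcessor()

# The call site explicitly selects the inner tokenizer for VLM processors
# before calling the helper, instead of letting the helper unwrap it:
tokenizer = (
    processing_class.tokenizer
    if hasattr(processing_class, "tokenizer")
    else processing_class
)
text = parse_response(tokenizer, [104, 105])  # → "hi"
```

Behavior at the call site is unchanged; only the location of the disambiguation moves.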