fix: assistant msg cropping might crop suffix special tokens #329
The complexity here: we need to crop the assistant messages so that they end on a block boundary. We cannot pad them out, because that is not what the model server does when generating the assistant message. It will, however, cache the prefix of full blocks. Hence the need to crop. However, we cannot simply crop the end of the sequence, as this would also cut off any "end of text" special tokens that the chat template adds to the end of the `self.assistant(m)` token sequence. Therefore, we need to crop just the message part. This logic tries to do all of that in a way that is agnostic to the chat template. However, it currently assumes that the chat template never adds special tokens in the middle of the given message `m`; it assumes special tokens are only ever added (if at all) at the beginning or end.

DO NOT MERGE
TODO: see the discussion below. We need to crop and also pad out the assistant suffix special token.
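
For illustration, the cropping described above might be sketched roughly as follows. This is not the code in this PR: `split_affixes`, `crop_assistant`, `tokens_before`, and `block_size` are hypothetical names, and the sketch assumes the block boundary is measured from the start of the whole conversation up to the last message token. It relies on the same assumption as the description (the template adds tokens only before or after the message) and does not address the suffix padding mentioned in the TODO.

```python
from typing import List, Tuple


def split_affixes(full: List[int], message: List[int]) -> Tuple[List[int], List[int]]:
    """Locate `message` as a contiguous run inside `full`.

    Returns the template tokens before and after the message. This only
    works if the chat template adds nothing in the middle of the message.
    """
    n, m = len(full), len(message)
    for start in range(n - m + 1):
        if full[start:start + m] == message:
            return full[:start], full[start + m:]
    raise ValueError("message tokens not found contiguously in templated sequence")


def crop_assistant(
    full: List[int],
    message: List[int],
    tokens_before: int,
    block_size: int,
) -> List[int]:
    """Crop only the message part so that it ends on a block boundary.

    `full` is the templated assistant turn (prefix + message + suffix),
    `tokens_before` is the number of tokens preceding this turn.
    The suffix special tokens are kept intact.
    """
    prefix, suffix = split_affixes(full, message)
    end = tokens_before + len(prefix) + len(message)
    overflow = end % block_size  # message tokens past the last full block
    # If the message is shorter than the overflow, it is dropped entirely
    # and the boundary is not reached; a real implementation must decide
    # how to handle that case.
    keep = max(len(message) - overflow, 0)
    return prefix + message[:keep] + suffix


if __name__ == "__main__":
    # Toy token ids: 1 = assistant prefix token, 2 = "end of text" suffix token.
    turn = [1, 10, 11, 12, 13, 14, 2]
    print(crop_assistant(turn, [10, 11, 12, 13, 14], tokens_before=0, block_size=4))
```

Cropping the raw sequence `turn[:4]` would have dropped the trailing `2`; cropping only the message keeps it while the prefix plus cropped message still fills whole blocks.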