FIX: Top logprob bug in Offline._parse_logprobs #47
The current implementation sometimes picks up the incorrect decoded token because it implicitly assumes that the model always samples the token with the highest logprob (which is what `logprob.rank == 1` selects). This leads to outputs like the ones in the screenshot: notice how, when you concatenate the `Logprobs.token`s, you don't get the same string as the model's actual output (often because the model was "indecisive" between a number and a space). This then confuses `Classifier._parse_logprobs`, which only looks at `Logprobs.token` to determine which sequence positions contain the model's labels. In the logs I screenshotted above, this leads to `Classifier._parse_logprobs` returning a list of the wrong length, which ultimately causes the assertion at `scorers/classifier/classifier.py:133` to fail.

My PR fixes this by always setting `Logprobs.token` to the token the model actually chose, rather than the token it was maximally likely to choose (but may or may not have actually chosen).
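To illustrate the failure mode, here is a minimal, self-contained sketch (the `Logprob` dataclass and both helpers are hypothetical stand-ins, loosely modeled on vLLM-style per-token logprob dicts, not the actual code in this repo). When the sampled token is not the rank-1 candidate, a rank-based lookup returns the wrong token, while looking up by the sampled token id returns the right one:

```python
from dataclasses import dataclass

@dataclass
class Logprob:
    """Per-candidate logprob info (illustrative, vLLM-style)."""
    logprob: float
    rank: int
    decoded_token: str

def token_from_rank1(candidates: dict[int, Logprob]) -> str:
    """Old behavior (buggy): assume the model sampled the rank-1 token."""
    return next(lp.decoded_token for lp in candidates.values() if lp.rank == 1)

def token_from_sampled_id(candidates: dict[int, Logprob], sampled_id: int) -> str:
    """Fixed behavior: use the token id the model actually sampled."""
    return candidates[sampled_id].decoded_token

# The model was "indecisive" between a digit and a space and happened to
# sample the rank-2 candidate (token ids here are made up for the example).
candidates = {
    16: Logprob(logprob=-0.69, rank=1, decoded_token="1"),
    220: Logprob(logprob=-0.70, rank=2, decoded_token=" "),
}
sampled_id = 220  # the id the model actually emitted

assert token_from_rank1(candidates) == "1"              # wrong: rank-1 lookup
assert token_from_sampled_id(candidates, sampled_id) == " "  # correct token
```

Once the parser records the actually-sampled token at every position, concatenating the tokens reproduces the model's output exactly, so the downstream position matching in `Classifier._parse_logprobs` can no longer drift.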