fix: enabling block-by-block evaluation for granite-3.x-models #165
Description of the change
Block-by-block model evaluation does not work with granite-3.x models because `eval_llm_1GPU` does not account for the peculiarities of the granite-3.x architecture.
`quant/ptq.py` was updated to reflect the naming conventions used for the various blocks in granite-3.x models.
The `eval_llm_1GPU` function was updated to correctly calculate the logits for granite-3.x models.
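To illustrate the two changes above, here is a minimal sketch (not the actual fms-model-optimizer code) of what model-family-aware block lookup and logits handling might look like. The attribute paths and the `finalize_logits` helper are assumptions for illustration; granite-3.x HF configs do expose a `logits_scaling` divisor, but the exact handling in `eval_llm_1GPU` may differ.

```python
# Hypothetical sketch only: attribute paths and helper names below are
# illustrative assumptions, not the project's actual implementation.
from types import SimpleNamespace

# Per-architecture location of the decoder blocks inside the model object.
ARCH_TO_BLOCK_PATH = {
    "llama": "model.layers",
    "granite": "model.layers",  # assumed layout for granite-3.x
}

def get_decoder_blocks(model, arch):
    """Resolve the list of transformer blocks via a dotted attribute path."""
    obj = model
    for attr in ARCH_TO_BLOCK_PATH[arch].split("."):
        obj = getattr(obj, attr)
    return obj

def finalize_logits(raw_logits, config):
    """Granite-3.x configs carry a `logits_scaling` divisor applied to the
    raw lm_head output; families without it fall back to a no-op scale."""
    scale = getattr(config, "logits_scaling", 1.0)
    return [x / scale for x in raw_logits]

# Tiny stand-in model to exercise the path resolution.
dummy = SimpleNamespace(model=SimpleNamespace(layers=["block0", "block1"]))
print(get_decoder_blocks(dummy, "granite"))  # ['block0', 'block1']
print(finalize_logits([8.0, 16.0], SimpleNamespace(logits_scaling=8.0)))  # [1.0, 2.0]
```

The point of the sketch is that a block-by-block evaluator hard-coded to one family's block names and logits path will silently break on families like granite-3.x that diverge in either respect.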
Related issues or PRs
#155
How to verify the PR
Was the PR tested
The fix was validated by performing FP8 DQ-SQ with the granite-3.0-8B-Instruct and granite-3.3-8B-Instruct models. Evaluating both the quantized and the unquantized models with `eval_llm_1GPU` produced the same results as the `evaluator.evaluate` method.
Checklist for passing CI/CD:
- `git commit --signoff` or equivalent
- `tox -e fix`
- `tox -e lint`
- `tox -e spellcheck`
- `tox -e unit`

Note: CI/CD performs unit tests on multiple versions of Python from a fresh install. There may be differences between your local environment and the test environment.