fix: enabling block-by-block evaluation for granite-3.x-models #165
Description of the change
Block-by-block model evaluation does not work with granite-3.x models because `eval_llm_1GPU` does not account for the peculiarities of the granite-3.x architecture.
`quant/ptq.py` was updated to reflect the naming conventions used for the various blocks in granite-3.x models.
The `eval_llm_1GPU` function was updated to correctly calculate the logits for granite-3.x models.
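To illustrate the two changes above, here is a minimal sketch (not the actual fms-model-optimizer code) of what model-family-aware block lookup and logits handling might look like. The attribute paths and the `finalize_logits` helper are assumptions for illustration; granite-3.x HF configs do expose a `logits_scaling` divisor, but the exact handling in `eval_llm_1GPU` may differ.

```python
# Hypothetical sketch only: attribute paths and helper names below are
# illustrative assumptions, not the project's actual implementation.
from types import SimpleNamespace

# Per-architecture location of the decoder blocks inside the model object.
ARCH_TO_BLOCK_PATH = {
    "llama": "model.layers",
    "granite": "model.layers",  # assumed layout for granite-3.x
}

def get_decoder_blocks(model, arch):
    """Resolve the list of transformer blocks via a dotted attribute path."""
    obj = model
    for attr in ARCH_TO_BLOCK_PATH[arch].split("."):
        obj = getattr(obj, attr)
    return obj

def finalize_logits(raw_logits, config):
    """Granite-3.x configs carry a `logits_scaling` divisor applied to the
    raw lm_head output; families without it fall back to a no-op scale."""
    scale = getattr(config, "logits_scaling", 1.0)
    return [x / scale for x in raw_logits]

# Tiny stand-in model to exercise the path resolution.
dummy = SimpleNamespace(model=SimpleNamespace(layers=["block0", "block1"]))
print(get_decoder_blocks(dummy, "granite"))  # ['block0', 'block1']
print(finalize_logits([8.0, 16.0], SimpleNamespace(logits_scaling=8.0)))  # [1.0, 2.0]
```

The point of the sketch is that a block-by-block evaluator hard-coded to one family's block names and logits path will silently break on families like granite-3.x that diverge in either respect.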
Related issues or PRs
#155
How to verify the PR
Was the PR tested
The fix was validated by performing FP8 DQ-SQ with the granite-3.0-8B-Instruct and granite-3.3-8B-Instruct models. Evaluating both the quantized and the unquantized models with `eval_llm_1GPU` produced the same results as the `evaluator.evaluate` method.
Checklist for passing CI/CD:
- `git commit --signoff` or equivalent
- `tox -e fix`
- `tox -e lint`
- `tox -e spellcheck`
- `tox -e unit`

Note: CI/CD performs unit tests on multiple versions of Python from a fresh install. There may be differences between your local environment and the test environment.