QuantizedModel: support group_size -1 (per-channel) #1244

jambayk · 2025-02-12T18:22:02Z

Both GPTQ and AWQ support per-channel quantization, which they represent with group_size = -1. However, the builder currently cannot handle this case.
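The following minimal NumPy sketch shows the convention: min/max (asymmetric) group-wise quantization where group_size = -1 collapses to a single group per output channel. The function name quantize_weight is hypothetical. The (out_features, in_features) layout is assumed from the PyTorch nn.Linear convention. The sketch ignores the bit packing and tensor layouts that real GPTQ/AWQ checkpoints use, and it does not reproduce how either method chooses its scales.

```python
import numpy as np


def quantize_weight(weight: np.ndarray, group_size: int, bits: int = 4):
    """Hypothetical min/max quantizer for an (out_features, in_features) weight.

    group_size == -1 is treated as per-channel: one group spanning every
    input feature of an output channel.
    """
    out_features, in_features = weight.shape
    if group_size == -1:
        group_size = in_features  # per-channel: a single group per row
    if in_features % group_size != 0:
        raise ValueError("in_features must be divisible by group_size")
    n_groups = in_features // group_size
    qmax = 2**bits - 1

    # Split each row into n_groups contiguous groups of group_size elements.
    grouped = weight.reshape(out_features, n_groups, group_size)
    w_min = grouped.min(axis=-1, keepdims=True)
    w_max = grouped.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for constant groups
    zero_point = np.clip(np.round(-w_min / scale), 0, qmax)
    q = np.clip(np.round(grouped / scale) + zero_point, 0, qmax).astype(np.uint8)

    # One scale and zero point per (output channel, group).
    return q.reshape(out_features, in_features), scale[..., 0], zero_point[..., 0]


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
_, s_grouped, _ = quantize_weight(w, group_size=32)
_, s_channel, _ = quantize_weight(w, group_size=-1)
print(s_grouped.shape, s_channel.shape)  # (8, 2) (8, 1)
```

With group_size = -1, the scales collapse to one column per output channel. That is the same result as choosing group_size = in_features.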

This PR turns group_size into a property that returns in_features when _group_size == -1. The group size cannot be resolved during init because in_features is only set after initialization.
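Here is a minimal sketch of the lazily resolved property described above. The class QuantizedLinearSketch and its attributes are hypothetical stand-ins, not the builder's actual classes. The RuntimeError guard is an added assumption, not something the PR describes.

```python
class QuantizedLinearSketch:
    """Hypothetical stand-in for a quantized linear layer in the builder.

    in_features is not known at construction time and is filled in later,
    so a group_size of -1 cannot be resolved in __init__.
    """

    def __init__(self, bits: int, group_size: int):
        self.bits = bits
        self._group_size = group_size  # raw config value; -1 means per-channel
        self.in_features = None  # set after init
        self.out_features = None

    @property
    def group_size(self) -> int:
        # Resolve the per-channel sentinel only when it is read,
        # by which point in_features should be known.
        if self._group_size == -1:
            if self.in_features is None:
                raise RuntimeError("in_features must be set before resolving group_size=-1")
            return self.in_features
        return self._group_size


layer = QuantizedLinearSketch(bits=4, group_size=-1)
layer.in_features = 4096
print(layer.group_size)  # 4096
```

Because the lookup is deferred to read time, code that consumes group_size does not need to know about the -1 sentinel.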

Note: The resulting MatMulNBits node may be invalid if the operator kernel does not support the group size, but it can be converted into a valid DQ -> MatMul pattern. This conversion can be done either in the builder or afterwards with an Olive pass: https://github.com/microsoft/Olive/blob/50f360aeacfb949abc0d845e4070922555f7c58a/olive/passes/onnx/mnb_to_qdq.py#L27.
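The following NumPy sketch illustrates the DQ -> MatMul fallback: the weight is dequantized per group as (q - zero_point) * scale and then used in an ordinary matmul. The function names are hypothetical, and the code mirrors only the arithmetic, not the ONNX graph rewrite that the Olive pass performs. The block-size check rests on an assumption: it presumes MatMulNBits requires a block size that is a power of 2 and at least 16. Verify that against the operator documentation for your ONNX Runtime version.

```python
import numpy as np


def dequantize(q, scale, zero_point, group_size):
    """Group-wise dequantization: (q - zero_point) * scale, one scale per group.

    q has shape (out_features, in_features); scale and zero_point have shape
    (out_features, n_groups). group_size == -1 means a single group per row.
    """
    out_features, in_features = q.shape
    if group_size == -1:
        group_size = in_features
    n_groups = in_features // group_size
    grouped = q.reshape(out_features, n_groups, group_size).astype(np.float32)
    w = (grouped - zero_point[..., None]) * scale[..., None]
    return w.reshape(out_features, in_features)


def is_matmulnbits_block_size(block_size: int) -> bool:
    # Assumption: MatMulNBits expects block_size to be a power of 2 and >= 16.
    return block_size >= 16 and (block_size & (block_size - 1)) == 0


# Per-channel example: 2 output channels, 4 input features, one group per row.
q = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], dtype=np.uint8)
scale = np.array([[0.5], [2.0]], dtype=np.float32)
zero_point = np.array([[1.0], [4.0]], dtype=np.float32)

w = dequantize(q, scale, zero_point, group_size=-1)
x = np.ones((1, 4), dtype=np.float32)
print(x @ w.T)  # [[ 1. 12.]]
```

For example, a per-channel layer with in_features = 3000 resolves to a block size of 3000, which fails this check. Its weight would instead be dequantized and fed to a plain MatMul, as above.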

This PR also fixes this issue.

support group_size -1

86a6889

BowenBao approved these changes Feb 12, 2025


kunal-vaishnavi approved these changes Feb 12, 2025


kunal-vaishnavi added the 0.6.0 label Feb 12, 2025

jambayk merged commit e0123fc into main Feb 12, 2025
14 checks passed

jambayk deleted the jambayk/quant-per-channel branch February 12, 2025 23:39

baijumeswani pushed a commit that referenced this pull request Feb 13, 2025

QuantizedModel: support group_size -1 (per-channel) (#1244)

ff1cf54


jambayk commented Feb 12, 2025 • edited by kunal-vaishnavi
