[Multi-Modifier] Scoped apply quantization config #432
Conversation
Signed-off-by: Brian Dellabetta <[email protected]>
FYI #428. Also touches some apply logic and adds more scheme merging.
I don't think a warning is necessary if the schemes overwrite. Looks good to me
LGTM!
Good job! LGTM! 🚀
In order to support multi-modifier recipes (e.g. AWQ+W4A16 on self_attn layers and FP8_DYNAMIC on mlp layers), quantization config and status must be applied only to the modules scoped to the modifier, not all at once. This updates `apply_quantization_config` so that quantization_config and quantization_status are applied just to the target modules, rather than changed globally across all modules. For proper target prioritization, `apply_quantization_status` is performed regardless of the model's current status. Without these changes, `test_target_prioritization` will fail.

Other small changes:
- Added `test_multi_apply_quantization_config` to make sure the application of multiple quantization configs in series works correctly -- shapes are correct and unused parameters are correctly removed.
- Removed `override_quantization_status` in favor of the more general `patch_attr`.
- Removed `infer_quantization_status`, which is no longer meaningful at the model level. It is also no longer needed because the module's current status isn't checked.
- Added an `ALL_QPARAM_NAMES` constant so that parameters related to quantization can be cleared from modules during init.
- Removed `"quant_method": "sparseml"` in favor of `"compressed-tensors"`.
- Deprecated `compress_quantized_weights` and `apply_quantization_status`. We can remove `compress_quantized_weights` and references to it in examples/notebooks in a follow-up PR.

Merge in conjunction with