feat: support zero-point decompression for asymmetric quantization (packed) #463
base: main
Conversation
- Fix the decompress_weight method in PackedQuantizationCompressor to support unpacking zero-points
- Add comprehensive tests for zero-point packing/unpacking with GROUP and CHANNEL strategies
- Add end-to-end integration tests for the asymmetric quantization workflow
- Ensure packed tensors are contiguous for safetensors compatibility

Resolves issue referenced in vllm-project/llm-compressor#1704
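For context, here is a minimal sketch of the kind of nibble unpacking that zero-point decompression relies on. The function name, the eight-values-per-int32 layout, and the lowest-bits-first ordering are illustrative assumptions, not the compressor's actual implementation:

```python
import torch


def unpack_int4_zero_points(packed: torch.Tensor, num_cols: int) -> torch.Tensor:
    """Unpack int32 words, each holding eight 4-bit values, into one value per column.

    Hypothetical layout: value i of a word occupies bits [4*i, 4*i + 4).
    """
    shifts = torch.arange(0, 32, 4, device=packed.device)  # bit offsets of the 8 nibbles
    nibbles = (packed.unsqueeze(-1) >> shifts) & 0xF        # extract each 4-bit field
    nibbles = nibbles.reshape(packed.shape[0], -1)[:, :num_cols]
    # Any sign/offset correction (e.g. shifting back to a signed range) depends on
    # the quantization scheme and is omitted here.
    return nibbles.to(torch.int8)
```

With the zero-points unpacked, asymmetric dequantization proceeds roughly as `(q - zero_point) * scale` per group or channel.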
…move manual creation
…v similarity; cleanup temp usage
Force-pushed from 2cf7124 to 126fc89.
I've addressed all your concerns.
Overall looks good!
Some questions about the e2e test, but otherwise it looks great!
Review comments on tests/test_compressors/quantized_compressors/test_packed_asym_decompression.py were resolved.
This looks good pending @dsikka's comments, thanks for updating!
LGTM pending @dsikka's comments! Thank you for this contribution.
Hi @Etelis - do you think you'll get a chance to address the remaining comments to get this PR over the line?

Yes, yes, sorry.
Addressed review comments:
- Switched to in-memory methods: replaced compressor.compress() / compressor.decompress() (disk-based) with ModelCompressor.compress_model() / decompress_model() (in-memory).
- Removed manual parameter registration: the manual register_parameter() logic is no longer needed since compress_model() / decompress_model() handle parameter management internally.

All 4 test cases pass.
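A rough sketch of the in-memory round-trip these tests now perform is below; the ModelCompressor constructor arguments, import path, and helper name are assumptions for illustration, not a verbatim excerpt of the test:

```python
import torch
from compressed_tensors.compressors import ModelCompressor


def roundtrip_in_memory(model: torch.nn.Module, quantization_config) -> torch.nn.Module:
    """Compress and decompress a quantized model entirely in memory (hypothetical helper)."""
    compressor = ModelCompressor(quantization_config=quantization_config)  # assumed constructor signature
    compressor.compress_model(model)    # packs weights (and zero-points) on the modules in place
    compressor.decompress_model(model)  # unpacks them back into dense parameters
    return model
```

A test built on this would then check that the decompressed weights match the fake-quantized originals within quantization tolerance.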
Thanks for the contribution! I think these changes make sense. Do checkpointed models load in vllm?
Please run make style and make quality.
Thank you!
Do you mind running make style and make quality to address the quality issues?
We can land after those are addressed.
Signed-off-by: Brian Dellabetta <[email protected]>
3ffb213
I ran the style fixes.
resolved merge conflict
Signed-off-by: Brian Dellabetta <[email protected]>
Follow-up to PR #459 (which was closed due to an accidental history rewrite). The branch has been restored to the pre-rewrite tip (2cf7124).
This PR refs vllm-project/llm-compressor#1704.