[Multi-modifier] Support scoped application of quantization config/status #1772
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
Force-pushed the branch from 5fec983 to 2f93072
Force-pushed the branch from 2f93072 to f99db2f
@@ -178,7 +182,7 @@ def on_start(self, state: State, event: Event, **kwargs):

         # register gptq hooks
         added_hook = False
-        for module in state.model.modules():
+        for _, module in match_named_modules(state.model, self.targets, self.ignore):
             if getattr_chain(module, "quantization_scheme.weights", None) is not None:
Should this be changed into an assert rather than an if?
Yeah, I think that makes sense, I can update.
Hmm, on second thought, it doesn't look like there's anything in the validation layer confirming each quantization args instance has a weights field. So if a user sets an invalid config where weight quantization isn't configured, it would error out here. Is that what we want?
I'd prefer an explicit error rather than a silent skip here.
Shouldn't we do this in the validation layer, though? I can add a check to model validation and switch to an assert?
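A minimal sketch of what that suggestion could look like — the class and field names below are hypothetical stand-ins (not the repo's actual validation layer), shown only to illustrate "validate up front, then assert in `on_start`":

```python
from typing import Optional

from pydantic import BaseModel, model_validator


class WeightQuantizedScheme(BaseModel):
    """Illustrative stand-in for a quantization scheme config."""

    weights: Optional[dict] = None
    input_activations: Optional[dict] = None

    @model_validator(mode="after")
    def require_weights(self):
        # Fail fast at config-validation time rather than erroring deep
        # inside on_start when hooks are registered.
        if self.weights is None:
            raise ValueError("GPTQ-style modifiers require a weights config")
        return self


# With that guarantee, the hook-registration loop could assert instead of `if`:
# for _, module in match_named_modules(state.model, self.targets, self.ignore):
#     weights = getattr_chain(module, "quantization_scheme.weights", None)
#     assert weights is not None, f"{module} matched targets but has no weight scheme"
```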
Consider adding some basic tests / common use cases; otherwise looks good!
@@ -0,0 +1,101 @@
from datasets import load_dataset
Can we maybe generalize this folder name to `mixed-precision` so that people associate this with enabling mixed-precision workloads?
Even though multi-modifier recipes don't necessarily need to be mixed precision? I just think mixed-precision is a stronger message than multi-modifier.
I can rename to `examples/mixed_precision` if we find that to be more appropriate. Let's discuss in standup.
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    # Option 1) run both modifiers in a single calibrated run
    pipeline="sequential",
Is the pipeline not already sequential by default?
It actually defaults to independent. Without it set, it infers to use SequentialPipeline for GPTQ, running just GPTQ independently. It then infers to use SequentialPipeline for AWQ, running just AWQ independently. Not sure what we want for default behavior though; this just makes it explicit.
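For reference, a sketch of the two options being discussed — the keyword arguments mirror the example diff above, while `model`, `ds`, and `recipe` are placeholders from that example:

```python
from llmcompressor import oneshot

# Option 1) one calibrated run, both modifiers applied together
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,  # e.g. a list with both modifiers
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    pipeline="sequential",
)

# Option 2) current default: each modifier runs as its own independent
# calibrated pass over the data
# oneshot(..., pipeline="independent")
```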
print("==========================================\n\n") | ||
|
||
# Save to disk compressed. | ||
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128" |
Add GPTQ and AWQ to the names?
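Purely illustrative, one way the suggestion could look in the example:

```python
# fold the algorithm names into the checkpoint directory name
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-GPTQ-AWQ-W4A16-G128"
```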
SUMMARY:
Prerequisites:
This allows for multi-modifier support by scoping the application of quantization config/status to only the modules in the model that match the given targets/ignore configuration, rather than all modules. Initialization of observers is moved to on_start (instead of on_initialize) to match their removal in on_end (and not on_finalize). This prevents collisions during the multi-modifier lifecycle.
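A minimal sketch of the kind of scoped multi-modifier recipe this enables. The target regexes, scheme strings, and the assignment of schemes to modifiers are illustrative assumptions, not the exact contents of the new example:

```python
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = [
    # W8A8 scoped to attention modules; this modifier only applies its
    # quantization config/status to the modules it matches.
    GPTQModifier(
        targets=["re:.*self_attn.*"],
        scheme="W8A8",
        ignore=["lm_head"],
    ),
    # W4A16 scoped to MLP modules, applied by a second modifier in the
    # same recipe.
    AWQModifier(
        targets=["re:.*mlp.*"],
        scheme="W4A16",
        ignore=["lm_head"],
    ),
]
```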
TEST PLAN:
"sequential"
, and correct behavior for"independent"
pipelines. Model checkpoint for the sequential pipeline shows correct application of W8A8 to self_attn layers and W4A16 to mlp layers. config.json and safetensors weights all look as expected