
Conversation

@brian-dellabetta (Collaborator) commented Aug 21, 2025

SUMMARY:
Prerequisites:

This allows for multi-modifier support by scoping the application of quantization config/status to only the modules in the model that match the given targets/ignore configuration, rather than all modules. Initialization of observers is moved to on_start (instead of on_initialize) to match their removal in on_end (rather than on_finalize). This prevents collisions during the multi-modifier lifecycle; see the sketch after the checklist below.

  • Update AWQ
  • Update QuantizationModifier
  • Update QuantizationMixin
  • Update GPTQ
  • Any others?
  • Should we enable/disable quantization for the entire model or only matching modules? See TODO here
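For illustration, here is a minimal sketch of the scoped lifecycle described above. It assumes match_named_modules(model, targets, ignore) yields (name, module) pairs for only the matching modules (as in the diff further down); the observer helper names and the import path are assumptions, not the actual QuantizationMixin API.

```python
# Illustrative sketch only: helper names are hypothetical and the import
# path is assumed; see the actual QuantizationMixin changes in this PR.
from compressed_tensors.utils import match_named_modules


class ScopedModifier:
    targets = ["Linear"]
    ignore = ["lm_head"]

    def on_start(self, state, event, **kwargs):
        # Initialize observers only for modules matched by this modifier's
        # targets/ignore, so a second modifier scoped to other modules
        # does not collide with this one.
        for _, module in match_named_modules(state.model, self.targets, self.ignore):
            self._initialize_observers(module)  # hypothetical helper

    def on_end(self, state, event, **kwargs):
        # Remove the same observers on_start added, leaving modules owned by
        # other modifiers untouched (removal pairs with on_end, not on_finalize).
        for _, module in match_named_modules(state.model, self.targets, self.ignore):
            self._remove_observers(module)  # hypothetical helper
```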

TEST PLAN:

  • Tests were added in neuralmagic/compressed-tensors#432 ("[Multi-Modifier] Scoped apply quantization config") to confirm correct application of multiple modifiers.
  • Added an example in this PR to show how AWQ and GPTQ can be applied heterogeneously to a model, along with a small README. Logs show alternating AWQ and GPTQ messages for the "sequential" pipeline and correct behavior for the "independent" pipeline. The model checkpoint for the sequential pipeline shows correct application of W8A8 to self_attn layers and W4A16 to mlp layers; config.json and safetensors weights all look as expected. A sketch of such a recipe is shown below.
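For reference, a minimal sketch of what such a heterogeneous recipe could look like, using the public oneshot/AWQModifier/GPTQModifier entrypoints. The model id, dataset, target regexes, and calibration settings below are illustrative, not copied from the example added in this PR.

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
from llmcompressor.modifiers.quantization import GPTQModifier

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder model id

# Two modifiers, each scoped to a disjoint set of modules via targets/ignore,
# mirroring the checkpoint described above (W4A16 mlp, W8A8 self_attn).
recipe = [
    AWQModifier(
        targets=[r"re:.*mlp\.(gate|up|down)_proj$"],  # illustrative regex
        scheme="W4A16",
        ignore=["lm_head"],
    ),
    GPTQModifier(
        targets=[r"re:.*self_attn\.(q|k|v|o)_proj$"],  # illustrative regex
        scheme="W8A8",
        ignore=["lm_head"],
    ),
]

oneshot(
    model=MODEL_ID,
    dataset="open_platypus",  # any registered calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=256,
    pipeline="sequential",  # calibrate both modifiers in a single run
)
```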


👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@brian-dellabetta force-pushed the bdellabe/scoped-quant-status branch 2 times, most recently from 5fec983 to 2f93072 on August 28, 2025 16:51
@brian-dellabetta changed the title from "[Multi-modifier] Support scoped appliation of quantization config/status" to "[Multi-modifier] Support scoped application of quantization config/status" on September 2, 2025
Signed-off-by: Brian Dellabetta <[email protected]>
Signed-off-by: Brian Dellabetta <[email protected]>
Signed-off-by: Brian Dellabetta <[email protected]>
@brian-dellabetta force-pushed the bdellabe/scoped-quant-status branch from 2f93072 to f99db2f on September 11, 2025 16:43
@@ -178,7 +182,7 @@ def on_start(self, state: State, event: Event, **kwargs):

         # register gptq hooks
         added_hook = False
-        for module in state.model.modules():
+        for _, module in match_named_modules(state.model, self.targets, self.ignore):
             if getattr_chain(module, "quantization_scheme.weights", None) is not None:
Collaborator:
Should this be changed into an assert rather than an if?

brian-dellabetta (author):
Yeah, I think that makes sense; I can update.

brian-dellabetta (author):
Hmm, on second thought, it doesn't look like there's anything in the validation layer confirming that each quantization args instance has a weights field. So if a user sets an invalid config where weight quantization isn't configured, it would error out here. Is that what we want?

Collaborator:
I'd prefer an explicit error rather than a silent skip here.

brian-dellabetta (author):
Shouldn't we do this in the validation layer, though? I can add a check at model validation and switch this to an assert.
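For context, a minimal sketch of the explicit-error variant under discussion. Whether the check ends up inline or in the validation layer is exactly the open question, so this is illustrative only; the import paths are assumed.

```python
# Illustrative only: the final placement (inline assert vs. recipe/config
# validation) is what's being discussed above; import paths are assumed.
from compressed_tensors.utils import getattr_chain, match_named_modules


def iter_weight_quantized_modules(model, targets, ignore):
    """Yield matched modules, failing loudly if one lacks weight quantization."""
    for name, module in match_named_modules(model, targets, ignore):
        weight_args = getattr_chain(module, "quantization_scheme.weights", None)
        # Explicit error instead of silently skipping the module
        assert weight_args is not None, (
            f"{name} matched this modifier's targets but has no weight "
            "quantization configured; check the recipe's config groups"
        )
        yield name, module
```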

Signed-off-by: Brian Dellabetta <[email protected]>
Signed-off-by: Brian Dellabetta <[email protected]>
@brian-dellabetta marked this pull request as ready for review on September 15, 2025 20:38
@brian-dellabetta added the ready (When a PR is ready for review) label on Sep 15, 2025
Signed-off-by: Brian Dellabetta <[email protected]>
@brian-dellabetta removed the ready (When a PR is ready for review) label on Sep 15, 2025
@kylesayrs (Collaborator) left a comment:
Consider adding some basic tests / common use cases; otherwise looks good!

@@ -0,0 +1,101 @@
+from datasets import load_dataset
Collaborator:
Can we maybe generalize this folder name to mixed-precision so that people associate this with enabling mixed precision workloads?

Collaborator:
Even though multi-modifier recipes don't necessarily need to be mixed precision? I just think mixed-precision is a stronger message than multi-modifier.

brian-dellabetta (author):
I can rename the folder to examples/mixed_precision if we find that to be more appropriate. Let's discuss in standup.

    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    # Option 1) run both modifiers in a single calibrated run
    pipeline="sequential",
Collaborator:
Is the pipeline not already sequential by default?

brian-dellabetta (author):
It actually defaults to "independent". Without pipeline set, it infers SequentialPipeline for GPTQ and runs just GPTQ independently, then infers SequentialPipeline for AWQ and runs just AWQ independently. Not sure what we want for default behavior, though; this just makes it explicit.
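To make the two options concrete, a small sketch reusing MODEL_ID and recipe from the sketch in the test plan above; other oneshot arguments are elided and illustrative.

```python
# Explicit "sequential": one calibrated run, AWQ and GPTQ applied together,
# so their log messages alternate as the run progresses.
oneshot(model=MODEL_ID, dataset="open_platypus", recipe=recipe, pipeline="sequential")

# Current default, "independent": each modifier gets its own full
# calibration pass, one after the other.
oneshot(model=MODEL_ID, dataset="open_platypus", recipe=recipe, pipeline="independent")
```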

print("==========================================\n\n")

# Save to disk compressed.
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128"
Collaborator:
Add GPTQ and AWQ to the names?
