Support Int4OpaqueTensor for AWQ #2997
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2997
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit a5675fe with merge base c4d4799. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Add act_pre_scale into Int4OpaqueTensor for AWQ.
Signed-off-by: Cui, Yuxin <[email protected]>
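For context, a minimal sketch (not the actual torchao implementation) of what act_pre_scale is meant to do: when it is set on the quantized weight container, the activation is multiplied by it before the quantized matmul, which is how AWQ's input scales travel with the weight. The wrapper and the dequantize() helper below are illustrative assumptions.

import torch
import torch.nn.functional as F

def linear_with_act_pre_scale(input, weight_tensor, bias=None):
    # Illustrative only: `weight_tensor` stands in for an Int4OpaqueTensor-like
    # container; `dequantize()` is a hypothetical helper so the example runs on
    # plain tensors instead of the fused int4 CPU kernel.
    if getattr(weight_tensor, "act_pre_scale", None) is not None:
        # AWQ-style pre-scaling: scale the activation before the matmul.
        input = input * weight_tensor.act_pre_scale
    return F.linear(input, weight_tensor.dequantize(), bias)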
LGTM but I have a few questions.
test/quantization/quantize_/workflows/int4/test_int4_opaque_tensor.py
Outdated
torchao/prototype/awq/example.py
Outdated
@@ -254,14 +295,21 @@ def quantize_and_eval(
     quantize_(model, quant_config)
     print(f"time for convert: {time.time() - t0:.02f} seconds")
     quant_config = AWQConfig(base_config, step="prepare_for_loading")
-    model.config.quantization_config = TorchAoConfig(quant_config)
+    #model.config.quantization_config = TorchAoConfig(quant_config)
Is this change needed?
No, I've updated it and removed this change.
CC @mingfeima for review. Thanks.
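For readers following this thread, a rough sketch of the surrounding AWQ flow in example.py that the diff above belongs to. The "prepare" and "convert" step names and the run_calibration helper are assumptions for illustration; only "prepare_for_loading" appears in the diff itself.

from torchao.quantization import quantize_
from torchao.prototype.awq import AWQConfig

# Assumed step names; only "prepare_for_loading" is confirmed by the diff above.
quantize_(model, AWQConfig(base_config, step="prepare"))    # insert observers
run_calibration(model, calibration_data)                    # hypothetical helper
quantize_(model, AWQConfig(base_config, step="convert"))    # apply AWQ scales and quantize

# When reloading a saved quantized checkpoint, the example uses:
quant_config = AWQConfig(base_config, step="prepare_for_loading")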
torchao/prototype/awq/example.py
Outdated
if device == "cuda": | ||
base_config = Int4WeightOnlyConfig(group_size=group_size, version=2) | ||
elif device == "cpu": | ||
base_config = Int4WeightOnlyConfig( | ||
group_size=group_size, packing_format="opaque", version=2 | ||
) | ||
else: | ||
assert False, "Unsupported device: {}".format(device) |
I am not very familiar with the concept here, could you explain why CPU needs the opaque packing_format?
It's because packing_format describes a fixed format of how the quantized weight data are laid out in memory, but int4 on CPU has a format that depends on the specific hardware, tensor shapes, etc.:
We use AVX512 to compute TINYGEMM on CPU. We can also leverage AVX512_VNNI and AMX instructions with torch.compile and max-autotune.
For data locality, we preshuffle the data in plain layout (N, K/2) to (N/block_n, K, block_n/2), where block_n = 64/32/16.
See https://github.com/pytorch/pytorch/blob/32eee8ed225d9f10fbbcb38c24b8b44c24c0c97c/aten/src/ATen/native/cpu/int4mm_kernel.cpp#L583 for more details.
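To make the quoted layout description concrete, here is a purely illustrative sketch of moving a plain (N, K/2) packed-int4 weight into an (N/block_n, K, block_n/2) blocked layout. It only demonstrates the shape transformation for data locality; the real packing in int4mm_kernel.cpp also reorders nibbles per ISA, so this is not the actual kernel format.

import torch

def to_blocked_layout(weight_plain: torch.Tensor, block_n: int = 64) -> torch.Tensor:
    # weight_plain: uint8 tensor of shape (N, K/2), two int4 values per byte.
    N, K_half = weight_plain.shape
    K = K_half * 2
    assert N % block_n == 0
    # Unpack nibbles so we can reason in (N, K) int4 values (nibble order assumed).
    low = weight_plain & 0x0F
    high = (weight_plain >> 4) & 0x0F
    unpacked = torch.stack([low, high], dim=-1).reshape(N, K)
    # (N, K) -> (N/block_n, block_n, K) -> (N/block_n, K, block_n)
    blocked = unpacked.reshape(N // block_n, block_n, K).permute(0, 2, 1).contiguous()
    # Re-pack pairs of int4 values along the last dim -> (N/block_n, K, block_n/2)
    return (blocked[..., 1::2] << 4) | blocked[..., 0::2]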
torchao/prototype/awq/example.py
Outdated
    base_config = Int4WeightOnlyConfig(group_size=group_size, version=2)
elif device == "cpu":
    base_config = Int4WeightOnlyConfig(
        group_size=group_size, packing_format="opaque", version=2
version=2 can be removed now; it's the default.
Thanks, I've updated it and removed version=2.
Please remove the version=2 here since it's the default.
OK, I removed version=2 for CPU.
Thanks.
torchao/prototype/awq/example.py
Outdated
inductor_config.cpp_wrapper = True
inductor_config.max_autotune = True
inductor_config.max_autotune_gemm_backends = "CPP,ATEN"
This script will also be used for CUDA, so I think triton is needed here. Or let's simply remove this line to use the defaults.
Removed
# Making sure activation pre scaling is successfully applied to the activation.
# manual_scaled_quantized (input * 2 → quantize with act_pre_scale=None) should equal
# auto_scaled_quantized (original input → quantize with act_pre_scale=2),
# proving that the act_pre_scale factor correctly applies input scaling
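As a runnable illustration of the check that comment describes (fake_quant_linear is a stand-in for the real Int4OpaqueTensor linear path, not the actual API under test):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
weight = torch.randn(64, 128)
x = torch.randn(32, 128)
scale = 2.0

def fake_quant_linear(inp, act_pre_scale=None):
    # Stand-in for the quantized linear: pre-scale the activation, then matmul.
    if act_pre_scale is not None:
        inp = inp * act_pre_scale
    return F.linear(inp, weight)

manual_scaled_quantized = fake_quant_linear(x * scale, act_pre_scale=None)
auto_scaled_quantized = fake_quant_linear(x, act_pre_scale=scale)

# The two paths should match when act_pre_scale is applied to the activation
# before the (quantized) compute, which is what the test asserts.
torch.testing.assert_close(manual_scaled_quantized, auto_scaled_quantized)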
Suggested change:
- # Making sure activation pre scaling is successfully applied to the activation.
- # manual_scaled_quantized (input * 2 → quantize with act_pre_scale=None) should equal
- # auto_scaled_quantized (original input → quantize with act_pre_scale=2),
- # Proving that the act_pre_scale factor correctly applies input scaling
+ # Make sure activation pre scaling is successfully applied to the activation.
Let's make it more concise.
Thanks, updated
# Making sure quantization with pre-scaling is successfully applied to the activation.
# The error > 20 indicates that quantized computation with activation pre-scaling
# produces significantly different results from simply scaling the original
# floating-point output, confirming that pre-scaling is applied during
# quantization rather than post-processing.
Suggested change:
- # Making sure quantization with pre-scaling is successfully applied to the activation.
- # The error > 20 indicates that quantized computation with activation pre-scaling
- # produces significantly different results from simply scaling the original
- # floating-point output, confirming that pre-scaling is applied during
- # quantization rather than post-processing.
+ # If pre-scaling is auto-applied, the quantization error should be low, i.e., compute_error (SQNR) is high
This may be simpler.
Thanks, updated
torchao/prototype/awq/example.py
Outdated
    base_config = Int4WeightOnlyConfig(group_size=group_size, version=2)
elif device == "cpu":
    base_config = Int4WeightOnlyConfig(
        group_size=group_size, packing_format="opaque", version=2
Please remove the version=2 here since it's the default.
Signed-off-by: Cui, Yuxin <[email protected]>
Signed-off-by: Cui, Yuxin <[email protected]>
Signed-off-by: Cui, Yuxin <[email protected]>
Signed-off-by: Cui, Yuxin <[email protected]>
@@ -28,7 +29,7 @@
 def get_config(group_size):
     return Int4WeightOnlyConfig(
         group_size=group_size,
-        int4_packing_format="opaque",
+        packing_format="opaque",
This should be int4_packing_format, I think.
Thanks, updated
# If pre-scaling is auto-applied, the quantization error should be low,
# i.e., compute_error (SQNR) is high
self.assertTrue(
    compute_error(original * _ACT_PRE_SCALE, auto_scaled_quantized) > 20
)
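For reference, compute_error here is an SQNR-style metric where larger means closer; a minimal sketch of that idea (not necessarily identical to torchao's utility):

import torch

def sqnr_db(reference: torch.Tensor, candidate: torch.Tensor) -> float:
    # Signal-to-quantization-noise ratio in dB: large values mean `candidate`
    # closely matches `reference`, so "> 20" is a similarity threshold.
    noise = reference - candidate
    return (20 * torch.log10(reference.norm() / noise.norm())).item()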
nit: original --> original_output
Thanks, updated
LGTM
@pytorchbot merge
Merge failed. Reason: 1 mandatory check(s) are pending/not yet run. Dig deeper by viewing the pending checks on hud.
@pytorchbot merge
Merge failed. Reason: 1 mandatory check(s) are pending/not yet run. Dig deeper by viewing the pending checks on hud.
Signed-off-by: Cui, Yuxin <[email protected]>
Signed-off-by: Cui, Yuxin <[email protected]>
Signed-off-by: Cui, Yuxin <[email protected]>
Add act_pre_scale into Int4OpaqueTensor for AWQ.