Support dynamically quantized 2D convolutions #10248
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10248
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 9693061 with merge base b7eee0c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Overall looks really good. Just a few minor comments.
@@ -283,14 +284,26 @@ def input_to_nhwc(
            ]
        else:
            # Need to create NHWC node
            # Check if input uses dynamic quantization
Nice! The change made here looks simpler than I expected, though I suspect it was still pretty hard to navigate and figure out.
Do you mind adding a new test here as well:
https://github.com/pytorch/executorch/blob/d7f74bd4adf10950224d2d975a6e23e92e7be6f3/backends/xnnpack/test/passes/test_channels_last_tagged_reshape.py
Essentially we just check if there is a dynamically quantized convolution; if so, we place one permute before the dynamic_q chain and one permute after the conv.
This was definitely one of those changes where the time-to-LOC ratio was off the charts haha
Test added in test_channels_last_tagged_reshape.py
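For illustration, a rough sketch of the kind of structural check such a test makes after the pass runs (a hypothetical helper; the actual test uses the harness in test_channels_last_tagged_reshape.py):

```python
import torch

def assert_permutes_wrap_dq_conv(gm: torch.fx.GraphModule) -> None:
    # Gather call_function targets in graph order.
    targets = [str(n.target) for n in gm.graph.nodes if n.op == "call_function"]
    conv_idx = next(i for i, t in enumerate(targets) if "convolution" in t)
    # One NCHW->NHWC permute/copy should appear before the dynamic-quant chain
    # (choose_qparams -> quantize -> dequantize) that feeds the conv...
    assert any("permute" in t or "to_copy" in t for t in targets[:conv_idx])
    # ...and one NHWC->NCHW permute/copy should appear after the conv.
    assert any("permute" in t or "to_copy" in t for t in targets[conv_idx + 1 :])
```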
Yeah, this is a good data point. I want to help improve the contributability flow for the XNNPACK backend, and some of that definitely means improving/refactoring passes like this, which are way too complex.
q_input_val = q_input.meta.get("val", None)
q_input_shape = getattr(q_input_val, "shape", None)
if q_input_shape is not None:
    num_nonbatch_dims = max(len(q_input_shape) - 1, 1)
This isn't necessarily the case for linear layers; the input to these will always have 1 non-batch dimension: (x, y, z, input_channels). The rank of the tensor can be arbitrary, and we always interpret the last dimension as input channels and every other dimension as a batch dimension.
For now let's just add a check: if convolution then 3, if linear then 1. This issue stems from the fact that we are injecting per_tensor quant nodes, but our intent is to do per_batch. We are adding affine quantization, which should tell us how many batch and how many non-batch dimensions there are in the quant node, so later on we will fix it to use that, but for now it might be simplest to just hard-code the conv and linear cases :(
Good to know. I added a check to determine if the node feeds into a conv and set the non-batch dimensions to 3 if it does
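A minimal sketch of that check, assuming an FX node whose users may include a convolution (the concrete op targets inside the pass are edge-dialect ops, so the set below is a placeholder):

```python
import torch

_CONV_TARGETS = {
    torch.ops.aten.convolution.default,
    torch.ops.aten.conv2d.default,
}

def get_num_nonbatch_dims(node: torch.fx.Node) -> int:
    # Conv inputs are 4D (N, C, H, W), so three non-batch dims; linear inputs
    # treat only the last dimension (input channels) as non-batch.
    feeds_conv = any(user.target in _CONV_TARGETS for user in node.users)
    return 3 if feeds_conv else 1
```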
weight_shape = getattr(weight_val, "shape", None)

# Skip if not a 4D weight tensor (i.e. not conv2d)
if weight_shape is not None and len(weight_shape) != 4:
nice, let's also add a skip if the convolution is depthwise, since XNNPACK can't handle that case yet.
Is there a way to get the group number for the depthwise check here? I'm currently defaulting to 1 group
I do believe the default is 1 if it is not an arg.
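For reference, a sketch of reading groups off an aten.convolution node with that default (the positional index assumes the standard aten.convolution schema):

```python
import torch

def get_groups(node: torch.fx.Node) -> int:
    # aten.convolution args: input, weight, bias, stride, padding, dilation,
    # transposed, output_padding, groups -- groups is the 9th arg (index 8).
    return int(node.args[8]) if len(node.args) > 8 else 1
```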
@@ -1172,7 +1167,7 @@ Error defineStaticTransposeNode(
   ET_CHECK_OR_RETURN_ERROR(
       status == xnn_status_success,
       Internal,
-      "Failed to create sigmoid node %i with code: %s",
+      "Failed to create static transpose node %i with code: %s",
thanks :)
self.conv.weight.requires_grad = False
self.conv.bias.requires_grad = False
is this necessary?
Good catch, removed
@@ -169,6 +173,20 @@ def get_inputs(self):
        return (torch.randn(2, 2, 4, 4),)


class Conv2dDynamicQuant(torch.nn.Module):
Can you just add two more, where we have:
- two convolutions in sequence:
  inp --> conv --> conv --> out
- two convolutions running in parallel:
  inp1 --> conv --> out1
      \
       --> conv2 --> out2
Added the two unit tests
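A rough sketch of what those two modules could look like (channel sizes are placeholders; Conv2dDQSeq matches the name used in the tests below, while Conv2dDQParallel is a hypothetical name):

```python
import torch

class Conv2dDQSeq(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.second = torch.nn.Conv2d(8, 10, kernel_size=3, padding=1)

    def forward(self, x):
        # inp --> conv --> conv --> out
        return self.second(self.first(x))

class Conv2dDQParallel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.second = torch.nn.Conv2d(3, 10, kernel_size=3, padding=1)

    def forward(self, x):
        # inp --> conv --> out1 and inp --> conv2 --> out2
        return self.first(x), self.second(x)
```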
@@ -358,6 +358,11 @@ def check_constraints(self, node: torch.fx.Node, ep: ExportedProgram) -> bool:
            why(node, "Only support 1D + 2D Conv")
            return False  # Only support 1D + 2D Conv

        precision = self._detect_precision(node)
        if precision == ConfigPrecisionType.DYNAMIC_QUANT and len(conv_stride) != 2:
can you also add the depthwise check here?
Added the depthwise check. If it looks good, I can move the check to a helper function since it's being used in op_conv2d.py, gemm_configs.py, and xnnpack_quantizer_utils.py.
is_transpose = node.args[6]

if is_transpose:
    group_input_channels = int(kernel_shape[0] / groups)
    group_output_channels = kernel_shape[1]
else:
    group_input_channels = kernel_shape[1]
    group_output_channels = int(kernel_shape[0] / groups)

is_depthwise = (
    group_input_channels == 1
    and group_output_channels % group_input_channels == 0
)
maybe we can move this check into xnnpack/utils/utils.py
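One possible shape for that shared helper (a sketch only; it mirrors the group-channel logic from the diff above, and the actual signature in xnnpack/utils/utils.py may differ):

```python
from typing import Sequence

def is_depthwise_conv(
    kernel_shape: Sequence[int], groups: int = 1, is_transpose: bool = False
) -> bool:
    # Kernel layout is (out_ch, in_ch/groups, ...) for regular convs and
    # (in_ch, out_ch/groups, ...) for transposed convs.
    if len(kernel_shape) < 2 or groups == 0:
        return False
    if is_transpose:
        group_input_channels = int(kernel_shape[0] / groups)
        group_output_channels = kernel_shape[1]
    else:
        group_input_channels = kernel_shape[1]
        group_output_channels = int(kernel_shape[0] / groups)
    return (
        group_input_channels == 1
        and group_output_channels % group_input_channels == 0
    )
```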
Looks good, let's rebase and just let the CI run. Thank you!
Thank you!
@@ -169,6 +173,55 @@ def get_inputs(self):
        return (torch.randn(2, 2, 4, 4),)


class Conv2dDQ(torch.nn.Module):
Nit:
- class Conv2dDQ(torch.nn.Module):
+ class Conv2d(torch.nn.Module):
DynamicallyQuantizedPartitioner = XnnpackPartitioner(
    config_precisions=ConfigPrecisionType.DYNAMIC_QUANT,
    per_op_mode=True,
Nit: also check without this?
def test_dq_conv2d_seq(self) -> None:
    model = Conv2dDQSeq()
    self._test_dq(model, conv_count=2)
Nit: get conv count from the model rather than hardcoding
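For example, the expected count could be derived from the module itself (a sketch; assumes every conv in the test model is an nn.Conv2d):

```python
import torch

def get_conv_count(model: torch.nn.Module) -> int:
    # Count Conv2d submodules instead of hardcoding the expected number,
    # e.g. self._test_dq(model, conv_count=get_conv_count(model)).
    return sum(isinstance(m, torch.nn.Conv2d) for m in model.modules())
```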
Summary

Add initial support for dynamically quantized Conv2d in XNNPACK:
- Add conv to DYNAMIC_OPS for annotation
- Set num_nonbatch_dims based on whether the node feeds into a conv
- Remove the num_nonbatch_dims check from XNNCompiler

Fixes #9021
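As a rough illustration of how these pieces fit together (a sketch only; import paths and the surrounding pt2e quantize/export flow are assumptions and may differ between ExecuTorch versions):

```python
import torch
from executorch.backends.xnnpack.partition.config.xnnpack_config import ConfigPrecisionType
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A single dynamically quantizable Conv2d, like the test modules in this PR.
model = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3).eval()
example_inputs = (torch.randn(1, 3, 16, 16),)

# After applying dynamic quantization via the pt2e flow (XNNPACKQuantizer with
# get_symmetric_quantization_config(is_dynamic=True)) and exporting, the graph
# is lowered with a dynamic-quant-only partitioner, as in the test diff above.
partitioner = XnnpackPartitioner(
    config_precisions=ConfigPrecisionType.DYNAMIC_QUANT,
    per_op_mode=True,
)
```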
Test plan