Support dynamically quantized 2D convolutions #10248
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10248
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 9693061 with merge base b7eee0c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Overall looks really good. Just a few minor comments.
@@ -283,14 +284,26 @@ def input_to_nhwc(
            ]
        else:
            # Need to create NHWC node
            # Check if input uses dynamic quantization
Nice! The change made here looks simpler than I expected, though I suspect it was still pretty hard to navigate and figure out.
Do you mind adding a new test here as well:
https://github.com/pytorch/executorch/blob/d7f74bd4adf10950224d2d975a6e23e92e7be6f3/backends/xnnpack/test/passes/test_channels_last_tagged_reshape.py
Essentially we just check if there is a dynamically quantized convolution; if so, we place one permute before the dynamic_q chain and one permute after the conv.
This was definitely one of those changes where the time-to-LOC ratio was off the charts haha
Test added in test_channels_last_tagged_reshape.py
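For illustration, a rough sketch of the kind of structural check such a test makes after the pass runs (a hypothetical helper; the actual test uses the harness in test_channels_last_tagged_reshape.py):

```python
import torch

def assert_permutes_wrap_dq_conv(gm: torch.fx.GraphModule) -> None:
    # Gather call_function targets in graph order.
    targets = [str(n.target) for n in gm.graph.nodes if n.op == "call_function"]
    conv_idx = next(i for i, t in enumerate(targets) if "convolution" in t)
    # One NCHW->NHWC permute/copy should appear before the dynamic-quant chain
    # (choose_qparams -> quantize -> dequantize) that feeds the conv...
    assert any("permute" in t or "to_copy" in t for t in targets[:conv_idx])
    # ...and one NHWC->NCHW permute/copy should appear after the conv.
    assert any("permute" in t or "to_copy" in t for t in targets[conv_idx + 1 :])
```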
Yeah, this is a good data point. I want to help improve the contributability flow for the XNNPACK backend, and some of that definitely means improving/refactoring passes like this, which are way too complex.
q_input_val = q_input.meta.get("val", None)
q_input_shape = getattr(q_input_val, "shape", None)
if q_input_shape is not None:
    num_nonbatch_dims = max(len(q_input_shape) - 1, 1)
This isn't necessarily the case for linear layers; the input to these will always have 1 non-batch dimension: (x, y, z, input_channels). The rank of the tensor can be arbitrary, and we always interpret the last dimension as input channels and every other dimension as a batch dimension.
For now let's just add a check: if convolution then 3, if linear then 1. This issue stems from the fact that we are injecting per_tensor quant nodes, but our intent is to do per_batch. We are adding affine quantization, which should tell us how many batch and how many non-batch dimensions there are in the quant node, so later on we will fix it to use that, but for now it might be simplest to just hard-code the conv and linear cases :(
Good to know. I added a check to determine if the node feeds into a conv and set the non-batch dimensions to 3 if it does
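A minimal sketch of that check, assuming an FX node whose users may include a convolution (the concrete op targets inside the pass are edge-dialect ops, so the set below is a placeholder):

```python
import torch

_CONV_TARGETS = {
    torch.ops.aten.convolution.default,
    torch.ops.aten.conv2d.default,
}

def get_num_nonbatch_dims(node: torch.fx.Node) -> int:
    # Conv inputs are 4D (N, C, H, W), so three non-batch dims; linear inputs
    # treat only the last dimension (input channels) as non-batch.
    feeds_conv = any(user.target in _CONV_TARGETS for user in node.users)
    return 3 if feeds_conv else 1
```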
weight_shape = getattr(weight_val, "shape", None)

# Skip if not a 4D weight tensor (i.e. not conv2d)
if weight_shape is not None and len(weight_shape) != 4:
nice, let's also add a skip if the convolution is depthwise, since XNNPACK can't handle that case yet.
Is there a way to get the group number for the depthwise check here? I'm currently defaulting to 1 group
I do believe the default is 1 if it is not an arg.
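For reference, a sketch of reading groups off an aten.convolution node with that default (the positional index assumes the standard aten.convolution schema):

```python
import torch

def get_groups(node: torch.fx.Node) -> int:
    # aten.convolution args: input, weight, bias, stride, padding, dilation,
    # transposed, output_padding, groups -- groups is the 9th arg (index 8).
    return int(node.args[8]) if len(node.args) > 8 else 1
```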
@@ -1172,7 +1167,7 @@ Error defineStaticTransposeNode(
   ET_CHECK_OR_RETURN_ERROR(
       status == xnn_status_success,
       Internal,
-      "Failed to create sigmoid node %i with code: %s",
+      "Failed to create static transpose node %i with code: %s",
thanks :)
self.conv.weight.requires_grad = False
self.conv.bias.requires_grad = False
is this necessary?
Good catch, removed
@@ -169,6 +173,20 @@ def get_inputs(self):
        return (torch.randn(2, 2, 4, 4),)


class Conv2dDynamicQuant(torch.nn.Module):
Can you just add two more, where we have:
- two convolutions in sequence:
  inp --> conv --> conv --> out
- two convolutions running in parallel:
  inp1 --> conv --> out1
      \
       --> conv2 --> out2
Added the two unit tests
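A rough sketch of what those two modules could look like (channel sizes are placeholders; Conv2dDQSeq matches the name used in the tests below, while Conv2dDQParallel is a hypothetical name):

```python
import torch

class Conv2dDQSeq(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.second = torch.nn.Conv2d(8, 10, kernel_size=3, padding=1)

    def forward(self, x):
        # inp --> conv --> conv --> out
        return self.second(self.first(x))

class Conv2dDQParallel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.first = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.second = torch.nn.Conv2d(3, 10, kernel_size=3, padding=1)

    def forward(self, x):
        # inp --> conv --> out1 and inp --> conv2 --> out2
        return self.first(x), self.second(x)
```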
@@ -358,6 +358,11 @@ def check_constraints(self, node: torch.fx.Node, ep: ExportedProgram) -> bool:
            why(node, "Only support 1D + 2D Conv")
            return False  # Only support 1D + 2D Conv

        precision = self._detect_precision(node)
        if precision == ConfigPrecisionType.DYNAMIC_QUANT and len(conv_stride) != 2:
can you also add the depthwise check here?
Added the depthwise check. If it looks good, I can move the check to a helper function since it's being used in op_conv2d.py, gemm_configs.py, and xnnpack_quantizer_utils.py.
is_transpose = node.args[6]

if is_transpose:
    group_input_channels = int(kernel_shape[0] / groups)
    group_output_channels = kernel_shape[1]
else:
    group_input_channels = kernel_shape[1]
    group_output_channels = int(kernel_shape[0] / groups)

is_depthwise = (
    group_input_channels == 1
    and group_output_channels % group_input_channels == 0
)
maybe we can move this check into xnnpack/utils/utils.py
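One possible shape for that shared helper (a sketch only; it mirrors the group-channel logic from the diff above, and the actual signature in xnnpack/utils/utils.py may differ):

```python
from typing import Sequence

def is_depthwise_conv(
    kernel_shape: Sequence[int], groups: int = 1, is_transpose: bool = False
) -> bool:
    # Kernel layout is (out_ch, in_ch/groups, ...) for regular convs and
    # (in_ch, out_ch/groups, ...) for transposed convs.
    if len(kernel_shape) < 2 or groups == 0:
        return False
    if is_transpose:
        group_input_channels = int(kernel_shape[0] / groups)
        group_output_channels = kernel_shape[1]
    else:
        group_input_channels = kernel_shape[1]
        group_output_channels = int(kernel_shape[0] / groups)
    return (
        group_input_channels == 1
        and group_output_channels % group_input_channels == 0
    )
```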
Looks good, let's rebase and just let the CI run. Thank you!
Thank you!
@@ -169,6 +173,55 @@ def get_inputs(self):
        return (torch.randn(2, 2, 4, 4),)


class Conv2dDQ(torch.nn.Module):
Nit:
- class Conv2dDQ(torch.nn.Module):
+ class Conv2d(torch.nn.Module):
DynamicallyQuantizedPartitioner = XnnpackPartitioner(
    config_precisions=ConfigPrecisionType.DYNAMIC_QUANT,
    per_op_mode=True,
Nit: also check without this?
def test_dq_conv2d_seq(self) -> None:
    model = Conv2dDQSeq()
    self._test_dq(model, conv_count=2)
Nit: get conv count from the model rather than hardcoding
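For example, the expected count could be derived from the module itself (a sketch; assumes every conv in the test model is an nn.Conv2d):

```python
import torch

def get_conv_count(model: torch.nn.Module) -> int:
    # Count Conv2d submodules instead of hardcoding the expected number,
    # e.g. self._test_dq(model, conv_count=get_conv_count(model)).
    return sum(isinstance(m, torch.nn.Conv2d) for m in model.modules())
```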
Summary

Add initial support for dynamically quantized Conv2d in XNNPACK:
- Add conv to DYNAMIC_OPS for annotation
- Set num_nonbatch_dims based on whether the node feeds into a conv
- Remove the num_nonbatch_dims check from XNNCompiler

Fixes #9021
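As a rough illustration of how these pieces fit together (a sketch only; import paths and the surrounding pt2e quantize/export flow are assumptions and may differ between ExecuTorch versions):

```python
import torch
from executorch.backends.xnnpack.partition.config.xnnpack_config import ConfigPrecisionType
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# A single dynamically quantizable Conv2d, like the test modules in this PR.
model = torch.nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3).eval()
example_inputs = (torch.randn(1, 3, 16, 16),)

# After applying dynamic quantization via the pt2e flow (XNNPACKQuantizer with
# get_symmetric_quantization_config(is_dynamic=True)) and exporting, the graph
# is lowered with a dynamic-quant-only partitioner, as in the test diff above.
partitioner = XnnpackPartitioner(
    config_precisions=ConfigPrecisionType.DYNAMIC_QUANT,
    per_op_mode=True,
)
```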
Test plan