fix(quantization): validate bias scale in QDQ Conv → QLinearConv fusion#28229

Open
Rishi-Dave wants to merge 2 commits into microsoft:main from
Rishi-Dave:rishidave/fix/qdq-conv-bias-scale-validation

Conversation

@Rishi-Dave
Contributor

Summary

  • Add CheckConvBiasScale validator inside ConvNodeGroupSelector::Check
  • Skip QDQ Conv → QLinearConv fusion when bias DQ scale ≠ input_scale × weight_scale (within 1% relative tolerance)
  • Add Python test coverage for both matching and mismatched bias scales

Motivation

Fixes #24711.

The ONNX QLinearConv spec requires the int32 bias to be quantized with scale x_scale × w_scale[i] so the fused kernel can reuse the bias values directly. The current QDQ selector only verifies that the bias dtype is INT32; it never checks that the bias DQ's scale satisfies this relationship. When a model is constructed with an arbitrary bias scale (e.g. user-supplied or from a non-canonical quantizer), the selector still fuses the subgraph, and the QLinearConv kernel produces silently wrong outputs at ORT_ENABLE_EXTENDED and above on the CPU EP. The CUDA and disabled-optimization paths produce correct results, which makes the bug particularly hard to diagnose.

Changes

  • onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc: add CheckConvBiasScale static helper. Returns false (skip fusion) when:
    • any of x/w/b scales is not a constant initializer
    • any scale dtype is not float32
    • x_scale is not a scalar / 1-element rank-1 tensor
    • b_scale length is neither 1 nor num_channels
    • any per-channel bias scale differs from x_scale × w_scale[i] by more than atol + rtol × |expected|, with atol=1e-6 and rtol=1e-2
  • onnxruntime/test/python/quantization/test_qdq.py: new TestConvBiasScaleValidation class with two cases — mismatched bias scale (asserts optimized output matches unoptimized) and matching bias scale (asserts correctness preserved when fusion is allowed).
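The skip conditions above amount to a NumPy-style closeness check. A minimal standalone sketch (the function name and signature are illustrative only; the actual validator is the C++ CheckConvBiasScale helper in qdq_selectors.cc, which also handles the initializer and dtype checks listed above):

```python
import numpy as np

def bias_scale_is_valid(x_scale, w_scales, b_scales, atol=1e-6, rtol=1e-2):
    """Sketch of the per-channel bias-scale tolerance check (illustrative name)."""
    expected = float(x_scale) * np.asarray(w_scales, dtype=np.float32)
    actual = np.asarray(b_scales, dtype=np.float32)
    # b_scale length must be 1 or num_channels (taken from w_scales here).
    if actual.size not in (1, expected.size):
        return False
    # |b_scale - x_scale * w_scale| <= atol + rtol * |expected| per channel.
    return bool(np.all(np.abs(actual - expected) <= atol + rtol * np.abs(expected)))
```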

Test Plan

  • python -m pytest onnxruntime/test/python/quantization/test_qdq.py::TestConvBiasScaleValidation -v
  • Existing QDQ Conv tests (verify_quantize_conv family) should continue to pass — fusion is unchanged for canonical quantizer-produced models where bias_scale equals input_scale × weight_scale exactly.
  • Reproduce the issue with the model from Conv + Q/DQ produces bad output in CPUExecutionProvider #24711 and confirm CPU ORT_ENABLE_ALL output now matches ORT_DISABLE_ALL.

ONNX QLinearConv reuses the int32 bias values directly without
re-scaling, which requires bias_scale[i] == x_scale * w_scale[i] for
each output channel. Previously ConvNodeGroupSelector::Check did not
verify this invariant, so a QDQ model whose bias DQ used an arbitrary
scale (e.g. produced by a non-conformant quantizer) would be silently
fused into QLinearConv with wrong outputs.

Add a static helper CheckConvBiasScale() that reads the three scale
initializers, handles both per-tensor (scalar) and per-channel weight
scales, and rejects the fusion when the tolerance check
  |b_scale - x_scale*w_scale| > atol + rtol*|x_scale*w_scale|
(atol=1e-6, rtol=1e-2) fails for any channel. If any scale is not a
constant initializer the helper returns false conservatively.

Add two Python test cases in TestConvBiasScaleValidation:
- test_mismatched_bias_scale_skips_fusion: verifies that optimized and
  unoptimized sessions agree when bias_scale is 2x the correct value,
  proving fusion was safely skipped.
- test_matching_bias_scale_allows_fusion: verifies the same agreement
  when bias_scale is exact, ensuring no regression on valid models.

Fixes microsoft#24711
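The failure mode the mismatched-bias-scale test guards against can be shown with a few lines of arithmetic (values made up for illustration): if the bias DQ scale is 2x the canonical x_scale × w_scale but the fused kernel still treats the int32 bias as having the canonical scale, the bias contribution is silently halved.

```python
# Illustration of the wrong-output mode when fusion ignores the bias scale.
x_scale, w_scale = 0.02, 0.5
canonical = x_scale * w_scale  # 0.01, the scale the fused kernel assumes
wrong = 2.0 * canonical        # 0.02, the scale the model actually used
bias_real = 0.4

bias_q = round(bias_real / wrong)  # int32 bias quantized with the wrong scale
fused_bias = bias_q * canonical    # kernel dequantizes with the canonical scale
assert abs(fused_bias - 0.2) < 1e-9  # half the true bias of 0.4
```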

Copilot AI left a comment


Pull request overview

This PR hardens the QDQ Conv → QLinearConv fusion in the QDQ transformer by validating that the bias dequantization scale conforms to the ONNX QLinearConv requirement (bias_scale == x_scale * w_scale[i]), preventing silent wrong results when models use non-canonical bias scales.

Changes:

  • Add CheckConvBiasScale validation to ConvNodeGroupSelector::Check to skip fusion when bias scale does not match x_scale * w_scale[i] within tolerance.
  • Add Python tests that run the same model with optimizations enabled vs disabled to detect bias-scale-related corruption.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selectors.cc: adds the bias-scale validation gate that prevents incorrect QDQ Conv fusion into QLinearConv.
  • onnxruntime/test/python/quantization/test_qdq.py: adds regression tests for matched vs mismatched bias scales under different optimization levels.


Comment thread onnxruntime/test/python/quantization/test_qdq.py
…scale check

CheckConvBiasScale derived num_channels from w_scales.size() alone, which is 1
for a per-tensor weight scale even when bias_scale is a per-channel vector. That
caused valid graphs (per-channel bias with a scalar weight scale) to be rejected,
and only one channel to be validated when num_channels was inferred incorrectly.

Use the larger of w_num/b_num for num_channels and broadcast either scale
across channels when scalar. Reject only when both are vectors with mismatched
lengths.

Also extend the matching/mismatched-bias-scale tests to save the optimized
graph and assert QLinearConv presence/absence directly via op counts, so the
tests fail fast if fusion silently disables for an unrelated reason.
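The broadcasting rule described in this fix can be sketched as follows (illustrative name and Python; the real logic lives in the C++ helper): either scale may be a scalar (per-tensor) or a length-N vector (per-channel), num_channels is the larger of the two lengths, a scalar broadcasts across channels, and only two vectors of different lengths are rejected.

```python
import numpy as np

def resolve_num_channels(w_scales, b_scales):
    """Sketch of the num_channels broadcasting rule (illustrative helper name).

    Returns the channel count to validate, or None when both scales are
    per-channel vectors with mismatched lengths (invalid graph).
    """
    w_num = np.asarray(w_scales).size
    b_num = np.asarray(b_scales).size
    if w_num != b_num and min(w_num, b_num) != 1:
        return None  # both per-channel but different lengths: reject
    return max(w_num, b_num)  # scalar side broadcasts across channels
```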

Development

Successfully merging this pull request may close these issues.

Conv + Q/DQ produces bad output in CPUExecutionProvider