feat(quantization): add ActivationRestrictedAsymmetric option #28237

Open
Rishi-Dave wants to merge 1 commit into microsoft:main from Rishi-Dave:rishidave/feat/restricted-asymmetric-quant

@Rishi-Dave
Contributor

Description

Adds a new ActivationRestrictedAsymmetric extra-option to the Python
quantization tools. When enabled, uint8 activation zero-points are snapped
to either 0 (when rmin >= 0, e.g. post-ReLU/Sigmoid tensors) or 128
(when rmin < 0). The scale is recomputed so the dequantized range still
covers [rmin, rmax] without clipping.
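
For reviewers who want the arithmetic spelled out, here is a minimal sketch of the snapping rule described above. It is not the helper from this PR: the function names `snap_zero_point_sketch` and `dequantized_range` are hypothetical, and the scale formulas and the handling of an all-zero range are assumptions derived only from the "covers [rmin, rmax] without clipping" requirement.

```python
from typing import Tuple


def snap_zero_point_sketch(rmin: float, rmax: float) -> Tuple[float, int]:
    """Illustrative only: pick zero-point 0 or 128 and a scale that avoids clipping.

    Assumes uint8 quantization with q in [0, 255] and
    dequantized value = (q - zero_point) * scale.
    """
    if rmin > rmax:
        raise ValueError("rmin must not exceed rmax")

    if rmin >= 0.0:
        # Non-negative range (e.g. post-ReLU): all 255 steps sit above zero.
        zero_point = 0
        scale = rmax / 255.0
    else:
        # Signed range: 128 steps below zero, 127 steps above zero.
        # Take the larger scale so neither side is clipped.
        zero_point = 128
        scale = max(-rmin / 128.0, rmax / 127.0)

    if scale == 0.0:
        # Assumption: fall back to 1.0 for a degenerate all-zero range.
        scale = 1.0
    return scale, zero_point


def dequantized_range(scale: float, zero_point: int) -> Tuple[float, float]:
    """Real-valued interval representable by q = 0 and q = 255."""
    return (0 - zero_point) * scale, (255 - zero_point) * scale


if __name__ == "__main__":
    for rmin, rmax in [(0.0, 6.0), (-1.0, 3.0), (-4.0, 0.5)]:
        scale, zp = snap_zero_point_sketch(rmin, rmax)
        print((rmin, rmax), "->", zp, scale, dequantized_range(scale, zp))
```

With zero-point 128 the representable interval is not centered on the observed range, so one side usually has unused headroom. That is the cost of restricting the zero-point to two values.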

This restricted asymmetric mode targets hardware accelerators that support
only these two zero-point values for uint8 quantization. It avoids the
full restriction to symmetric quantization (zero-point = 128 for all
tensors).

Motivation and Context

Fixes #21398.

Existing options cover only the fully symmetric mode (ActivationSymmetric,
zero-point fixed at 128) or unrestricted asymmetric quantization. There was
no mode that selects a zero-point from {0, 128} per tensor based on its
observed range.

Changes

  • quant_utils.py: new snap_zero_point_to_uint8(rmin, rmax) helper.
  • base_quantizer.py: parse new ActivationRestrictedAsymmetric extra-option.
  • onnx_quantizer.py and qdq_quantizer.py: apply snap after
    compute_scale_zp in the activation path. Guarded on
    quant_type == UINT8 and not symmetric. Weight and int8 paths are
    untouched.
  • quantize.py: document the new option in the four extra_options
    docstrings.
  • test_symmetric_flag.py: new TestRestrictedAsymmetricFlag covering
    three cases (positive range → zp=0, signed range → zp=128, and
    option-disabled regression).
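
The guard described in the list above can be sketched as follows. This is an assumption-laden illustration, not the code in onnx_quantizer.py or qdq_quantizer.py: `activation_params_sketch` and `_snap_sketch` are hypothetical names, the boolean parameters stand in for the quantizer's real state, and degenerate ranges are deliberately not handled.

```python
from typing import Tuple


def _snap_sketch(rmin: float, rmax: float) -> Tuple[float, int]:
    # Hypothetical stand-in for the PR's snap helper; formulas are assumed.
    if rmin >= 0.0:
        return rmax / 255.0, 0
    return max(-rmin / 128.0, rmax / 127.0), 128


def activation_params_sketch(
    rmin: float,
    rmax: float,
    scale: float,
    zero_point: int,
    *,
    is_uint8: bool,
    symmetric: bool,
    restricted: bool,
) -> Tuple[float, int]:
    """Apply the snap only to uint8, non-symmetric activations.

    `scale` and `zero_point` represent values already computed upstream;
    they are returned unchanged whenever the guard does not match.
    """
    if restricted and is_uint8 and not symmetric:
        return _snap_sketch(rmin, rmax)
    return scale, zero_point


if __name__ == "__main__":
    # Mirrors the three cases the new tests are described as covering.
    common = dict(is_uint8=True, symmetric=False)
    print(activation_params_sketch(0.0, 5.1, 0.02, 0, restricted=True, **common))
    print(activation_params_sketch(-1.0, 2.0, 0.0118, 85, restricted=True, **common))
    print(activation_params_sketch(-1.0, 2.0, 0.0118, 85, restricted=False, **common))
```

Keeping the guard separate from the snap arithmetic makes the option-disabled path trivially a pass-through, which is what a regression test for that case would check.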

Testing

```
python -m pytest onnxruntime/test/python/quantization/test_symmetric_flag.py -v
```
All 7 tests pass (4 existing + 3 new). `lintrunner` is clean.

…t8 zero-point snapping

When extra_options={"ActivationRestrictedAsymmetric": True} is passed to
quantize_static (or a QDQ config), uint8 activation zero-points are snapped
to 0 when rmin >= 0 (e.g. post-ReLU tensors) or 128 when rmin < 0.  Scale
is recomputed so the dequantized range still covers [rmin, rmax] without
clipping.
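
A minimal usage sketch, assuming a build of onnxruntime that includes this change (the option is not recognized otherwise). The wrapper name `quantize_with_restricted_zero_points` is hypothetical, and the model paths and calibration data reader must be supplied by the caller.

```python
# Extra-option name taken from this PR; it takes effect only with the change applied.
EXTRA_OPTIONS = {"ActivationRestrictedAsymmetric": True}


def quantize_with_restricted_zero_points(model_input, model_output, data_reader):
    """Hypothetical wrapper around quantize_static in QDQ format.

    `data_reader` is expected to be a CalibrationDataReader instance
    provided by the caller.
    """
    # Imported lazily so this sketch can be loaded without onnxruntime installed.
    from onnxruntime.quantization import QuantFormat, QuantType, quantize_static

    quantize_static(
        model_input,
        model_output,
        data_reader,
        quant_format=QuantFormat.QDQ,
        activation_type=QuantType.QUInt8,  # the snap applies to uint8 activations only
        weight_type=QuantType.QInt8,       # weight path is described as untouched
        extra_options=EXTRA_OPTIONS,
    )
```

The option is left at its default (disabled) unless passed explicitly, so existing callers of `quantize_static` should see no behavior change.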

- quant_utils: add snap_zero_point_to_uint8() helper (~28 LOC)
- base_quantizer: parse ActivationRestrictedAsymmetric extra-option flag
- onnx_quantizer: apply snap after compute_scale_zp in calc_quant_params
  (uint8, non-symmetric activations only)
- qdq_quantizer: same snap in QDQ calc_quant_params path
- quantize: document new option in all four extra_options docstrings
- test_symmetric_flag: add TestRestrictedAsymmetricFlag (3 test methods)

Refs microsoft#21398
Development

Successfully merging this pull request may close these issues.

New restricted asymmetric quantization mode in QDQ mode with zero_point restricted to either 128 or 0
