[ONNX] Weight compression for opset < 21 by andrey-churkin · Pull Request #3497 · openvinotoolkit/nncf

andrey-churkin · 2025-05-16T07:44:29Z

Changes

Replace the MatMul operation with the MatMulNBits operation from ONNX Runtime contrib operators.
See https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md

Reason for changes

The DequantizeLinear operation supports the block_size attribute only starting from opset 21.
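To make the role of `block_size` concrete, the NumPy sketch below shows what block-wise dequantization computes: every `block_size` consecutive elements along one axis share a scale. The function name `dequantize_blockwise` is hypothetical, and the code only mirrors the arithmetic; it is not how ONNX Runtime or NNCF implements it.

```python
import numpy as np


def dequantize_blockwise(q, scale, block_size, axis=0, zero_point=None):
    """Reference arithmetic for block-wise dequantization (illustrative only).

    `scale` (and `zero_point`, if given) has the same shape as `q`, except
    that the size along `axis` is q.shape[axis] / block_size.
    """
    q = np.asarray(q, dtype=np.float32)
    # Repeat each scale so that it covers its whole block along `axis`.
    full_scale = np.repeat(np.asarray(scale, dtype=np.float32), block_size, axis=axis)
    if zero_point is not None:
        q = q - np.repeat(np.asarray(zero_point, dtype=np.float32), block_size, axis=axis)
    return q * full_scale


q = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=np.int8)
scale = np.array([[0.5, 1.0], [2.0, 0.25]], dtype=np.float32)
w = dequantize_blockwise(q, scale, block_size=2, axis=0)
```

Without the `block_size` attribute, DequantizeLinear can only express per-tensor or per-axis scales, which is why a different operator is needed below opset 21.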

Related tickets

Ref: 163946

Tests

alexsu52

Please add a unit test for MatMulNBits + pack_int4_to_uint8
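A round-trip check of the kind requested here might look like the sketch below. It does not call NNCF's `pack_int4_to_uint8`, whose signature is not shown in this thread. Instead it uses two hypothetical stand-ins, `pack_nibbles` and `unpack_nibbles`, and assumes unsigned 4-bit values with the even-indexed element stored in the low nibble.

```python
import numpy as np


def pack_nibbles(values):
    """Pack pairs of 4-bit values (0..15) into single uint8 bytes.

    Hypothetical stand-in for illustration. Layout assumption: element 2*i
    goes to the low 4 bits and element 2*i + 1 to the high 4 bits.
    """
    flat = np.asarray(values, dtype=np.uint8).reshape(-1)
    if flat.size % 2 != 0:
        raise ValueError("an even number of elements is required")
    if flat.size and flat.max() > 15:
        raise ValueError("values must fit into 4 bits")
    low = flat[0::2]
    high = flat[1::2]
    return (low | (high << 4)).astype(np.uint8)


def unpack_nibbles(packed):
    """Inverse of pack_nibbles: split each byte into two 4-bit values."""
    packed = np.asarray(packed, dtype=np.uint8).reshape(-1)
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out


original = np.array([1, 2, 15, 0], dtype=np.uint8)
packed = pack_nibbles(original)
restored = unpack_nibbles(packed)
```

A real unit test would replace the stand-ins with the function under test and compare against the layout that MatMulNBits expects.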

andrey-churkin · 2025-05-22T10:41:24Z

@@ -0,0 +1,179 @@
+,prompts,answers,language


@alexsu52 The req_qa_onnx.csv file was obtained using the following "gold" model:

model_gold = ORTModelForCausalLM.from_pretrained(
    self.fp32_model_dir,
    trust_remote_code=True,
    provider="OpenVINOExecutionProvider",
)

It is identical to ref_qa.csv, which was generated using the model below:

model_gold = OVModelForCausalLM.from_pretrained(
    self.fp32_model_dir,
    trust_remote_code=True,
    load_in_8bit=False,
    compile=False,
    stateful=is_stateful,
    ov_config={"KV_CACHE_PRECISION": "f16"},
)

So, if you don't mind, I suggest using only the ref_qa.csv file.

andrey-churkin · 2025-05-22T10:45:39Z

+            if not (self.fp32_model_dir / self.ONNX_MODEL_NAME).exists():
+                opset_version = self.params.get("opset", None)
+                if opset_version:
+                    main_export(self.model_id, self.fp32_model_dir, opset=opset_version)
+                    self.model_hf = ORTModelForCausalLM.from_pretrained(self.fp32_model_dir, trust_remote_code=True)
+                else:
+                    self.model_hf = ORTModelForCausalLM.from_pretrained(self.model_id, export=True)
+                    self.model_hf.save_pretrained(self.fp32_model_dir)


We can't specify the opset version with ORTModelForCausalLM.from_pretrained(). To obtain a model with an opset version lower than 21, we should use the main_export() function (or a similar approach).

andrey-churkin · 2025-05-22T16:11:56Z

WC run: https://github.com/openvinotoolkit/nncf/actions/runs/15191579763

MaximProshin · 2025-05-26T09:14:08Z

To be merged after #3510

andrey-churkin · 2025-05-26T10:54:38Z

WC (Passed): https://github.com/openvinotoolkit/nncf/actions/runs/15251374658

andrey-churkin requested a review from a team as a code owner May 16, 2025 07:44

MaximProshin added the Code Freeze label May 16, 2025

github-actions bot added the NNCF ONNX and NNCF PTQ labels May 16, 2025

alexsu52 requested a review from anzr299 May 19, 2025 04:52

alexsu52 assigned AlexanderDokuchaev May 19, 2025

alexsu52 requested a review from AlexanderDokuchaev May 19, 2025 04:52

anzr299 approved these changes May 19, 2025

alexsu52 reviewed May 20, 2025

andrey-churkin force-pushed the ac/wc branch from ac95c12 to 6935344 Compare May 22, 2025 09:19

andrey-churkin commented May 22, 2025

andrey-churkin requested a review from alexsu52 May 22, 2025 15:51

AlexanderDokuchaev approved these changes May 23, 2025

Comment thread nncf/onnx/graph/model_transformer.py

Comment thread nncf/quantization/algorithms/weight_compression/onnx_backend.py

andrey-churkin force-pushed the ac/wc branch 2 times, most recently from b84ba68 to fa16f34 Compare May 26, 2025 10:10

andrey-churkin added 11 commits May 26, 2025 13:19

init commit

7c34b7b

remove test

23a2d8f

Update cspell_dict

7c6e5dd

draft

418596d

test

750aa95

update conformance_weight_compression.yml

d310f27

fix

21f8f3a

fix

e3d683f

revert

9555578

opset19

241c58d

add tests

6d25cc8

andrey-churkin added 4 commits May 26, 2025 13:19

add test

a9e9f8a

minor fix

330a25f

add times

0960cda

update

fa16f34

andrey-churkin mentioned this pull request May 26, 2025

[ONNX] LLM compression example for ONNX #3513

Merged

alexsu52 merged commit 5ae382e into openvinotoolkit:develop May 27, 2025
19 checks passed

andrey-churkin mentioned this pull request Jun 4, 2025

[release_v2170] Release notes #3524

Merged

alexsu52 merged 15 commits into openvinotoolkit:develop from andrey-churkin:ac/wc

5 participants
