
[ONNX] Weight compression for opset < 21 #3497

Merged: alexsu52 merged 15 commits into openvinotoolkit:develop from andrey-churkin:ac/wc on May 27, 2025

Conversation

@andrey-churkin (Contributor)

Changes

For models with opset < 21, replace each weight-compressed MatMul operation with the MatMulNBits operation from the ONNX Runtime contrib operators (com.microsoft domain).
See https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md
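As a rough guide to what MatMulNBits expects, the sketch below computes the shapes of its packed weight, scale, and packed zero-point inputs from the K, N, bits, and block_size attributes. The formulas follow our reading of the ONNX Runtime contrib-operator documentation linked above and may differ between ONNX Runtime versions; the helper name matmul_nbits_input_shapes is hypothetical and is not part of NNCF or ONNX Runtime.

```python
def matmul_nbits_input_shapes(K: int, N: int, bits: int, block_size: int) -> dict:
    """Hypothetical helper: expected input shapes of com.microsoft::MatMulNBits.

    K is the reduction (input-feature) dimension, N the number of output features.
    Shapes reflect our reading of the ONNX Runtime docs; verify for your version.
    """
    # The docs require block_size to be a power of two and not smaller than 16.
    if block_size < 16 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of 2 and >= 16")
    # Number of quantization blocks along K (ceiling division).
    k_blocks = (K + block_size - 1) // block_size
    # Bytes needed to store one block of `block_size` values at `bits` bits each.
    blob_size = (block_size * bits + 7) // 8
    # Packed uint8 zero points are padded to a whole byte per output row.
    zp_bytes_per_row = (k_blocks * bits + 7) // 8
    return {
        "B": (N, k_blocks, blob_size),           # packed uint8 weights
        "scales": (N * k_blocks,),               # one scale per (output row, block)
        "zero_points": (N * zp_bytes_per_row,),  # optional packed uint8 zero points
    }
```

For a 4096x4096 weight with 4-bit, block_size=32 quantization this gives B of shape (4096, 128, 16) and 524288 scales.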

Reason for changes

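The DequantizeLinear operation supports the block_size attribute only starting from opset 21, so in earlier opsets blockwise (group-wise) weight dequantization cannot be expressed with a single DequantizeLinear node.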
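For context, blockwise dequantization computes y = (q - zero_point) * scale, where each scale and zero point is shared by block_size consecutive elements along one axis; this is what DequantizeLinear expresses with block_size from opset 21 on. The NumPy sketch below is a reference written for illustration only; the function name blockwise_dequantize is hypothetical.

```python
import numpy as np

def blockwise_dequantize(q: np.ndarray, scale: np.ndarray, zero_point: np.ndarray,
                         block_size: int, axis: int = 0) -> np.ndarray:
    """Reference blockwise dequantization: y = (q - zero_point) * scale.

    `scale` and `zero_point` match the shape of `q` except along `axis`,
    where their length is ceil(q.shape[axis] / block_size).
    """
    n = q.shape[axis]
    expected_blocks = -(-n // block_size)  # ceiling division
    if scale.shape[axis] != expected_blocks or zero_point.shape[axis] != expected_blocks:
        raise ValueError("scale/zero_point must have one entry per block along axis")
    # Expand every per-block parameter to block_size elements, then trim the
    # (possibly partial) last block back to the length of q along axis.
    idx = np.arange(n)
    scale_full = np.take(np.repeat(scale, block_size, axis=axis), idx, axis=axis)
    zp_full = np.take(np.repeat(zero_point, block_size, axis=axis), idx, axis=axis)
    # Cast before subtracting so unsigned inputs cannot wrap around.
    return (q.astype(np.float32) - zp_full.astype(np.float32)) * scale_full.astype(np.float32)
```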

Related tickets

Ref: 163946

Tests

@andrey-churkin requested a review from a team as a code owner May 16, 2025 07:44
@github-actions (bot) added the NNCF ONNX and NNCF PTQ labels May 16, 2025
@alexsu52 requested a review from anzr299 May 19, 2025 04:52

@alexsu52 (Contributor) left a comment

Please add a unit test for MatMulNBits + pack_int4_to_uint8.
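A unit test of the kind requested here typically checks that 4-bit packing is lossless. The sketch below uses a hypothetical stand-in for pack_int4_to_uint8 (NNCF's real function and signature may differ): it packs unsigned 4-bit values two per byte with the first value in the low nibble, which is the order we understand ONNX uses for 4-bit packing, and adds the matching unpack plus a round-trip test.

```python
import numpy as np

def pack_uint4_to_uint8(values) -> np.ndarray:
    """Hypothetical stand-in for NNCF's pack_int4_to_uint8 (unsigned 4-bit values only)."""
    arr = np.asarray(values).reshape(-1)
    if arr.size and (arr.min() < 0 or arr.max() > 15):
        raise ValueError("values must be in the range [0, 15]")
    flat = arr.astype(np.uint8)
    if flat.size % 2:
        # Pad an odd-length input with a zero nibble so every byte holds two values.
        flat = np.concatenate([flat, np.zeros(1, dtype=np.uint8)])
    low, high = flat[0::2], flat[1::2]
    # First value of each pair goes into the low nibble, second into the high nibble.
    return (low | (high << 4)).astype(np.uint8)

def unpack_uint8_to_uint4(packed, count: int) -> np.ndarray:
    """Inverse of pack_uint4_to_uint8; `count` drops the padding nibble, if any."""
    packed = np.asarray(packed, dtype=np.uint8)
    pairs = np.stack([packed & 0x0F, packed >> 4], axis=-1)
    return pairs.reshape(-1)[:count]

def test_pack_unpack_roundtrip():
    rng = np.random.default_rng(0)
    values = rng.integers(0, 16, size=37, dtype=np.uint8)  # odd length exercises padding
    packed = pack_uint4_to_uint8(values)
    assert packed.shape == (19,)
    assert np.array_equal(unpack_uint8_to_uint4(packed, values.size), values)
```

A companion MatMulNBits test could then compare the ONNX Runtime output against a dequantize-then-MatMul reference computed in NumPy.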

@@ -0,0 +1,179 @@
,prompts,answers,language

Contributor Author

@alexsu52 The req_qa_onnx.csv file was obtained using the following "gold" model:

from optimum.onnxruntime import ORTModelForCausalLM

model_gold = ORTModelForCausalLM.from_pretrained(
    self.fp32_model_dir,
    trust_remote_code=True,
    provider="OpenVINOExecutionProvider",
)

It is identical to ref_qa.csv, which was generated using the model below:

from optimum.intel import OVModelForCausalLM

model_gold = OVModelForCausalLM.from_pretrained(
    self.fp32_model_dir,
    trust_remote_code=True,
    load_in_8bit=False,
    compile=False,
    stateful=is_stateful,
    ov_config={"KV_CACHE_PRECISION": "f16"},
)

So, if you don't mind, I suggest using only the ref_qa.csv file.

Contributor

ok

Comment on lines +169 to +176
if not (self.fp32_model_dir / self.ONNX_MODEL_NAME).exists():
    opset_version = self.params.get("opset", None)
    if opset_version:
        main_export(self.model_id, self.fp32_model_dir, opset=opset_version)
        self.model_hf = ORTModelForCausalLM.from_pretrained(self.fp32_model_dir, trust_remote_code=True)
    else:
        self.model_hf = ORTModelForCausalLM.from_pretrained(self.model_id, export=True)
        self.model_hf.save_pretrained(self.fp32_model_dir)

Contributor Author

ORTModelForCausalLM.from_pretrained() does not let us specify the opset version. To obtain a model with an opset version lower than 21, we need to use the main_export() function (or a similar approach).
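Once a model has been exported with a lower opset, the weight-compression backend has to pick a representation for the compressed weights. The sketch below illustrates that choice under the assumption, suggested by this PR's title and description, that opset 21 and newer use DequantizeLinear with block_size while older opsets fall back to MatMulNBits. The helper names are hypothetical; the opset entries only need domain and version fields, like the entries of an onnx ModelProto.opset_import, so the example uses SimpleNamespace stand-ins instead of a real model.

```python
from types import SimpleNamespace

# DequantizeLinear accepts the block_size attribute starting from this opset.
BLOCKWISE_DEQUANTIZE_LINEAR_OPSET = 21

def default_domain_opset(opset_imports) -> int:
    """Return the opset version of the default ONNX domain ("" or "ai.onnx")."""
    for entry in opset_imports:
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    raise ValueError("no default-domain opset import found")

def weight_dequantization_op(opset_imports) -> str:
    """Hypothetical dispatch: which operation represents blockwise-compressed weights."""
    if default_domain_opset(opset_imports) >= BLOCKWISE_DEQUANTIZE_LINEAR_OPSET:
        return "DequantizeLinear"       # standard op with the block_size attribute
    return "com.microsoft.MatMulNBits"  # ONNX Runtime contrib operator

# Stand-ins for the opset_import entries of a model exported with opset 17.
opsets = [SimpleNamespace(domain="", version=17), SimpleNamespace(domain="com.microsoft", version=1)]
print(weight_dequantization_op(opsets))  # com.microsoft.MatMulNBits
```

With a real model, the same helper could be called as weight_dequantization_op(onnx.load(path).opset_import).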

@andrey-churkin requested a review from alexsu52 May 22, 2025 15:51
@andrey-churkin (Contributor Author)

Comment thread nncf/onnx/graph/model_transformer.py
Comment thread nncf/quantization/algorithms/weight_compression/onnx_backend.py
@MaximProshin (Collaborator)

To be merged after #3510

@andrey-churkin force-pushed the ac/wc branch 2 times, most recently from b84ba68 to fa16f34 May 26, 2025 10:10
@andrey-churkin (Contributor Author)

WC (Passed): https://github.com/openvinotoolkit/nncf/actions/runs/15251374658

@alexsu52 merged commit 5ae382e into openvinotoolkit:develop May 27, 2025
19 checks passed

Labels: Code Freeze, NNCF ONNX, NNCF PTQ


5 participants