[MLAS] Fix MatMulNBits CPU LUT GEMM path to apply optional bias by hariharans29 · Pull Request #28742 · microsoft/onnxruntime

hariharans29 · 2026-06-01T20:00:52Z

Description

The MatMulNBits LUT GEMM path (enabled by session config
mlas.use_lut_gemm=1) silently dropped the optional bias input,
causing incorrect outputs whenever the graph optimizer's
MatMulNBits+Add fusion produced MatMulNBits nodes with bias
populated. This is the common case for transformer models with
fused linear+bias layers.

Fuses the bias broadcast-add into the existing per-tile parallel
loop in MlasLutGemm so the addition is multi-threaded and
operates on data already hot in cache.

Also gates prefer_lut_gemm_ to T1==float since MlasLutGemm
interprets A/C as float* internally.

Adds MLAS-level (test_sqlutgemm SQLutGemmBiasTest) and op-level
(matmul_2bits_test MatMulNBitsLutGemm.*Bias) coverage for the
bias-present case across BlkLen 32/64/128, M in {1,32}, both
symmetric and asymmetric quantization.

Motivation and Context

Fix correctness issues with LUT Gemm

Copilot

Pull request overview

This PR fixes a correctness issue where the LUT GEMM path (MLAS MlasLutGemm) was silently ignoring the optional bias input, by adding a fused bias-add in the MLAS kernel, wiring the MatMulNBits CPU kernel to pass bias through, and adding targeted regression tests.

Changes:

Extend MlasLutGemm to accept an optional Bias[N] pointer and fuse bias broadcast-add into the per-tile compute loop.
Update MatMulNBits LUT path to forward bias to MlasLutGemm.
Add MLAS-kernel-level and CPU-EP operator-level tests validating correct bias broadcast behavior across multiple shapes/configurations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
onnxruntime/test/mlas/unittest/test_sqlutgemm.cpp	Adds MLAS unit tests that compare fused-bias results against a reference GEMM + broadcast-add.
onnxruntime/test/contrib_ops/matmul_2bits_test.cc	Adds MatMulNBits LUT-path regression tests that include a non-trivial bias vector and validate broadcast-add.
onnxruntime/core/mlas/lib/qlutgemm.cpp	Implements fused bias broadcast-add in the LUT GEMM kernel after each output tile is computed.
onnxruntime/core/mlas/inc/mlas_qnbit.h	Extends the public `MlasLutGemm` API signature/docs to include an optional bias pointer.
onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc	Passes bias to `MlasLutGemm` from the MatMulNBits LUT path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions

You can commit the suggested changes from lintrunner.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

tianleiwu

Review

This correctly fixes the bug where the MatMulNBits LUT GEMM path (mlas.use_lut_gemm=1) silently dropped the optional bias input — the common case for the MatMulNBits+Add fusion. The fix threads bias into MlasLutGemm and broadcast-adds it inside the existing per-tile parallel loop, and additionally gates prefer_lut_gemm_ on T1 == float so the fp32-only kernel is never reached for MLFloat16.

Correctness — verified:

Output C is [M, N] row-major (dst_offset = OutputRows * ine11 + ichunk0 * ChunkSize0, OutputRows == N). The bias slice Bias + ichunk0 * ChunkSize0 of length tile_n = ir0_end - ir0_start maps exactly to the output-feature range just written by ComputeGemm, and stays within [0, N).
ChunkSize0 = N / n_tiles_num divides N exactly, so every tile is full and Bias[N] is never over-read.
Each chunk writes a disjoint output region, so the fused bias add is race-free.

Performance: bias add is fused into the hot loop (cache-resident) and inherits the existing per-chunk parallelism with no extra sync or allocation.

Tests: strong coverage at both MLAS (SQLutGemmBiasTest) and op level (MatMulNBitsLutGemm.*Bias) across BlkLen 32/64/128, M in {1,32}, symmetric/asymmetric. The n%7/n%11-varying bias vector would catch any stride/transpose bug.

Note: the earlier automated comment about MatMulNBits<MLFloat16> reaching the LUT path is already addressed by the std::is_same_v<T1, float> gating in this PR.

One minor (optional) observation inline.

MLAS commits

16dc03b

hariharans29 requested a review from Copilot June 1, 2026 20:00

Copilot started reviewing on behalf of hariharans29 June 1, 2026 20:01 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc

Copilot comment

2128666

hariharans29 changed the title ~~WIP: [MLAS] Support bias in LUT Gemm~~ WIP: [MLAS] Fix MatMulNBits CPU LUT GEMM path to apply optional bias Jun 1, 2026

Add benchmark

86cef51

github-actions Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread onnxruntime/test/mlas/bench/bench_lutgemm.cpp Outdated

Update onnxruntime/test/mlas/bench/bench_lutgemm.cpp

fd1c8d7

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

hariharans29 requested a review from tianleiwu June 1, 2026 22:23

hariharans29 changed the title ~~WIP: [MLAS] Fix MatMulNBits CPU LUT GEMM path to apply optional bias~~ [MLAS] Fix MatMulNBits CPU LUT GEMM path to apply optional bias Jun 1, 2026

tianleiwu reviewed Jun 2, 2026

View reviewed changes

Comment thread onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc

xadupre approved these changes Jun 2, 2026

View reviewed changes

hariharans29 merged commit 1077c77 into main Jun 2, 2026
88 of 90 checks passed

hariharans29 deleted the hari/mlas_lut_fix_1 branch June 2, 2026 16:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MLAS] Fix MatMulNBits CPU LUT GEMM path to apply optional bias#28742

[MLAS] Fix MatMulNBits CPU LUT GEMM path to apply optional bias#28742
hariharans29 merged 4 commits into
mainfrom
hari/mlas_lut_fix_1

hariharans29 commented Jun 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

tianleiwu left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

hariharans29 commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tianleiwu left a comment

Choose a reason for hiding this comment

Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hariharans29 commented Jun 1, 2026 •

edited

Loading