Integrate KleidiAI for MatMulNBits via MlasQNBitGemm #23627
base: main
Signed-off-by: Michael Tyler <[email protected]>
@@ -1,4 +1,5 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
+# SPDX-FileCopyrightText: Copyright 2024-2025 Arm Limited and/or its affiliates <[email protected]>
This file was not written mostly by Arm; it has contributions from many contributors outside Microsoft. If every contributor adds such a license header here, it will soon become unmanageable. Note that anyone who contributes to a Microsoft open-source project must agree that Microsoft has the right to relicense the change. Therefore, I think it is better to keep the license header unchanged.
@@ -59,4 +59,4 @@ composable_kernel;https://github.com/ROCmSoftwarePlatform/composable_kernel/arch
directx_headers;https://github.com/microsoft/DirectX-Headers/archive/refs/tags/v1.613.1.zip;47653509a3371eabb156360f42faf582f314bf2e
cudnn_frontend;https://github.com/NVIDIA/cudnn-frontend/archive/refs/tags/v1.7.0.zip;d0753d8d5b39947ca0729d7773cb84653a129eb1
dawn;https://github.com/google/dawn/archive/b9b4a37041dec3dd62ac92014a6cc1aece48d9f3.zip;e8b8c2ebabdedb7c57d931fc4a19ae22146d31e1
-kleidiai;https://gitlab.arm.com/kleidi/kleidiai/-/archive/d15722976120710080ca098fe8ddabf4556cb40f/kleidiai-d15722976120710080ca098fe8ddabf4556cb40f.zip;d6c840d00c3b05aedf06e957ddaece1013d1f40b
+kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.3.0.tar.gz;58777d6907bdedb165fbca2e467a26b1363dc924
Please update https://github.com/microsoft/onnxruntime/blob/main/ThirdPartyNotices.txt as well.
@@ -99,6 +100,10 @@ function(setup_mlas_source_for_windows)
${MLAS_SRC_DIR}/halfgemm_kernel_neon_fp16.cpp
)

setup_kleidiai()
This makes KleidiAI a new dependency for all ONNX Runtime build configurations. For such a change, the ONNX Runtime team needs to hold an internal discussion with the project's leadership.
This PR cannot move forward until the internal review is complete, since it adds a new dependency.
Please fix the iOS build errors.
Description
This PR integrates Arm® KleidiAI™ to provide optimized assembly kernels for matrix multiplication with 4-bit quantized weights. The changes target the MlasQNBitGemm functions and can be used through the MatMulNBits operator.
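For reviewers less familiar with 4-bit weight quantization, the computation these kernels accelerate is roughly C = A × dequant(B), where B is quantized in blocks along K and each block carries its own scale and zero point. The scalar C++ sketch below illustrates that idea under stated assumptions. It uses a hypothetical, simplified layout: each column of B is split into blocks of `blk_len` rows, each block has one float scale and one 4-bit zero point, and two codes are packed per byte with the low nibble first. The names `QuantizedB`, `quantize_b`, and `matmul_q4` are illustrative only; they are not the MLAS, KleidiAI, or ONNX Runtime API, and this is not their packed format.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified 4-bit blockwise layout (illustration only; NOT the
// layout used by MLAS, KleidiAI, or MatMulNBits). B is logically K x N.
struct QuantizedB {
  int K = 0, N = 0, blk_len = 0;
  std::vector<uint8_t> packed;       // column n uses (K + 1) / 2 bytes, low nibble first
  std::vector<float> scales;         // N * blocks_per_col()
  std::vector<uint8_t> zero_points;  // N * blocks_per_col(), each in [0, 15]

  int blocks_per_col() const { return (K + blk_len - 1) / blk_len; }
  int bytes_per_col() const { return (K + 1) / 2; }

  // Returns the 4-bit code stored for element (k, n).
  uint8_t code(int k, int n) const {
    uint8_t byte = packed[n * bytes_per_col() + k / 2];
    return static_cast<uint8_t>((k % 2 == 0) ? (byte & 0x0F) : (byte >> 4));
  }
  // Dequantizes element (k, n) as (code - zero_point) * scale.
  float dequant(int k, int n) const {
    int blk = n * blocks_per_col() + k / blk_len;
    return (static_cast<int>(code(k, n)) - zero_points[blk]) * scales[blk];
  }
};

// Asymmetric min/max quantization of a row-major K x N float matrix.
QuantizedB quantize_b(const std::vector<float>& b, int K, int N, int blk_len) {
  assert(K > 0 && N > 0 && blk_len > 0);
  assert(b.size() == static_cast<std::size_t>(K) * N);
  QuantizedB q;
  q.K = K;
  q.N = N;
  q.blk_len = blk_len;
  q.packed.assign(static_cast<std::size_t>(N) * q.bytes_per_col(), 0);
  q.scales.resize(static_cast<std::size_t>(N) * q.blocks_per_col());
  q.zero_points.resize(q.scales.size());
  for (int n = 0; n < N; ++n) {
    for (int blk = 0; blk < q.blocks_per_col(); ++blk) {
      int k0 = blk * blk_len, k1 = std::min(K, k0 + blk_len);
      // Widen the range to include 0 so that zero is exactly representable.
      float lo = 0.0f, hi = 0.0f;
      for (int k = k0; k < k1; ++k) {
        lo = std::min(lo, b[k * N + n]);
        hi = std::max(hi, b[k * N + n]);
      }
      float scale = (hi > lo) ? (hi - lo) / 15.0f : 1.0f;
      int zp = std::clamp(static_cast<int>(std::lround(-lo / scale)), 0, 15);
      std::size_t idx = static_cast<std::size_t>(n) * q.blocks_per_col() + blk;
      q.scales[idx] = scale;
      q.zero_points[idx] = static_cast<uint8_t>(zp);
      for (int k = k0; k < k1; ++k) {
        int v = static_cast<int>(std::lround(b[k * N + n] / scale)) + zp;
        uint8_t c = static_cast<uint8_t>(std::clamp(v, 0, 15));
        uint8_t& byte = q.packed[n * q.bytes_per_col() + k / 2];
        byte |= (k % 2 == 0) ? c : static_cast<uint8_t>(c << 4);
      }
    }
  }
  return q;
}

// Scalar reference: C = A * dequant(B); A is row-major M x K, C is row-major M x N.
std::vector<float> matmul_q4(const std::vector<float>& a, int M, const QuantizedB& q) {
  assert(a.size() == static_cast<std::size_t>(M) * q.K);
  std::vector<float> c(static_cast<std::size_t>(M) * q.N, 0.0f);
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < q.N; ++n) {
      float acc = 0.0f;
      for (int k = 0; k < q.K; ++k) acc += a[m * q.K + k] * q.dequant(k, n);
      c[m * q.N + n] = acc;
    }
  }
  return c;
}
```

A scalar loop like this is mainly useful as a correctness reference: an optimized kernel's output can be compared against it on the same inputs, within a tolerance of roughly half a quantization step per weight.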
Motivation and Context
These optimized assembly kernels lead to significant performance improvements on Arm-based devices.