Integrate KleidiAI for MatMulNBits via MlasQNBitGemm #23627

Open
wants to merge 5 commits into
base: main

MichaelTylerArm
Contributor

Description

This PR integrates Arm® KleidiAI™ to provide optimized assembly kernels for matrix multiplication with 4-bit quantized weights. These changes target the MlasQNBitGemm functions, and can be utilized via the MatMulNBits operator.
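For readers unfamiliar with the operation being accelerated, the following is a minimal reference sketch of a matrix multiplication with block-wise 4-bit quantized weights. It is not the MLAS or KleidiAI implementation: the function names, the symmetric quantization scheme with a fixed zero point of 8, and the block size of 32 are all assumptions chosen for illustration. Optimized kernels typically work on the packed data directly rather than dequantizing a whole column first.

```python
# Illustrative sketch only; every name below is hypothetical and not part of
# ONNX Runtime, MLAS, or KleidiAI.

BLOCK = 32  # assumed number of weights sharing one scale; must be even


def quantize_block(values):
    """Map one block of floats to 4-bit codes (0..15) with a shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Assumed zero point of 8, so code 8 represents 0.0.
    codes = [min(15, max(0, round(v / scale) + 8)) for v in values]
    return scale, codes


def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, low nibble first."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed):
    """Inverse of pack_nibbles."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out


def quantize_column(col, block=BLOCK):
    """Quantize one weight column (length K) into (scale, packed bytes) blocks."""
    return [
        (scale, pack_nibbles(codes))
        for scale, codes in (
            quantize_block(col[s:s + block]) for s in range(0, len(col), block)
        )
    ]


def dequantize_column(blocks):
    """Recover approximate float weights from the quantized blocks."""
    out = []
    for scale, packed in blocks:
        out.extend((code - 8) * scale for code in unpack_nibbles(packed))
    return out


def matmul_nbits(a, quantized_cols):
    """Multiply A (M x K, list of rows) by N quantized columns of length K."""
    cols = [dequantize_column(qc) for qc in quantized_cols]
    return [[sum(x * w for x, w in zip(row, col)) for col in cols] for row in a]
```

With `block=4`, the column `[7.0, -7.0, 0.0, 3.0]` quantizes without loss under this scheme, so multiplying it by a row of ones gives the same result as the unquantized product.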

Motivation and Context

These optimized assembly kernels lead to significant performance improvements on Arm-based devices.

MichaelTylerArm requested a review from a team as a code owner February 10, 2025 10:53
@@ -1,4 +1,5 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
+# SPDX-FileCopyrightText: Copyright 2024-2025 Arm Limited and/or its affiliates <[email protected]>
Member

This file was not written mostly by Arm; it has contributions from many contributors outside of Microsoft. If every contributor added a license header like this, the file would soon become unmanageable. Please note that anyone contributing to a Microsoft open source project must agree that Microsoft has the right to re-license the change. I therefore think it is better to keep the license header unchanged.

@@ -59,4 +59,4 @@ composable_kernel;https://github.com/ROCmSoftwarePlatform/composable_kernel/arch
directx_headers;https://github.com/microsoft/DirectX-Headers/archive/refs/tags/v1.613.1.zip;47653509a3371eabb156360f42faf582f314bf2e
cudnn_frontend;https://github.com/NVIDIA/cudnn-frontend/archive/refs/tags/v1.7.0.zip;d0753d8d5b39947ca0729d7773cb84653a129eb1
dawn;https://github.com/google/dawn/archive/b9b4a37041dec3dd62ac92014a6cc1aece48d9f3.zip;e8b8c2ebabdedb7c57d931fc4a19ae22146d31e1
-kleidiai;https://gitlab.arm.com/kleidi/kleidiai/-/archive/d15722976120710080ca098fe8ddabf4556cb40f/kleidiai-d15722976120710080ca098fe8ddabf4556cb40f.zip;d6c840d00c3b05aedf06e957ddaece1013d1f40b
+kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/v1.3.0.tar.gz;58777d6907bdedb165fbca2e467a26b1363dc924
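
Each entry in this hunk appears to follow a `name;url;hash` layout, and the 40-hex-digit third field is consistent with a SHA-1 digest of the downloaded archive. The sketch below shows how such an entry could be parsed and checked. The helper names are hypothetical, and this is not the build system's actual logic.

```python
import hashlib


def parse_dep_line(line):
    """Split a 'name;url;sha1' entry into its three fields (assumed layout)."""
    name, url, sha1 = line.strip().split(";")
    return {"name": name, "url": url, "sha1": sha1}


def sha1_matches(data, expected_hex):
    """Return True if the SHA-1 digest of `data` equals the expected hex string."""
    return hashlib.sha1(data).hexdigest() == expected_hex.lower()


entry = parse_dep_line(
    "kleidiai;https://github.com/ARM-software/kleidiai/archive/refs/tags/"
    "v1.3.0.tar.gz;58777d6907bdedb165fbca2e467a26b1363dc924"
)
# After downloading entry["url"] into `archive_bytes`, one would call:
#   sha1_matches(archive_bytes, entry["sha1"])
```

Changing the URL, as this hunk does, means the hash has to change as well, since it is computed over a different archive.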

@@ -99,6 +100,10 @@ function(setup_mlas_source_for_windows)
${MLAS_SRC_DIR}/halfgemm_kernel_neon_fp16.cpp
)

+setup_kleidiai()
Member

This means KleidiAI becomes a new dependency for all ONNX Runtime build configurations. For changes like this, the ONNX Runtime team needs to hold an internal discussion with the project's leadership.

Member

snnn left a comment

Cannot move forward until the internal review is complete, since this PR adds a new dependency.

Member

snnn commented Feb 10, 2025

Please fix the iOS build errors.

2 participants