WIP: DP4AMatMul fix matmul for subgroup size 64 GPUs #23637

Merged
guschmue merged 1 commit into main from user/sushraja/fix_dp4_quantization on Feb 13, 2025

sushraja-msft
Contributor

Description

This change moves quantization away from subgroup ops. On AMD GPUs the subgroup size is 64, which our quantization function does not handle, so it produces garbage output. Supporting subgroup size 64 would require changing the workgroup size, and supporting subgroup size 128 on top of that would be harder still.
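
To illustrate why a subgroup-based reduction can depend on the hardware subgroup size, here is a toy CPU simulation in Python. It is not the shader from this PR. The block size of 32, the `abs_max / 127` scale formula, and all function names are assumptions made for illustration, and the failure mode shown (a reduction spanning two quantization blocks) is only one plausible way a size mismatch can corrupt output.

```python
from typing import List, Tuple

BLOCK = 32  # assumed quantization block size (hypothetical, for illustration)


def subgroup_max(values: List[float], subgroup_size: int) -> List[float]:
    """Simulate a subgroup-wide max: every lane receives the max of its subgroup."""
    out: List[float] = []
    for start in range(0, len(values), subgroup_size):
        group = values[start:start + subgroup_size]
        out.extend([max(group)] * len(group))
    return out


def scale_for(abs_max: float) -> float:
    """Map the largest magnitude onto the int8 range; avoid dividing by zero."""
    return abs_max / 127.0 if abs_max > 0 else 1.0


def quantize_with_subgroup(values: List[float], subgroup_size: int) -> Tuple[List[int], List[float]]:
    """Take each lane's scale from a subgroup reduction.

    This matches the per-block result only when subgroup_size == BLOCK.
    """
    abs_max = subgroup_max([abs(v) for v in values], subgroup_size)
    scales = [scale_for(m) for m in abs_max]
    return [round(v / s) for v, s in zip(values, scales)], scales


def quantize_per_block(values: List[float]) -> Tuple[List[int], List[float]]:
    """Reduce over each block explicitly, independent of any subgroup size."""
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        s = scale_for(max(abs(v) for v in block))
        quantized.extend(round(v / s) for v in block)
        scales.extend([s] * len(block))
    return quantized, scales


if __name__ == "__main__":
    # Two blocks with very different magnitudes.
    data = [0.5, -1.0] * 16 + [50.0, -100.0] * 16
    reference, _ = quantize_per_block(data)
    for size in (32, 64):
        result, _ = quantize_with_subgroup(data, size)
        print(f"subgroup size {size}: matches per-block result = {result == reference}")
```

In this sketch, a subgroup size of 64 lets the large second block set the scale for the small first block, so the first block collapses to values near zero. The explicit per-block loop gives the same answer regardless of subgroup size, which is the property the change above is after.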

With the new implementation, performance on Intel ALD remains about the same: 4.36s for 1000K prefill.

Tests for this change are here:
https://github.com/microsoft/onnxruntime/blob/e66650350b85cb5e3a408f6576fe6a7f4f4ddebc/onnxruntime/test/contrib_ops/matmul_4bits_test.cc

However, to trigger the current issue they must be run on a GPU with subgroup size 64.

@sushraja-msft
Contributor Author

@qjia7 - FYI, I am not able to add you as a reviewer but wanted to share for awareness.

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Feb 11, 2025
@sushraja-msft sushraja-msft changed the title DP4AMatMul fix matmul for subgoup size 64 GPUs WIP: DP4AMatMul fix matmul for subgoup size 64 GPUs Feb 12, 2025
@sushraja-msft sushraja-msft force-pushed the user/sushraja/fix_dp4_quantization branch 2 times, most recently from 1ab5a6a to 1decc48 Compare February 12, 2025 20:59
@sushraja-msft sushraja-msft force-pushed the user/sushraja/fix_dp4_quantization branch from 1decc48 to 4f473cb Compare February 12, 2025 22:11
@guschmue guschmue merged commit 4e24d37 into main Feb 13, 2025
96 of 98 checks passed
@guschmue guschmue deleted the user/sushraja/fix_dp4_quantization branch February 13, 2025 21:24