Fix FlashAttentionDecodeSplitVx indirect dispatch input ordering by jchen10 · Pull Request #27926 · microsoft/onnxruntime

jchen10 · 2026-04-01T08:11:35Z

Move SetIndirectDispatchTensor after all AddInput calls to ensure the
indirect buffer is the last program input.

jchen10 · 2026-04-01T08:12:47Z

The error to fix:

WebGPU device error(2): Error while parsing WGSL: :44:23 error: no matching overload for 'operator * (u32, f16)'

9 candidate operators:
 • 'operator * (T  ✓ , T  ✗ ) -> T' where:
      ✓  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (vecN<T>  ✗ , T  ✓ ) -> vecN<T>' where:
      ✓  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (T  ✓ , vecN<T>  ✗ ) -> vecN<T>' where:
      ✓  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (matNxM<T>  ✗ , T  ✓ ) -> matNxM<T>' where:
      ✓  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (T  ✗ , matNxM<T>  ✗ ) -> matNxM<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (vecN<T>  ✗ , vecN<T>  ✗ ) -> vecN<T>' where:
      ✗  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (matCxR<T>  ✗ , vecC<T>  ✗ ) -> vecR<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (vecR<T>  ✗ , matCxR<T>  ✗ ) -> vecC<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (matKxR<T>  ✗ , matCxK<T>  ✗ ) -> matCxR<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'

  let workgroup_idx = workgroup_id.z * num_workgroups_x * num_workgroups_y + workgroup_id.y * num_workgroups_x + workgroup_id.x;
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


 - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "FlashAttentionDecodeSplitVx"]).

2026-04-01 14:38:15.0850117 [E:onnxruntime:, webgpu_context.cc:101 onnxruntime::webgpu::WebGpuContext::Initialize::<lambda_1>::()::<lambda_2>::operator ()] WebGPU device error(2): Error while parsing WGSL: :44:23 error: no matching overload for 'operator * (u32, f16)'
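
The expression flagged in the log linearizes a 3D workgroup ID into a single index. WGSL performs no implicit conversion between `u32` and `f16`, so every operand has to be an unsigned integer for the expression to compile. A minimal C++ sketch of the same arithmetic, with every operand typed as a 32-bit unsigned integer (the function name `LinearWorkgroupIndex` is hypothetical and is not part of ONNX Runtime):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the WGSL expression from the error message:
//   workgroup_id.z * num_workgroups_x * num_workgroups_y
//     + workgroup_id.y * num_workgroups_x + workgroup_id.x
// All operands share one integer type, which is what the shader needs too.
std::uint32_t LinearWorkgroupIndex(std::uint32_t id_x, std::uint32_t id_y, std::uint32_t id_z,
                                   std::uint32_t num_workgroups_x,
                                   std::uint32_t num_workgroups_y) {
  return id_z * num_workgroups_x * num_workgroups_y + id_y * num_workgroups_x + id_x;
}
```

For a 4 x 5 grid in x and y, the workgroup at (1, 2, 3) maps to 3 * 4 * 5 + 2 * 4 + 1.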

jchen10 · 2026-04-01T08:28:56Z

@qjia7 @fs-eire @guschmue PTAL

jchen10 · 2026-04-10T06:43:52Z

@guschmue PTAL, just a gentle reminder.

guschmue · 2026-04-10T15:19:26Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2026-04-10T15:19:48Z

Azure Pipelines successfully started running 4 pipeline(s).

guschmue · 2026-04-10T16:49:13Z

Some unrelated error with tests running on XNNPACK, but only on macOS.
I don't see this in other PRs.
Can you merge main?

Move SetIndirectDispatchTensor after all AddInput calls to ensure the indirect buffer is the last program input. When head_sink was added after SetIndirectDispatchTensor, the shader variable types were swapped, causing a u32*f16 WGSL compilation error.
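
The ordering problem can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `MockProgram`, `BindShaderVariables`, and `IndirectBufferType` are hypothetical names, not the ONNX Runtime WebGPU API, and the model assumes that the indirect buffer is appended to the input list at the moment `SetIndirectDispatchTensor` is called while the shader declares it as its last variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU program description; NOT the ONNX Runtime API.
struct MockProgram {
  std::vector<std::string> input_types;  // element types in registration order

  void AddInput(const std::string& element_type) { input_types.push_back(element_type); }

  // Assumption: the indirect buffer becomes one more u32 input at call time.
  void SetIndirectDispatchTensor() { input_types.push_back("u32"); }
};

using Binding = std::pair<std::string, std::string>;  // (variable name, element type)

// The shader lists its variables in a fixed order and takes each variable's
// type from the program input registered at the same position.
std::vector<Binding> BindShaderVariables(const MockProgram& program,
                                         const std::vector<std::string>& variable_names) {
  std::vector<Binding> bound;
  for (std::size_t i = 0; i < variable_names.size() && i < program.input_types.size(); ++i) {
    bound.emplace_back(variable_names[i], program.input_types[i]);
  }
  return bound;
}

// Returns the type the shader ends up with for its indirect-buffer variable.
std::string IndirectBufferType(bool set_indirect_last) {
  MockProgram program;
  program.AddInput("f16");                                      // some regular input
  if (!set_indirect_last) program.SetIndirectDispatchTensor();  // old order
  program.AddInput("f16");                                      // head_sink
  if (set_indirect_last) program.SetIndirectDispatchTensor();   // fixed order

  // The shader always declares the indirect buffer as its last variable.
  const std::vector<Binding> bound =
      BindShaderVariables(program, {"data", "head_sink", "indirect_buffer"});
  for (const Binding& entry : bound) {
    if (entry.first == "indirect_buffer") return entry.second;
  }
  return "";
}
```

In this model, `IndirectBufferType(false)` yields `f16` because `head_sink` took the last slot, while `IndirectBufferType(true)` yields `u32`. Registering the indirect buffer after every `AddInput` call keeps the program's input order aligned with the shader's declaration order.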

jchen10 · 2026-04-11T02:25:57Z

Rebased on main. Please help try again. Thanks!

guschmue · 2026-04-15T15:45:48Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2026-04-15T15:46:05Z

Azure Pipelines successfully started running 3 pipeline(s).

guschmue · 2026-04-15T17:54:27Z

/azp run Windows ARM64 QNN CI Pipeline

azure-pipelines · 2026-04-15T17:54:35Z

No pipelines are associated with this pull request.

eserscor · 2026-04-15T21:43:04Z

/azp run Windows ARM64 QNN CI Pipeline

azure-pipelines · 2026-04-15T21:43:14Z

Azure Pipelines successfully started running 1 pipeline(s).

qjia7 reviewed Apr 1, 2026

Comment thread onnxruntime/contrib_ops/webgpu/bert/flash_attention.cc

jchen10 force-pushed the indirect_dispatch branch from 03dc351 to 1a33a05 on April 1, 2026 08:51

qjia7 approved these changes Apr 1, 2026

guschmue added the ep:WebGPU ort-web webgpu provider label Apr 10, 2026

jchen10 force-pushed the indirect_dispatch branch from 1a33a05 to b068cf8 on April 11, 2026 02:19

guschmue approved these changes Apr 13, 2026

guschmue closed this Apr 14, 2026

guschmue reopened this Apr 14, 2026

eserscor enabled auto-merge (squash) April 15, 2026 21:40

eserscor disabled auto-merge April 15, 2026 21:40

guschmue merged commit 3bca941 into microsoft:main Apr 15, 2026
148 of 183 checks passed


4 participants