Fix FlashAttentionDecodeSplitVx indirect dispatch input ordering #27926

Merged
guschmue merged 1 commit into microsoft:main from jchen10:indirect_dispatch on Apr 15, 2026

@jchen10
Contributor

@jchen10 jchen10 commented Apr 1, 2026

Move SetIndirectDispatchTensor after all AddInput calls to ensure the
indirect buffer is the last program input.

@jchen10
Contributor Author

jchen10 commented Apr 1, 2026

The error to fix:

WebGPU device error(2): Error while parsing WGSL: :44:23 error: no matching overload for 'operator * (u32, f16)'

9 candidate operators:
 • 'operator * (T  ✓ , T  ✗ ) -> T' where:
      ✓  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (vecN<T>  ✗ , T  ✓ ) -> vecN<T>' where:
      ✓  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (T  ✓ , vecN<T>  ✗ ) -> vecN<T>' where:
      ✓  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (matNxM<T>  ✗ , T  ✓ ) -> matNxM<T>' where:
      ✓  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (T  ✗ , matNxM<T>  ✗ ) -> matNxM<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (vecN<T>  ✗ , vecN<T>  ✗ ) -> vecN<T>' where:
      ✗  'T' is 'abstract-float', 'abstract-int', 'f32', 'i32', 'u32' or 'f16'
 • 'operator * (matCxR<T>  ✗ , vecC<T>  ✗ ) -> vecR<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (vecR<T>  ✗ , matCxR<T>  ✗ ) -> vecC<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'
 • 'operator * (matKxR<T>  ✗ , matCxK<T>  ✗ ) -> matCxR<T>' where:
      ✗  'T' is 'abstract-float', 'f32' or 'f16'

  let workgroup_idx = workgroup_id.z * num_workgroups_x * num_workgroups_y + workgroup_id.y * num_workgroups_x + workgroup_id.x;
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


 - While calling [Device].CreateShaderModule([ShaderModuleDescriptor "FlashAttentionDecodeSplitVx"]).

2026-04-01 14:38:15.0850117 [E:onnxruntime:, webgpu_context.cc:101 onnxruntime::webgpu::WebGpuContext::Initialize::<lambda_1>::()::<lambda_2>::operator ()] WebGPU device error(2): Error while parsing WGSL: :44:23 error: no matching overload for 'operator * (u32, f16)'
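
For reference, the failing line flattens a three-dimensional workgroup ID into a single linear index. The caret in the log sits under `workgroup_id.z * num_workgroups_x`, and since `workgroup_id.z` is `u32`, `num_workgroups_x` must have had type `f16` in the generated shader; WGSL performs no implicit conversion between the two. Below is a minimal C++ sketch of the same arithmetic with every operand as an unsigned 32-bit integer. The function name `LinearWorkgroupIndex` and the sample grid sizes are illustrative assumptions, not code from this PR.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the shader expression
//   workgroup_id.z * num_workgroups_x * num_workgroups_y
//     + workgroup_id.y * num_workgroups_x + workgroup_id.x
// with all operands typed as unsigned 32-bit integers.
constexpr std::uint32_t LinearWorkgroupIndex(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                                             std::uint32_t num_workgroups_x,
                                             std::uint32_t num_workgroups_y) {
  return z * num_workgroups_x * num_workgroups_y + y * num_workgroups_x + x;
}

// Example: in a 4 x 3 x 2 grid, workgroup (1, 2, 1) maps to 1*4*3 + 2*4 + 1 = 21.
static_assert(LinearWorkgroupIndex(1, 2, 1, 4, 3) == 21, "row-major flattening");
```

The expression only type-checks when the grid dimensions are integers, which is why a floating-point type in that position stops shader compilation.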

@jchen10
Contributor Author

jchen10 commented Apr 1, 2026

@qjia7 @fs-eire @guschmue PTAL

Comment thread onnxruntime/contrib_ops/webgpu/bert/flash_attention.cc
@jchen10 jchen10 force-pushed the indirect_dispatch branch from 03dc351 to 1a33a05 Compare April 1, 2026 08:51
@jchen10
Contributor Author

jchen10 commented Apr 10, 2026

@guschmue PTAL, just a gentle reminder.

@guschmue
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Apr 10, 2026
@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@guschmue
Contributor

There is an unrelated error in tests running on XNNPACK, but only on macOS.
I don't see this in other PRs.
Can you merge main?

Move SetIndirectDispatchTensor after all AddInput calls to ensure the
indirect buffer is the last program input. When head_sink was added
after SetIndirectDispatchTensor, the shader variable types were
swapped, causing a u32*f16 WGSL compilation error.
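
The ordering problem described in this commit message can be illustrated with a toy model. The sketch below is not the onnxruntime API: `ToyProgram`, `TypesMatch`, `kShaderTypes`, and the input name `data_input` are hypothetical, and it assumes that the indirect buffer is appended to the same list as regular inputs at the moment `SetIndirectDispatchTensor` is called, and that the shader expects the indirect buffer in the last position.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a GPU program builder. This is NOT the onnxruntime API;
// it only models "inputs are bound in the order they are registered".
struct ToyInput {
  std::string name;
  std::string element_type;
};

class ToyProgram {
 public:
  void AddInput(std::string name, std::string element_type) {
    inputs_.push_back({std::move(name), std::move(element_type)});
  }
  // Assumption: the indirect buffer joins the same input list when this is called.
  void SetIndirectDispatchTensor(std::string name) {
    inputs_.push_back({std::move(name), "u32"});
  }
  const std::vector<ToyInput>& inputs() const { return inputs_; }

 private:
  std::vector<ToyInput> inputs_;
};

// True when each host-side input has the element type that the shader
// declares at the same position.
bool TypesMatch(const ToyProgram& program, const std::vector<std::string>& shader_types) {
  const auto& inputs = program.inputs();
  if (inputs.size() != shader_types.size()) return false;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (inputs[i].element_type != shader_types[i]) return false;
  }
  return true;
}

// Shader-side order assumed by this sketch: data inputs first, indirect buffer last.
const std::vector<std::string> kShaderTypes = {"f16", "f16", "u32"};

// Order before the fix: head_sink is registered after the indirect buffer.
ToyProgram BuildBuggyOrder() {
  ToyProgram p;
  p.AddInput("data_input", "f16");
  p.SetIndirectDispatchTensor("indirect_buffer");
  p.AddInput("head_sink", "f16");
  return p;
}

// Order after the fix: the indirect buffer is registered last.
ToyProgram BuildFixedOrder() {
  ToyProgram p;
  p.AddInput("data_input", "f16");
  p.AddInput("head_sink", "f16");
  p.SetIndirectDispatchTensor("indirect_buffer");
  return p;
}
```

In this model `TypesMatch(BuildBuggyOrder(), kShaderTypes)` is false because positions 1 and 2 carry each other's types, while the fixed order matches. That is the same kind of swap that the commit message reports as a `u32*f16` compilation error.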
@jchen10 jchen10 force-pushed the indirect_dispatch branch from 1a33a05 to b068cf8 Compare April 11, 2026 02:19
@jchen10
Contributor Author

jchen10 commented Apr 11, 2026

Rebased on main. Please help try again. Thanks!

@guschmue guschmue closed this Apr 14, 2026
@guschmue guschmue reopened this Apr 14, 2026
@guschmue
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@guschmue
Contributor

/azp run Windows ARM64 QNN CI Pipeline

@azure-pipelines

No pipelines are associated with this pull request.

@eserscor eserscor enabled auto-merge (squash) April 15, 2026 21:40
@eserscor eserscor disabled auto-merge April 15, 2026 21:40
@eserscor
Contributor

/azp run Windows ARM64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@guschmue guschmue merged commit 3bca941 into microsoft:main Apr 15, 2026
148 of 183 checks passed
Labels

ep:WebGPU ort-web webgpu provider

4 participants