Kunshang/flash attn interface #11
base: main
Conversation
Signed-off-by: Kunshang Ji <[email protected]>
* add cutlass
  Signed-off-by: Kunshang Ji <[email protected]>
* fix import
  Signed-off-by: Kunshang Ji <[email protected]>
Force-pushed from fb1b3ac to 0c93a3b
Signed-off-by: Yizhou Wang <[email protected]>
int head_size;
int max_blocks_per_seq;
int block_size;
bool is_causal;
Please add a placeholder for sink support: `s_aux: Optional[torch.Tensor] = None`.
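For illustration, a minimal sketch of how such a placeholder could also be reserved on the C++ side, assuming the kernel arguments live in a params struct like the one quoted above; the struct name ChunkPrefillParams and the field s_aux are hypothetical and not part of this PR:

#include <optional>
#include <torch/extension.h>

// Sketch only: reserve an optional slot for attention-sink support so the
// interface does not need to change once the kernel implements it.
struct ChunkPrefillParams {
  int head_size;
  int max_blocks_per_seq;
  int block_size;
  bool is_causal;
  std::optional<at::Tensor> s_aux = std::nullopt;  // placeholder, unused for now
};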
if (cuType == CutlassType::half) {
  FMHAKernel<typename chunk_policy::ShapeQK, typename chunk_policy::ShapePV,
             typename chunk_policy::ShapeOutPut, typename chunk_policy::SubgroupLayout,
             PipelineStages, cutlass::half_t,
             XE_8x16x16_F32F16F16F32_TT>::dispatch(queue, args);
qq: do we support bf16?
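For reference, a hedged sketch of how a bf16 branch could mirror the half dispatch quoted above. Whether CutlassType exposes a bfloat16 value and whether an XE_8x16x16_F32BF16BF16F32_TT MMA atom is available in this CUTLASS build are assumptions; cutlass::bfloat16_t itself is a standard CUTLASS type:

// Sketch, continuing the dispatch chain above (not part of the PR as-is):
} else if (cuType == CutlassType::bfloat16) {   // assumed enum value
  FMHAKernel<typename chunk_policy::ShapeQK, typename chunk_policy::ShapePV,
             typename chunk_policy::ShapeOutPut, typename chunk_policy::SubgroupLayout,
             PipelineStages, cutlass::bfloat16_t,
             XE_8x16x16_F32BF16BF16F32_TT>::dispatch(queue, args);  // assumed MMA atom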
} else {
  TORCH_INTERNAL_ASSERT(
      false,
      "");
Please add an error message to this assert.
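One possible shape for that, as a sketch rather than the PR's code; the message text is illustrative, and TORCH_CHECK could be used instead if this is meant as a user-facing dtype error:

} else {
  // Say which dtype was rejected instead of asserting with an empty string.
  TORCH_INTERNAL_ASSERT(
      false,
      "chunk_prefill: unsupported CutlassType (only fp16 is handled), got ",
      static_cast<int>(cuType));
}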
@@ -0,0 +1,127 @@
/***************************************************************************************************
maybe we can remove this file since the example is also removed.
is_causal);

if (return_softmax) {
  auto softmax_lse = torch::empty_like(out);
Seems it will always return an empty tensor; please add a FIXME if this is not supported for now.
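A sketch of the kind of FIXME that could go there, assuming the kernel does not actually produce the softmax LSE yet; the note about the LSE shape is an assumption based on how upstream flash-attn shapes this output:

if (return_softmax) {
  // FIXME: the kernel does not compute softmax LSE yet; this is only a
  // placeholder so the signature matches the flash-attn interface. Note the
  // real LSE shape would likely differ from `out`.
  auto softmax_lse = torch::empty_like(out);
}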
First PR for cutlass chunk_prefill