Kunshang/flash attn interface #11
base: main
Conversation
Signed-off-by: Kunshang Ji <[email protected]>
* add cutlass
  Signed-off-by: Kunshang Ji <[email protected]>
* fix import
  Signed-off-by: Kunshang Ji <[email protected]>
Force-pushed from fb1b3ac to 0c93a3b
Signed-off-by: Yizhou Wang <[email protected]>
int head_size;
int max_blocks_per_seq;
int block_size;
bool is_causal;
Please add a placeholder for sink support: `s_aux: Optional[torch.Tensor] = None`.
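For illustration, a minimal sketch of how such a placeholder could also be reserved on the C++ side, assuming the kernel arguments live in a params struct like the one quoted above; the struct name ChunkPrefillParams and the field s_aux are hypothetical and not part of this PR:

#include <optional>
#include <torch/extension.h>

// Sketch only: reserve an optional slot for attention-sink support so the
// interface does not need to change once the kernel implements it.
struct ChunkPrefillParams {
  int head_size;
  int max_blocks_per_seq;
  int block_size;
  bool is_causal;
  std::optional<at::Tensor> s_aux = std::nullopt;  // placeholder, unused for now
};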
if (cuType == CutlassType::half) {
  FMHAKernel<typename chunk_policy::ShapeQK, typename chunk_policy::ShapePV,
             typename chunk_policy::ShapeOutPut, typename chunk_policy::SubgroupLayout,
             PipelineStages, cutlass::half_t,
             XE_8x16x16_F32F16F16F32_TT>::dispatch(queue, args);
qq: do we support bf16?
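For reference, a hedged sketch of how a bf16 branch could mirror the half dispatch quoted above. Whether CutlassType exposes a bfloat16 value and whether an XE_8x16x16_F32BF16BF16F32_TT MMA atom is available in this CUTLASS build are assumptions; cutlass::bfloat16_t itself is a standard CUTLASS type:

// Sketch, continuing the dispatch chain above (not part of the PR as-is):
} else if (cuType == CutlassType::bfloat16) {   // assumed enum value
  FMHAKernel<typename chunk_policy::ShapeQK, typename chunk_policy::ShapePV,
             typename chunk_policy::ShapeOutPut, typename chunk_policy::SubgroupLayout,
             PipelineStages, cutlass::bfloat16_t,
             XE_8x16x16_F32BF16BF16F32_TT>::dispatch(queue, args);  // assumed MMA atom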
} else {
  TORCH_INTERNAL_ASSERT(
      false,
      "");
Please add an error message to this assert.
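One possible shape for that, as a sketch rather than the PR's code; the message text is illustrative, and TORCH_CHECK could be used instead if this is meant as a user-facing dtype error:

} else {
  // Say which dtype was rejected instead of asserting with an empty string.
  TORCH_INTERNAL_ASSERT(
      false,
      "chunk_prefill: unsupported CutlassType (only fp16 is handled), got ",
      static_cast<int>(cuType));
}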
@@ -0,0 +1,127 @@
/***************************************************************************************************
maybe we can remove this file since the example is also removed.
is_causal);

if (return_softmax) {
  auto softmax_lse = torch::empty_like(out);
Seems it will always return an empty tensor; please add a FIXME if this is not supported for now.
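A sketch of the kind of FIXME that could go there, assuming the kernel does not actually produce the softmax LSE yet; the note about the LSE shape is an assumption based on how upstream flash-attn shapes this output:

if (return_softmax) {
  // FIXME: the kernel does not compute softmax LSE yet; this is only a
  // placeholder so the signature matches the flash-attn interface. Note the
  // real LSE shape would likely differ from `out`.
  auto softmax_lse = torch::empty_like(out);
}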
First PR for cutlass chunk_prefill