Fix CUDA memory access violation in GQA FlashAttention (microsoft#24447)
### Description
The `zeros_` memory buffer was left uninitialized, but it must be
initialized to zero before it is used.
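
As an illustration of this class of bug, here is a minimal host-side C++ sketch. It is not the ONNX Runtime code: `MakeZeroedScratch` and `SumAtOffsets` are hypothetical helpers, and the assumption that the buffer's entries are consumed as offsets is made only for the example. For a device allocation, the analogous zero fill would be a call such as `cudaMemset(ptr, 0, count * sizeof(int))`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper (not ONNX Runtime code): allocate a scratch buffer of
// `count` ints and zero-fill it before anything reads from it.
std::unique_ptr<int[]> MakeZeroedScratch(std::size_t count) {
  // `new int[count]` leaves the elements with indeterminate values, much like
  // a freshly allocated device scratch buffer.
  std::unique_ptr<int[]> buf(new int[count]);
  // Explicit zero fill. On a device pointer the analogous call would be
  // cudaMemset(ptr, 0, count * sizeof(int)).
  std::memset(buf.get(), 0, count * sizeof(int));
  return buf;
}

// Hypothetical consumer: treats every scratch entry as an offset into `data`.
// If the offsets were garbage, the read below would be out of bounds, which is
// the kind of access violation an uninitialized buffer can trigger.
int SumAtOffsets(const int* data, std::size_t data_len,
                 const int* offsets, std::size_t count) {
  int sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    // Guard that makes the assumption explicit: offsets must be in range.
    assert(offsets[i] >= 0 &&
           static_cast<std::size_t>(offsets[i]) < data_len);
    sum += data[offsets[i]];
  }
  return sum;
}
```

Whether an uninitialized buffer fails in practice depends on what the allocator happens to return: memory that was recently zeroed can hide the bug, while recycled memory can expose it.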
### Motivation and Context
After a memory allocator change in GenAI, FlashAttention started
crashing, and the crash was eventually tracked down to this
uninitialized buffer. The allocator change itself was innocent. I'm not
sure why this didn't fail previously, or whether it did and we just
weren't getting reports about it.
Co-authored-by: Ryan Hill <{ID}+{username}@users.noreply.github.com>