Replace FasterTransformers like KV cache layout and kernel with flash attention for better support for longer sequence #239
Open
JerryGJX wants to merge 2 commits into mit-han-lab:main from
Commits
Commits on Nov 16, 2024
- Junxian Guo committed
- Junxian Guo committed