Request to replace the model acceleration technique "flash attention" with the more versatile "vllm" #138
jake123456789ok started this conversation in Ideas
Replies: 0
I hope that in the future the Yi model can replace its default "flash attention" acceleration with vLLM, so that we can benefit from the latest inference speed techniques.
(To clarify: vLLM already supports the latest "flash decoding", and there are plans to support "flash decoding++" in the future.)