vLLM Optimization: PagedAttention Source Code Walkthrough - Zhang #212

2025-02-04T09:23:34Z

giscus[bot]
bot Feb 4, 2025

vLLM Optimization: PagedAttention Source Code Walkthrough - Zhang

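Works on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a committed lifelong learner. LLM_Infer: a summary of the vLLM PagedAttention kernel design and the module flow that dynamically allocates and manages KV cache memory. There are three main difficulties: first, creating and managing block_tables; second, computing how many memory blocks the GPU device can allocate for a given model; and third, working out how the thread index and offset calculations in the PagedAttention kernel code are rewritten in terms of block_tables. All three take repeated reading of the code before they become clear.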
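To make the first two difficulties concrete, here is a minimal Python sketch, not vLLM's actual code. The function names `locate_token` and `num_gpu_blocks` and all parameter names are hypothetical and chosen for illustration. The sketch assumes a block table is a per-sequence list mapping logical block index to physical block id, and that one block stores both K and V for every layer.

```python
def locate_token(block_table, token_idx, block_size):
    """Map a token position in a sequence to (physical_block, offset).

    block_table[i] holds the physical block id backing logical block i
    (hypothetical layout, for illustration only).
    """
    logical_block = token_idx // block_size   # which logical block holds the token
    offset = token_idx % block_size           # slot inside that block
    return block_table[logical_block], offset


def num_gpu_blocks(free_bytes, block_size, num_layers,
                   num_kv_heads, head_size, dtype_bytes):
    """Estimate how many KV cache blocks fit into free_bytes of GPU memory.

    Assumes each block stores K and V (factor 2) for all layers.
    """
    bytes_per_block = (2 * num_layers * block_size
                       * num_kv_heads * head_size * dtype_bytes)
    return free_bytes // bytes_per_block


if __name__ == "__main__":
    # Token 35 with block_size 16 sits in logical block 2, slot 3.
    print(locate_token([7, 3, 9], 35, 16))
    # 1 GiB of free memory, 32 layers, 32 KV heads, head size 128, fp16.
    print(num_gpu_blocks(2**30, 16, 32, 32, 128, 2))
```

The same divide-and-modulo step is, conceptually, what a kernel would apply per thread once its indexing is expressed through block_tables instead of a contiguous KV buffer.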

https://www.armcvai.cn/2024-11-17/vllm-pagedattention.html

Aukarous · 2025-02-04T09:23:36Z

Aukarous
Feb 4, 2025 — with giscus

The font used for the code looks really nice. Which font is it?
