Paged Attention merge PR, pending verification #54
base: main
Conversation
@@ -0,0 +1,158 @@
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
Let the user set this themselves when running the script, or pass it in as a script argument.
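As a possible follow-up to this comment (a hedged sketch only, not the PR's actual change; the --devices flag name is an assumption), the device list could be taken as a script argument and the environment variable set only when the user provides it:

# Hypothetical sketch: take the visible-device list as a script argument
# instead of hard-coding CUDA_VISIBLE_DEVICES in the test script.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--devices", type=str, default=None,
                    help='e.g. "0,1,2,3"; leave unset to inherit the environment')
args, _ = parser.parse_known_args()

if args.devices is not None:
    # Must run before any CUDA-using library initializes the driver.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.devices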
if __name__ == "__main__":
    test()
    test()
Was this changed by accident?
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")

print("Loading dataset...")
local_file_paths = {
Hard-coded paths like this shouldn't appear in the script.
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--port", type=int, default=8000)
parser.add_argument("--endpoint", type=str, default="/completions")
PPL uses the completions endpoint, not chat.
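For context on why the completions route matters here: perplexity needs per-token log-probabilities of the prompt itself, which chat endpoints generally do not return. Below is a minimal, hypothetical sketch assuming an OpenAI-compatible /completions endpoint that supports echo and logprobs; the exact fields on this server may differ.

# Hypothetical sketch: prompt perplexity via a completions-style endpoint.
# Assumes OpenAI-compatible `echo` + `logprobs` support; not this PR's code.
import math
import requests

def prompt_ppl(base_url: str, model: str, text: str) -> float:
    resp = requests.post(
        f"{base_url}/completions",
        json={
            "model": model,
            "prompt": text,
            "max_tokens": 0,   # score the prompt only, generate nothing
            "echo": True,      # return logprobs for the prompt tokens
            "logprobs": 0,
        },
    ).json()
    token_logprobs = resp["choices"][0]["logprobs"]["token_logprobs"]
    lps = [lp for lp in token_logprobs if lp is not None]  # first token has None
    return math.exp(-sum(lps) / len(lps))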
@@ -0,0 +1,589 @@
#!/bin/bash
What is this?
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape) const;
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape, const std::vector<ptrdiff_t> &new_strides) const;

// template <typename T>
If it's unused, just delete it.
void Tensor::debug() const { this->debug(""); }


// template <typename T>
If it's unused, delete it.
Across the whole PR, including many files not flagged here, there is a large amount of commented-out code. Please check whether it actually needs to be kept.
If it is meant as a usage example, label it as such.
One or two lines of commonly toggled changes are probably fine, but for large blocks of in-progress debugging or abandoned code, please keep only the version that is finally used.
// __C __export void
// dropKVCache(const struct JiugeModel *,
// struct KVCache *);
Do we really need to add this much commented-out code?
const int32_t *block_tables,
const int32_t *slot_mapping,
const float *temperature, const uint32_t *topk, const float *topp,
const uint32_t is_prefill, const bool enable_paged_attn,
Shouldn't is_prefill and enable_paged_attn both be bool?
struct KVCache **kv_caches,
const int32_t *block_tables,
const int32_t *slot_mapping,
const uint32_t is_prefill, const bool enable_paged_attn,
Same as above.
import time
import sys
from random import randint, seed
# from nanovllm import LLM, SamplingParams
Does this need to be kept?
attention_bias=True, enable_paged_attn=args.enable_paged_attn, max_kvcache_tokens=max_kvcache_tokens)

sampling_params = SamplingParams(temperature=0.6, max_tokens=128)
# prompts = [
Shouldn't the ~60 lines that follow be tidied up?
Paged Attention merge PR. All current conflicts have been resolved; it should only be merged after verification.
Features: vllm-like Scheduler and Memory Manager; Paged Attention.
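For reviewers new to the vllm-like design, the sketch below illustrates, purely as an assumption-labeled example and not this PR's implementation, how a block table and slot mapping relate logical token positions to physical KV-cache slots; the names block_size, block_tables, and slot_mapping mirror parameters that appear in the PR, while the allocation policy is a deliberately naive stand-in.

# Illustrative sketch only (not the PR's code): a naive vllm-like block manager.
# Each sequence owns a list of physical blocks; slot_mapping entries are the
# flat physical slots where each new token's K/V vectors are written.
from typing import Dict, List

class NaiveBlockManager:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))    # physical block ids
        self.block_tables: Dict[int, List[int]] = {}  # seq_id -> physical blocks

    def append_token(self, seq_id: int, pos: int) -> int:
        """Reserve space for the token at logical position `pos` and return
        its flat physical slot index (one slot_mapping entry)."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):      # current block is full
            table.append(self.free_blocks.pop())      # grab a free physical block
        block = table[pos // self.block_size]
        return block * self.block_size + pos % self.block_size

mgr = NaiveBlockManager(num_blocks=8, block_size=4)
slot_mapping = [mgr.append_token(seq_id=0, pos=p) for p in range(6)]
print(mgr.block_tables[0], slot_mapping)  # two physical blocks, six flat slots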