Conversation

Susskind115

PR to merge Paged Attention. All current conflicts have been resolved; it should be merged only after verification.
Features: vLLM-like Scheduler and Memory Manager; Paged Attention.
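
For readers unfamiliar with the `block_tables` and `slot_mapping` arguments that appear later in this review, the following is a minimal sketch of how a paged KV cache can map token positions to physical cache slots. It assumes the vLLM-style convention `slot = physical_block * block_size + offset`; the function name and the example values are illustrative and are not taken from this PR.

```python
from typing import List


def slot_mapping_for_sequence(
    block_table: List[int], num_tokens: int, block_size: int
) -> List[int]:
    """Map each token position of one sequence to a physical KV-cache slot.

    Assumption: slot = physical_block * block_size + offset_within_block.
    """
    if num_tokens > len(block_table) * block_size:
        raise ValueError("block_table is too small for num_tokens")
    slots = []
    for pos in range(num_tokens):
        # Logical block index -> physical block id via the block table.
        physical_block = block_table[pos // block_size]
        slots.append(physical_block * block_size + pos % block_size)
    return slots


# A 6-token sequence stored in physical blocks 7 and 2, with 4 tokens per block.
print(slot_mapping_for_sequence([7, 2], num_tokens=6, block_size=4))
# -> [28, 29, 30, 31, 8, 9]
```

Because the blocks need not be contiguous, the memory manager can hand out any free block to a growing sequence, which is the main point of the paged layout.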

@@ -0,0 +1,158 @@
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
Collaborator

Leave this for users to set themselves when they run the script, or pass it in as a script argument.
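
One way to act on this comment, sketched under assumptions: the `--devices` flag and both helper names below are hypothetical and do not exist in the PR. The idea is to set `CUDA_VISIBLE_DEVICES` only when the user asks for it, and to do so before any CUDA-using library is imported.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag: a comma-separated GPU list such as "0,1".
    parser.add_argument(
        "--devices",
        type=str,
        default=None,
        help="value for CUDA_VISIBLE_DEVICES; leave unset to keep the caller's environment",
    )
    return parser.parse_args(argv)


def apply_device_selection(devices):
    """Export CUDA_VISIBLE_DEVICES only if the user passed a value."""
    if devices is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = devices


# Demo with an explicit argument list instead of sys.argv.
args = parse_args(["--devices", "0,1"])
apply_device_selection(args.devices)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With no flag given, the script inherits whatever the user exported in the shell, so nothing is hard-coded.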


if __name__ == "__main__":
test()
test()
Collaborator

Was this changed by accident?

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")

print("Loading dataset...")
local_file_paths = {
Collaborator

Hard-coded values like this should not appear in the script.
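
A possible shape for the fix, as a sketch only: the contents of `local_file_paths` are not shown in the diff, so the split names, the file names, and the `build_local_file_paths` helper below are all hypothetical. The point is that the directory comes from the caller rather than from a literal in the script.

```python
from pathlib import Path
from typing import Dict


def build_local_file_paths(data_dir: str) -> Dict[str, str]:
    """Build per-split dataset paths under a user-supplied directory.

    The split names and file names are placeholders, not the PR's real layout.
    """
    root = Path(data_dir)
    return {split: str(root / f"{split}.parquet") for split in ("train", "test")}


# The directory would normally come from a command-line argument.
print(build_local_file_paths("datasets/wikitext"))
```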

parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--port", type=int, default=8000)
parser.add_argument("--endpoint", type=str, default="/completions")
Collaborator

PPL uses the completion endpoint, not the chat endpoint.
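
As background for this comment, perplexity is computed from the log-probabilities the model assigns to the tokens of the raw text, which is why a plain completion-style endpoint is the natural fit. The sketch below shows only the arithmetic; it assumes the per-token natural-log probabilities have already been obtained from the server, and the function name is illustrative.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Four tokens, each predicted with probability 0.25, give a perplexity of about 4.
print(perplexity([math.log(0.25)] * 4))
```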

@@ -0,0 +1,589 @@
#!/bin/bash
Collaborator

What is this?

std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape) const;
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape, const std::vector<ptrdiff_t> &new_strides) const;

// template <typename T>
Collaborator

If this is unused, please delete it.

void Tensor::debug() const { this->debug(""); }


// template <typename T>
Collaborator

Delete this if it is unused.

Collaborator

wooway777 left a comment

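Throughout the PR, including many files I did not mark, there is a large amount of commented-out code. Please check whether it needs to be kept.
If it is a usage example, please label it as such.
One or two lines of commonly toggled changes may be fine, but for large blocks of in-progress debugging code or deprecated code, I suggest keeping only the final version that is actually used.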

// __C __export void
// dropKVCache(const struct JiugeModel *,
// struct KVCache *);

Collaborator

Is this much commented-out code really necessary?

const int32_t *block_tables,
const int32_t *slot_mapping,
const float *temperature, const uint32_t *topk, const float *topp,
const uint32_t is_prefill, const bool enable_paged_attn,
Collaborator

Should is_prefill and enable_paged_attn both be bool?

struct KVCache **kv_caches,
const int32_t *block_tables,
const int32_t *slot_mapping,
const uint32_t is_prefill, const bool enable_paged_attn,
Collaborator

Same as above: should is_prefill be bool here as well?

import time
import sys
from random import randint, seed
# from nanovllm import LLM, SamplingParams
Collaborator

Does this need to be kept?

attention_bias=True, enable_paged_attn=args.enable_paged_attn, max_kvcache_tokens=max_kvcache_tokens)

sampling_params = SamplingParams(temperature=0.6, max_tokens=128)
# prompts = [
Collaborator

Should the next 60 lines be tidied up?
