Paged Attention merge PR, pending verification #54
base: main
Conversation
@@ -0,0 +1,158 @@
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
Let the user set this themselves when running the script, or pass it in as a script argument.
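As a possible follow-up to this comment (a hedged sketch only, not the PR's actual change; the --devices flag name is an assumption), the device list could be taken as a script argument and the environment variable set only when the user provides it:

# Hypothetical sketch: take the visible-device list as a script argument
# instead of hard-coding CUDA_VISIBLE_DEVICES in the test script.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--devices", type=str, default=None,
                    help='e.g. "0,1,2,3"; leave unset to inherit the environment')
args, _ = parser.parse_known_args()

if args.devices is not None:
    # Must run before any CUDA-using library initializes the driver.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.devices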
if __name__ == "__main__":
    test()
    test()
Was this changed by accident?
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")

print("Loading dataset...")
local_file_paths = {
Hard-coded paths like this shouldn't appear in the script.
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", type=str, required=True)
parser.add_argument("--port", type=int, default=8000)
parser.add_argument("--endpoint", type=str, default="/completions")
PPL uses the completions endpoint, not chat.
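For context on why the completions route matters here: perplexity needs per-token log-probabilities of the prompt itself, which chat endpoints generally do not return. Below is a minimal, hypothetical sketch assuming an OpenAI-compatible /completions endpoint that supports echo and logprobs; the exact fields on this server may differ.

# Hypothetical sketch: prompt perplexity via a completions-style endpoint.
# Assumes OpenAI-compatible `echo` + `logprobs` support; not this PR's code.
import math
import requests

def prompt_ppl(base_url: str, model: str, text: str) -> float:
    resp = requests.post(
        f"{base_url}/completions",
        json={
            "model": model,
            "prompt": text,
            "max_tokens": 0,   # score the prompt only, generate nothing
            "echo": True,      # return logprobs for the prompt tokens
            "logprobs": 0,
        },
    ).json()
    token_logprobs = resp["choices"][0]["logprobs"]["token_logprobs"]
    lps = [lp for lp in token_logprobs if lp is not None]  # first token has None
    return math.exp(-sum(lps) / len(lps))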
@@ -0,0 +1,589 @@
#!/bin/bash
What is this?
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape) const;
std::shared_ptr<Tensor> view_as(const std::vector<size_t> &new_shape, const std::vector<ptrdiff_t> &new_strides) const;

// template <typename T>
If it's unused, just delete it.
void Tensor::debug() const { this->debug(""); }


// template <typename T>
If it's unused, delete it.
Across the whole PR, including many files not flagged here, there is a large amount of commented-out code. Please check whether it actually needs to be kept.
If it is meant as a usage example, label it as such.
One or two lines of commonly toggled changes are probably fine, but for large blocks of in-progress debugging or abandoned code, please keep only the version that is finally used.
// __C __export void
// dropKVCache(const struct JiugeModel *,
// struct KVCache *);
Do we really need to add this much commented-out code?
const int32_t *block_tables,
const int32_t *slot_mapping,
const float *temperature, const uint32_t *topk, const float *topp,
const uint32_t is_prefill, const bool enable_paged_attn,
Shouldn't is_prefill and enable_paged_attn both be bool?
struct KVCache **kv_caches,
const int32_t *block_tables,
const int32_t *slot_mapping,
const uint32_t is_prefill, const bool enable_paged_attn,
Same as above.
import time
import sys
from random import randint, seed
# from nanovllm import LLM, SamplingParams
Does this need to be kept?
attention_bias=True, enable_paged_attn=args.enable_paged_attn, max_kvcache_tokens=max_kvcache_tokens)

sampling_params = SamplingParams(temperature=0.6, max_tokens=128)
# prompts = [
Shouldn't the ~60 lines that follow be tidied up?
Paged Attention merge PR. All current conflicts have been resolved; it should only be merged after verification.
Features: vllm-like Scheduler and Memory Manager; Paged Attention.
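For reviewers new to the vllm-like design, the sketch below illustrates, purely as an assumption-labeled example and not this PR's implementation, how a block table and slot mapping relate logical token positions to physical KV-cache slots; the names block_size, block_tables, and slot_mapping mirror parameters that appear in the PR, while the allocation policy is a deliberately naive stand-in.

# Illustrative sketch only (not the PR's code): a naive vllm-like block manager.
# Each sequence owns a list of physical blocks; slot_mapping entries are the
# flat physical slots where each new token's K/V vectors are written.
from typing import Dict, List

class NaiveBlockManager:
    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))    # physical block ids
        self.block_tables: Dict[int, List[int]] = {}  # seq_id -> physical blocks

    def append_token(self, seq_id: int, pos: int) -> int:
        """Reserve space for the token at logical position `pos` and return
        its flat physical slot index (one slot_mapping entry)."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // self.block_size >= len(table):      # current block is full
            table.append(self.free_blocks.pop())      # grab a free physical block
        block = table[pos // self.block_size]
        return block * self.block_size + pos % self.block_size

mgr = NaiveBlockManager(num_blocks=8, block_size=4)
slot_mapping = [mgr.append_token(seq_id=0, pos=p) for p in range(6)]
print(mgr.block_tables[0], slot_mapping)  # two physical blocks, six flat slots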