@sh1ng commented Nov 15, 2023

vLLM now uses CUDA 12.

Also, I can't confirm your results on an RTX 3090:

mlc-llm

Statistics:
----------- prefill -----------
throughput: 218.2 tok/s
total tokens: 7 tok
total time: 0.0 s
------------ decode ------------
throughput: 170.7 tok/s
total tokens: 256 tok
total time: 1.5 s

vLLM (using a 4-bit AWQ model)

Avg latency: 1.4600699121753375 seconds
Speed: 175.33 tok/s
Speed: 0.00570 s/tok
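
For reference, the vLLM figures are consistent with 256 generated tokens over the reported average latency (256 / 1.46 s ≈ 175 tok/s), matching the 256-token decode on the mlc-llm side. Below is a minimal sketch of how such a measurement could be taken with vLLM's Python API; the model name is a placeholder for whatever AWQ checkpoint was actually benchmarked, and the real benchmark script may differ:

```python
import time
from vllm import LLM, SamplingParams

# Placeholder AWQ checkpoint; substitute the model actually benchmarked.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

# Greedy decoding of a fixed 256 tokens, ignoring EOS so every run
# generates the same number of tokens (mirrors the 256-token decode above).
params = SamplingParams(temperature=0.0, max_tokens=256, ignore_eos=True)

start = time.perf_counter()
outputs = llm.generate(["Hello, my name is"], params)
elapsed = time.perf_counter() - start

n_tokens = len(outputs[0].outputs[0].token_ids)
print(f"Avg latency: {elapsed} seconds")
print(f"Speed: {n_tokens / elapsed:.2f} tok/s")
print(f"Speed: {elapsed / n_tokens:.5f} s/tok")
```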
