vLLM Kunlun Logo

Documentation | Quick Start | Slack


Latest News 🔥

  • [2025/12] Initial release of vLLM Kunlun

Overview

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU. It is the recommended approach for integrating the Kunlun backend into the vLLM ecosystem, following the principles outlined in the RFC Hardware pluggable. The plugin provides a hardware-pluggable interface that decouples Kunlun XPU support from the core vLLM codebase.

With the vLLM Kunlun plugin, popular open-source models, including Transformer-like, Mixture-of-Experts, embedding, and multimodal LLMs, can run effortlessly on the Kunlun XPU.


Prerequisites

  • Hardware: Kunlun3 P800
  • OS: Ubuntu 22.04
  • Software:
    • Python >= 3.10
    • PyTorch >= 2.5.1
    • vLLM (same version as vllm-kunlun)
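The version requirements above can be verified programmatically before installing the plugin. The sketch below is illustrative only; the `check_prereqs` helper is our own name, not part of vllm-kunlun:

```python
import sys
import importlib.util


def check_prereqs() -> dict:
    """Return a pass/fail map for the software prerequisites listed above."""
    checks = {"python>=3.10": sys.version_info >= (3, 10)}
    if importlib.util.find_spec("torch") is None:
        checks["pytorch>=2.5.1"] = False  # PyTorch is not installed
    else:
        import torch
        # torch.__version__ may carry a local suffix, e.g. "2.5.1+cu121"
        parts = torch.__version__.split("+")[0].split(".")
        checks["pytorch>=2.5.1"] = (int(parts[0]), int(parts[1])) >= (2, 5)
    return checks


if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Matching the vLLM version to the vllm-kunlun release (see the version table below in Getting Started) must still be checked against the plugin's own release notes.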

Supported Models

Generative Models

| Model | Support | Quantization | LoRA | Piecewise Kunlun Graph | Note |
|-------|---------|--------------|------|------------------------|------|
| Qwen2 | | | | | |
| Qwen2.5 | | | | | |
| Qwen3 | | | | | |
| Qwen3-Moe | | | | | |
| Qwen3-Next | | | | | |
| MiMo-V2-Flash | | | | | |
| Llama2 | | | | | |
| Llama3 | | | | | |
| Llama3.1 | | | | | |
| gpt-oss | | | | | |
| DeepSeek-R1 | | | | | |
| DeepSeek-V3 | | | | | |
| DeepSeek-V3.2 | | | | | |
| Kimi-K2 | | | | | |

Multimodal Language Models

| Model | Support | Quantization | LoRA | Piecewise Kunlun Graph | Note |
|-------|---------|--------------|------|------------------------|------|
| Qwen3-VL | | | | | |

Performance Visualization 🚀

High-performance computing at work: How different models perform on the Kunlun3 P800.

Current environment: 16-way concurrency, input/output length 2048 tokens.

(Chart: throughput in tokens generated per second, by model)

Getting Started

Please use the following recommended versions to get started quickly:

| Version | Release type | Doc |
|---------|--------------|-----|
| v0.11.0 | Latest stable version | QuickStart and Installation for more details |
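Once installed on a Kunlun3 P800 host, the plugin is intended to be picked up through vLLM's platform plugin mechanism, so standard vLLM code needs no Kunlun-specific changes. A minimal offline-inference sketch using vLLM's public Python API (the model name is only an example; any model from the tables above should work):

```python
from vllm import LLM, SamplingParams

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# With vllm-kunlun installed, vLLM should dispatch to the Kunlun XPU
# backend automatically; no device-specific arguments are required here.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

Refer to the QuickStart and Installation docs linked above for the authoritative setup steps and any Kunlun-specific environment configuration.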

Contribute to vLLM Kunlun

If you're interested in contributing to this project, please read Contributing to vLLM Kunlun.

Star History 🔥

We open-sourced the project on Dec 8, 2025. We love open source and collaboration ❤️

Star History Chart

Sponsors 👋

We sincerely appreciate the KunLunXin team for their support in providing XPU resources, which enabled efficient model adaptation debugging, comprehensive end-to-end testing, and broader model compatibility.

License

Apache License 2.0, as found in the LICENSE file.
