vllm-edge is an edge-oriented vLLM framework for devices with limited video memory, low communication bandwidth, and weak computing power.
git clone https://github.com/JingliangGao/vllm-edge.git
cd vllm-edge/
chmod +x ./build-for-debug.sh && ./build-for-debug.sh
modelscope download --model Qwen/Qwen3-0.6B
cd vllm_edge/examples/
python3 qwen3_inference.py
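
Before running the steps above, it can help to confirm that the command-line tools they rely on are installed. The following is a minimal sketch and not part of the repository: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the tool list is simply read off the commands above.

```python
import shutil

# Tools invoked by the commands above. This list is an assumption read off
# those commands, not something the repository documents.
REQUIRED_TOOLS = ["git", "python3", "modelscope"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If the script reports a missing tool, install it before running the build script or the model download.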