Skip to content

Latest commit

 

History

History
37 lines (31 loc) · 1.37 KB

vllm_category.md

File metadata and controls

37 lines (31 loc) · 1.37 KB

Prequisite for vLLM Category

Find the model name and repo_id for the model you want to mine with

Model name repo_id
Pixtral_12B mistralai/Pixtral-12B-2409
DeepSeek_R1_Distill_Llama_70B Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ

To start mining with this model, follow these steps:

  1. Create a new Python environment for vLLM:
python -m venv vllm
source vllm/bin/activate

## For Gemma7b and Llama3_70b
pip install vllm==0.4.1 

## For Pixtral_12B
pip install git+https://github.com/vietbeu/mistral-common.git
pip install git+https://github.com/vietbeu/openai-python.git
pip install vllm==0.6.1.post2

## For Llama3.3_70b
pip install vllm==0.6.4
  1. Start the API server with your Hugging Face token (ensure access to the model repo at https://huggingface.co/repo_id):
HF_TOKEN=<your-huggingface-token> python -m vllm.entrypoints.openai.api_server --model repo_id \
--max-logprobs 120 \
--quantization awq \ # apply for Llama3_70b and Llama3.3_70b only
--tokenizer_mode mistral --limit_mm_per_prompt 'image=1' --enable-chunked-prefill False --max-model-len 8192 \ # apply for Pixtral_12b only
--tensor-parallel-size X # optional, set if you have multi-gpu