cd /workspace
git clone https://github.com/ilkersigirci/runpod-playground.git
cd /workspace/runpod-playground
# Prepare .env file
make prepare-env-file
# Initial dependency install
make initial-runpod-install
# Download model
make download-model
# Start vllm
make start-vllm
# See vllm logs
make log-vllm
# Restart vllm
make restart-vllm
# Start the simple gui
make gui
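Once the server is started, it can be useful to confirm which model name it is serving before sending requests. The sketch below assumes the server listens on port 8000 (as in the request examples in this README) and exposes the OpenAI-compatible `/v1/models` endpoint; the function names `parse_model_ids` and `list_served_models` are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request


def parse_model_ids(models_response):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [model["id"] for model in models_response.get("data", [])]


def list_served_models(base_url="http://localhost:8000",
                       opener=urllib.request.urlopen, timeout=10):
    """Query the server and return the model names it currently serves.

    base_url is an assumption; adjust host and port to your deployment.
    """
    with opener(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Calling `list_served_models()` against a running server should return a list that contains the configured served model name.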
- The API health check is enabled by default. It sends a message to the vLLM server at a fixed interval.
  - To disable the health check, set ENABLE_HEALTH_CHECK=0 in the .env file.
  - To send the health check failure message to Microsoft Teams, set TEAMS_WEBHOOK_URL in the .env file.
    - Example: TEAMS_WEBHOOK_URL=https://outlook.office.com/webhook/...
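To make the health check behaviour concrete, here is a minimal sketch of a periodic probe with an optional Teams notification. It is not the repository's implementation: the 60-second interval, the one-token "ping" probe, the `{"text": ...}` webhook payload, and all function names are assumptions chosen for illustration. Only the `ENABLE_HEALTH_CHECK` and `TEAMS_WEBHOOK_URL` variables come from this README.

```python
import json
import os
import time
import urllib.request


def _post_json(url, payload):
    """Build a JSON POST request for the given URL."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def server_is_healthy(base_url, model, opener=urllib.request.urlopen, timeout=10):
    """Send a tiny chat request; return True only on an HTTP 200 answer."""
    probe = {"model": model, "max_tokens": 1,
             "messages": [{"role": "user", "content": "ping"}]}
    try:
        with opener(_post_json(f"{base_url}/v1/chat/completions", probe),
                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors, HTTP errors and timeouts
        return False


def notify_teams(webhook_url, text, opener=urllib.request.urlopen, timeout=10):
    """Post a plain text message to a Teams webhook (payload shape assumed)."""
    with opener(_post_json(webhook_url, {"text": text}), timeout=timeout):
        pass


def run_health_check(base_url, model, interval_seconds=60, max_checks=None,
                     opener=urllib.request.urlopen, sleep=time.sleep,
                     env=os.environ):
    """Probe the server repeatedly and return the number of failed checks."""
    if env.get("ENABLE_HEALTH_CHECK", "1") == "0":
        return 0  # health check disabled via .env
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not server_is_healthy(base_url, model, opener=opener):
            failures += 1
            webhook_url = env.get("TEAMS_WEBHOOK_URL")
            if webhook_url:
                notify_teams(webhook_url,
                             f"vLLM health check failed for {base_url}",
                             opener=opener)
        sleep(interval_seconds)
    return failures
```

Passing `max_checks` bounds the loop, which is convenient for a one-off check; leaving it as `None` keeps probing until the process is stopped.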
- To deploy a different model, change the HF_MODEL_NAME variable in the .env file to the model you want to deploy, following the Hugging Face repository ID convention.
  - You can also change SERVED_MODEL_NAME to set the model name used in requests.
  - You can also change MAX_CONTEXT_LEN to the desired context length.
  - Example: Change the default model to CohereForAI/c4ai-command-r-plus-GPTQ and set its context length:
make replace-value-in-env-file variable_name=HF_MODEL_NAME new_value=CohereForAI/c4ai-command-r-plus-GPTQ
make replace-value-in-env-file variable_name=MAX_CONTEXT_LEN new_value=40000
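For readers who prefer to see what such a replacement amounts to, the following sketch rewrites a `KEY=VALUE` line in a .env file. It is an illustration under assumptions, not the Makefile target's actual implementation: `replace_env_value` is a hypothetical helper, and it assumes one unquoted assignment per line.

```python
from pathlib import Path


def replace_env_value(env_path, variable_name, new_value):
    """Replace the value of variable_name in a .env file.

    Returns True if a matching line was rewritten, False if the variable
    was not found (the file is then left untouched).
    """
    path = Path(env_path)
    lines = path.read_text().splitlines()
    prefix = f"{variable_name}="
    replaced = False
    for index, line in enumerate(lines):
        if line.startswith(prefix):
            lines[index] = f"{prefix}{new_value}"
            replaced = True
    if replaced:
        path.write_text("\n".join(lines) + "\n")
    return replaced
```

For example, `replace_env_value(".env", "MAX_CONTEXT_LEN", "40000")` would mirror the second command above. A restart of the server is presumably needed for a new value to take effect.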
- Request with a system message, assuming SERVED_MODEL_NAME=vLLM-Model:
curl --request POST \
  --url http://0.0.0.0:8000/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{
    "model": "vLLM-Model",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful virtual assistant trained by OpenAI."
      },
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "temperature": 0.8,
    "stream": false
  }'
- Request without a system message, assuming SERVED_MODEL_NAME=vLLM-Model:
curl --request POST \
  --url http://0.0.0.0:8000/v1/chat/completions \
  --header "Content-Type: application/json" \
  --data '{
    "model": "vLLM-Model",
    "messages": [
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "temperature": 0.8,
    "stream": false
  }'
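The two curl requests above can also be issued from Python using only the standard library. This is a minimal sketch: `build_chat_payload` and `chat` are hypothetical helper names, the base URL is an assumption to adapt to your deployment, and the response is read as an OpenAI-style chat completion (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_payload(model, user_message, system_message=None, temperature=0.8):
    """Build the request body; the system message is optional."""
    messages = []
    if system_message is not None:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": user_message})
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "stream": False,
    }


def chat(payload, base_url="http://localhost:8000",
         opener=urllib.request.urlopen, timeout=60):
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A call such as `chat(build_chat_payload("vLLM-Model", "Who are you?"))` corresponds to the request without a system message; passing `system_message=...` corresponds to the first request.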