I am using the model "pentagoniac/SEMIKONG-8b-GPTQ" and serving it with "python -m vllm.entrypoints.api_server --model /public/home/lulingyi/repo/semikong/model/SEMIKONG-8b-GPTQ --device cuda --max-lora-rank 32 --dtype float16 --port 8080".
When I run inference, the responses are repetitive: they consist of the same few words or numbers over and over. How can I address this issue?