The inference results appear to be a repetition of several words #5

SZlulingyi · 2024-08-27T03:41:39Z

I am using the model "pentagoniac/SEMIKONG-8b-GPTQ" and serving it with "python -m vllm.entrypoints.api_server --model /public/home/lulingyi/repo/semikong/model/SEMIKONG-8b-GPTQ --device cuda --max-lora-rank 32 --dtype float16 --port 8080".
When I run inference, the responses are repetitive, consisting of the same few words or numbers over and over. How can I address this issue?

mfq2003 · 2025-01-17T12:40:55Z

Hello, where did you get the model pentagoniac/SEMIKONG-8b-GPTQ?
I could not find it anywhere.
Can you help me, please?
