Description
Hi Qwen team, thanks for releasing Qwen3 models.
I would like to report an inconsistency in generation behavior between Qwen3-1.7B and Qwen3-4B-Instruct, related to assistant role entry when no explicit generation prompt is added.
Observed Behavior

Qwen3-4B-Instruct-2507
- The generated output includes `<|im_start|>assistant`
- The model correctly enters the assistant role even without an explicit generation prompt

Qwen3-1.7B
- The generated output does NOT include `<|im_start|>assistant`
- The model often fails to properly start an assistant response unless `add_generation_prompt=True` is used
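For reference, a minimal sketch (assuming the ChatML-style chat template these models ship with) of what the prompt looks like when no generation prompt is added:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
messages = [{"role": "user", "content": "who are you"}]

# Without a generation prompt, the rendered text ends at the user turn;
# nothing signals that an assistant turn has been opened, so the model
# itself must emit "<|im_start|>assistant" before it can answer.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```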
Expected Behavior
Consistent behavior across Qwen3 models, or at least documented expectations that:
- Smaller models (e.g. 1.7B) require an explicit generation prompt
- Larger models may implicitly recover the assistant role boundary
Additional Context
This issue becomes particularly noticeable when:
- Training or fine-tuning uses `train_on_responses_only`
- Prompt tokens are masked with `-100`
- The model is never trained to enter the assistant role, only to continue within it (see the sketch below)

The issue appears much more severe for Qwen3-1.7B, while larger models seem more robust.
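For context, here is a minimal sketch of the masking that `train_on_responses_only`-style training performs (the helper name below is hypothetical; only the `-100` ignore-index convention comes from the actual loss API):

```python
import torch

def mask_prompt_labels(input_ids: torch.Tensor, response_start: int) -> torch.Tensor:
    """Hypothetical helper: build labels for response-only training.

    Every position before `response_start` (the prompt tokens plus the
    "<|im_start|>assistant" header) is set to -100, the default
    ignore_index of PyTorch's cross-entropy loss, so only response
    tokens contribute to the gradient.
    """
    labels = input_ids.clone()
    labels[:response_start] = -100
    return labels
```

Because the assistant header sits on the masked side of this boundary, the model is only ever optimized to continue an assistant turn, never to open one.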
Suggested Clarification / Fix
- Document that `add_generation_prompt=True` is required for Qwen3-1.7B at inference time (a sketch follows this list), or
- Consider improving assistant role entry robustness for smaller Qwen3 variants
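For the first option, the inference-time workaround is a one-line change. A sketch (the exact suffix appended may vary with template options such as `enable_thinking`):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# add_generation_prompt=True appends the assistant header
# ("<|im_start|>assistant\n"), so the model only has to continue an
# already-opened turn instead of producing the header itself.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "who are you"}],
    tokenize=False,
    add_generation_prompt=True,
)
```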
Reproduction
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def run(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, dtype="auto"
    ).to("cuda")

    query = "who are you"
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        # add_generation_prompt is intentionally NOT set (defaults to False)
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=False))

run("Qwen/Qwen3-4B-Instruct-2507")
run("Qwen/Qwen3-1.7B")
```
Environment Information
GPU: NVIDIA GeForce RTX 4090
Known Issue
- The issue has not already been addressed in the Documentation, Issues, or Discussions.