Qwen3-1.7B does not reliably enter the assistant role without an explicit generation prompt, unlike Qwen3-4B #1807

@Ki-Seki

Description

Hi Qwen team, thanks for releasing Qwen3 models.

I would like to report an inconsistency in generation behavior between Qwen3-1.7B and Qwen3-4B-Instruct: whether the model enters the assistant role when no explicit generation prompt is added to the chat template.


Observed Behavior

  • Qwen3-4B-Instruct-2507

    • The generated output includes <|im_start|>assistant
    • The model correctly enters the assistant role even without an explicit generation prompt
  • Qwen3-1.7B

    • The generated output does NOT include <|im_start|>assistant
    • The model often fails to start an assistant response properly unless add_generation_prompt=True is used (see the sketch after this list)
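
For reference, here is a minimal sketch (assuming the standard Hugging Face chat-template API) of the difference the flag makes to the rendered prompt string:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
messages = [{"role": "user", "content": "who are you"}]

without_gen = tokenizer.apply_chat_template(messages, tokenize=False)
with_gen = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# The two strings share a prefix; the suffix is the assistant header the
# flag appends (for Qwen3, "<|im_start|>assistant\n", possibly followed
# by template-specific thinking tags).
print(with_gen[len(without_gen):])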

Expected Behavior

Consistent behavior across Qwen3 models, or at least documented expectations that:

  • Smaller models (e.g. 1.7B) require an explicit generation prompt
  • Larger models may implicitly recover the assistant role boundary

Additional Context

This issue becomes particularly noticeable when:

  • Training or fine-tuning uses train_on_responses_only
  • Prompt tokens are masked with -100 (see the masking sketch after this list)
  • The model is never trained to enter the assistant role, only to continue within it
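
To make the masking point concrete, here is a minimal sketch of the pattern (the helper below is hypothetical, not the actual train_on_responses_only implementation): everything up to and including the assistant header gets label -100, so the model never receives a gradient for producing <|im_start|>assistant itself.

import torch

def mask_prompt_tokens(input_ids: torch.Tensor, response_start: int) -> torch.Tensor:
    # Label every token before the response with -100, which
    # CrossEntropyLoss(ignore_index=-100) skips during training.
    labels = input_ids.clone()
    labels[:response_start] = -100
    return labels

# Example: a 10-token sequence whose assistant response begins at index 6.
labels = mask_prompt_tokens(torch.arange(10), response_start=6)
print(labels)  # tensor([-100, -100, -100, -100, -100, -100, 6, 7, 8, 9])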

The issue appears much more severe for Qwen3-1.7B, while larger models seem more robust.


Suggested Clarification / Fix

  • Document that add_generation_prompt=True is required for Qwen3-1.7B at inference time (workaround sketch below), or
  • Consider improving assistant role entry robustness for smaller Qwen3 variants
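
Until then, a minimal workaround sketch (assuming the stock ChatML-style Qwen3 template; the manual suffix below is an assumption, not documented API):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
messages = [{"role": "user", "content": "who are you"}]

# Preferred form: let the template append the assistant header itself.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Roughly equivalent manual form, for call sites that cannot pass the flag:
prompt_manual = tokenizer.apply_chat_template(messages, tokenize=False)
prompt_manual += "<|im_start|>assistant\n"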

Reproduction

Minimal Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer

def run(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto"  # newer transformers also accept dtype="auto"
    ).to("cuda")

    query = "who are you"
    # Render the chat template WITHOUT the generation prompt, so the model
    # itself must decide whether to open an assistant turn.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        # add_generation_prompt is intentionally NOT set
    )

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Keep special tokens visible to check whether <|im_start|>assistant appears.
    print(tokenizer.decode(outputs[0], skip_special_tokens=False))


run("Qwen/Qwen3-4B-Instruct-2507")
run("Qwen/Qwen3-1.7B")

Environment Information

GPU: NVIDIA GeForce RTX 4090

Known Issue

  • This issue has not already been addressed in the Documentation, Issues, or Discussions.
