Description
Hi Qwen team, thanks for releasing Qwen3 models.
I would like to report an inconsistency in generation behavior between Qwen3-1.7B and Qwen3-4B-Instruct, related to assistant role entry when no explicit generation prompt is added.
Observed Behavior

Qwen3-4B-Instruct-2507
- The generated output includes `<|im_start|>assistant`
- The model correctly enters the assistant role even without an explicit generation prompt

Qwen3-1.7B
- The generated output does NOT include `<|im_start|>assistant`
- The model often fails to properly start an assistant response unless `add_generation_prompt=True` is used
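For reference, a minimal sketch (assuming the ChatML-style chat template these models ship with) of what the prompt looks like when no generation prompt is added:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
messages = [{"role": "user", "content": "who are you"}]

# Without a generation prompt, the rendered text ends at the user turn;
# nothing signals that an assistant turn has been opened, so the model
# itself must emit "<|im_start|>assistant" before it can answer.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```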
Expected Behavior
Consistent behavior across Qwen3 models, or at least documented expectations that:
- Smaller models (e.g. 1.7B) require an explicit generation prompt
- Larger models may implicitly recover the assistant role boundary
Additional Context
This issue becomes particularly noticeable when:
- Training or fine-tuning uses `train_on_responses_only`
- Prompt tokens are masked with `-100`
- The model is never trained to enter the assistant role, only to continue within it (see the sketch below)

The issue appears much more severe for Qwen3-1.7B, while larger models seem more robust.
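For context, here is a minimal sketch of the masking that `train_on_responses_only`-style training performs (the helper name below is hypothetical; only the `-100` ignore-index convention comes from the actual loss API):

```python
import torch

def mask_prompt_labels(input_ids: torch.Tensor, response_start: int) -> torch.Tensor:
    """Hypothetical helper: build labels for response-only training.

    Every position before `response_start` (the prompt tokens plus the
    "<|im_start|>assistant" header) is set to -100, the default
    ignore_index of PyTorch's cross-entropy loss, so only response
    tokens contribute to the gradient.
    """
    labels = input_ids.clone()
    labels[:response_start] = -100
    return labels
```

Because the assistant header sits on the masked side of this boundary, the model is only ever optimized to continue an assistant turn, never to open one.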
Suggested Clarification / Fix
- Document that `add_generation_prompt=True` is required for Qwen3-1.7B at inference time (a sketch follows this list), or
- Consider improving assistant role entry robustness for smaller Qwen3 variants
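For the first option, the inference-time workaround is a one-line change. A sketch (the exact suffix appended may vary with template options such as `enable_thinking`):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# add_generation_prompt=True appends the assistant header
# ("<|im_start|>assistant\n"), so the model only has to continue an
# already-opened turn instead of producing the header itself.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "who are you"}],
    tokenize=False,
    add_generation_prompt=True,
)
```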
Reproduction
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def run(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, dtype="auto"
    ).to("cuda")

    query = "who are you"
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        # add_generation_prompt is intentionally NOT set (defaults to False)
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=False))

run("Qwen/Qwen3-4B-Instruct-2507")
run("Qwen/Qwen3-1.7B")
```
Environment Information
GPU: NVIDIA GeForce RTX 4090
Known Issue
- The issue has not already been addressed in the Documentation, Issues, or Discussions.