
Qwen3-TTS-12Hz-0.6B-Base fine-tuning fails due to embedding dimension mismatch (2048 vs 1024) #198

@shubhamragade

Description

@shubhamragade

Description

Hi team,

Thank you for open-sourcing Qwen3-TTS.

I am following the official fine-tuning guide from the repository for single-speaker SFT.

Data preparation using prepare_data.py works correctly and audio_codes are generated successfully.

However, when starting training with the 0.6B Base checkpoint, the process crashes with an embedding dimension mismatch error.

It appears that the training script assumes the hidden size of the 1.7B model (2048), while the 0.6B model uses a smaller hidden size (1024), which matches the shapes in the error below.

Could you please clarify whether fine-tuning for the 0.6B Base model is currently supported with the provided scripts?
If yes, is there a different configuration or branch we should use?

Thank you!

Reproduction

pip install -U qwen-tts
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS/finetuning

python sft_12hz.py \
    --init_model_path Qwen/Qwen3-TTS-12Hz-0.6B-Base \
    --output_model_path output \
    --train_jsonl train_with_codes.jsonl \
    --batch_size 6 \
    --lr 1e-5 \
    --num_epochs 1 \
    --speaker_name test

Logs

RuntimeError: Shapes are not compatible for broadcasting:
bf16[*,*,2048] vs bf16[*,*,1024]

Environment Information

  • OS: Google Colab
  • Python: 3.12
  • GPU: A100
  • CUDA: default Colab runtime
  • qwen-tts: latest from pip
  • Repository: latest main branch
  • dtype: bfloat16

Known Issue

  • This issue has not already been addressed in the Documentation, Issues, or Discussions.
