Description
@mkshing
Thanks for sharing a great paper and reproducible code.
Looking at your code in convert_l_to_hf.py, I wonder why TAID successfully transfers knowledge when the student model uses the teacher model's tokenizer.
I understand that the training data were tokenized with the teacher's tokenizer.
But I think that when the tokenizers differ, the models' logits also differ, since each logit position corresponds to a different token.
If you make any assumptions when selecting a teacher/student pair, I would be glad if you could let us know (e.g., selecting models with a similar vocabulary and vocabulary size, or selecting models that share the same tokenizer, such as the Qwen family you used to build TinySwallow-1.5b).
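To make the concern concrete, here is a minimal sketch of the kind of compatibility check I have in mind. It is not taken from the TAID code. The helper `vocab_overlap` and the toy vocabularies are hypothetical; with Hugging Face tokenizers, the two dictionaries could presumably come from `tokenizer.get_vocab()`.

```python
def vocab_overlap(teacher_vocab, student_vocab):
    """Compare two token -> id mappings (hypothetical helper, not part of TAID)."""
    # Tokens that exist in both vocabularies.
    shared = teacher_vocab.keys() & student_vocab.keys()
    # Shared tokens that also sit at the same logit index in both models.
    same_id = sum(1 for tok in shared if teacher_vocab[tok] == student_vocab[tok])
    return {
        "teacher_size": len(teacher_vocab),
        "student_size": len(student_vocab),
        "shared_tokens": len(shared),
        "shared_with_same_id": same_id,
        "identical": teacher_vocab == student_vocab,
    }


# Toy vocabularies (assumed for illustration only).
teacher = {"<s>": 0, "hello": 1, "world": 2, "!": 3}
student = {"<s>": 0, "world": 1, "hello": 2}

report = vocab_overlap(teacher, student)
print(report)
```

If `identical` is false, or `shared_with_same_id` is much smaller than the vocabulary size, index i of the teacher's logits and index i of the student's logits refer to different tokens, which is why I would expect a logit-level distillation loss to need a shared tokenizer.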