Tokenizer of the student model #4

@mkshing 
Thanks for sharing the great paper and the reproducible code.

According to your code in [convert_l_to_hf.py](https://github.com/SakanaAI/TAID/blob/main/convert_l_to_hf.py#L104), the student model uses the teacher model's tokenizer. I understand that the training data were tokenized with the [teacher's tokenizer](https://github.com/SakanaAI/TAID/blob/main/prepare_ultrachat.py#L17), but I wonder why TAID still transfers knowledge successfully in this setup.

However, I think that if the tokenizers differ, the models' logits also differ, because each logit index corresponds to a token in that model's own vocabulary.
If you had any assumptions when selecting a teacher/student pair, I would be glad if you could let us know (e.g., choosing models with a similar vocabulary and vocabulary size, or choosing models that share the same tokenizer, such as the Qwen family you used when building TinySwallow-1.5b).
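To make the concern concrete, here is a minimal sketch (not TAID's code and not its loss) of why token-level logit distillation needs aligned vocabularies. It assumes each tokenizer's vocabulary is available as a token-to-id dict (with Hugging Face tokenizers, `tokenizer.get_vocab()` returns such a dict); the helpers `vocab_compatibility`, `softmax` and `token_kl` are hypothetical names for illustration. A per-token KL divergence compares the two distributions index by index, so it is only meaningful when index `i` denotes the same token for both teacher and student.

```python
import math
from typing import Dict, List, Sequence


def vocab_compatibility(teacher_vocab: Dict[str, int],
                        student_vocab: Dict[str, int]) -> dict:
    """Hypothetical helper: compare two token->id vocabularies."""
    shared = set(teacher_vocab) & set(student_vocab)
    union = set(teacher_vocab) | set(student_vocab)
    # Shared tokens that also map to the same id in both vocabularies.
    same_ids = sum(1 for tok in shared if teacher_vocab[tok] == student_vocab[tok])
    return {
        "same_size": len(teacher_vocab) == len(student_vocab),
        "overlap": len(shared) / len(union) if union else 1.0,
        "id_agreement": same_ids / len(shared) if shared else 0.0,
        "identical": teacher_vocab == student_vocab,
    }


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_kl(teacher_logits: Sequence[float],
             student_logits: Sequence[float]) -> float:
    """KL(p_teacher || p_student) at one position.

    Assumes index i means the same token for both models; otherwise the
    number is computable but meaningless.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("vocabulary sizes differ; token-level KL is undefined")
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = {"a": 0, "b": 1, "c": 2}
    permuted = {"a": 0, "c": 1, "b": 2}  # same tokens and size, different ids
    print(vocab_compatibility(teacher, dict(teacher)))
    print(vocab_compatibility(teacher, permuted))
    print(token_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

The permuted example shows that equal vocabulary size and full token overlap are not enough on their own: if the ids disagree, the per-index comparison pairs up different tokens. Sharing the teacher's tokenizer (or a model family with an identical tokenizer) sidesteps this, which is why the question about pair selection matters.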
