[DeepSeek R1] Qwen2.5 Distillations #2236
Conversation
Here's a small suggestion: you could try evaluating performance on a math dataset. If the final results are comparable to those achieved by vLLM, we can ignore this error.
@DavidLandup0 sounds like what we really need here is the ability to combine a QwenBackbone with a DeepSeek tokenizer? If so, I think we might be able to relax our requirements so that a high-level task (e.g. QwenCausalLM) could use a tokenizer from DeepSeek. This is something I have thought we probably need anyway. I'll try to make a PR showing the basic loading changes, but let me know what you think conceptually!
Fundamentally, yes. We're looking to switch the tokenizer for an existing workflow/backbone. Having a general builder that accepts arbitrary tokenizers and backbones would be beneficial across the board, since it's not uncommon for people to mix and match tokenizers with models. To unblock this PR: what do you think about going forward with the API we have right now, and then updating this once we allow mix-and-match? As-is, we only have a single file (the DeepSeekQwen2 model) duplicated, so it will be easy to switch once we support the new feature.
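For reference, the mix-and-match construction discussed here might look roughly like the following. This is a hypothetical sketch using stand-in classes that mirror keras_hub's "task = backbone + preprocessor" pattern; the class names and constructors are illustrative, not the actual keras_hub API.

```python
# Hypothetical sketch of mixing a Qwen backbone with a DeepSeek tokenizer.
# All classes below are stand-ins, NOT the real keras_hub classes.
class QwenBackbone:
    pass

class DeepSeekR1QwenTokenizer:
    pass

class QwenCausalLMPreprocessor:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

class QwenCausalLM:
    def __init__(self, backbone, preprocessor):
        self.backbone = backbone
        self.preprocessor = preprocessor

# A high-level task assembled from a Qwen backbone and a DeepSeek tokenizer,
# without needing a duplicated model class.
lm = QwenCausalLM(
    backbone=QwenBackbone(),
    preprocessor=QwenCausalLMPreprocessor(tokenizer=DeepSeekR1QwenTokenizer()),
)
```

The point of the relaxed requirement is exactly this: the task stops asserting that its tokenizer class matches its backbone family.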
/gemini review |
Code Review
This PR introduces a new family of Qwen2.5 models distilled from DeepSeek-R1, including separate tokenizer and preprocessor classes. The code includes new model, preprocessor, and tokenizer files for the DeepSeek-R1-Distill-Qwen models. There are duplicate imports in keras_hub/api/models/__init__.py and keras_hub/api/tokenizers/__init__.py that should be addressed.
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1Qwen2CausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1QwenCausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1Qwen2CausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1QwenCausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)
There appear to be duplicate imports with different aliases. DeepSeekR1QwenCausalLM, DeepSeekR1QwenCausalLMPreprocessor, and DeepSeekR1QwenTokenizer are each imported twice, once with an alias ending in 2 and once without. This can lead to confusion and potential errors if the wrong alias is used. Consider removing the duplicate imports or ensuring the aliases are used consistently.
Suggested change — replace:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1Qwen2CausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM as DeepSeekR1QwenCausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1Qwen2CausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor as DeepSeekR1QwenCausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)

with single, unaliased imports:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm import (
    DeepSeekR1QwenCausalLM,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_causal_lm_preprocessor import (
    DeepSeekR1QwenCausalLMPreprocessor,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)
Similar to the models/__init__.py file, there appear to be duplicate imports with different aliases. DeepSeekR1QwenTokenizer is imported twice, once with an alias ending in 2 and once without. This can lead to confusion and potential errors if the wrong alias is used. Consider removing the duplicate imports or ensuring the aliases are used consistently.
Suggested change — replace:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1Qwen2Tokenizer,
)
from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer as DeepSeekR1QwenTokenizer,
)

with a single, unaliased import:

from keras_hub.src.models.deepseek_r1.deepseek_r1_qwen_tokenizer import (
    DeepSeekR1QwenTokenizer,
)
This PR adds a distinct family of Qwen2.5 models, distilled from DeepSeek-R1:
While technically distillations, DeepSeek's configurations make changes to the tokenizer config and preprocessing flow. To avoid the flag-based slippery slope of adding overriding configs to existing Qwen models, as well as to complement #2171, we separate the tokenizer and preprocessor, adding the distinct changes introduced with DeepSeek-R1's distillation as separate classes and files.
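As a rough illustration of the separation described above, the distillation-specific pieces can live in subclasses while the backbone is reused unchanged. The class and token names below are illustrative placeholders, not the actual keras_hub implementation or DeepSeek's real special-token strings:

```python
# Illustrative sketch: the Qwen backbone is reused as-is, while the
# DeepSeek-R1 distillation overrides tokenizer-level configuration in a
# subclass instead of adding flags to the existing Qwen classes.
# Token strings here are placeholders, not the real special tokens.
class QwenTokenizer:
    start_token = "<qwen-start>"
    end_token = "<qwen-end>"

class DeepSeekR1QwenTokenizer(QwenTokenizer):
    # DeepSeek-R1 distillations ship their own tokenizer config, so the
    # special tokens are overridden here rather than flag-switched on Qwen.
    start_token = "<deepseek-start>"
    end_token = "<deepseek-end>"

# The tokenizer-facing interface is unchanged; only the configuration differs.
assert issubclass(DeepSeekR1QwenTokenizer, QwenTokenizer)
```

This keeps the Qwen classes free of DeepSeek-specific configuration while the distilled variants remain drop-in compatible.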
Example Usage
Google Colab
2-line setup/prompt on Google Colab:
Keras-Hub
HuggingFace Equivalent
Numerical Equivalency
Currently, there seems to be some noise in the weights when converting naively; I'm still looking into why this happens. That said, the two are generally comparable. For example, taking the mean across the first axis of the lm_head (called token_embedding in KerasHub), we see a fairly similar profile, but not numerical equivalency. This doesn't seem to affect the outputs much, as seen in the responses above. I'll investigate further into why these discrepancies arise, since the weights should be loaded directly as-is into the structure of the model's components.